RAID in DBMS

Overview

RAID is a technique that combines multiple disks so that data can be stored across them more efficiently. Some RAID levels can also reconstruct data if a disk's contents are lost.

Introduction to RAID in DBMS

Redundant Array of Independent Disks (RAID) is a way to combine multiple disks to increase performance, provide data redundancy and improve reliability.

What is the problem with single disk storage?

No backup disk: If a single disk fails, the whole system stops working. With data redundancy, copies of the data are stored on multiple disks, so even if one disk fails, the system can still fetch the data from a redundant disk.
Performance: Storing a large amount of data on a single disk can degrade the performance of the system, because every request must be served by that one disk. This can be solved by spreading the data (and any redundant copies) across multiple disks.

In RAID, the operating system treats the combined disks as a single logical disk. How data is stored across the individual disks depends on the RAID level used.

RAID Levels in DBMS

The different RAID levels used in DBMS are:

RAID 0
RAID 1
RAID 2
RAID 3
RAID 4
RAID 5
RAID 6

RAID 0

RAID 0 implements data striping: data blocks are spread across multiple disks without redundancy. Since none of the disks is used for redundancy, if one disk fails, all the data in the array is lost.

Let's understand RAID 0 implementation with an example:

DISK 0	DISK 1	DISK 2	DISK 3
10	11	12	13
14	15	16	17
18	19	20	21
22	23	24	25

Blocks 10, 11, 12 and 13 form a stripe. No data block is repeated on any disk.

Instead of placing one block on a disk before moving on, we can place several consecutive blocks on one disk and then move to the next disk.

DISK 0	DISK 1	DISK 2	DISK 3
10	12	14	16
11	13	15	17
18	20	22	24
19	21	23	25

In the above example, blocks 10 and 11 are placed on DISK 0 first, and then we move to the next disk (DISK 1) to place blocks 12 and 13, and so on.
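The two tables differ only in how many consecutive blocks go to one disk before moving to the next (often called the stripe unit). The sketch below reproduces both layouts; `raid0_layout` and `print_table` are hypothetical helpers, and it assumes simple round-robin placement of equal-sized chunks.

```python
from typing import List


def raid0_layout(blocks: List[int], num_disks: int, stripe_unit: int = 1) -> List[List[int]]:
    """Return one list per disk holding the blocks placed on it, in order.

    stripe_unit is the number of consecutive blocks written to one disk
    before moving on to the next disk (1 and 2 in the tables above).
    """
    disks: List[List[int]] = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        chunk = i // stripe_unit          # which chunk this block belongs to
        disk = chunk % num_disks          # chunks go round-robin across disks
        disks[disk].append(block)
    return disks


def print_table(disks: List[List[int]]) -> None:
    """Print the disks as columns, like the tables in this article."""
    print("\t".join(f"DISK {d}" for d in range(len(disks))))
    for row in zip(*disks):
        print("\t".join(str(b) for b in row))


blocks = list(range(10, 26))              # blocks 10..25
print_table(raid0_layout(blocks, num_disks=4, stripe_unit=1))
print()
print_table(raid0_layout(blocks, num_disks=4, stripe_unit=2))
```

Since no disk holds a copy of another disk's blocks, removing any one inner list loses those blocks for good, which is why a single failure breaks a RAID 0 array.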

Pros of RAID 0

All the disk space is used to store data; none of it is spent on redundancy.
Data requests can be spread over multiple disks instead of a single disk, so they can be served in parallel, which improves throughput.

Cons of RAID 0

Failure of one disk can lead to complete data loss in the array.
No data redundancy is implemented, so one disk failure can lead to system failure.

RAID 1

RAID 1 implements mirroring, which means the data of one disk is replicated on another disk. This helps prevent system failure: if one disk fails, the redundant disk takes over.

Let's understand RAID 1 implementation with an example:

DISK 0	DISK 1	DISK 2	DISK 3
10	10	12	12
14	14	16	16
18	18	20	20
22	22	24	24

Here Disk 0 and Disk 1 hold the same data, since Disk 0 is copied to Disk 1. The same is true for Disk 2 and Disk 3.
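A mirrored pair such as Disk 0 and Disk 1 can be modelled as follows. This is a toy sketch, not how a real RAID controller works: `MirroredPair` is a hypothetical class, each "disk" is just a dictionary from block address to block contents, and reads are simply served by the first disk that still works.

```python
from typing import Dict, List, Optional


class MirroredPair:
    """A toy RAID 1 pair: every write goes to both disks."""

    def __init__(self) -> None:
        # Each disk maps a block address to the block's contents;
        # None marks a failed disk.
        self.disks: List[Optional[Dict[int, int]]] = [{}, {}]

    def write(self, address: int, block: int) -> None:
        for disk in self.disks:
            if disk is not None:          # only working disks receive the write
                disk[address] = block

    def fail(self, disk_index: int) -> None:
        self.disks[disk_index] = None     # simulate a disk failure

    def read(self, address: int) -> int:
        # Serve the read from the first disk that still works.
        for disk in self.disks:
            if disk is not None:
                return disk[address]
        raise IOError("both disks of the mirrored pair have failed")


pair = MirroredPair()                     # models DISK 0 and DISK 1
for address, block in enumerate([10, 14, 18, 22]):
    pair.write(address, block)

pair.fail(0)                              # DISK 0 fails...
print(pair.read(2))                       # ...but block 18 is still read from DISK 1
```

Only when both disks of the same pair fail does the read raise an error.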

Pros of RAID 1

Failure of one disk does not lead to system failure, as the same data is available on the other disk of the mirrored pair.

Cons of RAID 1

Extra space is required, as each disk's data is also copied to another disk; with two-way mirroring, only half of the total disk space holds unique data.

RAID 2

RAID 2 is used when errors in data have to be checked at the bit level. It uses a Hamming code, which can detect errors and correct single-bit errors. Two groups of disks are used in this technique: one stores the bits of each data word, and the other stores the error-correcting code (parity) bits of the data words. The structure of this RAID level is complex, so it is not commonly used.


Let's understand RAID 2 with an example:

DISK 0	DISK 1	DISK 2	DISK 3	DISK 4	DISK 5
10	11	12	P(10)	P(11)	P(12)
14	15	16	P(14)	P(15)	P(16)
18	19	20	P(18)	P(19)	P(20)
22	23	24	P(22)	P(23)	P(24)

Here Disk 3, Disk 4 and Disk 5 store the parity bits of the data stored in Disk 0, Disk 1 and Disk 2 respectively. The parity bits are used to detect errors in the data.
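To show how a Hamming code locates and fixes a bad bit, here is a sketch of the classic Hamming(7,4) code. This is an illustrative choice, not the exact layout of the table above: it assumes 4 data disks and 3 check disks, with each bit of a 7-bit codeword stored on its own disk. `hamming74_encode` and `hamming74_decode` are hypothetical names.

```python
from typing import List


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword (one bit per disk).

    Positions 1..7 hold p1 p2 d1 p3 d2 d3 d4; each parity bit covers the
    positions whose index has the corresponding binary digit set.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: List[int]) -> List[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)                        # do not modify the caller's list
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]     # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]     # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]     # re-check positions 4, 5, 6, 7
    error_pos = s1 + 2 * s2 + 4 * s3   # 0 means no error was detected
    if error_pos:
        c[error_pos - 1] ^= 1          # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)      # [0, 1, 1, 0, 0, 1, 1]
codeword[4] ^= 1                       # the disk holding position 5 returns a wrong bit
print(hamming74_decode(codeword))      # [1, 0, 1, 1]
```

The three re-checked parity bits, read as a binary number, give the position of the wrong bit, which is what lets RAID 2 correct (not just detect) a single-bit error.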

Pros of RAID 2

It checks for errors at the bit level for every data word.
Dedicated disks store the parity bits, which helps in detecting errors.

Cons of RAID 2

A large amount of extra space is used to store parity bits.

RAID 3

RAID 3 implements byte-level striping of data. Data is striped across the data disks, and the parity is stored on a separate disk. The parity helps to reconstruct the data when a disk's data is lost.

Let's look at a high-level RAID 3 implementation with an example:

DISK 0	DISK 1	DISK 2	DISK 3
10	11	12	P(10,11,12)
14	15	16	P(14,15,16)
18	19	20	P(18,19,20)
22	23	24	P(22,23,24)

Here Disk 3 contains the parity for Disk 0, Disk 1 and Disk 2. If the data on any one disk is lost, it can be reconstructed from the parity on Disk 3 together with the data on the remaining disks.
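The following sketch stripes a byte string across three data disks, stores the byte-wise XOR on a fourth (parity) disk, and rebuilds a lost disk. It assumes XOR parity (the usual choice) and zero-pads the data to a whole number of stripes; `raid3_write`, `raid3_rebuild`, `raid3_read` and `xor_columns` are hypothetical helpers.

```python
from functools import reduce
from typing import List, Optional


def xor_columns(disks: List[bytearray]) -> bytearray:
    """XOR the disks byte by byte (position i of every disk together)."""
    return bytearray(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def raid3_write(data: bytes, num_data_disks: int) -> List[Optional[bytearray]]:
    """Stripe bytes round-robin over the data disks and append a parity disk.

    The data is zero-padded to a whole number of stripes (an assumption of
    this sketch). The last disk in the returned list is the parity disk.
    """
    padded = data + bytes(-len(data) % num_data_disks)
    disks = [bytearray(padded[i::num_data_disks]) for i in range(num_data_disks)]
    return disks + [xor_columns(disks)]


def raid3_rebuild(disks: List[Optional[bytearray]]) -> List[Optional[bytearray]]:
    """Rebuild the single missing disk (marked None) by XOR-ing the survivors."""
    missing = [i for i, d in enumerate(disks) if d is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing disk")
    rebuilt = list(disks)
    rebuilt[missing[0]] = xor_columns([d for d in disks if d is not None])
    return rebuilt


def raid3_read(disks: List[Optional[bytearray]], length: int) -> bytes:
    """Read the data disks back in stripe order and drop the padding."""
    out = bytearray()
    for column in zip(*disks[:-1]):
        out.extend(column)
    return bytes(out[:length])


message = b"RAID 3 stripes bytes"
disks = raid3_write(message, num_data_disks=3)
disks[1] = None                                        # DISK 1 is lost
print(raid3_read(raid3_rebuild(disks), len(message)))  # b'RAID 3 stripes bytes'
```

Rebuilding works because XOR-ing the parity with the surviving data disks cancels out everything except the missing disk's bytes.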

Pros of RAID 3

Data can be recovered with the help of parity bits.

Cons of RAID 3

Extra space (a whole disk) is used for storing parity.

RAID 4

RAID 4 implements block-level striping of data with a dedicated parity drive. If the data on any one disk is lost, it can be reconstructed with the help of the parity drive. Parity is calculated by applying the XOR operation to the corresponding blocks of all the data disks.

Let's see RAID 4 with an example:

DISK 0	DISK 1	DISK 2	DISK 3
0	1	0	P0
1	1	0	P1

Here P0 is calculated as XOR(0,1,0) = 1 and P1 as XOR(1,1,0) = 0: if there is an even number of 1s, the XOR is 0, and if there is an odd number of 1s, the XOR is 1. Suppose the data on Disk 0 is lost. By checking the parity P0 = 1, we know that Disk 0 must have held 0, because XOR(0,1,0) = 1 matches the stored parity, whereas a 1 on Disk 0 would have made the parity XOR(1,1,0) = 0, which contradicts the stored value. Equivalently, the lost bit is the XOR of the surviving bits and the parity: XOR(1,0,1) = 0.
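The worked example above can be checked with a few lines of code. This sketch uses the bits from the RAID 4 table; `xor_parity` is a hypothetical helper. On real disks the same XOR is applied to every bit of the corresponding blocks.

```python
from functools import reduce
from typing import List


def xor_parity(bits: List[int]) -> int:
    """Parity of a stripe: 1 if the number of 1s is odd, else 0."""
    return reduce(lambda a, b: a ^ b, bits, 0)


# Rows of the RAID 4 table: data on DISK 0..2; the parity goes to DISK 3.
rows = [[0, 1, 0], [1, 1, 0]]
parities = [xor_parity(row) for row in rows]
print(parities)                                # [1, 0], i.e. P0 = 1 and P1 = 0

# DISK 0 is lost: XOR the surviving data bits with the parity bit.
for row, p in zip(rows, parities):
    recovered = xor_parity(row[1:] + [p])
    print(recovered == row[0])                 # True for both rows
```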

Pros of RAID 4

Parity allows the data to be reconstructed if the data on at most one disk is lost.

Cons of RAID 4

Extra space for parity is required.
If data is lost on more than one disk, the parity cannot help us reconstruct it.

RAID 5

RAID 5 is similar to RAID 4, with one difference: the parity rotates among the disks instead of being stored on a single dedicated disk.

Let's see an example of RAID 5 implementation:

DISK 0	DISK 1	DISK 2	DISK 3
0	1	0	P0
1	1	P1	0
1	P2	0	1
P3	1	0	0

Here we can see the parity rotating from Disk 3 in the first row to Disk 2, then Disk 1, and finally Disk 0 in the last row.
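The rotation in the table can be expressed as a simple rule: the parity of row r sits on disk (n - 1 - r) mod n. The sketch below rebuilds each row of the table from its data bits. The rotation rule and the left-to-right filling of data bits are assumptions chosen to match this table (real implementations use several rotation schemes), and `parity_disk` and `build_stripe` are hypothetical helpers.

```python
from functools import reduce
from typing import List


def parity_disk(row: int, num_disks: int) -> int:
    """Disk holding the parity of a row: it starts on the last disk and
    moves one disk to the left on each row, as in the table above."""
    return (num_disks - 1 - row) % num_disks


def build_stripe(row: int, data_bits: List[int]) -> List[int]:
    """Place the data bits left to right and the XOR parity on its disk."""
    n = len(data_bits) + 1
    p = parity_disk(row, n)
    parity = reduce(lambda a, b: a ^ b, data_bits, 0)
    bits = iter(data_bits)
    return [parity if disk == p else next(bits) for disk in range(n)]


# Data bits of each row of the table, read left to right, skipping the P cell.
table_data = [[0, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]]
for row, data in enumerate(table_data):
    stripe = build_stripe(row, data)
    print(f"row {row}: parity on DISK {parity_disk(row, 4)}, stripe {stripe}")
```

Because each row puts its parity on a different disk, parity updates are spread across all disks instead of all hitting one parity disk as in RAID 4.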

Pros of RAID 5

Parity is distributed over all the disks, so no single parity disk has to handle every parity update, which improves performance.
Data can be reconstructed using parity bits.

Cons of RAID 5

Parity is useful only when data is lost on at most one disk. If data is lost on more than one disk, the parity is of no use.
Extra space for parity is required.

RAID 6

RAID 6 helps when more than one disk can fail. It is similar to RAID 5, but each stripe (row) stores two independent parities instead of one.

Let's see RAID 6 with the help of an example:

DISK 0	DISK 1	DISK 2	DISK 3
0	1	Q0	P0
1	Q1	P1	0
Q2	P2	0	1
P3	1	0	Q3

Here P0, P1, P2, P3 and Q0, Q1, Q2, Q3 are the two parities of each row, which allow the data to be reconstructed if at most two disks fail.
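The article does not say how the second parity Q is computed. One common choice, used for example by Linux software RAID, keeps P as the plain XOR and computes Q as a weighted sum over the finite field GF(2^8), weighting data disk i by g^i with g = 2. The sketch below assumes that scheme and the field polynomial 0x11D, handles one byte per data disk (a real array repeats this for every byte of a block), and shows how two lost data disks are recovered. All function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional, Tuple


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D            # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # Every non-zero element satisfies a^255 = 1, so a^254 is its inverse.
    return gf_pow(a, 254)


def compute_pq(data: List[int]) -> Tuple[int, int]:
    """P is the plain XOR; Q weights data byte i by g^i with g = 2."""
    p = reduce(lambda x, y: x ^ y, data, 0)
    q = reduce(lambda x, y: x ^ y, (gf_mul(gf_pow(2, i), d) for i, d in enumerate(data)), 0)
    return p, q


def recover_two(data: List[Optional[int]], p: int, q: int) -> List[Optional[int]]:
    """Rebuild exactly two missing data bytes (marked None) from P and Q."""
    x, y = [i for i, d in enumerate(data) if d is None]
    # Remove the contribution of the surviving disks from P and Q.
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if d is not None:
            pxy ^= d
            qxy ^= gf_mul(gf_pow(2, i), d)
    # Now Dx ^ Dy = pxy and g^x*Dx ^ g^y*Dy = qxy; solve the two equations.
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    dy = pxy ^ dx
    restored = list(data)
    restored[x], restored[y] = dx, dy
    return restored


stripe = [0x12, 0x34, 0x56, 0x78]             # one byte per data disk
p, q = compute_pq(stripe)
damaged = [0x12, None, 0x56, None]            # two data disks have failed
print(recover_two(damaged, p, q) == stripe)   # True
```

P alone gives one equation and cannot separate two unknowns; Q adds a second, independent equation, which is why two failures become recoverable.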

Pros of RAID 6

The extra parity makes it possible to reconstruct the data of up to two failed disks.

Cons of RAID 6

Extra space is used for both parities (P and Q).
More than two disk failures cannot be recovered from.

Conclusion

RAID combines multiple disks into one logical disk to improve performance and, at the levels that add redundancy, to protect data when a disk fails.
RAID 0 implements data striping.
RAID 1 implements mirroring, which creates redundant data.
RAID 2 uses a Hamming code to detect and correct errors in data.
RAID 3 does byte-level data striping with parity stored on a dedicated disk.
RAID 4 does block-level data striping with a dedicated parity disk.
RAID 5 has rotating parity across the disks.
RAID 6 has two parities, which can handle at most two disk failures.
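The "extra space" listed in the cons above can be summarised as the number of disks' worth of space left for data. This is a rough sketch under stated assumptions: `usable_disks` is a hypothetical helper, all n disks are the same size, RAID 1 uses two-way mirroring on an even number of disks, RAID 3, 4 and 5 spend one disk's worth of space on parity, and RAID 6 spends two. RAID 2 is left out because its number of check disks depends on the Hamming code used.

```python
def usable_disks(level: int, n: int) -> int:
    """Disks' worth of space that holds data in an array of n equal disks."""
    if level == 0:
        return n              # no redundancy
    if level == 1:
        return n // 2         # every disk has a mirror
    if level in (3, 4, 5):
        return n - 1          # one disk's worth of parity
    if level == 6:
        return n - 2          # two disks' worth of parity (P and Q)
    raise ValueError(f"RAID {level} is not covered by this sketch")


for level in (0, 1, 3, 4, 5, 6):
    print(f"RAID {level}: {usable_disks(level, 4)} of 4 disks hold data")
```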