Wednesday, September 3, 2008

RAID Part 2, RAID 5

I am going to talk a bit today about the various configurations of RAID 5. As I have mentioned before, RAID is not a standard but a concept of how to use multiple disks as a single device. It was first defined in 1988 in the paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)" by Gibson, Katz, and Patterson, out of the CS powerhouse of the time… you guessed it, Berkeley. However, the paper described the concept of RAID; it did not define a standard, nor was the concept ever boiled down to one. As a result, RAID 5, like most any other RAID level, is implemented differently by each manufacturer. This makes it difficult indeed to recover an array when things go bad.
The concept of RAID 5 is striping with rotating parity. The idea is that data is written in blocks to an ordered disk set, and in every stripe one of the disks holds the parity (a simple XOR) of the data in that stripe. There are plenty of sites out there that explain this, but the basic idea is that the data blocks in the stripe are XOR’ed together and the result is written to the parity block. Thus, if a disk fails, the remaining blocks in each stripe can be XOR’ed to reconstruct the missing disk’s block. The parity level is the number of data blocks plus the parity block; e.g., parity 3 (p3) is two data blocks plus parity. Since the parity block and each of its member data blocks must sit on a different disk, the maximum number of data blocks per stripe is the number of disks minus one (n-1). While it is the norm to see the parity level equal the number of disks in a set, it is not a requirement. The parity level can be any number from 3 up to the number of disks in the array, which can cause all sorts of headaches for those trying to recover data off a dead array.
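To make the XOR idea concrete, here is a minimal Python sketch (my own illustration, not any controller's code; `xor_blocks` is a hypothetical helper name). It computes the parity block for a parity-3 stripe of two tiny data blocks, then rebuilds a "lost" data block from the survivors. Real arrays work on much larger blocks, but the arithmetic is the same.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A parity-3 stripe: two data blocks plus the parity block computed from them.
data = [b"\x0f\xf0\xaa", b"\x33\x55\x01"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[0]: XOR the survivors to rebuild it.
rebuilt = xor_blocks([data[1], parity])
print(parity.hex(), rebuilt == data[0])
```

Rebuilding is the same operation as computing parity, which is why any single missing block in a stripe, data or parity, can be recovered.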

RAID 5 can be broken down into four different methods. There are certainly more variations on these methods, which I will talk about in another article, but this is the best place to start. The methods are left asymmetrical, left symmetrical, right asymmetrical, and right symmetrical. For the purposes of this explanation, I am going to ignore block sizes and iteration (delayed rotation), both of which must be known or discovered to recover data.

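Left asymmetrical is the variant most commonly employed by hardware RAID cards. It writes data blocks to the disks in order, from the first disk through the next-to-last, and then writes the parity of those data blocks to the last disk, so the first stripe is D, D, D … D, P. For the next stripe, parity is rotated left (backwards) one disk, so the sequence becomes D, D, D … P, D, with the data still filling the disks in order starting from the first disk.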

left-asymmetrical RAID 5, parity 5, on 5 disks
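The rotation is easier to see as a table. The sketch below prints a left-asymmetrical layout for parity 5 on 5 disks under the simplifying assumptions above (one block per cell, no delayed rotation). `left_asymmetric` is a hypothetical name of my own; "D" marks data blocks in write order and "P" marks parity.

```python
def left_asymmetric(num_disks, num_stripes):
    """Rows of 'D<n>' (data, in write order) and 'P' (parity), one row per stripe."""
    rows = []
    block = 0
    for stripe in range(num_stripes):
        # Parity starts on the last disk and moves one disk left per stripe.
        parity_disk = num_disks - 1 - (stripe % num_disks)
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                # Data always fills the disks from disk 0 upward, skipping parity.
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


for row in left_asymmetric(5, 5):
    print(" ".join(f"{cell:>4}" for cell in row))
```

Note that D4 lands back on disk 0, the same disk as D0.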

This is probably the simplest form of RAID 5 to understand; however, from a performance standpoint, the system can read only parity-minus-one consecutive blocks before running into the possibility of reading the same disk twice. This is where symmetrical RAID 5 comes in. Here the blocks of the first stripe are written as before (D, D, D … D, P); however, in the next stripe the first data block is not written to the first disk. Instead, it is written on the disk right after the new parity block (which sits one row down and one disk to the left of the previous parity block), and the stripe continues in order from that point, wrapping around until it reaches the parity block. This way, consecutive blocks land on consecutive disks even across stripe boundaries, so a sequential read can keep every disk in the set busy.


left-symmetrical RAID 5, parity 5, on 5 disks
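Here is the same kind of sketch for left symmetrical (hypothetical helper names, same simplifying assumptions). It also checks the performance claim: in this layout, every run of five consecutive data blocks lands on five different disks.

```python
def left_symmetric(num_disks, num_stripes):
    """Rows of 'D<n>' (data, in write order) and 'P' (parity), one row per stripe."""
    rows = []
    block = 0
    for stripe in range(num_stripes):
        # Parity starts on the last disk and moves one disk left per stripe.
        parity_disk = num_disks - 1 - (stripe % num_disks)
        row = ["P"] * num_disks  # every non-parity cell is overwritten below
        # The stripe's first data block goes on the disk right after parity,
        # then continues in order, wrapping around until it reaches parity.
        for i in range(num_disks - 1):
            disk = (parity_disk + 1 + i) % num_disks
            row[disk] = f"D{block}"
            block += 1
        rows.append(row)
    return rows


def disk_order(rows):
    """List the disk index of each data block, in block-number order."""
    where = {}
    for row in rows:
        for disk, cell in enumerate(row):
            if cell != "P":
                where[int(cell[1:])] = disk
    return [where[b] for b in range(len(where))]


def windows_hit_distinct_disks(rows, width):
    """True if every run of `width` consecutive data blocks uses `width` different disks."""
    order = disk_order(rows)
    return all(len(set(order[i:i + width])) == width
               for i in range(len(order) - width + 1))


layout = left_symmetric(5, 5)
for row in layout:
    print(" ".join(f"{cell:>4}" for cell in row))
print(windows_hit_distinct_disks(layout, 5))
```

Running the same window check against the left-asymmetrical layout with a width of 5 fails at D0 through D4, while a width of 4 passes, which matches the parity-minus-one limit described above.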

There are right-hand versions of the above as well. Right asymmetrical starts with the parity block in the first position and rotates right. The first data block in each stripe is always in the first position, except when the parity block is located there, in which case it moves to the second position.


right-asymmetrical RAID 5, parity 5, on 5 disks
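As a sketch (hypothetical function name, same simplifying assumptions as before), right asymmetrical differs from the left-asymmetrical version only in where parity starts and which way it moves:

```python
def right_asymmetric(num_disks, num_stripes):
    """Rows of 'D<n>' (data, in write order) and 'P' (parity), one row per stripe."""
    rows = []
    block = 0
    for stripe in range(num_stripes):
        # Parity starts on disk 0 and moves one disk right per stripe.
        parity_disk = stripe % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                # Data fills the disks from disk 0 upward, skipping parity.
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


for row in right_asymmetric(5, 5):
    print(" ".join(f"{cell:>4}" for cell in row))
```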
Right symmetrical, like its left-rotating sibling, writes the first data block of each stripe on the disk immediately following the parity block and wraps around until it reaches the parity block again.


right-symmetrical RAID 5, parity 5, on 5 disks
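And a matching sketch for right symmetrical (hypothetical name, same assumptions): parity moves right each stripe, and data starts on the disk after parity and wraps around.

```python
def right_symmetric(num_disks, num_stripes):
    """Rows of 'D<n>' (data, in write order) and 'P' (parity), one row per stripe."""
    rows = []
    block = 0
    for stripe in range(num_stripes):
        # Parity starts on disk 0 and moves one disk right per stripe.
        parity_disk = stripe % num_disks
        row = ["P"] * num_disks  # every non-parity cell is overwritten below
        # Data starts right after parity and wraps around until it reaches parity.
        for i in range(num_disks - 1):
            disk = (parity_disk + 1 + i) % num_disks
            row[disk] = f"D{block}"
            block += 1
        rows.append(row)
    return rows


for row in right_symmetric(5, 5):
    print(" ".join(f"{cell:>4}" for cell in row))
```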


Adding to the confusion
In the above examples, we looked at RAID 5, parity 5, on 5 disks. The parity level could be any integer greater than or equal to 3. To make things more difficult, the number of disks is not tied to the parity level; it can be any number greater than or equal to it. Let's take a look at a left-asymmetrical RAID 5, parity 4, on 5 disks.


left-asymmetrical RAID 5, parity 4, on 5 disks
Here the data is striped with three data blocks and a parity block, which is parity 4. However, since each stripe spans only four of the five disks, position 0 of the second stripe falls on the first row of the 5th disk, so that is where the second stripe begins before wrapping onto the next row. Notice that if we did not know the configuration of the array and read it row by row, we would end up with 2 parity blocks in some rows.
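Here is a sketch of how I read the parity 4 on 5 disks case. It assumes the four-block stripes are laid end to end across the five disks and that parity rotates left within each stripe, as in the parity-5 case. This is an illustration under those assumptions (the function name is hypothetical, and real controllers may differ). Counting the parity blocks per row shows the "two parity blocks in some rows" effect.

```python
def left_asymmetric_p_on_n(parity_level, num_disks, num_rows):
    """Lay parity-`parity_level` left-asymmetric stripes end to end across `num_disks` disks."""
    total = num_rows * num_disks
    cells = [None] * total  # flat list of block slots, filled row by row
    block = 0
    stripe = 0
    while (stripe + 1) * parity_level <= total:
        start = stripe * parity_level  # slot where position 0 of this stripe lands
        # Within each stripe, parity starts at the last position and moves left.
        parity_pos = parity_level - 1 - (stripe % parity_level)
        for pos in range(parity_level):
            if pos == parity_pos:
                cells[start + pos] = "P"
            else:
                cells[start + pos] = f"D{block}"
                block += 1
        stripe += 1
    # Cut the flat slot list back into rows of one block per disk.
    return [cells[r * num_disks:(r + 1) * num_disks] for r in range(num_rows)]


layout = left_asymmetric_p_on_n(4, 5, 4)
for row in layout:
    print(" ".join(f"{cell or '-':>4}" for cell in row), " parity blocks:", row.count("P"))
```

With these assumptions, the second row comes out as D4, P, D5, D6, P: one parity block from the second stripe and one from the third.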
In the next installment, I will talk about rotation iteration (delay), block sizes, and ways to determine parity levels. Stay tuned.