RAID Structure

* The basic idea behind RAID is to use a group of hard drives together with some form of duplication, either to increase reliability or to speed up operations (or sometimes both).

* RAID originally stood for Redundant Array of Inexpensive Disks, and was intended to use a bunch of cheap small disks in place of one or two larger, more expensive ones. Today RAID systems use large, possibly expensive disks as their components, so the I in the name now stands for Independent.

Improvement in Performance via Parallelism

* There is also a performance benefit to mirroring, particularly with respect to reads. Since every block of data is duplicated on multiple disks, read operations can be satisfied from any available copy, and multiple disks can read different data blocks at the same time in parallel. (Writes could possibly be sped up as well through careful scheduling algorithms, but it would be hard in practice.)

* Another way of improving disk access time is with striping, which basically means spreading data out across multiple disks that can be accessed simultaneously.

• With bit-level striping the bits of each byte are striped across multiple disks. For example, if 8 disks were involved, then each 8-bit byte would be read in parallel by 8 heads on separate disks. A single disk read would retrieve 8 * 512 bytes = 4K worth of data in the time normally required to read 512 bytes. Similarly, if 4 disks were involved, then two bits of each byte could be kept on each disk, for 2K worth of disk access per read or write operation.

• Block-level striping spreads a file system across multiple disks on a block-by-block basis, so if block N were located on disk 0, then block N + 1 would be on disk 1, and so on. This is particularly useful when file systems are accessed in clusters of physical blocks. Other striping possibilities exist, with block-level striping being the most common.
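The two striping schemes above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the round-robin mapping (disk = N mod number of disks) is an assumption consistent with "block N on disk 0, block N + 1 on disk 1", not a description of any particular controller.

```python
def locate_block(block_number, num_disks):
    """Map a logical block to (disk index, block offset on that disk).

    Assumes simple round-robin block-level striping.
    """
    disk = block_number % num_disks
    offset = block_number // num_disks
    return disk, offset


def bytes_per_access(num_disks, sector_bytes=512):
    """Data moved by one parallel access under bit-level striping,
    where every disk contributes one sector at the same time."""
    return num_disks * sector_bytes


# Where the first six logical blocks land on a 4-disk stripe set.
layout = [locate_block(n, 4) for n in range(6)]
```

With 8 disks, `bytes_per_access(8)` gives the 4K figure quoted above, and with 4 disks it gives 2K.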

RAID Levels

* Mirroring provides reliability but is costly; striping improves performance but does not improve reliability. Accordingly, there are a number of different schemes that combine the principles of mirroring and striping in different ways, in order to balance reliability versus performance versus cost.

• RAID Level 0 - This level uses striping only, with no mirroring.

• RAID Level 1 - This level uses mirroring only, with no striping.

• RAID Level 2 - This level stores error-correcting codes on additional disks, allowing any damaged data to be reconstructed by subtraction from the remaining undamaged data. Note that this scheme needs only three extra disks to protect 4 disks' worth of data, as opposed to complete mirroring. (The number of disks needed is a function of the error-correcting algorithm and the means by which the particular bad bit(s) is (are) identified.)

• RAID Level 3 - This level is similar to level 2, except that it takes advantage of the fact that each disk is still doing its own error detection, so that when an error occurs, there is no question about which disk in the array has the bad data. As a result, a single parity bit is all that is needed to recover the lost data from an array of disks. Level 3 also uses striping, which improves performance. The downside of the parity approach is that every disk must take part in every disk access, and the parity bits must be constantly calculated and checked, reducing performance. Hardware-level parity calculations and NVRAM cache can help with both of those issues. In practice level 3 is greatly preferred over level 2.

• RAID Level 4 - This level is similar to level 3, but uses block-level striping instead of bit-level striping. The benefits are that multiple blocks can be read independently, and a change to a block requires writing only two blocks (data and parity) rather than involving all disks. Note that new disks can be added seamlessly to the system provided they are initialized to all zeros, as this does not affect the parity results.

• RAID Level 5 - This level is similar to level 4, except that the parity blocks are distributed over all disks, thereby more evenly balancing the load on the system. For any given block on the disk(s), one of the disks will hold the parity information for that block and the other N-1 disks will hold the data. Note that the same disk cannot hold both data and parity for the same block, as both would be lost in the event of a disk crash.

• RAID Level 6 - This level extends RAID level 5 by storing multiple bits of error-recovery codes (such as Reed-Solomon codes) for each bit position of data, rather than a single parity bit. In the example shown below, 2 bits of ECC are stored for every 4 bits of data, allowing data recovery in the face of up to two simultaneous disk failures. Note that this still involves only a 50% increase in storage needs, as opposed to 100% for simple mirroring, which could only tolerate a single disk failure.
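The single-parity idea behind levels 3 through 5 can be sketched with XOR. This is a minimal illustration, not a real RAID implementation: all function names are hypothetical, blocks are short byte strings, and the parity rotation in `parity_disk` is one simple assumed layout (real RAID 5 layouts vary).

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity.

    Works only because the failed disk is already known, as in level 3.
    """
    return xor_blocks(list(surviving_blocks) + [parity])


def update_parity(old_parity, old_data, new_data):
    """Level 4/5 small write: only the changed data block and the
    parity block are involved, not every disk in the stripe."""
    return xor_blocks([old_parity, old_data, new_data])


def parity_disk(stripe, num_disks):
    """Assumed RAID 5 rotation: parity moves to the next disk each stripe."""
    return stripe % num_disks


data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] failed and rebuild its block.
recovered = rebuild([data[0], data[2]], parity)
```

Here `recovered` equals the lost block `data[1]`. XOR-ing in an all-zero block leaves the parity unchanged, which is why a zero-initialized disk can be added without recomputing parity.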

* There are also two RAID levels which combine RAID levels 0 and 1 (striping and mirroring) in different combinations, designed to provide both performance and reliability at the expense of increased cost.

• RAID level 0 + 1: disks are first striped, and then the striped disks are mirrored to another set. This level generally provides better performance than RAID level 5.

• RAID level 1 + 0 mirrors disks in pairs, and then stripes the mirrored pairs. The storage capacity, performance, etc. are all the same, but there is an advantage to this approach in the event of multiple disk failures, as shown below:

* In diagram (a) below, the 8 disks have been split into two sets of four, each of which is striped, and then one stripe set is used to mirror the other set.

-> If a single disk fails, it wipes out the entire stripe set, but the system can keep on functioning using the remaining set.

-> However if a second disk from the other stripe set now fails, then the whole system is lost, as a result of two disk failures.

* In diagram (b), the same 8 disks are split into four sets of two, each of which is mirrored, and then the file system is striped across the four sets of mirrored disks.

-> If a single disk fails, then that mirror set is reduced to a single disk, but the system keeps running, and the other three mirror sets continue mirroring.

-> Now if a second disk fails (one that is not the mirror of the already failed disk), then another one of the mirror sets is reduced to a single disk, but the system can continue without data loss.

-> In fact the second arrangement could handle as many as four simultaneously failed disks, as long as no two of them were from the same mirror pair.
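The difference between the two arrangements can be checked with a small model. This is a sketch under assumptions: the disk numbering (disks 0-3 and 4-7 as the two stripe sets, disks 0/1, 2/3, 4/5, 6/7 as mirror pairs) and the function names are hypothetical and are not taken from the diagrams.

```python
from itertools import combinations


def survives_raid01(failed, num_disks=8):
    """RAID 0+1: two stripe sets mirror each other.

    Any failure takes out its whole stripe set, so the array survives
    only while at least one stripe set has no failed disk.
    """
    half = num_disks // 2
    first_set_intact = all(d >= half for d in failed)
    second_set_intact = all(d < half for d in failed)
    return first_set_intact or second_set_intact


def survives_raid10(failed, num_disks=8):
    """RAID 1+0: mirrored pairs are striped together.

    The array survives while no mirror pair has lost both disks.
    """
    failed = set(failed)
    return all(not {a, a + 1} <= failed for a in range(0, num_disks, 2))


# Count how many of the possible two-disk failures each layout survives.
pairs = list(combinations(range(8), 2))
ok_01 = sum(survives_raid01(p) for p in pairs)
ok_10 = sum(survives_raid10(p) for p in pairs)
```

Under this model, of the 28 possible two-disk failures the 0+1 layout survives 12 and the 1+0 layout survives 24, and `survives_raid10({0, 2, 4, 6})` is true while `survives_raid01({0, 2, 4, 6})` is false.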

Selecting a RAID Level

* Trade-offs in selecting the optimal RAID level for a particular application include cost, volume of data, need for reliability, need for performance, and rebuild time, the last of which can affect the likelihood that a second disk will fail while the first failed disk is being rebuilt.

* Other decisions include how many disks are involved in a RAID set and how many disks to protect with a single parity bit. More disks in the set increases performance but increases cost. Protecting more disks per parity bit saves cost, but increases the likelihood that a second disk will fail before the first bad disk is repaired.
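The cost side of these trade-offs is simple arithmetic, sketched below. The function names are hypothetical. The second function rests on a loudly simplified assumption that is not in the text above: each remaining disk fails independently with the same probability during the rebuild window.

```python
def redundancy_overhead(data_disks, extra_disks):
    """Extra storage as a fraction of the data capacity."""
    return extra_disks / data_disks


def second_failure_probability(remaining_disks, p_fail):
    """Chance that at least one remaining disk fails during a rebuild.

    ASSUMPTION: failures are independent and each disk fails with
    probability p_fail during the rebuild window.
    """
    return 1 - (1 - p_fail) ** remaining_disks


# One parity disk protecting 2, 4, or 8 data disks.
overheads = {k: redundancy_overhead(k, 1) for k in (2, 4, 8)}
```

`redundancy_overhead(4, 3)` gives the 75% cost of the level 2 example, `redundancy_overhead(4, 2)` gives the 50% quoted for level 6, and mirroring is `redundancy_overhead(1, 1)`, or 100%. Under the stated independence assumption, protecting more disks per parity bit lowers the overhead but raises `second_failure_probability`.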

Extensions

RAID concepts have been extended to tape drives (e.g. striping tapes for faster backups or parity-checking tapes for reliability), and to the broadcasting of data.

Problems with RAID

* RAID protects against physical errors, but not against any number of bugs or other errors that could write erroneous data.

* ZFS adds an extra level of protection by including data block checksums in all inodes along with the pointers to the data blocks. If data are mirrored and one copy has the correct checksum and the other does not, then the data with the bad checksum will be replaced with a copy of the data with the good checksum. This increases reliability greatly over RAID alone, at the cost of a performance hit that is acceptable because ZFS is so fast to begin with.

* Another problem with traditional file systems is that their sizes are fixed and relatively difficult to change. Where RAID sets are involved it becomes even harder to adjust file system sizes, because a file system cannot span multiple volumes.

* ZFS solves these problems by pooling RAID sets, and by dynamically allocating space to file systems as needed. File system sizes can be limited by quotas, and space can also be reserved to guarantee that a file system will be able to grow later, but these parameters can be changed at any time by the file system’s owner. Otherwise file systems grow and shrink dynamically as needed.
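The checksum-based repair of mirrored data described above can be illustrated roughly as follows. This is a toy model and not ZFS code: the function names are hypothetical, SHA-256 is used only as a stand-in checksum, and the mirror is just a Python list of byte strings.

```python
import hashlib


def checksum(data):
    """Stand-in checksum (assumption: SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).digest()


def read_with_repair(copies, expected):
    """Return the good data and overwrite any mirrored copy that is bad.

    copies   -- mutable list of byte strings, one per mirror
    expected -- checksum stored alongside the pointer to the block
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise OSError("no mirrored copy matches the stored checksum")
    for i, copy in enumerate(copies):
        if checksum(copy) != expected:
            copies[i] = good  # replace the bad copy with the good one
    return good


stored = checksum(b"hello raid")
mirror = [b"hello raid", b"hellX raid"]  # second copy silently corrupted
result = read_with_repair(mirror, stored)
```

After the call, both entries of `mirror` hold the good data. Plain mirroring without the checksum would have no way to tell which of the two copies was the corrupted one.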

