
ASSIGNMENT NO. 1
Gohar Mumtaz, Telecom Engg., UET Taxila

RAID:
RAID (redundant array of independent disks, originally redundant array of inexpensive disks) is a storage technology that combines multiple disk drive components into a single logical unit. Data is distributed across the drives in one of several ways called "RAID levels", depending on what level of redundancy and performance (via parallel communication) is required. RAID is an example of storage virtualization and was first defined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987. Marketers representing industry RAID manufacturers later attempted to reinvent the term to describe a redundant array of independent disks, as a means of dissociating a low-cost expectation from RAID technology.

RAID is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical drives. The physical drives are said to be in a RAID array, which is accessed by the operating system as a single drive. The different schemes or architectures are named by the word RAID followed by a number (e.g., RAID 0, RAID 1). Each scheme provides a different balance among three key goals: resiliency, performance, and capacity. The standard RAID levels are a basic set of RAID configurations that employ striping, mirroring, or parity, and they can be modified or combined for additional benefits. Only RAID 5 is discussed here.

RAID 5:
RAID 5 (block-level striping with distributed parity) distributes parity along with the data and requires all drives but one to be present to operate; the array is not destroyed by a single drive failure. Upon drive failure, any subsequent reads can be calculated from the distributed parity such that the drive failure is masked from the end user. However, a single drive failure results in reduced performance of the entire array until the failed drive has been replaced and the associated data rebuilt.

Additionally, there is the potentially disastrous RAID 5 write hole (discussed under recovery issues below). RAID 5 requires at least three disks. A RAID 5 array uses block-level striping with parity data distributed across all member disks.

RAID 5 has achieved popularity because of its low cost of redundancy. This can be seen by comparing the number of drives needed to achieve a given capacity. For an array of n drives, with S_min being the size of the smallest disk in the array, other RAID levels that yield redundancy give a storage capacity of only S_min (for RAID 1) or S_min × n/2 (for RAID 1+0). In RAID 5, the yield is S_min × (n - 1) (see the capacity sketch below). For example, four 1 TB drives can be made into two separate 1 TB redundant arrays under RAID 1 or a single 2 TB array under RAID 1+0, but the same four drives can be used to build a 3 TB array under RAID 5.

Although RAID 5 may be implemented in a disk controller, some controllers have hardware support for parity calculations (hardware RAID cards with onboard processors) while others use the main system processor (a form of software RAID in vendor drivers for inexpensive controllers). Many operating systems also provide software RAID support independently of the disk controller, such as Windows Dynamic Disks, Linux mdadm, or RAID-Z. In most implementations, a minimum of three disks is required for a complete RAID 5 configuration. In some implementations a degraded RAID 5 disk set can be created (a three-disk set of which only two are online), while mdadm supports a fully functional (non-degraded) RAID 5 setup with two disks, which functions as a slow RAID 1 but can be expanded with further volumes.
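To illustrate the capacity comparison above, the following Python sketch computes the usable capacity for four identical 1 TB drives under each scheme; the function name and level labels are illustrative only, not part of any RAID implementation.

def usable_capacity_tb(n_drives, drive_tb, level):
    # Usable capacity, assuming all drives share the same (smallest) size.
    if level == "RAID 1":        # every drive holds a full copy of the data
        return drive_tb
    if level == "RAID 1+0":      # striping across mirrored pairs
        return drive_tb * n_drives / 2
    if level == "RAID 5":        # one drive's worth of space goes to parity
        return drive_tb * (n_drives - 1)
    raise ValueError(level)

for level in ("RAID 1", "RAID 1+0", "RAID 5"):
    print(level, usable_capacity_tb(4, 1, level), "TB")
# RAID 1 -> 1 TB, RAID 1+0 -> 2.0 TB, RAID 5 -> 3 TB, matching the example above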

Diagram of a RAID 5 setup with distributed parity, with each color representing the group of blocks in the respective parity block (a stripe). This diagram shows the left-asymmetric algorithm.

In such a layout, a read request for block A1 would be serviced by disk 0. A simultaneous read request for block B1 would have to wait, but a read request for B2 could be serviced concurrently by disk 1.
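The left-asymmetric placement shown in the diagram can be sketched as a simple mapping from stripes and data chunks to disks. This is only an illustration of one common layout convention; the function name is made up for the example.

def left_asymmetric_layout(n_disks, n_stripes):
    """Return layout[stripe][disk] = 'P' for parity, or the data-chunk index.

    Left-asymmetric: parity starts on the last disk and rotates toward disk 0;
    data chunks fill the remaining disks in ascending disk order.
    """
    layout = []
    chunk = 0
    for stripe in range(n_stripes):
        parity_disk = (n_disks - 1) - (stripe % n_disks)
        row = []
        for disk in range(n_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(chunk)
                chunk += 1
        layout.append(row)
    return layout

for row in left_asymmetric_layout(4, 4):
    print(row)
# [0, 1, 2, 'P']
# [3, 4, 'P', 5]
# [6, 'P', 7, 8]
# ['P', 9, 10, 11]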

RAID 5 parity handling:
A concurrent series of blocks (one on each of the disks in an array) is collectively called a stripe. If another block, or some portion thereof, is written on that same stripe, the parity block, or some portion thereof, is recalculated and rewritten. For small writes, this requires the following steps (see the sketch after this list):

1. Read the old data block.
2. Read the old parity block.
3. Compare the old data block with the write request. For each bit that has flipped (changed from 0 to 1, or from 1 to 0) in the data block, flip the corresponding bit in the parity block.
4. Write the new data block.
5. Write the new parity block.
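A minimal sketch of this read-modify-write sequence in Python, using XOR over equal-length blocks. The read_block and write_block callables, and all other names, are assumptions made for illustration rather than any specific controller API.

def raid5_small_write(read_block, write_block, data_disk, parity_disk, offset, new_data):
    """Read-modify-write of one data block and its parity block (sketch).

    read_block(disk, offset) -> bytes and write_block(disk, offset, data) -> None
    are assumed to be provided by the caller.
    """
    old_data = read_block(data_disk, offset)      # 1. read the old data block
    old_parity = read_block(parity_disk, offset)  # 2. read the old parity block
    # 3. every bit that flips in the data must also flip in the parity:
    #    new_parity = old_parity XOR old_data XOR new_data
    new_parity = bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))
    write_block(data_disk, offset, new_data)      # 4. write the new data block
    write_block(parity_disk, offset, new_parity)  # 5. write the new parity block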

The disk used for the parity block is staggered from one stripe to the next, hence the term "distributed parity". RAID 5 writes are expensive in terms of disk operations and traffic between the disks and the controller. The parity blocks are not read on data reads, since this would add unnecessary overhead and diminish performance. The parity blocks are read, however, when a read of a block in the stripe fails due to failure of any one of the disks; the parity block in the stripe is then used to reconstruct the errant sector. The CRC error is thus hidden from the main computer.

Likewise, should a disk fail in the array, the parity blocks from the surviving disks are combined mathematically with the data blocks from the surviving disks to reconstruct the data from the failed drive on the fly. This is sometimes called Interim Data Recovery Mode. The computer knows that a disk drive has failed, but this is only so that the operating system can notify the administrator that a drive needs replacement; applications running on the computer are unaware of the failure. Reading and writing to the drive array continues seamlessly, though with some performance degradation.
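A brief sketch of the reconstruction idea behind this recovery mode: the missing block of a stripe is the XOR of all surviving data and parity blocks of that stripe. The example data are made up.

from functools import reduce

def reconstruct_missing_block(surviving_blocks):
    """XOR the surviving data and parity blocks of a stripe to rebuild the lost one."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*surviving_blocks))

# Example: three 4-byte data blocks plus their parity; "lose" block b and rebuild it.
a, b, c = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\x0f\x0e\x0d\x0c"
parity = bytes(x ^ y ^ z for x, y, z in zip(a, b, c))
assert reconstruct_missing_block([a, c, parity]) == b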

RAID 5 recovery issues:
In the event of a system failure while there are active writes, the parity of a stripe may become inconsistent with the data. If this is not detected and repaired before a disk or block fails, data loss may ensue, as incorrect parity will be used to reconstruct the missing block in that stripe. This potential vulnerability is sometimes known as the write hole. Battery-backed cache and similar techniques are commonly used to reduce the window of opportunity for this to occur. The same issue occurs for RAID 6.

RAID 5 performance:
RAID 5 implementations suffer from poor performance when faced with a workload that includes many writes that are not aligned to stripe boundaries or are smaller than the capacity of a single stripe. This is because parity must be updated on each write, requiring read-modify-write sequences for both the data block and the parity block. More complex implementations may include a non-volatile write-back cache to reduce the performance impact of incremental parity updates. Large writes spanning an entire stripe width can, however, be done without read-modify-write cycles for each data and parity block, but only if they are stripe aligned: the parity block is simply overwritten with the computed parity, since the new data for each data block in the stripe is known in its entirety at the time of the write. This is sometimes called a full stripe write.

Random write performance is poor, especially at high concurrency levels common in large multi-user databases. The read-modify-write cycle requirement of RAID 5's parity implementation penalizes random writes by as much as an order of magnitude compared to RAID 0. Performance problems can be so severe that some database experts have formed a group called BAARF (the Battle Against Any Raid Five). The read performance of RAID 5 is almost as good as that of RAID 0 for the same number of disks. Except for the parity blocks, the distribution of data over the drives follows the same pattern as RAID 0. The reason RAID 5 is slightly slower is that the disks must skip over the parity blocks.

RAID 5 latency:
When a disk record is randomly accessed, there is a delay as the disk rotates sufficiently for the data to come under the head for processing. This delay is called latency. On average, a single disk will need to rotate 1/2 revolution; thus, for a 7200 RPM disk the average latency is about 4.2 milliseconds. In RAID 5 arrays all the disks must be accessed, so the latency can become a significant factor.
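The quoted single-disk figure follows directly from the spindle speed; a quick check in Python (the 7200 RPM value is the one used above, the function name is illustrative):

def avg_rotational_latency_ms(rpm):
    # Average latency = time for half a revolution, in milliseconds.
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

print(avg_rotational_latency_ms(7200))  # ~4.17 ms, i.e. the ~4.2 ms quoted above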

In a RAID 5 array with n randomly oriented (unsynchronized) disks, the mean latency is n/(n+1) revolutions and the median latency is 2^(-1/n) revolutions (a short sketch reproducing the table below appears at the end of this section). To mitigate this problem, well-designed RAID systems will synchronize the angular orientation of their disks. In this case the random nature of the angular displacements goes away, the average latency returns to 1/2 revolution, and a savings of up to 50% in latency is achieved. Since solid-state drives do not have rotating disks, their latency does not follow this model.

Effect of Angular Desynchronization

Number of Disks   Mean Latency (rev)   Median Latency (rev)
1                 0.50                 0.50
2                 0.67 (+33%)          0.71 (+41%)
3                 0.75 (+50%)          0.79 (+59%)
4                 0.80 (+60%)          0.84 (+68%)
5                 0.83 (+67%)          0.87 (+74%)
6                 0.86 (+71%)          0.89 (+78%)
7                 0.88 (+75%)          0.91 (+81%)
8                 0.89 (+78%)          0.92 (+83%)

RAID 5 usable size:
Parity uses up the capacity of one drive in the array. (This can be seen by comparing it with RAID 4: RAID 5 distributes the parity data across the disks, while RAID 4 centralizes it on one disk, but the amount of parity data is the same.) If the drives vary in capacity, the smallest one sets the limit. Therefore, the usable capacity of a RAID 5 array is S_min × (n - 1), where n is the total number of drives in the array and S_min is the capacity of the smallest drive in the array. The number of hard disks that can belong to a single array is limited only by the capacity of the storage controller in hardware implementations, or by the OS in software RAID. One caveat is that, unlike RAID 1, as the number of disks in an array increases, the probability of data loss due to multiple drive failures also increases. This is because there is a reduced ratio of "losable" drives (the number of drives that can fail before data is lost) to total drives.
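The desynchronization table above can be reproduced from the latency model, in which a request must wait for the slowest of n independent, uniformly distributed rotational delays. A small Python sketch, assuming unsynchronized disks as described in the latency discussion:

def mean_latency_rev(n):
    # Mean of the maximum of n uniform(0, 1) rotational delays, in revolutions.
    return n / (n + 1)

def median_latency_rev(n):
    # Median of the maximum of n uniform(0, 1) rotational delays, in revolutions.
    return 2 ** (-1 / n)

for n in range(1, 9):
    print(n, round(mean_latency_rev(n), 2), round(median_latency_rev(n), 2))
# 1 0.5 0.5
# 2 0.67 0.71
# ...
# 8 0.89 0.92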
