“COMPARISON OF VARIOUS RAID SYSTEMS”

Storage Technologies (ITE 2009)


Slot: D2

By
Ashish Tiwari -16BIT0005

Submitted To
PROF. SIVA RAMA KRISHNAN S

SCHOOL OF INFORMATION TECHNOLOGY AND


ENGINEERING

April, 2019
ABSTRACT

The constant evolution of new varieties of computing systems has created a
growing need for highly reliable, high-performing, available, and secure storage systems.
While CPU performance has scaled with Moore's Law, data-storage performance has
improved far more slowly. One method of improving storage performance is the use of
special storage architectures, which often include redundant arrays of independent disks
(RAID). RAID provides a meaningful way to increase storage performance at a variety of
levels. The fastest-performing arrays, however, come at the expense of data reliability,
preventing their use in many applications. RAID also provides a benefit of security that is
not currently provided by similar storage architectures. In this project, the various levels
of RAID and the current state of the field are outlined. Furthermore, the benefits of
various combinations of RAID configurations are also discussed.
INTRODUCTION

With the increasing amount of data produced worldwide every day, storing that data
becomes a major issue. For many secure systems, such as online banking and other
government services, the need for highly reliable and available online storage has
increased. While most of these systems depend on the cloud for data storage, cloud
computing comes with disadvantages such as slow data access and the risk of
cloud-server downtime. In addition, maintaining data privacy is another issue.

RAID, or Redundant Array of Independent Disks, provides benefits that enable users to
circumvent these issues. RAID involves combining drives in various patterns and using
them as a unit instead of operating them individually. Three metrics govern which RAID
level is chosen: performance, reliability, and feasibility. Apart from RAID 0, the RAID
levels provide redundancy, thus improving data reliability and performance, especially
read speeds. The primary benefits of using RAID are performance improvement,
resiliency, and low cost.

 Performance improvement is achieved because the server has more spindles to
read from or write to when data is accessed from a drive
 Availability and resiliency are increased because the RAID controller can recreate
lost data from parity information
OBJECTIVES

 To discuss various RAID configurations, both basic and nested


 To discuss the advantages and disadvantages for each RAID configuration
 To identify the effective usage of the respective RAIDs
 To find the best RAID configuration based on the users’ needs
 To examine RAID rebuild methods
 To analyze RAID reliability
OVERVIEW OF RAID CONFIGURATIONS

RAID 0:
RAID 0 involves striping data across two or more devices. Striping breaks the data into
chunks, which are then written across the disks in the array. By using multiple disks, this level
offers superior input/output performance, which can be increased further simply by adding
multiple controllers.

Advantages:

 Great performance for both read and write operations


 No overhead caused by parity controls
 Easy to implement
 Low Hardware requirement

Disadvantage:

 Not fault-tolerant. Since the data is split up, failure of a single device brings down
the entire array
Ideal use:
RAID 0 is best for storing data that has to be written at high speed, such as in video editing
or image retouching
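The striping described above can be sketched as a toy model. This is illustrative only (in-memory "disks" and a 4-byte chunk size are assumptions for demonstration, not a real RAID driver):

```python
# Toy model of RAID 0 block-level striping: data is split into fixed-size
# chunks written round-robin across the member disks, so sequential
# transfers engage all disks at once.

CHUNK_SIZE = 4  # bytes per chunk; real arrays use e.g. 64 KB

def stripe(data: bytes, num_disks: int):
    """Distribute chunks of `data` round-robin across `num_disks` disks."""
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), CHUNK_SIZE):
        disks[(i // CHUNK_SIZE) % num_disks].extend(data[i:i + CHUNK_SIZE])
    return disks

def unstripe(disks, total_len: int) -> bytes:
    """Reassemble the original byte stream from the striped disks."""
    out = bytearray()
    offsets = [0] * len(disks)
    d = 0
    while len(out) < total_len:
        out.extend(disks[d][offsets[d]:offsets[d] + CHUNK_SIZE])
        offsets[d] += CHUNK_SIZE
        d = (d + 1) % len(disks)
    return bytes(out)

data = b"The quick brown fox jumps over the lazy dog"
disks = stripe(data, 2)
assert unstripe(disks, len(data)) == data  # round-trip is lossless
```

Note that losing any one "disk" here makes the data unrecoverable, which is exactly the fault-tolerance weakness described above.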

RAID 1:
In RAID 1, data is stored twice (mirroring): it is written to a set of data drives and to a
mirror drive. In case of a drive failure, the controller uses the surviving drive for
data recovery.
Advantages:

 Offers excellent read and write speed
 Fault tolerant and allows easy data recovery

Disadvantages:

 Storage capacity is only half of the total drive capacity
 Does not allow hot swapping of a failed drive
Ideal use:
RAID 1 is best for mission critical storage and is suitable for small servers.

RAID 2:
RAID 2 is implemented by splitting data at the bit level and spreading it over a number of
data disks and a number of redundancy disks. In RAID 2, data is striped not in blocks but
at the level of individual bits, with error-correcting code (ECC) stored on the redundancy
disks. This is very efficient at detecting single-bit corruption. As a result, this RAID level
offers a very high data transfer rate.

Advantages:

 Very high data transfer rates
 Corruption can be detected with ease
 Easy "on the fly" error correction
 Relatively simple controller design compared to RAID 3, 4, and 5
Disadvantages:

 Expensive, and often requires many drives
 The required controllers are complicated and expensive
 Less efficient than RAID 5 and RAID 6, with lower performance and reliability
 Very high ratio of ECC disks to data disks with smaller word sizes, which is inefficient
 Very high entry-level cost; requires a very high transfer-rate requirement to justify it
 No commercial implementations exist / not commercially feasible

Ideal use:
Since RAID 2 has so many disadvantages, its use is not recommended.

RAID 3:
RAID 3 uses striping at the byte level and stores dedicated parity bits on a separate
disk drive. RAID 3 requires a special controller that allows the synchronized spinning of
all disks. The bytes are striped across different disk drives. This
configuration is used less commonly than the other RAID levels.

Advantages:

 Very high read data transfer rate
 Very high write data transfer rate
 High throughput for moving large amounts of data
 Resistant to single-disk failure and breakdown
 Low ratio of ECC (parity) disks to data disks, which means high efficiency
Disadvantages:

 Transaction rate equal to that of a single disk drive at best (if the spindles
are synchronized)
 Controller design is complex.
 The configuration may be overkill if small file transfers are the only
requirement.
 Disk failures may significantly lower throughput.
 Very difficult and resource-intensive to implement as a "software" RAID

RAID 4:
RAID 4 uses block-level data striping and a single dedicated disk for storing parity bits. It
does not require synchronized spinning, and each disk functions independently when single
data blocks are requested. This contrasts with RAID 3, which stripes at the byte level rather
than the block level. Unlike RAID 5, RAID 4 does not distribute parity bits. This configuration requires at least three disks.
Data or files may be distributed among multiple, independently operating drives. This
configuration facilitates parallel input/output (I/O) request performance. However, when
parity bits are stored in a single drive for each block of data, system bottlenecks may result.
When this occurs, system performance depends on parity drive performance.

Advantages:

 Data block striping, which facilitates simultaneous I/O requests


 Low storage overhead, which lowers as more disks are added
 Does not require synchronized spindles or a synchronized controller

Disadvantages:

 Parity drives may lead to bottlenecks


 Slow random writes, because the parity must be written separately for
each write
Ideal use:
This RAID level is suitable in situations requiring good read performance
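The dedicated-parity scheme described above can be illustrated with a short sketch. This is a toy model (real controllers operate on full disk blocks, but the XOR relationship is the same): the parity block is the bytewise XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the parity and the survivors.

```python
# RAID 4-style parity sketch: parity = XOR of the data blocks in a stripe.

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# A stripe of three data blocks plus one dedicated parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing block 1 and rebuilding it from parity + survivors.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == b"BBBB"  # the lost block is recovered exactly
```

This also shows why random writes are slow: updating one data block forces a matching update of the parity block on the dedicated parity drive.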
RAID 5:
RAID 5 is a cluster-level (typically 64 KB blocks) implementation of data striping
with distributed parity for enhanced performance. The clusters and parity are evenly
distributed across multiple hard drives, which provides better performance than using a single
drive for parity. Although RAID 5 can be implemented in software, a hardware controller is
recommended. Extra cache memory is often used on such controllers to improve write
performance.

Advantages:

 Good fault tolerance. Can tolerate loss of one drive.


 Hot sparing and automatic rebuild features are good.

Disadvantages:

 Drive failures have an effect on throughput


 Complex technology
Ideal use:
RAID 5 is best suited for transaction processing and is often used for “general purpose”
service, as well as for relational database applications and other business systems.
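The rotating-parity idea behind RAID 5 can be sketched as follows. The "left-asymmetric" placement formula below is one common layout convention, assumed here for illustration rather than taken from the text:

```python
# Sketch of RAID 5 distributed parity placement ("left-asymmetric"
# layout assumed): the parity block for stripe s lives on disk
# (num_disks - 1 - s) % num_disks, so parity rotates across all disks
# and no single disk becomes a bottleneck as in RAID 4.

def parity_disk(stripe_index: int, num_disks: int) -> int:
    """Disk index holding the parity block for a given stripe."""
    return (num_disks - 1 - stripe_index) % num_disks

layout = [parity_disk(s, 4) for s in range(8)]
print(layout)  # parity rotates: [3, 2, 1, 0, 3, 2, 1, 0]
```

Because every disk takes its turn holding parity, parity writes (and the read-modify-write work they imply) are spread evenly across the array.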

RAID 6:
RAID 6 is an extension of RAID 5 that implements fault tolerance by using a second
independent distributed parity scheme (dual parity). As in RAID 5, data is striped at the
block level across a set of drives, and a second set of parity blocks is calculated and written
across all the drives. This RAID level provides data fault tolerance and can sustain two
simultaneous drive failures. It also gives protection against multiple bad-block failures.
Advantages:

 Perfect solution for mission-critical applications.
 It is robust, as it makes use of additional disk capacity for parity.
 Performance is good.
 RAID 6 reads from all the drives, so reading is fast.
 No data loss even after two disks fail; the array can be rebuilt from parity after
the failed disks are replaced.
Disadvantages:

 Write performance is poor, as every write must update parity striped over multiple disks.
 It is expensive, as it requires two independent drives' worth of capacity for parity functions.
Ideal use:
RAID 6 is preferable over RAID 5 in file and application servers that use many large drives
for data storage.

RAID 7:
In RAID 7, all input/output transfers are asynchronous, independently controlled, and
cached, including host-interface transfers. All reads and writes are centrally cached via a
high-speed X-bus, and the dedicated parity drive can be on any channel. A fully
implemented, process-oriented real-time operating system is resident on an embedded
array-control microprocessor. In RAID 7, parity generation is integrated into the cache.

Advantages:

 Overall write performance is 25% to 90% better than single spindle performance
 Host interfaces are scalable for connectivity or increased host transfer bandwidth
 Small reads in multi user environment have very high cache hit rate resulting in
near zero access times
 Write performance improves with an increase in the number of drives in the array
 No extra data transfers required for parity manipulation

Disadvantages:

 One vendor proprietary solution


 Extremely high cost per MB
 Very short warranty
 Not user serviceable
Ideal use:
Suitable for open-source ZFS implementations

HIGHER-ORDER RAID CONFIGURATIONS

Higher-order RAID configurations are also known as nested RAID configurations. They
combine multiple RAID levels within a single array. Theoretically, any RAID levels can be
combined with any other level. However, not all of these levels are meaningful to improve
system functionality. The table below is a comparison of the various nested RAID
configurations.

Level      Description                               Min.     Space               Fault tolerance
                                                     drives   efficiency          (min; max)

RAID 01    Block-level striping, and mirroring       4        1 / stripes         stripes − 1; n − n / stripes
           without parity

RAID 10    Mirroring without parity, and             4        stripes / n         m − 1; (m − 1) × stripes
           block-level striping

RAID 03    Block-level striping, and byte-level      6        1 − 1 / stripes     1; n / stripes
           striping with dedicated parity

RAID 1+6   Mirroring without parity, and             8        (stripes − 2) / n   3 × m − 1; n − (stripes − 2)
           block-level striping with double
           distributed parity

RAID 50    Block-level striping with distributed     6        1 − stripes / n     1; stripes (one per stripe)
           parity, and block-level striping

RAID 60    Block-level striping with double          8        1 − 2 × stripes / n 2; 2 × stripes (two per stripe)
           distributed parity, and block-level
           striping

RAID 100   Mirroring without parity, and two         8        1 / m               m − 1; (m − 1) × stripes
           levels of block-level striping

(Here n is the total number of drives, m the number of drives per mirror set, and "stripes" the number of sets combined at the outer level.)
LITERATURE REVIEW

RAID 0, or striping, utilizes two physical hard drives. Alternate blocks are written to
each hard drive, which, in theory, could double the performance of a single hard drive.
However, this type of RAID increases the risk of data loss, since practically all the data
is lost if one hard drive fails. This is also the reason it is not widely used in homes, and
especially not in production use in companies. RAID 1, or mirroring, utilizes two
physical hard drives. Everything is written to both hard drives, which significantly
decreases the risk of data loss. However, the capacity is only half of the actual capacity,
which makes this type of RAID quite expensive. RAID 1 is commonly used in home
environments. RAID 5 utilizes three or more physical hard drives. Practically, it lowers
the risk of data loss, since a three-disk RAID 5 can sustain a single hard-disk failure
without losing any data. For example, in a RAID 5 with six disks, the capacity is the sum
of five disks, and the equivalent of one disk contains the data necessary to rebuild the
array after a single hard-disk failure. This setup is quite common in server environments.
(Thompson & Thompson, 2011, p. 142.) RAID 10 is a stacked RAID, variously written
RAID 0+1, RAID 1+0 or RAID 10. It uses four hard drives arranged as two separate
RAID 1 sets, which are merged into one RAID 0 set, so it basically utilizes two different
RAID layers. This setup is fairly common in server environments.
OBSERVATION:

1. Raid level tested:


a. RAID 0
b. RAID 10
c. RAID 5
d. RAID 6
2. Test Setup
a. CPU: Intel Core i3-6006U @ 2.00 GHz
b. RAM: 4 GB
c. Operating System: Windows 10 Pro
d. Drives: 2 × 500 GB, 3 removable devices
e. File System: NTFS
f. Queue depth: 4
g. Test script that generates the RAID arrays and file systems and runs the test
h. Cache: all write caching was disabled during the test
Test Results:
(Graphs of I/O latency and of maximum read and maximum write performance are omitted here.)
Analysis

With this kind of testing, there are so many variables that it will be difficult to
make any solid observations. But these results are interesting.

Results are in line with reality


First of all, the results do not seem unexpected. Six drives at 7200 RPM should
each provide about 75 IOPS. This should result in a total of 450 IOPS for the
entire array. The read performance does show exactly this kind of performance.

With all caching disabled, write performance is worse. And especially the
RAID levels with parity (RAID 5 and RAID 6) show a significant drop in
performance when it comes to random writes. RAID 6 write performance got so
low and erratic that I wonder if there is something wrong with the driver or the
setup. Especially the I/O latency is off-the-charts with RAID 6, so there must be
something wrong.

Read performance is equal for all RAID levels


However, the most interesting graphs are those for IOPS and latency. Read
performance of all the different RAID arrays is almost equal, but RAID 10 seems to
have the upper hand in all read benchmarks: both bandwidth and latency are better
than for the other RAID levels. A technical explanation: RAID 10 is basically
multiple RAID 1 sets stuck together, with data striped across the RAID 1 sets. When
reading, a single stripe can be delivered by either disk in the RAID 1 mirror it resides
on, so there is a higher chance that one of the heads is already in the vicinity of the
requested sector.
RAID 0 is not something that should be used in a production environment, but it
is included to provide a comparison for the other RAID levels. The IOPS graph
regarding write performance is most telling. With RAID 10 using 6 drives, you
only get the effective IOPS of 3 drives, thus about 225 IOPS. This is exactly
what the graph is showing us.
RAID with parity suffers in write performance

RAID 5 needs four write I/Os for every application-level write request. So with
6 × 75 = 450 IOPS divided by 4, we get 112.5 IOPS. This is also on par with the
graph. This is still acceptable, but notice the latency: it is clearly around 40
milliseconds, whereas 20 milliseconds is the rule-of-thumb point beyond which
performance starts to degrade significantly.

RAID 6 needs six write I/Os for every application-level write request. So with
450 IOPS total, divided by 6, we only have single-disk performance of 75 IOPS.
If we average the line, we do approximately get this performance, but the
latency is so erratic that it would not be usable.
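The IOPS arithmetic above can be collected into a small calculator. The write-penalty factors (1, 2, 4, and 6) are the rules of thumb used in the analysis:

```python
# Effective random-write IOPS for a RAID array:
#   total raw IOPS / write penalty of the RAID level.
# Penalties assumed: RAID 0 = 1, RAID 10 = 2, RAID 5 = 4, RAID 6 = 6.

DRIVE_IOPS = 75      # one 7200 RPM drive (rule of thumb)
NUM_DRIVES = 6

WRITE_PENALTY = {"RAID 0": 1, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}

def random_write_iops(level: str, drives: int = NUM_DRIVES,
                      per_drive: float = DRIVE_IOPS) -> float:
    """Effective random-write IOPS = drives * per_drive / write penalty."""
    return drives * per_drive / WRITE_PENALTY[level]

for level in WRITE_PENALTY:
    print(f"{level}: {random_write_iops(level):.1f} IOPS")
# RAID 0: 450.0 IOPS
# RAID 10: 225.0 IOPS
# RAID 5: 112.5 IOPS
# RAID 6: 75.0 IOPS
```

These match the figures in the text: 225 IOPS for RAID 10, 112.5 IOPS for RAID 5, and single-disk performance (75 IOPS) for RAID 6.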

RAID chunk size and performance


So I was wondering if the RAID array chunk size does impact random I/O
performance. It seems not.

Conclusion
 Overall, the results seem to indicate that the actual testing itself is
realistic. We do get figures that are in tune with theoretical results.
 The erratic RAID 6 write performance would need a thorough
explanation, one that I can't give.
 Based on the test results, it seems that random I/O performance for a
single test file is not affected by the chunk size or stripe size of a RAID
array.
 The results show to me that my benchmarking method provides a nice
basis for further testing.
 Removable drives cannot be used for the array, because Windows software
RAID requires dynamic disks, and a removable disk cannot be converted to a
dynamic disk: once the drive is removed, all of its volumes fail.
Cost and Performance comparison:

RAID   Small   Small            Large     Large     Storage
Type   Read    Write            Read      Write     Efficiency
0      1       1                1         1         1
1      1       1/2              1         1/2       1/2
3      1/G     1/G              (G-1)/G   (G-1)/G   (G-1)/G
5      1       Max(1/G, 1/4)    1         (G-1)/G   (G-1)/G
6      1       Max(1/G, 1/6)    1         (G-2)/G   (G-2)/G

(G denotes the parity group size.)
Conclusion:
 The performance/cost of a RAID level 1 system is equivalent to the
performance/cost of a RAID level 5 system when the parity group size is
equal to 2.
 The performance/cost of a RAID level 3 system is always less than or equal
to the performance/cost of a RAID level 5 system.
 The question of which RAID level to use is better expressed as a more
general configuration question concerning the size of the parity group and
the striping unit. For a parity group size of 2, mirroring is desirable, while for
a very small striping unit RAID level 3 is well suited.
RAID RELIABILITY ANALYSIS

1.RAID 0
In RAID 0, since data is distributed across multiple disk drives in equally
sized chunks (striping), the drives are considered to be in series: the array
fails if any one drive fails. The reliability of a six-disk RAID 0 array is
therefore the product of the individual HDD reliabilities.
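Written out, with r_i denoting the reliability of drive i (assuming independent failures; for six identical drives each of reliability r):

```latex
R_{\text{RAID 0}} \;=\; \prod_{i=1}^{6} r_i \;=\; r^{6}
```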

2.RAID 1
RAID 1 uses mirroring and requires at least two HDDs to implement. In this
configuration, one HDD can fail in a paired set without loss of data. The
reliability can thus be given as:
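For a two-drive mirror with independent drives each of reliability r, the array survives unless both drives fail, so a standard expression is:

```latex
R_{\text{RAID 1}} \;=\; 1 - (1 - r)^{2}
```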

3.RAID 0+1
In this, data is striped to one disk set and then mirrored to another disk set.
RAID-0+1 requires a minimum of four HDDs to implement. The reliability
would be:
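Assuming n drives split into two mirrored stripe sets of n/2 drives each (each drive with independent reliability r), a standard expression is:

```latex
R_{\text{RAID 0+1}} \;=\; 1 - \left(1 - r^{\,n/2}\right)^{2}
```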

4.RAID 1+0
In this configuration, data is striped across mirrored sets of drives. RAID 1+0
also requires a minimum of four drives to implement. The reliability would be:
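Assuming n/2 independent mirrored pairs striped together (each drive with reliability r), a standard expression is:

```latex
R_{\text{RAID 1+0}} \;=\; \left(1 - (1 - r)^{2}\right)^{n/2}
```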
5.RAID 3
The RAID 3 controller calculates parity information and stores it to a dedicated
parity HDD. This requires a minimum of 3 drives to implement. The reliability
would be:
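Assuming n independent drives each of reliability r, the array survives zero or exactly one drive failure, giving the standard expression:

```latex
R_{\text{RAID 3}} \;=\; r^{n} + n\,r^{\,n-1}(1 - r)
```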

6.RAID 4
RAID-4 is identical to RAID-3, except that it accommodates larger chunks.
Thus, the reliability is also the same:

7.RAID 5
RAID-5 is similar to RAID-4 except that the parity data is striped across all
HDDs instead of written on a dedicated HDD. Thus, the reliability is also the
same:
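The reliability relationships discussed in this section can be checked numerically with a short sketch (r = 0.9 is a hypothetical per-drive reliability chosen purely for illustration, and independent drive failures are assumed):

```python
# Numeric check of the standard reliability expressions for a
# four-drive array with per-drive reliability r = 0.9.

r, n = 0.9, 4

raid0  = r ** n                                # all drives must survive
raid1  = 1 - (1 - r) ** 2                      # two-drive mirror
raid01 = 1 - (1 - r ** (n // 2)) ** 2          # mirror of two striped sets
raid10 = (1 - (1 - r) ** 2) ** (n // 2)        # stripe of mirrored pairs
raid5  = r ** n + n * r ** (n - 1) * (1 - r)   # survives any single failure

# RAID 1+0 tolerates failure patterns that RAID 0+1 does not, and both
# beat plain striping.
assert raid10 > raid01 > raid0
```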
References

[1] Delmar, Michael Graves (2003). "Data Recovery and Fault Tolerance". The Complete Guide to Networking and Network+. Cengage Learning. p. 448. ISBN 1-4018-3339-X.
[2] Mishra, S. K.; Vemulapalli, S. K.; Mohapatra, P. (1995). "Dual-Crosshatch Disk Array: A Highly Reliable Hybrid-RAID Architecture". Proceedings of the 1995 International Conference on Parallel Processing: Volume 1. CRC Press. pp. I-146ff. ISBN 0-8493-2615-X.
[3] Layton, Jeffrey B. (2011-01-06). "Intro to Nested-RAID: RAID-01 and RAID-10". LinuxMag.com. Linux Magazine. Retrieved 2015-02-01.
[4] "20.2. The Z File System (ZFS)". freebsd.org. Archived from the original on 2014-07-03. Retrieved 2014-07-27.
[5] "Double Parity RAID-Z (raidz2) (Solaris ZFS Administration Guide)". Oracle Corporation. Retrieved 2014-07-27.
[6] "Triple Parity RAIDZ (raidz3) (Solaris ZFS Administration Guide)". Oracle Corporation. Retrieved 2014-07-27.
