
USING NAND FLASH MEMORY TO IMPROVE THE PERFORMANCE OF HDDS

Huang-Te Hsu and Ying-Wen Bai

Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University


Taipei, Taiwan, 24205, R.O.C., 497598030@mail.fju.edu.tw
Department of Electronic Engineering, Fu Jen Catholic University
Taipei, Taiwan, 24205, R.O.C., bai@ee.fju.edu.tw

ABSTRACT

Hard Disk Drive (HDD) manufacturers have recently begun using Dynamic Random Access Memory (DRAM) as cache memory to improve HDD performance. In this paper we add NAND Flash memory as a second-level cache for the same purpose. NAND Flash memory has some advantages over the HDD and the DRAM cache, such as faster access time, smaller size and larger capacity.

First, we design a Solid State Disk (SSD) based on NAND Flash memory. The SSD is connected to a PC system by means of the Serial Advanced Technology Attachment (SATA) bus. Then, by using software, we combine the HDD and the SSD into one storage device of the PC. The Operating System (OS) saves files to the different storage devices according to the sizes and uses of the files. When the user opens frequently used programs, the system reads the data from the SSD in order to avoid the delay of the HDD's mechanical access head. We use this mixed architecture to improve the performance of the whole PC system.

Index Terms: Hard Disk Drive, NAND Flash Memory, Cache Memory, Solid State Disk, Hybrid HDD

1. INTRODUCTION

Figure 1 shows the development of CPU and HDD performance over a period of 13 years. Within these 13 years the performance of single-core CPUs has improved more than 60 times and that of multi-core CPUs more than 175 times, whereas the PC's storage device, the mechanical HDD, has improved by only 1.3 times [1]. Thus the access speed of the HDD has become not only a bottleneck for the whole system but also a limit on any performance improvement of the PC system [2].

The architecture of a modern PC stores the OS and the application programs on the HDD. While the system boots up, the CPU must first load programs and data from the HDD into the system memory. Because of this relationship, even though the CPU operating speed is extremely fast, the efficiency of the whole system is limited by the mechanical access head movements of the HDD. The access speed of the HDD therefore becomes the bottleneck of the whole system, which wastes a lot of time waiting while the access head of the HDD searches for the programs and the data. From this perspective, improving the overall system performance requires improving the HDD performance.

Figure 1. Normalized performance of CPU and HDD

One way to do this is to build a RAID system, which uses parallel I/O access to increase the data throughput and improves the average read and write performance by 50% to 70% [3]. However, the read and write speed is still limited by the mechanical access head movements of the HDD, whose random access delay decreases performance. The system performance can also be improved somewhat by spinning the disk faster, increasing the number of access heads, increasing the access head's moving speed and adding more disks [4]. Although these modifications do improve the performance of the HDD, the access head delay is still present because the HDD is based on a mechanical structure, and this delay continues to limit the performance of the whole system.
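As a rough worked check on the Figure 1 growth factors quoted in this section, the sketch below converts each 13-year improvement factor into an equivalent yearly rate. It assumes steady compound growth, which the source does not claim; the function name `annual_growth_rate` is illustrative only. Under that assumption the factors correspond to roughly 37% per year for single-core CPUs, 49% per year for multi-core CPUs and 2% per year for the mechanical HDD.

```python
def annual_growth_rate(total_factor, years):
    """Return the constant yearly rate r such that (1 + r) ** years == total_factor.

    Assumes steady compound growth, which is a simplification for illustration.
    """
    return total_factor ** (1.0 / years) - 1.0


YEARS = 13  # period covered by Figure 1

# Improvement factors quoted in the text for the 13-year period.
FACTORS = {
    "single-core CPU": 60.0,
    "multi-core CPU": 175.0,
    "mechanical HDD": 1.3,
}

if __name__ == "__main__":
    for name, factor in FACTORS.items():
        rate = annual_growth_rate(factor, YEARS)
        print(f"{name}: x{factor:g} over {YEARS} years is about {rate * 100:.1f}% per year")
```

The gap between the yearly rates is what makes the HDD the bottleneck: even a modest difference per year compounds into the two orders of magnitude seen over 13 years.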

978-1-4244-5377-1/10/$26.00 ©2010 IEEE


Recently some storage device vendors announced that future PCs will no longer contain HDDs. The latest storage technology, “Racetrack”, provides faster and higher-capacity storage media to replace the mechanical HDD. Unlike the HDD, the Racetrack technology uses neither a spinning disk as the storage medium nor mechanical access heads with their access delay. Instead it uses tiny “nano-pipes” of magnetic material as the storage media. The data are stored magnetically in the nano-pipe, where they can be moved by the driver. The fixed read/write access head reads and writes the data from the changing magnetic signals while the data are moving. The nano-pipe is made of U-shaped wires that look like a racetrack, hence the name “Racetrack” technology. But this new technology is still under development and may be implemented in the PC world during the next 10 years [5].

Some studies have used cache memory to improve the performance of the HDD [6], both by means of various mathematical calculations and by prefetching the data into the cache memory [7]. This technique reduces the access delay of the HDD and improves the performance by using better mathematical calculations to raise the cache memory hit rate [8]. In addition, when the OS is loading data, a larger cache memory can prefetch more data for the OS to use. When writing, the OS stores data in the DRAM as a buffer until the HDD is free, and then writes the data back to the HDD. The size of the DRAM is therefore a key factor in improving the performance of the HDD [9]. Increasing the size of the DRAM can obviously improve this performance, but the use of the DRAM as a cache buffer is limited by the capacity that the IC manufacturing process allows.

Figure 2. Block diagram of storage device cache system

A simple way to improve the performance of the HDD is to enlarge its DRAM cache memory. But there are still some difficulties in implementing a large DRAM. One is the cost, and another is the limited capacity allowed by the IC manufacturing process. The prices of 2GB storage media are approximately US$4 for NAND Flash memory and US$40 for DRAM (2009). The maximum storage capacities of a single chip are 64Gb for NAND Flash memory and 2Gb for DRAM [10]. Currently HDDs only have a DRAM cache ranging from 8 to 32 MB [11]. Because of its small size, the DRAM cache cannot increase the performance when the OS requires a huge amount of data to be either loaded or transferred. Two other techniques which have been released to accelerate HDD performance are Hybrid HDD [12] and Turbo Memory [13], both of which use NAND Flash memory. In comparison with the DRAM, the NAND Flash memory has all the advantages of the DRAM, but also a larger size [14], currently about 8 to 16 GB, which is almost 128 times larger than that of the DRAM. Therefore the NAND Flash memory can be used as the level 2 cache of the HDD to improve the overall performance of a PC, as shown in Figure 2.

In this paper, we add NAND Flash memory as the level 2 cache to the HDD structure to decrease the mechanical access delay. The NAND Flash memory is used as the storage medium of an SSD, and the SSD and the HDD are combined into a hybrid storage device to improve the system performance.

In Section 2 we use a Serial Advanced Technology Attachment Solid State Disk (SATA SSD) as the cache of the HDD. In Section 3 we measure the performance of the SATA SSD as the cache of the HDD and describe the test results. Section 4 draws the conclusion and provides a look at future plans.

2. SYSTEM ARCHITECTURE

Following the idea described in the previous section, we design a SATA SSD as in Figure 3 [15]. We use a SATA to NAND Flash memory control chip to present the NAND Flash memory [16] through a SATA interface [17] and connect it to the system.

Figure 3. Block diagram of the SATA SSD

There are two types of NAND Flash memory structure. One is Single Level Cell (SLC) and the other is Multi Level Cell (MLC). In the SLC structure each NAND Flash memory cell can store only one bit, but in the MLC structure each cell can store two or more bits. The MLC can store more information than the SLC at a lower cost, but the MLC's access speed is lower than the SLC's due to the MLC's need for a controller to deal with the several levels,
the ECC function and decoding the information in a single cell.

In Table I, the first and the second devices are SLC devices; the third device is an MLC device. The main differences between these two types of device are size and access speed. The MLC is larger than the SLC, but the SLC is faster than the MLC. In our case we use the SLC as the testing storage device because of its greater access speed.

Table I. Comparison of NAND Flash memories

Item                     JS29F04F08B   K9K4G08Q0M   K9G8G08U0M
Random Read (µs)         25            25           60
Sequential Read (ns)     30            50           30
Page Program (µs)        300           300          800
Block Erase (ms)         2             2            1.5
Endurance (cycles)       100k          100k         10k
Data Retention (years)   10            10           10

We also compare the existing NAND Flash memory techniques. The Hybrid HDD integrates the NAND Flash memory into the HDD control board. With a NAND Flash capacity of 128MB to 256MB, the performance improvement is limited because the capacity of the NAND Flash memory is not enough for the huge amount of data. The Turbo Memory technique connects the NAND Flash memory to the PCI Express (PCIE) interface [18]. With a NAND Flash capacity of 512MB to 2GB, the capacity is just enough for the huge amount of data. In this case the performance improvement is limited by the throughput, because the throughput of the NAND Flash memory is not enough for the huge amount of data. The other drawback of the Turbo Memory technique is that it is designed to be used only in a specific system.

Figure 4. Connection of SATA SSD and HDD

To synthesize these techniques, we use a SATA SSD, connected to the SATA interface, as the cache of the SATA HDD. This kind of architecture can be applied in any system that connects the HDD to the SATA interface. We enlarge the cache memory capacity and improve the throughput of the NAND Flash by using a 64-bit NAND Flash memory interface. The capacity is enlarged by using 8 NAND Flash memories, and the throughput is improved by the 8-channel parallel I/O interface. Figure 4 shows the architecture.

There are two individual storage devices in our design, both managed by the OS: the SATA SSD and the SATA HDD. When the OS accesses data on these two devices separately, they cannot accelerate each other. We need to use the acceleration software to link them as one hybrid storage device. When the OS accesses the hybrid storage by means of the driver, the access speed is accelerated.

Figure 5. Combination of SATA SSD and HDD

Figure 5 illustrates how the OS accesses the storage device through the acceleration software. The SATA SSD acts as the cache of the SATA HDD. The acceleration software manages the two storage devices more efficiently.

3. EXPERIMENT

Table II compares the performance of the storage devices. The first device is a single HDD, the second is a RAID0 device which consists of three HDDs, and the third is the SATA SSD. We use the Sandra program [19] to test their performance. The RAID0 device has 2.3 times the performance of a single HDD and reduces the average access delay by 5 ms, from 11 ms to 6 ms. We build the SATA SSD with a multi-channel NAND Flash interface and connect it to the system as a storage device by means of the SATA interface. The actual
read/write data rates of the SATA SSD are 87MB/sec and 29MB/sec, respectively.

Table II. Comparison of the average access delay of storage devices

Item                        Device 1   Device 2   Device 3
Buffered Read (MB/sec)      133        302        87
Sequential Read (MB/sec)    64         171        117
Random Read (MB/sec)        38         82         79
Buffered Write (MB/sec)     129        189        29
Sequential Write (MB/sec)   62         157        56
Random Write (MB/sec)       42         91         17
Average Access Delay (ms)   11         6          <0.1

The test program reports times in ms, and if the average access delay is less than 0.1 ms, the test value is shown as <0.1 ms. We use the SATA SSD as the cache of the SATA HDD and measure the access time of the application programs in the OS. The test programs are:

(1) Adobe Photoshop
(2) Microsoft Office
(3) Windows Photo Gallery
(4) Windows Media Player
(5) Acrobat Reader
(6) Notepad

Figure 6. Access time of application programs

In the experiment we install all the test programs and preload the timer program to measure the access time. When we start the test, we launch each test program separately and set up the timer. After one test program is finished, we launch another. We do not launch all test programs at the same time, in order to prevent a wrong count of the testing time. For instance, after launching Photoshop we open a picture file, and after launching Word we open a Word file, to mimic how an end user would actually use these programs. Figure 6 shows the measurement of the access time of the various application programs [19].

Before the test we clear the prefetched programs in the memory cache. Then we follow the test procedure to launch and to end each test program while the acceleration software monitors the frequently used application programs. After that we add the SATA SSD and measure the total time again.

Table III shows the test programs and the size of their data. The total data is around 1689MB.

Table III. File size of test programs

Item                    Program   Data     Total
Adobe Photoshop         214MB     988MB    1202MB
Microsoft Office        337MB     10MB     347MB
Windows Photo Gallery   7.8MB     4.6MB    12.4MB
Windows Media Player    4.3MB     2.7MB    7MB
Acrobat Reader          121MB     179KB    121.2MB
Notepad                 148KB     4KB      152KB

We record the access time as shown in Figure 7. To reduce measurement errors we repeat this test 20 times and measure the access time of each test.

Figure 7. Access time of application programs (vertical axis: application access time in seconds, 0 to 70; horizontal axis: test run, 1 to 20; curves: SATA HDD, and SATA HDD with SATA SSD)

Figure 7 shows two curves; one is for the SATA HDD, the other for the SATA HDD with an 8GB SATA SSD. From the test and the measurement results we find that using the SATA SSD as the cache of the SATA HDD improves the system's performance.

$$\text{Throughput (MB/sec)} = \frac{\text{Data (MB)}}{\text{Time (sec.)}} \qquad (1)$$
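Eq. (1) can be checked against the values reported in Table IV. The sketch below is only an illustration: the function names are hypothetical, and Notepad's 152KB is entered as 0.152 MB, an assumption that reproduces the 1.52 MB/s the table lists. It recomputes each throughput and the relative gain of the cached configuration over the original one. For the five larger programs the computed gains fall between about 23% and 75%, while Notepad is unchanged because both of its measured times are 0.1 seconds.

```python
# Rows taken from Table IV: (item, data size in MB, original time in s, cached time in s).
# Notepad's 152KB is written as 0.152 MB (assumption: 1 MB = 1000 KB, matching the table).
TABLE_IV = [
    ("Adobe Photoshop", 1202.0, 49.7, 40.3),
    ("Microsoft Office", 347.0, 7.7, 4.4),
    ("Windows Photo Gallery", 12.4, 0.6, 0.4),
    ("Windows Media Player", 7.0, 0.3, 0.2),
    ("Acrobat Reader", 121.2, 3.8, 2.5),
    ("Notepad", 0.152, 0.1, 0.1),
]


def throughput(data_mb, time_s):
    """Eq. (1): throughput in MB/s is the data size in MB divided by the time in seconds."""
    return data_mb / time_s


def improvement_percent(original, cached):
    """Relative throughput gain of the cached setup over the original, in percent."""
    return (cached - original) / original * 100.0


def summarize(rows):
    """Return (item, original MB/s, cached MB/s, gain in %) for every row, rounded."""
    result = []
    for name, data_mb, t_original, t_cached in rows:
        th_original = throughput(data_mb, t_original)
        th_cached = throughput(data_mb, t_cached)
        gain = improvement_percent(th_original, th_cached)
        result.append((name, round(th_original, 2), round(th_cached, 2), round(gain, 1)))
    return result


if __name__ == "__main__":
    for name, th_original, th_cached, gain in summarize(TABLE_IV):
        print(f"{name:22s} {th_original:7.2f} -> {th_cached:7.2f} MB/s  ({gain:+.1f}%)")
```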
Table IV shows the access time and the throughput both with and without the SATA SSD as the cache of the SATA HDD. We obtain the data throughput according to Eq. (1).

Table IV. Access time of application programs

Item                    Data Size   Original      Original    Cached        Cached
                        (MB)        Time (sec.)   Th (MB/s)   Time (sec.)   Th (MB/s)
Adobe Photoshop         1202        49.7          24.19       40.3          29.83
Microsoft Office        347         7.7           45.06       4.4           78.86
Windows Photo Gallery   12.4        0.6           20.67       0.4           31.00
Windows Media Player    7           0.3           23.33       0.2           35.00
Acrobat Reader          121.2       3.8           31.89       2.5           48.48
Notepad                 152KB       0.1           1.52        0.1           1.52

Table IV shows that we can improve the data throughput by adding the SATA SSD as the cache of the SATA HDD. Depending on the size of the data, the performance can be improved by 23% to 75%.

$$T_{access} = \left[ \sum_{i=1}^{n} \left( \frac{Data_{HDD_i}}{Th_{HDD_i}} + \frac{Data_{Flash_i}}{Th_{Flash_i}} \right) \right] / n \qquad (2)$$

In Eq. (2), $T_{access}$ represents the average access time, $Data_{HDD}$ is the size of the files read from the SATA HDD, $Th_{HDD}$ is the data throughput of the HDD, $Data_{Flash}$ is the size of the files read from the SATA SSD, and $Th_{Flash}$ is the throughput of the SATA SSD. The acceleration software monitors how frequently the application programs are used and prefetches them from the SATA HDD into the cache. This causes the difference between the data access times from the cache and from the SATA HDD. Eq. (2) gives the average over the 20 recorded runs, and Figure 8 compares the average access time of our design with that of a traditional design.

Figure 8. Average access time of application programs

The average access time of the traditional single SATA HDD is 62.1 seconds; this is reduced to 47.9 seconds by adding the SATA SSD. Hence the average performance is improved by 22.8%.

Both the capacity and the throughput of the NAND Flash memory have increased because of improvements in the semiconductor process. Based on the statistics, we draw an estimated NAND Flash memory capacity and throughput curve for the coming five years. The capacity can double every 18 months, and the throughput can double every two years. The NAND Flash memory capacity can reach 1Tb, and the throughput 256MB/sec, in 2014, as shown in Figure 9.

Figure 9. NAND Flash capacity, throughput and access time improvement (axes: capacity in bits and throughput in Byte/sec. by year, 1998 to 2014; projected SATA HDD and SATA SSD access times: 47.9 sec., 24 sec., 12 sec. and 6 sec.)

The application files total 1689MB in this experiment, and the cached access time is 47.9 seconds. This time can be halved every two years because the throughput of the NAND Flash memory can double every two years. The cache access time can thus be reduced to 6 seconds in 2014. Furthermore, if we can increase the number of parallel channels of the current design by 2, 4, and 8 times, we can reduce the cache access time to 3, 1.5, and 0.75 seconds respectively.

4. CONCLUSION

The acceleration software monitors how frequently the application programs are used and prefetches the frequently used application programs into the SATA SSD instead of letting them stay on the SATA HDD. When the OS reloads these frequently used application programs, it accesses them from the SATA SSD instead of from the SATA HDD in order to reduce the system delay caused by the slow mechanical read/write performance.

The capacity of the NAND Flash memory is much larger than that of the DRAM. This is the reason why we choose the NAND Flash memory as the second level cache of the
SATA HDD to accelerate its performance rather than increasing the size of the DRAM. However, in comparison with the access speed of the DRAM, the NAND Flash memory is still too slow to serve as the first level cache and can at present only serve as the second level cache between the DRAM and the SATA HDD in order to increase the performance of the SATA HDD. By accelerating the system access with the SATA SSD, the average access time of the application programs is reduced from 62.1 to 47.9 seconds, which increases the performance by about 22.8%.

From the results of the experiment we conclude that, to further improve the performance of the SATA HDD, we can increase the capacity of the SATA SSD to 16GB or more in order to enlarge the cache capacity. In addition, we can use a parallel transfer architecture to increase the bandwidth of the SATA SSD. Extending the NAND Flash interface from 16 to 32 bits or further will also help improve the performance of the SATA HDDs.

5. REFERENCES

[1] S. Ameer, “How Solid-State Drives Improve Computing Platforms,” http://www.intel.com/idf/idf-highlights/, October 2008.
[2] P. M. Chen and D. A. Patterson, “Storage Performance-Metrics and Benchmarks,” Proceedings of the IEEE, Vol. 81, No. 8, pp.1151-1165, August 1993.
[3] A. K. Sahai, “Performance aspects of RAID architectures,” IEEE Performance, Computing and Communications International Conference, pp.321-327, Feb. 1997.
[4] N. K. Lee, T. D. Han, S. D. Kim and S. B. Yang, “High performance RAID system by using dual head disk structure,” High Performance Computing on the Information Superhighway, pp.325-330, 1997.
[5] International Business Machines Corp., “Magnetic Racetrack Memory Project,” http://www.almaden.ibm.com/spinaps/research/sd/?racetrack, 2008.
[6] R. Karedla, J. S. Love and B. G. Wherry, “Caching Strategies to Improve Disk System Performance,” IEEE Computer, Vol. 27, Issue 3, pp.38-46, March 1994.
[7] A. Hospodor, “Hit Ratio of Caching Disk Buffers,” 37th IEEE Computer Society International Conference, pp.427-432, Feb. 1992.
[8] K. S. Grimsrud, J. K. Archibald and B. E. Nelson, “Multiple prefetch adaptive disk caching,” IEEE Transactions on Knowledge and Data Engineering, Vol. 5, Issue 1, pp.88-103, Feb. 1993.
[9] P. G. Ferez, J. Piernas and T. Cortes, “The RAM Enhanced Disk Cache Project,” 24th IEEE Mass Storage Systems and Technologies Conference, pp.251-256, Sept. 2007.
[10] TrendForce Corporation, “DRAMeXchange market intelligence,” http://www.dramexchange.com/, 2009.
[11] Hynix Semiconductor, “256M Hynix SDRAM Memory Datasheet,” http://www.hynix.com/datasheet/pdf/consumer/HY57V561620F(L)T(P)Series_(Rev1.1).pdf, 2008.
[12] PC World, “Hard Drive Rivals Promote New Hybrid Technology,” http://www.pcworld.com/article/id,128395-c,harddrives/article.html, 2007.
[13] Intel Corporation, “Intel Turbo Memory with User Pinning,” http://www.intel.com/design/flash/nand/turbomemory/316979.pdf, 2008.
[14] Samsung Electronics Co., Ltd., “K9MDG08U5M Flash Memory Datasheet,” http://www.samsung.com/, 2008.
[15] C. Park, P. Talawar, D. Won, M. Jung, J. Im, S. Kim and Y. Choi, “A High Performance Controller for NAND Flash-based Solid State Disk,” IEEE Non-Volatile Semiconductor Memory Workshop, pp.17-20, 2006.
[16] Micron Technology Inc., “4Gb, 8Gb, and 16Gb x8 NAND Flash Memory Features,” http://download.micron.com/pdf/datasheets/flash/nand/4gb_nand_m40a.pdf, 2006.
[17] The Serial ATA International Organization, “Serial ATA Specification, Revision 1.0a,” http://www.serialata.org, 2003.
[18] PCI-SIG, “PCIe Base Spec. 1.1,” http://www.pcisig.com/members/downloads/specifications/pciexpress/PCI_Express_Base_11.pdf, 2005.
[19] SiSoftware, “Sandra 2007 Lite,” http://www.sisoftware.net/, 2007.
