JEDEC

STANDARD

Solid-State Drive (SSD) Endurance Workloads

JESD219A
(Revision of JESD219, September 2010)

JULY 2012

JEDEC SOLID STATE TECHNOLOGY ASSOCIATION


NOTICE

JEDEC standards and publications contain material that has been prepared, reviewed, and
approved through the JEDEC Board of Directors level and subsequently reviewed and approved
by the JEDEC legal counsel.

JEDEC standards and publications are designed to serve the public interest through eliminating
misunderstandings between manufacturers and purchasers, facilitating interchangeability and
improvement of products, and assisting the purchaser in selecting and obtaining with minimum
delay the proper product for use by those other than JEDEC members, whether the standard is to
be used either domestically or internationally.

JEDEC standards and publications are adopted without regard to whether or not their adoption
may involve patents or articles, materials, or processes. By such action JEDEC does not assume
any liability to any patent owner, nor does it assume any obligation whatever to parties adopting
the JEDEC standards or publications.

The information included in JEDEC standards and publications represents a sound approach to
product specification and application, principally from the solid state device manufacturer
viewpoint. Within the JEDEC organization there are procedures whereby a JEDEC standard or
publication may be further processed and ultimately become an ANSI standard.

No claims to be in conformance with this standard may be made unless all requirements stated in
the standard are met.

Inquiries, comments, and suggestions relative to the content of this JEDEC standard or
publication should be addressed to JEDEC at the address below, or refer to www.jedec.org under
Standards and Documents for alternative contact information.

Published by
JEDEC Solid State Technology Association 2012
3103 North 10th Street
Suite 240 South
Arlington, VA 22201-2107

This document may be downloaded free of charge; however, JEDEC retains the
copyright on this material. By downloading this file, the individual agrees not to
charge for or resell the resulting material.

PRICE: Contact JEDEC

Printed in the U.S.A.


All rights reserved

PLEASE!
DON'T VIOLATE
THE LAW!

This document is copyrighted by JEDEC and may not be
reproduced without permission.

For information, contact:

JEDEC Solid State Technology Association
3103 North 10th Street
Suite 240 South
Arlington, VA 22201-2107

or refer to www.jedec.org under Standards-Documents/Copyright Information.


SOLID STATE DRIVE (SSD) ENDURANCE WORKLOADS

(From JEDEC Board Ballot, JCB-12-16, formulated under the cognizance of the JC-64.8 Subcommittee
on Solid-State Drives.)

1 Scope

This standard defines workloads for the endurance rating and endurance verification of SSD application
classes. These workloads shall be used in conjunction with the Solid State Drive (SSD) Requirements and
Endurance Test Method standard, JESD218.

2 Reference documents

JESD218, Solid State Drive (SSD) Requirements and Endurance Test Method

3 Terms and definitions

cluster: The minimum number of contiguous LBAs allocated by the host operating system.
Cold LBA Range: An LBA Range not referenced by a write or trim in a trace.
command trace: A trace of I/O commands.
compression: The process of scaling LBAs associated with commands in the Master Trace to produce a
Test Trace with a smaller LBA Range.
entropy: The non-compressibility of a data pattern when stored in file format, such that a file with
100% entropy shows no reduction in size and a file with 50% entropy shows approximately a 50%
reduction in size when converted to a zip file.
expansion: The process of scaling LBAs associated with commands in the Master Trace to produce a
Test Trace with a larger LBA Range.
footprint: A set of LBAs.
Free LBA Space: A set of LBAs within the LBA Address Space that are not currently allocated for data
storage by the host OS.
LBA Address Space: The full set of user addressable LBAs on the device.
LBA Range: The set of contiguous LBA addresses bounded by the lowest and highest LBA of an
addressed structure.
Maximum LBA: A value specifying the LBA number corresponding to the last LBA in the LBA
Address Space.
Master Trace: A command trace typical of a period of continuous drive operation in a typical client
environment, from which Test Traces are derived.
PreCond%full: The percentage of clusters in LBA Address Space that are assigned to valid data on the
device under test by the preconditioning process.
SSD_Capacity: See JESD218.
Test Trace: A command trace comprising I/O operations to a drive under test that is run during the test
phase of a client workload.
Trim: A command from the host that identifies data that is no longer used by the operating system,
allowing the memory space to be considered Free LBA Space by the drive.

4 Enterprise endurance workload

The enterprise endurance workload consists of random data distributed across an SSD in a manner similar
to some enterprise workload traces that are publicly available for review.

Prior to running the workload, the SSD under test shall have the user-addressable LBA space filled with
valid data (e.g., the drive does not return a data read error because of the content of the LBA prior to
being written during the test routine). Initialization may not be necessary if the formatted SSD satisfies
this requirement.

a) The enterprise endurance workload shall be comprised of random data with the following payload
size distribution:

512 bytes (0.5K)    4%
1024 bytes (1K)     1%
1536 bytes (1.5K)   1%
2048 bytes (2K)     1%
2560 bytes (2.5K)   1%
3072 bytes (3K)     1%
3584 bytes (3.5K)   1%
4096 bytes (4K)    67%
8192 bytes (8K)    10%
16,384 bytes (16K)  7%
32,768 bytes (32K)  3%
65,536 bytes (64K)  3%

b) Data payloads shall be arranged such that payloads smaller than 4096 bytes are pseudo-randomly
interspersed among payloads greater than or equal to 4096 bytes. Data payloads greater than or equal
to 4096 bytes shall be aligned on 4K boundaries.

c) The workload shall be distributed across the SSD such that the following is achieved:

1) 50% of accesses to first 5% of user LBA space (LBA group a)
2) 30% of accesses to next 15% of user LBA space (LBA group b)
3) 20% of accesses to remainder of user LBA space (LBA group c)

d) To avoid testing only a particular area of the SSD, the distribution described in c) is offset through the
user LBA space on different units under test such that all of the SSD LBAs are subjected to the
highest number of accesses (e.g., SSD 1 has LBA group a applied to the first 5% of LBAs, SSD 2 has
LBA group a applied to the next 5% of LBAs, etc.).

e) The write data payload size distribution shall be applied to each of the three LBA groups
concurrently. The write sequence across the LBA groups may be applied either deterministically or
randomly, depending on the capabilities of the test tools, so long as the percentage of accesses to each
LBA group conforms to the percentages specified in 4 c) and the LBA accesses are random (not
sequential) within each LBA group. An example of a deterministic method is given in Annex A.1 and
an example of a random method is given in Annex A.2; an informative sketch combining the
payload-size and LBA-group distributions follows item f).

f) Random data is considered to have 100% entropy. The randomization of the data shall be such that if
data compression/reduction is done by the SSD under test, the compression/reduction has the same
effect as it would on encrypted data. It is acceptable to substitute a few bytes in each sector with
metadata (such as the LBA number) or other information in place of the random data in those bytes if
the addition of such information provides enhanced test interpretation capability. The random data for
the payload may be generated by various means. An informative example script for generating the
random data and the read/write distribution across LBA groups is provided in Annex A.2. This script
uses open-source software that may be found at vdbench.org.
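
As an informative illustration (not part of this standard), the following Perl sketch shows one way to
draw a single write command consistent with the payload-size distribution in 4 a) and the LBA-group
distribution in 4 c). The capacity value is an assumed example and all names are placeholders.

# Informative sketch only -- not part of this standard.
# Draw one write command: pick a payload size from the 4 a) distribution
# and a random LBA in one of the three LBA groups of 4 c). LBAs are
# 512-byte sectors; $max_lba is an assumed example capacity.
$max_lba = 2_000_000;
# payload size in bytes => cumulative probability, per 4 a)
@sizes = ([512, 0.04], [1024, 0.05], [1536, 0.06], [2048, 0.07],
          [2560, 0.08], [3072, 0.09], [3584, 0.10], [4096, 0.77],
          [8192, 0.87], [16384, 0.94], [32768, 0.97], [65536, 1.00]);
sub pick_payload_bytes {
    my $r = rand (1.0);
    foreach my $s (@sizes) {
        return $s->[0] if $r <= $s->[1];
    }
    return 65536;
}
sub pick_lba {
    # 50% of accesses to the first 5% of LBAs (group a), 30% to the
    # next 15% (group b), 20% to the remainder (group c), per 4 c)
    my $r = rand (1.0);
    my ($lo, $hi) = ($r < 0.5) ? (0.00, 0.05)
                  : ($r < 0.8) ? (0.05, 0.20)
                  :              (0.20, 1.00);
    return int ($max_lba * $lo) + int (rand ($max_lba * ($hi - $lo)));
}
$bytes = pick_payload_bytes ();
$lba = pick_lba ();
$lba -= $lba % 8 if ($bytes >= 4096);   # align >= 4K payloads on 4K (8-sector) boundaries
printf ("write: LBA %d, %d bytes\n", $lba, $bytes);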

5 Client endurance workload

5.1 Client endurance workload overview


The client endurance workload consists of a standard reference trace of ATA I/O commands, played
back on the target device.

The standard TBW rating for a client drive shall be derived for and verified under the following workload
conditions:

a) PreCond%full = 100%;
b) trim commands enabled; and
c) random data pattern.

Additional TBW ratings may optionally be derived and verified for different values of these parameters;
however, such ratings shall be reported only in addition to the standard TBW rating. See clause 6
concerning extrapolation of non-standard TBW ratings.

The application of the client workload occurs in two phases:


1. Preconditioning Phase

The preconditioning phase establishes an initial pattern of free LBA distribution within the LBA
Address Space of the drive under test. This phase creates the logical fullness of the drive at the start
of the test phase, as defined by the PreCond%full parameter specified for the test.

The preconditioning phase comprises the following two steps.

1) Write once, sequentially, to the full LBA Address Space using random data (see 5.3.2). If
additional TBW ratings are being determined with non-random data (see 5.3.3), non-random
data shall be used for this step instead of random data.
2) If the PreCond%full value is less than 100%, create Free LBA Space extending contiguously
from the Maximum LBA down to the LBA address corresponding to the PreCond%full value
(an informative sketch of this computation appears at the end of 5.1).

2. Test Phase

The test phase stresses the drive in a manner that is representative of use during long-term operation
in a client environment. The test phase comprises multiple runs of a Test Trace. The Test Trace
has the same LBA footprint for each iteration.

During the test phase, a Test Trace, as specified in 5.2.2, is run repeatedly, per the methods
specified in JESD218.

The client endurance workload traces referenced in this standard are contained in a trace library
maintained by JEDEC and can be downloaded at www.jedec.org, using search criteria: JESD219.
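
As an informative illustration (not part of this standard), the following Perl sketch computes the range
trimmed in step 2) of the preconditioning phase. The Maximum LBA and PreCond%full values below
are assumed examples.

# Informative sketch only. Compute the LBA range trimmed in
# preconditioning step 2); $max_lba and $precond_pct_full are
# assumed example values.
$max_lba = 1_953_525_167;   # e.g., Maximum LBA of a nominal 1 TB drive
$precond_pct_full = 90;     # PreCond%full, as a percentage
if ($precond_pct_full < 100) {
    # Free LBA Space runs contiguously from $first_free up to Maximum LBA
    $first_free = int (($max_lba + 1) * $precond_pct_full / 100);
    printf ("trim LBAs %d through %d (%d LBAs) after the sequential fill\n",
        $first_free, $max_lba, $max_lba - $first_free + 1);
}
else {
    print "PreCond%full = 100%: no Free LBA Space is created\n";
}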

5.2 Client endurance workload traces


5.2.1 Master Trace
A Master Trace is a command trace from which all Test Traces within the SSD_Capacity range of the
Master Trace in the trace library are derived. It comprises drive input/output operations captured over
an extended period of time on a single-user PC with an installed operating system that supports trim
commands. The Master Trace incorporates all I/O operations to a representative SSD, including those
occurring during shutdown/boot and hibernate/restore activity. Operations include the following:
a) write commands;
b) trim commands (see note 1);
c) flush cache commands.

This Master Trace should not be used to verify the endurance of SSDs utilized in cache applications
since the workload may vary from the activity performed in the client workload Master Trace.

NOTE 1 JESD219 refers to trim commands. Trim applies to the ATA command set. The equivalent of the trim
command is defined in other interface standards (e.g., SCSI Unmap; NVMe Deallocate). If a device using a
command set other than ATA is being tested, the relevant form of trim for the device's command set should be
used by the test software.

A Master Trace may be scaled to produce Test Traces for use with a drive of any capacity within the
SSD_Capacity range of the Master Trace by means of compression or expansion. Compression or
expansion is used to create a Test Trace with an LBA Range that almost spans the LBA address range of
the drive to which it relates.

Compression is achieved by reducing the length of Cold LBA Ranges in the trace. Expansion is achieved
by increasing the length of Cold LBA Ranges in the trace.

[Figure 1 (not reproducible in text form) shows the LBA footprint of a Master Trace as alternating
Write/Trim LBA Ranges and Cold LBA Ranges, together with the compressed and expanded LBA
footprints of derived Test Traces, in which only the Cold LBA Ranges change length.]

Figure 1 - Compression and Expansion of Master Trace

If the Master Trace has an LBA Range = M, and it is required to create a Test Trace with LBA Range =
T, then the total length of Cold LBA Ranges within the Master Trace is compressed (or expanded) by
M - T in order to create the Test Trace.

If the total length of all the Cold LBA Ranges within the Master Trace is C, then the length of each
individual Cold LBA Range of length > 20 MB shall be reduced by a fraction (M - T)/C, or increased by
a fraction (T - M)/C, as appropriate. During reduction or expansion, the change in length of an individual
Cold LBA Range should be a multiple of 4 KB.
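
As an informative illustration (not part of this standard), the following Perl sketch applies the scaling
rule above to a set of Cold LBA Range lengths. All values are assumed examples.

# Informative sketch only. Scale the Cold LBA Ranges of a Master Trace
# (LBA Range M) to derive a Test Trace (LBA Range T), per 5.2.1.
# All lengths are in bytes; the values below are assumed examples.
$M = 64 * 2**30;    # Master Trace LBA Range (e.g., 64 GB)
$T = 32 * 2**30;    # desired Test Trace LBA Range (e.g., 32 GB)
@cold = (30 * 2**30, 25 * 2**30, 10 * 2**20);   # example Cold LBA Range lengths
$C = 0;
$C += $_ foreach @cold;      # total Cold LBA Range length
foreach $len (@cold) {
    if ($len > 20 * 2**20) {             # only ranges > 20 MB are scaled
        $delta = $len * ($M - $T) / $C;  # negative when expanding (T > M)
        $delta = int ($delta / 4096) * 4096;   # change is a multiple of 4 KB
        $new = $len - $delta;
    }
    else {
        $new = $len;                     # short ranges are left unchanged
    }
    printf ("Cold LBA Range: %d -> %d bytes\n", $len, $new);
}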

5.2.2 Test Trace


A Test Trace contains all commands that are present in the Master Trace.

The LBA Range of a Test Trace almost spans the LBA address range of the drive to which it relates. A
Test Trace for a specified SSD_Capacity parameter is derived from the Master Trace with the use of
compression or expansion if required.

The Test Trace may combine consecutive trim commands found in the Master Trace into a single trim
command.

Multiple Test Traces exist in the JEDEC trace library, each corresponding to a specific value of the
SSD_Capacity parameter.

5.3 Client workload data patterns

5.3.1 Data pattern overview

The workload trace files contain only the ATA I/O commands. The data payload content of the LBAs is
not included in the traces, both because a file containing it would be impractically large and because the
data within an LBA must change during repeated writes. Some drives may include data compression or
reduction techniques, and some encryption software may cause data to become essentially 100% random.
Due to these variables, random data shall be used as the standard TBW rating condition. Based on
examination of drives in field applications, for additional TBW ratings of drives that incorporate data
compression or reduction techniques, the non-random data shall have an entropy of approximately 50%.

5.3.2 Random data

Random data shall be generated according to the description for the enterprise workload (see 4f). A local
copy of the test trace file may be written to the SSD in place of some random data as required by the
testing method.

5.3.3 Non-random data

Non-random data patterns should be created using the non-random data file created by the Perl code
located in Annex B.1. The non-random data file generated by this Perl code has approximately 50%
entropy. A random byte offset shall be used to ensure that the data selected from the non-random data file
has sufficient randomness to exceed the SSD_Capacity. It is acceptable to substitute a few bytes in each
sector with metadata (such as the LBA number) or other information in place of the non-random data in
those bytes if the addition of such information provides enhanced test interpretation capability. A local
copy of the test trace file may be written to the SSD in place of some non-random data as required by the
testing method.

See Annex B.1 for an example of Perl code that produces a 1GB data file with approximately 50%
entropy.
See Annex B.2 for an example of Perl code that produces the random byte offset.
See Annex B.3 for an example of Perl code that evaluates the entropy of the output file generated.

See clause 6 concerning extrapolation of the non-standard TBW rating.
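
For orientation only: assuming the annex scripts are saved under the hypothetical names gen_entropy.pl
(B.1), rand_chunk.pl (B.2), and eval_entropy.pl (B.3), they might be invoked as follows. The file names
are illustrative; this standard does not name the scripts.

perl gen_entropy.pl -max_len 1073741824 -entropy 0.50 data50.bin
perl rand_chunk.pl -size 4096 data50.bin > payload.bin
perl eval_entropy.pl -num 1 -len 4096 -samples 1000 data50.bin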

5.4 Trim

To verify an additional TBW rating in a non-trim-enabled environment, two options exist:

a) disable the trim command function on the drive; or
b) modify the JESD219 Client workload to remove trim commands.

See clause 6 concerning extrapolation of the non-standard TBW rating.


6 Extrapolation of non-standard TBW ratings

A minimum sample of two drives shall be tested with the non-standard conditions to be rated
(i.e., PreCond%full < 100%, trim commands disabled, and/or non-random data pattern) to establish the
steady-state write amplification factor (WAFnonstd). One, two, or three of these variables may be combined
in a single test, depending on the desired test configuration. The ratio of the steady-state WAF of the drive
family verified with the standard conditions described in 5.1 (WAFstd) to the steady-state non-standard-
conditions WAF (WAFnonstd) of the drive is used to extrapolate the non-standard TBW rating:

TBWnonstd = TBWstd × (WAFstd / WAFnonstd)
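
As an illustrative example with assumed values: a drive family rated TBWstd = 100 TB at WAFstd = 3.0,
and measured at WAFnonstd = 1.5 under the non-standard conditions, would extrapolate to
TBWnonstd = 100 TB × (3.0 / 1.5) = 200 TB.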

Non-standard TBW ratings shall be reported only in addition to the standard TBW rating.

Annex A (informative) Examples of write sequence methods

A.1 Deterministic method example

The write data payload size distribution is applied to each of the three LBA groups. Accesses are mixed
among the LBA groups such that single write commands of the defined data payload sizes are performed
in the following sequence of LBA groups (an informative sketch of this sequence follows the list):

1) LBA group a
2) LBA group b
3) LBA group a
4) LBA group c
5) LBA group a
6) LBA group b
7) LBA group a
8) LBA group c
9) LBA group a
10) LBA group b
11) Back to 1) and repeat sequence
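
As an informative illustration (not part of this standard), this short Perl sketch walks the ten-step
sequence above; over each full cycle, group a receives 5/10 (50%) of the writes, group b 3/10 (30%),
and group c 2/10 (20%), matching the distribution required by 4 c).

# Informative sketch only. Walk the ten-step deterministic sequence of A.1.
@sequence = qw(a b a c a b a c a b);
$step = 0;
foreach $i (1 .. 20) {    # two full cycles as a demonstration
    printf ("write %2d -> LBA group %s\n", $i, $sequence[$step]);
    $step = ($step + 1) % @sequence;   # step 11): back to 1) and repeat
}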

A.2 Random method example

This script is provided as an example only. It is not intended to imply a preferred method for generating
the enterprise workload. Due to possible updates made to this open-source software, minor changes to the
script may be required to achieve the desired functions. The following are instructions for running a
VDbench script.

1) Install the latest VDbench release and Java JRE 6 or greater on a Windows platform.
2) Copy the scripts into the VDbench working directory.
3) Identify the physical drive number of the device to be exercised.
4) Run maxio.cmd.
5) maxio.cmd may be modified by the user (e.g., the start thread count may be changed to
expedite test time).
6) maxio.cmd may be modified by the user to change the read/write ratio from the currently set
40/60 to the desired value.
7) maxio.cmd is currently set to run for 1000 hours; this duration may be changed by the user.
8) maxio.cmd is currently set to stop on 10 errors (i.e., uncorrectable read errors); this may be
changed by the user to stop after a different number of errors. Errors are logged.
9) Analysis may be done on logfile.html in the results directory.

NOTE Long lines of the script may be wrapped to the next line due to margin limitations. Function identifiers
(e.g., echo, set, REM) begin each line of the script.

MAXIO.CMD

echo off
echo **
echo ** usage example :
echo **
echo ** SSDIO 1 50 3000 MySSD_results
echo ** use Storage Device physicaldrive = 1
echo ** Apply max. Thread count/workload definition = 50
echo ** IO rate set = 3000 IOs per second
echo ** output directory name for results = MySSD_results
echo **
echo Test using SD=%1 Threads=%2 IOrate=%3 results=%4
echo on
pause
echo off
REM ==================================================
REM SSDIO Endurance Profile Example
REM ==================================================
set Drv=%1%
set T=%2%
set IO=%3%
set /A A1=%IO%*90
set /A A=%A1%/100
set /A B1=%IO%*80
set /A B=%B1%/100
set /A C1=%IO%*50
set /A C=%C1%/100
set /A D1=%IO%*10
set /A D=%D1%/100
set /A E=%IO%/100
REM ***************************************************************************
REM endurance time set by elapsed time = 1000h (hours)
REM set data errors to 10; if more occur, device testing should not continue
REM ***************************************************************************
REM
echo * SSDIO Profile > ssdio.go
echo sd=sd1,lun=\\.\physicaldrive%Drv%,align=4k,threads=%T% >> ssdio.go
echo data_errors=10 >> ssdio.go
echo wd=wd1,sd=sd1,rdpct=40,seek=100,xfersize=(512,4,1k,1,1.5k,1,2k,1,2.5k,1,3k,1,3.5k,1,4K,67,8k,10,16k,7,32k,3,64k,3),range=(1,5),skew=50 >> ssdio.go
echo wd=wd2,sd=sd1,rdpct=40,seek=100,xfersize=(512,4,1k,1,1.5k,1,2k,1,2.5k,1,3k,1,3.5k,1,4K,67,8k,10,16k,7,32k,3,64k,3),range=(6,20),skew=30 >> ssdio.go
echo wd=wd3,sd=sd1,rdpct=40,seek=100,xfersize=(512,4,1k,1,1.5k,1,2k,1,2.5k,1,3k,1,3.5k,1,4K,67,8k,10,16k,7,32k,3,64k,3),range=(21,100),skew=20 >> ssdio.go
echo rd=SSD_Sustain,wd=wd*,iorate=%3%,elapsed=1000h,interval=60 >> ssdio.go
REM echo Please Check these values are correct!
REM pause
vdbench -f ssdio.go -o %4%

Annex B (informative) Example Perl code associated with non-random data payloads

B.1 Example Perl code that generates the data payload with approximately 50% entropy.

#!/usr/bin/perl

# Generate a file with a given "entropy" (compressibility).


# The file is comprised of random sections (incompressible) and "fixed"
# sections (compressible) such that the overall compressibility achieves
# the target. The "fixed" sections are either runs (repeated characters)
# or copies (repetitions of nearby previous portions of the file).
# Default settings provide a 1GB file.

use Getopt::Long;

# Default values of arguments


$MAX_LEN = (1 << 30);
$entropy = 0.50;

# Parse arguments...
if (!GetOptions (
"max_len=i" => \$MAX_LEN,
"entropy=f" => \$entropy,
"help" => \&help)) {
die "$0: Failed to parse options";
exit 17;
}

# Check for output file being present


if (@ARGV) {
die "$0: must provide exactly one file name argument for output file"
if (@ARGV > 1);
$ofile = $ARGV[0];
}
else {
die "$0: must provide file name argument for output file";
}

# Initialize RNG...
random_init ();

# Below three bytes, copies/repeats generally not effective


# Above 50 bytes, copies/repeats are >> 10-to-1
$MIN_FIXED = 3;
$MAX_FIXED = 50;

# Restrict copies to being within this range (further adjusted by $MAX_FIXED below)
$COPY_RANGE_LIMIT = 512;

# Empirically determined so that at 50%, we get ~50% compressibility of the whole file with gzip -9
$random_efficiency = (0.559 + ($entropy / 0.25) * 0.168);

$avg_compression = 1.0 / $entropy;

# Determine the number of random vs. "fixed" (copies/repeats) characters we want overall
$len_random = (($entropy > $random_efficiency) ? $MAX_LEN :
($entropy * $random_efficiency * $MAX_LEN));
$len_fixed = ($MAX_LEN - $len_random);

# $fixed_to_random_ratio provides the fine-grained localized control by making things stay close locally
$fixed_to_random_ratio = (($len_fixed > 0) ? ($len_random / $len_fixed) : 0);

# Open the output file


open (FOO, ">$ofile") ||
die "$0: can't open '$ofile' for writing: $!";

# Initializations for generation loop


$in_random = 0;
$in_fixed = 0;
$last_in_fixed = 0;
printf ("Total length %d (interleaved fixed=%d, random=%d)\n",
$MAX_LEN, $len_fixed, $len_random);
printf (" Target entropy = %5.2f, assuming compression efficiency = %5.3f\n",
$entropy, $random_efficiency);
$fifo_ptr = 0;
$fifo_max = ($COPY_RANGE_LIMIT + 2);
@fifo = ();
$wrapped = 0;
# Limit of how far back to possibly copy
$copy_limit = ($COPY_RANGE_LIMIT - $MAX_FIXED);
# Loop over all characters to be generated...
for ($i = 0; $i < $MAX_LEN; $i++) {
# Remember if we were just in a fixed sequence...
$last_in_fixed = $in_fixed;
# See if we're in fixed or random mode...
if ($in_fixed) {
# Fixed mode -- generate based on $use_fifo flag a copy or a repeat
if ($use_fifo) {
if (++$fifo_pos >= $fifo_max) {
$fifo_pos = 0;
}
$char = $fifo[$fifo_pos];
}
else {
$char = $fixed_char;
}
# Decrement count, and check to see if we should switch to random for pre-determined amount
unless (--$in_fixed) {
$in_random = $next_in_random;
}
}
else {
# Not fixed -- see if we're currently in random mode
if ($in_random) {
# In random mode, output a random character and decrement count...
$char = int (rand (256));
--$in_random;
}
elsif (!$last_in_fixed && ($len_fixed > 0)) {
# Finished random mode, didn't just fall out of fixed mode (initial state), and have fixed chars left to generate
# Empirically, about half of compressibility due to runs (very data dependent, of course)
$use_fifo = ((rand (1.0) <= 0.5) && $wrapped);
if ($use_fifo) {
# FIFO (copy) mode
# Need to make sure we have at least a two-character repeat
$fifo_pos = $fifo_ptr - (int (rand ($copy_limit - 2)) + 2);
$fifo_pos += $fifo_max if ($fifo_pos < 0);
$char = $fifo[$fifo_pos];
}
else {
# Repeated character mode
$fixed_char = int (rand (256));
$char = $fixed_char;
}
# Determine fixed length as a bound between the min and the max
$in_fixed = int (rand ($MAX_FIXED - $MIN_FIXED) + $MIN_FIXED);
# Decrement the number of fixed chars we have left to generate
$len_fixed -= $in_fixed;
# Determine how many random characters should follow this fixed sequence --
# Either all the rest if fixed are exhausted, or the right ratio to keep things even...
$next_in_random = (($len_fixed <= 0) ? $len_random :
int ($fixed_to_random_ratio * $in_fixed));
# printf ("Fixed = %d\n", $in_fixed);
}
else {
# We must be out of fixed chars -- it's all random for the rest...
$char = int (rand (256));
}
}
print (FOO chr($char));
$fifo[$fifo_ptr] = $char;
if (++$fifo_ptr >= $fifo_max) {
$fifo_ptr = 0;
$wrapped = 1;
}
}

# Close the output file


close (FOO);

# That's all, folks!


exit (0);

#
# random_init:
# Initialize the built-in RNG.

sub random_init {
my ($i);

srand (time ^ $$);


}

# help:
# Help message...

sub help {
printf ("$0 help:\n");
printf (" -max_len #\n");
printf (" -entropy #.#\n");
printf (" -help \tprint this useful information\n");
printf ("\n");
exit (0);
}

B.2 Example Perl code that outputs a random chunk of the data file using a random byte offset.

#!/usr/bin/perl

# Output a random chunk of a file.


# Pieces <= 4KB are returned "straight"
# Pieces > 4KB are returned as random 4KB chunks.

use Getopt::Long;

# Default values of argument


$SIZE = 4096;

# Parse arguments...
if (!GetOptions (
"size=i" => \$SIZE,
"help" => \&help)) {
die "$0: Failed to parse options";
exit 17;
}

# Check for input file being present


if (@ARGV) {
die "$0: must provide exactly one file name argument for input file"
if (@ARGV > 1);
$ifile = $ARGV[0];
unless (-r $ifile) {
die "$0: can't read input file '$ifile'";
}
}
else {
die "$0: must provide file name argument for input file";
}

# Get byte size of input file


($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $ifile_size) = stat $ifile;

# Make sure input file is at least as large as desired sampling window


if ($ifile_size < $SIZE) {
die "$0: input file '$ifile' is only $ifile_size bytes long -- less than chunk size $SIZE";
}

# Initialize RNG...
random_init ();

# Largest size chunk we want to extract from one point


$MAX_CHUNK = 4096;

# Open the input file


open (FOO, $ifile) ||
die "$0: can't open input file '$ifile' for reading: $!";

# For each piece that we want to generate...


$str = "";
do {
$need = (($SIZE >= $MAX_CHUNK) ? $MAX_CHUNK : $SIZE);
# Pick a random offset that fits in the file
$offset = int (rand ($ifile_size - $need) + 0.9999999);
# Seek to the random offset
unless (seek (FOO, $offset, 0)) {
die "$0: couldn't seek to offset $offset in input file '$ifile' of size $ifile_size: $!";
}
# Read a window's worth of characters at the offset position
unless (read (FOO, $tmp_str, $need) == $need) {
die "$0: couldn't read $need bytes at offset $offset in input file '$ifile' of size $ifile_size: $!";
}
$str .= $tmp_str;
$SIZE -= $MAX_CHUNK;
} while ($SIZE > 0);

# Close the input file


close (FOO);

# Print the resulting string


print $str;

# That's all, folks!


exit (0);

#
# random_init:
# Initialize the built-in RNG.

sub random_init {
my ($i);

srand (time ^ $$);


}

# help:
# Help message...

sub help {
printf ("$0 help:\n");
printf (" -size # \tsize of the random chunk to generate\n");
printf (" -help \tprint this useful information\n");
printf ("\n");
exit (0);
}

B.3 Example Perl code for evaluating the entropy of the generated data payload file

#!/usr/bin/perl

# Evaluate the compressibility of a file in "windows" of a given size.


# A window is NUM pieces each of size LEN, where each piece is at a
# random (byte) offset in the file.
# Multiple samples (multiple windows) are tested for compressibility
# using different metrics.
# Entropy and Huffman are character-based metrics and do not reflect
# compressibility as they only look at the character distribution.
# gzip -9 (by default) is used to measure compressibility.

use Getopt::Long;
use Algorithm::Huffman;
use IPC::Open2;

# Default values of argument


$NUM = 1;
$LEN = 4096;
$SAMPLES = 1000;
$GZIP_CMD = "gzip -9 -c";

# Parse arguments...
if (!GetOptions (
"num=i" => \$NUM,
"len=i" => \$LEN,
"samples=i" => \$SAMPLES,
"gzip_cmd=s" => \$GZIP_CMD,
"help" => \&help)) {
die "$0: Failed to parse options";
exit 17;
}

# Check for input file being present


if (@ARGV) {
die "$0: must provide exactly one file name argument for input file"
if (@ARGV > 1);
$ifile = $ARGV[0];
unless (-r $ifile) {
die "$0: can't read input file '$ifile'";
}
}
else {
die "$0: must provide file name argument for input file";
}

# Get byte size of input file


($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $ifile_size) = stat $ifile;

# Make sure input file is at least as large as desired sampling window


if ($ifile_size < $LEN) {
die "$0: input file '$ifile' is only $ifile_size bytes long -- less than window size $LEN";
}

# Initialize RNG...
random_init ();

# Open the input file


open (FOO, $ifile) ||
die "$0: can't open input file '$ifile' for reading: $!";

# Initializations for sampling loop


$min_ent = 1.0e50;
$max_ent = -1.0e50;
$sum_ent = 0.0;
$min_hln = 1.0e50;
$max_hln = -1.0e50;
$sum_hln = 0.0;
$min_gln = 1.0e50;
$max_gln = -1.0e50;
$sum_gln = 0.0;
# Loop for the number of samples we want
$window_len = ($NUM * $LEN);
for ($i = 0; $i < $SAMPLES; $i++) {
# For each piece that we want to generate...
$str = "";
for ($j = 0; $j < $NUM; $j++) {
# Pick a random offset that fits in the file
$offset = int (rand ($ifile_size - $LEN) + 0.9999999);
# Seek to the random offset
unless (seek (FOO, $offset, 0)) {
die "$0: couldn't seek to offset $offset in input file '$ifile' of size $ifile_size: $!";
}
# Read a window's worth of characters at the offset position
unless (read (FOO, $tmp_str, $LEN) == $LEN) {
die "$0: couldn't read $LEN bytes at offset $offset in input file '$ifile' of size $ifile_size: $!";
}
$str .= $tmp_str;
}
# Entropy
$ent = compute_entropy ($str, $window_len);
$min_ent = $ent if $min_ent > $ent;
$max_ent = $ent if $max_ent < $ent;
$sum_ent += $ent;
# Huffman
$hln = compute_huffman ($str, $window_len);
$min_hln = $hln if $min_hln > $hln;
$max_hln = $hln if $max_hln < $hln;
$sum_hln += $hln;
# gzip
$gln = compute_gzip ($str, $window_len);
$min_gln = $gln if $min_gln > $gln;
$max_gln = $gln if $max_gln < $gln;
$sum_gln += $gln;
}

# Close the input file


close (FOO);

# Output statistics...
printf ("Entropy (char) for %d samples of length %d: min=%5.3f, max=%5.3f, avg=%5.3f\n",
$SAMPLES, $window_len, $min_ent, $max_ent, ($sum_ent/$SAMPLES));
printf ("Huffman (char) for %d samples of length %d: min=%5.3f, max=%5.3f, avg=%5.3f\n",
$SAMPLES, $window_len, $min_hln, $max_hln, ($sum_hln/$SAMPLES));
($str = $GZIP_CMD) =~ s/ -c//;
printf ("%-14s for %d samples of length %d: min=%5.3f, max=%5.3f, avg=%5.3f\n",
$str, $SAMPLES, $window_len, $min_gln, $max_gln, ($sum_gln/$SAMPLES));

# That's all, folks!


exit (0);

#
# compute_entropy:
# Compute and return the entropy on the argument string of the given length

sub compute_entropy {
my ($str, $len) = @_;
my ($i, $chr, $chr_pos, $prob, $ent, @count);

@count = ();
for ($i = 0; $i < $len; $i++) {
$chr = substr ($str, $i, 1);
$chr_pos = ord ($chr);
++$count[$chr_pos];
}
$ent = 0.0;
for ($i = 0; $i < 256; $i++) {
if ($count[$i]) {
$prob = ($count[$i] / $len);
$ent += $prob * log2 ($prob);
# printf ("count %d, prob %6.4f, ent %6.3f\n", $count[$i], $prob, $ent);
}
}
# Entropy is # of bits needed to encode characters -- convert to fraction of 8 bits (compressibility)
return (-$ent / 8);
}

#
# compute_huffman:
# Compute and return the Huffman length of the argument string of the given length

sub compute_huffman {
my ($str, $len) = @_;
my ($i, $chr, $huff, $encode_hash, $encode_of_str, %count);

%count = ();
for ($i = 0; $i < $len; $i++) {
$chr = substr ($str, $i, 1);
++$count{$chr};
}

$huff = Algorithm::Huffman->new(\%count);
$encode_hash = $huff->encode_hash;
$encode_of_str = $huff->encode_bitstring($str);

# Huffman length is # of bits in message -- convert to compressibility


return length ($encode_of_str) / 8 / $len;
}

#
# compute_gzip:
# Compute and return the gzip length of the argument string of the given length

sub compute_gzip {
my ($str, $len) = @_;
my ($pid, $i, $j, @out_str);
local (*Reader, *Writer);

$pid = open2 (\*Reader, \*Writer, $GZIP_CMD);


print Writer $str;
close Writer;
@out_str = <Reader>;
close Reader;
waitpid ($pid, 0);

$j = 0;
foreach $i (@out_str) {
$j += length ($i);
}

# Length is byte length -- convert to compressibility


return $j / $len;
}

#
# log2:
# Compute and return the log base 2 of the given input.

sub log2 {
my ($val) = @_;

return log ($val) / 0.69314718055994530941;


}

#
# random_init:
# Initialize the built-in RNG.

sub random_init {
my ($i);

srand (time ^ $$);


}

# help:
# Help message...

sub help {
printf ("$0 help:\n");
printf (" -num # \tnumber of pieces of size LEN to make the window\n");
printf (" -len # \tlength of each piece over which to evaluate compressibility\n");
printf (" -samples # \tnumber of random spots in file to test\n");
printf (" -gzip_cmd <str>\tcommand to use for compressing each sample\n");
printf (" -help \tprint this useful information\n");
printf ("\n");
exit (0);
}
Standard Improvement Form JEDEC JESD219A

The purpose of this form is to provide the Technical Committees of JEDEC with input from the industry
regarding usage of the subject standard. Individuals or companies are invited to submit comments to
JEDEC. All comments will be collected and dispersed to the appropriate committee(s).

If you can provide input, please complete this form and return to:

JEDEC Fax: 703.907.7583
Attn: Publications Department
3103 North 10th Street
Suite 240 South
Arlington, VA 22201-2107

1. I recommend changes to the following:


Requirement, clause number

Test method number Clause number

The referenced clause number has proven to be:


Unclear Too Rigid In Error

Other

2. Recommendations for correction:

3. Other suggestions for document improvement:

Submitted by
Name: Phone:
Company: E-mail:
Address:
City/State/Zip: Date:
