Professional Documents
Culture Documents
TSI1945 TSE1945 Student Guide v3-0x
For HDS internal use only. This document is not to be used for instructor-led
training without written approval from geo Academy leaders. In addition,
this document should not be used in place of HDS maintenance manuals
and/or user guides.
All other trademarks, trade names, and service marks used herein are the rightful property of their respective
owners.
NOTICE:
Notational conventions: 1KB stands for 1,024 bytes, 1MB for 1,024 kilobytes, 1GB for 1,024 megabytes, and
1TB for 1,024 gigabytes, as is consistent with IEC (International Electrotechnical Commission) standards for
prefixes for binary and metric multiples.
© Hitachi Data Systems Corporation 2012. All Rights Reserved
HDS Academy 1062
Student Introductions
• Name
• Position
• Experience
• Your expectations
Course Description
Prerequisites
Recommended
• TSI2048 — Virtualization Solutions
• TSI1848 — Hitachi Tuning Manager Software v7.x Advanced Operations
• TSI0945 — Managing Storage Performance with Hitachi Tuning Manager
Software v7.x
Required knowledge and skills
• Working knowledge of Hitachi Enterprise Storage, Hitachi Modular
Storage, Hitachi Optimization and Virtualization tools, and Hitachi
Monitoring and Reporting tools
Course Objectives
Course Topics
Learning Paths
HDS.com: http://www.hds.com/services/education/
Partner Xchange Portal: https://portal.hds.com/
HDSnet: http://hdsnet.hds.com/hds_academy/
Please contact your local training administrator if you have any questions regarding
Learning Paths or visit your applicable website.
Academy in theLoop!
theLoop:
http://loop.hds.com/community/hds_academy/course_announcements_and_feedback_community (HDS internal only)
Twitter site
Site URL: http://www.twitter.com/HDSAcademy
LinkedIn Group
Site URL: http://www.linkedin.com/groups?gid=3044480&trk=myg_ugrp_ovr
What Is Performance?
Fulfillment of an Expectation
Performance = Reality – Expectations
Happiness = Reality – Expectations
Performance = Happiness
Measure Reality
• Establish comprehensive data collection
Ask about Customer Expectations
• Do quantifiable expectations exist?
• How are customer expectations not being met?
Fulfillment of an Expectation
If both performance and happiness equal the same thing (reality minus
expectations), then it follows that performance must equal happiness
Measure Reality
Establish comprehensive data collection
Ask about Customer Expectations
Do quantifiable expectations exist?
Throughput (IOPS, MB/sec); response time
How are customer expectations not being met?
Specific targets, timing or circumstances
How do they know they are unhappy?
[Figure: disk drive anatomy – cover, base casting, disk, spindle motor, actuator (VCM and head-suspension assembly), suspension, gimbal, slider (~1 mm), coil, electronics card; recording medium layers – lube, overcoat, recording layer, growth control layer, substrate]
Rotational Latency
• If random access
– Mechanical latency is significant
• If sequential access
– Mechanical latency not usually important
• Cache hits hide latency
Modular Storage
• RAID-0, RAID-1, RAID-1+0, RAID-5, RAID-6
Enterprise Storage
• RAID-1+0, RAID-5, RAID-6
Hitachi data controllers are the most advanced in the storage industry and employ
advanced algorithms to manage performance. The intelligent controllers provide
disk-interface and RAID management circuitry, offloading these tasks to dedicated
embedded processors. All user data disks in the system are defined as part of a
RAID array group of one type or another.
Example 2: 10K RPM drive with 2.99 ms average rotation and 4.9 ms average seek
1000 ms / (2.99 + 4.9) = 127 IOPS
Example 3: 7200 RPM drive with 4.17 ms average rotation and 8.5 ms average seek
1000 ms / (4.17 + 8.5) = 79 IOPS

Capacity (GB)       73      146     146     300     300     500     500
RPM                 15000   10000   15000   10000   15000   7200    7200
Av Latency (ms)     2.01    2.99    2.01    2.99    2.01    4.17    4.17
Av Seek (Write)     4.2     5.4     4.1     5.1     4.1     8.5     8.5
Av Seek (Read)      3.8     4.9     3.8     4.7     3.8     8.5     8.5
Av Tot Latency      6.21    8.39    6.11    8.09    6.11    12.67   12.67
Av IOPS (Read)      172     127     172     130     172     79      79
Av IOPS (Write)     161     119     164     124     164     79      79
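The rule-of-thumb formula used in the examples above can be restated in a few lines of Python; this is only an illustration of the arithmetic, not a sizing tool, and the values are the ones from the table.

```python
# Rule-of-thumb random IOPS for a single disk drive:
#   IOPS ~= 1000 ms / (average rotational latency ms + average seek ms)
def disk_iops(avg_rotation_ms, avg_seek_ms):
    return 1000.0 / (avg_rotation_ms + avg_seek_ms)

# Example 2 above: 10K RPM drive, 2.99 ms rotation, 4.9 ms seek
print(round(disk_iops(2.99, 4.9)))   # -> 127

# Example 3 above: 7200 RPM drive, 4.17 ms rotation, 8.5 ms seek
print(round(disk_iops(4.17, 8.5)))   # -> 79
```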
RAID-1
• Also called mirroring
– Two copies of the data
– Requires 2x number of disk drives
– Does not matter what previous data was; just overwrite with new data
• For writes, a copy must be written to both disk drives
– Two parity group disk drive writes for every host write
• For reads, the data can be read from either disk drive
– Read activity distributed over both copies reduces disk drive busy status (due to reads) to ½ of what it would be to read from a single (non-RAID) disk drive
RAID Terminology
Mirror
RAID-0+1
Stripe Stripe
RAID-1+0
• 2D+2D or 4D+4D
• With rotating copy
RAID-1+0 is available in 2 data plus 2 data (2D+2D) and 4 data plus 4 data (4D+4D)
disk configurations.
The configurations include a rotating copy, where the primary and secondary
stripes are toggled back and forth across the physical disk drives for performance.
RAID-1+0 is best suited to applications with low cache hit ratios, such as random
I/O activity, and to workloads with high write-to-read ratios.
RAID-1+0
ABCDEFGHIJKL
(Sequential write shown)
RAID Terminology
RAID-5
• 3D+1P or 7D+1P
• Data is striped with parity over RAID members
D1 D2 D3 D4 D5 D6 D7 P
P D1 D2 D3 D4 D5 D6 D7
D7 P D1 D2 D3 D4 D5 D6
D6 D7 P D1 D2 D3 D4 D5
D5 D6 D7 P D1 D2 D3 D4
D4 D5 D6 D7 P D1 D2 D3
D3 D4 D5 D6 D7 P D1 D2
D2 D3 D4 D5 D6 D7 P D1
Disk 1 Disk 2 Disk 3 Disk 4 Disk 5 Disk 6 Disk 7 Disk 8
RAID-5
Each parity bit is set as necessary to keep an odd number of “1” bits in that
bit position across the whole parity group (odd parity).
Adding more data drives does not add more parity.
[Figure: odd parity example – the missing member is reconstructed by reading the remaining data bits and the parity bit]
Read miss
• In normal operation (not overloaded), this is the only host I/O operation that does not complete at electronic speed with just an access to cache
• A copy of the data, and some extra from the same area, is usually kept in cache because hosts tend to ask for the same or close-by data later
• Loading of the extra data is governed by the intelligent learning algorithm
Sequence:
1. Read old data; read old parity.
2. Remove old data from old parity giving partial parity (parity for the rest of the
row).
3. Add new data into partial parity to generate new parity.
4. Write new data and new parity to disk.
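The same sequence can be shown with XOR, which is the operation behind the parity update: removing the old data from the old parity and adding the new data produces the new parity without reading the other row members. The byte values below are hypothetical and purely illustrative.

```python
# RAID-5 read-modify-write parity update (illustrative sketch).
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

old_data   = bytes([0b0101, 0b1100])   # hypothetical old 2-byte chunk
new_data   = bytes([0b0011, 0b1010])   # hypothetical new 2-byte chunk
old_parity = bytes([0b1111, 0b0001])   # hypothetical old parity chunk

partial_parity = xor_bytes(old_parity, old_data)      # step 2: remove old data
new_parity     = xor_bytes(partial_parity, new_data)  # step 3: add new data

print([bin(b) for b in new_parity])    # step 4 would write new_data and new_parity
```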
[Figure: Old data #1, Old data #2, Old parity; Data #1, Data #2, Data #3, Parity]
An 8 KB block positioned within a segment only partially occupies the segment.
RAID Terminology
RAID-6
• 6D + 2P
• Data is striped with parity over RAID members
D1 D2 D3 D4 D5 D6 P1 P2
D2 D3 D4 D5 D6 P1 P2 D1
D3 D4 D5 D6 P1 P2 D1 D2
D4 D5 D6 P1 P2 D1 D2 D3
D5 D6 P1 P2 D1 D2 D3 D4
D6 P1 P2 D1 D2 D3 D4 D5
P1 P2 D1 D2 D3 D4 D5 D6
P2 D1 D2 D3 D4 D5 D6 P1
Disk 1 Disk 2 Disk 3 Disk 4 Disk 5 Disk 6 Disk 7 Disk 8
Like RAID-5, RAID-6 stripes blocks of data and parity across an array of drives.
However, RAID-6 maintains redundant parity information for each stripe of data.
This redundancy enables RAID-6 to recover from the failure of up to two drives in
an array, i.e., a double fault. Other RAID configurations can only tolerate a single
fault. As with RAID-5, performance is adjusted by varying stripe sizes.
RAID-6 is good for applications using the largest disks and performing many
sequential reads.
RAID-6
D1 D2 D3 D4 D5 D6 P Q
RAID-6 extends the RAID-5 concept by using 2 separate parity-type fields, usually called “P” and “Q.” RAID-6 allows data to be reconstructed from the remaining drives in a parity group when any 1 or 2 drives have failed.
The mathematics is beyond a basic course, but it is the same as that of the ECC used to correct errors in dynamic random-access memory (DRAM) or on the surface of disk drives.
Each RAID-6 host random write turns into 6 parity group I/O operations.
Read old data, read old P, read old Q
(Compute new P, Q)
Write new data, write new P, write new Q
RAID-6 parity group sizes usually start at 6+2, which has the same space efficiency as RAID-5 3+1.
Example
Take
• a 4D+1P RAID-5 LUN,
• a 4D+4D RAID-1+0 LUN, and
• a 4D+2P RAID-6 LUN.
Issue 4 I/Os. How many back-end disk I/Os are generated?

                   RAID-5   RAID-1+0   RAID-6
Random Read        4        4          4
Random Write       16       8          24
Sequential Read    4        4          4
Sequential Write   5        8          6
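The table can be reproduced from the per-write costs described earlier in this module. The short sketch below is illustrative only (cache effects are ignored) and uses the parity group layouts from this example.

```python
# Back-end disk I/Os produced by 4 host I/Os, using rule-of-thumb costs.
host_ios = 4

random_write_cost = {
    "RAID-5 4D+1P":    4,   # read old data, read old parity, write data, write parity
    "RAID-1+0 4D+4D":  2,   # write both copies
    "RAID-6 4D+2P":    6,   # read old data/P/Q, write new data/P/Q
}
for raid, cost in random_write_cost.items():
    print(raid, "random writes ->", host_ios * cost, "disk I/Os")   # 16, 8, 24

# Sequential (full-stripe) writes of 4 data chunks avoid the pre-reads:
sequential_write_ios = {
    "RAID-5 4D+1P":   4 + 1,   # 4 data + 1 parity = 5
    "RAID-1+0 4D+4D": 4 + 4,   # 4 data + 4 mirror = 8
    "RAID-6 4D+2P":   4 + 2,   # 4 data + 2 parity = 6
}
print(sequential_write_ios)
```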
HDD IOPS = 1000 / (Avg Rot Latency ms + Avg Seek ms)
Raw RG IOPS = n * HDD IOPS (n = number of HDDs in the RAID group)
Actual IOPS = Raw RG IOPS / (Read % + (Write % * y)), where y is the RAID write penalty
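A minimal sketch of the three formulas above, assuming a hypothetical 70/30 read/write mix and a RAID-5 write penalty of 4; the drive numbers come from the earlier table.

```python
# Effective host IOPS for a RAID group.
def hdd_iops(avg_rotation_ms, avg_seek_ms):
    return 1000.0 / (avg_rotation_ms + avg_seek_ms)

def raw_rg_iops(n_drives, per_drive_iops):
    return n_drives * per_drive_iops

def actual_iops(raw_iops, read_fraction, write_penalty):
    write_fraction = 1.0 - read_fraction
    return raw_iops / (read_fraction + write_fraction * write_penalty)

# Hypothetical example: 8 x 15K RPM drives (2.01 ms rotation, 3.8 ms seek),
# 70% reads, RAID-5 write penalty y = 4.
per_drive = hdd_iops(2.01, 3.8)            # ~172 IOPS per drive
raw       = raw_rg_iops(8, per_drive)      # ~1377 raw RG IOPS
print(round(actual_iops(raw, 0.70, 4)))    # ~725 host-visible IOPS
```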
SATA Specifics
Try to spread load across as many HDDs as possible to reduce utilization; this
also helps rebuilds.
Do not use large RAID groups or rebuild times will be longer because all disks
must be read.
SATA writes require verification (misdirected write) and this uses extra CTL
CPU cycles.
Hitachi implements LRC, but this adds overhead with RAID-1+0.
Module Summary
Defined performance
Described disk drive characteristics and features
Listed RAID level performance characteristics available with Hitachi
storage systems
Module Review
1. Define Performance.
2. List the RAID types supported by Hitachi storage systems.
3. What RAID type gives good small-block random reads and writes?
4. Which RAID type provides good sequential performance —
RAID-1+0 vs. RAID-5?
                1 module    2 modules
HDD (2.5 in.)   1,024       2,048
[Figure: rack layout – RK-12, RK-11, RK-10, RK-00, RK-01, RK-02]
The VSP array can be configured as a single chassis or dual chassis. Each chassis has
at least 1 rack — a control rack (DKC) — in which there is a logic box for boards and
one or 2 optional disk units (DKUs).
A single chassis can be expanded by 1 or 2 disk expansion racks, each with up to 3
DKUs. A DKU is either of the small form factor (2.5 in. disks) or large form factor
(3.5 in. disks) type.
A dual chassis array will have 2 control racks and up to 4 disk expansion racks. The
logic boxes in each chassis are cross connected at the grid switch level to create a
single continuous array (not a cluster pair). A fully configured dual chassis (6‐rack)
array occupies a small footprint (3.6 ft by 11.8 ft) and consumes much less power
than the previous USP V design.
Unlike most storage arrays from other vendors, the VSP is a purpose built system
designed from the ground up to be a storage array and comprised of (where
appropriate) custom logic and processor ASICs. All Hitachi midrange and enterprise
storage arrays are purpose built. In VSP (like all Hitachi designs), there are separate
intelligent components from which the array is created. These components are
operated in parallel to achieve high performance, scalability, and reliability.
The VSP uses 5 types of logic boards:
Grid switches
Data cache adapters (cache boards)
Front‐end directors (FC or FICON ports)
Back‐end directors (disk controllers)
Virtual storage directors (processor boards)
Front-End Directors
The front-end director (FED) board controls the interaction between the array and
servers attached to the host ports. It also manages the connections of external
storage or remote copy products (Hitachi Universal Replicator). There are 2 types of
FED features (a pair of boards) available: a 16‐port Open Fibre and a 16‐port FICON.
The 2 types of interface options can be supported simultaneously by mixing FED
features (pairs of boards) within the VSP.
There are up to 8 FED boards per chassis (2 or 4 more can be added if BEDs are not
installed). Unlike the previous Hitachi Enterprise array designs, the FED board does
not decode and execute I/O commands. In the simplest terms, a VSP FED accepts
and responds to host requests by directing the host I/O requests to the VSD
managing the LDEV in question.
The VSD processes the commands, manages the metadata in Control Memory, and
creates jobs for the Data Accelerator processors in FEDs and BEDs. These then
transfer data between the host and cache, virtualized arrays and cache, disks and
cache, or replication operations and cache.
The VSD that owns an LDEV tells the FED where to read or write the data in cache.
This location will be within the partition allocated to that VSD from the Cache‐A
and Cache‐B pools.
The 16‐port Open Fibre feature consists of two boards, each with eight 8 Gb/sec open fibre ports. Each port can auto negotiate to 1 of 3 host rates: 2 Gb/sec, 4 Gb/sec or 8 Gb/sec.
The 16‐port FICON feature has eight 8 Gb/sec FICON ports per board. Each port can auto negotiate to 1 of 3 host rates: 2 Gb/sec, 4 Gb/sec or 8 Gb/sec.
Data cache adapter (DCA) boards are memory boards that hold all
user data and the master copy of control memory (metadata).
Up to 8 DCAs installed per chassis, with 16 GB to 64 GB of cache
per board (64 GB to 1024 GB per chassis).
The Data Cache Adapter (DCA) boards are the memory boards that hold all user data
and the master copy of Control Memory (metadata). There are up to 8 DCAs installed per
chassis, with 16 GB to 64 GB of cache per board (64 GB to 1024 GB per chassis).
The 2 boards of a feature must have the same RAM configuration, but each DCA feature
can be different. The first 2 DCA boards in the base chassis (but not in the expansion
chassis) have a region of up to 48 GB (24 GB per board) used for the master copy of
Control Memory. Each DCA board also has a 500 MB region reserved for a Cache
Directory.
This is a mapping table to manage pointers from LDEVs and allocated cache slots to those
LDEVs in that cache board. Each DCA board also has one or two on‐board SSD drives (63
GB each) for use in backing up the entire memory space in the event of an array
shutdown.
If the full 64 GB of RAM is installed on a DCA, it must have two 63 GB SSDs installed.
On‐board batteries power each DCA board long enough to complete several such
shutdown operations back‐to‐back in the event of repeated power failures before the
batteries have had a chance to charge back up.
The control memory requirement depends on program products licensed on the storage
system. For example, Hitachi Dynamic Provisioning requires 2 GB of additional control
memory and Hitachi Dynamic Tiering requires 8 GB of additional control memory.
Each DCA cache board has 8 DDR3‐800 DIMM slots, organized as 4 banks of RAM (thus
32 independent banks of RAM in all 8 features in a dual chassis array). Each DCA board
can support 16 GB to 64 GB of RAM using the 8 GB DDR3 DIMMs.
The same amount of RAM must be installed on each DCA board of a feature pair, but
may differ among the installed features. The pair of boards for a DCA feature is installed
in different power domains in the Logic Box (Cluster‐1 and Cluster‐2).
Each DCA board has four 2 GB/sec full duplex GSW ports, each operating at a 1024
MB/sec send and 1024 MB/sec receive rate (concurrently). This provides for 8 GB/sec of
aggregate read‐write bandwidth (wire speed) per board to the FEDs and BEDs.
Each bank of DDR3‐800 RAM has a peak bandwidth of 6.25 GB/sec, or 25 GB/sec
possible per board, although each board is factory rated at 10 GB/sec. Due to the very
high speed of the RAM compared to the GSW ports, all four GSW ports can operate at
full speed.
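The bandwidth figures quoted above are simple products; the short sketch below just restates that arithmetic and adds nothing beyond the numbers in the text.

```python
# Sanity check of the DCA board bandwidth figures quoted above.
gsw_ports = 4
per_port_each_way_gb = 1.0          # 1024 MB/sec send and 1024 MB/sec receive
print(gsw_ports * 2 * per_port_each_way_gb, "GB/sec aggregate to GSWs")   # 8.0

ram_banks = 4
per_bank_peak_gb = 6.25
print(ram_banks * per_bank_peak_gb, "GB/sec peak RAM bandwidth per board")  # 25.0
```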
Each DCA board has one or two 63 GB SSD drives installed. If there is a power outage,
these are the backup target for the entire cache space. In the case of a planned system
shutdown, these are the backup target for the Control Memory region on the first 2 DCA
boards installed in a system. During a power outage, the on‐board battery power keeps
the RAM, the SSDs and the board logic functioning while the DCA microprocessor de-
stages cache to the embedded SSD. There is enough battery power to support 2 or 3 such
outages back‐to‐back without recharging. One SSD is standard and the second one is
required when the RAM size per board is 64 GB.
The memory systems used within the VSP design are quite different than what was
used in the USP V design. This is mostly due to the elimination of the USP V discrete
Shared Memory system. In addition, the VSP does not use dedicated I/O processors
(MPs) on FEDs and BEDs. The VSP uses a segmented cache system to provide a
common region for the Control Memory master copy and the buffer space for each
VSD board to manage its discrete set of LDEVs.
Back-End Directors
Back-end director (BED) boards execute all I/O jobs received from
processor boards and control all reading or writing to disks
• 1 or 2 features (2 or 4 BED boards) per chassis
The Back‐end Director (BED) boards execute all I/O jobs received from the
processor boards and control all reading or writing to disks. There are 1 or 2 features
(2 or 4 BED boards) per chassis. Note that BEDs do not understand the notion of an
LDEV — just a disk ID, a block address and a cache address.
BED functions include the following:
Execute jobs received from a VSD board
Use DMA to move data in or out of data cache
Create RAID‐5 and RAID‐6 parity with an embedded XOR processor
Encrypt data on disk (if desired)
Manage all reads or writes to the attached disks
Each BED board has eight 6 Gb/sec SAS links. There are up to 640 LFF disks or 1024
SFF disks per chassis attached to the 16 or thirty-two 6 Gb/sec SAS links from these
2 or 4 BED boards.
A BED feature consists of two boards, each having four 2W SAS ports. Each port
accepts a cable which contains 2 independent 6 Gb/sec SAS links. Each cable
connects to one port on a DKU.
Thus there are 8 active SAS links per board.
The speed at which a link is driven (3 Gb/sec or 6 Gb/sec) depends on the interface
speed of the disk target per operation. Those disks with the 3 Gb/sec interface are
driven at that rate by the SAS link, and those disks that are 6 Gb/sec are driven at
that higher rate whenever a BED is communicating with them.
The speed in use of each SAS link is thus dynamic and depends on the individual
connection made through the switches moment by moment. Each BED board has 2
Data Accelerator processors, 2 SPC SAS I/O Controller processors, and 2 banks of
local RAM. There is a lot of processing power on each BED board.
The Data Accelerator processor communicates with the VSD boards, accepting or
responding to their I/O commands. It is also a DMA engine to read or write data to
cache. The DA processor also sends I/O commands to its powerful companion SPC
processor. Each SPC processor contains a high performance CPU that has the ability
to directly drive SAS or SATA disks (using SAS link protocol encapsulation), and
provides four 6 Gb/sec SAS links.
This Data Accelerator processor is actually 2 chips under one cover working
together: a processor and a programmable ASIC. This Data Accelerator processor is
different from the ones used in the FED boards. It contains the processor that — like
the DRR processors on USP V BED boards — manages RAID parity operations, data
encryption and disk rebuilds. Note the naming convention of the SPC 2W ports
(Ports 0 to 3) — these will be used in the SAS Engine discussion to follow.
The local RAM is used for buffering data, maintaining VDEV mapping tables (to the
managing VSDs), and other housekeeping tables. There are 4 GSW ports, over which
pass job requests to VSDs and user data moving to or from cache.
The Virtual Storage Director (VSD) is the Hitachi Virtual Storage Platform I/O
processing board.
• 2 or 4 installed per chassis
Each VSD board executes all I/O requests for LDEVs assigned to it.
The VSD board is the VSP I/O processing board. There are 2 or 4 of these installed
per chassis. Each board includes 1 Intel 2.33GHz Core Duo Xeon CPU with 4
processor cores and 12 MB of L2 cache. There are 4 GB of local DDR2 RAM on each
board (2 DIMMs). This local RAM space is partitioned into 5 regions, with 1 region
used for each core’s private execution space, plus a shared Control Memory region
used by all 4 cores.
Each VSD board executes all I/O requests for those LDEVs (up to 16,320 LDEVs per
VSD of the 65,280 LDEV array limit) that are assigned to that board. No other VSD
board can process I/Os for these LDEVs.
This strict LDEV ownership by VSD eliminates all competition for access to Control
Memory and data blocks in cache. Ownership of a VSD’s LDEVs is temporarily
passed to the other VSD in that feature pair if one should fail, with that same
ownership returning upon replacement of the board.
No user data is processed within the VSD itself, so no user data is transferred across
the 4 GSW ports.
Each FED maintains a local copy of the LDEV mapping tables in order to know
which VSD owns which LDEV. No other VSD boards will ever be involved in these
operations unless that VSD board fails.
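A minimal sketch of the strict LDEV-ownership idea described above, assuming a hypothetical mapping table; the names and structure are illustrative only, not the actual VSP firmware data structures.

```python
# Every LDEV is processed by exactly one VSD board; FEDs consult a local
# copy of this table to route each host request to the owning VSD.
ldev_owner  = {0x0000: "VSD-0", 0x0001: "VSD-1", 0x0002: "VSD-0"}  # hypothetical
vsd_partner = {"VSD-0": "VSD-1", "VSD-1": "VSD-0"}                 # feature pairs
failed_vsds = set()

def owning_vsd(ldev):
    owner = ldev_owner[ldev]
    # Ownership passes temporarily to the partner board if the owner fails.
    return vsd_partner[owner] if owner in failed_vsds else owner

print(owning_vsd(0x0001))      # VSD-1
failed_vsds.add("VSD-1")
print(owning_vsd(0x0001))      # VSD-0 until the failed board is replaced
```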
The firmware loaded onto the VSD board contains all five types of previously
separated code on the USP V. Each Xeon core will schedule a process that depends
upon the nature of the job it is executing. A process will be of one of the following
types or their mainframe equivalents: Target, External (virtualization), BED, HUR
Initiator, HUR Target. In addition there will be system housekeeping type processes.
Target process – manages host requests to and from a FED board for a particular
LDEV.
External (virtualization) process – manages requests to or from a FED port used in
virtualization mode (external storage). Here the FED port is operated as if it were a
host to operate the external array.
BED process – manages the staging or destaging of data between cache blocks and
internal disks via a BED board.
HUR Initiator (MCU) process – manages the “respond” side of a Hitachi Universal
Replicator connection on a FED port.
RCU Target (RCU) process – manages the “pull” side of a remote copy connection
on a FED port.
Grid Switches
There can be 2 or 4 GSWs installed per chassis. Each board has 24 high speed ports,
where each port supports a full‐duplex rate (send plus receive) of 2048 MB/sec.
For every I/O request, FED or BED user data blocks directly go to the Cache
Memory Adapter boards, while all FED or BED metadata and job control traffic goes
to the Virtual Storage Director boards.
Note that VSD boards can only read and write to the reserved Control Memory
region of cache memory, as there is a hardware addressing interlock preventing
each VSD from accessing the host data portion of cache.
Each grid switch (GSW) board has 24 full duplex 2 GB/sec ports
Each PCI Express 4‐lane port can concurrently send at 1024 MB/sec
and receive at 1024 MB/sec
GSW supports an aggregate peak load of 24 GB/sec send and
24 GB/sec receive (a 48 GB/sec full duplex aggregate)
Each GSW board has 24 full duplex 2 GB/sec ports, with each PCI Express 4‐lane
port able to concurrently send at 1024 MB/sec and receive at 1024 MB/sec. As such,
the GSW supports an aggregate peak load of 24 GB/sec send and 24 GB/sec receive
(a 48 GB/sec full duplex aggregate).
The 24 ports are used as follows:
8 ports are used to connect to FED and BED ports. These ports see both data and
metadata traffic intermixed.
4 ports are used to connect to VSD boards. These ports only see job request traffic
or system metadata (such as the address in cache for a target slot).
8 ports are attached to DCA boards, which see Control Memory updates as well
as user data reads and writes.
4 ports are used to cross connect that GSW to the matching GSW board in the
second chassis (if used).
There are no connections among the 2 or 4 GSWs within a chassis. Every FED and
BED board in a chassis attaches to 2 GSWs, while each VSD and DCA board attaches
to all 4 GSWs.
The VSP is not offered as specific models but as a basic model that is a starting point
for configurations that meet a variety of customer requirements.
The figure shows a single chassis array with all logic boards installed, with all cache
DIMMs installed in each DCA board.
The various peak wire speed bandwidths are shown for the different points in the array.
Wire speed rates are what can be achieved for a short burst in a laboratory environment
but aren’t to be expected in typical usage. But these are the rates that all vendors
advertise since user achievable rates depend greatly on the test environment and
workloads. So the only reference value is the electrical limits of the various components.
In the figure shown here, the numbers inside the colored arrows (such as 32 on the red
arrow to cache) indicate the number of Grid Switch paths. The numbers next to them
indicate the peak wire speed rates for sending or receiving. Each such GSW path has a 1
GB/sec send rate and a concurrent 1 GB/sec receive rate. So, 16 such ports gives an
aggregate of 16 GB/sec send and 16 GB/sec receive.
These peak rates (as send + receive GB/sec) for a fully loaded single chassis system
under sustained heavy loads using all ports are:
GSWs to FEDs: 16 + 16 GB/sec
GSWs to DCAs: 32 + 32 GB/sec
GSWs to VSDs: 16 + 16 GB/sec
GSWs to BEDs: 16 + 16 GB/sec
GSWs (Chassis‐1) to GSWs (Chassis‐2): 16 + 16 GB/sec
DKU/HDU Overview
[Figure: DKU front view – two banks of 40 HDDs and a fan assembly; LFF 3.5 in. HDD box, maximum 80 HDDs]
Each of the 1 to 8 DKU boxes per chassis contains eight HDU containers. The DKU is
organized into a front and rear, each with 4 HDU sections for 10 or 16 disks. The LFF
DKU holds 80 disks while the SFF DKU holds 128.
Shown here is an artist’s view of a DKU box. There are 2 fan doors on the front and 2
more on the rear. One fan door is removed in this picture. When the pair of fan
doors on the front or rear (an interlock prevents opening both at the same time) are
opened, those fans stop. The fans on the closed doors are then run at double speed
to maintain the same airflow. When these doors are closed, all fans run at half speed
(quieter).
An empty DKU weighs about 100 pounds.
If the DKU box is the SFF type, it can hold 128 disks, with each HDU holding up to
16 disks. If it is the LFF type, it can hold 80 disks, with each HDU holding up to 10
disks.
Disks are added to 4 of the HDU containers within a DKU as sets of 4 (the Array
Group). Array Groups are installed (following a certain upgrade order) into specific
HDU disk slots in a DKU named region known as a “B4‐x.” Each B4‐x is the set of 4
HDUs either from the top section or the bottom section of each DKU box. Note that
all of the HDUs within a chassis are controlled by those two or four BED boards
installed in that chassis.
The generalized BED and DKU layout is shown. A standard performance single chassis
configuration uses only one BED feature (BED‐0), which provides 16 back‐end 6
Gb/sec SAS links to the set of 1‐to‐8 DKUs. The high performance single chassis
uses two BED features for a total of thirty-two 6 Gb/sec SAS links. This doubles the
number of active SAS links to the same sets of HDUs. In each case, with all 8
optional DKUs installed, there may be up to 1024 SFF disks or 640 LFF disks in a
single chassis.
This illustration shows how the SAS links pass through the 2 quads of HDUs per
DKU. Notice how there is a vertical alignment by HDU.
Each colored line indicates 2 SAS 2W cables (1 from each BED feature) where each
cable has 2 active SAS links. So, on a chassis with 2 BED features, there are 32 SAS
links passing through that chassis’ stack of DKUs (1 to 8 of them), with 8 SAS links
passing through each stack of HDUs.
This illustration shows the organization of zones in a DKU into which Parity Groups
are defined. Either the top 4 or bottom 4 HDUs per DKU are a zone. These zones are
usually referred to as a “B4‐x”, from which you get the names of the Parity Groups
when they are created. For 4‐disk Parity Groups (RAID‐5 3D+1P, RAID‐10 2D+2D),
all members come from a single B4 zone. For the 8‐disk Parity Groups
(RAID‐5 7D+1P, RAID‐10 4D+4D, RAID‐6 6D+2P), the 8 disks come from the two B4
zones within the same DKU. The name associated with the 8‐disk Parity Group is
taken from the B4 name of the 4-disk Parity Group in the lower zone.
Notice how these disk zones cross the SAS links and power domains (clusters).
Eight 6 Gb/sec SAS links per HDU — switches can connect 8 disks within
the HDU or pass directly to the next HDU on a per-link basis
VSP uses 8 6 Gb/sec SAS links per HDU, with the switches able to connect 8 disks
within that HDU at the same time, or to pass directly to the next HDU on a per link
basis.
The figure is a view of a single chassis array with a Standard Performance
configuration (1 BED feature) and 8 DKUs (only 4 DKUs are shown here to keep it
simple) with 64 HDUs (32 shown here) installed in a chassis. This provides sixteen
6 Gb/sec SAS links (eight 2W cables) that attach to up to 640 LFF disks or 1024 SFF
disks (or some mixture of those).
A dual chassis array would have 2 of these structures side by side with up to 1280
LFF disks or 2048 SFF disks (or some mixture) in all. Each row of green and blue
boxes is one DKU with all 8 of its HDUs. This BED‐to‐DKU association is fixed
within each chassis.
This figure is the same view as the previous slide, this time showing a single chassis
array with a High Performance configuration (2 BED features) and the same 4
DKUs. This provides thirty-two 6 Gb/sec SAS links (sixteen 2W cables) to up to 640 LFF
disks or 1024 SFF disks (or some mixture) when using all 8 DKUs. A dual-chassis array
would have 2 of these structures with up to 1280 LFF disks or 2048 SFF disks (or
some mixture).
Note that in the dual chassis configuration, while any FED board can interact with
any VSD or DCA cache board and any VSD can interact with any BED, there is an
exclusive association among the BED boards and the DKUs by chassis. There is no
cross‐chassis distribution at this level since that wouldn’t be possible. The SPC
processors manage certain HDUs from the DKUs within a chassis.
Review:
• A single chassis Virtual Storage Platform
supports a mixture of up to 1024 2.5 in. disks
(SSD, SAS) or 640 3.5 in. disks (SSD, SATA).
• SFF DKUs can hold 128 disks.
• LFF DKUs can hold 80 disks.
• There can be up to 8 DKUs per chassis.
• Total disk counts are determined by how
many DKUs are installed and of which types.
• A single chassis VSP has a limit of 128 SSD
drives.
A single chassis VSP supports a mixture (in Array Group installable features of four
disks) of up to 1024 2.5 in. disks (SSD, SAS) or 640 3.5 in. disks (SSD, SATA) or some
value in between.
Each SFF DKU can hold 128 disks and each LFF DKU can hold 80 disks. There may
be up to 8 DKUs per chassis, so the total disk counts will be determined by how
many DKUs are installed and of which types.
The single chassis VSP has an overall limit of 128 SSD drives, with up to 256 SSDs
possible in the dual chassis configuration.
In the next few slides, we will review various disk types and compare their rated
speeds.
SSD Drives
SSD drives are the newest drive technology in storage arrays. “Drive” is the proper
term since these are not “disks” in any sense. SSD drives are very costly in the “$ per
GB” metric, but fairly inexpensive in the “$ per IOPS” metric.
SSD drives are currently available in 2 sizes: a 200 GB SFF and a 400 GB LFF, both
currently using the 3 Gb/sec SAS interface.
Note that each requires a different type of HDU. With that in mind, if all drives on a
chassis are of the SFF type, it may be a better choice (economically and for maximum
overall drive counts) to stick with the smaller SSD since it will intermix in the DKUs
needed by the SAS drives. If there already are 1 or more LFF DKUs in a chassis, then
this would not be an issue.
Also keep in mind that the IOPS rate per SSD for these 2 models is not the same.
Having eight 400 GB SSDs in the array will yield about 20% less write performance
than when using 8 of the 200GB SSDs.
Also consider that using sixteen 200 GB SSDs will provide a large IOPS boost over
the 8 400 GB SSDs (equal capacity solution). So the cost justification needs to factor
this into the equation.
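The trade-off in the last two paragraphs is simple arithmetic; the drive counts below come from the text, and the relative-performance remarks are only the rough proportions stated above.

```python
# Equal-capacity SSD options compared (illustrative only).
options = {
    "16 x 200 GB SSD": {"drives": 16, "gb": 200},
    "8 x 400 GB SSD":  {"drives": 8,  "gb": 400},
}
for name, o in options.items():
    print(name, "raw capacity:", o["drives"] * o["gb"], "GB")   # both 3200 GB
# Same raw capacity, but twice as many 200 GB drives means roughly twice the
# aggregate drive-level IOPS (and the text notes ~20% lower write performance
# per 400 GB drive), which is why the smaller SSD can be the better choice.
```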
SAS Disks
SAS disks are the primary disks used in both enterprise and midrange arrays from
Hitachi. These are workhorse disks, with both high performance and good capacities.
They fall in between the random performance levels of SSD and SATA (closer to
SATA). All three disk types will have about the same sequential read performance
per matching RAID levels.
SAS disks are the same as Fibre Channel disks but with a SAS interface.
SAS disks used in the VSP come in 15K RPM and 10K RPM rotation speeds, and in
sizes ranging from 146 GB to 600 GB. The SAS interface speed on all Hitachi
supported disks is 6 Gb/sec.
SAS disks are designed for high performance, having dual processors, dual host
ports and large caches. The dual host ports allow two concurrent interactions with
the attached BEDs. One port may be receiving or responding to a new I/O
command while the other is transferring data (only 1 such transfer per disk at a
time).
The first SAS disks were of the usual Large Form Factor size (3.5 in.). Now they are
shifting to the smaller Small Form Factor (2.5 in.). This reduces cost and heat, as well
as reducing seek times (smaller platters).
In general, a 2.5 in. SFF 10K RPM SAS disk will have somewhat higher performance
than a 3.5 in. LFF 10K RPM disk, but lower performance than a 3.5 in. LFF 15K RPM
SAS disk.
SATA Disks
SATA disks offer very high capacities (now at 2 TB per disk but with 4 TB coming
soon) at an “economy” level of performance. They are best suited for archival duty
and are not suitable for high levels of random workloads (like OLTP) with even
modest (20% or so) sustained levels of write.
Due to the large capacities, most SATA disks are used in RAID‐6 configurations in
order to avoid the likelihood of a dual disk failure during the potentially long
rebuild times of a failed disk onto a spare disk.
However, the use of RAID‐6 carries a very high RAID write penalty factor of 6
internal array disk operations per host write request. With SATA, there are three
more such operations (read‐verify) per write, or 9 in all per host write request.
Some users will trade off usable capacity for a large increase in write performance
by the use of RAID‐10 rather than RAID‐6. RAID‐10 only carries a write penalty of 6
disk operations per small‐block host write request (two 512/520 pre‐reads, 2 writes
and 2 read‐verify operations). However, the usable capacity is reduced from 75% (RAID‐6
6D+2P) to 50% (RAID‐10 4D+4D).
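The comparison above can be summarized with the operation counts taken from the text (6 + 3 read-verify operations for RAID-6 on SATA, 6 for RAID-10 on SATA); this sketch is illustrative only.

```python
# SATA write penalty and usable capacity comparison (values from the text).
options = {
    "RAID-6 6D+2P on SATA":  {"disk_ops_per_write": 6 + 3,  # +3 read-verify ops
                              "data_disks": 6, "total_disks": 8},
    "RAID-10 4D+4D on SATA": {"disk_ops_per_write": 6,      # pre-reads, writes, read-verify
                              "data_disks": 4, "total_disks": 8},
}
for name, o in options.items():
    usable_pct = 100.0 * o["data_disks"] / o["total_disks"]
    print(f"{name}: {o['disk_ops_per_write']} disk ops per host write, "
          f"{usable_pct:.0f}% usable capacity")
```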
Hitachi’s recommendation is to use SATA volumes for near‐line storage with a low
frequency of access but online availability. The decision to use up valuable internal
chassis disk slots with SATA disks rather than the much higher performing SSD or
SAS disks should be carefully considered. In many cases, the use of SATA disks
within virtualized storage on an external midrange product might make better sense.
However, there will be individual cases where the use of some internal slots for
SATA disks will solve a specific customer business problem. It is not expected that a
VSP would normally be configured with a high percentage of internal SATA disks.
HDD Type   RPM    Form Factor   Port Speed   Advertised Size (GB)   Nominal Usable Size (GB)   Random Read IOPS
2TB SATA   7200   LFF           3 Gb/sec     2000                   1832                       80
This table illustrates the advertised size (base‐10) and the typical usable size (base‐2,
after Parity Group formatting) of each type of disk. Also shown is the “rule of thumb”
average expected random IOPS rate for each type of disk.
The type and quantity of disks used in a VSP and the RAID levels chosen for those
disks will vary according to analysis of the user workload mix, cost, application
performance targets and usable protected capacity requirements.
All LDEVs (actually pointers to VDEVs containers) that are created from Parity
Groups that use SATA disks can be configured either as OPEN‐V or 3390‐M
emulation; therefore these internal SATA VDEVs are usable for mainframe volumes.
Overview
Hitachi Universal Storage Platform® V is a high performance, large capacity, high-end disk
storage system that follows the architecture of the Universal Storage Platform, with an
improved Hi-Star Net Architecture and a faster microprocessor.
A USP V subsystem consists of one disk control frame (DKC), which can hold 128 HDDs,
and up to 4 disk array frames (DKUs), each of which can hold 256 HDDs. The
configuration is flexible, ranging from 5 to 1,152 HDDs (minimum to maximum). The
USP V line comprises 2 models, a 3-phase AC power model and a single-phase model,
and each model can connect to both mainframe systems and open systems.
Number of disk drives:
Up to 256 HDDs/16 disk paths (when 2 DKA pairs are installed), or
Up to 640 HDDs/32 disk paths (when 4 DKA pairs are installed), or
Up to 896 HDDs/48 disk paths (when 6 DKA pairs are installed), or
Up to 1,152 HDDs/64 disk paths (when 8 DKA pairs are installed)
The USP V's new half-sized PCBs allow for a less costly, more granular expansion of
a system. For instance, there were typically 4 to 6 front-end director (FED) packages
installed in a Universal Storage Platform Model 600, and they could be a mixture of
Open Fibre, ESCON, FICON and iSCSI. However, this gave you a large number
of ports of a single type that you may not need, with a substantial reduction in the other
port types that you may need to maximize.
With the new half-sized cards, you can have 8 CHA packages (or up to 16 at the
expense of disk BED packages), using any mixture of the interface types as before.
However, there are half as many ports per board so that fewer of the less-used port
types can be installed. Packages are still installed as pairs of PCB cards just as with
Universal Storage Platform.
The USP V back-end disk adapter (BED) has four 4 Gb/sec loops supporting up to
128 physical disks. The Universal Storage Platform BED pair has sixteen 2 Gb/sec Fibre
loops supporting up to 256 physical disks. There are up to 8 BED packages (16 PCBs)
in USP V, while Universal Storage Platform has 4 BED packages (8 PCBs), both with
the same number of loops.
BED = Back-end Director
DKA = Disk Adapter
DKC = Disk Controller Unit
Metadata lives in Shared Memory (BC difference tables, CoW tables, etc.); all cache is available for data.
• CHPs can assist other CHPs on the same PCB.
• A CHP controls 2 FC paths (for example, 1A and 5A).
• A queue tag pool of 4096 is managed by the CHP (across its 2 FC ports).
• CHPs can only assist other CHPs on the same physical card.
• CHPs can only assist other CHPs of the same port mode (any target/TC initiator/external initiator).
Universal Storage Platform – V Series
[Figure: USP V front-end/back-end architecture – host ports 1A through 8B, DX4 port chips, eight 400 MHz CHP processors, DTA and MPA adapters, eight DKP back-end processors; per adapter, 2 x 1064 MB/sec paths carry data only and 8 x 150 MB/sec paths carry metadata only]
• MP contention is reduced automatically.
• Sharing reduces as all MPs get busy – the higher the CHP utilization percentage, the less likely it is to assist another CHP.
• Parity generation in hardware.
• Data still needs to be pumped through the paths by DKPs to allow parity to be calculated.
• BED utilization will be higher with RAID-5 and higher again with RAID-6.
SATA
• SATA R-a-W occurs in the disk canister and does not load the BED or switch.
[Figure: cache segment structure – a 64 KB segment divided into four 16 KB sub-segments, each made up of 2 KB blocks]
Architecture — Storage
Storage Overview
6. Assign addresses in CU:LDEV format (for example 00:00, 00:01, 00:02)
RAID Overview
The factors in determining which RAID level to use are cost, reliability and
performance. The table above shows the major benefits and disadvantages of each
RAID type. Each type provides its own unique set of benefits, so a clear
understanding of your customer's requirements is crucial in this decision.
Another characteristic of RAID is the idea of “write penalty.” Each type of RAID has
a different back‐end physical disk I/O cost, determined by the mechanism of that
RAID level. The table above illustrates the trade‐offs between the various RAID
levels for write operations. There are additional physical disk reads and writes for
every application write due to the use of mirrors or XOR parity.
Note that SATA disks are usually deployed with RAID‐6 to protect against a second
disk failure within the Parity Group during the lengthy disk rebuild of a failed disk.
Also note that you must deploy many more SATA disks than SAS disks to be able to
meet the same level of IOPS performance. To protect data, in the case of SATA disks,
there are additional physical I/O operations per write in order to compare what was
just written to disk with the data and parity blocks held in cache. With RAID‐6, there
are 3 such blocks: data, parity1 and parity2.
Module Summary
Module Review
Unified
Storage
Block
Modules
Enterprise design
Host multipathing
Hardware load balancing
Dynamic optimized performance
99.999+% reliability
Throughput
HUS 150
• 16-32 GB cache
• 16 FC, or 8 FC and 4 iSCSI
• Mix up to 960 flash drives, SAS and capacity SAS
• 32 SAS links (6 Gb/sec)
• Max. 40 standard 2.5 in. trays
• Max. 80 standard 3.5 in. trays
• Max. 20 dense 3.5 in. trays
HUS 130
• 16 GB cache
• 16 FC, or 8 FC and 4 iSCSI
• Mix up to 264 flash drives, SAS and capacity SAS
• 16 SAS links (6 Gb/sec)
• Max. 10 standard 2.5 in. trays
• Max. 19 standard 3.5 in. trays
• Max. 5 dense 3.5 in. trays
HUS 110
• 8 GB cache
• 8 FC and 4 iSCSI ports
• Mix up to 120 flash drives, SAS and capacity SAS
• 8 SAS links (6 Gb/sec)
• Max. 4 standard 2.5 in. trays
• Max. 9 standard 3.5 in. trays
Scalability
[Figure: HUS family controller block diagram – each controller has an Intel Xeon 2-core 1.73 GHz CPU with 3.5 GB RAM, a PCH and management processor, a LUN management region, NVRAM, a RAID processor (DCTL) with 16 GB DDR3 cache (10.6 GB/sec), 6.4 GB/sec links to the I/O (port) modules, a 6.4 GB/sec crossover between controllers across the passive backplane, 3.2 GB/sec management paths, redundant power supplies, and SAS controllers with 8 x 6 Gb/sec links each (load balancing region); SAS wide cables (4 links @ 6 Gb/sec each) connect ports 0-3 to the disk enclosures (Tray 0-3 of 24 x 2.5 in. or 12 x 3.5 in. HDDs)]
The Dynamic Virtual Controller of the Hitachi Unified Storage 100 family controllers
allows for simultaneous host access of any LUN on any host port on either
controller with very little added overhead. A host accessing a LUN via a port on
Controller‐0 can actually have most of the I/O request processed completely by
Controller‐1 with little intervention by the Intel CPU in Controller‐0. When a LUN is
accessed on ports of the non‐managing controller, the data is moved across the
inter‐DCTL bus into the alternate controller’s mirror region of cache by the
managing controller when the back‐end I/O is completed.
LUN Management
• Hitachi Unified Storage uses a dynamic, global table of all configured
LUNS that determines which controller will execute the back-end part of
an I/O request for a LUN
• Execution is independent of which front-end port (either controller) is
involved in accepting or responding to the host I/O request
• All LUNs are initially automatically assigned by Hitachi Storage Navigator
Modular 2 on a round-robin basis to the controller I/O management lists
as LUNs are created
• The LUN management table is changed over time by the operation of the
Hardware I/O Load Balancing (HLB) feature (described below)
LUN Management
[Figure: cache mirroring – Local Data #0 with Mirror of #1 on one controller, Local Data #1 with Mirror of #0 on the other]
The LUN Management system allows for a host I/O present on any front-end port
to be processed by either controller. All LUNs are associated with 1 of the 2
controllers by the LUN Management Table. If a host request for a LUN arrives on a
port on the current managing controller, then that is called “Direct Mode.” If that
request is for a LUN managed by the other controller, that is known as “Cross
Mode.” In Cross Mode, the local Xeon processor directly sends the request to the
Xeon CPU on the other controller for execution over the inter-DCTL
communications bus.
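A minimal sketch of the Direct Mode/Cross Mode decision described above, assuming a hypothetical LUN Management Table; the names are illustrative only.

```python
# Direct Mode vs Cross Mode routing, driven by a hypothetical LUN management table.
lun_managing_controller = {0: "CTL0", 1: "CTL1", 2: "CTL0"}

def route_io(lun, receiving_controller):
    owner = lun_managing_controller[lun]
    if owner == receiving_controller:
        return f"Direct Mode: {owner} executes the back-end I/O for LUN {lun}"
    # Cross Mode: forward the request over the inter-DCTL bus to the owner.
    return f"Cross Mode: {receiving_controller} forwards LUN {lun} to {owner}"

print(route_io(1, "CTL1"))   # Direct Mode
print(route_io(1, "CTL0"))   # Cross Mode
```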
While the LUN Management front‐end design of the Hitachi Unified Storage 100
family eliminates most host path assignment management between the ports on the
2 controllers, there is still a need for host path failover in the event of a path failure
(bad cable, complete loss of a controller, switch failure, etc.).
Default Logic in Hitachi Unified Storage 100 family systems for writes
received from Hosts is “write back”
• Host issues a write.
• Data is written into cache and acknowledged to host.
• This data (dirty) is then asynchronously destaged to disks.
[Figure: write-back caching – host writes land in the local controller's User Data area (Partition #0 or #1) and are mirrored into the partner controller's mirror area; each controller's cache also holds a System partition and the MML]
Cache Memory
[Figure: cache memory regions – Mirror, User Data Area, Others (SI/TC/COW/TCE, MVM); replication metadata locations – RG, DP Pool, DMLU]
The default cache configuration for each controller is automatically split into 3
regions, where these are the System Region, the Memory Management Layer and the
User Data Region. The User Data Region is further split equally into the User Data
Area and the Mirror Area. The User Data Area is the buffer space for local host I/O
operations, while the mirror region is the mirror of the User Data Area on the other
controller in the subsystem pair.
Memory Management Layer (MML) is used to manage the metadata for several
Replication Products. It is a fixed size (per Hitachi Unified Storage model) and is
always present on each subsystem whether or not any Replication licenses are
installed. It provides a virtual memory space for cache for the following:
Hitachi ShadowImage® In‐System Replication software bundle (SI)
Hitachi Copy‐on‐Write Snapshot (CoW)
Hitachi TrueCopy® Remote Replication (TC)
Hitachi TrueCopy Extended Distance (TCe)
Modular Volume Migration (MVM)
Quick Format (QF)
RG – RAID Group, a hidden space set aside on each RAID Group when they are
created, with the maximum size set at 5 GB. The size is a function of the number of
disks and their sizes. It is used by the Quick Format function.
DMLU – Differential Management Logical Unit, located in a user selectable
location, with a size that ranges from 10 GB to 128 GB. It is used by ShadowImage,
TrueCopy and volume migrator for their metadata. The target location may be a
regular LUN, a DPVOL or a hidden LUN. The DMLU function has actually existed
on Hitachi modular systems since the Thunder 9500 model.
DP Pool – space on a DP Pool for the metadata for Copy-on-Write and TrueCopy
Extended Distance. The maximum space taken from the pool is 50 TB. This space is
allocated from the pool as needed, just as with a standard DP-VOL.
[Figure: cache layout with software keys enabled – each controller's cache holds a larger System and Software region, the MML, its User Data partition (#0 or #1) and the mirror of the other controller's user data; writes are mirrored to the partner cache]
Enabling the HDP and HDT keys is the only time the System Region will grow. The
HDT key is supposed to add extra space over what the HDP key creates, but our
tests don’t show this to be true.
When the license for HDP (high capacity mode) is enabled, the System Region
grows (reboot not required) to create the large workspace for this package. The
default overall cache configuration changes to the following sizes:
• 32 GB (HUS 150): 9320 MB User Data and 9320 MB Mirror, 2048 MB MML and 12,080 MB System
• 16 GB (HUS 130): 3920 MB User Data and 3920 MB Mirror, 1024 MB MML and 7520 MB System
• 8 GB (HUS 110): 950 MB User Data and 950 MB Mirror, 816 MB MML and 5476 MB System
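The region sizes listed above add up to each model's total installed cache; the quick check below simply restates those numbers in MB.

```python
# Cache region sizes (MB) with the HDP high-capacity-mode workspace enabled,
# as listed above; each model's regions sum to its total installed cache.
splits = {
    "HUS 150 (32 GB)": {"user": 9320, "mirror": 9320, "mml": 2048, "system": 12080},
    "HUS 130 (16 GB)": {"user": 3920, "mirror": 3920, "mml": 1024, "system": 7520},
    "HUS 110 (8 GB)":  {"user": 950,  "mirror": 950,  "mml": 816,  "system": 5476},
}
for model, regions in splits.items():
    print(model, "->", sum(regions.values()), "MB total")   # 32768 / 16384 / 8192
```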
[Figure: surviving controller's cache after a controller failure – System region, MML, and two base partitions (Partition #0 and Partition #1) with no mirror area]
When a controller fails, the surviving controller modifies its cache usage. The Mirror
Region is replaced by a second Base Partition to replace the one lost on the failed
controller. Writes are no longer mirrored. All of the LUNs that were managed by the
other controller are serviced by the remaining controller, and it uses the second
partition as the work space for those LUNs. If software was installed, the System
Region shown here would be much larger, and the 2 Base Partitions would be much
smaller.
Partitioning Cache
[Figure: Cache Partition Manager layout – CTL-0 cache holds the System region, MML, a Master partition (Partition 0), Sub-partitions (Partitions 2 and 4) and the CTL-1 mirror; CTL-1 cache holds the System region, MML, a Master partition (Partition 1), Sub-partitions (Partitions 3 and 5) and the CTL-0 mirror; writes are mirrored between controllers]
Cache Partition Manager (CPM) allows for the creation of a Master Partition and one
or more small Sub-partitions in the User Data Area. The small Sub-partitions may be
used to isolate selected LUNs in cache so that I/O operations on those LUNs do not
affect the general population of LUNs. Additionally, the cache segment size may be
changed for each Sub-partition if desired. The default cache segment size is 16 KB.
The alternate choices for a Sub-partition are: 4 KB, 8 KB, 64 KB, 256 KB and 512 KB.
SXP controls which of the four 6 Gb/sec SAS links from the IOC will be cross-
connected to an individual disk for an I/O operation.
This matrix connection feature implements automatic assignment of disks to
links where each RAID Group is defined.
SAS Multiplexor (SMX) interface chips are connected to either an IOC processor or the
SXP chip (not on HUS 150). The SMX chip is where each wide SAS cable (four 6 Gb/sec
SAS links) is attached to the rear panel of a controller.
[Figure: LFF 12-disk tray – Controller-0 and Controller-1 each attach a wide cable (4 SAS links) to one of the tray's two expanders; each expander connects its 4 links to the 12 x 3.5 in. disks and passes the links on to the next tray]
This is the LFF form factor 12-disk tray, which may be populated with any of the 3.5
in. SAS 7200 RPM drive choices. The 2 SAS Expanders in each tray attach each disk
port to any of the four 6Gbps SAS links in that Expander. Additional trays are daisy-
chained from the outbound SAS links port. HUS 110 and HUS 130 models have
either a similar 12-disk or the 24-disk tray as an internal disk bay.
[Figure: SFF 24-disk tray – Controller-0 and Controller-1 each attach a wide cable (4 SAS links) to one of the tray's two expanders; each expander connects its 4 links to the 24 x 2.5 in. disks and passes the links on to the next tray]
This is the SFF form factor 24-disk tray, which may be freely populated with 2.5 in.
SAS and SSD drives. The 2 SAS Expanders in each tray attach each disk port to any
of the four 6 Gb/sec SAS links in that Expander. Additional trays are daisy-chained
from the outbound SAS links port. HUS 110 and HUS 130 models have either a 24-
disk or a 12-disk equivalent tray for the internal disks.
[Figure: dense 48-disk LFF drawer – two independent 24-slot sections (24 x 3.5 in. disks each); each section has a Controller-0 and a Controller-1 expander with 4 SAS links each, attached through ports 0-3]
This is a representation of the high density 48-disk LFF drawer. It holds up to 48 SAS
7200 RPM 3.5 in. disks. The 2 pairs of SAS Expanders in each tray attach each disk
port to any of the 4 SAS links in that Expander. Additional drawers are daisy-
chained from the outbound SAS links port. This top-load drawer functions as 2
independent 24-slot trays.
[Figure: HUS 110 – one enclosure "stack" attached from port 0 of each controller via SAS wide cables (4 links @ 6 Gb/sec each); internal disk options of 12 LFF or 24 SFF slots, with expansion trays (Tray 0 of 12 x 3.5 in. HDDs, Tray 2, and so on) daisy-chained below]
This is a view of the attachment of 2 types of trays (in just 1 “stack”) to the 2
controllers. There are eight 6 Gb/sec SAS links in all, and 2 SAS engines.
The drives may be any mixture which is supported by that tray type.
The HUS 110 system chassis has a built‐in 3.5 in. 12‐disk or a 2.5 in. 24‐disk bay. It
can be expanded by use of nine 3.5 in. 12‐disk trays or four 2.5 in. 24‐disk trays (or
some combination).
Up to 120 disks
• Up to 15 SSDs
• SAS disks (1 internal 12/24 disk bay, up to four 24-slot or nine 12-slot disk trays)
Two controllers
Up to 12 host ports
Host ports
Eight 8 Gb/sec FC ports, plus a pair of optional daughter cards that provides 4 x [1
GigE iSCSI or 10 Gb/sec iSCSI] ports
OR
Four x [1 GigE iSCSI or 10 Gb/sec iSCSI] ports (all FC ports are license key
disabled)
[Figure: HUS 130 – two enclosure "stacks" attached from ports 0 and 1 of each controller via SAS wide cables (4 links @ 6 Gb/sec each); internal disk options of 12 LFF or 24 SFF slots (24 x 2.5 in. HDDs shown), with expansion trays (Tray 0/1, Tray 4/5, 48 x 3.5 in. HDDs shown) daisy-chained below]
This is a view of the attachment of three types of trays (in 2 “stacks”) to the 2
controllers. There are sixteen 6 Gb/sec SAS links in all, and 2 SAS engines.
The drives may be any mixture which is supported by that tray type. The 130 system
chassis has a built‐in 3.5 in. 12‐disk or a 2.5 in. 24‐disk bay. It can be expanded by
use of nineteen 3.5 in. 12‐disk trays, ten 2.5 in. 24‐disk trays, or five 3.5 in. 48‐disk
drawers (or some combination).
Up to 264 disks
• Up to 15 SSDs
• SAS disks (1 internal 12/24 disk bay, up to ten 24-slot, nineteen 12-slot, or five
48-slot disk trays)
Two controllers
Up to 16 host ports:
• Eight 8 Gb/sec FC (built-in)
• Optional: eight 8Gb/sec FC (4-port modules) or four 10Gb/sec iSCSI
(2-port modules)
I/O request limit of 512 tags per port, 8 MB maximum transfer size
[Figure: HUS 150 – four enclosure "stacks" attached from ports 0-3 of each controller via SAS wide cables (4 links @ 6 Gb/sec each)]
This is a view of the attachment of the 3 types of trays (in 4 "stacks") to the 2
controllers. There are thirty-two 6 Gb/sec SAS links in all, and 4 SAS engines.
3U Block Module – no internal HDDs
3U File Module – 2-node cluster standard
4U LFF Dense Disk Tray – 3.5 in. HDD x 48 (SAS 7.2K)
The disks may be any mixture of SAS or SSD drives that the HUS 100 family
supports. The 150 has no built‐in disk bay, but the system can support up to forty 3.5
in. 12‐disk trays, forty 24‐disk 2.5 in trays, or twenty 3.5 in. 48‐disk dense drawers
(or some combination). Due to SAS link timing windows, HUS 150 is limited to
having forty trays (as 10 per “stack”), so with the exclusive use of the 12‐disk trays,
the disk limit is 480.
Up to 960 disks
• Up to 30 SSDs
• SAS disks (up to forty 24-slot, eighty 12-slot, or twenty 48-slot external disk trays)
Two controllers
Up to 16 Host ports:
• Eight or sixteen 8 Gb/sec FC (4-port modules)
• Eight 8 Gb/sec FC and four 10 Gb/sec iSCSI (2-port modules)
• Eight 10 Gb/sec iSCSI (2-port modules)
I/O request limit of 512 tags per port, 8 MB maximum transfer size
Module Summary
Module Review
1. Which statements are true related to Hitachi Unified Storage? (Select two.)
A. The HUS family has 6Gb/sec SAS back end ports.
B. The HUS family supports FC disks.
C. The HUS 130 supports 8GB DIMMs.
D. The HUS controller cache is backed up to flash for unlimited retention.
E. The AMS2500 model can be upgraded to HUS 150.
[Figure: Hitachi Adaptable Modular Storage family positioned by price – Adaptable Modular Storage 2100, 2300 and 2500]
First released in August 2008, the Hitachi Adaptable Modular Storage (AMS) family
replaced the previous generation of midrange Hitachi modular storage systems. The
AMS systems have much higher performance and incorporate several significant
design changes.
[Figure: AMS 2100 controller, Rev 1 vs Rev 2 – Rev 2 can access the other controller's ports without involving the other processor; controller cache]
At the core of each REV2 2100 controller there is a 10th-generation Hitachi DCTL
processor (a high-performance RAID and DMA engine) and an Intel Core Duo
Celeron processor.
Each Adaptable Modular Storage 2100 controller (2 per system) includes:
A DCTL processor (the I/O “pump” with cache control and RAID-XOR
functions)
A 1.67 GHz Intel Core Duo Celeron Value series (low voltage) processor and 1
GB of local memory (not cache); this processor is the microcode engine or the
I/O management “brains”
Cache: 2 GB (2 x 1 GB DIMMs) or 4 GB (2 x 2 GB DIMMs) per controller
2 or 4 host ports, including:
Two 8 Gb/sec FC ports, plus 1 optional daughter card that provides:
Two 8 Gb/sec FC ports
OR
2 GigE iSCSI ports
1 SAS IOC controller providing eight active back-end SAS disk links
2 GB/sec PCI Express (PCIe) internal busses
[Figure: AMS 2300 controller, Rev 1 vs Rev 2 – more FC ports, faster cache access, 4 Gb/sec]
At the core of each REV2 2300 controller there is a tenth generation Hitachi DCTL
processor (a high-performance RAID and DMA engine) and an Intel Core Duo
Celeron processor.
Each Adaptable Modular Storage 2300 controller (2 per system) includes:
A DCTL processor (the I/O “pump” with cache control and RAID-XOR
functions)
A 1.67 GHz Intel Core Duo Celeron Value series (low voltage) processor and 1
GB of local memory (not cache); this processor is the microcode engine or the
I/O management “brains”
Cache: 4 GB (2 GB DIMMs) or 8 GB (4 GB DIMMs) per controller
4, 6 or 8 host ports, including:
Four 8 Gb/sec FC ports, plus one optional daughter card that provides:
Four 8 Gb/sec FC ports
OR
2 GigE iSCSI ports
1 SAS IOC controller providing eight active back-end SAS disk links
2 GB/sec PCI Express (PCIe) internal busses
[Figure: Rev 1 vs Rev 2 controller comparison – more FC ports, dual-core and faster processors, 4 GB/sec]
This section discusses some of the details and usage options of the new design.
Logical View
Tachyon Features
A Tachyon processor (single chip) is used to bridge the Fibre Channel host
connection to a usable form for internal use by the Xeon and DCTL processors.
The QE8 processors can provide very high levels of performance as they are
connected to the same high-performance, 2 GB/sec PCIe bus as the Xeon controller
and the DCTL RAID processor. The QE8 processor can drive all of its 8 Gb/sec ports
at full speed, whereas the older DX4 chip can only drive one of its two 4 Gb/sec
ports at full speed. However, for random small block loads, the controller cannot
drive all four of the QE8’s 8 Gb/sec ports at full speed.
Note that each QE8 operates as a companion processor to the Celeron/Xeon and DCTL processors rather than as a slave chip directly connected to the Xeon processor. A useful consequence of this design is that when a management processor (the Celeron/Xeon CPU) reboots for a microcode upgrade or certain configuration changes, the QE8 processor(s) on that same controller are not also reset. During such a reboot, the Tachyon processor(s) tell the host ports on the servers to suspend I/O for a short period, keeping the connections alive rather than appearing as ports that have stopped working. This prevents servers from taking a port offline and switching to a failover port.
Allows for:
• Simultaneous host access of any LUN on any host port on either
controller with very little added overhead
• Concurrent use of operating system native path management and host
managed load balancing (such as Hitachi Dynamic Link Manager)
LUNs are alternately assigned to different controllers and cores
No manual assignment necessary for most general workload
installations
System automatically balances loads internally
• Dynamic load balancing does not change host path access scheme
The Hitachi Adaptable Modular Storage 2000 family introduced the active-active
symmetric front-end design, a totally new concept to the midrange storage system
arena. The rigid concept of LUN ownership by controller has been replaced with a
more powerful method of LUN Management. Rather than a simple LUN ownership
by controller, now there is a dynamic, global table of all configured LUNs that
determines which controller will execute the back-end portion of an I/O request for
a LUN. This control list is independent of which front-end port (either controller) is
involved in the host I/O request. The active-active symmetric front end enables this
new capability and the corresponding freedom from micromanaging the appearance
of LUNs on certain paths for certain hosts.
All LUNs are initially automatically assigned by Hitachi Storage Navigator Modular
2 (HSNM 2) on a generally round robin basis to the controller I/O management lists
as LUNs are created. The table of I/O management is changed over time by the
operation of the hardware I/O load balancing feature (described below) that, if
enabled, will move certain LUNs from one controller’s management list to the other
controller. Management may also be changed manually via HSNM 2.
The active-active symmetric controller front end architecture allows for the access of
any LUN over any front-end host port on either controller. Note that managing the
inbound I/O request from a host on a port is an independent operation from the
actual execution of the I/O request.
Simplified Installation
Benefits: quick and easy setup at installation:
1. No need to set controller ownership for each LU
2. Set the host connection port without regard to controller ownership
[Slide graphic: with traditional dual-controller arrays, paths must be set to match the controller ownership of LUs; with the AMS 2000, HBAs can be connected to any controller and the user is not required to set paths.]
The active-active symmetric controller that comes with the AMS 2000 simplifies the
set up and maintenance of the system. The administrator is not required to assign
controller ownership to each LU nor is required to set controller ownership to each
host port.
[Slide graphic: servers running path manager software (load balancing) with multiple HBAs and primary/alternate paths to CTL0 and CTL1; with the standard setting, the user is not required to set paths.]
Server-based path management software will balance the I/O load between the
controllers.
[Slide graphic: back-end load balancing, unique for midrange; the owning controller of LUN 0 is highlighted.]
The back-end load balancing capability in the AMS 2000 family is unique for
modular products. If the CPU utilization rate for a controller exceeds 70% while the
utilization rate for the other controller is less than 10%, then the system will
automatically balance the load by re-routing traffic to the underutilized controller.
This will optimize the system performance without intervention required by the
administrator.
A completely new feature for the Adaptable Modular Storage 2000 family is that of
hardware back-end load balancing between the controllers. On an Adaptable
Modular Storage 2000 system, all back-end I/O for a LUN is performed by the
controller that currently manages that LUN (a very different mechanism than the
normal LUN ownership by controller). If there is an ongoing imbalance of loads
between the controllers, such as one averaging 72% busy and the other averaging 30
percent busy, the load balancing mechanism will decide to move management of
some of the hardest hit LUNs to the non-managing controller. This will shift the
back-end workload for those LUNs to the underutilized controller. Note, this is
independent of which host ports are accessing that LUN — a key observation to
make.
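The decision described above can be sketched in a few lines of code. The following is a simplified, hypothetical model of the trigger logic (the 70% and 10% busy thresholds come from the text; the function, the two-controller assumption, and the LUN statistics are invented purely for illustration and are not the actual firmware algorithm):

# Hypothetical sketch of the back-end load balancing decision described above.
# Assumes exactly two controllers; thresholds are from the text, the rest is illustrative.
def rebalance(ctl_busy, lun_loads, owner, busy_hi=70.0, busy_lo=10.0):
    """Return the LUNs whose management should move to the underutilized controller."""
    (hot, hot_pct), (idle, idle_pct) = sorted(ctl_busy.items(), key=lambda kv: -kv[1])
    if not (hot_pct > busy_hi and idle_pct < busy_lo):
        return []  # imbalance not severe enough; leave the management lists alone
    # Move some of the hardest-hit LUNs managed by the busy controller, heaviest first.
    candidates = sorted((lun for lun, ctl in owner.items() if ctl == hot),
                        key=lambda lun: -lun_loads[lun])
    return candidates[:max(1, len(candidates) // 4)]  # move a fraction, not everything

# Example: CTL0 is overloaded while CTL1 is nearly idle.
ctl_busy = {"CTL0": 85.0, "CTL1": 6.0}
lun_loads = {0: 4200, 1: 300, 2: 3900, 3: 150}
owner = {0: "CTL0", 1: "CTL1", 2: "CTL0", 3: "CTL0"}
print(rebalance(ctl_busy, lun_loads, owner))  # [0]: manage LUN 0 on the idle controller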
Full Cache Mirroring (CMD and Read Data Flow)
• Owning controller processes the I/O
• DCTL fills both caches using hardware mirror logic
Cache has a local data region as well as a separate mirror region that is updated by
the other controller. Not only are all blocks that are being written to disk mirrored in
the other controller’s cache, but all data blocks being read are copied there as well.
Therefore, each mirror region is an exact copy of the other controller’s local data
region. Each controller can supply cache hits for host reads from this mirror region
in local cache for LUNs managed by the other controller. Therefore, any host port
can have an I/O request for any LUN satisfied by the local controller through its
DCTL chip and local cache.
The figure above illustrates the two types of active-active symmetric operations that
occur on the Adaptable Modular Storage 2000 family. When an inbound I/O request
arrives at a port on the controller that currently manages that LUN (red traces), the
I/O operation is called “Direct Mode.” When an inbound I/O request arrives at a
port on the controller that does not currently manage that LUN (dotted blue traces),
this is referred to as “Cross Mode.” When both “Cross” and “Direct” I/O modes are
present (probably the normal operating state), this is called “Mixed Mode.” In
Adaptable Modular Storage 2300 lab tests, there was a fairly small overhead
measured (1% to 4% typically) for a Cross mode test when running a 50:50 mix of
Direct and Cross, using a large number of LUNs and running heavy concurrent test
workloads on 4 ports.
• Up to 32 high-performance 3 Gb/sec point-to-point links for 9600 MB/sec system bandwidth
• Each link is available for concurrent I/O
• No loop arbitration bottlenecks
• 8 SAS links to every disk tray for redundancy
• Common disk tray for SAS and SATA drives
• Any failed component is identified by its unique address
Beyond the performance advantages that a SAS back-end design provides, there are
also a number of simplicity advantages. The layout of the connections is simple and
easy to do. The cables and connections are keyed to ensure correct installations.
Finally, SAS and SATA disk drives use the same drive tray. This ensures maximum
system flexibility and reduces the cost of setting up intermixed environments.
The Hitachi Adaptable Modular Storage 2000 family with advanced Point-to-Point back end provides higher throughput and greater I/O concurrency than designs using legacy Fibre Channel Arbitrated Loop.
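As a quick sanity check on the bandwidth figures quoted above, a 3 Gb/sec SAS link carries roughly 300 MB/sec of payload after 8b/10b encoding, so 32 concurrent links give about 9600 MB/sec. A minimal sketch of that arithmetic, assuming the per-system link counts stated in this module (the 300 MB/sec per-link figure is the usual approximation, not an HDS-published number):

# Back-of-the-envelope back-end bandwidth, assuming ~300 MB/sec payload per 3 Gb/sec SAS link.
MB_PER_SEC_PER_LINK = 300  # 3 Gb/sec line rate with 8b/10b encoding, approximate payload

def backend_bandwidth_mb_s(links):
    return links * MB_PER_SEC_PER_LINK

# Per-system link counts from the text: 8 per controller (2100/2300), 16 per controller (2500).
for model, links in {"AMS 2100": 16, "AMS 2300": 16, "AMS 2500": 32}.items():
    print(f"{model}: {links} links -> {backend_bandwidth_mb_s(links)} MB/sec")
# AMS 2500: 32 links -> 9600 MB/sec, matching the figure above.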
Each model in the family has a controller tray with dual controller boards. While the
AMS 2500 controller tray does not have any drive slots, the AMS 2100 and AMS
2300 controller trays have slots for the installation of up to 15 internal disks in the
controller tray. Additional storage can be added to the system by installing disk
trays. Standard disk trays are 3U high and have 15 disk slots each while high density
disk trays are 4U high and have 48 high capacity SATA disk slots.
Every controller board in a system controller tray has a DCTL RAID processor that
connects to either 1 or 2 SAS I/O controller processors (IOC) depending on the
model. AMS 2100 and AMS 2300 have 1 IOC per controller board while AMS 2500
has 2. Each of the IOCs has two SAS multiplexor (SMX) ports that connect to a SAS
expander (SXP) on one of the internal or external disk trays. The connection is made
with a wide (x4) SAS cable that provides four 3 Gb/sec SAS links. An illustration of the SAS IOC and its connections is shown in the figure above.
The SXP is a SAS expander processor and functions to establish the links between
components. Two SXPs reside on each controller tray with internal disks (AMS 2100
and AMS 2300 systems) and each standard disk tray. There are 4 SXPs on each high
density disk tray. On a standard disk tray, 1 SXP is connected to Controller-0 and
the other is connected to Controller-1. As a result, each controller board has access to
all of the disk trays. Each disk tray has eight SAS links for communications between
the disks and the controllers.
[Slide graphic: dynamic link assignment to HDDs (per I/O); each wide cable carries 12 Gb/sec (four 3 Gb/sec SAS links).]
The figure above illustrates how the SXPs access the disk drives in each standard tray.
Since a SATA disk drive can be installed in a SAS slot, there is a common tray for
SAS and SATA disks. Each SAS disk is dual ported so that either SXP can access the
drive. The SATA disks are connected to an AA MUX (multiplexor) component,
inside the disk canister, that allows a single port SATA disk to be connected to each
SXP.
Whenever an I/O request is processed by the controller board, the SAS IOC will use
1 of its 8 SAS links for communicating with a disk. Eight links are available on each
of the controllers of the AMS 2100 and AMS 2300 systems and 16 links are available
on each controller of the AMS 2500. Therefore, these systems have plenty of back
end bandwidth available. The SMX port on the controller will route the I/O to the
first SXP expander. The SXP chip provides the actual connection between the IOC
SAS links and the disks in an enclosure. Since the AMS 2100 and AMS 2300
controllers have internal disks, there are 2 SXP chips in their controller modules.
There are no SXP chips in the AMS 2500 controller module since there are no
internal disks.
If the target disk address resides on the same tray as the first SXP, then the expander
will route the request from 1 of its 4 SAS links to the disk. Alternatively, if the target
disk address is not one of the disks on the same tray as the SXP, then the I/O will be
routed out the back of the expander to the next disk tray to which it is connected.
This process continues until the SXP is reached that has a direct connection to the
targeted disk.
Disk Types
Note: Maximum number of SAS SSDs that may be installed per system is: 15 (2100,
2300) or 30 (2500).
RAID Levels
RAID Overhead
Heavy writes/updates with RAID-5 can cause the data controller and RAID groups
to become busy (due to write penalty).
Module Summary
Module Review
Physical Properties
Physical properties of the storage determine the applications and service levels it
is suited to support.
Media specifications (15K 144GB drive, SATA versus FC, SSD, and so on)
Path specifications (4 Gb/sec, dual path, and so on)
Media failure protection (RAID-10, RAID-5, RAID-6)
Storage Architecture (Enterprise versus Modular, Internal versus External)
Virtualized internal storage versus virtualized external storage
Storage Services
Value-added services, such as replication, are part of a tier definition.
Portability (Hitachi Tiered Storage Manager)
Replication (ShadowImage — SI, Hitachi Universal Replicator — HUR, Copy-on-
Write — CoW)
Thin Provisioning
Conditions
Conditions include characteristics such as dedicated versus shared storage
resources.
I/O rate
Read/Write ratio
Average transfer size
Cache hit rate (if possible)
Response time
Disk capacity in GB
Note that this profile does not identify the level of random writes, the key criterion for choosing RAID-10 vs. RAID-5.
I/O profile matrix:
• Random (Read and Write): I/O Rate, Request Size, % Cache Hits
• Sequential (Read and Write): I/O Rate, Request Size
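The matrix above is just a handful of numbers per LU or application. A minimal, hypothetical way to capture it for later analysis (the field names and example values are illustrative only, not an HDS tool format):

# Hypothetical I/O profile record matching the matrix above: random and sequential
# components, each split into read and write. Field names are illustrative only.
from dataclasses import dataclass

@dataclass
class IOProfile:
    random_read_iops: float
    random_write_iops: float
    random_request_kb: float
    random_cache_hit_pct: float
    seq_read_iops: float
    seq_write_iops: float
    seq_request_kb: float

    @property
    def read_write_pct(self):
        reads = self.random_read_iops + self.seq_read_iops
        writes = self.random_write_iops + self.seq_write_iops
        total = reads + writes
        return (100.0 * reads / total, 100.0 * writes / total) if total else (0.0, 0.0)

profile = IOProfile(1200, 400, 8, 55, 150, 50, 64)
print(profile.read_write_pct)  # (75.0, 25.0) read/write split for this example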
Read/Write %
• Reads consume fewer back end disk I/Os per host operation
• RAID-5, RAID-6, and SATA handle random writes inefficiently
Cache Hits
• Reduce the number of back end operations per read
Block size
• Small block = more IOPS, less MB/sec
• Large Block = fewer IOPS, more MB/sec, bus occupancy
Random versus Sequential
• Random = disk head movement, fewer cache hits
• Sequential = less disk head movement, more cache hits
% Random Writes, RAID-5 versus RAID-10
• Interactive applications > 5–10% and < 60% cache hits => RAID-10
• Batch Applications > 20% => RAID-10
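The reason RAID-5 and RAID-6 handle random writes so much worse than RAID-10 is the write penalty: a small random host write costs 2 back-end disk I/Os on RAID-10, 4 on RAID-5 (read old data, read old parity, write new data, write new parity), and 6 on RAID-6. A short sketch of how this multiplies a host workload into back-end IOPS (the penalty factors are the standard textbook values; the example workload is invented):

# Standard small-random-write penalties: back-end disk I/Os per host write.
WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4, "RAID-6": 6}
READ_COST = 1  # a random read miss costs one back-end disk I/O regardless of RAID level

def backend_iops(host_iops, read_pct, read_hit_pct, raid_level):
    reads = host_iops * read_pct / 100.0
    writes = host_iops - reads
    read_misses = reads * (1.0 - read_hit_pct / 100.0)  # cache hits never touch disk
    return read_misses * READ_COST + writes * WRITE_PENALTY[raid_level]

# Example: 2000 host IOPS, 70% reads, 40% read cache hit rate.
for level in ("RAID-10", "RAID-5", "RAID-6"):
    print(level, round(backend_iops(2000, 70, 40, level)))
# RAID-10 ~2040, RAID-5 ~3240, RAID-6 ~4440 back-end disk IOPS for the same host load.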
HDP Overview
HDP Benefits
HDP-VOLume Overview
HDT Overview
[Slide graphic: a Dynamic Tiering volume with its normal working set on the upper tiers, least referenced pages on Tier 2, and the quiet data set on Tier 3.]
The pool contains multiple tiers (not the other way around like in USPV/HDP).
The logical volumes have pages mapped to the pool (same as USPV/HDP). Those
pages can be anywhere in the pool on any tier in that pool.
The pages can move (migrate) within the pool for performance optimization
purposes (move up/down between tiers).
HDT will try to use as much of the higher tiers as possible. (T1 and T2 will be used
as much as possible while T3 will have more spare capacity.)
You can add capacity to any tier at any time. You can also remove capacity
dynamically. So, sizing a tier for a pool is a lot easier.
Quantity added/removed should be in ARRAY Group quantities.
The first version of HDT (with VSP at GA):
Up to a maximum 3 tiers in a pool.
We will start with managing resources in a 3 tier approach. That may mean:
1-Flash drives, 2-SAS, 3-SATA or
1-SAS(15k), 2-SAS(10k), 3-SATA (or something else)
The Pool’s tiers are defined by HDD type
External storage is also supported as a Tier (lowest).
HDT Benefits
With Hitachi Dynamic Tiering, the complexities and overhead of implementing data
lifecycle management and optimizing use of tiered storage are solved. Dynamic
Tiering simplifies storage administration by eliminating the need for time
consuming manual data classification and movement of data to optimize usage of
tiered storage.
Hitachi Dynamic Tiering automatically moves data on fine-grain pages within
Dynamic Tiering virtual volumes to the most appropriate media according to
workload to maximize service levels and minimize TCO of storage.
For example, a database index that is frequently read and written will migrate to
high performance flash technology while older data that has not been touched for a
while will move to slower, cheaper disks.
No elaborate decision criteria are needed; data is automatically moved according to
simple rules. One, two, or three tiers of storage can be defined and used within a
single virtual volume, using any of the storage media types available for the Hitachi
Virtual Storage Platform. Tier creation is automatic based on user configuration
policies, including media type and speed, RAID level, and sustained I/O level
requirements. Using ongoing embedded performance monitoring and periodic
analysis, the data is moved at a fine grain page level to the most appropriate tier.
The most active data moves to the highest tier. During the process, the system
automatically maximizes the use of storage keeping the higher tiers fully utilized.
[Slide graphic: workload versus capacity ratio (%): about 5% of capacity (Tier 1) receives roughly 50% of the I/O, another 15% (Tier 2) receives about 30%, and the remaining 80% of capacity receives only 20% of the I/O.]
Why should Automatic Tiering work? Why does performance increase? Why do
costs lower?
Well it goes back to decades of seeing the same statistical behavior over and over. A
small population at any moment in time is vastly more active than the rest of the
population. The active population will have different members as time goes on but
the size of the active population remains relatively small.
Here we show that about 5% of the data has about 50% of the I/O traffic (this is
physical I/O after read cache hits). Another 15% of the data accounts for 30% of the
I/O traffic. Now you start to see the old 80/20 rule. We see this in cache
implementations, automated warehousing, commuter traffic and so on.
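A minimal sketch of the page-ranking principle behind automatic tiering follows: rank pages by observed I/O and fill the highest tier first, up to its capacity. This is only an illustration of the behavior described above, with invented page counts, and is not Hitachi's actual placement algorithm:

# Illustrative page placement: the hottest pages fill the highest tiers first.
def place_pages(page_iops, tier_capacities):
    """page_iops: dict of page id -> observed I/O count for the cycle.
    tier_capacities: page counts per tier, highest tier first.
    Returns dict of page id -> tier index (0 = highest tier)."""
    ranked = sorted(page_iops, key=lambda p: -page_iops[p])
    placement, tier, used = {}, 0, 0
    for page in ranked:
        while tier < len(tier_capacities) and used >= tier_capacities[tier]:
            tier, used = tier + 1, 0
        placement[page] = min(tier, len(tier_capacities) - 1)
        used += 1
    return placement

pages = {f"page{i}": iops for i, iops in enumerate([900, 5, 420, 2, 310, 1, 8, 0])}
print(place_pages(pages, tier_capacities=[2, 3, 3]))
# The two hottest pages land in tier 0, the next three in tier 1, the rest in tier 2.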
Just like media types, RAID configurations, and speeds in volume design, Hitachi Dynamic Tiering multitier volumes are another way of delivering tailored storage cost and performance service levels. Hitachi Dynamic Tiering volumes are another kind of volume in a tiered storage architecture. Depending on requirements, Hitachi Dynamic Tiering pools and volumes can be configured to optimize for:
• Maximum performance
• Balanced performance and cost
• Minimum cost for lower tiers
Hitachi Dynamic Tiering volumes deliver superior service levels at lower cost.
HDT Specifications
We will start with managing resources in a 3 tier approach. That may mean:
1-SSD, 2-SAS, 3-SATA or
1-SAS15k, 2-SAS10k, 3-SATA (or something else)
External Storage can also be added as a Tier.
Tier management
• Fills top tiers as much as possible. Monitors I/O references. Adjusts page
placement according to trailing 24 hour heat map cyclically (adjustable
from 30 min to 24 hours).
• Automatic versus Manual controls are available.
• Tier management (migration up and down tier) is automatic and built into
the system’s firmware.
• Hitachi Dynamic Tiering is a unique product (HDP add-on).
We also calculate the average sustainable I/O rate of each tier and avoid overdriving a tier. In some cases a tier may not be able to handle all of the most heavily used pages, so we may elect to keep some in a lower tier. This is unlikely to be a real problem in most cases, but we nevertheless watch for this outlier case.
We will have reporting in Device Manager and Tuning Manager with utilization
performance data that can be used to help, for example:
Size tiers in pools
Size pools
Help chargeback (This is a topic that can get overly complex. Basic
representation is to chargeback on quality of service not on specific utilization
levels.)
Determine DP-VOL spread across the tiers
Determine I/O amounts per tier
Tiering Policy
Example
• If tier 1 is specified for a DP-VOL, HDT relocates the DP-VOL's pages to tier 1. In other words, no matter how low the I/O workload on a page is, the page is allocated to tier 1.
In order to maximize overall pool performance, HDT allocates pages from the higher tiers according to a relative page ranking based on I/O workload, so a page with a lower I/O workload is likely to be allocated to a lower tier. Depending on the customer environment, there may be data that is highly important but has low I/O, and such data should in some cases be allocated to a higher tier. Because of its low I/O workload, however, HDT would allocate that data to tier 2 or tier 3, while less important data with a high I/O workload might use tier 1. Setting tiering policies helps resolve such issues.
Note: The entire volume is locked to a tier as per the defined policy.
Tier allocation policy | Description | Pool: 1-Tier config. | Pool: 2-Tier config. | Pool: 3-Tier config.
All tiers (Default) | All tiers are used | All tiers are used | All tiers are used | All tiers are used
Level 3 | Only Tier 2 is used | All tiers are used | Only Tier 2 is used | Only Tier 2 is used
Module Summary
This module:
• Defined tiers, resource pools and workload profiles
• Identified the membership criteria for storage tiers and resource pools
• Differentiated between batch and interactive workloads
• Defined Hitachi Dynamic Provisioning (HDP)
• Defined Hitachi Dynamic Tiering (HDT)
Module Review
dd Utility
Iometer — Overview
I/O subsystem
• Measurement tool
• Characterization tool
Workload generator
Single and clustered systems.
Can emulate disk or network I/O load.
Measurement and characterization of:
• Bandwidth of buses
• Latency of buses
• Network throughput
• Shared bus performance
• Disk performance
• Network Controller performance
When you launch Iometer, the Iometer window appears, and a Dynamo workload
generator is automatically launched on the local computer.
The topology displays a hierarchical list of managers (copies of Dynamo) and
workers. Workers are threads within each copy of Dynamo.
Disk workers access logical drives by reading and writing a file called iobw.tst in the
root directory of the drive. If this file exists, the drive is shown with a plain yellow
icon. If the file does not exist, the drive is shown with a red slash through the icon.
(If this file exists but is not writable, the drive is considered read-only and is not
shown at all.)
Maximum Disk Size
The Maximum Disk Size control specifies how many disk sectors are used by the
selected workers. The default is 0, meaning the entire disk or iobw.tst file.
Starting Disk Sector
The Starting Disk Sector control specifies the lowest-numbered disk sector used by
the selected workers during the test. The default is 0, meaning the first 512-byte
sector in the disk or iobw.tst file.
# of Outstanding I/Os
The # of Outstanding I/Os control specifies the maximum number of outstanding
asynchronous I/O operations per disk the selected workers will attempt to have
active at one time. (The actual queue depth seen by the disks may be less if the
operations complete very quickly.) The default value is 1.
The value of this control applies to each selected worker and each selected disk.
For example, suppose you select a manager with 2 disk workers in the Topology panel, select 4 disks in the Disk Targets tab, and specify a # of Outstanding I/Os of 8. In this case, the disks will be distributed among the workers (2 disks per worker), and each worker will generate a maximum of 8 outstanding I/Os to each of its disks. The system as a whole will have a maximum of 32 outstanding I/Os at a time (2 workers * 2 disks/worker * 8 outstanding I/Os per disk) from this manager.
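The arithmetic in that example is simply workers x disks-per-worker x outstanding I/Os per disk. A tiny, purely illustrative sketch of it (Iometer does this internally; the function name is invented):

# Outstanding I/O arithmetic for the Iometer example above.
def total_outstanding(workers, disks_selected, outstanding_per_disk):
    disks_per_worker = disks_selected // workers
    return workers * disks_per_worker * outstanding_per_disk

print(total_outstanding(workers=2, disks_selected=4, outstanding_per_disk=8))  # 32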
Test Connection Rate
The Test Connection Rate control specifies how often the workers open and close
their disks.
Pre-Defined
Workloads
The Access Specification tab lets you control the type of I/O each worker performs
to its selected targets.
Set Sequential/
Random Percentage
Iozone
• Automated mode is possible, but be prepared to wait a long time!
• Also, be prepared to ignore lots of unnecessary information.
IOPS output
Jetstress
Jetstress is a tool provided by Microsoft to help you verify the performance and
stability of a disk storage system prior to putting an Exchange server into
production.
Jetstress helps verify disk performance by simulating Exchange disk I/O load.
Specifically, Jetstress simulates the Exchange database and log file loads produced
by a specific number of users.
You can use available tools such as Performance Monitor, Event Viewer, or ESEUTIL
(Exchange Server Database Utilities) in conjunction with Jetstress to verify that your
disk subsystem meets or exceeds the performance criteria you establish.
SQLIO
Block Size
Disk Config
More Queuing
Processor Counters
% DPC Time, % Interrupt Time, % Privileged Time
If Interrupt Time and Deferred Procedure Call Time are a large portion of Privileged
Time, the kernel is spending a significant amount of time processing I/Os. In some
cases, it works best to keep interrupts and DPCs affinitized to a small number of
CPUs on a multiprocessor system, in order to improve cache locality. In other cases,
it works best to distribute the interrupts and DPCs among many CPUs, so as to keep
the interrupt and DPC activity from becoming a bottleneck.
DPCs Queued / second
Another measurement of how DPCs are consuming CPU time and kernel resources.
Interrupts / second
Another measurement of how Interrupts are consuming CPU time and kernel
resources. Modern disk controllers often combine or coalesce interrupts so that a
single interrupt results in the processing of multiple I/O completions. Of course, it is
a trade-off between delaying interrupts (and thus completions) and economizing
CPU processing time.
Wait defines the queue not yet accepted by the storage. Actv defines the queue accepted by the storage; this should be up to the LUN Queue Depth value.
iostat output
iostat is a utility that iteratively reports terminal, tape, and disk I/O activity. It also
reports CPU utilization.
iostat uses counters maintained by the kernel to measure throughput, utilization,
queue lengths, transaction rates, and service time.
iostat does not generate I/O.
The output of the iostat utility includes the following information:
device name of the disk
r/s – Reads per second
w/s – Writes per second
kr/s – Kilobytes read per second (The average I/O size during the interval can
be computed from kr/s divided by r/s.)
kw/s – Kilobytes written per second (The average I/O size during the interval
can be computed from kw/s divided by w/s.)
wait – Average number of transactions waiting for service (queue length)
This is the number of I/O operations held in the device driver queue waiting for
acceptance by the device.
actv – Average number of transactions actively being serviced (removed from
the queue but not yet completed)
This is the number of I/O operations accepted, but not yet serviced, by the
device.
svc_t – Average response time of transactions, in milliseconds
The svc_t output reports the overall response time, rather than the service time,
of a device. The overall time includes the time that transactions are in queue and
the time that transactions are being serviced. The time spent in queue is shown
with the -x option in the wsvc_t output column. The time spent servicing
transactions is the true service time. Service time is also shown with the -x option
and appears in the asvc_t output column of the same report.
%w – Percent of time there are transactions waiting for service (queue non-
empty)
%b – Percent of time the disk is busy (transactions in progress)
wsvc_t – Average service time in wait queue, in milliseconds
asvc_t – Average service time of active transactions, in milliseconds
wt – The I/O wait time is no longer calculated as a percentage of CPU time, and
this statistic will always return zero.
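As a practical example of using these fields, the short sketch below parses Solaris-style iostat -x output, derives the average read size (kr/s divided by r/s), and flags devices whose overall response time looks high. The sample lines, the column order, and the 20 ms threshold are illustrative assumptions; adjust the field mapping to the platform you are actually on:

# Sketch: derive average I/O size and flag slow devices from iostat -x style output.
# Assumed column layout: device r/s w/s kr/s kw/s wait actv svc_t %w %b (Solaris-like).
SAMPLE = """\
sd0    120.0  30.0  960.0  240.0  0.0  1.2   8.5   0  45
sd1     15.0  80.0  120.0  640.0  2.1  4.0  35.2  10  88
"""

SLOW_MS = 20.0  # illustrative response-time threshold

for line in SAMPLE.splitlines():
    dev, r_s, w_s, kr_s, kw_s, wait, actv, svc_t, pw, pb = line.split()
    r_s, kr_s, svc_t = float(r_s), float(kr_s), float(svc_t)
    avg_read_kb = kr_s / r_s if r_s else 0.0
    flag = "SLOW" if svc_t > SLOW_MS else "ok"
    print(f"{dev}: avg read size {avg_read_kb:.1f} KB, svc_t {svc_t:.1f} ms [{flag}]")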
nmon provides comprehensive performance data.
The nmon tool is designed for AIX, and Linux. It can be used to measure and analyze
performance data such as:
CPU utilization
Memory use
Kernel statistics and run queue information
Disks I/O rates, transfers, and read/write ratios
Free space on file systems
Disk adapters
Network I/O rates, transfers, and read/write ratios
Paging space and paging rates
CPU and AIX specification
Top processors
IBM HTTP Web cache
User-defined disk groups
Machine details and resources
Asynchronous I/O — AIX only
Workload Manager (WLM) — AIX only
IBM TotalStorage® Enterprise Storage Server® (ESS) disks — AIX only
Network File System (NFS)
Dynamic LPAR (DLPAR) changes
Storage Monitoring
Performance Monitor
Enterprise Storage
Supports:
• All new interface
• User identifies CUs and HBAs to monitor. VSP collects resource data.
• VSP resource data supports graphing resource utilization, including:
▪ Controller, MPs, DRRs
▪ Cache
▪ Access Paths : CHA, DKA, MPs, Cache
▪ Ports, WWNs
▪ LDEVs, LUNs
▪ External Storage
Enterprise Storage
VSP
Accessed in 3 ways:
• Reports
• Accordion Menu
• General Tasks
The Accordion option (Performance Monitor) allows configuring CUs, WWNs, Short-
Range monitoring and so on, in addition to displaying the Monitor Performance
screen with a button. The Monitor Performance screen is the main tool for tracking
resource utilization.
The Reports option, like the Accordion option, allows for configuration and
monitoring, but with menu selections instead of buttons.
The General Tasks option does not provide for configuration; it goes straight to
Monitor Performance. If you look carefully in General Tasks, it says Monitor
Performance rather than Performance Monitor.
Export Tool
If you want to use the Export Tool, you must create a user ID for its exclusive use before using it. Assign only the Storage Administrator (Performance Management) role to that user ID. We recommend that you do not assign any other roles to the user ID used exclusively by the Export Tool.
The user who is assigned to the Storage Administrator (Performance Management)
role may perform the following operations:
To save the monitoring data into files
To change the gathering interval
To start or stop monitoring by the set subcommand
Refer to the Hitachi VSP Performance Guide to learn more about the installation and
usage of Export Tool.
RAIDCOM CLI
[Slide graphic: applicable models: 9500 V, AMS500, AMS1000, AMS2x00.]
9500 V stands for Hitachi Thunder 9500 V Series Modular storage system.
AMS stands for Hitachi Adaptable Modular Storage.
Austatistics — Capabilities
PFM — Capabilities
PFM Stats
• auperform
• StorNav GUI
[Slide graphic: statistics collected at the application, O/S, and AMS layers.]
Assists in analyzing trends in disk I/Os and detects peak I/O time
Can help to identify system bottlenecks within the storage system
For enterprise storage systems, Performance Monitor lets you obtain usage statistics
about physical hard disk drives, volumes, processors, and other resources in your
storage system. Performance Monitor also lets you obtain statistics about workloads
on disk drives and traffic between hosts and the storage system.
Performance Monitor lets you obtain usage statistics about the physical hard disk
drive, logical volumes, processors, or other resources in your storage system.
Performance Monitor also lets you obtain statistics about workloads on disk drives
and traffic between hosts and the storage system. The Performance Monitor panel
displays a line graph that indicates changes in the usage rates, workloads, or traffic.
You can view information in the panel and analyze trends in disk I/Os and detect
peak I/O time. If system performance is poor, you can use information in the panel
to detect bottlenecks in the system.
If Performance Monitor is not enabled, you cannot use Server Priority Manager.
Short Range
• All the statistics that can be monitored by Performance Monitor are
collected and stored in short range
• For 64 or fewer control units
▪ Collects statistics at intervals of between 1 and 15 minutes
▪ Stores them for between 1 and 15 days
• For 65 or more control units
▪ Collects statistics at intervals of 5, 10, or 15 minutes
▪ Stores them for between 8 hours and one day
Long Range
• Collects statistics at fixed 15 minute intervals, and stores them for 3
months (93 days).
The usage statistics about resources (physical tab) in the storage system are collected
and stored also in long range, in parallel with short range. However, some of the
usage statistics about resources cannot be collected in long range.
Performance Monitor panels can display statistics within the range of the storing
periods above.
You can specify a part of the storing period to display statistics. All statistics, except
some information related to Volume Migration, can be displayed in short range on
Performance Monitor panels. In addition, usage statistics about resources in the disk
subsystem can be displayed in both short range and long range because they are
monitored in both ranges. When you display usage statistics about resources, you
can select the displayed range.
Troubleshooting requires a view of the path from the application to the storage
system. Without a tool that consolidates and normalizes all of the data, the system
administrator has difficulty distinguishing between the possible sources of problems
in the different layers involved. When a performance problem occurs or the “DB
application response time exceeds acceptable levels”, they must quickly determine if
the problem is in the application server or outside.
Server/Application Analysis — Is the problem caused by trouble on the server?
(database (DB), file system, Host Bus Adapter (HBA)
Fabric Analysis — Is there a SAN switch problem? (Port, ISL, and more)
Storage Analysis — Is the storage system a bottleneck?
All of the data from the components of the Storage network must be gathered by
different device-specific tools and interpreted, correlated, and integrated manually,
including the timestamps, in order to find the root cause of a problem.
Some customers achieve this by exporting lots of data to spreadsheets and then
manually sorting and manipulating the data.
[Slide graphic: Tuning Manager provides an end-to-end view across Server, App, SAN, Storage, and External Storage.]
Server: Oracle, SQL Server, DB instances, tablespaces, file systems, CPU utilization, memory, paging, swapping, file system performance, capacity and utilization, VM guest correlation
Switch: whole fabric, each switch, each port, MB/sec, frames/sec, and buffer credits
Storage: ports, LDEVs, parity group, cache utilization, performance IOPS, MB/sec, and utilization
Collection Manager
Main Console
Performance Reporter
Agents
Collection Manager
Collection Manager is the basic component of the Tuning Manager server.
Main Console
The Main Console stores the configuration and capacity information that the Agents
and Device Manager collect from the monitored resources in the database.
The Main Console displays links to Performance Reporter.
Performance Reporter
Performance Reporter displays performance data collected directly from the Store
database of each Agent.
Agents
Agents manage, as monitored resources, Hitachi disk array storage systems, SAN
switches, file systems on hosts, operating systems, Oracle, and other applications
according to their features.
Agents also collect performance information (such as, the I/O count per second) and
capacity information (such as logical disk capacity) from these resources as
performance data.
Collection Manager
Collection Manager provides the following functions:
• Managing Agents
• Managing events issued by Agents
• Controlling data transmission between a Tuning Manager server and Agents
Main Console
According to the specified time frame and interval, the Main Console displays
reports in which the data accumulated in the database is mapped to the performance
data managed by the Agents. The Tuning Manager server database is managed by
the relational database system HiRDB.
The Main Console displays links to Performance Reporter.
Performance Reporter
Performance Reporter provides a simple menu-driven method to develop your own
custom reports. In this way, Performance Reporter enables you to display Agent-
instance level reports and customized reports with a simple mouse click.
Performance Reporter also enables you to display reports in which the current status
of monitored targets is shown in real time. Performance Reporter does not connect
to HiRDB.
Agents
Agents run in the background and collect and record performance data. A separate
Agent must exist for each monitored resource. For example, Agent for RAID is
required to monitor storage systems, and Agent for Oracle is required to monitor
Oracle. The Agents can continually gather hundreds of performance metrics and
store them in the Store databases for instant recall.
Server (Application, Database, OS, HBA Driver): Oracle, SQL Server, DB2, Exchange, DB instances, tablespaces, file systems, CPU utilization, memory, paging, swapping, file system performance, capacity, utilization
Switch: whole fabric, each switch, each port, MB/sec, frames/sec, buffer credits
Storage: ports, LDEVs, parity group, cache utilization, performance IOPS, MB/sec, capacity
Note: For HDP Pools on Modular storage HTnM collects configuration data only.
The above metrics are collected assuming the relevant agents are installed and
configured to report data to Tuning Manager. These metrics could be the basis for
creating IO Profiles.
Key Metrics
You can use the Analytics tab to analyze storage system performance for the Virtual
Storage Platform, Universal Storage Platform V/VM, or Hitachi USP and determine
if a performance bottleneck is related to a storage system.
Note that a Tuning Manager license is required to use the Analytics tab. For details
of Tuning Manager, see the Hitachi Command Suite Tuning Manager Software User
Guide. If there is an application performance problem, obtain the information needed
to identify a logical group, for example the host where the problem occurred, the
label of the volume, or the mount point. Based on the acquired information, specify
the period (up to the past 30 days) for analyzing the performance and the logical
group or the host that corresponds to the application in order to analyze the
performance bottleneck. Check that there is no resource that exceeds the threshold
values, and determine whether the cause of the performance bottleneck is the
storage system. If the problem exists in the storage system, check the performance
information of each metric within the storage system in more detail. If you could not
detect the problem correctly by using the specified threshold value, change the
threshold value.
For more detailed performance information, start Performance Reporter, to perform
such operations as analysis in minute units or analysis of long-term trends. If you
import a report definition to Performance Reporter in Tuning Manager in advance,
you can display a report with items that correspond to the items in the Identify
Performance Problems wizard.
Module Summary
Module Review
General Concepts
Metrics to Be Collected
Attempt to separate large block I/O and small block I/O onto different port pairs
when possible.
Group sporadic I/O profiles.
Consider redundancy requirements.
Avoid introducing a single point of failure.
Failover should not overwhelm a port or port pair; that is, avoid creating a
cascading failure.
Service levels should play a role in the grouping process.
Physically or logically isolate critical hosts.
Another factor to consider in planning port loading is what happens when a port
fails. If ports are deployed in pairs, then the port microprocessor use of each
member of the pair should be kept below approximately 40% during the normal
load cycle. This allows continued operations without degradation in the event that
one member of the pair fails and the surviving port must carry the entire load (now
80%). In cases where port pair microprocessor use is above 80%, it is likely that
hosts are negatively impacting each other as they compete for storage processing
resources.
One way to improve on the effective capacity of ports is to deploy I/O paths in
groups of four, also known as quads. If one path of a quad fails, then 25% of the
capacity available to the affected servers is lost. However, if one path of a pair fails,
then 50% of the capacity available to the affected servers is lost.
Assuming that 80% utilization represents effective full use, and assuming continued
operation without degradation is a goal, the net of this failure consideration is:
40% should be considered full utilization under normal operating circumstances for
port pairs and ACP pairs.
60% should be considered full utilization under normal operating circumstances for
paths in a balanced port quad.
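A minimal sketch of the failover headroom arithmetic behind those numbers: with n paths in a group, losing one path multiplies the surviving paths' load by n/(n-1), so the normal-operation ceiling is 80% x (n-1)/n. The 80% "effective full" figure is the planning assumption stated above:

# Failover headroom for a path group: if one of n paths fails, the survivors carry
# n/(n-1) of their previous load. Keep normal utilization low enough to stay <= 80%.
EFFECTIVE_FULL = 80.0  # percent, per the planning assumption above

def normal_ceiling(paths_in_group):
    return EFFECTIVE_FULL * (paths_in_group - 1) / paths_in_group

def survives_failure(current_util_pct, paths_in_group):
    return current_util_pct * paths_in_group / (paths_in_group - 1) <= EFFECTIVE_FULL

print(normal_ceiling(2))          # 40.0 -> port pairs should run at or below about 40%
print(normal_ceiling(4))          # 60.0 -> quads should run at or below about 60%
print(survives_failure(55.0, 4))  # True: 55% x 4/3 = 73.3%, still under 80%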
Metrics to Be Collected
% write pending is the percentage of cache occupied by writes that have yet to be
destaged to disk and is a measure of this accumulation of write data in cache.
Back End Port Utilization
Options to alleviate Back End Director (BED) overutilization:
Add more BEDs and disk enclosures. Evenly distribute the array groups across all
BED pairs in the storage system. This allows the BED utilization to remain within
recommended deployment practices and ultimately ensure the storage system is not
susceptible to I/O degradation caused by single-point failures.
Prevent BED overutilization by migrating LDEVs to another storage system. For example, to remediate BED overutilization on a 9980V, you can migrate some LDEVs serviced by the BED to a USP V.
Cache Write Pending Rate
Write Pending of 30% or below is considered normal operation. Write pending of
40% or above warrants corrective action.
When the cache write pending percentage reaches 70%, the storage system stops
accepting new writes in an attempt to destage the writes currently in cache. This
type of spike has a severe impact on all hosts using the storage system.
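A small sketch of how those write-pending thresholds might be applied when scanning collected samples (the 30%, 40%, and 70% breakpoints are the ones above; the classification labels are illustrative):

# Classify cache write-pending samples against the thresholds described above.
def classify_write_pending(pct):
    if pct >= 70:
        return "CRITICAL: the array stops accepting new writes while it destages cache"
    if pct >= 40:
        return "ACTION: sustained write pending at this level warrants corrective action"
    if pct > 30:
        return "WATCH: above the normal-operation range"
    return "NORMAL"

for sample in (12, 33, 45, 72):
    print(f"{sample}% write pending -> {classify_write_pending(sample)}")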
Metrics to Be Collected
Metrics to Be Collected
Response time
• This threshold depends on the application needs and the Service Level
Agreement (SLA) for the application.
• Since the Logical Unit (LU) Response Time has a direct impact on
applications, this indicator should be monitored on key LUs to determine
deltas as loads increase.
Watch out for worst performing LUs
• Use Performance Reporter or Tuning Manager to monitor worst
performance LUs.
For example:
In a Microsoft Exchange environment, the LU Read or write Response Time for the
Database Disk should be below 20ms in average and spikes should not be higher
than 50ms.
For the Temp Disk, the LU Response Time should be below 10 ms in average and
spikes should not be higher than 50ms.
In a Microsoft SQL Server environment, the LU Read Response Time for the
Database Disk should be below 20ms in average and spikes should not be higher
than 50ms.
Less than 10ms is very good
Between 10–20ms is okay
Between 20–50ms is slow, needs attention
Greater than 50ms is a serious I/O bottleneck
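Those rules of thumb translate directly into a simple check that can be run over exported LU response-time data. The thresholds below are the ones listed above; the function and the sample values are only an illustration:

# Rate an LU's average response time against the rules of thumb above.
def rate_response_time(ms):
    if ms < 10:
        return "very good"
    if ms <= 20:
        return "okay"
    if ms <= 50:
        return "slow, needs attention"
    return "serious I/O bottleneck"

lu_response_ms = {"LU 0 (Exchange DB)": 14.2, "LU 1 (temp)": 6.8, "LU 7": 63.0}
for lu, ms in lu_response_ms.items():
    print(f"{lu}: {ms} ms -> {rate_response_time(ms)}")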
Capacity Planning
I/O Profile
Maintenance Commands
iostat(1M)
Storage Pools
Storage Pools
The administrative objective is to:
Provide each member of the pool with access to as much array group bandwidth
(IOPS and MB/s) as it is likely to require, even in exceptional circumstances
Distribute I/O as evenly as possible within the pool
Keep the storage allocations for the group of applications within the pool
It is not necessary to disperse the I/O of every application in the pool across every
array group in the pool. In fact it is common to define subsets of the pool as I/O
dispersal groups. Nonetheless, it is generally beneficial to distribute the I/O of each
application across more than the minimum number of array groups required to
provide the storage space. This in turn means that applications typically engage in
managed sharing of array group resources.
This approach gives each application access to a larger maximum storage bandwidth
capacity from the array groups it uses. This benefit arises from the premise that it is
unlikely that every application in a dispersal group will have its peak bandwidth
requirements at the same moment.
Storage Pools also share an available space pool. For example, space is added to the
pool, one or more array groups at a time. Considering the Logical Volume Manager
recommendations earlier in this report, capacity will probably be added to Storage
Pools four array groups at a time. This fact alone makes a large number of small
pools inappropriate.
Storage Pools should be large enough to achieve I/O dispersal and bandwidth
sharing among applications, and large enough to reasonably avoid excessive
fragmentation of the available space pool. Storage pools should be small enough to
be manageable.
Compatible service objectives means not having applications that seek minimum
response time sharing the same resources with applications that seek maximum
throughput.
Demand cycles are most compatible when different servers sharing the same
resources have their peak demands at different times.
Request size compatibility means not mixing small block I/O with large block I/O on the same array group or front-end port at the same time. For example, avoid mixing request sizes that differ by more than a few doublings, such as 8K with 64K.
SAN Planning
When planning the storage pool layout and the configuration for a
NAS Platform (HNAS) environment, ask the customer to provide:
• I/O behavior of the application that will run on the environment
• Data and file system structure
In HNAS, you can change the read ahead chunk size property.
• Best to turn off Pre-fetch in the storage
Modify this setting according to the Read I/O behavior of the
application which runs on NAS Platform.
Optimal setting improves the Read I/O performance.
[Slide graphic: HNAS cluster nodes connected to storage ports CL1-A/CL2-A and CL3-A/CL4-A.]
• All file systems in a storage pool belong to EVSs on one node.
• Queue depth maximum 512 per cluster (Modular Storage).
• The node has a fixed SCSI queue depth of 32 per LUN.
Module Summary
Module Review
[Slide graphic: an external parity group type VDEV mapped through ports to LUs on an external subsystem.]
Virtual LDEVs are individually mapped to Pools.
A Logical Unit is a virtual volume that is presented to a host over a path.
Path
A path is identified by three components:
Port
Host group or HSD (Host Storage Domain) identified by numeric ID or by name
LU number (LUN)
HDEV
Host paths comprising a LU map onto an HDEV, which is either mapped directly to
an LDEV or mapped to a LUSE.
The HDEV is identified by its LDEV number, which for a single LDEV is the LDEV’s
number and which for a LUSE is the number of the first LDEV in the LUSE.
The HDEV is visible using the SVP configuration printout tool.
LDEV
LDEV numbers for the USP (RAID500) look like 00:00 or CU:LDEV.
LDEV numbers for the USP V (RAID600) look like 00:00:00 or LDKC:CU:LDEV.
There is only one LDEV name space. That is, each LDEV number within a subsystem
is unique.
LUSE
A LUSE is composed of from 2 to 36 LDEVs.
It is identified by the LDEV number of the first LDEV in the LUSE (the head LDEV).
VDEV Size
For physical disk parity groups, LDEVs are aligned on parity group
stripe (row) boundaries
For CoW and HDP V-VOL groups, LDEVs are aligned on cache slot
boundaries.
For emulation types other than Open-V, LDEVs are aligned on
logical emulation cylinder boundaries.
Alignment of LDEVs on above boundaries will mean that LDEV size
will be rounded up to a multiple of the boundary interval for the
purpose of laying out the starting point of the LDEV within the VDEV.
For physical disk parity groups, LDEVs are aligned on parity group stripe (row)
boundaries
Although LDEV sizes are in increments of one sector, each LDEV must start at
the beginning of a parity group stripe (row). Therefore you may have wasted
space at the end of the stripe that contains the last sectors in an LDEV.
For example, in Open-V 3+1, the stripe size is 64 KB per logical track x 8 logical
tracks per chunk x 3 chunks per stripe = 1.5 binary MB per stripe.
USP / NSC55 / USP V / USP VM Open-V cache slots contain from 1 to 4
cache segments, each corresponding to an Open-V logical track of 64KB.
Open-V logical cylinders consisting of 15 logical tracks (an old legacy
concept) come into play when sizing volumes in MB*. Use sizing in blocks
(sectors) to get exact volume sizes.
For Mainframe 3390 3+1, stripe size is 58 binary KB per track x 8 tracks per
chunk x 3 chunks per stripe = 1392 binary KB per stripe.
For CoW and HDP V-VOL groups, LDEVs are aligned on cache slot boundaries.
For Open-V, cache slot size is 4 x 64 binary KB = 256 binary KB
For emulation types other than Open-V, LDEVs are aligned on logical emulation
cylinder boundaries.
For Open-X, logical track size is 48 binary KB, logical cylinders are 15 logical
tracks, thus logical cylinder size is 15 x 48 binary KB = 720 binary KB.
This was true for 9900V's OPEN-V as well
Alignment of LDEVs on the above boundaries will mean that LDEV size will be
rounded up to a multiple of the boundary interval for the purpose of laying out the
starting point of the LDEV within the VDEV.
This affects the number of LDEVs that fit within a VDEV.
Some LDEV types also require control cylinders. This also affects the number of
LDEVs that fit within a parity group.*
*Note: See the LUN Expansion and Virtual LVI/LUN User’s Guide for more details.
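The boundary arithmetic above can be made concrete with a short sketch: compute the stripe (or cache slot, or logical cylinder) size, then round each LDEV up to that boundary to estimate how many fit in a VDEV. The boundary sizes are the ones worked out above; treating alignment as a simple round-up, and ignoring control cylinders, is a simplification of the actual layout rules:

# Sketch of LDEV alignment arithmetic using the boundary sizes derived above.
import math

# Boundary intervals, in binary KB, from the examples above.
OPEN_V_3P1_STRIPE = 64 * 8 * 3   # 1536 KB = 1.5 MB per stripe (Open-V 3+1)
M3390_3P1_STRIPE  = 58 * 8 * 3   # 1392 KB per stripe (mainframe 3390 3+1)
OPEN_V_CACHE_SLOT = 4 * 64       # 256 KB cache slot (CoW / HDP V-VOL alignment)
OPEN_X_CYLINDER   = 15 * 48      # 720 KB logical cylinder (non Open-V emulations)

def aligned_size_kb(ldev_kb, boundary_kb):
    """Round an LDEV's size up to the next boundary for layout purposes."""
    return math.ceil(ldev_kb / boundary_kb) * boundary_kb

def ldevs_per_vdev(vdev_kb, ldev_kb, boundary_kb):
    return vdev_kb // aligned_size_kb(ldev_kb, boundary_kb)

ldev_kb = 50 * 1024 * 1024        # a hypothetical 50 GB (binary) LDEV, expressed in KB
vdev_kb = 2 * 1024 * 1024 * 1024  # a hypothetical 2 TB (binary) VDEV, expressed in KB
print(aligned_size_kb(ldev_kb, OPEN_V_3P1_STRIPE))          # rounded up to a 1.5 MB multiple
print(ldevs_per_vdev(vdev_kb, ldev_kb, OPEN_V_3P1_STRIPE))  # how many such LDEVs fit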
[Slide graphic: host with two HBAs accessing an LU through ports; the LU maps to an HDEV and LDEV in a V-VOL group that is associated with a Pool ID (VDEV and LUSE shown for comparison).]
When (internal) LDEVs are assigned as CoW or HDP pool volumes, the suffix “ P” is
shown on the name of the parity group that the pool volume LDEV is on.
For example, if a normal LDEV 00:12:34 which is on parity group “1-1” is assigned
as a CoW or HDP pool volume, from that point on, LDEV 00:12:34 will show as
being on parity group “1-1 P”. In this case, the “P” is really a marker to indicate
special treatment as a pool volume, not that there is a parity group whose name is
“1-1 P”.
You can see this using the SVP configuration printout tool.
[Diagram: chunk layout for parity groups 1-1 and 5-1, showing data chunks 0 through 55 with rotating parity (P 0-6, P 7-13, ..., P 49-55) across the drives.]
Parity groups 1-1 and 5-1 have been concatenated, meaning that VDEVs for parity groups
1-1 and 5-1 are now striped across both physical parity groups.
• This does not change parity group that VDEV starts on
• This does not change size of VDEV
Notes: This numbering layout shown above is for normal (non-concatenated) parity
groups. (Colors are accurate above, not necessarily the numbering of the chunks.)
Chunk numbering for concatenated parity groups may be laid out in a different
fashion so as to be able to read chunks 0–15 at once from all 16 drives.
[Slide graphic: host with two HBAs accessing an LU through ports; the LU maps to an HDEV and LDEV laid out on a VDEV across the drives of a parity group.]
[Slide graphic: host with two HBAs accessing an LU through ports; the LU maps to an HDEV and a LUSE composed of multiple LDEVs, each laid out on its own VDEV across the drives of a parity group.]
The application must also generate concurrent IOPS in order for I/O
Rates to benefit from a large TCQ count.
[Chart annotations: with single-threaded workloads (TCQ=1), the I/O rate is stuck at the low end; raising TCQ yields roughly a 50% increase in IOPS. Response time is usually at its best with single-threaded workloads and TCQ=1; at TCQ=16 the I/O rate is near the mechanical limit of 2 array groups (8 drives).]
[Chart: I/O rate per second (0 to 1200) versus Iometer outstanding I/O count (1 to 20) for TCQ=16 read-only and TCQ=16 read/write workloads; a mix of reads and writes impairs optimization.]
Rules
• Do not exceed the physical number of Tags per port/LUN.
▪ # Servers x # LUNs x LUNQD <= 512 (Each active I/O uses 1 Tag)
• No help from the factory if you break the rules!
• Note: The rules can be broken in some cases. (Check the specifications of each storage system.)
Working out if it has gone wrong – and what happens
• Unexplained drop in IOPS (but needs a high IOPS rate in the first place)
• Unexplained increase in response
• Unexplained variability of response
• Need to calculate the sum of active queues of all servers or LUNs on a port
# Servers x # LUNs x LUNQD <= 512 (Each active I/O uses 1 Tag).
If AMS is used as an external storage the max queue-depth for the whole system is
limited to 500.
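The tag rule above (# Servers x # LUNs x LUNQD <= 512 per port) is easy to sanity-check with a sketch like the one below. The 512-tag port limit and the roughly 500-tag limit for a virtualized AMS come from the text; the example configurations are invented:

# Check the per-port tag budget rule described above.
def check_tag_budget(servers, luns_per_server, lun_queue_depth, tag_limit=512):
    demand = servers * luns_per_server * lun_queue_depth  # worst-case concurrently active tags
    return demand, demand <= tag_limit

# Example: 4 servers, each seeing 8 LUNs on this port, with a host LUN queue depth of 32.
demand, ok = check_tag_budget(servers=4, luns_per_server=8, lun_queue_depth=32)
print(demand, "tags worst case:", "within limit" if ok else "EXCEEDS the 512-tag port limit")

# When an AMS is virtualized as external storage, budget against roughly 500 tags system-wide.
print(check_tag_budget(servers=4, luns_per_server=8, lun_queue_depth=16, tag_limit=500))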
“I am having trouble getting the throughput of an AMS up in a benchmark test. It is for a BIG customer and they have their hair on fire. Please give me a call ASAP.”
In this case, 12 RAID groups (108 HDDs) were sharing a tag limit of 16.
[Slide graphic: iSCSI and FC ports accessing LDEV 39.]
Note: Modular Storage supports up to 500 tags per subsystem when used as external storage. If the queue depth of I/Os from the virtualizing storage system to AMS 2000 storage used as external storage is too high, a lot of dirty data can accumulate in cache memory, which can inhibit new write commands in the controller. Additionally, it takes a long time to search through that much dirty data. As a result, CPU utilization becomes high and performance degradation can occur.
Module Summary
Module Review
1. What is a VDEV?
2. What is Command Tagged Queuing?
Application Monitoring
The running application is going to drive you to the metrics that you observe.
Airline reservations are actually a special case because small block size (less than
1KB) creates distinct requirements.
[Chart: response time (msec) versus IOPS for RAID-1 and RAID-5 groups on 15K and 10K RPM drives; users define their acceptable or target response levels.]
Metrics
Metric Name | Name in HTnM | Value Description | Normal Value | Bad Value | As seen from
I/O Rate | IOPS | I/Os per second | N/A | | Host View
Read Rate | Disk Reads/sec | I/Os per second | N/A | | Host & Port View
Write Rate | Disk Writes/sec | I/Os per second | N/A | | Host & Port View
Read Block Size | Avg Disk Bytes/Read | Bytes xfered per I/O operation | 4096 to 8192 | N/A | Host View
Write Block Size | Avg Disk Bytes/Write | Bytes xfered per I/O operation | 2048 to 8192 | N/A | Host View
Read Response Time | Avg Disk Sec/Read | Time required to complete a Read I/O (millisecond) | 1 to 7 | >7 | Host & Port View
Write Response Time | Avg Disk Sec/Write | Time required to complete a Write I/O (millisecond) | 1 to 2 | >2 | Host & Port View
HTnM Rules of Thumb Guide Document:
• A starting point – you need to ‘Calibrate’ your applications and fine-tune the values that are OK in your environment.
• This will help you remember what is a good value or bad value if you need to troubleshoot a problem at a later date.
Values shown in the Normal Value column are planning estimates.
You should baseline your I/O profile when all systems are in a good
and normal running state.
Metrics
Metric Name | Value Description | Normal Value
Average Queue Length | Average number of disk requests queued for execution on one specific LUN | 1 to 8 (8 is the Max Qdepth value)
Rich Media
Metric Name | Name in HTnM | Value Description | Normal Value | Bad Value | As seen from
I/O Rate | IOPS | I/Os per second | N/A | N/A | Host View
Read Rate | Read IOPS | I/Os per second | N/A | N/A | Host View
Read Block Size | Avg Disk Bytes/Read | Bytes xfered per I/O operation | 65536 | N/A | Host View
Read Data xfer Rate per user (1) | Avg Read Disk Bytes/sec | Bytes xfered per second per user | .5 to 1.875 MB/s per user | < .5 MB/s per user | Host View
Port Utilization (100 users) (2) | Port Transfer (MB/s) | Data Transfer Rate | 50 to 187 MB/s | >187 MB/s (OK if Response Time still good) | Port View
Read Response Time | Avg Disk Sec/Read | Time required to complete a Read I/O (millisecond) | 1 to 2 | >2 | Host & Port View
Read Hit Ratio | Read Hit % | % of Read I/Os satisfied from Cache | 90% to 100% | < 90% | Port View
Email Systems
Metric Name | Name in HTnM | Value Description | Normal Value | Bad Value | As seen from
I/O Rate | IOPS | I/Os per second | N/A | N/A | Host View
Read Rate | Read IOPS | I/Os per second | N/A | N/A | Host View
Write Rate | Write IOPS | I/Os per second | N/A | N/A | Host View
Read Block Size | Avg Disk Bytes/Read | Bytes xfered per I/O operation | 4096 | N/A | Host View
Write Block Size | Avg Disk Bytes/Write | Bytes xfered per I/O operation | 4096 | N/A | Host View
Read Response Time | Avg Disk Sec/Read | Time required to complete a Read I/O (millisecond) | 1 to 5 | >5 | Host & Port View
Average Queue Length | Avg Disk Queue Length | Average number of disk requests queued for execution on one specific LUN | 1 to 8 (8 is the Max Qdepth value) | >8 (Note that Max Qdepth might vary with different hardware settings) | Host View
Read Hit Ratio | Read Hit % | % of Read I/Os satisfied from Cache | 35 to 65 | < 35 | Port View
Write Hit Ratio | Write Hit % | % of Write I/Os satisfied from Cache | 100% (0% on USP) | < 100% | Port View
Average Write Pending | Write Pending Rate | Percentage of the Cache used for Write Pending | 1% to 35% | > 35% | Port View
Microsoft Exchange
The operating system temporary drive is where all the format conversions occur,
such as from RTF to HTML. It is also the home for all temporary files created and
accessed during crawls performed by the Microsoft Index Server Indexing Service.
When first installed, the operating system sets the location for creation and use of
temporary files as the same disk used by the operating system itself. This means that
any I/O for the temp disk competes with I/O for programs and page file operations
being run from that drive. This competition for I/O impacts performance. To avoid
having the operating system compete for I/O with the temp disk, it is recommended that you change the global environment setting of TEMP to point to another disk and, thereby, give the temp disk its own disk.
Database
• .edb file ― Stores all MAPI messages and tables
• .stm file ― Stores all non-MAPI data (dropped for 2007)
• Characterized by random I/O
• Should be on its own Array Group
• Counters:
▪ Average Disk sec/Read & Write ― Less than 20ms; no spikes higher
than 50ms
▪ Read Hit % ― Greater than 35%
▪ Disk Queue Length ― Lower than 9
▪ Read Response Time ― Less than 6ms
▪ Write Response Time ― Less than 3ms
Note: Synchronous replication will impact the counters above, so also use:
▪ Database Page Fault Stalls/sec ― Should be 0
Exchange servers should generally have database write latencies under 20ms, with
spikes (maximum values) under 50ms. However, it is not always possible to keep
write latencies in this range when synchronous replication is in use. Database write
latency issues often do not become apparent to the end user until the database cache
is full and cannot be written to. When using synchronous replication, the
Performance Monitor Database Page Fault Stalls/sec counter is a better indicator of
whether the client is being affected by write latency than the Physical Disk\Average
Disk sec/Write counter.
On a production server, the value of the Database Page Fault Stalls/sec counter
should always be zero, because a database page fault stall indicates that the database
cache is full. A full database cache means that Exchange cannot place items in cache
until pages are committed to disk. Moreover, on most storage systems, read
latencies are affected by write latencies. These read latencies may not be detectable
at the default storage system Performance Monitor sampling rate. Remote procedure
call (RPC) latencies also increase as a consequence of database page fault stalls,
which can degrade the client experience.
Because disk-related performance problems can negatively affect the user experience,
it is recommended that administrators monitor disk performance as part of routine
system health monitoring. When analyzing a database logical unit number (LUN) in
a synchronously replicated environment, you can use the counters listed below to
determine whether there is any performance degradation on the disks:
Average Disk sec/Read & Write – less than 20ms; no spikes higher than 50ms
Read Hit % - greater than 35%
Disk Queue Length – lower than 9
Read Response Time – less than 6ms
Write Response Time – less than 3ms
Note: Synchronous replication will impact the counters above, so also use this counter:
Database Page Fault Stalls/sec – Should be 0
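These threshold checks lend themselves to simple scripting. The following is a minimal sketch, not an HDS or Microsoft tool, that compares sampled averages and peaks against the guideline values listed above; the sample values and function name are hypothetical, and the counters would normally come from Performance Monitor or Tuning Manager exports.

    # Minimal sketch: compare sampled Exchange database LUN counters against the
    # guideline thresholds listed above (hypothetical helper, not an HDS tool).
    THRESHOLDS = {
        # counter                          (average limit, spike limit)
        "Average Disk sec/Read (ms)":      (20, 50),
        "Average Disk sec/Write (ms)":     (20, 50),
        "Read Response Time (ms)":         (6, None),
        "Write Response Time (ms)":        (3, None),
        "Disk Queue Length":               (9, None),
        "Database Page Fault Stalls/sec":  (0, None),   # should stay at zero
    }

    def check(samples):
        """samples: {counter name: (average, maximum)} -> list of warnings."""
        warnings = []
        for counter, (avg, peak) in samples.items():
            avg_limit, spike_limit = THRESHOLDS[counter]
            if avg > avg_limit:
                warnings.append(f"{counter}: average {avg} exceeds {avg_limit}")
            if spike_limit is not None and peak > spike_limit:
                warnings.append(f"{counter}: spike {peak} exceeds {spike_limit}")
        return warnings

    # Example with hypothetical samples: the write spike is flagged
    print(check({"Average Disk sec/Write (ms)": (12, 65),
                 "Database Page Fault Stalls/sec": (0, 0)}))

Note that Read Hit % is a lower bound (greater than 35%) and would need the comparison reversed; it is omitted here to keep the sketch short.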
Transaction Logs
• Maintain database integrity
• Writes only — write performance is key
• Characterized by sequential I/O
• Should be in its own volume
• Counters:
▪ Average Disk sec/Write ― Less than 10ms; no spikes higher than
50ms
▪ Log Record Stalls/sec ― Less than 10 per second; no spikes higher
than 100 per second
▪ Log Threads Waiting ― Less than 10
The transaction log files maintain the state and integrity of your .edb and .stm files.
This means that the log files, in effect, represent the data. There is a transaction log
file set for each storage group. To increase performance, Exchange implements each
transaction log file as a database. If a disaster occurs and you have to rebuild your
server, use the latest transaction log files to rebuild your databases. If you have the
log files and the latest backup, you can recover all of your data. However, if you lose
your log files, the data is lost.
There are generally no reads to the log drives, except when restoring backups. This
means that write performance is essential to the transaction logs and any analysis
should closely observe this aspect. When analyzed per physical log disk, you can
use the counters listed below to determine whether there is any performance
degradation on the disks:
Average Disk sec/Write – Less than 10ms; no spikes higher than 50ms
Log Record Stalls/sec – Less than 10 per second; no spikes higher than 100 per
second
Log Threads Waiting – Less than 10
SMTP Queue
• Stores messages until processed by Exchange
• Characterized by random I/O
• Key counters:
▪ Average Disk sec/Read ― Less than 10ms; no spikes higher than
50ms
▪ Average Disk sec/Write ― Same as above
▪ Average Disk Queue Length ― Less than number of spindles
The SMTP queue stores SMTP messages until Exchange writes them to a database
(private or public), or sends them to another server or connector. SMTP queues
generally experience random, small I/Os.
When analyzed per physical SMTP queue disk, you can use the counters listed
below to determine if there is any performance degradation on the disks:
Average Disk sec/Read – Less than 10ms; no spikes higher than 50ms
Average Disk sec/Write – Same as above
Average Disk Queue Length – Less than number of spindles
Page File
• Disk cache for physical memory
• Always used, even when there is excess RAM
• Key counters:
▪ Average Disk sec/Read ― Always less than 10ms
▪ Average Disk sec/Write ― Always less than 10ms
▪ Average Disk Queue Length ― Less than number of spindles
The page file acts as an extension of the physical memory, serving as an area where
the system puts unused pages or pages it will need later. The page file always sees
some use, even in machines with a good amount of free memory. This constant
utilization is because the operating system tries to keep in memory only the pages
that it needs and enough free space for operations. For example, a printing tool that
is used only at startup might have some of its memory paged to disk and never
brought back if it is never used.
In servers where the physical memory is being used heavily, it is important to
ensure that all access to the page file is as fast as possible and to avoid thrashing
situations. It is common for servers to start seeing errors in memory operations long
before the page file is full. So, observing usage patterns of the page file disk is more
important than how full the disk is. Use the counters listed below to determine
whether there is any performance degradation on the page file disk:
Average Disk sec/Read – Always less than 10ms
Average Disk sec/Write – Always less than 10ms
Average Disk Queue Length – Less than number of spindles
Databases
In Exchange Server 2010, the changes to the Extensible Storage Engine (ESE) enable
the use of large databases (approximately 2TB) on larger, slower disks while
maintaining adequate performance. The ESE uses larger and more sequential I/O,
and database maintenance routines like online defragmentation and checksums are
run continually in the background.
The Exchange Store’s database tables make better use of the underlying storage
system and cache and the store no longer relies on secondary indexing, making it
less sensitive to performance issues.
Oracle Databases
Metric Name | Name in HTnM | Description | Normal Value | Bad Value | As seen from
I/O Rate | IOPS | I/Os per second | N/A | N/A | Host View
Read Rate | Read IOPS | I/Os per second | N/A | N/A | Host View
Read Block Size | Avg Disk Bytes/Read | Bytes xfered per I/O operation | 65536 to 262144 | < 65536 | Host View
Port Data xfer Rate (1) | Port Transfer | Bytes xfered per second per port | 90 MB/s to 170 MB/s per FC port | < 90 MB/s per FC port | Port View
Sequential Content (2) | Disk Sequential IOPS | Number of IOPS in Sequential mode | > 90% | < 90% | Port View
Random Content | Disk Random IOPS | Number of IOPS in Random mode | < 10% | > 10% | Port View
Read Response Time | Avg Disk Sec/Read | Time required to complete a Read I/O (Millisecond) | 1 to 10 | > 10 | Host & Port View
Read Hit Ratio | Read Hit % | % of Read I/Os satisfied from Cache | 90% to 100% | < 90% | Port View

Metric Name | Name in HTnM | Description | Normal Value | Bad Value | As seen from
I/O Rate | IOPS | I/Os per second | N/A | N/A | Host View
Read Rate | Read IOPS | I/Os per second | N/A | N/A | Host View
Write Rate | Write IOPS | I/Os per second | N/A | N/A | Host View
Read Block Size | Avg Disk Bytes/Read | Bytes xfered per I/O operation | 4096 to 8192 | N/A | Host View
Write Block Size | Avg Disk Bytes/Write | Bytes xfered per I/O operation | 2048 to 4096 | N/A | Host View
Read Response Time | Avg Disk Sec/Read | Time required to complete a Read I/O (Millisecond) | 1 to 5 | >5 | Host & Port View
Write Response Time | Avg Disk Sec/Write | Time required to complete a Write I/O (Millisecond) | 1 to 2 | >2 | Host & Port View
Read Hit Ratio | Read Hit % | % of Read I/Os satisfied from Cache | 30 to 65 | < 30 | Port View
Average Queue Length | Avg Disk Queue Length | Average number of disk requests queued for execution on one specific LUN | 1 to 8 (8 is the Max Qdepth value) | >8 (Max Qdepth might vary with different hardware settings) | Host View
Backup Systems
Typical Metrics
• R/W ratio
▪ Backup: 1:50
▪ Restore: 50:1
• I/O type: Sequential or random access
• Block size: 64 K and higher
MTU size (bytes) | Header (bytes) | Data size (bytes) | "Unused" bytes in a packet | Used bytes in a packet | Bits used in a packet | Half duplex packets/sec ** | Half duplex MB/s ** | Full duplex MB/s ** $
9,018 | 104 | 8,192 | 722 | 8,296 | 66,368 | 16,002 | 125.0 | 175.0
4,450 | 104 | 4,096 | 250 | 4,200 | 33,600 | 31,607 | 123.5 | 172.9
1,518 | 104 | 1,414 | 0 | 1,518 | 12,144 | 87,451 | 117.9 | 165.1
1,518 | 104 | 1,024 | 390 | 1,128 | 9,024 | 117,686 | 114.9 | 160.9
1,518 | 104 | 512 | 902 | 616 | 4,928 | 215,503 | 105.2 | 210.5
1,518 | 104 | 20 | 1,394 | 124 | 992 | 1,070,565 | 20.4 | 40.8
TCP/IP header bytes: NFS-CIFS-ODBC 42, VLAN 4, CRC 18, TCP 20, IP 20 (Max Header: 104)
** Perfect World: does not include turnaround times of the NIC, IP stack, or the O/S capabilities.
$ Assumes a typical 40% gain in full duplex over half duplex.
NOTE: NFS metadata transactions are very small packets. A high ratio of these to larger data packets will significantly drop the peak throughput rates.
NOTE: Full duplex mode doesn't work if the NIC card or IP stack aren't actually capable of it. Just being able to set FD mode doesn't mean anything. The same is true of Fibre Channel.
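The throughput figures in the table can be reproduced with a short calculation. The sketch below assumes a usable wire rate of roughly 1.062 Gbit/s (the rate implied by the table's packets/sec column), a 104-byte header per packet, and 1MB = 1,048,576 bytes; it is illustrative only and, like the table, ignores NIC, IP stack and O/S turnaround times.

    # Reproduce the half-duplex columns of the table above (illustrative only).
    # Assumptions: ~1.062 Gbit/s usable wire rate (implied by the table),
    # 104 bytes of header per packet, 1MB = 1,048,576 bytes of payload.
    WIRE_BITS_PER_SEC = 1.062e9
    HEADER = 104

    def half_duplex(payload_bytes):
        used_bytes = payload_bytes + HEADER            # bytes carried per packet
        bits_per_packet = used_bytes * 8
        packets_per_sec = WIRE_BITS_PER_SEC / bits_per_packet
        mb_per_sec = packets_per_sec * payload_bytes / 1_048_576   # payload only
        return round(packets_per_sec), round(mb_per_sec, 1)

    for payload in (8192, 4096, 1414, 1024, 512, 20):
        print(payload, half_duplex(payload))
    # A 1,414-byte payload gives roughly 87,000 packets/sec and about 118 MB/s,
    # matching the 1,518-byte MTU row; the full-duplex column then applies the
    # table's assumed 40% gain.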
In order to achieve these rates, there must be sufficient horsepower present at every point in the FC path: in the server, the switch, and the storage. Many current storage systems on the market have enough horsepower to reach these rates on a single port, but cannot scale at these rates across very many of their unshared ports. An unshared port is one with a dedicated processor behind it. On virtually every system available, there is shared logic behind each pair of ports.
When deploying 4Gb on most current systems on the market, you may see a boost in sequential performance (maybe 1.4x) but an actual loss (up to 2x) for random workloads. This is due to overloading the hardware and microcode with an overly high arrival rate of frames.
Basics
Architecture Overview
(Slide: file system on-disk structures: the pagefile, bitmap, directory header, tables, index, and metadata (file IDs, list of clusters, attributes), with a fragmented file split across parts #2 and #3. Even if only 2 sectors are required, the whole cluster is allocated. Where is access handled? Answer: same as the data, in the client.)
What are you measuring if you address file logical blocks in an NFS-mounted volume?
Multipathing
Options
HDLM
• Use Extended Round Robin with USP
DMP
• Use dmp_pathswitch_blks_shift to define
number of blocks issued per path before
switching (default = 1 MB)
MPXIO
• Use LOAD_BALANCE_LBA to define LBA
region size per path before switching
(typically 32 MB)
VMware
• MRU or FIXED (auto-restore)
General comments
• Up to 4 paths per LUN with USP
• 2 paths per LUN with 9900V range
Overview
Increase Resilience
Increase Performance
• More Paths = More MB/sec
• More Paths = More effective tags per LUN
• More Port Processors = More IOPS
• Full benefit of VDEV striping
Cautions
• Shared processor paths will not increase IOPS
• Loss of sequential detection with round robin access
• Port Buffer overhead with too many Paths (five and over)
• Reduced number of LUNs
HDLM Features
Multipathing
• HDLM enables multiple paths from host to devices, allowing access to
device even if a specific path is unavailable
• Multiple paths also can be used to share I/O workloads and improve
performance
Path Failover
• HDLM automatically redirects I/O operations to alternate paths if a failure
occurs, allowing processing to continue without interruption
• With threat of I/O bottlenecks removed and data paths protected,
performance and reliability increase
Failback
• When a failed path becomes available, HDLM brings the recovered path back online
Ensures the maximum number of paths is always available for load balancing and failover
Load balancing
• HDLM intelligently allocates I/O requests across all available paths to
prevent a heavily loaded path from adversely affecting processing speed
• Load balancing ensures continuous operation at optimum performance
levels, along with improved system and application performance
Path health checking
• HDLM automatically checks the path status at regular user-specified
intervals, eliminating the need to perform repeated manual path status
checks
• Proactive identification of issues
(Slide: two servers, each running applications with HDLM installed.)
*Note: O/S and array dependent; check system requirements for details.
Load Balancing
(Slide: without load balancing, all I/O from the server's applications funnels down one path to the storage volumes and creates an I/O bottleneck; with load balancing, I/O is distributed across the available paths.)
HDLM distributes storage access across multiple paths to improve
I/O performance with load balancing
Bandwidth control at the HBA level, and in conjunction with Global
Availability Manager at the LUN level
When there is more than one path to a single device in a logical unit (LU), Dynamic Link Manager can distribute the load across those paths when issuing I/O commands. Load balancing prevents a heavily loaded path from affecting the performance of the entire system.
Microsoft Windows
• Microsoft Cluster Server
• Oracle RAC
• VERITAS Cluster Server
Sun Solaris
• Sun Cluster
• VERITAS Cluster Server
• Oracle RAC
HP-UX
• MC/Serviceguard
• Oracle RAC
AIX
• HACMP
• VERITAS Cluster Server
• Oracle RAC
Linux
• Redhat AS Bundle Cluster
• SuSE Linux Bundle Cluster
• VERITAS Cluster Server
• Oracle RAC
(Slide: an active host and a standby host in a cluster, each running HDLM with two HBAs, load balancing across paths through the storage system CHAs to a shared LUN.)
Round Robin is a more appropriate scheme for the VSP because Sequential detection
is managed by the owning VSD processor in the VSP and not the port.
Dynamic Link
Manager Version
Error management
function settings
Select the severity of
Log and Trace Levels
Optional Parameters
Load Balancing
Path Health Check
When enabled (default), Dynamic Link Manager monitors all online paths at
specified interval and puts them into Offline(E) or Online(E) status if a
failure is detected.
There is a slight performance penalty due to extra probing I/O.
The default interval is 30 minutes.
Auto Failback
When enabled (not the default), Dynamic Link Manager monitors all
Offline(E) and Online(E) paths at specified intervals and restores them to
online status if they are found to be operational. The default interval is one
minute.
Intermittent Error Monitor
Auto Failback must be On.
Parameters are Monitoring Interval and Number of Times.
Example: Monitoring Interval = 30 minutes
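To make these parameters concrete, here is a minimal sketch of intermittent error monitoring logic. It is not HDLM code, and the values used (a 30-minute monitoring interval and 3 as the number of times) are hypothetical example settings.

    from datetime import datetime, timedelta

    # Minimal sketch of intermittent-error monitoring (not HDLM code).
    # Hypothetical settings: if a path fails 3 or more times within a 30-minute
    # window, treat the error as intermittent and exclude the path from auto failback.
    MONITORING_INTERVAL = timedelta(minutes=30)
    NUMBER_OF_TIMES = 3

    failures = {}   # path name -> timestamps of recent failures

    def record_failure(path, now=None):
        now = now or datetime.now()
        window = [t for t in failures.get(path, []) if now - t <= MONITORING_INTERVAL]
        window.append(now)
        failures[path] = window
        return len(window) >= NUMBER_OF_TIMES   # True = intermittent error detected

    # Example: three failures in quick succession flag the path
    for _ in range(3):
        intermittent = record_failure("path-0")
    print("path-0 intermittent:", intermittent)   # True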
Module Summary
Module Review
Controller-based virtualization
(Slide: UNIX hosts, a Microsoft Windows host, and a z/OS mainframe host attach through a Fibre Channel SAN, or through ESCON or FICON, to the Virtualization Controller (VSP), which presents a single pool of heterogeneous external storage such as EMC Symmetrix and AMS200.)
• No virtualization layer between host and storage controller
• No added complexity or latency between host and storage
• No cracking open of Fibre Channel packets
• Support for mainframe hosts, direct attach, and SAN attached
Features
Users can operate multiple storage systems as if they are all part of
a single storage system.
(Slide: Host 1 and Host 2 connect through a Fibre Channel interface and a switch to the storage system; internal volumes are mapped to virtual volumes.
Legend: volumes installed in the storage system; virtual volumes that do not have physical memory space; lines showing the concept of mapping.)
Open-V
(Slide: a host writes through an FC switch and cache memory to an LDEV whose cache mode is ON and to an LDEV whose cache mode is OFF.)
A read operation returns the data in the local storage cache memory to the host, if it exists, regardless of the cache mode status.
Large Block Streaming Reads and Writes: faster with cached I/O
• D-2-D backup, video, audio, imaging
Random Writes: faster with cached I/O
• Database, file system
Small Block Reads: use either cached or uncached
(Slide: VSP cache and external storage cache, with E-LUNs on the USP V and the HDDs busy in the AMS.
With cached I/O, writes are signalled complete when the data is in VSP cache; normal LRU cache management is used for reads, with temporary write staging. Note: you must be aware of Write Pending.
With uncached I/O, writes are signalled complete when the data is in the external storage cache; normal LRU cache management is used for reads, and a cache partition is advised. The external storage cache applies normal cache management for reads, plus prefetch.)
Migration to a larger LDEV - This function carves the same size LDEV out of a larger
target LDEV. The migration is still same size LDEV to LDEV. Space freed from
carving out the target from a larger LDEV is returned to Array Group free space.
This function increases the target candidates for migrations and is helpful when the
user does not have the same size LDEVs available for the targets.
Benefits
• Improves security
• Assures Quality of Service
• Storage resources optimized to application/business requirements
• Enables departmental view of storage
Improves Security
Virtual Partition Manager restricts access to data and resources from users and
storage administrators without authorization to that partition. It also restricts access
from users and administrators to data and resources outside of their authorized
partition.
Assures Quality of Service
Virtual Partition Manager dedicates resources (for example, cache, disk) for
exclusive use by specific applications to maintain priority and quality of service for
business-critical applications. You can secure and/or restrict access to storage
resources to ensure confidentiality for specific applications. You can also use Virtual
Partition Manager to adjust data storage resources dynamically to satisfy changing
business requirements.
Storage Resources Optimized to Application/Business Requirements
Virtual Partition Manager supports Services Oriented Storage Solutions from
Hitachi Data Systems to match data to appropriate storage resources based on
availability, performance, capacity, and cost. It improves flexibility by allowing
dynamic changes to cache partitions while in use.
Enables Departmental View of Storage
A Departmental view of storage delivers accountability and chargeback, segregation
of workload, facilitates departmental management and control within partitions,
and permits centralized control over departments.
Deleting a CLPR moves the resources (Cache and Parity Groups) back to
CLPR0
Created to allocate the storage system resources into two or more virtual
storage systems, each of which can be accessed only by the storage
administrator/storage partition administrator/users for that partition
Are created for administrative and security reasons
Consists of:
• One or more CLPRs
• One or more Target ports
Inflow Control
(Slide: without Mode 454, inflow control normally occurs in the CLPR at the 70% Write Pending limit; with Mode 454, accelerated destage is triggered earlier, at an average Write Pending level, the Mode 454 average limit.)
Cache Partition Manager is a priced optional feature of the disk array that allows the user data area of the cache to be divided more finely. Each of the divided portions of the cache memory is called a partition. A volume defined in the disk array is assigned to a partition. A user can specify the size of a partition, and the segment size (the size of a unit of data management) of a partition can also be changed. You can therefore optimize data reception from and sending to a host by assigning the most suitable partition to a volume, according to the kind of data to be received from the host.
In a typical cache memory (no partitioning), data is spread throughout cache and
inefficiently allocated. Cache can be fragmented while loading to or from RAID
groups. There is no way for the cache to treat a specific LUN more efficiently,
because it must accommodate all LUNs and all RAID stripes equally.
When handling a lot of small data with lengths shorter than 16KB, you can raise the cache
hit rate and lower response time (read) by specifying data to be processed via partition 2.
By modifying Cache Partitioning Manager to match stripe size, you can lower
overhead and improve cache utilization and efficiency.
Benefits
• Delivers cache environment that can be optimized to specific customer
application requirements
• Less cache required for specific workloads
• Better hit rate for same cache size
• Better optimization of I/O throughput for mixed workloads
Selectable segment size — Customize the cache segment size for a user application
Partitioned cache memory — Decrease negative effect on performance between
applications in a 'storage consolidated' system by dividing the cache memory into
multiple partitions individually used by each application
Selectable stripe size — Increase performance by customizing the disk access size
Multipathing
Load Balancing
Load Balancing Algorithm | Description
Round Robin | Distributes all I/Os among multiple paths.
Extended Round Robin | Distributes I/Os to paths depending on whether the I/O involves sequential or random access:
• For sequential access, a certain number of I/Os are issued to one path in succession. The next path is chosen according to the round robin algorithm.
• For random access, I/Os are distributed to multiple paths according to the round robin algorithm.
Load Balancing Algorithm | Description
Least I/Os | I/O operations are issued to the path that has the least number of I/Os being processed.*
Extended Least I/Os (Default) | I/O operations are issued to the path that has the least number of I/Os being processed:*
• For sequential access, a certain number of I/Os are issued to one path in succession. The next path is chosen according to the least I/Os algorithm.
• For random access, I/Os are issued to multiple paths according to the least I/Os algorithm.
Load Balancing Algorithm | Description
Least Blocks | I/O operations are issued to the path with the least pending I/O block size.*
Extended Least Blocks | I/O operations are issued to the path with the least pending I/O block size:*
• For sequential access, a certain number of I/Os are issued to one path in succession. The next path is chosen according to the least blocks algorithm.
• For random access, I/Os are issued to multiple paths according to the least blocks algorithm.
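To illustrate the difference between these policies, the following is a minimal sketch, not HDLM source code, of round robin and least-I/Os path selection with the "extended" sequential handling described above; the path objects and the outstanding-I/O counter are hypothetical.

    import itertools

    # Minimal sketch of the load-balancing policies described above (not HDLM code).
    class Path:
        def __init__(self, name):
            self.name = name
            self.outstanding = 0   # I/Os currently being processed (hypothetical)

    paths = [Path("path-0"), Path("path-1"), Path("path-2")]
    _rr = itertools.cycle(paths)

    def round_robin():
        # Distribute each I/O to the paths in turn
        return next(_rr)

    def least_ios():
        # Issue the I/O to the path with the fewest I/Os in flight
        return min(paths, key=lambda p: p.outstanding)

    _run = {"path": None, "count": 0}

    def extended(select, sequential, run_length=100):
        # Extended variants keep a run of sequential I/Os (the guide mentions up
        # to 100) on the same path so the array's sequential detection and
        # prefetch remain effective; random I/Os fall back to the base policy.
        if sequential and _run["path"] is not None and _run["count"] < run_length:
            _run["count"] += 1
            return _run["path"]
        _run["path"], _run["count"] = select(), 1
        return _run["path"]

    print([extended(round_robin, True).name for _ in range(3)])    # same path
    print([extended(round_robin, False).name for _ in range(3)])   # rotates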
Multipathing Primer
Round Robin
• Use with AMS 2000 family (uses all available paths).
• HDLM uses only the Owner Path with the legacy modular systems and
AMS 500/1000.
Extended Round Robin
• Use with Enterprise Storage to allow a sequence of sequential I/Os (up to
100) to be handled by the same port and improve prefetch and cache
efficiency.
• Nonsequential I/Os switch paths more frequently.
Load Balancing Off
• Only uses the first discovered or currently active path.
Notes:
Read performance is generally improved with multiple paths.
Write performance can be limited by other factors, for example Write Pending.
Performance may not increase with multiple paths if the number of active LUNs
is less than the number of paths.
HDLM
• Use Extended Round Robin
DMP
• Use dmp_pathswitch_blks_shift to define
number of blocks issued per path before
switching (default = 1 MB)
MPXIO
• Use LOAD_BALANCE_LBA to define LBA
region size per path before switching (typically
32 MB)
VMware
• MRU or FIXED (auto-restore)
AIX MPIO
• Round Robin or None
General comments
• Up to 4 paths per LUN with USP V
Why not to use MPXIO Round Robin with 9500V and AMS 500/1000:
MRU
• Each host will continue to use its most recently used path unless an error occurs.
• When it detects a failure, it will try to failover to another path. If successful, this
becomes the new most recently used path.
• MRU should not be used with AMS 500/1000 or
Thunder 9500 V when there are multiple servers
accessing the same LUNs (VMware cluster).
FIXED
• The host uses the defined path in preference to
any other.
• When it detects a failure, it will try to failover to
another path. If successful, this path is used until
the original path is restored. Access then switches back to the defined Fixed path.
• Fixed should be used with AMS 500/1000 or Thunder 9500 V in a clustered
environment.
Round Robin (when available)
• Will be valid for the AMS 2000 family
Note: All options are valid for the VSP, USP, USP V, and AMS 2000 family.
(Slide: with Round Robin, Dynamic Link Manager alternates I/Os across two paths, for example 1, 3, 5 down one path and 2, 4, 6 down the other, so the stream is no longer seen as sequential.)
• Less efficient cache usage
• Less efficient back-end I/O
• Suboptimal response times
Module Summary
Module Review
Dynamic Provisioning
Fat Provisioning occurs on traditional storage arrays where large pools of storage
capacity are allocated to individual applications but remain unused (that is, not
written to) with storage utilization often as low as 50%.
Thin Provisioning is a mechanism that applies to large-scale centralized computer
disk storage systems. Thin Provisioning allows space to be easily allocated to servers,
on a just-enough and just-in-time basis.
Over Allocation is a mechanism that allows server applications to allocate more
storage capacity than has been physically reserved on the storage array itself. This
allows leeway in growth of application storage volumes, without having to
accurately predict which volumes will grow by how much. Physical storage capacity
on the array is dedicated only when data is actually written by the application, not
when the storage volume is initially allocated.
V-VOL
The illustration shows the allocation, and the timing of allocation, of the pages. A page is a Hitachi Dynamic Provisioning (DP) construct: it is the unit of allocation, and a page is allocated upon receipt of a write that is either the first write to the DP V-Volume or a write that requires additional space because previously allocated pages have filled up.
So the light gray shades represent the allocation, but the data is not written until the
first write is received and settled, at which point the gray turns darker gray.
The point is that subsequent pages are not required to be contiguous and, in fact,
will be randomly spread over the real LDEVs within the DP Pool.
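To make the page-on-write behavior concrete, here is a minimal sketch of a dynamic mapping table. It assumes a 42MB page size and allocation on first write; the class and its fields are illustrative, not the actual microcode structures.

    # Minimal sketch of thin-provisioned page allocation (illustrative only).
    # Assumes a 42MB page allocated from the pool on the first write that touches
    # it; unwritten regions of the virtual volume consume no pool capacity.
    PAGE_SIZE = 42 * 1024 * 1024   # bytes

    class DpVolume:
        def __init__(self, virtual_size, free_pool_pages):
            self.virtual_size = virtual_size
            self.free = free_pool_pages      # ids of free pool pages
            self.mapping = {}                # virtual page index -> pool page id

        def write(self, offset, length):
            first = offset // PAGE_SIZE
            last = (offset + length - 1) // PAGE_SIZE
            for page in range(first, last + 1):
                if page not in self.mapping:              # allocate on first write only
                    self.mapping[page] = self.free.pop()  # pages need not be contiguous
            # (the data itself would then be written to the mapped pool pages)

        def pool_consumed(self):
            return len(self.mapping) * PAGE_SIZE

    vol = DpVolume(virtual_size=2 * 1024**4, free_pool_pages=list(range(10_000)))
    vol.write(0, 4096)     # a single 4KB write consumes one whole 42MB page
    print(vol.pool_consumed() // (1024 * 1024), "MB of pool consumed")   # 42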
Benefits
Overview
Components
(Slide: host reporting of logical/virtual volumes (HDP-VOLs); physical capacity is provided by Pool-VOLs; the controller's Dynamic Mapping Table links the two.)
File System (OS) | Metadata write pattern | Notes
HFS | Write all @ 10MB interval | Virtual Volume Size = Pool Consumed
NTFS (Microsoft Server 2003) | Write at beginning only | Expected benefits high
XFS (Linux) | Write all @ 2GB interval | Expected benefits high
Ext2, Ext3 (Linux) | Write all @ 128MB interval | FS create will use 30% of DP volume capacity
UFS (Solaris) | Write all @ 52MB interval | Virtual Volume Size = Pool Consumed
This shows how different file systems deal with their meta data. If it is written once
in a single location, it will work well with HDP. If it writes meta data at regular
intervals over the whole filesystem, it will not work well with HDP because that
would cause the whole virtual volume to be fully provisioned at the start. The real
message is that there are some OS/FS that are great for HDP, some that are not, and
some in the middle. Understanding this leads to establishment of best practices.
The table above shows the potential capacity use by file system type for each OS (the file system is created with parameters set to default). The OS and file system should be carefully selected when using Dynamic Provisioning.
Dynamic Tiering
POOL A
(Slide: Pool A contains multiple tiers in one pool: Tier 1 (EFD/SSD) and Tier 2 (SAS).)
Different tiers of storage are now in one pool. If data becomes less active, it migrates to lower-level tiers, based on when it was last referenced.
The pool contains multiple tiers (not the other way around like in USP V/HDP or
USP/HTSM).
The logical volumes have pages mapped to the pool (same as USP V/HDP). Those
pages can be anywhere in the pool on any tier in that pool.
The pages can move (migrate) within the pool for performance optimization
purposes (move up/down between tiers).
HDT will try to use as much of the higher tiers as possible. (T1 and T2 will be used
as much as possible while T3 will have more spare capacity.)
You can add capacity to any tier at any time. You can also remove capacity
dynamically. So, sizing a tier for a pool is a lot easier.
Quantity added/removed should be in ARRAY Group quantities.
The first version of HDT (with VSP at GA):
Up to a maximum 3 tiers in a pool.
We will start with managing resources in a 3 tier approach. That may mean:
1-Flash drives, 2-SAS, 3-SATA or
1-SAS(15k), 2-SAS(10k), 3-SATA (or something else)
The Pool’s tiers are defined by HDD type
No support for RAID-10
No external storage supported (in v01)
No mainframe support in v01
Business Value
(Slide: a quiet data set migrates to lower tiers.)
• Significant savings by moving data to lower cost tiers
• Increase storage utilization up to 50%
• Easily align business application needs to the right cost infrastructure
With Hitachi Dynamic Tiering, the complexities and overhead of implementing data
lifecycle management and optimizing use of tiered storage are solved. Dynamic Tiering
simplifies storage administration by eliminating the need for time consuming manual
data classification and movement of data to optimize usage of tiered storage.
Hitachi Dynamic Tiering automatically moves data on fine-grain pages within Dynamic
Tiering virtual volumes to the most appropriate media according to workload to
maximize service levels and minimize TCO of storage.
For example, a database index that is frequently read and written will migrate to high
performance flash technology while older data that has not been touched for a while will
move to slower, cheaper disks.
No elaborate decision criteria are needed; data is automatically moved according to
simple rules. One, two, or three tiers of storage can be defined and used within a single
virtual volume, using any of the storage media types available for the Hitachi Virtual
Storage Platform. Tier creation is automatic based on user configuration policies,
including media type and speed, RAID level, and sustained I/O level requirements.
Using ongoing embedded performance monitoring and periodic analysis, the data is
moved at a fine grain page level to the most appropriate tier. The most active data moves
to the highest tier. During the process, the system automatically maximizes the use of
storage keeping the higher tiers fully utilized.
LDEV Design
HDT Tiers
Tier Management
If you select Auto, performance monitoring and tier relocation are automatically
performed.
If you select Manual, you can manually perform performance monitoring and tier
relocation with the CLI commands.
Cycle Time
You can also select ½, 1, 2, 4 or 8 hour intervals.
When you select 24 Hours, the Monitoring Period can be specified.
Monitoring Period field
Specify the start and end time of performance monitoring; the default is 00:00 to 23:59. At least one hour must fall between the starting time and the ending time, and the starting time must be before the ending time.
You can view the information gathered by performance monitoring with Storage Navigator or Tuning Manager.
When you select ½ Hour, 1 Hour, 2 Hours, 4 Hours or 8 Hours, the Monitoring Period field cannot be specified.
Cycle Management
• Auto Tier Cycle Management
▪ 24 hours or less: 8, 4, 2, 1 hours, 30 min
▪ Anchored at midnight 00:00
(Slide: cycle timeline from 4/19 0:00 through 4/22 0:00.)
There are two concurrent pool tasks that occur repeatedly over time which give the
HDT Pool its functionality and behavior — Performance Monitoring tasks and Page
Migration tasks.
An HDT Pool can be configured so that the Monitoring and Migration tasks occur in
a cyclic fashion, triggered based on automated time settings. An HDT Pool can also
be configured for Manual operations.
At the end of a Monitoring phase, the collected page access metrics are frozen and
analyzed and are used to determine which data pages will be migrated and to which
tier during the following Migration phase. When the Pool is operating under Auto
configuration, the next Monitoring cycle will begin and will collect page access
metrics even while the Migration phase is running.
The HDT Monitoring logic excludes system internally-generated I/O such as that
generated by page migration or I/O that results from a drive sparing or correction
copy operation.
The Pool and/or Tier status identifies whether either or both of the cycle processes
are running:
Status | Meaning
STP | “stopped”: neither the monitoring cycle nor the relocation cycle is running
MON | “monitoring”: the monitoring cycle is running, the relocation cycle is not running
RLC | “relocation”: the relocation cycle is running, the monitoring cycle is not running
RLM | “relocation and monitoring”: both the relocation and monitoring cycles are running concurrently
These are the status codes that are reported when the pool and tier data is reported
by the raidcom get dp_pool CLI command.
The status of the monitoring and relocation cycles is also visible in the Storage
Navigator 2 GUI.
Automatic, Manual or Continuous cycle settings can be set and changed either
through Storage Navigator 2 GUI and/or using CLI commands. An HDT pool’s
cycle control can be changed at any time.
When an HDT pool is configured for Automatic cycle control, the cycle period can be
set at one of the available intervals: 30 minutes, 1 hour, 2 hours, 4 hours, 8 hours or
24 hours. A cycle always starts at the “start of an hour.” That is, the next cycle will
start at the next “00” minutes of the next hour. It is important that you understand
this as you anticipate the cycle processing and availability of monitoring data from
the “next” full and successful monitoring cycle.
When an HDT pool is configured for Manual operation, the storage administrator
can start and stop monitoring and relocation cycles using either Storage Navigator 2
GUI and/or CLI raidcom command(s). The maximum duration for a manual
monitoring cycle is 7 days.
More recently, with VSP microcode version 2, the option of “Continuous mode”
monitoring is supported. Continuous mode aggregates the collected monitoring
metrics across multiple monitoring periods. In Periodic mode, each monitoring cycle
starts new with counting the page level accesses. The new Continuous mode results
in different page promotion and demotion behavior.
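The practical difference between Periodic and Continuous monitoring can be sketched as follows. The 50/50 weighting used for the continuous aggregation is purely an assumption for illustration; the actual aggregation performed by the microcode is not described here.

    # Illustrative sketch only: contrast Periodic and Continuous monitoring modes.
    def periodic(cycle_counts):
        # Each cycle starts counting page accesses from zero, so only the most
        # recent cycle drives promotion and demotion decisions.
        return cycle_counts[-1]

    def continuous(cycle_counts, weight=0.5):
        # Aggregate metrics across cycles (assumed weighting) so that a short
        # burst or a quiet spell does not immediately move a page between tiers.
        score = 0.0
        for count in cycle_counts:
            score = weight * score + (1 - weight) * count
        return score

    history = [400, 20, 25, 30]        # page accesses per cycle (hypothetical)
    print("periodic:", periodic(history))       # 30: looks cold, likely demoted
    print("continuous:", continuous(history))   # ~49: still remembers the burst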
(Slide: first scenario. Tier performance: the SSD, SAS, and SATA tiers are all not busy. Tier capacity: the SSD and SAS tiers are full; the SATA tier still has capacity available.)
• No problem in any tier; all tiers can handle more I/Os.
• The SSD and SAS tiers are used up, but their performance potential is still available.
• Full HDT pool. No actions required.
(Slide: second scenario. Tier performance: the SSD tier is not busy, but the SAS tier is busy. Tier capacity: the SSD and SAS tiers are full; the SATA tier still has capacity available.)
• No problem in the SSD tier, which can handle more I/Os.
• SAS capacity is used up, and its performance needs your attention.
• Need to add more SAS drives.
Migration Decisions
To avoid pages bouncing in and out of a tier, data in the gray zone is
not migrated.
Does relocation hurt?
• Relocation is capped at 3TB/day (36MB/s). Very rare.
• Similar to rebalance.
• A pool will have a minimum of 4*SATA AG.
• Supports 233MB/s sequential write (16TB/day).
• 18% absolute worst case. SAS: 5%.
(Slide: Tier 1, Tier 2, and Tier 3, with a gray zone between tiers.)
Overview
Measurements
Module Summary
Module Review
Fulfillment of an expectation.
Performance = Reality – Expectations
Happiness = Reality – Expectations
Performance = Happiness
Measure Reality
• Establish comprehensive data collection
Ask about Customer Expectations
• Quantifiable expectations exist?
• How are customer expectations not being met?
Fulfillment of an Expectation
If both performance and happiness = the same thing (reality minus expectations),
then it follows that performance must equal happiness
Measure Reality
Establish comprehensive data collection
Ask about Customer Expectations
Quantifiable expectations exist?
Throughput (IOPS, MB/sec); response time
How are customer expectations not being met?
Specific targets; Timing or circumstances; How do they know they are unhappy?
Reporting Intervals
(Slide: utilization thresholds at 50% and 75%: design/build for normal operation, and design/build for failure operation, allowing for utilization during failure modes. Traffic light system explained in the notes below.)
Note:
• Some applications are more sensitive than others.
• OLTP, mail, and some databases are response-critical.
• Backup and streaming applications can push utilization to a much higher level without it ever being a problem.
• For example, 80% may be red for a database, but green for a backup or batch application.
RAID-5 is lower cost per MB and performs well for most workloads
except highly random write workloads.
If the workload is material, and the:
• Percentage of random write operations is greater than 20%, then RAID-
10 is generally indicated.
• Percentage of random write operations is greater than 5% and the
workload is highly sensitive to response time, then RAID-10 is generally
indicated.
• Otherwise, RAID-5 is often recommended for its superior cost
effectiveness.
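These rules of thumb can be written down as a simple decision aid. The sketch below encodes only the thresholds stated above; the function name and inputs are illustrative.

    # Minimal sketch of the RAID level guidance above (illustrative only).
    def recommend_raid(material_workload, random_write_pct, response_time_sensitive):
        """Suggest a RAID level for a workload, per the rules of thumb above."""
        if material_workload:
            if random_write_pct > 20:
                return "RAID-10"
            if random_write_pct > 5 and response_time_sensitive:
                return "RAID-10"
        return "RAID-5"   # generally the more cost-effective choice otherwise

    print(recommend_raid(True, 25, False))    # RAID-10
    print(recommend_raid(True, 10, True))     # RAID-10
    print(recommend_raid(True, 10, False))    # RAID-5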
Write Pending is cache occupied by host writes that have yet to be destaged (written) to disk.
High write pending is caused by the rate of host writes to cache exceeding the storage system's ability to transfer data from cache to disk, that is, by inadequate back-end storage access resources to service the host write workload.
Cache Mode:
Enabled, I/O complete issued to host immediately.
More likely to cause Write Pending problems
Disabled, I/O complete issued to host after I/O complete received from external
storage.
Less likely to cause Write Pending problems
Write pending should be among the first metrics checked during any
troubleshooting procedure.
• Easy to check and evaluate.
• Write pending problems impact all servers in a cache partition attempting
to write to the storage systems.
When write pending is elevated, the cause is usually identified by
locating an intense write workload whose timing coincides with the
elevated write pending levels.
Enterprise reports: Array Group Utilization or Array Group Busy %.
A Write Hit Rate % of less than 100% is another indication of high write pending.
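One way to apply this in practice is to correlate write-pending samples with per-LDEV (or per-host) write rates over the same intervals. The sketch below assumes both series have already been exported as per-interval samples, for example from Tuning Manager reports; the data structures and the 35% threshold (the guideline from the metrics tables earlier) are used purely for illustration.

    # Minimal sketch: rank LDEVs whose write workload coincides with elevated
    # write pending (illustrative; inputs are hypothetical per-interval exports).
    WP_THRESHOLD = 35.0   # % of cache used for write pending considered elevated

    def suspects(write_pending_pct, writes_by_ldev):
        """write_pending_pct: WP % per interval; writes_by_ldev: {ldev: MB/s per interval}."""
        hot = [i for i, wp in enumerate(write_pending_pct) if wp > WP_THRESHOLD]
        if not hot:
            return []
        scores = {}
        for ldev, rates in writes_by_ldev.items():
            during = sum(rates[i] for i in hot) / len(hot)
            overall = sum(rates) / len(rates)
            scores[ldev] = during - overall    # how much busier during elevated WP
        return sorted(scores, key=scores.get, reverse=True)

    wp = [10, 12, 45, 50, 11]                                  # hypothetical samples
    writes = {"LDEV 00:10": [5, 6, 80, 90, 5],
              "LDEV 00:20": [20, 22, 21, 19, 20]}
    print(suspects(wp, writes))    # LDEV 00:10 ranks first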
(Slide: modular HDP pool layout. LU #1 is spread across RAID Groups #1 through #4, with each RAID Group in one tray, so that all SAS ports are equally balanced and a single LUN gets the best performance. 6D+1P (7 disks) is the magic number, or sensible variations. Disable the automatic internal Load Balancing function.)
Module Summary
Module Review
Chat
Q&A
Feedback Options
• Raise Hand
• Yes/No
• Emoticons
Markup Tools
• Drawing Tools
• Text Tool
Automatic
• With Intercall / WebEx Teleconference Call-Back Feature
Otherwise
• To transfer your audio from Main Room to virtual Breakout Room
1. Enter *9
2. You will hear a recording – follow instructions
3. Enter Your Assigned Breakout Room number #
For example, *9 1# (Breakout Room #1)
• To return your audio to Main Room
Enter *9
800.374.1852
Simulated Labs
Labs provide
• A video demonstration
• A practice mode
• An online help – a detailed lab guide that steps through the lab practice.
You may want to first watch the video demonstration.
Then, learn while practicing.
Note:
After you have completed the course, if you have access to the Learning Center, the
course and Simulated Labs are recorded in your Learning History. To access the
Simulated Labs to refresh your learning or for additional practice:
1. Log on to the Learning Center > My Learning > All Learning Activity > My
Completed Courses.
2. Clear date fields, then select Search to view complete learning history.
3. Find the course and hover over Actions, click View Learning Assignments.
4. Launch the lab module.
5. If it is a current course:
i. Locate the appropriate course and click View Sessions & Progress to access the lab module.
(Screen example: the Learning Center lists the course with delivery type Virtual Instructor Led, its session date and time, location, status, and the actions View Sessions & Progress and Drop; the Simulated Lab content module is listed as required, with unlimited attempts allowed and a Launch action.)
• Validate your knowledge and skills with certification
• Check your progress in the learning paths
• Collaborate and share with fellow HDS colleagues
Learning Center:
http://learningcenter.hds.com
LinkedIn:
http://www.linkedin.com/groups?home=&gid=3044480&trk=anet_ug_hm&
goback=%2Emyg%2Eanb_3044480_*2
Twitter:
http://twitter.com/#!/HDSAcademy
White Papers:
http://www.hds.com/corporate/resources/
Certification:
http://www.hds.com/services/education/certification
Learning Paths:
APAC:
http://www.hds.com/services/education/apac/?_p=v#GlobalTabNavi
Americas:
http://www.hds.com/services/education/north-
america/?tab=LocationContent1#GlobalTabNavi
EMEA:
http://www.hds.com/services/education/emea/#GlobalTabNavi
theLoop:
http://loop.hds.com/index.jspa ― HDS internal only
ACC — Action Code. A SIM (System Information Message).
ACE — Access Control Entry. Stores access rights for a single user or group within the Windows security model.
ACL — Access Control List. Stores a set of ACEs, so that it describes the complete set of access rights for a file system object within the Microsoft Windows security model.
ACP — Array Control Processor. Microprocessor mounted on the disk adapter circuit board (DKA) that controls the drives in a specific disk array. Considered part of the back end; it controls data transfer between cache and the hard drives.
ACP Domain — Also Array Domain. All of the array-groups controlled by the same pair of DKA boards, or the HDDs managed by 1 ACP PAIR (also called BED).
ACP PAIR — Physical disk access control logic. Each ACP consists of 2 DKA PCBs to provide 8 loop paths to the real HDDs.
Actuator (arm) — Read/write heads are attached to a single head actuator, or actuator arm, that moves the heads around the platters.
AD — Active Directory.
ADC — Accelerated Data Copy.
Address — A location of data, usually in main memory or on a disk. A name or token that identifies a network component. In local area networks (LANs), for example, every node has a unique address.
ADP — Adapter.
ADS — Active Directory Service.
AIX — IBM UNIX.
AL-PA — Arbitrated Loop Physical Address.
AMS — Adaptable Modular Storage.
APAR — Authorized Program Analysis Reports.
APF — Authorized Program Facility. In IBM z/OS and OS/390 environments, a facility that permits the identification of programs that are authorized to use restricted functions.
API — Application Programming Interface.
APID — Application Identification. An ID to identify a command device.
Application Management — The processes that manage the capacity and performance of applications.
ARB — Arbitration or request.
ARM — Automated Restart Manager.
Array Domain — Also ACP Domain. All functions, paths, and disk drives controlled by a single ACP pair. An array domain can contain a variety of LVI or LU configurations.
Array Group — Also called a parity group. A group of hard disk drives (HDDs) that form the basic unit of storage in a subsystem. All HDDs in a parity group must have the same physical capacity.
Array Unit — A group of hard disk drives in 1 RAID structure. Same as parity group.
ASIC — Application specific integrated circuit.
ASSY — Assembly.
Asymmetric virtualization — See Out-of-band virtualization.
Asynchronous — An I/O operation whose initiator does not await its completion before proceeding with other work. Asynchronous I/O operations enable an initiator to have
BED — Back end director. Controls the paths to the HDDs.
Big Data — Refers to data that becomes so large in size or quantity that a dataset becomes awkward to work with using traditional database management systems. Big data entails data capacity or measurement that requires terms such as Terabyte (TB), Petabyte (PB), Exabyte (EB), Zettabyte (ZB) or Yottabyte (YB). Note that variations of this term are subject to proprietary trademark disputes in multiple countries at the present time.
Cache hit rate — When data is found in the cache, it is called a cache hit, and the effectiveness of a cache is judged by its hit rate.
Cache partitioning — Storage management software that allows the virtual partitioning of cache and allocation of it to different applications.
CAD — Computer-Aided Design.
Capacity — Capacity is the amount of data that a storage system or drive can store after configuration and/or formatting.
CP — Central Processor (also called Processing Unit or PU).
CXRC — Coupled z/OS Global Mirror.
-back to top-
Data Migration — The process of moving data from 1 storage device to another. In this context, data migration is the same as Hierarchical Storage Management (HSM).
Data Pipe or Data Stream — The connection set up between the MediaAgent, source or destination server is called a Data Pipe or more commonly a Data Stream.
Data Pool — A volume containing differential data only.
Data Protection Directive — A major compliance and privacy protection initiative within the European Union (EU) that applies to cloud computing. Includes the Safe Harbor Agreement.
Data Stream — CommVault's patented high performance data mover used to move data back and forth between a data source and a MediaAgent or between 2 MediaAgents.
Data Striping — Disk array data mapping technique in which fixed-length sequences of virtual disk data addresses are mapped to sequences of member disk addresses in a regular rotating pattern.
Data Transfer Rate (DTR) — The speed at which data can be transferred. Measured in kilobytes per second for a CD-ROM drive, in bits per second for a modem, and in megabytes per second for a hard drive. Also, often called data rate.
DBMS — Data Base Management System.
DCA — Data Cache Adapter.
DDL — Database Definition Language.
DDM — Disk Drive Module.
DDNS — Dynamic DNS.
DE — Data Exchange Software.
DFSMShsm — Data Facility Storage Management Subsystem Hierarchical Storage Manager.
DFSMSrmm — Data Facility Storage Management Subsystem Removable Media Manager.
DFSMStvs — Data Facility Storage Management Subsystem Transactional VSAM Services.
DFW — DASD Fast Write.
DICOM — Digital Imaging and Communications in Medicine.
DIMM — Dual In-line Memory Module.
Direct Access Storage Device (DASD) — A type of storage device, in which bits of data are stored at precise locations, enabling the computer to retrieve information directly without having to scan a series of records.
Direct Attached Storage (DAS) — Storage that is directly attached to the application or file server. No other device on the network can access the stored data.
Director class switches — Larger switches often used as the core of large switched fabrics.
Disaster Recovery Plan (DRP) — A plan that describes how an organization will deal with potential disasters. It may include the precautions taken to either maintain or quickly resume mission-critical functions. Sometimes also referred to as a Business Continuity Plan.
Disk Administrator — An administrative tool that displays the actual LU storage configuration.
Disk Array — A linked group of 1 or more physical independent hard disk drives generally used to replace larger, single disk drive systems. The most common disk arrays are in daisy chain configuration or implement RAID (Redundant Array of Independent Disks) technology.
DKA — Disk Adapter. Also called an array control processor (ACP); it provides the control functions for data transfer between drives and cache. The DKA contains DRR (Data Recover and Reconstruct), a parity generator circuit.
DKC — Disk Controller Unit. In a multi-frame configuration, the frame that contains the front end (control and memory components).
DKCMN — Disk Controller Monitor. Monitors temperature and power status throughout the machine.
DKF — Fibre disk adapter. Another term for a DKA.
DKU — Disk Array Frame or Disk Unit. In a multi-frame configuration, a frame that contains hard disk units (HDUs).
DKUPS — Disk Unit Power Supply.
DLIBs — Distribution Libraries.
DKUP — Disk Unit Power Supply.
DLM — Data Lifecycle Management.
DMA — Direct Memory Access.
DM-LU — Differential Management Logical Unit. DM-LU is used for saving management information of the copy functions in the cache.
DSB — Dynamic Super Block.
DSF — Device Support Facility.
DSF INIT — Device Support Facility Initialization (for DASD).
DSP — Disk Slave Program.
DTA — Data adapter and path to cache-switches.
DTR — Data Transfer Rate.
DVE — Dynamic Volume Expansion.
DW — Duplex Write.
DWDM — Dense Wavelength Division Multiplexing.
DWL — Duplex Write Line or Dynamic Workspace Linking.
-back to top-
—E—
EAV — Extended Address Volume.
EB — Exabyte.
EC — Enterprise Class (in contrast with BC, Business Class).
ECC — Error Checking and Correction.
ECC.DDR SDRAM — Error Correction Code Double Data Rate Synchronous Dynamic RAM Memory.
ECM — Extended Control Memory.
ECN — Engineering Change Notice.
iFCP — Internet Fibre Channel Protocol. Allows an organization to extend Fibre Channel storage networks over the Internet by using TCP/IP. TCP is responsible for managing congestion control as well as error detection and recovery services. iFCP allows an organization to create an IP SAN fabric that minimizes the Fibre Channel fabric component and maximizes use of the company's TCP/IP infrastructure.
IFL — Integrated Facility for LINUX.
IHE — Integrating the Healthcare Enterprise.
IID — Initiator ID.
IIS — Internet Information Server.
ILM — Information Life Cycle Management.
ILO — (Hewlett-Packard) Integrated Lights-Out.
IML — Initial Microprogram Load.
IMS — Information Management System.
In-band virtualization — Refers to the location of the storage network path, between the application host servers in the storage systems. Provides both control and data
ISC — Initial shipping condition or Inter-System Communication.
iSCSI — Internet SCSI. Pronounced eye skuzzy. An IP-based standard for linking data storage devices over a network and transferring data by carrying SCSI commands over IP networks.
ISE — Integrated Scripting Environment.
iSER — iSCSI Extensions for RDMA.
ISL — Inter-Switch Link.
iSNS — Internet Storage Name Service.
ISOE — iSCSI Offload Engine.
ISP — Internet service provider.
ISPF — Interactive System Productivity Facility.
ISPF/PDF — Interactive System Productivity Facility/Program Development Facility.
ISV — Independent Software Vendor.
ITaaS — IT as a Service. A cloud computing business model. This general model is an umbrella model that entails the SPI business model (SaaS, PaaS and IaaS — Software, Platform and Infrastructure as a Service).
-back to top-
JMS — Java Message Service.
JNL — Journal.
JNLG — Journal Group.
JRE — Java Runtime Environment.
JVM — Java Virtual Machine.
J-VOL — Journal Volume.
-back to top-
—K—
KSDS — Key Sequence Data Set.
kVA — Kilovolt Ampere.
KVM — Kernel-based Virtual Machine or Keyboard-Video Display-Mouse.
kW — Kilowatt.
-back to top-
—L—
LACP — Link Aggregation Control Protocol.
LAG — Link Aggregation Groups.
LAN — Local Area Network. A communications network that serves clients within a geographical area, such as a building.
LBA — Logical block address. A 28-bit value that maps to a specific cylinder-head-sector address on the disk.
LC — Lucent connector. Fibre Channel connector that is smaller than a simplex connector (SC).
LCDG — Link Processor Control Diagnostics.
LCM — Link Control Module.
LDKC — Logical Disk Controller or Logical Disk Controller Manual.
LDM — Logical Disk Manager.
LDS — Linear Data Set.
LED — Light Emitting Diode.
LFF — Large Form Factor.
LIC — Licensed Internal Code.
LIS — Laboratory Information Systems.
LLQ — Lowest Level Qualifier.
LM — Local Memory.
LMODs — Load Modules.
LNKLST — Link List.
Load balancing — The process of distributing processing and communications activity evenly across a computer network so that no single device is overwhelmed. Load balancing is especially important for networks where it is difficult to predict the number of requests that will be issued to a server. If 1 server starts to be swamped, requests are forwarded to another server with more capacity. Load balancing can also refer to the communications channels themselves.
LOC — "Locations" section of the Maintenance Manual.
Logical DKC (LDKC) — Logical Disk Controller Manual. An internal architecture extension to the Control Unit addressing scheme that allows more LDEVs to be identified within 1 Hitachi enterprise storage system.
MPI — (Electronic) Master Patient Identifier. Also known as EMPI.
NIS — Network Information Service (originally called the Yellow Pages or YP).
R-JNL — Secondary journal volumes.
RKAJAT — Rack Additional SATA disk tray.
RTO — Recovery Time Objective. The length of time that can be tolerated between a disaster and recovery of data.
SFP — Small Form-Factor Pluggable module Host connector. A specification for a new generation of optical modular transceivers. The devices are designed for use with small form factor (SFF) connectors, offer high speed and physical compactness, and are hot-swappable.
SHSN — Shared memory Hierarchical Star Network.
SID — Security Identifier. A user or group identifier within the Microsoft Windows security model.
SIGP — Signal Processor.
SIM — (1) Service Information Message. A message reporting an error that contains fix guidance information. (2) Storage Interface Module. (3) Subscriber Identity Module.
SMTP — Simple Mail Transfer Protocol.
SMU — System Management Unit.
Snapshot Image — A logical duplicated volume (V-VOL) of the primary volume. It is an internal volume intended for restoration.
SNIA — Storage Networking Industry Association. An association of producers and consumers of storage networking products, whose goal is to further storage networking technology and applications. Active in cloud computing.
SNMP — Simple Network Management Protocol. A TCP/IP protocol that was designed for management of networks over TCP/IP, using agents and stations.
SOA — Service Oriented Architecture.
—Y—
YB — Yottabyte.
Yottabyte — A highest-end measurement of data
at the present time. 1YB = 1,024ZB, or 1
quadrillion GB. A recent estimate (2011) is
that all the computer hard drives in the
world do not contain 1YB of data.
-back to top-
—Z—
z/OS — z Operating System (IBM® S/390® or
z/OS® Environments).
z/OS NFS — (System) z/OS Network File System.
z/OSMF — (System) z/OS Management Facility.
zAAP — (System) z Application Assist Processor
(for Java and XML workloads).
Zettabyte (ZB) — A high-end measurement of
data at the present time. 1ZB = 1,024EB.
zFS — (System) zSeries File System.
zHPF — (System) z High Performance FICON.
zIIP — (System) z Integrated Information
Processor (specialty processor for database).
Zone — A collection of Fibre Channel Ports that
are permitted to communicate with each
other via the fabric.