
Data Collection Tools
Module 17
Data ONTAP 7-Mode Administration
Module Objectives

By the end of this module, you should be able to:


 Use the sysstat, stats, and statit
commands
 Describe the factors that affect RAID performance
 Execute commands to collect data about write and
read throughput
 Execute commands to verify the operation of
hardware, software, and network components
 Identify commands and options used to obtain
configuration and status

© 2011 NetApp, Inc. All rights reserved.


System Health

Performance problems can originate from multiple sources. To avoid some of these
problems, check or monitor the following:
 Disk configuration
– Disk status
– Read and write performance
 RAID configuration
 Connectivity configuration
 Performance measures





Disk Status
 Monitor disks:
– shelfchk
– sasadmin
– led_on diskid and led_off diskid
(priv set advanced command)
 Storage Health Monitor:
– Is a simple storage system management service
– Is automatically initiated during system boot
– Provides background monitoring of individual disk
performance
– Detects impending disk problems before they actually
occur
– disk shm_stats (priv set advanced command)



Syslog Messages
 shm: disk has reported a predicted
failure (PFA) event: disk XX,
serial_number XXXX
 shm: link failure detected, upstream
from disk: id XX, serial_number XXXXX
 shm: disk I/O completion times too long:
disk XX, serial number XXXXX
 shm: possible link errors on disk: id
XX, serial number XXXXX
 shm: disk returns excessive recovered
errors: disk XX, serial number XXXXX
 shm: intermittent instability on the
loop that is attached to Fibre Channel
adapter: id XXX, name XXXXX
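The messages above can be sorted automatically when scanning a saved copy of the syslog. The following is a minimal sketch, not a NetApp tool: the phrases are copied from the messages listed above, while the category labels and the function name classify_shm are hypothetical names chosen for this example.

```python
import re

# Phrases copied from the shm syslog messages listed above. The labels on
# the right are hypothetical names chosen for this sketch.
SHM_PATTERNS = {
    "predicted failure (PFA)": "predicted_failure",
    "link failure detected": "link_failure",
    "I/O completion times too long": "slow_io",
    "possible link errors": "link_errors",
    "excessive recovered errors": "recovered_errors",
    "intermittent instability": "loop_instability",
}


def classify_shm(line):
    """Return a label for an shm message, or None if the line is not one."""
    if "shm:" not in line:
        return None
    # Collapse whitespace so a message wrapped across lines still matches.
    text = re.sub(r"\s+", " ", line)
    for phrase, label in SHM_PATTERNS.items():
        if phrase in text:
            return label
    return "unknown_shm"


print(classify_shm("shm: disk I/O completion times too long: disk XX, serial number XXXXX"))
```

Counting the labels over time gives a rough view of which disks or loops generate the most warnings.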



Read and Write
Performance



Read Performance
 The Data ONTAP® operating system is
optimized for write performance.
 Read performance can decrease over time,
although efficient use of cache can offset some
disk performance issues.
 To manage read performance:
– To measure optimization level:
system> reallocate measure [vol | file]
– To optimize a system for read performance:
system> reallocate start pathname



Write Performance Commands
Use these commands to research write performance:

Command   Function
sysstat   Displays system-wide real-time statistics for an interval (in seconds)
stats     Displays system-wide real-time performance data averaged over an interval (in seconds)
statit    Displays collected disk utilization



Write Performance: sysstat Command

system> sysstat -c 10 -s 5
 CPU   NFS  CIFS  HTTP    Net kB/s   Disk kB/s   Tape kB/s  Cache
                          in   out  read write  read write    age
  2%     0     0     0     0     0     9    23     0     0    >60
  0%     0     0     0     0     0     0     0     0     0    >60
  5%     0     0     0     0     0    21    27     0     0    >60
  1%     0     0     0     0     0     0     0     0     0    >60
  5%     0     0     0     0     0    20    28     0     0    >60
  1%     0     0     0     0     0     0     0     0     0    >60
  4%     0     0     0     0     0    21    26     0     0    >60
  1%     0     0     0     0     0     0     0     0     0    >60
  5%     0     0     0     0     0    22    27     0     0    >60
  0%     0     0     0     0     0     0     0     0     0    >60
--
Summary Statistics (10 samples 5.0 secs/sample)
 CPU   NFS  CIFS  HTTP    Net kB/s   Disk kB/s   Tape kB/s  Cache
                          in   out  read write  read write    age
Min
  0%     0     0     0     0     0     0     0     0     0    >60
Avg
  2%     0     0     0     0     0     9    13     0     0    >60
Max
  5%     0     0     0     0     0    22    28     0     0    >60
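The summary rows can be reproduced from captured sample rows. The sketch below is an illustration only: it assumes the 11-column layout shown in the sample above, and the names SAMPLE, summarize, DISK_READ, and DISK_WRITE are hypothetical.

```python
# Sample rows copied from the sysstat output above. Columns: CPU, NFS, CIFS,
# HTTP, net in, net out, disk read, disk write, tape read, tape write, cache age.
SAMPLE = """\
2% 0 0 0 0 0 9 23 0 0 >60
0% 0 0 0 0 0 0 0 0 0 >60
5% 0 0 0 0 0 21 27 0 0 >60
1% 0 0 0 0 0 0 0 0 0 >60
5% 0 0 0 0 0 20 28 0 0 >60
1% 0 0 0 0 0 0 0 0 0 >60
4% 0 0 0 0 0 21 26 0 0 >60
1% 0 0 0 0 0 0 0 0 0 >60
5% 0 0 0 0 0 22 27 0 0 >60
0% 0 0 0 0 0 0 0 0 0 >60
"""

DISK_READ, DISK_WRITE = 6, 7  # zero-based column positions (assumed layout)


def summarize(text, column):
    """Return (min, mean, max) of one numeric column across the sample rows."""
    values = []
    for row in text.splitlines():
        fields = row.split()
        # Skip headers and separators: data rows have 11 fields and begin
        # with a CPU percentage.
        if len(fields) != 11 or not fields[0].endswith("%"):
            continue
        values.append(int(fields[column].rstrip("%")))
    return min(values), sum(values) / len(values), max(values)


print(summarize(SAMPLE, DISK_READ))
print(summarize(SAMPLE, DISK_WRITE))
```

The means for disk read and disk write (9.3 and 13.1) agree with the Avg row above once the fractional part is dropped.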



Performance Counters
 Counters are organized in an object-instance-
counter hierarchy.
 Counters are collected from Counter Manager.
 The stats command allows users to look at any
object-instance and the corresponding counter (and
supports preset files).

Example hierarchy:
volume (object)
  vol1 (instance)   avg_latency:53.18us (counter)
  vol2 (instance)   avg_latency:53.18us (counter)



Counter Manager

[Diagram: Counter Manager (CM) sits on top of the performance counters. Consumers shown above it: third-party tools, AutoSupport, Operations Manager, Windows Perfmon clients, and new and existing enhanced performance CLI commands. Access paths shown: data archive, SNMP, Zephyr APIs, and SMB calls (Windows Perfmon support).]



stats Command Syntax
 The stats command lets you collect or view
statistical data on a storage system.
 The stats command can be run in one of
three ways:
– Single: current counter values are displayed
stats show
– Repeating: counter values are displayed
multiple times at a fixed interval
stats show -i 1
– Period: counters are gathered over a single
period of time and then displayed
stats start then stats stop



stats Command Example 1

system> stats list objects

system> stats list instances

Object       Instances
qtree        flexvol1/users, flexvol2/home
aggregates   aggr1, vol0
iscsi        iscsi
fcp          fcp
volume       vol0, flexvol1, flexvol2, flexvol3, clone1
lun          /vol/clone1/lun1 : C4/phnu0DG6S
             /vol/flexvol3/lun1 : C4/phnu0AbVV
             /vol/flexvol2/lun1 : C4/phnu0Ab3S
             /vol/flexvol1/lun1 : C4/phnu0AaWl
target       vtic, iswta
nfsv3        nfs
cifs         cifs
ifnet        e0, e7a, e7b
processor    processor0, processor1, processor2, processor3
system       system
disk         20:00:00:0c:50:a3:c7:1
             20:00:00:0c:50:a3:b5:0
             20:00:00:0c:50:a3:66:f
             20:00:00:0c:50:a3:6b:5
             20:00:00:0c:50:a3:69:8



stats Command Example 2
system> stats list counters qtree
system> stats list counters volume

Object   Instances                       Counters
qtree    flexvol1/users, flexvol2/home   nfs_ops, cifs_ops
volume   vol0, flexvol1, clone1          total_ops, avg_latency, read_ops, read_data,
                                         read_latency, write_ops, write_data,
                                         write_latency, other_ops, other_latency

system> stats explain counters qtree nfs_ops
Counters for object name: qtree
Name: nfs_ops
Description: Number of NFS operations per second to the qtree
Properties: rate
Unit: per_sec

system> stats explain counters volume write_ops
Counters for object name: volume
Name: write_ops
Description: Number of writes per second to the volume
Properties: rate
Unit: per_sec
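Counter descriptions from stats explain counters can be collected into a small lookup table for later reference. The sketch below assumes the field labels shown in the sample (Counters for object name, Name, Description, Properties, Unit) and treats any unlabeled line as a wrapped continuation of the previous field; the function name parse_explain is hypothetical.

```python
def parse_explain(text):
    """Parse 'stats explain counters' style output into a list of dicts."""
    entries, current, obj, last_key = [], None, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and key == "Counters for object name":
            obj, current, last_key = value, None, None
        elif sep and key == "Name":
            current = {"Object": obj, "Name": value}
            entries.append(current)
            last_key = "Name"
        elif sep and key in ("Description", "Properties", "Unit") and current is not None:
            current[key] = value
            last_key = key
        elif current is not None and last_key is not None:
            # No recognized label: treat as a wrapped continuation line.
            current[last_key] += " " + line
    return entries


SAMPLE = """\
Counters for object name: qtree
Name: nfs_ops
Description: Number of NFS
operations per second to the qtree
Properties: rate
Unit: per_sec
"""

for entry in parse_explain(SAMPLE):
    print(entry["Object"], entry["Name"], entry["Unit"])
```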



stats Command Example 3
Object: disk

Instances:
20:00:00:0c:50:a3:c7:1
20:00:00:0c:50:a3:b5:0
20:00:00:0c:50:a3:66:f
20:00:00:0c:50:a3:6b:5
20:00:00:0c:50:a3:69:8

Counters:
total_transfers, user_reads, user_writes, cp_reads, guaranteed_reads, guaranteed_writes,
user_read_chain, user_write_chain, cp_read_chain, guarenteed_read_chain, guarenteed_write_chain,
user_read_blocks, user_write_blocks, cp_read_blocks, guarenteed_read_blocks, guarenteed_write_blocks,
user_read_latency, user_write_latency, cp_read_latency, guarenteed_read_latency, guarenteed_write_latency,
disk_busy

system> stats show disk:*:*
disk:20:00:00:0c:50:a3:c7:11:total_transfers:0/s
disk:20:00:00:0c:50:a3:c7:11:user_reads:0/s
disk:20:00:00:0c:50:a3:c7:11:user_writes:0/s
disk:20:00:00:0c:50:a3:c7:11:cp_reads:0/s
disk:20:00:00:0c:50:a3:c7:11:guaranteed_reads:0/s
disk:20:00:00:0c:50:a3:c7:11:guaranteed_writes:0/s
disk:20:00:00:0c:50:a3:c7:11:user_read_chain:0
…
disk:20:00:00:0c:50:a3:67:5a:user_read_chain:0
disk:20:00:00:0c:50:a3:67:5a:user_write_chain:0

In the sample above, we are listing statistics for all the disks.

system> stats show disk:20::00::00::0c::50::a3::6b::58:disk_busy
disk:20:00:00:0c:50:a3:6b:58:disk_busy:0%
system>

In the sample above, we are listing a specific counter for a disk instance.

Note: The disk instance name contains colons, so each colon in the name must be
escaped by entering it twice.
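The colon-doubling rule in the note can be applied mechanically when building identifiers for stats show, and the output lines can be split back into their parts. This is a minimal sketch with hypothetical helper names; it assumes the value after the last colon contains no colon itself.

```python
def escape_instance(name):
    """Double each colon in an instance name, as the note describes."""
    return name.replace(":", "::")


def build_identifier(obj, instance, counter):
    """Build an object:instance:counter string for use with stats show."""
    return f"{obj}:{escape_instance(instance)}:{counter}"


def parse_show_line(line):
    """Split 'object:instance:counter:value'; the instance may contain colons."""
    obj, rest = line.strip().split(":", 1)
    # The counter and the value are the last two fields; everything between
    # the object and the counter is the instance name.
    instance, counter, value = rest.rsplit(":", 2)
    return obj, instance, counter, value


print(build_identifier("disk", "20:00:00:0c:50:a3:6b:58", "disk_busy"))
print(parse_show_line("disk:20:00:00:0c:50:a3:6b:58:disk_busy:0%"))
```

The first call prints the same identifier used in the second sample command.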
Preset sysstat.xml File

system> stats show -p sysstat -i 1
 CPU   NFS  CIFS  HTTP   Net in  Net out Disk rea Disk wri
   %    /s    /s    /s     KB/s     KB/s     KB/s     KB/s
   0     0     0     0        0        0        0        0
   1     0     0     0        0        1       48      268
   0     0     0     0        1        0        0        0
   2     0    34     0      924       23        0        0

# cat /etc/stats/preset/sysstat.xml
<?xml version = "1.0" ?>
<!-- This preset is similar to the traditional
'sysstat' command, using column
output -->
<preset orientation="column"
print_instance_names="false"
catenate_instances="true" >
<object name="SYSTEM">
<counter name="cpu_busy">
<width>4</width>
<title>CPU</title>
</counter>
<counter name="nfs_ops">
<width>6</width>
<title>NFS</title>
</counter>
<counter name="cifs_ops">
<width>6</width>
<title>CIFS</title>
</counter>

</object>
</preset>
#

You can create customized XML files that display only the statistics that are
important to you.
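A custom preset can be generated rather than typed by hand. The sketch below uses only the element and attribute names visible in the sysstat.xml sample above (preset, object, counter, width, title); whether a given Data ONTAP release accepts the generated file is an assumption to verify on a test system, and build_preset is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def build_preset(object_name, counters):
    """Return preset XML text; counters is a list of (name, width, title)."""
    preset = ET.Element("preset", {
        "orientation": "column",
        "print_instance_names": "false",
        "catenate_instances": "true",
    })
    obj = ET.SubElement(preset, "object", {"name": object_name})
    for name, width, title in counters:
        counter = ET.SubElement(obj, "counter", {"name": name})
        ET.SubElement(counter, "width").text = str(width)
        ET.SubElement(counter, "title").text = title
    return ET.tostring(preset, encoding="unicode")


# Counter names, widths, and titles taken from the sample preset above.
xml_text = build_preset("SYSTEM", [("cpu_busy", 4, "CPU"), ("nfs_ops", 6, "NFS")])
print(xml_text)
```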



Client-Side Tools: Perfmon

The Windows perfmon utility:
 Connects to the storage
system from Windows Server
2003
 Requires that CIFS be
licensed and running on the
storage system
 Receives output from the
stats command and graphs
the data
 To view the Add Counters
screen, in the Performance
window, click the plus sign (+).
NOTE: Does not work with
Windows Server 2008 and higher



RAID Configuration



RAID Groups

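[Diagram: three aggregates, aggr0, aggr1, and aggr2. Each contains a RAID group rg0; one of them also contains a second RAID group, rg1.]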



RAID Group Size and Composition

Poor RAID configuration choices:


 Unnecessary use of multiple RAID groups
 Mixed disk sizes
 RAID groups with wide variations in capacity
 RAID groups with only one or two data disks
each
 RAID groups with a number of disks larger
than the default



Initial RAID Group Configuration
 Limit the number of disks in a RAID group to
the recommended number
 Ensure that each RAID group in an aggregate
has approximately the same capacity
 Ensure that each RAID group in an aggregate
has at least three data disks
 Use disks of the same size within a RAID
group to optimize write performance
 Use the RAID-DP® feature of Data ONTAP to
protect against disk failures
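Some of the guidelines above can be checked against a planned layout before disks are assigned. The sketch below is illustrative only: the input format, the function name check_raid_groups, and the 10% capacity tolerance are assumptions made for this example, not NetApp recommendations.

```python
def check_raid_groups(groups, tolerance=0.10):
    """Check a planned layout against the guidelines above.

    groups maps a RAID group name to a list of its data-disk sizes in GB.
    tolerance is a hypothetical threshold for "approximately the same capacity".
    """
    warnings = []
    for name, sizes in groups.items():
        if len(sizes) < 3:
            warnings.append(f"{name}: fewer than three data disks")
        if len(set(sizes)) > 1:
            warnings.append(f"{name}: mixed disk sizes")
    capacities = [sum(sizes) for sizes in groups.values()]
    if capacities:
        largest, smallest = max(capacities), min(capacities)
        if largest > 0 and (largest - smallest) / largest > tolerance:
            warnings.append("aggregate: RAID group capacities differ by more than the tolerance")
    return warnings


plan = {"rg0": [300] * 14, "rg1": [300, 300]}
for warning in check_raid_groups(plan):
    print(warning)
```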



Adding Disks to Existing RAID Groups
 Add RAID groups when the applied load is
stressing the drives in the current array.
 Add RAID groups and disks before the file
system or aggregate is 80% to 90% full.
 Add disks in groups.
 Plan data expansion so that at least several
data disks are used for each RAID group.



Monitoring
Connectivity



Monitor Connectivity
 Media Access Control (MAC)
– ifconfig
– ifstat
– arp
 TCP/IP
– ifconfig
– /etc/rc and /etc/hosts
– ping
– netstat -r
– netdiag
 Protocols
– nfsstat
– cifs stat
– nbtstat



Performance Measures



Measuring NFS Performance
 Use this option: nfs.per_client_stats.enable [on|off].
 Disable the option when you are not using nfsstat -l.
 This display shows the breakdown on this mountpoint of lookups,
reads, writes, and all operations. The average deviation and the
settings for retransmissions of each type also are displayed.

Data ONTAP NFS Output Command: nfsstat -l

system> nfsstat -l
172.17.25.13    sherlock   NFSOPS = 2943506 (90%)
172.17.25.16    watson     NFSOPS = 3553686 ( 2%)
172.17.25.18    hudson     NFSOPS = 2738083 ( 1%)
172.17.230.7    conan      NFSOPS = 6732471 ( 3%)
172.17.230.8    baker      NFSOPS = 202614527 ( 1%)
172.17.230.9    moriarty   NFSOPS = 1006881 ( 0%)
175.17.230.10   doyle      NFSOPS = 1185 ( 0%)

Round-trip response times for specific NFS operations are displayed.
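Per-client counts from nfsstat -l can be ranked to find the busiest clients. The sketch below assumes the "address hostname NFSOPS = count" layout shown in the sample; the function names are hypothetical, and the three sample lines are copied from the output above.

```python
import re

# Matches "address hostname NFSOPS = count" at the start of a line.
LINE = re.compile(r"^(\S+)\s+(\S+)\s+NFSOPS\s*=\s*(\d+)")


def parse_nfsstat_l(text):
    """Return {hostname: operation count} for each matching line."""
    clients = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            _address, host, ops = match.groups()
            clients[host] = int(ops)
    return clients


def top_clients(clients, n=3):
    """Return the n clients with the highest operation counts."""
    return sorted(clients.items(), key=lambda item: item[1], reverse=True)[:n]


SAMPLE = """\
172.17.25.13 sherlock NFSOPS = 2943506 (90%)
172.17.25.16 watson NFSOPS = 3553686 ( 2%)
172.17.25.18 hudson NFSOPS = 2738083 ( 1%)
"""

print(top_clients(parse_nfsstat_l(SAMPLE), n=1))
```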



Measuring CIFS Performance
Analyzing smb_hist output

CIFS request time processing: (46457) - milliseconds units
    0ms    1ms    2ms    3ms    4ms    5ms    6ms    7ms
  13175  17752   5111    664    451    478    570    568
  <16ms  <24ms  <32ms  <40ms  <48ms  <56ms  <64ms unused
   4039   2309    569    165     61     21     10      0

The number in parentheses (46457) is the total number of operations since
smb_hist statistics were last reset.

Each column heading represents a millisecond (ms) timestamp for operations.

Every other row displays the number of operations that took place in the interval
in the row above it. In this example, 13,175 operations happened in less than 0.5 ms.

The time interval window lies halfway between the values for adjacent columns. In
this example, 165 operations occurred in the window from 36 ms to 44 ms.
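Because label rows and count rows alternate, the histogram can be paired up programmatically. The sketch below assumes that alternating layout and takes only the four histogram rows as input (not the "CIFS request time processing" line); the function names are hypothetical.

```python
def parse_smb_hist(text):
    """Pair each bucket label with the count printed directly below it."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    buckets = []
    # Even-numbered rows hold labels, odd-numbered rows hold counts.
    for labels, counts in zip(rows[0::2], rows[1::2]):
        for label, count in zip(labels, counts):
            buckets.append((label, int(count)))
    return buckets


def fraction_in_first(buckets, n):
    """Fraction of all counted operations that fall in the first n buckets."""
    total = sum(count for _, count in buckets)
    return sum(count for _, count in buckets[:n]) / total


SAMPLE = """\
0ms 1ms 2ms 3ms 4ms 5ms 6ms 7ms
13175 17752 5111 664 451 478 570 568
<16ms <24ms <32ms <40ms <48ms <56ms <64ms unused
4039 2309 569 165 61 21 10 0
"""

buckets = parse_smb_hist(SAMPLE)
print(buckets[0], dict(buckets)["<40ms"])
print(round(fraction_in_first(buckets, 3), 2))
```

A high fraction in the first few buckets indicates that most CIFS requests complete within a few milliseconds.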



The statit Command
 Is an advanced-mode command used for
detailed analysis of system performance
 Gathers per-second statistics averaged over
the length of time it runs in the background
 Shows statistics representing all physical and
some logical objects on the storage system
 Collects data that usually represents rates at
which things happen



Using the statit Command
To obtain statistics using the statit command,
complete these steps:
1. Enter advanced privilege mode:
priv set advanced
2. Start collecting statistics:
statit -b
3. After enough time has passed to capture statistics for the
activity of interest, run:
statit -e -n
4. To return to normal admin privilege mode, run:
priv set admin



Sections of the statit Command Report
 CPU
 Multiprocessor
 CSMP domain switches
 Miscellaneous
 WAFL® (Write Anywhere File Layout)
 RAID
 Network interface
 Disk
 Aggregate
 Spares and other disks
 FCP
 iSCSI
 Tape



CPU Statistics
CPU Statistics
506.934263 time (seconds) 100 %
275.044317 system time 54 %
23.412966 rupt time 5 % (7022 rupts x 0 usec/rupt)
251.466451 non-rupt system time 50 %
271.837944 idle time 44 %
439.543653 time in CP 92 % 100 %
21.837230 rupt time in CP 5 % (132 rupts x 0 sec/rupt)



Multiprocessor Statistics
Multiprocessor Statistics (per second)
cpu0 cpu1 total
sk switches 1378.09 46.82 1424.91
hard switches 1175.27 29.15 1204.42
domain switches 103.89 16.08 119.96
CP rupts 0.00 0.00 0.00
nonCP rupts 100.00 0.00 100.00
nonCP rupt usec 0.00 0.00 0.00
Idle 1000000.00 1000000.00 2000000.00
kahuna 0.00 0.00 0.00
network 0.00 0.00 0.00
storage 0.00 0.00 0.00
exempt 0.00 0.00 0.00
raid 0.00 0.00 0.00
target 0.00 0.00 0.00
netcache 0.00 0.00 0.00
netcache2 0.00 0.00 0.00



Miscellaneous Statistics

Miscellaneous Statistics (per second)


175680.88 hard context switches 16477.97 NFS operations
0.00 CIFS operations 0.00 HTTP operations
50215.09 network KB received 102220.83 network KB transmitted
101387.45 disk KB read 76757.23 disk KB written
46074.00 NVRAM KB written 0.00 nolog KB written
23517.69 WAFL bufs given to clients 0.00 checksum cache hits(0%)
23517.69 no checksum - partial buffer 0.00 FCP operations
0.00 iSCSI operations



WAFL Rates
WAFL Statistics (per second)
47.29 name cache hits ( 99%) 0.39 name cache misses ( 1%)
213379.74 buf hash hits ( 87%) 31023.74 buf hash misses ( 13%)
28896.85 inode cache hits ( 100%) 0.00 inode cache misses ( 0%)
38058.36 buf cache hits ( 62%) 23119.68 buf cache misses ( 38%)
910.79 blocks read 23551.76 blocks read-ahead
11436.80 chains read-ahead 63.04 dummy reads
1778.91 blocks speculative read-ahead 16734.62 blocks written
20.76 stripes written 0.00 blocks over-written
0.01 wafl_timer generated CP 0.00 snapshot generated CP
0.00 wafl_avail_bufs generated CP 0.00 dirty_blk_cnt generated CP
0.00 full NV-log generated CP 0.00 back-to-back CP
0.00 flush generated CP 0.00 sync generated CP
0.00 deferred back-to-back CP 0.00 container-indirect-pin CP
0.00 low mbufs generated CP 0.17 low datavecs generated CP
49323.58 non-restart messages 862.70 IOWAIT suspends
93731410.56 next nvlog nearly full msecs 0.00 dirty buffer susp msecs
0.00 nvlog full susp msecs 1429632 buffers



Network Interface Statistics
Network Interface Statistics (per second)
iface side bytes packets multicasts errors collisions
e0 recv 171.69 2.55 0.00 0.00 0.00
xmit 115.22 1.42 0.00 0.00 0.00
e9 recv 0.00 0.00 0.00 0.00 0.00
xmit 0.00 0.00 0.00 0.00 0.00
e6 recv 0.00 0.00 0.00 0.00 0.00
xmit 0.00 0.00 0.00 0.00 0.00
vh recv 0.00 0.00 0.00 0.00 0.00
xmit 0.00 0.00 0.00 0.00 0.00



Disk Statistics
Disk Statistics (per second)
ut% is the percent of time the disk was busy.
xfers is the number of data transfer commands issued per second.
xfers = ureads + writes + cpreads + greads + gwrites

chain is the average number of 4K blocks per command.


usecs is the average disk round trip time per 4K block.
disk ut% xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs
/vol0/plex0/rg0:
8a.16 5 3.69 0.57 1.00 94500 ...
8a.21 4 3.12 0.57 1.00 39500 ...
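The column definitions above can be expressed as a small calculation. In the sketch below, the xfers sum follows the formula on this slide; the per-command time (chain multiplied by usecs) is an inference from the two definitions rather than something the report prints. The class and function names are hypothetical, and the rates passed to DiskRates are made-up values, not taken from the sample rows.

```python
from dataclasses import dataclass


@dataclass
class DiskRates:
    """Per-second command rates for one disk, named as in the statit columns."""
    ureads: float
    writes: float
    cpreads: float
    greads: float
    gwrites: float

    def xfers(self):
        # xfers = ureads + writes + cpreads + greads + gwrites
        return self.ureads + self.writes + self.cpreads + self.greads + self.gwrites


def command_time_usecs(chain, usecs_per_block):
    """Estimated time per command: blocks per command times time per 4K block."""
    return chain * usecs_per_block


# Hypothetical rates chosen only to exercise the formula.
rates = DiskRates(ureads=0.5, writes=2.0, cpreads=0.25, greads=0.0, gwrites=0.0)
print(rates.xfers())

# chain and usecs for user reads on disk 8a.16 in the sample above.
print(command_time_usecs(1.00, 94500))
```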



Aggregate and Other Disk Statistics
Aggregate statistics:

Minimum 0 0.00 0.00 0.00 0.00 0.00 0.00


Mean 1 0.28 0.00 0.28 0.00 0.00 0.00
Maximum 5 3.69 0.57 3.12 0.00 0.00 0.00

Spares and other disks:


8b.16 2 1.70 1.70 1.00 10167 0.00 .... . 0.00 .... . 0.00 .... . 0.00 ..
8b.17 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .
8b.18 0 0.00 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... . 0.00 .... .



FC, iSCSI, and Tape Operations
FCP Statistics (per second)
0.00 FCP Bytes recv 0.00 FCP Bytes sent
0.00 FCP ops

iSCSI Statistics (per second)


0.00 iSCSI Bytes recv 0.00 iSCSI Bytes xmit
0.00 iSCSI ops

Interrupt Statistics (per second)


2000.15 Clock 3.97 Fast Enet
47.68 FCAL 4.54 int_22
3.41 FCAL 2059.75 total



Other Resources
For more information about data collection and performance,
see the Data ONTAP Performance Analysis course, in which
you learn to:
 Use the recommended methodology to compare performance
data and performance analysis information
 Monitor performance using performance tools and establish a
baseline of expected throughput and response times for storage
systems under planned and increasing workloads
 Perform capacity planning by monitoring performance and
comparing baseline information over time to determine when a
storage system will reach maximum capacity
 Tune protocols such as CIFS, NFS, and SAN for optimal
performance (including locating resources with tuning guidelines
for database scenarios)
 Perform bottleneck analysis



Module Summary

In this module, you should have learned to:


 Use the sysstat, stats, and statit
commands
 Describe the factors that affect RAID performance
 Execute commands to collect data about write and
read throughputs
 Execute commands to verify the operation of
hardware, software, and network components
 Identify commands and options used to obtain
configuration and status



Exercise
Module 17: Data Collection Tools
Estimated Time: 60 minutes
Check Your Understanding: Answers
 Which command or commands can you use to
display disk utilization?
statit, stats
 Which command or commands can you use to
monitor connectivity?
ifconfig, ifstat, arp, ping, netstat
 Which command or commands can you use to
help detect impending disk problems before
they occur?
disk shm_stats
