QV4311 Exercise SG Hints
AIX Performance
Management I: Concepts and
Tools
Student Exercises
with Hints
ERC 1.1
The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without
any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While
each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will
result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
Contents
Exercise 1. Data Collection and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
Introduction
This exercise covers the basic performance tools available for
monitoring and analysis.
Exercise Objectives
At the end of the lab, you should be able to:
• Be familiar with some basic performance analysis tools
• Use topas to monitor the system
• Install PerfPMR
• Collect performance data using PerfPMR
References
More information about the commands in this exercise is
available from the IBM Systems Information Center:
http://publib.boulder.ibm.com/infocenter/systems/index.jsp
© Copyright IBM Corp. 2009 Exercise 1. Data Collection and Analysis 1-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Part 1 - Getting familiar with some basic performance analysis tools
In this section of the exercise, you will be introduced to some basic performance
analysis commands.
Steps
__ 1. Place a checkmark next to the commands below which are related to performance
monitoring.
__ 3. Which of the following commands display the percentage of processor time used?
The next few steps will lead you through running these commands and determining
the answers.
__ a. The vmstat command reports virtual memory statistics. Use the following
command to display three reports at 5-second intervals:
# vmstat 5 3
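The last four columns of each vmstat report line carry the CPU percentages. As a portable sketch (the sample values and the field positions, which assume the classic AIX layout of kthr, memory, page, faults, then cpu columns, are illustrative assumptions), the user/system/idle/wait fields can be pulled out with awk:

```shell
# Parse the CPU columns of a captured vmstat interval line.
# Field positions 14-17 (us sy id wa) assume the AIX column layout:
# kthr(2) memory(2) page(6) faults(3) cpu(us sy id wa ...).
sample='1 0 200000 50000 0 0 0 0 0 0 150 300 200 25 5 65 5'
echo "$sample" |
  awk '{ printf "user=%s%% sys=%s%% idle=%s%% iowait=%s%% busy=%s%%\n",
                $14, $15, $16, $17, $14 + $15 }'
```

Busy time here is simply user plus system time; anything not idle or waiting on I/O.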
__ b. The iostat command reports CPU statistics and I/O statistics for the entire
system, adapters, TTY devices, disks, CD-ROMs, tapes, and file systems. Use the
following command to print the system throughput report since boot:
# iostat -s
tty: tin tout avg-cpu: % user % sys % idle % iowait physc % entc
0.0 0.9 0.0 0.0 99.7 0.3 0.0 0.0
System: rand212.beaverton.ibm.com
Kbps tps Kb_read Kb_wrtn
Physical 16.2 1.9 114431 67540
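Because iostat -s reports totals accumulated since boot, the Kbps figure is just (Kb_read + Kb_wrtn) divided by the seconds since boot. A quick cross-check using the sample figures above:

```shell
# Cross-check the Physical line: Kbps * uptime_seconds ~= Kb_read + Kb_wrtn.
# 114431 and 67540 are the sample Kb_read/Kb_wrtn values; 16.2 is the Kbps.
echo '114431 67540 16.2' |
  awk '{ printf "implied uptime ~= %.0f seconds\n", ($1 + $2) / $3 }'
```

If the implied uptime looks wildly different from the system's actual uptime, the report was probably misread.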
__ c. The ps command shows the current status of processes, including CPU time.
Use the following command to print information about the processes associated
with this terminal.
# ps u
__ d. The sar command collects, reports, or saves system activity information. Use
the following command to display three reports at 5-second intervals:
# sar 5 3
» The output should be similar to the following:
AIX rand212 1 6 00066C32D900 03/04/09
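Averaging the per-interval figures yourself is a useful exercise. The sketch below averages the %usr and %sys columns over captured sar rows (the data rows are invented for illustration; on AIX, sar itself prints an Average line at the end of the run):

```shell
# Average the %usr and %sys columns across captured sar interval rows.
awk 'NR > 1 { usr += $2; sys += $3; n++ }
     END    { printf "avg usr=%.1f sys=%.1f\n", usr / n, sys / n }' <<'EOF'
Time %usr %sys %wio %idle
18:10:05 2 1 0 97
18:10:10 4 3 0 93
18:10:15 6 2 0 92
EOF
```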
__ 4. Which of the following commands report memory statistics? The next few steps will
lead you through running these commands and determining the answers.
__ a. The vmstat command reports virtual memory statistics. Use the following
command to display various memory statistics:
# vmstat -s
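The vmstat -s counters are cumulative since boot, so rates come from differencing two snapshots taken an interval apart. A minimal sketch with hypothetical counter values:

```shell
# Derive a per-second rate from two cumulative counter readings (for
# example the "page ins" line) taken an interval apart. Values are
# hypothetical, for illustration only.
before=114000; after=114431; interval=60
echo "$before $after $interval" |
  awk '{ printf "page-ins/sec = %.2f\n", ($2 - $1) / $3 }'
```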
__ c. The sar command collects, reports, or saves system activity information. Use
the following command to display three paging statistic reports at 5-second
intervals.
# sar -r 5 3
» The output should be similar to the following:
AIX rand212 1 6 00066C32D900 03/04/09
Average 129309 0 2 0
__ 5. Run the following commands and note which ones display input/output statistics for
the entire system. The next two steps will lead you through running these
commands and determining the answers.
» Commands iostat and sar report I/O statistics. These commands are
normally used in conjunction with others to give a more complete
picture of system performance.
__ a. The iostat command reports CPU statistics and input/output statistics for the
entire system, adapters, TTY devices, disks, CD-ROMs, tapes, and file systems. Use
the following command to display an extended drive report for all disks:
# iostat -D 1 1
__ b. The sar command collects, reports, or saves system activity information. Use
the following command to report activity for each block device with the exception
of tape drives:
# sar -d 1
» The output should be similar to the following:
AIX rand212 1 6 00066C32D900 03/04/09
__ 6. Run the following commands and note which ones display network statistics. The
next two steps will lead you through running these commands and determining the
answers.
__ a. The netstat command reports network status. Use the following command to
display the number of packets received, transmitted, and dropped in the
communications subsystem:
# netstat -D
__ b. The entstat command reports ethernet device driver and device statistics. Use
the following command to display the device generic statistics for ent0:
# entstat ent0
» The output should be similar to the following:
-------------------------------------------------------------
ETHERNET STATISTICS (ent0) :
Device Type: Host Ethernet Adapter (l-hea)
Hardware Address: 00:21:5e:1d:09:80
Elapsed Time: 0 days 3 hours 24 minutes 48 seconds
General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 200
Driver Flags: Broadcast Running Simplex
64BitSupport ChecksumOffload LargeSend
DataRateSet
Steps
__ 7. If you are not already logged in, login to your assigned system as the root user.
__ 10. Use the following command to directly display the running processes:
# topas -P
» The display should be similar to the following:
Topas Monitor for host: rand212 Interval: 2 Wed Mar 4 18:10:26 2009
__ 12. Use the following command to directly display the file system statistics:
# topas -F
» The display should be similar to the following:
Topas Monitor for host: rand212 Interval: 2 Wed Mar 4 18:14:27 2009
================================================================================
FileSystem KBPS TPS KB-R KB-W Open Create Lock
/usr 0.0 0.0 0.0 0.0 0 0 0
/var 0.0 0.0 0.0 0.0 0 0 0
/tmp 0.0 0.0 0.0 0.0 0 0 0
/ 0.0 0.0 0.0 0.0 0 0 0
/home 0.0 0.0 0.0 0.0 0 0 0
/opt 0.0 0.0 0.0 0.0 0 0 0
/admin 0.0 0.0 0.0 0.0 0 0 0
/proc 0.0 0.0 0.0 0.0 0 0 0
__ 14. Use the following command to directly display the disk metric statistics:
# topas -D
» The display should be similar to the following:
Topas Monitor for host: rand212 Interval: 2 Wed Mar 4 18:17:22 2009
===============================================================================
Disk Busy% KBPS TPS KB-R ART MRT KB-W AWT MWT AQW AQD
hdisk0 0.0 2.4 0.6 0.0 0.0 0.0 2.4 6.9 9.6 0.0 0.0
hdisk1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
hdisk2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
» Note: Instead of starting/stopping topas, you can just type the letter D to
replace the current display with the disk metric statistics. If the D key is
pressed again, the display toggles back to the default main screen.
Steps
__ 16. If you are not already logged in, login to your assigned system as the root user.
__ 19. Uncompress the PerfPMR package into /u/QV431/perf61 using the following
command:
# zcat /u/QV431/PerfPMR_Package/perf61.tar.Z | tar -xvf -
» The output should be similar to the following:
x Install, 2237 bytes, 5 media blocks.
x PROBLEM.INFO, 2616 bytes, 6 media blocks.
x README, 9818 bytes, 20 media blocks.
x config.sh, 24808 bytes, 49 media blocks.
x emstat.sh, 1867 bytes, 4 media blocks.
x filemon.sh, 2281 bytes, 5 media blocks.
x getdate, 4667 bytes, 10 media blocks.
x getevars, 6594 bytes, 13 media blocks.
x getj2mem.sh, 1722 bytes, 4 media blocks.
x getmempool.sh, 1919 bytes, 4 media blocks.
x getvmpool.sh, 507 bytes, 1 media blocks.
x hpmcount.sh, 802 bytes, 2 media blocks.
x hpmstat.sh, 1890 bytes, 4 media blocks.
x iomon, 10092 bytes, 20 media blocks.
x iostat.sh, 4291 bytes, 9 media blocks.
x iptrace.sh, 1954 bytes, 4 media blocks.
x lparstat.sh, 3011 bytes, 6 media blocks.
x lsc, 28343 bytes, 56 media blocks.
x memdetails.sh, 19752 bytes, 39 media blocks.
x memfill, 7342 bytes, 15 media blocks.
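After unpacking, it is worth confirming that the driver script landed where expected and is executable before moving on. A small sketch (the /u/QV431/perf61 path is the class convention; the PERFPMR_DIR override is illustrative):

```shell
# Check that the PerfPMR driver script exists and is executable in a
# given directory (defaults to the class-convention path).
check_perfpmr() {
    [ -x "$1/perfpmr.sh" ]
}

dir="${PERFPMR_DIR:-/u/QV431/perf61}"
if check_perfpmr "$dir"; then
    echo "perfpmr.sh ready in $dir"
else
    echo "perfpmr.sh missing or not executable in $dir"
fi
```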
__ 21. Examine the shell scripts and configuration files which make up PerfPMR. Use the
ls command to see the content of the /u/QV431/perf61 directory.
» # ls /u/QV431/perf61
» The output should be similar to the following:
Install getj2mem.sh lparstat.sh perfpmr.cfg setsched
PROBLEM.INFO getmempool.sh lsc perfpmr.sh svmon.sh
README getvmpool.sh memdetails.sh perfxtra.sh tcpdump.sh
config.sh hpmcount.sh memfill pprof.sh tprof.sh
emstat.sh hpmstat.sh monitor.sh ps.sh trace.sh
filemon.sh iomon mpstat.sh quicksnap.sh vmstat.sh
getdate iostat.sh netstat.sh sar.sh
getevars iptrace.sh nfsstat.sh setpri
Steps
__ 22. If you are not already logged in, login to your assigned system as the root user.
__ 25. Run the following command to obtain the PerfPMR command line usage.
# /u/QV431/perf61/perfpmr.sh -?
» The output should be similar to the following:
Version: 610 2008/12/19
perfpmr.sh: Usage:
__ 26. Run the following command to have PerfPMR collect performance data for 60
seconds (1 minute).
# /u/QV431/perf61/perfpmr.sh 60
Note: By default, PerfPMR (perfpmr.sh without any parameters) runs for 600
seconds (10 minutes).
» The output should be similar to the following:
(C) COPYRIGHT International Business Machines Corp., 2000,2001,2002,2003,2004-2008
19:32:31-03/04/09 : df
19:32:31-03/04/09 : netstat -in -rn -D -an -c
19:32:32-03/04/09 : getmempool.sh
/u/QV431/perf61/getmempool.sh[5]: fr_per_pool=nf/np: divide by zero
19:32:33-03/04/09 : getvmpool.sh
19:32:33-03/04/09 : getj2mem.sh
19:32:34-03/04/09 : genkld
19:32:34-03/04/09 : genkex
19:32:34-03/04/09 : getevars
19:32:34-03/04/09 : errpt
19:32:34-03/04/09 : emgr -l
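PerfPMR writes its collection files into the current working directory, so a common pattern is to run it from a fresh, dated directory so successive runs do not overwrite each other. A sketch (the directory naming is illustrative; the perfpmr.sh path is the class convention):

```shell
# Create a dated scratch directory and run the collection from there.
outdir="/tmp/perfdata.$(date +%Y%m%d_%H%M%S)"
mkdir -p "$outdir" && cd "$outdir" || exit 1
echo "collecting into $outdir"
# /u/QV431/perf61/perfpmr.sh 60    # uncomment on the lab system
```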
__ 28. Examine the contents of the files generated by PerfPMR and answer the following
questions.
__ a. Which file contains the system configuration summary? ___________________
» config.sum
__ b. Which file contains the summaries of the ps, sar, iostat, and vmstat
commands? ______________________________
» monitor.sum
END OF LAB
Introduction
This exercise covers the basic tools available for performance tuning.
Exercise Objectives
At the end of the lab, you should be able to:
• Be familiar with the usage of the basic performance tuning
commands
• View and change the attributes of tunables
• Validate the tunable parameters
• Examine the tunables files
• Reset tunables to their default values
References
More information about the commands in this exercise is
available from the IBM Systems Information Center:
http://publib.boulder.ibm.com/infocenter/systems/index.jsp
Part 1 - Getting familiar with some basic tools for performance tuning
In this section of the exercise, you will be introduced to some performance tuning
commands.
Steps
__ 1. Place a checkmark next to the commands below which are related to performance
tuning.
__ 2. AIX offers a set of tuning commands (schedo, vmo, ioo, raso, no, and nfso) which
provide a standard interface for displaying, resetting, and changing tuning
parameters (also known as tunables). Fill in the table by matching the commands to
the functions.
Tuning command  Description
--------------  ------------------------------------------------------------
ioo             Manages Input/Output tunable parameters
nfso            Manages Network File System (NFS) tuning parameters
no              Manages network tuning parameters
vmo             Manages Virtual Memory Manager tunable parameters
schedo          Manages processor scheduler tunable parameters
raso            Manages Reliability, Availability, Serviceability parameters
__ 3. Run one of the following commands to obtain the command line usage for the tuning
commands.
# schedo -h
# vmo -h
# ioo -h
» The output should be similar to the following:
Usage: schedo -h [tunable] | {[-F] -L [tunable]} | {[-F] -x [tunable]}
schedo [-p|-r] (-a [-F] | {-o tunable})
schedo [-p|-r] (-D | ({-d tunable} {-o tunable=value}))
-h Display help about the command and its arguments
-h tunable Display help about a tunable
-L [tunable] List information about one or all tunables in a
table format
-x [tunable] List information about one or all tunables in a
comma-separated format
-a Display value for all tunables, one per line
-F Force display of restricted tunables when options
(-a/-L/-x) are specified alone on the command line,
else ignored
-o tunable Display current value of a tunable
-D Reset all tunables to their default values
-d tunable Reset tunable to its default value
-o tunable=value Set tunable to value
-r Make change(s) (-D/-d/-o) or display (-a/-o) apply to
nextboot value
-p Make change(s) (-D/-d/-o) or display (-a/-o) apply to
permanent (current and nextboot) value
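The -x form emits one comma-separated line per tunable, which is convenient for scripting. A hedged sketch parsing a captured line (the field order name,current,default,reboot,min,max,... is an assumption based on the -L columns; verify with schedo -x on your own system):

```shell
# Pull the name, current, and default fields from a comma-separated
# tunable line such as "schedo -x timeslice" might produce.
echo 'timeslice,1,1,1,0,2147483647' |
  awk -F, '{ printf "%s: current=%s default=%s\n", $1, $2, $3 }'
```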
__ 4. Use the vmo command to list the current and reboot value, range, unit, type and
dependencies of all non-restricted VMM tunable parameters.
» # vmo -L
» The output should be similar to the following:
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
ams_loan_policy n/a 1 1 0 2 numeric D
--------------------------------------------------------------------------------
force_relalias_lite 0 0 0 0 1 boolean D
--------------------------------------------------------------------------------
kernel_heap_psize 64K 0 0 0 16M bytes B
--------------------------------------------------------------------------------
lgpg_regions 0 0 0 0 8E-1 D
lgpg_size
--------------------------------------------------------------------------------
lgpg_size 0 0 0 0 16M bytes D
lgpg_regions
--------------------------------------------------------------------------------
low_ps_handling 1 1 1 1 2 D
--------------------------------------------------------------------------------
maxfree 1088 1088 1088 16 209715 4KB pages D
minfree
memory_frames
--------------------------------------------------------------------------------
maxperm 214295 214295 S
--------------------------------------------------------------------------------
maxpin 211816 211816 S
--------------------------------------------------------------------------------
maxpin% 80 80 80 1 100 % memory D
pinnable_frames
memory_frames
--------------------------------------------------------------------------------
memory_frames 256K 256K 4KB pages S
--------------------------------------------------------------------------------
memplace_data 2 2 2 1 2 D
--------------------------------------------------------------------------------
memplace_mapped_file 2 2 2 1 2 D
--------------------------------------------------------------------------------
memplace_shm_anonymous 2 2 2 1 2 D
--------------------------------------------------------------------------------
memplace_shm_named 2 2 2 1 2 D
--------------------------------------------------------------------------------
memplace_stack 2 2 2 1 2 D
--------------------------------------------------------------------------------
memplace_text 2 2 2 1 2 D
--------------------------------------------------------------------------------
memplace_unmapped_file 2 2 2 1 2 D
--------------------------------------------------------------------------------
minfree 960 960 960 8 209715 4KB pages D
maxfree
memory_frames
--------------------------------------------------------------------------------
minperm 7143 7143 S
--------------------------------------------------------------------------------
minperm% 3 3 3 1 100 % memory D
--------------------------------------------------------------------------------
nokilluid 0 0 0 0 4G-1 uid D
--------------------------------------------------------------------------------
npskill 1K 1K 1K 1 128K-1 4KB pages D
--------------------------------------------------------------------------------
npswarn 4K 4K 4K 1 128K-1 4KB pages D
--------------------------------------------------------------------------------
numpsblks 128K 128K 4KB blocks S
--------------------------------------------------------------------------------
pinnable_frames 168586 168586 4KB pages S
--------------------------------------------------------------------------------
relalias_percentage 0 0 0 0 32K-1 D
--------------------------------------------------------------------------------
scrub 0 0 0 0 1 boolean D
--------------------------------------------------------------------------------
v_pinshm 0 0 0 0 1 boolean D
--------------------------------------------------------------------------------
vmm_default_pspa 0 0 0 -1 100 numeric D
--------------------------------------------------------------------------------
wlm_memlimit_nonpg 1 1 1 0 1 boolean D
--------------------------------------------------------------------------------
Parameter types:
S = Static: cannot be changed
D = Dynamic: can be freely changed
B = Bosboot: can only be changed using bosboot and reboot
R = Reboot: can only be changed during reboot
C = Connect: changes are only effective for future socket connections
M = Mount: changes are only effective for future mountings
I = Incremental: can only be incremented
d = deprecated: deprecated and cannot be changed
Value conventions:
K = Kilo: 2^10 G = Giga: 2^30 P = Peta: 2^50
M = Mega: 2^20 T = Tera: 2^40 E = Exa: 2^60
__ 5. Use the schedo command to display the current values of all non-restricted CPU
tunable parameters.
» # schedo -a
» The output should be similar to the following:
affinity_lim = 7
big_tick_size = 1
ded_cpu_donate_thresh = 80
fixed_pri_global = 0
force_grq = 0
maxspin = 16384
pacefork = 10
proc_disk_stats = 1
sched_D = 16
sched_R = 16
tb_balance_S0 = 2
tb_balance_S1 = 2
tb_threshold = 100
timeslice = 1
vpm_fold_policy = 1
vpm_xvcpus = 0
__ 6. Use the ioo command to display the current values of all restricted and
non-restricted I/O tunable parameters.
» # ioo -aF
» The output should be similar to the following:
aio_active = 0
aio_maxreqs = 65536
aio_maxservers = 30
aio_minservers = 3
aio_server_inactivity = 300
j2_atimeUpdateSymlink = 0
j2_dynamicBufferPreallocation = 16
j2_inodeCacheSize = 400
j2_maxPageReadAhead = 128
j2_maxRandomWrite = 0
j2_metadataCacheSize = 400
j2_minPageReadAhead = 2
j2_nPagesPerWriteBehindCluster = 32
j2_nRandomCluster = 0
j2_syncPageCount = 0
j2_syncPageLimit = 16
lvm_bufcnt = 9
pd_npages = 65536
posix_aio_active = 0
posix_aio_maxreqs = 65536
posix_aio_maxservers = 30
posix_aio_minservers = 3
posix_aio_server_inactivity = 300
##Restricted tunables
aio_fastpath = 1
aio_fsfastpath = 1
aio_kprocprio = 39
aio_multitidsusp = 1
aio_sample_rate = 5
aio_samples_per_cycle = 6
j2_maxUsableMaxTransfer = 512
j2_nBufferPerPagerDevice = 512
j2_nonFatalCrashesSystem = 0
j2_syncModifiedMapped = 1
j2_syncdLogSyncInterval = 1
j2_unmarkComp = 0
jfs_clread_enabled = 0
jfs_use_read_lock = 1
maxpgahead = 8
maxrandwrt = 0
memory_frames = 262144
minpgahead = 2
numclust = 1
numfsbufs = 196
pgahd_scale_thresh = 0
posix_aio_fastpath = 1
posix_aio_fsfastpath = 1
posix_aio_kprocprio = 39
posix_aio_sample_rate = 5
posix_aio_samples_per_cycle = 6
pv_min_pbuf = 512
sync_release_ilock = 0
__ 7. Use the vmo command to display help about the maxfree tunable.
» # vmo -h maxfree
» The output should be similar to the following:
Help for tunable maxfree:
Purpose:
Specifies the number of frames on the free list at which page-stealing is to stop.
Values:
Default: 1088
Range: 16 - 209715
Type: Dynamic
Unit: 4KB pages
Tuning:
Observe free-list-size changes with vmstat n. If vmstat n shows free-list size
frequently driven below minfree by application demands, increase maxfree to reduce
calls to replenish the free list. Setting the value too high causes page replacement
to run for a longer period of time. The difference between maxfree and minfree should
be of the order of maxpgahead, and no less than 8.
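The guidance above (the maxfree/minfree gap should be at least 8 and on the order of maxpgahead) can be checked mechanically. The values below are the sample system's; on a live system they would come from vmo -o maxfree, vmo -o minfree, and ioo -o maxpgahead:

```shell
# Check the recommended relationship between maxfree, minfree, and
# maxpgahead (sample-system values hard-coded for illustration).
maxfree=1088; minfree=960; maxpgahead=8
gap=$((maxfree - minfree))
if [ "$gap" -ge 8 ] && [ "$gap" -ge "$maxpgahead" ]; then
    echo "gap=$gap: OK"
else
    echo "gap=$gap: consider increasing maxfree"
fi
```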
Steps
__ 8. If you are not already logged in, login to your assigned system as the root user.
__ 9. Fill in the table with the attributes for the given tunable parameters. What commands
did you use to get the information?
Tunable Parameter      Current  Default  Next Reboot  Minimum  Maximum
maxfree                1088     1088     1088         16       209715
minfree                960      960      960          8        209715
minperm%               3        3        3            1        100
sched_D                16       16       16           0        32
sched_R                16       16       16           0        32
timeslice              1        1        1            0        2G-1
j2_maxPageReadAhead    128      128      128          0        64K
j2_maxRandomWrite      0        0        0            0        64K
» The table above has been filled in with examples from a sample system.
» Use the following commands to get the information. The output shown should
be similar to what you will see on your system.
» # vmo -L maxfree
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
maxfree 1088 1088 1088 16 209715 4KB pages D
minfree
memory_frames
--------------------------------------------------------------------------------
» # vmo -L minfree
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
minfree 960 960 960 8 209715 4KB pages D
maxfree
memory_frames
--------------------------------------------------------------------------------
» # vmo -L minperm%
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
minperm% 3 3 3 1 100 % memory D
--------------------------------------------------------------------------------
» # schedo -L sched_D
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
sched_D 16 16 16 0 32 D
--------------------------------------------------------------------------------
» # schedo -L sched_R
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
sched_R 16 16 16 0 32 D
--------------------------------------------------------------------------------
» # ioo -L j2_maxPageReadAhead
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
j2_maxPageReadAhead 128 128 128 0 64K 4KB pages D
--------------------------------------------------------------------------------
» # ioo -L j2_maxRandomWrite
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
j2_maxRandomWrite 0 0 0 0 64K 4KB pages D
--------------------------------------------------------------------------------
__ 10. Change the following parameters to the given values and for the timeframe listed.
Note the message for parameters that will take effect at the next reboot.
»
# ioo -r -o j2_maxPageReadAhead=32
Setting j2_maxPageReadAhead to 32 in nextboot file
Warning: changes will take effect only at next reboot
__ 11. Verify that the changes to maxfree and minfree that were to take effect
immediately have been applied. What commands did you use?
» Use the following commands to get the information.
»
# vmo -o maxfree
maxfree = 256
»
# vmo -o minfree
minfree = 240
__ 12. Look at the /etc/tunables/nextboot file to see the changes that have been made.
Are all the changes you made in Step 10 in the file? ___________
» The answer is No. Only the changes that are to be made at the next reboot
are in the /etc/tunables/nextboot file.
» Here are the contents of the nextboot file (without the comment information at
the top of the file):
» # cat /etc/tunables/nextboot
» The output should be similar to the following:
info:
AIX_level = "6.1.1.1"
Kernel_type = "MP64"
Last_validation = "2009-01-18 14:24:43 CST (current, reboot)"
vmo:
minperm% = "40"
schedo:
timeslice = "2"
sched_R = "4"
sched_D = "4"
ioo:
j2_maxPageReadAhead = "32"
j2_minPageReadAhead = "4"
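The tunables files under /etc/tunables are AIX stanza files: a "command:" section header followed by indented name = "value" lines. A portable awk sketch that flattens such a stanza into command.tunable=value lines (a sample stanza is inlined for illustration; the real file would be fed in with cat /etc/tunables/nextboot):

```shell
# Flatten an AIX tunables stanza file into command.tunable=value lines.
awk '
/^[a-z]+:/ { sect = $0; sub(/:.*/, "", sect); next }   # stanza header
/=/        { line = $0; gsub(/[" \t]/, "", line)       # strip quotes/blanks
             split(line, kv, "=")
             printf "%s.%s=%s\n", sect, kv[1], kv[2] }
' <<'EOF'
vmo:
  minperm% = "40"
ioo:
  j2_maxPageReadAhead = "32"
EOF
```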
__ 14. When the system is back up, login to your assigned system as the root user.
info:
Logfile_checksum = "1791190022"
Description = "Full set of tunable parameters after last boot"
AIX_level = "6.1.2.1"
Kernel_type = "MP64"
Last_validation = "2009-03-04 20:17:29 CST (current, reboot)"
schedo:
affinity_lim = "7" # DEFAULT VALUE
big_tick_size = "1" # DEFAULT VALUE
ded_cpu_donate_thresh = "80" # DEFAULT VALUE
fixed_pri_global = "0" # DEFAULT VALUE
force_grq = "0" # DEFAULT VALUE
maxspin = "16384" # DEFAULT VALUE
pacefork = "10" # DEFAULT VALUE
proc_disk_stats = "1" # DEFAULT VALUE
sched_D = "4"
sched_R = "4"
tb_balance_S0 = "2" # DEFAULT VALUE
tb_balance_S1 = "2" # DEFAULT VALUE
tb_threshold = "100" # DEFAULT VALUE
timeslice = "2"
. . . < some output deleted >. . .
vmo:
force_relalias_lite = "0" # DEFAULT VALUE
kernel_heap_psize = "65536"
lgpg_regions = "0" # DEFAULT VALUE
lgpg_size = "0" # DEFAULT VALUE
low_ps_handling = "1" # DEFAULT VALUE
maxfree = "1088" # DEFAULT VALUE
maxperm = "214295" # STATIC (never restored)
maxpin = "211826" # STATIC (never restored)
maxpin% = "80" # DEFAULT VALUE
memory_frames = "262144" # STATIC (never restored)
memplace_data = "2" # DEFAULT VALUE
memplace_mapped_file = "2" # DEFAULT VALUE
memplace_shm_anonymous = "2" # DEFAULT VALUE
memplace_shm_named = "2" # DEFAULT VALUE
memplace_stack = "2" # DEFAULT VALUE
memplace_text = "2" # DEFAULT VALUE
memplace_unmapped_file = "2" # DEFAULT VALUE
minfree = "960" # DEFAULT VALUE
minperm = "95242" # STATIC (never restored)
minperm% = "40"
. . . < some output deleted >. . .
ioo:
aio_active = "0" # STATIC (never restored)
aio_maxreqs = "65536" # DEFAULT VALUE
aio_maxservers = "30" # DEFAULT VALUE
aio_minservers = "3" # DEFAULT VALUE
aio_server_inactivity = "300" # DEFAULT VALUE
j2_atimeUpdateSymlink = "0" # DEFAULT VALUE
j2_dynamicBufferPreallocation = "16" # DEFAULT VALUE
j2_inodeCacheSize = "400" # DEFAULT VALUE
j2_maxPageReadAhead = "32"
j2_maxRandomWrite = "0" # DEFAULT VALUE
j2_metadataCacheSize = "400" # DEFAULT VALUE
j2_minPageReadAhead = "4"
j2_nPagesPerWriteBehindCluster = "32" # DEFAULT VALUE
. . . < some output deleted >. . .
raso:
kern_heap_noexec = "0" # DEFAULT VALUE
kernel_noexec = "1" # DEFAULT VALUE
mbuf_heap_noexec = "0" # DEFAULT VALUE
. . . < some output deleted >. . .
no:
arpqsize = "12" # DEFAULT VALUE
arpt_killc = "20" # DEFAULT VALUE
arptab_bsiz = "7" # DEFAULT VALUE
. . . < some output deleted >. . .
nfso:
client_delegation = "1" # DEFAULT VALUE
nfs_max_read_size = "65536" # DEFAULT VALUE
nfs_max_write_size = "65536" # DEFAULT VALUE
Restoring no values
===================
Warning: a restricted tunable has been modified
Setting net_malloc_police to 65536
__ 17. Change all the tunable parameters so they are back to their default values at the
next boot.
What command did you use? ______________________________________
» # tundefault -r
» The output should be similar to the following:
Modification to restricted tunable v_repage_hi, confirmation required yes/no yes
Setting v_repage_hi to 0 in nextboot file
Modification to restricted tunable v_repage_proc, confirmation required yes/no yes
Setting v_repage_proc to 4 in nextboot file
Modification to restricted tunable v_sec_wait, confirmation required yes/no yes
Setting v_sec_wait to 1 in nextboot file
Modification to restricted tunable v_min_process, confirmation required yes/no yes
Setting v_min_process to 2 in nextboot file
Modification to restricted tunable v_exempt_secs, confirmation required yes/no yes
Setting v_exempt_secs to 2 in nextboot file
Setting pacefork to 10 in nextboot file
Setting sched_D to 16 in nextboot file
Setting sched_R to 16 in nextboot file
Setting timeslice to 1 in nextboot file
Setting maxspin to 16384 in nextboot file
. . . < some output deleted >. . .
Setting vpm_fold_policy to 1 in nextboot file
Setting ded_cpu_donate_thresh to 80 in nextboot file
Setting tb_threshold to 100 in nextboot file
Setting tb_balance_S0 to 2 in nextboot file
Setting tb_balance_S1 to 2 in nextboot file
Warning: some changes will take effect only after a bosboot and a reboot
. . . < some output deleted >. . .
Run bosboot now? yes/no yes
vmo:
schedo:
ioo:
raso:
» Note: Nothing is in the file (other than the header info) because the previous
step reset every tunable to its default value.
__ 20. When the system is back up, login to your assigned system as the root user.
__ 21. Look at the /etc/tunables/lastboot file. What changes have been made?
» All parameters should be back to their default values.
» # more /etc/tunables/lastboot
» The output should be similar to the following:
info:
Logfile_checksum = "2899193260"
Description = "Full set of tunable parameters after last boot"
AIX_level = "6.1.2.1"
Kernel_type = "MP64"
Last_validation = "2009-03-04 20:41:38 CST (current, reboot)"
schedo:
affinity_lim = "7" # DEFAULT VALUE
big_tick_size = "1" # DEFAULT VALUE
ded_cpu_donate_thresh = "80" # DEFAULT VALUE
. . . < some output deleted >. . .
vmo:
force_relalias_lite = "0" # DEFAULT VALUE
kernel_heap_psize = "65536"
lgpg_regions = "0" # DEFAULT VALUE
. . . < some output deleted >. . .
ioo:
aio_active = "0" # STATIC (never restored)
aio_maxreqs = "65536" # DEFAULT VALUE
. . . < some output deleted >. . .
raso:
END OF LAB
Introduction
This lab exercise covers administrative tools and their effect on CPU
performance. You will work with priorities, CPU utilization statistics,
and simultaneous multi-threading (SMT).
Exercise Objectives
At the end of the lab, you should be able to:
• Modify process priorities
• Monitor CPU usage
• Observe the run queue
• Characterize CPU usage
• Enable and disable simultaneous multi-threading
• Analyze CPU related PerfPMR files
References
More information about the commands in this exercise is
available from the IBM Systems Information Center:
http://publib.boulder.ibm.com/infocenter/systems/index.jsp
Steps
__ 1. Login to your assigned system as the root user.
Note: The SMT capability is enabled by default in AIX 6.1 and it is beneficial for
most environments. However, for this section of the exercise we need to disable
SMT in order to simulate the expected results.
__ 4. Run the program called cpuprog in the background to generate CPU activity.
# ./cpuprog &
__ 5. Run the ps -el command and fill in the table with the current priority, CPU usage
value, and nice value of the cpuprog program.
» The table above has been filled in with examples from a sample system.
» Use the ps -el command and search for the cpuprog process. Look at the
PRI column for the priority, the C column for the CPU usage and the NI
column for the nice value.
» # ps -el | head -1
» # ps -el | grep cpuprog
» Additional notes:
• The priority of the process; higher numbers mean lower priority.
• CPU utilization of the process is incremented each time the system clock
ticks and the process is running. The value is decayed by the scheduler
by dividing it by 2 once per second.
• The nice value used in calculating priority (for the SCHED_OTHER
policy).
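The decay and penalty arithmetic in these notes can be sketched with shell arithmetic. This is a simplified model, not the exact kernel algorithm: it assumes the default schedo values sched_R=16 and sched_D=16, under which the CPU penalty works out to C/2 and the once-per-second decay halves C.

```shell
#!/bin/sh
# Simplified sketch of the AIX thread priority calculation, assuming the
# default schedo values sched_R=16 and sched_D=16. Not the exact kernel code.
C=120      # recent CPU usage (the ps -el "C" column)
NICE=24    # nice value (the ps -el "NI" column)

# CPU penalty with sched_R=16: C * 16 / 32 = C / 2
penalty=$((C * 16 / 32))
priority=$((40 + NICE + penalty))
echo "approximate priority: $priority"   # 40 + 24 + 60 = 124

# Once per second the scheduler decays C with sched_D=16: C * 16 / 32 = C / 2
C=$((C * 16 / 32))
echo "C after one decay: $C"             # 60
```

The real kernel applies an extra penalty when the nice value exceeds 20, so the numbers reported by ps -el will not match this sketch exactly.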
__ 6. Run the ps -el command multiple times and notice that the priority and CPU usage
values for the cpuprog program change. Write down the values in the following
table:
» The table above has been filled in with examples from a sample system.
» Use the ps -el command and search for the cpuprog process. Look at the
PRI column for the priority, the C column for the CPU usage and the NI
column for the nice value.
» # ps -el | head -1
» # ps -el | grep cpuprog
» The output should be similar to the following:
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
200001 A 0 311452 258284 120 135 24 2594d480 96 pts/0 13:26 cpuprog
200001 A 0 311452 258284 106 127 24 2594d480 96 pts/0 13:27 cpuprog
200001 A 0 311452 258284 120 135 24 2594d480 96 pts/0 13:29 cpuprog
200001 A 0 311452 258284 120 135 24 2594d480 96 pts/0 13:33 cpuprog
200001 A 0 311452 258284 80 113 24 2594d480 96 pts/0 13:34 cpuprog
200001 A 0 311452 258284 120 135 24 2594d480 96 pts/0 13:36 cpuprog
__ 7. Open another telnet window and run the cpuprog program again (it is in
/u/QV431/ex3), but this time in the foreground. The other cpuprog process should
still be running in the background.
# /u/QV431/ex3/cpuprog
__ 8. Go back to the first window and run the ps -el command. Find out the priority and
nice value of the foreground cpuprog program. You can use the TIME column to
determine the newer process.
Is the nice value different from the one that was started in the background? _____
» # ps -el |grep cpuprog
» The output should be similar to the following:
200001 A 0 188588 196636 64 92 20 45955480 96 pts/1 0:03 cpuprog
200001 A 0 311452 258284 43 92 24 2594d480 96 pts/0 44:16 cpuprog
» Note: You should notice that the background process has a nice value of 24
and the foreground process has a nice value of 20. Some shells will
automatically add a value of 4 to the initial nice value for processes that are
started in the background.
__ 9. Change the nice value of the program running in the background to a nice value of
30.
The syntax of the renice command is:
renice [-n Increment | Increment] -p <PID>
Which number (Increment) are you going to use to set the nice value to 30? ___
» # renice -n 6 -p <PID>
or
# renice 6 -p <PID>
» Use whatever PID you discovered in the ps command.
» # ps -el | grep cpuprog
» The output should be similar to the following:
200001 A 0 188588 196636 120 92 20 45955480 96 pts/1 0:03 cpuprog
200001 A 0 311452 258284 64 120 30 2594d480 96 pts/0 44:16 cpuprog
__ 11. Start the cpuprog program again but start it with a nice value of 25. Remember, all
files for this exercise are in the /u/QV431/ex3 directory.
The syntax of the nice command is: nice -n <increment> <command>
» # cd /u/QV431/ex3
» # nice -n 5 ./cpuprog
__ 12. In another window, verify the nice value by running the ps -el command. Is the
nice value 25? If not, check the syntax of the nice command. Remember the -n flag
indicates the increment to the default nice value (20 for foreground processes).
» # ps -el | grep cpuprog
» The output should be similar to the following:
200001 A 0 315580 339998 120 139 25 9b6c480 92 pts/0 0:15 cpuprog
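The same interaction can be demonstrated with a portable sketch. Note that on AIX the NI column is displayed from a base of 20, so nice -n 5 shows as 25; most other systems display the raw increment, so the value printed below would be 5 there.

```shell
#!/bin/sh
# Start a short-lived background process with a nice increment of 5 and read
# its nice value back with ps. On AIX the NI column would show 20+5=25; on
# most other systems ps reports the raw increment, 5.
nice -n 5 sleep 10 &
pid=$!
ni=$(ps -o nice= -p "$pid" | tr -d ' ')
echo "nice value of PID $pid: $ni"
kill "$pid"
```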
__ 14. Start the cpuprog process again, in the background, with an initial nice value of 30.
Then, verify the nice value by running the ps -el command. Remember the
default nice value for background processes is 24.
Is the nice value 30? ________________
__ 16. Run the program called countem with an argument of 1000000000 (nine zeros) and
record how long it takes to run. The countem program is a simple program that
counts from 1 to whatever number is passed as an argument.
Use the shell’s built-in time command to get the timing information.
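countem is a lab-provided program; as a portable stand-in, the same kind of measurement can be made with the shell's built-in time on a simple counting loop (the bound here is deliberately much smaller than 1000000000 so it finishes quickly):

```shell
#!/bin/sh
# Time a counting loop with the shell built-in "time" -- a stand-in for:
#   time ./countem 1000000000
time sh -c '
  i=0
  while [ "$i" -lt 100000 ]; do
    i=$((i + 1))
  done
  echo "counted to $i"
'
```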
__ 18. Run the countem program again and time how long it takes. Run the countem
program with an argument of 1000000000 (nine zeros) again. Record how long it
takes to run.
Why does it take longer than before?
» It takes longer because it has to share CPU resources with the CPU-intensive
background jobs (cpuprog).
__ 19. Start the countem program again with higher priority in a way to make it complete
sooner. You will need to start it with the time command to see the results. Record
how long it takes to run.
__ 20. Assume you have no control of how a program is started or the program has already
started and you cannot restart it. How can you make this program complete sooner
after it has already started?
What command should you use? ___________________________________
» The renice command can be used to alter the nice value of a running
process.
__ 21. Kill the cpuprog processes that are running in the background.
» # kill <PID>
» # kill <PID>
» Use the PIDs you discovered in the ps command.
» Yes, it is possible to see both sides running concurrently and also to see an
equivalent amount of output (... [left/right] using CPU...) on both sides.
__ 23. Open another window and change the nice value of one of the usingcpu
processes (left or right) started in the last step to 33. Observe the behavior.
» # renice -n 9 <PID>
» Use the ps -el | grep usingcpu command to get the PID number.
» The other session, where the usingcpu programs are running, should be
similar to the following:
Are you able to see any difference in CPU utilization of both processes? _______
» Yes, it is possible to see one side using more CPU (more output lines).
__ 25. Run the psformultithread script and examine the output. The psformultithread
script first runs the multithread process that creates 8 threads, then issues the
ps -emo THREAD command to display a line for each thread of the multithread
process.
# ./psformultithread
Do the threads belonging to the same process have the same priority? _______
» The output should be similar to the following:
the multithread process has been started.
the primary process' thread creates 8 additional threads.
collecting threads information of the multithread process, please wait ...
- - - 323819 R 54 87 0 - 400000 - - -
- - - 536707 R 55 87 0 - 400000 - - -
- - - 733245 R 55 87 0 - 400000 - - -
- - - 1233083 R 54 87 0 - 400000 - - -
- - - 1286179 R 55 87 0 - 400000 - - -
- - - 1531915 R 52 86 0 - 400000 - - -
- - - 1568775 R 55 87 0 - 400000 - - -
- - - 1728603 R 53 86 0 - 400000 - - -
- - - 1269795 R 0 60 1 - 400000 - - -
» The threads do not necessarily have the same priority. Each thread could
potentially have a different priority, scheduling policy, and CPU consumption.
Part 2 - Monitoring use of CPU with ps, vmstat, sar, and topas
In this section of the exercise, you will use the vmstat and sar commands to monitor
CPU usage.
Steps
__ 26. If you are not already logged in, login to your assigned system as the root user.
__ 29. Run the psforcpuprog script to display the priority and CPU utilization values. The
psforcpuprog script issues the ps -ef command in an infinite loop.
# ./psforcpuprog
» The output should be similar to the following:
» In the example output above, the priority (PRI) is staying in the 100-135 range
and the CPU utilization value (C) is staying in the 60-120 range.
__ 30. In the window where the psforcpuprog script is not running, change the nice
values of each of the cpuprog programs to 20.
» # renice -n -4 <PID>
» # renice -n -4 <PID>
» # renice -n -4 <PID>
» # renice -n -4 <PID>
» Use whatever PIDs you discovered in the ps command.
__ b. What difference in priorities do you see between the last step and this step?
» Once the nice value is at 20, you will notice that the priority (PRI) now stays
in the 90-120 range; it does not get as numerically high as it did when the
nice value was 24. This is because AIX penalizes threads more (decreases
the effective priority) if they have a nice value greater than 20.
» The schedo command can be used to tune the penalty algorithm.
__ 33. Use the ps aux and topas commands to determine what process(es) are
consuming the most CPU time.
• The ps aux command uses the Berkeley standard and displays information
about all processes which includes USER, PID, %CPU, %MEM, SZ, RSS, TTY, STAT,
STIME, TIME, and COMMAND fields.
# ps aux | head -10
» The output should be similar to the following:
USER PID %CPU %MEM SZ RSS TTY STAT STIME TIME COMMAND
root 319628 12.5 0.0 92 96 pts/0 A 19:27:51 3:23 ./cpuprog
root 106506 12.4 0.0 92 96 pts/0 A 19:27:56 3:20 ./cpuprog
root 139464 12.4 0.0 92 96 pts/0 A 19:27:54 3:20 ./cpuprog
root 393242 12.3 0.0 92 96 pts/0 A 19:27:52 3:20 ./cpuprog
pconsole 245928 0.0 7.0 54148 54152 - A 15:17:38 0:13 /usr/java5/bin/j
root 8196 0.0 0.0 384 384 - A 15:17:06 0:12 wait
root 49176 0.0 0.0 384 384 - A 15:17:06 0:07 wait
root 311450 0.0 5.0 41420 41424 - A 15:17:32 0:06 /usr/java5/bin/j
root 262316 0.0 0.0 600 628 - A 15:17:32 0:03 /usr/sbin/getty
• The topas command reports selected statistics about the activity on the local
system.
The topas command uses the curses library to display its output in a format
suitable for viewing on an 80x25 character-based display or in a window of at
least the same size on a graphical display. You may need to set the TERM
environment variable if topas formatting looks strange (for example: export
TERM=vt220).
# topas
__ a. Are the results from the two commands the same? _____________________
» Yes, they are similar, at least in terms of the most CPU usage.
__ b. Which processes are using the most CPU time at this moment?
» The four cpuprog processes are using the most CPU time.
Steps
__ 35. If you are not already logged in, login to your assigned system as the root user.
__ 37. Make sure four instances of the cpuprog program are still running in background. If
not, start them.
» # ./cpuprog &
» # ./cpuprog &
» # ./cpuprog &
» # ./cpuprog &
__ b. What is the percentage of the time the run queue is occupied? ___________
» # sar -q 2 4
» The output should be similar to the following:
AIX rand212 1 6 00066C32D900 02/28/09
__ 39. Now, use the vmstat command to obtain the number of threads on the run queue.
Notice that both the sar and vmstat commands can be used to obtain the same
type of information.
» # vmstat 2 4
» The output should be similar to the following:
System configuration: lcpu=1 mem=1024MB ent=0.40
Steps
__ 40. If you are not already logged in, login to your assigned system as the root user.
__ 42. Make sure four instances of the cpuprog program are still running in background. If
not, start them.
» # ./cpuprog &
» # ./cpuprog &
» # ./cpuprog &
» # ./cpuprog &
__ 43. Use the sar command to get the average CPU utilization.
» sar -u 2 4
» The output should be similar to the following:
AIX rand212 1 6 00066C32D900 02/28/09
__ 44. Explain the meaning of the following fields from the sar command:
__ a. %sys
» Reports the percentage of time the processor or processors spent in
execution at the system (or kernel) level.
__ b. %usr
» Reports the percentage of time the processor or processors spent in
execution at the user (or application) level.
__ c. %wio
» Reports the percentage of time the processor(s) were idle during which the
system had outstanding disk/NFS I/O request(s). See detailed description
above.
__ d. %idle
» Reports the percentage of time the processor or processors were idle with no
outstanding disk I/O requests.
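These four percentages sum to (approximately) 100 for each interval. The rough classification used later in this exercise can be sketched in awk; the sample values and the 90% threshold are illustrative rules of thumb, not official limits:

```shell
#!/bin/sh
# Classify a sar -u style sample: %usr %sys %wio %idle.
# The input values and the thresholds are illustrative only.
echo "70 30 0 0" | awk '
{
  usr = $1; sys = $2; wio = $3; idle = $4
  printf "total = %d\n", usr + sys + wio + idle
  if (usr + sys > 90)  print "likely CPU bound"
  else if (wio > 25)   print "likely I/O bound"
  else                 print "no obvious CPU or I/O bottleneck"
}'
```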
__ 45. Now, use the vmstat and topas commands to obtain the CPU utilization.
» # vmstat 5 5
» The output should be similar to the following:
System configuration: lcpu=1 mem=1024MB ent=0.40
» # topas
» The output should be similar to the following:
Topas Monitor for host: rand212 EVENTS/QUEUES FILE/TTY
Sat Feb 28 20:09:29 2009 Interval: 2 Cswitch 261 Readch 33
Syscall 951.2K Writech 188
CPU User% Kern% Wait% Idle% Physc Entc Reads 951.1K Rawin 0
ALL 22.4 77.6 0.0 0.0 1.00 249.8 Writes 1 Ttyout 122
Forks 0 Igets 0
Network KBPS I-Pack O-Pack KB-In KB-Out Execs 0 Namei 16
Total 0.2 0.6 0.4 0.0 0.1 Runqueue 4.0 Dirblk 0
Waitqueue 0.0
Disk Busy% KBPS TPS KB-Read KB-Writ MEMORY
Total 0.8 6.0 1.2 0.0 6.0 PAGING Real,MB 1024
Faults 0 % Comp 65.5
FileSystem KBPS TPS KB-Read KB-Writ Steals 0 % Noncomp 13.3
Total 0.1 1.0 0.0 0.1 PgspIn 0 % Client 13.3
PgspOut 0
Name PID CPU% PgSp Owner PageIn 0 PAGING SPACE
cpuprog 139464 25.5 0.1 root PageOut 0 Size,MB 512
cpuprog 106506 24.8 0.1 root Sios 0 % Used 1.1
cpuprog 393242 24.6 0.1 root % Free 99.9
cpuprog 319628 24.6 0.1 root NFS (calls/sec)
getty 262316 0.1 0.5 root SerV2 0 WPAR Activ 0
topas 315484 0.0 1.3 root CliV2 0 WPAR Total 0
java 245928 0.0 52.8 pconsole SerV3 0 Press: "h"-help
errdemon 110662 0.0 0.6 root CliV3 0 "q"-quit
gil 57372 0.0 0.9 root
java 311450 0.0 40.4 root
rpc.lock 237716 0.0 1.2 root
rmcd 241806 0.0 2.6 root
tracelog 323674 0.0 0.7 root
xmgc 45078 0.0 0.4 root
sendmail 147544 0.0 1.1 root
syslogd 118956 0.0 0.3 root
aixmibd 143490 0.0 1.1 root
netm 53274 0.0 0.4 root
syncd 131204 0.0 0.5 root
pilegc 40980 0.0 0.6 root
Steps
__ 48. If you are not already logged in, login to your assigned system as the root user.
__ 49. Run the smtctl command to see whether SMT is enabled or disabled.
» # smtctl
» The output should be similar to the following:
This system is SMT capable.
SMT is currently disabled.
SMT boot mode is set to disabled.
SMT threads are bound to the same virtual processor.
__ 50. If SMT is currently enabled, run the command smtctl -m off to disable the
SMT capability.
» # smtctl -m off
» The output should be similar to the following:
smtctl: SMT is now disabled. It will persist across reboots if
you run the bosboot command before the next reboot.
CPU Consumption Times (SMT off)
real 0m27.95s
user 0m27.84s
sys  0m0.00s
__ 52. Now, run the command smtctl -m on to enable the SMT capability.
» # smtctl -m on
» The output should be similar to the following:
smtctl: SMT is now enabled. It will persist across reboots if
you run the bosboot command before the next reboot.
__ 53. With SMT enabled, run the multithread program again and record its time.
created thread 2
created thread 3
created thread 4
created thread 5
created thread 6
created thread 7
waiting for threads to complete ...
real 0m18.29s
user 0m18.22s
sys 0m0.00s
__ 54. Compare the real time with SMT disabled and SMT enabled.
Was the multithread program faster with SMT enabled or disabled? __________
» With SMT enabled, two logical processors run on the same physical
processor, which explains why the multithread program ran faster when
SMT was active.
» This step shows an example of where SMT is a benefit to performance. Tests
performed have shown various results with different types of workloads, from
slightly lower performance with some compute-intensive workloads to
significant performance gains with typical commercial workloads. You will
need to try your system’s workload with SMT to see if there is a benefit.
__ 55. Run the mpstat -s 3 command to display the SMT utilization. Leave it running
throughout this part of the exercise.
The mpstat command collects and displays performance statistics for all logical
processors in the system. If the -s flag is specified, mpstat displays SMT thread
utilization in an SMT-enabled partition.
While mpstat is running, continue to the next step.
» # mpstat -s 3
» The output should be similar to the following:
System configuration: lcpu=2 ent=0.4 mode=Uncapped
Proc0
0.36%
cpu0 cpu1
0.28% 0.08%
--------------------------------------------------------------------------------
Proc0
0.89%
cpu0 cpu1
0.80% 0.09%
--------------------------------------------------------------------------------
. . . < some output deleted >. . .
__ 56. In another window, run the scripts 2Threads, 4Threads, and 8Threads sequentially
and record the utilization metrics. The 2Threads script starts a process that creates
two threads, the 4Threads script starts four threads, and the 8Threads script starts
eight threads.
Script     Average cpu0   Average cpu1   Time of Execution
           Percentage     Percentage     (real time)
2Threads   50%            49%            0m14.19s
4Threads   50%            49%            0m27.34s
8Threads   50%            49%            0m54.64s
» The table above contains an example of the utilization metrics when SMT is
enabled.
» # ./2Threads
» The output of the 2Threads script should be similar to the following:
the multithread process has been started.
the primary process' thread creates 2 additional threads.
threads running...
real 0m14.19s
user 0m14.16s
sys 0m0.00s
» The output from mpstat while the 2Threads script was running should be
similar to the following:
-------------------------------------------------------------------------------
Proc0
99.97%
cpu0 cpu1
50.37% 49.60%
-------------------------------------------------------------------------------
» # ./4Threads
» The output of the 4Threads script should be similar to the following:
the multithread process has been started.
the primary process' thread creates 4 additional threads.
threads running...
real 0m27.34s
user 0m27.21s
sys 0m0.00s
» The output from mpstat while the 4Threads script was running should be
similar to the following:
-------------------------------------------------------------------------------
Proc0
99.97%
cpu0 cpu1
50.32% 49.65%
-------------------------------------------------------------------------------
» # ./8Threads
» The output of the 8Threads script should be similar to the following:
the multithread process has been started.
the primary process' thread creates 8 additional threads.
threads running...
real 0m54.64s
user 0m54.51s
sys 0m0.00s
» The output from mpstat while the 8Threads script was running should be
similar to the following:
-------------------------------------------------------------------------------
Proc0
99.97%
cpu0 cpu1
50.47% 49.50%
-------------------------------------------------------------------------------
__ 57. Make sure the mpstat command is still running in one window. Then in another
window, run the command smtctl -m off to disable the SMT capability. (If you try
to run mpstat with the -s flag when SMT is disabled, it will not work; mpstat must
already be running before you disable SMT.)
» # smtctl -m off
» The output should be similar to the following:
smtctl: SMT is now disabled. It will persist across reboots if
you run the bosboot command before the next reboot.
__ 58. With SMT disabled, run the 2Threads, 4Threads, and 8Threads scripts sequentially
and record the utilization metrics.
Script     Average cpu0   Average cpu1   Time of Execution
           Percentage     Percentage     (real time)
2Threads   99.96%         0%             0m20.75s
4Threads   99.96%         0%             0m41.44s
8Threads   99.96%         0%             1m23.21s
» The table above contains an example of the utilization metrics when SMT is
disabled.
» # ./2Threads
» The output of the 2Threads script should be similar to the following:
the multithread process has been started.
the primary process' thread creates 2 additional threads.
threads running...
real 0m20.75s
user 0m20.61s
sys 0m0.00s
» The output from mpstat while the 2Threads script was running should be
similar to the following:
-------------------------------------------------------------------------------
Proc0
99.96%
cpu0 cpu1
99.96% 0.00%
-------------------------------------------------------------------------------
» # ./4Threads
» The output of the 4Threads script should be similar to the following:
the multithread process has been started.
the primary process' thread creates 4 additional threads.
threads running...
real 0m41.44s
user 0m41.31s
sys 0m0.00s
» The output from mpstat while the 4Threads script was running should be
similar to the following:
-------------------------------------------------------------------------------
Proc0
99.96%
cpu0 cpu1
99.96% 0.00%
-------------------------------------------------------------------------------
» # ./8Threads
» The output of the 8Threads script should be similar to the following:
the multithread process has been started.
the primary process' thread creates 8 additional threads.
threads running...
real 1m23.21s
user 1m22.95s
sys 0m0.00s
» The output from mpstat while the 8Threads script was running should be
similar to the following:
-------------------------------------------------------------------------------
Proc0
99.96%
cpu0 cpu1
99.96% 0.00%
-------------------------------------------------------------------------------
__ 59. Compare the utilization metrics in both cases (SMT enabled and SMT disabled).
How does SMT affect the overall performance in multi-thread processes?
» This step shows another example of where SMT benefits performance. It
also shows how the threads are distributed among the logical processors.
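Using the sample real times recorded in the two tables above, the SMT speedup for each script can be computed with awk (times converted to seconds):

```shell
#!/bin/sh
# Speedup = real time with SMT off / real time with SMT on, using the
# sample times from the tables above (converted to seconds).
awk 'BEGIN {
  printf "2Threads speedup: %.2fx\n", 20.75 / 14.19
  printf "4Threads speedup: %.2fx\n", 41.44 / 27.34
  printf "8Threads speedup: %.2fx\n", 83.21 / 54.64
}'
```

For this sample workload, SMT gives roughly a 1.5x improvement in elapsed time.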
Steps
__ 61. If you are not already logged in, login to your assigned system as the root user.
__ 63. Run the sar -P ALL -f sar.bin command to extract the CPU activity for all
processors. Answer the following questions:
Question                                                            Answer
Average percentage of time the processor(s) spent in execution      Around 70%
at the user level
Average percentage of time the processor(s) spent in execution      Around 30%
at the system level
Average percentage of time the processor(s) were idle with no       0%
outstanding disk I/O requests
Average percentage of time the processor(s) were idle during        0%
which the system had outstanding disk/NFS I/O request(s)
__ 64. What conclusion can be drawn from the previous step? Is the machine CPU bound
or I/O bound?
» The machine is CPU bound, since 100% of the CPU time is spent at the
user and system (kernel) levels.
__ 65. Run the sar -q -f sar.bin command to display queue statistics. Answer the
following questions:
Question                                                     Answer
Average number of kernel threads in the run queue            Around 100 threads
Average percentage of the time the run queue is occupied     Around 95%
__ 66. What conclusion can be drawn from the previous step? Is the machine overloaded?
» Yes, the machine seems overloaded, since the run queue is occupied almost
100% of the time.
__ 67. Examine the contents of the monitor.sum file and identify the three processes with
the largest CPU consumption value shown in the AFTER TIME column.
» # more monitor.sum
» No. These three processes (gil, lrud, and dsmc) are not responsible for the
massive CPU utilization.
• gil is a kernel process that handles TCP/IP activities such as timers,
transmission errors, and ACKs. Normally, it should not consume much
CPU. The total time represents the CPU time consumed since TCP/IP
was started.
• lrud is a kernel process responsible for scanning pages in memory and
freeing those not recently accessed (LRU: Least Recently Used).
Normally, it should not consume much CPU. The total time represents
the CPU time consumed since AIX was started.
• dsmc is the Tivoli Storage Manager backup-archive client, used here to
back up files. Normally, it should not consume much CPU. The total time
represents the CPU time consumed since the client was started.
__ 69. Examine the contents of the monitor.sum file again and identify the processes that
used the most time during the sampling interval (see difference between AFTER
TIME and BEFORE TIME).
Note: To keep the table manageable, record only the intervals greater than
9 seconds.
Processes         BEFORE TIME    AFTER TIME    Interval Time (> :09)
oracle 0:09 0:19 :10
oracle 3:58 4:08 :10
emagent 13:14 13:42 :18
oracle 7:42 7:55 :13
oracle 1:12 1:24 :12
oracle 0:18 0:28 :10
oracle 3:43 4:06 :23
oracle 0:06 0:17 :11
oracle 0:55 1:05 :10
oracle 2:46 3:03 :17
oracle 3:16 3:29 :13
oracle 0:11 0:21 :10
oracle 0:28 0:40 :12
oracle 1:36 1:48 :12
oracle 0:09 0:19 :10
oracle 2:32 2:42 :10
oracle 0:07 0:17 :10
oracle 1:57 2:08 :11
oracle 0:17 0:29 :12
oracle 0:15 0:25 :10
nmon12a_aix61 0:08 0:18 :10
oracle 3:34 3:50 :16
» The table above has been filled in with the acquired interval times.
» # more monitor.sum
» The output shows the following:
DELTA DELTA DELTA DELTA DELTA DELTA BEFORE AFTER
PID PGIN SIZE RSS TRS DRS C TIME TIME CMD
0 0 0 0 0 0 0 17:00 17:02 swapper
1 0 0 0 0 0 0 0:24 0:24 init
8196 0 0 0 0 0 0 17:04 17:04 wait
12294 0 0 0 0 0 0 0:31 0:31 sched
16392 0 0 0 0 0 0 252:18 252:18 lrud
. . . < some output deleted >. . .
1446008 0 1656 1656 0 1656 -6 3:58 4:08 oracle
. . . < some output deleted >. . .
1888500 0 1360 74388 0 74388 1 13:14 13:42 emagent
. . . < some output deleted >. . .
7573680 0 128 128 0 128 -1 0:08 0:18 nmon12a_aix61
. . . < some output deleted >. . .
END OF LAB
Introduction
This exercise covers the monitoring of memory usage statistics. You
will use some monitoring tools such as vmstat and svmon. Also, you
will use the vmo command to manage VMM tunable parameters.
Exercise Objectives
At the end of the lab, you should be able to:
• Observe the memory utilization
• Monitor the VMM free list
• Use svmon to monitor the amount of memory in use
• Analyze memory related PerfPMR files
References
More information about the commands in this exercise is
available from the IBM Systems Information Center:
http://publib.boulder.ibm.com/infocenter/systems/index.jsp
© Copyright IBM Corp. 2009 Exercise 4. Virtual Memory Performance Monitoring 4-1
Part 1 - Observing memory utilization, file pages, and free list
In this section of the exercise, you will examine some statistics and tunables maintained
by the Virtual Memory Manager.
Steps
__ 1. Login to your assigned system as the root user.
__ 3. What are the current values for the following statistics maintained by VMM?
__ 4. What are the current values for the following non-restricted tunables maintained by
VMM?
» If the -I flag is specified, an I/O-oriented view is presented with the following
column changes:
kthr: The column p (number of threads waiting on I/O to raw devices per
second) is displayed beside columns r (average number of runnable
kernel threads) and b (average number of kernel threads placed in the
wait queue).
page: The columns fi (file page-ins per second) and fo (file page-outs per
second) are displayed instead of the re (pager input/output list) and cy
(clock cycles used by the page-replacement algorithm) columns.
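Individual columns can be pulled out of a vmstat -I data line with awk. The sketch below assumes the -I column order r b p avm fre fi fo pi po fr sr ..., so fre is field 5 and fi is field 6; the data line is a sample:

```shell
#!/bin/sh
# Extract fre (free list size) and fi (file page-ins/sec) from a sample
# vmstat -I data line. Assumed column order: r b p avm fre fi fo pi po fr sr ...
echo "2 0 0 171660 48232 2551 0 0 0 0 0 306 125544 14385 30 65 4 0 0.61 151.5" |
awk '{ printf "fre=%s fi=%s\n", $5, $6 }'
```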
__ 6. Open another telnet window and run the ./readfiles script. Make sure you are in
the /u/QV431/ex4 directory.
» # ./readfiles
» The output should be similar to the following:
cat: 0652-050 Cannot open /usr/bin/perfpmr.sh.
__ 7. While readfiles is running, watch the vmstat -I 3 output from your other window.
» The vmstat output should be similar to the following:
kthr memory page faults cpu
-------- ----------- ------------------------ ------------ -----------------------
r b p avm fre fi fo pi po fr sr in sy cs us sy id wa pc ec
1 0 0 170520 53399 0 0 0 0 0 0 0 19 153 0 0 99 0 0.00 0.8
1 0 0 171230 52202 482 0 0 0 0 0 128 23709 917 10 23 62 5 0.14 33.8
2 0 0 171660 48232 2551 0 0 0 0 0 306 125544 14385 30 65 4 0 0.61 151.5
1 1 0 172073 39306 4411 0 0 0 0 0 342 135334 15292 30 66 4 0 0.67 166.3
1 0 0 172729 36093 2594 0 0 0 0 0 331 150969 18117 30 65 4 0 0.68 169.6
1 1 0 173216 34075 2010 0 0 0 0 0 293 147842 16638 31 65 4 0 0.68 171.1
2 1 0 173694 28202 3666 0 0 0 620 4391 298 131191 40066 27 65 7 0 0.68 169.2
1 1 0 173874 11971 7295 0 0 0 1027 5720 333 79772 34076 25 67 7 1 0.46 114.7
2 1 0 174429 11981 1861 0 0 0 446 929 308 188155 45979 29 63 7 0 0.80 200.4
1 1 0 175092 11308 2005 0 0 0 263 1222 318 189938 25141 31 64 5 0 0.81 202.4
1 0 0 175687 10707 2016 0 0 0 256 454 318 192673 21363 31 65 4 0 0.80 201.0
1 1 0 176203 8452 2533 0 0 0 616 15834 338 129689 24559 29 65 5 0 0.64 160.8
__ 9. When readfiles finishes, what are the values for the following?
What changed?
__ 10. Use vmo to display help about the minfree and maxfree non-restricted tunables.
» # vmo -h minfree
» The output should be similar to the following:
Help for tunable minfree:
Purpose:
Specifies the number of frames on the free list at which the VMM starts to steal pages to
replenish the free list.
Values:
Default: 960
Range: 8 - 209715
Type: Dynamic
Unit: 4KB pages
Tuning:
Page replacement occurs when the number of free frames reaches minfree.
If processes are being delayed by page stealing, increase minfree to improve response
time. The difference between maxfree and minfree should be of the order of maxpgahead, and
no less than 8.
» # vmo -h maxfree
» The output should be similar to the following:
Help for tunable maxfree:
Purpose:
Specifies the number of frames on the free list at which page-stealing is to stop.
Values:
Default: 1088
Range: 16 - 209715
Type: Dynamic
Unit: 4KB pages
Tuning:
Observe free-list-size changes with vmstat n.
If vmstat n shows free-list size frequently driven below minfree by application demands,
increase maxfree to reduce calls to replenish the free list. Setting the value too high
causes page replacement to run for a longer period of time. The difference between
maxfree and minfree should be of the order of maxpgahead, and no less than 8.
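The tuning guidance above (a gap of at least 8 frames between maxfree and minfree, on the order of maxpgahead) can be checked with shell arithmetic; the values below are the defaults shown in the help text:

```shell
#!/bin/sh
# Sanity-check the minfree/maxfree gap against the documented guidance:
# maxfree - minfree should be no less than 8 (and on the order of maxpgahead).
minfree=960
maxfree=1088
gap=$((maxfree - minfree))
echo "gap = $gap frames"
if [ "$gap" -lt 8 ]; then
  echo "WARNING: gap is below the recommended minimum of 8"
else
  echo "gap is within the guidance"
fi
```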
__ 11. Use vmo to display the current value of the minfree and maxfree non-restricted
tunables.
» # vmo -o maxfree
» The output should be similar to the following:
maxfree = 1088
» # vmo -o minfree
» The output should be similar to the following:
minfree = 960
__ 12. Make sure the vmstat command is still running in one session.
» # vmstat -I 3
__ 13. In another telnet window, run the ulimit command to set the soft user resource limit
for the data area to be unlimited. Then, run the memory-eater program in the
background.
# ulimit -d unlimited
» If the command is successful, there will not be any messages or output.
# ./memory-eater &
» The output should be similar to the following:
[1] 348402
Allocating memory
__ 14. While memory-eater and vmstat are running, set maxfree to 16 and minfree to 8.
» # vmo -o minfree=8
» The output should be similar to the following:
Setting minfree to 8
» # vmo -o maxfree=16
» The output should be similar to the following:
Setting maxfree to 16
__ 15. Kill the memory-eater process running in background and observe memory and
paging activities.
What happened?
» # kill %1
» The size of the free list (fre) increased because the process terminated. The
paging activity should remain low because there is no need for page
replacement activity.
__ 16. Use the vmo command to set maxfree and minfree back to their default values.
» # vmo -p -d maxfree
» The output should be similar to the following:
Setting maxfree to 1088 in nextboot file
Setting maxfree to 1088
» # vmo -p -d minfree
» The output should be similar to the following:
Setting minfree to 960 in nextboot file
Setting minfree to 960
__ 18. Use the vmo command to set the maximum percentage of RAM that can be used for
caching client pages (maxclient%) to 10.
Why did you get a warning message?
» # vmo -o maxclient%=10
» The output should be similar to the following:
Setting maxclient% to 10
Warning: a restricted tunable has been modified
» Many AIX 6.1 tunables are now classified as restricted. Restricted tunables
are parameters which should not be changed unless recommended by AIX
development or support teams.
__ 19. Use the vmo command to reset all VMM tunables to their default value.
» # vmo -D
Steps
__ 20. If you are not already logged in, login to your assigned system as the root user.
__ 22. Run the svmon -G command to display a global report of virtual memory usage
statistics, and record the following values:
__ 23. Run the memory-eater program in the background and record the PID of the
process.
PID of the memory-eater process: ____________
» # ./memory-eater &
» The output should be similar to the following:
[1] 487496
Allocating memory
__ 24. While the memory-eater program is running, open another telnet window and run
the svmon -G command again. Record the values from the current output and
compare them to the values you recorded before running memory-eater.
__ 25. Using the svmon -G output in the last step, is memory overcommitted?
» The svmon -G output should be similar to the following:
size inuse free pin virtual
memory 262144 179496 82648 114998 225955
pg space 131072 48647
» No. Memory is overcommitted when the virtual page count exceeds the real
memory size; here virtual (225955) is still below size (262144), and only a
modest amount of paging space is in use.
Steps
__ 27. If you are not already logged in, login to your assigned system as the root user.
__ 29. Examine the content of the svmon_G.out file and record information about the
memory:
» # cat svmon_G.out
» The output is:
size inuse free pin virtual
memory 2031616 1807869 223747 1059822 2137826
pg space 3145728 621384
» Remember: svmon reports the memory size (size) and the number of free
pages (free) in units of the default page size (4 KB). Note that virtual
(2137826) exceeds size (2031616), a first hint that memory is overcommitted.
__ 30. The customer reported slow performance in this system. The run queue size and
disk I/O bandwidth are considered normal over time (the PerfPMR baseline data
was used to confirm this information). The initial baseline data has not reported any
irregular paging activity.
Run the sar -r -f sar.bin command to obtain paging activity during this period of
time and look at the output.
# sar -r -f sar.bin
Can you see any abnormal paging activity now? _______________________
» The output is:
System configuration: lcpu=8 mem=7936MB mode=Capped
» Yes, there is consistent paging activity in the system, which means that the
memory configured for this partition is not adequate for the current workload.
__ 31. What application is using the most memory? Look at the SZ column in the psa.elfk
report. The SZ column displays the size, in 1 KB units, of the core image of the
process.
» # more psa.elfk
or to more easily see the processes using the most memory, use the
following sort command:
# sort -n -r -k 10 psa.elfk | more
» Using the more psa.elfk command, part of the output is:
# more psa.elfk
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
303 A root 0 0 120 16 -- 31004110 384 Oct 07 - 29:05 swapper
200003 A root 1 0 0 60 20 19001400 720 Oct 07 - 0:31 /etc/init
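If you want to experiment with the sort command without the lab files, here is a self-contained sketch. The sample lines imitate the ps -elf layout above (all names and sizes are made up); SZ is field 10, so sort -n -r -k 10 puts the largest process first:

```shell
# Build a small ps -elf style sample file (hypothetical data), then
# sort it numerically and in reverse on field 10 (the SZ column).
cat > /tmp/psa.sample <<'EOF'
200003 A root 1 0 0 60 20 19001400 720 - Oct07 - 0:31 /etc/init
303 A root 0 0 120 16 -- 31004110 384 - Oct07 - 29:05 swapper
240001 A ora 4102 1 0 60 20 40005000 9840 - Oct07 - 1:12 oracle
EOF
sort -n -r -k 10 /tmp/psa.sample | head -1
# The oracle line (SZ = 9840) is printed first.
```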
__ 32. Analyze the contents of the svmon_P.out file which reports the memory usage
statistics for all processes.
__ a. What processes are using the most memory?
The svmon_P.out output is sorted by memory use: the process using the most
memory is listed first, and the process using the least memory is listed last in
the output.
» # more svmon_P.out
» The output is:
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
1024138 oracle 353367 224 379268 565707 Y N N
» Yes, it is consistent. The Oracle processes are using most of the memory.
The process listed first in the svmon_P.out file was fifth at the time the
psa.elfk report was created. (Remember, the svmon reports are just snapshots,
so the top memory user at the time one report was created may not be at the
top when the next report is created.)
» The system was reconfigured by adding more memory and the performance
got back to normal.
END OF LAB
Introduction
This exercise covers the monitoring of I/O performance. You will use
iostat, vmstat, sar, lvmstat, and filemon commands to
determine what bottleneck(s) you have on the system.
Exercise Objectives
At the end of the lab, you should be able to:
• Use the filemon utility
• Understand I/O wait
• Locate and fix I/O bottlenecks
• Work with synchronous writes
• Display and correct fragmentation
References
More information about the commands in this exercise is
available from the IBM Systems Information Center:
http://publib.boulder.ibm.com/infocenter/systems/index.jsp
__ 2. Run the lsdev -Cc disk command to display information about the disk devices
configured in your class lab system.
» # lsdev -Cc disk
» The output should be similar to the following:
hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available Virtual SCSI Disk Drive
hdisk2 Available Virtual SCSI Disk Drive
» Your lab system is configured with virtual SCSI disks provided by a Virtual I/O
Server partition.
__ c. The number of writes to the wseq.out file should be 30000. If it is less than
30000, what could be the problem?
2801712 events were lost. Reported data may have inconsistencies or errors.
» Some events were probably lost and the number of writes to wseq.out was
probably less than 30000. The reason is likely that the trace buffer was not
large enough to capture all the events and missed some write activity to the
wseq.out file.
__ 8. Run the filemon command again, but this time specify the trace buffer size. The -T
flag sets the kernel's trace buffer size. The default size is 64000 bytes per CPU. Try
setting it to 10240000.
# filemon -O detailed,lf,lv,pv -T 10240000 -o fmon1.out
Note: The I/O activity will be written into the fmon1.out file.
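To see why a larger buffer helps, compare the default allocation with the requested one. This is a back-of-the-envelope sketch; the count of 8 logical CPUs matches the sar header shown later in this exercise and is an assumption here:

```shell
# Sketch: default filemon trace buffer versus the -T 10240000 request.
per_cpu=64000   # documented default, bytes per CPU
cpus=8          # assumed lcpu count for this lab system
echo "default total: $((per_cpu * cpus)) bytes"
echo "requested with -T: 10240000 bytes"
```

With only about 512 KB of default buffer, a busy trace can easily drop events, which is consistent with the "events were lost" message above.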
__ 10. Stop recording the trace events by issuing the trcstop command.
» # trcstop
» The output should be similar to the following:
# [filemon: Reporting started]
[filemon: Reporting completed]
__ 11. Look at the fmon1.out file that you have just generated.
__ a. Were any events lost? _____________
» Depending on the activity of your system between the time you started
filemon and stopped it with the trcstop command, the number of writes to
wseq.out should be 30000 (or at least more than it was the first time). Ideally,
there should not be any missing events.
__ 12. Now, examine the pre-generated PG-fmon.out file, and answer the following
questions.
__ a. Which is the most active file? ___________________________
» wseq.out
__ d. How sequential were the logical volume writes for the most active logical
volume?
» In the Detailed Logical Volume Stats report, look for the number of write
sequences compared to the number of writes.
In this example, there were 7 write sequences for 118 writes.
__ e. How sequential were the physical volume writes for the most active physical
volume?
» In the Detailed Physical Volume Stats report, look for the number of write
sequences compared to the number of writes.
In this example, there were 55 write sequences for 169 writes.
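One way to compare the two answers (a sketch, using the counts quoted above) is to compute the average run length, writes per write sequence; a higher number means more sequential I/O:

```shell
# Average writes per write sequence; a fully sequential load would
# approach one long sequence (ratio = total writes).
awk 'BEGIN {
    printf "LV: %.1f writes/sequence\n", 118 / 7    # logical volume stats
    printf "PV: %.1f writes/sequence\n", 169 / 55   # physical volume stats
}'
```

The logical volume writes look considerably more sequential than the physical volume writes, which may simply reflect other activity sharing the same disk.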
__ f. How many writes have taken place to the wseq.out file? _______________
» In the Detailed File Stats report, look for the number of writes.
In this example, there were 30000 writes.
__ g. What is the average time of the writes to the wseq.out file? _____________
» In the Detailed File Stats report, look for the write times (msec).
In this example, the average of each write is 0.006 milliseconds.
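A quick cross-check of the same report (a sketch): the total time spent in those writes follows from the count and the per-write average:

```shell
# 30000 writes at an average of 0.006 ms each.
awk 'BEGIN { printf "%.0f ms total (%.2f s)\n", 30000 * 0.006, 30000 * 0.006 / 1000 }'
```

Such a low per-write time likely indicates the writes were absorbed by the file cache rather than waiting on the disk.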
» # more PG-fmon.out
» The output should be the following:
Mon Mar 2 15:31:44 2009
System: AIX 6.1 Node: woolf221 Machine: 00066BD2D900
------------------------------------------------------------------------
Detailed File Stats
------------------------------------------------------------------------
FILE: /var/adm/SRC/active_list.tmp
opens: 1
total bytes xfrd: 3597
writes: 1 (0 errs)
write sizes (bytes): avg 3597.0 min 3597 max 3597 sdev 0.0
write times (msec): avg 0.024 min 0.024 max 0.024 sdev 0.000
------------------------------------------------------------------------
Detailed Logical Volume Stats (512 byte blocks)
------------------------------------------------------------------------
writes: 33 (0 errs)
write sizes (blks): avg 8.0 min 8 max 8 sdev 0.0
write times (msec): avg 4.318 min 1.849 max 8.239 sdev 1.566
write sequences: 26
write seq. lengths: avg 10.2 min 8 max 16 sdev 3.5
seeks: 26 (78.8%)
seek dist (blks): init 146272,
avg 8.0 min 8 max 8 sdev 0.0
time to next req(msec): avg 2018.539 min 2.628 max 19667.146 sdev 4882.094
throughput: 2.0 KB/sec
utilization: 0.00
------------------------------------------------------------------------
Detailed Physical Volume Stats (512 byte blocks)
------------------------------------------------------------------------
Steps
__ 13. Open two telnet windows, login as root, and change your directory to /u/QV431/ex5.
__ 14. Run the vmstat -I 3 command to display the I/O oriented view and leave it running
throughout this part of the exercise (Part 2 - Understanding I/O Wait).
There should not be any programs running from previous exercises, so there should
be almost no CPU activity. Take a look at the percentage of CPU idle (the id
column should be near 100%).
» # vmstat -I 3
» The output should be similar to the following:
System configuration: lcpu=1 mem=1024MB ent=0.40
Note: Your class lab system is a logical partition which uses a subset of a
computer's hardware resource. In a LPAR environment, the vmstat command
reports the number of physical processors consumed (pc), and the percentage of
entitlement consumed (ec).
While vmstat is running, continue to the next step.
__ 15. In the second telnet window, run the I/O intensive script named iowait in
background, as follows:
# ./iowait &
» The output should be similar to the following:
[1] 626712
__ 16. While iowait is running, watch the vmstat -I 3 output and record the following
information from the vmstat output. Try to pick a number that is the average over the
reported intervals.
Note: You may need to wait a few minutes until you see some I/O wait (wa) activity.
What is your interpretation of the results? Is the system CPU or I/O bound?
» The CPUs were busy 68% of the time (us + sy), which indicates the system
is not CPU bound. We see that 14% is in I/O wait and 18% is idle. Does this
mean the system is overloaded? The answer is no, but if the I/O completed
faster, overall performance would be better.
__ 17. Make sure the vmstat -I 3 command and the iowait script are still running.
__ 18. Run the CPU intensive program named cpuprog in background, as follows:
# ./cpuprog &
__ 19. Record the following information from the vmstat output. Try to pick a number that is
the average over the reported intervals:
Statistics % Average
User (us) 17
System (sy) 83
Idle (id) 0
I/O wait (wa) 0
» The table above has been filled in with examples from a sample system.
What is your interpretation of the results? Is the system CPU or I/O bound?
1 1 0 214659 5362 8142 7949 0 0 10937 22045 448 899409 11361 17 83 0 0 1.00 249.8
1 1 0 214659 4337 7758 7597 0 0 12597 27727 443 897097 13043 17 83 0 0 1.00 249.9
1 1 0 214659 4323 8014 7795 0 0 13509 27083 465 893665 14024 17 83 0 0 1.00 249.9
1 1 0 214659 4397 8100 7974 0 0 14048 24688 417 893099 14412 17 83 0 0 1.00 249.9
1 1 0 214659 4271 7938 7846 0 0 13966 24598 442 891612 14460 17 83 0 0 1.00 249.9
1 1 0 214659 4210 7811 7636 0 0 13818 28161 428 896258 14242 17 83 0 0 1.00 250.2
» When the iowait script is the only job running, the I/O wait value is around
14% and idle is around 18%. As soon as the cpuprog program starts, the I/O
wait and idle time drops to 0%. Why? Because the system has something to
do (running the CPU intensive cpuprog program) while the I/Os are waiting to
complete.
» The user time has gone up from 2% to 17% and the system time has gone up
from 66% to 83%. Why has the system time increased too? Because the
number of system calls has increased.
» Is the system overloaded? The answer is probably yes - the system is now
CPU bound.
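Rather than eyeballing the averages, you can let awk compute them. This is a sketch: the sample lines are copied from the vmstat output above, and us/sy are fields 15 and 16 in this vmstat -I column layout:

```shell
# Average the us and sy columns over a saved vmstat -I sample.
cat > /tmp/vmstat.sample <<'EOF'
1 1 0 214659 5362 8142 7949 0 0 10937 22045 448 899409 11361 17 83 0 0 1.00 249.8
1 1 0 214659 4337 7758 7597 0 0 12597 27727 443 897097 13043 17 83 0 0 1.00 249.9
1 1 0 214659 4323 8014 7795 0 0 13509 27083 465 893665 14024 17 83 0 0 1.00 249.9
EOF
awk '{ us += $15; sy += $16; n++ }
     END { printf "avg us=%d%% sy=%d%% busy=%d%%\n", us/n, sy/n, (us+sy)/n }' /tmp/vmstat.sample
```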
__ 20. Kill the iowait script and the cpuprog program that are running in background.
» # kill %1 ; kill %2
» The output should be similar to the following:
[2] + Terminated ./cpuprog &
[1] + Terminated ./iowait &
__ 22. If you are not already logged in, login to your assigned system as the root user and
change your directory to /u/QV431/ex5.
__ 23. A customer reports that the average application server response time has not
improved after a processor upgrade. Based on this description, we suspect some
kind of I/O problem.
The following table shows the current application I/O environment:
__ 24. Examine the pre-generated iostat and sar reports (PG-iostat.before and
PG-sar.before files, respectively) and answer the following questions.
__ a. What does the iostat report indicate?
» The iostat report indicates an I/O bottleneck on hdisk0 (the % tm_act
column is at 100%).
» # more PG-iostat.before
System configuration: lcpu=2 drives=7 paths=6 vdisks=0
» # more PG-sar.before
AIX woolf221 1 6 00066BD2D900 02/20/09
__ 25. Examine the pre-generated lvmstat and filemon reports (PG-lvmstat.before and
PG-filemon.before files, respectively) and answer the following questions.
» # more PG-lvmstat.before
Logical Volume iocnt Kb_read Kb_wrtn Kbps
lv2 19663 0 314596 2.83
lv1 4326 0 69204 0.62
loglv00 1007 0 4028 0.04
hd8 12 0 48 0.00
hd9var 2 0 8 0.00
hd11admin 0 0 0 0.00
hd10opt 0 0 0 0.00
hd1 0 0 0 0.00
hd3 0 0 0 0.00
hd2 0 0 0 0.00
hd4 0 0 0 0.00
paging00 0 0 0 0.00
hd6 0 0 0 0.00
hd5 0 0 0 0.00
» # more PG-filemon.before
Fri Feb 20 16:06:09 2009
System: AIX 6.1 Node: woolf221 Machine: 00066BD2D900
__ a. What did you find on the lvmstat report regarding the I/O operations on logical
volumes?
» The majority of the I/O operations are on the lv1 and lv2 logical volumes
(iocnt column).
__ b. What did you find on the filemon report regarding the I/O operations on logical
volumes?
» The majority of the I/O operations are on the lv1 and lv2 logical volumes
(#wblk column).
__ 26. Following the suggestion presented by the support team, the customer moved one
of the busiest logical volumes to another physical volume as described in the
following table:
Examine the new iostat, sar, and lvmstat reports. The PG-iostat.after,
PG-sar.after, and PG-lvmstat.after files contain these new reports.
Answer the following questions.
__ a. Did the situation improve?
» Yes
» As seen in the iostat, sar, and lvmstat reports, the I/O transactions are
now split across two physical volumes, hdisk0 and hdisk1.
» The throughput of both logical volumes is greater now than before (Kbps
column in the lvmstat report).
» # more PG-iostat.after
System configuration: lcpu=2 drives=7 paths=6 vdisks=0
» # more PG-sar.after
AIX woolf221 1 6 00066BD2D900 02/20/09
__ 27. If you are not already logged in, login to your assigned system as the root user and
change your directory to /u/QV431/ex5.
__ 28. A customer reports that the disk I/O throughput of the syncwrite program is not
satisfactory. The syncwrite program creates a sequential file using synchronous
writes.
This is the execution time of the syncwrite program:
Time
real 0m46.74s
user 0m0.01s
sys 0m0.16s
__ 29. Following the suggestion presented by the support team, the customer moved the
JFS2 log device to another physical disk, as described in the following table:
The situation has improved. Now, the syncwrite program runs much faster than
before, as you can see below:
Time
real 0m23.80s
user 0m0.01s
sys 0m0.15s
Examine the new filemon report stored in the PG-sync-filemon.after file and
answer the following questions.
» # more PG-sync-filemon.after
Sat Feb 21 12:17:36 2009
System: AIX 6.1 Node: woolf221 Machine: 00066BD2D900
» The I/O transactions are now split across two physical volumes, hdisk0
(syncfile) and hdisk1 (JFS2 log)
__ 30. If you are not already logged in, login to your assigned system as the root user and
change your directory to /u/QV431/ex5.
__ 31. Create a JFS file system named /fragfs on rootvg, then mount it. Use the following
commands:
# crfs -v jfs -g rootvg -a size=5M -a frag=512 -a nbpi=512 -m /fragfs
» The output should be similar to the following:
Based on the parameters chosen, the new /fragfs JFS file system
is limited to a maximum size of 16777216 (512 byte blocks)
# mount /fragfs
» If the command is successful, there will not be any messages or output.
__ 32. We have provided the mkfrag script which creates two types of fragmentation in the
/fragfs file system:
• File system fragmentation due to unused fragments smaller than a 4 KB block
• File fragmentation with poor sequentiality
The mkfrag script achieves this by creating files and then removing some of them. It
repeatedly issues a sync command to avoid the consolidation of written blocks that
often occurs with normal asynchronous I/O processing. You may (optionally) wish to
examine the script to see exactly what it does.
Run the script.
# ./mkfrag
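If you would like to see the idea behind mkfrag without reading the lab script, the following portable sketch creates files and then removes every other one to punch holes in the allocation. All file names and sizes here are hypothetical, not the ones mkfrag uses:

```shell
# Create six small files, forcing each one out with sync so the
# allocations are not coalesced, then delete every other file to
# leave gaps (free fragments) between the survivors.
mkdir -p /tmp/fragdemo
for i in 1 2 3 4 5 6; do
    dd if=/dev/zero of=/tmp/fragdemo/f$i bs=1024 count=4 2>/dev/null
    sync
done
rm /tmp/fragdemo/f2 /tmp/fragdemo/f4 /tmp/fragdemo/f6
ls /tmp/fragdemo   # f1, f3, and f5 remain, with holes between them
```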
__ 33. Check the fragmentation of your new file system using defragfs (query only) to see
the statistics before running defragfs.
# defragfs -q /fragfs
Question                                                    Answer
What is the number of free spaces shorter than a block?     100
What is the number of free fragments in short free spaces?  318
» The table above has been filled in with examples from a sample system.
__ 34. Use the fileplace command to examine /fragfs/file1 to determine the extent of
logical fragmentation.
# fileplace -v /fragfs/file1
» The output should be similar to the following:
File: /fragfs/file1 Size: 286720 bytes Vol: /dev/fraglv
Blk Size: 4096 Frag Size: 512 Nfrags: 560 Compress: no
Inode: 18 Mode: -rw-r--r-- Owner: root Group: sys
Logical Fragment
----------------
0004568-0004631 64 frags 32768 Bytes, 11.4%
0004640-0004655 16 frags 8192 Bytes, 2.9%
0006189-0006204 16 frags 8192 Bytes, 2.9%
0006208-0006271 64 frags 32768 Bytes, 11.4%
0007744-0007823 80 frags 40960 Bytes, 14.3%
0009253-0009268 16 frags 8192 Bytes, 2.9%
0009280-0009343 64 frags 32768 Bytes, 11.4%
0010784-0010863 80 frags 40960 Bytes, 14.3%
0012299-0012346 48 frags 24576 Bytes, 8.6%
0012352-0012383 32 frags 16384 Bytes, 5.7%
0013765-0013796 32 frags 16384 Bytes, 5.7%
0013824-0013871 48 frags 24576 Bytes, 8.6%
Question                         Answer
What is the space efficiency%?   6%
What is the sequentiality%?      98%
» The table above has been filled in with examples from a sample system.
__ 35. Defragment the file system to recover any small blocks using the following
command:
# defragfs /fragfs
» The output should be similar to the following:
Statistics before running defragfs:
Number of free fragments : 191649
Number of allocated fragments : 70495
Number of free spaces shorter than a block : 100
Number of free fragments in short free spaces : 318
Other statistics:
Number of fragments moved : 724
Number of logical blocks moved : 134
Number of allocation attempts : 125
Number of exact matches : 6
Examine the fragmentation statistics after the defragmentation, and report the
following:
Question                                                    Before defragmentation   After defragmentation
What is the number of free spaces shorter than a block?     100                      11
What is the number of free fragments in short free spaces?  318                      36
» The table above has been filled in with examples from a sample system.
__ 36. Was the defragfs effective in reducing the file system fragmentation?
» Yes, the defragfs command reduced the number of free spaces shorter than
a block and the number of free fragments in short free spaces.
__ 37. If you are not already logged in, login to your assigned system as the root user and
change your directory to /u/QV431/ex5.
__ 38. Backup the /fragfs file system by name to /tmp/fragfs.bkp. Use the following
command:
# find /fragfs | backup -iqf /tmp/fragfs.bkp
» If the command is successful, there will not be any messages or output.
__ 39. Remove all the files from the /fragfs file system.
» # cd /fragfs
» If the command is successful, there will not be any messages or output.
» # rm -rf ./*
» If the command is successful, there will not be any messages or output.
__ 40. Use the following command to restore all the files from the /fragfs backup:
# restore -xvqf /tmp/fragfs.bkp
» The output should be similar to the following:
x 2560 /fragfs/small109
x 2560 /fragfs/small110
The total size is 2288640 bytes.
The number of restored files is 125.
Logical Fragment
----------------
unallocated 8 frags 4096 Bytes, 0.0%
0004552-0004607 56 frags 28672 Bytes, 11.1%
. . . < some output deleted >. . .
» The table above has been filled in with examples from a sample system.
__ 42. If you are not already logged in, login to your assigned system as the root user and
change your directory to /u/QV431/ex5.
__ 43. Run the lslv command to look at the physical partition allocation of the hd2 logical
volume on the hdisk0 physical volume.
# lslv -p hdisk0 hd2
» The output should be similar to the following:
hdisk0:hd2:/usr
USED FREE FREE FREE FREE FREE FREE FREE FREE FREE 1-10
FREE FREE FREE FREE FREE FREE FREE FREE FREE FREE 11-20
FREE FREE FREE FREE FREE FREE FREE FREE FREE FREE 21-30
FREE FREE FREE FREE FREE FREE FREE FREE FREE FREE 31-40
FREE FREE FREE FREE FREE FREE FREE FREE FREE FREE 41-50
FREE FREE FREE FREE FREE 51-55
USED USED USED FREE FREE FREE FREE FREE FREE FREE 56-65
FREE FREE FREE FREE FREE FREE FREE FREE FREE FREE 66-75
FREE FREE FREE FREE FREE FREE FREE FREE FREE FREE 76-85
FREE FREE FREE FREE FREE FREE FREE FREE FREE FREE 86-95
FREE FREE FREE FREE FREE FREE FREE FREE FREE FREE 96-105
FREE FREE FREE FREE 106-109
USED USED USED USED 0001 0002 0003 0004 0005 0006 110-119
0007 0008 0009 0010 0011 0012 0013 USED USED USED 120-129
USED USED FREE FREE FREE FREE FREE FREE FREE FREE 130-139
FREE FREE FREE FREE FREE FREE FREE FREE FREE FREE 140-149
FREE FREE FREE FREE FREE FREE FREE FREE FREE FREE 150-159
FREE FREE FREE FREE 160-163
Note: The FREE blocks indicate physical partitions that are currently not being
used. The USED blocks indicate physical partitions that are in use by another
logical volume. The blocks with a number indicate the logical partition number of
the logical volume specified with the lslv -p command. A non-fragmented logical
volume would have all of its partitions contiguous, rather than having partitions
from other logical volumes interspersed between them.
__ 45. If you see a logical volume that is physically fragmented, you can try to group its
partitions together using the reorgvg command. Likewise, if you have changed the
allocation characteristics of one or more logical volumes, you can use the reorgvg
command to reorganize them according to the new intra-physical volume allocation
policies.
# reorgvg [VGname] [LVname]
» The output should be similar to the following:
# reorgvg rootvg hd2
0516-962 reorgvg: Logical volume hd2 migrated.
END OF LAB