Course Exercises Guide
with hints
AIX Internals & Performance IV: I/O
Management - Part 2 (Specialized I/O)
Course code AHQV474 ERC 1.1
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in many
jurisdictions worldwide:
AIX 6™ AIX® DB2®
DS8000® Express® FlashSystem™
GPFS™ IBM FlashSystem® Power Systems™
Power® PowerHA® PowerSC™
PowerVM® POWER6® POWER7+™
POWER7® POWER8® PurePower System™
SystemMirror® Systems Director VMControl™
Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United
States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of
Oracle and/or its affiliates.
VMware and the VMware "boxes" logo and design, Virtual SMP and VMotion are registered
trademarks or trademarks (the "Marks") of VMware, Inc. in the United States and/or other
jurisdictions.
Other product and service names might be trademarks of IBM or other companies.
Contents
  Trademarks
Trademarks
The reader should recognize that the following terms, which appear in the content of this
training document, are official trademarks of IBM or other companies:
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in
many jurisdictions worldwide:
Active Memory™ AIX 6™ AIX®
BladeCenter® DS4000® DS6000™
DS8000® Enterprise Storage Server® POWER Hypervisor™
Power Systems™ Power® PowerVM®
POWER6® POWER7® Storwize®
System Storage®
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Other product and service names might be trademarks of IBM or other companies.
Text highlighting
The following text highlighting conventions are used throughout this book:
Bold Identifies file names, file paths, directories, user names,
principals, menu paths and menu selections. Also identifies
graphical objects such as buttons, labels and icons that the
user selects.
Italics Identifies links to web sites and publication titles, marks a
word or phrase that is meant to stand out from the surrounding text,
and identifies parameters whose actual names or values are to
be supplied by the user.
Monospace Identifies attributes, variables, file listings, SMIT menus, code
examples and command output that you would see displayed
on a terminal, and messages from the system.
Monospace bold Identifies commands, subroutines, daemons, and text the user
would type.
Requirements
In the normal lab environment for this class, each lab team will be
assigned a logical partition (LPAR) on a managed system. The
assigned logical partition should be running AIX 7.1 and should
normally be on a POWER6 or POWER7 processor-based system.
You will not be sitting directly in front of your lab system. Instead, you
will be using your personal PC to connect to your lab system.
Preface
This exercise includes information for you to read, and exercise steps for you to
perform. The following examples illustrate the numbered checklist format used to
identify exercise steps:
__ 1. (This is example step one.) Login to ...
__ 2. (This is example step two.) Execute the following ...
Two versions of these instructions are available: one with hints and one without. You
can use either version to complete this exercise (or flip back and forth between the two
versions). In other words, use these two versions of the exercise in whatever way best
aids your learning. Also, please do not hesitate to ask the instructor if you have
questions.
In some cases, the answer given in a hint may be just an example, and there may be
other correct answers.
All hints are marked by a » sign.
%user  %sys  %wait  %idle  physc  %entc  lbusy        vcsw    phint  %hypv  hcalls  %nsp
-----  ----  -----  -----  -----  -----  -----  ----------  -------  -----  ------  ----
  0.0   0.2    0.0   99.7   0.00    0.5    0.1  1396566823  6674419    0.0     817    99
__ 7. Use the output you just obtained to answer the following questions:
• How many hypervisor calls have been made since your partition was booted?
• What percentage of processing time has been spent in hypervisor mode since
your system was booted?
» The sample output indicates that 817 hypervisor calls were made from the time the
partition was last booted until the time the output was generated. During that time,
0.0% (rounded to the nearest 0.1%) of the processing time was spent in hypervisor
mode. (The values obtained will differ from system to system. Check the output you
obtain to determine the values for your system.)
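The lookup described in this hint can also be scripted. The following is a minimal sketch in Python, assuming the header row and the value row have been copied from the sample output above; the function name parse_sample_row is hypothetical and is not part of any AIX tool.

```python
def parse_sample_row(header: str, values: str) -> dict:
    """Pair each column name with the value printed beneath it."""
    names = header.split()
    fields = values.split()
    if len(names) != len(fields):
        raise ValueError("header and value rows have different column counts")
    return dict(zip(names, fields))


# Rows copied from the sample output in this exercise.
header = "%user %sys %wait %idle physc %entc lbusy vcsw phint %hypv hcalls %nsp"
values = "0.0 0.2 0.0 99.7 0.00 0.5 0.1 1396566823 6674419 0.0 817 99"

sample = parse_sample_row(header, values)
print("hypervisor calls:", int(sample["hcalls"]))
print("% time in hypervisor mode:", float(sample["%hypv"]))
```

Pairing names with values by position avoids counting columns by eye, which is error-prone when the columns are narrow and closely spaced.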
__ 8. When the lparstat command is invoked with the -H flag (and no other flags or
other parameters), the resulting output will show detailed information regarding
hypervisor calls since the last time the partition was booted. The following
information will be displayed for each of the hypervisor calls:
• Number of calls: The number of hypervisor calls of this type made
• %Total Time Spent: Percentage of total time spent in this type of call
• %Hypervisor Time Spent: Percentage of hypervisor time spent in this type of call
• Average Call Time: Average call time for this type of call in nanoseconds
• Maximum Call Time: Maximum call time for this type of call in nanoseconds
Enter the lparstat command with the -H flag (and no other parameters) to obtain
detailed hypervisor call information for your system.
__ b. What is the file descriptor (fd) value (integer) returned by the open operation to
the trcprog application program that corresponds to the /unix file? __________
» The expected answer is that the file descriptor number is 3.
» The file descriptor can be found just before the hook id 104, as shown in the
example below:
. . . < some output deleted >. . .
15B trcprog 5832820 kopen 0.000947 open fd=3
104 trcprog 5832820 kopen 0.000947 return from kopen [26 usec]
. . . < some output deleted >. . .
» Another way to find the fd value returned by the open operation is to use the
grep command, as shown below:
# grep kopen iotracereport | grep "fd="
15B trcprog 5832820 kopen 0.000947 open fd=3
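The same search can be sketched in Python. This is only an illustration, assuming each trace event occupies a single line of the report; the function name open_fds is hypothetical, and the two sample lines are taken from the example trace report in this step.

```python
import re

# Two events from the example trace report, one event per line.
TRACE_LINES = [
    "15B trcprog 5832820 kopen 0.000947 open fd=3",
    "104 trcprog 5832820 kopen 0.000947 return from kopen [26 usec]",
]


def open_fds(lines, process):
    """Return the fd numbers reported by hook 15B events for one process."""
    fds = []
    for line in lines:
        parts = line.split()
        # Column 1 is the hook ID, column 2 is the process name.
        if len(parts) < 2 or parts[0] != "15B" or parts[1] != process:
            continue
        match = re.search(r"open fd=(\d+)", line)
        if match:
            fds.append(int(match.group(1)))
    return fds


print(open_fds(TRACE_LINES, "trcprog"))
```

Filtering on the hook ID and the process name first keeps events from other processes in the same report out of the result.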
__ c. How many times did the trcprog application read data from the /unix file?
__________________
Remember the format of the read subroutine is as follows:
read (FileDescriptor, Buffer, NumberBytes)
» As you can see in the following section of an example trace report, the trcprog
application only issues one read operation on the /unix file.
. . . < some output deleted >. . .
101 trcprog 5832820 kread 0.000948 kread LR = D012326C
163 trcprog 5832820 kread 0.000948 read(3,000000002FF21CBC,1000)
. . . < some output deleted >. . .
104 trcprog 5832820 kread 0.000953 return from kread [5 usec]
. . . < some output deleted >. . .
» Another way to see how many times the file has been read is to use the grep
command, as shown below:
# grep "return from kread" iotracereport
104 trcprog 5832820 kread 0.000953 return from kread [5 usec]
__ d. Did the trcprog program create any files? ______________
» Yes, the trcprog program created a file called file_just_created, as shown
below:
. . . < some output deleted >. . .
101 trcprog 5832820 creat 0.000954 creat LR = 1000040C
107 trcprog 5832820 creat 0.000955 lookuppn: file_just_created
. . . < some output deleted >. . .
15B trcprog 5832820 creat 0.008626 open fd=4 _FREAD _FCREAT _FTRUNC mode=----w----
104 trcprog 5832820 creat 0.008627 return from creat [7673 usec]
. . . < some output deleted >. . .
» The 15B event indicates that file descriptor 4 is used to reference the newly
created file.
» Another way to see if the program created a file is to use the grep command, as
shown below:
# grep "^101" iotracereport | grep creat
101 trcprog 5832820 creat 0.000954 creat LR = 1000040C
__ e. How many times was a write operation performed to the file_just_created file?
________
How many bytes were written each time? _________
Remember the format of the write subroutine:
write (FileDescriptor, Buffer, NumberBytes)
» As you can see in the following sections of the trace, the file_just_created file
has been written to four times with 0x400 (1024) bytes each time.
. . . < some output deleted >. . .
101 trcprog 5832820 kwrite 0.008628 kwrite LR = D0120104
19C trcprog 5832820 kwrite 0.008628 write(4,000000002FF21CBC,400)
19C trcprog 5832820 kwrite 0.008629 vnop_rdwr_write(vp = F1000A06019F0420, offset = 0000000000000000, length = 0400, flags = 0002, ...) = ...
. . . < some output deleted >. . .
104 trcprog 5832820 kwrite 0.008661 return from kwrite [33 usec]
101 trcprog 5832820 kwrite 0.008662 kwrite LR = D0120104
19C trcprog 5832820 kwrite 0.008662 write(4,000000002FF220BC,400)
19C trcprog 5832820 kwrite 0.008663 vnop_rdwr_write(vp = F1000A06019F0420, offset = 0000000000000400, length = 0400, flags = 0002, ...) = ...
. . . < some output deleted >. . .
104 trcprog 5832820 kwrite 0.008667 return from kwrite [5 usec]
101 trcprog 5832820 kwrite 0.008668 kwrite LR = D0120104
19C trcprog 5832820 kwrite 0.008668 write(4,000000002FF224BC,400)
19C trcprog 5832820 kwrite 0.008669 vnop_rdwr_write(vp = F1000A06019F0420, offset = 0000000000000800, length = 0400, flags = 0002, ...) = ...
. . . < some output deleted >. . .
104 trcprog 5832820 kwrite 0.008673 return from kwrite [5 usec]
101 trcprog 5832820 kwrite 0.008673 kwrite LR = D0120104
19C trcprog 5832820 kwrite 0.008673 write(4,000000002FF228BC,400)
19C trcprog 5832820 kwrite 0.008674 vnop_rdwr_write(vp = F1000A06019F0420, offset = 0000000000000C00, length = 0400, flags = 0002, ...) = ...
. . . < some output deleted >. . .
104 trcprog 5832820 kwrite 0.008678 return from kwrite [5 usec]
. . . < some output deleted >. . .
» Another way to see how many times the file has been written is to use the grep
command, as shown below:
# grep 'write(4' iotracereport
19C trcprog 5832820 kwrite 0.008628 write(4,000000002FF21CBC,400)
19C trcprog 5832820 kwrite 0.008662 write(4,000000002FF220BC,400)
19C trcprog 5832820 kwrite 0.008668 write(4,000000002FF224BC,400)
19C trcprog 5832820 kwrite 0.008673 write(4,000000002FF228BC,400)
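Counting the calls and converting the hexadecimal sizes can be done in one pass. The sketch below is an illustration in Python, assuming one event per line and assuming the third argument of write() is printed in hexadecimal, as the hint states (0x400 = 1024). The function name write_sizes is hypothetical.

```python
import re

# The four write events found by the grep command in this step.
GREP_OUTPUT = """\
19C trcprog 5832820 kwrite 0.008628 write(4,000000002FF21CBC,400)
19C trcprog 5832820 kwrite 0.008662 write(4,000000002FF220BC,400)
19C trcprog 5832820 kwrite 0.008668 write(4,000000002FF224BC,400)
19C trcprog 5832820 kwrite 0.008673 write(4,000000002FF228BC,400)
"""

# write(FileDescriptor, Buffer, NumberBytes) as it appears in the report.
WRITE_CALL = re.compile(r"write\(([0-9A-Fa-f]+),([0-9A-Fa-f]+),([0-9A-Fa-f]+)\)")


def write_sizes(report_text, fd_text):
    """Return the byte count (decimal) of every write() on the given fd."""
    sizes = []
    for match in WRITE_CALL.finditer(report_text):
        if match.group(1) == fd_text:
            # NumberBytes is shown in hexadecimal in the trace report.
            sizes.append(int(match.group(3), 16))
    return sizes


sizes = write_sizes(GREP_OUTPUT, "4")
print(len(sizes), "writes of", sizes[0], "bytes each,", sum(sizes), "bytes in total")
```

The pattern requires the file descriptor immediately after the opening parenthesis, so vnop_rdwr_write(vp = ...) events are not counted as write calls.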
__ 14. Examine the following extract from a trace report:
101 trcprog 5832820 kwrite 0.008628 kwrite LR = D0120104
19C trcprog 5832820 kwrite 0.008628 write(4,000000002FF21CBC,400)
19C trcprog 5832820 kwrite 0.008629 vnop_rdwr_write(vp = F1000A06019F0420, offset = 0000000000000000, length = 0400, flags = 0002, ...) = ...
59B trcprog 5832820 kwrite 0.008631 JFS2 IO write: vp = F1000A06019F0420, sid = 8403D0, offset = 0000000000000000, length = 0400
4C3 trcprog 5832820 kwrite 0.008646 VMM WRITE: sid=8403D0 src=FFFFF1000FF21CBC dest=FFFFF00000000000 bytes=0400 basecopy_flags=0008
19C trcprog 5832820 kwrite 0.008661 vnop_rdwr_write(vp = F1000A06019F0420, ext = 0000, ...) = 0000, 0400 bytes moved
104 trcprog 5832820 kwrite 0.008661 return from kwrite [33 usec]
In the table below, indicate which kernel I/O layer each event ID belongs to.
Hook Id   System Call Interface Layer   Virtual File System Layer   File System Layer   VMM Layer
101
19C
59B
4C3
104
Hook Id   Thread Dispatching   Event Management   FLIH and Clock   Hypervisor Call Interface
106
200
460
462
4B0
10C
492
234
100
__ 16. Let your instructor know that you have completed the exercise.
End of exercise
© Copyright IBM Corp. 2013 Exercise 2. Possible Disk I/O Configurations 2-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Exercises with hints
listed in the output of the lsdev -Cc disk command you entered in a previous
lab step.
How many of the physical volumes listed currently belong to a volume group?
_________
» In the normal lab environment for this class, only one of the disks (hdisk0) listed
will belong to a volume group (rootvg). The other two disks will be placed in a
volume group in a subsequent exercise.
__ 9. If you want to obtain additional information about a kernel extension listed by the
genkex command, use the following sequence:
__ a. Obtain the fileset name that owns the kernel extension aio.ext by using the
lslpp command with the -w flag.
_______________________________
» The required command and example output are shown below:
# lslpp -w /usr/lib/drivers/aio.ext
File                        Fileset        Type
----------------------------------------------------------
/usr/lib/drivers/aio.ext    bos.rte.aio    File
__ b. Display the name and the fileset description by using the lslpp command with
the -L flag.
_______________________________
» The required command and example output are shown below:
# lslpp -L | grep aio
bos.rte.aio 7.1.0.1 C F Asynchronous I/O Extension
__ 10. The kdb subcommand lke can also be used to list the loaded kernel extensions.
When invoked with no arguments, the lke subcommand shows a one line summary
of each loader_entry structure in the kernel load list. Enter the lke subcommand at
the kdb prompt and examine the output of this subcommand. Remember that kdb
has a built-in pager; press the <Enter> key to obtain a new page of output.
Note: When kdb starts, it provides an initial display that includes address
information for some key symbols and then provides a prompt. The initial kdb
prompt on your lab system should be (0)>. Various kdb subcommands can be
entered at the kdb prompt.
Note: kdb allows the output of subcommands to be redirected with the operators "|", ">"
and ">>". The "|" operator pipes the output of the command before the operator to the
input of the command after the operator. The ">" operator writes the output of the
command preceding the operator to the file name following the operator; any
existing file is overwritten. The ">>" operator appends the output of the command
preceding the operator to the file name following the operator. This means the
output from kdb subcommands can be piped to grep to search for a specified pattern.
» The command and an example of output are shown below:
(0)> lke | grep pcm
57 F1000000A063AC00 059D0000 00030000 02080242 /usr/lib/drivers/aixdiskpcmke
(0)>
At which kernel address has the aixdiskpcmke kernel extension been loaded?
______________________________________
» The 3rd output value is the kernel address at which the extension has been
loaded. So in this example, 0x59D0000 is the address at which the aixdiskpcmke
has been loaded.
» On POWER6 and newer hardware, on AIX 6 and above, kernel extensions that
are explicitly marked as being storage key safe are loaded into segment 0. lke
will display these addresses as 8 digit hex, with the first digit being a 0. Kernel
extensions that are not explicitly marked as being key safe are loaded into a
different kernel segment. The exact segment number depends on the version of
AIX being used, but generally it will be a very large segment number, such as
F10000000, which results in kdb showing a sixteen digit hex number for the
address.
» The 2nd output value is the address of the kernel loader entry structure that is
being described by the lke command.
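The field positions described in these hints can be illustrated with a short Python sketch. It assumes the lke summary has been rejoined onto a single line, names only the fields that the hints identify (2nd value: loader entry address, 3rd value: load address), and applies the display rule from the hint above (8 hex digits with a leading 0 indicates segment 0). The function name parse_lke_line is hypothetical.

```python
def parse_lke_line(line):
    """Split one lke summary line into the fields named in the hints."""
    fields = line.split()
    load_text = fields[2]
    return {
        "entry_number": fields[0],
        "loader_entry": int(fields[1], 16),   # 2nd value: loader entry address
        "load_address": int(load_text, 16),   # 3rd value: where the extension is loaded
        # Rule from the hint: 8 hex digits starting with 0 means segment 0.
        "shown_in_segment_0": len(load_text) == 8 and load_text.startswith("0"),
        "path": fields[-1],
    }


# Example line from the lke | grep pcm output in this step.
line = "57 F1000000A063AC00 059D0000 00030000 02080242 /usr/lib/drivers/aixdiskpcmke"
entry = parse_lke_line(line)
print(entry["path"], "loaded at", hex(entry["load_address"]))
print("displayed as a segment 0 address:", entry["shown_in_segment_0"])
```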
__ 17. Use the q subcommand of kdb to quit (exit) from kdb.
» The command is shown below:
(0)> q
Note: You can also enter e at the kdb prompt to terminate a kdb session and return
to the shell prompt. Recall that many kdb subcommands have aliases and that e is
an alias for q.
Section 4 - Obtaining information about storage families and the driver
that manages each family
The manage_disk_drivers command shows information about storage families and the
driver that manages each family. It is also used to change the driver for a storage family.
Each driver has its own characteristics, and a system may have additional drivers
installed besides the ones provided by the base AIX operating system.
The output of manage_disk_drivers has three columns:
• The first column shows the name of the storage system
• The second indicates the current MPIO driver in use
• The third indicates all supported MPIO drivers for the storage system (a comma
separated list)
__ 18. Make sure you are still logged in as root.
__ 19. Use the manage_disk_drivers -l command to list all the storage families and their
supported drivers.
» The command and an example of output are shown below:
# manage_disk_drivers -l
Device           Present Driver   Driver Options
2810XIV          AIX_AAPCM        AIX_AAPCM,AIX_non_MPIO
DS4100           AIX_APPCM        AIX_APPCM,AIX_fcparray
DS4200           AIX_APPCM        AIX_APPCM,AIX_fcparray
DS4300           AIX_APPCM        AIX_APPCM,AIX_fcparray
DS4500           AIX_APPCM        AIX_APPCM,AIX_fcparray
DS4700           AIX_APPCM        AIX_APPCM,AIX_fcparray
DS4800           AIX_APPCM        AIX_APPCM,AIX_fcparray
DS3950           AIX_APPCM        AIX_APPCM
DS5020           AIX_APPCM        AIX_APPCM
DS5100/DS5300    AIX_APPCM        AIX_APPCM
DS3500           AIX_APPCM        AIX_APPCM
__ 20. Are models of the IBM TotalStorage DS4000 Midrange Disk System supported with
the currently installed drivers?
_____________________________________
» The command and an example of output are shown below:
# manage_disk_drivers -l | grep DS4
DS4100           AIX_APPCM        AIX_APPCM,AIX_fcparray
DS4200           AIX_APPCM        AIX_APPCM,AIX_fcparray
DS4300           AIX_APPCM        AIX_APPCM,AIX_fcparray
DS4500           AIX_APPCM        AIX_APPCM,AIX_fcparray
DS4700           AIX_APPCM        AIX_APPCM,AIX_fcparray
DS4800           AIX_APPCM        AIX_APPCM,AIX_fcparray
» Yes, the default Path Control Module (PCM) supports the IBM TotalStorage
DS4000 Midrange Disk Systems (DS4100, DS4200, DS4300, DS4500, DS4700,
and DS4800).
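The three-column layout described at the start of this section lends itself to simple parsing. The following Python sketch is an illustration only: it works on a few rows copied from the example output, and the function name parse_families is hypothetical.

```python
# A few rows copied from the example manage_disk_drivers -l output.
SAMPLE = """\
Device Present Driver Driver Options
2810XIV AIX_AAPCM AIX_AAPCM,AIX_non_MPIO
DS4100 AIX_APPCM AIX_APPCM,AIX_fcparray
DS4800 AIX_APPCM AIX_APPCM,AIX_fcparray
DS3950 AIX_APPCM AIX_APPCM
"""


def parse_families(text):
    """Map each storage family to its present driver and driver options."""
    families = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three columns; the header row has five words.
        if len(fields) != 3:
            continue
        name, present, options = fields
        families[name] = {"present": present, "options": options.split(",")}
    return families


families = parse_families(SAMPLE)
ds4000_models = sorted(name for name in families if name.startswith("DS4"))
print(ds4000_models)
print(families["DS4100"]["options"])
```

Splitting the third column on commas makes it easy to check whether an alternative driver, such as AIX_fcparray in the example, is available for a given family.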
__ 21. Go to the Support Matrix for Subsystem Device Driver, Subsystem Device Driver
Path Control Module, and Subsystem Device Driver Device Specific Module page at
http://www.ibm.com/support/docview.wss?uid=ssg1S7001350. This page is the
entry point for the device drivers’ interoperability matrixes of the following modules
and storage subsystems:
- Subsystem Device Driver Path Control Module (SDDPCM)
- Subsystem Device Driver Device Specific Module (SDDDSM) for ESS
- DS8000
- DS6000
- DS5000
- DS4000
- DS5020
- DS3950
- SVC
- IBM Storwize V7000
- IBM BladeCenter S SAS RAID Controller Module (RSSM)
Click the Support Matrix for AIX SDD link and explore what information is
available.
__ 22. Let your instructor know that you have completed the exercise.
End of exercise
kopen system call. An example of the event sequence at the start of the kopen
call is shown below:
101 app1 9633975 0.011032 kopen LR = D0119BD8
3B7 app1 9633975 0.011033 SECURITY: refmon exit: rc=0 action=0000000000000015
52F app1 9633975 0.011034 SEC CRED: crref callfrom=0000000000630D98 callfrom2=0000000000631A70 pid=5111824 (app1)
107 app1 9633975 0.011034 lookuppn: /ex3fs/smallfile
» The information shown in the lookuppn (lookup path name) event with hook ID
107 shows the name of the file passed to the kopen call.
» The expected answer is that the app1 program opened a file called
/ex3fs/smallfile.
__ 16. Which file descriptor number was used to reference the file opened by the app1
program? _______________
» The file descriptor information is shown in an event with hook ID 15B just before
the 104 return from kopen event. An example is shown below:
4DF app1 9633975 0.011065 JFS2 iput: vp = F1000A0601CB2820, count = 0001, ino = 0002, nlink = 0003, getcaller = 33D0EC
15B app1 9633975 0.011065 open fd=3 _FWRITE
104 app1 9633975 0.011066 return from kopen [34 usec]
» The expected answer is that file descriptor number 3 is used to reference the file
opened by app1.
__ 17. How many kread system calls were made by the app1 program to read the contents
of the file that was opened? _______________
» We know that file descriptor number 3 is used for the file opened by app1. We
can count the number of read calls using the following command sequence:
# grep "read(3" app1.rpt | grep app1
163 app1 9633975 0.011067 read(3,0000000020000848,800)
» The expected answer is that one read call is issued by app1.
__ 18. What was the read request size used with the first kread system call? ___________
» From the event information obtained while answering the previous question, we
can observe the parameters passed to the read routine:
read(3,0000000020000848,800)
» The third parameter is the read request size. The expected answer is 800
(hexadecimal), which is 2048 bytes in decimal.
__ 19. For the first kread system call, was the data requested obtained from the file system
cache? ____________
» From the information displayed in the output of the previous grep command, we
can determine the timestamp value of the 163 event for the read call. We can
use this information to find the relevant area in the app1.rpt file.
» In the following example output, the read system call event has a timestamp
value of 0.011067.
# grep "read(3" app1.rpt | grep app1
163 app1 9633975 0.011067 read(3,0000000020000848,800)
» Using this information, we can use vi to examine the app1.rpt file, and then
search for the timestamp value. This will allow us to view the events for the read
system call. An example of the event sequence at the start of the kread call is
shown below:
101 app1 9633975 0.011067 kread LR = D012326C
163 app1 9633975 0.011067 read(3,0000000020000848,800)
163 app1 9633975 0.011067 vnop_rdwr_read(vp = F1000A0601CE2820, offset = 0000000000000000, length = 0800, flags = 0003, ...) = ...
59B app1 9633975 0.011068 JFS2 IO read: vp = F1000A0601CE2820, sid = 87845E, offset = 0000000000000000, length = 0017
100 app1 9633975 0.011070 DATA ACCESS PAGE FAULT iar=B674 cpuid=00
1B2 app1 9633975 0.011071 VMM pagefault: V.S=0000.87845E client_segment P_DEFAULT 4K large modlist req (type 0)
1B0 app1 9633975 0.011076 VMM page assign: V.S=0000.87845E ppage=23B73 client_segment P_DEFAULT 4K large modlist req (type 0)
» The expected answer is that the event sequence shows a data access page fault
immediately after the JFS2 IO read event. The read request is using offset 0,
reading from the start of the file. The page fault is on page 0 of the segment
being used for the file by the cache. This indicates that the data being requested
by the read call is not currently in the file system cache.
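The reasoning in this hint can be expressed as a simple check over the event sequence. The Python sketch below is a rough heuristic for illustration, not a general cache-miss detector: it assumes the events have already been reduced to (hook ID, text) pairs, and the function name read_missed_cache is hypothetical.

```python
# Events from the start of the kread call, reduced to (hook ID, text) pairs.
EVENTS = [
    ("101", "kread LR = D012326C"),
    ("163", "read(3,0000000020000848,800)"),
    ("59B", "JFS2 IO read: vp = F1000A0601CE2820, sid = 87845E, "
            "offset = 0000000000000000, length = 0017"),
    ("100", "DATA ACCESS PAGE FAULT iar=B674 cpuid=00"),
    ("1B2", "VMM pagefault: V.S=0000.87845E"),
    ("1B0", "VMM page assign: V.S=0000.87845E ppage=23B73"),
]


def read_missed_cache(events):
    """Return True if a data access page fault follows a JFS2 IO read event."""
    seen_jfs2_read = False
    for hook_id, text in events:
        if hook_id == "59B" and "JFS2 IO read" in text:
            seen_jfs2_read = True
        elif seen_jfs2_read and hook_id == "100" and "PAGE FAULT" in text:
            return True
        elif hook_id == "104":
            # The system call returned without a fault after the read event.
            seen_jfs2_read = False
    return False


print(read_missed_cache(EVENTS))
```

A fuller analysis would also confirm that the faulting segment ID (V.S=0000.87845E in the 1B2 event) matches the sid reported in the JFS2 IO read event (87845E), as it does in this example.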
__ 20. For the first kread system call, how many bytes were actually read? ____________
» The following example command sequence shows how to determine the
timestamp value associated with the return from kread 104 event for the app1
program:
# grep "return from kread" app1.rpt | grep app1
104 app1 9633975 0.011519 return from kread [452 usec]
» Using this information, we can use vi to examine the app1.rpt file, and then
search for the timestamp value. This will allow us to view the events for the end
of the read system call. An example of the event sequence at the end of the call
is shown below:
163 app1 9633975 0.011518 vnop_rdwr_read(vp = F1000A0601CE2820, ext = 0000, ...) = 0000, 0017 bytes moved
104 app1 9633975 0.011519 return from kread [452 usec]
» The expected answer is that 17 (hexadecimal) bytes (23 in decimal) were read,
as indicated in the information in the 163 event.
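The hexadecimal values and timestamps in these events can be cross-checked with a few lines of Python. This is a sketch using values copied from the example events; the variable names are illustrative.

```python
# Values copied from the example trace events for the first kread call.
requested = int("800", 16)    # third argument of read(3,0000000020000848,800)
moved = int("0017", 16)       # "0017 bytes moved" in the final 163 event

start = 0.011067              # timestamp of the 101 kread event
end = 0.011519                # timestamp of the 104 return from kread event
elapsed_usec = round((end - start) * 1_000_000)

print("bytes requested:", requested)
print("bytes moved:", moved)
print("elapsed microseconds:", elapsed_usec)
```

The elapsed time computed from the two timestamps agrees with the [452 usec] value printed in the 104 event. The read returned fewer bytes than were requested, which would be consistent with the file being smaller than the request size.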
__ 21. How many kwrite system calls were made by the app1 program to write information
to the file that was opened? _______________
» We know that file descriptor number 3 is used for the file opened by app1. We
can count the number of write calls using the following command sequence:
# grep "write(3" app1.rpt | grep app1
19C app1 9633975 0.011520 write(3,0000000020000848,1000)
19C app1 9633975 0.011544 write(3,0000000020000848,1000)
» The expected answer is that two write calls are issued by app1.
__ 22. What was the write request size used with the first kwrite system call? ___________
» From the event information obtained while answering the previous question, we
can observe the parameters passed to the write routine:
write(3,0000000020000848,1000)
» The third parameter is the write request size. The expected answer is 1000
(hexadecimal), which is 4096 bytes in decimal.
End of exercise
Time command statistics   Conventional I/O (first window)   Direct I/O (second window)
real
user
sys
» The following table contains the example answers:
Time command statistics   Conventional I/O (first window)   Direct I/O (second window)
real                      0m0.94s                           0m40.45s
user                      0m0.08s                           0m0.10s
sys                       0m0.12s                           0m0.33s
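To put the example answers in perspective, the elapsed times can be converted to seconds and compared. The Python sketch below assumes the NmS.SSs format shown in the table; the function name to_seconds is hypothetical.

```python
import re


def to_seconds(stamp):
    """Convert a time value such as 0m40.45s to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" values from the example answers.
conventional = to_seconds("0m0.94s")
direct = to_seconds("0m40.45s")
ratio = direct / conventional

print(f"direct I/O took about {ratio:.0f} times longer than conventional I/O")
```

The user and sys values barely differ between the two runs in the example, so almost all of the extra elapsed time is spent waiting rather than executing on the CPU.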
__ 11. What is happening with the direct I/O test to make it so slow?
________________________________________________________________
» When writing to a file with direct I/O, the file system cache is not used, so the
VMM’s write-behind mechanism is not used either. All DIO writes go directly from the
user buffer to the physical volume and are considered synchronous.
» Conventional I/O is a cached I/O that has multiple advantages:
- When writing to a new file or extending an existing file, the write system call
copies the data into the file system cache and then returns to the application.
The page in the cache is written to disk at a later time by one of a number of
different kernel mechanisms.
- The 4 KB file page that is brought into the file system cache when a
single byte is read can be re-used by the application upon subsequent read
requests.
- Read-ahead can occur with conventional cached I/O, further reducing the
latency of future read requests.
» When direct I/O is demoted (alignment not satisfied, in this example) the
requested bytes are copied into the file system cache, then into the application
buffer, incurring the CPU costs of double-copying of data.
» Also, read-ahead will not occur when demoted direct I/O is used.
__ d. The I/O was supposed to be DIO. Were there any demoted DIOs? _________
# pg filemon1.out
. . . < some output deleted >. . .
Most Active Segments
------------------------------------------------------------------
#MBs #rpgs #wpgs segid segtype volume:inode
------------------------------------------------------------------
. . . < some output deleted >. . .
» The DIO was not demoted because there is no segment activity caused by the
VMM file system cache. You cannot always tell from the filemon output whether
DIOs were demoted. In those cases, you need to look at the system
trace file for a deeper analysis.
__ e. What was the most active logical volume? ______________________
» /dev/fslv01 was the most active logical volume.
# pg filemon1.out
. . . < some output deleted >. . .
Most Active Logical Volumes
---------------------------------------------------------------
util #rblk #wblk KB/s volume description
---------------------------------------------------------------
0.87 20480 20480 40923.9 /dev/fslv01 /dio
0.01 0 8 8.0 /dev/loglv00 jfs2log
. . . < some output deleted >. . .
__ f. What was the most active logical volume being written? _________________
» /dev/fslv01 was the most active logical volume being written.
__ g. What was the most active physical volume? ____________________
» /dev/hdisk2 was the most active physical volume.
# pg filemon1.out
. . . < some output deleted >. . .
Most Active Physical Volumes
---------------------------------------------------------------
util #rblk #wblk KB/s volume description
---------------------------------------------------------------
0.88 20480 20488 40931.9 /dev/hdisk2 N/A
. . . < some output deleted >. . .
__ h. What was the utilization of the most active physical volume? _________
» The utilization was 88%.
__ i. For the most active file being written, answer the following questions:
Note: Examine the Detailed File Stats section of the filemon output.
# pg filemon1.out
---------------------------------------------------------------
Detailed File Stats
---------------------------------------------------------------
. . . < some output deleted >. . .
FILE: /dio/file1 volume: /dev/fslv01 inode: 5
opens: 1
total bytes xfrd: 10485760
writes: 40 (0 errs)
write sizes (bytes):avg 262144.0 min 262144 max 262144 sdev 0.0
write times (msec):avg 10.548 min 4.835 max 24.226 sdev 5.880
lseeks: 2
. . . < some output deleted >. . .
a. How many total bytes were transferred? ________________________
10485760 bytes.
b. What was the write size, and was it consistent? ____________________
262144 bytes. The avg, min, and max were the same, so it was consistent.
c. What was the average write time? __________________________
10.548 msec.
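» The figures in this report can be cross-checked with shell arithmetic. The sketch below uses values copied from the sample filemon1.out output and assumes that filemon counts #wblk in 512-byte blocks; the variable names are illustrative only.

```shell
# Values copied from the sample filemon1.out output above.
writes=40              # writes to /dio/file1
write_size=262144      # bytes per write (avg = min = max)
wblk=20480             # #wblk reported for /dev/fslv01
blk_bytes=512          # assumption: filemon block columns are 512-byte units

total_from_writes=$((writes * write_size))
total_from_blocks=$((wblk * blk_bytes))
remainder=$((write_size % 4096))   # 0 means the write size is a multiple of 4 KB

echo "bytes from writes:  $total_from_writes"
echo "bytes from blocks:  $total_from_blocks"
echo "remainder mod 4096: $remainder"
```

Both totals should match the "total bytes xfrd" value, and a remainder of 0 is consistent with the writes not being demoted in this run.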
__ 19. Run the following command to generate another filemon output to report the I/O
activity on demoted DIO write operations. Notice the output block size of the dd
command (obs=255k).
# filemon -O all,detailed -o filemon2.out; dd if=/dio/file \
of=/dio/file2 ibs=1024k obs=255k count=10; trcstop
__ 20. Use the filemon2.out report generated in the previous step to answer the following
questions.
» The command and an example of output are shown below:
# pg filemon2.out
Mon Apr 8 12:31:18 2013
System: AIX 7.1 Node: woolf1 Machine: 00066BD2D900
Cpu utilization: 42.0%
Cpu allocation: 97.2%
» file2.
__ b. Number of reads from this file: ___________
» None.
__ c. Number of writes to this file: ___________
» 41.
__ d. The I/O was supposed to be DIO. Were there any demoted DIOs? ________
# pg filemon2.out
. . . < some output deleted >. . .
Most Active Segments
------------------------------------------------------------------
#MBs #rpgs #wpgs segid segtype volume:inode
------------------------------------------------------------------
10.2 30 2580 81c687 client
. . . < some output deleted >. . .
» Yes, the I/O was demoted. There was a segment (client segment type)
associated with the I/Os. The segment was probably used by VMM for file system
cache activities. The filemon output does not always show whether DIOs were
demoted. In those cases, you would need to look at the system trace file for a
deeper analysis.
__ e. What was the most active logical volume? __________________________
# pg filemon2.out
. . . < some output deleted >. . .
Most Active Logical Volumes
------------------------------------------------------------------
util #rblk #wblk KB/s volume description
------------------------------------------------------------------
0.89 20720 20720 35002.5 /dev/fslv01 /dio
0.02 0 16 13.5 /dev/loglv00 jfs2log
. . . < some output deleted >. . .
» /dev/fslv01 is the most active logical volume.
__ f. What was the most active logical volume being written? _______________
» /dev/fslv01 is the most active logical volume being written.
What can you summarize about the differences in the two runs?
________________________________________________
» The second run took longer than the first run because it had to use the VMM, so
there was more overhead.
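» The write count and the demotion in the second run follow from the dd block sizes. The sketch below is a rough check that assumes a 4 KB alignment requirement for direct I/O; the throughput figures are the KB/s values reported for /dev/fslv01 in the two sample reports, and the variable names are illustrative only.

```shell
# dd arguments from step 19: ibs=1024k count=10 obs=255k
total_kb=$((10 * 1024))                          # 10240 KB read in total
obs_kb=255
writes=$(( (total_kb + obs_kb - 1) / obs_kb ))   # round up: 40 full writes + 1 partial
tail_kb=$((total_kb % obs_kb))                   # size of the final partial write, in KB
misalign=$(( (obs_kb * 1024) % 4096 ))           # nonzero: 255 KB is not a multiple of 4 KB

# Throughput drop between the runs, from the sample KB/s values.
drop=$(awk 'BEGIN { printf "%.1f", (1 - 35002.5 / 40923.9) * 100 }')

echo "writes=$writes tail_kb=$tail_kb misalign=$misalign drop=${drop}%"
```

The computed write count matches the 41 writes reported for file2, and the nonzero remainder is consistent with the demotion seen in the Most Active Segments section.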
» 3145924.
__ b. How many read system calls did the command issue? _______________
Note: The trace hook ID for the system call entry is 101.
# grep "^101" trace3.out | grep read | wc -l
» 0 (no read system calls).
__ c. How many write system calls did the command issue? _____________
# grep "^101" trace3.out | grep write | wc -l
» 257 write system calls.
__ d. The dio_w command uses the file descriptor 3 (returned by the open system call)
to write the file blocks. How many write system calls did the dio_w command
issue to file descriptor 3? ___________
Note: The system call syntax is write (FileDescriptor, Buffer, NBytes).
The write system call writes NBytes bytes from the buffer pointed to by Buffer to
the file referred to by FileDescriptor. The FileDescriptor number used to write
into file3 is 3.
# grep "write(3" trace3.out
19C dio_w 3145924 kwrite 0.001396 write(3,0000000020000CA8,1000)
19C dio_w 3145924 kwrite 0.009015 write(3,0000000020000CA8,1000)
19C dio_w 3145924 kwrite 0.012041 write(3,0000000020000CA8,1000)
. . . < some output deleted >. . .
# grep "write(3" trace3.out | wc -l
250
__ e. What request size was used? ________________
» The request size is the third argument of the write system call, which is
0x1000 in hex or 4096 in decimal.
__ f. What was the timestamp for the first write to the file represented by the file
descriptor 3? _____________________________
# grep "write(3" trace3.out | head -1
19C dio_w 3145924 kwrite 0.001396 write(3,0000000020000CA8,1000)
» 0.001396.
__ g. Edit the trace3.out file using vi, and move to the line that has the timestamp
obtained in the previous step. Then, looking at the JFS2 IO write [...] line
(hook ID 59B), what are the offset, length, and SID (segment ID) values?
# vi trace3.out
» You may need to use the / (slash) subcommand to find the appropriate
timestamp. Example: /0.001396
19C dio_w 3145924 kwrite 0.001396 write(3,0000000020000CA8,1000)
19C dio_w 3145924 kwrite 0.001396 vnop_rdwr_write(vp =
F1000A0601323020, offset = 0000000000000000, length = 1000, flags =
8000003, ...) = ...
59B dio_w 3145924 kwrite 0.001398 JFS2 IO write: vp =
F1000A0601323020, sid = 8503B4, offset = 0000000000000000, length =
1000
5CA dio_w 3145924 kwrite 0.001399 VMSVC XMATTACH: caller=26ABF8
pid=FFFFFFFFFFFFFFFF vaddr=20000CA8 count=1000 segflag=0000
59B dio_w 3145924 kwrite 0.001402 JFS2 IO dio move: vp =
F1000A0601323020, sid = 8503B4, offset = 0000000000000000, length =
1000
59B dio_w 3145924 kwrite 0.001416 JFS2 IO dio devstrat: bplist =
F1000005C0160228, vp = F1000A0601323020, sid = 8503B4, lv blk =
8D8F8, bcount = 1000
. . . < some output deleted >. . .
Offset: ______________
» 0000000000000000.
Length: ______________
» 1000.
SID (segment ID): ______________
» 8503B4.
__ h. Looking in the trace using the SID from the last step, can you confirm that no
DIOs were demoted for the application’s writes? ___________________
» Yes, there was no VMM writing activity for that SID. You also see the following
lines for each write:
JFS2 IO write: . . .
JFS2 IO dio move: . . .
JFS2 IO dio devstrat: . . .
JFS2 IO dio iodone: . . .
» The JFS2 IO dio devstrat and JFS2 IO dio iodone lines are only shown if the
DIO was not demoted (see hook ID 59B).
» This only shows one write. To confirm that none of the DIOs were demoted, you
would need to see this sequence for all of the writes.
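» A minimal sketch of how the per-write check could be automated, assuming each trace event occupies a single line in the real trcrpt report (the long lines are wrapped in this guide). The function name classify_dio is hypothetical, and the sample lines are copied from the trace3.out excerpt above. It counts JFS2 IO write events, dio devstrat events, and dio demoted events.

```shell
# classify_dio: read a trcrpt report on stdin and count JFS2 DIO events.
# Hypothetical helper; the patterns match the event text shown in this guide.
classify_dio() {
    awk '
        /JFS2 IO write:/        { w++ }   # one per application write
        /JFS2 IO dio devstrat:/ { s++ }   # seen only when the DIO is not demoted
        /JFS2 IO dio demoted:/  { d++ }   # seen when the DIO is demoted
        END { printf "writes=%d devstrat=%d demoted=%d\n", w, s, d }
    '
}

# Sample events from trace3.out, one event per line.
result=$(classify_dio <<'EOF'
19C dio_w 3145924 kwrite 0.001396 write(3,0000000020000CA8,1000)
59B dio_w 3145924 kwrite 0.001398 JFS2 IO write: vp = F1000A0601323020, sid = 8503B4, offset = 0000000000000000, length = 1000
59B dio_w 3145924 kwrite 0.001402 JFS2 IO dio move: vp = F1000A0601323020, sid = 8503B4, offset = 0000000000000000, length = 1000
59B dio_w 3145924 kwrite 0.001416 JFS2 IO dio devstrat: bplist = F1000005C0160228, vp = F1000A0601323020, sid = 8503B4, lv blk = 8D8F8, bcount = 1000
EOF
)
echo "$result"

# The hex request size and the total written by the 250 writes to descriptor 3.
req_bytes=$((0x1000))
total_bytes=$((250 * req_bytes))
echo "request=$req_bytes total=$total_bytes"
```

On the full report, the function could be run as classify_dio < trace3.out. A demoted count of 0 would suggest that no write was demoted, but the trace itself remains the authoritative record.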
Part 5 - Using a trace report to examine demoted direct I/O writes
__ 26. In this section of the exercise, you will examine trace information to analyze
demoted direct I/O writes.
If you are not already logged in, log in to your assigned system as the root user and
make sure you are in the /home/QV474/ex4 directory.
__ 27. Run the following command to generate a trace output to examine demoted DIO
writes.
# trace -J syscall,jfs2,vnops,filepvld,vmm -x "./dio_w -w 1024000 -b 2048 -f \
/dio/file4"; trcrpt -O exec=on,pid=on,svc=on,timestamp=1 -o trace4.out
__ 28. Explore the trace4.out report generated in the previous step.
# pg trace4.out
Tue Apr 9 07:47:44 2013
System: AIX 7.1 Node: woolf1
Machine: 00066BD2D900
Internet Address: 091B198D 9.27.25.141
At trace startup, the system contained 2 cpus, of which 2 were traced.
Buffering: Kernel Heap
This is from a 64-bit kernel.
Tracing only these hooks, 1010,1040,1060,1070,10a0,10b0,1200,1290,12e0,1300,1340
,1350,1360,1390,13a0,13c0,13d0,14c0,1500,1520,1540,1560,15b0,1630,1640,1670,1680
,18b0,1940,19c0,1a00,1b00,1b10,1b20,1b30,1b40,1b50,1b60,1b70,1b80,1b90,1ba0,1bb0
,1bc0,1bd0,1be0,1bf0,1d90,1da0,1db0,1dc0,1dd0,2210,2220,2230,2a00,2a10,2a20,2fc0
,45a0,45b0,4a50,4c10,4c20,4c30,4ca0,4cb0,4cc0,4cd0,4ce0,4cf0,4d00,4d10,4d20,4d30
,4d40,4d50,4db0,4de0,4df0,4e00,4e10,4e20,4e30,4e40,4ef0,4f00,4f10,4f20,4f90,4fa0
,59b0,5c00,5ca0,62c0
» You may need to use the / (slash) subcommand to find the appropriate
timestamp. Example: /0.019911
19C dio_w 6815850 kwrite 0.019911 write(3,0000000020000CA8,800)
19C dio_w 6815850 kwrite 0.019911 vnop_rdwr_write(vp =
F1000A0601371020, offset = 0000000000000000, length = 0800, flags
= 8000003, ...) = ...
59B dio_w 6815850 kwrite 0.019914 JFS2 IO write: vp =
F1000A0601371020, sid = 8345CD, offset = 0000000000000000, length =
0800
59B dio_w 6815850 kwrite 0.019915 JFS2 IO dio move: vp =
F1000A0601371020, sid = 8345CD, offset = 0000000000000000, length =
0800
4C3 dio_w 6815850 kwrite 0.019922 VMM WRITE: sid=8345CD
src=FFFFF10000000CA8 dest=FFFFF00000000000 bytes=0800
basecopy_flags=0008
59B dio_w 6815850 kwrite 0.020004 JFS2 IO dio demoted: vp =
F1000A0601371020, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000
19C dio_w 6815850 kwrite 0.024013 vnop_rdwr_write(vp =
F1000A0601371020, ext = 0000, ...) = 0000, 0800 bytes moved
104 dio_w 6815850 kwrite 0.024014 return from kwrite [4104 usec]
101 dio_w 6815850 kwrite 0.024016 kwrite LR = D0120104
19C dio_w 6815850 kwrite 0.024017 write(3,0000000020000CA8,800)
19C dio_w 6815850 kwrite 0.024018 vnop_rdwr_write(vp =
F1000A0601371020, offset = 0000000000000800, length = 0800, flags
= 8000003, ...) = ...
59B dio_w 6815850 kwrite 0.024018 JFS2 IO write: vp =
F1000A0601371020, sid = 8345CD, offset = 0000000000000800, length =
0800
. . . < some output deleted >. . .
Offset: _________________
» 0000000000000000.
Length: ______________
» 800.
SID (segment ID): ______________
» 8345CD.
__ g. Looking in the trace using the SID from the last step, can you confirm that DIOs
were demoted for the application’s writes? ___________________
» You can use ioo -o <tunable> to list the current value, or ioo -L
<tunable> to list all of the value information for the tunable:
# ioo -o aio_minservers
aio_minservers = 3
# ioo -o aio_maxservers
aio_maxservers = 30
# ioo -o aio_maxreqs
aio_maxreqs = 131072
# ioo -o aio_server_inactivity
aio_server_inactivity = 300
# ioo -o aio_active
aio_active = 0
aio:   avgc  avfc  maxgc  maxfc  maxreqs  avg-cpu:  % user  % sys  % idle  % iowait  physc  % entc
     5002.2   0.0   1138      0       60               3.8   30.6    33.0      32.7    0.1    36.2
aio:   avgc  avfc  maxgc  maxfc  maxreqs  avg-cpu:  % user  % sys  % idle  % iowait  physc  % entc
     5369.7   0.0   1198      0       60               3.5   31.4    32.8      32.3    0.1    36.9
__ 40. When the ndiskaio program finishes, stop the iostat command, and then answer
the following questions:
__ a. Has the AIO kernel extension been used and pinned? ______________
» Yes, the ioo parameter, aio_active, has been changed to 1.
# ioo -o aio_active
aio_active = 1
__ b. How many AIO servers were created? _________________
» You can find this answer by looking at the serv field of the iostat -AQ command
or by using the ps command. In our example, 60 AIO servers were created.
# iostat -AQ 5
. . . < some output deleted >. . .
aio:   avgc  avfc  maxgc  maxfc  maxreqs  avg-cpu:  % user  % sys  % idle  % iowait  physc  % entc
     5002.2   0.0   1138      0       60               3.8   30.6    33.0      32.7    0.1    36.2
. . . < some output deleted >. . .
# ps -k | grep aioserver | wc -l
60
__ c. How does the number of AIO servers created compare to the minimum and
maximum AIO servers allowed?
» In this sample output, 60 AIO servers were created. aio_minservers is 3 (per
CPU) and aio_maxservers is 30 (per CPU). There are two logical CPUs on the
system, so the minimum is 6 and the maximum is 60. The number of AIO servers
created reached the maximum.
# ioo -o aio_minservers
aio_minservers = 3
# ioo -o aio_maxservers
aio_maxservers = 30
# mpstat -s
System configuration: lcpu=2 ent=0.2 mode=Capped
Proc0
0.08%
cpu0 cpu1
0.06% 0.02%
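» The system-wide limits follow from the per-CPU tunables and the logical CPU count. A short sketch using the sample values above; the mpstat header line is copied from the sample output, and the variable names are illustrative only.

```shell
# Header line copied from the sample mpstat -s output.
mpstat_line='System configuration: lcpu=2 ent=0.2 mode=Capped'

# Extract the logical CPU count from the lcpu= field.
lcpu=$(printf '%s\n' "$mpstat_line" | sed -n 's/.*lcpu=\([0-9][0-9]*\).*/\1/p')

aio_minservers=3     # per CPU, from ioo -o aio_minservers
aio_maxservers=30    # per CPU, from ioo -o aio_maxservers

min_total=$((aio_minservers * lcpu))
max_total=$((aio_maxservers * lcpu))

echo "lcpu=$lcpu min_total=$min_total max_total=$max_total"
```

With these values, the 60 AIO servers counted by ps -k equal the computed maximum.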
__ 41. Once the ndiskaio command has finished and there are no more AIO requests, how
long will all the AIO servers stay active? After that time, how many will remain
active?
» The ioo parameter, aio_server_inactivity, determines how long AIO servers
stay active with no AIO requests to handle. However, once the number of AIO
servers has risen above the value of the ioo parameter aio_minservers, it will
not fall below that value. In our example, aio_server_inactivity is 300 seconds.
After 300 seconds of inactivity, the number of AIO servers will be 6
(aio_minservers is 3 per CPU, and there are two logical CPUs).
__ 42. Change the minimum number of AIO servers (legacy AIO) to 0.
» Use the ioo command to change aio_minservers to 0:
# ioo -o aio_minservers=0
Setting aio_minservers to 0
__ 43. Has the number of active AIO servers been reduced to 0? If not, why not?
» Use the following command to see the number of active AIO servers:
# ps -k | grep aioserver | wc -l
» If you issue these commands within 300 seconds of the termination of the
ndiskaio command, you will still see active AIO servers. The number of AIO
servers is not automatically reduced to 0. The system waits until the
aio_server_inactivity time is up before deciding if the number of AIO servers
should be reduced.
# ps -k | grep aioserver | wc -l
60
__ 44. Change the aio_server_inactivity time to 30 seconds.
» Use the ioo command to change aio_server_inactivity to 30:
# ioo -o aio_server_inactivity=30
Setting aio_server_inactivity to 30
__ 45. After 30 seconds, has the number of active AIO servers been reduced to 0 (the
minimum number you set it to in the previous step)?
» Each AIO server sleeps for the number of seconds indicated by the
aio_server_inactivity time that was in effect when going to sleep. Changing
the value of this setting will not impact the sleep duration of AIO servers that are
already sleeping. If no AIO requests have been placed on the queue, then no
AIO servers will have been woken up. Therefore each server will be sleeping for
300 seconds. The number of AIO servers you observe will depend on whether
300 seconds has passed since they previously went to sleep.
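» The timing described above can be illustrated with simple arithmetic. In the sketch below, the moment at which the tunable is changed (100 seconds after the servers went to sleep) is a hypothetical value chosen for illustration; only the 300-second and 30-second settings come from the exercise.

```shell
sleep_start=0      # hypothetical: time at which the AIO servers went to sleep
old_timeout=300    # aio_server_inactivity in effect when they went to sleep
change_time=100    # hypothetical: time at which the tunable is changed to 30
new_timeout=30

# Sleeping servers keep the timeout that was in effect when they went to sleep.
wake_time=$((sleep_start + old_timeout))
naive_time=$((change_time + new_timeout))   # what one might wrongly expect
still_waiting=$((wake_time - change_time))  # seconds left after the change

echo "servers wake at t=$wake_time, not t=$naive_time ($still_waiting s after the change)"
```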
Part 8 - Comparing JFS2 AIO and JFS2 AIO with CIO accesses
__ 46. If you are not already logged in, log in to your assigned system as the root user and
change directory to /home/QV474/ex4.
In this section, you will run a program named ndiskaio that generates AIO
requests. You will do this two times, once for conventional AIO and then again for
AIO using CIO accesses, and then compare the results.
__ 47. In the previous part of the exercise, you set aio_minservers to 0 and
aio_server_inactivity to 30 seconds. You will keep these settings so the
number of AIO servers gets set back to 0 after each test. Verify that there are
no active AIO servers on the system.
» Use the following command to see the number of active AIO servers:
# ps -k | grep aioserver | wc -l
0
__ 48. Open a second window to your system, log in as the root user, and run the following
command:
# iostat -A 5
__ 49. In the first window, verify the /aio file system is mounted. If it is not, mount it. Then,
run the following command:
# ./ndiskaio -A -f /aio/bigfile -S -r75 -b4096 -t20 -M20 -X60
__ 50. When the ndiskaio program finishes, stop the iostat -A command in the other
window. Record the following values:
From the ndiskaio output:
- Total disk I/O: ___________________________ 62640
From the iostat -A output (Use the interval that has the highest values):
- avgc: ___________________________ 3171.2
- maxfc: _________________________ 0
- maxreqs: _______________________ 60
__ 52. The second test will use a JFS2 file system that is mounted with the CIO option.
Mount the /aio file system with the following command:
# mount -o cio /aio
__ 53. Run the iostat -A 5 command again in the second window.
__ 54. In the first window, run the following command:
# ./ndiskaio -A -f /aio/bigfile -S -r75 -b4096 -t20 -M20 -X60
__ 55. When the ndiskaio program finishes, stop the iostat -A command in the other
window. Record the following values:
From the ndiskaio output:
- Total disk I/O: ___________________________ 18960
From the iostat -A output (Use the interval that has the highest values):
- avgc: ___________________________ 0.0
- maxgc: __________________________ 0
- maxfc: _________________________ 0
- maxreqs: _______________________ 0
- physc: __________________________ 0
Transfer your data from the two tests into the table below, then answer the
following questions.
Note: AIO I/Os performed against raw logical volumes or files opened in CIO
mode do not use the aioserver kprocs. ___________________________
» The maxreqs field specifies the maximum number of asynchronous I/O requests
that can be outstanding at one time, one per AIO server. AIO I/Os performed
against files opened in CIO mode do not use AIO server processes.
__ f. For JFS2 without CIO, why are the avfc and maxfc values zero? _______
» When the data being accessed asynchronously is located in a JFS2 file system,
it is not using fast path and all I/O is routed through the AIO servers.
__ g. Why is there iowait time for the JFS2 without CIO but not for JFS2 with CIO
access? _____________________________________________________
» AIO servers are being used for the JFS2 without CIO, so they issue I/O
operations and wait. When a processor has initiated an I/O request and there is
nothing else for the processor to do, it logs that time as iowait.
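» The two recorded "Total disk I/O" values can be compared with a one-line calculation. The sketch below uses the sample values recorded in steps 50 and 55 and only computes their ratio; your own results will differ.

```shell
jfs2_total=62640       # Total disk I/O recorded in step 50 (JFS2 without CIO)
jfs2_cio_total=18960   # Total disk I/O recorded in step 55 (JFS2 with CIO)

# The shell has no floating-point arithmetic, so use awk for the division.
ratio=$(awk -v a="$jfs2_total" -v b="$jfs2_cio_total" 'BEGIN { printf "%.1f", a / b }')

echo "JFS2 without CIO / JFS2 with CIO = $ratio"
```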
__ 56. Unmount the /convio, /dio, and /aio file systems.
» The commands are shown below:
# umount /convio
# umount /dio
# umount /aio
__ 57. Remove the /convio, /dio, and /aio file systems and mount points.
» The commands and example output are shown below:
# rmfs -r /convio
rmlv: Logical volume fslv00 is removed.
# rmfs -r /dio
rmlv: Logical volume fslv01 is removed.
# rmfs -r /aio
rmlv: Logical volume fslv02 is removed.
__ 58. Remove physical volume hdisk2 from the volume group ex4vg.
» The commands and example output are shown below:
# reducevg -d -f ex4vg hdisk2
rmlv: Logical volume loglv00 is removed.
ldeletepv: Volume Group deleted since it contains no physical
volumes.
__ 59. Let your instructor know that you have completed the exercise.
End of exercise