
IBM Power Systems Technical University featuring IBM AIX and Linux

September 8-12, 2008, Chicago, IL

AIX 6.1 Performance Differences


Session ID: pAI09
Speaker: Steve Nasypany

2008 IBM Corporation

IBM Training

Introduction
- VMM Page Replacement
    - New defaults reducing the requirement for basic performance tuning
- VMM File I/O Pacing
    - Enabled by default
- Performance Tunables
    - Tunables are categorized into restricted and non-restricted tunables
- AIO
    - Dynamic AIO tuning
    - AIO Fast Path for CIO
- JFS2
    - Read-only access to files opened with CIO
- NFS
    - Changes to TCP scaling window, R/W size and number of biod daemons
- Enhanced JFS no-log option
- MPSS support


Review AIX Page replacement algorithm


When page replacement begins to run, it selects a page type to steal based on:
- If the amount of file pages is above maxclient/maxperm, file pages are chosen
- If the number of file pages is between minperm and maxclient, the type is chosen based on re-paging history
- If the amount of file pages is below minperm, working storage and file pages are chosen without checking the re-paging history

Re-paging history indicates whether individual pages have been written to disk and read back recently

Re-paging history adds a degree of uncertainty to the selection process
- If re-paging history decides to pick working storage pages, system paging may begin
    - This was intended as a safety valve: if we are too aggressive in stealing file pages, stop
    - But sometimes it is triggered by bad luck
- If re-paging history decides to pick file pages and many file pages are dirty, heavy writes to disk can occur
    - This would probably happen eventually due to sync

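[Figure: How much of system memory is caching files, on a scale from 0% to 100% of the contents of system memory]
- Above maxclient = maxperm (80%): pick file pages
- Between minperm (20%) and maxclient (80%): pick file pages or w/s pages, based on recent history
- Below minperm (20%): pick any pages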
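
The three selection rules on this slide can be sketched as a small function. This is only an illustration under stated assumptions: the function name and the boolean `repage_prefers_file` (standing in for whatever the re-paging history decides) are hypothetical, and the thresholds are the AIX 5.2/5.3 defaults shown in the figure.

```python
def choose_steal_type(file_pct, repage_prefers_file, minperm=20.0, maxclient=80.0):
    """Return the page type targeted for stealing, following the slide's rules.

    file_pct            -- percentage of memory currently caching files
    repage_prefers_file -- hypothetical stand-in for the re-paging history verdict
    """
    if file_pct > maxclient:
        return "file"              # above maxclient/maxperm: file pages only
    if file_pct < minperm:
        return "any"               # below minperm: history is not consulted
    # Between minperm and maxclient: re-paging history decides.
    return "file" if repage_prefers_file else "working storage"


for pct in (90, 50, 10):
    print(pct, choose_steal_type(pct, repage_prefers_file=False))
```

The middle band is where the uncertainty described above comes from: the same file-page percentage can lead to either outcome depending on recent history.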


AIX v5 vs v6 VMM Page Replacement tuning


Tunable              AIX 5.2/5.3   AIX 6.1
minperm%             20            3
maxperm%             80            90
maxclient%           80            90
strict_maxperm       0             0
strict_maxclient     1             1
lru_file_repage      1             0
page_steal_method    0             1

On AIX 6.1, no paging to the paging space will occur unless the system memory is over committed (AVM > 97%)
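
As a quick way to see which defaults actually moved between releases, the two columns of this slide can be compared programmatically. This is a minimal sketch: the dictionaries simply restate the values listed above, and the variable names are illustrative.

```python
# Default values as listed on this slide.
AIX_53 = {"minperm%": 20, "maxperm%": 80, "maxclient%": 80, "strict_maxperm": 0,
          "strict_maxclient": 1, "lru_file_repage": 1, "page_steal_method": 0}
AIX_61 = {"minperm%": 3, "maxperm%": 90, "maxclient%": 90, "strict_maxperm": 0,
          "strict_maxclient": 1, "lru_file_repage": 0, "page_steal_method": 1}

# Keep only the tunables whose default differs between the two releases.
changed = {name: (AIX_53[name], AIX_61[name])
           for name in AIX_53 if AIX_53[name] != AIX_61[name]}

for name, (old, new) in changed.items():
    print(f"{name}: {old} -> {new}")
```

Five of the seven defaults change; strict_maxperm and strict_maxclient keep their previous values.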


Legacy page_steal_method=0
- Partition memory is broken up into page pools
    - A page pool is a set of physical pages organized into a list
    - One lrud per memory pool
- Inside each memory pool is a mix of working storage and file pages
- When the free list is depleted, lrud scans its page pool one scan bucket (default 128k pages) at a time
    - The scan can be targeted for working storage pages, file pages, or either
- If scanning for file pages and the number of file pages is small (e.g. maxclient% = 10), the ratio of scanned pages to freed pages will be high (e.g. 10:1)
- This reduces performance in two ways:
    - CPU time in lrud
    - Fragmentation of memory, which can result in I/O coalescing being less effective

[Figure: System memory divided into Page Pool 0 and Page Pool 1. Each pool holds a single list of pages, which is scanned for either w/s or file pages.]


List-based LRU page_steal_method=1


- Partition memory is broken up into page pools
    - A page pool is a set of physical pages
    - There are two lists for a page pool: one of working storage pages and another of file pages
    - One lrud per memory pool
- When the free list is depleted, lrud scans the appropriate list for the type of pages it desires, one scan bucket (128k pages) at a time
- If scanning for file pages and the number of file pages is small (e.g. maxclient% = 10), the ratio of scanned pages to freed pages should be low (e.g. 2:1 to 1:1)
- This improves performance in two ways:
    - CPU time in lrud is reduced due to less scanning
    - I/O coalescing is better preserved for reading and writing of files larger than memory

[Figure: Page Pool 0 with two separate lists: a list of w/s pages (scanned for w/s) and a list of file pages (scanned for file).]
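
The difference in scanned-to-freed ratio between the two methods can be illustrated with a toy model. This is a deliberately simplified sketch, not the lrud implementation: it assumes every page of the wanted type is stealable (a real scan also skips recently referenced pages, which is why the slide quotes 2:1 to 1:1 rather than exactly 1:1), and the pool layout and function names are hypothetical.

```python
def scan_single_list(pool, wanted, need):
    """Legacy style: walk one mixed list, skipping pages of the wrong type."""
    scanned = freed = 0
    for page_type in pool:
        if freed == need:
            break
        scanned += 1
        if page_type == wanted:
            freed += 1
    return scanned, freed


def scan_typed_list(pool, wanted, need):
    """List-based style: walk only the list that holds the wanted type."""
    typed = [p for p in pool if p == wanted]
    freed = min(need, len(typed))
    return freed, freed  # in this toy model every scanned page is freed


# Hypothetical pool: 1000 pages, every 10th one a file page (10% file pages).
pool = ["file" if i % 10 == 0 else "work" for i in range(1000)]

legacy = scan_single_list(pool, "file", need=50)
listed = scan_typed_list(pool, "file", need=50)
print("legacy     scanned, freed =", legacy)
print("list-based scanned, freed =", listed)
```

With 10% file pages the mixed-list scan touches roughly ten pages for each page it frees, which matches the 10:1 example given for the legacy method.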


VMM File IO Pacing Enabled By Default


- I/O pacing enabled by default
    - Prevents system responsiveness issues due to large quantities of writes
    - Limits the maximum number of pages of I/O outstanding to a file
        - Without I/O pacing, a program can fill up large amounts of memory with written pages. Those queued I/Os can result in long waits for other programs using the storage
    - Better solution than the file system write-behind techniques
- New defaults
    - Not very aggressive; intended to limit one or a few programs from impacting system responsiveness
    - Values high enough not to impact sequential write performance
    - maxpout = 8193
    - minpout = 4096
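
The pacing limits behave as a pair of high-water and low-water marks: a writer is held once a file has maxpout pages of I/O outstanding and is released when the count drains to minpout. The class below is a minimal model of that behaviour, assuming this standard interpretation of maxpout/minpout; the class and method names are hypothetical and the small limits in the usage lines are only there to keep the example short.

```python
class PacedFile:
    """Toy model of per-file I/O pacing with high/low-water marks."""

    def __init__(self, maxpout=8193, minpout=4096):
        self.maxpout = maxpout      # high-water mark: writers are held at this level
        self.minpout = minpout      # low-water mark: held writers resume at this level
        self.outstanding = 0        # pages of write I/O not yet completed
        self.blocked = False

    def try_write(self, pages=1):
        """Queue pages for writing; return False if the writer must wait."""
        if self.blocked or self.outstanding >= self.maxpout:
            self.blocked = True
            return False
        self.outstanding += pages
        return True

    def complete(self, pages=1):
        """Record completed I/O and release the writer at the low-water mark."""
        self.outstanding = max(0, self.outstanding - pages)
        if self.blocked and self.outstanding <= self.minpout:
            self.blocked = False


f = PacedFile(maxpout=4, minpout=2)
results = [f.try_write() for _ in range(5)]   # the fifth write hits the limit
f.complete()                                   # 3 outstanding: still held
held = f.try_write()
f.complete()                                   # 2 outstanding: released
resumed = f.try_write()
print(results, held, resumed)
```

The gap between the two marks is what keeps a released writer from being held again immediately after a single I/O completes.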


Performance Tunables
Tunables now in two categories
- Restricted tunables
    - Should not be changed unless recommended by AIX development or development support
    - Are not shown by tuning commands unless the -F flag is used
    - Dynamic change will show a warning message
    - Permanent change must be confirmed
    - Permanent changes will cause an error log entry at boot time
- Non-restricted tunables
    - Can have restricted tunables as dependencies


Changing restricted tunables


Changing a restricted tunable dynamically

> ioo -o aio_sample_rate=6
Warning: a restricted tunable has been modified

A dynamic change of a restricted tunable will inform the user.

Changing a restricted tunable permanently

> ioo -po aio_sample_rate=6
Modification to restricted tunable aio_sample_rate, confirmation yes/no

A permanent change of a restricted tunable requires a confirmation from the user.
Note: The system will log changes to restricted tunables in the system error log at boot time.

List restricted tunables

> ioo -aF
aio_active = 0
aio_maxreqs = 65536
...
posix_aio_minservers = 3
posix_aio_server_inactivity = 300
##Restricted tunables
aio_fastpath = 1
aio_fsfastpath = 1
aio_kprocprio = 39
aio_multitidsusp = 1
aio_sample_rate = 5
aio_samples_per_cycle = 6
j2_maxUsableMaxTransfer = 512
j2_nBufferPerPagerDevice = 512
j2_nonFatalCrashesSystem = 0
j2_syncModifiedMapped = 1
j2_syncdLogSyncInterval = 1

TUNE_RESTRICTED Error Log Entry


LABEL:           TUNE_RESTRICTED
IDENTIFIER:      D221BD55
Date/Time:       Thu May 24 15:05:48 2007
Sequence Number: 637
Machine Id:      000AB14D4C00
Node Id:         quake
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   perftune

Description
RESTRICTED TUNABLES MODIFIED AT REBOOT

Probable Causes
SYSTEM TUNING

User Causes
TUNABLE PARAMETER OF TYPE RESTRICTED HAS BEEN MODIFIED

Recommended Actions
REVIEW TUNABLE LISTS IN DETAILED DATA

Detail Data
LIST OF TUNABLE COMMANDS CONTROLLING MODIFIED RESTRICTED TUNABLES AT REBOOT, SEE FILE /etc/tunables/lastboot.log


Why, you ask?


- The number of tunables in AIX had grown to a ridiculously large number
    - 5.3 TL06: vmo 61, ioo 27, schedo 42, no 135, plus a few others
    - 6.1: vmo 29, ioo 21, schedo 15, no 133, plus a few others
- The potential combinations that exist are too numerous to test and document effectively
- Many of the tunables had been created to deal with very specific customers or situations which don't apply often
- This wasn't done in a vacuum; a survey of support and recent situations was employed to identify the commonly used tunables (which remain unrestricted)
- If a restricted tunable must be changed, a PMR should be opened to identify the issue
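
To put the "too many combinations" point in numbers, the counts quoted above can be totalled and turned into a lower bound on the number of possible configurations. This is a rough sketch: it uses only the four commands listed (ignoring the "few others") and assumes, very conservatively, that each tunable has just two possible settings.

```python
# Tunable counts per command, as quoted on the slide.
counts_53 = {"vmo": 61, "ioo": 27, "schedo": 42, "no": 135}
counts_61 = {"vmo": 29, "ioo": 21, "schedo": 15, "no": 133}

total_53 = sum(counts_53.values())
total_61 = sum(counts_61.values())

# Lower bound on configurations if every tunable had only two settings.
combos_53 = 2 ** total_53
combos_61 = 2 ** total_61

print("5.3 TL06 tunables:", total_53)
print("6.1 tunables:     ", total_61)
print("digits in the 5.3 TL06 lower bound:", len(str(combos_53)))
```

Even this understated bound is far beyond anything that can be tested exhaustively, which is the motivation for restricting the rarely needed tunables.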

General trend toward file system I/O with concurrent I/O


- Concurrent I/O (CIO) has been a feature of AIX since AIX 5.2
    - http://www.ibm.com/systems/p/os/aix/whitepapers/db_perf_aix.pdf
- Concurrent I/O gives applications which do internal buffering of disk I/O and locking a means of bypassing operating system caching and i-node file locking
    - This improves CPU efficiency of I/O to very near that of raw logical volumes
    - And improves scalability by eliminating operating system i-node locking in the read/write paths
- Concurrent I/O is not for all applications
    - Some applications require operating system i-node locking to function correctly
    - Other applications do not do sophisticated storage buffering and benefit from caching in the operating system or from the read-ahead/write-behind mechanisms that the AIX virtual memory management subsystem provides to improve sequential file performance


CIO and Applications


- DB2 Version 9.5 implements CIO as the DEFAULT mechanism for table spaces on AIX
    - NO FILE SYSTEM CACHING/FILE SYSTEM CACHING clauses on CREATE TABLESPACE or ALTER TABLESPACE
    - View caching: DB2 GET SNAPSHOT FOR TABLES ON db
    - DB2 has supported CIO since V8.1
    - http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0408lee/
- Oracle 10g/11g have support, but it is not a default
    - Requires filesystemio_options to be set to SETALL or DIRECTIO
    - CIO is the recommended deployment solution for JFS2; however, some 3rd party tools have issues


CIO and Applications


- If you use legacy VMM tuning (e.g. AIX 5.2/5.3 defaults) and you switch an application from non-CIO to CIO operation, you will likely need to retune
    - The amount and distribution of memory may change quite radically
    - Usually, switching file usage to CIO reduces the memory required, as the operating system will no longer be buffering file pages for those files
    - Upgrading from DB2 9.1 (non-CIO) to DB2 9.5 may require some tuning preparation
- With AIX 6.1 default tuning, it should not be necessary to change tuning when converting from non-CIO to CIO operation


AIX 6.1 AIO Support


- Interface changes
    - All the AIO entries in the ODM and AIO smit panels have been removed
    - The aioo command will no longer be shipped
    - All the AIO tunables have current, default, minimum and maximum values that can be viewed with ioo
- AIO kernel extension loaded at system boot
    - Applications no longer fail to run because you forgot to load the kernel extension (you may applaud here)
    - No AIO servers are active until requests are present
    - Extremely low impact on memory requirements with this implementation


Improvements to AIO CIO


- AIO Fast Path for CIO enabled by default
    - With the fast path, the AIO server threads no longer participate in the I/O path
    - By removing the AIO servers from the path, we get three things:
        - The removal of AIO servers as any potential resource bottleneck
        - The reduction in path length for AIO read/write services, as less dispatching is required
        - Potentially better coalescing of sequential I/O requests initiated through AIO or LISTIO services

[Figure: AIO I/O paths]
    FS, no Fast Path:  Application -> AIO Server -> File System -> LVM -> Device Driver
    CIO Fast Path:     Application -> File System -> LVM -> Device Driver
    (Fast Path has been enabled for LVs and PVs for a long time)

No change in behavior for environments such as Oracle 10G/ASM on raw hdisks


General improvements to AIO


- The number of AIO servers varies between minservers and maxservers (times #CPUs), based on workload
    - AIO servers stay active as long as they service requests
    - The number of AIO servers is dynamically increased/reduced based on the demand of the workload
    - aio_server_inactivity defines after how many seconds of idle time an AIO server will exit
    - Do not confuse "no active servers" with "kernel extension not loaded". The kernel extension is always loaded
- Changes to AIO tunables are dynamic through ioo
    - Changes do not require system reboot
    - minservers is changed to a per-CPU tunable
    - maxservers is changed to 30
    - maxreqs is changed to 65536
- Benefit
    - No longer necessary to tune minservers/maxservers/maxreqs as in the past
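
Because the server limits are expressed per CPU, the system-wide range depends on the partition size. The sketch below works that out, assuming (as the "times #CPUs" note suggests) that both minservers and maxservers scale with the CPU count; the function names are hypothetical and the defaults are the AIX 6.1 values quoted on these slides.

```python
def aio_server_limits(ncpus, minservers=3, maxservers=30):
    """System-wide (low, high) AIO server limits for a partition with ncpus CPUs."""
    return minservers * ncpus, maxservers * ncpus


def cap_servers(wanted, ncpus, minservers=3, maxservers=30):
    """Never run more servers than the per-CPU maximum allows."""
    _, high = aio_server_limits(ncpus, minservers, maxservers)
    return min(wanted, high)


low, high = aio_server_limits(ncpus=4)
print("4 CPUs: between", low, "and", high, "AIO servers")
print("demand for 500 servers is capped at", cap_servers(500, ncpus=4))
```

On a 4-CPU partition the default ceiling is already 120 servers, which helps explain why manual tuning of these values is rarely needed any more.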


AIO Tunables

> ioo -a
aio_active = 0
aio_maxreqs = 65536
aio_maxservers = 30
aio_minservers = 3
aio_server_inactivity = 300
posix_aio_active = 0
posix_aio_maxreqs = 65536
posix_aio_maxservers = 30
posix_aio_minservers = 3
posix_aio_server_inactivity = 300


AIO Restricted Tunables

> ioo -aF
...
##Restricted tunables
aio_fastpath = 1
aio_fsfastpath = 1
aio_kprocprio = 39
aio_multitidsusp = 1
aio_sample_rate = 5
aio_samples_per_cycle = 6
posix_aio_fastpath = 1
posix_aio_fsfastpath = 1
posix_aio_kprocprio = 39
posix_aio_sample_rate = 5
posix_aio_samples_per_cycle = 6

CIO Read Mode Flag


- Allows an application to open a file for CIO such that subsequent opens without CIO avoid demotion
    - In the past, a second opening of a file without CIO would cause demotion, which removes many of the benefits of CIO
    - The second read-only opening without CIO will still result in that opening having uncached reads to the file. Thus, such programs should ensure that the I/O sizes are large enough to achieve I/O efficiency
- Example: a backup application can access database files in read-only mode while the database has the file opened in concurrent I/O mode
- open() flag is O_CIOR
- procfiles does not currently reflect O_CIO/O_CIOR
    - kdb 'u <slotnumber>', then for each file listed there 'file <filepointer>' gives some info


NFS Performance Improvements


- RFC 1323 enabled by default
    - Allows for TCP window scaling beyond 64K, so more one-way packets are allowed in flight between acks for large sequential transfers
    - We had the nfs_rfc1323 tunable before; it just wasn't enabled by default
- Increased default number of biod daemons
    - 32 biod daemons per NFS V3 mount point
    - Very slight increase in memory (<2MB) required over previous default of 4
    - Enables more I/Os to be outstanding at the same time; doesn't speed sequential operations much, but helps random access (e.g. OLTP)
- Default read/write size increased to 64k for TCP connections
    - Was 32k previously
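
Why the 64K window limit matters can be shown with a short calculation: a single TCP connection cannot move more than one window of data per round trip. The sketch below uses the 65535-byte maximum of an unscaled TCP window; the 1 ms round-trip time and the 1 Gbit/s link speed are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def max_throughput_bytes_per_sec(window_bytes, rtt_us):
    """Upper bound on one connection's throughput: one window per round trip."""
    return window_bytes * 1_000_000 // rtt_us


def window_needed_bytes(link_bits_per_sec, rtt_us):
    """Window size needed to keep the link full (bandwidth-delay product)."""
    return link_bits_per_sec * rtt_us // (8 * 1_000_000)


UNSCALED_MAX_WINDOW = 65535      # largest window without RFC 1323 scaling
RTT_US = 1000                    # assumed round-trip time: 1 ms
GIGABIT = 1_000_000_000          # assumed link speed: 1 Gbit/s

limit = max_throughput_bytes_per_sec(UNSCALED_MAX_WINDOW, RTT_US)
needed = window_needed_bytes(GIGABIT, RTT_US)
print("throughput ceiling without scaling:", limit, "bytes/s")
print("window needed to fill the link:    ", needed, "bytes")
```

Under these assumptions the unscaled window caps a connection at about 65 MB/s, roughly half of what the link can carry, while filling the link needs a window of about 125000 bytes. Longer round trips widen the gap.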


NFS biod changes


- Having more biods allows better read-ahead and write-behind
- However, measured on a single-process basis, there are no huge performance differences over the AIX 5.3 defaults
- Results should improve in tests with multiple processes/threads operating over NFS
- NFS client tests, p5 520 on 1 Gb Ethernet with 64kB I/Os (next slide)


NFS biod changes


[Figure: NFS single process throughput, over 256MB file. Bar chart comparing 32 biods with 4 biods. Vertical axis labelled MB/second, 0 to 120000. Horizontal axis lists the test cases: sequential and random reads (server cached and uncached), write seq overwrite, write seq create, write rand create.]


NFS biod change with Kerberos krb5p


- The increase in biods has a much more positive impact when using Kerberos DES security
- Overlapping more compute with network traffic through more biods greatly improves throughput
- Same model as previous chart, krb5p (full packet encryption) mount option

[Figure: NFS biod changes with Kerberos. Bar chart comparing 32 biods with 4 biods. Vertical axis labelled MB/sec, 0 to 70000. Horizontal axis lists read and write test cases (sequential and random, server cached and uncached, overwrite and create).]


Enhanced JFS nolog option


- JFS2 standard metadata logging for filesystem integrity disabled via a mount option
    - Similar to legacy JFS nointegrity option
- Meant to enable faster migration of data to new storage
    - File system operation with heavy file create/delete activity can create log bottlenecks
    - Potentially useful for temporary file systems where the filesystem can be easily recreated or fscked
- mount -o log=NULL during the data migration phase, then unmount and mount with standard logging


Enhanced JFS nolog option - example


- 4-way POWER5 p550, PHP test Wikibench
- Test makes heavy use of file metadata
- With single disk setup, bottleneck on disk writes to Enhanced JFS (JFS2) logs

[Figure: PHP Wikibench throughput (axis 0 to 90), default log versus nolog.]

[Figure: Disk utilization over time, %disk busy (axis 0 to 100), default log versus nolog.]

With nolog, the log bottleneck is avoided


Multiple Page Size Segment (MPSS) Support


- POWER6 provides hardware support for mixing 4kB pages and 64kB pages in the same hardware segment
    - This allows the AIX operating system to promote small pages to medium pages transparently to the application
    - This typically improves performance by reducing stress on hardware translation mechanisms
    - It is controlled with the vmo vmm_default_pspa parameter (-1 turns off)
- This behavior is enabled as a default on AIX 6.1 on POWER6 hardware
    - Since it is not supported on POWER5, systems running identical application conditions on POWER5 and POWER6 may differ in exact memory page usage
    - In general, no increase in memory consumption should be noticed; however, the usage of 64kB pages may increase on POWER6
- System paging activity may result in 64kB pages being broken into 4kB pages
    - 64kB pages that are broken by paging won't usually be reconstituted into 64kB pages later
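
The reduction in translation pressure follows from simple arithmetic: the same amount of memory needs sixteen times fewer page translations when it is mapped with 64kB pages. A minimal sketch, assuming a 256 MB segment (the usual AIX segment size; it is not stated on the slide) and a hypothetical helper name:

```python
SEGMENT_BYTES = 256 * 1024 * 1024   # assumed segment size: 256 MB


def pages_per_segment(page_kb):
    """Number of pages (and so translations) needed to map one full segment."""
    return SEGMENT_BYTES // (page_kb * 1024)


small = pages_per_segment(4)     # 4kB pages
medium = pages_per_segment(64)   # 64kB pages
print("4kB pages per segment: ", small)
print("64kB pages per segment:", medium)
print("reduction factor:      ", small // medium)
```

Each promotion of sixteen contiguous 4kB pages to one 64kB page therefore replaces sixteen translation entries with one.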

MPSS Using svmon to see MPSS segments


> svmon -P 553068

     Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
  553068 java             44652     8388    37623    73342      N     Y     N

     PageSize      Inuse      Pin     Pgsp  Virtual
    s    4 KB       1132      244     4055     4798
    m   64 KB       2720      509     2098     4284

    Vsid  Esid Type Description               PSize  Inuse   Pin  Pgsp Virtual
   51b10     3 work working storage               m   1879     0  1946    3068
       0     0 work kernel segment                m    520   507    47     561
   3c02d     d work text or shared-lib code seg   m    297     0    85     612
    d3a7     e work shared memory segment        sm    582     0  3744    4096
   61adc     - work                               s    549   244   311     702
   65add     f work working storage               m     20     0    17      36
   51ad0     2 work process private               m      3     2     2       5
   75ad9     1 work code                          m      1     0     1       2
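
The process totals in the svmon -P output are expressed in 4kB units, so each should equal the 4kB-page count plus sixteen times the 64kB-page count. The sketch below checks that for the numbers in this example. Which number belongs to which column is inferred from the flattened output, so treat the mapping as an assumption that the arithmetic happens to support.

```python
SMALL_PER_MEDIUM = 64 // 4   # one 64kB page covers sixteen 4kB pages

# Per-page-size counts from the PageSize rows of the example.
per_size = {
    "s": {"Inuse": 1132, "Pin": 244, "Pgsp": 4055, "Virtual": 4798},
    "m": {"Inuse": 2720, "Pin": 509, "Pgsp": 2098, "Virtual": 4284},
}
# Process totals from the first row of the example (4kB units).
reported = {"Inuse": 44652, "Pin": 8388, "Pgsp": 37623, "Virtual": 73342}


def total_in_4k_pages(column):
    """Combine small and medium page counts into 4kB units."""
    return per_size["s"][column] + SMALL_PER_MEDIUM * per_size["m"][column]


for column, value in reported.items():
    print(column, total_in_4k_pages(column), "reported:", value)
```

All four columns reconcile, which shows that almost all of this Java process's memory (2720 of its medium pages, about 170 MB) is backed by 64kB pages.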


MPSS Using svmon to detail MPSS segments


> svmon -D d3a7

Segid: d3a7
Type: working
PSize: sm (4 KB - 64 KB)
Address Range: 0..4095
Size of page space allocation: 3744 pages ( 14.6 MB)
Virtual: 4096 frames (16.0 MB)
Inuse: 582 frames ( 2.3 MB)

    Page  Psize   Frame  Pin  ExtSegid  ExtPage
       0      m  442176    Y
       1      m  442177    Y
       2      m  442178    Y
     382      s  362140    N
     435      s  430534    N


Implementation Considerations
AIX 5.2/5.3 to AIX 6.1 migration example (DB2 performance tuning)

AIX 5.2/5.3
- VMM page replacement tuning
    - Reduce minperm, maxperm, maxclient
    - Turn off strict_maxclient
    - Increase minfree, maxfree
- AIO tuning
    - Enable AIO
    - Tune minservers, maxservers and reboot
- DB2 tuning
    - Enable CIO

AIX 6.1
- VMM page replacement tuning
    - NO TUNING REQUIRED
- AIO tuning
    - NO TUNING REQUIRED
- DB2 tuning
    - Enable CIO


Implementation Considerations (Cont'd)


Best Practices
- Do not apply legacy tuning, since some tunables may now be restricted
    - If you do an upgrade install, your old tunings will be preserved
    - You may wish to undo them, but we won't make you
    - This level of tuning has been applied to numerous AIX 5.3 customers through field support
    - We are confident this was a good thing
    - However, we try never to change defaults in the service stream, so AIX 5.3 remains as it was
- Change restricted tunables only if recommended by AIX support


Implementation Considerations (Cont'd)


Problem Determination
- Common problems seen in field or lab
    - Legacy VMM tuning results in error log entries (TUNE_RESTRICTED)
    - Tuning scripts fail due to the required confirmation for permanent changes of restricted tunables
    - Install/tuning scripts fail due to the missing aio0 device
- Diagnostics
    - Check AIX errpt for TUNE_RESTRICTED
    - Check /etc/tunables/lastboot.log
    - PERFPMR


Trademarks
The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.
Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not actively marketed or is not significant within its relevant market. Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.

For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml:


*, AS/400, e business(logo), DBE, ESCO, eServer, FICON, IBM, IBM (logo), iSeries, MVS, OS/390, pSeries, RS/6000, S/30, VM/ESA, VSE/ESA, WebSphere, xSeries, z/OS, zSeries, z/VM, System i, System i5, System p, System p5, System x, System z, System z9, BladeCenter

The following are trademarks or registered trademarks of other companies.


Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
* All other products may be trademarks or registered trademarks of their respective companies. Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply. All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions. This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area. All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. Prices subject to change without notice. 
Contact your IBM representative or Business Partner for the most current pricing in your geography.
