Linux Tuning For ASE and IQ


RUNNING ASE AND IQ ON REDHAT ENTERPRISE LINUX

SPECIAL CONSIDERATIONS AND CONFIGURATION OPTIONS

CHRIS BROWN
TECHNICAL EVANGELIST
SYBASE, INC.
cbrown2@sybase.com

SANJAY RAO
PRINCIPAL SOFTWARE ENGINEER
REDHAT, INC.
srao@redhat.com

Sybase Confidential 8/11/10

Agenda
A few things to remember....
Tuning the Linux Kernel
  Memory
    HUGEPAGES
    NUMA
    swappiness
Tuning I/O parameters
  I/O Elevators
  Filesystems
Q&A
Appendix

A Few Session Notes ....
This is not a generalized ASE performance and tuning session
  Focus is more on Linux OS tunables that directly affect ASE
  No discussion of sp_configure parameters
  Assumption is you have already tuned ASE for your application :)
This is not a generalized Linux tuning session either
  Only discusses those areas that directly affect (improve) ASE performance
Recommendations are from Red Hat benchmarking and Sybase experience
  YMMV; when in doubt, benchmark!


The General Rules Still Apply
What's suggested here won't replace general good practices
  Make sure you have sized the hardware correctly
  Use the right RAID level (don't use RAID 5 except maybe on read-only type databases)
  Make sure you've tuned ASE (caches, memory, indexes, I/O, etc) correctly
None of these suggestions will help if you have slow CPUs or disks
  Or a poorly designed schema, or no indexes, or other general ASE (only) P&T problems
What's here is the boost to add performance to an already tuned database server


Tuning the Linux Kernel



Sample TPC-H Benchmark....

http://www.redhat.com/rhel/resource_center/reference_architecture.html



Kernel Performance/Scalability
RHEL 5's kernel includes many new performance features:
Voluntary preemption patches (in 2.6.13; subset in RHEL 4)
  Reduces scheduling latency (<1ms), enabling improved performance for applications such as audio/video
Lightweight userspace priority inheritance (PI) support for futexes, useful for realtime applications (2.6.18)
  Assists priority inversion handling. Ref: http://lwn.net/Articles/178253/
New 'mutex' locking primitive (2.6.16)
  Similar to spinlocks, but permitted to block
High resolution timers (2.6.16)
  Provide fine resolution and accuracy depending on system configuration and capabilities; used for precise in-kernel timing


Kernel Performance / Scalability
Modular, on-the-fly switchable I/O schedulers (2.6.10)
  Only provided as a boot option in RHEL 4
  Improved algorithms (especially for CFQ)
  Per-queue selectable (previously the choice was system-wide)
Conversion to 4-level page tables (2.6.11, architecture specific)
  Allows x86_64 to increase from 512GB to 128TB of virtual address space
New pipe implementation (2.6.11)
  30-90% performance improvement in pipe bandwidth
  Circular buffer allows more buffering rather than blocking writers
Big Kernel Semaphore turns the Big Kernel Lock into a semaphore
  Reduces latency by breaking up long lock hold times and adding voluntary preemption
X86 SMP alternatives
  Optimizes a single kernel image at runtime for UP or SMP operation
  Ref: http://lwn.net/Articles/164121/

Setting Shared Memory For ASE
The first thing you will need to set is the shared memory size
  kernel.shmmax=<somenumber>
This sets the maximum size, in bytes, of a single shared memory segment allowed by the operating system
Formula is <amount of RAM desired in MB>*1024*1024
  So the value for 768MB of RAM is: 768*1024*1024 = 805306368
Note you must set this before building your first ASE
  Default may be too low and ASE won't build
Make sure this value fits within kernel.shmall, which is counted in pages rather than bytes (shmall * page size must be at least shmmax)
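
The shmmax formula above can be sketched as a small shell helper. This is only an illustration: the function names are made up for this example, and the 4096-byte default page size is an assumption (check it with `getconf PAGE_SIZE`). The second helper converts the byte value into pages, because kernel.shmall is counted in pages.

```shell
#!/bin/sh
# Convert a desired shared memory size in MB to a kernel.shmmax value in bytes.
shmmax_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# kernel.shmall is counted in pages, so convert bytes to pages (rounding up).
# The page size defaults to 4096 here, which is an assumption.
shmall_pages() {
    bytes=$1
    page_size=${2:-4096}
    echo $(( (bytes + page_size - 1) / page_size ))
}

mb=768
bytes=$(shmmax_bytes "$mb")
echo "kernel.shmmax = $bytes"
echo "kernel.shmall = $(shmall_pages "$bytes")"
```

The two printed lines can be placed in /etc/sysctl.conf and loaded with `sysctl -p`; the shmall figure is the minimum needed for one segment of that size, so a real system may need a larger value.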


Understanding Hugepages
2M pages vs 4K standard Linux page size
  Virtual to physical page map is 512 times smaller
  TLB cache can map more memory, resulting in fewer cache misses
Huge pages are pinned
Configuring huge pages (4G memory of huge pages)
  echo 2048 > /proc/sys/vm/nr_hugepages
  vi /etc/sysctl.conf (vm.nr_hugepages = 2048)
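
The value 2048 comes from dividing 4G by the 2M huge page size. A minimal sketch of that arithmetic follows; `hugepages_needed` is a hypothetical helper name, and the script falls back to 2048 kB if the running kernel does not report a Hugepagesize.

```shell
#!/bin/sh
# Number of huge pages needed to back a given amount of memory.
# $1 = memory in MB, $2 = huge page size in kB (defaults to 2048, i.e. 2M pages).
hugepages_needed() {
    mem_mb=$1
    hp_kb=${2:-2048}
    echo $(( (mem_mb * 1024 + hp_kb - 1) / hp_kb ))
}

# Huge page size as reported by the running kernel, if available (read-only).
hp_kb=$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo 2>/dev/null || true)
echo "vm.nr_hugepages = $(hugepages_needed 4096 "${hp_kb:-2048}")"
```

On a system with 2M huge pages this prints the same `vm.nr_hugepages = 2048` line shown on the slide.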


Huge Pages
Sybase Huge Pages Testing on AMD Barcelona RHEL 5.5
OLTP transactional throughput on a Quad Core 4 Socket 2.5Ghz 96G Physical
[Bar chart: OLTP transactional throughput (y-axis 100000 to 180000), hugepages vs. default]


HUGEPAGES and ASE
HUGEPAGES are supported in 64-bit Linux ASE > 15.0.3
Useful for systems with large memory settings
Parameter not directly configurable within ASE
Look for this line in the ASE errorlog:
  Could not allocate memory using Huge Pages. Allocated using regular pages. For better performance, reboot the server after configuring enough Huge Pages.
If you see that, then check /proc/sys/vm/nr_hugepages; it's probably set to 0 (which is the default)
Note that ASE rounds memory to the nearest multiple of 256MB when HUGEPAGES are used
  So make sure that kernel.shmmax will support the increased memory requirement
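
The 256MB rounding and the shmmax check can be sketched as below. Rounding up is an assumption made for this example: the slide says "nearest multiple" but also speaks of an increased memory requirement. Both function names are hypothetical.

```shell
#!/bin/sh
# Round a memory size in MB up to the next multiple of 256 MB.
# Rounding *up* is an assumption; the slide only says "nearest multiple".
round_up_256mb() {
    echo $(( ($1 + 255) / 256 * 256 ))
}

# Succeed if a kernel.shmmax value (bytes) can hold the rounded size.
# $1 = configured memory in MB, $2 = kernel.shmmax in bytes.
shmmax_covers() {
    mem_mb=$1
    shmmax=$2
    need=$(( $(round_up_256mb "$mem_mb") * 1024 * 1024 ))
    [ "$shmmax" -ge "$need" ]
}

echo "1000 MB rounds to $(round_up_256mb 1000) MB"
```

A possible check against the live setting would be `shmmax_covers 1000 "$(cat /proc/sys/kernel/shmmax)" || echo "raise kernel.shmmax"`.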


Understanding Non-Uniform Memory Access (NUMA)
Multi core, multi socket architectures
  NUMA needed for scaling
  RHEL 5 / 6 completely NUMA aware
  KVM guests draw benefits of NUMA
  Additional performance improvements to be gained by enforcing NUMA placement
How to enforce NUMA placement
  numactl: cpu and memory pinning
  taskset: cpu pinning
  libvirt: cpu pinning in libvirt - <vcpu cpuset='0-3'>4</vcpu>
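
As a rough illustration of the numactl and taskset options listed above, the helpers below only build and print the command lines; they do not run them. `--cpunodebind` and `--membind` are numactl options and `-c` is the taskset CPU-list option. The helper names and `./start_ase.sh` are hypothetical placeholders for however the server is actually started.

```shell
#!/bin/sh
# Build (but do not run) a numactl command line that pins a program's
# CPUs and memory to one NUMA node.
numa_pin_cmd() {
    node=$1
    shift
    echo "numactl --cpunodebind=$node --membind=$node $*"
}

# Build a taskset command line that pins CPUs only (-c takes a CPU list).
cpu_pin_cmd() {
    cpus=$1
    shift
    echo "taskset -c $cpus $*"
}

# ./start_ase.sh is a hypothetical startup script, used for illustration.
numa_pin_cmd 0 ./start_ase.sh
cpu_pin_cmd 0-3 ./start_ase.sh
```

`numactl --hardware` lists the nodes and their CPUs, which helps when choosing the node number and CPU list.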


AMD64 NUMA Memory Layout
[Diagram: two layouts for a process on S1C0, across sockets S1 to S4 with cores C0/C1 and their memory banks.
  Interleaved memory: 1 hop to any memory bank
  Non-Interleaved (NUMA)]


CPU Scheduler
Recognizes differences between logical and physical processors
  I.E. Multi-core, hyperthreaded & chips/sockets
Optimizes process scheduling to take advantage of shared on-chip cache, and NUMA memory nodes
Implements multilevel run queues for sockets and cores (as opposed to one run queue per processor or per system)
  Strong CPU affinity avoids task bouncing
Requires system BIOS to report CPU topology correctly
[Diagram: Scheduler Compute Queues; Sockets 0 to 2, each with cores and hardware threads, and processes queued at each level]

Red Hat Confidential
NUMA pinning for performance
Sybase OLTP - RHEL KVM Guests
Effect of NUMA pinning
[Chart: Trans / min (left axis, 0 to 160000) for KVM vs. KVM w/ numactl, with % Diff (right axis, 8.3 to 9.1); guest configurations: 1 Guest 6 cpu 8G mem, 2 Guest 6 cpu 8G mem]


VM swappiness
Controls how aggressively the system reclaims mapped memory:
  Anonymous memory: swapping
  Mapped file pages: writing if dirty and freeing
  System V shared memory: swapping
Decreasing swappiness
  More aggressive reclaiming of unmapped pagecache memory
Increasing swappiness
  More aggressive swapping of mapped memory
Default value is 60
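
A minimal sketch of how a swappiness value might be checked and persisted. `swappiness_line` is a hypothetical helper, and the value 10 is the one compared against the default later in this deck, not a universal recommendation.

```shell
#!/bin/sh
# Print a sysctl.conf line for vm.swappiness after checking the 0-100 range.
swappiness_line() {
    val=$1
    if [ "$val" -ge 0 ] 2>/dev/null && [ "$val" -le 100 ]; then
        echo "vm.swappiness = $val"
    else
        echo "swappiness must be an integer from 0 to 100" >&2
        return 1
    fi
}

# Show the current value (read-only), then the line to persist.
cat /proc/sys/vm/swappiness 2>/dev/null || true
swappiness_line 10
```

The printed line belongs in /etc/sysctl.conf; `sysctl -w vm.swappiness=10` (as root) applies the same value to the running system.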


/proc/sys/vm/swappiness
Sybase server with /proc/sys/vm/swappiness set to 60 (default):

procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
51643644267883544323417888801204044749613022084625342516

Sybase server with /proc/sys/vm/swappiness set to 10:

procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
8302422867243228069600238886377612862002024381326


Other Memory-Related Parameters....
Disable Shared Memory security blocks:
  kernel.exec-shield=0
  kernel.randomize_va_space=0
These work in combination with each other
  Intention is to prevent buffer overflows and other remote attacks / worms
  They can, however, cause issues with ASE
Security settings are good for applications that expect this behavior
  ASE doesn't, and other apps that don't expect this behavior won't run correctly either
Advice is to disable


Tuning I/O Parameters



IO Elevators
Deadline
  Two queues: one for read and one for write
  IOs dispatched based on time spent in queue
CFQ (Completely Fair Queueing)
  This is the default
  Per process queue
  Each process queue gets a fixed time slice (based on process priority to maintain fairness)


IO Elevators (cont'd)
Noop
  FIFO order
  Simple IO merging
  Lowest CPU cost
  Likely best for SANs
How to configure?
  When booting, at the command line (elevator=<deadline | cfq | noop>)
  echo deadline > /sys/block/sda/queue/scheduler


Setting the I/O Elevator for ASE
Which one to use for ASE depends on the type of storage:
  If using a SAN (or other enterprise storage system)
    Use noop
    SAN manages I/O requests, etc so no need for kernel to manage this as well
  If not using SAN (local disks, etc)
    Use deadline
Note that you can change this on a per-device level
  Check /sys/block/<device>/queue/scheduler to see which one is active
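
The scheduler file lists every available elevator and marks the active one with square brackets, for example `noop anticipatory deadline [cfq]`. A small read-only sketch for reporting the active elevator per device follows; `active_elevator` is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the active elevator from the contents of a
# /sys/block/<device>/queue/scheduler file; the kernel marks it with [ ].
active_elevator() {
    echo "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Report the active elevator for every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    echo "$dev: $(active_elevator "$(cat "$f")")"
done
```

Nothing is changed by this loop; switching an elevator still uses the `echo <name> > /sys/block/<device>/queue/scheduler` form shown earlier.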


Sybase OLTP IO Elevators
Sybase IO scheduler testing on AMD Barcelona RHEL 5.5
OLTP transactional throughput on a Quad Core 4 Socket 2.5Ghz 96G Physical
[Bar chart: OLTP transactional throughput (y-axis 100000 to 170000) for DEADLINE, CFQ, and NOOP; bar labels 165055, 163569, 162110]


Tuning File systems
Mount options
  How precise you want date/time - noatime
  Access controls - acl or noacl, NFS
Tune2fs
  Writeback options
  Journal options - -j, blocksize
LUN optimizations
  Blockdev w/ /dev and LVM
  Readahead adjustment
    [root@localhost ~]# blockdev --getra /dev/sda5
    256
    [root@localhost ~]# blockdev --setra 2048 /dev/sda5

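
The readahead values used by `blockdev --getra` and `--setra` are counted in 512-byte sectors, so 256 corresponds to 128 kB and 2048 to 1 MB. A short conversion sketch, with hypothetical helper names:

```shell
#!/bin/sh
# blockdev --getra/--setra count read-ahead in 512-byte sectors.
ra_sectors_to_kb() {
    echo $(( $1 * 512 / 1024 ))
}

ra_kb_to_sectors() {
    echo $(( $1 * 1024 / 512 ))
}

echo "256 sectors  = $(ra_sectors_to_kb 256) kB"
echo "2048 sectors = $(ra_sectors_to_kb 2048) kB"
```

A setting made with `--setra` does not survive a reboot, so it is usually reapplied from a boot script.
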
EXT3, GFS, NFS Iozone w/ DirectIO
RHEL5 Direct_IO IOzone EXT3, GFS, NFS (Geom 1M-4GB, 1k-1m)
[Bar chart: Performance in MB/sec (y-axis 0 to 80) for EXT_DIO, GFS1_DIO, and NFS_DIO across ALL I/O's, Initial Write, Re-Write, Read, Re-Read, Random Read, Random Write, Backward Read, RecRe-Write, Stride Read]


Using DIO with Filesystems
Sybase DIO vs. RAW scaling on AMD Magny-Cours
OLTP transactional throughput on an AMD Magny-Cours RHEL 6
[Bar chart: OLTP transactional throughput (y-axis 100000 to 145000); two bars labeled 1 and 2; bar labels 141306 and 133322]


Tuning The OS for ASE and IO
Not much to do if you are using raw partitions
If you are using filesystem-based devices for ANYTHING ASE-related (tempdb, etc) then check a few things:
  Make the ASE server page size = filesystem block size
  Change mount options for the filesystem as needed
    Ext3: data=writeback, noatime, nodiratime, etc
    Xfs: logbufs=8
  Also, look at inode ratio if using larger devices for ASE
  Check ASE errorlog to make sure that direct I/O is being used on filesystem-based devices
    May have to explicitly enable; some versions of ASE default to DSYNC when using filesystems


Other tuning considerations
Do you need SELinux?
  Modules to enforce access control
  Performance impact rules
CPU speed
  Off
  On
    Performance
    Ondemand
    Powersave


Q&A



APPENDIX

Additional RedHat Linux Topics / Tuning Information


Red Hat RHEL Performance Engineeri
Linux Virtual Memory



Per Node/Zone Paging Dynamics
[Diagram: pages move between the ACTIVE, INACTIVE (Dirty -> Clean), and FREE lists; labeled transitions: User Allocations, Reactivate, Page aging, swapout, pdflush (RHEL4/5), Reclaiming, User deletions]

Memory reclaim Watermarks
Free List
All of RAM
  Do nothing
Pages High: kswapd sleeps above High
  kswapd reclaims memory
Pages Low: kswapd wakes up at Low
  kswapd reclaims memory
Pages Min: all memory allocators reclaim at Min
  user processes/kswapd reclaim memory


/proc/sys/vm/pagecache
Controls when pagecache memory is deactivated
Default is 100%
Lower
  Prevents swapping out anonymous memory
Higher
  Favors pagecache pages
  Disabled at 100%


(Hint) flushing the pagecache
echo 1 > /proc/sys/vm/drop_caches

procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
00224571841078083350196000561136212008317
0022457184107808335019600001039198001000
0022457184107808335019600001021188001000
0022457184107808335019600001035204001000
0022457248107808335019600001008164001000
302242128160176143863600001030197015850
002243610656204344080028361027177032672
0022436106562043440800001026180001000
002243610720212344000080101018300991


(Hint) flushing the slabcache
echo 2 > /proc/sys/vm/drop_caches

[tmp]# cat /proc/meminfo (before)    [tmp]# cat /proc/meminfo (after)
MemTotal:     3907444 kB             MemTotal:     3907444 kB
MemFree:      3104576 kB             MemFree:      3301788 kB
Slab:          415420 kB             Slab:          218208 kB
Hugepagesize:    2048 kB             Hugepagesize:    2048 kB


/proc/sys/vm/dirty_ratio
Absolute limit to percentage of dirty pagecache memory
Default is 40%
Lower value means less dirty pagecache and smaller IO streams
Higher value means more dirty pagecache and larger IO streams


/proc/sys/vm/dirty_background_ratio
Controls when dirty pagecache memory starts getting written
Default is 10%
Set to a lower value...
  pdflush starts earlier
  less dirty pagecache and smaller IO streams
Set to a higher value...
  pdflush starts later
  more dirty pagecache and larger IO streams


dirty_ratio and dirty_background_ratio
pagecache
100% of pagecache RAM dirty
  pdflushd and write()'ng processes write dirty buffers
dirty_ratio (40% of RAM dirty): processes start synchronous writes
  pdflushd writes dirty buffers in background
dirty_background_ratio (10% of RAM dirty): wakeup pdflushd
  do_nothing
0% of pagecache RAM dirty
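
To put the two thresholds in concrete terms, the sketch below works out how much dirty pagecache each percentage represents. It treats the percentage as a share of total RAM, as the diagram does, which is a simplification. `dirty_threshold_mb` is a hypothetical helper, and 98304 MB corresponds to the 96G test systems mentioned in this deck.

```shell
#!/bin/sh
# Amount of dirty pagecache (in MB) at which a percentage threshold trips,
# treating the percentage as a share of total RAM (a simplification).
dirty_threshold_mb() {
    total_mb=$1
    percent=$2
    echo $(( total_mb * percent / 100 ))
}

ram_mb=98304   # 96 GB, used here purely as an example size
echo "background writeback starts near $(dirty_threshold_mb "$ram_mb" 10) MB dirty"
echo "synchronous writes start near $(dirty_threshold_mb "$ram_mb" 40) MB dirty"
```

On large-memory servers these defaults allow many gigabytes of dirty data to accumulate before writes are forced, which is why lower values give smaller IO streams.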


/proc/vmstat
cat /proc/vmstat
nr_anon_pages 98893
nr_mapped 20715
nr_file_pages 120855
nr_slab 23060
nr_page_table_pages 5971
nr_dirty 21
nr_writeback 0
nr_unstable 0
nr_bounce 0
numa_hit 996729666
numa_miss 0
numa_foreign 0
numa_interleave 87657
numa_local 996729666
numa_other 0
pgpgin 2577307
pgpgout 106131928
pswpin 0
pswpout 34
pgalloc_dma 198908
pgalloc_dma32 997707549
pgalloc_normal 0
pgalloc_high 0
pgfree 997909734
pgactivate 1313196
pgdeactivate 470908
pgfault 2971972147
pgmajfault 8047
pgrefill_dma 18338
pgrefill_dma32 1353451
pgrefill_normal 0
pgrefill_high 0
pgsteal_dma 0
pgsteal_dma32 0
pgsteal_normal 0
pgsteal_high 0
pgscan_kswapd_dma 7235
pgscan_kswapd_dma32 417984
pgscan_kswapd_normal 0
pgscan_kswapd_high 0
pgscan_direct_dma 12
pgscan_direct_dma32 1984
pgscan_direct_normal 0
pgscan_direct_high 0
pginodesteal 166
slabs_scanned 1072512
kswapd_steal 410973
kswapd_inodesteal 61305
pageoutrun 7752
allocstall 29
pgrotated 73


Red Hat RHEL Performance Tools



Performance Monitoring Tools
Standard Unix OS tools
  Monitoring - cpu, memory, process, disk
    top, vmstat, ps, iostat, netstat, sar, ksar, etc
  Kernel Tools
    /proc, info (cpu, mem, slab), sysctl, Alt-SysRq
  Networking
    ethtool, ifconfig, ethereal, sysctl's
Profiling
  Tracing - strace, ltrace
  Stap - dprobe, kprobe
3rd party profiling / capacity monitoring
  VTune (Intel), CodeAnalyst (AMD)
  SARcheck, KDE, BMC Patrol, HP Openview, Tivoli, etc


The /proc filesystem
/proc
  meminfo
  slabinfo
  cpuinfo
  <pid>/maps
  vmstat (RHEL4 & RHEL5)
  zoneinfo (RHEL5)
  sysrq-trigger


Red Hat Top Tools
CPU Tools
  1 top
  2 vmstat
  3 ps aux
  4 mpstat -P all
  5 sar -u
  6 iostat
  7 oprofile
  8 gnome-system-monitor
  9 KDE-monitor
  10 /proc
Memory Tools
  1 top
  2 vmstat -s
  3 ps aur
  4 ipcs
  5 sar -r -B -W
  6 free
  7 oprofile
  8 gnome-system-monitor
  9 KDE-monitor
  10 /proc
Process Tools
  1 top
  2 ps -o pmem
  3 gprof
  4 strace, ltrace
  5 sar
Disk Tools
  1 iostat -x
  2 vmstat -D
  3 sar -DEV #
  4 nfsstat
  5 NEED MORE!


Monitoring Tools
mpstat: reveals per cpu stats, Hard/Soft Interrupt usage
vmstat: vm page info, context switch, total ints/s, cpu
netstat: per nic status, errors, statistics at driver level
lspci: list the devices on pci, indepth driver flags
oprofile: system level profiling, kernel/driver code
modinfo: list information about drivers, version, options
sar: collect, report, save system activity information
Many others available: iptraf, wireshark, etc
Sample use for some of these embedded in talk

top - press h - help, 1 - show cpus, m - memory, t - threads, > - column sort
top - 09:01:04 up 8 days, 15:22, 2 users, load average: 1.71, 0.39, 0.12
Tasks: 114 total, 1 running, 113 sleeping, 0 stopped, 0 zombie
Cpu0 : 5.3% us, 2.3% sy, 0.0% ni, 0.0% id, 92.0% wa, 0.0% hi, 0.3% si
Cpu1 : 0.3% us, 0.3% sy, 0.0% ni, 89.7% id, 9.7% wa, 0.0% hi, 0.0% si
Mem: 2053860k total, 2036840k used, 17020k free, 99556k buffers
Swap: 2031608k total, 160k used, 2031448k free, 417720k cached

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
27830 oracle 16  0 1315m 1.2g 1.2g D  1.3 60.9 0:00.09 oracle
27802 oracle 16  0 1315m 1.2g 1.2g D  1.0 61.0 0:00.10 oracle
27811 oracle 16  0 1315m 1.2g 1.2g D  1.0 60.8 0:00.08 oracle
27827 oracle 16  0 1315m 1.2g 1.2g D  1.0 61.0 0:00.11 oracle
27805 oracle 17  0 1315m 1.2g 1.2g D  0.7 61.0 0:00.10 oracle
27828 oracle 15  0 27584 6648 4620 S  0.3  0.3 0:00.17 tpcc.exe
    1 root   16  0  4744  580  480 S  0.0  0.0 0:00.50 init
    2 root   RT  0     0    0    0 S  0.0  0.0 0:00.11 migration/0
    3 root   34 19     0    0    0 S  0.0  0.0 0:00.00 ksoftirqd/0


vmstat (paging vs swapping)
vmstat 10

procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
200548352420052423457600546315251303096
020169784020052439314400057850482108539941221463
300784420052457841090059330589463243144307321842

vmstat 10

procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
200548352420052423457600546315251303096
02016623402005242345760057850482108539941221463
3023567873842005242345761875423745193589463243144307321842


iostat -x of IOzone EXT3 file system
iostat metrics
rates per sec                        sizes and response time
r|w rqm/s  request merged/s          avgrq-sz  average request sz
r|w sec/s  512 byte sectors/s        avgqu-sz  average queue sz
r|w KB/s   Kilobyte/s                await     average wait time ms
r|w /s     operations/s              svctm     ave service time ms

Linux 2.4.21-27.0.2.ELsmp (node1)
avg-cpu: %user %nice %sys %iowait %idle
          0.40  0.00 2.63    0.91 96.06
Device: rrqm/s   wrqm/s r/s    w/s  rsec/s    wsec/s rkB/s    wkB/s avgrq-sz avgqu-sz await svctm %util
sdi     16164.60 0.00   523.40 0.00 133504.00 0.00   66752.00 0.00  255.07   1.00     1.91  1.88  98.40
sdi     17110.10 0.00   553.90 0.00 141312.00 0.00   70656.00 0.00  255.12   0.99     1.80  1.78  98.40
sdi     16153.50 0.00   522.50 0.00 133408.00 0.00   66704.00 0.00  255.33   0.98     1.88  1.86  97.00
sdi     17561.90 0.00   568.10 0.00 145040.00 0.00   72520.00 0.00  255.31   1.01     1.78  1.76  100.00


SAR
[root@localhost redhat]# sar -u 3 3
Linux 2.4.21-20.EL (localhost.localdomain) 05/16/2005

10:32:28 PM CPU %user %nice %system %idle
10:32:31 PM all 0.00  0.00  0.00    100.00
10:32:34 PM all 1.33  0.00  0.33    98.33
10:32:37 PM all 1.34  0.00  0.00    98.66
Average:    all 0.89  0.00  0.11    99.00

[root] sar -n DEV
Linux 2.4.21-20.EL (localhost.localdomain) 03/16/2005

01:10:01 PM IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
01:20:00 PM lo    3.49    3.49    306.16  306.16  0.00    0.00    0.00
01:20:00 PM eth0  3.89    3.53    2395.34 484.70  0.00    0.00    0.00
01:20:00 PM eth1  0.00    0.00    0.00    0.00    0.00    0.00    0.00


Networking tools
Tuning tools
  ethtool: View and change Ethernet card settings
  sysctl: View and set /proc/sys settings
  ifconfig: View and set ethX variables
  setpci: View and set pci bus params for device
  netperf: Can run a bunch of different network tests
  /proc: OS info, place for changing device tunables


Profiling Tools: OProfile
Open source project: http://oprofile.sourceforge.net
Upstream; Red Hat contributes
Originally modeled after DEC Continuous Profiling Infrastructure (DCPI)
  System-wide profiler (both kernel and user code)
  Sample-based profiler with SMP machine support
  Performance monitoring hardware support
  Relatively low overhead, typically <10%
Designed to run for long times
Included in base Red Hat Enterprise Linux product

Events to measure with Oprofile:
Initially time-based samples most useful:
  PPro/PII/PIII/AMD: CPU_CLK_UNHALTED
  P4: GLOBAL_POWER_EVENTS
  IA64: CPU_CYCLES
  TIMER_INT (fall-back profiling mechanism) default
Processor specific performance monitoring hardware can provide additional kinds of sampling
  Many events to choose from
  Branch mispredictions
  Cache misses - TLB misses
  Pipeline stalls/serializing instructions


oprofile built in to RHEL4, 5 and 6
opcontrol - on/off data
  --start  start collection
  --stop   stop collection
  --dump   output to disk
  --event=:name:count
  Example:
    # opcontrol --start
    # /bin/time test1 &
    # sleep 60
opreport - analyze profile
  -r  reverse order sort
  -t [percentage]  threshold to view
  -f /path/filename
  -d  details
opannotate
  -s /path/source
  -a /path/assembly

oprofile opcontrol and opreport cpu_cycles
# vmlinux-2.6.9-prep
CPU: Itanium 2, speed 1300 MHz (estimated)
Counted CPU_CYCLES events (CPU Cycles) with a unit mask of 0x00 (No unit mask) count 100000
samples  %       image name  app name  symbol name
9093689  68.9674 vmlinux     vmlinux   default_idle
969885   7.3557  vmlinux     reread    _spin_unlock_irq
744445   5.6459  vmlinux     reread    _spin_unlock_irqrestore
420103   3.1861  vmlinux     vmlinux   _spin_unlock_irqrestore
146413   1.1104  vmlinux     reread    __blockdev_direct_IO
74918    0.5682  vmlinux     vmlinux   _spin_unlock_irq
65213    0.4946  vmlinux     reread    kmem_cache_alloc
59453    0.4509  vmlinux     vmlinux   dio_bio_complete
58636    0.4447  vmlinux     reread    mempool_alloc
56675    0.4298  scsi_mod.ko reread    scsi_decide_disposition
53965    0.4093  vmlinux     reread    dio_bio_complete
53079    0.4026  vmlinux     reread    bio_check_pages_dirty
53035    0.4022  vmlinux     vmlinux   bio_check_pages_dirty
47430    0.3597  vmlinux     vmlinux   __end_that_request_first
47263    0.3584  vmlinux     reread    get_request
43383    0.3290  vmlinux     reread    __end_that_request_first
40251    0.3053  qla2xxx.ko  reread    qla2xxx_get_port_name
35919    0.2724  scsi_mod.ko reread    __scsi_device_lookup
35564    0.2697  vmlinux     reread    aio_read_evt
32830    0.2490  vmlinux     reread    kmem_cache_free
32738    0.2483  scsi_mod.ko scsi_mod  scsi_remove_host


Profiling Tools: SystemTap
Red Hat, Intel, IBM & Hitachi collaboration
Linux answer to Solaris Dtrace
  Dynamic instrumentation
Tool to take a deep look into a running system:
  Assists in identifying causes of performance problems
  Simplifies building instrumentation
Current snapshots available from: http://sources.redhat.com/systemtap
  Source for presentations/papers
Kernel space tracing today, user space tracing under development
Technology preview status until 5.1
[Diagram: probe script -> parse -> elaborate (using the probe-set library) -> translate to C, compile * -> load module, start probe (probe kernel object) -> extract output, unload -> probe output]
* Solaris Dtrace is interpretive

Profiling Tools: SystemTap
Technology: Kprobes:
  In current 2.6 kernels
    Upstream 2.6.12, backported to RHEL4 kernel
  Kernel instrumentation without recompile/reboot
  Uses software int and trap handler for instrumentation
Debug information:
  Provides map between executable and source code
  Generated as part of RPM builds
  Available at: ftp://ftp.redhat.com
Safety: Instrumentation scripting language:
  No dynamic memory allocation or assembly/C code
  Types and type conversions limited
  Restrict access through pointers
Script compiler checks:
  Infinite loops and recursion
  Invalid variable access

SystemTap: Kernel debugging

Tracepoints were added to RHEL5 kernel
trace_mm_filemap_fault(area->vm_mm,address,page);
trace_mm_anon_userfree(mm,addr,page);
trace_mm_filemap_userunmap(mm,addr,page);
trace_mm_filemap_cow(mm,address,new_page);
trace_mm_anon_cow(mm,address,new_page);
trace_mm_anon_pgin(mm,address,page);
trace_mm_anon_fault(mm,address,page);
trace_mm_page_free(page);
trace_mm_page_allocation(page,zone->free_pages);
trace_mm_pdflush_bgwriteout(_min_pages);
trace_mm_pdflush_kupdate(nr_to_write);
trace_mm_anon_unmap(page,ret==SWAP_SUCCESS);
trace_mm_filemap_unmap(page,ret==SWAP_SUCCESS);
trace_mm_pagereclaim_pgout(page,PageAnon(page));
trace_mm_pagereclaim_free(page,PageAnon(page));
trace_mm_pagereclaim_shrinkinactive_i2a(page);
trace_mm_pagereclaim_shrinkinactive_i2i(page);
trace_mm_pagereclaim_shrinkinactive(nr_reclaimed);
trace_mm_pagereclaim_shrinkactive_a2a(page);
trace_mm_pagereclaim_shrinkactive_a2i(page);
trace_mm_pagereclaim_shrinkactive(pgscanned);
trace_mm_pagereclaim_shrinkzone(nr_reclaimed);
trace_mm_directreclaim_reclaimall(priority);
trace_mm_kswapd_runs(sc.nr_reclaimed);
SystemTap: Kernel debugging
Several custom scripts enable/use tracepoints
(see /usr/local/share/doc/systemtap/examples)
#!/usr/local/bin/stap
global traced_pid
function log_event:long ()
{
  return (!traced_pid || traced_pid == (task_pid(task_current())))
}
probe kernel.trace("mm_pagereclaim_shrinkinactive") {
  if (!log_event()) next
  reclaims[pid()]++
  command[pid()] = execname()
}
// MM kernel tracepoints prolog and epilog routines
probe begin {
  printf("Starting mm tracepoints\n");
  traced_pid = target();
  if (traced_pid) {
    printf("mode Specific Pid, traced pid: %d\n", traced_pid);
  } else {
    printf("mode All Pids\n");
  }
  printf("\n");
}
probe end {
  printf("Terminating mm tracepoints\n");
  printf("Command Pid Direct Activate Deactivate Reclaims Freed\n");
  printf("\n");
  foreach (pid in reclaims)


SystemTap: Kernel debugging
Command Pid Direct Activate Deactivate Reclaims Freed

kswapd05440150376791943715157430730
kswapd15450180678882434712117341408
memory254359975697573083604621115837
mixer_applet2768764180101333981
Xorg749151906283920382
gnometerminal71612103869512320
gnometerminal77015261422457172
cupsd7100192704128


SystemTap: Kernel debugging
Command Pid Alloc Free A_fault A_ufree A_pgin A_cow A_unmap

memory25685284278440644082834840398981614048185
kswapd1545300753257000049884
kswapd054462025241000017568
mixer_applet27687302282700101241
sshd25051227000600
kjournald86320728300002149
Xorg74911698980000310
gnomepowerman76531520001800
avahidaemon7252150128000480160
irqbalance67251263641313180190
bash250531220001300
hald7264890008300
gconfd271638252600680116

Red Hat Performance NDA Required

