
Oracle Performance Firefighting

Craig Shallahamer

OraPub, Inc.

OraPub books are available at special quantity discounts for use as premiums and in sales
promotions, or for use in corporate training programs. For more information, please contact
OraPub at http://www.orapub.com.

Oracle Performance Firefighting


Copyright © 2009, 2010 by Craig Shallahamer
All rights reserved. Absolutely no part of this work may be reproduced or transmitted in any
form or by any means, electronic or mechanical, including photocopying, scanning, recording,
or by any information storage or retrieval system, without prior written permission of the
copyright owner and the publisher.

Please—Out of respect for those involved in the creation of this book and also for their
families, we ask you to respect the copyright both in intent and deed. Thank you.

ISBN-13: 978-0-9841023-0-3
ISBN-10: 0-9841023-0-2

Printed and bound in the United States of America.

Trademarked names may appear in this book. Rather than use a trademark symbol with every
occurrence of a trademarked name, we use the names only in an editorial fashion and to the
benefit of the trademark owner, with no intention of infringement of the trademark.

Fourth Printing: June 2010

Project Manager: Craig Shallahamer
Copy Editor: Marilyn Smith
Cover Design: Lindsay Waltz
Printer: Good Catch Publishing
Technical Reviewers: Kirti Deshpande, Dan Fink, Tim Gorman, Gopal Kandhasamy, Dwayne King, Dean Richards

Distributed to the book trade worldwide by OraPub, Inc. Phone +1.503.636.0228 or visit
http://www.orapub.com.

The information in this book is distributed on an “as is” basis, without warranty. Although
precautions have been taken in the preparation of this work, neither the author nor OraPub,
Inc. shall have any liability to any person or entity with respect to any loss or damage caused
or alleged to be caused directly or indirectly by the information contained in this book.

The most fascinating thing in life is the discovery of God.

About OraPub, Inc.


OraPub is all about Oracle performance management.

In the early 1990s, Craig Shallahamer and his Oracle Consulting colleagues began receiving
requests for their technical papers from around the world. While honored by the requests,
emailing multiple technical papers from a hotel room over a dial-up connection became
overwhelming. The World Wide Web was just getting started, so in 1995 Craig registered
www.orapub.com and invited his colleagues to place their technical papers on his website.

While the initial orapub.com website was about as bare-bones as you could get, soon
thousands and thousands of papers were being downloaded, and orapub.com became one of
the most popular websites for Oracle performance technical papers.

When Craig left Oracle Corporation in 1998, he founded OraPub, Inc. While keeping the same
philosophy of disseminating quality yet free technical papers, he quickly expanded the
website with free tools and additional research papers, and began marketing his consulting
services and classroom training.

In 2008, OraPub celebrated its tenth anniversary. Its focus has remained unchanged, and its
popularity and impact continue. OraPub continues to update and offer its highly acclaimed
courses Oracle Performance Firefighting, Advanced Oracle Performance Analysis, and Oracle
Forecasting & Predictive Analysis. It has also developed a number of strategic partnerships
with consulting organizations, software product vendors, and business-enhancing
organizations. Since its incorporation, OraPub has offered its training in 23 countries on six
continents. Over ten thousand Oracle professionals have attended OraPub’s seminars,
conference presentations, and classroom training offerings.

For more information, please visit http://www.orapub.com.



Contents at a Glance
About the Author

About the Technical Reviewers

CHAPTER 0. Introduction
CHAPTER 1. Methods and Madness
CHAPTER 2. Listening to Oracle’s Pain
CHAPTER 3. Serialization Control
CHAPTER 4. Identifying and Understanding Operating System Contention
CHAPTER 5. Oracle Performance Diagnosis
CHAPTER 6. Oracle Buffer Cache Internals
CHAPTER 7. Oracle Shared Pool Internals
CHAPTER 8. Oracle Redo Management Internals
CHAPTER 9. Oracle Performance Analysis



Contents
About the Author

About the Technical Reviewers

CHAPTER 0. Introduction
Why Buy This Book?
What Is the Value to Me?
Who Will Benefit?
How This Book Is Organized
What Notations Are Used?
What Is Not Covered?
Which Platform and Version?
About the Tools Used In This Book
Want More?
Comments and Questions

CHAPTER 1. Methods and Madness
Firefighting 101
Stoking the Fire
Cooling the Fire, Step by Step
Don’t Panic
Define Your Objectives
Establish the Scope and Get Reinforcements
Get a Baseline of Current Performance
Install Your Tools
Develop a Simple Communications Strategy
Pick the Low-Hanging Fruit
Conduct a Deep Performance Analysis
Get a Final Baseline of Current Performance
Document Your Success
Celebrate!
OraPub 3-Circle Analysis
A Couple 3-Circle Analysis Case Studies
Case Study 1: Quick 3-Circle Analysis
Case Study 2: More Complex 3-Circle Analysis
Your Compelling Story
The Story Components
The Story Development Process

Wait-Event Analysis
Oracle Response-Time Analysis
The Role of the Response-Time Curve
Case Study of Oracle Response-Time Analysis
Red Line, Blue Line
Summary

CHAPTER 2. Listening to Oracle’s Pain
Performance Diagnosis: The Backstory
It’s All About Instrumentation
Oracle Instrumentation
How Oracle Collects Time
The Wait Event Views
System-Level Perspective (v$system_event)
Session-Level Perspective (v$session_event)
Real-Time Session-Level Perspective (v$session_wait)
Oracle Time Classification
Queue Time Classification
Service Time Classification
OraPub’s Response-Time Analysis Reports
Instance-Level ORTA Reporting
Part 1: Workload Metrics
Part 2: Response Time Summary
Part 3: IO Wait Time Summary with Event Details
Part 4: Other Wait Time (Non-IO) Event Details
Part 5: SQL Activity Details During Probe
Part 6: Similar SQL Statements
Part 7: Operating System CPU Utilization
Session-Level ORTA Reporting
Profiling a Single Session
Profiling a Group of Sessions
Summary

CHAPTER 3. Serialization Control
Serialization Is Death
Serialization and Queuing
Everyone Gets Involved
How to Detect and Resolve Contention
Fundamental Protection Requirements
Relational Structure Control
Memory Structure Control
Oracle Latch Specifics
How Multiple Latches Are Implemented
Least Recently Used Lists

Cache Buffer Chains
Oracle’s General Latching Algorithm
Shared or Exclusive?
Immediate or Willing to Wait?
Spinning on a Latch
Sleep Time
Time Accounting
A Real-Life Latching Acquisition Example
Should You Increase the _spin_count Parameter?
How to Detect Significant Latch Contention
Oracle Mutex Specifics
What Is a Mutex?
Benefits of Using Mutexes
Flexible Creation
Reduced False Contention
Control Structure Contention
Faster Pinning
Oracle’s General Mutex Algorithm
How to Detect Mutex Contention
Summary

CHAPTER 4. Identifying and Understanding Operating System Contention
The Four Subsystems
CPU Contention
How to Model a CPU Subsystem
Where CPU Time Is Spent
When Queuing Sets In
Queue Length Strategies
OLTP-Centric Systems
Batch-Centric Systems
Monitoring CPU Activity
Utilization
Run Queue
Memory Pressure
Memory Categories
Real and Virtual Memory
Shared Memory Segments
Process-Related Memory
The General Memory Game
Swap: A Four-Letter Word
Memory Page Scanning
What to Say and What Not to Say About Memory
IO Contention

Load-Balancing Still Helps
Why IO Subsystems Are Expensive
How We Model an IO Subsystem
Even the Best IO Subsystems Queue
How to Detect an IO Bottleneck
Using Oracle’s Wait Interface
Removing Oracle from the Equation
Using Operating System IO Reports
Understanding Your Solution Options
Network Contention
Network Latency
Network Collisions
Dropped Packets
Summary

CHAPTER 5. Oracle Performance Diagnosis
Oracle CPU Consumption and Components
Oracle’s Limited Perspective
Using Instance Statistics
Using the System Time Model
Time Model Superiority
Time Model Time Classification
The Ghost IO Bottleneck
More Than Just an Average
Wait Event Myths
Decreasing Wait Time Always Improves Performance
Decreasing Wait Time Decreases End-to-End Response Time
End-to-End Response Time Defined
Some End-to-End Response Time Realities
The SQL*Net Message from Client Wait Event
Profiling a Session Is Always the Best Approach
Profiling Defined
The Trap
The Solution
Modern Architecture Statistics Collection
Why We Need a Better Collection Facility
Oracle’s Solution: DBMS_MONITOR
It Helps to Change Our Mindset
How to Use DBMS_MONITOR
Criteria Specification: Identify the Session(s) of Interest
Enable Tracing, Statistics Collection, or Both
Wait While the Data Is Being Collected
Query the Appropriate Statistics Collection View
Disable Tracing, Statistics Collection, or Both
Consolidate the Trace Files into a Single File
Tkprof the Trace Files

Perform Your Analysis
A DBMS_MONITOR Example
Active Session History
Why ASH Is a Big Deal
A Demonstration of ASH Capabilities
ASH Data Collection and Architecture
Summary

CHAPTER 6. Oracle Buffer Cache Internals
Big Expectations
What Is a Buffer?
Free Buffers
Dirty Buffers
Pinned Buffers
The Role of Buffer Headers
Cache Buffer Chains
Introduction to Hashing
Hash Functions
Hash Buckets
Hash Chains
CBCs in Action
How to Wreck CBC Performance
Limiting Concurrency by Decreasing Latches
Increasing Chain Scan Time by Decreasing Chains
Increasing Chain Scan Time with Cloned Buffers
CBC Contention Identification and Resolution
Least Recently Used Chains
LRU Chain Changes Over the Years
Standard LRU Algorithm
Modified LRU Algorithm
Oracle’s Touch-Count Algorithm
Midpoint Insertion
Touch Count Incrementation
Buffer Promotion
Hot Region to Cold Region Movement
About Touch Count Changes
LRU Chain Contention Identification and Resolution
The Write List and Database Writer
The Database Writer in Action
Database Writer-Related Contention Identification and Resolution
Free Buffer Waits
Buffer Busy Waits
The Four-Step Diagnosis
Determining If There Is a Parameter Pattern
Identifying the Buffer Type
Determining the Header Block

Implementing the Appropriate Solution Set
Solutions for a Single Busy Table Block
Solutions for Multiple Busy Table Blocks
Solutions for Table Segment Header Blocks
Solutions for Undo Segment Header Blocks
Solutions for Index Leaf Blocks
The Situation
The Solution
The Good News and the Bad News
Enqueue Waits
Diagnosing Enqueue Waits
TX Enqueue Waits
Introduction to Interested Transaction Lists
Undo Segment’s Transaction Table
Deeper into Interested Transaction Lists
Deeper into Buffer Cloning
Summary

CHAPTER 7. Oracle Shared Pool Internals
Problems in the Shared Pool
What’s in the Shared Pool?
The Oracle Cursor
Parent and Child Cursors
Cursor Building
Cursor Searching Introduction
Cursor Pinning and Locking
Library Cache Management
Library Cache Architecture
Library Cache Conceptual Model
Library Cache Object References
Object Hashing
Keeping Cursors in the Cache
Increase the Likelihood of Caching
Force Caching
Private Cursor Caches
Library Cache Latch/Mutex Contention Identification and Resolution
Enable Mutexes
Use Bind Variables to Create Similar SQL
Use Cursor Sharing
Take Advantage of the Hash Structure
Try Mutex-Focused Solutions
Shared Pool Memory Management
From Hashing to Subpools
Memory Allocation and Deallocation
Shared Pool Latch Contention Identification and Resolution
Pin Large and Frequently Used Objects

Flush the Shared Pool
Increase the Number of Subpools
Reduce the Shared Pool Size
4031 Error Resolution
Flush the Shared Pool
Increase the Shared Pool Size
Increase the Shared Pool Reserved Size
Minimize Cursor Pinning Duration
Reduce Kept Objects Memory Consumption
Upgrade to Oracle Database 10g Release 2
In-Memory Undo Management
New Features Bring High Risk
The Problem: Segment Management
Introducing In-Memory Undo
How IMU Works
Traditional Undo Management
IMU Management
A Marked Performance Improvement
IMU Setup and Monitoring
Setting Up IMU
Monitoring IMU
IMU Contention Identification and Resolution
Summary

CHAPTER 8. Oracle Redo Management Internals
Buffer Cache Changes
Just Enough Redo Is Generated
Undo-Related Redo
Query-Related Redo
Redo Log Buffer Architecture and Algorithm
Pre-Oracle9i Release 2 Redo Log Buffer
Post-Oracle9i Release 2 Redo Log Buffer
Redo Flow
Global Temporary Tables
The Need for True Interim Tables
Common Characteristics
Truly Reduced Redo
Log Writer Background Process Triggers
Commit Issued
Commit Write Facility
Database Writer Posting the Log Writer
Buffer Fill
Three-Second Timeout
Redo-Related Performance Issues
Log Buffer Space
Redo Allocation Latch Contention

Redo Copy Latch Contention
Log File Sync Contention
Application-Focused Solutions for Log File Sync Contention
Operating System-Focused Solutions for Log File Sync Contention
Oracle-Focused Solutions for Log File Sync Contention
Log File Parallel Write Contention
Log Writer Write Challenges
Gathering Oracle’s IO Requirements
Application-Focused Solutions for Log File Parallel Write Contention
Operating System-Focused Solutions for Log File Parallel Write Contention
Oracle-Focused Solutions for Log File Parallel Write Contention
Log File Switch Contention
Checkpoint Incomplete
Archive Incomplete
Summary

CHAPTER 9. Oracle Performance Analysis
Deeper into Response-Time Analysis
Oracle Arrival Rates
Utilization
Requirements Defined
Capacity Defined
Calculating Utilization
Oracle Service Time
Oracle Queue Time
Oracle Response Time
The Bridge Between Firefighting and Predictive Analysis
Total Time and Time Per Workload
CPU Service and Queue Time
IO Service and Queue Time
Oracle Response Time in Reality
Response-Time Graph Construction
Selecting the Unit of Work
Choosing the Level of Abstraction
The Five-Step Response-Time Graph Creation Process
Know the System Bottleneck
Pick an Appropriate Unit of Work
Determine the Service Time and Queue Time
If Possible, Compare Utilizations
Create the Response-Time Graph
A Response-Time Curve for an IO-Bottlenecked System
Know the System Bottleneck
Pick an Appropriate Unit of Work
Determine the Service Time and Queue Time
If Possible, Compare Utilizations

Create the Response-Time Graph
How to Improve the Performance Situation
Tuning: Reducing Requirements
Buying: Increasing Capacity
Balance: Managing Workload
Anticipating Solution Impact
Simplification Is the Key to Understanding
A Word of Caution
Full Performance Analysis: Cycle 1
Oracle Analysis
Operating System Analysis
Application Subsystem
Response-Time Graphs
What-If Analysis
Full Performance Analysis: Cycle 2
Oracle Analysis
Operating System Analysis
Application Analysis
What-If Analysis
Full Performance Analysis: Cycle 3
Oracle Analysis
Operating System Analysis
Application Analysis
Full Performance Analysis: Summary
Improper Index Impact Analysis Performed
Proper Use of Work and Time
Batch Process-Focused Performance Analysis
Setting Up the Analysis
Capturing Resource Consumption
Including Parallelism in the Analysis
Operating in the Elbow of the Curve
Summary and Next Steps

About the Author


Craig Shallahamer is a researcher, writer, and teacher for thousands of Oracle professionals
on six continents in 23 countries. His expertise is Oracle performance management, a field in
which he has published more than 24 technical papers, authored the book Forecasting Oracle
Performance, and technically reviewed Oracle Performance Tuning 101, Oracle Wait
Interface: A Practical Guide to Performance Diagnostics and Tuning, and others. Craig is the
creator of the popular Oracle Performance Firefighting and the revolutionary Advanced
Oracle Performance Analysis and Oracle Forecasting & Predictive Analysis courses. After
nine years with Oracle Corporation, during which he co-founded the System Performance and
Core Technologies groups, he left to start OraPub, Inc. Craig is a passionate educator, a
pinnacle of Oracle knowledge, and an engaging instructor. His entertaining presentation style
combines with his depth of Oracle experience to make each presentation unique and practical.
When Craig is not working on maximizing IT efficiency, he is fly fishing, backpacking,
playing acoustic guitar, beekeeping, roasting coffee, riding his Honda VFR Interceptor, or just
relaxing around a fire.

About the Technical Reviewers


Dean Richards has over 20 years of experience in Oracle implementations, database
architecture, and performance consulting. Before coming to Confio Software, Dean held
engineering positions at McDonnell Douglas and Daugherty Systems, and was also a
technical director for Oracle Corporation consulting, managing all aspects of key accounts,
including technical planning and strategic alliances.
While at Confio Software, Dean has been a key contributor to Confio's success, playing
important roles in both product development and technical sales support of Confio's Ignite
database performance management solution. As a highly successful liaison between
management and technical staff, Dean has proven to be an effective collaborator
implementing cutting-edge solutions. Dean's ability to explain very intricate and advanced
technical database information simply and concisely to prospective and current customers has
made him an extremely valuable asset.

Daniel Fink has been working with Oracle since 1995, starting as a developer/DBA on
Oracle 7.1 Parallel Server on OpenVMS, then moving to database administration. Currently
working as a consultant, he focuses on diagnosis, optimization, and data recovery. He is also
a highly regarded trainer and presenter, speaking at user group conferences in the United
States and Europe. When not working with technology, he enjoys the mountains of Colorado
on foot, on skis, or from the seat of a bicycle, and spending time with his favorite girls,
Beverly and Aggie.

Tim Gorman has worked with relational databases since 1984, as an Oracle application
developer since 1990, and as an Oracle DBA since a dark and stormy night in 1993. He is an
independent consultant (http://www.EvDBT.com) specializing in performance tuning,
database administration (particularly availability), PL/SQL development, and data
warehousing. He has been an active member of the Rocky Mountain Oracle Users Group
(http://www.rmoug.org) since 1992 and has presented at Oracle Open World, Collaborate,
UKOUG, Miracle Database Forum and Master Classes, as well as at local Oracle users
groups in North America and the Caribbean. He has also taught his Scaling to Infinity:
Partitioning Data Warehouses on Oracle class onsite for customers and through TruTek,
Speak-Tech, and Oracle University.

K Gopalakrishnan (Gopal) is a strategic deployment specialist with the Enterprise
Solutions Group at Oracle Corporation, specializing exclusively in performance tuning,
maximum availability architecture, and scalable system architectures. He is a recognized
expert in Oracle RAC and database internals, and has used his extensive experience to solve
many vexing performance issues all across the world for telecom giants, financial
institutions, and e-commerce applications. Gopal has worked on a few of the biggest and
busiest databases on the planet.
Gopal co-authored the award-winning Oracle Press book Oracle Wait Interface: A
Practical Guide to Performance Diagnostics and Tuning. He is also the author of another
best-selling Oracle Press book, Oracle Database 10g Real Application Clusters Handbook.
He was awarded Oracle Author of the Year by Oracle Magazine in 2005 and was chosen as
an Oracle ACE by the Oracle Technology Network in 2006. Gopal has more than a decade of
Oracle experience, backed by an engineering degree in Computer Science and Engineering
from the University of Madras, India.

Kirtikumar Deshpande works for Oracle Corporation as a Senior Principal Consultant
in its Advanced Technology Solutions group. He has over 28 years of IT experience,
including over 14 years as an Oracle DBA. He co-authored the Oracle Press books Oracle
Performance Tuning 101, published in May 2001, and Oracle Wait Interface: A Practical
Guide to Performance Diagnostics & Tuning, published in June 2004. Oracle Magazine
selected him as the Oracle Author of the Year 2005 for the latter. He has presented a number
of papers at various Oracle user group conferences in the USA and abroad.

Dwayne King has been an Oracle professional since the early 1990s, when he started
designing and implementing systems using Oracle 6 and Oracle Forms. After developing his
skills on several large-scale projects, he eventually started an Oracle consulting practice
(http://www.kridan-consulting.com), and for the last 15 years has been assisting his clients
with database design, DBA mentoring, performance tuning, and PL/SQL development and
training.
His clients have included public sector, private sector, and international organizations in
Europe and North America. He has worked on database systems in the telecom, healthcare,
government research, and security sectors, and has been privileged to work with several
authors on some great books related to Oracle technology.
When not working, he enjoys spending time with his wife Jennifer and son Noah. His
hobbies include participating in open-source initiatives, scuba diving, and more recently,
riding his motorcycle.
He can be reached at dwayne.king@KRIDAN-consulting.com.

CHAPTER 0
Introduction

Oracle performance firefighting is not for everyone. It takes a strange breed to enjoy being
faced with shouting, scared, crying, angry, jealous, and happy people, all while solving a
difficult technical problem. I, and others like me, get a strange rush of adrenaline when
plunging wholeheartedly into these situations.
To operate effectively in firefighting conditions takes a unique combination of method,
perfect diagnosis, deep Oracle internals knowledge, the ability to communicate the complex
in simple terms, and confidence in each of these areas. This book is squarely focused on
helping you develop these characteristics and skills.
This book has been a long time in coming! For years, my colleagues, students, partners,
and friends have asked me to take my courseware, technical papers, stories, and ways of
explaining things, and turn them into a book. But it’s a massive undertaking that I’ve put off.
I finally relented, and here is the result.
I’ve spent most of my professional life either teaching others how to fight Oracle
performance fires (and even better, avoid them in the first place) or doing the work myself. I
still remember my very first assignment. While booking my flight to Round Rock, Texas, I
was scared to death, but excited all the same. I was wondering what the client would be like.
And I was enjoying getting to know my new colleague and friend for the first time.1 Since
then, I’ve seen grown men and women cry, watched a chief financial officer grab an Oracle
salesman by the tie and drag him across the room, visited nearly every continent and 23
countries, eaten things (food is hardly the right word) I won’t mention, stayed in hotels fit for
kings and the homeless, and enjoyed conversations with people all over the world.

1. Few people know that Cary Millsap and I grew up at Oracle together. We started and also ended our careers
at Oracle Corporation within a few months of each other. I tell people that I cofounded both Oracle’s Core
Technologies and the System Performance Groups. What I usually don’t tell them is that one of the other cofounders
is Cary. Those years at Oracle were filled with some of the best memories of my life.

Why Buy This Book?


This book details a proven, integrated, and cooperative (some would say holistic) approach to
solving complex Oracle database performance problems. We all tend to slant our analysis
toward the Oracle internals, SQL tuning, or operating system side. It’s just who we are. But
by performing our analysis from multiple perspectives, followed by integration, we get an
unbreakable diagnosis that is spot-on. The first part of this book is aimed at performance
analysis methods that will naturally foster a comprehensive performance perspective.
This book first builds a framework, so I can cram as much useful information into your
brain as possible. I can see it in my students’ eyes: If I don’t take the time to create a
framework, followed by a proven method, and then fill in the missing technical parts, they are
quickly overwhelmed. I’ve followed this approach in this book. If you look at the chapter
titles, you’ll notice I develop a framework for performance analysis focused on Oracle
performance firefighting, then dig into spot-on diagnosis, and then fill in the missing parts
with deep yet relevant Oracle internals.
But this book goes beyond method, diagnosis, and Oracle internals. It also prepares you
to anticipate the effects of your proposed solutions, both graphically and numerically. So
when people ask you what kind of improvement you expect to achieve, you don’t lie, guess,
stare at them, or ask them to rephrase the question. You can answer them with confidence and
conviction.
This book is based on years of field-validated research and hands-on production Oracle
systems analysis. While laptop tests are valuable, they lack the risk and suspense of working
on real systems. Everything in this book has been tested to the best of my ability. Not only
have I validated this book’s contents, but the thousands of DBAs I have worked with have
been using these methods (and some of the tools) for nearly 20 years now. I think that’s
proof enough.
But keep in mind that I will use abstraction when details are not necessary. While
useless details make great cocktail party conversation, they don’t help much with production
Oracle systems.
This book is for you, not me. I did not write this book to impress myself. My ego is
secure. I wrote this book to help me do what I love doing, and that’s teaching. Honestly, there
is a financial component involved, but ask any author, and he will tell you the payoff is not in
the book royalties. I hope you quickly realize I am trying to teach (with some entertainment
mixed in), not preach, and give you clear evidence of the usefulness of the methods and
techniques I recommend.


What Is the Value to Me?


Most DBAs and developers do a lot of guessing in their performance work. This book
demonstrates how to remove the guessing, which helps you develop truth-based confidence.
That’s immensely valuable!
Contained in this book are the methods and techniques to give you the confidence to do
the following:

• Know where to start your analysis.
• Know what to focus on and what to confidently ignore.
• Know what Oracle is saying to you. (Yes, Oracle does speak to me.)
• Know how to make the situation better, for sure.
• Know how to build a case and convince others of your recommendations.
• Know how to quantify the benefits of your tuning options.
• Be a better performance firefighter.
If you have a deep Oracle internals background, applying the techniques covered in the
first few chapters will dramatically improve the speed and accuracy of your analysis, and
will also provide you with a way to naturally explain to others the problem and the possible
solutions. As most senior Oracle professionals know, given enough time and proper training,
we can learn the technical internals, but communicating complex technical topics in a
confident and convincing manner is much more difficult. This book will help every Oracle
performance firefighter to better communicate complex technological subjects.
Keeping up with Oracle technology can be daunting: It is constantly changing.2 But I’ve
found that I’m able to quickly assimilate new information. It’s clearly not because I’m smarter
than others. I think it’s because I have a framework and a method to cram all this new
information into my tiny brain. As a side benefit, having a solid framework also makes it easy
to disprove the multitude of bogus myths that surround performance firefighting. You’ll
definitely see the value of my solid and methodical framework.
Finally, this book is about equipping. Without a doubt, you will be equipped to deal with
the realities of diagnosis, to come up with practical and effective solutions, and to work with
others to prioritize and implement those solutions in complex Oracle systems.

Who Will Benefit?


Most readers will be DBAs and developers. But you may be one of those IT professionals
who specialize (or want to specialize) in performance management, optimization, tuning, or
whatever you want to call it. This book will benefit you the most.

2. It’s a very good thing that Oracle keeps improving its technology. My students and clients frequently ask, “Will I be replaced by Oracle’s automated features?” My answer is that if Oracle stops developing its technology then, yes, you will be replaced! Because with no innovation, the supporting automation will catch up. So if you keep on the edge, you’ll stay in front of automation.


If you are in some way responsible for improving and maintaining any Oracle system,
then you will benefit from the information in this book. This means those of you who fit one
of these descriptions:

• Are responsible for making performance better
• Want to be responsible for making performance better
• Want to understand what those who are responsible for making performance better
are doing
• Want your career to take on a new life (perhaps a slight exaggeration, but not much)

How This Book Is Organized


This book is organized into three basic parts. We’ll start by delving into a series of field-
proven methods that give us a dependable and structured approach—which also naturally
leads to being able to explain our complex analysis in a way that even managers can
understand! Second, I’ll infuse you with spot-on diagnosis techniques so you can listen to
Oracle as it tells you where it hurts; that is, where the pain—the contention and the
performance-limiting issues—reside. I’ll then venture deep into relevant Oracle internals, so
you have the knowledge you need to arrive at correct and appropriate solutions to remove
Oracle’s pain. And finally, I’ll teach you how to anticipate the impact of your proposed
solutions.
Here is a quick summary of each chapter:
• Chapter 1, Methods and Madness: This chapter is about how to approach
firefighting from a methodical and holistic perspective. It builds the framework upon
which this entire book is based, setting the stage for a massive infusion of diagnostic
details and Oracle internals.
• Chapter 2, Listening to Oracle’s Pain: Here, you learn how to diagnose complex
Oracle systems with pinpoint accuracy. The material is based on Oracle's kernel
instrumentation and response-time analysis using the latest Oracle capabilities.
• Chapter 3, Serialization Control: Locks and latches are a mystery to many,
including the most seasoned Oracle professionals. And with Oracle Database 10g
Release 2, Oracle introduced mutexes! Most DBAs feel there is little you can do
when latching contention raises its ugly head, but this is not true. I have found that
once you understand the general latch and mutex algorithms, and are equipped with
knowledge of Oracle's memory architecture and algorithm internals, latch/mutex
contention solutions become very apparent.
• Chapter 4, Identifying and Understanding Operating System Contention: This
chapter focuses on each subsystem (I/O, memory, CPU, and network), teaching you
how to quantify both throughput and contention (entire system and top processes)
using standard Unix/Linux tools. If you want to effectively fight performance fires,
you must be able to quickly find the operating system bottleneck and relate that to
Oracle.


• Chapter 5, Oracle Performance Diagnosis: This chapter rounds out and digs very
deep into, among other things, gathering performance statistics, avoiding myths and
pitfalls, dealing with the realities of modern Oracle architectures, and taking
advantage of advanced Oracle data collection capabilities. It also introduces Oracle’s
built-in kernel-level data collector, known as active session history, or ASH for
short.
• Chapter 6, Oracle Buffer Cache Internals: Oracle’s buffer cache operation must
be smooth, contention-free, and speedy for optimal performance. There are a large
number of Oracle wait events surrounding the buffer cache. The buffer cache is an
amazing Oracle technical accomplishment, and it’s our job to ensure it is running
optimally. I will go deep into the key algorithms, including how to diagnose real
problems and resolve them.
• Chapter 7, Oracle Shared Pool Internals: Oracle’s shared pool is extremely
complex and can have significant performance problems. The shared pool contains a
large number of different memory structures, each with unique characteristics that
must relate to potentially many other shared pool structures. In this chapter, I take a
divide-and-conquer approach to ensure the problem is correctly diagnosed, and the
related structures and algorithms are understood, so that the solutions naturally
present themselves (really).
• Chapter 8, Oracle Redo Management Internals: Surprisingly, redo-related
contention identification is straightforward, but finding solutions can be frustratingly
difficult. We’ll take this on by looking at the core algorithms, understanding
precisely what the problem is, and knowing how to address it. As you’ll find, some
of the solutions are quite creative.
• Chapter 9, Oracle Performance Analysis: This chapter integrates all of the topics
presented in this book. Having a deeper understanding of the problem gives you the
ability to shift a reactive performance situation into a forward-looking solution.
Personally, I think this is the most fascinating chapter, because it shows how to
quantify the impact of your various optimization solutions both numerically and
graphically (by creating response-time graphs for your system).

What Notations Are Used?


Table 1 shows the notation used in this book for numbers. Decimal, scientific, and floating-
point notation are used where appropriate.

Table 1. Numeric notations used in this book

Decimal      Scientific    Floating Point
1,000,000    1.0x10^6      1.000E+06
0.000001     1.0x10^-6     1.000E-06


Units of time are abbreviated as shown in Table 2.

Table 2. Abbreviations for units of time

Abbreviation   Unit          Equivalent
hr             Hour          3600 sec
min            Minute        60 sec
s, sec         Second        1.0 sec
ms             Millisecond   1.0x10^-3 sec
µs             Microsecond   1.0x10^-6 sec
ns             Nanosecond    1.0x10^-9 sec

The units of digital (binary) capacity are as shown in Table 3. The corresponding units
of throughput are, for example, KB/s, MB/s, GB/s, and TB/s.

Table 3. Abbreviations for units of binary capacity

Symbol   Unit       Decimal Equivalent        Power of 2 Equivalent
KB       Kilobyte   1,024 bytes               2^10 bytes
MB       Megabyte   1,048,576 bytes           2^20 bytes
GB       Gigabyte   1,073,741,824 bytes       2^30 bytes
TB       Terabyte   1.0995116x10^12 bytes     2^40 bytes


Table 4 gives a list of symbols used in this book.3

Table 4. Mathematical symbols

Symbol   Represents                    Example
trx      Transaction                   10 trx
λ        Arrival rate                  10 trx/sec
St       Service time                  1 sec/trx
Qt       Queue time or wait time       0.120 sec/trx
Rt       Response time                 1.12 sec/trx
Q        Queue length                  0.24 trx
U        Utilization or percent busy   0.833 or 83.3%
M        Number of servers             12 CPU cores or 450 IO devices
m        Number of servers per queue   12 CPU cores or 1 for IO subsystems
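
To see how these symbols work together, note that the example column above is internally consistent with the standard queuing relationships (a quick sanity check, not a derivation):

    Rt = St + Qt = 1.00 + 0.12 = 1.12 sec/trx
    U  = (λ x St) / M = (10 trx/sec x 1 sec/trx) / 12 servers = 0.833, or 83.3%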

What Is Not Covered?


This book is focused on the subject of Oracle performance firefighting. But some
performance firefighting skills require a separate book or technical paper to adequately cover
the topic.
SQL tuning is a unique and highly prized skill. We all have our own SQL tuning guru
we turn to when we need help.4 In this book, I will show you how to diagnose very precisely
and with full confidence which SQL statements need our attention. However, I will not cover
how to tune the SQL. For that information, you can refer to any of the many resources
devoted to SQL tuning.
Oracle forecasting/predictive analysis is my other area of expertise. In fact, I wrote the
first book on the subject, entitled Forecasting Oracle Performance. But this topic is clearly
not the focus of this book. However, I will take some key concepts from forecasting, creating
a bridge between firefighting and forecasting. This bridge will allow you to naturally switch
between these subject areas and also help in communicating the performance issues and
possible solutions.
New features and functions are a lot of fun, and they are important. But they are no
substitute for pinpoint diagnosis, a solid methodology, and an understanding of Oracle’s core
algorithms. I find classes covering new features and functions very useful for coming up with
creative solutions to a well-diagnosed and defined problem. This book is not focused on new
features and functions in the traditional sense. Only when a change in Oracle’s kernel affects

3. For most symbols, there is no formal or universally accepted representation. This makes relating formulas from one book to another sometimes very confusing. I have tried to keep the symbols straightforward, in line with the standard when there appears to be a standard, and consistent throughout the book.
4. For SQL tuning help, I turn to Mr. Mark Gury of Mark Gury & Associates (MGA), based in Melbourne, Australia. Mark makes his living, in large part, by tuning the biggest and baddest SQL statements I have ever seen.


the subject matter of this book will it be mentioned. Otherwise, this book would become very
Oracle version-specific, joining the many other massive yet relatively limited-value Oracle
books on the market today.
Interesting yet not useful Oracle internals are simply not covered. Just how deep do we
go into Oracle internals? This is a question I grapple with daily. Everyone involved with
Oracle, from the kernel architect to the businessperson, abstracts Oracle internals in some
way. Since each of us has limited time and specific responsibilities, it behooves us to go only
as deep as we must and abstract wherever we can. So, I only go as deep into the internals as
necessary to clearly diagnose the problem, arrive at solutions, and communicate my work to
others (both the technical and the nontechnical). This means I miss out on certain cocktail
party conversations, but I can cover a much larger breadth of technology and get the job done.
One of the frustrating parts of my work is when people (and sometimes students) want
nothing but Oracle internals knowledge. Without exception, they are the ones who, when
faced with a case study or a real Oracle system, stare at the screen wondering where to start,
spend time on nonproductive analysis, and have extreme difficulty communicating to others
what they are doing and why they are doing it. Therefore, in this book, I will take the liberty
to abstract where I feel it is best for real-life production Oracle database system performance
analysis.

Which Platform and Version?


In this book, references to methodology, performance diagnosis, and communication apply to
all Oracle versions since Oracle 6. They are not affected by Oracle version changes. This is
great news, because it allows a consistent and proven approach that spans time. And we’ll use
that to our advantage!
Core Oracle internal algorithms are covered up until the printing of this book. But don’t
let that stop you from enjoying and gleaning what this book has to offer. Most of the core
Oracle algorithms that I present are so central to Oracle’s inner workings that their relation to
real-life production Oracle system performance analysis remains constant. Sure, you can find
exceptions, but with 99.9% of Oracle performance issues, this is not the case. This book is
focused on real systems, not laptop test cases designed just to make a point.
When the specific Oracle version affects the algorithms I present, I’ll explain the
differences among versions and how they affect our performance work. I think you’ll find it
very refreshing that to be a stellar performance analyst, you don’t need to know every Oracle
algorithm in every Oracle release.
Regarding operating systems, unless otherwise stated, references are based on Linux.
However, among the various operating systems, when it comes to determining the operating
system bottleneck (not tuning) and relating that to Oracle activity, the concepts, words we use,
and our objective are similar. The data collection tool may be different, but the specific pieces
of information that we want to gather are the same. Most people are surprised that the
concepts can be applied to any operating system. The key is knowing what to look for and
then finding the tool to get that information.


About the Tools Used In This Book


All tools related to Oracle data collection used in this book are from the OraPub System
Monitor, or OSM for short. I created this tool kit and use it on-site during my consulting
engagements. You can download the complete tool kit (for free) from
http://www.orapub.com. There are also a number of other spreadsheets and Perl scripts I
reference. Unless otherwise specifically mentioned, all the tools are available for free from
OraPub’s web site.
All the operating system tools used in this book are standard Unix/Linux tools, such as
vmstat, sar, iostat, and top. If you understand how these tools work, you’ll know
what to look for when using your platform-specific tool.

Want More?
If you like what you are reading, then consider attending any of my presentations, seminars,
and courses. I have the opportunity to literally travel the world, teaching and consulting. Here
is how most people glean from my work:
• Training: I absolutely love to teach. My master schedule can always be found at my
company’s web site, http://www.orapub.com. If you and your colleagues would like
a customized, private, on-site course, you can always contact me through OraPub. I
offer training in both performance firefighting and predictive analysis areas.
• Consulting: If I can fit it into my schedule, I’ll do it. And I’ll travel to all but three
countries to meet you on site. As you may have guessed, my areas of expertise are
performance firefighting and predictive analysis. Consulting in the predictive
analysis area tends to focus on server consolidation, hardware change analysis, and
workload change impact analysis. I help organizations set up a predictive analysis
capability.
• Publications: I have written many technical papers. Most of them are still relevant,
as I tend to focus on time-enduring topics. The most recent version of all my
technical papers can be found on the OraPub web site.
• Tools: At the time of this writing, all my tools can be downloaded for free. This
includes SQL, operating system, and spreadsheet tools. Enjoy!

Comments and Questions


I have tested and verified the information in this book to the best of my ability, but you may
find that features have changed or that I have made mistakes. If so, please notify me by email.
My email will be posted on this book’s web page located at the OraPub web site
(http://www.orapub.com).



CHAPTER 1

Methods and Madness

Firefighting situations are unique in that they contain a number of elements, which together
bring fear and panic to nearly everyone involved. What you need to excel in this environment
is confidence to methodically move forward step by step. Even though people are out of
control around you and you’re being peppered with distractions, if you have a proven method,
you’ll know exactly what to do. This book provides the knowledge and methods you need to
be successful in a firefighting situation.
If you scan this book, you will notice a few recurring themes, such as OraPub’s 3-circle
analysis method, wait-event analysis, response-time analysis, and a solid understanding of
Oracle internals. The beauty of these is that each supports the other during diagnosis, analysis,
and communication, leading to building an extremely strong solution case and breeding
cooperation.
This chapter starts you off by identifying the keys to thriving in the chaotic firefighting
environment, and then introduces the main methods covered in this book. This chapter is
about creating the initial framework to prepare you for the eventual firefight. Once the
framework is complete, you’ll be able to quickly absorb a tremendous amount of technical
knowledge, which when methodically applied, yields extremely strong solutions.


Firefighting 101
Most people around you will be frightened. They will be fighting fires with their right hand,
while their left hand is updating their resume on Monster.com. They will be frantically trying
to figure out which SQL and programs are consuming a lot of resources, what Oracle
processes are waiting for, what users are experiencing, and the CPU consumption—all while
people are standing behind them asking for constant updates. What you need is a superior
method you trust. There is no way you can or would want to look at every statistic. It’s just
not possible, and the good news is that it’s completely unnecessary. With a proven method,
you can take one step forward at a time by gathering the minimum amount of necessary
information.

Stoking the Fire


Over the years, I’ve noticed a pattern in the characteristics of firefighting situations. I think we
can learn a lot about prevention if we think about the common elements that help stoke1 the
fire:
• The critical production system is involved. This one is kind of obvious because, if
the situation were not critical and a production system were not involved, then it
wouldn’t be classified as firefighting. But it’s a common element.
• Technical promises are made by nontechnical people. When a performance
situation escalates, it’s common for management to get involved. While this can
bring calm to the situation, if management desires peace at any price, it can lead to
promises that are technically unfounded and put unnecessary stress on the technical
team. You need to be diplomatic in these situations. Never agree to something you
know is false and cannot be realistically accomplished. In the end, you could be held
accountable simply for agreeing.
• The timing is unfortunate. How about Friday afternoon just before you are about to
leave on a holiday—sound familiar? It has happened to everyone.
• Significant change occurred. It may seem obvious while reading a book, but when
you’re in the battle, changes like an operating system patch, an Oracle database
patch, an application patch, or even an additional module or application can escape
everyone’s attention. Think about what could have possibly changed, because it will
help you understand what you are technically observing.
• Surprise! Change is fine as long as we anticipate it. Even unanticipated change is
OK if we allow room for the unexpected. But what really causes problems is a
complete surprise that no one was expecting. That punch-in-your-gut surprise when
you first arrive in the morning can be enough to throw everyone into a tizzy. If there
is no surprise, people have enough time to prepare and respond to the situation,
which is then less likely to become an intense firefighting situation.

1. Stoke is a cool word. Perfecting my surfing skills during my university years on the central California coast, I became quite a practitioner of the word stoke, as in “I’m so stoked!” or “That wave was stoken!” The more traditional use of the word, and as I’ve used it in this case, is to increase the intensity of a fire or some combustible occurrence, as in “I will stoke the fire.”


• Key business processes are slow. When business processes are slow, people get
irritated, whine, and complain. But when the process directly affects the business,
managers become concerned and feel they must step in. For example, if your
business cannot recognize revenue until the product ships, the product shipping
business flow must be smooth and uninterrupted. Problems will directly affect
revenue, the stock price, people’s bonuses and paychecks, and employees’ jobs. So
when people are complaining, ask yourself if their activity is key to the successful
operation of the company. If so, be sensitive and careful about how you proceed. If
not, delicately turn your focus elsewhere.
• Applications are generally slow. If the situation is intense, usually the root cause is
harsh enough to affect nearly all application areas on the system. So not only do you
have people screaming about specific key business processes, but there is also a
grumbling application user chorus. This is when you must prioritize and
communicate to people what you are doing and why.
• Extreme pressure exists. In a firefighting situation, it is not uncommon for jobs to
be on the line. When this occurs, people freak out. Their fear turns to anxiety, and
they panic. People express this in different ways. I’ve seen people break down and
cry. I’ve seen a distraught chief information officer (CIO) physically grab another
person and drag him across the room. People become unstable. You need a way to
methodically move forward to avoid becoming trapped.
• Olympic-level finger-pointing occurs. Oracle database administrators (DBAs) are
not trained like salespeople. Salespeople are amazing at deflecting responsibility. If a
salesperson tells you the problem is with a module, you may say something like, “I
can look into that.” And to your surprise, the salesperson’s response is, “Oh, so it’s
your fault! We all hope you can fix it.” See how easily the situation turned from your
willingness to help to being your fault? This is what I’m talking about. Try to clarify
what you say, so the expectations of the company do not rest on your shoulders.

Cooling the Fire, Step by Step


Every time I encounter a firefighting situation, I follow a consistent series of steps. Not only
does this list guide my actions, but it also lets others know my plan. If people know you have
a plan and understand what that plan is, they are more likely to leave you alone. They will
have more trust in you, which is important in these situations.

Don’t Panic
Never panic. You are a highly paid professional! Numerous DBA jobs are available,
especially if you are willing to visit other places in the world. So, even if you completely
screw up, there is a very good chance you can get another job. And you may be paid more! So
relax and enjoy one of the unique and challenging aspects of our line of work. I tell a lot of
stories when I teach, and the best stories come from the most intense firefighting situations
I’ve encountered.

Define Your Objectives


Make sure you and everyone else involved agree on what you are trying to accomplish. Get
it down in writing. I have been to countless companies where the DBAs were trying to solve


an undefined or shifting problem. Eventually, they became so frustrated that some of them
quit their jobs. Usually, this is a management problem, but you need to take control of the
situation and clearly outline what you intend to do, thereby also identifying what success
means. Don’t settle for a moving target, because you’ll never win!

Establish the Scope and Get Reinforcements


The ancient Spaniards had a saying that goes, “He was a brave man that day.”2 In other words,
you may be awesome, but you’re not awesome every day! So get other people involved, just
in case you can’t do it all.
I take a pragmatic view regarding people who say they can do it all: either they are lying
or they have never been involved in an intense firefighting situation. Either way, you don’t
want to work with them.
Figure out what skills are needed and get them. This is not always so simple, because
sometimes the skills are diverse and not available in your group (and perhaps, they should not
be available in your group). For example, the typical DBA group will not have operating
system tuning experts and application functional experts. The key is clearly identifying the
required skills, regardless of where the skills reside. First deal with the scoping, or the
definition, and then work to secure those skills.
Most firefighting situations require a very diverse set of skills. For example, you may
need Oracle diagnostic, SQL tuning, and operating system tuning skills. But I’m willing to bet
you also need someone who is an expert on how to use the application. If your users are new
to the application, it’s a good bet they are trying to use the new application in the same way as
they used the old application. While they may get the work done, it might not be the best way
to use the new application’s capabilities. This directly impacts the system, because the users
may be requiring the system to do more work than necessary. The problem is that you and
your management team won’t know this, but a functional expert will. So consider investing in
someone who can help your users. The reduction in system workload can have a dramatic
impact on performance.
Consider at the outset of the crisis that this might be a long ordeal that continues through
the day and night, and back into day again, for 16, 30, 40, or more hours. No one person,
however heroic, should expect to endure, or be expected to endure, for the entire duration.
One of my first considerations is who will be available within 8 to 10 hours to take over for
me for 8 hours. If you are the most capable person to begin the firefighting effort, then find
the next best person to take over from you, and plan on at least an hour of “handover” overlap
as well.

Get a Baseline of Current Performance


Related to a well-defined objective is an actual sample of the key problem area response
times. This will become your baseline. A baseline is used for comparison. For example, when
someone says, “the query now runs 50% faster,” you still want to know if the query runs in 2
seconds or 2 hours. It’s similar to a commercial stating something contains “35% less fat!” A
statement like that screams for a baseline and is clearly intended to deceive and manipulate.
Don’t fall into the same trap.
If you are not sure what the key business transactions are, then ask your manager. Even
better, set up a meeting with the user representatives and ask their priorities. Keep it limited to

2. Précis de l’art de la guerre, by Antoine-Henri, baron de Jomini; referenced in The Greek And Macedonian Art of War, by F.E. Adcock (University of California Press, 1957; p. 10).


just a handful of transactions. Let everyone know that initially your team must focus on the
absolute business-critical transactions and nothing else. Reassure them that after these
transactions are running well, you will move on to another tier of performance analysis.
After selecting the transactions, take multiple samples. It is best to have users run the
transactions and have them watch you write down the timing result. This will increase their
trust, which at this point is probably at an all-time low. You don’t need to take 30 samples—3
to 5 will be enough to give you a good idea of the situation.3 Then write these down and
perhaps post them so everyone knows the baseline.
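
In practice, a sample can be as simple as timing the key transaction’s core query from SQL*Plus. Here is a minimal sketch; the query itself is purely a hypothetical stand-in for whatever your users actually run:

    set timing on

    -- Hypothetical stand-in for the agreed-upon key business transaction.
    select count(*)
    from   orders
    where  order_date > sysdate - 1;

    -- SQL*Plus then prints an Elapsed: line after the statement completes;
    -- record 3 to 5 of these timings as your baseline.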

Install Your Tools


Regardless of which tools you use, you will need interactive and historically focused tools.
Each has a distinct focus, and most vendor-supplied tools naturally fall into one of these
categories.
Interactive tools operate in real time. When you log in to Oracle using SQL*Plus and
run a script, you are being interactive. Interactive tools have the distinct advantage of going
technically very deep. All the data is available because the activity is happening now. Hyper-
focused interactive tools usually have dazzling graphics. However, be aware that interactive
tools do have limitations. During a firefight, there is usually a lot of chaos, and while you are
focusing on one problem, another problem is occurring. If you are limited to interactive tools,
you could miss a key investigative opportunity.
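
As a tiny taste of interactive work, the following sketch (assuming you can query the v$ views; the idle-event exclusion list is abbreviated for illustration) shows what sessions are waiting for right now:

    select event, count(*) sessions_waiting
    from   v$session_wait
    where  state = 'WAITING'
    and    event not in ('SQL*Net message from client',
                         'rdbms ipc message', 'pmon timer', 'smon timer')
    group  by event
    order  by 2 desc;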
Historically focused tools periodically collect data and then allow you to “look back in
time,” as they say. Examples of historical tools are Statspack and AWR. Historical tools are
wonderful for analyzing a problem after it happened (like while you slept last night), when
multiple problems need to be analyzed, or if you need to reference a past situation (this occurs
more often than people think). Historical tools usually provide trending and graphing
capabilities, which are especially helpful in communicating with nontechnical people.
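
Both of the historical tools just mentioned gather their data through simple snapshot calls. Assuming the relevant tool is installed (and, for AWR, appropriately licensed), a manual snapshot is a one-liner:

    -- AWR (Oracle Database 10g and later):
    exec dbms_workload_repository.create_snapshot

    -- Statspack (run as the PERFSTAT user):
    exec statspack.snap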
Historical product vendors have a very technically challenging problem. They want to
gather as much data as possible and as frequently as possible. However, vendors know the
more frequently they gather data and the more data they gather, the more their collection
activities will impact the system. This is one reason why good historical products seem
expensive. In Chapter 5, which covers Oracle performance diagnosis, I will address data-
collection challenges.
Firefighting vendor products shine when they blend interactive and historical
capabilities. Regardless of the tools you use, ensure you have both interactive capabilities and
historical capabilities. Always have your tools installed, running, and ready.

Develop a Simple Communications Strategy


If performance is disastrous, put simply, users assume you’re an idiot. They believe you do
not understand there are problems, you don’t know what the problems are, or you’re not
fixing the problems. Unless you have done something to change their minds, that’s what they
will think.
How could users be so cruel? Look at it from their perspective. They may believe it is
solely your responsibility to keep the system running smoothly and that you get paid very well
to do this. So if performance is unacceptable, it follows that you are not doing your job or are

3. For characterizing a sample set average and dispersion (think standard deviation), only a few samples are required. If you are inferring what may happen in the future, then 30 or more samples are desired. For details on sampling for forecasting purposes, see my book, Forecasting Oracle Performance (Apress, 2007; p. 84).


incapable of doing it well. That may be completely incorrect, but unless you do something
quickly to break that thought process, it will become truth to them and, unfortunately, perhaps
to others.
Breaking this destructive thought process is easier than you may think. If you have met
with users about the key business processes and have timed those processes, you have already
demonstrated you care and know what is important to them. If you have not met with them,
then schedule a meeting right away. Time is of the essence.
Next, create a way for users to monitor your progress whenever they wish. This could be
as simple as a basic web page, with both numbers and graphs, or a notice with the same
information periodically posted in a location convenient to the users. Keep it simple and to the
point. If your posting becomes confusing to them, not only will they continue to interrupt and
distract you from doing your work, but you will have also failed to break the destructive
“you’re an idiot” cycle.

Pick the Low-Hanging Fruit


I have rarely been to a customer site where there was not an immediate and relatively easy
performance improvement opportunity. It is in everyone’s best interest to perform a very
quick analysis looking for these opportunities. While my analysis is performed quickly, I still
follow my methods, but I don’t spend as much time gathering perfect data and waiting for the
“ah ha!” moments to occur. It’s more of a “ram the analysis through my brain as fast as I can”
exercise. Then I discuss the options with my client. Some changes may require an instance
cycle (a restart); others may not. I will also try to give input, quantifying if possible, on which solution
will impact performance the most. We look at combining uptime requirements with potential
performance gains, and together, my client and I determine how to implement my
recommendations.
Finding the low-hanging fruit not only improves performance, but also gives everyone
involved confidence and hope. Because these situations are so charged with emotion, hope
and confidence are desperately needed.

Conduct a Deep Performance Analysis


At this point, I should have good performance data available, some user hope and confidence,
the objectives clearly defined, and the performance baselines established. Now, I very
methodically perform my analysis, implement the solutions, give the system time to stabilize,
gather more data, and repeat until it’s time to stop.
Students and clients frequently ask me how I know when to stop improving
performance. The answer, while not providing the warm, fuzzy feeling people may want, is
straightforward. I keep working on improving performance until there is no time remaining,
no budget remaining, or performance is acceptable. This is a very simple statement, but
obviously there is much more to it.
At some point, it just doesn’t make sense to keep chipping away at performance.
Diminishing returns are always a factor. For example, the first tuning cycle may improve
performance by 50%, the second by an additional 10%, and the next by 5%. Should we keep
tuning? Perhaps other options should be considered, such as increasing system capacity or
altering the workload. Regardless of how you disguise the situation, it always comes down to
time, money, and performance.


Get a Final Baseline of Current Performance


Since you are periodically collecting key business process timing information, gathering the
final baseline sample should be trivial. Still, write it down and publish it. Let people know
how awesome a job you and your team have done! If performance went from a query taking
12 hours to that query running in 2 seconds, tell them. You don’t need to be an egomaniac
about it, but at the same time, people need to know they can count on you in the future. Use
this opportunity to further instill confidence and hope.

Document Your Success


Closely tied to the final performance baseline is documenting your work and its subsequent
success. The focus now turns to documenting what just happened, to gain a deeper
understanding, to inform and train others, and to prevent a similar situation from
recurring.
When I worked for Oracle Consulting, I required my team to complete an on-site client
report, which we called our engagement summary, and distribute it to the entire group. While
traveling, we commonly read each other’s engagement summaries. This dramatically
improved the technical competence of our group. It also created a deep sense of community
and camaraderie. When the group is geographically dispersed, as was the case with my team,
this is especially valuable to the members.

Celebrate!
Oracle performance firefighting is a unique time in your life. Most people never get a chance
to experience the highs and lows of firefighting. How many times in your career will you be
able to celebrate a dramatic and company-critical performance firefighting success? Probably
not that many. So when you have survived a firefight, take some time out and celebrate.
When the ancient Greeks went to war, their campaigns were brief. The armies operated
in the summer, because they needed to be home again in time for the harvest and gathering of
grapes and olives. One battle always settled the business.4 A war was specifically designed to
be of limited scope and be done with quickly. At that time, governments couldn’t run a deficit
or borrow money to finance a long, drawn-out campaign. If wartime spilled into harvest, both
sides could starve! Everyone knew this, and their tactics were based on doing only what was
absolutely necessary.
Just like the ancient Greeks, DBAs should use performance firefighting techniques only
when necessary. You don’t need to go through deep firefighting performance analysis
every day. If you try to do this, you will probably quit, change groups, or become a manager.5
Take time to celebrate and time to rest. Then your performance firefighting career will be
long and glorious!

4. The Greek And Macedonian Art of War, by F.E. Adcock (University of California Press, 1957; p. 7).
5. One of the more creative ways to get out of constant firefighting is to be promoted. This is a risky strategy, however. While the well-known performance firefighter can more easily be promoted, becoming a good manager is usually not a great career fit. Often, ex-firefighters are atrocious managers and end up returning to a technology-oriented job.


OraPub 3-Circle Analysis


I first wrote about my 3-circle analysis method back in 1994 on a flight from Los Angeles to
Sydney. I actually remember writing the paper, entitled “Total Performance Management,”6
in which I called it the holistic problem isolation method. It seemed so natural to the way I
approached firefighting that, at the time, I didn’t grasp its simple power. Over the years, many
of my students have written me about how this one method revolutionized their performance
analysis. Strangely, of all my work, this is the very least technical, but perhaps that’s the
genius of the method.
Take a close look at Figure 1-1, which illustrates the 3-circle analysis technique. In a
nutshell, the database server is divided into three subsystems: Oracle, application, and
operating system. Each subsystem is diagnosed separately and then in combination. There
will be a clear overlap in resource consumption and contention. If the problem resides with
the database server, the overlap is the current performance problem. It’s that simple.

Figure 1-1. As unbelievable as it may seem, OraPub’s 3-circle analysis method will guide
you toward pinpoint diagnosis, reduce your chances of making a mistake, help you determine
multiple solutions, help you structure your analysis, and help you effectively communicate
your findings.

The Oracle subsystem is focused on the Oracle instance and database. Oracle response-
time analysis (discussed later in this chapter) is used to diagnose this circle, and solutions are
directly targeted at what can be done to improve the Oracle response time. The application
subsystem is focused on SQL, application processes, programs, modules, and so on. The
Oracle response-time analysis will direct you to what type of SQL is causing contention in
Oracle, and probably consuming a tremendous amount of operating system resources.
The operating system analysis is focused on finding the operating system bottleneck,
which will be the CPU, memory, IO, or network. Unless the problem is outside the database
server, or there is an Oracle or application blocking or locking issue, there will be an
operating system bottleneck.

6. This technical paper is still available at the OraPub web site (http://www.orapub.com).


The benefits of the 3-circle analysis method are many and should quickly become
obvious. As you read about the following benefits, consider how they would have helped you
in previous firefighting situations.
• Different perspectives reduce risk. An ancient proverb says, “Plans fail for lack of
counsel, but with many advisers they succeed.”7 When your analysis is based on
three connected yet distinct perspectives, your chances of making a mistake are
greatly diminished. In addition, I cannot overstate the power of analyzing the
operating system bottleneck from an operating system perspective, or finding the
problem SQL based on what both Oracle and the operating system are telling you. If
your operating system and Oracle analysis do not support each other, then you are
missing something. Once you have confirmed your analysis from multiple
perspectives, the risk of a misdiagnosis is greatly reduced.
• Different perspectives strengthen analysis. Establishing and understanding the
links or relationships among the three subsystems not only reduces the risk of a
misdiagnosis, but also increases the strength of the analysis. You will have
anticipated questions others will ask and can address how one subsystem is affecting
the other. People inherently seem to understand that your analysis vets
inconsistencies and errors. You’ll find that your analysis is naturally comfortable for
people from both an operating system and an “it’s always the SQL” perspective. This
helps build a cohesive and unified approach to solving the problems. Many DBAs
make the mistake of diagnosing only from one or perhaps two perspectives. Then
when they present their analysis, they are shocked to find that, for example, the
operating system team does not agree with or is offended by their analysis. If they
had analyzed the situation from all three perspectives, their analysis would have been
much stronger and received a better reception.
• Multiple solutions naturally result. At least three different solutions will naturally
result, and probably many more! When solutions are being devised, I focus on each
subsystem and ask myself how I can alter the situation to improve response time. For
example, when focusing on the Oracle circle, I ask myself, “How can I alter the
Oracle instance to reduce response time?” The answer may be as simple as
increasing the buffer cache, but the focus is clearly on changing Oracle. It is
common to develop multiple solutions for each subsystem. The realities of
availability and change control mean that some changes are not immediately
possible, need to be scheduled, or are simply not an option. Multiple solutions will
be attacking the same problem, which obviously increases the chances of a
significant performance improvement.

7. The Bible, Proverbs 15:22.


• A story develops organically. Your analysis will naturally develop a story. Yes, I
said a story. If you can turn a raging technical catastrophe into a plausible story that
both technical and nontechnical people can understand, you have a very good chance
of people believing you and therefore embracing your recommendations. Why?
Because it makes sense to them, and people are much more willing to believe in
someone they understand. Don’t underestimate the power of a simple story.
Throughout history, stories have repeatedly demonstrated the ability to change entire
cultures and countries. Your technical analysis may be staggeringly accurate, but if
you can’t convince others, they will not accept your recommendations. A simple
story can transform a very complex problem into a tangible and understandable
situation. The beauty of the 3-circle analysis method is that the technical analysis
naturally creates the story for you. I’ll show you how this works after we look at
some case studies of 3-circle analysis.
• Avoid finger-pointing; breed cooperation. Unless there is a gross
misconfiguration, everyone involved can contribute to the solution. Since everyone
has something to contribute, finger-pointing stops. It also helps put pressure on
vendors or groups who state there is nothing for them to do, because essentially they
are saying that there is nothing they can do to help. No one wants to say they don’t
want to help. For one group to place the blame entirely on another subsystem is
usually not appropriate, demonstrates ill will, and will result in a suboptimal
solution. When discussing solutions, look for every circle’s group to contribute
significantly. There must be a very good reason why a circle’s group does not wish
to participate in solving the problem.
• Any performance tool will help. Any respectable tool kit will provide the
information you need to perform a solid 3-circle analysis. Basic operating system
tools such as vmstat and sar, and Oracle tools such as Statspack are enough to
perform a strong analysis. These tools make it easier to get the information you need.
Your analysis can go deeper, and you have a higher trust level in the raw data. When
this occurs, both your productivity and analysis confidence increase. Regardless of
your tools, because you’re performing a 3-circle analysis, its built-in
multiperspective analysis safeguards you from making a mistake.

A Couple of 3-Circle Analysis Case Studies


Let’s look at two case studies constructed to demonstrate how you would actually perform the
3-circle analysis. The first case study is very simple, and the second is slightly more complex.

Case Study 1: Quick 3-Circle Analysis


In this case study, management is concerned about periodic and increasingly intense periods
of performance slowdowns. Several key business processes are being affected, as well as
general application performance. Management wants you to diagnose the system, recommend
solutions, work with the various teams to ensure the solutions are implemented, and then
confirm performance is back to previously accepted levels.
Using 3-circle analysis, you focus on each of the three circles, and then establish their
interrelationships. Where you find overlap or where the relationship is the strongest, that’s the
bottleneck or the area that needs your immediate attention.


Suppose an Oracle response-time analysis, introduced in the “Oracle Response-Time
Analysis” section later in this chapter, shows Oracle processes are primarily waiting for
multiple IO block requests. I like to call this physical IO, but really the blocks are simply not
in Oracle’s buffer cache and reside somewhere else. The terms block reads, block requests,
block gets, and disk reads are also commonly used. On the other hand, a buffer get is an
attempt to retrieve a block buffered in the Oracle buffer cache. This is also commonly called
logical IO, because it is not a physical IO request. It is helpful to think of blocks as residing
outside Oracle’s buffer cache and on disk, and to think of buffers as residing inside the Oracle
buffer cache. The Oracle response-time analysis shows these IO requests take an average of
22 milliseconds (ms) to complete, and the Oracle server and background processes are
consuming about 30% of the available operating system CPU.
Based solely on this information, you would expect to find a few SQL statements
dominating block read consumption. Looking at the SQL statements, sorted by block reads,
you see that two statements clearly are consuming the majority of block read requests. The
connection between the Oracle and the application subsystems has been made.
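
A bare-bones sketch of this kind of top-SQL report, assuming access to the v$ views (Statspack and vendor tools produce far prettier versions), might look like this:

    select disk_reads, executions, sql_text
    from   (select disk_reads, executions, sql_text
            from   v$sqlarea
            order  by disk_reads desc)
    where  rownum <= 10;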
Notice that you have not pointed a finger at anyone; therefore, it’s unlikely that you have
offended members of any group and caused them to take an immediate defensive posture.
There is a very high likelihood that performance-improving changes can be made to both the
SQL and Oracle. So right away, two teams can get involved in solving the problem!
From an operating system perspective, because Oracle processes are waiting for blocks
outside Oracle’s buffer cache and on average take 22 ms8 to be received from the IO
subsystem, you can clearly expect an IO bottleneck. You have not implied that there is a
problem with the IO subsystem or that the SQL is poorly written, but the fact remains that it
takes 22 ms to complete a multiple-block IO call.
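
One way to cross-check that 22 ms figure from inside Oracle (a sketch, assuming v$ access; note that readtim is reported in centiseconds) is to compute the average read time per data file:

    select d.name, f.phyrds,
           round(f.readtim * 10 / greatest(f.phyrds, 1), 1) avg_read_ms
    from   v$filestat f, v$datafile d
    where  f.file# = d.file#
    order  by f.readtim desc;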
After using standard operating system commands to find the CPU, IO, memory, or
network bottleneck (this will be discussed in Chapter 4), sure enough, you discover there is a
hot disk array. And when you dig a little deeper, you notice the tables are involved in
multiblock reads that are located on the hot disk array! Is it possible to reduce the IO
subsystem multiblock read times? That is what you will talk about with the IO subsystem
team (which includes storage and capacity management personnel). But regardless of the IO
subsystem team’s cooperation, the operating system link to both the Oracle and the
application subsystems has clearly been established.
At this point, you have a three-way confirmed relationship, clearly showing issues and
opportunities for performance improvement. This may sound overly simplistic, and in some
ways, it is. However, the method, the process, and the steps taken are essentially the same,
regardless of the Oracle server configuration or complexity.

Case Study 2: More Complex 3-Circle Analysis


In this case study, management is increasingly concerned about order-entry response time.
Over the past six months, during month-end processing, order-entry response time has
consistently increased to the point where it significantly impacts business. Your assignment is
to return performance to previously acceptable levels.

8. My rule of thumb for IO subsystem read response time is 10 ms. While every application and IT organization has its own response time tolerances, I have never found an IO team comfortable with a 15 ms response time. A 10 ms or greater response time means not only are the blocks not cached, resulting in real physical IO activity, but there is also significant contention. I am in no way implying the IO subsystem is poorly configured, although it could be. I am simply stating Oracle’s IO requirements have exceeded the IO subsystem’s capacity.


The Oracle response-time analysis shows Oracle processes are primarily waiting to get
cache buffer chain (CBC) latches, and Oracle processes are consuming nearly all of the
available database server CPU. (This is not uncommon with severe latch contention.) CBCs
are used to answer the question, “Is the block in the buffer cache?” CBCs get stressed when
the answer is usually, “Yes, the block is in the cache.” Repeatedly asking this question and
accessing buffers in memory stresses the CPU subsystem. As I’ll detail in Chapter 6, an
Oracle-focused CBC solution is to increase the number of CBC latches by changing an
instance parameter. This is a perfect example of why you must know Oracle’s architecture. A
fantastic diagnosis is important, but you must know what to do with that stellar diagnosis!
This is why this book focuses on both diagnosis and Oracle internals.
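
As a quick illustration (assuming v$ access; latch statistics are covered in depth in Chapter 3), confirming this kind of latch contention interactively can be as simple as:

    select name, gets, misses, sleeps
    from   v$latch
    where  name = 'cache buffers chains';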
Based on the Oracle analysis, you can anticipate what you will find in the application
and the operating system analyses. In this situation, you would expect the operating system to
be experiencing a raging CPU bottleneck and the application SQL to be asking for a lot of
buffers (that is, buffer gets as opposed to block gets). Also, you will want to check the order-
entry SQL. It’s likely that the order-entry SQL is either causing the intense logical IO activity,
and hence CBC activity, or being negatively affected by it.
A quick operating system analysis clearly shows a raging CPU bottleneck. As expected,
the CPU subsystem is 90% busy on average, with a run queue nearly always greater than the
number of CPU cores (which means processes are needing to wait for CPU resources). While
the IO subsystem is doing some real work, there are no volumes busier than 60%, and their
response times are well under the rule of thumb of 10 ms. There is no memory swapping. As
in many cases, the network has been deemed out of scope. A solution focused on the
operating system is to acquire more CPU cycles for Oracle processes. There are many ways to
go about doing this, such as by identifying processes that do not need to be or should not be
running during peak times.
For the application analysis, referencing a top SQL report, you look for SQL consuming
the most buffers (buffer gets, or what I call logical IO). You quickly identify three statements
that are consuming around 90% of all the logical IO. As expected, the top two statements are
part of the order-entry application. An application-focused solution is to tune the SQL with
the objective of reducing logical IO. It’s fine to also reduce the physical IO, but the Oracle
response-time analysis and the operating system analysis clearly support first and foremost
focusing on logical IO—that is, buffer gets.
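
Again assuming v$ access, a rough version of that top-SQL-by-logical-IO report might look like the following; the per-execution column often points directly at the statement worth tuning:

    select buffer_gets, executions,
           round(buffer_gets / greatest(executions, 1)) gets_per_exec,
           sql_text
    from   (select buffer_gets, executions, sql_text
            from   v$sqlarea
            order  by buffer_gets desc)
    where  rownum <= 10;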
In summary, the database server is clearly suffering from intense Oracle buffer cache
management. This is supported by a CPU bottleneck, a very high level of easily identifiable
logical IO SQL statements, and Oracle processes waiting to get a latch related to buffer cache
management. Fortunately, there are a number of solutions that should resolve the problem,
from an Oracle, an application, and an operating system perspective.
The 3-circle analysis method may seem strange and even uncomfortable at first, but
once you do it a few times, you’ll be amazed at how fast, thorough, and spot-on your
diagnosis will be. Of course, you also need a good understanding of Oracle’s architecture and
its interaction with the operating system, which is what you’ll get from this book.

Your Compelling Story


People want a good story. One of the natural results of a 3-circle analysis is the story. You can
use the 3-circle analysis method to structure your presentation, a formal document, or even a
simple email. When I start a performance analysis, I create a blank document structured

similarly to the 3-circle analysis method. As I perform my analysis, I will see the story unfold
before me. It’s amazing to watch this happen.
Regardless of the communication form—whether it is verbal, written, or in a
presentation—the story and the resulting documentation will naturally inherit 3-circle analysis
characteristics. For example, upon analysis completion, it takes very little effort to create a
short email message summarizing your work. Or for the operating system-focused audience,
you could simply provide the Performance Summary Analysis and the Operating System
Performance Analysis sections (described next). You can scale and customize your
communications very quickly, without sacrificing technical integrity or performing needless
rewrites.
Let’s look at the story components, how they relate to each other to build the story, and
how you might go through this process to develop your story.

The Story Components


Every story has a beginning, an ending, and something in between. In essence, your
performance story is no different. Figure 1-2 shows the components I include and their
relationship to one another.

Figure 1-2. Whether written, verbal, or presentation, a natural and structured story unfolds
when performing the 3-circle analysis. These are the structural components of the story and
their relationships.

Here are short descriptions of the components of the performance story, in the order in
which I present them:
• Executive Summary: This section introduces the entire exercise and sets the stage
so anyone can understand the importance of your task. You want to establish what
you are doing, why you are doing it, and what you accomplished. Clearly stating this
at the beginning will provide a kind of road map to guide your audience. Part of the
Executive Summary section is a further consolidation of the Performance Summary
section. Focus on the business and on communicating a highly technical situation to nontechnical people. It’s easy to get too technical, so be careful. Remember that
this is probably the only section management will read. Make it simple, relevant, and
to the point. Do not try to impress people with your technical prowess here. There
are plenty of other areas to impress your colleagues. I also include the overall
engagement objectives and accomplishments in the Executive Summary section.

• Recommendations: Most people will ask for a consolidated recommendation list. While this list may be more detailed in subsequent sections and summarized at a high level in the Executive Summary section, keep the list in the Recommendations section succinct, like a simple check-box list. It also serves as a follow-up reference; that is, a kind of baseline.
• Performance Summary: In a single sentence, summarize each subsystem’s situation. Then clearly and forcefully establish the relationships among the three subsystems. In each subsystem’s Performance
Analysis section, I focus primarily on that circle’s area. But in this Performance
Summary section, my goal is to clearly demonstrate that their activity is strongly
correlated. If I can establish these links, my recommendations will not only attack
the actual problem, but will also be better received by others. I also summarize my
recommendations in this section, showing how they address the problems directly.
• Oracle Performance Analysis: Start this section with a one- or two-sentence
summary of the Oracle performance analysis. Then focusing on Oracle, not the
application or the operating system, perform a detailed Oracle response-time analysis
(you’ll learn more about how to do this later in this chapter and in Chapter 5). This
section is very detailed and includes actual script output and screen shots.
• Application Performance Analysis: Start this section with a one- or two-sentence
summary of the application performance analysis. Then focus on the application, not
Oracle or the operating system. Based on your Oracle performance analysis, identify
the most resource-consuming SQL, programs, processes, users, and applications. I
also look at application-specific features. For example, with the Oracle E-Business
Suite, the batch manager (called the Concurrent Manager) can have a dramatic
performance impact. For a comprehensive analysis, the Concurrent Manager must be
addressed. As with the Oracle Performance Analysis section, the contents are
typically very detailed and include actual script output and screen shots.
• Operating System Performance Analysis: Start this section with a one- or two-
sentence summary of the operating system performance analysis. This section
contains details about only the operating system bottleneck. Unless an area has been explicitly placed out of scope, make sure to investigate the CPU, memory,
IO, and network subsystems. I usually include many screen shots and raw command
output in this section. The more raw, relevant, and accurate the output, the more trust
you will receive from the operating system team.

The Story Development Process


When I start any performance analysis, I begin with a blank document, with the exception of
the section headings listed in the previous section and shown in Figure 1-2. Here, I perform a
detailed analysis on each circle, copying and pasting, making notes, and slowly and iteratively
cleaning up the document. I usually start with the Oracle analysis, as that guides me to what
kind of SQL I’m looking for and what to expect from the operating system. But, honestly,
sometimes I start with the operating system.
When I first perform my analysis, while the overall structure is set, the document is
essentially a scratch pad. Sure, it’s full of report output and technical details, but it also
contains comments, ideas, and offhand remarks. These are meant for my eyes only, as they

contain undeveloped, untested, and miscellaneous ideas. I will never show this raw document
to management. Only when I have a chance to back off a bit and rest, and then make a second
pass will I allow anyone other than the DBAs to view my work.
Once I have completed my 3-circle analysis, I repeatedly go over each circle, and then
work on clearly establishing each circle’s relationship with the others. Detailed circle-specific
solutions will naturally result. Eventually, the overall performance situation will become
apparent and can be summarized in only one or two sentences. If I cannot quickly summarize
the situation, then I know my analysis is somehow flawed or incomplete.
After repeated cleansing cycles and hopefully an “aha!” moment or two, the final
document will begin to take shape. With the document completed, I am able to easily create a
presentation or create email messages for a variety of audiences.
If you follow this process and complete your document, you will be able to stand in front
of management and your colleagues and confidently explain the situation and your
recommendations.
If formally documenting your work is not your style, you can still follow this process on
a piece of scratch paper. During my firefighting training courses, I don’t have students create
a document. Instead, on a piece of paper, they outline each key section and methodically
perform their analysis. Within 30 minutes, the students can complete a full 3-circle analysis
and be ready to discuss each circle’s unique situation, how they relate, and possible solutions!

Wait-Event Analysis
There are two distinct methods to diagnose Oracle contention. The traditional approach is
based on performance ratios, such as the block buffer cache hit ratio. The current approach is
based on Oracle’s instrumented9 kernel code. The instrumented analysis approach is
commonly called wait-event analysis. As you’ll come to understand, wait-event analysis is far
superior because it is both fast and accurate. Plus it is a key input when performing an Oracle
response-time analysis. This book exploits Oracle’s instrumentation. Here, I’ll give you some
insight into the wait-event analysis method. How to perform a thorough wait-event analysis is described in Chapters 2 and 5.

Wait-event analysis is like going to the doctor, as an adult.

What is the difference between traditional performance ratio analysis and wait-event
analysis? I like to contrast the two by telling a story about a mother bringing her sick baby to
a doctor. The doctor looks at Johnny and asks, “Johnny, where does it hurt?” Since Johnny is
just a baby and can’t talk, he looks at the doctor and starts screaming loudly. To diagnose
little Johnny, the doctor must perform a series of uncomfortable tests.10 After the tests are
performed, the doctor says something like, “Based on the series of tests I ran and based on my

9. Instrumentation involves injecting additional code into software to observe specific behavior. It is similar to the gauges and instruments on a car dashboard or my bright-red Honda VFR 800 Interceptor’s speedometer—it provides internal functioning information. When the hooks needed to obtain this data are added as the application is built, instrumentation is easy. But most often, little thought is given to instrumentation. Oracle started instrumenting its kernel code with Oracle version 7.
10. And we wonder why we grow up avoiding doctor visits!

years of experience and my keen knowledge of a child’s physiology, I believe this is the
problem and this is what we should do.” Very professional indeed!
Let’s go over a couple key points in this analogy. First and foremost, because Johnny
can’t speak, the doctor must run a series of broad tests to discover the problem. If he forgets
to run a specific test, doesn’t know about a new test, or just doesn’t feel that a specific test is
necessary—for example, looking into Johnny’s ears—he can easily misdiagnose. The point is
that because Johnny can’t tell the doctor where the pain is, the doctor must run more tests than
he would need to if Johnny could talk.
The second point is that the doctor must know about the child’s physiology to solve the
problem. While he may be able to diagnose the problem, unless he knows children’s
physiology, he will need to refer Johnny’s mother to a specialist.
Consider a very different situation. Johnny has grown up and is now an adult, John. John
is not feeling well, so he visits his doctor. The doctor asks John where it hurts, and John
responds by moving his elbow and saying, “It hurts when I do this!” The doctor then responds
by telling John to stop moving his elbow and escorts him out of her office. No, just joking.
Because John told the doctor specifically about the pain, the doctor runs a few very isolated
tests. She then will say something like, “Based on what you told me, the few isolated tests I
ran, my years of experience, and my in-depth knowledge of the human body, this is the
problem and this is what you need to do.”
What a difference a few years can make, eh? The key difference is that John can tell the
doctor where the pain is. This is just like Oracle’s wait interface telling you where the pain is,
where the contention is, and where too much time is being spent. Ratio analysis is like
diagnosing Johnny, whereas wait-event analysis is like working with John—a much faster,
more precise, and more confident diagnosis.
Notice the three distinct doctor-level qualifications:
• First, the doctor must know John’s language and be able to listen to him. This is just
like a firefighter knowing how to listen to Oracle, through Oracle’s wait interface.
This book will teach you how to listen to Oracle. But this is just the start.
• Second, even with an amazing diagnosis, to solve John’s problem, the doctor must
have a deep understanding of the human body and specifically in this case, the
elbow. This is just like a performance specialist knowing Oracle internals. Knowing
Oracle is suffering from intense library cache contention is good, but to solve the
problem, you must understand Oracle internals enough to influence the situation to
reduce the contention.
• Finally, the doctor has been practicing11 for a number of years. This adds confidence and broader experience.
To summarize, to become a stellar Oracle performance firefighter, you must know how
to listen to Oracle, you must know what Oracle is telling you, you must know Oracle
internals, and you need experience. This book will prepare you for all but the experience
aspect. But through the examples, exercises, and case studies, I’ll try to jump-start your
experience quotient.

11. The word practice used in conjunction with a medical doctor has always been a little unsettling. Even more personal was when my title at Oracle Corporation was, you guessed it, Practice Manager. As the joke went, Craig’s not a real manager—he’s just practicing to be one!

I hope you can see why Oracle wait-event analysis is far superior to traditional
performance ratio analysis. As you read this book, you will learn how to exploit the wait
interface, take it beyond just listening to Oracle, and also be made aware of how the wait
interface can be shockingly misused, leading to an incorrect diagnosis.

Oracle Response-Time Analysis


I believe Oracle response-time analysis, or ORTA for short, is the most significant
performance analysis method DBAs have at their disposal.12 ORTA allows DBAs to quantify
(at least in part) a user’s experience, to classify time, to anticipate both the application and
operating system situation, and to quantify a solution’s benefit. Thus, this approach
transforms the traditionally defensive firefighting posture into a forward-thinking predictive
posture. That is an impressive list!
Even wait-event analysis falls short compared to ORTA, because ORTA encapsulates
wait-event analysis and takes it further. Shunning the advantages of ORTA and relying solely
on a wait-event analysis will limit performance analysis in depth and scope, and open up the
possibility of a misdiagnosis. Those are condemning statements, but as you read this book,
you will discover that they are true. Here, I will introduce the background and concepts of
ORTA. In Chapters 5 and 9, I will detail how to perform the analysis, where to gather the
necessary data, and other interesting details.
Response-time analysis did not originate from the firefighting world. It is from the
infrastructure planning world of IT. The infrastructure community looks at IT as a network of
services and conduits to those services. Translating this to DBA-speak, that would be a bunch
of servers networked together. When a user makes a request, it is routed into the mystical IT
cloud, where the request waits to be serviced and receives service from a potentially large
number of service providers. One way to categorize this time is to place it into two buckets:
time being serviced and time waiting to be serviced, or more simply, service time and queue
time. Adding the two results gives you the response time. In the course of my Oracle
forecasting and predictive analysis work, I noticed that response-time analysis could be
applied to firefighting.
As I noted, this type of analysis deals with two high-level time categories: service time
and queue time. From an abstracted performance analyst perspective, queue time contains all
the time an Oracle process is waiting to be serviced. Service time is all the time the
transaction is being serviced. Add all that time together, and you have how long it took for the
database transaction to complete; that is, its response time. But it gets better! Since ORTA is
used both when firefighting and forecasting, it can serve as a bridge between the two,
allowing you to predict the effects of the solution. (Although this book does not focus on the
predictive aspect of performance analysis, I will introduce the subject in Chapter 9.)
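In symbols, using the subscripted notation of this chapter’s footnotes, the relationship is simply:

Rt = St + Qt

where Rt is the response time, St the service time, and Qt the queue time per unit of work.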

The Role of the Response-Time Curve


The response-time curve is fundamental to our work and is simple to understand. Take a look
at the classic response-time curve graph shown in Figure 1-3. The vertical axis is the response
time. For simplicity, think of how long it takes a query to complete. Keep in mind that

12. In 2001, I published the original ORTA paper and presented it at the Computer Measurement Group (CMG) conference in Anaheim, California. Since that time, ORTA has been at the core of all my Oracle-related work.

response time is the sum of the service time and the queue time. The horizontal axis is the
arrival rate. As more transactions enter the system, we shift to the right. Notice that when the
arrival rate is small, the response time is equal to the service time; that is, there is no queuing.
But as more transactions enter the system per unit of time (that is, the arrival rate increases),
queuing will eventually occur. Notice that just a little queuing occurs near the beginning. But
as the arrival rate continues to increase, at some point (usually around 75% utilization), the
queue time becomes significant and eventually will skyrocket. When this occurs, the response
time also skyrockets, performance slows, and users get extremely upset.

Figure 1-3. Graph of the classic response time curve. This example shows that, at an arrival rate (the workload) of 1.55 transactions per millisecond (trx/ms), the response time is 3 ms/trx, the service time is 2 ms/trx, and the queue time is 1 ms/trx.

While at a university, I had a job answering the computer operations room telephone.
The phone could handle multiple lines, and since I could talk with only one person at a time,
sometimes I had to put someone on hold. When things were calm and someone called, I
would listen to the request and handle the call, hang up, and then wait for the next call. That’s
when the job was relaxing. However, if the call arrival rate increased enough, someone would
call while I was already talking to someone on the other line. As a result, someone had to wait
his turn, or in forecasting terms, queued. I noticed once the rate of calls caused people to
queue, it took only a slight increase in the call arrival rate before there were many people in
the queue and they were waiting a long time. It was like everything was calm, and then—
wham!—everyone was queuing.
You might have experienced this yourself with your computer systems. Performance is
fine, yet the system is really busy. Then for any number of reasons, the system activity
increases just a little, and—wham!—performance takes a dive. And you sit back and say,
“What just happened? Everything was going fine, and the workload didn’t increase that
much.”

What happened to both of us is that we hit the famous rocket-science-like term, elbow of
the curve (also known as the knee of the curve). At the elbow of the curve, a small increase in
the arrival rate causes a large increase in the response time. This happens in all queuing
systems in some fashion, and it is our job to understand when and under what conditions the
elbow of the curve will occur.
As I will detail in Chapter 5, it is very convenient for DBAs when service time is CPU
time, and queue time is Oracle wait-interface time. Think of the arrival rate as the Oracle
workload. If you are familiar with Statspack or AWR, you know that near the beginning of
the report, you can find a series of load profile statistics. Any one of those can represent the
arrival rate. When the response-time curve is used for predictive purposes, a model is
developed based on one or more of the load profile statistics. Common Oracle arrival rate
statistics are user calls, executions, and buffer gets.13
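To make this concrete, here is a minimal instance-level sketch of where service time and queue time come from. These are totals since instance start, and the idle-event filter is abbreviated for illustration (a real script needs a complete idle-event list):

-- Service time: Oracle CPU consumption (centiseconds since instance start)
select value service_time_cs
from v$sysstat
where name = 'CPU used by this session';

-- Queue time: non-idle wait time (centiseconds since instance start)
select sum(time_waited) queue_time_cs
from v$system_event
where event not like 'SQL*Net message from client%'
and event not like 'rdbms ipc message%';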
The graph shown in Figure 1-3 is based on a queuing theory model.14 So it’s not real;
it’s an abstraction. Service time is never perfectly horizontal, and queue time does not occur
exactly as we plan in real systems. All those Oracle optimizations and messy workload issues
muddy up things a bit. However, transactions and computing systems do behave in a queuing-
like manner. They must—it’s their nature. I’ve been asked how I know a system will respond
this way. The answer is always the same. That’s how transactions behave, and so do humans
when we become a transaction and enter a queuing system, such as the one at McDonald’s.
When you perform ORTA, you construct a response time curve graph, or simply
numerically detail its components (service time and queue time), based on what occurred
during an interval of time. For example, Figure 1-3 could represent Monday morning between
9 a.m. and 10 a.m.
So far, this discussion has been fairly academic. The following case study introduces
how to perform an ORTA. I hope you will begin to see that this type of analysis is both
straightforward and very useful.

Case Study of Oracle Response-Time Analysis


Here is a sneak preview of how ORTA is performed and used in conjunction with the 3-circle
analysis to develop an array of possible solutions, as well as a visual understanding of both
the situation and the solution’s effects.
Suppose it’s Friday afternoon. You’ve looked at your database Statspack reports and had
a short conversation with your operating system administrator. You’ve found that between 1
p.m. and 2 p.m., the following occurred:
• Average user call activity was 1,510 uc/sec; that is, roughly 5,400,000 user calls occurred in the one-hour time span. This will be the arrival rate (the workload).
• Total Oracle CPU time consumed was 39,960 seconds. This will be the service time.
• The top five wait-event times totaled 32,022 seconds. This will be the queue time.

13. Knowing how to shift the response-time curve up, down, left, and right will significantly aid your IT understanding, as well as enhance your creative performance problem-solving abilities. If you are interested in learning more, consider reading Chapter 5, “Practical Queuing Theory,” in my Forecasting Oracle Performance book (Apress, 2007).
14. The model used is the M/M/m Queuing Theory Analysis spreadsheet. Go to the OraPub web site (http://www.orapub.com) and search for “queuing theory.”

• CBC latch contention accounted for 90% of the total wait time; that is, 28,920
seconds. So nearly all of the wait time (queue time) is related to CBC latch
contention.
• Oracle consumed 93% of the available CPU capacity of the database server, which has 12 CPU cores. Utilization is simply consumption divided by capacity. The total CPU
capacity equals the interval duration (60 minutes) multiplied by the 12 CPU cores;
that is, 720 minutes, or 43,200 seconds. Said another way, within any one-hour
period, a 12 CPU core server can provide up to 720 minutes of CPU power.
Statspack showed Oracle CPU consumption was 39,960 seconds. Therefore, Oracle
used 39,960 seconds of the available 43,200 seconds of CPU, which is 93%.15
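Restating that last bullet’s arithmetic compactly:

U = consumption / capacity = 39,960 / (12 × 3,600) = 39,960 / 43,200 ≈ 0.93

that is, 93% average utilization over the interval.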
If you were to draw the response-time curve, it would look something like Figure 1-4.
(Keep in mind this is an abstraction, and I took certain liberties, which will be fully explained
in Chapters 5 and 9.) Looking at the response-time curve in Figure 1-4, you can see that, at an
arrival rate (workload) of 1,510 uc/sec, the system is operating in the elbow of the curve. Service levels may still be met, in which case this situation does not yet represent a problem. However, just a slight increase in the arrival rate (workload) will very likely cause a service-level breach.

Figure 1-4. This response time curve is based on 12 CPU cores with a service time of 0.0074
sec/uc. In this example, the arrival rate is 1,510 uc/sec, which mathematically equates to an
average queue time of 0.0059 sec/uc, an average CPU utilization of 93%, and a response time
of 0.0133 sec/uc.

As noted in the preceding list, during the one-hour period, total queue time (wait time)
was 32,022 seconds. Since over the one-hour period, 5,400,000 user calls occurred, the

15. I am not implying the average operating system CPU utilization is 93%. In fact, it is probably higher.

average queue time per user call is 0.0059 sec/uc, or 32,022/5,400,000. For service time,
Oracle consumed 39,960 seconds, and therefore the average service time for each user call is
0.0074 sec/uc, or 39,960/5,400,000.16
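Stacked together, the interval’s numbers tie directly back to Figure 1-4:

St = 39,960 / 5,400,000 ≈ 0.0074 sec/uc
Qt = 32,022 / 5,400,000 ≈ 0.0059 sec/uc
Rt = St + Qt ≈ 0.0133 sec/uc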
What is not graphically clear in Figure 1-4 is that nearly all the available database server
CPU time is being consumed. Our simple calculation shows Oracle consumed 93% of the
available CPU, which means that only 7% remained for all other processes including
operating system activity, such as virtual memory management and process scheduling. The
combination of Oracle’s CPU consumption and the CBC latching issue will surely result in a
raging CPU bottleneck.
Based on this abbreviated ORTA, an understanding of the Oracle architecture, and a
little experience, there is an amazing array of possible solutions. From an Oracle perspective,
you look for ways to reduce both the wait time and the service time by cleverly influencing
Oracle. Wait-event analysis alone would not have exposed the CPU consumption issue.
Knowing Oracle CPU time is greater than Oracle queue time is not enough information to
devise a responsible solution. In fact, not considering response time could result in a reduction
in wait time but a larger increase in service time!
Oracle-focused solutions will zero in on reducing both service time and queue time.
CBC latch contention indicates a high level of buffer activity, which consumes CPU. So you
start thinking of creative Oracle-centric ways to reduce the amount of buffer cache activity.
You also expect, and will be looking for, high buffer get (that is, logical IO) SQL during your
application analysis. From an Oracle queue time perspective, a possible solution is to increase
the number of CBC latches. (Many other solutions will be presented in Chapter 6, which
covers Oracle buffer cache internals.) The key is to focus on reducing the response time, not
just the service time or the wait time.
From an application perspective, look for the high buffer get SQL. From an operating
system perspective, look for ways to give Oracle more CPU power. Of course, this could
mean additional and/or faster CPUs, but it also implies looking for any processes consuming a
lot of the CPU that could possibly run at other times or not be run at all.
This example demonstrates that a relatively simple ORTA, combined with a quick 3-
circle analysis, results in a spot-on diagnosis, a number of solution possibilities, and graphics
to help everyone understand the situation.

Red Line, Blue Line


Remember your school report card? It may have included your current term’s grade point
average and also an accumulated grade point average. Both statistics are important, yet based
on a different data set. Oracle reports based on performance views can also yield current or
single-term data (think interval) or accumulated data (think since instance start). For example,
running a quick SQL statement referencing an Oracle performance view, such as
v$sysstat, does not necessarily represent what is currently occurring, what just occurred,
or what has recently occurred. It represents what has been occurring since the Oracle instance
started! Therefore, unless you are careful, you can be deceived.

16. Utilization equals the service time multiplied by the arrival rate, divided by the number of CPU cores: U = (St × λ) / M. Response time equals the service time divided by one minus the utilization raised to the power of the number of CPU cores: Rt = St / (1 − U^M). For more details, see Forecasting Oracle Performance (Apress, 2007; p. 25–26) or visit the OraPub web site (http://www.orapub.com) and search for “forecasting formulas.”

Suppose an Oracle instance has been running for two months and you wanted to know
the cache hit ratio for the buffer cache. Querying v$sysstat once would give you the
average cache hit ratio since the instance has started. If you wanted to know the cache hit ratio
over a specific period, you would need to have beginning and ending values. Think in terms
of Statspack and AWR. Before you run the report, you must set both the start and end points.
This provides an interval, or period of time, so your calculations are relevant to the time
period desired. I call the interval calculations the blue line, and the simple v$sysstat query
the red line, as illustrated in Figure 1-5.
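For example, a classic red line buffer cache hit ratio query (one data gathering, containing all activity since instance startup) looks something like this sketch; the statistic names are standard:

select 1 - phy.value / (db.value + con.value) "Hit Ratio"
from v$sysstat phy, v$sysstat db, v$sysstat con
where phy.name = 'physical reads'
and db.name = 'db block gets'
and con.name = 'consistent gets';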

Figure 1-5. Blue line statistics are created from an interval of time, whereas red line statistics
contain all activity (accumulated) since the instance started.

Figure 1-5 startles many DBAs at first because both lines represent the same underlying
statistics. Suppose you wanted to know the statistic value at time 8. If you ran a red line
query, the beginning value would be 0 (zero), since that is when the instance started, and the
ending value would be queried at time 8. Now suppose you wanted to know the statistic
within the last hour. The blue line query’s initial value would be queried at time 7 and the
ending value at time 8. In Figure 1-5, while the ending statistic values are the same, the
performance statistic value for the red line is 1.5, yet for the blue line, it is 2.0. That is a
significant difference. And notice that the longer the instance has been running, the flatter the red line becomes and the less likely it is to track the blue line.
What is dangerous is that most DBAs have never thought about this. Yet in a firefighting situation, the interactive scripts being run are usually very simple queries that gather data just once from the underlying performance view. The result is a red line query, which can be very misleading.
You can be sure you are running a blue line query if you have a defined beginning and
end. You will also notice the values are very dynamic in range, whereas red line values tend
to be very static, with little variance.
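Here is a minimal blue line sketch. It assumes you can pause between samples (dbms_lock.sleep requires execute privilege on the dbms_lock package), and the five-minute interval is arbitrary:

-- Capture the beginning value
variable v0 number
exec select value into :v0 from v$sysstat where name = 'user calls'

-- Wait out the interval, then report only the interval's activity
exec dbms_lock.sleep(300)

select value - :v0 interval_user_calls
from v$sysstat
where name = 'user calls';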
You’re probably wondering if your existing tools and products are reporting the blue
line or the red line. The only way to tell for sure is to look at the SQL they are running. The

next best thing is to see if there is a defined beginning and ending point. If this is not possible,
there is a good chance the tool or product is producing red line statistics. An update button may make no difference; it could just keep moving the red line’s end point while always using the instance start time as the beginning value.
In summary, when running SQL scripts, be aware of the data used to derive the statistic.
It may be less useful than you thought.

Summary
Oracle performance firefighting requires unique skills, tools, methods, deep technical Oracle
internals knowledge, and a different perspective. Getting the most out of these areas requires
an integrated framework. This chapter introduced the core supporting framework enabling
spot-on diagnosis, multiple-perspective analysis, visual aids, and response-time analysis.
You may have noticed I frequently mentioned leaving out significant details and taking
abstraction liberties. This was necessary to establish a solid framework without distracting
you with details and side topics. The upcoming performance diagnosis chapters will provide
ever-decreasing abstraction and a much deeper look into wait-event analysis and response-
time analysis. Once the framework has been fully developed, the Oracle internals chapters
will provide the technical underpinnings, allowing appropriate and relevant solutions to the
correctly diagnosed problems. Once the framework and the internals are covered, we’ll move
into complete performance analysis integration in the final chapter.

CHAPTER 2
Listening to Oracle’s Pain

Oracle does speak to me. In a way, I’m like an Oracle therapist. If it’s hurting, I’ll listen, and
I’ll expend a tremendous amount of effort trying to help alleviate its pain as quickly as
possible, while minimizing disruption to its operation. If you’re interested, I’ll teach you how
to do this as well.
I listen very closely as Oracle speaks through its wait interface. By paying close
attention, I can hear Oracle clearly disclose where it hurts. With a little more information and
a solid methodical approach, I can get a near-complete and clear picture of the painful
situation, which allows me to develop a very specific solution set.
This chapter is the technological foundation for Oracle performance diagnosis. With this
information in hand, you will be ready to dig deep into Oracle internals and diagnosis topics. I
will start with a brief history of Oracle performance diagnosis. Then I’ll move directly into
the topics of software instrumentation, Oracle-specific instrumentation, and Oracle response-
time analysis (ORTA). I will introduce session-level, group-level, and instance-level
ORTA. This is also known as profiling, but I tend to not use that word, as there is a lot of
baggage associated with it.

Performance Diagnosis: The Backstory


Oracle performance diagnosis has a relatively short but interesting history. From its inception
to the early 1980s, Oracle performance diagnosis was just getting started, and Oracle systems
were relatively small and simplistic. As a result, SQL tuning and resolving lock contention
were the focus.
From the mid-1980s to mid-1990s, Oracle systems began to grow in size and
complexity. As a result, performance analysis was needed urgently. Oracle ratio analysis was
born, and the performance industry started to flourish, giving rise to technical papers, books,
vendor products, and consulting services. It was kind of the cowboy days of performance
analysis. Authors and consultants had a good thing going. The books on the topic were
massive because of the sheer number of performance ratios. And because there were so many
ratios to examine, consultants had long engagements, generating a lot of money. And since
the analyses were not as exact as today, they tended to take a few more cycles to complete. In
truth, not only authors and consultants benefited; anyone involved in performance analysis
had plenty of work.
Now, don’t get me wrong. Ratio analysis works, and at that time, it was the best method
we had at our disposal. But times have changed.
The introduction of Oracle 7 brought an instrumented kernel and gave performance analysts a simple interface to view the instrumentation through standard SQL. I
first heard about the wait interface from an Oracle internal paper by Anjo Kolk. Immediately,
I began researching, testing, and validating its practical use in firefighting situations. Once
convinced of its value, I began using the wait interface during my consulting engagements,
writing and publishing technical papers on the subject, and speaking on the subject.1
But not everyone thought the wait interface was so fantastic. It was not public
knowledge and was an entirely new way to diagnose Oracle systems. Many questioned its
validity and value. The established Oracle gurus at the time did not desire change. After all,
they had a really good deal going with ratio analysis. In fact, the scuffle between ratio and
wait-event analysis practitioners became quite personal in some cases—petty and actually
quite embarrassing to watch. Eventually though, wait-event analysis became recognized as
the best way to diagnose performance issues.
In 2001, I began pondering response time as it relates to firefighting. After all, what a
user experiences is response time. So perhaps there was a way to approach Oracle
performance analysis from a response-time perspective as well. It turned out capacity-
planning response-time analysis could be applied to Oracle diagnosis. In fact, as I discussed in
the previous chapter, it worked so well, I published the original paper on the subject. Since
that time, there have been many different publications, tools, and products that embrace
ORTA.
Since the introduction of Oracle 7, the most significant performance analysis
improvement was released around 2005 with Oracle Database 10g. The active session history facility, known as ASH, provides kernel-level data collection. The fact that the
collection ability resides directly in Oracle’s kernel provides a number of significant benefits:

1. My original “Direct Contention Identification Using Oracle’s Wait Interface” paper has had thousands and thousands of downloads. It quickly became the foundation upon which all of my performance analyses were based. This paper is still available from the OraPub web site (go to http://www.orapub.com and search for “direct contention”).

• Data collection can be intrusive, but since the collection is built directly into the
kernel, the impact is minimized.
• The collected data is retained; even after a session disconnects, the data remains.
• Oracle states that even long-running SQL resource consumption will be accurate to
within 5 seconds.
• The ASH data is saved in Oracle’s automatic workload repository.
There are other benefits of ASH, as I’ll discuss in Chapter 5.
But be warned: while the ASH views are queryable at any time, to legally query the
views, an additional license is required.2
I am confident Oracle will continue to provide both new and improved performance
diagnostic capabilities. For example, Oracle is now allowing operating system activity to be
viewed through its performance views (for example, v$osstat).
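For example, a quick peek at these operating system statistics might look like this; the view is available in Oracle Database 10g and later, and the statistics returned vary by platform and version:

select stat_name, value
from v$osstat
order by stat_name;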

It’s All About Instrumentation


If you’ve ever had to develop speedy code, there is a good chance you are familiar with
instrumentation. Instrumenting source code requires inserting additional code directly into the source, enabling specific activities to be measured. When the code is run, the analyst
can then determine where time has been spent and focus on slow areas of the code.
A more personal illustration of instrumentation is the time card. During my consulting
engagements, my clients require that I keep track of my time. After all, they want to ensure I
am using their money wisely. So, I must take extra steps to ensure my time is recorded
properly. At the end of each month, I give my monthly activity calendar to my administrator.
She extracts the time I spend for each client and summarizes it in a way that is useful for
billing purposes. For example, the summarization may show that I spent 15 hours working on
the Zeppo project, 25 hours on the Dingbat project, and 4 hours on the Tucson project. The
only way the billing information can be created is if I allow my activity to be instrumented. These extra steps—the instrumentation of my day—are what allow my time to be summarized and made useful for billing purposes. Without my taking the time to record my activities, there is no way for anyone to know where I spent my time—including myself! So instrumenting my workday is not only useful, but also required.
Instrumentation may seem simple enough, but there is more to it than is usually
apparent. In fact, instrumentation can even lead to an incorrect diagnosis! Let’s see how this
can happen.
Assume a software product has three modules: A, B, and C. The software was running slowly; no one seemed to know why. So, the development manager decided it was time to
instrument the code. As shown in Figure 2-1, the developers inserted simple timers into each
module. When the module began, a starting time was gathered; upon exit, the finish time was
gathered.

2. Yes, this is true. If you don’t believe me, just ask an Oracle salesperson.

Module_A
{
timer.start

timer.stop
}

Module_B
{
timer.start

timer.stop
}

Module_C
{
timer.start

timer.stop
}

Figure 2-1. Simple instrumentation example. Each of the product’s modules has a simple start and stop timer, providing the ability to determine the module’s run time.

The development manager then repeatedly ran the product while the timers were
recording module start and stop times. After he was satisfied that enough runs had occurred,
he ran the query shown in Figure 2-2, and got the results shown in that figure.

col module format a7 heading "Module"
col r format 999 heading "Runs"
col tt format 990.00 heading "Tot Time"
col at format 990.00 heading "Avg Time"

select module,
count(*) r,
sum(time_spent) tt,
avg(time_spent) at
from instrumentation
group by module;

Module Runs Tot Time Avg Time
------- ---- -------- --------
A 6 9.20 1.53
B 11 16.70 1.52
C 8 40.60 5.08

Figure 2-2. Sample instrumentation analysis code and the actual results. After repeated
product runs, module C clearly took the most time.

Based on the instrumentation output shown in Figure 2-2, it was clear that module C had
taken the longest, both in total time and average time per run. As you can imagine, the
development manager called a meeting with module C’s developer. We’ll just call her
developer C.

As the development manager gave his explanation, developer C grew increasingly and
visibly angry. Alarmed at her response, the development manager asked what was wrong.
Developer C responded that the problem was not in her code, but rather with the inadequate
IO subsystem, which just happened to be configured by the development manager! Developer
C went on to say that the reason module C’s time was so long was that module C needed to
make calls to the IO subsystem, which she had absolutely no control over. So, module C
appeared as though it were performing slowly, but the real problem was that the IO subsystem
was performing slowly. After a little thought, the development manager agreed. He also said
that they needed to ensure there weren’t needless IO calls being made, but in principle, he
thought developer C was correct.
To confirm developer C’s hypothesis, they agreed the source code would need to be
instrumented at a more granular level. Figure 2-3 shows (for only module A) that, in addition
to identifying the module’s run time, details such as CPU, IO reads, IO writes, and lock
requests would be instrumented.

Module_A
{
timer.start.cpu('A')

timer.stop.cpu('A')
timer.start.ioread('A')

timer.stop.ioread('A')
timer.start.cpu('A')

timer.stop.cpu('A')
timer.start.lockreq('A')

timer.stop.lockreq('A')
timer.start.cpu('A')

timer.stop.cpu('A')
}

Figure 2-3. Increased-granularity instrumentation example for module A only. Each of the product’s modules has multiple timers implemented, enabling more detailed time accounting.

With great anticipation, the product was repeatedly run, and the instrumentation report
was generated. As shown in Figure 2-4, developer C was correct—the vast majority of her
module’s time was spent waiting for IO read requests to complete. Feeling vindicated,
developer C went so far as to say that the development manager should have a talk with
developer A, since his code was actually burning more CPU than anyone else’s code!

SQL> l
1 select module,
2 time_type,
3 count(*) r,
4 sum(time_spent) tt,
5 avg(time_spent) at
6 from instrumentation
7* group by module, time_type
SQL> /

Module Resource Runs Tot Time Avg Time
------- --------- ---- -------- --------
A cpu 7 12.66 1.81
A lockreq 6 0.68 0.11
B cpu 7 2.66 0.38
B ioread 3 3.25 1.08
B lockreq 3 1.03 0.34
C cpu 7 6.66 0.95
C iowrite 4 0.81 0.20
C ioread 6 29.57 4.93
C lockreq 3 0.96 0.32

Figure 2-4. Sample instrumentation analysis code and the actual results. After repeated
product runs, clearly module C must wait for IO reads to complete.

The development manager quickly realized that he could also summarize the
instrumentation by the resource, not only by the module. As shown in Figure 2-5, the
development manager was clearly able to see that both IO reads and CPU consumption have a
dramatic effect on the overall application performance.

SQL> l
1 select time_type,
2 count(*) r,
3 sum(time_spent) tt,
4 avg(time_spent) at
5 from instrumentation
6* group by time_type
SQL> /

Resource Runs Tot Time Avg Time
--------- ---- -------- --------
cpu 21 21.98 1.05
iowrite 7 4.06 0.58
ioread 6 29.57 4.93
lockreq 12 2.67 0.22

Figure 2-5. Sample instrumentation analysis code and the actual results focusing on the
resources, rather than the modules. After repeated product runs, waiting for IO reads to
complete is the overall performance-limiting factor.

There are a number of not so obvious yet extremely important advantages to this
technique. First, granular instrumentation will help developers focus on their code issues
without being misled by other performance-limiting factors outside their control. In this

example, the outside issue was IO reads. Second, nondevelopers can also use the detailed
instrumentation to diagnose performance issues. While nondevelopers cannot change the
source code, they can perhaps change parameters to influence code operation, such as how
often and which code modules are being run, and they can influence the operating system on
which the code is being run. Similarly, using Oracle’s instrumentation benefits kernel
developers, operating system administrators, and performance analysts.

Oracle Instrumentation
Through Oracle’s instrumentation efforts, Oracle kernel developers, Oracle performance
analysts, and operating system administrators significantly benefit, as follows:
• Oracle kernel developers use their instrumentation to focus on real kernel-level
performance issues.
• Performance analysts can use the instrumentation to influence the code operation
(think instance parameters) and how the code is being run (think application SQL).
• Operating system administrators can use the instrumentation to understand how
Oracle is stressing the computer system.
As you can see, Oracle source code instrumentation is incredibly valuable and was a
major performance analysis breakthrough for Oracle Corporation.
Oracle instrumented its code starting with Oracle 7. The beauty of this instrumentation is
that Oracle made the information available through the SQL interface. Oracle could have
forced us to trace a specific process or processes, and perhaps use an additional command like
tkprof to consolidate and format the information. By making the instrumentation details
easily available through the performance views, Oracle has encouraged performance analysts
and vendors to use the information.
I personally believe that Oracle instrumented its code for kernel optimization, not for
DBAs or application developers. An obvious clue is the names given to describe the timing,
such as db file scattered read to signify multiblock reads of blocks not found in the buffer cache, and latch free to signify latch sleep time. No DBA would have come up with names like
these!
Oracle has never guaranteed it has completely instrumented its code. This leaves open the possibility of unaccounted-for time. If you are counting each millisecond and come up short,
one of the possible reasons for this is that Oracle simply did not instrument a piece of code
you ran. For this, and many other reasons, I don’t expect to account for every little bit of time
in my analyses. This may seem like a problem, but I don’t believe it is.
Oracle performance analysis is about solving real-life production performance issues. In
all my years of using Oracle’s instrumentation, Oracle has not misled me in my analysis. The
OraPub 3-circle analysis has confirmed and strengthened what Oracle’s instrumentation has
indicated. This is why I don’t get uptight about unaccounted time. My job is to fix real
performance issues, not to find the missing 50 ms!
Sure, it’s a lot of fun to figure out the response time down to the minutiae, and it definitely
is good for your image and makes great cocktail party conversations, but the value of this
missing time is highly questionable. During a firefight, it’s easy to get distracted from the
main overriding goal, and trying to account for every microsecond can cause the technical
problem solver to stray from the main objective.

How Oracle Collects Time


Understanding how Oracle collects time is not only fascinating, but it also gives you a greater
appreciation for the Oracle wait interface. You’ll have a better idea of how the wait interface
works, which provides you with a higher level of confidence during analysis and when
presenting your results. You will also begin to use the instrumentation in creative and highly
effective ways.
We must get creative to see how Oracle instruments its kernel code. Since we are not
Oracle kernel developers, we obviously do not have access to Oracle’s kernel code. But all is
not lost! By tracing an Oracle process through the operating system, we can easily observe
how Oracle has instrumented it code. The depth of knowledge you gain from doing this
yourself is amazing.
The Linux trace command is strace. On Solaris, the command is truss. See your
operating system documentation for its corresponding tracing command. When you issue the
appropriate command for your operating system, the tracing immediately begins to spew forth
output and continues until the process ends or you break out of the trace.
Figure 2-6 shows an Oracle Database 10g Release 1 server process operating system
trace. The Oracle server process oracleprod3 has the process ID 14558. This is an active
process, so many of the lines in Figure 2-6 have been removed. Where lines were removed, an
ellipsis (...) was inserted.

[root@localhost oracle]# ps -eaf|grep oracleprod3


oracle 14558 1 0 06:02 ? 00:00:06 oracleprod3
root 14586 2956 0 06:15 pts/2 00:00:00 grep oracleprod3

[root@localhost oracle]# strace -p 14558


Process 14558 attached - interrupt to quit
...
gettimeofday({1199358777, 801187}, NULL) = 0
readv(17, [
{"\6\242\0\0-\364\0\1\377\300\22\0\0\0\1\4Y\225\0\0\1\0%"..., 8192},
{"\6\242\0\0.\364\0\1\377\300\22\0\0\0\1\4\203 \0\0\1\0&"..., 8192},
{"\6\242\0\0/\364\0\1\377\300\22\0\0\0\1\4\213E\0\0\1\0%"..., 8192},
{"\6\242\0\0000\364\0\1\377\300\22\0\0\0\1\4]9\0\0\1\0&\0"..., 8192},
{"\6\242\0\0001\364\0\1\377\300\22\0\0\0\1\4]\211\0\0\1\0"..., 8192},
{"\6\242\0\0002\364\0\1\377\300\22\0\0\0\1\4\222G\0\0\1\0"..., 8192},
{"\6\242\0\0003\364\0\1\377\300\22\0\0\0\1\4\251\20\0\0\1"..., 8192},
{"\6\242\0\0004\364\0\1\377\300\22\0\0\0\1\4\302\224\0\0"..., 8192},
{"\6\242\0\0005\364\0\1\377\300\22\0\0\0\1\4&\366\0\0\1\0"..., 8192},
{"\6\242\0\0006\364\0\1\377\300\22\0\0\0\1\4\324\210\0\0"..., 8192},
{"\6\242\0\0007\364\0\1\377\300\22\0\0\0\1\4_\272\0\0\1\0"..., 8192},
{"\6\242\0\0008\364\0\1\377\300\22\0\0\0\1\4\372\240\0\0"..., 8192},
{"\6\242\0\0009\364\0\1\377\300\22\0\0\0\1\4\355\257\0\0"..., 8192},
{"\6\242\0\0:\364\0\1\377\300\22\0\0\0\1\4s\322\0\0\1\0&"..., 8192},
{"\6\242\0\0;\364\0\1\377\300\22\0\0\0\1\4\230\225\0\0\1"..., 8192},
{"\6\242\0\0<\364\0\1\377\300\22\0\0\0\1\4\26\262\0\0\1\0"..., 8192}],
16)
= 131072
gettimeofday({1199358777, 801788}, NULL) = 0
...

Figure 2-6. Oracle’s instrumentation through an operating system lens. Oracle has submitted
a multiblock IO read request to the operating system by issuing the operating system readv
call. The call took 0.6 ms, which is wonderfully fast.

In Figure 2-6, the instrumented call is readv. But just before the Oracle kernel code
makes the readv call, Oracle asks the operating system for the time by issuing the
gettimeofday call. Then, immediately after the readv call, Oracle once again asks the
operating system for the time by issuing another gettimeofday call. According to the
Linux gettimeofday manual page, the second numeric output shown is a time in
microseconds. Subtracting the initial time (801187) from the final time (801788), we find that
this particular readv call took 601 µs (801788 – 801187), or 0.6 ms.
There are a number of things we can learn by taking a closer look at the readv call:
• According to the Linux manual page, readv is a synchronous call to the operating
system that requests one or more blocks from an IO device. So, we know Oracle did
not find these specific blocks in its buffer cache and was forced to ask the IO
subsystem for the blocks. This is commonly called a physical IO or a block read.
• This is a synchronous IO call, not an asynchronous IO call. If it were asynchronous,
Oracle would have issued a different system call.
• We can clearly see that this is a multiblock read call, as opposed to a single block
request.
• Take a look at the requested block size and the number of blocks requested. If you are familiar with Oracle, you’ll immediately notice that we are looking at an 8KB block size database, and Oracle is asking for 16 blocks at once. And you are correct if you guessed the instance parameter db_file_multiblock_read_count is 16!
• The total number of bytes requested is 131,072; that is, 16 × 8,192.
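If you want to confirm the multiblock read sizing on your own system, a one-line SQL*Plus check is:

show parameter db_file_multiblock_read_count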
Digging a little deeper, we can assert that all the requested blocks must have resided in
memory (but not Oracle’s buffer cache memory, since Oracle needed to request them from the
operating system). We can make this assertion because a physical spinning IO device cannot
return 16 nonsequential blocks (the blocks could be scattered over many physical devices) in
less than a single millisecond!
Oracle’s wait interface takes this timing information, gives the system call a special platform-independent name, records the timing information, and makes this all available to us
through its performance views. Regardless of the operating system or the actual system call,
Oracle gives a multiblock IO call the special name db file scattered read, because
these multiple blocks can be scattered over the IO subsystem. This Oracle-given name is more
commonly called an event name, a wait event, or simply an event. From a performance analyst
perspective, we tend to think of a multiple-block request as a sequential request, but looking
at the IO call from a more kernel-centric perspective, it is more appropriate to name the
request a scattered read.
Oracle associates the wait event name with the wait time, which in Figure 2-6 is 0.6 ms.
If we had been looking at server process 14558 through Oracle’s wait event interface during
the time of the Figure 2-6 trace, we would have seen the server process was posting a db
file scattered read. I like to say that the Oracle process is yelling out the wait event
as it impatiently waits for the operating system to return the requested blocks. After the server
process was done waiting, we would see that the time waited was 0.6 ms. So, in a very real
way, Oracle has made this tracing information available to us without tracing, but through its
wait interface. Beautiful simplicity!


Every Oracle process is either posting a wait event or consuming CPU³—there are no
exceptions. This means that if you have hundreds of Oracle sessions, there is a good chance
you’ll have hundreds of sessions posting wait events. Knowing that sessions either consume
CPU or post a wait event will become more important as you go deeper into performance
analysis.
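
For example, at any instant you can see how the sessions split between waiting and not
waiting. Here is a minimal sketch (not an OSM script) based on the v$session_wait
view, which is detailed later in this chapter; sessions whose state is not WAITING
recently finished a wait and are likely on, or waiting for, the CPU:

select state, event, count(*) sessions
from v$session_wait
group by state, event
order by sessions desc;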
Let’s take a look at another multiblock read call but using a different operating system
tracing option. Look closely at Figure 2-7. Process 6852 was traced using the rp options. The
r option shows the relative timing information at the beginning of the next line. For this
example, the initial gettimeofday call took 0.000147 seconds, or 0.147 ms. The 16-block
(8KB each) multiblock read request took 22.5 ms. This is much too slow for a respectable IO
subsystem. Clearly, this
IO request contains at least one block that resides on a physically spinning disk.

[root@localhost oracle]# ps -eaf|grep oracleprod3


oracle 6852 1 0 06:02 ? 00:00:06 oracleprod3
root 14586 2956 0 06:15 pts/2 00:00:00 grep oracleprod3

[root@localhost oracle]# strace -rp 6852


Process 6852 attached - interrupt to quit
...
0.000384 gettimeofday({1200616750, 111291}, NULL) = 0
0.000147 readv(11,[
{"\6\242\0\0\33=\0\1\270\313#\1\0\0[\4\v\303\0\0\1\0\0\0"..., 8192},
{"\6\242\0\0\34=\0\1\270\313#\1\0\0O\4;~\0\0\1\0#\0&\250"..., 8192},
{"\6\242\0\0\35=\0\1\270\313#\1\0\0Q\4P\\\0\0\1\0#\0&\250"..., 8192},
{"\6\242\0\0\36=\0\1\270\313#\1\0\0T\4\202\300\0\0\1\0#\0"..., 8192},
{"\6\242\0\0\37=\0\1\270\313#\1\0\0U\4\245\377\0\0\1\0#\0"..., 8192},
{"\6\242\0\0 =\0\1\270\313#\1\0\0M\4\312V\0\0\1\0#\0&\250"..., 8192},
{"\6\242\0\0!=\0\1\270\313#\1\0\0O\4\23S\0\0\1\0#\0&\250"..., 8192},
{"\6\242\0\0\"=\0\1\270\313#\1\0\0N\4\3062\0\0\1\0#\0&\250"..., 8192},
{"\6\242\0\0#=\0\1\270\313#\1\0\0X\4|\302\0\0\1\0#\0&\250"..., 8192},
{"\6\242\0\0$=\0\1\270\313#\1\0\0R\4k\200\0\0\1\0#\0&\250"..., 8192},
{"\6\242\0\0%=\0\1\270\313#\1\0\0T\4X\177\0\0\1\0#\0&\250"..., 8192},
{"\6\242\0\0&=\0\1\270\313#\1\0\0J\4M\272\0\0\1\0\"\0&\250"..., 8192},
{"\6\242\0\0\'=\0\1\270\313#\1\0\0\22\4rn\0\0\1\0\"\0&\250"..., 8192},
{"\6\242\0\0(=\0\1\267\313#\1\0\0S\4-\350\0\0\1\0\"\0&\250"..., 8192},
{"\6\242\0\0)=\0\1\267\313#\1\0\0R\4e\24\0\0\1\0\"\0&\250"..., 8192},
{"\6\242\0\0*=\0\1\267\313#\1\0\0Q\4\255X\0\0\1\0#\0&\250"..., 8192}
],16)=131072
0.022485 gettimeofday({1200616750, 133913}, NULL) = 0
...

Figure 2-7. Oracle’s instrumentation through an operating system lens. Oracle has submitted
a multiblock IO read request to the operating system by issuing the operating system readv
call. The call took 22.5 ms, which is much too long for most Oracle systems.

Do we cast the blame on the IO subsystem? While no one would be proud of a 22.5 ms
IO subsystem response time, based on the 3-circle analysis, and unless there is a gross
misconfiguration, performance optimizing solutions exist from Oracle, application, and
operating system perspectives. Don’t be quick to point the finger at the IO subsystem. And for
you SQL tuners, don’t immediately cast the blame on the SQL either, or you will be limiting
your optimization opportunities.

³ As I’ll detail in Chapter 4, there are occasions where Oracle records that a session is consuming CPU
when actually the session is waiting in the CPU run queue.


Now that Oracle has instrumented its code, given the system calls convenient Oracle
kernel developer names, stored the timing information, and made the information available in
performance views, we are ready to query the wait time through the SQL interface.

The Wait Event Views


Oracle makes its kernel instrumentation details available to us very simply through the SQL
interface. There is actually a family of wait event views. I will present most of them in this
book, starting with the three wait event views that form the core of any wait-based analysis
and the queue time component of an ORTA:
• v$system_event: This is the high-level view. Like v$sysstat, its contents are
reset to zero at instance startup, the values contain summary wait information from
all Oracle sessions, and the values increase over time.
• v$session_event: This is just like v$system_event, except it contains the
session ID, known as the SID (not the Oracle SID, but the session’s session ID). Like
v$sesstat, when a session disconnects, the session’s wait information is
removed; when a session connects, all values begin incrementing from zero.
• v$session_wait: This is real-time session level information. Starting in Oracle
Database 10g, you can also get this information in v$session, which can be very
useful and more efficient. The value of querying the real-time information is that you
can see the lower-level parameters: p1, p2, and p3. These parameters contain very
detailed information about the specific wait. For example, if the event is db file
scattered read, p1 contains the first block’s file number, p2 contains the
block number, and p3 contains the number of blocks requested. However, for the
event latch free, p2 contains the latch number, which can be cross-referenced
in v$latchname.
A number of excellent white papers and books detail the wait event views, so there is no
need to repeat that information here. But I will summarize the views with some actual sample
script output to highlight important details.
You may also want to take a look at the v$event_name view, which lists all the wait
events available for a specific Oracle release. It is also a great quick reference for the
parameter values and the event classifications.
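
For example, here is a minimal sketch of such a quick-reference query; the wait_class
column is available only on releases that classify events (see Table 2-1):

select name, parameter1, parameter2, parameter3, wait_class
from v$event_name
where name like 'db file%'
order by name;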

System-Level Perspective (v$system_event)


I like to start my 3-circle analysis by focusing on the Oracle system. Central to the Oracle
response-time analysis is a good understanding of what Oracle processes are waiting for, such
as a block, a buffer, a latch, or an enqueue. From an all-session or a system-level perspective,
the central wait event view is v$system_event. When looking at v$system_event
over an interval of time, you get a broad understanding of the Oracle wait situation.
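
As a simple sketch of the idea (this is not the swpctx.sql interval report shown next),
you can rank the cumulative wait time accumulated since instance startup; on releases with
the wait_class column, idle events can be excluded:

select event, time_waited, total_waits,
       round(100 * ratio_to_report(time_waited) over (), 2) pct
from v$system_event
where wait_class != 'Idle'
order by time_waited desc;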
Figure 2-8 is a 30-second interval snapshot based on v$system_event. The report
swpctx.sql is part of OraPub’s System Monitor (OSM) tool kit, which is a free and easy-
to-use performance diagnostic tool kit available from OraPub’s web site. Notice there are no
session-level references. Session-level information is available in v$session_event and
v$session_wait. The top wait event is clearly db file scattered read. In fact,
sessions are spending 53% of their Oracle wait time on this single wait event.


SQL> @swpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER twice.

Database: prod3 28-MAR-10 04:48pm


Report: swpctx.sql OSM by OraPub, Inc. Page 1
System Event CHANGE (blue line) Activity By PERCENT

Time Waited % Time Avg Time Wait


Wait Event (sec) Waited Waited(ms) Count(k)
----------------------------------- ----------- ------- ---------- -------
db file scattered read 57.090 52.69 2.8 20
read by other session 41.150 37.98 14.1 3
latch: cache buffers chains 1.340 1.24 5.4 0
control file parallel write 0.770 0.71 70.0 0
log file sync 0.660 0.61 110.0 0
log file parallel write 0.490 0.45 49.0 0
db file sequential read 0.090 0.08 8.2 0
latch: cache buffers lru chain 0.020 0.02 0.6 0
latch free 0.000 0.00 0.0 0
db file parallel write 0.000 0.00 0.0 0
direct path write 0.000 0.00 0.0 0

Figure 2-8. A classic instance-level interval (30 s) wait event report based on
v$system_event. The top wait event is db file scattered read, followed by
read by other session.

It is common to misunderstand what Oracle’s wait interface is telling us. The
information in Figure 2-8 is not referring to Oracle response time, end-user response time, or
what the users are experiencing. The top wait event’s 53% value refers only to the wait time
component of the Oracle response time; that is, what Oracle’s kernel code instrumentation has
provided.
Another important point is that our focus should be on the percentage of wait time and
not the wait time itself. Referring to Figure 2-8, if I were to tell someone the time waited for
scattered reads was 57 seconds and that this was a problem, they might laugh! First of all,
Figure 2-8 is only a 30-second snapshot, so the wait time may not be large. And your
colleague might expect a much larger scattered read time on his system, because his system is
busier than your system. But regardless of these issues, looking at this system from an impact
perspective and concentrating on what is affecting the sessions, it is clear that scattered read is
the top wait event.
You may have noticed I have not mentioned the number of waits. This is because it is
not an issue for our analysis. We are focusing on what affects our users, and that’s time, not
the number of waits. Look closely at the average wait times in Figure 2-8. You’ll see that
some wait events have an average wait time of less than 0.1 ms, while the largest average wait
time is 110 ms. It is the combination of the number of waits and the individual wait time that
affects our users, and therefore what’s important. It is very common for the wait event with
the largest number of waits to not be associated with the most time. Focus on the statistics that
primarily impact the user experience. In this report, what impacts our users the most is the
percentage of time waited.


The report shown in Figure 2-8 is based on the system event view. Over the years,
Oracle developers have added more columns, and I’m sure they will continue to do so, but for
our work, the most important columns are shown in Table 2-1. The wait class columns will be
discussed in the “Time Classification” section a little later in this chapter.

Table 2-1. Selected columns in the v$system_event performance view

Name            Description and Unit                             Example
-------------   ----------------------------------------------   ----------------------
event           Actual event name as text                        db file scattered read
total_waits     Number of waits posted                           22000
time_waited     Sum of all event wait time since instance        95206
                startup, in hundredths of a second
                (centiseconds)
average_wait    Average event wait time in hundredths of         70.0
                a second
event_id        Wait event primary key                           506183215
wait_class      Wait event classification (11g+)                 User I/O
wait_class_id   Unique wait class ID (11g+)                      1740759767
wait_class_#    Unique wait class number (11g+)                  8

Session-Level Perspective (v$session_event)


Whereas the system event view focuses on the overall system wait situation, the session event
view focuses on the individual session wait situation. This view is helpful for answering
questions like “What sessions have been waiting on db file scattered reads?” and
“What is the wait situation for session 205?” It’s the natural next drill-down level.
The v$session_event view does not report in real time. In fact, the information
could be as old as 3 seconds! Even though it has a column named time_waited_micro,
the contents could have been updated seconds ago. So don’t be fooled. If you want real-time
details, you will need to sample from the v$session_wait view. The only significant
difference between the system event view columns, listed in Table 2-1, and the session event
view column is the session identifier column (sid) in the latter.
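
A minimal sketch of such a session-level query follows; the session identifier (205 here,
matching the example below) is the only input, and the time_waited_micro column is
available in Oracle Database 10g and later:

select event, total_waits,
       time_waited_micro/1000000 time_waited_sec,
       time_waited_micro/1000/nullif(total_waits,0) avg_wait_ms
from v$session_event
where sid = 205
order by time_waited_micro desc;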
Figure 2-9 is a report based on v$session_event. Notice the command-line
parameter is the session identifier. The report was run twice with around a 30-second gap. If
session 205 were active, we would expect some increased wait-event activity. Notice that a
couple of the wait times did increase. We would never expect the wait time to decrease unless
session 205 disconnected and then a new session 205 was initialized upon connection.


SQL> @swsessid 205

Database: prod3 28-MAR-10 06:18pm


Report: swsessid.sql OSM by OraPub, Inc. Page 1
Session Wait Session Event For SID 205

Total Total Time Waited Avg Wait


Wait Event Waits Timouts (sec) (ms)
------------------------------- -------- -------- ----------- ---------
db file scattered read 2351257 0 2634.39 1.12
read by other session 26770 9 938.83 35.07
db file sequential read 329968 0 60.12 0.18
free buffer waits 5295 5239 56.82 10.73
latch: cache buffers chains 1131 1129 20.63 18.24
latch: library cache 62 0 3.45 55.72
latch free 28 0 0.83 29.63
log file sync 1 0 0.26 262.00
latch: cache buffers lru chain 222 0 0.20 0.92
latch: shared pool 5 0 0.06 12.63
SQL*Net message from client 14 0 0.01 0.46
SQL*Net message to client 14 0 0.00 0.00

12 rows selected.

SQL> @swsessid 205

Database: prod3 28-MAR-10 06:18pm


Report: swsessid.sql OSM by OraPub, Inc. Page 1
Session Wait Session Event For SID 205

Total Total Time Waited Avg Wait


Wait Event Waits Timouts (sec) (ms)
------------------------------- -------- -------- ----------- ---------
db file scattered read 2352665 0 2660.63 1.13
read by other session 26817 9 940.20 35.06
db file sequential read 329968 0 60.12 0.18
free buffer waits 5295 5239 56.82 10.73
latch: cache buffers chains 1131 1129 20.63 18.24
latch: library cache 62 0 3.45 55.72
latch free 28 0 0.83 29.63
log file sync 1 0 0.26 262.00
latch: cache buffers lru chain 222 0 0.20 0.92
latch: shared pool 5 0 0.06 12.63
SQL*Net message from client 14 0 0.01 0.46
SQL*Net message to client 14 0 0.00 0.00

12 rows selected.

Figure 2-9. Based on the session level view, v$session_event, this report allows
session-level wait time questions to be answered. Two snapshots are shown to demonstrate
how as the session continues to be connected, its wait time can increase.

Session 205 in Figure 2-9 is primarily waiting for the operating system to complete
multiblock IO requests. However, the operating system is doing a fantastic job, as the average
time to retrieve multiple blocks is 1.12 ms. For self-preservation, I would not recommend
confronting the IO subsystem team and implying their IO subsystem is not meeting
expectations. In fact, as I’ll discuss later, it is very likely that the operating system bottleneck is
related to the CPU!


Real-Time Session-Level Perspective (v$session_wait)


Sometimes, you will want to know what is occurring at this very moment or what just
occurred. When you need to know very specific details about the wait you are observing,
v$session_wait shines. This view gives you a real-time view of a session’s wait
situation. It shows the current wait or the most recent completed wait event.
It is common to catch a session in the middle of a wait. For example, refer back to
Figure 2-6 or Figure 2-7. If we just happened to query from v$session_wait while that
session was waiting for the operating system to return the multiblock request, we would see
that specific Oracle session posting a db file scattered read.
In addition to the real-time feature, v$session_wait also provides additional details
that are lost when the wait details are summarized in the v$session_event and
v$system_event views. This detailed information is contained in parameter columns: p1,
p2, and p3. These parameter columns also have associated text and raw columns, providing
even more information. These parameters are event-dependent; that is, their contents depend
on the wait event. For example, if the wait event is db file scattered read, then p1
and p2 contain the first block’s file number and block number, respectively, and p3 contains
the number of blocks requested. But if the wait event is latch free, then p2 contains the
latch number, which can be joined with v$latchname to determine the latch for which the
session is waiting. Note that starting with Oracle Database 10g, v$session contains all the
v$session_wait columns. Table 2-2 lists some of the important columns in this view.

Table 2-2. Selected columns of the v$session_wait performance view

Name            Description and Unit                             Example
-------------   ----------------------------------------------   ----------------------
sid             Session identifier                               205
event           Actual event name as text                        db file scattered read
p[1,2,3]        Parameter in decimal                             4
p[1,2,3]raw     Parameter in hexadecimal                         4
p[1,2,3]text    Parameter description                            File number
wait_class      Wait event classification (11g+)                 User I/O
wait_class_id   Unique wait class ID (11g+)                      1740759767
wait_class_#    Unique wait class number (11g+)                  8
state           WAITING: Session is currently waiting            WAITING
                WAITED KNOWN TIME: Previous wait
                completed; the session is currently
                not waiting
                WAITED SHORT TIME: Last wait was
                less than 1/100 second
                WAITED UNKNOWN TIME: Last wait
                time is unknown; perhaps instance
                parameter timed_statistics is false
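
As an example of putting the parameter columns to work, here is a minimal sketch (not an
OSM script) that resolves which latch each session is waiting for by joining p2 to
v$latchname, as described above:

select sw.sid, sw.event, sw.p2 latch#, ln.name latch_name
from v$session_wait sw,
     v$latchname ln
where sw.event = 'latch free'
and sw.p2 = ln.latch#;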


Suppose we wanted to take a closer look at a particular session. Continuing our example
based on Figure 2-8 and Figure 2-9, look closely at Figure 2-10. This is a very simple script
based on v$session_wait. In this example, we caught session 205 in the middle of a wait
and the details are exposed. Figure 2-10 clearly shows session 205 is currently waiting for an
operating system multiblock read to complete. And we also know that 16 blocks are being
requested (P3), the first block’s file number is 4 (P1), and its block number is 12925 (P2).

SQL> @swsid 205

Database: prod3 28-MAR-10 07:04pm


Report: swsid.sql OSM by OraPub, Inc. Page 1
Real Time Session Wait For SID=205

Wait Event P1 P2 P3
------------------------------ ------ ------ ------
db file scattered read 4 12925 16

Figure 2-10. This script is based on the v$session_wait view. This is easy to tell
because the output contains the parameter columns. Session 205 is currently waiting for the
operating system to retrieve 16 blocks.

With the file number and block number known, we can build a script around the
dba_extents view to determine the object’s owner, name, segment type, tablespace name,
and even the database file in which the block resides! Figure 2-11 shows the output from such
a script. The objfb.sql script referenced in Figure 2-11 is somewhat complicated, so it is
shown in Figure 2-12.

SQL> @objfb 4 12925

Database: prod3 28-MAR-10 07:15pm


Report: objfb.sql OSM by OraPub, Inc. Page 1

Object Details For A Given File #(4) and block #(12925)

File number :4
Block number :12925
Owner :OE
Segment name :ORDERS
Segment type :TABLE
Tablespace :USERS
File name :/u01/oradata/prod3/users01.dbf

1 row selected.

Figure 2-11. Knowing a session is waiting for a specific file and block number in conjunction
with accessing Oracle’s data dictionary, you can determine the block’s type, name, owner,
tablespace affiliation, and even the database file where it resides. The core code for the script
is shown in Figure 2-12.


def file_id=&1
def block_id=&2

col a format a77 fold_after

define ts_name=x
col tablespace_name new_value ts_name

-- Capture the tablespace name for the given file number into
-- the ts_name substitution variable.
select ts.name tablespace_name
from v$tablespace ts, v$datafile df
where df.file# = &file_id
and ts.ts# = df.ts#
/

set termout on
set heading off

-- Locate the extent containing the given block and report the
-- owning segment's details.
select
'File number :'||&file_id a,
'Block number :'||&block_id a,
'Owner :'||owner a,
'Segment name :'||segment_name a,
'Segment type :'||segment_type a,
'Tablespace :'||e.tablespace_name a,
'File name :'||f.file_name a
from dba_extents e,
     dba_data_files f
where e.file_id = f.file_id
and e.file_id = &file_id
and e.block_id <= &block_id
and e.block_id + e.blocks > &block_id
and e.tablespace_name = '&ts_name'
/

Figure 2-12. This is the code to retrieve information about a specific block. This is very useful
when linking Oracle’s analysis with the application and the operating system. This code is the
core of the OSM objfb.sql script.

If we perform a few more samples, we will get a good idea of the objects the session is
waiting for and also where the objects reside. This information provides a very important 3-
circle analysis link to the application and also to the operating system.
In this example, from an application perspective, we should expect to find SQL that is
requesting blocks from the orders table. Look for that SQL. You will find it. It must be
there, because if it were not run, it would be impossible for a session to be waiting for a block
in the orders table. From an operating system perspective, if the average wait time is
significant, we should expect the device where the associated database file resides to be very
busy, or the path from the server process to the database file to be hindered in some way. This
example demonstrates that with only limited but very specific information, you can make a
pinpoint diagnosis.
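
For example, a crude but often effective way to hunt for that SQL is a simple text match
against the library cache. This is only a sketch (the orders reference could be hidden in
a view or recursive SQL, and the sql_id column requires Oracle Database 10g or later):

select sql_id, executions, disk_reads,
       substr(sql_text, 1, 60) sql_text_start
from v$sql
where upper(sql_text) like '%ORDERS%'
order by disk_reads desc;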
Now suppose we want to gain a better understanding of the other sessions waiting for
multiblock reads. Once again, we can reference v$session_wait. But instead of selecting
a specific session, we list all the sessions waiting on a given wait event—in this case, db
file scattered read. Figure 2-13 shows such a query, which simply uses the wait
event like a command-line parameter. At the specific moment of the query, there were four
sessions patiently waiting for the blocks to be retrieved from the operating system. Sampling
the file and block numbers while running a script like objfb.sql (shown in Figure 2-12)
will help build a strong connection between all three circles in a 3-circle analysis.

SQL> @swswp db%scat

Database: prod3 28-MAR-10 07:04pm


Report: swswp.sql OSM by OraPub, Inc. Page 1
Session Wait Real Time w/Parameters

Sess
ID Wait Event P1 P2 P3
----- ---------------------------- ------------ --------- -----
200 db file scattered read 4 6155 16
201 db file scattered read 4 43005 16
205 db file scattered read 4 1419 16
206 db file scattered read 4 49789 16

4 rows selected.

SQL> l
1 select sid, event,
2 p1, p2, p3
3 from v$session_wait
4 where event like '&input%'
5 and state = 'WAITING'
6* order by event,sid,p1,p2
SQL>

Figure 2-13. The v$session_wait view can also be used to list wait event details about
all sessions waiting on a specific event. In this example, all four sessions are waiting for the
operating system to return 16 blocks.

If you understand the instrumentation concept, are familiar with the three core wait
event views, and use a methodical diagnosis framework, you are prepared to jump to the next
level, which is performing an ORTA. Obviously, you need an understanding of what the wait
events are telling you. Otherwise, you are like a doctor who hears what the patient is saying,
but doesn’t know enough to do anything about it. The upcoming chapters will provide you
with the knowledge you need to take a stellar diagnosis and derive solutions that will directly
attack the problem.

Oracle Time Classification


What is your response when someone asks you what you do for a living? Perhaps you simply
say you work “in IT” or “with Oracle databases.” It’s very unlikely that you say something
like, “Well, it’s interesting you should ask. Today, I diagnosed a very complex Oracle
database system. And you know what? I discovered that 75% of all the Oracle session wait
time was because of latching issues. Upon closer inspection, I found the latch in question was
the CBC latch! It makes sense, you know, because the server was completely CPU-bound,
and there were two SQL statements generating tons of logical IO…”


Most of us would guess at just how familiar the person asking the question is with IT,
and more specifically with Oracle systems, and then make our first shot at explaining what we
do. I usually start at a high level with something like, “I work with computers.” If the person
seems interested, I slowly work my way into more detail until I can tell I’ve reached his
technical limit or he simply doesn’t care to know more.
My point is that to effectively communicate technical topics, we must understand our
audience and our objectives. And all this must occur without misleading or being
condescending.
From Chapter 1’s description of ORTA, you know that response time is the sum of
service time and queue time. From the previous discussion of the wait event views, you know
that through Oracle’s wait interface, you can collect very detailed queue time information.
Oracle also allows service time collection with some level of breakdown. We can combine
our objectives to simplify and to effectively communicate response time by categorizing the
detailed time components.
Figure 2-14 illustrates various levels of Oracle-focused time classification. This
information could represent the results of an entire Oracle system over a one-hour period. It
could also represent the results of profiling a single transaction, a single session, or perhaps a
group of sessions. Regardless, the diagnostic and communication benefits provided by such
time classification are significant. The benefit increases when combined with a little information,
such as available CPU capacity, workload activity, and a response-time graph.
Notice there are a number of classification levels, allowing you to pick the most
appropriate for the given occasion. When combined with Oracle’s percentage of the operating
system CPU consumption, level 2 provides insights into the CPU subsystem utilization,
without running a single operating system command! You’ll learn more about this in Chapter
5. When talking with management, I usually work at level 3, and if the managers can grasp
more detail, I’ll go to level 4. When talking with DBAs and devising solutions, I’ll be
working at level 5.
The ability to summarize complexity to enable a broader understanding is very desirable.
The more senior the DBA, the more he tends to realize this challenge and value this skill.
Acquiring technical prowess is required, but if you want to continue moving forward in your
career, you must be able to transform very complex topics into seemingly very simple
concepts. This is where ORTA and, more specifically, time classification shine.
For example, one way to explain the performance situation is to say, “Oracle’s wait time
is composed of 55% db file scattered reads, 35% db file sequential
reads, and 10% various wait events.” A more effective way to start the conversation would
be to say, “Oracle response time is all about IO. And if you look into the details, you’ll find
that reading multiple blocks per request from the IO subsystem is what’s really hurting
performance.” This simpler approach is not misleading and should not be
condescending, yet it communicates exactly what needs to be understood.
As DBAs, we have many ways to classify time, which gives us many options when
communicating concepts to a variety of audiences. I personally use a very simplistic
classification scheme for both service time and queue time. Oracle began classifying its wait
events in Oracle Database 10g, which uses 12 classifications; 11g uses 13. For performance
firefighting, I find it much more useful to keep the number of classifications to around 4.


Figure 2-14. Various levels of Oracle-focused time classification. The contents could be the
result of a one-hour snapshot of all Oracle activity or the result of profiling a specific Oracle
transaction. There are numerous classification schemes. The schemes shown here are
exceptional for firefighting purposes. When combined with a response-time curve, the
communication impact is phenomenal.

Here, I’ll introduce queue time classification and service time classification, discussing
how these classifications can be used and their data sources. You’ll find more details about
these topics in Chapter 5.

Queue Time Classification


The objective with time classification is to improve our diagnosis and the ability to
communicate the results of our analyses. Use just enough complexity to get the job done. In
most cases, four or five queue time classifications are all that are needed. One level up from
the wait events themselves, I typically use only three classifications: other, IO reads, and IO
writes. If the issue is a combination of similar other category wait events, I’ll collapse those
into a single “sub-other” category. For example, in Figure 2-14, notice that I broke down the
other category into latching and non-latching. With just a few categories, I am able to pinpoint
the problem, perform an ORTA, and complete a 3-circle analysis, and I can communicate the
results to both technical and nontechnical people. All queue time information was gathered
from the Oracle wait event views.
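
As a sketch of the classification idea (the OSM scripts actually use a reference table,
described in the “OraPub’s Response-Time Analysis Reports” section below), a few like
patterns can roughly bucket the system event view into IO reads, IO writes, and other;
time_waited is in centiseconds, so dividing by 100 yields seconds:

select case
         when event like 'db file%read%' then 'IO reads'
         when event like '%write%' then 'IO writes'
         else 'other'
       end time_category,
       round(sum(time_waited)/100, 1) tot_wait_sec
from v$system_event
group by case
           when event like 'db file%read%' then 'IO reads'
           when event like '%write%' then 'IO writes'
           else 'other'
         end
order by tot_wait_sec desc;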


Figures 2-15, 2-16, and 2-17 (shown below) are parts of an instance-level ORTA report
based on a 120-second interval. Looking at the report in Figure 2-15, you can see the queue
time is classified at different levels. While total queue time is 621 seconds, the next level is
simply IO time (615 seconds) and non-IO time (6 seconds). If you work down the report and
to the other figures, you’ll notice the IO queue time is classified as read time (586 seconds) and
write time (30 seconds). Finally, all the queue time classifications result in the actual wait
events with their associated time. Chapter 5 will give you a deeper understanding of queue
time classification, including how being too focused on wait events can lead to a
misdiagnosis.

Service Time Classification


While Oracle’s wait time instrumentation is very comprehensive, this is not the case with
service time; that is, CPU consumption or CPU time. For versions before Oracle Database
10g, service time is based on the v$sysstat and v$sesstat views, which indicate total
consumption, as well as all parse time and recursive SQL CPU time. Systems based on Oracle
Database 10g and later can take advantage of the v$sys_time_model and
v$ses_time_model views, which provide more accurate timing and an additional
background process CPU time category.
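
For example, on Oracle Database 10g and later, a minimal sketch for pulling the two main
CPU categories from the system time model follows; the values are cumulative microseconds
since instance startup, so an interval analysis must difference two snapshots:

select stat_name, round(value/1000000, 1) cpu_sec
from v$sys_time_model
where stat_name in ('DB CPU', 'background cpu time');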
While Oracle does record CPU consumption related to parsing and recursive SQL, they
can overlap each other, which muddies our timekeeping and reduces their usefulness. Parsing
CPU consumption includes all parse time CPU consumption from all SQL, including server
processes and background processes. Even parsing CPU consumption from recursive
statements is included. Recursive CPU consumption also includes all recursive CPU
consumption, regardless of whether the SQL originated from a server process or a background
process. This also includes recursive CPU time involved with parsing. This means there is a
crossover in the time accounting. As a result, we cannot classify CPU consumption (service
time) into the clean and separate buckets of parse CPU time, recursive CPU time, and
everything else (other).
A few words about recursive SQL are in order here. While I typically look at recursive
SQL as anything that a DBA or application developer did not type, Oracle’s internal
definition is stricter and actually means that even application SQL can consist of a significant
amount of recursive SQL. Oracle’s internal definition of recursive SQL is that, when traced,
the depth is greater than zero. If you look at a trace file, you will notice each statement has a
reference to its depth, such as dep=0. A simple SQL statement you enter or a simple PL/SQL
loop you enter into SQL*Plus will have a depth of zero. However, if you place a SQL
statement within a PL/SQL loop that gets executed 500 times, those 500 executions will have
a depth of one and will be considered recursive SQL. This is why the recursive figures can
seem higher than you might expect.
Figure 2-15 (shown in the next section) shows the total service time of 10 seconds. The
service time classification is simply server process (SP) time (9 seconds), background process
(BG) time (1 second), all parse CPU time (1 second), and all recursive SQL time (0 seconds).
Most Oracle systems will consume CPU primarily in the server process (SP) category.
Chapter 5 provides collection details and the associated service time classification math.
Oracle’s Statspack and AWR reports provide the detailed response-time analysis
data. However, the reports are clearly not focused on ORTA. In fact, part of the challenge of
using Oracle’s reports is knowing what not to focus on, so you can avoid needless hours
immersed in unproductive work. Once you do a few ORTAs based on these tools, you’ll
discover that performing such an analysis doesn’t take all that long.

OraPub’s Response-Time Analysis Reports


Because OraPub is focused on Oracle performance management, ORTA is obviously
foundational. Over the years, various reports and tools focused on response time have been
developed and included in the free OSM tool kit. Three key reports are a session-focused
report (rtsess.sql) and two instance-focused reports (rtsysx.sql, rtpctx.sql).
Figures 2-15, 2-16, and 2-17 are actual output from the rtsysx.sql report on a test system.
Figure 2-18 is an example from the rtsess.sql script.
As I introduce these reports here and explore their secrets in Chapter 5, pay close
attention to where the same information can be retrieved from the tools you currently use. I
have discovered that nearly all Oracle diagnostic tools and products have the information you
need to perform an ORTA. It’s just that some products make it easier than others.
In the OSM tool kit, by default, the categories are bogus (events usually not relevant to
performance analysis), ior (IO read events), iow (IO write events), and other (everything else).
Part of the tool kit installation routine runs a separate script (event_type.sql) that loads
a reference table containing each wait event and its associated category. When I run response
time-based reports, I simply reference this table to appropriately categorize the time. If you
want to add a category or change a wait event’s category, just modify the
event_type.sql script and rerun it. From that point on, all the OSM ORTA reports will
be based on your preferred categorization scheme. This flexibility also allows you to easily
categorize new wait events or fix misclassifications.

Instance-Level ORTA Reporting


Figures 2-15, 2-16, and 2-17 show the output from OraPub’s instance-level (what I usually
call system-level) interval ORTA report. The report is based on the rtsysx.sql script,
which captures instance-wide response-time details for a specific time interval. As Figure 2-
15 shows, the script starts by taking a snapshot of instance-level statistics (v$sysstat and
v$sys_time_model) and instance-level wait-event statistics (v$system_event).
During the 120-second interval, the script wakes up every 10 seconds, queries the active SQL
from v$session, and stores the currently running SQL_ID. At the end of the report
duration, another statistics snapshot is taken, the differences in time calculated, and the report
produced. Nearly all this information can be gathered from a Statspack or AWR report. The
benefit of using the rtsysx.sql script is that the output is formatted for a quick ORTA.
Throughout this book, I will explain the various parts of the report. For now, I will just
provide a broad overview.
The report consists of multiple sections, each focused on a particular ORTA component:
• The first part focuses on understanding the workload.
• The second part focuses on the high-level response time categories.
• The next sections focus on the IO situation and then the non-IO situation. The SQL
that was identified during the report interval has its key performance attributes also
displayed, providing an understanding of what SQL was involved in the workload,
service time, and wait time situation.
• The next section helps identify SQL that is not using bind variables.
• For Oracle Database 10g and later, an additional section shows the operating system
CPU utilization.
As you can see, this report pulls together a lot of information, consolidating and
formatting it to give analysts a jump-start in their ORTAs.

Part 1: Workload Metrics


The first section shown in Figure 2-15 provides a number of workload metrics common to the
Load Profile section in both Statspack and AWR. This information is useful when comparing
different response-time snapshots. Keep in mind that a decrease in the response time is to be
expected if the workload decreases. To ensure you don’t fall into the trap of thinking the
system is performing better, you need to know the snapshot workloads.

Part 2: Response Time Summary


The second section is the response time summary. This provides a high-level view of the
response time components. In Figure 2-15, the total service time is 10 seconds of CPU, and
the total wait time is 621 seconds. Written another way, during the 120-second interval,
Oracle processes consumed only 10 seconds of CPU time and waited 621 seconds. The wait
time is classified into IO time and non-IO time. In Figure 2-15, you can see the IO wait time
is 615 seconds and the non-IO wait time is only 6 seconds. Clearly, Oracle processes are
spending a significant portion of their time waiting for blocks outside the Oracle buffer cache!
A key statistic shown in the rtsysx.sql report is the Oracle CPU
utilization. During the 120-second interval, Oracle processes consumed only 8% of the
total CPU available. This statistic is simply the Oracle CPU consumption divided by the host
CPU time capacity. Figure 2-15 shows that Oracle processes consumed only 10 seconds of
CPU. The total CPU capacity the host can provide is the number of CPU cores multiplied by
the report interval. In this case, there is a single CPU core and the interval is 120 seconds,
resulting in a total CPU capacity available of only 120 seconds. Therefore, the percentage of
CPU consumed by Oracle is 10/(1*120), or 8.33%, which the report rounds to 8%. This simple statistic gives us a
surprisingly useful understanding of this instance’s host CPU impact. If this is the only
instance running on the host, it also provides an operating system CPU utilization picture,
without executing an operating system command. In this situation, this is the only instance;
therefore, we would expect the operating system CPU utilization to be a few percentage
points higher than 8%.
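
Here is the utilization math as a sketch. The 10 seconds of interval Oracle CPU consumption
is hard-coded for illustration (in practice it is the delta between two snapshots, as
rtsysx.sql computes), and the core count comes from v$osstat (the statistic is
NUM_CPU_CORES on recent releases; some releases expose only NUM_CPUS):

select round( 100 * 10   /* interval Oracle CPU consumed (sec) */
            / ( (select value
                 from v$osstat
                 where stat_name = 'NUM_CPU_CORES') * 120 /* cores x interval (sec) */
              ), 2) oracle_cpu_util_pct
from dual;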

SQL> @rtsysx 120 10
OraPub's Response Time Analysis (RTA) interactive instance level delta report

Initializing response time delta objects...

Sleeping and probing active SQL for next 120 seconds...

Done sleeping...gathering and storing current values...


*** Workload Metrics
RT Ratio Ora Trx/s Block Changes/s User Calls/s Execs/s
-------- ---------- --------------- ------------ ------------
0.985 0.04 2.19 1.09 3.08
*** Response Time System Summary (delta - interactive - instance level)
Tot CPU CPU SP CPU BG CPU Recur CPU Parse Tot Wait IO Wait Other Wait
Time Time Time Time Time Ora CPU Time Time Time % IO % Other
(sec) (sec) (sec) (sec) (sec) Util % (sec) (sec) (sec) Wait Wait
---------- --------- --------- --------- ---------- ------- ---------- --------- ---------- ------ --------
10 9 1 0 1 8.0 621 615 6 99 1
*** I/O Wait Time Summary w/Event Details (delta - interactive - instance level)
IO Wait IO WRITE IO READ
Time Wait Time Wait Time % IO % IO
(sec) (sec) (sec) Write Read
-------- --------- ---------- ------ -----
615 30 586 5 95
Tot Call Avg Call
Wait Time Wait Time
IO Wait Event R,W % (sec) (ms) Tot Waits
------------------------------------------------------- --- ----- ----------- ----------- ----------
db file scattered read (User I/O) R 64 395.34 35.93 11,003
read by other session (User I/O) R 30 182.77 97.17 1,881
Figure 2-15. The first sections of OraPub’s instance-level ORTA report, reporting over a 120-second interval. The sections shown here
are the initialization details and the Workload Metrics, Response Time System Summary, and IO Wait Time Summary with Event Details.

*** Other Wait Time (non-I/O) Event Detail (delta - interactive - instance level)
                                                              Tot Wait    Avg Call
                                                                  Time   Wait Time
Non IO (other) Wait Event                                  %      (sec)        (ms)  Tot Waits
------------------------------------------------------- ----- -------- ----------- ----------
latch: library cache (Concurrency)                          85        5      129.49         39
enq: RO - fast object reuse (Application)                   39        2      380.00          6
latch: shared pool (Configuration)                           8        0      230.00          2
latch free (Other)                                           4        0      115.00          2
*** SQL Activity Details During Probe
Phys Rds Log Rds Tot Time CPU Time Rows Stmt
SQL ID Sec/EXE (k) (k) (sec) (sec) Sec/PIO Sec/LIO Runs (k) Sorts Type
---------------- --------- -------- -------- --------- ---------- ------- ------- -------- ------- ----- -----
ajgxt6x8dmsay 54.48 47 56 163.4 2.7 0.003 0.003 3 0 0 SELEC
bxcn1wjupmsa8 69.96 29 37 139.9 1.7 0.005 0.004 2 0 0 selec
gz5bfrcjq060u 1.80 0 0 30.6 1.1 15.297 0.072 17 0 32 INSER
2wa086qbucg7c 0.00 0 0 0.0 0.0 ####### ####### 1 0 0 selec
96g93hntrzjtr 0.00 0 0 0.0 0.0 ####### 0.000 1 0 0 selec
Figure 2-16. The second sections of OraPub’s instance-level ORTA report, reporting over a 120-second interval. The sections shown are
the Other Wait Time (non-IO) Event Detail and SQL Activity Details During Probe.
*** Similar SQL Statements During Delta

SQL Statement (shown if first 10 chars)                               Count
----------------------------------------------------------------- --------
SELECT SUM                                                                2
select sum                                                                2
*** OS CPU Breakdown During Delta
Category Percent
--------------------------------- --------
Idle 83.28
Nice 0.00
System 4.19
User 12.53
Figure 2-17. The final sections of OraPub’s instance-level ORTA report, reporting over a 120-second interval. First is the Similar SQL
Statements During Delta section, followed by the OS CPU Breakdown During Delta section.


It may seem odd that a 120-second interval can show 621 seconds of wait time. This is
very common in Oracle systems. The 621 seconds of wait time are the total wait time for all
sessions during the 120-second interval. For example, if there were 1,000 sessions, and each
session waited 1 second during the 120-second interval, then the total wait time would be
1,000 seconds. If 2,000 sessions each waited 2 seconds, then the total wait time would be
4,000 seconds. The more time processes wait, the more wait time. So one way to increase the
wait time is to keep increasing the workload, which can be represented by the number of
sessions or the work the sessions are doing.

Part 3: IO Wait Time Summary with Event Details


If IO is an issue, you will certainly want to know if it’s read- or write-focused. IO
administrators can also benefit from knowing the type of IO load (read or write; single block
or multiblock) Oracle is putting on the system. The Oracle IO-centric solutions are also very
different based on the IO load type. An IO read issue can be minimized by keeping the more
popular blocks in Oracle cache, whereas an IO write issue is minimized by core configuration
issues, such as the number and size of the online redo logs.
The second part of Figure 2-15 lists the IO-related wait events, along with their wait-
time percentage and also the average wait time. The average wait time for IO events provides
a kind of backdoor view into how the IO subsystem is performing and also helps eliminate
worthless IO-centric Oracle solutions. For example, if the average sequential read time were 1
ms, then asking the IO team for better performance could result in you being physical
assaulted. The situation captured in Figure 2-15 shows multiblock reads are taking 36 ms to
complete! That is a horrendous situation, which demands immediate attention from everyone
involved.
As another example, suppose that you receive a call about long commit times. The users
are saying “submits” or “saves” are taking longer today than yesterday. Upon an examination
of the average log file sync wait times (which indicate commit times from an Oracle
perspective), you discover there is no change from the previous day. Therefore, you know the
performance issue is not because of Oracle’s commit mechanism and most likely not related
to the database.
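
For a quick check like this, a sketch based on the cumulative averages is a reasonable
starting point; average_wait is in centiseconds, so multiplying by 10 yields milliseconds,
and comparing deltas from two samples answers “today versus yesterday” more precisely:

select event, total_waits, average_wait * 10 avg_wait_ms
from v$system_event
where event = 'log file sync';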

Part 4: Other Wait Time (Non-IO) Event Details


Figure 2-16 shows a non-IO wait time summary, along with the underlying wait event detail.
While most Oracle performance problems start out as an IO issue, after working on the
problems, many turn into memory management issues. Depending on the event mix,
sometimes I will perform an additional classification manually.
Starting with Oracle Database 10g, each latch of significance has its own wait event. If
you want to know the total latch-related time, you’ll need to sum the individual latch times.
Usually, this is not necessary, even with severe latch contention, because one latch will
dominate and be your main focal point.
In Figure 2-16, the top non-IO wait event is library cache latch contention. Is this an
issue? While 85% of the non-IO wait time is related to library cache latch contention, the 5
seconds it consumes, compared to the total wait time of 621 seconds, are insignificant.

Part 5: SQL Activity Details During Probe


To help in analyzing the application, the report captures the SQL directly affecting the
response times and shows its resource consumption. As Figure 2-16 shows, only the SQL
identifier and key performance attributes appear in the report. For performance reasons, the
actual SQL statement text is not collected.
During the reporting interval, every few seconds, the currently running SQL statement’s
identifier (SQL_ID) is captured from v$session. This part of rtsysx.sql provides the
SQL statement’s interval activity. For example, if SQL statement 123 were first captured and
had already consumed 10,000 Oracle blocks, and during the last probe, it had consumed
15,000 blocks, this probe report would show the statement consuming 5,000 blocks during the
report interval.
The SQL Activity Details During Probe section in Figure 2-16 captured a few
statements. IO read queue time dominates this ORTA; therefore, locating physical IO SQL is
extremely important to complete our analysis. The first two statements in this section have the
largest number of physical reads! It is very common for this simple sampling method to
quickly identify the key SQL statements that need special attention.

Part 6: Similar SQL Statements


The second parameter in rtsysx.sql is associated with finding similar SQL statements.
Similar SQL statements are exactly the same, except for the filtering and join conditions in
the where clause. Oracle considers these unique statements, even though they are very
similar.
Look at the rtsysx.sql command line in Figure 2-15. The second parameter is 10. If
you use 100 as the parameter, then all SQL statements captured during the report interval that
have the same first 100 characters are counted and displayed. And only statements with a
count greater than 1 are shown.
Figure 2-17 shows that only two SQL statements had the same first ten characters, and
each of these occurs only twice. So it appears that nonbind variable SQL is not an issue.
When nonbind variable SQL is an issue, this section may show hundreds of SQL statements
with the first 100 characters exactly the same. An equivalent OSM script is simsql1.sql.
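
The underlying idea can be sketched directly against the library cache; this is not the
simsql1.sql or rtsysx.sql code (rtsysx.sql works from its probe samples), just
the grouping concept using a ten-character prefix:

select substr(sql_text, 1, 10) sql_prefix, count(*) stmt_count
from v$sql
group by substr(sql_text, 1, 10)
having count(*) > 1
order by stmt_count desc;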

Part 7: Operating System CPU Utilization


Figure 2-17 shows the last section of rtsysx.sql, displaying the operating system CPU
utilization details. Starting in Oracle Database 10g, Oracle captures operating system CPU
details and makes the information available through the v$osstat view. Chapter 5 provides
more details about the v$osstat view, and Chapter 4 discusses the CPU utilization
components.
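
As a sketch, the raw counters behind a breakdown like Figure 2-17 can be queried directly;
the statistic names are platform-dependent (these are typical on Linux), and the values are
cumulative hundredths of a second:

select stat_name, value
from v$osstat
where stat_name in ('IDLE_TIME', 'NICE_TIME', 'SYS_TIME', 'USER_TIME');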

Session-Level ORTA Reporting


Profiling a session or a group of sessions is another term for session-level ORTA. There are a
number of formatting possibilities, with every report writer claiming his is the best. Figure 2-
18 shows just one option for this type of report. This format clearly supports ORTA from a
high-level classification all the way down to the wait events and service time categories.
Session-level ORTA is useful when a very specific business-centric problem has been
reported, and the underlying database interaction can be identified. Keep in mind that Oracle
and operating systems are designed and built to share resources. This implies that any
singularly transactional-focused report is the result of both the transaction under investigation
and all the other activity. No Oracle session operates untouched by other Oracle sessions,

because Oracle is built to share resources. Don’t be fooled into thinking your session-level
profile output is the result of only the processes run by the session being profiled.

Profiling a Single Session


There are times when a user’s session can be uniquely identified by its session identifier and
serial number. When this is possible, it is usually very easy to profile that session based solely
on Oracle’s performance views.
Figure 2-18 shows the result of profiling a session based on its session identifier, which
is the sid column from the v$session view. This report is run with careful user
coordination, as follows:
• The performance analyst identifies the application user’s session identifier.
• The user is told to navigate in her application to just before she runs the poorly
performing operation.
• Just before the performance analyst tells the user to execute the operation, the
performance analyst starts the collection.
• When the user’s operation is completed, the performance analyst is alerted and stops
the collection.
After a few seconds of determining the differences between the ending and starting
performance statistics, a report similar to the one shown in Figure 2-18 is created.
Looking closely at the report, notice that queue time accounts for 86% of the response
time! Now look at the queue time summary and notice that most of the queue time is
classified as Net+Client. The Net+Client time is the Oracle client process run time and
network communication between the Oracle client process and the Oracle server process. If
you spend a few minutes reviewing the report, you can see how quickly a full understanding of
the response-time situation can be captured. A session-level ORTA, combined with
identification of the SQL being executed, can lead to a surgical solution.
Figure 2-18 mirrors a consulting engagement I had a number of years ago. That
engagement presents an interesting view into why a typically filtered-out wait event can be
very useful. I received a call from a company manager who suspected a performance issue
was the result of an application he purchased from a vendor. But when he confronted the
vendor, he was told that the problem was the database server. I suggested profiling a real user
running the application. Because of the client/server architecture, once the user’s Oracle
server process was identified, it was simple to determine the Oracle session identifier. With
the session identifier known, using the rtsess9.sql script, I profiled the poorly
performing part of the application multiple times. (Interestingly, the user was on a different
floor when this was done, and all our communications were through a speakerphone.)
Key to understanding a situation similar to Figure 2-18 is recognizing that the single
underlying wait event for the Net+Client queue time category is SQL*Net message
from client. When a server process is waiting to hear from its client process, it will post
the wait event SQL*Net message from client. A server process does not know if the
user associated with the client process is waiting for the client process to complete, if there is
a problem with the network, or if the user is taking a coffee break. All the server process
knows is that it has nothing to do and is programmed to scream, SQL*Net message
from client! In a very real sense, the server process is waiting for a message via
SQL*Net from its client process. So, in this case, the wait event is actually very descriptive.


SQL>@rtsess9 204
...
Session level response time details for SID 204

*** Response Time Summary

Response Service Queue Unaccount % CPU % Queue % UAT


Time(sec) Time(sec) Time(sec) Time(sec) RT RT RT
[rt=st+qt+uat] [st] [qt] [uat] [st/rt] [qt/rt] [uat/rt]
-------------- --------- --------- --------- ------- ------- --------
12.50 1.50 10.75 0.25 12.0 86.0 2.0

*** Queue Time Summary

QT QT QT
Queue Time(sec) I/O(sec) Net+Client(sec) Other(sec)
[qio+qnc+qot] [qio] [qnc] [qot]
--------------- -------- --------------- ----------
10.75 1.50 7.50 1.75

*** Queue Time IO Timing Detail

QT QT QT
I/O(sec) Write I/O(sec) Read I/O(sec) % Writes Time % Read Time
[tio=wio+rio] [wio] [rio] [wio/tio] [rio/tio]
------------- -------------- ------------- ------------- -----------
1.50 0.00 1.50 0.0 100.0

*** Queue Time IO Event Timing Detail

Wait Time
Wait Event Name (sec)
---------------------------------------- ---------
db file scattered read 1.35
read by other session 0.15

*** Queue Time Other Event Timing Detail

Wait Time
Wait Event Name (sec)
---------------------------------------- ---------
latch: cache buffers chains 1.21
latch: library cache 0.53
latch: cache buffers lru chains 0.01
...

Figure 2-18. Report output from a session-level ORTA based on a single and identifiable
Oracle session. The session is waiting primarily for information from the client process,
which leads the analyst to suspect issues with the client program or the network
communication between the client and server process.

Here's the strange part: both the application user and the Oracle server process are
waiting! That is very unusual indeed. If the user has executed a command and is waiting for
the command to complete while, at the same time, the associated Oracle server process is
waiting to hear from its client process, then there is a problem between the two. The two broad
problem areas are the network and the client process.


Referring back to my customer, the problem was indeed the client process. We checked
the network, and multiple tnsping executions resulted in a very fast and dependable
network. I asked my customer what the application was doing. The answer was something
like, “Well, it’s doing all sorts of very advanced processing related to inventory
management.” I responded, “I bet it’s doing a lot of advanced processing, because that’s what
you’re waiting for!” Armed with this simple session-level response time profile and the
underlying SQL, the manager was able to approach the vendor with full confidence that the
problem was clearly focused on the application. And guess what? The vendor listened and
fixed the problem!

Profiling a Group of Sessions


Modern-day Oracle architectures sometimes make identifying specific Oracle sessions
impractical, if not impossible. Due to multitier architectures, identifying a specific user's
activity may simply not be worth the effort. However, the client identifier column in
v$session can be used to identify a user or, perhaps just as useful, a group of users.
Figure 2-18 is based on the OSM script rtsess9.sql, which focuses on a single
Oracle session. The similar rtsess.sql script takes as its input a client identifier, which
presents the possibility of profiling a session or a group of sessions.
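As a sketch of how this can work (the identifier INV_USERS is purely illustrative), the
application tags its sessions, and the analyst then finds every tagged session:

-- Run by the application when it grabs a connection.
exec dbms_session.set_identifier('INV_USERS')

-- Run by the performance analyst.
select sid, serial#, username
from   v$session
where  client_identifier = 'INV_USERS';

A script like rtsess.sql can then sum the wait event and CPU time across all sessions
carrying the given identifier.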
If you are not confident that the end user is actively waiting for the application to
respond, then identifying a user or group of users by the client identifier still solves the
identification problem, but the Net+Client time (based on the SQL*Net message
from client wait event) becomes useless. Again, the Oracle server process does
not know if the wait is due to the network, a client process, or the user. Regardless of the
reason, it will post the SQL*Net message from client wait event. A notable
exception is when profiling batch processes. Since the Oracle client process never waits on an
application user, we know all SQL*Net message from client wait time is associated
with either the client process or the network. Even with this limitation, just as when profiling
an entire Oracle system, we can still perform a very robust ORTA.

Summary
When someone is in pain but cannot verbally communicate, diagnosing the problem is
difficult. Much like a very young patient, Oracle was limited in what it could tell us before
its source code was instrumented. Once Oracle instrumented its source code, we could
finally begin listening to it.
Oracle’s words may seem strange, but when you’ve learned what phrases like db file
scattered read mean, you will be able to pinpoint the cause of its pain and understand
how the flow of information within Oracle was disrupted. In a very real sense, Oracle grew
up, and now we can listen to Oracle speak to us like an adult.
Immediately, performance diagnosis takes on a whole new dimension. Instead of
inferring where the problem resides and spending countless hours performing diagnosis, our
time shifts toward analysis. Wait-event analysis has paved the way to response-time analysis,
enabling us to quantify the user experience and develop solutions focused on directly reducing
time, instead of increasing efficiency.
But our journey has just begun. Just like a doctor, we must also understand Oracle’s
physiology to come up with solutions. Now we will continue into the heart of Oracle, where
mysterious algorithms and data structures reside. The framework developed in this and the
previous chapter will be filled in with Oracle internal workings and how the various pieces fit
together. Only with this knowledge can the guessing stop and spot-on solutions be developed.


CHAPTER 3

Serialization Control

Oracle controls serial access to memory using latches and mutexes. When a serialization issue
develops, Oracle DBAs feel it in their gut, because they know they will need to go deep—
sometimes very deep—into Oracle internals.
Even the most seasoned Oracle performance specialists shudder at the thought of dealing
with Oracle latch contention. I have heard some of the most respected Oracle performance
specialists say that once you get latch contention, there really isn’t a whole lot you can do
about it. As you will learn, this is absolutely wrong!
Why do even the best performance specialists’ knees buckle in the face of latch
contention? Because it means understanding a great deal about not only latching, but also
about specific Oracle architecture internals and some queuing theory. You also must take
risks, like suggesting bold application changes or implementing hidden instance parameters
that few have actually tried in a real production environment. If you like technical challenges,
then you’re in for a treat!
Over my years as a consultant and a teacher, I have found that to be able to resolve latch
and mutex contention, you must first understand Oracle’s general latching and mutex
algorithms, and then understand the associated specific Oracle architecture component. So,
it’s a two-step educational process. This chapter is devoted to the first part: understanding
how Oracle serializes memory access using both latches and mutexes. This includes the
general algorithms, how multiple latches are implemented, how serialization fits into Oracle’s
wait interface and also in Oracle response-time analysis (ORTA), and how to influence latch
activity with the goal of improving response time.


Learning about Oracle serialization is full of surprises. You’ll be amazed at the insight
you receive, and you may enjoy recognizing that most people’s latching solutions are pretty
much guesswork.

Serialization Is Death
If you’re into speed and throughput, then serialization is death. What you want is
parallelization maximization. Parallelization brings high performance, because all available
resources can be used. In resolving serialization issues, our goal is to increase parallelization
whenever possible, and when serialization must exist, to minimize it!
When working on Oracle systems, you will eventually hear some poor whiner say
something like, “Oracle is a resource hog! It consumes every bit of resources it can get its
despicable hands on!” Here’s my response, “Yeah, ain’t that great! Wouldn’t it be a shame to
have a user waiting for the application to respond when there are unused computing
resources?” Shocked and taken aback, the complainer doesn’t know how to respond.
Certainly, we want efficient uses of expensive computing resources. However, when
those expensive resources are available, to not be able to use them while at the same time our
users, whom we serve, are waiting for an application to respond is selfish on our part, and
actually quite preposterous!
When a user is waiting for the application to respond and there are available computing
resources, then parallelism is being limited and serialization has raised its ugly head.
Serialization is death to performance maximization!

Serialization and Queuing


While related, serialization and queuing are distinct, and you don’t want to confuse the two.
Queuing occurs when a process waits. But a process can wait because of serialization issues,
not just because all available resources are busy. Also, there are times when all efforts to
parallelize a process have failed, resulting in idle resources while the process is running. Let’s
take a closer look at each of these situations.
Figure 3-1 can help differentiate serialization and queuing. In Figure 3-1, the outlined
circles represent CPUs. In capacity planning, we call these servers, because they service
transactions. The solid circles represent processes. The horizontal outlined rectangle
represents the single queue, where all processes enter the system.

Figure 3-1. While serialization and queuing are related, they are distinct. Diagram A shows a
classic queuing situation. Diagram B shows a serialization issue because there are idle
CPUs. Diagram C is a serialization issue because there are processes queuing while there
are idle CPUs.


Diagram A on the left side of Figure 3-1 is the classic queuing situation in which all the
servers (think CPUs) are busy servicing transactions, and there are other processes waiting to
be serviced. This represents parallelism, as all available servers are involved in processing
transactions.
Diagram B in Figure 3-1 is a very common but unfortunate situation. In this case, there
is a single process running on a single server (think CPU), while the other three servers remain
idle. This is the situation that causes some initial debate in my classes. Some may feel this is
not a problem because, “That’s just the way it is.” But others may feel that it’s a problem
because the user associated with the running transaction is waiting, even though resources are
available. While it is true this may just be “the way it is”—the best our technology allows—it
is also true that parallelism is lacking, resulting in a serial process. As I’ll explain in the next
section, a tremendous amount of effort and expense go into reducing the chances of the
diagram B situation occurring.
Diagram C in Figure 3-1 is extremely unfortunate and will likely be reported to the
operating system vendor. It does happen, although rarely. In this situation, for some reason,
the next process waiting in the queue cannot be assigned to either of the idle servers (think
CPUs). There is a serialization issue, and parallelism is being thwarted.

Everyone Gets Involved


While technical people don’t usually think of it this way, it’s an interesting human
characteristic to desire parallelization. Anyone driving on a busy road instinctively knows that
adding lanes will speed up traffic. Most people are not aware of it, but such a solution
increases parallelism—allowing more work to be done in a given period of time.
When businesspeople can’t find a single computer to run their business on, they will ask,
“Can we run each division on its own computer?” Without knowing it, they are asking if they
can parallelize their divisions’ IT operations.
As DBAs, when faced with a large, time-consuming data import, we instinctively ask,
“Can we break up the import by users, and then run them all at the same time?” We are trying
to parallelize!
And Oracle gets involved as well. It has multiple background processes running in
parallel, Oracle parallel query, Oracle Parallel Server (excuse me, I meant to write Real
Application Clusters), and distributed queries.
Operating system vendors try to take a single process and break it up into multiple
smaller processes so performance is not constrained to a single process. In effect, they are
trying to transform the situation from diagram B in Figure 3-1 to diagram A, in the hopes of
increasing performance. And I’m sure the CPU vendors themselves are highly involved in
reducing serialization.
My point is that everyone gets involved in reducing serialization and maximizing
parallelization. It’s not only a technical issue, but can also be a business issue.
However, there are times when all the time, money, creative thinking, and technical
prowess come to a grinding serialization slowdown. In the Oracle world, this can occur when
accessing memory. When a process wants to access Oracle memory structures, Oracle must
ensure complete serialization control. It may be able to parallelize if multiple processes want
to look at a memory structure, but it still must have full control. Memory serialization control
is what latching and mutexes are all about, and why this chapter is so important to Oracle
performance firefighting.


How to Detect and Resolve Contention


Detecting and resolving Oracle latch and mutex contention is straightforward. However, I do
not mean to imply that it's simple. As you'll see, you not only must understand how Oracle
uses latching and mutexes, but also the underlying memory structures. And this means you
must learn about Oracle internals. I've distilled this process to seven key steps:

• Understand the general latch and mutex algorithms. This is what this chapter is
all about. If you don’t understand the fundamentals of serialization, your solutions
are pretty much guesses—you honestly don’t know if they will help improve
performance.
• Detect significant latch and mutex contention. Every production Oracle system
contains latch and mutex contention. If this were not the case, Oracle would be out of
business, and nonconcurrency control products would be used instead. The issue is
not whether contention exists; it’s whether contention is a performance problem. As
you’ll see, Oracle’s wait interface tells us if there is enough serialization to prompt
us to act. (A minimal query sketch follows this list.)
• Determine the problematic latch or mutex. With rare exception, the best solutions
are latch- and mutex-specific, which means they are not general, but target a specific
latch or mutex. Knowing the contention exists is not enough for a pinpoint diagnosis.
You need to determine specifically which latch is the problem. Again, Oracle’s wait
interface provides us with this information.
• Understand why the latch or mutex is being requested so often. Ask yourself,
“Why is the latch so popular? Why are processes so interested in acquiring this
specific latch?” Answer the question, and try to discover how you can influence the
situation so the latch is not being requested so often. To do this, you must understand
Oracle’s general serialization algorithm, the specific latch with contention issues,
and the underlying memory structure and associated Oracle kernel code. The
underlying memory structure and associated Oracle kernel code for the most
common latching issues are presented in Chapters 6, 7, and 8, which cover Oracle
internals.
• Understand why the latch or mutex is being held so long. Ask yourself, “Once the
latch has been acquired, why is Oracle holding it so long?" Answer that question, and
then ask yourself how you can influence the situation to reduce the time the latch
must be held. As with the previous point, answers to this question are the focus of
Chapters 6, 7, and 8.
• Determine multiple resolution strategies. Surprising to most DBAs is that once the
preceding points have been addressed, you will actually be able to arrive at multiple
solutions. And these solutions will not be guesses or even good guesses. They will
absolutely impact performance for the better!


• Take appropriate action. This step deserves its own bullet point because usually
one of the solutions involves changing a hidden instance parameter, which typically
requires the Oracle instance to be cycled. I don't take this lightly. Most production
systems nowadays can't simply be cycled. You will need to prioritize the solutions
and plan their implementation based on expected performance improvement and
uptime requirements. Quantitative solution analysis is the focus of Chapter 9 and
also OraPub’s Advanced Oracle Performance Analysis class.
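Here is that minimal detection sketch. It is not an OSM script; it simply ranks
serialization-related wait events since instance startup (time_waited is in centiseconds):

select event, total_waits, time_waited
from   v$system_event
where  event like 'latch%'
   or  event like '%mutex%'
order  by time_waited desc;

Keep in mind that interval-based reports, like those shown later in this chapter, are far
more meaningful than these instance-startup cumulative figures.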
As you can see, understanding how Oracle latches and mutexes work is just the
beginning. But it is where you must start to resolve the most notorious of all Oracle
performance problems, so it’s worth spending your time to dig deep into the details.

Fundamental Protection Requirements


Oracle systems have many protection needs. Besides the obvious security requirements, there
are control requirements as well. Two broad categories of control are relational structures and
memory structures. Latches and mutexes are firmly aligned with memory control.

Relational Structure Control


Another name for relational control is lock management. For example, when the command
lock table employee exclusive is issued, that’s relational control.
Locks prevent an inappropriate change from occurring. When an Oracle process is
waiting for a lock, it posts an enqueue wait. I like to say using enqueues is very mature and
boring because they are orderly, structured, and just not a lot of fun. (You’ll see that latches
are just the opposite.)
The two broad locking areas are application locks and data dictionary locks. Locking the
employee table is an application lock example. Application locks are under the control of
application developers. They write the code to issue the locks. On the other hand, data
dictionary locks are under the control of Oracle kernel code developers.
Most people don’t think about this, but Oracle has an extremely response time-critical
application. This application is managing the data dictionary! Just like an end-user
application, Oracle’s data dictionary is composed of tables, indexes, and sequences. This
means locks must be used to ensure an inappropriate change does not occur. Just as
application developers must make sure application locks are employed correctly, Oracle
kernel code developers must ensure they appropriately lock their relational structures. If they
mess this up, massive enqueue waits will be posted! Oracle gave the locks it uses special
names that give us a clue about what operation is being performed. For example, a high water
enqueue is related to Oracle adjusting the high water mark.1
When DBAs think of enqueues, they typically think of data dictionary issues, but
enqueue waits are related to both application lock issues and data dictionary lock issues. In
releases before Oracle Database 10g, the wait event was simply enqueue wait, and we
needed to look in the v$session_wait view to determine the actual enqueue. But starting
with Oracle Database 10g, the actual enqueue is part of the wait event name. For example, the
wait event name could be enq: TX - row lock contention. You can tell the enqueue
is related to an Oracle data dictionary object or an application object by the enqueue type. For
example, the high water enqueue (HW) or the space transaction enqueue (ST) are specifically
related to an object’s high water mark being updated and dictionary-based free space
management, respectively. As DBAs or application developers, we can’t directly control or
manage high water mark settings or dictionary free space—that’s Oracle’s responsibility. But
other enqueues like application table and row locks are well within application developer
control.
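As an illustration only (reusing the employee table example from above), here is an
application-level lock that surfaces as an enqueue wait when another session conflicts with it:

-- Session 1 locks the table.
lock table employee in exclusive mode;

-- Session 2 attempts a conflicting change and posts an enqueue wait
-- (in Oracle Database 10g and later, something like
-- enq: TM - contention) until session 1 commits or rolls back.
update employee set salary = salary * 1.10;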
Enqueues are relatively expensive, because order must be maintained at all times. Just as
the name implies, requests for access must be enqueued and dequeued. To make matters
worse, there are various locking degrees, such as exclusive, shared, share for update, and so
on. While the time required to manage enqueues is deemed acceptable for managing relational
objects, for managing memory structures, it simply takes too long. So another strategy is
needed, and that's where latches and mutexes get involved!

1 Each Oracle table has an associated high water mark. Going from bottom (row one) to top, the high water
mark points to the topmost block that has ever contained a row. When performing a full table scan, Oracle processes
know never to scan above the high water mark—it just doesn't make sense to do so. The high water mark is stored in
both Oracle's data dictionary and in the table's header block. If you do a block dump on a table's header block, the
high water mark is very plainly shown.

Memory Structure Control


Latches and mutexes are deeply involved with memory structure control. In fact, they are both
small memory structures themselves! Memory structure access must be exceptionally quick.
For example, perhaps a server process wants to know if a single block is in the buffer cache.
If a SQL statement must touch 5,000 blocks, and each "Is the block in the buffer cache?"
question takes 1 ms to answer, it will take 5,000 ms (5 seconds) just to determine
if the blocks are in Oracle's cache!
asking the operating system for the blocks if they are not in the cache, and other cache
management requirements (discussed in Chapters 6, 7, and 8). So, every millisecond counts
with memory access. Oracle goes to great lengths to reduce the number of times a latch or
mutex is requested, and also in optimizing the code so that once a latch or mutex is acquired,
it’s not held long.
When relational objects are locked, it’s like the locking process is tightly clutching the
object and telling others to back off. In contrast, memory structure control is more like
acquiring a token, which is the control structure. Once the process has the control structure, it
is then, and only then, allowed to access the memory structure. So, memory structure access is
indirect. First, you must acquire the control structure, and then you can run the kernel code
that accesses the memory structure. A good mental model of this is to think of a latch or
mutex like the baton in a track race relay. Once you have the baton, you are free to run!
Figure 3-2 shows the three main components: kernel code modules, control structures,
and the control structure’s associated memory structure. The control structures could be either
latches or mutexes. Figure 3-2 shows that module A has acquired control structure CS101. So,
module A is now free to run the kernel code that accesses memory structure MS1. Module Z
has acquired control structure CS411 and is authorized to run the kernel code to access
memory structure MS3. Both module B and module X are attempting to acquire control
structure CS411, because they want to access memory structure MS3. However, module Z has
acquired control structure CS411, and so both module B and module X must wait.


Figure 3-2. A representation of the relationships between code modules, control structures,
and memory structures: Module B and module X are contending for control structure CS411,
which is currently being held by module Z. Each memory structure has an associated control
structure. Before a memory structure can be accessed, the code module must first acquire the
associated control structure.

It is important to notice that control structure CS411 has ensured serial access to
memory structure MS3. Said another way, without control structure CS411, module B,
module X, and module Z could simultaneously be running and accessing memory structure
MS3. If any of these three modules makes a change to the memory structure, the results could
be memory corruption! As you can see, latches and mutexes play an important role in
controlling and ensuring serial memory access.
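You can watch this situation on a live system. The v$latchholder view lists the
sessions currently holding a latch, much like module Z holding control structure CS411 in
Figure 3-2:

select pid, sid, laddr, name
from   v$latchholder;

On a healthy system, this view is usually empty or nearly so, because latches are held for
extremely short periods.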
Let’s look at this from more of a kernel code perspective. Figure 3-3 represents a kernel
code section. When I worked for Oracle, I rarely saw pure Oracle database server kernel code.
One of those rare times was when I was talking with a developer about the redo allocation
process. While I don’t remember the conversation details, I distinctly recall seeing code
similar to Figure 3-3. There was a very simple conditional statement followed by a large
chunk of highly indented code. I was shocked at the simplicity and straightforwardness of the
latch request (in this situation). I asked the developer if asking for the latch is really this
simple. With a puzzled look on his face, he replied, “Well yes, of course it is.” Although I was
a little embarrassed, it was a good lesson I’ll never forget.


if get_control_structure(name, type, mode)
{
    blah, blah, blah
    release_control_structure(name);
}

Figure 3-3. An obvious example showing that unless the control structure is acquired, code
cannot access the memory structure associated with the control structure.

One key point to notice in Figure 3-3 is that the control structure must be acquired.
There is no such thing as a valiant try or everyone is a winner when asking for the control
structure. You either successfully acquire the control structure or you do not. And you cannot
execute the kernel code to access the control structure’s associated memory structure until you
have acquired the control structure.
Both latches and mutexes can be used to pin a memory structure. An object is pinned to
ensure it is not deallocated or removed. In contrast, a lock is used to prevent an inappropriate
change. For example, if a buffer in the block buffer cache is being changed, it is first pinned
to ensure another process does not replace the buffer with another block from disk. Multiple
processes can pin the same object, and the object is considered pinned if at least one process
has it pinned. Another example is related to a cursor in the library cache. When a SQL
statement is being executed, its cursor is pinned to ensure its memory is not deallocated and
reallocated for some other shared pool purpose. Pinning must occur extremely quickly.
Here are a few miscellaneous bits of information about latches and mutexes:
• Oracle continues to improve latches and mutexes. For example, in Oracle Database
10g Release 2, Oracle uses less memory per latch.2
• Starting in Oracle8i, latches can be acquired either as shared or exclusive. To
exclusively acquire a latch or a mutex (available starting with Release 10.2), there
can be no shared holders.
• While latches and mutexes ensure serialization, there is no ordering or queuing
involved when a process is attempting to acquire the control structure. This is in
distinct contrast to enqueues. As you’ll learn in the next section, it is actually quite
humorous how processes contend for a latch or mutex.

Oracle Latch Specifics


A few years ago, a separate chapter section on latching would not have made sense. But in
Oracle 10g Release 2, mutexes were introduced to complement latches. Although latches and
mutexes serve the same purpose, they are different enough to warrant their own section. But
don’t forget that latches are not an Oracle invention. Oracle has numerous patents and also
acquires know-how by acquiring other companies. But like other database vendors, Oracle
uses the term latch to describe a memory control structure.

2 I am in no way implying Oracle sincerely cares about how much memory its database system requires.


How Multiple Latches Are Implemented


It’s pretty straightforward how a single latch can be used to control serial access to a memory
structure. But, as we all know, when multiple processes need to access
the same memory structure, contention eventually results.
only one, process can hold the latch at any given time. This means that significant contention
for the latch is very possible. Once again, our need to increase performance draws us toward
parallelism. In simplistic terms, this requires multiple latches (sometimes called child
latches). And once two latches exist, we also must have a facility to ensure that the two latch
holders are not corrupting Oracle’s memory structures.
One solution is to have a master latch coordinating and controlling many slave latches, but
that would mean additional kernel code for coordination. Time spent on latch acquisition is
not for actually accessing the underlying memory structure, but rather for gaining the
authorization to do so. It is a sad situation indeed when the performance issue resides in the
acquisition process itself, not the memory structure access. Using a master/slave latching
architecture could easily result in too much time spent acquiring a latch.
Another solution is to structure the multiple latches to take advantage of the underlying
memory structure itself. For example, if the memory structure consists of 500 linked lists,3
then instead of having one latch controlling access to all 500 lists, have 100 latches, each
related to 5 linked lists. Referring back to Figure 3-2, it would be like control structure CS101
being associated with memory structure MS1-A, MS1-B, ... MS1-n, and control structure
CS375 being associated with memory structure MS2-A, MS2-B, … MS2-n. Oracle takes this
approach with at least two memory structures: the least recently used list and the cache buffer
chains.

Least Recently Used Lists


Years ago, it was common for Oracle’s single least recently used (LRU) list to have latching
contention issues. LRU lists are used to ensure that the popular buffers remain in the buffer
cache and to help server processes quickly locate free and unpopular buffers. The LRU lists
are not used to find or locate a specific block; that is the purpose of the cache buffer chains.
One approach that can work well with a single long linked list is to simply divide the list
into multiple smaller lists, as illustrated in Figure 3-4. Each of the smaller lists will have its
own latch.

3 Abridged from Wikipedia, a linked list is defined as one of the fundamental data structures, which can be
used to implement other data structures. It consists of a sequence of nodes, each containing arbitrary data fields and
one or two references (links) pointing to the next and/or previous nodes. A linked list is a self-referential data type
because it contains a pointer or link to another datum of the same type. Linked lists permit insertion and removal of
nodes at any point in the list, but do not allow random access. Several different types of linked lists exist: singly
linked lists, doubly linked lists, and circularly linked lists.


Figure 3-4. One approach to implement multiple latches is to divide a large single linked list
into multiple smaller linked lists. Oracle did this with the least recently used (LRU) structure.

Let’s take a close look at Figure 3-4. The lines represent the linked lists, and the arrows
represent the latches. The top single LRU is protected by a single LRU latch. It's easy to
understand that, with just a few users needing to acquire the latch, significant latch contention
could occur. So, as the bottom picture in Figure 3-4 shows, Oracle divided the single LRU
structure into multiple LRU lists, each with its own LRU latch. By default, Oracle has
multiple LRU latches, and each LRU latch has one or more associated LRU lists. For
example, even for a small 250MB buffer cache in Release 1 of Oracle Database 10g and 11g,
by default, Oracle creates eight LRU latches.
However, simply dividing a large single list into smaller lists may not always work. It
depends on how the memory structure is accessed. The Oracle kernel architects must ensure
deadlock-type situations are unlikely, and if they do occur, they are recoverable. Oracle
architects should know (and we really hope they do) that this particular situation was well
suited for a simple multiple-latch solution.
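To see how many LRU child latches your own instance created, a one-line sketch like
this suffices (the latch name is cache buffers lru chain):

select count(*)
from   v$latch_children
where  name = 'cache buffers lru chain';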
We will look more closely at LRU structures in Chapter 6.

Cache Buffer Chains


The cache buffer chain (CBC) structure is used to quickly answer the question, “Is the Oracle
block in the buffer cache?” Thinking about all the Oracle systems in the world today, the
number of times this question is asked each day is mind-boggling.
The CBC is essentially a searching structure. There are many search algorithms for
Oracle to choose from, but for reasons I’ll detail in Chapter 6, Oracle chose what is known as
a hashing structure. Figure 3-5 is an example of a hashing structure.


Figure 3-5. A hashing structure can be used to quickly locate (search for) a specific object.
The memory chains shown in option A on the left are protected by a single latch, whereas
with option B, the chains are protected by three latches.

Essentially, a hashing structure is composed of a bunch of chains. When looking for an
object (think block buffer or cursor), the searcher is directed to one of the chains. Then the
searcher sequentially searches the chain, hoping to find whatever it is seeking. With Oracle’s
hashing implementation, if the object is not found in the specified chain, then the object is not
in the cache.
Option A in Figure 3-5 shows the entire hashing structure is protected—its access is
controlled—by a single latch. While this will ensure serialization, with even limited
concurrency, it will also ensure significant latch contention. As option B shows, a hashing
structure can naturally be divided into groups of chains, with each group being protected by its
own latch. Each latch in option B in Figure 3-5 will be responsible for serialized access to a
specified range of chains. In this example, the number of chains will be either three or four.
The hashing structures described in Table 3-1 will give you an idea of just how large
Oracle’s CBC memory structures can be. For example, the Oracle9i production database
contains 1.4 million chains, not 10 chains as shown in Figure 3-5. These are massive memory
structures! To reduce the likelihood that a process will be contending for a latch, based on the
number of buffers the DBA set, Oracle created 8,192 CBC latches! While this may seem like
a lot, each latch must still ensure controlled access to an average of 171 chains.
We’ll dig deeper into the CBC structure in Chapter 6, but this should whet your appetite!


Table 3-1. Selected Oracle cache buffer chain attributes. All numbers are Oracle defaults,
except the chosen block size and the chosen number of block buffers.
Oracle Release   Data Buffers   Hash Chains   CBC Latches   Chains per CBC Latch
8.1  (4KB)           76,800        153,600         1,024            150
9.2  (8KB)          699,994      1,400,000         8,192            171
9.2  (8KB)       10,163,200     20,326,433     1,048,576             19
9.2  (8KB)        1,316,273      2,632,549        16,384            160
10.1 (8KB)          327,680      1,048,576         4,096            256
10.2 (8KB)        1,203,798      4,194,304        16,384            256
10.2 (16KB)         712,545      2,097,152        65,536             32
10.2 (32KB)         749,486      2,097,152        65,536             32
11.1 (8KB)           32,256         65,536         2,048             32
11.2 (8KB)           63,776        131,072         4,096             32
11.2 (8KB)          125,559        262,144         8,192             32
11.2 (8KB)          149,475        524,288        16,384             32
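If you want to reproduce Table 3-1's hash chain and CBC latch counts on your own
instance, a sketch like the following works. It must be run as SYS, and because the x$
tables are undocumented, their layout can change between releases:

select n.ksppinm, v.ksppstvl
from   x$ksppi n, x$ksppcv v
where  n.indx = v.indx
and    n.ksppinm in ('_db_block_hash_buckets', '_db_block_hash_latches');

Dividing the bucket count by the latch count yields the chains-per-latch figures shown in
the table's last column.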

Oracle’s General Latching Algorithm


You can learn a lot by working through Oracle’s general latching algorithm. Even at the
highly abstracted view I’ll walk you through, the benefit will last your entire DBA career.
You will learn how latching appears in Oracle’s wait interface, how and if CPU time is
accounted for, why latch contention and CPU saturation are so common, and why increasing
the instance parameter _spin_count is usually a very bad idea! So let’s dig into the
algorithm one step at a time.
Figures 3-6 and 3-7 show the core latch request functions. It all starts with a process
needing to access a memory structure. The Oracle kernel code developer will know which
latch to request. For example, the developer may be working on a piece of code that needs to
find a free buffer in an LRU list. While the developer may know the latch name, if there are
multiple latches, as shown in option B in Figure 3-5, the developer might need to access
another internal Oracle structure to determine the specific child latch and its associated
memory address.


Function Get_Latch(latch_name, type)
{
  /* Immediate Request */
  If type = 'immediate'
    If Fast_Get(latch_name)
      return TRUE
    Else
      return FALSE
    End-If
  /* Willing-to-wait Request */
  Else
    If Fast_Get(latch_name)
    Then
      return TRUE
    Else
      try := 0  /* sleep-time index used by the Sleep function */
      /* The Spin-Sleep loop */
      While ( TRUE )
        If Spin_Get(latch_name)
          return TRUE
        Else
          Register_Event("latch: " || latch_name)
          Sleep(try++)
        End-If
      End-While
    End-If
  End-If
}

Figure 3-6. First part of the pseudocode for when a process asks for a latch. This figure
contains only the main Get_Latch function. Notice the distinction between the immediate
and willing-to-wait requests, and the spin-sleep loop.

Function Fast_Get(latch_name)
{
  If try_to_get_latch(latch_name)
    return TRUE
  Else
    return FALSE
  End-If
}

Function Spin_Get(latch_name)
{
  For i := 1 to _spin_count
    If Fast_Get(latch_name)
    Then
      return TRUE
    End-If
  End-For
  return FALSE
}

Function Sleep(try)
{
  sleeptime := decode(try, 0,0, 1,10, 2,20, 3,~40, 4,~80, ..., ~2000)
  sleep(sleeptime)
}

Figure 3-7. Second part of the pseudocode for when a process asks for a latch. The three core
supporting functions are depicted: Fast_Get, Spin_Get, and Sleep.


Shared or Exclusive?
Once the address of the child latch is known, the developer must also know whether the latch
should be acquired in shared mode or exclusive mode. Shared mode is likely to be acquired
more quickly, because multiple processes can share a latch, but only one process can acquire
and hold a latch in exclusive mode. If the memory structure is only to be scanned, but not
changed, then perhaps the developer can ask for the latch in shared mode. But if the memory
structure is going to be altered in some way, an exclusive mode request is required.

Immediate or Willing to Wait?


The developer also needs to know if the latch request type is immediate or willing to wait. An
immediate latch request, sometimes called a nowait request, is a single “give me the latch this
instant” request. This is commonly called a fast get, and this Fast_Get function is shown in
Figure 3-7.
Essentially, the Oracle process checks the latch’s memory address to see if another
process is already associated with the latch. A more colorful illustration is thinking of this as
if a person is trying to acquire a latch. Picture a man running into a room where that specific
latch (or child latch) is being requested and then making a single grab for the latch (think
token or baton). The man then looks in his hand and either sees the latch or does not see the
latch. If the latch has been successfully acquired the man yells, “I’ve got it!” Otherwise, he
says, “Dang! I didn’t get the latch!” What happens next is when it really gets interesting.
A willing-to-wait latch request consists of repeated fast gets. But I’m getting ahead of
myself. Figure 3-6 shows the pseudocode for the main Get_Latch request. If the request
type is willing to wait, flow jumps down to that portion of the pseudocode. Interestingly,
Oracle immediately performs a fast get—a single latch request. If that request fails, which is
highly likely in an active production system, control continues down to the spin-sleep loop.
Notice the While loop with the TRUE condition set. Unless the process is interrupted, the
Oracle process will continue spinning and sleeping until the latch has been acquired. Truly,
the process is willing to wait!

Spinning on a Latch
Once in the spin-sleep loop, a call to the Spin_Get function is made. Figure 3-7 shows the
Spin_Get function pseudocode. The Oracle process will perform repeated fast gets until
either the latch is acquired or the number of spins reaches the hidden _spin_count
instance parameter value. This is where the term spinning on a latch originates.
For years, the _spin_count instance parameter has been set to 2000 by default,
which is still the case in Oracle Database 11g. Note that Oracle does not spin for 2,000 ms,
but rather makes up to 2,000 latch fast get requests.
A good mental picture of spinning on a latch is to think of a room full of people looking
intently at the room’s center. There is a hole in the very center of the room where the just-
released latch shoots out like a rocket. All the people in the room are repeatedly yelling,
“Give me the latch!” and swinging their arms as fast as they can up to 2,000 times, hoping
that their hand will move over the launch spot just as the latch is released. When they look in
their hand, the contestants hope to see the latch. If they don’t, they continue swinging their
arms and yelling (up to _spin_count times), hoping next time they will be the lucky one.
As I wrote earlier, unlike enqueues, with latches there is no ordering, no enqueue, and no
dequeue. It’s dynamic, changing, exciting, and probabilistic. But unless the Oracle architects
made a poor choice, this should be a fast way for a process to acquire a latch. I guess it is
possible for the first people to ask for the latch to be the last people to acquire the latch, but I
have not experienced that situation.
As you can imagine, in a busy system, making repeated fast attempts to acquire a latch can
be exhausting. So after the specified _spin_count attempts have been made and the latch
has not been acquired, the Spin_Get function returns FALSE and, as can be seen in Figure
3-6, the process falls into the sleep portion of the spin-sleep loop.

Sleep Time
Figure 3-7 contains the sleep function pseudocode. Oracle dictates the amount of time the
process sleeps before jumping back into the latch acquisition room. Interestingly, the first
sleep does not contain any sleep time, but does ensure the very tight spinning loop is exited. If
the process does not acquire a latch after two traumatic spinning attempts, then Oracle does
insist the process sleep for 10 ms. From this point forward, every failed spin attempt will
result in an increased sleep time, up to 2,000 ms. This may not sound like much, but
demanding that an Oracle process stop what it’s doing for 2 seconds is a performance disaster.
Keep in mind the process is just gaining permission to access a memory structure.
Perhaps it just wants to check if a block is in the buffer cache. This time does not include the
actual memory structure access, but only the acquisition of the control structure! For example,
suppose a SQL statement touches 1,000 buffers and each “Is the buffer in the cache?” check
takes 2 seconds. Then just to get permission to answer the question—not the time to actually
scan the memory structure, to bring the requested blocks into the cache, or to perform any
other memory activity—will take 2,000 seconds! As you can see, severe latch contention has
the potential to cripple an Oracle system.
Looking closely at the Sleep function's decode statement in Figure 3-7, you will
notice a tilde (~) preceding the 40, 80, and 2000. This represents randomness. In fact,
especially in a busy Oracle system, the latch sleep time will not be exactly 40 ms, 80 ms, or
160 ms. And there is a very good reason for this!
Imagine you have a boiling cup of coffee and you need to quickly walk across a room.
As you briskly walk, you will notice an oscillation pattern develop in your coffee cup. If
conditions are not in your favor and you continue your brisk pace, the oscillation pattern will
cause some of the coffee to spill on your hand, and you will scream (or at least your eyes will
water).4 You have a couple of options. First, you could stop, and the oscillation pattern will
subside. But since this is an Oracle process illustration and there is a benchmark to reach,
slowing or stopping a process is absurd. Another very creative option is to slightly shake the
coffee cup. If done correctly, the oscillation pattern will be disrupted and subside. In a very
real sense, you have introduced some randomness or, at a minimum, disrupted a potentially
harmful situation.
Ask an operating system tuning specialist which is better: a calm and evenly paced
system or one with wild and dynamic load swings. Most will say that they prefer the calm
option. Quick and intense process scheduling activities result in large run queue variances and
increased CPU consumption. Notice anything special about the Oracle sleep times? They are
all multiples of 10 ms! This means during very heavy latch contention conditions, every 10
ms, many Oracle processes will wake up from their slumber and need to be placed back onto
the CPU run queue. Once the CPU has dwindled down the run queue, it will receive another
jolt, and so on and so on.

4 And if you live in the United States, you can sue the coffee maker because the coffee temperature was too hot,
and it is unconscionable to think an adult should make the decision whether or not the coffee is too hot to carry across
the room.
One way to reduce the impact of this oscillation-like situation is to inject into the mix
some sleep time randomness. Interestingly, Oracle lets the operating system do this! Figure
3-8 is the partial output of an operating system trace of an Oracle server process. The Oracle
system is experiencing severe CBC contention, and the CPUs are pegged at 100% busy. When
the server process is traced, Oracle Database 10g Release 1 on Linux creatively issues a
select system call to enact the sleep effect. It's unfortunate for Oracle DBAs that the
select call is issued, because it creates confusion with the Select SQL statement. But
rest assured, select is a system call kernel developers put into their code, and you can even
get a Linux manual page by submitting the man select command at the operating system
prompt. As the manual page states, the select call provides a fairly portable way to sleep
with subsecond precision. However, while the sleep parameter is in microseconds, the
operating system does not guarantee precision to the microsecond.

[oracle@localhost ~]$ ps -eaf|grep 32229


oracle 32229 1 37 21:55 ? 00:03:37 oracleprod3
oracle 32299 31992 0 22:04 pts/1 00:00:00 grep 32229

[oracle@localhost ~]$ strace -rp 32229


Process 32229 attached - interrupt to quit
0.000087 select(0, [], [], [], {0, 10240}) = 0 (Timeout)
0.099707 gettimeofday({1227236730, 692413}, NULL) = 0
...
0.000085 select(0, [], [], [], {0, 10240}) = 0 (Timeout)
0.099352 gettimeofday({1227236733, 305024}, NULL) = 0
...
0.000084 select(0, [], [], [], {0, 10240}) = 0 (Timeout)
0.010602 gettimeofday({1227236733, 316293}, NULL) = 0
...
0.000093 select(0, [], [], [], {0, 10240}) = 0 (Timeout)
0.010504 gettimeofday({1227236733, 330012}, NULL) = 0
...
0.000085 select(0, [], [], [], {0, 10240}) = 0 (Timeout)
0.010562 gettimeofday({1227236733, 342010}, NULL) = 0
...
0.000085 select(0, [], [], [], {0, 10240}) = 0 (Timeout)
0.011049 gettimeofday({1227236734, 691800}, NULL) = 0
...
0.000085 select(0, [], [], [], {0, 10240}) = 0 (Timeout)
0.011050 gettimeofday({1227236734, 897768}, NULL) = 0
...
0.000085 select(0, [], [], [], {0, 10240}) = 0 (Timeout)
0.099437 gettimeofday({1227236737, 121434}, NULL) = 0

Figure 3-8. Oracle uses the system call select to put a process to sleep during heavy latch
contention. The Linux strace -r option shows the call time on the next line, before the
name of the next call. For example, the first select shown in this figure took 0.099707
seconds (99.707 ms).

During the Figure 3-8 trace, you can see the select system call and its key time
parameter, which was always 10240, or 10,240 µs (about 10.24 ms)—not exactly 10 ms, but
close enough to the pseudocode. What I find more interesting is that the actual sleep time is
close to 10 ms, but never exactly 10 ms. The -r option of the Linux strace command, used
in Figure 3-8, shows the call time at the beginning of the next line. If you look at the actual
sleep times in Figure 3-8, you will see 99.707 ms, 99.352 ms, 10.602 ms, and 10.504 ms, and
the next value is 10.562 ms. I have observed that the more intense the CPU bottleneck, the
more variance in the sleep times. This is actually good news for Oracle, because in a very
real way, the operating system is providing the desired randomness without Oracle needing
to randomize the sleep time itself!

Time Accounting
Latch contention affects end user application response time. Therefore, if we want to
quantifiably decrease the negative impact of latch contention, we must know how Oracle
records time related to latch contention. Referring to the latching pseudocode in Figures 3-6
and 3-7, there are two key time-related areas: spinning and sleeping.
Ruminate on these questions for a few moments:
• When a process is spinning on a latch, should it be considered service time or queue
time?
• How would the user categorize latch contention time?
• How would the operating system administrator categorize latch contention time?
• How does Oracle categorize latch contention time?
The correct answers depend on your perspective. The application users don’t care how
the time is categorized. They just know performance is unacceptable and it’s impacting their
ability to get their work done. Operating system administrators see only CPU busyness (that
is, utilization) and the CPU run queue. So their understanding—or better said, their ability to
clearly see the picture—is limited.
From a DBA perspective, understanding how Oracle response time relates to latch
contention time is paramount. Keep in mind that it is very convenient for DBAs when CPU
time is classified as service time. Therefore, when an Oracle process is spinning on a latch, it
makes sense to record the spin time as CPU time. And, in fact, Oracle does do this. As
spinning increases, you will observe Oracle processes consume more CPU time, and therefore
also notice the CPU subsystem becomes increasingly busy. Relating back to the situation with
the room full of people waving their arms and yelling “Give me the latch,” consider all that
activity takes energy. In the real Oracle world, this energy is CPU consumption.
So, we have the application users waiting for their application to respond and Oracle
DBAs waiting for the Oracle process to acquire the latch, but I have made no mention of
Oracle wait event time. I stated spinning on a latch consumes CPU time, but have made no
mention of what constitutes the Oracle wait event latch free. If Oracle considered and
recorded CPU spin time as wait time, yet it also recorded CPU spin time as service time, then
Oracle would double-count the spin time. In other words, 20 ms spinning would result in a
response time of 40 ms: 20 ms for CPU consumption while spinning and 20 ms for latch wait
time.
Fortunately, Oracle does not double-count latch spin time. If you look closely near the
bottom of Figure 3-6, just before the Sleep function call, you’ll notice the wait event post.
Oracle latch wait time reflects only the sleep time, not the spin time. This means that latch-
related response time contains the CPU consumption during spinning and also latch sleep
time. Stated another way, latch-related response time contains service time due to spinning on
a latch, and the queue time is related to Oracle processes sleeping.
This has profound latch contention resolution implications. When you look at a wait
event report or a response time report, all latch wait event time is latch sleep time, and not the
related CPU time. In other words, when you see latch wait time or a latch wait event, you
know the associated process has already been spinning and consuming CPU before the wait
event was posted! This is why when there is intense latch contention, the DBA sees a lot of
latch wait time and also a lot of CPU consumption. I’ll provide more details about this in
Chapter 5.
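This split is easy to see with two quick queries. The following minimal sketch works on
Oracle Database 10g and later; v$sys_time_model reports microseconds and
v$system_event reports centiseconds. The latch wait events contain only sleep time,
while the spin time is buried inside DB CPU:

select stat_name, value/1000000 seconds
from   v$sys_time_model
where  stat_name in ('DB time', 'DB CPU');

select event, time_waited/100 seconds
from   v$system_event
where  event like 'latch%'
order  by time_waited desc;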
Now let’s relate Oracle’s latch acquisition algorithm and time accounting to a real-life
Oracle system.

A Real-Life Latching Acquisition Example


Figure 3-9 is not based on queuing theory, and it is not a model. Figure 3-9 is based on actual
data sampled from a real Oracle system. A heavy logical IO load was placed on a single four-
core CPU Oracle Database 10g Release 2 Linux system. Six 3-minute samples were taken at
each load, and then the load was incremented with another three Oracle sessions. The graph in
Figure 3-9 shows the following:
• Each bar is the average of the six samples.
• The horizontal axis is the incremental load placed on the system—in this case, the
number of concurrent synthetic users.
• The transaction or work element is the v$sysstat view's user calls statistic.
• The response time is the milliseconds for a single user call to complete.
• Service time is based on CPU time consumed for each user call and is sampled from
the v$sys_time_model view.
• Queue time is based on the v$latch view’s time_waited column (but
v$system_event could have been used, as the values in these two views are
consistently within a tenth of a second).


Figure 3-9. Latching acquisition performance data gathered from a real Oracle system, not
based on a model. Notice that Oracle’s default latching spin and sleep algorithm actually
decreases latch acquisition response time per user call until the CPU subsystem becomes
saturated. The test was run on a single four-core CPU Linux Oracle 10g Release 2 system.

Figure 3-9 provides deep insights into the efficiency of Oracle’s latch acquisition
algorithm. Initially, there is some overhead, resulting in a relatively high initial response time.
This does not represent a problem, because there is plenty of CPU power available. As
concurrency goes up and CPU cycles become increasingly less available, fortunately,
economies of scale take effect, resulting in a continual decrease in acquisition service time
and response time. But as CPU resources start to become scarce, at 47% CPU utilization,
Oracle processes begin sleeping. However, even with the workload increasing, sleep time per
user call remains about the same, and CPU time per user calls continues to decrease until
around 95% utilization. At this point, CPU resources are in such short supply that queue time
becomes increasingly more significant as processes spend more time sleeping.
This graph serves as a testimony to the efficiency of Oracle’s latch acquisition
algorithm, because even as the load increases pushing CPU utilization toward 100%, latch
acquisition time does not significantly increase. If you look closely, only at 97% utilization
does the average acquisition time begin to increase. Even more amazing is that, based on
queuing theory, with four cores running at 80% utilization, around 50% of the response time
is composed of queue time. Yet based on this test with Oracle’s algorithm, queue time doesn’t
hit 50% of response time till 100% CPU utilization. So Oracle beat queuing theory!


Should You Increase the _spin_count Parameter?


Oracle’s hidden instance parameter _spin_count eventually comes up when talking about
latches. Because it’s common for latching articles to recommend increasing _spin_count,
this parameter deserves special attention.
Having studied the algorithms and observed actual real-system results, as in Figure
3-9, you can see that increasing _spin_count diminishes the likelihood that a
latch-requesting process will sleep. In fact, if you want to effectively remove Oracle's sleep
capability, keep increasing the _spin_count value until sleep requests are effectively zero.
Without sleeping, Oracle processes have no option other than spinning on a latch. And
as CPU utilization increases—that is, as spare CPU cycles become increasingly
scarce—we are effectively asking Oracle to consume potentially even more CPU through
increased spinning! So, we could be making an already serious situation even worse.
The other problem with increasing the _spin_count value is that all latches will be
affected. Key to removing latch contention is focusing on the specific problematic latch.
Changing the _spin_count value will affect not only the latch that has problems, but all
other latches as well. So, the result is less predictable.
The only time I will increase _spin_count is if the operating system vendor’s
performance engineer, who must also be an Oracle expert, makes the recommendation and
can clearly explain why latch acquisition response time will decrease or the available CPU
will increase after changing this value. This is exceptionally rare. If the engineer can’t clearly
explain why increasing _spin_count will help performance, he is guessing! And guessing
is no way to fight performance fires or advance your career.
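If you are curious what your system's current value is, the following commonly used
query against the undocumented x$ fixed tables will show it. Consider it a sketch: you
must be connected as SYS, and the x$ structures are unsupported and can change between
releases.

select i.ksppinm name, v.ksppstvl value
from   x$ksppi i, x$ksppcv v
where  i.indx = v.indx
and    i.ksppinm = '_spin_count';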

How to Detect Significant Latch Contention


Using Oracle’s wait interface, detecting Oracle latch contention is very straightforward. As I
have mentioned, nearly every production Oracle system will have latch contention, but that
does not imply there is a problem worthy of our attention. Only when the contention rises to
the top of our wait event and ORTA reports is it significant and worthy of our precious time
to improve the situation.
To have a realistic shot at solving latch issues, our diagnosis must also include
determining the specific latch that is causing the problem. For example, knowing latching is
responsible for 80% of the wait time is not enough information. Knowing 75% of the wait
time is associated with the CBC latches gives us the detailed information we need to develop
a latch-specific solution. So detecting latch contention means both establishing that latching
is consuming a significant portion of response time and determining the problematic latch. In nearly all
cases, a single latch type will be involved—for example, the library cache latch, the CBC
latch, or the LRU latch.
Figure 3-10 shows a classic wait event report based on the Oracle Database 10g
v$system_event view. The same information can be found near the top of both a
Statspack (Figure 3-13) and an AWR report. Since the top wait event is clearly the CBC latch,
we know there is significant latch contention, and we know the specific latch.


SQL> @swpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER twice.

Database: prod3 24-APR-10 10:44pm


Report: swpctx.sql OSM by OraPub, Inc. Page 1
System Event CHANGE (blue line) Activity By PERCENT

                                 Time Waited   % Time    Avg Time     Wait
Wait Event                             (sec)   Waited Waited (ms) Count(k)
-------------------------------- ----------- -------- ----------- --------
latch: cache buffers chains 108.910 98.13 6.7 16
control file parallel write 1.770 1.59 11.0 0
log file parallel write 0.160 0.14 13.3 0
LGWR wait for redo copy 0.010 0.01 5.0 0
log file sync 0.000 0.00 0.0 0
db file parallel write 0.000 0.00 0.0 0
direct path write 0.000 0.00 0.0 0

Figure 3-10. The v$system_event based OSM swpctx.sql report. Clearly, latch
contention is significant, and the latch to focus on is the CBC latch.

SQL> @swswp latch%free


 Sess
   ID Wait Event                             P1        P2    P3
----- ---------------------------- ------------ --------- -----
179 latch free 1712396568 116 1
189 latch free 1712394712 116 1
205 latch free 1712387056 116 1

3 rows selected.

SQL> l
1 select sid, event,
2 p1, p2, p3
3 from v$session_wait
4 where event like '&input%'
5 and state = 'WAITING'
6* order by event,sid,p1,p2
SQL>
SQL> select * from v$latchname where latch#=116;

    LATCH# NAME                                                HASH
---------- --------------------------------------------- ----------
       116 cache buffers chains                           3563305585

1 row selected.

Figure 3-11. The v$session_wait-based OSM swswp.sql report details sessions
currently posting a wait event containing latch%free. We caught three sessions waiting
for latch number 116, which the second v$latchname-based query shows as being the
CBC latch. We know this is a pre-10g system because the latch name is not part of the wait
event.


If the Oracle release is earlier than Oracle Database 10g, the latch name is not part of the
latch free wait event; therefore, additional actions must occur to determine the specific
latch requiring our close attention. You have two basic options. One is to sample the system
interactively based on the real-time wait event view, v$session_wait. The other option is
to very carefully sample from the classic latch performance view, v$latch.
Sampling from v$session_wait is very straightforward. Figure 3-11 is based on a
pre-Oracle Database 10g system's v$session_wait view and displays the two key columns
of utmost importance to us: event and p2.
Regardless of the Oracle release, the second parameter (p2) is the latch number, which
can be joined with v$latchname to determine the specific latch. Repeated manual samples
from v$session_wait will clearly show the latch number that sessions are waiting
for. Figure 3-11 shows a sample v$session_wait-based query followed by its SQL, and
then the v$latchname query to determine the specific latch of interest.
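For convenience, the two steps shown in Figure 3-11 can also be combined into a single
join. Here is a minimal sketch (not an OSM script):

select sw.sid, sw.event, ln.name latch_name
from   v$session_wait sw, v$latchname ln
where  sw.event like 'latch%'
and    sw.state = 'WAITING'
and    sw.p2 = ln.latch#;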
If you have an earlier Oracle system (for example, Oracle7, Oracle8, or Oracle9i) and
cannot sample from v$session_wait, you can still determine the specific latch that needs
attention. For example, you can sample from v$latch or use a Statspack report. Remember
that regardless of your reporting options, before you drill down into the specific latch, first
ensure latching is a significant problem.
Figure 3-12 is the OraPub System Monitor (OSM) latch report based solely on
v$latch and will work on every Oracle release. In more recent Oracle releases, v$latch
does have a wait time column, but even if your view does not have this column, you still have
a good way to determine the problematic latch. The Gets column represents the number of
times the Get_Latch function (see Figure 3-6) was called. The significant column is Sleeps.
This is the number of times the sleep function is called. Not only is the number of sleeps
important, but the sleep time is also significant. The user feels the sleep time, not the number
of sleeps. Also, the sleep time multiplied by the number of latch requests is approximately the
latch free wait time.
Understanding the strong correlation between Oracle’s wait interface and v$latch
sleep time and latch requests, Steve Adams5 created a wonderful indicator named impact.
The impact value is simply the number of sleeps multiplied by the number of sleeps divided
by the number of gets: sleeps*(sleeps/gets). Figure 3-12 shows the OSM report
latch.sql, which includes the Impact column and the impact percentage for each latch
listed. My personal experience has shown that in a latch-suffering system, the Impact column
clearly indicates the problematic latch and also matches perfectly with the wait event reports.
Figure 3-12 also shows the top latch to be CBC by both the Wait Time and the Impact
columns. Assuming latching is a significant issue, our solutions will certainly directly address
reducing CBC latch contention.
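If your v$latch view lacks the wait time column, the impact is easy to compute directly
from the gets and sleeps columns. Here is a minimal sketch (not the OSM latch.sql
script itself):

select name, gets, sleeps,
       round(sleeps*(sleeps/gets)) impact
from   v$latch
where  gets > 0
and    sleeps > 0
order by impact desc;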

5
Steve Adams, based in wonderful Australia, has a deep Oracle internals understanding. Recognizing the
correlation between sleep time and Oracle latching wait event time led him to develop his impact calculation.


SQL> @latch

Database: prod3 24-APR-10 10:53pm


Report: latch.sql OSM by OraPub, Inc. Page 1
Latch Contention Report

                      Wait Time
Latch Name                (sec) %Impt  Impact Gets(k)   Sleeps
--------------------- -------- ----- ------- ------- --------
cache buffers chains 33,849 100 35647 273133 1495238
slave class create 8 0 1 1 32
simulator lru latch 0 0 0 1 5
redo allocation 0 0 0 113 15
user lock 0 0 0 1 1
library cache 21 0 0 1803 43
parameter table allo 0 0 0 1 1
redo writing 0 0 0 318 2
shared pool 0 0 0 829 3
session allocation 0 0 0 2043 2

10 rows selected.

Figure 3-12. The OSM latch report, with some columns removed. The Impact column is
calculated as sleeps multiplied by sleeps divided by gets. Both the wait time and the impact
indicate the CBC latches are where we should focus any latch-specific solution.

Suppose the performance issue occurs at night, and you can’t interactively sample from
v$latch, or you simply prefer Oracle’s Statspack output. Based on the partial Statspack
output in Figure 3-13, the top wait event is latch free. Therefore, we know latch
contention is significant (so is CPU consumption, but I will save that discussion for Chapter
5). Now the question turns to which specific latch deserves our attention. The second part of
Figure 3-13 is the Latch Sleep Breakdown, located about two-thirds down the Statspack
report.
While not shown in Figure 3-13, more recent Oracle releases and Statspack reports will
show the Time Waited column. The time waited value should be very close to the wait time
reported by Oracle’s wait interface. If so, then use this number to determine the latch that
needs attention. If the Time Waited column is not provided, you’ll need to manually calculate
the Impact column, as discussed earlier. Just copy and paste the Statspack values into a
spreadsheet and create an Impact column. The top latch should clearly rise above the rest.


Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                %Total
Event                                   Waits     Time (s)      Ela Time
--------------------------------------- ------------ ----------- --------
CPU time 47,411 46.84
latch free 1,681,639 28,395 28.06
db file sequential read 19,150,874 8,197 8.10
db file scattered read 19,965,462 8,091 7.99
db file parallel write 77,300 2,094 2.07

...

Latch Sleep breakdown for DB: BUGS Instance: BUGS Snaps: 8959 -8969
-> ordered by misses desc

                                  Get                             Spin &
Latch Name                   Requests      Misses      Sleeps Sleeps 1->4
------------------------ ------------ ----------- ----------- -----------
row cache enqueue latch 300,922,214 26,813,321 63,353 26750538/622
16/564/3/0
row cache objects 301,591,381 3,299,463 83,466 3216692/8208
shared pool 77,478,493 3,683,712 499,072 3207833/4551
46/18888/184
5/0
library cache 102,444,638 2,425,460 648,082 1813686/5791
32/29594/304
8/0
cache buffers chains 1,287,056,955 1,111,534 143,794 1023579/6408
4/9194/14682
...

Figure 3-13. A partial Statspack report based on Oracle9i Release 2. Only the Top 5 Timed
Events section and the most active latches in the Latch Sleep Breakdown section are shown.

Table 3-2 is based on the Latch Sleep Breakdown section in Figure 3-13. The Impact
column and the impact percentage were calculated, and the rows were sorted by the impact in
descending order. Clearly, the library cache is the top latch and should be given priority. But
in this case, the shared pool latch should also be addressed. So, the first priority is the library
cache latch, followed by the shared pool latch. As you’ll learn in Chapter 7, the library cache
and shared pool latches are closely associated with cursor execution, so a change focused on
one can affect the other.
In addition to the sleeps information in v$latch, the v$latch_misses view
contains additional information. One key piece of potentially helpful information is the
location (column location) of the latch within the kernel code. This can provide useful
information when searching MetaLink or Google for solution clues.
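Here is a minimal sketch of such a query; verify the column names against your release,
since v$latch_misses has changed over the years:

select parent_name, location, sleep_count
from   v$latch_misses
where  sleep_count > 0
order by sleep_count desc;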


Table 3-2. Without wait time details, the impact calculation can be used to determine the
offending latch.

Latch Name                     Gets   Sleeps  Impact  Impact %
Library cache             102444638   648082    4100        56
Shared pool                77478493   499072    3215        44
Row cache objects         301591381    83466      23         0
Cache buffer chains      1287056955   143794      16         0
Row cache enqueue latch   300922214    63353      13         0

Oracle Mutex Specifics


Another memory serialization control structure was introduced with the Oracle
Database 10g Release 2 kernel. Called a mutex, it provides more flexibility and a
potentially lower performance impact than traditional Oracle latches. Oracle makes
fuller use of mutexes with each release. Currently, the mutex focus has been in the
library cache area. As you'll come to understand, the performance possibilities will entice
Oracle kernel developers to increasingly take advantage of the mutex.

What Is a Mutex?
The mutex, short for mutual exclusion, is not an Oracle invention. Just like the readv and
select system calls, mutexes are available to programmers. A simple manual page
request for mutex (man 9 mutex) on an operating system that documents its kernel mutex
interface (the BSD family, for example) will return a number of mutex-related calls.
Developed for fine-grained and high-concurrency thread control, mutexes are
implemented at a very low system call level, whereas latches rely more on Oracle kernel code.
Each mutex call starts with the mutex abbreviation mtx, and the call names should come as
no surprise. Here is a partial list of mutex calls: initialization (mtx_init), destruction
(mtx_destroy), spinning (mtx_lock_spin, mtx_unlock_spin), and sleeping
(mtx_sleep).
Because mutexes are based on standard system calls, they form a base functionality on
which Oracle can build to meet its specific objectives. For example, suppose Oracle wants
greater control than the standard mutex spin call provides. Instead of using only the mutex
spin request call, Oracle can add code to meet its objectives. This gives Oracle a tremendous
amount of flexibility, and also allows Oracle to more easily integrate new facilities, like
mutexes, into its existing code base.

Benefits of Using Mutexes


There are a few very important and remarkable reasons why Oracle chose to start using
mutexes. Besides a smaller memory footprint6 than latches (16 bytes compared to 112 bytes),
mutexes give Oracle kernel developers more control over structure creation, reduce false
contention, reduce the likelihood of the control structure being the bottleneck, and provide
faster pinning times. Let's take a closer look at why Oracle was motivated to begin using
mutexes.

6
It's a tough argument that Oracle cares about memory consumption. But the fact is that each mutex does
require less space than a latch. However, there are many more mutexes than latches, so perhaps the net difference is
of no significance.

Flexible Creation
Like latches, mutexes are associated with a memory structure or a memory structure piece.
But, as shown back in Figure 3-2, latches are truly separate from their associated memory
structures. While latch and mutex serialization objectives are the same, kernel developers can
define a mutex to be part of the underlying memory structure or memory structure piece. This
allows for increased memory structure granularity, with less concurrency control overhead
and less false contention. If the mutex is defined as part of the memory structure or memory
structure piece, and the memory structure is deallocated, then so is the mutex. For example, a
mutex can be defined for each cursor. Oracle could also define a mutex for not only the parent
cursor, but also any and all related cursors or cursor parts.

Reduced False Contention


Oracle’s flexible mutex implementation comes in very handy for extremely complex memory
structures, like Oracle’s library cache. When using a library cache latch, Oracle kernel code
developers must associate multiple memory pieces with one of the library cache latches.
Because library cache memory structures are highly interconnected, Oracle’s latching
capabilities result in a practical limitation (performance and coding complexity) as to how
granularly the latches can be applied.
The unintentional result is known as false contention. This unfortunate situation occurs
when two processes require access to different memory pieces, yet both memory pieces are
controlled by the same latch. Figure 3-14 shows an example of false contention. Process P100
requires access to memory structure MS1. Process P200 requires access to memory structure
MS2. A single latch CS999 controls access to both memory structures MS1 and MS2. If
process P100 has latch CS999 while accessing MS1, process P200 will contend for the latch,
even though it has no need to access memory structure MS1! Because the processes are
contending for the same control structure, yet will not access the same memory structure, they
are falsely contending. The more complex the underlying memory structure, the more likely
this will occur.


Figure 3-14. An example of false contention. If process P100 wants access to memory
structure MS1 and process P200 wants access to memory structure MS2, they will both
contend for the same control structure CS999, resulting in false contention.

Control Structure Contention


Another consequence associated with false contention is control structure contention. For
example, suppose you want to build a shed to store farm equipment, but you must first obtain
a permit. Many other people also want to build sheds, so the governmental office is
overwhelmed with permit requests. To ensure there is no illegal builder collaboration, each
permit is serially reviewed. As you can imagine, during peak permit submission times, the
permit review process can experience long delays. In this situation, acquiring the permit is the
bottleneck, not shed construction.
Relating this to Oracle, the bottleneck would be acquiring the control structure, not
accessing the underlying memory structure. When complex memory structures are protected
by a limited number of latches, solutions focused on the control structure are to either add
control structures (for example, specialized governmental permit reviewers) or decrease the
control structure access time (for example, streamline the permit review process). In Oracle
terms, this means either add more latches and improve latch acquisition performance or
optimize the kernel code accessing the memory structure. A creative option is to not use
latches at all, but instead use a completely different structure that inherently provides both of
these benefits—like a mutex! Reducing the likelihood of control structure contention is
another reason mutexes are an attractive memory control option.

Faster Pinning
Pinning ensures a specific memory structure is not deallocated, removed, or destroyed. For
example, if a process is accessing an Oracle block buffer, obviously the process would not
want the buffer replaced by some other buffer. To prevent this from occurring, the accessing
process pins the buffer. Kernel developers perform the pin, not DBAs. While DBAs can cache
tables and keep programmatic structures in the library cache, Oracle kernel developers issue
an underlying pin of a buffer or a cursor.
Latches and mutexes can be used to pin a memory structure. Every mutex has a variable
called the reference count. It contains the number of processes that are currently referencing
the mutex (in shared mode). Oracle uses the reference count to determine if an object is being
pinned. If the reference count is greater than zero, Oracle knows the object is pinned and
cannot be deallocated. When an object is being pinned, the corresponding mutex’s reference
count is incremented by one. When the object is unpinned, the reference count is reduced by
one. As long as the reference count is greater than zero or the mutex is being exclusively held,
every Oracle process knows the memory structure is being referenced by some process, and
therefore it cannot be deallocated.


As you might expect, mutexes can perform the pinning operation much more
quickly than latches. A quick test showed mutex pinning is significantly faster
than latch pinning. For this test, a simple Oracle Database 10g Release 2 one-table SQL
statement cursor was repeatedly opened and closed 500,000 times in a very tight PL/SQL loop
by a single session. The database server contained a single four-core CPU. This test was
repeated 30 times and the wall clock time recorded. The hidden instance parameter
_kks_use_mutex_pin (a setting of true enables mutexes when supported) determined
if mutexes were used instead of library cache latches.
With mutexes disabled, both the library cache and the library cache pin latches were
very active. However, with mutexes enabled, neither latch had any activity (that is, latch gets
were zero)! With mutexes disabled and the library cache latches enabled, the average test took
20.00 s +/- 1.07 s, 95% of the time. With mutexes enabled and the library cache latches
disabled, the average test took 19.90 s +/- 0.98 s, 95% of the time. While the difference may
seem irrelevant, statistically, they are indeed very different. Even at a 95% confidence level,
this test indicates mutexes did indeed improve performance. Also keep in mind that this is a
simple, single-session cursor open and close test. We would expect mutex performance to be
even more pronounced in a high-concurrency test.
So, I created a high-concurrency mutex experiment. The situation was similar to the
single-process test, except now five sessions were involved, the cursor was opened and closed
200,000 instead of 500,000 times, and 6 samples were taken instead of 30 samples. The clock
started when the first process began and stopped when all five processes completed their
200,000 cursor open and close loops. With mutexes disabled and the library cache latches
enabled, the average test took 468.0 s +/- 14.7 s, 95% of the time. With mutexes enabled and
the library cache latches disabled, the average test took 418.1 s +/- 7.4 s, 95% of the time.
Mutexes reduced the test time by an incredible 11%. Statistically speaking, the performance
difference is significant indeed! Also, the mutex test times were less variable than standard
latching, which means mutex usage will result in more consistent performance. As we hoped
and expected, mutexes significantly improved cursor pinning performance.
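To give you a feel for the workload, here is a minimal sketch of the kind of cursor
open/close loop used. It is not my actual test harness; the statement, the ref cursor usage,
and the loop count are illustrative only.

declare
  c sys_refcursor;
begin
  for i in 1 .. 500000 loop
    open c for select dummy from dual; -- a simple one-table statement
    close c;                           -- each open/close exercises cursor pinning
  end loop;
end;
/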

Oracle’s General Mutex Algorithm


If you are comfortable with the latching pseudocode shown in Figures 3-6 and 3-7, you will
quickly grasp Oracle’s mutex algorithm shown in Figures 3-15 and 3-16. As you can see,
there are some interesting twists!
Looking at Figures 3-15 and 3-16, you can quickly see some similarities between the
latching and mutex algorithms. The concepts of willing to wait versus immediate (nowait),
spinning, sleeping, and posting the wait event when sleeping are very similar. However, there
are a few distinct differences.
It all starts with the mutex request. Just as with latches, the specific mutex is requested
along with the type of either nowait (that is, immediate) or willing to wait, and the mode of
either shared or exclusive. If a nowait request is made, similar to latching, a single fast get
request occurs. If a willing-to-wait request occurs, a single fast get is requested; if it fails,
processing drops into the familiar spin-and-sleep cycle. While both latches and mutexes have
the same spin and sleep objectives, as you’ll see, how this plays out in practice is very
different.


Function Get_Mutex(mutex_name, type, mode)
{
   /* Nowait or Immediate Request */
   If type = 'nowait'
      If Fast_Get(mutex_name)
         return TRUE
      Else
         return FALSE
      End-If
   /* Willing-to-wait Request */
   Else
      If Fast_Get(mutex_name)
         return TRUE
      Else
         /* The Spin-Sleep Loop */
         While ( TRUE )
            If Spin_Get(mutex_name)
               return TRUE
            Else
               Register_Event("cursor: ...")
               Mutex_Wait(try++)
            End-If
         End-While
      End-If
   End-If
}

Function Mutex_Wait(try)
{
   /* Kernel developers choose sleep option during mutex
      definition and creation. Options are to yield the CPU,
      block other processes, or simply sleep.
   */
}

Function Spin_Get(mutex_name)
{
   for i := 1 to 255
      If Fast_Get(mutex_name)
      Then
         return TRUE
      End-If
   End-For
   return FALSE
}

Figure 3-15. First part of the pseudocode for when a process asks for a mutex. This figure
contains the main Get_Mutex along with the supporting Mutex_Wait and Spin_Get
functions.


Function Fast_Get(mutex_name)
{
   If mutex.holder := SID
      Case mode:
         'X': If mutex.ref_count = 0
                 return TRUE
              Else
                 mutex.holder := clear
                 return FALSE
              End-If
         'S': mutex.ref_count++
              mutex.holder := clear
              return TRUE
      End-Case
   Else
      return FALSE
   End-If
}

Figure 3-16. Second part of the pseudocode for when a process asks for a mutex. The
Fast_Get function is shown, along with the shared and exclusive mode request details.

The mutex Fast_Get function objective is the same as the latch Fast_Get function,
but since mutexes are being used, the Fast_Get function can make use of the built-in and
extremely quick mutex capabilities. Figure 3-17 illustrates a model of a mutex and its memory
structure. If you look closely at Figure 3-17, you can see the mutex has been defined as part of
the memory structure MS1. The memory structure contains two mutex variables: the holder
identifier and the reference count. If an Oracle session is holding the mutex exclusively or is
in the middle of running the Fast_Get function, the holder identifier contains its session
identifier (v$session.sid).

Figure 3-17. A model of a mutex and its memory structure. Notice this mutex is part of the
memory structure MS1. The memory structure contains the mutex inherent holder identifier
and reference count variables, and also the actual memory structure itself.

Look closely at the Fast_Get function in Figure 3-16. The first If statement is not an
equality test, but rather an attempt to assign the current session’s identifier to the mutex’s
holder identifier. If this assignment is successful, then at this point, this session has control of
the mutex. Any other session running the Fast_Get function will not be able to assign its
session identifier to the holder identifier and will immediately return a FALSE. This ensures
that only one session will be able to acquire the mutex in either shared or exclusive mode at a
time. In essence, Oracle is enforcing mutex request serialization in its Fast_Get function.
As with all things Oracle, there is another level of detail. Mutex request serialization is
ensured because, at an atomic level, the session identifier assignment operation is performed
by a single compare-and-swap (CAS) operation. Because only a single instruction is executed,
this single instruction becomes the point of serialization and control. This is a beautiful
solution, because the point of serialization is below the business, the DBA, and the Oracle
kernel code—down at the underlying operating system level. The lower the level at which the
point of serialization occurs, the better. Not only are there less likely to be serialization issues because
of application code or Oracle kernel code, but operations (for example, source code
conditional statements versus a single CAS operation) will be significantly faster.
Alas, some RISC platforms do not provide a CAS instruction; therefore, Oracle, through
its software, simulates the CAS operation. Obviously, some performance is lost. Oracle
simulates the CAS operation by creating a pool of latches known as the KGX latches. If there
is contention with Oracle’s CAS operation simulation, latch:KGX will become the top wait
event. If this occurs, as it has for some RISC systems, contact Oracle support.
If a shared mode mutex request is being made, the mutex’s reference count is simply
incremented by one, the holder identifier is cleared, and the Get_Mutex function returns
TRUE. This operation is very elegant, very efficient, and very fast! It helps reduce the risk of
the control structure, as opposed to access to the underlying memory structure, being the
bottleneck.
Suppose the session requests the mutex in exclusive mode. As before, the session tries to
set the holder identifier to its session identifier. If successful, it makes sure that no other
session has the mutex in shared mode by checking whether the reference count is zero. If the
reference count is zero, the Fast_Get function returns TRUE, as will the Get_Mutex
function, indicating the session now has the mutex in exclusive mode. This is key: the
mutex’s holder identifier was not cleared during the exclusive fast get request! This will cause
all subsequent shared and exclusive mode requests to quickly fail, as they will not be able to
set the mutex’s holder identifier. This quick failure (due in part to the CAS operation) also
helps to reduce the risk of the control structure being the performance bottleneck.
The Fast_Get function is repeatedly called during the Spin_Get function. The
concept of spinning is the same for both mutexes and latches. However, when attempting to
acquire a mutex, Oracle has hard-coded the maximum number of spins to 255, rather than the
2,000 used for latches, and it’s not settable via an instance parameter. The thinking behind
this is that, because the attempt to set the mutex’s holder identifier occurs so elegantly and
incredibly fast, if you’re going to successfully acquire the mutex, you will get it quickly. If
you don’t, then you’re not likely to acquire the mutex by spinning up to 2,000 times. So,
instead of spinning 2,000 times and consuming precious CPU cycles, Oracle just puts the
process to sleep for a bit and tries again.
The Mutex_Wait function has the same objectives as latch sleeping, but Oracle takes
advantage of inherent mutex function calls. Figure 3-15 shows Oracle developers have a
number of options to sleep an Oracle process. The desired sleep tactic is established when the
mutex is created. In essence, it’s part of the mutex creation process. Oracle developers can use
mutex-specific sleep options, or they can create their own hybrid routine. But regardless, the
general mutex options are to either yield the CPU, sleep, or block other processes from
acquiring the mutex. Based on operating system traces of some Oracle server processes
experiencing heavy mutex contention, it appears Oracle is currently using the CPU yield option.


As you can see, Oracle has taken what was designed for high-concurrency thread control
and incorporated—or perhaps retrofitted is a better word—mutexes into its serialization
control scheme. The result is a stellar serialization scheme that is both flexible and fast. I
suspect we’ll continue to see Oracle kernel code developers take advantage of mutexes.

How to Detect Mutex Contention


It is exceptionally rare to have mutex-related contention. I don’t state this lightly, but if I
found such contention in a system, I would immediately suspect an Oracle bug and contact
Oracle support.
Just like latches, mutexes are hooked into the wait interface while sleeping. Also,
spinning on a mutex consumes CPU resources and will be recorded as such. To save
resources (said another way, to optimize internal resources), mutex get requests are not
recorded; only sleep activity is recorded. There are v$mutex_sleep (like v$latch),
v$mutex_sleep_history, and dba_hist_mutex_sleep views.
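Here is a minimal sketch of a query against the sleep view. The mutex_type,
location, sleeps, and wait_time columns are what you will typically want; verify
them against your release:

select mutex_type, location, sleeps, wait_time
from   v$mutex_sleep
order by sleeps desc;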
Figures 3-18 and 3-19 are examples of what you could see with severe mutex
contention. Wait events starting with cursor are related to acquiring a mutex to access a
cursor in the library cache. It’s unfortunate that the wait event name does not start with
mutex, but understanding that mutexes are replacing library-related latches, it does make
sense.
Resolving the most common mutex issues is detailed in the respective Oracle internals
chapters. For now, you are fully prepared to detect if mutex contention exists, if it is
significant, and specifically which mutex is being requested.


SQL> @swpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER twice.

Database: prod5 02-APR-10 07:02am


Report: swpctx.sql OSM by OraPub, Inc. Page 1
System Event CHANGE (blue line) Activity By PERCENT

                                 Time Waited  % Time    Avg Time     Wait
Wait Event                             (sec)  Waited Waited (ms) Count(k)
-------------------------------- ----------- ------- ----------- --------
cursor: pin S 13.540 90.09 1230.9 0
db file sequential read 1.080 7.19 11.3 0
control file parallel write 0.310 2.06 10.0 0
os thread startup 0.080 0.53 40.0 0
latch free 0.000 0.00 0.0 0
control file sequential read 0.000 0.00 0.0 0
log file sync 0.000 0.00 0.0 0
db file scattered read 0.000 0.00 0.0 0
direct path write 0.000 0.00 0.0 0
log file parallel write 0.000 0.00 0.0 0

Figure 3-18. An example of mutex contention. This situation was created by having a few
sessions opening and closing a very simple cursor in a tight PL/SQL loop.

SQL> @swpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER twice.

Database: prod5 02-APR-10 02:49pm


Report: swpctx.sql OSM by OraPub, Inc. Page 1
System Event CHANGE (blue line) Activity By PERCENT

                                 Time Waited  % Time    Avg Time     Wait
Wait Event                             (sec)  Waited Waited (ms) Count(k)
-------------------------------- ----------- ------- ----------- --------
library cache: mutex X 10.050 89.57 245.1 0
db file parallel read 0.410 3.65 20.5 0
log file parallel write 0.310 2.76 44.3 0
control file parallel write 0.280 2.50 6.2 0
db file sequential read 0.060 0.53 4.3 0
db file scattered read 0.010 0.09 0.3 0
latch free 0.000 0.00 0.0 0
log file sync 0.000 0.00 0.0 0
direct path write 0.000 0.00 0.0 0
control file sequential read 0.000 0.00 0.0 0

Figure 3-19. An example of mutex contention. This situation was created in the same way as
the one reflected in Figure 3-18, except the instance parameter
session_cached_cursors was set to zero. This ensures exclusive library cache mutex
acquisition requests.


Summary
Serialization is huge! Smooth Oracle latching and mutex operation is critically important. If
there are problems in this area, parallelization suffers a grinding and painful slowdown.
I’ve presented many technical details in this chapter. The goal was to help you
understand serialization, Oracle’s general latching and mutex algorithms, latching and mutex
specifics, and how to identify when there is significant latching and mutex contention to
warrant your attention. You will be able to combine your latching and mutex understanding
with Oracle internal algorithms (for example, CBC operations) and proven methodologies to
come up with creative, practical, and powerful solutions. So this chapter will become practical
only when combined with a proven methodology and a solid understanding of Oracle
internals. None of these three components used alone will produce an exceptional
performance analysis.
And finally, don’t let anyone tell you that once you’ve hit latching contention, you can’t
do anything more. As this book is beginning to reveal, there is a very diverse and practical set
of solutions to each latch and mutex contention scenario.


CHAPTER

4
Identifying and
Understanding
Operating System
Contention

I still remember the day a colleague handed me The Fish Book and simply told me to read it.
This was in the early years of my Oracle journey, and I was getting deep into Oracle
performance analysis and loving it. The book was actually titled System Performance Tuning,
but since it had a picture of a big swordfish on the cover, we always called it The Fish Book.
That night I started reading the book, and I was intrigued. It gave me an entirely new
perspective on performance analysis. I began to discover that what was happening to the
operating system correlated with what was happening with Oracle. It was like getting another
opinion or having someone review my work. From that point on, every time I talk, teach, and
write about Oracle analysis, you can expect it to be confirmed from an operating system
perspective as well. The operating system analysis has such a profound impact on
performance firefighting that it’s one of the three circles in OraPub’s 3-circle analysis.


This chapter is not your typical operating system chapter. Most people reading this book
already have a pretty good understanding of how to install, upgrade, and administer Oracle on
a host operating system. You know the typical commands to find files—do some awk’ing and
grep’ing—and how to perform standard administration tasks. But I will go beyond issuing
commands and understanding syntax.
Here, I focus on how to quickly determine the operating system’s bottleneck using
standard, freely available tools and how to ensure what you’re seeing in Oracle matches what
you’re seeing from an operating system perspective. I will show you how to hook this
all together using the OraPub 3-circle analysis method (introduced in Chapter 1). I want you to
be able to stand in front of your operating system administration team and confidently explain
to them where the operating system bottleneck resides, how you discovered it, how Oracle is
involved, and what can be done to resolve the problem from three different perspectives.
There are a lot of Oracle DBAs in the world, and there are a lot of operating system
administrators as well. But few people really know Oracle well, can quickly determine the
operating system bottleneck, and can relate them together to build a convincing solution case.
What I’m trying to convince you of is that understanding the operating system from an Oracle
performance firefighter’s perspective is a very good thing for your career!

The Four Subsystems


While trying to solve a raging performance issue, an operating system administrator told me,
in front of his manager, that there was no operating system bottleneck. He flatly and coldly
stated, “The problem is all Oracle!” Well, unless the speed of light is not fast enough or there
is a blocking or locking issue somewhere, there will be an operating system bottleneck.
The good news is there are only four areas, or subsystems as I like to call them, to
investigate, and it’s very easy to detect the problem area. The four subsystems are CPU,
memory, IO, and network. One of those subsystems is suffering.
Using only standard Linux/Unix tools, along with some Oracle performance views, you
can find the operating system bottleneck. If you’re a Windows DBA, don’t worry. At our
level, the concepts and terms are very familiar. You’ll know which statistic you need to get.
You’ll just need to find the tool that provides the statistic.
Most of you reading this book will have access to your specific vendor’s performance
tools. For example, HP’s tool is tusc (my guess is Trace Unix System Calls). With AIX and
Solaris, you have truss (my guess is Trace Unix System Statistics). If you can find the
bottleneck using standard tools, you will certainly be able to do the same thing using tools
developed specifically for your platform. So, let’s get started!

CPU Contention
Of the four subsystems we will dig into, the CPU is probably the easiest to understand. Yet,
identifying and understanding CPU contention breeds insights into every part of your work.
When you look at the CPU subsystem with a response-time analysis perspective, everything
seems to fall very neatly into place. Service time, queue time, and the run queue all make
sense, and you can see how this translates into the Oracle system. And you’ll also find that the
basic concepts of queuing theory are blatantly exhibited. It’s like a living science experiment!


How to Model a CPU Subsystem


Models are a wonderful way to distill something very complex into something we can quickly
understand. The CPU subsystem model shown in Figure 4-1 provides surprising insights.

Figure 4-1. A CPU subsystem model showing one queue and four CPU cores

In Figure 4-1, the CPU subsystem contains four CPU cores.1 We model a CPU
subsystem with a single queue. From our abstraction view, every transaction enters into the
single CPU queue and waits for any available CPU core to be serviced. Figure 4-1 shows only
one transaction is being serviced, and no transactions are waiting in the queue. This means
three of the four cores are not servicing transactions and are therefore idle.
At this point, we can make two key observations:
• Only one of the four CPU cores is busy servicing transactions. Said another way, 25%
of the CPU cores are busy.
• Because three CPU cores are idle, no transactions are queued up waiting to be
serviced.
This will be our starting point for CPU subsystem performance analysis. The rest of this
section will build from this point.
Have you ever heard someone say the IO subsystem needs to be balanced? Of course
you have, because an IO subsystem can have very active devices and not very active devices.
But as a DBA, it’s very unlikely you have heard someone say the CPU subsystem needs to be
balanced. That is because any available CPU core can service any transaction in the queue.
Unlike an IO subsystem, where reading and writing must occur at an exact spot on a physical
spindle, any CPU core can process any waiting transaction. This has massive implications.
Figure 4-2 shows a four-CPU core subsystem in three different situations. Situation A is
a very busy CPU subsystem with two processes waiting in the queue. Situation B looks like a
very idle CPU subsystem with only one active transaction. However, situation B could also
represent a problem. If the transaction being serviced in situation B is a relatively long
transaction, why are the other CPUs not participating, but instead standing idle? This is a
scalability issue, and as mentioned in Chapter 3's discussion of serialization, everyone—from
the businesspeople to the operating system vendors—gets involved, so this situation never
occurs.

1
CPU cores are sometimes called servers because each core serves transactions. This can be very confusing to
DBAs without a predictive analysis background. DBAs typically consider each box, node, or database server to be a
server. In the predictive analysis arena, it is common to speak of anything that serves transactions as a server. It could
be a CPU, a CPU core, a physical IO device, an IO volume, or the server at a restaurant. If it can serve, then it's a
server. However, in this book, I will try not to use the word server to avoid confusion.

Figure 4-2. A four-CPU core subsystem in three different situations

Situation C in Figure 4-2 is truly unfortunate. Because there is only one queue, there
should never be a situation where processes are waiting in the queue while there are idle CPU
cores. Can you imagine standing in a fast food restaurant line, and even though two servers
are available, they look at you and tell you to wait? Absurd indeed, and if you ever see this
occur (on a computer system), contact your operating system vendor!

Where CPU Time Is Spent


When power is being supplied to a CPU, its time can be classified into three or four areas:
• User time: This is when a core is spending time processing code, such as SQL*Plus,
the oracle executable, or Perl scripts. Normally, Oracle database CPU subsystems
spend about 60% to 90% of their active time in what is called user mode.
• System time: This is when a core is spending time processing operating system kernel
code. Virtual memory management, process scheduling, power management, or
essentially any activity not directly related to a user task is classified as system time.
From an Oracle-centric perspective, system time is pure overhead. It’s like paying
taxes. It must be done, and there are good reasons (usually) for doing it, but it’s not
under the control of the business—it’s for the government. Normally, Oracle database
CPU subsystems spend about 5% to 40% of their active time in what is called system
mode. If you’re from a non-Unix background, you may be more familiar with the term
kernel mode or privileged mode, which is, in essence, system time.
• Idle time or waiting for IO: When the CPU cores are not busy doing work, they
could be truly just sitting around doing nothing, or idle. Or, as with many Oracle
systems, they are waiting for data from the IO subsystem. When CPUs are waiting for
IO, the time classification is commonly called wait for IO, or wio or wa, for short.
When idle time drops to around 30%, transactions start queuing for CPU resources,
any response time related to the CPU subsystem starts increasing, and end user
performance begins to degrade.
If you ever see system time surpass user time, either contact your system administrator
or casually give your pager to your colleague and walk away. When this occurs, which is very
unusual, the operating system is spending more time managing itself than doing work for you.
The operating system can actually continue spending more and more time self-managing until
it finally shuts down.


As painful and humiliating as this is, I have to admit that this actually happened to me.
This was during one of my consulting engagements when Oracle’s multithreaded servers
(MTS), commonly called shared servers, were first available. I forgot to set the instance
parameter mts_max_servers. The system was very large and running a well-known
financial application. The shared servers were very busy, and Oracle responded by launching
new shared servers. Unfortunately, it was taking so long for the shared server processes to
start that, whenever Oracle rechecked the situation, it came to the conclusion that it should
launch even more shared servers (you know where this is headed). Eventually, Oracle
launched so many shared servers and the operating system was so far behind and busy trying
to start the processes that the operating system actually shut down. During this time of
excitement, user time started dropping and system time started increasing, leading to the
system becoming completely unresponsive and the console displaying “panic.”

When Queuing Sets In


When does queuing become a problem? The answer begins with, “Well, it depends.” And
what it depends on, in large part, is the number of CPU cores. A general answer is queuing
becomes significant when CPU cores are busy more than 75% of the time. This is truly a
general statement and does not imply that you want your CPUs running below 75%.
Figures 4-3 and 4-4 are based on the same 12 CPU core model. It’s easy to think that
when queuing represents 10% of the response time, this is a very serious situation. However,
based on queuing theory, which CPU subsystems follow very closely, at around 75% busy,
queue time accounts for around 10% of the response time. Now, 10% may seem very
significant, but if you look at the situation graphically in Figure 4-4, 75% busy doesn’t look
so bad. This demonstrates why it can be a good idea to show data both numerically and
graphically.
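If you want to sanity-check numbers like these yourself, the standard M/M/m queuing
model is one way to do it. Treat what follows as a sketch; I am not claiming it is the exact
model used to generate Figures 4-3 and 4-4. With m CPU cores running at utilization U and
a service time of S, the queue time Q is:

Q = ( C(m, mU) * S ) / ( m * (1 - U) )

where C(m, mU) is the Erlang C probability that an arriving transaction must wait. For
m = 12 and U = 0.75, the offered load is mU = 9, C(12, 9) works out to roughly 0.27, and Q
is therefore about 0.09S. Queue time is then around 8% of the total response time
R = S + Q, which is in the same ballpark as the roughly 10% shown in Figure 4-3.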


Figure 4-3. Numerically, on a 12-CPU core system, queuing accounts for 10% of response
time at around 75% busy. However, once queuing sets in, it quickly becomes significant.
When busyness exceeds 100%, queue time becomes infinite (grows and grows) and therefore
so does response time. Excel shows this as #N/A.


Figure 4-4. Graphically, on a 12-CPU core system, queuing doesn’t look like an issue until
around 85% busy. Notice that once the CPUs are around 90% busy, response time
skyrockets!

On a 12-CPU core system, the elbow of the curve occurs at around 85% busy. With
fewer cores, queuing becomes significant much sooner. And with more cores, CPUs can be
much busier before queuing becomes significant.
Look closely near the elbow of the curve in Figure 4-4. Notice that once we enter the
elbow of the curve, response time skyrockets. I call this “the wall.” Any senior DBA will have
experienced this.
The situation begins with the system performing wonderfully. It’s running so well that
the DBA is asked to allow another batch job or two to run concurrently. What the DBA
doesn’t realize is that the system is operating just before the elbow of the curve. So the DBA
decides to allow the increased workload to enter the system—and bam!—performance takes a
significant drop. On a 12-CPU core system, it takes only a relatively small increase in the
workload to send your system deep into the elbow of the curve. If the system were not
operating near the elbow of the curve, then the situation would have been very different. This
is a perfect example of how understanding just a little queuing theory can make a big
difference. If the DBA had known the number of cores, the CPU busyness, a little queuing
theory, and some workload management control, he would have never allowed the increased
workload. (I will delve deeper into practically applying queuing theory in the last chapter.)
Now let’s shift to a CPU subsystem with just a couple of cores. Figures 4-5 and 4-6 are
models based on a 2-CPU core system. If you graphically compare the 12-CPU core system
with the 2-CPU core system, you will immediately notice a difference. The 12-CPU core
system (Figure 4-4) response-time curve has a much more distinct elbow; that is, the system’s
response-time curve is very flat until it enters the elbow of the curve, and then response time
skyrockets. In stark contrast, the 2-CPU core system graph (Figure 4-6) response-time curve
elbow is more gradual. In fact, if you look at the queue time value in Figure 4-5, you’ll notice
that queue time exists even during a light workload. The situation is even more apparent with
a single-CPU core system. This is why experienced DBAs know they can run a system with
more cores at a higher CPU busyness than a system with just a few cores!

Figure 4-5. Numerically, on a two-CPU core system, queuing accounts for 10% of response
time at around 32% busy. Just as with Figure 4-3, when the system busyness exceeds 100%,
queue time becomes infinite.


Figure 4-6. Graphically, on a two-CPU core system, queuing doesn’t look like an issue until
around 65% busy. It’s more difficult to make a risk assessment because the response-time
curve does not have such a dramatic elbow as systems with more CPU cores.

DBAs who have control over the workload entering their system have been known to
run the CPUs right up to the beginning of the elbow of the curve. Now, not many
administrators have this kind of workload control. But if your system runs a combination of
online transaction processing (OLTP) and batch processes, then it’s likely you can play this
game yourself. Let’s look at how.

Queue Length Strategies


If your system is more of an OLTP system or more batch-centric, you have some creative
choices to maximize your user’s experience. For an OLTP system, users want snappy
response time. They really don’t care about how much work the system is processing, but just
want the system to be responsive to their requests. On the other hand, if the focus is on batch
processing, responsiveness takes a backseat to throughput. For example, if there are 12 CPU
cores along with a long batch process queue, having a CPU idle is something we don’t want
to see. Let’s take this a step further.

OLTP-Centric Systems
We know that the lower the CPU busyness, the shorter the CPU queue. For a snappy OLTP
system, we do not want processes waiting in the queue. We want the OLTP-related processes
to be serviced immediately! The way to increase the likelihood of this occurring is to keep the
CPU busyness relatively low. This means if you have an OLTP-centric system and desire
snappy response time, you must keep the CPU subsystem at a low enough utilization to
ensure OLTP processes are not queuing. For example, if a system is a mix of OLTP and batch
processing, which most systems are, then during key OLTP processing times, you will want
to throttle back batch processing to ensure the CPU run queue is not impacting OLTP
processing responsiveness.

Batch-Centric Systems
If the system is batch-centric or batch processing is the priority, the situation becomes very
different. For systems focused on batch processing, service levels are not measured by
response time. The key service level for batch systems is throughput; that is, response time is
not based on a single batch job, but on a group of batch jobs. Put another way, throughput is
important, not response time. Idle resources are death to throughput. For a batch-focused
system, every bit of CPU power needs to be consumed.
The way to ensure every bit of CPU is being consumed is to ensure the CPU busyness
reaches the level where there is always a process waiting to be serviced. Said another way,
make sure there is always a process queued up and ready to run. So for a batch-centric system,
you will want to run the CPU busyness at a much higher level than you would for an OLTP-
centric system.
Don’t get carried away during batch processing times and allow a massive queue,
because the operating system must manage process scheduling. The longer the queue, the
more time the operating system must spend managing the queue. The key is to ensure there is always
a process ready to run, not a hundred processes ready to run.
Obviously, we need to quantify the length of the queue and the busyness of the CPU
subsystem. This is the focus of the next section.

Monitoring CPU Activity


So far, I have conveniently left out how to measure CPU activity or how to quantify when a
queue is too long. Now is the time to delve into the specifics.
Monitoring CPU activity can be effectively done by focusing on just two statistics: CPU
busyness and CPU queue length. The more common term for CPU busyness is utilization. A
CPU subsystem has a single run queue.

Utilization
Utilization is an amazingly simple concept that can be applied in a wide variety of
circumstances. It is simply consumption divided by capacity. For example, if we have a
pitcher with a capacity of four cups and it currently contains two cups of water, the
utilization is 50%. It's not rocket science, but this basic concept is used throughout
performance management, both in firefighting and predictive analyses of all kinds.
Gathering CPU utilization statistics is easy, and there are a number of sources, including
one of Oracle’s performance views. While each operating system has its own specific tools,
vmstat is nearly always available, and sometimes you can use sar.
Figure 4-7 shows a typical vmstat command sampling every 30 seconds from an
active and very stable system. The first vmstat line returned is supposed to be the average
since the system was rebooted. This is why the first line commonly looks very different from
the rest of the lines, and why we usually do not include the first line in our analysis.
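If you script around vmstat, a minimal sketch like the following discards that since-boot
line so it cannot skew your numbers. The assumption here, worth verifying on your platform,
is that the header occupies the first two lines, so the since-boot sample is line 3:

# Print only the true interval samples: skip the two header lines
# and the since-boot summary line
vmstat 30 9999 | awk 'NR > 3'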


[oracle@fourcore]$ vmstat 30 9999


procs ---------memory------------ --swap-- --io- --system- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 2488556 131384 764324 0 0 2 2 89 158 0 0 100 0 0
0 0 0 2488556 131384 764324 0 0 0 23 1011 8678 23 2 75 0 0
0 0 0 2488556 131384 764324 0 0 0 21 1010 9225 23 2 75 0 0
0 0 0 2488680 131384 764324 0 0 0 19 1011 8408 23 2 75 0 0
0 0 0 2488680 131384 764332 0 0 0 20 1011 8423 23 2 75 0 0
0 0 0 2488680 131384 764332 0 0 0 18 1010 9032 23 2 75 0 0
0 0 0 2488680 131384 764332 0 0 0 19 1010 8589 23 2 75 0 0
0 0 0 2488680 131384 764332 0 0 0 18 1010 7336 22 2 76 0 0
0 0 0 2488680 131384 764332 0 0 0 18 1010 8561 23 2 75 0 0

Figure 4-7. Sample vmstat output sampling every 30 seconds. The CPU system is active,
but not too active, with plenty of excess capacity.

While vmstat can be used to monitor many system aspects, our current focus is the
CPU subsystem. The utilization-related columns are at the far right of the report (Figure 4-7):
• us: User time. This represents the percentage of time during the sample interval that
CPUs were servicing processes in user mode. For example, when the oracle
executable is being processed, most of its time will be deemed user time.
• sy (or sys): System time. This value represents the percentage of time during the
sample interval that CPUs were servicing processes in system mode. For example,
when process scheduling or virtual memory management occurs, this time will be
recorded as system time.
• id (or idl or idle): Idle time. This value represents the percentage of time during
the sample interval that CPUs were idle, with no processes desiring CPU time.
• wa (or wio): Wait for IO time. This value represents the percentage of time during the
sample interval that CPUs were idle yet processes were waiting for IO before they
could consume CPU resources. Not all IO subsystems will show wait for IO time, even
when there are significant IO wait issues. If a wait for IO column is not shown, all wait
for IO time becomes part of the idle time.
• st: Stolen time. This represents the percentage of time during the interval stolen from
a virtual machine. This can be considered part of the true idle time. If you are not
running a virtual machine, this value should always be zero.
Referencing Figure 4-7, the CPU subsystem is busy around 25% of the time; that is, it’s
idle around 75% of the time. There appears to be plenty of available CPU capacity; that is, the
operating system’s CPU capacity has exceeded Oracle’s requirements.
Figure 4-8 shows another common way to get CPU utilization. The sar (System
Activity Report) command is a wide-reaching performance monitoring facility available on
every Linux/Unix system. However, some operating system administrators do not install the
required sar package or allow DBAs to run an interactive sar report. (Confusing to Oracle
DBAs, the Linux sar facility is contained in the sysstat package. On Linux, this package is
not automatically installed, but it can obviously be very helpful for our purposes.) When sar is
installed, it continually monitors system activity and stores the results for later retrieval.
Operating system administrators tend to reference historical sar data, whereas most DBAs
want to run sar interactively to view the system as it is currently operating. So when you tell
your administrator you want to “run sar,” make sure he knows you want to run it
interactively and only when necessary. The sar output can also be directed to a file for later
processing (think awk and grep).

[oracle@fourcore]$ sar -u 15 9999


Linux 2.6.18-92.el5PAE (fourcore) 05/26/2010

10:18:45 AM CPU %user %nice %system %iowait %steal %idle


10:19:00 AM all 25.47 0.00 2.84 0.02 0.00 71.68
10:19:15 AM all 19.38 0.00 1.98 0.02 0.00 78.62
10:19:30 AM all 26.24 0.00 2.72 0.00 0.00 71.04
10:19:45 AM all 18.11 0.00 2.06 0.00 0.00 79.83
10:20:00 AM all 27.77 0.00 2.94 0.00 0.00 69.29
10:20:15 AM all 16.59 0.00 1.81 0.02 0.00 81.58
10:20:30 AM all 28.99 0.00 2.77 0.00 0.00 68.24

Figure 4-8. A sample sar output sampling every 15 seconds on an active Oracle system.
There is plenty of idle time, so the system has plenty of CPU capacity.

The CPU utilization sar option is -u. The %user, %system, %iowait, %steal,
and %idle columns contain the same information as the corresponding columns in the
vmstat command output. The new columns in the sar report are as follows:
• Ending sample time: When you first run sar, no output appears. But the activity
information is being stored until the first interval is complete, and then the output line
appears.
• CPU: The -u option shows the average for all CPUs. There is also a sar option to
show utilization activity for each CPU. Refer to the sar manual page for details.
• %nice: Nice time. This represents the percentage of time during the sample interval
that CPUs were servicing processes in which their priority had been lowered—in
effect, they have been nice to other processes. This time counts as CPU utilization and
can also be considered user mode time. Oracle systems tend to have zero nice time.
The total idle time is the sum of %iowait, %steal, and %idle time. Referring to
Figure 4-8, the CPU subsystem is utilized around 20% to 30%, so the CPU subsystem is idle
between 70% and 80% of the time. It appears there is plenty of CPU power available.
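If you want sar to do some of this arithmetic for you, here is a minimal sketch that saves
interactive samples to a file and then computes the average utilization as 100 minus the
average %idle. The file name is made up, and the awk field positions assume the column
layout shown in Figure 4-8, so verify both:

# Capture four hours of 15-second samples for later processing
sar -u 15 960 > sar_u.txt

# Skip the header and the trailing Average line; %idle is the last field
grep "all" sar_u.txt | grep -v "Average" | \
    awk '{ idle = idle + $NF; n = n + 1 } END { print "avg utilization:", 100 - idle/n }'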
All Linux/Unix systems contain the /proc virtual file system, which offers a staggering
amount of information about operating system activity: CPU, memory, device and IO activity,
process-level information, and even CPU chip details. It’s amazing what you can find. For
most DBA types, /proc represents a whole new world to explore! On Linux, all the /proc
files are in text, and they can be easily retrieved and parsed to create your own reports.
Unfortunately, on other platforms, such as HP-UX and Solaris, the /proc files are binary,
and you’ll need to do some work, such as write a C program, to retrieve the details.
Figure 4-9 shows an example of the /proc/stat file. The first line, which begins
with cpu, contains all the CPU time components since the system last booted. Figure 4-10
makes use of this information to create a real, and arguably useful, average CPU utilization
report. A sample run output is shown in Figure 4-11.3

[oracle@fourcore proc]$ cat /proc/stat


cpu 20540 2898 13904 129680947 27466 56 610 0
cpu0 4658 102 9125 32396904 25389 0 467 0
cpu1 3500 90 1432 32430959 629 0 2 0
cpu2 2048 34 360 32433732 405 0 0 0
cpu3 10333 2672 2985 32419351 1042 56 139 0
intr 327715481 324400767 2 0 0 0 0 2 0 3 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 163 0 0 0 0 0 0 0 272590 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 56 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 3041890 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 197733009
btime 1233065114
processes 23022
procs_running 1
procs_blocked 0

Figure 4-9. A sample Linux /proc/stat file that contains CPU consumption information
since the system last booted. With a little creativity, you can write your own CPU monitoring
script.

3. Obviously, there is a lot more you can do with the /proc file system than I have presented
here. My intent is to expose you to the capability and provide a simple example of its usage.
Plenty of online information about this file system is available.


# Sample interval in seconds
interval=5

# Count the cpu lines in /proc/stat; the first "cpu" line is the
# all-core aggregate, so subtract one to get the core count
i=`cat /proc/stat | grep cpu | wc -l`
cpu_cores=`echo "$i-1" | bc`

echo "The server contains $cpu_cores core(s)."
echo ""
echo -e "USER\t NICE\t SYSTEM\t WIO\t IDLE\t"

while [ 1 = 1 ]
do
    # Initial snapshot: cumulative time components since boot
    cpu_all_t0=`cat /proc/stat | head -1`
    cpu_usr_t0=`echo $cpu_all_t0 | awk '{print $2}'`
    cpu_nic_t0=`echo $cpu_all_t0 | awk '{print $3}'`
    cpu_sys_t0=`echo $cpu_all_t0 | awk '{print $4}'`
    cpu_idl_t0=`echo $cpu_all_t0 | awk '{print $5}'`
    cpu_wio_t0=`echo $cpu_all_t0 | awk '{print $6}'`

    sleep $interval

    # Final snapshot
    cpu_all_t1=`cat /proc/stat | head -1`
    cpu_usr_t1=`echo $cpu_all_t1 | awk '{print $2}'`
    cpu_nic_t1=`echo $cpu_all_t1 | awk '{print $3}'`
    cpu_sys_t1=`echo $cpu_all_t1 | awk '{print $4}'`
    cpu_idl_t1=`echo $cpu_all_t1 | awk '{print $5}'`
    cpu_wio_t1=`echo $cpu_all_t1 | awk '{print $6}'`

    # Deltas over the sample interval
    usr=`echo $cpu_usr_t1-$cpu_usr_t0 | bc`
    nic=`echo $cpu_nic_t1-$cpu_nic_t0 | bc`
    sys=`echo $cpu_sys_t1-$cpu_sys_t0 | bc`
    idl=`echo $cpu_idl_t1-$cpu_idl_t0 | bc`
    wio=`echo $cpu_wio_t1-$cpu_wio_t0 | bc`

    tot=`echo $usr+$nic+$sys+$idl+$wio | bc`

    # Each component as a percentage of the total time accounted for
    usr_pct=`echo "scale=2;100*$usr/$tot" | bc`
    nic_pct=`echo "scale=2;100*$nic/$tot" | bc`
    sys_pct=`echo "scale=2;100*$sys/$tot" | bc`
    idl_pct=`echo "scale=2;100*$idl/$tot" | bc`
    wio_pct=`echo "scale=2;100*$wio/$tot" | bc`

    echo -e "$usr_pct\t $nic_pct\t $sys_pct\t $wio_pct\t $idl_pct"
done

Figure 4-10. A /proc-based average CPU utilization report from a Linux system. Every 5
seconds, the average CPU utilization components are displayed.


$./cpu_util.sh
The server contains 4 core(s).

USER NICE SYSTEM WIO IDLE


4.53 0 .19 .98 94.28
10.55 0 .19 .19 89.04
5.58 0 .39 1.19 92.81
28.54 0 1.79 .99 68.66
5.18 0 .39 0 94.41
26.09 0 .39 .19 73.30
31.33 0 .39 0 68.26

Figure 4-11. Sample output from the CPU utilization report code shown in Figure 4-10.

Starting with Oracle Database 10g, some operating system statistics are captured and made
available through the v$osstat performance view. The column names are not
consistent from release to release, nor from one flavor of Linux or Unix to the next. However,
with only a simple examination of the columns, you can easily understand their meanings.
Figure 4-12 is an excerpt from a real-life Oracle production system’s AWR report.
While most Statspack and AWR reports are run with a one-hour duration, this particular
report is based on only 10 minutes. Notice the number of CPUs is four.

Figure 4-12. An excerpt from a production Oracle system’s 10-minute duration AWR report
showing operating system statistics. This output is based on the v$osstat view.


Calculating the average CPU utilization is very straightforward:

Utilization = CPU consumed / CPU available


where:
• CPU consumed can be taken directly from the Figure 4-12 BUSY_TIME of 17,250
hundredths of a second (hs). Keeping the unit straight is very important, since we must maintain a constant
unit of time in our calculations. This means that over the 10-minute period, all
processes on the database server consumed 17,250 hs, or 172.50 seconds.
• CPU available is the time interval multiplied by the number of CPU cores. In Figure
4-12, the time interval is 10 minutes, which is 600 seconds (10 × 60). The number of
CPU cores is four, so the CPU power available is 2,400 seconds (600s × 4 cores, or 4
cores × 10 minutes × 60 seconds / 1 minute).
Therefore, the average CPU utilization is only 7% (0.0719 = 172.50 seconds / 2,400
seconds). At only 7% CPU utilization, regardless of the number of CPU cores, the system is
operating nowhere near the response-time curve elbow.
Based on v$osstat data, you also can calculate the average operating system CPU
utilization. Just remember the CPU consumed requires two samples: the initial and the final.
If you take only one sample, you will probably notice the bogus utilization calculation will far
exceed 100%. The CPU consumed is the final BUSY_TIME value minus the initial value. The
CPU time available will be the sample interval multiplied by the number of CPU cores. And
don’t forget to convert all time to the same unit, such as seconds.
In summary, to determine the average CPU utilization, some tools you can use are
vmstat, sar, /proc/stat, and even v$osstat. So, if you don’t have the operating
system privileges you need, starting with Oracle Database 10g, you can still calculate the
utilization.

Run Queue
The CPU run queue is key to understanding the health of your CPU subsystem. Combined
with the average utilization, it gives you everything you need to tell if there is a CPU
bottleneck.
While it seems strange, the run queue reported by the operating system includes
processes waiting to be serviced by a CPU as well as processes currently being serviced by a
CPU. This is why you may have heard it’s OK to have a run queue up to the number of CPUs
or CPU cores. As discussed earlier, for an OLTP-centric system, we do not want processes
waiting to be serviced, so we do not want the run queue to be greater than the number of CPU
cores.
Both the vmstat and sar commands provide average CPU run queue details.
Referring back to Figure 4-7, the run queue information is in the far-left column, and its value
is zero in every case. Since that particular system contains four CPU cores, the CPU
subsystem is not very busy. A run queue of 1, 2, 3, or 4 would be fine. But we do not want a
run queue of 5 or more, since this is an OLTP-based system.
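To keep an eye on this, a minimal sketch like the following flags any interval sample whose
run queue exceeds the number of CPU cores. The core count of 4 matches this example
server and is otherwise an assumption, as is the two-header-line vmstat layout:

cores=4   # set to your server's CPU core count
vmstat 30 9999 | awk -v c=$cores \
    'NR > 3 && $1 > c { print "run queue of " $1 " exceeds " c " cores" }'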
Figure 4-13 shows a number of 30-second vmstat samples from a very active and
highly volatile Oracle workload. As a review, the average CPU utilization is around 73
percent or, said another way, the average CPU idle time is around 27%. Based on queuing
theory, a four-CPU core system running at 73% will have an average run queue of 5.3, and
queue time will be around 31% of the response time. So while it may seem surprising, the
response time is already significantly degrading. The vmstat report shows the average run
queues to be between 3 and 7, with an average of 5.2 (I did the math). Notice the queuing
theory prediction of 5.3 was very close. So based both on utilization and run queue length,
this Oracle system is running out of CPU resources, and there is a very high likelihood the
users are not pleased with performance.

[oracle@fourcore]$ vmstat 30 9999


procs ---------memory------------ --swap-- --io- --system- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 1276332 131384 774712 0 0 0 25 1011 15518 66 7 27 0 0
3 0 0 1276332 131384 774716 0 0 0 19 1010 16099 66 8 27 0 0
4 0 0 1275960 131384 774716 0 0 0 18 1010 16272 66 8 26 0 0
4 0 0 1275960 131384 774720 0 0 0 23 1011 16990 66 8 26 0 0
7 0 0 1275960 131384 774720 0 0 0 18 1010 17187 65 8 27 0 0
7 0 0 1275836 131384 774720 0 0 0 19 1010 17930 65 8 28 0 0
7 0 0 1275712 131384 774720 0 0 0 20 1010 19620 65 8 28 0 0
6 0 0 1275712 131384 774720 0 0 0 19 1010 19971 64 7 28 0 0
7 0 0 1275588 131384 774720 0 0 0 18 1011 20293 64 7 28 0 0
1 0 0 1275092 131384 774720 0 0 0 23 1011 20815 65 7 28 0 0
6 0 0 1275092 131384 774720 0 0 0 22 1011 20722 65 7 28 0 0

Figure 4-13. A vmstat report on a very active four-CPU core database server. This
particular system has a very dynamic workload. This is why even with around 30% idle time,
the run queue frequently exceeds the number of CPU cores.

Figure 4-14 is a sample sar -q report. The runq-sz column shows average CPU run
queue, and the information is from the same system shown in Figure 4-13. Key to making this
report useful is knowing how many CPU cores exist on the server. This Oracle database
server has four CPU cores, so processes are frequently waiting for CPU resources. This is not
what we want for an OLTP-centric system.
Do not confuse run queue length with the load average. The load average is available
from v$osstat,4 from the sar -q and uptime commands, and in the
/proc/loadavg file. The load average statistic is calculated differently on various flavors
of Unix and Linux. On your system, it may represent the run queue length, but only your
experience or analyzing the actual kernel source code will provide an acceptable answer.
While the CPU run queue and utilization analyses are consistent across platforms, the load
average is not.
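For reference, on Linux the /proc/loadavg file holds the 1-, 5-, and 15-minute load
averages, followed by the running/total process counts and the most recently created
process ID. The values shown here are illustrative only:

$ cat /proc/loadavg
4.04 3.32 1.85 1/1398 23022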

4. At the time of this writing, Oracle does not show the run queue length in v$osstat. I suspect
this will change, as the information is easily available from the /proc file system.


[oracle@fourcore ~]$ sar -q 15 9999


Linux 2.6.18-92.el5PAE (fourcore) 05/26/2010

10:36:49 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15


10:37:04 AM 6 1398 4.04 3.32 1.85
10:37:19 AM 5 1398 3.81 3.30 1.87
10:37:34 AM 2 1398 4.48 3.47 1.94
10:37:49 AM 1 1398 4.36 3.49 1.98
10:38:04 AM 8 1398 4.80 3.63 2.04
10:38:19 AM 2 1398 4.69 3.66 2.08
10:38:34 AM 6 1398 5.30 3.84 2.16
10:38:49 AM 1 1398 5.22 3.90 2.21
10:39:04 AM 6 1398 4.95 3.90 2.24
10:39:19 AM 4 1398 4.48 3.86 2.25
10:39:34 AM 4 1398 4.91 3.98 2.31
10:39:49 AM 2 1398 4.55 3.94 2.33
10:40:04 AM 4 1398 4.14 3.88 2.33
10:40:19 AM 1 1398 3.96 3.85 2.35
10:40:34 AM 5 1398 4.12 3.89 2.39
10:40:49 AM 3 1398 3.83 3.85 2.39

Figure 4-14. A sar command providing average run queue information. This database
server consists of four CPU cores, so processes are frequently waiting for CPU resources.

Memory Pressure
Compared to CPU, IO, and network analysis, memory bottleneck analysis is simple. The
challenge is understanding the unique words used and not used on your particular platform. In
fact, words like bottleneck and swapping can provoke a hostile and near-violent response from
many operating system administrators. And don’t even think about saying the word swap
around an operating system vendor. It’s just not worth the abuse you’ll receive.
During one of my consulting engagements, it was clear that Oracle’s memory
requirements were exceeding the available capacity. In fact, the situation was so severe that it
was impacting response times. When discussing the situation with the operating system
administrator, I casually and gently said, “The system clearly has a memory bottleneck.” He
gave me a confused look and said, “We don’t have a memory bottleneck. But there is
definitely a lot of memory pressure.” So I learned a new term that day to describe a memory
bottleneck: memory pressure. It does make sense, and it is a kinder and gentler word, which I
always try to use when situations are very tense.

Memory Categories
Part of the confusion DBAs face stems from the fact that physical memory is used for a
variety of purposes and classified differently. There is real memory, virtual memory, shared
memory, nonshared memory, private memory, shared memory segments, code, data, stack,
resident memory . . . I'm sure there are other types, names, and categories. The confusion can
be significantly reduced by grouping memory into three categories: real and virtual, shared
memory segments, and process-related memory.


Real and Virtual Memory


The real and virtual memory category is pretty simple. Real memory is the actual memory
chips. Virtual memory is not real but provides the appearance of a lot more real memory.
Some operating systems manage this better than others, but I digress.

Shared Memory Segments


Shared memory segments reside in real memory (we hope), and a process creates them using
the shmget call. Most DBAs quickly figure this out once they try to start an Oracle instance
and receive a shmget-related message, and then need to track down what the heck just
happened.
What is unique about shared memory segments is that multiple processes can access the
same piece of memory. The alternative would be to place the common data on disk. What a
locking nightmare that would be! On Linux and Unix systems, Oracle’s SGA is stored in
shared memory segments.

Process-Related Memory
Process-related memory can commonly be divided into three areas: code, data, and stack.
Different operating systems will give these different names, and will also have different and
additional process-related memory categories. AIX in particular has its own way of
categorizing process memory. If you need to know the actual memory a process is consuming,
use an operating system-specific command. For example, both Solaris and Linux provide the
pmap command to get process-level memory details. I’ll quickly describe each of the
common categories:
• Code: This is the actual computer program code and is sometimes called text. For
Oracle systems, the most critical piece of code is the oracle executable,
$ORACLE_HOME/bin/oracle. The oracle executable should always be shared,
so there are never multiple copies. When installing earlier versions of Oracle, we had
to ensure the oracle executable was shared, or memory would be quickly exhausted.
• Data: This is private process memory (real or virtual). For most Oracle systems, a
server process’s PGA memory is stored in the data area. Think about it—the PGA is
associated with a single server process. It is not meant to be shared, and contains
process-specific information like session variables and in-memory sort information. If
the Oracle architecture requires some of the PGA memory to be shared, it is moved out
of the PGA memory and into the SGA. A resident set of memory, known as the
resident set size (RES or RSS), is defined to contain only a process's real nonshared
memory. We want any semiactive Oracle process to have its data memory contained
within this RSS. The only way to be sure about a process's point-in-time RSS memory
is to run an operating system- and vendor-specific command that requires a process
ID, such as the pmap command (see the short example after this list). Commands
like ps and even top are notorious for
counting shared memory segments and nonresident portions of a process’s related
memory as the RSS.


• Stack: This is a relatively small (kilobytes usually) portion of an Oracle process’s


memory. The stack is used to push and pop process variables, and also to hold
command-line arguments while they are being processed—that is, popped off the
stack.5
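As a quick illustration of that point-in-time approach, on Linux pmap -x reports per-mapping
and total RSS for a single process. The process ID below is hypothetical, and pmap options
differ between Linux and Solaris, so check your platform's manual page:

# Hypothetical PID of an Oracle dedicated server process; the last line
# of pmap -x output totals the virtual and resident (RSS) kilobytes
pmap -x 28227 | tail -3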

The General Memory Game


Never mention the approach I outline here to your operating system vendor. And it’s probably
not a good idea to mention this to operating system administrators either, unless they are also
Oracle administrators. Also be aware that every operating system uses different words, terms,
and algorithms. But this general three-step process provides the guidance you need when
understanding if there is a memory bottleneck—excuse me, I mean memory pressure.
In-memory paging is the first line of defense. A process asks the operating system for
memory (for example, by issuing the C malloc call), and the operating system looks for free
memory. First, the operating system looks for free memory pages or for pages from other
processes that have not recently been used. This is called a page fault, but not a physical page
fault. (The fault part is unfortunate, but that's what it's called.)
Out-of-memory paging is the second line of defense. If the process memory is not
satisfied, the operating system gets a little more aggressive. It now finds another process’s
memory page (which hopefully has not been used in a while) and writes or pages it out to disk
onto the swap area (on Windows, that’s the page file). This is known as a page out (po), or a
physical page fault. Physical page faults are normal and acceptable up to a point. Your
operating system administrator and experience on a specific platform will provide the best
guidance.
Swapping is the final line of defense. Traditionally, swapping refers to all memory
associated with a process being moved out of real memory and written to disk in the swap
area. This is indeed unfortunate, because if the process needs to be run, all that swapped-out
memory must be read—swapped—back into real memory. It's expensive from a CPU
perspective and, when really bad, from an IO subsystem perspective as well.

Swap: A Four-Letter Word


On an Oracle database server, we don’t want to see pages or processes being swapped out.
You’ll notice that when there is plenty of memory to go around, the swap-outs will almost
always be zero.
Figure 4-15 is a vmstat report showing zero for both pages swapped in (si) and pages
swapped out (so). Just as we want, the swap-out value is consistently zero. If you
occasionally see pages being swapped out, that is an indication the system is under some
memory pressure. It doesn’t necessarily mean there is a problem, but it’s getting close. I look
at it as if the system is just entering the response-time curve’s elbow.
I am commonly asked why I don’t look at the pages swapped in and only the pages
swapped out. Well, if there are pages being swapped in, there must have been pages swapped
out. And if pages have been swapped out, there is a good chance they will eventually be
paged back in.

5. I had a student who told me that, by simply running the oracle executable from the command
line with a massive set of command-line arguments, Oracle would essentially barf and produce a
stack dump, which contains sensitive security information. At that point, I asked him not to tell me
any more. It's like knowing the root password when you really don't need that information.


[oracle@fourcore]$ vmstat 30 9999


procs ---------memory------------ --swap-- --io- --system- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 1276332 131384 774712 0 0 0 25 1011 15518 66 7 27 0 0
3 0 0 1276332 131384 774716 0 0 0 19 1010 16099 66 8 27 0 0
4 0 0 1275960 131384 774716 0 0 0 18 1010 16272 66 8 26 0 0
4 0 0 1275960 131384 774720 0 0 0 23 1011 16990 66 8 26 0 0
7 0 0 1275960 131384 774720 0 0 0 18 1010 17187 65 8 27 0 0
7 0 0 1275836 131384 774720 0 0 0 19 1010 17930 65 8 28 0 0
7 0 0 1275712 131384 774720 0 0 0 20 1010 19620 65 8 28 0 0
6 0 0 1275712 131384 774720 0 0 0 19 1010 19971 64 7 28 0 0
7 0 0 1275588 131384 774720 0 0 0 18 1011 20293 64 7 28 0 0
1 0 0 1275092 131384 774720 0 0 0 23 1011 20815 65 7 28 0 0
6 0 0 1275092 131384 774720 0 0 0 22 1011 20722 65 7 28 0 0

Figure 4-15. A vmstat report on a very active four-CPU core database server. The so
column represents the number of pages swapped out per second during the reporting interval.
As in this sample, we want this to nearly always be zero.

Performance analysts know that if an Oracle server process (and, God forbid, a
background process) has some of its memory pages swapped out, response time for that
process will be negatively impacted if the pages are swapped back in. And let’s not forget the
pages were swapped out because another active process requested memory and the operating
system could not immediately provide the memory.
I tend to focus on the currently active process that is asking for pages at this moment and
must wait while the operating system swaps out some pages so it can give them to the
process. That’s why I focus on pages swapped out. But practically speaking, if there is a
memory bottleneck, you will see both pages swapped in and swapped out.
Figure 4-16 shows an example of using the Linux sar command for sampling memory
page swapping. Linux uses the uppercase W, while all other Unix platforms use the lowercase
w parameter. As Figure 4-16 shows, there is absolutely no memory swapping occurring. This
is what we like to see!

[oracle@fourcore ~]$ sar -W 15 9999


Linux 2.6.18-92.el5PAE (fourcore) 05/31/2010

03:48:19 AM pswpin/s pswpout/s


03:48:34 AM 0.00 0.00
03:48:49 AM 0.00 0.00
03:49:04 AM 0.00 0.00
03:49:19 AM 0.00 0.00
03:49:34 AM 0.00 0.00
03:49:49 AM 0.00 0.00
03:50:04 AM 0.00 0.00
03:50:19 AM 0.00 0.00

Figure 4-16. The sar -W command being used to check if there is any memory page
swapping occurring. In this sample, swapping is not occurring, which is exactly what we want
to see.


Memory Page Scanning


SunOS, Solaris, and HP-UX have a process that is continually looking for free memory. The
more memory being requested, the more aggressive this process becomes.
The operating system keeps track of how many pages are scanned each second. When
memory requests are light, the scan rate will be near zero. When memory is aggressively
being requested, the scan rate will be greater than zero. On Solaris, administrators like to see
the scan rate near zero. On SunOS and HP-UX, the key number is 200. For example, on HP-
UX, if the scan rate is frequently exceeding 200, this means there is truly a lot of memory
pressure. Said another way, Oracle memory requirements have exceeded the operating
system’s memory capacity.
When available, the scan rate can be seen in vmstat reports, in the sr column. When
running sar, the command-line option is -g.
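For example, on Solaris something like the following reports the scan rate (the pgscan/s
column) every 15 seconds. The availability and exact column names of the -g option vary by
platform, so treat this as a sketch:

sar -g 15 9999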

What to Say and What Not to Say About Memory


I have learned over the years that memory issues are one of the most sensitive issues for
administrators and especially with vendors. When you see a memory bottleneck, think there
may be one, or you’re just not sure, simply ask the operating system administrator. If you run
the commands I presented, give the situation some thought, and then approach the operating
system administrator, he should respect your honest and concerned request.
No modern operating system swaps entire processes. However, the word
swap is still used in common commands such as vmstat and sar. To ensure you are not
tackled by an operating system administrator, always use the term pages swapped (which is
essentially just like a page getting paged out), instead of a process being swapped out, process
swapping, or even just swapping. I typically say, “pages being swapped out” so I can move
on. If you still get hassled, ask the aggressor to check the manual page on either vmstat or
sar, and he will see it says something like the Red Hat Linux vmstat manual page: “Total
number of swap pages the system brought out.” Maybe it’s not too informative, but that’s
what it says.
Personally, I prefer to use the page scans, as that avoids the entire swapping discussion.
Operating system administrators feel more comfortable talking in terms of page scans and
memory pressure.
I don’t flat out say that there is a memory bottleneck. I always say that it looks like
Oracle is consuming a lot of memory, a point an administrator will never argue. I’ll ask if the
administrator feels there is a lot of memory pressure being placed on the system—perhaps
even too much for the current memory capacity. When I put the situation in those terms, a
very productive memory-related discussion occurs.

IO Contention
Like it or not, Oracle DBAs are losing control of the IO subsystem. Modern-day IO
subsystems are so complex that they frequently require a full-time administrator. Adding to
our problems, IO vendors in particular can be very abusive and just plain uncooperative with
DBAs. But while the configuration and operation of the IO subsystem may be out of our
control, we can very easily tell if the IO subsystem is not providing the capacity the Oracle
system needs. In this section, I will describe how to diagnose the IO subsystem and suggest
how to handle aggressive vendors.
Before we dive into the technical details, let’s consider a nontechnical solution. I still
find that most IO problems can be resolved by balancing the IO workload. This may sound
simplistic, but while volume management software does a fantastic job balancing IO within
its realm, and RAID arrays do much of the same, when multiple realms exist, they may not be
balanced. Let me explain this by telling a story.

Load-Balancing Still Helps


A few years back, I was performing a remote firefighting consulting engagement. This
analysis was fairly straightforward, and there was an obvious IO bottleneck. It wasn’t that the
IO subsystem was poorly configured, the IO administrator was not doing a good job, or the
SQL was not tuned. It was simply that Oracle’s IO requirements were clearly exceeding the
currently configured IO subsystem’s capability.
To complete the consulting engagement, I scheduled a conference call with my client. I
asked that not only the DBAs attend, but also that the IO administrator be present. I walked
everyone through the OraPub 3-circle analysis and ORTA. Everything clearly pointed toward
an IO bottleneck. But then I added that there was some good news related to my analysis. I
noticed that while some of the devices had excessive response times and were “hot devices,”
there were also a number of devices that were being underutilized. In fact, if some of the
Oracle database files were moved onto these devices, the IO problem may very simply be
resolved. As you might expect, the DBAs were very pleased about this.
However, the IO administrator was not pleased. He didn’t argue with my analysis, but he
said he didn’t want the database files moved. I asked why. He said because he was saving the
devices. I asked, “Saving them for what?” And he actually said, “I’m saving them in case we
really need them.” While I appreciated his honesty, I knew my role was complete, and the
DBAs and the IO administrator had some work to do.
My point is that while we have wonderful load-balancing and management IO
capabilities, IO problems are commonly the result of nontechnical issues, and those are just as
important as the technical challenges.

Why IO Subsystems Are Expensive


Most people in IT are aghast at the cost of an IO subsystem capable of meeting an Oracle
system’s IO requirements. While I don’t disagree with the fact that there is a considerable
amount of money spent on IO subsystems, I think it is important to understand the technical
challenges IO vendors face to make the impossible, well, possible. It starts with a basic
understanding of IO subsystem queuing theory.

How We Model an IO Subsystem


Have you ever heard someone say, “Our CPUs are not balanced. We’ve got to get that fixed”?
You might have heard this if you’re an operating system kernel developer or work for Intel,
but not as an Oracle DBA. This is because there is a single CPU run queue, and any available
core can service the next transaction. But IO subsystems are fundamentally different.
Figure 4-17 shows how we model both a CPU subsystem and an IO subsystem. Even a
quick glance shows they are fundamentally different. The key to understanding their
differences is recognizing that an IO device cannot serve every IO request. In fact, an IO
device can service only requests that are intended specifically for that device. So while one IO
device may be actively servicing requests, it’s completely possible another device may not be
servicing any requests. In stark contrast, a CPU core has no choice (not that this represents a
problem either) but to service the next transaction in the single CPU run queue.

Figure 4-17. CPU and IO subsystems are fundamentally different. In a CPU subsystem, any
core can service any transaction. An IO subsystem device can service only requests directed
to itself. This enables hot and cold IO devices. Even with advanced software, the balancing
activity cannot be perfect.

Even the Best IO Subsystems Queue


This seemingly natural imbalance of IO activity is why so much effort and expense is put into
keeping all devices equally busy. But even when all devices on average are just as busy as the
next, at a point in time, they can fall into the situation shown in Figure 4-17. This is easy to
see with an animated graphical simulation and much more difficult to describe in words, but
I’ll try.
Suppose you have 100 pieces of paper, 25 with the number 1 printed on them, 25 with
the number 2, 25 with the number 3, and 25 with the number 4. You round up 100 people and
give each one a piece of paper. You find four other people and label them 1, 2, 3, and 4. You
tell the 100 people to mix, and then stand in a single line on one side of a room. You place the
four people on the other side of the room evenly spaced. You then stand at the beginning of
the 100-person queue, and once every 10 seconds, you allow one person to walk to the
numbered person that matches the number written on her piece of paper. You also tell the four
people on the other side of the room that once a person is in front of them, that person must
stay there for 5 seconds and then move away. Then you say, “Go!” You’ll see that after a few
minutes, some of the queues in front of the four people are very long, some are short, and
some have no queue. When you look again in a couple of minutes, while the lines may have
moved around, the same general situation exists!
The point is that even with the same IO activity perfectly balanced across all devices,
because every transaction must go to a specific device, there is no way all devices can always
have the same run queue. The result is that even at a low IO device utilization, there will be
queuing! And the queue times are highly variable when compared to the CPU queuing
configuration. Figure 4-18 graphically shows just how dramatically and immediately queuing
occurs when each device has its own queue, resulting in an increase in response time. This
happens with the exact same arrival rate and the same speed devices. The only difference is
the number of queues in the system.


Figure 4-18. This response-time graph contrasts two configurations with the same device
speed and same system arrival rate. The one difference is the dotted line has a queue for each
device (like an IO subsystem) and the solid line has a single queue feeding all the devices (like
a CPU subsystem). As the graph shows, when each device has its own queue, queuing
immediately occurs, resulting in an immediate response time increase.

Based on Figure 4-18, IO subsystem vendors have a difficult problem. Their goal is
somehow, with all their caching and advanced algorithms, to transform the dotted-line
situation into the solid-line situation. To transform one fundamentally different queuing
system into another is expensive and very difficult. This is one reason IO subsystems seem so
expensive.

How to Detect an IO Bottleneck


Using Oracle’s wait interface, it is extremely simple to detect an IO bottleneck. If you want to
know how long it takes the IO subsystem to respond to a single or multiblock read request, or
how long it takes the IO subsystem to complete a multiblock database writer or log writer
request, you need not go any further than a simple wait event report. Of course, you can also
use standard operating system reports like iostat and sar. You can even get a bit tricky
and use operating system process tracing to remove Oracle from the equation. But I’m getting
ahead of myself!
My general IO subsystem performance rule of thumb is read requests must complete in
10 ms or less and write requests must complete in 5 ms or less. Only the most arrogant IO
vendors would argue their 20 ms response-time IO subsystem is performing acceptably. When
I look at device busyness via iostat or sar, unless the device is at least 5% busy, I don’t
bother to check the response time. Low-activity devices, when they are active, can result in
some crazy-looking response times. Since the low-activity devices are not doing any real
work for the system, we can just ignore them.

Using Oracle’s Wait Interface


We can creatively use Oracle’s instrumentation to monitor IO subsystem response time. Since
Oracle times (instruments) each and every IO request, we simply query from the wait event
views to get a very accurate performance view. Regardless of how the IO devices are
performing, how the IO subsystem is configured, or how the network-attached storage (NAS)
or storage area network (SAN) is performing, we can easily tell how long an Oracle IO
request is taking.
Figure 4-19 is an OraPub wait interface report based on v$system_event. I ran the
report once (not shown), waited 30 seconds, and then ran the report again. The second run,
which contains what occurred during the 30-second interval, is shown in Figure 4-19. The top
wait event is db file scattered read, which is what Oracle calls a multiblock read
request emanating from an Oracle server process. During the 30-second interval, the average
IO subsystem response time was 2.8 ms. That’s very good, and I suspect some of the blocks
were in a cache, but remember that these blocks are not in Oracle’s buffer cache.6

SQL> @swpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER twice.

Database: prod3 28-May-10 04:48pm


Report: swpctx.sql OSM by OraPub, Inc. Page 1
System Event CHANGE (blue line) Activity By PERCENT

Time Waited % Time Avg Time Wait


Wait Event (sec) Waited Waited(ms) Count(k)
----------------------------------- ----------- ------- ---------- -------
db file scattered read 57.090 52.69 2.8 20
read by other session 41.150 37.98 14.1 3
latch: cache buffers chains 1.340 1.24 5.4 0
control file parallel write 0.770 0.71 70.0 0
log file sync 0.660 0.61 110.0 0
log file parallel write 0.490 0.45 49.0 0
db file sequential read 0.090 0.08 8.2 0
latch: cache buffers lru chain 0.020 0.02 0.6 0
latch free 0.000 0.00 0.0 0
db file parallel write 0.000 0.00 0.0 0
direct path write 0.000 0.00 0.0 0

Figure 4-19. A classic instance-level interval (30 seconds) wait event report based on
v$system_event. Over the duration of this report, the IO subsystem responded to
Oracle’s multiblock read requests on average in 2.8 ms. That’s pretty good!

6. It is common for IO reads to complete in less than 1 ms. When this occurs, you know non-Oracle
caching is involved. No physically spinning device can respond within 1 ms. Obviously, the block(s)
requested does not reside in Oracle's buffer cache; otherwise, no wait event would have been posted,
but you do know the block resides in some other cache. It could be the file system buffer cache, or
perhaps that very nice cache you purchased from your IO subsystem vendor.


Be careful about making assertions when there are not very many samples. For example,
during the 30-second interval between reports, single-block read requests (event db file
sequential read) took an average of 8.2 ms, but Oracle server processes issued less
than 1,000 of them. Another more pronounced example is log writer multiblock writes (event
log file parallel write) took an average of 49.0 ms! Normally, that would be
unacceptable, but they occurred less than 1,000 times during the report interval, and their
combined time accounts for only 0.45% of the wait time. So this is nothing to be concerned
about.
Table 4-1 lists the Oracle-related wait events that can be useful in understanding IO
subsystem response time. When focusing on the IO subsystem, and not analyzing Oracle
response time, I am primarily concerned with multiblock writes, and single-block and
multiblock reads. My general rule of thumb is a 10 ms response time. Different applications
will have different service-level requirements, but this is a good general guideline. The wait
event log file sync is how long commits take from an Oracle perspective. I’ve seen
commit time requirements range from 2 to 100 ms. So it really depends on your objectives.
However, in a top-of-the-line IO subsystem, you can expect IO read and write response time
to be well within 10 ms.

Table 4-1. Oracle-IO related wait events and their IO characteristics

Event Operation Blocks


db file sequential read Read Single
db file scattered read Read Multiple
db file parallel write Write Multiple
log file parallel write Write Multiple
log file sync Write N/A
direct path write Write Multiple
direct path read Read Multiple
direct path write temp Write Multiple
direct path read temp Read Multiple
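If you do not have a delta-based report handy, a minimal sketch like the following pulls the
same information from v$system_event. The event list is a subset of Table 4-1, and the
averages are cumulative since instance startup; for interval averages, difference two samples
just as described for v$osstat:

select event,
       total_waits,
       round(time_waited_micro/1000, 1) total_ms,
       round(time_waited_micro/nullif(total_waits, 0)/1000, 1) avg_ms
  from v$system_event
 where event in ('db file sequential read', 'db file scattered read',
                 'db file parallel write', 'log file parallel write',
                 'log file sync')
 order by time_waited_micro desc;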

Figure 4-20 is an AWR report’s Top 5 Timed Events table, showing wait events. Notice
that multiblock reads (event db file scattered read) are taking an average of 20
ms to be returned to an Oracle server process. This normally would not be acceptable. Now, I
am not implying the IO subsystem is improper in some way and I’m also not implying the
Oracle instance or the related SQL is tuned. All we know from this report is that when Oracle
server processes submit a multiblock IO request, it takes the IO subsystem 20 ms to respond.
Based on the associated OraPub 3-circle analysis, there will most likely be at least three
possible, and probably more, solutions.


Figure 4-20. An AWR report’s Top 5 Timed Events table. Notice IO-related wait time
accounts for about half of the response time, and multiblock reads account for about 28% of
the wait time, with an average response time of 20 ms.

When you find that IO response times are only a couple of milliseconds or less,
remember that it takes CPU resources to satisfy these IO requests. It is very common for a
seemingly IO-bottlenecked system to actually be suffering from a CPU bottleneck.
Performing an OraPub 3-circle analysis and the associated ORTA will clearly show this, so
there should be no surprises.

Removing Oracle from the Equation


Suppose your analysis clearly shows the IO subsystem takes on average 783 ms to respond to
Oracle multiblock read requests. You show this to your IO subsystem vendor, who laughs and
then says something like, “First of all, that’s what Oracle is telling you, which is false. And
second, if you had the tools we have, you would see that our IO subsystem is performing
within normal parameters.” If you’ve been working with large Oracle systems, you’ve
probably run into this situation. It can be extremely frustrating.
Fortunately, we can easily remove Oracle from the equation and prove, based on what
the operating system is telling us, that the IO subsystem truly is responding to multiple blocks
in 783 ms! Figure 4-21 is the result of simple operating system tracing—in this case, tracing
an Oracle server process. Using strace with the options -cp suppresses output until you
interrupt the trace, and then it produces a very nice summary of the system calls, the
occurrences, and their time.
Clearly, the readv call, which occurred nearly 105,000 times (not a bad sample size)
during this sample interval, took an average of 783 ms. The pread64 call took only 2.1 ms
and occurred nearly 22,000 times. By referring to the manual page for each call and also
looking at detailed trace output (for example: strace -rp 28227 >out.out 2>&1),
you can see that Oracle uses the readv call for multiblock reads and the pread64 call for
single-block reads. These read requests are for blocks outside Oracle’s cache. Oracle knows
only the blocks’ addresses and that they reside outside the buffer cache. It has no idea how the
operating system retrieves the blocks. But fortunately for us, through operating system tracing
the server process, we can confidently demonstrate that, indeed, Oracle must wait nearly
three-quarters of a second for a multiblock read to complete.
Now I realize this will not improve relationships with IO vendors, but it does put them
on the defensive, and forces them to take a good, long look at what the IO subsystem is really
doing!


[oracle@fourcore iops_research]$ strace -cp 28227


Process 28227 attached - interrupt to quit
Process 28227 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.14 0.782548 7 104923 readv
0.44 0.003485 0 390839 gettimeofday
0.27 0.002138 0 21940 pread64
0.07 0.000546 0 19847 semctl
0.04 0.000337 0 46450 times
0.02 0.000151 0 18001 _llseek
0.02 0.000134 0 4087 4 semtimedop
0.00 0.000000 0 44 getrusage
------ ----------- ----------- --------- --------- ----------------
100.00 0.789339 606131 4 total

Figure 4-21. Operating system tracing is an effective way to remove Oracle from the
discussion and discover how long the operating system takes to respond to IO calls. In this
example, an Oracle-dedicated server process was traced.

Using Operating System IO Reports


Nearly all Linux and Unix systems have an iostat report; Solaris may offer the sar -d
report. If you are using a filer, for example from NetApp, you can gather IO statistics similar
to those reported by iostat by issuing a remote shell sysstat command directly to the
filer. If you're a DBA, you may not have permission to do this, so you will need to rely
on the wait interface and operating system tracing to build a strong IO subsystem case.
I will focus on the iostat report because it provides more detail than the sar -d
report and filer sysstat IO reports. Plus, if you understand the iostat report, you can
easily understand the others.
Figure 4-22 is a typical iostat report for a very small IO subsystem. It is common for
a thousand lines to be returned. When studying the IO subsystem, it can be a good idea to load
the iostat data into Oracle and filter on the response time (await, svctm) and
utilization (%util) columns to find offensive IO patterns.
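Short of loading the data into Oracle, a minimal sketch like the following flags active yet
slow devices on the fly. The field positions, $10 for await and $12 for %util, assume the
column layout shown in Figure 4-22, so verify them against your platform's iostat output:

# Print devices whose response time exceeds 10 ms while the device is
# more than 5% busy; the $12+0 coercion forces a numeric comparison
iostat -xd 60 999 | awk '$10 > 10 && $12+0 > 5 { print $1, "await:", $10, "%util:", $12 }'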
The iostat report can be initially overwhelming, but you need to look at only a few
columns to determine if there is a performance problem. The manual pages are also very
comprehensive, so you can refer to them. Here are brief descriptions of the key columns:
• Device: This is the device name, which can be just about anything these days. It
could be a volume, a volume group, a RAID array, an actual physical disk, a disk
partition, or something else. The key is to first determine if Oracle is experiencing any
IO response time issues. Then, in the iostat report, look for active yet poorly
performing devices. If you find any, present both your Oracle and iostat analyses to
your IO administrator.
• r/s: This is the number of read requests submitted to the device per second. My
testing has shown reads per second correlates well with Oracle read IO operations, as
determined from wait interface read wait events and also the v$filestat view’s
phyrds column.


• w/s: This is the number of write requests submitted to the device per second. My
testing shows good write-per-second correlation with Oracle, when adding together the
database writer IO operations from the v$filestat column phywrts and log
writer requests from the v$sysstat statistic redo_writes. It is troubling,
however, that adding all wait event write-related occurrences fell well short of the
actual iostat w/s, v$filestat, and v$sysstat numbers.
• await: This is the average IO request response time in milliseconds. While clearly
mislabeled, the manual page and observations show the await column includes both
the true service time and queue time. Before I begin an IO analysis or make this
statement to a customer, I always double-check the manual page for the particular
Linux or Unix implementation. Again, my rule of thumb for acceptable response time
is 10 ms.
• %util: This is the average device utilization. Modern IO subsystems can sustain
utilizations up to and possibly over 65%. While I use a high utilization to spot
potential issues, I also make sure the response time is an issue before claiming there is
a problem. I have seen devices 90% busy and still servicing IO requests in under 5 ms.
The arrival pattern and IO request service time play a significant role in the resulting
device utilization.
The report in Figure 4-22 shows there are three devices that are not able to respond fast
enough: sdc, sdd, and sdg. Device sdc is experiencing heavy read activity, has a response
time of 23 ms, and is 74% busy. Device sdd is also experiencing heavy read activity, has a
response time of 14 ms, and is 66% busy. I would expect the top few wait events to be related
to IO reads. It is very satisfying to see Oracle is experiencing long read request response times
based on both the wait interface and also iostat. Device sdg is experiencing heavy write
activity, has an 18 ms response time, and is 81% busy. The writes could be caused from either
the log writer or the database writer. We cannot tell which from this operating system-focused
analysis. However, when we combine the operating system analysis with the ORTAs, we
should be able to see if the issue is focused on the database writer, log writer, or both.

Understanding Your Solution Options


Whenever I encounter an IO issue, I always frame it in terms of Oracle and application
requirements, and IO subsystem capacity. This not only implies there are three areas to focus
on for solutions, but it immediately disarms everyone involved (that is, unless they really
messed up and performed what I call a career decision).
I have seen the IO subsystem misconfigured. Vendors won’t admit this as a possibility
until cornered with both Oracle and operating system tracing proof. Usually, the problem is a
physical connection issue, where too much IO activity is going through too few connections.
A relatively simple reconfiguration can actually solve the problem.
In most cases, however, everyone must get involved to solve the problem. This means
DBAs focused on Oracle and balancing the workload; DBAs, vendors, and developers
working on the SQL; and IO administrators somehow increasing IO capacity. Eventually,
nearly every Oracle system outgrows its IO subsystem, and additional capacity must be
added. But proving this is not our initial objective. We are problem-solvers, not salespeople!

[oracle@fourcore ~]$ iostat -xd 60 999
Linux 2.6.18-92.el5PAE (fourcore) 01/31/2009
...
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.08 4.40 0.11 1.82 5.65 49.72 28.81 0.01 2.70 0.81 0.16
sdb 0.00 5.00 0.00 1.60 0.00 52.80 33.00 0.00 0.38 0.38 0.06
sdc 288.80 12.80 461.00 14.00 114601.60 214.40 241.72 1.48 23.11 1.56 74.16
sdd 82.24 12.38 360.08 3.39 21534.53 126.15 341.26 0.28 14.48 4.07 65.85
sde 0.00 11.60 0.00 2.20 0.00 110.40 50.18 0.00 1.18 1.18 0.26
sdf 0.00 3673.80 1.00 147.00 9.60 30446.40 205.78 1.02 6.84 3.48 51.48
sdg 0.00 6246.80 1.20 201.20 9.60 51465.60 254.32 1.71 18.46 4.01 81.10
sdh 0.00 452.40 0.00 54.20 0.00 4052.80 74.77 0.11 2.10 1.48 8.04
sdi 0.00 2497.40 0.00 52.60 0.00 20400.00 387.83 0.33 6.29 4.06 21.38
sdj 0.00 1384.23 0.00 55.29 0.00 11516.17 208.29 0.26 4.71 2.95 16.29
sdk 0.00 6.00 0.00 7.40 0.00 107.20 14.49 0.06 7.78 0.49 0.36
sdl 0.00 2.00 0.00 1.00 0.00 24.00 24.00 0.00 2.20 2.20 0.22
Figure 4-22. The standard iostat command run at 60-second intervals. This particular snippet is for one of the 60-second intervals.
The IO subsystem is not well balanced; some devices are very active, with significant response times.
Network Contention
As a DBA, I routinely check on three network-related areas: latency, collisions, and dropped
packets. If there is a network issue, it is nearly always a latency issue. But in your career, you
will undoubtedly encounter a collision issue and perhaps even a dropped packets problem.

Network Latency
If you suspect there is a latency issue, my first recommendation is to get many, many
samples. Network people can be extremely gruff. And if you crawled around on your hands
and knees all day stringing cable, or spent hours staring at network graphs, you would
probably be the same way.
Modern networks can be programmed to change packet routing based on packet type,
time of day, and activity intensity. The only way to really understand latency issues is to take
a lot of samples, paste them into Excel, and create a scatter graph.
It is important that your sample latency packets look just like SQL*Net packets. One of
the best ways to do this is to use Oracle’s tnsping command, found in the
$ORACLE_HOME/bin directory.7 While both the ping and tnsping commands provide
network latency times, the classic ping command is not good enough. Sharp network
administrators managing a complex network infrastructure could possibly delay a standard
ping packet, stop it altogether, or route it around the world a few times. So simply avoid this
trap by issuing the tnsping command.
Take hundreds or thousands of tnsping samples. Don’t sample once every second, or
you might have an unsavory encounter on your way home from work. Create a simple shell
script that runs tnsping once every 30 or 60 seconds, directing the output to a text file. And
let this run for a couple of days. Do some fancy awk’ing and grep’ing, and then drop the
data into an Excel spreadsheet to get a good understanding of the situation.
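Here is a minimal sketch of such a script. The PROD alias is hypothetical, and the sed pattern assumes tnsping closes its output with a line like OK (10 msec); check your version's output format before trusting the numbers.

#!/bin/sh
# Minimal sketch: sample SQL*Net latency once a minute into a CSV
# file ready for Excel. PROD is a hypothetical TNS alias.
while true
do
    # tnsping ends with a line like: OK (10 msec)
    ms=`$ORACLE_HOME/bin/tnsping PROD | grep "^OK" | \
        sed 's/OK (\(.*\) msec)/\1/'`
    echo "`date '+%Y-%m-%d %H:%M:%S'`,$ms" >> tnsping_latency.csv
    sleep 60
done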
Besides calculating the typical mathematical statistics on the latency, such as the average
and standard deviation, look for daily trends. It could be that during peak Oracle processing
times, latency time skyrockets. Perhaps there is a lot of non-Oracle activity during this time,
and the network is taking a beating, so the administrators are routing all other traffic
(including your SQL*Net packets) around the problem! I have found that the network
administrators may not know the Oracle system is experiencing network issues—that is, until
you tell them about it. They are focused on support issues and ensuring the network is
available (sounds like some DBAs I know).
Figure 4-23 is the result of sampling tnsping results once an hour for three months.
There were 1,210 samples collected, the average latency was 47.90 ms, and 90% of all the
samples were less than 47.38 ms. It might seem strange that the average latency is greater
than the ninetieth percentile figure, but this is due to a few samples over 600 ms. For this
particular application, service levels dictated the latency must be below 100 ms 90% of the
time. Most applications require the latency to be within 20 ms or even 5 ms. So while there
are clearly latency times greater than 100 ms, both the average and the ninetieth percentile
figures meet service-level requirements.

7
Another stellar network command is traceroute. “Hangs” in SQL*Net can commonly be discovered by
running a few of these.

Figure 4-23. This graph shows SQL*Net packet latency results taken once each hour over a
three-month period. While there are no alarming trends, the average latency for the 1,210
samples is 47.90 ms.

Network Collisions
While many network administrators are not aware of Oracle application latency issues, most
administrators will realize if there is a collision issue. Collisions are the result of some type of
confusion between the packet sender and receiver. As a result, they both send a packet at the
same time and collide. You won’t get anything as exciting as subatomic particle collision
results, just angry users.
A local area network can have up to 10% of its network activity collide and still be
considered acceptable, but on modern-day switched networks, collisions should be near zero.
It’s very simple to determine the collision rate. You need two pieces of information: the
number of collisions and the number of network card output packets. If you have root access,
run the ifconfig command. Figure 4-24 is a sample ifconfig output. For Ethernet card
eth0, since the system was rebooted, there have been 31 collisions and millions of packets
both sent and received. The percentage of collisions in relation to both incoming and outgoing
traffic is far below 1%. No problem there!
If you don’t have root access, try the netstat command. If you’re not familiar with
the netstat command you may need the help of your network administrator to figure out
how to get collisions and output packets to appear. The netstat manual page can be
overwhelming. Simply divide the number of collisions by the number of output packets, and
ensure it is well below 1%. If it’s not, contact your network administrator immediately.
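For example, plugging the Figure 4-24 numbers (31 collisions against roughly 62.8 billion output packets) into bc confirms there is nothing to worry about:

# Collision percentage using the Figure 4-24 numbers:
# 31 collisions, 62,846,571,614 TX packets.
echo "scale=10; 31 * 100 / 62846571614" | bc
.0000000493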

# ifconfig
eth0 Link encap:Ethernet HWaddr 00:21:9B:1C:DD:A4
inet addr:X.X.X.250 Bcast:10.0.1.255 Mask:255.255.255.0
inet6 addr: 2002:47c1:e7be:0:221:9bff:fe1c:dda4/64 Scope:Global
inet6 addr: fe80::221:9bff:fe1c:dda4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets: 1017973687957 errors:0 dropped:0 overruns:0 frame:0
TX packets: 62846571614 errors:0 dropped:0 overruns:0 carrier:0
collisions:31 txqueuelen:100
RX bytes:10606993 (10.1 MiB) TX bytes:11650777 (11.1 MiB)
Memory:fdfc0000-fdfe0000

Figure 4-24. This is an example of an ifconfig command run to determine if network collisions are a problem. From both an inbound and outbound perspective, collisions are well below 1%.

Dropped Packets
It is very unlikely you will ever encounter a dropped packets problem. I have heard about this
occurring only once on an Oracle system, but it was the key problem, and I suspect it has
happened in other cases. Essentially, what occurs is the network card receives more packets
than it can process and queues the packets for processing. However, if the network queue fills,
packets are simply dropped. While this can obviously occur during a denial-of-service (DoS)
attack, it could also happen on a particularly popular day for an Internet company.
There is no hard-and-fast rule of thumb for how many dropped packets is an acceptable
number. I check to see if there are any dropped packets, and if so, I simply alert the network
administrator and ask very clearly if the dropped packets represent a problem. If I suspect
they are a problem, I note the date, time, and person I contacted. If I’m feeling particularly
paranoid, I’ll also email the person I contacted.
Figure 4-24 shows that since the system was last rebooted, no network packets have
been dropped. That’s what I like to see!

Summary
Operating system bottleneck analysis can be extremely satisfying and useful. Coming from an
Oracle DBA, this may sound strange. But understanding what’s happening to the operating
system not only strengthens my performance analysis, but also gives me a much deeper
knowledge of computing systems.
Here, you’ve seen that a CPU subsystem has only one run queue and why an IO
subsystem without a massive cache will immediately queue transactions. I’ve shown you how
to confirm an IO bottleneck using multiple techniques and how to maximize either response
time or throughput by controlling the workload.
There are many Oracle DBAs, and there are many operating system administrators, but
there are not nearly enough people who can effectively operate in both arenas. My hope is this
chapter has given you the tools, strategies, and courage to begin operating in this realm.

CHAPTER

5
Oracle Performance
Diagnosis

This chapter is different. Sure, the previous chapters have focused on Oracle performance
diagnosis, but in this chapter, I pull everything together, filling in any missing pieces and
introducing some additional diagnosis topics. This chapter will also complete the conceptual
framework of performance diagnosis coverage, paving the way for the next three chapters on
Oracle internals.
When reading this chapter, keep in mind everything that you’ve read about so far:
Oracle’s instrumentation, how to trace operating system processes, Oracle response-time
analysis (ORTA), OraPub’s 3-circle analysis methodology, that spinning on a latch is service
time not queue time, and how to gather operating system utilization from v$osstat. If any
of these topics seem unfamiliar, I respectfully ask you to turn back to those pages.
With the first four chapters as a foundation, we are now ready to begin completing our
diagnostic focus.

Oracle CPU Consumption and Components


In the section on CPU contention in Chapter 4, I introduced how to gather information about
the operating system CPU consumption, utilization, and the CPU run queue. Here, we will
focus on Oracle’s CPU consumption. Obviously, Oracle’s CPU consumption is a subset of all
CPU consumed on the database server. For example, if the operating system CPU utilization
is 65% with only a single Oracle instance on the box, it’s a good guess that Oracle is
consuming around 55% of all the CPU available on the database server. But in this section,
we go a step further.
To complete an ORTA, we need a better understanding of where Oracle consumes CPU.
For example, are the Oracle background processes consuming an unusual amount of CPU, or
perhaps parsing is the culprit. Just as with wait event classification, Oracle allows a couple
of CPU—that is, service time—classifications. How we gather and classify Oracle CPU time
and how that relates to the database server is what this section is about.
There are three approaches to gathering data about Oracle CPU consumption. The first is
the traditional approach based on v$sysstat, the second is based on
v$sys_time_model, and the final approach (which is covered in the final section in this
chapter) is based on v$active_session_history. But before looking at the
approaches, you should understand how Oracle perceives Oracle process CPU consumption.

Oracle’s Limited Perspective


Oracle is limited in its perception of Oracle process CPU consumption. While great strides
have been made with the system time model, there are still two areas where Oracle can
misrepresent CPU consumption. Keep in mind that these areas are of no consequence when
the CPU subsystem is not massively bottlenecked.
The first area of misrepresentation is because of inefficient data collection tools. While
this may not be significant, or even measurable, during times of low CPU utilization, when
the database server’s CPU subsystem activity becomes extremely intense, executing
collection-related commands take a relatively long time. If it takes 5 seconds to ask the
database, “What time is it?” that 5 seconds can throw off your analysis. So be aware that
during extremely high CPU utilization (90% and greater) reporting error could be significant.
The other area of misrepresentation has to do with time in the CPU run queue. When
Oracle reports that a process has consumed 10 ms of CPU time, Oracle does not know if the
process actually consumed 10 ms of CPU time or if the process first waited in the CPU run
queue for 5 ms and then received 5 ms of CPU time.1 Oracle perceives only 10 ms of time. If
this occurs, you will most likely notice it because the Oracle CPU time consumed will be
greater than the CPU subsystem’s capacity. For example, if you ran a response time-focused
report over a duration of 60 seconds, and if the single-CPU core subsystem were heavily
overworked, Oracle could report that Oracle processes consumed 65 seconds of CPU time.
Yet, we clearly know the CPU subsystem is limited to supplying up to 60 seconds of CPU
time and no more. And let’s not forget that the operating system consumes CPU to manage
the database server. So, even if Oracle is reported to consume 90% of the available CPU, it’s
a good guess that the CPUs are on average 100% busy. Again, this is rare, but on an
extremely CPU bottlenecked system, when the CPU run queue is significantly larger than the
number of CPU cores, it can happen.

1
If you are a queuing theory wiz, based upon the number of CPU cores, the utilization, and the run queue, you
can mathematically predict how much time the Oracle processes sat in the run queue and how much time they were
actually being served by a CPU core. Cool stuff, but not the focus of this chapter. If this seems interesting to you,
consider reading my book Forecasting Oracle Performance (Apress, 2007).

Using Instance Statistics


Oracle has made CPU consumption details available for many years, at both the session and
instance levels. It’s just that CPU consumption has not been classified by DBAs and used in a
response time-focused or OraPub 3-circle analyses. The benefits to using the instance statistic
views (v$sesstat and v$sysstat) are that they are available on any Oracle release and
are familiar to most DBAs. The negatives are that the statistics are not guaranteed to be
updated until after a SQL statement completes, and when a session disconnects, all the
session-level statistics (v$sesstat) vanish. Usually, this does not present a problem with a
large multiuser Oracle system. However, with a batch-centric workload with only a few
massive resource-consuming SQL statements running, it can muddy any Oracle analysis.
In the instance statistic views, Oracle provides only total CPU consumption. It does not
differentiate between server process time and background process time.2 But if a background
process is consuming an unusually large portion of the CPU, this can be easily seen from an
operating system perspective. Figure 5-1 shows statistic 12, which is the total CPU
consumption for the Oracle instance since instance startup. This includes all Oracle server
processes and (we hope) all the background processes. The value is in centiseconds. To
convert to seconds, simply divide the queried value by 100. The instance shown in Figure 5-1
has consumed 734.20 seconds (73420/100) of CPU time since it last started.

SQL> l
1* select name,value,statistic# from v$sysstat where statistic#=12
SQL> /

NAME VALUE STATISTIC#


------------------------------ ---------- ----------
CPU used by this session 73420 12

1 row selected.

Figure 5-1. Since this instance has started, it has consumed 734 seconds of CPU. Statistic 12
from the classic instance statistics view v$sysstat contains the CPU time, in hundredths of
a second, since the instance has started.

As I stressed in previous chapters, most of our diagnostic work is based on an interval of
time. To capture interval activity, we need both an initial value and a final value. Figure 5-2
shows a simple example of how to capture instance CPU consumption over a 30-second
interval. This strategy can be applied to any similarly structured Oracle table or view,
including the performance views.

2
In fact, based on my extensive data collection tool experience, Oracle does not always collect and report
background CPU consumption through the v$sysstat or v$sesstat views. While this can present a problem
when gathering detailed data used during predictive analysis, for firefighting performance analysis, the background
CPU time is not likely to be significant enough to cause a misdiagnosis.

SQL> col T0s new_val T0s


SQL> select value/100 T0s from v$sysstat where statistic#=12;

T0S
----------
6734.25

1 row selected.

SQL> exec dbms_lock.sleep(30);

PL/SQL procedure successfully completed.

SQL> col T1s new_val T1s


SQL> select value/100 T1s from v$sysstat where statistic#=12;

T1S
----------
6739.26

1 row selected.

SQL> select &t1s-&t0s CPU_sec_Consumed from dual;


old 1: select &t1s-&t0s CPU_sec_Consumed from dual
new 1: select 6739.26- 6734.25 CPU_sec_Consumed from dual

CPU_SEC_CONSUMED
----------------
5.01

1 row selected.

Figure 5-2. Most firefighting requires interval reporting. This figure shows an example of
how simple it can be to capture and report interval data from v$sysstat or any similarly
structured performance view.

Oracle does provide parsing and recursive CPU consumption information. However, the
statistic parse time cpu contains all CPU parse time from server processes and
background processes, and includes both recursive and nonrecursive SQL. To complicate
classification efforts further, the recursive cpu usage statistic includes all recursive
SQL CPU consumption, including parsing-related CPU consumption. This means we cannot
simply subdivide total Oracle CPU consumption into three exclusive categories: parsing,
recursive SQL, and everything else. The best we can do with the instance views is to gather
Oracle process CPU consumption, all recursive CPU consumption, and all parsing-related
CPU consumption.
This begs the question, “What exactly is recursive SQL?” Most DBAs consider
recursive SQL to be any SQL that a DBA or application developer did not create. For
example, a DBA may submit this statement:

create table employee (fname varchar2(200));

Oracle then takes this nonrecursive statement and dynamically creates the SQL
necessary to insert rows into the data dictionary tables, such as tab$ and col$. This
dynamically created SQL is indeed recursive SQL. However, consider the SQL shown in the
following trace file snippet.

PARSING IN CURSOR #6 len=138 dep=0 uid=5 oct=47 lid=5


tim=1211066735177081 hv=3514417586 ad='40bff5a0'
declare
looper number;
the_count number;
begin
for looper in 1..500
loop
select count(*) into the_count from bogus;
end loop;
end;
END OF STMT
PARSE
#6:c=3999,e=4532,p=0,cr=3,cu=0,mis=1,r=0,dep=0,og=1,tim=1211066735177078

The true Oracle definition of recursive SQL is any SQL with a depth greater than zero.
Notice the preceding PL/SQL entry has a depth of zero (dep=0) and therefore is officially
deemed nonrecursive SQL. However, notice what happens when the select statement is
run 500 times:

PARSING IN CURSOR #8 len=26 dep=1 uid=5 oct=3 lid=5


tim=1211066735178338 hv=701285344 ad='409b2e2c'
SELECT COUNT(*) FROM BOGUS
END OF STMT
PARSE #8:c=1000,e=1129,p=1,cr=4,cu=0,mis=1,r=0,dep=1,og=1,
tim=1211066735178335
EXEC #8:c=1000,e=18,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,
tim=1211066735178396
FETCH #8:c=0,e=23,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=1,tim=1211066735178436
EXEC #8:c=0,e=12,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1211066735178482
FETCH #8:c=0,e=18,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=1,tim=1211066735178514
EXEC #8:c=0,e=10,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1211066735178550
...

The depth is one (dep=1), which means Oracle considers this application SQL to be
recursive SQL. Therefore, all its related CPU time will be recorded as both recursive SQL and
server process SQL (statistic CPU used by this session), plus there will be some
related parse time CPU consumption. So, as you can see, we cannot simply subdivide total
Oracle CPU consumption into the three exclusive categories of parsing, recursive SQL, and
everything else. This also provides an answer to why recursive CPU consumption can seem
strangely high!
Figure 5-3 shows a simple script to gather CPU consumption, including recursive SQL
CPU consumption, and parsing-related CPU consumption, over a 120-second period. Figure
5-4 takes the raw values and produces the service time details we need for our ORTA.

SQL> col T0_CPU_all_s new_val T0_CPU_all_s


SQL> col T0_CPU_parse_s new_val T0_CPU_parse_s
SQL> col T0_CPU_recur_s new_val T0_CPU_recur_s
SQL>
SQL> select value/100 T0_CPU_all_s from v$sysstat
where name='CPU used by this session';

T0_CPU_ALL_S
------------
734.37

SQL> select value/100 T0_CPU_parse_s from v$sysstat


where name='parse time cpu';

T0_CPU_PARSE_S
--------------
115.42

SQL> select value/100 T0_CPU_recur_s from v$sysstat


where name='recursive cpu usage';

T0_CPU_RECUR_S
--------------
122.06

SQL> exec dbms_lock.sleep(120);


PL/SQL procedure successfully completed.

SQL> col T1_CPU_all_s new_val T1_CPU_all_s


SQL> col T1_CPU_parse_s new_val T1_CPU_parse_s
SQL> col T1_CPU_recur_s new_val T1_CPU_recur_s
SQL>
SQL> select value/100 T1_CPU_all_s from v$sysstat
where name='CPU used by this session';

T1_CPU_ALL_S
------------
744.38

SQL> select value/100 T1_CPU_parse_s from v$sysstat


where name='parse time cpu';

T1_CPU_PARSE_S
--------------
115.43

SQL> select value/100 T1_CPU_recur_s from v$sysstat


where name='recursive cpu usage';

T1_CPU_RECUR_S
--------------
122.79

Figure 5-3. This code snippet details the collection of values for service time classification
based on v$sysstat.

SQL> select &T1_CPU_all_s-&T0_CPU_all_s Tot_CPU_s,


2 &T1_CPU_parse_s-&T0_CPU_parse_s Parse_CPU_s,
3 &T1_CPU_recur_s-&T0_CPU_recur_s Recur_CPU_s
4 from dual;
old 1: select &T1_CPU_all_s-&T0_CPU_all_s Tot_CPU_s,
new 1: select 744.38- 734.37 Tot_CPU_s,
old 2: &T1_CPU_parse_s-&T0_CPU_parse_s Parse_CPU_s,
new 2: 115.43- 115.42 Parse_CPU_s,
old 3: &T1_CPU_recur_s-&T0_CPU_recur_s Recur_CPU_s,
new 3: 122.79- 122.06 Recur_CPU_s

TOT_CPU_S PARSE_CPU_S RECUR_CPU_S


---------- ----------- -----------
10.01 .01 .73

1 row selected.

Figure 5-4. This code snippet takes the raw CPU v$sysstat values as gathered in Figure
5-3 and produces the service time classification breakdown.

Writing the Figure 5-3 and Figure 5-4 CPU times in a tabular format, we get the
following:

Total CPU 10.01s (stat: CPU used by this session)

Recursive CPU 0.73s (stat: recursive cpu usage)


Parse CPU 0.01s (stat: parse time cpu)

Figures 5-3 and 5-4 report that on this Oracle system, over a 120-second interval, the
Oracle instance consumed 10.01 seconds of CPU. This is what Oracle consumed, what
Oracle’s CPU requirements were, and if the same workload continues, what Oracle will
consume over the next 120-second interval.
If this database server contained a single CPU core, then over the 120-second interval,
the database server’s CPU subsystem could supply a maximum of 120 seconds of CPU power
to operating system processes. Said another way, this is the database server CPU capacity
over a 120-second interval. Since Oracle consumed 10.01 seconds and the database server’s
capacity is 120 seconds, Oracle consumed 8.34% (10.01/120) of the available CPU. Said
another way, Oracle is utilizing around 8% of the server’s CPU. If this database server
contains only this Oracle instance, then it is likely the operating system CPU utilization is
around 12% to 18% (add between 5% to 10% for operating system overhead). This tells us
that unless there are other processes consuming CPU from the database server, it is highly
unlikely there is a CPU bottleneck.
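If you would rather let Oracle do this arithmetic, here is a minimal sketch that reuses the T0/T1 substitution variables from Figure 5-3 and a 120-second interval. It assumes your release and platform expose the NUM_CPU_CORES statistic in v$osstat (some show only NUM_CPUS); on the single-core example above, it would return 8.34.

SQL> select round( 100 * (&T1_CPU_all_s - &T0_CPU_all_s)
  2               / (120 * (select value from v$osstat
  3                         where stat_name = 'NUM_CPU_CORES'))
  4             , 2) ora_cpu_util_pct
  5  from dual;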
In Figure 5-4, also notice that the CPU consumption is not parse time or recursive SQL
time-intensive. In fact, both take less than 1% of the total CPU consumed. As I’ll discuss in
detail in the following chapters, when parsing or recursive SQL time starts increasing, shared
pool-related latching begins to raise its ugly head.3
3
Some of you reading this may feel I’m getting uncomfortably close to performance ratio analysis. I
understand your concern, but considering the resulting ratios are only one small part of a solid ORTA, a complete 3-
circle analysis, and no action will be taken unless your ORTA directs it, I’m not concerned, and I hope you’re not
either. Don’t, as they say, throw the baby out with the bathwater. Percentages are a fantastic way for us to relate raw

Using the System Time Model


While the v$sesstat and v$sysstat instance statistics view CPU consumption data
collection strategy provides useful information, it does have several flaws. As you learned in
the previous section, statistics may not be updated until a database call completes; when a
session disconnects, session-level details are lost; and background process statistics are not
guaranteed to be collected. With the introduction of the system time model starting in Oracle
Database 10g, the updates and background processes issues have been resolved, and the
session disconnects problem is solved through the active session history capability discussed in
the final section of this chapter.

Time Model Superiority


Essentially, Oracle has extended the instrumentation approach into service time—that is, CPU
consumption. While the time model is not perfect, and there are still only two service time
classifications, it’s a significant leap forward and sets the stage for some truly amazing self-
aware, self-tuning, and self-adjusting features. So this is a very big deal, and it works
wonderfully.
In keeping with Oracle naming conventions, there are two time model views:
v$sess_time_model and v$sys_time_model. Two key CPU statistics are used to
classify CPU service time:
• DB CPU, which contains only server process CPU consumption
• background cpu time, which contains Oracle background CPU consumption
As I’ll demonstrate, an active Oracle process updates the time model performance views
about every 6 seconds. All time is in microseconds, so when converting to seconds, be sure
to divide by 1,000,000.
If you operating system trace an active Oracle process, you will notice not only a large
number of gettimeofday calls, but also some getrusage calls. As you learned in
Chapter 2, Oracle uses the gettimeofday call to determine system call time, with the
results fed into Oracle’s wait interface. The getrusage system call is used for
complementary service time collection. The getrusage system call allows a process to
gather its own CPU, memory, IO, and other consumption information. Oracle embeds this call
in both its server processes and background processes. Oracle can use this call to determine
Oracle session-level resource consumption at a very granular level. As you may have guessed,
this information is also available from the /proc virtual file system (which exists on all
Linux/Unix systems I’ve encountered).
One of the limitations of the instance statistics view (for example, v$sysstat)
approach is the statistics may be updated only when a call completes. And just as bad,
background processes may not have their CPU time recorded (as demonstrated shortly in
Figure 5-6). The time system model conquers this problem. To illustrate this, Figure 5-5 is the
result of a 5-minute operating system trace of the database writer. Doing some simple math,
since the duration is 300 seconds (5 × 60) and the database writer made 49 getrusage
calls, we know that Oracle at least has the capability to update the v$sess_time_model
(which feeds into the v$sys_time_model) about once every 6 seconds.


[oracle@localhost ~]$ ps -eaf|grep dbw


oracle 2918 1 0 Jan22 ? 00:01:01 ora_dbw0_prod3
oracle 18897 18861 0 16:35 pts/0 00:00:00 grep dbw

[oracle@localhost ~]$ strace -cp 2918


Process 2918 attached - interrupt to quit

[oracle@localhost ~]$ kill -2 2918


Process 2918 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000086 1 104 read
0.00 0.000000 0 104 open
0.00 0.000000 0 104 close
0.00 0.000000 0 104 kill
0.00 0.000000 0 312 times
0.00 0.000000 0 49 getrusage
0.00 0.000000 0 1557 gettimeofday
0.00 0.000000 0 6 pwrite64
0.00 0.000000 0 99 99 semtimedop
------ ----------- ----------- --------- --------- ----------------
100.00 0.000086 2439 99 total

Figure 5-5. Oracle processes call the getrusage system call to determine their resource
consumption details. This DBWR operating system trace duration was 300 seconds, which
means the database writer background process called getrusage about once every 6
seconds. By the way, notice the DBWR does make some read calls.

Oracle does, in fact, update the v$sess_time_model view with the database writer
resource consumption about every 6 seconds. I tested this by simply repeatedly running the
first query shown in Figure 5-6. The background cpu time statistic stayed constant for
6 seconds, and then I saw the value incremented. Notice the database writer CPU time is
classified as background cpu time, not DB CPU time. Oracle server process time is
placed into the DB CPU bucket.
Also notice the statistic value in the second query in Figure 5-6. This is the CPU time
consumed (instance statistic CPU used by this session) by the database writer since
the instance has started. But it shows zero seconds of CPU consumed! This demonstrates that
for the database writer, v$sesstat does not contain background process CPU consumption.
Therefore, if you want to classify background process CPU time, you must either gather from
the /proc virtual file system or simply query from v$sess_time_model or
v$sys_time_model.
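Here is a minimal sketch (column names per the standard Oracle Database 10g views) that pulls every background process's CPU consumption straight from the session time model:

SQL> select s.sid, s.program, m.value/1000000 bg_cpu_s
  2  from   v$session s, v$sess_time_model m
  3  where  s.sid = m.sid
  4  and    s.type = 'BACKGROUND'
  5  and    m.stat_name = 'background cpu time'
  6  order  by bg_cpu_s desc;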
Let’s shift from background processes to Oracle server processes. Just as with tracing
the database writer process, an active server process will make a getrusage call an average
of once every 6 seconds. I keep saying active because if a process is waiting for an event to
complete (for example, a multiblock read request or an enqueue), the getrusage call
obviously cannot be made. When this delay occurs, the time model details are not updated.
For example, if a session is waiting 20 seconds for a lock, its time model details will not be
updated until after the lock is acquired. Repeated tests demonstrate the delay is sometimes
longer, possibly due to the fact that the next getrusage call may not occur immediately
after the lock is acquired.

SQL> select to_char(sysdate,'HH24:MI:SS'), stat_name, value


from v$sess_time_model
where sid=223 and stat_name in ('DB CPU','background cpu time');

TO_CHAR( STAT_NAME VALUE


-------- -------------------- ----------
20:02:40 DB CPU 0
20:02:40 background cpu time 170452204

2 rows selected.

SQL> select to_char(sysdate,'HH24:MI:SS'), value from v$sesstat


where sid=223 and statistic#=12;

TO_CHAR( VALUE
-------- ----------
20:02:47 0

1 row selected.

Figure 5-6. This is an example of gathering CPU consumption for the database writer
background process. Notice v$sess_time_model contains a value greater than zero,
where v$sesstat always will show zero. It appears Oracle is not including the database
writer’s CPU consumption in the instance CPU consumption statistic. Also, by repeatedly
querying v$sess_time_model, the value was observed to be updated about every 6
seconds.

Figure 5-7 contains the same SQL as in Figure 5-6, but Figure 5-7 is reporting on an
active server process, whereas Figure 5-6 is looking at the database writer background
process. As we would expect, the time model CPU consumption is recorded in the DB CPU
statistic as 430,836 microseconds, which is 0.43 second. The v$sesstat CPU consumption
shows the server process has consumed 39 cs, which is 0.39 second. This time difference should
not surprise you. If you take any Statspack or AWR report and compare these statistics, you
will notice there is always a difference—but probably not enough to throw off the
v$sysstat-based service time analysis. Since the time model is based on increased
instrumentation granularity, if possible, always use the time model views.

SQL> select to_char(sysdate,'HH24:MI:SS'), stat_name, value


from v$sess_time_model
where sid=208 and stat_name in ('DB CPU','background cpu time');

TO_CHAR( STAT_NAME VALUE


-------- -------------------- ----------
22:02:26 DB CPU 430836
22:02:26 background cpu time 0

2 rows selected.

SQL> select to_char(sysdate,'HH24:MI:SS'), value


from v$sesstat where sid=208 and statistic#=12;

TO_CHAR( VALUE
-------- ----------
22:02:31 39

1 row selected.

Figure 5-7. This is a sample CPU consumption query for an Oracle server process. Notice
the v$sess_time_model CPU consumption is properly allocated in the DB CPU bucket.
Also notice the CPU times are slightly different. The time model view shows CPU
consumption at 0.43 second, whereas the v$sesstat view shows it to be 0.39 second.

Time Model Time Classification


The time model views clearly differentiate background and server process time, plus we still
can gather the parse time and recursive time values from v$sysstat. To save paper, I did
not paste in a sample script. However, the script is very similar to the one shown in Figures 5-
3 and 5-4. It’s just a little more complicated because two views are used (v$sysstat and
v$sys_time_model) and we have one additional class for the background process CPU
consumption.
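For reference, a minimal sketch of the time model portion might look like the following; the parse and recursive CPU values would be gathered from v$sysstat exactly as in Figure 5-3. Remember the time model reports microseconds, hence the division by 1,000,000.

SQL> col T0_db_cpu_s new_val T0_db_cpu_s
SQL> col T0_bg_cpu_s new_val T0_bg_cpu_s
SQL> select value/1000000 T0_db_cpu_s from v$sys_time_model
  2  where stat_name='DB CPU';
SQL> select value/1000000 T0_bg_cpu_s from v$sys_time_model
  2  where stat_name='background cpu time';
SQL> exec dbms_lock.sleep(120);
SQL> col T1_db_cpu_s new_val T1_db_cpu_s
SQL> col T1_bg_cpu_s new_val T1_bg_cpu_s
SQL> select value/1000000 T1_db_cpu_s from v$sys_time_model
  2  where stat_name='DB CPU';
SQL> select value/1000000 T1_bg_cpu_s from v$sys_time_model
  2  where stat_name='background cpu time';
SQL> select &T1_db_cpu_s-&T0_db_cpu_s db_cpu_s,
  2         &T1_bg_cpu_s-&T0_bg_cpu_s bg_cpu_s
  3  from dual;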
Let’s classify the service time based on the Oracle Database 10g Statspack report shown
in Figures 5-8 and 5-9. The Statspack report (as does an AWR report) converts the Figure 5-8
time system model statistics into seconds, but the Figure 5-9 instance statistics need to be
converted appropriately. For CPU-related time, the instance statistics, which are based on
v$sysstat, are shown just as if they were queried directly from v$sysstat, which is in
centiseconds. Based on the system time model statistics shown in Figure 5-8, the total CPU
consumed by the Oracle instance during the 26-hour Statspack interval (I realize this is an
unusually long report interval, but this is what one of my students sent me to help diagnose
his system) is 48,126.3 seconds, which consists of DB CPU (Oracle server processes only)
time of 45,551.6 seconds and background cpu time of 2,574.7 seconds. Based on the
Statspack’s instance activity report, the parse time CPU consumption (not the elapsed time) is
187.69 seconds (18,769 cs) and the recursive CPU consumption is 6,425.75 seconds (642,575
cs).

Here is how it looks in table format:

Total CPU 48,126s (also known as service time)


Server Process CPU 45,552s (DB CPU)
Background CPU 2,575s (background cpu time)

Recursive CPU 6,426s (recursive cpu usage)


Parse CPU 188s (parse time cpu)

Notice that the instance CPU consumption figure based solely on v$sysstat is
33,020 seconds (statistic CPU used by this session), whereas based on the time
system model, the CPU consumption is 48,126 seconds. Even with subtracting the time
model’s background CPU figure, the v$sysstat value is off by around 30%. This is a good
example of why you want to use the time system model whenever possible.
The Oracle instance statistics shown in Figures 5-8 and 5-9 were gathered on a four-
CPU core server over a large interval of 1,560.15 minutes. This means during this interval, the
database server had the capacity to provide up to 374,436 seconds (4 cores × 1560.15 minutes
× 60 s/1 m) of CPU power. Since Oracle consumed 48,126 seconds of CPU, Oracle used
12.9% of the available CPU. If this instance were the only instance on the database server and
no other processes were consuming CPU, adding 10% operating system overhead, the
database server would probably be around 23% busy. At 23% utilization, the queue time is
not significant, and the average run queue will surely be less than the number of CPU cores.
Since there are four CPU cores, we can surmise the average run queue is between zero and
four. (If this is a surprise, please review the previous chapter, which covers the operating
system.)

Time Model System Stats DB/Inst: ABCCAT2/boxu07xy Snaps: 2361-2581


-> Ordered by % of DB time desc, Statistic name

Statistic Time (s) % of DB time


----------------------------------- -------------------- ------------
sql execute elapsed time 407,790.5 45.4
DB CPU 45,551.6 5.1
PL/SQL execution elapsed time 8,044.5 .9
parse time elapsed 2,404.2 .3
hard parse elapsed time 2,162.3 .2
sequence load elapsed time 426.9 .0
PL/SQL compilation elapsed time 207.2 .0
RMAN cpu time (backup/restore) 110.2 .0
repeated bind elapsed time 56.7 .0
connection management call elapsed 56.4 .0
hard parse (sharing criteria) elaps 25.9 .0
hard parse (bind mismatch) elapsed 15.4 .0
inbound PL/SQL rpc elapsed time 9.0 .0
failed parse elapsed time 5.9 .0
DB time 898,395.2
background elapsed time 92,025.5
background cpu time 2,574.7

Figure 5-8. This time model statistics snippet is from a 26-hour interval Oracle Database 10g
Statspack report. During the Statspack report interval, Oracle server processes consumed
45,552 seconds of CPU and the background processes consumed 2,575 seconds of CPU.

Instance Activity Stats DB/Inst: ABCCAT2/boxu07xy Snaps: 2361-2581

Statistic Total per Second per Trans


--------------------------------- ------------ -------------- ------------
CPU used by this session 3,301,964 35.3 0.8
CPU used when call started 1,865,425 19.9 0.5
...
parse count (failures) 428 0.0 0.0
parse count (hard) 34,033 0.4 0.0
parse count (total) 4,171,968 44.6 1.0
parse time cpu 18,769 0.2 0.0
parse time elapsed 86,096 0.9 0.0
...
process last non-idle time 93,606 1.0 0.0
recursive aborts on index block r 6 0.0 0.0
recursive calls 41,491,231 443.2 10.1
recursive cpu usage 642,575 6.9 0.2
redo blocks written 104,090,180 1,112.0 25.3
...

Figure 5-9. This is the instance statistics (v$sysstat based) snippet from the Oracle
Database 10g-based Statspack report shown in Figure 5-8. During the Statspack report
interval, v$sysstat reports Oracle processes consumed 33,020 seconds of CPU, parsing
consumed 188 seconds, and recursive SQL consumed 6,426 seconds.

The Ghost IO Bottleneck


When confronted with a wait event report like the one shown in Figure 5-10, most Oracle
DBAs will scream there is a blatant IO bottleneck. They will also be surprised when the IO
subsystem team laughs in their face and tells them to go away. So let’s look a little closer. The
top wait event, db file scattered read, clearly indicates Oracle processes are
waiting for multiple block read requests to complete. Clearly, a process needed Oracle blocks,
which at the time they were requested did not reside in the buffer cache. As a result, the
Oracle process needed to make a read call to the operating system. However, the operating
system was able to provide the blocks to Oracle in less than a single millisecond. And this
didn’t occur just once or twice. Based on the report shown in Figure 5-10 (which I did not
alter), this was the average situation for around 50,000 multiblock IO calls!

SQL>@swpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER twice

Database: prod3 06-MAY-10 07:16pm


Report: swpctx.sql OSM by OraPub, Inc. Page 1
System Event CHANGE (blue line) Activity By PERCENT

WT Time % Time Avg Time Wait


Wait Event (sec) Waited Waited (ms) Count(k)
----------------------------------- -------- ------- ----------- --------
db file scattered read 30.830 67.24 0.6 50
latch: cache buffers chains 7.240 15.79 43.4 0
read by other session 4.180 9.12 149.3 0
log file sync 3.050 6.65 63.5 0
log file parallel write 0.260 0.57 4.3 0
control file parallel write 0.120 0.26 6.3 0
latch: cache buffers lru chain 0.120 0.26 5.5 0
db file sequential read 0.030 0.07 0.0 1
latch: library cache 0.020 0.04 10.0 0

Figure 5-10. This classic OSM time interval wait event report shows the top wait event db
file scattered read having an average wait time of 0.6 ms. Does this represent an
IO bottleneck? Not unless you think a multiblock read request taking 0.6 ms is a problem.

So is there any IO subsystem bottleneck? Clearly, the IO subsystem capacity has
exceeded Oracle's IO read requirements, which means there is not an IO bottleneck. But this
in no way implies there is not a problem. If users are unsatisfied with application
performance, you will probably be able to discover Oracle-focused, operating system-focused,
and application-focused solutions, such as the following:

• Oracle focused: If the desired block had been found in the Oracle buffer cache,
Oracle would not have asked the operating system for the block. Therefore, a larger
Oracle buffer cache would reduce the likelihood of a db file scattered
read. Of course, there needs to be both memory available and CPU available to
manage the increased buffer cache. Other possibilities are using the keep or recycle
pools, or other Oracle instance-focused solutions to increase the likelihood of the
block being found in Oracle’s buffer cache.
• Operating system-focused: From an operating system perspective, there is a high
likelihood the CPU subsystem is the bottleneck. This is very common when the top
event is IO read-related yet the operating system satisfies the request in only a couple
of milliseconds. If the IO subsystem is part of the database server (think small Linux
server), this is likely. For large systems with a separate IO subsystem, this is less
likely. If there is a CPU bottleneck, look for ways to increase the CPU capacity.
• Application-focused: From an application perspective, there absolutely must be
SQL asking for blocks that do not reside in Oracle’s buffer cache. If the blocks were
in the cache, the server process would not have issued a multiblock read request
resulting in the db file scattered read wait event. To find the responsible
SQL, look for the SQL with the most block reads (sometimes called block gets or
what I call physical IO); a query sketch follows this list. The SQL must be there, and
you will see it. To reduce the physical IO pressure the application is placing on
Oracle, which then impacts the operating system, tune the SQL or reduce its execution rate.
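Here is a minimal sketch of such a query. It uses the standard v$sql columns; keep in mind disk_reads is cumulative since the cursor was loaded, not interval-based, so interpret it alongside your interval reports.

SQL> select * from (
  2    select sql_id, disk_reads, executions,
  3           substr(sql_text,1,40) sql_text
  4    from   v$sql
  5    order  by disk_reads desc )
  6  where rownum <= 5;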
This situation is very common; if you have not encountered it before, you will. The key
to diagnosing this situation (or any for that matter) is to not remove initially uncomfortable
data from your analysis. Either the data is incorrect (which is something to consider) or your
analysis is not yet complete.

More Than Just an Average


Up to this point, all wait time discussed has been the average wait time. Just as with weather
temperatures, you should expect values both above and below the average. While the average
conveniently simplifies potentially thousands of values into a single value, we do lose some
information through this statistical simplification. Starting with Oracle Database 10g, the view
v$event_histogram provides a more detailed look into the actual wait times, providing
more than just the simple statistic of average.
Figure 5-11 is a good example of how a histogram perspective adds important additional
information to your analysis. While not shown, a standard wait event report showed the top
wait event to be db file scattered read, with an average wait time of 14.9 ms.
While the average multiblock read request is greater than my rule-of-thumb of 10 ms, some
IO administrators will argue this is not significant enough to motivate change. But what most
people miss is that over 25% of the multiblock read requests took more than 16 ms, and 18%
of the requests took in excess of 32 ms! That is shocking and makes the argument to reduce
IO response time stronger and more urgent.
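The OSM report shown in Figure 5-11 is interval-based, but even a simple query against v$event_histogram (whose counts are cumulative since instance startup) tells a similar story. Here is a minimal sketch using the view's standard columns:

SQL> select wait_time_milli, wait_count,
  2         round(100*ratio_to_report(wait_count) over (),2) pct
  3  from   v$event_histogram
  4  where  event = 'db file scattered read'
  5  order  by wait_time_milli;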

SQL> @swhistx.sql db%scat%


Remember: This report must be run twice so both
the initial and final values are available.

Database: prod3 17-MAY-10 05:36pm


Report: swhistx.sql OSM by OraPub, Inc. Page 1
Wait Event Activity Delta Histogram (event:db%scat%)

Running
Wait Event ms Wait <= Delta Occurs Occurs %
-------------------------------- ---------- ------------ --------
db file scattered read 1 42502 36.02
db file scattered read 2 4686 40.00
db file scattered read 4 13942 51.81
db file scattered read 8 18499 67.49
db file scattered read 16 6730 73.20
db file scattered read 32 11475 82.92
db file scattered read 64 12054 93.14
db file scattered read 128 5889 98.13
db file scattered read 256 2134 99.94
db file scattered read 512 83 100.00
db file scattered read 1024 4 100.00

Figure 5-11. Based on the v$event_histogram view and during the report interval,
while the average wait time is 14.9 ms (not shown), it can be seen that about 25% of the
scattered read waits took more than 16 ms. The far-right column is a running total.

Unfortunately, the histogram granularity is limited. The millisecond increment shown in
Figure 5-11 is the most detailed time breakdown we can get. This is particularly vexing,
because most of the really interesting Oracle wait activity happens around 10 ms. When
working in the millisecond neighborhood, a jump from 8 ms to 16 ms is massive, so we lose
some valuable diagnostic information. However, the information available is helpful.
One of the more intriguing v$event_histogram applications is gaining an
undocumented understanding of Oracle's inner workings. For example, the free buffer
wait event (which will be discussed in the next chapter) is documented to cause a 10 ms
wait, period. However, if the event histogram view is to be trusted, then why have DBAs seen
15% of the wait times greater than 16 ms? (I have seen this many times.) This indicates there
are other things occurring inside Oracle, which we as DBAs are not aware of, forcing us to
look a little deeper into Oracle’s inner workings.4

4
If you really want to learn about Oracle internals, especially the internals that practical performance analysts
don’t care about, then become an Oracle kernel developer. But if you want to learn about internals that actually affect
a DBA's work, you may be surprised to find a lot of useful information by reading Oracle patents. Yes, that is correct;
patents. Now, I realize this is a little strange. But that’s how I find out about how Oracle works without getting my
hands on illegal proprietary information. Go to http://patft.uspto.gov/netahtml/PTO/search-bool.html and do a search
with the assignee as Oracle. That will get you started! Patent information is publicly available and free. Keep in
mind that just because Oracle has a patent does not mean it must implement the invention in its kernel code.

Wait Event Myths


It was a long and painful struggle. When the wait interface was first available, no one knew
about it, so convincing people it was valuable was in direct opposition to the established
Oracle performance analysis community. Historically, going against a scientific establishment
can mean you are publicly discredited and potentially searching for a new job. But after many
battles and unfortunate personal attacks, the wait interface prevailed. However, the wait event
community has kept pushing so hard that, in some cases, they have misrepresented the true
performance situation, bringing into question some performance analyses based on wait event
information. This section highlights three of the most common wait event myths. This is
important, because as performance firefighters, we need to be aware of the wait interface
limitations so we don’t misdiagnose.

Decreasing Wait Time Always Improves Performance


Decreasing wait time usually does improve performance, but not always. First, a user does not
feel only wait time, but wait time plus service time. Second, through cyclical performance
optimization cycles, our goal is to reduce response time—again, not wait time or service time,
but their combination.
I find the mental snare we fall prey to is due to our years of running experiments and
tests. When running an experiment, we try to change a single parameter so we can observe the
impact of changing that specific parameter. So if we look at the simple response-time
equation of response time equals service time plus queue time, we think if we reduce queue
time, then obviously response time will also decrease. But the hidden myth and assumption is
that when queue time decreases, the service time will remain the same. This is completely
false. Oracle is not so polite. If we alter the performance situation by focusing on changing
the queue time, we may inadvertently also cause a service time change. This is why when
basing an Oracle performance analysis (even when using the OraPub 3-circle methodology)
on a wait event report, the resulting diagnosis can be incorrect and misguided.
A very simple and straightforward illustration of this relates to latch contention. As
discussed in Chapter 3, the latch wait event is process latch sleep time, and spinning on a latch
is actually service time (and will be recorded as such). So it is entirely conceivable and
demonstrable that a reduction in wait time will result in a larger increase in spin time.
Although the wait time truly can be reduced, the more important response time can actually
be increased!
Let’s use Figure 5-12 as an example. Figure 5-12 is OraPub’s interactive instance-level
ORTA report. In summary, it gathers initial statistics, sleeps for the given interval (in this case
120 seconds), wakes and gathers statistics again, and reports their differences. Basic stuff, I
know, but this report shows both service time and wait time, and sample active SQL during
the report interval (not shown). Given the same workload (which is not shown on this earlier
version of rtsysx.sql), our objective is to reduce the response time (shown as 305
seconds)—not just reduce the service time (shown as 119 seconds) or the wait time (shown as
186 seconds).

SQL> @rtsysx 120 10

*** Response Time System Summary (delta - interactive - instance level)

CPU Tot Wait IO Wait Other Wait


Response Time Ora CPU Time Time Time % %
Time(sec) (sec) Util % (sec) (sec) (sec) IO Wait Other Wait
---------- ------- ------- -------- -------- ---------- ------- ----------
305 119 98.9 186 17 169 9 91

*** Response Time I/O Summary w/Event Details

IO Wait IO WRITE IO READ


Time Wait Time Wait Time % IO % IO
(sec) (sec) (sec) Write Read
-------- --------- ---------- ----- ----
17 16 1 95 5
Tot Wait Avg Wait
IO Wait Event R,W Time (sec) % Time (ms)
----------------------------------------- --- ---------- ----- ----------
log file sync W 13 77 29.9
control file parallel write W 2 10 4.2

*** Response Time Other Waits (non-I/O) Event Detail


(delta - interactive - system level)

Tot Wait
Time Avg Wait
Non IO (other) Wait Event (sec) % Time (ms)
--------------------------------------------- -------- ----- ----------
latch: cache buffers chains 168 100 0.9

Figure 5-12. This OraPub 120-second interval response time report clearly shows Oracle
processes are suffering from intense cache buffer chain latch contention. A response time-
focused analysis will work on reducing the response time, not only the wait time or the service
time.

Most Oracle DBAs will look at the 168 seconds of cache buffer chains time and focus
on reducing the wait time, without regard to the effect on service time or the Oracle workload
(not shown). Suppose they decide to increase the spin count. While this may indeed reduce
latch sleep time, it will very likely also increase CPU consumption! If the DBA is lucky, this
may result in a net response time decrease. While we appreciate luck and take it when it
comes, that’s no way to optimize systems or build a career. The solution here is to focus on
reducing response time, because that is closer to what the user experiences.

Decreasing Wait Time Decreases End-to-End Response Time


As Oracle systems become increasingly complex, the time attributed to the Oracle database
server is becoming less significant. Said another way, Oracle’s time contribution toward what
a user experiences is decreasing. While that may sound fine, this also means Oracle’s time
contribution is becoming more and more insignificant. This makes our job more difficult and
our impact less significant, because our optimization efforts will have less and less effect on
the total user experience. (This is also not great for job security!)

End-to-End Response Time Defined


First, let’s be crystal clear about what end-to-end response time means. It is what the end user
personally experiences. If the end user held a stopwatch and timed, for example, a query, that
is end-to-end response time. It’s not what Oracle’s performance views can possibly show us.
It’s not what our ORTA shows us (sorry to disappoint you). It’s not what an Oracle wait-event
analysis shows us. It’s not what the network team shows us. And it’s not exactly what the bot
on the user’s PC tells us. It is only what the user experiences. Whenever you hear people
mention end-to-end response time, be sure you understand their definition, because it may not
be the same as your definition.
An Oracle process time contribution increase does increase end-to-end response time,
but understanding the details is becoming more and more complicated. Here are just a few of
the time consumers between the database and the end user: cloud computing, security checks,
application servers, web servers, web services, and load-balancing algorithms.
In the not-so-distant past, everything except the users and their terminals resided on the
database server. Now, as shown in Figure 5-13, different architectural components are spread
out (potentially) in different parts of the world. The key to diagnosing a performance problem is
understanding where the data flow is being blocked or taking a relatively long time to
respond. When both the Oracle client and server processes resided on the database server, the
problem nearly always resided on the database server. Now, with the various architectural
components dispersed, the problem can be spread all over the world.

Figure 5-13. A simplified Oracle architecture consisting of key timing components: the end
user, the user’s web browser, Oracle’s client process, and the Oracle server process residing
on the database server. For a strong diagnosis, we need time consumed for each component
and between each component.

Some End-to-End Response Time Realities


Let’s inject some time into Figure 5-13. Based on the following numbers, the true end-to-end
response time is 50 ms, which is not unrealistic by today’s web-based application standards.

Oracle server process time                        10 ms    20%
Network time: Oracle server and client process     5 ms    10%
Oracle client process time                        20 ms    40%
Network time: Oracle client process and browser   15 ms    30%
Time between web browser and actual user           0 ms     0%


Let’s say you’re an Oracle tuning hotshot. You perform a stellar OraPub 3-circle
response time-based performance analysis, and reduce the Oracle server process time by 80%,
from 10 ms to 2 ms. That would be an amazing feat to be sure. But think about the impact
from the end user’s perspective. What used to take 50 ms, now takes 42 ms. So the user
experiences a 16% improvement. When the users are asked how the system is working now,
they might say it feels a little faster. So, doing a fantastic database-centric tuning job did not
make enough of a difference to satisfy the end user. But the situation can become even more
bleak.
Suppose the end-to-end response time is 115 ms, using the architecture shown in Figure
5-13, and the timing breakdown as follows:

Oracle server process time                         7 ms     6%
Network time: Oracle server and client process     2 ms     2%
Oracle client process time                         5 ms     4%
Network time: Oracle client process and browser  101 ms    88%
Time between web browser and actual user           0 ms     0%

Let’s say you have a true gift from the Oracle performance gods and reduce Oracle server
time by 86%, from 7 ms down to a mind-blowing 1 ms. Even so, the users’ experience will
improve only from 115 ms down to 109 ms. When they don’t feel the 5% end-to-end response
time improvement, your achievement will not get the appreciation you think it deserves. The
point is that as DBAs, our impact, while perhaps technically miraculous, may have little effect
on the end-user experience.
To make matters even worse, because historically the database was the problem, end
users are trained to expect it’s an “Oracle problem,” putting anyone remotely connected with
Oracle on the defensive. It is not unusual for the DBA to have to reconstruct the end-to-end
response time profile before the group that can make a real performance difference will get involved.
Another tricky and rarely discussed reality is that whether through tracing Oracle processes
or through Oracle’s performance views, it is impossible to determine end-to-end response time.
Let me say that again—impossible. Don’t be fooled by fancy vendors or the promises of
tracing and session profiling. Tracing and Oracle’s performance views reach from the
database server out to as far as the Oracle client process, and then stop. Beyond the Oracle
client process, DBAs must use other tools to understand the remaining user experience time.
But it gets even more depressing. By looking at all sessions—that is, from a system-level
response-time analysis perspective—the best the performance views can do is tell us about the
timing related to the server process. We lose the ability to ascertain the time between the
Oracle server process and its associated client process(es).
It’s time to introduce the wait event, SQL*Net message from client.

The SQL*Net Message from Client Wait Event


SQL*Net message from client (SNMFC) is a fascinating and misunderstood wait
event. Simply stated, Oracle server processes post this event when they are waiting for a
message from their client process so they can do some work. Oracle server processes are
either consuming CPU or waiting for something. When a server process has nothing to do, it
posts the SNMFC wait event.
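You can observe this yourself at any moment. Here is a hedged sketch based on the standard v$session_wait view; it lists server processes that are, right now, simply waiting for their client process to send them something to do.

-- Hedged sketch: sessions currently posting SNMFC, that is, server
-- processes idle and waiting on a message from their client process.
select sid, seconds_in_wait
  from v$session_wait
 where event = 'SQL*Net message from client'
 order by seconds_in_wait desc;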
This wait event can be posted because a user is taking a coffee break, resulting in both
an idle client and server process. It could also be posted because the user is thinking about
what to do next, again causing both the client and server process to be idle. A server process
could also be posting this event because a network problem is preventing it from receiving a
message the client process has already sent. And finally, the server process could post the
event because the Oracle client process is doing some advanced processing that is taking a
while to complete. In each of these cases, the SNMFC wait event is posted by the server
process. So our challenge is to understand if we care!
When looking at all sessions together as a single unit of work, known as profiling the
system, while some SNMFC wait events may be the result of a network or client processing
issue, the vast majority of the SNMFC postings will be from idle client and server
processes patiently waiting for something to do. This is why most wait event reports filter out
this wait event. In fact, because most Oracle sessions spend more time waiting than doing
work, if the SNMFC wait time were included in the typical wait event report, it would
effectively dwarf all the other wait time we dearly care about and desperately need for our
analysis.
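If you want to see this dwarfing effect on your own system, a quick sketch like the following compares cumulative SNMFC time with all other non-idle wait time. This is only an illustration, not an OraPub report script; v$system_event stores time_waited in centiseconds, converted here to seconds.

-- Hedged sketch: cumulative SNMFC time versus all non-idle wait time.
select 'SQL*Net message from client' category, time_waited/100 seconds
  from v$system_event
 where event = 'SQL*Net message from client'
union all
select 'all non-idle wait events', sum(time_waited)/100
  from v$system_event
 where wait_class != 'Idle';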
However, there are times when SNMFC is immensely useful, and every Oracle
performance firefighter needs to understand this. Suppose we are watching both the end user
and the corresponding server process very closely. So closely, in fact, that we discover that
both the user and the Oracle server processes are waiting. The user is waiting for a query to
complete, and the Oracle server process is patiently waiting for a message from its client
process. This is a strange situation indeed and indicates there is a problem between (but not
including) the database server process and the end user. If this is confusing, read this again
and refer to Figure 5-13.
If the problem were centered at the database server, the server process would be either
consuming CPU or posting a wait event related to latching, IO, or perhaps an enqueue. But
since the server process is posting an SNMFC, we know the problem is beyond the database
server. If the problem were with the end user, the user would not be staring idly at the screen.
A story about one of my consulting engagements may clarify this concept. The
architecture was classic client/server, with the server process on the database server, the client
process on the user’s PC, and the network allowing the two processes to communicate. The
users told me when they executed a particular application function, it was taking around 30
seconds to complete. The DBAs and the operating system team noticed the database server
was essentially idle, and expected the problem to be focused squarely on the vendor
application. And as you would expect, the vendor proclaimed the database server was
undersized. I told the DBAs that we needed to profile a session while running the key
application function that was causing so much pain. Using only Oracle’s performance views
(that is, no tracing was involved), repeated tests resulted in this situation:

End user response time            19 s
Oracle CPU time                    1 s
Network and client process time   12 s
Database server IO time            3 s
Unaccounted for time               3 s

First, there was no denying the users were waiting and staring at their screen for 19
seconds. I was standing there watching them! The Oracle CPU time was gathered from
v$sesstat, the network and client time was wait event SNMFC time, database server IO
was all IO-related wait event time, and any leftover time was placed into a category I called
unaccounted for time. No one disputed the numbers, as they could be easily demonstrated and
were based on the rtsess.sql OraPub script.5 Obviously, the majority of time was spent
either by the Oracle client process or network activity between the client process and the
server process. We did a few quick tnsping executions, and as everyone expected,
communication to the database server on the floor below us took around 1 ms. This meant
nearly all of the 12 seconds was consumed by the Oracle client process. It turned out the
application was doing some fairly advanced numerical calculations, which required
significant CPU resources. The vendor could have responded that the user’s PC was
undersized, but the vendor knew the customer’s PCs met the stated sizing recommendations.
This squarely placed any real performance improvement on the vendor. It was one of those
rare (but not impossible) situations where focusing on the other two circles (Oracle and the
operating system) would not materially improve performance.
The unaccounted for time of 3 seconds is important to understand. There are many
reasons for this time. Because the session was not being traced, the data collection was
manually started and stopped. The manual collection can and does produce some error, but
this can be minimized by familiarizing yourself and your user with the script. There are also
issues related to the posting of SNMFC wait time, because v$session_event may be updated
only every 3 seconds or so, or when the wait situation changes. So from a script
development perspective, you need to be aware of this.
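For reference, the heart of such a session-level categorization can be sketched as shown below. This is only a bare-bones illustration, not the rtsess8.sql script itself; the session identifier 123 is a placeholder, and a real script takes interval deltas rather than cumulative values.

-- Hedged sketch of a session-level time profile; sid 123 is a placeholder.
-- Both statistics are cumulative centiseconds, converted to seconds.
select 'CPU' component, s.value/100 seconds
  from v$sesstat s, v$statname n
 where s.statistic# = n.statistic#
   and n.name = 'CPU used by this session'
   and s.sid = 123
union all
select e.event, e.time_waited/100
  from v$session_event e
 where e.sid = 123
 order by 2 desc;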
The final cause for the unaccounted for time is simply because there could be 3 seconds
of time between the Oracle client process and the user’s experience, which is the missing
component of the true end-to-end user time.
In a client/server architecture, there should be no time between the Oracle client process
and the user’s experience. However, with modern web architectures, this is not the case.
Referring to Figure 5-13 again, the time between the Oracle client process and the user’s web
browser would be represented by unaccounted for time. This is your clue that the true
performance issues reside outside Oracle’s realm.
If the Oracle database is deemed guilty until proven innocent, I encourage you to either
acquire or develop your own end-to-end response time monitoring system. This is not
something you can throw together in a few hours. Indeed, there are companies based on
providing infrastructure solutions6 to understand the user’s experience. But it can be done
using creative scripting, information from Oracle’s performance views, data from the
networking group, and bots placed on a few strategically located PCs.

Profiling a Session Is Always the Best Approach


Session profiling has attained an almost fanatical appeal. I spoke about this topic at a massive Oracle
conference a few years ago. Afterward, a person meekly walked up to me. Looking over his
shoulder, he quietly said, “Do you know what you just said? You are saying that profiling is
not the best approach.” He was clearly disturbed. Session profiling had become so personal to
him that it was as if I were questioning his deeply held faith. My response was gentle, and I clearly
restated my key point: profiling a session is not always the best approach. And, in fact, it is
easy to be misled by reading more into the situation than is actually there.

5 This script is available today for free in the OSM tool kit. The version used in this consulting engagement
watched a session based on its session identifier and serial number. The script has since been renamed to
rtsess8.sql, and the current rtsess.sql script is based on the session(s) client identifier. I will talk more
about the client identifier in the DBMS_MONITOR section later in this chapter.
6 If you see the word solution in an IT product, multiply the typical firefighting product cost times a million
dollars. If you see the word infrastructure, do the same thing. If you see both, run far and run fast.


To bring clarity to this sometimes emotional topic, I will first define profiling and then present how to avoid
being misled.

Profiling Defined
Besides all the political DBA industry talk about profiling, software engineers also use the
term when analyzing performance. The term can take many different flavors and is applied to
many disciplines. For example, the term profiling can be used for software analysis that is
event-based, statistics-based, instrumentation-based, and simulation-based. So to limit the
term profiling to one particular method of Oracle performance analysis breaks the actual
definition of the word.
Unfortunately, when many DBAs hear profiling, they think of tracing an Oracle session
and categorizing the time. However, based on the industry-accepted definition of profiling,
analyses based on an Oracle trace, an operating system trace, Oracle’s performance views,
and sampling directly from the SGA are all profiling. So there are many ways to profile an
Oracle system, an individual session, or a group of Oracle sessions.
ORTA, which is also profiling, was initially injected into the Oracle community back in
2001 when I published a paper on the subject and started presenting it at Oracle conferences
and internally within Oracle. I can still remember my presentation on the subject at a
Computer Measurement Group (CMG) conference. It created quite a stir because most of the
attendees had no idea we could gather and classify Oracle time. All this to say, profiling is
indeed useful and practical, but let’s keep the definition as it is defined by the larger IT
community and get on with it.

The Trap
The session profiling or session-level response time analysis7 trap is insidious. It works like
this. First we profile a session and a report is produced. The report classifies time wonderfully
by providing a breakdown of response time and various levels of classification. We may also
see the SQL executed during the profile. Then we make the fatal assumption that the response
time and its components are based entirely on the SQL executed during the profile. Doesn’t
that sound correct? But it is not correct.
The trap begins when we hold the report in our hands and we think we hold both the
cause and effect before us. We start believing that the SQL was run in isolation and is
unaffected by anything and everything else occurring on the system. For this to be true, the
database server would need to be completely dedicated to that specific SQL. All other
database server activity and operating system activity would need to halt while your SQL ran.
Here’s a very simple yet crystal-clear example. When a SQL statement reads 1,000
blocks from disk, is it because the SQL statement is not tuned, the buffer cache is too small
for the table, or other SQL statements replaced cached blocks with other blocks our SQL is
not interested in? The answer is possibly, possibly, and possibly.
To think that a single SQL statement is unaffected by everything else on the system goes
against the very essence of why Oracle is an awesome database and why operating systems
were invented. Both Oracle and operating systems exist, in part, to share scarce resources.
Somehow, the technology is supposed to provide balance and fairness for all the requests for computing resources.

7 OraPub’s Oracle performance view-based session level response time analysis (profiling) report is
rtsess.sql. The instance-level response-time analysis (profiling) report is rtsysx.sql.


But when we look at a profiled session, we start forgetting all of this
and begin to think in isolation.

The Solution
Fortunately, there is a very simple solution. Sure, you should profile the session, as this is
valuable and helps identify where the performance issue resides. But also profile the entire
Oracle system and include an operating system analysis. Essentially, in addition to profiling
the session, perform an OraPub 3-circle analysis and make sure you also perform a system-
level ORTA. This will enable you to understand the relationship that exists between the
profiled session and the rest of the computing environment. It will also allow you to better
understand the impact, both from a session and system-wide perspective, of any changes you
may decide to make.
So simply widening our analysis breadth brings our analysis back in order and actually
strengthens it. And that’s a good place to be.

Modern Architecture Statistics Collection


This section focuses on collecting performance statistics (not optimizer statistics) in modern
Oracle architectures. The difficulty associated with this type of data collection is the result of
the end users being increasingly separated technically and possibly physically from where
their requests are processed. This separation causes significant technical challenges when
profiling a single end user’s activity. To understand the challenges and truly appreciate the
solution, we need to first look at the dramatic changes in Oracle system architectures.

Why We Need a Better Collection Facility


Oracle architectures have changed greatly over the years, making session profiling (Oracle
session-level response-time analysis) nearly impossible without purchasing a product. Keep in
mind that there are always architectural variations, but the core dynamic remains true.
As shown in Figure 5-14, in the 1980s, both Oracle’s SGA and the Oracle server and
client processes resided on the database server. It’s important to understand that because the
Oracle server process works directly on Oracle’s cache (which resides on the database server
in shared memory segments), the server process must also reside on the same machine as the
SGA. So regardless of all the creative and fascinating Oracle architectures we can think of,
each requires the server process and the SGA to be on the same physical machine.8 Profiling a
session based on a Figure 5-14 architecture was very simple, as there was a single server
process for every client process and the client process could be easily associated with the end
user.

8 The only way this can change is if the operating system allows shared memory segments to be modified by a
process on another server (physical or virtual). I suspect this will eventually be overcome.


Figure 5-14. The classic, though no longer commonly used, Oracle architecture with both the
client and server process residing on the database server and communicating via SQL*Net.
The user interface is typically a terminal or a terminal emulator. While network traffic is
minimized, the user experience is bland, and the database server activity can easily reach
capacity.

Figure 5-15 is very similar to Figure 5-14, yet the difference is significant. Figure 5-15
shows Oracle in what is traditionally called a single-task architecture. The server and client
processes are linked into a single large executable. These single-task executables actually
have the letters st appended to the standard names to distinguish them from their two-task
relatives. Operating in a two-task architecture, the client and server process communicate via
SQL*Net. With a single-task architecture, because both the client and server components are
part of the same executable, they communicate using function calls. When heavy
communication is required between the server and client processes, a single-task architecture
could boost performance by 10% to 50%. This made operations like an import much faster.
Profiling a session was very simple, as there was a single client/server process associated with
every end user.


Figure 5-15. Combining an Oracle client and server process into a single executable. This
single-task architecture has the advantage of the Oracle process communicating through
program function calls instead of SQL*Net. Just as with a two-task architecture (Figure 5-
14), the same user experience and database server capacity issues exist.

With its amazing marketing machine, in the early 1990s, Oracle introduced client/server
computing. As shown in Figure 5-16, this allows the server and client processes to run on
different machines while communicating via SQL*Net. For end-user computing, the client
process typically resided on the user’s PC. However, for batch processing, it is common for
both the client and server processes to remain on the database server. I don’t know if the
Oracle architects planned for the client and server process separation, but regardless, Oracle’s
two-task architecture fit perfectly into this model and helped systems scale (not to mention
Oracle’s stock price). From a session profiling perspective, it was still a fairly simple task to
associate an end user with both a client and server process.


Figure 5-16. Oracle’s client/server architecture. Notice the Oracle client process has been
shifted from the database server to the end user’s PC. This allows the client process to focus
more on the user experience at the cost of increased PC computer power requirements, PC
maintenance issues, and increased network activity.

In the 1990s, memory was relatively expensive and the maximum physical addressable
memory was limited. Oracle systems also continued to grow larger and larger. For Oracle
systems to continue to scale and support more users and processing, Oracle needed to find a
solution. As shown in Figure 5-17, this solution was to allow a server process to be shared by
multiple clients. This was ingenious, because for OLTP processing, server processes are more
idle than busy (this is why we typically filter out the SNMFC wait event). By creating a
shared-server process, Oracle systems could continue to scale.
While Oracle systems could now continue to scale, profiling a session became messy.
Now that a server process could process requests from multiple clients, associating a server
process activity with a specific client or a specific end user became nearly impossible without
specialized tools. This core challenge remains today.
There was another force beginning to rise. Developers began to enrich the user
experience and shift processing requirements away from limited database server resources
onto increasingly capable desktop PCs. IT budgets started to suffer because of increased
operating system and application software licenses, physical hardware costs, security-related
issues, and all the related maintenance requirements.


Figure 5-17. To enable Oracle database servers to handle an increasing number of end users,
Oracle created server processes that could be shared among client processes. This was a
brilliant design, because in an OLTP environment, server processes are mostly idle.

This resulted in what Oracle CEO Mr. Ellison coined network computing. While many
have dismissed his prophetic words, modern-day web architectures have proved him right. In
fact, vendors are now selling low-powered computers known as netbooks. As Figure
5-18 shows, what essentially happened is the end user’s computer is focused on display and
navigation, while all other processing is handled by an increasingly dizzying array of
computing systems.
Figure 5-18 is in some ways misleading because it implies a one-to-one relationship
from the end user to the server process. Between accounting for a specific client’s activity on
a shared server and the application servers, web servers, and connection pooling of all kinds
at various layers, the task of associating an end user’s activity with a portion of a server
process’s work is nearly impossible. This created not only a performance analyst problem, but at a
basic level, also a support issue. Suppose an end user calls her support department with an
issue. How does the support department know what the end user is truly doing? Many times
this is not possible, so the support technician must make inferences. To truly understand what
the user is doing, the user’s activity must be isolated in a controlled environment. But what if
the problem is the result of noncontrolled environment issues? Now diagnosing user
experience issues is no longer just muddy; it’s nearly impossible to isolate with satisfactory
clarity.
As I mentioned, application software vendors, anxious to avoid being blamed for poor
performance, stepped in to help identify a user’s activity. Additionally, other companies
introduced tools to help profile a user’s activity in this messy environment. From an IT
management perspective, it’s complicated, many people are involved, and it’s expensive. This
boils down to increased risk, and that’s a word no IT manager wants to hear. I believe this has
motivated Oracle to take some leadership and work on providing at least a partial solution.
This partial solution is indeed a big step, and it is called DBMS_MONITOR.

Figure 5-18. Many factors—such as cost, maintenance, security, the end users’ experience
and location, and computing system capacity—helped push Oracle to embrace a web-based
architecture. This is a highly simplified drawing highlighting the key Oracle architectural
components: browser, Oracle client process, Oracle server process, and the Oracle SGA.

Oracle’s Solution: DBMS_MONITOR


Conceptually, all database activity flows through DBMS_MONITOR, and if any activity meets
certain specific criteria, the activity is captured. So instead of focusing on a specific user
(which you can do), you focus on defining the criteria of interest. Oracle introduced
DBMS_MONITOR in Oracle Database 10g, and unlike other diagnostic features, it does not
require an additional license.
This package combines many of the ways we have traditionally traced Oracle processes,
but extends it to meet modern architecture challenges. DBMS_MONITOR allows for the
creation of trace files, including bind variables, SQL, and wait event details. Additionally, it
collects v$sesstat and v$sess_time_model-based statistics. DBMS_MONITOR is a
very flexible and nicely designed package that eventually every DBA will use.
Overhead on the database server is always a concern during data collection. Vendors go
to great lengths to reduce both the real and perceived impact of their data collection. Based on
my experience, DBMS_MONITOR tracing places no more load on the database server than
traditional tracing, and I have been unable to detect a noticeable load increase when gathering
statistics. I suspect the no-load statistics collection is because Oracle’s kernel code
instrumentation and statistical sampling (see the “Active Session History” section a little later
in this chapter) is already in place and working, regardless of statistics being recorded. So any
additional load would simply be the statistics being updated in memory and their associated
x$ tables.

It Helps to Change Our Mindset


When working on a very specific and well-defined performance problem, our training and our
success have taught us to focus on a specific Oracle session. In fact, if you understand what
led up to DBMS_MONITOR, you can see that the focus has always been on attaching to a
specific end user’s server process. With a complex architecture, even when using
DBMS_MONITOR, this may still not be possible.
But all is not lost. Based on specified criteria, DBMS_MONITOR is able to gather
performance information for groups or classes of activity. For example, when speaking with a
user on the phone, suppose he says that querying basic customer information is painfully
slow. Perhaps, for any number of reasons, you cannot easily identify his client or server
process. However, when you look at v$session, you discover the application developers
instrumented the application using DBMS_APPLICATION_INFO, setting both the module and
action. You also discover that the application module apcustquery is being run when a
customer’s basic information is being queried. Here’s the key: If the problem is related to
what the user (or another user doing a similar task) is doing, not about the specific end user,
then DBMS_MONITOR can capture both SQL and performance statistics when a customer’s
basic information is queried. Said another way, if it’s beneficial to capture apcustquery
performance information regardless of which users run the module, DBMS_MONITOR can
help. But as you’ll see, DBMS_MONITOR may be able to identify just the end user’s activity.
Classifying the activity of interest can be done in a surprisingly large number of ways—not
just by an application’s module and action.
So while DBMS_MONITOR still attaches to a server process, it will only collect
performance data based on the defined criteria. If you can make the mental shift from end user
to end user classification, combined with a little creativity in identification criteria, then
DBMS_MONITOR will serve you well! The next section focuses on strategies to identify a
session or a group of sessions.

How to Use DBMS_MONITOR


To successfully use DBMS_MONITOR, follow these steps:
• Identify the session or sessions of interest.
• Enable tracing, statistics collection, or both.
• Wait while the data is being collected.
• Query the appropriate activity statistics view.
• Disable tracing, statistics collection, or both.
• Consolidate the potentially many trace files into a single trace file.
• Tkprof the trace files.
• Perform your analysis.
Let’s take a closer look at each of these steps, followed by an actual example.

Criteria Specification: Identify the Session(s) of Interest


This is probably the most unique aspect of using DBMS_MONITOR. Because you have many
session identification options, you’re likely to find a way to identify session(s) of interest.
Sessions can be identified by their Oracle instance, session identifier (v$session.sid),
client identifier (v$session.client_identifier), service name
(v$session.service_name), module (v$session.module), program
(v$session.program), and various combinations.
Once the session is attached, data collection starts and will continue until the session
qualifications are no longer met. For example, if the DBMS_MONITOR were set to trace all
sessions with a program name of arpost, once a session’s program name became arpost,
a trace file associated with the running server process would be created and tracing details
written. But once the session either disconnected or the program name changed, the tracing
would stop. Then if the session’s program were reset to arpost, the tracing information
would once again start flowing. The power lies in the fact that it makes no difference which
server process or client process the session is associated with at the time. If two different
server processes are involved with a single or even multiple sessions with their program set to
arpost, then two trace files will be created. This allows for connection pooling, Oracle’s
multithreaded shared server capability, and various other architectures to benefit.
Since the key is to identify Oracle server processes of interest, you need to investigate
how the session identification columns are set for your application. You may be surprised by
what you see, so run a simple query and take a look. Figure 5-19 is the result of such a query
run on a real PeopleSoft system. While not shown because DBMS_MONITOR does not
directly filter on this column, the client_info column provides a wealth of information
about the PeopleSoft session. Notice that the application and the DBA have not set the client
identifier column, so unless the DBA takes action to set this column, it will remain empty. By
closely examining such a query on your system, you will begin to understand your session
identification options.
One of the most powerful ways to identify a session is based on its client identifier. This
column resides in v$session and can be set by the
dbms_session.set_identifier procedure. Increasingly, applications are setting the
client identifier in addition to the module and action.
If sessions are not persistent (that is, they connect, run SQL, and disconnect), then
combining a logon trigger with setting the client identifier may enable you to pinpoint a
specific user or group of users. Figure 5-20 shows a working logon trigger used for just such
an occasion.

SQL> select client_identifier CID, service_name service, module,
  2         username, osuser, machine, program
  3  from v$session
  4  where type='USER'
  5  order by 1
  6  /
CID SERVICE MODULE USERNAME OSUSER MACHINE PROGRAM
----- -------- --------------------------------- -------- ------ -------- --------------------------
FSPRD11 PSMONITORSRV@xyzhstnm (TNS V1-V3) SYSADM psoft xyzhstnm PSMONITORSRV@xyzhstnm (TNS
FSPRD11 PSAPPSRV@xyzhstnm (TNS V1-V3) SYSADM psoft xyzhstnm PSAPPSRV@xyzhstnm (TNS V1-
FSPRD11 PSPRCSRV@xyzhstnm (TNS V1-V3) SYSADM psoft xyzhstnm PSPRCSRV@xyzhstnm (TNS V1-
FSPRD11 PSDSTSRV@xyzhstnm (TNS V1-V3) SYSADM psoft xyzhstnm PSDSTSRV@xyzhstnm (TNS V1-
FSPRD11 PSAESRV@xyzhstnm (TNS V1-V3) SYSADM psoft xyzhstnm PSAESRV@xyzhstnm (TNS V1-V
FSPRD11 PSAESRV@xyzhstnm (TNS V1-V3) SYSADM psoft xyzhstnm PSAESRV@xyzhstnm (TNS V1-V
FSPRD11 PSAESRV@xyzhstnm (TNS V1-V3) SYSADM psoft xyzhstnm PSAESRV@xyzhstnm (TNS V1-V
FSPRD11 PSMSTPRC@xyzhstnm (TNS V1-V3) SYSADM psoft xyzhstnm PSMSTPRC@xyzhstnm (TNS V1-
SYS$USERS SQL*Plus SYSADM xyzuse XYZ\XYZL sqlplus.exe
FSPRD11 PSANALYTICSRV@xyzhstnm (TNS V1-V3 SYSADM psoft xyzhstnm PSANALYTICSRV@xyzhstnm (TNS
FSPRD11 PSANALYTICSRV@xyzhstnm (TNS V1-V3 SYSADM psoft xyzhstnm PSANALYTICSRV@xyzhstnm (TNS
FSPRD11 PSANALYTICSRV@xyzhstnm (TNS V1-V3 SYSADM psoft xyzhstnm PSANALYTICSRV@xyzhstnm (TNS
Figure 5-19. This simple query can be used to gain a better understanding of how a particular Oracle system’s sessions can be identified.
Notice the client identifier has not been set; the service name, module, machine, and program have been set. This is real data from a
PeopleSoft system, altered to disguise the system.


CREATE OR REPLACE TRIGGER set_client_id_trigger
AFTER LOGON
ON DATABASE
DECLARE
   client_id VARCHAR2(64);
   sess_user VARCHAR2(64);
BEGIN
   -- For a complete list of USERENV options, see:
   -- Internet search on oracle and sys_context

   select sys_context('USERENV','SESSION_USER')
   into   sess_user
   from   dual;

   if    sess_user = 'MG' then client_id := 'WLC_1';
   elsif sess_user = 'OE' then client_id := 'WLC_2';
   elsif sess_user = 'IN' then client_id := 'WLC_2';
   end if;

   dbms_session.set_identifier(client_id);

END set_client_id_trigger;
/

Figure 5-20. This actual working logon trigger can be used as a template to set a session’s
client identifier based on an amazing array of possibilities.

There is a lot to glean from Figure 5-20. First, remember that if the application uses a TP
monitor or Oracle connections are persistent, then this approach is not likely to work. The first
time the persistent connection is made, its client identifier will be set and cannot be reset by a
logon trigger. Perhaps somewhere else in the application, or through your methods, the client
identifier will be reset, but obviously the logon trigger is fired only once for each session.
Pay close attention to the use of the sys_context function. This function can produce
a wide variety of user-identifying information, such as the host from where the client process
connected, instance name, language and territory, module, network protocol, operating system
username of the client initiating the connection, service name, session identifier, and terminal.
With this many options, it is highly likely you can identify the user or at least a group of
users, and then set their server process’s client identifier, enabling DBMS_MONITOR to gather
the requested performance data!
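As a simple illustration, here are a few of the USERENV attributes just mentioned. The attribute names are standard sys_context options, so a query like this should run on any reasonably current release.

-- A few standard USERENV attributes useful for session identification.
select sys_context('USERENV','HOST')          host,
       sys_context('USERENV','OS_USER')       os_user,
       sys_context('USERENV','MODULE')        module,
       sys_context('USERENV','SERVICE_NAME')  service_name
  from dual;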

Enable Tracing, Statistics Collection, or Both


Based on how server process activity is identified, the best DBMS_MONITOR procedure will
become apparent. Tracing and statistics collection are independently switched on and off by
their own procedures; that is, if you want to trace and collect instance statistics, two
procedures will be called to turn on the collection, and then two different procedures must be
called to turn off the collection. A simple describe of DBMS_MONITOR will produce a
couple of pages of options. If you want to gather all actions for a specific module, you can pass
the dbms_monitor.all_actions constant as an argument. Table 5-1 is sorted by
turning collection on or off, and then tracing or statistics collection.


Table 5-1. DBMS_MONITOR procedures

Procedure Name               Switch   Parameters
---------------------------  -------  ----------------------------------------------
session_trace_enable         On       Session identifier, serial number, waits, binds
serv_mod_act_trace_enable    On       Instance name, service name, module name,
                                      action name, waits, binds
client_id_trace_enable       On       Client identifier, waits, binds
client_id_stat_enable        On       Client identifier
serv_mod_act_stat_enable     On       Service name, module name, action name
session_trace_disable        Off      Session identifier, serial number
client_id_trace_disable      Off      Client identifier
serv_mod_act_trace_disable   Off      Service name, module name, action name,
                                      instance name
client_id_stat_disable       Off      Client identifier
serv_mod_act_stat_disable    Off      Service name, module name, action name
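As a hedged example of the serv_mod_act family, the following call would trace every action of the apcustquery module mentioned earlier. Both the service name prod3 and the module name are illustrative values borrowed from this chapter’s examples, not recommendations for your system.

BEGIN
   dbms_monitor.serv_mod_act_trace_enable(
      service_name => 'prod3',        -- illustrative service name
      module_name  => 'apcustquery',  -- hypothetical module from the text
      action_name  => dbms_monitor.all_actions,
      waits        => TRUE,
      binds        => TRUE);
END;
/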

Wait While the Data Is Being Collected


Don’t be impatient. If you are collecting based on a group or type of user, make sure to collect
plenty of data. DBAs tend to collect just enough data to do a cursory analysis, but you want
enough data to perform a convincing analysis.

Query the Appropriate Statistics Collection View


If you are collecting statistics, you must query from the appropriate view before you disable
collection. If you first disable statistics collection and then query, zero rows will be returned.
The view containing the desired statistics is based on how statistics collection was
enabled:
• If statistics were enabled by supplying only the service name without the module or
action, query from v$service_stats.
• If you enabled using the service name, module, and action, query from
v$serv_mod_act_stats.
• If you enabled collection by supplying the client identifier, query from
v$client_stats.
The statistics supplied (shown in the upcoming exercise) are a wonderful combination of
v$sysstat and v$sys_time_model statistics. These statistics provide CPU and IO
consumption information, as well as workload-related data. Oracle did a very nice job with
this feature because it provides both performance firefighting and predictive analysis
statistics.
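For example, if collection was enabled by client identifier (as in the upcoming exercise, which uses the identifier WLC_1), the query can be as simple as this sketch. Just remember to run it before disabling collection.

-- Hedged sketch: cumulative statistics for one client identifier. Query
-- this BEFORE disabling collection, or zero rows will be returned.
select stat_name, value
  from v$client_stats
 where client_identifier = 'WLC_1'
 order by stat_name;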


Disable Tracing, Statistics Collection, or Both


If you are monitoring a specific session or a user you can actually see, it’s obvious when to
turn off collection. But when using DBMS_MONITOR, it is common to not be physically
watching the user. Therefore, you will likely need to set some event to disable collection, wait
for the session(s) to disconnect, or manually disable collection when you feel enough data has
been collected. The disabling procedure names and their parameters are very similar to
their corresponding enabling procedures. See Table 5-1 for a list of the collection disabling
procedures.

Consolidate the Trace Files into a Single File


Depending on the identification criteria, Oracle architecture, and number of sessions, a few
minutes of collection could potentially result in hundreds of trace files. With any luck, your
criteria are very specific, but it all depends on what is needed.
The $ORACLE_HOME/bin/trcsess program consolidates trace files. You can even
specify identification criteria, although when using DBMS_MONITOR, you probably have
already effectively done this. The result is a single trace file ready to be formatted by
tkprof, or whatever trace file parser/formatter you choose to use.

Tkprof the Trace Files


The $ORACLE_HOME/bin/tkprof program has been around as long as I can remember.
Oracle continues to improve tkprof by adding wait event and timing information.
Essentially, tkprof parses a trace file and formats it for human readability. Other tools are
available that may better suit your budget, formatting, and analysis preferences.

Perform Your Analysis


At this point, you may have prepared a trace file for analysis, as well as possibly performance
statistics. Now it’s your turn to take this information and perform a session-level response-
time analysis.

A DBMS_MONITOR Example
Using DBMS_MONITOR the first few times can be confusing. There is a lot involved in
producing the final reports. To help in this transition, I have included a real-life example here.
To get the most from this example, remember the steps outlined in the previous section. You
will see the spooled output follows these steps precisely. I have divided the key steps and
made comments following each snippet.
The session identification criteria are the same as the logon trigger shown in Figure 5-
20. In fact, the results are based on setting the client identifier using that specific logon
trigger. The group of activity I am interested in has the session’s client identifier set to
WLC_1. Multiple sessions do meet the criteria, and Oracle is set up in a multithreaded
architecture. This ensured multiple trace files would be produced, which makes for a more
interesting (and trust-building) example.


SQL> exec dbms_monitor.client_id_stat_disable('WLC_1');


BEGIN dbms_monitor.client_id_stat_disable('WLC_1'); END;

*
ERROR at line 1:
ORA-13862: Statistics aggregation for client identifier WLC_1 is not enabled
ORA-06512: at "SYS.DBMS_MONITOR", line 12
ORA-06512: at line 1

SQL> exec dbms_monitor.client_id_trace_disable(client_id=>'WLC_1');


BEGIN dbms_monitor.client_id_trace_disable(client_id=>'WLC_1'); END;

*
ERROR at line 1:
ORA-13850: Tracing for client identifier WLC_1 is not enabled
ORA-06512: at "SYS.DBMS_MONITOR", line 66
ORA-06512: at line 1

SQL>
SQL> !rm -f /tmp/*.trc

The preceding code snippet ensures both tracing and statistics collection are turned off.
Turning off statistics collection effectively empties the views of all rows. This is why it is also
important to query the statistics before turning off statistics collection!
This particular DBMS_MONITOR session was run on Oracle Database 10g Release 1. If
you are using Oracle Database 11g, the trace file directory structure is about a gazillion9
levels down. As you’ll see in a few snippets later, I reset the trace file directory to /tmp. To
make my job even easier, before I started having trace files placed in this directory, I removed
all the trace files.

9 This is my favorite made-up word for a shockingly large number.


SQL> col client_identifier format a5 heading "CLID"


SQL> col process format a7
SQL> col machine format a22
SQL> col program format a15 trunc
SQL> col service_name format a10
SQL> col module format a10 trunc
SQL> select client_identifier, process, machine, program,
service_name, module
2 from v$session
3 where type='USER'
4 /

CLID PROCESS MACHINE PROGRAM SERVICE_NA MODULE


----- ------- ---------------------- --------------- ---------- ----------
27088 localhost.localdomain sqlplus@localho SYS$USERS SQL*Plus
1884 localhost.localdomain sqlplus@localho SYS$USERS sqlplus@lo
31285 localhost.localdomain sqlplus@localho prod3 sqlplus@lo
31281 localhost.localdomain sqlplus@localho prod3 sqlplus@lo
31269 localhost.localdomain sqlplus@localho prod3 sqlplus@lo
31277 localhost.localdomain sqlplus@localho prod3 sqlplus@lo
31279 localhost.localdomain sqlplus@localho prod3 sqlplus@lo
31283 localhost.localdomain sqlplus@localho prod3 sqlplus@lo
WLC_1 31272 localhost.localdomain sqlplus@localho prod3 mg
WLC_1 31261 localhost.localdomain sqlplus@localho prod3 mg
WLC_1 31273 localhost.localdomain sqlplus@localho prod3 sqlplus@lo
WLC_1 31267 localhost.localdomain sqlplus@localho prod3 mg
WLC_2 31263 localhost.localdomain sqlplus@localho prod3 oe
WLC_1 13799 localhost.localdomain sqlplus@localho SYS$USERS mg
WLC_2 31265 localhost.localdomain sqlplus@localho prod3 sqlplus@lo
WLC_2 13793 localhost.localdomain sqlplus@localho SYS$USERS oe
WLC_1 31275 localhost.localdomain sqlplus@localho prod3 mg
WLC_1 31284 localhost.localdomain sqlplus@localho prod3 sqlplus@lo

18 rows selected.

This report was first used to understand how the application set some of the various
session identification columns. I always hope no further action will be required to identify a
session or user. Logon triggers are just plain scary. You mess them up, and the entire
Oracle application can halt. In this case, I felt it was necessary and implemented the logon
trigger that is shown back in Figure 5-20.
The preceding report was used a second time to ensure the client identifier was being set
properly. I do see the client identifier is correctly set by the logon trigger since all users with
an Oracle username of MG have their client identifier set to WLC_1.
While the logon trigger sets sessions with their Oracle username of OE or IN to WLC_2,
I will not be tracing or gathering WLC_2 activity statistics. Setting the client identifier does
not imply you are collecting data.
Based on this report, I believe the client identifier is being set correctly, and it’s time to
enable both tracing and statistics collection.


SQL> alter system set user_dump_dest="/tmp";

System altered.

SQL> exec dbms_monitor.client_id_stat_enable('WLC_1');

PL/SQL procedure successfully completed.

SQL> exec dbms_monitor.client_id_trace_enable(client_id=>'WLC_1', waits=>true, binds=>true);

PL/SQL procedure successfully completed.

SQL>
SQL> drop table trace_t0;

Table dropped.

SQL> create table trace_t0 as select * from v$client_stats
  2  where client_identifier='WLC_1';

Table created.

SQL>
SQL> exec dbms_lock.sleep(60);

PL/SQL procedure successfully completed.

This code snippet turns on tracing and statistics collection, and then sleeps for my specified
data collection interval of 60 seconds. Since I am using only the client identifier to screen sessions
of interest, I enabled statistics collection using the client_id_stat_enable procedure.
I want the trace files to include both wait event details and bind variables (which you can lose
when consolidating trace files) based on the client identifier. Therefore, I enabled tracing
using the client_id_trace_enable procedure.


SQL> select t1.stat_name,t1.value-t0.value difference


2 from trace_t0 t0,
3 v$client_stats t1
4 where t1.client_identifier = t0.client_identifier
5 and t1.stat_name = t0.stat_name
6 order by 1
7 /

STAT_NAME DIFFERENCE
----------------------------------------------------------- ----------
DB CPU 225460
DB time 308566
application wait time 0
cluster wait time 0
concurrency wait time 0
db block changes 80
execute count 320
gc cr block receive time 0
gc cr blocks received 0
gc current block receive time 0
gc current blocks received 0
opened cursors cumulative 180
parse count (total) 180
parse time elapsed 20840
physical reads 1032
physical writes 0
redo size 9920
session cursor cache hits 0
session logical reads 1540
sql execute elapsed time 149186
user I/O wait time 2
user calls 440
user commits 20
user rollbacks 0

This report shows the activity during the collection interval only for the sessions of
interest. Also note the use of the v$client_stats view, since we were collecting data
based on the client identifier. We are provided with different types of statistics: DB CPU
(service time), wait times (queue time) related to IO and RAC, workload statistics like db
block changes, execute count, and user calls. This is a wonderful collection
of statistics that can be used for firefighting, response-time analysis, and predictive analysis.


SQL> exec dbms_monitor.client_id_stat_disable('WLC_1');

PL/SQL procedure successfully completed.

SQL> exec dbms_monitor.client_id_trace_disable(client_id=>'WLC_1');

PL/SQL procedure successfully completed.

SQL>
SQL> !cat /tmp/*trc | grep WLC | wc -l
58

SQL> !trcsess output="WLC_1.trc" clientid=WLC_1 *.trc

SQL> !tkprof WLC_1.trc traceout.txt \
       waits=yes sort=fchdsk explain=system/manager

TKPROF: Release 10.1.0.3.0 - Production on Tue Jun 3 18:13:46 2008

Copyright (c) 1982, 2004, Oracle. All rights reserved.

SQL> !ls -l traceout.txt


-rw-r--r-- 1 oracle dba 1138 Jun 3 18:13 traceout.txt

The preceding code snippet performs a variety of tasks. First, both statistics collection
and tracing are disabled. Notice I used the proper client identifier. If you do make a mistake
and attempt to turn off collection that is not currently enabled, Oracle will respond with an
error.
I then wanted a quick sense of how much trace data was written during the 60-second
interval; the grep found 58 WLC references across the trace files stored in the /tmp directory.
All files with a suffix of .trc and SQL statements with an associated client identifier of WLC_1 were consolidated
into a single trace file named WLC_1.trc. Specifying the client identifier was redundant, but
I wanted to show the capability.
The tkprof utility was invoked, and the formatted file traceout.txt was created
and does indeed exist!


SQL> !head -100 /tmp/traceout.txt

TKPROF: Release 10.1.0.3.0 - Production on Mon Jun 2 19:43:34 2008

Copyright (c) 1982, 2004, Oracle. All rights reserved.

Trace file: WLC_1.trc


Sort options: fchdsk
**************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
**************************************************************************

select sum(object_id)
from customers
where status != :"SYS_B_0"
and object_id != :"SYS_B_1"
and rownum < :"SYS_B_2"

call     count       cpu    elapsed       disk      query    current       rows
------- ------  --------  ---------  ---------  ---------  ---------  ---------
Parse      105      0.01       0.01          0          0          0          0
Execute    105      0.03       0.02          0          0          0          0
Fetch      210      0.51       0.51       5355       6720          0        105
------- ------  --------  ---------  ---------  ---------  ---------  ---------
total      420      0.55       0.54       5355       6720          0        105

Misses in library cache during parse: 1


Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 46

Elapsed times include waiting on following events:


Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
SQL*Net message to client                      210        0.00          0.00
db file scattered read                         766        0.00          0.07
SQL*Net message from client                    210        0.00          0.11
db file sequential read                          6        0.00          0.00
**********************************************************************

The preceding trace file snippet is what we would expect, with one exception. It does
contain the standard trace information about CPU consumption and block activity, and it also
contains wait event data including the average wait times. As discussed previously, IO-related
average wait times are very useful in understanding Oracle’s interaction with the IO
subsystem. What is missing, though, is bind variable details. The SQL statement clearly shows
bind variables are being used (and they are system-generated), but notice there are no
references to the actual bind variables. When multiple trace files are consolidated, don’t
expect to see the individual bind variables. To get this level of detail, you’ll need to look at
the trace file yourself or use another tool.
DBMS_MONITOR is a wonderful Oracle package. While it does not solve all profiling
challenges, it is truly a giant leap forward. Its flexibility in purpose and use, and the fact that
no additional license is required, make it something every Oracle DBA should at least be
familiar with.


Active Session History


Some would say instrumentation is the best performance statistics gathering approach, but
others would say periodic sampling is the best. After all, nearly all third-party tools rely on
polling, which is sampling. As Iron Man’s Tony Stark would say, “Is it too much to ask
for both?” And this is exactly what Oracle has done.
The classic performance view v$sysstat is based on instrumentation, since when a
database call completes, the respective statistics are updated. The wait interface is clearly an
instrumentation approach, with the additional gettimeofday system calls inserted into
Oracle’s kernel code. The time model views are based on a combination of instrumentation
(since the view is not updated if a session is in the middle of a wait) and sampling due to the
frequent getrusage calls. But with the introduction of Oracle’s active session history, or
ASH for short, Oracle has clearly implemented a very aggressive statistical sampling
collection strategy. So Oracle has indeed begun using both instrumentation and sampling.
While the core session-level performance views (v$sesstat, v$session_event)
have served us well, Oracle knows more accurate metrics are required to support its vision of
a self-adjusting, optimum-performing system. As I have mentioned, two glaring problems
exist. First, when a session disconnects, all session-level details are gone. While the data is
rolled up into v$sysstat and v$system_event, the session-level details, which we
sometimes need, vanish. The other problem, specifically with v$sesstat, is the statistics
may not be updated until after the database call has completed. This can muddy any
performance analysis when long-running SQL is involved. Oracle solves both of these
problems with the introduction of ASH.
But this problem is solved for a price. A license is required to legally query ASH views
or run ASH reports. Since ASH data feeds the AWR, the same legal limitation applies. The
nasty thing about this is that, by default, Oracle’s base product consumes your computing
system resources gathering detailed performance statistics, yet a simple query to retrieve this
information is illegal without the additional license. While not a great public relations move,
it is a shrewd corporate maneuver.10
Oracle took a completely different data collection strategy with ASH. Instead of
instrumentation, ASH uses sampling. Built directly into Oracle’s kernel, active sessions
(server or background) will insert one additional row into the active session table each and
every second. A session is deemed active if it is either consuming CPU or waiting on a non-
idle event, such as an enqueue, latch, or IO. The sample information is buffered for around 30
minutes, but this is dependent on a number of factors. There is also a stunning array of
information stored, such as the running SQL identifier, whether the session is consuming
CPU or waiting on an event, the event name, and session identification information like the
session identifier and the client identifier. If the sample rate is high enough yet not too high,
you get fantastic performance data, without placing a significant burden on the system.

10
Software vendors who rely on good performance data hate ASH with a passion. Third-party performance
vendors spend a considerable amount of time and money creating and maintaining their low-overhead data collection
facility. They would welcome ASH, except Oracle forces their customers to license ASH, which also allows customers
to use Oracle’s performance products. So now the customer must ask the question, “Why purchase both Oracle’s
performance products and this vendor’s product?” This is a question performance product vendors want to avoid.


Why ASH Is a Big Deal


I am often asked why ASH is such a big deal. There are four reasons why ASH truly is a very
big deal: back-in-time capabilities, configurable low-impact kernel-level data collection, clean
data, and the connection of session activity with resource consumption (think service time)
and wait time.
Suppose you received a call from a user who said performance was poor about 30
minutes ago, but now performance is just fine. With ASH, you can easily go back in time, and
with a very simple query, perform a session-, instance-, or system-level ORTA. Usually,
when someone calls with a performance problem, we get on the system and start running
scripts. But even if our scripts are interval-based, they are probably based on the present going
forward. This means if the problem is gone, so is our capability to diagnose that problem.
Since ASH collects data and buffers it, that data is there for us to use in our “what just
happened?” diagnosis. Even better, the ASH data is written into the automatic workload
repository tables, creating the capability to generate advanced reports based on just about any
time frame.
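For example, here is a minimal sketch of such a back-in-time query; the window (between 35 and 25 minutes ago) is arbitrary and assumes the samples still reside in the ASH buffer:

select session_state, event, count(*) samples
from v$active_session_history
where sample_time between sysdate - 35/1440 and sysdate - 25/1440
group by session_state, event
order by samples desc;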
As discussed earlier in this chapter, modern Oracle architectures make linking a user’s
activity to an Oracle server process challenging. ASH helps in this endeavor because it stores
not only the session identifier and serial number, but also the client identifier, the Oracle
username, program, module, and action!
Let’s not forget that ASH is built directly into Oracle’s kernel, as opposed to a third-
party data collection facility that either samples from Oracle’s SGA or samples sessions
through the SQL interface. While both approaches can work well, keeping the collection
running constantly requires a professional-grade product. I have personally spent a
considerable amount of time designing and building data collection tools. It is extremely
difficult to perform a low-impact collection on a large system while supplying accurate data.
Most third-party vendors spend a lot of time and money on their collection facilities. What
anyone who has tried to create collectors quickly learns is that what initially seems like a few
simple SQL queries turns out to be a daunting technical challenge. This is one reason why
performance products seem relatively expensive.
ASH is also configurable. Depending on your Oracle release, there are around a dozen
hidden ASH-related instance parameters. One in particular, _ash_sampling_interval,
controls how often ASH samples. The default is set to once each second, but this can be
changed to increase statistical accuracy or to decrease collection impact.
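If you are curious which ASH parameters exist on your release, a query along these lines will list them (a sketch; it requires a sysdba connection, and hidden parameters should never be changed casually):

select a.ksppinm name, b.ksppstvl value
from x$ksppi a, x$ksppcv b
where a.indx = b.indx
and a.ksppinm like '\_ash\_%' escape '\';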
I think you’ll agree that ASH is quite a technical achievement and can be very useful
during performance firefighting.

A Demonstration of ASH Capabilities


Before I delve into ASH internals, I will demonstrate a few ASH-based reports to give you a
taste of what you can do with ASH. While OraPub’s OSM tool kit contains around a dozen
ASH scripts, I will demonstrate only a few. The underlying SQL is very simple and you can
download the scripts for free and analyze them. Just keep in mind that what you’ll see here is
just a fraction of what you can do with ASH. My intention is to show you ASH’s flexibility,
ease of use, and spot-on ORTA diagnostic power.
Figure 5-21 is a simple instance-level ORTA report, yet it has significant differences
that highlight ASH’s power. First, notice the initial parameter of 15. This parameter sets how
far back in time we want to start reporting. The reporting interval is set to start 15 minutes


into the past and end at the present. In addition to the wait event details, ASH’s
session_state indicates whether the session was waiting for a non-idle event to complete
or consuming CPU. This simple column enables us to determine the percentage of time
waiting (queue time) and consuming CPU (service time). And because the wait event names
are also stored, we can classify the queue time as we wish. As you know, I prefer simple
classifications, and those can be seen in Figure 5-21.

SQL> @ashrt 15 %

Database: prod3                                    20-APR-10 06:08pm
Report:   ashrt.sql            OSM by OraPub, Inc.           Page  1
          ASH Response Time Analysis Report (last 15 min, SQL_ID=%)

% Service Time   % Queue Time     % Qt IO   % Qt Other
         (cpu)         (wait)      (wait)       (wait)
-------------- -------------- ----------- ------------
         36.55          63.45       95.29         4.94

Activity Detail                          % Time
-------------------------------------- --------
db file scattered read                    39.54
CPU                                       36.78
read by other session                     20.23
latch: cache buffers chains                2.99

4 rows selected.

Figure 5-21. Very similar to any Oracle response-time report, except this report is based on
ASH’s sampled data starting 15 minutes ago until the time of the report run. Overall response
time can be significantly decreased by focusing first on IO activity and then on CPU
consumption.

Figure 5-21 shows that considering all ASH rows over the past 15 minutes with a
session_state of either WAITING or ON CPU, 37% of the rows have a
session_state of ON CPU and 63% of the rows have a session_state of
WAITING. Said another way, over the past 15 minutes, 37% of background and server
process time was spent consuming CPU and 63% of their time was spent queuing for some
resource. Because every ASH row with a session_state of WAITING has the currently
waiting event name in its event column, we can easily determine that of the 63% queue
time, 95% was spent waiting for an IO call to complete. I could have easily further classified
the IO time, but I think you get the point. We need to determine the SQL related to CPU
consumption (37% of all Oracle sessions’ response time) and also the SQL related to IO calls.
We can use ASH to determine them both!
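The heart of a report like Figure 5-21 can be sketched in a single query; the OSM script adds formatting and wait classification, but the core idea is simply this:

select session_state,
       round(100*ratio_to_report(count(*)) over (), 2) pct
from v$active_session_history
where sample_time > sysdate - 15/1440
and session_state in ('ON CPU', 'WAITING')
group by session_state;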
Figure 5-22 ranks SQL statements that waited for scattered reads over the past 15
minutes. Figure 5-21 shows 95% of the wait time is IO-related and 40% of the IO waits are
related to multiblock requests (see the db file scattered read entry). Figure 5-22
shows that just one SQL statement is responsible for 32% of the multiblock reads. So an
application-focused solution is to tune the SQL statement 3r5xuxmggzwt8.


SQL> @ashsqlpcte 15 db%scat

Database: prod3                                    20-APR-10 06:08pm
Report:   ashsqlpcte.sql       OSM by OraPub, Inc.           Page  1
          Instance Level ASH: Find SQL by event (last 15 min)

                                                                % Time
Wait Event                    SQL ID        ADDRESS  HASH_VALUE Waited
----------------------------- ------------- -------- ---------- ------
db file scattered read        3r5xuxmggzwt8 5516AE70 3741315880  31.76
db file scattered read        ajgxt6x8dmsay 55159240 1356456286  14.71
db file scattered read        fp67qwqz44m2m 550554C4 3192015955   3.53
db file scattered read        0fbbnwzj80b0q 5504BADC 3800050710   2.94
db file scattered read        22zj7nt1bv1qc 54FA504C 1119717068   2.35
db file scattered read        dmbxunrzpcj92 54FDA630 4283843874   2.35

Figure 5-22. Each ASH sample is related to a single SQL statement and if waiting, there will
be an associated wait event. In this situation, we want to list the SQL that has been waiting
the most for multiblock reads over the last 15 minutes.

A subtle yet very distinct analysis step occurred. Without ASH,11 when performing our
ORTA, the wait event db file scattered read directs us to look for SQL requesting
blocks outside Oracle’s cache. We would typically look at the view v$sql and sort the SQL
by the disk_reads column. Sure enough, we would discover a few SQL statements
responsible for most of the scattered reads. But the situation is entirely different with ASH.
Because v$active_session_history session samples contain both the SQL
identifier (column SQL_ID) along with the wait event (column EVENT), we absolutely know
the SQL associated with the wait event. So we don’t need to infer the associated SQL; inference
always leaves a little room for mistakes.12 As long as ASH samples frequently enough, the chance of
identifying the wrong SQL statement is about zero.
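A sketch of the kind of query underlying a report like Figure 5-22 shows just how direct the SQL-to-event association is:

select sql_id, count(*) samples
from v$active_session_history
where sample_time > sysdate - 15/1440
and event = 'db file scattered read'
group by sql_id
order by samples desc;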
Figure 5-23 shows the same ASH response time report shown in Figure 5-21, with one
very important difference. This time, we want the response time breakdown for a specific
SQL statement, not all the SQL statements. The second parameter is the SQL statement
identifier. If we enter a %, then all statements will be considered; otherwise, only the specific
SQL identifier we enter will be considered.
As we should expect, Figure 5-23 shows the top scattered read statement did indeed
spend a considerable amount of its response time (50%) over the past 15 minutes waiting on
multiple block requests to complete. This report confirms tuning this single statement focused
on multiblock reads will significantly impact the overall system’s physical IO requirements
and also this particular SQL statement’s response time.

11
Obviously, building a tool to sample v$session can be done. But on a large Oracle system, directly
sampling from v$session and then storing that information can easily take over a minute!
12
Now don’t panic. Every time I have checked, my inferences and ASH data match perfectly. We just need to
be a little more careful and understand what we are viewing.


SQL> @ashrt 15 3r5xuxmggzwt8

Database: prod3                                    20-APR-10 06:09pm
Report:   ashrt.sql            OSM by OraPub, Inc.           Page  1
  ASH Response Time Analysis Report (last 15 min, SQL_ID=3r5xuxmggzwt8)

% Service Time   % Queue Time     % Qt IO   % Qt Other
         (cpu)         (wait)      (wait)       (wait)
-------------- -------------- ----------- ------------
          5.45          94.55       96.15         4.00

Activity Detail                          % Time
-------------------------------------- --------
db file scattered read                    50.00
read by other session                     40.91
CPU                                        5.45
latch: cache buffers chains                3.64

4 rows selected.

Figure 5-23. This is the same ASH-based response time report as shown in Figure 5-21. The
difference is in the second parameter, the SQL identifier. We now want a response time
profile for a single SQL statement over the past 15 minutes. This particular SQL statement is
waiting a significant amount of time for IO requests to complete.

Our ASH-based instance-level response-time analysis report shown back in Figure 5-21
indicated that over the past 15 minutes, active sessions were consuming CPU 37% of the
time. So it’s important we identify those CPU-consuming SQL statements and tune them
focusing on CPU consumption, which means logical IO. Figure 5-24 shows the top CPU-
consuming SQL statements over the past 15 minutes. As I mentioned, each ASH row contains
the column session_state. If the column value is ON CPU, we know the SQL statement
is currently consuming CPU and not associated with a wait event. Figure 5-24 shows that,
considering all the ASH rows over the past 15 minutes, the SQL statement
ajgxt6x8dmsay accounted for 56% of all the ON CPU rows. This statement needs to be
tuned focusing not on disk reads, block gets, or physical IO activity, but instead on buffer
gets—that is, logical IO activity.
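The idea behind Figure 5-24 can be sketched as nothing more than a filter on session_state:

select sql_id, count(*) cpu_samples
from v$active_session_history
where sample_time > sysdate - 15/1440
and session_state = 'ON CPU'
group by sql_id
order by cpu_samples desc;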
As we did with the top physical IO SQL statement (Figure 5-23), we also want to
understand the top CPU-consuming SQL statement’s response-time profile. Figure 5-25 is
again based on the response time report, ashrt.sql. But just as in Figure 5-23, we want the
response-time profile for a specific SQL statement. The result, shown in Figure 5-25,
indicates that the top CPU-consuming statement, when running, spends 73% of its time consuming
CPU! We can reduce overall SQL statement CPU consumption by either tuning this specific
statement or by reducing its execution rate. Altering the execution rate is known as workload
balancing, and while not a technically exciting solution, it can have a massive impact on CPU
consumption during peak activity and business times.


SQL> @ashsqlpctcpu 15

Database: prod3                                    20-APR-10 06:09pm
Report:   ashsqlpctcpu.sql     OSM by OraPub, Inc.           Page  1
          Instance Level ASH: Find CPU SQL (last 15 min)

                                    % Time
SQL ID        ADDRESS  HASH_VALUE      CPU
------------- -------- ---------- -------
ajgxt6x8dmsay 55159240 1356456286   56.10
3r5xuxmggzwt8 5516AE70 3741315880    3.66
39hstnr8a89q8 55172280 3500418760    1.83

Figure 5-24. Our response-time report shown in Figure 5-21 indicated a significant portion
of the response time is related to CPU consumption. This ASH-based report shows the top
CPU-consuming statements over the past 15 minutes. When a SQL statement is consuming
CPU, its session_state is ON CPU.

SQL> @ashrt 15 ajgxt6x8dmsay

Database: prod3                                    20-APR-10 06:10pm
Report:   ashrt.sql            OSM by OraPub, Inc.           Page  1
  ASH Response Time Analysis Report (last 15 min, SQL_ID=ajgxt6x8dmsay)

% Service Time   % Queue Time     % Qt IO   % Qt Other
         (cpu)         (wait)      (wait)       (wait)
-------------- -------------- ----------- ------------
         72.80          27.20       94.12         6.25

Activity Detail                          % Time
-------------------------------------- --------
CPU                                       72.80
db file scattered read                    20.00
read by other session                      5.60
latch: cache buffers chains                1.60

4 rows selected.

Figure 5-25. Based on Figure 5-24, we know this particular SQL statement is the largest
CPU consumer. This report shows us the time classification for this particular statement.
Clearly, focusing on CPU-reducing tactics, like reducing logical IO activity, will have a
significant performance-improving impact.

ASH can be used in other ways also. Because it contains the client identifier as well as
the session identifier and serial number, we can create an entire group of similar reports. Just
as we drilled down from our initial instance-level response-time analysis in Figure 5-21 into
both CPU consumption and wait time, we could have also drilled down to the session or to the
sessions tagged with a specific client identifier. In fact, we could have easily performed an
ORTA based on a specific client identifier. The possibilities are endless and powerful. I hope
you have gained an appreciation for how ASH can be used during performance firefighting.


ASH Data Collection and Architecture


ASH can be viewed as simply a data collector, but it is Oracle’s adjustable, kernel-embedded
data collector. And as demonstrated in the previous section, it can produce an
amazing array of extremely useful and pinpoint-accurate response-time-based diagnostic
reports. How ASH collects its data is the focus of this section.
ASH is based on a sampling methodology, whereas both Oracle’s wait interface and
instance statistic views (e.g., v$sysstat) gather their data based on instrumentation.
Sampling is not an unusual way to gather information. It is used in many disciplines. Nearly
all Oracle third-party performance products gather their performance data by polling, which is
sampling. Sampling is used outside the Oracle community also; for example, with statistics,
signal processing, music, compression, financial auditing, and quality control.
Sampling simply takes a look at what is currently occurring and makes note of it. With
enough samples, we’ll get a good idea of what has happened in the past. While sampling does
not provide perfect information, with enough samples, statistically the data is just as good as
instrumentation and possibly at a much lower overhead.
For example, suppose ten samples were taken over a 10-second period. During two of
the samples, an Oracle process was consuming CPU and the other eight were waiting for IO.
Based solely on these samples, we can infer 20% of the time the process is consuming CPU,
while 80% of the time it is waiting for IO. Since we know the duration is ten seconds, we can
also infer the process was consuming CPU for about 2 seconds and waiting for IO about 8
seconds. While we do not know the exact figures and we could increase the accuracy by
increasing the sample rate, we know enough to understand the situation, resulting in a stellar
diagnosis.
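In query form, this inference is a simple count. Here is a sketch, assuming the default 1-second sample interval and a hypothetical session identifier of 123; each returned count approximates seconds of activity:

select decode(session_state, 'ON CPU', 'CPU', event) activity,
       count(*) est_seconds
from v$active_session_history
where session_id = 123
and sample_time > sysdate - 10/86400
group by decode(session_state, 'ON CPU', 'CPU', event)
order by est_seconds desc;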
During each sample period, every session (both server process and background process)
that is consuming CPU or waiting on a non-idle wait event will be recorded, resulting in an
additional v$active_session_history row. Even if the session is still in the same
wait as the previous sample—for example, a very long table lock—there will be an additional
ASH row inserted.13
The trick with sampling is to gather just enough samples to meet the requirements
while not disrupting the system of interest. And this is the central challenge all data collection
facilities face. Because Oracle’s ASH collection facility is built directly into its kernel, it has
the distinct advantage of lower overhead. While ASH gathers session-level details from active
sessions and stores the information in an ASH buffer located in the shared pool, the sampling
frequency is adjustable via the _ash_sampling_interval instance parameter.
From a v$active_session_history perspective, a new row is inserted for every
active session during each sampling period. So if you decreased the sampling frequency from
once per second to once every 2 seconds, you would notice a 50% drop in the number of ASH
rows. Assuming the size of the ASH buffer is fixed, decreasing the sample rate would also
allow reporting further back in time.
ASH buffers are stored in the shared pool. Oracle wants to keep around 30 minutes of
history, but the more active sessions, combined with the shared pool size and the sample
frequency, will obviously influence how far back in time ASH activity is available. ASH

13
As a side note—and I have tested this—the ASH sample frequency does not affect the time model views
update frequency.


space could be as little as 1MB or as much as 30MB.14 To see the current memory dedicated
to ASH, execute the following:
select bytes/1024/1024 from v$sgastat where name='ASH buffers'
You may be able to specify memory allocated to ASH by setting the _ash_size (in
bytes) instance parameter, but experience has shown Oracle does not always respect this
setting.
By default, ASH is enabled on all Oracle Database 10g and above systems. With the
instance parameter statistics_level=typical (default), ASH will be enabled. You
can also directly turn ASH on or off by setting the instance parameter _ash_enable to
either true or false. By default, when ASH buffers are written to the AWR tables, only
one out of every ten samples is written. This is controlled by the
_ash_disk_filter_ratio parameter, which has a default value of 10.
Personally, I would never change ASH parameters unless I had a very good reason for
doing so. Even more risky would be to disable ASH. As Oracle turns increasingly to
automated performance management, it needs the best performance data available, and it gets
much of this data from ASH.
The two background processes MMON and MMNL are deeply involved with ASH
activity. MMON, short for Manageability Monitor, wakes up every few seconds (but like
DBWR and LGWR, it can be woken up by another process) and is involved in writing ASH
buffers to the AWR tables. MMNL, short for Manageability Monitor Lite, is responsible for
gathering active session details once every second (default). It gathers operating system
information (for example, from /proc/stat) and is also involved with writing ASH
buffers to the AWR tables.
One way of visualizing the ASH buffers is to think of a ring structure. This model helps
you understand how new ASH records are written into memory, how old ASH records are
overwritten, how ASH buffers are referenced via the v$active_session_history
view, and how ASH records are written to the AWR tables.
By default, every second, the MMNL background process wakes up and gathers active
session information and writes the information in the ASH buffers, starting from where it last
finished writing. This means the oldest ASH buffer resides just before—just ahead of—the
next MMNL ASH buffer write. Since the ASH buffers form a ring structure, unless the
records are archived, they will eventually be overwritten. Oracle will not allow this, so once
an hour, or when the ASH ring structure is getting full, the MMNL and MMON background
processes are woken and are involved in archiving ASH buffers to the AWR tables. Picture
this archiving going clockwise, or forward in time.
While it is common to hear about the ASH buffers being flushed, it is not a true flush,
because the data is not cleared or erased. It remains, so we are more likely to find ASH data in
memory. Oracle’s AWR facility manages how much AWR data to keep, and at what granularity,
before the rows are physically deleted from the AWR tables.
When the v$active_session_history view is queried, you can think of it as
starting at a specific period in the past and then progressing forward in time (moving
clockwise). While most ASH queries pull data from a specific number of minutes in the past
and move forward until the most recent ASH record, because ASH records are indexed and

14
This maximum value will certainly change from one Oracle version to the next, so don’t look at this
number as a fixed maximum, but more of an indication of how much memory Oracle may want to keep specifically
for ASH buffers.


stored by time, the report start and stop time can be anything you wish, as long as the data is
stored in the ASH ring structure.
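Querying the archived samples is just as simple. Here is a sketch against the AWR table; remember that, as noted earlier in this section, only one in ten samples is archived by default (so each row represents roughly 10 seconds of activity), and querying these views requires the appropriate license:

select decode(session_state, 'ON CPU', 'CPU', event) activity,
       count(*) samples
from dba_hist_active_sess_history
where sample_time between to_timestamp('20-APR-10 05:00pm', 'DD-MON-RR HH:MIpm')
                      and to_timestamp('20-APR-10 06:00pm', 'DD-MON-RR HH:MIpm')
group by decode(session_state, 'ON CPU', 'CPU', event)
order by samples desc;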
I am commonly asked just how much of an impact ASH places on the system. The
answer is highly variable, based on CPU speed, the number of active sessions, the sampling
frequency, and most important, the cost of CPU power.15 But I have run some tests to get an
idea. On a very lightly loaded testing system, over a 3-minute period, the MMON and MMNL
background processes together issued around 3,500 gettimeofday calls, and each call
took between 0.03 ms and 0.08 ms. This means that in the worst case (0.08ms), over an hour,
5.6 seconds of CPU time were occupied by just these two background processes. That is an
amazingly lightweight data collector. I have seen commercial-grade data collectors consume
over a minute of CPU time every hour.
Interestingly, my tests show Oracle’s wait interface places significantly more load on the
database server than ASH. For example, simply tracing an active server process at the operating
system level indicated that 2,938 gettimeofday calls were issued over a 3-minute period, and
each call took 0.017 ms. Another 3-minute sample of an active server process made 89,555
gettimeofday calls, with each taking around 0.077 ms. As you can see, the impact is
highly variable. My first example would consume 999 ms (2938 × 20 × 0.017) of CPU per
hour, and the second example would consume 138 seconds (89555 × 20 × 0.077) of CPU per
hour. Therefore, wait interface overhead is highly variable. Since every system call is
instrumented, many short waits would create more overhead than fewer longer waits. Keep in
mind that this is just for a single server process, not the total for ten, hundreds, or even
thousands of server processes. So if you’re concerned about ASH overhead, you should be
very concerned about the wait interface overhead.
ASH is fascinating from both an architectural perspective and a usefulness perspective.
Many Oracle internals topics are interesting, but few have the potential to be useful in our
firefighting efforts. If you have the license to query from the ASH view, I think you’ll find
using ASH in your OraPub 3-circle analysis extremely beneficial.

Summary
This brings us to the end of a wide variety of performance diagnosis topics. Most people in IT
think Oracle performance optimization is centered on superior Oracle internals knowledge. I
disagree. I believe those who use a solid method, perform a spot-on diagnosis, and then
appropriately apply Oracle internals are less likely to guess and much more likely to be
successful performance firefighters.
We have come to the end of the performance diagnosis and methods chapters. Now it’s
finally time to begin digging deep into Oracle internals!

15
There is a kind of double hit in this situation. Because many hardware and software licenses are based on the
number of CPU cores, the more CPU consumed by performance software, the more any CPU-related license will
cost. So the cost of CPU consumption is not just the simple CPU time consumption calculation, but is also related to
the cost of software licenses.


CHAPTER 6

Oracle Buffer Cache Internals

Oracle’s buffer cache is a fascinating topic. It has everything you need for a fantastically
horrific yet exciting performance nightmare: high concurrency, latches, locks, queues, lists,
nodes, in-memory objects, and more. Add in a few irate users, and you’ll have stories to tell
your grandkids. With so many algorithms and pieces of kernel code simultaneously in
operation, contention opportunities abound. Fortunately for us, an analysis focused on
response time can detect specifically where the contention resides and, when combined with a
solid understanding of the related internals, a number of possible solutions will result.
Oracle internals can become so interesting and addictive to people like us that within a
few minutes, we will be talking about something that is truly fascinating but has relatively
little practical value. Avoiding this has always been my challenge when teaching and writing.
Here, I will attempt to balance my excitement with how things work with what is actually
useful for firefighting. While there are many buffer cache topics, if it will not help you fight
performance fires in a practical and realistic way, I will not write about it. If your curiosity is
aroused, you can find many resources with further details about these internals.
For each topic, I will first present the general situation, followed by the related
architectural aspects. Next, I will move on to how the structures can be stressed in such a way
to cause performance degradation. Then I will present a number of solutions that directly
address the problem. For each solution, I will explain why and how it can improve the
situation and how the specific change will affect your system. This will allow you to better


anticipate the effects of your change. In the final chapter of this book, I will present ways to
quantify the expected solution change to help prioritize your recommendations.

Big Expectations
While Oracle has a number of caches, the three I will cover in this book are the buffer cache,
the shared pool, and the redo log buffer. These three caches are where most of your
performance firefighting situations will arise. Surely, other problem areas exist, but when
problems quickly and unexpectedly arise, from an Oracle-centric perspective, they usually are
related to one of these caches.
Oracle has a big problem: When its customers purchase more memory, they expect
performance to improve. Yet bigger caches require more CPU cycles to manage, and Oracle’s
algorithms, which were developed with the expectation of more IO and less memory, become
stressed in ways they were not originally intended to handle. The result can be bizarre and
deep contention related to memory structure.
For example, here is what Oracle has to say about the TimesTen database (which it
purchased):

Oracle TimesTen In-Memory Database delivers real-time performance by
changing the assumptions around where data resides at runtime. By managing
data in memory, and optimizing data structures and access algorithms
accordingly, database operations execute with maximum efficiency, achieving
dramatic gains in responsiveness and throughput, even compared to a fully
cached disk-based RDBMS.1

Right away, Oracle states the assumption has changed. The assumption is the data
resides in memory. In contrast to the standard Oracle kernel, where a balance must be struck
between disk IO and in-memory operations, the TimesTen database is specifically designed
for in-memory operations. It even goes so far as to say that an in-memory Oracle database
cannot match the TimesTen performance! Clearly, trade-offs have been made to break speed
barriers in specific processing areas.
The Oracle relational database management system (RDBMS) is not optimized for only
in-memory management. I remember the day a small team in my consulting group did some
testing using the standard Oracle kernel, but with a lot of memory. So much memory, in fact,
that the entire database was cached. While our performance tests did run faster, we were
shocked that they did not obliterate the baseline tests. What we learned that day was that the
standard Oracle kernel is not optimized for in-memory operations. This is because it must
strike a balance between in-memory operations and on-disk operations. As Oracle caches
become increasingly larger, finding that optimal balance is a technically daunting challenge.
Finding the optimal in-memory to IO operations balance becomes even more difficult
without changing core algorithms. So, as you might expect, Oracle has changed its core
algorithms to keep up with larger caches and increased CPU processing speeds. Change
represents risk. Oracle does not want to change its code unless it absolutely must. Unless you
have worked on a large software product on which the entire existence of the company is

1
From the “Oracle TimesTen In-Memory Database” document
(http://www.oracle.com/technology/products/timesten/pdf/ds/ds_timesten_imdb.pdf).


based, this can be difficult to appreciate. Even the slightest change can cause unexpected and
cascading performance issues, not to mention functional problems. So Oracle must see clear
advantages and be highly motivated to take the risk of changing its code. As you will learn in
this chapter, Oracle has taken this risk and continues to take this risk.
One creative way to deal with balancing in-memory and IO operations is to develop self-
adjusting algorithms. For example, if IO operations begin to slow down, then automatically
adjust the algorithms for decreased IO operations. Or if there is an abundance of CPU power,
then automatically shift processing to consume CPU with the intention of improving
performance. This is one reason why good performance statistics are so important to Oracle.
Fortunately for us, Oracle makes much of this information available to us.

What Is a Buffer?
An Oracle buffer is an Oracle segment’s cached block. As you might expect, an Oracle buffer
starts off containing the same information as in the Oracle block. A buffer’s contents depend
on the type of segment and whether it’s a segment header block.
Figure 6-1 is just one way to model an Oracle data block. Oracle buffers are indeed
cached in Oracle’s buffer cache. While I will delve deeper into data blocks later in this
chapter, my point here is that a buffer simply represents the block on disk. But once a buffer
is changed, a difference between the buffered block and the on-disk block can occur.

Figure 6-1. A data block representation highlighting the three main block parts

While there are many buffer states, as indicated by the state column in v$bh, they
can be practically summarized into three modes: free, dirty, and pinned. The algorithms I’ll
present in this chapter are closely associated with a buffer’s state.

Free Buffers
A buffer is free when it matches the block on disk. I commonly refer to a free buffer as a
mirrored buffer, because it mirrors what is on disk. Figure 6-2 shows just how simple it is to
determine the number of free buffers in the buffer cache. A free buffer may indeed be empty
(for example, after an instance restart), but it will most likely contain real block information,
such as rows. A free buffer can be replaced without any corruption because there is a copy on
disk. Of course, if a transaction commits, then at a minimum, the buffer change must be
recorded in an online redo log.


SQL> select count(*) from v$bh where status='free';

COUNT(*)
----------
440

Figure 6-2. The number of free buffers can be determined by issuing a simple v$bh-based
query looking at the status column.

A free buffer can be very unpopular. Perhaps a query needed to look at a single row, and
therefore required the block to be brought into the cache, yet the buffer was never accessed
again. On the other hand, a free buffer can also be very popular. For example, if a particular
block is repeatedly queried, it becomes relatively popular, yet still free, because the buffer has
not been changed. If you keep the free buffer definition simple and pure, many of Oracle’s
algorithms also become clearer, which makes understanding, detecting, and resolving
contention easier.

Dirty Buffers
A buffer is dirty when it does not match its associated block on disk. Any change to a buffer
makes it dirty because it will no longer match the block on disk. Dirty blocks cannot be
replaced, as this would overwrite an in-memory change that has not yet been written to disk.
Once a database writer2 writes a dirty buffer to disk, the buffer and the on-disk block once
again match, and the buffer becomes free.
A dirty buffer can be unpopular. Suppose a row is updated but no other process is
interested in the buffer. Since the row changed, the buffer is indeed dirty, but it’s also not
popular. Of course, there are popular dirty buffers3 as well. Simply repeatedly updating a row
will ensure its buffer is both dirty and popular.
Figure 6-3 shows dirty buffers having a status of either xcur or write. I will detail
current and consistent mode buffers in the upcoming section on cache buffer chains. The
xcur status means a process holds the buffer in exclusive current mode, and
processes can now update rows in the buffer, although the rows are still subject to other
conditions, such as row-level locking. My tests demonstrate an exclusive mode does not
prevent multiple users from changing multiple rows in the same buffer; it simply signifies the
current mode buffer can be changed. This becomes critical in a Real Application Clusters
(RAC) environment, where there can be multiple shared current mode buffers (mode of
scur), but only one exclusive current mode buffer in the entire RAC database.

2
The database writer is more correctly called the dirty buffer writer, since it writes only dirty buffers from the
buffer cache to disk. But it is nearly always referred to as the database writer (DBWR for short). In fact, if you said
“dirty buffer writer,” you would get some blank stares.
3
Speaking as a buffer, it’s my goal to remain in the cache. Being a popular buffer will increase my likelihood
of remaining in the cache. And being a popular dirty buffer, well, that carries a certain mystique.


SQL> select status, count(*) from v$bh where dirty='Y' group by status;

STATUS COUNT(*)
------- ----------
xcur 5167
write 124

Figure 6-3. The number of dirty buffers can be determined by issuing a simple v$bh-based
query looking at the dirty column.

Pinned Buffers
When a buffer is pinned, it cannot be replaced. Another way to look at pinning is as a kind of
unofficial lock on a buffer. Since a buffer is not a relational structure, standard locking
mechanisms do not apply. Pinning relates to a specific buffer, whereas latches or mutexes can
control access to an entire group of buffers. Pinning can be used in conjunction with latching
and locking to ensure the proper serialization, protection, and concurrency control are
achieved.
Suppose you are a server process that is reading a row in a buffer. It would be extremely
rude of someone to replace that specific buffer with another buffer while you were still
accessing the row. It would be like you were reading a book, and someone saying, “Hey, let
me see that!” and ripping it out of your hands. In fact, placing your hand on a book is a great
picture of pinning a buffer. Whenever there is at least one hand on the book, then no other
process can take the book away. Many processes can pin the same buffer (read the same
book), but as long as one process has the buffer (the book) pinned, it cannot be replaced.
When a free buffer’s row is being queried, its status changes from free, to pinned, to free
again. When a row in a free buffer is changed, its buffer status changes from free, to pinned,
to dirty.
Oracle does not expose pinned buffers through v$bh, but any buffer that is touched has
also been pinned. Oracle will also pin a buffer when it is being moved onto the write list and
while updating its touch count (both discussed later in this chapter).

The Role of Buffer Headers


While buffers do reside in the buffer cache and buffers are indeed changed, list management
acts on buffer headers, not the actual buffers. A buffer header is an optimized in-memory
metastructure that contains information about a buffer and its associated block, but does not
contain block data such as rows.
Figure 6-4 shows the one-to-one relationship between a buffer header, its associated
buffer, and its data block. As illustrated in Figure 6-4, the buffer headers contain, in part,
where a buffer resides in the buffer cache, where the associated block resides on disk, and
when the buffer was last read from disk. If you hear kernel developers talking about a DBA,
they are not referring to you! They are referring to a block’s disk address, which is called the
data block address, or DBA for short.


Figure 6-4. There is a one-to-one relationship between a buffer header (BH 100), its cached
buffer (CB 330), and its on-disk data block (DBA 5,320). List manipulation occurs at the
buffer header level, buffer changes occur in the cached buffer, and block changes occur at the
disk level.

Have you ever wondered why there is no view v$bc for the buffer cache? That’s
because a buffer and a block’s metadata are stored in the buffer header, and it’s the metadata
we usually need for our performance analysis. So the view is named v$bh, for buffer header.
There are three key lists or chains:
• Cache buffers chains (CBCs) are used to quickly determine if an Oracle block resides
in the buffer cache.
• The least recently used (LRU) lists are used to keep popular buffers in the cache and to
find free buffers.
• The write lists contain dirty buffers that will soon be written to disk.
It is important to understand that the buffer headers, not the actual buffers, make up
these three lists. I will provide details about these lists later in this chapter.
Figure 6-5 shows the relationship between a single buffer header and the various lists.
For example, buffer header BH 100 is on LRU 1 and also on CBC 2. Buffer header BH 180 is
on CBC 1 and also on Write List 1. A three-dimensional figure would show that a single
buffer header always resides on a CBC and either an LRU chain or a write list.
While I will detail the list manipulation in the following sections, manipulation of the
three lists occurs at the buffer header level, not at the buffer level, and certainly not at the data
block level. Many of us have been taught that while the buffers reside in the buffer cache, the
buffers themselves are linked. This is incorrect. Each buffer is associated with a buffer header,
and it is the buffer headers that are manipulated in the various lists.


Figure 6-5. Each buffer header resides on a CBC and either an LRU chain or a write list. For
example, buffer header BH 150 resides on both LRU 1 and CBC 3.

Cache Buffer Chains


Put simply, CBCs are used to answer the question, “Is the buffer in the buffer cache, and if so,
where does it reside?” This is essentially a search-type question. Many types of searching
algorithms can be used to get the answer: binary tree, B+ tree, B* tree, sequential search,
hashing algorithm, or some combination. Oracle chose to use a hashing algorithm, followed
by a quick (we hope) sequential search.

Introduction to Hashing
Hashing algorithms can be extremely fast, since the entire structure is typically stored in
memory and requires a single mathematical calculation, along with perhaps a few memory
accesses, to answer the search question. There are many hashing structure variations, but all
consist of a hash function, hash buckets, and hash chains.

Hash Functions
Hash functions take an input and produce an output within a defined range. The input is called
a hash value. Figure 6-6 shows the basic format along with a classic hash function. The x
mod 10 function can easily be used to ensure that, regardless of the positive integer hash
value input, the result will always reside between 0 (zero) and 9. With a hash value input of
11, the output will be 1. This is commonly spoken as, “Eleven is hashed to one.”


Figure 6-6. A hashing function maps the hash value input to a defined range. Part a shows
the general hashing algorithm structure; part b shows a classic hashing algorithm.
Regardless of the positive integer input, the output will always be between 0 and 9.

A good hash function will yield evenly dispersed outputs. For example, regardless of the
hash value, any output is just as likely to occur as any other output. More specifically, if the
output range is between 0 and 9, looking at actual outputs, there are just as many 0 values as
1, 2, 3 … 9 values. Figure 6-7 shows a histogram depicting this near-perfect result.

Figure 6-7. When given random numbers as hash value inputs, even a simple modulus hash
function will provide an even distribution of outputs.

Given 1,000 random numbers between 0 and 1,000, the modulus hash function does in
fact produce the histogram shown in Figure 6-7. But suppose the situation is not quite so
random. Suppose the first 500 numbers are from 0 to 5 and the remaining 500 are between 0
and 1,000. Using the same modulus 10 hash function, the result is the histogram shown in
Figure 6-8. This is unfortunate, because clearly the hash function does not return evenly
dispersed numbers.
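This skew is easy to reproduce. Here is a sketch of the Figure 6-8 experiment in SQL, using dbms_random to generate 500 inputs between 0 and 5 and another 500 between 0 and 1,000, all hashed with mod 10:

select bucket, count(*) hits
from (select mod(trunc(dbms_random.value(0, 6)), 10) bucket
      from dual connect by level <= 500
      union all
      select mod(trunc(dbms_random.value(0, 1001)), 10)
      from dual connect by level <= 500)
group by bucket
order by bucket;

Buckets 0 through 5 each receive roughly 135 hits, while buckets 6 through 9 receive only about 50.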
This is indeed the situation with Oracle! Oracle blocks are not randomly accessed
because SQL is not randomly run from random users doing random tasks (at least, we hope
not). So unless Oracle uses a more advanced hashing algorithm, the hashed results can easily
look like those in Figure 6-8. This forces Oracle to develop a specific hashing function to
transform a Figure 6-8 situation into a Figure 6-7 situation. As you might expect, Oracle’s
buffer hashing algorithm is proprietary information, but I would be willing to wager that
prime numbers are involved somehow.


Figure 6-8. Given nonrandom numbers, a simple modulus hash function is not likely to
produce an even distribution, but more like what this figure shows. This graph is based on
1,000 inputs, the first 500 between 0 and 5, and the remaining 500 between 0 and 1,000.

When Oracle is searching for a buffer, the hash value is created based on a combination
of the block’s file number and block number, which is known as the data block address
(DBA). So the hash function essentially hashes on the buffer’s file number and block number.
It is a very convenient situation and allows for a very quick hash.

Hash Buckets
Hash value inputs are hashed to buckets. Every output value represents a single bucket. For
example, because the modulus 10 hash function shown in Figure 6-6 can produce ten possible
output values, there will be ten hash buckets. If the input value were 9, then it would be
hashed to bucket number nine. It’s that simple.
In many hashing situations, the number of possible hash value inputs exceeds the
number of buckets. The examples I just gave are just like this, since there are only 10 buckets,
yet there are 1,000 hash input values. In regard to Oracle, the number of possible hash value
inputs is the number of Oracle database blocks. But at any one time, the number of hash value
input values will equal the number of buffers in the buffer cache.
When two hash values hash to the same bucket, it is called a collision. Collisions are
common with hashing and can be minimized by increasing the number of buckets, but they can
be disastrous for high-performance applications, such as an Oracle system. As an example,
suppose a modulus 10 hash function had 1,000 possible hash value inputs. This would ensure
collisions. To avoid collisions, a hashing algorithm outputting a perfectly even output
distribution would need 1,000 hash buckets. If the hashing algorithm were not perfect, as none
are, then to reduce the likelihood of collisions, the number of output buckets could also be
increased. To summarize, two ways to decrease collisions are to use a superb hashing
algorithm and to have plenty of hash buckets. If the hashing algorithm is not changeable, then
you may have the option to increase the number of hash buckets.


Hash Chains
Each hash bucket has an associated hash chain. When a searched-for object is hashed to a
bucket, the bucket’s chain is sequentially searched looking for the object. If the object is not
found in the hash chain, we know the object is not anywhere in the entire hashing structure. If
the hash chain is short, the sequential search occurs extremely fast. Even better is a zero chain
length if the object is not in the cache.
Oracle’s CBC structure, as this is called, is a complex memory structure, and Oracle
must maintain serialization control. So it employs one of its serialization structures: a latch or
a mutex. As of Oracle Database 11g, Oracle is still using the latching scheme.
Figure 6-9 is a small example of an Oracle hashing structure consisting of two CBC
latches (CS 800 and CS 810), six CBCs (CBC 00 through CBC 05), and eleven buffer headers
(BH nnnn). This is a very small example, but it does show all the key pieces with their proper
associations. It is common for Oracle CBCs to have a zero length. Oracle does this to help
minimize collisions. As I’ll describe next, Oracle goes to great lengths to minimize collisions,
and only a very unfortunate situation would lead to a buffer chain more than six buffer
headers long.

Figure 6-9. A representation of Oracle CBC structure consisting of two CBC latches, six
cache buffer chains, and eleven buffer headers. Notice both control structures (CS 800 and
CS 810) cover three CBCs, reducing the likelihood of CBC latch contention.

CBCs in Action
Let’s now put this all together and walk through an example. Figure 6-9 will represent the
entire CBC structure. In this example, I will make certain abstractions or simplifications,
focusing on areas that I have already covered and that are useful for performance firefighting.
As you progress through the next chapters, the story will become increasingly detailed,
bringing together all that you’ve learned.
Suppose you are a server process executing a SQL statement. Based on the SQL
statement and Oracle’s data dictionary, you discover there is a row you must access, located in
file number 35 with a block number of 2435.


You hash 35,2435, which hashes to CBC 02. If the hash function and the number of
buckets have not been changed, hashing this block will always point to CBC 02. So, if the
block does exist in the buffer cache, it absolutely must reside in CBC 02. Referencing the
CBC, you realize that you must acquire latch CS 800 in shared mode. You go through the
spinning and sleeping algorithm, burning CPU and sleeping (screaming “latch free cache
buffer chain!”), but finally do acquire latch CS 800.
Now with the latch in hand, you begin traversing CBC 02 in the hope of finding buffer
35,2435. You check buffer header BH 165 and discover it is associated with another buffer,
so you move on to buffer header BH 145. You discover this buffer header is also associated with
another buffer, so you continue. But you hit the end of CBC 02, and therefore you know that
buffer 35,2435 is not in the buffer cache. You release the CBC latch CS 800.
You still need to access the buffer, so you can’t simply give up. While the buffer is not
in memory, it is in a database file on disk (otherwise the data dictionary would not have
shown this block was of interest to you). So you must make a system call to the IO
subsystem requesting your single block. Just before you make the IO request, you ask the
operating system for the time by issuing the gettimeofday system call. While you are
waiting for the operating system to return the block, you are yelling, “db file sequential read!”
When you finally get the block from disk, you again issue a gettimeofday call,
calculate the time difference, and record that into Oracle’s wait interface structures for anyone
to see via wait event views. Now with the block in hand, you still cannot peek inside, as it
must first be appropriately placed in the buffer cache and all the relevant memory structures
must be appropriately changed. How this is performed is what much of the rest of this chapter
is about.
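Incidentally, you can observe this association yourself. Here is a sketch (a sysdba connection is required) that checks whether our example block is cached and identifies the latch protecting its chain; x$bh’s hladdr column joins to v$latch_children.addr:

select hladdr latch_addr, dbarfil, dbablk, state
from x$bh
where dbarfil = 35 and dbablk = 2435;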

How to Wreck CBC Performance


One of the best (and most interesting) ways to learn how to solve performance problems is to
devise ways to create the problem! Referring to Figure 6-9, there are three classic ways to
slow CBC performance:
• When we reduce the number of latches, concurrency on the remaining latches will
increase.
• If we decrease the number of CBCs (for example, remove CBC 04 in Figure 6-9), the
average chain length will grow, increasing concurrency on the remaining chains and
also increasing CBC scan time.
• If buffer cloning becomes intense, then the popular chain will become very long,
increasing both concurrency and CBC scan time.
All three of these situations will increase response time. Let’s take a look at these
unfortunate situations in more detail.

Limiting Concurrency by Decreasing Latches


With a single latch, serialization is ensured, but concurrency is severely limited. All it takes is
one other process requesting the latch while it’s being held for contention to result. In this
example, simply adding one more latch could solve the problem. Scale this out quite a bit by
having hundreds or thousands of processes requiring access to the CBCs, and you can see the
potential exists for serious performance-limiting concurrency issues.


Fortunately, by default, Oracle creates thousands of CBC latches. While Figure 6-9
shows only two latches, real Oracle systems will have at least 10,000 chains and 1,000 latches.
One way to look at the contention possibility is to calculate how many chains each latch
protects. For example, if there are 10,000 chains and 1,000 latches, then each latch ensures
serialization for 10 chains.
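You can compute this ratio on your own system from two hidden instance parameters; here is a sketch (sysdba required, and the parameter names may vary by release):

select a.ksppinm name, b.ksppstvl value
from x$ksppi a, x$ksppcv b
where a.indx = b.indx
and a.ksppinm in ('_db_block_hash_buckets', '_db_block_hash_latches');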
Table 6-1 shows a few samples of the relationship between data buffers, CBCs, and the
CBC latches. Each sample was taken from a real production Oracle system. Only the number
of data buffers was set; that is, all the other values were defaults based on the number of data
buffers. Based on the number of data buffers, Oracle determines the number of hash chains
and the number of latches.

Table 6-1. Relationships between buffers, CBCs, and their latches

Oracle Release        Data        Hash         CBC  Chains per
(Block Size)       Buffers      Chains     Latches   CBC Latch
-------------- ----------- ----------- ----------- -----------
8.1 (4KB)           76,800     153,600       1,024         150
9.2 (8KB)          699,994   1,400,000       8,192         171
9.2 (8KB)       10,163,200  20,326,433   1,048,576          19
9.2 (8KB)        1,316,273   2,632,549      16,384         160
10.1 (8KB)         327,680   1,048,576       4,096         256
10.2 (8KB)       1,203,798   4,194,304      16,384         256
10.2 (16KB)        712,545   2,097,152      65,536          32
10.2 (32KB)        749,486   2,097,152      65,536          32
11.1 (8KB)          32,256      65,536       2,048          32
11.2 (8KB)          63,776     131,072       4,096          32
11.2 (8KB)         125,559     262,144       8,192          32
11.2 (8KB)         149,475     524,288      16,384          32

A number of remarkable items immediately become apparent from Table 6-1:


• There are a lot of hash chains. While I’ll illustrate perhaps five to ten chains when
explaining CBCs, in reality, there are thousands or millions of them! Oracle knows its
hash function is not perfect and will result in some collisions. One way to reduce the
likelihood of collisions is to have a larger number of chains. Your first reaction may be
that more chains will consume more memory, but this is not true. Each buffer header
must be on a chain, regardless of the number of chains and the chain length. With
more chains, the average chain length decreases, while the number of buffer headers
does not change. So, while there is some additional memory overhead for each chain,
the real memory consumer is the number of buffer headers, not simply the number of
chains.

• There are thousands of CBC latches. Years ago, the rule of thumb for the number of
latches was that they should not exceed the number of CPU cores times two.
Obviously, Oracle has changed the rules, because I know for a fact that none of the
Table 6-1 systems has over 500 CPU cores! And don’t forget that the CBC latches
represent only one of the many latches Oracle uses (last count on an Oracle Database
11g system was 382 latches).
• Each CBC latch can cover a couple hundred chains. Once you realize Oracle can
handle multiple CBC latches, you might jump to the conclusion that there will be one
latch for every CBC. Oracle does not consider this necessary and feels comfortable
with each latch being responsible for perhaps over a hundred chains.
• In every case, there are more chains than buffers. Not just by a few, but by two
times or more. If there are more chains than buffers, this means some chains will not
have an associated buffer header, effectively having a chain length of zero. While this
may seem wasteful, it clearly shows that Oracle does not want a process consuming
CPU resources, holding a latch and doing a sequential scan. It also implies the
developers don’t trust their hashing algorithm to not produce collisions. They decided
it costs less (however that was measured—perhaps memory consumption) to increase
the hashing structure size in the hope of saving CPU cycles and improving response
time.
The fact that there are so many CBC latches represents an interesting shift in Oracle’s
thinking. By limiting the number of latches, Oracle can control which processes can run its
kernel code. If there are only ten latches, then Oracle’s thought is that only ten processes
could be on the CPU. This way, Oracle developers thought they could help ensure the CPU
subsystem is not being overwhelmed by Oracle processes. But this policy, and the thinking
behind it, is flawed for two reasons. First, if a process does not have a latch but wants one, the process will be
repeatedly spinning and sleeping. As we know, spinning consumes CPU. So regardless of
whether a process has the latch and is running kernel code or is spinning on a latch, CPU will
be consumed. The only benefit is when sleeping is involved, which will reduce CPU
consumption. Second, when Oracle tries to control this process, it is effectively saying it
knows better how to balance operating system resources. I think Oracle developers learned it
is best to liberally give latches to reduce any contention involved in acquiring the latch, and
then let the operating system deal with the scheduling issues.
This does bring up an interesting point though. Latch contention can be the result of time
spent acquiring the latch, not only time holding the latch. It is sad indeed when significant
resources are used to simply acquire the latch and not actually do the database work we
desire.
I suspect many of you are now anxious to see what the situation is on your system. The
script in Figure 6-10 will produce the raw numbers you need. Because the query is against x$
performance tables, you’ll need to be connected as sysdba. The instance parameters are
fairly self-explanatory, but are detailed in Table 6-2.

col param format a50 heading "Instance Param and Value" word_wrapped
col description format a20 heading "Description" word_wrapped
col dflt format a5 heading "Dflt?" word_wrapped

select rpad(i.ksppinm, 35) || ' = ' || v.ksppstvl param,
       i.ksppdesc description,
       v.ksppstdf dflt
from   x$ksppi  i,
       x$ksppcv v
where  v.indx    = i.indx
and    v.inst_id = i.inst_id
and    i.ksppinm in
       ('db_block_buffers','_db_block_buffers','db_block_size',
        '_db_block_hash_buckets','_db_block_hash_latches'
       )
order by i.ksppinm
/

Figure 6-10. The OSM ipcbc.sql report shows the CBC-related instance parameters. This
script output was the source of the data in Table 6-1.

Table 6-2. CBC-related instance parameters

Instance Parameter        Description
------------------------  ---------------------------------------------
db_block_buffers          If set, the number of buffers you want in the
                          buffer cache (Oracle reserves the right to
                          alter the actual number of buffers)
_db_block_buffers         Actual number of buffers in the buffer cache
db_block_size             Default database block size
_db_block_hash_buckets    Number of CBC buckets, which is also the
                          number of CBCs
_db_block_hash_latches    Number of CBC latches

One of the best and easiest ways to cause CBC latch contention is to create a large buffer
cache so your most active blocks are always cached, and then reduce the number of CBC
latches to, say, one. That should do it, except that Oracle is smart enough (at least starting with
Oracle Database 10g) to not allow the number of CBC latches to drop below 1,024. But even
with 1,024 latches and enough logical IO activity, you’ll see plenty of CBC latch contention.
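
If you want to see this for yourself, the change itself is simple, but treat it strictly as a
throwaway-test-instance exercise. A minimal sketch, assuming an spfile and a sysdba
connection (Oracle treats hidden parameters as suggestions, so it may silently enforce its own
floor):

-- Disposable test instances only: lower the CBC latch count,
-- then restart so the new value takes effect.
alter system set "_db_block_hash_latches"=1024 scope=spfile;
shutdown immediate
startup

Then drive enough logical IO at a well-cached working set and watch the latch: cache
buffers chains waits climb.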

Increasing Chain Scan Time by Decreasing Chains


If the CBCs are long, the time to scan them will become long enough to cause significant
contention. Plus, the time other processes spend contending for the latch will skyrocket.
One obvious way to increase the average chain length is to decrease the number of
chains, which can be achieved by decreasing the number of hash buckets. Simply decrease the
value of the instance parameter _db_block_hash_buckets to, say, 50, and ensure your queried
blocks reside in the buffer cache. You will get massive CBC latch contention very quickly.
But Oracle will resist your efforts by ensuring there are at least 64 buckets, which is still slow
enough to ensure plenty of contention.
Realistically though, one solution to CBC latch contention is to increase the number of
hash buckets. While this will decrease the average chain length, if one particular chain is very
long and popular, this solution will not improve performance. Furthermore, Oracle creates a
large number of chains by default, so it is unlikely that adding chains will create the
performance benefit you seek. But it is a valid approach and should be considered.
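
For reference, increasing the buckets looks like the following sketch. This is a sketch only:
the parameter is hidden, Oracle rounds the value internally, and an instance restart is
required, so check with Oracle Support and your application vendor before touching it on a
production system.

alter system set "_db_block_hash_buckets"=2097152 scope=spfile;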

Increasing Chain Scan Time with Cloned Buffers


Although the problem of long chains is very rare, if it does occur, it is very serious indeed.
Understanding how this could happen not only helps you solve the problem if it should arise,
but also allows a much deeper understanding of the CBCs, latching, undo, and read
consistency. It even touches on RAC systems.
Long chains present a very challenging problem. First, hashing structures are fast
because there is little or no scanning, so a long chain quickly diminishes the benefit of using a
hashing algorithm. Second, a scanning process must possess a CBC latch—and not just any
CBC latch, but the CBC latch protecting the specific chain of interest. A longer chain means
that the latch will be held longer and also that more CPU will be burned while scanning the
list. Plus, since the latch is being held longer, there is an increased likelihood that another
process will be contending for the latch. That contending process is also consuming CPU
while spinning and posting a wait event while sleeping. But the problem is even deeper than
this.
Normally, Oracle’s hashing algorithm, combined with more than twice as many chains
as buffers, results in very, very short chains. The only way for a long chain to occur
would be for multiple buffers to hash to the same chain. Typically, this is not a problem, but it
can occur. And this is the focus of this discussion.
To explain this situation, I need to first discuss the topic of block cloning and then
weave it into hashing. When an Oracle database block is cached, only the single current mode
buffer can be modified. If a row in the buffer needs to be changed, the single current mode
buffer must be available. Current mode buffers are sometimes called CU buffers. On an RAC
system, if the current mode buffer you need resides in another instance, it must be sent to your
instance, and then you can modify the buffer.
Suppose a server process is running a query at time T100. The process accesses the data
dictionary and knows it must access a specific block, so it hashes to the appropriate chain,
acquires the appropriate latch, scans the chain, and finds the current mode buffer’s buffer
header. However, upon examination of the buffer header, it is discovered the current mode
buffer was changed at time T200, which is after the server process started its query. This
means it is possible that the row(s) needed could have been changed after the query started.
Oracle’s default read consistency model demands the information returned will be as it was
when the query started. Therefore, Oracle must take action to ensure the returned information
is correct as of time T100.
Oracle must now either find a copy of the buffer or build a copy of the current mode
buffer so the buffer represents the situation at time T100. A buffer copy is commonly called a
buffer clone. Cloning a buffer is a relatively expensive process. First, a free buffer must be
found, and then the buffer header must be properly connected into the CBC structure and also
the LRU chain structure. As I will detail in the next section on LRU chains, completing this
process requires multiple latches, consumes CPU, and potentially generates IO.

The key to understanding the potentially significant performance impact is in
understanding where the cloned buffer’s buffer header will be placed in the CBC structure.
Because a cloned buffer is a legitimate buffer, it occupies space in the buffer cache, can be
shared, and must be locatable. This means it must be properly placed in the CBC structure.
The “ah ha!” moment occurs when we realize that the cloned buffer’s file number and block
number are the same as its current mode buffer, which means it must be hashed to the same
CBC! Therefore, if a buffer has 50 clones, its associated CBC will be at least 50 buffer
headers long, and possibly even longer if collisions with other buffers have occurred. And
there is nothing Oracle can do about this, because the hashing algorithm is based on the file
number and block number. OraPub’s OSM clone script, clone.sql, displays all buffers
with at least one clone. During a nonpeak time, run this clone script, as shown in Figure 6-11.
An example output snippet is shown in Figure 6-12. Cloned buffers are a common occurrence
in Oracle systems, and in fact are required to satisfy read consistent queries. Still, it is
surprising to see that so many clones are invading your buffer cache.

select file#, block#, count(*)-1 clones
from   v$bh
group  by file#, block#
having count(*) > 1
/

Figure 6-11. This script will show the number of clones for buffers having at least one clone.
It is simple to tell if there are clones because there will be more than one buffer header with the
same file number and block number.

Database: prod16 03-MAY-10 12:09pm


Report: clone.sql OSM by OraPub, Inc. Page 93
Buffer Clone Summary Report

File # Block # STATUS Clones


------- ------- ------- -------
...
3 7099 cr 6
3 9859 cr 6
3 12779 cr 6
1 53657 cr 6
3 1523 cr 6
3 12299 cr 6
9 3755 cr 6

Figure 6-12. Some of the output of OSM clone script, clone.sql. Notice this figure shows
only the last page (page 93), which means there are many buffers with clones, but none with
more than six.

Not only does its free buffer search algorithm favor replacing cloned buffers, but Oracle
tries to limit the number of clones per buffer. Oracle wants the number of clones per buffer to
not exceed the hidden instance parameter _db_block_max_cr_dba, which has a default
value of 6. However, if the cloning situation becomes intense, a buffer can easily have more
than six clones.

But what if file number 9, block number 3755 in Figure 6-12 had 120 clones, rather than
6? If the system were experiencing CBC latch contention along with the 120 clones, we
would be motivated to find out what’s so special about this block and why so many processes
are interested in accessing it.
For a buffer to be cloned, a process must be interested in an earlier version of the buffer.
This implies two actions. First, the buffer must have been changed. Second, it is being
queried. And for the contention to become significant, the buffer must be very popular;
otherwise, only a few clones would be created. Popular buffers are popular for a reason.
Usually, with the information shown in Figure 6-13, a DBA (or someone who knows the
applications well) will know why the buffer is so popular. Because Oracle’s hashing
algorithm is based on the file number and block number, the solutions are frighteningly
limited.

SQL> @objfb 9 3755

Database: prod16 03-MAR-10 12:14pm


Report: objfb.sql OSM by OraPub, Inc. Page 1
Object Details For A Given File #(9) and block #(3755)

File number :9
Block number :3755
Owner :SYS
Segment name :SPECIAL_CASES
Segment type :TABLE
Tablespace :MOTOGP
File name :/u957/oradata/prod16/MOTOGP19.dbf

1 row selected.

Figure 6-13. This OSM script is based on DBA_EXTENTS. For a given file number and block
number, it produces information that can be used in a variety of situations, such as helping to
resolve buffer cloning issues.
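
If you don’t have the OSM script handy, a bare-bones version of the same lookup can be run
directly against DBA_EXTENTS. Here is a sketch using the file and block numbers from the
example above:

select owner, segment_name, segment_type, tablespace_name
from   dba_extents
where  file_id = 9
and    3755 between block_id and block_id + blocks - 1;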

Please remember that having many clones may not present a performance issue. If it is
in fact a performance problem, CBC latching contention will clearly be an issue. If this is the
case and you spot a cloned buffer, then consider the following as possible remedies:
• Fix the application. This is usually what must be done. It’s painful, requires meetings,
can get very personal if the application developer is involved, and usually requires the
application to be modified in some way to reduce the popularity of the single cloned
buffer.
• Move the rows. If you’re lucky, there may be multiple rows that are intensifying the
buffer’s popularity. If possible, distribute the rows, so multiple buffers are now not
quite so popular. While altering the traditional pct_free and pct_used storage
parameters is an option, for increased control, consider setting the maximum number
of rows table blocks can hold. Unexpectedly, there is more involved than simply
issuing an alter table all_status minimize records_per_block
statement (see the sketch following this list). But it works wonderfully! For details,
see a SQL reference guide or do an Internet search.

• Balance the workload. If you have control over the workload intensity, during peak
cloning activity, consider reducing the workload specifically associated with buffer
cloning. While not an exciting technical solution, workload balancing can have
significant positive performance results.
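
To make the records_per_block option above concrete, here is a sketch of the usual
procedure. The table names are hypothetical, and you should rehearse this on a copy of the
data first:

-- Back up the data, then empty the table.
create table all_status_bak as select * from all_status;
delete from all_status;
commit;
-- Seed the table with only the desired maximum rows per block
-- (5 here, assuming the rows are small enough to share one block).
insert into all_status select * from all_status_bak where rownum <= 5;
commit;
-- Record the current maximum rows per block as a permanent limit.
alter table all_status minimize records_per_block;
-- Remove the seed rows and reload; each block now holds at most 5 rows.
delete from all_status;
commit;
insert into all_status select * from all_status_bak;
commit;
drop table all_status_bak;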

CBC Contention Identification and Resolution


A number of solutions can help to eradicate CBC issues. Before you try to resolve CBC latch
issues, ensure they exist. I know this sounds strange and almost condescending, but I find
many DBAs ponder implementing solutions that do not fit the problem. Figure 6-14 is what
you might see if serious CBC latch contention exists.

SQL> @swpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER twice.

Database: prod3 03-MAR-10 10:02pm


Report: swpctx.sql OSM by OraPub, Inc. Page 1
System Event CHANGE Activity By PERCENT

Time Waited % Time Avg Time Wait


Wait Event (sec) Waited Waited (ms) Count(k)
--------------------------------- ----------- ------- ----------- --------
latch: cache buffers chains 10.610 96.28 15.7 1
control file parallel write 0.160 1.45 7.6 0
log file parallel write 0.030 0.27 15.0 0
log file sync 0.000 0.00 0.0 0

Figure 6-14. This is a typical wait event report showing there is clear CBC latch contention.

If you’re on a system earlier than Oracle Database 10g, the top wait event will be
latch free, and as I detailed in Chapter 3, you will need to confirm the latching issue is
the CBC latch. With Oracle Database 10g and later systems, as shown in Figure 6-14, the wait
event will be latch: cache buffers chains. In most cases, the CPU subsystem will
be heavily utilized and overburdened.
Here is a list of possible CBC latch solutions, in no particular order:
• Tune logical IO SQL. The CBC structure becomes stressed when the answer to the
question, “Is the buffer in the buffer cache?” is nearly always “Yes!” If the answer
shifts to “No, it’s not in the cache,” you will notice sequential or scattered reads will
be the issue. So, from an application perspective, look for the SQL that performs the
majority of buffer gets, that is, logical IO activity (a query sketch follows this
list). The SQL statement or statements
will be there—they must exist. Do whatever you can to reduce their logical IO
consumption. This means classic SQL tuning, including indexing, as well as reducing
the execution rate during the performance-critical times.

• Increase CPU power. In most cases, the CPU subsystem will be heavily utilized and
probably the operating system bottleneck. Latch acquisition and the associated
memory management consume a tremendous amount of CPU. Do anything you can
think of to reduce CPU consumption and to increase CPU capacity. Look for processes
that do not or should not be running during peak times. Consider adding CPUs or
using faster CPUs. If you are running in a virtual environment, consider ensuring
this Oracle system has increased CPU resources. However, be aware that unless the
application workload has considerably increased, additional CPU power typically will
be consumed rather quickly. The real solution probably lies elsewhere. Increasing CPU
power may be a good quick fix, but it will probably not truly solve the problem.
• Check for cloning issues. Whenever I encounter CBC latch contention, I check to
ensure cloning is not the issue. This is rarely the case, but if it is, the solution path (as
outlined previously) is very different from all the other solutions.
• Increase CBC latches. This usually brings some relief, but not nearly as much as
tuning the logical IO SQL. The hidden instance parameter
_db_block_hash_latches controls the number of CBC latches. Before you
change this, be aware there may be support issues from both Oracle and your
application vendor.
• Increase CBC buckets. This rarely brings any performance relief because Oracle, by
default, creates a tremendous number of buckets. Unless someone previously
decreased the number of CBC buckets, increasing this parameter will probably have
little effect on performance.
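
For the logical IO query promised above, the sketch below simply ranks shared pool SQL by
buffer gets (sql_id assumes Oracle Database 10g or later; use hash_value on earlier
releases). An AWR or Statspack report will surface the same statements:

select *
from   (select sql_id, buffer_gets, executions,
               substr(sql_text,1,60) sql_text
        from   v$sql
        order by buffer_gets desc)
where  rownum <= 10;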

Least Recently Used Chains


The LRU chains (or LRU lists, as they are sometimes called) have had their associated
algorithms change many times over the years. But while the algorithms have changed, the
functions of the LRU chains remain the same: to help keep popular buffers in the cache and to
help server processes quickly find replaceable buffers. Anytime a single list strives to fulfill
two missions, there will undoubtedly be some compromise. The LRU chains are no different.
But as you’ll discover, Oracle’s current LRU algorithm implementation works wonderfully,
supporting buffer caches over 100GB with the incredibly high transaction rates required for
massive telecom and governmental systems.
Back in Oracle 6 days, there was only a single LRU chain protected by a single LRU
chain latch. On large OLTP systems, DBAs battled LRU chain latch contention. But starting
with Oracle 7, Oracle helped relieve the situation by segmenting the single LRU chain into
many smaller LRU chains, giving each its own LRU chain latch. Every cached buffer is
represented in the CBC structure and is also represented on either one of the LRU chains or
one of the write lists (commonly called a dirty list). Buffers do not reside on both a write list
and an LRU list.
LRU chains are much, much longer than CBCs. A 1GB 8KB buffer cache with 126,126
buffers and 16 LRU chains (the default) means that each LRU chain is, on average, about 7,883
buffer headers long. A smaller system with a 250MB 8KB buffer cache containing 31,563
buffers and 8 LRU chains (by default) means that each LRU is around 3,945 buffer headers
long. There will be some dirty buffers on the write list, but Oracle tries to keep that list very
short. So although the numbers in these examples are not exact, they do present a good picture
of how long LRU chains can be. This is important to keep in mind, because when
diagramming LRU chains, as in Figure 6-15, they are usually drawn with only a few buffer
headers.
It is no problem that dirty buffers reside on an LRU chain. In fact, it would be a massive
performance problem if they couldn’t reside on an LRU chain. One of the objectives of the
LRU chains is to help keep popular buffers in the cache, and many dirty buffers are also very
popular. During a database checkpoint though, every dirty buffer does get written to disk and
becomes free once again.

Figure 6-15. This is an example of Oracle’s LRU chains. There are 2 LRU chain latches and
6 LRU chains, protecting 20 buffer headers. Each buffer header is associated with a single
free, pinned, or dirty buffer.

Figure 6-15 is a good scaled-down diagram of Oracle LRU chains. Production Oracle
systems usually have less than 50 LRU chains; most have between 8 and 16 chains. The
hidden instance parameter _db_block_lru_latches shows how many LRU chains the
instance is using. As with the CBCs, each LRU chain latch controls serialization for a group
of LRU chains. Each of the Figure 6-15 LRU chains is protected by either latch CS 900 or CS
910. Just as with the CBC structure, each LRU chain is entirely made up of buffer headers. In
fact, each of the buffer headers in the Figure 6-15 LRU chains is also linked to one of the
CBCs! Each buffer header is labeled with the letter F, P, or D to signify it is free, pinned, or
dirty. As I mentioned, in most production Oracle systems, each LRU chain links thousands of
buffer headers.
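
You can confirm the latch count on your own system without querying hidden parameters,
because each LRU chain latch appears as a child latch. A simple sketch:

select count(*) lru_chain_latches
from   v$latch_children
where  name = 'cache buffers lru chain';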

LRU Chain Changes Over the Years


The current LRU chain algorithm is called the touch-count algorithm, which uses a count
frequency scheme to place a value on each buffer header. But it has taken Oracle many years
to arrive at this algorithm. Understanding Oracle’s LRU algorithm progression provides an
insight into how the LRU chains work, their shortcomings, and how to ensure they perform as
needed.
When the LRU chains become a performance issue, massive LRU chain latch contention
will occur. From an Oracle algorithmic perspective, the latching issue is usually the result of
server processes holding an LRU chain latch too long while searching for a free buffer. There
are many interconnected reasons for this, as well as solution strategies, which I’ll cover
throughout this section.

Standard LRU Algorithm


When Oracle was first written, the LRU algorithm was extremely simple. Figure 6-16 shows a
single LRU chain with the important standard LRU algorithm elements highlighted.

Figure 6-16. A single LRU chain with the important standard LRU algorithm aspects
highlighted: the least recently used (LRU) end and the most recently used (MRU) end.

Regardless of Oracle’s LRU algorithm, every Oracle LRU chain has a least recently
used (LRU) end. It also has a most recently used (MRU) end. In very general terms, the more
popular buffer headers will reside near the MRU end, and the less popular buffers will reside
near the LRU end.
The standard LRU algorithm is very simple. When a buffer is brought into the cache or
accessed (query or DML), it is placed at the MRU end of the session’s LRU chain (each
session is associated with an LRU chain). The thinking is that a popular buffer will be
repeatedly touched and therefore repeatedly moved to the MRU end of the LRU chain. The
movement to the MRU end of this list is commonly called buffer promotion. If a buffer is not
popular, then as other buffers are promoted or inserted into the LRU, the unpopular buffers
will naturally migrate toward the LRU end of the LRU chain.
Possibly lurking near the LRU end of every LRU chain is a server process seeking a free
unpopular buffer to replace with a block it just read from disk. I call such a server process the
reaper, as it replaces buffers whose goal in life is to live long and gloriously in Oracle’s
buffer cache. For example, suppose the LRU situation is as shown in Figure 6-16 and there is
a server process looking for a free buffer. The server process will start at the LRU end of the
LRU chain and check if buffer header BH 310 is free. If buffer header BH 310 indicates it is
free, it will be replaced (being transported to buffer heaven, we hope) with the just-read
block’s buffer header and moved to the MRU end of the list.
This simple strategy worked well for Oracle until tables grew in size beyond the bonus
and dept tables. The LRU chain in Figure 6-16 is eight buffer headers long. Suppose an
eight-block table was just full-table scanned, requiring every block to be read into Oracle
cache and its buffer headers to be placed into the LRU chain. When this algorithm was used,
there was only a single LRU chain, so the entire LRU chain would be replaced with the full-
table scanned table. So hour after hour of refining the cache to contain the popular buffers
would have just been obliterated. Users would certainly notice a performance change, and the IO
subsystem would also take a beating. As database size continued to increase, Oracle clearly
had to make a change, resulting in the modified LRU algorithm.

Modified LRU Algorithm


Oracle’s well-known modified LRU algorithm was released with Oracle 6. It was touted as a
major accomplishment, and Oracle developers were indeed very proud of their advanced
buffer cache algorithm. After all, it did solve the key standard LRU algorithm problem.
Figure 6-17 shows the key aspects of Oracle’s modified LRU algorithm. The only
difference from the standard LRU algorithm is the creation of a window of just a few buffers
at the LRU end of the LRU chain. The size of this window was only a few buffers (for
example, four) and could be altered by the hidden instance parameter
_small_table_threshold. This ensured that, regardless of how large the table being
full-table scanned was, it would not obliterate a nicely developed cache.

Figure 6-17. Oracle’s modified LRU algorithm created a window of a few buffer headers,
which all full-table scanned (FTS) buffer headers flowed through when read into the buffer
cache. This ensured the more popular buffers residing in the MRU LRU chain area would not
be replaced.

As with all algorithms, there are limitations. But these limitations did not cause a
problem for many years. However, once customers began using Oracle for large data
warehouse applications, two significant problems began to occur:
• Large data warehouses have massive indexes, and when those massive indexes
undergo a large range scan, thousands and thousands of index leaf blocks must be read
into the cache. The problem, until Oracle8i, was if the index leaf block was not in the
buffer cache, Oracle made a single block IO request (think db file sequential
read) to bring the block into the buffer cache. The kicker was that since this was not
a multiblock IO request, the index buffers were inserted into the LRU chain at the
MRU end of the chain, destroying a beautifully developed cache, now completely full
of index leaf blocks!
• When the data blocks were then requested (based on the index leaf blocks), they were
also requested one at a time from the IO subsystem (think db file sequential
read). So once again, these data blocks were placed in the MRU end of the LRU
chain. As Oracle systems increased in size, Oracle’s buffer cache decreased in
usefulness.

At this point, Oracle had big problems brewing. With all the new Oracle features being
considered and yet to be considered, Oracle couldn’t simply add another clause to handle yet
another unfortunate LRU chain situation. It needed a very general, flexible, and high-
performing LRU chain algorithm. The Oracle developers found that in the touch-count
algorithm.

Oracle’s Touch-Count Algorithm


In a simple, unassuming patch for Oracle8i Release 8.1.5, Oracle introduced a completely
altered LRU chain algorithm that has virtually eliminated all LRU chain latch contention
problems. Amazingly, it works without tweaking instance parameters, was slid into the
world’s largest production Oracle systems without any announcement, and was not
documented anywhere in official Oracle documentation. Even scouring Oracle Database 11g
documentation, I could find only references to the LRU chain algorithm as a modified LRU.
I first discovered this algorithmic change while looking at a list of instance parameters
and noticing some new strange ones, like _db_percent_hot_default and
_db_aging_cool_count. I had never seen these parameters before, and as I have
learned, whenever there are new parameters or old parameters suddenly disappear, an
algorithm has been changed. At that time, I had no idea just how significant this change was. I
started checking my typical sources4 and discovered that Oracle had indeed implemented
what is commonly called in the computer science field the count-frequency scheme.
As you might expect, the general approach is to increment a counter every time a buffer
header is touched. The more popular buffer headers will have a higher touch count and are
deemed more popular, and therefore should remain in the buffer cache. While this is generally
true, how it is implemented is quite different. Figure 6-18 shows a diagram of the touch-count
algorithm.

Figure 6-18. Oracle’s touch-count algorithm determines buffer header popularity based on
the number of times it is touched. Notice the FTS window concept is no longer necessary and
has been removed.

There are three key touch-count algorithm aspects: midpoint insertion, touch count
incrementation, and buffer promotion. The following sections look at each of these aspects.

4
My sources related to the touch-count algorithm were an Internet posting by Steve Adams of Ixora, an
internal Oracle source, and hours and hours of testing and experimenting. That was enough to allow me to perform
my research and publish a paper entitled All About Oracle’s Touch-Count Algorithm in 2001. From what I can tell,
James Morle made the first public hint of the touch-count algorithm in his 2000 book, Scaling Oracle8i (Addison-
Wesley).

Midpoint Insertion
The single most radical departure from the modified LRU algorithm is known as midpoint
insertion. Each LRU chain is divided into a hot and cold region. When a buffer is read from
disk and a free buffer has been found, the buffer and buffer header replace the previous buffer
and buffer header contents, and then the buffer header is moved to the LRU chain midpoint.
Single-block read, multiblock read, fast-full scan, or full-table scan—it makes no difference.
The buffer header is not inserted at the MRU end of the LRU chain, but rather at the
midpoint. This ensures the LRU chain is not obliterated by a large influx of a single object’s
blocks being read into the buffer cache.
By default, the hot and cold regions are split down the middle 50/50, with the midpoint
truly at the middle. However, this is configurable through the
_db_percent_hot_default parameter. Table 6-3, later in this chapter, describes the
more significant touch-count instance parameters.
As other buffer headers are inserted into the midpoint or promoted, buffer headers
naturally migrate from the hot region toward the cold region’s LRU chain endpoint. After a
buffer header is inserted, the only way to remain in the cache for a long time is to be
repeatedly promoted. I’ll detail this in the upcoming “Buffer Promotion” section.
Because the window scheme used in the modified LRU algorithm is no longer used, the
hidden instance parameter _small_table_threshold became deprecated. However, in
Oracle Database 11g, it is being used again, but for a different purpose. Starting with this
version, the _small_table_threshold parameter is the threshold for a server process
to start issuing direct reads. Direct reads can increase performance because blocks read from
disk are not placed into the buffer cache, but instead are processed in the server process’s
PGA memory. However, it is a more selfish read and can actually slow performance, because
no other server process will benefit from the IO activity. My tests have shown that Oracle
Database 11g does not always respect changes to this parameter.
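
If you want to see the direct read threshold on your system, the same x$ksppi and
x$ksppcv join used in Figure 6-10 works. Connected as sysdba, a minimal sketch:

select i.ksppinm param, v.ksppstvl value
from   x$ksppi i, x$ksppcv v
where  v.indx = i.indx
and    i.ksppinm = '_small_table_threshold';
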
Let’s walk through an abstracted scenario. I’m leaving out a lot of detail, but this story
will work fine for our purposes. Suppose you are a server process who must query a row,
which resides in a specific block. Based on the SQL statement and the data dictionary, you
know the file and block number. You’re all about speed, so you’re hoping the block already
resides in the buffer cache. To check this, you need to get the buffer’s buffer cache memory
address, which resides in its buffer header.
To find the buffer header, you must access the CBC structure. You hash on the file and
block number, which points you to a hash bucket. Based on this hash bucket, you look up the
associated CBC latch and contend for it. After a few spins, you are able to acquire the latch,
so you begin your sequential CBC search. The first buffer header is not the buffer you’re
interested in, and unfortunately, there is no second buffer header in this cache chain, so you
know the buffer does not currently reside in the buffer cache.
You release the CBC latch and make a call to the operating system, asking for your data
block. While you’re waiting, you’re posting a db file sequential read wait event.
Finally, you receive the block from the operating system and hold it in your PGA memory.
Because you did not make a direct read, before you, or any other server process, can access
the buffer, it must be properly inserted into the buffer cache and all the appropriate
structures must be updated.
You’ll need a free buffer to insert the just-read block into the buffer cache, so you go to
the LRU end of the LRU chain you’re associated with. But before you can start scanning the
LRU chain, you must contend for and acquire the associated LRU chain latch. After
consuming CPU by spinning and posting the wait event latch: cache buffers lru
chain while sleeping, you finally acquire the latch. Starting at the LRU end of your LRU
chain, you ask the buffer header (this would be buffer header BH 310 in Figure 6-18) if it is
an unpopular free buffer, and it replies that unfortunately, it is not all that popular. You now
begin the unpopular buffer replacement process. You immediately pin the buffer header.
From the buffer header, you get the memory address of the block buffer in the buffer cache,
replace the buffer with the just-read block you’re still holding in your PGA memory, and make
any required buffer header modifications. You then manipulate the LRU chain to move the buffer
header to the LRU chain’s midpoint, release the LRU chain latch, and unpin the buffer
header. Now any server or background process, including you, can access the buffer—and all
in a moment’s work.

Touch Count Incrementation


The concept is that every time a buffer header gets touched, its touch count is incremented. In
reality, this is not true. Consider the situation shown in Figure 6-19. The long horizontal axis
represents time, and the tick marks occur every 3 seconds. The arrows are when the buffer
header is touched.

Figure 6-19. By default, a buffer header’s touch count can be incremented only once every 3
seconds. This helps ensure a buffer deemed popular is active more than just a few seconds.

When a buffer is inserted into the buffer cache, its touch count is set to zero. However, if
the buffer is repeatedly touched within a short period of time, there is a good chance the touch
will not count. In Figure 6-19, the time period labeled A will result in the touch count being
incremented only once, as is the case for all the periods diagrammed. Without the 3-second
rule, the buffer header’s touch count would be increased by 14; with the 3-second rule, the
touch count will be increased by only 5. This ensures that for a buffer to be deemed popular, it
can’t just be popular for a few seconds. It must continue to be popular to generate a large
touch count.
Oracle also allows a touch count to be missed. There is no latch involved (which is a
fantastic way to eliminate latch contention), and Oracle does not pin the buffer header.
Without serialization control, two server processes can increment and update a buffer header’s
touch count to the same value!
As an example, suppose that server process S100 gets the buffer header’s touch count at
time T0, which is 13, and it begins the increment to 14. But server process S200 now asks the
buffer its touch count at time T1, and since server process S100 has not completed the
increment, it shows its touch count to still be 13. Server process S200 now begins the
increment from 13 to 14. At time T2, server process S100 changes the buffer header’s touch
count to 14, and at time T3, server process S200 also changes the buffer header’s touch count
to 14. Is it significant that a touch count increment was missed? No structures have been
corrupted, and the touch count has indeed been incremented, but just not twice. In the larger
scheme, if the buffer is indeed popular, it will be touched again. What Oracle has saved with
this fuzzy implementation is CPU consumption and the amount of kernel code being run.

Buffer Promotion
You may have noticed that I have not said that when a buffer gets touched, it is promoted
to the MRU end of the LRU chain. This is because buffer header touching and buffer header
promotion are now two separate actions. When a buffer is considered for promotion, it is also
considered for replacement. While both server processes and database writers can promote
buffer headers, only a server process will replace a buffer and its associated buffer header as a
result of a physical block IO read. It doesn’t make sense for a database writer to perform a
replacement, since it has nothing to replace.
Figure 6-20 is a flowchart of what I’m about to explain. After a server process reads a
data block from disk, it must find an unpopular free buffer to replace with the just-read data
block. The server process acquires the appropriate LRU latch, and then starts scanning buffer
headers at the LRU end of the LRU chain. Remember that the buffer headers reside on the
LRU chains, not the buffers. If the server process bumps into a free buffer header, then it
checks if it is popular. If it is popular, the server process promotes the buffer header and
continues scanning. If the free buffer header is not popular, the server process replaces the
buffer with the block read from disk, updates the buffer header, and moves the buffer header
to the LRU chain midpoint. Notice there is no need to update the CBC structure, because the
buffer has not moved; only the buffer header on the LRU chain has moved. If the server
process bumps into a dirty buffer header, then it checks if it is a popular dirty buffer header. If
the dirty buffer header is popular, it promotes the buffer header and continues scanning. If the
dirty buffer header is not popular, the server process moves the buffer header to the write list.
If the server process bumps into a pinned buffer header, it simply continues with its scan.
Pinned buffers are off limits.
The promotion cutoff is a shockingly low value of 2 (_db_aging_hot_criteria).
So when a server process or a database writer asks, “What’s your touch count?” it’s actually
asking, “Is your touch count greater than or equal to _db_aging_hot_criteria?” As
long as a buffer gets touched every few seconds, it should remain in the cache; if not, it could
be quickly subjected to replacement.
When a popular buffer is promoted, life only gets more difficult. As part of the promotion,
the touch count is reset to zero (_db_aging_stay_count). This occurs unless the buffer
is a segment header or a consistent read (CR) buffer. So while a buffer may feel like a rock
star one second, the next second, it must once again prove its worth to remain in the buffer
cache.
The database writers can also promote popular buffer headers. When a database writer is
inactive, it wakes up every 3 seconds. Each database writer has its own write list (dirty list)
and is also associated with one or more LRU chains. When an LRU chain’s database writer
wakes up, it checks its write list to see if it is long enough to warrant an IO write. If the
database writer decides to build up its write list, it will scan one of its LRU chains looking for
unpopular dirty buffers. Much like the server process looking for free buffers, the database
writer will acquire the associated LRU chain latch, start at the LRU end of the LRU chain,
and check if the buffer header is both dirty and not popular. If an unpopular dirty buffer is
found, the database writer will move the buffer header from the LRU chain to its write list.
(Keep in mind that the buffer header remains in the CBC structure so it can be found by other
processes.) If the write list is still not long enough to warrant a write, then the database writer
continues scanning its LRU chain, looking for more unpopular dirty buffer headers.

Figure 6-20. Flowchart of a server process searching for a free buffer as the result of a
physical IO block read and LRU chain latch acquisition. If a database writer is scanning an
LRU, it is the same, except that an unpopular free buffer is not replaced, but skipped, and
scanning continues.

Hot Region to Cold Region Movement


A buffer header’s life in an LRU chain begins at the midpoint. Because other buffer headers
are being replaced and inserted into the midpoint, as buffers get promoted, a buffer header
naturally migrates toward the LRU end of the LRU chain. As presented in the previous
section, the only way for a buffer header to get promoted is to be identified as popular.
Another significant event occurs when a buffer crosses the midpoint, moving from the hot
region into the cold region.
When a buffer crosses into the cold region, its touch count is reset to a default of 1
(_db_aging_cool_count). This has the effect of cooling a hot buffer, something no
buffer that hopes to remain in the cache wants to occur. Increasing this parameter will
artificially increase buffer value, thereby increasing the likelihood of buffer promotion.5 So,
by default, when a buffer header crosses into the cold region, it must be touched at least once
to match the promotion criteria (_db_aging_hot_criteria).

5
Always remember that when we change an instance parameter, it is truly only a suggestion. If our change
places Oracle in an impossible situation or Oracle deems the change inappropriate in any way, Oracle may simply
ignore our suggestion or choose a value that is close to our suggestion and acceptable to Oracle.

About Touch Count Changes


You may wonder why Oracle resets the touch counter when a buffer header is promoted and
also when it crosses into the cold region. The key to understanding this lies in the midpoint.
While the midpoint defaults to evenly split each LRU chain into hot and cold regions
(_db_percent_hot_default=50), it can be set to anything between 0 and 100, inclusive.
If the LRU chain becomes a 100% hot region, then the only reset will occur when a buffer is
promoted.6 When Oracle releases the ability to create any number of buffer pools (as DB2 has
for many years), the ability to manipulate the midpoint in each pool will allow for highly
optimized and specific LRU activity. So while the double reset may initially seem silly, it
does serve a real purpose and sets the stage for the future.
The touch count reset has significant ramifications. First, this means touch counts will
not skyrocket into infinity. Figure 6-21 is an example of what you may find in your
production system. It may be surprising, but due to the touch count reset during promotion,
buffer headers typically have a touch count of 0, 1, or 2. The touch count reset also means the
most popular buffer headers will not necessarily have the highest touch counts. If you notice a
particular buffer has a low touch count, you may have caught a popular buffer just after it was
promoted or crossed into the cold region. In fact, the highest touch count buffer headers will
reside near the LRU end of the LRU chain!
With all this touch count incrementing, resetting to zero, and touch count inquiry, the
definition of a popular buffer begins to get a little nebulous. Is a popular buffer one that
remains in the cache for 8 hours with an average touch count of 2, or a buffer that remains in
the cache for 5 minutes with an average touch count of 250? Well, Oracle has determined a
popular buffer simply needs a touch count greater than or equal to 1 (by default) when asked.

6
If you have given the keep and recycle pools a lot of thought, you might notice by messing with the midpoint,
you can essentially turn the entire buffer cache into either a keep or recycle pool.

SQL> l
1 select '00 : '||count(*) x from x$bh where tch=0
2 union
3 select '01 : '||count(*) x from x$bh where tch=1
4 union
...
33* select '16 : '||count(*) x from x$bh where tch=16
SQL> /

TCH : Occurrence
--------------------
00 : 65927
01 : 56995
02 : 2159
03 : 8
04 : 330
05 : 7
06 : 2
07 : 3
08 : 1
09 : 1
10 : 0
11 : 0
12 : 0
13 : 4
14 : 1
15 : 1
16 : 2

17 rows selected.

Figure 6-21. Shown are touch counts followed by the number of buffers with that touch count
(the number of occurrences). Notice that the touch counts are much lower than you may
initially think. This is due to the touch counts being reset.
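
Incidentally, the long union in Figure 6-21 can be collapsed into a single aggregate. A sketch,
again run as sysdba:

select tch, count(*) buffers
from   x$bh
group  by tch
order  by tch;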

LRU Chain Contention Identification and Resolution


Out of the box and for 99% of all Oracle implementations, Oracle’s LRU touch-count
algorithm, combined with the default instance parameter settings, enables high-performance
LRU chain activity with negligible contention. When the touch-count algorithm does get
stressed, it is by a unique combination of IO and CPU activity.
The LRU chain latches are named cache buffers lru chain. The hash chain
latches are named cache buffers chains. The names are very close and can lead to
quite a bit of confusion. Just remember that the LRU chain latches have lru in their name,
and you’ll be fine. On pre-Oracle Database 10g systems, the wait event will simply be latch
free, so as I presented in Chapter 3, to determine the specific latch, you will need to
reference the p2 column in v$session_wait or use the v$latch impact calculation
strategy. For Oracle Database 10g and later systems, the wait event will be latch: cache
buffers lru chain.
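
On those pre-Oracle Database 10g systems, the p2 lookup can be done with a simple join while
the waits are occurring. A sketch:

select w.sid, l.name latch_name
from   v$session_wait w, v$latch l
where  w.event = 'latch free'
and    w.p2 = l.latch#;
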
Without physical blocks being read from disk, there will be no LRU chain latch
contention, because there will be no need to find a free buffer or insert a buffer header into an
LRU chain. Database writers looking for unpopular dirty buffers will not stress the LRU chain
structure enough to cause LRU chain latch contention. However, anytime a server process
reads a block from disk, it must find a free buffer, which requires LRU chain activity (the
exception is a direct read). If the IO reads are taking over 10 ms, then we are likely to see
scattered or sequential reads instead of LRU chain latch contention. But if the IO subsystem is
returning buffers in less than 5 ms, the stress can shift toward the CPU subsystem, and this is
when LRU chain activity will begin to be stressed.
LRU chain latch contention could be the result of problems acquiring the latch, holding
the latch too long, or both. If the operating system is CPU-bound, acquiring a latch could take
a long time, because there simply are not enough CPU cycles to go around. Once the latch is
acquired and the LRU chain-related kernel code is run, if CPU cycles are in short supply or
there are limited unpopular free buffers, the LRU chain latches could be held long enough to
cause significant contention.
So, first, there must be intense physical read activity. Second, the IO subsystem response
time is very fast, passing the majority of wait time from read wait events to the LRU chain
latch wait event. This contention requirement combination provides many resolution
options. Here are the options I consider, in no particular order:
• Tune physical IO SQL. As I mentioned, there can be no significant LRU chain latch
contention without physical IO. So, from an application perspective, look for the SQL
statements that perform the majority of block reads, that is, physical IO activity (a
query sketch follows this list). Do
whatever you can to reduce their physical IO consumption. This means classic SQL
tuning, including indexing, as well as reducing the execution rate of the top physical
IO SQL during the performance-critical times.
• Increase CPU power. Just as with CBC latch contention, or any other latch contention
for that matter, if more CPU power is available, memory management will take less
time. This means latch hold time and also latching acquisition time (both spinning and
sleeping) could be reduced. Increasing CPU power also means looking for creative
ways to reduce CPU consumption during peak contention times.
• Increase LRU latches. By default, most Oracle systems have only 8 to 32 LRU chain
latches. LRU concurrency can be increased by adding latches, which means increasing
the hidden parameter _db_block_lru_latches. This can be especially effective
if you have a multigigabyte buffer cache.
• Use multiple buffer pools. A creative strategy to take some of the strain off the main
LRU chains is to implement the keep and recycle pools. All buffer pools can have their
number of LRU chain latches increased. They also use the touch-count algorithm, and
therefore have similar touch count instance parameters, such as
_db_percent_hot_keep.
• Adjust touch count instance parameters. Table 6-3 summarizes the available touch
count parameters. But be warned: the parameters have very small values, like 1 and 2.
Therefore, even a switch from 1 to 2 is a relatively massive change that could easily
have unintended consequences. I would consider adjusting the touch count parameters
only as a last resort and after testing; otherwise, you’ll just be guessing.
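
Here is the physical IO companion to the logical IO query shown earlier, ranking shared pool
SQL by disk reads (again, sql_id assumes Oracle Database 10g or later):

select *
from   (select sql_id, disk_reads, executions,
               substr(sql_text,1,60) sql_text
        from   v$sql
        order by disk_reads desc)
where  rownum <= 10;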

Table 6-3. Touch-count-related instance parameters, excluding keep and recycle pools

Instance Parameter       Default  Description
-----------------------  -------  ----------------------------------------
_db_percent_hot_default  50       The percentage of buffer headers in the
                                  hot region. If you want more headers to
                                  be considered hot, increase this value.
                                  Decreasing this parameter will give
                                  buffer headers more time to be touched
                                  before encountering a server or database
                                  writer process.
_db_aging_touch_time     3        The time window in which a buffer
                                  header's touch count (x$bh.tch) can be
                                  increased by only one. Increasing this
                                  parameter decreases the effect of a
                                  sudden buffer-centered activity burst,
                                  while risking devaluing truly popular
                                  buffers.
_db_aging_hot_criteria   2        The touch count threshold a buffer
                                  header must meet or exceed to be
                                  promoted. If you want to make it more
                                  difficult for a buffer to be promoted,
                                  increase this value. Then only really
                                  hot buffers will remain in the cache.
_db_aging_stay_count     0        The touch count reset value when a
                                  buffer header is promoted. Consistent
                                  read and segment header blocks are
                                  exempt.
_db_aging_cool_count     1        The touch count reset value when a
                                  buffer header crosses from the hot
                                  region into the cold region. Decreasing
                                  this will make it more difficult for a
                                  buffer header to be promoted.
_db_aging_freeze_cr      FALSE    Makes consistent read buffers always
                                  cold, so they are easily replaced.

The Write List and Database Writer


Write lists, commonly called dirty lists or LRU-W lists, are composed entirely of dirty buffer
headers. Each of the dirty buffer headers is also in the CBC structure. Oracle has the concept
of a working set, which consists of an LRU latch, an LRU chain, and a write list. Each
database writer is associated with one or more working sets. Upon instance startup, Oracle
will determine the number of working sets and the number of database writers (init.ora
db_writer_processes, which has a default of 1), and then set their association. When a
database writer writes, it feeds off one of its write lists. Both server processes and database
writers move unpopular dirty buffer headers from their LRU chains to their associated write
lists. Now let’s inject some movement into these facts.

The Database Writer in Action


Multiblock writes are more efficient than single-block writes, so the database writer builds up
a dirty list before it writes. Over the years, Oracle has changed the instance parameters and
algorithm to control the minimum dirty list batch size. In fact, Oracle has even used a self-
adjusting algorithm to account for bursts in dirty buffers. When the database writer is in the
midst of a multiblock write, it will post the wait event db file parallel write.
The v$session_wait parameters are documented to provide the number of blocks
being written, but I have personally not found this accurate. Once again, I turn to operating
system tracing. Figure 6-22 shows the result of tracing an active database writer process.
There are three columns: the pwrite command, the number of bytes written, and the number
of 8KB Oracle blocks written. Your system will look different, but don’t be surprised to see
the database writer submit a write request of more than 100 Oracle 8KB blocks!

[oracle@fourcore local]$ ps -eaf |grep dbw


oracle 10127 1 1 09:35 ? 00:02:01 ora_dbw0_prod16
oracle 10725 9579 0 12:10 pts/1 00:00:00 grep dbw
[oracle@fourcore local]$ strace -p 10127 2>test.txt
[oracle@fourcore local]$ cat test.txt | grep -v 8192 | grep -v 16384 | \
> grep -v NULL | awk '{print $1 " " $6 " " $6/8192}' | head -15
Process to 0
pwrite64(22, 294912 36
pwrite64(22, 73728 9
pwrite64(22, 73728 9
pwrite64(22, 155648 19
pwrite64(22, 122880 15
pwrite64(22, 180224 22
pwrite64(22, 114688 14
pwrite64(22, 139264 17
pwrite64(22, 270336 33
pwrite64(22, 122880 15
pwrite64(22, 114688 14
pwrite64(22, 114688 14
pwrite64(22, 122880 15
pwrite64(22, 122880 15

Figure 6-22. Operating system tracing is the reliable way to see how many Oracle blocks the
database writer is actually submitting for each request. The far-right column is the number of
8KB Oracle blocks written. The largest multiblock database writer write shown in this figure
is 36.

It’s the database writer’s responsibility to write the dirty buffers to disk. Run this simple
query:

select count(*) from v$bh where dirty='Y';

You will notice the count cycling up and down. The cycling is the result of the
application dirtying buffers while the database writer is writing them to disk, making them
free buffers once again. This cycling is normal and what you want to see. If the count keeps
going up and up, then you know the database writer is falling behind. It is common to have
thousands of dirty buffers in a large production Oracle system.
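
Here is a minimal sketch that samples the dirty buffer count ten times, five seconds apart, so you can watch the cycling. It assumes select access to v$bh and execute privilege on dbms_lock:

set serveroutput on
begin
  for i in 1 .. 10 loop
    for r in (select count(*) cnt from v$bh where dirty = 'Y') loop
      dbms_output.put_line(to_char(sysdate,'HH24:MI:SS')||' dirty buffers: '||r.cnt);
    end loop;
    dbms_lock.sleep(5);
  end loop;
end;
/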


Most of us have been taught that the database writers wake up every 3 seconds. I
decided to see if this was true on an Oracle Database 11g system. Figure 6-23 is a snippet of
the results. I started an Oracle Database 11g instance, placed no load on the system, and
operating system traced the database writer. Notice the sleep time is just about 3 seconds and
the system call is semtimedop. If you recall from Chapter 3, when a server process sleeps
during latch acquisition, it’s because it issued a select system call. The select does not
allow the process to be woken, but a semaphore call does. This is an important distinction,
because the database writer may need to be woken up for a variety of reasons, such as a
checkpoint or a free buffer waits event (described a little later in the “Free Buffer
Waits” section). If you were to trace an active database writer process, you would see
frequently occurring write-related system calls, as shown in Figure 6-24.

[oracle@localhost ~]$ ps -eaf|grep dbw


oracle 7204 1 0 Jan18 ? 00:00:30 ora_dbw0_prod5
oracle 9632 2899 0 00:15 pts/2 00:00:00 grep dbw
[oracle@localhost ~]$ strace -rp 7204
Process 7204 attached - interrupt to quit
...
0.000035 semtimedop(229376, 0xbfbd3d24, 1, {3, 0}) = -1 EAGAIN
(Resource temporarily unavailable)
3.000178 gettimeofday({1200719708, 647912}, NULL) = 0
...
0.000034 semtimedop(229376, 0xbfbd3d24, 1, {3, 0}) = -1 EAGAIN
(Resource temporarily unavailable)
3.000584 gettimeofday({1200719711, 650460}, NULL) = 0
...
0.000034 semtimedop(229376, 0xbfbd3d24, 1, {3, 0}) = -1 EAGAIN
(Resource temporarily unavailable)
3.000573 gettimeofday({1200719714, 652996}, NULL) = 0
...

Figure 6-23. This is an operating system trace of an Oracle Database 11g database writer on
a very idle system. Notice the 3-second database writer sleep is induced by a semaphore call.
This allows the database writer to be woken if necessary.


[oracle@localhost ~]$ ps -eaf|grep dbw


oracle 7204 1 0 Jan18 ? 00:00:27 ora_dbw0_prod5
oracle 9504 2899 0 00:08 pts/2 00:00:00 grep dbw
[oracle@localhost ~]$ strace -rp 7204
Process 7204 attached - interrupt to quit
0.028631 pwrite64(23, "&\242\0\0\t\0\300\0\316"..., 8192, 73728) = 8192
0.012923 gettimeofday({1200719162, 877235}, NULL) = 0
0.000109 gettimeofday({1200719162, 877352}, NULL) = 0
0.000114 pwrite64(23, "\2\242\0\0D\260\3"..., 73728, 1443397632) = 73728
0.039851 gettimeofday({1200719162, 917309}, NULL) = 0
0.000108 gettimeofday({1200719162, 917423}, NULL) = 0
0.000111 pwrite64(23, "\2\242\0\0\223\26"..., 73728, 1450336256) = 73728
0.014359 gettimeofday({1200719162, 931889}, NULL) = 0
0.000109 gettimeofday({1200719162, 932003}, NULL) = 0
0.000111 pwrite64(24, "\6\242\0\0\201\0\"..., 65536, 1056768) = 65536
0.002204 gettimeofday({1200719162, 934319}, NULL) = 0
0.000112 gettimeofday({1200719162, 934429}, NULL) = 0
0.000111 pwrite64(24, "\6\242\0\0\212\"..., 32768, 1130496) = 32768
0.001337 gettimeofday({1200719162, 935879}, NULL) = 0
0.000112 gettimeofday({1200719162, 935990}, NULL) = 0
0.000113 pwrite64(24, "\6\242\0\0\205\212"..., 32768, 290496512) = 32768
0.001747 gettimeofday({1200719162, 937843}, NULL) = 0
0.000130 gettimeofday({1200719162, 937980}, NULL) = 0
0.000114 pwrite64(24, "\6\242\0\0\213\212"..., 81920, 290545664) = 81920
0.002944 gettimeofday({1200719162, 941033}, NULL) = 0

Figure 6-24. This is an operating system trace of an Oracle Database 11g database writer on
a very DML-intensive system. No lines were removed between the write system calls
(pwrite64). Notice the database writer made a variety of different-sized writes.

Database Writer-Related Contention Identification and Resolution


There are a variety of database writer-related wait events. One way to categorize the
contention situation is to understand if the database writer is having “push-to-disk” issues or
“pull-from-write-list” issues. Most of the issues are pushing—that is, writing to disk issues.
But there is a very common pull situation that I’ll detail in the next section.
All wait events related to database writer push-to-disk issues start with db file. As
with other IO events, just before and after the IO call, a gettimeofday call is made, and
the difference is what we see through Oracle’s wait interface. Here are the two common
database writer push-to-disk wait events:
• db file parallel write: By far the most common database writer wait
event, a parallel write is simply a multiblock write, like most of the calls shown in
Figure 6-24. This is the result of the database writer feeding off the write list and
writing a batch of dirty blocks to disk. I like to see the wait time below 5 ms, but every
organization has its own budget and service level requirements. A write less than 5 ms
indicates write caching is working well. The write calls in Figure 6-24 range from
around 1.5 ms to nearly 40 ms!
• db file single write: This should never be the top wait event. This can
possibly occur at the end of a checkpoint when all the database file header blocks must
be written. These are done one at a time by multiple single-block writes.
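
To see how much instance-wide time these push-to-disk events have consumed, a simple v$system_event query is enough. This is a minimal sketch; in this view, the time_waited column is expressed in centiseconds and is cumulative since instance startup:

select event, total_waits, time_waited
from v$system_event
where event in ('db file parallel write', 'db file single write')
order by time_waited desc;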


I look at IO issues in terms of requirements and capacity. When there is an IO issue, the
requirements have exceeded the capacity. The only exception is when a locking or blocking
type of issue occurs, such as with a free buffer waits event (discussed in the next
section). When you see a database file write issue, except for locking/blocking reasons, you
know the IO requirements have exceeded the IO subsystem’s capacity. With modern IO
subsystems, I never assume the requirements are due to the database I’m working on, or even
another Oracle database. With complex IO management comes the increased chance a
different system’s files are residing on the same physical disk as my database’s files. So I’m
very careful in my IO subsystem assumptions.
IO issues can become very emotional. Vendors can get involved, and people start
protecting their turf. To help move toward a solution, I frame the issues and solutions using
OraPub’s 3-circle analysis method. From an application perspective, I find the SQL
generating the dirty buffers. You will find one or more update, insert, and/or delete
operations. They must be there, or these wait events would never have reached the top of the
list.
From an Oracle perspective, I start thinking of any instance parameter that may increase
Oracle’s IO writing efficiency. For example, I look for a way to increase the database
writer’s batch write size; as I mentioned, the parameters that control the batch size are
version-specific. Investigate the _db_block_write_batch and the
_db_writer_max_writes parameters. Also consider increasing the instance
parameter _db_writer_max_scan_pct (the default may be 40, for 40%) or
_db_writer_max_scan_cnt, as they determine how many LRU buffer headers a server
process will scan before it signals the database writer to start writing. Increasing these
parameters provides more time for the write list to build up, and therefore results in more
blocks written per database writer IO request. This increase in efficiency can allow more
blocks to be written to disk per second. My tests have shown that by increasing
_db_writer_max_scan_pct from 5 to 95, database writer operating system write calls
decreased by 9% and db file parallel write waits decreased by 3%, while
transaction activity increased by 14% and block changes per second increased by 19%. That’s
amazing, because just by changing this instance parameter, IO activity decreased while the
workload performed increased. This is exactly the type of improvement you want to see when
tuning Oracle!
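
Hidden (underscore) parameters do not appear in v$parameter, but as SYS you can see their current values by joining the x$ksppi and x$ksppcv fixed tables. This is a minimal sketch; these tables are undocumented and release-dependent:

-- Show the database writer-related hidden parameters and their values.
select i.ksppinm parameter, v.ksppstvl value
from x$ksppi i, x$ksppcv v
where i.indx = v.indx
and i.ksppinm like '\_db\_writer%' escape '\';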
Another Oracle-centric possibility is to encourage server processes to not signal the
database writer to write, thereby letting the write queue build up. When a server process, in
search of a free buffer, stumbles across an unpopular dirty buffer and moves it to its
associated write list, it also checks to see if the write list is long enough to be written. If it is
long enough, the server process will signal the database writer to write. So a valid option, to
allow the write queue to build up resulting in a larger batch write, is to increase the
_db_large_dirty_queue (the default is 25 on some systems) instance parameter. But
be careful about creating too large of a write queue. When the dirty buffers are being written
to disk, they are unavailable for change. Any process that needs to change one of the buffers
being written to disk must wait. The associated wait event is write complete waits. It
is unusual for write complete waits to be the top wait event, but if someone has been
altering the write queue length, this could occur.
Finally, from an Oracle perspective, increasing the buffer cache can provide the database
writer some relief from short but intense block change activity bursts. A larger buffer cache
allows the cache to fill with dirty buffers, while not forcing the database to write
unexpectedly, so it has more time to create larger and therefore more efficient batches. If you
really want to stress a database writer, create a small buffer cache and get some DML going.
You’ll see the database writer frantically trying to rid the tiny buffer cache of dirty buffers.
From an operating system perspective, there should be an IO bottleneck. While rare, if write
caching is working wonderfully, with IO write response times less than 5 ms, the total write
time could still be large enough to push, say, db file parallel write to the top. When
this occurs, focusing on the IO subsystem will likely have little substantial effect. In this
situation, focus on reducing the application IO write activity and increasing Oracle write
efficiency. This also means creatively reducing write activity during peak database activity.
Experienced DBAs have seen write-intensive IO activity, like RMAN backups and file
transfers, being performed during normal business hours. Decreasing non-Oracle IO
requirements effectively increases the IO subsystem’s capacity.

Free Buffer Waits


The wait event free buffer waits is closely related to database writer activity and is
usually seen with the wait events db file parallel write and log file
parallel write (log writer writing). Yet the free buffer waits event is unique in
that it takes an interesting combination of database writer, IO subsystem, and server process
activity to manifest.
Key to understanding free buffer waits is the difference between a push-to-disk
issue and a pull-from-LRU-chain issue. When we see the wait event db file parallel
write rise to the top of our reports, the problem is too much time being spent writing, or
pushing, dirty buffers to disk. (The buffers are not actually moved to disk, but copied; once
the block on disk and the buffer in memory match, the buffer is once again free.) A server
process posts a free buffer waits event when it can’t find a free buffer quickly
enough. One reason for this is the database writer has not pulled enough dirty buffers from
the LRU chain and made them free again. So it’s not a push-to-disk issue, but instead a
pull-from-LRU-chain issue. While the difference may seem exaggerated, it has a huge impact
on our solution approach.
If we focus on the pull issue, then we will do whatever it takes to ensure the database
writer does not get too far behind in pulling dirty buffers from the buffer cache. In other
words, our goal is to get the database writers to do more work because the issue is not writing
to disk. If the issue were writing to disk, the top wait event would be db file parallel
write, not free buffer waits.
Let’s walk through another buffer cache scenario to highlight the free buffer wait
situation and also round out the database writer discussion I started in the previous section.
Suppose you are a server process that must query a row, which resides in a specific block.
Based on the SQL statement and the data dictionary, you know the file and block number.
You’re hoping the block already resides in the buffer cache, but to confirm this and acquire its
memory address, you must access the CBC structure. Referring to Figure 6-6 as an example,
you hash on the file and block number, which points you to hash bucket CBC 01 (in Figure 6-9).
Based on this hash bucket, you look up the associated latch (CS 800) and contend for it. After
a few spins, you are able to acquire the latch, so you can begin your sequential CBC search.
However, as Figure 6-9 shows, CBC 01 is empty and contains no buffer headers. Therefore,
you know the block does not reside in the buffer cache. To get the block, you make a read call
to the IO subsystem (for example, readv in Linux), and while waiting, you post a db file
sequential read. Finally, you get the block, stop waiting, and start consuming CPU.
You now have the block in your PGA memory.
You need to find a free buffer to cache the block, so you go to your LRU chain. But
before you can scan your LRU chain, you must acquire the appropriate LRU chain latch. Your
LRU chain is LRU 05 (in Figure 6-15), so you contend for latch CS 910. Finally, with both
your LRU chain latch and the block in your PGA memory, you begin scanning LRU 05,
looking for a free buffer. You first encounter buffer header BH 365 and examine the buffer
header. The good news is the buffer is free, but the bad news is it’s very popular, with a touch
count of 12. So you promote the buffer to the MRU end of the list, and with great satisfaction,
reduce its touch count to 0.
You move on to the next buffer header, BH 330. You examine the buffer header and
discover it is a dirty buffer with a touch count of 1, which does not meet the default popularity
threshold of 2. So, you move the buffer header to the LRU chain’s write list. After you
complete the move, you check the dirty list length to ensure it is less than
_db_large_dirty_queue. The dirty list is only 6, which is less than the common default
of 25, so you do not tell the database writer to write its write list.
Now suppose you have scanned more than _db_writer_max_scan_pct buffer
headers without finding a free buffer. You would be very frustrated. You would have
consumed a lot of CPU and held the LRU chain latch for a relatively long time. Having
crossed this threshold, you stop scanning, release the LRU chain latch, tell the database
writer to free up some buffers, post the wait event free buffer waits, and patiently
wait for 10 ms.
The really interesting thing is that while you are screaming “free buffer wait!” for 10 ms, the
database writer is busy writing dirty buffers to disk, making them free, and then inserting
them back into the LRU end of the LRU chain.
Now that you’ve waited 10 ms, you wake up, acquire the LRU chain latch again, and
start your search at the beginning—at the LRU end of your LRU chain. There is now a very
high likelihood an unpopular free buffer is waiting there for you to replace. And there is! You
pin the buffer header, release the LRU latch, update the buffer header, appropriately move the
buffer header in the CBC structure so other processes can find the block you’re bringing into
the cache, replace the free buffer with the block you read from disk, and unpin the buffer
header.
Notice what led the server process to post the free buffer wait event. First, it performed a
physical IO read, which forced the server process to search for a free buffer. Second, it needed
to scan too many dirty buffers, which means there must be active DML SQL. Third, the
database writer had fallen behind in ensuring there were enough free buffers at the LRU end
of the LRU chain. Each of the three conditions contributed to the situation, which means we
have three solutions to the same problem!
If the top wait event is free buffer waits, focus on the pull, not the push. If you
forget this, you will devise inappropriate solutions.
The operating system could be suffering from a CPU bottleneck, an IO bottleneck, or both (it
can quickly flip between the two), but it is probably an IO bottleneck. In my personal
experience, a free buffer waits event has never been solved by focusing on the
operating system. It has always been a combination of working at the problem from an Oracle
and an application perspective. If there is a CPU bottleneck, look for non-Oracle
CPU-consuming processes.
Here are a few Oracle-centric solutions:


• Increase the buffer cache. If there is memory available, increase the buffer cache.
This will allow more room for buffers, which increases the likelihood of finding a
free buffer.
• Increase database writer pull power. For example, increase the number of
database writers (see the sketch following this list). Do anything you can think of to
help the database writers increase their rate of dirty buffer writing. Unless the buffer
cache is grossly undersized, this is probably your best Oracle-focused solution.
• Increase _db_writer_max_scan_pct. This will give the database writer more
time to flush its write list. This could result in LRU chain latch contention, because
the server process searching for free buffers will be searching more buffers before
giving up and posting a free buffer waits event.
• Decrease write batch size threshold. This will force the database writer to flush the
write list more often, increasing the likelihood of there being a free buffer at the
LRU end of the LRU chain. To decrease the write batch size, decrease the instance
parameter _db_large_dirty_queue. A server process cannot move a dirty
buffer to the write list if the database writer is busy writing the list’s buffers to disk.
If a server process, while looking for a free buffer, tries to move an unpopular dirty
buffer to the being-written dirty list, it will wait, posting a free buffer waits
event. If the write batch size has been increased to combat db file parallel
write issues, it may have been increased too much. This is unusual, but can occur.
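As an example of increasing pull power (the second solution above), the number of database writers is controlled by the db_writer_processes instance parameter. It is not dynamic, so this minimal sketch (assuming an spfile is in use) takes effect only at the next instance restart:

-- Increase the database writers from the default of 1 to 4.
alter system set db_writer_processes = 4 scope = spfile;
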
Here are two application-focused solutions:
• Find and tune physical IO SQL. Without a block read from disk, there would be
little cause for a free buffer waits event. Find the top physical IO SQL.
Usually there are only a few large SQL statements consuming physical IO that
clearly stand out. Tune or reduce their execution rates with the objective of reducing
the amount of physical IO generated.
• Find and reduce DML SQL impact. Since a free buffer waits event is
related to too many dirty buffers in the LRU chain, there must be DML SQL.
Depending on your performance tools, DML SQL can be difficult to find because it
may be a high physical IO, logical IO, execution rate, or CPU consumer. It could be
a sneaky combination of many statistics. If I cannot view the type of SQL
(v$sql.command_type), then I look at both the top physical IO and logical IO SQL, and
then check the statement itself. There is a good chance the DML SQL is also the top
physical IO SQL statement (I see this a lot). If so, you’ve hit the jackpot and
know you have found the key SQL statement. For more details about identifying
DML SQL, refer to Chapter 8 section Application-Focused Solutions for Log File
Parallel Write Contention. The objective is to reduce the impact, so that can mean
tuning and also workload balancing, which will reduce the execution rate during
peak times.


Buffer Busy Waits


Buffers cannot be locked, because they are not relational structures. However, for a number of
reasons, they can be temporarily unavailable. When this occurs, the buffer is deemed busy.
Distilling all the complexity down to its essence, a buffer busy waits event is about
limited concurrency. A process needs to access a buffer, but it can’t because another process
is accessing the buffer and won’t allow concurrent access. There are a number of situations
in which this can occur, and specific solutions for each case.
What makes buffer busy waits events challenging is that there are multiple solid
diagnostic approaches. One of the most common buffer busy diagnostic approaches is using
what is known as the reason code. The reason code consists of three digits, with each digit’s
value revealing part of the story as to why the buffer is busy and cannot be immediately
accessed.7 While many performance specialists use this method, I was never comfortable with
it. Strangely, in Oracle Database 11g, Oracle removed the reason code from the p3 column of
the v$session_wait and v$session views! If you are used to using the reason
code, don’t fret, because the method I will detail works wonderfully.
In Oracle Database 10g, Oracle took one of the most common buffer busy situations and
gave it its own wait event. So what we now see as a read by other session event
used to be part of the buffer busy waits event. While this did cause some confusion,
it’s actually good news, because it makes both wait events more specific. I will also detail the
read by other session event in this section.

The Four-Step Diagnosis


The key to solving a buffer busy type wait (which includes the read by other
session event) is to first gain an understanding of the buffer being waited upon. It’s not
enough to know the top event is buffer busy waits. You will need to gather additional
information before you know the correct solution set. As I’ll detail here, the additional
required information is the buffer’s type and whether it is a header block. With this
information in hand, you know enough to develop a number of solutions to the problem. To
aid in this process, I have developed a four-step diagnostic process:
1. Determine if there is a parameter pattern.
2. Identify the buffer type.
3. Determine the header block.
4. Implement the appropriate solution set.
The following sections detail each of these steps.

Determining If There Is a Parameter Pattern


To start the diagnosis, repeatedly sample the buffer busy waits event’s p1 and p2
parameters. The p1 parameter is the buffer’s file number, and the p2 parameter is the block
number. Normally, when you repeatedly sample the p1 and p2 values, they will change,
indicating different buffers are being waited upon. It is also common to see a smaller subset of
blocks that are related to a specific object, say the employee table. Finally, while rare but
important to notice, the same buffer or two may always seem to appear. These would be
deemed hot blocks. When you repeatedly sample, take note of the parameter values, as they
will be used later in our analysis.
Figure 6-25 shows a single sample query providing the parameter details. In this single
sample, three processes were waiting for file number 4 and block number 54. Another sample
will normally show multiple processes waiting for a different buffer. Once you have a few
samples, move on to step 2.

7
If you are curious about using the reason code, you can find plenty of information available online. Just do an
Internet search on “oracle buffer busy reason code.”

SQL> @swswp buffer%busy

Database: prod16 30-MAR-10 02:22pm


Report: swswp.sql OSM by OraPub, Inc. Page 1
Session Wait Real Time w/Parameters

Sess
ID Wait Event P1 P2 P3
----- ---------------------------- ------------ --------- -----
4391 buffer busy waits 4 54 0
4379 buffer busy waits 4 54 0
4381 buffer busy waits 4 54 0
4405 buffer busy waits 5 10340

4 rows selected.

SQL> l
1 select sid, event,
2 p1, p2, p3
3 from v$session_wait
4 where event like '&input%'
5 and state = 'WAITING'
6* order by event,sid,p1,p2
SQL>

Figure 6-25. This sample v$session_wait-based OSM script swswp.sql displays the
session numbers along with their wait event parameters for a given wait event. In this
situation, we are looking for buffer busy waits parameter values and parameter value
patterns.

Identifying the Buffer Type


Every Oracle block buffer is part of an Oracle segment. Every Oracle segment is a segment
type, such as a data segment, index segment, undo segment, or temporary segment. The
solution set is based, in part, on the segment type. Using the buffer busy wait event’s
p1 (file number) and p2 (block number) gathered in step 1, the segment type can be
determined from a query based on dba_extents and dba_data_files, such as the one
shown in Figure 6-26.
On a large Oracle system with hundreds of thousands of extents, querying
dba_extents can take an unexpectedly long time. This is one reason why I make running
a query like the one in Figure 6-26 a separate, deliberate step. For performance reasons, some
DBAs will periodically make a copy of the dba_extents view and run their report based
on the copy. Unless a real-time view of the extent situation is needed (which isn’t the case in
this example), queries based on the copy can run noticeably faster.

def file_id=&1
def block_id=&2

col a format a77 fold_after

set heading off verify off echo off


set termout off

col tablespace_name new_value ts_name

select ts.name tablespace_name


from v$tablespace ts, v$datafile df
where file# = &file_id
and ts.ts# = df.ts#
/

set termout on

select
'File number :'||&file_id a,
'Block number :'||&block_id a,
'Owner :'||owner a,
'Segment name :'||segment_name a,
'Segment type :'||segment_type a,
'Tablespace :'||e.tablespace_name a,
'File name :'||f.file_name a
from dba_extents e,
dba_data_files f
where e.file_id = f.file_id
and e.file_id = &file_id
and e.block_id <= &block_id
and e.block_id + e.blocks > &block_id
and e.tablespace_name = '&ts_name'
/

Figure 6-26. Taken from the OSM tool kit script objfb.sql, given a block’s file number
and block number, this script produces a useful set of information about the block’s object.

Figure 6-25 shows three sessions are waiting for file 4 and block 54. Executing the script
shown in Figure 6-26 resulted in the output shown in Figure 6-27. This particular buffer (file 4,
block 54) is a data segment and part of the orders table. We don’t yet know if the buffer is
the orders table header block or a data block containing rows. Step 3 will reveal the
answer.


SQL> @objfb 4 54

File number :4
Block number :54
Owner :OE
Segment name :ORDERS
Segment type :TABLE
Tablespace :USERS
File name :/u01/oradata/prod16/OE01.dbf

1 row selected.

Figure 6-27. Given the buffer’s block number and file number, along with accessing
dba_extents and dba_data_files, we can glean other information about the block.
In this case, the buffer having three sessions waiting for access is part of the orders table.

Determining the Header Block


Every Oracle segment has a header block. For example, as shown in Figure 6-28, both table
segments and undo segments have a single header block. The first block of the first extent is
the segment’s header block. Header blocks are different from the remaining object’s blocks
because they contain something special. And that specialness is dependent on the segment
type. This is why we needed to first determine the segment type. Now we will determine if the
block is a header block.

Figure 6-28. Shown are two segment types: an undo segment and a table segment. Each
segment has two extents. Each extent contains eight contiguous blocks, and each segment has
a single segment header block. In Figures 6-25 and 6-27, the orders table block 4,54 was
busy, with three sessions waiting to access.


Figure 6-29 is a simple dba_segments-based query. Each row in dba_segments
contains information about a single Oracle segment. It also contains columns for the header
block’s file number and block number. So a simple query like the one shown in Figure 6-29
will reveal if the busy buffer is in fact a header block. In our example, block 4,54 is not a
header block. Therefore, because the segment type is table, we know the busy buffer contains
rows. In addition to this, when I repeatedly queried the v$session_wait-based script
shown in Figure 6-25, I noticed that different orders table blocks were being waited upon.
This means sessions are waiting to access rows within multiple blocks of the orders table.

SQL> l
1 select *
2 from dba_segments
3 where header_file = 4
4* and header_block = 54
SQL> /

no rows selected

Figure 6-29. This simple dba_segments-based query clearly shows that block 4,54 is not a
header block.

Now that we have identified the buffer busy pattern, the object type, and whether the
block is a header block, we have enough information to proceed directly to a list of possible
solutions.

Implementing the Appropriate Solution Set


The reason why buffer busy waits events give firefighters so much trouble is that
before they can get to an appropriate (and I stress appropriate) solution set, they must first
determine the busy pattern, the block’s segment type, and if the block is a header block. With
this information, we are now ready to jump to a very specific solution set. Solutions for the
most common buffer busy situations are presented in the following sections.

Solutions for a Single Busy Table Block


If the same single buffer is nearly always the busy buffer, then we need to figure out why. In
our example, a group of buffers were busy, including buffer 4,54; that is, not just a single
buffer was nearly always busy. But if there is a single busy buffer, you need to clearly
understand what information is being stored in the block (perhaps query the rows just in that
block) and why the application is so interested in the block.
You can easily determine the waiting SQL statement based on the waiting session’s
v$session.sql_id column. Eventually, you will probably need to talk with application
developers, as situations like these are typically very application-specific. I have found this
situation commonly (but not always) is the result of Oracle sequence numbers not being used.
When asked why Oracle sequence numbers are not being used, you may receive a response
such as, “Well, we want our application to be database vendor independent.” This is another
way of saying, “Well, we decided to write our application at the lowest common denominator
to make it easy to port.”
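
For example, the waiting sessions’ SQL can be pulled with a join between v$session and v$sql. This is a minimal sketch for Oracle Database 10g and later, where the wait event columns appear directly in v$session:

select s.sid, s.sql_id, q.sql_text
from v$session s, v$sql q
where s.sql_id = q.sql_id
and s.sql_child_number = q.child_number
and s.event = 'buffer busy waits'
and s.state = 'WAITING';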


The problem is that this rapid buffer access puts an incredibly high concurrency
requirement on the buffer’s internal structures. When the internal structures are being
changed, the block will be unavailable to other processes, which then post the buffer
busy waits event. Because changing the architecture of the application is usually not a
realistic option (but should be considered), creatively finding ways to spread the popular row
information or popular rows is a solid approach. For example, if each row contains an
application sequence number (like the next check number), then move the rows into their own
blocks, add a large fixed-length column, or set pct_free to a very high value to keep the
number of rows per block to a minimum. This solution is painful, but not as painful as
continual buffer busy waits.
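
As a hedged illustration of spreading the popular rows, you could set pct_free very high and rebuild the table so only a few rows share each block. The table name here is hypothetical, and remember that a pctfree change affects only newly formatted blocks, which is why the move is needed (and any indexes must be rebuilt afterward, as the move leaves them unusable):

-- Leave 90% of each block free, so few rows share a block.
alter table app_counters pctfree 90;
-- Rebuild the existing blocks so the new setting takes effect.
alter table app_counters move;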

Solutions for Multiple Busy Table Blocks


Multiple busy table blocks are by far the most likely buffer busy waits (pre-Oracle
Database 10g) and read by other session (Oracle Database 10g and later) wait event
situation. In this case, each time we check for busy buffers (for example, Figure 6-25), they
are different, the busy buffers are data segments, and they are not header blocks. When this
situation occurs, the cause is usually centered around queries or a mix of DML and queries.
For pure query situations, what is happening is multiple sessions require the same block,
and that same block does not reside in the buffer cache. The first session that makes a call to
the IO subsystem will post either a db file sequential read or db file
scattered read wait event. It would be silly for the other sessions to also issue an IO
call for the same block, so they wait for the first session to complete not only the IO call, but
also for the block to be properly placed into the buffer cache. Then the other sessions can
access the buffer just like any other session. While the other sessions are waiting for the first
session to complete, starting in Oracle Database 10g, the waiting sessions will post a read
by other session wait event; for prior releases, the waiting sessions will post a
buffer busy waits event. So truly, the session is waiting for the read by the other
session to complete!
The solutions are very straightforward and centered around increasing the likelihood of
blocks residing in the buffer cache. If the buffer were in the buffer cache, then this busy buffer
wait situation would not exist. So think of ways to increase the likelihood of finding buffers in
the buffer cache.
Application-focused solutions center on finding the top physical IO SQL and tuning it,
focusing on reducing physical IO. Not only does this solution increase statement performance
due to a decrease in block accesses, but it also reduces the chances of a buffer busy wait and
allows blocks from other objects to be stored in the buffer cache. An Oracle-focused solution
would be to increase the buffer cache to increase the likelihood of the buffer residing in the
cache and therefore negating the need to make an IO call. Another creative, yet very possibly
impractical, Oracle-focused solution is to reduce the database block size. Smaller blocks
combined with a random row access pattern results in a more efficient cache. In other words,
the cache is more likely to contain truly popular rows, not full of rows we don’t care about.
An operating system-focused solution is to decrease IO read response times. The faster a
block can be retrieved, the less time sessions will be waiting.
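
To find the top physical IO SQL, a simple v$sql query is a reasonable starting point. This is a minimal sketch for Oracle Database 10g and later; keep in mind disk_reads is cumulative since the statement was loaded into the library cache:

select *
from (select sql_id, disk_reads, executions
      from v$sql
      order by disk_reads desc)
where rownum <= 10;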
Personally, I seriously consider each option in parallel. I will detail this in the last
chapter of this book, but I predict each option’s expected performance change in terms of
throughput increase and response time decrease. I also include factors such as the difficulty
and skill required, how long it will take to implement, and the uptime and availability issues.


There are also other issues and things to consider that I may know nothing about. Perhaps
there are considerations related to the business or budget. So I don't simply start with the SQL
or with increasing the buffer cache. I gather information and help others gather information.
Then, together, we cooperatively develop the best forward motion plan.
Another common nonheader data block issue is when there is a mix of buffers being
changed and queried. Typically, this involves both DML and query SQL, but DML SQL can
also touch a tremendous number of buffers during filtering. The likely problem is the DML is
updating internal buffer structures while another process wants to query the block contents or
also make internal buffer changes. Remember that these changes are not row changes, but
internal Oracle structure changes. If there is a row or table locking issue, an enqueue wait
(discussed later in this chapter) will be posted. Good strategies center on reducing
concurrency. Consider reducing the block density (moving rows to other blocks or increasing
pct_free), decreasing workload/concurrency activity during peak times, and reducing the
number of buffers being touched by both the DML and query SQL.

Solutions for Table Segment Header Blocks


Figure 6-28 diagrams two segments. The segment on the left is an undo segment, and the
segment on the right is a table segment. This section is specific to table segments, and the next
section is specific to undo segments.
The first block of the table’s first extent is known as the segment header block. As with
all segment header blocks, they contain very special internal Oracle structures directly related
to their type of segment. For table segments, part of the header block contents is information
about the location of blocks that can accept additional inserted rows. These blocks are known
as free blocks. When a process must insert a row into a table, it first retrieves the table’s
segment header block to find a free block, then retrieves that buffer, and then inserts the row. If
there are many processes concurrently inserting rows into the same table, a table segment
header block induced buffer busy waits event can result.


Figure 6-30. A diagram of the orders table segment header block (4,50), which contains free
lists with links to free blocks

Fortunately, the solutions are very straightforward and they work very well. If you are
using manual segment space management, then segment space management is controlled by
what are called free lists. As shown in Figure 6-30, I picture a free list as a jellyfish-like
structure, with each of the stinging strings a list containing nodes, which point to free blocks.
Oracle’s free list approach usually works fine, but in high-concurrency situations, the existing
free lists simply cannot handle the workload. Fortunately, we can easily alter the table to
create additional free lists in another one of the segment’s blocks. The result is a header block
popularity reduction, and hence a reduction in buffer busy wait contention. Simply keep
adding free lists until the contention subsides. The number of free lists can be found in the
freelists column in the dba_segments view. If the column is empty, then we know free
lists are not being used and automatic segment space management (ASSM) is being used.
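Adding free lists is a simple online alter. A minimal sketch, using the orders table from the running example (this applies only to manually managed segments):

-- Add free lists to reduce segment header block contention.
alter table oe.orders storage (freelists 4);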
Another option, which focuses more on the longer term, can be performed during a
maintenance opportunity. This is to move high-concurrency segments into a locally managed
tablespace with ASSM enabled. ASSM does not use free lists, but instead uses bitmaps to
manage free space. While this usually allows for increased table segment header block
concurrency, I have spoken with a few of my students who have unexpectedly experienced
increased buffer busy activity when converting to ASSM. For some reason (and they never
found out), an unknown number of events (for example, workload, Oracle release, application
design) had contributed to stressing ASSM in an unexpected way.
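
A hedged sketch of the longer-term ASSM option follows; the tablespace name, file name, and size are hypothetical, and a moved table’s indexes must be rebuilt afterward:

create tablespace users_assm
  datafile '/u01/oradata/prod16/users_assm01.dbf' size 2g
  extent management local
  segment space management auto;

alter table oe.orders move tablespace users_assm;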

Solutions for Undo Segment Header Blocks


Undo segments are different from table segments because they contain information related to
transaction rollback and read consistency. One of the Oracle structures central for rollback
and read consistency to properly occur is called the transaction table. Simply put, the
transaction table is a map to an undo segment’s contents. Every undo segment contains a
single transaction table, which is located in the undo segment’s header block. When intense
DML, especially when combined with read consistency activity, occurs, the transaction table
can become the point of contention, resulting in buffer busy waits events on an undo
segment header block. Before moving into how to solve this issue, an introduction to the
transaction table is warranted.
By default, every Oracle transaction generates redo (roll forward) and undo (roll back)
information, as well as the actual data change and the possible index changes. The undo
information is stored in an undo segment, and the map of each undo segment is stored in its
transaction table. Each undo segment’s transaction table can hold multiple transaction entries.
From a relational database way of thinking, each entry corresponds to a row in the transaction
table. Having the word table as part of its name is unfortunate, because the transaction table is
not a relational database structure. In fact, the Oracle kernel developers refer to the transaction
table entries as slots.
Because the transaction table is stored in an Oracle block, there is a limit to the number
of slots it can hold, which depends on the Oracle block size. If the transaction table becomes
full and a new entry must be added, the oldest inactive8 transaction entry is overwritten by the
new entry. If the transaction is active and there is no more room in the transaction table or
multiple sessions need to change the transaction table, a buffer busy waits event is
posted. I provide a lot more information about the transaction table and undo entries in the
“TX Enqueue Waits” section later in this chapter.
If the database is not using automatic undo management (AUM), and is therefore using
traditional rollback segments, the solution is simple. Just create an additional rollback
segment, which will also create an additional transaction table, thereby distributing
transaction table activity. You will notice that the buffer busy wait contention subsides. Keep
adding additional rollback segments until buffer busy waits is far from the top wait
event.
Most Oracle systems today take advantage of Oracle AUM capabilities. By default,
Oracle tries to assign only one active transaction per undo segment. If each undo segment has
an active transaction and if there is space available in the undo tablespace, Oracle will
automatically create an additional undo segment. This usually takes care of the buffer busy
waits. However, if there is no more room in the undo segment tablespace, multiple
transactions will be assigned to each undo segment, and eventually undo segment header
contention will result. The solution is to add another database file to your undo segment
tablespace, thus enabling Oracle to create additional undo segments. It’s that simple.
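A minimal sketch, assuming AUM with an undo tablespace named UNDOTBS1 (the file name and size are hypothetical):

-- Give Oracle room to automatically create additional undo segments.
alter tablespace undotbs1
  add datafile '/u01/oradata/prod16/undotbs1_02.dbf' size 2g;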

Solutions for Index Leaf Blocks


Simply put, indexes are ordered structures. This ordering allows indexes to be searched
quickly. Any changes to the index must maintain that order. If order is not
maintained, searching could not occur quickly, and the index would be both worthless and
corrupt. So order in an index must be maintained. This has profound performance
implications.

The Situation
While index ordering enables quick searching, in a high-concurrency insert situation, it can
cause severe performance problems. For example, suppose an index is based on an increasing
sequence number (for example, 1, 2, 3, and so on), which is commonly called a monotonically
increasing value. If a table column contains this sequence number and that column is indexed,

the respective index entries will be placed next to each other in an index leaf block. Why?
Because an index must maintain sorted order, and this index is based on an ascending
sequential number.

8
An inactive transaction has been either committed or rolled back. Therefore, an active transaction is in process
and has not been committed or rolled back.

The problem can occur when many sessions concurrently insert rows. As each insert
statement gets the next sequence number, the index entry will likely be physically placed into
the same index leaf block or, less likely, into an adjacent index leaf block. If the concurrency
increases enough, sessions will need to wait a relatively long time for another session to
complete its leaf block change. When a session waits, it will post a buffer busy
waits event, and the buffer it is waiting on will be an index leaf block. This situation can
become extremely serious and a massive performance killer.
This is actually quite easy to simulate. Simply create a table with a column that will
contain a sequence number. Create an index on the sequence number column, and then create
the sequence with an increasing value of one. Now have multiple sessions (perhaps only ten)
begin concurrently inserting rows into the table. Make sure each of these sessions uses the
next sequence number in its insert statement. Since the sequence numbers will be sequential,
the indexed column contains the sequence number, and the index entries must be stored in
sorted order, it is very likely that nearly all of the concurrent sessions will be making entries
into the same index leaf block! With this load occurring, run any wait event report, or simply
look at the v$session or v$session_wait views, and observe the sessions posting
buffer busy waits events. If you determine the object based on the file number and
block number (p1 and p2), it will be the index! Exciting yes, but very bad for performance.
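
Here is a hedged sketch of that simulation. All object names are hypothetical; run the PL/SQL block from ten or so concurrent sessions:

create table bbw_test (id number, pad varchar2(100));
create index bbw_test_n1 on bbw_test (id);
create sequence bbw_seq increment by 1;

-- Run this concurrently from multiple sessions.
begin
  for i in 1 .. 50000 loop
    insert into bbw_test values (bbw_seq.nextval, rpad('x', 100, 'x'));
  end loop;
  commit;
end;
/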

The Solution
One solution is to have Oracle store the indexed sequence numbers (1, 2, 3, and so on) with
their bytes reversed. From a DBA perspective, the sequence numbers are just as they were
before, but internally their bytes will be reversed. Because index entries must be stored in
sorted order internally, the index entries will probably be placed into different index leaf
blocks, eliminating the buffer busy waits events.
Here’s an example of how Oracle reverses the bytes of each indexed column. To keep the
illustration simple, suppose each sequence number is represented in 4 bits. The first four
sequence values (1, 2, 3, 4) would then be represented as 0001, 0010, 0011, and 0100. If these
four values were used in the index, because index ordering must be maintained, they will be
placed next to each other. However, if the bits were reversed, they would be represented like
this: 1000, 0100, 1100, and 0010. Since they must also be placed into the index in sorted
order, they will likely not be placed into the same index leaf block. In fact, they will probably
span all the index leaf blocks. As a result of key reversal, the buffer busy wait will be
eliminated.
Creating a reverse key index is very simple. Use the same index creation DDL as usual,
but as the following code snippet shows, simply add the reverse keyword at the end. That’s
it!

SQL> create unique index special_cases_rk_u1
  2  on special_cases (object_id) reverse;

Index created.


One final note: Because there will be intense index insertion in every leaf block, to
reduce the frequency of leaf block splitting, consider adding the pctfree 50 storage
parameter when creating reverse key indexes.
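
Combining the two suggestions, the DDL might look like this (reusing the index from the earlier snippet):

SQL> create unique index special_cases_rk_u1
  2  on special_cases (object_id) pctfree 50 reverse;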

The Good News and the Bad News


To give you an idea of the performance impact of using a reverse key index, I created the test
situation I described earlier (in “The Situation” section). Over the 120-second test interval,
using a nonreverse index, 75% of all wait time was related to buffer busy waits events
on an index block with a transaction rate of 0.80 trx/sec. (Each transaction inserted 50,000
rows.) I reran the test with the only difference being the creation of a reverse key index. Over
the 120-second test interval, 28% of the wait time was related to buffer busy waits
events, with a transaction rate of 0.95 trx/sec. That’s great news, because the wait time
decreased and the transaction rate increased!
The bad news is while resolving the buffer busy wait issue, we could have significantly
impacted query performance. Suppose we optimized a query based on a nonreversed sequence
number column index. But suppose because of buffer busy waits, we dropped the nonreversed
key index and created a reversed key index. Now those nicely ordered sequence numbers are
scattered all over the index leaf blocks. That high-performing index range scan may not
perform so well now.9 In fact, the cost-based optimizer (CBO) should recognize this situation
and devise another execution plan strategy. Otherwise, the query could potentially touch
every index leaf block and also a large number of data blocks!
So be aware that while reverse key indexes can increase insert concurrency and
throughput, they can also have a severe negative impact on previously tuned queries. As a
performance analyst, your job will be to find a performance compromise or implement a
creative solution (like partitioning the index or the data) to maximize performance for both
insert and select operations.

Enqueue Waits
Enqueues are used to lock relational and nonrelational Oracle structures in an orderly fashion.
The relational structures could be Oracle data dictionary tables or application tables. For
example, when Oracle updates the sys.col$ table or an application updates its employee
table, enqueues are involved. If a server process is prevented from locking a table, not only
will it post an enqueue wait event, but the lock will show in v$lock, dba_lock,
v$enqueue_statistics, and other views. An example of a nonrelational structure that
is locked to prevent an inappropriate change is a library cache cursor.
As their name implies, enqueues are very orderly and ensure the structures are changed
in a very deterministic manner. Compared to latching, enqueues are very boring and, as I like
to say, mature. There is no likelihood or probability involved with enqueues. A process’s
enqueue request simply gets pushed onto the appropriate queue, and when it’s time to be
processed, its enqueue entry is popped off the queue (known as a dequeue). Not much
excitement here, but then enqueues are not about adventure, they are about ensuring Oracle
structures are changed in a very orderly and accountant-like fashion.

9
In case you’re wondering about my test, the negative query performance did not occur because my test data
volume was not that large. Beware of low data volume tests!


Oracle maintains a shockingly large number of enqueues.10 In Oracle Database 10g
Release 2, I found 208 enqueues; in Oracle Database 11g Release 1, I found 247 enqueues.
But don’t panic, because there are only a handful of enqueues you’re likely to encounter. Plus
if you’re a seasoned DBA, you have already been dealing with row- and table-level locks,
which use enqueues.

Diagnosing Enqueue Waits


When solving enqueue problems, first determine the enqueue type, then determine the SQL
involved, and finally develop a solution based on your knowledge of the application and the
associated Oracle internals. Before I drill down into the most common enqueue wait, the
transaction (TX) enqueue, it’s important to know how to determine which enqueue is being
waited upon and the associated sessions for both pre-Oracle Database 10g and later systems.
Prior to Oracle Database 10g, the wait event for all enqueues was simply enqueue.
This is indeed unfortunate, as it requires the firefighter to sample from either v$lock or
v$session_wait to determine the enqueue name. The SQL to determine the enqueue
name from v$session_wait, shown in Figure 6-31, is particularly unsavory. When run,
the Figure 6-31 SQL provides details for every session waiting for an enqueue. Figure 6-32
shows one such example, where three sessions are involved with a table lock.
In Figure 6-32, session 4388 has the table locked, is not waiting for the lock, and
therefore is not shown. The first session in queue is session 4387, followed by session 4393.
A simple way to determine the SQL being run and hence the table involved is to query the
session’s sql_address or sql_hash_value from v$session. For the TM enqueue,
the table can be easily identified since the p2 column (also known as the ID 1 column)
contains the object_id, which can be referenced in dba_objects. This makes
determining the object of contention very simple. Oracle is constantly adding enqueues, so I
have found the best way to get enqueue identifier details is to perform a very specific Internet
search.

10
For a list of enqueues, just do an Internet search for “oracle enqueue lock names.”


col sid format 9999 heading "Sid"


col enq format a4 heading "Enq."
col edes format a30 heading "Enqueue Name"
col md format a10 heading "Lock Mode" trunc
col p2 format 9999999 heading "ID 1"
col p3 format 9999999 heading "ID 2"

select sid,
chr(bitand(p1,-16777216)/16777215)||
chr(bitand(p1, 16711680)/65535) enq,
decode(
chr(bitand(p1,-16777216)/16777215)||chr(
bitand(p1, 16711680)/65535),
'TX','Row related lock (row lock or ITL)',
'TM','Table related lock',
'TS','Tablespace and Temp Seg related lock',
'TT','Temporary Table',
'ST','Space Mgt (e.g., uet$, fet$)',
'UL','User Defined',
chr(bitand(p1,-16777216)/16777215)||chr(
bitand(p1, 16711680)/65535))
edes,
decode(bitand(p1,65535),1,'Null',2,'Sub-Share',3,'Sub-Exlusive',
4,'Share',5,'Share/Sub-Exclusive',6,'Exclusive','Other') md,
p2,
p3
from v$session_wait
where event = 'enqueue'
and state = 'WAITING'
/

Figure 6-31. The core and working SQL from the OSM swenq.sql script, which provides
enqueue wait details for pre-Oracle Database 10g systems.

Just as with the latching wait event, starting with Oracle Database 10g, each enqueue has
been given its own wait event. This saves us a diagnostic step, as we can determine both the
involved sessions and the enqueue type with a single simple query. Figure 6-33 shows another
locking situation based on an Oracle Database 10g system. In this situation, three sessions are
involved. Session 4393 has the table locked and is therefore not waiting and not shown.
Sessions 4383 and 4388 are waiting to lock the table and are therefore posting a TM enqueue
wait. The table involved can be determined by cross-referencing the P2 column (49911) with
the object_id column in the dba_objects view. And, of course, referencing
v$session will review many interesting details about the session, such as the SQL being
run.
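
For example, using the P2 value from Figure 6-33, the locked table can be identified like this:

select owner, object_name, object_type
from dba_objects
where object_id = 49911;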


SQL> /

Sid Enq. Enqueue Name Lock Mode ID 1 ID 2
----- ---- ------------------------------ ---------- -------- --------
4387 TM Table related lock Exclusive 49911 0
4393 TM Table related lock Sub-Exclus 49911 0

Figure 6-32. Output from the SQL shown in Figure 6-31. Three sessions are involved.
Session 4388 has the table locked and is not waiting and therefore not shown. Next in the
queue is session 4387, followed by session 4393. The table involved has an identifier of
49911, which can be cross-referenced from the object_id column in the dba_objects
view. The ID 1 report column is actually the v$session_wait column p2.

SQL> @swswp enq%

Database: prod16 31-MAR-10 04:32pm


Report: swswp.sql OSM by OraPub, Inc. Page 1
Session Wait Real Time w/Parameters

Sess
ID Wait Event P1 P2 P3
----- ---------------------------- ------------ --------- -----
4383 enq: TM - contention 1414332422 49911 0
4388 enq: TM - contention 1414332422 49911 0

2 rows selected.

SQL> l
1 select sid, event,
2 p1, p2, p3
3 from v$session_wait
4 where event like '&input%'
5 and state = 'WAITING'
6* order by event,sid,p1,p2

Figure 6-33. Shown is a standard v$session_wait-based report (in particular, the OSM
swswp.sql script) with three sessions involved in a TM enqueue wait. In this particular
situation, session 4393 has the table locked and is not waiting on the enqueue. Session 4388 is
next in the queue, followed by session 4383.

TX Enqueue Waits
The TX enqueue wait is arguably the most common enqueue wait. Personally, I think it’s also
the most fascinating. I want to drill down into this wait event because it will give you a much
deeper understanding of how Oracle manages transaction concurrency, which is related to
block cloning, undo, read consistency, and interested transactions lists. I introduced the undo
segment’s transaction table earlier in the chapter. Now we’re going much, much deeper.
While the TX enqueue is known as the row-level lock enqueue, there are actually three
reasons for a TX enqueue to be posted, and only one of those is actually a row-level lock.
Figure 6-34 is very similar to Figure 6-1 shown at the beginning of this chapter. Every Oracle
data block can be abstracted into three areas:


• The row data contains the actual Oracle rows and is just one of the important parts of
every data block.
• The variable data contains transaction metadata (discussed in the next section).
• The amount of free space can be reduced by both row data growth and variable data
growth.

Figure 6-34. An Oracle data block model containing the three main block parts. Notice the
block’s free space can be decreased by growth from both the variable and row data areas.

Introduction to Interested Transaction Lists


Residing in every Oracle data block’s variable data area are structures called interested
transaction lists (ITLs). These structures are largely responsible for Oracle’s row-level lock
and read consistency capabilities. From a highly abstracted perspective, you can think of ITLs
like check boxes, as illustrated in Figure 6-35. Each check box is related to a specific
transaction. If you want to update a row, but that row is locked and already associated with
another transaction’s ITL, you will receive a TX enqueue wait—truly, a row-level lock.

Figure 6-35. A highly abstracted Oracle data block variable data area highlighting the three
interested transaction lists (ITLs). The ITLs are deeply involved with row-level locking; that
is, concurrency control.
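To make this concrete, here is a minimal two-session sketch (the emp table and its values are hypothetical) that produces exactly this kind of TX enqueue wait:

-- Session 1 locks the row and does not commit:
update emp set sal = sal + 100 where empno = 7788;

-- Session 2 updates the same row and simply hangs, posting an
-- enqueue wait (enq: TX - row lock contention in 10g and later):
update emp set sal = sal + 200 where empno = 7788;

-- Session 2 is released the moment session 1 commits or rolls back.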

Each Oracle data block is created with a specific number of ITLs. The initial number of
ITLs is controlled by the table’s initrans space parameter and can be queried from the
dba_tables view’s ini_trans column. Starting with Oracle9i, the default ini_trans


value is 1, yet a simple block dump clearly shows two ITLs are always created. With two
ITLs, a single block can be subjected to two concurrently active transactions.
Suppose a third transaction wants to change an unlocked row in a block with only two
ITLs. The third transaction’s server process will try to dynamically create an additional ITL.
However, the server process must first ensure the maximum number of ITLs (max_trans)
will not be exceeded and also that there is free space available in the block. If the server
process cannot create an additional ITL, it will post a TX enqueue wait, and the process will
patiently wait. To reduce the likelihood of this occurring, both the default value and the
maximum number of ITLs a single block can contain are set to 255. While the value cannot
be exceeded, it can be reduced by issuing a simple alter table command.11
Once an ITL has been created in a data block, pushing deeper into the block’s free
space, the only way to get the space back is to re-create the entire table. Altering the space
parameters will not affect ITLs already created. This is why the default number of ITLs is set
to 1 (again, two are actually created) and the maximum is set to 255. If the block’s
concurrency requires more than a few ITLs, Oracle would rather the space be consumed than
stall a transaction while posting the TX enqueue wait event.
At first glance, the maximum number of 255 ITLs may seem very limited, but consider
this: Think of the highest concurrency table, in the highest concurrency application, in the highest
concurrency database system you administer. Perhaps there is a table that could possibly have
250 concurrent processes updating, deleting, and inserting rows. Now ask yourself how many
processes will be concurrently updating, deleting, or inserting rows into a single block—not
the entire table or extent, but in a single block. This is the number of ITLs that may be
eventually created in a single block. Even with the highest concurrency applications, to have
more than 255 concurrent transactions active in a single block is extremely unlikely. So a
maximum of 255 ITLs is really not that limiting after all. However, if this does present a
problem, you can reduce block concurrency by increasing the table’s pct_free parameter
or perhaps adding a fixed-length column to reduce the number of rows that could possibly exist
in the block (a sketch of both commands follows).
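As a sketch of the commands involved (the orders table is hypothetical, and remember that only newly formatted blocks are affected):

alter table orders pctfree 40;   -- reserve more free space, so fewer rows per block
alter table orders initrans 10;  -- pre-create more ITLs in new blocks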

Undo Segment’s Transaction Table


As mentioned earlier in the undo segment header buffer busy wait section, each undo segment
contains a structure called the transaction table located in its header block. Figure 6-36 is a
diagram of an Oracle transaction table. As noted earlier, Oracle developers refer to its rows as
slots. Every occupied slot is associated with a transaction that is storing or has stored its undo
information in the undo segment. If a transaction has committed or rolled back, it is deemed
an inactive transaction; otherwise, it is deemed an active transaction. Besides containing the
slot number and the transaction status, each slot also contains a sequence number. To
distinguish between different transactions that have used the same slot and to enable slot
reuse, the sequence number can be incremented. The UBA is the undo block address,
providing a direct link to the transaction’s undo. The SCN is the transaction’s system change
number when the associated transaction began.

11 Setting and resetting the initial and maximum number of ITLs can be easily performed by a command such as alter table T1 initrans 10 or alter table T1 maxtrans 10.


Figure 6-36. Diagram of an Oracle transaction table, which contains metadata about its
associated undo segment. Each undo segment’s transaction table consists of slots, sequence
numbers to distinguish transactions that have used the same slot, and the transaction status.
Additional information about the transaction is also stored. Transaction table size is limited
by the database block size.

Transaction tables are relevant to performance analysts because they supply the
transaction number. Every transaction has an associated transaction number, and that number
is based on the transaction’s transaction table entry. The transaction number consists of three
parts concatenated together. The first part is the transaction table number, the second is the
slot number, and finally the associated sequence number. For example, the first transaction
shown in Figure 6-36 has a transaction number of 00100.000.0007. The connection between
ITLs and the transaction table is that every ITL entry is related to a specific transaction and
contained within the ITL entry is the transaction number, such as 00100.000.0007. Now it’s
time to complete our ITL discussion.
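You can see these three parts for any active transaction. As a simple sketch, v$transaction exposes the undo segment (transaction table) number, slot, and sequence, which concatenate into the transaction number (the formatting below is mine, not Oracle’s):

select xidusn, xidslot, xidsqn,
       xidusn || '.' || xidslot || '.' || xidsqn xid
from   v$transaction;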

Deeper into Interested Transaction Lists


Now that I’ve introduced ITLs and undo segment transaction tables, it’s time to bring them
together as a single working unit and present how ITLs change during transaction activity.
This section should leave you with a very good understanding of how Oracle manages
transaction concurrency, how Oracle creates read consistent blocks, and why the feared
“snapshot too old” error occurs.
Figure 6-37 is not a model or a diagram. It is an actual block dump created by issuing
the command alter system dump datafile 1 block 75847. At the time of the
block dump, this block (1,75847) contains many rows and has three active transactions
updating four different rows. The first and third transactions shown are updating one row
each, while the second transaction is updating two rows.
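If you want to reproduce this kind of dump yourself, a minimal sketch follows (the customers table is hypothetical); dbms_rowid translates a rowid into the file and block numbers needed by the dump command:

select dbms_rowid.rowid_relative_fno(rowid) file_no,
       dbms_rowid.rowid_block_number(rowid) block_no
from   customers
where  rownum = 1;

alter system dump datafile 1 block 75847;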


$ cat prod5_ora_21741.trc
...
Block header dump: 0x00412847
Object id on Block? Y
seg/obj: 0xff6b csc: 0x00.50fcb6 itc: 3 flg: O typ: 1 - DATA
fsl: 0 fnx: 0x412848 ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0003.00d.00000318 0x00c3e3d0.0593.0c ---- 1 fsc 0x0000.00000000
0x02 0x0008.01b.00000340 0x00c41bce.0481.24 ---- 2 fsc 0x0000.00000000
0x03 0x0001.000.00000320 0x00c45fa0.0599.0b ---- 1 fsc 0x0000.00000000
...

Figure 6-37. Shown is a data block (1,75847) dumped in the midst of three concurrent
transactions. There are three active transactions updating a total of four rows.

Let’s take a look at some of the ITL entry aspects:


• Itl: This is the transaction’s ITL number.
• Xid: This is the transaction ID, which is a concatenation of the transaction table ID (in
Figure 6-37, 0003), the transaction’s transaction table’s slot number (in Figure 6-37,
00d), and the sequence number (in Figure 6-37, 00000318). The transaction ID is
important, as it is used to ensure seemingly related undo information is truly related.
• Uba: This is the undo block address. This points directly to the transaction’s most
recent change undo and is necessary for rolling back a transaction and also for read
consistency (cloned buffer construction) purposes.
• Flag: The flag is a status of the transaction and can take on many values. The
following are the most common:
• ---- means the transaction is active; that is, DML is occurring and the
transaction has not been committed or rolled back.
• --U- means the transaction has committed, so any row data referencing the ITL
is not involved in an active transaction and is therefore not locked. The
transaction’s row data may not be consolidated. By consolidated I mean, for
example, if a column was updated, both the before and after values can remain in
the row data.
• C--- means the transaction has been committed, row data has been
consolidated, and the ITL entry in the row data has been removed. Any block
touch can (but does not always) trigger the change to this flag value, including a
select statement. I realize this is difficult to believe, so I will demonstrate it
shortly. This seemingly delayed change is commonly referred to as a delayed
block cleanout, or simply a block cleanout.


• Lck: This is the number of rows the transaction has, at some point, locked in this
block. A value greater than 0 does not indicate a row(s) is currently locked. If the
number is 2, as is the case with the second transaction shown in Figure 6-37, the
transaction is related to two rows. The lock value remains until the flag changes to a
C---. This means after a transaction has committed and is no longer deemed active
(flag of --U-), the Lck value can be greater than zero.
• Scn/Fsc: The SCN is the system change number and is used to determine when the
transaction ended (committed or rolled back). Notice in Figure 6-37 that the SCN has
not been assigned, but after the transaction commits, as shown in Figure 6-38, the SCN
is set. The SCN is important when determining if undo retrieval is necessary when
creating a read consistent version of the buffer; that is, creating a buffer clone. The
FSC refers to the free space credit. It is used by uncommitted transactions when an
update or delete operation caused a row to shrink in length. Oracle will preserve this
free space in case the transaction rolls back and the space needs to be refilled. If the
free space was used for something else and then the transaction rolled back, the row
may need to be migrated!
Continuing with the example shown in Figure 6-37, suppose the first two transactions
(ITLs x01 and x02) commit, making their transactions inactive. The third transaction, ITL
x03, has not yet committed. Immediately after the first two transactions commit, the same
block dump command, alter system dump datafile 1 block 75847 is issued,
with the results shown in Figure 6-38. Notice the flag has changed, and an SCN has been
assigned to the transaction.

$ cat prod5_ora_21741.trc
...
Block header dump: 0x00412847
Object id on Block? Y
seg/obj: 0xff6b csc: 0x00.50fcb6 itc: 3 flg: O typ: 1 - DATA
fsl: 0 fnx: 0x412848 ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0003.00d.00000318 0x00c3e3d0.0593.0c --U- 1 fsc 0x0000.0050fd6f
0x02 0x0008.01b.00000340 0x00c41bce.0481.24 --U- 2 fsc 0x0000.0050fd6b
0x03 0x0001.000.00000320 0x00c45fa0.0599.0b ---- 1 fsc 0x0000.00000000
...

Figure 6-38. Shown is a data block (1,75847) dumped immediately after the first two
transactions committed and the third transaction remaining active. Notice the committed
inactive transaction flags have changed from ---- to --U-.

The two flags ---- and --U- are needed because a row involved in an active
transaction or even a past active transaction can have a valid ITL entry in its row data. So
simply referencing the row data and seeing an ITL entry does not imply the row is currently
involved in an active transaction and is therefore locked. To check if the row is locked, a
server process must get the ITL reference from the row data and then check the flag in the
block’s variable ITL area. If the flag is ----, then a server process knows the row is indeed
involved in an active transaction and is locked. However, if the flag is --U-, a server process


knows the row is not locked. Part of the block cleanout process is removing nonactive
transaction ITL entries from the row data, changing the respective ITL entry flags in the
block’s variable portion to a status of C---, and consolidating the row data.
This is a brilliant strategy, because Oracle can quickly make the minimal amount of
changes necessary to record a change in a block, yet still maintain concurrency control at the
row level. The final block changes will eventually need to be made, but perhaps that will occur
at a less intense workload period, such as after a benchmark has completed.
To see a wonderful example of this, Figure 6-39 shows the inactive transactions’ flag
change as a direct result of issuing a simple select statement that touches block 1,75847.

$ cat prod5_ora_21741.trc
...
Block header dump: 0x00412847
Object id on Block? Y
seg/obj: 0xff6b csc: 0x00.510047 itc: 3 flg: O typ: 1 - DATA
fsl: 0 fnx: 0x412848 ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0003.00d.00000318 0x00c3e3d0.0593.0c C--- 0 scn 0x0000.0050fd6f
0x02 0x0008.01b.00000340 0x00c41bce.0481.24 C--- 0 scn 0x0000.0050fd6b
0x03 0x0001.000.00000320 0x00c45fa0.0599.0b ---- 1 fsc 0x0000.00000000
...

Figure 6-39. Shown is a data block (1,75847) dumped immediately after a simple select
statement touched the block. Notice the transaction flags have changed from --U- to C---,
indicating a block cleanout has occurred.

Now I realize this is interesting and all that, but I also understand some readers may
think this block dump and ITL stuff is not all that relevant. But I beg to differ. Not only have
you gained a fuller understanding of the TX enqueue, but you have also clearly seen how
Oracle implements its patented row-level locking scheme, In the next section, you will see
how ITLs are involved with the “snapshot too old” error and with buffer cloning!

Deeper into Buffer Cloning


Earlier in this chapter, I introduced block cloning as it relates to CBC latch contention. Now I
will delve deeper into exactly how Oracle does this using ITLs, undo blocks, SCNs, and other
interesting Oracle tidbits.
If you recall, when a server process locates a desired buffer and discovers a required row
has changed since its query began, it must create a back-in-time image of the buffer. This is
known as a consistent read (CR) buffer of the current (CU) buffer. Once the buffer is copied,
the appropriate undo is applied, bringing the copied buffer back in time until the CR buffer
has been properly cloned. Now let’s step through this process.
Figure 6-40 is the basis for this example. Suppose our query begins at SCN time 12330.
At SCN time 12355, our query finally gets around to accessing buffer 7,678. However, we
notice there is some ITL activity. Transaction 7.3.8 is currently active, and the buffer may
have been changed since our query began. Transaction 5.2.6 is no longer active (flag of C---,
an SCN assigned, and Lck of 0), but its changes were committed after our query started
and are reflected in this CU buffer. Both of these block changes mean the CU buffer has been


changed after our query began at SCN time 12330 and cannot be used as is in our query. We
need a version consistent copy, which is one taken back in time, at time 12330. Therefore, the
CU buffer 7,678 must be cloned and undo applied, creating an SCN time 12330 CR buffer.

Figure 6-40. A diagram of a single data block with its three areas and three associated undo
blocks with their relationships shown. Two transactions are active, changing three rows.

Before the cloning can occur, an unpopular free buffer must be found and then replaced
with a copy of the 7,678 CU buffer. Our server process will acquire the LRU chain latch associated with
its LRU and begin scanning, starting at the LRU end of the LRU chain, looking for an
unpopular free buffer. Eventually, it will find an unpopular free buffer and replace it with a
copy of CU buffer 7,678. And, of course, the CBC structure will need to be updated to reflect
the cloned buffer’s location in the buffer cache.
We begin with the first ITL that is associated with active transaction 7.3.8. Our server
process needs to retrieve any undo that has occurred after our query started at SCN time
12330. Transaction 7.3.8’s most recent undo can be found by following its ITL’s undo block
address (UBA) link to undo block 2,45. Our server process must then access undo block 2,45,
which requires CBC activity and could also require an IO call along with LRU activity. Once
we have access to undo buffer 2,45, we check to ensure we are working with the correct
transaction by comparing transaction numbers. Both the data block and the undo block
transaction numbers do match (7.3.8), so we’re okay. Because the transaction is active, the
undo information should not have been overwritten.


Still working with undo block 2,45, the SCN is 12348, which means the undo represents
block changes that occurred after our query started at SCN time 12330. Therefore, we apply
the undo to our cloned CR buffer, inching it a little further back in time.
Undo block 2,45 also has a link to another undo block—2,90. This is referred to as the
chain of undo and could possibly go on for quite some time, consuming significant computing
resources. Our server process must now access undo block 2,90 (which requires CBC activity
and could require an IO call along with LRU activity), and again compares transaction
numbers to ensure they match. They do match, and now we examine the SCN. The SCN is
12320, which is before our query started at time 12330. Therefore, we do not apply the undo.
If we did apply the undo, our CR buffer would represent a version of block 7,678 at time
12320, which is too early!
We now start working on the second ITL, which is associated with transaction 5.2.6.
This transaction committed at SCN time 12350, which is after our query began so we need to
apply its undo. From the ITL entry, we get the undo block address of 2,70 and access the undo
block. The transaction numbers are now compared. Since the transaction has committed, the
undo is not guaranteed to be protected. Increasing the undo retention setting keeps the undo
around longer, but there is no guarantee.
Suppose another server process overwrote its transaction undo in undo block 2,70. If this
occurred, that server process’s transaction number would be recorded, and it would not be
5.2.6. Upon transaction number comparison, we would notice this difference and immediately
know the undo in block 2,70 should not be applied to our CR buffer. At this point, our server
process would post a “snapshot too old” error and stop our query. Clearly, the undo block
snapshot was too old because it was overwritten by another process.
Fortunately, the transaction numbers do match! The undo in undo block 2,70 was
changed at SCN time 12340, which is after our query started, so we apply the undo to our CR
buffer. The next undo link is empty, so there is no more undo to consider applying.
Referring back to our ITL entries, there are no more ITLs to consider, so we have
completed our block clone. Any server process can now access CR buffer 7,678 consistent at
time 12330.
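If you are curious about cloning activity on your own system, here is a hedged sketch using v$bh; blocks with more than one current or consistent read copy in the cache are exactly the blocks being cloned:

select file#, block#, count(*) copies
from   v$bh
where  status in ('xcur','cr')
group  by file#, block#
having count(*) > 1
order  by copies desc;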
It should now be obvious why ITLs are so important and that Oracle’s read-consistent
model, while extremely powerful, necessary, and efficient, is still relatively expensive
because it has the potential to consume a lot of CPU and IO, slowing application response
time. Oracle is very aware of this, and starting with Oracle Database 10g Release 1, it began
using in-memory optimized structures to temporarily store undo information. These objects
are not segments and are therefore not subjected to segment-related overhead like CBC and
LRU chain activity. In-memory undo is stored in the shared pool, which is covered in Chapter 7.

Summary
This brings us to the end of the buffer cache internals chapter. This is truly a fascinating topic,
complete with interesting internal structures, latches, queues, high concurrency, in-memory
objects, and ITLs! My hope is that I went deep enough into the internals to clearly explain the
key Oracle buffer cache structures and their associated wait events, and then provide multiple
practical solutions. I also hope I didn’t go too deep and cross the border into interesting, yet
not all that useful. Finally, I hope Oracle concurrency management is not such a mystery and,
in your mind, fits together a whole lot better. Now it’s time to turn our attention to Oracle’s
shared pool cache.


CHAPTER 7

Oracle Shared Pool Internals

It’s much more than sharing SQL. When the shared pool was introduced in Oracle 7, it was
the answer to a particularly vexing problem of the continual reparsing of SQL statements,
along with all their related memory management. But while the initial focus of the shared
pool was in sharing SQL statements and programmatic structures, it has grown into a
behemoth cache full of every imaginable type of memory structure, complete with a wide
variety of requirements from size, to access patterns, to cache duration.
Many new Oracle features require new memory structures, which are commonly cached
in the shared pool. This has created an incredibly difficult technical problem for Oracle
software architects. Not only are the existing memory structures difficult to manage together,
but with the constant addition of new memory structures, the requirements continue to
change. This forces memory management optimization to remain flexible while at the same
time being pushed, we hope, toward optimal performance. But regardless of the challenge and
the claims, as they say, it is what it is, and that is what we, as performance analysts, have to
deal with.
My objective in this chapter is to clearly present the relevant architectural components of
the shared pool, the performance challenges we face, and how we can alter the situation in our
favor to improve performance. Many of the architectural concepts—such as hashing, hash
chains, latches, and mutexes—have already been covered in earlier chapters, so let’s dive
right into the shared pool.


Problems in the Shared Pool


There are many problems—some would say challenges—associated with the shared pool. The
constant inclusion of new memory structures, new requirements of all kinds, and highly
dynamic memory management contribute to the situation.
The following aspects make the shared pool challenging:
• Intimate relationships: Shared pool objects are related and can be intimately
connected with each other. For example, a single table column may be related to
literally thousands of SQL statements and programmatic constructs (functions,
procedures, and packages). If that one column is altered, that single change can ripple
(cascade) throughout all its relationships. The result could be the invalidation of every
associated object, which would force recompilation and rebuilding of cursors, and that
could require massive latching, pinning, and locking of resources. As you can see,
these interconnected and dynamic objects present a very difficult problem.
• Memory fragmentation: Oracle has made absolutely amazing progress in this area.
Memory fragmentation used to be one of the most common performance problems. But
Oracle has taken a number of steps in both architecture and instance parameter
flexibility, giving us ways to influence Oracle to better adjust its behavior to a
particular application’s character. Throughout this chapter, notice how memory
management has evolved, and how we can alter and influence Oracle’s shared pool
memory management.
• Pinning and locking: Similar to the situation with Oracle buffer headers, the central
shared pool object—the cursor—can be pinned and locked. Pinning ensures memory
is not improperly deallocated and can also be used to ensure serialization. Locking is
used to prevent an inappropriate change. They can be used together. For example,
during a procedure compilation, the related cursor can be pinned to ensure the memory
is not deallocated and locked to ensure no other process can inappropriately change the
related objects. And, of course, latching can be involved to ensure serialization.
• Latching: As you know by now, latching is all about serialization control. Because the
shared pool contains memory structures, it relies on latches and mutexes for
serialization. If it were about relational structures, there would be enqueues, which are
focused on table and row locks. While shared pool-related latching has traditionally
infused fear into most DBAs, the two core latches (library cache and shared pool) are
used for very different purposes, and Oracle’s wait interface clearly differentiates the
two. This allows the performance specialist to know exactly what is occurring,
resulting in a number of spot-on solution possibilities.
Fortunately for us, the wait interface will clearly tell us if the problem is latching,
pinning, or locking. So while all of these may be involved, we will be able to prioritize their
impact and develop appropriate solutions. During normal operations, pinning and locking are
rarely a significant problem. When they are the problem, most DBAs will recognize their
timing corresponds to some administrative tasks, such as altering objects, forcing
recompilation, applying patching, or performing various upgrades.
I would say where most DBAs mess up with shared pool latch contention is in not clearly
differentiating the latch situation, and therefore essentially making guesses at solutions. The
library cache latch is related to finding objects in the shared pool. The shared pool latch is


about memory management. Just keeping these two concepts clear will have a dramatic effect
on your shared pool optimization success.
While nearly all shared pool problems are highly connected, try your best to isolate the
problems and apply solutions to each of them. Our solutions become muddy and watered-
down if we get sloppy and don’t separate the problems. A solid understanding of the core
shared pool architecture and Oracle’s wait interface will allow you to do a terrific job at
resolving problems.

What’s in the Shared Pool?


I tell my students that if presented with a multiple-choice exam question about where a
memory object resides in Oracle, and you’re not sure, always guess it’s in the shared pool.
Oracle’s shared pool is packed full of all types of memory structures—everything from shared
SQL to undo related structures.1 If that’s not scary enough, consider this simple query:

select name from v$sgastat where pool='shared pool';

It returns 35 rows in 10.1, 593 rows in 10.2, 729 rows in 11.1, and a staggering 857 in 11.2.2
While there are hundreds of objects in the shared pool, as performance specialists, we
must have a solid understanding of the following key objects (or areas):
• SQL (cursors): These statements contain the essence of the shared pool. In this
chapter, I will provide details about cursors, but for now, think of a cursor as a chunk
of memory connected to many other memory chunks containing information key to the
proper functioning of the SQL statement.
• PL/SQL (cursors): These programmatic structures—such as functions, procedures,
and packages—are also cached and therefore require shared pool memory. PL/SQL
structures can consume a tremendous amount of shared pool memory.
• Library cache: This is used to locate objects in the shared pool and to ensure their
proper relationship is maintained. While the library cache can be represented as a
hashing structure (complete with buckets and chains), it is actually much more
complicated than this. Efficient and unfettered library cache operations are central to a
well-performing Oracle database system.

1 I suspect the trend will continue. As shared pool memory management continues to improve, Oracle will be more confident in placing more and more distinctly different objects into this single large cache.
2 For some math fun, go to http://www.wolframalpha.com and enter fit {35,593,729,857}.


• Row cache: Commonly called the dictionary cache, tuning of this cache was
automated starting in Oracle 7. The row cache caches Oracle’s data dictionary rows
(not blocks). Objects like sys.tab$, sys.col$, and sys.auth$ are cached
within the shared pool’s row cache, rather than in the buffer cache. While Oracle must
retrieve an entire block to retrieve a single dictionary table row, the result is significant
memory savings and a highly efficient cache. In fact, the row cache can be monitored
through the v$rowcache view (a sample query follows this list) and typically has an
extremely high cache hit ratio of
around 99%. While DBAs were initially concerned about Oracle removing manual
optimization control, Oracle has done a fantastic job at dictionary object cache
management.
• In-memory undo (IMU): This was introduced in Oracle Database 10g Release 1. It
increases DML and read-consistent-intensive operational performance by keeping as
much undo as possible using in-memory objects, instead of traditional undo segments.
My tests have shown a 21% CPU consumption reduction when using IMU with a read-
consistent-intensive load.
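Regarding the row cache monitoring mentioned in the list above, here is a minimal sketch of the classic dictionary cache hit ratio calculation:

select 1 - sum(getmisses) / sum(gets) row_cache_hit_ratio
from   v$rowcache
where  gets > 0;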

The Oracle Cursor


The cursor is a fundamental object that is a complete executable representation of a SQL
statement or a PL/SQL programmatic construct, which can be used and reused by any
authorized session. Cursors must be created, locatable (that is, found by a search), destroyed
(deallocated), invalidated, and reloaded. If any part of the cursor is not in the shared pool and
is needed for any reason, that aspect must be reloaded, which does slow performance.
Developers typically have a good understanding of cursors because they have needed to
specifically create, open, execute, fetch, and close cursors. DBAs commonly look at cursors
as simply memory chunks related to SQL. However, this simplistic view inhibits our ability to
create solutions to cursor-related performance problems. So, if you take the time to better
understand cursors, you’ll notice your performance solution options will significantly increase.

Parent and Child Cursors


The term cursor is in itself an abstraction and is used to refer to shared information (located in
a shared SQL area), private information (located in a session’s PGA), and the library cache
chain node (called a handle when referring to the library cache) used to locate the various
cursor components. Unfortunately, this multipurpose definition also adds to the confusion.
When a cursor is closed, Oracle does not simply deallocate all three of these cursor
components. Rather, Oracle may deallocate the cursor components as it deems necessary; for
example, if the memory is needed elsewhere.
The first time a cursor is executed, there is actually a parent cursor and a child cursor.
Subsequent sessions, and even the same session executing the same exact SQL statement (that
is, the hash value is the same), may require a different child cursor. While the SQL statement
will be textually exactly the same, a child cursor is created to capture specific characteristics,
such as optimization mode differences (for example, first_rows) resulting in different
execution plans or different session-level parameters (cursor_sharing=similar). To
demonstrate this, Figure 7-1 shows a simple situation where the same session executed the
same SQL statement twice with a simple alter session in between the two executions.


This is enough to force an additional child cursor to be created. The trace command is used
to prove two child cursors are created.

alter session set optimizer_mode = all_rows;
select * from dual;
alter session set optimizer_mode = first_rows;
select * from dual;
alter session set events 'immediate trace name library_cache level 10';

Figure 7-1. Shown is the same session running the same SQL statement with a simple alter
statement between executions. This is enough to force Oracle to create two child cursors. The
trace statement is used to prove two child cursors are created.

Figure 7-2 is part of the massive trace output file created from the trace command
shown in Figure 7-1. I located the part of interest by searching for from dual and then
examining the SQL statement. Of interest to us at this point is that this one SQL statement
was executed by just a single session, yet it created two child cursors.


BUCKET 108289:
LIBRARY OBJECT HANDLE: handle=41226b18 mutex=0x41226bcc(2)
name=select * from dual
hash=0d54fc02b2ad4044a2cb0974382da701 timestamp=04-06-2009 12:33:26

CHILDREN: size=16
child# table reference handle
------ -------- --------- --------
0 4122668c 41226340 412261d4
1 4122668c 412264a0 4121b6e8

Figure 7-2. Shown is a small yet significant part of the level 10 library cache dump executed
from the end of Figure 7-1. Notice the same SQL statement has two child cursors. The second
cursor was created because a session parameter was changed.
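If a full library cache dump feels heavy-handed, a lighter sketch is to query v$sql, which returns one row per child cursor along with the optimizer mode in effect when each child was built:

select sql_id, child_number, optimizer_mode, executions
from   v$sql
where  sql_text = 'select * from dual'
order  by sql_id, child_number;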

Relationships between library cache objects must be maintained not only for execution
purposes, but also if a change occurs in one of the components. Suppose one table is
referenced in 2,000 SQL statements, in 100 functions, and 20 packages. Now suppose one of
the table’s columns is renamed. Oracle will then invalidate all related SQL statement and
programmatic constructs. This can result in a cascade effect that requires latching and locking.
The combination of multiple involved sessions, invalidations, recompilations, and timing has
resulted in entire Oracle instances locking up. Obviously, Oracle knows about this serious
situation and is actively decreasing the likelihood of it occurring. But every DBA needs to
know library cache relationships are very complex and can cause problems at times.

Cursor Building
Cursors are created when searched for and not found in the library cache. This results in what
is called a hard parse. As will become obvious, this is a relatively expensive operation that


requires memory management (allocation and possibly deallocation), latching to ensure
serialization, locking to prevent inappropriate changes, CPU consumption to run the kernel
code, and possibly IO to bring data dictionary information into the row cache.
Cursors are built from data in the shared pool. And if the data is not currently in the
shared pool, Oracle creates its own SQL to retrieve the data from the data dictionary tables.
Recursive SQL is the name given to the SQL Oracle dynamically creates and runs itself. The
data Oracle needs to create a cursor includes optimizer statistics, session information, security
information, object information, and object relationship information. Books could be written
simply about cursor creation.
Cursors are created out of chunks of shared pool memory called heaps. Traditionally,
different SQL statements required different size memory chunks. Common SQL statements
have traditionally required 4KB memory chunks. Just as with free extent management, having
nonuniform memory chunk requirements creates allocation, performance, and efficiency
issues. Starting in Oracle Database 10g Release 2, Oracle states all memory chunks are 4KB.
When the appropriate memory chunk cannot be quickly found, Oracle eventually gives up,
posts a 4031 “out of shared pool memory” error, and stops the SQL statement processing.
Ouch!
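When chasing a 4031, a quick sketch for checking how much free shared pool memory remains; keep in mind the free memory may be badly fragmented, so a 4031 can occur even when this number looks healthy:

select pool, name, round(bytes/1024/1024,1) free_mb
from   v$sgastat
where  name = 'free memory';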

Cursor Searching Introduction


Like each buffer in the buffer cache, each parent and child cursor must be locatable, and that
search must be fast! This requires memory, a searching structure, serialization, kernel code,
and plenty of CPU resources.
Since cursors and programmatic structures reside in the library cache, there is a structure
to locate the objects. Oracle chose to use a hashing algorithm and the related hash-like
structure. Part of parsing is determining if a cursor currently resides in the library cache. If the
cursor is indeed found in the library cache, some parsing did occur, so it’s deemed a soft
parse. However, if the cursor was not found, the entire cursor must be built, so this is deemed
a hard parse. As I mentioned earlier, cursor creation, and therefore hard parsing, results in a
relatively expensive operation.
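A simple sketch for gauging how much of an instance’s parsing is hard parsing; a high and steadily climbing hard parse count usually points at nonshared SQL:

select name, value
from   v$sysstat
where  name in ('parse count (total)', 'parse count (hard)');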

Cursor Pinning and Locking


Pinning a cursor is similar to pinning a buffer. It is used to ensure the cursor is not deallocated
(sometimes called destroyed) when being referenced. While cursors are clearly not relational
structures, SQL is related to relational structures (for example, the employee table), and
relational structures are used to build cursors (for example, sys.col$) and therefore
locks—that is, enqueues are used! The cursor enqueue is called the CU enqueue and can be
detected just like any other enqueue through Oracle’s wait interface.
Pinning occurs whenever a cursor is created and executed. It makes sense when you
think about it. When you build a cursor, which is a memory structure, you don’t want another
process to deallocate the associated memory! Normally, cursors are unpinned after being built
and after execution. This means that after your session executes a cursor and then 2 minutes
later wants to execute that same cursor again, it may have been deallocated. If this occurs, the
cursor will not be found in the library cache, resulting in a hard parse, which is a complete
cursor rebuild.
Locking also occurs when a cursor is built and executed. But the perspective is different
than pinning. Pins are concerned with memory being deallocated. A lock ensures an Oracle


table related to a cursor is not altered during cursor creation and execution. Obviously, that
could create some rather strange situations, and Oracle will not allow this to occur.

Library Cache Management


Before you can come up with good library cache performance solutions, you need to have a
solid grasp of the library cache’s architecture. With that knowledge, and the background
presented in the previous chapters, you’ll be able to grasp why a solution makes sense and the
possible impact it can have.

Library Cache Architecture


The library cache architecture is extremely complex. The combined requirements of
concurrency, relationships between objects, and fast searching really stress the architecture.
I’ll start at a very conceptual level of this architecture, and then methodically step down into
ever increasing detail. I’ll stop when the details are not particularly helpful for performance
firefighting.
A wonderful conceptual library cache model is the traditional library. Let’s say you want
to find Ray Bradbury’s book, Dandelion Wine. Because the library is a massive storehouse of
books (think all library cache objects), sequentially or randomly searching would be futile. So
you go the card catalog area (think hashing structure) and walk directly to the card catalog
containing books pertaining to authors whose last name start with the letters A through D
(think hashed to a specific bucket). There is someone in front of you, so you must wait (think
acquiring the related hash bucket’s latch). Finally, you are in front of the appropriate card
catalog (think acquired the latch), and begin sequentially searching for the book (think
sequentially searching a hash chain). Finally, you find the card and see the book’s address
813.54 (think library cache handle). You walk over to where the book should be, find it, and
start reading (think accessing the cursor). If you can picture this story in your mind, you are
well on your way to understanding Oracle’s library cache.

Library Cache Conceptual Model


As with the buffer cache, the library cache objects are located using a hashing algorithm. This
involves a hash function, buckets, chains, and latches or mutexes. One key difference is the
hash chain nodes are not composed of buffer headers, but rather simple pointer nodes called
handles.
Handle is a common term for a memory address pointer, and this is the case with the
library cache. There is a one-to-one relationship between a handle and a library cache memory
object. So referring to a handle is synonymous with referring to its associated object. When
mutexes are used instead of library cache latches, each individual handle has an associated
mutex. Each library cache object refers to an object of a specific type, sometimes called a
namespace, such as a cursor, a child cursor, a table, or a programmatic construct.
Figure 7-3 abstracts a library cache implementing mutexes, highlighting the various
architectural components without specific object names. Library cache objects are
searched using a hashing structure, so you would expect to see buckets, such as bucket BKT
200. When mutexes are implemented, there is a mutex associated with each handle, so each
memory chunk shown in Figure 7-3 has an associated mutex. Each hash chain can contain
zero or more handles, which relates to zero or more library cache objects, such as cursor CSR


500 and table TBL 400. Each parent cursor will have at least one child cursor. As described
earlier, a parent cursor, such as CSR 500 in Figure 7-3, can be associated with multiple child
cursors, such as CCSR 600 and CCSR 610.

Figure 7-3. A simple conceptual model of an Oracle library cache. The model shows hash
buckets (BKT), cursors (CSR), child cursors (CCSR), and table references (TBL).

A key library cache characteristic is the object relationships. In Figure 7-3, notice that
table TBL 400 is associated with the three child cursors CCSR 600, CCSR 610, and CCSR
620. So, if table TBL 400 is altered, Oracle knows which library cache objects to invalidate.
For example, if table TBL 400 were altered and Oracle deemed the alteration severe enough
to invalidate the library cache entry, then all the associated library cache objects would also
be invalidated. And, of course, serialization must be maintained, so you can see how even a
relatively small library cache can become quite intense.
Also note the impact of using mutexes instead of latches. Since a mutex is associated
with each library cache object, entire hash chains will not be made unavailable, resulting in a
significant reduction in false contention as well as acquisition CPU consumption, and
therefore improved response time.

Library Cache Object References


Now, let’s take the conceptual model to a much more realistic level to clarify the library cache
object relationships. Three figures are involved in this example. Figure 7-4 is the actual shell
script containing the SQL. I simply copied and pasted the code into my Oracle user Linux
session. Figure 7-5 is a library cache model with the actual real-life object references. Figure
7-6 shows the relevant parts extracted from the actual trace file.


sqlplus system/manager <<EOF
alter session set MAX_DUMP_FILE_SIZE=unlimited;
drop table findme;
create table findme as select * from dual;
alter session set optimizer_mode = all_rows;
select * from findme;
alter session set optimizer_mode = first_rows;
select * from findme;
select dummy from findme;
alter session set events 'immediate trace name library_cache level 10';
exit
EOF

Figure 7-4. Shown are a few lines of code, which show library cache object relationships.
Figure 7-5 is a model of the library cache objects, and Figure 7-6 shows the key parts from
the actual trace file.

Figure 7-4 starts with simply connecting to Oracle. To ensure I didn’t lose any critical
trace file pieces, I set my dump file size to unlimited. I then created a very simple table,
named findme, to make it easy to find the SQL in the massive trace file. Just as I did in
Figure 7-1, I executed the same SQL statement but altered an optimizer parameter. This
should result in two child cursors. Then I executed a different SQL statement, which
references the same findme table. I did this to show that different SQL statements will
reference the same table handle. Finally, I performed the library cache dump and
disconnected.


Figure 7-5. A library cache diagram based on the SQL run in Figure 7-4, which produced
the trace file shown in Figure 7-6. The figure highlights the various library cache object
attributes, such as buckets (14778), handles (456dbf90), mutexes (0x456dc044), cursors, child
cursors, and their relationships.

Figure 7-5 is nearly identical to Figure 7-3, with the key exception that the contents are
filled based on the SQL run in Figure 7-4, which produced the trace file shown in Figures 7-6
and 7-7. (The trace file was simply too large to place in a single figure.)
Figure 7-5 shows bucket 14778 is associated with the cursor that has two identical SQL
statements, yet an optimizer parameter was altered between their execution, creating the two
child cursors. Notice both the parent and child cursors have the same type: CRSR. If you look
closely at the top of the Figure 7-6 trace file, you’ll see bucket 14778, its associated handle,
mutex, name, and two child cursors, each with a handle. Bucket 60635 is associated with one
object: the findme table with its handle 456df410.


BUCKET 14778:
LIBRARY OBJECT HANDLE: handle=456dbf90 mutex=0x456dc044(2)
name=select * from findme
hash=57c14570e98dc8b98fe8a5a2ebf439ba timestamp=04-07-2009 10:20:30
namespace=CRSR flags=RON/KGHP/TIM/KEP/PN0/SML/KST/DBN/MTX/[120100d4]
CHILDREN: size=16
child# table reference handle
------ -------- --------- --------
0 456dbaa4 456db758 456db5ec
1 456dbaa4 456db8b8 45664b28
BUCKET 14778 total object count=1

BUCKET 60635:
LIBRARY OBJECT HANDLE: handle=456df410 mutex=0x456df4c4(0)
name=SYSTEM.FINDME
hash=e1c0649b1ca651c3902f9631d172ecdb timestamp=04-07-2009 10:20:30
namespace=TABL flags=KGHP/TIM/SML/[02000000]
BUCKET 60635 total object count=1

BUCKET 67700:
LIBRARY OBJECT HANDLE: handle=45662ce0 mutex=0x45662d94(1)
name=select dummy from findme
hash=70b1c44268eb8c9d2860b06f93850874 timestamp=04-07-2009 10:20:30
namespace=CRSR flags=RON/KGHP/TIM/KEP/PN0/SML/KST/DBN/MTX/[120100d4]
CHILDREN: size=16
child# table reference handle
------ -------- --------- --------
0 456627f4 456624a8 4566233c
BUCKET 67700 total object count=1

Figure 7-6. The first part of the trace file created by the SQL in Figure 7-4. This part
contains the hash bucket chain details. This text is diagrammed in Figure 7-5. While no lines
have been modified, many lines have been removed.

Now let’s see how Oracle keeps track of child cursor relationships. Figure 7-7 is the
second part of the trace file generated by the Figure 7-4 SQL. This section of the trace file
starts with ANONYMOUS LIST:, making it easier to locate. Notice there are three library
object references and that each one of those has a handle matching one of our three child
cursors! Find the bottom library object entry, which has a handle of 456db5ec. This is a cross-
reference for the child cursor with the handle of 456db5ec. Notice it has two references: a
reference to the table in its SQL statement (findme with handle 456df410) and a reference to
its parent cursor (handle 456dbf90). Take a minute to perform the same cross-reference with
the other two child cursor library object entries.
Let’s make a few observations. We can clearly see the library cache contains not only
references to the objects themselves, but also their associations or connections. And these
associations are being cached in memory, which provides potentially thousands of references
when working with a single SQL statement. Library cache concurrency requires its
contents be referenced frequently, and in most cases, with other sessions needing access to the
same information. Concurrency is controlled through either latches or mutexes. So creating a
cursor, referencing a cursor, soft and hard parsing, and destroying a cursor all require
serialization control. Also, because the library cache contains object relationship information
in addition to the object itself, Oracle is able to properly invalidate the fewest library cache
objects possible.


ANONYMOUS LIST:

LIBRARY OBJECT HANDLE: handle=4566233c mutex=0x456623f0(0)
namespace=CRSR flags=RON/KGHP/PN0/EXP/[10010100]
DEPENDENCIES: count=1 size=16
dependency# table reference handle position flags
----------- -------- --------- -------- -------- ----------------
0 45660a20 45660654 456df410 18 DEP[01]
READ ONLY DEPENDENCIES: count=1 size=16
dependency# table reference handle flags
----------- -------- --------- -------- -------------------
0 45662280 45661fdc 45662ce0 /ROD/KPP[60]

LIBRARY OBJECT HANDLE: handle=45664b28 mutex=0x45664bdc(0)
namespace=CRSR flags=RON/KGHP/PN0/EXP/[10010100]
DEPENDENCIES: count=1 size=16
dependency# table reference handle position flags
----------- -------- --------- -------- -------- ----------------
0 4566320c 45662e40 456df410 14 DEP[01]
READ ONLY DEPENDENCIES: count=1 size=16
dependency# table reference handle flags
----------- -------- --------- -------- -------------------
0 45664a6c 456647c8 456dbf90 /ROD/KPP[60]

LIBRARY OBJECT HANDLE: handle=456db5ec mutex=0x456db6a0(0)
namespace=CRSR flags=RON/KGHP/PN0/EXP/[10010100]
DEPENDENCIES: count=1 size=16
dependency# table reference handle position flags
----------- -------- --------- -------- -------- --------------
0 456d9cd0 456d9904 456df410 14 DEP[01]
READ ONLY DEPENDENCIES: count=1 size=16
dependency# table reference handle flags
----------- -------- --------- -------- -------------------
0 456db530 456db28c 456dbf90 /ROD/KPP[60]

Figure 7-7. The second part of the trace file created by the SQL in Figure 7-4. This text is
diagrammed in Figure 7-5. While no lines have been modified, many lines have been
removed.

Because library cache objects are related to other library cache objects that are
consuming chunks of memory, Oracle may deem it necessary to release an object’s memory.
It is possible for the handle to remain, but the actual memory chunk to have been deallocated.
If this occurs, before the cursor can be reexecuted, the library object must be reloaded into the
library cache. Oracle keeps track of these reloads in the reloads column from
v$librarycache, and for specific object statistics, in the loads column in v$sql and
v$sqlarea. If reloading becomes a significant problem, CPU parse time will increase, and
shared pool and library cache latch or mutex contention are likely to become your top wait
events.
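As a sketch, reload and invalidation activity can be checked per namespace like this:

select namespace, gets, pins, reloads, invalidations
from   v$librarycache
order  by reloads desc;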
The more I have learned about the library cache, the less I complain about shared pool
issues. The library cache’s complexity is daunting and should bring about a level of awe.


Object Hashing
While hashing may seem rather dull at this point, this section will clearly demonstrate why
two syntactically exact SQL statements are treated as two distinct and different SQL
statements! If you have ever wondered why select * from dual is considered
different from select * from Dual, read on!
Before a library cache object is associated with a bucket, it must be hashed. If you look
closely at each library cache object shown in Figure 7-6, you’ll see each has a hash value.
Oracle creates this hash value based on its own proprietary algorithm. Once the hash value is
generated, it is then entered into Oracle’s proprietary library cache hashing algorithm, which
results in the bucket number.
The library cache object hashing situation is more complicated than with block hashing.
We’re not dealing with a file number and a block number. This situation requires hashing the
object’s text. So if a SQL statement or a package is 2,000 lines long, Oracle may choose to
hash all this text.
Generating a hash value for text is far more interesting than generating one for a
number. One way to hash text is to associate each character with a number and then sum all
the numbers. For example, if a maps to 1, b to 2, and c to 3, then the hash value of the string
abbc is 8 (1 + 2 + 2 + 3). Figure 7-8 is a simple hash value generator and hash function Perl
script.

#!/usr/bin/perl
#
# hash_text.pl - Hash text and assign to a bucket.
#
# Created by Craig Shallahamer on April 8, 2009

$buckets  = $ARGV[0];
$the_text = $ARGV[1];

$hash_value = 0;
$txt_ptr    = 0;

# Sum the ordinal (numeric) value of each character in the string.
while ( $part = substr($the_text, $txt_ptr, 1) ) {
    $hash_value += ord($part);
    $txt_ptr += 1;
}

# The hash function: map the hash value to a bucket via modulus.
$the_bucket = $hash_value % $buckets;

print "Simple text hashing example with bucket assignment. \n\n";
print "  Inputs\n";
print "    the text          = $the_text \n";
print "    buckets           = $buckets \n";
print "  Outputs\n";
print "    hash value        = $hash_value \n";
print "    bucket assignment = $the_bucket \n";
print "\n";

Figure 7-8. Shown is an example of a text hashing and bucket assignment function written in
Perl. The hash function is the modulus (%) operator.


Figure 7-9 shows multiple runs of the hash_text.pl script. Look very closely at the
differences in the SQL text. While the statements appear identical, there is a single-character
difference in each: the second statement has a capital T, and the third
statement simply has an additional space after the select. Based on this hash value
generator and hash function, this seemingly insignificant difference was enough to produce different
hash values and also different hash bucket assignments. This is why Oracle SQL must be
textually exact, or a new cursor will be created for each textually unique statement. This is
important for developers to understand, because unless they are told, it is highly unlikely they
would assume even the statement spacing must be the same.

$ perl hash_text.pl 23 "select * from dual"


Simple text hashing example with bucket assignment.

Inputs
the text = select * from dual
buckets = 23
Outputs
hash value = 1636
bucket assignment = 3

$ perl hash_text.pl 23 "selecT * from dual"


Simple text hashing example with bucket assignment.

Inputs
the text = selecT * from dual
buckets = 23
Outputs
hash value = 1604
bucket assignment = 17

$ perl hash_text.pl 23 "select * from dual"


Simple text hashing example with bucket assignment.

Inputs
the text = select * from dual
buckets = 23
Outputs
hash value = 1668
bucket assignment = 12

Figure 7-9. Shown is the result of three seemingly identical SQL statements, which yield
different hash values and also different hash bucket assignments.

While hash value creation may seem quick, every time a SQL statement is executed, its
cursor must be searched and found, so the hash value creation function will be called a
massive number of times each minute. This will eventually add up to considerable CPU
consumption. While simple statements can be hashed quickly and with little resource
consumption, real-life 5,000-line SQL statements will take a relatively long time. So Oracle
will make every effort to optimize the hash value creation.
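While Oracle does not expose hash-generation time directly, its cumulative cost is folded into the parse statistics. The following is a minimal sketch using standard v$sysstat statistic names; sample the values twice and compare the deltas over an interval.

select name, value
from   v$sysstat
where  name in ('parse time cpu',
                'parse count (total)',
                'parse count (hard)');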
One attempt is particularly humorous. Back in Oracle 7 days, a shared pool patch was
released with the intention of improving parse times. In particular, the Oracle developers
decided to hash only on the object’s first 60 characters. So a 65-character statement or a
6,500-character statement would take the same amount of hash value generation time. This is


equivalent to altering the hash function shown in Figure 7-8 to consider only the first five
characters. If this were the case, since all three statements shown in Figure 7-9 have the same
first five characters (selec), they would each generate the same hash value and be mapped
to the same hash bucket. The hash chain would be three objects long. Now translate this back
to Oracle's attempt at considering only the first 60 characters while hashing.
Most statements are unique near the tail end of their text, because that is where the
filtering and bind variable assignment take place. As a result of this patch, thousands of SQL
statements generated the same hash value and were then mapped to the same hash bucket!
Library cache hash chains became huge, resulting in massive library cache latch contention.
The intense cursor creation and destruction also created significant shared pool latching
contention. Performance immediately took a significant plunge, and a patch for the patch was
quickly released. This is a classic example of testing code in a nonproduction-like situation
and rubber-stamping the code ready for release. Why this patch caused problems was a
mystery to me until I understood a hashing structure was used to locate objects in the library
cache.

Keeping Cursors in the Cache


It should be clear now that building a cursor is a relatively expensive operation. The CPU
consumption and possible IO to bring objects into the library cache can significantly slow
performance. This typically manifests as increased parsing CPU consumption and, in
particular, the library cache latch or mutex becoming the top wait event. So one obvious
objective is to keep cursors in the library cache. However, a balance must be maintained, or
other performance-inhibiting issues will arise. The shared pool must contain many types of
objects, and library cache objects are just one of those types. Plus memory is a limited
resource. The following subsections discuss a variety of ways to influence Oracle to keep
cursors in the cache.

Increase the Likelihood of Caching


Oracle cannot deallocate open cursors. Even if the shared pool is flushed, open cursors are
pinned and therefore cannot be deallocated. Normally, when cursor execution is complete, the
cursor is closed, the cursor pin is removed, and if there are no other sessions pinning the
cursor, Oracle can deallocate the associated memory. This allows for newer and active cursors
to remain in memory while less active cursors are naturally deallocated. But if parsing
becomes a significant performance problem, as performance analysts, we become motivated
to influence Oracle to keep the cursors in memory. One of the ways to do this is to keep the
cursors open.
Oracle does allow us to keep cursors open longer than usual. When set to true (the
default is false), the instance parameter cursor_space_for_time keeps all cursors
pinned until they are specifically closed. Even after cursor execution has completed, Oracle
will keep the cursor pinned, and it will stay that way until the cursor is closed.
But as with all tuning changes, there is a trade-off. This instance parameter affects all
cursors in the entire Oracle instance. Furthermore, it is not session-specific, and the parameter
change requires an instance restart to take effect. The very real implication is much more
shared pool memory will now be required to cache library cache objects. In fact, the effect
can be so dramatic that the shared pool could effectively run out of memory, resulting in the
dreaded 4031, “out of shared pool memory” error. So care must be taken when setting this
parameter.


Personally, I do not enable this option unless there is clearly a parsing problem,
identified by at least two of these three situations: CPU consumption dominated by parse
time, shared pool latch contention, or library cache latch or mutex contention. Conversely, if
there are “out of shared pool memory” errors occurring, be sure to check that
cursor_space_for_time is set to false.
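If you need to verify the current setting, a quick query against v$parameter will do; this is a minimal sketch using the view's standard columns.

select name, value, isdefault
from   v$parameter
where  name = 'cursor_space_for_time';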

Force Caching
Most DBAs know one way to ensure large packages are successfully loaded into the shared
pool is to use the dbms_shared_pool.keep procedure. When key packages are loaded into
memory immediately after the instance starts, your chances of receiving an “out of shared
pool memory” error are significantly reduced. In earlier versions of Oracle, especially
Oracle8i, this could dramatically reduce the likelihood of running out of shared pool memory.
Figure 7-10 is an OSM report based on v$db_object_cache and shows some of the
initial objects loaded after an Oracle instance restarts. Notice that when the report was
generated, there were no objects being forcibly kept in the shared pool that met the report
selection criteria.

SQL> @dboc 10 20
old 9: where a.sharable_mem >= &min_size
new 9: where a.sharable_mem >= 20
old 10: and a.executions >= &min_exec
new 10: and a.executions >= 10

Database: prod16 09-JUN-10 02:11pm


Report: dboc.sql OSM by OraPub, Inc. Page 1
Oracle Database Object Cache Summary

                                             Obj   Exe  Size
Owner      Obj Name                     Type Loads (k)   (KB) Kept?
---------- --------------------------------- ---- ----- ----- ----- -----
SYS DBMS_AQADM_SYSCALLS PBDY 0 0 24 NO
SYS STANDARD PBDY 0 0 24 NO
SYS DBMS_RCVMAN PBDY 0 0 367 NO
SYS DBMS_PRVT_TRACE PBDY 0 0 12 NO

SQL> l
1 select a.owner ownerx,
2 a.name namex,
3 decode(a.type,'PACKAGE','PKG','PACKAGE
BODY','PBDY','FUNCTION','FNC','PROCEDURE','PRC') typex,
4 a.loads/1000 loadsx,
5 a.executions/1000 execsx,
6 a.sharable_mem/1024 sizex,
7 a.kept keptx
8 from v$db_object_cache a
9 where a.sharable_mem >= &min_size
10 and a.executions >= &min_exec
11 and a.type in ('PACKAGE','PACKAGE BODY','FUNCTION','PROCEDURE')
12* order by executions desc, sharable_mem desc, name

Figure 7-10. Shown is output from the OSM script dboc.sql, followed by the actual SQL. This
report was run immediately after the instance was cycled. Production Oracle systems will
produce thousands of rows, and therefore the threshold parameters will need to be adjusted.


When forcing objects to be kept in the shared pool, keep in mind that we are, in effect,
gaming Oracle’s least recently used (LRU)-based shared pool memory management
algorithm. We are saying that we know better than Oracle does. This could actually be the
case, since most DBAs know their applications very well. But until you know yours that well, stuffing your
shared pool full of packages like a Christmas stocking can actually increase the likelihood of
“out of memory” errors, because little room is left for all the hundreds—if not thousands—of
other shared pool objects. So, think carefully before using this procedure.

Private Cursor Caches


Here’s the problem: Because the library cache is shared among all sessions, some type of
serialization control mechanism must be in operation. Whether the mechanism is latches or
mutexes, that means CPU consumption for both acquiring the control structure and also
accessing the memory structure. If the access becomes intense, significant contention can
arise, causing serious performance degradation. So it’s logical to ask what may seem like a
silly question: “Can we simply not use a control structure?”
Sure we can, if serialization is not an issue. What Oracle has done is to reduce the
likelihood of requiring serialized library cache access by providing each session with its own
private library cache-like structure containing just the session’s popular cursors (actually just
the pointers to the cursors, which are their handles). Because the cursor cache is private,
serialization is guaranteed, and therefore, no control structure is required! This is an elegant
solution indeed.
This private library cache-like structure is called a session cursor cache. By default,
every session has a cursor cache containing pointers to its popular cursors. By default, Oracle
Database 10g Release 2 caches 20 cursor pointers. In Oracle Database 11g Release 1, the
default is 50 cursor pointers. Regardless of the defaults, the cache size can be modified at the
system level (not the session level) by altering the session_cached_cursors instance
parameter.
It works like this: When running a SQL statement, the session creates the statement’s
hash value and then checks whether the matching cursor handle resides in its own cursor cache. Since no other
process can access the session’s cursor cache, no control structure is required. If the handle is
found, the session knows the cursor exists in the cache. If the cursor is not found in the
session cursor cache, the hash value is hashed to a library cache hash bucket, the appropriate
control structure acquired, and then the chain is sequentially scanned, looking for the cursor.
If the handle is found in the session’s cursor cache, some effort has been expended parsing,
but it’s not as much as a hard parse (statement not found in the library cache) or even a soft
parse (statement found in the library cache), and hence the term softer parse is used to
describe this approach.
The good news is that library cache contention can be significantly reduced by
increasing every session’s cursor cache. The bad news is that indeed every session’s cursor
cache is increased. If the Oracle instance has hundreds of sessions, together all the session
cursor caches can require enough memory to cause shared pool memory availability issues.
You’ll know when you’ve gone too far, because you’ll start receiving 4031 “out of memory”
errors. At this point, either reduce the session cache cursor size or, if there is memory
available, increase the shared pool size. So, as with nearly every tuning effort and parameter,
there is a cost. As performance analysts, our hope is that the cost is less than the performance
benefit.
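To gauge how effective the session cursor caches are, the cumulative instance statistics can be compared. This is a minimal sketch using standard v$sysstat statistic names; a high ratio of session cursor cache hits to total parses suggests the cache is doing its job.

select name, value
from   v$sysstat
where  name in ('session cursor cache hits',
                'session cursor cache count',
                'parse count (total)');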


Library Cache Latch/Mutex Contention Identification and Resolution
As the library cache becomes increasingly active, competition for the control structures and
potentially the time holding the control structures can increase so much that it becomes a
serious performance issue. It will become obvious when this occurs because our response-
time analysis will clearly point to a library cache latch- or mutex-related wait event(s).
Furthermore, there will be significant Oracle CPU consumption, with an unusual amount of
recursive SQL or parse-related time. The operating system will be ravaged by a CPU
bottleneck. Fortunately, there are several very good solutions to this problem.
Figure 7-11 is an example of what you may see when severe library cache latch
contention exists. Notice that nearly 100% of the latch contention is related to the library
cache! The load used to create this scenario was multiple sessions executing a tight PL/SQL
loop, which simply opened and closed a cursor. Searching the library cache requires the
library cache latch and opening a cursor requires the cursor to be pinned. As a result, we see
both library cache latch types in heavy demand. Also interesting is that Oracle was consuming
all available host CPU. Less than 1% of the CPU time was classified as parsing, but nearly
75% of the CPU was related to recursive SQL. If you recall the discussion in Chapter
5, any SQL with a depth greater than 0 is classified as recursive SQL, and the PL/SQL loop
contents that were repeatedly executed will have a depth of 1 or more. When either recursive
SQL time or parse time consume a significant amount of the available CPU, library cache-
related latch contention is very common.

SQL> @swpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER twice.

Database: prod16 28-JUL-10 10:40am


Report: swpctx.sql OSM by OraPub, Inc. Page 1
System Event CHANGE (blue line) Activity By PERCENT

                                   Time Waited  % Time  Avg Time    Wait
Wait Event                         (sec)        Waited  Waited (ms) Count(k)
--------------------------------- ----------- ------- ----------- --------
latch: library cache 3.580 55.50 41.1 0
latch: library cache pin 2.830 43.88 23.2 0
control file parallel write 0.030 0.47 1.5 0
direct path write 0.000 0.00 0.0 0
log file sync 0.000 0.00 0.0 0
log file parallel write 0.000 0.00 0.0 0
db file sequential read 0.000 0.00 0.0 0

Figure 7-11. Shown is an example of severe library latch contention when mutexes are not
used. While not shown, Oracle is consuming all the available host CPU, and because the load
is based on a PL/SQL loop, recursive SQL is consuming over half of the Oracle CPU.

Enable Mutexes
Figure 7-12 is based on the same load and interval as Figure 7-11. The only difference is
library cache mutexes have been enabled by setting the instance parameter
_kks_use_mutex_pin to true (which is actually the default). Notice the top wait event


is cursor: pin S. This is the result of the cursor being repeatedly and intensely opened
and closed.
While the percentage of recursive SQL was the same with mutexes enabled and
disabled, when latches were used, the total CPU consumption was nearly twice as much! In
fact, look closely and compare the time waited for library cache-related events in Figures 7-11
and 7-12. Once again, mutexes are shown to be more efficient than latches.

SQL> @swpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER twice.

Database: prod16 28-JUL-10 10:43am


Report: swpctx.sql OSM by OraPub, Inc. Page 1
System Event CHANGE (blue line) Activity By PERCENT

                                   Time Waited  % Time  Avg Time    Wait
Wait Event                         (sec)        Waited  Waited (ms) Count(k)
--------------------------------- ----------- ------- ----------- --------
cursor: pin S 2.630 94.27 47.0 0
control file parallel write 0.030 1.08 1.5 0
direct path write 0.000 0.00 0.0 0
db file sequential read 0.000 0.00 0.0 0
log file parallel write 0.000 0.00 0.0 0
log file sync 0.000 0.00 0.0 0

Figure 7-12. Shown is an example of severe library cache contention when mutexes are
enabled. The exact same load and reporting interval were used as in Figure 7-11.

Use Bind Variables to Create Similar SQL


As was shown in Figure 7-9, Oracle is very particular about what it considers a similar SQL
statement. Every statement must be parsed, and if the cursor is not found in the library cache,
the cursor must be completely built (a hard parse). Hard parsing requires library cache-related
latches and locks, so if hard parsing becomes so intense the related wait events are driven to
the top of our reports, we will look for ways to create similar SQL statements. Oracle
provides two powerful methods to do this.
The first method is to simply use bind variables instead of literals. For example, the
statement select * from employee where emp_no=100 uses a literal. If the
statement select * from employee where emp_no=200 was then issued, because
of Oracle’s hashing algorithm, the two statements would have different hash values, reside in
different buckets (probably), and have different handles. As you can imagine, during intensive
online transaction activity, this will result in a tremendous amount of hard parsing. If the
application developers can submit the SQL statement as select * from employee
where emp_no=:b1 along with the employee number as a bind value, the cursor text will
not contain the employee number, so the same cursor is highly likely to be reused regardless
of which employee number is supplied. This dramatically reduces hard parsing.
It is very simple to see if your statements are using bind variables. Just look at the SQL
Oracle is storing in, for example, v$sqltext. If bind variables are being used, you’ll see
them.
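A common way to spot literal-laden SQL is to group v$sqlarea rows by their leading characters; statements that differ only in their literals tend to collapse into large groups. This is a sketch of the technique, and the prefix length and count threshold are arbitrary choices you should adjust for your system.

select substr(sql_text,1,40) sql_prefix,
       count(*)              stmt_count
from   v$sqlarea
group  by substr(sql_text,1,40)
having count(*) > 10
order  by count(*) desc;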
Discovering severe library cache-related contention can lead to the painful realization
that bind variables should be used where they have not been employed. Application


developers will be very unhappy, as it can require a significant amount of rework. DBAs
dread bringing this to the developer’s attention because, once again, they look like the bad
guys.
One of my students told me about a situation involving a vendor application used for
university student registration. When students logged on to the system and began registering
for courses, immediate and intense library cache latch contention manifested. It was obvious
and painful! So the DBAs approached the application vendor and presented their case.
Surprisingly to them, the vendor replied that the cost of converting the application to using
bind variables was simply too expensive. The vendor effectively said, “No way!” Not giving
up, the DBAs went back to work and studied the situation closely. To their amazement, they
discovered that just three SQL statements were run far more often than all the rest. They figured if just
the three SQL statements used bind variables, that would be enough to solve the problem.
They met with the vendor again, presented their case, and the vendor agreed to make the
change. After the bind variables were used, library cache latch contention dropped from sight!
What I like so much about this story is that the DBAs understood the problem and took
their analysis to the next level. After demanding that the vendor make the change and being
turned down, most DBAs would just start complaining and whining. The problem might
never be resolved, forcing a system with much more CPU power to be purchased.

Use Cursor Sharing


Another way to quickly implement bind variable usage is to have Oracle automatically
transform the SQL. Oracle will effectively take non-bind variable SQL and transform it into
bind variable SQL.3 Suppose you notice a SQL statement like the one shown here.

select count(*)
from customers
where status != :"SYS_B_0"
and org_id != :"SYS_B_1"

If you know your application SQL well, you may be aware that this exact SQL does not
actually exist anywhere in the application. In fact, if you checked the SQL the application
submits to Oracle, it may look like this:

select count(*)
from customers
where status != 'ACTIVE'
and org_id != '15043'

What you are seeing is the result of Oracle automatically transforming SQL so it
becomes more shareable. Oracle calls this feature cursor sharing. The related instance
parameter, cursor_sharing, can take three options and can be altered at both the session
and system level. With a value of exact, no transformation occurs. With a value of
similar, Oracle transforms literals into bind variables in what I call a kinder and

3
The student registration system that needed bind variables was based on a version of Oracle that did not
support this transformation option. As a result, the SQL had to be changed.


gentler fashion. When set to force, Oracle transforms any and every literal value into a bind
variable.
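Because cursor_sharing is dynamic, it can be trialed on a single session before any system-wide change is made. The statements below are a sketch of that approach; the option values are those described above.

alter session set cursor_sharing = force;

-- or, instance-wide without a restart:
alter system set cursor_sharing = similar;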
If you ask a group of performance analysts about their experiences with cursor sharing,
you’ll immediately get a seemingly conflicting and passionate discussion. Some, like myself,
have had wonderful experiences with the similar option; others have had all sorts of
problems. Some who have used the force option have seen their application SQL so
aggressively transformed that the SQL result set was actually different.4 For example, instead
of returning ten rows, the SQL returned two rows, effectively breaking the application!
Obviously, you need to talk to your colleagues, check with Oracle support, and test the
various options in your particular environment. If physically altering the SQL to use bind
variables is not possible or extremely painful, cursor sharing can work marvelously. But you
must be very diligent in your testing before you use the option in a production environment.

Take Advantage of the Hash Structure


From a searching perspective, the library cache is architected as a hashing structure. So just
like with the buffer cache chains, we can alter the number of hash buckets and, if used, the
number of latches. When mutexes are used, Oracle itself fixes the relationship between
mutexes and the memory structures they protect. For example, as shown in Figures 7-5 and
7-6, each library cache object has its own mutex.
Depending on your Oracle release, Oracle may not actually divulge the number of
library cache buckets or latches. For example, Oracle Database 10g Release 2 may show the
number of library cache buckets as nine and the number of library cache latches as zero!
Significantly altering the shared pool size may have no effect on what Oracle reports.
Furthermore, we can easily tell from Figures 7-5 and 7-6 that Oracle actually contains
thousands of buckets.
Oracle allows the number of buckets to be viewed (even though the value may not
represent the truth) via the instance parameter _kgl_bucket_count. The number of
library cache latches is controlled via the instance parameter _kgl_latch_count. I know
of no one who has increased the number of buckets in a production system and successfully
reduced library cache latch contention. However, just as with the cache buffer chain latches,
library cache latch contention can be reduced by increasing the number of library cache
latches.
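Assuming sysdba access, the hidden parameter values can be viewed with the same x$ksppi/x$ksppcv join used later in Figure 7-13; the following is a sketch of that query.

select i.ksppinm param, v.ksppstvl value
from   x$ksppi i, x$ksppcv v
where  v.indx = i.indx
and    v.inst_id = i.inst_id
and    i.ksppinm in ('_kgl_bucket_count','_kgl_latch_count');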
To demonstrate just how dramatic adding more latches can be, I created an experiment
with the number of library cache latches set to 1 and 100. On a four-CPU core Oracle
Database 10g Release 2 Linux system, I placed the load of only five concurrent processes,
which essentially never ran the same SQL statement twice. I ran each test a few times. With
just one library cache latch available, only 35% of the host’s CPU capacity was used, 84% of
the wait time was attributed to latch: library cache, and the execution rate was 260
exec/sec. With 100 library cache latches available, 64% of the host’s CPU capacity was used,
2% of the wait time was attributed to latch: library cache, and the execution
jumped up to an amazing 3,112 exec/sec. So, as expected, increasing the number of latches
allows for increased concurrency and resource consumption. It’s also significant that Oracle’s
spin/sleep latch acquisition algorithm did not saturate the CPU subsystem by constantly
spinning on the one available latch!

4
Mark Gury, of MGA based in Australia, spends countless hours optimizing PeopleSoft systems. He has
personally told me on numerous occasions that using the force option has returned a different row set. The lesson
here is to always thoroughly test when using cursor sharing before you make the change in production.


Just for fun, I enabled mutexes, and the execution rate reached only around 710
exec/sec! While Figures 7-11 and 7-12 clearly show mutexes provide significant benefit when
intense cursor pinning is required, when building cursors is an issue (a lot of hard parsing),
having plenty of latches can provide increased performance over mutexes. With that said, this
kind of latch and mutex comparison is highly load and Oracle release dependent. So I am not
making the statement that latches will always beat mutexes during intense hard parsing.

Try Mutex-Focused Solutions


When mutexes are available, they are enabled. They can be disabled by setting the instance
parameter _kks_use_mutex_pin to false. If your system is experiencing a deep mutex
problem, Oracle support may advise you to turn off mutexes until a patch or two has been
applied.
Most Oracle sites will never experience mutex contention, and if they do, the stress will
probably be related to pinning a cursor in either shared or exclusive mode.
Interestingly, for mutexes to operate, the platform must support the compare and
swap (CAS) operation. Systems based on reduced instruction set computer (RISC)
processors, such as AIX on POWER or HP-UX on PA-RISC, may lack a CAS
instruction. In situations like these, Oracle will simulate the CAS operation by using a pool of
latches (1,024 by default in Oracle Database 11g Release 1). The latches are named KGX, and
their number can be changed by altering the instance parameter _kgx_latches. Obviously,
this is not optimal for performance, but one hopes the net result will be beneficial.
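If you suspect this CAS-simulation path is active on your platform, the KGX latch activity should be visible in v$latch. This is a sketch; the exact latch name may vary by release, so the pattern match below is an assumption.

select name, gets, misses, sleeps
from   v$latch
where  name like '%KGX%';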
There are actually a number of mutex-related wait events, as listed in Table 7-1. While I
would like all mutex-related wait events to start with mutex, Oracle has taken a different
path. The mutexes associated with the library cache all start with the word cursor. It makes
sense, since the library cache is full of cursors, but it makes discovering new mutex usage
more difficult for the performance analyst.


Table 7-1. Mutex wait events


cursor: mutex X
    A session posts this event when requesting a mutex in exclusive mode, cannot get it by
    spinning, and therefore sleeps. It takes only one session holding the mutex in shared
    mode to prevent an exclusive acquisition. Building a child cursor, capturing SQL bind
    data, and building or updating cursor-related statistics all require an exclusive
    mutex hold.

cursor: mutex S
    A session posts this event when requesting a mutex in shared mode, cannot get it by
    spinning, and therefore sleeps. Multiple sessions can hold a mutex in shared mode. A
    mutex cannot be held in shared mode if another session holds the mutex exclusively. A
    session holds a mutex in shared mode, not exclusive mode, when changing the reference
    count (refer to Chapter 3 for details). So another session may be in the middle of
    changing the reference count. When this occurs, the mutex is said to be “in flux.”
    Seeing this event is extremely unlikely, as changing the reference count is
    spectacularly fast (so I’m told and the algorithm suggests). So while multiple sessions
    can hold the mutex in shared mode, changing the reference count is truly a serial
    operation.

cursor: pin S
    A session posts this event when requesting to pin a cursor in shared mode, cannot
    accomplish the pin by spinning, and therefore sleeps. Multiple sessions can pin the
    same cursor in shared mode, but only one session can hold a mutex in exclusive mode.
    Pinning increments the mutex’s reference count, which is a serial operation. Because a
    session must pin a cursor to execute it (you don’t want the cursor to be deallocated in
    the middle of execution), performance analysts have seen this event in production
    systems when a very popular cursor is repeatedly executed by many sessions.

cursor: pin X
    A session posts this event when requesting to pin a cursor in exclusive mode, cannot
    accomplish the pin by spinning, and therefore sleeps. It takes only a single session
    with a shared mutex pin to prevent an exclusive acquisition. A cursor must be
    exclusively pinned while it is being created. You wouldn’t want another session to
    create or change the same cursor at the same moment.

cursor: pin S wait on X
    A session posts this event when requesting a pin in shared mode, but must wait because
    another session has the mutex in exclusive mode. For example, if a session simply wants
    to execute the cursor, it must acquire the mutex in shared mode. However, if another
    session is building or altering the cursor (which requires an exclusive pin) while the
    session is waiting to execute it, it will post this event. Performance analysts have
    seen this event when cursors are being rebuilt (perhaps an underlying table has been
    altered) while a number of sessions want to execute the cursor.


If you review Table 7-1, you’ll notice the key to solving mutex-related contention is
understanding both the wait event and what is occurring within your application. For example,
if the wait event is cursor: pin S (which is the most likely), it’s possible that the same
cursor is being repeatedly executed by a few users, a few cursors are being executed by many
users, or even one simple SQL statement is being executed by hundreds of users concurrently.
Once you understand this, you’ll then look for the SQL statement with a relatively high
execution rate and do anything you can to reduce its execution rate. Using the wait event to
lead you to the particular cursor-related situation and understanding the nature of your
application is your best solution path.
And again, it is very unlikely mutex waits will be the top wait event (when there is not a
related mutex bug), but it occasionally does occur. So it’s important to understand mutex
serialization control (see Chapter 3) as well as library cache internals and diagnosis.

Shared Pool Memory Management


Oracle has an incredibly difficult challenge in managing shared pool memory. This is made
obvious by all the changes, bugs, patches, and various performance problems over the years.
While that may evoke some feelings of sympathy, when faced squarely with a nasty memory
management-related issue, sympathy quickly turns to anger. In this section, I will explain how
the shared pool memory is managed, its management progression over the years, how
memory is allocated and deallocated, how to deal with the feared 4031 error, and finally how
to resolve shared pool latch contention.

From Hashing to Subpools


In Oracle 7 and Oracle8i, shared pool memory management was performed with the help of
an interesting hashing structure. If you recall our discussions on the cache buffer hash chains
and the library cache hash chains, then this will make perfect sense, but there was a twist.
When a process needed memory in the shared pool, its resulting hash bucket and chain
were related to the memory size required. The chains are commonly called a heap, which is a
linked list of available memory chunks. So conceptually, the first few chains were related to
around 1KB chunks of memory, the next few chains were related to around 2KB chunks of
memory, and so forth. While this was indeed ingenious, after a while of allocating and
deallocating memory of nonuniform sizes, the chains could become literally a few thousand
nodes long. Keep in mind the cache buffer chain length averages between zero and one. So a
chain of a couple thousand nodes is massive. And to make matters worse, there was only one
single shared pool latch to cover all the hash chains! Flushing the shared pool helped quite a
bit because the chains would be reduced to a respectable size. But that was no way to operate
a large production database, so Oracle had to make a change.
Oracle9i introduced subpools, which naturally led to multiple shared pool latches. The
hashing-based strategy was replaced by multiple subpools, each containing a single heap
operating on a standard LRU strategy. Oracle also began standardizing memory requirement
sizes, which increases the likelihood of finding an acceptable size chunk. The subpools,
multiple shared pool latches, and LRU strategy dramatically reduced shared pool memory
management problems. If you have managed both an Oracle8i and an Oracle9i system, you
probably experienced this change and noticed quite a difference.


The number of subpools on your system can be easily determined by either looking at
the instance parameter _kghdsidx_count or by counting the number of rows in
x$kghlu.
Figure 7-13 shows a series of SQL statements related to shared pool subpools. In this
case, an 800MB shared pool exists with four subpools. The x$ksmss query returns one row
for each subpool plus another row if the Java pool exists. The instance parameter to set the
number of subpools, _kghdsidx_count, cannot be altered dynamically. If you want to
influence Oracle to invoke a subpool number change, you must set the instance parameter and
recycle the instance.
Oracle does place a hard limit on the number of subpools. In Oracle Database 11g, I was
able to start an instance with seven subpools, but with eight subpools, the instance did not
start—in fact, it required a shutdown abort before it could be restarted.
Interestingly, Oracle does not have to respect your subpool number wishes. In fact, in an
example similar to the one shown in Figure 7-13 running Oracle Database 11.1g, the instance
parameter was set to 2 and the instance restarted, yet Oracle created three subpools. With
Oracle Database 11.2g, the instance parameter was again set to 2 and the instance restarted,
and as specified Oracle created two subpools. And running Oracle Database 11.1g and 11.2g
without the instance parameter manually set, Oracle created only a single subpool. So while
you can influence Oracle, it still reserves the right to make changes.


SQL> @spspinfo
SQL> select sum(bytes/1024/1024) sp_size
2 from v$sgastat
3 where pool='shared pool';

SP Size (MB)
------------
800

SQL> select count(*) no_sp from x$kghlu;

Num of SPs
----------
4

SQL> select INST_ID, KSMDSIDX, KSMSSLEN
  2  from x$ksmss
  3  where ksmssnam='free memory';

   INST_ID   KSMDSIDX   KSMSSLEN
---------- ---------- -----------
1 0 301989888
1 1 18818468
1 2 12659340
1 3 7697300
1 4 20482152

SQL> select i.ksppinm param,v.ksppstvl value
2 from x$ksppi i, x$ksppcv v
3 where v.indx=i.indx
4 and v.inst_id=i.inst_id
5 and i.ksppinm='_kghdsidx_count';

PARAM VALUE
-------------------- -----
_kghdsidx_count 4

Figure 7-13. Shown is the OSM script, spspinfo.sql, which shows basic information
about the number and size of the shared pool subpools. In this example, four subpools exist.

Memory Allocation and Deallocation


Memory allocation is fairly straightforward. It follows a standard LRU algorithm combined
with pinning and locking. When an Oracle process (server or background) requests memory, a
portion of Oracle’s kernel called the heap manager is executed. While the details continue
to change, the conceptual algorithm is pretty much the same.
Oracle processes ask for a specific amount of memory, which is transformed into
multiple chunk-specific size requests. The heap manager searches for a single chunk of
memory that matches each request; multiple smaller, noncontiguous chunks will not
do. If the process requests 4KB of memory, the heap manager must return an address for a
4KB chunk of shared pool memory.
In Oracle9i, the Oracle process acquires a subpool latch and will search the subpool up
to five times before giving up. Allowing for multiple passes increases the likelihood of
finding memory, as the memory situation can change dramatically and quickly. However,
after five searches, while holding the respective shared pool latch, if the appropriate memory


chunk size cannot be found, Oracle gives up, posts the feared 4031, “out of memory”
message, and the session stops processing. As every Oracle DBA knows, this is totally
unacceptable behavior in a production system.
In Oracle Database 10g, the Oracle process is more fervent in its memory quest. If after
searching five times in the current subpool the memory is not found, the process moves on
to another subpool. This will continue until all the defined subpools have been searched. If, at
this point, the memory cannot be found, as before, Oracle gives up, posts the 4031 error
message, and stops processing. What Oracle has done in this version is reduce the chances of
returning the error message in exchange for the possibility of consuming more CPU and
holding a shared pool latch longer. From a database operational perspective, slower
performance is better than no performance. At least work can be performed while we resolve
the performance problem.
When memory is in short supply, Oracle will deallocate unpopular chunks of memory.
You probably have experienced this when attempting to retrieve a SQL statement’s text and it
is no longer cached in the shared pool. Fortunately, Oracle will not deallocate memory that is
in use. For example, if a cursor is pinned, Oracle will not deallocate the associated memory,
regardless of how unpopular it may be. In fact, even flushing the shared pool will not remove
pinned cursors! If you really want to empty the shared pool and start from the beginning, you
must recycle the instance.

Shared Pool Latch Contention Identification and Resolution


The shared pool latches are used to serialize shared pool memory management. This means
operations such as searching for memory, LRU activity, allocating memory, and deallocating
memory require a shared pool latch. Because multiple subpools exist starting in Oracle9i, and
each subpool has its own shared pool latch, simply running this version or later greatly reduces
the likelihood of shared pool latch contention. But sometimes that is still not enough. The
following are some possible solutions that will decrease latch acquisition time, latch hold
time, or both.

Pin Large and Frequently Used Objects


This strategy is used to ensure objects successfully make it into the cache, regardless of the
memory activity or the object size. The first time any package is called, the entire package is
loaded into memory. If this need arises after an active shared pool has been in operation, it
can force substantial memory management activity, which could result in the object failing
to load and a 4031 error. Even if the object does successfully load, the user may
notice the application delay.
There are also times when we may want to pin small objects. For example, suppose an
object has a pattern of intense activity, a long pause causing the object’s memory to be
deallocated, and then another period of intense activity. To ensure there is no application
delay and to reduce memory management, we can simply pin the object. When we force
objects to remain in the shared pool, keep in mind that in a very real sense, we are gaming
Oracle’s shared pool LRU algorithm. But sometimes this is what it takes.
Most large Oracle applications provide a script containing the objects to be pinned in the
shared pool, and they will recommend it be run immediately after the instance has started. It’s
important to know that even if your application vendor provides such a list, you can refine this
list by understanding how your organization actually uses the objects. It is common for the
vendor application developers to create the pin list. However, most application developers


think their objects are the most important and should always be pinned. But in reality, many
times no one really knows how your organization will use the application until it is
operational in a production environment. So if 4031 errors are occurring, it is always a good
idea to refine the pin list.
There are four straightforward steps to ensure the objects you want are always kept in
the shared pool. While the word pin is commonly used, the dbms_shared_pool package’s
keep procedure is used to ensure the object is kept in the shared pool. This package is not
loaded by default when the database is created, so your first step is to load it. The following
code snippet is an example of how to create the package.

[oracle@fourcore ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Thu Jun 3 08:58:36 2010

Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> @$ORACLE_HOME/rdbms/admin/dbmspool.sql

Package created.

Grant succeeded.

SQL>

The next step is to find the large and/or popular objects. Oracle keeps track of shared
pool object usage and makes this information easily available via the
v$db_object_cache view. Figure 7-14 shows the output from the OSM script
dboc.sql, which is used to identify potential objects. You’re likely to see a group of
packages that are much larger than the rest and also packages that are executed much more
frequently than others. There may also be objects that you personally know have unusual
execution profiles and you want to have cached.
Once you have the list of objects to keep, the next step is to determine how to place them
in the cache. The keep procedure is used to pin the object or, better said, to keep the object in
the shared pool.


SQL> @dboc 10 20
old 9: where a.sharable_mem >= &min_size
new 9: where a.sharable_mem >= 20
old 10: and a.executions >= &min_exec
new 10: and a.executions >= 10

Database: prod18 29-JUL-10 01:57pm


Report: dboc.sql OSM by OraPub, Inc. Page 1
Oracle Database Object Cache Summary

                                                  Obj   Exe  Size
Owner        Obj Name                        Type Loads (k)   (KB) Kept?
------------ ----------------------------------- ---- ----- ----- ----- -----
SYS PRVT_ADVISOR PBDY 0 2 121 NO
SYS STANDARD PBDY 0 0 28 NO
SYSMAN MGMT_GLOBAL PBDY 0 0 12 NO
SYS DBMS_APPLICATION_INFO PBDY 0 0 8 NO
SYSMAN MGMT_LOG PBDY 0 0 12 NO
SYSMAN EMD_LOADER PBDY 0 0 152 NO
SYSMAN MGMT_JOB_ENGINE PBDY 0 0 532 NO
SYS DBMS_AQ PBDY 0 0 28 NO
SYS DBMS_SQL PBDY 0 0 57 NO
...

SQL> l
1 select a.owner ownerx,
2 a.name namex,
3 decode(a.type,'PACKAGE','PKG','PACKAGE
BODY','PBDY','FUNCTION','FNC','PROCEDURE','PRC') typex,
4 a.loads/1000 loadsx,
5 a.executions/1000 execsx,
6 a.sharable_mem/1024 sizex,
7 a.kept keptx
8 from v$db_object_cache a
9 where a.sharable_mem >= &min_size
10 and a.executions >= &min_exec
11 and a.type in ('PACKAGE','PACKAGE BODY','FUNCTION','PROCEDURE')
12* order by executions desc, sharable_mem desc, name
SQL>

Figure 7-14. Shown is the OSM script dboc.sql, which is used to quickly identify large
and/or frequently executed shared pool objects. This example was run soon after the instance
restarted.

To keep a cursor in the shared pool, gather both its address and hash value from
v$sql, v$sqlarea, or perhaps v$open_cursor. Notice that in the following
code snippet the address (6877c238) and hash value (1356456286) are concatenated together
with a comma between them. The second parameter is a C, since we are keeping a cursor. For
keeping triggers the parameter is T; for sequences, use Q; and for packages, procedures, and
functions, the parameter is P.

SQL> exec dbms_shared_pool.keep('6877C238,1356456286','C');

PL/SQL procedure successfully completed.


The preceding code snippet can be used for programmatic structures as well, but most
find the following option easiest to use. The next code snippet is an example of keeping the
buy procedure owned by the trader schema.

SQL> exec dbms_shared_pool.keep('trader.buy');

PL/SQL procedure successfully completed.

Finally, upon issuing either one of the preceding code snippets, you can easily check to
ensure the object is indeed being kept. For example, Figure 7-15 (which does not show
cursors) will clearly show the trader.buy procedure with the Kept? column set to YES.

SQL> @dboc 0 0
old 9: where a.sharable_mem >= &min_size
new 9: where a.sharable_mem >= 0
old 10: and a.executions >= &min_exec
new 10: and a.executions >= 0

Database: prod18 29-JUL-10 02:10pm


Report: dboc.sql OSM by OraPub, Inc. Page 1
Oracle Database Object Cache Summary

                                                  Obj   Exe  Size
Owner        Obj Name                        Type Loads (k)   (KB) Kept?
------------ ----------------------------------- ---- ----- ----- ----- -----
SYS PRVT_ADVISOR PBDY 0 2 121 NO
SYS STANDARD PBDY 0 0 28 NO
SYSMAN MGMT_GLOBAL PBDY 0 0 12 NO
SYS DBMS_APPLICATION_INFO PBDY 0 0 8 NO
SYSMAN MGMT_LOG PBDY 0 0 12 NO
...
SYS SYS$RAWTOANY FNC 0 0 16 NO
TRADER BUY PRC 0 0 12 YES
SYS DBMS_LOGSTDBY PBDY 0 0 12 NO
...
234 rows selected.

Figure 7-15. Shown is a recently restarted instance with the trader.buy procedure kept
(pinned) in the shared pool. While kept, the procedure has not been executed yet, so the script
parameters had to be set to 0 for the buy procedure to display.

I’m frequently asked how often to refine the pin list. Personally, I don’t like to invoke
any database change until I have a very good reason. A good reason for refining your pin list
is if your system has suddenly started experiencing shared pool latch contention or has
encountered 4031 errors. This is very important: From a more proactive posture, refine the pin
list if application functionality is added, an application upgrade occurs, or application usage
significantly changes.
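When reviewing the pin list, it helps to see what is currently being kept. This is a minimal sketch against v$db_object_cache using its standard kept column.

select owner, name, type, sharable_mem
from   v$db_object_cache
where  kept = 'YES'
order  by sharable_mem desc;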


Flush the Shared Pool


While not on the top of anyone’s list, simply flushing the shared pool will bring some
immediate shared pool latch contention relief. This is especially true for pre-Oracle9i systems,
when subpools did not yet exist. This is obviously not an optimal solution, because every
object not pinned in the shared pool will be removed and its memory deallocated. The initial
result may seem counterproductive because it will likely result in immediate and massive hard
parsing, which as we know, consumes significant CPU resources and forces an unnatural
amount of latching. However, this unfortunate situation will soon subside.
There are times when the combination of shared pool size, Oracle release (pre-Oracle
9i), and application usage will leave the DBA with no choice but to plan periodic shared pool
flushes. That is simply the reality of the situation.
As the following code snippet shows, flushing the shared pool is very simple, yet the
effect is indeed significant.

SQL> alter system flush shared_pool;

System altered.

Increase the Number of Subpools


One of the easiest, most powerful, and most appropriate shared pool latching solutions is to
simply add subpools, which will also add shared pool latches. The process is detailed in the
earlier “From Hashing to Subpools” section.
This particular solution is very clean in that it requires minimal effort and we are not
gaming Oracle’s shared pool LRU algorithm. However, keep in mind that more subpools may
require more shared pool memory, an instance restart is required for the instance parameter
change to take effect, and Oracle reserves the right to not respect your recommendation.

Reduce the Shared Pool Size


As strange as this may sound, before subpools existed, increasing the shared pool size could
eventually result in shared pool latch contention. Every algorithm is limited in capability and
is designed to operate in specific situations. When the situation changes, the algorithm may
not perform as desired. And don’t forget that increasing a cache to support more activity will
almost always require more CPU resources to manage. So, there will likely be a point of
diminishing returns. Oracle’s initial shared pool memory management algorithm worked
fairly well up to a shared pool of around 600MB, but once it hit around 750MB, it was very
common for DBAs to begin seeing significant shared pool latch contention.
To illustrate my point, back in Oracle8i days, while speaking to about 300 Oracle E-
Business Suite DBAs, I took a survey. I asked the DBAs to raise their hand if they kept
increasing shared pool memory to reduce 4031 errors, but while the 4031 errors decreased,
they also saw a significant increase in shared pool latch contention. To the surprise of their
colleagues, about one-third of all the DBAs raised their hands! Upon further questioning, we
learned that for the Oracle Applications (as they were called then), the point of diminishing
returns was around a 700MB shared pool.
Once subpools were introduced, and especially when combined with the other solutions I’ve
outlined, shared pool latch contention can be successfully resolved.


4031 Error Resolution


If you want to rile an Oracle DBA, sneak up behind them and whisper, “4031.” A 4031 error
is Oracle saying to you, “I give up.” Perhaps there is an, “I tried and I’m sorry” just before
that, but it doesn’t offer much consolation and surely doesn’t stop the phone calls from angry
users.
Oracle has a tricky balancing act deciding when to give up and when to continue
consuming CPU and holding latches. Over the years, the likelihood of Oracle running out of
shared pool memory has decreased, but the chance of a 4031 error is still highly dependent on
the amount of Oracle shared pool memory and the application. The following is an actual
4031 error message:

ORA-04031: unable to allocate 4192 bytes of shared memory ("shared
pool","SELECT * F...","sql area (6,0)","kafco : qkacol"):4031:375:2008:ocicon.c

In this snippet, Oracle was trying to allocate 4KB of memory in subpool 6, but for some
reason, this could not be accomplished. Fortunately, there are a number of ways to reduce the
likelihood of receiving the dreaded 4031 error.

Flush the Shared Pool


As with resolving shared pool latch contention, one solution to 4031 errors is to flush the
shared pool. While no DBA wants to admit to periodically flushing a shared pool,5 this still
works. Depending on the Oracle release, the amount of allocated shared pool memory, and the
application’s unique memory usage pattern, it may be your best bet. This is especially true for
pre-Oracle 9i systems.

Increase the Shared Pool Size


Increasing shared pool memory conceptually provides Oracle with more flexibility in
satisfying memory requests. However, along with the benefits, there is always a cost when
shifting computing resources. In most cases, the benefits actually do outweigh the costs, so if
the operating system has the memory available, or if you can shift memory from other Oracle
caches to the shared pool, increasing shared pool memory is highly likely to reduce 4031
errors.
Keep in mind that any time you ask Oracle to manage more memory, more CPU is
required to manage that memory. This is especially true with pre-Oracle9i systems because of
the potential for extremely long memory chain heaps. If the chains are thousands of chunks
long, while 4031 errors may subside, the situation can manifest into serious shared pool latch
contention and massive CPU consumption while attempting to acquire the shared pool latch
and also while scanning the long chains—so be careful.
If you are being, as the Oracle documentation states, liberated by automatic memory
management, you may need to set a minimum shared pool size. In the quest to increase the
buffer cache, Oracle has been known to automatically decrease the shared pool size so much
that 4031 errors start occurring.
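Assuming an spfile is in use, explicitly setting shared_pool_size while automatic memory management is active establishes a floor below which Oracle will not shrink the pool. The size below is purely illustrative.

alter system set shared_pool_size = 500m scope=both;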

5
Remember that a shared pool flush is not a true flush. Pinned cursors and objects (think packages) will not be
flushed.


Increase the Shared Pool Reserved Size


A common memory allocation challenge occurs when a relatively large package is initially
being loaded into an already very active shared pool. The more active the shared pool,
especially if it is small and has highly diverse object sizes, the more likely the required
memory will not be found.
Suppose our server process needs memory for a large cursor. When Oracle is searching
for shared pool memory, if the object size is larger than the threshold, Oracle first searches in
the reserved area. If the memory is not found in the reserved area, Oracle will check in the
nonreserved area. This strategy helps to keep smaller objects out of the reserved area, thereby
preserving it for the larger objects.
Three instance parameters can be used in combination:
• The shared_pool_reserved_size can be used to directly set the reserved size
in bytes.
• The hidden parameter _shared_pool_reserved_pct, which defaults to 5 (for
5%), can be used instead of shared_pool_reserved_size.
• A relatively large object is defined by the instance parameter
_shared_pool_reserved_min_alloc, which defaults to 4,400 bytes.
Interestingly, notice that the default threshold value of 4,400 bytes is just larger than
the typical single chunk request of 4,096 bytes. So, by default, Oracle is saying that
any memory request larger than one typical memory chunk request is to be considered
large and therefore should get its memory from the reserved area.
Either of the first two parameters can be used to set the memory size exclusively
reserved for the relatively large objects. If you set one of the parameters, Oracle calculates the
other.6
By carefully adjusting these parameters, the performance analyst can increase the
likelihood of a process finding a large amount of memory, while still maintaining plenty of
memory for the relatively smaller objects. A report based on v$db_object_cache like the
one shown earlier in Figure 7-14 can be very helpful to determine both the threshold size and
the reserved memory size. While these parameters are typically not adjusted, if 4031 errors
occur, their careful adjustment may fix the problem.
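Oracle also exposes reserved area statistics directly through the v$shared_pool_reserved view. A quick sketch (exact columns vary a bit by release); a nonzero request_failures value means reserved area memory requests have failed, which is a strong 4031 warning sign:

SQL> select free_space, requests, request_misses, request_failures
  2  from v$shared_pool_reserved;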

Minimize Cursor Pinning Duration


When a cursor is being executed, it is also pinned. After all, you don’t want your SQL
statement to suddenly disappear during execution! That’s the good news. The potentially bad
news is that when the execution has completed, the pin is released. If no other process has the
cursor pinned, Oracle is free to destroy—that is, deallocate—the associated memory. Now
suppose someone wants to reexecute the cursor. If it has been deallocated, a hard parse
results, since the entire cursor will have to be rebuilt! Every application usage pattern is
unique, so when combined with a perhaps smaller shared pool or a lot of unique SQL
statements, or both, the memory management and the library cache activity can become
incredibly intense. One way to reduce hard parsing is to keep the cursors pinned so they
cannot be deallocated.

6 In case you’re wondering, you can set both (although this is not recommended). If you set both, Oracle will
probably respect the direct setting via shared_pool_reserved_size.


Oracle provides a special instance parameter that will keep all cursors for all sessions
pinned until the cursor is closed. But this benefit comes at the cost of increased shared pool
memory consumption, and therefore, the increased likelihood of receiving a 4031 error.
Oracle is very aware of this, so to encourage deallocation to free up memory and decrease the
likelihood of 4031 errors occurring, the cursor_space_for_time instance parameter is
set to false by default.
If a system is experiencing 4031 errors, you should always check the value of
cursor_space_for_time. If your system has experienced severe shared pool latch
contention at some point in the past, someone may have understandably set
cursor_space_for_time to true. While you may not decide to set the parameter to
false, it is a valid option and should be seriously considered.
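The check takes only a moment. A sketch (output illustrative):

SQL> select name, value from v$parameter
  2  where name = 'cursor_space_for_time';

NAME                   VALUE
---------------------- ------
cursor_space_for_time  FALSE

Keep in mind this parameter is not dynamically modifiable, so changing it requires an instance restart.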

Reduce Kept Objects Memory Consumption


If too many objects have been forced to remain in the shared pool by issuing the
dbms_shared_pool.keep procedure, they may be consuming so much memory that
Oracle simply may not be able to successfully manage what remains. Also, if a large object is
not kept in the shared pool and the instance has been running for a while, when the object is
finally referenced and must be loaded, the memory may not be available. The key is to not
casually keep objects in the shared pool. You should intelligently reduce and refine your pin
list, as described earlier in the “Resolving Shared Pool Latch Contention” section.
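To see which objects are currently kept and how much sharable memory they consume, a query like this sketch against v$db_object_cache can help. Objects that no longer justify their footprint can then be released with dbms_shared_pool.unkeep:

SQL> select owner, name, type, sharable_mem
  2  from v$db_object_cache
  3  where kept = 'YES'
  4  order by sharable_mem desc;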

Upgrade to Oracle Database 10g Release 2


Surely, 4031 errors are not the only reason to upgrade, but starting with Oracle Database 10g
Release 2, Oracle began standardizing memory into 4KB chunks.7 While I would never
recommend upgrading to this version only because of this improvement, it may be part of the
rationale to upgrade.
Just as with standardizing on segment extent sizes, having standard memory chunk sizes
increases the likelihood of quickly finding suitable memory. The quicker memory can be
found, the fewer CPU cycles will be consumed, the less time the shared pool latch must be
held, and the less likely a lot of wasted small chunks of memory will exist (increasing the
likelihood of a 4031 error).

In-Memory Undo Management


Oracle itself continues to add new performance features to the core kernel. Sometimes these
take the form of new features like a new SQL optimization path. Other times, it means
replacing an existing function or algorithm that many kernel developers and most DBAs will
be unaware of. One such optimization is the introduction of Oracle’s patented in-memory
undo (IMU). Essentially, instead of maintaining undo in Oracle segments, the undo is
managed, as much as possible, in memory using structures optimized for in-memory
operations. But as you will learn, how Oracle does this is fascinating and foreshadows even
greater things to come. But with any piece of code, there is always the possibility of a

7 Oracle obviously is keeping its options open; for example, the Oracle Database 11g result cache memory
chunks are 1KB.


bottleneck, so I’ll cover how to detect IMU performance issues and multiple ways to solve the
problem (in addition to just turning it off).

New Features Bring High Risk


Think about this for a second: If you ran a massive software company, would you sleep well
at night knowing that the next morning thousands of your customers would start using a
brand-new algorithm in your core product, while your employees have little real-world
experience with the potential problems it may cause or what the solutions may be? It’s not
something most people consider, but over and over Oracle management must face this
dilemma. So, before Oracle, or any software company for that matter, allows a new algorithm
into its core bread-and-butter multibillion-dollar-brand product, it had better know that it
works. And not just work, but work wonderfully. While CEOs may desire their picture on the
front page of the Wall Street Journal, that’s an IT manager’s nightmare.
Oracle has a history of silently slipping core algorithm changes into its database server
code. For example, in Oracle 7, the Oracle developers started instrumenting their code,
resulting in the wait interface; the buffer cache LRU algorithm dramatically and silently
changed to the touch-count algorithm in Oracle8i Release 8.1.5; mutexes began being used in
Oracle Database 10g Release 2; and as I’ll detail in the next chapter, Oracle began using
multiple redo allocation latches in Oracle9i Release 2. None of these changes was formally
announced, and some still have not been announced or documented. If you haven’t heard
much about these changes, then Oracle developers did a good job of testing their code before
it was released.
IMU is one such introduction. It became standard issue starting with the initial Oracle
Database 10g release. Oracle knows that for many customers, segment management and, in
particular, creating read-consistent buffers, consumes significant computing resources:
memory, CPU, and IO. So if the developers can reduce one or all of these consumables while
improving performance, they are thrilled. IMU is wonderful, and it sets the stage for
transforming other traditionally segment-based operations into in-memory alternatives.

The Problem: Segment Management


Oracle uses undo segments for two main purposes:
• To allow a transaction to roll back so the database looks just as it did before the
transaction started.
• To provide read consistency. By default, no matter when a query starts and ends, the
results will match what the database looked like when the query started.
As I noted in Chapter 6, this read-consistency model consumes a tremendous amount of
resources.
Segment management is resource-intensive. Needing to manage block-based segments
in memory is clearly not optimal. It makes more sense to do all the in-memory work using
highly optimized structures designed for in-memory operations, and then, when necessary,
transform the in-memory structures into traditional segment structures. As an example, buffer
headers are highly optimized in-memory structures directly related to cached blocks.
In many ways, traditional undo segments are treated like table and index segments. All
the classic operations must be performed. For example, changes to a segment must be
recorded in the redo stream for recovery and the LRU chains and the write (dirty) lists must


be updated to ensure smooth and proper data management. And don’t forget about all the
associated latching and pinning that must occur, not to mention all the associated CPU cycles
to make this happen. Do these operations really need to be done using classic segments? Not
according to Oracle.
Clearly, there exists an opportunity for increased performance. But as I mentioned, it’s
risky and time-consuming to develop, test, and implement a radical change. It requires
everyone to change the way they think about Oracle undo management. But once you cross
this threshold, you’ll begin to see amazing performance improvement possibilities.

Introducing In-Memory Undo


With IMU, Oracle has shifted as much undo work as possible to using optimized in-memory
structures instead of traditional segments. Oracle still creates undo, because it must still
provide rollback and read-consistency capabilities. This has many implications.
Whenever an Oracle buffer is changed, the associated change (called a redo vector) is
consolidated with other redo vectors to form a single redo entry, and then written into the redo
log buffer. Unfortunately, even if an undo segment buffer is changed, its change must also be
recorded in the redo log buffer. But since IMU nodes are not undo segments, their changes do
not generate redo!8 So, IMU will reduce the amount of redo an instance generates. This
impacts all redo-related operations, including redo creation, copying redo into the log buffer,
and writing the buffered redo to an online redo log. If you have throughput-related redo
issues, IMU can help.
By default, Oracle provides sessions a transactionally consistent view of the database.
While this is a wonderful feature for most business applications, it involves significant
overhead. If you recall from Chapter 6, when an undo block is needed but does not reside in
the buffer cache, it must be read from disk and placed into the buffer cache. This requires both
CPU and IO resources. However, if the undo information is located in an IMU node, the
associated read-consistency overhead is significantly reduced. So, if you have a lot of
read-consistency-related issues, IMU can help.
Using IMU instead of traditional undo segments does not present a recovery problem for
many reasons,9 including the following:
• IMU structures do not need to be recovered, because they are not segments.
• A database writer will never write a changed buffer to disk unless the associated redo
information has already been written to an online redo log. (A database writer can
trigger the log writer to flush the redo log buffer.) This means Oracle will not permit
the situation where a changed on-disk block exists without the associated redo written
to an online redo log. If an instance or media failure occurs, any changed on-disk block
will have its recovery information in the redo stream. So, a buffer change is not a
recovery issue; it’s the change to an on-disk block that is a recovery issue, and Oracle
has already dealt with this.
• If a failure occurs during a read-consistent operation, it simply stops without any
committed data loss.

8 Some redo is generated for serialization purposes, but it is minimal.
9 Well, almost never. As of Oracle Database 11g, when features like streams, data guard, log miner, and
flashback are used, Oracle disables IMU. Apparently, some serialization issues have caused recovery problems. I
suspect this limitation will be resolved over time.


So, IMU nodes provide all the undo-related capabilities without any of the standard
segment management overhead. Brilliant!
Another amazing feature of IMU is that when it comes time to transform the in-memory
undo into undo segment format, multiple IMU nodes can be collapsed or consolidated to
minimize the amount of undo segment data. For example, suppose a transaction updates a
buffer at three different times: T1, T2, and T3. Using traditional undo segments, this will
cause three changes to the same undo buffer, complete with all the overhead associated with
changing a buffer. With IMU in use, while three IMU entries are created, when it comes time
to update the actual undo segment buffer, the three IMU nodes are collapsed to reduce the
amount of undo. Plus, the IMU nodes can remain in memory, so read-consistent queries can
access the IMU nodes directly, instead of working with the traditional undo segments.
So as you can see, not only are IMU nodes holders of undo information, but they are
also a mapping structure between a changed buffer and the undo. In a way, they intercept the
traditional pointer from a buffer (located in the interested transaction list) to the undo block
holding the change. When Oracle writes the IMU to the appropriate undo segment and flushes
the IMU out of shared pool memory, the undo pointer is changed to point to the undo
segment.
IMU structures reside—you guessed it—in the shared pool. As we know, when we care
about memory structure access serialization, a control structure is required. Oracle chose to
create a number of IMU latches, which, as you will learn soon, allow for the possibility of
IMU latch contention.

How IMU Works


With a couple of good figures for reference, understanding how both traditional undo and
IMU work is very straightforward (at least at this abstracted level). Visually comparing Figure
7-16 (traditional undo management) and Figure 7-17 (IMU management), you can
immediately see a significant difference. In Figure 7-17, notice the absence of undo segment
activity until the transaction commits at time T4. And just as important, the multiple IMU
nodes have been consolidated (officially called collapsed, not deallocated) into a single
undo segment buffer, 310. Here, I will walk you through this process.

Traditional Undo Management


Using Figure 7-16 as our reference, let’s take a close look at how traditional undo
management works.
• Time T1: Buffer 142 contains row 136, which has a column value of 105.
• Time T2: A transaction changes row 136 from a value of 105 to 110. This results in
three key operations. First, the buffer (remember this is done in the buffer cache and
not on disk) is changed. Second, the undo must be recorded in an undo segment buffer
(remember this is done in the buffer cache and not on disk), and the undo link in buffer
142’s interested transaction list (ITL) must point to undo buffer 210. And finally, the
redo associated with each buffer’s change must be recorded in the redo log buffer. All
of these operations are performed in memory, but if a buffer is not in memory, it must
first be retrieved from disk, a free buffer found, and the buffer replaced. And of course,
all the associated latching, pinning, and memory and IO management must occur.


• Time T3: The transaction once again changes row 136. But this time, from a value of
110 to 115. First, buffer 142 is changed. Second, the buffer change is recorded in the
undo buffer, and the buffer’s ITL is updated to point to the most current undo entry in
undo buffer 210. Third, the changes for both buffer 142 and the undo buffer 210 are
copied into the redo log buffer. Notice the chain of undo has developed, allowing the
transaction to roll back to time T1 and also allowing others to see row 136 at time T1
as well.
• Time T4: The transaction commits. Buffer 142’s ITL is changed, reflecting the
commit, but for simplicity, I did not show the associated undo. The commit redo entry
is written into the redo log buffer for possible recovery needs. The commit triggers the
log writer to flush the redo log buffer to the current online redo log.
At time T4, row 136’s value is 115. Any access to this block that started at time T4 or
later will see a value of 115. However, even after the transaction commits, if a query started
before time T4, a read-consistent view of buffer 142 must be created using buffer 142 at time
T4 and undo buffer 210. I am leaving out quite a bit of detail (covered in Chapter 6), because
the point I want to make here is that all this read-consistency work is referencing standard
Oracle undo segment buffers, not highly optimized in-memory structures.

Figure 7-16. How traditional undo segments are used, and how the various key components
(data buffer, undo buffer, redo buffer, and online redo log) change as a transaction
progresses


IMU Management
Referencing Figure 7-17, let’s take a close look at Oracle’s patented (# 6,981,004) IMU
scheme:
• Time T1: For space reasons, time T1 is not shown in Figure 7-17, but it’s the same as
in Figure 7-16. At time T1, buffer 142 contains row 136, which has a column value of
105.
• Time T2: A transaction changes row 136’s value from 105 to 110. This results in three
key operations. First, the buffer is changed (no change from traditional undo
management). Second, an IMU node is set to reflect the undo associated with the
change, and the undo link in block 142’s ITL points to the IMU node. IMU nodes
reside in the shared pool, and they must be accessed using one of the IMU latches.
They are managed using an LRU algorithm, although Oracle can artificially age the
nodes and change IMU memory allocation. Third, the redo associated with the buffer
142 change must be recorded in the redo log buffer. Notice that since there was no
undo buffer involved (so obviously it was not changed), there is no associated undo
buffer redo, nor any of the potentially associated segment management overhead.
• Time T3: The transaction once again changes row 136 from a value of 110 to 115.
First, buffer 142 is changed. Second, an IMU node is set to reflect the undo associated
with the change, and buffer 142’s ITL is updated to point to the most current IMU
entry. Third, the buffer 142 change is recorded in the redo log buffer. Again, there is
no undo buffer change, so no undo is written into the redo log buffer. Notice the chain
of undo has been developed entirely of IMU nodes, allowing the transaction to roll
back to time T1 and also allowing others to see row 136 at time T1.
• Time T4: The transaction commits. Buffer 142’s ITL is changed, and the commit redo
entry is written into the redo log buffer for possible recovery needs. A commit triggers
the log writer to flush the redo log buffer to the current online redo log.
The IMU nodes are not required to be flushed, but can remain in memory just as long as
Oracle wants them to stay. When Oracle does replace or destroy the IMU nodes, buffer 142’s
ITL entry will simply be changed to point to undo segment buffer 310 instead of the IMU
node. But until that happens, any transaction needing a view of buffer 142 before time T4 will
reference the IMU! So, the read-consistency operation (which includes buffer copying/cloning
and undo application) can be done entirely in memory, using IMU nodes, and without the
associated undo block/buffer overhead. A cloned block is still a buffer and must endure the
traditional cloning activity of having undo applied, but the undo is coming from the IMU and
not undo segment blocks.


Figure 7-17. How IMU nodes are used instead of traditional undo segments, and how the
various key components (data buffer, IMU nodes, undo buffer, redo buffer, and online redo
log) change as a transaction progresses

A Marked Performance Improvement


Although multiple bugs were logged when IMU was first introduced, it now works
wonderfully. But the real question and our chief concern in this book is this: Does it improve
response time? This is especially important for applications with heavy consistent read
activity, since these use undo extensively.
I decided to perform an experiment. Before I share the results, I’ll explain how I set up
the experiment. Take a deep breath, and continue.
The server is a single Intel CPU/core Dell box with 512MB of memory, running Oracle
Database 11g Release 1, with 256MB of shared pool memory, a buffer cache of 4MB (yes, I
know it’s small, but it forces a lot of IO), and running on Oracle’s Unbreakable Linux 5. Each
of the two main tests (IMU enabled and IMU disabled) was run nine times. The worst time
was removed from each series, resulting in eight samples. Before each of the two tests, the
instance was recycled, and the DML load was started and left to stabilize for about 5 minutes.
Then a heavy read-consistent query was run four times. CPU and wall clock time statistics
were collected before and after each query; that is, the wall clock time is the time at the end
of the last query minus the time just before the first query.
For this particular experiment, there was a statistically significant 21% CPU time
reduction when using IMU. However, there was a statistically insignificant 6.5% wall clock


time reduction when using IMU. This means, for this particular experiment, there is not quite
enough wall clock time difference to indicate using IMU made an improvement.
Unfortunately, based on this experiment, the significant IMU CPU savings was offset by
an insignificant average wall clock time reduction. The experiment was not designed to detect
why the wall clock time was not significantly better when using IMU. But I’ll speculate:
There was a raging DML and physical read-induced IO bottleneck with plenty of excess CPU
capacity. So while IMU reduced the CPU consumption, the wall clock time was primarily
based on IO. In other words, I reduced Oracle’s CPU consumption when there was plenty of
available CPU—the classic performance blunder of tuning the wrong thing! However, if the
bottleneck were CPU, I should have seen a very significant reduction in both CPU and wall
clock time.
It is interesting to note the IMU results varied less compared with the non-IMU results;
in other words, when using IMU, query times were more consistent. I suspect it was because
there was less segment management, which sometimes requires IO, and IO times can vary
widely. Consistent response time is something users enjoy. If you want to irritate a user, give
them really fast response times combined with really slow response times.
The experimental results also show that although IMU operations are performed in
memory, and memory management consumes CPU resources, CPU resources were
significantly reduced when using IMU. If you think about that, it’s actually quite an amazing
accomplishment.
Based on my understanding of IMU, the experimental results, and Oracle performance
analysis in general, unless there are kernel issues with IMU (which there have been), I
would always keep IMU enabled. With IMU enabled, you can expect Oracle to consume
significantly less CPU.

IMU Setup and Monitoring


IMU setup and monitoring is quite simple because Oracle automatically enables IMU by
default. Only eight instance parameters directly affect IMU, and IMU activity is recorded just
like any other Oracle internal activity.

Setting Up IMU
If you want to make changes, carefully consider their effects, as all direct IMU instance
parameters are currently hidden parameters. Eight instance parameters are used to control or
affect IMU operations, as listed in Table 7-2.


Table 7-2. IMU instance parameters

processes
Sets the maximum number of Oracle connections. The greater this value, the greater the
number of IMU latches. Multiple tests show that for around every 8.25 processes defined,
one IMU latch is created. For example, 800 processes led to 97 latches, 600 processes to 73
latches, 400 processes to 48 latches, and 200 processes to 24 latches. Changing the
processes parameter has a cascading effect, since many other Oracle parameters are also
based on the processes parameter.

_in_memory_undo
Enables or disables IMU. By default, this is set to true. To turn off IMU, set this parameter
to false.

_imu_pools
Sets the number of IMU memory pools. The default on some systems is 3. Increasing the
number of pools does not imply a change in the memory Oracle will allocate to IMU.

_recursive_imu_transactions
Enables all recursive SQL to use IMU. The default is false.10

_db_writer_flush_imu
Allows Oracle the freedom to artificially age a transaction for increased automatic cache
management. The default is true.

compatibility
Must be greater than or equal to 10.0.

undo_management
Must be set to auto to enable automatic undo management and subsequently provide the
ability to enable IMU.

cluster_database
Must be set to false. At the time of this writing, RAC does not support IMU. I suspect the
algorithms to support RAC IMU are much more complicated and could induce all sorts of
bizarre dependency and ordering (think redo) situations.
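Because most of these are hidden parameters, they will not appear in v$parameter. If you need to verify their current values, a sketch like the following works against the undocumented x$ksppi and x$ksppsv fixed tables (run as SYSDBA; being undocumented, their structure is an assumption that may vary by release):

SQL> select p.ksppinm name, v.ksppstvl value
  2  from x$ksppi p, x$ksppsv v
  3  where p.indx = v.indx
  4  and (p.ksppinm like '%imu%' or p.ksppinm = '_in_memory_undo');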

Monitoring IMU
There are a few ways to monitor IMU activity. (Detecting and resolving performance
problems are covered in the next section.)
You cannot directly control the memory Oracle allocates to the IMU buffers. However,
you can possibly indirectly influence Oracle’s IMU memory allocation by altering the shared

10 It always bothers me when Oracle doesn’t use its own features. Are they trying to tell us something?


pool size by changing, for example, the shared_pool_size instance parameter. The
current IMU memory allocation can be easily queried, as shown in the following code
snippet:

SQL> select * from v$sgastat where name like 'KTI-UNDO';

POOL         NAME                            BYTES
------------ -------------------------- ----------
shared pool  KTI-UNDO                      1647168

Like many other Oracle statistics, IMU activity is recorded via the statistics views
v$sysstat and v$sesstat. It’s also interesting to watch the change between
standard commits and IMU commits. If you see IMU commits increasing, you know IMU is
active. The following is an example comparing IMU and user commits.

SQL> select name,value from v$sysstat where name like '%commits';

NAME VALUE
------------------------- ----------
user commits 1,283
IMU commits 1,676

2 rows selected.

It’s simple to monitor latches. One way to do this is to look at the latch requests (gets). If
the latch is being requested, then you know it’s active. Figure 7-18 shows an example taken
from a small Linux server with a single CPU core and 200 processes defined. Notice that the
approximately 8.25:1 process-to-latch ratio is in effect.


SQL> l
1 select name, gets, misses, immediate_gets IM, sleeps
2* from v$latch_children where name like 'In%undo%'
SQL> /

NAME                    GETS  MISSES     IM SLEEPS
-------------------- ------- ------- ------ ------
In-memory undo latch 0 0 0 0
In-memory undo latch 0 0 0 0
...
In-memory undo latch 0 0 0 0
In-memory undo latch 0 0 0 0
In-memory undo latch 7623 0 3815 0
In-memory undo latch 33806 0 17437 0
In-memory undo latch 2919 0 478 0
In-memory undo latch 21624 0 356 0
In-memory undo latch 13457 0 288 0
In-memory undo latch 50 0 5 0
In-memory undo latch 168791 0 28 0
In-memory undo latch 2405 0 135 0
In-memory undo latch 16224 0 4505 0

24 rows selected.

Figure 7-18. While 24 IMU latches have been created, at the time of this report, only 9 of
them have been used, as evidenced by their gets activity.

IMU Contention Identification and Resolution


Identifying IMU latch contention is no different than identifying any other type of latch
contention. If you’re not comfortable with this, please review Chapter 3.
If IMU is causing problems, on Oracle Database 10g and later systems, you will see the
wait event latch: In-memory undo latch as your top wait event. Plus your database
server CPUs are probably so hot due to server processes spinning trying to acquire one of the
IMU latches that they are ready to explode. So pinpoint diagnosis is not the tricky part.
As with most latches, there are many possible ways to resolve the problem and optimize
performance. Here is a list of possible solutions:
• Turn off IMU. Set _in_memory_undo to false. I’ve even seen Oracle support
recommend this. It’s a shame if you have to do this, but if it turns out there is a serious
problem with the kernel code or how Oracle interacts with the operating system, you
may need to completely disable IMU (and hope there is a patch). An example of this
change follows this list.
• Increase CPU power. Increase the number of CPU cores, their speed, or available
CPU cycles. The goal here is to give Oracle processes more CPU cycles. By “available
CPU cycles,” I mean to look at all operating system processes (which includes Oracle
processes) and see if you can eliminate them, reduce their impact, or have them run
during nonpeak times (that is, do some workload balancing).
• Increase the number of IMU latches. Keep in mind that just because the number of
IMU latches increases, this does not mean Oracle will actually use them. Run a query
like the one shown in Figure 7-18 to better understand the true usage situation. Here
are a couple ways to increase the number of IMU latches:


• Increase the instance parameter processes. Be careful, as this parameter
affects many other instance parameters.
• Increase the number of IMU pools by increasing the _imu_pools parameter, or
increase the shared pool size by increasing the shared_pool_size
parameter. These changes only encourage Oracle to create additional latches, so
you may want to do some testing by running a script similar to that shown in
Figure 7-18.
• Influence IMU memory. You can influence an IMU memory increase by increasing
the shared pool size or by shifting other objects out of the shared pool. For example,
you could set cursor_space_for_time to false and reduce the number of
kept shared pool objects. Be careful! Increasing memory could also increase IMU
activity, generating even more stress on the latches. Decreasing memory may
decrease latch activity, but it is also likely to increase IMU node-collapsing activity,
resulting in more traditional undo block activity, which can also result in increased
IO. So before you start attempting to influence IMU memory, give it some serious
thought, talk with your colleagues, and do some testing.
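If you do settle on the first option and disable IMU, the change itself is a one-liner. A sketch; because _in_memory_undo is a hidden parameter, the double quotes are required, and setting it in the spfile followed by an instance restart is the conservative approach (checking with Oracle support before touching any underscore parameter is always wise):

SQL> alter system set "_in_memory_undo"=false scope=spfile;

System altered.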

Summary
After reading this chapter, you should have a much better appreciation for the massive
technical challenge Oracle undertook and continues to undertake in caching objects of
various sizes, durations, and access patterns, and in understanding their relationships. Of
course, this complexity has led to a lot of pain for everyone involved: users, performance
analysts, Oracle support, and I’m sure the Oracle kernel developers. But as with most pieces
of software, if you’re serious about improvement and expanding the scope, it can be done.
Oracle has made tremendous strides in the shared pool. As a result, for every problem a
performance analyst is likely to encounter, there are a number of solutions. In fact, Oracle has
taken this so seriously that some of the solutions can be invoked with an alter system or
even an alter session command. In the coming years, I suspect we’ll see even greater
things occur in the shared pool.



CHAPTER 8

Oracle Redo Management Internals

I’ll never forget this: I had just accepted Oracle’s offer to join their consulting organization,
and an old guy1 at my current workplace asked me very seriously, “Why would you want to
work for Oracle? Their database is so slow.” While I don’t remember my response, I do
remember thinking, “I don’t really care. I just want out of this place!” But as I came to learn,
Oracle was slow, and that was because it successfully manages massive amounts of data. My
previous employer’s database (which was really a glorified virtual storage access method file
manager) had no concept of read consistency or rolling a transaction back, so it commonly
had corrupted files, which meant that we needed to take the database offline and try to rebuild
the files. I learned that data management capabilities come at a price, both financially and in
performance.
If Oracle systems did not need the capability to roll back, provide read-consistent views
of data, or quickly recover, both undo and redo could be eliminated. Oracle systems would
fly! But we would be spending all of our time trying to recover lost data. My point is that data

1 This old guy was shockingly younger than I am now.


management requires computing resources. In this chapter, we are concerned about one of the
largest database management resource consumers: redo generation.
In my performance firefighting classes, I leave the redo log buffer section for the very
end. I know that if the class moves slowly, the one section I can appropriately and quickly
move through is the one on redo internals. The reason is that once you are comfortable with
buffer cache and shared pool internals, the redo log buffer technology, architecture, and
diagnosis seem straightforward. The real challenge with redo is not in diagnosing and
analyzing the problem, but in finding the solution, because redo-related solutions can be very
difficult to implement. It is common to require the IO subsystem capacity to be increased, as
well as to make significant application architectural changes. Both can be excruciatingly painful.
My objective in this chapter is to clearly present the relevant architectural pieces of the
redo log buffer, the performance challenges we commonly face, and how we can alter the
situation in our favor. Because the diagnosis is fairly straightforward, we’ll spend more time
on the solutions, which is where you’ll be spending most of your time when redo becomes the
point of contention.

Buffer Cache Changes


Every commit issued2 and every current-mode buffer change (consistent read buffers do not have
their changes recorded) must be recorded for recovery purposes. And the moment we are
concerned with recovery, serialization becomes of critical importance. It’s the combination of
change recording and serialization that can force Oracle to place tremendous requirements on
the operating system. When redo-related operating resources become scarce, response time
begins to skyrocket, and the contention can be easily diagnosed by performing a wait-event
analysis or a response-time analysis. If you perform the OraPub 3-circle analysis, within a few
minutes, the performance story will be blatantly apparent.
The rule of thumb is that any buffer change results in redo generation. While cloned
buffers are the exception, this rule of thumb will help in understanding redo generation
and flow. There are two surprising examples of recording change that drive this point home.
The first is associated with undo-related redo, and the second is query-related redo. But before
we dive into the interesting situations, we need to discuss just what is actually recorded.

Just Enough Redo Is Generated


Oracle is obviously motivated to minimize the amount of redo generated. Every byte saved is
one less byte of memory management, and also one less byte written to the IO subsystem.
Buffer changes, called change vectors by Oracle kernel developers, are bundled into redo
records (also called redo entries), which are copied into the redo log buffer and then into one
of the online redo log groups by the log writer background process.
Developers tend to look at database changes from the table or row perspective. DBAs
tend to look at changes from an Oracle block or buffer perspective. But Oracle kernel
developers are not constrained by object, relational, or normalization concepts and their

2 Simply issuing a commit command when no object change occurred will not trigger the log writer to write.
This is easy to test. Create a tight PL/SQL loop containing two lines: a commit and a 1-second sleep. Let this run
while tracing the log writer background process at the operating system level. You will not see the log writer
issue a write request. However, if you add a third line into the tight PL/SQL loop that performs some type of
subsecond DML, after 60 seconds you will see the log writer issue about 60 write calls.


implementations. To an Oracle kernel developer, a database change is represented as one or
more bytes. Oracle does not record the changed table, row, extent, or block. It simply records
just enough information to, should it be necessary, redo the change. A kernel developer once
told me, “Craig, you have got to stop thinking relational. Think bytes.”

Undo-Related Redo
One of the most surprising revelations when learning about Oracle internals is that, because all
current buffer changes must be recorded, when an undo segment buffer changes, its change
must also generate redo and therefore be recorded!
Keep in mind that redo is used for recovery. If we didn’t need to recover, then redo
would be a complete waste of resources. So, why would we ever need to recover undo
segments? When you first learn about Oracle recovery, you come to understand there are two
core phases to the recovery process: first, the system is rolled forward, and second, the system
is rolled back. The other key nugget of information is that before the database writer will
make a change to a block residing on a database (.dbf) file, that change absolutely must
have already been written in an online redo log. Oracle must ensure any block change has
already been recorded in the redo stream in case recovery is required. If this did not occur, an
unfortunate recovery situation could exist where there is no record of a block change.
Part of the rolling forward process is re-creating the undo segments. Just like data and
index segments, undo segments are rolled forward during the first recovery phase. When the
roll forward process has completed, the database file blocks contain both committed and
uncommitted changes. This also means the re-created undo segments contain information
about both uncommitted (active) and committed (inactive) transactions. Remember the
previous chapter’s discussion about the transaction table? Each undo segment header block
contains the transaction table, which has a record of all the transaction information contained
in its associated undo segment. The recovery process accesses this information, and then
proceeds to roll back all uncommitted transactions. When this rolling back portion of the
recovery process is complete, the database can open in a consistent, point-in-time state.
The implication of this is that changes to undo segments must be recorded in the redo
stream. So when an undo segment’s cached block is changed, redo is generated. Put in a fuller
context, when a row changes, its buffer is changed, and the associated redo is generated. And
to enable rollback and read-consistency capabilities, an undo buffer is also changed,
generating redo related to the undo buffer change. As I said, data management creates a
tremendous amount of overhead.

Query-Related Redo
Personally, I find it fascinating that, within Oracle, it is possible for a query to produce redo.
In the previous chapter’s discussion of the interested transaction list (ITL), I conveniently
glossed over the fact that when a block cleanout occurs and the buffer changes, redo is
generated. All it takes to trigger a cleanout (normally called a delayed block cleanout) is for
the buffer to be touched by any session after a transaction commits. This causes the buffer to
be cleaned out and the ITL flag to change from --U- to C---. The session that touches the
buffer could be accessing the buffer as a result of a query. So a query can generate redo!
I understand that this is difficult to believe, so I created a simple script to demonstrate it,
as shown in Figure 8-1. The script is executed while other sessions are making changes to the
customers table. First, the script records the redo associated with my connected session.


The v$mystat view’s redo size statistic shows how much redo in bytes a connected
session has generated since connection.

$ cat ./select_redo.sql
def redo_size=133
select sn.name,my.value
from v$mystat my,
v$statname sn
where my.statistic#=sn.statistic#
and my.statistic#=&redo_size;
exec dbms_lock.sleep(5);
select count(*) from customers;
select sn.name,my.value
from v$mystat my,
v$statname sn
where my.statistic#=sn.statistic#
and my.statistic#=&redo_size;

Figure 8-1. Shown is a script that will report a session’s redo generation since connect, sleep
for 5 seconds, touch every block in the customers table, and then again report on a
session’s redo generation since connect. The redo generation statistic (redo size) number
depends on the Oracle release. In this script, the redo size statistic number is 133. Of
course, the where clause could have filtered based on the statistic name.

The redo size statistic number depends on the Oracle release, so I always double-
check. This script was run on an Oracle Database 10g Release 1 system, which assigns the
redo size statistic as number 133. In Oracle Database 11g Release 1, the redo size
statistic number is 140. While Oracle does occasionally change a statistic number, changing
the name is even less likely. Therefore, the query in Figure 8-1 could easily have filtered on
the statistic name instead of its number.
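For example, a release-independent variant of the Figure 8-1 queries could filter on the name, as in this sketch:

SQL> select sn.name, my.value
  2  from v$mystat my, v$statname sn
  3  where my.statistic# = sn.statistic#
  4  and sn.name = 'redo size';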
After the initial redo bytes-generated value is reported, the session sleeps for 5 seconds
while other sessions are modifying the customers table. When the session wakes up, it then
performs a full-table scan on the customers table, which touches every block in the table,
and then once again gathers how many redo bytes the session has generated. If the
customers table had changes committed while the session slept, and the session was the
first to touch the changed buffer containing at least one committed row, we would expect to
see redo associated with the session. Figure 8-2 shows what actually happened.


SQL> @./select_redo.sql
old 5: and my.statistic#=&redo_size
new 5: and my.statistic#=133

NAME VALUE
-------------------------------------------------------------- ----------
redo size 441628

PL/SQL procedure successfully completed.

COUNT(*)
----------
2691136

old 5: and my.statistic#=&redo_size
new 5: and my.statistic#=133

NAME VALUE
------------------------------------------------------------- ----------
redo size 881952

SQL>

Figure 8-2. Shown is the result of a session querying a table (2,691,136 rows), which
generated 440,324 bytes (881,952 – 441,628) of redo.

Figure 8-2 shows what we hope to see. The session initially generated 441,628 bytes of
redo. Next, it slept for 5 seconds, and then it touched all customers table blocks by doing a
full-table scan. It’s important to understand that the session did not change any customers
rows. However, other sessions had made changes, and those changes were committed. It just
happened that this session was the first session to touch customers table blocks after one of
the block’s rows had been committed. As a result of the session touching the buffer, block
cleanouts occurred, generating redo associated with my session. The ending redo bytes
generated value was 881,952, which clearly demonstrates that by simply querying the
customers table, my session generated 440,324 bytes of redo!
Some DBAs have experienced this seemingly strange phenomenon. Their story is very
similar: During nightly batch processing, many blocks were changed. When the employees
arrived in the morning and started querying the database, the DBA noticed a lot of redo being
generated. So, what may seem like an anomaly makes perfect sense after all.

Redo Log Buffer Architecture and Algorithm


Compared to the buffer cache and shared pool memory structures, the redo log buffer is very
straightforward. While Oracle continues to enhance the redo log buffer architecture, the
central idea has remained the same: buffer and batch redo changes to improve log writer
background process efficiency and enable extremely quick commit times.
Figure 8-3 is a highly abstracted view of the redo log buffer and is more accurate for
Oracle releases prior to Oracle9i Release 2, but it is a great way to gain a solid conceptual


view of the redo log buffer and how it works. The redo log buffer is a circular first-in/first-out
(FIFO) queue structure. The structure also utilizes two pointers:
• The last allocation pointer designates the end point for the last redo log buffer space
allocation and also points to the beginning of the next allocation point.
• The last write pointer designates where the log writer background process’s last
write ended.
The various markers in Figure 8-3—for example, marker MK100—are used to describe
specific redo-related scenarios discussed in the following sections.

Figure 8-3. Shown is an abstracted redo log buffer, indicating the two pointers: the last redo
allocation pointer and the last redo write pointer. The various markers (MK…) are used in
describing specific scenarios in the text.

Suppose a server process needs redo related to a buffer change. It creates the change
vectors and combines them into a single redo entry. At this point, the server process knows
how many bytes of redo are required.
Oracle9i Release 2 introduced significant changes in the redo log buffer architecture and
algorithms. Yet, even prior to this release, there are adjustments the performance specialist
can make to alter how Oracle’s redo mechanism operates. Let’s look at how the redo log
buffer operates and some possible adjustments in both pre- and post-Oracle9i Release 2
versions.

Pre-Oracle9i Release 2 Redo Log Buffer


Prior to Oracle9i Release 2, the server process contends for one of the redo copy latches and
then for the single redo allocation latch. Once both latches have been acquired, the server
process gets the address of the last allocation pointer, which is marker MK100 in Figure 8-3.
The server process adds the bytes it needs to marker MK100 and moves the pointer
clockwise, perhaps to marker MK200. Now that space has been allocated, the server process
immediately releases the single redo allocation latch for other processes to use. However,


although the allocation latch has been released, the redo copy latch is still held while the
server process copies the redo entry into the just-allocated space. The copy latch must be held
to ensure the log writer background process does not flush the redo log buffer while a server
process is copying its redo into the redo log buffer. Once the server process has finished
copying its redo, it releases its redo copy latch.
Having a single redo allocation latch makes enforcing redo serialization very
straightforward. But as you can imagine, having a single redo allocation latch also can
become a point of contention. To reduce the likelihood of this, server processes hold the
allocation latch just long enough to allocate redo log buffer space. There is also the instance
parameter _log_small_entry_max_size, which is used to shift allocation latch
activity onto one of the redo copy latches, as discussed in the “Redo Allocation Latch
Contention” section later in this chapter. To further reduce the contention possibilities, Oracle
allows for multiple redo copy latches. The instance parameter
_log_simultaneous_copies is used to control the number of redo copy latches.
So, while having a single redo allocation latch may seem like a serious potential
problem, for most Oracle systems, the problem can be solved entirely from an Oracle
perspective. However, if you’re responsible for one of those other Oracle systems, you may
need something better.
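One quick way to see whether the single redo allocation latch or the redo copy latches are stressed is to compare their miss and sleep activity in v$latch. A sketch (remember that redo copy latches are normally acquired in immediate mode, so watch the immediate columns for them):

SQL> select name, gets, misses, sleeps, immediate_gets, immediate_misses
  2  from v$latch
  3  where name in ('redo allocation', 'redo copy');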

Post-Oracle9i Release 2 Redo Log Buffer


When a single redo allocation latch is in use, which is the case prior to Oracle9i Release 2, the
redo algorithm is very elegant and straightforward. But as Oracle systems continued to grow
in concurrency and redo generation requirements, redo-related latch concurrency constraints
started to become more likely. So, Oracle began making significant changes in the redo log
buffer.
Oracle9i Release 2 and later have multiple redo allocation latches available, and the redo
copy latches are normally not required. In summary, the redo log buffer works like this: A
server process contends for one of the redo allocation latches, acquires one, allocates redo log
buffer space, copies its redo into the redo log buffer, and releases its redo allocation latch. In
more detail, it works like this:
• The server process contends for and acquires one of the redo allocation latches.
• The server process gets the memory address of the last allocation pointer, which is
marker MK100 in Figure 8-3.
• The server process adds the redo bytes it needs to marker MK100 and moves the
pointer clockwise, perhaps to marker MK200.
• The server process copies its redo into the just-allocated space and releases its redo
allocation latch.
When redo copy latches are not used by server processes for copying redo into the redo
log buffer, the log writer background process will need to acquire all the redo allocation
latches, just to make sure it doesn’t flush the redo log buffer while server processes are busy
copying their redo into the redo log buffer.
Figure 8-4 is a conceptual diagram of the log buffer based on Oracle9i Release 2 and
later. Similar to the least recently used (LRU) strategy, Oracle conceptually divides the single
large redo buffer into multiple parts, with each part having its own redo allocation latch. The


two main parts are strands of redo and the general log buffer. Strands of redo are where server
processes copy their redo into the redo log buffer. When a transaction commits, its strand of
redo is flushed to the general log buffer, and the log writer flushes the entire general log
buffer to the active online redo log group. Each strand of redo and also the general log buffer
have their own redo allocation latch. So, the number of redo allocation latches can far exceed
one.

Figure 8-4. Shown is a less abstracted redo log buffer based on Oracle9i Release 2 and later.
Similar to the log buffer shown in Figure 8-3, each strand of redo and the general log buffer
have allocation and write markers. When DML occurs, space is allocated in a strand of redo,
protected by one of the redo allocation latches. When a commit occurs, the related strand is
flushed to the general log buffer, and then the general log buffer is flushed by the log writer
background process.

By default, the number of redo strands is dynamic, but it can be made static by setting
the hidden instance parameter _log_parallelism_dynamic to false. When Oracle is
dynamically controlling the number of redo strands, the maximum number of strands is
controlled by the hidden instance parameter _log_parallelism_max. The DBA can
specifically set the number of redo strands via the hidden parameter _log_parallelism.
The default number of redo strands is surprisingly low—perhaps two.
The number of redo allocation latches is controlled by the instance parameter
processes. For around every eight processes defined, one redo allocation latch is created.
Figure 8-5 shows both the query and output (with many output lines removed) of an Oracle
Database 10g Release 2 system with 4,000 processes defined. Oracle created 488 redo
allocation latches, which amounts to 8.2 processes for each redo allocation latch. This ratio of
processes to allocation latches has been observed for many different processes parameter
settings.


SQL> select latch#, name, child#, gets, sleeps
  2  from v$latch_children
  3  where name = 'redo allocation';

LATCH# NAME                  CHILD#       GETS    SLEEPS
---------- -------------------- ------- ---------- ---------
148 redo allocation 1 536471 2
148 redo allocation 2 536503 1
148 redo allocation 3 536553 0
148 redo allocation 4 536708 1
148 redo allocation 5 377 0
148 redo allocation 6 163 0
148 redo allocation 7 192 0
148 redo allocation 8 148 0
148 redo allocation 9 22 0
148 redo allocation 10 22 0
148 redo allocation 11 22 0
148 redo allocation 12 22 0
...
148 redo allocation 487 10 0
148 redo allocation 488 10 0

488 rows selected.

Figure 8-5. Shown is a system with a defined 4,000 Oracle process limit, resulting in 488
redo allocation latches. The system is undergoing an intense three-process DML workload.

Now that I’ve laid the architectural foundation, let’s see how this is combined into the
flow of redo.

Redo Flow
A good word picture of redo flow is a river. Changes made to database buffers can be thought
of as a storm. The rain comes pouring down in the form of change vectors, which are
consolidated into redo entries. Space is allocated in the redo log buffer for the redo entries.
Then the redo entries are copied into the redo log buffer. Finally, the redo buffer contents are
written into an online redo log group. To summarize, there is redo creation, redo space
allocation, redo copying, and redo writing. Anywhere in this flow of redo, contention can
occur. And as with an overflowing river, we need to ensure Oracle’s redo requirements do not
exceed the IO subsystem’s capacity. Otherwise, a flood of performance problems will
develop.
Any time there is a redo-related performance problem, you will find the issues will be
concentrated into one of these main redo areas: redo creation, space allocation, redo copying,
and redo writing. As I present the various performance-hindering situations, pay close
attention to where in the redo flow the problems occur. Understanding and visualizing the
redo flow blockage will help you and your colleagues avoid nonsensical solutions (guessing)
and also communicate your ideas better to others.


Global Temporary Tables


If your application is generating a tremendous amount of write IO, and your IO subsystem
just can't handle the requirements, global temporary tables (available since Oracle8i) may
provide the solution. When I first heard about global temporary tables, I was so intrigued by
Oracle's claim that no redo was generated that I decided to perform some experiments. I
found the redo reduction results to be simply amazing.

The Need for True Interim Tables


Most applications use what I call interim tables (sometimes incorrectly referred to as
temporary tables), which fall somewhere between Oracle temporary segments and data
segments. Interim tables typically hold data only during batch processing operations that have
limited or no recovery requirements. Doesn't it seem silly then to create, copy, buffer, and
write all that redo? And if you are used to using large PL/SQL tables (or arrays), you know
that a session can exceed its available heap memory limits. Global temporary tables address
these issues by having specific characteristics of both temporary segments and data
segments. For example, a global temporary table contains rows and can have associated
indexes, yet, like a temporary segment, it generates significantly less redo than a
traditional data segment.
While developers have some amount of control over the use of interim segments, many
applications, along with their massive redo generation requirements, are simply transitioned
over to the DBAs, who are told to somehow just make them perform. Global temporary tables
give the DBA a near-application-level tool, which can be used to possibly avoid significant
and painful application changes.

Common Characteristics
Global temporary tables share many standard data segment and temporary segment
characteristics, such as the following:
• They can be created by a session using a create table variant. For example,
instead of create table customers, the syntax could be create global
temporary table customers.
• They can have rows inserted, updated, and deleted by their creator session.
• They can have triggers associated with them, as can standard tables and other global
temporary tables owned by their creator schema.
• Transaction rollback is supported, but because redo is not fully generated, recovery
is not possible.
• Indexes and views can be created.
• Global temporary tables can be explicitly dropped. When this occurs, the associated
indexes and views are also dropped.
While global temporary tables have many standard data segment characteristics, what
makes them unique is they also inherently possess many temporary segment characteristics,
such as the following:


• One session can create a global temporary table, allowing other sessions to share its
definition, yet not its data. For example, the mg user can create a global temporary
table, insert rows, select rows, and commit. User oe can describe the table but will
not see any rows before the commit (normal read-consistent behavior) or after the
commit. The oe user can insert rows into the table, but will see only the rows they
inserted. And if user mg disconnects, that user's data is removed, but user oe's data
remains.
• Each session’s data exists only as long as the session remains connected or, if
specified, until a commit is issued.
• Global temporary tables are created in the schema’s default temporary tablespace. In
fact, there is no tablespace placement option.
• Object growth can be observed by repeatedly querying the v$sort_usage view.
• Global temporary tables create 50% less redo than standard tables.
So in many respects, you get the best of both worlds: table-like control, along with
automatic temporary segment management.
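As a minimal sketch of the syntax (the table name and columns here are hypothetical),
creation looks just like a standard create table, with the on commit clause controlling
how long a session's rows survive:

SQL> create global temporary table interim_results (
  2    id   number,
  3    info varchar2(100)
  4  ) on commit preserve rows;

With on commit preserve rows, each session's rows remain until the session disconnects;
with on commit delete rows, they are removed at each commit.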

Truly Reduced Redo


While the Oracle documentation is not entirely clear, it does seem to indicate that global
temporary tables do not generate any redo. My experiments3 showed this to be incorrect,
but even the worst-case scenario achieved a remarkable 43% redo reduction. For
example, with a row size of 750 bytes, redo was consistently reduced by 68.1%. And even
more important, as the row size increased, so did the redo savings.
If you have the opportunity to design batch processes, do everyone a favor and consider
incorporating global temporary tables. And if you are a performance analyst stuck in a corner
with an application generating far more redo than your IO subsystem can hope to adequately
process, or a developer needing more memory than a single process can allocate, don’t forget
about global temporary tables.

Log Writer Background Process Triggers


The log writer background process’s job is to quickly flush the redo log buffer. As with the
database writer, multiple events trigger the log writer background process to write. Because
Oracle promises that committed transaction information has been written to disk (as far as
Oracle knows), the obvious log writer background process requirement is to flush the redo log
buffer when a transaction commits. But to keep the redo flowing smoothly and quickly, there
are also a number of other events that trigger the log writer background process into action.4

3 Details about these experiments are documented in an OraPub technical paper. Search for “global temp” on the OraPub web site.
4 In a Real Application Clusters (RAC) environment, the lock manager server (LMS) background process can post the log writer background process to write. When an exclusive current (XCUR) mode buffer is transferred to another Oracle instance, the holding instance must complete the pending redo associated with the buffer, and the LMS is the process responsible for shipping the buffer to the other instance. During this transfer time, the LMS background process will post the log file sync wait event. Sometimes, the log writer background process will also flush consistent read (CR) buffers in a RAC environment. This activity can be monitored via the instance statistic views (for example, v$sysstat) statistic gc cr block flush time.


Commit Issued
While redo is generated and copied into the redo log buffer as part of DML processing, when
a commit is issued, control is not returned to the server process until all of the associated
transaction redo has been successfully written to an online redo log group. While the server
process is waiting for the log writer to give it control once again, it will post the log file
sync wait event. In fact, the average log file sync wait time is a good indication of the
average commit time from an Oracle perspective.
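As a quick sketch of this idea, the systemwide average log file sync wait can be
computed directly from v$system_event (the time_waited_micro column assumes Oracle
Database 10g or later, and the query assumes at least one such wait has occurred):

SQL> select event, total_waits,
  2         time_waited_micro/total_waits/1000 avg_ms
  3  from v$system_event
  4  where event = 'log file sync';

Sampling this twice and working with the deltas, as the OSM reports shown later in this
chapter do, yields the average commit time over just that interval.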
Once when I was doing some on-site consulting work, the DBA received a call from a
user complaining that saving images in an application was taking much longer today than it
did yesterday. The caller wanted to know what was wrong with the Oracle system. Oracle had
been gathering Statspack data, so the DBA ran a quick query that listed the log file
sync times from the previous day and also the current day. We could both see that the log
file sync times were essentially the same. This meant any save-related issues the user
was experiencing were not caused by Oracle, but by something between the database
and the end user. After doing some investigation, the DBA discovered the problem was
blockage at the application server level. So it wasn’t a database issue after all! This is yet
another creative application of Oracle’s wait interface.
It is important to understand that Oracle trusts the operating system when it indicates
that the redo has been written to disk successfully. If the operating system returns a successful
write, yet that write did not physically occur, and then a media failure occurs, the database
may not be completely recoverable. I have seen failures of lithium backup batteries, disk
mirroring software, and physical drives result in the database system becoming unavailable.
While optimal performance may not be achieved, it may be in everyone’s best interest to
ensure when Oracle receives the message, “Sure, the data has been written to disk,” it actually
has been written to disk. Achieving this performance versus availability balance can be very
tricky indeed.

Commit Write Facility


Oracle Database 10g Release 2 introduced the commit write facility, which gives the
appearance that a commit has completed when it really hasn't. If commit times are taking too long
(log file sync will be the top wait event) and the application can survive the possibility
of lost data5 in an unfortunate instance failure, then technically, one of the commit write
options may provide the extra performance boost you need. But please, ensure your
application and the business can tolerate the potential loss of committed data. As I tell people,
slower performance is better than no performance.
The trusted standard commit command is the same as commit write
immediate, wait, but there are three other alternatives. Each can be put into effect in
any of the following ways:
• By setting the instance parameter commit_write
• By issuing an alter session or alter system command
• By issuing an altered commit command

5 I feel very uncomfortable even writing about this! Don't ever say I didn't warn you about the potential loss of data.


The following snippet shows the various ways to interactively set the
commit_write options, in this case to a batch and nowait setting.

SQL> show parameter commit_write

NAME                              TYPE       VALUE
--------------------------------- ---------- -------------------------
commit_write                      string

SQL> alter session set commit_write="batch, nowait";

Session altered.

SQL> alter system set commit_write="batch, nowait";

System altered.

SQL> show parameter commit_write

NAME                              TYPE       VALUE
--------------------------------- ---------- -------------------------
commit_write                      string     batch, nowait

SQL> commit write batch nowait;

Commit complete.

This snippet first shows the current commit_write setting, which is null, meaning it
has not been set. The second entry alters only the session, and the third alters the entire
instance. Then the parameter is once again shown to reflect the alterations. Finally, the
modified commit statement was entered directly, overriding any instance parameter, session,
or system setting.
Now let’s take a look at what the various options actually do. The first commit_write
option is either immediate or batch. The immediate option simply posts the log writer
to write, which is the normal behavior. The batch option buffers the commits, allowing
them to remain and accumulate in the redo log buffer while returning a commit completed
response back to the server process. The second option is either wait or nowait. The wait
option instructs the log writer background process to perform a synchronous write, which
means control is not returned to the log writer background process until its write has
completed. This is normal behavior. The nowait option tells the log writer background
process to write asynchronously, which means control is returned to the log writer
immediately after it issues the write to the operating system, even though the write has
not physically completed. This is typical asynchronous behavior,
but not the default log writer background process and Oracle commit behavior. Now let’s put
these options together to do something potentially useful.
The batch option can be used with either the nowait or wait options. The batch
option improves performance by allowing redo entries to build up—that is, to accumulate in
the redo log buffer even after a commit is issued. Normally, every commit forces the log
writer background process to flush the log buffer and write to the current online redo log
group. There is overhead in this activity, so by reducing its frequency, the overhead is
obviously reduced. If an application or a specific application session frequently commits, this


option can be used to reduce the effective commit rate instead of modifying the application
code (though personally, I would much rather have the code optimized to begin with).
The nowait option can also be used with either the immediate or batch options.
Normally, the log writer background process writes synchronously, which means the
operating system will not return control to the log writer until the write has been physically
completed. Of course, the underlying IO subsystem could also be caching the write and telling
the operating system the write has been physically completed when it has not. Asynchronous
writes are typically many times faster than synchronous writes, resulting in a substantial
performance improvement.
We would expect the combined batch and nowait options to perform the best, as the
redo entries can remain buffered longer in the log buffer, resulting in more efficient writes,
and the log writer background process can perform asynchronous writes, resulting in a near
immediate commit command response time. Just to give you a taste of how well the commit
write facility can work, I performed a simple experiment. Table 8-1 shows the results.

Table 8-1. Experimental performance results using the commit write facility

                                           Log File Sync   Log File Parallel
                        Commits/s          Wait %          Write Wait %
Commit Write Option     Avg       Stdev    Avg    Stdev    Avg    Stdev
---------------------   -------   -----    ---    -----    ---    -----
batch, nowait            25,431     653     52        2     35        0
batch, wait                 726      41     79        1     20        0
immediate, nowait        21,501     249     63        2     25        1
immediate, wait             741      32     80        0     20        0

The work was measured by the number of average commits performed each second. To
ensure the experiment was statistically significant, multiple 600-second samples were taken,
and the standard deviation calculated. The workload placed on the four-core Linux
Oracle Database 10g Release 2 system was a mix of two sessions performing rapid, small
updates and two sessions performing small inserts. The rapid rate was produced by placing
the update or the insert within a very tight PL/SQL loop. After each update, a commit was
issued; after 50,000 inserts, a commit was issued.
As a reminder, the immediate, wait option is normal Oracle behavior and should
serve as a baseline. As expected and as Table 8-1 shows, the top wait event is clearly log
file sync, which is the server process waiting for the commit to finish. The log file
sync percentage change is not overly relevant because the workload was allowed to increase.
So, any response-time gain could be offset by an increase in throughput. This trade-off will be
discussed in detail in the next chapter. The second top wait event, also shown in Table 8-1,
was always log file parallel write, which is the time the log writer background
process is actually waiting for the write to complete.
As we would expect with buffered asynchronous writes, the batch, nowait option clearly
provided the most workload processed per second. In fact, compared to the immediate,
wait baseline, the batch, nowait option commit rate was more than 34 times greater.
That is an amazing performance improvement!


While the batch, nowait option appears to have a higher commit rate than the
immediate, nowait option, I ran a statistical significance test6 to ensure any workload
difference could not be attributed to randomness. The statistical significance test clearly
indicated (99.7% confidence level) that the batch, nowait option did indeed process
more commits per second than the immediate, nowait option.
In this particular experiment, the difference between the batch, wait option and the
immediate, wait option was not statistically significant—that is, any difference can be
attributed to randomness. Just to ensure there was not a problem with my experiment, I
collected four additional samples for the batch, wait option. It appears that whatever
performance improvement was gained by batching the redo was offset by the log writer
background process’s slower synchronous writes. This is plausible because, while Table 8-1
does not show this, the average log file parallel write wait time was around a
painfully slow 20 ms.
To summarize, if your application can handle lost committed transactions at the session
or instance level, and your top wait event is log file sync, then the commit write
facility rocks! Otherwise, don’t mess with it and sleep well at night.

Database Writer Posting the Log Writer


One of Oracle’s database integrity rules is to never allow a block change (not buffer, but
block) to occur unless that change is first recorded in an online redo log group. So if, for any
reason, a database writer wants to write to a database block, the database writer will first
trigger the log writer background process to flush the log buffer, just to ensure the change is
in the redo stream. While the log writer background process is busy flushing the log buffer,
the database writer background process will be posting a log file sync wait event.
Once the log writer has completed the write, the database writer is free to complete its
write. To reduce the likelihood of the database writer background process needing to wait, the
general strategy is to ensure that the log writer background process is writing frequently.

Buffer Fill
It’s pretty obvious that having the log writer background process waiting until the redo log
buffer fills to begin writing could cause serious performance problems. The log writer
background process could potentially need to immediately write tens of megabytes. Plus, a
user could have committed during a log writer background process flush and will need to wait
for the write to complete, and then for the redo log buffer to fill once again. So, to avoid this
type of performance behavior, if the log writer is not triggered to write for some other reason,
it will flush the general log buffer when the smallest of these thresholds is reached: the buffer
is one-third full, it contains 1MB of redo, or it contains the number of bytes given by the
hidden instance parameter _log_io_size multiplied by the operating
system block size. The operating system block size (in bytes) can be quickly determined by
issuing the following code snippet:

6 I used OraPub's Statistical Significance Template spreadsheet. To locate and then download the spreadsheet for free, go to OraPub's web site and search for “stat sig.”


SQL> select max(LEBSZ) from x$kccle;

MAX(LEBSZ)
----------
512

1 row selected.

The block_size column in v$archived_log also provides the operating system block size.
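For example, assuming the database is in archivelog mode and has produced at least one
archived log, a one-line check is:

SQL> select distinct block_size from v$archived_log;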
There are times when we want to control redo log writing based on the amount of redo,
not solely based on commits. For example, suppose a system is subjected to infrequent bursts
of redo activity, but when these bursts occur, they result in very slow online user updates and
inserts (not commits), and the top wait event is log buffer space. To address the log
buffer space issue, the performance analyst may choose to increase the redo log buffer
size. However, while the user’s updates and inserts may complete quicker, commits may now
take longer! Because the log writer background process is writing less frequently with a larger
redo log buffer, it is likely to have more redo to flush to disk during each commit. If commit
times become unacceptable, the performance analyst will need to find a way to optimally
balance update and insert performance along with commit performance; that is, balance the
log buffer space and the log file sync wait events.
One strategy is to increase the redo log buffer size, enabling it to absorb redo activity
bursts, thereby reducing the likelihood of log buffer space wait events. Then decrease
the _log_io_size parameter to ensure the log writer background process is performing
more frequent, smaller redo log buffer flushes to keep performance smooth and consistent.
However, be aware that if the _log_io_size parameter is set too small, the log writer
would be making frequent, relatively small, and inefficient writes, causing other problems.
Once again, the performance analyst must find that elusive performance balance.
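A minimal sketch of this combined strategy follows; both values are purely hypothetical,
the changes take effect only after an instance restart, and because _log_io_size is a
hidden parameter, it should be tested carefully and ideally discussed with Oracle Support
first:

SQL> alter system set log_buffer=33554432 scope=spfile;
SQL> alter system set "_log_io_size"=256 scope=spfile;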

Three-Second Timeout
As Figure 8-6 clearly shows, even an inactive log writer background process will wake up
every 3 seconds, even if it’s just to flush the general redo log buffer. However, in Oracle9i
Release 2, which introduced redo strands, experiments demonstrate the entire redo log buffer
is not flushed every 3 seconds; only the general redo log buffer is flushed.
The output shown in Figure 8-6, which contains no write statements over a 9-second
interval, occurred while updates were being issued about once each second! This
demonstrates the log writer background process does not flush the entire redo log buffer every
3 seconds. Thankfully, as soon as a commit was issued, the log writer immediately issued a
write call to the operating system.
As with the database writer background process, the log writer background process
sleeps by issuing a semaphore call. (As you learned in Chapter 3, when a server process
sleeps, in contrast to both the database and log writer background processes, it issues a
select call.) This allows another process to wake up the log writer, triggering it to flush the
general redo log buffer. For example, if the database writer background process wants to write
a dirty buffer to disk, it must ensure any redo associated with the buffer has already been
successfully written into an online redo log group. So, the database writer background process
will trigger the log writer background process to write and post the log file sync wait


event. When the log writer background process is finished writing, the database writer
background process can issue its write statement to the operating system.

[oracle@localhost ~]$ ps -eaf|grep lgwr
oracle   26096     1  0 16:51 ?        00:00:00 ora_lgwr_prod3
oracle   27155     1  0 19:13 ?        00:00:00 ora_lgwr_prod5
oracle   27421 27291  0 20:06 pts/3    00:00:00 grep lgwr
[oracle@localhost ~]$ strace -rp 27155
...
0.000086 semtimedop(1015808, 0xbfff5134, 1, {3, 0}) = -1 EAGAIN
(Resource temporarily unavailable)
2.999895 gettimeofday({1201135737, 697897}, NULL) = 0
...
0.000035 semtimedop(1015808, 0xbfff5134, 1, {3, 0}) = -1 EAGAIN
(Resource temporarily unavailable)
3.000059 gettimeofday({1201135740, 699439}, NULL) = 0
...
0.000035 semtimedop(1015808, 0xbfff5134, 1, {3, 0}) = -1 EAGAIN
(Resource temporarily unavailable)
3.000059 gettimeofday({1201135740, 699439}, NULL) = 0
...

Figure 8-6. Shown is an operating system trace of an inactive log writer. No commits are
being issued. During the snapshot of this Linux Oracle Database 11g Release 1 system, an
Oracle session was repeatedly issuing updates, yet even over multiple 3-second intervals, the
log writer did not issue a single write request to the operating system. This has also been
observed on 11g Release 2.

Redo-Related Performance Issues


Diagnosing redo-related performance issues is incredibly straightforward. Oracle’s wait
interface provides a number of specific redo-related wait events, and the operating system is
usually suffering from a severe IO bottleneck. The challenge is usually in being able to make
the changes necessary to realize a significant performance gain.
Some redo-related performance issues can be solved with instance parameters and fancy
Oracle features. However, the large majority of difficult redo issues are the direct result of the
application generating far more redo than the IO subsystem can quickly service. This puts
pressure on changing the application code, which many times is deep within the application’s
procedural code. This entices the performance analyst to sometimes get very creative and
consider options like in-memory file systems or using the commit write facility described
earlier in the chapter. But whatever the issues may be, remember to consider and evaluate
solutions focused on Oracle, the operating system, and the application.
The remainder of this chapter addresses the key redo-related wait events shown in Table
8-2. I will describe the problem and then provide a number of tactics and solutions to resolve
it.


Table 8-2. Key redo-related wait events

Wait Event             Most Likely Posted By  Why Posted
---------------------  ---------------------  -------------------------------------------
log buffer space       Server process         The process is waiting to get space in the
                                              redo log buffer.
latch free: redo       Server process         The process is sleeping while acquiring one
allocation                                    of the redo allocation latches.
latch free: redo copy  Server process         The process is sleeping while acquiring one
                                              of the redo copy latches.
log file sync          Server process or      The process is waiting for its commit to
                       database writer        complete or its trigger to the log writer
                                              to complete.
log file parallel      Log writer             The log writer is waiting for the operating
write                                         system to complete its write request.
log file switch        Server process         The log writer cannot write into the next
(checkpoint                                   online redo log group because the database
incomplete)                                   writer has not finished a checkpoint that
                                              references redo in the next redo log group.
log file switch        Server process         The log writer cannot write into the next
(archive incomplete)                          online redo log group because the archive
                                              background process has not finished
                                              archiving the next redo log group.

Log Buffer Space


The wait event log buffer space is the direct result of spending too much time
allocating space in the redo buffer so redo can be copied into the just-allocated space.
As an example of how this can occur regardless of the Oracle release, referring back to
Figure 8-3, suppose there was an unfortunate combination of the last allocation pointer to
marker MK100 and a number of bytes large enough to push the last allocation marker
clockwise beyond the last write pointer (MK300) to marker MK400. Oracle will not allow
this to occur, since the redo between marker MK300 and MK400 would be overwritten. When
this situation arises, the server process will post a log buffer space wait event, trigger
the log writer background process to write, and wait until the log writer background process
flushes the redo log buffer and moves the last write pointer beyond marker MK400. Once this
takes place, the last allocation pointer will point to MK400, and the last write pointer will be
set beyond marker MK400, perhaps to marker MK500.
The log buffer space wait event is most common when a system is initially
placed into production or a dramatic increase in redo generation occurs. Either way, a
wonderfully simplistic Oracle-focused solution that usually fixes the problem is to increase
the redo log buffer size by increasing the instance parameter log_buffer. However,


besides wasting memory, if the log buffer becomes too large, other wait events, such as log
file sync, can arise. So be conservative to avoid introducing other problems.
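When adjusting the log_buffer parameter, keep in mind that recent Oracle releases may
round the actual allocation up, so after a change it is worth checking what was actually
granted; a quick sketch:

SQL> show parameter log_buffer
SQL> select name, bytes from v$sgastat where name = 'log_buffer';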
Once the redo log buffer has been expanded, increased pressure will be placed on the log
writer background process and the IO subsystem. So, it is common to observe the
performance issue shift further on down the redo flow toward a log file sync or log
file parallel write wait event. Don’t be discouraged by this, as it is normal and part
of the tuning cycle. What you want to see is either the response time decreasing while
workload remains the same or response time remaining the same while the workload
increases. If your workload is OLTP-focused, then a drop in response time is usually
preferred. If your workload is more batch-centric, a workload increase (throughput) is
typically your goal. Chapter 9 will help you understand your response time and throughput
options.

Redo Allocation Latch Contention


If a process is spending too much time sleeping while acquiring a redo allocation latch, the
latch free wait event associated with the redo allocation latch (pre-Oracle Database 10g)
or the latch free: redo allocation (Oracle Database 10g and later) wait event
will be your top wait event. This is most likely to occur in an intense high-concurrency
environment. Fortunately, there is a straightforward Oracle-focused solution, which works
with any Oracle release, that will probably resolve the issue.
The bad news is, as discussed earlier, pre-Oracle9i Release 2 systems have a single redo
allocation latch, making it much more likely to experience redo allocation latch contention.
The good news is there are multiple redo copy latches, and by creatively adjusting the
_log_small_entry_max_size parameter, we can influence the likelihood of a process
having to acquire the redo allocation latch.
Think of this instance parameter as a fence. On one side of the fence, the allocation latch
is used, as you would expect, for allocating space in the redo log buffer, and one of the redo
copy latches is used while copying redo into the just-allocated space. On the other side of the
fence is the redo copy latch, which is used to both allocate redo log buffer space and to copy
redo into the just-allocated space! If the redo entry size is greater than the
_log_small_entry_max_size value, the process will hop over the fence, using only
one of the redo copy latches and thereby bypassing requesting the redo allocation latch. So, by
reducing the height of the fence—that is, reducing the parameter value—the processes are less
likely to request a redo allocation latch. This directly reduces redo allocation latch
contention, which will reduce the likelihood of receiving the latch free wait event.
For Oracle9i Release 2 and later systems, Oracle provides multiple redo allocation
latches. By simply increasing the instance parameter processes, the number of redo
allocation latches is also increased. Sampling from multiple Oracle systems indicates that
about one redo allocation latch is created for every eight defined processes. So, if your system
has the processes instance parameter set to 2000, you can expect to see around 250 redo
allocation latches. By increasing the number of redo allocation latches, redo allocation latch
concurrency is increased, reducing the likelihood of sleeping while trying to acquire one of
the redo allocation latches.
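You can verify this ratio on your own system with a quick count of the child latches, much
as Figure 8-5 did in more detail:

SQL> select count(*) from v$latch_children
  2  where name = 'redo allocation';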
As with most latches, it is very common for the operating system to be experiencing a
raging CPU bottleneck. While increasing CPU capacity is a valid solution and will most likely
enable more work to flow through Oracle, simply adding more redo allocation latches to the


Oracle system will most likely solve the problem, without requiring any operating system or
application-related changes.

Redo Copy Latch Contention


A process must acquire one of the redo copy latches before it can copy redo into the redo
log buffer. This type of latch contention is unusual, but can occur with
very high-concurrency OLTP workloads.
For pre-Oracle Database 10g systems, the wait event will be latch free, with the
associated latch being redo copy. For later systems, the wait event is simply latch free:
redo copy. Regardless of your Oracle release, a simple and very effective solution is to
increase the hidden instance parameter _log_simultaneous_copies, which will increase the
number of redo copy latches. This is an elegant solution that is very likely to eliminate redo
copy latch contention without requiring any changes to either the operating system or to the
application.
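As a sketch (the value 16 is purely hypothetical, and because this is a hidden parameter,
test carefully and ideally involve Oracle Support), you could first count the existing redo
copy latches and then increase the parameter, which takes effect after an instance restart:

SQL> select count(*) from v$latch_children where name = 'redo copy';
SQL> alter system set "_log_simultaneous_copies"=16 scope=spfile;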
As discussed in the previous section, in pre-Oracle9i Release 2 systems, the redo copy latch
can also be used to allocate space in the redo log buffer. So, one Oracle-focused option is to
increase the _log_small_entry_max_size instance parameter. Because this hidden
instance parameter affects both the redo allocation and redo copy latches, a more direct
solution is to simply add more redo copy latches.

Log File Sync Contention


The title of this section is a little misleading because the contention is centered on commits
taking an unacceptably long time. When this occurs, the top wait event will be log
file sync. If users occasionally commit, they may not feel like performance is slow (the
good news). However, users who frequently commit, especially when a rapid committing
piece of code is involved, will certainly feel the effect and have a considerable amount of their
wait time associated with the log file sync wait event.
A challenge in effectively dealing with log file sync waits is that there are many
causes for this wait event. For example, it can occur because of rapid application commits
(there is overhead in each commit request), the log writer has too much redo to write per
commit (requiring more time to complete each write), or large amounts of redo per transaction
(requiring more time to complete the write). Whatever the reason, the general solution begins
with understanding that the problem surfaces when a commit occurs and that commits are
simply taking too long.
As Figure 8-7 shows, when the commit times are taking too long, it is common for other
redo-related wait events to occupy the top few wait slots. The most common second top event
will be log file parallel write.


SQL> @swpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER twice.

Database: prod16                                       26-JUN-10 11:59am
Report:   swpctx.sql            OSM by OraPub, Inc.                Page 1
             System Event CHANGE (blue line) Activity By PERCENT

                                  Time Waited  % Time    Avg Time     Wait
Wait Event                               (sec)  Waited  Waited (ms) Count(k)
-------------------------------- ------------ ------- ----------- --------
log file sync                         750.360   80.31         4.5      165
log file parallel write               183.740   19.67         3.9       47
latch: library cache                    2.980    0.32      1490.0        0
control file parallel write             2.300    0.25        28.4        0
log file switch completion              0.510    0.05        56.7        0
latch: cache buffers lru chain          0.160    0.02         0.1        2

Figure 8-7. Shown is OSM report swpctx.sql, which is a v$system_event wait event
report. This report indicates a classic wait event situation with severe log file sync
contention. Notice the second top wait event is the typical paired event, log file
parallel write.

Solving log file synchronization issues can be painful. First of all, it’s a server process
(and rarely, but possibly, the database writer background process) that posts the log file
sync wait event, not the log writer background process. While there may be a technical
reason for the log writer to post a log file sync, in a real production system, I have
never seen this occur. This means our solution will nearly always be focused on the server
process, which is running application code. Remember that our overall strategy is to reduce
commit times.

Application-Focused Solutions for Log File Sync Contention


Rapid commits are nearly always the underlying issue. The good news is that there are
multiple ways of addressing the problem. So, even if the application is designed poorly, the
performance analyst has a good chance of solving the problem. Figure 8-8 shows the classic
Oracle application developer botch. The developer is conscientiously ensuring that the
transaction is quickly committed. But unless the application requires the immediate commit,
the code forces the log writer background process to immediately flush the redo log buffer,
issue an IO write request, and then wait for the IO subsystem to return control to it. Simply
moving the commit outside the loop (which would never be executed in the example in
Figure 8-8) may be all that is necessary to solve this problem. So, when encountering a log
file sync wait event, always suspect the application is issuing rapid commits.


$ cat cause_problems.sh
#!/bin/sh

uid=$1
pwd=$2
sleep_time=0

sqlplus $uid/$pwd <<EOF
drop table bogus_rapid_commits;
create table bogus_rapid_commits as select * from dual;
declare
  i number;
begin
  dbms_application_info.set_module('$uid','do_rapid_commits');
  loop
    update bogus_rapid_commits set dummy='0';
    commit;
    sys.dbms_lock.sleep($sleep_time);
  end loop;
end;
/
EOF

Figure 8-8. Shown is a fantastic way to cause severe log file sync performance issues.
Just have a few Oracle schemas run this script, and there is a very good chance of seeing
log file sync as your top wait event. This is one of the scripts used during the commit
write facility experiment presented earlier in this chapter (with the results shown in Table 8-
1).

If rapid committing is an issue, it can be helpful to gain a good understanding of your
application's commit rate or transaction rate. The commit rate data is gathered from the user
commits statistic from v$sysstat or v$sesstat. The transaction rate also includes the
rollback statistic user rollbacks. In a Statspack or AWR report, commit and rollback
rates can be found in the Instance Activity section, and the transaction rate is shown in the
workload portion of the report. If you have been capturing performance information, there is a
very good chance you have been capturing this data. If the log file sync wait event
suddenly becomes a problem, check if the commit rate has also suddenly increased. If a
correlation does exist, you can start looking for the reason the application is suddenly
experiencing a significant commit rate increase.
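A minimal sketch for checking the raw counters (sample twice and divide the deltas by the
elapsed seconds to get the commit and rollback rates):

SQL> select name, value from v$sysstat
  2  where name in ('user commits', 'user rollbacks');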
As Figure 8-9 shows, identifying the SQL associated with the log file sync wait event
is very straightforward. The first method shown samples directly from v$session. Starting with
Oracle Database 10g, real-time wait event details are included directly in v$session,
allowing for SQL identification based on any wait event. However, if you do not want to
repeatedly sample from v$session (which is not recommended), and if you have a license
to use the active session history (ASH) facility (discussed in Chapter 5), the second example
shows how you can also easily locate log file sync-related SQL from
v$active_session_history. Since ASH data is buffered for perhaps less than an
hour, if you want prior data, you can sample from the
dba_hist_active_sess_history view, as shown in the third example in the figure. If
you are performing your analysis based on a Statspack or AWR report, look for the highest
total elapsed time DML SQL. While solving log file sync issues can be difficult,
identifying the associated SQL is not.


SQL> select distinct SQL_ID
  2  from v$session
  3  where event = 'log file sync';

SQL_ID
-------------
fn5y3xm766a9r
5f967bzywcu3d

2 rows selected.

SQL> select distinct SQL_ID
  2  from v$active_session_history
  3  where event = 'log file sync';

SQL_ID
-------------
396g4qadk4afg
fn5y3xm766a9r
5f967bzywcu3d
6k5hp7rh27dt4

5 rows selected.

SQL> select distinct SQL_ID
  2  from dba_hist_active_sess_history hist,
  3       dba_hist_event_name en
  4  where hist.event_id = en.event_id
  5  and en.event_name = 'log file sync'
  6  and sample_time > (sysdate-1/24);

SQL_ID
-------------
fn5y3xm766a9r
6k5hp7rh27dt4
5f967bzywcu3d

3 rows selected.

Figure 8-9. Shown are three ways to gather the SQL associated with the wait event log
file sync, or with a slight modification, any wait event.

There are times when the application truly demands an immediate commit, which can
result in a tremendous commit rate. For example, I know of systems where financial
transactions arrive from multiple sources. The transactions are small, but because they are
independent transactions, the legal and business rules require that each individual incoming
transaction must be immediately committed to the database. In this situation, you have little
choice but to devise a solution that does not involve changing the application.
Regardless of what you find, when experiencing significant log file sync waits,
assume there is a rapid commit issue and find out why it is occurring. Understanding this will
help you devise a better application and nonapplication solution strategy.


Operating System-Focused Solutions for Log File Sync Contention


A log file sync wait event is commonly attributed to either a lack of IO or CPU
resources, or both. If the bottleneck is both CPU and IO, it tends to quickly shift between the
two, which when averaged over an interval, does not clearly show either one to be a problem.
This can easily cloud your understanding of the situation, so you must be very careful.
Remain focused on the application commit times. If the log writer is pushing more IO
(requirements) than the IO subsystem can absorb (capacity), the top wait event will be log
file parallel write, with average wait times greater than 10 ms and high IO
subsystem response times gathered from iostat or sar. If the IO subsystem is responding
poorly, the server process posting the log file sync will indeed need to wait for the log
writer background process to finish writing.
There is more to committing a transaction than simply triggering the log writer to write.
Commits involve overhead in preparing for the write, updating Oracle internal structures
(requiring latches and memory manipulation), and actually issuing the write. Rapid commits
also lower the log writer’s efficiency, as each write will not be as packed full of redo. So
while the IO subsystem may be suffering, the log file sync points more toward an
application issue rather than an IO subsystem issue. But this is no excuse for the IO subsystem
team to abdicate their responsibility of ensuring the IO subsystem can process Oracle’s IO
requirements.
Due to all the Oracle SGA and structural change requirements when generating redo and
committing, a lack of CPU resources can also be an issue. Investigate the classic “get more
CPU” tactics, such as moving workload away from peak processing times, checking for
unnecessary or unusual high-CPU consuming processes, and possibly adding more CPU
resources.
Another creative “get more CPU” strategy is to bind a CPU exclusively to the log writer
background process. While not all operating systems have this capability, if there is a CPU
bottleneck or the log writer process spends too much time context switching,7 and log file
sync is the top wait event, binding a CPU exclusively to the log writer background process
can provide the extra power you need. For details about how to do this, do an Internet search
for “binding process to CPU” and consult your operating system’s documentation.
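As a sketch on Linux (the CPU number and process ID below are hypothetical, the command
requires appropriate privileges, and other platforms use different tools, such as pbind on
Solaris):

$ ps -ef | grep lgwr       # note the log writer's process ID
$ taskset -pc 3 12345      # bind that PID (12345 is hypothetical) to CPU 3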

Oracle-Focused Solutions for Log File Sync Contention


When the top wait event is log file sync, my overriding goal is to reduce commit times.
I ask myself, “What can I tweak to encourage Oracle to decrease commit times?” The three
general approaches are having Oracle say the commits are complete when they really are not
(gulp), reducing the number of members in each online redo log group, and finding ways to
reduce the amount of data written when a commit occurs. This translates into my four main
Oracle-focused solutions: using the commit write facility, decreasing redo log group member
numbers, decreasing the log buffer size, and decreasing the instance parameter
_log_io_size.

7 When a process is serviced by one CPU and then is switched to another CPU, it undergoes a context switch. The vmstat report's cs column shows the number of systemwide context switches per second during the reported interval. An operating system administrator can help you determine if the log writer (or any process, for that matter) is experiencing too much context switching at the expense of not getting enough service time. Exclusively binding a CPU and process together virtually eliminates context switching, because the CPU will always be there for that process and the process will never need to wait while the operating system switches its processing to another CPU.


If your application and business can support using the Oracle Database 10g commit
write facility, as described earlier, it could possibly instantly resolve this issue. So seriously
yet cautiously consider this option.
If I find log file sync among the top three wait events, I always check the
number of group members to see if it can be reduced. When the log writer background
process writes to an online redo log group, every member of the group must be written before
the write will be complete. Depending on your IO subsystem configuration, it could possibly
take a relatively long time for each member to be written. The number of online redo log
group members should be set based on availability and recoverability targets, but once those
targets are met, do not add any more group members.
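A quick sketch for checking the current member count and status of each group:

SQL> select group#, members, status from v$log;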
As strange as it might seem, decreasing the redo log buffer size can reduce the
likelihood of receiving log file sync wait events. Because of the one-third full flush
rule, a smaller redo log buffer will force the log writer background process to write more
often. These flushes are not the result of commits, but rather the log writer background
process being triggered. So when a commit does occur, there is a higher likelihood that all but
the commit entry will have already been written to an online redo log group, allowing the
commit to complete nearly instantly! However, as you're probably already thinking, if the log buffer is
decreased too much, server processes will start posting log buffer space events. If
response time is improving, this may be an acceptable compromise. But as always, we need to
discover the optimal performance balance.
Another option similar to decreasing the redo log buffer size but without the likelihood
of inducing log buffer space events is to force the log writer background process to
flush well before the one-third or 1MB flush rule. As I mentioned, by carefully decreasing the
hidden instance parameter _log_io_size, we can force the log writer background process
to write redo more often. So when a server process commits, there is a higher likelihood of all
but the commit entry already being written to the active online redo log group! There is a
balance to maintain though, as rapid log writer writes typically mean more frequent, less
efficient IO write calls. You will need to carefully ensure response time improves along with the log
file sync event wait time.

Log File Parallel Write Contention


Through Oracle’s wait interface, Oracle reports the log writer background process write time
as the wait event, log file parallel write. As with other parallel wait events, this
simply means a multiblock write. The redo log buffer is structured in relatively small
operating system-size blocks (typically 512 bytes), not Oracle-size blocks, so it is normal for
log writer activity to be performed with multiple blocks. When Oracle’s redo generation
requirements overpower the IO subsystem’s capacity to quickly process the sequential
multiblock writes, the log file parallel write wait event will be one of the top
wait events (likely the top one).

Log Writer Write Challenges


On most Oracle systems, unless extremely rapid commits are occurring, the log writer
background process will have a chance to batch 1MB of redo before the 1MB flush rule
triggers the sequential multiblock write. Figure 8-10 shows a snippet of a typical active log
writer operating system trace file.


[oracle@fourcore ~]$ ps -eaf|grep lgwr
oracle    3329     1  0 May26 ?        00:05:32 ora_lgwr_prod16
oracle   12761 12725  0 10:07 pts/2    00:00:00 grep lgwr
[oracle@fourcore ~]$ strace -rp 3329
Process 3329 attached - interrupt to quit
...
0.000114 gettimeofday({1243445814, 661157}, NULL) = 0
0.000214 times(NULL) = 540181547
0.000035 pwrite64(21, "..., 475136, 488298496) = 475136
1.288494 times(NULL) = 540181675
0.000036 times(NULL) = 540181675
0.000033 pwrite64(21, "..., 1048576, 488773632) = 1048576
0.043375 times(NULL) = 540181680
0.000033 times(NULL) = 540181680
0.000036 pwrite64(21, "..., 1048576, 489822208) = 1048576
0.033626 times(NULL) = 540181683
0.000037 times(NULL) = 540181683
0.000035 pwrite64(21, "..., 920064, 490870784) = 920064
0.116281 times(NULL) = 540181695
0.000040 gettimeofday({1243445816, 143433}, NULL) = 0
...

Figure 8-10. Shown is an operating system trace of a very busy log writer on a Linux Oracle
Database 10g Release 2 system. Notice the gettimeofday calls are not tightly wrapped
around each write call, reducing the gettimeofday call overhead. While not apparent in
this snippet, the majority of write calls are 1MB.

The trace file snippet in Figure 8-10 is not what performance analysts like to see. The
first write call took 1.3 seconds to complete, followed by 43.4 ms, 33.6 ms, and finally 116.3
ms. Obviously Oracle’s IO requirements have exceeded the IO subsystem’s capacity. If you
see a trace file like this, you are highly likely to see the log file parallel write as
your top wait event.
Figure 8-10 also shows that log writer background process write calls on this Linux Oracle
Database 10g Release 2 system are not tightly wrapped within gettimeofday calls. While
this might initially be a little unsettling, making you think Oracle's timing may now be in
error, as long as Oracle knows the number of writes and the combined duration of those
write calls, it can easily calculate the average. In fact, reducing the number of gettimeofday
calls when the process makes quick multiple write calls (as occurred in Figure 8-10) is not
only more efficient, but also shows Oracle is making efforts to reduce its kernel code
instrumentation overhead.
When redo is involved in a performance issue, as shown in Figure 8-11, it is common to
see multiple redo-related wait events. Focus on the top event, and then work your way down,
while considering expected performance improvement, ease of solution implementation,
supportability, and so on. For example, while log file parallel write is the top
wait event, simply increasing the redo log buffer size will eliminate most of the log
buffer space waits. To make a shockingly strong case for redo problems, by classifying
all redo-related waits, you can say that over 99% of all the wait time is associated with redo!


SQL> @swpctx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER twice.

Database: prod16                                       27-JUN-10 10:17am
Report:   swpctx.sql            OSM by OraPub, Inc.                Page 1
             System Event CHANGE (blue line) Activity By PERCENT

                                   Time Waited  % Time    Avg Time     Wait
Wait Event                                (sec)  Waited  Waited (ms) Count(k)
--------------------------------- ------------ ------- ----------- --------
log file parallel write                 514.350   40.01       209.9        2
log buffer space                        431.370   33.56       407.0        1
log file sync                           337.060   26.22       592.4        1
log file switch completion                7.410    0.58       145.3        0
control file parallel write               3.410    0.27         8.1        0
db file sequential read                   0.460    0.04         1.9        0
log file single write                     0.170    0.01         8.5        0
buffer busy waits                         0.030    0.00        30.0        0

Figure 8-11. Shown is a common v$system_event interval report clearly indicating
severe redo-related performance problems, with instance processes waiting 40% of their wait
time on the log writer background process to complete a write.

Figure 8-11 also shows that to Oracle, the average log writer background process write
call takes over 200 ms! Here is an example of where OraPub’s 3-circle analysis shines.
Instead of the usual, “It’s the application SQL” or “The IO subsystem is operating poorly,” at
this point, all we confidently know is that Oracle’s sequential IO write requirements have
exceeded the IO subsystem’s capacity. And, as I’ll detail shortly, we can develop solutions
from an Oracle, application, and operating system perspective. This is how you build a more
cooperative solution and avoid the usual adverse and destructive finger-pointing sessions.
Figure 8-12 is a good example of how having more than just the average wait can create
a more informative and persuasive argument. The OSM swhistx.sql report is based on an
interval of time (you simply run the report once, pause as long as you want, and then run it
again) and clearly shows only 11.5% of all log writer background process write requests
completed in 16 ms or less. If your IO write goal is 5 ms or less, or perhaps 10 ms or less,
you’re not even close! If you’re feeling feisty, you can even say that only around half of all
log writer background process writes completed in 64 ms or less (just make sure you didn’t
cause the problem in the first place).
If you start telling people write calls are taking over 200 ms, you’re asking for a fight, so
be prepared. Even if you are correct, you should expect to be aggressively challenged. So
make sure you trace the log writer background process in summary, as shown in Figure 8-13,
and in detail, as shown in Figure 8-10. Figure 8-10 highlights that the log writer background
process is issuing 1MB write calls, that Oracle gathers its timing information from the
gettimeofday call, and that it gathers log writer background process CPU consumption
details from the times call. If you run an IO subsystem-focused command, such as iostat,
its numbers could be significantly different based on IO caching and physical configuration.
So focus on how long it takes the Oracle log writer background process to have its writes
serviced.


SQL> @swhistx
Remember: This report must be run twice so both
the initial and final values are available.

Database: prod16                                       27-JUN-10 10:34am
Report:   swhistx.sql           OSM by OraPub, Inc.                Page 1
        Wait Event Activity Delta Histogram (event:log%file%par%write)

                                                         Running
Wait Event                      ms Wait <=  Delta Occurs Occurs %
------------------------------- ----------- ------------ --------
log file parallel write 1 6 0.27
log file parallel write 2 8 0.63
log file parallel write 4 10 1.09
log file parallel write 8 49 3.30
log file parallel write 16 182 11.53
log file parallel write 32 600 38.67
log file parallel write 64 387 56.17
log file parallel write 128 273 68.52
log file parallel write 256 291 81.68
log file parallel write 512 180 89.82
log file parallel write 1024 119 95.21
log file parallel write 2048 76 98.64
log file parallel write 4096 39 100.41
log file parallel write 8192 1 100.45

Figure 8-12. Shown is a v$event_histogram view, allowing us to glean more than the
average wait time. For example, we can say that only 3.3% of all the log writer background
process writes completed in 8 ms or less. Even more shocking, only 12% of log writer writes
completed in 16 ms or less.

Figure 8-13 also shows that over the 60-second trace, the log writer made 18
getrusage calls. Back in Chapter 5, I explained that Oracle’s system time model,
introduced with Oracle Database 10g (v$sess_time_model and v$sys_time_model),
gathers Oracle process CPU consumption by having the processes ask the operating system
for their process’s CPU consumption through a getrusage call, and then places that
information into Oracle internal structures. As performance specialists, we can see this CPU
consumption via the system time model views. Oracle states the CPU time will be accurate
within 5 seconds; my tests have shown the actual time to be typically just over 6 seconds. In
the 60-second interval shown in Figure 8-13, since the log writer background process made 18
getrusage calls, the log writer background process CPU consumption information should
be accurate to within around 3.3 seconds.

[oracle@fourcore ~]$ ps -eaf|grep lgwr


oracle 3329 1 0 May24 ? 00:05:32 ora_lgwr_prod16
oracle 12761 12725 0 10:07 pts/2 00:00:00 grep lgwr
[oracle@fourcore ~]$ strace -cp 3329
Process 3329 attached - interrupt to quit
Process 3329 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.96 0.925045 1360 680 pwrite64
0.03 0.000240 7 33 semop
0.01 0.000097 0 2978 gettimeofday
0.01 0.000048 3 19 pread64
0.00 0.000018 0 2249 times
0.00 0.000000 0 1 read
0.00 0.000000 0 4 write
0.00 0.000000 0 3 open
0.00 0.000000 0 3 close
0.00 0.000000 0 1 time
0.00 0.000000 0 1 kill
0.00 0.000000 0 18 getrusage
0.00 0.000000 0 2 writev
0.00 0.000000 0 70 semctl
0.00 0.000000 0 89 1 semtimedop
------ ----------- ----------- --------- --------- ----------------
100.00 0.925448 6151 1 total

Figure 8-13. Shown is a 60-second interval operating system summary trace of the log writer
background process. Within the 60-second interval, Oracle issued 680 write calls
(pwrite64), nearly 3,000 gettimeofday calls, and over 2,200 times calls. During this
interval, this operating system utility reported the 680 write calls consuming a total of only
925 ms, an average of about 1.36 ms per write call!

Gathering Oracle’s IO Requirements


Just how much IO the log writer background process is generating can be determined from
Oracle’s instance activity view, v$sysstat. Keep in mind that a single Oracle IO operation
may translate into multiple physical IO device operations due to the IO subsystem
configuration. Most IO administrators want Oracle’s IO requirements from either an IO
operations per second (IOPS) perspective or a megabytes per second (MB/s) perspective. In
effect, what they will need to do is ensure the IO subsystem has the capacity to process both
Oracle’s read and write requirements within acceptable service levels. While every system has
its own service level requirements, common service levels are for Oracle reads to be satisfied
within 10 ms and writes to be satisfied within 5 ms.
Figure 8-14 shows both the log writer background process IO targeted script and the
output. The script accepts the reporting interval as an argument and reports only log writer
background process write statistics. Figure 8-15 shows a sample output from a system
experiencing severe log file parallel write waits.

def delay_s=&&1

prompt Gathering initial values and sleeping for &delay_s seconds...

set termout off

col x new_value rdo_iops_t0 noprint
select value x
from v$sysstat
where name = 'redo writes';

col x new_value rdo_mb_t0 noprint
select value/(1024*1024) x
from v$sysstat
where name = 'redo size';

exec dbms_lock.sleep(&delay_s);

set termout on

col iops format 9,990.000 heading "LGWR IOPS"
col mbs format 9,990.000 heading "LGWR W MB/s"

select (value-&rdo_iops_t0)/&delay_s iops
from v$sysstat
where name = 'redo writes';

select ((value/(1024*1024))-(&rdo_mb_t0))/&delay_s mbs
from v$sysstat
where name = 'redo size';

Figure 8-14. Shown is a simple log writer background process IO script that will produce
both IO operations per second (IOPS) and MB/s. The script takes a single argument of the
interval gathering time.

SQL> @lgwrio 60
Gathering initial values and sleeping for 60 seconds...

LGWR IOPS
----------
1.600

LGWR W MB/s
-----------
4.070
SQL>

Figure 8-15. Shown is the output from the script shown in Figure 8-14 with a gathering
interval of 60 seconds. During the 1-minute interval, on average, the log writer wrote 1.6 IO
operations each second, or 1.6 IOPS, and it wrote an average of 4.070 MB/s.

While it is useful and relatively simple to collect only the log writer IO statistics, the IO
subsystem administrators may want to see a fuller perspective. Through only Oracle’s
v$sysstat view, all Oracle read and write requirements can be determined for both the
background processes and server processes, broken out by reads and writes, and also by IOPS
and MB/s. Figure 8-16 shows the output from OraPub’s OSM script iosumx.sql. The log
writer portion of the script gathers the same information as shown in Figure 8-14.

SQL> @iosumx
Remember: This report must be run twice so both the initial and
final values are available. If no output, press ENTER a few times.

Database: prod16 27-JUN-10 03:43pm


Report: iosumx.sql OSM by OraPub, Inc. Page 1
Oracle IO Interval (v$sysstat) Summary

IOP/s and IOP


Total Read : .593 115
Total Write : 883.371 171374
Total R+W : 883.964 171489
MB/s and MB
Total Read : .005 .898
Total Write : 14.154 2745.878
Total R+W : 14.159 2746.776
Detailed Component Data
Interval (s) : 194
SRVR Read IOP/s, IOP : .593 115
SRVR Read MB/s, MB : .005 .898
DBWR+SRVR Write IOP/s, IOP : 879.825 170686
DBWR+SRVR Write MB/s, MB : 7.489 1452.782
LGWR Write IOP/s, IOP : 3.546 688
LGWR Write MB/s, MB : 6.665 1293.097

Figure 8-16. Shown is OraPub’s OSM iosumx.sql script, which gathers and displays all
Oracle IO details at the instance level. This can be very helpful when working with IO
administrators who need to understand Oracle’s IO requirements.

Application-Focused Solutions for Log File Parallel Write Contention


When focusing on the application, identify the redo-producing SQL, and then find ways to
reduce the volume of redo generated over the period of concern. The metric we are interested
in is the redo generated per second; that is, the redo generation rate. When focused on the
application, we are more interested in the bytes of redo generated, rather than the number of
redo entries or the number of log writer writes.
The redo generation rate in bytes can be gathered from both v$sesstat and
v$sysstat by looking at the statistic redo size. Gather a beginning and an ending
value to calculate the redo generated over an interval, and then divide that by the interval
time to calculate the redo generation rate. Both Statspack and AWR show the redo generation
rate in their workload sections, labeling it simply “redo size.” And the raw v$sysstat redo
bytes generated during the reporting interval can be found in the Instance Statistics section.
The number of redo writes (occurrences) is the instance statistic redo writes.
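As a sketch of this calculation (patterned after the interval scripts shown earlier in this
chapter, with an arbitrary 60-second interval), the redo generation rate can be gathered
directly from v$sysstat:

-- Capture the cumulative redo bytes, sleep, then compute the rate.
col redo_t0 new_value redo_t0 noprint
select value redo_t0
from v$sysstat
where name = 'redo size';

exec sys.dbms_lock.sleep(60);

-- Redo generation rate in bytes per second over the 60-second interval
select (value-&redo_t0)/60 redo_bytes_per_sec
from v$sysstat
where name = 'redo size';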
Reducing redo generation by directly altering the application is rarely practical, but can
produce amazing performance gains. When an application needs to change data, it simply
must change the data. Some developers tend to update every column, even though only one
column was actually changed. Typically, a gain in this area is a batch processing-
focused change, using Oracle’s global temporary tables (discussed earlier in this chapter)

during interim processing, or workload balancing. These types of changes nearly always
result in long and cumbersome meetings, but the payoff can be dramatic and may be exactly
what is needed to solve the problem.
The heavy redo-generating SQL will obviously be some type of DML, like inserts,
updates, deletes, or perhaps create statements. If you have ASH licensed, as Figure 8-9
demonstrated, you can easily find the SQL related to the log file parallel write
event. If ASH is not an option, two factors complicate identifying the SQL:
• The log writer background process posts the log file parallel write wait
event, not a server process. The closest a server process will get to directly feeling
the log file parallel write wait is when posting log file sync or
log buffer space wait events.
• Oracle does not track redo generation by SQL statement.
Therefore, when using a report similar to Figure 8-17, a Statspack, or an AWR report,
you should look for the highest elapsed time DML SQL.
The report shown in Figure 8-17 is a stripped-down version of OraPub's OSM
topdml.sql report. There are many other columns of useful information that could be
added to help identify the offending SQL and locate its application source. Identifying where
in the application the SQL occurs is the first step in determining if the SQL statement itself,
its application area, or its execution rate can be altered to reduce redo creation.

SQL> l
1 select *
2 from
3 (
4 select SQL_ID,
5 elapsed_time/1000000 e_time,
6 dense_rank() over(order by elapsed_time desc) toprank,
7 DECODE(command_type,
8 2, 'INSERT',
9 3, 'SELECT',
10 6, 'UPDATE',
11 7, 'DELETE',
12 26, 'LOCK',
13 42, 'DDL',
14 44, 'COMMIT',
15 47, 'PL/SQL BLOCK',
16 command_type) c_type,
17 substr(sql_text,1,15) s_text,
18 module,
19 executions/1000 execs
20 from v$sql
21 where command_type in (2,6,7,44)
22 )
23* where toprank <= 10
SQL> /

ELAPSED
SQL_ID TIME(s) SQL TYPE SQL TEXT MODULE EXECS(k)
------------- ------ ------------ --------------- -------- ----------
ajs17jprc8xc0 9,215 UPDATE UPDATE CUSTOMER mg 32
3mqwga6fvn0bg 1,049 UPDATE UPDATE SPECIAL_ mg 32
dyqmgjbfvvurs 850 UPDATE UPDATE ORDERS S oe 32
cghf05c6r758h 18 INSERT INSERT /*+ APPE 1
fn5y3xm766a9r 12 UPDATE UPDATE BOGUS_RA oe 0
fn5y3xm766a9r 12 UPDATE UPDATE BOGUS_RA oe 0
g337099aatnuj 9 UPDATE update smon_scn 0
0sy05h3pw6f30 9 INSERT insert into wrh 0
350myuyx0t1d6 8 INSERT insert into wrh 0
9txzuppkuffg7 8 INSERT INSERT INTO wrh 0

10 rows selected.

Figure 8-17. Shown is a condensed version of OraPub’s OSM script topdml.sql, which
helps identify redo-generating SQL. Additional columns that can be useful are centered
around locating where the SQL resides or its source, such as the action and service
columns.

Operating System-Focused Solutions for Log File Parallel Write Contention


When log file parallel write waits are severe, the operating system should
show a painfully raging IO subsystem bottleneck. If not, then
there must be IO blockage somewhere between the log writer background process and the
physical disks. For example, if a vendor-specific tool indicates the IO devices are performing
well within tolerance, yet you can clearly demonstrate the jaw-droppingly slow IO system call
response times (for example, the output from tracing the log writer background process), you
will need to work closely with the IO team to figure out where in the IO stack the bottleneck
is occurring.

There are many pieces of hardware and software between the actual physical disk and
the log writer background process, and any one of them could be the bottleneck. Performance
specialists will tell you they have seen everything from broken Ethernet cables and connectors
to nearly all IO being routed to a single controller or a host-based adapter (HBA) card.
Expect the IO team to be defensive and possibly very aggressive in response to this
news. After all, when millions of dollars are spent on an IO subsystem and the average write
takes 50 ms, people get very nervous. Keep stressing that Oracle’s sequential IO write
requirements have simply exceeded the IO subsystem’s capacity, and that people are working
to reduce the IO requirements by focusing on Oracle and on the application. You can also
show them the IO requirements Oracle is placing on the IO subsystem. Reports similar to the
ones shown in Figures 8-10, 8-15, and 8-16 can be very helpful.

Oracle-Focused Solutions for Log File Parallel Write Contention


Oracle-focused solutions center on reducing redo generation and ensuring the log writer
background process is writing efficiently. We want to make sure that when the log writer
background process takes the time to issue a write request, it pushes down a large batch of
redo. It is much more efficient to perform a single 1MB IO write than to repeatedly perform
512 writes of 2,048 bytes.
As I mentioned in the previous section about application-focused solutions, seriously
consider finding creative ways to implement Oracle’s global temporary tables. If the redo is
associated with a restartable batch process, there is a good possibility of success. Using global
temporary tables can cut redo generation by 50%, substantially reducing Oracle’s overall IO
requirements. This alone could solve the log file parallel write problem.
From a more Oracle internals perspective, delaying log writer background process writes
so each write will be a full 1MB IO call will maximize log writer efficiency. Based on what
I’ve said, you’re probably already thinking about some of these options:
• Increase the log buffer size. This will reduce the likelihood of the log buffer filling
and forcing the log writer to write immediately, possibly with less than its full 1MB
potential.
• Increase the _log_io_size parameter. This parameter is expressed in redo log
blocks, so with the typical 512-byte log block size, setting it to 2048 means the log
writer gathers 2,048 blocks, a full 1MB, before writing.
• Use the commit write facility with both the batch and nowait options. With a
full understanding of the potential risk of losing committed data, the commit write
facility's batch option will specifically batch redo entries to maximize redo writing
effectiveness. Plus, the nowait option means the committing server process does not
wait for the log writer background process's write to complete, so commits return
nearly immediately.
So, when one of the top wait events is log file parallel write, remember that
there are a number of Oracle-focused solutions available to you.
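For example, here is a minimal sketch of the commit write facility, assuming Oracle
Database 10g Release 2 or later and that the risk of losing very recently committed data
during an instance failure has been accepted:

-- Make batched, asynchronous commits the session default
alter session set commit_write = 'BATCH,NOWAIT';

-- Or specify the options on an individual commit
commit write batch nowait;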

Log File Switch Contention


Every Oracle DBA will eventually encounter the log file switch wait events. To summarize,
the log writer background process is ready to switch writing from one online redo log group
and begin writing to another, but it is prevented from doing so. As a result, all DML-related
activity eventually comes to a screeching halt. There are two main ways this can occur, and
each has its own related wait event. Figure 8-18 will be used to clarify both situations.

Figure 8-18. Shown are the log writer (LGWR) and archive (ARCH) background processes,
three online redo log groups, and a disk containing all the archived redo logs. The log writer
background process is currently writing to online redo log group RLOG 100, and the archive
background process is reading online redo log group RLOG 300 and writing it to the
archived redo log disk.

Checkpoint Incomplete
To complete a database checkpoint, the database writer must write every associated dirty
buffer to disk, and then every database file and control file must be updated with the latest
checkpoint number. Keep in mind that regardless of the archive mode, Oracle promises that
committed transaction data will reside in an Oracle database file, either in .dbf files or redo
log files.
Using Figure 8-18 as a reference, suppose the log writer (LGWR) background process
just finished writing to redo log group RLOG 100, and is ready to switch to redo log group
RLOG 200 and begin writing. However, the database writer background process (not shown)
is still finishing the checkpoint related to the redo entries in redo log group RLOG 200. If the
log writer background process were to switch and begin writing into redo log group RLOG
200, it would overwrite committed transaction details that at that time may only reside in the
buffer cache. If the instance crashed, the committed transactions, which reside only in the
buffer cache, would be lost forever! To prevent this, in this example, the log writer
background process will wait to switch and wait to begin writing into redo log group RLOG
200 until the database writer background process has finished writing all the dirty buffers
related to the checkpoint. While the log writer background process is waiting (having
effectively stopped writing), it will post a log file switch (checkpoint incomplete) wait
event.
The solution is very simple. Add more redo log groups to give the database writer
background process more time to complete the checkpoint. If the log switch rate is
uncomfortably fast,8 then create new, larger redo log group members, and drop the old redo
log groups. No downtime is involved when adding or dropping the online redo logs.
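For example, here is a minimal sketch; the group number, file names, and 512MB size are
hypothetical and must be adjusted for your environment:

-- Add a new, larger online redo log group with two members
alter database add logfile group 4
  ('/u01/oradata/prod/redo04a.log',
   '/u02/oradata/prod/redo04b.log') size 512m;

-- Drop a small, old group once v$log shows its status is INACTIVE
alter database drop logfile group 1;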

8 There are many factors involved in deciding how fast redo log switches should occur. Paramount is
recoverability and mean time to recover, not performance. Once the availability issues have been addressed, then
work on slowing redo log switch time until the associated wait events are insignificant.

Archive Incomplete
Similar to the checkpoint incomplete wait event, if an online redo log is being archived, the
log writer background process will not switch to it and begin writing until it has been
archived. Doing so would destroy redo that has not yet been archived. If the log writer
encounters a nonarchived redo log, it will stop writing and post the wait event log file
switch (archive incomplete).
This wait event can be more difficult to resolve quickly because it can clash with
recoverability requirements. There are two basic solution approaches:
• Slow down the log switching by increasing the number of online redo log groups
and/or increasing the online redo log member size so the archive background process
can catch up when the redo writing slows a bit.
• Have the archive process write to a faster device.
While this sounds simple enough, for recovery reasons, the archive logs may have all
sorts of recoverability requirements that significantly slow their archiving. For example, the
archived redo log devices may be mirrored, or the archived redo logs may need to be written
directly to tape (very rare these days) or to network-attached storage (NAS), among any number
of strategies built to meet recovery requirements.
If slowing down the redo log group switching is still not working, you may need to work
closely and cautiously with the recovery team to increase archiving performance. If a solution
focusing on increasing archiving IO capacity cannot be found, then the solution focus will
have to switch to reducing Oracle redo requirements as presented in the preceding section
about the log file parallel write wait event.

Summary
Oracle redo is pure overhead, but it’s this overhead that enables all sorts of advanced recovery
options like point-in-time, table-level, system change number (SCN), and transaction-level.
Ensuring this overhead does not negatively impact performance can be challenging at times.
Fortunately, the redo process is fairly straightforward (create the redo, allocate space, copy the
redo, write the redo), and there are a number of very specific wait events. On top of this,
there are a number of advanced Oracle features directly created to deal with unique redo
requirements. Redo-related performance issues can be especially tense because they are
usually deeply application and IO subsystem related. Making changes to the application and
the IO subsystem is something no one desires, but if Oracle’s redo requirements are exceeding
the capacity of the IO subsystem, sometimes all three perspectives must be intimately
involved with solving the problem.
This completes the three Oracle internals chapters! The next chapter begins a journey
into understanding what kind of impact we can expect from our proposed performance-
increasing solutions.

CHAPTER 9

Oracle Performance Analysis

How many times have you been asked, “So what kind of performance improvement can we
expect?” It’s an honest and painfully practical question, which deserves an honest answer.
Unfortunately, while Oracle professionals are proficient in many areas, one area where there
is a glaring gap is in understanding the impact of their proposed performance solutions. The
skill required to answer this question requires deep Oracle performance analysis proficiency.
This chapter’s coverage borders on predictive performance analysis and some serious
mathematics, yet I’ll keep focused on simplicity, practicality, and value (for example, I will
limit the number of Greek symbols and mathematical formulas to the absolute bare
minimum). Furthermore, I will always tie the analysis back to the fundamentals: an Oracle
response-time analysis, OraPub’s 3-circle analysis, and solid Oracle internals knowledge. I do
not intend to explain how to plan IT architectures. My goal is to provide substance,
conviction, and useful information, and to motivate change toward scientifically ranking the
many possible performance-enhancing solutions.
All anticipatory performance questions require a solid grasp of response-time analysis,
which is the first topic of this chapter. The good news is that if you have a solid understanding
of the topics covered in the first few chapters, you are adequately prepared. (If you just
opened this book, I highly recommend you review those first few chapters, as they set the
foundation for this chapter.) Next, I’ll present a fundamental and surprisingly flexible concept
commonly called utilization. Response-time analysis combined with a solid grasp of
utilization will prepare you for the next topic, which is understanding the various ways our
solutions influence performance. Finally, we’ll dive into anticipating our solution’s impact in
terms of time and utilization. To ensure you can do everything presented in this chapter, I will
provide a number of examples. That’s a tall order for a single chapter, so let’s get started!

Deeper into Response-Time Analysis


Oracle performance analysis that is fundamentally based on response-time analysis has the
inherent advantage of naturally being expanded into anticipating change. To do this, we need
to take the components of response time another level deeper, reduce some of the abstraction I
have been using in this book, and examine the relationship between response-time
components specifically used in an Oracle environment.

Oracle Arrival Rates


When rivers flow into the ocean, people enter an elevator, and transactions enter an Oracle
system, over an interval of time, they arrive at an average rate. It could be that just before
Friday's time card entry deadline, between 4:30 p.m. and 5:00 p.m., 9,000 transactions
occurred.
The average arrival rate is expressed in units of work and units of time. In the prior time
card example, the arrival rate is likely to be expressed in terms of transactions and minutes.
The math involved is very straightforward: divide the total number of transactions that arrived
by the time interval. For the example, it would be 9,000 transactions divided by 30 minutes,
which is 300 trx/min.
There can be a difference between the rate of transaction arrivals (or entry) and the rate
of transaction exits. The transactions actually being processed are known as the workload. A
system is deemed stable when, on average, the transaction entries equal the transaction exits.
If this does not occur, eventually either so many transactions will build up on the system that
it will physically shut down, or there will be so few transactions that no work will be
performed. Because of this equality, for our work with Oracle systems, it is acceptable to refer
to the arrival rate as the workload and vice versa. Use the term that makes your work easily
understandable for your audience.
The arrival rate symbol is universal across publications: the Greek letter lambda (λ).
For the example of an arrival rate (the work performed over a period of time) of 9,000
transactions over a 30-minute period, using symbols and converting to seconds, the arrival
rate calculation is as follows:

$$\lambda = \frac{9{,}000\ \mathrm{trx}}{30\ \mathrm{m}} = 300\ \mathrm{trx/m} \times \frac{1\ \mathrm{m}}{60\ \mathrm{s}} = 5\ \mathrm{trx/s}$$
Figure 9-1 is an actual Statspack report from an Oracle Database 10g Release 2 system
that was experiencing severe cache buffer chain (CBC) latch contention. The Load Profile
section appears near the top of both the Statspack and AWR reports. Over the Statspack
reporting duration, on average, Oracle processed 0.22 transaction each second, 145,325
logical IOs per second, and 415 user calls per second. These reports have captured an initial
value and ending value from a specific statistic, such as commits, database calls, or perhaps
redo generated.

Load Profile Per Second Per Transaction


~~~~~~~~~~~~ --------------- ---------------
Redo size: 22,936.28 103,552.74
Logical reads: 145,324.74 656,112.40
Block changes: 127.49 575.58
Physical reads: 3.68 16.61
Physical writes: 3.39 15.29
User calls: 414.85 1,872.97
Parses: 6.94 31.31
Hard parses: 0.11 0.48
Sorts: 68.61 309.75
Logons: 0.08 0.36
Executes: 192.89 870.86
Transactions: 0.22

Figure 9-1. Shown is a Statspack Load Profile section from an active Oracle Database 10g
Release 2 system experiencing serious CBC latch contention. Each load profile metric can be
used to represent the arrival rate (the workload).

The load profile calculations are as follows:

$$\lambda = \frac{S_1 - S_0}{T}$$
where:
• λ is the arrival rate.
• S1 is the ending snapshot, captured, or collected value.
• S0 is the initial snapshot, captured, or collected value.
• T is the snapshot interval.
The following is an example of how we could gather the arrival rate, expressed in user
calls per second, over a 5-minute (300-second) period.

SQL> select name, value
2 from v$sysstat
3 where name = 'user calls';

NAME VALUE
---------------------------------------------------- ----------
user calls 37660

1 row selected.

SQL> exec sys.dbms_lock.sleep(300);

PL/SQL procedure successfully completed.

SQL> select name, value
2 from v$sysstat
3 where name = 'user calls';

NAME VALUE
---------------------------------------------------- ----------
user calls 406376

1 row selected.

Placing the collected Oracle workload data into the arrival rate formula, expressed in
user calls per second, the arrival rate is 1,229.05 uc/s.

$$\lambda = \frac{S_1 - S_0}{T} = \frac{406{,}376\ \mathrm{uc} - 37{,}660\ \mathrm{uc}}{300\ \mathrm{s}} = \frac{368{,}716\ \mathrm{uc}}{300\ \mathrm{s}} = 1{,}229.05\ \mathrm{uc/s}$$
The Statspack and AWR reports calculate their load profile metrics the same way, but
they store the collected data in different tables and use different sampling techniques. For
example, the Statspack facility typically collects data in 60-minute intervals and stores the
data in tables starting with stats$ (the key table is stats$snap). The AWR report draws
from the WRH$ tables, which contain summarized active session history (ASH) data. With just
a little creativity, you can devise your own reports that pull from the Statspack or AWR
tables.
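For example, here is a minimal sketch of such a report, assuming access to the AWR tables
is licensed; it computes the user calls per second rate between consecutive AWR snapshots
(instance restarts, which reset the cumulative values, are ignored for simplicity):

select s.snap_id,
       (st.value - lag(st.value) over (order by s.snap_id)) /
       ((cast(s.end_interval_time as date) -
         cast(s.begin_interval_time as date)) * 86400) uc_per_sec
from   dba_hist_snapshot s,
       dba_hist_sysstat st
where  st.snap_id = s.snap_id
and    st.dbid = s.dbid
and    st.instance_number = s.instance_number
and    st.stat_name = 'user calls'
order  by s.snap_id;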
Using the arrival rate formula and the Statspack data shown in Figure 9-2, we can easily
perform the same load profile calculations as the Oracle Statspack developers (shown in
Figure 9-1). The raw data used for the calculations is included in the Snapshot and Instance
Activity Stats sections. The snapshot interval, labeled simply as “Elapsed,” is expressed in
minutes, which is the difference between the beginning snap time and ending snap time. The
workload interval activity has been calculated and is displayed in the Instance Activity Stats
section. While not shown, the interval activity is simply the difference between the ending
statistic value and the beginning statistic value.

Snapshot Snap Id Snap Time Sessions Curs/Sess Comment


~~~~~~~~ ---------- ------------------ -------- --------- ------------
Begin Snap: 2625 02-Oct-08 08:15:02 139 8.2
End Snap: 2635 02-Oct-08 10:45:00 164 8.5
Elapsed: 149.97 (mins)

...

Instance Activity Stats DB/Inst: PDXPROD/PDXPROD Snaps: 2625-2635

Statistic Total per Second per Trans


------------------------------- -------------- -------------- ------------
CPU used by this session 1,675,794 186.2 840.8
consistent gets 1,306,587,656 145,208.7 655,588.4
db block gets 1,044,354 116.1 524.0
user calls 3,732,826 414.9 1,873.0
user commits 1,993 0.2 1.0
user rollbacks 0 0.0 0.0
workarea executions - onepass 2 0.0 0.0
...

Figure 9-2. Two other Statspack sections from the report in Figure 9-1. The timing detail and
a few of the statistics in the Instance Statistics sections are shown. This is enough information
to calculate the workload; that is, the arrival rate expressed in commits, transactions, logical
IOs, and user calls per second.

Using the data shown in Figure 9-2, the user calls per second workload metric is
calculated as follows:

$$\lambda_{\mathrm{uc/s}} = \frac{3{,}732{,}826\ \mathrm{uc}}{149.97\ \mathrm{m}} \times \frac{1\ \mathrm{m}}{60\ \mathrm{s}} = \frac{3{,}732{,}826\ \mathrm{uc}}{8{,}998.20\ \mathrm{s}} = 414.84\ \mathrm{uc/s}$$
Referencing the Load Profile’s user calls per second metric shown in Figure 9-1, notice
it closely matches our calculation of 414.84 uc/s. The difference is due to the two-digit
precision of 149.97 minutes. Knowing that an Oracle transaction contains both commits
(statistic user commits) and rollbacks (statistic user rollbacks), you can calculate
the transaction rate and compare that to the load profile. You’ll find the Statspack report does
the math correctly.
The arrival rate is one of the most fundamental aspects of expressing what is occurring
in a system—whether it’s Oracle, a river, or an expressway. As you’ll see in the following
sections, when combined with other metrics, the arrival rate can be used to convey part of the
performance situation and also provides clues about how our proposed performance solutions
will affect the system.

Utilization
If I were standing in front of you right now, I would have in my hands an empty glass and a
pitcher of water. I would hold out the empty glass and say over and over, “capacity.” Then I
would hold out the pitcher and say repeatedly, “requirements.” Then I would ask you, “Is the
water going to fit in the glass? Are the requirements going to exceed the capacity?” In IT,
what usually occurs is the water is poured in the glass, and we all look away, hoping it will fit.
After a while, we start feeling the water dripping down our arm, and we have a mess. That
mess is the result of the requirements exceeding the available capacity. When this occurs, we
have a performance firefighting situation.
Utilization is simply the requirements divided by the capacity:

$$U = \frac{R}{C}$$
where:
• U is utilization
• R is requirements with the same units as capacity
• C is capacity with the same units as requirements
The performance management trick is to ensure the requirements will fit into the
available capacity. In fact, if we can mathematically express the requirements and capacity—
injecting alterations such as politics, budget, purchases, timing, and new and changing
workloads—we have a much better chance of anticipating change. But if we guess at the
requirements or the capacity, then everyone is just plain lucky if the solution works.

Requirements Defined
Requirements are one of the two metrics we need to derive utilization. Requirements can take
on many useful forms, like CPU seconds consumed per second or consumed in a peak
workload hour, IO operations performed per second or in a single hour, or megabytes
transferred per second or per hour. We can also change the tense from the past, “CPU seconds
used yesterday between 9 a.m. and 10 a.m.” to “How much CPU is the application now
consuming each second?” or to the future, “How much CPU time do we expect the
application to consume during next year’s seasonal peak?” Don’t tie yourself to a single rigid
requirement definition. Throughout your work, allowing a flexible requirements definition
will help bring clarity to an otherwise muddy situation.
Requirements can also be articulated in terms of more traditional Oracle workload
metrics like user calls, SQL statement executions, transactions, redo bytes, and logical IO. For
example, referring to the workload profile shown in Figure 9-1, which is based on
v$sysstat, the workload can be expressed as 415 uc/s, 0.22 trx/s, 145,325 LIO/s, or
22,926 redo bytes generated per second. Referring to Figure 9-2, the system requirements can
also be expressed as 1,675,794 centiseconds (16,757.94 seconds) of CPU consumed over the
149.97-minute interval. This means on average every second, the Oracle instance consumed
1.862 seconds of CPU, which is a simpler way of saying 1.862 CPU seconds consumed per
second. At first, it may seem strange to speak of CPU consumed like this, but it is very correct
and sets us up for the next topic, which is capacity.
Once the definition of requirements is set, the data must be collected. Most Oracle
systems now collect Statspack or AWR data, which means the data collection is currently
occurring for you. Your job is to extract the necessary information.

Gathering CPU Requirements


When gathering CPU requirements, we typically look at the time model system statistics
(v$sys_time_model) or the instance statistics (v$sysstat). In previous chapters, I
have presented how to gather CPU requirements from the v$sesstat, v$sysstat,
v$sess_time_model, and v$sys_time_model views.
The second part of Figure 9-2 shows a few instance statistics from a Statspack report.
Based on this data, during the Statspack reporting interval, Oracle processes consumed—that
is, required—1,675,794 centiseconds of CPU, which is 16,757.94 seconds of CPU.
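As a minimal sketch, the same cumulative statistic can be queried directly; the value is
reported in centiseconds, so it is divided by 100 to yield seconds:

-- Instance-wide CPU consumed (seconds) since instance startup
select value/100 cpu_seconds_consumed
from   v$sysstat
where  name = 'CPU used by this session';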
The time model system statistics provide more accurate CPU consumption details.
Figure 9-3 shows the Time Model System Stats section of the Statspack report shown in
Figures 9-1 and 9-2. According to the time model statistics, Oracle processes (server and
background) consumed 16,881.6 seconds of CPU during the reporting interval of 149.97
minutes. Notice that in this case, there is little difference between the instance (shown in
Figure 9-2) and time model (shown in Figure 9-3) CPU consumption statistics.

Time Model System Stats DB/Inst: PDXPROD/PDXPROD Snaps: 2625-2635


-> Ordered by % of DB time desc, Statistic name

Statistic Time (s) % of DB time


----------------------------------- -------------------- ------------
sql execute elapsed time 578,732.7 99.2
DB CPU 16,749.8 2.9
parse time elapsed 300.2 .1
PL/SQL execution elapsed time 252.0 .0
hard parse elapsed time 189.5 .0
connection management call elapsed 115.6 .0
RMAN cpu time (backup/restore) 96.7 .0
repeated bind elapsed time 55.9 .0
PL/SQL compilation elapsed time 5.7 .0
hard parse (sharing criteria) elaps 2.9 .0
sequence load elapsed time 1.9 .0
failed parse elapsed time 0.0 .0
hard parse (bind mismatch) elapsed 0.0 .0
DB time 583,482.3
background elapsed time 12,739.5
background cpu time 131.8

Figure 9-3. Shown is the Time Model System Stats section from the same Statspack report as
shown in Figures 9-1 and 9-2. During the Statspack reporting interval, Oracle server
processes consumed 16,749.8 seconds of CPU and Oracle background processes consumed
131.8 seconds of CPU.

Along with using Statspack or AWR to gather and report CPU consumption, you can
also easily collect this information yourself. Simply gather the initial value, final value, and if
you want, the consumption per second, over the desired time interval. Figure 9-4 shows a
code snippet used to collect CPU consumption based on v$sys_time_model. During the
60-second interval, the Oracle instance processes consumed 82.4 seconds of CPU; that is, on
average 1.37 seconds each second.

SQL> def interval=60


SQL> col t0_s new_value t0_s
SQL> select sum(value)/1000000 t0_s
2 from v$sys_time_model
3 where stat_name in ('DB CPU','background cpu time');

T0_S
----------
498.995974

1 row selected.

SQL> exec sys.dbms_lock.sleep(&interval);

PL/SQL procedure successfully completed.

SQL> select sum(value)/1000000 t1_s,
2 sum(value)/1000000-&t0_s CPU_s_Consumed,
3 (sum(value)/1000000-&t0_s)/&interval CPU_s_Consumed_per_sec
4 from v$sys_time_model
5 where stat_name in ('DB CPU','background cpu time');
old 2: sum(value)/1000000-&t0_s CPU_s_Consumed,
new 2: sum(value)/1000000-498.995974 CPU_s_Consumed,
old 3: (sum(value)/1000000-&t0_s)/&interval CPU_s_Consumed_per_sec
new 3: (sum(value)/1000000-498.995974)/60 CPU_s_Consumed_per_sec

T1_S CPU_S_CONSUMED CPU_S_CONSUMED_PER_SEC
---------- -------------- ----------------------
581.431481 82.435507 1.37392512

1 row selected.

Figure 9-4. Shown is a code snippet used to collect and then determine instance CPU
consumption, based on v$sys_time_model, over a 60-second interval. The CPU
consumed (82.4s) and also the CPU consumed per second (1.37s) are displayed.

Gathering IO Requirements
Gathering IO requirements is more complicated than gathering CPU requirements. Oracle9i
Release 2 and earlier require querying from both v$sysstat and v$filestat, whereas
later Oracle releases require querying only from v$sysstat. And depending on the
information desired, different statistics are required. The following snippet shows the
formulas for raw Oracle IO consumption (requirements) for Oracle9i Release 2 and earlier
versions:

Server process read IO operations
= sum(v$filestat.phyrds)
Server process read MBs
= sum(v$filestat.phyblkrd X block size) / (1024 X 1024)

Database writer and server process write IO ops
= sum(v$filestat.phywrts)
Database writer and server process write MBs
= sum(v$filestat.phyblkwrt X block size) / (1024 X 1024)

Log writer write IO operations
= v$sysstat.redo writes
Log writer write MBs
= (v$sysstat.redo size) / (1024 X 1024)

The following formulas are appropriate for versions later than Oracle9i Release 2:

Server process read IO operations
= v$sysstat.physical read IO requests
Server process read MBs
= (v$sysstat.physical reads X block size) / (1024 X 1024)

Database writer and server process write IO ops
= v$sysstat.physical write IO requests
Database writer and server process write MBs
= (v$sysstat.physical writes X block size) / (1024 X 1024)

Log writer write IO operations
= v$sysstat.redo writes
Log writer write MBs
= (v$sysstat.redo size) / (1024 X 1024)
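As a minimal sketch, the raw cumulative values behind these post-Oracle9i Release 2 formulas
can be gathered in a single query; sample it twice and difference the values to produce
interval-based requirements:

-- Cumulative Oracle IO statistics since instance startup
select name, value
from   v$sysstat
where  name in ('physical read IO requests',
                'physical reads',
                'physical write IO requests',
                'physical writes',
                'redo writes',
                'redo size');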

Figure 9-5 shows the instance statistics we need to calculate Oracle’s IO consumption
(its requirements) over Statspack’s 149.97-minute interval. The only missing piece of
information is the Oracle block size, which is needed to determine the MB/s figures. While
not shown, the value is found in the Instance Parameter portion of any Statspack or AWR
report. For this example, the db_block_size is 8192, which is 8KB.

Instance Activity Stats DB/Inst: PDXPROD/PDXPROD Snaps: 2625-2635

Statistic Total per Second per Trans


--------------------------------- ------------ -------------- ------------
physical read IO requests 24,027 2.7 12.1
physical reads 33,106 3.7 16.6
physical write total IO requests 22,959 2.6 11.5
physical writes 30,466 3.4 15.3
redo size 206,380,604 22,936.3 103,552.7
redo writes 3,314 0.4 1.7

Figure 9-5. Based on the same data as Figures 9-1, 9-2, and 9-3, shown are Oracle IO-
related consumption (requirements) metrics. This is an Oracle Database 10g Release 2
system, so all the metrics can be gathered from the Instance Statistics section (based on
v$sysstat), and then used to calculate read and write IO requirements for the server
processes and background processes in both megabytes per second (MB/s) and IO operations
per second (IOPS).

Using the post-Oracle9i Release 2 formulas and based on the Statspack information
shown in Figure 9-5, to determine the total IO read and write operations, we must sum the
server process, database writer background process, and log writer background process IO
read and write operations. The results are as follows:

Server process read IO operations = 24,027
DBWR and server process write IO operations = 22,959
LGWR write IO operations = 3,314

The total of read and write IO operations over the 149.97-minute interval was 50,300.
To get the standard IO operations per second (IOPS), simply divide the total IO operations by
the reporting interval, remembering to convert to seconds.

$$\mathrm{IOPS} = \frac{\mathrm{operations}}{\mathrm{time}} = \frac{50{,}300\ \mathrm{ops}}{149.97\ \mathrm{m}} \times \frac{1\ \mathrm{m}}{60\ \mathrm{s}} = 5.59\ \mathrm{ops/s} = 5.59\ \mathrm{IOPS}$$
When the IO administrator asks for Oracle’s IO requirements, based on the Statspack
report time period, you can confidently say Oracle’s IO requirements were 5.59 IOPS. And if
the IO administrator wants the breakdown by read and write operations, or even in megabytes
per second, that can also be provided. Be sure the IO administrator understands this is truly
Oracle’s IO requirements and it is likely the IO subsystem will require multiple physical IO
actions to complete a single Oracle IO operation.

Capacity Defined
Capacity is the other aspect of utilization. As I mentioned earlier, capacity is the empty
glass—it is how much a resource can hold, can provide, or is capable of. Like requirements,
capacity can take on many forms. A specific database server with a specific configuration has
the capability to provide a specific number of CPU cycles each second or a number of CPU
seconds each second. An IO subsystem has a specific capacity that can be quantified in terms
of IOPS or MB/s. It could also be further classified in terms of IO write capacity or IO read
capability. But regardless of the details, capacity is what a resource can provide.
Gathering CPU Capacity
We have already touched on gathering capacity data in a number of areas of this book, so I
will make this brief. The trick to quantifying capacity is defining both the unit of power and
the time interval. For example, the time interval may be a single hour, and the unit of power
may be 12 CPU cores. Combining the time interval and the power unit—12 CPU cores over a
1-hour period of time—we can say the database server can supply 720 minutes of CPU power
over a 1-hour period (12 CPUs × 60 minutes) or 43,200 seconds of CPU power over a 1-hour
period (12 CPUs × 60 minutes × 60 seconds / 1 minute).
A very good way to quantify CPU capacity is based on the number of CPU cores.1 The
number of CPU cores can be gathered from an operating system administrator. Additionally,
the v$osstat view is available with Oracle Database 10g and later versions. While obvious
in the particular Statspack report shown in Figure 9-6, Oracle does not always clearly label
the number of CPU cores. In this case, you can usually spot the value that represents the
number of CPU cores, but you should always double-check, because the number of CPU
cores is so important. It is used to calculate the database server's CPU capacity and is a key
parameter when calculating capacity and utilization. Figure 9-6 indicates there are two CPUs,
but in reality, there is a single dual-core CPU providing two CPU cores’ worth of processing
power.

1 The number of CPUs or the number of CPU threads does not accurately reflect the processing power for an
Oracle-based system. There are many reasons for this, but in summary, this is based on Oracle’s multiple process
architecture that can take advantage of multiple cores, but not so much multiple threads per core or CPU.

OS Statistics DB/Inst: PDXPROD/PDXPROD Snaps: 2625-2635


-> ordered by statistic type (CPU use, Virtual Memory), Name

Statistic Total
------------------------- ----------------------
BUSY_TIME 1,802,616
IDLE_TIME 0
SYS_TIME 96,260
USER_TIME 1,706,356
LOAD 93
OS_CPU_WAIT_TIME 69,269,700
VM_IN_BYTES 69,025,792
VM_OUT_BYTES 0
PHYSICAL_MEMORY_BYTES 17,171,005,440
NUM_CPUS 2

Figure 9-6. Shown is the operating system statistics portion of the same Statspack report
shown in Figures 9-1, 9-2, 9-3, and 9-5. This particular database server has a single dual-
core CPU.

CPU capacity can be defined as the duration multiplied by the number of CPU cores.

CPU capacity = duration × number of CPU cores

For example, as shown in Figure 9-1, the time interval is an unusual 149.97 minutes.
(Usually, Statspack reports are run for a single hour or two.) Therefore, the CPU capacity
based on the Figure 9-6 Statspack report is as follows:

299.94 min = 149.97 min × 2 CPU cores

Converting to seconds, this CPU subsystem can provide 17,996.40 seconds of CPU
capacity within the 149.97-minute interval.
Gathering IO Capacity
Unlike when I need to gather CPU capacity, if I must determine an IO subsystem’s capacity, I
ask the IO administrator. As detailed in the “Gathering IO Requirements” section earlier, we
have the information needed to determine Oracle’s IO requirements, but determining IO
capacity with authority is best done by the IO subsystem team. If your IO subsystem is simply
a series of SCSI drives daisy-chained together, as was done in the 1980s and early 1990s, then
simple math can be used to predict the IO subsystem’s capacity. However, the combination of
read and write caching and batching from both Oracle and the operating system virtually
eliminates the possibility of deriving a reliable IO capacity figure. Surely, we can gather and
even predict IO requirements, but predicting IO capacity is something I simply will no longer
attempt.
When talking with your IO administrator about capacity, ask for both read and write
capacity in either MB/s or IOPS. While the IO administrator may not normally separate read and
write capacity, because Oracle systems have very distinctive read and write characteristics,
just to be safe, it is always best to ask for both. Since we can gather IO
requirements in both MB/s and IOPS, it really doesn't make much difference to us in which
form capacity is delivered or expressed.

Calculating Utilization
With both requirements and capacity defined and the data collection covered, let’s use them
together. The classic requirements-versus-capacity indicator is utilization. It can be applied in
a wide variety of situations—from river water flow and factory production to Oracle
performance analysis.
Oracle CPU Utilization
To calculate Oracle’s CPU utilization, we need Oracle’s CPU requirements (consumption)
and the operating system’s CPU capacity. Figure 9-3 provides the CPU requirements of
16,881.6 seconds of CPU and Figure 9-6 provides the capacity details of two CPU cores.
Figure 9-2 provides the sample interval necessary to complete the CPU capacity calculation.
We place these numbers into the utilization formula:

$$U = \frac{R}{C} = \frac{16{,}881.6\ \mathrm{s}}{2\ \mathrm{cores} \times 149.97\ \mathrm{m} \times \frac{60\ \mathrm{s}}{1\ \mathrm{m}}} = \frac{16{,}881.6}{17{,}996.4} = 0.938 = 94\%$$
This means that during the reporting interval, the Oracle instance consumed 94% of the
available CPU! This also means that only 6% CPU power remains for all other processes.
Obviously, this server is experiencing a severe CPU bottleneck.
In the preceding formula, I purposely included the conversion factors. Notice that all the
time units cancel out, leaving us with a raw numeric without any reference to time. I could
have carried the CPU core’s metric to the final value of 94%, since 94% of the CPU core
capacity was utilized, but since we normally don’t represent utilization like this, it could cause
some confusion.
It is important to understand this calculation is not the operating system CPU utilization,
which can be gathered using an operating system command such as vmstat. What we
calculated with this utilization formula is Oracle’s CPU consumption related to the database
server CPU capacity. This is commonly called the Oracle CPU utilization.
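Here is a minimal sketch of this calculation over a live 60-second interval, patterned after
Figure 9-4 and assuming Oracle Database 10g or later. (Per the earlier footnote, on some
platforms NUM_CPUS reflects CPUs or threads rather than cores, so verify the value before
trusting the result.)

def interval=60

col t0_s new_value t0_s noprint
select sum(value)/1000000 t0_s
from   v$sys_time_model
where  stat_name in ('DB CPU','background cpu time');

exec sys.dbms_lock.sleep(&interval);

-- Oracle CPU utilization: CPU seconds consumed / CPU seconds of capacity
select (sum(value)/1000000 - &t0_s) /
       (&interval * (select value from v$osstat
                     where stat_name = 'NUM_CPUS')) oracle_cpu_util
from   v$sys_time_model
where  stat_name in ('DB CPU','background cpu time');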
Operating System CPU Utilization
Besides gathering operating system CPU utilization using standard operating system
commands such as vmstat and sar, starting with Oracle Database 10g, both operating
system CPU consumption and CPU capacity details are available through the v$osstat
view. We still need the sample interval, which is 149.97 minutes, as shown in Figure 9-2.
Database server CPU consumption is shown in the BUSY_TIME statistic in Figure 9-6
and represented in centiseconds. The BUSY_TIME statistic is the sum of all
database server process CPU consumption during the sample interval. Based on Figure 9-6,
during the reporting interval, all operating system processes consumed a total of 1,802,616
centiseconds of CPU.
Database server CPU capacity is calculated in the same manner as the Oracle CPU
utilization. Placing both the requirements and capacity into the utilization formula and
applying the appropriate conversion factors, we have this calculation:

$$U = \frac{R}{C} = \frac{1{,}802{,}616\ \mathrm{cs}}{2\ \mathrm{cores} \times 149.97\ \mathrm{m} \times \frac{60\ \mathrm{s}}{1\ \mathrm{m}} \times \frac{100\ \mathrm{cs}}{1\ \mathrm{s}}} = \frac{1{,}802{,}616}{1{,}799{,}640} = 1.00 = 100\%$$
This means the CPU subsystem is operating on average at 100% utilization! We should
have expected this, since Oracle is consuming 94% of all the available CPU. Calculating both
Oracle CPU utilization and the operating system CPU utilization, we have a very nice
confirmation that this is the only instance running on the database server and also that the
CPU subsystem is experiencing a raging bottleneck.
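Here is a minimal sketch of the same calculation over a live 60-second interval using
v$osstat deltas, assuming Oracle Database 10g or later:

col bt0 new_value bt0 noprint
col it0 new_value it0 noprint
select max(decode(stat_name,'BUSY_TIME',value)) bt0,
       max(decode(stat_name,'IDLE_TIME',value)) it0
from   v$osstat;

exec sys.dbms_lock.sleep(60);

-- OS CPU utilization = delta busy / (delta busy + delta idle)
select (max(decode(stat_name,'BUSY_TIME',value)) - &bt0) /
       ( (max(decode(stat_name,'BUSY_TIME',value)) - &bt0)
       + (max(decode(stat_name,'IDLE_TIME',value)) - &it0) ) os_cpu_util
from   v$osstat;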
The utilization formula implies a linear relationship between requirements and
utilization. In other words, if the requirements double, so will the utilization. When you are
asked, “But how do you know it really works like this?” show the graph in Figure 9-7, or
better yet, create one yourself. While no real Oracle system will match this perfectly, for
CPU-intensive systems, the linearity is very evident.
Figure 9-7 is an example of using SQL executions as the workload (logical IO would
have also worked very well). The solid line is the actual sample data plotted, and the dotted
line is a linear trend line added by Microsoft Excel. The correlation between the real data and
the trend line is 0.9328, which represents a very strong correlation! In the upcoming sections,
I will demonstrate how to use this linear relationship when anticipating the impact of a
firefighting solution.

Figure 9-7. Shown is a demonstration of the linear relationship between utilization and
requirements. This graph is not based on theory or mathematical formulas, but data sampled
from a CPU-intensive Oracle system. The solid line is based on the actual sample data, and
the dotted line is a linear trend line. Their correlation coefficient is a very strong 0.9328.

IO Utilization
Just as with CPU utilization, IO utilization can be calculated. However, because of the
possibility of file system buffer caching and IO subsystem caching, our Oracle-focused
utilization calculation is a worst-case scenario (no caching assumed), includes only the
instances we sample data from, and does not include any non-Oracle-related IO. So, the worth
of our calculation is limited (at best). The higher worth metric is Oracle’s IO requirements,
which we calculated in a previous section.
The IO team can use Oracle’s IO requirements, apply whatever caching metric they
wish, and also add any other IO-related metrics. While our utilization calculation has limited
value, when comparing the theoretical worst-case utilization with the actual IO subsystem
utilization, it will demonstrate the effectiveness of caching, changing IO subsystem capacity,
and possibly various tuning efforts.
Suppose the IO administrator told you the IO subsystem has a capacity of 250 IOPS.
Earlier, in the “Gathering IO Requirements” section, we calculated that, during the reporting
interval, Oracle processes generate 5.59 IOPS. Once again, using the utilization formula, we
have this calculation:

$$U = \frac{R}{C} = \frac{5.59\ \mathrm{IOPS}}{250\ \mathrm{IOPS}} = 0.022 = 2.2\%$$
So, while the CPU subsystem is running at 100% utilization, if the IO subsystem is
receiving only this specific Oracle instance’s IO requests, and assuming there is no non-
Oracle caching, the IO subsystem would be running at around 2.2% utilization. It appears the
IO subsystem has plenty of capacity.

Oracle Service Time


Service time is how long it takes a single arrival to be served, excluding queue time. If the
arrival is defined to be an Oracle user call, then the service time may be something like 4.82
milliseconds per user call, or 4.82 ms/uc.
While the total service time includes all the time to service transactions within a given
interval, service time itself relates to servicing a single arrival. The unit of time should
be in the numerator, and the unit of work should be in the denominator. Depending on your
data source, the information may be provided as work over time. Make sure to switch it to
time over work. If you forget to do this, any other calculation based on the service time (for
example, utilization and response time) will likely be incorrect.
Besides the general utilization formula presented in the previous sections, the classic
utilization formula is as follows:

$$U = \frac{S_t \lambda}{M}$$
where:
• St is the service time.
• λ is the arrival rate.
• M is the number of transaction servers, such as a CPU core.
It is important to understand the service time and arrival rate are independent and also
have a direct and linear relationship with utilization. Theoretically, when the arrival rate
increases, service time does not increase. What may increase, if the workload increases
enough, is the queue time. More practically, if it takes 10 ms to process one user call when the
system is lightly loaded, then it will continue to take 10 ms to process one user call when the
system is heavily loaded. This is why response-time curves are more or less flat until queuing
sets in. Remember that the users do not experience only service time, but the combination of
service and queue time.
As the utilization formula shows and as you might expect, if the arrival rate doubles, the
utilization will also double. In fact, as Figure 9-7 demonstrates, their relationship is linear
not only in theory but also in practice on CPU-intensive Oracle systems. This is also true
with the service time. For example, if we tune a SQL statement, resulting in a
more efficient execution plan, and achieve a 50% database server CPU consumption decrease,
and nothing else changes, we can expect the utilization to also decrease by 50%. Now suppose
the CPUs were replaced, and the new CPUs can process an Oracle workload twice as fast. In
this case, and if nothing else changes, we would expect the utilization to also drop by 50%.
For precise forecasts, this formula will be slightly adjusted. But when anticipating and
evaluating alternative performance solutions, this works beautifully.
We do not gather service time directly, but instead derive it from existing data. It turns
out that nearly always (and fortunately for us), we have parameters for all but the service
time. As an example, let’s use the data contained in Figures 9-2 and 9-6. Figure 9-2 shows
that during the Statspack interval, on average, 414.9 user calls were processed each second.
This will be our arrival rate: 414.9 uc/s. Based on Figure 9-6, we know the number of CPU
cores is two, and as calculated earlier, the operating system CPU utilization is 100%. Solving the
utilization formula for the service time, plugging in the numbers, and converting time to
milliseconds, we have the following calculation:

$$S_t = \frac{UM}{\lambda} = \frac{1.00 \times 2}{414.9\ \mathrm{uc/s}} = \frac{2.00}{414.9\ \mathrm{uc/s}} = 0.00482\ \mathrm{s/uc} \times \frac{1000\ \mathrm{ms}}{1\ \mathrm{s}} = 4.82\ \mathrm{ms/uc}$$
Notice that if we are careful with the units, the service time naturally results in the unit of
time in the numerator and the unit of work in the denominator.
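
Here is the same derivation as a small Python sketch, using the numbers above; the function is my own illustration, not an Oracle-supplied tool:

def service_time_ms(u, m, arrival_rate_per_s):
    """Solve U = St * lambda / M for St, returning milliseconds per unit of work."""
    return u * m / arrival_rate_per_s * 1000.0

# 100% utilization, 2 CPU cores, 414.9 user calls per second (Figures 9-2 and 9-6).
print(f"{service_time_ms(1.00, 2, 414.9):.2f} ms/uc")  # prints 4.82 ms/uc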
Deriving the IO service time based on the utilization formula is fraught with problems
because of non-Oracle IO caching. Even more problematic is knowing the actual number of
active IO devices dedicated to an Oracle instance. But it gets worse. Having other IO activity
on a specific instance's database file IO devices further degrades the quality of the service
time calculation.
In summary, calculating IO service time is unreliable. The good news is that we are
more interested in IO response time, which is easy to collect, as discussed shortly.

Oracle Queue Time


When a transaction arrives into a system ready to be serviced, it may need to wait, or queue,
before servicing begins. Service time does not include wait time; that is, queue time. For
example, when an IO subsystem is 2.2% utilized, the entire IO processing time is virtually all
service time and no queue time.
Queue time can be calculated a number of ways. The simplest way, which is sufficient
for our purposes, is to subtract the service time from the total request time:

$$Q_t = R_t - S_t$$


The total request time is more formally called response time and is discussed in the next
section. The units for queue time are the same as for service time, such as milliseconds per
logical IO.
Looking at the classic utilization formula, you see that utilization can increase if the
service time, the arrival rate, or both increase. As the classic utilization formula indicates, and
as Figure 9-7 demonstrates occurs in CPU-intensive Oracle systems with a consistent
workload mix, when utilization increases, it is because the arrival rate (the workload) is
increasing, not because service time is increasing.
Certainly, the total service time increases with each arrival, but the service time per
arrival (called simply the service time) remains roughly the same. It is easy to get the two
terms confused. We know that when the workload increases, the total CPU consumption
increases. But the total consumption increase is matched by the workload increase. The two
offset each other, keeping the per-unit service time the same while the utilization increases.
When CPU-intensive Oracle algorithms begin to break down, you will notice that
service time starts to increase as the arrival rate increases. If you look ahead to Figure 9-9,
you will see a slight upward slope in the service time. As discussed back in Chapter 3, the CPU-
intensive Oracle latching acquisition algorithm, with its combination of spinning and sleeping,
does a tremendous job of limiting the increase in service time as the workload increases. What
you feel and what users feel when performance begins to degrade is probably queue time
increasing, rather than service time increasing.
In Chapter 4, we covered how CPU and IO subsystems are fundamentally different from
a queuing perspective. The central difference is there is only one CPU queue, but each IO
device has its own queue, so transactions have no choice but to read or write to a given IO
device, regardless of its queue size. This can result in a busy device with a massive queue,
while another device has little or no queue time. As a result, CPU subsystems with multiple
cores exhibit little queue time until utilization reaches around 70%, whereas IO subsystems
exhibit queue time immediately. As I detailed in Chapter 4, this is true even for
perfectly balanced IO subsystems.
Figure 9-8 contrasts an eight-device IO subsystem (solid line) and an eight-CPU core
subsystem (dotted line) having the exact same service time. We know their service times are
the same because, at a minimal arrival rate when no queuing occurs, their response time is
exactly the same. With the understanding that service time does not change, regardless of the
arrival rate, we know that any increase in the response time is due to queue time.
Figure 9-8 shows that a CPU subsystem can maintain a fairly consistent response time
until it reaches near capacity.2 This means little or no queuing exists until the arrival rate
significantly increases. In contrast, IO subsystems start queuing immediately, as reflected in
the upward-sloping response-time curve.

2. This is true for multicore CPU subsystems. The greater the number of CPU cores, the flatter the response-time curve and the steeper the elbow of the curve.


Figure 9-8. Shown is the classic response-time curve contrasting an eight-device IO
subsystem (solid line) and an eight-CPU core subsystem (dotted line). Even with a perfectly
balanced IO subsystem, without advanced algorithms and a significant amount of IO caching,
IO requests nearly always contain significant amounts of queue time.

Oracle Response Time


From the previous chapters, you know that response time equals service time plus queue time.
In fact, at the highest level, our Oracle firefighting methodology is based on classifying time
in response time’s two foundational categories: service time and queue time. Not only does
this allow a very systematic diagnostic approach, but it also provides a wonderful and natural
bridge between firefighting and predicting the impact of our possible solutions. Before we
move into quantitatively anticipating our solution’s impact, some additional details about
service time, queue time, and response time specifically related to Oracle systems need to be
covered.

The Bridge Between Firefighting and Predictive Analysis


When performing an Oracle response-time analysis (ORTA), we place Oracle server and
background process time into the classic queuing theory buckets: service time and queue time.
Keeping in mind that all Oracle server and background processes are either consuming CPU
or posting a wait event,3 as I’ll detail in the following sections, we naturally transform their
CPU time into service time and their non-idle wait time into queue time. This creates a bridge,
or link, between firefighting and predictive analysis. This bridge is supported by standard
queuing theory mathematical formulas, some already presented in earlier sections, which we
will use to quantify the anticipated results of our firefighting solutions.
Once I present a few more foundational elements, in addition to Figure 9-7, I will
demonstrate how Oracle systems do, in fact, operate in a manner that follows queuing theory,

3. Oracle does not guarantee all system calls are instrumented. As a result, there can be missing time. Also, Oracle CPU consumption includes Oracle processes waiting for CPU and also queuing for CPU. As a result, in a CPU-saturated system, Oracle may report CPU consumption higher than actually occurred.


and by performing an ORTA, we can indeed anticipate our proposed solution’s effect. And
this does not apply only to Oracle-centric solutions, but also to application-focused and
operating system-focused solutions.

Total Time and Time Per Workload


When performing an ORTA, we gather all of a category’s time within a sample interval. For
example, consider the data presented in Table 9-1. This hypothetical data was gathered during
a 1-hour interval, during which Oracle server and background processes consumed (required)
50 seconds of CPU time. We will place this 50 seconds of CPU consumed into the service
time category. During this 1-hour interval, Oracle processes completed 20,000 block changes
and 10,000 SQL executions. These are two metrics commonly used to represent the total
workload. The block change service time is therefore 0.00250 s/bc, which is the total service
time divided by the total block change workload (0.00250 = 50 / 20000).
Table 9-1 also details the total queue time and the queue time for a single arrival—that is, a unit
of work. The point is, as previously stated, there is a difference between the total service time
and the service time, and also between the total queue time and the queue time. In addition,
we can interject potentially useful and relevant arrival rate metrics, such as block changes,
SQL executions, redo entries, or logical IO. Selecting a useful workload
metric is discussed in the “Response-Time Graph Construction” section later in this chapter.

Table 9-1. Relationships between time components over a 1-hour interval

Time Category    Totals     Time (sec) per Block Change   Time (sec) per SQL Exec
Response time    555 sec    0.02775                       0.0555
Service time     50 sec     0.00250                       0.0050
Queue time       505 sec    0.02525                       0.0505
IO time          500 sec    0.02500                       0.0500
Other time       5 sec      0.00025                       0.0005

Workload
Block changes    20,000
SQL executions   10,000
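
The per-unit values in Table 9-1 are simple division. This Python sketch reproduces them from the hypothetical totals:

# Hypothetical totals from Table 9-1, gathered over a 1-hour interval (seconds).
totals = {"Response time": 555.0, "Service time": 50.0,
          "Queue time": 505.0, "IO time": 500.0, "Other time": 5.0}
block_changes, sql_execs = 20_000, 10_000  # total workload over the interval

for category, seconds in totals.items():
    print(f"{category:>13}: {seconds / block_changes:.5f} s/bc  "
          f"{seconds / sql_execs:.4f} s/exec")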

CPU Service and Queue Time


Back in Chapter 5, I mentioned Oracle has a limited perspective in classifying CPU time
based on when a transaction is being serviced by a CPU or when it is waiting in the CPU run
queue. In other words, Oracle does not have the ability to split CPU time into service time and
queue time. When we gather CPU consumption from either the instance statistics views or the
system time model views, what we collect as CPU time and typically classify as CPU service
time actually contains both CPU service time and CPU queue time. Oracle has no way of
knowing the difference and reports the total, which is, in the truest sense, CPU response time.


This type of perspective is common. In fact, computing systems are actually composed
of a series of interconnected queuing systems. This is called a networked queuing system. An
Oracle transaction does not really enter a single large system, wait to be serviced, get serviced,
and then exit to return the result. It enters a complex series of queuing systems, moving from
one system to the next, each with the possibility of queuing and then servicing the transaction.
When the transaction exits the complete system, the sum of all the service times and the sum of
all the queue times are presented as simply the service time and the queue time. So, this type
of abstraction is very common.
This abstraction is not a problem for three additional reasons:
• It allows our performance analysis to move forward without insignificant details
getting in the way of the problem at hand. Our goal should always be to keep
situations as simple as practically possible. Added complexity and precision take
effort and resources that should not be expended unless absolutely necessary.
• All time is accounted for; that is, time is not lost or unaccounted for. It is simply
classified in an abstracted and summarized format.
• Significant queuing begins to occur near the elbow of the curve, which happens
between 80% and 90% utilization, depending on the number of CPU cores. When evaluating
alternative firefighting solutions, we want to be nowhere near the elbow of the curve!
When an Oracle database server CPU subsystem is heavily utilized, we know
performance will be significantly degraded. Knowing the degree of “badness” is not important
when evaluating alternative firefighting solutions. So, when performing our ORTA, it is not a
problem to abstract and simply call this value CPU service time.

IO Service and Queue Time


When IO times are gathered using Oracle’s wait interface, from Oracle’s vantage point, it is
actually more of an IO response time. When Oracle issues an IO request to the operating
system, it waits until the IO request is satisfied. When the IO subsystem processes the IO
request, there is service time (perhaps transferring the data) and queue time (perhaps disk
latency and head movement). The gettimeofday system call Oracle issues does not
distinguish IO service and queue time, and therefore Oracle has no way of knowing the
classification. But just as with CPU time, this does not present a problem, primarily because
as performance analysts, we are interested in how long an IO call takes. The time components
of the call can be of interest, but it’s the total time—the response time—we need to know.
When we perform an ORTA, we classify all IO time as a queue time subclassification.
This may seem like an unfortunate and desperate abstraction, but it actually fits perfectly from
a database perspective. If no IO occurs, Oracle satisfies all requests consuming only CPU
time. But as the workload increases and some of this work requires IO, response time begins
to increase; that is, queue time begins to increase; that is, IO time begins to increase. The
pattern fits very nicely into an ORTA.
In summary, both our CPU service time and IO queue time abstractions fit very nicely
into an ORTA, providing us with the opportunity to apply predictive mathematics to evaluate
alternative firefighting solutions.


Oracle Response Time in Reality


The classic response-time curve in Figure 9-8 highlights the differences between CPU and IO
subsystems. It turns out that real Oracle systems operate somewhere between the two. The
dotted line in Figure 9-8 represents an Oracle system that operates completely and only with
CPU. In other words, there is no physical IO, only logical IO activity. In contrast, the solid
line represents an IO-centric Oracle system. No Oracle system can operate with only IO,
because there must be CPU resources consumed to run processes, which includes processing
the IO once it has been read from disk.
Figure 9-9 graphically shows a system with an intense logical IO load, which consumes
virtually no physical IO resources. While you can see the classic response-time curve, it is not
nearly as nice and neat as the mathematics would have us believe. But this is the reality of the
situation, and as they say, it is what it is. In all fairness, the graph would have looked more
like a theoretical response-time curve if I had gathered more samples at each workload (and
plotted the averages) and increased the sample time from 120 seconds to perhaps 360 seconds
or an hour. But I wanted you to see that even with limited data, the CPU subsystem does
exhibit queuing theory characteristics. Every system and every load will produce a different
graph, but from an abstracted view, they will have similarities, and we will use these to
anticipate the impacts of our possible firefighting solutions.

Figure 9-9. Shown is an actual response-time curve based on a heavily CPU-loaded Linux
Oracle Database 10g Release 2 system with a four-CPU core subsystem. The dotted line is
the service time (CPU), and the solid line is the response time (CPU plus all non-idle wait
time), with the difference between the two being queue time (non-idle wait time). The initial
large jump in queue time occurred at 75% utilization, and the last data point occurred at 98%
utilization.

The arrival rate in Figure 9-9, which is the horizontal axis, is simply the number of
logical IOs (v$sysstat: db block gets plus consistent gets) processed per
millisecond. The service time was calculated by dividing the total service time
(v$sys_time_model: DB CPU plus background cpu time) by the total number of
logical IOs. The queue time was calculated by dividing all non-idle wait time by the number
of logical IOs. From a mathematical perspective, the data collection interval is irrelevant as
long as all the data is gathered during the same interval. But if you are curious, the sample
interval was 120 seconds.
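
If you want to reproduce this kind of plot, the per-sample arithmetic is straightforward. The sketch below uses hypothetical interval deltas; in practice, each input is the difference between two snapshots of the views named above:

# Hypothetical 120-second interval deltas (snapshot differences).
lio         = 17_500_000    # db block gets + consistent gets
cpu_ms      = 95_000        # DB CPU + background cpu time (ms)
non_idle_ms = 31_000        # sum of all non-idle wait event time (ms)
interval_ms = 120_000

arrival_rate = lio / interval_ms          # logical IOs per ms (x axis)
service_time = cpu_ms / lio               # ms per logical IO (dotted line)
queue_time   = non_idle_ms / lio          # ms per logical IO
response     = service_time + queue_time  # solid line
print(arrival_rate, service_time, queue_time, response)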
Figure 9-10 graphically shows a system with an intense physical read IO load. Because
the system is experiencing a heavy physical IO load, the response-time curve is likely to
correlate with physical IO-related statistics. For this figure, I chose the instance statistic
physical reads.4 The service time metric is the sum of the time model statistics DB
CPU for server process CPU time and background cpu time for the background
process CPU time.5 The queue time consists of all non-idle wait event time. With only these
simple time classifications, the graph in Figure 9-10 was created. As you’ll see later in the
chapter, we can use graphs like this to anticipate our solution’s impact.

Figure 9-10. Shown is an actual response-time curve based on a heavily read IO-loaded
Linux Oracle Database 10g Release 1 system. The dotted line is the service time (CPU), and
the solid line is the response time (CPU plus all non-idle wait time), with the difference
between the two being queue time (non-idle wait time). The unit of work is physical Oracle
blocks read, which is the instance statistic physical reads. The initial large jump in
queue time occurred when IO read concurrency (wait event, read by other session)
suddenly appeared and eventually became about one-third of all the non-idle wait time.

There is so much to be gleaned from this single graph. While the top wait event was db
file scattered read, notice the queue time for each arrival (it’s the difference
between the response time and the service time lines). Before queuing really sets in, the queue
time (not service time or response time) is around 0.01 ms! This means that while the
requested blocks were not in Oracle’s buffer cache, they were in some other cache. The

4. The instance statistic physical reads signifies the number of Oracle blocks that Oracle server processes had to request from the operating system because the blocks did not reside in Oracle's buffer cache.
5. I don't mean to insult your intelligence.


operating system was able to provide Oracle a requested block, on average, in around 0.01
ms.
While these details are not shown, when queuing took a second dramatic increase at
around an arrival rate of 7E+06 pio/ms, it wasn’t because the physical devices became busy.
They averaged only around 3% utilization. Queue time took this big increase because CPU
utilization reached around 80%. Because IO requests were being satisfied primarily with CPU
resources, from an Oracle performance perspective, IO response time was based on CPU
utilization! Since we can see that Figure 9-10 does exhibit response-time curve characteristics,
in this particular situation, we can use a CPU queuing theory model to anticipate Oracle IO
read times.
However, this is a very specific situation (though not as unusual as most people think),
in which the CPU is used heavily to satisfy IO requests. The majority of Oracle systems have
their IO requests satisfied from a dynamic mix of physical disk IO and caching. As a result,
with an IO-intensive system, our queuing theory mathematics will need to be more IO focused
than CPU focused.
Another challenge when anticipating IO response time is visually demonstrated by the
initial response time jump around an arrival rate of 4.5E+06 pio/ms. This occurred not
because of the IO subsystem, or even the CPU subsystem, reaching capacity. It occurred
because of concurrency issues! This jump in queuing time occurred when the server processes
started asking for the same database blocks to be brought into the cache at nearly the same
time. Eventually, this concurrency issue accounted for about 30% of the total queue time. This
is an Oracle Database 10g Release 1 system, and the concurrency wait event is read by
other session. In earlier Oracle versions, the wait event would have been buffer
busy waits.
Did you notice the initial drop (not increase) in the response time? This occurs in Oracle
systems because cache efficiencies (Oracle and the operating system) increase as the
workload begins to increase. For IO subsystems, I have seen response-time curves (based on
real data) that look like a smiling face because of the significant cache efficiency effect. This
is what we want to see! Eventually, however, as the workload increases, some component in
the system will reach its capacity limit (in Figure 9-10, it was the CPU subsystem and
concurrency issues), and the classic response-time curve elbow will appear.
Now that I’ve detailed how to collect data and plot the actual response-time graph, it’s
time to move on to creating a response-time graph that is more general and suitable for
anticipating the impact of a firefighting solution.

Response-Time Graph Construction


This is where the real fun—and also the real risk—begins. The moment you draw a picture of
your system, all eyes will be focused on you. Your objective is to convey the situation as
simply as possible, without misleading anyone. Simplicity and abstraction are your friends.
The moment you attempt to be precise or get heavily into the mathematics, you’re doomed.
This book is not about predictive performance analysis, and this is not our focus here either.
Our goals are to convey the situation and anticipate the general effect of our proposed
solutions. Providing more information promotes better decisions about which solutions to
implement and in what order.
While the examples used in this section are based on an entire Oracle instance's activity,
everything described can also be applied to a single session or a group of sessions. For
example, instead of gathering CPU consumption and wait time from v$sysstat,
v$sys_time_model, and v$system_event, when focusing on a particular session or
group of sessions, use v$sesstat, v$ses_time_model, and v$session_event.
Obviously, to calculate operating system utilization, the v$osstat view will have to be
used. But a session’s or group of session’s contribution to the utilization can be calculated in
the same way as the Oracle instance CPU utilization (which is simply called Oracle CPU
utilization).

Selecting the Unit of Work


When creating a response-time graph representing a real system, it is important to use an
appropriate unit of work. For your graph to provide value—mimic and show any relation to
reality—it must use a unit of work that relates to the queue time issue. For example, as Table
9-2 shows, if the bottleneck is CPU, logical IO processing will most likely correlate very
well with CPU consumption. If the bottleneck is IO, the number of SQL executions, the
number of block changes, or the number of physical block reads may correlate very well with
the IO activity. A good unit of work, when increased, will push the response time into the
elbow of the curve.

Table 9-2. Selecting a unit of work based on the bottleneck

Bottleneck    Focus Area    Instance Statistic
CPU           Logical IO    db block gets + consistent gets, session logical reads
              Latching      v$latch gets, misses, sleeps
              Parsing       parse count (hard), parse count (total)
IO Read       Physical IO   physical reads, physical read requests
IO Write      DML           db block changes, redo writes, redo bytes
Concurrency   Locking       enqueue requests, enqueue waits
              Commits       user commits, rows/commit
Network       Transfers     SQL*Net roundtrips to/from client, SQL*Net roundtrips to/from dblink
Memory        SQL Sorting   sorts (memory), sorts (rows), v$system_event direct path write temp

If you have multiple samples (for example, you are running reports, pulling from the
AWR tables), you will know if a good unit of work has been chosen because the resulting
graph will look somewhat like a response-time curve. As Figures 9-9 and 9-10 demonstrate, it
won’t be perfect, but it should have an elbow in the curve.
A good unit of work will also help you identify the high-impact SQL that deserves
attention, forging a strong link between Oracle, the application, and the operating system. For
example, if there is a CPU bottleneck with the top wait event related to CBC latch contention,
then we would normally look for the top CPU-consuming SQL and the top logical IO SQL
(shown as Buffer Gets or simply Gets in AWR and Statspack). If we select logical IO as
our unit of work, we are likely to get a good response-time graph, and because the graph’s
arrival rate is based on logical IOs, we can naturally present how, by identifying and tuning
the high logical IO SQL, we will move out of the elbow of the curve. So, picking a good unit
of work is more than a technical exercise. It is also relevant in communication, performance
improvement strategy, and anticipation of the impact of the proposed solution.

Choosing the Level of Abstraction


Just as when you are asked what you do for a living and start with, “I work in IT,” when
initially and graphically conveying the performance situation using a response-time graph,
start at a very abstract level. Obviously, this is particularly important when presenting to a
nontechnical audience.
First, consider if numbers must be displayed. Showing numbers can lead to detailed
discussions that may not be necessary and can be distracting. If you show numbers, be ready
to answer questions like, “What is a user call and how does that relate to performance?” If
you don’t want to answer this because it is a clear distraction to your objectives, then do not
show these numbers.
I am not advocating misleading or misrepresenting the situation. I am advocating
appropriate abstraction and simplification to get the job done. You can drive down to the
details, but don’t go there unless it becomes necessary.
Figure 9-11 is an example of a very high-level response-time graph. With a graph like
this, you can help your audience understand three fundamental facts:
• Explain that the response-time graph is a very abstracted perspective of what users
are experiencing. Make sure they understand that as the workload increases, so does
the performance degradation. And the objective is to get the system out of the elbow of the curve.
Most people inherently understand (even though they may not be able to articulate it)
being in the elbow of the curve is bad and being to the left of the elbow is good. It
then follows that your solutions will somehow move the system out of the elbow of
the curve.
• The workload is so intense it is pushing performance degradation very high.
Highlight the vertical bar in the elbow of the curve (I even included an arrow in
Figure 9-11), so there is no doubt your audience understands the workload is much
too large. Everyone will naturally know that one solution is to reduce the workload.
• Dramatic performance degradation results when operating in the elbow of the curve,
which means even seemingly small workload changes can bring about dramatic
performance changes. This can be very frustrating to users who crave consistency
and dependability.


Figure 9-11. Shown is a highly abstracted response-time graph with minimal information. It
is used to convey the performance situation as unacceptable and clearly in the elbow of the
curve. People seem to intuitively know that being in the elbow of the curve is a bad thing.

If you feel it is necessary, then show the numbers. Be ready to explain them, how they
relate to performance, and how your solutions will alter the situation.
Figure 9-12 was created using the same data as the graph in Figure 9-11. The only
difference is that I included the numbers and used standard words (for example, “response
time”) and metrics (for example, “exec” for executions and “ms” for milliseconds). If asked
why the SQL execution metric is relevant, I may respond that there is a CPU bottleneck, and
in this system, the number of SQL statement executions directly impacts CPU consumption,
which affects the response time. As I’ll detail in later sections, you can also state that your
proposed solutions are aimed at reducing the execution rate and the impact of each execution.
After you have shown and described an abstracted response-time curve like the one in
Figure 9-11 or Figure 9-12, if your audience members are technical and will benefit from
seeing real data, and you have multiple samples, then show them a graph containing real data,
like the ones in Figure 9-9 and Figure 9-10. If you do show real data, be very well prepared to
keep control of the presentation, because you will be peppered with questions, many of which
will be irrelevant.


Figure 9-12. Shown is the same data as in Figure 9-11, but with slightly less abstraction.
Notice I use more traditional words, such as “response time” and “arrival rate,” and include
numeric values. If you include technical words and numbers, be prepared to explain what
they mean and how they relate to the performance situation.

As I’ll detail in the next section, using basic queuing theory math, you can construct a
graph similar to Figure 9-11 or Figure 9-12 with only a single peak-time 1-hour interval
sample (for example, from Statspack or AWR).

The Five-Step Response-Time Graph Creation Process


To help you get started creating a response-time graph, I created a five-step process. You can
use this process regardless of the database server bottleneck and even if you have a single
sample or hundreds. Enjoy!

Know the System Bottleneck


If the database server is the bottleneck, then the database server bottleneck will be either CPU,
IO, or some lock/blocking (for example, enqueues) issue. Your graph will reflect either the
general queue time increase of an IO bottleneck or the steep and dramatic elbow of a CPU
bottleneck. Figure 9-8 is a good guide, as it contrasts both the CPU and IO bottlenecks.
Based on v$osstat data shown in Figure 9-6 and the reporting interval shown in
Figure 9-2, we calculated in the preceding sections that the server is running at 100% CPU
utilization. While the wait event situation is not shown, the Statspack report shows the top
wait event is clearly CBC latch contention. Based on the instance CPU consumption data
shown in Figure 9-3, the reporting interval shown in Figure 9-2, and the CPU core number
shown in Figure 9-6, we calculated an Oracle CPU utilization of 94%. Clearly, there is a CPU
bottleneck.


Pick an Appropriate Unit of Work


When you choose an appropriate unit of work, the response-time graph will be a good
representation of the real system. This will make presenting the graph very natural and
understandable, and will naturally lead into your performance solutions discussion.
Following our example of a raging CPU bottleneck, we will use logical IOs as our unit
of work. Logical IOs consist of all buffer touches. Oracle distinguishes current mode buffer
touches by the db block gets instance activity statistic, and the consistent mode buffer
touches are signified by the consistent gets statistic. These two statistics will be
combined to produce a single logical IO statistic. Based on the instance statistics shown in
Figure 9-2, 1,307,632,010 logical IOs occurred during the Statspack reporting interval.

Determine the Service Time and Queue Time


As detailed in the previous sections, for each of your samples (perhaps a single sample or
hundreds), get the sample interval time, total CPU consumption (total service time), total non-
idle wait time (total queue time), and workload for your selected unit of work (total arrivals).
Then for each sample, calculate the arrival rate, service time, and queue time.
Continuing with our example, Figure 9-2 shows the sample interval to be 149.97
minutes in which the logical IO value (sum of db block gets and consistent gets)
is 1,307,632,010. Here is the arrival rate math:

$$\lambda_{lio} = \frac{lio}{time} = \frac{1{,}307{,}632{,}010\ \mathrm{lio}}{149.97\ \mathrm{m}} \times \frac{1\ \mathrm{m}}{60\ \mathrm{s}} \times \frac{1\ \mathrm{s}}{1000\ \mathrm{ms}} = 145.32\ \mathrm{lio/ms}$$
Based on Figure 9-3, the total service time is 16,881.6 seconds, or 16,881,600
milliseconds. Determine the service time by dividing the total service time by the unit of work
value. Here is the service time math:

$$S_t = \frac{S_{t:tot}}{work_{tot}} = \frac{16{,}881{,}600\ \mathrm{ms}}{1{,}307{,}632{,}010\ \mathrm{lio}} = 0.0129\ \mathrm{ms/lio}$$

Determine the queue time by dividing the total queue time by the unit of work value. For
Oracle systems, the total queue time is all the non-idle wait time that occurred during the sample
interval. Most Statspack and AWR reports have a Top 5 Timed Events section near the top of
their reports. This is simply the top four most time-consuming wait events and also the CPU
time. Usually the top four wait events account for 90% or more of all the non-idle wait time.
For our required level of precision, we can simply sum the wait time for the top four wait
events. While the details are not shown, their combined wait time is 45,672 seconds during
the sample interval. Here is the queue time math:

$$Q_t = \frac{Q_{t:tot}}{work_{tot}} = \frac{45{,}672\ \mathrm{s}}{1{,}307{,}632{,}010\ \mathrm{lio}} \times \frac{1000\ \mathrm{ms}}{1\ \mathrm{s}} = 0.035\ \mathrm{ms/lio}$$
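
These three calculations are easy to script. Here is a Python sketch using this example's Statspack numbers:

interval_min = 149.97
lio_total    = 1_307_632_010     # db block gets + consistent gets
st_total_ms  = 16_881.6 * 1000   # total CPU consumption (Figure 9-3)
qt_total_ms  = 45_672 * 1000     # top four wait events' time

arrival_rate = lio_total / (interval_min * 60 * 1000)  # lio/ms
st = st_total_ms / lio_total                           # ms/lio
qt = qt_total_ms / lio_total                           # ms/lio
print(f"lambda = {arrival_rate:.2f} lio/ms, St = {st:.4f} ms/lio, Qt = {qt:.3f} ms/lio")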


If Possible, Compare Utilizations


For CPU subsystems, you can compare the actual CPU utilization (perhaps gathered from
v$osstat or vmstat) with the classic utilization formula. If you picked a good unit of
work, the difference should be within 15%. If the bottleneck is the IO subsystem, because of
caching and batching, utilization comparison may be interesting, but it is unlikely to closely
match or provide much value.
Continuing with our example, the actual CPU utilization, based on the v$osstat
statistics shown in Figure 9-6, is 100%. To derive the CPU utilization, enter the calculated
arrival rate, service time, and number of CPU cores (also shown in Figure 9-6) as follows:

$$U = \frac{S_t \lambda}{M} = \frac{0.0129\ \mathrm{ms/lio} \times 145.32\ \mathrm{lio/ms}}{2\ \mathrm{cores}} = \frac{1.87}{2} = 0.94 = 94\%$$
As we hoped, we are within 10%. If we are not within 15%, we can still use our graph
for informational purposes, but for numerically quantifying and anticipating our solution’s
impact (as described later), it will not be reliable.
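
Scripted, the cross-check is one line (values carried over from the previous step):

st, lam, cores = 0.0129, 145.32, 2   # ms/lio, lio/ms, CPU cores
derived_u = st * lam / cores
print(f"{derived_u:.0%}")            # prints 94%; v$osstat reported 100%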

Create the Response-Time Graph


It is finally time to introduce the basic response-time graph formula. The following is the
general response-time formula:

$$R_t = S_t + Q_t$$

Here is the response time formula for CPU focused systems:

$$R_{t:CPU} = S_t + Q_t = \frac{S_t}{1 - U^M} = \frac{S_t}{1 - \left(\frac{S_t \lambda}{M}\right)^M}$$

Here is the response time formula for IO focused systems:

$$R_{t:IO} = S_t + Q_t = \frac{S_t}{1 - U} = \frac{S_t}{1 - \frac{S_t \lambda}{M}}$$

The first equation above simply states that response time is the sum of service time and
queue time. The second and third equations show the CPU and IO response time formulas,
respectively, first expressed with the utilization symbol and then with the utilization formula
substituted in, which can be handy if the utilization is unknown.6
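
Expressed in Python, the two low-precision approximations look like this. These are only the essential-queuing formulas shown above, not the more precise ErlangC formula mentioned in the footnote:

def rt_cpu(st, lam, m):
    """CPU response time approximation: Rt = St / (1 - (St*lam/M)**M)."""
    return st / (1.0 - (st * lam / m) ** m)

def rt_io(st, lam, m):
    """IO response time approximation: Rt = St / (1 - St*lam/M)."""
    return st / (1.0 - st * lam / m)

# Example: 0.0129 ms/lio service time, 2 cores, arrival rates below saturation.
for lam in (50, 100, 140):
    print(lam, round(rt_cpu(0.0129, lam, 2), 4))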
Figure 9-13 is the response-time graph based on the Statspack report used throughout
this chapter and on which many of the preceding values and calculations are based (including

6. There is a more precise response-time formula based on Mr. Agner Krarup Erlang's (1878-1929) famous ErlangC formula used to study telephone networks. For details on this formula, see Forecasting Oracle Performance (Apress, 2007), Chapter 3, "Increasing Forecast Precision."


the queuing theory calculations shown in this section). The actual graph was constructed by
inputting the core statistics of sample interval time, total workload units, total CPU
consumption, number of CPU cores, and total non-idle wait time into OraPub’s response-time
graph template (a Microsoft Excel-based tool).7

Figure 9-13. Shown is the response-time graph created using OraPub’s response-time graph
template. This particular graph is based on the data shown in the calculations in this section.
The peak arrival rate is clearly beyond what the system can process, and as we would expect,
severe performance problems are occurring.

Notice the peak arrival rate is deep in the elbow of the curve. The system is so busy and
the peak arrival rate intersects the response-time curve so high up that it dwarfs the service
time. It shouldn’t take much effort to convince your audience of this dire situation, preparing
them to embrace your solutions about how to get out of the elbow of the curve.
Before we embark on anticipating our performance solution’s impact, let’s look at
another example.

A Response-Time Curve for an IO-Bottlenecked System


The previous example was based on a two-CPU core system experiencing a raging CPU
bottleneck. Here, we will walk through the same process, but with a larger system
experiencing a classic multiblock IO read bottleneck.
This example is based on a real Oracle Database 10g Release 2 system with four CPU
cores. The performance data is based on a standard 60-minute interval AWR report. We will
complete each of the five steps outlined in the previous sections, resulting in the response-
time graph.

Know the System Bottleneck


The database server bottleneck is the IO subsystem. Simply put, Oracle’s IO read
requirements have exceeded the IO subsystem’s read capacity. We will expect our response-

7. This tool is available for free from OraPub's web site. Locate it by searching for "firefighting."


time graph to reflect the general queue time increase of an IO bottleneck, which has a
continual and steady increase in queue time until the elbow of the curve is reached, and then
response time skyrockets.
While the wait event situation, shown in Figure 9-14, looks like an IO bottleneck,
especially with the 20 ms average db file scattered read time, there could also
be a CPU bottleneck. To double-check, calculate both the operating system and the Oracle
CPU utilization. Using Figure 9-15 to calculate Oracle CPU requirements and considering
both server process (DB CPU of 2,065.87 seconds) and background process (background
cpu time of 25.95 seconds) CPU consumption, the total instance CPU consumption is
2,091.82 seconds.8 The database server CPU capacity is based on the four CPU cores and the
60-minute reporting interval.

Figure 9-14. Shown is a snippet from both v$system_event (wait events) and
v$sys_time_model (DB CPU). Oracle does not include
background CPU when calculating “CPU time.”

The following is the Oracle CPU utilization calculation:

$$U = \frac{R}{C} = \frac{2{,}091.82\ \mathrm{s}}{4\ \mathrm{cores} \times 59.31\ \mathrm{m} \times 60\ \mathrm{s/m}} = \frac{2{,}091.82}{14{,}234.40} = 0.147 = 15\%$$

8. When the Statspack and AWR reports base CPU consumption on the v$sys_time_model view, they incorrectly include only server process CPU consumption (DB CPU) in the Top 5 Timed Events report, and do not include any background process CPU consumption (background cpu time). Notice the DB CPU time shown in Figure 9-15 matches the CPU Time statistic shown in Figure 9-14.


Oracle processes are consuming only 15% of the available CPU capacity. Unless there is
another instance or other processes consuming CPU, we would expect the operating system
CPU utilization to be around 1% to 10% higher than the Oracle CPU utilization. While
not shown in a figure, the v$osstat BUSY_TIME statistic is 228,056 cs, which means all
operating system processes during the 60-minute interval consumed 2,280.56 seconds of
CPU. Placing the CPU consumption (requirement) value into the utilization formula, we see
the operating system CPU utilization is only 16%. So at this low CPU utilization, the
operating system overhead is minimal.

$$U = \frac{R}{C} = \frac{228{,}056\ \mathrm{cs}}{4\ \mathrm{cores} \times 59.31\ \mathrm{m} \times 60\ \mathrm{s/m} \times 100\ \mathrm{cs/s}} = \frac{228{,}056}{1{,}423{,}440} = 0.160 = 16\%$$
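
Scripted, the operating system utilization check is just unit conversion plus the same utilization formula:

busy_cs      = 228_056          # v$osstat BUSY_TIME (centiseconds)
cores        = 4
interval_min = 59.31

busy_s     = busy_cs / 100                  # convert centiseconds to seconds
capacity_s = cores * interval_min * 60      # CPU capacity over the interval
print(f"{busy_s / capacity_s:.0%}")         # prints 16%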

Figure 9-15. Shown is an AWR report snippet from the v$sys_time_model used in this
exercise. The total CPU consumption (service time) during the 60-minute interval is 2,091.82
seconds. This includes both server process (DB CPU) and background process
(background cpu time) CPU consumption.

Clearly, there is no CPU bottleneck. Combined with the Top 5 Timed Events report
snippet shown in Figure 9-14, we can see there is an IO bottleneck.


Pick an Appropriate Unit of Work


Because it’s obvious that there is an IO read bottleneck, the number of server process IO read
requests should be a good unit of work. This is the instance statistic (v$sysstat)
physical read IO requests. Oracle tracks physical IO by SQL statement, allowing
our response-time mathematics, the response-time curve, our performance-improving
strategy, and communication to be easily understood and well founded. If there were an IO
write bottleneck, the instance statistic db block changes would be another good unit of
work candidate.

Determine the Service Time and Queue Time


The AWR Instance Statistics section showed the physical read IO requests
(breads for short) statistic to be 148,439. Before we calculate the service and queue times, the
arrival rate based on our unit of work needs to be calculated. Here is the arrival rate math:

$$\lambda = \frac{breads}{time} = \frac{148{,}439\ \mathrm{breads}}{59.31\ \mathrm{m}} \times \frac{1\ \mathrm{m}}{60\ \mathrm{s}} \times \frac{1\ \mathrm{s}}{1000\ \mathrm{ms}} = \frac{148{,}439}{3{,}558{,}600} = 0.042\ \mathrm{breads/ms}$$
Based on Figure 9-14, the total service time is 2,091.82 seconds, or 2,091,820
milliseconds. Determine the service time by dividing the total service time by the unit of work
value. Here is the service time math:

$$S_t = \frac{S_{t:tot}}{breads_{tot}} = \frac{2{,}091{,}820\ \mathrm{ms}}{148{,}439\ \mathrm{breads}} = 14.09\ \mathrm{ms/bread}$$

Determine the queue time by dividing the total queue time by the unit of work value.
While the detailed wait event listing is not shown, even by looking at the top four wait events
shown in Figure 9-14, we can infer these account for 90% or more of all the non-idle wait
time. For the necessary level of precision, I simply added the wait times for the top four wait
events. Their combined wait time is 1,867 seconds during the sample interval. Here is the
queue time math:

$$Q_t = \frac{Q_{t:tot}}{breads_{tot}} = \frac{1{,}867\ \mathrm{s}}{148{,}439\ \mathrm{breads}} \times \frac{1000\ \mathrm{ms}}{1\ \mathrm{s}} = 12.58\ \mathrm{ms/bread}$$
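
Here are the same three calculations in Python, using this exercise's AWR numbers:

interval_min = 59.31
breads       = 148_439            # physical read IO requests
st_total_ms  = 2_091.82 * 1000    # DB CPU + background cpu time
qt_total_ms  = 1_867 * 1000       # top four wait events' time

lam = breads / (interval_min * 60 * 1000)  # read requests per ms
st  = st_total_ms / breads                 # ms per read request
qt  = qt_total_ms / breads                 # ms per read request
print(f"lambda = {lam:.3f} breads/ms, St = {st:.2f} ms, Qt = {qt:.2f} ms")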

If Possible, Compare Utilizations


Since this system is undergoing an IO bottleneck, computing the IO utilization will not add
much value and may actually cause more unimportant questions to be asked (creating
unnecessary distractions).

Create the Response-Time Graph


Creating the IO response-time graph is a little tricky because you never see response time
solved for the number of devices (M). Looking at the IO focused response-time equation
below, we know every variable except for M. We know and have calculated above the
response time’s core components, service time and queue time. Here is the core IO centric
response time formula:

$$R_{t:IO} = S_t + Q_t = \frac{S_t}{1 - \frac{S_t \lambda}{M}}$$

Solving the IO focused response time formula for M:9

$$M = \frac{S_t \lambda (S_t + Q_t)}{Q_t}$$
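
Here is the same algebra in Python, using the service time, queue time, and arrival rate calculated above. Keep in mind the derived M is an effective modeling value, not a physical device count:

st, qt, lam = 14.09, 12.58, 0.042   # ms, ms, read requests per ms
m = st * lam * (st + qt) / qt       # effective number of IO transaction servers
print(f"M = {m:.2f}")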

Based on this exercise’s AWR data, the various values gathered and derived were
entered into OraPub’s response-time graph template tool, resulting in the graph shown in
Figure 9-16.10 The tool is very simple to use and requires only the data presented in this
example.

Figure 9-16. Shown is the response-time graph for this exercise. It shows the service time,
queue time, response time, and the arrival rate as reported from the AWR report.

With the key queuing theory calculations performed and the response-time graph
created, we are nearly ready to move on to anticipating the performance improvement impact
of our various solutions. However, before we get to that topic, it is important to understand
the ways we can alter the users’ experience.

9. To check the math, go to http://wolframalpha.com and enter: s+q=(s/(1-(s l/m))), solve m
10. To create graphs highlighting your area of interest, it is often helpful to set the maximum values for the x axis and y axis. This can be done by right-clicking on the axis, clicking on the scale tab, and manually entering the maximum axis value.


How to Improve the Performance Situation


When it comes to improving performance, the bottom line is to get out of the elbow of the
curve. As I have mentioned, when presenting a response-time curve, even nontechnical
audiences quickly grasp that being in the elbow is “bad” and being out of the elbow is “good.”
Use this intuitiveness to demonstrate—even at a very high level—your performance-
improving strategies. This will build confidence in your solutions and also help more
effectively rank them.

Tuning: Reducing Requirements


Tuning Oracle, the application, or the operating system effectively reduces its requirements.
For example, instead of a SQL statement consuming 5 seconds of CPU, it now consumes only
2 seconds of CPU. Thinking about the basic utilization formula of requirements divided by
capacity, if requirements decrease and capacity remains the same, then the utilization must
decrease. The only way to increase the utilization once again is to increase the requirements.
One way to do this is to increase the workload; that is, the arrival rate. So, through tuning, we
have provided the basic performance-enhancing options of decreased response time, increased
throughput, or some combination of both.
From a queuing theory perspective, what really happens when service time drops is that
a new response-time curve takes effect. Because the service time decreases, with no load on
the system and therefore no queuing, the response time with minimal arrivals is less. So, the
curve has shifted down. But it gets better. Because each transaction server (for example, a
CPU core) can process each arrival quicker, it can process more arrivals per unit of time
before queuing sets in, which shifts the graph to the right. So, tuning shifts the response-time
curve down and also to the right!
Figure 9-17 graphically shows how tuning can affect a system. Starting at point A, the
performance is unacceptable and highly variable. By tuning the application, Oracle, or the
operating system, the response time decreases (that is, improves), and the system is operating
at point B. However, now the administrators have a choice. By controlling the workload (the
arrival rate), they can allow more work to flow through the system without affecting the response
time. Point C shows this negligible effect on response time by allowing the arrival rate to
increase. So again, tuning provides the performance analyst with several options: decreased
response time, increased workload, or a managed combination of both!


Figure 9-17. Shown is the response time effect of tuning. By tuning, a new response-time
curve takes effect (dotted line), and the response time drops from point A to point B. By
controlling the workload, performance can remain at point B, or, by allowing the workload to
increase to point C, the system can maintain both improved response time and an
increased workload.

Buying: Increasing Capacity


When additional or faster CPUs or IO devices are added to a system, we have effectively
increased capacity. For example, because the old CPUs were replaced with CPUs that are
twice as fast, instead of a SQL statement consuming 4 seconds of CPU, it now consumes only
2 seconds of CPU. Or perhaps six additional CPU cores were added. Thinking about the basic
utilization formula of requirements divided by capacity, if capacity increases and the
requirements remain the same, then the utilization must decrease. The only way to increase
the utilization is to increase the requirements. One way to do this is to increase the workload
(the arrival rate). So, by increasing capacity, we have provided the basic performance-
enhancing options of decreased response time, increased throughput, or some combination of
both.
From a queuing theory perspective, what really happens when capacity is added depends
on whether additional transaction processors (think more CPU cores) are implemented or the
transaction processors are faster (think faster CPUs)—or if we’re lucky, both. If the
transaction processors are faster, the service time decreases with the same effect as with
tuning. We can expect a new response time curve similar to the one shown in Figure 9-17 to
take effect. However, if we add transaction processors with no change to service time, the
response-time curve does not shift down. But because there are more transaction processors
available, as a whole, they can process more transactions per unit of time, which shifts the
curve to the right, allowing for an increase in the arrival rate before queuing sets in.
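
A quick Python sketch, reusing this chapter's low-precision CPU response-time approximation with hypothetical load numbers, shows the shift when transaction processors are added:

def rt_cpu(st, lam, m):
    # Low-precision CPU response-time approximation from this chapter.
    return st / (1.0 - (st * lam / m) ** m)

st, lam = 0.0129, 120.0   # hypothetical ms/lio and lio/ms
for cores in (2, 4):      # before and after doubling the core count
    print(f"{cores} cores: Rt = {rt_cpu(st, lam, cores):.4f} ms/lio")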
Figure 9-18 graphically shows how adding more transaction processors can affect a
system. If the bottleneck is IO, then the same general effect occurs when adding IO devices.
Starting at point A, the performance is unacceptable and highly variable. By implementing
additional transaction processors, the response time decreases (that is, improves), and the
system is operating at point B. However, now the administrators have a choice. By controlling
the workload (the arrival rate), they can allow more work to flow through the system without
affecting response time. Point C shows this negligible effect on response time by allowing the
arrival rate to increase. So, by adding more capacity (either more and/or faster transaction
processors), the performance analyst once again has several options: decreased response time,
increased workload, or a managed combination of both.

Figure 9-18. Shown is the response-time effect of increasing capacity by adding transaction
processors (for example, CPU cores). By adding CPU cores, a new response-time curve takes
effect (dotted line). The response time drops from point A to point B. By controlling the
workload, performance can remain at point B, or by allowing the workload to increase to
point C, the system can still maintain both improved response time and an increased
workload.

Balance: Managing Workload


Workload management can provide arguably the most elegant of all performance
improvements. And of all the missed performance-improving opportunities, I would say better
workload management has got to be near the top. While shifting workloads may not be a very
satisfying technical challenge (though it can be), when the workload is better managed, peak
workload and painful performance periods can be dramatically improved. And all this can
occur without tuning Oracle, the application, or the operating system, and without any capital
investment.
Suppose around time 13 in Figure 9-19 is when users are extremely upset. It’s not time
23, because the users are asleep and the batch jobs are running just fine. The performance
analyst must determine what is occurring—that is, the workload mix—during time 13 and
work with the user community to shift a segment of that workload, perhaps to time 15. While
this may seem unlikely, when confronted with a severe performance problem, a graphic
clearly showing the situation (for example, Figure 9-19), users can be surprisingly flexible.
But if they are told to change the way they work without understanding why, they will most
likely rebuff any attempt to alter the workload.


Figure 9-19. Shown is a workload graph, which appears to have ample workload-balancing
opportunities. By moving some of the workload during painful peak processing time to
nonpeak processing times, the workload requirements during peak times are effectively
reduced. This decreases response time, allows for increased workload of a specified type, or
some combination of both.

In some cases, the users may not even be aware of the workload shift. For example,
during a consulting engagement, I noticed a messaging process woke every 30 seconds to
check for messages to transfer and then performed the transfer. I discovered even with only a
few messages, there was a tremendous amount of overhead. I asked the application
administrator (not the end users) if the message process could wake up every 5 minutes just
during the peak processing times. To my surprise, he willingly embraced this rather elegant,
zero downtime, and zero capital investment performance-improving solution.
From a queuing theory perspective, when the workload is better balanced, the arrival
rate is reduced. Figure 9-20 graphically shows that when the arrival rate is decreased, we move
from point A to point B. When the arrival rate is decreased, system requirements decrease,
resulting in a utilization decrease, as well as a response time decrease. Unlike with the tune
and buy options, there is no response-time curve shift. What has shifted is the workload; that
is, the system has traveled along the response-time curve. This is usually difficult to initially
understand. But consider that when the workload has decreased, there is no change in the
service time, as it takes a transaction processor just as long to process a single transaction as
before. Therefore, the response-time curve does not shift down. The response-time curve does
not shift to the right because no additional capacity has been implemented. What changed is
the arrival rate, so we simply move along the existing response-time curve as the response
time decreases. As we move to the left, while service time remains the same, the queue time
will decrease, resulting in an improved response time.
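To make the movement along the curve concrete, here is a rough sketch using a common
multi-server response-time approximation. The 24 trx/ms and 18 trx/ms arrival rates come
from Figure 9-20; the 0.15 ms service time and four transaction processors are hypothetical
values chosen purely for illustration.

$R = \dfrac{S}{1 - (\lambda S / M)^{M}}$

With $S = 0.15$ ms and $M = 4$ held constant, reducing $\lambda$ from 24 trx/ms to
18 trx/ms drops the utilization $U = \lambda S / M$ from 90% to 67.5%, and the response
time from $0.15/(1-0.90^{4}) \approx 0.44$ ms to $0.15/(1-0.675^{4}) \approx 0.19$ ms.
The service time never changed; only the queue time did.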


Figure 9-20. Shown is the effect of workload balancing on response time. During peak
processing time, the response time is at point A. By shifting some of the workload to another
time, the arrival rate is reduced from 24 trx/ms to 18 trx/ms, resulting in a significant
response time reduction. Notice the response-time curve does not shift, but rather the system
activity has shifted.

Anticipating Solution Impact


Now, we finally answer the question posed at the beginning of the chapter: What kind of
performance improvement can we expect? I’ll start with some direction and words of caution.
Then I will move directly into a series of exercises to demonstrate step by step how you can
creatively apply all the material presented in this book to anticipate the impact of a solution
and improve the situation.

Simplification Is the Key to Understanding


With our performance-improving objective in mind, we will continue to responsibly simplify
and to use abstraction. Simplification is the key to understanding, and to communicating
technical concepts and information. We want to make our performance presentation
memorable. We want to motivate change.
By making the numbers and the concepts simple, your audience will quickly understand
you and be able to draw the same striking conclusions as you have. Purposely and deliberately
be imprecise, while not misleading or being incorrect. For example, unless it is absolutely
necessary, do not show a number like 3.14156; instead, show 3.
If you are successful, your audience will come away with the same conviction you have
in implementing the solutions. In addition to this, the decision makers will have useful
information they understand, allowing them to determine what solutions should be
implemented, in what order and when, and by which group.
In summary, to be understood, simplify.


A Word of Caution
As we move more fully into anticipating change, which is a gentler term for forecasting,
predictive analysis, and capacity planning, be very cautious. The concepts and techniques I
have presented in this book so far, and what remains, are not meant for deep predictive
analysis. As I continue to state, our objective is to anticipate the impact of solutions. Use
general and imprecise words and numbers to convey change, movement, and direction.
The main reason the forecasting techniques presented in this book are not up to
predictive analytics snuff is that they are not validated and may be based on a single
sample. The math is fine, but I have purposely not brought you through the steps to create a
robust forecast model and then to validate the model so you can understand its precision and
usefulness. For our objectives, which are very general and imprecise, the increased precision
and complexity are not necessary. If you desire more precision, a number of technical papers,
books, and some training opportunities are available.11

Full Performance Analysis: Cycle 1


Here’s the situation: Users are angry, very angry. The Oracle Database 10g Release 2 online
system residing on a single four-CPU core Linux server is taking much too long when simply
querying for basic information, like a customer. But it’s not just one, two, or three queries—
it’s the entire system. You have been assigned to diagnose the system, recommend solutions,
and clearly explain the solutions, including reasonable expectations for their impact on
performance.
The worksheets shown in the figures in the remainder of this chapter are all contained
within a single Microsoft Excel workbook, the firefighting diagnostic template workbook.12
Only the yellow cells require input. If you enter information into a cell that isn’t shaded
yellow, you are typing over a formula! Data entered in one of the worksheets may be
referenced in another worksheet. For example, the Oracle CPU consumption shown in the
Oracle-focused worksheet (Figure 9-21) is referenced in the operating system CPU analysis
worksheet (Figure 9-22). All the data-entry fields have been previously discussed. If you need
more information about their source, please review the appropriate section in this book.

Oracle Analysis
To summarize, the Oracle subsystem is being forced to ask for blocks outside its cache. While
the operating system returns these blocks extremely fast, the number of requests results in a
significant portion of the total response time. From a purely Oracle perspective, we can easily
reduce the queue time by 20% by simply increasing Oracle’s buffer cache.
Figure 9-21 provides the core Oracle diagnostic information collected over a 30-minute
interval in a response-time analysis format. At this point in the book, you should know the
service time CPU information came from v$sys_time_model and the queue time
information came from v$system_event. All wait events that consumed more than 5% of
the wait time during the reporting interval are included in this analysis and shown in Figure 9-
21. Clearly, the top wait event is db file scattered read, yet the average wait time
is only 0.093 ms! So, it’s the classic situation where the requested blocks are not in Oracle’s

buffer cache, but the operating system retrieves them very quickly. If the system were truly
bottlenecked, we would expect to find a raging CPU bottleneck, and we don't. Instead, the
sheer number of buffers Oracle must process, combined with the CPU speed, is resulting in
unacceptable performance.

11. These are listed on the OraPub web site. To find them, review OraPub's training schedule and/or do a search for "forecast."
12. This tool can be downloaded for free from OraPub's web site. Just do an OraPub search for "firefighting template."

Figure 9-21. Shown is the ORTA information entered into a firefighting diagnostic template,
which makes diagnosing, analyzing, and anticipating change impact much simpler. Clearly,
db file scattered read events are the issue. While the CPU subsystem capacity is
not shown, Oracle is consuming only 26% of the available CPU resources.
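If you want to gather these inputs yourself rather than pull them from a report, a minimal
sketch follows. Both views are cumulative since instance startup, so two snapshots taken 30
minutes apart must be differenced to reproduce the interval figures shown in Figure 9-21.

select stat_name, value/1000000 as cpu_secs
from   v$sys_time_model
where  stat_name in ('DB CPU', 'background cpu time');

select event, total_waits, time_waited_micro/1000000 as wait_secs
from   v$system_event
where  wait_class <> 'Idle'
order  by time_waited_micro desc;

The first query supplies the service time (CPU) component, and the second supplies the
queue time (wait event) component of the response-time analysis.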

Calculating Averages
I’ve included one new piece of information in Figure 9-21. Notice that column K is the
average type (Avg Type). Two different types of average calculations are shown in Figure 9-
21: straight and weighted.
While the straight average for the IO read wait times is not shown in the figure, it is
calculated as follows:

$\text{Avg} = \dfrac{0.093 + 0.107 + 0.010}{3} = \dfrac{0.210}{3} = 0.070$
However, the scattered read wait time of 0.093 ms occurs much more frequently than the
other waits. A more accurate average calculation would take into account the occurrences of
the scattered read waits in addition to its average event wait time. This is called a weighted
average, and it is a much better average calculation when working with diverse and highly
variant data sets, as we have here.
The average calculation, weighted by the total wait time (which reflects the wait
occurrences), is shown in Figure 9-21. It is calculated as follows:

$WA = \dfrac{(754.650 \times 0.093) + (5.410 \times 0.107) + (4.520 \times 0.010)}{754.650 + 5.410 + 4.520} = 0.093$

If you think about it, the weighted concept makes sense. Because the scattered read
waits happen so much more often, the average IO read time should reflect this and be pulled
toward the scattered read wait times. As Figure 9-21 shows, the weighted average value
actually rounds to the average scattered read wait time of 0.093. While the difference may
seem insignificant, not only can this have a dramatic effect when anticipating the impact of a
performance solution, but it also makes the averages more realistic.
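A closely related count-weighted average (total time waited divided by total waits) can be
pulled directly from v$system_event. This is a sketch: it assumes the three read events in
Figure 9-21 are db file scattered read, db file sequential read, and
direct path read, and the values are cumulative since startup rather than interval deltas.

select sum(time_waited_micro) / sum(total_waits) / 1000 as wtd_avg_ms
from   v$system_event
where  event in ('db file scattered read',
                 'db file sequential read',
                 'direct path read');

Like the time-weighted calculation above, this average is pulled strongly toward the very
frequent scattered read waits.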
Reducing Queue Time
Our Oracle-focused solutions will concentrate on service time, queue time, or both. One
solution to reduce the scattered read waits is to increase Oracle’s buffer cache. There is plenty
of memory, and (as shown shortly) there is also plenty of CPU available to handle the
possible increase in cache management resources. Based on the size of the tables involved, a
1GB buffer cache should be able to cache the entire customer table.
Because the total queue time accounts for nearly 30% (28.9%) of the total response time,
if queue time is eliminated, total response time could improve by as much as 30%. But there
will likely be some remaining queue time, so to be conservative, let's say we anticipate a 20%
decrease in queue time.
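One way to implement the increase, assuming manual SGA memory management (with
automatic shared memory management, raising sga_target accomplishes much the same
thing):

-- Stage the new cache size, then cycle the instance; this is the
-- single parameter adjustment and instance cycle referred to later
-- in this analysis.
alter system set db_cache_size = 1G scope=spfile;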
Reducing Service Time
Total service time accounts for nearly 70% of the sample interval’s total response time.
Clearly, there is an opportunity here for improvement. How we reduce the service time may not
be so easy in practice. While there are possibilities to reduce service time from both an
operating system and an application perspective, from an Oracle perspective, a
straightforward tweak is not apparent. This is not a problem because of the potentially
massive performance improvement achieved by increasing the buffer cache to reduce the total
queue time and also by tuning key CPU-intensive SQL statements (as explained in the
discussion of the next analysis cycles).

Operating System Analysis


To summarize, the operating system is not experiencing a shortage of capacity in the classic
sense. The Oracle system is predominantly consuming CPU resources, yet due to Oracle and
operating system scalability limitations, the operating system CPU is only 28% utilized. As a
result, an Oracle server process is primarily bound by CPU speed, which translates into
service time.
Figure 9-22 provides the metrics for our operating system investigation. While not
shown, neither the network nor the memory subsystem is an issue.


Figure 9-22. Shown is the operating system analysis information entered into a firefighting
diagnostic template. There is no CPU or IO bottleneck. Oracle is consuming 26.1% of the
CPU resources, and based on vmstat observations and v$osstat data, CPU utilization is
around 28%. The IO subsystem is responding to IO requests in less than 1 ms! All IO data
was gathered from the Oracle v$sysstat performance view.

CPU Subsystem
The CPU subsystem consists of four CPU cores. Based on both vmstat observations
and v$osstat view data, on average, the CPUs are about 28% utilized. Over the 30-minute
data collection interval, the CPU subsystem has the capacity to supply 7,203 seconds of CPU.
Based on the Oracle service time analysis, Figure 9-21 shows Oracle consumed about 1,880
seconds of CPU, meaning Oracle consumed about 26% of the available CPU resources.
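The arithmetic behind these figures is simple and worth showing (the 7,203-second figure
reported in the worksheet differs slightly from the nominal 7,200 seconds, presumably
because the measured interval was slightly longer than exactly 30 minutes):

$C = 4\ \text{cores} \times 30\ \text{min} \times 60\ \tfrac{s}{\text{min}} = 7{,}200\ s \qquad U_{Oracle} = \dfrac{1{,}880\ s}{7{,}203\ s} \approx 0.26 = 26\%$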
From a CPU subsystem perspective, it is not possible to increase scalability by somehow
splitting a single Oracle server process's activity onto multiple CPU cores. Our only hardware
option for decreasing total service time is to use faster CPUs. Because of cost and budgetary
timing issues, we do not want to entertain this option unless absolutely necessary. So at this
point, we will not seek to improve performance by increasing the CPU subsystem capacity.
IO Subsystem
Based solely on the v$sysstat performance view, the IO subsystem is receiving read
requests at nearly 530 MB/s. Oracle read requests (db file scattered read) are
being satisfied in less than 1 ms, which indicates the Oracle blocks reside in the operating
system buffer cache! While not shown in Figure 9-22, the average IO device utilization is
around 2%, meaning the devices are essentially idle.


The Oracle and application tuning strategies are intended to reduce the number of IO
read calls, making an increase in IO activity and subsequent IO performance issues highly
unlikely.
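The v$sysstat numbers behind this IO analysis can be sampled with a sketch like the one
below; take the snapshot twice, 30 minutes apart, and divide the byte and request deltas by
the interval's elapsed seconds to arrive at rates such as the 530 MB/s read figure.

select name, value
from   v$sysstat
where  name in ('physical read total bytes',
                'physical write total bytes',
                'physical read total IO requests',
                'physical write total IO requests');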

Application Subsystem
To summarize, by reducing both physical and logical block activity, performance can be
significantly improved. This means SQL tuning and/or reducing SQL statement execution
rates. Figure 9-23 shows there are three high-consuming physical IO SQL statements, with the
top statement consuming more than twice as much physical IO as the second and third ones
combined! Figure 9-24 shows the system is processing nearly 70 logical IOs each
millisecond.

Figure 9-23. Shown is the essential application SQL information entered (obviously copied
and pasted) into a template. All the information was gathered during the 30-minute collection
interval from v$sql and represents only what was processed during the collection interval.
Notice the most resource-consuming statement is not the slowest and consumes no more
resources per execution than other statements. It’s the combination of execution rate and per-
execution resource consumption that makes it stand out.

The Oracle analysis has directed us to the most important application SQL, which is
SQL needing blocks that do not currently reside in Oracle’s buffer cache. By focusing on
SQL with the highest physical IO consumption, we can significantly reduce the application
impact. There is no guessing or gut feeling about this. It is a fact. However, we expect the
Oracle-focused solution of increasing the buffer cache to have a profound impact, and the
change requires only a single parameter adjustment and an instance cycle. We will want to
reanalyze the situation during the second analysis cycle. So at this point in the analysis, we
will wait before suggesting any application changes.
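For reference, a minimal sketch of the kind of query behind Figure 9-23 is shown below.
Keep in mind that v$sql values are cumulative since each cursor was loaded, so the
diagnostic template works with interval deltas rather than raw values like these.

select *
from ( select sql_id, disk_reads, buffer_gets, executions,
              elapsed_time/1000000 as elapsed_secs
       from   v$sql
       order  by disk_reads desc )
where  rownum <= 10;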
Figure 9-24 shows common workload metrics we will combine with our Oracle and
operating system analysis when building our response-time graphs and anticipating change.
Figure 9-24 also provides two distinct informational aspects: the workload metrics in both
seconds and milliseconds, and response-time-classified details. It provides these details by
calculating the appropriate resource consumed (for example, CPU consumption) divided by
the workload metric activity during the reporting interval. For example, each logical block
processed consumed 0.01507 ms—that is, 0.01507 ms/lio. This is the logical IO service time
and can be useful when constructing a response-time curve based on logical IO activity.
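The 0.01507 ms/lio figure is simply the interval's CPU consumption divided by its logical IO
activity:

$St_{lio} = \dfrac{1{,}880{,}000\ ms\ \text{CPU}}{69.29\ \tfrac{lio}{ms} \times 1{,}800{,}000\ ms} = \dfrac{1{,}880{,}000}{124{,}722{,}000} \approx 0.01507\ \tfrac{ms}{lio}$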


Figure 9-24. Shown is the workload diagnostic information. Notice only the total interval
workload values and the interval (sample) time require entry. The workload information,
combined with the ORTA, provides a plethora of diagnostic data we will use when
anticipating performance solution impact.

Response-Time Graphs
Our ORTA shows response time can be significantly reduced by focusing on both physical
block IO (queue time) and logical block IO (service time). To more clearly convey the
situation and help others come to the same performance-enhancing conclusions as we have,
we will create two response-time graphs: one focused on the CPU subsystem and the other on
the IO subsystem.

Figure 9-25. Shown is a response-time graph created using OraPub’s response-time graph
template based solely on data shown in this example’s related figures. This response-time
graph focuses on the CPU subsystem, so we chose logical IO as our workload metric. As
expected, the system is not operating in the elbow of the curve. Since there is virtually no
queue time per logical IO processed, improving performance will be the result of decreasing
service time by influencing the optimizer to choose a better execution plan, tuning Oracle
to be more efficient, or using faster CPUs.

Figure 9-25 shows the response-time graph based on logical IO processing during our
reporting interval. The response-time graphs for this example were created as described


previously in this chapter, using OraPub’s response-time graph template. The logical IO
workload metric was chosen, since it typically has a high correlation to CPU consumption.
Because there is virtually no queue time related to processing a logical IO, to reduce the
LIO response time, the service time will need to be decreased. The trick to reducing service
time is to figure out a way for each logical IO to consume less CPU. There are many ways to
do this: using faster CPUs, tuning Oracle to be more efficient, or influencing the optimizer to
choose a more efficient execution plan. During the second analysis cycle, we will focus on
this tuning approach.

Figure 9-26. This response-time graph focuses on the IO subsystem, so we chose physical IO
as our workload metric. IO subsystems nearly always exhibit some queue time, and this
situation is no different. Physical IO requests do include a significant amount of queue time,
so we have multiple ways to reduce the physical IO-related response time. However, on this
system, physical IOs are satisfied so quickly that the best course of action is to simply
eliminate them by increasing the buffer cache.

Figure 9-26 shows the response-time graph based on physical IO processing during our
reporting interval. The physical IO workload metric was chosen because it typically has a
high correlation to IO requests and directly relates to our application analysis.
As expected, there is significant queue time involved with our IO requests. As presented
previously, there are multiple ways to reduce the queue time and also the service time. One of
our performance-improving strategies is to virtually eliminate all physical IO requests,
essentially changing the arrival rate to zero. While the service time theoretically will not
change, because the number of physical IO requests will be drastically reduced, Oracle will
not need to spend so much time placing blocks into the buffer cache. This effectively reduces
the CPU time spent per logical IO processed, resulting in a reduction in service time. It will be
interesting to see what actually occurs!

What-If Analysis
Now let’s combine our recommendations with response-time mathematics to anticipate
change. The second analysis cycle will show the actual effect of our changes!


To summarize this exercise’s performance situation, the online users are experiencing
poor performance due to Oracle being required to retrieve blocks from the IO subsystem and
then process them. There is plenty of CPU, IO, and memory capacity. It just needs to be
shifted to maximize performance. The planned shifts are to increase Oracle’s buffer cache to
virtually eliminate all physical IO requests and to tune the most CPU-consuming SQL
statement. Both changes will have a dramatic performance improvement impact.
It is always best and more reliable to focus on one change at a time. As anyone working
in IT has experienced, multiple simultaneous changes can have unanticipated effects. We
need to know the impact of each change. Therefore, only one change will be implemented at a
time.
Because it’s the easiest to implement and should result in a significant performance
improvement, we will increase the buffer cache first. Increasing the buffer cache to 1GB will
effectively cache the customer table, resulting in virtually no physical IO requests. Figure
9-26 shows that if the physical IO arrival rate drops to nearly zero, the resulting response time
will not be significantly reduced. So our goal is, as best we can, to eliminate the actual
physical IO requests. Figure 9-21 shows physical IO accounts for nearly 30% of the total
Oracle response time. By eliminating nearly all the physical IO, assuming the users don't
perform more work and we don't hit some locking type of performance issue (for example,
row-level locking), physical IO-intensive process performance will improve by 30%.
Users who unknowingly run multiple queries at the touch of a button, waiting (and getting
upset) for the application to return control to them, are highly likely to feel the effect of
the Oracle in-memory queries! Figure 9-24 shows that both logical and physical IO response
times are around 0.022 ms. In Figure 9-23, notice that all of the top ten statements have nearly
the same amount of physical IO and logical IO activity. This means that by eliminating the
physical IO—that is, ensuring the tables are completely cached—we would expect the elapsed
time to drop by half. Figure 9-23 shows the top SQL statement average elapsed time is 0.632
sec/exec. By increasing the Oracle buffer cache, average elapsed time is likely to drop to
perhaps 0.316 sec/exec (0.632/2).
With an increase in performance (decrease in response time), users may perform work
more rapidly, increasing the workload. The more time users are sitting and waiting for the
application to return control to them, the more of a workload increase we can expect to see. It
is possible a significant increase in the workload could offset the gain in response time. But
either way, the users win. If they don’t increase the workload, online performance increases.
If they do significantly increase the workload, they will get more work done!
When making a statement like this, someone is likely to ask how much more throughput
can be expected. That’s when it’s time to once again show the response-time graph in Figure
9-25, which illustrates the current situation from a pure logical IO (CPU) perspective. If the
service time does not decrease (this is detailed in the second analysis cycle), then it appears
we can nearly double the workload before response time significantly increases. Unless you
have a way to control the user workload or understand the application very well, there may be
no way of knowing if the users can or will increase the workload. But regardless, you have
graphically and with simplified numbers demonstrated the performance impact of increasing
the buffer cache.
To see what actually occurred when the buffer cache was increased to 1GB, read on!


Full Performance Analysis: Cycle 2


As described in the previous section, an Oracle buffer cache increase was chosen as the first
performance-enhancing change. We are anticipating around a 30% decrease in total Oracle
response time, and for users who run multiple serial queries at a single touch to feel around a
50% decrease in response time. We are not sure if users will be able to take advantage of
faster response time and get more work done, but it won’t surprise us if they can.

Oracle Analysis
As Figure 9-27 shows and as we expected, physical IO has been virtually eliminated. The
total service time has also decreased. We were hoping for a 30% drop in total response time.
But when comparing the Oracle analysis shown in Figure 9-21 to Figure 9-27, we can see the
total response time decreased from 2,644 seconds to 1,257 seconds, which is a 52%
improvement! (A direct comparison was possible because the collection interval was the
same: 30 minutes.) The large drop in service time is due to less cache management related to
placing blocks into the buffer cache. The decrease in Oracle’s CPU consumption should result
in a drop in CPU utilization. Any further performance gain should now focus on reducing
CPU consumption, which is squarely focused on heavy logical IO SQL.

Figure 9-27. Shown is the 30-minute interval ORTA as a result of the buffer cache increase.
Comparing this to Figure 9-21, as expected, total queue time has been virtually eliminated.
Total service time has also decreased due to less buffer cache management related to placing
buffers into the cache.

Operating System Analysis


Figure 9-28 shows the operating system is looking even better than before! The CPU
utilization dropped from 28% to 20%, and the IO subsystem is receiving less than 1 MB/s in
reads and writes from Oracle. So, it appears that increasing the Oracle buffer cache has had
a very positive effect on resource consumption.


Figure 9-28. Shown is the operating system analysis information. Compared to Figure 9-22,
Oracle CPU consumption dropped to 17%, and the operating system CPU utilization dropped
from 28% to 20%. Since the utilization significantly dropped, we should not expect a large
increase in the workload.

At this point, the only way to decrease CPU-related response time is to either use faster
CPUs or reduce the SQL statement logical IO consumption (tune or balance). While
additional CPUs may provide more CPU capacity, Oracle and the operating system are not
able to fully take advantage of the existing four cores (for details, see the scalability
discussion near the end of this chapter).

Application Analysis
The application situation has indeed changed, as shown in Figure 9-29. First, we can see that
no significant physical IO is being consumed! This means increasing the buffer cache had its
intended effect. We were hoping for a 50% decrease in elapsed time, to around 0.316
sec/exec. What actually occurred was an elapsed time drop from 0.632 to 0.266 sec/exec, which is a
58% decrease in response time! So, we met and exceeded our objective. It appears the users
are also able to get more work done because the SQL statement execution rate increased from
25.6 exec/sec (see Figure 9-24) to 27.1 exec/sec (Figure 9-30).


Figure 9-29. Shown is the essential application SQL information. Notice there is no physical
IO consumed. Compared to Figure 9-23, the top SQL statement’s elapsed time per execution
improved from 0.632 sec/exec to 0.266 sec/exec, while at the same time, the number of
executions during the sample interval increased from 473 to 536.

Figure 9-30 shows logical IO response time decreased to 0.009354 ms/lio from 0.02199
ms/lio (Figure 9-24). Clearly, there was a significant service time change. This means initially
Oracle was burning CPU cycles on other tasks besides accessing buffers that already resided
in the cache. This is another example of the overhead involved in bringing buffers into
Oracle’s cache and updating all the related memory structures. As a result of the service time
drop, the response-time curve will shift down and to the right, as shown generally in Figure 9-
17 and especially in Figure 9-31. This explains why SQL statement elapsed time decreased
and utilization decreased, while the workload increased.

Figure 9-30. Shown is the workload diagnostic information. Compared to Figure 9-24,
logical IO response time dropped from 0.02199 ms/lio down to a staggering 0.00935 ms/lio.
In addition, the overall logical IO workload increased from 69.29 lio/ms to 74.66 lio/ms,
representing an 8% increase. So again, performance has improved while the workload has
also increased.

Figure 9-31 shows the initial and current (buffer cache increase) response-time curves
using logical IO as the workload metric. The variables used to create the response-time curve
are four CPU cores (M=4); service times (St) of 0.01507 ms/lio and 0.009326 ms/lio for the
initial and increased buffer cache situation, respectively; and their various arrival rates of 69.3
lio/ms and 74.7 lio/ms, as indicated on the graph as points A and B, respectively. Because
Oracle now consumes less CPU per logical IO, the service time for logical IO decreased. As
shown graphically in Figure 9-31, the performance situation changed from point A to point B,


allowing both improved SQL statement elapsed time combined with an increase in workload
and a reduction in CPU utilization.

Figure 9-31. Shown is the response-time curve shift as a result of the logical IO service time
decrease (improvement). Not only does this increase performance with no workload change
(69 lio/ms), but in the current situation (point B), the response time remains improved along
with an 8% workload increase (74 lio/ms).

What-If Analysis
Now let’s suppose the users would like even more of a performance improvement. Based on
the OraPub 3-circle analysis, one obvious place to squeeze more performance out of the
system is a reduction in logical IO, which will in turn reduce total CPU consumption.
Figure 9-31 shows that to reduce LIO
response time, the LIO service time must be reduced. This means we must reduce the CPU
consumption per LIO. The most direct way to accomplish this is to influence the optimizer to
choose a more efficient execution plan, thereby reducing the CPU consumed per LIO.
Typically we also receive the added benefit of reducing the number of total LIOs processed.
As shown in Figure 9-29, the statement with a SQL_ID ending in d6w consumed nearly
22.7 million logical IOs during the 30-minute reporting period, which is about 42.3 thousand
logical IOs during each of its 536 executions. By tracing the d6w SQL statement, it was
confirmed that a typical execution touches around 42.3 thousand logical buffers. It was also
obvious that the large customer table was being full-table scanned! By simply creating an
index on the status column and rerunning the query, only three logical buffers were
touched. (While indexing a status column usually will not produce an improvement like
this, in this application, it was indeed the case.) This means even if the statement is run 536
times, only 1,608 buffers will be touched. And since each logical IO consumes around
0.00933 milliseconds of CPU (LIO service time), during the 30-minute interval, the statement
should consume only 15.003 milliseconds of CPU (1,608 × 0.00933). Keep in mind the 0.00933
ms figure is the average CPU consumption per LIO over the entire sample interval.
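The per-execution logical IO figure is easy to confirm with a quick sketch against v$sql
(the values are cumulative since the cursor was loaded, so they match the interval numbers
only if the cursor was loaded near the start of the interval):

select sql_id, buffer_gets, executions,
       round(buffer_gets/executions) as lio_per_exec
from   v$sql
where  sql_id = 'fg8cnnjrf2d6w'
and    executions > 0;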


But the impact is more far-reaching, because creating an index on the status column
also impacts three other statements out of the top five logical IO statements. The other three
statements also touch only three logical IOs per execution. Of the five top logical IO
statements, only the statement with the SQL_ID ending in ggt is not improved by the new
index. As you’ll see later, the lack of a thorough index impact analysis will have unintended
consequences.
Table 9-3 details one way to calculate the CPU consumption change for multiple SQL
statements. By creating the status column index, each of the queries will consume only
three logical IOs per execution. Based on their number of executions during the 30-minute
sample interval, the expected logical IOs are calculated. Since each logical IO consumes
around 0.00933 ms of CPU, the expected total CPU consumption per tuned SQL statement is
calculated. When combined, the tuned statements will now consume only 0.0231 second of
CPU, compared to the initial 230.553 seconds.

Table 9-3. Determining CPU savings when using a status column index

SQL_ID          Current Total   Total   Expected     Expected Total
                CPU (ms)        Execs   Total LIOs   CPU (ms)
fg8cnnjrf2d6w   142,797         536     1,608        15.0
cyfcvf5k75npm   65,528          243     729          6.8
ggaj1gzcj3gxp   10,872          23      69           0.6
680t2uhr9tqqb   11,356          24      72           0.7
Total           230,553                              23.1

While the improvement seems dramatic, only when users trigger multiple, serially
executing SQL statements are they likely to feel any difference. Additionally, total sample
interval Oracle service time is 1,254 seconds (Figure 9-27), so a decrease of around 230
seconds may not result in much of a utilization improvement. But let’s do the math, create the
index, and see what happens.
Subtracting the CPU consumption from the statements shown in Table 9-3 (230,553
ms), and then adding back their tuned CPU consumption of 23.1 milliseconds, the expected
Oracle CPU consumption becomes 1,026,948.1 milliseconds (1,257,478 – 230,553 + 23.1),
which is 1,026.948 seconds. Placing the expected CPU consumption into the standard
utilization formula, we see the expected utilization is about 14%.

$U = \dfrac{R}{C} = \dfrac{1{,}026.948\ s}{4\ \text{cores} \times 30\ \text{min}} \times \dfrac{1\ \text{min}}{60\ s} = 0.143 = 14\%$
As shown in Figure 9-28, Oracle consumed 17% of the available CPU. By creating the
status index, we expect Oracle to consume about 14% of the available CPU. So what
seemed like a massive improvement that would certainly change the performance situation, is
actually expected to result in only a 3% CPU utilization savings. To see what actually
occurred when the index was added, read on!


Full Performance Analysis: Cycle 3


As a result of the first analysis cycle, it was decided to increase the Oracle buffer cache. The
result of that performance-enhancing change was reflected in the second analysis. In the
second analysis, we decided to further increase performance by creating an index on the
customer table column status. We are anticipating around a 3% CPU utilization
decrease and additional room for workload growth. While the top logical IO SQL statements
should have their elapsed times decreased to about zero, only users executing multiple serial
SQL statements are likely to feel any difference. As before, we are not sure how the users will
affect the workload. But as the dotted response-time curve in Figure 9-31 shows, even
doubling the workload should not significantly degrade response time. Here is what actually
happened.

Figure 9-32. Shown is the 30-minute interval ORTA as a result of an increase in the buffer
cache and adding the status column index. Compared to Figure 9-27, as expected, total
queue time is about the same and is insignificant. Far surpassing our expectations, total
service time decreased from 1,254 to 344 seconds! Clearly, the status column index
touched far more SQL statements than our top SQL report showed.

Oracle Analysis
Figure 9-32 shows a rather dramatic decrease in CPU consumption over the 30-minute sample
interval. When adding the status column index, we expected the total Oracle CPU
consumption to drop to around 1,027 seconds. However, it dropped to 343 seconds! So,
obviously the index had a much broader (and positive) impact than we anticipated. Based on
the ORTA, further performance improvements should once again focus on reducing CPU
consumption.


Operating System Analysis


Figure 9-33 shows the operating system is looking even better than before! The CPU
utilization dropped from 20% to 7%, and the IO subsystem is receiving virtually no IO
requests from Oracle. The status index creation reduced Oracle CPU consumption far
more than our anticipated 3%. The index impact was so prolific that it resulted in a 13% CPU
consumption reduction. As with the prior tuning cycle, from an operating system perspective,
using faster CPUs will decrease CPU-related response time.

Figure 9-33. Shown is the operating system analysis information. Because the Oracle load is
almost entirely CPU-based, targeting heavy logical IO SQL statements by creating the
status column index reduced CPU utilization to 7%. Oracle is submitting virtually no IO
requests to the operating system.

Application Analysis
The application situation has profoundly changed. Oracle is now processing fewer logical IOs
while at the same time executing more SQL statements. This means users are getting more
work done but consuming fewer resources! The addition of the status column index had a
much larger and positive impact than we anticipated. Clearly, there were other SQL
statements that benefited from the index creation. Over the 30-minute interval, we anticipated
Oracle CPU consumption would decrease from 1,257 seconds down to 1,027 seconds, but in
reality, the consumption decreased to a staggering 344 seconds.
Comparing the top SQL statements in Figure 9-29 (before index addition) with Figure 9-
34, notice there is now a new top logical IO-consuming SQL statement along with the
previous number three statement (SQL_ID ending in ggt). If performance is to be further
improved, we have once again clearly identified (and supported by an OraPub 3-circle
ORTA-focused approach) the next two SQL statements to address.


Figure 9-34. Shown is the essential application SQL information. Because of the index
addition, the targeted high logical IO SQL statements no longer appear on the top SQL
report! In addition, the top logical IO SQL statements now consume 8.4M and 4.1M logical
IOs, compared to the earlier case where the top two statements consumed 22.6M and 10.3M
logical IOs, respectively (Figure 9-29). The status column index has had a profound
impact on the most resource-consuming SQL.

It is also very encouraging that as a result of the additional index, the top two logical IO
statements consumed a combined 12.5M logical IOs (Figure 9-34), whereas before the index
addition, the top two consumed 32.9M logical IOs (Figure 9-29). So, by aligning our ORTA
with the application analysis, we correctly targeted the high-impact SQL statements.
The beauty of this is the drop in logical IO consumption occurred in conjunction with an
increase in the number of SQL statement executions. For example, Figure 9-30 shows Oracle
processed 134M logical IOs and 48.7K SQL statement executions. But with the additional
index, Figure 9-35 shows Oracle processed only 37M logical IOs while executing over 53.1K
SQL statements! And this all occurred with a reduced CPU utilization.

Figure 9-35. Shown is the workload diagnostic information. Compared to Figure 9-30,
logical IO response time was maintained, and Oracle processed fewer logical blocks (down
from 74.7 lio/ms to 20.5 lio/ms) while increasing the SQL execution rate from 27.1 exec/sec
up to 29.5 exec/sec!

Full Performance Analysis: Summary


If you have been reading this book sequentially, my hope is you easily followed the preceding
performance analysis cycles. In a way, it is a broad review of the key aspects of this book.
Each cycle involved conducting an OraPub 3-circle analysis, understanding Oracle internals,
performing an ORTA, and anticipating the solution’s impact. The following are the key
points:


• Spot-on diagnosis resulted in very specific and targeted changes. There should be
no question about how we arrived at our recommendations.
• Each recommendation was accompanied by an anticipated impact shown both
graphically and numerically. Even without doing any predictive mathematics, the
anticipated performance situation change was clearly evident when presenting the
response-time curves, further building consensus around the recommendations.
• The systematic analysis naturally created an easy-to-follow and convincing story
containing plenty of abstraction for the nontechnical folks, as well as specific Oracle
internals and even some mathematics to satisfy the more technically astute.
• There was no finger-pointing. Each subsystem was investigated and possible
performance-enhancing changes discussed. The implemented recommendations were
objectively selected based on the anticipated impact and ease of implementation.
However, there was no discussion about application usage disruption, uptime
requirements, and availability requirements. In real life, these issues almost always
take precedence over performance.
Table 9-4 shows a rather dramatic and satisfying flow of performance-improvement
metrics. There are a couple of items worth highlighting. First, while the physical and logical
IO workload dropped, the number of SQL statement executions increased. Second, while not
shown in this table, the key SQL statements showed a continual elapsed time improvement.
The reduction in SQL statement resource consumption occurred in conjunction with a
decrease in CPU utilization and total Oracle response time. This is exactly the kind of result
we want to see.

Table 9-4. Full performance analysis key metric change

Cycle                   PIO/ms   LIO/ms   Exec/sec   Oracle ST   Oracle QT   CPU Util.   IOPS R+W
                                                     (sec)       (sec)
Baseline                67.78    69.29    25.64      1880        765         28%         4762.9
Buffer cache increase   0.00     74.66    27.06      1254        4           20%         0.7
Index addition          0.00     20.52    29.50      344         4           7%          0.3

While Table 9-4 is a numeric representation of our analysis flow (and success), Figure 9-
36 is a graphical representation. Based on logical IOs, Figure 9-36 shows the initial and final
response-time curves and the respective arrival rates. Technical and nontechnical people alike
should be able to easily grasp that the situation is much better now at point B than when we
started at point A. Adding that there is now more room for growth and that the users are also
performing more real work (SQL statement execution) will add a final punch to our
presentation.


Figure 9-36. Shown is a logical IO-focused response-time curve highlighting and contrasting
the initial performance situation (point A) to the final performance situation (point B). This
response-time curve indicates a very successful performance effort because fewer resources
are required for a single logical IO (service time decreased), users are putting less of a load
on the system (even though, while not shown, their work productivity has increased), and the
database server's CPU subsystem can now accommodate much larger future growth.

Improper Index Impact Analysis Performed


When adding the status column index, we anticipated only a 3% decrease in CPU
utilization, but in reality, there was a 13% drop! Always try to be conservative, but in this
case, the anticipated performance impact was simply wrong. We got lucky because many
other SQL statements were impacted (for the better) in addition to the four we targeted.
Because we did not analyze all possible affected SQL statements, there could have easily been
other statements negatively impacted, eliminating any performance gain achieved from our
targeted efforts.
I could have simply left the index addition section out of the book, but I included it for
two reasons. First, to provide another example of how performance change can be anticipated.
Second, so you can observe how easy it is to be wrong by not thinking through a change.

Proper Use of Work and Time


Each cycle of this performance analysis used data from a 30-minute sample. While different
sample durations could have been used, by using the same sample interval, direct numeric
comparisons without a unit of time are possible. For example, I mentioned when the status
column index was added, the number of SQL executions increased from 48.7K to 53.1K. If
the first sample interval was 30 minutes yet the second interval was 60 minutes, we could not
have responsibly made this direct comparison. Instead of stating there was a total of 53.1K
statement executions, we could have stated the statement execution rate was 29.5 exec/sec.
Notice in Table 9-4 that a unit of time is included for all work-related metrics. By
providing a unit of work and time, direct comparisons with the past or with other systems can

be made, regardless of the sample interval. It’s OK to use interval totals without reference to
time, as long as the sample intervals are the same.

Batch Process-Focused Performance Analysis


Many Oracle systems contain a mix of online and batch processing, or perhaps more online-
focused during the day and batch-focused during the night. Obviously, improving batch
processing performance is just as important as improving online processing.
There is a significant difference or shift in focus when working on batch processes. Our
concern shifts from response time per unit of work to total response time. In other words, we
are more concerned about how long it takes to process 500MB of IO or 5,000 seconds of CPU
compared to the response time of a single physical IO. Another way of looking at this is our
unit of work becomes the entire batch process or a step in the batch process.
While the response-time curve can be used when working with batch processes, because
of the longer and singular process time focus, it is not nearly as useful. The response-time
curve shines when it relates time to small units of work, like a logical IO or the execution of a
SQL statement. Because our focus has shifted from small units of work to an entire process or
process segment, our method of reflecting the situation must also change. Instead of using a
response-time curve, the situation can be conveyed numerically in a table format (see Table 9-
5) or by using a simple timeline.

Setting Up the Analysis


Table 9-5 shows how to set up a batch process analysis. The entire batch process has been
segmented into three steps, or segments. Step determination is based on your objectives,
available statistics, and your audience. The time data comes from the same sources as with
online transactions, but, as you’ll see, with a slight yet significant twist. When the process
steps have been defined and the respective data collected, a table similar to Table 9-5 can be
constructed.

Table 9-5. Analyze batch process performance by response time per step

Step      Total Response   Total CPU    Total IO Read   Total IO Write   Total Other
          Time (sec)       Time (sec)   Time (sec)      Time (sec)       Time (sec)
Load      1,989            89           267             1,628            5
Process   2,106            1,706        239             29               132
Update    624              76           123             403              22
Total     4,719            1,871        629             2,060            159

In addition to helping focus the analysis, a setup similar to Table 9-5 naturally allows us
to calculate anticipated change with a greater degree of accuracy. For example, if we believe
through increased parallelism the Load step’s write time can be reduced by 50%, we can
easily adjust the table and recalculate the response time. So not only does this table help us


understand the situation, target our efforts, and communicate the situation to others, it also
aids in anticipating performance improvements.
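For example, cutting the Load step's 1,628 seconds of IO write time in half while holding the
other components constant yields:

$E_{Load} = 89 + 267 + (0.5 \times 1{,}628) + 5 = 1{,}175\ s$

That is a drop from 1,989 seconds to 1,175 seconds, roughly a 41% improvement for the step
and a 17% improvement for the entire batch process.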
To help understand the complexity of working on each step, additional columns can be
helpful. For example, the table could also include the number of top SQL statements and
some complexity metric. The point is that the table should encourage fruitful discussion and
analysis, so an informed decision can be made about where to target the performance effort.

Capturing Resource Consumption


There are two significant differences in capturing response time information when focused on
a batch process: client process time and background process time.
Capturing Client Process Time
In vivid contrast with online activity, during batch processing, there is no think time and there
can be significant client-processing time. As a result, for the total Oracle response time to
equal batch process elapsed time, our response-time analysis must include client-processing
time and also communication time between the client and server process. As presented in
Chapter 5, this time component is captured by the SQL*Net message from client
and SQL*Net more data from client wait events. If database links are involved,
then don’t forget to also include SQL*Net message from dblink and SQL*Net
more data from dblink. When this normally useless, idle-classified time is
included, the batch process elapsed time will equal the total Oracle response time.
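A sketch of capturing this client-related time for a single batch session follows; :batch_sid
is a hypothetical bind variable holding the session's SID, and the values are cumulative over
the life of the session.

select event, total_waits, time_waited_micro/1000000 as wait_secs
from   v$session_event
where  sid = :batch_sid
and    ( event like 'SQL*Net%from client'
      or event like 'SQL*Net%from dblink' );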
Removing Background Process Time
While including background process service and queue time is important and useful when
analyzing an entire system, it can become less useful when focused on a single or a few
sessions or processes. This normally does not present a problem, because session-focused
collection will naturally include only time related to a specific session or group of sessions.
So, what may seem like quite a technical challenge turns out to be not that big of a deal.
Depending on the Oracle release, Statspack and AWR reports may separate background
process time. This makes our job even easier. But if the time is not separated, simply
manually exclude any background process-related time.
When gathering data using your own scripts, remember to make the appropriate
adjustments. The significant time-consuming background process wait events should be of no
surprise: log file parallel write and db file parallel write. There are
others, of course, but these are the most common. The other wait events are easily
recognizable by their event name being associated with a background process. If you are
unsure, refer to Oracle documentation or search Metalink. Even better, sample from
v$session_wait or, for Oracle Database 10g and later, sample from
v$active_session_history or v$session, to see which sessions are posting the
wait event in question.
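For example, in Oracle Database 10g and later, a quick sketch like this shows which sessions
are currently posting a suspect event, making it easy to confirm the event belongs to a
background process:

select sid, program, event
from   v$session
where  event = 'db file parallel write';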

Including Parallelism in the Analysis


Back in Chapter 3, I stated that serialization is death. When working with batch processes,
this is profoundly important. Suppose that a process takes 60 minutes to complete, and the
system has ample available capacity. We know that if we can alter the process to run in two
streams instead of one, the process may complete twice as fast—in 30 minutes. That is using
parallelism to our advantage.


With online processes, Oracle has already taken significant steps to increase and take
advantage of parallelism. The existence of multiple server and background processes is an
example of this. However, having a batch Oracle client process related to a single Oracle
server process can become a serialization limitation. So, our parallelism effort will focus on
ways to split the process into multiple streams, each with its own Oracle client and server
process.
Anticipating Elapsed Time
When a process is serialized, there may be plenty of available capacity, but it cannot be used.
For example, if there are four CPU cores providing 240 seconds of CPU power over a 1-
minute period (4 × 1 × 60), but the process runs as a single serialized stream, it can only hope to
consume at most 60 seconds of CPU. If we look at the operating system during the serial
process, average CPU utilization will be 25%, while our CPU-intensive batch process crawls
along. What is needed is increased parallelism to take advantage of the additional and
available resources.
We can mathematically determine batch process segment elapsed time by simply
dividing the required resource time by the available parallelism. For example, suppose a CPU-
intensive batch process segment consumes 120 seconds of CPU. When run serially, this
process takes 120 seconds. After some analysis, it was determined the process could be split
into three parallel streams without any Oracle concurrency issues. The anticipated elapsed
time becomes 40 seconds. The formula is as follows:

$E = \dfrac{R}{P} = \dfrac{120\ s}{3} = 40\ s$
where:
• E is the elapsed time.
• R is the resources required or duration.
• P is the used and available parallel streams.
For this example, 120 seconds of CPU is required and three parallel streams are
available, so the anticipated elapsed time is 40 seconds. If we looked at the average CPU
utilization, it would now be around 75% busy, because only three of the four available CPU
cores are being used.
The used and available parallelism are obviously very important. Just because four CPU
cores or 100 IO devices exist does not mean they can be used. In our example, there are four
CPU cores available, but the application developers were able to create only three parallelism
streams. And, of course, if the application creates ten parallel streams, yet only four cores are
available, the CPU subsystem will become a bottleneck, operating deep in the elbow of the
response-time curve.
Scalability Issues
Expecting a doubling of parallelism to yield a twofold performance improvement is a best-case
scenario and is unlikely. There are a number of reasons why the benefit of parallelism
can be limited:
• There must be available resources.


• Oracle concurrency issues—such as enqueues, buffers being busy, and latching—can
arise.
• Processes that are split typically must have their results merged, which may force the
creation of an additional process or, if the merge process already exists, it may take
more time to complete.
• There is classic operating system-related scalability.
In reality, with every additional parallel resource (for example, CPU core), a fraction of
the power effectively becomes unavailable or lost. As mentioned, if a batch process is split,
there may be the need to merge the results. The merge process is the direct result of the
increased parallelism, and this constitutes a piece of perceived processing gain we effectively
lose. It’s true that overall we can reduce elapsed time, but the scalability effect is real, and it
grows as the number of parallel streams increases.
There are a number of ways to determine the scalability effect. The simplest way is to
run tests and get to know your application. If that is not practical, then be conservative. There
are also a number of ways we can numerically represent the scalability effect. For example,
let’s suppose with every additional parallel stream, 10% is lost to scalability. This results in a
more realistic elapsed time expectation. There are many methods of account for scalability.
For this example, I chose to use a simple yet robust geometric scaling model. The elapsed
time formula now becomes as follows:

$E = \dfrac{R}{\Phi^{0} + \Phi^{1} + \cdots + \Phi^{P-1}}$

where:
• E is the elapsed time.
• R is the resources required.
• P is the used and available parallelism.
• Φ is the parallel factor: 1 is complete parallelism, and 0 is no parallelism.
Applying scalability to our example of splitting the batch process into three streams and
an optimistic parallelization factor of 98%, the elapsed time is calculated as follows:

$E = \dfrac{R}{\Phi^{0} + \Phi^{1} + \Phi^{2}} = \dfrac{120\ s}{0.98^{0} + 0.98^{1} + 0.98^{2}} = \dfrac{120\ s}{2.94} \approx 41\ s$
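Because the denominator is a geometric series, it collapses into a closed form (valid for
Φ < 1), which is convenient when evaluating several parallelism choices:

$E = \dfrac{R\,(1-\Phi)}{1-\Phi^{P}} = \dfrac{120\ s \times 0.02}{1-0.98^{3}} = \dfrac{2.4}{0.0588} \approx 41\ s$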
I realize that most DBAs will not be so scientific in their work, but please do not forget
to include some overhead when anticipating a parallelization increase. Forgetting or ignoring
scalability will produce overly optimistic predictions.13

13. For an in-depth discussion of scalability, refer to Forecasting Oracle Performance (Apress, 2007), Chapter 10, "Scalability."


Operating in the Elbow of the Curve


In Chapter 4, I noted that, in certain situations, it is desirable to encourage a system to be
operating in the elbow of the response-time curve. That discussion should make a whole lot
more sense now, and I’ll summarize it again.
When focused on online user response time, a snappy response is desired. To increase
the likelihood of snappy response time, we want the likelihood of queuing to be very low. We
encourage this by ensuring a low utilization, that is, by keeping the arrival rate low enough to
keep the system from operating in the elbow of the curve. While this produces snappy online
response time, it also leaves available computing resources on the table. With a batch process
focus, we want to use those leftover and available computing resources. In fact, to leave any
available resource unused can be considered wasteful, shows parallelism is limited, and could
result in a longer elapsed time. So, with batch processing, we look for ways to use any and all
computing resources available to minimize elapsed time. This means the system will be
operating in the elbow of the curve.
From a CPU perspective, this means the average run queue will always be equal to or
greater than the number of CPU cores. But there is a limit to our aggressive resource usage. If
we allow the system to push too far into the elbow, the overhead associated with managing
the system increases to the point where the elapsed time begins to degrade. Our job as
performance analysts is to find the sweet spot and operate the batch processing there!
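To get a numeric feel for where that sweet spot might lie, it helps to sketch the
response-time curve. The query below is a minimal sketch, not a definitive model: it assumes
a hypothetical four-core CPU subsystem with a 10 ms service time and uses a common
closed-form approximation, R = S / (1 - U^M), to show how response time inflates as
utilization pushes into the elbow.

-- Response time vs. utilization using the approximation R = S/(1 - U^M)
-- S = 10 ms service time, M = 4 cores (illustrative assumptions only)
select round(level / 20, 2)                      utilization,
       round(10 / (1 - power(level / 20, 4)), 1) response_ms
from   dual
connect by level <= 19;

In the output, response time stays close to the 10 ms service time at moderate utilization,
roughly doubles by 85% utilization, and more than quintuples by 95%. The batch sweet spot
sits in that elbow: deep enough to consume the available capacity, but short of the point
where management overhead erodes the elapsed-time gain.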

Summary and Next Steps


This chapter is truly about performance analysis. Some would say it is enough to find a
problem and make some changes. I respectfully disagree. I believe we can do so much more,
be so much more effective, and add greater value to our organization. Nearly any Oracle
analysis will result in a number of recommendations. But in order to responsibly decide which
changes to implement first, some ranking must occur. This chapter focused on bringing
rational debate and consensus to ranking the performance-enhancing solutions. I hope you
have found it enlightening and practical.
This final chapter brings a natural and fitting close to Oracle Performance
Firefighting. We started with method, moved into diagnosis and data collection, then into
relevant Oracle internals that enable us to intelligently choose valid performance-enhancing
solutions, and finally learned to anticipate the impact of our proposed solutions. It's all part of
being an effective Oracle performance firefighter.
I truly wish this book had been available in 1989 when I first joined Oracle's consulting
division. It would have made such a difference! So to those of you who are relatively new to
Oracle optimization, here it is: enjoy! And to those of you who have been slogging away for
years optimizing Oracle systems, I hope you now have a renewed enthusiasm and expectation
for your work.
Thank you for taking the time to read my book. I look forward to hearing from you!


Errata
To view and download book updates, fixes, etc. (errata)
please go to the main webpage:
http://resources.orapub.com/Oracle_Performance_Firefighting_Book_p/ff_book.htm

Index
To view and download the index (actually it’s a concordance)
please go to the main webpage:
http://resources.orapub.com/Oracle_Performance_Firefighting_Book_p/ff_book.htm

Oracle Performance Firefighting


Concordance
Fourth Printing – June 2010

adjusting, 71, 214, 277, 309


A adjustment, 277, 369
adjustments, 296, 384
abort, 269 administer, 102, 238
aborts, 147 administration, 102
abstract, 8, 345, 350 administrative, 246
abstracted, 27, 78, 208, 236, 237, 281, 295, 296, 298, 345, administrator, 29, 37, 83, 102, 104, 112, 118, 120, 122, 123,
346, 350, 351 129, 133, 134, 314, 336, 337, 340, 363
abstraction, 2, 29, 30, 33, 103, 248, 328, 345, 348, 350, 352, administrators, 13, 41, 61, 83, 102, 109, 111, 118, 120, 122,
364, 381 130, 132, 133, 135, 149, 319, 320, 321, 360, 362
abstractions, 194, 345 advance, 86
abstracts, 8, 251 advanced, 5, 65, 124, 125, 154, 156, 177, 192, 206, 326, 343
academic, 29 advantage, 5, 8, 15, 55, 75, 91, 97, 98, 160, 182, 231, 328,
access, 42, 67, 69, 72, 73, 74, 75, 77, 78, 81, 92, 93, 97, 98, 336, 373, 374, 384, 385
102, 119, 133, 186, 189, 194, 195, 199, 208, 209, 220, 223, advantages, 27, 40, 187
226, 227, 228, 243, 244, 245, 255, 261, 281, 282, 289, 291 adventure, 233
accessed, 73, 76, 187, 188, 192, 205, 223, 283 adverse, 317
accesses, 72, 191, 199, 228, 293 advise, 266
accessing, 22, 50, 69, 72, 73, 75, 92, 93, 189, 201, 223, 226, advisers, 19
242, 251, 261, 293, 375 advocating, 350
account, 41, 128, 216, 353, 358, 366, 386 affiliation, 50
accountant-like, 233 age, 283, 286
accounting, 39, 55, 84, 162 aggregation, 170
accounts, 63, 105, 106, 108, 127, 128, 367, 372 aggressive, 120, 122, 123, 176, 324, 387
accumulate, 303 aggressively, 122, 265, 317
accumulated, 31, 32 aggressor, 122
accuracy, 3, 4, 177, 182, 383 agree, 12, 19, 177
accurate, 20, 24, 25, 37, 55, 126, 176, 177, 216, 295, 318, agreed, 39, 264
333, 366 agreeing, 12
accurately, 336 agrees, 13
acquire, 22, 72, 73, 74, 76, 80, 81, 83, 97, 156, 195, 197, 208, alarming, 133
210, 220, 221, 243, 267, 276, 288, 297, 309, 310 alert, 134
acquired, 70, 72, 74, 80, 81, 143, 214, 251, 261, 296 alerted, 63
acquires, 74, 199, 210, 270, 297 algorithm, 4, 8, 70, 78, 84, 85, 94, 191, 192, 193, 195, 197,
acquiring, 70, 72, 74, 75, 93, 97, 98, 197, 214, 251, 261, 308, 199, 200, 201, 203, 204, 205, 206, 207, 208, 213, 214, 216,
309 250, 251, 257, 261, 263, 265, 267, 270, 271, 275, 278, 279,
acquisition, 75, 81, 84, 85, 86, 93, 99, 203, 211, 214, 217, 283, 297, 342
252, 265, 267, 271, 342 algorithmic, 205, 207
active, 5, 36, 42, 47, 56, 80, 90, 94, 103, 104, 110, 111, 112, algorithms, 4, 5, 7, 8, 66, 67, 70, 76, 86, 94, 100, 120, 125,
116, 117, 121, 125, 129, 136, 142, 143, 144, 151, 176, 180, 153, 185, 186, 187, 188, 191, 203, 206, 286, 296, 342, 343
182, 183, 184, 198, 209, 216, 217, 221, 231, 238, 239, 240, aligned, 71
241, 242, 243, 259, 262, 271, 277, 287, 293, 298, 312, 313, aligning, 380
315, 329, 330, 341, 384 all_rows, 249, 253
actively, 65, 124, 249 all_status, 201
activities, 15, 37, 81 allocate, 276, 286, 297, 301, 309, 310, 326
adapter, 324 allocated, 145, 183, 268, 276, 296, 298, 299
address, 5, 15, 19, 24, 78, 80, 88, 185, 189, 193, 208, 209, allocates, 286, 297
220, 234, 238, 240, 243, 244, 251, 270, 273, 296, 297, 300, allocating, 268, 271, 308, 309
306, 379
allocation, 73, 89, 250, 270, 277, 279, 283, 286, 296, 297,
addressable, 161
298, 299, 308, 309, 310
addressed, 24, 70, 90, 325
alter, 19, 148, 151, 172, 198, 201, 219, 230, 238, 239, 241,
addresses, 128, 307 245, 246, 248, 249, 253, 265, 275, 289, 292, 296, 302, 303,
addressing, 311 351, 359, 362, 384
adjustable, 182 altered, 80, 172, 206, 207, 246, 251, 252, 253, 254, 264, 267,
adjusted, 260, 277, 341 269, 275, 302, 303, 322
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

altering, 16, 201, 219, 246, 259, 261, 265, 266, 267, 286, 321 architects, 76, 80, 160, 245
alters, 303 architectural, 153, 158, 163, 184, 185, 245, 251, 292, 299
analogy, 26 architecture, 4, 22, 31, 63, 67, 75, 119, 153, 154, 155, 156,
analyses, 22, 36, 41, 54, 110, 117, 129, 137, 151, 157 158, 159, 160, 161, 163, 169, 228, 246, 247, 251, 292, 295,
analysis, 2, 3, 4, 7, 8, 9, 11, 15, 16, 17, 18, 19, 20, 21, 22, 23, 296, 336
24, 25, 26, 27, 29, 31, 33, 35, 36, 38, 40, 41, 42, 44, 45, 46, architectures, 5, 65, 156, 158, 162, 165, 177, 327
51, 52, 54, 55, 56, 62, 65, 67, 71, 100, 101, 102, 103, 110, array, 21, 29, 31, 129, 162, 167, 176, 182
118, 123, 127, 128, 130, 134, 135, 136, 137, 141, 144, 149, arrays, 123, 300
151, 152, 153, 154, 155, 157, 158, 164, 168, 169, 173, 176, arrival, 28, 29, 30, 31, 124, 125, 130, 328, 329, 330, 331, 340,
179, 180, 181, 184, 185, 190, 219, 224, 262, 264, 285, 292, 341, 342, 344, 346, 347, 348, 350, 352, 353, 354, 355, 358,
312, 317, 327, 328, 338, 343, 345, 348, 365, 367, 368, 369, 359, 360, 361, 362, 363, 364, 371, 372, 375, 381, 387
371, 372, 373, 374, 376, 377, 378, 379, 380, 381, 382, 383, arrivals, 328, 353, 360
384, 385, 387 arrive, 4, 8, 12, 70, 204, 313, 328
analyst, 8, 27, 37, 43, 63, 64, 162, 233, 266, 277, 289, 301, arrived, 295, 328, 381
306, 307, 311, 360, 362 arrives, 341
analysts, 36, 41, 57, 121, 150, 239, 245, 259, 261, 265, 267, arrow, 350
289, 316, 345, 387 arrows, 76, 209
analytics, 365 articles, 86
analyze, 177, 382 artificially, 211, 283, 286
analyzed, 15, 19 ascending, 232
analyzing, 15, 19, 61, 117, 127, 156, 292, 366, 384 ash, 177, 182, 183
ancient, 14, 17, 19 ashpct.sql, 180
anomaly, 295 ashrt.sql, 178, 180, 181
anticipate, 2, 4, 12, 22, 27, 186, 344, 346, 347, 348, 364, 365, ashsqlpctcpu.sql, 181
367, 371, 387 ashsqlpcte.sql, 179
anticipated, 19, 343, 378, 379, 381, 382, 383, 385
assert, 43
anticipating, 328, 332, 339, 341, 343, 348, 354, 355, 359, assertion, 43
365, 366, 367, 369, 370, 373, 378, 380, 384, 386
assertions, 127
anticipation, 39, 350
assessment, 109
anticipatory, 327, 381
assign, 96, 231, 257
anxiety, 13
assigned, 69, 231, 241, 242, 365
anxious, 162, 197
assignment, 1, 21, 96, 97, 257, 258, 259
appear, 62, 133, 224, 348, 380
assignments, 258
appearance, 119, 302
assigns, 294
appeared, 39, 347
assimilate, 3
appears, 7, 62, 78, 97, 111, 112, 144, 305, 328, 340, 363, 372,
associate, 92, 160, 257
373, 374
associated, 35, 46, 49, 51, 55, 56, 62, 63, 64, 65, 67, 69, 70,
appended, 159
71, 72, 73, 74, 75, 76, 78, 80, 84, 86, 90, 92, 93, 119, 120,
application, 12, 13, 14, 18, 20, 21, 22, 24, 25, 27, 31, 40, 41,
127, 128, 154, 158, 159, 163, 165, 174, 179, 180, 187, 188,
44, 51, 55, 61, 63, 64, 65, 67, 68, 71, 83, 97, 105, 130, 132,
189, 190, 194, 195, 197, 200, 202, 203, 204, 205, 208, 210,
133, 138, 139, 148, 149, 153, 155, 156, 161, 162, 164, 165,
215, 219, 220, 234, 237, 238, 239, 243, 244, 246, 250, 251,
167, 171, 173, 201, 202, 203, 214, 216, 219, 220, 221, 227,
252, 254, 257, 259, 266, 271, 277, 280, 281, 282, 283, 292,
228, 230, 233, 234, 238, 244, 246, 263, 264, 265, 268, 271,
293, 294, 295, 300, 301, 302, 306, 309, 310, 312, 313, 316,
274, 275, 276, 277, 283, 292, 300, 301, 302, 303, 305, 307,
324, 325, 384, 387
310, 311, 312, 313, 314, 315, 317, 321, 322, 324, 326, 332,
associates, 43
349, 360, 362, 363, 367, 369, 371, 372, 374, 375, 376, 379,
associating, 161, 162
380, 381, 385, 386
association, 215
application-focused, 22, 148, 178, 222, 324, 344
applications, 24, 127, 132, 150, 165, 193, 201, 206, 238, 261, associations, 194, 255
271, 280, 284, 300 assume, 15, 219, 258, 313
application-specific, 24, 227 assumed, 339
approach, 2, 4, 5, 8, 19, 25, 27, 35, 36, 53, 65, 75, 76, 120, assuming, 340, 372
122, 136, 142, 156, 167, 176, 199, 207, 220, 228, 230, 261, assumption, 151, 157, 186
343, 371, 379 assumptions, 186, 219
approached, 18, 264 assured, 82
approaches, 136, 177, 223, 314, 326 asynchronous, 43, 303, 304, 324
appropriate, 4, 5, 20, 33, 42, 43, 53, 71, 164, 168, 199, 208, asynchronously, 303
210, 221, 223, 227, 233, 242, 246, 250, 251, 261, 270, 275, atomic, 97
281, 335, 338, 349, 350, 353, 365, 369, 384 attached, 42, 44, 82, 129, 143, 165, 217, 218, 316, 319
appropriately, 56, 71, 145, 184, 195, 221, 292 attaches, 164
approximately, 88 attaching, 163
architect, 8 attack, 24, 52, 134
architected, 265 attacking, 19
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

attacks, 151 basic, 4, 16, 88, 102, 110, 123, 162, 164, 191, 270, 326, 352,
attendees, 157 354, 360, 361, 365
attending, 9 basis, 242
attention, 7, 12, 20, 35, 56, 61, 62, 86, 88, 89, 100, 167, 244, batch, 24, 65, 107, 109, 110, 160, 216, 218, 219, 222, 295,
264, 299, 349 300, 301, 303, 304, 305, 315, 321, 324, 362, 383, 384, 385,
attractive, 93 386, 387
attributed, 152, 265, 305, 314 batch-centric, 109, 110, 137, 309
attributes, 57, 62, 78, 254 batches, 219
auditing, 182 batch-focused, 110, 383
author, 2 batching, 305, 337, 354
authority, 337 batteries, 302
authorization, 75 battle, 12, 17
authorized, 72, 248 battled, 203
authors, 36, 251 battles, 151
auto, 286 behavior, 25, 246, 271, 301, 303, 304, 305
automated, 3, 183, 248 behemoth, 245
automatic, 37, 177, 230, 231, 276, 286, 301 benchmark, 81, 242
automatically, 111, 187, 231, 264, 276, 285 beneficial, 164, 184, 266
availability, 19, 228, 261, 302, 315, 325, 381 benefit, 3, 4, 27, 41, 53, 56, 61, 78, 165, 197, 199, 208, 261,
average, 15, 21, 22, 30, 31, 32, 38, 46, 47, 48, 51, 61, 77, 84, 266, 278, 351, 376
85, 94, 110, 112, 113, 114, 116, 117, 118, 124, 126, 127, 128, benefited, 36, 379
130, 132, 133, 137, 143, 146, 148, 149, 150, 175, 195, 196, benefits, 3, 19, 36, 37, 41, 93, 137, 276
198, 199, 203, 212, 285, 302, 304, 305, 314, 316, 317, 318, best, 1, 2, 8, 9, 13, 14, 15, 16, 33, 36, 62, 67, 69, 70, 120, 132,
319, 320, 324, 328, 332, 333, 339, 341, 348, 356, 365, 366, 138, 154, 156, 167, 176, 183, 195, 197, 198, 222, 229, 234,
367, 368, 372, 376, 385, 387 247, 268, 276, 301, 302, 304, 337, 338, 340, 371, 372
average_wait, 47 best-case, 385
averaged, 314, 348 billing, 37
averages, 268, 346, 367 binary, 6, 112, 191
avg, 38, 40 bind, 57, 147, 163, 172, 175, 259, 263, 264, 265, 267, 314, 333
awk, 102, 112, 114, 132, 216 binding, 314
axis, 27, 84, 209, 346, 359 binds, 168, 172
bit, 25, 29, 41, 68, 97, 110, 125, 195, 213, 268, 282, 326
B bitand, 235
bitmaps, 230
backdoor, 61 bits, 74
background, 3, 21, 27, 55, 69, 103, 104, 121, 136, 137, 138, blend, 15
142, 143, 144, 145, 146, 147, 176, 178, 182, 183, 184, 209, block, 21, 22, 25, 43, 44, 45, 49, 50, 51, 52, 61, 71, 72, 74, 75,
251, 270, 292, 295, 296, 297, 298, 301, 303, 304, 305, 306, 76, 77, 78, 81, 93, 95, 97, 102, 126, 147, 148, 149, 173, 175,
308, 309, 311, 314, 315, 316, 317, 318, 319, 320, 321, 322, 179, 180, 187, 188, 189, 190, 193, 194, 195, 198, 199, 200,
323, 324, 325, 326, 333, 334, 335, 343, 344, 346, 347, 356, 201, 203, 204, 205, 206, 208, 210, 211, 214, 219, 220, 221,
357, 384, 385 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233,
back-in-time, 177, 242 236, 237, 238, 239, 240, 241, 242, 243, 244, 248, 257, 280,
backup, 147, 302, 333 281, 282, 283, 289, 292, 293, 294, 295, 302, 305, 306, 331,
backups, 220 334, 335, 344, 348, 349, 353, 358, 369, 370
baggage, 35 block_id, 51, 225
balance, 157, 185, 186, 197, 259, 302, 306, 315, 374 blockage, 299, 302, 323
balanced, 103, 123, 124, 342, 343, 363 block-based, 279
balancing, 123, 124, 130, 180, 187, 202, 222, 276, 288, 322, blocked, 113, 153
364 blocking, 18, 219, 352
barriers, 186 blocks, 21, 43, 45, 48, 49, 50, 51, 52, 53, 57, 61, 62, 72, 81,
base, 91, 117, 121, 176, 220, 356 126, 128, 147, 148, 149, 157, 173, 179, 187, 188, 192, 193,
based, 2, 4, 7, 8, 17, 19, 25, 29, 30, 31, 36, 44, 45, 46, 47, 50, 198, 201, 206, 208, 213, 215, 216, 218, 219, 224, 225, 226,
55, 56, 61, 63, 64, 65, 71, 77, 84, 85, 86, 87, 88, 90, 91, 105, 227, 228, 229, 230, 232, 233, 239, 242, 243, 248, 279, 283,
107, 110, 115, 117, 126, 128, 130, 132, 136, 137, 140, 144, 293, 295, 315, 324, 347, 348, 365, 368, 369, 371, 372, 373,
145, 146, 147, 151, 155, 156, 157, 158, 164, 165, 167, 168, 380
169, 172, 173, 176, 177, 178, 180, 181, 182, 184, 187, 193, body, 26
196, 200, 201, 206, 207, 224, 227, 231, 232, 233, 234, 235, bogus_rapid_commits, 312
239, 254, 257, 260, 262, 264, 265, 277, 285, 286, 294, 297, boiling, 81
298, 306, 312, 315, 317, 328, 332, 333, 334, 335, 336, 337, boils, 162
339, 340, 341, 343, 344, 346, 347, 348, 349, 350, 354, 355, bonus, 205
356, 358, 365, 368, 369, 370, 371, 381, 383 bonuses, 13
baseline, 14, 15, 17, 24, 186, 304
baselines, 16
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

book, 1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 15, 22, 25, 26, 27, 29, 45, 56, bundled, 292
100, 101, 102, 103, 136, 186, 189, 207, 228, 251, 284, 327, bursts, 216, 219, 306
328, 336, 348, 364, 365, 380, 382, 387 busier, 22, 46, 105, 107
booking, 1 business, 13, 14, 16, 17, 20, 21, 23, 69, 70, 97, 104, 180, 220,
books, 8, 36, 45, 251, 365 229, 280, 302, 313, 315
booted, 112, 113 business-centric, 62
bot, 125, 153 business-critical, 15
bots, 156 busy, 7, 22, 28, 51, 68, 69, 81, 82, 83, 103, 104, 105, 106, 107,
bottleneck, 4, 8, 18, 19, 20, 21, 22, 24, 31, 48, 83, 92, 93, 97, 108, 109, 111, 116, 124, 125, 130, 137, 146, 161, 221, 222,
102, 116, 118, 120, 121, 122, 123, 125, 128, 134, 141, 148, 223, 224, 226, 227, 228, 229, 230, 231, 232, 233, 238, 297,
149, 203, 220, 221, 262, 279, 285, 307, 309, 314, 323, 324, 305, 316, 317, 342, 348, 355, 385, 386
338, 339, 349, 351, 352, 353, 354, 355, 356, 357, 358, 361, busyness, 83, 106, 107, 108, 109, 110, 125
366, 368, 385 button, 33, 372
bottlenecked, 136, 137, 366 bypassing, 309
bottlenecks, 352 byte, 292
bound, 367 bytes, 6, 43, 91, 134, 183, 216, 232, 270, 276, 277, 293, 294,
box, 24, 103, 136, 213, 237, 284 295, 296, 297, 301, 305, 308, 315, 321, 324, 332, 349
boxes, 237
breach, 30
breadth, 8, 158
C
break, 13, 16, 42, 63, 69, 154, 186, 342 c_type, 222, 323
breakdown, 53, 90, 141, 150, 154, 157, 179, 336 cable, 132
breaking, 265 cables, 324
breaks, 157 cache, 5, 19, 21, 22, 25, 26, 31, 32, 41, 43, 46, 48, 57, 61, 64,
breakthrough, 41 72, 74, 75, 76, 77, 78, 81, 86, 87, 89, 90, 91, 92, 93, 94, 98,
breed, 1, 20 99, 111, 117, 121, 126, 128, 134, 148, 149, 152, 157, 158,
breeding, 11 173, 175, 178, 179, 180, 181, 185, 186, 187, 188, 189, 190,
breeds, 102 191, 193, 194, 195, 198, 200, 202, 203, 204, 205, 206, 207,
bridge, 7, 27, 343 208, 209, 210, 211, 212, 213, 214, 215, 219, 220, 221, 222,
broke, 54 228, 229, 233, 243, 244, 245, 246, 247, 248, 249, 250, 251,
broken, 321, 324 252, 253, 254, 255, 256, 257, 259, 260, 261, 262, 263, 264,
browser, 153, 154, 156, 163 265, 266, 268, 271, 272, 273, 275, 276, 277, 278, 279, 280,
281, 284, 286, 292, 295, 311, 325, 328, 347, 348, 365, 366,
btime, 113
367, 368, 369, 371, 372, 373, 374, 375, 378, 381
bucket, 143, 145, 193, 194, 208, 220, 251, 254, 255, 257, 258,
cached, 21, 99, 157, 186, 187, 190, 198, 199, 203, 245, 247,
259, 261, 265, 268
248, 255, 261, 271, 272, 279, 293, 372
buckets, 27, 55, 191, 193, 195, 198, 199, 203, 247, 251, 252,
cache-like, 261
254, 257, 258, 263, 265, 343
cache-related, 262, 263
budget, 16, 169, 218, 229, 332
caches, 186, 203, 248, 261, 276
budgetary, 368
caching, 125, 126, 218, 220, 289, 304, 317, 337, 339, 340,
budgets, 161
341, 343, 348, 354
buffer, 5, 19, 21, 22, 25, 29, 31, 32, 43, 45, 48, 57, 72, 74, 75,
calculate, 89, 116, 195, 196, 316, 321, 330, 331, 335, 336,
76, 77, 78, 81, 91, 93, 126, 128, 148, 149, 150, 152, 157, 180,
338, 349, 353, 356, 358, 377, 383
182, 183, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
calculated, 56, 89, 90, 117, 304, 330, 331, 338, 339, 340, 341,
195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206,
346, 349, 352, 354, 358, 366, 377, 386
207, 208, 209, 210, 211, 212, 213, 214, 215, 217, 219, 220,
221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, calculates, 277
233, 238, 240, 241, 242, 243, 244, 246, 248, 250, 251, 265, calculating, 132, 336, 341, 356, 369
268, 276, 279, 280, 281, 282, 283, 284, 292, 293, 294, 295, calculation, 31, 88, 91, 116, 184, 191, 213, 328, 331, 338,
296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 308, 339, 340, 341, 356, 366
309, 310, 311, 314, 315, 316, 317, 322, 324, 325, 328, 339, calculations, 32, 116, 156, 329, 330, 354, 355, 359, 366
346, 347, 348, 353, 365, 366, 367, 368, 369, 371, 372, 373, calendar, 37
374, 375, 378 call, 3, 21, 22, 28, 29, 31, 32, 38, 42, 43, 44, 56, 61, 63, 68, 80,
buffer%busy, 224 82, 83, 84, 85, 91, 102, 107, 119, 120, 123, 128, 129, 130,
buffer-centered, 215 142, 143, 147, 148, 149, 175, 176, 177, 178, 184, 195, 205,
buffered, 21, 176, 187, 280, 304, 312 208, 217, 218, 220, 228, 243, 244, 264, 300, 302, 306, 316,
buffers, 21, 22, 46, 48, 64, 72, 75, 77, 78, 81, 87, 89, 90, 126, 317, 318, 319, 323, 324, 333, 340, 341, 345, 350
148, 152, 177, 178, 180, 181, 182, 183, 187, 188, 189, 190, called, 17, 18, 21, 24, 25, 28, 38, 43, 75, 80, 88, 93, 97, 101,
193, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 103, 104, 105, 119, 120, 143, 149, 155, 159, 162, 167, 188,
208, 210, 211, 213, 214, 215, 216, 219, 220, 221, 222, 223, 189, 191, 193, 194, 199, 203, 204, 205, 207, 215, 230, 231,
227, 228, 229, 279, 282, 286, 292, 299, 302, 303, 311, 325, 237, 238, 248, 249, 250, 251, 258, 261, 268, 270, 271, 275,
366, 373, 375, 376, 386 280, 281, 292, 293, 327, 338, 342, 345, 349, 366
bug, 98, 268 caller, 302
bugs, 268, 284
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

calls, 28, 29, 30, 39, 45, 85, 91, 97, 126, 128, 129, 142, 143, child, 26, 75, 78, 80, 248, 249, 250, 251, 252, 253, 254, 255,
147, 148, 159, 160, 162, 173, 176, 177, 178, 184, 217, 218, 267, 299
219, 264, 276, 292, 315, 316, 317, 318, 319, 328, 329, 330, child#, 249, 255, 299
331, 332, 341, 343, 369 children, 26, 288, 299
campaign, 17 chunk, 73, 247, 250, 251, 256, 268, 270, 271, 277, 278
campaigns, 17 chunks, 247, 248, 250, 256, 268, 270, 271, 276, 278
cancel, 338 chunk-specific, 270
candidate, 358 circle, 18, 19, 20, 24, 25
capabilities, 4, 5, 14, 15, 37, 92, 96, 123, 177, 231, 237, 280, circles, 20, 52, 68, 101, 156
281, 291, 293 circle-specific, 25
capability, 9, 86, 113, 123, 142, 165, 174, 177, 275, 291, 314, circular, 296
336 circumstances, 110
capable, 14, 123, 336 claim, 300
capacity, 6, 16, 21, 30, 53, 57, 68, 110, 111, 112, 118, 122, claiming, 62, 130
130, 136, 141, 146, 148, 149, 159, 160, 161, 163, 203, 219, claims, 245
220, 265, 285, 292, 299, 309, 314, 315, 316, 317, 319, 324, clarify, 13, 155, 252, 325
326, 331, 332, 336, 337, 338, 340, 342, 348, 355, 356, 357, clarity, 156, 162, 332
360, 361, 362, 363, 365, 366, 367, 368, 372, 374, 384, 385
class, 47, 49, 71, 89, 145, 292
capacity-planning, 36
classes, 7, 69, 164, 292
capital, 258, 362, 363
classic, 27, 28, 46, 68, 69, 86, 88, 126, 132, 137, 148, 155, 159,
capture, 137, 138, 164, 248
176, 191, 192, 195, 202, 214, 259, 279, 285, 311, 314, 338,
captured, 61, 62, 63, 115, 163, 328, 329, 384 340, 342, 343, 346, 348, 354, 355, 365, 367, 386
captures, 56, 61, 62 classification, 47, 49, 53, 54, 55, 61, 62, 104, 136, 138, 140,
capturing, 267, 312, 384 141, 157, 164, 181, 345
card, 31, 37, 133, 134, 251, 324, 328 classifications, 45, 53, 54, 55, 136, 142, 178, 347
career, 4, 17, 53, 78, 86, 102, 130, 132, 152 classified, 12, 55, 57, 63, 83, 104, 118, 137, 143, 178, 262,
cascade, 246, 249 336, 345, 384
cascading, 187, 286 classifies, 157
cat, 113, 114, 174, 216, 240, 241, 242, 294, 312 classify, 27, 53, 55, 136, 142, 143, 145, 157, 178, 337, 344,
catalog, 251 345
categories, 15, 27, 54, 56, 62, 71, 118, 119, 138, 139, 343 classifying, 53, 316, 343, 344
categorization, 56 clean, 55, 177, 275
categorize, 27, 56, 83, 218 cleaned, 293
categorized, 83 cleaning, 24
categorizing, 53, 119, 157 cleanout, 240, 242, 293
category, 54, 55, 56, 63, 119, 155, 344 cleanouts, 295
caution, 364 cleansing, 25
cautious, 365 clear, 2, 18, 30, 31, 35, 38, 46, 96, 118, 153, 187, 202, 247,
cautiously, 315, 326 259, 301, 350
celebrate, 17 cleared, 97, 183
cell, 365 clearer, 188
cells, 365 clearly, 3, 7, 8, 14, 16, 19, 21, 22, 24, 25, 35, 38, 40, 43, 45,
center, 80, 228, 229, 324 50, 55, 62, 65, 83, 86, 88, 89, 118, 123, 128, 130, 132, 134,
centered, 155, 184, 228, 310, 323 137, 145, 148, 152, 156, 175, 176, 192, 197, 201, 206, 222,
centiseconds, 47, 137, 145, 332, 333, 338 227, 238, 242, 244, 245, 246, 250, 255, 257, 260, 262, 266,
central, 8, 12, 45, 182, 230, 246, 247, 295, 342 274, 279, 292, 295, 304, 305, 306, 314, 317, 323, 336, 351,
centric, 61, 359 352, 355, 362, 365, 370, 379, 381
chain, 22, 46, 48, 76, 77, 78, 126, 148, 152, 190, 191, 194, 195, client, 1, 16, 17, 37, 48, 63, 64, 65, 123, 153, 154, 155, 156,
196, 197, 198, 199, 203, 204, 205, 206, 207, 208, 210, 211, 158, 159, 160, 161, 162, 163, 164, 165, 167, 168, 169, 170,
212, 213, 214, 215, 220, 221, 222, 243, 244, 248, 251, 255, 171, 172, 173, 174, 175, 176, 177, 181, 349, 384, 385
259, 261, 265, 268, 276, 282, 283, 311, 328 client/server, 63, 155, 156, 159, 160, 161
chain-related, 214 client_id, 164, 167, 168, 170, 171, 172, 173, 174
chains, 46, 48, 64, 75, 77, 87, 89, 90, 91, 126, 148, 152, 178, client_id_stat_disable, 168, 170, 174
180, 181, 188, 190, 191, 194, 195, 196, 197, 198, 199, 202, client_id_stat_enable, 168, 172
203, 204, 205, 209, 210, 213, 214, 215, 245, 247, 251, 252, client_id_trace_disable, 168, 170, 174
259, 265, 268, 276, 279 client_id_trace_enable, 168, 172
chaos, 15 client_identifier, 164, 171, 172, 173
chaotic, 11 client_info, 165
characteristic, 69, 252 clients, 3, 16, 37, 161
characteristics, 1, 5, 12, 23, 127, 248, 300, 337, 346, 348 clock, 94, 284, 285
characters, 62, 258 clockwise, 183, 296, 297, 308
chief, 1, 13, 284 clone, 199, 200, 241, 244
clone.sql, 200
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

cloned, 200, 201, 240, 242, 243, 244, 283, 292 commercial-grade, 184
clones, 200, 201 commit, 61, 127, 241, 282, 283, 292, 295, 298, 301, 302, 303,
cloning, 195, 199, 200, 201, 202, 203, 236, 242, 243, 283 304, 305, 306, 307, 308, 310, 311, 312, 313, 314, 315, 324,
cluster, 173, 286 349
cluster_database, 286 commit_write, 302, 303
code, 25, 37, 38, 39, 40, 41, 42, 43, 45, 46, 50, 51, 65, 70, 71, commits, 127, 173, 187, 241, 281, 282, 283, 287, 293, 298,
72, 73, 74, 75, 78, 82, 90, 91, 92, 93, 97, 98, 104, 115, 117, 301, 303, 304, 305, 306, 307, 310, 311, 312, 314, 315, 328,
118, 119, 140, 141, 150, 163, 170, 172, 174, 176, 185, 186, 331, 349
197, 210, 214, 223, 232, 250, 252, 253, 259, 272, 273, 274, committed, 231, 238, 240, 241, 242, 244, 280, 293, 294, 295,
275, 278, 279, 287, 288, 304, 305, 307, 310, 311, 316, 333, 301, 302, 305, 311, 313, 324, 325
334 committing, 310, 312, 314
coding, 92 communicate, 1, 3, 8, 13, 18, 23, 53, 54, 65, 155, 159, 299,
coefficient, 339 384
cohesive, 19 communicates, 53
col, 38, 51, 138, 139, 140, 171, 198, 225, 233, 235, 248, 250, communicating, 3, 7, 8, 15, 53, 159, 160, 364
320, 334 communication, 8, 11, 23, 53, 54, 63, 64, 156, 159, 350, 358,
col$, 139, 233, 248, 250 384
cold, 124, 208, 211, 212, 215 communications, 23, 63
collaboration, 93 community, 17, 27, 151, 157, 182, 362
collapse, 54 companies, 13, 74, 156, 162
collapsed, 281 company, 9, 13, 63, 134, 186, 279
colleague, 1, 46, 101, 104 company-critical, 17
colleagues, 1, 9, 23, 25, 265, 275, 289, 299 compare, 107, 144, 263, 266, 331, 354
collect, 15, 53, 137, 164, 167, 168, 320, 332, 333, 334, 341, compare-and-swap, 97
344, 348 compared, 27, 61, 91, 124, 186, 244, 285, 304, 377, 380, 383
collected, 37, 62, 132, 142, 164, 169, 284, 305, 329, 330, 332, compares, 244
365, 383 comparing, 57, 243, 281, 287, 340, 373
collecting, 17, 158, 168, 171, 173 comparison, 14, 244, 266, 354, 373, 382
collection, 5, 8, 9, 15, 36, 37, 53, 55, 63, 136, 137, 140, 142, comparisons, 382
156, 158, 163, 164, 165, 167, 168, 169, 170, 171, 172, 173, compatibility, 286
174, 176, 177, 182, 332, 338, 347, 368, 369, 373, 384, 387 competition, 262
collector, 5, 182, 184 compilation, 147, 246, 333
collectors, 177, 184 complex, 1, 2, 3, 4, 5, 20, 52, 53, 92, 93, 103, 122, 132, 152,
collects, 42, 163, 177, 182, 330 163, 194, 219, 249, 251, 345
collide, 133 complexity, 21, 36, 53, 54, 92, 223, 256, 289, 345, 365, 384
collision, 132, 133, 193 complicate, 138, 322
collisions, 132, 133, 134, 193, 194, 196, 197, 200 complicated, 50, 145, 153, 162, 247, 257, 286, 334
colorful, 80 component, 2, 45, 46, 56, 67, 153, 156, 348, 384
column, 47, 63, 65, 84, 88, 89, 90, 111, 115, 116, 117, 121, components, 23, 29, 53, 57, 62, 72, 100, 112, 114, 153, 157,
122, 129, 130, 150, 165, 178, 179, 180, 187, 188, 189, 213, 159, 163, 245, 248, 249, 251, 282, 284, 328, 344, 345, 359
216, 223, 227, 228, 230, 232, 233, 234, 235, 236, 237, 238, compression, 182
240, 246, 256, 274, 281, 283, 306, 314, 321, 366, 376, 377, compromise, 203, 233, 315
378, 379, 380, 382 computer, 28, 41, 69, 104, 119, 161, 162, 207, 266
columns, 47, 49, 50, 88, 89, 111, 112, 115, 129, 165, 171, 216, computers, 53, 162
227, 249, 322, 323, 384 computing, 29, 68, 134, 153, 157, 158, 160, 162, 163, 176,
combat, 222 244, 276, 279, 292, 345, 358, 387
combination, 1, 18, 31, 46, 54, 109, 151, 168, 176, 191, 193, concatenated, 239, 273
213, 214, 220, 221, 222, 249, 275, 277, 292, 308, 337, 341, concatenation, 240
342, 360, 361, 362, 363, 369 concentrated, 299
combinations, 164
concentrating, 46
combine, 53, 100, 130, 369, 371
concept, 52, 97, 110, 155, 207, 209, 215, 291, 327, 367
combined, 31, 53, 54, 63, 100, 127, 164, 182, 185, 199, 213,
concepts, 7, 8, 27, 53, 94, 102, 245, 247, 292, 364, 365
228, 230, 251, 270, 275, 277, 285, 299, 304, 327, 331, 353,
conceptual, 135, 251, 252, 270, 295, 297
358, 366, 369, 370, 376, 377, 380
conceptually, 268, 276, 297
combines, 163, 296
conclusion, 105, 197
combining, 16, 165
conclusions, 364, 370
command, 24, 41, 42, 53, 57, 62, 64, 71, 82, 110, 111, 112,
concurrency, 77, 85, 92, 173, 185, 189, 195, 214, 223, 228,
118, 119, 120, 121, 129, 132, 133, 134, 216, 238, 239, 241,
229, 230, 232, 233, 236, 237, 238, 239, 242, 244, 251, 255,
249, 289, 292, 302, 304, 317, 323, 338
265, 297, 309, 347, 348, 385, 386
commands, 21, 102, 116, 117, 122, 132, 136, 338
concurrent, 84, 223, 232, 238, 240, 265
comment, 24
condensed, 323
comments, 24, 169
condition, 80
commercial, 14
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

conditional, 73, 97 constraints, 297


conditions, 1, 29, 62, 81, 188, 221 construct, 29, 248, 251, 352
conduits, 27 constructed, 20, 355, 383
conference, 27, 123, 156, 157 constructing, 369
conferences, 157 construction, 93, 240
confidence, 1, 2, 3, 7, 11, 16, 17, 20, 26, 42, 65, 94, 305, 360 constructs, 246, 249
confident, 3, 26, 37, 65, 247 consult, 314
confidently, 3, 25, 102, 128, 317, 336 consultant, 67
configurable, 177, 208 consultants, 36
configuration, 21, 61, 122, 124, 315, 317, 319, 336 consulting, 9, 36, 37, 63, 105, 118, 123, 155, 186, 291, 302,
configurations, 125 363, 387
configured, 21, 39, 123, 126 consume, 44, 55, 83, 86, 111, 137, 141, 184, 187, 196, 203,
confirm, 20, 39, 134, 202, 220 244, 247, 262, 285, 332, 371, 376, 377, 380, 385
confirmation, 339 consumed, 29, 30, 31, 57, 62, 84, 110, 116, 136, 137, 141,
confirmed, 19, 21, 41, 101, 376 143, 144, 145, 146, 147, 153, 156, 184, 197, 203, 221, 238,
confirms, 179 278, 332, 333, 334, 338, 344, 346, 357, 365, 368, 369, 374,
conflicting, 265 375, 376, 377, 380
confronted, 63, 148, 362 consumer, 181, 196, 222
confronting, 48 consumers, 153, 292
confuse, 68, 117 consumes, 31, 61, 68, 83, 98, 136, 137, 176, 197, 199, 275,
confused, 118, 342 279, 285, 346, 360, 361, 369, 375, 376, 377, 385
confusing, 7, 16, 103, 155, 169 consuming, 12, 18, 21, 22, 31, 44, 62, 84, 86, 97, 119, 122,
confusion, 82, 103, 118, 133, 213, 223, 248, 338 136, 137, 141, 146, 154, 155, 176, 178, 180, 181, 182, 197,
199, 208, 221, 222, 244, 256, 262, 271, 276, 278, 314, 332,
connect, 165, 294
339, 343, 345, 357, 360, 361, 366, 367, 368, 369, 379
connected, 19, 48, 154, 167, 197, 199, 246, 247, 293, 301
consumption, 12, 18, 21, 30, 31, 37, 40, 53, 55, 57, 61, 81,
connecting, 253
83, 84, 89, 91, 110, 113, 136, 137, 138, 139, 141, 142, 143,
connection, 21, 47, 52, 130, 147, 162, 165, 167, 177, 239, 144, 145, 146, 152, 168, 175, 177, 178, 180, 181, 184, 197,
294, 333 202, 203, 210, 214, 248, 250, 252, 258, 259, 260, 261, 262,
connections, 130, 167, 255, 286 263, 265, 276, 278, 285, 317, 318, 333, 334, 335, 338, 341,
connectors, 324 342, 343, 344, 349, 351, 352, 353, 355, 356, 357, 365, 369,
connects, 45 371, 373, 374, 376, 377, 378, 379, 380, 381
consensus, 381, 387 contact, 9, 97, 98, 104, 133
consequence, 93, 136 contacted, 134
consequences, 214, 377 contend, 74, 92, 93, 208, 220, 221
conservative, 309, 367, 382, 386 contending, 73, 77, 92, 198, 199
consider, 9, 14, 19, 29, 83, 103, 123, 138, 139, 149, 197, 201, contends, 296, 297
202, 214, 228, 233, 238, 244, 247, 279, 285, 301, 307, 315, contention, 4, 5, 18, 21, 22, 25, 26, 30, 31, 36, 61, 67, 70, 72,
324, 344, 350, 363 75, 76, 77, 78, 81, 82, 83, 84, 86, 87, 88, 89, 92, 93, 97, 98,
considerations, 14, 229 99, 100, 102, 136, 151, 152, 185, 186, 188, 194, 195, 196,
considered, 16, 55, 74, 83, 111, 112, 133, 179, 199, 207, 210, 197, 198, 199, 201, 202, 203, 205, 207, 209, 213, 214, 218,
215, 228, 257, 277, 278, 387 222, 230, 231, 234, 236, 242, 246, 252, 256, 259, 260, 261,
considering, 31, 136, 141, 178, 180, 203, 219, 259, 316, 356 262, 263, 264, 265, 266, 268, 271, 274, 275, 276, 278, 281,
considers, 62, 139, 263 288, 292, 297, 299, 309, 310, 311, 328, 329, 349, 352
consist, 55, 191, 353 contestants, 80
consistency, 199, 230, 236, 237, 240, 291, 350 context, 167, 293, 314
consistent, 7, 8, 13, 94, 115, 117, 188, 200, 210, 215, 239, contiguous, 226
241, 242, 243, 244, 280, 284, 285, 292, 293, 302, 306, 331, continent, 2
342, 346, 349, 353 continual, 85, 228, 245, 356, 381
consistently, 21, 84, 120, 301 continually, 112, 122
consisting, 153, 194 continue, 16, 37, 47, 53, 65, 80, 81, 98, 104, 161, 165, 195,
consists, 56, 75, 80, 118, 145, 215, 223, 239, 347, 368 209, 245, 247, 270, 271, 276, 284, 341, 364, 365
consolation, 276 continued, 161, 206, 297
console, 105 continues, 14, 28, 42, 48, 74, 80, 85, 91, 141, 169, 187, 210,
consolidate, 41 211, 247, 278, 289, 295
consolidated, 24, 174, 175, 240, 280, 281, 299 contrast, 25, 72, 74, 107, 124, 186, 306, 342, 346, 384
consolidates, 169 contrasting, 343, 382
consolidating, 57, 172, 242 contrasts, 125, 342, 352
consolidation, 9, 23 contribute, 20, 246
constant, 8, 12, 17, 116, 143, 245, 246 contributed, 221, 230
constantly, 3, 177, 234, 265 contribution, 152, 153, 349
constitutes, 83, 386 control, 11, 14, 19, 39, 40, 46, 69, 70, 71, 72, 73, 74, 75, 80,
constrained, 69, 292 81, 87, 91, 92, 93, 96, 97, 98, 99, 104, 107, 109, 122, 126,
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

148, 152, 182, 189, 194, 197, 201, 202, 209, 216, 237, 242, countries, 2, 9, 20
246, 248, 255, 261, 262, 263, 268, 281, 285, 286, 297, 300, counts, 72, 112, 212, 213
301, 302, 303, 304, 306, 310, 311, 317, 325, 351, 372 courses, 9, 25, 264
controlled, 77, 92, 162, 183, 230, 237, 255, 265, 298 courseware, 1
controller, 324 coverage, 135, 327
controlling, 73, 75, 134, 298, 360, 361, 362 covered, 3, 8, 11, 33, 136, 194, 244, 245, 282, 286, 327, 338,
controls, 67, 92, 177, 203, 204 342, 343
conventions, 142 covers, 15, 31, 146
conversation, 2, 29, 53, 73 cpu, 39, 40, 112, 113, 114, 115, 138, 140, 141, 142, 143, 144,
conversations, 2, 8, 41 145, 146, 147, 175, 178, 180, 181, 333, 334, 346, 347, 356,
conversion, 338 357
convert, 116, 137, 336 crashed, 325
converted, 145 create, 2, 16, 22, 23, 25, 89, 97, 112, 113, 132, 138, 172, 177,
converting, 142, 230, 264, 328, 341 181, 184, 195, 198, 199, 212, 219, 230, 231, 232, 238, 242,
converts, 145 248, 249, 250, 251, 253, 262, 263, 267, 271, 272, 281, 289,
convey, 331, 348, 351, 365, 370 300, 301, 312, 317, 322, 325, 326, 339, 359, 365, 370, 375,
conveyed, 383 377, 385
conviction, 2, 327, 364 created, 9, 17, 32, 37, 63, 77, 88, 94, 97, 99, 139, 157, 162,
convince, 3, 20, 102, 355 165, 172, 174, 193, 201, 206, 232, 233, 237, 238, 239, 245,
convinced, 36 248, 249, 250, 253, 255, 256, 258, 259, 265, 267, 269, 272,
convincing, 3, 102, 151, 168, 381 281, 282, 286, 288, 293, 298, 300, 301, 309, 326, 347, 351,
352, 355, 359, 370, 381
cooling, 211
creates, 20, 76, 82, 119, 139, 196, 199, 203, 239, 250, 257,
cooperation, 11, 20, 21
261, 280, 293, 296, 343, 385
cooperative, 2, 317
creating, 5, 7, 11, 97, 161, 176, 177, 219, 220, 233, 241, 243,
cooperatively, 229
254, 255, 279, 348, 349, 352, 358, 376, 377, 378, 379
coordinating, 75
creation, 92, 95, 97, 163, 206, 232, 233, 250, 251, 258, 259,
coordination, 63, 75 280, 299, 322, 379, 386
copied, 220, 242, 252, 282, 292, 299, 302, 308 creative, 5, 7, 17, 29, 31, 42, 69, 81, 93, 100, 109, 156, 158,
copies, 119, 297, 310 187, 214, 228, 233, 302, 307, 314, 324
copy, 87, 89, 187, 199, 224, 243, 296, 297, 298, 300, 308, 309, creatively, 82, 126, 220, 228, 309, 364
310, 326, 369 creativity, 113, 164, 330
copying, 24, 280, 283, 297, 299, 309 creator, 300
copying/cloning, 283 credit, 241
core, 5, 7, 8, 27, 30, 33, 45, 50, 51, 52, 57, 61, 78, 79, 85, 103, cripple, 81
104, 105, 106, 107, 108, 109, 114, 115, 116, 117, 121, 123, crisis, 14
124, 136, 141, 146, 158, 161, 176, 186, 235, 246, 247, 265,
criteria, 147, 163, 164, 169, 210, 212, 215, 260, 333
278, 279, 284, 287, 293, 304, 336, 338, 340, 342, 343, 346,
critical, 12, 119, 188, 253, 292
352, 355, 359, 360, 365, 368, 386
cores, 7, 22, 30, 31, 57, 85, 103, 104, 105, 107, 109, 114, 116, critically, 100
117, 118, 136, 137, 146, 184, 197, 288, 336, 337, 338, 341, cross, 117, 244, 280
342, 345, 354, 355, 356, 361, 362, 368, 374, 375, 385, 387 crossed, 212
corporate, 176 crosses, 211, 212, 215
correlate, 347, 349 crossover, 55
correlated, 24, 101 cross-reference, 255
correlates, 129 cross-referenced, 45, 236
correlation, 88, 130, 312, 339, 371 cross-referencing, 235
corresponding, 6, 42, 93, 112, 155, 169 crystal, 153
corresponds, 231, 246 crystal-clear, 157
corrupt, 231 csc, 240, 241, 242
corrupted, 209, 291 cu, 139, 173
corrupting, 75 culmination, 2
corruption, 73, 187 cultures, 20
cost-based, 233 cumulative, 173
count, 17, 38, 40, 43, 62, 78, 79, 80, 81, 86, 93, 96, 97, 139, currency, 348
147, 152, 173, 175, 188, 189, 197, 200, 204, 207, 209, 210, current, 18, 25, 31, 49, 96, 111, 122, 155, 173, 175, 183, 188,
211, 212, 213, 214, 215, 216, 221, 255, 256, 264, 265, 267, 199, 200, 203, 204, 242, 271, 282, 283, 287, 291, 292, 293,
269, 270, 294, 349 301, 302, 303, 353, 372, 375, 376
counted, 62 currently, 31, 49, 50, 56, 62, 73, 87, 93, 97, 110, 112, 116,
counter, 207, 212 121, 123, 174, 178, 180, 182, 208, 241, 242, 250, 285, 325,
count-frequency, 207 332, 369
counting, 41, 119, 269 cursor, 74, 77, 90, 92, 93, 94, 95, 98, 99, 173, 233, 246, 247,
248, 249, 250, 251, 254, 255, 256, 258, 259, 260, 261, 262,
countless, 13, 65, 265
263, 264, 265, 266, 267, 268, 271, 273, 277, 278, 289
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

cursor_sharing, 248, 264 deceived, 31


cursor_space_for_time, 259, 260, 278, 289 deceiving, 14
cursor-related, 248, 267, 268 declare, 139, 312
cursors, 92, 99, 173, 246, 247, 248, 249, 250, 252, 253, 254, decode, 79, 81, 235, 260, 273
255, 259, 261, 266, 267, 268, 271, 274, 276, 277, 278 dedicated, 157, 183, 341
curve, 27, 28, 29, 30, 54, 107, 109, 116, 120, 342, 343, 345, deep, 1, 2, 3, 4, 5, 8, 15, 17, 26, 33, 35, 67, 71, 85, 88, 101,
346, 347, 348, 349, 350, 351, 355, 356, 358, 360, 361, 362, 107, 184, 186, 244, 266, 284, 307, 327, 355, 365, 385
363, 364, 369, 370, 375, 376, 378, 382, 383, 385, 387 deeper, 5, 17, 20, 21, 33, 43, 44, 55, 77, 107, 134, 150, 187,
curves, 341, 348, 375, 381 199, 236, 238, 242, 328
customer, 16, 65, 130, 156, 164, 176, 365, 367, 372, 376, 378 deeply, 72, 156, 183, 237, 326
customers, 175, 176, 186, 206, 264, 279, 293, 294, 295, 300 default, 56, 76, 80, 85, 176, 177, 183, 196, 199, 200, 203, 207,
customize, 23 208, 209, 211, 212, 213, 214, 215, 219, 221, 231, 237, 238,
customized, 9 259, 261, 262, 266, 272, 277, 278, 279, 280, 285, 286, 298,
cutoff, 210 301, 303
cycle, 16, 94, 309, 369, 371, 372, 378, 379, 380, 382 defaults, 78, 196, 212, 261, 277
cycled, 71, 260 definition, 14, 55, 95, 139, 153, 157, 188, 212, 248, 301, 332
cycles, 22, 25, 36, 85, 86, 97, 151, 186, 197, 214, 278, 280, deflecting, 13
288, 336, 367, 375, 380 degradation, 185, 261, 350
cyclical, 151 degrade, 104, 342, 378, 387
cycling, 216 degraded, 345
degrades, 341
degrading, 117
D degree, 345, 383
dangerous, 32 degrees, 72
database-centric, 154 delay, 132, 143, 271, 320
data-collection, 15, 36 delayed, 240, 293
datafile, 51, 225, 239, 241 delaying, 324
date, 134 delays, 93
day, 8, 14, 37, 61, 76, 101, 118, 132, 134, 186, 302, 383 delete, 219, 241
db%scat, 52, 150, 179 deleted, 183, 300
db%scat%, 150 deletes, 322
db_block_buffers, 198 deleting, 238
db_block_size, 198, 335 delta, 152
db_multiblock_read_count, 43 demand, 262
db_writer_max_scan_pct, 219, 221, 222 demanding, 81, 264
db_writer_processes, 215 demands, 61, 199, 313
dba_data_files, 51, 224, 225, 226 demonstratable, 151
dba_extents, 50, 51, 224, 225, 226 demonstrate, 20, 24, 48, 128, 142, 143, 177, 188, 240, 248,
dba_extents-based, 225 257, 265, 293, 306, 323, 339, 340, 343, 349, 360, 364
dba_hist_mutex_sleep, 98 demonstrated, 16, 20, 142, 155, 182, 322, 348, 372
dba_lock, 233 demonstrates, 3, 20, 31, 51, 105, 143, 295, 306, 341, 342
dba_objects, 234, 235, 236 demonstration, 339
dba_segments, 227, 230 denial-of-service, 134
dba_tables, 237 denominator, 227, 340, 341
dbf, 50, 201, 226, 293, 325 density, 229
dblink, 349, 384 department, 162
dbms_application, 312 departure, 208
dbms_lock, 138, 140, 172, 294, 312, 320, 330, 334 dependability, 350
dbms_monitor, 167, 170, 172, 174 dependable, 4, 65
dbms_session, 165, 167 dependency, 256, 286
dbms_shared_pool, 260, 272, 273, 274, 278 dependent, 176, 226, 266, 276
dboc.sql, 260, 272, 273, 274 depending, 334, 345
dbw, 143, 216, 217, 218 depends, 76, 105, 127, 169, 231, 294, 361
deadline, 328 deprecated, 208
deadlock-type, 76 depth, 27, 42, 55, 139, 262
deallocate, 248, 250, 259, 271, 277 dequeue, 80, 233
deallocated, 74, 92, 93, 246, 248, 250, 256, 259, 267, 268, dequeued, 72
271, 275, 277, 281 derive, 33, 52, 332, 341, 354
deallocating, 268, 271 derived, 359
deallocation, 250, 278 deriving, 337
death, 1, 68, 110, 384 descending, 90
debate, 69, 387
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

describe, 41, 74, 118, 119, 123, 124, 167, 194, 261, 296, 301, dflt, 198
307 diagnose, 4, 5, 7, 8, 18, 20, 25, 26, 36, 41, 123, 145, 177, 365
described, 23, 25, 77, 217, 233, 252, 278, 307, 315, 348, 351, diagnosed, 5, 18, 33, 52, 292
354, 370, 373 diagnosing, 19, 26, 65, 149, 153, 162, 292, 366
describes, 208 diagnosis, 1, 2, 3, 4, 7, 8, 11, 15, 18, 22, 26, 27, 31, 33, 35, 36,
describing, 296 37, 51, 52, 54, 65, 70, 86, 135, 151, 153, 177, 182, 184, 223,
description, 49, 53, 198 268, 288, 292, 381, 387
descriptions, 4, 23, 129 diagnostic, 4, 14, 37, 45, 53, 56, 136, 137, 150, 163, 177, 182,
descriptive, 63 223, 235, 343, 365, 366, 368, 370, 375, 380
deserves, 71, 86, 89, 154, 327, 349 diagrammed, 209, 255, 256
design, 162, 230, 301 diagramming, 204
designates, 296 diagrams, 229
designed, 8, 17, 62, 98, 163, 186, 275, 279, 285, 311 dictated, 132
designing, 177 dictates, 81
desktop, 161 dictionary, 50, 71, 139, 194, 195, 199, 208, 220, 233, 248, 250
destroy, 91, 277, 283, 326 dictionary-based, 72
destroyed, 93, 248, 250 differentiate, 68, 137, 145
destroying, 206, 255 differentiates, 246
destruction, 91, 259 differentiating, 246
destructive, 16, 317 differently, 117, 118
detached, 129, 143, 319 digital, 6
detail, 22, 27, 29, 44, 45, 53, 61, 76, 97, 129, 141, 175, 188, digits, 223
190, 195, 199, 208, 218, 223, 228, 251, 279, 282, 297, 304, dimension, 65
317, 331, 343, 351, 352 diminished, 19
detailed, 24, 39, 41, 45, 49, 53, 55, 86, 98, 128, 137, 149, 150, diminishes, 86, 199
176, 194, 197, 202, 275, 337, 342, 348, 350, 353, 358, 372 diminishing, 275
details, 2, 4, 15, 24, 27, 31, 33, 39, 41, 45, 47, 49, 50, 52, 53, diplomatic, 12
54, 55, 56, 62, 64, 71, 73, 84, 87, 91, 96, 100, 112, 116, 119, direct, 18, 36, 46, 87, 99, 126, 127, 151, 208, 214, 238, 242,
123, 137, 139, 140, 142, 143, 153, 163, 165, 172, 175, 176, 262, 263, 277, 285, 307, 308, 310, 340, 349, 373, 376, 382,
177, 182, 183, 185, 190, 201, 222, 224, 234, 235, 247, 251, 386
255, 267, 270, 312, 314, 317, 321, 325, 333, 336, 338, 343, directed, 77, 112, 124, 369
344, 345, 348, 350, 353, 354, 369, 374, 377 directing, 132
detect, 98, 102, 125, 163, 185, 279, 285 direction, 364, 365
detected, 250 directly, 13, 14, 18, 24, 35, 36, 37, 52, 61, 65, 72, 88, 104, 116,
detecting, 86, 188 129, 145, 157, 158, 165, 176, 177, 179, 182, 183, 185, 227,
determination, 383 229, 240, 251, 277, 279, 281, 285, 286, 303, 309, 312, 321,
determine, 16, 18, 37, 38, 49, 50, 63, 70, 72, 78, 88, 89, 91, 322, 326, 341, 351, 364, 371
93, 102, 116, 129, 133, 134, 142, 143, 154, 178, 187, 190, directory, 132, 170, 174
213, 215, 219, 226, 227, 232, 234, 235, 241, 272, 277, 314, directs, 141, 179
334, 335, 337, 362, 364, 385, 386 dirty, 187, 188, 189, 190, 203, 204, 210, 213, 215, 216, 218,
determined, 94, 129, 188, 189, 212, 224, 235, 269, 305, 319, 219, 220, 221, 222, 279, 306, 325
320, 385 dirtying, 216
determines, 196, 207 disable, 168, 169, 170, 174, 183, 288
determining, 8, 63, 86, 234, 241, 250, 322, 337 disabled, 94, 174, 263, 266, 284
deterministic, 233 disables, 280, 286
devaluing, 215 disabling, 169
develop, 1, 2, 3, 19, 20, 23, 29, 35, 37, 65, 81, 86, 88, 156, disagree, 123, 184, 387
187, 192, 223, 229, 234, 246, 280, 299, 317 disappear, 207, 277
developed, 29, 33, 56, 66, 102, 186, 206, 223, 282, 283 disappoint, 153
developer, 38, 39, 45, 55, 72, 73, 78, 80, 123, 138, 150, 201, disarms, 130
264, 293, 301, 311 disaster, 81
developers, 3, 37, 40, 41, 42, 47, 71, 72, 82, 91, 92, 93, 95,
disastrous, 15, 193
97, 98, 130, 161, 164, 189, 197, 206, 207, 227, 231, 238, 258,
disciplines, 157, 182
263, 264, 271, 278, 279, 289, 292, 300, 321, 330, 385
disclose, 35
developing, 3
disconnect, 165, 169
development, 37, 38, 39, 40, 156
disconnected, 47, 165, 253
develops, 20, 67
disconnects, 37, 45, 137, 142, 176, 301
deviation, 15, 132, 304
disconnets, 142
device, 43, 51, 103, 112, 123, 124, 125, 126, 129, 130, 319,
discouraged, 309
326, 341, 342, 368
discover, 21, 26, 27, 56, 61, 70, 101, 129, 148, 155, 164, 179,
devices, 7, 43, 103, 123, 124, 125, 126, 129, 130, 323, 326,
341, 348, 358, 361, 385 194, 195, 203, 221, 315
discovered, 52, 56, 102, 132, 199, 207, 264, 302, 363
df, 51, 225
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

discovering, 266 double-count, 83


discovers, 242 download, 9, 177, 305
discredited, 151 downloaded, 9, 365
disguise, 16 downtime, 325, 363
disk, 21, 44, 74, 119, 120, 129, 157, 175, 179, 180, 183, 186, dozen, 177
187, 188, 189, 190, 195, 204, 205, 208, 210, 213, 216, 218, drill, 88, 234, 236
219, 220, 221, 222, 280, 281, 301, 302, 306, 324, 325, 345, drill-down, 47
346, 348 drilled, 181
disk_reads, 179 drive, 292, 350
disk-based, 186 driven, 263
disks, 323 drives, 302, 337
dispersed, 17, 153, 192 driving, 69
display, 162, 274 drop, 107, 132, 172, 182, 198, 253, 309, 312, 325, 341, 348,
displayed, 57, 62, 114, 330, 334, 350 372, 373, 374, 375, 378, 380, 382
displaying, 62, 105 dropped, 132, 134, 172, 233, 264, 300, 373, 374, 375, 378,
displays, 88, 200, 224, 321 379, 381
disposal, 27, 36 dropping, 105, 325
disprove, 3 drops, 94, 104, 360, 361, 362, 372
disputed, 155 dual, 138, 141, 167, 249, 253, 257, 258, 312
disrupted, 65, 81 dual-core, 337
disrupting, 182 dummy, 253, 255, 312
disruption, 35, 381 dump, 71, 120, 172, 238, 239, 240, 241, 242, 249, 253
distill, 103 dumped, 240, 241, 242
distilled, 70 duration, 14, 30, 56, 115, 126, 136, 142, 143, 182, 245, 289,
distinct, 15, 19, 25, 26, 68, 74, 94, 107, 179, 182, 257, 313, 316, 328, 337, 385
369 durations, 382
distinction, 79, 217 dwarf, 155
distinctly, 73, 247 dwindled, 82
distinguish, 159, 238, 239, 345 dynamic, 32, 80, 81, 117, 158, 246, 298, 348
distinguishes, 353 dynamically, 139, 238, 250, 269, 298
distract, 16
distracted, 41
distracting, 33, 350
E
distraction, 350 e_time, 323
distractions, 11, 358 easy-to-use, 45
distraught, 13 economies, 85
distribute, 17, 201 educational, 67
distributed, 69 elaps, 147, 333
distributing, 231 elapsed, 145, 147, 173, 175, 312, 322, 323, 333, 372, 374,
distribution, 192, 193 375, 376, 378, 381, 384, 385, 386, 387
disturbed, 156 elapsed_time, 323
diverse, 14, 100, 277, 366 elbow, 26, 29, 30, 107, 109, 116, 120, 342, 345, 348, 349, 350,
divide, 75, 76, 133, 137, 142, 328, 336 351, 352, 355, 356, 360, 370, 385, 387
divided, 18, 30, 31, 57, 76, 77, 88, 89, 110, 119, 169, 208, 328, ellipsis, 42
332, 344, 360, 361, 369 elusive, 306
divides, 297 email, 9, 22, 23, 25, 134
dividing, 76, 321, 346, 353, 358, 385 embeds, 142
division, 69, 387 employed, 71, 263
doctor, 25, 26, 52, 65 employee, 71, 138, 224, 233, 250, 263
doctor-level, 26 employees, 13, 279, 295
document, 22, 24, 25, 186 employer, 291
documentation, 23, 42, 207, 276, 301, 314, 384 employs, 194
documented, 150, 207, 216, 279, 301 empties, 170
documenting, 17, 25 emulator, 159
dollars, 156, 324 enable, 53, 158, 162, 165, 168, 171, 172, 183, 238, 260, 286,
dominate, 61 293, 295, 309
dominated, 260 enabled, 94, 168, 170, 172, 174, 183, 230, 262, 263, 266, 284,
dominates, 62 285
dominating, 21 enables, 94, 124, 178, 213, 231, 285, 326
doomed, 348 enabling, 33, 37, 39, 65, 167, 169, 231, 286, 306, 387
dotted, 125, 339, 342, 343, 346, 347, 361, 362, 378 encounter, 13, 130, 132, 134, 203, 221, 234, 289, 324
dotted-line, 125 encountered, 13, 142, 149, 274
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

encountering, 215, 311 exactly, 11, 29, 53, 62, 81, 82, 121, 138, 153, 176, 219, 242,
encounters, 326 246, 248, 342, 381
endpoint, 208 exam, 247
enforcing, 97, 297 examination, 61, 115, 199
engineer, 86 examine, 36, 221, 244, 328
engineers, 156 examining, 165, 249
enlightening, 387 exceed, 116, 197, 200, 215, 298, 299, 300, 331
enq, 72, 235, 236 exceeded, 21, 111, 122, 148, 219, 238, 316, 317, 324, 355,
enqueue, 45, 71, 80, 90, 91, 143, 155, 176, 229, 233, 234, 374
235, 236, 237, 238, 242, 250, 349 exceeding, 118, 122, 123, 326, 332
enqueued, 72 exceeds, 106, 108, 117, 193
enqueues, 71, 72, 74, 80, 233, 234, 246, 250, 352, 386 excel, 11
environment, 11, 67, 158, 162, 188, 203, 265, 272, 301, 309, except, 45, 62, 78, 94, 99, 153, 176, 178, 198, 211, 219, 358
328 exception, 8, 24, 65, 70, 175, 214, 219, 254, 292
equal, 28, 193, 210, 212, 286, 328, 384, 387 exceptional, 54, 100
equality, 96, 328 exceptionally, 72, 86, 98
equally, 124 exceptions, 8, 44
equals, 30, 31, 151, 343 excerpt, 115
equates, 30 excess, 111, 149, 285
equation, 125, 128, 151, 354, 358 exchange, 271
equipment, 93 exclude, 384
equipped, 3, 4 excluding, 215, 340
equipping, 3 exclusion, 91
equivalent, 62, 259 exclusive, 71, 72, 74, 80, 94, 96, 97, 99, 138, 139, 188, 266,
eradicate, 202 267, 301
erased, 183 exclusively, 74, 93, 96, 267, 277, 314
error, 136, 156, 174, 239, 242, 244, 250, 259, 260, 268, 271, exec, 138, 140, 170, 172, 174, 260, 265, 266, 273, 274, 294,
276, 278, 316 320, 330, 334, 351, 372, 374, 375, 380, 382
errors, 19, 129, 134, 143, 260, 261, 272, 274, 275, 276, 277, exec/sec, 265, 266, 374, 380, 382
278, 319 executable, 104, 111, 119, 120, 159, 160, 248
escape, 12 executables, 159
essence, 16, 23, 97, 104, 157, 223, 247 execute, 63, 74, 147, 173, 175, 183, 186, 248, 250, 267, 333
essential, 369, 375, 380 executed, 55, 63, 64, 74, 97, 155, 157, 175, 248, 249, 250,
essentially, 20, 21, 24, 76, 104, 120, 122, 155, 162, 191, 193, 253, 258, 262, 267, 268, 270, 272, 273, 274, 277, 293, 311
212, 246, 265, 302, 371 executes, 250
establish, 20, 23, 24, 33 executing, 57, 136, 175, 194, 248, 262, 377, 378, 379, 380
established, 16, 21, 36, 97, 151 execution, 75, 90, 147, 149, 180, 202, 214, 222, 233, 248,
establishing, 25 249, 250, 251, 254, 259, 265, 266, 267, 268, 272, 277, 322,
establishment, 151 333, 341, 351, 369, 370, 371, 374, 375, 376, 377, 380, 381,
evaluate, 307, 345 382, 383
evaluating, 341, 345 executions, 29, 55, 65, 155, 248, 249, 260, 273, 274, 323,
event, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 56, 61, 63, 65, 331, 332, 339, 344, 349, 351, 375, 376, 377, 380, 381, 382
71, 83, 84, 86, 87, 88, 89, 94, 97, 98, 125, 126, 127, 130, 136, exempt, 215
143, 148, 149, 150, 151, 154, 155, 156, 161, 163, 169, 172, exercise, 16, 23, 168, 350, 357, 359, 372
175, 176, 177, 178, 179, 180, 182, 195, 199, 202, 208, 211, exercises, 26, 364
213, 214, 216, 217, 218, 219, 220, 221, 222, 223, 224, 228, exhausted, 119
229, 231, 232, 233, 234, 235, 236, 238, 259, 262, 267, 268, exhausting, 81
288, 301, 302, 304, 305, 306, 307, 308, 309, 310, 311, 312, exhibit, 342, 346, 348, 371
313, 314, 315, 316, 317, 318, 322, 325, 326, 343, 347, 348, exhibited, 102
349, 352, 356, 358, 365, 366, 384 exist, 44, 68, 75, 117, 123, 157, 160, 174, 176, 186, 195, 202,
event_id, 47, 313 228, 238, 264, 270, 271, 275, 278, 293, 300, 312, 385
event_name, 45, 88, 313 existed, 275
event_type.sql, 56 existence, 186, 385
event-based, 157 existing, 32, 91, 230, 245, 278, 341, 363, 374
event-dependent, 49 exists, 13, 70, 98, 108, 124, 142, 158, 195, 202, 261, 262, 269,
events, 5, 44, 45, 46, 52, 53, 54, 55, 56, 61, 62, 98, 127, 129, 280, 301, 342, 386
130, 155, 175, 214, 218, 219, 220, 223, 227, 230, 231, 232, exit, 37, 253
233, 244, 249, 253, 256, 263, 266, 267, 301, 306, 307, 308, exited, 81
309, 310, 315, 316, 322, 324, 325, 326, 353, 356, 358, 365, exits, 328, 345
366, 384 expanded, 309, 328
evolved, 246 expanding, 289
exact, 36, 103, 124, 182, 204, 248, 257, 258, 263, 264, 342 expend, 35
33646 abhinav sharma 15008 lake union hill way alpharetta, GA 30004
United States 7706088398 abheenav.sharma@gmail.com

expended, 261, 345
expense, 69, 124, 314
expensive, 15, 68, 72, 120, 125, 161, 162, 177, 199, 244, 249, 250, 259, 264
experience, 17, 26, 27, 31, 46, 65, 88, 93, 109, 117, 120, 137, 152, 154, 156, 159, 160, 161, 162, 163, 183, 221, 266, 279, 309, 341, 359
experienced, 28, 81, 107, 108, 230, 268, 271, 278, 295, 372
experiences, 2, 36, 152, 153, 265
experiencing, 12, 22, 46, 82, 129, 130, 132, 201, 266, 274, 278, 302, 309, 312, 313, 314, 319, 328, 329, 338, 339, 347, 350, 355, 367, 372
experiment, 94, 102, 151, 265, 284, 285, 304, 305, 312
experimental, 285
experiments, 151, 300, 301, 306
expert, 14, 86
expertise, 7, 9
experts, 14
explain, 1, 3, 4, 8, 25, 53, 56, 69, 86, 102, 123, 174, 185, 199, 210, 244, 268, 284, 327, 351, 352, 365
explained, 30, 318, 367
explaining, 53, 196
explains, 375
explanation, 39
exploit, 27
exploits, 25
explore, 56, 112
extended, 142
extends, 163
extent, 225, 226, 229, 238, 250, 278, 293
extents, 50, 51, 224, 225, 226
extract, 332
extracted, 252
extracts, 37

F

factor, 16, 40, 386
factors, 40, 163, 176, 228, 322, 325, 338
factory, 338
facts, 215, 350
fail, 19, 97
failed, 16, 68, 81, 147, 333
fails, 80, 94
failure, 97, 280, 302
fairly, 29, 82, 123, 156, 160, 197, 270, 275, 292, 326, 342
fairness, 157, 346
faith, 156
FALSE, 79, 81, 95, 96, 215
falsely, 92
family, 45
famous, 29, 354
fanatical, 156
far, 25, 27, 29, 39, 105, 110, 111, 116, 133, 135, 154, 156, 177, 182, 186, 218, 220, 228, 231, 257, 261, 298, 301, 307, 365, 378, 379, 387
far-left, 116
farm, 93
far-right, 150, 216
fast_get, 79
fast-full, 208
fatal, 157
fault, 13, 120, 183
faults, 120
fchdsk, 174, 175
fetch, 175, 248
field, 207
field-proven, 4
fields, 75, 365
file#, 51, 200, 225
filer, 129
filter, 129, 155, 161, 165, 183, 294
filtered-out, 63
filtering, 62, 229, 259
finality, 387
finally, 1, 4, 100, 104, 154, 184, 195, 209, 234, 239, 242, 268, 281, 316, 354, 364, 387
finance, 17
financial, 1, 2, 105, 182, 313
financially, 291
findme, 253, 254, 255
fine-grained, 91
finger, 21, 44
finger-pointing, 13, 20, 317, 381
finish, 37, 304, 314
finished, 183, 297, 307, 308, 325
finishing, 325
first_rows, 248, 249, 253
fixed, 65, 123, 182, 183
fixed-length, 228, 238
fixing, 15
flag, 240, 241, 242, 293
flags, 241, 242, 255, 256
flavor, 115
flavors, 117, 156
flawed, 25
flaws, 142
flexibility, 56, 91, 175, 177, 246, 276
flexible, 92, 98, 163, 207, 245, 327, 332, 362
flight, 1, 18
floating-point, 5
flood, 299
floor, 63, 156
flow, 13, 65, 80, 153, 292, 299, 309, 328, 338, 360, 362, 381
flowchart, 210
flowed, 206
flowing, 165, 301
flows, 163
flush, 183, 222, 275, 276, 280, 282, 283, 286, 297, 301, 302, 303, 305, 306, 311, 315
flushed, 183, 259, 276, 283, 298, 306
flushes, 275, 281, 298, 306, 308, 315
flushing, 271, 275, 276, 305
focal, 61
focus, 3, 7, 9, 13, 15, 17, 19, 20, 24, 27, 31, 36, 37, 40, 41, 46, 55, 57, 70, 71, 87, 89, 91, 102, 109, 110, 111, 121, 129, 130, 136, 152, 161, 163, 182, 199, 220, 221, 245, 317, 326, 348, 371, 372, 373, 378, 383, 385, 387
focused, 1, 2, 7, 8, 15, 18, 22, 55, 56, 65, 90, 93, 110, 130, 132, 135, 148, 155, 162, 179, 185, 246, 307, 311, 314, 321, 327, 348, 354, 358, 359, 370, 373, 384, 387
focuses, 4, 22, 47, 56, 65, 158, 164, 230, 370, 371
focusing, 15, 19, 22, 24, 40, 45, 46, 86, 110, 127, 151, 156, 163, 178, 180, 181, 194, 220, 221, 228, 321, 324, 326, 349, 369, 370
footprint, 91
forbid, 121
force, 161, 222, 224, 246, 249, 265, 271, 292, 315, 386
forced, 41, 43, 221, 278, 365
forcefully, 24
forces, 128, 176, 192, 245, 275, 284, 303, 311
forcibly, 260
forcing, 150, 219, 246, 261, 264, 324
forecast, 365
forecasting, 7, 15, 27, 28, 31, 365
forecasts, 341
foreshadows, 278
forging, 349
form, 23, 45, 91, 183, 278, 280, 299, 338
formal, 7, 22
formally, 25, 279, 342
format, 38, 41, 51, 62, 141, 146, 171, 191, 198, 225, 235, 281, 320, 345, 365, 383
formats, 169
formatted, 56, 169, 174
formatting, 57, 62, 169
forms, 332, 336
formula, 330, 338, 339, 340, 341, 342, 354, 357, 359, 360, 361, 365, 377, 385, 386
formulas, 7, 31, 327, 334, 335, 339, 343, 354
forward, 11, 12, 13, 53, 81, 142, 175, 177, 183, 229, 231, 293, 345, 387
forward-thinking, 27
foundation, 35, 36, 136, 299, 327
foundational, 56, 343
founded, 358
fraction, 177, 386
fragmentation, 246
frame, 130, 134, 177, 219
framework, 2, 3, 4, 11, 33, 52, 66, 135
free, 9, 41, 45, 46, 48, 49, 56, 72, 75, 78, 83, 87, 88, 89, 90, 99, 111, 117, 120, 121, 122, 126, 150, 155, 177, 187, 188, 189, 190, 195, 199, 200, 201, 202, 204, 205, 208, 210, 211, 213, 214, 216, 217, 219, 220, 221, 222, 228, 229, 230, 237, 238, 241, 243, 250, 270, 277, 278, 281, 305, 308, 309, 310, 355, 365
freedom, 286
freely, 102
frequency, 182, 184, 204, 233, 303
frequent, 176, 306
frequently, 3, 15, 16, 33, 117, 118, 122, 179, 217, 255, 272, 273, 274, 303, 305, 306, 310, 366
fsc, 240, 241, 242
fsl, 240, 241, 242
full-table, 205, 206, 208, 294, 295, 376
function, 7, 79, 80, 81, 83, 88, 96, 97, 155, 159, 160, 167, 191, 192, 193, 195, 196, 251, 257, 258, 259, 272, 278
functional, 14, 187
functionality, 91, 274
functioning, 25, 247
functions, 7, 78, 79, 95, 191, 203, 246, 247, 249, 273
fundamental, 27, 75, 248, 327, 331, 350
fundamentally, 123, 124, 125, 328, 342
fundamentals, 70, 327
future, 15, 17, 212, 332, 382

G

game, 109
gaming, 261, 271, 275
gap, 47, 327
gazillion, 170
general, 4, 20, 67, 70, 78, 97, 100, 105, 120, 124, 125, 127, 185, 192, 205, 207, 298, 305, 306, 310, 314, 340, 348, 352, 354, 356, 361, 365
generally, 13, 207, 375
generate, 177, 209, 259, 280, 293, 301, 324, 340
generated, 39, 222, 255, 257, 259, 260, 280, 292, 293, 294, 295, 300, 302, 321, 328, 332
generates, 199, 231, 280
generating, 36, 52, 219, 257, 289, 293, 295, 300, 301, 307, 314, 319
generation, 258, 292, 294, 297, 300, 308, 315, 321, 322, 324
generator, 257
gentle, 156
gentler, 118, 265, 365
gently, 118
geographically, 17
geometric, 386
get_latch, 79
getrusage, 129, 142, 143, 176, 318, 319
gettimeofday, 42, 43, 44, 82, 129, 142, 143, 176, 184, 195, 217, 218, 307, 316, 317, 319, 345
ggt, 377, 379
glass, 331, 336
global, 300, 301, 321, 324
glossed, 293
government, 17, 104
governmental, 93, 203
governments, 17
grade, 31
gradual, 108
granular, 39, 40, 142
granularity, 39, 92, 144, 150, 183
granularly, 92
grapes, 17
graph, 27, 29, 53, 84, 85, 107, 125, 132, 133, 193, 339, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 358, 359, 360, 363, 370, 371, 372, 375
graphic, 362
graphical, 124, 381
graphically, 2, 5, 31, 105, 107, 124, 346, 347, 350, 360, 361, 363, 372, 375, 381
graphics, 15, 31
graphing, 15
graphs, 5, 16, 132, 347, 359, 369, 370
grep, 42, 44, 82, 102, 112, 114, 132, 143, 174, 216, 217, 218, 307, 316, 319
group, 14, 17, 20, 21, 38, 40, 53, 62, 65, 77, 110, 129, 154, 156, 157, 164, 165, 167, 168, 169, 181, 186, 189, 200, 204, 227, 265, 272, 298, 299, 302, 303, 305, 306, 308, 314, 315, 324, 325, 326, 348, 364, 384
grouping, 118
groups, 17, 20, 164, 292, 325, 326
growth, 237, 301, 378, 381, 382
guarantee, 82, 244
guaranteed, 41, 137, 142, 244, 261
guess, 2, 53, 65, 80, 102, 136, 137, 184, 247, 332
guessed, 9, 26, 43, 142, 281
guesses, 70, 246
guessing, 3, 66, 86, 214, 299, 369
guesswork, 68
guidance, 120
guide, 13, 18, 23, 201, 352
guideline, 127
guides, 24
guru, 7
gurus, 36
gut, 67, 369

H

halt, 157, 171, 325
handle, 28, 123, 162, 186, 197, 207, 230, 248, 249, 251, 253, 254, 255, 256, 261, 300, 305, 367
handles, 251, 254, 261, 263
hard-coded, 97
hardware, 9, 161, 184, 324
harmful, 81
hash, 191, 192, 193, 194, 195, 196, 198, 199, 203, 208, 213, 220, 234, 245, 248, 249, 251, 252, 255, 257, 258, 259, 261, 263, 265, 268, 273
hash_value, 234, 257
hashed, 191, 192, 193, 194, 200, 251, 257, 258, 261
hashes, 193, 195, 199
hashing, 76, 77, 191, 192, 193, 194, 195, 197, 199, 200, 201, 245, 247, 250, 251, 257, 258, 259, 263, 265, 268
hashing-based, 268
hash-like, 250
header, 71, 187, 189, 190, 191, 195, 196, 197, 199, 200, 204, 205, 207, 208, 209, 210, 211, 212, 213, 215, 218, 221, 223, 225, 226, 227, 228, 229, 230, 231, 238, 240, 241, 242, 293
header_block, 227
header_file, 227
headers, 189, 190, 194, 196, 200, 203, 204, 205, 206, 207, 208, 210, 211, 212, 215, 219, 220, 221, 246, 251, 279
health, 116
heap, 268, 270, 300
heaps, 250, 276
hexadecimal, 49
hidden, 67, 71, 80, 86, 94, 151, 177, 200, 203, 204, 206, 208, 214, 277, 285, 298, 310, 315
high-concurrency, 91, 94, 98, 230, 231, 309, 310
high-level, 27, 45, 56, 57, 62, 350
high-performance, 193, 213
high-performing, 207, 233
hindered, 51
histogram, 149, 150, 192, 318
historical, 15, 112
historically, 15, 154
history, 5, 20, 35, 36, 98, 136, 142, 176, 179, 182, 183, 279, 312, 313, 330, 384
hit, 25, 29, 32, 85, 100, 184, 195, 222, 248, 275, 372
hits, 173
hold, 28, 75, 80, 120, 157, 201, 208, 214, 231, 267, 271, 297, 300, 331, 336
holder, 96, 97
holders, 74, 75, 281
holding, 70, 96, 197, 205, 209, 214, 262, 267, 270, 271, 276, 281, 301
holds, 267
holistic, 2, 4, 18
hook, 102
hooked, 98
horizontal, 28, 29, 68, 84, 209, 346
host, 57, 102, 167, 262, 265
hostile, 118
hot, 21, 81, 123, 124, 207, 208, 210, 211, 212, 214, 215, 224, 288
hotels, 2
hour, 14, 32, 132, 133, 183, 184, 205, 312, 332, 336, 337, 346
hours, 14, 17, 37, 55, 65, 132, 156, 207, 212, 220, 265
human, 26, 69, 169
humans, 29
hybrid, 97
hypotheses, 39
hypothetical, 344

I

idea, 15, 42, 51, 77, 78, 105, 120, 128, 129, 157, 182, 184, 207, 233, 272, 295
ideas, 24, 299
identical, 254, 258
identifiable, 22, 64
identification, 5, 63, 65, 164, 165, 169, 171, 176, 312
identified, 57, 62, 63, 164, 167, 211, 227, 234, 260, 379
identifier, 47, 49, 62, 63, 65, 96, 97, 155, 164, 165, 167, 168, 169, 170, 171, 172, 173, 174, 176, 177, 179, 180, 181, 234, 236
identifies, 63
identify, 22, 24, 57, 62, 65, 100, 157, 162, 164, 165, 167, 171, 180, 272, 273, 321, 322, 323, 349
identifying, 11, 14, 22, 39, 65, 102, 179, 222, 288, 312, 322, 350
idl, 111, 114
idle, 68, 69, 103, 104, 109, 111, 112, 116, 117, 154, 155, 161, 162, 217, 368, 384
idly, 155
ifconfig, 133, 134
illegal, 93, 150, 176
illustrate, 142, 196, 275
illustrated, 32, 75, 189, 237
illustrates, 18, 53, 96, 372
illustration, 37, 80, 81, 151
image, 41, 81, 242
images, 302
imbalance, 124
immediate, 16, 20, 21, 61, 79, 80, 94, 124, 125, 249, 253, 264, 275, 288, 302, 303, 304, 305, 311, 313
immediate_gets, 288
impact, 4, 5, 9, 14, 15, 16, 24, 37, 46, 54, 57, 70, 82, 83, 88, 89, 90, 91, 101, 151, 152, 153, 154, 158, 163, 177, 179, 180, 181, 184, 200, 213, 220, 222, 233, 246, 251, 252, 288, 326, 327, 328, 339, 343, 347, 348, 350, 351, 354, 355, 359, 364, 365, 366, 367, 369, 370, 372, 377, 378, 379, 380, 381, 382, 387
impacted, 121, 233, 382
impacting, 83, 110, 118
impacts, 14, 21, 46, 149, 280, 346, 351, 377
impatient, 43, 168
impatiently, 43
implement, 3, 16, 75, 76, 150, 214, 228, 233, 264, 280, 292, 324, 348, 372, 387
implementation, 71, 77, 92, 130, 203, 210, 213, 293, 316, 381
implementations, 213, 293
implemented, 20, 39, 67, 91, 171, 176, 207, 251, 361, 363, 364, 372, 381
implementing, 67, 202, 251, 361, 364
implements, 242
implication, 259
implications, 84, 103, 231, 280, 293
implied, 21
implies, 31, 62, 72, 75, 86, 130, 148, 162, 197, 201, 233, 339
imply, 70, 86, 105, 171, 241, 275, 286
implying, 21, 30, 48, 74, 127
import, 69, 159
imprecise, 364, 365
improper, 127
improperly, 246
improve, 3, 16, 18, 19, 54, 70, 74, 86, 93, 94, 128, 151, 154, 156, 169, 185, 186, 199, 245, 247, 284, 295, 364, 367, 368, 372
improved, 17, 37, 94, 252, 361, 362, 363, 369, 375, 376, 377, 379
improvement, 2, 16, 19, 21, 36, 71, 153, 154, 156, 219, 278, 280, 285, 289, 304, 305, 316, 327, 350, 359, 364, 367, 372, 373, 376, 377, 381, 385
improvements, 362, 378, 384
improves, 16, 303, 315, 360, 361
improving, 3, 4, 16, 67, 187, 197, 258, 279, 315, 360, 370, 383
imu, 286
inactive, 210, 231, 238, 241, 242, 293, 306, 307
inbound, 134, 147
inception, 36
inclusion, 246
incoming, 133, 313
incomplete, 25, 308, 325, 326
inconsistencies, 19
incorporating, 301
incorrect, 16, 27, 37, 149, 151, 190, 301, 340, 364
incorrectly, 300, 356
increase, 12, 15, 20, 22, 28, 29, 30, 31, 45, 47, 48, 61, 68, 75, 85, 86, 107, 109, 124, 125, 148, 149, 151, 152, 153, 163, 177, 182, 188, 193, 195, 197, 198, 199, 203, 206, 208, 211, 215, 219, 222, 228, 233, 248, 256, 261, 262, 275, 276, 277, 288, 289, 304, 306, 308, 309, 310, 312, 326, 340, 342, 345, 348, 352, 356, 360, 361, 362, 367, 368, 369, 372, 373, 374, 375, 376, 378, 380, 381, 385, 386, 387
increased, 21, 28, 47, 81, 86, 92, 107, 144, 148, 151, 161, 162, 186, 193, 199, 201, 203, 206, 209, 214, 215, 219, 222, 230, 233, 259, 261, 265, 266, 278, 280, 286, 289, 292, 301, 309, 312, 346, 349, 360, 361, 362, 363, 365, 372, 374, 375, 381, 382, 383, 385, 386
increases, 19, 28, 53, 69, 83, 85, 86, 220, 222, 232, 248, 268, 270, 278, 288, 309, 340, 342, 345, 348, 350, 361, 372, 386, 387
increasing, 16, 19, 61, 65, 69, 78, 85, 86, 104, 105, 130, 141, 162, 182, 193, 195, 203, 211, 214, 219, 220, 222, 228, 229, 231, 232, 238, 251, 261, 265, 275, 276, 278, 287, 289, 308, 309, 316, 326, 342, 361, 362, 365, 367, 368, 369, 371, 372, 373, 374, 380
increasingly, 20, 21, 39, 83, 85, 91, 152, 158, 162, 183, 186, 194, 262
incredible, 94, 228
incredibly, 41, 97, 203, 245, 268, 277, 307
increment, 150, 207, 209
incremental, 84
incrementation, 207
incremented, 84, 93, 97, 143, 209, 238
incrementing, 45, 212
increments, 267
indented, 73
independent, 227, 313, 340
independently, 167
in-depth, 26, 386
index, 147, 206, 224, 231, 232, 233, 279, 293, 376, 377, 378, 379, 380, 382
indexed, 183, 232
indexes, 71, 206, 231, 233, 300
indexing, 202, 214, 376
indicate, 55, 61, 89, 241, 285, 301
indicated, 41, 180, 181, 184, 187, 305, 375
indicates, 31, 88, 94, 148, 150, 155, 178, 180, 205, 218, 302, 309, 311, 323, 336, 342, 368, 382
indicating, 97, 223, 242, 296, 317
indication, 120, 183, 302
indicator, 88, 338
indirect, 72
indirectly, 286
individual, 46, 47, 61, 157, 175, 251, 313
induce, 286
induced, 217, 229
inducing, 315
industry, 36, 156
industry-accepted, 157
inefficient, 136, 306
inet, 134
inet6, 134
infer, 179, 182, 358
inferences, 162, 179
inferring, 15, 65
infinite, 106, 108
infinity, 212
influence, 26, 41, 67, 70, 182, 246, 259, 269, 286, 289, 309, 328, 376
influencing, 31, 370, 371
influx, 208
information, 2, 3, 4, 7, 8, 9, 12, 13, 16, 17, 20, 21, 25, 31, 35, 37, 41, 43, 44, 45, 46, 47, 49, 51, 53, 54, 56, 57, 62, 64, 65, 70, 74, 86, 90, 112, 113, 116, 117, 118, 119, 120, 133, 136, 138, 142, 149, 150, 151, 156, 164, 165, 167, 168, 169, 175, 176, 179, 182, 183, 187, 189, 192, 199, 201, 223, 225, 226, 227, 228, 229, 230, 231, 238, 239, 240, 243, 244, 247, 248, 250, 255, 270, 272, 280, 281, 293, 301, 312, 317, 318, 321, 322, 327, 331, 332, 333, 334, 335, 337, 340, 348, 351, 364, 365, 366, 368, 369, 370, 374, 375, 379, 380, 384
informational, 354, 369
informative, 122, 317
infrastructure, 27, 132, 156
infrequent, 306
infuse, 4
infused, 246
infusion, 4
ini_trans, 237
initial, 11, 32, 43, 44, 46, 69, 85, 87, 99, 116, 126, 130, 137, 148, 150, 151, 177, 181, 202, 237, 245, 260, 262, 263, 275, 279, 294, 311, 317, 318, 320, 321, 328, 329, 333, 346, 347, 348, 375, 377, 381, 382
initialization, 91, 305
initialized, 47
initially, 15, 129, 149, 157, 177, 212, 213, 248, 277, 295, 308, 316, 350, 363, 375
initiating, 167
initrans, 237, 238
in-memory, 119, 185, 186, 187, 188, 189, 244, 248, 278, 279, 280, 281, 307, 372
insert, 139, 176, 208, 213, 219, 229, 231, 232, 233, 301, 304, 306, 323
inserted, 37, 42, 176, 182, 205, 206, 208, 209, 211, 229, 233, 300, 301
inserting, 221, 229, 232, 238
insertion, 75, 207, 208, 233
inserts, 229, 304, 306, 322
inside, 21, 150, 195
insight, 25, 68, 205
insights, 53, 85, 102, 103
insignificant, 61, 152, 258, 284, 285, 325, 345, 367, 378
inspection, 52
install, 102, 111
installation, 56
installed, 15, 111, 112
installing, 119
instance, 16, 18, 19, 22, 31, 32, 33, 41, 43, 45, 47, 49, 57, 67, 71, 78, 80, 86, 94, 97, 99, 105, 119, 127, 136, 137, 138, 141, 142, 143, 144, 145, 146, 147, 152, 164, 167, 168, 177, 182, 183, 187, 197, 198, 199, 200, 203, 204, 206, 207, 208, 211, 213, 214, 215, 216, 217, 219, 222, 246, 259, 260, 261, 262, 264, 265, 266, 269, 271, 273, 274, 275, 277, 278, 280, 284, 285, 286, 287, 289, 297, 298, 301, 302, 303, 305, 307, 308, 309, 310, 314, 315, 317, 319, 321, 325, 332, 333, 334, 335, 338, 339, 340, 341, 344, 347, 348, 352, 353, 356, 357, 358, 369
instance-focused, 56, 148
instance-level, 35, 46, 55, 56, 63, 126, 151, 157, 177, 180, 181
instances, 249, 340
instance-wide, 56
instantly, 315
instinctively, 69
instruction, 97, 266
instructs, 303
instrument, 37, 41
instrumentation, 4, 25, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 52, 55, 126, 135, 142, 144, 163, 176, 182, 316
instrumentation-based, 157
instrumented, 25, 36, 37, 39, 41, 42, 43, 45, 65, 164, 184, 343
instrumenting, 25, 37, 279
instruments, 25, 42, 126
integer, 191, 192
integrate, 91
integrated, 2, 33
integrates, 5
integration, 2, 33
integrity, 23, 305
intense, 12, 13, 14, 20, 22, 26, 81, 83, 84, 136, 152, 195, 200, 214, 219, 230, 233, 242, 252, 259, 261, 263, 264, 266, 271, 277, 299, 309, 346, 347, 350
intensely, 263
intensifying, 201
intensity, 12, 132, 202
intensive, 263, 348, 372
intent, 14, 80, 113
intention, 177, 187, 258
intently, 80
interaction, 22, 62, 175
interactive, 15, 32, 111, 151, 152
interactively, 88, 89, 112, 303
interacts, 288
intercept, 281
interconnected, 92, 205, 246, 345
interest, 16, 88, 163, 164, 165, 172, 173, 182, 195, 199, 249, 302, 345, 359
interested, 29, 35, 53, 70, 157, 169, 188, 201, 208, 227, 236, 237, 281, 293, 321, 341, 345
interesting, 27, 36, 52, 63, 69, 80, 82, 94, 136, 150, 169, 184, 185, 195, 197, 220, 221, 235, 242, 244, 257, 262, 268, 285, 287, 292, 354, 371
interface, 26, 27, 35, 36, 41, 42, 43, 45, 46, 53, 67, 70, 78, 86, 88, 89, 98, 125, 126, 129, 130, 142, 151, 159, 176, 177, 182, 184, 195, 218, 246, 247, 250, 279, 302, 307, 315, 345
interim, 300, 322
internal, 8, 25, 36, 55, 66, 78, 98, 100, 207, 228, 229, 244, 285, 314, 318
internally, 157, 232
internals, 1, 2, 3, 4, 8, 11, 22, 26, 31, 33, 35, 67, 70, 88, 98, 100, 135, 150, 177, 184, 185, 234, 244, 268, 292, 293, 324, 326, 327, 380, 381, 387
interrelationships, 20
interrupt, 16, 42, 44, 82, 128, 129, 143, 217, 218, 316, 319
interrupted, 80
intersects, 355
interval, 29, 30, 31, 32, 45, 46, 55, 56, 57, 61, 62, 111, 112, 114, 116, 121, 126, 127, 128, 137, 138, 141, 145, 146, 147, 148, 150, 151, 152, 172, 173, 174, 177, 233, 262, 263, 306, 314, 317, 318, 319, 320, 321, 328, 329, 330, 332, 333, 334, 335, 336, 337, 338, 340, 341, 344, 347, 352, 353, 355, 356, 357, 358, 365, 367, 368, 369, 370, 371, 373, 375, 376, 377, 378, 379, 382, 383
interval-based, 177
intervals, 307, 330, 383
interworkings, 8, 150
intimately, 246, 326
introduce, 27, 35, 54, 56, 154, 354
introduced, 4, 21, 33, 74, 91, 102, 136, 160, 162, 163, 207, 236, 239, 242, 245, 248, 268, 275, 284, 296, 302, 306, 318
introduces, 5, 11, 23, 29
introducing, 135, 309
introduction, 36, 142, 176, 231, 278, 279
intrusive, 37
invading, 200
invalidate, 249, 252, 255
invalidated, 248, 252
invalidation, 246
invalidations, 249
invented, 157
invention, 74, 91, 150
inventory, 65
investigate, 24, 102, 165, 219
investigated, 381
investigation, 62, 302, 367
investigative, 15
investing, 14
investment, 362, 363
invoke, 118, 268, 269, 274
invoked, 174, 289
involved, 2, 8, 11, 12, 13, 14, 16, 20, 21, 36, 55, 57, 61, 69, 72, 74, 86, 94, 102, 104, 126, 130, 154, 155, 162, 165, 169, 176, 183, 192, 197, 201, 209, 219, 233, 234, 235, 236, 237, 240, 241, 242, 246, 249, 252, 283, 289, 310, 316, 325, 326, 328, 367, 371, 375, 380, 384
io, 305, 306, 314, 315, 324
iops, 129, 320
ior, 25, 27, 56, 303
ioread, 39, 40
iostat, 9, 125, 129, 130, 314, 317
iosumx.sql, 321
iow, 56
iowrite, 40
ipcbc.sql, 198
irrelevant, 94, 347, 351
isolate, 162, 247
isolated, 26, 162
isolation, 18, 157
issue, 18, 31, 41, 42, 46, 54, 61, 62, 63, 67, 68, 69, 70, 71, 75, 88, 89, 93, 102, 103, 107, 109, 130, 132, 133, 155, 157, 162, 195, 201, 202, 203, 205, 219, 220, 228, 229, 231, 233, 261, 262, 266, 268, 279, 280, 292, 302, 306, 307, 309, 311, 312, 313, 314, 315, 316, 324, 348, 349, 352, 366, 367, 372
issued, 43, 71, 82, 127, 149, 184, 217, 241, 263, 292, 301, 302, 303, 304, 306, 307, 319
issues, 4, 7, 8, 21, 29, 36, 40, 41, 46, 52, 61, 64, 68, 70, 71, 75, 82, 86, 97, 98, 111, 122, 123, 129, 130, 132, 133, 142, 156, 160, 161, 162, 187, 195, 197, 201, 202, 203, 218, 219, 222, 228, 250, 256, 259, 261, 279, 280, 285, 299, 302, 303, 306, 307, 311, 312, 325, 326, 345, 348, 368, 369, 381, 385, 386
issuing, 42, 43, 44, 102, 120, 129, 132, 188, 189, 195, 201, 208, 228, 238, 239, 242, 274, 278, 292, 302, 305, 306, 307, 311, 314, 317
iteratively, 24

J

jellyfish-like, 230

K

kernel, 4, 7, 8, 25, 36, 37, 41, 42, 43, 45, 46, 70, 71, 72, 73, 74, 75, 76, 78, 82, 90, 91, 92, 93, 97, 98, 104, 117, 123, 150, 163, 176, 177, 182, 185, 186, 189, 197, 210, 214, 231, 250, 270, 278, 285, 288, 289, 292, 316
kernel-centric, 43
kernel-level, 5, 36, 41, 177
kernel-level-embedded, 182
key, 5, 7, 8, 13, 14, 15, 16, 17, 20, 25, 26, 31, 47, 56, 57, 62, 70, 74, 82, 83, 88, 90, 97, 103, 110, 116, 122, 123, 129, 134, 142, 149, 153, 155, 156, 163, 164, 165, 169, 190, 194, 200, 206, 207, 212, 222, 223, 232, 233, 244, 247, 251, 252, 253, 254, 260, 268, 278, 281, 282, 283, 284, 293, 307, 330, 336, 359, 364, 367, 380, 381
keys, 11
keyword, 232
kgx, 266
kill, 143, 319
knee, 29
knees, 67, 132
know-how, 74
knowing, 5, 8, 26, 61, 69, 86, 117, 120, 279, 341, 344, 345, 372
knowledge, 1, 4, 8, 11, 26, 33, 36, 42, 52, 66, 134, 184, 234, 251, 327

L

label, 124, 336
labeled, 204, 209, 330
labeling, 321
lambda, 328
language, 26, 167
laptop, 2, 8
latch, 4, 22, 30, 31, 41, 45, 46, 48, 49, 52, 61, 64, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 96, 97, 98, 99, 100, 126, 135, 148, 151, 152, 176, 178, 180, 181, 194, 195, 196, 197, 198, 199, 201, 202, 203, 204, 205, 207, 208, 209, 210, 211, 213, 214, 215, 217, 220, 221, 222, 242, 243, 246, 251, 256, 259, 260, 262, 264, 265, 266, 268, 270, 271, 274, 275, 276, 278, 281, 286, 287, 288, 289, 296, 297, 298, 299, 308, 309, 310, 311, 328, 329, 349, 352
latch%free, 87
latch.sql, 88, 89
latch/mutex, 4
latches, 4, 22, 31, 67, 71, 72, 73, 74, 75, 76, 77, 78, 80, 86, 89, 90, 91, 92, 93, 94, 97, 98, 185, 189, 194, 195, 196, 197, 198, 199, 203, 204, 213, 214, 244, 245, 246, 251, 252, 255, 261, 263, 265, 266, 268, 271, 275, 276, 279, 281, 283, 286, 287, 288, 289, 296, 297, 298, 299, 308, 309, 310, 314
latching, 4, 31, 52, 54, 67, 68, 69, 70, 74, 75, 78, 83, 85, 86, 88, 92, 94, 100, 141, 155, 189, 194, 199, 201, 202, 205, 214, 233, 235, 246, 249, 250, 259, 275, 280, 281, 342, 386
latch-related, 61, 83
latch-requesting, 86
latch-specific, 86, 89
latch-suffering, 88
latency, 132, 133, 345
launch, 80, 105
launched, 105
launching, 105
layers, 162
leadership, 162
leaf, 206, 232, 233
leap, 142, 175
least, 14, 18, 19, 27, 44, 74, 75, 76, 81, 125, 127, 142, 162, 167, 175, 189, 190, 192, 196, 198, 199, 200, 205, 212, 252, 260, 261, 271, 281, 294, 297
leftover, 155, 387
legal, 176, 313
legally, 37, 176
legitimate, 200
length, 7, 110, 117, 194, 195, 196, 197, 198, 199, 219, 221, 241
lengths, 72, 163, 194
lens, 42, 44
level, 20, 22, 24, 30, 31, 39, 41, 42, 45, 47, 48, 52, 53, 54, 55, 64, 91, 94, 97, 102, 110, 112, 142, 152, 157, 158, 162, 175, 183, 190, 218, 242, 249, 251, 252, 253, 256, 261, 264, 281, 302, 305, 319, 321, 328, 343, 350, 353, 358, 360
levels, 20, 21, 30, 53, 54, 55, 110, 132, 137, 157, 170, 319
lgwr, 307, 316, 319
liberally, 197
liberated, 276
liberties, 30, 33
liberty, 8
library, 26, 48, 61, 64, 74, 86, 89, 90, 91, 92, 93, 94, 98, 99, 148, 175, 233, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 259, 260, 261, 262, 263, 264, 265, 266, 268, 277, 311
library_cache, 249, 253, 256
library-related, 98
license, 37, 163, 175, 176, 184, 312
licenses, 161, 184
limit, 27, 53, 157, 200, 231, 269, 299, 327, 348, 387
limitation, 65, 92, 176, 280, 385
limitations, 15, 142, 151, 206, 367
limited, 8, 14, 15, 17, 51, 65, 68, 77, 83, 93, 136, 137, 150, 161, 195, 201, 214, 223, 238, 239, 259, 275, 300, 340, 344, 346, 385, 387
limiting, 44, 197, 238, 342
limits, 210, 300
linear, 339, 340, 341
linearity, 339
link, 21, 51, 75, 238, 243, 244, 281, 283, 343, 349
linked, 75, 76, 159, 190, 204, 268
linking, 51, 177
links, 19, 24, 75, 204, 230, 384
lio/ms, 375, 376, 380
list, 7, 13, 24, 27, 30, 51, 52, 75, 76, 78, 91, 167, 169, 179, 189, 190, 191, 199, 202, 203, 205, 207, 210, 215, 216, 218, 219, 221, 222, 227, 230, 234, 268, 271, 272, 274, 275, 278, 281, 288, 293
listed, 24, 47, 88, 266, 285, 302, 365
listen, 4, 26, 28, 35, 65
listened, 65
listening, 27, 65
listing, 358
lists, 45, 49, 61, 75, 76, 127, 185, 190, 203, 215, 230, 236, 237, 279
literal, 263, 265
literally, 9, 246, 268
literals, 263
lithium, 302
load, 29, 61, 81, 84, 85, 117, 129, 147, 163, 184, 217, 232, 248, 262, 263, 265, 266, 271, 272, 284, 329, 330, 331, 332, 333, 346, 347, 360, 362, 379, 382
load-balancing, 123, 153
loaded, 184, 260, 271, 272, 277, 278, 341
loads, 56, 256, 260, 273
loadsx, 260, 273
local, 133, 216
locally, 230
locatable, 200, 248, 250
locate, 75, 77, 247, 248, 250, 255, 259, 305, 312, 322
located, 9, 21, 89, 156, 182, 194, 230, 238, 248, 249, 251, 280, 281
locates, 242
locating, 62, 323
location, 16, 90, 163, 229, 243
lock, 36, 39, 71, 74, 89, 91, 138, 140, 143, 172, 182, 189, 218, 224, 233, 234, 235, 236, 237, 241, 250, 293, 294, 301, 312, 320, 330, 334, 352, 370
lock/blocking, 352
locked, 72, 223, 233, 234, 235, 236, 237, 240, 241, 246
locking, 18, 71, 72, 102, 119, 188, 189, 219, 229, 233, 235, 237, 242, 246, 249, 250, 270, 372
locking/blocking, 219
lockreq, 39, 40
locks, 71, 72, 185, 224, 234, 246, 250, 263
log, 15, 46, 48, 61, 87, 99, 125, 126, 127, 130, 148, 152, 186, 187, 202, 220, 262, 263, 280, 281, 282, 283, 284, 292, 293, 295, 296, 297, 298, 299, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 335, 384
log_buffer, 308
logged, 264, 284
logical, 21, 22, 31, 52, 84, 173, 180, 181, 198, 202, 203, 222, 261, 328, 331, 332, 339, 342, 344, 346, 349, 350, 353, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383
logon, 165, 167, 169, 171
logs, 61, 325, 326
long-running, 37, 176
loop, 55, 79, 80, 81, 94, 99, 139, 262, 292, 304, 311, 312
looper, 139
loops, 94
low, 13, 15, 91, 109, 124, 136, 210, 212, 233, 298, 357, 387
low-activity, 126
lower, 45, 91, 97, 109, 182, 213, 314
lowercase, 121
lower-level, 45
lowest, 227
low-hanging, 16
low-impact, 177
low-processing, 162
lru, 46, 48, 64, 89, 126, 148, 204, 208, 213, 214, 311
luck, 152, 169
lucky, 80, 152, 201, 332, 361, 382

M

machine, 111, 158, 160, 171
machines, 160
main, 11, 41, 61, 72, 79, 80, 95, 187, 214, 237, 279, 284, 298, 299, 314, 325, 365
maintain, 116, 194, 232, 242, 315, 342, 361, 362
maintained, 72, 231, 232, 247, 249, 252, 259, 380
maintaining, 4, 176, 277, 278
maintains, 234
maintenance, 161, 163, 230
majority, 21, 39, 155, 202, 214, 307, 316, 348
malloc, 120
man, 14, 80, 82, 91
manage, 72, 110, 119, 137, 148, 186, 230, 245, 275, 276, 278, 279
managed, 230, 268, 278, 360, 362
management, 3, 12, 14, 20, 21, 22, 23, 25, 31, 53, 56, 61, 65, 71, 72, 104, 107, 110, 111, 123, 147, 162, 183, 186, 189, 203, 214, 219, 230, 231, 244, 245, 246, 247, 248, 250, 261, 268, 271, 275, 276, 277, 279, 280, 281, 283, 285, 286, 291, 292, 293, 301, 332, 333, 362, 367, 373
management-related, 268
manager, 14, 17, 24, 26, 37, 38, 39, 40, 63, 65, 102, 162, 174, 253, 270, 279, 291, 301
managers, 4, 13, 17, 53
manages, 183, 236, 239, 291
managing, 71, 72, 104, 110, 132, 186, 268, 387
manipulate, 209, 212
manipulated, 190
manipulating, 14
manipulation, 190, 314
manual, 43, 82, 88, 91, 112, 122, 128, 129, 130, 133, 156, 230, 248
manually, 61, 89, 156, 169, 269, 359, 384
map, 23, 230, 231
mapped, 259
mapping, 281
maps, 192
mark, 71, 72
marker, 296, 297, 308
markers, 296, 298
market, 8
marketing, 160
marks, 209
master, 9, 75
master/slave, 75
math, 55, 117, 142, 247, 328, 331, 337, 352, 353, 358, 359, 365, 377
mathematical, 132, 191, 327, 339, 343, 347
mathematically, 30, 136, 332, 385
mathematics, 327, 345, 346, 348, 358, 371, 381
maximization, 68
maximize, 109, 134, 233, 324, 372
maximizing, 69
maximum, 97, 141, 161, 183, 186, 201, 238, 286, 298, 359
mean, 19, 31, 65, 70, 75, 120, 150, 151, 222, 240, 242, 288, 315, 325, 347, 352, 385
measurable, 136
measure, 110
measured, 110, 197, 304
measuring, 37
mechanism, 61, 261, 296
mechanisms, 189
media, 280, 302
memory, 4, 5, 18, 21, 22, 24, 31, 43, 61, 67, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 80, 81, 91, 92, 93, 96, 97, 102, 104, 111, 112, 118, 119, 120, 121, 122, 142, 148, 158, 161, 163, 183, 186, 191, 194, 195, 196, 197, 203, 208, 209, 214, 220, 221, 222, 244, 245, 246, 247, 248, 250, 251, 255, 256, 259, 260, 261, 265, 268, 270, 271, 275, 276, 277, 278, 279, 281, 282, 283, 284, 285, 286, 288, 289, 292, 295, 297, 300, 301, 309, 314, 349, 367, 372, 375
memory-related, 122
merge, 386
merged, 386
message, 23, 48, 63, 65, 119, 154, 155, 175, 271, 276, 302, 363, 384
messages, 25, 363
messaging, 363
metadata, 190, 237, 239
metastructure, 189
method, 1, 2, 3, 11, 12, 18, 19, 20, 21, 22, 25, 27, 36, 62, 102, 157, 184, 219, 223, 263, 291, 312, 383, 387
methodical, 3, 4, 35, 52
methodically, 11, 13, 16, 25, 251
methodologies, 100
methodology, 7, 8, 100, 135, 151, 182, 343
methods, 2, 3, 4, 11, 16, 25, 33, 167, 184, 263, 386
metric, 321, 329, 331, 338, 340, 344, 347, 351, 369, 370, 371, 375, 381, 384
metrics, 57, 176, 330, 331, 332, 335, 340, 344, 351, 367, 369, 381, 382
middle, 49, 50, 96, 176, 208, 267, 300
midpoint, 207, 208, 209, 210, 211, 212
migrate, 205, 208
migrates, 211
million, 77, 156, 376
millions, 133, 196, 324
millisecond, 28, 41, 43, 72, 148, 150, 346, 369
milliseconds, 21, 84, 128, 130, 149, 340, 341, 342, 351, 353, 358, 369, 376, 377
min, 6, 178, 179, 180, 181, 260, 273, 274, 277, 328, 337
minimal, 242, 275, 280, 342, 351, 357, 360
minimize, 68, 194, 201, 281, 292, 387
minimized, 37, 61, 156, 159, 193
minimizing, 35
minimum, 12, 81, 187, 216, 228, 276, 305, 327
minus, 31, 116
minutia, 41
mirrored, 187, 326
mirroring, 302
mirrors, 63, 187
misconfiguration, 20, 44
misconfigured, 130
misdiagnose, 26, 151
misdiagnosis, 19, 27, 55, 137
misguided, 151
mislabeled, 130
misleading, 32, 53, 162, 310, 348, 350, 364
misled, 40, 41, 156
mismatch, 147, 333
misrepresent, 136
misrepresentation, 136
misrepresented, 151
misrepresenting, 350
miss, 8, 15, 149
missed, 209, 362
misses, 90, 288, 349
missing, 2, 19, 41, 135, 156, 175, 335, 343
mistake, 18, 19, 20, 174
mistakes, 9, 179
misunderstand, 46
misunderstood, 154
misused, 27
mix, 61, 82, 110, 124, 228, 229, 304, 342, 348, 362, 383
mixed, 2
mod, 38, 168, 191
mode, 74, 80, 93, 94, 95, 96, 97, 104, 111, 112, 175, 188, 195, 199, 200, 248, 249, 253, 266, 267, 292, 324, 325, 333, 346, 353
model, 29, 55, 56, 72, 84, 85, 96, 103, 105, 123, 136, 142, 143, 144, 145, 146, 147, 160, 168, 176, 182, 183, 187, 199, 237, 239, 244, 251, 252, 253, 279, 301, 318, 333, 334, 344, 347, 348, 349, 356, 357, 365, 386
models, 107
modern, 5, 122, 156, 158, 163, 177, 219
modern-day, 133, 162
modes, 187
modified, 158, 199, 201, 206, 207, 208, 255, 256, 261, 279, 303
modify, 56, 199
modifying, 294, 304
module, 12, 13, 37, 38, 39, 40, 72, 73, 164, 165, 167, 168, 171, 177, 312, 323
modules, 18, 37, 38, 39, 40, 41, 72, 73
modulus, 192, 193, 257
money, 16, 17, 36, 37, 69, 123, 176, 177
monitor, 16, 111, 126, 167, 170, 172, 174, 286, 287
monitored, 248
monitoring, 111, 113, 156, 169, 285
monitors, 112
monotonically, 231
month, 37
month-end, 21
monthly, 37
months, 1, 21, 32, 132
motion, 229
motivate, 149, 327, 364
motivated, 92, 162, 187, 201, 259, 292
mts_max_servers, 105
mtx, 91
mtx_unlock_spin, 91
muddies, 55
muddy, 29, 137, 162, 176, 247, 332
multiblock, 21, 41, 42, 43, 44, 48, 49, 50, 51, 61, 125, 126, 127, 128, 143, 148, 149, 178, 179, 206, 208, 216, 218, 315, 355
multiperspective, 20
multiple, 2, 15, 18, 19, 21, 28, 39, 43, 48, 53, 56, 63, 65, 67, 69, 70, 75, 76, 78, 80, 92, 119, 123, 128, 134, 148, 161, 165, 169, 175, 179, 188, 197, 199, 201, 214, 218, 223, 224, 227, 228, 231, 232, 244, 249, 252, 258, 262, 267, 268, 270, 271, 279, 281, 284, 297, 301, 304, 307, 309, 311, 313, 315, 316, 319, 336, 342, 349, 351, 368, 371, 372, 373, 377, 378, 385
multiple-block, 21, 43
multiple-choice, 247
multiple-latch, 76
multiple-perspective, 33
multiples, 81
multiplied, 30, 31, 57, 88, 89, 116, 305, 337
multipurpose, 248
multithreaded, 105, 165, 169
multitier, 65
multitude, 3
multiuser, 137
mutex, 4, 67, 70, 72, 74, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 194, 249, 251, 252, 254, 255, 256, 259, 260, 262, 265, 266, 267, 268
mutex_sleep, 98
mutexes, 4, 67, 69, 70, 71, 72, 73, 74, 91, 92, 93, 94, 96, 97, 98, 189, 245, 246, 251, 252, 254, 255, 261, 262, 263, 265, 266, 279
mutex-related, 91, 98, 262, 266, 268
mutex-specific, 70, 97
mutual, 91

N

namespace, 251, 255, 256
navigate, 63
navigation, 162
negating, 228
negative, 83, 233
negatively, 22, 121, 326, 382
negatives, 137
net, 91, 152, 266
netbooks, 162
netstat, 133
network, 4, 18, 21, 22, 24, 27, 63, 64, 65, 102, 118, 126, 132, 133, 134, 153, 154, 155, 159, 161, 162, 167, 367
network-attached, 126, 326
networked, 27, 345
networking, 156
network-related, 132
networks, 132, 133, 354
nice, 112, 126, 128, 168, 339, 346
node, 103, 248, 280, 283
node-collapsing, 289
nodes, 75, 185, 230, 251, 268, 280, 281, 283, 284
normalization, 292
notation, 5
notations, 5
notify, 9
nowait, 80, 94, 95, 303, 304, 305, 324
null, 303

O

object_id, 175, 232, 234, 235, 236
objects, 51, 72, 90, 91, 185, 228, 234, 235, 236, 244, 246, 247, 248, 249, 250, 251, 252, 253, 255, 256, 259, 260, 261, 271, 272, 273, 276, 277, 278, 289
objfb.sql, 50, 51, 52, 201, 225
occupied, 184, 238
occupies, 200
occupy, 310
occurred, 12, 29, 30, 31, 38, 49, 126, 127, 128, 179, 200, 242, 243, 244, 292, 295, 306, 316, 328, 343, 346, 347, 348, 353, 372, 374, 377, 380, 381
occurrence, 12, 200
occurrences, 128, 130, 213, 321, 366
occurring, 15, 31, 49, 69, 71, 93, 109, 121, 134, 150, 157, 182, 217, 232, 238, 240, 246, 249, 260, 268, 272, 276, 278, 313, 315, 323, 331, 332, 355, 362
occurs, 13, 15, 20, 28, 62, 68, 89, 92, 94, 97, 104, 107, 111, 122, 124, 125, 126, 134, 136, 143, 190, 194, 200, 210, 211, 219, 220, 223, 228, 230, 239, 249, 250, 256, 262, 264, 267, 274, 277, 280, 293, 298, 300, 302, 308, 310, 314, 322, 332, 342, 345, 348, 361, 366, 371
officially, 139, 281
offline, 291
offset, 285, 304, 305, 342, 372
onepass, 331
open, 41, 94, 143, 248, 259, 273, 278, 293, 319
opened, 94, 173, 262, 263, 327
opening, 99, 262
opens, 27
operate, 1, 15, 135, 266, 268, 275, 283, 343, 346, 387
operated, 17
operates, 62, 296, 346
operating, 2, 4, 8, 9, 12, 14, 18, 19, 20, 21, 22, 23, 24, 27, 29, 30, 31, 37, 41, 42, 43, 44, 48, 49, 50, 51, 52, 53, 57, 62, 69, 72, 81, 82, 83, 86, 97, 101, 102, 104, 105, 107, 110, 111, 112, 115, 116, 118, 119, 120, 121, 122, 123, 125, 128, 129, 130, 134, 135, 136, 137, 141, 142, 143, 146, 148, 149, 155, 156, 157, 158, 161, 167, 183, 184, 195, 197, 203, 208, 214, 216, 217, 218, 219, 220, 221, 228, 262, 266, 268, 276, 288, 292, 302, 303, 304, 305, 306, 307, 308, 309, 310, 314, 315, 316, 317, 318, 319, 323, 324, 336, 337, 338, 339, 341, 344, 345, 347, 348, 349, 350, 356, 357, 360, 362, 365, 366, 367, 368, 369, 370, 373, 374, 379, 385, 386, 387
operation, 5, 13, 35, 41, 63, 71, 94, 97, 100, 185, 241, 249, 250, 259, 261, 266, 267, 271, 280, 283, 319, 336
operational, 248, 271, 272
operations, 28, 69, 97, 100, 129, 130, 159, 186, 187, 219, 233, 246, 247, 271, 278, 279, 280, 281, 283, 285, 300, 319, 320, 332, 334, 335, 336
operator, 257
optimization, 3, 5, 41, 44, 151, 152, 184, 245, 247, 248, 278, 288, 387
optimizations, 29
optimize, 93, 98, 152, 258
optimized, 186, 189, 212, 233, 244, 278, 279, 280, 282, 304
optimizer, 158, 233, 249, 250, 253, 254, 370, 371, 376
optimizer_mode, 249, 253
optimizing, 44, 72, 186, 265, 387
ordered, 90, 231, 233, 337
ordering, 74, 80, 231, 232, 286
organization, 21, 218, 271, 291, 387
organizations, 9
originate, 27
originated, 55
originates, 80
oscillation, 81
oscillation-like, 82
outbound, 134
outgoing, 133
overhead, 85, 92, 104, 141, 146, 182, 184, 196, 244, 280, 281, 283, 293, 303, 310, 314, 316, 326, 357, 363, 375, 386, 387
overlap, 14, 18, 20, 55

P

p1, 45, 49, 52, 87, 223, 224, 232, 235, 236
p2, 45, 49, 52, 87, 88, 213, 223, 224, 232, 234, 235, 236
p3, 45, 49, 52, 87, 223, 224, 235, 236
package, 111, 163, 175, 257, 271, 272, 277
packages, 246, 247, 249, 260, 261, 272, 273, 276
packet, 132, 133
packets, 132, 133, 134
panic, 11, 13, 105, 179, 234
parallel, 46, 69, 87, 90, 99, 126, 127, 148, 152, 202, 216, 218, 219, 220, 222, 228, 262, 263, 304, 305, 308, 309, 310, 311, 314, 315, 316, 317, 318, 319, 322, 324, 326, 384, 385, 386
parallelism, 68, 69, 75, 298, 383, 384, 385, 386, 387
parallelization, 68, 69, 100, 386
parallelize, 68, 69
parameter, 22, 43, 45, 47, 49, 50, 51, 62, 71, 78, 80, 82, 86, 88, 89, 94, 97, 99, 105, 121, 151, 177, 179, 180, 182, 183, 198, 200, 203, 204, 206, 208, 211, 213, 214, 215, 219, 222, 223, 224, 233, 237, 238, 246, 249, 253, 254, 259, 261, 262, 264, 265, 266, 269, 273, 275, 277, 278, 286, 287, 289, 297, 298, 302, 303, 305, 306, 308, 309, 310, 314, 315, 324, 336, 369
parameters, 41, 45, 49, 67, 128, 169, 177, 183, 197, 198, 201, 207, 208, 214, 215, 216, 219, 223, 224, 238, 248, 260, 274, 277, 285, 286, 289, 307, 341
paranoid, 134
parent, 92, 248, 250, 252, 254, 255
parse, 55, 138, 139, 140, 141, 145, 146, 147, 173, 175, 249, 250, 256, 258, 260, 261, 262, 263, 277, 333, 349
parsed, 112, 263
parse-related, 262
parses, 169, 329
parsing, 55, 136, 138, 139, 141, 147, 250, 255, 259, 260, 261, 262, 263, 266, 275, 277
parsing-related, 138, 139
partition, 129
partitioning, 233
patch, 12, 207, 258, 259, 266, 288
patches, 268
patching, 246
patented, 242, 278, 283
patents, 74, 150
pattern, 12, 81, 130, 223, 227, 228, 271, 276, 277, 345
patterns, 129, 224, 245, 289
pct_free, 201, 228, 229, 238
pct_used, 201
pctfree, 233
peak, 22, 93, 132, 180, 202, 203, 214, 220, 222, 229, 314, 332, 352, 355, 362, 363, 364
percent, 7, 116, 207, 208, 212, 214, 215
percentage, 46, 53, 57, 61, 88, 90, 111, 112, 133, 178, 215, 263, 304
percentile, 132
performance-critical, 202, 214
performance-enhancing, 327, 360, 361, 370, 373, 378, 381, 387
performance-hindering, 299
performance-improving, 21, 181, 358, 360, 362, 363, 364, 371
performance-inhibiting, 259
performance-limiting, 4, 40, 195
period, 30, 32, 53, 69, 116, 133, 139, 150, 182, 183, 184, 209, 242, 271, 310, 321, 328, 329, 336, 376, 385
periodic, 20, 176, 275
periodically, 15, 16, 17, 224, 276
periods, 20, 209, 362
perl, 257, 258
permission, 81, 129
persistent, 165, 167
personnel, 21
perspective, 2, 4, 15, 19, 21, 22, 27, 31, 33, 36, 43, 45, 46, 51, 61, 73, 83, 101, 102, 104, 120, 127, 134, 137, 149, 153, 154, 156, 158, 160, 162, 182, 184, 186, 202, 205, 214, 219, 220, 221, 232, 237, 250, 265, 271, 292, 297, 302, 317, 319, 320, 324, 342, 344, 345, 347, 348, 350, 360, 361, 363, 365, 367, 368, 372, 379, 387
perspectives, 2, 19, 44, 102, 326
phase, 293
phases, 293
phyrds, 129, 334
physical, 21, 22, 43, 61, 62, 103, 118, 120, 129, 130, 149, 158, 161, 173, 179, 180, 210, 211, 213, 214, 219, 221, 222, 228, 285, 302, 317, 319, 323, 324, 335, 336, 346, 347, 348, 349, 358, 369, 370, 371, 372, 373, 374, 375, 381, 383
phywrts, 130, 334
pin, 74, 93, 94, 99, 189, 209, 221, 259, 262, 263, 266, 267, 268, 271, 272, 274, 277, 278
ping, 132
pinned, 74, 93, 187, 189, 204, 210, 246, 259, 262, 267, 271, 274, 275, 277, 278
pinning, 92, 94, 189, 246, 250, 259, 266, 270, 280, 281
pins, 93
plan, 13, 14, 29, 71, 229, 233, 275, 327, 341, 370, 371, 376
planned, 160, 372
planning, 27, 68, 365
plans, 248
platform, 102, 118, 120
platforms, 112, 117, 121
plist-sz, 118
pmap, 119
pointer, 75, 251, 281, 296, 297, 308
pointers, 261, 296
point-in-time, 119, 293, 326
policy, 197
political, 156
politics, 332
pool, 5, 48, 74, 89, 90, 91, 97, 182, 186, 212, 244, 245, 246, 247, 248, 250, 256, 258, 259, 260, 261, 265, 266, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 281, 283, 284, 287, 289, 292, 295
pooling, 162, 165
pool-related, 141, 246
pools, 148, 212, 214, 215, 286, 289
pop, 120
popped, 120, 233
popular, 61, 70, 75, 134, 188, 190, 195, 199, 201, 203, 204, 205, 206, 207, 209, 210, 211, 212, 215, 221, 228, 261, 267, 272
popularity, 201, 207, 221, 230
portable, 82
possess, 199, 300
post, 15, 44, 63, 65, 83, 154, 216, 220, 221, 228, 232, 238, 244, 267, 301, 302, 306, 308, 311, 325, 326
posted, 9, 16, 47, 71, 84, 126, 154, 229, 231, 236
posting, 16, 43, 44, 49, 87, 94, 154, 155, 156, 199, 207, 208, 222, 232, 233, 235, 238, 305, 314, 315, 322, 343, 384
postings, 155
posts, 71, 154, 220, 250, 267, 271, 303, 311, 322
posture, 21, 27, 274
poured, 332
pouring, 299
power, 18, 19, 20, 30, 31, 85, 104, 110, 112, 116, 141, 146, 161, 165, 177, 184, 187, 203, 214, 222, 264, 288, 314, 336, 337, 338, 385, 386
powerful, 100, 165, 181, 244, 263, 275
practical, 3, 36, 92, 100, 150, 157, 185, 244, 321, 327, 386, 387
practicality, 327
practically, 107, 121, 187, 341, 345
practice, 26, 94, 341, 367
practicing, 26
practitioners, 36
pragmatic, 14
pread64, 128, 129, 319
precedence, 381
preceding, 30, 70, 81, 139, 170, 171, 174, 175, 274, 326, 338, 354, 380
precise, 26, 341, 348, 354
precision, 82, 331, 345, 353, 358, 365
predict, 27, 136, 228, 337
predictable, 86
predicting, 337, 343
prediction, 117
predictions, 386
predictive, 7, 9, 27, 29, 103, 110, 137, 168, 173, 327, 343, 345, 348, 365, 381
preferences, 169
prepare, 11, 12, 26, 328
prepared, 52, 98, 169, 317, 327, 351, 352
prepares, 2
preparing, 314, 355
presentation, 22, 23, 25, 157, 351, 364, 381
presentations, 9
presented, 5, 27, 31, 70, 100, 113, 122, 211, 213, 227, 247, 251, 264, 312, 326, 328, 333, 340, 343, 344, 345, 359, 364, 365, 371, 384
presenting, 42, 157, 350, 353, 360, 381
presents, 63, 65
preserve, 241
preserving, 277
pressure, 13, 20, 118, 120, 122, 149, 307, 309
preview, 29
primary, 47
prime, 192
principle, 39
prior, 228, 295, 296, 297, 312, 328, 379
priorities, 14
prioritize, 3, 13, 71, 186, 246
priority, 90, 110, 112
private, 9, 118, 119, 248, 261
privileged, 104
privileges, 116
proactive, 274
probabilistic, 80
probability, 233
probably, 15, 17, 18, 19, 23, 30, 32, 102, 116, 120, 127, 128, 132, 144, 146, 148, 164, 169, 177, 203, 221, 222, 227, 232, 263, 266, 268, 271, 277, 288, 309, 315, 324, 342
probe, 62
problem, 1, 3, 5, 7, 8, 13, 14, 15, 18, 19, 20, 21, 22, 24, 26, 30, 39, 41, 46, 52, 54, 62, 63, 64, 65, 69, 70, 85, 86, 88, 102, 103, 105, 120, 123, 124, 125, 129, 130, 132, 133, 134, 137, 142, 148, 153, 154, 155, 162, 163, 164, 176, 177, 185, 186, 195, 199, 201, 202, 203, 204, 206, 220, 221, 223, 228, 229, 232, 238, 245, 246, 256, 259, 260, 261, 262, 264, 266, 271, 277, 279, 280, 288, 289, 292, 297, 299, 302, 305, 307, 308, 310, 311, 312, 314, 317, 322, 324, 326, 345, 362, 367, 384, 387
problematic, 70, 86, 88, 341
problems, 2, 5, 12, 15, 19, 24, 33, 61, 71, 86, 100, 122, 123, 176, 186, 187, 195, 206, 207, 214, 231, 234, 246, 247, 248, 249, 259, 265, 268, 279, 280, 286, 288, 292, 299, 305, 306, 309, 312, 316, 317, 341, 355
problem-solvers, 130
procedural, 307
procedure, 138, 140, 165, 167, 169, 172, 174, 175, 246, 260, 261, 272, 273, 274, 278, 295, 330, 334
procedures, 167, 168, 169, 246, 247, 273
process-level, 112, 119
processor, 363
processors, 361, 362
process-related, 118, 119, 384
process-specific, 119
process-to-latch, 287
procs, 111, 113, 117, 121
procs_blocked, 113
procs_running, 113
production, 2, 8, 12, 41, 67, 70, 71, 77, 80, 86, 115, 196, 204, 207, 212, 216, 265, 267, 268, 271, 272, 308, 311, 338
productivity, 20, 382
products, 15, 32, 36, 56, 70, 176, 177, 182, 186
professional, 1, 13, 26
professional-grade, 177
professionals, 3, 4, 327
proficiency, 327
proficient, 327
profile, 29, 63, 65, 154, 155, 157, 162, 180, 329, 330, 331, 332
profiled, 63, 157, 158
profiles, 272
profiling, 35, 53, 54, 63, 65, 154, 155, 156, 157, 158, 160, 161, 175
programmatic, 93, 245, 246, 247, 248, 249, 250, 251, 274
programmed, 63, 132
programmers, 91
prohibit, 387
prolific, 379
promote, 210, 221
promoted, 17, 205, 208, 210, 211, 212, 215
promotes, 210, 348
promotion, 205, 207, 210, 212
proof, 130
proprietary, 150, 192, 257
protected, 76, 77, 93, 203, 204, 244, 298
protecting, 199, 204, 219
protection, 71, 189
protocol, 167
prove, 128, 210, 249
proved, 162
proven, 2, 8, 11, 12, 100, 156
ps, 42, 44, 80, 82, 119, 128, 143, 183, 216, 217, 218, 307, 316, 319, 354
pseudocode, 79, 80, 81, 82, 83, 94, 95, 96
pswpin/s, 121
pswpout/s, 121
publications, 36, 328
publish, 17, 207
published, 27, 36, 157
publishing, 36
purchase, 176, 186
purchased, 63, 126, 186, 264
purchases, 332
purchasing, 158
push-to-disk, 218, 220
pwrite, 216
pwrite64, 143, 216, 218, 316, 319

Q

queue, 7, 22, 27, 28, 29, 30, 31, 44, 45, 53, 54, 55, 62, 63, 68, 69, 81, 83, 84, 85, 102, 103, 104, 105, 106, 108, 109, 110, 116, 117, 118, 123, 124, 125, 130, 134, 135, 136, 146, 151, 173, 178, 219, 221, 222, 233, 234, 236, 296, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 352, 353, 354, 356, 358, 359, 363, 365, 367, 370, 371, 373, 378, 384, 387
queued, 28, 103, 110
queues, 117, 124, 134, 185, 244
queuing, 28, 29, 67, 68, 69, 74, 84, 85, 102, 104, 105, 106, 107, 108, 109, 110, 116, 123, 124, 125, 136, 178, 341, 342, 343, 345, 346, 347, 348, 355, 359, 360, 361, 363, 387
queuing-like, 29
quotient, 26

R

r/s, 129, 130
random, 75, 192, 228
randomize, 83
randomly, 192, 251
randomness, 81, 82, 83, 305
range, 32, 77, 127, 191, 192, 206, 218, 233
rank, 323, 360
ranking, 327, 387
ranks, 178
ratio, 25, 27, 32, 36, 141, 183, 248, 287, 298
ratios, 25, 36, 141
rds, 8
reaction, 196
reactive, 5
read-consistency, 279, 280, 282, 283, 293
read-consistent, 244, 279, 280, 281, 282, 284, 291, 301
read-consistent-intensive, 248
read-consistent-related, 280
readers, 3, 242
read-induced, 285
reading, 9, 12, 29, 53, 101, 102, 103, 135, 136, 141, 150, 156, 189, 251, 289, 325, 380
read-related, 149
reads, 21, 39, 40, 41, 46, 47, 51, 53, 54, 61, 62, 126, 127, 128, 129, 130, 149, 157, 173, 178, 179, 180, 202, 208, 210, 214, 319, 321, 329, 335, 347, 349, 366
readv, 42, 43, 44, 91, 128, 129, 220
reallocated, 74
reaper, 205
rebooted, 110, 133, 134
rebuilding, 246
rebuilt, 267, 277
recalculate, 383
recall, 73, 217, 242, 262, 268, 280
recompilation, 246
recompilations, 249
reconfiguration, 130
record, 37, 55, 83, 183, 195, 242, 293
recorded, 37, 83, 94, 98, 111, 139, 142, 144, 151, 163, 182, 187, 244, 279, 280, 281, 282, 283, 285, 287, 292, 293, 305
recording, 38, 292
records, 43, 44, 83, 144, 183, 201, 292, 293
records_per_block, 201
recover, 291, 293, 325
recoverability, 315, 325, 326
recoverable, 76, 302
recovered, 280
recovery, 279, 280, 282, 283, 292, 293, 300, 326
recurring, 11, 17
recursive, 55, 138, 139, 140, 141, 145, 146, 147, 262, 263, 286
recycle, 148, 212, 214, 215, 269, 271
recycled, 284
redo, 61, 73, 87, 89, 130, 147, 173, 186, 187, 231, 279, 280, 281, 282, 283, 284, 286, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 314, 315, 316, 320, 321, 322, 324, 325, 326, 328, 332, 335, 344, 349
redo_writes, 130
redo-related, 5, 280, 292, 296, 297, 299, 307, 308, 310, 316, 317
reference, 9, 15, 24, 45, 51, 55, 56, 93, 96, 97, 112, 201, 213, 241, 249, 253, 255, 256, 267, 281, 283, 325, 338, 383
referenced, 14, 50, 93, 183, 234, 249, 250, 255, 278, 365
references, 8, 45, 75, 175, 207, 252, 253, 255, 308
referencing, 22, 31, 93, 235, 240, 241, 255, 282
referred, 188, 240, 244, 300
referring, 46, 128, 189, 248, 251, 308, 332
refers, 46, 120, 241, 251
region, 208, 211, 212, 215
regions, 208, 212
registration, 264
relational, 71, 72, 186, 189, 223, 231, 233, 246, 250, 292
release, 8, 45, 74, 88, 91, 115, 137, 177, 195, 208, 209, 221, 230, 256, 259, 265, 266, 275, 276, 279, 294, 296, 308, 309, 310, 384
released, 36, 80, 206, 258, 259, 277, 279, 297
releases, 71, 88, 89, 212, 228, 295, 296, 297, 334
reloaded, 248, 256
reloading, 256
reloads, 256
remote, 123, 129
remotely, 154
renamed, 155, 249
renewed, 387
reparsing, 245
repeat, 16, 45
repeated, 25, 38, 40, 80, 81, 94, 147, 155, 333
repeatedly, 20, 25, 38, 39, 80, 94, 97, 143, 144, 188, 197, 205, 208, 209, 223, 227, 262, 263, 264, 267, 268, 301, 307, 312, 324, 331
replace, 74, 189, 203, 205, 208, 209, 210, 221, 243, 283
replaceable, 203
replaced, 3, 93, 157, 187, 188, 189, 205, 206, 211, 215, 243, 268, 281, 341, 361
replacement, 209, 210
replaces, 205, 210
replacing, 98, 200, 278
repository, 37, 177
represent, 29, 30, 31, 32, 53, 68, 76, 85, 103, 117, 134, 148, 194, 197, 244, 265, 329, 338, 344, 386
representation, 7, 73, 187, 194, 248, 353, 381
representatives, 14
represented, 61, 156, 203, 232, 247, 293, 338
representing, 349, 375
represents, 31, 68, 69, 73, 81, 88, 105, 111, 112, 121, 124, 186, 187, 193, 197, 199, 209, 244, 336, 339, 346, 369
request, 21, 27, 28, 42, 43, 44, 49, 53, 73, 78, 80, 91, 94, 96, 97, 122, 123, 125, 126, 127, 130, 143, 148, 149, 195, 206, 216, 219, 233, 270, 277, 292, 307, 308, 309, 310, 311, 324, 341, 342, 345, 347
requested, 43, 45, 49, 50, 70, 72, 80, 81, 94, 98, 121, 122, 126, 148, 167, 206, 287, 347, 365
requesting, 51, 179, 195, 267, 309
requests, 21, 39, 43, 48, 72, 79, 80, 86, 88, 93, 97, 98, 99, 109, 122, 124, 125, 126, 127, 128, 129, 130, 148, 149, 157, 158, 161, 178, 179, 180, 270, 276, 287, 317, 335, 340, 343, 345, 348, 349, 358, 365, 368, 371, 372, 379
require, 7, 14, 16, 37, 91, 92, 122, 132, 163, 186, 228, 243, 244, 245, 246, 247, 248, 255, 261, 264, 267, 271, 275, 292, 313, 327, 334, 336, 365, 370
required, 14, 15, 17, 37, 53, 72, 80, 111, 156, 159, 171, 175, 176, 188, 200, 203, 209, 223, 228, 242, 250, 259, 261, 266, 268, 269, 275, 276, 277, 281, 283, 293, 296, 297, 327, 333, 334, 344, 353, 372, 382, 385, 386
requirement, 214, 228, 268, 301, 332, 357
requirements, 16, 21, 71, 72, 111, 118, 122, 123, 127, 130, 132, 141, 148, 161, 179, 182, 218, 219, 220, 245, 246, 250, 251, 292, 297, 299, 300, 314, 315, 316, 317, 319, 320, 321, 324, 326, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 355, 356, 360, 361, 363, 381
requires, 33, 37, 71, 74, 75, 92, 116, 119, 138, 158, 177, 191, 199, 201, 214, 234, 238, 243, 244, 249, 250, 255, 257, 259, 262, 263, 267, 275, 280, 285, 292, 311, 327, 345, 359, 369
requiring, 14, 88, 195, 205, 261, 310, 314
reserved, 174, 175, 272, 277
reserves, 198, 269, 275
reset, 45, 165, 167, 170, 210, 211, 212, 213, 215
resets, 212
resetting, 212, 238
reside, 4, 14, 21, 51, 66, 119, 126, 128, 148, 149, 156, 158, 189, 190, 191, 195, 198, 203, 204, 205, 208, 210, 212, 220, 228, 250, 263, 280, 281, 283, 325, 347, 368, 369
resided, 43, 153, 158, 160, 375
resident, 118, 119
resides, 18, 36, 44, 50, 51, 65, 75, 102, 126, 157, 158, 165, 183, 185, 186, 189, 190, 191, 199, 208, 220, 247, 250, 261, 323
residing, 21, 153, 159, 206, 219, 228, 293, 365
resist, 199
resolve, 5, 22, 67, 71, 102, 201, 202, 268, 271, 307, 309, 315, 326
resolved, 123, 142, 264, 275, 280
resolving, 36, 68, 70, 188, 233, 247, 276, 286
resource, 18, 37, 40, 61, 68, 142, 143, 177, 178, 258, 259, 265, 292, 336, 369, 373, 381, 385, 386, 387
resource-consuming, 24, 137, 369, 380
resource-intensive, 279
resources, 7, 12, 18, 22, 40, 62, 68, 69, 85, 98, 104, 110, 111, 117, 118, 128, 156, 157, 161, 176, 185, 197, 203, 244, 246, 250, 275, 276, 279, 280, 285, 292, 293, 314, 345, 346, 348, 366, 367, 368, 369, 379, 382, 385, 386, 387
respects, 301
response, 13, 14, 18, 19, 21, 22, 27, 28, 29, 30, 31, 36, 39, 41, 44, 46, 52, 53, 56, 57, 61, 63, 64, 65, 67, 68, 71, 83, 84, 85, 86, 104, 105, 106, 107, 108, 109, 110, 117, 118, 121, 123, 124, 125, 126, 127, 128, 129, 130, 134, 136, 137, 149, 151, 152, 153, 154, 155, 156, 157, 178, 179, 180, 181, 185, 195, 197, 214, 220, 227, 228, 244, 252, 284, 285, 291, 292, 303, 304, 309, 314, 315, 323, 324, 328, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 354, 356, 358, 359, 360, 361, 362, 363, 364, 365, 367, 370, 371, 372, 373, 374, 375, 376, 378, 379, 380, 381, 383, 384, 387
response-time, 4, 5, 11, 18, 21, 22, 24, 25, 27, 29, 30, 33, 35, 36, 45, 53, 54, 55, 56, 57, 65, 67, 102, 107, 109, 116, 120, 125, 135, 151, 154, 157, 158, 169, 173, 178, 180, 181, 262, 292, 304, 327, 328, 341, 342, 343, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 358, 359, 360, 361, 362, 363, 364, 365, 369, 370, 371, 372, 375, 376, 378, 381, 382, 383, 384, 385, 387
response-time-based, 182
responsibilities, 8
responsibility, 13, 15, 72, 216, 314
responsible, 4, 31, 77, 86, 149, 178, 179, 183, 197, 237, 297, 301
responsibly, 364, 382, 387
responsive, 109
responsiveness, 109, 110, 186
restart, 187, 259, 275
restartable, 324
restarted, 269, 273, 274
restarts, 260
restated, 156
retained, 37
retention, 244
retrieval, 112, 241
retrieve, 21, 48, 50, 51, 112, 176, 243, 248, 250, 271, 372
retrieved, 52, 56, 112, 228, 281
retrieves, 128, 229, 366
return, 21, 43, 49, 52, 79, 91, 95, 96, 192, 195, 270, 304, 311, 345, 372
returned, 110, 127, 129, 168, 199, 265, 302, 303
returning, 17, 214, 265, 271, 303
returns, 16, 81, 97, 247, 269, 275, 302, 365
reuse, 238
reused, 248, 263
revenue, 13
reversal, 232
reverse, 232, 233
reversed, 232, 233
reverses, 232
review, 93, 101, 116, 146, 235, 268, 288, 327, 365, 380
reviewed, 93
reviewers, 93
reviewing, 63
rewrites, 23
ring, 15, 43, 56, 183, 184, 277, 305
rollback, 230, 231, 280, 293, 300, 312
rollbacks, 173, 312, 331
rolled, 176, 231, 238, 240, 241, 293
root, 13, 42, 44, 120, 133
row-level, 188, 236, 237, 242, 372
rownum, 175
rtsess.sql, 56, 65, 155, 157
rtsess9.sql, 63, 65
rtsysx.sql, 56, 57, 62, 151, 157
rule-of-thumb, 149
runq-sz, 117, 118
runtime, 186

S

safeguards, 20
salesman, 2
salespeople, 13, 130
salesperson, 13, 37
sample, 14, 15, 17, 45, 47, 88, 89, 111, 112, 113, 116, 117, 121, 128, 132, 133, 145, 151, 176, 179, 182, 184, 196, 223, 224, 234, 312, 313, 319, 338, 339, 340, 344, 346, 347, 352, 353, 355, 358, 365, 367, 370, 375, 376, 377, 378, 382, 383, 384
sampled, 84, 178, 339
samples, 15, 51, 84, 88, 94, 116, 127, 132, 133, 177, 179, 182, 183, 196, 224, 284, 304, 305, 312, 346, 349, 351, 353
sampling, 15, 62, 110, 111, 112, 121, 132, 157, 163, 176, 177, 179, 182, 184, 330
sar, 9, 20, 110, 111, 112, 116, 117, 118, 121, 122, 125, 129, 314, 338
saturate, 265
saturated, 85
saturation, 78
scalability, 103, 367, 368, 374, 386
scale, 23, 85, 114, 160, 161, 359
scaled-down, 204
scaling, 386
scan, 11, 71, 81, 122, 195, 197, 198, 206, 208, 210, 219, 221, 222, 233, 294, 295
scanned, 80, 122, 205, 206, 221, 261, 376
scanning, 199, 208, 210, 211, 221, 243, 276
scans, 122, 199
scarce, 85, 157, 292
scatter, 132
scattered, 41, 43, 45, 46, 47, 48, 49, 50, 51, 52, 53, 64, 65, 90, 99, 126, 127, 148, 149, 150, 175, 178, 179, 180, 181, 202, 214, 228, 233, 347, 356, 365, 366, 367, 368
scenario, 100, 208, 220, 262, 301, 339, 385
scenarios, 296
schedule, 9, 16, 365
scheduled, 19, 123
scheduling, 31, 81, 104, 110, 111, 197
schemas, 312
scheme, 53, 56, 98, 194, 204, 207, 208, 210, 242, 283
schemes, 54
school, 31
science, 102, 110, 207
scientific, 5, 151, 386
scientifically, 327
scn, 242, 323
scope, 17, 22, 24, 27, 289
scoping, 14
search, 29, 31, 36, 76, 77, 150, 167, 191, 194, 200, 201, 208, 219, 220, 221, 223, 231, 234, 248, 250, 270, 305, 314, 365, 384
searched, 194, 249, 251, 258, 271
searcher, 77
searches, 77, 270, 277
searching, 76, 90, 151, 191, 193, 205, 211, 222, 231, 249, 250, 251, 265, 271, 277, 355
search-type, 191
seasonal, 332
secure, 2, 14
security, 71, 120, 152, 153, 163, 250
security-related, 161
segment-based, 279
segmented, 383
segmenting, 203
segment-related, 244
segments, 118, 119, 158, 226, 227, 228, 229, 230, 231, 244, 248, 278, 279, 280, 281, 284, 293, 300, 383
self-adjusting, 142, 176, 187, 216
self-aware, 142
self-explanatory, 197
selfish, 68, 208
self-managing, 104
self-preservation, 48
self-tuning, 142
semaphore, 217, 306
seminars, 9
semop, 319
semtimedop, 129, 143, 217, 307, 319
separate, 7, 55, 56, 74, 92, 149, 210, 247, 384
separated, 158, 384
separately, 18
separation, 158, 160
sequence, 75, 147, 227, 228, 231, 232, 233, 238, 239, 240, 333
sequences, 71, 232, 273
sequential, 43, 46, 48, 53, 61, 90, 99, 126, 127, 148, 175, 191, 194, 195, 197, 202, 206, 208, 214, 220, 228, 232, 262, 263, 315, 317, 324
sequentially, 77, 194, 251, 261, 380
serial, 63, 67, 69, 73, 75, 155, 168, 177, 181, 267, 373, 377, 378, 385
serialization, 67, 68, 69, 70, 74, 77, 91, 92, 97, 98, 100, 103, 189, 194, 195, 196, 204, 209, 246, 250, 252, 255, 261, 268, 280, 281, 292, 297, 384, 385
serialize, 271
serialized, 77, 261, 385
serializes, 67
serially, 93, 385
series, 4, 13, 25, 26, 29, 269, 284, 337, 345, 364
serious, 86, 105, 195, 199, 202, 232, 249, 261, 262, 276, 288, 289, 297, 305, 327, 329
seriously, 228, 278, 289, 291, 315, 324
serv_mod_act_stat_disable, 168
serv_mod_act_stat_enable, 168
serv_mod_act_trace_disable, 168
serv_mod_act_trace_enable, 168
servers, 7, 27, 68, 69, 103, 104, 105, 153, 162, 340
serves, 24, 85, 103
service, 27, 28, 29, 30, 31, 53, 54, 55, 57, 62, 68, 83, 84, 85, 103, 110, 123, 124, 130, 132, 135, 136, 139, 140, 141, 142, 144, 145, 146, 151, 152, 164, 167, 168, 171, 173, 177, 178, 218, 307, 314, 319, 323, 340, 341, 342, 343, 344, 345, 346, 347, 353, 354, 355, 357, 358, 359, 360, 361, 363, 365, 367, 368, 369, 370, 371, 372, 373, 375, 376, 377, 378, 382, 384
service_name, 164, 171
serviced, 27, 69, 103, 109, 110, 116, 314, 317, 341, 344, 345
service-level, 127, 132
services, 27, 36, 153
servicing, 69, 103, 111, 112, 124, 130, 341, 345
sess_user, 167
session_cached_cursors, 99, 261
session_state, 178, 180, 181
session_trace_disable, 168
session_trace_enable, 168
session-focused, 56, 384
session-level, 35, 45, 48, 62, 63, 64, 65, 137, 142, 157, 158, 169, 176, 182, 248
sessions, 44, 45, 46, 47, 51, 52, 53, 61, 62, 65, 84, 87, 88, 94, 99, 154, 155, 157, 164, 165, 169, 171, 172, 173, 176, 177, 178, 180, 181, 182, 184, 225, 226, 227, 228, 231, 232, 234, 235, 236, 248, 249, 255, 259, 261, 262, 267, 278, 280, 293, 294, 295, 301, 304, 317, 348, 384
session-specific, 259
set_client_id_trigger, 167
sharable_mem, 260, 273, 274
share, 2, 62, 72, 80, 157, 284, 300, 301
shareable, 264
shared, 5, 48, 72, 74, 80, 89, 90, 93, 94, 96, 97, 105, 118, 119, 141, 158, 161, 162, 165, 182, 186, 188, 195, 200, 244, 245, 246, 247, 248, 250, 256, 258, 259, 260, 261, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 281, 283, 284, 286, 287, 289, 292, 295
shared_pool, 260, 272, 273, 274, 275, 277, 278, 287, 289
shared_pool_reserved_size, 277
shared_pool_size, 287, 289
shared-server, 161
shell, 129, 132, 252
shift, 5, 28, 29, 107, 143, 161, 164, 187, 197, 214, 276, 297, 309, 314, 361, 362, 363, 364, 375, 376, 383
shifted, 161, 280, 360, 363, 364, 372, 383
shifting, 14, 276, 289, 362, 364
shifts, 65, 202, 360, 361, 372, 383
shipping, 13, 301
shmget, 119
shmget-related, 119
shortage, 367
shrink, 241
shutdown, 269
signal, 182, 219
signals, 219
similar, 8, 14, 17, 25, 54, 62, 63, 65, 73, 94, 129, 145, 159, 164, 169, 178, 181, 214, 236, 248, 250, 263, 264, 265, 269, 289, 295, 315, 322, 324, 352, 361, 383
similarities, 94, 346
similarly, 23, 138
simplicity, 27, 43, 73, 282, 327
simplification, 149, 350
simplifications, 194
simplified, 153, 163, 372
simplifies, 149
simplify, 53, 364
simplistic, 21, 36, 53, 75, 123, 248, 308
simulate, 232, 266
simulates, 97
simulation, 97, 124
simulation-based, 157
simulator, 89
simultaneous, 297, 310, 372
simultaneously, 73, 185
single-task, 159, 160
sizing, 156
slave, 75, 89
sleep, 41, 79, 81, 82, 83, 84, 85, 86, 88, 91, 94, 95, 97, 98, 114, 138, 140, 151, 152, 172, 217, 265, 279, 292, 294, 305, 312, 320, 330, 334
sleeping, 80, 83, 84, 85, 86, 91, 94, 98, 195, 197, 199, 209, 214, 308, 309, 320, 342
sleeps, 81, 88, 89, 90, 151, 172, 217, 267, 288, 294, 299, 306, 349
sleeptime, 79
slept, 15, 294, 295
snappy, 109, 387
snapshot, 45, 46, 54, 56, 57, 239, 242, 244, 307, 329, 330
snapshots, 48, 57
software, 25, 35, 37, 97, 123, 124, 156, 161, 162, 184, 186, 245, 279, 289, 302, 324
solve, 13, 26, 102, 130, 175, 195, 199, 203, 206, 231, 264, 279, 310, 311, 322, 324, 359
solved, 142, 176, 221, 297, 307, 358
solver, 41
solves, 65, 176
solving, 1, 2, 19, 20, 21, 41, 86, 223, 234, 268, 311, 312, 326
sort=fchdsk, 174
sorted, 21, 90, 167, 232
sorts, 65, 265, 286, 326, 349
sp_size, 270
special_cases, 232
speculate, 285
speed, 3, 68, 69, 102, 124, 125, 184, 186, 208, 288, 366, 367
speeds, 186
speedy, 5, 37
spend, 8, 16, 37, 63, 85, 104, 110, 155, 176, 177, 179, 198, 292, 371
spending, 45, 57, 65, 71, 104, 291, 292, 308, 309
spends, 180, 265, 314
spin, 78, 79, 80, 81, 83, 85, 86, 91, 94, 151, 152, 265
spin/sleep, 265
spin_get, 79, 81
spin-and-sleep, 94
spindle, 103
spinning, 43, 44, 80, 81, 83, 84, 86, 91, 94, 97, 98, 126, 135, 151, 195, 197, 199, 208, 214, 265, 267, 288, 342
spins, 80, 97, 208, 220
spin-sleep, 79, 80, 81
split, 208, 212, 344, 385, 386
splitting, 233, 368, 386
spreadsheet, 29, 89, 132, 305
spreadsheets, 9
sql_address, 234
sql_hash_value, 234
sql_id, 227
sqlplus, 171, 253, 272, 312
stabilize, 16, 284
stable, 110, 328
stack, 118, 119, 120, 323
stage, 4, 23, 142, 212, 279
standard, 4, 7, 9, 15, 21, 36, 91, 94, 102, 125, 132, 149, 159, 175, 186, 189, 205, 206, 236, 268, 270, 278, 279, 281, 282, 287, 300, 301, 302, 304, 336, 338, 343, 351, 355, 377
standardizing, 268, 278
standards, 153
stat_name, 144, 145, 173, 334
statement, 14, 16, 31, 55, 62, 72, 73, 74, 81, 82, 94, 96, 105, 130, 137, 138, 139, 157, 175, 178, 179, 180, 181, 194, 201, 202, 208, 220, 222, 227, 228, 232, 240, 242, 247, 248, 249, 250, 253, 255, 257, 258, 261, 263, 264, 265, 266, 268, 271, 277, 303, 307, 322, 332, 341, 351, 358, 360, 361, 369, 372, 374, 375, 376, 377, 379, 380, 381, 382, 383
static, 32, 298
statistic, 12, 32, 33, 57, 84, 102, 117, 130, 137, 138, 143, 144, 145, 146, 149, 182, 294, 295, 302, 312, 321, 328, 330, 331, 337, 338, 347, 353, 356, 357, 358
statistical, 149, 163, 176, 177, 305
statistically, 94, 182, 284, 304, 305
statistics, 5, 29, 31, 32, 33, 46, 49, 56, 63, 110, 115, 129, 132, 137, 139, 142, 144, 145, 146, 147, 151, 158, 163, 164, 167, 168, 169, 170, 171, 172, 173, 174, 176, 182, 183, 187, 222, 233, 250, 256, 267, 284, 287, 319, 320, 331, 333, 334, 335, 337, 338, 344, 347, 353, 354, 355, 383
statistics_level=typical, 183
statistics-based, 157
stats, 168, 172, 173, 330
stopwatch, 153
storage, 21, 126, 201, 233, 291, 326
store, 93, 232, 244, 330
stored, 45, 71, 112, 119, 174, 176, 178, 182, 184, 190, 191, 227, 228, 231, 232, 238, 239, 244
storehouse, 251
stores, 56, 112, 177, 182, 330
stories, 1, 13, 20, 185
storing, 179, 238, 263
story, 20, 22, 23, 25, 123, 155, 194, 208, 223, 251, 264, 292, 295, 381
strace, 42, 44, 82, 83, 128, 129, 143, 216, 217, 218, 307, 316, 319
strand, 298
strands, 298, 306
strategically, 156
strategies, 70, 135, 164, 205, 229, 326, 360, 369, 371
strategy, 17, 72, 138, 142, 176, 205, 213, 214, 233, 242, 268, 271, 277, 297, 305, 306, 311, 313, 314, 350, 358
streamline, 93
strength, 19
strengthen, 19
strengthened, 41
strengthens, 134, 158
stress, 12, 213, 220, 227, 251, 266, 289
stressed, 22, 137, 185, 186, 202, 213, 214
stresses, 22
stressing, 41, 230, 324
string, 257, 303
stringing, 132
strings, 230
structural, 23, 314
structure, 18, 22, 24, 69, 70, 72, 73, 74, 75, 76, 77, 78, 80, 81, 91, 92, 93, 96, 97, 170, 183, 184, 186, 189, 191, 192, 194, 197, 199, 200, 202, 203, 204, 208, 210, 213, 215, 220, 221, 229, 230, 231, 238, 243, 245, 247, 250, 251, 259, 261, 265, 268, 281, 296, 310
structured, 4, 22, 23, 71, 138, 315
structures, 5, 66, 69, 70, 71, 72, 73, 75, 76, 77, 92, 93, 185, 186, 194, 195, 199, 208, 209, 223, 228, 229, 230, 231, 233, 237, 244, 245, 246, 247, 250, 262, 274, 278, 279, 280, 281, 282, 295, 314, 318, 375
student, 120, 264
students, 1, 2, 3, 8, 18, 25, 145, 230, 247, 264
studied, 264
studies, 20, 26
study, 8, 20, 21, 29, 354
studying, 129
subatomic, 133
subdivide, 138, 139
subject, 7, 8, 27, 36, 157, 188
subjects, 3
submission, 93
submit, 127, 138, 216, 263
submits, 61, 264
submitted, 42, 44, 129, 130
submitting, 82, 216, 379
suboptimal, 20
subpool, 269, 270, 271, 276
subpools, 268, 269, 270, 271, 275
subside, 81, 275, 276
subsides, 230, 231
substance, 327
substantial, 220, 271, 304
substantially, 324
substitute, 7
subsystem, 4, 18, 19, 20, 21, 22, 24, 39, 43, 44, 48, 53, 61, 83, 85, 102, 103, 104, 107, 109, 110, 111, 112, 116, 120, 122, 123, 124, 125, 126, 127, 128, 129, 130, 134, 136, 141, 148, 149, 175, 195, 197, 202, 203, 206, 214, 219, 220, 228, 265, 292, 299, 300, 301, 304, 307, 309, 311, 314, 315, 316, 317, 319, 320, 323, 324, 326, 336, 337, 339, 340, 341, 342, 343, 345, 346, 348, 354, 355, 365, 366, 367, 368, 370, 371, 372, 373, 379, 381, 382, 385
subsystems, 7, 18, 19, 21, 24, 102, 104, 105, 111, 122, 123, 124, 125, 130, 219, 342, 346, 348, 354, 371
summarization, 37
summarize, 24, 25, 26, 40, 45, 53, 193, 299, 305, 324, 365, 367, 369, 372, 387
summarized, 24, 25, 37, 49, 187, 330, 345
summarizes, 37, 214
summarizing, 23
summary, 4, 17, 22, 24, 33, 45, 57, 61, 63, 116, 128, 151, 297, 317, 319, 336, 341, 345, 364
support, 19, 22, 97, 98, 132, 161, 162, 176, 203, 264, 265, 266, 275, 286, 288, 289, 315
supportability, 316
survey, 275
sustain, 130
svctm, 129
swap, 118, 120, 122, 266
swap-out, 120
swap-outs, 120
swapped, 120, 121, 122
swapped-out, 120
swapping, 22, 118, 120, 121, 122
swaps, 121
swenq.sql, 235
swhistx.sql, 150, 317, 318
swpctx.sql, 45, 46, 87, 99, 126, 148, 202, 262, 263, 311, 317
swsessid.sql, 48
swsid.sql, 50
swswp.sql, 52, 87, 224, 236
symbol, 328, 354
symbols, 7, 327, 328
sync, 46, 48, 61, 87, 99, 126, 127, 148, 152, 202, 262, 263, 301, 302, 304, 305, 306, 308, 309, 310, 311, 312, 313, 314, 315, 317, 322
synchronous, 43, 303, 304, 305
synchronously, 304
synonymous, 251
synthetic, 84
sys.auth$, 248
sys.col$, 233, 248, 250
sys.tab$, 248
sys_context, 167
sys_time, 55, 56, 84, 136, 142, 143, 145, 168, 318, 333, 334, 346, 349, 356, 357, 365
sysdba, 197, 272
sysstat, 31, 32, 45, 55, 56, 84, 111, 129, 130, 136, 137, 138, 140, 141, 142, 145, 146, 147, 168, 176, 182, 287, 302, 312, 319, 320, 321, 330, 332, 333, 334, 335, 346, 349, 358, 368
systematic, 343, 381

T

tab$, 139, 248
tablespace, 50, 51, 225, 230, 231, 301
tablespace_name, 51, 225
tch, 213, 215, 297
technique, 18, 40
techniques, 2, 3, 4, 17, 134, 330, 365
telecom, 203
temp, 81, 127, 301, 349
template, 167, 355, 359, 365, 366, 368, 369, 370, 371
temporarily, 217, 223, 244, 307
temporary, 224, 300, 301, 321, 324
terminal, 159, 167
termout, 51, 225, 320
testimony, 85
testing, 36, 129, 130, 184, 186, 207, 214, 259, 265, 279, 289
tests, 2, 25, 26, 143, 151, 155, 184, 186, 188, 208, 219, 233, 248, 284, 286, 318, 386
theory, 29, 67, 84, 85, 102, 105, 107, 116, 123, 136, 339, 341, 343, 346, 348, 352, 355, 359, 360, 361, 363
third-party, 176, 177, 182
three-dimensional, 190
threshold, 206, 208, 215, 221, 222, 260, 277, 280
throughout, 7, 110, 205, 246, 354
throughput, 4, 6, 68, 109, 110, 134, 186, 228, 233, 304, 309, 360, 361, 372
throughput-related, 280
tick, 209
time_type, 40
time_waited, 47, 84
time_waited_micro, 47
time-based, 56, 153
time-consuming, 69, 280, 353, 384
time-critical, 71
timed_statistics, 49
time-enduring, 9
time-focused, 136, 137, 152
time-intensive, 141
timekeeping, 55
timer, 38, 39, 355
time-related, 83
timers, 37, 38, 39
timing, 12, 15, 17, 41, 43, 44, 45, 55, 153, 154, 169, 246, 249, 316, 317, 331, 332, 368
tkprof, 41, 169, 174
tnsping, 65, 132, 155
token, 72, 80
tolerance, 323
tolerate, 302
tools, 2, 4, 9, 15, 20, 32, 33, 36, 56, 102, 110, 116, 128, 135, 136, 154, 161, 162, 169, 176, 177, 222
topdml.sql, 322, 323
toprank, 323
total_waits, 47
touch-count, 204, 207, 208, 213, 214, 279
touched, 62, 189, 205, 207, 209, 210, 212, 215, 229, 242, 293, 295, 336, 376, 378
touches, 81, 199, 242, 293, 294, 353, 376
touching, 210, 295
tp, 9, 29, 31, 36
trace, 41, 42, 43, 55, 82, 128, 135, 139, 142, 143, 157, 163, 164, 165, 167, 168, 169, 170, 172, 173, 174, 175, 217, 218, 249, 252, 253, 254, 255, 256, 307, 315, 316, 317, 318, 319
traced, 44, 55, 82, 129, 156, 163, 217
tracing, 42, 43, 44, 97, 125, 128, 129, 130, 143, 154, 155, 157, 163, 164, 165, 167, 170, 171, 172, 174, 184, 216, 292, 323, 376
traffic, 69, 132, 133, 159
transaction, 27, 29, 53, 54, 62, 69, 72, 84, 103, 109, 123, 124, 187, 203, 219, 230, 231, 233, 234, 236, 237, 238, 239, 240, 241, 242, 243, 244, 263, 279, 281, 282, 283, 284, 286, 291, 293, 298, 301, 302, 310, 311, 312, 313, 314, 325, 328, 331, 340, 341, 344, 345, 360, 361, 362, 363
transactional-focused, 62
transactionally, 280
transactions, 14, 15, 28, 29, 68, 69, 103, 104, 134, 231, 236, 238, 239, 240, 241, 242, 243, 286, 293, 305, 313, 325, 328, 331, 332, 340, 342, 361, 383
transfer, 301, 363
transferred, 301, 332
transferring, 345
transfers, 220
transform, 20, 53, 69, 125, 192, 264, 279, 281, 343
transformation, 264
transformed, 265, 270
transforming, 264, 279
transforms, 27, 265
transition, 169
transitioned, 300
translate, 259, 319
translates, 102, 314, 367
transported, 205
trap, 14, 57, 132, 157
trapped, 13
trc, 170, 174, 175, 240, 241, 242
trend, 247, 339
trending, 15
trends, 132, 133
trigger, 165, 167, 169, 171, 240, 280, 292, 293, 301, 305, 306, 308, 377
triggered, 305, 315
triggering, 306, 314
triggers, 171, 273, 282, 283, 300, 315
trunc, 171, 235
truss, 42, 102
trust, 12, 13, 15, 20, 24, 197
trust-building, 169
trusted, 150, 302
truth, 16, 36, 265
truth-based, 3
tweak, 314, 367
tweaking, 207
two-task, 159, 160

U

uet$, 235
undersized, 155, 156, 222
underutilized, 123
undo, 199, 224, 226, 229, 230, 231, 236, 238, 239, 240, 241, 242, 243, 244, 247, 248, 278, 279, 280, 281, 282, 283, 284, 286, 288, 289, 291, 293
undo_management, 286
undocumented, 150
undo-related, 281, 292
unified, 19
uninterrupted, 13
union, 213
unpin, 209, 221
unpinned, 93, 250
upgrade, 102, 274, 278
upgrades, 246
upgrading, 278
uptime, 16, 71, 117, 228, 381
utilization, 28, 30, 31, 53, 57, 62, 83, 85, 86, 109, 110, 111, 112, 113, 114, 115, 116, 117, 124, 129, 130, 135, 136, 141, 146, 327, 332, 336, 338, 339, 340, 341, 342, 346, 348, 349, 352, 354, 356, 357, 358, 360, 361, 363, 368, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 385, 387
utilization-related, 111
utilizations, 130
utilized, 112, 202, 203, 338, 341, 342, 345, 348, 367, 368
utilizes, 296
utilizing, 141

V

v$active_session_history, 136, 182, 183, 312, 313, 384
v$activity_session_history, 179
v$bc, 190
v$bh, 187, 188, 189, 190, 200, 216
v$bh-based, 188, 189
v$client_stats, 168, 172, 173
v$datafile, 51, 225
v$db_object_cache, 260, 272, 273, 277
v$enqueue_statistics, 233
v$event_histogram, 149, 150, 318
v$event_name, 45
v$filestat, 129, 130, 334
v$latch, 45, 49, 84, 87, 88, 89, 90, 98, 213, 288, 299, 349
v$latch_children, 288, 299
v$latch_misses, 90
v$latchname, 45, 49, 87, 88
v$library_cache, 256
v$lock, 233, 234
v$mutex_sleep, 98
v$mutex_sleep_history, 98
v$mystat, 294
v$open_cursor, 273
v$osstat, 37, 62, 115, 116, 117, 135, 336, 338, 349, 352, 354, 357, 368
v$rowcache, 248
v$serv_mod_act_stats, 168
v$service_stats, 168
v$ses_time_mode, 55, 333, 349
v$ses_time_model, 55, 349
v$sess_time_model, 142, 143, 144, 145, 163, 318
v$session, 45, 47, 48, 49, 50, 51, 52, 56, 62, 63, 65, 72, 87, 88, 96, 156, 164, 165, 171, 176, 179, 213, 216, 223, 224, 227, 232, 234, 235, 236, 312, 313, 349, 384
v$session.sql_id, 227
v$session_event, 45, 47, 48, 49, 156, 176, 349
v$session_wait, 45, 47, 49, 50, 51, 52, 72, 87, 88, 213, 216, 223, 224, 232, 234, 235, 236, 384
v$sesstat, 45, 55, 137, 142, 143, 144, 145, 155, 163, 176, 287, 312, 321, 333, 349
v$sgastat, 183, 247, 270, 287
v$sort_usage_view, 301
v$sql, 179, 222, 256, 263, 273, 323, 369
v$sqlarea, 256, 273
v$sqltext, 263
v$statname, 294
v$sys_time_mode, 55, 56, 84, 136, 142, 143, 145, 168, 318, 333, 334, 346, 349, 356, 357, 365
v$sys_time_model, 55, 56, 84, 136, 142, 143, 145, 168, 318, 333, 334, 349, 356, 357, 365
v$sysstat, 31, 32, 45, 55, 56, 84, 130, 136, 137, 138, 140, 141, 142, 144, 145, 146, 147, 168, 176, 182, 287, 302, 312, 319, 320, 321, 330, 332, 333, 334, 335, 346, 349, 358, 368
v$systats, 130
v$system_event, 45, 46, 47, 49, 56, 84, 86, 87, 126, 176, 311, 317, 349, 356, 365
v$tablespace, 51, 225
validate, 365
validated, 2, 365
validating, 36
variance, 32, 83
variances, 81
variant, 300, 366
variations, 158, 191
varied, 285
variety, 25, 53, 110, 118, 167, 174, 184, 201, 217, 218, 245, 259, 338
vector, 280
vectors, 280, 292, 296, 299
vendors, 15, 20, 41, 69, 74, 104, 122, 123, 125, 128, 130, 154, 162, 176, 177
vendor-specific, 119, 323
vendor-supplied, 15
version, 8, 9, 25, 151, 155, 183, 201, 208, 241, 243, 244, 264, 271, 278, 322, 323
versions, 8, 55, 119, 260, 296, 334, 335, 336, 348
version-specific, 8, 219
versus, 94, 97, 302
vertical, 27, 350
visual, 29, 33
visualizing, 183, 299
visually, 348
vmstat, 9, 20, 110, 111, 112, 116, 117, 120, 121, 122, 314, 338, 354, 368
volatile, 116

W

wait_class, 47, 49
wait_class_#, 47, 49
wait_class_id, 47, 49
wait-based, 45, 87, 88, 224, 227, 236
wait-event, 11, 25, 26, 27, 29, 33, 36, 47, 56, 153, 292
waiting, 12, 16, 21, 22, 27, 28, 39, 40, 43, 44, 45, 47, 48, 49, 50, 51, 52, 57, 63, 64, 65, 68, 69, 71, 83, 87, 88, 103, 104, 109, 110, 111, 116, 117, 118, 143, 148, 154, 155, 175, 176, 178, 179, 180, 182, 195, 208, 220, 221, 224, 225, 226, 227, 228, 232, 234, 235, 236, 267, 302, 304, 305, 308, 317, 325, 343, 344, 372
wait-interface, 29
waits, 27, 43, 46, 47, 48, 65, 68, 71, 103, 150, 168, 172, 174, 178, 184, 217, 219, 220, 221, 222, 223, 224, 227, 228, 229, 231, 232, 233, 268, 310, 313, 316, 317, 319, 345, 349, 366, 367
wait-time, 61
walk, 78, 81, 104, 124, 194, 208, 220, 251, 281, 355
walked, 123, 156
wall, 94, 107, 284, 285
warehouse, 206
warehouses, 206
wc, 114, 174
weight, 366
weighted, 366, 367
willing-to-wait, 79, 80, 94
wio, 64, 104, 111, 114
woke, 363
woken, 183, 217
workarea, 331
workbook, 365
workday, 37
workloads, 57, 310, 332, 362
worksheet, 365
worksheets, 365
worst-case, 339, 340
worthless, 61, 231
wrapped, 198, 316
wrh, 323
write-focused, 61
write-intensive, 220
write-per-second, 130
writer, 62, 125, 127, 130, 142, 143, 144, 188, 210, 211, 215, 216, 217, 218, 219, 220, 221, 222, 280, 282, 283, 286, 292, 293, 295, 296, 297, 298, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 334, 335
write-related, 130, 217
writers, 210, 213, 215, 217, 220, 222
writes, 39, 54, 120, 127, 130, 173, 183, 188, 215, 216, 218, 219, 281, 304, 305, 306, 315, 316, 317, 318, 319, 320, 321, 324, 329, 335, 349, 373
writing, 9, 13, 18, 36, 89, 103, 117, 183, 185, 216, 218, 219, 220, 221, 222, 280, 286, 299, 300, 302, 305, 306, 307, 314, 324, 325, 326
written, 2, 9, 18, 21, 23, 120, 124, 147, 165, 177, 183, 188, 190, 204, 205, 216, 218, 219, 250, 257, 280, 282, 283, 292, 293, 299, 301, 302, 306, 314, 315, 326

X

x$bh, 213, 215
x$kccle, 306
x$kghlu, 269, 270
x$ksmss, 269, 270
x$ksppcv, 198, 270
x$ksppi, 198, 270
xcur, 188, 189