
Session I07

Detecting and Resolving Locking Problems
Using MAINVIEW for DB2

Judy Quenet
BMC Software

DB2 UDB for z/OS


Tuesday, May 9, 2006 • 04:00 – 05:10 p.m.

Locking in DB2 is a complex subject, and V8 didn’t make it much easier.
This presentation focuses on
- causes of lock contention,
- how to detect locking problems with MAINVIEW for DB2,
- how to analyze them in either
- a single DB2, or
- a data sharing environment

Early warnings, current lock views, historical deadlock / timeout analysis
and thread history indicators help you focus on both effects and causes.

Note: My thanks to David Witkowski who helped me immensely by creating
workloads and capturing screens to demonstrate lock contention and the information
that is available when it occurs.

1
Overview
• Causes of lock contention

• How to detect locking problems

• Analyzing one DB2

• Analyzing a data sharing group

• Tuning considerations

First I’ll give a short summary of the various causes of locking problems and an
overview of lock information in DB2. That is followed by a monitoring strategy to
use those different information sources where needed, with some examples of
MAINVIEW for DB2 online screens. Then there is a walkthrough of two scenarios,
one analyzing a timeout in a single DB2 subsystem, and the other looking at a
deadlock caused by global contention in a data sharing group.

As for tuning considerations, there are so many, and for so many different
situations, that boiling them down to fit in one presentation became quite a challenge.
Instead of focusing on a few specific situations, a quick reference seemed more
appropriate. Therefore, following a short summary, there are several slides that
summarize the best sources of further information, and a list of tuning
recommendations you can use as a reference. These are organized in the same
order as the recommendations in the DB2 documentation, to make it easier for you
to find additional details when needed. And of course there are a few notes
interspersed about where to find the relevant data in MAINVIEW for DB2.

2
Monitoring DB2 Locking - Why?
• DB2 always resolves lock contention
• So let it be?

• Lock suspensions are necessary for integrity


• But – can degrade performance, response times
• Timeouts or deadlocks resolve lock contention
• But – the resolution may not be what is wanted
• Terminates a request, adds a cost to retry
• Maybe the victim thread is the culprit!
• But maybe not . . .

• Proactive preparation can save you time


• Immediate warning of problems
• Understand where to go to diagnose problems
3

DB2 was designed from the very beginning to eventually resolve any lock conflict.
And lock suspensions are not bad; they are necessary to preserve data integrity
and dependable application results. The real problems that need to be addressed are
those that occur too frequently, cause too much delay in response times (that may
endanger service level agreements), or impact well-behaving applications that may
have higher priority.

It is worthwhile spending a little time preparing yourself with some simple monitor
setup steps, and familiarizing yourself with where to find the relevant data when you
need it to diagnose a high-impact problem.

3
The Causes of Lock Problems
• SQL, SQL, SQL . . .
• Access paths
• Application-related
• Commit frequencies
• Updates not close to commit
• Incompatible access between concurrent
applications
• Inappropriate ACQUIRE/RELEASE
• Isolation level too restrictive
• Lock escalation
4

Of course, the tendency in DB2 is to blame the SQL, and there are usually plenty of
reasons to do so. But while poor access paths can cause contention, lock
considerations are often much broader. They can be categorized as either
application or system related issues. Here is a list of major considerations, but it
certainly isn’t a complete list.

First, some of the application-related issues. The application variables related to
locking are many, such as lock size, duration, mode, different lock resource types,
lock promotion for increasing control, lock escalation to a higher level of object,
Bind options like isolation, acquire/release options, etc.

4
The Causes – More
• System or workload-related
• System contention increasing lock times
• Swapped out threads holding locks
• Utility or DDL / DCL contention
• Data design - LOCKSIZE, hot spots, etc.
• IRLM configuration
• Data sharing Coupling Facility set up
• ZPARM options
• Many variables, may be complex to diagnose

And here are some of the general system or workload-related considerations.

In fact, there are so many variables that impact locking, and so many differences in
your environments and applications, that I decided not to spend a lot of time on just a
few specific locking problems and how to fix them. Instead, I have focused on
where to find the information you need, and have included a short summary of many
of the recommendations covered in the DB2 documentation. It can be used as a
quick reference for things you need to consider when you are working on one of
your own specific lock contention situations.

5
Locking - Where We’ve Come From
• Each DB2 release has provided improvements
• Here are just a few:
• Utility claim / drain processing
• Improved indexes
• Lock avoidance
• Row level locking option
• Reduced global contention in V8
• Lock holder priority escalation in V8

• Even though the basic philosophy has held:


• DB2 resolves the situation
• (although you might not like it)
6

Early DB2 locking was much simpler, very much “touch a page, take a lock.”
Many improvements have been made since then. The best ones from the DBA point
of view “just happen” to reduce locking. Others require study and understanding,
time to implement, and monitoring to ensure that improvement is achieved. Hard
work, in other words.

Although I wrote earlier that DB2 V8 didn’t make locking much easier to manage,
it did provide some relief to help avoid problems, especially in a data sharing
environment. It also added a new performance IFCID, 337, to trace lock
escalations.

6
Lock Contention - Where We Stand
• More tools to avoid contention
• Uncommitted read access
• Lock escalation options
• Partition locking
• Special ZPARM tuning knobs
• Data and application design guidelines
• Better documentation
• A lot more instrumentation data
• More measurements, more IFCIDs
• More to learn!

Here are some of the many advances made over time in DB2. Again, some take
effect immediately, others need some work to understand and implement.

7
Locking Information Sources
• DB2 subsystem statistics
• And DB2 status data collected by monitors
• Thread accounting data
• Performance IFCIDs
• DB2 console messages
• Explain
• Access paths, initial locksize
• ZPARMs
• Many related to locking
• Catalog
• Plan / package BIND options
• Object characteristics like LOCKSIZE
• z/OS information
8

There is information related to locking scattered everywhere, and it changes and is
expanded over time.

8
EZDBA – Entry Menu

This is an entry menu providing access to either a single DB2 (on the left), or to a
group of DB2s in Single System Image (SSI) mode on the right. Some shared
features are in the middle column.

The default context of ALL is to show all defined DB2s on all LPARS, even cross-
sysplex. You can define an SSI context to include any subset of DB2s, such as the
members of a data sharing group.

Options of special interest for lock analysis:


- Locking Menu - The hyperlink here takes you to a menu of options for analyzing
lock activity and contention both in realtime and historically.
- ZPARMs – There are so many that a menu organizes them by function, but there
is also an index by ZPARM name to find the one you want quickly.
- Catalog Manager Browse allows you to quickly access information about your
application and objects from MVDB2.
- Explain – for active SQL, dynamic SQL in the cache, and plan / package SQL in
the catalog.

9
Monitoring Locking - Where to Start?
• Find problems - before they find you

• Three techniques
• Surveillance
• Catch current problems
• Health checks
• Recent history
• Longer term trending if available
• Troubleshooting
• Analyze current or recent situations

• Refine your own strategy


• Based on your environment
• At a minimum, set early warnings
10

The key to monitoring is to squeeze some time out of your busy schedule to try to
get ahead of the curve.
- Setting up procedures to receive early warnings
- Ensuring that necessary data is captured so that it is easily accessible when needed
for analysis
- And hopefully finding and fixing problems before the end users start calling

10
Surveillance Monitoring
Key Indicators
• Lockouts – both timeouts and deadlocks
• Most important, since SQL is terminated
• Lock Suspensions
• Suspensions are necessary
• A sudden burst may signal problems
• Class 3 lock waits (% of in-DB2 elapsed)
• Affecting response time goals?
• Global lock contention - data sharing
• Other indicators to consider
• Lock escalations or other “gross locks”
• Claim or drain failures if there is utility contention
• Level of GBP-dependency in data sharing
11

There are just a few really important key indicators for lock problems. Keep your
eye on the top four continuously. The others may or may not be necessary in your
environment.

Note: GBP-dependency is closely related to “Inter-DB2 Read/Write interest”,
which is rather long to say or fit on a slide. So I’ll use this term instead.

11
Surveillance Monitoring
Requirements
• Continuous
• Set up for automatic startup
• Low overhead
• Optimize your choices of what to monitor - and how
• Send warning messages
• For immediate action
• Console messages / automation tool
• TSO Ids
• For chronological history
• Journal log
• Post / clear exceptions
• Current exception / alert views
12

Surveillance monitoring should be both continuous and automatic. You also want it
to be low overhead – both on your systems and on you. Then you can watch your
active threads or do more research only when a lock contention problem is
indicated.
Besides warning messages themselves, consolidated alert views and a chronological
view of exception events can be valuable tools on occasion.

12
Surveillance Monitoring
(1) System-level Exceptions
• Based on DB2 statistics and current status
• Resource Monitors for low-overhead periodic checks
• Number of timeouts (LTIME monitor)
• Check every minute / threshold > zero or low value
• May defer warning until exceeded more than once
• Number of deadlocks (LDEAD monitor)
• Check every minute, threshold > zero (?)
• Number of suspensions (LSUSP monitor)
• Check every minute, threshold > tuned value / defer
• Global contention % (data sharing)
• Set an alarm on statistics field (STGBLLK view)
13

So let’s go into the details of how this could be done in MAINVIEW for DB2
(MVDB2 for short).
Resource monitors are your first choice for simple low-overhead exception
detection.
You may need to adjust the thresholds to avoid too frequent warnings (don’t
“cry wolf” too often). Besides the actual threshold value, another effective
way to do that is to defer the warning message until the condition has existed
for more than one sample. One or a few timeouts are probably not serious,
but if they keep happening you may have a bad application running that is
affecting several other threads.
Besides the resource monitors, you can set an alarm on any statistics field
like global contention %, or a rate of activity.
Other warnings to consider:
• Monitors for Lock escalations (LESCL), claim failures (CLMF),
drain failures (DRNF), global contention (GSUSP).
• MAINVIEW Alarm Management allows you to set alarms on any
element in any view.
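
The idea of deferring a warning until the condition has existed for more than one sample can be sketched in a few lines of Python. This is only an illustration of the logic, not how MVDB2 implements its resource monitors; the class name, the sample counts and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeferredThreshold:
    """Warn only after the threshold is exceeded on several consecutive samples."""
    threshold: int          # warn when the sampled value is greater than this
    defer_samples: int = 2  # consecutive exceeding samples required before warning
    _streak: int = 0        # current run of exceeding samples

    def check(self, value: int) -> bool:
        """Record one sample; return True if a warning should be issued."""
        if value > self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # condition cleared, start counting again
        return self._streak >= self.defer_samples


# Hypothetical number of timeouts seen in each one-minute sample.
timeouts_per_minute = [0, 1, 0, 2, 3, 1, 0]
monitor = DeferredThreshold(threshold=0, defer_samples=2)
warnings = [monitor.check(n) for n in timeouts_per_minute]
```

With these sample values, the isolated timeout in the second minute does not raise a warning, but the run of timeouts starting in the fourth minute does once it has persisted for two samples.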

13
Surveillance Monitoring
(2) Capture Lockout Events
• Easily accessible timeout and deadlock event data
• Lockout views to analyze
• Event views with drilldowns
• Single DB2 or a data sharing group
• Views to print
• Likely the first source for problem analysis
• See which threads are in conflict
• See which resources are involved
• Even identify dynamic SQL

• Data is also in the console messages DSNT375I-378I


• OK if a few, difficult to analyze several for patterns
14

In MVDB2, a trace is automatically started for IFCIDs 196 (timeouts) and 172
(deadlocks). An online buffer holds the most recent events for immediate analysis
(you can specify its size in DMRBEX00). All events are also written to history for
later online retrieval or reporting (TIME command).
Lockout views allow you to analyze timeouts and deadlocks for a single DB2 or for
a data sharing group to identify global contention. Special views designed for
printing (LKPRINT, LKPR133) can be used to create a report, or even a CSV file
(to load into Excel), with MV Batch. You can also invoke this feature online with
the MAINVIEW Infrastructure EXPORT command on any view.
The lockout views are often the first place to look for lock contention analysis, since
snapshots in time won’t necessarily show you the patterns you need to understand.
For example, if an active thread is suspended waiting on a longer-running thread,
that may be bad, or just business as usual. But if several threads in a row have timed
out because of such a thread, you will probably be concerned. This is an easy place
to see that one thread that caused multiple timeouts, even after the thread
terminates.
Also, the views are designed to allow you to look at the data by chronological event,
summarized by the resources involved in lockouts, or summarized by the plans
causing the most contention. You can even identify dynamic SQL. An example of
this is shown in one of the scenarios shown later.
Of course, the console log has the DB2 messages. These are fine if there are just a
few events. But it is difficult to identify patterns across many different lockouts.
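
To show why summarized lockout data reveals patterns that individual messages do not, here is a small Python sketch that counts events by blocking plan and by resource. The event records, plan names and resource names are invented for illustration; real data would come from the lockout views or an exported report.

```python
from collections import Counter

# Hypothetical lockout events; field names and values are illustrative only.
events = [
    {"type": "TIMEOUT", "blocker_plan": "BATCHUPD", "resource": "DB1.TS1"},
    {"type": "TIMEOUT", "blocker_plan": "BATCHUPD", "resource": "DB1.TS1"},
    {"type": "DEADLOCK", "blocker_plan": "ORDERS", "resource": "DB1.TS2"},
    {"type": "TIMEOUT", "blocker_plan": "BATCHUPD", "resource": "DB1.TS3"},
]

# Summarize the same events two ways: by blocking plan and by resource.
by_plan = Counter(e["blocker_plan"] for e in events)
by_resource = Counter(e["resource"] for e in events)

# The plan responsible for the most lockouts stands out immediately.
top_plan, top_plan_count = by_plan.most_common(1)[0]
```

Read one at a time, these four events look unrelated. Counted by plan, a single blocker accounts for three of them.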

14
Surveillance Monitoring
(3) Thread-Related Indicators
• Active thread exceptions
• Specify per attach type
• Elapsed time (ELAPSED)
• Updates per Commit (UPDCOM)
• Set alarms for specific threads in customized views
• When needed for a problem application
• Use Workload Monitors for overall averages
• Based on thread accounting records when completed
• Workload objectives views DOBJx (elapsed times)
• Limit to transaction-type workloads like CICS/IMS
• Average elapsed waiting for locks (@ELLK monitor)
• Check every 6 minutes, threshold > ? seconds (tune)
15

Define active thread exceptions by attach type in DMRBEX00. This may be more
valuable to warn you of threads that could cause contention. To set an alarm on
specific thread types with more choice of measurements, start from the THDLIST
view (which includes more elements to choose from than THDACTV). Customize
it to filter just the threads you are concerned about, and include only those elements
needed for filtering, identification and key measurements.
The workload monitors can be used to check overall response times. In this case,
you will likely only want to monitor workloads that have true response time goals,
like CICS or IMS transactions. These objectives views are automatically available
per attach type. You could define additional monitors for specific workloads.
The average class 3 lock wait time of completed transactions can also be a good
indicator of excessive lock contention.
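
As a worked example of the "class 3 lock wait as % of in-DB2 elapsed" indicator, the following Python sketch computes the percentage and compares it with a threshold. The function name, the input values and the 20% threshold are assumptions chosen for illustration; tune the threshold to your own workloads.

```python
def lock_wait_percent(lock_wait_seconds: float, in_db2_elapsed_seconds: float) -> float:
    """Return class 3 lock wait time as a percentage of in-DB2 elapsed time."""
    if in_db2_elapsed_seconds <= 0:
        return 0.0  # nothing measured yet; avoid dividing by zero
    return 100.0 * lock_wait_seconds / in_db2_elapsed_seconds


# Hypothetical averages for completed transactions in one interval.
pct = lock_wait_percent(lock_wait_seconds=0.5, in_db2_elapsed_seconds=2.0)
needs_attention = pct > 20.0  # 20% is an arbitrary example threshold
```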

15
Surveillance Monitoring
(4) Exception Threads
• Capture and retain problem threads
• For later analysis of lock problems
• Quick review / follow up a problem report
• Set one or more filters on lock contention
indicators, for threads that are
• Suffering problems?
• % elapsed lock wait time . . .
• Causing problems?
• High updates per commit, locks held . . .

16

This suggestion is not to send out warnings, but to capture and retain problem
threads for later analysis. This is very simple to do with summary exception traces.
For very low overhead, you can save exceptions separate from thread history. If
needed later, this provides complete accounting data for those threads. This is often
simpler and more reliable than finding such threads afterward in thread history for
the high-volume systems being run today.
You can restrict which workloads are checked with workload IDs like plan, authid,
etc., or the distributed end user identifiers.
Here are examples of threads you might want to capture. Don’t overdo it.
For threads suffering problems (the “victims”):
- % elapsed lock wait time
- Lockouts, lock suspends
- Claim / drain failures
For threads that may be causing problems (the “culprits” or blockers):
- High updates per commit
- High maximum locks held
- LOCK TABLE statements, lock escalations
- Excessive activity (SQL, In-DB2 CPU or elapsed, updates)
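
The victim / culprit distinction above can be expressed as a simple filter. The Python sketch below is only a way to think about the criteria, not the MVDB2 exception filter syntax; the record layout, plan names and threshold values are hypothetical.

```python
# Hypothetical accounting summaries; field names are illustrative only.
threads = [
    {"plan": "PAYROLL", "elapsed": 10.0, "lock_wait": 6.0,
     "updates": 40, "commits": 4, "escalations": 0},
    {"plan": "BATCHUPD", "elapsed": 300.0, "lock_wait": 1.0,
     "updates": 50000, "commits": 1, "escalations": 2},
    {"plan": "INQUIRY", "elapsed": 0.2, "lock_wait": 0.0,
     "updates": 0, "commits": 1, "escalations": 0},
]


def classify(thread, max_wait_pct=30.0, max_updates_per_commit=1000):
    """Label a thread as a possible victim, a possible culprit, both, or neither."""
    labels = []
    # Victims: a large share of elapsed time spent waiting for locks.
    wait_pct = 100.0 * thread["lock_wait"] / thread["elapsed"] if thread["elapsed"] else 0.0
    if wait_pct > max_wait_pct:
        labels.append("victim")
    # Culprits: many updates between commits, or any lock escalation.
    updates_per_commit = thread["updates"] / max(thread["commits"], 1)
    if updates_per_commit > max_updates_per_commit or thread["escalations"] > 0:
        labels.append("culprit")
    return labels


# Keep only the threads that matched at least one filter.
captured = {}
for t in threads:
    labels = classify(t)
    if labels:
        captured[t["plan"]] = labels
```

Only two of the three threads are retained here, which is the point of "don't overdo it": the well-behaved inquiry thread is not captured.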

16
Trace Exception Filters

Lock-related Filters
UIDCOM LOCKSUSP MAXLOCK CLAIMDR
LOCKESCL LOCKTBL
TIMEOUT
PWAITLK
17

Here is an example of the exception filters highlighting those that are especially
interesting for lock analysis.

17
Health checks - System
• Recent online system statistics
• Local and global locking measurements
• ZPARM settings
• Batch statistics reports
• Current level of GBP-dependency and
partition locking (data sharing)

18

Once you have the surveillance tools in place, you can turn your attention to
periodic health checks. The first place to start is at the system level per DB2 or per
data sharing group. If these general parameters and indicators are not “in tune”, it
will be difficult to analyze whether contention is really caused by an application.

18
Lock Statistics (Local)

19

All “local” lock statistics are included in this view, STLOCKD, shown both for the
current interval (1-15 minutes) and session counts since DB2 startup. The most
important are at the top.

19
Global Lock Statistics – Group

→ Set an alarm on % Global Contention

20

This view STGBLLK is a tabular view of global lock statistics, showing the most
recent interval. Typically you will enter this view in a context that just includes the
members in a particular data sharing group. An alarm can be set on any of these
values. The first one, % Global Contention, is the most important to consider. This
is one of the key indicators mentioned earlier for surveillance monitoring.
You can also use the TIME command to look at these values over a past time
period. For example, to see the last 2 hours, enter TIME * * 8I to display eight 15-minute
intervals. Just enter TIME to display a panel to fill in.
However, you could also look at all your DB2s at once, and sort descending on the
second column to quickly see high contention. (Type in SORT D in the command
line, place the cursor in the column header and press Enter.)
Note: the first 3 lines of all views have been excluded from the screen examples to
save space.
- line 1 is a header with date and time
- line 2 is the command line
- line 3 is the window control line to allow for multiple windows at a time
(horizontal or vertical splits).
Multiple views, even from different MAINVIEW products, can be combined into
one screen and saved for later use.
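
The interval count in the TIME example is simple arithmetic: the lookback period divided by the recording interval. A small Python sketch, assuming 15-minute intervals as in the example (the function name is hypothetical, and only the "nI" form of the command shown above is used):

```python
def intervals_needed(lookback_minutes: int, interval_minutes: int = 15) -> int:
    """Number of recording intervals needed to cover the lookback period."""
    # Round up so that a partial interval is still included.
    return -(-lookback_minutes // interval_minutes)


count = intervals_needed(120)   # two hours of 15-minute intervals
command = f"TIME * * {count}I"  # the command form used in the example above
```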

20
Global Lock Statistics – Member

21

The detail view STGBLLKD shows the global lock statistics for one selected
member at a time. Another view later will show some of the statistics summarized
across the group. Here you can see both the interval data shown before, and the
session counts since this DB2 was last started.

21
EZDZPARM – ZPARM Menu

22

There are so many ZPARMs, with many more being added in each DB2 release, that
a menu provides access to different ZPARMs in views organized by function. The
two under the Locking header include most lock-related parameters.

22
Lock ZPARMs (ZPIRLMLD)

23

Most of the lock-related ZPARMs are on this view, although some are on the IRLM
definition view.
Where appropriate there are hyperlinks from the ZPARMs to the related
performance views. For example, the “. STATS” hyperlink in this view.

23
ZPARM Index Lookup / Help

If you just want to look up a specific ZPARM, you don’t have to search all these
views. Just look it up in the index!
For example, if you want to find RELCURHL, one of the lock ZPARMs, go to the
index view for “R”, ZPNAMER. The hyperlink on RELCURHL takes you to
ZPIRLMLD, the view shown just before this. There each ZPARM has detailed help
available.

24
Health Checks - Applications
• Thread History
• Lock waits, lock statistics
• Current locks
• When needed to resolve a current problem
• Or to investigate what a thread is doing
• Batch accounting reports
• For summary lock counts
• For workload analysis

25

Once the system is checked out, the next priority is to look at application issues. Of
course for this analysis, thread history of accounting data is the first source, either
online or in batch reports. The elapsed time analysis is of special interest – both the
(local) lock waits and possible global lock waits.
Current lock views would typically only be used to resolve an existing high-priority
problem, and when watching particular threads execute.

25
Thread History Summary Data

In MVDB2, thread history is accessible online in this summarized format for higher-
level workload analysis, and as drilldowns to individual thread accounting data.
In this view (only parts are included here), you see a summary of all the thread data
in the online data sets, from 13:15 to 16:00.
In this example, notice the high class 3 wait time, and the deadlocks and timeouts.
Drill down on the time field to find out in which 15-minute interval they occurred.
This summary data can be viewed by interval or by connect type - or both.
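
Drilling down by interval amounts to grouping events by the 15-minute period in which they occurred. A minimal Python sketch of that grouping, using invented lockout timestamps (the helper name is hypothetical, and it assumes the interval length divides evenly into an hour):

```python
from collections import Counter
from datetime import datetime


def interval_start(ts: datetime, minutes: int = 15) -> datetime:
    """Round a timestamp down to the start of its recording interval."""
    # Assumes 'minutes' divides evenly into 60.
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


# Hypothetical lockout times within the 13:15 to 16:00 window.
lockouts = [
    datetime(2006, 5, 9, 13, 18, 25),
    datetime(2006, 5, 9, 13, 22, 3),
    datetime(2006, 5, 9, 14, 47, 59),
]

# Count lockouts per interval, keyed by the interval start time.
per_interval = Counter(interval_start(t).strftime("%H:%M") for t in lockouts)
```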

26
Thread Accounting Detail

27

In the thread accounting data, the first indicator of trouble is a high wait time, but
there are several things that could cause this, for example heavy I/O.

Here, the key indicators section highlights the timeout for you, so you may not need
to look at all the accounting lock counts.

27
Thread Accounting Elapsed Times

More wait types, including global lock waits (from a different thread)

28

The breakdown of the class 3 wait times is useful to identify lock waits. Global
waits are also included (these are taken from another thread that experienced global
contention).
Note that in an active thread that is currently suspended for a lock, lock wait time is
not updated until the wait terminates, either by getting the resource or with a
negative SQL code.

Further down in the display are the local lock counts, and the global lock counts if
used.

28
Troubleshooting
• Now you have a problem – an alert? a user call?
• What next?
• Still in progress?
• → Current lock contention views
• → Active thread views, sort by #locks or elapsed
• Check for lock suspend status
• If not, shift into post-analysis mode
• Lockout analysis of events, resources, plans, SQL
• Review threads captured in exception traces
• Review thread history, summary or thread
• Check catalog data for applications and objects
• Explain plan / package SQL for access paths
29

Now you have a problem:


• An exception alert message was triggered
• You see many active threads suspended for locks
• A user called to complain about response time
• Your health check identified an increase in lock waits

There are two different paths to take for immediate problems and for after-the-fact
analysis.

29
Suspended Threads

Hyperlink on a waiting thread to see all its locks

30

First let’s look at current locking. Typically you would want to do this for a single
DB2 subsystem, or for all the members in a data sharing group. The lock menu
provides access to four different views of current locking.
The most useful are
- suspended threads (seen here)
- resources with waiters (an example follows)

But sometimes you may want the whole picture, so there are also views for:
- all locks held / by thread
- all locks held / by resource

Hyperlink on a waiting thread to see all the resources that thread is holding or
waiting on.

30
Suspended Thread – All Locks

Now hyperlink on the resource waited on to see the holder(s)

31

On this view you can see all the locks for the selected thread, and which resource is
being waited on. Drilldown on that resource will show which thread is holding it.

31
Resource Holder and Waiter(s)

Scroll right to see holder member and SQL

Different members
32

Here it is. One thread with an exclusive lock is causing a suspend. You can see
how long the lock has been held, and how long the waits have been – and whether
those threads are close to a timeout. If you scroll right on this view, you can see
additional information about each thread, including plan name, connect type, DB2
(important if global contention!), SQL statement number and SQL type.
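
Judging whether a waiter is "close to a timeout" means comparing its wait time with the timeout limit configured for your subsystem. The Python sketch below illustrates the comparison; the 60-second limit, the 80% warning fraction and the thread identifiers are assumptions for the example only.

```python
TIMEOUT_LIMIT_SECONDS = 60  # assumed timeout limit; check your own ZPARM setting

# Hypothetical waiters on one resource: (thread id, seconds waited so far).
waiters = [("THD001", 12), ("THD002", 55), ("THD003", 50)]


def near_timeout(waited_seconds, limit=TIMEOUT_LIMIT_SECONDS, warn_fraction=0.8):
    """True when a waiter has used at least warn_fraction of the timeout limit."""
    return waited_seconds >= warn_fraction * limit


# Waiters that are likely to time out soon unless the holder releases the lock.
at_risk = [thread for thread, waited in waiters if near_timeout(waited)]
```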

32
Resource Contention

Drill down to contention view, same as before

33

This is the view of all resources in contention with thread waiters. A drilldown here
on a resource also identifies the holder and all waiters, the same view as just shown
by navigating from the thread.

33
Troubleshooting
Deep Dive with Detail Trace
• Identify problem applications
• Start a detail trace to see performance IFCIDs
• Usually SQL and scan events are sufficient
• May want to add I/Os
• Lock events – only when necessary (rare)
• Includes access to full accounting data
• Drilldown to chronological events with pop-ups

34

Once you have identified a problem application that is causing contention, you may
need to get down to the performance IFCIDs. In MVDB2, this is called a detail
application trace and it relates these detail events with the thread accounting data.
Tracing individual locks can be expensive, and is not usually necessary. There is a
wealth of lock information in other IFCIDs.
Once you have activated a detail trace, qualified to select the threads you need to
analyze, you can drill down to see the events in the life of that thread.

34
Detail Trace Analysis
• Start analysis with level 2 events
• Most important events like SQL, commits, lockouts
• Other lock-related information
• PLAN or PKG-ALLOC for BIND specifications
• LOCK-SUMMARY for objects, including
• Max pages held, table escalations,
highest lock state, partition locking
• COMMIT-LSN for lock avoidance
• CLAIM-WAIT / DRAIN-WAIT for utility interaction
• Scroll down looking for suspect SQL or lockouts

• Then expand to include level 3 events


• Lock suspends, scans, I/Os, detail locks if traced

35

Start/end events are paired to provide elapsed and CPU times. Nesting shows
events initiated by other events.
Start DTRAC analysis with level 2 events
• to track SQL and commit patterns (IFCIDs 58-66, 272-273)
• timeouts and deadlocks (196 and 172)
• dynamic SQL text and Explain (63 / 350 and 22)
• and more information:
• PLAN or PKG-ALLOC for BIND specifications (IFCIDs 112 / 177)
• LOCK-SUMMARY for objects (IFCID 20): max pages held, table
escalations, highest lock state, partition locking
• COMMIT-LSN for lock avoidance (IFCID 218)
• CLAIM-WAIT / DRAIN-WAIT for utilities (IFCIDs 213-216)
Since this is a reduced set of events, it is easy to scroll through it to identify SQL
statements or lockouts that need additional analysis.
Position on a suspect SQL event and expand to include level 3 events, like scans,
I/Os, LSN-DETAIL, and detail locks if traced.
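
For quick reference, the level 2 event groups and IFCID numbers listed above can be kept in a small lookup table. The Python sketch below simply restates those numbers; the group labels and the function name are my own shorthand, not MVDB2 terms.

```python
# Level 2 event groups and the IFCIDs listed for them in the notes above.
LEVEL2_IFCIDS = {
    "SQL and commit patterns": set(range(58, 67)) | {272, 273},
    "timeouts": {196},
    "deadlocks": {172},
    "dynamic SQL text": {63, 350},
    "Explain": {22},
    "PLAN or PKG-ALLOC": {112, 177},
    "LOCK-SUMMARY": {20},
    "COMMIT-LSN": {218},
    "CLAIM-WAIT / DRAIN-WAIT": set(range(213, 217)),
}


def groups_for(ifcid):
    """Return the level 2 event groups that include the given IFCID."""
    return [name for name, ids in LEVEL2_IFCIDS.items() if ifcid in ids]
```

For example, groups_for(196) returns the timeout group, and an IFCID that is not a level 2 event returns an empty list.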

35
Detail trace DTRAC LEVEL=2

Select TIMEOUT for IFCID 196 details - resource and holder


36

This is an example of the detail trace display showing only level 1 (SQL) and level 2
(important) events. All events with an asterisk in the Detail column have detail pop-
ups available that show you the information included in the performance IFCID for
that event.
This thread experienced a timeout while attempting an update. The timeout detail
shows the resource involved and the holder thread. If there was global contention,
the holder member is shown.
Also note that there was no lock avoidance (see the COMMIT-LSN event). This is
not surprising for an update, but could be of importance in other situations.

36
Detail trace DTRAC LEVEL=3

37

Now you see the additional lock events, including the suspend preceding the
timeout.
In other situations you would also see scans, I/O and the LSN-Detail.
Again, most of these events have pop-ups for IFCID detail.

37
Scenarios
• Now there are two scenarios that walk through
some of these views

• The first concentrates on a single DB2, and drills
down to analyze a timeout

• The second is for a data sharing group,
analyzing global contention

38

Because there are so many different causes of lock contention problems, these
scenarios focus more on how to access data for analysis, rather than detail about a
specific lock problem. You will probably encounter different problems each time
lock contention becomes an issue, but the information you need for analysis is
similar.

38
Lockout Analysis – a Timeout

Select this timeout


39

We are going to select a recent timeout (the one at 13:18:25), and drill down for
details.

You can also scroll right for more details about the blocker thread.

39
Drilldown to Resource
- Timeout

Scroll right to see lock states . . .

Hyperlink to the detail view


40

In the timeout only one resource conflict was involved. You can scroll right for
more details, which is often useful for deadlocks with multiple resources for quick
comparisons of lock characteristics. Then we will drill down to the detail view.

40
Drilldown to Conflict Detail
Blocker / Waiter / Resource

There is a lot of information per conflict. At the top right you see most of the
blocker thread information. If you scroll down, there is more information about the
waiter thread on the right, and the resource on the left.

41
Scroll Right for SQL Statement IDs

→ Hyperlink to Statement Cache to see text and Explain it

Here we have scrolled right to see the additional blocker/ waiter information –
especially useful for distributed threads like this one. Even more important, if the
wait was caused by dynamic SQL, the token identifying that SQL in the dynamic
cache is provided. Since it is highlighted, you see that there is another hyperlink. It
goes to the view SCSQLD for that statement. From there you can view the SQL
text, and hyperlink to Explain it. If collected, execution statistics are also shown.
Of course, if you look at this lockout quite a bit later than it occurred, the SQL may
no longer be available in the statement cache. Then again, if the statement is not
being reused enough to stay in the cache, perhaps it is not of particular concern.
But for recent deadlocks this complete information on the conflict - down to the
dynamic SQL – will often be enough to identify the problem.

42
Scenario 2 - Data Sharing
Global Contention

43

For scenario 2 on global contention, we will walk through this example of an
MVDB2 tuning wizard dialog. It provides a dialog path of logical steps for analysis,
and key indicators to help you decide whether a particular path is of interest or not.
If you look at the top line (line 4 on your terminal), you will see “ALL” after the
view name of WZLOCK. This is the context name, and indicates that all defined
DB2s are being monitored. In fact, in this case a record from a non-data sharing
subsystem was returned first, so no group name is shown.

To analyze a data sharing group, the first thing we need to do is limit the context to
its members.

43
Set Group Context

44

That first hyperlink brought us to a list of all predefined contexts (usually done by
an administrator). If you place your cursor on “DBGK” and press enter, the context
will be set, as you will see in the next screen. It is shown in the context line, and
also verified with the group name in the view. The context does not have to be the
same as the group name, but is easier to remember if it is. Of course, if you already
know the context name, you could just enter “CON DBGK” in the command line on
the first wizard view.
Note that we now also see a valid number of members for this group, and the
statistics below it are summarized from all active members. So you
can already see that there is quite a bit of lock contention, both recently (interval)
and over time (session). So it should be worthwhile to continue the analysis.

44
Now Analyze the Group
There is Global Contention!

45

Now we are in the dialog path for data sharing group analysis. On this view the data
from both members is summarized for several key indicators. The focus now is
global contention - you quickly see the high global contention percent - and the
number of timeouts and deadlocks.

45
Check the Members & Compare

46

The first option gives you this list of members so you can quickly see if there are
major differences between them. Our simple test system has only 2 members, but
most production systems have more.
This view is concerned with locking, but the nice thing about MAINVIEW SSI
mode is that it is available for all tabular views, so you can easily compare group
member performance for any other functional area, such as buffer pool activity.
Select one DB2 to view details.

46
Member Detail

47

The member detail view not only provides both interval and session statistics, but
points out measurements you may need to be concerned about. The hyperlinks take
you to further information.
For example, if you have a high rate of global contention, you need to analyze the
level of GBP-Dependency or “Inter-DB2 Read/Write Interest”. The first white
hyperlink takes you to a view to see the current status.
False contention is more closely related to coupling facility setup. That hyperlink
can take you to the coupling facility data collected by the MV/OS390 product (if
installed).
The other hyperlinks take you to the complete local or global locking statistics
views if needed.

47
Inter-DB2 R/W Interest? Where?

48

The member detail view suggests that one of the issues with high global contention
is the level of inter-DB2 read/write interest, and provides a hyperlink to this view,
which summarizes the status per group buffer pool. A drilldown on a buffer pool
shows a list of objects and the specific level of interest in each member. (This is a
bit easier to understand than the cryptic codes in the DISPLAY DATABASE LOCKS
command.)

48
Check Group Lockouts
(Next wizard option)

Scroll right to see the blocker member for global lockouts


49

After checking the members, PF3 returns you to the group analysis dialog view.
Refer to slide 45 for a refresher. The next analysis path is to check for gross locks,
like table lock statements or lock escalations, or contention with utilities (claim /
drain failures). But the panel also showed that no gross locks have occurred, so it is
not necessary to follow this path.
The next option is to view lockouts, but now from a group perspective. This view
is similar to the LKEVENT view shown earlier, but it is designed for data sharing to
identify global lockouts and which members are involved. Other local lockout
events are included for completeness. Scroll to the right to see the blocker member
for those events marked with Global Contention set to YES.

49
Which Resources Were Involved?
(Next wizard option)

With resource number

And a list of events to see who and when

The next option of the group dialog panel is to look at all the resources involved in
timeouts or deadlocks. It even identifies when some of the lockouts were between
members.
This first view displayed is a summary by database / table space, with a drilldown to
see the details of each resource, down to a page or row number and type of lock.
From here you can access a list of all the lockout events that were involved in
contention for this resource. Did they all occur close together because of one
blocker thread, or is there a different pattern?

50
And Check Blockers / Waiters
(Next wizard option)

A drilldown on the worst blocker would show when it caused its lockouts

51

This is another option in the dialog panel to identify blocker and waiter plans
involved in lockouts and how many involved global contention. Again, you can drill
down to see the events caused by a specific blocker (did a bad application hold its
locks a long time and cause a group of lockouts, or was it just active frequently?), or
by waiter (does this application conflict with many plans, or just one?).
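
The blocker question above can be sketched in a few lines of Python, assuming the
lockout events have been exported as simple records. The record layout (blocker
plan, seconds into the interval) and the 60-second burst window are hypothetical
choices for illustration, not an MVDB2 format or threshold.

```python
from collections import defaultdict

# Hypothetical lockout events: (blocker plan, seconds since start of interval).
# The layout is an assumption for illustration, not an MVDB2 export format.
EVENTS = [
    ("PLANA", 10), ("PLANA", 12), ("PLANA", 15),
    ("PLANB", 5), ("PLANB", 900), ("PLANB", 1800),
]

def classify_blockers(events, burst_window=60):
    """Group lockouts by blocker plan and label each plan's pattern.

    'burst'  : all lockouts fall within burst_window seconds, which suggests
               one long-held lock caused a group of lockouts.
    'spread' : lockouts are scattered, which suggests a frequently active plan.
    """
    times_by_plan = defaultdict(list)
    for plan, seconds in events:
        times_by_plan[plan].append(seconds)

    result = {}
    for plan, times in times_by_plan.items():
        span = max(times) - min(times)
        pattern = "burst" if span <= burst_window else "spread"
        result[plan] = (len(times), span, pattern)
    return result

PATTERNS = classify_blockers(EVENTS)
```

Both plans caused three lockouts here, but only PLANA's fall close together, which
is the pattern you would expect from one thread holding its locks too long.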

51
Summary
• Set up surveillance to send warnings to take action
• Review data sources ahead of time
• Be ready to react to a hot situation
• Perform health checks when time permits
• Troubleshooting navigation tools to find data quickly
• EZDBA with single DB2 and SSI access
• EZDLOCK menu of lock analysis views
• Lock wizard to step you through the options
• DB2 TOPICx views
• Like TOPICL to find lock-related data directly
• Read up on the DB2 recommendations
• When needed for problem analysis
• When needed during application design
52

This is the end of the actual presentation, but some good references follow.

52
Where to Find Recommendations
• DB2 Administration Guide
• V8 – SC18-7413-01 / V7 – SC26-9931
• Performance Monitoring and Tuning section
• Chapter 30 – Improving Concurrency
• A quick summary of these recommendations follows
• Includes pointers to MVDB2 data
• DB2 Data Sharing: Planning and Administration
• V8 – SC18-7417-02 / V7 - SC26-9935-05
• Chapter 6 – Improving Concurrency
• IBM RedBooks
• DB2 Performance Topics
• V8 – SG24-6465 / V7 – SG24-6129

• A very condensed recommendation list follows


53

The DB2 documentation on locking issues has been much improved from earlier
days. It concentrates not only on the details of how locking works, but on
recommendations on what you should do about it. I could not improve on it, so I
am instead providing a short list of recommendations, generally in the order in
which they are presented in both “Improving Concurrency” sections. Obviously, the
full explanations of each recommendation are not included here, so go to the source
and read further for the details.

There are quick notes included in the list of where to find related information in
MAINVIEW for DB2.

53
Recommendations (1)
And Where to Check in MVDB2
• System optimization
• Address overall performance issues first
• System, subsystem, applications (like TS scan)
• Too many threads?
• → Monitors CONUT, THDUT, DBTUT (DBATs)
• Eliminate swapping
• → THDACTV to see swapped out threads
• Reduce system resource contention waits holding locks
• → Thread History summary HTDTLZ or TSTAT
• Class 3 waits, esp. I/O and not accounted
• Storage contention – system or DBM1 paging
• → Monitor DB2DP, or DB2 Status Detail STDB2D
54

54
Recommendations (2)
• Database design
• Keep like things together (tables for applications)
• Keep unlike things apart
• Adequate number of DBs, TSs, qualifiers
• Plan for batch inserts, especially for data sharing
• Use LOCKSIZE ANY, unless reason not to
• Examine small tables needing high concurrency
• Spread out data or consider row locks
• Partition large tables
• Partition secondary indexes (review access first)
• Fewer rows of data per page (MAXROWS)
• Consider volatile tables to ensure index access

• → Catalog Manager / Page Set views / Explain

55

55
Recommendations (3)
• Application design
• Access data in a consistent order to avoid deadlocks
• Commit work as soon as is practical to reduce
contention
• Issue ROLLBACK after unsuccessful SQL (Prepare)
• Frequent commit points for
• Less contention, more lock avoidance
• Quicker rollback or DB2 restart
• Improved utility access
• Close cursors as soon as possible, free LOB locators

• → “time since last commit” in active thread (DUSER)
• → “Updates / commit” in accounting
• → Detail SQL trace for application analysis
56
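
The first recommendation, accessing data in a consistent order, can be illustrated
outside DB2. This is a minimal Python sketch using in-memory locks as stand-ins
for DB2 resources; the resource names are hypothetical. Two workers name the same
two resources in opposite order, but each always takes the locks in sorted order,
so neither can hold one lock while waiting for the other in reverse.

```python
import threading

# Two in-memory "rows", each protected by its own lock. These stand in for
# DB2 resources; the names are hypothetical.
LOCKS = {"ACCOUNT": threading.Lock(), "ORDERS": threading.Lock()}
COUNTERS = {"ACCOUNT": 0, "ORDERS": 0}

def update_both(first, second, repeats):
    """Update two resources, always locking them in sorted name order."""
    ordered = sorted([first, second])   # the consistent order is the whole point
    for _ in range(repeats):
        with LOCKS[ordered[0]]:
            with LOCKS[ordered[1]]:
                COUNTERS[first] += 1
                COUNTERS[second] += 1

def run_demo(repeats=1000):
    # The two workers name the resources in opposite order, which would risk
    # a deadlock if each locked them in the order given.
    workers = [
        threading.Thread(target=update_both, args=("ACCOUNT", "ORDERS", repeats)),
        threading.Thread(target=update_both, args=("ORDERS", "ACCOUNT", repeats)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return dict(COUNTERS)

TOTALS = run_demo()
```

In an application the same idea means touching tables, and rows within a table,
in the same sequence in every program that updates them.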

56
Recommendations (4)
• Application design
• Retry SQL timeout / deadlock in a batch program to
avoid operations rescheduling
• Bind plans with ACQUIRE(USE) for best
concurrency, ACQUIRE(ALLOCATE) to avoid
timeouts, especially with gross locks
• Bind with ISOLATION(CS) and CURRENTDATA(NO)
typically (review other options)
• (V8) Use sequences to generate unique number keys
• Examine multi-row operations to reduce contention
and escalation

• → Catalog Manager Browse to see options
• → Thread history to see accounting data

57
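
The retry recommendation for batch programs might look something like the
following Python sketch. The `LockoutError` class and the `flaky_update` function
are hypothetical stand-ins, not part of any real DB2 driver. The sketch assumes
the usual meaning of the two SQLCODEs: -911 reports a deadlock or timeout where
the unit of work was rolled back, and -913 reports one where it was not.

```python
import time

class LockoutError(Exception):
    """Hypothetical exception carrying a DB2 SQLCODE; not a real driver class."""
    def __init__(self, sqlcode):
        super().__init__(f"SQLCODE {sqlcode}")
        self.sqlcode = sqlcode

# -911: deadlock or timeout, unit of work rolled back.
# -913: deadlock or timeout, no rollback performed.
RETRYABLE = {-911, -913}

def run_with_retry(unit_of_work, rollback, max_attempts=3, delay_seconds=0.0):
    """Run unit_of_work, retrying the whole unit after a lockout."""
    for attempt in range(1, max_attempts + 1):
        try:
            return unit_of_work()
        except LockoutError as err:
            if err.sqlcode not in RETRYABLE or attempt == max_attempts:
                raise
            if err.sqlcode == -913:
                rollback()          # -913 leaves the unit of work open
            time.sleep(delay_seconds)

# Simulated work that is locked out twice, then succeeds.
ATTEMPTS = []
ROLLBACKS = []

def flaky_update():
    ATTEMPTS.append(1)
    if len(ATTEMPTS) == 1:
        raise LockoutError(-911)
    if len(ATTEMPTS) == 2:
        raise LockoutError(-913)
    return "committed"

RESULT = run_with_retry(flaky_update, rollback=lambda: ROLLBACKS.append(1))
```

The retry limit matters: a program that retries forever only adds to the
contention it is trying to survive.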

57
Recommendations (5)
• Miscellaneous
• Avoid catalog contention from DDL and GRANT/REVOKE
• Reduce concurrent use of statements that update
SYSDBASE for the same table space
• Reduce incompatible locks on skeleton tables
• Bind, rebind, free, drop resource, revoke privilege needed by
plan/package
• Review DBD lock issues, primarily from DDL (X locks)
• Check if more than 25% of lock escalations are causing
lockouts, and consider reducing / disabling
• If lots of deadlocks, reduce DEADLOCK TIME to 1 to
resolve more quickly
• If not too many deadlocks, increase DEADLOCK TIME up to 5 to
reduce latch suspensions

• → Locking statistics in STLOCK / STGBLLKD, reports
• → Lockout and current lock views
58
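
The 25% check on lock escalations is simple arithmetic, sketched here in Python.
The counts are made-up values for one statistics interval; where you take them
from (statistics views or reports) is up to you.

```python
def escalation_lockout_share(escalations, escalations_with_lockout):
    """Return the fraction of lock escalations that led to a timeout or deadlock."""
    if escalations == 0:
        return 0.0
    return escalations_with_lockout / escalations

def needs_review(escalations, escalations_with_lockout, threshold=0.25):
    """True when more than `threshold` of escalations caused lockouts."""
    return escalation_lockout_share(escalations, escalations_with_lockout) > threshold

# Hypothetical counts for one statistics interval.
SHARE = escalation_lockout_share(40, 12)
REVIEW = needs_review(40, 12)
```

With 12 of 40 escalations causing lockouts the share is 30%, which is over the
25% guideline, so reducing or disabling escalation would be worth considering.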

58
Recommendations (6)
• Miscellaneous
• Specify IDLE THREAD TIMEOUT > 0 if
distributed users leave applications idle while
holding locks
• LOCKS PER USER (NUMLKUS) may need to be
increased for row locking or LOBs
• Monitor application effect of anything other than
LOCKSIZE ANY
• Review use of special lock ZPARMs
• EVALUNC, RRULOCK, XLKUPDLT, RELCURHL, SKIPUNCI
• Review advantages and disadvantages of
ACQUIRE/RELEASE combinations, ISOLATION
and CURRENTDATA for problem applications
• Understand claims, drains, and utility compatibility
• LOB locks are different!
• → ZPARM views
59

The ZPARM views will give you information about these options.

59
Recommendations (7)
• Data sharing
• Everything said before to reduce locking!
• Minimize global lock propagation
• Less inter-DB2 read/write interest / one updater
• Partitioning - more granular control of interest
• Avoid locking all parts (like plan ACQ(ALLOC))
• Optimize lock avoidance techniques
• Check size of CF lock table to reduce false contention
• Speed up global contention detection cycles
• Tune XCF message buffers
• Watch for high IRLM latch contention
• Avoid large numbers of waiters / reduce deadlock time?

• → Lock wizard / group dialog
• → MVZOS / RMF for coupling facility information
60
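
To see why the size of the CF lock table affects false contention, here is a toy
Python model. False contention happens when two different resources map to the
same lock table entry. The plain modulo used below is an assumption that keeps the
example deterministic; it is not the real hashing algorithm, and the resource
identifiers are invented.

```python
from itertools import combinations

def false_contention_pairs(resource_ids, lock_table_entries):
    """Count pairs of different resources that map to the same lock table entry.

    A plain modulo stands in for the real hashing, which is an assumption
    made only to keep the example deterministic.
    """
    collisions = 0
    for a, b in combinations(resource_ids, 2):
        if a % lock_table_entries == b % lock_table_entries:
            collisions += 1
    return collisions

# Hypothetical resource identifiers (for example, page numbers).
RESOURCES = [3, 11, 19, 27, 35, 43, 51, 59]

SMALL_TABLE = false_contention_pairs(RESOURCES, 8)    # every id is 3 mod 8
LARGE_TABLE = false_contention_pairs(RESOURCES, 64)   # all ids are distinct mod 64
```

With only 8 entries every pair of these resources collides; with 64 entries none
do. A real lock table is far larger, but the direction of the effect is the same.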

60
Session I07
Detecting and Resolving
Locking Problems

Judy Quenet
BMC Software
judy_quenet@bmc.com

61

61
