Deadlock From Library Cache Lock
Introduction
The in-house Oracle performance analyzer tool is a handy and efficient way to analyze performance issues triggered by library cache lock contention. Here I present a real-life case showing how to analyze and resolve this type of issue in minutes.
The Issue
On 01/03/2014, DB sp2-stgadmdb suffered from library cache lock contention. Some queries had been waiting for more than 15 hours. The Oracle performance analyzer shows the active session list with wait events, running time, and wait time. Here I sorted the list by the SEC_WAIT column. The analyzer also provides a lock finder tool via the context menu item Find Lock Holder, which retrieves the lock queue and the lock holder.
Recursive SQL
Note that the SQL_ID of the slave processes differs from that of the QC session. The bottom pane can be used to identify the actual SQL. The slave processes are working on a recursive SQL issued by the parsing process; basically, this SQL is used to work out partition pruning. A SQL_ID without a child number indicates that the related sessions have not finished parsing.
The Find Lock Holder function, triggered from a slave session waiting for the library cache lock, displays another list showing the lock queue. The lock holder is the query's QC session, and the lock is held in Share mode. Why can't the blocked slave sessions be granted the lock in Share mode as well? Because the SYS session (sid 13, node 2) is requesting the lock in Exclusive mode, and it made its request before the PX slave sessions did (check WAIT_SEC: it has waited for 54160 seconds, or about 15 hours).
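The queueing behavior described above can be illustrated with a small model. This is only a sketch under a stated assumption: it treats the lock queue as strict first-in, first-out, which matches what was observed here but is not taken from Oracle's actual implementation. The function names and session labels are hypothetical.

```python
SHARE, EXCLUSIVE = "S", "X"


def compatible(requested, held_modes):
    """Share coexists only with other Share holders; Exclusive needs no holders at all."""
    if requested == SHARE:
        return all(mode == SHARE for mode in held_modes)
    return len(held_modes) == 0


def grantable(holders, queue):
    """Return the waiting sessions that could be granted the lock right now.

    holders: list of (session, mode) pairs currently holding the lock
    queue:   list of (session, mode) pairs waiting, oldest request first
    Assumption: strict FIFO, so no waiter may overtake an earlier blocked waiter.
    """
    held = [mode for _, mode in holders]
    granted = []
    for session, mode in queue:
        if not compatible(mode, held):
            break  # the first blocked waiter blocks everyone queued behind it
        granted.append(session)
        held.append(mode)
    return granted


# Illustrative labels only, mirroring the situation in this case.
holders = [("QC", SHARE)]
queue = [("SYS sid 13", EXCLUSIVE), ("PX slave 1", SHARE), ("PX slave 2", SHARE)]

print(grantable(holders, queue))      # [] : the Exclusive request blocks the slaves
print(grantable(holders, queue[1:]))  # ['PX slave 1', 'PX slave 2'] once SYS is gone
```

In this model the slaves' Share requests are compatible with the QC's Share lock, yet nothing is granted while the earlier Exclusive request sits at the head of the queue.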
The Troublemaker
The context menu item Track This Session can be used to track the SYS session that is requesting the exclusive lock. To break the deadlock, either the SYS stats job or the query has to be killed.
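One way to see why killing either party resolves the problem is to view the situation as a wait-for graph and look for a cycle. The sketch below is a simplified illustration, not the analyzer's logic. The edges reflect my reading of this case, and in particular the edge from the QC to its slaves is an assumption (the QC cannot finish until its PX slaves return). The names `find_cycle` and `waits_for` are hypothetical.

```python
def find_cycle(waits_for):
    """Return the sessions forming a wait cycle, or None if there is no cycle.

    waits_for maps each waiting session to the one session it is waiting on.
    """
    for start in waits_for:
        path, seen = [], set()
        node = start
        # Follow the chain of waits until it ends or revisits a session.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            return path[path.index(node):]
    return None


# Assumed wait-for edges for this case (labels are illustrative).
waits_for = {
    "QC": "PX slave",             # assumption: QC waits for its slaves to return
    "PX slave": "SYS stats job",  # Share request queued behind the Exclusive request
    "SYS stats job": "QC",        # Exclusive request blocked by the QC's Share lock
}
print(find_cycle(waits_for))  # ['QC', 'PX slave', 'SYS stats job']

# After the SYS stats job session is killed, the slaves are no longer blocked.
after_kill = {"QC": "PX slave"}
print(find_cycle(after_kill))  # None
```

Removing any one session from the cycle breaks it, which is why either the stats job or the query can be the one to go.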
Back To Normal
Here is the DB status minutes after I cleared the SYS stats job session. All but one of the queries have a running time longer than 1 minute. At the time of writing this slide (within one hour), all user queries have completed and the only visible active sessions are from the Oracle performance analyzer. As a side note, if a stats job is killed, it is better to restart it later: missing or incomplete stats could cause other performance issues.