CRS RAC Troubleshooting
Krishnadev Telikicherla
Cluster & Parallel Storage Technology
Oracle Corporation
Topics:
Defining the Issue
Layers
Which layers are involved in the issue?
• Oracle Clusterware
  – CRS daemon
  – CSS daemon
  – HangCheckTimer (Linux) / Oprocd (non-Linux platforms)
  – EVM
  – OCR
  – Voting disk
• General RDBMS
• Operating System
• Hardware
Defining the Issue
Cause vs. Effects
Causes:
– Resource issues
– Oracle issues
– OS issues
Effects:
– Hangs/Spins
– Instance Crashes and Evictions
– Node Reboots and Evictions
– Oracle Errors (ORA-600, ORA-7445, ORA-29740)
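As a small illustration of spotting the error effects listed above, the Python sketch below counts ORA-600, ORA-7445 and ORA-29740 occurrences in alert log lines. It assumes the zero-padded form (ORA-00600, ORA-07445) that normally appears in alert logs; the function name and the sample lines are hypothetical, not part of any Oracle tool.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the three error codes named above, allowing zero padding
# (e.g. "ORA-00600", "ORA-07445", "ORA-29740") but not longer codes such as ORA-06000.
CRITICAL_ORA = re.compile(r"\bORA-0*(600|7445|29740)\b")


def count_critical_errors(lines: Iterable[str]) -> Counter:
    """Count ORA-600, ORA-7445 and ORA-29740 occurrences in log lines (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        for code in CRITICAL_ORA.findall(line):
            counts[f"ORA-{code}"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines; in practice, iterate over an open alert log file.
    sample = [
        "Tue Jan 15 10:23:45 2008",
        "ORA-00600: internal error code, arguments: [], [], []",
        "ORA-07445: exception encountered: core dump",
        "ORA-29740: evicted by member 1, group incarnation 12",
    ]
    print(count_critical_errors(sample))
```

Counting per code gives a quick first view of which effect dominates before you start building a timeline.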
Defining the Issue
Description
When describing the problem while creating the SR
via Metalink, use phrases that will help identify
known issues in bugs or other Metalink content.
In the body of the SR, be as detailed as possible
about the environment.
Nobody knows the system better than you do.
Also talk to the system administrator about OS and
network related issues.
Creating a Timeline
Hang or slowdown
Is it a Hang or a Slowdown?
Check:
Several system states taken over a short period
of time, to see whether anything changes
V$SESSION_WAIT where wait_time=0
Overall machine load, including cpu,
memory, swap, I/O
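One way to apply the V$SESSION_WAIT check is to sample it twice, a few seconds apart (for example with `SELECT sid, seq#, event, wait_time FROM v$session_wait WHERE wait_time = 0`), and compare the samples. SEQ# is incremented for each new wait, so a session that is still waiting with the same SEQ# and event has made no progress on that wait. The Python sketch below is a rough heuristic under those assumptions; the idle-event list is illustrative and incomplete, and `WaitRow`, `stuck_sessions` and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative (incomplete) set of idle events that should not count as "stuck".
IDLE_EVENTS = {
    "SQL*Net message from client",
    "rdbms ipc message",
    "pmon timer",
    "smon timer",
}


@dataclass(frozen=True)
class WaitRow:
    sid: int
    seq: int        # SEQ#: incremented each time the session starts a new wait
    event: str
    wait_time: int  # WAIT_TIME = 0 means the session is currently waiting


def _active_waiters(rows: List[WaitRow]) -> Dict[int, WaitRow]:
    """Map SID -> row for sessions currently in a non-idle wait."""
    return {r.sid: r for r in rows if r.wait_time == 0 and r.event not in IDLE_EVENTS}


def stuck_sessions(first: List[WaitRow], second: List[WaitRow]) -> List[int]:
    """SIDs that are in the same non-idle wait (same SEQ# and event) in both samples."""
    before = _active_waiters(first)
    after = _active_waiters(second)
    return sorted(
        sid for sid, row in after.items()
        if sid in before and before[sid].seq == row.seq and before[sid].event == row.event
    )


def classify(first: List[WaitRow], second: List[WaitRow]) -> str:
    """Rough heuristic: 'possible hang' if no active waiter made progress, else 'slowdown'."""
    waiting = _active_waiters(second)
    if not waiting:
        return "no active waiters"
    if len(stuck_sessions(first, second)) == len(waiting):
        return "possible hang"
    return "slowdown"
```

A "possible hang" result is only a prompt to collect system states and hanganalyze output; it should be confirmed against the machine load figures above.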
Performance
Single Process or Single Statement
Find the wait event
10046 level 12
- oradebug setorapid <oracle_pid>
- oradebug event 10046 trace name context forever, level 12
- oradebug tracefile_name
Explain plan
10053 (optimizer) trace if plan problems are found
V$SESSTAT
Truss/trace/dbx/pstack if OS-related
problems are suspected
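Once `oradebug tracefile_name` has given you the 10046 trace file, a quick way to find the dominant wait event is to total the WAIT lines. The Python sketch below assumes the 10g-style WAIT line format shown in its comment, where `ela=` is in microseconds; the function names are hypothetical, and tkprof remains the standard tool for a full report.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed 10g-style WAIT line, for example:
#   WAIT #3: nam='db file sequential read' ela= 1234 file#=4 block#=567 blocks=1
WAIT_LINE = re.compile(r"^WAIT #\d+: nam='(?P<event>[^']*)' ela=\s*(?P<ela>\d+)")


def summarize_waits(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {event: (number_of_waits, total_ela)} for the WAIT lines in a trace."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for line in lines:
        match = WAIT_LINE.match(line)
        if match:
            entry = totals[match.group("event")]
            entry[0] += 1
            entry[1] += int(match.group("ela"))
    return {event: (count, ela) for event, (count, ela) in totals.items()}


def top_events(lines: Iterable[str], n: int = 5) -> List[Tuple[str, Tuple[int, int]]]:
    """The n events with the largest total elapsed wait time."""
    summary = summarize_waits(lines)
    return sorted(summary.items(), key=lambda item: item[1][1], reverse=True)[:n]
```

Usage: `with open(trace_path) as fh: print(top_events(fh))`, where `trace_path` is the file name reported by `oradebug tracefile_name`.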
Instance Slowdown
Statspack / AWR
OS performance statistics - cpu, memory,
and I/O
Characteristics:
– Related to a particular job?
– Certain time of day?
– What’s changed?
Multi-Instance Slowdowns
Debugging Techniques
v$session_wait
System states from all nodes
10046 level 12 trace of the hung process
ORADEBUG
Lock layer and DLM tracing
Get any traces:
DLM traces
Background processes, alert logs, and init.ora
User traces
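When traces and alert logs come from several nodes, merging them into a single timeline makes it easier to see which node reported a problem first. The Python sketch below assumes the 10g alert log layout, where a timestamp line such as `Tue Jan 15 10:23:45 2008` precedes the messages it applies to, and that node clocks are reasonably synchronized; all names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed 10g alert log timestamp line, e.g. "Tue Jan 15 10:23:45 2008".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

Entry = Tuple[datetime, str, str]  # (timestamp, node, message)


def parse_timestamp(line: str) -> Optional[datetime]:
    """Return the timestamp if the line is a timestamp header, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def alert_entries(node: str, lines: Iterable[str]) -> List[Entry]:
    """Attach each non-empty message line to the most recent timestamp above it."""
    entries: List[Entry] = []
    current: Optional[datetime] = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current = ts
        elif current is not None and line.strip():
            entries.append((current, node, line.rstrip()))
    return entries


def merge_timeline(logs: Dict[str, Iterable[str]]) -> List[Entry]:
    """Merge per-node alert logs into one list ordered by timestamp."""
    merged: List[Entry] = []
    for node, lines in logs.items():
        merged.extend(alert_entries(node, lines))
    # sorted() is stable, so entries with equal timestamps keep their insertion order.
    return sorted(merged, key=lambda entry: entry[0])
```

The same approach extends to CRS logs, which use a different timestamp format and would need their own parser.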
Debugging and Diagnostics
ORADEBUG and Tools
Hang analyze:
– hanganalyze <level>
Note 301137.1 – OS Watcher User Guide
Note 135714.1 – Script to Collect RAC
Diagnostic Information (racdiag)
Gathering Data
Best Practices
Single most important step
There is never too much data, but including lots of
useless data can increase the time needed to download
and process it.
Always err on the side of getting too much data, but be
aware of the impact on resolution time.
Too little data increases resolution time more than too
much data does.
Always include a readme.txt file that explains the
contents of the provided files.
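One way to produce the readme.txt is to generate the file listing automatically and then fill in the descriptions by hand. The Python sketch below assumes all files for the upload sit in one directory; `write_readme` and the placeholder text are hypothetical.

```python
from pathlib import Path
from typing import Dict


def write_readme(upload_dir: str, descriptions: Dict[str, str]) -> Path:
    """Write readme.txt listing every file under upload_dir with its size and description.

    Keys of `descriptions` are paths relative to upload_dir, using "/" separators.
    Files without a description get a TODO placeholder to fill in by hand.
    """
    root = Path(upload_dir)
    lines = ["Contents of this upload", ""]
    files = sorted(p for p in root.rglob("*") if p.is_file() and p.name != "readme.txt")
    for path in files:
        rel = path.relative_to(root).as_posix()
        desc = descriptions.get(rel, "TODO: describe this file")
        lines.append(f"{rel} ({path.stat().st_size} bytes): {desc}")
    readme = root / "readme.txt"
    readme.write_text("\n".join(lines) + "\n")
    return readme
```

Listing sizes as well as names also helps the reader judge download time before fetching everything.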
Gathering Data
Processes
Always get stacks from processes that seem
to be spinning, hanging or unresponsive:
– oradebug
– gdb
– pstack
ps and top output can be very useful when
trying to determine whether a process exhibits
issues such as memory leaks, spinning or
hanging
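The ps data above can be turned into a rough first check. Sampling one process periodically (for example with `ps -o rss,time -p <pid>`) gives resident memory and cumulative CPU time: RSS that grows in every sample suggests a leak, CPU time growing at about one second per second suggests spinning, and no CPU growth at all suggests a hang. The Python sketch below encodes that reasoning; the thresholds and function names are illustrative assumptions, not Oracle guidance.

```python
from typing import List, Tuple

Sample = Tuple[float, int, float]  # (wall_clock_seconds, rss_kb, cpu_seconds) for one PID


def parse_ps_time(value: str) -> int:
    """Convert a ps TIME value such as '1-02:03:04', '02:03:04' or '03:04' to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in value.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours (and minutes) with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def diagnose(samples: List[Sample], spin_ratio: float = 0.9) -> List[str]:
    """Heuristically flag a process from periodic samples, oldest first."""
    if len(samples) < 2:
        return ["insufficient data"]
    findings = []
    rss = [s[1] for s in samples]
    if all(later > earlier for earlier, later in zip(rss, rss[1:])):
        findings.append("possible memory leak (RSS grows in every sample)")
    elapsed = samples[-1][0] - samples[0][0]
    cpu_used = samples[-1][2] - samples[0][2]
    if elapsed > 0 and cpu_used / elapsed >= spin_ratio:
        findings.append("possible spin (using about one full CPU)")
    elif cpu_used == 0:
        findings.append("no CPU progress (possible hang; collect stacks)")
    return findings or ["no obvious issue"]
```

Any process flagged this way is a candidate for the stack collection (oradebug, gdb, pstack) described above.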
Gathering Data
RAC
For instance evictions, please review Metalink
note 219361.1
See Metalink note 203226.1 : RAC Survival
Kit: Real Application Clusters Troubleshooting
and Information
See Metalink note 289690.1 : Data Gathering
for Troubleshooting RAC and CRS issues
Gathering Data
Tools
RDA – system and Oracle configuration information
racdiag – modifiable SQL script for gathering RAC data. See
Metalink note 135714.1 “Script to Collect RAC Diagnostic
Information”
OSW – OS Watcher gathers top, slabinfo, netstat and ps data
at configurable intervals. See Metalink note 301137.1 “OS
Watcher User Guide”
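To show the idea behind interval-based collection, the Python sketch below runs a set of OS commands repeatedly and appends timestamped output to one file per command. It is not OS Watcher and does not reproduce its output format. The command list is an assumed Linux example (add `cat /proc/slabinfo` if you run it as root), and `collect` is a hypothetical name.

```python
import subprocess
import time
from datetime import datetime
from pathlib import Path
from typing import Dict, List

# Assumed Linux commands; adjust for your platform.
DEFAULT_COMMANDS: Dict[str, List[str]] = {
    "top": ["top", "-b", "-n", "1"],
    "ps": ["ps", "-eo", "pid,ppid,rss,time,stat,comm"],
    "netstat": ["netstat", "-s"],
}


def collect(out_dir: str, commands: Dict[str, List[str]], interval: float, samples: int) -> int:
    """Run each command `samples` times, `interval` seconds apart.

    Output is appended to <out_dir>/<name>.dat, each sample preceded by a timestamp line.
    Returns the number of samples written across all commands.
    """
    root = Path(out_dir)
    root.mkdir(parents=True, exist_ok=True)
    written = 0
    for i in range(samples):
        stamp = datetime.now().strftime("### %a %b %d %H:%M:%S %Y")
        for name, argv in commands.items():
            try:
                result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
                output = result.stdout
            except (OSError, subprocess.TimeoutExpired) as exc:
                output = f"failed: {exc}\n"  # record the failure rather than stopping
            with open(root / f"{name}.dat", "a") as fh:
                fh.write(f"{stamp}\n{output}")
            written += 1
        if i < samples - 1:
            time.sleep(interval)
    return written
```

Usage: `collect("/tmp/osw_sketch", DEFAULT_COMMANDS, interval=30, samples=120)` samples for about an hour. Keep it running before and through the problem window so the timeline has data on both sides of the event.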
Gathering Data
CRS 10.2.0.x (continued)
CRS and other resource issues:
– ORA_CRS_HOME
    log/<hostname>/cssd/oclsmon
    log/<hostname>/cssd
    log/<hostname>/client
    log/<hostname>/crsd
    log/<hostname>/evmd
    log/<hostname>/racg
– ORACLE_HOME (rdbms)
    racg/dump
    ORACLE_BASE/<db_name>/hdump
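For releases or situations where the collection tool is not available, the CRS log directories listed above can be archived by hand. The Python sketch below tars whichever of those directories exist under `$ORA_CRS_HOME/log/<hostname>` and reports the missing ones. It assumes `<hostname>` is the short host name, and the function names are hypothetical. `cssd/oclsmon` is included because the `cssd` directory is added recursively.

```python
import socket
import tarfile
from pathlib import Path
from typing import List, Optional

# Subdirectories listed above, relative to $ORA_CRS_HOME/log/<hostname>.
# cssd/oclsmon is picked up automatically because cssd is added recursively.
CRS_LOG_SUBDIRS = ["cssd", "client", "crsd", "evmd", "racg"]


def crs_log_dirs(crs_home: str, hostname: Optional[str] = None) -> List[Path]:
    """Build the CRS log directory paths for one node (short hostname assumed)."""
    host = hostname or socket.gethostname().split(".")[0]
    base = Path(crs_home) / "log" / host
    return [base / sub for sub in CRS_LOG_SUBDIRS]


def archive_crs_logs(crs_home: str, tar_path: str, hostname: Optional[str] = None) -> List[str]:
    """Add each existing CRS log directory to a gzipped tar; return the missing ones."""
    missing = []
    with tarfile.open(tar_path, "w:gz") as tar:
        for directory in crs_log_dirs(crs_home, hostname):
            if directory.is_dir():
                # Store paths relative to ORA_CRS_HOME, e.g. log/node1/cssd/...
                tar.add(directory, arcname=directory.relative_to(crs_home).as_posix())
            else:
                missing.append(str(directory))
    return missing
```

Run it on every node and name each archive after the node, so the files stay distinguishable once they are uploaded together.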
Gathering Data
Tools (continued)
Starting with 10.2.0.1, $ORA_CRS_HOME/bin/diagcollection.pl collects all
RAC-relevant files (run it as root)
oracle10@stnsp010>./diagcollection.pl
Production Copyright 2004, 2005, Oracle. All rights reserved
Cluster Ready Services (CRS) diagnostic collection tool
diagcollection
--collect
[--crs] For collecting crs diag information
[--oh] For collecting oracle home diag information
[--ob] For collecting oracle base diag information
[--all] Default.For collecting all diag information
NOTE:
1. You can also do the following
./diagcollection.pl --collect --crs --oh
2. ORA_CRS_HOME,ORACLE_HOME and ORACLE_BASE env variables
need to be set.
--clean cleans up the diagnosability
information gathered by this script
--coreanalyze extracts information from core files
and stores it in a text file
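The help text above names two prerequisites: ORA_CRS_HOME, ORACLE_HOME and ORACLE_BASE must be set, and the script should run as root. The Python sketch below checks both before it launches `diagcollection.pl --collect`, passing any extra options such as `--crs` or `--oh` through unchanged. The wrapper and its names are hypothetical. It assumes a POSIX system (for `os.geteuid`) and that the script is executable.

```python
import os
import subprocess
from typing import List, Mapping, Optional

REQUIRED_VARS = ("ORA_CRS_HOME", "ORACLE_HOME", "ORACLE_BASE")


def missing_env(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def run_diagcollection(extra_args: Optional[List[str]] = None) -> int:
    """Check prerequisites, then run $ORA_CRS_HOME/bin/diagcollection.pl --collect."""
    problems = missing_env(os.environ)
    if problems:
        raise RuntimeError(f"set these variables first: {', '.join(problems)}")
    if os.geteuid() != 0:
        raise PermissionError("diagcollection.pl should be run as root")
    script = os.path.join(os.environ["ORA_CRS_HOME"], "bin", "diagcollection.pl")
    argv = [script, "--collect"] + (extra_args or [])
    return subprocess.run(argv).returncode
```

Usage: `run_diagcollection(["--crs", "--oh"])` corresponds to the `./diagcollection.pl --collect --crs --oh` example in the help text.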
Testcases
Rediscovery
Engaging Oracle Support
Examples
Questions?