



Crash Dump Analysis on Solaris


Introduction
This document provides a high-level introduction to basic crash dump handling on Sun servers.
A sample procedure is included which can be adapted to any organization for uniform handling
of Sun server crashes. The term 'Crash Dump Analysis' may be a bit misleading in the context
of this document. Actual analysis of the system crash dump using a debugger is not covered;
Sun has an excellent instructor-led training class on this topic. Most System Administrators
will never have to use a debugger on a crash dump, since this is typically a service provided
by Sun under a service contract. In light of this, this document covers introductory material
on server crashes and on preparing the necessary information to present to Sun when a service
call is opened.

What Happens After a Crash?


When a panic occurs on a Solaris system, a message describing the error is usually echoed to the
system console. The system then attempts to write the contents of physical memory to
a predetermined dump device, usually a dedicated disk partition or the system swap
partition. Once this is complete, the system reboots.
As the system reboots, a startup script calls the savecore utility, if enabled. This
command performs a few tasks on the memory dump. First, it checks that the
crash dump corresponds to the running operating system. If the dump passes this test, savecore
copies the crash dump from the dump device to the directory
/var/crash/`uname -n`, or to some other predetermined directory. The dump is written out to two
files, unix.n and vmcore.n, where n is a sequential integer identifying this particular crash.
Finally, savecore logs a reboot message using the LOG_AUTH syslog facility.
A sample memory dump of a system named testbox appears as follows:
# ls -l /var/crash/testbox
total 1544786
-rw-r--r-- 1 root root 2 Jun 15 16:02 bounds
-rw-r--r-- 1 root root 670367 Jun 15 16:00 unix.0
-rw-r--r-- 1 root root 790110208 Jun 15 16:02 vmcore.0
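The numbering in such a listing can be read back with a small helper. The following is a minimal sketch, not part of savecore itself: the function name latest_dump_index is hypothetical, and it assumes that the bounds file holds the next sequence number savecore will use, so the most recent dump is that value minus one. It also assumes a POSIX shell (ksh or bash) rather than the legacy Bourne /bin/sh.

```shell
# Print the sequence number of the most recent dump in a crash directory.
# ASSUMPTION: the bounds file holds the NEXT number savecore will use.
latest_dump_index() {
    dir=$1
    [ -r "$dir/bounds" ] || return 1
    next=$(cat "$dir/bounds")
    # Reject anything that is not a plain non-negative integer.
    case $next in
        ''|*[!0-9]*) return 1 ;;
    esac
    # A value of 0 would mean no dump has been saved yet.
    [ "$next" -gt 0 ] || return 1
    echo $((next - 1))
}
```

For the testbox listing above, `latest_dump_index /var/crash/testbox` would be expected to print 0 if the assumption about bounds holds.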
Various options related to performing the actual crash dump and the savecore functions can be
set using the dumpadm command. This utility allows the administrator to determine the dedicated
dump device, the directory savecore will write to, and whether or not savecore runs at all. In
addition, the /etc/init.d/savecore initialization script is the actual script run at bootup which
executes savecore.
Typical output from dumpadm for the system testbox appears as follows:
# dumpadm
Dump content: kernel pages
Dump device: /dev/dsk/c0t0d0s3 (swap)
Savecore directory: /var/crash/testbox
Savecore enabled: yes
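A check of this output is easy to script. The sketch below is an illustration only: the function name savecore_enabled is hypothetical, it relies solely on the "Savecore enabled:" line format shown above, and it assumes a POSIX grep (on older Solaris releases that may mean /usr/xpg4/bin/grep rather than /usr/bin/grep).

```shell
# Read dumpadm-style output on stdin; succeed only if savecore is enabled.
savecore_enabled() {
    # Match a line such as "Savecore enabled: yes", tolerating extra spaces.
    grep -Eq '^[[:space:]]*Savecore enabled:[[:space:]]*yes[[:space:]]*$'
}
```

It could be used as `dumpadm | savecore_enabled || echo "savecore is disabled"`.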

What Causes a Crash?


Fatal operating system errors can be caused by bugs in the operating system, its associated
device drivers and loadable modules, or by faulty hardware. Whatever the cause, the crash dump
itself provides invaluable information to a Sun Support Engineer (if you are lucky enough to
have a support contract) to aid in diagnosing the problem.

What To Do In Case of a Crash?


Any action taken when a Sun server crashes is obviously going to depend on the local policies
and procedures in place at your organization. The presence of a Sun Service Agreement and its
level will also affect your response to a crash.
What follows is an example of a typical procedure for dealing with a crash. This procedure is
based on real-world experience but does not reflect any particular real-world
organization. For the purposes of illustration, assume that the organization in this example has a
Platinum level contract with Sun.
The first step in analysing a crash is to determine if the necessary evidence is present in order to
find a root cause. To begin, scan /var/adm/messages for any warnings or errors. Many crashes
will leave evidence in the logs, such as which CPU caught the panic or which memory DIMM
had errors. Often Sun engineers can diagnose the cause of a crash based on this information
alone.
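The log scan can be narrowed with a simple filter. This is a rough sketch under stated assumptions: the function name scan_messages and the three keywords are illustrative choices, not a list of strings Sun looks for, and a POSIX grep is assumed (possibly /usr/xpg4/bin/grep on older Solaris releases).

```shell
# Print numbered lines from a log file that mention panics, warnings or errors.
scan_messages() {
    # Default to the Solaris system log when no file is given.
    logfile=${1:-/var/adm/messages}
    # -i: ignore case, -n: prefix each match with its line number.
    grep -i -n -E 'panic|warning|error' "$logfile"
}
```

Running `scan_messages` with no argument reads /var/adm/messages; a rotated log can be passed as the first argument instead.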
Next, check /var/crash/`uname -n` for a crash dump. If one is not present, confirm that
savecore is enabled. Try running savecore -v if it was not previously enabled. It would also
be a good idea to run prtdiag at this time to determine if there are any egregious hardware
faults.
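Checking for a usable dump can be scripted as well. The following sketch lists the sequence numbers for which both halves of a dump are present; the function name list_complete_dumps is hypothetical, and treating an empty vmcore file as missing is an assumption made here for illustration.

```shell
# List the sequence numbers for which both unix.N and vmcore.N exist.
list_complete_dumps() {
    dir=$1
    for u in "$dir"/unix.*; do
        [ -f "$u" ] || continue        # the glob matched nothing
        n=${u##*.}                     # text after the last dot
        # Require a non-empty vmcore file with the same number.
        [ -s "$dir/vmcore.$n" ] && echo "$n"
    done
    return 0
}
```

An empty result from `list_complete_dumps /var/crash/$(uname -n)` suggests that no complete dump has been saved.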
Armed with this information, open a call with Sun. Take note of the case ID number. For
purposes of this example the case ID will be 123456. The Sun engineer may be able to diagnose
the fault based on the panic strings or error messages from /var/adm/messages, or they may
require the actual crash dump for analysis. Luckily there are two tools, CTEact (ACT) and
explorer, which cull useful information from the crash dump and the system, making it
unnecessary to upload the actual crash dump (which could be gigabytes in size).
Use the following steps to generate the ACT analysis of that core file to send to Sun:
Create a temporary upload directory. This directory will hold the output of these programs and
will eventually be uploaded to Sun.
# mkdir /tmp/upload
# cd /var/crash/`uname -n`
# /opt/CTEact/bin/act -n unix.0 -d vmcore.0 > /tmp/upload/act_out
Install (if necessary) and run the explorer script as follows:
# ./explorer
The explorer script will prompt you for some information. Do not select email output. The script
will create both a subdirectory and a uuencoded file containing the system audit. Copy the
uuencoded system audit output to the /tmp/upload directory. For example:
# cp explorer.80b0c1cc.uu /tmp/upload
Tar and compress the output for upload to Sun:
# cd /tmp
# mv upload 123456
# tar -cvf 123456.tar 123456
# gzip 123456.tar
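The packaging steps above can be wrapped in one function so that every case is bundled the same way. This is a sketch rather than a Sun-supplied tool: the name package_case and its three arguments are hypothetical, and it copies the staging directory instead of renaming it, which differs slightly from the manual steps.

```shell
# Bundle a staging directory into <case_id>.tar.gz inside work_dir.
package_case() {
    case_id=$1      # e.g. 123456
    staging=$2      # directory holding act_out and the explorer output
    work_dir=$3     # where the archive is written, e.g. /tmp
    [ -d "$staging" ] || { echo "no such directory: $staging" >&2; return 1; }
    # Refuse to overwrite an earlier bundle for the same case.
    [ ! -e "$work_dir/$case_id" ] || { echo "$work_dir/$case_id already exists" >&2; return 1; }
    # Copy rather than move, so the staging directory is left untouched.
    cp -R "$staging" "$work_dir/$case_id" || return 1
    ( cd "$work_dir" && tar -cf "$case_id.tar" "$case_id" && gzip -f "$case_id.tar" )
}
```

For the example case, `package_case 123456 /tmp/upload /tmp` would leave the archive in /tmp/123456.tar.gz.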
Finally, FTP the output to Sun:
# ftp sunsolve.sun.com
ftp> username: ftp
ftp> password: <your email address>
ftp> bin
ftp> put 123456.tar.gz
ftp> quit
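The same session can be driven without typing at the prompt by feeding commands to ftp -n, which turns off auto-login so that the user command can supply the credentials. The sketch below only prints the command stream; the function name ftp_upload_commands is hypothetical, and whether the host named above still accepts anonymous uploads, or expects a particular upload directory, is not something this document states.

```shell
# Print the command stream for a non-interactive anonymous FTP upload.
ftp_upload_commands() {
    archive=$1      # e.g. 123456.tar.gz
    email=$2        # used as the anonymous password
    printf 'user ftp %s\n' "$email"
    printf 'binary\n'
    printf 'put %s\n' "$archive"
    printf 'quit\n'
}
```

One possible use, run from the directory holding the archive, is `ftp_upload_commands 123456.tar.gz you@example.com | ftp -n sunsolve.sun.com`, where the email address is a placeholder.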
At this point you can remove the temporary upload directory:
# /bin/rm -rf /tmp/123456
Retain the original core files in /var/crash/`uname -n` until the case is closed. Once the case is
closed by Sun, remove these files to free up disk space.

Conclusion
Those who wish to do more than simply upload information to Sun and let them analyse the
crash dump should strongly consider taking Sun's "Core Dump Analysis" course.
For more information, particularly on self-analysis of crash dumps, see Princeton University's
Solaris 2.x Core Dump Analysis (www.princeton.edu/~unix/Solaris/troubleshoot/savecore.html).

