Problem Systems
Directions
Keep a few things in mind when running vmstat. Running vmstat itself adds load to the
system, so I would not recommend an interval shorter than five seconds. On Solaris the
first line of vmstat output reports averages since boot rather than current activity; it
sometimes looks close to the later samples, but I would recommend ignoring it.
Take snapshots at different times and compare them, since systems have busy periods.
% vmstat 5 10
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr s1 s1 s1 sd in sy cs us sy id
31 6 0 4542592 1747840 378 3631 94 262 245 0 0 0 25 25 0 2365 17793 4086 62 18 20
52 2 0 4446880 1556736 325 2474 35 300 294 0 0 0 0 0 0 1681 22277 2279 83 11 6
70 1 0 4451632 1557256 567 5702 35 404 390 0 0 0 2 2 0 1879 24490 2432 79 17 4
56 2 0 4461664 1562864 553 6001 60 289 265 0 0 0 1 1 0 1611 16779 2288 69 17 14
58 2 0 4454664 1562192 450 3765 179 374 356 0 0 0 0 0 0 1848 20948 2345 84 14 2
62 2 0 4457192 1561640 360 3179 9 504 491 0 0 0 0 0 0 1789 22680 2201 83 13 5
12 2 0 4448856 1553080 305 2701 22 216 208 0 0 0 1 2 0 1499 12331 1934 84 10 7
97 2 0 4448936 1556288 615 5030 43 582 571 0 0 0 1 1 0 2128 29096 2560 84 16 0
54 1 0 4453744 1559144 770 8629 3 286 272 0 0 0 0 0 0 1988 20503 2553 78 21 1
6 2 0 4459680 1561424 272 2773 27 280 265 0 0 0 1 1 0 1373 10061 2027 63 9 28
CPU is a bottleneck: on a 4-CPU system a run queue (procs r) over 16 is a problem, and
this system is above that in most samples.
Memory is okay: the scan rate (sr) is 0 and there is not much paging activity.
I/O is being blocked: there are non-zero values in the procs b column, so we need to look
at the disks and controllers with the "iostat" command to get more information.
CPUs look okay here: no CPUs are blocked waiting for I/O and there is no backlog of
runnable processes.
Memory is in short supply here: the system is doing memory-intensive operations and is
very close to needing more memory, with sr consistently above 150.
CPUs are blocked waiting for I/O, so they show up as idle (cpu id).
Memory on this system looks okay: some paging but no scanning (sr).
I/O: this system has blocked processes and no runnable ones, so there are definite
problems here; again, run the "iostat" command to get more information.
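
The rules of thumb above can be applied mechanically. The following is a minimal sketch, not a tuned monitoring tool: vmstat_triage is a hypothetical helper name, the CPU count is passed in by hand, and the column positions assume the Solaris layout shown in the sample output (r in column 1, b in column 2, sr in column 12). It skips the first sample and counts how many of the remaining samples trip each check.

```shell
# vmstat_triage NCPU  (hypothetical helper; reads vmstat output on stdin)
# Thresholds follow the text: run queue over 4 x NCPU, any blocked
# processes, any non-zero scan rate.
vmstat_triage() {
    awk -v ncpu="$1" '
        # Data rows start with a number; the two header rows do not.
        $1 ~ /^[0-9]+$/ {
            n++
            if (n == 1) next            # ignore the first sample
            rows++
            if ($1 > 4 * ncpu) cpu++    # r:  runnable processes
            if ($2 > 0)        io++     # b:  processes blocked on I/O
            if ($12 > 0)       mem++    # sr: page scan rate
        }
        END {
            printf "samples=%d cpu_busy=%d io_blocked=%d mem_scanning=%d\n",
                   rows, cpu, io, mem
        }'
}
```

A typical call would be "vmstat 5 10 | vmstat_triage 4" on a 4-CPU system. A count that is close to the number of samples suggests a sustained condition rather than a momentary spike.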
Description
Keeping an eye on system performance is critical for many applications. The vmstat
program provides detailed if cryptic information about the system load.
Directions
To keep an eye on the system, use the following command:
vmstat 1
The 1 specifies the number of seconds between updates. Below is sample output from
vmstat
To get a quick feel for the system load, look at the last and first columns. The last column
is the CPU idle (larger is better); a very busy system will show zero. The first column is
the number of processes waiting for the cpu. The uptime load is an average of this value
over periods of time (see the recipe Determine the system CPU load using uptime). The
number of waiting processes should be less than 4 times the number of processors in the
system for optimal load.
The swap and free columns under memory display the amount of available swap space and
free physical memory in KB. The pi and po columns show the kilobytes paged in and out of
memory, respectively. If these values are consistently very high, they may indicate a need
for more physical memory.
The columns under the disk category show the number of disk operations per second. The
s columns represent different disks on the system.
Description
Using the uptime command, a measure of the system's processing load can be
determined.
Directions
Running uptime on a system provides three values that relate to the system's load
average. They represent the load average during the last 1 minute, 5 minutes, and 15
minutes, respectively. Each value represents an average of the number of processes
waiting to run on a CPU. The ideal load for a system is equal to the number of processors
in the system.
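
As a rough illustration of that comparison, the sketch below pulls the 1-minute load average out of a line of uptime output and compares it with a CPU count supplied by hand. load_status is a hypothetical helper name, and the code assumes the line contains the usual "load average:" label followed by three comma-separated values.

```shell
# load_status NCPU  (hypothetical helper; reads one line of uptime output)
# Prints the 1-minute load average and "ok" or "over".
load_status() {
    awk -v ncpu="$1" '
        {
            # Keep only what follows "load average:" and split on commas.
            sub(/.*load average: */, "")
            split($0, avg, /, */)
            state = (avg[1] + 0 > ncpu) ? "over" : "ok"
            printf "%s %s\n", avg[1], state
        }'
}
```

For example, "uptime | load_status 4" reports "over" when the 1-minute average exceeds the four processors assumed here.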
Description
Different Sun platforms have different default device targets for the CD-ROM drive. This
simple command will display the drives in the system and allow you to easily determine
the CD-ROM target id.
Directions
Run the command:
iostat -En
In the output, the description of the product usually contains CD. In the following
example output, the CD-ROM target id is c0t3d0:
Description
It is sometimes useful to delete files conditionally based on their age. This recipe
describes a procedure for deleting aged files using the UNIX find command.
Directions
The find command has a -newer expression which compares the found files against a
reference file. It returns files with a modification time newer than the reference file. To
find older files, precede the -newer expression with the negation operator !.
If you do not have a file to use with an appropriate timestamp, you can create a file with
this recipe: Create/modify a UNIX file with an arbitrary timestamp.
Given the reference file /tmp/timeref, you can find all older files in the current working
directory and beneath using the following command:
find . ! -newer /tmp/timeref -exec ls -l {} \; | more
This command is safe and will only provide a long listing of the target files, showing the
files one page at a time. Using this command, you can make sure that the command
syntax is correct before performing the irreversible deletion.
Once you are comfortable with the find syntax, you can change the command to cause
deletion by replacing ls -l with rm:
find . ! -newer /tmp/timeref -exec rm {} \;
Be very careful using this command. There is no easy way to undelete files in UNIX.
Using the long file list version of this command first provides a great deal of safety since
you can see exactly what will happen. Changing to a directory and performing the find
command within it using the . as the path provides some additional protection from
mistyping the path.
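
One way to rehearse the list-then-delete sequence safely is in a throwaway directory. The sketch below assumes mktemp is available and uses made-up file names; prune_demo is a hypothetical name. It also excludes the reference file by name, because ! -newer matches anything that is not newer than the reference, including the reference itself when it lives inside the search path.

```shell
# prune_demo: hypothetical walk-through in a scratch directory.
# The parentheses run the body in a subshell, so the cd does not leak.
prune_demo() (
    demo=$(mktemp -d) || exit 1
    cd "$demo" || exit 1

    touch -t 200001010000 old.log     # older than the reference
    touch -t 200006010000 timeref     # the reference file
    touch -t 200012310000 new.log     # newer than the reference

    # Step 1: list what would be removed.
    find . -type f ! -newer timeref ! -name timeref -exec ls -l {} \;

    # Step 2: same expression, with rm in place of ls -l.
    find . -type f ! -newer timeref ! -name timeref -exec rm {} \;

    ls                                # show what survived
    cd / && rm -rf "$demo"
)
```

Running prune_demo prints a long listing for old.log, then the names of the files that remain.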
Description
The touch command in UNIX creates a file if it doesn't exist or updates the modification
time of an existing file to the current time. An option of the touch command allows the
modification timestamp to be set to any arbitrary time.
Directions
To change the modification time (the time displayed in a long listing) of a file called
testfile to November 18, 2000, 2:30 PM, use the following command:
touch -t 200011181430 testfile
This will alter the modification time of an existing testfile or, if not present, will create an
empty file with that timestamp.
The timestamp format is:
[[CC]YY]MMDDhhmm[.SS]
Date and time elements in square brackets are optional, so the minimum timestamp
includes month, day, hour, and minute.
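
A quick way to confirm that a timestamp landed where intended is to bracket it with two marker files and let find compare them. This is only a sketch: touch_demo and the marker file names are hypothetical, and mktemp is assumed to be available. The second marker also exercises the optional .SS seconds field.

```shell
# touch_demo: hypothetical check that touch -t set the expected time.
touch_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1

    touch -t 200011181430 testfile    # Nov 18, 2000, 2:30 PM
    touch -t 200011181429 before      # one minute earlier
    touch -t 200011181431.30 after    # 2:31:30 PM, using the seconds field

    # testfile should be newer than "before" and not newer than "after".
    find . -name testfile -newer before ! -newer after -print

    cd / && rm -rf "$dir"
)
```

If the timestamp was applied as expected, touch_demo prints ./testfile.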
Description
Errors in a filesystem can prevent a system from booting properly and are commonly
caused by improper system shutdown. The fsck (filesystem check) utility can identify and
repair errors.
Directions
To check and interactively repair filesystem errors on the device c0t0d0s1, run the
following command as root:
fsck /dev/rdsk/c0t0d0s1
Be careful when specifying the path to the device. There are two links to the same
physical device, one in /dev/dsk and the other in /dev/rdsk. The rdsk represents the media
as a raw device which is appropriate for low-level operations such as fsck and newfs. The
dsk represents a cooked filesystem that is appropriate for mounting and other high-level
operations.
Description
Having an appropriate amount of swap space is important for optimal system
performance. Simple commands allow monitoring swap space utilization.
Directions
To get a summary of total system swap space, use the swap command:
swap -s
total: 597744k bytes allocated + 99760k reserved = 697504k used, 95216k available
The output of the swap -s command shows the amount of swap space used (697504KB in
this example) and available (95216KB), and further breaks down the used swap space
into allocated and reserved. Allocated space represents swap space currently in use.
Reserved space is not yet in use, but it has been claimed for future use and is therefore
not available.
To get details on the individual devices or files that constitute the swap space, use:
swap -l
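
To turn the summary line into a single figure, the used and available values can be combined into a percentage. The sketch below is an illustration only: swap_pct_used is a hypothetical helper, and it assumes the summary line has the "NNNk used, NNNk available" wording shown in the example above.

```shell
# swap_pct_used: hypothetical helper; reads "swap -s" output on stdin and
# prints the percentage of swap in use, rounded down.
swap_pct_used() {
    awk '
        /used/ && /available/ {
            for (i = 1; i < NF; i++) {
                # "697504k" + 0 yields 697504; awk ignores the trailing k.
                if ($(i + 1) == "used,")     used  = $i + 0
                if ($(i + 1) == "available") avail = $i + 0
            }
            if (used + avail > 0)
                printf "%d\n", used * 100 / (used + avail)
        }'
}
```

With the example figures (697504k used, 95216k available), "swap -s | swap_pct_used" would print 87.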
Description
Sometimes swap space is inadequate but no free disk slices are available. Until a
permanent solution is available, a file can be constructed and added to the swap space.
Directions
To make a 250MB addition to swap space using the /var filesystem, run as root:
mkfile 250m /var/newswap
swap -a /var/newswap
The mkfile command makes a file of a specified size. The swap -a command adds the
file to the swap space. If you have a free slice, such as /dev/dsk/c0t0d0s5, you can add
that to the swap space temporarily with the following command:
swap -a /dev/dsk/c0t0d0s5
These changes are temporary because they will not persist after the system reboots. To
make the preceding two example swap space additions persist after a reboot, place the
following lines in the /etc/vfstab file (be very careful making changes to this file):
/var/newswap - - swap - no -
/dev/dsk/c0t0d0s5 - - swap - no -
1. cp /etc/system /etc/system.bkp
2. vi /etc/system (make and save changes)
3. Reboot
4. If the system fails to come back up or cannot recover from a failure, press Stop+A.
5. At the ok prompt, type boot -a
eg: ok boot -a
6. This starts an interactive boot that asks several questions. Accept the defaults
until it asks for your system file, then specify /etc/system.bkp and your system
should come up normally.
Description
Add raid to your Solaris system
Directions
This describes how to install software RAID using Solstice DiskSuite. First, you need to
grab the DiskSuite packages from Sun. It's a free download, but you do need to create a
SunSolve account to get it.
Install the packages - There's an installer included. Don't bother. Just pkgadd the
individual packages. There's only 5 or so.
Once you've done that, you'll need to determine how you want to lay out your disks. The
following assumes this slice layout:
0 - /
1 - swap
2 - whole-disk
3 - unassigned 64-MB
4 - unassigned 64-MB
Adjust the above to match whatever your preferred layout is. This is only a simple
example. Slices 3 and 4 are for Meta-Database logging. If you don't have 128MB of free
space to spare, then try to make some space (i.e., sacrifice some swap if you have to).
Disclaimer: Before you continue, I can't stress enough. Make backups. I always do
this procedure during system installation, and I do it routinely. Although your data
is supposed to survive and actually migrate without incident, I have seen DiskSuite
eat a system once or twice - usually due to operator error.
Make backups.
You need to duplicate your layout from disk0 to disk1. It's fairly important that the disk
geometry matches. Metadevices work at the block-level of the disk, and if one disk has
fewer blocks than the other you'll wind up making a mess. Once you're sure you're ready
to proceed, dump the layout from disk0 to disk1 thusly:
Second, you need to create your meta-databases. This is for logging, and all but
eliminates the need for fsck to run after a dirty shutdown. Do the following:
This adds (-a) 2 (-c for count) meta-databases in each of the slices. If you have more
disks, you can span the databases across multiple disks for better performance and fault-
tolerance.
The next step is to create your raid-devices. In a two-disk system, you're stuck with Raid0
and Raid1. Since Raid0 is almost pointless (you're doing this for redundancy,
remember?!), we'll go with Raid1 - mirrored disks.
d0 - / mirror
d10 - /dev/dsk/c0t0d0s0
d20 - /dev/dsk/c0t1d0s0
d1 - swap
d11 - /dev/dsk/c0t0d0s1
d21 - /dev/dsk/c0t1d0s1
The device names are somewhat arbitrary. In a simple setup like this, I use d0 to match
up with a mirrored slice0, and d10 to indicate member 1 of d0 (member 1 d0 = d10,
member 2 d0 = d20).
This initializes the devices. The command "metastat" will show you that the devices
exist, but the mirror-halves aren't attached. So let's attach them:
metattach -f d0 d10
metattach -f d1 d11
You've just attached the first half of the mirror. Yes, this is the disk that you're currently
running on. Your data is still there.
Next, you need to ensure the system will use the metadevices. The root-filesystem is
easy:
metaroot d0
Next, you need to edit /etc/vfstab to change the swap device to use /dev/md/dsk/d1 as
swap. While you're in there, turn on logging under the mount options for the root
filesystem (d0). Double-check that you haven't screwed up. Save and exit if it all looks
good.
lockfs -fa
init 6
Watch your system come up. There will be some new messages, most notably the kernel
complaining about not being able to forceload three raid modules:
You can ignore these messages. They're harmless. Basically, you haven't created any
raid-devices that require those modules so they're refusing to load.
Now that your system is up (You didn't mess up vfstab, did you?!), you need to finish off
the process. Log in and do this:
metattach -f d0 d20
metattach -f d1 d21
You'll notice that your system is now a little slower, both commands took a moment to
return, and your disks are going nuts. Look at the output of "metastat" and you'll see why
- your disks are syncing.
You'll need to install the bootsector to your second disk so that you can boot from it. This
is fairly easy to do:
You might also want to set the OBP to boot from disk1 if it can't boot from disk0. If you
bring the machine to the OBP (ok) prompt via init 0, you can enter the following:
This will set up a failover boot to disk1. The very last command there will also boot from
disk1, proving to you that this works. Do be sure to substitute the correct disk for "disk1".
You now have your root filesystem and swap space sitting on raid1 volumes. This means
that losing a disk no longer means that you have to rebuild your system. Now you just
need to replace a disk.
I strongly suggest that you read through DiskSuite's docbook at Sun's online
documentation site (docs.sun.com). There's a lot more that you can do with it that's not
covered here. Oh, and you'll probably want to read up on how to actually replace a failed
disk. :)
(If there's enough interest, I might be coerced into posting a howto on that)
Description
This will make Solaris reject remote logging from other devices.
Directions
Configure syslogd to reject remote logging:
syslogd -t turns on system logging, but syslogd will not receive remote log messages
from other devices.
Description
When troubleshooting networking issues, it is often helpful to determine the state of an
ethernet interface. Solaris offers access to many configurable networking parameters
through ndd.
Directions
To determine the ethernet interface link status, duplex, and speed on hme0, run the
following commands as superuser:
If you have only one ethernet interface, you can leave out the instance command.
Otherwise, you can specify the hme instance number there. The results of the next three
commands are either 1 or 0. In each case, the value means:
Description
If a network interface was configured with the wrong subnet mask as can happen when
the default subnet is selected with a variable length subnet mask, a simple configuration
change will fix it.
Directions
Consider a host that is assigned the IP address 10.50.90.15 in the class C subnet
10.50.90.0/24. The normal subnet mask for a class A 10.* subnet is 255.0.0.0, and this is
the value that an operating system will guess given that IP address information alone. To
correct this problem permanently so that it will persist after the host reboots, edit the
/etc/netmasks file and add the following line:
10.50.90.0 255.255.255.0
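
The first field of that line is the network number, which is the host address ANDed with the mask octet by octet. As a small sketch of the arithmetic (network_of is a hypothetical helper, and it assumes well-formed dotted-quad arguments):

```shell
# network_of IP NETMASK  (hypothetical helper)
# Prints the network number for an IPv4 address and netmask.
# The body runs in a subshell so the IFS change stays local.
network_of() (
    IFS=.
    set -- $1 $2      # split both dotted quads into eight fields
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)
```

For the host in this example, "network_of 10.50.90.15 255.255.255.0" gives 10.50.90.0, while the guessed class A mask 255.0.0.0 would give 10.0.0.0.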
To reconfigure the interface, say hme0, immediately without rebooting the system, run
the following as root:
Directions
netstat -r
Optionally, to prevent reverse dns lookups which may slow down the execution of the
command:
netstat -rn
Description
Get information about the default route (gateway)
Directions
route get default
Description
Add a default route (gateway). Create an /etc/defaultrouter file with the IP to have it set
to this for each boot.
Directions
route add default xxx.xxx.xxx.xxx
Using the editor of your choice, edit the file /etc/defaultrouter -- the only line in the file
should be the default route of the system, for example: 192.168.1.1. This change will not
take effect until the system is rebooted.
To make the route change take effect immediately, you must first delete the default route.
If the current default route is 192.168.254.1, then the command would be:
route delete default 192.168.254.1
To implement the new default route without rebooting the system, use the following
command substituting your default route for 192.168.1.1:
route add default 192.168.1.1
Description
Virtual interfaces allow a single ethernet interface to listen on additional IP addresses.
Directions
Given an ethernet interface hme0 (use ifconfig -a to identify the names of your
interfaces), you can create a subinterface called hme0:1 with the following command:
You can set the IP address of the interface to 192.168.1.15 and turn on the interface with
the following command:
To make the virtual interface persist following a reboot, add its IP address or its
hostname from /etc/hosts to the file /etc/hostname.hme0:1
Description
The cron facility provides a powerful, minute-resolution process scheduler. If a process
needs to run repeatedly without human intervention, an entry in the crontab file can
accommodate most schedules. There are simple rules for modifying the crontab entries
that must be followed.
Directions
To edit the crontab file, the crontab program must be used. The actual crontab files
should not be edited directly because the contents are cached and changes will not take
effect until the cron daemon is restarted. Using the crontab program to edit the crontabs
will update the cache when the file is changed. To edit the current user's crontab file, use:
crontab -e
The -e option tells the program to edit a copy of the user's crontab file. The EDITOR
environment variable is referenced to determine which editor to use (default is ed). To set
this environment variable, see recipes for ksh and sh.
The superuser can edit a specific user's crontab by adding the username at the end of this
command. The processes run from a user's crontab will be run as that user. Be careful
with commands in root's crontab because these will run as root and could cause problems.
If shell scripts are run from root's crontab, make sure their file permissions do not allow
modification by anyone but root.
The syntax of crontab is simple. Each line represents a single scheduled task. The first
five fields represent timing information and everything following is interpreted as the
command to schedule. The timing fields in order are:
minutes - 0-59
hours - 0-23
days of month - 1-31
months of year - 1-12
days of week - 0-6 (Sunday-Saturday)
A variety of options work for each field. An asterisk (*) indicates all possible occurrences
for that field. A number sets that single occurrence. Two numbers separated by a -
indicates a range of values, and numbers separated by a comma indicate a list of
occurrences.
Several examples:
15 * * * * logcheck
Runs a command called 'logcheck' at 15 minutes past every hour of every day.
0,15,30,45 * * * * logcheck
Runs 'logcheck' every 15 minutes of every day.
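
To make the field rules concrete, the sketch below tests whether a single value is selected by one timing field written with the options described above (asterisk, single number, range, or comma-separated list). cron_field_matches is a hypothetical helper written for illustration; it is not part of cron and does no validation of its input.

```shell
# cron_field_matches FIELD VALUE  (hypothetical helper)
# Succeeds when VALUE is selected by a crontab timing FIELD.
cron_field_matches() (
    field=$1 value=$2
    [ "$field" = "*" ] && exit 0          # asterisk: every occurrence
    IFS=,
    for part in $field; do                # walk the comma-separated list
        case $part in
            *-*) lo=${part%-*} hi=${part#*-}      # range such as 9-17
                 [ "$value" -ge "$lo" ] && [ "$value" -le "$hi" ] && exit 0 ;;
            *)   [ "$value" -eq "$part" ] && exit 0 ;;
        esac
    done
    exit 1
)
```

For example, cron_field_matches '0,15,30,45' 30 succeeds, while cron_field_matches 15 30 fails, which is why a minutes field of 15 fires only once an hour.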
Description
You've run netstat -an. You see that port TCP/65237 is listening. What program is
actually running and holding that port open? Here's how to find out.
Directions
Sometimes you notice odd ports open and listening on your machine. In this day and age,
that's often a bad thing. But what do you kill to close off that port? Is that port supposed
to be open (i.e., is something legitimate running)?
You've probably heard of lsof. List Open Files. You usually use it to find out what files
are open on a given mount point. Well, you can also use it to find out what open sockets
your machine has:
For example, for Oracle:
# lsof -nl | egrep "TCP|UDP" |grep ora
That shows that my workstation, while typing this message, has wish running at PID
30766. It's listening on two ports: TCP/65237 and TCP/63251.
You can then run ps to determine what that process really is (no worries... it's aMSN on
my workstation) and decide if you really want to kill it off.
If you're running Solaris, you won't be graced with lsof unless you install it from source
or via SunFreeWare. It does come installed by default on many Linux distributions.
Description
The default banner displayed during a telnet login contains the Solaris version which can
be useful to a potential attacker.
Directions
Create a plain text file called /etc/default/telnetd which contains a line such as:
Description
By default, the Solaris inetd daemon does not log the IP addresses of the machines that
connect to a Solaris server. This recipe enables logging of the IP address and connection
time of every machine that connects to the server.
Directions
To log the IP address and connection time of every machine that connects to the server,
make the following changes:
1. cd /etc/init.d
2. vi inetsvc
3. Change the last line in the file, i.e.
/usr/sbin/inetd -s &
to
/usr/sbin/inetd -s -t &
4. Restart inetd:
./inetsvc stop
./inetsvc start
5. vi /etc/syslog.conf
6. Add the following line (the two fields must be separated by tabs):
daemon.notice	/var/adm/name_of_log_file
7. touch /var/adm/name_of_log_file
8. pkill -HUP syslogd
After these changes are made, all connections that are started through the inetd daemon
(Telnet, FTP, etc.) will be logged to the new file.
This is also very useful for auditing: with NTP enabled to give a consistent time
throughout the enterprise, accountability can be implemented in the organisation.
Description
A quick one-liner for getting disk and device path.
Directions
echo | format | awk '/c4t1d2/ {print;getline;print}'
example output:
Description
Using a 'here document' it is possible to send multiple lines of input to a program from a
shell script.
Directions
A here document uses the << I/O redirector followed by a code. The subsequent lines are
redirected to the program until a line containing only the specified code. The basic
syntax is:
program << CoDe
input
extra input
CoDe
The code needs to be something that would not occur in the text. This technique is useful
for working with interactive programs like ftp. In the following example, a file specified
in a shell variable ($filename) is retrieved from an ftp server:
filename=important.file.gz
ftp -n ftp.server.name << TheEnd
user username password
cd directory/subdir
get $filename
quit
TheEnd
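
The mechanics are easier to inspect with cat standing in for ftp, since cat simply echoes what it is fed. The sketch below is an illustration only (heredoc_demo is a hypothetical name). It also shows one detail worth knowing: quoting the code after << turns off variable expansion inside the document.

```shell
# heredoc_demo: hypothetical illustration using cat in place of ftp.
heredoc_demo() {
    filename=important.file.gz

    # Unquoted code: $filename is expanded before cat sees the text.
    cat << TheEnd
cd directory/subdir
get $filename
quit
TheEnd

    # Quoted code: the text is passed through literally.
    cat << 'TheEnd'
get $filename
TheEnd
}
```

Running heredoc_demo prints "get important.file.gz" from the first document and a literal "get $filename" from the second.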
Description
As admins, we often have to take actions against a list of things (computers, servers, file
shares, etc). One great way to approach this is to put the list in a text file and use a FOR
command to loop through the file and take an action against the contents.
Directions
Say we have a file full of computernames:
complist.txt:
EricsPC
BobsPC
ExtraPC
and we need to delete each of these computers from the domain. Using a FOR loop to
process the file is the way to go (especially if there are really 300 computer names!)
First a test:
FOR /f %a in (complist.txt) do echo Computer: %a
should return
Computer: EricsPC
Computer: BobsPC
Computer: ExtraPC
To actually delete the PCs from the domain, change the command to:
Of course we can use this to run any command-line against any list. In fact, we can use
the FOR to run a command that would generate the file.
(Note: This is valid command-line syntax. To run it in a batch file, use two percent
signs, e.g. '%%a'.)
\\Greg
Description
This quick recipe describes how to suppress responses to commands within batch files.
Directions
During the execution of your batch files, users can see your commands on the screen.
Most of the time this is good for debugging purposes; however, often it's better to run
your batch file silently.
The output of any command that is followed by "> nul" will not be shown on the screen.
The greater-than sign redirects the output to nul (nowhere).
For example, copy *.* *.bak will show you as it copies all the files in the directory and
renames them with the bak extension.
copy *.* *.bak > nul will silently complete its work without placing anything on the
screen.
Description
Successful batch files cannot be slowed down with a bunch of user prompts. This little
recipe will show you how to suppress prompts in batch files.
Directions
Commands such as deleting all files can complicate batch files because the system
prompts for confirmation. For example...
To suppress the "File Not Found" error when trying to delete files from an empty
directory, use this code instead: