AIX 6 1 AN15 Advanced Administration and Problem Determination Instructor PDF
Instructor Guide
ERC 1.1
Trademarks
The reader should recognize that the following terms, which appear in the content of this
training document, are official trademarks of IBM or other companies:
IBM® is a registered trademark of International Business Machines Corporation.
The following are trademarks of International Business Machines Corporation in the United
States, or other countries, or both:
AIX® AIX 5L™ DB2®
HACMP™ MWAVE® POWER™
POWER4™ POWER5™ POWER5+™
POWER6™ POWER Gt1™ POWER Gt3™
Power Systems™ PowerVM™ pSeries®
Redbooks® RS/6000® SP™
System i® System p® System p5®
Tivoli® WebSphere® Workload Partitions Manager™
Adobe is either a registered trademark or a trademark of Adobe Systems Incorporated in
the United States, and/or other countries.
Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc.
in the United States, other countries, or both.
Linux® is a registered trademark of Linus Torvalds in the United States, other countries, or
both.
Windows is a trademark of Microsoft Corporation in the United States, other countries, or
both.
UNIX® is a registered trademark of The Open Group in the United States and other
countries.
Other company, product, or service names may be trademarks or service marks of others.
Contents
Trademarks
Course description
Agenda
Course description
Duration: 5 days
Purpose
This course provides advanced AIX system administrator skills with a
focus on availability and problem determination. It provides detailed
knowledge of the ODM database, where AIX maintains much of its
configuration information. It shows how to monitor for and deal with
AIX problems, with special focus on Logical Volume Manager problems,
including procedures for replacing disks. Several techniques for
minimizing the system maintenance window are covered. The course also
covers how to migrate AIX Workload Partitions to another system with
minimal disruption. While the course includes some AIX 6.1
enhancements, most of the material is applicable to prior releases
of AIX.
Audience
This is an advanced course for AIX system administrators, system
support, and contract support individuals with at least six months of
experience in AIX.
Prerequisites
You should have basic AIX System Administration skills. These skills
include:
• Using the Hardware Management Console (HMC) to activate a
logical partition running AIX and to access the AIX system console
• Installing an AIX operating system from an already configured NIM
server
• Implementing AIX backup and recovery
• Managing additional software and base operating system updates
• Using management tools such as SMIT
• Managing file systems, logical volumes, and volume groups
Objectives
On completion of this course, students should be able to:
• Perform system problem determination and reporting procedures
including analyzing error logs, creating dumps of the system, and
providing needed data to the AIX Support personnel
• Examine and manipulate Object Data Manager databases
• Identify and resolve conflicts between the Logical Volume Manager
(LVM) disk structures and the Object Data Manager (ODM)
• Complete a very basic configuration of Network Installation
Manager to provide network boot support for either system
installation or booting to maintenance mode
• Identify various types of boot and disk failures and perform the
matching recovery procedures
• Implement advanced methods such as alternate disk install,
multibos, and JFS2 snapshots to use a smaller maintenance
window
• Install and configure Workload Partition Manager to support WPAR
management and to implement Live Application Mobility (LAM)
Contents
• Overview of advanced administration techniques
• Error monitoring
• The Object Data Manager (ODM)
• Basic Network Installation Manager (NIM) configuration
• System initialization problem determination
• Disk management theory and procedures
• Advanced techniques for installation and backup
• Workload Partition (WPAR) Manager and Live Application Mobility
• The AIX system dump facility
Agenda
The estimated timings provided here are for content only. They assume the remainder of
the day is consumed by hourly breaks and a one-hour lunch break. Most days are
timed to allow class dismissal between 4 p.m. and 4:30 p.m., assuming a 9 a.m. to 5
p.m. class day. If the class runs faster than expected, most days have an optional lab
that students can work on to fill the remaining time.
References
SG24-5496 Problem Solving and Troubleshooting in AIX 5L (Redbook)
SG24-5766 AIX 5L Differences Guide Version 5.3 Edition (Redbook)
SG24-7559 IBM AIX Version 6.1 Differences Guide (Redbook)
© Copyright IBM Corp. 2009 Unit 1. Advanced AIX administration overview 1-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Unit objectives
IBM Power Systems
Notes:
Application outages
• Functional or performance
• Avoid unplanned outages with best practices
– Change control
– Data security
– Capacity planning
– High availability design
• Avoid planned outages
– Fall-over to backup server
– Relocate application (LPAR or WPAR mobility)
• Use maintenance windows
– Application stopped versus slow activity
– Plan enough time for back-out or recovery
– Minimize time needed
• Effective problem determination and recovery
© Copyright IBM Corporation 2009
Notes:
Introduction
Providing system availability is a major responsibility of any system administrator. An
outage may be caused by a functional problem (such as an application or system crash) or
a server performance problem (business is seriously impacted due to poor response times
or late jobs). There are many approaches to dealing with this.
Unplanned outages
When most of us think of availability, we think of unplanned outages. Regular hardware and
software maintenance can often avoid these outages. Designing the computing facility to
have redundant components (power, network adapters, network switches, storage, and
more) can make the overall system resilient to the failure of individual components.
Performance problems are often the result of failing to do proper capacity planning,
resulting in insufficient resources (memory, processors, network bandwidth, or disk I/O
bandwidth) to handle an increased workload. If there is no change control to manage what
work is placed on a system, capacity planning is even more challenging. Furthermore,
uncontrolled changes to a system create uncontrolled exposure to outages caused by
those changes, and thus to unplanned outages. Computer viruses and other
malicious attacks by computer hackers can also reduce system availability (in addition to
the exposure of losing proprietary information). Good data security policies are essential.
Even when implementing good policies in these areas, some unplanned outages will still
happen. In these situations, the system administrator needs to have a plan for minimizing
the impact and recovering as quickly as possible. One common approach is to have an
alternate system that can take over the work of the failed system. High Availability Cluster
Multi-Processing (HACMP) provides a system for either concurrent processing by multiple
systems, or an automated fall-over to a backup system, thus minimizing the impact of a
server failure. Such server redundancy can be designed to work within a single facility or
be divided between different geographical locations. Obviously, rapid notification of a
problem, effective and prompt diagnosis of the cause, and being able to quickly implement
an effective solution will all contribute to a smaller mean time to recovery.
Planned outages
By using change control, the risk associated with certain categories of potential unplanned
outages can be managed by implementing the changes during planned windows of time
when the impact of any unexpected problem (resulting from the change) is minimized. In
addition, there are certain types of changes for which an outage is unavoidable.
Some facilities will implement multiple types of maintenance windows. One type would be
frequent short maintenance windows for any administrative work that will compete with
applications for resources (performance impact) or have a small chance of having a
functional disruption. Another type would be a less frequent window in which any reboot of
the system or any major change to the level of the operating system or major subsystems,
such as database software, would be allowed.
Sometimes, the amount of time in a maintenance window is relatively small and the work
has to be carefully planned. You also need to allow time to recover if anything goes wrong
due to the maintenance. Any needed resources that can be pre-staged will help expedite
the work. Any approach that can speed recovery after a problem occurs is also useful.
For systems that need to be up 24 hours a day, seven days a week, every day of the
year (24x7x365), even a short outage cannot be tolerated. In those situations, a method to
non-disruptively move the applications to another system can be invaluable. If an HACMP
cluster solution is already in place to handle unplanned outages, it can be used to
manually fall over the services to another system while maintenance is being done. Other
solutions are to use Live Partition Mobility or Live Application Mobility.
Instructor notes:
Purpose — Provide an overview of issues related to application availability.
Details —
Additional information —
Transition statement — Let’s briefly look at the use of LPAR and WPAR mobility to avoid
application outages.
• Live Partition Mobility allows the migration of a running logical
partition to another physical server.
– Operating system, applications, and services are not stopped during
the process
– Requires POWER6, AIX 5.3, HMC, and VIO server
• Live Application Mobility allows moving a workload partition from one server to
another.
– Without requiring the workload running in the WPAR to be restarted
– Provides outage avoidance and multi-system workload balancing
– Requires AIX 6.1
[Diagram: Server 1 and Server 2, each with a VIOS and logical partitions, managed by a single HMC on a shared network; and AIX #1, AIX #2, and AIX #3 hosting workload partitions (Billing, Data Mining, EMail, App Srv, Test, Training, Web, Dev) under Workload Partitions Manager policy]
Figure 1-3. Live Partition Mobility versus Live Application Mobility AN151.0
Notes:
As the number of hosted partitions and applications increases, finding a maintenance
window acceptable to all becomes increasingly difficult. Live Partition Mobility and Live
Application Mobility allow you to move partitions or workloads so that you can perform
disruptive operations on the machine when it best suits you, rather than only when it
causes the least inconvenience to the users.
Live Application Mobility (LAM) is a new capability that allows a client to relocate a running
WPAR from one system to another, without requiring the workload running in the WPAR to
be restarted. LAM is intended for use within a data center and requires the use of the new
Licensed Program Product, the IBM AIX Workload Partitions Manager.
Live Application Mobility differs significantly from Live Partition Mobility in that Live Partition
Mobility is a feature of POWER6 processors. As such, it can be used on operating systems
other than AIX 6, such as Linux or earlier AIX versions. On the other hand, WPAR is
specifically a feature of AIX 6, but it can run on various hardware platforms (for example:
POWER6, POWER5 or POWER5+, or POWER4 systems).
• System backups
– Minimizing rootvg size
– Snapshot techniques for user file systems
Notes:
An important technique that we will cover is the use of alternate storage as the target
of the software update. That is, the updates are not made to the active rootvg, but
rather to a copy of the rootvg. This has two advantages. First, no change is
made to the active rootvg. For locations that distinguish between changing the level
of the operating system and simply doing work that has a performance impact, the
time-consuming update activity can be done in a more frequently available window. Then,
when a major maintenance window arrives, you only need to reboot to make it effective.
The second advantage, and to some the more important advantage, is the ease of
recovery. If you find that there are serious problems with running under the new level of
code, you only need to reboot back to the earlier code level, rather than recover from a
mksysb or reject the entire update. Of course, the downside is that you need to reboot
to make the update effective; however, a major maintenance window should already allow
for a reboot.
We will cover two techniques. The first creates an alternate set of
logical volumes that are copies of the rootvg BOS logical volumes; this is called multibos.
The second creates an alternate volume group that is a clone of the rootvg
(alternate disk installation). In each case, you apply the maintenance to the copy and
then later reboot to make it effective.
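The two update-a-copy approaches can be sketched as a short shell script. This is a minimal sketch, not a verified procedure: the flags shown (multibos -Xs to create the standby BOS, multibos -Xac -l to apply an update_all to it from an image directory, and alt_disk_copy -d ... -b update_all -l to clone rootvg and update the clone) are as we understand the AIX 6.1 commands and should be checked against their man pages. The run wrapper, the DRY_RUN variable, and the disk and directory names are hypothetical; by default the commands are only echoed.

```shell
#!/bin/sh
# Sketch only: flags as understood for AIX 6.1; check the man pages first.
# Commands are echoed unless DRY_RUN=0 (hypothetical safety switch).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Approach 1: multibos keeps a standby copy of the BOS logical volumes in rootvg.
update_with_multibos() {
    images="$1"                        # directory holding the update images
    run multibos -Xs || return 1       # create the standby BOS (-X: allow fs expansion)
    run multibos -Xac -l "$images"     # update_all (-a) on the standby BOS (-c)
}

# Approach 2: clone rootvg to another disk and update the clone.
update_with_alt_disk() {
    disk="$1"                          # target disk, e.g. hdisk1 (hypothetical)
    images="$2"                        # directory holding the update images
    run alt_disk_copy -d "$disk" -b update_all -l "$images"
}
```

For example, update_with_alt_disk hdisk1 /updates only prints the alt_disk_copy command; setting DRY_RUN=0 on an AIX system, as root, would run it. In either approach the new level takes effect only at the next reboot.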
Expediting backups
Another common maintenance activity is backing up the system. Unless you have an
application that is designed to manage a recovery process using fuzzy backups, you will
need to quiesce the application activity long enough to be sure that there are no
inconsistencies in the backup. The term fuzzy backup refers to a backup taken while the
application was making changes. A given transaction makes multiple data
changes. Some of these changes are made before the affected data
is backed up, while others are made after it is backed up. As a result, the
backup contains one piece of data that reflects the transaction and another piece that
does not. The two pieces of data are inconsistent, and such a backup
is referred to as fuzzy.
For the rootvg itself, the size of the rootvg should be minimized. It should contain only what
is needed for the OS. All user data and other non-essential files should be backed up and
restored separately. An example is the standard location of a software repository:
/usr/sys/inst.images. The software repository can be very large, yet this
common path resides in the /usr file system, which is in the rootvg. Placing the software
repository in a separate file system with its own recovery plan (which could use the original
media as the backup) can help reduce backup and recovery time. Another common
example is the /home file system. If users have vast amounts of data stored there, then
over-mounting it with a separate file system can again speed up working with the rootvg.
There are other file systems, such as /tmp, whose contents could be eliminated from the
system backup. The trick is that these would need to be excluded (not mounted, or identified
in /etc/exclude.rootvg) from the backup during mksysb execution, and then
separately recovered from their own backup. Other user data will be in separate user
volume groups.
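Excluding data from a mksysb can be scripted. The sketch below appends exclusion patterns to an exclude list only if they are not already there. It assumes the ^./path/ pattern form described in the AIX mksysb documentation for /etc/exclude.rootvg, and the add_exclude function name is hypothetical. mksysb honors the list only when it is run with its -e flag.

```shell
#!/bin/sh
# Sketch: append mksysb exclusion patterns to an exclude list, skipping
# patterns already present. The list file is a parameter so the function
# can be tried on a scratch copy before touching /etc/exclude.rootvg.
add_exclude() {
    file="$1"
    shift
    touch "$file" || return 1
    for pattern in "$@"; do
        # -x: match the whole line; -F: treat ^ and . literally
        if ! grep -qxF -- "$pattern" "$file"; then
            printf '%s\n' "$pattern" >> "$file"
        fi
    done
}

# Example on AIX, as root:
#   add_exclude /etc/exclude.rootvg '^./tmp/' '^./usr/sys/inst.images/'
#   mksysb -i -e /dev/rmt0     # -e applies /etc/exclude.rootvg
```

Because existing entries are skipped, the function can be rerun safely from a backup script without growing the file.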
With the emphasis on separate backups for non-BOS data, there comes a need to
minimize how long the applications need to be quiesced and still have data consistency.
One technique that AIX provides is JFS2 snapshots, which will allow us to only very briefly
quiesce the application and still have a consistent picture of the data at a single point in
time. Then we can either use that snapshot of the data as its own backup, or base an
actual backup upon that snapshot (in order to have off-site storage of the backup). There are
other facilities for capturing snapshots of data. Some are part of the storage
subsystems and some are part of total storage solutions such as Tivoli Storage Manager.
Our focus will be on the facility that is provided with AIX: JFS2 snapshots.
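The brief-quiesce sequence can be sketched as follows, assuming an external JFS2 snapshot created with the AIX snapshot command (-o snapfrom= names the file system, -o size= sets the size of the new snapshot logical volume, and -q lists existing snapshots). quiesce_app and resume_app are hypothetical hooks that you would define for your application; the run wrapper and DRY_RUN variable are also assumptions, and by default the commands are only echoed. Verify the snapshot flags against the documentation for your AIX release.

```shell
#!/bin/sh
# Sketch: brief application pause around a JFS2 external snapshot.
# quiesce_app/resume_app are hypothetical hooks you define for your application.
# Commands are echoed unless DRY_RUN=0 (hypothetical safety switch).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

snap_fs() {
    fs="$1"                            # mounted JFS2 file system, e.g. /data
    size="$2"                          # snapshot LV size, e.g. 512M
    run quiesce_app || return 1        # pause writes so the data is consistent
    run snapshot -o snapfrom="$fs" -o size="$size"
    rc=$?
    run resume_app                     # resume even if the snapshot failed
    [ "$rc" -eq 0 ] || return "$rc"
    run snapshot -q "$fs"              # list the snapshots of the file system
}
```

The application is resumed whether or not the snapshot succeeds, which keeps the pause as short as possible; the snapshot can then be backed up while the application runs normally.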
• If an AIX bug:
– Collect problem information.
– Open problem report with AIX Support.
– Provide snap with information.
Notes:
System maintenance
Sometimes code works well under normal testing or production circumstances, but
reveals a logic flaw when faced with an unanticipated situation. Alternatively, the flaw
could be in some non-central aspect of the code that is not normally noticed. The number
of facilities using this code is large enough that there is a good chance that one of them
will detect and report the problem soon after the new code level is released.
The fix for the code defect will usually come out in the next released fix pack. On the
other hand, many facilities may not be affected by, or concerned about, the code
defect for months, until circumstances arise in which it becomes a
problem. By installing newer service packs, a facility can benefit from the experience of
others and avoid being impacted by known problems.
Obviously, there is always the possibility that a new fix pack will introduce new
problems while solving many old ones.
This course will cover some techniques to use in applying fix packs.
Problem determination
Once you find yourself impacted by what you believe to be a product defect, you will
need to obtain prompt resolution. While there is no substitute for experience (the ability
to recognize a situation and remember the details of how you dealt with it the last time a
similar problem occurred), many problems will be most effectively solved by following a
well developed problem determination methodology. This course will cover a basic
problem determination methodology.
Problem reporting
When you find yourself impacted by what you believe to be a product defect, you will
need to contact AIX Support. Before contacting AIX Support, you should write up a
description of the problem and the surrounding circumstances. When you open a new
Problem Management Report (PMR) with AIX Support, you will be expected to provide
them with a wealth of information to assist them in determining the cause of the
problem. The snap command is a common tool to assist in collecting a vast amount of
information about the environment surrounding the problem. The course materials will
cover these problem reporting procedures.
Instructor notes:
Purpose — Introduce problem management.
Details —
Additional information —
Transition statement — As just stated, keeping good documentation is important. Let’s
take a closer look at this.
System documentation
Notes:
Notes:
1. Identify the problem
2. Talk to users to define the problem
3. Collect system data
4. Resolve the problem
Notes:
Suggested questions
- What is the problem?
- What is the system doing (or not doing)?
- How did you first notice the problem?
- When did it happen?
- Have any changes been made recently?
Keep them talking until the picture is clear. Ask as many questions as you need to in
order to get the entire history of the problem.
• Progress codes
• System reference codes (SRCs)
• Service request numbers (SRNs)
• Obtained from:
– Front panel of system enclosure
– HMC or IVM (for logically partitioned systems)
– Operator console message or diagnostics (diag utility)
• Online hardware and AIX documentation available at:
http://publib.boulder.ibm.com/infocenter/systems
– Select System Hardware > System i and System p
• Popular links and effective searches available
– Select Operating System > AIX 6.1 Information
• Search for “message center”
• Diagnostic Information for Multiple Bus Systems (SA38-0509)
Notes:
Introduction
AIX provides progress and error indicators (display codes) during the boot process.
These display codes can be very useful in resolving startup problems. Depending on
the hardware platform, the codes are displayed on the console and the operator panel.
Operator panel
For non-LPAR systems, the operator panel is an LED display on the front panel.
POWER4, POWER5, and POWER6-based systems can be divided into multiple
Logical Partitions (LPARs). In this case, a system-wide LED display still exists on the
front panel. However, the operator panel for each LPAR is displayed on the screen of
the Hardware Management Console (HMC). The HMC is a separate system which is
required when running multiple LPARs. Regardless of where they are displayed, they
are often referred to as LED Display Codes.
Documentation
Note: all information on Web sites and their design is based upon what is available at
the time of this course revision. Web site URLs and the design of the related Web
pages often change.
Online hardware documentation and AIX message codes are available at:
http://publib.boulder.ibm.com/infocenter/systems
- Many of the codes you will deal with are actually hardware or firmware related. For
those codes, you need to navigate to the infocenter that specializes in system
hardware.
• The content area has popular links for accessing code information, or you can
use search strings such as: system reference codes, service request numbers,
or service support troubleshooting.
- For AIX codes and messages, you will need to navigate to the Operating System
infocenter for AIX.
• From here you can use the search string of AIX message center to obtain
information on various codes (including the seven digit message codes).
• One very useful reference that you can find at the AIX infocenter is the:
RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems
(SA38-0509).
Chapter 30 has AIX diagnostic numbers and location codes. It provides
descriptions for the numbers and characters that display on the operator panel
and descriptions of the location codes used to identify a particular item.
Notes:
If you believe that your problem is the result of a system defect, you can call AIX Support to
request assistance. Before you call 1-800-IBM-SERV, it is a good idea to have certain
information ready. They will want to verify your name against a list of names associated
with your customer number, and validate that your customer number has support for the
product in question. They will also need to know some details about the hardware and
software environment in which the problem is occurring - such as your MTMS (machine
type, model, serial), your AIX OS level, and the level of any other relevant software. Of
course, you need to explain your problem, providing as much detail as possible, especially
any error messages or codes.
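When you report the AIX OS level, support usually needs the technology level and service pack. The output of oslevel -s has the form release-TL-SP-build (for example 6100-02-01-0847, used here only as an illustrative string). The sketch below splits such a string into its parts. parse_oslevel is a hypothetical helper, and the lsattr commands in the comments for machine type, model, and serial number are suggestions to verify on your own system.

```shell
#!/bin/sh
# Sketch: split an "oslevel -s" string (release-TL-SP-build) into fields.
# On AIX, inputs for a PMR might be gathered with (verify on your system):
#   oslevel -s                       # OS level, e.g. 6100-02-01-0847
#   lsattr -El sys0 -a modelname     # machine type and model
#   lsattr -El sys0 -a systemid      # system serial number
parse_oslevel() {
    level="$1"
    # Require four dash-separated numeric fields: 4-2-2-4 digits.
    case "$level" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]) ;;
        *) echo "unexpected oslevel format: $level" >&2; return 1 ;;
    esac
    old_ifs=$IFS
    IFS=-
    set -- $level                      # split on dashes into $1..$4
    IFS=$old_ifs
    printf 'release=%s tl=%s sp=%s build=%s\n' "$1" "$2" "$3" "$4"
}
```

On an AIX system you would call it as parse_oslevel "$(oslevel -s)".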
The level 1 personnel will ask you for the severity of your problem.
• Severity level 1 (critical) indicates that the function does not work, your business is
severely impacted, there is no workaround, and an immediate
solution is needed. Be aware that, for severity level 1, you will be expected to be available 24x7
until the problem is resolved.
• Severity level 2 (significant impact) indicates that the function is usable but is limited in
a way that your business is severely impacted.
• Severity level 3 (some impact) indicates that the program is usable with less significant
features (not critical to operations) unavailable.
• Severity level 4 (minimal impact) indicates that the problem causes little impact on
operations, or a reasonable circumvention to the problem has been implemented.
Level 1 will assign you a PMR number (actually a PMR and branch number combination)
for tracking purposes. Each time, in the future, when you call about this problem, you
should have the PMR and branch numbers at hand.
Once the basic information has been collected, you are passed to level 2 personnel for the
product area for which you are having a problem. They will work with you in investigating
the nature and cause of your problem. They will search the support database to see if it is a
known problem that is either already being worked on or has a solution already developed.
In many cases, they will request that you update to a specific technology level and service
pack that already includes the fix.
If they do not have a fix, they may still ask you to update your system and determine if the
problem still exists. If the problem still exists, they now have a known software environment
to work with. At this point they will often ask for a complete set of information from your
system to be collected and uploaded to their server, to support their investigation. The
basic tool for collecting your system information is the snap command.
Instructor notes:
Purpose — Introduce the procedure for working with AIX Support.
Details —
Additional information —
Transition statement — Let’s look at how we work with the snap command.
# snap -ac
# mv /tmp/ibmsupt/snap.pax.Z \
PMR#.b<branch#>.c<country#>.snap.pax.Z
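Renaming the snap archive by hand is error-prone, so a small helper can build the name from the PMR, branch, and country numbers, following the PMR#.b<branch#>.c<country#>.snap.pax.Z pattern. This is a sketch: the function names and the SNAP_DIR override are hypothetical, and it assumes snap has already written snap.pax.Z (snap creates that compressed archive when run with its -c flag). Unlike a plain mv into the current directory, it keeps the renamed file in the snap directory.

```shell
#!/bin/sh
# Sketch: name and rename a snap archive for a PMR.
# Naming pattern: PMR#.b<branch#>.c<country#>.snap.pax.Z
snap_upload_name() {
    pmr="$1"; branch="$2"; country="$3"
    if [ -z "$pmr" ] || [ -z "$branch" ] || [ -z "$country" ]; then
        echo "usage: snap_upload_name PMR BRANCH COUNTRY" >&2
        return 1
    fi
    printf '%s.b%s.c%s.snap.pax.Z\n' "$pmr" "$branch" "$country"
}

# Rename the archive in place; SNAP_DIR is a hypothetical override
# (default /tmp/ibmsupt, where snap writes its output).
rename_snap() {
    dir="${SNAP_DIR:-/tmp/ibmsupt}"
    name=$(snap_upload_name "$1" "$2" "$3") || return 1
    if [ ! -f "$dir/snap.pax.Z" ]; then
        echo "no $dir/snap.pax.Z found; run snap first" >&2
        return 1
    fi
    mv "$dir/snap.pax.Z" "$dir/$name" && echo "$dir/$name"
}
```

With illustrative numbers, rename_snap 12345 678 000 leaves /tmp/ibmsupt/12345.b678.c000.snap.pax.Z ready for upload and prints its path.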
Notes:
# ftp testcase.software.ibm.com
User: anonymous
Password: <your email address>
ftp> cd /aix/toibm
ftp> bin
ftp> put PMR#.b<branch#>.c<country#>.snap.pax.Z
ftp> quit
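The interactive FTP session can also be run non-interactively. In this sketch, the hypothetical ftp_script function prints the ftp subcommands, including a put of the renamed snap file, and the output is piped into ftp -n (which suppresses automatic login so that the user subcommand supplies the credentials). The host and directory are the ones given in the course text; like any address in this course, they may have changed since this revision.

```shell
#!/bin/sh
# Sketch: print ftp subcommands for an anonymous upload to AIX Support.
# Host and directory come from the course text and may have changed.
ftp_script() {
    file="$1"                          # local path of the renamed snap archive
    email="$2"                         # your e-mail address, used as the password
    printf 'user anonymous %s\n' "$email"
    printf 'cd /aix/toibm\n'
    printf 'bin\n'                     # binary mode: the archive is compressed
    printf 'put %s %s\n' "$file" "$(basename "$file")"
    printf 'quit\n'
}

# Example (hypothetical file name and address):
#   ftp_script /tmp/ibmsupt/12345.b678.c000.snap.pax.Z admin@example.com |
#       ftp -n testcase.software.ibm.com
```

The remote name is the file's basename, so the uploaded file keeps the PMR-based name without the local directory prefix.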
Notes:
Notes:
Interim fixes
On rare occasions, a customer has an urgent situation which needs fixes for a problem
so quickly that they cannot wait for the formal PTF to be released. In those situations, a
developer may place one or more individual file replacements on an FTP server and
allow the system administrator to download and install them. Originally, this would
simply involve manually copying the new files over the old files. But this created
problems, especially in identifying the state of a system which later experienced other
(possibly related) problems or in backing out the changes.
Today, there is a better methodology for managing these interim fixes: the emgr (efix
manager) command. Security alerts often provide interim fixes for the identified security
exposure. Depending upon your own risk analysis, you might apply the interim fix
immediately, or wait for the next service pack (which will include these security fixes).
The syntax and use of the emgr command were covered in the prerequisite course.
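On AIX, interim fixes are managed with the emgr (efix manager) command. A cautious install sequence previews the install with -p, installs with -e, and then lists installed fixes with -l. In this minimal sketch the run wrapper, the DRY_RUN variable, and the package name are hypothetical, and commands are only echoed by default.

```shell
#!/bin/sh
# Sketch: cautious interim fix install with emgr.
# Commands are echoed unless DRY_RUN=0 (hypothetical safety switch).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

install_efix() {
    pkg="$1"                           # interim fix package file (.epkg.Z)
    run emgr -p -e "$pkg" || return 1  # preview the install; no changes made
    run emgr -e "$pkg" || return 1     # install the interim fix
    run emgr -l                        # list installed interim fixes and labels
}

# To back out later, use the label reported by "emgr -l":
#   emgr -r -L <label>
```

Stopping after a failed preview keeps a problem package from being applied, and the label from emgr -l is what you need to remove the fix once the official service pack is installed.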
Instructor notes:
Purpose — Explain standard terminology for software updates
Details —
Additional information —
Transition statement — Let’s look at how we obtain these updates.
Relevant documentation
Notes:
IBM Redbooks
Redbooks can be viewed, downloaded, or ordered from the IBM Redbooks Web site:
http://www.redbooks.ibm.com
Instructor notes:
Purpose — Identify URLs for hardware and software documentation.
Details — Let students know that hard copy versions of the manuals can be ordered from
their IBM marketing representative.
Additional information —
Transition statement — Let’s review what we have covered with some checkpoint
questions.
Checkpoint
Notes:
Instructor notes:
Purpose — Discuss the first group of checkpoint questions.
Details — A checkpoint solution is provided below:
Checkpoint solutions
Additional information —
Transition statement — Let’s take a look at what we have in the class lab environment.
Notes:
Instructor notes:
Purpose — Introduce the exercise for this unit.
Details —
Additional information —
Transition statement — Let’s summarize what we have covered in this unit.
Unit summary
Notes:
Instructor notes:
Purpose — Remind the students of some of the key points in this unit.
Details —
Additional information —
Transition statement — That is the end of this unit.
References
Online AIX Version 6.1 Command Reference volumes 1-6
Online AIX Version 6.1 General Programming Concepts:
Writing and Debugging Programs
Online AIX Version 6.1 Technical Reference: Kernel and
Subsystems
Note: References listed as “online” above are available through the
IBM Systems Information Center at the following address:
http://publib.boulder.ibm.com/infocenter/systems
© Copyright IBM Corp. 2009 Unit 2. The Object Data Manager 2-1
Unit objectives
Notes:
Notes:
[Visual: data managed by the ODM, including Devices, Software, System resource controller, and SMIT menus]
Notes:
Instructor notes:
Purpose — Provide an overview of what data is stored in the ODM.
Details — Go quickly through the list and mention that the main emphasis in this unit is on
devices and software vital product data. You might want to point out that the two “hands” on
the visual “point” to the types of data that will be emphasized. Later on, you supply the
corresponding ODM database files where the data is stored.
Additional information — You might mention that TCP/IP configuration can still be set up
without using ODM. In this case, traditional ASCII files are used for storing TCP/IP data. To
determine whether ODM is used for TCP/IP, use the following command:
# lsattr -El inet0
If the attribute bootup_option is set to no, ODM files are used. If it is set to yes, ODM will
not be used.
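The check described above can be sketched in Python as a parser for captured lsattr -El inet0 output. This is a minimal sketch under stated assumptions. It assumes the attribute name and current value are the first two whitespace-separated columns. The sample output is illustrative, not captured from a real system. Only the bootup_option attribute and its yes/no meaning come from the notes.

```python
# Minimal sketch (assumption): decide from captured "lsattr -El inet0" output
# whether TCP/IP configuration is kept in the ODM. The sample text is
# illustrative; only bootup_option and its yes/no meaning come from the notes.

def parse_lsattr(output: str) -> dict:
    """Map attribute name -> current value (first two columns of each line)."""
    attrs = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            attrs[fields[0]] = fields[1]
    return attrs


def tcpip_uses_odm(output: str) -> bool:
    """bootup_option = no -> ODM files are used; yes -> ASCII files instead."""
    return parse_lsattr(output)["bootup_option"] == "no"


sample = """\
bootup_option no    Use BSD-style Network Configuration True
hostname      lpar1 Host Name                           True
"""
print(tcpip_uses_odm(sample))  # True
```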
Transition statement — Let’s define some key terminology we will need for our discussion
of the ODM.
ODM components
Notes:
Instructor notes:
Purpose — Define the basic components of ODM.
Details — Complete the visual during the lesson. ODM components are:
• Object classes
The ODM consists of many database files, where each file is called an object class.
• Objects
Each object class consists of objects. Each object is one record in an object class.
• Descriptors
The descriptors describe the layout of the objects. They determine the name and
datatype of the fields that are part of the object class.
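The three components can be sketched as plain Python data. This is a teaching model only, not the ODM's file format or API. An object class is a named collection. Its descriptors map each field name to a type. Each object is one record checked against those descriptors. The PdAt descriptors shown are a simplified subset taken from the visuals.

```python
# Teaching model only (not the ODM file format or API): an object class is a
# named collection, its descriptors give each field's name and type, and each
# object is one record. The PdAt descriptors here are a simplified subset.
from dataclasses import dataclass, field


@dataclass
class ObjectClass:
    name: str
    descriptors: dict                     # descriptor name -> Python type
    objects: list = field(default_factory=list)

    def add(self, obj: dict) -> None:
        """Store one object (record) after checking it against the descriptors."""
        if set(obj) != set(self.descriptors):
            raise ValueError(f"fields {sorted(obj)} do not match the descriptors")
        for name, value in obj.items():
            expected = self.descriptors[name]
            if not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        self.objects.append(obj)


pdat = ObjectClass("PdAt", {"uniquetype": str, "attribute": str,
                            "deflt": str, "nls_index": int})
pdat.add({"uniquetype": "tape/scsi/scsd", "attribute": "block_size",
          "deflt": "", "nls_index": 6})
print(len(pdat.objects))  # 1
```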
Additional information — This visual shows an extract from the ODM class PdAt. Do
not explain the meaning of PdAt or the different fields on this page. Concentrate on the
components of the ODM.
Transition statement — It is also important to understand how the terms predefined
device information and customized device information are used when discussing the ODM.
[Visual: ODM object classes by type of data, for example SMIT menus: sm_menu_opt, sm_name_hdr, sm_cmd_hdr, sm_cmd_opt]
Notes:
Current focus
In this unit, we will concentrate on ODM classes that are used to store device
information and software product data. At this point, we will narrow our focus even
further and confine our discussion to ODM classes that store device information.
[Visual: Predefined databases (PdDv, PdCn, PdAt), Config_Rules, the Configuration Manager (cfgmgr), and Customized databases (including CuDvDr and CuVPD)]
Notes:
Configuration manager
[Visual: "Plug and Play" configuration showing cfgmgr, the Predefined classes (PdDv, PdAt, PdCn) and Config_Rules, the Customized classes (CuDv, CuAt, CuDep, CuVPD), the device Methods (Define, Configure, Change, Undefine), and Load Device Driver]
Notes:
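[Visual: ODM repositories. /etc/objrepos: CuDv, CuAt, CuDep, CuDvDr, CuVPD, Config_Rules, history, inventory, lpp, product, nim_*, SWservAt, SRC*. /usr/lib/objrepos (shareable over the network): PdDv, PdAt, PdCn, history, inventory, lpp, product, sm_*. /usr/share/lib/objrepos: history, inventory, lpp, product.]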
Notes:
Introduction
To support diskless, dataless and other workstations, the ODM object classes are held
in three repositories. Each of these repositories is described in the material that follows.
/etc/objrepos
This repository contains the customized devices object classes and the four object
classes used by the Software Vital Product Database (SWVPD) for the / (root) part of
the installable software product. The root part of a software product contains files that
must be installed on each target system. This part cannot be shared among machines, so
each client must have its own copy. Most of the software that requires a separate copy on
each machine is associated with the configuration of the machine or product. Because the
ODMDIR variable points only to /etc/objrepos, this directory also contains symbolic links
to the predefined devices object classes, so that information in the other directories
can be accessed.
/usr/lib/objrepos
This repository contains the predefined devices object classes, SMIT menu object
classes, and the four object classes used by the SWVPD for the /usr part of the
installable software product. The object classes in this repository can be shared across
the network by /usr clients, dataless and diskless workstations. Software installed in
the /usr part can be shared among several machines with compatible
hardware architectures.
/usr/share/lib/objrepos
Contains the four object classes used by the SWVPD for the /usr/share part of the
installable software product. The /usr/share part of a software product contains files
that are not hardware dependent. They can be shared among several machines, even if
the machines have a different hardware architecture. An example is the terminfo
files, which describe terminal capabilities. Because terminfo is used on many UNIX systems,
the terminfo files are part of the /usr/share part of a system product.
lslpp options
The lslpp command can list the software recorded in the ODM. When run with the -l
(lower case L) flag, it lists a fileset separately under each repository path
(/etc/objrepos, /usr/lib/objrepos, /usr/share/lib/objrepos) in which it is recorded.
This can be distracting if you are not concerned with these distinctions. Alternatively,
you can run lslpp -L, which reports each fileset only once, without making distinctions
between the root, usr, and share portions.
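The difference between the two listings can be sketched in Python. This is an illustration only, not how lslpp is implemented. A fileset recorded once per repository is grouped by path, like the -l listing. The same records collapsed to unique names resemble the -L listing. The records are hypothetical sample data. bos.rte.shell in particular is an illustrative name not taken from the visuals.

```python
# Illustration only (not lslpp's implementation): a fileset is recorded once
# per repository holding one of its parts. Sample records are hypothetical.
from collections import defaultdict

records = [
    ("bos.rte.printers", "/usr/lib/objrepos"),
    ("bos.rte.printers", "/etc/objrepos"),
    ("bos.rte.shell", "/usr/lib/objrepos"),
    ("bos.rte.shell", "/etc/objrepos"),
]


def by_repository(recs):
    """One list of filesets per repository path (like the -l listing)."""
    groups = defaultdict(list)
    for fileset, path in recs:
        groups[path].append(fileset)
    return dict(groups)


def unique_filesets(recs):
    """Each fileset once, regardless of part (like the -L listing)."""
    return sorted({fileset for fileset, _ in recs})


print(by_repository(records))
print(unique_filesets(records))  # ['bos.rte.printers', 'bos.rte.shell']
```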
Instructor notes:
Purpose — Describe the different directories that hold ODM data.
Details — Describe what ODM files reside in /etc/objrepos, /usr/lib/objrepos
and /usr/share/lib/objrepos.
Explain the meaning of the root, /usr and /usr/share part of a software product and
identify that /usr/lib/objrepos and /usr/share/lib/objrepos can be shared in a
network.
Additional information —
Transition statement — It is important to understand how ODM classes interact.
PdDv:
type = "14106902"
class = "adapter"
subclass = "pci"
prefix = "ent"
DvDr = "pci/goentdd"
Define = "/usr/lib/methods/define_rspc"
Configure = "/usr/lib/methods/cfggoent"
uniquetype = "adapter/pci/14106902"
cfgmgr (PdDv -> CuDv)
CuDv:
name = "ent1"
status = 1
chgstatus = 2
ddins = "pci/goentdd"
location = "02-08"
parent = "pci2"
connwhere = "8"
PdDvLn = "adapter/pci/14106902"
PdAt:
uniquetype = "adapter/pci/14106902"
attribute = "jumbo_frames"
deflt = "no"
values = "yes,no"
chdev -l ent1 -a jumbo_frames=yes (PdAt -> CuAt)
CuAt:
name = "ent1"
attribute = "jumbo_frames"
value = "yes"
type = "R"
Notes:
Instructor notes:
Purpose — Summarize how the basic ODM classes interact.
Details — Explain the flow as described in student notes.
Additional information — None.
Transition statement — As you know, not all system data is managed by the ODM.
Filesystem information?
User/security information?
Queues and queue devices?
Notes:
Instructor notes:
Purpose — Review some files from the basic administration course.
Details — Ask the students the following questions:
1. Which file contains information about the file systems on your system?
/etc/filesystems
2. Which file contains most of the basic information (such as home directory and shell)
about the users on your system?
/etc/passwd
Which file contains user attributes like password rules?
/etc/security/user
3. Where is information about your queues and queue devices stored?
/etc/qconfig
Be sure to fill in the appropriate line on the visual as you give the answer to each question.
Additional information — Tell the students that this is only a subset of data that is not in
ODM.
Transition statement — Let’s review some of the points we have covered so far in this
unit.
Let’s review:
Device configuration and the ODM
[Visual: diagram with numbered blanks, the AIX kernel, and Applications]
Figure 2-11. Let’s review: Device configuration and the ODM AN151.0
Notes:
Instructions
Please answer the following questions by writing the answers on the picture above. If you
are unsure about a question, leave it out.
1. Which command configures devices in an AIX system? (Note: This is not an ODM
command.)
2. Which ODM class contains all devices that your system supports?
3. Which ODM class contains all devices that are configured in your system?
4. Which programs are loaded into the AIX kernel to control access to the devices?
5. If you have a configured tape drive rmt1, which special file do applications access to
work with this device?
Instructor notes:
Purpose — Provide information about what happens when a device is configured in AIX.
Details — Give the students five minutes to answer the questions. Then, provide the
following answers:
1. cfgmgr
2. PdDv
3. CuDv
4. Device Driver
5. /dev/rmt1
Additional information — Summarize the picture after the discussion:
If a device is to be configured, it must first be part of the PdDv class. It is not possible to
configure a device that is not defined/predefined in the corresponding Pd classes.
If a device is in the defined state, you definitely have an object in ODM class CuDv. The
difference between the defined state and the available state is that, in the defined state, no
device driver has been loaded into the AIX kernel. In other words, the program that controls
the device does not exist in the defined state.
When a device is made available, the device driver is loaded into the kernel. Additionally, a
special file is created in the /dev directory that applications need to access the device.
All this is done dynamically, without the need to recompile the AIX kernel (which
historically had to be done on other UNIX systems). This has long been one big advantage
of AIX over other UNIX systems.
Transition statement — Now, let’s look at some commands used to work with the ODM.
ODM commands
Descriptors: odmshow
Notes:
Introduction
Different commands are available for working with each of the ODM components:
object classes, descriptors, and objects.
2. To delete an entire ODM class, use the odmdrop command. The odmdrop command
has the following syntax:
odmdrop -o object_class_name
The name object_class_name is the name of the ODM class you want to remove.
Be very careful with this command. It removes the complete class immediately.
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = "512"    (Modify deflt to 512)
values = "0-2147483648,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 6
Notes:
Possible queries
As with any database, you can perform queries for records matching certain criteria.
The tests are on the values of the descriptors of the objects. A number of tests can be
performed:
= equal
!= not equal
> greater than
>= greater than or equal to
< less than
<= less than or equal to
like similar to; finds patterns in character string data
For example, to search for records where the value of the lpp_name descriptor begins
with bosext1., you would use the syntax lpp_name like bosext1.*
Tests can be combined with the and operator, as shown in the following example:
uniquetype=tape/scsi/scsd and attribute=block_size
In addition to the * wildcard, a ? can be used as a wildcard character that matches a
single character.
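Evaluating such a search string against one object can be sketched in Python. This is a minimal sketch under stated assumptions, not the ODM's own parser. It assumes terms are joined only by and, and that values are unquoted and contain no spaces. The like operator is approximated with Python's shell-style fnmatch, which supports the * and ? wildcards.

```python
# Minimal sketch (assumptions: terms joined only by "and", unquoted values
# without spaces, "like" approximated by shell-style fnmatch with * and ?).
import fnmatch
import operator
import re

OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       ">=": operator.ge, "<": operator.lt, "<=": operator.le}
TERM = re.compile(r"(\w+)\s*(!=|>=|<=|=|>|<|like\b)\s*(\S+)")


def matches(obj: dict, criteria: str) -> bool:
    """True if the object satisfies every term of the search string."""
    for term in re.split(r"\s+and\s+", criteria.strip()):
        m = TERM.fullmatch(term.strip())
        if m is None:
            raise ValueError(f"cannot parse term: {term!r}")
        name, op, value = m.groups()
        actual = obj[name]
        if op == "like":
            ok = fnmatch.fnmatchcase(str(actual), value)
        else:
            if isinstance(actual, int):   # numeric descriptors compare as numbers
                value = int(value)
            ok = OPS[op](actual, value)
        if not ok:
            return False
    return True


pdat = {"uniquetype": "tape/scsi/scsd", "attribute": "block_size", "nls_index": 6}
print(matches(pdat, "uniquetype=tape/scsi/scsd and attribute=block_size"))  # True
```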
Instructor notes:
Purpose — Describe some ODM commands used to manage objects.
Details — Go through the example to show how the commands are used. Ensure that the
students understand the purpose of each command. For example, state that the odmget
command is used to retrieve a specific object from an object class to either view it or
change it in some way.
The following example can be used to illustrate the use of the odmget, odmadd and
odmdelete commands:
Assume that you are manipulating the PdAt object class, which has an entry for the 8 mm
tape drive with attribute block_size set to 1024. Assume that you wish to modify the default
blocksize value to 512.
The odmget command extracts the block size record into a file.
Note that, if there is more than one entry matching the pattern, then information regarding
each will be retrieved.
Having obtained the record, use vi (or your favorite editor) to edit that record and overtype
the new number.
Ask the students what potential problem they would encounter if you issued the odmadd
command at this point. They would have duplicate instances of the block_size attribute,
as the original record would not be overwritten. To overcome this problem, issue the
odmdelete command before the odmadd command. If there are duplicate objects, only the
first is recognized.
If there are multiple records matching the search pattern, the odmdelete command will
delete all of them. So, be specific with your search.
Now you can issue the odmadd command.
Notice that, with this command, all you specify is the temporary file where the new
information is held. You do not specify the object class name where the record is to go. Ask
the students how the odmadd command knows which object class this entry is to go to. The
answer is that the file records the name of the object class from which the record was
obtained: the stanza label in the input file (in this case, PdAt) names the target class.
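The example above can be simulated in Python to show why odmdelete must come before odmadd. This is a simulation only. It uses a hypothetical in-memory "ODM" and helper functions named after the commands they imitate, not the real commands. The density_set_1 object is sample data added for contrast. The add step takes its target class from the stanza label, as described above.

```python
# Simulation only: a hypothetical in-memory "ODM" (class name -> list of
# objects). Helpers imitate odmget/odmdelete/odmadd; values kept as strings.
odm = {"PdAt": [
    {"uniquetype": "tape/scsi/scsd", "attribute": "block_size", "deflt": "1024"},
    {"uniquetype": "tape/scsi/scsd", "attribute": "density_set_1", "deflt": "0"},
]}


def _match(obj, crit):
    return all(obj[k] == v for k, v in crit.items())


def get_stanzas(cls, **crit):
    """Like odmget: matching objects as stanzas labelled with the class."""
    stanzas = []
    for obj in odm[cls]:
        if _match(obj, crit):
            body = "\n".join(f'\t{k} = "{v}"' for k, v in obj.items())
            stanzas.append(f"{cls}:\n{body}\n")
    return "\n".join(stanzas)


def delete_objects(cls, **crit):
    """Like odmdelete: remove every object that matches the criteria."""
    odm[cls] = [o for o in odm[cls] if not _match(o, crit)]


def add_stanzas(text):
    """Like odmadd: the stanza label, not an argument, names the target class."""
    obj = None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(":"):
            obj = {}
            odm[line[:-1]].append(obj)
        elif "=" in line:
            key, _, value = line.partition("=")
            obj[key.strip()] = value.strip().strip('"')


crit = {"uniquetype": "tape/scsi/scsd", "attribute": "block_size"}
saved = get_stanzas("PdAt", **crit)                         # odmget ... > file
edited = saved.replace('deflt = "1024"', 'deflt = "512"')   # edit with vi
delete_objects("PdAt", **crit)                              # odmdelete first
add_stanzas(edited)                                         # then odmadd
print([o["deflt"] for o in odm["PdAt"] if o["attribute"] == "block_size"])  # ['512']
```

Running add_stanzas(edited) a second time without deleting first leaves two block_size objects, which is the duplicate problem discussed above.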
Additional information — None
Transition statement — Let’s look at another way of carrying out the above set of steps.
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = "512"    (Modify deflt to 512)
values = "0-2147483648,1"
width = ""
type = "R"
generic = "DU"
rep = "nr"
nls_index = 6
Notes:
Instructor notes:
Purpose — Define how the odmchange command can be used instead of the odmadd and
odmdelete commands.
Details — Novice users should be encouraged to use odmdelete and odmadd commands
rather than the odmchange command, which does the delete and the add operations all in
one step. This is because with the odmchange command, you have to be very careful about
the possibility of additional entries with the same field as the one you are using for
searching, as you might end up changing more than you anticipated.
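The risk of a broad search can be sketched in Python. This is a hypothetical illustration. change_objects is an invented helper that updates fields in place. The real odmchange replaces each matching object with a complete object read from an input file. The tape/scsi/ost entry is sample data. The point is that every object matching the criteria is changed.

```python
# Hypothetical illustration: change_objects stands in for odmchange and
# updates fields in place. The sample objects are simplified PdAt entries.
pdat = [
    {"uniquetype": "tape/scsi/scsd", "attribute": "block_size", "deflt": "1024"},
    {"uniquetype": "tape/scsi/ost", "attribute": "block_size", "deflt": "1024"},
]


def change_objects(objs, new_values, **crit):
    """Apply new_values to every object matching all criteria; return the count."""
    count = 0
    for obj in objs:
        if all(obj[k] == v for k, v in crit.items()):
            obj.update(new_values)
            count += 1
    return count


broad = [dict(o) for o in pdat]
specific = [dict(o) for o in pdat]
print(change_objects(broad, {"deflt": "512"}, attribute="block_size"))      # 2
print(change_objects(specific, {"deflt": "512"},
                     uniquetype="tape/scsi/scsd", attribute="block_size"))  # 1
```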
Additional information — None
Transition statement — Now, let’s look at some of the key ODM classes in more detail.
lpp:
name = "bos.rte.printers"
size = 0
state = 5
ver = 6
rel = 1
mod = 0
fix = 0
description = "Front End Printer Support"
lpp_id = 38
product:
lpp_name = "bos.rte.printers"
comp_id = "5765-C3403"
state = 5
ver = 6
rel = 1
mod = 0
fix = 0
ptf = ""
prereq = "*coreq bos.rte 5.1.0.0"
description = ""
supersedes = ""
inventory:
lpp_id = 38
private = 0
file_type = 0
format = 1
loc0 = "/etc/qconfig"
loc1 = ""
loc2 = ""
size = 0
checksum = 0
history:
lpp_id = 38
ver = 6
rel = 1
mod = 0
fix = 0
ptf = ""
state = 1
time = 1187714064
comment = ""
Notes:
Contents of SWVPD
The following information is part of the SWVPD:
• The name of the software product (for example, bos.rte.printers)
• The version, release, modification, and fix level of the software product (for example,
5.3.0.10 or 6.1.0.0)
• The fix level, which contains a summary of fixes implemented in a product
• Any program temporary fix (PTF) that has been installed on the system
• The state of the software product:
- Available (state = 1)
SWVPD classes
The Software Vital Product Data is stored in the following ODM classes:
lpp The lpp object class contains information about the installed
software products, including the current software product state
and description.
inventory The inventory object class contains information about the files
associated with a software product.
product The product object class contains product information about
the installation and updates of software products and their
prerequisites.
history The history object class contains historical information about
the installation and updates of software products.
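The classes on the visual share the lpp_id descriptor (lpp_id = 38 for bos.rte.printers). The join can be sketched in Python. This is a minimal sketch. It uses only the single inventory and history objects shown on the visual. The queries are simplified stand-ins for what lslpp -f and lslpp -h report, not lslpp's implementation.

```python
# Minimal sketch: link SWVPD classes through lpp_id, using values from the
# visual. Simplified stand-ins for lslpp -f / -h, not lslpp's implementation.
lpp = [{"name": "bos.rte.printers", "lpp_id": 38, "state": 5}]
inventory = [{"lpp_id": 38, "loc0": "/etc/qconfig"}]
history = [{"lpp_id": 38, "ver": 6, "rel": 1, "mod": 0, "fix": 0,
            "time": 1187714064}]


def lpp_id_of(fileset):
    """Look up the lpp_id key of a fileset in the lpp class."""
    return next(o["lpp_id"] for o in lpp if o["name"] == fileset)


def files_of(fileset):
    """Files recorded for a fileset (inventory class)."""
    lid = lpp_id_of(fileset)
    return [o["loc0"] for o in inventory if o["lpp_id"] == lid]


def levels_of(fileset):
    """Recorded levels as v.r.m.f strings (history class)."""
    lid = lpp_id_of(fileset)
    return [f'{h["ver"]}.{h["rel"]}.{h["mod"]}.{h["fix"]}'
            for h in history if h["lpp_id"] == lid]


print(files_of("bos.rte.printers"))   # ['/etc/qconfig']
print(levels_of("bos.rte.printers"))  # ['6.1.0.0']
```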
Instructor notes:
Purpose — Introduce the software vital product database.
Details — Explain what kind of data is stored in the ODM classes (version, release, and so
forth) and the meaning of the shown ODM classes. Identify how the classes are linked
together by the lpp_id descriptor. Note that the list of descriptors is not complete and that
the slide only lists selected descriptors for teaching purposes.
Additional information — At this point, you might introduce the lslpp command, which
queries the software vital product database and has options such as -l, -h, -f and -w.
Most of the information on the visual can be seen with this high-level command. The
flags (and the related object classes) are:
-L : list the filesets (lpp object class)
-d : list the filesets that depend on a given fileset (product object class)
-p : list the fileset prerequisites (product object class)
-w : list the fileset for a given file (inventory object class)
-f : list the files for a given fileset (inventory object class)
-h : list the maintenance history for a fileset (history object class)
The commands used to produce the output on the visual are:
• lpp:
odmget -q name=bos.rte.printers lpp
• product:
odmget -q lpp_name=bos.rte.printers product
• inventory:
odmget -q lpp_id=38 inventory | pg
Since there are a number of files in the root file system for this fileset, there are a
number of objects that match this query (hence the pg command). Note that there are
also files in this fileset in the usr file system.
To display these, set and export ODMDIR=/usr/lib/objrepos, then rerun the last odmget
command. (Note: ODMDIR defaults to /etc/objrepos.)
• history:
odmget -q lpp_id=38 history
Transition statement — Let’s introduce the most important software states.
Notes:
Introduction
The AIX software vital product database uses software states that describe the status of
an install or update package.
Once a product is committed, if you would like to return to the old version, you must
remove the current version and reinstall the old version.
PdDv:
type = "scsd"
class = "tape"
subclass = "scsi"
prefix = "rmt"
...
base = 0
...
detectable = 1
...
led = 2418
setno = 54
msgno = 0
catalog = "devices.cat"
DvDr = "tape"
Define = "/etc/methods/define"
Configure = "/etc/methods/cfgsctape"
Change = "/etc/methods/chggen"
Unconfigure = "/etc/methods/ucfgdevice"
Undefine = "/etc/methods/undefine"
Start = ""
Stop = ""
...
uniquetype = "tape/scsi/scsd"
Notes:
type
This specifies the product name or model number, for example, 8 mm (tape).
class
Specifies the functional class name. A functional class is a group of device instances
sharing the same high-level function. For example, tape is a functional class name
representing all tape devices.
subclass
Device classes are grouped into subclasses. The subclass scsi specifies all tape
devices that may be attached to a SCSI interface.
prefix
This specifies the Assigned Prefix in the customized database, which is used to derive
the device instance name and /dev name. For example, rmt is the prefix name
assigned to tape devices. Names of tape devices would then look like rmt0, rmt1, or
rmt2.
base
This descriptor specifies whether a device is a base device or not. A base device is any
device that forms part of a minimal base system. During system boot, a minimal base
system is configured to permit access to the root volume group (rootvg) and hence to
the root file system. This minimal base system can include, for example, the standard
I/O diskette adapter and a SCSI hard drive. The device shown on the visual is not a
base device.
This flag is also used by the bosboot and savebase commands, which are introduced
later in this course.
detectable
This specifies whether the device instance is detectable or undetectable. A device
whose presence and type can be determined by the cfgmgr, once it is actually powered
on and attached to the system, is said to be detectable. A value of 1 means that the
device is detectable, and a value of 0 that it is not (for example, a printer or tty).
led
This indicates the value displayed on the LEDs when the configure method begins to
run. The value stored is decimal, but the value shown on the LEDs is hexadecimal
(2418 is 972 in hex).
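The conversion can be checked in Python. This is a minimal sketch. led_code is a hypothetical helper name. It formats the decimal led value as the uppercase hexadecimal code shown on the panel. int(..., 16) converts a panel code back to the stored decimal value.

```python
# Minimal sketch: convert the decimal led descriptor to the hexadecimal code
# shown on the LED panel, and back. led_code is a hypothetical helper name.
def led_code(value: int) -> str:
    """Return the LED display form (uppercase hex) of a decimal led value."""
    return format(value, "X")


print(led_code(2418))   # 972
print(int("972", 16))   # 2418
```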
setno, msgno
Each device has a specific description (for example, SCSI Tape Drive) that is shown
when devices are listed by the lsdev command. These two descriptors are
used to look up the description in a message catalog.
catalog
This identifies the filename of the national language support (NLS) catalog. The LANG
variable on a system controls which catalog file is used to show a message. For
example, if LANG is set to en_US, the catalog file
/usr/lib/nls/msg/en_US/devices.cat is used. If LANG is de_DE, catalog
/usr/lib/nls/msg/de_DE/devices.cat is used.
DvDr
This identifies the name of the device driver associated with the device (for example,
tape). Usually, device drivers are stored in directory /usr/lib/drivers. Device
drivers are loaded into the AIX kernel when a device is made available.
Define
This names the define method associated with the device type. This program is called
when a device is brought into the defined state.
Configure
This names the configure method associated with the device type. This program is
called when a device is brought into the available state.
Change
This names the change method associated with the device type. This program is called
when a device attribute is changed through the chdev command.
Unconfigure
This names the unconfigure method associated with the device type. This program is
called when a device is unconfigured by rmdev -l.
Undefine
This names the undefine method associated with the device type. This program is
called when a device is undefined by rmdev -l -d.
Start, stop
Few devices support a stopped state (only logical devices). A stopped state means that
the device driver is loaded, but no application can access the device. These two
attributes name the methods to start or stop a device.
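The states and methods described above can be sketched in Python as a small transition table. This is a teaching model assembled from these notes, not how cfgmgr, mkdev, or rmdev are coded. None stands for "no CuDv object". Per the notes, only some logical devices accept Stop and Start.

```python
# Teaching model from these notes (not cfgmgr/mkdev/rmdev internals):
# (current state, method) -> new state; None means "no CuDv object".
TRANSITIONS = {
    (None, "Define"): "Defined",
    ("Defined", "Configure"): "Available",
    ("Available", "Unconfigure"): "Defined",
    ("Defined", "Undefine"): None,
    ("Available", "Stop"): "Stopped",      # only some logical devices
    ("Stopped", "Start"): "Available",
}


def run_method(state, method):
    """Return the new state, or raise if the method does not apply."""
    if (state, method) not in TRANSITIONS:
        raise ValueError(f"{method} is not valid in state {state}")
    return TRANSITIONS[(state, method)]


state = None
for method in ("Define", "Configure", "Unconfigure", "Undefine"):
    state = run_method(state, method)
    print(f"{method:<12} -> {state}")
```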
uniquetype
This is a key that is referenced by other object classes. Objects use this descriptor as a
pointer back to the device description in PdDv. The key is a concatenation of the class,
subclass, and type values, separated by slashes (for example, tape/scsi/scsd).
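The key can be sketched in Python, together with the lookup from a CuDv object back to PdDv through PdDvLn. This is a minimal sketch using values from the visuals. It assumes none of the three parts contains a slash. The PDDV dictionary is a hypothetical stand-in for the object class.

```python
# Minimal sketch (assumption: no slash inside class, subclass or type).
def make_uniquetype(dev_class: str, subclass: str, dev_type: str) -> str:
    """Concatenate class, subclass and type as in PdDv (e.g. tape/scsi/scsd)."""
    return f"{dev_class}/{subclass}/{dev_type}"


def split_uniquetype(uniquetype: str) -> tuple:
    """Recover (class, subclass, type) from a uniquetype string."""
    dev_class, subclass, dev_type = uniquetype.split("/")
    return dev_class, subclass, dev_type


# Hypothetical stand-in for PdDv, keyed by uniquetype (values from the visuals).
PDDV = {make_uniquetype("adapter", "pci", "14106902"):
        {"prefix": "ent", "DvDr": "pci/goentdd"}}
cudv = {"name": "ent1", "PdDvLn": "adapter/pci/14106902"}
print(PDDV[cudv["PdDvLn"]]["DvDr"])  # pci/goentdd
```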
Instructor notes:
Purpose — Introduce object class PdDv.
Details — Explain the different descriptors.
Additional information — If you want, you can mention there is an additional method for
starting and stopping a device. To stop a device issue the following command:
# rmdev -l <device_name> -S
Consider yourself lucky if you find a device that supports the stopped state. Remember that
physical devices do not support a stopped state.
You can list the devices in the Predefined Devices object class using the following
command:
# lsdev -P
Transition statement — Next class is PdAt.
PdAt:
uniquetype = "tape/scsi/scsd"
attribute = "block_size"
deflt = ""
values = "0-2147483648,1"
...
PdAt:
uniquetype = "disk/scsi/osdisk"
attribute = "pvid"
deflt = "none"
values = ""
...
PdAt:
uniquetype = "tty/rs232/tty"
attribute = "term"
deflt = "dumb"
values = ""
...
Notes:
uniquetype
This descriptor is used as a pointer back to the device defined in the PdDv object class.
attribute
This identifies the name of the attribute. This is the name that can be passed to the
mkdev or chdev command. For example, to change the terminal type (term) of tty0 from the
default value, dumb, to ibm3151, you can issue the following command:
# chdev -l tty0 -a term=ibm3151
deflt
This identifies the default value for an attribute. Nondefault values are stored in CuAt.
values
This identifies the possible values that can be associated with the attribute name. For
example, allowed values for the block_size attribute range from 0 to 2147483648, with
an increment of 1.
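The relationship between deflt, CuAt, and values can be sketched in Python. This is a minimal sketch under stated assumptions. The effective value of an attribute is the CuAt value if one exists, otherwise the PdAt default. Range parsing follows only the block_size example ("low-high,increment" with non-negative bounds). List values such as "yes,no" and other formats are not handled. ent0 is a hypothetical device with no CuAt entry.

```python
# Minimal sketch: CuAt holds only nondefault values; otherwise the PdAt
# default applies. Range parsing follows the block_size example only.
PDAT = {
    ("adapter/pci/14106902", "jumbo_frames"): {"deflt": "no", "values": "yes,no"},
    ("tape/scsi/scsd", "block_size"): {"deflt": "", "values": "0-2147483648,1"},
}
CUAT = {("ent1", "jumbo_frames"): "yes"}


def effective_value(name, uniquetype, attribute):
    """CuAt value if one exists, otherwise the PdAt default."""
    return CUAT.get((name, attribute), PDAT[(uniquetype, attribute)]["deflt"])


def in_range(value, spec):
    """Check value against a 'low-high,increment' range (non-negative bounds)."""
    bounds, step = spec.split(",")
    low, high = (int(x) for x in bounds.split("-"))
    v = int(value)
    return low <= v <= high and (v - low) % int(step) == 0


print(effective_value("ent1", "adapter/pci/14106902", "jumbo_frames"))  # yes
print(effective_value("ent0", "adapter/pci/14106902", "jumbo_frames"))  # no
print(in_range("512", PDAT[("tape/scsi/scsd", "block_size")]["values"]))  # True
```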
CuDv:
name = "ent1"
status = 1
chgstatus = 2
ddins = "pci/goentdd"
location = "02-08"
parent = "pci2"
connwhere = "8"
PdDvLn = "adapter/pci/14106902"
CuDv:
name = "hdisk2"
status = 1
chgstatus = 2
ddins = "scdisk"
location = "01-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
PdDvLn = "disk/scsi/scsd"
Notes:
name
A customized device object for a device instance is assigned a unique logical name to
distinguish the device from other devices. The visual shows two devices, an Ethernet
adapter ent1 and a disk drive hdisk2.
status
This identifies the current status of the device instance. Possible values are:
- status = 0 - Defined
- status = 1 - Available
- status = 2 - Stopped
chgstatus
This flag tells whether the device instance has been altered since the last system boot.
The diagnostics facility uses this flag to validate system configuration. The flag can take
these values:
- chgstatus = 0 - New device
- chgstatus = 1 - Don't care
- chgstatus = 2 - Same
- chgstatus = 3 - Device is missing
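Decoding the two numeric descriptors can be sketched in Python. This is a minimal sketch. It uses only the value tables listed above and the hdisk2 values from the visual. describe is a hypothetical helper name.

```python
# Minimal sketch: decode CuDv status and chgstatus with the tables above.
STATUS = {0: "Defined", 1: "Available", 2: "Stopped"}
CHGSTATUS = {0: "New device", 1: "Don't care", 2: "Same", 3: "Device is missing"}


def describe(cudv: dict) -> str:
    """Render the status fields of a CuDv object as readable text."""
    return (f'{cudv["name"]}: status={STATUS[cudv["status"]]}, '
            f'chgstatus={CHGSTATUS[cudv["chgstatus"]]}')


print(describe({"name": "hdisk2", "status": 1, "chgstatus": 2}))
# hdisk2: status=Available, chgstatus=Same
```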
ddins
This descriptor typically contains the same value as the Device Driver Name descriptor
in the Predefined Devices (PdDv) object class. It specifies the name of the device
driver that is loaded into the AIX kernel.
location
Identifies the AIX location of a device. The location code is a path from the system unit
through the adapter to the device. In case of a hardware problem, the location code is
used by technical support to identify a failing device.
parent
Identifies the logical name of the parent device. For example, the parent device of
hdisk2 is scsi1.
connwhere
Identifies the specific location on the parent device where the device is connected. For
example, the device hdisk2 uses the SCSI address 8,0.
PdDvLn
Provides a link to the device instance's predefined information through the uniquetype
descriptor in the PdDv object class.
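Scripts often need a single descriptor from odmget output, such as the parent of a device. The sketch below assumes the stanza layout shown on the visual: a class header line such as CuDv:, followed by descriptor = "value" lines, with name identifying each customized object. The function name odm_value is hypothetical, and values that contain spaces (such as vpd) are not handled.

```sh
# odm_value OBJECT DESCRIPTOR
# Reads odmget-style stanzas on stdin and prints the value of DESCRIPTOR for the
# object whose name descriptor equals OBJECT. Assumes single-word quoted values.
odm_value() {
    awk -v obj="$1" -v key="$2" '
        /^[A-Za-z]+:$/ { in_obj = 0; next }   # a header such as CuDv: starts a new object
        $1 == "name" {                         # is this stanza the requested object?
            v = $3; gsub(/"/, "", v); in_obj = (v == obj)
        }
        in_obj && $1 == key {                  # print the descriptor without its quotes
            v = $3; gsub(/"/, "", v); print v
        }
    '
}
```

On an AIX system, odmget -q "name=hdisk2" CuDv | odm_value hdisk2 parent would be expected to print scsi1 for the object shown on the visual.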
CuAt:
name = "ent1"
attribute = "jumbo_frames"
value = "yes"
...
CuAt:
name = "hdisk2"
attribute = "pvid"
value = "00c35ba0816eafe50000000000000000"
...
Notes:
PdCn:
uniquetype = "adapter/pci/sym875"
connkey = "scsi"
connwhere = "1,0"
PdCn:
uniquetype = "adapter/pci/sym875"
connkey = "scsi"
connwhere = "2,0"
CuDep:
name = "rootvg"
dependency = "hd6"
CuDep:
name = "datavg"
dependency = "lv01"
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "0"
value3 = "hdisk3"
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "1"
value3 = "hdisk2"
CuVPD:
name = "hdisk2"
vpd_type = 0
vpd = "*MFIBM *TM\n\
HUS151473VL3800 *F03N5280
*RL53343341*SN009DAFDF*ECH17
923D *P26K5531 *Z0\n\
000004029F00013A*ZVMPSS43A
*Z20068*Z307220"
Notes:
PdCn
The Predefined Connection (PdCn) object class contains connection information for
adapters (sometimes called intermediate devices). This object class also includes
predefined dependency information. For each connection location, there are one or
more objects describing the subclasses of devices that can be connected.
The sample PdCn objects on the visual indicate that, at the given locations (1,0 and
2,0), devices belonging to the subclass scsi can be attached.
CuDep
The Customized Dependency (CuDep) object class describes device instances that
depend on other device instances. This object class describes only the dependence
links between logical devices. Physical dependencies of one device on another device
are recorded in the Customized Devices (CuDv) object class.
The sample CuDep objects on the visual show the dependencies between logical
volumes and the volume groups they belong to.
CuDvDr
The Customized Device Driver (CuDvDr) object class is used to create the entries in
the /dev directory. Applications use these special files to access a device
driver that is part of the AIX kernel. The attribute value1 is called the major number and
is a unique key for a device driver. The attribute value2, called the minor number,
specifies a certain instance or operating mode of a device driver. The attribute value3
holds the name of the device instance.
The sample CuDvDr objects on the visual reflect the device driver for disk drives
hdisk2 and hdisk3. The major number 36 specifies the driver in the kernel. In our
example, the minor numbers 0 and 1 specify two different instances of disk drives, both
using the same device driver. For other devices, the minor number may represent
different modes in which the device can be used. For example, for a tape drive,
operating mode 0 would specify rewind on close, and operating mode 1 would specify
no rewind on close.
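The major and minor numbers kept in CuDvDr (value1 and value2) are the numbers that ls -l prints for a special file in /dev, in place of the file size. The hypothetical helper devno_of below extracts that pair from one line of ls -l output. It assumes the common layout in which field 5 is the major number followed by a comma and field 6 is the minor number; the sample line is illustrative, not captured from a real system.

```sh
# devno_of LS_LINE
# Prints "major minor" from one line of `ls -l` output for a device special file.
# Assumes field 5 is "major," (with a trailing comma) and field 6 is the minor number.
devno_of() {
    printf '%s\n' "$1" | awk '{ sub(/,$/, "", $5); print $5, $6 }'
}

# Illustrative line in the shape of a block device listing:
devno_of 'brw-------    1 root     system       36,  1 Sep 03 11:30 /dev/hdisk2'
```

On AIX, the pair printed by devno_of "$(ls -l /dev/hdisk2)" can be compared with the value1 and value2 descriptors returned by odmget -q "value3=hdisk2" CuDvDr.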
CuVPD
The Customized Vital Product Data (CuVPD) object class contains vital product data
(manufacturer of device, engineering level, part number, and so forth) that is useful for
technical support. When an error occurs with a specific device, the vital product data is
shown in the error log.
Instructor notes:
Purpose — Explain briefly the function of some additional ODM classes.
Details — Describe the ODM classes shown using the explanations in the student notes.
Additional information — None
Transition statement — We have reached a checkpoint.
Checkpoint
IBM Power Systems
Notes:
Instructor notes:
Purpose —
Details —
Checkpoint solutions
Additional information —
Transition statement — Let’s reinforce what we have covered by working with the
ODM in the lab.
Notes:
Instructor notes:
Purpose — Introduce the exercise.
Details —
Additional information —
Transition statement —
Unit summary
Notes:
The ODM is made up of object classes, which contain individual objects; each object is
made up of descriptors.
AIX offers a command line interface to work with the ODM files.
The device information is held in the customized and the predefined databases (Cu*, Pd*).
Instructor notes:
Purpose — Review some of the key points covered in the unit.
Details — Present the highlights from the unit.
Additional information — None.
Transition statement — Let’s continue with the next unit.
References
Online AIX Version 6.1 General Programming Concepts: Writing and Debugging Programs
(Chapter 5. Error-Logging Overview)
Online AIX Version 6.1 Command Reference volumes 1-6
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/infocenter/systems
Unit objectives
Notes:
[Figure: error logging components. errsave() (kernel modules) and errlog() (applications) write timestamped entries to /dev/error; the errdemon daemon (/usr/lib/errdemon) writes error records to /var/adm/ras/errlog, using templates from /var/adm/ras/errtmplt and ODM data from CuDv, CuAt, and CuVPD. Related commands: errclear, errstop, errlogger.]
Notes:
Detection of an error
The error logging process begins when an operating system module detects an error.
The error detecting segment of code then sends error information to either the
errsave() kernel service or the errlog() application subroutine, where the information
is in turn written to the /dev/error special file. This process then adds a timestamp to
the collected data. The errdemon daemon constantly checks the /dev/error file for
new entries, and when new data is written, the daemon conducts a series of operations.
To create an entry in the error log, the errdemon daemon retrieves the appropriate
template from the repository, the resource name of the unit that caused the error, and
the detail data. Also, if the error signifies a hardware-related problem and hardware vital
product data (VPD) exists, the daemon retrieves the VPD from the ODM. When you
access the error log, either through SMIT or with the errpt command, the error log is
formatted according to the error template in the error template repository and presented
in either a summary or detailed report. Most entries in the error log are attributable to
hardware and software problems, but informational messages can also be logged, for
example, by the system administrator.
Instructor notes:
Purpose — Define the components of the error logging facility.
Details —
Additional information — See the AIX 5L Differences Guide Version 5.3 Edition Redbook
(SG24-7463-00) for more information about error log hardening (also referred to as error
log RAS).
The following is a list of terms that you will probably refer to:
error ID This is a 32-bit hexadecimal code used to identify a particular
failure. Each error record template has a unique error ID.
error label This is the mnemonic name for an error ID.
error log This is the file that stores instances of errors and failures
encountered by the system.
error log entry A record in the system error log that describes a failure.
Contains captured failure data.
error record template A description of what will be displayed when the error log is
formatted for a report, including information on the type and
class of error, probable causes and recommended actions.
Collectively, the templates comprise the Error Record Template
Repository.
Cover the diagram on the visual starting from the bottom, with the error being detected by
errlog() or errsave() and an entry being made in /dev/error, up to the point where a
user can look at the records of the error log either by going through SMIT or by executing
the errpt command.
An errpt command can be run from the shell or SMIT to format records in the errlog into
readable reports. The ODM classes CuDv, CuAt, and CuVPD provide information for the
detailed error reporting.
Transition statement — SMIT can be used to generate an error report.
# smit errpt
Generate an Error Report
...
CONCURRENT error reporting? no
Type of Report summary +
Error CLASSES (default is all) [] +
Error TYPES (default is all) [] +
Error LABELS (default is all) [] +
Error ID's (default is all) [] +X
Resource CLASSES (default is all) []
Resource TYPES (default is all) []
Resource NAMES (default is all) []
SEQUENCE numbers (default is all) []
STARTING time interval []
ENDING time interval []
Show only Duplicated Errors [no]
Consolidate Duplicated Errors [no]
LOGFILE [/var/adm/ras/errlog]
TEMPLATE file [/var/adm/ras/errtmplt]
MESSAGE file []
FILENAME to send report to (default is stdout) []
...
Notes:
Overview
The SMIT fastpath smit errpt takes you to the screen used to generate an error
report. Any user can use this screen. As shown on the visual, the screen includes a
number of fields that can be used for report specifications. Some of these fields are
described in more detail below.
Type of report
Summary, intermediate, and detailed reports are available. Detailed reports give
comprehensive information. Intermediate reports display most of the error information.
Summary reports contain concise descriptions of errors.
Error classes
Values are H (hardware), S (software), and O (operator messages created with
errlogger). You can specify more than one error class.
Error types
Valid error types include the following:
- PEND - The loss of availability of a device or component is imminent.
- PERF - The performance of the device or component has degraded to below an
acceptable level.
- TEMP - Recovered from condition after several attempts.
- PERM - Unable to recover from error condition. Error types with this value are usually
the most severe errors and imply that you have a hardware or software defect. Error
types other than PERM usually do not indicate a defect, but they are recorded so that
they can be analyzed by the diagnostic programs.
- UNKN - Severity of the error cannot be determined.
- INFO - The error type is used to record informational entries.
Error labels
An error label is the mnemonic name used for an error ID.
Error IDs
An error ID is a 32-bit hexadecimal code used to identify a particular failure.
Resource classes
Specifies the device class for hardware errors (for example, disk).
Resource types
Specifies the device type for hardware errors (for example, 355 MB).
Instructor notes:
Purpose — Explain how an error report can be generated through SMIT.
Details —
Additional information — This option will allow you to produce a detailed or summary
report. Examples of both will be given.
Mention all the different fields that can be used to generate specific searches and reports.
Note that the report can be sent to a file, which is specified in the last field.
The Show only Duplicated Errors option in the Generate an Error Report screen was
introduced in AIX 5L V5.1. Examples of duplicate errors might include floppy drive not
ready, external drive not ready, or Ethernet card unplugged.
Transition statement — Instead of using SMIT, you can also generate a report from the
command line. Let's see how this can be done.
• Summary report:
# errpt
• Intermediate report:
# errpt -A
• Detailed report:
# errpt -a
Notes:
The -d option
The -d option (flag) can be used to limit the report to a particular class of errors. Two
examples illustrating use of this flag are shown on the visual:
- The command errpt -d H specifies a summary report of all hardware (-d H) errors.
- The command errpt -a -d S specifies a detailed report (-a) of all software (-d S)
errors.
The -c option
If you want to display the error entries concurrently, that is, at the time they are logged,
you must execute errpt -c. In the example on the visual, we direct the output to the
system console.
The -D flag
Duplicate errors can be consolidated using errpt -D. When used with the -a option,
errpt -D reports only the number of duplicate errors and the timestamp for the first and
last occurrence of the identical error.
The -P flag
Shows only errors which are duplicates of the previous error. The -P flag applies only to
duplicate errors generated by the error log device driver.
Additional information
The errpt command has many options. Refer to your AIX Commands Reference (or
the man page for errpt) for a complete description.
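The -d flag filters a report by class. If you instead want a quick count of how many entries fall into each class, you can post-process a summary report. This sketch assumes the default summary layout (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION, with one header line), so that the class is the fourth field; the function name count_by_class is hypothetical.

```sh
# count_by_class
# Reads a summary error report (errpt output) on stdin and prints
# "<class> <count>" for each error class found in column C, sorted by class.
count_by_class() {
    awk 'NR > 1 && NF >= 5 { n[$4]++ } END { for (c in n) print c, n[c] }' | sort
}
```

On AIX, you would run errpt | count_by_class; errpt -d H then lists the hardware entries behind the H count.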
# errpt
Error Type:
• P: Permanent, Performance, or Pending
• T: Temporary
• I: Informational
• U: Unknown
Error Class:
• H: Hardware
• S: Software
• O: Operator
• U: Undetermined
Notes:
LABEL: LVM_SA_PVMISS
IDENTIFIER: F7DDA124
Description
PHYSICAL VOLUME DECLARED MISSING
Probable Causes
POWER, DRIVE, ADAPTER, OR CABLE FAILURE
Detail Data
MAJOR/MINOR DEVICE NUMBER
8000 0011 0000 0001
SENSE DATA
00C3 5BA0 0000 4C00 0000 0115 7F54 BF78 00C3 5BA0 7FCF 6B93 0000 0000 0000 0000
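Scripts that act on a detailed report usually need only one header value, such as the label or identifier shown above. The hypothetical helper errpt_field below prints the value of the first KEY: line in errpt -a output. It assumes each header starts its line with the key followed by a colon, as the LABEL and IDENTIFIER lines do.

```sh
# errpt_field KEY
# Reads "errpt -a" output on stdin and prints the value of the first "KEY:" line.
errpt_field() {
    awk -v key="$1:" '$1 == key { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}
```

Because only the first match is printed, it is best fed a single entry, for example errpt -a -l 123 | errpt_field LABEL, where 123 stands for a sequence number from your own error log.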
Notes:
Instructor notes:
Purpose — Explain the information that is obtained from a detailed report.
Details — Explain using the information in the student notes.
Additional information — None
Transition statement — Disk errors are frequently seen in the error log. There are many
different types of disk errors. Let’s identify the different types and find out the severity of
each.
Error Label              Error Type  Recommendations
DISK_ERR1                P           Failure of physical volume media
                                     Action: Replace device as soon as possible
DISK_ERR2, DISK_ERR3     P           Device does not respond
                                     Action: Check power supply
DISK_ERR4                T           Error caused by bad block or occurrence of a recovered error
                                     Rule of thumb: If a disk produces more than one DISK_ERR4
                                     per week, replace the disk
SCSI_ERR* (SCSI_ERR10)   P           SCSI communication problem
                                     Action: Check cable, SCSI addresses, terminator
Error Types: P = Permanent, T = Temporary
Notes:
4. Sometimes SCSI errors are logged, mostly with the LABEL SCSI_ERR10. They
indicate that the SCSI controller is not able to communicate with an attached device.
In this case, check the cable (and the cable length), the SCSI addresses, and the
terminator.
DISK_ERR5 errors
A very infrequent error is DISK_ERR5. It is the catch-all (that is, the problem does not
match any of the above DISK_ERRx symptoms). You need to investigate further by
running the diagnostic programs, which can detect and produce more information about
the problem.
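The DISK_ERR4 rule of thumb from the table can be checked with a short filter over a summary report. This is a sketch only: it assumes the report has already been limited to DISK_ERR4 entries from the last week (for example with errpt -J DISK_ERR4 and a -s start time) and that the resource name is the fifth field of the default summary layout. The function name disks_over_limit is hypothetical.

```sh
# disks_over_limit [LIMIT]
# Reads a summary error report already limited to one week of DISK_ERR4 entries
# and prints each resource (disk) that logged more than LIMIT entries (default 1).
disks_over_limit() {
    awk -v limit="${1:-1}" '
        NR > 1 && NF >= 5 { n[$5]++ }                      # count entries per RESOURCE_NAME
        END { for (r in n) if (n[r] > limit) print r, n[r] }
    ' | sort
}
```

On AIX, a possible invocation is errpt -J DISK_ERR4 -s 0827000009 | disks_over_limit, where the start time (mmddhhmmyy) is only an example and should be set to one week before the current date.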
Error Label                              Class,Type  Recommendations
LVM_BBEPOOL, LVM_BBERELMAX, LVM_HWFAIL   S,P         No more bad block relocation
                                                     Action: Replace disk as soon as possible.
LVM_SA_STALEPP                           S,P         Stale physical partition
                                                     Action: Check disk, synchronize data (syncvg).
LVM_SA_QUORCLOSE                         H,P         Quorum lost, volume group closing
                                                     Action: Check disk, consider working without quorum.
Error Classes: H = Hardware, S = Software
Error Types: P = Permanent, T = Temporary
Notes:
# smit errdemon
Change / Show Characteristics of the Error Log
LOGFILE [/var/adm/ras/errlog]
*Maximum LOGSIZE [1048576] #
Memory Buffer Size [32768] #
...
# smit errclear
Clean the Error Log
Notes:
Instructor notes:
Purpose — Introduce the errdemon and errclear commands.
Details — Explain using the information in the student notes.
Additional information — The Change / Show Characteristics of the Error Log
screen also contains duplicate error options. If Duplicate Error Detection is set to true,
Duplicate Time Interval in milliseconds is used to set a threshold during which
identical error log entries are removed. The Duplicate error maximum sets the point at
which an additional identical error will be considered a new error. For more information, see
the AIX Commands Reference entry for errdemon.
Transition statement — Let’s switch over to an exercise. This exercise has three parts,
but you should only do the first part now. There will be time to do the other parts of the
exercise later.
Notes:
Instructor notes:
Purpose — Introduce the next exercise.
Details — Be sure to mention that students should only do “Part 1” of the exercise at this
time. They will do the rest of the exercise later. Provide the goals of this part of the exercise
as given in the student notes.
Additional information — None
Transition statement — Let’s switch over to the next topic, “Error Notification and
syslogd.” We will start by discussing the different ways that error notification can be
implemented.
ODM-Based:
/etc/objrepos/errnotify
Error notification
Notes:
3. ODM-based error notification: The errdemon program uses the ODM class errnotify
for error notification. How to work with errnotify is discussed later in this topic.
Instructor notes:
Purpose — Provide different ways to implement error notification.
Details — Explain using the information in the notes. The two methods shown are covered
in the visuals that follow, so there is no need to “pre-teach” them in detail now.
Additional information — Earlier versions of the course discussed concurrent error
logging (errpt -c). Periodic diagnostics using diagela are not used on p5 and p6 platforms.
By default, periodic diagnostics sends mail notifications. It can be customized to take
other actions, such as interfacing to other applications. To specify a customized action,
create a PDiagAtt ODM class object with a value descriptor set to the full path of a
script. For more details, refer to AIX 5L Version 5.3 Understanding the Diagnostic
Subsystem for AIX (SC23-4919).
Periodic Diagnostics: The diagnostics package (diag command) contains a periodic
diagnostic procedure (diagela). Whenever a hardware error is posted to the log, all
members of the system group get a mail message. Additionally, a message is sent to the
system console. The diagela program has disadvantages:
• Since it executes many times a day, the program might slow down your system.
• Only hardware errors are analyzed.
• Since AIX 5.2, diagela has only supported analyzing processor errors and no other
hardware.
• In POWER5 and POWER6 hardware, diagela does not even support processor
diagnostics. Instead, the platform firmware (service processor) handles this and reports
hardware errors to the managing HMC.
Transition statement — Let’s provide an example to show how you might implement
self-made error notification.
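#!/usr/bin/ksh
errpt > /tmp/errlog.1                  # Baseline report (file names are examples)
while true
do
  sleep 60                             # Let's sleep one minute
  errpt > /tmp/errlog.2                # Current summary report
  cmp -s /tmp/errlog.1 /tmp/errlog.2 && continue    # No new entry: sleep again
  print "Operator: Check error log" > /dev/console  # New entry: inform the operator
  mv /tmp/errlog.2 /tmp/errlog.1       # Current report becomes the new baseline
done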
Notes:
- The two files are compared using the command cmp -s (silent compare, which means
that no output is reported). If the files are identical, we jump back to the
beginning of the loop (continue), and the process sleeps again.
- If there is a difference, a new error entry has been posted to the error log. In this
case, we inform the operator that a new entry is in the error log. Instead of print,
you could use the mail command to inform another person.
errnotify:
en_pid = 0
en_name = "sample"
en_persistenceflg = 1
en_label = ""
en_crcid = 0
en_class = "H"
en_type = "PERM"
en_alertflg = ""
en_resource = ""
en_rtype = ""
en_rclass = "disk"
en_method = "errpt -a -l $1 | mail -s DiskError root"
Notes:
List of descriptors
Here is a list of all descriptors for the errnotify object class:
en_alertflg Identifies whether the error is alertable. This descriptor is
provided for use by alert agents with network management
applications. The values are TRUE (alertable) or FALSE (not
alertable).
en_class Identifies the class of error log entries to match. Valid values are
H (hardware errors), S (software errors), O (operator messages),
and U (undetermined).
en_crcid Specifies the error identifier associated with a particular error.
en_label Specifies the label associated with a particular error identifier as
defined in the output of errpt -t (show templates).
en_method Specifies a user-programmable action, such as a shell script or a
command string, to be run when an error matching the selection
criteria of this Error Notification object is logged. The error
notification daemon uses the sh -c command to execute the
notify method.
The following keywords are passed to the method as arguments:
$1 Sequence number from the error log entry
$2 Error ID from the error log entry
$3 Class from the error log entry
$4 Type from the error log entry
$5 Alert flags from the error log entry
$6 Resource name from the error log entry
$7 Resource type from the error log entry
$8 Resource class from the error log entry
$9 Error label from the error log entry
en_name Uniquely identifies the object.
en_persistenceflg Designates whether the Error Notification object should be
removed when the system is restarted. 0 means removed at boot
time; 1 means persists through boot.
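A notify method is often a small script rather than a one-line command. The sketch below is a hypothetical method written as a shell function so that it can be tried without AIX. It assumes the en_method string passes the keywords through in order, for example en_method = "/usr/local/bin/notify_errlog $1 $2 $3 $4 $5 $6 $7 $8 $9" (a hypothetical path), and it only formats a one-line summary from those arguments.

```sh
# notify_errlog SEQ ID CLASS TYPE ALERT RESOURCE RTYPE RCLASS LABEL
# Hypothetical errnotify method: builds a one-line summary from the nine
# keywords that the error notification daemon substitutes into en_method.
notify_errlog() {
    if [ "$#" -lt 9 ]; then
        echo "notify_errlog: expected 9 arguments, got $#" >&2
        return 1
    fi
    printf 'seq=%s label=%s id=%s class=%s type=%s resource=%s rclass=%s\n' \
        "$1" "$9" "$2" "$3" "$4" "$6" "$8"
}
```

In a real method, this line could be piped to mail or logger, or the method could run errpt -a -l "$1" to fetch the full entry, as the sample errnotify object does.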
syslogd daemon
/etc/syslog.conf:
daemon.debug /tmp/syslog.debug
/tmp/syslog.debug:
inetd[16634]: A connection requires tn service
inetd[16634]: Child process 17212 has ended
# stopsrc -s inetd
Notes:
Function of syslogd
The syslogd daemon logs system messages from different software components
(kernel, daemon processes, system applications).
Instructor notes:
Purpose — Describe how the syslogd daemon works.
Details — Explain using the information in the student notes.
Additional information — None
Transition statement — Let’s provide some other syslogd configuration examples.
/etc/syslog.conf:
Notes:
- The following line specifies that all messages, except messages from the mail
subsystem, are to be sent to the syslogd daemon on the host server:
*.debug;mail.none @server
Note that, if this example and the preceding example appear in the same
/etc/syslog.conf file, messages sent to /tmp/daemon.debug will also be
sent to the host server.
Facilities
Use the following system facility names in the selector field:
kern Kernel
user User level
mail Mail subsystem
daemon System daemons
auth Security or authorization
syslog syslogd messages
lpr Line-printer subsystem
news News subsystem
uucp uucp subsystem
* All facilities
Priority levels
Use the following levels in the selector field. Messages of the specified level and all
levels above it are sent as directed.
emerg Specifies emergency messages. These messages are not distributed to all
users.
alert Specifies important messages such as serious hardware errors. These
messages are distributed to all users.
crit Specifies critical messages, not classified as errors, such as improper login
attempts. These messages are sent to the system console.
err Specifies messages that represent error conditions.
warning Specifies messages for abnormal, but recoverable conditions.
notice Specifies important informational messages.
info Specifies information messages that are useful in analyzing the system.
debug Specifies debugging messages. If you are interested in all messages of a
certain facility, use this level.
none Excludes the selected facility.
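The selector rules above (a message is sent when its level is the selected level or more severe, and facility.none excludes a facility) can be modeled in a few lines of shell. This is a simplified sketch for reasoning about a selector, not how syslogd itself is implemented: it handles only facility.level parts separated by semicolons, lets a later part override an earlier one for the same facility, and does not support comma-separated facility lists. The function names level_rank and selector_matches are hypothetical.

```sh
# level_rank LEVEL: numeric syslog severity, 0 (emerg) to 7 (debug); -1 for "none" or unknown.
level_rank() {
    case $1 in
        emerg) echo 0 ;; alert) echo 1 ;; crit) echo 2 ;; err) echo 3 ;;
        warning) echo 4 ;; notice) echo 5 ;; info) echo 6 ;; debug) echo 7 ;;
        *) echo -1 ;;
    esac
}

# selector_matches SELECTOR FACILITY LEVEL
# Succeeds if a message from FACILITY at LEVEL would be selected by SELECTOR.
# Uses global variables because POSIX sh has no local variables.
selector_matches() {
    msg=$(level_rank "$3")
    result=1
    rest=$1
    while [ -n "$rest" ]; do
        part=${rest%%\;*}                    # first "facility.level" part
        case $rest in
            *\;*) rest=${rest#*\;} ;;        # keep the remaining parts
            *) rest= ;;
        esac
        fac=${part%%.*}
        lvl=${part#*.}
        if [ "$fac" = "*" ] || [ "$fac" = "$2" ]; then
            sel=$(level_rank "$lvl")
            # Selected when the message is at this level or more severe (lower number).
            if [ "$msg" -ge 0 ] && [ "$msg" -le "$sel" ]; then result=0; else result=1; fi
        fi
    done
    return $result
}
```

For example, selector_matches '*.debug;mail.none' daemon info succeeds, while the same selector rejects a mail message at any level because mail.none is evaluated last.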
Instructor notes:
Purpose — Provide some syslogd configuration examples.
Details — Explain using the information in the student notes.
Additional information — Do not explain all facilities and levels. Just explain the
examples.
Transition statement — Let’s explain how to redirect syslogd messages to the error log.
/etc/syslog.conf:
# errpt
Notes:
Instructor notes:
Purpose — Explain how to redirect syslog messages to the AIX error log.
Details — Explain using the information in the student notes.
Additional information — None
Transition statement — What about the other way round?
errnotify:
en_name = "syslog1"
en_persistenceflg = 1
en_method = "logger Error Log: `errpt -l $1 | grep -v TIMESTAMP`"
errnotify:
en_name = "syslog1"
en_persistenceflg = 1
en_method = "logger Error Log: $(errpt -l $1 | grep -v TIMESTAMP)"
errnotify:
en_name = "syslog1"
en_persistenceflg = 1
en_method = "errpt -l $1 | tail -1 | logger -t errpt -p daemon.notice"
Notes:
Command substitution
To pass the errpt output to the logger command, you need to use command substitution
(or a pipe). The first two examples on the visual illustrate the two ways to do command
substitution in a Korn shell environment:
- Using the `UNIX command` syntax (with backquotes), shown in the first example on
the visual
- Using the newer $(UNIX command) syntax, shown in the second example on the
visual
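The two substitution forms and the pipe form can be compared without an error log. The sketch below uses a made-up one-entry summary report in place of errpt -l output (the entry values are illustrative). All three techniques yield the same line, which is the text that would reach logger. It uses tail -n 1, the POSIX spelling of the tail -1 shown on the visual.

```sh
# Illustrative stand-in for "errpt -l <sequence>" output: a header line plus one entry.
sample='IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
F7DDA124   0903113009 P H LVDD           PHYSICAL VOLUME DECLARED MISSING'

# 1. Command substitution with backquotes (older syntax).
via_backquotes=`printf '%s\n' "$sample" | grep -v TIMESTAMP`

# 2. Command substitution with $( ) (newer syntax; nests without extra escaping).
via_dollar=$(printf '%s\n' "$sample" | grep -v TIMESTAMP)

# 3. Pipe form: keep only the last line instead of filtering out the header.
via_tail=$(printf '%s\n' "$sample" | tail -n 1)

echo "Error Log: $via_dollar"    # the text logger would receive in examples 1 and 2
```

Filtering on TIMESTAMP and taking the last line are equivalent here only because errpt -l reports a single entry under one header line.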
Instructor notes:
Purpose — Provide information on how to direct error log entries to the syslogd.
Details — Explain using the information in the student notes.
Point out that the visual just shows three ways to accomplish the same thing. The first two
examples use two different formats to invoke command substitution, which places the
report text on the logger command line before the logger command runs. The last example
feeds the report text through a pipe to the logger command.
Additional information — None
Transition statement — It is time for a checkpoint.
• System hangs:
– High priority process
– Other
• Actions:
– Log error in the Error log
– Display a warning message on the console
– Launch recovery login on a console
– Launch a command
– Automatically REBOOT system
Notes:
Actions
If lower priority processes are not being scheduled, shdaemon will perform the specified
action. Each action can be individually enabled and has its own configurable priority
and time-out values. There are five actions available:
- Log error in the Error log.
- Display a warning message on a console.
- Launch a recovery login on a console.
- Launch a command.
- Automatically REBOOT the system.
Configuring shdaemon
# shconf -E -l prio
sh_pp disable Enable Process Priority Problem
Notes:
Introduction
shdaemon configuration information is stored as attributes in the SWservAt ODM object
class. Configuration changes take effect immediately and survive across reboots.
Use shconf (or smit shd) to configure or display the current configuration of shdaemon.
The values shown in the visual are the default values.
Enabling shdaemon
At least two parameters must be modified to enable shdaemon:
- Enable priority monitoring (sh_pp)
- Enable one or more actions (pp_errlog, pp_warning, and so forth)
Action attributes
Each action has its own attributes, which set the priority and timeout thresholds and
define the action to be taken. Timeout values are specified in minutes.
Example
By changing the shconf attributes, we can enable, disable, and modify the behavior of
the facility. In the following example, shdaemon is enabled to monitor process priority
(sh_pp=enable), and the following actions are enabled:
- Enable process priority monitoring:
# shconf -l prio -a sh_pp=enable
- Log an error in the error log:
# shconf -l prio -a pp_errlog=enable
Every two minutes (pp_eto=2), shdaemon will check to see if any process has been
run with a process priority number greater than 60 (pp_eprio=60). If not, shdaemon
logs an error to the error log.
- Display a warning message on a console:
# shconf -l prio -a pp_warning=enable (default value)
Every two minutes (pp_wto=2), shdaemon will check to see if any process has been
run with a process priority number greater than 60 (pp_wprio=60). If not, shdaemon
sends a warning message to the console specified by pp_wterm.
- Launch a command:
# shconf -l prio -a pp_cmd=enable -a pp_cto=5
Every five minutes (pp_cto=5), shdaemon will check to see if any process has been run with
a process priority number greater than 60 (pp_cprio=60). If not, shdaemon runs the
command specified by pp_cpath (in this case, /home/unhang).
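Because shdaemon takes no action until both sh_pp and at least one action are enabled, it can help to check a configuration listing for that condition. The sketch below assumes the three-column layout of shconf -E -l prio output seen on the visual (attribute, value, description) and treats any attribute other than sh_pp whose value is enable as an enabled action. The function name shd_ready and the sample lines are hypothetical.

```sh
# shd_ready
# Reads "shconf -E -l prio"-style lines (attribute value description) on stdin.
# Succeeds only if sh_pp is enabled and at least one other attribute is "enable".
shd_ready() {
    awk '$2 == "enable" { if ($1 == "sh_pp") pp = 1; else actions++ }
         END { exit !(pp && actions > 0) }'
}
```

On AIX, shconf -E -l prio | shd_ready && echo "priority monitoring is active" would report whether the minimum configuration is in place.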
Instructor notes:
Purpose — Describe how shdaemon is configured.
Details —
Additional information — shdaemon also supports lost I/O detection.
Transition statement — Let’s look at the Resource Monitoring and Control subsystem.
Notes:
Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —
Notes:
Set up
The following steps are provided to assist you in setting up an efficient monitoring
system:
1. Review the predefined conditions that interest you. Use them as they are,
customize them to fit your configuration, or use them as templates to create
your own.
2. Review the predefined responses. Customize them to suit your environment and
your working schedule. For example, the response “Critical notifications” is
predefined with three actions:
a) Log events to /tmp/criticalEvents.
b) E-mail to root.
c) Broadcast a message to all logged-in users whenever an event or a
rearm event occurs.
You may modify the response, for example, to log events to a different file whenever
events occur, send e-mail to you during non-working hours, and add a new
action to page you only during working hours. With such a setup, different
notification mechanisms can be automatically switched, based on your working
schedule.
3. Reuse the responses for multiple conditions. For example, you can customize the three
severity responses, “Critical notifications,” “Warning notifications,” and
“Informational notifications” to take actions in response to events of different
severities, and associate the responses with the conditions of respective severities.
With only three notification responses, you can be notified of all the events with
respective notification mechanisms based on their urgencies.
4. Once the monitoring is set up, your system continues to be monitored whether
your Web-based System Manager session is running or not. To check the system
status, you may bring up a Web-based System Manager session and view the
Events plug-in, or simply use the lsaudrec command from the command line
interface to view the audit log.
More information
A very good Redbook describing this topic is:
A Practical Guide for Resource Monitoring and Control (SG24-6615). This redbook can be
found at http://www.redbooks.ibm.com/redbooks/pdfs/sg246615.pdf.
Instructor notes:
Purpose — Describe RMC.
Details —
The RMC subsystem is installed by default and is delivered in one bundle named rsct.core
containing nine different filesets. All executables and related items are installed into the
/usr/sbin/rsct directory.
Additional information — Because of the number of options available in this subsystem,
RMC is configured through the Web-based System Manager (or the RSCT command-line
interface) rather than through SMIT. You cannot use SMIT to interact with RMC.
Beginning with AIX 5L, a new Resource Monitoring and Control (RMC) subsystem was
created that is comparable in function to the Reliable Scalable Cluster Technology (RSCT)
on the IBM SP type of machines. This subsystem allows you to monitor system resources,
taking a specified action when a specified condition occurs. An example condition might be
the /tmp file system becoming 90% full. An example action might be to send e-mail to a
system administrator.
By default, the RMC subsystem is automatically started by an entry in /etc/inittab as
shown in the visual. This subsystem can be controlled using the SRC commands. It also
has its own control command (/usr/sbin/rsct/bin/rmcctrl), which is the preferred way
to stop and start it.
To control the definition and state of the RMC daemons, use the rmcctrl command.
Following are some of the more common flags used:
-k Stops the RMC subsystem
-d Deletes the RMC subsystem from the SRC and the inittab
-a Adds the RMC subsystem to the SRC and to the inittab
-s Starts the RMC subsystem
-A A convenient combination of -a and -s
Configuration and management of the RMC subsystem is accomplished through the
Web-based System Manager. A SMIT interface is not available.
Transition statement — Let's take a look at the Condition Properties screen, General tab.
RMC conditions property screen:
General tab
IBM Power Systems
Notes:
Conditions
A condition monitors a specific property, such as total percentage used, in a specific
resource class, such as JFS.
Each condition contains an event expression that defines an event and an optional rearm
expression that defines a rearm event.
Instructor notes:
Purpose — Describe RMC condition properties.
Details — The event expression combines the monitored property, a comparison operator,
and a threshold value; for example, PercentTotUsed > 90 in the case of a file system.
The rearm expression is a similar entity, for example PercentTotUsed < 85.
To access this properties panel, launch the Web-based System Manager. Select the
Monitoring icon and then select Conditions.
Additional information — When a monitored value matches the event expression (for example,
a file system exceeds its usage threshold), the event expression is disabled. This prevents a
flood of actions, so that only a single action (such as one e-mail) is taken. The event
expression stays disabled until the value matches the rearm expression. This is often the
result of the administrator taking action to fix the potential problem (such as deleting
unnecessary files).
The rearm expression can also trigger an action. When it does, the rearm expression is
disabled and is not re-enabled until the event expression is matched again.
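The arm and rearm behavior is a form of hysteresis. The Python sketch below models it for a single resource, assuming the example thresholds PercentTotUsed > 90 (event) and PercentTotUsed < 85 (rearm). It illustrates the logic only; it is not RMC code, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ConditionModel:
    """Hypothetical model of an RMC condition with an event and a rearm expression."""
    event: Callable[[float], bool]   # for example PercentTotUsed > 90
    rearm: Callable[[float], bool]   # for example PercentTotUsed < 85
    armed: bool = True               # True while the event expression is being watched
    fired: List[str] = field(default_factory=list)

    def observe(self, value: float) -> None:
        if self.armed and self.event(value):
            # Event matched: act once, then stop watching the event expression.
            self.fired.append(f"event at {value}")
            self.armed = False
        elif not self.armed and self.rearm(value):
            # Rearm matched: optionally act, then watch the event expression again.
            self.fired.append(f"rearm at {value}")
            self.armed = True


tmp_fs = ConditionModel(event=lambda v: v > 90, rearm=lambda v: v < 85)
for pct in [80, 91, 95, 97, 88, 84, 92]:
    tmp_fs.observe(pct)
print(tmp_fs.fired)
```

Although usage stays above 90% for three samples, only one event action is taken; a new event is possible only after usage drops below 85%.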
Transition statement — Let’s take a look at the Condition Properties screen, Monitored
Resources tab.
RMC conditions property screen:
Monitored Resources tab
Figure 3-23. RMC conditions property screen: Monitored Resources tab AN151.0
Notes:
Monitoring condition
You can monitor the condition for one or more resources within the resource class,
such as /tmp, /tmp and /var, or all of the file systems.
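A minimal sketch of how the resource selection limits which resources can trigger the condition. The usage figures and the function name are hypothetical; the event expression is assumed to be PercentTotUsed > 90.

```python
# Hypothetical snapshot of PercentTotUsed per file system.
usage = {"/": 45.0, "/tmp": 93.5, "/var": 91.2, "/home": 97.0}


def triggered(usage, monitored, threshold=90.0):
    """Return the monitored resources whose usage satisfies PercentTotUsed > threshold.

    monitored=None stands for "all file systems"; otherwise only the named ones are checked.
    """
    names = usage.keys() if monitored is None else monitored
    return sorted(n for n in names if n in usage and usage[n] > threshold)


print(triggered(usage, ["/tmp"]))          # only /tmp is monitored
print(triggered(usage, ["/tmp", "/var"]))  # /tmp and /var
print(triggered(usage, None))              # all file systems
```

/home is over the threshold too, but it raises an event only when it is among the monitored resources.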
Instructor notes:
Purpose — Describe RMC monitored resources screen.
Details —
Additional information — The lscondition command may also be used to list the
existing conditions.
Transition statement — Let’s take a look at the Action Properties screen, General tab.
RMC actions property screen:
General tab
Notes:
Defining an action
To define an action, you can choose one of the following predefined commands:
- Send mail
- Log an entry to a file
- Broadcast a message
- Send an SNMP trap
You can also specify an arbitrary program or script of your own by using the Run program
option.
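A program started through the Run program option can read details of the event from its environment. The sketch below assumes RMC sets the variables ERRM_COND_NAME, ERRM_RSRC_NAME, ERRM_TYPE, and ERRM_VALUE (check the RSCT documentation for the exact set at your level); the function name and message format are hypothetical.

```python
import os
from typing import Mapping


def format_event(env: Mapping[str, str]) -> str:
    """Build a one-line message from ERRM_* variables (assumed to be set by RMC)."""
    cond = env.get("ERRM_COND_NAME", "<unknown condition>")
    rsrc = env.get("ERRM_RSRC_NAME", "<unknown resource>")
    etype = env.get("ERRM_TYPE", "Event")
    value = env.get("ERRM_VALUE", "?")
    return f"{etype}: condition '{cond}' on resource '{rsrc}' (value {value})"


if __name__ == "__main__":
    # When started as an action, os.environ is expected to carry the ERRM_* variables.
    print(format_event(os.environ))
```

Passing the environment in as a mapping keeps the formatting logic testable without a real RMC event.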
Instructor notes:
Purpose — Describe RMC Action Properties, General tab screen.
Details —
Additional information —
Transition statement — Let’s take a look at the Action Properties, When in Effect screen.
RMC actions property screen:
When in Effect tab
Figure 3-25. RMC actions property screen: When in Effect tab AN151.0
Notes:
Instructor notes:
Purpose — Describe RMC Action Properties, When in Effect tab screen.
Details —
Additional information — Mention that because the logevent script uses the alog
command to write events to the designated file, the contents of that file can be listed with
the alog command (for example, alog -f <file> -o).
Transition statement — Let’s look at the management of daemons which the RMC facility
depends upon.
RMC management
Notes:
Checkpoint
Notes:
Checkpoint solutions
1. Which command generates error reports? Which flag of this command is used to
generate a detailed error report?
errpt
errpt -a
2. Which type of disk error indicates bad blocks?
DISK_ERR4
3. What do the following commands do?
errclear: Clears entries from the error log.
errlogger: Is used by root to add entries to the error log.
4. What does the following line in /etc/syslog.conf indicate?
*.debug errlog
All syslogd entries are directed to the error log.
5. What does the descriptor en_method in errnotify indicate?
It specifies a program or command to be run when an error
matching the selection criteria is logged.
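An errnotify object is normally added to the ODM with odmadd from a stanza file. The Python sketch below only generates such a stanza as text; the object name, error label, and script path are hypothetical examples. In en_method, $1 is replaced by the sequence number of the error log entry.

```python
def errnotify_stanza(name, method, label=None, persistent=True):
    """Return an errnotify stanza suitable for odmadd (sketch, not validated against ODM)."""
    lines = ["errnotify:",
             f'\ten_name = "{name}"',
             f"\ten_persistenceflg = {1 if persistent else 0}"]
    if label:
        # Restrict notification to one error label, for example DISK_ERR4 (bad blocks).
        lines.append(f'\ten_label = "{label}"')
    lines.append(f'\ten_method = "{method}"')
    return "\n".join(lines) + "\n"


print(errnotify_stanza("disk_bb", "/usr/local/bin/notify_admin $1", label="DISK_ERR4"))
```

Saving the output to a file and running odmadd with that file name would register the method.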
Unit summary
Notes:
• Use the errpt (smit errpt) command to generate error reports.
• Different error notification methods are available.
• Use smit errdemon and smit errclear to maintain the error log.
• Some components use syslogd for error logging.
• The syslogd configuration file is /etc/syslog.conf.
• You can redirect syslogd and error log messages.
• You can monitor resource conditions and take automated action, such as sending mail
to root.
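The checkpoint rule *.debug errlog redirects everything because a selector names a facility and a minimum priority, and debug is the lowest priority. The sketch below parses one selector/action line under that model, using the standard syslog priority order; it does not handle multiple selectors separated by ";" or AIX-specific options such as log rotation.

```python
# Standard syslog priorities, from most to least severe.
PRIORITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_rule(line):
    """Split 'facility.priority destination' into its parts (single selector only)."""
    selector, destination = line.split(None, 1)
    facility, priority = selector.split(".", 1)
    return facility, priority, destination.strip()


def matches(rule_priority, msg_priority):
    """A rule at priority P selects messages of priority P or anything more severe."""
    return PRIORITIES.index(msg_priority) <= PRIORITIES.index(rule_priority)


facility, level, dest = parse_rule("*.debug errlog")
print(facility, level, dest)
print(all(matches(level, p) for p in PRIORITIES))  # debug selects every priority
```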
References
SC23-6616 AIX Version 6.1 Installation and migration
SG24-7296 NIM from A to Z in AIX 5L (Redbook)
http://www.redbooks.ibm.com IBM Redbooks
© Copyright IBM Corp. 2009 Unit 4. Network Installation Manager basics 4-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide
Unit objectives
Notes:
NIM overview
Notes:
Purpose of NIM
NIM provides centralized AIX software administration for multiple machines over the
network. NIM supports full AIX operating system installation as well as installing or
updating individual packages and performing software maintenance.
Advantages
NIM provides several advantages:
- Provides one central point for AIX software administration for all the NIM clients
- Eliminates the need to carry a CD-ROM or tape to each system and the need for a tape
drive or CD-ROM drive at every system
- Installations can be initiated from the master machine (push) or from the client (pull)
Methods for administering NIM:
- Command line: The command line gives you complete control, but the number of
options needed can be somewhat daunting. Still, if you want to script NIM operations,
you must use the command line. The basic NIM commands are:
• nimconfig: Configure the NIM master.
• nim: Perform NIM operations from the master.
• nimclient: Perform NIM operations from a client.
• niminit: Configure a NIM client.
• lsnim: List information about NIM objects.
- SMIT: There are two main paths into SMIT's NIM interface:
• smit nim: Configure master and client machines and perform all NIM operations.
• smit eznim: Provides a simplified environment to configure machines and perform
some basic NIM operations. This may be a good starting point for a new NIM system
administrator.
- Web-based System Manager (wsm): You can also use IBM's Web-based System Manager
to configure and manage your NIM environment.
- As you become familiar with the NIM environment, you may find that you use a
combination of methods. For example, you may use the command line to list NIM
status and perform simple NIM operations, while using SMIT or WebSM for more
complex operations or for operations that you do not perform frequently.
Instructor notes:
Purpose — Provide an overview of NIM function.
Details —
Additional information —
Transition statement — Let’s take a closer look at the three roles illustrated in the
overview.
Machine roles
• Master
– File sets:
• bos.sysmgt.nim.master
• bos.sysmgt.nim.client
• Stores NIM database
– NIM administration
– Can initiate push installations to NIM clients
– AIX version >= all other NIM machines
• Client
– File sets:
• bos.sysmgt.nim.client
– Can initiate pull installations from a server
• Server
– Any machine, master or client
– Serves NIM resources to clients, thus requires adequate disk space and
throughput
Notes:
There are three basic roles that a machine can assume in the NIM environment: master,
client, and resource server. There can be only one master machine in a NIM environment;
all other machines are clients. Any machine, master or client, can be a resource server.
NIM software
All machines in the NIM environment must install bos.sysmgt.nim.client. The master
machine must also install bos.sysmgt.nim.master and bos.sysmgt.nim.spot.
Master
The NIM master manages all other machines that participate in the NIM environment. The
NIM database is stored on the NIM master. The NIM master is fundamental for all
of the operations in the NIM environment and must be set up and operational before
performing any NIM operations. The master can initiate a software installation to a
client, which is called a push installation.
Also, the NIM master is the only machine that is given the permissions and ability to
execute NIM operations on other machines within the NIM environment. NIM uses the rsh
command to execute commands remotely on clients, which allows the NIM master to install
to a number of clients with one NIM operation. With AIX 5.3 or AIX 6.1, the NIM service
handler (nimsh) can be used as an alternative to rsh.
Client
All other machines in a NIM environment are clients. Clients can request a software
installation from a server machine (pull installation).
Server
Any machine, the master or a client, can be configured by the master as a server for a
particular software resource. Most often, the master is also the server. However, if your
environment has many nodes or consists of a complex network environment, you may
want to configure some nodes to act as servers to improve installation performance.
Servers must have adequate disk space for the resources they will be providing. They
also need network connections to the client machines they serve and sufficient
bandwidth to respond to the expected volume.
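The machine roles visual requires the master to run an AIX level equal to or higher than every other NIM machine. A minimal sketch of that check, assuming levels are written as dotted V.R.M.F strings such as 6.1.2.0; the machine names and levels are hypothetical.

```python
def level_tuple(level):
    """Convert a dotted level such as '6.1.2.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in level.split("."))


def clients_above_master(master_level, client_levels):
    """Return the clients whose AIX level is higher than the master's (not allowed)."""
    master = level_tuple(master_level)
    return sorted(name for name, lvl in client_levels.items()
                  if level_tuple(lvl) > master)


clients = {"lpar1": "5.3.9.0", "lpar2": "6.1.2.0", "lpar3": "6.1.10.0"}
print(clients_above_master("6.1.2.0", clients))
```

The levels are compared numerically because a plain string comparison would wrongly rank "6.1.10.0" below "6.1.2.0".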
Figure 4-4. Boot process for AIX installation (tape or CD) AN151.0
Notes:
To understand how NIM works, we need to understand what happens when we install
AIX on a system. We start by reviewing what happens when we boot from CD or tape to
install AIX.
Power on
A Power machine must be booted or reset in order to install the AIX Base Operating
System (BOS).
Configuring devices
In order to keep the boot image small, not all of the software needed to configure
devices is included in the boot image. These additional files are contained in a small
usr directory tree called a Shared Product Object Tree or SPOT. The boot script mounts
this usr directory tree on /SPOT in the memory file system. The SPOT is mounted
directly from the CDROM.
Note: Since tape devices do not support file system operations, the SPOT files are
included in the boot image in the case of booting from a tape drive.
Install script
Once the devices have been configured, rc.boot invokes the BOS installation program
(bi_main), and installs AIX from the installation images on the tape or CD.
Instructor notes:
Purpose — Review the flow and components of an AIX installation from tape or optical
media.
Details —
Additional information —
Transition statement — Next, let's look at how a network installation is handled. It has
many similarities to a regular installation, along with some significant differences.
[Figure: Network boot process. NIM server: bootpd, /etc/bootptab. Client: power on machine; send bootp request through en0; receive boot file name; transfer boot file with tftp; transfer control to mini-runtime environment; invoke boot script; configure network devices for installation.]
Notes:
Booting over the network, using NIM, is essentially the same as booting from CD or
tape, except that the boot file (SPOT file) and installation images come from the server
system over the network.
NIM objects
• Object classes
– Networks
– Machines
– Resources
• Group objects
– mac_group
– res_group
Notes:
NIM is made up of various components, called objects. There are three classes of
objects: machines, networks, and resources.
All information about the NIM environment is stored in Object Data Manager (ODM)
databases on the NIM master system.
Network objects
Network objects are objects in the NIM database that represent information about each
Local Area Network (LAN) that is part of the NIM environment. These objects and some
of their attributes reflect the physical characteristics of the network. NIM network objects
are not used to perform management tasks in the overall network environment; they are
only used to represent the physical network topology of the NIM environment. In other
words, if something changes in the physical network environment, you must remember
to make the change in the NIM database as well.
There are five types of networks supported by NIM: Token-Ring, Ethernet, ATM, FDDI,
and generic. These network types are represented as network objects in the NIM
environment.
Machine objects
Machines in the NIM environment are simply the machines that will be managed by
NIM.
Resource objects
All operations on clients in the NIM environment require one or more NIM resources.
NIM resource objects represent the files, directories, and devices that are used in order
to support each type of NIM operation. Some resources are AIX filesets (or devices
which contain filesets) that can be installed on a client machine. Other resources are
scripts or configuration files that are used in the installation process.
The location and other attributes for these resources are stored as resource objects in
the NIM database.
Group objects
NIM supports two types of group objects:
- mac_group
A machine group is a group of machine objects. You can use a machine group to
simplify performing a NIM operation on multiple machines.
- res_group
A resource group is a group of resource objects. If you have a set of resources that
you typically want to use at the same time, you can create a resource group to
simplify allocating those resources.
Instructor notes:
Purpose — Describe the NIM objects.
Details —
Additional information —
Transition statement — It is useful to be able to list the existing defined objects and their
attributes. Let’s look at the lsnim command that provides this information. Then we will
explain the meaning and use of the displayed attributes for each type of object. Later, we
will cover how to create these objects.
# lsnim -l ent0
ent0:
class = networks
type = ent
Nstate = ready for use
prev_state = information is missing from this object's definition
net_addr = 10.31.192.0
snm = 255.255.240.0
routing1 = default 10.31.192.1
Notes:
The lsnim command is used to list various types of NIM information. You have the
opportunity to experiment with lsnim in the exercise.
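When scripting around NIM, it can help to turn lsnim -l output into a dictionary. The sketch below parses the ent0 listing shown in the visual; it assumes one "attribute = value" pair per line and unique attribute names (a repeated name would overwrite the earlier value). The function name is hypothetical.

```python
def parse_lsnim_long(text):
    """Parse 'lsnim -l <object>' output into (object_name, {attribute: value})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    name = lines[0].strip().rstrip(":")
    attrs = {}
    for ln in lines[1:]:
        key, _, value = ln.partition("=")  # split on the first '=' only
        attrs[key.strip()] = value.strip()
    return name, attrs


sample = """ent0:
   class      = networks
   type       = ent
   Nstate     = ready for use
   prev_state = information is missing from this object's definition
   net_addr   = 10.31.192.0
   snm        = 255.255.240.0
   routing1   = default 10.31.192.1
"""
name, attrs = parse_lsnim_long(sample)
print(name, attrs["class"], attrs["net_addr"], attrs["routing1"])
```

On a live master, the text would come from running lsnim -l <object> instead of the sample string.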
Instructor notes:
Purpose — Explain how to use the lsnim command to display objects and their attributes.
Details — Keep the focus on these uses of lsnim. The listing of all NIM objects and the
listing of attributes for a particular object are the two most common uses of lsnim. The other
lsnim options are better left to the NIM course.
Additional information —
Transition statement — We will now discuss the various NIM objects in the context of
configuring NIM. Let’s start with a summary of the basic NIM configuration procedure.
NIM configuration
• Configure master
– Install master NIM file sets.
– Run nimconfig.
• Define resources
– Create real resource with full path.
– Create a resource object to represent it.
• Define networks
– How clients on each network access the master
• Define clients
– Able to relate network address of the client with object name
• Allocate resources to clients
– Different operations need different resources.
• NIM operations on clients
– Setting up for operation
– Initiating operation
Notes:
Installing NIM
The NIM filesets that need to be installed on a machine designated to act as NIM
master are:
- bos.sysmgt.nim.client
- bos.sysmgt.nim.master
- bos.sysmgt.nim.spot
Configure master
Configuring the master machine consists of installing the master filesets and running
nimconfig. You must specify the primary network interface and a NIM network name
for the network which is attached to the primary interface. There are several optional
attributes which can be specified.
nimconfig creates the NIM database and the /etc/niminfo configuration file. It also
starts the NIM daemon (nimesis) and creates an entry in /etc/inittab so that
nimesis is started on every boot of the master machine.
Allocate resources
Once the resource and machine objects are defined, you need to decide what operation
you want to perform on your client machine. Each operation requires a different set of
resources.
Next you need to allocate the resource to your client. This identifies which resource
object will be used to implement the client operation. There are two ways in which this is
done:
- Use the nim -o allocate operation (or equivalent SMIT dialog) to relate the resource
to the machine.
- Use a SMIT dialog which prompts for the resources to allocate as part of the
machine operation definition.
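The nim -o allocate form can be sketched as a small command builder. It only assembles the argument list (it does not run nim), and it assumes each resource is passed as -a <resource_type>=<object_name> with the client name last; the object names spot61, lpp61, and lpar1 are hypothetical.

```python
import shlex


def allocate_cmd(client, **resources):
    """Build a 'nim -o allocate' command line; keyword = resource type, value = object name."""
    if not resources:
        raise ValueError("at least one resource must be allocated")
    cmd = ["nim", "-o", "allocate"]
    for rtype, obj in resources.items():
        cmd += ["-a", f"{rtype}={obj}"]
    cmd.append(client)  # the target machine object is the last argument
    return cmd


print(shlex.join(allocate_cmd("lpar1", spot="spot61", lpp_source="lpp61")))
```

Returning a list rather than a string lets the caller pass it straight to subprocess.run without shell quoting issues.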
resources objects
•Object types
– boot Represents the network boot image resource
– nim_script Directory for customization scripts created by NIM
– spot Shared Product Object Tree - equivalent to /usr
filesystem
– lpp_source Source device for software product images
– bosinst_data Config file used during base system installation
– image_data Config file used during base system installation
– mksysb A mksysb image
– script A user created script which is executed on a client
to perform customization
– resolv_conf Configuration file for name-server information
– . . . (additional resource types)
• Attributes
– location Directory path
– server Machine which serves this resource
– Rstate,
prev_state Status attributes
– . . . (additional attributes)
Notes:
Resources are the files and directories that NIM uses to install software on the clients.
Resource types
Resource types identify the different types of files used by NIM. For example:
- An lpp_source resource is a directory containing product images to be installed.
- A spot resource contains the files used during the boot operation.
- A script resource is a user definable script which can be used to perform
customization on a newly installed client.
- A mksysb resource is a backup image that can be used to install a client.
Instructor notes:
Purpose — Cover resources objects and their attributes.
Details — Note the variety of resources and that the attributes basically map between the
resource name and the location of the file or directory that contains that resource. Be
careful not to pre-teach the details on resources covered on later visuals, such as
lpp_source, spot, or mksysb. These are covered after the discussion of operations, so they
can be discussed in the context of those operations (in particular, the bos_inst operation).
Additional information —
Transition statement — Let’s take a closer look at the resource types that we will need to
define to support a NIM installation of an AIX operating system, starting with the
lpp_source.
• lpp_source
– Directory containing software product images
– Supports NIM install operations (bos_inst and cust)
– Also used for creation of spot resource
• Defining an lpp_source:
• # smit nim_mkres
[Figure: lpp_source directories aix61-00-00 and aix61-01-00, each containing bos filesets]
Notes:
lpp_source
When a resource of this type is defined, it represents a directory in which software
product images are stored. lpp_source resources are used to support NIM install
operations. An lpp_source can also be used as the source for the creation of a SPOT.
When you perform a NIM install operation and have allocated an lpp_source resource
to the client, NIM NFS-mounts the lpp_source directory on the client, and then invokes
the installp command on the client to install from the directory. When installp
finishes, NIM automatically unmounts the resource.
simages attribute
This attribute is used to indicate that an lpp_source resource contains the set of
installable images to which NIM requires access to perform its basic functionality. This
lpp_source directory and run nim -o check to update the lpp_source attributes.
Previously, SMIT allowed you to add packages to an lpp_source through the smit
nim_bffcreate fast path. However, this SMIT function does not check to see if the
lpp_source is allocated or locked, nor does it update the simages attribute when
finished. The update operation has been created to address this situation.
Instructor notes:
Purpose — Cover the definition of the lpp_source.
Details —
Additional information —
Transition statement — Once we have an lpp_source, we next need to use the
lpp_source to generate a matching SPOT. Let’s look at how that is done.
• spot
– /usr directory tree used during network boot
– Matching network boot images generated:
- /tftpboot/<spot_name>.<Platform>.<Kernel>.<Network>
• Defining a SPOT
# nim -o define -t spot \
-a server=<machine> \
-a location=<directory> \
-a source=<lpp_source_name> \
[ optional attributes ] \
<spot_name>
• # smit nim_mkres
[Figure: SPOTs spot61-00-00 and spot61-01-00, each a usr tree, created from an lpp_source]
Notes:
SPOT
• Components
- A /usr file system
A Shared Product Object Tree (SPOT) is a directory containing AIX code that is
equivalent in content to the code that resides in a /usr file system on a system
running AIX. The NIM SPOT creation process restores files from AIX filesets into the
directory in which the SPOT resides.
The SPOT is NFS-mounted on a booting client to provide necessary device support
for the boot process.
- Boot image
As part of the creation of a SPOT resource, NIM also creates network boot images.
The network boot images are constructed in /tftpboot on the same machine in
which the SPOT is created. The boot images are constructed with code from the
newly created SPOT. The boot images are also sometimes called spot files. The client
obtains the boot image file name through BOOTP and then transfers the file with TFTP.
Since one SPOT can potentially support several types of machines, several boot
image files may be created. The naming convention identifies each boot image as:
<spot_name>.<Platform>.<Kernel>.<Network>, where:
• <Platform> identifies which architecture this boot image supports: chrp, rspc,
and so forth.
• <Kernel> specifies whether this boot image contains a multi-processor (mp) or
uni-processor (up) kernel.
• <Network> identifies the network type: ent, tok, and so forth.
These days, the only combination most of us work with is: chrp.mp.ent.
During a network boot, the boot image is transferred over the network and loaded
into the client’s memory.
- /tftpboot
It is good practice to make /tftpboot a separate file system. This removes the
risk of filling the root file system. If you are supporting multiple AIX versions on
multiple machine types or multiple network types, this directory can get quite large.
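The boot image naming convention can be expressed as a pair of helper functions. This is a sketch with hypothetical function names; it assumes the defaults chrp, mp, and ent from the common combination mentioned above, and it splits from the right so that SPOT names containing dots still parse.

```python
from pathlib import PurePosixPath


def boot_image_path(spot, platform="chrp", kernel="mp", network="ent"):
    """Return /tftpboot/<spot_name>.<Platform>.<Kernel>.<Network>."""
    if kernel not in ("mp", "up"):
        raise ValueError("kernel must be 'mp' or 'up'")
    return str(PurePosixPath("/tftpboot") / f"{spot}.{platform}.{kernel}.{network}")


def split_boot_image(filename):
    """Split a boot image file name back into (spot, platform, kernel, network)."""
    spot, platform, kernel, network = filename.rsplit(".", 3)
    return spot, platform, kernel, network


print(boot_image_path("spot61-01-00"))
print(split_boot_image("spot61.chrp.mp.ent"))
```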
• Defining a SPOT resource
- Command line:
The visual shows the nim syntax to define a spot. The -t flag identifies the type of
object you wish to define. In addition, you must specify the following required attributes:
• server=<machine>
NIM name for the machine which serves this resource
• location=<directory>
Directory (on the server) where the SPOT files are located
• source=<lpp_source_name>
This attribute points to the location of the files used to create the SPOT
resource. This can be an existing lpp_source resource, a device name (for
example: /dev/cd0) or a directory which contains the source filesets used to
create the SPOT. Most commonly, the lpp_source resource is created first and
then the spot is created from the lpp_source.
• <spot_name>
The last argument on the nim command line is the name of the object you are
operating on, in this case, the name of the SPOT resource we are creating.
- Optional attributes
There can be a number of optional attributes, including:
• installp_flags=<flags>
NIM calls installp to create the SPOT. By default, NIM uses the -agX flags
when calling installp. You can use installp_flags to specify the options you
require.
• auto_expand={yes|no}
Indicates that file systems should be automatically expanded if additional space
is needed.
- Defining a SPOT using SMIT
The visual shows the SMIT fast path for defining resource objects. SMIT opens with
a window that allows you to select which type of resource you want to define. Once
you select a resource type, SMIT opens a window with the necessary fields to
specify the resources and attributes for that type of object, in this case, a SPOT.
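The command-line form of the SPOT definition can be assembled programmatically. This sketch only builds the argument list for nim -o define -t spot, with the three required attributes, the optional installp_flags and auto_expand attributes described above, and the object name last. The values master, /export/spot, lpp61, and spot61 are hypothetical examples.

```python
import shlex


def define_spot_cmd(name, server, location, source, installp_flags=None, auto_expand=None):
    """Build 'nim -o define -t spot' with the required attributes plus optional ones."""
    cmd = ["nim", "-o", "define", "-t", "spot",
           "-a", f"server={server}",
           "-a", f"location={location}",
           "-a", f"source={source}"]
    if installp_flags is not None:
        cmd += ["-a", f"installp_flags={installp_flags}"]
    if auto_expand is not None:
        cmd += ["-a", f"auto_expand={'yes' if auto_expand else 'no'}"]
    cmd.append(name)  # the object being defined is always the last argument
    return cmd


cmd = define_spot_cmd("spot61", "master", "/export/spot", "lpp61", auto_expand=True)
print(shlex.join(cmd))
```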
Instructor notes:
Purpose — Cover how to define a SPOT.
Details —
Additional information —
Transition statement — While we can use an lpp_source and matching SPOT to install a
new operating system, quite often the network installs are actually recoveries of mksysb
images. This is either to recover a lost rootvg or to clone an AIX image to other machines or
LPARs. Let’s see how we define a mksysb resource.
•mksysb
– Identifies a mksysb system backup image file
– Used for bos_inst operations
• Defining a mksysb
# nim -o define -t mksysb \
-a server=<machine> \
-a location=<mksysb_path> \
[ optional attributes ] \
<mksysb_name>
• # smit nim_mkres
Notes:
mksysb
A mksysb resource represents a system backup image file created using the mksysb
command. A mksysb resource can be used as the source of the BOS run-time files
when a bos_inst is performed.
- location=<mksysb_path>
If the system backup image already exists, enter the name of the file where the
image resides. If you are creating the system backup image as part of this operation,
enter the name of the file where you want the image placed after it is created.
There are a number of optional attributes, including:
- mk_image={yes|no}
If the backup file already exists, specify no (the default). If you want nim to create a
new backup file, specify yes.
- source=<machine_name>
If you want nim to create a backup image for you, specify the NIM name of the
machine you want to back up.
- mksysb_flags=<value>
You can use this attribute to specify optional flags for the mksysb command, if
needed.
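A similar builder for mksysb resources can also enforce the rule implied above: when NIM is asked to create the image (mk_image=yes), it needs to know which machine to back up. This is a sketch that only assembles arguments; the function name, object names, and paths are hypothetical.

```python
def define_mksysb_cmd(name, server, location, mk_image=False, source=None, mksysb_flags=None):
    """Build 'nim -o define -t mksysb'; creating a new image requires a source machine."""
    if mk_image and not source:
        raise ValueError("mk_image=yes requires source=<machine_name>")
    cmd = ["nim", "-o", "define", "-t", "mksysb",
           "-a", f"server={server}",
           "-a", f"location={location}"]
    if mk_image:
        cmd += ["-a", "mk_image=yes", "-a", f"source={source}"]
    if mksysb_flags:
        cmd += ["-a", f"mksysb_flags={mksysb_flags}"]
    cmd.append(name)
    return cmd


print(define_mksysb_cmd("lpar1_sysb", "master", "/export/mksysb/lpar1.sysb",
                        mk_image=True, source="lpar1"))
```

Without mk_image, the location is simply expected to name an existing backup file.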
networks objects
•Object types
– ent Ethernet network
– fddi FDDI network
– tok Token ring network
– atm ATM network (no network boot capability)
– generic Generic network (no network boot capability)
• Attributes
– net_addr Network address for a network
– snm Subnetmask for a network
– routing<X> Routing information for a network
– Nstate,
prev_state Status attributes
– . . . (Additional attributes)
Notes:
In order to perform certain NIM operations, the NIM master must be able to supply
information necessary to configure client network interfaces. The NIM master must also
be able to verify that client machines can access all the resources provided by the NIM
server. To avoid the overhead of repeatedly specifying network information for each
individual client, NIM network objects are used to represent the networks in a NIM
environment.
Network types
NIM supports the four network types shown in the visual, plus a generic type. Network
boot support is provided for Ethernet, Token-Ring, and FDDI. Network boot operations
are not supported on ATM or generic networks. NIM supports both standard Ethernet
and IEEE 802.3 Ethernet networks.
Routing
NIM routing information represents standard TCP/IP routing information for the
networks that are part of a NIM environment. This information defines the gateways that
are used to establish communication between the master machine and the clients.
The routing<X> attribute defines a route and includes:
- A destination (default or a NIM network name)
- A gateway address
If needed, multiple routes can be created and are numbered routing1, routing2, and
so forth.
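The numbered routing attributes can be collected into an ordered route list. This sketch assumes each routing<X> value starts with a destination (default or a NIM network name) followed by a gateway address, as in the routing1 value from the lsnim example; any further fields are ignored. The network name net_b and its gateway are hypothetical.

```python
def parse_routes(attrs):
    """Collect routing<X> attributes into sorted (index, destination, gateway) tuples."""
    routes = []
    for key, value in attrs.items():
        suffix = key[len("routing"):]
        if key.startswith("routing") and suffix.isdigit():
            fields = value.split()
            routes.append((int(suffix), fields[0], fields[1]))
    return sorted(routes)


attrs = {"net_addr": "10.31.192.0", "routing2": "net_b 10.31.200.254",
         "routing1": "default 10.31.192.1"}
print(parse_routes(attrs))
```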
Additional attributes
There are a number of other attributes for each network object. lsnim is probably the
easiest way to get information about NIM attributes.
Instructor notes:
Purpose — Cover networks objects and their attributes.
Details — Point out that we do not usually define a network object directly. Instead, the
information provided when defining a machine is used either to match an existing
network object or to create a new one. The most important point is that the networking
information is from the perspective of the machine being defined. In the network diagram
shown in the visual, when defining the client, the router interface on the client's network
(on the right) is the one that must be defined as the gateway. This network information
defines how the client, when it network boots, reaches the NIM server to send its bootp
request.
Additional information — Unlike other network adapters, ATM adapters cannot be used
to boot a machine. This means that installing a machine over an ATM network requires
special processing (refer to the AIX Installation Guide and Reference, Chapter 20. Basic
NIM Operations and Configuration for instructions). The generic network type is used to
represent all other network types where network boot support is not available. For clients
on generic networks, NIM operations that require a network boot, such as bos_inst and
diag, are not supported. However, non-booting operations, such as cust and maint, are
allowed.
Transition statement — Next, let’s look at the machines object.
machines objects
IBM Power Systems
• Object types
– master
– standalone
– diskless
– dataless
• Attributes
– platform: Architecture
– netboot_kernel: up or mp
– if<X>: Network interface information
– serves: Resource served by this machine
– Cstate, prev_state, Mstate: Status attributes
– . . . (additional attributes)
(Visual: Master, Standalone, Diskless, and Dataless machines)
Notes:
NIM supports four types of machines: the master type and three types of clients:
stand-alone, diskless, and dataless.
Master
The master machine is defined by installing the master fileset, and then performing
some quick configuration. There can only be one master in the NIM environment. Once
a machine is defined as the master, it can participate in NIM operations.
Stand-alone clients
Stand-alone clients have local disk resources. They are installed from the NIM server,
but once installed, they boot and operate from their local disks.
Diskless clients
Diskless clients have no disks of their own. They run entirely using resources from the
NIM server.
Dataless clients
Dataless machines can only use a local disk for paging space and the /tmp and /home
file systems. All of the other storage is provided over the network by the NIM server.
Machine attributes
Each machine object is one of the four machine types described above. Additionally,
machine objects store other attributes about the machine. The visual shows a few of
them:
- The platform attribute describes the machine architecture (chrp, rspc, and so
forth).
- netboot_kernel indicates which type of kernel is required, uni-processor (up) or
multi-processor (mp).
- if<X> is used to provide information about a machine’s network interfaces. If there
are multiple interfaces, they are numbered: if1, if2, and so forth. This attribute
includes the NIM network this interface connects to, the host name, the MAC
address, and the network type.
- The serves attribute identifies resources that are served by this machine. If the
machine serves several resources, there will be a serves attribute for each
resource.
- Cstate indicates the NIM operation that is currently being performed on a machine
or that no NIM operations are currently being performed.
- prev_state shows the previous Cstate.
- Mstate shows the execution state for a machine.
Note: NIM attempts to keep the value of this attribute synchronized with the
machine's execution state, but NIM does not guarantee its accuracy. Perform the
check operation on the machine for NIM to attempt to determine the machine's
execution state.
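An if<X> value is a single quoted string of space-delimited fields. The sketch below splits one into labeled fields. It assumes the field order used in the nim define command example later in this unit, if1="network1 lpar1 0 ent0": network, host name, MAC address, then an optional fourth field, which in that example is the adapter name. split_if is a hypothetical helper, not part of NIM.

```sh
# Hypothetical helper: split an if<X> attribute value into its fields.
# Field order assumed from the example if1="network1 lpar1 0 ent0".
# Usage: split_if "network1 lpar1 0 ent0"
split_if() {
    # Word-split the quoted value into the function's positional parameters.
    set -- $1
    echo "network=$1"
    echo "hostname=$2"
    echo "mac=$3"
    # The fourth field is optional; in the example it is the adapter name.
    if [ -n "$4" ]; then
        echo "adapter=$4"
    fi
}
```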
Additional attributes
There are a number of other attributes for each machine object. lsnim is probably the
easiest way to get information about NIM attributes.
Notes:
Follow these steps to add a client with the network information using SMIT:
• On the NIM master, add a standalone client to the NIM environment by using SMIT
(nim_mkmac is the fast path).
• Specify the host name of the client.
- This is the host name that resolves to the IP address of the client's install adapter.
By default, this also becomes the host name of the client when it is installed.
If you use DNS, enter the fully qualified host name here (lpar1.my.company.com).
• The next SMIT screen displayed depends on whether NIM already has information
about the client's network. Supply the values for the required fields or accept the
defaults. Use the help information and the LIST option to help you specify the correct
values to add the client machine.
The if1 quoted value in the example has multiple space-delimited fields, as follows:
- network is the network object name.
Instructor notes:
Purpose — Cover how to define standalone machines.
Details —
Additional information —
Transition statement — An easy way to define a machine is to use SMIT. The visual
shows the SMIT menu path to use, but let’s look at the resulting dialog panel.
Define a Machine
Notes:
NIM Machine Name/Host Name - There are two names given to your client: a NIM name
and a hostname. The NIM name is what is used when performing operations on this client.
The hostname becomes the system-wide hostname of this client and is also the name
associated with the client's adapter that NIM uses to do the client install. In our case, we
used a short name on the prior panel. Hence, the NIM name and hostname are identical. If
we had used a long name on the prior panel, then we would see the long name for the
hostname and the short name for the NIM Name. For example, if we put
lpar1.my.company.com on the prior panel, then the hostname would be
lpar1.my.company.com and the NIM name would be lpar1.
Machine Type - In practice, standalone is the only client machine type still commonly used.
Hardware Platform Type - You can choose between chrp, rspc, or the much older classic
rs6k. Because the chrp architecture was introduced in the mid-1990s, most systems in use
today are chrp. To double-check which architecture your client uses, run
getconf -a | grep MACHINE_ARCHITECTURE on the client. On older AIX release levels, use
the bootinfo -p command.
Kernel Type - If a client machine is running the 64-bit kernel, choose mp. If the client is
running the 32-bit kernel, either the up or the mp kernel may be chosen. To determine which
kernel the client is currently running, run the ls -l /usr/lib/boot/unix command and note
whether it is linked to the 64-bit, up, or mp kernel in that same directory. The getconf -a
command also shows whether the machine is capable of running an mp kernel: an MP_CAPABLE
value of 1 means yes. On older releases, run the bootinfo -z command instead; again, a
value of 1 means yes. Starting with AIX 6.1, only the 64-bit kernel is used.
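The kernel check above can be turned into a small mapping from the link target to a netboot_kernel value. This is a minimal sketch that assumes the kernel files in /usr/lib/boot are named unix_up, unix_mp, and unix_64; kernel_to_netboot is a hypothetical helper, and you supply the link target you read from ls -l /usr/lib/boot/unix.

```sh
# Hypothetical helper: map the target of the /usr/lib/boot/unix link
# to a NIM netboot_kernel value. Assumes the kernel file names
# unix_up, unix_mp and unix_64. Get the target with:
#   ls -l /usr/lib/boot/unix
# Usage: kernel_to_netboot /usr/lib/boot/unix_64
kernel_to_netboot() {
    case $(basename "$1") in
        unix_64) echo mp ;;   # 64-bit kernel: choose mp
        unix_mp) echo mp ;;   # 32-bit multiprocessor kernel
        unix_up) echo up ;;   # 32-bit uniprocessor kernel
        *) echo "unknown kernel: $1" >&2; return 1 ;;
    esac
}
```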
Communication Protocol - Either the less secure remote shell protocol (rsh) or the newer
NIM service handler protocol (nimsh), available in AIX 5L V5.3 and later, may be used.
Note: Each client can have a different setting.
Cable Type - Most configurations today set this to N/A (not applicable), because modern
adapters either autosense the connection type or support only a single type (such as
twisted pair or fiber). To double-check, run the lsattr -El entX command and note whether
a cable_type field is shown. If it is not, setting N/A should work. If the adapter uses
twisted-pair cable, setting tp should work.
Network Speed/Duplex - These settings are only used when performing a push boot
operation on the client. If not set, the current SMS speed/duplex settings for your install
adapter are used.
NIM Network - This is the NIM network to which the client is assigned.
Hardware Address - This is the MAC address of the client. It is only needed for BOOTP
broadcast operations. This MAC address, if ever needed, can be retrieved by looking at
your client's Remote IPL SMS menus.
Logical Device Name - This is the name of the physical network adapter (NIC) over which you
plan to install, for example ent0 or ent1. When the client is installed, this adapter is
assigned the host name entered in the Host Name field on this screen.
IPL ROM Emulation - This is only set for machines that do not support network boot.
Please see online documentation for details.
CPU_Id - This is the machine ID reported by the uname command on the client. NIM uses it
to uniquely identify this client in the future. You do not have to set it; NIM configures it.
Machine Group - You can assign a client to a machine group.
Command Line - The equivalent NIM command for the above operation is:
nim -o define -t standalone -a if1="network1 lpar1 0 ent0" \
-a cable_type1="N/A" -a connect=nimsh \
-a platform=chrp -a netboot_kernel=mp lpar1
For more information, use the lsnim -q define -t standalone command or see the nim
man page.
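When several clients must be defined, the command above can be wrapped in a function. This is a minimal sketch: define_client is a hypothetical helper, it hard-codes the same platform, kernel, cable type, and connect values as the example, and it passes 0 for the MAC address as the example does. Set DRY_RUN=1 to print the command instead of running it.

```sh
# Hypothetical wrapper around the nim define command shown above.
# Assumes chrp, mp, N/A cable type and nimsh, as in the example.
# Usage: define_client <nim_name> <nim_network> <adapter>
define_client() {
    name=$1 net=$2 adapter=$3
    # Build the argument list; the if1 value stays one argument.
    set -- nim -o define -t standalone \
        -a if1="$net $name 0 $adapter" \
        -a cable_type1=N/A -a connect=nimsh \
        -a platform=chrp -a netboot_kernel=mp "$name"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"      # print only
    else
        "$@"           # run the nim command
    fi
}
```

For example, with a hypothetical file clients.txt holding one "name network adapter" triple per line, `while read n net a; do define_client "$n" "$net" "$a"; done < clients.txt` defines each client in turn; run it once with DRY_RUN=1 to review the commands first.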
NIM operations
NIM operations
•Operations on clients
– bos_inst
• rte
• mksysb
– cust
– maint
– diag
– maint_boot
•Procedure
– Allocate resources to clients (for intended operation)
– Perform operation
– Unallocate resources
•Other NIM object operations
– define, change, remove, allocate, deallocate, maint, lslpp,
lppchk, check, and so forth
Notes:
Operations on clients
NIM supports several different types of operations to install and manage software on
NIM clients. In addition, there are operations to manage the NIM objects themselves.
For the purposes of this class, we are primarily interested in three client operations:
- bos_inst
Allows you to install AIX on a client
- cust and maint
Allows you to update and maintain AIX software
- maint_boot
Allows you to boot a client to maintenance mode over the network
bos_inst
A bos_inst operation is used to perform a Basic Operating System (BOS) installation
on a client. There are two types of bos_inst operations: rte and mksysb.
bos_inst customization
The NIM installation process provides the ability to invoke a customization script after
AIX is installed on the system. This is done by allocating a script resource to the client
before performing the bos_inst. That script could be used to perform such
customization as setting passwords, changing network addresses, and so forth.
cust
This NIM operation performs software customization on a running NIM client. You can
use the cust operation to:
- Update existing software
- Install additional software
- Run a customization script
maint
This NIM operation performs software maintenance operations on clients, such as
committing applied software, removing software, and so forth.
diag
This NIM operation enables the client to boot to diagnostics over the network.
maint_boot
This operation enables the client to boot to maintenance mode over the network.
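The allocate / perform / unallocate procedure listed on the visual can be sketched for an rte install as follows. This is a sketch under assumptions: the client, SPOT, and lpp_source names are placeholders, boot_client=no is chosen so the network boot is started later by hand, and run_nim, install_rte, and release_client are hypothetical helper names. Set DRY_RUN=1 to print each command instead of running it.

```sh
# Run (or, with DRY_RUN=1, just print) a nim command.
run_nim() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "nim $*"
    else
        nim "$@"
    fi
}

# Steps 1 and 2: allocate the resources, then enable the operation.
# Usage: install_rte <client> <spot> <lpp_source>
install_rte() {
    client=$1 spot=$2 lpp=$3
    run_nim -o allocate -a spot="$spot" -a lpp_source="$lpp" "$client" || return 1
    # boot_client=no: the client network boot is started later.
    run_nim -o bos_inst -a source=rte -a boot_client=no "$client"
}

# Step 3: release all resources allocated to the client.
release_client() {
    run_nim -o deallocate -a subclass=all "$1"
}
```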
Instructor notes:
Purpose — Cover the various NIM operations that can be performed on machines.
Details —
Additional information —
Transition statement — Let’s take a closer look at the most common NIM operation -
setting up for installation of an operating system.
bos_inst operation
•Command line
# nim -o bos_inst \
-a lpp_source=<lpp_res_name> \
-a spot=<spot_name> \
-a source={rte|mksysb} \
-a mksysb=<mksysb_name> \
-a boot_client={yes|no} \
[optional attributes] \
<client_name>
• # smit nim_bosinst
Notes:
bos_inst
Configuring NIM to perform a bos_inst can be done from the command line or through
SMIT. There are two steps: allocating resources to the client and enabling the
bos_inst. It is also possible to combine these steps into one command:
# nim -o bos_inst -a lpp_source=<lpp_res_name> \
-a spot=<spot_name> \
[additional resources] \
[-a source={rte|mksysb}] \
[additional attributes] \
<client_name>
If you use SMIT to enable a bos_inst, SMIT opens a series of windows to prompt you
for the required information and then displays a window where you can set additional
optional attributes.
Required information
The required information for a bos_inst operation is:
- <client_name>
As always, the last argument specifies the NIM object you want to operate on. In this
case, this is the target client machine that you wish to install.
- spot=<spot_name>
Specifies the SPOT resource you wish to use.
- lpp_source=<lpp_res_name>
This is the name of the lpp_source resource you wish to use for the installation. In
AIX 5L V5.3 and later, this attribute is not required for a mksysb install (see note
below).
Optional information
Optional attributes include:
- source={rte|mksysb}
mksysb=<mksysb_name>
If you do not specify the source attribute, nim performs an rte bos_inst. If you set
source=mksysb, then you must use the mksysb attribute to specify the name of the
mksysb resource you wish to use.
Note: In most cases, you must still include an lpp_source resource, even if you are
doing a mksysb install. With AIX 5L and later, if you have created a mksysb that
includes all devices, you do not need to specify an lpp_source.
- boot_client={yes|no}
When set to yes, the master attempts to reboot the client machine automatically for
reinstallation. For this option to succeed, the client must be running and initialized as
a NIM client or have rhosts permissions granted to the master. If set to no, the
server is only configured to support the network boot; the actual boot must be
initiated later.
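The attribute rules above (source defaults to rte, source=mksysb requires a mksysb resource, and lpp_source can be omitted only for a mksysb install on AIX 5L V5.3 and later) can be captured in a small command builder. This is a minimal sketch: bosinst_cmd is a hypothetical helper, it always sets boot_client=no, and it only prints the one-step command so you can review it before running it.

```sh
# Hypothetical helper: print a one-step bos_inst command.
# Usage: bosinst_cmd <client> <spot> <lpp_source|-> [mksysb_name]
# Pass "-" as lpp_source to omit it (mksysb installs only).
bosinst_cmd() {
    client=$1 spot=$2 lpp=$3 mksysb=$4
    cmd="nim -o bos_inst -a spot=$spot"
    if [ "$lpp" != "-" ]; then
        cmd="$cmd -a lpp_source=$lpp"
    fi
    if [ -n "$mksysb" ]; then
        # source=mksysb must be paired with a mksysb resource.
        cmd="$cmd -a source=mksysb -a mksysb=$mksysb"
    else
        # An rte install always needs an lpp_source.
        if [ "$lpp" = "-" ]; then
            echo "an rte install needs an lpp_source" >&2
            return 1
        fi
        cmd="$cmd -a source=rte"
    fi
    echo "$cmd -a boot_client=no $client"
}
```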
•Documentation
–NIM from A to Z in AIX 4.3 (http://www.redbooks.ibm.com/)
–AIX Version 6.1 Installation Guide and Reference
•EZ NIM
–nim_master_setup, nim_client_setup
Notes:
Classes
You should also consider the following class.
- AU08 AIX 5L Network Installation Management (NIM)
(IBM Learning Services training course:
http://www.ibm.com/services/learning/index.html)
EZNIM
The SMIT EZNIM feature helps the system administrator by organizing the commonly
used NIM operations and simplifying frequently used advanced NIM operations.
Features of SMIT EZNIM include:
- Task-oriented menus
- Automatic resource naming that includes the level of the software used to create
them
- The user can review what steps will take place before executing a task, whenever
possible.
Use the smit eznim fast path to open the EZNIM main menu.
nim_master_setup
SMIT EZNIM has a command line equivalent: the nim_master_setup command.
For reference, here is the nim_master_setup usage message:
# nim_master_setup -h
Usage nim_master_setup: Setup and configure NIM master.
nim_master_setup [-a mk_resource={yes|no}]
[-a file_system=<fs name>]
[-a volume_group=<vg name>]
[-a disk=<disk name>]
[-a device=<device>] [-B] [-v]
Default values:
mk_resource = yes
file_system = /export/nim
volume_group = rootvg
device = /dev/cd0
nim_master_setup example
Here is an example that gives you an idea of the NIM resources created by
nim_master_setup (or EZNIM):
# nim_master_setup -a file_system=/csminstall/nim \
[-a volume_group=othervg] -B
Because we did not specify the device attribute, nim_master_setup uses /dev/cd0 as
the source for creating the lpp_source resource.
You can use -v for debug output or tail -f /var/adm/ras/nim.setup to get more
information. In this example, we show the output of various commands to illustrate what
nim_master_setup has done.
# lsnim
master machines master
boot resources boot
nim_script resources nim_script
master_net networks ent
master_net_conf resources resolv_conf
bid_ow resources bosinst_data
520lpp_res resources lpp_source
520spot_res resources spot
basic_res_grp groups res_group
# df -k
...
/dev/lv10 1474560 491980 67% 11578 4% /csminstall/nim
/dev/lv11 49152 47572 4% 17 1% /tftpboot
# lsnim -l master_net
master_net:
class = networks
type = ent
Nstate = ready for use
prev_state = ready for use
net_addr = 9.41.90.0
snm = 255.255.255.0
routing1 = default 9.41.90.1
# lsnim -l master_net_conf
master_net_conf:
class = resources
type = resolv_conf
Rstate = ready for use
prev_state = unavailable for use
location = /csminstall/nim/resolv.conf
alloc_count = 0
# lsnim -l bid_ow
bid_ow:
class = resources
type = bosinst_data
Rstate = ready for use
prev_state = unavailable for use
location = /csminstall/nim/bid_ow
alloc_count = 0
server = master
# lsnim -l 520lpp_res
520lpp_res:
class = resources
type = lpp_source
arch = power
Rstate = ready for use
prev_state = unavailable for use
location = /csminstall/nim/lpp_source/520lpp_res
simages = yes
alloc_count = 0
server = master
# lsnim -l 520spot_res
520spot_res:
class = resources
type = spot
plat_defined = chrp
arch = power
bos_license = yes
Rstate = ready for use
# lsnim -l basic_res_grp
basic_res_grp:
class = groups
type = res_group
member1 = bid_ow
member2 = 520lpp_res
member3 = 520spot_res
member4 = master_net_conf
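The group and resource listings above can be checked from a script before they are used. This is a minimal sketch, assuming the "name = value" layout of lsnim -l output shown above; group_members and is_ready are hypothetical helper names.

```sh
# Hypothetical helper: print the member<N> values from
# "lsnim -l <group>" output on stdin.
group_members() {
    awk '$1 ~ /^member[0-9]+$/ && $2 == "=" { print $3 }'
}

# Hypothetical helper: succeed if the "lsnim -l <resource>" output
# on stdin shows Rstate = ready for use.
is_ready() {
    grep -q '^[[:space:]]*Rstate[[:space:]]*=[[:space:]]*ready for use'
}
```

For example, `for r in $(lsnim -l basic_res_grp | group_members); do lsnim -l "$r" | is_ready || echo "$r is not ready"; done` reports any group member that is not ready for use.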
Exercise 4 overview
Notes:
Checkpoint
Notes:
Checkpoint solutions
Additional information —
Transition statement —
Unit summary
Notes:
References
Online AIX Version 6.1 Operating system and device
management
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/infocenter/systems
Unit objectives
Notes:
Introduction
Hardware and software problems might cause a system to stop during the boot
process.
This unit describes the boot process of loading the boot image from the boot logical
volume and provides the knowledge a system administrator needs to analyze boot
problems.
Possible failures
Notes:
To use an alternate boot location, you must invoke the appropriate bootlist by pressing
function keys during the boot process. Bootlists are covered in more detail later in
this unit.
Last steps
Passing control to the operating system means that the AIX kernel (which has just been
loaded from the boot image) takes over from the system firmware that was used to find
and load the boot image. The operating system is then responsible for completing the
boot sequence. The components of the boot image are discussed later in this unit.
All devices are configured during the boot process. This is performed in different
phases of the boot by the cfgmgr utility.
Towards the end of the boot sequence, the init process is started and processes the
/etc/inittab file.
Instructor notes:
Purpose — Introduce the AIX boot process. Keep this at the overview level.
Details —
Additional information — You might mention at this point that logical key switches are
used to determine which bootlist is used. If you press F5 or numeric 5, the system tries to
boot from a default bootlist that contains the diskette, CD-ROM, hard drive, and network. If
it boots from the hard drive, it will load AIX diagnostics rather than perform a normal boot.
Transition statement — Let’s show how the boot image is loaded from the boot logical
volume when booting from disk.
(Visual: The firmware bootlist names the boot devices: (1) Diskette, (2) CD-ROM, (3) Internal disk, (4) Network. Through the boot controller, the bootstrap code on hdisk0 is loaded into RAM and then loads the boot logical volume (hd5).)
Introduction
This visual shows how the boot logical volume is found during the AIX boot process.
Machines use one or more bootlists to identify a boot device. The bootlist is part of the
firmware.
Bootstrap code
System p and pSeries systems can manage several different operating systems. The
hardware is not bound to the software. The first block of the boot disk contains
bootstrap code that is loaded into RAM during the boot process. This part is sometimes
referred to as System Read Only Storage (ROS). The bootstrap code gets control. The
task of this code is to locate the boot logical volume on the disk, and load the boot
image. In some technical manuals, this second part is called the Software ROS. In the
case of AIX, the boot image is loaded.
Notes:
AIX kernel
The AIX kernel is the core of the operating system and provides basic services like
process, memory, and device management. The AIX kernel is always loaded from the
boot logical volume. There is a copy of the AIX kernel in the hd4 file system (under the
name /unix), but this program has no role in system initialization. Never remove
/unix, because it is used for rebuilding the kernel in the boot logical volume.
RAMFS
This RAMFS is a reduced or miniature root file system which is loaded into memory and
used as if it were a disk-based file system. The contents of the RAMFS are slightly
different depending on the type of system boot:
- Boot from system hard disk: Programs and data necessary to access rootvg and bring
up the rest of AIX. When booted in service mode, it boots a diagnostics facility.
- Boot from the installation CD-ROM: Programs and data necessary to install AIX or
perform software maintenance.
- Boot from the diagnostics CD-ROM: Programs and data necessary to execute standalone
diagnostics.
Reduced ODM
The boot logical volume contains a reduced copy of the ODM. During the boot process,
many devices are configured before hd4 is available. For these devices, the
corresponding ODM files must be stored in the boot logical volume.
Instructor notes:
Purpose — Describe the components of the BLV.
Details — Introduce the different components as described in the student material.
Explain that the AIX kernel from the BLV is used during the boot process.
Additional information — Describe what the term reduced ODM means. Explain that
device support is available only for devices that are marked as base devices in PdDv.
The protofiles (in /usr/lib/boot and /usr/lib/boot/protoext) are used by the
bosboot command to determine which files should be put into the RAMFS image that is
included in the boot image.
Transition statement — Many system boot problems involve being unable to locate a
good boot image. In order to fix these problems, we often need to boot into special modes.
Let’s look at what determines which boot device is used.
• Normal bootlist:
# bootlist -m normal hdisk0 hdisk1
# bootlist -m normal -o
hdisk0 blv=hd5
hdisk1 blv=hd5
Notes:
Introduction
You can use the bootlist or diag command from the command line to change or
display the bootlists. You can also use the System Management Services (SMS)
programs. SMS is covered on the next visual.
bootlist command
The bootlist command is the easiest way to change the bootlist. The first example
shows how to change the bootlist for a normal boot. In this example, we boot either from
hdisk0 or hdisk1. To query the bootlist, you can use the -o option.
The second example shows how to display the customizable service bootlist.
The bootlist command also accepts IP parameters to use when booting from a network
adapter:
# bootlist -m service ent0 gateway=192.168.1.1 bserver=192.168.10.3 \
client=192.168.1.57
Using the service bootlist in this way allows you to boot to maintenance or diagnostic
mode from a NIM server without having to use SMS to specify the network adapter as the
boot device.
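After setting a bootlist, it is worth confirming the result. This is a minimal sketch, assuming the bootlist -m normal -o output format shown on the visual (one device per line, device name first); bootlist_matches is a hypothetical helper. It succeeds only if the listed devices match the expected ones in the same order.

```sh
# Hypothetical helper: compare "bootlist -m <mode> -o" output on stdin
# with the expected device names, in order.
# Usage: bootlist -m normal -o | bootlist_matches hdisk0 hdisk1
bootlist_matches() {
    # Keep only the device name (first field) of each non-empty line.
    actual=$(awk 'NF { print $1 }' | tr '\n' ' ')
    expected="$* "
    [ "$actual" = "$expected" ]
}
```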
Types of bootlists
The normal bootlist is used during a normal boot.
The default bootlist (hard coded in the firmware) is called when F5 or numeric 5 is
pressed during the boot sequence.
Most machines, in addition to the default bootlist and the customized normal bootlist,
allow for a customized service bootlist. This is set using mode service with the
bootlist command. The service bootlist is called when F6 is pressed during boot. For
POWER5 and POWER6 systems, the numeric 6 key is used.
For machines which are partitioned into logical partitions, the HMC is used to boot the
partitions and it provides for specifying boot modes, thus eliminating the need to time
the pressing of special keys. Since pressing either 5/F5 or 6/F6 causes a service mode
boot and since a service mode boot using a boot logical volume will result in booting to
diagnostics, these options are referred to in the HMC as booting to diagnostic either
with the default bootlist or the stored (customizable) bootlist.
Here is a list summarizing the boot modes and the manual keys associated with them
(this may vary depending on the model of your machine):
- F1 (graphic console) or 1 (ASCII console and newer models): Start an SMS (System
Management Services) mode boot.
- F5 (graphic console) or 5 (ASCII console and newer models): Start a service mode
boot using the default service bootlist (which searches the removable media first).
- F6 (graphic console) or 6 (ASCII console and newer models): Start a service mode
boot using the customized service bootlist.
You may find variations between the different models of AIX systems. Refer to the User’s
Guide for your specific model at:
http://publib.boulder.ibm.com/infocenter/pseries/index.jsp?topic=/com.ibm.pseries.doc/hardware.htm
Instructor notes:
Purpose — Describe how to work with the bootlists.
Details —
Additional information — The bootlist command accepts one more mode, called
both. As you might suspect, the both mode sets the service and normal bootlists to the
same value at the same time.
Transition statement — The SMS programs provide another method to set a bootlist.
Let’s take a look at how to start SMS.
Notes:
One of the keyboard actions you may do during this brief period of time is to press the
F1 (or numeric 1) key to request that the system boot using SMS firmware code.
Notes:
Select the device type. If you do not have many bootable devices, it is sometimes easier
to use the List All Devices option.
It is important to understand that when SMS is used to modify the bootlist, both the
normal bootlist and the service bootlist are modified. If you want them to be different,
you must recustomize them later, when you have a command prompt (such as in
multiuser mode).
Instructor notes:
Purpose — Show how to change the bootlist in SMS.
Details — When you use SMS to change the bootlist, you are changing both the normal
and service customizable bootlists. After fixing the problem at hand, you may wish to use
the bootlist command to recustomize them if you want them to be different.
Additional information — The following keys are used (follow with the HMC identifying
text):
- F1 or numeric 1: Start System Management Services
- F5 or numeric 5: Boot in diagnostic mode, use default bootlist
- F6 or numeric 6: Boot in diagnostic mode, use nondefault bootlist
The default bootlist is set to diskette, CD-ROM, internal disk and any communication
adapter.
To boot diagnostics from disk, do not insert a CD, and request the default bootlist
(press the appropriate key, F5 or numeric 5, or specify it on the HMC).
The other options:
Boot versus Multiboot
Under Select Boot Options, there is a multiboot mode item. This is a toggle that turns
multiboot mode either on or off. If you turn it on, the system will boot to an SMS menu every
time you boot the system in normal mode. This is to allow you to choose where to boot from
each time. For example, you might have different versions of AIX on different hard disks
and want to alternate boot between them. If an SMS menu is displayed when performing a
normal boot, this might be the reason.
Transition statement — Once we have selected the category of boot device, we need to
select the particular device we wish to use in the identified position in the bootlist. Let’s see
how we do this.
Select Device
Device  Current   Device
Number  Position  Name
1.      -         IBM 10/100/1000 Base-TX PCI-X Adapter
                  ( loc=U789D.001.DQDWAYT-P1-C5-T1 )
2.      -         SAS 73407 MB Harddisk, part=2 (AIX 6.1.0)
                  ( loc=U789D.001.DQDWAYT-P3-D1 )
3.      1         SATA CD-ROM
                  ( loc=U789D.001.DQDWAYT-P1-T3-L8-L0 )
4.      None
===> 2

Select Task
SAS 73407 MB Harddisk, part=2 (AIX 6.1.0)
( loc=U789D.001.DQDWAYT-P3-D1 )
1. Information
2. Set Boot Sequence: Configure as 1st Boot Device
Instructor notes:
Purpose — Complete the walkthrough of how to change a bootlist in SMS.
Details —
Additional information —
Transition statement — Let’s next discuss how to handle a corruption of the boot logical
volume.
Notes:
Boot alternatives
The system boots from the first bootable device it finds in the designated bootlist.
Whenever the effective boot device is bootable media, such as a mksysb tape/CD/DVD
or installation media, the system will boot to the Install and Maintenance menu.
If the booting device is a network adapter, the mode of boot depends on the
configuration of the NIM server which services the network boot request. If the NIM
server is configured to support an AIX installation or a mksysb recover, then the system
will boot to Install and Maintenance. If the NIM server is configured to serve out a
maintenance image, then the system boots to a Maintenance menu (a sub-menu of
Install and Maintenance). If the NIM server is configured to serve out a diagnostic
image, then we boot to a diagnostic mode.
There are other ways to boot to a diagnostic utility. If the booting device is a CD drive
with a diagnostic CD in it, we boot into that diagnostic utility. If a service mode boot is
requested and the booting device is a hard drive with a boot logical volume, then the
system boots into the diagnostic utilities.
You can signal the system which bootlist to use during the boot process. The default is
to use the normal bootlist and boot in normal mode. You can change this during a window
of opportunity between the time the system discovers the keyboard and the time it
commits to the default boot mode. The signal can be generated from the system console
(which may be an HMC-provided virtual terminal) or from a workstation attached to the
service processor (such as an HMC), which can simulate a keyboard signal at the right
moment.
The keyboard signal used can vary from firmware to firmware, but the most common are a
numeric 5, which tells the firmware to use the default service bootlist, and a numeric 6,
which tells the firmware to use the customizable service bootlist.
Either of these special keyboard signals will result in a service mode boot, which as we
stated can cause a boot to diagnostic mode when booting off a boot logical volume on
your hard drive.
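The boot outcomes described above can be summarized as a small decision table. The following is a minimal POSIX shell sketch of that table. The function name expected_boot_target and the device keywords (install_media, nim_install, nim_maint, nim_diag, diag_cd, disk) are hypothetical labels invented for this illustration. They are not AIX commands or firmware settings.

```shell
#!/bin/sh
# expected_boot_target DEVICE MODE
# Hypothetical helper: prints where a boot ends up, following the rules above.
#   DEVICE: install_media | nim_install | nim_maint | nim_diag | diag_cd | disk
#   MODE:   normal | service   (service = keyboard signal 5 or 6 during boot)
expected_boot_target() {
    case $1 in
        install_media|nim_install)
            # Bootable install/mksysb media, or a NIM install or mksysb restore
            echo "Install and Maintenance menu" ;;
        nim_maint)
            # NIM server configured to serve a maintenance image
            echo "Maintenance menu" ;;
        nim_diag|diag_cd)
            # NIM diagnostic image, or a diagnostics CD in the drive
            echo "Diagnostics" ;;
        disk)
            # Hard drive with a boot logical volume: service mode boots diagnostics
            if [ "$2" = service ]; then
                echo "Diagnostics"
            else
                echo "Normal boot"
            fi ;;
        *)
            echo "unknown device type: $1" >&2
            return 1 ;;
    esac
}
```

For example, expected_boot_target disk service prints Diagnostics. This reflects the point above that a service mode boot from a hard drive leads to the diagnostic utilities.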
With an HMC, you can specify which signal to send as part of the LPAR activation.
Even if you forget to override the default boot mode (usually normal, to multiuser), you
can still override it from the virtual console keyboard, as described above, once the
keyboard has been discovered.
Instructor notes:
Purpose — Explain how the boot mode is controlled.
Details —
Additional information —
Transition statement — Let’s continue to look at the factors that affect boot behavior.
Notes:
Instructor notes:
Purpose — Continue covering the factors that affect boot behavior.
Details —
Additional information —
Transition statement — Let’s use what we have just learned to effect a boot to
maintenance mode.
Figure: Booting to maintenance mode. From the HMC, use the Advanced activate options (or the default bootlist) to boot the system from the BOS CD-ROM, tape, or a network device (NIM) into Maintenance mode.
Notes:
Introduction
The visual shows an overview of how to access a system that will not boot normally.
Maintenance mode can be started from an AIX CD, an AIX bootable tape (such as a
mksysb), or a network device that has been prepared to access a NIM master. The
devices that contain the boot media must be listed in the bootlist that is used.
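- If the slot containing your boot device is currently allocated to another LPAR, you
will first need to deallocate it from that LPAR. Use a dynamic LPAR operation on the
HMC to allocate that slot to the LPAR you are booting.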
- If using the default bootlist, the sequence is fixed and the CD drive is the first
practical device.
- If using a tape drive or a network adapter as your boot device and not selecting a
boot device through SMS for this particular boot, then you will need to use one of the
customizable bootlists, usually the service bootlist.
Verify your bootlist, but do not forget that some machines do not have a service
bootlist. Check that your boot device is part of the bootlist:
# bootlist -m service -o
- If you want to boot from your internal tape device, you need to change the bootlist
because the tape device by default is not part of the bootlist. For example:
# bootlist -m service rmt0 hdisk0
- Whichever bootlist you are using, insert the boot media (either tape or CD) into the
drive.
- Power on the system (or activate the LPAR). The system begins booting from the
installation media. After several minutes, c31 is displayed in the LED/LCD panel (or
as the reference code on the HMC display), which means that the software is
prompting on the console for input. For an LPAR, you will need to have the virtual
console started to interact with the prompts.
- Normally, you are prompted to select the console device and then select the
language. After making these selections, you see the Installation and
Maintenance menu.
For partitioned systems with an HMC, you would normally use the HMC to access SMS
and then select the bootable device, which would bypass the use of a bootlist.
You can also use a NIM server to boot to maintenance. For this, you would need to
place your system’s network adapter in your customized service bootlist before any
other bootable devices, or use SMS to specifically request boot over that adapter (the
latter option is most common). Here is an example of setting the service boot list:
# bootlist -m service ent0 gateway=192.168.1.1 \
bserver=192.168.10.3 client=192.168.1.57
You would also need to set up the NIM server to provide a boot image for doing a
maintenance boot. For example, at the NIM server:
# nim -o maint_boot -a spot=<spotname> <client_machine_object_name>
Instructor notes:
Purpose — Identify how to access a system that does not boot.
Details — Emphasize that what causes us to boot into the Installation and Maintenance
menu is the fact that we booted off of installation media. It does not matter if we boot in
normal mode (using the normal bootlist) or service mode (using the default or customizable
service bootlists). It is only important that we find bootable installation media (tape, CD, or
network server) in the bootlist before anything else (such as a BLV or a Diagnostic CD).
With some SMS facilities, we can specify a particular device to use and bypass the
bootlists entirely.
Additional information —
Transition statement — Let’s show the maintenance mode menus that are available.
Notes:
First steps
When booting in maintenance mode, you first have to identify the system console that
will be used, for example your virtual console (vty), graphic console (lft), or serial
attached console (tty that is attached to the S1 port).
After selecting the console, the Installation and Maintenance menu is shown.
As we want to work in maintenance mode, we use selection 3 to start up the
Maintenance menu. In a network boot using NIM, the console goes straight to the
maintenance menu.
From this point, we access our rootvg to execute any system recovery steps that may
be necessary.
Instructor notes:
Purpose — Explain the first maintenance menus that are shown.
Details — Describe how to start up the maintenance mode.
Additional information — You could, optionally, briefly explain the other tasks that can
be performed from the Maintenance menu: copying a system dump to removable media
such as tape, accessing an advanced maintenance shell when no rootvg is available, and
restoring a mksysb tape.
Transition statement — Let’s describe how to access the rootvg.
Figure: Maintenance menu selections (Choice: 1; Choice [99]: 1)
© Copyright IBM Corporation 2009
Notes:
Access this volume group and start a shell before mounting file systems
When you choose this selection, rootvg is activated, but the file systems belonging to
rootvg are not mounted.
A typical scenario where this selection is chosen is when a corrupted file system needs
to be repaired by the fsck command. Repairing a corrupted file system is only possible
if the file system is not mounted.
Another scenario might be a corrupted hd8 transaction log. Changes to the superblock or
i-nodes are first recorded in the log logical volume. When these changes are written to
disk, the corresponding transaction log records are removed from the log logical
volume.
A corrupted transaction log must be reinitialized with the logform command, which is
only possible when no file system is mounted. After initializing the log device, you need
to run a file system repair on all file systems that use this transaction log. Beginning with
AIX 5L V5.1, you have to explicitly specify the file system type, JFS or JFS2:
# logform -V jfs2 /dev/hd8
# fsck -y -V jfs2 /dev/hd1
# fsck -y -V jfs2 /dev/hd2
# fsck -y -V jfs2 /dev/hd3
# fsck -y -V jfs2 /dev/hd4
# fsck -y -V jfs2 /dev/hd9var
# fsck -y -V jfs2 /dev/hd10opt
# exit
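When several file systems share one log, it can help to write out the recovery sequence before typing it at the maintenance shell. The following is a minimal sketch, assuming a POSIX shell. print_log_recovery is a hypothetical helper. It only prints the logform and fsck commands shown above for review and runs nothing itself.

```shell
#!/bin/sh
# print_log_recovery VFSTYPE LOGLV FSLV...
# Hypothetical dry-run helper: prints (does not run) the log re-initialization
# command followed by one fsck per file system that uses that log.
print_log_recovery() {
    vfs=$1    # jfs or jfs2 (must be given explicitly since AIX 5L V5.1)
    log=$2    # log logical volume, for example hd8
    shift 2
    echo "logform -V $vfs /dev/$log"
    for lv in "$@"; do
        echo "fsck -y -V $vfs /dev/$lv"
    done
}
```

For example, print_log_recovery jfs2 hd8 hd1 hd2 hd3 hd4 hd9var hd10opt prints the same logform and fsck commands as the listing above, in the same order.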
Keep in mind that the US keyboard layout is used. You can enable command recall (the
retrieve function) with set -o emacs or set -o vi.
Notes:
Maintenance mode
If the boot logical volume is corrupted (for example, bad blocks on a disk might cause a
corrupted BLV), the machine will not boot.
To fix this situation, you must boot your machine in maintenance mode, from a CD or
tape. If NIM has been set up for a machine, you can also boot the machine from a NIM
master in maintenance mode. NIM is actually a common way to do special boots in a
logical partition environment.
Instructor notes:
Purpose — Describe the bosboot command.
Details — Describe the steps that are necessary to recreate the boot logical volume. Tell
the students that working in maintenance mode is explained later in this unit.
Explain that an hd5 boot logical volume must exist on the system.
Additional information — Be careful to boot your machine from an AIX installation CD at
the same level as the installed system. Consider installing AIX from base media and then
applying patches to the OS. The patches change both kernel routines AND libc. This
invalidates using the original installation CDs to boot the system into maintenance mode
and access the disks, because when we boot, we use the /unix and libraries from the CD.
Initially, this is not an issue, because the CD kernel and libraries match each other.
However, as we activate rootvg, the root (/) file system from the CD is overlaid with the
root (/) file system from the disks. Now, any reference to /unix is resolved to the DISK!
If this /unix does not match what we actually booted from on the CD, unpredictable
failures can occur. The same applies to any libraries that are referenced.
Transition statement — Let’s describe how to work with bootlists.
Checkpoint (1 of 2)
IBM Power Systems
1. True or False: You must have AIX loaded on your system to use the
System Management Services programs.
2. Your AIX system is currently powered off. AIX is installed on hdisk1
but the bootlist is set to boot from hdisk0. How can you fix the problem
and make the machine boot from hdisk1?
__________________________________________________
__________________________________________________
3. Your machine is booted and at the # prompt.
What is the command that will display the normal bootlist?
______________________________
How could you change the normal bootlist?
______________________________
4. What command is used to build a new boot image and write it to the
boot logical volume? _____________________________________
5. What script controls the boot sequence? _________________
Notes:
Instructor notes:
Purpose — Review and test the students' understanding of this first part of the unit.
Details — A suggested approach is to give the students about five minutes to answer the
questions on this page. Then, go over the questions and answers with the class.
Checkpoint solutions (1 of 2)
1. True or False: You must have AIX loaded on your system to use the
System Management Services programs. False. SMS is part of the
built-in firmware.
2. Your AIX system is currently powered off. AIX is installed on hdisk1
but the bootlist is set to boot from hdisk0. How can you fix the problem
and make the machine boot from hdisk1? You need to boot the SMS
programs and set the new boot list to include hdisk1.
3. Your machine is booted and at the # prompt.
What is the command that will display the normal bootlist?
# bootlist -om normal
How could you change the normal bootlist?
# bootlist -m normal device1 device2
4. What command is used to build a new boot image and write it to the
boot logical volume? bosboot -ad /dev/hdiskx
5. What script controls the boot sequence? rc.boot
Additional information —
Transition statement — Let’s continue to the second section, solving boot problems.
Checkpoint (2 of 2)
6. True or False: During the AIX boot process, the AIX kernel is
loaded from the root file system.
Notes:
Instructor notes:
Purpose — Review and test the students' understanding of this unit.
Details — A suggested approach is to give the students about five minutes to answer the
questions on this page. Then, go over the questions and answers with the class.
Checkpoint solutions (2 of 2)
6. True or False: During the AIX boot process, the AIX kernel is
loaded from the root file system.
False. The AIX kernel is loaded from hd5.
Additional information —
Transition statement — Now, let’s do an exercise.
Notes:
Introduction
This exercise can be found in your Student Exercise Guide.
Instructor notes:
Purpose — Introduce the exercise.
Details —
Additional information —
Transition statement — Let’s summarize.
Unit summary
Notes:
During the boot process, the kernel from the boot image is loaded into memory.
Boot devices and sequences can be updated using the bootlist command, the diag
command, and SMS.
The boot logical volume contains an AIX kernel, an ODM, and a RAM file system, which
contains the boot script rc.boot that controls the AIX boot process.
The boot logical volume can be recreated using the bosboot command.
Instructor notes:
Purpose — Summarize the unit.
Details — Present the highlights from the unit.
Additional information —
Transition statement — Let’s continue with the next unit.
References
Online AIX Version 6.1 Operating system and device
management
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/infocenter/systems
Unit objectives
Notes:
Introduction
There are many reasons for boot failures. The hardware might be damaged or, due to
user errors, the operating system might not be able to complete the boot process.
A good knowledge of the AIX boot process is a prerequisite for all AIX system
administrators.
Figure: AIX software boot sequence (restore the RAM file system from the boot image; rc.boot; rc.boot 2 activates rootvg; init processes /etc/inittab)
Notes:
Boot sequence
The visual shows the boot sequence after loading the AIX kernel from the boot image.
The AIX kernel gets control and executes the following steps:
1. The kernel restores a RAM file system into memory by using information
provided in the boot image. At this stage the rootvg is not available, so the
kernel needs to work with commands provided in the RAM file system. You can
consider this RAM file system as a small AIX operating system.
2. The kernel starts the init process provided in the RAM file system (not the
one from the root file system). This init process executes the boot script
rc.boot.
3. rc.boot controls the boot process. In the first phase (it is called by init with
rc.boot 1), the base devices are configured. In the second phase (rc.boot 2),
the rootvg is activated (or varied on).
4. After rootvg is activated at the end of rc.boot 2, the kernel overmounts the
RAM file system with the file systems from rootvg. The init from the boot
image is replaced by the init from the root file system, hd4.
5. This init processes the /etc/inittab file. From this file, rc.boot is called a
third time (rc.boot 3), and all remaining devices are configured.
Instructor notes:
Purpose — Introduce the AIX software boot process. Keep this on the overview level.
Details — Explain as described in the student notes.
Additional information — Emphasize that rootvg is not available at the beginning of the
boot process. Before rootvg can be activated, all devices that are needed to vary it on
must be configured.
Transition statement — Let’s look at what rc.boot does.
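rc.boot 1
Figure: rc.boot 1. Process 1 (init) runs rc.boot 1 from the boot image while rootvg is not active. restbase copies the ODM from the boot image into the RAM file system, and cfgmgr -f runs the Config_Rules methods with phase=1. Failure LEDs shown in the visual: F05, c06, 548, 510.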
Notes:
3. Base devices are all devices that are necessary to access rootvg. If
rootvg is stored on hdisk0, all devices from the motherboard to the disk itself
must be configured in order to be able to access rootvg.
4. At the end of rc.boot 1, the system determines the last boot device by calling
bootinfo -b. The LED shows 511.
rc.boot 2 (part 1)
Figure: rc.boot 2 (part 1), working in the RAM file system: fsck -f /dev/hd9var; mount /var (failure LED 518); copycore; umount /var; swapon /dev/hd6.
Notes:
4. Next, /dev/hd2 is checked and mounted (again with the -f option, so it is checked
only if the file system was not unmounted cleanly). If the mount fails, LED 518 is
displayed and the boot stops.
5. Next, the /var file system is checked and mounted. This is necessary at this
stage because the copycore command checks whether a dump occurred. If a dump
exists in a paging space device, it is copied from the dump device, /dev/hd6, to
the copy directory, which by default is /var/adm/ras.
/var is unmounted afterwards.
6. The primary paging space /dev/hd6 is made available.
Instructor notes:
Purpose — Describe the first part of rc.boot 2.
Details — Introduce this boot phase as described in the student material.
Additional information — Beginning with AIX 5L V5.1, the root file system (hd4) is mounted
directly over the root directory in the RAMFS. This simplifies several steps during phase 2
and eliminates the need to remount the rootvg file systems at the end of phase 2.
In many reference documents, LED 518 is defined as indicating that the /usr file system
could not be mounted over the network. This is incorrect. LED 518 is displayed whenever
/usr (or /var) cannot be mounted.
Transition statement — Let’s describe the second part of rc.boot 2.
rc.boot 2 (part 2)
Figure: rc.boot 2 (part 2), after swapon /dev/hd6, working in rootvg: mount /var.
Notes:
Final stage
At this stage, the AIX kernel removes the RAM file system (returns the memory to the
free memory pool) and starts the init process from the / file system in rootvg.
rc.boot 3 (part 1)
Figure: rc.boot 3 (part 1). Process 1 (init) reads /etc/inittab and runs /sbin/rc.boot 3 (failure LED 553). Working in rootvg: fsck -f /dev/hd3; mount /tmp; savebase copies the ODM to hd5.
Notes:
4. The configuration manager reads the ODM class Config_Rules and executes
all methods for either phase=2 or phase=3. All remaining devices that are not
base devices are configured in this step.
5. The console is configured by cfgcon. The codes c31, c32, c33, or c34 are
displayed, depending on the type of console:
- c31: Console not yet configured. Provides instructions to select a console.
- c32: Console is an lft terminal.
- c33: Console is a tty.
- c34: Console is a file on the disk.
If CDE is specified in /etc/inittab, CDE is started and you get a
graphical boot on the console.
6. To synchronize the ODM in the boot logical volume with the ODM from the / file
system, savebase is called.
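If a boot stops with one of the console codes listed above, a quick lookup helps interpret it. The following is a minimal POSIX shell sketch of that lookup. console_led_meaning is a hypothetical helper name, and it covers only the four cfgcon codes described in this step.

```shell
#!/bin/sh
# console_led_meaning CODE
# Hypothetical lookup of the cfgcon LED/reference codes listed above.
console_led_meaning() {
    case $1 in
        c31) echo "Console not yet configured; instructions to select a console are displayed." ;;
        c32) echo "Console is an lft terminal." ;;
        c33) echo "Console is a tty." ;;
        c34) echo "Console is a file on the disk." ;;
        *)   echo "Not a cfgcon console code: $1" >&2
             return 1 ;;
    esac
}
```

For example, console_led_meaning c33 prints "Console is a tty."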
Instructor notes:
Purpose — Describe the first part of rc.boot 3.
Details — Describe as explained in the student notes.
Additional information — Emphasize that the savebase command is necessary to
synchronize the ODMs in hd4 and hd5.
Transition statement — Let’s describe the second part of rc.boot 3.
rc.boot 3 (part 2)
Figure: rc.boot 3 (part 2). savebase copies the ODM from /etc/objrepos to hd5; syncd 60; errdemon; turn off LEDs; rm /etc/nologin. If any CuDv object has chgstatus=3, the message "A device that was previously detected could not be found. Run "diag -a"." is displayed. Finally: "System initialization is completed."
Notes:
rc.boot summary
Phase       Where       Action                               Config_Rules phase
rc.boot 1   /dev/ram0   restbase                             1
                        cfgmgr -f
rc.boot 2   /dev/ram0   ipl_varyon rootvg
                        mount /, /usr, /var file systems
                        Merge /dev
                        Copy ODM
rc.boot 3   rootvg      mount /tmp
                        cfgmgr -p2                           2 (normal)
                        cfgmgr -p3                           3 (service)
                        savebase
Notes:
Summary
During rc.boot 1, all base devices are configured. This is done by cfgmgr -f, which
executes all phase 1 methods from Config_Rules.
During rc.boot 2, rootvg is varied on. All /dev files and the customized ODM
files from the RAM file system are merged to disk.
During rc.boot 3, all remaining devices are configured by cfgmgr -p2 (normal boot) or
cfgmgr -p3 (service boot). The configuration manager reads the Config_Rules class and
executes the corresponding methods. To synchronize the ODMs, savebase is called; it
writes the ODM from the disk back to the boot logical volume.
Notes:
Instructor notes:
Purpose — Explain how to fix a corrupted file system.
Details — Point out that a common cause of this type of corruption is the use of the HMC
shutdown immediate option for an LPAR with a running operating system. This is the
equivalent of cutting power to a computer while the operating system is running, which
does not allow for a proper shutdown. Whenever possible, an administrator should use the
HMC Operating System shutdown option or issue the shutdown command from the LPAR
command prompt.
Additional information —
Transition statement — Let’s review the phases of rc.boot.
Figure: rc.boot 1 review activity (numbered boxes (1) through (5) around rc.boot 1)
Notes:
Instructions
Answer the following questions, and write the answers in the numbered boxes of the visual.
1. Who calls rc.boot 1? Is it:
• /etc/init from hd4
• /etc/init from the RAMFS in the boot image
2. Which command copies the ODM files from the boot image into the RAM file
system?
3. Which command triggers the execution of all phase 1 methods in Config_Rules?
4. Which ODM files contain the devices that have been configured in rc.boot 1?
• ODM files in hd4
• ODM files in RAM file system
5. How can you determine the last boot device?
Instructor notes:
Purpose — Review and test the students' understanding of rc.boot phase 1.
Details — This is the first of three reviews. You can review each one separately, or have
the students do all three, then review them all.
(1) /etc/init from the RAMFS in the boot image
(2) restbase
(3) cfgmgr -f
(4) ODM files in the RAM file system
(5) bootinfo -b
Additional information —
Transition statement — Now, let’s review rc.boot phase 2.
Figure: rc.boot 2 review activity (numbered boxes (1) through (8); box (8) shows LED 557)
Notes:
Instructions
Put the following expressions in the correct sequence.
1. Turn on paging.
2. Merge RAM /dev files.
3. Copy boot messages to alog.
4. Activate rootvg.
5. Mount /var; copy dump; unmount /var.
6. Mount /dev/hd4 onto / in RAMFS.
7. Copy RAM ODM files.
Finally, answer the following question. Put the answer in box 8:
Your system stops booting with an LED 557. Which command failed?
Instructor notes:
Purpose — Review and test the students' understanding of rc.boot phase 2.
Details — This is the second of three reviews. You can review each one separately, or
have the students do all three, then review them all.
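(1) Activate rootvg
(2) Mount /dev/hd4 onto / in RAMFS
(3) Mount /var; copy dump; unmount /var
(4) Turn on paging
(5) Merge RAM /dev files
(6) Copy RAM ODM files
(7) Copy boot messages to alog
(8) The mount of /dev/hd4 failed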
Additional information — Question 8 is important for the lab. The command that failed is
the mount of /dev/hd4. One reason for this might be a damaged log logical volume.
Transition statement — Now, let’s review rc.boot phase 3.
Figure: rc.boot 3 review activity. Blanks to fill in: rm _________, s_______, ________ &, ________ -p2, ________ -p3, _________=3, ______ ? (Missing devices?)
Notes:
Instructions
Please complete the missing information in the picture.
Your instructor will review the activity with you.
Instructor notes:
Purpose — Review and test the students' understanding of rc.boot phase 3.
Details — This is the last of three reviews. You can review each one separately, or have
the students do all three, then review them all.
/etc/inittab: /sbin/rc.boot 3
syncvg rootvg &
cfgmgr -p2 (normal boot) or cfgmgr -p3 (service boot)
Start console: cfgcon
Start CDE: rc.dt boot
savebase
syncd 60
errdemon
rm /etc/nologin
CuDv chgstatus=3? (missing devices)
Execute next line in /etc/inittab
Additional information —
Transition statement — Now, let’s switch over to the next topic.
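Configuration manager
Figure: Configuration manager. cfgmgr reads Config_Rules and the predefined object classes (PdDv, PdAt, PdCn), and invokes the device methods (Define, Configure, Change, Unconfigure, Undefine); Configure loads the device driver and Unconfigure unloads it. The results are stored in the customized object classes (CuDv, CuAt, CuDep, CuDvDr, CuVPD).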
Notes:
Automatic configuration
Many devices are automatically detected by the configuration manager. For this to
occur, device entries must exist in the predefined device object classes. The
configuration manager uses the methods from PdDv to manage the device state, for
example, to bring a device into the defined or available state.
Define method
When a device is defined through its define method, the information from the predefined
database for that type of device is used to create the information describing the device
specific instance. This device specific information is then stored in the customized
database.
Configuration order
The configuration process requires that a device be defined or configured before a
device attached to it can be defined or configured. At system boot time, the
configuration manager configures the system in a hierarchical fashion. First the
motherboard is configured, then the buses, then the adapters that are attached, and
finally the devices that are connected to the adapters. The configuration manager then
configures any pseudodevices (volume groups, logical volumes, and so forth) that need
to be configured.
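The parent-before-child rule is the same ordering problem that a topological sort solves. The following minimal sketch uses the standard tsort utility on "parent child" pairs. config_order is a hypothetical wrapper name, and the device pairs in the example come from the cfgmgr boot log shown in this unit. This only illustrates the ordering rule. It is not how cfgmgr itself is implemented.

```shell
#!/bin/sh
# config_order
# Hypothetical helper: reads "parent child" device pairs on stdin and prints
# each device on its own line, with every parent listed before its children.
# The ordering is computed by the POSIX tsort (topological sort) utility.
config_order() {
    tsort
}
```

For example, printf '%s\n' 'sys0 bus0' 'bus0 bus1' 'bus0 scsi0' | config_order prints sys0 first and bus0 before both bus1 and scsi0. The relative order of siblings such as bus1 and scsi0 is not fixed.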
Instructor notes:
Purpose — Summarize how the cfgmgr works.
Details — Explain that cfgmgr can detect devices automatically. The devices must be
defined in the predefined ODM classes. When they are defined, they are stored in the
customized ODM classes.
cfgmgr is method- or rule-driven. It only uses methods to define or configure a device.
These methods are device specific and are listed in PdDv.
During the boot process, cfgmgr uses the Config_Rules class to configure the devices in
the correct sequence.
Note that the actual Config_Rules object class has more objects in each phase than are
listed in the visual.
Additional information — The output from the configuration manager is viewable in the
boot alog. During run-time, cfgmgr can be started with the flag -v, to get more information
about the devices that are configured.
Transition statement — Let’s have a look at the Config_Rules ODM class.
phase  seq  boot_mask  rule_value                 Invoked by
1      10   0          /etc/methods/defsys        cfgmgr -f
1      12   0          /usr/lib/methods/deflvm
2      10   0          /etc/methods/defsys        cfgmgr -p2 (normal boot)
2      12   0          /usr/lib/methods/deflvm
2      19   0          /etc/methods/ptynode
2      20   0          /etc/methods/startlft
3      10   0          /etc/methods/defsys        cfgmgr -p3 (service boot)
3      12   0          /usr/lib/methods/deflvm
3      19   0          /etc/methods/ptynode
3      20   0          /etc/methods/startlft
3      25   0          /etc/methods/starttty
Notes:
Introduction
The Config_Rules ODM object class is used by cfgmgr during the boot process. The
phase attribute determines when the respective method is called.
Phase 1
All methods with phase=1 are executed when cfgmgr -f is called. The first method that
is started is /etc/methods/defsys, which is responsible for the configuration of all
base devices. The second method /usr/lib/methods/deflvm loads the logical volume
device driver (LVDD) into the AIX kernel.
If you have devices that must be configured in rc.boot 1 (that is, before
rootvg is active), you need to place phase 1 configuration methods into Config_Rules.
A bosboot is required afterwards, because rc.boot 1 uses the ODM that restbase
restores from the boot image.
Phase 2
All methods with phase=2 are executed when cfgmgr -p2 is called. This takes place in
the third rc.boot phase, when the key switch is in normal position or for a normal boot
on a PCI machine. The seq attribute controls the sequence of the execution: The lower
the value, the higher the priority.
Phase 3
All methods with phase=3 are executed when cfgmgr -p3 is called. This takes place in
the third rc.boot phase, when the key switch is in service position, or a service boot
has been issued on a PCI system.
Sequence number
Each configuration method has an associated sequence number. When executing the
methods for a particular phase, cfgmgr sorts the methods based on the sequence
number. The methods are then invoked, one by one, starting with the smallest
sequence number. Methods with a sequence number of zero are invoked last, after
those with non-zero sequence numbers.
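The ordering rule can be sketched with awk and sort. This minimal illustration assumes input lines in the "phase seq boot_mask rule" layout of the table above, for example text you have extracted yourself from the Config_Rules class. order_config_rules is a hypothetical helper, and it ignores boot_mask. In this sketch, ties between equal sequence numbers are broken by the rule text, which may not match what cfgmgr does.

```shell
#!/bin/sh
# order_config_rules PHASE
# Hypothetical helper. Reads "phase seq boot_mask rule..." lines on stdin and
# prints the rules of the given phase in invocation order: ascending non-zero
# sequence numbers first, then the rules whose sequence number is zero.
order_config_rules() {
    awk -v phase="$1" '
        $1 == phase {
            last = ($2 == 0) ? 1 : 0        # zero sorts after all non-zero values
            rule = $4
            for (i = 5; i <= NF; i++)       # keep rules that contain blanks
                rule = rule " " $i
            print last, $2, rule
        }' |
    sort -k1,1n -k2,2n |
    cut -d' ' -f3-
}
```

For example, feeding the phase 2 lines of the table to order_config_rules 2 prints defsys, deflvm, ptynode, and startlft in that order. A rule with sequence number 0 would be printed after all of them.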
Boot mask
Each configuration method has an associated boot mask:
- If boot_mask is zero, the rule applies to all types of boot.
- If boot_mask is non-zero, the rule applies only to the boot type specified.
For example, a rule with boot_mask = DISK_BOOT is used only for boots from disk,
whereas a rule with NETWORK_BOOT applies only when booting over the network.
# alog -t boot -o
-------------------------------------------------------
attempting to configure device 'sys0'
invoking /usr/lib/methods/cfgsys_rspc -l sys0
return code = 0
******* stdout *******
bus0
******* no stderr *****
-------------------------------------------------------
attempting to configure device 'bus0'
invoking /usr/lib/methods/cfgbus_pci bus0
return code = 0
******** stdout *******
bus1, scsi0
****** no stderr ******
-------------------------------------------------------
attempting to configure device 'bus1'
invoking /usr/lib/methods/cfgbus_isa bus1
return code = 0
******** stdout ******
fda0, ppa0, sa0, sioka0, kbd0
****** no stderr *****
Figure 6-15. cfgmgr output in the boot log using alog AN151.0
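When a boot hangs in a configuration phase, it helps to scan the boot log for methods that returned a non-zero code. The following minimal sketch assumes the log format shown above: an "attempting to configure device 'name'" line followed by a "return code = N" line. summarize_cfg_log is a hypothetical helper, not an AIX command.

```shell
#!/bin/sh
# summarize_cfg_log
# Hypothetical helper. Reads cfgmgr output (for example, from alog -t boot -o)
# on stdin and prints one "device return_code" line per configuration attempt.
summarize_cfg_log() {
    awk -F"'" '
        /attempting to configure device/ { dev = $2 }   # name between the quotes
        /return code =/ {
            rc = $0
            sub(/.*= */, "", rc)                         # keep only the number
            print dev, rc
        }'
}
```

On AIX, alog -t boot -o | summarize_cfg_log | awk '$2 != 0' would list only the devices whose configuration method returned a non-zero code.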
Notes:
/etc/inittab file
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console #
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunab
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
rcemgr:23456789:once:/usr/sbin/emgr -B > /dev/null 2>&1
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # ru
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
mkcifs_fs:2:wait:/etc/mkcifs_fs > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
cons:0123456789:respawn:/usr/sbin/getty /dev/console
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability
Notes:
Purpose of /etc/inittab
The /etc/inittab file supplies information for the init process. Note how the
rc.boot script is executed from the inittab file to configure all remaining devices in
the boot process.
Modifying /etc/inittab
Do not use an editor to change the /etc/inittab file. One small mistake in
/etc/inittab, and your machine will not boot. Instead, use the mkitab, chitab, and
rmitab commands to edit /etc/inittab. The advantage of these commands is that they
check the syntax of each entry, which protects /etc/inittab from corruption. If your
machine stops booting with LED 553, this usually indicates a bad /etc/inittab file.
Viewing /etc/inittab
The lsitab command can be used to view the /etc/inittab file. For example:
# lsitab dt
dt:2:wait:/etc/rc.dt
If you issue lsitab -a, the complete /etc/inittab file is shown.
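The mkitab, chitab, and rmitab commands check the entry syntax for you. To show what such a format check involves, the following is a minimal POSIX shell sketch of a hypothetical pre-check for the Identifier:RunLevel:Action:Command layout. It is not part of AIX, it does not validate run levels, and it assumes that only initdefault entries may leave the command empty. The accepted action keywords are those documented for the AIX inittab file; check the inittab documentation for your AIX level.

```shell
#!/bin/sh
# check_inittab_entry ENTRY
# Hypothetical helper: returns 0 if ENTRY looks like
# Identifier:RunLevel:Action:Command, 1 otherwise.
check_inittab_entry() {
    entry=$1
    case $entry in
        *:*:*:*) ;;                  # need at least three field separators
        *) return 1 ;;
    esac
    id=${entry%%:*}                  # Identifier: text before the first colon
    rest=${entry#*:}
    rest=${rest#*:}                  # skip RunLevel (not validated in this sketch)
    action=${rest%%:*}               # Action: text before the third colon
    command=${rest#*:}               # Command: everything after the third colon
    [ -n "$id" ] || return 1
    case $action in
        respawn|wait|once|boot|bootwait|powerfail|powerwait|off|hold|ondemand|initdefault|sysinit) ;;
        *) return 1 ;;
    esac
    case $command in
        :*) return 1 ;;              # stray extra colon before the command
    esac
    if [ -z "$command" ] && [ "$action" != initdefault ]; then
        return 1                     # assumption: only initdefault omits the command
    fi
    return 0
}
```

For example, check_inittab_entry 'brc::sysinit:/sbin/rc.boot 3' succeeds. An entry with an extra colon before the command, or with an unknown action keyword, is rejected.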
Instructor notes:
Purpose — Describe the /etc/inittab file and some important commands to view and
manipulate this file.
Details — Show that rc.boot is executed from /etc/inittab. Explain that it is risky to
edit the /etc/inittab file directly and that it is always better to use the commands
described in the notes.
Additional information — Point out that a corrupted /etc/inittab file is indicated by
LED 553. The students will see this in their exercise.
The mkitab, chitab, and rmitab commands provide automatic syntax checking. The line
must match the proper format for /etc/inittab.
The mkitab command has a -i option that inserts the new line after the entry with the
specified identifier, rather than at the end of /etc/inittab. Without -i, the line is
appended to the end of the file.
Transition statement — Let’s describe the basics for system hang detection.
Problem                  LED          Action
Boot logical volume or   20EE000B     Access the rootvg. Re-create the BLV:
boot record corrupt?                  # bosboot -ad /dev/hdiskx
JFS/JFS2 log corrupt?    551, 552,    Access rootvg before mounting the rootvg file
                         554, 555,    systems. Re-create the JFS/JFS2 log:
                         556, 557     # logform -V jfs /dev/hd8 or
                                      # logform -V jfs2 /dev/hd8
                                      Run fsck afterwards.
Superblock corrupt?      552, 554,    Run fsck against all rootvg file systems. If fsck
                         556          indicates errors (not an AIX file system), repair
                                      the superblock as described in the notes.
rootvg locked?           551          Access rootvg and unlock the rootvg:
                                      # chvg -u rootvg
ODM files missing?       523 - 534    ODM files are missing or inaccessible. Restore
                                      the missing files from a system backup.
Mount of /usr or /var    518          Check /etc/filesystems. Check the network
failed?                               (remote mount), file systems (fsck), and hardware.
Notes:
Introduction
The visual shows some common boot errors that might happen during the AIX software
boot process.
Bootlist wrong?
If the bootlist is wrong, the system cannot boot. This is easy to fix: boot into SMS and select the correct boot device. Keep in mind that only hard disks with boot records are shown as selectable boot devices.
rootvg locked?
Many LVM commands place a lock into the ODM to prevent other commands from
working at the same time. If a lock remains in the ODM due to a crash of a command,
this may lead to a hanging system.
To unlock the rootvg, boot into maintenance mode and access the rootvg with its file systems mounted. Issue the following command to unlock the rootvg:
# chvg -u rootvg
Instructor notes:
Purpose — Describe some common causes of boot problems.
Details — Describe as explained in the student notes. Describe the meaning of 553 and
557 as they are part of the exercise.
Additional information —
Transition statement — Let’s review the /etc/inittab file which was described in the
basic administration course.
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3
rc:2:wait:/etc/rc
fbcheck:2:wait:/usr/sbin/fbcheck
srcmstr:2:respawn:/usr/sbin/srcmstr
cron:2:respawn:/usr/sbin/cron
rctcpip:2:wait:/etc/rc.tcpip
rcnfs:2:wait::/etc/rc.nfs
qdaemon:2:wait:/usr/bin/startsrc -sqdaemon
dt:2:wait:/etc/rc.dt
tty0:2:off:/usr/sbin/getty /dev/tty1
myid:2:once:/usr/local/bin/errlog.check
Notes:
Instructions
Answer the following questions as they relate to the /etc/inittab file shown in the
visual:
1. Which process is started by the init process only one time?
The init process does not wait for this process to end.
4. Which line determines that multiuser mode is the initial run level of the system?
11. Which line takes care of varying on the volume groups, activating paging spaces,
and mounting file systems that are to be activated during boot?
Additional information —
1. myid line is started only one time
The action once tells the init process to start the process and not to wait for it to end. When the process ends, it is not restarted.
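To answer a question like number 1 on a live system, you can filter the inittab records by their action field. This is a minimal sketch, assuming the file (or lsitab -a output) is read on standard input. The helper name is hypothetical. Lines with an empty identifier, such as comment lines that begin with a colon, are skipped.

```sh
# Print the identifier of every inittab record whose action field
# (the third colon-separated field) equals the first argument.
entries_with_action() {
    awk -F: -v want="$1" '$1 != "" && $3 == want { print $1 }'
}

printf '%s\n' \
    'init:2:initdefault:' \
    'cron:2:respawn:/usr/sbin/cron' \
    'myid:2:once:/usr/local/bin/errlog.check' |
    entries_with_action once
# prints: myid
```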
2. qdaemon
The qdaemon controls the queueing subsystem in AIX. It manages jobs in queues and
their assignment to the different queues in the system.
Checkpoint
IBM Power Systems
Notes:
Instructor notes:
Purpose — Review and test the students' understanding of this unit.
Details — A suggested approach is to give the students about five minutes to answer the
questions on this page. Then, go over the questions and answers with the class.
Checkpoint solutions
Additional information —
Transition statement — Now, let’s do an exercise.
Notes:
Introduction
This exercise can be found in your Student Exercise Guide.
Instructor notes:
Purpose — Prepare the students for the lab.
Details —
Additional information —
Transition statement — Let’s summarize.
Unit summary
Notes:
• After the boot image is loaded into RAM, the rc.boot script is executed three times to
configure the system.
• During rc.boot 1, the devices needed to vary on the rootvg are configured.
• During rc.boot 2, the rootvg is varied on.
• In rc.boot 3, the remaining devices are configured.
• Processes defined in the /etc/inittab file are initiated by the init process.
Instructor notes:
Purpose — Summarize the unit.
Details — Present the highlights from the unit.
Additional information —
Transition statement — Let’s continue with the next unit.
References
Online AIX Version 6.1 Command Reference volumes 1-6
Online AIX Version 6.1 Operating system and device management
Note: References listed as “online” above are available at the following address:
http://publib.boulder.ibm.com/infocenter/systems
GG24-4484-00 AIX Storage Management (Redbook)
SG24-5432-00 AIX Logical Volume Manager from A to Z: Introduction and Concepts (Redbook)
SG24-5433-00 AIX Logical Volume Manager from A to Z: Troubleshooting and Commands (Redbook)
Unit objectives
Notes:
[Figure: Physical volumes, physical partitions, logical partitions, a logical volume, and a volume group]
Notes:
Introduction
This visual and the associated student notes will provide a review of basic LVM terms.
physical partitions per physical volume for scalable volume groups, although there is currently a limit of 2 M physical partitions for the entire volume group.
Instructor notes:
Purpose — Introduce some basic LVM terms.
Details — Use the student notes to guide your presentation.
Additional information — If no PP size is specified when creating the VG, the mkvg
command attempts to figure out an appropriate PP size based on the disks in the volume
group.
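The mkvg documentation describes the default PP size as the lowest value that keeps each disk within 1016 PPs per PV for standard volume groups (2040 for scalable ones). The following sketch is a rough illustration of that choice and computes the value for one disk size in MB. The function name is hypothetical. The real command may apply further rules (for example, the -t factor), so treat the result as an estimate.

```sh
# Estimate: smallest power-of-two PP size (MB) such that the number of
# whole PPs on the disk does not exceed the per-PV limit.
# Assumed limits: 1016 (standard VG, default) or 2040 (scalable VG).
default_pp_size_mb() {
    disk_mb=$1
    max_pps=${2:-1016}
    pp=1
    while [ $((disk_mb / pp)) -gt "$max_pps" ]; do
        pp=$((pp * 2))
    done
    echo "$pp"
}

default_pp_size_mb 70006        # a disk of about 70 GB, standard VG: 128
default_pp_size_mb 70006 2040   # the same disk, scalable VG: 64
```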
Transition statement — Let’s look at the unique identifiers used by LVM for the volume
groups, logical volumes, and physical volumes.
LVM identifiers
# lsvg rootvg
... VG IDENTIFIER: 00c35ba000004c00000001157f54bf78     (VGID: 32 bytes long)
# lspv
hdisk0     00c35ba07b2e24f0     rootvg     active       (PVID: 32 bytes long, 16 are shown)
...
# lslv hd4
LOGICAL VOLUME: hd4     VOLUME GROUP: rootvg
LV IDENTIFIER: 00c35ba000004c00000001157f54bf78.4 ...     (LVID: VGID.minor number)
...
# uname -m
00C35BA04C00
Notes:
Use of identifiers
The LVM uses identifiers for disks, volume groups, and logical volumes. As volume
groups could be exported and imported between systems, these identifiers must be
unique worldwide.
AIX generated identifiers are based on the CPU ID of the creating host and a
timestamp.
Disk identifiers
Hard disk identifiers have a length of 32 bytes, but currently the last 16 bytes are
unused and are all set to 0 in the ODM. Notice that, as shown on the visual, only the
first 16 bytes of this identifier are displayed in the output of the lspv command.
In a SAN environment, path management needs a method for determining that a disk discovered over two different paths is actually the same disk. In an AIX environment, some storage solutions use the PVID for this purpose. Others use an IEEE volume identifier (ieee_volname) or a unique device identifier, UDID (unique_id). Each of these is stored as an attribute of the disk in the ODM.
The PVID attribute is set the first time a disk is assigned to a volume group.
If you ever have to manually update the disk identifiers in the ODM, do not forget to add
16 zeros to the physical volume ID.
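The identifier formats above are easy to reproduce. The following sketch uses hypothetical helper names. It pads a 16-digit PVID, as shown by lspv, to the 32-digit form stored in the ODM, and it builds or splits an LVID of the form VGID.minor_number.

```sh
# lspv shows 16 hex digits; the ODM stores 32, padded with 16 zeros.
pvid_to_odm() {
    case $1 in
        ????????????????) printf '%s0000000000000000\n' "$1" ;;
        *) echo "expected a 16-digit PVID: $1" >&2; return 1 ;;
    esac
}

# An LVID is the VGID, a dot, and the minor number of the LV.
make_lvid() {
    printf '%s.%s\n' "$1" "$2"
}

# Recover the minor number from an LVID.
lvid_minor() {
    printf '%s\n' "${1##*.}"
}

pvid_to_odm 00c35ba07b2e24f0     # the PVID followed by 16 zeros
make_lvid 00c35ba000004c00000001157f54bf78 4
```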
Notes:
instead is kept in the same reserved disk area as the VGDA. Also, the administrator of a big VG can use the -T O option of the mklv command to request that the LVCB not be stored in the beginning of the LV.
LVCB-related considerations
For standard VGs, the LVCB resides in the first block of the user data within the LV. Big
VGs keep additional LVCB information in the VGDA. The LVCB structure on the first LV
user block and the LVCB structure within the VGDA are similar but not identical. If a logical volume in a big VG was created with the -T O option of the mklv command, no LVCB occupies the first block of that LV. With scalable VGs, logical volume control information is no longer
stored on the first user block of any LV. Therefore, no precautions have to be taken
when using raw logical volumes, because there is no longer a need to preserve the
information held by the first 512 bytes of the logical device.
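For standard VGs, applications that write raw data to a logical volume must therefore leave the first 512-byte block alone. One way to do that from the shell is to let dd seek past the first output block. This is a sketch under assumptions: the helper name is hypothetical, and on AIX the target would be the raw LV special file (for example /dev/rLVname, a placeholder name).

```sh
# Write the contents of a source file to a raw LV (or any file), starting
# at byte offset 512, so that the first block, where a standard VG keeps
# the LVCB, is not overwritten.
write_after_lvcb() {
    src=$1
    dest=$2
    # seek=1 skips one 512-byte output block; conv=notrunc keeps the
    # rest of the target intact when it is a regular file.
    dd if="$src" of="$dest" bs=512 seek=1 conv=notrunc 2>/dev/null
}
```

Usage: write_after_lvcb data.img /dev/rLVname (both names are placeholders).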
Instructor notes:
Purpose — Introduce the disk control blocks.
Details — Explain using the information in the student notes.
Additional information — None
Transition statement — Let’s see which other locations are used to store LVM data.
• AIX files
– /etc/vg/vgVGID Handle to the VGDA copy in memory
– /dev/hdiskX Special file for a disk
– /dev/VGname Special file for administrative access to a VG
– /dev/LVname Special file for a logical volume
– /etc/filesystems Used by the mount command to associate
LV name, file system log, and mount point
Notes:
Instructor notes:
Purpose — Describe where LVM data is stored.
Details — Explain using the information in the student notes. Keep this on an overview
level.
Additional information — None
Transition statement — Let's see what’s stored in the VGDA.
Notes:
Introduction
The table on the visual shows the contents of the VGDA. The individual items listed are
discussed in the paragraphs that follow.
Time stamps
The time stamps are used to check if a VGDA is valid. If the system crashes while
changing the VGDA, the time stamps will differ. The next time the volume group is
varied on, this VGDA is marked as invalid. The latest intact VGDA will then be used to
overwrite the other VGDAs in the volume group.
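The selection rule can be modeled in a few lines. This is a simplified model, not the LVM implementation. Each input line describes one VGDA copy as a name followed by its beginning and ending time stamps. A copy is treated as intact when both stamps match, and the newest intact copy is chosen. The sketch assumes fixed-width lowercase hexadecimal time stamps, such as the one shown in CuAt later in this unit, so that a string comparison orders them correctly. The function name is hypothetical.

```sh
# Simplified model of choosing a VGDA copy (not the real LVM code).
# Input lines: "name begin_timestamp end_timestamp".
newest_intact_vgda() {
    awk '
        # Force string comparison: stamps such as "...1e10" could
        # otherwise be treated as numbers by awk.
        ($2 "") == ($3 "") && ($2 "") > (best "") { best = $2; name = $1 }
        END {
            if (name == "") exit 1
            print name
        }'
}

printf '%s\n' \
    'hdisk0 470a1bc9243ed693 470a1bc9243ed693' \
    'hdisk1 470a1bd000000001 470a1bd000000001' \
    'hdisk2 470a1bd100000000 470a1bc9243ed693' |
    newest_intact_vgda
# prints: hdisk1 (the stamps of hdisk2 differ, so it is not intact)
```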
VGDA example
5: ____________
Logical:
00c35ba000004c00000001157fcf6bdf.1 lv00 1
00c35ba000004c00000001157fcf6bdf.2 lv01 1
00c35ba000004c00000001157fcf6bdf.3 lv02 1
Physical: 00c35ba07fcf6b93 2 0
6: ____________ 7: ____________
Notes:
b. 2 VGDAs in VG
c. 3 LVs in VG
d. PP size = 2^20 (2 to the 20th power) bytes, or 1 MB (for this volume group)
e. LVIDs (VGID.minor_number)
f. 1 PV in VG
g. PVIDs
SNAPSHOT VG: 0
IS_PRIMARY VG: 0
PSNFSTPP: 4352
VARYON MODE: 0
VG Type: 0
Max PPs: 32512
Notes:
Example on visual
In the example on the visual, the getlvcb command is used to obtain information from
the logical volume hd2. The information displayed includes the following:
- Intrapolicy, which specifies what strategy should be used for choosing physical
partitions on a physical volume. The five general strategies are edge (sometimes
called outer-edge), inner-edge, middle (sometimes called outer-middle),
inner-middle, and center (c = Center).
- Number of copies (1 = No mirroring)
- Interpolicy, which specifies the number of physical volumes to extend across (m = Minimum).
- LVID
- LV name (hd2)
- Number of logical partitions (103)
- Can the partitions be reorganized? (relocatable = y)
- Each mirror copy on a separate disk (strict = y)
- Number of disks involved in striping (stripe width)
- Stripe size
- Logical volume type (type = jfs)
- JFS file system information
- Creation and last update time
Instructor notes:
Purpose — Describe the LVCB.
Details — Explain that the LVCB stores LV attributes. Do not explain each attribute shown;
just do a short overview of the LVCB.
Additional information — If your logical volume interpolicy is set to maximum, the
getlvcb command will show interpolicy = x.
The values for intrapolicy are:
ie inner edge
im inner middle
c center
m outer middle
e outer edge
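When reading getlvcb output, the policy codes above can be translated with a small lookup. This is a sketch with hypothetical function names that covers only the intrapolicy values listed above and the interpolicy values m (minimum) and x (maximum) mentioned in these notes.

```sh
# Translate getlvcb intrapolicy codes (values listed above) into names.
intrapolicy_name() {
    case $1 in
        ie) echo "inner edge" ;;
        im) echo "inner middle" ;;
        c)  echo "center" ;;
        m)  echo "outer middle" ;;
        e)  echo "outer edge" ;;
        *)  echo "unknown intrapolicy: $1" >&2; return 1 ;;
    esac
}

# Translate getlvcb interpolicy codes into names.
interpolicy_name() {
    case $1 in
        m) echo "minimum" ;;
        x) echo "maximum" ;;
        *) echo "unknown interpolicy: $1" >&2; return 1 ;;
    esac
}

intrapolicy_name c      # center
interpolicy_name m      # minimum
```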
Transition statement — Let’s identify how LVM uses the ODM and the VGDA/LVCB.
[Figure: How high-level commands (mkvg, extendvg, mklv, crfs, chfs, rmlv, reducevg, exportvg, ...) interact with the ODM and the VGDA]
Figure 7-9. How LVM interacts with ODM and VGDA AN151.0
Notes:
High-level commands
Most of the LVM commands that are used when working with volume groups, physical,
or logical volumes are high-level commands. These high-level commands (like mkvg,
extendvg, mklv, and others listed on the visual) are implemented as shell scripts and
use names to reference a certain LVM object. The ODM is consulted to match a name,
for example, rootvg or hdisk0, to an identifier.
end up in a situation where the VGDA/LVCB and the ODM are not in sync. The same
situation may occur when low-level commands are used incorrectly.
CuDv:
name = "hdisk2"
status = 1
chgstatus = 0
ddins = "scdisk"
location = "01-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
PdDvLn = "disk/scsi/scsd"
Notes:
Key attributes
Remember the most important attributes:
- status = 1 means the disk is available
- chgstatus = 2 means the status has not changed since last reboot
- location specifies the location code of the device
- parent specifies the parent device
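The codes in the status and chgstatus fields can be decoded with a small lookup. In this sketch, the meanings of status = 1 and chgstatus = 2 come from the notes above. The other values (0 = defined and 2 = stopped for status; 0 = new, 1 = don't care, and 3 = missing for chgstatus) reflect the AIX documentation of the CuDv object class and should be checked against your release. The function names are hypothetical.

```sh
# Decode the numeric status and chgstatus fields of a CuDv object.
cudv_status_name() {
    case $1 in
        0) echo "defined" ;;
        1) echo "available" ;;
        2) echo "stopped" ;;
        *) echo "unknown status: $1" >&2; return 1 ;;
    esac
}

cudv_chgstatus_name() {
    case $1 in
        0) echo "new" ;;
        1) echo "don't care" ;;
        2) echo "same" ;;
        3) echo "missing" ;;
        *) echo "unknown chgstatus: $1" >&2; return 1 ;;
    esac
}

cudv_status_name 1        # available (hdisk2 in the sample above)
cudv_chgstatus_name 0     # new (hdisk2 in the sample above)
```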
Instructor notes:
Purpose — Explain that information about all disks is stored in the CuDv object class.
Details — Use the student notes to guide your explanation.
Additional information — None
Transition statement — Let’s look at CuAt.
Notes:
Instructor notes:
Purpose — Explain that the PVID is stored in CuAt.
Details — Use the student notes to guide your explanation.
Additional information — None
Transition statement — Let’s look at CuDvDr.
CuDvDr:
resource = "devno"
value1 = "36"
value2 = "0"
value3 = "hdisk3"
# ls -l /dev/hdisk[03]
brw------- 1 root system 17, 0 Oct 08 06:17 /dev/hdisk0
brw------- 1 root system 36, 0 Oct 08 09:19 /dev/hdisk3
Notes:
Special files
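Applications or system programs use the special files to access a certain device. For example, the visual shows the special files used to access hdisk0 (/dev/hdisk0) and hdisk3 (/dev/hdisk3). The CuDvDr object for hdisk3 holds its major number (value1 = 36) and minor number (value2 = 0), which match the 36, 0 shown in the ls -l output.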
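To compare the major and minor numbers in an ls -l listing with the value1 and value2 fields of the matching CuDvDr object, you can extract them from the listing. The following sketch, with a hypothetical function name, handles one ls -l line. It assumes the usual column order (permissions, link count, owner, group, then the device numbers) and accepts both the "36, 0" and "10,5" spellings that appear in this unit.

```sh
# Print "major minor" for one ls -l line of a device special file.
devno_of() {
    printf '%s\n' "$1" | awk '{
        n = split($5, part, ",")
        if (n == 2 && part[2] != "") print part[1], part[2]   # "10,5"
        else print part[1], $6                                # "36, 0"
    }'
}

devno_of 'brw------- 1 root system 36, 0 Oct 08 09:19 /dev/hdisk3'
# prints: 36 0  (compare with value1 and value2 in CuDvDr)
```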
Instructor notes:
Purpose — Explain that major and minor numbers are stored in CuDvDr.
Details — Explain that this ODM class is used to build the special files in /dev.
If it seems appropriate for the particular group of students you are teaching, you might point out the major number (17) and minor number (0) of hdisk0 on the visual and then ask the students what the major and minor numbers of hdisk3 are (36 and 0).
Additional information — None
Transition statement — Let’s see how volume group information is stored in the ODM.
Notes:
VGID
One of the most important pieces of information about a volume group is the VGID. As
shown on the visual, this information is stored in CuAt.
Instructor notes:
Purpose — Describe how volume group information is stored in CuDv and CuAt.
Details — Use the student notes to guide your explanation.
Additional information — None
Transition statement — The CuAt output continues on the next page.
CuAt:
name = "rootvg"
attribute = "timestamp"
value = "470a1bc9243ed693"
type = "R"
generic = "DU"
rep = "s"
nls_index = 0
CuAt:
name = "rootvg"
attribute = "pv"
value = "00c35ba07b2e24f00000000000000000"
type = "R"
generic = ""
rep = "sl"
nls_index = 0
Notes:
Length of PVID
Remember that the PVID is stored as a 32-digit field, in which the last 16 digits are zeros.
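On AIX, stanzas like these are typically produced by a query such as odmget -q "name=rootvg" CuAt. The following sketch, with a hypothetical function name, reads stanzas in that format and prints the value of each object whose attribute matches. For example, it lists every pv entry of a volume group. Its parsing rules are assumptions based on the sample output shown above.

```sh
# Print the value of every CuAt object (odmget stanza format, read on
# standard input) whose attribute field equals the first argument.
cuat_values() {
    awk -v want="$1" '
        function unquote(s) {
            sub(/^[^"]*"/, "", s)     # drop everything up to the first quote
            sub(/"[^"]*$/, "", s)     # drop the closing quote
            return s
        }
        function emit() {
            if (attr == want && have_val) print val
            attr = ""; val = ""; have_val = 0
        }
        /^[A-Za-z]+: *$/  { emit(); next }    # "CuAt:" starts a new stanza
        $1 == "attribute" { attr = unquote($0) }
        $1 == "value"     { val = unquote($0); have_val = 1 }
        END               { emit() }
    '
}

cuat_values pv <<'EOF'
CuAt:
        name = "rootvg"
        attribute = "pv"
        value = "00c35ba07b2e24f00000000000000000"
EOF
```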
Instructor notes:
Purpose — Describe additional objects for volume groups in CuAt.
Details — Use the student notes to guide your explanation.
Emphasize that PVIDs for disks are stored with a length of 32 bytes.
Ensure the students understand that a CuAt object is created for each disk in a volume
group. For example, if there were two physical volumes in rootvg, there would be two
entries with name = "rootvg" and attribute = "pv" in CuAt.
Additional information —
Transition statement — Let’s consider logical volumes.
Notes:
Instructor notes:
Purpose — Explain how logical volume data is stored in the ODM.
Details — Use the student notes to guide your explanation.
Additional information — Remind the students that the LVID is created from the VGID
and the minor number of the special file entry of the logical volume.
Transition statement — The CuDvDr and CuDep object classes also contain logical
volume data.
# ls -l /dev/hd2
brw------- 1 root system 10,5 08 Jan 06:56 /dev/hd2
Notes:
CuDvDr logical volume objects
Each logical volume has an object in CuDvDr that is used to create the special file entry
for that logical volume in /dev. As an example, the sample output on the visual shows
the CuDvDr object for hd2 and the corresponding /dev/hd2 (major number 10, minor
number 5) special file entry in the /dev directory.
Instructor notes:
Purpose — Continue the explanation of where logical volume data is stored in the ODM.
Details — Explain logical volume objects in CuDvDr and CuDep.
Additional information — None
Transition statement — What are reasons for ODM-related LVM problems?
[Figure: High-level commands, which use a signal handler and a lock, update both the VGDA/LVCB and the ODM]
Notes:
Causes of problems
The signal handlers used by high-level LVM commands do not work with a kill -9, a
system shutdown, or a system crash. You might end up in a situation where the VGDA
has been updated, but the change has not been stored in the ODM.
Problems might also occur because of the improper use of low-level commands or
hardware changes that are not followed by correct administrator actions.
Another common cause of ODM corruption is performing LVM operations while the root file system (which contains /etc/objrepos) is full. Always check the free space in the root file system before attempting LVM recovery operations.
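A quick pre-check along these lines can be scripted. This sketch assumes a df that supports the POSIX -P and -k options, with the available space in the fourth column. The function names and the default threshold of 10240 KB are hypothetical example values, not IBM recommendations.

```sh
# Print the free space, in KB, of the file system that contains the
# given directory (default: /, which holds /etc/objrepos).
free_kb() {
    df -P -k "${1:-/}" | awk 'NR == 2 { print $4 }'
}

# Succeed only if / has at least the given number of KB free.
check_root_space() {
    min_kb=${1:-10240}
    avail=$(free_kb /)
    if [ -z "$avail" ] || [ "$avail" -lt "$min_kb" ]; then
        echo "Only ${avail:-unknown} KB free in /; free up space first." >&2
        return 1
    fi
    echo "${avail} KB free in /"
}

check_root_space 10240 || echo "Do not start the LVM recovery yet."
```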
If the ODM problem is not in the rootvg, for example in volume group
homevg, do the following:
# varyoffvg homevg
Notes:
3. In the last step, you import the volume group by using the importvg command. Specify the volume group name with the -y option; otherwise, AIX generates a new volume group name.
You need to specify only one intact physical volume of the volume group that you import. The importvg command reads the VGDA and LVCB on that disk and creates completely new ODM objects.
Note that this procedure makes the data unavailable during the repair, even if the file systems are still mounted and accessible despite the problem: all logical volumes must be closed before the volume group can be varied offline.
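The complete sequence for a non-rootvg volume group can be wrapped in a small script. This is a sketch, not an IBM-supplied tool. The function names are hypothetical, and the disk name is a placeholder for any intact disk of the volume group. Setting DRY_RUN=1 only prints the commands so the sequence can be reviewed first. Unmount the file systems of the volume group before running it, because varyoffvg fails while logical volumes are open.

```sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rebuild the ODM objects of a non-rootvg volume group from the VGDA and
# LVCB on one intact disk: vary it off, export it, and import it again.
rebuild_vg_odm() {
    vg=$1
    pv=$2
    if [ "$vg" = rootvg ]; then
        echo "rootvg cannot be varied off or exported" >&2
        return 1
    fi
    run varyoffvg "$vg" || return 1
    run exportvg "$vg" || return 1
    run importvg -y "$vg" "$pv" || return 1
}

DRY_RUN=1 rebuild_vg_odm homevg hdisk3
```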
Instructor notes:
Purpose — Describe how to fix ODM problems in non-rootvg volume groups.
Details — Explain the student material.
Additional information — None
Transition statement — Let’s discuss how to fix ODM problems in rootvg.
    odmdelete -q "value3=$LVname" -o CuDvDr
done
odmdelete -q "name=$VG" -o CuAt
odmdelete -q "parent=$VG" -o CuDv
odmdelete -q "name=$VG" -o CuDv
odmdelete -q "name=$VG" -o CuDep
odmdelete -q "dependency=$VG" -o CuDep
odmdelete -q "value1=10" -o CuDvDr
odmdelete -q "value3=$VG" -o CuDvDr
importvg -y $VG $PV # ignore lvaryoffvg errors
varyonvg $VG

• Uses odmdelete to “export” rootvg
• Uses importvg to import rootvg
Notes:
Problems in rootvg
For ODM problems in rootvg, finding a solution is more difficult because rootvg cannot
be varied off or exported. However, it may be possible to fix the problem using one of
the techniques described below.
After deleting all ODM objects for rootvg, the rvgrecover script imports rootvg by reading the VGDA and LVCB from the boot disk. This results in completely new ODM objects that describe your rootvg.
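Because an rvgrecover-style repair deletes objects from CuAt, CuDep, CuDv, and CuDvDr, it is prudent to save copies of those object classes first. This is a minimal sketch with a hypothetical function name. It assumes the classes live in /etc/objrepos (the directory can be overridden), and it also copies a companion .vc file when one exists.

```sh
# Copy the customized object classes touched by the rootvg repair so
# they can be restored if needed. Arguments (both optional): the ODM
# directory and a suffix for the copies (default: the PID of this shell).
backup_cust_odm() {
    dir=${1:-/etc/objrepos}
    suffix=${2:-$$}
    for class in CuAt CuDep CuDv CuDvDr; do
        cp -p "$dir/$class" "$dir/$class.$suffix" || return 1
        if [ -f "$dir/$class.vc" ]; then
            cp -p "$dir/$class.vc" "$dir/$class.vc.$suffix" || return 1
        fi
    done
    echo "ODM classes saved in $dir with suffix .$suffix"
}
```

Usage: backup_cust_odm /etc/objrepos before_repair (the suffix is an example).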
Instructor notes:
Purpose — Describe how to fix ODM problems in rootvg by using the rvgrecover script
and other techniques.
Details — Explain the student material. Ensure students understand that they do not need
to reboot in maintenance mode to fix non-rootvg inconsistencies. Remind them of the
importance of backing up rootvg (if possible) before attempting repair on rootvg.
Additional information — The AIX 4.3 Problem Solving Guide and Reference was
published in 1997.
The man page entries (and corresponding entries in the AIX 6.1 Commands Reference) for
redefinevg and synclvodm are brief but helpful.
Transition statement —
Notes:
Overview
There are situations where you are unable to run the exportvg or importvg commands
because they depend on finding a minimal level of information in the ODM. Even if
these high-level LVM commands can be run, they require that the volume group be taken offline, which would be disruptive. In these situations it is useful to know some intermediate-level LVM commands. These commands are primarily intended to be used by the high-level LVM commands, but they can be useful in solving tough problems.
be active for the resynchronization to occur. If logical volume names are specified, only
the information related to those logical volumes is updated.
The synclvodm command, by itself, can do a fairly complete job of resynchronizing the
ODM with the LVM data areas on the disk. It will also synchronize the information
between the LVM data areas. As such, it can worsen a situation where only one disk in
the volume group has corrupted data areas. The command can be restricted to
synchronizing only specific logical volumes. Otherwise, it synchronizes all logical
volumes. The synclvodm command depends upon a minimal amount of information in
the ODM; most importantly, the ODM needs to know the volume group name plus the
physical volume and logical volume memberships.
Figure 7-21. Exercise 7: LVM metadata and problems (parts 1 and 2) AN151.0
Notes:
Mirroring
[Figure: A mirrored logical volume whose logical partitions are mapped to physical partitions on hdisk0, hdisk1, and hdisk2]
Notes:
Role of VGSA
The mapping of logical partitions to physical partitions is kept in the VGDA. The VGSA, which is contained on each disk, records whether each of those mirrored physical partitions is current or stale. In the example shown on the visual, logical partition 5 points to physical partition 5 on hdisk0, physical partition 8 on hdisk1, and physical partition 9 on hdisk2.
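The mapping in the example can be modeled to show why mirroring helps: a read of logical partition 5 can be served by any copy that is not stale. This is a simplified model, not how LVM actually schedules reads. The disk:PP notation and the function name are hypothetical.

```sh
# Simplified model: print the first mirror copy of a logical partition
# that is not marked stale. Copies and stale entries are "disk:pp" words.
first_fresh_copy() {
    copies=$1
    stale=$2
    for copy in $copies; do
        case " $stale " in
            *" $copy "*) continue ;;   # this copy is stale, skip it
        esac
        echo "$copy"
        return 0
    done
    echo "no fresh copy available" >&2
    return 1
}

# Logical partition 5 from the visual, with the copy on hdisk0 stale.
first_fresh_copy "hdisk0:5 hdisk1:8 hdisk2:9" "hdisk0:5"
# prints: hdisk1:8
```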
Instructor notes:
Purpose — Review the concept of mirroring.
Details — This is a review of concepts that were covered in the prerequisite course.
Mirroring is being discussed to support the discussion of failed disks and why we turn off
quorum checking. Use the student notes to guide your explanation.
Additional information — None
Transition statement — Let’s describe what stale partitions are.
Stale partitions
[Figure: A mirrored logical volume with copies on hdisk0 and hdisk1]
Notes:
Mirroring rootvg
[Figure: rootvg mirrored across hdisk0 and hdisk1]
1. extendvg
2. chvg -Qn
3. mirrorvg -s
4. syncvg -v
5. bosboot -a
6. bootlist
7. shutdown -Fr
8. bootinfo -b
Notes:
- If you use one mirror disk, be sure that a quorum is not required for varyon:
# chvg -Qn rootvg
- Add the mirrors for all rootvg logical volumes:
# mklvcopy hd1 2 hdisk1
# mklvcopy hd2 2 hdisk1
# mklvcopy hd3 2 hdisk1
# mklvcopy hd4 2 hdisk1
# mklvcopy hd5 2 hdisk1
# mklvcopy hd6 2 hdisk1
# mklvcopy hd8 2 hdisk1
# mklvcopy hd9var 2 hdisk1
# mklvcopy hd10opt 2 hdisk1
# mklvcopy hd11admin 2 hdisk1
If you have other logical volumes in your rootvg, be sure to create copies for them
as well.
An alternative to running multiple mklvcopy commands is to use mirrorvg. This
command was added in AIX V4.2 to simplify mirroring VGs. The mirrorvg
command by default will disable quorum and mirror the existing LVs in the specified
VG. To mirror rootvg, use the command:
# mirrorvg -s rootvg
- Now synchronize the new copies you created:
# syncvg -v rootvg
- As we want to be able to boot from different disks, we need to use bosboot:
# bosboot -a
As hd5 is mirrored, there is no need to do it for each disk.
- Update the bootlist. In case of a disk failure, we must be able to boot from different
disks.
# bootlist -m normal hdisk1 hdisk0
# bootlist -m service hdisk1 hdisk0
- Reboot the system
# shutdown -Fr
- Check that the system boots from the first boot disk.
# bootinfo -b
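The steps on the visual can be collected into one script. This is a sketch only. The function name and disk names are placeholders, and DRY_RUN=1 prints the commands instead of running them. The reboot and the bootinfo -b check are left to the administrator. The chvg -Qn step is kept to match the visual, even though mirrorvg also disables quorum by default.

```sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Steps 1 to 6 of the visual: mirror rootvg onto newdisk, next to the
# current boot disk olddisk (both names are placeholders).
mirror_rootvg() {
    newdisk=$1
    olddisk=$2
    run extendvg rootvg "$newdisk" || return 1
    run chvg -Qn rootvg || return 1
    run mirrorvg -s rootvg || return 1
    run syncvg -v rootvg || return 1
    run bosboot -a || return 1
    run bootlist -m normal "$newdisk" "$olddisk" || return 1
    run bootlist -m service "$newdisk" "$olddisk" || return 1
    echo "Now reboot (shutdown -Fr) and check the boot disk with bootinfo -b."
}

DRY_RUN=1 mirror_rootvg hdisk1 hdisk0
```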
Instructor notes:
Purpose — Review how to mirror rootvg.
Details — Again, this is review from the prerequisite course. Use the information in the
student material to guide your presentation. You may wish to ask the students why the
mirroring of the rootvg has extra steps. One of these steps is turning off quorum checking; this can be used as a segue into the next visuals on quorum.
Additional information — When mirroring rootvg, hd6 should be mirrored because the
paging space availability is critical to keeping the system online. hd6 serves both as paging
space and as the default dump device. In AIX V4.3.3 and subsequent releases, there is no problem with mirroring dump devices.
In releases prior to 4.3.3, dump devices did not work correctly if mirrored. On these older
releases, a separate dump device should be created and not mirrored.
Before 4.3.3, if the dump device was mirrored, the dump data was written to only one copy of the mirror. Even though only one copy was updated, no partitions were marked stale. When the machine rebooted, the system would copy the dump data from hd6 to /var/adm/ras (by default). Because LVM considered the mirror to be in sync, it could read the data from either copy of hd6, which could corrupt the copied dump. In 4.3.3 and subsequent releases, it is possible to read a specified copy of a mirror.
Basically, if working with releases prior to 4.3.3, the dump area should be separate and not
mirrored. With 4.3.3 and subsequent releases, it is safe to leave hd6 as the dump device
and mirror it.
Also, with mirrorvg, quorum is turned off by default. Use -Q to leave quorum enabled. The
-s option prevents the sync from occurring. If you use the -s, make sure syncvg is run
eventually to sync the mirrors.
Transition statement — Let’s show another way to mirror the rootvg.
VGDA count
Notes:
Instructor notes:
Purpose — Describe how VGDAs are stored on disks in a volume group and how these
VGDAs are involved in determining whether quorum exists.
Details — Use the information in the student material to guide your presentation.
Additional information — None
Transition statement — Let’s discuss what happens if a quorum is not available.
[Figure: Volume group datavg on hdisk1 and hdisk2]
Notes:
Introduction
What happens if quorum checking is enabled for a volume group and a quorum is not
available?
Consider the following example (illustrated on the visual and discussed in the following
paragraphs): In a two-disk volume group datavg, the disk hdisk1 is not available due to
a hardware defect. hdisk1 is the disk that contains the two VGDAs; that means the
volume group does not have a quorum of VGDAs.
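The quorum rule can be stated as an integer comparison: available VGDAs divided by total VGDAs must exceed one half, that is, 2 * available > total. The following sketch, with a hypothetical function name, applies the rule to the datavg example. It assumes, as in a two-disk standard volume group, that hdisk1 holds two VGDAs and hdisk2 holds one.

```sh
# Quorum check: more than 50% of all VGDAs must be available.
# available/total > 1/2 is written as 2 * available > total to stay
# in integer arithmetic.
has_quorum() {
    available=$1
    total=$2
    [ $((2 * available)) -gt "$total" ]
}

# datavg: hdisk1 (2 VGDAs) has failed, hdisk2 (1 VGDA) remains.
if has_quorum 1 3; then echo "quorum"; else echo "no quorum"; fi
# prints: no quorum
```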
Notes:
Instructor notes:
Purpose — Describe nonquorum volume groups.
Details — Cover the material in the student notes.
Additional information — None
Transition statement — What can you do if a varyonvg fails?
[Figure: Volume group datavg on hdisk1 (marked "removed") and hdisk2]
# varyonvg -f datavg
Failure accessing hdisk1. Set PV STATE to removed.
Volume group datavg is varied on.
Notes:
Quorum checking on
With quorum checking on, you always need more than 50% of the VGDAs to be available (except to vary on rootvg).
[Figure: Physical volume states active, missing, and removed. Transitions shown: varyonvg -f VGName; hardware repair followed by varyonvg VGName; hardware repair followed by chpv -v a hdiskX]
Notes:
Introduction
This page introduces physical volume states (not device states). Physical volume states
can be displayed with lsvg -p VGName.
Active state
If a disk can be accessed during a varyonvg, it gets a PV state of active.
Missing state
If a disk cannot be accessed during a varyonvg, but quorum is available, the failing disk gets the PV state missing. If the disk can be repaired, for example, after a power failure, you just have to issue a varyonvg VGName to bring the disk into the active state again. Any stale partitions will be synchronized.
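To see the PV states of all disks in a volume group at once, you can filter the lsvg -p output. This is a sketch with a hypothetical function name. It assumes the usual layout of that output: a VGName: line, a column header, and then the PV name and PV state in the first two columns of each disk line.

```sh
# Read "lsvg -p VGName" output on standard input and print every
# physical volume whose PV state is not "active".
inactive_pvs() {
    awk 'NR > 2 && NF >= 2 && $2 != "active" { print $1, $2 }'
}
```

Usage: lsvg -p datavg | inactive_pvs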
Instructor notes:
Purpose — Introduce physical volume states.
Details — Use the student notes to guide your presentation. Distinguish between PV
states and device states.
Additional information — None
Transition statement — It is time for a checkpoint.
Checkpoint
Notes:
Instructor notes:
Purpose — Discuss the checkpoint questions.
Details — A “Checkpoint Solution” is given below:
Checkpoint solutions
Exercise 7: LVM metadata and problems
(parts 4 and 5)
Figure 7-31. Exercise 7: LVM Metadata and problems (parts 4 and 5) AN151.0
Notes:
Instructor notes:
Purpose — Prepare the students for the lab.
Details — Use this visual as a transition to the lab. Provide the goals of the lab at this point.
If there is extra time after completing part 4, the students can either work on part 5 or go
back and work on any of the previous optional parts.
Additional information — None
Transition statement — Let’s review some of the key points from this unit.
Unit summary
Notes:
• The LVM information is held in a number of different places on the disk, including the
ODM and the VGDA.
• ODM-related problems can be solved by:
- exportvg/importvg (non-rootvg VGs)
- rvgrecover (rootvg)
- LVM intermediate commands
- Manually fixing using ODM commands
• Quorum means that more than 50% of VGDAs must be available.
• Quorum enforcement should be disabled when dealing with a two-disk mirrored VG.
Instructor notes:
Purpose — Summarize key points from the unit.
Details — Present the highlights from the unit.
Additional information — None
Transition statement — Let’s continue with the next unit.
References
Online AIX Version 6.1 Command Reference volumes 1-6
Online AIX Version 6.1 Operating system and device
management
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/infocenter/systems
GG24-4484 AIX Storage Management (Redbook)
SG24-5432 AIX Logical Volume Manager from A to Z: Introduction
and Concepts (Redbook)
Unit objectives
Notes:
Introduction
This unit presents many disk management procedures that are very important for any
AIX system administrator.
Disk mirrored?
   Yes: Procedure 1
   No: Disk still working?
      Yes: Procedure 2
      No: Volume group lost?
         No: Procedure 3
         Yes, rootvg: Procedure 4
         Yes, not rootvg: Procedure 5
Notes:
Flowchart
Before starting the disk replacement, always follow the flowchart that is shown in the
visual. This will help you whenever you have to replace a disk.
1. If the disk that must be replaced is completely mirrored onto another disk, follow
procedure 1.
2. If a disk is not mirrored, but still works, follow procedure 2.
3. If you are absolutely sure that a disk failed and you are not able to repair the disk, do the following:
- If the volume group can be varied on (normal or forced), use procedure 3.
- If the volume group is totally lost after the disk failure, that is, it cannot be varied on (either normal or forced):
• If the volume group is rootvg, follow procedure 4.
• If the volume group is not rootvg, follow procedure 5.
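The decisions in the flowchart can be written down as a small function, which makes the order of the questions explicit. This is a sketch with a hypothetical function name. Each argument is the answer, yes or no, to one flowchart question.

```sh
# Encode the disk replacement flowchart. Arguments are "yes" or "no":
#   $1 disk mirrored?   $2 disk still working?
#   $3 volume group lost (cannot be varied on, normal or forced)?
#   $4 volume group is rootvg?
choose_procedure() {
    if [ "$1" = yes ]; then
        echo 1
    elif [ "$2" = yes ]; then
        echo 2
    elif [ "$3" = no ]; then
        echo 3
    elif [ "$4" = yes ]; then
        echo 4
    else
        echo 5
    fi
}

choose_procedure no no yes no    # prints 5 (lost non-rootvg volume group)
```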
Instructor notes:
Purpose — Provide considerations before a disk replacement.
Details — Explain as described in the student material.
Additional information — This flowchart is a method to offer disk replacement procedures
for many types of disk failures. It is not guaranteed that 100% of all disk failures are
covered.
A good way to distinguish between the various procedures is to focus on where we recover
the data from:
1. Procedure 1 - We synchronize from a remaining good mirror copy.
2. Procedure 2 - We migrate the data off the suspect disk to the new disk before removing
the suspect disk.
3. Procedure 3 - We recover the data from the filesystem backup(s) (or LV backup
provided by the using application).
4. Procedure 4 - We recover using the mksysb backup of the rootvg.
5. Procedure 5 - We recover using the savevg backup for the non-rootvg.
Transition statement — Let’s start with procedure 1.
Notes:
Disk state
This procedure requires that the disk state of the failed disk be either missing or
removed. Refer to Physical Volume States in Unit 5: Disk Management Theory for more
information on disk states. Use lspv hdiskX to check the state of your physical
volume. If the disk is still in the active state, you cannot remove any copies or logical
volumes from the failing disk. In this case, one way to bring the disk into a removed or missing state is to run the reducevg -d command; another is to vary the volume group off and on again (varyoffvg and varyonvg), for example by rebooting the system.
Disable the quorum check if you have only two disks in your volume group.
Notes:
3. Before executing the next step, it is necessary to distinguish between the rootvg and a non-rootvg volume group.
- If the disk that is replaced is in rootvg, execute the steps that are shown on the visual Procedure 2: Special Steps for rootvg.
- If the disk that is replaced is not in the rootvg, use the migratepv command:
# migratepv hdisk_old hdisk_new
This command moves all logical volumes from one disk to another. You can do this during normal system activity. The migratepv command requires that both disks be in the same volume group.
4. If the old disk has been completely migrated, remove it from the volume group.
Use either the SMIT fastpath smit reducevg or the reducevg command.
5. If you need to remove the disk from the system, remove it from the ODM using
the rmdev command as shown. Finally, remove the physical disk from the
system.
Instructor notes:
Purpose — Explain procedure 2.
Details — Describe the procedure as explained in the student material.
Additional information — Make it clear to the students that step 3 is different for rootvg.
Transition statement — Let’s describe the special considerations for rootvg.
[Figure: Migrating rootvg from hdiskX to hdiskY]
Notes:
If the disk contains the boot logical volume, migrate the logical volume to the
new disk and update the boot logical volume on the new disk. To avoid a
potential boot from the old disk, clear the old boot record by using the
chpv -c command. Then, change your bootlist:
# migratepv -l hd5 hdiskX hdiskY
# bosboot -ad /dev/hdiskY
# chpv -c hdiskX
# bootlist -m normal hdiskY
If the disk contains the primary dump device, you must deactivate the dump
before migrating the corresponding logical volume:
# sysdumpdev -p /dev/sysdumpnull
- Migrate the complete old disk to the new one:
# migratepv hdiskX hdiskY
If the primary dump device has been deactivated, you have to activate it
again:
# sysdumpdev -p /dev/hdX
4. After the disk has been migrated, remove it from the root volume group.
# reducevg rootvg hdiskX
5. If the disk must be removed from the system, remove it from the ODM (use the
rmdev command), shut down AIX, and then remove the disk from the system.
# rmdev -l hdiskX -d
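The rootvg special steps can also be printed as a dry run before they are typed. In this sketch, the optional third argument is the primary dump device that lives on the old disk (the text writes it as /dev/hdX; sysdumpdev -l shows the actual value), and the deactivate and reactivate commands are printed only when it is given. The function name is hypothetical and nothing is executed.

```shell
#!/bin/sh
# Dry-run sketch of the rootvg special steps; it only prints the commands.
# Arguments: old disk, new disk, optional primary dump device on the old disk.
rootvg_migrate_cmds() {
    old=$1 new=$2 dumpdev=${3:-}
    # Move the boot logical volume and rebuild the boot image on the new disk.
    echo "migratepv -l hd5 $old $new"
    echo "bosboot -ad /dev/$new"
    # Clear the old boot record so the system cannot boot from the old disk.
    echo "chpv -c $old"
    echo "bootlist -m normal $new"
    # A primary dump device on the old disk is deactivated before migration.
    if [ -n "$dumpdev" ]; then
        echo "sysdumpdev -p /dev/sysdumpnull"
    fi
    echo "migratepv $old $new"
    if [ -n "$dumpdev" ]; then
        echo "sysdumpdev -p $dumpdev"    # reactivate it afterwards
    fi
    echo "reducevg rootvg $old"
    echo "rmdev -l $old -d"
}

# Example (hypothetical names): rootvg_migrate_cmds hdisk0 hdisk1 /dev/hd6
```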
1. Identify all LVs and file systems on the failing disk:
# lspv -l hdiskY
Notes:
Instructor notes:
Purpose — Describe procedure 3.
Details — Describe as explained in the student notes.
Additional information — This procedure requires the volume group to be brought online,
either by a varyonvg or a varyonvg -f. If it is forced, the failed disk will be in a removed
state. Use lspv to analyze physical volume states. If it is a normal varyonvg, the disk will
be in a missing state.
Note that removing logical volumes is possible on a disk that could not be accessed.
Transition statement — Let’s describe procedure 4.
Visual: rootvg (hdiskX, hdiskY) contains the OS logical volumes; datavg is on hdiskZ; a mksysb tape is available.
3. Restore from a mksysb tape
4. Import each volume group into the new ODM (importvg) if needed
Notes:
Procedure steps
Follow these steps:
1. Replace the bad disk and boot your system in maintenance mode.
2. Restore your system from a mksysb tape.
If any rootvg file systems were not mounted when the mksysb was made, those file
systems are not included on the backup image. You will need to create and restore
those as a separate step.
If your mksysb tape does not contain user volume group definitions (for example, you
created a volume group after saving your rootvg), you have to import the user volume
group after restoring the mksysb tape. For example:
# importvg -y datavg hdisk9
Only one disk from the volume group (in our example, hdisk9) needs to be specified.
Export and import of volume groups is discussed in more detail in the next topic.
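When several user volume groups have to be imported after the restore, the importvg commands can be listed first. This sketch assumes each argument has the hypothetical form vgname:disk, naming any one disk of that volume group; it only prints commands.

```shell
#!/bin/sh
# Sketch: print one importvg command per user volume group.
# Each argument has the form vgname:disk; any single disk of the VG is enough.
import_user_vgs_cmds() {
    for pair in "$@"; do
        vg=${pair%%:*}     # text before the first colon
        disk=${pair#*:}    # text after the first colon
        echo "importvg -y $vg $disk"
    done
}

# Example: import_user_vgs_cmds datavg:hdisk9
```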
2. Check /etc/filesystems.
3. Remove bad disk from ODM and the system:
# rmdev -l hdiskX -d
Notes:
Procedure steps
Follow these steps:
1. To fix this problem, export the volume group from the system. Use the command
exportvg as shown. During the export of the volume group, all ODM objects that
are related to the volume group will be deleted.
2. Check your /etc/filesystems. There should be no references to logical
volumes or file systems from the exported volume group.
3. Remove the bad disk from the ODM (use rmdev as shown). Shut down your
system and remove the physical disk from the system.
4. Connect the new drive and boot the system. The cfgmgr command configures the new
disk during the boot.
5. If you have a volume group backup available (created by the savevg command),
you can restore the complete volume group with the restvg command (or the
SMIT fastpath smit restvg). All logical volumes and file systems are recovered.
If you have more than one disk that should be used during restvg, you must
specify these disks:
# restvg -f /dev/rmt0 hdiskY hdiskZ
The savevg and restvg commands will be discussed in a future chapter.
6. If you have no volume group backup available, you have to recreate everything
that was part of the volume group.
Recreate the volume group (mkvg or smit mkvg), all logical volumes (mklv or
smit mklv) and all file systems (crfs or smit crfs).
Finally, restore the lost data from backups, for example with the restore
command or any other tool you use to restore data in your environment.
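Step 2 (no references left in /etc/filesystems after exportvg) can be checked with a short script. This is a sketch under the assumption that the stanzas name logical volumes as /dev/lvname; the function name and the LV names in the example are hypothetical.

```shell
#!/bin/sh
# Sketch for step 2: list lines of a filesystems file that still reference
# any of the given logical volumes. Exit status 0 means no references remain.
check_no_lv_refs() {
    fsfile=$1
    shift
    clean=yes
    for lv in "$@"; do
        # Match /dev/<lv> only when followed by whitespace or end of line,
        # so that lv1 does not match /dev/lv10.
        if grep -nE "/dev/${lv}([[:space:]]|\$)" "$fsfile"; then
            clean=no
        fi
    done
    [ "$clean" = yes ]
}

# Example (hypothetical LV names of the exported volume group):
#   check_no_lv_refs /etc/filesystems lv10 lv11 loglv01
```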
Instructor notes:
Purpose — Explain procedure 5.
Details — Describe as explained in the student notes.
Additional information —
Transition statement — Let’s discuss some common disk replacement failures.
Visual: rootvg migration (hdiskY, hdiskX)
Fix:
• Check bootlist (SMS menu)
• Check bootlist (bootlist)
• Recreate boot logical volume (bosboot)
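The first fix, checking the bootlist, can be scripted. This sketch reads the output of bootlist -m normal -o on standard input and succeeds only if the first device is the expected disk. It assumes each output line begins with the device name, possibly followed by attributes such as blv=hd5; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if the first entry of `bootlist -m normal -o` output
# (read from standard input) is the given disk.
boots_first_from() {
    disk=$1
    # Keep only the first whitespace-delimited word of the first line.
    first=$(sed -n '1s/^\([^[:space:]]*\).*/\1/p')
    [ "$first" = "$disk" ]
}

# Usage on AIX (hdisk1 is a placeholder):
#   bootlist -m normal -o | boots_first_from hdisk1 || echo "bootlist needs fixing"
```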
Notes:
Instructor notes:
Purpose — Show what might happen after a rootvg migration.
Details — Explain as described in the student notebook.
Additional information — On a MicroChannel system, you get alternating LED codes
223-229. Modern systems stop at SMS.
Transition statement — Let’s explain another disk replacement error.
Visual: datavg consists of hdisk4 (PVID ...221...) and hdisk5 (PVID ...555...); both PVIDs are recorded in the VGDA and as pvid attributes in the ODM class CuAt. After # rmdev -l hdisk5 -d, hdisk5 is removed from the ODM and from the system, but not from the volume group.
© Copyright IBM Corporation 2009
Figure 8-10. Frequent disk replacement errors (2 of 4) AN151.0
Notes:
The problem
Another frequent error occurs when the administrator removes a disk from the ODM (by
executing rmdev) and physically removes the disk from the system, but does not
remove entries from the volume group descriptor area (VGDA).
The VGDA stores information about all physical volumes of the volume group. Each
disk has at least one VGDA.
Disk information is also stored in the ODM, for example, the physical volume identifiers
are stored in the ODM class CuAt.
Note: Throughout this discussion the physical volume ID (PVID) is abbreviated in the
visuals for simplicity. The physical volume ID is actually 32 characters.
What happens if a disk is removed from the ODM but not from the volume group?
Instructor notes:
Purpose — Introduce the VGDA corruption if a disk is removed from the ODM but not from
the volume group.
Details — Describe as explained in the student notes.
Additional information — It is not possible to remove a disk from the ODM as long as it
has open logical volumes. If any process is using a logical volume from a disk, you cannot
remove the disk with rmdev.
Transition statement — Let’s describe the fix for this error.
Visual: after # rmdev -l hdisk5 -d, the VGDA on hdisk4 still lists the PVIDs ...221... and ...555..., while the ODM class CuAt only holds hdisk4 (pvid ...221...).
Fix: # reducevg datavg ...555...
Notes:
The fix
After removing a disk from the ODM, there is still a reference in the VGDA of the other
disks in the volume group of the removed disk. In early AIX versions, the fix for this
problem was difficult. You had to add ODM objects that described the attributes of the
removed disk.
This problem can now be fixed by executing the reducevg command. Instead of
specifying the disk name, the physical volume ID of the removed disk is specified.
Execute the lspv command to identify the missing disk. Write down the physical
volume ID of the missing disk and compare this ID with the contents of the VGDA. Use
the following command to query the VGDA on a disk:
# lqueryvg -p hdisk4 -At (Use any disk from the volume group)
If you are sure that you found the missing PVID, pass this PVID to the reducevg
command.
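Comparing the two PVID lists by eye is error-prone, so the comparison can be sketched in shell. The function takes the PVIDs from the VGDA (for example from lqueryvg -p hdisk4 -At) and those known to the ODM as whitespace-separated lists; extracting them is not shown. It compares only the first 16 characters, assuming the ODM may store the PVID padded to 32 characters. Function names are hypothetical; confirm the result by hand before running reducevg.

```shell
#!/bin/sh
# Sketch: print PVIDs that the VGDA lists but the ODM does not know.
missing_pvids() {
    vgda_list=$1
    odm_list=$2
    for v in $vgda_list; do
        v16=$(printf '%s' "$v" | cut -c1-16)
        seen=no
        for o in $odm_list; do
            if [ "$(printf '%s' "$o" | cut -c1-16)" = "$v16" ]; then
                seen=yes
            fi
        done
        if [ "$seen" = no ]; then
            echo "$v16"
        fi
    done
}

# Print a reducevg command for each missing PVID (dry run only).
reducevg_missing_cmds() {
    vg=$1
    for p in $(missing_pvids "$2" "$3"); do
        echo "reducevg $vg $p"
    done
}
```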
Instructor notes:
Purpose — Describe how to fix this VGDA corruption.
Details —
Additional information —
Transition statement — Let’s explain other errors that might come up after a disk
replacement.
Visual: # lsvg -p datavg fails with the ODM error "unable to find device id ...734... in device configuration database".
ODM problem in rootvg?
Yes: rvgrecover
No: export and import the volume group
Notes:
ODM failure
After an incorrect disk replacement, you might detect ODM failures. For example, when
issuing the command lsvg -p datavg, a typical error message could be:
unable to find device id 00837734 in device configuration database
In this case, a device could not be found in the ODM.
Visual: system moon; hdisk9 belongs to myvg (lv10, lv11, loglv01).
To export a volume group:
1. Unmount all file systems from the volume group:
# umount /dev/lv10
# umount /dev/lv11
Notes:
The scenario
The exportvg and importvg commands can be used to fix ODM problems. These
commands also provide a way to transfer data between different AIX systems. This
visual provides an example of how to export a volume group.
The disk, hdisk9, is connected to the system moon. This disk belongs to the myvg
volume group. This volume group needs to be transferred to another system.
2. When all logical volumes are closed, use the varyoffvg command to vary off the
volume group.
3. Finally, export the volume group, using the exportvg command. After this point,
the complete volume group (including all file systems and logical volumes) is
removed from the ODM.
4. After exporting the volume group, the disks in the volume group can be
transferred to another system.
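The export steps follow a fixed order, so they can be printed as a dry run first. This sketch assumes the file systems are unmounted by logical volume device, as in the visual; the function name is hypothetical and nothing is executed.

```shell
#!/bin/sh
# Dry-run sketch of the export steps: unmount the file systems, vary off
# the volume group, then export it. Only prints the commands.
export_vg_cmds() {
    vg=$1
    shift
    for lv in "$@"; do
        echo "umount /dev/$lv"    # step 1: close every file system
    done
    echo "varyoffvg $vg"          # step 2
    echo "exportvg $vg"           # step 3: removes the VG from the ODM
}

# Example from the visual: export_vg_cmds myvg lv10 lv11
```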
Instructor notes:
Purpose — Explain how to export a volume group.
Details —
Additional information —
Transition statement — Let’s describe how to import a volume group.
Notes:
In AIX V4.3 and subsequent releases, the volume group is automatically varied
on.
3. Finally, mount the file systems.
Visual: importing myvg (hdisk3) on system mars, which already has datavg (hdisk2) with lv10, lv11, and loglv01:
# importvg -y myvg hdisk3
importvg: changing LV name lv10 to fslv00
importvg: changing LV name lv11 to fslv01
importvg can also accept the PVID in place of the hdisk name.
Notes:
# umount /home/michael
# mount -o log=/dev/loglv01 /dev/lv24 /home/michael
Notes:
If the file system type is jfs2, you have to specify this as well
(-V jfs2). You can get this information by running the command
getlvcb lv24 -At.
Another method is to add a new stanza to the /etc/filesystems file. This is covered
in the next visual.
Instructor notes:
Purpose — Describe what happens if a file system already exists during the import.
Details —
Additional information —
Transition statement — Let’s see how to add a stanza to the /etc/filesystems file.
# vi /etc/filesystems
/home/michael:
        dev     = /dev/lv11
        vfs     = jfs
        log     = /dev/loglv00
        mount   = false
        options = rw
        account = false
Visual: datavg contains /dev/lv10 (/home/sarah), /dev/lv11 (/home/michael), and /dev/loglv00 (log device).
# mount /home/michael
# mount /home/michael_moon     (mount point must exist)
Notes:
- account specifies whether the file system should be processed by the accounting
system. A value of false indicates no accounting.
Before mounting the file system /home/michael_moon, the corresponding mount
point must be created.
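A new stanza can be generated in the same layout as the visual and reviewed before it is appended to /etc/filesystems. This is a sketch: the function name and the example values (the mount point /home/michael_moon with /dev/lv24 and /dev/loglv01) are hypothetical, and mount, options, and account are fixed to the values shown in the visual.

```shell
#!/bin/sh
# Sketch: print an /etc/filesystems stanza for an imported file system.
# Arguments: mount point, logical volume device, vfs type, log device.
fs_stanza() {
    mnt=$1 dev=$2 vfs=$3 log=$4
    printf '%s:\n' "$mnt"
    # printf reuses the format for each remaining name/value pair.
    printf '\t%-8s= %s\n' dev "$dev" vfs "$vfs" log "$log" \
        mount false options rw account false
}

# Example (hypothetical values); review the output before appending it:
#   fs_stanza /home/michael_moon /dev/lv24 jfs2 /dev/loglv01
#   mkdir -p /home/michael_moon    # the mount point must exist
```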
Checkpoint
IBM Power Systems
Notes:
Checkpoint solutions
Additional information —
Transition statement — Now, let’s do an exercise.
Exercise 8:
Exporting and importing volume groups
• Disk replacement
• Export and import a volume group
• Analyze import messages (optional)
Notes:
Introduction
This exercise can be found in your Student Exercise Guide.
Unit summary
Notes:
Several procedures are available to fix disk problems, depending on the
circumstances:
Procedure 1: Mirrored disk
Procedure 2: Disk still working (rootvg specials)
Procedure 3: Total disk failure
Procedure 4: Total rootvg failure
Procedure 5: Total non-rootvg failure
exportvg and importvg can be used to easily transfer volume groups between systems.
Reference
Online AIX Version 6.1 Command Reference volumes 1-6
Online AIX Version 6.1 Operating system and device management
Online AIX Version 6.1 Installation and migration
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/infocenter/systems
SG24-2014 AIX Version 4.3 Differences Guide (Redbook)
SG24-5765 AIX 5L Differences Guide: V 5.2 Edition (Redbook)
SG24-7463 AIX 5L Differences Guide: V 5.3 Edition (Redbook)
SG24-7414 AIX 5L Differences Guide: V 5.3 Addendum (Redbook)
SG24-7559 IBM AIX Version 6.1 Differences Guide (Redbook)
© Copyright IBM Corp. 2009 Unit 9. Install and backup techniques 9-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide
Unit objectives
Notes:
Topic 1 objectives
Notes:
# smit alt_install
Notes:
Filesets
An alternate disk installation uses the following filesets:
- bos.alt_disk_install.boot_images must be installed for alternate disk mksysb
installations
- bos.alt_disk_install.rte must be installed for rootvg cloning and alternate disk
mksysb installations
alt_disk_install (Command Arguments)    New Commands
-C args disk                            alt_disk_copy args -d disks
-d mksysb args disks                    alt_disk_mksysb -m mksysb args -d disks
-W args disk                            alt_rootvg_op -W args -d disk
-S args                                 alt_rootvg_op -S args
-P2 args disks                          alt_rootvg_op -C args -d disks
-X args                                 alt_rootvg_op -X args
-v args disk                            alt_rootvg_op -v args -d disk
-q args disk                            alt_rootvg_op -q args -d disk
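When converting old scripts, the mapping in the table can be looked up mechanically. This sketch translates only the alt_disk_install flag into the replacing command and flag; the remaining arguments are not rewritten. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: translate an old alt_disk_install flag into the command (and flag)
# that replaces it, per the alt_disk_install / New Commands table.
alt_disk_new_cmd() {
    case "$1" in
        -C)  echo "alt_disk_copy" ;;
        -d)  echo "alt_disk_mksysb -m" ;;
        -P2) echo "alt_rootvg_op -C" ;;
        -W|-S|-X|-v|-q) echo "alt_rootvg_op $1" ;;
        *)   echo "no mapping for alt_disk_install flag: $1" >&2
             return 1 ;;
    esac
}

# Example: alt_disk_new_cmd -P2
```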
Visual: hdisk0 holds rootvg (AIX 5L V5.3); an AIX 6.1 mksysb image is installed on hdisk1.
Notes:
Introduction
An alternate mksysb installation involves installing a mksysb image that has already
been created from another system onto an alternate disk of the target system. The
mksysb image must have been created on a system running AIX V4.3 or subsequent
versions of the operating system.
Example
In the example, an AIX V6.1 mksysb tape image is installed on an alternate disk, hdisk1,
by executing the following command:
# alt_disk_mksysb -m /dev/rmt0 -d hdisk1
The system now contains two rootvgs on different disks. In the example, one rootvg
has AIX 5L V5.3 (hdisk0), and the other has AIX 6.1 (hdisk1).
alt_disk_mksysb options
The alt_disk_mksysb command has the following options:
-m device
-d target disks
-B : do not change the bootlist
-i image.data
-s script
-R resolv.conf
-p platform
-L mksysb_level
-n : remain a NIM client
-P phase
-c console
-r : reboot after install
-k : keep mksysb device customization
-y : import non-rootvg volume groups
Instructor notes:
Purpose — Introduce alternate mksysb disk installation.
Details —
Additional information —
Transition statement — Let’s introduce the SMIT interface.
# smit alt_mksysb
[Entry Fields]
Notes:
Instructor notes:
Purpose — Describe the SMIT interface for alternate mksysb disk installation.
Details — Keep it very brief - we only want to show that this can be easily executed from a
SMIT dialog panel.
Additional information — The installation on the alternate disk is broken into three
phases:
1. Phase 1 creates the altinst_rootvg volume group, the alt_ logical volumes, and the
/alt_inst file systems, and restores the mksysb data.
2. Phase 2 runs any specified customization script and copies a resolv.conf file, if
specified.
3. Phase 3 unmounts the /alt_inst file systems, renames the file systems and logical
volumes, and varies off the altinst_rootvg. It sets the bootlist and reboots, if
specified.
Each phase can be run separately. Phase 3 must be run to get a usable rootvg volume
group.
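Because each phase can be run separately, a wrapper script might validate the phase value and warn when phase 3 is left out. This sketch assumes the accepted -P values are 1, 2, 3, 12, 23, and all; check the alt_disk_mksysb or alt_disk_copy man page on your system. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: validate a value for the phase option (-P) and warn when phase 3
# is not part of the run. Assumed valid values: 1, 2, 3, 12, 23, all.
check_phase_opt() {
    case "$1" in
        1|2|12)
            echo "note: phase 3 not included; altinst_rootvg is not usable yet" >&2 ;;
        3|23|all)
            : ;;   # phase 3 completes the alternate rootvg
        *)
            echo "invalid phase value: $1" >&2
            return 1 ;;
    esac
}
```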
Transition statement — Let’s describe alternate disk rootvg cloning.
Visual: rootvg on hdisk0 (AIX 6.1 TL01) is cloned to hdisk1, and AIX 6.1 TL02 is applied to the clone (rootvg, AIX 6.1 TL02).
Notes:
Example
In the example, rootvg, which resides on hdisk0, is cloned to the alternate disk hdisk1.
Additionally, a new technology level (TL02) is applied to the cloned copy of AIX.
Instructor notes:
Purpose — Introduce alternate disk rootvg cloning.
Details —
Additional information — The alt_disk_copy options are (see man page):
-b bundle name
-f APAR_list file
-F list_of_APARs
-l path to location of installp images
-w list_of_filesets_to_install
-d target disks
-B : do not change bootlist
-r : reboot after cloning
-s script
-P phases
-R resolv.conf
-W filesets
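To show how the -b and -l options combine, this dry-run sketch builds an alt_disk_copy command that clones rootvg and, if an image directory is given, applies the update_all bundle from it. The use of update_all as the bundle name is an assumption taken from the alt_disk_copy man page, not from this course; the function name and the /updates directory are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: build an alt_disk_copy command line for cloning rootvg.
# With an image directory, the clone is updated using -b update_all and -l.
alt_clone_cmd() {
    target=$1 images=${2:-}
    if [ -n "$images" ]; then
        echo "alt_disk_copy -d $target -b update_all -l $images"
    else
        echo "alt_disk_copy -d $target"
    fi
}

# Example (hypothetical image directory): alt_clone_cmd hdisk1 /updates
```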
Transition statement — Let’s show the SMIT fastpath.
# smit alt_clone
Notes:
Instructor notes:
Purpose — Describe the SMIT fastpath for alternate disk rootvg cloning.
Details — Keep it very brief.
Additional information —
Transition statement — Let’s show how to remove an alternate disk installation.
Visual: original rootvg on hdisk0 (AIX 6.1 TL01).
Notes:
Visual: the NIM server clones rootvg (AIX 5.3) on hdisk0 of the NIM client lpar1 to hdisk1 and migrates the clone to AIX 6.1.
Notes:
What is nimadm?
The nimadm command (Network Install Manager Alternate Disk Migration) is a utility that
allows the system administrator to create a copy of rootvg to a free disk (or disks) and
simultaneously migrate it to a new version or release level of AIX. The nimadm command
uses NIM resources to perform this function.
Advantages of nimadm
There are several advantages to using the nimadm command over a conventional
migration:
• Reduced downtime. The migration is performed while the system is up and functioning
normally. There is no requirement to boot from install media, and the majority of
processing occurs on the NIM master.
• The nimadm command facilitates quick recovery in the event of migration failure. Since
the nimadm command uses alt_disk_install to create a copy of rootvg, all changes are
made to the copy (altinst_rootvg). In the event of a serious migration installation
failure, the failed migration is cleaned up and there is no need for the administrator to
take further action. In the event of a problem with the new (migrated) level of AIX, the
system can be quickly returned to the pre-migration operating system by booting from
the original disk.
• The nimadm command allows a high degree of flexibility and customization in the
migration process. This is done with the use of optional NIM customization resources:
image_data, bosinst_data, exclude_files, pre-migration script, installp_bundle, and
post-migration script.
Details of using NIM to perform an alternate disk migration are not covered in this course.
Instructor notes:
Purpose — Introduce the use of nimadm.
Details — The intent is only to make the students aware of this NIM capability. You do not
even need to cover the displayed example of the nimadm command. Instead, refer them to
the full NIM training course. It is important that they understand that an alternate disk
migration cannot be done without using NIM.
Additional information —
Transition statement — Let’s do a review of this section.
Notes:
Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —
Topic 2 objectives
Notes:
multibos overview
Notes:
Overview
The main purpose of using multibos is to have the type of alternate BOS (base
operating system) capabilities that are available with the alternate disk technology,
without having to use another disk. The operating system filesets do not occupy enough
space to justify allocating another entire disk for that purpose. With multibos, you can
have the two BOS versions on the same disk.
This is accomplished by creating copies of the base operating system logical volumes
that an OS update affects (the active BOS), with a different file name path. Note that
these copies are in the one and only rootvg.
Another advantage to multibos is that there is lower overhead to the cloning operation,
since it does not need to clone all the LVs in the rootvg.
Once you have created the alternate BOS, changes, such as applying maintenance,
can be made to these copies, without changing the level of code being used in the
active BOS. In addition to applying maintenance, you can access and make
configuration changes to the standby BOS through two techniques: mounting the
standby BOS and starting an interactive shell (chroot) for the standby BOS.
When you would like to test the standby BOS, you simply reboot using the standby copy
of the boot logical volume (BLV). If there is a problem with the changes that were made,
configure the bootlist to use the original BLV and a reboot will return you to the original
version of the BOS.
Instructor notes:
Purpose — Provide an overview of multibos function and purpose.
Details —
Additional information —
Transition statement — Let’s first look at the file system structure of the alternate BOS,
when created.
Visual: Active BOS: BLV (hd5), jfslog (hd8), and / (hd4), with home (hd1), opt (hd10opt), usr (hd2), var (hd9var), tmp (hd3), and bos_inst (bos_hd4, if mounted) below /. Standby BOS: BLV (bos_hd5), jfslog (bos_hd8).
Notes:
Instructor notes:
Purpose — Explain the structure of the standby BOS.
Details —
Additional information —
Transition statement — Next, we will look at how we actually create a standby BOS using
the multibos command.
• multibos -s -X
• Pre-validate that there is sufficient rootvg free space
• Uses default image.data (can customize with -i)
• Special logical volumes and file systems created for the
standby OS
– bos_<lvname>
– /bos_inst/<mount point>
• Copies BOS file systems: backup and restore
• Non-BOS logical volumes are shared
• Optional post-creation customization script
• Bootlist updated (-t will block)
– 1st: standby BOS
– 2nd: active BOS
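The setup options listed above can be assembled into one command line for review. This dry-run sketch uses only -s, -X, -i, and -t as they appear in the list; the function name, its argument order, and the example image.data path are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: build a multibos setup command line.
# $1: optional edited image.data copy (-i); $2: "yes" to leave the bootlist
# unchanged (-t blocks the bootlist update).
multibos_setup_cmd() {
    image_data=${1:-} keep_bootlist=${2:-}
    cmd="multibos -s -X"
    if [ -n "$image_data" ]; then
        cmd="$cmd -i $image_data"
    fi
    if [ "$keep_bootlist" = yes ]; then
        cmd="$cmd -t"
    fi
    echo "$cmd"
}

# Example (hypothetical path): multibos_setup_cmd /tmp/image.data.bos yes
```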
Notes:
image.data customization
If you want to change any characteristics of the cloned rootvg logical volumes or file
systems, you can create a copy of the image.data file, edit the copy, and then specify
that the multibos command should use your edited copy (by using the -i flag).
For example, if you want the cloned LVs to be placed on a disk that you added to
rootvg, first run the mkszfile command (to capture the current characteristics), copy
the created /image.data to a different name, and edit it to specify that the cloned LVs
should be placed on the additional disk. Then point multibos to that new file by running
multibos -i <image.data copy> -Xs.
Notes:
3. If you specify a fix list, with the -f flag, the fix list is installed using the instfix utility. The
fix list syntax should follow instfix conventions. If you specify the -p preview flag, then
instfix will perform a preview operation.
4. If you specify the update_all function, with the -a flag, it is performed using the
install_all_updates utility. If you specify the -p preview flag, then install_all_updates
performs a preview operation. Note: It is possible to perform one, two, or all three of the
installation options during a single customization operation.
5. The standby boot image is created and written to the standby BLV using the AIX
bosboot command. You can block this step with the -N flag. You should only use the -N
flag if you are an experienced administrator and have a good understanding of the AIX
boot process.
6. Upon exit, if standby BOS file systems were mounted in step 1, they are unmounted.
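The customization options described in steps 3 and 4 (-f, -a, and -p) can be combined into one command line for review. In this dry-run sketch, -c (customize) and -l (location of the installation images) come from the multibos command reference rather than from this text; the function name, argument order, and paths are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: build a multibos customization command line.
# $1: image location (-l); $2: optional fix list file (-f);
# $3: "yes" for update_all (-a); $4: "yes" for preview (-p).
multibos_customize_cmd() {
    images=$1 fixlist=${2:-} update_all=${3:-} preview=${4:-}
    cmd="multibos -X -c -l $images"
    if [ -n "$fixlist" ]; then
        cmd="$cmd -f $fixlist"      # installed with instfix
    fi
    if [ "$update_all" = yes ]; then
        cmd="$cmd -a"               # performed with install_all_updates
    fi
    if [ "$preview" = yes ]; then
        cmd="$cmd -p"               # preview only
    fi
    echo "$cmd"
}

# Example (hypothetical path): multibos_customize_cmd /images "" yes yes
```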
Alternate boot
The bootlist command supports multiple BLVs. As an example, to boot from disk hdisk0
and BLV bos_hd5, you would enter the following:
# bootlist -m normal hdisk0 blv=bos_hd5
Notes:
Topic 3 objectives
Notes:
JFS2 snapshot (1 of 2)
Notes:
JFS2 snapshot
A point-in-time image for a JFS2 file system is called a snapshot. The file system which
is the source of this point-in-time image is referred to as the snapped file system or
snappedFS.
The snapshot view of the data remains static and retains the same security permissions
that the original snappedFS had when the snapshot was made. Also, a JFS2 snapshot
can be created without unmounting the file system, or quiescing the file system (though
it may be advisable for some application to briefly quiesce during the snapshot). A
snapshot can be used to access files or directories as they existed when the snapshot
was taken.
The snapshot can then be used to create a backup of the file system at the given point
in time that the snapshot was taken. The snapshot also provides the capability to
access files or directories as they were at the time of the snapshot.
Instructor notes:
Purpose — Describe the JFS2 snapshot function.
Details —
Additional information —
Transition statement — Let’s see how to create a JFS2 snapshot.
JFS2 snapshot (2 of 2)
Notes:
logical volume from the file system. The external snapshot can be mounted separately
from the file system at its own unique mount point. A given file system can only use
either internal or external snapshots; it cannot mix the different types.
Visual: snappedFS and snapshot, each showing inode1 and inode2.
Notes:
Visual: snappedFS and snapshot, each showing inode1 and inode2.
Notes:
# smit jfs2
. . .
List Snapshots for an Enhanced Journaled File System
Create Snapshot for an Enhanced Journaled File System
Mount Snapshot for an Enhanced Journaled File System
Remove Snapshot for an Enhanced Journaled File System
Unmount Snapshot for an Enhanced Journaled File System
Change Snapshot for an Enhanced Journaled File System
Rollback an Enhanced Journaled File System to a Snapshot
Notes:
The various JFS2 snapshot operations can be executed from SMIT dialog panels. The
visual shows the SMIT JFS2 menu, limited to the snapshot-related menu items.
                                          [Entry Fields]
  File System Name                        /home/myfs
  SIZE of snapshot
    Unit Size                             Megabytes      +
*   Number of units                       [500]          #
Notes:
Creating an internal snapshot for a JFS2 file system that is not mounted
First, it is important to know that you cannot use internal snapshots unless the file
system was enabled to support them when it was created.
• To enable the file system to support internal snapshots (at creation time only):
# crfs -a isnapshot=yes ....
The mount option, -o snapto=snapshotlv, can be used to create a snapshot for a
JFS2 file system that is not currently mounted:
# mount -o snapto=snapshotLV snappedFS MountPoint
or
# mount -o snapto=snapshotname snappedFS MountPoint
If the snapto value starts with a slash, then it is assumed to be a special device file for
an existing logical volume where the snapshot should be created. If the snapto value
does not start with a slash, then it is assumed to be the name of an internal snapshot to
be created.
For example:
# mount -o snapto=/dev/mysnaplv /dev/fslv00 /home/myfs
This mounts the file system contained in /dev/fslv00 on the /home/myfs mount point
and then creates a snapshot of the /home/myfs file system
in the logical volume /dev/mysnaplv.
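The slash rule for snapto values can be expressed as a small shell sketch. The helper name snapto_kind is hypothetical and is not an AIX command. It only mirrors the rule described above: a value that begins with a slash names the special device file of an existing logical volume (an external snapshot), and any other value names an internal snapshot to be created.

```shell
# snapto_kind: hypothetical helper that classifies a "mount -o snapto=" value
# using the rule described in the text. It does not call any AIX command.
snapto_kind() {
    case "$1" in
        "") echo "invalid" ;;    # no value supplied
        /*) echo "external" ;;   # special device file of an existing LV, e.g. /dev/mysnaplv
        *)  echo "internal" ;;   # name of an internal snapshot, e.g. mysnap
    esac
}
```

For example, snapto_kind /dev/mysnaplv prints external and snapto_kind mysnap prints internal. A wrapper script could use this result before calling mount, for instance to remind the operator that an internal snapshot requires a file system created with isnapshot=yes.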
[Entry Fields]
File System Name                 /home/myfs
* Snapshot Name                  [mysnap]
Notes:
Listing snapshots
IBM Power Systems
Snapshots for /home/myfs2
Current  Name      Time
         mysnap    Wed 19 Nov 08:44:33 2008
         mysnap2   Fri 21 Nov 09:33:33 2008
*        mysnap3   Mon 24 Nov 14:03:18 2008

# snapshot -q /home/myfs
Snapshots for /home/myfs
Current  Location      512-blocks   Free     Time
*        /dev/fslv06   262144       261376   Wed May 6 18:15:11 2009
Notes:
The snapshot -q option can be used to display the snapshots related to the specified file
system.
If the file system uses internal snapshots, the report provides the snapshot names and
creation times. The * indicates the current snapshot.
If the file system uses external snapshots, then the report provides, for each snapshot, the
logical volume special device file, the snapshot size, how much space is free in the
snapshot, and the creation time.
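To track how full an external snapshot is, the report can be post-processed. This sketch assumes the external-snapshot layout shown in the figure (Current, Location, 512-blocks, Free, Time) and that each Location is a /dev/ path. The helper name snap_usage is hypothetical. It reads the report on standard input and prints each snapshot logical volume with its used space as a percentage.

```shell
# snap_usage: hypothetical helper. Reads "snapshot -q" output for a file system
# with external snapshots on stdin and prints "<location> <percent used>".
# Assumes the 512-blocks and Free columns directly follow the /dev/ location.
snap_usage() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^\/dev\//) {            # the Location column
                size = $(i + 1) + 0            # 512-blocks
                free = $(i + 2) + 0            # Free
                if (size > 0)
                    printf "%s %.1f\n", $i, (size - free) * 100 / size
                break
            }
        }
    }'
}
```

On an AIX system, this could be run as snapshot -q /home/myfs | snap_usage. For the /dev/fslv06 entry in the figure (262144 blocks, 261376 free), 768 blocks are in use, which the sketch reports as 0.3 percent.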
Notes:
rollback
The rollback command is an interface to revert a JFS2 file system to a point-in-time
snapshot. The snappedFS parameter must be unmounted before the rollback command
is run and remains inaccessible for the duration of the command. Any snapshots that are
taken after the specified snapshot (snapshotObject for external or snapshotName for
internal) are removed. The associated logical volumes are also removed for external
snapshots.
As with any file copying, be careful about changing the nature of the file (ownership,
permission, sparseness, and so on). Using the backup and restore utilities to copy files
is often a safer technique.
Instructor notes:
Purpose — Explain how to use a JFS2 snapshot to recover data.
Details —
Additional information —
Transition statement — While using a snapshot directly to recover data is useful, it does
not address the loss of the disk holding the snappedFS, much less a site disaster recovery
situation. Let’s look at how we can use a snapshot as a stable source for a backup to media
or to a network server.
Notes:
Instructor notes:
Purpose — Explain how to use snapshot with a backup utility.
Details —
Additional information —
Transition statement — If you make a mistake and underestimate how quickly data is
modified or deleted, then you can have space allocation problems related to the JFS2
snapshot allocation. Let’s look at how to monitor and manage that situation.
• External snapshot:
– The snapshot report identifies the size and amount of free space.
– If the snapshot needs more space:
# snapshot -o size=+1 snapshotLV
• Internal snapshot:
– Shares the logical volume with the snappedFS
# df -m snappedFS
– If the snappedFS is out of space, try to free up space, possibly by deleting
old snapshots.
# snapshot -d -n snapshot_name snappedFS
© Copyright IBM Corporation 2009
Notes:
It is useful to be able to identify situations where a snapshot is growing large. If a snapshot
runs out of space, all snapshots are invalidated and become unusable. With internal
snapshots, the snapshots can also contribute to the entire file system running out of
space.
To monitor an external snapshot, use the query option of the snapshot command. An
alternative would be to mount the snapshot and use the df command, but that is more
complicated.
If an external snapshot needs more room, you can dynamically increase the size of the
snapshot logical volume by using the size option of the snapshot command.
For an internal snapshot, there is no mechanism for identifying the space usage of the
snapshots. Instead, you monitor the size of the snappedFS.
When a file system is running out of space, one way to free space is to delete old
snapshots. Keeping many generations of snapshots can be useful, but it can also be
expensive in terms of space usage.
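Because internal snapshot usage is visible only through the snappedFS itself, one simple check is to watch the %Used column of df -m. This sketch assumes the usual AIX df -m column order (Filesystem, MB blocks, Free, %Used, Iused, %Iused, Mounted on). The helper name fs_over_threshold and the threshold value are illustrative choices and are not part of AIX.

```shell
# fs_over_threshold: hypothetical helper. $1 is a threshold in percent; stdin is
# AIX "df -m" output (Filesystem, MB blocks, Free, %Used, Iused, %Iused, Mounted on).
# Prints "<mount point> <used%>" for each file system at or above the threshold.
fs_over_threshold() {
    awk -v limit="$1" '
        NR > 1 {                            # skip the df header line
            used = $4
            sub(/%$/, "", used)             # "91%" -> "91"
            if (used + 0 >= limit + 0)
                print $NF, used "%"
        }'
}
```

For example, df -m /home/myfs | fs_over_threshold 80 prints the mount point only when the file system is at least 80 percent full. At that point, deleting old snapshots is one way to recover space.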
Instructor notes:
Purpose — Explain how to manage snapshot space allocation issues.
Details —
Additional information —
Transition statement — Let’s review what we have covered.
Notes:
Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —
Checkpoint (1 of 4)
4. Why should you not use exportvg with an alternate disk VG?
________________________________________________________
Notes:
Instructor notes:
Purpose —
Details —
Checkpoint solutions (1 of 4)
4. Why should you not use exportvg with an alternate disk VG?
This will remove rootvg related entries from /etc/filesystems.
Additional information —
Transition statement —
Checkpoint (2 of 4)
Notes:
Instructor notes:
Purpose —
Details —
Checkpoint solutions (2 of 4)
Additional information —
Transition statement —
Checkpoint (3 of 4)
Notes:
Instructor notes:
Purpose — Review and test the students’ understanding of this section.
Details — A suggested approach is to give the students about five minutes to answer the
questions on this page. Then, go over the questions and answers with the class.
Checkpoint solutions (3 of 4)
Additional information —
Transition statement —
Checkpoint (4 of 4)
Notes:
Instructor notes:
Purpose — Review and test the students’ understanding of this unit.
Details — A suggested approach is to give the students about five minutes to answer the
questions on this page. Then, go over the questions and answers with the class.
Checkpoint solutions (4 of 4)
Additional information —
Transition statement — Now, let’s summarize the unit.
Unit summary
Notes:
Alternate disk installation techniques are available:
• Installing a mksysb onto an alternate disk
• Cloning the current rootvg onto an alternate disk
Alternate BOS can be created and maintenance applied
JFS2 snapshots are a great way to capture a file system image at a point in time with
minimal impact to the application.
Instructor notes:
Purpose — Summarize the unit.
Details — Present the highlights from the unit.
Additional information —
Transition statement — Let’s continue with the next unit.
Reference
Online AIX Version 6.1 Command Reference volumes 1-6
Online AIX Version 6.1 IBM Workload Partitions for AIX
Online AIX Version 6.1 IBM PowerVM Workload Partitions
Manager for AIX
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/infocenter/systems
Unit objectives
Notes:
Instructor notes:
Purpose — Explain unit objectives.
Details —
Additional information —
Transition statement — Let’s first review some WPAR basics.
Topic 1 objectives
After completing this topic, you should be able to:
• Explain the primary benefits of using WPARs
• Explain the difference between a system WPAR and an application WPAR
• Explain the difference between and reasons for shared namefs mounts and private JFS2
mounts for system WPARs
Notes:
• WPARs can be relocated
– Load balancing
– Server maintenance
Notes:
Introduction
Workload Partition (WPAR) is a software-based virtualization capability of AIX 6 that
reduces the number of AIX operating system images that need to be maintained when
consolidating multiple workloads on a single server. WPARs provide a way for clients to run
multiple applications inside the same instance of an AIX operating system while providing
security and administrative isolation between applications. WPARs complement logical
partitions and can be used in conjunction with them. WPARs can improve administrative
efficiency by reducing the number of AIX operating system instances that must be
maintained, and can increase the overall utilization of systems by consolidating multiple
workloads on a single system. Both effects are designed to reduce the total cost of
ownership.
WPARs allow users to create multiple software-based partitions on top of a single AIX
instance. This approach enables high levels of flexibility and capacity utilization for
applications executing heterogeneous workloads, and simplifies patching and other
operating system maintenance tasks.
WPARs provide unique partitioning values.
• Smaller number of OS images to maintain
• Performance-efficient partitioning through sharing of application text and kernel data
and text
• Fine-grain partition resource controls
• Simple, lightweight, centralized partition administration
WPARs enable multiple instances of the same application to be deployed across
partitions.
• Many WPARs running DB2, WebSphere, or Apache in the same AIX image
• Different capability from other partitioning technologies
• Greatly increases the ability to consolidate workloads because often the same
application is used to provide different business services
• Enables the consolidation of separate discrete workloads that require separate
instances of databases or applications into a single system or LPAR
• Reduces costs through optimized placement of workloads between systems to yield the
best performance and resource utilization
WPAR technology enables the consolidation of diverse workloads on a single server
increasing server utilization rates.
• Hundreds of WPARs can be created, far exceeding the capability of other partitioning
technologies.
• WPARs support fast provisioning and fast resource adjustments in response to both
normal and unexpected demands. WPARs can be created and resource controls
modified in seconds.
• WPAR resource controls enable the over-provisioning of resources. If a WPAR is below
allocated levels, the unused allocation is automatically available to other WPARs.
• WPARs support the live migration of a partition in response to normal or unexpected
demands.
• All of the above capabilities enable more consolidation on a single server or LPAR.
WPARs enable development, test, and production cycles of one workload to be
placed on a single system.
• Different levels of applications (production1, production2, test1, test2) may be deployed
in separate WPARs.
• Quick and easy roll out and roll back to production environments.
• Reduced costs through the sharing of hardware resources.
• Reduced costs through the sharing of software resources such as the operating
system, databases, and tools.
A WPAR supports the control and management of its resources: CPU, memory, and
processes. This means that you can assign specific fractions of CPU and memory to each
WPAR. These controls are enforced by WLM running on the partition.
Most resource controls are similar to those supported by the Workload Manager. You can
specify shares_CPU which is the number of processor shares available for a workload
partition, or you can specify minimum and maximum percentages. The same is true for
memory utilization. There are also WPAR limits for run-away situations (for example: total
processes).
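Share-based limits are relative. In WLM, a class's share entitlement is its shares divided by the total shares of the active classes. The sketch below illustrates that arithmetic for WPARs. It assumes every listed WPAR is active and ignores any minimum or maximum percentages. The helper share_targets and the WPAR names are hypothetical.

```shell
# share_targets: hypothetical helper. stdin: "<wpar name> <CPU shares>" per line.
# Prints each WPAR's proportional CPU target: 100 * shares / total shares.
# Assumes all listed WPARs are active; min/max percentage limits are ignored.
share_targets() {
    awk '
        { name[NR] = $1; shares[NR] = $2 + 0; total += $2 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s %.1f\n", name[i], (total > 0 ? 100 * shares[i] / total : 0)
        }'
}
```

With two active WPARs holding 10 and 30 shares, the sketch prints 25.0 and 75.0. The first WPAR is targeted at a quarter of the CPU, and any part of that allocation it does not use remains available to the other WPAR.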
When you create a WPAR, a WLM class with the same name as the WPAR is created.
All processes running in the partition inherit this classification. You can view the statistics
and classes with the wlmstat command, which has been enhanced to display WPAR
statistics; for example, wlmstat -@ 2 shows the WPAR classes. Note that you cannot use
WLM inside a WPAR to manage its resources.
• System WPAR
– Autonomous virtual system environment
• Shared file systems (with the global environment): /usr and /opt
• Private file systems for the WPAR’s own use: /, /var and /tmp
• Unique set of users, groups, and network addresses
– Can be accessed through:
• Network protocols (for example: telnet or ssh)
• Log in from the global environment using the clogin command
– Can be stopped and restarted
• Application WPAR
Create and run
– Isolate an individual application
– Lightweight; quick to create and remove
• Created with wparexec command
• Removed when stopped
• Stopped when the application finishes
Stop and remove
Notes:
System workload partition
System workload partitions are autonomous virtual system environments with their own
private root file systems, users and groups, login, network space, and administrative
domain.
A system WPAR represents a partition within the operating system isolating runtime
resources such as memory, CPU, user information, or file system to specific application
processes. Each system WPAR has its own unique set of users, groups and network
addresses. The systems administrator accesses the WPAR through the administrator
console or through regular network tools such as telnet or ssh. Inter-process
communication for a process in a WPAR is restricted to those processes in the same
WPAR.
System workload partitions provide a complete virtualized OS environment in which multiple
services and applications run. Creating a system WPAR takes longer than creating an
application WPAR, because its file systems must be built. A system WPAR is removed only
when explicitly requested. It has its own root user, users, and groups, and its own system
services such as inetd, cron, syslog, and so forth.
A system WPAR does not share writable file systems with other workload partitions or the
global environment. It is integrated with Role Based Access Control (RBAC).
Application workload partition
• Normal WPAR except that there is no file system isolation
• Login not supported
• Internal mounts not supported
• Target: Lightweight process group for mobility
Application workload partitions do not provide the highly virtualized system environment
offered by system workload partitions, rather they provide an environment for segregation
of applications and their resources to enable checkpoint, restart, and relocation at the
application level.
The application WPAR represents a shell or an envelope around a specific application
process or processes that leverage shared system resources. It is lightweight (that is,
quick to create and remove, and it does not consume many resources) because it uses the
global environment's file systems and device resources. Once the application process or
processes are finished, the WPAR is stopped. The user cannot log in inside the application
WPAR using telnet or ssh from the global environment. If you need to access the
application in some way this must be achieved by some application-provided mechanism.
All file systems are shared with the global environment. If an application is using devices it
uses global environment devices.
The wparexec command builds and starts an application workload partition, or creates a
specification file to simplify the creation of future application workload partitions.
An application workload partition is an isolated execution environment that might have its
own network configuration and resource control profile. Although the partition shares the
global environment file system space, the processes running therein are only visible to
other processes in the same partition. This isolated environment allows process
monitoring and the gathering of resource usage, accounting, and auditing data for a
predetermined cluster of applications.
The wparexec command invokes and monitors a single application within this isolated
environment. The wparexec command returns synchronously with the return code of this
tracked process only when all of the processes in the workload partition terminate. For
example, if the tracked process creates a daemon and exits with the 0 return code, the
wparexec command blocks until the daemon and all of its children terminate, and then
exits with the 0 return code, regardless of the return code of the daemon or its children.
Instructor notes:
Purpose — Review the difference between a system WPAR and an application WPAR.
Details —
Additional information —
Transition statement — In the lab, we will be creating a relocatable system WPAR. To do
this, we need to be very clear about how shared and private file systems are accessed in
the WPAR.
• System WPAR
– /usr → namefs, nfs mount or local
– /opt → namefs, nfs mount or local
– /proc → namefs

{Marie} / # mount
  node     mounted        mounted over   vfs      date           options
  -------- -------------- -------------- -------- -------------- ---------------
           /dev/fslv01    /              jfs2     Sep 03 14:55   rw,log=INLINE
           /dev/fslv02    /home          jfs2     Sep 03 14:55   rw,log=INLINE
           /opt           /opt           namefs   Sep 03 14:55   ro
           /proc          /proc          namefs   Sep 03 14:55   rw
           /dev/fslv03    /tmp           jfs2     Sep 03 14:55   rw,log=INLINE
           /usr           /usr           namefs   Sep 03 14:55   ro
           /dev/fslv04    /var           jfs2     Sep 03 14:55   rw,log=INLINE
Notes:
Storage-level access in a system WPAR is primarily through a set of file systems assigned to
the WPAR at creation and mounted within the WPAR during activation. By default, a system
WPAR operates within a localized view of these file systems:
/
/usr
/opt
/tmp
/var
/home
Each WPAR must have a writable / (root) directory. The other system directories (/tmp,
/var, /home) may be simple subdirectories under that / directory or they may be separate
file systems mountable under /. The default storage model is to have each of these system
directories established as separate file systems mounted into the WPAR. These may also
be NFS-mounted from an NFS server.
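The mount listing in the figure can be read mechanically. namefs entries are directories of the global environment mounted into the WPAR (shared), and jfs2 entries are the WPAR's private file systems. The sketch below assumes the column layout shown in the figure (node, mounted, mounted over, vfs, a three-field date, options), with the node column empty for local mounts. wpar_fs_kinds is a hypothetical helper.

```shell
# wpar_fs_kinds: hypothetical helper. stdin: output of the AIX "mount" command
# run inside a system WPAR. Prints "<mounted over> <kind>" per mount, where
# kind is "shared" for namefs, "private" for jfs2, or the vfs name otherwise.
# Assumes a three-field date column and an empty node column for local mounts.
wpar_fs_kinds() {
    awk '
        $1 == "node" || $1 ~ /^-+$/ { next }   # skip the two header lines
        NF < 7 { next }                         # ignore anything that is not a mount entry
        {
            off = (NF >= 8) ? 1 : 0             # remote mounts carry a node name first
            over = $(2 + off)
            vfs = $(3 + off)
            if (vfs == "namefs")    kind = "shared"
            else if (vfs == "jfs2") kind = "private"
            else                    kind = vfs
            print over, kind
        }'
}
```

If you run mount | wpar_fs_kinds inside the WPAR, the listing in the figure would yield private for /, /home, /tmp, and /var, and shared for /opt, /proc, and /usr. An NFS-mounted /usr would be reported with its vfs name instead.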
Topic 2 objectives
After completing this topic, you should be able to:
• Describe WPAR Manager concepts and components
• Install WPAR Manager
• Access WPAR Manager GUI
• Create and manage WPARs from WPAR Manager
• Perform WPAR mobility and advanced operations
Notes:
IBM AIX 6.1 Workload Partition Manager (WPAR Manager) is a platform management
solution that provides a centralized point of control for managing workload partitions
(WPARs) across a collection of managed systems running AIX.
It is an optional product, part of the IBM Systems Director family, designed to facilitate the
management of WPARs and application mobility, and to provide advanced features
such as policy-based mobility, which automates WPAR relocation based on current
performance state. The Workload Partition Manager is a separate product, not part of
AIX.
By deploying the WPAR Manager, users are able to take full advantage of WPAR
technology by leveraging the following features:
• Basic life cycle management: Create, start, stop, and delete WPAR instances
• Manual WPAR mobility: User-initiated relocation of WPAR instances
• Creation and administration of mobility policies: User-defined policies governing
automated relocation of WPAR instances based on performance state
• Creation of compatibility criteria on a per-WPAR basis: User-defined criteria based on
compatibility test results gathered by the WPAR Manager
• Administration of migration domains: Creation and management of server groups
associated with specific WPAR instances, which establish which servers would be
appropriate as relocation targets
• Server profile ranking: User-defined rankings of servers for WPAR relocation based on
performance state
• Reports based on historical performance: Performance metrics gathered by WPAR
Manager for both servers and WPAR instances
• Event logs and error reporting: Detailed information related to actions taken during
WPAR relocation events and other system operations
• Inventory and automated discovery: Complete inventory of WPAR instances deployed
on all servers with WPAR Manager agents installed whether created by the WPAR
Manager or through the CLI on the local system console.
Workload Partition Manager helps with resource optimization. Physical servers can be
consolidated and deconsolidated dynamically. For application granularity, this allows for
more utilization of the already powerful virtualization (APV or PowerVM) capability of AIX
and System p.
For applications which require less than 1/10 of a processor to run, the WPAR approach
allows for their consolidation into a global LPAR that can distribute the workload at a finer
grain utilization of a CPU and other systems resources. This provides better use of systems
and future cost savings to the enterprise.
Instructor notes:
Purpose — Provide a basic definition of WPAR Manager function and value
Details — The WPAR Manager is a management system designed to provide a
centralized interface for administration of WPAR instances across multiple systems.
Additional information — The WPAR Manager feature improves flexibility of WPAR
management across several systems especially in a virtualized environment. Its main
benefit is automated workload balancing for WPARs across a farm of servers.
It may be considered as a software feature for WPAR virtualization in addition to hardware
PowerVM virtualization tools.
The WPAR Manager software product is part of the Director family.
The WPAR Manager product provides the Metacluster Checkpoint and Restart (MCR)
software required for WPAR relocation, and all other agents. It provides a built-in
database capability (using Apache Derby) and also a limited-use copy of DB2, which is
recommended for large environments.
Explain to students that the WPAR Manager licensed program product is not part of AIX.
That means WPAR relocation cannot be performed without the WPAR Manager LPP. The
MCR fileset (mcr.rte) provides the chkptwpar, restartwpar, and movewpar commands,
corresponding to checkpoint, restart, and relocation functions.
Transition statement — Let’s next look at the graphical user interface to the WPAR
Manager.
Notes:
WPAR Manager for AIX server is a Java application running on the management server.
The WPAR Manager user interface provides a browser-driven interface to the WPAR
management server. The user interface allows for the display of information that has been
collected through the agents, and also provides management capability such as creation,
deletion, relocation of WPARs, and so forth. The agent is based on Common Agent
Services (CAS) technology. Many of these tasks can also be accomplished from the
command line interface.
Automated WPAR mobility is another key to lowering cost through the optimization of
uptime: applications can be relocated during maintenance windows, or set up for proactive
failover when there are indications of degradation (predictive failure analysis).
This allows non-disruptive maintenance, with zero downtime for server fixes and upgrades,
through virtual server/application relocation. This clearly needs to be tested before going
into a production environment, and it is not a replacement for high availability software such
as HACMP or similar products.
WPAR Manager is an enhancement to the flexibility and power of the IBM UNIX story as it
becomes a more highly available solution. Other factors that promote AIX availability are
more dynamic allocation and reallocation along with the configuration of virtual servers,
storage, and network resources.
Optimization of performance: applications or virtual servers can be scaled up or down,
based on actual throughput demand and performance requirements. Sharing of application
text, kernel data and text, through the WPAR technology, improves efficiency of
partitioning.
To use your browser with the WPAR management console, you must use Firefox 1.5+ or
Internet Explorer (IE) version 6+, and JavaScript must be enabled in the browser. Since IE
does not have native support for Scalable Vector Graphic (SVG), the Adobe SVG plug-in is
needed, which can be downloaded from
http://www.adobe.com/svg/viewer/install/main.html.
[Figure: WPAR Manager components. WPAR Manager (resource manager) with database access; Agent Manager with its agent DB; WPAR Agent and MCR on each managed system/LPAR (discovery, mobility operations); NFS server providing NFS exports for mobility.]
Notes:
The figure shows the basic installation components configuration.
Deploying management software usually requires a server that hosts the management
software and an agent that has to be installed on each server that is to be managed.
The WPAR Manager is composed of three components:
• The WPAR Manager (resource manager) is the back-end part containing the database
and Web server. It is a server component.
• The Agent Manager handles communication between the WPAR Manager and the WPAR
clients (the managed LPARs). It is a server component.
• The WPAR Agent runs on the managed (client) LPARs. It is the client component.
To simplify the installation, by default, WPAR Manager and CAS Agent Manager are
installed on the same system (the management server) and CAS Agent and WPAR Agent
are installed on any server that will be managed.
The goal of Common Agent Services (CAS) is to minimize the complexity of software
management deployment by reducing the effort needed for deployment and using
system resources more effectively.
During WPAR Agent registration you have to provide the hostname of the CAS Agent
Manager. The WPAR Manager then instructs the WPAR Agent to send it the information, in
the format of an XML document, at a regular interval (default is 1 minute).
This is a system-wide value for all servers managed by the WPAR Manager.
The information received by the WPAR Manager is maintained in database tables. With
this information, the WPAR management console allows us to monitor various aspects of
the managed WPAR, such as:
• WPAR status:
• WPAR name
• Operational state
• Type
• Last modification time
Also, a significant amount of performance metrics are sent by the WPAR Agent to WPAR
Manager at a regular interval.
Communication between the client and manager agents also makes it possible to check the
health of applications running inside a WPAR:
• The administrator can provide scripts to check the health of the application running in
the WPAR.
Instructor notes:
Purpose — Cover the WPAR Manager components and what roles they play.
Details — WPAR Manager utilizes the information in the registry on the Agent Manager to
discover new managed systems.
The Agent Manager is the server component of the common agent services. It provides
authentication and authorization services, enables secure connections between managed
systems in your deployment, maintains the registry about the managed systems and the
software running on those systems. It also handles queries from the resource managers
against the database.
The agent manager has the following components:
• The agent manager service
The agent manager service serves as a certificate and registration authority to provide
authentication and authorization using X.509 digital certificates and the Secure Sockets
Layer (SSL) protocol. It also handles requests for registry information from common
agents and resource managers.
Resource managers and common agents must register with the agent manager service
before they can use its services to communicate with each other. This registration is
password protected and there are separate passwords for the common agents and
resource managers.
For WPAR Manager, you only need to specify the registration password for the
common agents. The password for resource manager is automatically generated during
the configuration of WPAR Manager.
• The registry
The registry is the database that contains the current configuration of all known
common agents and resource managers.
Some of the information contained in the registry is:
- The identity, digital certificates, and communication information for each resource
manager
- Basic configuration information for each common agent, for example, hardware type
and operating system version
- The status of each common agent
- The last error or, optionally, a configurable number of errors, reported by each
common agent
WPAR Manager listens on ports 14080 and 14443, and communicates with ports 9511, 9512,
and 9513 on the Agent Manager and with port 9510 on the WPAR Agent.
Notice that these are default ports which can be overridden by the user during the
configuration of WPAR Manager.
Notes:
This scenario is listed as an example with a completely new installation with no existing
CAS Agent Manager or any DB2 server in the environment.
It lists steps for installation of WPAR Manager, CAS Agent Manager, and (optionally) DB2
on the server machine.
Prerequisites
Required free space:
- /tmp: 175 MB
- /opt: 700 MB
- /home: 800 MB
- /var: 200 MB
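The free-space prerequisites can be checked before installation. This sketch assumes AIX df -m output, in which the Free column (in MB) is the third field and the mount point is the last. It only reports directories that are separate file systems. check_wpm_space is a hypothetical helper, and its thresholds are simply the values listed above.

```shell
# check_wpm_space: hypothetical helper. stdin is AIX "df -m" output; the Free
# column (MB) is field 3 and the mount point is the last field. Compares each
# listed prerequisite file system against the free-space values from the text.
check_wpm_space() {
    awk '
        BEGIN {
            need["/tmp"] = 175; need["/opt"] = 700
            need["/home"] = 800; need["/var"] = 200
        }
        NR > 1 && ($NF in need) {
            status = ($3 + 0 >= need[$NF]) ? "OK" : "SHORT"
            printf "%s %s free=%dMB need=%dMB\n", $NF, status, $3, need[$NF]
        }'
}
```

For example, you could run df -m /tmp /opt /home /var | check_wpm_space. If one of these directories is not a separate file system, df reports the file system that contains it. The sketch then prints nothing for that directory, so you would need to check its space manually.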
If you use DB2 instead of the provided default Apache Derby database, an
additional 2 GB of disk space is recommended.
Install filesets
During wparmgt.mgr.rte fileset installation, three prerequisites are also installed:
lwi.runtime, tivoli.tivguid, and wparmgt.cas.agentmgr.
Configure WPAR Manager
There are three modes in which WPAR Manager Configurator can be used:
i. Graphical mode (GUI - this is the default mode)
ii. Console mode (text)
iii. Quiet mode (use a response file)
Here is the command syntax to start each mode:
•/opt/IBM/WPAR/manager/bin/WPMConfig.sh
•/opt/IBM/WPAR/manager/bin/WPMConfig.sh -i console
•/opt/IBM/WPAR/manager/bin/WPMConfig.sh -i silent \
-f /opt/IBM/WPAR/manager/config/wpmInstall.properties
Console mode for text input is convenient because it can be started from any terminal
session, without a graphical display:
/opt/IBM/WPAR/manager/bin/WPMConfig.sh -i console
You are guided through several menus to enter parameters such as LOCALE variable,
communication ports, manager hostname, agent manager password.
Following actions are performed:
- Start CAS Agent Manager and WPAR Manager
- Register WPAR Manager to CAS Agent Manager
- Set WPAR Manager to autostart at reboot (/etc/inittab file)
- Set CAS Agent Manager to autostart at reboot (/etc/inittab file)
Once configured, the WPAR Manager daemon should be active. You can manage the
daemon by using the wparmgr command:
- To verify, use /opt/IBM/WPAR/manager/bin/wparmgr status
- To start, use /opt/IBM/WPAR/manager/bin/wparmgr start
- To stop, use /opt/IBM/WPAR/manager/bin/wparmgr stop
Once configured, the CAS Agent Manager daemon should also be active. You can
manage this daemon using the agentmgr command:
- To verify, use /opt/IBM/WPAR/manager/bin/agentmgr status
- To start, use /opt/IBM/WPAR/manager/bin/agentmgr start
- To stop, use /opt/IBM/WPAR/manager/bin/agentmgr stop
You can also use a Web browser to verify the installation by checking that it can connect
to both CAS Agent Manager and WPAR Manager.
- To verify the connection to the CAS Agent Manager: http://<WPAR Manager hostname>:9513/AgentMgr/Info
- To verify the connection to the WPAR Manager: http://<WPAR Manager hostname>:14080/ibm/console
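The same two checks can be scripted. This is a sketch only. The helper names are hypothetical, and it assumes that `curl` is installed, which is not part of a base AIX install. Alternatively, you can run it from any workstation that has `curl`. A 2xx or 3xx status (for example, a redirect from the console page) suggests that the service answered. `000` means no connection was made.

```shell
#!/bin/sh
# wpm_check_urls HOST  (hypothetical helper)
# Prints the two verification URLs from the text, using the default ports.
wpm_check_urls() {
    echo "http://$1:9513/AgentMgr/Info"
    echo "http://$1:14080/ibm/console"
}

# probe_urls HOST  (hypothetical helper)
# Requests each URL with curl and prints the HTTP status code next to it;
# curl prints 000 for the status code when no connection could be made.
probe_urls() {
    for url in $(wpm_check_urls "$1"); do
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
        echo "$code $url"
    done
}
```

For example, `probe_urls wpm1` (where `wpm1` is a placeholder host name) prints one status line per URL.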
Installing and configuring (optional) DB2
This step is not necessary if you configure WPAR Manager to use the default
Apache Derby database that comes with it. For environments with large numbers of
WPARs, a DB2 database is recommended.
Install the wparmgt.db fileset. This is a limited-use packaging of DB2 for use with WPAR
Manager.
Optionally, you could use an existing DB2 9.1 instance. In that case, you need to work
with the database administrator to create and populate the database catalog and
schema.
Run /opt/IBM/WPAR/manager/db/bin/DBInstall.sh.
You can specify the following options:
-dbinstallerdir <MOUNT_POINT/db2>
-dbpassword <db2wmgt user password>
The following actions are performed by the DBInstall.sh script:
- Verify that port 50000, which will be used for DB2, is not already in use.
- Verify that there is enough space in /tmp and /opt/IBM/WPAR/manager/db2.
- Run db2setup to install DB2.
- Verify that there is enough space in /home/db2wmgt (instance owner home).
- Run db2isetup to create the DB2 instance db2wmgt and the database WPARMGTDB.
- Create and populate tables, indexes, views, and triggers that WPAR Manager will
use.
- Set the database to automatically start when the system starts.
You can view detailed information in the log file
/var/opt/IBM/WPAR/manager/logs/install/WPMDBI.log.
# cd /opt/IBM/WPAR/agent/bin
# ./configure-agent -amhost <serverIP> -prompt
Notes:
Hardware and software requirements
Any system running AIX 6.1 can run a WPAR Agent. WPAR Manager version 1.2
on the server system can work with a WPAR Agent at either version 1.1 or version
1.2. However, the WPAR Manager 1.2 enhanced capabilities (especially enhanced live
relocation) are only available with the version 1.2 agent. WPAR
Manager 1.2 requires at least AIX 6.1 TL2 (preferably with the most recent service pack).
Install the packages
Select the wparmgt.agent and mcr.rte packages for installation. The prerequisite
software is normally installed by default when the operating system is installed.
Note that, besides wparmgt.agent, three other prerequisite filesets are
also installed:
wparmgt.cas.agent: this fileset contains the CAS Agent function.
tivoli.tivguid (co-requisite for wparmgt.cas.agent): GUID is a 32-hexadecimal digit
number that is used to uniquely identify a common agent or a resource manager in the
Verifying
Once the agent has been configured, you can verify that the agent daemon is running by
using the following command:
# wparagent status
The real test is to use the WPAR Manager Web interface to discover the managed
systems. Successful discovery of the agent completes the validation and enables
WPAR Manager to create, activate, and relocate WPARs on the agent system.
Instructor notes:
Purpose — Cover the steps involved in the agent installation and configuration.
Details —
Additional information —
The common agent has a “heartbeat” function that sends periodic status and configuration
reports to the agent manager.
The frequency of this update can be configured, or the update can be turned off.
The common agent functionality for WPAR Manager is in the fileset wparmgt.cas.agent.
WPAR Agent listens on port 9510 and communicates with ports 9511, 9512, and 9513 on the
CAS Agent Manager.
Note that these are default ports, which can be overridden by the user during the
configuration of WPAR Manager.
Transition statement — Now that we have installed and configured the software, we are
ready to use the WPAR Manager graphical interface. To do this, you will need to log in to the
WPAR Manager. Let’s look at some of the possible roles that can be assigned to an AIX
user who is logging in.
• Authentication
– Any user with a user ID and password on the local AIX system hosting the WPAR
Manager application can authenticate to WPAR Manager, but the actions
available in the interface differ depending on the role assigned to the user
• WPAR Manager roles
– Administrator: Can define roles for other users
– WPARAdministrator: Provides access to all WPAR Manager management actions
– WPARUser:
• Provides access to all basic WPAR actions
• Does not provide access to high-level administrative tasks:
• Discovering, modifying, and deleting managed systems
• Creating or modifying relocation policies
• Modifying general WPAR settings
– WPARMonitor
• Provides read-only access to managed systems, WPARs, and WPAR groups
• Does not allow you to make any changes to the environment
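The role summary above can be read as a small permission table. The following is an illustration only, not how WPAR Manager stores or checks roles. The task names are invented labels for the actions on the slide. The table encodes only what the slide states, plus one assumption: that WPARUser can also view resources.

```shell
#!/bin/sh
# role_allows ROLE TASK  (illustration only, not part of WPAR Manager)
# Task labels (invented for this sketch):
#   view           - view managed systems, WPARs, and WPAR groups
#   basic          - basic WPAR actions
#   manage_systems - discover, modify, or delete managed systems
#   policies       - create or modify relocation policies
#   settings       - modify general WPAR settings
#   define_roles   - define roles for other users
role_allows() {
    case "$1:$2" in
        Administrator:define_roles) return 0 ;;
        WPARAdministrator:view | WPARAdministrator:basic | \
        WPARAdministrator:manage_systems | WPARAdministrator:policies | \
        WPARAdministrator:settings) return 0 ;;
        WPARUser:view | WPARUser:basic) return 0 ;;  # "view" is assumed
        WPARMonitor:view) return 0 ;;
        *) return 1 ;;
    esac
}

# user_allows TASK ROLE...
# A user holding several roles (root holds Administrator and
# WPARAdministrator) may perform TASK if any one of the roles allows it.
user_allows() {
    task=$1; shift
    for role in "$@"; do
        role_allows "$role" "$task" && return 0
    done
    return 1
}
```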
Notes:
Some additional configuration may be needed after installation, such as creating additional
user accounts for WPAR Manager access.
You can assign the appropriate role to each AIX user.
During WPAR Manager installation, the root user is mapped to the Administrator and
WPARAdministrator roles.
ID-to-application role mappings can be performed either with the lwiMapRole.sh script or
in the user interface, using the Console User Authority window.
Full accessibility support for screen readers should be enabled from the Configure WPAR
Manager > User Preferences panel.
To uninstall WPAR Manager, perform the following steps:
1. Connect as root to the server partition and remove the WPAR Manager and CAS Agent
Manager filesets using SMIT.
2. Run “/cdrom/db2/DBUninstall.sh db2wmgt” to remove the DB2 database.
3. Connect as root to the client partition and remove the WPAR Agent filesets using SMIT.
See the installation slides to determine the list of filesets to uninstall.
You can determine the WPAR Manager and Agent versions by looking at the following files:
• /opt/IBM/WPAR/manager/version.properties
• /opt/IBM/WPAR/agent/version.properties
License files are located in /usr/swlag/<Locale>/WPARManager_110* and
/opt/mcr/mcr.rte.copyright.
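The version files are plain text, so `cat` shows their full contents. To extract a single value in a script, you can use a small properties reader like the one below. It is a sketch under stated assumptions: the file is assumed to use simple `key=value` lines, and the helper name `prop_get` is hypothetical.

```shell
#!/bin/sh
# prop_get FILE KEY  (hypothetical helper)
# Prints the value of KEY from a Java-style .properties file, assuming
# simple "key=value" lines. Lines starting with '#' or '!' are comments.
prop_get() {
    awk -v k="$2" '
        /^[ \t]*[#!]/ { next }
        {
            i = index($0, "=")
            if (i == 0) next
            name = substr($0, 1, i - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) {
                val = substr($0, i + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                print val
                exit
            }
        }' "$1"
}
```

First list the file to see which keys it actually contains. The key name in `prop_get /opt/IBM/WPAR/manager/version.properties version` is only a guess.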
Functional components (figure: WPAR Manager managing LPAR 1 and LPAR 2):
• Basic management
• Relocation
• Compatibility
• Global load balancing
• Recovery
Notes:
The basic management features are used for standard operations such as create, view,
modify, start, stop, and remove.
WPAR Manager also provides a deployment operation, which copies a WPAR
definition to a managed server.
The deployment option enables you to build a WPAR profile on the WPAR Manager with or
without creating it on the client LPAR.
Global load balancing:
By automating the relocation of a WPAR to a managed system that is better suited to its
current workload, WPAR Manager effectively load balances all systems under its
control. This achieves not only better performance for each application, but also better
utilization of the entire IT enterprise.
The global load balancing feature of the WPAR Manager is based on the concepts of
WPAR group, server group, server ranking profile, and WPAR relocation policy.
The balancer component can be used in semi-automatic or automatic mode for relocation.
Manual mode is also available.
With manual relocation, WPAR Manager assists the administrator by providing
WPAR and LPAR performance metrics and by automatically handling compatibility checks.
The WPAR Manager workload balancer has three major functions: relocation
analysis, relocation workflow management, and relocation recovery.
This component uses the collected monitoring information to analyze the current
utilization of all WPARs and determine whether any WPAR needs to be relocated. If
there are multiple relocation events, it also prioritizes their order. It selects the
most appropriate managed system as the relocation target, based on the user-defined
policy for each WPAR group. In short, this component determines which WPAR to relocate,
when to relocate it, and where to relocate it.
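WPAR Manager's actual target selection is driven by server ranking profiles and relocation policies. Those details are not covered here, and the sketch below does not reproduce them. As a toy illustration of the "where" part of the decision only, it filters candidate systems by compatibility and picks the least-loaded one. The input format is invented for this example.

```shell
#!/bin/sh
# pick_target  (toy illustration, not WPAR Manager's algorithm)
# Reads lines of the form "<system> <compatible:yes|no> <cpu_percent>"
# and prints the compatible system with the lowest CPU utilization.
# On a tie, the first such system wins. Returns 1 if no compatible
# system is listed.
pick_target() {
    awk '$2 == "yes" && (best == "" || $3 + 0 < min) { best = $1; min = $3 + 0 }
         END { if (best == "") exit 1; print best }'
}
```

For example, given `lpar1 yes 80`, `lpar2 no 10`, and `lpar3 yes 35`, it picks `lpar3`. `lpar2` is less loaded but is not compatible, which mirrors why compatibility checks come before any balancing decision.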
Note that Partition Load Manager (PLM) moves idle resources among the LPARs of a
single server.
Monitoring and reporting:
The monitoring and reporting feature collects performance metrics and stores them in the
database.
This information includes:
• WPAR Agent's GUID. WPAR Manager uses the GUID to identify which client the
information comes from.
• Global environment performance metrics
• A list of WPARs
• Performance metrics for each active WPAR
This not only allows the administrator who uses the WPAR Management console to
monitor the performance of each WPAR and managed system, but also enables the WPAR
Manager to know which WPAR may need to be relocated and which logical partition or
managed system is the most suitable candidate to host the WPAR.
Instructor notes:
Purpose — Identify a list of functions provided through the WPAR Manager interface.
Details — Do not preteach all of the items here. This is mostly a list of what is covered on
the following visuals. The exception is the load balancing function. For that, just briefly
define what it is and explain that:
a. We do not have time to cover the configuration of that capability
b. Using load balancing requires an extensive understanding of AIX performance
issues.
Additional information —
Transition statement — Let’s take a look at the first item in this list.
Basic management
IBM Power Systems
Notes:
Open your browser of choice and enter https://<IP address or name of server>:14443/ibm/console.
This opens the GUI at the user login page.
Enter the user name and password and click the Log in button.
Clicking Guided Activities opens a drop-down menu with two options:
1. Create Workload Partition
2. Create WPAR Group
The first option, Create Workload Partition, opens a new page, which is the entry page to
the wizard. The welcome page of the wizard lets you choose between
the default wizard interface and the advanced interface (which lets you jump between
the tasks using tabs). Next, the tasks for this activity are listed.
The following panels then guide you through all the options and parameters to define the
WPAR.
Instructor notes:
Purpose — Discuss basic management using the WPAR Manager interface.
Details — While the visual focuses on the Guided Activities menu, most of the basic
management actually occurs in the Resource Views. There, you can select a managed
system or a WPAR and select the action to perform against the selected entity. Using a
resource view is best covered on a later visual.
For this visual, the main focus is actually the use of the Guided Activities to invoke the
Create Workload Partition guided wizard.
Additional information —
Transition statement — Let’s see what we get if we click the Create Workload Partition
menu item.
Creating a WPAR
Welcome
General
Filesystems
Options
Network
Routing
Resource Controls
Security
Advanced settings
Summary
Notes:
After you select Create Workload Partition, this task list appears, and you are guided
through all the steps for defining the WPAR properties.
Using the wizard, you can:
• Provide a name and description for the new partition
• Select whether this will be a system or application partition
• Specify whether the partition can be relocated from one system to another
• Set up network addresses and settings
• Set up WPAR properties, Role Based Access Control (RBAC), resource controls
• Review or change settings for file systems and paths
• Choose whether to deploy and start the partition immediately or at a later time
Instructor notes:
Purpose — Illustrate the start of the Create Workload Partition wizard.
Details — We will not cover the details of using the wizard in this lecture. Instead, the lab
exercise will walk them through the process, step-by-step. Note that they should already be
familiar with using the AIX command line interface to define and activate a new WPAR.
Additional information —
Transition statement — Once we have created a WPAR, we will want to monitor its
resource usage.
Notes:
Performance metrics are sent by the WPAR Agent to WPAR Manager at a regular interval.
This not only allows the administrator who uses the WPAR Management console to
monitor the performance of each WPAR and managed system, but also enables the WPAR
Manager to know which WPAR may need to be relocated and which logical partition or
managed system is the most suitable candidate to host the WPAR.
Instructor notes:
Purpose — Explain how WPAR Manager collects and displays metrics.
Details —
Additional information —
Transition statement — Most of the management of workload partitions will be done by
locating an existing WPAR on a resource view and selecting an action to be performed.
Let’s see what that looks like.
Resources view
Notes:
To see the defined resources or to discover others, click Resource Views in the upper-left
corner.
It has a drop-down menu with three options: Managed Systems, Workload Partitions, and
Workload Groups.
Each view provides a list of known resources in that category. For example, the Workload
Partitions view has a list of known workload partitions. From a list, you can select a
resource, such as a particular WPAR, and then select an action from the Actions menu. For
example, you can activate, deactivate, or remove a known workload partition.
Instructor notes:
Purpose — Describe WPAR Manager Resource Views.
Details —
Additional information —
Transition statement — One of the most important WPAR Manager abilities is to manage
the relocation of workload partitions. We can select a known workload partition from the
Workload Partitions resource view and then request a relocation.
Notes:
The relocate wizard prompts you for relocation options and then manages the relocation
process with minimal effort on the part of the administrator. It will suggest a compatible and
optimal target system, but allows you to pick your own.
Best practice is to first test for compatibility before attempting relocation. The Compatibility
item is directly under the Relocate item in the actions menu. Compatibility analysis
determines whether it is safe to relocate a WPAR from one machine to another. Both software and
hardware compatibility tests are run, including both critical and optional test cases.
Even if we do not pretest for compatibility, the relocation wizard automatically verifies
compatibility before executing a relocation.
Details of the relocation process and compatibility rules are covered in the next lecture
topic.
Instructor notes:
Purpose — Explain how WPAR Manager can initiate a WPAR relocation.
Details — Do not preteach too much here. The next topic goes into the details of WPAR
relocation.
Additional information —
Transition statement — Regardless of what specific activities we invoke in WPAR
Manager, it is useful to be able to track the progress of that action, or later be able to
examine a log to investigate any problems.
Notes:
There are three basic categories of tracking information available with the WPAR Manager
components.
- Detailed task monitoring
- Log information from the various components
- Performance metrics
Task monitoring is easily available through the WPAR Manager interface. It can be
viewed either while the activity is in progress or at any time after the activity has completed.
It lists the task history, with the status of each task. The task details for a given task list
the operations that implement that task. For some operations, you can view the
command executed, along with any standard output (STDOUT) and standard error (STDERR) it wrote.
The logs are mostly for detailed problem determination when an activity fails. If you are working
with AIX Support on a problem, the support staff will likely ask for these logs to be collected
and included in the snap. There are separate logs for each component, and they must be collected
from several servers. The logs are not easy to read; some of them are in a tagged HTML
format.
The performance metrics are supported by the WPAR Manager GUI. WPAR Manager
collects the performance metrics and can display them later. Graphing the metrics is
especially useful.
Notes:
This lists the locations of the logs for the three main WPAR Manager components.
The WPAR Agent Manager log files are on the WPAR Manager server system.
The WPAR Agent and Common Agent log files are on the systems managed by the
WPAR Manager server. Remember that when you diagnose a WPAR relocation problem,
there are two platforms with agents working to implement the relocation activity.
Topic 3 objectives
After completing this topic, you should be able to:
• Explain the role of Application Mobility
• Explain the NFS role in Live Application Mobility (LAM)
• List the LAM requirements for the WPAR and the logical partitions, and validate that the requirements are met
• Migrate a live system WPAR from one logical partition to another
• Explain WPAR Manager support for static relocation
© Copyright IBM Corporation 2009
Notes:
Application mobility
Notes:
Outage avoidance
Hardware components of an IT infrastructure might need to undergo maintenance
operations that require the component to be powered off. If an application is not part of a
cluster of servers designed to provide continuous availability, hosting it in a WPAR
can help to reduce interruptions of availability. Using the live application mobility
feature, the applications that are executing on a physical server can be temporarily
moved to another server for the time required to perform the physical maintenance
operations, without an application blackout period.
Workload sizing and balancing
Using the mobility feature of WPARs, server sizing and planning can be based on
the overall resources of a group of servers, rather than being performed server by
server. It is possible to allocate applications to one server up to 100% of its resources.
When an application grows and requires resources that the server can no longer provide,
the application can be moved to a different server with spare capacity.
Notes:
Live relocation
Live relocation, as implemented under WPAR Manager version 1.1 (and still
supported by WPAR Manager 1.2), uses the checkpoint command to pause the WPAR
and capture all of its state information in a collection of state files. Not only do the
private file systems need to be on an NFS server common to both the source and target
systems, but the state files are also passed through the NFS server. On the target
system, a clone of the source WPAR must be defined, and the state files are then used to
restart the WPAR.
The time between WPAR pause and WPAR restart is long enough that connections
from application clients or peers can time out. On the other hand, the supporting
commands are fully documented, and some administrators find checkpoint-based
live relocation to be more reliable than enhanced live relocation.
While WPAR Manager version 1.2 still supports the command line implementation, the
WPAR Manager GUI uses enhanced live relocation.
Enhanced live relocation
WPAR Manager 1.2 has improved the technology used to relocate active WPARs. The
command line interface implementation has fewer steps. The enhanced live relocation
is also referred to as asynchronous live relocation, because of the use of memory
transfer technologies (similar to what is used in live partition mobility). The important
result of the enhancement is a much shorter period of application freeze, thus avoiding
most connection outages.
The WPAR Manager GUI orchestrates the agents to carry out the relocation using the
MCR commands. The main command is the movewpar command. While it is possible
to use a command line interface on the source and target LPARs to implement
enhanced live relocation, the movewpar command is not officially documented.
Information on how you might use the movewpar command is provided only in the
WPAR Redbook. The intent is that you use the WPAR Manager GUI.
In both live relocation and enhanced live relocation, the WPAR processes (when they
restart) expect to be in an execution environment that looks the same on the target as it
did on the source system. This expectation is expressed as a series of compatibility
requirements between the source and target systems.
Static relocation
If there is no requirement for the WPAR to stay active with its applications running
during relocation, then you can relocate a WPAR without using WPAR Manager.
You simply need to save the WPAR on the source system and restore it on
the target system.
In WPAR Manager version 1.2, the WPAR Manager GUI implements this type of relocation
with a single request. This is referred to as a static relocation. The important
requirements are that the private file systems must be local (rather than NFS
mounted) and that the WPAR backup files must be on an NFS server common to both
source and target.
Since there are no running WPAR processes in a static relocation, the compatibility
requirements are much less strict than in the live relocation scenario.
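The manual save-and-restore route can be laid out as a command sequence. The helper below is hypothetical and only prints the commands, without running them, so that you can review them first. It assumes that the backup directory is an NFS directory mounted at the same path on both global environments. The image file name is arbitrary. Check the `savewpar` and `restwpar` man pages for your AIX level before relying on these flags.

```shell
#!/bin/sh
# plan_static_relocation WPAR SOURCE TARGET BACKUP_DIR  (hypothetical helper)
# Prints, without running them, the AIX commands for a manual static
# relocation. BACKUP_DIR is assumed to be an NFS directory mounted at the
# same path on both global environments.
plan_static_relocation() {
    wpar=$1; src=$2; tgt=$3; image=$4/$1.savewpar
    echo "# On $src: stop the WPAR and save it to the shared directory"
    echo "stopwpar $wpar"
    echo "savewpar -f $image $wpar"
    echo "# On $tgt: re-create the WPAR from the saved image and start it"
    echo "restwpar -f $image"
    echo "startwpar $wpar"
    echo "# On $src, after checking the WPAR on $tgt: remove the original"
    echo "rmwpar $wpar"
}
```

For example, `plan_static_relocation wpar1 lpar1 lpar2 /mnt/wparbackups` prints the sequence for a WPAR named `wpar1`. All of these names are placeholders. Stopping the WPAR before the save is a conservative choice, because the applications do not need to stay up in a static relocation.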
Instructor notes:
Purpose — Compare the different types of relocation support.
Details —
Additional information —
Transition statement — Let’s talk about the importance of compatibility between the
source and the target systems.
Compatibility issues
• Software compatibility
– WPAR Manager agent levels
– Global environment AIX operating system levels
– Any other binaries which are in a namefs mounted file system
• Hardware compatibility
– Server processor type
– Devices and hardware features
Notes:
WPAR mobility across systems requires the departure and arrival systems to be
compatible. This includes the software and the hardware compatibility.
Software compatibility
The software levels on both the departure and arrival systems must match. This is
essential because the application binaries are not saved in the checkpoint state;
instead, the application is restarted using the arrival system's application binaries.
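A quick pre-check of the global environment AIX levels can be done by comparing `oslevel -s` output from both systems. This sketch covers only that one item from the compatibility list. WPAR Manager's own compatibility test remains the authoritative check. The helper name is hypothetical, and the level strings used in the example are made up.

```shell
#!/bin/sh
# compare_oslevel SOURCE_LEVEL TARGET_LEVEL  (hypothetical helper)
# Compares two "oslevel -s" strings, whose format is release-TL-SP-build
# (for example 6100-02-03-0909), and reports whether they match. On a
# mismatch it shows the release, TL, and SP of each side and returns 1.
compare_oslevel() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
        return 0
    fi
    echo "$1 $2" | awk '{
        split($1, s, "-"); split($2, t, "-")
        printf "mismatch: source %s TL%s SP%s, target %s TL%s SP%s\n", s[1], s[2], s[3], t[1], t[2], t[3]
    }'
    return 1
}
```

Assuming SSH access to the target, you could run `compare_oslevel "$(oslevel -s)" "$(ssh lpar2 oslevel -s)"` on the source, where `lpar2` is a placeholder name.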
Hardware compatibility
The hardware characteristics of the departure and arrival system must be compatible. This
ensures that an application that is aware of system hardware characteristics will continue
to see the same features after migration to the remote system.
For WPAR Live Application Mobility, each machine or LPAR needs to be configured in the
same way for the WPAR. This includes:
• The file systems needed by the application
• Live partition mobility
– Operating system, applications, and services are not stopped during the process
– Requires POWER6, AIX 5.3, HMC, and VIO server
• Live application mobility: Moving a workload partition from one server to another
– Without requiring the workload running in the WPAR to be restarted
– Provides outage avoidance and multi-system workload balancing
(Figure: on one side, logical partitions P1 to P5 on servers with VIOS, connected by a network; on the other, workload partitions such as Billing, Data Mining, EMail, App Srv, Test, Training, Web, and Dev on AIX # 1, AIX # 2, and AIX # 3, moved according to a policy.)
Figure 10-25. Live partition mobility versus live application mobility AN151.0
Notes:
Overview
Live Partition Mobility and Live Application Mobility are capabilities that enable users to
move workloads between systems with no (or limited) application downtime. Both types
of mobility allow organizations to move workloads from busy servers to less busy ones
to improve overall performance and system utilization (based on requirements
at a particular time). They can also be used to open a maintenance window on a
machine without necessarily requiring any application downtime. This is accomplished
by moving the work (either WPARs or entire LPARs) off the machine needing
maintenance and then returning the work to that same machine after the
maintenance is completed.
The only interruption of service would be due to network latency. If sufficient bandwidth
is available, a delay of at most a few seconds can typically be expected.
Live Application Mobility
Application Mobility is a capability that allows a client to relocate a running WPAR from
one system to another, without requiring the workload running in the WPAR to be
restarted. Application Mobility is intended for use within a data center and requires the
use of a new Licensed Program Product: the IBM AIX Workload Partitions Manager.
WPARs differ significantly from Live Partition Mobility in that Live Partition Mobility is a
feature of POWER6 processors. As such, it can be used with operating systems other
than AIX 6, such as Linux or earlier AIX versions. In contrast, Workload Partitions is a
feature specific to AIX 6 and can run on a variety of hardware (for example,
POWER6, POWER5, or POWER5+ systems).
Live Partition Mobility
Partition mobility enables the movement of full partitions between systems, which not
only enables better optimization of your IT environment by balancing workload, it also
helps to eliminate the need for planned outages for system upgrades.
An active migration moves the definition of a logical partition from one system to
another along with its network and disk configuration. The operating system, the
applications, and the services they provide are not stopped during the process. The
physical memory content of the logical partition is copied from system to system
allowing the transfer to be imperceptible to users. During an active migration, the
applications continue to handle their normal workload. Disk data transactions, running
network connections, user contexts, and the complete environment are migrated without
any loss, and migration can be activated at any time on any production partition.
Instructor notes:
Purpose — Explain the difference between Live Partition Mobility and Live Application
Mobility.
Details —
Additional information —
Transition statement — How is it possible to move a WPAR between systems without
disrupting the execution of the programs?
1. Issue movewpar
2. Send WPAR spec file to target and save page and segment table on NFS
3. Target receives spec file, creates WPAR. State: T
4. Get WPAR memory page and segment table from NFS
5. Source state: T -> M
6. Transmit memory data
7. Source state: M -> T
8. Transfer complete, target state T -> A
9. Source state T -> D
Notes:
WPAR enhanced live mobility is implemented as follows:
1. Our WPAR is active, and we issue movewpar on the target (or Arriving) system. The
WPAR changes state to T and starts the move processes.
2. We send our WPAR spec file to the Arriving system. At the same time, we start saving
page and segment table information on our NFS server.
3. When our Arriving system receives the spec file, it creates the WPAR. When this is
done, its state changes to T.
4. The Arriving system is now ready to receive memory data from the source (or Departing)
system, and it can get page and segment table information from the NFS server.
5. While this happens on the Arriving system, our Departing system changes state from T
to Moving (M).
6. The M state is only shown while our memory data is transmitted to the Arriving system.
7. As soon as this is finished, the state changes back to T.
8. When our Arriving system has received all the needed data, it starts the WPAR and changes
state from T to Active (A).
9. At the same time, our Departing system changes state to Defined (D).
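The state changes in steps 1 to 9 form a small state machine on each side. The sketch below encodes only the transitions listed above, as an illustration; it is not an interface of WPAR Manager or of movewpar. The label `none` is an invented stand-in for a WPAR that does not yet exist on the arriving system. The sketch also simplifies the arriving side by going straight from creation to T.

```shell
#!/bin/sh
# valid_transition SIDE FROM TO  (illustration only)
# SIDE is "departing" or "arriving". State letters follow the notes above:
# A = Active, T = Transitional, M = Moving, D = Defined. "none" stands for
# a WPAR that does not yet exist on the arriving system.
valid_transition() {
    case "$1:$2,$3" in
        departing:A,T | departing:T,M | departing:M,T | departing:T,D) return 0 ;;
        arriving:none,T | arriving:T,A) return 0 ;;
        *) return 1 ;;
    esac
}

# check_sequence SIDE STATE STATE...
# Checks each consecutive pair of states and reports the first invalid one.
check_sequence() {
    side=$1; shift
    prev=$1; shift
    for next in "$@"; do
        if ! valid_transition "$side" "$prev" "$next"; then
            echo "$side: invalid transition $prev -> $next"
            return 1
        fi
        prev=$next
    done
    return 0
}
```

`check_sequence departing A T M T D` and `check_sequence arriving none T A` both succeed. A jump such as `departing A M` is rejected, because the departing WPAR must pass through T first.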
1. On the NFS server, create and export the WPAR file systems
2. Create the WPAR on the source system (with the checkpointable flag)
3. Start and use the WPAR
4. Identify a compatible target system
5. Invoke the Relocation task for the WPAR to the target system
(Figure: WPAR Mgr relocates WPAR ServerA from the source system to the target system; the WPAR file systems reside on an NFS server.)
Figure 10-27. Steps for WPAR enhanced live mobility (WPAR Mgr GUI) AN151.0
Notes:
Here is an overview of the steps needed to create a checkpointable workload partition
and then relocate it from system 1 to system 2.
1. Create the file system structure on the NFS server for the WPAR:
- /
- /tmp
- /home
- /var
(and optionally /usr and /opt; not recommended)
Export the file systems with root access to both of the global environments and to the
WPAR.
2. Create the WPAR as checkpointable (mkwpar -c).
3. Start the WPAR (startwpar).
4. Identify a compatible target system (WPAR Manager will provide a list of registered
systems and their compatibility with the global system of the WPAR).
Alternatively, prior to attempting relocation, you can select the WPAR and click the
Compatibility item in the Action menu. This will list the known managed systems and
their compatibility status for the selected WPAR.
If you attempt a relocation to an incompatible system, WPAR Manager will fail the
relocation and identify the compatibility issue.
5. Select the WPAR in the WPAR Manager GUI and select the Relocate task from the
actions menu. WPAR Manager will manage the entire relocation process and provide a
listing of the steps taken and the status of the relocation task.
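Steps 1 to 3 are performed from the command line. The hypothetical helper below prints, without running them, the commands for those steps, so that the plan can be reviewed before it is run by hand. All names, paths, and the IP address are placeholders. The `/etc/exports` entries use the AIX `-root=host:host` form to give root access to both global environments and to the WPAR. The `mkwpar` flags used are `-c` (checkpointable), `-n`, `-h`, `-N address=`, and `-M directory= vfs=nfs host= dev=`. Verify them against the `mkwpar` man page for your AIX level.

```shell
#!/bin/sh
# Placeholder names (hypothetical): NFS server, the two global
# environments, the WPAR name and its IP address, and the export path.
NFS_SERVER=nfssrv
SRC=lpar1
TGT=lpar2
WPAR=wpar1
WPAR_IP=10.1.1.50
EXPORT_BASE=/export/wpars/$WPAR

# plan_checkpointable_wpar: prints, without running them, the commands
# for steps 1 to 3 above.
plan_checkpointable_wpar() {
    echo "# 1. On $NFS_SERVER: create and export the WPAR file systems"
    for fs in root tmp home var; do
        echo "mkdir -p $EXPORT_BASE/$fs"
        # AIX /etc/exports entry giving root access to both global
        # environments and to the WPAR itself.
        echo "echo '$EXPORT_BASE/$fs -root=$SRC:$TGT:$WPAR' >> /etc/exports"
    done
    echo "exportfs -a"

    echo "# 2. On $SRC: create the checkpointable WPAR on NFS file systems"
    cmd="mkwpar -c -n $WPAR -h $WPAR -N address=$WPAR_IP"
    for fs in root tmp home var; do
        if [ "$fs" = root ]; then dir=/; else dir=/$fs; fi
        cmd="$cmd -M directory=$dir vfs=nfs host=$NFS_SERVER dev=$EXPORT_BASE/$fs"
    done
    echo "$cmd"

    echo "# 3. On $SRC: start the WPAR"
    echo "startwpar $WPAR"
}
```

Printing the plan instead of running it is deliberate. The NFS export step runs on a different host from the `mkwpar` step, and a missing root-access entry is a common reason for a later relocation failure.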
Instructor notes:
Purpose — Cover how to relocate a WPAR using the WPAR Manager GUI.
Details —
Additional information —
Transition statement — Let’s look at the steps the wizard goes through to relocate an
active WPAR.
Source (departure) system:
- Verify that the WPAR is active.
- Lock the WPAR on the source.
- Pause the WPAR and send the WPAR state.
- Remove the WPAR.
Target (arrival) system:
- Verify that the WPAR does not exist.
- Lock the WPAR name on the target.
- Receive the WPAR state from the departure server.
Notes:
When all operations are run through the guided Relocation wizard, the WPAR Manager
orchestrator automatically performs all the steps of the workflow.
The relocation is performed with minimal application downtime and minimal interruption to the end
user.
WPAR Manager uses a WPAR lock mechanism during the relocation process. Locks are
not used for manual relocation performed from the command line. Details about the
relocation steps were described in the previous WPAR topic.
Relocation process details and performed steps can be checked from the monitoring task
menu.
Instructor notes:
Purpose — Cover the steps in relocating an active WPAR, using the wizard.
Details —
Additional information —
Transition statement — We can track these steps in the WPAR Manager graphical
interface in the Task Details panel. Let’s see what this looks like after a successful
relocation.
Notes:
During or after a relocation task, you can examine the individual operations and their status
by using the WPAR Manager Task Details panel, as shown here.
If there is any problem with the relocation, the first point of failure in the workflow will be
identified as the failed operation.
Instructor notes:
Purpose — Illustrate the ability to track the tasks during a relocation.
Details — Do not spend much time on this. Just note that this can be an invaluable tool if
there are problems in completing a relocation.
Additional information —
Transition statement — We just looked at what a successful relocation would look like.
What if the relocation task failed? How would we diagnose the problem?
Notes:
If there is a problem with the relocation task, the Task Details list of operations will identify
what operation failed.
Clicking the name of an operation brings up the details about that particular
operation.
Instructor notes:
Purpose — Illustrate what a task failure would look like.
Details —
Additional information —
Transition statement — If we click the failed operation name, what will we see?
Notes:
The Operations Details panel provides additional information about the operation. If the
operation involved the execution of a line command, then there will be three tabs in the
operation details:
- Command: The full syntax of the issued command with options and arguments
- Output: The standard output from the command
- Error: The standard error from the command
The Error tab, which shows the standard error listing, is usually the most useful in
diagnosing what caused the task failure.
In the displayed example, the enhanced live relocation failed because the NFS server
setup was not complete; the NFS server had not identified the relocation target system
global environment as having root access to the exported file system.
Instructor notes:
Purpose — Illustrate the Operation Details panel.
Details —
Additional information —
Transition statement — Let’s see how we would implement an enhanced live relocation
from the command line interface.
1. On the NFS server, create and export the WPAR file systems.
2. Create the WPAR on the source system (with the checkpointable flag).
3. Start and use the WPAR.
4. Generate a spec file for the WPAR.
5. Ensure that the target system is compatible.
6. Create the WPAR on the target system using the spec file.
7. Start a migration server on the source system.
– Record the reported connection key value.
8. Start the migration on the target system (using the connection key).
[Figure: WPAR ServerA is relocated from the source system (lparX) to the target system (lparY); its file systems reside on the NFS server.]
Figure 10-32. Steps for WPAR enhanced live mobility (command line) AN151.0
Notes:
In comparison to the graphic interface, the command line interface (CLI) requires more
work from the administrator, but has the advantage that it can be embedded in a
shell script for flexible automation.
Some of the main differences are:
- You have to manually determine the compatibility of the two servers.
- You are responsible for creating (but not activating) a WPAR on the arrival system
that is exactly the same as the one to be relocated. The exact match is
typically ensured by generating and then using a specification file for the WPAR.
- You have to start a mobility server on the departure system and then start a mobility
client on the arrival system, which connects to the mobility server.
The movewpar command that is the core of this capability is not officially documented.
There is no man page, nor does the WPAR Manager product documentation mention the
command line approach. The information here is from the IBM Redbooks publication on this topic.
The documented command line approach uses checkpoint and restart (covered later).
Instructor notes:
Purpose — Provide an overview of the command line interface method for enhanced live
application mobility.
Details —
Additional information —
Transition statement — Let’s examine this procedure, step-by-step.
• Create the file system structure on the NFS server for the WPAR.
– /, /tmp, /home, /var, and any application file systems
– Optionally: /usr, /opt (but not recommended)
• Export the NFS file systems with root access to both global environments and to the WPAR.
# exportfs
/export/wpars -sec=sys,access=lparX:wparA:lparY,root=lparX:wparA:lparY
/export/wpars_home -sec=sys,rw,access=lparX:wparA:lparY,root=lparX:wparA:lparY
/export/wpars_tmp -sec=sys,rw,access=lparX:wparA:lparY,root=lparX:wparA:lparY
/export/wpars_var -sec=sys,rw,access=lparX:wparA:lparY,root=lparX:wparA:lparY
Notes:
The NFS aspects of using the command line interface are not any different from using the
WPAR Manager GUI interface.
Live relocation requires that any file systems that are private to the WPAR be stored
externally with common access from both the source and target systems. Currently, only
NFS is supported for this requirement.
The /usr and /opt file systems can optionally be NFS-mounted from the NFS server, but
managing them as private file systems can cause problems with both administration and
performance, so this is not recommended.
Once the file systems have been defined and mounted on the NFS server, they need to be
defined as NFS exported file systems with the two LPARs (global environments) and the
WPAR having read-write access as root.
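The export options follow a fixed pattern: the same host list (both global environments and the WPAR) appears in both the access and root lists. The following POSIX shell sketch builds that option string and prints exports-style lines for the four WPAR file systems. It only prints text and changes nothing. The host names are the example names from this unit. Adding rw to every entry, including /export/wpars (which lacks it in the listing), is an assumption based on the read-write requirement stated in the notes.

```shell
#!/bin/sh
# Build NFS export options for a relocatable WPAR (sketch; prints only).
# Assumption: every host needs both rw access and root access, as the notes state.

# wpar_export_opts HOST... -> "-sec=sys,rw,access=H1:H2:...,root=H1:H2:..."
wpar_export_opts() {
    if [ "$#" -eq 0 ]; then
        echo "usage: wpar_export_opts host..." >&2
        return 1
    fi
    hosts=$1
    shift
    for h in "$@"; do
        hosts="$hosts:$h"          # colon-separated host list
    done
    printf '%s\n' "-sec=sys,rw,access=$hosts,root=$hosts"
}

# Example: one exports-style line per WPAR file system for lparX, wparA, lparY.
for fs in /export/wpars /export/wpars_home /export/wpars_tmp /export/wpars_var; do
    printf '%s %s\n' "$fs" "$(wpar_export_opts lparX wparA lparY)"
done
```

On the NFS server, lines of this form could be added to /etc/exports and exported with exportfs -a. Check the exports documentation for your AIX level before relying on the exact option syntax.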
Instructor notes:
Purpose — Cover the set up of the NFS server to support relocation.
Details —
Additional information —
Transition statement — Let’s look at the creation and activation of a relocatable WPAR.
Notes:
The -c option of the mkwpar command specifies that the WPAR being created is
checkpointable.
The mount specifications match the NFS exports you set up earlier.
Using the command line to implement the relocation, you need to define a clone of the
WPAR on the target system. The easiest and safest way to do this is to generate a
specification file.
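The source-side and target-side WPAR definitions can be scripted. The sketch below prints the mkwpar and startwpar commands (DRY_RUN=1, the default) rather than running them. The following are hypothetical:
- the NFS server name
- one directory per WPAR under each export
- the spec file path

The -M attribute syntax and the -e/-o/-w combination for writing a specification file from an existing WPAR should be verified against the mkwpar documentation for your AIX level.

```shell
#!/bin/sh
# Enhanced live relocation from the CLI: WPAR definition steps (sketch, dry run by default).
# Hypothetical: NFS server "nfsserver", one directory per WPAR under each export,
# and the spec file path. Verify mkwpar flags against your AIX level.

DRY_RUN=${DRY_RUN:-1}
WPAR=${WPAR:-wparA}
NFS=${NFS:-nfsserver}
SPEC=${SPEC:-/tmp/${WPAR}-specfile}

# Print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Source system: checkpointable (-c) WPAR on NFS file systems, start it,
# then write a spec file from it without creating anything (-w).
source_steps() {
    run mkwpar -c -n "$WPAR" \
        -M directory=/ vfs=nfs host="$NFS" dev="/export/wpars/$WPAR" \
        -M directory=/home vfs=nfs host="$NFS" dev="/export/wpars_home/$WPAR" \
        -M directory=/tmp vfs=nfs host="$NFS" dev="/export/wpars_tmp/$WPAR" \
        -M directory=/var vfs=nfs host="$NFS" dev="/export/wpars_var/$WPAR"
    run startwpar "$WPAR"
    run mkwpar -e "$WPAR" -o "$SPEC" -w
}

# Target system (after copying the spec file there): define, but do not start,
# the same WPAR; -p preserves the existing file systems on the NFS server.
target_steps() {
    run mkwpar -p -f "$SPEC"
}

source_steps
target_steps
```

The spec file must be copied to the target system before target_steps runs there. The compatibility check is deliberately left out of this sketch.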
Instructor notes:
Purpose — Discuss the creation and activation of a relocatable WPAR.
Details —
Additional information —
Transition statement — Before you attempt a relocation, you should first check the
compatibility of the two systems.
Notes:
System compatibility
System compatibility depends on the relocation type. Live application mobility is
the process of relocating a WPAR while preserving the state of the application stack.
Static application mobility is defined as a shutdown of the WPAR on the departure node
and a clean start of the WPAR on the arrival node while preserving the file system
state. Live relocation requires more extensive compatibility testing than static
relocation. Therefore, two systems could be incompatible for live
relocation, but compatible for static relocation.
Compatibility is evaluated on the following criteria:
- Hardware levels (the two systems must have identical processor types)
- Installed hardware features
- Installed devices (as seen by the LPARs involved)
- Operating system levels and patch levels
- Other software or filesets installed with the operating system (same V.R.M.F:
version, release, modification, and fix)
- Additional user-selected compatibility testing for application mobility
Compatibility testing includes critical tests and optional tests. These compatibility tests
help to determine if a WPAR can be relocated from one managed system to another.
For each relocation type, live or static, there is a set of critical tests that must pass for
one managed system to be considered compatible with another.
For live relocation, the critical compatibility tests check the following compatibility
criteria:
- The operating system type must be the same on the arrival system and the
departure system.
- The operating system version and level must be the same on the arrival system and
the departure system.
- The processor class on the arrival system must be at least as high as the processor
class of the departure system.
- The version, release, modification, and fix level of the bos.rte fileset must be the
same on the arrival system and the departure system.
- The version, release, modification, and fix level of the bos.wpars fileset must be the
same on the arrival system and the departure system.
- The version, release, modification, and fix level of the mcr.rte fileset must be the
same on the arrival system and the departure system.
- The bos.rte.libc file must be the same on the arrival system and the departure
system.
- There must be at least as many storage keys on the arrival system as on the
departure system.
Note: The critical tests for static relocation are a subset of the tests for live relocation.
The only critical test for static relocation is that the bos.rte.libc file must be the same on
the arrival system and the departure system.
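Several of the critical live-relocation tests compare fileset levels, which you must check yourself when relocating from the command line. As a sketch under stated assumptions, the shell functions below compare the recorded V.R.M.F levels of bos.rte, bos.wpars, and mcr.rte between the two systems. The other critical tests are not covered: operating system type and level, processor class, the bos.rte.libc file, and storage keys. The two-column input file format is an assumption of this sketch.

```shell
#!/bin/sh
# Compare critical fileset levels between departure and arrival systems (sketch).
# Input files hold "fileset level" lines collected beforehand on each system;
# their format and how they are gathered are assumptions of this sketch.

CRITICAL_FILESETS="bos.rte bos.wpars mcr.rte"

# level_of FILESET FILE -> prints the level recorded for FILESET, if any
level_of() {
    awk -v f="$1" '$1 == f { print $2; exit }' "$2"
}

# same_levels SRC_FILE TGT_FILE -> exit 0 only if every critical fileset matches
same_levels() {
    rc=0
    for fs in $CRITICAL_FILESETS; do
        s=$(level_of "$fs" "$1")
        t=$(level_of "$fs" "$2")
        if [ -z "$s" ] || [ "$s" != "$t" ]; then
            echo "incompatible: $fs departure=${s:-missing} arrival=${t:-missing}"
            rc=1
        fi
    done
    return "$rc"
}
```

On AIX, lslpp -L bos.rte bos.wpars mcr.rte shows the installed levels on each system. Turning that report into the two-column files is left out because its layout can vary between levels.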
In addition to these critical tests, you can choose optional tests for
determining compatibility. These optional tests are selected as part of the WPAR group
policy for the WPAR you are planning to relocate, and are taken into account for both
types of relocation.
Two managed systems might be compatible for one WPAR and not for another,
depending on which WPAR group the WPAR belongs to and which optional tests were
selected as part of the WPAR group policy. Critical tests are always applied in
determining compatibility regardless of the WPAR group to which the WPAR belongs.
You can choose from optional tests to check the following compatibility criteria:
- NTP must be enabled on the arrival system and the departure system.
- The amount of physical memory on the arrival system must be at least as large as
the amount on the departure system.
- The processor speed for the arrival system must be at least as high as the
processor speed for the departure system.
- The version, release, modification, and fix level of the xlC.rte fileset must be the
same on the arrival system and the departure system.
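Two of the optional tests are simple "at least as high" comparisons. The following sketch shows them as shell functions. The values are plain integers you collect beforehand, and the units (MB for memory, MHz for processor speed) are assumptions. The NTP and xlC.rte tests are not covered.

```shell
#!/bin/sh
# Optional "at least as high" tests from the WPAR group policy (sketch).
# Values are plain integers collected beforehand; the units (MB, MHz) are
# assumptions, and the NTP and xlC.rte checks are not covered here.

# at_least NAME DEPARTURE ARRIVAL -> exit 0 if ARRIVAL >= DEPARTURE
at_least() {
    if [ "$3" -ge "$2" ]; then
        return 0
    fi
    echo "optional test failed: $1 (arrival $3 < departure $2)"
    return 1
}

# optional_tests DEP_MEM_MB ARR_MEM_MB DEP_MHZ ARR_MHZ -> runs both checks
optional_tests() {
    rc=0
    at_least memory_mb "$1" "$2" || rc=1
    at_least cpu_mhz "$3" "$4" || rc=1
    return "$rc"
}
```

Both checks always run, so a single call reports every failing optional test rather than stopping at the first.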
Instructor notes:
Purpose — Explain the compatibility requirements of two servers to support relocation of a
WPAR.
Details —
Additional information —
Transition statement — Let’s look at the final steps involved in invoking the relocation
between the servers.
• Create the WPAR on the target system using the spec file.
# mkwpar -p -f /tmp/wparA-specfile
• Migrate the active WPAR using the key on the target system.
# /opt/mcr/bin/movewpar -k 49ff5ba00000838c \
wparA <IP address of source system>
© Copyright IBM Corporation 2009
Notes:
Creating a workload partition can be long and complex, especially when you have to create
many workload partitions. Instead, you can use a specification file and pass it as an
argument of the mkwpar command (mkwpar -f wpar.specfile). You can also create a spec
file from an existing workload partition. The file /etc/wpars/xxx.cf contains the configuration for
the WPAR xxx. You can use an existing specification file to create the next WPAR, to create a
near-clone WPAR, or to document a current WPAR configuration.
To perform an enhanced live migration, a migration server must be running on the
source system. When you start a migration server (using the movewpar command) to
support the WPAR that you intend to relocate, the server generates a connection key.
Record the connection key value; this key is needed when you request the actual
migration.
To run the actual migration, start a migration client on the target system (using the
movewpar command) and tell it which server to connect to, which WPAR to migrate, and
which connection key to use. The migration then proceeds just as if you had requested it
from the WPAR Manager GUI.
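Because the client side is a single command, it is easy to wrap in a script. The sketch below uses only the client syntax shown on the visual: -k with the connection key, then the WPAR name and the source address. It refuses to run with a missing argument. Because movewpar is not officially documented, the server-side invocation is not shown. The function name start_migration is hypothetical, and DRY_RUN=1 (the default) prints the command instead of running it.

```shell
#!/bin/sh
# Wrap the movewpar client call shown on the visual (sketch).
# Only the syntax shown in this unit is used: -k KEY, WPAR name, source address.
# DRY_RUN=1 (default) prints the command instead of running it.

DRY_RUN=${DRY_RUN:-1}

# start_migration KEY WPAR SOURCE_IP
start_migration() {
    key=$1 wpar=$2 src=$3
    if [ -z "$key" ] || [ -z "$wpar" ] || [ -z "$src" ]; then
        echo "usage: start_migration KEY WPAR SOURCE_IP" >&2
        return 2
    fi
    set -- /opt/mcr/bin/movewpar -k "$key" "$wpar" "$src"
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}
```

For example, start_migration 49ff5ba00000838c wparA 192.0.2.10 prints the full movewpar command line. The address here is a documentation-only example.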
Instructor notes:
Purpose — Discuss the relocation of WPAR.
Details —
Additional information —
Transition statement — Let’s review what we have covered.
1. On the NFS server, create and export a file system for backups.
2. Create the WPAR on the source system (with local file systems).
3. Start and use the WPAR.
4. Quiesce and stop applications (if needed).
5. Invoke the static relocation task to a compatible target system.
[Figure: WPAR Manager stops WPAR ServerA on the source system (stopwpar), saves it (savewpar) to /var/adm/WPAR on the NFS backup server, restores it (restwpar) on the target system, and starts it there (startwpar). Both systems use local file systems for the WPAR.]
Figure 10-37. Steps for WPAR static relocation (WPAR Mgr GUI) AN151.0
Notes:
NFS for static relocation
A major requirement for WPAR Manager version 1.2 static relocation is the use of an
NFS server to hold the backup images. Based upon prior WPAR backups, you should
know how large the allocated file system, on the NFS server, will need to be. Both the
source and target LPARs will need to have root read-write access to this NFS file
system. You need to manually define the mount of this NFS file system in both the
source and target systems' global environments, using the expected mount point of
/var/adm/WPAR (though that can be modified as a WPAR Manager application
configuration setting).
WPAR definition
The WPAR does not have to be checkpoint-enabled, since you are not going to use live
relocation. On the other hand, the WPAR must not use NFS for its private file systems.
Static relocation expects to back up these file systems to NFS and then restore them
from NFS.
1. On the NFS server, create and export the WPAR file systems.
2. Create the WPAR on the source system (with the checkpointable flag).
3. Start the WPAR.
4. Checkpoint the WPAR (store the state file on the NFS server).
5. Create the WPAR on the target system (with the checkpointable flag).
6. Restart the WPAR on the target system using the state file.
[Figure: WPAR wparB is relocated from the source system (lparX) to the target system (lparY) through the NFS server.]
Figure 10-38. Steps for checkpoint and restart relocation: CLI AN151.0
Notes:
Here is an overview of the different steps needed to implement live relocation using the
checkpoint and restart technologies.
1. Create the file system structure on the NFS server for the WPAR:
• /
• /tmp
• /home
• /var
• optionally /usr and /opt (not recommended)
Then export the file systems with root access to both the global environments and to the
WPAR. An example exportfs listing is shown on the next visual.
2. Create the WPAR as checkpointable (mkwpar -c).
3. Start the WPAR (startwpar).
4. Checkpoint the running WPAR using the chkptwpar command. Specify the state file
location and the -k option (which kills the running WPAR after the checkpoint). This requires an
empty directory in which the state file will be created (this directory must be accessible
from both systems).
5. Create the WPAR on the target system using the mkwpar command (checkpointable).
6. Restart this WPAR using the state file previously created during the checkpoint
operation.
Verify that the application is still running after the relocation.
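Steps 4 and 6 can be scripted in the same dry-run style. This sketch makes the following assumptions:
- chkptwpar and restartwpar are installed in /opt/mcr/bin, next to movewpar.
- -d names the state directory and -o names a log file.

The STATEDIR mount point (a mount of /export/wpars_cpr visible on both systems) and the log paths are hypothetical. In live mode, the script refuses to checkpoint into a non-empty directory, as the notes require.

```shell
#!/bin/sh
# Steps 4 and 6 of checkpoint-and-restart relocation, as a dry run (sketch).
# Assumptions: chkptwpar and restartwpar are in /opt/mcr/bin, -d names the state
# directory, -o names a log file; STATEDIR (a mount of /export/wpars_cpr visible
# on both systems) and the log paths are hypothetical.

DRY_RUN=${DRY_RUN:-1}
WPAR=${WPAR:-wparB}
STATEDIR=${STATEDIR:-/wpars_cpr/$WPAR}
MCR=/opt/mcr/bin

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 4, source system: the state directory must be empty before checkpointing.
checkpoint_source() {
    if [ "$DRY_RUN" != 1 ] && [ -n "$(ls -A "$STATEDIR" 2>/dev/null)" ]; then
        echo "state directory $STATEDIR is not empty" >&2
        return 1
    fi
    # -k stops the WPAR once its state has been saved
    run "$MCR/chkptwpar" -k -d "$STATEDIR" -o "/tmp/chkpt_$WPAR.log" "$WPAR"
}

# Step 6, target system (after step 5 created the checkpointable WPAR there).
restart_target() {
    run "$MCR/restartwpar" -d "$STATEDIR" -o "/tmp/restart_$WPAR.log" "$WPAR"
}
```

Run checkpoint_source on lparX and restart_target on lparY. Check the chkptwpar and restartwpar references for your level before setting DRY_RUN=0.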
Instructor notes:
Purpose — Provide an overview of the checkpoint and restart based live relocation
procedure.
Details —
Additional information —
Transition statement — Let’s look at the steps in detail.
• Create the file system structure on the NFS server for the WPAR.
– /, /tmp, /home, /var, and any application file systems
– Optionally: /usr, /opt
{nfsserver} /# exportfs
/export/wpars -sec=sys,rw,access=lparX:wparB:lparY,root=lparX:wparB:lparY
/export/wpars_home -sec=sys,rw,access=lparX:wparB:lparY,root=lparX:wparB:lparY
/export/wpars_tmp -sec=sys,rw,access=lparX:wparB:lparY,root=lparX:wparB:lparY
/export/wpars_var -sec=sys,rw,access=lparX:wparB:lparY,root=lparX:wparB:lparY
/export/wpars_cpr -sec=sys,rw,access=lparX:wparB:lparY,root=lparX:wparB:lparY
Notes:
Both live relocation and enhanced live relocation require that the WPAR private file
systems be served from an NFS server. Make sure that the file systems allocated on the
NFS server are large enough.
When configuring NFS to export these file systems, be sure to provide root read-write
access to the WPAR and to the global environment on both the source and target systems.
The only difference between the enhanced live relocation and the live relocation setup is
that the live relocation setup requires an additional file system to hold the state files. This is
shown in the visual as the /export/wpars_cpr line in the exportfs report.
Instructor notes:
Purpose — Cover the NFS setup for live application mobility.
Details —
Additional information —
Transition statement — Once the NFS server is set up, we next need to define and start
the WPAR on the source system.
Notes:
One important step in a WPAR migration is to create a checkpoint of the system WPAR.
That requires an empty directory in which the state file will be created. In our example, an
empty directory named /export/wpars_cpr is created on the NFS shared file system
and must be mountable from both AIX systems (it must be visible from inside and outside the
WPAR).
The -c option of the mkwpar command specifies that the WPAR being created will be
checkpointable.
Instructor notes:
Purpose — Cover the definition and starting of the WPAR.
Details —
Additional information —
Transition statement — Next, let’s begin the relocation process.
Notes:
WPAR live relocation uses the following approach:
• Freezing running applications and other services within a WPAR
• Performing a checkpoint which saves all execution state to a checkpoint file
• Restoring the execution state on a different but compatible system or LPAR
• Restarting the applications and other services from the restored execution state
This is primarily done by saving the runtime state of the WPAR and its processes and then
reconstructing the state using the configuration and the saved runtime state. The restarted
application resumes at the point where the checkpoint was done, and the state of its objects,
such as memory, file objects, network connections, and IPC objects, is restored without loss
of any data.
Checkpoint and restart
Live relocation depends upon the process of saving (through a checkpoint operation) an
application or system service's complete execution state and then restarting that
Checkpoint (1 of 2)
IBM Power Systems
1. What are the three forms of file system access within a WPAR?
Notes:
Write down your answers here:
Checkpoint solutions (1 of 2)
Additional information —
Transition statement —
Checkpoint (2 of 2)
Notes:
Write down your answers here:
Checkpoint solutions (2 of 2)
Additional information —
Transition statement —
Unit summary
Notes:
References
Online AIX Version 6.1 Command Reference volumes 1-6
Online AIX Version 6.1 Kernel Extensions and Device Support
Programming Concepts (Chapter 16. Debug Facilities)
Online AIX Version 6.1 Operating system and device management
(section on System Startup)
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/infocenter/systems
© Copyright IBM Corp. 2009 Unit 11. The AIX system dump facility 11-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide
Unit objectives
Notes:
System dumps
Notes:
Types of dumps
• Traditional:
– AIX generates dump prior to halt
• Firmware assisted (fw-assist):
– POWER6 firmware generates dump in parallel with AIX V6 halt
process
– Defaults to same scope of memory as traditional
– Can request a full system dump
• Live Dump Facility:
– Selective dump of registered components without need for a system
restart
– Can be initiated by software or by operator
– Controlled by livedumpstart and dumpctrl
– Written to a file system rather than a dump device
Notes:
Overview
In addition to the traditional dump function, AIX 6.1 introduces two new types of dumps.
Traditional dumps
Traditionally, AIX alone handled system dump generation and the only way to get a
dump was to halt the system either due to a crash or through operator request. In a
logical partition it will only dump the memory that is allocated to that partition.
In its default mode, a firmware-assisted dump captures the same scope of memory as the
traditional dump, but it can be configured for a full memory dump.
If, for some reason (such as memory restrictions), a configured or requested firmware-assisted
dump is not possible, then the traditional dump facility will be invoked.
More details on the configuration and initiation of firmware assisted dumps will be
covered later in the context of the sysdumpdev and sysdumpstart commands.
Instructor notes:
Purpose — Explain the different types of dumps
Details — This is only an overview of the dump types. Do not go into much detail here.
There are two main reasons for introducing these dump types. First, students will likely hear
them described as new in AIX 6.1, and this overview clarifies what they are. Second,
they will see references to firmware-assisted dumps when we look at the SMIT panels
and line commands for dump management, later in the unit.
Additional information —
Transition statement — Let’s look at ways a system dump might be created.
[Figure: Ways to create a system dump: through the keyboard or reset button, at an unexpected system halt, through a command, through the remote reboot facility, through an HMC reset/dump, or through SMIT.]
Notes:
- For logical partitions running AIX, the HMC can issue a restart with dump request,
which is the functional equivalent of the reset-button-triggered dump described
previously.
- The superuser can issue a command directly, or through SMIT, to invoke a system
dump.
- The remote reboot facility can also be used to create a system dump.
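[Figure: 888 display code sequence. After 888, press Reset to display 102 (software) or 103 (hardware). For 102, press Reset for the crash code and again for the dump code. For 103, press Reset twice for the SRN (yyy-zzz); optionally, for a hardware failure, press Reset once for each FRU code and eight times for the location code.]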
Notes:
103 code
A 103 usually indicates a hardware error. In an HMC-managed LPAR environment,
hardware errors are reported through the service focal point of the HMC; thus, you
should not expect to see an 888-103 sequence in an LPAR's reference code field on
the HMC. Working with the HMC facilities is covered in the LPAR training (either AU730
or AN301).
If you do have an 888-103 sequence, pressing the reset button twice displays a Service
Request Number (SRN), which may be used by IBM support to analyze the problem.
In case of a hardware failure, additional resets display the Field Replaceable Unit (FRU)
codes and a location code. The location code identifies the physical location of a device.
Instructor notes:
Purpose — Introduce what an 888 display code means.
Details — Describe what students have to do when an 888 display occurs. Emphasize
that, in an HMC managed LPAR environment, they should only see the 888-102 sequence.
The focus here is on crashes which result in dumps (the left side of the diagram).
Additional information —
Transition statement — Whether an unintended system crash or an administrator
requested dump, where is the dump stored and how do we access it?
[Figure: A system dump is written to the primary dump device, /dev/hd6. At the next boot, the dump is copied into ...]
Notes:
Instructor notes:
Purpose — Describe what happens if a dump occurs.
Details — Base your presentation on the material in the student notes.
Additional information — None
Transition statement — Let’s find out where all this information is written and how you
can customize this.
Notes:
Examples on visual
The examples on the visual illustrate use of several of the sysdumpdev flags discussed
in the preceding material.
Instructor notes:
Purpose — Discuss the sysdumpdev command and its various options.
Details — When you install the operating system, the dump device is automatically
configured for you. By default the primary device is /dev/hd6, which is a paging logical
volume, and the secondary device is /dev/sysdumpnull.
If the dump is written to a paging space, the system will automatically copy the dump when
the system is rebooted. By default, the dump gets copied to the /var/adm/ras directory. We will
look at this in detail later in this unit.
The recommended size for the dump device is at least a quarter of the size of real memory.
In problem situations where the current dump device does not meet this recommendation,
it is advisable to create a temporary dump logical volume of the size required and manually
recreate the environment in which a previous dump occurred. If the dump device is not
large enough, the system will produce a partial dump only. It is possible, but extremely
unlikely, that a support center can determine the cause of the crash from a partial dump.
The -e flag can be used as a starting point to determine how big the dump device should
be.
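The "at least a quarter of real memory" guideline is simple arithmetic. The sketch below applies it in MB. Collecting the inputs is outside the sketch: on AIX, lsattr -El sys0 -a realmem reports real memory in KB, and the dump device size comes from the logical volume definition. Treat the result as a rule of thumb and compare it with the sysdumpdev -e estimate.

```shell
#!/bin/sh
# Check a dump device against the "at least a quarter of real memory" guideline.
# Sizes are in MB; gathering them is outside this sketch.

# min_dump_mb REAL_MEM_MB -> guideline minimum in MB, rounded up
min_dump_mb() {
    echo $(( ($1 + 3) / 4 ))
}

# dump_dev_ok REAL_MEM_MB DUMP_DEV_MB -> exit 0 if the device meets the guideline
dump_dev_ok() {
    [ "$2" -ge "$(min_dump_mb "$1")" ]
}
```

For a partition with 8192 MB of memory, min_dump_mb 8192 prints 2048, so a 1024 MB dump device would fail dump_dev_ok.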
Discussion item: What is the advantage of having two dump areas?
Answer: The second dump device serves as a backup.
Additional information — Historic note: For systems that were migrated from AIX V3.2 to
AIX V4 or later, the primary dump device is set to what it was formerly - /dev/hd7.
Transition statement — For systems with more than 4 GB of memory, a dedicated dump
device is created at installation time.
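[Table: size of the dedicated dump device by amount of physical memory; only the last row survives: 48 GB and up, 4 GB.]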
Notes:
Instructor notes:
Purpose — Explain that a dedicated dump device is created for systems with more than 4
GB of main memory.
Details — Point out that the size of the dedicated dump device depends on the amount of
physical memory on this system and mention the default name of the dedicated dump
device.
Additional information —
Transition statement — You can specify the name and size of the dedicated dump device
instead of using the defaults we have just discussed.
/bosinst.data
...
control_flow:
CONSOLE = /dev/vty0
...
large_dumplv:
DUMPDEVICE = /dev/lg_dumplv
SIZEGB = 1
Notes:
# sysdumpdev -e
0453-041 estimated dump size in bytes: 10485760
Notes:
Instructor notes:
Purpose — Show how to estimate the disk space needed for a system dump.
Details — The command sysdumpdev -e will estimate the dump size. It is just an estimate.
To be safe, the disk space should be larger than the estimate. Also, if the system has
dumped in the past, looking at the size of the past dump can provide more guidance on
sizing the dump device. This can be seen using the command sysdumpdev -L (mentioned
earlier in the unit).
In AIX V4.3.2, the ability to compress the dump was introduced. Turning on dump
compression will reduce the space needed significantly. Dump compression is on by
default in AIX 5L V5.3. Dumps are always compressed in AIX 6.1 and later.
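Since the sysdumpdev -e figure is only an estimate, it helps to convert it into a size with headroom. The following sketch extracts the byte count from a line like the one shown on the visual and rounds up to whole MB after adding a margin. The 25% default margin is an arbitrary assumption, not an IBM recommendation.

```shell
#!/bin/sh
# Turn the sysdumpdev -e estimate into a dump device size with headroom (sketch).
# The 25% default margin is an arbitrary assumption, not an IBM recommendation.

# est_bytes: read sysdumpdev -e output on stdin, print the estimated byte count
est_bytes() {
    sed -n 's/.*estimated dump size in bytes: *\([0-9][0-9]*\).*/\1/p'
}

# needed_mb BYTES [MARGIN_PERCENT] -> size in MB, rounded up, including margin
needed_mb() {
    margin=${2:-25}
    with_margin=$(( $1 * (100 + margin) / 100 ))
    echo $(( (with_margin + 1048575) / 1048576 ))
}
```

On a live system, sysdumpdev -e 2>&1 | est_bytes would extract the number. With the sample estimate of 10485760 bytes, needed_mb 10485760 prints 13, and needed_mb 10485760 0 prints 10.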
You should mention a few other points about dump devices:
• If a paging device (like hd6) is used for dumps, it must be part of rootvg.
• The primary dump device must always be in the rootvg.
• The secondary dump device may be outside rootvg as long as it is not a paging device.
• Prior to 4.3.3, dump devices should not be mirrored. The dump information was written
to only one mirror copy, and the other copy was not marked stale. At reboot, the dump was
copied to the dump file by reading from both mirror copies, even though only one copy had
the correct information. This created a corrupted dump file. In 4.3.3, this was corrected by
reading the dump only from the good copy.
• AIX at V5.3 and later allows a DVD device to be used as a primary or secondary dump
device.
Additional information —
Transition statement — Let’s look at a new feature in AIX 5L that checks dump space
sizes.
dumpcheck utility
Notes:
Notes:
The -p flag of sysdumpstart is used to specify a dump to the primary dump device.
The -s flag of sysdumpstart is used to specify a dump to the secondary dump device.
The -t flag of sysdumpstart is used to change the default type from fw_assist to
traditional.
The -f flag of sysdumpstart is used to change the scope of the dump (interacts with
the configuration set up with sysdumpdev):
- disallow - Do not allow a full memory dump.
- require - Require a full memory dump.
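[Figure: Starting a dump from a TTY on serial port S1: entering the reboot string #dump# at the login prompt returns '>', and typing 1 starts the dump.]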
Add a TTY
...
REMOTE Reboot ENABLE: dump
REMOTE Reboot STRING: #dump#
...
Notes:
reboot_enable
The value of this attribute (referred to as REMOTE Reboot ENABLE in SMIT) indicates
whether this port is enabled to reboot the machine on receipt of the remote
reboot_string, and if so, whether to take a system dump prior to rebooting:
- no: Indicates remote reboot is disabled
- reboot: Indicates remote reboot is enabled
- dump: Indicates remote reboot is enabled, and, prior to rebooting, a system dump will
be taken on the primary dump device
reboot_string
This attribute (referred to as REMOTE Reboot STRING in SMIT) specifies the remote
reboot_string that the serial port will scan for when the remote reboot feature is
enabled. When the remote reboot feature is enabled, and the reboot_string is
received on the port, a '>' character is transmitted, and the system is ready to reboot. If
a '1' character is received, the system is rebooted (and a system dump may be started,
depending on the value of the reboot_enable attribute); any character other than '1'
aborts the reboot process. The reboot_string has a maximum length of 16 characters
and must not contain a space, colon, equal sign, null, new line, or Ctrl-\ character.
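The reboot_string rules above are easy to check before a value is entered in SMIT. The following POSIX shell sketch encodes them; valid_reboot_string is a hypothetical helper written for this illustration, not an AIX command.

```sh
#!/bin/sh
# Hypothetical helper (not part of AIX): succeeds if $1 satisfies the
# reboot_string rules described above, fails otherwise.
valid_reboot_string() {
    s=$1
    # Must be non-empty and at most 16 characters long.
    [ -n "$s" ] || return 1
    [ "${#s}" -le 16 ] || return 1
    # Ctrl-\ is octal 034; build it and a newline so they can be matched.
    ctrl_bs=$(printf '\034')
    nl='
'
    # Reject space, colon, equal sign, newline, and Ctrl-\.
    # (A shell string cannot hold a null byte, so nulls need no check.)
    case $s in
        *' '* | *:* | *=* | *"$nl"* | *"$ctrl_bs"*) return 1 ;;
    esac
    return 0
}

if valid_reboot_string '#dump#'; then
    echo "#dump# is a valid reboot_string"
fi
```

Once a string passes the check, it can be entered in the SMIT fields shown above. Assuming the port is tty0 and that the attribute names match those given in the notes, the same change could be made with chdev -l tty0 -a reboot_enable=dump -a reboot_string='#dump#'.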
Instructor notes:
Purpose — Explain how to start a dump from a TTY.
Details — Base your explanation on the material in the student notes.
Additional information — As mentioned in the student notes, the values for REMOTE
Reboot ENABLE are:
no Remote reboot is disabled
reboot Remote reboot is enabled
dump Remote reboot is enabled and a dump will occur prior to reboot
There is a good discussion of the remote reboot facility (starting on page 24) in the AIX 5L
Version 5.3 System Management Guide: Operating System and Devices.
Transition statement — Let’s look at the dump interface of SMIT.
# smit dump
System Dump
Move cursor to desired item and press Enter
Show Current Dump Devices
Show Information About the Previous System Dump
Show Estimated Dump Size
Change the Type of Dump
Change the Full Memory Dump Mode
Change the Primary Dump Device
Change the Secondary Dump Device
Change the Directory to which Dump is Copied on Boot
Start a Dump to the Primary Dump Device
Start a Traditional System Dump to the Secondary Dump Device
Copy a System Dump from a Dump Device to a File
Always ALLOW System Dump
Check Dump Resources Utility
Change/Show Global System Dump Properties
Change/Show Dump Attributes for a Component
Change Dump Attributes for multiple Components
Notes:
Instructor notes:
Purpose — Introduce the SMIT dump interface.
Details — Do not go into too much detail here. Just mention that SMIT uses the
sysdumpdev command for many of these items, and that they were covered earlier. Explain that
PCI machines should be set to always allow a system dump (Always ALLOW System Dump).
Historically, MCA machines achieved this by turning the physical key to Service mode; this
setting was created specifically for PCI machines.
The three items at the bottom of the menu are new in AIX 6.1 and belong to the
component live dump facility. If asked about them, be ready to place them in context, but
avoid getting into the details, which are outside the scope of this course.
By contrast, Change the Type of Dump and Change the Full Memory Dump Mode
are also new in AIX 6.1, and they relate to the firmware-assisted dump capabilities
introduced earlier.
The menu item Start a Dump to the Secondary Dump Device was renamed
in AIX 6.1 to Start a Traditional System Dump to the Secondary Dump Device
to distinguish it from the firmware-assisted dump.
Additional information — None
Transition statement — Let’s discuss dump-related LED codes.
Notes:
System-initiated dumps
If a system dump is initiated through a kernel panic, the LEDs on an RS/6000 will
display 0c9 while the dump is in progress, and then either a flashing 888 or a steady
0c0.
All of the LED codes following the flashing 888 (remember: you must press the Reset
button to step through them) should be recorded and passed to IBM. While stepping through
the 888 sequence, you will encounter one of the codes shown. The code you want to see is
0c0, which indicates that the dump completed successfully.
User-initiated dumps
For user-initiated system dumps to the primary dump device, the LED codes should
indicate 0c2 for a short period, followed by 0c0 upon completion.
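When the codes from a flashing 888 sequence have been written down, checking them for 0c0 is a simple scan. The sketch below assumes the codes are typed in by hand in the order recorded; dump_completed is a hypothetical helper, and nothing here reads the LEDs.

```sh
#!/bin/sh
# Hypothetical helper: pass the LED codes recorded while stepping
# through the flashing 888 sequence. Succeeds if 0c0 (dump completed
# successfully) is among them.
dump_completed() {
    for code in "$@"; do
        if [ "$code" = 0c0 ]; then
            echo "dump completed successfully (0c0)"
            return 0
        fi
    done
    echo "0c0 not seen; pass all recorded codes to IBM"
    return 1
}

# Example with illustrative codes.
dump_completed 0c9 0c0
```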
Flowchart: Dump occurs, then during rc.boot 2:
Is there sufficient space in /var to copy the dump to?
- yes: Dump copied to /var/adm/ras; boot continues.
- no: If the forced copy flag = TRUE, the copy dump to tape menu is displayed; boot continues.
© Copyright IBM Corporation 2009
Notes:
The system dump is 583973 bytes and will be copied from /dev/hd6
to media inserted into the device from the list below.
Please make sure that you have sufficient blank, formatted media
before proceeding.
88 Help?
99 Exit
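The boot-time decision in the flowchart can be modeled as a small function. This is a sketch under stated assumptions: dump_copy_action is hypothetical, "sufficient space" is taken to mean free space at least as large as the dump, and /var/adm/ras is the default copy directory.

```sh
#!/bin/sh
# Hypothetical model of the decision made in rc.boot 2.
# Arguments: dump size, free space in the copy directory (same units),
# and the forced copy flag (TRUE or FALSE).
# Assumption: "sufficient space" means free space >= dump size.
dump_copy_action() {
    dump_size=$1
    free_space=$2
    forced_copy=$3
    if [ "$free_space" -ge "$dump_size" ]; then
        echo "copy dump to /var/adm/ras; boot continues"
    elif [ "$forced_copy" = TRUE ]; then
        echo "display copy dump to tape menu; boot continues"
    else
        echo "boot continues"
    fi
}

# A 583973-byte dump (as in the menu above) with only 100000 bytes free
# and the forced copy flag set: the copy menu is shown.
dump_copy_action 583973 100000 TRUE
```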
# smit chgsys
Change/Show Characteristics of Operating System
...
Notes:
Instructor notes:
Purpose — Describe how to set up an automatic reboot after a crash.
Details — Base your explanation on the material in the student notes.
Additional information — None
Transition statement — Let’s discuss the snap command.
# snap -a -o /dev/rmt0
Notes:
Instructor notes:
Purpose — Explain how the system dump should be prepared before it is sent to the IBM
Support Center.
Details — Provide the students with as much of the following information as you think is
appropriate:
The information gathered with the snap command can be used to identify and resolve
system problems. You must have root authority to execute this command.
If you use the -a flag, then you need approximately 8 MB of temporary disk space to collect
all the system information, including the contents of the error log (covered in a previous
unit).
The -g flag gathers the following information:
• Error report
• Copy of the customized ODM
• Trace file
• User environment
• Amount of physical memory and paging space
• Device and attribute information
• Security user information
The output from the -g flag is written to
/tmp/ibmsupt/general/general.snapfile. However, you can specify another
directory using the -d flag.
The execution of snap appends information to the previously created files. Use the -r flag
to remove previously gathered and saved information.
Before you send your media to the Support Center, call them and obtain a
Problem Management Record (PMR) number, which is used to track the status of your
problem. Label the media with this number, and with the other pieces of
information listed, to help the support team act quickly on your problem.
There is not much left for you to do after this, apart from waiting for a response from the
Support Center. However, you may want to look at the dump and try to analyze it
yourself. The Support Center analyzes dumps with kdb
(crash prior to AIX 5L V5.1), which is also available on your system; however, its output
is hard to interpret without in-depth kernel knowledge, so most administrators do not use it.
See the student notes for the AIX 5L V5.3 enhancements.
Additional information — In AIX 5L, the pax command was enhanced to allow archiving
of large files, such as dumps. The tar command, which was used prior to AIX 5L, does not
support files larger than 2 GB. If the file to be archived is larger than 2 GB, pax is the
only option.
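The 2 GB rule above can be written as a one-line size test. In this sketch archiver_for is a hypothetical helper that takes the file size in bytes and uses 2 GB = 2147483648 bytes as the threshold stated in the note.

```sh
#!/bin/sh
# Hypothetical helper: choose pax or tar for a file of the given size
# in bytes, using the 2 GB tar limit described above.
archiver_for() {
    size=$1
    limit=2147483648   # 2 GB = 2^31 bytes
    if [ "$size" -gt "$limit" ]; then
        echo pax
    else
        echo tar
    fi
}

archiver_for 3221225472   # a 3 GB dump: prints pax
```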
Transition statement — Let's take a brief look at kdb to see how it can be used.
Inputs to kdb: /unix (kernel) and /var/adm/ras/vmcore.x (dump file)
# uncompress /var/adm/ras/vmcore.x.Z
or
# dmpuncompress /var/adm/ras/vmcore.x.BZ
# kdb /var/adm/ras/vmcore.x /unix
> status
> stat
(further sub-commands for analyzing)
> quit
Notes:
Useful subcommands
Examining a system dump requires an in-depth knowledge of the AIX kernel. However,
there are two subcommands that might be useful to you:
- The subcommand status displays the processes/threads that were active on the
CPUs when the crash occurred
- The subcommand stat shows the machine status when the dump occurred
To exit the kdb debug program, type quit at the > prompt.
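Which decompression command to run depends only on the file suffix, so the preparation steps can be generated from the file name. The sketch below prints, but does not run, the commands from the example above; dump_prepare_cmds is hypothetical, and it assumes, as the example does, that decompressing vmcore.x.Z or vmcore.x.BZ leaves vmcore.x.

```sh
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands for a saved
# dump file: uncompress for .Z, dmpuncompress for .BZ, then kdb with
# the uncompressed dump and the kernel /unix.
dump_prepare_cmds() {
    dump=$1
    case $dump in
        *.BZ) echo "dmpuncompress $dump"; dump=${dump%.BZ} ;;
        *.Z)  echo "uncompress $dump";    dump=${dump%.Z} ;;
    esac
    echo "kdb $dump /unix"
}

dump_prepare_cmds /var/adm/ras/vmcore.0.BZ
```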
Additional information — Prior to AIX 5L V5.1, the crash command was used instead of
the kdb command.
Transition statement — We have reached a checkpoint.
Checkpoint
IBM Power Systems
1. If your system has less than 4 GB of main memory, what is the default
primary dump device? Where do you find the dump file after reboot?
_________________________________________________________
_________________________________________________________
4. If the copy directory is too small, will the dump, which is copied during
the reboot of the system, be lost?
_________________________________________________________
_________________________________________________________
Notes:
Checkpoint solutions
1. If your system has less than 4 GB of main memory, what is the default primary
dump device? Where do you find the dump file after reboot?
The default primary dump device is /dev/hd6. The default dump file is
/var/adm/ras/vmcore.x, where x indicates the number of the dump.
4. If the copy directory is too small, will the dump, which is copied during the reboot
of the system, be lost?
If the forced copy flag is set to TRUE, a special menu is shown during reboot. From
this menu, you can copy the system dump to portable media.
5. Which command should you execute to collect system data before sending a
dump to IBM?
snap
Additional information — Here are a couple of points you might want to make when
going over the answers to the checkpoint:
• If there is 4 GB or more of memory, then a dedicated dump logical volume is created.
• Dump compression can be turned off with the -c flag of sysdumpdev.
Transition statement — Let’s switch over to the lab.
Notes:
Unit summary
Notes:
When a dump occurs, kernel and system data are copied to the primary dump device.
By default, the system has a primary dump device (/dev/hd6) and a secondary device
(/dev/sysdumpnull).
During reboot, the dump is copied to the copy directory (/var/adm/ras).
A system dump should be retrieved from the system using the snap command.
The Support Center uses the kdb debugger to examine the dump.
Checkpoint solutions
Unit 2:
Checkpoint solutions
CuAt
Unit 3:
Checkpoint solutions
1. Which command generates error reports? Which flag of this command is used to
generate a detailed error report?
errpt
errpt -a
2. Which type of disk error indicates bad blocks?
DISK_ERR4
3. What do the following commands do?
errclear Clears entries from the error log.
errlogger Is used by root to add entries into the error log
4. What does the following line in /etc/syslog.conf indicate?
*.debug errlog
All syslogd entries are directed to the error log.
5. What does the descriptor en_method in errnotify indicate?
It specifies a program or command to be run when an error
matching the selection criteria is logged.
Unit 4:
Checkpoint solutions
Unit 5:
Checkpoint solutions (1 of 2)
1. True or False: You must have AIX loaded on your system to use the
System Management Services programs. False. SMS is part of the
built-in firmware.
2. Your AIX system is currently powered off. AIX is installed on hdisk1
but the bootlist is set to boot from hdisk0. How can you fix the problem
and make the machine boot from hdisk1? You need to boot the SMS
programs and set the new boot list to include hdisk1.
3. Your machine is booted and at the # prompt.
What is the command that will display the normal bootlist?
# bootlist -om normal
How could you change the normal bootlist?
# bootlist -m normal device1 device2
4. What command is used to build a new boot image and write it to the
boot logical volume? bosboot -ad /dev/hdiskx
5. What script controls the boot sequence? rc.boot
Checkpoint solutions (2 of 2)
6. True or False: During the AIX boot process, the AIX kernel is
loaded from the root file system.
False. The AIX kernel is loaded from hd5.
Unit 6:
Figure: AIX boot sequence
- rc.boot 1: /etc/init from the RAM file system (RAMFS) in the boot image; restbase (ODM files in the RAM file system); cfgmgr -f; bootinfo -b
- rc.boot 2: activate rootvg; turn on paging; merge RAM /dev files; copy RAM ODM files
- rc.boot 3 (/sbin/rc.boot 3, started from /etc/inittab): syncvg rootvg &; cfgmgr -p2 or cfgmgr -p3; start console (cfgcon); start CDE (rc.dt boot); savebase; check CuDv for chgstatus=3; syncd 60; errdemon; rm /etc/nologin; execute next line in /etc/inittab
Checkpoint solutions
Unit 7:
Checkpoint solutions
Unit 8:
Checkpoint solutions
Unit 9:
Checkpoint solutions (1 of 4)
4. Why should you not use exportvg with an alternate disk VG?
This will remove rootvg related entries from /etc/filesystems.
Checkpoint solutions (2 of 4)
Checkpoint solutions (3 of 4)
Checkpoint solutions (4 of 4)
Unit 10:
Checkpoint solutions (1 of 2)
Checkpoint solutions (2 of 2)
Unit 11:
Checkpoint solutions
1. If your system has less than 4 GB of main memory, what is the default primary
dump device? Where do you find the dump file after reboot?
The default primary dump device is /dev/hd6. The default dump file is
/var/adm/ras/vmcore.x, where x indicates the number of the dump.
4. If the copy directory is too small, will the dump, which is copied during the reboot
of the system, be lost?
If the forced copy flag is set to TRUE, a special menu is shown during reboot. From
this menu, you can copy the system dump to portable media.
5. Which command should you execute to collect system data before sending a
dump to IBM?
snap
Appendix E:
Checkpoint solutions
Directories
mkdir Make directory
cd Change the directory. With no argument, changes to the $HOME directory.
rmdir Remove a directory (beware of files starting with “.”).
rm Remove file; -r option removes directory and all files and
subdirectories recursively.
pwd Print working directory: shows name of current directory
ls List files
-a (all)
-l (long)
-d (directory information)
-r (reverse alphabetic)
-t (sort by time last modified)
-C (multi-column format)
-R (recursively)
-F (places / after each directory name & * after each exec file)
Files - Basic
cat Display file contents (concatenate). This can open a new file with
redirection, for example, cat > newfile. Use <Ctrl>d to end
input.
chmod Change the permission mode for files or directories.
• chmod who[+-=]permissions files or directories
• (r, w, x = permissions and u, g, o, a = who)
• Use + or - to grant or revoke specific permissions, or = to set them exactly
• Can also use numerics, 4 = read, 2 = write, 1 = execute
• Sum them for each digit: first is user, next is group, last is other
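The numeric form of chmod is just the per-class sum of 4, 2, and 1. As an illustration, the hypothetical helper perm_to_octal below converts a 9-character permission string, as displayed by ls -l, into that numeric form; special bits (setuid, setgid, sticky) are deliberately not handled.

```sh
#!/bin/sh
# Hypothetical helper: converts a string such as rwxr-x--- into chmod's
# numeric form by summing 4 (read), 2 (write), and 1 (execute) for
# user, group, and other. Special bits are not handled.
perm_to_octal() {
    perms=$1
    [ "${#perms}" -eq 9 ] || return 1
    result=
    for start in 1 4 7; do
        # Three characters for user (1-3), group (4-6), or other (7-9).
        triple=$(printf '%s\n' "$perms" | cut -c "$start-$((start + 2))")
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac
        case $triple in ?w?) digit=$((digit + 2)) ;; esac
        case $triple in ??x) digit=$((digit + 1)) ;; esac
        result=$result$digit
    done
    echo "$result"
}

perm_to_octal rwxr-x---   # prints 750
```

For example, chmod "$(perm_to_octal rwxr-x---)" myfile is equivalent to chmod 750 myfile (myfile is a placeholder name).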
Files - Advanced
awk Programmable text editor / report writer
banner Display banner (can redirect to another terminal nn with
> /dev/ttynn)
cal Calendar (cal month year)
cut Cut out specific fields from each line of a file.
diff Differences between two files
find Find files anywhere on disks. Specify location by path (will
search all subdirectories under specified directory).
• -name fl (file names matching fl criteria)
• -user ul (files owned by user ul)
• -size +n (or -n) (files larger (or smaller) than n blocks)
• -mtime +x (-x) (files modified more (less) than x days ago)
• -perm num (files whose access permissions match num)
• -exec (execute a command with results of find command)
• -ok (execute a command interactively with results of find
command)
• -o (logical or)
• -print (display results. Usually included.)
find syntax: find path expression action
For example:
• find / -name "*.txt" -print
• find / -name "*.txt" -exec ls -l {} \;
(Executes ls -l on each file found; each name found is substituted for {}.
; marks the end of the command to be executed, and \ stops the shell from
interpreting ; as its own command separator.)
grep Search for pattern, for example, grep pattern files.
pattern can include regular expressions.
• -c (display only a count of matching lines)
• -l (display only the names of files with matches)
• -n (list line numbers with lines)
• -v (display lines that do not match the pattern)
Expression metacharacters:
• [ ] matches any one character inside.
• A - inside [ ] matches a range of characters (for example, [a-z]).
• ^ matches BOL when ^ begins the pattern.
• $ matches EOL when $ ends the pattern.
• . matches any single character. (same as ? in shell)
Editors
ed Line editor
vi Screen editor
INed LPP editor
emacs Screen editor +
Metacharacters
* Any number of characters (0 or more)
? Any single character
[abc] [ ] any character from the list
[a-c] [ ] match any character from the list range
! Not any of the following characters, when first inside [ ] (for example, [!abc]
matches any character except a, b, or c)
; Command terminator used to string commands on a single line
& Runs the preceding command in the background
# Comment character
\ Removes special meaning (no interpretation) of the following
character
' Removes special meaning (no interpretation) of all characters between the
quotes
" Interprets only $, backquote, and \ characters between the
quotes
` Used to set a variable to the results of a command,
for example, now=`date` sets the value of now to the current
results of the date command
$ Preceding variable name indicates the value of the variable
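The quoting rules above can be compared side by side. In this illustrative sketch, quote_demo and the variable names are hypothetical, and the backquoted command is echo rather than date so that the output is predictable.

```sh
#!/bin/sh
# Illustrative function showing how each quoting mechanism treats $.
quote_demo() {
    d="day"
    echo '$d'               # single quotes: no interpretation, prints $d
    echo "$d"               # double quotes: $ is interpreted, prints day
    echo \$d                # backslash: $ loses its special meaning, prints $d
    greeting=`echo hello`   # backquotes: command substitution
    echo "$greeting"        # prints hello
}

quote_demo
```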
Variables
= Set a variable (for example, d="day" sets the value of d to
"day"), can also set the variable to the results of a command by
the ` character, for example, now=`date` sets the value of
now to the current result of the date command.
HOME Home directory
PATH Path to be checked
SHELL Shell to be used
TERM Terminal being used
PS1 Primary prompt characters, usually $ or #
PS2 Secondary prompt characters, usually >
$? Return code of the last command executed
set Displays current local variable settings
export Exports variable so that they are inherited by child processes
env Displays inherited variables
echo Echo a message (for example, echo HI or echo $d),
can turn off carriage returns with \c at the end of the message,
can print a blank line with \n at the end of the message.
Transmitting
mail Send and receive mail. With userID sends mail to userID.
Without userID, displays your mail. When processing your mail,
at the ? prompt for each mail item, you can:
• d - delete
• s - save (append the message to a file)
• q - quit
• enter - skip
• m - forward
mailx Upgrade of mail
uucp Copy file to other UNIX systems (UNIX to UNIX copy)
System administration
df Display file system usage
installp Install program
kill (pid) Send a termination signal to the process with process ID pid (find the
PID using ps); kill -9 pid kills the process unconditionally
mount Associate logical volume to a directory;
for example, mount device directory
ps -ef Shows process status (ps -ef)
umount Disassociate file system from directory
smit System management interface tool
Miscellaneous
banner Displays banner
date Displays current date and time
newgrp Change active groups
nice Assigns lower priority to following command (for example,
nice ps -f)
passwd Modifies current password
sleep n Sleep for n seconds
stty Show or set terminal settings
touch Create a zero-length file (or update the timestamps of an existing file)
xinit Initiate X-Windows
wall Sends message to all logged in users
who List users currently logged in (who am i identifies this user)
man,info Displays manual pages
System files
/etc/group List of groups
/etc/motd Message of the day, displayed at login
/etc/passwd List of users and signon information. Password shown as !,
can prevent password checking by editing to remove !
/etc/profile System wide user profile executed at login, can override
variables by resetting in the user's .profile file
/etc/security Directory not accessible to normal users
/etc/security/environ User environment settings
/etc/security/group Group attributes
/etc/security/limits User limits
/etc/security/login.cfg Login settings
/etc/security/passwd User passwords
/etc/security/user User attributes, password restrictions
Variables
var=string Set variable to equal string. (NO SPACES). Spaces must be
enclosed by double quotes. Special characters in string must
be enclosed by single quotes to prevent substitution. Piping (|),
redirection (<, >, >>), and & symbols are not interpreted.
$var Gives value of var in a compound
echo Displays value of var, for example, echo $var
HOME = Home directory of user
MAIL = Mail file name
PS1 = Primary prompt characters, usually "$" or "#"
PS2 = Secondary prompt characters, usually ">"
PATH = Search path
TERM = Terminal type being used
export Exports variables to the environment
env Displays environment variables settings
Commands
# Comment designator
&& Logical-and. Run command following && only if command
preceding && succeeds (return code = 0)
|| Logical-or. Run command following || only if command
preceding || fails (return code not 0)
exit n Used to pass return code n from shell script, passed as
variable $? to parent shell
expr Arithmetic expressions
Syntax: "expr expression1 operator expression2"
operators: + - \* (multiply) / (divide) % (remainder)
for loop for variable in list (for n with no list loops over the command-line arguments); for example:
do
command
done
if-then-else if test expression
then command
elif test expression
then command
else command
fi
read Read from standard input
shift Shifts arguments 1-9 one position to the left and decrements
number of arguments
test Used for conditional test, has two formats.
if test expression (for example, if test $# -eq 2)
if [ expression ]
(for example, if [ $# -eq 2 ]) (spaces required)
Integer operators:
-eq (=) -lt (<) -le (<=)
-ne (<>) -gt (>) -ge (>=)
String operators:
= != (not eq.) -z (zero length)
File status (for example, -opt file1)
• -f (ordinary file)
• -r (readable by this process)
• -w (writable by this process)
• -x (executable by this process)
• -s (non-zero length)
while loop while test expression
do
command
done
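Several of the constructs above (while, test, expr, shift) combine naturally. The following illustrative sketch adds up its integer arguments; sum_args is a hypothetical function name.

```sh
#!/bin/sh
# Illustrative function: adds up its integer arguments using the
# while loop, test, expr, and shift constructs summarized above.
sum_args() {
    total=0
    while test $# -gt 0
    do
        total=`expr $total + $1`   # expr needs spaces around the operator
        shift                      # move the next argument into $1
    done
    echo $total
}

sum_args 1 2 3   # prints 6
```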
Miscellaneous
sh Execute shell script in the sh shell
-x (execute step-by-step, used for debugging shell scripts)
vi Editor
Entering vi
vi file Edits the file named file
vi file file2 Edit files consecutively (through :n)
.exrc File that contains the vi profile
wm=nn Sets wrap margin to nn.
vi +n file Start editing at a line other than the first by adding + (last line), +n (line n),
or +/pattern (first occurrence of pattern) before the file name.
vi -r Lists saved files
vi -r file Recover file named file from crash
:n Next file in stack
:set all Show all options
:set nu Display line numbers (off when set nonu)
:set list Display control characters in file
Units of measure
h, l Character left, character right
k or <Ctrl>p Move cursor to character above cursor
j or <Ctrl>n Move cursor to character below cursor
w, b Word right, word left
^, $ Beginning, end of current line
<CR> or + Beginning of next line
- Beginning of previous line
G Last line of buffer
Cursor movements
Can precede cursor movement commands (including cursor arrows) with the number of times to
repeat, for example, 9 followed by the right arrow moves right nine characters.
0 Move to first character in line
$ Move to last character in line
^ Move to first nonblank character in line
fx Move right to character x
Fx Move left to character x
Adding text
a Add text after the cursor (end with <esc>)
A Add text at end of current line (end with <esc>)
i Add text before the cursor (end with <esc>)
I Add text before first nonblank character in current line
o Add line following current line
O Add line before current line
<esc> Return to command mode
Deleting text
<Ctrl>w Undo entry of current word
@ Kill the insert on this line
x Delete current character
dw Delete to end of current word (observe punctuation)
dW Delete to end of current word (ignore punctuation)
dd Delete current line
D Erase to end of line (same as d$)
d) Delete current sentence
d} Delete current paragraph
dG Delete current line through end of buffer
d^ Delete to the beginning of line
u Undo last change command
U Restore current line to original state before modification
Replacing text
ra Replace current character with a
R Replace all characters overtyped until <esc> is entered
s Delete current character and append text until <esc>
:s/s1/s2 Replace s1 with s2 (in the same line only)
S Delete all characters in the line and append text
cc Replace all characters in the line (same as S)
Moving text
p Paste last text deleted after cursor (xp will transpose 2
characters)
P Paste last text deleted before cursor
nYx Yank n text objects of type x (w, b = words, ) = sentences, } =
paragraphs, $ = end-of-line, and no "x" indicates lines). Can
then paste them with p command. Yank does not delete the
original.
"ayy Named registers can be used for moving, copying, and cut/paste:
"ayy yanks the current line into register a (registers a-z are available),
and "ap then pastes it.
Miscellaneous
. Repeat last command
J Join current line with next line
0c0 - 0cc
0c0 A user-requested dump completed successfully.
0c1 An I/O error occurred during the dump.
0c2 A user-requested dump is in progress. Wait at least one minute for the
dump to complete.
0c4 The dump ran out of space. Partial dump is available.
0c5 The dump failed due to an internal failure. A partial dump may exist.
0c7 Progress indicator. Remote dump is in progress.
0c8 The dump device is disabled. No dump device configured.
0c9 A system-initiated dump has started. Wait at least one minute for the
dump to complete.
0cc (AIX 4.2.1 and later) An error occurred writing to the primary dump
device. It switched over to the secondary.
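The 0c0 - 0cc dump codes above map directly to a case statement. This illustrative sketch (dump_led_meaning and dump_in_progress are hypothetical helpers) returns a short description for each listed code and reports whether a code means the dump is still running.

```sh
#!/bin/sh
# Illustrative lookup of the 0c0 - 0cc dump codes listed above.
# Fails for codes that are not in the table.
dump_led_meaning() {
    case $1 in
        0c0) echo "user-requested dump completed successfully" ;;
        0c1) echo "I/O error during the dump" ;;
        0c2) echo "user-requested dump in progress" ;;
        0c4) echo "dump ran out of space; partial dump available" ;;
        0c5) echo "dump failed (internal failure); partial dump may exist" ;;
        0c7) echo "remote dump in progress" ;;
        0c8) echo "dump device disabled; no dump device configured" ;;
        0c9) echo "system-initiated dump started" ;;
        0cc) echo "error writing to primary dump device; switched to secondary" ;;
        *)   echo "not a dump code in this table"; return 1 ;;
    esac
}

# Codes 0c2, 0c7, and 0c9 mean the dump is still running: keep waiting.
dump_in_progress() {
    case $1 in
        0c2|0c7|0c9) return 0 ;;
        *) return 1 ;;
    esac
}

dump_led_meaning 0c4
```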
100 - 195
100 Progress indicator. BIST completed successfully.
101 Progress indicator. Initial BIST started following system reset.
102 Progress indicator. BIST started following power on reset.
103 BIST could not determine the system model number.
104 BIST could not find the common on-chip processor bus address.
105 BIST could not read from the on-chip sequencer EPROM.
106 BIST detected a module failure.
111 On-chip sequencer stopped. BIST detected a module error.
112 Checkstop occurred during BIST and checkstop results could not be
logged out.
113 The BIST checkstop count equals 3, that means three unsuccessful
system restarts. System halts.
120 Progress indicator. BIST started CRC check on the EPROM.
121 BIST detected a bad CRC on the on-chip sequencer EPROM.
© Copyright IBM Corp. 2009 Appendix C. AIX dump code and progress codes C-1
187 BIST was unable to identify the chip release level in the checkstop
logout data.
195 Progress indicator. The BIST checkstop logout completed.
c00 - c99
c00 AIX Install/Maintenance loaded successfully.
c01 Insert the AIX Install/Maintenance diskette.
c02 Diskettes inserted out of sequence.
c03 Wrong diskette inserted.
c04 Irrecoverable error occurred.
© Copyright IBM Corp. 2009 Appendix D. Auditing security related events D-1
Appendix objectives
Notes:
Instructor notes:
Purpose — Present the objectives for this appendix.
Details —
Additional information —
Transition statement — Let’s start with an overview of how the auditing subsystem works.
Figure: The kernel and applications generate audit events; the audit logger writes them as audit records in BIN mode (audit bins) or in STREAM mode (through /dev/audit).
Notes:
Instructor notes:
Purpose — Describe how auditing works.
Details — Base your explanation on the information in the student materials.
Additional information — The auditing subsystem enables you to capture any event on
the system that changes the state of the security of your system. This visual should be
used to set the scene for the entire session.
An auditable event is any security-relevant occurrence in the system.
The programs and kernel modules that detect auditable events are responsible for
reporting these events to the system audit logger.
The fileset which is required to enable auditing is the bos.rte.security fileset.
Transition statement — Let’s discuss the configuration files for auditing.
Notes:
Introduction
All audit configuration files reside in the directory /etc/security/audit. Individual
configuration files used by the auditing subsystem are described in the material that
follows.
Instructor notes:
Purpose — Provide information about the most important audit configuration files.
Details — Base your explanation on the information in the student materials.
Additional information — None
Transition statement — Let’s identify how to configure the auditing subsystem.
# vi /etc/security/audit/objects
/etc/security/user:
w = "S_USER_WRITE"
...
/etc/filesystems:
w = "MY_EVENT"
/usr/sbin/no:
x = "MY_X_EVENT"
Notes:
Specifying objects
To configure the auditing subsystem, first specify the objects (files or applications)
that you want to audit in /etc/security/audit/objects. This file already contains
stanzas for predefined files, for example, /etc/security/user.
To audit your own files, add a stanza for each file in the following format:
file:
access_mode = "event_name"
An audit event name can be up to 15 bytes long. Valid access modes are read (r), write
(w), and execute (x).
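The stanza rules above can be checked before you edit the objects file. The Python sketch below parses stanzas in the simplified layout shown in the visual: a file path followed by a colon, then indented mode = "event" lines. It reports invalid access modes and event names longer than 15 bytes.

The function name check_objects and the simplified grammar (no comments, one attribute per line) are assumptions for illustration. This is not an AIX tool.

```python
import re

# Simplified stanza grammar, assumed for illustration:
#   <file path>:             starts a stanza
#       <mode> = "<event>"   one indented attribute per line
STANZA_RE = re.compile(r"^(\S.*):\s*$")
ATTR_RE = re.compile(r'^\s+([a-z])\s*=\s*"([^"]*)"\s*$')

VALID_MODES = {"r", "w", "x"}   # read, write, execute
MAX_EVENT_BYTES = 15            # audit event name length limit


def check_objects(text):
    """Return (objects, problems) for objects-file text.

    objects maps each file path to {mode: event_name};
    problems is a list of human-readable error strings.
    """
    objects = {}
    problems = []
    current = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        stanza = STANZA_RE.match(line)
        if stanza:
            current = stanza.group(1)
            objects[current] = {}
            continue
        attr = ATTR_RE.match(line)
        if attr is None or current is None:
            problems.append(f"line {lineno}: unrecognized line {line!r}")
            continue
        mode, event = attr.groups()
        if mode not in VALID_MODES:
            problems.append(f"line {lineno}: invalid access mode {mode!r}")
        if len(event.encode()) > MAX_EVENT_BYTES:
            problems.append(f"line {lineno}: event name {event!r} exceeds 15 bytes")
        objects[current][mode] = event
    return objects, problems
```

For example, a stanza that uses mode a together with the event name AN_EVENT_NAME_TOO_LONG produces two problems for that line.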
Instructor notes:
Purpose — Describe the function of the objects file.
Details — Base your explanation on the information in the student materials.
Additional information — When a shell script runs, only a read event is generated. An
execute event is triggered only when the audited object is a compiled program.
Transition statement — Let’s introduce the events configuration file.
# vi /etc/security/audit/events
auditpr:
...
Notes:
The auditpr stanza uses printf to print each audit record with all of its event arguments.
The format specifiers differ depending on the audit event. If you want other applications
to be called whenever an event occurs, specify an event_program, and always use the full
path name of the event_program.
Instructor notes:
Purpose — Describe the events configuration file.
Details — Describe it using the information in the student materials.
Additional information — The printf “%s” specification formats the output that is printed
to the appropriate location; %s accepts a string of characters. printf supports a number
of other format specifiers; see the man pages for details.
Transition statement — Let’s introduce the config configuration file.
# vi /etc/security/audit/config
start:
binmode = off
streammode = on
...
classes:
general = USER_SU, PASSWORD_Change, ...
tcpip = TCPIP_connect, TCPIP_data_in, ...
...
init = USER_Login, USER_Logout
users:
root = general
michael = init
Notes:
Introduction
The /etc/security/audit/config file contains audit configuration information.
The information that follows describes three of the stanzas in this file: start, classes,
and users.
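The classes and users stanzas form a two-level mapping: each user is assigned one or more audit classes, and each class expands to a list of events. The Python sketch below expands a user to the events that would be audited.

It uses only the class members visible in the visual; the members elided with "..." are left out. CLASSES, USERS and events_for_user are hypothetical names. A comma-separated list of classes per user is assumed, following the users stanza format.

```python
# Class definitions from the visual's "classes" stanza. The visual elides
# further members with "..."; only the names shown are included here.
CLASSES = {
    "general": ["USER_SU", "PASSWORD_Change"],
    "tcpip": ["TCPIP_connect", "TCPIP_data_in"],
    "init": ["USER_Login", "USER_Logout"],
}

# "users" stanza: user name -> comma-separated list of classes.
USERS = {
    "root": "general",
    "michael": "init",
}


def events_for_user(user, classes=CLASSES, users=USERS):
    """Return a sorted list of the audit events configured for a user.

    Users without a users-stanza entry get an empty list; an unknown
    class name raises KeyError so configuration typos are caught.
    """
    assigned = users.get(user, "")
    events = set()
    for cls in (c.strip() for c in assigned.split(",") if c.strip()):
        events.update(classes[cls])
    return sorted(events)
```

With this data, events_for_user("root") returns ["PASSWORD_Change", "USER_SU"], and a user with no users entry gets an empty list.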
Instructor notes:
Purpose — Describe the config configuration file.
Details — Describe using the information in the student materials.
Additional information — None
Transition statement — Let’s describe how bin mode works.
# vi /etc/security/audit/config
start:
binmode = on
streammode = off
bin:
trail = /audit/trail
bin1 = /audit/bin1
bin2 = /audit/bin2
binsize = 10240
cmds = /etc/security/audit/bincmds
...
Notes:
Instructor notes:
Purpose — Describe how the bin mode works.
Details — Describe using the information in the student materials.
Additional information — If binsize is set to 0, no bin switching occurs and all
audit records are collected in bin1.
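Bin switching can be pictured as two alternating buffers. In the simplified Python model below, records are appended to the active bin. When that bin reaches binsize bytes, the model switches to the other bin and moves the full bin's records to the trail; this step stands in for the commands listed in bincmds. With binsize = 0, no switch occurs, so every record stays in bin1.

The model is an assumption, not the audit logger's actual code, and the class BinModel is hypothetical.

```python
class BinModel:
    """Toy model of audit bin switching; not the AIX implementation."""

    def __init__(self, binsize):
        self.binsize = binsize              # bytes; 0 disables switching
        self.bins = {"bin1": [], "bin2": []}
        self.active = "bin1"
        self.trail = []                     # records moved to the trail file

    def _size(self, name):
        # Total bytes currently held in one bin.
        return sum(len(r) for r in self.bins[name])

    def write(self, record):
        """Append one record (bytes) and switch bins if the threshold is hit."""
        self.bins[self.active].append(record)
        if self.binsize > 0 and self._size(self.active) >= self.binsize:
            full = self.active
            self.active = "bin2" if full == "bin1" else "bin1"
            # Stand-in for running the bincmds commands on the full bin.
            self.trail.extend(self.bins[full])
            self.bins[full] = []
```

For example, with binsize = 10, writing two 5-byte records fills bin1, moves both records to the trail, and makes bin2 the active bin.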
Transition statement — Let’s describe stream mode.
# vi /etc/security/audit/config
start:
binmode = off
streammode = on
stream:
cmds = /etc/security/audit/streamcmds
...
# vi /etc/security/audit/streamcmds
Notes:
Instructor notes:
Purpose — Explain stream mode auditing.
Details — Describe using the information in the student materials.
Additional information — None
Transition statement — Let’s show how to start and stop the auditing subsystem.
Start / stop auditing:
# audit start
# audit shutdown
Suspend / restart auditing:
# audit off
# audit on
Notes:
Instructor notes:
Purpose — Describe the audit command.
Details — Base your explanation on the information in the student materials.
Additional information — The difference between audit shutdown/start and audit
off/on is that shutdown followed by start forces the configuration files to be reread,
whereas off and on do not reread them. Also, shutdown writes the information in the bin
files to the trail file, so the bin files are empty when auditing is started again. The off
option leaves the information in the bin files, and auditing resumes where it left off when
the on option is specified.
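The difference between the two command pairs can be summarized as a small state model:

- start copies the on-disk configuration.
- shutdown moves the bin contents to the trail.
- off and on only pause and resume collection, without rereading the configuration.

The Python class AuditModel below is a hypothetical teaching aid, not the AIX implementation.

```python
class AuditModel:
    """Hypothetical teaching model of audit start/shutdown/off/on.

    Not the AIX implementation: it only mirrors the behavior described
    in the notes (configuration reread, bin contents kept or flushed).
    """

    def __init__(self, config_on_disk):
        self.config_on_disk = config_on_disk  # stands in for the config file
        self.active_config = None             # configuration in effect
        self.collecting = False
        self.bins = []                        # records not yet in the trail
        self.trail = []                       # records written to the trail

    def start(self):
        # start reads the configuration files again.
        self.active_config = dict(self.config_on_disk)
        self.collecting = True

    def shutdown(self):
        # shutdown writes the bin contents to the trail; the bins end up empty.
        self.trail.extend(self.bins)
        self.bins = []
        self.collecting = False

    def off(self):
        # off suspends collection; bins and loaded configuration are kept.
        self.collecting = False

    def on(self):
        # on resumes with the configuration already loaded (no reread).
        self.collecting = True

    def record(self, event):
        # Events are kept only while auditing is collecting.
        if self.collecting:
            self.bins.append(event)
```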
Transition statement — Let’s show some audit records.
Notes:
Instructor notes:
Purpose — Describe examples of audit records.
Details — Describe them using the information in the student materials.
Additional information — None
Transition statement — Let’s provide a flowchart that helps to set up auditing.
Flowchart (excerpt): What applications do I want to audit? Do they trigger events?
Notes:
Instructor notes:
Purpose — Present a flowchart that helps in setting up auditing in a customer’s
environment.
Details — Describe it using the information in the student materials.
Additional information — None
Transition statement — There’s no checkpoint for this appendix. Let’s move on to the
exercise.
Notes:
Instructor notes:
Purpose — Introduce the lab exercise.
Details — Be sure to tell the students where this exercise can be found.
Additional information — None
Transition statement — Let’s take a quick look back at what we discussed in this
appendix.
Appendix summary
IBM Power Systems
Notes:
Instructor notes:
Purpose — Review the purpose of this appendix.
Details —
Additional information — None.
Transition statement — This concludes our discussion of auditing.
References
Online AIX Version 6.1 Understanding the Diagnostic Subsystem for AIX
Note: References listed as “online” above are available at the
following address:
http://publib.boulder.ibm.com/infocenter/systems
Appendix objectives
Notes:
Figure: Sources for diagnostics: the bos.diag fileset, a diagnostic CD-ROM, or a NIM master.
Notes:
Introduction
Hardware has a limited lifetime. Failing hardware can cause hardware errors in the
error log, systems that do not boot, or very unusual system behavior.
The diagnostic package helps you analyze your system and identify failing hardware.
It also provides information that allows service representatives to analyze errors
quickly.
Instructor notes:
Purpose — Explain when diagnostics are used. Describe the different sources for
diagnostics.
Details —
Additional information —
Transition statement — Let’s discuss how to use diagnostics.
diag
Notes:
Instructor notes:
Purpose — Introduce the diag command.
Details —
Additional information — When the diagnostic tool runs, it automatically tries to diagnose
hardware errors it finds in the error log. The information generated by the diag command is
added back into the error log entry, so it is easy to relate the error event to, for example,
the FRU number required to repair the failing hardware.
Transition statement — Let’s show how to work with the diag menus.
# diag
FUNCTION SELECTION 801002
Notes:
Diagnostic modes (1 of 2)
Notes:
Diagnostic modes
Three different diagnostic modes are available:
- Concurrent mode
- Maintenance (single-user) mode
- Service (standalone) mode (covered on the next visual).
Concurrent mode
Concurrent mode lets you run online diagnostics on some of the system resources while
the system continues its normal activity. Devices that are not in use, for example an idle
tape drive, can be tested, but the number of resources that can be tested is very limited.
Devices that are in use cannot be tested.
Instructor notes:
Purpose — Describe diagnostic modes.
Details — In concurrent mode, because the system is running in normal operation, devices
such as the following may require additional actions by the user or diagnostic application
before testing can be done:
• SCSI adapters connected to paging devices
• Disk drives that are used for paging or are part of rootvg
• LFT devices and graphics adapters if a windowing system is active
• Memory
• Processor
Additional information —
Transition statement — Let’s describe the standalone mode.
Diagnostic modes (2 of 2)
Press F5 (or 5) when the logo appears to boot the system in service mode.
Notes:
Standalone mode
But what if your system does not boot, or you have to test a system that does not
have AIX installed? In this case, you must use standalone mode.
Standalone mode offers the greatest flexibility. You can test systems that do not boot or
that have no operating system installed (the latter requires a diagnostic CD-ROM).
4. Press F5 when you hear a beep and icons are shown on the display.
This simulates booting in service mode (logical key switch).
5. The diag command is started automatically from the diagnostic CD-ROM.
6. At this point, you can start your diagnostic routines.
# diag
FUNCTION SELECTION 801002
Notes:
Additional tasks
The diag command offers a wide range of additional hardware-related tasks. To find
them, start diag and select Task Selection from the main menu.
The tasks that are offered are hardware (or resource) related. For example, if your
system has a service processor, you will find service processor maintenance tasks,
which you do not find on machines without a service processor. On some systems, you
find tasks to maintain RAID and SSA storage systems.
Instructor notes:
Purpose — Describe the additional tasks that diag offers.
Details — Explain some typical tasks that are offered.
Additional information — All newer PCI models support the diag command.
Transition statement — The diagnostic output is saved to a binary file so it can be
referenced later. Let’s take a look at that.
Diagnostic log
# /usr/lpp/diagnostics/bin/diagrpt -r
ID DATE/TIME T RESOURCE_NAME DESCRIPTION
DC00 Mon Oct 08 16:13:06 I diag Diagnostic Session was started
DAE0 Mon Oct 08 16:10:38 N hdisk2 The device could not be tested
DC00 Mon Oct 08 16:10:13 I diag Diagnostic Session was started
DA00 Mon Oct 08 16:05:11 N sysplanar0 No Trouble Found
DA00 Mon Oct 08 16:05:05 N sisscsia0 No Trouble Found
DC00 Mon Oct 08 16:04:46 I diag Diagnostic Session was started
# /usr/lpp/diagnostics/bin/diagrpt -a
IDENTIFIER: DC00
Date/Time: Mon Oct 08 16:13:06
Sequence Number: 15
Event type: Informational Message
Resource Name: diag
Diag Session: 327726
Description: Diagnostic Session was started.
----------------------------------------------------------------------------
IDENTIFIER: DAE0
Date/Time: Mon Oct 08 16:10:38
Sequence Number: 14
Event type: Error Condition
Resource Name: hdisk2
Resource Description: 16 Bit LVD SCSI Disk Drive
Location: U7311.D20.107F67B-P1-C04-T2-L8-L0
Notes:
Diagnostic log
When diagnostics are run in online (concurrent) or single-user mode, the results are
stored in a diagnostic log, the binary file /var/adm/ras/diag_log. Use the
/usr/lpp/diagnostics/bin/diagrpt command to read the contents of this file.
Report fields
The ID column identifies the event that was logged. The example in the visual shows DC00,
DAE0, and DA00 entries. DC00 indicates that a diagnostic session was started, DAE0
indicates that the diagnostic application reported ERROR_OPEN (here, the device could not
be tested), and DA00 indicates No Trouble Found (NTF).
The T column indicates the type of entry in the log. I is for informational messages. N is
for No Trouble Found. S shows the Service Request Number (SRN) for the error that
was found. E is for an Error Condition.
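The report layout described above is regular enough to parse. The Python sketch below turns diagrpt -r output, as shown in the visual, into records and expands the T column letters.

The column layout is inferred from the sample output and may differ on other levels of AIX:

- a four-character ID
- a date such as Mon Oct 08 16:13:06
- a one-letter type
- a resource name
- a description

parse_diagrpt_r is a hypothetical name.

```python
import re

# Meaning of the T (type) column, as described in the notes.
TYPE_NAMES = {
    "I": "Informational message",
    "N": "No Trouble Found",
    "S": "Service Request Number (SRN)",
    "E": "Error condition",
}

# Assumed row layout, inferred from the sample output:
# ID, date/time, type letter, resource name, description.
ROW_RE = re.compile(
    r"^(?P<id>[0-9A-Z]{4})\s+"
    r"(?P<date>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<type>[INSE])\s+"
    r"(?P<resource>\S+)\s+"
    r"(?P<description>.+)$"
)


def parse_diagrpt_r(text):
    """Parse diagrpt -r style output into a list of dicts.

    Lines that do not match the row layout (such as the column
    header) are skipped.
    """
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            row = match.groupdict()
            row["type_name"] = TYPE_NAMES[row["type"]]
            rows.append(row)
    return rows
```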
Instructor notes:
Purpose — Show the contents of a diagnostics log.
Details — Review the visual content for the diagnostics log. The student notes explain the
ID and Types that are displayed.
Additional information — The IDs that currently exist are:
DC00 - Diagnostic controller session started
DCF0 - Diagnostic controller reported an SRN from missing options
DCF1 - Diagnostic controller reported an SRN from new resource
DCE1 - Diagnostic controller reported ERROR_OTHER
DA00 - Diagnostic application reported NTF (No Trouble Found)
DAF0 - Diagnostic application reported an SRN
DAFE - Diagnostic application reported an ELA (Error Log Analysis) SRN
DAE0 - Diagnostic application reported ERROR_OPEN
DAE1 - Diagnostic application reported ERROR_OTHER
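The ID list follows a simple pattern: IDs starting with DC come from the diagnostic controller, and IDs starting with DA come from a diagnostic application. The Python lookup below encodes the list above.

describe_id and needs_attention are hypothetical helpers. Treating every listed ID other than DC00 and DA00 as worth a closer look is an assumed triage rule, not IBM guidance.

```python
# ID meanings as listed in the instructor notes.
DIAG_IDS = {
    "DC00": "Diagnostic controller session started",
    "DCF0": "Diagnostic controller reported an SRN from missing options",
    "DCF1": "Diagnostic controller reported an SRN from new resource",
    "DCE1": "Diagnostic controller reported ERROR_OTHER",
    "DA00": "Diagnostic application reported NTF (No Trouble Found)",
    "DAF0": "Diagnostic application reported an SRN",
    "DAFE": "Diagnostic application reported an ELA (Error Log Analysis) SRN",
    "DAE0": "Diagnostic application reported ERROR_OPEN",
    "DAE1": "Diagnostic application reported ERROR_OTHER",
}


def describe_id(log_id):
    """Return (source, meaning) for a diagnostic log ID."""
    source = {"DC": "controller", "DA": "application"}.get(log_id[:2], "unknown")
    return source, DIAG_IDS.get(log_id, "unknown ID")


def needs_attention(log_id):
    """True for listed IDs that report an SRN or an error (assumed rule)."""
    return log_id in DIAG_IDS and log_id not in ("DC00", "DA00")
```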
Transition statement — Let’s answer some checkpoint questions.
Checkpoint
Notes:
Instructor notes:
Purpose — Review and test the students’ understanding of this unit.
Details — A suggested approach is to give the students about five minutes to answer the
questions on this page. Then, go over the questions and answers with the class.
Checkpoint solutions
Additional information —
Transition statement — Now, let’s do an exercise.
Notes:
Introduction
This exercise can be found in your Student Exercise Guide.
Instructor notes:
Purpose — Explain the goals of the lab.
Details — Clearly explain what students have to do.
Additional information — Only one person per system should perform this
exercise.
Transition statement — Let’s summarize.
Appendix summary
Notes:
Instructor notes:
Purpose — Summarize the unit.
Details — Present the highlights from the unit.
Additional information —
Transition statement —