Introduction To Performance Management
Overview
Consumers have long expected the continuous availability of
many basic services such as water, power, and telephone. Today, consumers also expect
to access cash from automatic teller machines, to purchase goods with credit cards 24
hours a day, to make airline reservations at any time, and to have access to around-the-
clock emergency services.
In companies where customer service reigns, the customer, not the business,
determines when, where, and how services should be provided. Effectively managing
your system’s performance is one way to ensure that you meet the availability
requirements of your customers.
This section:
• Defines outages, both planned and unplanned
• Describes the causes of outages
• Defines performance management in the Tandem environment
• Explains how performance management fits into Tandem’s operations management
(OM) framework
What Is an Outage?
When an application is not performing properly (for example, when user response time
is unacceptable), availability can be affected, which can result in an outage. Tandem
defines availability as the total time an application running on a Tandem system can be
accessed by a user of that application. An outage is time during which your NonStop
system is not capable of doing useful work. From the end-user’s perspective, an outage
is any time an application is not available.
Outages are of two types: planned and unplanned. You can maximize the availability of
your systems, networks, and applications by reducing or eliminating both types of
outages.
Planned Outages
A planned outage is system or application downtime that is planned or scheduled.
Planned outages occur when there are changes that must be implemented and the
computing environment must be stopped to implement the changes. An example of
such a change is the installation of a new version of the operating system. You can
reduce or eliminate planned outages by:
• Performing changes online. Online change is any change that can be performed
while the Tandem NonStop Kernel and system utilities are operational. Being able
to make changes to your hardware and software online is one way to reduce, or
even eliminate, system and application downtime.
• Reducing the time required for planned outages.
The Availability Guide for Change Management provides guidelines for managing
planned outages in an operations environment.
Unplanned Outages
An unplanned outage is the time during which the application or system is
unavailable to the end user because of a problem. One example of such an
outage is a processor halt caused by severe memory problems. Slow response time can
also appear to be an outage to the end user and can result in the loss of transactions.
You can reduce or eliminate unplanned outages by:
• Predicting and then preventing problems before they occur
• Quickly recovering from problems
This manual explains how to identify and resolve performance problems. It also shows
you how to optimize system performance so that performance problems do not occur.
The Availability Guide for Problem Management also provides comprehensive
information about problem prediction, prevention, and recovery.
Measuring Outages
Tandem believes that the measurement of availability should be from the end-user’s
perspective. For example, it is not enough to record that a certain hardware or software
component has gone down; you must also take into consideration the user’s ability to
access the service, the quality of the service, and whether or not the response time is
within acceptable limits.
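
As a rough sketch of measuring from the end user's perspective, the following Python counts a minute as an outage minute when the service could not be reached or when its response time exceeded an acceptable limit. The sample format, the one-probe-per-minute assumption, and the 2-second limit are illustrative assumptions, not Tandem definitions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class MinuteSample:
    """One hypothetical end-user probe, taken once per minute."""
    reachable: bool                    # could the user access the service?
    response_seconds: Optional[float]  # None when the service was unreachable


def outage_minutes(samples: Iterable[MinuteSample],
                   max_response_seconds: float = 2.0) -> int:
    """Count the minutes an end user would experience as an outage.

    A minute counts when the service is unreachable or when the response
    time exceeds max_response_seconds (an assumed service goal).
    """
    count = 0
    for sample in samples:
        too_slow = (sample.response_seconds is not None
                    and sample.response_seconds > max_response_seconds)
        if not sample.reachable or too_slow:
            count += 1
    return count


samples = [
    MinuteSample(True, 0.4),
    MinuteSample(True, 5.1),    # reachable, but too slow for the user
    MinuteSample(False, None),  # unreachable
    MinuteSample(True, 1.9),
]
print(outage_minutes(samples))
```

Note that the second sample counts as an outage minute even though no component was down, which is the point of measuring from the user's side.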
Outage Classes
Table 1-1 describes the five outage classes that Tandem uses to categorize the causes of
outages. Physical, design, operations, and environmental outages are usually unplanned
outages; reconfiguration outages are generally planned outages. Performance problems
are included in the operations outage class.
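
One possible way to track outages by class is sketched below in Python. The class names come from the paragraph above; the log format and the helper names are hypothetical, and the planned/unplanned mapping reflects only the "usually" and "generally" wording of the text, not a strict rule.

```python
from collections import Counter
from enum import Enum


class OutageClass(Enum):
    PHYSICAL = "physical"
    DESIGN = "design"
    OPERATIONS = "operations"          # includes performance problems
    ENVIRONMENTAL = "environmental"
    RECONFIGURATION = "reconfiguration"


def usually_planned(outage_class: OutageClass) -> bool:
    """Reconfiguration outages are generally planned; the rest usually are not."""
    return outage_class is OutageClass.RECONFIGURATION


def minutes_by_class(log):
    """Sum outage minutes per class from (OutageClass, minutes) pairs."""
    totals = Counter()
    for outage_class, minutes in log:
        totals[outage_class] += minutes
    return totals


# Hypothetical outage log for illustration only.
log = [
    (OutageClass.OPERATIONS, 12),
    (OutageClass.RECONFIGURATION, 45),
    (OutageClass.OPERATIONS, 3),
]
totals = minutes_by_class(log)
for outage_class, minutes in totals.items():
    kind = "planned" if usually_planned(outage_class) else "unplanned"
    print(f"{outage_class.value}: {minutes} minutes (usually {kind})")
```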
Outage Minutes
Availability is commonly measured as a percentage of planned available time. Tandem
believes there is a better way to measure availability: by measuring outage minutes.
When measuring outage minutes, Tandem assumes a 24-hour by 7-day by year-round
clock. This method of measuring availability considers unavailability caused by
planned outages as well as by unplanned outages.
Table 1-2 compares percentages with equivalent outage minutes and the resulting user
impact.
Table 1-2. Outage Minutes per Year (24-Hour by 7-Day by Year-Round Clock)

Percent Availability  90%      99%       99.9%      99.99%      99.999%    100%
Outage Minutes/Year*  50,000   5,000     500        50          5          0
User Impact*          35 days  3.5 days  8.3 hours  50 minutes  5 minutes  0 minutes

*Outage minutes per year and user impact days are approximations.
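
The conversion from percent availability to outage minutes can be sketched in Python as follows. This assumes a 365-day year on a 24-hour by 7-day clock (525,600 minutes); the function name is illustrative. The exact results differ slightly from the rounded figures in Table 1-2 (for example, 90% availability works out to 52,560 minutes rather than 50,000).

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def outage_minutes_per_year(availability_percent: float) -> float:
    """Convert percent availability to outage minutes per year."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


for pct in (90.0, 99.0, 99.9, 99.99, 99.999, 100.0):
    minutes = outage_minutes_per_year(pct)
    print(f"{pct}% available -> {minutes:,.1f} outage minutes per year")
```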
When analyzing and tuning your system’s performance, it is useful to divide your
activities into the following tasks:
• Establishing performance requirements
• Gathering performance information
• Analyzing the results of performance measurements
• Acting on collected data to optimize performance (tuning)
• Reporting results
Optimizing Performance
Performance optimization and tuning is the process of acting on collected performance
measurements and on their analysis in order to optimize system performance.
Tuning a system means taking full advantage of that system’s capabilities to provide
optimum performance to all users. Performance optimization and tuning tasks often
include:
• Solving memory pressure
• Balancing the system workload
• Conserving resources
• Balancing memory consumption
Section 6, “Performance Optimization and Tuning,” explains how to tune your system
for optimum performance.
Reporting Results
Performance analysis and tuning results are usually reported to the capacity planning
staff and to management. Regular performance analysis and tuning reports can help
you track problems and resource usage. Periodic performance reports can also help
you detect trends in system use and performance. It is helpful to have reports that:
• Summarize the data collected
• List service goals and how well the goals were met
• Summarize actions taken (if any) to improve performance
• Summarize how the resources were used (for planning and accounting purposes)
• Summarize outstanding issues
Section 7, “Reporting Performance Analysis and Tuning Results,” provides a sample
performance analysis and tuning report.
Capacity Planning
Capacity planning is the process of forecasting future capacity needs based on your
company’s changing business needs, performance trends, and the growth in users,
applications, and so forth. Capacity planning helps you:
• Plan for growth in system workloads based on business growth. For example, if
capacity planners know that the company’s business is going to grow such that your
systems will have to handle a 10 percent increase in transactions per second, they
can determine the additional resources required to handle the future workload.
• Prepare budgets for the acquisition of additional equipment, space, power, air
conditioning, and other resources.
• Avoid crises related to overloaded systems.
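
The 10 percent growth example above can be turned into a back-of-the-envelope projection. The Python sketch below assumes, purely for illustration, that resource utilization scales linearly with transactions per second and that 80% utilization is the planning ceiling; neither assumption comes from this manual, and real workloads often scale nonlinearly.

```python
def projected_utilization(current_utilization: float,
                          growth_percent: float) -> float:
    """Project utilization after a workload increase.

    Assumes utilization grows linearly with transactions per second,
    which is a simplification made for this sketch.
    """
    return current_utilization * (1.0 + growth_percent / 100.0)


def exceeds_ceiling(current_utilization: float,
                    growth_percent: float,
                    ceiling: float = 0.80) -> bool:
    """Return True when projected utilization passes the assumed ceiling."""
    return projected_utilization(current_utilization, growth_percent) > ceiling


# Hypothetical processors at 70% and 75% busy, facing 10% more transactions.
for busy in (0.70, 0.75):
    projected = projected_utilization(busy, 10.0)
    flag = "needs more capacity" if exceeds_ceiling(busy, 10.0) else "within ceiling"
    print(f"{busy:.0%} busy now -> {projected:.1%} projected ({flag})")
```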
Application Sizing
Application sizing is the process of forecasting the effects of new applications on your
system through the use of models to determine how well new applications will handle
their intended workloads. Application sizing helps you:
• Plan for growth in system workloads caused by new applications.
• Determine how much capacity and how many resources a new application will
require.
[Figure: Application Sizing Steps (image not available)]