Section 1. Introduction to Performance Management
Overview
For quite some time, consumers have come to expect the continuous availability of
many basic services such as water, power, and telephone. Today, consumers also expect
to access cash from automatic teller machines, to purchase goods with credit cards 24
hours a day, to make airline reservations at any time, and to have access to around-the-
clock emergency services.
In companies where customer service reigns, the customer, not the business,
determines when, where, and how services should be provided. Effectively managing
your system’s performance is one way to ensure that you meet the availability
requirements of your customers.
This section:
• Defines outages, both planned and unplanned
• Describes the causes of outages
• Defines performance management in the Tandem environment
• Explains how performance management fits into Tandem’s operations management
(OM) framework

Availability Guide for Performance Management-115733 1-1


What Is an Outage? Introduction to Performance Management

What Is an Outage?
When an application is not performing properly (for example, when user response time
is unacceptable), availability can suffer and the result can be an outage. Tandem
defines availability as the total time during which users of an application running on a
Tandem system can access that application. An outage is any time during which your
NonStop system cannot do useful work. From the end-user’s perspective, an outage
is any time an application is not available.
There are two types of outages: planned and unplanned. You can maximize the
availability of your systems, networks, and applications by reducing or eliminating both
types of outages.

Planned Outages
A planned outage is scheduled system or application downtime.
Planned outages occur when changes must be implemented and the
computing environment must be stopped to implement them. An example of
such a change is the installation of a new version of the operating system. You can
reduce or eliminate planned outages by:
• Performing changes online. Online change is any change that can be performed
while the Tandem NonStop Kernel and system utilities are operational. Being able
to make changes to your hardware and software online is one way to reduce, or
even eliminate, system and application downtime.
• Reducing the time required for planned outages.
The Availability Guide for Change Management provides guidelines for managing
planned outages in an operations environment.

Unplanned Outages
An unplanned outage is a period during which the application or system is
unavailable to the end user because of a problem. One example of such an
outage is a processor halt caused by severe memory problems. Slow response time can
also appear to be an outage to the end user and can result in lost transactions.
You can reduce or eliminate unplanned outages by:
• Predicting and then preventing problems before they occur
• Quickly recovering from problems
This manual explains how to identify and resolve performance problems. It also shows
you how to optimize system performance so that performance problems do not occur.
The Availability Guide for Problem Management also provides comprehensive
information about problem prediction, prevention, and recovery.

Measuring Outages
Tandem believes that the measurement of availability should be from the end-user’s
perspective. For example, it is not enough to record that a certain hardware or software
component has gone down; you must also take into consideration the user’s ability to
access the service, the quality of the service, and whether or not the response time is
within acceptable limits.

Outage Classes
Table 1-1 describes the five outage classes that Tandem uses to categorize the causes of
outages. Physical, design, operations, and environmental outages are usually unplanned
outages; reconfiguration outages are generally planned outages. Performance problems
are included in the operations outage class.

Table 1-1. Outage Classes

Outage Class      Description
Physical          Physical faults or failures in the hardware. Examples include
                  system disk failures, network router failures, non-fault-tolerant
                  hardware configurations (such as unmirrored disk drives), and
                  non-fault-tolerant application configurations.
Design            Errors in the design of hardware or software, such as design bugs
                  and design failures. Examples include an application change that
                  makes the application unusable by introducing unexpected problems.
Operations        Errors made by operations personnel through accident, inexperience,
                  or malice. Examples include deleting data, incorrectly installing
                  software, and performance problems such as poor response time.
Environmental     Failures caused by power, cooling, or network connection problems,
                  natural disasters (earthquake, flood), terrorism, and accidents.
                  Examples include air conditioning system failures and power failures.
Reconfiguration   Any outage taken to reconfigure any part of the environment.
                  Examples include hardware and software upgrades and database
                  maintenance.

Outage Minutes
Availability is commonly measured as a percentage of planned available time. Tandem
believes there is a better way to measure availability: by measuring outage minutes.
When measuring outage minutes, Tandem assumes a 24-hour by 7-day by year-round
clock. This method of measuring availability considers unavailability caused by
planned outages as well as by unplanned outages.
Table 1-2 compares percentages with equivalent outage minutes and the resulting user
impact.

Table 1-2. Outage Minutes per Year (24-Hour by 7-Day by Year-Round Clock)

Percent Availability   Outage Minutes/Year*   User Impact*
90%                    50,000                 35 days
99%                    5,000                  3.5 days
99.9%                  500                    8.3 hours
99.99%                 50                     50 minutes
99.999%                5                      5 minutes
100%                   0                      0 minutes

*Outage minutes per year and user impact are approximations.
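
The figures in Table 1-2 follow from simple arithmetic on a 24-hour by 7-day by year-round clock. The following Python sketch shows one way to do the conversion in both directions. It assumes a 365-day year (525,600 minutes); the function names are illustrative and are not part of any Tandem tool. Computing exact values shows how the table rounds: 90 percent availability works out to 52,560 outage minutes (36.5 days) per year, and 99.999 percent to about 5.3 minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a 365-day, 24x7 clock


def outage_minutes_per_year(availability_percent: float) -> float:
    """Outage minutes per year implied by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


def availability_percent(outage_minutes: float) -> float:
    """Availability percentage implied by a yearly total of outage minutes."""
    if not 0.0 <= outage_minutes <= MINUTES_PER_YEAR:
        raise ValueError("outage minutes must fall within one year")
    return 100.0 * (MINUTES_PER_YEAR - outage_minutes) / MINUTES_PER_YEAR


if __name__ == "__main__":
    # Print the exact values behind the rounded entries in Table 1-2.
    for pct in (90, 99, 99.9, 99.99, 99.999, 100):
        minutes = outage_minutes_per_year(pct)
        print(f"{pct}%: {minutes:,.1f} outage minutes/year ({minutes / 60:,.1f} hours)")
```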

Measuring Outage Minutes in a Client/Server Environment


For client/server types of applications, it is useful to express downtime as the number of
user outage minutes. A failure in the client part of the application might affect only
one user, but to that user the application is down. A failure in part of the network could
affect several users. A failure in the server, however, could affect thousands of users. It
is important that an outage in the server be weighted more heavily than an outage in the client.
In a client/server environment, it makes sense to measure downtime as the number of
minutes the application is unavailable multiplied by the number of affected users. A
one-minute outage in the workstation equals one minute of downtime. An outage of
one minute in the server, however, equals one minute times the number of users
accessing the server.
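
The weighting described above amounts to a short calculation: multiply each outage’s duration by the number of users it affected, then add the results. This Python sketch is a minimal illustration; the OutageEvent record and its fields are hypothetical, not a Tandem data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutageEvent:
    component: str       # illustrative label, e.g. "workstation" or "server"
    minutes: float       # how long the component was unavailable
    users_affected: int  # number of users who lost access during the outage


def user_outage_minutes(events: list[OutageEvent]) -> float:
    """Weight each outage by the number of users it affected, then total them."""
    return sum(event.minutes * event.users_affected for event in events)


if __name__ == "__main__":
    events = [
        OutageEvent("workstation", minutes=1, users_affected=1),
        OutageEvent("server", minutes=1, users_affected=2000),
    ]
    print(user_outage_minutes(events))  # 2001
```

A one-minute workstation outage contributes one user outage minute, while a one-minute server outage seen by 2,000 users contributes 2,000.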

Alternate Ways of Measuring Outages


Depending on specific business needs, downtime might be measured in ways other than
user outage minutes. For example, a site might be obligated to pay a penalty for each
transaction that does not get processed while an application is down. Such a site might
supplement its measure of downtime by keeping records of the number of transactions
it normally processes by minute and by day of the week. If an outage occurs (for
example, at 10 a.m. on a Tuesday, lasting 15 minutes), the site can
calculate the average number of transactions that would normally be processed during
that period and pay the corresponding penalty to its customer.
Using this method leads to significantly different outage costs depending on the time of
day and the day of the week. An hour-long outage at 2 a.m. on a Monday might
carry a negligible penalty compared with a 15-minute outage at 5 p.m. on a Friday.
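
The penalty method described above can be sketched as follows. The sketch assumes, hypothetically, that the site keeps its transaction history as an average count per minute, keyed by day of the week and minute of the day; the function names, the sample volume of 120 transactions per minute, and the penalty of 0.50 per transaction are all illustrative. With that history, a 15-minute outage starting at 10 a.m. on a Tuesday costs 1,800 lost transactions, or 900.0 in penalties, while an outage during minutes with no recorded traffic costs nothing.

```python
from datetime import datetime, timedelta

# Hypothetical history format: average transactions per minute, keyed by
# (weekday, minute of day). weekday() follows Python's convention: Monday is 0.
Baseline = dict[tuple[int, int], float]


def expected_lost_transactions(baseline: Baseline, start: datetime, duration_minutes: int) -> float:
    """Sum the transactions normally processed during each minute of the outage."""
    total = 0.0
    for offset in range(duration_minutes):
        moment = start + timedelta(minutes=offset)
        key = (moment.weekday(), moment.hour * 60 + moment.minute)
        total += baseline.get(key, 0.0)  # minutes with no history count as zero
    return total


def outage_penalty(baseline: Baseline, start: datetime, duration_minutes: int,
                   penalty_per_transaction: float) -> float:
    """Penalty owed for the transactions the outage is expected to have cost."""
    return expected_lost_transactions(baseline, start, duration_minutes) * penalty_per_transaction


if __name__ == "__main__":
    # Hypothetical history: 120 transactions per minute on Tuesdays from 10:00 to 10:14.
    baseline = {(1, 600 + m): 120.0 for m in range(15)}
    tuesday_10am = datetime(2024, 1, 2, 10, 0)  # 2 January 2024 was a Tuesday
    print(outage_penalty(baseline, tuesday_10am, 15, penalty_per_transaction=0.50))  # 900.0
```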

What Is Performance Management?


Performance management is the process of managing the performance of your system
and network environment to ensure that:
• You get the best return from your systems.
• Your systems meet your business needs as defined by your service-level
agreements. (Service-level agreements specify the level of service that operations
should provide.)

Performance Management Functions


Performance management includes the following functions:
• Performance analysis and tuning
• Capacity planning
• Application sizing
These functions are described in the following subsections.

Performance Analysis and Tuning


Performance analysis and tuning is the process of measuring system performance,
analyzing the measurements, and acting on the results to improve system
performance and availability. Performance analysis and tuning helps you:
• Determine whether the system is performing optimally.
• Ensure that service-level agreements are being met.
• Determine whether performance guidelines are being followed.
• Ensure that resources are allocated according to established priorities.
• Determine whether the system is used in a prudent, nonwasteful manner.
• Determine whether the available resources can meet the desired performance goals.
• Determine whether additional resources are required to meet performance goals.

Performance Analysis and Tuning Tasks


Performance analysis and tuning is an iterative process, not an ordered series of steps
that you perform to achieve a specified result. For example, you might need to take a
measurement, analyze the results of this measurement, take more measurements,
change some configuration parameters, take more measurements, and so on.
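
The iterative nature of tuning can be pictured as a loop that measures, compares the result with a target, adjusts, and measures again. The Python sketch below assumes a single response-time target in milliseconds; measure and adjust are hypothetical callables that stand in for your own measurement and configuration-change procedures, and the simulated system is purely illustrative.

```python
from typing import Callable


def tune_iteratively(measure: Callable[[], float],
                     adjust: Callable[[float], None],
                     target_ms: float,
                     max_rounds: int = 10) -> list[float]:
    """Measure, compare with the target, adjust, and measure again.

    Stops when a measurement meets the target or after max_rounds measurements.
    Returns every measurement taken, so the history can go into a report.
    """
    history: list[float] = []
    for _ in range(max_rounds):
        response_ms = measure()
        history.append(response_ms)
        if response_ms <= target_ms:
            break
        adjust(response_ms)  # e.g. change a configuration parameter
    return history


if __name__ == "__main__":
    # Simulated system: each adjustment trims 100 ms from the response time.
    state = {"response_ms": 500.0}

    def measure() -> float:
        return state["response_ms"]

    def adjust(_latest: float) -> None:
        state["response_ms"] -= 100.0

    print(tune_iteratively(measure, adjust, target_ms=300.0))  # [500.0, 400.0, 300.0]
```

The round limit keeps the loop from running indefinitely when no adjustment helps, which is itself a finding worth reporting.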

When analyzing and tuning your system’s performance, it is useful to divide your
activities into the following tasks:
• Establishing performance requirements
• Gathering performance information
• Analyzing the results of performance measurements
• Acting on collected data to optimize performance (tuning)
• Reporting results

Establishing Performance Requirements


Before you attempt to measure, analyze, or optimize your system’s performance, you
must first determine the level of performance that is acceptable for your system and
establish performance requirements. Performance requirements set the boundaries for
your performance analysis and tuning activities: they define when a performance
problem exists and when the system is performing in an acceptable manner.
Performance requirements are based on service-level agreements.
Section 3, “Establishing Performance Requirements,” explains how to make
performance requirements part of your service-level agreements.
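
A performance requirement is most useful when it can be checked mechanically against measurements. The following Python sketch assumes a requirement of the form "a given fraction of transactions completes within a given response time"; the class, its fields, and the example of 95 percent within 2 seconds are hypothetical and are not taken from any particular service-level agreement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseTimeRequirement:
    """Hypothetical requirement: min_fraction of transactions finish within max_seconds."""
    max_seconds: float
    min_fraction: float  # 0.95 means "95 percent of transactions"


def meets_requirement(req: ResponseTimeRequirement, response_times: list[float]) -> bool:
    """True if enough of the measured response times fall within the limit."""
    if not response_times:
        raise ValueError("no measurements to evaluate")
    within = sum(1 for t in response_times if t <= req.max_seconds)
    return within / len(response_times) >= req.min_fraction


if __name__ == "__main__":
    requirement = ResponseTimeRequirement(max_seconds=2.0, min_fraction=0.95)
    samples = [1.2] * 19 + [3.5]  # 19 of 20 transactions within 2 seconds
    print(meets_requirement(requirement, samples))  # True
```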

Gathering Performance Information


Gathering performance information involves collecting information about your system’s
performance that you will later use to identify performance problems and potential
performance problems. Gathering performance information includes the following
steps:
1. Collecting information about your system configuration
2. Collecting information about your application configurations
3. Collecting performance measurements
Section 4, “Gathering Performance Information,” explains how to measure performance
and collect performance information.

Analyzing Performance Information


Performance analysis involves examining performance information and includes the
following tasks:
• Evaluating your system and application configurations
• Creating a workload profile
• Analyzing resource use
• Identifying workload imbalances
Section 5, “Analyzing Performance Information,” explains how to examine performance
information and measurement data.
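
As one illustration of identifying workload imbalances, the Python sketch below flags processors whose busy percentage differs from the system-wide mean by more than a tolerance you choose. This is a simple heuristic chosen for illustration; the processor numbers, busy percentages, and 15-point tolerance are hypothetical.

```python
def find_imbalanced_processors(busy_percent: dict[int, float], tolerance: float) -> list[int]:
    """Return processors whose busy percentage differs from the mean by more than tolerance."""
    if not busy_percent:
        return []
    mean = sum(busy_percent.values()) / len(busy_percent)
    return sorted(cpu for cpu, busy in busy_percent.items() if abs(busy - mean) > tolerance)


if __name__ == "__main__":
    # Hypothetical busy percentages for four processors over one measurement interval.
    sample = {0: 80.0, 1: 30.0, 2: 50.0, 3: 40.0}
    print(find_imbalanced_processors(sample, tolerance=15.0))  # [0, 1]
```

Here the mean is 50 percent, so processor 0 (30 points above) and processor 1 (20 points below) are flagged as candidates for rebalancing.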

Optimizing Performance
Performance optimization and tuning is the process of acting on collected performance
measurements and the analysis of performance data to optimize system performance.
Tuning a system means taking full advantage of that system’s capabilities to provide
optimum performance to all users. Performance optimization and tuning tasks often
include:
• Relieving memory pressure
• Balancing the system workload
• Conserving resources
• Balancing memory consumption
Section 6, “Performance Optimization and Tuning,” explains how to tune your system
for optimum performance.

Reporting Results
Performance analysis and tuning results are usually reported to the capacity planning
staff and to management. Regular performance analysis and tuning reports can help
you track problems and resource usage. Periodic performance reports can also help
you detect trends in system use and performance. It is helpful to have reports that:
• Summarize the data collected
• List service goals and how well the goals were met
• Summarize actions taken (if any) to improve performance
• Summarize how the resources were used (for planning and accounting purposes)
• Summarize outstanding issues
Section 7, “Reporting Performance Analysis and Tuning Results,” provides a sample
performance analysis and tuning report.

Capacity Planning
Capacity planning is the process of forecasting future capacity needs based on your
company’s changing business needs, performance trends, and the growth in users,
applications, and so forth. Capacity planning helps you:
• Plan for growth in system workloads based on business growth. For example, if
capacity planners know that the company’s business is going to grow such that your
systems will have to handle a 10 percent increase in transactions per second, they
can determine the additional resources required to handle the future workload.
• Prepare budgets for the acquisition of additional equipment, space, power, air
conditioning, and other resources.
• Avoid crises related to overloaded systems.
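
The 10 percent growth example can be turned into a rough estimate of processor requirements. The Python sketch below assumes that processor load grows linearly with transaction volume and that you plan to keep each processor below a chosen utilization ceiling; both assumptions, and the sample numbers, are illustrative. Real workloads rarely scale perfectly linearly, so treat the result as a starting point for a capacity study rather than a final answer.

```python
import math


def processors_needed(current_processors: int, current_utilization: float,
                      growth: float, max_utilization: float) -> int:
    """Processors required after workload growth, assuming load scales linearly.

    current_utilization and max_utilization are fractions (0.70 means 70 percent busy);
    growth is the expected fractional increase in workload (0.10 means 10 percent).
    """
    if not 0.0 < max_utilization <= 1.0:
        raise ValueError("max_utilization must be in (0, 1]")
    future_work = current_processors * current_utilization * (1.0 + growth)
    return math.ceil(future_work / max_utilization)


if __name__ == "__main__":
    # Hypothetical: 4 processors at 70 percent busy, 10 percent more transactions,
    # and a planning ceiling of 75 percent busy per processor.
    print(processors_needed(4, 0.70, 0.10, 0.75))  # 5
```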

Capacity Planning Steps


Capacity planning involves the following steps:
1. Understanding business objectives.
2. Establishing requirements and strategy.
3. Instituting capacity reporting.
4. Developing a model of existing resource usage (a capacity study) and using it to
forecast future resource requirements (capacity planning).
5. Developing (or reviewing and modifying) the capacity plan.
Capacity planning is described in Section 8, “Capacity Planning.”
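
Step 4 calls for a model of existing resource usage that can be used to forecast future requirements. One simple model is a least-squares linear trend fitted to periodic utilization measurements. The Python sketch below assumes one observation per period (for example, average monthly busy percentage); the technique is a generic one chosen for illustration, not necessarily the model described in Section 8, and the sample data are hypothetical.

```python
from typing import Optional


def fit_linear_trend(values: list[float]) -> tuple[float, float]:
    """Least-squares line y = intercept + slope * t, with t = 0, 1, 2, ... (one value per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    variance = sum((t - mean_t) ** 2 for t in range(n))
    slope = covariance / variance
    return slope, mean_y - slope * mean_t


def periods_until(threshold: float, slope: float, intercept: float) -> Optional[float]:
    """Period index at which the trend line reaches threshold; None if the trend is flat or falling."""
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


if __name__ == "__main__":
    # Hypothetical average monthly busy percentages for the last four months.
    monthly_busy = [50.0, 52.0, 54.0, 56.0]
    slope, intercept = fit_linear_trend(monthly_busy)
    print(slope, intercept)                       # 2.0 50.0
    print(periods_until(70.0, slope, intercept))  # 10.0
```

In this example utilization grows 2 points per month from a base of 50 percent, so a 70 percent planning threshold would be reached around month 10.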

Application Sizing
Application sizing is the process of forecasting the effects of new applications on your
system through the use of models to determine how well new applications will handle
their intended workloads. Application sizing helps you:
• Plan for growth in system workloads caused by new applications.
• Determine how much capacity and how many resources a new application will
require.
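
A first-cut application sizing model can estimate the processor load a new application adds from its expected transaction rate and the CPU time each transaction needs. The Python sketch below assumes both figures are known (for example, from benchmarks or from similar applications) and ignores disk I/O, memory, and queuing effects; the function names and sample values are hypothetical.

```python
def new_application_load(transactions_per_second: float, cpu_ms_per_transaction: float) -> float:
    """CPU demand of the new application, in processors kept fully busy.

    For example, 1.0 means the new workload alone would keep one processor 100 percent busy.
    """
    return transactions_per_second * cpu_ms_per_transaction / 1000.0


def utilization_with_new_application(processors: int, current_utilization: float,
                                     new_load: float) -> float:
    """Average per-processor utilization once the new load is added to the existing one."""
    return (processors * current_utilization + new_load) / processors


if __name__ == "__main__":
    # Hypothetical estimates: 50 transactions per second, 12 ms of CPU time each,
    # added to a 4-processor system that is currently 55 percent busy on average.
    load = new_application_load(50, 12)  # 0.6 processors
    print(f"{utilization_with_new_application(4, 0.55, load):.0%}")  # 70%
```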

Application Sizing Steps


Application sizing involves the following steps:
1. Establishing requirements and strategy.
2. Developing a model of existing resource usage and using it to forecast future
resource requirements.
3. Reporting results to the capacity planning staff.
Application sizing is described in Section 9, “Application Sizing.”

How the Functions Fit Together


Figure 1-1 shows the relationship between performance analysis and tuning, capacity
planning, and application sizing.

Figure 1-1. Performance Management Functions
[Figure: three step sequences. Performance analysis and tuning: (1) Establish Performance Requirements, (2) Gather Performance Information, (3) Analyze Performance Information, (4) Optimize Performance, (5) Report Results. Capacity planning: (1) Establish Requirements and Strategy, (2) Institute Capacity Reporting, (3) Forecast Future Capacity Needs, (4) Develop the Capacity Plan. Application sizing: (1) Establish Requirements and Strategy, (2) Forecast Future Resource Needs, (3) Report Results. The labels "Requirements" and "Data" appear between the sequences.]

Performance Management and the OM Model


Performance management is one of the operations management disciplines defined in
Tandem’s operations management (OM) model. The OM model categorizes functions
of the operations environment into six industry-standard disciplines. In addition to
performance management, the OM model includes the following disciplines:
• Production management: the day-to-day tasks performed by operations
personnel who operate and manage the production environment.
• Problem management: the tasks required to predict, prevent, and recover
from problems.
• Change management: the tasks required to manage the maintenance and
growth of your NonStop system.
• Configuration management: the tasks required to manage and
administer the configuration of system software and hardware, application
subsystems, communications subsystems, and application software.
• Security management: the security features necessary to implement a
secure, audited operations environment.
The following manuals describe the various aspects of the OM model:
• The Introduction to NonStop Operations Management describes the OM model and
provides an overview of each of the OM disciplines.
• The Availability Guide for Problem Management describes unplanned outages and
how to predict, prevent, prepare for, and recover from them.
• The Availability Guide for Change Management explains how to maximize system
and application availability while successfully implementing changes to your
NonStop system.
• The Security Management Guide describes how to manage the security
environment.

How the OM Disciplines Fit Together


Figure 1-2 shows all of the OM disciplines and how they work together to ensure a
stable and predictable operations environment.

Figure 1-2. The OM Disciplines
[Figure: Production Management, Problem Management, Change Management, Configuration Management, Performance Management, and Security Management, with "Stable and Predictable Environment" at the center.]
