Accelerate With ATS - TPC 4.1.1 Performance Management Enhancements and Demonstration

Advanced Technical Skills (ATS) North America

Accelerate with ATS: TPC 4.1.1


Performance Management Enhancements
and Demonstration

John Hollis

2010 IBM Corporation

Agenda
TPC 4.1.1 (available October 2009): review of the significant new performance metrics, alerts and thresholds
  New performance metrics
  New performance alerts and thresholds
  New filters on thresholds for selected times and percentages
  New alert triggers and suppression
Demo of using the "volume utilization" metric
Demo of the Storage Optimizer
Comparison of the Storage Optimizer's recommendations with findings using the "volume utilization" metric
Positioning the use of filtered reports, alerts, SAN Planner and Storage Optimizer


New Performance Metrics, Alerts and Thresholds


New performance metrics:

Port Send/Receive/Overall Bandwidth Percentage for SVC and switch ports
  Based on data rate.
Port Send/Receive/Overall Utilization Percentage for storage subsystem ports
  Utilization is percent of time busy; it requires service time from the storage device (i.e. response time, RT). IBM Education class SGA07 will cover port and volume utilization.
Port FCP/FICON/PPRC Send/Receive/Total I/O Rate
Port FCP/FICON/PPRC Send/Receive/Total Data Rate
Port FCP/FICON/PPRC Send/Receive/Total Response Time
Read/Write/Total HPF I/O Rate
HPF I/O Percentage
Volume Utilization
  The approximate utilization percentage of a volume over a time interval. Available on systems reporting I/O rate and RT.
New SVC 4.3.1 Performance Counters
  Peak Backend Read/Write Response Time
  Peak Backend Read/Write Queue Times
  Non-Preferred Node Usage Percentage
  Overall Host Attributed Response Time Percentage
    This provides an aid to diagnosing slow hosts and poorly performing fabrics.
    This is the time taken for a host to respond to a transfer-ready notification from the node (for read) or the time taken for a host to send the write data after the node has responded to a transfer-ready notification (for write).

New Performance Metrics, Alerts and Thresholds


New performance alerts and thresholds:

Callout: Exists prior to 4.1.1, but an I/O rate threshold boundary was added; covered in the next slide.

Total Backend I/O Rate for ESS/DS6K/DS8K arrays/SVC MDisk
Total Backend Data Rate for ESS/DS6K/DS8K arrays/SVC MDisk
Backend Read/Write Response Time for ESS/DS6K/DS8K arrays/SVC MDisk
Backend Overall Response Time for SVC MDisk
Backend Read/Write Queue Time for SVC MDisks
Backend Peak Write Response Time for SVC Nodes
Port to Local Node Send/Receive Response Time for SVC Nodes
  Sets thresholds on the average number of milliseconds it took to service each send operation to another node in the local SVC cluster. Violation of these threshold boundaries means that it is taking too long to send data between nodes (on the fabric), and suggests either congestion around these FC ports, or an internal SVC microcode problem.
Port to Local Node Send/Receive Queue Time for SVC Nodes
Non-Preferred Node Usage Percentage for SVC I/O Groups
Port Send/Receive Busy (time) Utilization Percentage for storage subsystem ports
Port Send/Receive Bandwidth Percentage for SVC and switch ports
  This threshold is enabled by default (SVC and switches), with default boundaries 85,75,-1,-1.
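The four default boundaries quoted above can be illustrated with a minimal sketch. It assumes the order is critical stress, warning stress, warning idle, critical idle, and that -1 means "boundary not set"; neither assumption is stated on the slide, and the function name classify is hypothetical rather than part of TPC.

```python
from typing import Optional


def classify(value: float,
             critical_stress: float,
             warning_stress: float,
             warning_idle: float = -1,
             critical_idle: float = -1) -> Optional[str]:
    """Return 'critical', 'warning', or None for one sampled value."""
    # Assumption: a negative boundary means "not set" and is skipped.
    if critical_stress >= 0 and value >= critical_stress:
        return "critical"
    if warning_stress >= 0 and value >= warning_stress:
        return "warning"
    if critical_idle >= 0 and value <= critical_idle:
        return "critical"
    if warning_idle >= 0 and value <= warning_idle:
        return "warning"
    return None


# Port bandwidth percentage samples checked against 85,75,-1,-1
for sample in (90.0, 80.0, 50.0):
    print(sample, classify(sample, 85, 75, -1, -1))
```

With the idle boundaries unset, only high values can trigger a condition, which matches the intent of a bandwidth-saturation threshold.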


New Performance Metrics, Alerts and Thresholds


New filters on Thresholds for selected times and percentages:

If the I/O Rate is less than a user-specified value (ops/sec), no alert will be
generated even if the response time exceeds the threshold boundary.
Write Cache Delay Percentage (pre-populated 10%, 3,, I/O 10)
Overall Backend Response Time Threshold (SVC) (pre-populated blank, blank,, I/O 5)
Non-preferred Node Usage Percentage (SVC)
Backend Write Queue Time (SVC) (pre-populated 5, 3,, I/O 5)
Backend Read Queue Time (SVC) (pre-populated 5, 3,, I/O 5)
Backend Write Response Time (pre-populated 120, 80,, I/O 5)
Backend Read Response Time (pre-populated 35, 25,, I/O 5)
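The I/O rate filter described at the top of this slide can be sketched as follows. This is an illustration of the rule only, not TPC code: the function name should_alert and its parameters are hypothetical, and the values 120 ms and 5 ops/sec are used purely as sample inputs.

```python
def should_alert(response_time_ms: float,
                 io_rate_ops_per_sec: float,
                 boundary_ms: float,
                 min_io_rate_ops_per_sec: float) -> bool:
    """Apply a response-time boundary, filtered by a minimum I/O rate."""
    # Filter: below the user-specified I/O rate, no alert is generated,
    # even if the response time exceeds the boundary.
    if io_rate_ops_per_sec < min_io_rate_ops_per_sec:
        return False
    return response_time_ms > boundary_ms


# Slow but nearly idle: filtered out
print(should_alert(150.0, 2.0, boundary_ms=120.0, min_io_rate_ops_per_sec=5.0))
# Slow and busy: alert
print(should_alert(150.0, 50.0, boundary_ms=120.0, min_io_rate_ops_per_sec=5.0))
```

The filter keeps a handful of slow I/Os on an otherwise idle resource from producing alerts that nobody will act on.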


New Performance Metrics, Alerts and Thresholds


Alert and threshold definition mechanism additions:

Trigger alerts based on critical/warning condition levels


Suppress alerts when there are insufficient repetitions
Suppress alerts when there are repeated conditions
All events shown in Constraint Violations reports
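One way to picture the two suppression options is the sketch below. It assumes suppression is counted in consecutive sample intervals; TPC's actual options may be defined differently (for example by time window), and the name alert_indexes is hypothetical.

```python
from typing import List


def alert_indexes(violations: List[bool], min_repeats: int = 3) -> List[int]:
    """Return the sample indexes at which an alert would be raised.

    violations holds one flag per sample interval (True = boundary violated).
    """
    raised = []
    run = 0  # length of the current run of consecutive violations
    for i, violated in enumerate(violations):
        run = run + 1 if violated else 0
        # Raise once when the run first reaches min_repeats. Shorter runs
        # are suppressed (insufficient repetitions); later samples in the
        # same run are suppressed as repeated conditions.
        if run == min_repeats:
            raised.append(i)
    return raised


samples = [True, True, False, True, True, True, True, False, True]
print(alert_indexes(samples))  # every violation would still be reported
```

In this sketch every True sample would still appear in a violations report; only the alert notification is suppressed.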


Make use of new thresholds and alert suppression

Tailor thresholds to attention-getting levels.
  If the alerts are being ignored, they are of no value.
  Consider adjusting alert levels so that the administrator will take action.

Tailor the alert suppression options to an attention-getting level.
  All alerts, including suppressed alerts, are shown in the Constraint Violations report.


Make use of volume utilization

This screen is shown in the demo. It is here for reference.

Define the data columns you want to report on.
Performance management experts may add columns for data used in diagnosing performance problems (for example, I/O rates and response times).


Make use of volume utilization

In performance management, there is a concept called Population. It is Total I/O Rate multiplied by Overall Response Time, divided by 1000.
The calculation for Volume Utilization is based on Population.
Techniques for analyzing Population and Volume Utilization are covered in IBM Education course SGA07.
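The Population formula can be worked through in a few lines. The sketch assumes I/O rate in ops/sec and response time in milliseconds, which is what the division by 1000 implies; the function name population is hypothetical. It does not reproduce how TPC derives Volume Utilization from Population, since the slide does not give that calculation.

```python
def population(total_io_rate_ops_per_sec: float,
               overall_response_time_ms: float) -> float:
    """Population = Total I/O Rate x Overall Response Time / 1000."""
    # Dividing by 1000 converts milliseconds to seconds, so the result is
    # the average number of I/Os in flight (Little's law: N = rate x time).
    return total_io_rate_ops_per_sec * overall_response_time_ms / 1000.0


# 500 ops/sec at 4 ms each: on average 2 I/Os are outstanding
print(population(500, 4))  # 2.0
```

A volume with a high I/O rate and a low response time can have the same Population as a slow, lightly used volume, which is why the report also shows I/O rate and response time columns.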

This screen is shown in the demo. It is here for reference.

As you gain experience in your environment, add the names of volumes you want excluded.
For example: volumes that you have seen always violating the thresholds, that will not be fixed, and that you no longer want to see in this report.
Also: the volume name filter LIKE can be used to define reports on critical volumes, applications, or servers (if a consistent volume naming policy has been used).
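The effect of LIKE and NOT LIKE name filters can be sketched outside the product as follows. The sketch assumes "*" is the wildcard, as in the demo filter NOT LIKE *162*; the volume names and the function filter_volumes are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def filter_volumes(names: Iterable[str],
                   like: Optional[str] = None,
                   not_like: Optional[str] = None) -> List[str]:
    """Keep names matching `like` and drop names matching `not_like`."""
    kept = []
    for name in names:
        if like is not None and not fnmatchcase(name, like):
            continue  # LIKE filter: name does not match the pattern
        if not_like is not None and fnmatchcase(name, not_like):
            continue  # NOT LIKE filter: name is explicitly excluded
        kept.append(name)
    return kept


volumes = ["db_0162", "db_0200", "web_0162_a", "web_0300"]
print(filter_volumes(volumes, not_like="*162*"))  # exclude known offenders
print(filter_volumes(volumes, like="db_*"))       # report on one application
```

The second call only works because the hypothetical names carry the application as a prefix, which is the point about volume naming policies.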

Press the save icon to save the report in My Reports.


Make use of volume utilization

This screen is shown in the demo. It is here for reference.

This volume had many occurrences of high utilization, so it is a volume of performance interest.

This volume had only one occurrence of high utilization, so it is not yet a volume of performance interest.

Use the Drill up option to go to reports that may provide insight into the root cause.
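The "many occurrences versus one occurrence" judgment can be expressed as a simple count over exported report rows. This is a sketch under assumptions: the rows are (volume name, utilization percent) pairs, the cutoff of two occurrences is arbitrary, and volumes_of_interest is a hypothetical helper, not a TPC function.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def volumes_of_interest(samples: Iterable[Tuple[str, float]],
                        utilization_threshold: float,
                        min_occurrences: int = 2) -> List[str]:
    """Return volumes with at least min_occurrences high-utilization samples."""
    counts = Counter(name for name, util in samples
                     if util >= utilization_threshold)
    return sorted(name for name, n in counts.items() if n >= min_occurrences)


rows = [("volA", 85.0), ("volA", 90.0), ("volB", 88.0),
        ("volA", 40.0), ("volC", 10.0)]
# volA exceeds 80% twice; volB only once, so it is not yet of interest
print(volumes_of_interest(rows, utilization_threshold=80.0))
```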

Make use of the Storage Optimizer


This screen is shown in the demo. It is here for reference.

P0 is in need of attention

Thresholds define not only the colors used in the heat map, but also the parameters within which the optimizer must work.
For example, if a threshold of 20% were chosen, and moving a volume off of P0 resulted in the target pool's utilization going over 20%, that possible action would be eliminated.
It is possible to set thresholds so low that no action is recommended.
Working with the Storage Optimizer is an iterative process.
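The 20% example can be sketched as a filter over candidate moves. This is not the Storage Optimizer's algorithm: it assumes, as a simplification, that a moved volume's utilization simply adds to the target pool's utilization, and the pool names P1 and P2, the volume name, and the function allowed_moves are hypothetical.

```python
from typing import Dict, List, Tuple


def allowed_moves(candidate_moves: List[Tuple[str, float, str]],
                  pool_utilization: Dict[str, float],
                  threshold_pct: float) -> List[Tuple[str, str, float]]:
    """Drop moves that would push the target pool over the threshold.

    candidate_moves holds (volume, volume_utilization_pct, target_pool).
    """
    allowed = []
    for volume, volume_util, target_pool in candidate_moves:
        # Simplifying assumption: utilization is additive.
        projected = pool_utilization[target_pool] + volume_util
        if projected <= threshold_pct:
            allowed.append((volume, target_pool, projected))
    return allowed


pools = {"P1": 12.0, "P2": 18.0}
moves = [("vol1", 5.0, "P1"), ("vol1", 5.0, "P2")]
print(allowed_moves(moves, pools, threshold_pct=20.0))  # only the P1 move survives
print(allowed_moves(moves, pools, threshold_pct=10.0))  # threshold too low: no moves
```

The second call shows the situation described above, where a threshold set too low leaves no recommended action.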

Make use of the Storage Optimizer

[Screens: Before and After]

This screen is shown in the demo. It is here for reference.

These volumes are also identified in a filtered Volume Utilization report when it is run for the same dates as this Storage Optimizer analysis.

The recommendations will spread some of the workload that was previously only on P0.
The results are circled below.
This is a much better balance.


Positioning the use of filtered reports, alerts, SAN Planner and Storage Optimizer

Filtered reports
  Uses historic performance information
  Good for trending, spikes, root cause analysis
  Use them daily/weekly/monthly/as-needed

Alerts
  Thresholds compared to performance data as it is gathered
  Note: For storage systems, these are generally at a controller or I/O group level (i.e. no alert/threshold for volume utilization)
  Good for: immediate notification of situations
  Use them regularly for constraint violations report analysis of affected volumes and hosts

SAN Planner
  Uses historic performance information for planning new volumes
  Use it as needed for creating new volumes

Storage Optimizer
  Uses historic performance information for optimizing performance of existing volumes
  Use it as needed for performance problem resolution and weekly/monthly for performance problem avoidance

13


Questions


Trademarks and notes


IBM and the IBM logo are registered trademarks, and other company, product or service
names may be trademarks or service marks of International Business Machines
Corporation in the United States, other countries, or both. For a list of IBM trademarks,
please see: http://www.ibm.com/legal/copytrade.shtml
Intel and related trademarks and logos, IT Infrastructure Library and ITIL, Java and all
Java-based trademarks and logos, Linux, Microsoft and Windows, and UNIX are
trademarks or service marks of others as described under Special attributions at:
http://www.ibm.com/legal/copytrade.shtml
Other company, product and service names may be trademarks or service marks of
others.
References in this publication to IBM products or services do not imply that IBM intends
to make them available in all countries in which IBM operates.
IBM's provision of products or services does not constitute the provision of legal
advice, and IBM does not represent or warrant that its services or products will
guarantee or assist your compliance with any laws or regulations. You are solely
responsible for identifying, interpreting and ensuring your compliance with all
applicable laws, regulations and rules relevant to your business needs and should seek
competent legal advice as needed.


Demonstration contents
Options on creating a write cache delay percentage threshold, defaults, alert suppression
Alerts > Storage Subsystem, sort by date
Constraint violations report: change begin date to 2009, DS6KA Disk Utiliz% thresholds, drill up, affected volumes report
DS6KA perf monitor logfile thresholds and defaults
Filtered Exercise - Volume Utilization report:
  utilization, I/O rate, response time column comparison
  Change start date to Nov 28, 2009, 1:11am, sort results on time
  NOT LIKE *162*
Exercise Storage Optimizer
Options used in creating a volume with the SAN planner
Slide 13 Positioning.
