Jetstress Field Guide v2.0.0.8


Jetstress 2013

Jetstress Field Guide


Monday, 8 July 2013
Version 2.0.0.8 [Issued]

Prepared by
neil.johnson@microsoft.com

Template Version October 2011

000Exchange Community0

MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.


Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights
under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval
system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or
otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Microsoft, our provision of this document does not give you any license to these patents,
trademarks, copyrights, or other intellectual property.
The descriptions of other companies' products in this document, if any, are provided only as a convenience
to you. Any such references should not be considered an endorsement or support by Microsoft. Microsoft
cannot guarantee their accuracy, and the products may change over time. Also, the descriptions are
intended as brief highlights to aid understanding, rather than as thorough coverage. For authoritative
descriptions of these products, please consult their respective manufacturers.
© 2011 Microsoft Corporation. All rights reserved. Any use or distribution of these materials without
express authorization of Microsoft Corp. is strictly prohibited.
Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the
United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective
owners.
Page ii
Jetstress 2013, Jetstress Field Guide, Version 2.0.0.8
Prepared by neil.johnson@microsoft.com
"323222109" last modified on 8 Jul. 13, Rev 2


Revision and Signoff Sheet

Change Record

Date        Author        Version  Change reference
22/03/2013  Neil Johnson  2.0.0.1  First draft for Jetstress 2013
03/04/2013  Neil Johnson  2.0.0.2  Updates after feedback from Robert Gillies
                                   and Ramone Infante.
19/06/2013  Neil Johnson  2.0.0.5  Final issue after internal review
20/06/2013  Neil Johnson  2.0.0.6  Updated Error Table description with JET
                                   codes. Added troubleshooting information
                                   for ESE 606.
20/06/2013  Neil Johnson  2.0.0.7  Fixed formatting issues


Document Contributors

Name              Position                                 Section
Neil Johnson      Senior Consultant, UK MCS                Author
Alexandre Costa   SENIOR SDET, Exchange Test               Jetstress internals
Ross Smith IV     PRINCIPAL PROGRAM MANAGER, Exchange CXP  Configuring Jetstress
Ramon b. Infante  DIR, WW COMMUNITIES, UC                  Various
Matt Gossage      PRINCIPAL PROGRAM MANAGER LEAD           Various
Umair Ahmad       SDET II, Exchange Test                   Various


Reviewers

Name              Version  Position                                             Date
Neil Johnson      2.0.0.1  Senior Consultant II, MCS UK
Alexandre Costa   2.0.0.1  SENIOR SDET, Exchange Test
Ross Smith IV     2.0.0.1  PRINCIPAL PROGRAM MANAGER, Office 365 - CAT SVCS
Ramon b. Infante  2.0.0.1  DIR, WW COMMUNITIES, UC
Matt Gossage      2.0.0.1  PRINCIPAL PROGRAM MANAGER LEAD, Exchange PM US
Umair Ahmad       2.0.0.1  SDET II, Exchange Test US
Nathan Muggli     2.0.0.1  SENIOR PROGRAM MANAGER, Exchange PM - US
Scott Schnoll     2.0.0.1  PRINCIPAL TECHNICAL WRITER, Content Publishing
Boris Lokhvitsky  2.0.0.1  DELIVERY ARCHITECT, US-US-MCS West SL 2
Jeff Mealiffe     2.0.0.1  SENIOR PROGRAM MANAGER LEAD, Office 365 - CAT SVCS
Robert Gillies    2.0.0.1  REGIONAL ARCHITECT, US-MCS DOD SL 2
David Mosier      2.0.0.1  PRINCIPAL CONSULTANT, US-MCS Civilian SL 2

Table 1: Document reviewers


Table of Contents
1 Purpose
2 What is New in Jetstress 2013
3 Introduction to Jetstress
4 Jetstress Internals
4.1 Main Jetstress Components
4.1.1 Auto Tuning Component
4.1.2 Thread Dispatcher
4.1.3 Background Log Checksummer
4.1.4 Offline Log and Database Checksummer
4.1.5 Reporting and Verification
5 Planning for Jetstress
5.1 Jetstress testing flow chart
5.1.1 High Level Test Overview
5.1.2 Process with Automatic thread tuning
5.2 When should I run Jetstress in my project?
5.3 Where should I run Jetstress in my infrastructure?
5.4 Failure Mode Testing
5.4.1 Raid Array Testing
5.4.2 Resilient Component Testing
5.4.3 Example of a failed degraded mode test
5.5 Jetstress testing inside virtual machines
5.5.1 What is different about Jetstress inside a virtual machine?
5.6 How much time should I allocate for Jetstress testing?
5.6.1 Initialisation
5.6.2 Testing
5.6.3 Clean-up
5.7 Preparing for the Jetstress test
5.8 What happens if the test fails?
6 Installing Jetstress
6.1 Documentation
6.2 Jetstress Version and Download
6.3 Prerequisites
6.4 Getting ESE Files necessary for Jetstress
6.4.1 File locations from an installed Exchange Server
6.4.2 File locations from the installation media
6.5 Installation
6.5.1 Application Installation
6.5.2 ESE File Installation
7 Configuring Jetstress
7.1 Jetstress Test Types
7.1.1 Test a disk subsystem throughput
7.1.2 Test an Exchange mailbox profile
7.2 Initial configuration
8 Jetstress Output Files
9 Reading Jetstress report data
9.1 Target design values
9.2 Reading the Jetstress Test Result Report
9.2.1 Test Summary
9.2.2 Database Sizing and Throughput
9.2.3 Jetstress System Parameters
9.2.4 Database Configuration
9.2.5 Transactional I/O Performance
9.2.6 Background Database Maintenance I/O Performance
9.2.7 Log Replication I/O Performance
9.2.8 Total I/O Performance
9.2.9 Host System Performance
9.2.10 Error Counts Per Volume
9.2.11 Test Log
9.3 Interpreting Jetstress test results
9.4 Test evaluation
10 Appendix A Configuring thread count
11 Appendix B Configuring sluggishsessions
12 Appendix C - Running a Jetstress Test with JetstressCmd.exe
13 Appendix E Running Jetstress on a production server
14 Common Issues
14.1 Troubleshooting Jetstress
14.1.1 Jetstress cannot attach to or create a database
14.1.2 Error loading Performance Monitor counters
14.1.3 Unable to tune for the parameters
14.1.4 Unable to mount databases due to invalid mount point configuration
14.1.5 Jetstress testing failed. Error: System.ApplicationException: Faulty
performance counter paths: \MSExchange Database(*)\*

1 Purpose
This document explains the process and requirements for validating an
Exchange 2013 storage solution prior to releasing an Exchange deployment
into production.
It explains how Jetstress works, how to plan for and perform a Jetstress test,
and how to analyse the results of the test.
This document is not intended to provide Exchange storage design guidance. For
guidance on Exchange 2013 server design and planning, refer to Planning and
Deployment.

2 What is New in Jetstress 2013

Jetstress 2013 is an evolution of Jetstress 2010. It includes improvements and
bug fixes, and it allows validation of Exchange Server 2013 solutions.
A quick outline of new features:

• The Event log is captured and logged to the test log. These events show
  up in the Jetstress UI as the test progresses.
• Any errors are logged against the volume on which they occurred. The final
  report shows the error counts per volume in a new sub-section.
• A single IO error anywhere will fail the test. CRC errors might be
  remapped; a re-run of Jetstress should verify that they were indeed
  remapped.
• Jetstress detects -1018, -1019, -1021, -1022, -1119, hung IO, DbtimeTooNew
  and DbtimeTooOld errors.
• Threads that generate IO are now controlled at a global level. Instead
  of specifying Threads/DB, you now specify a global thread count, which
  works against all databases. This improves the granularity of thread
  tuning and enables automatic tuning to work more effectively.
• Jetstress configuration files (JetstressConfig.XML) generated by an older
  version of Jetstress are no longer accepted.

Important Changes

Do not use Jetstress 2013 for older versions of Exchange Server. Jetstress
2013 has only been tested with Exchange Server 2013.


3 Introduction to Jetstress
Jetstress is a tool for simulating Exchange database I/O load without requiring
Exchange to be installed. It is primarily used to validate physical deployments
against the theoretical design targets that were derived during the design phase.
To simulate the complex Exchange database I/O pattern effectively, Jetstress
makes use of the same ESE.DLL that Exchange uses in production. It is therefore
vital that Jetstress uses the same version of the Extensible Storage Engine
(ESE) files that your Exchange infrastructure will be built with in production.
Ideally, Jetstress testing will be part of the overall project plan. The best time to
schedule Jetstress testing is just before Exchange will be physically installed onto
the servers.
Jetstress testing provides the following benefits prior to deploying live users:

• Validates that the physical deployment is capable of meeting specific
  performance requirements
• Validates that the storage design is capable of meeting specific
  performance requirements
• Finds weak components prior to deploying in production
• Proves storage and I/O stability

The most important aspect of Jetstress testing is that it allows you to see how
the physically deployed storage and server infrastructure will behave once a real
Exchange workload is applied. This often works out differently from
expectations, especially in scenarios where shared storage infrastructure is
deployed or where the storage design is complex.
Often the Jetstress test will not provide the results that were expected.
Sometimes, subtle configuration changes to the storage infrastructure (for
example, driver or firmware updates) are enough to get the test to pass.
It is important to remember that when the Jetstress test reports a failure,
Jetstress has not failed; it is just reporting on the performance of your
storage solution. This may seem an obvious point; however, a large number of
customer escalation cases for Jetstress are not actually Jetstress cases and are
instead storage performance cases. If you need to remediate a test failure,
remember that Jetstress is a dumb tool that is used worldwide by thousands of
Exchange professionals and in Office 365. It is extremely unlikely that Jetstress
is broken; it is far more likely that you have a design issue or misconfiguration
with your storage deployment.
Fundamentally, a successful Jetstress test validates that all of the hardware and
software components within the I/O stack from the operating system down to the
physical disk drive are working to a sufficient level to meet the predicted
performance required by Exchange to operate successfully.

Important:
The validity of your Jetstress testing is only as good as the user profile
analysis and workload prediction that was completed during the design
phase of the project.

4 Jetstress Internals
4.1 Main Jetstress Components
Like Exchange, Jetstress is an ESE-based application. It runs in user memory
space and makes API calls to ESE, which in turn calls the Windows file system
and I/O Manager to gain access to the data stored on disk. During each of these
tasks, Windows records performance information about the specific task and the
operating system as a whole. Once the test is completed, Jetstress analyses the
performance data to determine whether the system meets the targets specified at
the beginning of the test.

Figure 1 - Main Jetstress Components

4.1.1 Auto Tuning Component

This component is responsible for auto-tuning within Jetstress. Each thread
performs a set number of ESE calls, which generates a set amount of disk I/O,
so raising or lowering the thread count changes the storage workload.
The auto-tuning component attempts to determine the maximum thread count
that the storage solution can support whilst remaining within the published disk
latency guidelines for Exchange Server. The Jetstress test parameters for disk
latency are shown in section 9.3 Interpreting Jetstress test results.


New:
Auto-tuning has been improved in Jetstress 2013 by moving to a global
thread controller. Auto-tuning may still fail; however, it should be
successful in many more scenarios than in Jetstress 2010.


4.1.2 Thread Dispatcher

The thread dispatcher is responsible for managing workload within Jetstress. The
main areas of interest within the thread dispatcher are as follows:

• ThreadCount: the number of transactional threads. In Exchange 2013 this
  is a global parameter; prior to Exchange 2010 it was the number of
  threads per storage group, and in Exchange 2010 it was the number of
  threads per database.
• ThreadTypes: each thread chooses one type of work to perform against the
  database, and the same thread can perform different types of work during
  a given run. There are four types: insert, read, update and delete (all
  against records in a table). The default operation mix for an Exchange
  2010 simulation is 40%, 35%, 5% and 20%, respectively.
• SluggishSessions: the default is 1 for Exchange 2010. This is used to
  fine-tune the amount of work performed by a given thread. Internally, a
  thread sleeps for (SluggishSessions * TaskRunTime) before picking up the
  next task to run. For example, if SluggishSessions is 3 and an insert
  thread took 100ms in the last cycle, it will sleep for 300ms before
  moving on to the next cycle. A value of 0 means the thread runs at full
  throttle.
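
The operation mix and the SluggishSessions pause can be illustrated with a
short Python sketch. This is not Jetstress code: the function names and the
dictionary are hypothetical, and only the percentages and the
(SluggishSessions * TaskRunTime) rule are taken from the description above.

```python
import random

# Default operation mix quoted in this guide for an Exchange 2010 simulation.
OPERATION_MIX = {"insert": 40, "read": 35, "update": 5, "delete": 20}


def pick_operation(rng: random.Random) -> str:
    """Choose the next task type according to the weighted operation mix."""
    names = list(OPERATION_MIX)
    weights = [OPERATION_MIX[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def sleep_time_ms(sluggish_sessions: int, task_run_time_ms: float) -> float:
    """Pause before the next task: SluggishSessions * TaskRunTime."""
    if sluggish_sessions < 0:
        raise ValueError("sluggish_sessions must not be negative")
    return sluggish_sessions * task_run_time_ms


if __name__ == "__main__":
    rng = random.Random()
    print(pick_operation(rng))    # one of: insert, read, update, delete
    print(sleep_time_ms(3, 100))  # 300 ms, matching the example in the text
    print(sleep_time_ms(0, 100))  # 0 ms, i.e. full throttle
```

Because the pause scales with the time the previous task took, a higher
SluggishSessions value reduces the load each thread applies without changing
the thread count.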

4.1.3 Background Log Checksummer


This component simulates the I/O overhead of additional database copies. This
copy operation has an I/O cost which increases with each additional copy.

4.1.4 Offline Log and Database Checksummer


This process checksums all database and log files at the end of a Jetstress run to
ensure that all data is intact. It also provides performance data for CRC
checksum speed should VSS copies require a checksum prior to backup.
This process is extremely hard on storage hardware, often applying an I/O load
many times greater than the workload that the actual Jetstress test applies.
Important
If you are running Jetstress on multiple servers in parallel on shared
storage infrastructure, it is vital that the CRC check is not running while
other servers are performing their Jetstress tests. Selecting the multi-host
option during the test configuration causes the testing process to stop and
wait for confirmation before beginning the CRC check, to avoid servers
interfering with each other's results.

While working out the correct thread count to use, it is not necessary to let
the checksum part of the test complete. To stop the checksum you can either
click on Cancel, which will stop the checksum part of the test but still
generate the performance test report, or edit the Jetstress configuration file
and change the VerifyChecksum value to false (the default is true).

<VerifyChecksum>false</VerifyChecksum>
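
If the same change has to be made on several servers, it could be scripted.
The following Python sketch is only an illustration: the full layout of the
Jetstress configuration file is not shown in this guide, so the sample XML and
the function name are assumptions, and the code simply sets every element
named VerifyChecksum to false wherever it appears. Back up the real file
before trying anything similar.

```python
import xml.etree.ElementTree as ET


def disable_checksum(xml_text: str) -> str:
    """Return the configuration text with every VerifyChecksum element set to false."""
    root = ET.fromstring(xml_text)
    changed = 0
    for element in root.iter():
        # Compare on the local name so a namespaced element would still match.
        if element.tag.rsplit("}", 1)[-1] == "VerifyChecksum":
            element.text = "false"
            changed += 1
    if changed == 0:
        raise ValueError("no VerifyChecksum element found")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical, heavily simplified stand-in for a real configuration file.
    sample = "<configuration><VerifyChecksum>true</VerifyChecksum></configuration>"
    print(disable_checksum(sample))
```

Remember to set the value back to true for the final validation run so that
the database and log checksum is performed.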

4.1.5 Reporting and Verification


At the end of a Jetstress test, the reporting and verification process compares
the observed performance results against a set of acceptable values. These
results are then written to an HTML file. During the test, binary performance
data is written out to a BLG file.


5 Planning for Jetstress


Jetstress testing can be difficult to account for in your planning process. In
particular, how much time should you allocate for testing, and in which parts of
the project should Jetstress testing occur? This section will try to answer
these questions and explain the process in more detail.

5.1 Jetstress testing flow chart


The aim of the following process is to find the maximum workload while still
passing the test. Fundamentally, the aim is to increase workload until the test
fails or meets the design goals identified in the mailbox role calculator.
Important:
The last value before failure is the highest workload that the system can
support. If this value is below the design target, then use
sluggishsessions to fine-tune the test. If the storage is still unable to meet
the requirements then we have determined that it is unsuitable for the
workload intended.

The following process assumes that you are using the disk subsystem
throughput test and auto-tuning as recommended.
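
The idea of raising the workload until the test fails or the design goal is
met can be sketched in Python. This is a simplified illustration of the manual
process described above, not the algorithm used by the Jetstress auto-tuning
component. The run_test callable, the thread limits and the fake storage model
in the demo are all hypothetical.

```python
from typing import Callable, Optional, Tuple

# A test run takes a thread count and returns (passed, achieved_iops).
TestRun = Callable[[int], Tuple[bool, float]]


def find_max_passing_workload(
    run_test: TestRun,
    target_iops: float,
    start_threads: int = 1,
    max_threads: int = 64,
) -> Optional[Tuple[int, float]]:
    """Raise the thread count until a run fails or the target is met.

    Returns (thread_count, achieved_iops) for the last passing run, or
    None if even the first run failed.
    """
    last_pass = None
    for threads in range(start_threads, max_threads + 1):
        passed, iops = run_test(threads)
        if not passed:
            break  # the previous run was the highest supported workload
        last_pass = (threads, iops)
        if iops >= target_iops:
            break  # design goal met, no need to push further
    return last_pass


if __name__ == "__main__":
    # Hypothetical storage: 100 IOPS per thread, test fails above 1000 IOPS.
    def fake_run(threads: int) -> Tuple[bool, float]:
        iops = threads * 100.0
        return iops <= 1000.0, iops

    print(find_max_passing_workload(fake_run, target_iops=800.0))   # target met
    print(find_max_passing_workload(fake_run, target_iops=1500.0))  # storage falls short
```

In the second call the last passing run stays below the target, which
corresponds to the case described above where the storage cannot meet the
design requirement.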

5.1.1 High Level Test Overview


Figure 2 - High Level Test Overview shows a high-level flowchart for Jetstress
testing. The process begins with a completed Mailbox Role Calculator and ends
when the test has passed successfully while meeting the targets identified in the
calculator.

Figure 2 - High Level Test Overview


5.1.2 Process with Automatic thread tuning

Figure 3 - Jetstress test flowchart for automatic thread tuning


5.2 When should I run Jetstress in my project?


Jetstress testing can often take place at multiple phases within the project plan.
Depending on the design approach taken, Jetstress testing may be performed
during both the planning (design) and build phases of a project.

Figure 4 - SDM phase overview

So, why would you run Jetstress during the planning/design phase of a project?
The simple answer is that with today's powerful hardware, Exchange design
teams must use standard chunks of hardware to create their design. Rather
than attempt to guess what the I/O limits of the hardware are, it is preferable
to perform some Jetstress tests on the hardware to determine the maximum
storage IO capacity of the system. This allows the design team to specify the
bill of materials much more precisely, thereby saving money and reducing risk.
However, if you have already proven the solution in the lab, why test again at
build time? This is a common question. Many projects only schedule sufficient
time for testing a single server and its storage solution in the belief that
they only need to validate the design. The problem with this approach is that it
assumes a zero error rate in the build out. What happens if someone forgets a
part of the build on one server, or deploys a different device driver from the
one used in the lab? What happens if a faulty piece of hardware has been
deployed? Jetstress testing at build time is a great way to validate that the
physically deployed hardware and software are capable of providing the required
I/O performance for Exchange. Jetstress testing at build time is also a way to
identify failing components such as disk drives; it is much less stressful to
identify a weak batch of disks during a Jetstress test than on a Monday morning
after a large user migration!
If the project plan will allow it, build in sufficient time to test each server and
storage chassis that will be deployed before migrating user mailboxes to it.
Remember that Jetstress can be fully automated, so with a little bit of planning it
can be left to run overnight and may not actually add any significant overhead to
the project.


5.3 Where should I run Jetstress in my infrastructure?


To ensure that the Jetstress test is representative of production, it is
recommended to run Jetstress on every set of disks that will hold mailbox
database copies (active, passive or lagged). The test is designed to validate
the storage system, so where multiple Exchange servers use the same storage
system, you must test them in parallel to simulate the production workload. If
the storage system also supports additional workload that is not yet active at
the time of testing, you should use IOMeter to simulate it.
Note:
It is important to remember not to run Jetstress on production servers that
have Exchange Server already installed. This may lead to problems with
Exchange performance counters. It is recommended to run Jetstress
BEFORE installing Exchange Server into production.
In the event that you have already installed and configured Jetstress on
your production Exchange Servers, refer to the following article for more
information on resolving Exchange Performance Counter problems:
http://blogs.technet.com/b/mikelag/archive/2010/09/10/how-to-unloadreload-performance-counters-on-exchange-2010.aspx

Each database copy must be designed to provide sufficient I/O to support the
copy if it were to become active. Therefore, by testing each database LUN in
parallel, we are validating that the storage solution is able to meet the design
requirements. We are also validating that any pieces of shared infrastructure are
able to meet the demand of the entire solution, rather than simply testing each
server individually.
Note:
Where there is no shared infrastructure and all storage is directly
attached, servers may be tested individually. However, to be a valid test,
the test must be configured to include any active, replica or lagged LUNs
that could come online at the same time.


5.4 Failure Mode Testing


5.4.1 Raid Array Testing
Since the improvements in Exchange I/O from Exchange 2007, it is now viable to
deploy Exchange Server databases on a multitude of storage types, from JBOD to
RAID 6. RAID arrays offer a great compromise between data redundancy and
performance. However, they can also suffer from a significant performance
reduction when operating in degraded mode (spindle failure). For this reason,
it is recommended to design RAID arrays that will host Exchange Server
databases so that they provide sufficient IOPS performance for the Exchange
workload even when running in degraded mode.
Important:
While testing for failure scenarios it is not necessary to run your Jetstress
test at peak working load. Instead, it is recommended to modify the
thread count until the Jetstress test achieves just above the Total
Database Required IOPS / Server value reported in the Mailbox Role
Calculator.
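
As a worked illustration of "just above" the required value, the following
Python sketch computes how far an achieved IOPS figure sits above the target.
The function names and the 10% ceiling are assumptions chosen for the example;
this guide does not define a numeric limit. The figures 1256 and 1200 are the
ones used in the degraded mode example later in this guide.

```python
def overshoot_percent(achieved_iops: float, required_iops: float) -> float:
    """Percentage by which the achieved IOPS exceeds the required IOPS."""
    if required_iops <= 0:
        raise ValueError("required_iops must be positive")
    return (achieved_iops - required_iops) / required_iops * 100.0


def is_just_above_target(
    achieved_iops: float,
    required_iops: float,
    max_overshoot_percent: float = 10.0,  # hypothetical ceiling, not from this guide
) -> bool:
    """True when the run meets the target without greatly exceeding it."""
    overshoot = overshoot_percent(achieved_iops, required_iops)
    return 0.0 <= overshoot <= max_overshoot_percent


if __name__ == "__main__":
    # 1256 IOPS achieved against a predicted 1200 IOPS is roughly 4.7% above target.
    print(round(overshoot_percent(1256, 1200), 1))
    print(is_just_above_target(1256, 1200))
```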

From a service availability perspective, it is important to validate that your
storage can provide sufficient performance in all common failure conditions.
Due to this, it is recommended to run the Jetstress test while the array is
operating in the following conditions.

Array Condition  Test importance                  Description
Optimal          Recommended for all deployments  All disk spindles operating
                                                  normally
Degraded         Recommended for all deployments  Single spindle removed from
                                                  the array
Rebuilding       Recommended if array has hot     Failed spindle replaced and
                 spare (see footnote 1)           array controller is
                                                  rebuilding the array

Table 2: Raid array testing conditions

Ideally, the Jetstress test should still pass during a degraded mode test. If the
test fails, refer to this post to analyse the failure severity.

1 If your array does not contain a hot spare, you can choose to perform array
rebuilds out of hours so that the end user impact is minimized; however, your
data loss exposure is increased. If you plan on performing array rebuilds during
working hours, it is recommended to perform a Jetstress test run while the array
is rebuilding, even if you do not have a hot spare configured.

5.4.2 Resilient Component Testing


Any aspect of the storage solution that has been designed to be resilient should
also be tested in a failed state to determine the impact. For example, if there
are multiple paths between the host and the storage controller, the Jetstress
test should still pass if one is disabled. There are too many possible types of
resilient components to list them here; however, the general spirit of this test
is to evaluate potential sources of failure within your storage solution and
ensure that Jetstress still passes if they enter a degraded state.


5.4.3 Example of a failed degraded mode test


This example shows an unacceptable test result. I have chosen to show an
unacceptable result since a good test is just a flat line, and that is not
particularly interesting. In this instance, the storage was based on RAID 6
technology. The Jetstress test was configured to run at 1256 IOPS (the Mailbox
Role Calculator predicted 1200 IOPS). Approximately halfway through the test, a
hard disk drive was (carefully) removed from the array and the spare began
rebuilding.
The test data shows that the average read I/O latency (Exchange Database ==>
Instances\I/O Database Reads (Attached) Average Latency) increased from
11ms to 400ms+, with latency spikes of 3000-4000ms on the affected LUN. This
situation took 18 hours to return to normal after the failure. This represented
a clear failure of the degraded mode test.
Important:
Common failure modes such as a disk rebuild should not materially affect
the test results.

Figure 5: Degraded mode failure

Note:
Please refer to the following section about understanding storage
configuration for Exchange Server 2013 for more information on
recommended raid configurations for Exchange Server.
http://technet.microsoft.com/en-us/library/ee832792.aspx


3.5 Jetstress testing inside virtual machines


A quick history lesson: Over the years, we have seen a huge increase in
deployments on hypervisor technology. During the early stages of hypervisor
use for Exchange, we worked with a number of customers who observed
inaccurate results during their Jetstress tests of virtual machines. This
culminated in the Exchange product group releasing a statement that advised
against using Jetstress inside a virtual machine and recommended testing on the
root of the hypervisor instead. Obviously this worked for Hyper-V, but was not
quite so practical for all hypervisors. On 30th March 2012, after significant
internal testing against modern hypervisors, the Exchange product group
announced that it is now viable to perform your Jetstress testing directly from
inside the virtual machines that are planned to host the Exchange Mailbox role.
The single caveat is that the hypervisor being used is one of the following or
newer:

Microsoft Windows Server 2008 R2 (or newer)
Microsoft Hyper-V Server 2008 R2 (or newer)
VMware ESX 4.1 (or newer)
Information:
More information about deploying Exchange Server 2013 on a Hypervisor
can be found here:
http://technet.microsoft.com/en-us/library/jj619301.aspx

3.5.1 What is different about Jetstress inside a virtual machine?


The approach and testing process do not change. The aim of the test is to
validate that the storage presented to the virtual guest can provide sufficient
performance to meet the predicted requirements from the mailbox role
calculator. All performance counters and recommended values remain the same
from a physical to a virtual guest and the recommendations for testing against
raid arrays and in failure-modes still apply.
However, there are things that we may need to consider during our Jetstress
testing.
1. Is the virtual host operating at a normal working load during our test? If
the host has capacity for 10 virtual machines and we are testing with a
single virtual machine running, then there is the possibility that we will
experience performance problems once the host is fully loaded.
2. Does the host server have any high availability technology that we need
to test in degraded mode? This could include things like multiple paths to
the storage or network, or maybe even a Hypervisor HA solution.


Additionally the host may be the failover location for other guests,
meaning that workload may increase dramatically in a failure scenario.
3. Follow the current recommended practices from both Microsoft and your
hypervisor vendor. Yes, I know this is obvious but it still amazes me how
many problems are resolved by following the recommended guidance!
Guidance
The spirit of the test is to ensure that the system can meet its predicted
workload during normal working conditions and during any common
failure modes for which the system has been designed to survive.


For more information about virtualizing Exchange Server:

Announcing Enhanced Hardware Virtualization Support for Exchange 2010
(this applies equally to Exchange Server 2013):
http://blogs.technet.com/b/exchange/archive/2011/05/16/announcing-enhanced-hardware-virtualization-support-for-exchange-2010.aspx

Demystifying Exchange 2010 SP1 Virtualization (this applies equally to
Exchange Server 2013):
http://blogs.technet.com/b/exchange/archive/2011/10/11/demystifying-exchange-2010-sp1-virtualization.aspx

Best Practices for Virtualizing Exchange Server 2010 with Windows
Server 2008 R2 Hyper-V (this applies equally to Exchange Server 2013):
http://www.microsoft.com/download/en/details.aspx?id=2428

3.6 How much time should I allocate for Jetstress testing?


Jetstress testing can take a long time to complete and it is vital that this time is
correctly planned for within your Exchange project plan.
Generally, the test procedure can be broken up into three parts.

Initialisation
Testing
Clean-up

3.6.1 Initialisation
This phase includes installation, prerequisites and initial database creation. Of
these tasks, the initial database creation will take the longest amount of time.
Database creation time varies between hardware deployments; however, expect
around 24 hours for 10TB of data per server (~7GB/minute). If you are using
direct attached storage and initialise multiple servers in parallel, these
predictions apply to each server. If you are using shared storage,
initialisation may take considerably longer.
DATA (TB)    TIME (Hours)    TIME (Days)
1TB          2.4             0.1
2TB          4.8             0.2
5TB          12.0            0.5
10TB         24.1            1.0
50TB         120.3           5.0
100TB        240.6           10.0

Table 3: Database initialisation time
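The planning figures above can be approximated with a short calculation. The following is a minimal sketch only: the function name `estimate_init_hours` is hypothetical, and it assumes a constant creation rate of roughly 7GB/minute and 1TB = 1024GB, so its output will differ slightly from the rounded values in Table 3. Your own hardware may initialise faster or slower.

```python
def estimate_init_hours(data_tb: float, gb_per_minute: float = 7.0) -> float:
    """Estimate Jetstress database initialisation time in hours.

    Assumes 1 TB = 1024 GB and a constant creation rate. Both are
    simplifications intended for project planning only.
    """
    if data_tb < 0 or gb_per_minute <= 0:
        raise ValueError("data_tb must be >= 0 and gb_per_minute must be > 0")
    minutes = (data_tb * 1024) / gb_per_minute
    return minutes / 60


if __name__ == "__main__":
    # Print a planning table for the data sizes used in Table 3.
    for tb in (1, 2, 5, 10, 50, 100):
        hours = estimate_init_hours(tb)
        print(f"{tb:>4} TB  {hours:7.1f} hours  {hours / 24:5.1f} days")
```

If you measure the real creation rate on one server, pass it as `gb_per_minute` to get a better estimate for the remaining servers.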

3.6.2 Testing
The actual testing phase will vary depending on the complexity and maturity of
the design. If your design is based on complex, cutting-edge storage
technology, it is highly likely that you will need to allocate more time for testing.
If your design is based on common direct attached components, the testing
phase is likely to be quite short. For simple direct attached solutions, allow
2 to 5 days; for complex SAN solutions, try to allocate up to 10 working
days. If you are working in a complex enterprise with a large-scale, complex
storage infrastructure, budget 4 to 6 weeks for Jetstress testing.
Troubleshooting storage performance issues can often be very time-consuming.

3.6.3 Clean-up
Before the server can be put into production, it is necessary to remove the
Jetstress application and the test databases that were created. The
recommended procedure is as follows:

Uninstall Jetstress and reboot
Copy the Jetstress data to a safe location
Delete the Jetstress installation folder
Remove all test databases

Depending on complexity, allow between 1 and 2 hours per Exchange server
that needs to have Jetstress uninstalled.
Tip:
If you have a complex deployment, you can use the scripts embedded
here:

JetstressScripts.zip

The scripts will parse your JetstressConfig.XML file and remove all
database and log folders defined in the test. The scripts take two input
parameters:

[XMLFile] Path to the JetstressConfig.XML file; defaults to C:\Program
Files\Exchange Jetstress\JetstressConfig.xml if no other value is
specified.

[Prompt] $true or $false; the default is $true. Specify $false to use as part
of an automated process.

Note that these scripts are unsupported and you use them
entirely at your own risk. They are provided here for convenience
only.


3.7 Preparing for the Jetstress test


Jetstress simulates an Exchange database workload. To ensure that the
environment is ready, it should be configured according to both the hardware
vendor's and Microsoft's recommendations.
Refer to Understanding Exchange 2013 Storage Configuration Options for further
detail.
As a starting point, ensure that the following conditions have been met:
1. If multiple clusters will be sharing any aspect of the disk subsystem, the
server/storage configuration must be Cluster/Multi-Cluster Certified.
2. Verify with vendors that drivers and firmware are current and consistent
across all servers. Drivers and firmware include, but are not limited to, the
following items:
a. Server BIOS/firmware
b. SCSI/Array Controller firmware and driver
c. Fibre Host Bus Adapter (HBA) firmware and driver
d. Fibre switch/hub firmware
e. SAN (Storage Area Network) enclosure Operating
System/Microcode/firmware
f. Hard disk firmware
3. Verify that the HBA/SAN specific configuration is set correctly and is
consistent across all servers. Many HBAs use registry keys to customize
the configuration to a specific SAN platform (for example, Queue Depth).
4. Raid Controller Stripe size is 256KB or greater (refer to hardware vendor
for guidance).
5. Read/Write Cache is 75% Write and 25% Read on all LUNs.
6. Configure the storage logical unit numbers (LUNs) (consider Exchange log
devices and database devices).
7. Format the LUNs within Windows with the NTFS file system. Best practice =
64KB allocation unit size.
8. NTFS Compression is not enabled.
9. File Level Anti-Virus is configured to exclude all Exchange data locations
and any directories that Jetstress has been configured to use.
10. Storport.sys has been updated to the latest supported version for your
hardware.


3.8 What happens if the test fails?


It is important to determine the pass and fail criteria for the test. The test will
find the peak working load that the storage is able to provide at the I/O latency
targets recommended by the Microsoft Exchange Team. These are defined in
section 8.3 Interpreting Jetstress test results.
If the achieved IOPS recorded by the Jetstress test is at or above the targets
documented within the Exchange design, then the storage solution is deemed to
have passed the test. If it does not meet the design targets, then the storage
solution is deemed to have failed the test.
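The comparison described above can be sketched as a simple check. This is an illustration only: the function names and the sample numbers are hypothetical, and it assumes that achieving exactly the design target counts as a pass. It also assumes that Jetstress itself has already reported the run as a pass against the latency targets.

```python
def storage_passes(achieved_iops: float, design_target_iops: float) -> bool:
    """Return True when the achieved transactional IOPS meets the design target."""
    return achieved_iops >= design_target_iops


def iops_shortfall(achieved_iops: float, design_target_iops: float) -> float:
    """Return how many IOPS are missing (0.0 when the target is met)."""
    return max(0.0, design_target_iops - achieved_iops)


if __name__ == "__main__":
    # Hypothetical figures: the design requires 1200 IOPS per server.
    for achieved in (1256.0, 1100.0):
        result = "PASS" if storage_passes(achieved, 1200.0) else "FAIL"
        print(f"achieved={achieved:.0f} target=1200 -> {result}, "
              f"shortfall={iops_shortfall(achieved, 1200.0):.0f}")
```

A non-zero shortfall is the starting point for the remediation work, since it quantifies how far the storage is from the design.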
If the test shows that the storage has failed to meet its design targets, it will be
necessary to perform remediation. This usually involves a combination of
resources from the design/project, build, hardware, and storage vendor teams.
The aim of remediation is to determine why the achieved IOPS was below the
design target and to provide a remediation plan before submitting the solution
for a re-test.
Before beginning significant storage redesign work, it is important to check the
basics listed in section 3.7 Preparing for the Jetstress test. The most common
causes of Jetstress test failures are missing simple configuration steps during
deployment and/or misconfiguring the Jetstress test itself.
One of the most common pitfalls when a test fails is focussing on
Jetstress itself. Remember that Jetstress has not failed. Your storage has failed
the test. Jetstress is just the messenger. Instead, concentrate on understanding
the data that Jetstress has provided and how you can fix your storage solution.
Jetstress is a well-proven tool and is extremely unlikely to be the root cause of
your storage test failing.
Advice:
It is much easier to resolve configuration problems during this phase of
the deployment than after the Exchange servers have been put into
production. It is far better to suffer a small delay to the project timescales
than put a service into production that does not meet its original goals.


Installing Jetstress
4.1 Documentation
The document that you are currently reading represents the main source of
information for Jetstress 2013. If you are validating Exchange Server 2003, 2007
or 2010 refer to the Jetstress Field Guide for Jetstress 2010.

4.2 Jetstress Version and Download


Version           Build    Usage                          Link
14.01.0225.017    32 bit   Exchange 2003 (footnote 2)     http://www.microsoft.com/en-us/download/details.aspx?id=20054
14.01.0225.017    64 bit   Exchange 2007, Exchange 2010   http://www.microsoft.com/en-us/download/details.aspx?id=4167
15.0.658.4        64 bit   Exchange 2013                  http://www.microsoft.com/en-us/download/details.aspx?id=36849

Table 4 - Jetstress version and download table

Note: Although there is a 32-bit build of Exchange 2007, it is not recommended
or supported to use these ESE files to run a Jetstress test. This is due to the
requirement for a 64-bit address space to simulate a realistic Exchange I/O
pattern.

Jetstress 2013 will not allow you to use an XML configuration file from an
older version of Jetstress.
Always ensure that you use the same version of Jetstress to initialise the
databases and to perform the testing.

2 Refer to Appendix D Exchange 2003 for information on configuring Jetstress
14.01.225.x for Exchange 2003

4.3 Prerequisites

.NET Framework 4.5 or higher
A copy of your 64-bit production ESE files (see footnote 3)
o ese.dll
o eseperf.dll
o eseperf.hxx
o eseperf.ini
o eseperf.xml

It is important that the version of ESE that is used for the test is the same
version that will be used in production.

3 See section 4.4 Getting ESE Files necessary for Jetstress for the locations of
these files.

4.4 Getting ESE Files necessary for Jetstress


Jetstress requires ESE to function. The needed files are available from an
installed Exchange server or from the Exchange installation media. It is
recommended to get the files from an installed Exchange server that has been
fully updated and patched. If you are validating Exchange 2010 or newer, it is
possible to get the necessary files directly from the installation media without
requiring an Exchange installation.
Note: AMD64 refers to the 64-bit x86-64 architecture and is not specific to AMD
processors. Do NOT use the x86 files!

4.4.1 File locations from an installed Exchange Server


File           Path
ESE.DLL        C:\Program Files\Microsoft\Exchange Server\V15\Bin
ESEPERF.DLL    C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64
ESEPERF.HXX    C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64
ESEPERF.INI    C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64
ESEPERF.XML    C:\Program Files\Microsoft\Exchange Server\V15\Bin\perf\AMD64

Table 5 - ESE file locations on running Exchange server

4.4.2 File locations from the installation media


File           Path
ESE.DLL        \setup\serverroles\common
ESEPERF.DLL    \setup\serverroles\common\perf\amd64
ESEPERF.HXX    \setup\serverroles\common\perf\amd64
ESEPERF.INI    \setup\serverroles\common\perf\amd64
ESEPERF.XML    \setup\serverroles\common\perf\amd64

Table 6 - ESE file locations from installation media

Caution
Remember to use the same version of ESE files in your Jetstress
tests that you will use in production.


4.5 Installation
Before performing this section, ensure that all prerequisites have been met
and that Exchange Server is not installed on any servers being used for
Jetstress testing.

4.5.1 Application Installation


1. Begin the Jetstress installation.

2. Accept the License agreement.

3. Leave the installation options as default unless you have a good reason to
change them.
Note: All performance data and HTML reports will be stored in the installation
folder, so if your system drive is short of space select an alternative folder.

4. This is the last chance to stop the installation. Click on Next to install.

5. Once installation is completed, click on Close.

Table 7 - Jetstress installation instructions

4.5.2 ESE File Installation


1. Copy the ESE prerequisite files into the Jetstress installation folder.
By default this is C:\Program Files\Exchange Jetstress

2. Start Exchange Jetstress 2013.
Note: Jetstress requires local Administrator access. If User Account Control
is enabled, ensure that you start the JetstressWin.exe process as an
administrator.

3. Click on Start new test.

4. Jetstress will attempt to use the ESE files that were copied over in step 1.
The first time that this occurs, Jetstress must be restarted. Verify in the
output on this screen that the ESE version is correct and that the last line of
the status output states that Jetstress must be restarted.
Close Jetstress.

This is the end of the Jetstress installation.
Table 8 - ESE installation instructions
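The file copy in step 1 can be planned with a small script. The sketch below is an assumption-laden illustration, not part of Jetstress: the function name `ese_copy_plan` is hypothetical, and the default paths are the ones listed in Table 5 and the default Jetstress installation folder. It only builds the list of source and destination paths; performing the copy (for example with `shutil.copy2`) is left to you on the target server.

```python
from pathlib import PureWindowsPath

# File names from the prerequisites list; ese.dll lives in Bin,
# the eseperf.* files live in Bin\perf\AMD64.
ESE_BIN_FILES = ("ese.dll",)
ESE_PERF_FILES = ("eseperf.dll", "eseperf.hxx", "eseperf.ini", "eseperf.xml")


def ese_copy_plan(
    exchange_bin: str = r"C:\Program Files\Microsoft\Exchange Server\V15\Bin",
    jetstress_dir: str = r"C:\Program Files\Exchange Jetstress",
):
    """Return (source, destination) path pairs for the five ESE files."""
    bin_dir = PureWindowsPath(exchange_bin)
    perf_dir = bin_dir / "perf" / "AMD64"
    dest = PureWindowsPath(jetstress_dir)
    plan = [(bin_dir / name, dest / name) for name in ESE_BIN_FILES]
    plan += [(perf_dir / name, dest / name) for name in ESE_PERF_FILES]
    return plan


if __name__ == "__main__":
    for src, dst in ese_copy_plan():
        print(f"copy {src} -> {dst}")
```

`PureWindowsPath` is used so the plan can be generated and reviewed on any operating system before it is run against a real server.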


Configuring Jetstress
For the purposes of this document, we will be configuring a disk subsystem
throughput test. The goal of this test is to identify the peak working IOPS value
that the storage subsystem can sustain while remaining within the disk latency
targets established by the Exchange Product Group.

5.1 Jetstress Test Types


5.1.1 Test a disk subsystem throughput
This test uses some fixed parameters to determine the maximum storage
performance at maximum working capacity (80%). This is the recommended
test type since it identifies the maximum working load of the storage solution for
use with Exchange Server 2013 while the disks are filled to capacity. The values
observed from this test can be used both to qualify the solution ready for
production and to calculate available system I/O headroom once the service is in
production. This test should be regarded as mandatory for each Exchange
server released into production.
Database Size Control
Where you are testing multiple databases per volume, Jetstress will
automatically calculate the database size of all databases on the same
volume to ensure that the test runs at 80% of volume capacity.
If your volume is over-sized for your solution and the test
databases are too large, you can control the size of the databases by
reducing the value in the Size the database using storage capacity percentage
box during the test configuration.

5.1.2 Test an Exchange mailbox profile


This test helps you determine whether your storage system meets or exceeds the
planned Exchange mailbox profile. In the Exchange mailbox profile test scenario,
you can specify the number of mailbox users, IOPS per mailbox and quota size to
simulate the profiled Exchange mailbox load. This test type can be useful if your
storage has been specifically designed to operate only at a specific disk
capacity (see footnote 4).
Note: Even if this test type is used, it is still recommended to complete the disk
subsystem throughput test to determine the maximum working load of the
storage solution at full capacity.

4 It is not recommended to design Exchange storage performance based on less
than 80% utilisation capacity.

5.2 Initial configuration


1. Open Exchange Jetstress 2013.

2. Click on Start new test.

3. Check that the status text does not ask for a restart and that the last two
lines state that the ESE engine and performance libraries were detected.


4. Since this is the first time we are configuring a test, we will accept the
defaults and click Next. This will create a new configuration file called
JetstressConfig.xml in the default installation directory. If you already have
an XML file, select that.

5. Select the Test disk subsystem throughput test and click Next.

6. Ensure that Suppress tuning and use thread count is unchecked. This is a
change from Jetstress 2010, where auto-tuning would rarely work. Auto-tuning
should work in most scenarios with Jetstress 2013. If auto-tuning fails, revert
to manual thread configuration as per Appendix A Configuring Thread Count.
You should always test with 100% database capacity and target IOPS
throughput; however, if the storage presented to your servers is greatly
oversized, you can control the Jetstress test database sizes by reducing the
Size the database using storage capacity percentage value. Most validation
tests should leave both values at 100.



7. Configure the test for performance. If you are testing a shared storage
platform, enable the multi-host checkbox. Ensure that Run background database
maintenance is checked. Set Continue the test run despite encountering errors
to enabled. If any errors are detected during the test, they will be reported in
a new table to highlight disk errors.

8. Enter the folder for storing the test results and set the correct duration
for Jetstress. A minimum of one successful 2-hour test and a separate 24-hour
test is required for deployment validation.
Note: While auto-tuning or configuring thread count, you can set a test duration
shorter than 2 hours by typing directly into the window.

0.75 = 45m
0.50 = 30m
0.25 = 15m

Recommendation: Use 0.50 (30 minute) test runs to set thread count for SAN
storage.


9. Configure the test to represent the production deployment.
Number of databases should be the total on this server including all database
copies: active, passive and lagged.
Number of copies per database represents the total number of copies that will
exist for each unique database. This value simply simulates some LOG I/O reads
to account for the log shipping between active and passive databases; it does
NOT actually copy logs between servers.
For example, if your 6 server DAG contained 30 databases, with 1 active copy,
2 passive HA copies and 1 lagged copy per database (or 120 database copies
spread across 6 servers, with each server hosting 20 copies), you would set the
number of databases to 20 and the number of copies per database to 4.

10. Configure the database and log file paths appropriately. Scroll to the
bottom of this page to find the Next link.
Note: Refer to the Mailbox Role Calculator's Distribution tab to understand how
your database should be configured.


11. If this is the first time the test has been run, select Create new
databases; otherwise select Attach existing databases.

12. Verify that the paths are as expected and click Prepare test.

13. This will begin database initialisation. The duration of this process will
vary, but plan on 24 hours for every 10TB worth of data to be initialised.
This value should equate to 80% of the available storage. Refer to section
3.6.1 Initialisation for further information on database sizes and creation
time.


14. Once the test has been initialised, click Execute Test.

15. Once the test has completed, close Jetstress and copy the Jetstress report
and performance data somewhere for analysis. Each performance test will
generate the following files.

Performance_<date>.XML
Performance_<date>.HTML
Performance_<date>.BLG
DBChecksum_<date>.XML
DBChecksum_<date>.HTML
DBChecksum_<date>.BLG
XMLConfig_<date>.XML

Ensure that you make a copy of all of these files.
Note: In addition you may also wish to make a copy of the *.EVT files, which
contain event log data taken during the test.
Table 9 - Jetstress initial configuration
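The worked example in step 9 (a 6 server DAG with 30 databases and 4 copies of each) can be reproduced with a short calculation. This is a minimal sketch with a hypothetical function name, and it assumes that database copies are distributed evenly across the DAG members; if your distribution is uneven, take the per-server count from the Mailbox Role Calculator instead.

```python
def jetstress_database_settings(unique_databases: int,
                                copies_per_database: int,
                                servers: int) -> tuple[int, int]:
    """Return (number of databases per server, number of copies per database).

    Assumes every server in the DAG hosts the same number of database copies.
    """
    if min(unique_databases, copies_per_database, servers) <= 0:
        raise ValueError("all inputs must be positive")
    total_copies = unique_databases * copies_per_database
    if total_copies % servers != 0:
        raise ValueError("copies do not divide evenly across servers")
    return total_copies // servers, copies_per_database


if __name__ == "__main__":
    # 30 databases, 1 active + 2 passive HA + 1 lagged copy each, 6 servers.
    databases, copies = jetstress_database_settings(30, 4, 6)
    print(f"Number of databases: {databases}")
    print(f"Number of copies per database: {copies}")
```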


Jetstress Output Files


This section will explain what output files will be created after the test and what
is in each one.
File: Performance_<date>.BLG
Content: Binary performance data captured during the performance test.
Purpose: To provide detailed data for analysis. Open this file in perfmon and
examine the counters manually to understand reasons for failure.

File: Performance_<date>.XML
Content: XML Report for the performance test
Purpose: Provides the status report data in XML format.

File: Performance_<date>.HTML
Content: HTML Report for the performance test
Purpose: Provides an easy to read status report for the test.

File: DBChecksum_<date>.BLG
Content: Binary performance data captured during the checksum test.
Purpose: Provides binary performance data gathered during the CRC checksum of
the database. Useful if the checksum fails or takes a long time to complete.

File: DBChecksum_<date>.XML
Content: XML Report for the checksum test
Purpose: Provides status report data in XML format.

File: DBChecksum_<date>.HTML
Content: HTML Report for the checksum test
Purpose: Provides an easy to read status report for the checksum test.

File: XMLConfig_<date>.XML
Content: XML Configuration File
Purpose: Provides a backup of the Jetstress Configuration file used for the
test.

Table 10 - Jetstress output files


Reading Jetstress report data


This section will walk through a very simple sample report, and explain where
the key values are stored and how to interpret the data.

7.1 Target design values


Before we can evaluate our Jetstress data, we need to know what our design
targets are. Assuming that the storage design was based on data from the
Mailbox Role calculator (which it should be), the information we need is in the
following table on the Role Requirements tab.

Make a note of the following value:

Total Database Required IOPS / Server


7.2 Reading the Jetstress Test Result Report


The following report is for a test with four databases configured.

7.2.1 Test Summary

This section is a basic summary of the test, when it started, finished and which
versions of operating system and ESE were used.
The most important part of this section is the overall test result, pass or fail.

7.2.2 Database Sizing and Throughput

This section shows some more detailed parameters regarding the test. A report
for the Test disk subsystem throughput test type will always show 100% for
Capacity Percentage and Throughput Percentage. In this example, 4 x 25GB
databases were created on a 126GB LUN. Jetstress created a total of 101GB
(109154926592 bytes) of data for testing, which is 80% of the available space.
This is normal behaviour; by default, in performance mode Jetstress will use
80% of the disk capacity to allow room for growth during the test process.
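The capacity figures quoted above can be sanity-checked with a few lines of arithmetic. This is a sketch only: the function name is hypothetical, and it assumes that the GB values in the report are binary gigabytes (1GB = 1024^3 bytes), which is why the result is close to, rather than exactly, 80%.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: the report uses binary gigabytes


def capacity_used(data_bytes: int, lun_size_gb: float) -> tuple[float, float]:
    """Return (test data size in GB, percentage of the LUN it occupies)."""
    if lun_size_gb <= 0:
        raise ValueError("lun_size_gb must be positive")
    data_gb = data_bytes / BYTES_PER_GB
    return data_gb, 100.0 * data_gb / lun_size_gb


if __name__ == "__main__":
    # Figures from the sample report: 109154926592 bytes on a 126GB LUN.
    data_gb, percent = capacity_used(109154926592, 126)
    print(f"Test data: {data_gb:.1f} GB, {percent:.1f}% of the LUN")
```

A result far from 80% on a default configuration suggests that the volume size or the capacity percentage setting is not what you expected.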
The most important value in this section is the Achieved Transactional I/O per
Second. In this example the test validated the storage can provide 231
transactional I/O per second. This represents random database IOPS.
Note:
To validate that the test has met the design requirements, compare the
Achieved Transactional I/O per Second from your Jetstress report to the
Total Database Required IOPS / Server value recorded in section 7.1 Target
design values, from the Mailbox Role Calculator.


7.2.3 Jetstress System Parameters

This section displays some system values that Jetstress used for this test. The
important values for analysis here are the thread count and number of copies
per database.

7.2.4 Database Configuration

This section lists the paths for each database and log combination. In this
example, 4 x 25GB databases were configured on a single LUN. Check that all of
the test databases are listed here and the path names are correct.

7.2.5 Transactional I/O Performance

This section of the report displays the Transactional I/O values that were
achieved for each database. Transactional I/O does not include I/O for
Background Database Maintenance.
BDM I/O is mostly sequential so it is not usually considered during the design
phase.
Information:
If you sum the values highlighted in the red box the result should add up
to the Achieved Transactional I/O per second reported in the Database
Sizing and Throughput table.

In this example, 33.859 + 24.069 + 33.87 + 23.491 + 33.978 + 24.186 +
34.043 + 23.807 = ~231 IOPS.
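The summation can be checked quickly. The snippet below is a small illustration using the eight values quoted in this example; the variable and function names are hypothetical.

```python
# Per-database transactional I/O values quoted in the example report.
TRANSACTIONAL_IO = [33.859, 24.069, 33.87, 23.491,
                    33.978, 24.186, 34.043, 23.807]


def achieved_transactional_iops(values: list[float]) -> float:
    """Sum the per-database transactional I/O values."""
    return sum(values)


if __name__ == "__main__":
    total = achieved_transactional_iops(TRANSACTIONAL_IO)
    print(f"Achieved transactional I/O per second: {total:.3f}")
```

The total should match the Achieved Transactional I/O per Second value in the Database Sizing and Throughput table, allowing for rounding.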

7.2.6 Background Database Maintenance I/O Performance

This section displays the I/O that was used to perform Background Database
Maintenance only. The sum of values in the red box shows the total amount of
I/O used for BDM operations. These are sequential operations and we do not
usually need to account for them in our design. However, take the advice of
your storage vendor on this aspect; some storage platforms do not handle
sequential I/O as well as others and may require some additional design work to
help them deal with BDM more gracefully.

7.2.7 Log Replication I/O Performance

This section displays the I/O overhead for LOG file replication. In this example
there were two replica copies (replicas=2); this is shown by a non-zero count for
I/O Log Reads/sec. If this value is greater than zero, it confirms that database
replication is being simulated.
Note:
For those that noticed, I finally provided a report that shows log I/O. I
know, the little things count.


2.1.6 Total I/O Performance

This table shows all I/O that was recorded during the test (transactional I/O plus
BDM I/O plus LOG I/O). The summation of I/O values from areas highlighted in
red in this table should agree (roughly) with those observed at the storage
subsystem.
In this case, the summation suggests that the storage subsystem had to deal
with 349 IOPS. However, roughly one third of those (349 - 231 = 118 IOPS) were
sequential and so were not accounted for during the design process, since
sequential I/O is very easy on most disk subsystems.
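
As a quick worked example, the split between transactional and sequential I/O can be calculated from the two totals quoted in this example. This is a sketch only; the variable names are illustrative.

```python
total_iops = 349          # sum of the Total I/O Performance table in this example
transactional_iops = 231  # Achieved Transactional I/O per second in this example

# Whatever is not transactional is BDM or LOG I/O, which is sequential.
sequential_iops = total_iops - transactional_iops
sequential_share = sequential_iops / total_iops

print(f"Sequential IOPS: {sequential_iops} ({sequential_share:.0%} of total)")
```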
The following chart shows the observed IOPS from the Windows host during the
Jetstress test. This counter includes all system IOPS as well as the test IOPS;
however, there should be a strong correlation between the IOPS observed on the
Windows host and at the storage subsystem. If the IOPS observed at the Windows
host and those observed at the storage controller contradict each other, the
Windows host values take precedence from a Jetstress validation perspective.


Figure 6 - Host observed IOPS

It is important to differentiate between sequential IOPS and transactional
(random) IOPS when validating your storage.
We are only interested in transactional IOPS when we are Jetstress testing. BDM
and LOG I/O are sequential in nature and so we ignore them from a performance
planning perspective for Exchange Server.
Often storage teams are confused by the results of a Jetstress test since the
achieved transactional I/O per second value is much lower than the observations
they make at the storage system. It is important to differentiate between the
workloads.
Note:
It is an invalid approach to sum the values displayed in the Total I/O
Performance table and compare them to the Total Database Required
IOPS / Server predicted by the Mailbox Role calculator. The only value
from the Jetstress report that is required for validation is Achieved
Transactional I/O per Second. All other values are for support and
curiosity only!


2.1.7 Host System Performance

Figure 7: Host System Performance Table

This section of the report shows the observed system performance during the
test. This section is most often used for troubleshooting. The most important
thing to note from this section is that the CPU load from Jetstress is usually
minimal. Jetstress has been optimized to evaluate the storage subsystem and
not the host performance itself.

2.1.8 Error Counts Per Volume


If the Jetstress test detects I/O errors during the test, it will try to continue
to run the test and report the errors in both the Test Log and the Error Counts
Per Volume table. The table lists each volume along with the number and type of
I/O errors that were recorded.

Figure 8: Error Counts Per Volume Table


Error Type              JET/ESE Error Type                   Error Code

IO Failures             JET_errDiskIO                        -1022
                        JET_errReadVerifyFailure             -1018
                        JET_errPageNotInitialized            -1019
                        JET_errReadPgnoVerifyFailure         -1118
                        JET_errDiskReadVerificationFailure   -1021

Filesystem Corruptions  JET_errCheckpointCorrupt             -533
                        JET_errMissingLogFile                -528
                        JET_errLogFileCorrupt                -501
                        JET_errInvalidPath                   -1023
                        JET_errInvalidSystemPath             -1024
                        JET_errInvalidLogDirectory           -1025
                        JET_errFileAccessDenied              -1032
                        JET_errFileInvalidType               -1812
                        JET_errLogCorrupted                  -1852
                        JET_errObjectNotFound                -1305

Lost Flush              JET_errReadLostFlushVerifyFailure    -1119
                        JET_errDbTimeTooOld                  -566
                        JET_errDbTimeTooNew                  -567

Table 11: JET Error Code Groupings

Information
Some failure events are more important than others. Lost Flush events
signal that significant data corruption has occurred and something is very
wrong with your storage (under no circumstances should you entertain
putting a system into production that is experiencing ANY lost flush
events during a test). However, some other IO Failures are relatively
normal. For example, in a JBOD environment we may see -1021
(JET_errDiskReadVerificationFailure). Although this signifies that the data
we read was not the same as the data we originally wrote (checksum failed),
Exchange will try to deal with this scenario via Page Patching in normal
operation, and so it is not of critical importance.
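
If you collect the error counts from several test runs, a small helper can flag the runs that must be rejected outright. The sketch below assumes you have already transcribed the counts into a dictionary keyed by JET error code; that dictionary is a hypothetical structure for illustration, not a Jetstress output format. The three codes are the lost flush related errors named in this guide.

```python
# Lost flush related JET error codes named in this guide:
# JET_errReadLostFlushVerifyFailure, JET_errDbTimeTooOld, JET_errDbTimeTooNew.
LOST_FLUSH_CODES = {-1119, -566, -567}


def lost_flush_errors(error_counts):
    """Return only the lost flush entries from a {error_code: count} mapping."""
    return {code: count
            for code, count in error_counts.items()
            if code in LOST_FLUSH_CODES and count > 0}


# Hypothetical counts transcribed by hand from two test reports.
run_a = {-1021: 2}            # read verification failures only
run_b = {-1021: 1, -1119: 3}  # includes lost flush events

print(lost_flush_errors(run_a))
print(lost_flush_errors(run_b))
```

An empty result does not mean the run is clean; it only means that no lost flush events were recorded. Other errors still need to be reviewed.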

For a full list of JET/ESE event types, see the following article: Extensible
Storage Engine Error Codes.


What is a Lost Flush?


A lost flush occurs if we issued a write operation to the disk and the OS reported
the operation as having successfully completed, but it actually didn't get
physically committed to the non-volatile storage. The two main reasons for this
to happen are:

1. A bug somewhere in the storage stack.
2. Power loss on storage with write-cache enabled: in this case, the operation
   is committed to the volatile cache of the disk or controller, but if the
   hardware loses power, it never actually makes it to the non-volatile
   storage, even though it was reported to the application that it did.
   This is the reason why we only run with write-cache enabled on the
   storage if there's a battery backing up the cache, so that if it loses
   power, the controller makes sure to flush the uncommitted cache to the disk.

A lost flush is a very insidious type of storage failure for a database engine
because the consequences can range from none (if we are very lucky) to nasty
and potentially undetectable logical database corruption (more likely).
Undetected lost flushes on the active copy may show up as a
JET_errDbTimeTooNew (-567) replication error on the passive copy. Undetected
lost flushes on the passive copy may show up as a JET_errDbTimeTooOld (-566)
replication error on the passive copy.
ESE has implemented lost flush detection, based on a flush map. Basically, every
time we issue a write on a page, we flip a bit on the actual page and also store
that bit in a flush map in memory. If we read the page again off the disk, we
check the bit against the in-memory flush map and if they don't match, it means
the flush was lost.
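
The idea can be illustrated with a toy model. This is not ESE code and makes no claim about how ESE stores or sizes its flush map; ToyDisk, ToyEngine and drop_next_write are hypothetical names invented purely to show the bit-flip comparison described above.

```python
class ToyDisk:
    """Keeps (flush_bit, data) per page and can silently drop one write."""

    def __init__(self):
        self.pages = {}
        self.drop_next_write = False

    def write(self, page_no, flush_bit, data):
        if self.drop_next_write:
            # Simulate a lost flush: report success but persist nothing.
            self.drop_next_write = False
            return
        self.pages[page_no] = (flush_bit, data)

    def read(self, page_no):
        return self.pages[page_no]


class ToyEngine:
    """Flips a per-page bit on every write and remembers it in memory."""

    def __init__(self, disk):
        self.disk = disk
        self.flush_map = {}  # page number -> flush bit we expect to find on disk

    def write_page(self, page_no, data):
        bit = 1 - self.flush_map.get(page_no, 0)  # flip the bit on each write
        self.flush_map[page_no] = bit
        self.disk.write(page_no, bit, data)

    def read_page(self, page_no):
        bit, data = self.disk.read(page_no)
        if bit != self.flush_map[page_no]:
            raise IOError(f"lost flush detected on page {page_no}")
        return data


def demo():
    """Return True if the dropped write is detected on the next read."""
    disk = ToyDisk()
    engine = ToyEngine(disk)
    engine.write_page(7, "version 1")
    assert engine.read_page(7) == "version 1"

    disk.drop_next_write = True
    engine.write_page(7, "version 2")  # the disk claims success but keeps version 1
    try:
        engine.read_page(7)
    except IOError:
        return True
    return False


print(demo())
```

Note that the map lives in memory, which matches the description above: the comparison is only possible for pages written while the map is available.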
Important:
The bottom line for lost flushes is that you should NEVER put a system
into production that has recorded lost flushes during the Jetstress test.
You must be 100% certain that you have resolved the underlying problem
and have at least one good 24 hour test that has no lost flushes recorded
before accepting the solution into production.


7.2.8 Test Log


This section of the report is a log of the Jetstress test. It is most often used for
troubleshooting failures.


7.3 Interpreting Jetstress test results


Jetstress evaluates latency values for Database Reads and LOG writes since
these affect the end user experience.

Performance Test - Strict mode (<= 6 hour test)

    Average Database Read Latency: 20ms
    Average Log File Write Latency: 10ms
    Max Database Read Latency: 100ms
    Max Log File Write Latency: 100ms

Stress Test - Lenient mode (> 6 hour test)

    Average Database Read Latency: 20ms
    Average Log File Write Latency: 10ms
    Max Database Read Latency: 200ms
    Max Log File Write Latency: 200ms
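
The thresholds above can be written down as a small lookup, which is handy when reviewing many reports by hand. This is a sketch under stated assumptions: the dictionary keys and function names are made up for illustration, and treating a value exactly equal to the limit as a failure is an assumption (the guide states the average targets as <20ms and <10ms but does not say how Jetstress treats the boundary).

```python
# Latency limits in milliseconds, taken from the two lists above.
THRESHOLDS_MS = {
    "strict":  {"avg_db_read": 20, "avg_log_write": 10,
                "max_db_read": 100, "max_log_write": 100},
    "lenient": {"avg_db_read": 20, "avg_log_write": 10,
                "max_db_read": 200, "max_log_write": 200},
}


def mode_for_duration(hours):
    """Strict mode applies to tests of 6 hours or less, lenient mode above that."""
    return "strict" if hours <= 6 else "lenient"


def latency_failures(measured_ms, hours):
    """Return the names of the measurements that reach or exceed their limit."""
    limits = THRESHOLDS_MS[mode_for_duration(hours)]
    # Assumption: a value equal to the limit counts as a failure.
    return [name for name, limit in limits.items() if measured_ms[name] >= limit]


# Hypothetical measurements copied by hand from a report.
measured = {"avg_db_read": 14.2, "avg_log_write": 1.1,
            "max_db_read": 150.0, "max_log_write": 40.0}

print(latency_failures(measured, hours=2))
print(latency_failures(measured, hours=24))
```

With these sample numbers, the 150ms maximum read latency fails a 2 hour test but is within tolerance for a 24 hour test.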


7.4 Test evaluation


Evaluate the following criteria for each test run. The first criterion is
validated against the design target and must be checked manually; Jetstress does
not validate this value. The second and third are checked against pre-defined
latency targets for Exchange; if these values are not within tolerance, Jetstress
will report the test as failed.
1. DB IOPS Target: Is the Achieved Transactional I/O per Second in the test
report higher than the Total Database Required IOPS / Server predicted in
the Mailbox Role Calculator?
2. Is the I/O Database Reads Average Latency in the test report <20ms?
3. Is the I/O Log Writes Average Latency in the test report <10ms?

DB IOPS   DB Read   LOG Write
Target    Latency   Latency     Action

PASS      PASS      PASS        Test successful

FAIL      PASS      PASS        The test is failing to meet the IOPS target, but
                                the latency values are good. Increase the thread
                                count by 1 and re-test. Use sluggishsessions to
                                fine-tune if necessary.

PASS      FAIL      FAIL        At least one database has recorded latency over
PASS      PASS      FAIL        threshold. If the latency values are very close
PASS      FAIL      PASS        to limits increase sluggishsessions by 1; if both
                                target IOPS and latency values are much higher
                                decrease the thread count.

FAIL      FAIL      FAIL        If the test shows that Achieved IOPS is below the
FAIL      FAIL      PASS        design target AND the test latency values are
FAIL      PASS      FAIL        above limits the storage solution is unable to
                                meet the requirements. At this stage, it is
                                necessary to re-evaluate the storage design and
                                begin troubleshooting the physical deployment to
                                determine the correct remediation.

Table 12 - Quick results analysis table
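
The quick results table reduces to a four-way decision, sketched below. The function name and the shortened action strings are illustrative paraphrases of the table, not output from Jetstress.

```python
def recommended_action(iops_ok, db_read_ok, log_write_ok):
    """Map the three PASS/FAIL results from Table 12 to a short action summary."""
    latency_ok = db_read_ok and log_write_ok
    if iops_ok and latency_ok:
        return "Test successful"
    if not iops_ok and latency_ok:
        return "Increase thread count by 1 and re-test"
    if iops_ok:
        return "Latency over threshold: raise sluggishsessions or lower thread count"
    return "Storage cannot meet requirements: re-evaluate design and troubleshoot"


# Print every combination in the same PASS/FAIL notation as the table.
for iops in (True, False):
    for read in (True, False):
        for log in (True, False):
            flags = " ".join("PASS" if ok else "FAIL" for ok in (iops, read, log))
            print(f"{flags}  ->  {recommended_action(iops, read, log)}")
```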


Appendix A - Configuring thread count

Jetstress 2013 has been updated so that the auto-tuning feature will work in far
more scenarios than previously. Because of this, it is recommended to begin
Jetstress testing in auto-tuning mode and only revert to manual thread
configuration if auto-tuning fails to set a thread value.
Thread count controls how many IOPS Jetstress attempts to drive through the
storage subsystem. Setting this value correctly requires some trial and error.
For the process described within this document the goal is to increase the thread
count to a value that fails and then reduce the value until the test passes. The
resulting value should then represent the peak working IOPS value that the
storage subsystem can support.
Each thread will generate a workload on the system. So for example, if the
storage design team recommended that the storage for a given server was able
to support 1000 IOPS:

Target IOPS = 1000

Starting thread count = Target IOPS / 65

Given this example:

Starting thread count = 1000 / 65 = 15.38 (round up to 16)

Notes:

- Try auto-tuning with Jetstress 2013.
- If in doubt start with thread=1 and work up until the test fails.
- If the thread count predicted is less than 1 it may be necessary to
  modify the sluggishsessions value afterwards.
- The exact quantity of IOPS generated per thread will change as the
  storage system workload changes. As the storage system gets
  closer to its performance limit the IOPS per thread value will reduce.
  Jetstress was designed to produce approximately 60 IOPS per thread
  at 20ms disk latency.
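
The starting thread count calculation can be sketched as follows. The divisor of 65 comes from the formula in this appendix; the function name and the second return value (a hint that sluggishsessions tuning may be needed) are illustrative additions, not Jetstress behaviour.

```python
import math

IOPS_PER_THREAD = 65  # divisor used in the starting thread count formula


def starting_thread_count(target_iops, iops_per_thread=IOPS_PER_THREAD):
    """Return (thread_count, may_need_sluggishsessions) for a target IOPS value."""
    raw = target_iops / iops_per_thread
    # Round up, and never go below one thread. A raw value under 1 suggests
    # that sluggishsessions may need adjusting afterwards.
    return max(1, math.ceil(raw)), raw < 1


print(starting_thread_count(1000))  # (16, False)
print(starting_thread_count(40))    # (1, True)
```

Treat the result as a starting point only. As noted above, the IOPS generated per thread falls as the storage approaches its limit.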


Appendix B - Configuring sluggishsessions

If it is not possible to achieve the right IOPS value by modifying the thread
count, it becomes necessary to modify the sluggishsessions value within the
JetstressConfig.xml file.
The sluggishsessions value adds a pause between each task. This allows a level
of fine-tuning over the workload dispatched by Jetstress.
As sluggishsessions is increased, the achieved IOPS value decreases.
To change the value, open the JetstressConfig.xml file and look for the default
configuration option:
<SluggishSessions>1</SluggishSessions>
Modify the value, save the configuration file and then restart Jetstress.
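
If you need to make this change repeatedly, it can be scripted. The sketch below uses Python's standard xml.etree.ElementTree module and operates on a text string. The <configuration> root element in the sample is a placeholder invented for the example; only the SluggishSessions element comes from this guide, so check the structure of your own JetstressConfig.xml (and keep a backup copy) before writing anything back to disk.

```python
import xml.etree.ElementTree as ET


def set_sluggish_sessions(xml_text, value):
    """Return xml_text with the single SluggishSessions element set to value."""
    root = ET.fromstring(xml_text)
    # Match on the local tag name so that a namespaced file would also work.
    matches = [el for el in root.iter()
               if el.tag.split("}")[-1] == "SluggishSessions"]
    if len(matches) != 1:
        raise ValueError(f"expected one SluggishSessions element, found {len(matches)}")
    matches[0].text = str(value)
    return ET.tostring(root, encoding="unicode")


# Placeholder document: the root element name is hypothetical.
sample = "<configuration><SluggishSessions>1</SluggishSessions></configuration>"

print(set_sluggish_sessions(sample, 2))
```

ElementTree may change whitespace or namespace prefixes when it serializes the document, so for a one-off change editing the file by hand as described above is simpler.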


10 Appendix C - Running a Jetstress Test with JetstressCmd.exe

Both JetstressWin.exe and JetstressCmd.exe use the common Jetstress core
library files, which means you will have comparable test results with the same
XML configuration file. We recommend that you use JetstressWin.exe to create
new test scenarios, and JetstressCmd.exe to open and run the test scenarios by
using the /config command-line option. You can also see all the other available
options by using the /? (help) command-line option.
Action Argument         Example of Use               Description

help                    /?                           The help for the command-line program
Config                  /c JetstressConfig.xml       Open a configuration file
Generate                /g                           Generate a sample XML configuration file
TimeOut                 /TimeOut 2H0M0S              Test Duration. Default is 2 hours.
Output                  /output c:\output            Path for test output. Default is the
                                                     current directory.
DBPath                  /dbpath m:\sg1\mdb           Database paths for each storage group
                        /dbpath n:\sg2\mdb
LogPath                 /log x:\sg1\log y:\sg2\log   Log path for each storage group
PctCapacity             /pctcapacity 100             Specify capacity percentage
Throughput              /throughput 100              Specify throughput percentage
Threads                 /threads                     Suppress auto tuning and specify
                                                     thread count
DoNotRunDBMPerformance                               Do not run background database
                                                     maintenance during performance/stress
                                                     test
RunDBMPerformance                                    Run background database maintenance
                                                     during soft recovery test
New                     /new                         Create new databases
Open                    /open                        Open existing databases
Bak                     /bak                         Restore backup database
Recovery                /recovery                    Run soft recovery test
Streaming                                            Run streaming backup test
Transaction                                          Run transaction performance test
VerifyCheckSum                                       Run database checksums
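
When tests are scheduled from a script, it can help to build the JetstressCmd.exe argument list in one place. The sketch below only assembles the list and uses only options shown in the table above. Two things are assumptions: that the /TimeOut value always follows the hours/minutes/seconds pattern of the single example (2H0M0S), and that these options can be combined on one command line. Confirm both with /? on your own build before relying on them. The function names are illustrative.

```python
def format_timeout(hours=2, minutes=0, seconds=0):
    """Build a /TimeOut value in the style of the example above (2H0M0S)."""
    # Assumption: the pattern is <hours>H<minutes>M<seconds>S.
    return f"{hours}H{minutes}M{seconds}S"


def jetstress_command(config_file, timeout=None, output_dir=None):
    """Return an argument list for JetstressCmd.exe without running it."""
    args = ["JetstressCmd.exe", "/c", config_file]
    if timeout is not None:
        args += ["/TimeOut", timeout]
    if output_dir is not None:
        args += ["/output", output_dir]
    return args


command = jetstress_command("JetstressConfig.xml",
                            timeout=format_timeout(hours=2),
                            output_dir=r"c:\output")
print(" ".join(command))
```

The resulting list could be passed to subprocess.run(command, check=True) from the Jetstress installation directory once the options have been verified.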


11 Appendix E - Running Jetstress on a production server

Although the formal support position on this is that you shouldn't do it... ever...
at all... under no circumstances... in fact you shouldn't even be reading this
section of the field guide... however, we all accept there are cases where it can
be necessary, such as when attaching new storage to an existing server or
troubleshooting performance bottlenecks on existing servers.
That still doesn't mean it's ok to do it!!
If you really MUST do it, here are some things to know before beginning:

- Record the start-up state of all Exchange Services.
- Stop and Disable all Exchange Services on the server.
- Copy the ESE files from the currently installed version of Exchange Server.
  Jetstress will detect that the performance counters are already installed
  for this version of ESE and will use them; this will prevent performance
  counter problems afterwards!
- Do not unload/reload performance counters after the test (if you have
  used the same ESE files as are currently installed this is unnecessary and
  could break things!).
- Remember to clean up the Jetstress test databases after testing.
- Uninstall Jetstress.
- Set Exchange Services back to the state they were in before you began
  testing.
- Reboot your Exchange Server.
- Check that the Exchange Performance counters are working.
- Inspect Windows System and Application Event logs for errors.

Remember: This is not supported or recommended. Only follow this as a matter
of last resort or under the instruction of Microsoft Support/Microsoft Consulting
Services.


12 Common Issues

12.1 Troubleshooting Jetstress

While using Jetstress, you may encounter some known issues. This section
provides possible causes and the recommended solutions.

12.1.1 Jetstress cannot attach to or create a database

Event log error that may display: Error -1023
Possible cause: The path of the database or log files is incorrect.
Solution: Ensure that the paths and file names are correct.

Event log error that may display: Error -1032
Possible cause: Permissions are insufficient to access the .edb file or the
log files.
Solution: Verify that permissions are sufficient for the account under which
Jetstress is running. Jetstress requires read/write permission to the
directories it is using.

Event log error that may display: Error -550 (0)
Possible cause: The last time Jetstress was run, it was ended uncleanly.
This caused the log files to become unsynchronized with the database.
Solution: Delete the Jetstress database (*.edb), log files (*.log), and check
file (*.chk), and re-create the Jetstress database. You can also use
Eseutil.exe with the /r switch to resynchronize the logs and database.

Event log error that may display: Error -1022
Possible cause: The failure is caused by circular logging by Jetstress.
Solution: Check the log drive for the log file name that is identified in the
event log. Delete that log file and all the log files that have a higher
number in the file name. Then, run Eseutil.exe /r to recover Jetstress.edb.
When the database is in a good state, delete all the log files in the log
directory, and rerun Jetstress.

12.1.2 Error loading Performance Monitor counters

JetstressWin.exe relies on performance counters to monitor the system and
requires the ESE database counters to be installed.

Cause: When the counters are not loaded correctly, you may see
exception errors related to performance counters.
Solution: To reload the counters, exit from JetstressWin.exe. Locate the
directory where JetstressWin.exe was installed and verify that the eseperf.dll,
eseperf.hxx, and eseperf.ini files exist in the directory. In a command shell
window, type the command unlodctr ESE and then press Enter. This will
unregister the ESE Database performance counters. Start JetstressWin.exe
and allow it to reload the performance counters.

12.1.3 Unable to tune for the parameters

This error indicates that Jetstress could not find appropriate parameters that
could be used to run a performance or stress test at the desired level of I/O load.

Cause: This can be caused by several factors. The most common reason is
that the storage subsystem has multiple hosts attached to it, and those
hosts are competing for common resources during the tuning process.
Solution: When you are running in a scenario such as this, you can run
Jetstress on a single host with tuning enabled to generate the appropriate
load parameters, and then rerun the test on the other hosts with the
Suppress Tuning option enabled and the tuning parameters entered
manually from the results of the first test.

12.1.4 Unable to mount databases due to invalid mount point configuration

When using mount points and running the Prepare phase of Jetstress, the
operation fails with the error "There is insufficient disk space on volume
<system drive>:\", where <system drive> is the drive letter where you keep your
root mount folder.

Cause: This error means that one or more of the mount points is invalid or
the mount point folder path is not connected to its LUN. Database creation
fails saying that volume C: (or in general, the system volume) does not
have enough space. The issue here is that some of the mount-points
mapped to directories in the system volume are not properly configured
and so Jetstress is looking at the directory (thus checking against the
system drive itself), rather than the actual disk.

Troubleshooting: Execute a DIR command in the mount point root folder.
ALL mount point folder paths are indicated by a <JUNCTION> notation.
Any folder that is listed as a <DIR> is not attached to its mount point and
is likely causing the problem.


Solution: The mount path folder could be listed as <DIR> for a number of
reasons:
1. Verify the LUN is present and in good health.
2. Use the storage system array management software to verify that the
LUN has an assigned logical drive.
3. Using the Disk Management MMC, re-assign the LUN to the correct
mount-point.
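
The DIR check described under Troubleshooting can also be scripted when there are many mount point folders. The sketch below relies on os.path.ismount from the Python standard library, which on Windows reports whether a path is the root of a mounted volume. The function name and the example root folder are hypothetical, and the result should be compared against the DIR output the first time you use it, since behaviour can differ between Python versions.

```python
import os


def unattached_mount_folders(root, ismount=os.path.ismount):
    """Return sub-folders of root that are plain directories, not mount points."""
    suspects = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # A folder that is not the root of a mounted volume corresponds to
        # a <DIR> entry in the DIR listing rather than a <JUNCTION> entry.
        if os.path.isdir(path) and not ismount(path):
            suspects.append(path)
    return suspects


if __name__ == "__main__":
    # Hypothetical root mount folder; replace with your own.
    for folder in unattached_mount_folders(r"C:\ExchangeVolumes"):
        print("Not attached to a LUN:", folder)
```

The ismount parameter exists only so that the check can be exercised without real mount points.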

12.1.5 Jetstress testing failed. Error: System.ApplicationException: Faulty
performance counter paths: \MSExchange Database(*)\*

Jetstress version 658.004 has an incompatibility with ESE version 620 (CU1) and
above if you try to run a test with more than 38 databases configured. If you
experience this issue, either use the RTM version of ESE (516.26) or use a version
of ESE later than 726, which will be released with CU2.
Additionally, a fixed version of Jetstress will be released (726) that will work with
all versions of ESE after 516.26 (Exchange 2013 ESE).
