CHAPTER 9
SIMULATION OUTPUT ANALYSIS
9.1 Introduction
In analyzing the output from a simulation model, there is room for both rough
analysis using judgmental procedures as well as statistically based procedures for
more detailed analysis. The appropriateness of using judgmental procedures or
statistically based procedures to analyze a simulation’s output depends largely on
the nature of the problem, the importance of the decision, and the validity of the
input data. If you are doing a go/no-go type of analysis in which you are trying to
find out whether a system is capable of meeting a minimum performance level,
then a simple judgmental approach may be adequate. Finding out whether a sin-
gle machine or a single service agent is adequate to handle a given workload may
be easily determined by a few runs unless it looks like a close call. Even if it is a
close call, if the decision is not that important (perhaps there is a backup worker
who can easily fill in during peak periods), then more detailed analysis may not be
needed. In cases where the model relies heavily on assumptions, it is of little value
to be extremely precise in estimating model performance. It does little good to get
six decimal places of precision for an output response if the input data warrant
only precision to the nearest tens. Suppose, for example, that the arrival rate of
customers to a bank is roughly estimated to be 30 plus or minus 10 per hour. In
this situation, it is probably meaningless to try to obtain a precise estimate of teller
utilization in the facility. The precision of the input data simply doesn’t warrant
any more than a rough estimate for the output.
These examples of rough estimates are in no way intended to minimize the
importance of conducting statistically responsible experiments, but rather to em-
phasize the fact that the average analyst or manager can gainfully use simulation
Harrell, Ghosh, and Bowden, Simulation Using ProModel, Second Edition, Chapter 9: Simulation Output Analysis. © The McGraw-Hill Companies, 2004.
experiment but does not give us an independent replication. On the other hand, if a
different seed value is appropriately selected to initialize the random number gen-
erator, the simulation will produce different results because it will be driven by a
different segment of numbers from the random number stream. This is how the sim-
ulation experiment is replicated to collect statistically independent observations of
the simulation model’s output response. Recall that the random number generator
in ProModel can produce over 2.1 billion different values before it cycles.
To replicate a simulation experiment, then, the simulation model is initialized
to its starting conditions, all statistical variables for the output measures are reset,
and a new seed value is appropriately selected to start the random number gener-
ator. Each time an appropriate seed value is used to start the random number gen-
erator, the simulation produces a unique output response. Repeating the process
with several appropriate seed values produces a set of statistically independent
observations of a model’s output response. With most simulation software, users
need only specify how many times they wish to replicate the experiment; the soft-
ware handles the details of initializing variables and ensuring that each simulation
run is driven by nonoverlapping segments of the random number cycle.
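The seed-based replication procedure described above can be sketched in ordinary code. The sketch below is illustrative only; simulate_one_run is a hypothetical stand-in for a simulation model, not ProModel's actual mechanism:

```python
import random

def simulate_one_run(seed):
    """Hypothetical stand-in for one run of a simulation model: the seed
    initializes the random number generator, so the run is driven by a
    reproducible stream of random numbers."""
    rng = random.Random(seed)                  # generator seeded for this run
    samples = [rng.expovariate(1 / 10) for _ in range(100)]
    return sum(samples) / len(samples)         # the run's output response

# Replicating the experiment: a different seed per replication yields
# statistically independent observations of the output response.
seeds = [12345, 54321, 98765]
observations = [simulate_one_run(s) for s in seeds]
print(observations)
```

Rerunning with the same seed reproduces the same output exactly, which is why a fresh seed is needed for each replication.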
Point Estimates
A point estimate is a single value estimate of a parameter of interest. Point esti-
mates are calculated for the mean and standard deviation of the population. To
estimate the mean of the population (denoted as µ), we simply calculate the aver-
age of the sample values (denoted as x̄):
x̄ = ( Σᵢ₌₁ⁿ xᵢ ) / n
where n is the sample size (number of observations) and xᵢ is the value of the ith
observation. The sample mean x̄ estimates the population mean µ.
The standard deviation for the population (denoted as σ), which is a measure
of the spread of data values in the population, is similarly estimated by calculat-
ing a standard deviation of the sample of values (denoted as s):
s = √( Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) )
The sample variance s², used to estimate the variance of the population σ², is
obtained by squaring the sample standard deviation.
Let’s suppose, for example, that we are interested in determining the mean or
average number of customers getting a haircut at Buddy’s Style Shop on Saturday
morning. Buddy opens his barbershop at 8:00 A.M. and closes at noon on Saturday. In
order to determine the exact value for the true average number of customers getting
a haircut on Saturday morning (µ), we would have to compute the average based on
the number of haircuts given on all Saturday mornings that Buddy’s Style Shop has
been and will be open (that is, the complete population of observations). Not want-
ing to work that hard, we decide to get an estimate of the true mean µ by spending the
next 12 Saturday mornings watching TV at Buddy’s and recording the number of
customers that get a haircut between 8:00 A.M. and 12:00 noon (Table 9.1).
TABLE 9.1 Number of Haircuts Given on 12 Saturday Mornings

Saturday (i)    Haircuts (xᵢ)
     1               21
     2               16
     3                8
     4               11
     5               17
     6               16
     7                6
     8               14
     9               15
    10               16
    11               14
    12               10

Sample mean x̄ = 13.67
Sample standard deviation s = 4.21
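As a quick check, the point estimates in Table 9.1 can be reproduced with a few lines of Python (statistics.stdev uses the n − 1 divisor, matching the sample standard deviation formula above):

```python
import statistics

# Haircut counts observed on the 12 Saturday mornings (Table 9.1)
haircuts = [21, 16, 8, 11, 17, 16, 6, 14, 15, 16, 14, 10]

x_bar = statistics.mean(haircuts)   # sample mean, estimates mu
s = statistics.stdev(haircuts)      # sample standard deviation, estimates sigma

print(round(x_bar, 2))  # 13.67
print(round(s, 2))      # 4.21
```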
Interval Estimates
A point estimate, by itself, gives little information about how accurately it esti-
mates the true value of the unknown parameter. Interval estimates constructed
using x̄ and s, on the other hand, provide information about how far off the point
estimate x̄ might be from the true mean µ. The method used to determine this is
referred to as confidence interval estimation.
A confidence interval is a range within which we can have a certain level
of confidence that the true mean falls. The interval is symmetric about x̄, and the
distance that each endpoint is from x̄ is called the half-width (hw). A confidence
interval, then, is expressed as the probability P that the unknown true mean µ lies
within the interval x̄ ± hw. The probability P is called the confidence level.
If the sample observations used to compute x̄ and s are independent and nor-
mally distributed, the following equation can be used to calculate the half-width
of a confidence interval for a given level of confidence:
hw = (tn−1,α/2 · s) / √n
where tn−1,α/2 is a factor that can be obtained from the Student’s t table in Appen-
dix B. The values are identified in the Student’s t table according to the value
of α/2 and the degrees of freedom (n − 1). The term α is the complement of P.
That is, α = 1 − P and is referred to as the significance level. The significance
level may be thought of as the “risk level” or probability that µ will fall outside
the confidence interval. Therefore, the probability that µ will fall within the
confidence interval is 1 − α. Thus confidence intervals are often stated as
P(x̄ − hw ≤ µ ≤ x̄ + hw) = 1 − α and are read as: the probability that the true
but unknown mean µ falls within the interval (x̄ − hw) to (x̄ + hw) is equal to
1 − α. The confidence interval is traditionally referred to as a 100(1 − α) percent
confidence interval.
Assuming the data from the barbershop example are independent and nor-
mally distributed (this assumption is discussed in Section 9.3), a 95 percent con-
fidence interval is constructed as follows:
Given: P = confidence level = 0.95
α = significance level = 1 − P = 1 − 0.95 = 0.05
n = sample size = 12
x̄ = 13.67 haircuts
s = 4.21 haircuts
From the Student’s t table in Appendix B, we find tn−1,α/2 = t11,0.025 = 2.201. The
half-width is computed as follows:
hw = (t11,0.025 · s) / √n = (2.201 × 4.21) / √12 = 2.67 haircuts
The lower and upper limits of the 95 percent confidence interval are calculated as
follows:
Lower limit = x̄ − hw = 13.67 − 2.67 = 11.00 haircuts
Upper limit = x̄ + hw = 13.67 + 2.67 = 16.34 haircuts
It can now be asserted with 95 percent confidence that the true but unknown
mean falls between 11.00 haircuts and 16.34 haircuts (11.00 ≤ µhaircuts ≤ 16.34).
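The same arithmetic can be scripted; in this sketch the value t11,0.025 = 2.201 is simply taken from the t table rather than computed:

```python
import math

n = 12          # sample size
x_bar = 13.67   # sample mean (haircuts)
s = 4.21        # sample standard deviation (haircuts)
t = 2.201       # t(n-1, alpha/2) for a 95 percent confidence level

hw = t * s / math.sqrt(n)            # half-width of the confidence interval
lower, upper = x_bar - hw, x_bar + hw
print(round(hw, 2), round(lower, 2), round(upper, 2))  # 2.67 11.0 16.34
```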
A risk level of 0.05 (α = 0.05) means that if we went through this process for
estimating the number of haircuts given on a Saturday morning 100 times and
computed 100 confidence intervals, we would expect 5 of the confidence intervals
(5 percent of 100) to exclude the true but unknown mean number of haircuts given
on a Saturday morning.
The width of the interval indicates the accuracy of the point estimate. It is
desirable to have a small interval with high confidence (usually 90 percent or
greater). The width of the confidence interval is affected by the variability in the
output of the system and the number of observations collected (sample size). It
can be seen from the half-width equation that, for a given confidence level, the
half-width will shrink if (1) the sample size n is increased or (2) the variability in
the output of the system (standard deviation s) is reduced. Given that we have lit-
tle control over the variability of the system, we resign ourselves to increasing the
sample size (run more replications) to increase the accuracy of our estimates.
Actually, when dealing with the output from a simulation model, techniques
can sometimes reduce the variability of the output from the model without chang-
ing the expected value of the output. These are called variance reduction tech-
niques and are covered in Chapter 10.
Saturday (i)    Haircuts (xᵢ)
     1               21
     2               16
     3                8
     4               11
     5               17
     6               16
     7                6
     8               14
     9               15
    10               16
    11               14
    12               10
    13                7
    14                9
    15               18
    16               13
    17               16
    18                8

Sample mean x̄ = 13.06
Sample standard deviation s = 4.28
hw = (t17,0.025 · s) / √n = (2.11 × 4.28) / √18 = 2.13 haircuts
The lower and upper limits for the new 95 percent confidence interval are calcu-
lated as follows:
Lower limit = x̄ − hw = 13.06 − 2.13 = 10.93 haircuts
Upper limit = x̄ + hw = 13.06 + 2.13 = 15.19 haircuts
It can now be asserted with 95 percent confidence that the true but unknown mean
falls between 10.93 haircuts and 15.19 haircuts (10.93 ≤ µhaircuts ≤ 15.19).
Note that with the additional observations, the half-width of the confidence
interval has indeed decreased. However, the half-width is larger than the absolute
error value planned for (e = 2.0). This is just luck, or bad luck, because the new
half-width could just as easily have been smaller. Why? First, 18 replications was
only a rough estimate of the number of observations needed. Second, the number
of haircuts given on a Saturday morning at the barbershop is a random variable.
Each collection of observations of the random variable will likely differ from pre-
vious collections. Therefore, the values computed for x̄, s, and hw will also differ.
This is the nature of statistics and why we deal only in estimates.
We have been expressing our target amount of error e in our point estimate x̄
as an absolute value (hw = e). In the barbershop example, we selected an absolute
value of e = 2.00 haircuts as our target value. However, it is sometimes more con-
venient to work in terms of a relative error (re) value (hw = re|µ|). This allows us
to talk about the percentage error in our point estimate in place of the absolute
error. Percentage error is the relative error multiplied by 100 (that is, 100re per-
cent). To approximate the number of replications needed to obtain a point estimate
x̄ with a certain percentage error, we need only change the denominator of the n
equation used earlier. The relative error version of the equation becomes
n = [ (zα/2 · s) / ( (re/(1 + re)) · x̄ ) ]²
where re denotes the relative error. The re/(1 + re) part of the denominator is an
adjustment needed to realize the desired re value because we use x̄ to estimate µ
(see Chapter 9 of Law and Kelton 2000 for details). The appeal of this approach
is that we can select a desired percentage error without prior knowledge of the
magnitude of the value of µ.
As an example, say that after recording the number of haircuts given at the
barbershop on 12 Saturdays (n = 12 replications of the experiment), we wish to
determine the approximate number of replications needed to estimate the mean
number of haircuts given per day with an error percentage of 17.14 percent and a
confidence level of 95 percent. We apply our equation using the sample mean and
sample standard deviation from Table 9.1.
Given: P = confidence level = 0.95
       α = significance level = 1 − P = 1 − 0.95 = 0.05
       zα/2 = z0.025 = 1.96 from Appendix B
       re = 0.1714
       x̄ = 13.67 haircuts
       s = 4.21 haircuts

n = [ (z0.025 · s) / ( (re/(1 + re)) · x̄ ) ]²
  = [ (1.96 × 4.21) / ( (0.1714/1.1714) × 13.67 ) ]² = 17.02 observations
Thus n ≈ 18 observations. This is the same result computed earlier on page 229
for an absolute error of e = 2.00 haircuts. This occurred because we purposely
selected re = 0.1714 to produce a value of 2.00 for the equation’s denominator in
order to demonstrate the equivalency of the different methods for approximating
the number of replications needed to achieve a desired level of precision in the
point estimate x̄. The SimRunner software uses the relative error methodology to
provide estimates for the number of replications needed to satisfy the specified
level of precision for a given significance level α.
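The replication-count calculation can be verified directly in code (z0.025 = 1.96 is taken from Appendix B):

```python
import math

z = 1.96       # z(alpha/2) for a 95 percent confidence level
s = 4.21       # sample standard deviation from the 12 initial replications
x_bar = 13.67  # sample mean from the 12 initial replications
re = 0.1714    # desired relative error (17.14 percent)

# n = [ z*s / ((re/(1+re)) * x_bar) ]^2
n = (z * s / ((re / (1 + re)) * x_bar)) ** 2
print(round(n, 2))   # 17.02
print(math.ceil(n))  # 18 replications needed
```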
In general, the accuracy of the estimates improves as the number of replica-
tions increases. However, after a point, only modest improvements in accuracy
(reductions in the half-width of the confidence interval) are made through con-
ducting additional replications of the experiment. Therefore, it is sometimes nec-
essary to compromise on a desired level of accuracy because of the time required
to run a large number of replications of the model.
The ProModel simulation software automatically computes confidence inter-
vals. Therefore, there is really no need to estimate the sample size required for a
desired half-width using the method given in this section. Instead, the experiment
could be replicated, say, 10 times, and the half-width of the resulting confidence
interval checked. If the desired half-width is not achieved, additional replications
of the experiment are made until it is. The only real advantage of estimating the
number of replications in advance is that it may save time over the trial-and-error
approach of repeatedly checking the half-width and running additional replica-
tions until the desired confidence interval is achieved.
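The trial-and-error procedure amounts to a simple loop: replicate until the half-width falls below the target. In this sketch, run_replication is a hypothetical stand-in for one simulation replication, and the t value is held fixed for simplicity (strictly, it should be updated as n − 1 grows):

```python
import math
import random
import statistics

def run_replication(seed):
    """Hypothetical stand-in for one independent simulation replication."""
    rng = random.Random(seed)
    return rng.gauss(13.67, 4.21)  # pretend output response

target_hw = 2.0   # desired half-width (absolute error e)
t = 2.201         # t value held fixed for simplicity; it varies with n - 1

obs = [run_replication(seed) for seed in range(10)]  # start with 10 replications
while True:
    hw = t * statistics.stdev(obs) / math.sqrt(len(obs))
    if hw <= target_hw:
        break
    obs.append(run_replication(len(obs)))            # add one more replication

print(len(obs), round(hw, 2))
```

Because √n grows while s settles down, the loop eventually terminates with the desired half-width.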
simulated to estimate the mean time that entities wait in queues in that system.
The experiment is replicated several times to collect n independent sample obser-
vations, as was done in the barbershop example. As an entity exits the system, the
time that the entity spent waiting in queues is recorded. The waiting time is de-
noted by yi j in Table 9.3, where the subscript i denotes the replication from which
the observation came and the subscript j denotes the value of the counter used to
count entities as they exit the system. For example, y32 is the waiting time for the
second entity that was processed through the system in the third replication. These
values are recorded during a particular run (replication) of the model and are
listed under the column labeled “Within Run Observations” in Table 9.3.
Note that the within run observations for a particular replication, say the ith
replication, are not usually independent because of the correlation between
consecutive observations. For example, when the simulation starts and the first
entity begins processing, there is no waiting time in the queue. Obviously, the
more congested the system becomes at various times throughout the simulation,
the longer entities will wait in queues. If the waiting time observed for one entity
is long, it is highly likely that the waiting time for the next entity observed is
going to be long and vice versa. Observations exhibiting this correlation between
consecutive observations are said to be autocorrelated. Furthermore, the within
run observations for a particular replication are often nonstationary in that they
do not follow an identical distribution throughout the simulation run. Therefore,
they cannot be directly used as observations for statistical methods that require in-
dependent and identically distributed observations such as those used in this
chapter.
At this point, it seems that we are a long way from getting a usable set of
observations. However, let’s focus our attention on the last column in Table 9.3,
labeled “Average,” which contains the values x₁ through xₙ. The value xᵢ denotes the average
waiting time of the entities processed during the ith simulation run (replication)
and is computed as follows:
xᵢ = ( Σⱼ₌₁ᵐ yᵢⱼ ) / m
where m is the number of entities processed through the system and yi j is the time
that the jth entity processed through the system waited in queues during the ith
replication. Although not as formally stated, you used this equation in the last row
of the ATM spreadsheet simulation of Chapter 3 to compute the average waiting
time of the m = 25 customers (Table 3.2). An xi value for a particular replication
represents only one possible value for the mean time an entity waits in queues,
and it would be risky to make a statement about the true waiting time from a
single observation. However, Table 9.3 contains an xi value for each of the n
independent replications of the simulation. These xi values are statistically
independent if different seed values are used to initialize the random number gen-
erator for each replication (as discussed in Section 9.2.1). The xi values are often
identically distributed as well. Therefore, the sample of xi values can be used for
statistical methods requiring independent and identically distributed observations.
Thus we can use the xi values to estimate the true but unknown average time
FIGURE 9.2
Replication technique used on ATM simulation of Lab Chapter 3.
-------------------------------------------------------------------
General Report
Output from C:\Bowden\2nd Edition\ATM System Ch9.MOD [ATM System]
Date: Nov/26/2002 Time: 04:24:27 PM
-------------------------------------------------------------------
Scenario : Normal Run
Replication : All
Period : Final Report (1000 hr to 1500 hr Elapsed: 500 hr)
Warmup Time : 1000
Simulation Time : 1500 hr
-------------------------------------------------------------------
LOCATIONS
Average
Location Scheduled Total Minutes Average Maximum Current
Name Hours Capacity Entries Per Entry Contents Contents Contents % Util
--------- ----- -------- ------- --------- -------- -------- -------- ------
ATM Queue 500 999999 9903 9.457 3.122 26 18 0.0 (Rep 1)
ATM Queue 500 999999 9866 8.912 2.930 24 0 0.0 (Rep 2)
ATM Queue 500 999999 9977 11.195 3.723 39 4 0.0 (Rep 3)
ATM Queue 500 999999 10006 8.697 2.900 26 3 0.0 (Rep 4)
ATM Queue 500 999999 10187 10.841 3.681 32 0 0.0 (Rep 5)
ATM Queue 500 999999 9987.8 9.820 3.271 29.4 5 0.0 (Average)
ATM Queue 0 0 124.654 1.134 0.402 6.148 7.483 0.0 (Std. Dev.)
ATM Queue 500 999999 9833.05 8.412 2.772 21.767 -4.290 0.0 (95% C.I. Low)
ATM Queue 500 999999 10142.6 11.229 3.771 37.032 14.290 0.0 (95% C.I. High)
runs two shifts with an hour break during each shift in which everything momen-
tarily stops. Break and third-shift times are excluded from the model because work
always continues exactly as it left off before the break or end of shift. The length
of the simulation is determined by how long it takes to get a representative steady-
state reading of the model behavior.
Nonterminating simulations can, and often do, change operating characteris-
tics after a period of time, but usually only after enough time has elapsed to es-
tablish a steady-state condition. Take, for example, a production system that runs
10,000 units per week for 5 weeks and then increases to 15,000 units per week for
the next 10 weeks. The system would have two different steady-state periods. Oil
and gas refineries and distribution centers are additional examples of nontermi-
nating systems.
Contrary to what one might think, a steady-state condition is not one in which
the observations are all the same, or even one for which the variation in obser-
vations is any less than during a transient condition. It means only that all obser-
vations throughout the steady-state period will have approximately the same
distribution. Once in a steady state, if the operating rules change or the rate at
which entities arrive changes, the system reverts again to a transient state until the
system has had time to start reflecting the long-term behavior of the new operat-
ing circumstances. Nonterminating systems begin with a warm-up (or transient)
state and gradually move to a steady state. Once the initial transient phase has
diminished to the point where the impact of the initial condition on the system’s
response is negligible, we consider it to have reached steady state.
the system, the utilization of resources) over the period simulated, as discussed
in Section 9.3.
The answer to the question of how many replications are necessary is usually
based on the analyst’s desired half-width of a confidence interval. As a general
guideline, begin by making 10 independent replications of the simulation and add
more replications until the desired confidence interval half-width is reached. For
the barbershop example, 18 independent replications of the simulation were re-
quired to achieve the desired confidence interval for the expected number of hair-
cuts given on Saturday mornings.
FIGURE 9.3
Behavior of model's output response as it reaches steady state.
[Plot of the averaged output response measure (ȳ) against simulation time: a transient state precedes the point where the warm-up ends, after which the model is in steady state.]
from the beginning of the run and use the remaining observations to estimate the
true mean response of the model.
While several methods have been developed for estimating the warm-up
time, the easiest and most straightforward approach is to run a preliminary simu-
lation of the system, preferably with several (5 to 10) replications, average the
output values at each time step across replications, and observe at what time the
system reaches statistical stability. This usually occurs when the averaged output
response begins to flatten out or a repeating pattern emerges. Plotting each data
point (averaged output response) and connecting them with a line usually helps to
identify when the averaged output response begins to flatten out or begins a re-
peating pattern. Sometimes, however, the variation of the output response is so
large that it makes the plot erratic, and it becomes difficult to visually identify the
end of the warm-up period. Such a case is illustrated in Figure 9.4. The raw data
plot in Figure 9.4 was produced by recording a model’s output response for a
queue’s average contents during 50-hour time periods (time slices) and averaging
the output values from each time period across five replications. In this case, the
model was initialized with several entities in the queue. Therefore, we need to
eliminate this apparent upward bias before recording observations of the queue’s
average contents. Table 9.4 shows this model’s output response for the 20 time
periods (50 hours each) for each of the five replications. The raw data plot in Fig-
ure 9.4 was constructed using the 20 values under the ȳi column in Table 9.4.
When the model’s output response is erratic, as in Figure 9.4, it is useful
to “smooth” it with a moving average. A moving average is constructed by
calculating the arithmetic average of the w most recent data points (averaged
output responses) in the data set. You have to select the value of w, which is called
the moving-average window. As you increase the value of w, you increase the
“smoothness” of the moving average plot. An indicator for the end of the warm-up
FIGURE 9.4
SimRunner uses the Welch moving-average method to help identify the end of the warm-up period
that occurs around the third or fourth period (150 to 200 hours) for this model.
time is when the moving average plot appears to flatten out. Thus the routine is to
begin with a small value of w and increase it until the resulting moving average plot
begins to flatten out.
The moving-average plot in Figure 9.4 with a window of six (w = 6) helps
to identify the end of the warm-up period—when the moving average plot appears
to flatten out at around the third or fourth period (150 to 200 hours). Therefore,
we ignore the observations up to the 200th hour and record only those after the
200th hour when we run the simulation. The moving-average plot in Figure 9.4
was constructed using the 14 values under the yi (6) column in Table 9.4 that were
computed using
ȳᵢ(w) = [ Σ ȳᵢ₊ₛ for s = −w, ..., w ] / (2w + 1)             if i = w + 1, ..., m − w
ȳᵢ(w) = [ Σ ȳᵢ₊ₛ for s = −(i − 1), ..., i − 1 ] / (2i − 1)   if i = 1, ..., w
where m denotes the total number of periods and w denotes the window of the
moving average (m = 20 and w = 6 in this example).
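The two-part definition translates directly into code. The implementation below is a generic sketch; because the text quotes only the first five averaged responses from Table 9.4, it is tested with a smaller window (w = 2), which reproduces the text's first three smoothed values:

```python
def welch_moving_average(y, w):
    """Welch moving average of averaged output responses y (1-indexed in
    the text, 0-indexed here) with window w; defined for i = 1, ..., m - w."""
    m = len(y)
    out = []
    for i in range(1, m - w + 1):      # i = 1, ..., m - w
        if i <= w:
            window = y[0:2 * i - 1]    # growing window: 2i - 1 points
        else:
            window = y[i - 1 - w:i + w]  # full window: 2w + 1 points centered on i
        out.append(sum(window) / len(window))
    return out

# First five averaged responses quoted from Table 9.4, smoothed with w = 2:
y = [5.65, 3.08, 3.12, 3.75, 3.05]
print([round(v, 2) for v in welch_moving_average(y, 2)])  # [5.65, 3.95, 3.73]
```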
TABLE 9.4 Welch Moving Average Based on Five Replications and 20 Periods
[Columns: period (i); period time; the observed mean queue contents for each of the five replications; their total, Totalᵢ; the average contents per period, ȳᵢ = Totalᵢ/5; and the Welch moving average, ȳᵢ(6). The first moving-average values are computed as follows.]

ȳ₁(6) = ȳ₁/1 = 5.65/1 = 5.65
ȳ₂(6) = (ȳ₁ + ȳ₂ + ȳ₃)/3 = (5.65 + 3.08 + 3.12)/3 = 3.95
ȳ₃(6) = (ȳ₁ + ȳ₂ + ȳ₃ + ȳ₄ + ȳ₅)/5 = (5.65 + 3.08 + 3.12 + 3.75 + 3.05)/5 = 3.73
·
·
·
ȳ₆(6) = (ȳ₁ + ȳ₂ + ȳ₃ + ȳ₄ + ȳ₅ + ȳ₆ + ȳ₇ + ȳ₈ + ȳ₉ + ȳ₁₀ + ȳ₁₁)/11 = 3.82
Notice the pattern that as i increases, we average more data points together,
with the ith data point appearing in the middle of the sum in the numerator (an
equal number of data points are on each side of the center data point). This con-
tinues until we reach the (w + 1)th moving average, when we switch to the top
part of the ȳi (w) equation. For our example, the switch occurs at the 7th moving
average because w = 6. From this point forward, we average the 2w + 1 closest
data points. For our example, we average the 2(6) + 1 = 13 closest data points
(the ith data point plus the w = 6 closest data points above it and the w = 6
closest data points below it in Table 9.4), as follows:
ȳ₇(6) = (ȳ₁ + ȳ₂ + ȳ₃ + ȳ₄ + ȳ₅ + ȳ₆ + ȳ₇ + ȳ₈ + ȳ₉ + ȳ₁₀ + ȳ₁₁ + ȳ₁₂ + ȳ₁₃)/13 = 3.86
ȳ₈(6) = (ȳ₂ + ȳ₃ + ȳ₄ + ȳ₅ + ȳ₆ + ȳ₇ + ȳ₈ + ȳ₉ + ȳ₁₀ + ȳ₁₁ + ȳ₁₂ + ȳ₁₃ + ȳ₁₄)/13 = 3.83
ȳ₉(6) = (ȳ₃ + ȳ₄ + ȳ₅ + ȳ₆ + ȳ₇ + ȳ₈ + ȳ₉ + ȳ₁₀ + ȳ₁₁ + ȳ₁₂ + ȳ₁₃ + ȳ₁₄ + ȳ₁₅)/13 = 3.84
·
·
·
ȳ₁₄(6) = (ȳ₈ + ȳ₉ + ȳ₁₀ + ȳ₁₁ + ȳ₁₂ + ȳ₁₃ + ȳ₁₄ + ȳ₁₅ + ȳ₁₆ + ȳ₁₇ + ȳ₁₈ + ȳ₁₉ + ȳ₂₀)/13 = 3.96
Eventually we run out of data and have to stop. The stopping point occurs when
i = m − w. In our case with m = 20 periods and w = 6, we stopped when i =
20 − 6 = 14.
The development of this graphical method for estimating the end of the
warm-up time is attributed to Welch (Law and Kelton 2000). This method is
sometimes referred to as the Welch moving-average method and is implemented
in SimRunner. Note that when applying the method, the length of each replication
should be relatively long and the replications should allow even rarely
occurring events, such as infrequent downtimes, to occur many times. Law and
Kelton (2000) recommend that w not exceed about m/4. To
determine a satisfactory warm-up time using Welch’s method, one or more key
output response variables, such as the average number of entities in a queue or the
average utilization of a resource, should be monitored for successive periods.
Once these variables begin to exhibit steady state, a good practice to follow would
be to extend the warm-up period by 20 to 30 percent. This approach is simple,
conservative, and usually satisfactory. The danger is in underestimating the
warm-up period, not overestimating it.
should be based on time. If the output measure is based on observations (like the
waiting time of entities in a queue), then the batch interval is typically based on the
number of observations.
Table 9.5 details how a single, long simulation run (one replication), like the
one shown in Figure 9.5, is partitioned into batch intervals for the purpose of ob-
taining (approximately) independent and identically distributed observations of a
simulation model’s output response. Note the similarity between Table 9.3 of
Section 9.3 and Table 9.5. As in Table 9.3, the observations represent the time an
entity waited in queues during the simulation. The waiting time for an entity is
denoted by yi j , where the subscript i denotes the interval of time (batch interval)
from which the observation came and the subscript j denotes the value of the
counter used to count entities as they exit the system during a particular batch in-
terval. For example, y23 is the waiting time for the third entity that was processed
through the system during the second batch interval of time.
Because these observations are all from a single run (replication), they are
not statistically independent. For example, the waiting time of the mth entity exit-
ing the system during the first batch interval, denoted y1m , is correlated with the
waiting time of the first entity to exit the system during the second batch interval,
denoted y21 . This is because if the waiting time observed for one entity is long, it
is likely that the waiting time for the next entity observed is going to be long, and
vice versa. Therefore, adjacent observations in Table 9.5 will usually be auto-
correlated. However, the value of y14 can be somewhat uncorrelated with the
value of y24 if they are spaced far enough apart such that the conditions that re-
sulted in the waiting time value of y14 occurred so long ago that they have little or
no influence on the waiting time value of y24 . Therefore, most of the observations
within the interior of one batch interval can become relatively uncorrelated with
most of the observations in the interior of other batch intervals, provided they are
spaced sufficiently far apart. Thus the goal is to extend the batch interval length
until there is very little correlation (you cannot totally eliminate it) between the
observations appearing in different batch intervals. When this occurs, it is reason-
able to assume that observations within one batch interval are independent of the
observations within other batch intervals.
With the independence assumption in hand, the observations within a batch
interval can be treated in the same manner as we treated observations within a
replication in Table 9.3. Therefore, the values in the Average column in Table 9.5,
which represent the average amount of time the entities processed during the
ith batch interval waited in queues, are computed as follows:
\[
x_i = \frac{\sum_{j=1}^{m} y_{ij}}{m}
\]
where m is the number of entities processed through the system during the batch in-
terval of time and yi j is the time that the jth entity processed through the system
waited in queues during the ith batch interval. The xi values in the Average column
are approximately independent and identically distributed and can be used to com-
pute a sample mean for estimating the true average time an entity waits in queues:
\[
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
\]
The sample standard deviation is also computed as before:
\[
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}
\]
And, if we assume that the observations (x1 through xn) are normally distributed (or
at least approximately normally distributed), a confidence interval is computed by
\[
\bar{x} \pm \frac{(t_{n-1,\alpha/2})\, s}{\sqrt{n}}
\]
where tn−1,α/2 is a value from the Student’s t table in Appendix B for an α level of
significance.
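Putting the three formulas together, a minimal Python sketch of this computation looks as follows. The function name and the batch means are hypothetical, and the t critical value would normally be read from the Student's t table in Appendix B:

```python
import math

def batch_means_ci(x, t_crit):
    """Point estimate and confidence interval computed from a list of
    approximately independent batch means x (the x_i values).
    t_crit is t_{n-1, alpha/2}, read from a Student's t table."""
    n = len(x)
    xbar = sum(x) / n                                           # sample mean
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # sample std dev
    hw = t_crit * s / math.sqrt(n)                              # half-width
    return xbar, s, xbar - hw, xbar + hw

# Five hypothetical batch means; t_{4, 0.025} = 2.776 for a 95% interval
xbar, s, lo, hi = batch_means_ci([10.0, 12.0, 11.0, 13.0, 9.0], t_crit=2.776)
```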
ProModel automatically computes point estimates and confidence intervals
using the above method. Figure 9.6 is a ProModel output report from a single,
FIGURE 9.6
Batch means technique used on ATM simulation in Lab Chapter 3. Warm-up period is from 0 to 1,000 hours. Each
batch interval is 500 hours in length.
-----------------------------------------------------------------
General Report
Output from C:\Bowden\2nd Edition\ATM System Ch9.MOD [ATM System]
Date: Nov/26/2002 Time: 04:44:24 PM
-----------------------------------------------------------------
Scenario : Normal Run
Replication : 1 of 1
Period : All
Warmup Time : 1000 hr
Simulation Time : 3500 hr
-----------------------------------------------------------------
LOCATIONS
                                           Average
Location   Scheduled            Total      Minutes    Average   Maximum   Current
Name       Hours      Capacity  Entries    Per Entry  Contents  Contents  Contents  % Util
---------  ---------  --------  ---------  ---------  --------  --------  --------  ------
ATM Queue  500        999999       9903       9.457     3.122        26        18      0.0  (Batch 1)
ATM Queue  500        999999      10065       9.789     3.284        24         0      0.0  (Batch 2)
ATM Queue  500        999999       9815       8.630     2.823        23        16      0.0  (Batch 3)
ATM Queue  500        999999       9868       8.894     2.925        24         1      0.0  (Batch 4)
ATM Queue  500        999999      10090      12.615     4.242        34         0      0.0  (Batch 5)
ATM Queue  500        999999     9948.2       9.877     3.279      26.2         7      0.0  (Average)
ATM Queue  0          0          122.441      1.596     0.567     4.494     9.165      0.0  (Std. Dev.)
ATM Queue  500        999999     9796.19      7.895     2.575    20.620    -4.378      0.0  (95% C.I. Low)
ATM Queue  500        999999     10100.2     11.860     3.983    31.779    18.378      0.0  (95% C.I. High)
long run of the ATM simulation in Lab Chapter 3 with the output divided into five
batch intervals of 500 hours each after a warm-up time of 1,000 hours. The col-
umn under the Locations section of the output report labeled “Average Minutes
Per Entry” displays the average amount of time that customer entities waited in
the queue during each of the five batch intervals. These values correspond to the
xi values of Table 9.5. Note that the 95 percent confidence interval automatically
computed by ProModel in Figure 9.6 is comparable, though not identical, to the
95 percent confidence interval in Figure 9.2 for the same output statistic.
Establishing the batch interval length such that the observations x1 , x2 , . . . , xn
of Table 9.5 (assembled after the simulation reaches steady state) are approxi-
mately independent is difficult and time-consuming. There is no foolproof method
for doing this. However, if you can generate a large number of observations, say
n ≥ 100, you can gain some insight about the independence of the observations
by estimating the autocorrelation between adjacent observations (lag-1 autocorre-
lation). See Chapter 6 for an introduction to tests for independence and autocor-
relation plots. The observations are treated as being independent if their lag-1
autocorrelation is zero. Unfortunately, current methods available for estimating the
lag-1 autocorrelation are not very accurate. Thus we may be persuaded that our ob-
servations are almost independent if the estimated lag-1 autocorrelation computed
from a large number of observations falls between −0.20 and +0.20. The word
almost is emphasized because there is really no such thing as almost independent
(the observations are independent or they are not). What we are indicating here is
that we are leaning toward calling the observations independent when the estimate
of the lag-1 autocorrelation is between −0.20 and +0.20. Recall that autocorrela-
tion values fall between ±1. Before we elaborate on this idea for determining if an
acceptable batch interval length has been used to derive the observations, let’s talk
about positive autocorrelation versus negative autocorrelation.
Positive autocorrelation is a bigger enemy to us than negative autocorrelation
because our sample standard deviation s will be biased low if the observations
from which it is computed are positively correlated. This would result in a falsely
narrow confidence interval (smaller half-width), leading us to believe that we have
a better estimate of the mean µ than we actually have. Negatively correlated ob-
servations have the opposite effect, producing a falsely wide confidence interval
(larger half-width), leading us to believe that we have a worse estimate of the mean
µ than we actually have. A negative autocorrelation may lead us to waste time col-
lecting additional observations to get a more precise (narrow) confidence interval
but will not result in a hasty decision. Therefore, an emphasis is placed on guard-
ing against positive autocorrelation. We present the following procedure adapted
from Banks et al. (2001) for estimating an acceptable batch interval length.
For the observations x1 , x2 , . . . , xn derived from n batches of data assembled
after the simulation has reached steady state as outlined in Table 9.5, compute
an estimate of the lag-1 autocorrelation ρ̂1 using
\[
\hat{\rho}_1 = \frac{\sum_{i=1}^{n-1} (x_i - \bar{x})(x_{i+1} - \bar{x})}{s^2 (n-1)}
\]
where s 2 is the sample variance (sample standard deviation squared) and x̄ is the
sample mean of the n observations. Recall the recommendation that n should be
at least 100 observations. If the estimated lag-1 autocorrelation is not between
−0.20 and +0.20, then extend the original batch interval length by 50 to 100 per-
cent, rerun the simulation to collect a new set of n observations, and check the
lag-1 autocorrelation between the new x1 to xn observations. This would be re-
peated until the estimated lag-1 autocorrelation falls between −0.20 and +0.20
(or you give up trying to get within the range). Note that falling below −0.20 is
less worrisome than exceeding +0.20.
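The estimator itself is straightforward to compute. The following minimal Python sketch (hypothetical function name, and toy data far below the recommended n ≥ 100) implements the equation above:

```python
def lag1_autocorrelation(x):
    """Estimate the lag-1 autocorrelation rho_1 of the batch means x,
    using the estimator adapted from Banks et al. (2001)."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)   # sample variance
    num = sum((x[i] - xbar) * (x[i + 1] - xbar) for i in range(n - 1))
    return num / (s2 * (n - 1))

# A steadily increasing series is positively correlated ...
r_pos = lag1_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0])    # 0.4
# ... while an alternating series is negatively correlated
r_neg = lag1_autocorrelation([1.0, -1.0, 1.0, -1.0, 1.0])  # -0.8
```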
Achieving −0.20 ≤ ρ̂1 ≤ 0.20: Upon achieving an estimated lag-1
autocorrelation within the desired range, “rebatch” the data by combining
adjacent batch intervals of data into a larger batch. This produces a smaller
number of observations that are based on a larger batch interval length. The
lag-1 autocorrelation for this new set of observations will likely be closer to
zero than the lag-1 autocorrelation of the previous set of observations
because the new set of observations is based on a larger batch interval
length. Note that at this point, you are not rerunning the simulation with the
new, longer batch interval but are “rebatching” the output data from the last
run. For example, if a batch interval length of 125 hours produced 100
observations (x1 to x100 ) with an estimated lag-1 autocorrelation ρ̂1 = 0.15,
you might combine four contiguous batches of data into one batch with a
length of 500 hours (125 hours × 4 = 500 hours). “Rebatching” the data
with the 500-hour batch interval length would leave you 25 observations (x1
to x25 ) from which to construct the confidence interval. We recommend that
you “rebatch” the data into 10 to 30 batch intervals.
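The rebatching step itself is simple. A minimal Python sketch (hypothetical function name) that combines every k adjacent batch means into one larger batch mean; a plain average is appropriate here because all batch intervals have the same length (and, we assume, roughly equal observation counts):

```python
def rebatch(x, k):
    """Combine each group of k adjacent batch means into one larger
    batch mean. len(x) must be a multiple of k; since all batch
    intervals are the same length, a simple average is used."""
    if len(x) % k != 0:
        raise ValueError("number of batches must be a multiple of k")
    return [sum(x[i:i + k]) / k for i in range(0, len(x), k)]

# For example, 100 batches of 125 hours rebatched with k = 4 would yield
# 25 batches of 500 hours. With toy data:
x = rebatch([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], k=4)  # [2.5, 6.5]
```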
Not Achieving −0.20 ≤ ρ̂1 ≤ 0.20: If obtaining an estimated lag-1
autocorrelation within the desired range becomes impossible because you
cannot continue extending the length of the simulation run, then rebatch the
data from your last run into no more than about 10 batch intervals and
construct the confidence interval. Interpret the confidence interval with the
apprehension that the observations may be significantly correlated.
Lab Chapter 9 provides an opportunity for you to gain experience applying
these criteria to a ProModel simulation experiment. However, remember that
there is no universally accepted method for determining an appropriate batch in-
terval length. See Banks et al. (2001) for additional details and variations on this
approach, such as a concluding hypothesis test for independence on the final set
of observations. The danger is in setting the batch interval length too short, not too
long. This is the reason for extending the length of the batch interval in the last
step. A starting point for setting the initial batch interval length from which to
begin the process of evaluating the lag-1 autocorrelation estimates is provided in
Section 9.6.3.
In summary, the statistical methods in this chapter are applied to the data
compiled from batch intervals in the same manner they were applied to the data
compiled from replications. And, as before, the accuracy of point and interval es-
timates generally improves as the sample size (number of batch intervals) in-
creases. In this case we increase the number of batch intervals by extending the
length of the simulation run. As a general guideline, the simulation should be run
long enough to create around 10 batch intervals and possibly more, depending on
the desired confidence interval half-width. If a trade-off must be made between
the number of batch intervals and the batch interval length, then err on the side of
increasing the batch interval length. It is better to have a few independent obser-
vations than to have several autocorrelated observations when using the statistical
methods presented in this text.
9.7 Summary
Simulation experiments can range from a few replications (runs) to a large num-
ber of replications. While “ballpark” decisions require little analysis, more precise
decision making requires more careful analysis and extensive experimentation.
15. Given the five batch means for the Average Minutes Per Entry of
customer entities at the ATM queue location in Figure 9.6, estimate the
lag-1 autocorrelation ρ̂1 . Note that five observations are woefully
inadequate for obtaining an accurate estimate of the lag-1
autocorrelation. Normally you will want to base the estimate on at least
100 observations. The question is designed to give you some experience
using the ρ̂1 equation so that you will understand what the Stat::Fit
software is doing when you use it in Lab Chapter 9 to crunch through
an example with 100 observations.
16. Construct a Welch moving average with a window of 2 (w = 2) using
the data in Table 9.4 and compare it to the Welch moving average with
a window of 6 (w = 6) presented in Table 9.4.
References
Banks, Jerry; John S. Carson, II; Barry L. Nelson; and David M. Nicol. Discrete-Event
System Simulation. New Jersey: Prentice-Hall, 2001.
Bateman, Robert E.; Royce O. Bowden; Thomas J. Gogg; Charles R. Harrell; and Jack
R. A. Mott. System Improvement Using Simulation. Orem, UT: PROMODEL Corp.,
1997.
Hines, William W., and Douglas C. Montgomery. Probability and Statistics in Engineering
and Management Science. New York: John Wiley and Sons, 1990.
Law, Averill M., and David W. Kelton. Simulation Modeling and Analysis. New York:
McGraw-Hill, 2000.
Montgomery, Douglas C. Design and Analysis of Experiments. New York: John Wiley &
Sons, 1991.
Petersen, Roger G. Design and Analysis of Experiments. New York: Marcel Dekker, 1985.