CBA Lecture 6

Lecture 6 - Input Data Analysis
Levan Pavlenishvili
ISET
2021
Levan Pavlenishvili (ISET) Lecture 6 - Input Data Analysis 2021 1 / 24

Table of Contents
Lecture prepared with notes from Professor Burak Buke (University of
Edinburgh) and the textbook Simulation Modeling and Analysis (Averill M.
Law). Reading: lecture notes.
1 Different approaches to analyze input data for CBA models
2 Estimating the parameters for a distribution
3 χ² Goodness-of-Fit Test
4 Possible distribution families and their characteristics

Different approaches to analyze input data for CBA models
When conducting CBA we may have very different amounts of data: sometimes
only a handful of observations, sometimes enough to support statistical analysis.
In many situations, when building CBA models you will need to generate
random variables. There are several approaches.

Use the data directly, i.e. draw a data point from the data set each time you
need to generate a random variable.
Advantage: We know that these data points are actually observable in reality
Disadvantage:
Generated values never go beyond the observed values
The data we have may not be enough for long runs, which results in
repetitions
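
Drawing directly from the data amounts to sampling the observations with replacement. The sketch below uses only Python's standard library; the `observed` values and the function name `draw_from_data` are hypothetical and chosen purely for illustration.

```python
import random

def draw_from_data(observations, size, seed=None):
    """Draw `size` values by sampling the observed data with replacement."""
    rng = random.Random(seed)
    return [rng.choice(observations) for _ in range(size)]

# Hypothetical observed values, for illustration only
observed = [12.1, 9.8, 15.3, 11.0, 10.4]
draws = draw_from_data(observed, size=1000, seed=42)

# Every draw is one of the five observed values, so a long run repeats them
distinct_values = set(draws)
```

With 1000 draws from five observations, `distinct_values` can contain at most five numbers, which shows both disadvantages listed above.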

Use the histogram of the data to interpolate an empirical distribution
Advantage: Generates values between the observed data points, not only the observed values themselves
Disadvantage:
We only generate values within the observed interval, i.e. we cannot
generate values in the tails of the distribution
The empirical distribution has many non-smooth points, so it may
be hard to interpolate
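
One possible way to build such an empirical distribution, sketched here as an assumption rather than as the method used in the lecture, is to interpolate linearly between the sorted observations and sample by inverse transform. The function name `sample_empirical` and the data are hypothetical.

```python
import random

def sample_empirical(observations, size, seed=None):
    """Sample from a piecewise-linear empirical CDF through the sorted data.

    Requires at least two observations.
    """
    xs = sorted(observations)
    n = len(xs)
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        p = (n - 1) * rng.random()      # position along the sorted sample
        i = min(int(p), n - 2)          # index of the left neighbour
        frac = p - i                    # how far to move towards the right neighbour
        draws.append(xs[i] + frac * (xs[i + 1] - xs[i]))
    return draws

# Hypothetical observed values, for illustration only
observed = [12.1, 9.8, 15.3, 11.0, 10.4]
draws = sample_empirical(observed, size=1000, seed=42)
```

The draws fill the gaps between observations but never fall below the smallest or above the largest observed value.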

Fit a probability distribution to the data
Advantage:
Enables us to generate values beyond the observed values and interval
Easy to store, as we only have to store the parameters of the distribution
Easy to generate from
Some distributions describe the underlying process well
Disadvantage:
If the distribution is not a good fit, the resulting estimates can be far
from the real values
Sometimes no distribution fits our data directly

To estimate the distribution of a random variable we need to perform the
following steps:
1 Hypothesize a distributional family (Normal, Poisson, Exponential)
2 Estimate the parameters of the distribution (mean and variance for
the normal, rate for the exponential)
3 Determine how well the distribution represents your data

Hypothesizing Distributional Family
There are several ways to hypothesize a distributional family:
For some distributions there might be a theoretical justification for
using them
For some processes we might assume a specific distribution (more
details to follow)
Many distributions have an intuitive interpretation that
can be used to describe different processes
You can draw a histogram of the data and examine its shape to
hypothesize a distribution
Summary measures are always useful: mean, mode, variance, skewness
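
A minimal sketch computing the mean, the sample variance and a skewness coefficient, a measure of asymmetry that helps separate symmetric families such as the normal from right-skewed ones such as the exponential. The function name `summary_measures` and both data sets are hypothetical.

```python
import statistics

def summary_measures(data):
    """Return mean, sample variance and skewness of a list of numbers."""
    n = len(data)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)             # divides by n - 1
    m2 = sum((x - mean) ** 2 for x in data) / n      # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n      # third central moment
    skewness = m3 / m2 ** 1.5                        # 0 for symmetric data
    return {"mean": mean, "variance": variance, "skewness": skewness}

symmetric = summary_measures([1, 2, 3, 4, 5])
right_skewed = summary_measures([1, 1, 1, 2, 10])
```

A skewness near zero is compatible with a symmetric family, while a clearly positive value points towards a distribution with a long right tail.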

Estimating the Parameters for a Distribution
To estimate the parameters of a given distribution, the maximum likelihood
estimation (MLE) procedure can be used.
For this procedure we have to define the likelihood function L(θ, x), where
θ is the parameter and x is our data.
Intuitively, the likelihood function tells us how likely it is to obtain the given
data if the parameter is θ.

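The general likelihood function has the following form:
Continuous: with $f_\theta(x_i)$ a continuous density function
$$L(\theta, x) = f_\theta(x_1)\, f_\theta(x_2)\, f_\theta(x_3) \cdots f_\theta(x_n) \tag{1}$$

Discrete: with $p_\theta(x_i)$ a discrete probability mass function
$$L(\theta, x) = p_\theta(x_1)\, p_\theta(x_2)\, p_\theta(x_3) \cdots p_\theta(x_n) \tag{2}$$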

To find the maximum likelihood estimate, maximize L(θ, x) with respect to θ:
$$\frac{\partial L(\theta, x)}{\partial \theta} = 0 \;\Rightarrow\; \theta = \,?? \tag{3}$$
χ² Goodness-of-Fit Test
Suppose we hypothesize a distribution with density function f(x) for
sample values in (a, b). Our goal is to measure how close the
hypothesized distribution is to our data,
or, putting it differently, how close the hypothesized distribution is to our
histogram.
We can measure this with the χ² Goodness-of-Fit Test.

We can perform the χ² Goodness-of-Fit Test as follows:
1 Set the confidence level to (1 − α) × 100%
2 Select K intervals $(x_0, x_1], (x_1, x_2], \ldots, (x_{K-1}, x_K)$, where $x_0 = a$ and $x_K = b$
3 Define $N_j$ to be the number of observations in $(x_{j-1}, x_j]$
4 Let $p_j = \int_{x_{j-1}}^{x_j} f(x)\,dx$
5 Find the test statistic $\chi^2 = \sum_{j=1}^{K} \frac{(N_j - n p_j)^2}{n p_j}$, where n is the total number of observations
6 Reject the hypothesized distribution if $\chi^2 > \chi^2_{K-1,\,1-\alpha}$, i.e. at the (1 − α) × 100% confidence level
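
The steps above can be sketched in a few lines of Python. This is a sketch under stated assumptions: the hypothesized distribution is passed in through its cumulative distribution function, so that $p_j$ is computed as $F(x_j) - F(x_{j-1})$; the critical value is taken from a χ² table rather than computed; and the data, the interval edges and the name `chi_square_statistic` are hypothetical.

```python
def chi_square_statistic(data, edges, cdf):
    """Chi-square statistic over the intervals (edges[j-1], edges[j]]."""
    n = len(data)
    stat = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(1 for x in data if lo < x <= hi)   # N_j
        p_j = cdf(hi) - cdf(lo)                           # integral of f over the interval
        expected = n * p_j                                # n * p_j
        stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical example: test 20 values against a Uniform(0, 1) hypothesis
edges = [0.0, 0.25, 0.5, 0.75, 1.0]                      # K = 4 intervals
data = [0.1] * 8 + [0.3] * 4 + [0.6] * 4 + [0.9] * 4
stat = chi_square_statistic(data, edges, cdf=lambda x: x)

# Tabulated critical value for K - 1 = 3 degrees of freedom and alpha = 0.05
critical = 7.815
reject = stat > critical
```

Here the counts are 8, 4, 4, 4 against an expected 5 per interval, so the statistic is about 2.4 and the uniform hypothesis is not rejected at the 95% level.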

Uniform Distribution
Used as the simplest model for a quantity that is felt to be randomly
distributed between two numbers a and b, but about which little else is
known.
Density:
$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
Range: [a, b]
Mean: $\frac{a+b}{2}$
Variance: $\frac{(b-a)^2}{12}$
MLE: $\hat{a} = \min_{1 \le i \le n} X_i$, $\hat{b} = \max_{1 \le i \le n} X_i$
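
A minimal sketch of fitting and sampling a uniform distribution: the estimates are the smallest and largest observations. The data and the name `fit_uniform` are hypothetical.

```python
import random

def fit_uniform(data):
    """Maximum likelihood estimates (a, b): sample minimum and maximum."""
    return min(data), max(data)

# Hypothetical observations, for illustration only
observed = [4.2, 7.9, 5.5, 2.3, 6.1]
a_hat, b_hat = fit_uniform(observed)

rng = random.Random(0)
draws = [rng.uniform(a_hat, b_hat) for _ in range(1000)]
```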

Exponential Distribution
Inter-arrival times of "customers" to a system that occur at a constant
rate; time to failure of a piece of equipment.
Density:
$$f(x) = \begin{cases} \dfrac{1}{\lambda}\, e^{-x/\lambda} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
Range: [0, ∞)
Mean: λ
Variance: λ²
MLE: $\hat{\lambda} = \bar{X}(n)$, the sample mean
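
A minimal sketch of fitting and sampling an exponential distribution in the mean parameterization used here. Note that Python's `random.expovariate` expects the rate, i.e. one divided by the mean. The data and function names are hypothetical.

```python
import random
import statistics

def fit_exponential(data):
    """Maximum likelihood estimate of the mean: the sample mean."""
    return statistics.fmean(data)

def draw_exponential(mean, size, seed=None):
    """Generate exponential values with the given mean."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean) for _ in range(size)]   # argument is the rate

# Hypothetical inter-arrival times, for illustration only
waiting_times = [1.0, 2.0, 3.0]
lam_hat = fit_exponential(waiting_times)
draws = draw_exponential(lam_hat, size=10000, seed=1)
```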

Weibull Distribution
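Time to complete some task, time to failure of a piece of equipment; used
as a rough model in the absence of data. (α is the shape parameter, also written k; β is the scale parameter, also written λ.)
Density:
$$f(x) = \begin{cases} \alpha \beta^{-\alpha} x^{\alpha-1} e^{-(x/\beta)^{\alpha}} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
Range: [0, ∞)
Mean: $\frac{\beta}{\alpha}\, \Gamma\!\left(\frac{1}{\alpha}\right)$
Variance: $\frac{\beta^2}{\alpha} \left\{ 2\,\Gamma\!\left(\frac{2}{\alpha}\right) - \frac{1}{\alpha} \left[ \Gamma\!\left(\frac{1}{\alpha}\right) \right]^2 \right\}$
MLE: $\hat{\beta} = \left( \frac{\sum_{i=1}^{n} X_i^{\hat{\alpha}}}{n} \right)^{1/\hat{\alpha}}$
In most of your applications the parameter α will be given; otherwise its
estimation is more involved and uses Newton's iterative method.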
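
A minimal sketch of the scale estimate when the shape α is given, as the slide assumes for most applications. The name `fit_weibull_scale` and the data are hypothetical. One point to watch: Python's `random.weibullvariate(alpha, beta)` names its arguments the other way round, with the scale first and the shape second.

```python
import random

def fit_weibull_scale(data, shape):
    """MLE of the scale beta for a known shape alpha."""
    n = len(data)
    return (sum(x ** shape for x in data) / n) ** (1.0 / shape)

# Hypothetical task durations, for illustration only
durations = [3.0, 4.0]
beta_hat = fit_weibull_scale(durations, shape=2.0)    # sqrt((9 + 16) / 2)

rng = random.Random(0)
# weibullvariate takes (scale, shape), so beta goes first here
draws = [rng.weibullvariate(beta_hat, 2.0) for _ in range(1000)]
```

With shape 1 the Weibull reduces to the exponential, and the estimate reduces to the sample mean.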

Normal Distribution
Errors of various types; quantities that are the sum of a large number of
other quantities.
Density:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Range: (−∞, ∞)
Mean: µ
Variance: σ²
MLE: $\hat{\mu} = \bar{X}(n)$, $\hat{\sigma} = \left[ \frac{n-1}{n}\, S^2(n) \right]^{1/2}$
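
A minimal sketch of the normal estimates. It rescales the usual sample variance, which divides by n − 1, by (n − 1)/n to obtain the maximum likelihood estimate. The data and the name `fit_normal` are hypothetical.

```python
import math
import random
import statistics

def fit_normal(data):
    """MLE of (mu, sigma) for a normal distribution."""
    n = len(data)
    mu_hat = statistics.fmean(data)
    s2 = statistics.variance(data)                 # S^2(n), divides by n - 1
    sigma_hat = math.sqrt((n - 1) / n * s2)        # MLE divides by n instead
    return mu_hat, sigma_hat

# Hypothetical measurements, for illustration only
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
mu_hat, sigma_hat = fit_normal(measurements)

rng = random.Random(0)
draws = [rng.gauss(mu_hat, sigma_hat) for _ in range(1000)]
```

For these eight values the estimates are a mean of 5 and a standard deviation of 2.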

Poisson Distribution
Number of events that occur in an interval of time when events occur at
a constant rate.
Mass:
$$p(x) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!} & \text{if } x \in \{0, 1, \ldots\} \\ 0 & \text{otherwise} \end{cases}$$
Range: {0, 1, ...}
Mean: λ
Variance: λ
MLE: $\hat{\lambda} = \bar{X}(n)$
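
A minimal sketch of the Poisson estimate and a simple generator. Python's standard library has no Poisson generator, so the sketch multiplies uniform random numbers until the product drops below $e^{-\lambda}$ (Knuth's method, a technique swapped in here for illustration and suitable for small λ only). The data and function names are hypothetical.

```python
import math
import random
import statistics

def fit_poisson(counts):
    """Maximum likelihood estimate of lambda: the sample mean."""
    return statistics.fmean(counts)

def draw_poisson(lam, rng):
    """Generate one Poisson(lam) value by multiplying uniforms."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

# Hypothetical event counts per interval, for illustration only
counts = [1, 2, 3]
lam_hat = fit_poisson(counts)

rng = random.Random(0)
draws = [draw_poisson(lam_hat, rng) for _ in range(10000)]
```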
