
Lecture 6 - Input Data Analysis

Levan Pavlenishvili

ISET

2021

Levan Pavlenishvili (ISET) Lecture 6 - Input Data Analysis 2021 1 / 24


Table of Contents

Lecture prepared with notes from Professor Burak Buke (University of
Edinburgh) and the textbook Simulation Modeling and Analysis (Averill M.
Law). Reading: lecture notes

1 Different approaches to analyzing input data for CBA models
2 Estimating the parameters for a distribution
3 χ² Goodness-of-Fit Test
4 Possible distribution families and their characteristics



Different approaches to analyzing input data for CBA models

When conducting a CBA, the amount of data available varies: we may have
only a very small amount, or enough to conduct a statistical analysis.

In many situations, when building CBA models, you need to generate random
variables. There are several approaches to doing so.



Different approaches to analyzing input data for CBA models

Use the data directly, i.e. draw a data point from the data set each time
you need to generate a random variable.

Advantage: We know that these data points are actually observable in
reality

Disadvantage:
The generated values never go beyond the observed values
The data set may not be large enough for long runs, which results in
repeated values
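A minimal sketch of the direct approach, assuming the observations sit in a Python list; `make_direct_sampler` and the service-time values are hypothetical, chosen only for illustration. Each draw is a uniformly random pick (with replacement) from the observed values, so nothing outside the data set can ever appear.

```python
import random

def make_direct_sampler(data, seed=None):
    """Return a function that draws one observed value (with replacement) per call."""
    observed = list(data)          # copy so later changes to `data` do not matter
    rng = random.Random(seed)      # separate generator for reproducible runs
    def sample():
        return rng.choice(observed)  # every observation is equally likely
    return sample

# Example with a small, made-up data set of service times
service_times = [2.1, 3.4, 1.8, 5.0, 2.9]
draw = make_direct_sampler(service_times, seed=1)
draws = [draw() for _ in range(1000)]
print(min(draws) >= min(service_times), max(draws) <= max(service_times))  # True True
```

With only five distinct values, a run of 1000 draws necessarily repeats values, which is the second disadvantage listed above.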



Different approaches to analyzing input data for CBA models

Use the histogram of the data to interpolate an empirical distribution

Advantage: Goes beyond the observed values (values between the data
points can be generated)

Disadvantage:
We only generate values within the observed interval, i.e. we cannot
generate values in the tails of the distribution
The empirical distribution has many non-smooth points, so it may be hard
to interpolate a distribution
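One way to turn a histogram into something we can sample from is a piecewise-linear distribution function G: G is 0 at the first bin edge, rises linearly across each bin by that bin's share of the observations, and reaches 1 at the last edge; sampling then inverts G at a uniform random number. This is a sketch under that assumption (other interpolation schemes exist), and `make_histogram_sampler` plus the bin counts are hypothetical.

```python
import bisect
import random

def make_histogram_sampler(edges, counts, seed=None):
    """Piecewise-linear empirical distribution built from a histogram.

    edges:  bin boundaries a_0 < a_1 < ... < a_k
    counts: number of observations n_j in bin j (len(counts) == k)
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    n = sum(counts)
    # G(a_j): cumulative share of observations up to edge j; G(a_0) = 0, G(a_k) = 1.
    # Integer running totals keep the last value exactly 1.0.
    cum, running = [0.0], 0
    for c in counts:
        running += c
        cum.append(running / n)
    rng = random.Random(seed)

    def sample():
        u = rng.random()                   # U ~ Uniform[0, 1)
        j = bisect.bisect_right(cum, u)    # first edge with G(a_j) > u
        lo, hi = cum[j - 1], cum[j]        # hi > lo, so empty bins are never chosen
        # Invert the linear piece of G on [a_{j-1}, a_j]
        return edges[j - 1] + (u - lo) / (hi - lo) * (edges[j] - edges[j - 1])

    return sample

# Made-up histogram: 10 observations in [0, 1), none in [1, 2), 30 in [2, 4)
draw = make_histogram_sampler([0, 1, 2, 4], [10, 0, 30], seed=3)
xs = [draw() for _ in range(5000)]
print(sum(x < 1 for x in xs) / len(xs))  # roughly 0.25
```

Every value lands between observations' bin edges, but never below the first edge or above the last one, which is exactly the tail limitation noted above.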



Different approaches to analyzing input data for CBA models

Fit a probability distribution to the data

Advantage:
Enables us to generate values beyond the observed values and interval
Easy to store, as we only have to store the parameters of the distribution
Easy to generate
Some distributions describe the underlying process well
Disadvantage:
If the distribution is not a good fit, the resulting estimates can be far
from the real values
Sometimes no standard distribution fits our data directly
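To illustrate "easy to store, easy to generate": once a distribution has been fitted, the whole model is a family name plus a few numbers, and Python's standard `random` module can generate from it. The parameter values in `FITTED` and the `generate` helper are hypothetical; note that `expovariate` takes the rate (1/mean) and `weibullvariate` takes the scale before the shape.

```python
import random

# Hypothetical fitted models: each is just a family name plus its parameters.
FITTED = {
    "uniform":     {"a": 1.0, "b": 4.0},
    "exponential": {"mean": 2.5},
    "normal":      {"mu": 10.0, "sigma": 2.0},
    "weibull":     {"shape": 1.5, "scale": 3.0},
}

def generate(family, params, rng):
    """Draw one value from a stored parametric model using Python's random module."""
    if family == "uniform":
        return rng.uniform(params["a"], params["b"])
    if family == "exponential":
        return rng.expovariate(1.0 / params["mean"])   # expovariate expects the rate 1/mean
    if family == "normal":
        return rng.gauss(params["mu"], params["sigma"])
    if family == "weibull":
        # weibullvariate(alpha, beta): alpha is the scale, beta is the shape
        return rng.weibullvariate(params["scale"], params["shape"])
    raise ValueError(f"unknown family: {family}")

rng = random.Random(42)
sample = [generate("exponential", FITTED["exponential"], rng) for _ in range(10_000)]
print(max(sample))
```

Unlike the two data-driven approaches, the largest generated value here is not capped by any observed data set.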



Different approaches to analyzing input data for CBA models

To estimate the distribution of a random variable, we need to perform the
following steps:
1 Hypothesize a distributional family (Normal, Poisson, Exponential)
2 Estimate the parameters of the distribution (mean and variance for
the normal, the mean λ for the exponential, equivalently its rate 1/λ)
3 Determine how well the distribution represents your data



Hypothesizing a Distributional Family

In some situations you might try to hypothesize a distributional family:

For some distributions there might be a theoretical justification for
using them
For some processes we might assume a specific distribution (more
details to follow)
There is an intuitive explanation for many distributions that can be
used to describe different processes
You can draw a histogram of the data and examine its shape to
hypothesize a distribution
Summary measures are always useful: mean, mode, variance, skewness
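A small sketch of computing these summary measures with Python's `statistics` module; `summary_measures` is a hypothetical helper. The skewness here is the moment estimator m3 / m2^(3/2), one common choice among several that differ by small-sample corrections. The coefficient of variation (standard deviation over mean) is added because it is a quick family check: an exponential with mean λ and variance λ² has CV = 1, and its skewness is 2, while a normal has skewness 0.

```python
import statistics

def summary_measures(data):
    """Summary statistics that help suggest a distributional family.

    Assumes at least two distinct values and a nonzero mean.
    """
    n = len(data)
    mean = statistics.fmean(data)
    s2 = statistics.variance(data)               # sample variance S^2(n), divisor n - 1
    m2 = sum((x - mean) ** 2 for x in data) / n  # central moments with divisor n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return {
        "min": min(data),
        "max": max(data),
        "mean": mean,
        "median": statistics.median(data),
        "modes": statistics.multimode(data),     # most informative for discrete data
        "variance": s2,
        "cv": s2 ** 0.5 / mean,                  # coefficient of variation
        "skewness": m3 / m2 ** 1.5,              # 0 for symmetric data
    }

print(summary_measures([1, 2, 3, 4, 5]))
```

For continuous data the mode is usually read off the histogram instead, since individual values rarely repeat.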



Estimating the Parameters for a Distribution

To estimate the parameters of a given distribution, the maximum likelihood
estimation (MLE) procedure can be used.

For this procedure we have to define the likelihood function L(θ, x), where
θ is the parameter and x is our data.

Intuitively, the likelihood function tells us how likely it is to obtain the
given data if the parameter is θ.



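Estimating the Parameters for a Distribution

For independent observations $x_1, \ldots, x_n$, the likelihood function has
the following general form:

Continuous: with $f_\theta(x_i)$ the probability density function

$$L(\theta, x) = f_\theta(x_1)\, f_\theta(x_2)\, f_\theta(x_3) \cdots f_\theta(x_n) \tag{1}$$

Discrete: with $p_\theta(x_i)$ the probability mass function

$$L(\theta, x) = p_\theta(x_1)\, p_\theta(x_2)\, p_\theta(x_3) \cdots p_\theta(x_n) \tag{2}$$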
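The product in (1) and (2) translates directly into code. The sketch below, with hypothetical helpers and made-up data, evaluates the likelihood for an exponential density with mean θ and then maximizes it by brute force over a grid of candidate θ values. It uses the sum of logs rather than the raw product, because a product of many small densities underflows to 0 for large n.

```python
import math

def likelihood(pdf, theta, data):
    """L(theta, x) as the product of density (or mass) values, as in (1) and (2)."""
    return math.prod(pdf(x, theta) for x in data)

def log_likelihood(pdf, theta, data):
    """ln L(theta, x): a sum of logs, which avoids underflow for large n."""
    return sum(math.log(pdf(x, theta)) for x in data)

def exponential_pdf(x, mean):
    """Exponential density with mean `mean` (the lecture's lambda)."""
    return math.exp(-x / mean) / mean if x >= 0 else 0.0

data = [0.5, 1.2, 2.3, 0.8, 3.1, 1.6]          # made-up observations
grid = [k / 100 for k in range(1, 501)]        # candidate theta values 0.01 .. 5.00
best = max(grid, key=lambda t: log_likelihood(exponential_pdf, t, data))
print(best)  # close to the sample mean 9.5 / 6
```

A grid search is only for illustration; in practice the maximum is found analytically or with a numerical optimizer.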



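Estimating the Parameters for a Distribution

To calculate the maximum likelihood estimate, you have to maximize
L(θ, x) with respect to θ:

$$\frac{\partial L(\theta, x)}{\partial \theta} = 0 \;\Rightarrow\; \hat{\theta} = \;?? \tag{3}$$

Because the logarithm is increasing, maximizing ln L(θ, x) gives the same
θ̂ and turns the product into a sum, which is usually easier to
differentiate.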
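As a worked instance of (3), take the exponential density $f_\lambda(x) = \frac{1}{\lambda} e^{-x/\lambda}$ for $x \ge 0$ (λ is the mean) and maximize the log-likelihood:

```latex
\begin{aligned}
L(\lambda, x) &= \prod_{i=1}^{n} \frac{1}{\lambda} e^{-x_i/\lambda}
              = \lambda^{-n} \exp\Big(-\frac{1}{\lambda}\sum_{i=1}^{n} x_i\Big) \\
\ln L(\lambda, x) &= -n \ln \lambda - \frac{1}{\lambda}\sum_{i=1}^{n} x_i \\
\frac{\partial \ln L(\lambda, x)}{\partial \lambda} &= -\frac{n}{\lambda} + \frac{1}{\lambda^2}\sum_{i=1}^{n} x_i = 0
\;\Rightarrow\; \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{X}(n)
\end{aligned}
```

The second derivative at $\hat{\lambda}$ is $n/\hat{\lambda}^2 - 2\sum x_i/\hat{\lambda}^3 = -n/\hat{\lambda}^2 < 0$, so this stationary point is indeed a maximum.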



χ² Goodness-of-Fit Test

Suppose we hypothesize a distribution with density function f(x) for data
whose values lie in (a, b). Our goal is to measure how close the
hypothesized distribution is to our data.

Put differently: how close is the hypothesized density to our histogram?

We can answer this with the χ² Goodness-of-Fit Test.



χ² Goodness-of-Fit Test

We can perform the χ² Goodness-of-Fit Test as follows:

1 Set the confidence level to (1 − α) × 100%
2 Select K intervals $(x_0, x_1], (x_1, x_2], \ldots, (x_{K-1}, x_K]$, where $x_0 = a$ and $x_K = b$
3 Define $N_j$ to be the number of observations in $(x_{j-1}, x_j]$
4 Let $p_j = \int_{x_{j-1}}^{x_j} f(x)\,dx$
5 Find the test statistic $\chi^2 = \sum_{j=1}^{K} \frac{(N_j - n p_j)^2}{n p_j}$, where n is the total number of observations
6 Reject the hypothesized distribution if $\chi^2 > \chi^2_{K-1,\,1-\alpha}$, at the (1 − α) × 100% confidence level
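A sketch of steps 2 to 6 for a hypothesized exponential distribution, using only the standard library; all function names are hypothetical. It assumes equiprobable intervals (every $p_j = 1/K$), a common choice but not the only one, and hard-codes the table value $\chi^2_{9,\,0.95} = 16.919$ for K = 10 and α = 0.05 (with SciPy installed, `scipy.stats.chi2.ppf(0.95, 9)` gives the same quantile). Because the mean is estimated from the same data, using K − 1 degrees of freedom as on the slide makes the test conservative.

```python
import math
import random

def chi_square_statistic(data, edges, cdf):
    """Chi-square statistic for K intervals defined by edges x_0 < x_1 < ... < x_K.

    Interval j is (x_{j-1}, x_j]; the first interval also includes x_0 itself.
    cdf is the hypothesized distribution function F, so p_j = F(x_j) - F(x_{j-1}).
    """
    n = len(data)
    stat = 0.0
    for j in range(1, len(edges)):
        lo, hi = edges[j - 1], edges[j]
        n_j = sum(1 for x in data if (lo < x or (j == 1 and x == lo)) and x <= hi)
        expected = n * (cdf(hi) - cdf(lo))          # n * p_j
        stat += (n_j - expected) ** 2 / expected
    return stat

def exponential_cdf(x, mean):
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

def equiprobable_exponential_edges(mean, k):
    """Edges giving p_j = 1/k for every interval under an exponential with this mean."""
    return [-mean * math.log(1.0 - j / k) for j in range(k)] + [math.inf]

rng = random.Random(7)
data = [rng.expovariate(1 / 2.0) for _ in range(500)]   # synthetic data, true mean 2
mean_hat = sum(data) / len(data)                          # MLE of the exponential mean
K = 10
edges = equiprobable_exponential_edges(mean_hat, K)
stat = chi_square_statistic(data, edges, lambda x: exponential_cdf(x, mean_hat))
critical = 16.919   # chi-square 0.95 quantile with K - 1 = 9 degrees of freedom
print(round(stat, 2), "reject" if stat > critical else "do not reject")
```

Equiprobable intervals also keep every expected count $n p_j$ equal, which avoids intervals with very small expected counts.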



Uniform Distribution
Used as the simplest model for a quantity that is felt to be randomly
distributed between two numbers a and b, but about which little else is
known



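Uniform Distribution

Density:
$$f(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
Range: [a, b]

Mean: $\frac{a+b}{2}$

Variance: $\frac{(b-a)^2}{12}$

MLE: $\hat{a} = \min_{1 \le i \le n} X_i$, $\hat{b} = \max_{1 \le i \le n} X_i$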
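Why the minimum and maximum? The uniform likelihood is $L(a, b, x) = (b-a)^{-n}$ when every observation lies in [a, b] and 0 otherwise, so setting a derivative to zero as in (3) does not work. Instead, L grows as the interval shrinks, and the smallest interval that still contains all observations wins. A sketch with hypothetical helper names and made-up data:

```python
def fit_uniform_mle(data):
    """MLE for Uniform(a, b): the smallest interval that contains every observation."""
    return min(data), max(data)

def uniform_likelihood(a, b, data):
    """L(a, b, x) = (b - a)^(-n) if all observations lie in [a, b], else 0."""
    if b <= a or any(x < a or x > b for x in data):
        return 0.0
    return (b - a) ** (-len(data))

data = [2.7, 1.4, 3.9, 2.2, 3.1]   # made-up observations
a_hat, b_hat = fit_uniform_mle(data)
print(a_hat, b_hat)                                     # 1.4 3.9
print((a_hat + b_hat) / 2, (b_hat - a_hat) ** 2 / 12)   # fitted mean and variance
```

Widening the interval lowers the likelihood, and narrowing it past any observation drops the likelihood to 0.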



Exponential Distribution
Inter-arrival times of "customers" to a system when arrivals occur at a
constant rate; time to failure of a piece of equipment



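Exponential Distribution

Density (here λ is the mean, so the rate is 1/λ):
$$f(x) = \begin{cases} \frac{1}{\lambda} e^{-x/\lambda} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
Range: [0, ∞)

Mean: λ

Variance: λ²

MLE: $\hat{\lambda} = \bar{X}(n)$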
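A sketch of fitting $\hat{\lambda} = \bar{X}(n)$ and then generating from the fitted model by the inverse-transform method: the distribution function is $F(x) = 1 - e^{-x/\lambda}$, and solving $U = F(X)$ for a uniform U gives $X = -\lambda \ln(1 - U)$. The helper names and data are hypothetical.

```python
import math
import random

def fit_exponential_mle(data):
    """MLE of the exponential mean lambda: the sample mean."""
    return sum(data) / len(data)

def exponential_inverse_transform(mean, rng):
    """Solve U = 1 - exp(-X / mean) for X with U ~ Uniform[0, 1)."""
    u = rng.random()
    return -mean * math.log(1.0 - u)   # 1 - u is in (0, 1], so the log is defined

observed = [0.5, 1.2, 2.3, 0.8, 3.1, 1.6]      # made-up inter-arrival times
lam_hat = fit_exponential_mle(observed)
rng = random.Random(11)
simulated = [exponential_inverse_transform(lam_hat, rng) for _ in range(10_000)]
print(lam_hat, max(simulated))
```

Unlike drawing directly from the data, the simulated values include inter-arrival times larger than the largest observation (3.1).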



Weibull Distribution
Time to complete some task, time to failure of a piece of equipment; used
as a rough model in the absence of data. Here α is the shape parameter and
β the scale parameter (elsewhere often written k and λ, respectively).



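Weibull Distribution

Density:
$$f(x) = \begin{cases} \alpha \beta^{-\alpha} x^{\alpha-1} e^{-(x/\beta)^{\alpha}} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
Range: [0, ∞)

Mean: $\frac{\beta}{\alpha}\Gamma\left(\frac{1}{\alpha}\right)$

Variance: $\frac{\beta^2}{\alpha}\left\{2\Gamma\left(\frac{2}{\alpha}\right) - \frac{1}{\alpha}\left[\Gamma\left(\frac{1}{\alpha}\right)\right]^2\right\}$

MLE: $\hat{\beta} = \left(\frac{\sum_{i=1}^{n} X_i^{\hat{\alpha}}}{n}\right)^{1/\hat{\alpha}}$. In most of your
applications the shape parameter α will be given; otherwise its estimation
is more complex and requires Newton's iterative method.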
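When α is not given, setting $\partial \ln L/\partial \beta = 0$ yields the formula for $\hat{\beta}$ above, and substituting it into $\partial \ln L/\partial \alpha = 0$ leaves one equation in α alone, $g(\alpha) = A + 1/\alpha - C/B = 0$ (symbols defined in the docstring). It has no closed form, so Newton's method is used. This is a sketch, not a library routine: `weibull_mle` is hypothetical, and the starting value relies on the fact that ln X has variance π²/(6α²) when X is Weibull. Python's `random.weibullvariate` takes the scale first and the shape second.

```python
import math
import random

def weibull_mle(data, tol=1e-10, max_iter=100):
    """Weibull MLE (alpha_hat, beta_hat); alpha is the shape, beta the scale.

    Solves g(alpha) = A + 1/alpha - C/B = 0 with Newton's method, where
    A = mean of ln X_i, B = sum of X_i^alpha, C = sum of X_i^alpha * ln X_i,
    H = sum of X_i^alpha * (ln X_i)^2. All observations must be > 0.
    """
    n = len(data)
    logs = [math.log(x) for x in data]
    A = sum(logs) / n
    # Start from the variance of ln X, which equals pi^2 / (6 alpha^2) for Weibull data
    s2 = (sum(lx * lx for lx in logs) - sum(logs) ** 2 / n) / (n - 1)
    alpha = (6.0 / math.pi ** 2 * s2) ** -0.5
    for _ in range(max_iter):
        powers = [x ** alpha for x in data]
        B = sum(powers)
        C = sum(p * lx for p, lx in zip(powers, logs))
        H = sum(p * lx * lx for p, lx in zip(powers, logs))
        g = A + 1.0 / alpha - C / B
        # Newton step -g/g'(alpha), with g'(alpha) = -1/alpha^2 - (B*H - C^2)/B^2
        step = g / (1.0 / alpha ** 2 + (B * H - C * C) / B ** 2)
        alpha += step
        if abs(step) < tol * alpha:
            break
    else:
        raise RuntimeError("Newton's method did not converge")
    # Scale estimate: the slide's formula evaluated at alpha_hat
    beta = (sum(x ** alpha for x in data) / n) ** (1.0 / alpha)
    return alpha, beta

rng = random.Random(5)
# Note the argument order: random.weibullvariate(scale, shape)
data = [rng.weibullvariate(3.0, 2.0) for _ in range(5000)]   # true shape 2, scale 3
alpha_hat, beta_hat = weibull_mle(data)
print(round(alpha_hat, 2), round(beta_hat, 2))
```

Since $B H \ge C^2$, g is strictly decreasing in α, so the likelihood equation has at most one root.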



Normal Distribution

Errors of various types; quantities that are the sum of a large number of
other quantities (by the central limit theorem)



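Normal Distribution

Density:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Range: (−∞, ∞)

Mean: µ

Variance: σ²

MLE: $\hat{\mu} = \bar{X}(n)$, $\hat{\sigma} = \left[\frac{n-1}{n} S^2(n)\right]^{1/2}$, where $S^2(n)$ is the sample variance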
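The factor (n − 1)/n matters: $S^2(n)$ divides by n − 1, while the MLE of σ divides by n. A sketch with a hypothetical helper and made-up data shows that $\hat{\sigma}$ coincides with `statistics.pstdev` (divisor n), not with `statistics.stdev` (divisor n − 1).

```python
import statistics

def fit_normal_mle(data):
    """MLE for the normal: mu_hat = sample mean, sigma_hat = sqrt((n-1)/n * S^2(n))."""
    n = len(data)
    mu_hat = statistics.fmean(data)
    s2 = statistics.variance(data)              # S^2(n) with divisor n - 1
    sigma_hat = ((n - 1) / n * s2) ** 0.5
    return mu_hat, sigma_hat

data = [9.1, 10.4, 11.2, 8.7, 10.0, 10.9]       # made-up measurements
mu_hat, sigma_hat = fit_normal_mle(data)
print(abs(sigma_hat - statistics.pstdev(data)) < 1e-12)   # True
```

For large n the two divisors give nearly the same value; the difference is visible mainly in small samples.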



Poisson Distribution
Number of events that occur in an interval of time when events occur at
a constant rate



Poisson Distribution

Mass:
$$p(x) = \begin{cases} \frac{e^{-\lambda}\lambda^{x}}{x!} & \text{if } x \in \{0, 1, \ldots\} \\ 0 & \text{otherwise} \end{cases}$$
Range: {0, 1, ...}

Mean: λ

Variance: λ

MLE: $\hat{\lambda} = \bar{X}(n)$
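This ties back to the exponential slide: if inter-arrival times are exponential with mean 1/λ, the number of arrivals in one time unit is Poisson with mean λ. The sketch below fits $\hat{\lambda} = \bar{X}(n)$ and generates Poisson counts by counting exponential arrivals. The helpers and the arrival counts are hypothetical, and this simple generator slows down for large λ.

```python
import random

def fit_poisson_mle(counts):
    """MLE of the Poisson mean lambda: the sample mean of the observed counts."""
    return sum(counts) / len(counts)

def poisson_variate(lam, rng):
    """Count arrivals in one time unit when inter-arrival times are exponential with mean 1/lam."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(lam)   # expovariate takes the rate: lam events per time unit
        if t > 1.0:
            return count
        count += 1

daily_arrivals = [3, 5, 2, 4, 6, 3, 4]           # made-up counts per day
lam_hat = fit_poisson_mle(daily_arrivals)        # 27 / 7
rng = random.Random(2)
simulated = [poisson_variate(lam_hat, rng) for _ in range(20_000)]
print(lam_hat, sum(simulated) / len(simulated))
```

The simulated counts should have mean and variance both close to $\hat{\lambda}$, matching the Mean and Variance lines above.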

