
MTH 6060 Mathematical Modeling

Dr. Chao Cheng Huang

Chapter 1 Statistical Reasoning


Why statistics?

– Uncertainty of nature (weather, earth movement, etc.)
– Uncertainty in observation/sampling/measurement
– Variability of human operation/error
– Imperfection of machines, etc.

Section 1.1 Basics of Probability Theory (http://www.math.uiuc.edu/~r-ash/BPT/BPT.pdf)

– The classical definition of probability states that the probability of an event is the number of outcomes favorable to the event divided by the total number of outcomes, where all outcomes are equally likely. For instance, in a coin toss, the probability of seeing a head is 1/2. This definition is very restrictive: it considers only experiments with a finite number of outcomes and, more seriously, it is circular. No matter how you look at it, "equally likely" essentially means "equally probable," and the concept of "equally probable" involves the concept of probability. Thus we would be using the concept of probability to define probability itself.
– Mathematical definition of probability:
  $\Omega$ – a set (e.g., $\mathbb{R}^n$, part of the integers, etc.) called the sample space, representing all possible outcomes.
    In the coin toss example, $\Omega = \{\mathrm{head}, \mathrm{tail}\}$.
    In a die toss, $\Omega = \{1, 2, 3, 4, 5, 6\}$. In an experiment of tossing a die, one may define other outcomes; for instance, $N$ representing an even number and $O$ representing an odd number. So an outcome could be a subset of $\Omega$. This leads to "events":
    An event is a subset of the sample space.
  $\mathcal{F}$ – a collection of events.
    $\mathcal{F}$ must be closed under set operations (Boolean algebra) and must contain $\Omega$ and $\varnothing$ (the empty set). For instance, if $A, B$ are two events, so are $A \cap B$, $A \cup B$, $A^c = \Omega \setminus A$, $B^c = \Omega \setminus B$, etc.
    We call such an $\mathcal{F}$ a field, or Boolean field (more precisely a $\sigma$-field, since the countable unions used below must also belong to $\mathcal{F}$).
  $P(A)$ – a countably additive non-negative function defined for every event $A$, with $P(\Omega) = 1$.
    Countably additive means that
    $$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i) \quad \text{if } A_i \cap A_j = \varnothing \text{ for any } i < j.$$

    This implies $P(A^c) = 1 - P(A)$. If $A_1 \subset A_2 \subset A_3 \subset \cdots$, then
    $$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = P\Big(A_1 \cup \bigcup_{i=1}^{\infty} (A_{i+1} \setminus A_i)\Big) = P(A_1) + \sum_{i=1}^{\infty} \big(P(A_{i+1}) - P(A_i)\big) = \lim_{i \to \infty} P(A_i).$$
    This last property is called lower continuity, or continuity from below.
    If $A_1 \supset A_2 \supset A_3 \supset \cdots$, and we let $A = \bigcap_{n=1}^{\infty} A_n$, then
    $$A_1 = A \cup (A_1 \setminus A_2) \cup (A_2 \setminus A_3) \cup (A_3 \setminus A_4) \cup \cdots = A \cup \bigcup_{n=1}^{\infty} (A_n \setminus A_{n+1}).$$
    So
    $$P(A_1) = P(A) + \sum_{n=1}^{\infty} P(A_n \setminus A_{n+1}) = P(A) + P(A_1) - \lim_{n \to \infty} P(A_{n+1}).$$
    The resulting limit $\lim_{n \to \infty} P(A_n) = P(A)$ is called upper continuity, or continuity from above.
    Any non-negative countably additive function is called a measure. When the measure of the whole space $\Omega$ equals 1, it is called a probability measure.
    One (classical) "definition" of $P$ is
    $$P(A) = \frac{\text{number of points in } A}{\text{total number of points in } \Omega} = \frac{\text{favorable outcomes}}{\text{total outcomes}}.$$
    $(\Omega, \mathcal{F}, P)$ is called a probability space. If you are familiar with real analysis, a probability space is a measure space with total measure one; an event is called a measurable set. Note that non-measurable sets exist.
    Recall that in real analysis, we may define integration based on this measure $P$:
    $$\int_{\Omega} g(\omega)\,dP = \lim \sum_i x_i\,P(\{\omega : x_i \le g(\omega) \le x_{i+1}\}).$$
    In particular, if $g(x) = \chi_A(x)$, the characteristic function of $A$, then
    $$\int_{\Omega} \chi_A\,dP = P(A).$$

    A function $X(\omega)$ defined for $\omega \in \Omega$ is called a measurable function if for any real number $x$, $\{\omega : X(\omega) \le x\} \in \mathcal{F}$. We view such a function as a measurement of the outcomes associated with an experiment. In probability theory, it is called a random variable.
    For the coin toss, $\Omega = \{H, T\}$, $\mathcal{F} = \{\varnothing, \{H\}, \{T\}, \{H, T\}\}$, $P(H) = P(T) = 1/2$, $P(\{H, T\}) = 1$. We define a random variable as
    $$X(T) = 1, \qquad X(H) = 0.$$
    So if $X = 1$, then we got a tail. $X$ thus provides a measurement of outcomes; a simulation sketch follows.
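    A minimal MATLAB sketch of this random variable, in the spirit of the textbook's MATLAB exercises (the sample size N is an arbitrary choice):

        % Simulate N coin tosses of the random variable X (1 = tail, 0 = head)
        N = 1e5;                    % number of tosses (arbitrary)
        X = rand(N,1) < 0.5;        % each entry is 1 (tail) with probability 1/2
        fprintf('empirical P(X = 1) = %.3f (exact 0.5)\n', mean(X));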

    Two events $A$ and $B$ are called independent if $P(A \cap B) = P(A)\,P(B)$.
    Conditional probability of $B$ given $A$: $P(B \mid A) = P(A \cap B)/P(A)$.
    Law of Total Probability: Let $B_1, B_2, \ldots$ be a finite or countably infinite family of mutually exclusive and exhaustive events (i.e., disjoint and their union is $\Omega$). Then
    $$P(A) = \sum_i P(A \cap B_i) = \sum_i P(B_i)\,P(A \mid B_i).$$
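    For instance (a hypothetical illustration, not from the textbook): suppose machine $B_1$ produces 60% of all parts with a 1% defect rate, and machine $B_2$ produces the remaining 40% with a 5% defect rate. With $A$ the event that a randomly chosen part is defective,
    $$P(A) = P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2) = 0.6 \times 0.01 + 0.4 \times 0.05 = 0.026.$$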

    We define $F_X(x) = P(\{\omega : X(\omega) \le x\})$. This is called the cumulative probability distribution function (CDF) of the random variable.
    $F_X$ is continuous from the right. This is because for integers $n > m$, $1/n < 1/m$, so
    $$\{\omega : X(\omega) \le x\} \subset \Big\{\omega : X(\omega) \le x + \frac{1}{n}\Big\} \subset \Big\{\omega : X(\omega) \le x + \frac{1}{m}\Big\}.$$
    So
    $$\{\omega : X(\omega) \le x\} \subset \bigcap_{n=1}^{\infty} \Big\{\omega : X(\omega) \le x + \frac{1}{n}\Big\}.$$
    On the other hand, for any $\omega_0$ such that
    $$\omega_0 \in \bigcap_{n=1}^{\infty} \Big\{\omega : X(\omega) \le x + \frac{1}{n}\Big\},$$
    it must hold that
    $$X(\omega_0) \le x,$$
    since otherwise it would cause a contradiction (why?). Thus
    $$\{\omega : X(\omega) \le x\} = \bigcap_{n=1}^{\infty} \Big\{\omega : X(\omega) \le x + \frac{1}{n}\Big\}.$$
    According to the upper continuity of $P$, it follows that
    $$P(\{\omega : X(\omega) \le x\}) = \lim_{n \to \infty} P\Big(\Big\{\omega : X(\omega) \le x + \frac{1}{n}\Big\}\Big),$$
    or equivalently
    $$F_X(x) = \lim_{n \to \infty} F_X\Big(x + \frac{1}{n}\Big).$$

    Exercise 1: Define
    $$G(x) = P(\{\omega : X(\omega) < x\}). \quad \text{(Exercise 1)}$$
    Show that $G(x)$ is continuous from the left.


For a right-continuous function $F_X(x)$, one may define the Lebesgue–Stieltjes integral of any function $g(x)$:
$$\int g(x)\,dF_X = \lim \sum g(x_i)\big(F_X(x_{i+1}) - F_X(x_i)\big) = \lim \sum g(x_i)\,P(\{\omega : x_i \le X(\omega) \le x_{i+1}\}) = \int g(X)\,dP.$$
    In particular, when $g = \chi_A$, the indicator function of an event $A$,
    $$\int_A dF_X = P(\{\omega : X(\omega) \in A\}).$$
    If $A = (a, b]$,
    $$P(\{\omega : a < X(\omega) \le b\}) = P(\{\omega : X(\omega) \in A\}) = \int_A dF_X = \int_a^b dF_X = F_X(b) - F_X(a).$$
    Letting $a = -\infty$ and $b = x$, we see
    $$F_X(x) = \int_{-\infty}^{x} dF_X.$$

We call $F_X$ absolutely continuous if there exists a function $\rho_X(x)$ such that
$$F_X(x) = \int_{-\infty}^{x} dF_X = \int_{-\infty}^{x} \rho_X(t)\,dt.$$
    $\rho_X(x)$ is called the probability density function (PDF). One can show that $\rho_X$ is in fact the derivative of $F_X$:
    $$\rho_X(x) = \lim_{\delta \to 0} \frac{F_X(x + \delta) - F_X(x)}{\delta} = \lim_{\delta \to 0} \frac{P(\{\omega : x < X(\omega) \le x + \delta\})}{\delta} = F_X'(x).$$
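    A quick numerical check of this limit (a sketch only; the logistic CDF $F(x) = 1/(1 + e^{-x})$ and the step size delta are arbitrary choices, not from the notes):

        % Verify rho = F' numerically for a smooth test CDF
        F     = @(x) 1./(1 + exp(-x));            % logistic CDF (test case)
        rho   = @(x) exp(-x)./(1 + exp(-x)).^2;   % its exact derivative
        x     = 0.7;  delta = 1e-6;
        fprintf('difference quotient %.6f vs rho(x) %.6f\n', ...
                (F(x + delta) - F(x))/delta, rho(x));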

For any interval $A$,
$$P_X(A) = P(\{\omega : X(\omega) \in A\}) = \int_A dF_X(x) = \int_A \rho_X(x)\,dx.$$
For any function $g(x)$,
$$\int_{\Omega} g(X(\omega))\,dP = \int_{-\infty}^{\infty} g(x)\,\rho_X(x)\,dx.$$
This formula establishes the connection between the probability measure $P$ and the PDF $\rho_X$. From this point on, we don't need to care about $P$, nor about $X$; all we need is $\rho_X$.
    The expected value, or expectation, of $X$ is defined as
    $$\mu = E[X] = \int_{-\infty}^{\infty} x\,\rho_X(x)\,dx = \int_{\Omega} X(\omega)\,dP.$$

$E[X]$ means the average value of $X$, or the center of mass of $\rho_X$.
    If $X$ takes only finitely many values $x_i$, then
    $$E[X] = \int_{\Omega} X(\omega)\,dP = \sum_i x_i\,P(X = x_i).$$
    $E[X]$ is linear:
    $$E[aX + bY] = aE[X] + bE[Y].$$
    In particular, $E[X - \mu] = 0$.
    $R = X - \mu$ is called the residue of $X$. Reynolds decomposition: $X = \mu + R$.

    For any function $g(x)$, the expected value of the random variable $Y = g(X)$ is
    $$E[Y] = \int_{-\infty}^{\infty} g(x)\,\rho_X(x)\,dx = \int_{\Omega} g(X(\omega))\,dP.$$
    The $n$-th moment of $X$ is
    $$E[X^n] = \int_{-\infty}^{\infty} x^n\,\rho_X(x)\,dx = \int_{\Omega} X(\omega)^n\,dP.$$
    The variance, or moment of inertia, is
    $$\sigma^2 = E\big[(X - \mu)^2\big] = \int_{-\infty}^{\infty} (x - \mu)^2\,\rho_X(x)\,dx = \int_{\Omega} (X(\omega) - \mu)^2\,dP.$$
    $\sigma$ is called the standard deviation of $X$.
    The median $m(X)$ is a number such that $F_X(m) = 1/2$.
Joint distribution function: Let $X$ and $Y$ be two random variables. The joint cumulative probability distribution function is
$$F(x, y) = P(X \le x,\ Y \le y) = P(\{\omega : X(\omega) \le x \text{ and } Y(\omega) \le y\}).$$
    We say $F(x, y)$ is absolutely continuous if there exists a probability density function (PDF) $\rho(x, y)$ such that for any intervals $A, B$,
    $$P(X \in A,\ Y \in B) = \int_A \int_B \rho(x, y)\,dx\,dy,$$
    so
    $$F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} \rho(x, y)\,dx\,dy, \qquad \int_{\Omega} g(X, Y)\,dP = \iint_{\mathbb{R}^2} g(x, y)\,\rho(x, y)\,dx\,dy.$$

Two random variables are called independent if $\{\omega : X \in A\}$ and $\{\omega : Y \in B\}$ are independent events for any intervals $A, B$. In this case
$$F(x, y) = F_X(x)\,F_Y(y), \qquad \rho(x, y) = \rho_X(x)\,\rho_Y(y).$$

Covariance between random variables: for random variables $X_1, X_2, \ldots, X_n$,
$$\operatorname{Cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)] = \int_{\Omega} (X_i - \mu_i)(X_j - \mu_j)\,dP = \iint_{\mathbb{R}^2} (x - \mu_i)(y - \mu_j)\,\rho_{ij}(x, y)\,dx\,dy$$
is called the covariance between $X_i$ and $X_j$.
    $\operatorname{Cov}(X_i, X_j) = \sigma(X_i, X_j)$ measures the relative dependence between $X_i$ and $X_j$.
    If $X_i$ and $X_j$ are independent, then $\operatorname{Cov}(X_i, X_j) = 0$.
    $\operatorname{Cov}(X_i, X_i) = \sigma^2(X_i)$ = variance.
    The matrix $V = [\operatorname{Cov}(X_i, X_j)]$ is called the covariance matrix:
    $$V = \begin{bmatrix} \sigma^2(X_1) & \sigma(X_1, X_2) & \cdots & \sigma(X_1, X_n) \\ \sigma(X_2, X_1) & \sigma^2(X_2) & \cdots & \sigma(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \sigma(X_n, X_1) & \sigma(X_n, X_2) & \cdots & \sigma^2(X_n) \end{bmatrix}$$
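    In MATLAB, the built-in cov estimates this matrix from data; a small sketch with synthetic, arbitrarily chosen variables:

        % Sample covariance matrix of three random variables
        N  = 1e4;
        X1 = randn(N,1);
        X2 = X1 + 0.5*randn(N,1);   % correlated with X1 by construction
        X3 = randn(N,1);            % independent of X1 and X2
        V  = cov([X1 X2 X3]);       % V(i,j) approximates Cov(Xi, Xj)
        disp(V)                     % diagonal ~ variances; V(1,3), V(2,3) ~ 0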

Random process = stochastic process $\{X(t)\}_{t \ge 0}$: for each $t$, $X(t)$ is a random variable, and it is "continuous" with respect to $t$.
Random field $X(x)$: for each $x$, $X(x)$ is a random variable. So $\rho(x, y) = \operatorname{Cov}(X(x), X(y))$ is a function of two variables.
Example: Let $\Omega_i$ for $i = 1, 2, \ldots, n$ be disjoint regions, and let $K_i$ be a random variable representing the conductivity of $\Omega_i$, with mean $\mu_i$ and variance $\sigma_i^2$. Then the overall conductivity of the entire region $\Omega = \bigcup_i \Omega_i$ is
$$K(x) = \sum_i \chi_{\Omega_i}(x)\,K_i.$$

The overall mean is
$$\mu = E[K] = \sum_i E[\chi_{\Omega_i} K_i] = \sum_i E[\chi_{\Omega_i}]\,E[K_i] = \sum_i I_i\,\mu_i,$$
where $I_i$ is the proportion of $\Omega_i$, assuming independence of $\chi_{\Omega_i}$ and $K_i$, and
$$E[\chi_{\Omega_i}] = \int_{\Omega} \chi_{\Omega_i}\,dP = \int_{\Omega_i} dP = I_i.$$
The covariance between different locations $x$ and $y$; in particular, the variance is
$$\sigma^2(K) = E\Big[\Big(\sum_{i=1}^{n} \big(\chi_{\Omega_i}(x)\,K_i - I_i\,\mu_i\big)\Big)^2\Big] = \sum_{i=1}^{n} I_i\,\sigma_i^2 + \frac{1}{2} \sum_{i,j=1}^{n} I_i I_j\,(\mu_i - \mu_j)^2. \quad \text{(Optional Exercise)}$$
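A Monte Carlo sketch checking this variance formula (the proportions $I_i$, means $\mu_i$, and standard deviations $\sigma_i$ below are arbitrary illustrative values; Gaussian conductivities are assumed for the simulation):

    % Monte Carlo check of sigma^2(K) = sum Ii*si^2 + (1/2) sum Ii*Ij*(mi-mj)^2
    I  = [0.5 0.3 0.2];                          % region proportions (sum to 1)
    m  = [1.0 2.0 4.0];                          % mean conductivities
    s  = [0.2 0.3 0.1];                          % standard deviations
    N  = 1e6;
    j  = discretize(rand(N,1), [0 cumsum(I)]);   % random region index
    mj = m(j);  sj = s(j);
    K  = mj(:) + sj(:).*randn(N,1);              % sampled conductivity
    formula = sum(I.*s.^2) + 0.5*sum(sum((I'*I).*(m' - m).^2));
    fprintf('var(K) = %.4f, formula = %.4f\n', var(K), formula);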

Section 1.2 Uniform Distributions

– Given an interval $[a, b]$, consider the cumulative distribution function
  $$F(x) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{x - a}{b - a} & \text{if } a \le x \le b \\ 1 & \text{if } b < x \end{cases}$$
  The random variable $X$ associated with $F$ is said to be uniformly distributed on $[a, b]$.
– The PDF is
  $$\rho(x) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{1}{b - a} & \text{if } a \le x \le b \\ 0 & \text{if } b < x \end{cases}$$
– Mean:
  $$E[X] = \int_{-\infty}^{\infty} x\,\rho(x)\,dx = \int_a^b \frac{x}{b - a}\,dx = \frac{b + a}{2}$$
– Variance:
  $$\sigma^2 = \int_{-\infty}^{\infty} \Big(x - \frac{b + a}{2}\Big)^2 \rho(x)\,dx = \frac{1}{b - a} \int_a^b \Big(x - \frac{b + a}{2}\Big)^2 dx = \frac{(b - a)^2}{12}$$
– Standard deviation $\sigma = (b - a)/\sqrt{12}$
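– A quick empirical check in MATLAB (a sketch; the values of a, b, and N are arbitrary):

    % Empirical mean and variance of the uniform distribution on [a,b]
    a = 2;  b = 5;  N = 1e6;
    X = a + (b - a)*rand(N,1);   % uniform samples on [a,b]
    fprintf('mean %.4f vs (a+b)/2 = %.4f\n', mean(X), (a + b)/2);
    fprintf('var  %.4f vs (b-a)^2/12 = %.4f\n', var(X), (b - a)^2/12);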

Section 1.3 Gaussian Distributions

– PDF of $X$:
  $$\rho(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x - \mu)^2/(2\sigma^2)}$$
– Exercise 2: show that
  $$\text{mean} = \mu, \qquad \text{variance} = \sigma^2 \quad \text{(Exercise 2)}$$
– Denote this distribution by $N(\mu, \sigma)$.
– Let $Y = (X - \mu)/\sigma$. Then
  $$E[Y] = \frac{E[X] - \mu}{\sigma} = 0.$$
  $Y$ is called a standard normal random variable, with zero mean and standard deviation 1, i.e., $N(0, 1)$.
– $n$-$\sigma$ process:
  $$P(|X - \mu| \le n\sigma) = P(|Y| \le n) = \frac{1}{\sqrt{2\pi}} \int_{-n}^{n} e^{-x^2/2}\,dx \to 1 \text{ as } n \to \infty$$
  The probability that $X$ lies in the $n$-$\sigma$ neighborhood of its mean:
  $$\frac{1}{\sqrt{2\pi}} \int_{-2}^{2} e^{-x^2/2}\,dx = 0.95450$$
  $$\frac{1}{\sqrt{2\pi}} \int_{-4}^{4} e^{-x^2/2}\,dx = 0.99994$$
  $$\frac{1}{\sqrt{2\pi}} \int_{-6}^{6} e^{-x^2/2}\,dx \approx 1 - 2 \times 10^{-9}$$

– In industry, a Six Sigma process describes quantitatively how a process is performing. To achieve Six Sigma, a process must not produce more than 3.4 defects per million opportunities; a Six Sigma defect is defined as anything outside of customer specifications. (The industry figure of $3.4 \times 10^{-6}$ is smaller than the $6\sigma$ tail probability above because it allows the process mean to drift by up to $1.5\sigma$ from the target.)
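– The $n$-$\sigma$ probabilities above can be reproduced in MATLAB with the built-in erf, since $P(|Y| \le n) = \operatorname{erf}(n/\sqrt{2})$ for $Y \sim N(0, 1)$ (no toolbox needed):

    % n-sigma probabilities for the standard normal
    for n = [2 4 6]
        fprintf('P(|Y| <= %d) = %.10f\n', n, erf(n/sqrt(2)));
    end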

Exponential distribution

– PDF:
  $$\rho(x) = \frac{1}{\mu}\,e^{-x/\mu} \text{ for } x \ge 0, \qquad \rho(x) = 0 \text{ for } x < 0$$
– Exercise 3: Show that
  $$E[X] = \mu, \qquad \sigma^2 = \mu^2 \quad \text{(Exercise 3)}$$
– $E[X^n] = \mu^n\,n!$
– $m(X) = \mu \ln 2 < E[X]$
– Memorylessness: it satisfies
  $$P(X > s + t \mid X > s) = P(X > t) \quad \text{(Exercise 4)}$$

(Recall that the conditional probability $P(A \mid B) = P(A \cap B)/P(B)$.) When $X$ is interpreted as the waiting time for an event to occur relative to some initial time, this relation implies that, if $X$ is conditioned on a failure to observe the event over some initial period of time $s$, the distribution of the remaining waiting time is the same as the original unconditional distribution. For example, if an event has not occurred after 30 seconds, the conditional probability that its occurrence will take at least 10 more seconds equals the unconditional probability of observing the event more than 10 seconds after the initial time.
– The exponential distribution is the only memoryless continuous probability distribution.
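– A simulation sketch of the memoryless property (the values of mu, s, t, and N are arbitrary; exponential samples are generated by the inversion method $X = -\mu \ln U$):

    % Check P(X > s+t | X > s) = P(X > t) for the exponential distribution
    mu = 3;  s = 2;  t = 1;  N = 1e6;
    X   = -mu*log(rand(N,1));            % exponential with mean mu
    lhs = sum(X > s + t)/sum(X > s);     % empirical conditional probability
    rhs = mean(X > t);
    fprintf('lhs %.4f, rhs %.4f, exact exp(-t/mu) = %.4f\n', lhs, rhs, exp(-t/mu));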

Section 1.4 The Binomial Distribution (two outcomes)

– Consider $n$ devices, each of which has failure probability $p$. Then the probability that exactly $k$ devices fail is
  $$p(k) = \binom{n}{k} p^k (1 - p)^{n - k}.$$
  The discrete random variable $X$ with this PDF is said to have the binomial distribution, and
  $$F(x) = \sum_{k \le x} \binom{n}{k} p^k (1 - p)^{n - k}.$$

– The mean is
  $$E[X] = \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n - k} = np \quad \text{(Exercise 5)}$$
– The variance is
  $$\sigma^2 = \sum_{k=0}^{n} (k - np)^2 \binom{n}{k} p^k (1 - p)^{n - k} = np(1 - p) \quad \text{(Exercise 6)}$$
– Problem A on page 7 of the textbook.
– Newsboy problem on page 8 of the textbook (Exercise 7: run the MATLAB routines).
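– A short MATLAB check of these formulas directly from the pmf (the values of n and p are arbitrary):

    % Binomial pmf, mean, and variance from the defining formula
    n = 20;  p = 0.1;  k = 0:n;
    pmf = arrayfun(@(kk) nchoosek(n,kk), k).*p.^k.*(1-p).^(n-k);
    fprintf('sum(pmf) = %.4f\n', sum(pmf));                  % should be 1
    fprintf('mean %.4f vs np = %.4f\n', sum(k.*pmf), n*p);
    fprintf('var  %.4f vs np(1-p) = %.4f\n', sum((k - n*p).^2.*pmf), n*p*(1-p));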

Geometric distribution

– Suppose that on each trial the probability of success is $p$. The probability that, in a sequence of independent trials, the first success occurs on trial $k$ ($k = 1, 2, \ldots$) is
  $$p(k) = P(X = k) = (1 - p)^{k - 1}\,p.$$
– Exercise 8: show
  $$E[X] = 1/p, \qquad \sigma^2 = (1 - p)/p^2 \quad \text{(Exercise 8)}$$
– Exercise 9: The geometric distribution is memoryless; it is the only memoryless discrete probability distribution.
  $$P(X > s + t \mid X > s) = P(X > t) \quad \text{(Exercise 9)}$$
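– A quick simulation check of $E[X] = 1/p$ (the inversion formula below and the values of p and N are arbitrary choices):

    % Geometric samples via inversion: X = ceil(ln U / ln(1-p))
    p = 0.25;  N = 1e5;
    X = ceil(log(rand(N,1))/log(1 - p));
    fprintf('empirical mean %.3f vs 1/p = %.3f\n', mean(X), 1/p);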

Section 1.5 The Poisson Distribution (continuum version of the binomial)

– Suppose random noise spikes occur on a channel at a rate of $\lambda$ per unit time. Consider a time period of length $T$. We divide $[0, T]$ into $n$ subintervals of length $T/n$. On average, the probability of a noise spike in one subinterval is $p = \lambda T/n$. Then, according to the binomial distribution, the probability that a spike occurs in exactly $k$ subintervals is
  $$\binom{n}{k} p^k (1 - p)^{n - k} = \binom{n}{k} \Big(\frac{\lambda T}{n}\Big)^k \Big(1 - \frac{\lambda T}{n}\Big)^{n - k} \to \frac{(\lambda T)^k}{k!}\,e^{-\lambda T} \text{ as } n \to \infty.$$
  So the probability of exactly $k$ spikes in $[0, T]$ is the Poisson distribution with PDF
  $$p(k) = \frac{(\lambda T)^k}{k!}\,e^{-\lambda T}.$$
– $\mu = \sigma^2 = \lambda T$
– Problem A on page 10 of the textbook (run MATLAB, Exercise 10). A small convergence check follows.
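– A sketch of the binomial-to-Poisson limit (the values of lambda, T, and k are arbitrary):

    % Binomial(n, lambda*T/n) pmf at k approaches the Poisson pmf as n grows
    lambda = 2;  T = 3;  k = 4;
    for n = [10 100 1000]
        p = lambda*T/n;
        fprintf('n = %4d: binomial pmf = %.6f\n', n, nchoosek(n,k)*p^k*(1-p)^(n-k));
    end
    fprintf('Poisson limit = %.6f\n', (lambda*T)^k/factorial(k)*exp(-lambda*T));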

Section 1.6 Taguchi quality control

– Let $X$ be a random variable measuring the quality of a product. The quality loss function (QLF) is defined as
  $$L(X) = k\,(X - \tau)^2$$
– $k$ is the loss coefficient and $\tau$ is the target value. The goal is to make $L(X)$ as small as possible.
– $E[L(X)] = k\,E[(X - \tau)^2]$:
  $$E[L(X)] = k\,E\big[((X - \mu) + (\mu - \tau))^2\big] = k\,E\big[(X - \mu)^2 + 2(\mu - \tau)(X - \mu) + (\mu - \tau)^2\big]$$
  $$= k\,E[(X - \mu)^2] + 2k(\mu - \tau)\,E[X - \mu] + k(\mu - \tau)^2 = k\sigma^2 + k(\mu - \tau)^2,$$
  since $E[X - \mu] = 0$.
– It reaches its minimum when $\sigma = 0$ and $\mu = \tau$.
– Two problems on page 12 of the textbook.
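– A numerical sketch of the decomposition $E[L(X)] = k\sigma^2 + k(\mu - \tau)^2$ (all parameter values below are arbitrary, and a Gaussian quality measurement is assumed purely for illustration):

    % Expected Taguchi loss: simulation vs the closed-form decomposition
    kloss = 2;  tau = 10;                % loss coefficient and target
    mu = 10.4;  sigma = 0.3;  N = 1e6;
    X = mu + sigma*randn(N,1);           % simulated quality measurements
    fprintf('E[L] %.4f vs k*sigma^2 + k*(mu-tau)^2 = %.4f\n', ...
            mean(kloss*(X - tau).^2), kloss*sigma^2 + kloss*(mu - tau)^2);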

Homework (due 9/21, before class starts)

1. Exercises 1–10 (as outlined above in this note, not from the textbook)
2. From the textbook: #1.1, #1.2

Hints for Exercises #5, 6, 8: Differentiate the identities (twice if necessary)
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n - k} y^k, \qquad \sum_{k=0}^{\infty} x^k = \frac{1}{1 - x}.$$
