Normal Distribution: Theory and Testing of Normality
Normal Distribution: Theory, Application, and Testing
Outline
- Historical Aspects of Normal Distribution
- Normal Distribution: Understanding and Applying Normal Distribution
- Testing Normality
- Problems and Solutions Associated with Non-normal Data
- Multivariate Normal Distribution: Testing Multivariate Normality and Outliers
Historical Aspects of Normal Distribution

Development
- Abraham de Moivre (1667–1754): Approximatio ad summam terminorum binomii $(a+b)^n$ in seriem expansi
- Carl Friedrich Gauss (1809): Theoria motus corporum coelestium in sectionibus conicis solem ambientium
- Marquis de Laplace (1749–1827): Laplace's error function
- Pearson popularized the term "normal curve".
Testing Normality

Contributions by Fisher (1930), Bartlett (1935), E. S. Pearson (1931), Geary (1947), Box (1953), John Tukey (1960), Pearson and Please (1975), and D'Agostino and Lee (1977).
Understanding and Applying Normal Distribution

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$

Properties
- The mean, median, and mode are the same.
- The normal curve is symmetric.
- The normal is a two-parameter distribution: $\mu$ and $\sigma$; $\mu$ is the expected value and $\sigma$ is the standard deviation.
- The normal distribution is a continuous distribution that can take values from $-\infty$ to $+\infty$.
- The highest frequency is in the middle, and the frequency tapers off toward either extreme of the normal curve.
- Most of the area under the normal curve lies within the first three standard deviations on both sides of the mean (99.74% of the area), whereas 68.26% of the area lies within the first standard deviation.
- Zero skewness and zero excess kurtosis.
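The area figures above can be checked with the error function, since the standard normal CDF is $\Phi(z) = \tfrac{1}{2}(1 + \operatorname{erf}(z/\sqrt{2}))$. A minimal sketch (the function name `area_within_k_sd` is ours); note the exact values round to 68.27% and 99.73%, so the 68.26%/99.74% figures are common table roundings:

```python
import math

def area_within_k_sd(k):
    # Area under the normal curve within k standard deviations of the mean:
    # P(|Z| < k) = erf(k / sqrt(2)) for the standard normal Z.
    return math.erf(k / math.sqrt(2))

print(area_within_k_sd(1))  # ~0.6827, i.e. ~68.27% within one SD
print(area_within_k_sd(3))  # ~0.9973, i.e. ~99.73% within three SDs
```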
Finding the Area under the Normal Distribution: Using the Z Score

Z (the standard normal distribution) is useful in finding the area under the normal distribution.

Example: A variable has mean 100 and SD = 10. How many cases will be above 120?

$Z = \dfrac{120 - 100}{10} = \dfrac{20}{10} = 2$

From the table of the normal distribution, the area beyond $Z = 2$ is 0.0228. In terms of percentages, it is $0.0228 \times 100 = 2.28\%$.
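The table lookup can be reproduced numerically from the z-score; a minimal sketch (the names `phi` and `proportion_above` are ours):

```python
import math

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def proportion_above(x, mu, sd):
    # P(X > x) for X ~ N(mu, sd^2), computed from the z-score.
    z = (x - mu) / sd
    return 1.0 - phi(z)

p = proportion_above(120, mu=100, sd=10)   # z = 2
print(p)  # ~0.0228, i.e. ~2.28% of cases lie above 120
```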
Testing Normality

Tests Using Moments: population moments; moment tests; skewness test; the kurtosis test; absolute moment test: Geary's test.

Goodness-of-Fit and Related Tests: Kolmogorov–Smirnov test; Lilliefors test; Kuiper's V test; Anderson–Darling test (AD test); Cramér–von Mises test; Jarque–Bera test (JB test); Shapiro–Wilk test; the D'Agostino–Pearson test (D'Agostino $K^2$ test).

Other Tests for Normality: likelihood ratio test; D'Agostino's test; Oja's test; Lin and Mudholkar's test.
Graphical Methods for Testing Normality

Plotting Raw Data: histogram; box-and-whisker plot.
Plotting Probability: Q-Q plot; detrended Q-Q plot.
Skewness Test

Skewness test ($g_1$): when $g_1 > 0$ the data are skewed to the right, and when $g_1 < 0$ the data are skewed to the left.

$m_k = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^k}{n}, \qquad g_1 = \dfrac{m_3}{m_2^{3/2}}$

$\operatorname{var}(g_1) = \dfrac{6(n-1)}{(n+1)(n+3)}, \qquad Y = g_1 \sqrt{\dfrac{(n+1)(n+3)}{6(n-1)}}$

$Z = \delta \log\!\left(\dfrac{Y}{a} + \sqrt{\left(\dfrac{Y}{a}\right)^2 + 1}\right)$

$W^2 = -1 + \sqrt{2(B_2 - 1)}, \qquad \delta = \dfrac{1}{\sqrt{\log W}}, \qquad a = \sqrt{\dfrac{2}{W^2 - 1}}$

where $B_2$ is the kurtosis of $g_1$ under the null hypothesis of normality.
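The steps above can be sketched in code. One caveat: the slide leaves the constant $B_2$ implicit, so the sketch below fills it in with the standard expression for the kurtosis of $g_1$ under normality from D'Agostino (1970); treat that line as an assumption, and the function name `skewness_test` as ours:

```python
import math

def skewness_test(x):
    # Skewness test sketch: g1 = m3 / m2^(3/2), transformed to an
    # approximately standard normal Z (D'Agostino, 1970).
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    g1 = m3 / m2 ** 1.5
    var_g1 = 6.0 * (n - 1) / ((n + 1) * (n + 3))
    Y = g1 / math.sqrt(var_g1)
    # B2, the kurtosis of g1 under normality, is not on the slide; this
    # standard expression from D'Agostino (1970) is an assumption.
    B2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
          / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    W2 = -1.0 + math.sqrt(2.0 * (B2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(W2))   # delta = 1 / sqrt(log W)
    a = math.sqrt(2.0 / (W2 - 1.0))
    Z = delta * math.log(Y / a + math.sqrt((Y / a) ** 2 + 1.0))
    return g1, Z
```

For a perfectly symmetric sample, $g_1 = 0$ and the transformed $Z$ is 0 as well.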
The Kurtosis Test

The fourth-moment test for detecting symmetric departures from normality is based on $b_2$:

$b_2 = \dfrac{m_4}{m_2^2}$

$Z = \dfrac{\left(1 - \dfrac{2}{9A}\right) - \left[\dfrac{1 - 2/A}{1 + x\sqrt{2/(A-4)}}\right]^{1/3}}{\sqrt{2/(9A)}}$

where

$A = 6 + \dfrac{8}{\sqrt{\beta_1(b_2)}}\left[\dfrac{2}{\sqrt{\beta_1(b_2)}} + \sqrt{1 + \dfrac{4}{\beta_1(b_2)}}\right], \qquad x = \dfrac{b_2 - E(b_2)}{\sqrt{\operatorname{var}(b_2)}}$

and $\sqrt{\beta_1(b_2)}$ is the third standardized moment of $b_2$.
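A sketch of this test follows. The slide does not give $E(b_2)$, $\operatorname{var}(b_2)$, or $\sqrt{\beta_1(b_2)}$ explicitly, so the sketch fills them in with the standard finite-sample expressions used in the Anscombe–Glynn kurtosis test; those lines, and the name `kurtosis_test`, are assumptions:

```python
import math

def cbrt(v):
    # Sign-preserving cube root (the bracketed term may be negative).
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def kurtosis_test(x):
    # Kurtosis test sketch: b2 = m4 / m2^2, transformed to an
    # approximately standard normal Z.
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    b2 = m4 / m2 ** 2
    # Moments of b2 under normality: not on the slide; these standard
    # Anscombe-Glynn expressions are assumptions.
    mean_b2 = 3.0 * (n - 1) / (n + 1)
    var_b2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    sb1_b2 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
              * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    x_std = (b2 - mean_b2) / math.sqrt(var_b2)    # standardized b2
    A = 6.0 + (8.0 / sb1_b2) * (2.0 / sb1_b2
                                + math.sqrt(1.0 + 4.0 / sb1_b2 ** 2))
    Z = ((1.0 - 2.0 / (9.0 * A))
         - cbrt((1.0 - 2.0 / A)
                / (1.0 + x_std * math.sqrt(2.0 / (A - 4.0))))) \
        / math.sqrt(2.0 / (9.0 * A))
    return b2, Z
```

A flat (uniform-like) sample has $b_2$ well below 3, so the test reports a strongly negative $Z$.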
Absolute Moment Test: Geary's Test

Geary (1935) proposed the test

$a = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n\sqrt{m_2}}$

D'Agostino (1970):

$Z = \dfrac{\sqrt{n}\,(a - 0.7979)}{0.2123}$
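Geary's statistic is simple enough to compute directly; a minimal sketch (the name `geary_test` is ours):

```python
import math

def geary_test(x):
    # Geary's a = sum|x_i - xbar| / (n * sqrt(m2)); under normality a is
    # close to sqrt(2/pi) ~ 0.7979.  Z is D'Agostino's (1970) normal
    # approximation from the slide.
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    a = sum(abs(v - mean) for v in x) / (n * math.sqrt(m2))
    Z = math.sqrt(n) * (a - 0.7979) / 0.2123
    return a, Z
```

For example, for the sample 1..5 the statistic is $a = 6/(5\sqrt{2}) \approx 0.8485$.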
Goodness-of-Fit Tests

Empirical distribution function (EDF) tests: the hypothesized theoretical distribution (in our case the normal distribution) is expressed as $F_0(X)$.

$H_0\colon F(X) = F_0(X)$
$H_A\colon F(X) \neq F_0(X)$

The EDF for a sample is $F_n(X)$:

$F_n(x) = \begin{cases} 0 & x < x_{(1)} \\ i/n & x_{(i)} \le x < x_{(i+1)} \\ 1 & x_{(n)} \le x \end{cases}$

If no two observations are equal, the empirical distribution function is a step function that jumps $1/n$ in height at each observation $x_{(k)}$.
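The step-function definition can be implemented directly; a minimal sketch (the name `edf` is ours):

```python
def edf(sample):
    # Empirical distribution function Fn: the proportion of observations
    # at or below x.  With no ties this is a step function that jumps
    # 1/n at each observation.
    xs = sorted(sample)
    n = len(xs)
    def Fn(x):
        return sum(1 for v in xs if v <= x) / n
    return Fn

Fn = edf([3.0, 1.0, 2.0])
```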
Kolmogorov–Smirnov Test

Kolmogorov (1933) developed a one-sample test, and Smirnov (1939) independently developed a two-sample procedure. The vertical distance between the sample cumulative probability distribution and the hypothesized cumulative probability distribution can be obtained for each value of X. The test statistic is the largest such vertical distance:

$D_n = \sup_x \left| F_n(X) - F_0(X) \right|$
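Because $F_n$ is a step function, the supremum is attained at an order statistic, just before or just after a jump, so it suffices to check both sides of each jump. A sketch against a fully specified normal $F_0$ (function names are ours):

```python
import math

def phi(z):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_statistic(sample, mu=0.0, sd=1.0):
    # One-sample Kolmogorov-Smirnov statistic Dn = sup |Fn(x) - F0(x)|
    # against N(mu, sd^2).  At the i-th order statistic the candidate
    # distances are i/n - F0(x_(i)) and F0(x_(i)) - (i-1)/n.
    n = len(sample)
    d = 0.0
    for i, v in enumerate(sorted(sample), start=1):
        p = phi((v - mu) / sd)
        d = max(d, i / n - p, p - (i - 1) / n)
    return d
```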
Lilliefors Test

A Kolmogorov–Smirnov one-sample test for use when $\mu$ and $\sigma$ are unknown: the sample mean and sample SD are used as estimators of the population mean and population SD in the KS test.
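A sketch of the idea: the KS statistic with sample estimates plugged in. The choice of the $(n-1)$-denominator SD is our assumption, as is the name `lilliefors_statistic`; note that the critical values for this statistic differ from the plain KS table:

```python
import math

def phi(z):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lilliefors_statistic(sample):
    # KS statistic with mu and sigma replaced by the sample mean and the
    # (n-1)-denominator sample SD (the denominator choice is an assumption).
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    d = 0.0
    for i, v in enumerate(sorted(sample), start=1):
        p = phi((v - mean) / sd)
        d = max(d, i / n - p, p - (i - 1) / n)
    return d
```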
Kuiper's V Test

Combine $D^+$ and $D^-$ ($V = D^+ + D^-$) and obtain $V^*$.
Anderson–Darling Test (AD Test)

Anderson and Darling (1952):

$A^2 = -n - S, \qquad S = \sum_{i=1}^{n} \dfrac{2i-1}{n} \left[ \log(p_i) + \log(1 - p_{n-i+1}) \right]$

where $p_i = F_0(x_{(i)})$.
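A direct transcription of the formula, with $p_i$ taken as $F_0$ evaluated at the ordered sample against a fully specified normal (function names are ours):

```python
import math

def phi(z):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ad_statistic(sample, mu=0.0, sd=1.0):
    # Anderson-Darling A^2 = -n - S, with
    # S = sum_i (2i-1)/n * [log(p_i) + log(1 - p_{n-i+1})]
    # and p_i = F0(x_(i)) for the hypothesized N(mu, sd^2).
    xs = sorted(sample)
    n = len(xs)
    p = [phi((v - mu) / sd) for v in xs]
    S = sum((2 * i - 1) / n * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
            for i in range(1, n + 1))
    return -n - S
```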
Jarque–Bera Test (JB Test)

$JB = \dfrac{n}{6} \left[ S^2 + \dfrac{1}{4}(K - 3)^2 \right]$

where $S$ is the sample skewness and $K$ is the sample kurtosis.

Multivariate Normal Distribution

$f(x) = \dfrac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \exp\!\left( -\dfrac{1}{2} (X - \mu)' \Sigma^{-1} (X - \mu) \right)$
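The JB statistic is a short computation over the sample moments; a minimal sketch (the name `jarque_bera` is ours):

```python
def jarque_bera(x):
    # Jarque-Bera statistic JB = (n/6) * (S^2 + (K - 3)^2 / 4), where
    # S = m3 / m2^(3/2) is the sample skewness and K = m4 / m2^2 the
    # sample kurtosis.
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    S = m3 / m2 ** 1.5
    K = m4 / m2 ** 2
    return n / 6.0 * (S ** 2 + 0.25 * (K - 3.0) ** 2)
```

A flat symmetric sample has zero skewness but kurtosis near 1.8, so JB is driven entirely by the kurtosis term.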