
 The normal distribution is a descriptive model that describes many real-world variables such as age and intelligence.
 It is defined as a continuous frequency distribution of
infinite range (it can take any real value, not just integers
as in the case of discrete probability distributions).
 Many sampling distributions based on large samples can
be approximated by the normal distribution, even when
the parent population is not normal (e.g., the binomial and
Poisson distributions).
 This is the most important probability distribution in
statistics and an important tool in data analysis across
almost all sciences.
 It links frequency distributions to probability distributions.
 Tests of significance are developed based on the
assumption of normality. The t-test and ANOVA, used to
compare group means, assume normality and are
known as parametric tests.
 A continuous random variable can assume any
value in an interval on the real line or in a
collection of intervals.
 It is not meaningful to talk about the probability of
the random variable assuming any particular value,
since that probability is zero.
 Instead, we talk about the probability of the
random variable assuming a value within a given
interval.
 The probability of the random variable assuming
a value within some given interval from x1 to x2 is
defined to be the area under the graph of the
probability density function between x1 and x2.
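A minimal sketch of this idea in Python (using SciPy; the mean, standard deviation, and interval endpoints below are illustrative assumptions, not values from the text): the probability that X falls between x1 and x2 is the difference of the cumulative distribution function at the two endpoints.

from scipy.stats import norm

mu, sigma = 100, 15   # assumed example parameters
x1, x2 = 85, 115      # assumed interval of interest

# Area under the density between x1 and x2, via the CDF
p = norm.cdf(x2, loc=mu, scale=sigma) - norm.cdf(x1, loc=mu, scale=sigma)
print(f"P({x1} < X < {x2}) = {p:.4f}")   # ~0.6827 (one SD to each side)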
 The entire family of normal distributions is
characterized by two parameters: the mean µ and the
standard deviation σ.
 The curve is bell-shaped and symmetric
about the line x = µ.
 A random variable following a normal distribution with mean µ and
variance σ² is written X ~ N(µ, σ²).
 Normal probability density function:

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}

where µ is the mean and σ is the standard deviation.
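As a quick cross-check of the formula, the sketch below (Python with NumPy and SciPy; parameter values are assumed for illustration) evaluates the density directly and compares it with SciPy's built-in normal density.

import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 1.0               # assumed example parameters
x = np.linspace(-4, 4, 9)

# Density computed from the formula above
f_manual = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Same density from SciPy, as a cross-check
f_scipy = norm.pdf(x, loc=mu, scale=sigma)

print(np.allclose(f_manual, f_scipy))  # True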
 The highest point on the normal curve is at the mean,
which lies at the center; the mean, median, and mode
of a normal distribution coincide.
 The shape of the distribution is determined by the
standard deviation: a large SD reduces the height
and increases the spread of the curve, while a small
SD increases the height and reduces the spread.
 The total area under the curve for a normal
distribution is 1. The area under the curve to
each side of the mean is 0.5.
 The curve extends indefinitely in both
directions, approaching, but never touching the
horizontal axis.
One standard deviation away from the mean (µ) in either direction on
the horizontal axis accounts for around 68 percent of the data. Two
standard deviations away from the mean account for roughly 95
percent of the data, and three standard deviations for about
99.7 percent of the data.
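These percentages can be verified with a short sketch (Python with SciPy); because the rule is stated in units of standard deviations, the standard normal distribution suffices.

from scipy.stats import norm

# Probability mass within k standard deviations of the mean
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {p:.4f}")
# within 1 SD: 0.6827, within 2 SD: 0.9545, within 3 SD: 0.9973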
[Figure: normal curves with the same standard deviation σ but different means µ1, µ2, µ3, and normal curves with the same mean µ but different standard deviations σ1, σ2, σ3.]
 Skewness is a measure of symmetry.
 Skewness = 0 indicates that the
distribution is symmetrical about the
midpoint.
 Positive values of skewness (significantly
more than 0) indicate that the
distribution is skewed to the right.
 Negative values of skewness (significantly
less than 0) indicate that the distribution
is skewed to the left.
 For a normal distribution, skewness = 0.
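A minimal sketch (Python with NumPy and SciPy; the simulated samples are assumptions for illustration) showing skewness near 0 for symmetric data and clearly positive skewness for right-skewed data:

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
normal_data = rng.normal(size=10_000)        # symmetric
right_skewed = rng.exponential(size=10_000)  # long right tail

print(f"normal:       {skew(normal_data):+.3f}")   # close to 0
print(f"right-skewed: {skew(right_skewed):+.3f}")  # clearly positive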
[Figure: positively skewed curve, normal curve (skewness = 0), negatively skewed curve.]
 Kurtosis measures the peakedness of the
curve.
 Kurtosis of more than 3 indicates that the
distribution has heavier tails and a sharper
peak than a normal distribution.
(Leptokurtic)
 Kurtosis of less than 3 indicates that the
distribution has lighter tails and a flatter
peak than a normal distribution.
(Platykurtic)
 For a normal distribution, kurtosis = 3.
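A minimal sketch (Python with NumPy and SciPy; the simulated samples are illustrative assumptions). Note that SciPy reports excess kurtosis (normal = 0) by default; fisher=False gives the Pearson definition used above (normal = 3).

import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
normal_data = rng.normal(size=10_000)
heavy_tailed = rng.standard_t(df=5, size=10_000)  # heavier tails than normal

# fisher=False: Pearson kurtosis (normal = 3, as in the text)
print(f"normal:       {kurtosis(normal_data, fisher=False):.3f}")   # ~3
print(f"heavy-tailed: {kurtosis(heavy_tailed, fisher=False):.3f}")  # > 3 (leptokurtic)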
 A random variable that has a normal distribution
with a mean of zero and a standard deviation of
one is said to have a standard normal
probability distribution.
 The letter z is commonly used to designate this
normal random variable.
 The following expression converts any normal
distribution into the standard normal
distribution:

z = \frac{x - \mu}{\sigma}
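A minimal sketch of standardization (Python with NumPy; µ and σ below are assumed example values):

import numpy as np

mu, sigma = 100, 15            # assumed example parameters
x = np.array([70, 100, 130])   # raw scores

z = (x - mu) / sigma           # standardized scores, z ~ N(0, 1)
print(z)                       # [-2.  0.  2.]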
 Statistical methods are based on various
underlying assumptions. One common
assumption is that a random variable is normally
distributed.
 In many statistical analyses, normality is often
conveniently assumed without any empirical
evidence or test. But normality is critical in many
statistical methods.
 When this assumption is violated, interpretation
and inference may not be valid.
 Hence one has to test for normality when the
distribution of the variable is not known.
The tests of normality can be grouped under two
categories:
 Graphical methods: these visualize the distribution
of the random variable, or the differences between
an empirical distribution and a theoretical
distribution (e.g., the standard normal distribution).
They are intuitive and easy to interpret but do
not provide objective criteria. Normality can be
assessed visually with histograms, P-P plots, and
Q-Q plots.
 Numerical methods: these present
summary statistics such as skewness and
kurtosis, or involve computation of a statistical
test, and provide an objective way of
examining normality. The Kolmogorov-Smirnov test
and the Shapiro-Wilk test are the two most widely used
tests (see the sketch after this list).
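A minimal sketch of both approaches (Python with NumPy, SciPy, and Matplotlib; the simulated sample is an assumption for illustration):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=200)  # assumed example data

# Graphical check: Q-Q plot of the sample against normal quantiles
stats.probplot(sample, dist="norm", plot=plt)
plt.show()

# Numerical checks: a small p-value (e.g. < 0.05) suggests non-normality
print("Shapiro-Wilk:", stats.shapiro(sample))
# The K-S test requires hypothesized parameters; estimating them from the
# sample makes the test conservative (a Lilliefors correction addresses this).
print("Kolmogorov-Smirnov:",
      stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1))))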
[Figure: histogram of the variable values, with frequency on the vertical axis.]
When a variable is not normally
distributed, one can transform it
using a suitable transformation.
If the transformed variable is normal,
then that variable can be used in the
analysis.
Three common transformations are (a sketch follows the list):
 the logarithmic transformation,
 the square root transformation, and
 the inverse transformation.
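A minimal sketch of the three transformations (Python with NumPy and SciPy; the right-skewed sample is simulated for illustration), using skewness as a rough gauge of how close each transformed variable is to symmetry:

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.8, size=5_000)  # right-skewed, positive

# The three common transformations (all require positive values here)
transforms = [("raw", x), ("log", np.log(x)),
              ("sqrt", np.sqrt(x)), ("inverse", 1.0 / x)]

for name, t in transforms:
    print(f"{name:8s} skewness = {skew(t):+.3f}")
# For log-normal data, the log transformation brings skewness close to 0.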
 When the transformations are unable to induce
normality for a variable, parametric methods
cannot be used to analyze that variable.
 One can use non-parametric tests as an alternative to
the parametric tests. Although these tests have less
power than the parametric tests, their assumptions
can be met easily, so the results and
conclusions will be valid.
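A minimal sketch of the non-parametric route (Python with NumPy and SciPy; the two groups are simulated for illustration): the Mann-Whitney U test as an alternative to the two-sample t-test when normality fails.

import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Two example groups with skewed (non-normal) distributions
group_a = rng.exponential(scale=1.0, size=40)
group_b = rng.exponential(scale=1.5, size=40)

# Non-parametric alternative to the two-sample t-test
stat, p = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")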
 The assumption of normality for a variable measured
on a nominal scale is inappropriate as one cannot
expect a nominal variable to follow a normal
distribution.
 When the variable is measured on an ordinal scale,
even if a transformation is able to induce normality,
the results are to be interpreted with caution, and one
may be required to defend treating an ordinal variable
as metric.
