
Normality test

Yi-Lung Chen, PhD


Department of Healthcare Administration, Asia University
Department of Psychology, Asia University

Normal distribution
• Normal distribution
– In probability theory, a normal distribution is a type of
continuous probability distribution for a real-valued random
variable
• A basic assumption for parametric tests (e.g., t-test,
ANOVA, regression)
Methods of testing normality
• Methods of testing normality
– Graphic methods
• Frequency distribution
• Q-Q plot
– Statistical tests
• Shapiro–Wilk test
• Kolmogorov–Smirnov test
– Skewness and kurtosis

Graphic methods

Frequency distribution
(histogram)

A bell-shaped histogram is the visual criterion for judging
whether the distribution is normal
Histogram
• Histogram
– is an approximate representation of the distribution of data
• X axis
– Value of variable
• Y axis
– Frequency (count)

Data for histogram
Raw data X: 1, 2, 2, 3, 3, 3, 4, 4, 5

Unique value   Frequency
1              1
2              2
3              3
4              2
5              1
Figure of histogram

Unique value   Frequency
1              1
2              2
3              3
4              2
5              1

[Figure: histogram of the data, Value (1–5) on the x axis, Frequency (0–4) on the y axis]
A practice

Raw data X: 0, 0, 1, 1, 2, 3, 3, 4, 4

Please calculate the summary data and draw a histogram
Answer for the practice

Unique value   Frequency
0              2
1              2
2              1
3              2
4              2

[Figure: histogram of the data, Value (0–4) on the x axis, Frequency (0–3) on the y axis]
Demo of histogram in the SPSS

1. Put variable in “variable(s)”
2. Click “Charts”
3. Check “Histogram”
4. Check “Show normal curve” to compare the data with the normal curve
Results

Summary data Histogram

The histogram is close to the normal curve
Q-Q plot
• Q–Q (quantile-quantile) plot
– is a probability plot, which is a graphical method for
comparing two probability distributions by plotting their
quantiles against each other.

Quantile
• Quantile
– quantiles are cut points dividing the range of a probability
distribution into continuous intervals with equal
probabilities, or dividing the observations in a sample in
the same way
• Two expressions
– q-quantiles
» q is an integer
• Range from 2 to ∞
– cumulative distribution-quantiles
» cumulative distribution is probability
• Range from 0 to 1

q-quantiles
• q-quantiles
– q is an integer
• It divides a finite set of values into q subsets of (nearly)
equal size
– If q = 2
» We split the data into 2 equal subsets
– If q = 4
» We split the data into 4 equal subsets
The number of cut points for quantiles

• The number of cut points for quantiles
– q minus 1
• Where q is the number of parts we want for the data
– If we want to split our data into 2 parts, we need 1 cut
point
» For example, for the data (1, 2, 3), we cut at 2

– If we want to split our data into 3 parts, we need 2 cut
points
» For example, for the data (1, 2, 3), we cut between 1 and 2
and between 2 and 3
Specialized quantiles
• Specialized quantiles
– We usually use a probability with a quantile
• 2-quantile
– It is exactly the median
• 4-quantiles
– quartiles
• 100-quantiles
– percentiles
Another expression
• Expression of quantile based on the cumulative distribution
– Cumulative Distribution Function (c.d.f.)
• The cumulative distribution function (CDF) at x gives
the probability that the random variable is less than or
equal to x: F_X(x) = P(X ≤ x), calculated as the running
sum of the probabilities
– Cumulative Distribution Function
» Ranges from 0 to 1
Calculation of C. D. F.
X Probability Cumulative distribution
1 0.1 (1/10) 0.1 (1/10)
2 0.1 (1/10) 0.2 (2/10)
3 0.1 (1/10) 0.3 (3/10)
4 0.1 (1/10) 0.4 (4/10)
5 0.1 (1/10) 0.5 (5/10)
6 0.1 (1/10) 0.6 (6/10)
7 0.1 (1/10) 0.7 (7/10)
8 0.1 (1/10) 0.8 (8/10)
9 0.1 (1/10) 0.9 (9/10)
10 0.1 (1/10) 1.0 (10/10)

A dataset of 10 equally likely observations
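The running-sum construction in the table above can be sketched in Python (a minimal sketch; the data are the ten equally likely observations from the table):

```python
from itertools import accumulate

# Ten equally likely observations, each with probability 1/10
probs = [0.1] * 10

# The CDF at each value is the running sum of the probabilities
cdf = list(accumulate(probs))

for x, (p, c) in enumerate(zip(probs, cdf), start=1):
    print(f"X={x:2d}  P={p:.1f}  CDF={c:.1f}")
```

The last cumulative value is 1.0, matching the final row of the table.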
Normal distribution and its CDF

[Figure: left panel, the normal distribution (probability density); right panel, its C.D.F. (cumulative probability)]
Expression of quantile based on
cumulative distribution
• We use the cumulative distribution to express quantiles
– For a 2-quantile
• It splits the data into 2 parts
– 50% and 50%
» 0.5 of the cumulative distribution
• So, we can say a 0.5 quantile or 50% quantile
– For 4-quantiles
• They split the data into 4 parts
– 25%, 50%, 75%
» So, we can say they are the 0.25, 0.5, and 0.75 quantiles
A cut point to split data

Data: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

When we have 10 observations and split the data into 2 equal-sized
groups (5 in group 1 and 5 in group 2), this quantile is the 0.5 or
50% quantile, because it splits the data into two halves of equal size.
We cut between 5 and 6, so 5.5 is a good value.
A practice

Data: 1, 2, 3

If we want a 0.5 quantile, what is it?
Answer

Data: 1, 2, 3

It is 2, because it cuts the data so that 50% (the value 1) lies below
and 50% (the value 3) lies above.
Many ways to calculate quantiles

• There are 9 methods to calculate quantiles
– They differ because of
• Unbiased estimation
• Different derivations
• Linear interpolation
– So it is common to see small differences when we report
quantiles and quantile-related methods

– When the data are large, all methods give very close results
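The differences between calculation methods can be seen with NumPy's `np.quantile`, whose `method` parameter (NumPy ≥ 1.22) selects among these rules — a small sketch, not SPSS's exact algorithm:

```python
import numpy as np

data = np.arange(1, 11)  # 1, 2, ..., 10

# The same quantile under a few of NumPy's interpolation methods
linear   = np.quantile(data, 0.5, method="linear")
lower    = np.quantile(data, 0.5, method="lower")
higher   = np.quantile(data, 0.5, method="higher")
midpoint = np.quantile(data, 0.5, method="midpoint")

print(linear, lower, higher, midpoint)
```

For the median of 1–10, "linear" and "midpoint" give 5.5, while "lower" and "higher" give 5 and 6: the small differences the slide mentions.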
Q-Q plots compare the quantiles of
your data with the quantiles of a
normal distribution
Steps for Q-Q plot
• Steps for Q-Q plot
– 1. Sort the data
– 2. Calculate the Z score of the original data
– 3. Calculate the theoretical normal cumulative distribution of the
sorted data from its rank
– 4. Calculate the quantile (Z score) of the normal distribution for
each cumulative probability
– 5. Plot the two Z scores against each other

The theoretical cumulative distribution (cd_i) for a normal distribution is

cd_i = (r_i − 0.5) / n

Where r_i is the i-th observation’s rank
n is the total sample size
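The five steps can be sketched in Python with NumPy and SciPy (a minimal sketch using the nine-point example data worked through below; variable names are my own):

```python
import numpy as np
from scipy.stats import norm

x = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

# Step 1: sort the data
x_sorted = np.sort(x)

# Step 2: Z score of the observed data (sample SD, ddof=1)
z_obs = (x_sorted - x.mean()) / x.std(ddof=1)

# Step 3: theoretical cumulative distribution from the ranks: (r - 0.5) / n
n = len(x)
ranks = np.arange(1, n + 1)
cd = (ranks - 0.5) / n

# Step 4: expected normal quantiles (Z scores) for those probabilities
z_exp = norm.ppf(cd)

# Step 5: plot z_obs against z_exp (e.g., with matplotlib) and compare
# with the 45-degree reference line y = x
for o, e in zip(z_obs, z_exp):
    print(f"observed {o:5.2f}   expected {e:5.2f}")
```
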
An example to calculate Q-Q plot

X: 1, 2, 2, 3, 3, 3, 4, 4, 5
Calculation of Z score of X

X   Z score of observed X
1   -1.63
2   -0.82
2   -0.82
3   0.00
3   0.00
3   0.00
4   0.82
4   0.82
5   1.63

Mean = 3, SD = 1.22
Ranking X and its theoretical
normal cumulative distribution

X   Z score of observed X   Rank
1   -1.63                   1
2   -0.82                   2
2   -0.82                   3
3   0.00                    4
3   0.00                    5
3   0.00                    6
4   0.82                    7
4   0.82                    8
5   1.63                    9

cd_i = (r_i − 0.5) / n
Ranking X and its theoretical
cumulative distribution

X   Z score of observed X   Rank   Cumulative distribution
1   -1.63                   1      (1-0.5)/9
2   -0.82                   2      (2-0.5)/9
2   -0.82                   3      (3-0.5)/9
3   0.00                    4      (4-0.5)/9
3   0.00                    5      (5-0.5)/9
3   0.00                    6      (6-0.5)/9
4   0.82                    7      (7-0.5)/9
4   0.82                    8      (8-0.5)/9
5   1.63                    9      (9-0.5)/9

cd_i = (r_i − 0.5) / n
Z score of expected normal quantile

X   Z score of observed X   Rank   Cumulative distribution   Z score of expected X
1   -1.63                   1      0.06                      -1.59
2   -0.82                   2      0.17                      -0.97
2   -0.82                   3      0.28                      -0.59
3   0.00                    4      0.39                      -0.28
3   0.00                    5      0.50                      0.00
3   0.00                    6      0.61                      0.28
4   0.82                    7      0.72                      0.59
4   0.82                    8      0.83                      0.97
5   1.63                    9      0.94                      1.59
Plot two Z scores

Z score of observed X   Z score of expected X
-1.63                   -1.59
-0.82                   -0.97
-0.82                   -0.59
0.00                    -0.28
0.00                    0.00
0.00                    0.28
0.82                    0.59
0.82                    0.97
1.63                    1.59
Plot two Z scores

[Figure: scatterplot of the observed Z scores against the expected Z scores]

If the observed scores are close to the expected values, we say the data
have normality.

So, we usually add the expected values as a reference line,
which follows the 45° line y = x.
A little difference between
statistical software

[Figure: Q-Q plots of the same data in SPSS, R (with the car package), and SAS]

Q-Q plots also differ a little across statistical software,
based on the different methods used to calculate quantiles.
We used the same method as R and SAS.
Demo of Q-Q plot in the SPSS

1. “Test distribution is normal” is the default setting;
you can change it if you assume another distribution.
2. Check “Standardized values” (Z scores);
it then reports Z scores; if unchecked, it reports the original scale of the variable.
Results

The observed values are close to the expected normal values (the 45° line)
The main disadvantage of
graphic methods
• The main disadvantage of graphic methods
– It is subjective: different researchers may disagree about
what counts as a deviation from a normal distribution

Is this a normal distribution?
Test of normality

Test of normality
• Test of normality
– A test of normality gives us a p-value to determine
whether the data are normal

• Methods of testing normality
– Shapiro–Wilk test
– Kolmogorov–Smirnov test
Kolmogorov–Smirnov vs. Shapiro–Wilk test

• Kolmogorov–Smirnov test
– It has been reported that the K–S test has low power and it
should not be seriously considered for testing normality.

• Shapiro–Wilk test
– It is preferable that normality be assessed both visually
and through normality tests, of which the Shapiro–Wilk test,
provided by the SPSS software, is highly recommended.

Ghasemi, A., & Zahediasl, S. (2012). Normality tests for statistical analysis: a guide for non-statisticians.
International Journal of Endocrinology and Metabolism, 10(2), 486-489. doi:10.5812/ijem.3505
Shapiro–Wilk test
• Hypotheses
– Null hypothesis: normality
– Alternative hypothesis: non-normality

• Shapiro–Wilk formula

W = (Σ a_i × (x_(n+1-i) − x_(i)))² / Σ (x_i − x̄)²

Where x_(i) is the i-th value of the data ordered from smallest,
a_i is the coefficient of the order statistics of a sample of size n from
a normal distribution,
n is the sample size
Shapiro–Wilk tables

[Tables: coefficients a_i and p-values of W, looked up by sample size n]
An example
Raw data: 55, 35, 45, 70, 58, 61, 63, 65, 68, 86, 72, 74
An example
Raw data   Sorted data   Order
55         35            1
35         45            2
45         55            3
70         58            4
58         61            5
61         63            6
63         65            7
65         68            8
68         70            9
86         72            10
72         74            11
74         86            12
An example
Sorted data   (x_i − x̄)²
35            (35−62.7)²
45            (45−62.7)²
55            (55−62.7)²
58            (58−62.7)²
61            (61−62.7)²
63            (63−62.7)²
65            (65−62.7)²
68            (68−62.7)²
70            (70−62.7)²
72            (72−62.7)²
74            (74−62.7)²
86            (86−62.7)²
Mean = 62.7   Sum = 2008.7

W = (Σ a_i × (x_(n+1-i) − x_(i)))² / Σ (x_i − x̄)²
  = (Σ a_i × (x_(n+1-i) − x_(i)))² / 2008.7
An example
Order   a_i      n+1−i        x_(n+1-i)   x_(i)   x_(n+1-i) − x_(i)   a_i × (x_(n+1-i) − x_(i))
1       0.5475   12+1−1=12    86          35      51                  27.9
2       0.3325   12+1−2=11    74          45      29                  9.6
3       0.2347   12+1−3=10    72          55      17                  4.0
4       0.1586   12+1−4=9     70          58      12                  1.9
5       0.0922   12+1−5=8     68          61      7                   0.6
6       0.0303   12+1−6=7     65          63      2                   0.1
                                                  Sum = 44.2
n (sample size) = 12                              44.2² = 1953.6

W = 1953.6 / 2008.7 = 0.97
Critical value of W

A smaller W value indicates a tendency toward non-normality.

The critical value of W is 0.859 when n = 12.

Because our result is 0.97, which is larger than 0.859,
the data are consistent with normality.
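As a check, SciPy's `scipy.stats.shapiro` implements this test; its coefficients come from an approximation, so the W it reports may differ slightly from the hand calculation above:

```python
from scipy.stats import shapiro

# The twelve observations from the worked example
data = [55, 35, 45, 70, 58, 61, 63, 65, 68, 86, 72, 74]

w, p = shapiro(data)
print(f"W = {w:.3f}, p = {p:.3f}")
```

A W near the hand-computed 0.97 with a large p-value again suggests normality for this sample.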
A practice
Raw data: 1, 3, 4, 2
a_1 = 0.6872, a_2 = 0.1677

Please calculate the Shapiro–Wilk W for this data
Answer
Sorted data   (x_i − x̄)²   a_i      x_(n+1-i) − x_(i)   a_i × (x_(n+1-i) − x_(i))
1             (1−2.5)²     0.6872   4−1=3               2.06
2             (2−2.5)²     0.1677   3−2=1               0.17
3             (3−2.5)²                                  Sum = 2.23
4             (4−2.5)²                                  2.23² = 4.97
Mean = 2.5    Sum = 5

n (sample size) = 4

W = 4.97 / 5 = 0.99

(Note: W can never exceed 1; rounding the sum too early gives an
impossible value.)
A deep look at the formula

Shapiro–Wilk formula

W = (Σ a_i × (x_(n+1-i) − x_(i)))² / Σ (x_i − x̄)²

The numerator measures the symmetry of the distribution (skewness);
the denominator is the sum of squares (variance).
Sensitive to symmetry of the
distribution (skewness)

[Figure: two histograms, a symmetric one (values 1–5) and an asymmetric one (values 1–8)]

W = (Σ a_i × (x_(n+1-i) − x_(i)))² / Σ (x_i − x̄)²; because a_i is smaller
than 1, when there is asymmetry, the increase in the denominator is
stronger than in the numerator.
Demo of Shapiro-Wilk test in the
SPSS

1. Put variable in dependent list
2. Go to “Plots” and check “normality plots with tests”

Results

Because the p-value of the Shapiro–Wilk test is 0.922,
we retain the null hypothesis that our data have normality.
When we want to conduct a t-test
with 2 groups, should the normality
of the outcome variable be tested
in the whole sample or in the
two groups separately?
Normality between groups

Two sample t-test assumes normality. Therefore, it can be


used when the normality is satisfied through the normality
test. In this case, the normality test should be performed for
each group, and it can be said that the normality is satisfied
when the normality is satisfied in both groups.

Kwak, S. G., & Park, S.-H. (2019). Normality Test in Clinical Research. Journal of
Rheumatic Diseases, 26(1), 5. doi:10.4078/jrd.2019.26.1.5
Shortcomings of these tests
• Shortcomings of these tests
– They are suitable for moderate sample sizes because of
sample-size bias
• When the sample size is small, they are conservative (tend to
indicate normality)
– But we are usually not sure about normality with a small
sample
• When the sample size is large, they are too sensitive (tend to
indicate non-normality)
Skewness and kurtosis are alternative
method to test normality

Skewness
• Skewness
– is a measure of the “asymmetry” of the probability
distribution of a real-valued random variable about its
“mean”

[Figure: negatively and positively skewed distributions with the mean marked]
Symmetry

[Figure: examples of symmetry, from Wikipedia]
Negative or positive skewness

[Figure: negatively and positively skewed distributions]
Sample skewness
• Sample skewness
– adjusted Fisher–Pearson standardized moment coefficient of
skewness (there are several methods to calculate sample
skewness)

G1 = [n / ((n − 1) × (n − 2))] × Σ (x_i − x̄)³ / s³

Where s is the sample standard deviation
An example of calculation of
sample skewness
x_i   Mean   SD     x_i − mean   Cubed
1     2.25   1.25   -1.25        -1.95
2                   -0.25        -0.02
2                   -0.25        -0.02
4                   1.75         5.36
                    Sum          3.38
SD is the sample standard deviation
An example of calculation of
sample skewness

G1 = (4 × 3.38) / (1.25³ × 3 × 2)

   = 1.13
SE of skewness
• SE of skewness

SE = √[ 6n × (n − 1) / ((n − 2) × (n + 1) × (n + 3)) ]

n = sample size
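The sample skewness and its standard error can be computed together (a minimal sketch; `scipy.stats.skew` with `bias=False` gives the adjusted Fisher–Pearson coefficient used in the example above):

```python
import math
from scipy.stats import skew

data = [1, 2, 2, 4]  # the four-point example above
n = len(data)

# Adjusted Fisher-Pearson sample skewness
g1 = skew(data, bias=False)

# Standard error of skewness
se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# Z value, as used later for the normality cut-offs
z = g1 / se
print(f"skewness = {g1:.2f}, SE = {se:.2f}, Z = {z:.2f}")
```
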
Meaning of quadratic and
cubic terms

• If we have a center 0 and three points (−1, 1, 2)
– Quadratic term (non-negative feature)
• (−1 − 0)² = 1
• (1 − 0)² = 1
• (2 − 0)² = 4
– Distance of all points from 0
Meaning of quadratic and
cubic terms (II)

• If we have a center 0 and three points (−1, 1, 2)
– Cubic term (sign-preserving feature)
• (−1 − 0)³ = −1
• (1 − 0)³ = 1
• (2 − 0)³ = 8
– We can detect differences between the two sides
Impact of parameters
• Sample skewness

– If Σ (x_i − x̄)³ = 0 (symmetric), the sample skewness = 0

– If the sample standard deviation is large, the sample skewness
is small
– If the sample size is large, the sample skewness is small
Kurtosis
• Kurtosis
– a measure of the "tailedness" of the probability distribution
of a real-valued random variable
• Fat tailed, heavy tailed

[Figure: a fat-tailed and a thin-tailed distribution]
Unbiased estimate of sample
kurtosis
• Unbiased estimate of sample kurtosis

G2 = [n(n + 1) / ((n − 1)(n − 2)(n − 3))] × Σ ((x_i − x̄)/s)⁴ − 3(n − 1)² / ((n − 2)(n − 3))

k2 is the unbiased estimate of the second cumulant
(identical to the unbiased estimate of the sample variance);
this is the adjusted Fisher–Pearson standardized moment coefficient of kurtosis

Joanes, Derrick N.; Gill, Christine A. (1998), "Comparing measures of sample skewness and
kurtosis", Journal of the Royal Statistical Society, Series D, 47 (1): 183–189
Impact of parameters
• Unbiased estimate of sample kurtosis

– If there are many outliers (x_i), the kurtosis will increase

– Because we use the quartic term, the direction of outliers
(greater or smaller than the mean) does not matter
• For example, (−5)⁴ = (5)⁴
Impact of parameters
• Unbiased estimate of sample kurtosis

– When the sample size increases, the estimate usually
decreases
• For example, keeping the middle term Σ ((x_i − x̄)/s)⁴ = 1, we
compare n = 10 and n = 20

n = 10: (11 × 10)/(9 × 8 × 7) × 1 − 3 × 9²/(8 × 7) = 110/504 − 243/56 = 0.22 − 4.34 = −4.12
n = 20: (21 × 20)/(19 × 18 × 17) × 1 − 3 × 19²/(18 × 17) = 420/5814 − 1083/306 = 0.07 − 3.54 = −3.47
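The two terms of the comparison can be checked with a small helper (a sketch; `middle` stands for the Σ ((x − x̄)/s)⁴ term held at 1, and the function name is my own):

```python
def excess_kurtosis_terms(n, middle=1.0):
    """Unbiased excess kurtosis with the middle term held fixed."""
    first = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * middle
    second = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - second

print(round(excess_kurtosis_terms(10), 2))  # -4.12
print(round(excess_kurtosis_terms(20), 2))  # -3.47
```

The estimate moves toward 0 − 3 as n grows, matching the slide's point that larger samples shrink the estimate.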
Impact of outliers
• Outlier
– Both the skewness test and the kurtosis test are very
sensitive outlier detectors
• One outlier will make the distribution appear skewed

• Two symmetric outliers will make the tails appear heavy
Plot

[Figure: the same distribution without outliers and with two outliers]
Suggestions of non-normality of
kurtosis and skewness
• Non-normality based on different sample sizes using common
statistical software (e.g., SPSS or SAS)
– n < 50
• Skewness and kurtosis
– |Z| > 1.96
– 50 ≤ n < 300
• Skewness and kurtosis
– |Z| > 3.29
– n > 300
• Skewness
– absolute skewness value > 2
• Kurtosis
– absolute kurtosis value > 4

Kim, H-Y. (2013). Statistical notes for clinical researchers: Assessing normal distribution (2)
using skewness and kurtosis. Restorative Dentistry and Endodontics 38, 52–54
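These cut-offs can be written as a small decision helper (a hedged sketch of the rules as summarized on this slide; the function name and the n = 30 in the usage example are my own assumptions, not from the source):

```python
def suggests_normality(n, skew, se_skew, kurt, se_kurt):
    """Sample-size-dependent skewness/kurtosis cut-offs (as per Kim, 2013)."""
    if n < 50:
        return abs(skew / se_skew) <= 1.96 and abs(kurt / se_kurt) <= 1.96
    elif n < 300:
        return abs(skew / se_skew) <= 3.29 and abs(kurt / se_kurt) <= 3.29
    else:
        # For large samples, use the raw statistics rather than Z values
        return abs(skew) <= 2 and abs(kurt) <= 4

# Using the SPSS output shown later in these slides:
# skewness -0.518 (SE 0.637), kurtosis -0.747 (SE 1.232)
print(suggests_normality(30, -0.518, 0.637, -0.747, 1.232))  # True
```
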
Demo of skewness and kurtosis in
the SPSS

1. Go to “Statistics”
2. Check “skewness” and “kurtosis”

Results

Z value of skewness = −0.518 / 0.637 = −0.81

Z value of kurtosis = −0.747 / 1.232 = −0.61

Both |Z| values are below 1.96, suggesting normality.
Data transformation

Data transformation
• Data transformation
– It is usually applied so that the data appear to more closely
meet the assumptions of a statistical inference procedure
that is to be applied, or to improve the interpretability or
appearance of graphs
• Non-normality to normality

Common transformation methods

• Common transformation methods
– Square-root transformation
• X′ = √X
– Cube-root transformation
• X′ = ∛X
– Logarithmic transformation
• X′ = log(X)
– The base can be e (Euler's number) or 10
» These methods can be used to address right skew
Effect of transformation

Original   sqrt       cube root   log(e)
1          1          1           0
2          1.414214   1.259921    0.693147
4          2          1.587401    1.386294
5          2.236068   1.709976    1.609438
10         3.162278   2.154435    2.302585
50         7.071068   3.684031    3.912023
80         8.944272   4.308869    4.382027
100        10         4.641589    4.60517
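The table can be reproduced with Python's math module (a quick check of the three transformations, using the natural log):

```python
import math

values = [1, 2, 4, 5, 10, 50, 80, 100]

for x in values:
    sqrt_x = math.sqrt(x)     # square root
    cbrt_x = x ** (1 / 3)     # cube root
    log_x = math.log(x)       # natural log (base e)
    print(f"{x:5d}  {sqrt_x:9.6f}  {cbrt_x:9.6f}  {log_x:9.6f}")
```
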
Illustration

[Figure: distributions before and after each transformation]

Square-root, cube-root, and log transformations can be used to address
right-skewed data, but the latter two have a stronger effect on the distribution shape.
https://fahimahmad.netlify.app/posts/methods-for-transforming-data-to-normal-distribution-in-r/
Data transformation in the SPSS

Data transformation in the SPSS

1. Name the new variable
2. Enter the transformation formula
3. Click OK
Results

One cause of non-normality
is the “outlier”
To address outliers
• Problems caused by outliers
– non-normality
– unequal variances

• Methods to address outliers
– Trimming or truncation
– Winsorizing or winsorization
Trimming or truncation
• Trimming or truncation
– Removing outliers from your data
• Common definitions of outlier
– 99.7%
» Z score of ±3
– 98.8%
» Z score of ±2.5
– 95%
» Z score of ±1.96
• The definition sometimes depends on your sample size:
when you have a small sample, you usually do not want
to delete too many observations, so you may set a
stricter definition of outlier, for example, ±2.5.
Winsorizing or winsorization
• Winsorizing or winsorization
– Named after Charles P. Winsor: we first define the
cut-off value for outliers, then we replace each outlier
with the cut-off value
An example

Original data (Z scores): −3, −2.5, −2, −1, 0, 1, 2, 2.5, 3

If we define an outlier as a Z score beyond ±2.5, what are the
results of trimming and winsorizing?
The results of
trimming and winsorizing
Original      Trimming   Winsorizing
-3            -          -2.5
-2.5          -          -2.5
-2            -2         -2
-1            -1         -1
0             0          0
1             1          1
2             2          2
2.5           -          2.5
3             -          2.5
Sample size   9   5   9
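Both procedures can be sketched with NumPy (the data and ±2.5 cut-off follow the example; note the slide trims values at the cut-off, while winsorizing keeps them there):

```python
import numpy as np

z = np.array([-3, -2.5, -2, -1, 0, 1, 2, 2.5, 3])
cut = 2.5

# Trimming: drop every observation at or beyond the cut-off
trimmed = z[np.abs(z) < cut]

# Winsorizing: replace extreme observations with the cut-off value
winsorized = np.clip(z, -cut, cut)

print(trimmed)      # five values: -2, -1, 0, 1, 2
print(winsorized)   # nine values, with +/-3 replaced by +/-2.5
```

Trimming shrinks the sample to 5, while winsorizing keeps all 9 observations, matching the table.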
Calculating Z scores
Results

You get a new standardized variable
named “Z + original variable name”.

For example, if your original variable is
x, then you get a variable named “Zx”.
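Outside SPSS, the same standardization can be sketched in NumPy (a minimal sketch; like SPSS, it uses the sample standard deviation, and the data reuse the Shapiro–Wilk example above):

```python
import numpy as np

x = np.array([55, 35, 45, 70, 58, 61, 63, 65, 68, 86, 72, 74])

# Standardize: subtract the mean, divide by the sample SD (ddof=1)
zx = (x - x.mean()) / x.std(ddof=1)

print(zx.round(2))
```

The standardized variable has mean 0 and sample standard deviation 1.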
Thanks for listening
Q&A
