LESSON 2 Statistical Theory-1
Learning Outcomes
Mean
The mean of a data set is the average of all the data values.
Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$
Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Example 1:
Below are the ages of the 11 executives of Company A. Find the mean.
35, 42, 38, 45, 48, 34, 32, 36, 39, 25, 26
Solution:
$\mu = \dfrac{35 + 42 + \cdots + 26}{11} = \dfrac{400}{11} \approx 36.4$
Example 2:
The grades of selected Public Administration students in Statistics 101 last
semester were collected as follows. Find the sample mean.
80 84 82 85 86
Solution:
$\bar{x} = \dfrac{80 + 84 + 82 + 85 + 86}{5} = \dfrac{417}{5} = 83.4$
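The two computations above can be checked with a short Python sketch. It relies on `statistics.mean` from the standard library; the variable names are only illustrative.

```python
from statistics import mean

# Ages of the 11 executives in Example 1 (treated as a population)
ages = [35, 42, 38, 45, 48, 34, 32, 36, 39, 25, 26]
# Grades of the 5 sampled students in Example 2
grades = [80, 84, 82, 85, 86]

mean_ages = mean(ages)      # 400 / 11
mean_grades = mean(grades)  # 417 / 5

print(round(mean_ages, 1))
print(round(mean_grades, 1))
```

The formula is the same in both cases; only the notation ($\mu$ or $\bar{x}$) depends on whether the data are a population or a sample.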
Median
The median is the value in the middle of the data set when arranged in
ascending/descending order. It is the preferred measure if the data set includes
outliers or extreme values.
26 18 27 12 14 72 19
Solution:
Population median = $\tilde{\mu}$
Sample median = $\tilde{x}$
Solution:
Again, arrange the data set in increasing order:
12 14 18 19 26 27 30 72
The median is the average of the two middle values:
$\tilde{x} = \dfrac{19 + 26}{2} = 22.5$
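As a quick check, the sketch below computes the median of the eight values with Python's `statistics.median` and compares it with the mean, which is pulled upward by the extreme value 72. The variable names are illustrative.

```python
from statistics import mean, median

# The eight values from the solution above (order does not matter;
# median() sorts the data internally)
values = [12, 14, 18, 19, 26, 27, 30, 72]

median_value = median(values)  # average of the two middle values, 19 and 26
mean_value = mean(values)      # 218 / 8, affected by the outlier 72

print(median_value)
print(mean_value)
```

The gap between the two results shows why the median is preferred when a data set contains outliers.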
Mode
The mode of a data set is the value that occurs with the greatest frequency. A
data set can have one mode, several modes, or no mode at all. If the data have exactly
two modes, the data set is called bimodal. If the data have more than two modes, it is
called multimodal.
Example 5:
No Mode
Raw Data : 10.3 4.9 8.9 11.7 6.3 7.7
This data set contains no mode, since all values have the same frequency.
One Mode
Raw Data : 6.0 4.9 6.0 8.9 6.3 4.9 4.9
The mode is 4.9, since it occurs 3 times, more often than any other value.
Population mode = $\mu_0$
Sample mode = $x_0$
The modes in the data set are 28 and 43, since both occur more often than any
other value.
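A minimal sketch of how the modes could be found in Python. The helper `find_modes` is a hypothetical name, not a library function, and it follows the convention used in this lesson that a data set whose values all share the same frequency has no mode.

```python
from collections import Counter

def find_modes(data):
    """Return the modes of a non-empty data set as a sorted list.

    An empty list means "no mode": every distinct value occurs equally often.
    """
    counts = Counter(data)
    highest = max(counts.values())
    # Several distinct values, all with the same frequency: no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

no_mode = find_modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7])
one_mode = find_modes([6.0, 4.9, 6.0, 8.9, 6.3, 4.9, 4.9])

print(no_mode)
print(one_mode)
```

A bimodal or multimodal data set would simply produce a list with two or more entries.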
Exercises:
Fifteen hotels were randomly sampled in Davao City. The prices per night
are presented below:
4250 4300 4300 4350 4350
4400 4400 4400 4450 4450
4500 4500 4500 4500 4500
Weighted Mean
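$\bar{x}_w = \dfrac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$
$\bar{x}_w = \dfrac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$ or $\bar{x}_w = \dfrac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$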
Note:
The weighted mean presented above is the sample weighted mean. The
notation for the population weighted mean is $\mu_w$, whose formula has the same
form as that of $\bar{x}_w$.
Example 6:
Ms. Pia Diaz received 83, 85, 90, 87, and 86 for her five 3-unit minor subjects,
and 76 for her 5-unit subject. Compute her grade point average (GPA).
Solution:
Grade Unit
83 3.0
85 3.0
90 3.0
87 3.0
86 3.0
76 5.0
$\bar{x}_w = \dfrac{(83)(3.0) + (85)(3.0) + (90)(3.0) + (87)(3.0) + (86)(3.0) + (76)(5.0)}{20} = \dfrac{1673}{20} \approx 83.7$
Example 7:
Ms. Catriona Green grades quizzes, 40%; homework, 10%; and final exam,
50%. A student had grades of 78, 95, and 75 respectively, for quizzes, homework, and
final exam. Find the student’s final grade.
Solution:
$\text{Grade} = \dfrac{(78)(0.40) + (95)(0.10) + (75)(0.50)}{0.40 + 0.10 + 0.50} = 78.2$
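Examples 6 and 7 apply the same formula with different kinds of weights (units and percentages). The sketch below shows this with a small helper; `weighted_mean` is a hypothetical name chosen for illustration.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 6: grades weighted by the number of units
gpa = weighted_mean([83, 85, 90, 87, 86, 76], [3, 3, 3, 3, 3, 5])

# Example 7: grades weighted by percentage
final_grade = weighted_mean([78, 95, 75], [0.40, 0.10, 0.50])

print(gpa)          # 1673 / 20, reported as 83.7 in Example 6
print(final_grade)  # approximately 78.2
```

Because the weights are divided out, they do not need to add up to 1 or to 100%.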
$\dfrac{(10)(2) + (5)(3) + (8)(1) + (12)(2) + (6)(1) + (2)(1)}{10}$
Range
The range of a set of data is the difference between the highest and the lowest
values of the data.
Range = Highest – Lowest
Example 8:
The pulse rates of two patients were recorded three times upon arrival at the
hospital. The pulse rates for patient A were 72, 76, and 74, while those for patient B
were 72, 91, and 59. Note that the mean pulse rate of the two patients is the same, 74,
but observe the difference in variability. Which patient's pulse rate varies more?
$R_A = 76 - 72 = 4$
$R_B = 91 - 59 = 32$
Patient B's pulse rates vary more, since $R_B > R_A$.
Note:
The range is easy to compute. It only depends on the two extreme values of a
set of data – the lowest and the highest. It does not tell us anything about the
dispersion of the values that fall between the two extremes.
Illustration:
SET A 3 18 18 18 18
SET B 3 7 16 16 18
SET C 3 6 9 14 18
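As the table shows, the values are distributed differently in each set: in set A
every value except the lowest equals 18, while in set C the values are spread across
the whole interval from 3 to 18. However, the range cannot detect these differences in
variability, since all three sets have the same range:
$R_A = 18 - 3 = 15$
$R_B = 18 - 3 = 15$
$R_C = 18 - 3 = 15$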
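The sketch below recomputes the ranges from Example 8 and from the illustration. The helper `data_range` is a hypothetical name; it looks only at the two extreme values, which is why it returns the same result for all three sets.

```python
def data_range(data):
    """Range = highest value minus lowest value."""
    return max(data) - min(data)

# Example 8: pulse rates of the two patients
range_a = data_range([72, 76, 74])
range_b = data_range([72, 91, 59])

# Illustration: three sets with the same extremes but different inner values
sets = {
    "A": [3, 18, 18, 18, 18],
    "B": [3, 7, 16, 16, 18],
    "C": [3, 6, 9, 14, 18],
}
ranges = {name: data_range(values) for name, values in sets.items()}

print(range_a, range_b)
print(ranges)
```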
Variance
The variance of a set of data is the average squared deviation of the values
from the mean.
Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Example 9:
The Human Resource and Management Office of Malaybalay City has to
choose one of two personnel for promotion. The human resource officer considers
their performance ratings in the last 5 years.
Personnel A: 87 83 84 94 92
Personnel B: 88 89 87 86 90
Which of the two should the human resource officer consider for promotion?
Solution:
$\bar{x}_A = \dfrac{87 + 83 + 84 + 94 + 92}{5} = 88$
$\bar{x}_B = \dfrac{88 + 89 + 87 + 86 + 90}{5} = 88$
$s_A^2 = 23.5$
$s_B^2 = 2.5$
The human resource officer should choose personnel B since he is more
consistent in his performance in the last 5 years.
Example 10:
The prices at all gasoline stations in Malaybalay in the last month were
recorded. The prices were as follows: 44, 44, 45, 45, 43, 46, 45, 47, 46, 44, 47, 44. Find
the variance of the prices.
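$\mu = \dfrac{44 + 44 + 45 + 45 + 43 + 46 + 45 + 47 + 46 + 44 + 47 + 44}{12} = \dfrac{540}{12} = 45$
$\sigma^2 = \dfrac{(44 - 45)^2 + (44 - 45)^2 + (45 - 45)^2 + \cdots + (44 - 45)^2}{12}$
$= \dfrac{(-1)^2 + (-1)^2 + (0)^2 + (0)^2 + (-2)^2 + (1)^2 + (0)^2 + (2)^2 + (1)^2 + (-1)^2 + (2)^2 + (-1)^2}{12}$
$= \dfrac{1 + 1 + 0 + 0 + 4 + 1 + 0 + 4 + 1 + 1 + 4 + 1}{12}$
$= \dfrac{18}{12}$
$= 1.5$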
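The difference between the two divisors, $n - 1$ for a sample and $N$ for a population, can be seen in a short Python sketch. It uses `statistics.variance` (sample) and `statistics.pvariance` (population) from the standard library; the variable names are illustrative.

```python
from statistics import pvariance, variance

# Example 9: five yearly ratings per person, treated as samples (divisor n - 1)
ratings_a = [87, 83, 84, 94, 92]
ratings_b = [88, 89, 87, 86, 90]
var_a = variance(ratings_a)
var_b = variance(ratings_b)

# Example 10: prices at all stations, treated as a population (divisor N)
prices = [44, 44, 45, 45, 43, 46, 45, 47, 46, 44, 47, 44]
var_prices = pvariance(prices)

print(var_a, var_b)
print(var_prices)
```

Calling `variance` on the prices instead would divide by 11 rather than 12 and give a slightly larger value.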
Standard Deviation
The standard deviation of a set of data is the square root of the variance, that
is, the square root of the average squared deviation of the values from the mean. It
has the same unit as the given data.
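Population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$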
Example 11:
Solution:
For Example 9,
$s_A = \sqrt{23.5} \approx 4.8$
$s_B = \sqrt{2.5} \approx 1.6$
For Example 10,
$\sigma = \sqrt{1.5} \approx 1.2$
Note: The standard deviation has the same unit as the given data.
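These square roots can also be obtained directly from the raw data. The sketch below uses `statistics.stdev` (sample) and `statistics.pstdev` (population) from the Python standard library on the data of Examples 9 and 10.

```python
from statistics import pstdev, stdev

ratings_a = [87, 83, 84, 94, 92]   # Example 9, personnel A (sample)
ratings_b = [88, 89, 87, 86, 90]   # Example 9, personnel B (sample)
prices = [44, 44, 45, 45, 43, 46, 45, 47, 46, 44, 47, 44]  # Example 10 (population)

sd_a = stdev(ratings_a)      # square root of 23.5
sd_b = stdev(ratings_b)      # square root of 2.5
sd_prices = pstdev(prices)   # square root of 1.5

print(round(sd_a, 1), round(sd_b, 1), round(sd_prices, 1))
```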
Mean Deviation
Example 12:
Find the mean absolute deviation of the sample 2, 3, 5, 7, and 8.
Solution:
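$\bar{x} = \dfrac{2 + 3 + 5 + 7 + 8}{5} = 5$
$\bar{x}_{MAD} = \dfrac{|2 - 5| + |3 - 5| + |5 - 5| + |7 - 5| + |8 - 5|}{5}$
$= \dfrac{|-3| + |-2| + |0| + |2| + |3|}{5}$
$= \dfrac{3 + 2 + 0 + 2 + 3}{5}$
$= 2$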
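The steps of Example 12 translate almost line by line into Python. In this sketch, `mean_absolute_deviation` is a hypothetical helper name; as in the worked solution, it divides by the number of values $n$.

```python
def mean_absolute_deviation(data):
    """Average of the absolute deviations from the mean (divisor n)."""
    center = sum(data) / len(data)
    return sum(abs(x - center) for x in data) / len(data)

sample = [2, 3, 5, 7, 8]
mad = mean_absolute_deviation(sample)

print(mad)
```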
Coefficient of Variation
Population CV Sample CV
Note:
If you know nothing about the data other than the mean, one way to interpret
the relative magnitude of the standard deviation is to divide it by the mean. This is
called the coefficient of variation.
Example 13:
Consider two sampled data sets with different units. In data set 1, the mean
is 80 and the standard deviation is 12, while in data set 2 the mean is 0.50 and the
standard deviation is 0.20. Which data set is more variable?
Solution:
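Given:
       Data set 1   Data set 2
Mean   80 cm        0.50 m
SD     12 cm        0.20 m
Dividing each standard deviation by its mean gives $12 / 80 = 0.15$ (15%) for data
set 1 and $0.20 / 0.50 = 0.40$ (40%) for data set 2.
The computations indicate that data set 2 is more variable than data set 1.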
Note:
Knowing nothing else about the data, the CV helps us see that even a lower
standard deviation doesn’t mean less variable data.
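A small sketch of the comparison in Example 13. The helper name `coefficient_of_variation` is hypothetical, and the result is expressed as a plain ratio; multiplying by 100 to report a percentage is an assumption about presentation, since the formulas themselves are not reproduced in the text above.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation divided by the mean (a unit-free ratio)."""
    if mean == 0:
        raise ValueError("the mean must be non-zero")
    return sd / mean

cv_1 = coefficient_of_variation(12, 80)      # data set 1, in cm
cv_2 = coefficient_of_variation(0.20, 0.50)  # data set 2, in m

print(f"{cv_1:.0%}", f"{cv_2:.0%}")
print("Data set 2 is more variable:", cv_2 > cv_1)
```

Because the units cancel in the division, the two ratios can be compared even though the data sets are measured in centimeters and meters.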
Percentiles
The percentiles are the values that divide the sorted data set into 100
equal parts. The kth percentile is the value below which k percent of the data
fall. The percentiles are denoted by $P_k$, where $k = 1, 2, 3, \ldots, 99$.
Example 14:
Below are the ages of the employees in the Traffic Management Department
(TMD) in the City of Malaybalay.
22 25 25 27 27 28 28 28
30 30 31 32 32 32 34 34
35 35 36 36 36 36 37 37
38 38 39 39 40 40 41 41
43 44 45 47 49 51 52 55
Solution:
$i = \left(\dfrac{85}{100}\right) \times 40 = 34$
Note that the obtained value of $i$ is a whole number, so $P_{85}$ is the average
of the 34th and 35th observations. Hence, $P_{85} = \dfrac{44 + 45}{2} = 44.5$. This means
that 85% of the employees in the Traffic Management Department are below
44.5 years of age.
$i = \left(\dfrac{48}{100}\right) \times 40 = 19.2$
Since the computed $i$ is not a whole number, it is rounded up to 20, so $P_{48}$ is
the 20th value, which is 36. This means that 48% of the employees in the Traffic
Management Department are below 36 years of age.
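The index rule used in the two solutions above can be sketched in Python. The function `percentile_by_index` is a hypothetical helper written for this lesson's rule: if $i = (k/100) \times n$ is a whole number, average the $i$th and $(i+1)$th values; otherwise round $i$ up. Integer arithmetic is used so that the "whole number" test is not affected by floating-point rounding.

```python
def percentile_by_index(data, k):
    """k-th percentile (k = 1..99) using the index rule i = (k / 100) * n."""
    values = sorted(data)
    n = len(values)
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # i is a whole number: average the i-th and (i+1)-th observations
        return (values[whole - 1] + values[whole]) / 2
    # i is not a whole number: round up and take that observation
    return values[whole]

ages = [22, 25, 25, 27, 27, 28, 28, 28,
        30, 30, 31, 32, 32, 32, 34, 34,
        35, 35, 36, 36, 36, 36, 37, 37,
        38, 38, 39, 39, 40, 40, 41, 41,
        43, 44, 45, 47, 49, 51, 52, 55]

p85 = percentile_by_index(ages, 85)
p48 = percentile_by_index(ages, 48)

print(p85, p48)
```

Library routines such as `numpy.percentile` use other interpolation conventions by default, so they may return slightly different values for the same data.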
Deciles
The deciles are the values that divide the sorted data set into 10 equal parts.
They are denoted by $D_1, D_2, \ldots, D_9$.
Note that:
Example 15:
Solution:
$i = \left(\dfrac{7}{10}\right) \times 40 = 28$
Since $i$ is a whole number, the value of $D_7$ is the average of the 28th and 29th
values. Hence, $D_7 = \dfrac{39 + 40}{2} = 39.5$. This means that 70% of the employees
in the TMD are below 39.5 years of age.
Note:
As with percentiles, if the computed index $i$ is not an integer, it is
rounded up to the next whole number.
Quartiles
The quartiles are the values that divide the sorted data set into 4 equal parts.
They are denoted by $Q_1$, $Q_2$, and $Q_3$.
Example 16:
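Find $Q_2$ and $Q_3$ of the data set presented in Example 14. Interpret the results.
Solution:
For $Q_2$,
$i = \left(\dfrac{2}{4}\right) \times 40 = 20$
The computed $i$ shows that $Q_2$ is the average of the 20th and 21st values in the
data set. Hence, $Q_2 = \dfrac{36 + 36}{2} = 36$. This means that 50% of the employees
in the TMD are below 36 years of age.
For $Q_3$,
$i = \left(\dfrac{3}{4}\right) \times 40 = 30$
The computed $i$ shows that $Q_3$ is the average of the 30th and 31st values in the
data set. Hence, $Q_3 = \dfrac{40 + 41}{2} = 40.5$. This means that 75% of the employees
in the TMD are below 40.5 years of age.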
Interquartile Range
𝐼𝑄𝑅 = 𝑄3 − 𝑄1
Example 17:
Find the interquartile range of the following data: 35, 25, 11, 0, 2, 10, 6, 5, 20,
14, and 30.
STEPS:
Step 1: Arrange the numbers in increasing order:
0, 2, 5, 6, 10, 11, 14, 20, 25, 30, 35
𝐼𝑄𝑅 = 𝑄3 − 𝑄1 = 25 – 5 = 20
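The quartiles used in Example 17 can be reproduced with the same index rule as for percentiles. This is a sketch under that assumption; `quartile_by_index` is a hypothetical helper name.

```python
def quartile_by_index(data, q):
    """q-th quartile (q = 1, 2, 3) using the index rule i = (q / 4) * n."""
    values = sorted(data)
    n = len(values)
    whole, remainder = divmod(q * n, 4)
    if remainder == 0:
        # i is a whole number: average the i-th and (i+1)-th values
        return (values[whole - 1] + values[whole]) / 2
    # i is not a whole number: round up and take that value
    return values[whole]

data = [35, 25, 11, 0, 2, 10, 6, 5, 20, 14, 30]

q1 = quartile_by_index(data, 1)  # i = 2.75, rounded up to the 3rd value
q3 = quartile_by_index(data, 3)  # i = 8.25, rounded up to the 9th value
iqr = q3 - q1

print(q1, q3, iqr)
```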
Chebyshev’s Theorem
At least the fraction $1 - \dfrac{1}{k^2}$ of the measurements of any set of data must lie
within $k$ standard deviations of the mean, where $k > 1$.
Example 18:
Solution:
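$1 - \dfrac{1}{k^2} = \dfrac{1{,}590}{2{,}120}$
$1 - \dfrac{1{,}590}{2{,}120} = \dfrac{1}{k^2}$
$1 - \dfrac{3}{4} = \dfrac{1}{k^2}$
$\dfrac{1}{4} = \dfrac{1}{k^2} \implies k^2 = 4 \implies k = 2$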
Therefore, at least 1,590 (or 75%) of the students have IQs between 118 and
138.
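The algebra above amounts to solving $1 - 1/k^2 = p$ for $k$. The sketch below does this in Python; `chebyshev_k` is a hypothetical helper name. The mean of 128 and standard deviation of 5 are assumptions inferred from the stated interval of 118 to 138 with $k = 2$, since the problem statement is not reproduced here.

```python
from math import sqrt

def chebyshev_k(proportion):
    """Solve 1 - 1/k**2 = proportion for k (0 <= proportion < 1)."""
    if not 0 <= proportion < 1:
        raise ValueError("proportion must be in [0, 1)")
    return sqrt(1 / (1 - proportion))

k = chebyshev_k(1590 / 2120)

# Assumed values, inferred from the interval 118 to 138 and k = 2
mean_iq = 128
sd_iq = 5
lower = mean_iq - k * sd_iq
upper = mean_iq + k * sd_iq

print(k, lower, upper)
```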
Example 19:
Referring to Example 18, in what range can we be sure that no more than 120 of
the scores fall?
2.4 z-score
Note:
1. A positive z-score is the number of standard deviations an
observation is above the mean, while a negative z-score is the number of
standard deviations an observation is below the mean.
Example 20:
Mr. Reyes made a grade of 83 in Mathematics and a grade of 89 in English. The
performance of the classes Mr. Reyes was in is as follows:
Solution:
$z_M = \dfrac{83 - 75}{3} \approx 2.67$
The value 2.67 means that Mr. Reyes’ grade is 2.67 standard deviations above
the mean of the whole Mathematics class.
$z_E = \dfrac{89 - 88}{2} = 0.5$
The value 0.5 means that Mr. Reyes’ grade is 0.5 standard deviations above the
mean of the whole English class.
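The two z-scores in Example 20 follow the same pattern: subtract the class mean from the grade and divide by the class standard deviation. A minimal Python sketch, with `z_score` as a hypothetical helper name and the class means and standard deviations taken from the computations above:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_math = z_score(83, mean=75, sd=3)
z_english = z_score(89, mean=88, sd=2)

print(round(z_math, 2), z_english)
```

Although the English grade is higher in raw terms, the Mathematics grade stands out more relative to its class.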
Example 21:
Typing is one of the skills a secretary should possess. To evaluate
candidates for these positions, a local employment agency administers three
standardized typing samples. A time penalty has been incorporated into the scoring
of each sample based on the number of typing errors. Ms. Anna Gonzales is applying
to three firms as a secretary. The mean and standard deviation for each test,
together with the score achieved by Ms. Gonzales, are given below:
$z_M = \dfrac{8 - 11}{3} = -1$
$z_S = \dfrac{38 - 30}{2} = 4$
The results suggest that Ms. Gonzales is best suited to the real estate firm.
Skewness
The distribution of a data set is said to be symmetric if it looks the same to the
left and right of its center point. Otherwise, it is called skewed.
Coefficient of Skewness:
The skewness of the data set can be computed using the formulas:
Population: $sk = \dfrac{3(\mu - \tilde{\mu})}{\sigma}$
Sample: $\widehat{sk} = \dfrac{3(\bar{x} - \tilde{x})}{s}$
Symmetric, if sk = 0
The figures show that the distribution is positively skewed if the bulk of
the observations is on the left side of the data set; it is negatively skewed if the
bulk is on the right side; and it is symmetric if the bulk is at the center, or the
distributions to the left and right of the center are the same.
Note that if the distribution is negatively skewed, then mean < median < mode.
Example 22:
Consider the set of sampled data set 6, 8, 10, 10, 12, 14. Compute the skewness.
Solution:
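$\bar{x} = \dfrac{6 + 8 + 10 + 10 + 12 + 14}{6} = 10$
$s = \sqrt{\dfrac{(6 - 10)^2 + (8 - 10)^2 + (10 - 10)^2 + (10 - 10)^2 + (12 - 10)^2 + (14 - 10)^2}{6 - 1}} \approx 2.83$
$\tilde{x} = 10$
$\widehat{sk} = \dfrac{3(10 - 10)}{2.83} = 0$
This means that the distribution of the given data set is symmetric.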
Example 23:
Find the measure of skewness of the following sampled data: 27, 18, 9, 1, 2, 7,
6, 5, 15, 12, and 19.
Solution:
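$\bar{x} = \dfrac{27 + 18 + 9 + 1 + 2 + 7 + 6 + 5 + 15 + 12 + 19}{11} = 11$
$s = \sqrt{\dfrac{(27 - 11)^2 + (18 - 11)^2 + (9 - 11)^2 + \cdots + (19 - 11)^2}{11 - 1}} \approx 8.05$
$\tilde{x} = 9$
$\widehat{sk} = \dfrac{3(11 - 9)}{8.05} \approx 0.75$
Since $\widehat{sk} > 0$, the distribution of the given data set is positively skewed.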
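The coefficient used in Examples 22 and 23 is three times the difference between the mean and the median, divided by the standard deviation. The sketch below computes it for both samples with the Python standard library; `skewness_coefficient` is a hypothetical helper name.

```python
from statistics import mean, median, stdev

def skewness_coefficient(sample):
    """3 * (mean - median) / s, with s the sample standard deviation (n - 1)."""
    return 3 * (mean(sample) - median(sample)) / stdev(sample)

sk_22 = skewness_coefficient([6, 8, 10, 10, 12, 14])
sk_23 = skewness_coefficient([27, 18, 9, 1, 2, 7, 6, 5, 15, 12, 19])

print(round(sk_22, 2), round(sk_23, 2))
```

A value of zero corresponds to the mean and median coinciding, while a positive value indicates that the mean lies above the median.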
Kurtosis
Types of Kurtosis:
1. Leptokurtic
2. Platykurtic
The values/scores are distributed over a wider range about the center
making the hump of the curve flat.
3. Mesokurtic