
Kishiane Ysabelle L. Cabatic, BEED LangEd III


CPE105
Handouts: Review on Basic Statistics
Instruction: Read and understand the concepts in this short review then answer the exercises. Show
your solution.

1.) Definitions:
a. Data (Datum): Items in a record or report are facts expressed in numbers or described
by their quality or kind. These facts are called data (singular: datum). Statistics is
chiefly concerned with data and how to deal with them.
Ex. 1. Color of the eyes   2. Class size   3. Scores   4. Height

b. Population, Sample: A population is the collection of all the units from which data are
to be collected. A subset or a representative part of the population is called a sample.

c. Census, Sampling: A census is the process in which information is gathered from all the
units in the population; sampling is the process in which only a part of the population is
used to obtain data. The information derived from the sample is used to make
generalizations about the whole population. Errors are unavoidable when these
generalizations are made. The role of statistics is to provide procedures that
minimize these errors.

d. Statistics (singular and plural sense):
In the singular sense, Statistics is a branch of science that deals with the
development of methods for more effective ways of collecting, organizing,
presenting, and analyzing data. In the plural sense, statistics can mean the data
themselves or numerical computations derived from the data.

e. Measurement: The process of assigning a number or a numerical value to a
characteristic of the object that is being measured.

2.) Two Major Areas in Statistics:


a. Descriptive Statistics – deals largely with summary calculations, graphical displays,
and descriptions of the important features of a set of data. It does not attempt to draw
conclusions or insights about anything beyond the data themselves.

b. Inductive / Inferential Statistics – is concerned with making generalizations from
information gathered from a small group of observations (sample) to a bigger group
of observations (population). It is equipped with a large number of analytical
tools that allow the investigator to better understand the population from which the
sample was drawn, based only on the information contained in the sample.
3.) Two Types of Data:
a. Quantitative data – are data that can be expressed in numbers. These are things or
information that can be measured, like a person’s height or a student’s score in a quiz.
b. Qualitative data – are facts for which no numerical measure exists and are usually
expressed in categories or kind like the color of the skin – black, white or brown; a
person’s sex – male or female; a person’s personality – extrovert or introvert.
4.) Types of Variables:

Variables are the characteristics or properties measured from objects, persons, or
things.
a. Discrete variables can be counted and thus take only whole-number values.
Example: Number of passers or failures in the LET
b. Continuous variables are measured using some unit of measurement and may
take decimal values. Example: weights, heights, ages
5.) Properties of Numbers :
Another way of looking at data is on the way they are measured. Measurements are
always associated with numbers that have some interesting properties or characteristics.
a. Identity – it enables a person to distinguish one number from the other. They are
identified by their shapes or the way they are written. This is the simplest property of
numbers.
b. Order – it refers to the way the numbers are arranged in a sequence. It is an
established convention that 1 comes before 2, 2 comes before 3, and so on. We also
say that “7 is greater than 6” or “3 is less than 5”.
c. Additivity / equality of scale – the property that allows us to add numbers. When we
say “3g + 5g = 8g”, we are confident that we are correct because of equality of scale.
This means that in the sequence 1g, 2g, 3g, … the distance from 1g to 2g is the same
as the distance from 2g to 3g, which is the same as the distance from 3g to 4g, and so
on. Certainly, 2 flowers + 5 peanuts is not equal to 7 apples because the scales used
are not equal.
d. “Absolute zero” – a measurement of zero means the object has none of the
characteristic that is being measured. In temperature, the characteristic that is being
measured is ‘amount of heat’. A temperature of zero degrees Celsius does not mean
that the object has no heat at all. If a student received a score of zero in an
intelligence exam, it does not necessarily mean that the student has ‘no intelligence’
at all. In these two cases the zero is not absolute. By contrast, a length of zero meters
absolutely means the object has ‘no length’, and an object with a weight of zero
pounds is certainly weightless.
6.) Four Types of measurement:
a. Nominal – measurements possess only the property of identity. Ex. Color and sex.
b. Ordinal – measurements possess the properties of identity and order but do not have
the equality of scale property. Ordinal measurements are usually associated with
ranks. When students are ranked according to class performance, an order of 1st, 2nd,
3rd, … can be established. But these numbers cannot be added because the distance
from 1st to 2nd may not be the same as the distance from 2nd to 3rd, and so on. Ex.
Social classes, military ranks, honor roll, and taste preferences.
c. Interval – measurements possess the properties of identity, order, and equality of
scale. Temperature and Intelligence Scores are examples of interval measurement.
The numbers associated with these variables have identity and order and they can be
added in the usual manner. However they do not have the property of absolute zero.
d. Ratio - measurements possess all the properties of identity, order, equality of scale
and absolute zero. This is the highest form of measurement. Ex. Length, weight.
A large number of statistical analysis tools are available for each type of measurement. It is
important that the statistical user have a good understanding of the type of data that is to be
processed in order that the chosen statistical tool is used properly.

7.) Four Methods of Collecting Data:


a. By Interview
b. By Questionnaires
c. By Direct Observation
d. By utilizing Existing Records – published or unpublished; primary (first-hand, not
subjected to any transcription or condensation) or secondary (transcribed or
compiled from original sources)
8.) Sampling techniques. Doing a census, that is, studying the entire population, is often
not feasible because of limited resources like money and time. Researchers therefore often
resort to sample surveys. To make reliable inferences about the population from which the
sample was taken, one should select a sample that represents the population well, that is,
an unbiased sample. One technique for obtaining such a sample is Simple Random Sampling.
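A minimal sketch of simple random sampling, assuming a hypothetical class roster of 40 students numbered 1 to 40 (the roster, the sample size of 10, and the function name are illustrative and not from the handout). Every student has the same chance of being drawn, and no student is drawn twice.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units, each unit being equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population: roster numbers 1..40.
roster = list(range(1, 41))
chosen = simple_random_sample(roster, 10, seed=2024)
print(sorted(chosen))
```

Leaving `seed` as `None` gives a different sample on every run, which is what an actual survey would use.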
9.) Score Data: After checking your students’ test papers you now have a set of data. They need
to be organized. Statistical organization of scores is a systematic arrangement or grouping of
scores. The purpose is to determine their significant meaning.

10.) Presentation of Scores: Scores may be presented by


a. Tabulating – is the process of tallying scores in a statistical table called a talligram,
a contraction of tally and diagram. This table consists of columns for the units digit and
rows for the tens digit.
Example: Math 1 Prelim exam result.
Scores: 86 74 66 70 75 56 69 70 73 66 74 81
60 76 80 81 61 67 63 68 73 63 75
71 58 72 83 69 79 67 68 64 69 73
69 78 88 62 76 72 65 66 70 73 61
78 84 77

Table 1. (A dash marks a cell with no tally.)
Tens    Units digit
digit      0     1     2     3     4     5     6     7     8     9     T
8          1    11     -     1     1     -     1     -     1     -     7
7        111     1    11  1111    11    11    11     1    11     1    20
6          1    11     1    11     1     1   111    11    11  1111    19
5          -     -     -     -     -     -     1     -     1     -     2
T          5     5     3     7     4     3     7     3     6     5    48
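The talligram can also be produced mechanically. The sketch below, which uses names of my own choosing, splits each score into its tens and units digits and counts how often each pair occurs; it assumes all scores are two-digit whole numbers, as in the Math 1 example.

```python
from collections import Counter

# Math 1 prelim exam scores from the example above.
scores = [
    86, 74, 66, 70, 75, 56, 69, 70, 73, 66, 74, 81,
    60, 76, 80, 81, 61, 67, 63, 68, 73, 63, 75,
    71, 58, 72, 83, 69, 79, 67, 68, 64, 69, 73,
    69, 78, 88, 62, 76, 72, 65, 66, 70, 73, 61,
    78, 84, 77,
]


def talligram(values):
    """Count scores by (tens digit, units digit)."""
    return Counter(divmod(v, 10) for v in values)


cells = talligram(scores)
row_totals = {t: sum(c for (tens, _), c in cells.items() if tens == t)
              for t in range(5, 9)}
col_totals = [sum(c for (_, units), c in cells.items() if units == u)
              for u in range(10)]

# Print one row per tens digit, highest first, using "1" as the tally mark.
for tens in range(8, 4, -1):
    marks = ["1" * cells[(tens, u)] or "-" for u in range(10)]
    print(tens, *(m.rjust(4) for m in marks), str(row_totals[tens]).rjust(3))
print("T", *(str(c).rjust(4) for c in col_totals), str(sum(col_totals)).rjust(3))
```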

b. Ordering – The data are arranged in descending (highest to lowest) or ascending
(lowest to highest) order, writing each score as many times as it occurs. An ordered
arrangement of scores is a prerequisite to ranking them.
c. Ranking – is assigning a position or rank to an observation, score, or individual in
relation to the others in the group according to some characteristic such as magnitude,
quality, worth, chronology, or importance. It is usually indicated by a number; thus,
the highest score may be given a rank of 1, the second a rank of 2, and so on. In the
case of chronological ranking, the item occurring first is ranked 1, the second is
ranked 2, and so on.
Ranking a small number of scores:
Procedure:
1. Arrange the scores in descending order (from highest to lowest). Write each
score as many times as it occurs in one column. This is the first column.
2. Number each score consecutively from 1 to n, where n equals the number of
scores. This is the second column.
3. In the third column, write the rank of each score.
a. The rank of a score occurring once is the same as its consecutive number.
b. To find the rank of a score occurring two or more times, add the first and
the last consecutive numbers of the score and divide the sum by two. The
result is the rank.
Ex. Find the ranks of the following scores in English 1 prelim exam. What is the rank
of 61?

Scores: 72 45 61 69 45 61 37 69 45 88 41
Answer:
Scores   Consecutive #s   Rank
88        1                1
72        2                2
69        3                3.5
69        4                3.5
61        5                5.5
61        6                5.5
45        7                8
45        8                8
45        9                8
41       10               10
37       11               11
The rank of 61 is 5.5.
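The three-column procedure can be sketched in code as follows. The function name and the returned row layout are my own; the tie rule is the one stated in step 3b (first plus last consecutive number, divided by two).

```python
def average_ranks(scores):
    """Return (score, consecutive number, rank) rows, highest score first."""
    ordered = sorted(scores, reverse=True)
    rows = []
    i = 0
    while i < len(ordered):
        # Find the last position j holding the same score as position i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        # Consecutive numbers are 1-based: first is i + 1, last is j + 1.
        rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            rows.append((ordered[k], k + 1, rank))
        i = j + 1
    return rows


english = [72, 45, 61, 69, 45, 61, 37, 69, 45, 88, 41]
rows = average_ranks(english)
rank_of = {score: rank for score, _, rank in rows}
print(rank_of[61])  # 5.5
```

A score that occurs once has the same first and last consecutive number, so the same formula returns that number unchanged.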
For a large number of scores:
Procedure:
1. Tabulate the scores in a talligram.
2. Use the talligram as an aid in arranging the scores in descending order in a vertical
column, writing a score only once even if it occurs two or more times. This is the
first column.
3. Number the scores consecutively. A score receives as many consecutive numbers
as the number of times it occurs. This is the second column.
4. Assign the ranks in the third column by following step 3 of the procedure for a
small number of scores.

d. Grouping Scores into a Class Frequency Distribution:

Class frequency distribution is the process of placing scores in scaled groups called
classes. A class is a group of a specified number of consecutive single scores or measures.
The specified number of consecutive scores that a class contains is called the class width. The
lower end-number of the class is called the “lower limit” and the upper end-number is called the
“upper limit”. Example: The class 36 – 41 has a lower limit of 36 and an upper limit of
41. The lower class boundary is 35.5 and the upper class boundary is 41.5. The class width is
6 because there are six consecutive single scores contained in the class: 36, 37, 38, 39, 40,
and 41.
Ex:The following are test scores in Math 31. Construct the class frequency
distribution table.
86 74 66 70 75 57 69 70 73 66 60 81
90 62 76 72 61 58 63 68 73 63 75 71
63 66 74 73 78 61 78 72 67 83 59 67
68 64 59 73 69 76 80 81 79 84 77 68
Steps:
1. Find the range R. Range = Highest score – Lowest score. Ex. R = 90 – 57 = 33
2. Determine or estimate the number of intervals/classes, k.
Formula: k = √n. Ex. k = √48 ≈ 6.9, rounded to 7.
3. Find the class width (c), or the width of the interval. Divide the range by the tentative
number of classes and raise the quotient to the next higher integer if there is any
fractional part. Ex. c = R/k = 33/7 ≈ 4.7, raised to 5.
4. Find the lowest limit of the classes. This is the number equal to or next lower than the
lowest score. (The lowest score must be contained in the lowest class while the highest
score must be contained in the highest class.)
5. Find the lower and upper limits of the classes.
6. Tally the scores.
7. Write the frequencies, class boundaries, and cumulative frequencies.
Table 2.
--------------------------------------------------------------------------------------
Class       Tally           f     Class          Mid pts          Cum Freq
Interval                          Boundaries     (Class marks)    (less than)
87 – 91     1                1    86.5 – 91.5    89               48
82 – 86     111              3    81.5 – 86.5    84               47
77 – 81     1111111          7    76.5 – 81.5    79               44
72 – 76     111111111111    12    71.5 – 76.5    74               37
67 – 71     1111111111      10    66.5 – 71.5    69               25
62 – 66     11111111         8    61.5 – 66.5    64               15
57 – 61     1111111          7    56.5 – 61.5    59                7
--------------------------------------------------------------------------------------
n = 48
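The seven steps can be sketched as a short program. Two choices here are assumptions rather than rules from the handout: k is obtained by rounding √n to the nearest integer, and the lowest limit is taken to be the lowest score itself. Class boundaries are placed 0.5 below and above the limits, which presumes whole-number scores.

```python
import math

# Math 31 test scores from the example above.
math31 = [
    86, 74, 66, 70, 75, 57, 69, 70, 73, 66, 60, 81,
    90, 62, 76, 72, 61, 58, 63, 68, 73, 63, 75, 71,
    63, 66, 74, 73, 78, 61, 78, 72, 67, 83, 59, 67,
    68, 64, 59, 73, 69, 76, 80, 81, 79, 84, 77, 68,
]


def class_frequency_distribution(values):
    """Build the class rows, lowest class first."""
    score_range = max(values) - min(values)   # step 1: R
    k = round(math.sqrt(len(values)))         # step 2: number of classes (assumed rounding)
    width = math.ceil(score_range / k)        # step 3: raise any fraction to the next integer
    lower = min(values)                       # step 4: lowest limit (assumed = lowest score)
    rows, cumulative = [], 0
    while lower <= max(values):               # steps 5 to 7
        upper = lower + width - 1
        f = sum(lower <= v <= upper for v in values)
        cumulative += f
        rows.append({
            "limits": (lower, upper),
            "f": f,
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "cf": cumulative,
        })
        lower += width
    return rows


table = class_frequency_distribution(math31)
for row in reversed(table):  # highest class first, as in Table 2
    print(row["limits"], row["f"], row["boundaries"], row["midpoint"], row["cf"])
```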
USES of a Class Frequency Distribution (CFD):
1. It shows whether the distribution is normal or skewed. It also indicates the relative difficulty
of the test from which the scores are taken.
If most of the scores are massed at the middle portion of the frequency table, the
distribution is normal and the test is of moderate difficulty.
If most of the scores are gathered at the upper portion of the distribution, the distribution
is skewed to the left or skewed negatively. The test is relatively easy for the students.
If the majority of the scores are clustered at the lower part of the frequency table, the
distribution is skewed to the right or skewed positively and the test is relatively difficult.
2. It facilitates the computation of statistical measures such as the median, mean, quartiles,
percentiles, standard deviation, etc.
3. Grouping also minimizes space.

e.) Graphing: Graphical Presentation of Class Frequency Distributions:


1. Frequency polygon – It is a linear graph representing the frequencies of the midpoints of
the classes in a class frequency distribution and which forms a polygon when its ends are joined with
the baseline.

Construction Procedure:
On the XY plane reflect the mid points of the classes on the X axis and the class
frequencies on the Y axis. Locate and connect the intersection points of the mid points of the
classes and their corresponding frequencies. To close the curve connect the end points to the
baseline along the two extended midpoints below and above the distribution.
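As a rough sketch of the construction, the points to be plotted can be listed before drawing. The function below is hypothetical and assumes equal class widths with midpoints given in ascending order; it adds one extra midpoint below and one above the distribution, each with frequency zero, so that the polygon closes on the baseline.

```python
def polygon_points(midpoints, frequencies):
    """Return the (x, y) vertices of a frequency polygon, closed on the baseline."""
    step = midpoints[1] - midpoints[0]  # class width, assumed constant
    xs = [midpoints[0] - step, *midpoints, midpoints[-1] + step]
    ys = [0, *frequencies, 0]
    return list(zip(xs, ys))


# Midpoints and frequencies of Table 2, lowest class first.
points = polygon_points([59, 64, 69, 74, 79, 84, 89], [7, 8, 10, 12, 7, 3, 1])
print(points)
```

For Table 2 the two closing points fall at the extended midpoints 54 and 94.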

Uses of the frequency polygon.


It displays the distribution of the scores. If it is more or less symmetrical with the
highest point approximately in the middle, the distribution is more or less regular or
normal and the test is of moderate difficulty. If the graph is asymmetrical and it is higher
at the left than at the right, the distribution is said to be skewed to the right and the test
is relatively difficult. If the graph is higher at the right than at the left side, the
distribution is said to be skewed to the left and the test is relatively easy.

2. Bar Chart
The bar chart is a graph consisting of vertical bars or rectangles placed side by side,
representing the frequencies of the classes in a class frequency distribution. The width of each
bar is the width of the interval marked by the class limits on the horizontal axis. The height of
each bar represents the class frequency, read on the vertical axis.

3. Histogram
The histogram is a graph that closely resembles the bar chart. The bar chart uses
the class limits for the horizontal axis while the histogram employs the class boundaries. Using the
class boundaries eliminates the spaces between the rectangles, giving the graph a solid appearance.
Usually, but not necessarily, the shared sides of adjacent bars are omitted so that the graph also
looks like a polygon.

4. Pie Chart and Pictograph

Categorical variables are often described graphically by using a pie chart, a circle which is
divided into pie-shaped sectors. The angle of each sector is proportional to the frequency or
percentage it represents; it is advisable to convert the frequency table into percentages first.
The pictograph uses pictures to represent quantities, usually the size of a certain population.
Both give a livelier and more dramatic presentation of the distribution of data.

Descriptive Measures:
To investigate a set of data, it is useful to define measures that describe its important features.
We have Measures of Central Tendency and other locations, and Measures of Variability.

1.)Measures of Central Tendency


Central tendency is the tendency of observations, cases, or scores to cluster about a point. A
measure of central tendency is a statistic calculated from a set of observations or scores
and designed to typify or represent the whole set. It is either an average (mean), a midpoint
(median), or the most frequent score in a distribution of scores (mode). These three are the most
common measures of central tendency.

A.) The mean, median, and mode of UNGROUPED data (raw or real data/scores):

a. The Mean – It is the most popular and the most reliable measure of central tendency. It is the
average of a set of scores or observations.

The mean is the sum of a set of scores (or observations) divided by the total number of scores
in the set. The formula is:
$$ \bar{x} = \frac{\sum x}{n} $$
where: x are the scores,
n is the number of scores, and
∑ is the symbol for summation

The Weighted Mean is the average computed for k quantities x1, x2, …, xk when more
significance is attached to some scores than to others. Weights w1, w2, …, wk are assigned to the k
quantities respectively. These weights measure the relative importance of the individual
scores. The formula is:
$$ \bar{x} = \frac{\sum x_i w_i}{\sum w_i} $$
where: xi are the scores,
wi are the weights of each score, and
∑ is the symbol for summation

Ex: 1.) Find the mean of the following test scores in Math 1.
71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25
Solution:
The mean is
$$ \bar{x} = \frac{\sum x}{n} = \frac{71 + 68 + 68 + 58 + 55 + 52 + 52 + 45 + 38 + 38 + 38 + 30 + 25 + 25}{14} = \frac{663}{14} \approx 47.36 $$
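The same computation as a small sketch; the function and variable names are illustrative.

```python
def mean(scores):
    """Arithmetic mean: sum of the scores divided by how many there are."""
    return sum(scores) / len(scores)


math1 = [71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25]
print(sum(math1), len(math1), round(mean(math1), 2))  # 663 14 47.36
```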
Ex: 2.) When Nikka Sanchez was in her fourth year high school her final grades in Math 4,
English 4, Filipino 4, Physics, Chemistry, Journalism, World History, and Research
were 78, 89, 90, 79, 83, 93, 89, and 95 respectively. If each subject had the equivalent
units of 2, 1, 1, 1.5, 1.5, 1, 1, and 1.5 respectively what was her weighted average
grade when she graduated?
Solution:
$$ \bar{x} = \frac{78(2) + 89(1) + 90(1) + 79(1.5) + 83(1.5) + 93(1) + 89(1) + 95(1.5)}{2 + 1 + 1 + 1.5 + 1.5 + 1 + 1 + 1.5} = \frac{902.5}{10.5} \approx 85.95 $$
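A minimal sketch of the weighted mean, with the subject units used as weights. The function name and the length check are my own additions.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


grades = [78, 89, 90, 79, 83, 93, 89, 95]
units = [2, 1, 1, 1.5, 1.5, 1, 1, 1.5]
print(round(weighted_mean(grades, units), 2))  # 85.95
```

With all weights equal, the result reduces to the ordinary mean.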

b. The Median ( x̃ ) is a point on a scale which divides the scale into two equal parts. A scale is a
succession of numbers, steps, classes, degrees, gradations, or categories with a fixed interval. The
median is the middle value of a set of observations arranged in increasing or decreasing order
of magnitude. It is the middle score or value when the number of observations is odd, or the
arithmetic mean of the two middle values when the number of observations is even. It is the value
such that half of the observations fall above it and the other half fall below it.

Formula:
$$ \tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left( x_{n/2} + x_{n/2+1} \right) & \text{if } n \text{ is even} \end{cases} $$

Ex: 1.) Find the median of the following test scores in Math 1.
25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68

Solution: First arrange the data from highest to lowest or vice versa.

71, 68, 68, 58, 55, 52, 52, 45, 38, 38, 38, 30, 25, 25

Since n = 14 (which is even), we have two middle scores. Hence, the median is
$$ \tilde{x} = \tfrac{1}{2}\left( x_{14/2} + x_{14/2+1} \right) = \tfrac{1}{2}( x_7 + x_8 ) = \tfrac{1}{2}(52 + 45) = 48.5 $$

Ex 2.) Find the median of the following set of scores in ED 103PRT.


24, 15, 13, 23, 27, 18, 16
Solution: Arranging the data from highest to lowest we have 27, 24, 23, 18, 16, 15, 13.
Since n = 7 (which is odd), we have a single middle score. Hence
$$ \tilde{x} = x_{(n+1)/2} = x_{(7+1)/2} = x_4 = 18 $$
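Both cases of the median formula can be sketched in one function. The names are illustrative; note that Python lists are 0-based, so the formula's position (n+1)/2 becomes index n // 2.

```python
def median(scores):
    """Middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # position (n + 1) / 2, 1-based
    return (ordered[mid - 1] + ordered[mid]) / 2  # positions n/2 and n/2 + 1, 1-based


math1 = [25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68]
ed103 = [24, 15, 13, 23, 27, 18, 16]
print(median(math1), median(ed103))  # 48.5 18
```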

c. The Mode ( x̂ ). It is the most frequently occurring score in a set of data, or the score with
the highest frequency. A set of score data can have one mode (unimodal), two modes (bimodal),
three modes (trimodal), or more, or no mode at all. The mode is the poorest measure of central
tendency.

Ex. The mode of example 1 is 38. → x̂ = 38

Example 2 has no mode.
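A sketch for finding the mode or modes of ungrouped scores. It treats a data set as having no mode when every score occurs exactly once, which matches Example 2; how to treat other sets in which all scores tie is a convention the handout does not state, so that choice is an assumption.

```python
from collections import Counter


def modes(scores):
    """Return the most frequent score(s) in ascending order, or [] if there is no mode."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:  # assumed convention: no repeated score means no mode
        return []
    return sorted(s for s, c in counts.items() if c == top)


math1 = [25, 71, 52, 68, 58, 55, 38, 52, 45, 38, 38, 30, 25, 68]
ed103 = [24, 15, 13, 23, 27, 18, 16]
print(modes(math1), modes(ed103))  # [38] []
```

A result with two entries indicates a bimodal set, three a trimodal set, and so on.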

B.) The mean, median, and mode of GROUPED data (class frequency distribution):

a.) The Mean : The class mark or midpoint method

Formula:
$$ \bar{x} = \frac{\sum f_i x_i}{n} $$
where fi is the class frequency of the ith class interval,
xi is the class mark or the midpoint of the ith class interval, and
n is the total number of scores

Ex. Table 2.
---------------------------------------------------------------------------------
Class        fi    xi                         fixi
Interval           (Mid pts/class marks)
87 – 91       1    89                           89
82 – 86       3    84                          252
77 – 81       7    79                          553
72 – 76      12    74                          888
67 – 71      10    69                          690
62 – 66       8    64                          512
57 – 61       7    59                          413
---------------------------------------------------------------------------------
n = 48                                ∑ fixi = 3397

$$ \bar{x} = \frac{\sum f_i x_i}{n} = \frac{3397}{48} \approx 70.77 $$
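The class-mark method can be sketched as below. The table is entered as (lower limit, upper limit, frequency) triples, a layout chosen for this example only.

```python
# Table 2 as (lower limit, upper limit, frequency), highest class first.
table_2 = [
    (87, 91, 1), (82, 86, 3), (77, 81, 7), (72, 76, 12),
    (67, 71, 10), (62, 66, 8), (57, 61, 7),
]


def grouped_mean(classes):
    """Mean of grouped data: sum of f * class mark, divided by n."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


print(round(grouped_mean(table_2), 2))  # 70.77
```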

b.) The Median.


Formula:
$$ \tilde{x} = L_m + \left( \frac{n/2 - cf_{<}}{f} \right) c $$
where Lm = lower class boundary of the median class
n = total frequency or total number of observations
cf< = cumulative frequency equal to or next lower than n/2
c = class interval
f = frequency of the median class
Example.
Table 2.
--------------------------------------------------------------------------------------
Class       Tally           f     Class          Mid pts          cf<
Interval                          Boundaries     (Class marks)    (cum freq less than)
87 – 91     1                1    86.5 – 91.5    89               48
82 – 86     111              3    81.5 – 86.5    84               47
77 – 81     1111111          7    76.5 – 81.5    79               44
72 – 76     111111111111    12    71.5 – 76.5    74               37
67 – 71     1111111111      10    66.5 – 71.5    69               25
62 – 66     11111111         8    61.5 – 66.5    64               15
57 – 61     1111111          7    56.5 – 61.5    59                7
--------------------------------------------------------------------------------------
n = 48
Solution:
n/2 = 48/2 = 24 → median class is (67 – 71)
cf< = 15, the cum freq next lower than 24
Lm = 66.5
f = 10
c = 5
$$ \tilde{x} = L_m + \left( \frac{n/2 - cf_{<}}{f} \right) c = 66.5 + \left( \frac{24 - 15}{10} \right) 5 = 66.5 + \frac{45}{10} = 66.5 + 4.5 = 71 $$
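A sketch of the grouped-median computation. It scans the classes from lowest to highest and takes as the median class the first one whose cumulative frequency reaches n/2; placing the class boundary 0.5 below the lower limit assumes whole-number scores. The names and the data layout are illustrative.

```python
# Table 2 as (lower limit, upper limit, frequency), lowest class first.
table_2 = [
    (57, 61, 7), (62, 66, 8), (67, 71, 10), (72, 76, 12),
    (77, 81, 7), (82, 86, 3), (87, 91, 1),
]


def grouped_median(classes):
    """Median of grouped data: Lm + ((n/2 - cf<) / f) * c."""
    n = sum(f for _, _, f in classes)
    target = n / 2
    cf_below = 0  # cumulative frequency of the classes below the current one
    for lower, upper, f in classes:
        if f > 0 and cf_below + f >= target:
            lm = lower - 0.5       # lower class boundary
            c = upper - lower + 1  # class interval (width)
            return lm + (target - cf_below) / f * c
        cf_below += f
    raise ValueError("the table has no observations")


print(grouped_median(table_2))
```

For Table 2 this selects the class 67 – 71 with cf< = 15, the same values used in the solution above.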
c.) The Mode.
Crude Mode: $$ \hat{x} = L_m + \frac{c}{2} $$
where c is the class interval and Lm is the lower class boundary of the modal class
Refined Mode: $$ \hat{x} = 3\tilde{x} - 2\bar{x} $$
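As a worked illustration with Table 2, take the modal class to be the class with the highest frequency, 72 – 76 (f = 12); this choice follows the definition of the mode but is not spelled out in the handout. Then Lm = 71.5 and c = 5, and the refined mode uses the grouped median 71 and grouped mean 70.77 obtained earlier.

```latex
\begin{align*}
\text{Crude mode: } \hat{x} &= L_m + \frac{c}{2} = 71.5 + \frac{5}{2} = 74 \\
\text{Refined mode: } \hat{x} &= 3\tilde{x} - 2\bar{x} = 3(71) - 2(70.77) = 213 - 141.54 = 71.46
\end{align*}
```

The crude mode is simply the midpoint of the modal class.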

2.) Other Measures of location:


Other measures of location that describe or locate the non-central position of a set of data are
referred to as quantiles or fractiles. The most common fractiles are percentiles, deciles, and
quartiles.
Percentiles are values that divide an ordered set of observations into 100 equal parts, denoted
by P1, P2, …, P99, such that 1% of the data falls below P1, 2% of the data falls below P2, …, and 99%
of the data falls below P99.
Deciles are values that divide an ordered set of observations into 10 equal parts, denoted by D1,
D2, …, D9, such that 10% of the data falls below D1, 20% of the data falls below D2, …, and 90% of
the data falls below D9.
Quartiles are values that divide an ordered set of observations into 4 equal parts, denoted
by Q1, Q2, and Q3, such that 25% of the data falls below Q1, 50% of the data falls below Q2, and
75% of the data falls below Q3.

A.) From Ungrouped Data:


To solve for percentiles, deciles, or quartiles from ungrouped data, follow this procedure:
1. Arrange the data in increasing order of magnitude (ascending order).
2. Solve for the value of L, where
L = mn / 100 for percentiles
L = mn / 10 for deciles
L = mn / 4 for quartiles
Here m is the location of the percentile, decile, or quartile, and n is the number of observations.
3. If L is an integer, the desired fractile is the average of the Lth and the (L+1)th
observations.
If L is fractional, round L up to the next higher integer to find the required location.
The fractile is the value in that location.
Examples:
1.) Find P63, D8, and Q1 in the following set of score data in Bio 1.
95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78

Solution:
Data arranged in ascending order: 34, 39, 45, 56, 56, 58, 67, 76, 78, 87, 91, 95. n = 12

a.) P63: L = 63(12) / 100 = 7.56 → 8.


This means that the 8th value in the set of data is the 63rd percentile. Therefore, P63 = 76.
This means that 63% of the data falls below 76.

b.) D8: L = 8(12) /10 = 9.6 → 10.


This means that the 10th value in the data is the 8th decile. Therefore, D8 = 87, which means
that 80% of the data falls below 87.

c.) Q1: L = 1(12) / 4 = 3


This means that the 1st quartile is the average between the 3rd and the 4th value in the data.
Hence, Q1 = (45 + 56) / 2 = 50.5 . This further means that 25% of the data falls below 50.5
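The three-step procedure can be sketched with one function that serves percentiles, deciles, and quartiles. The function name and the `parts` argument (100, 10, or 4) are my own; integer division is used so that the test "L is an integer" is exact.

```python
def fractile(data, m, parts):
    """m-th fractile of ungrouped data; parts is 100, 10, or 4."""
    if not 0 < m < parts:
        raise ValueError("m must lie strictly between 0 and parts")
    ordered = sorted(data)                      # step 1: ascending order
    whole, remainder = divmod(m * len(ordered), parts)  # step 2: L = mn / parts
    if remainder == 0:
        # L is an integer: average the L-th and (L+1)-th observations (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # L is fractional: round up, so take the (whole + 1)-th observation.
    return ordered[whole]


bio1 = [95, 34, 45, 67, 56, 58, 76, 87, 91, 39, 56, 78]
print(fractile(bio1, 63, 100), fractile(bio1, 8, 10), fractile(bio1, 1, 4))  # 76 87 50.5
```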

B.) From Grouped Data:

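a.) Percentile: The computing formula is
$$ P_m = L_m + \left( \frac{mn/100 - cf_{<}}{f_m} \right) c $$

b.) Decile: The computing formula is
$$ D_m = L_m + \left( \frac{mn/10 - cf_{<}}{f_m} \right) c $$

c.) Quartile: The computing formula is
$$ Q_m = L_m + \left( \frac{mn/4 - cf_{<}}{f_m} \right) c $$

Here Lm, cf<, and c are as in the grouped median formula, but taken for the class that contains
the desired fractile, and fm is the frequency of that class.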
Examples:
1.) Using the data given in Table 2, compute for the following:
a.) P43
b.) D9
c.)Q3
Table 2.
----------------------------------------------------------------------------
Class        f     Class           Cum Freq
Interval           Boundaries      (cf<)
87 – 91       1    86.5 – 91.5     48
82 – 86       3    81.5 – 86.5     47
77 – 81       7    76.5 – 81.5     44
72 – 76      12    71.5 – 76.5     37
67 – 71      10    66.5 – 71.5     25
62 – 66       8    61.5 – 66.5     15
57 – 61       7    56.5 – 61.5      7
----------------------------------------------------------------------------
n = 48

a.) P43: mn/100 = 43(48)/100 = 20.64 → class (67 – 71)
cf< = 15
Lm = 66.5
fm = 10
c = 5
Hence, $$ P_{43} = 66.5 + \left( \frac{20.64 - 15}{10} \right) 5 = 69.32 $$
This means that 43% of the data falls below 69.32.

b.) D9: mn/10 = 9(48)/10 = 43.2 → class (77 – 81)
cf< = 37
Lm = 76.5
fm = 7
c = 5
Hence, $$ D_9 = 76.5 + \left( \frac{43.2 - 37}{7} \right) 5 \approx 80.93 $$
This means that 90% of the data falls below 80.93.

c.) Q3: mn/4 = 3(48)/4 = 36 → class (72 – 76)
cf< = 25
Lm = 71.5
fm = 12
c = 5
Hence, $$ Q_3 = 71.5 + \left( \frac{36 - 25}{12} \right) 5 \approx 76.08 $$
This means that 75% of the data falls below 76.08.
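The three grouped formulas differ only in the divisor (100, 10, or 4), so one hedged sketch covers them all. The function name, the `parts` argument, and the data layout are assumptions made for this example; the class boundary is placed 0.5 below the lower limit, which presumes whole-number scores.

```python
# Table 2 as (lower limit, upper limit, frequency), lowest class first.
table_2 = [
    (57, 61, 7), (62, 66, 8), (67, 71, 10), (72, 76, 12),
    (77, 81, 7), (82, 86, 3), (87, 91, 1),
]


def grouped_fractile(classes, m, parts):
    """Lm + ((m*n/parts - cf<) / fm) * c for the class containing the fractile."""
    n = sum(f for _, _, f in classes)
    target = m * n / parts
    cf_below = 0  # cumulative frequency of the classes below the current one
    for lower, upper, f in classes:
        if f > 0 and cf_below + f >= target:
            lm = lower - 0.5       # lower class boundary
            c = upper - lower + 1  # class interval (width)
            return lm + (target - cf_below) / f * c
        cf_below += f
    raise ValueError("no class reaches the requested position")


p43 = grouped_fractile(table_2, 43, 100)
d9 = grouped_fractile(table_2, 9, 10)
q3 = grouped_fractile(table_2, 3, 4)
print(round(p43, 2), round(d9, 2), round(q3, 2))  # 69.32 80.93 76.08
```

Calling it with m = 2 and parts = 4 gives the second quartile, which is the grouped median.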

3.) Measures of Variability / Dispersion / Spread

The measures of central tendency characterize only the location around which a given set of data
clusters. To describe how the data cluster around or scatter away from the central point, we need
another kind of measure that also characterizes a given set of data. These are called measures of
variability, dispersion, or spread.

Example 1. The same group of 8 students took their final exams in English 1 and Math 1.
Their scores and the mean scores are
English 1 : 75, 77, 80, 80, 81, 82, 83, 84     x̄ = 80.25
Math 1 : 60, 65, 76, 82, 83, 85, 95, 96     x̄ = 80.25

The two sets of data have the same mean, 80.25, but they are not identical. Scores in
English 1 cluster close to the mean while scores in Math 1 are more dispersed about the mean. The
measures used to describe this variation are the range, the variance, the standard deviation, the
quartile deviation, and the coefficient of variation. The quartile deviation and coefficient of
variation are not discussed in this material. Our discussion is limited to the three common
measures of variability (range, variance, and standard deviation) of ungrouped data.

1.) The range is the easiest to compute but it is the poorest measure of dispersion. The larger
the range, the more dispersed is the data.

Range = Highest Score – Lowest Score. → R = Hs – Ls

From Ex. 1: Eng 1: R = 84 – 75 = 9
Math 1: R = 96 – 60 = 36
In our example, the test results in Math 1 are more variable than those in English 1; the
dispersion of scores in Math 1 is wider than that in English 1.
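A short sketch comparing the two sets of scores from Example 1; the function and variable names are illustrative.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


english1 = [75, 77, 80, 80, 81, 82, 83, 84]
math1 = [60, 65, 76, 82, 83, 85, 95, 96]

# Same mean, very different spread.
print(sum(english1) / len(english1), score_range(english1))  # 80.25 9
print(sum(math1) / len(math1), score_range(math1))           # 80.25 36
```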

2.) Another measure of variability is the variance. It is never negative. A large variance
corresponds to a highly dispersed set of values. It makes use of all observations in the data
set. Its unit of measure is the square of the unit of measure of the given set of values.

Variance, s². Formula:
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n - 1)} $$
Example 2. The number of hours spent by ten students (out of 40 in Bio 1) in studying per
day were recorded as follows: 5, 8, 4, 2, 2, 2, 2, 5, 3, and 4. Find the variance.

Solution:

                                                 Total
xi      5   8   4   2   2   2   2   5   3   4      37
xi²    25  64  16   4   4   4   4  25   9  16     171

So we have ∑xi = 37 and ∑xi² = 171, and
$$ s^2 = \frac{10(171) - (37)^2}{10(9)} = \frac{341}{90} \approx 3.79 \text{ sq. hrs.} $$
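The two variance formulas give the same result, which the following sketch checks on the study-hours data. The function names are my own.

```python
def variance_definitional(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_computational(xs):
    """(n * sum of squares - square of the sum) / (n * (n - 1))."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))


hours = [5, 8, 4, 2, 2, 2, 2, 5, 3, 4]
print(sum(hours), sum(x * x for x in hours))    # 37 171
print(round(variance_definitional(hours), 2),
      round(variance_computational(hours), 2))  # 3.79 3.79
```

The computational form avoids computing the mean first, which is why it is convenient for hand calculation from a table of xi and xi².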

3.) Standard Deviation, s. Formula: $$ s = \sqrt{s^2} $$

The standard deviation is the positive square root of the variance.
Hence, the standard deviation of example 2 above is
$$ s = \sqrt{s^2} = \sqrt{3.79} \approx 1.95 \text{ hrs.} $$
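As a cross-check, Python's standard `statistics` module computes the sample variance and sample standard deviation with the same n – 1 divisor used in this handout. The sketch below applies it to the study-hours data of Example 2.

```python
import math
import statistics

hours = [5, 8, 4, 2, 2, 2, 2, 5, 3, 4]

variance = statistics.variance(hours)  # sample variance, divisor n - 1
s = math.sqrt(variance)                # positive square root of the variance

print(round(variance, 2), round(s, 2))  # 3.79 1.95
```

`statistics.stdev(hours)` returns the same standard deviation in a single call.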
