Statistics MCT
STATISTICS – A collection of methods for planning experiments, obtaining data and then analyzing,
interpreting and drawing conclusions based on the data.
Aspects
Theoretical aspect deals with the development, derivation and proof of statistical theorems, formulas,
rules and laws.
Applied – involves the application of those theorems, rules and laws to solve real-world problems.
To gain information, a statistician collects data on the VARIABLES used to describe an event. DATA
are the values that the variables can assume. VARIABLES whose values are determined by chance are
called RANDOM VARIABLES.
TYPES OF VARIABLES
1. Qualitative Variables – words or codes that represent a class or category.
Sex is a qualitative variable where male can be assigned the value 1 while female is assigned 0
as the case may be. In surveying the opinion of respondents on a given issue, the respondents may be
“for,” “against,” or “undecided.” Arbitrary values can be assigned to these responses for computational
purposes.
Example:
The temperature of a given person is a continuous variable, while the number of persons in a room is a
discrete variable.
FIELDS OF STATISTICS
1. Descriptive Statistics – This is statistics concerned with the collection, classification, and presentation
of data. This is done by determining percentages, using measures of central tendencies, measures of
variability, measures of location, skewness, and kurtosis.
2. Inferential Statistics – This is concerned with the analysis and interpretation of data. Tools such as the
z-test, t-test, F-test, analysis of variance, chi square, and correlation are utilized under this field.
1. Textual form – data and information are presented in paragraph and narrative form.
2. Tabular form – quantitative data are summarized in rows and columns.
3. Graphical form – data are presented in charts, graphs or pictures
LEVEL OF MEASUREMENT
1. Nominal level – characterized by data that consist of names, labels, or categories only.
Example
Classifying survey objects by gender (male, female), marital status (single, married, separated)
and employment (business, construction, engineering, education, etc.).
2. Ordinal Level – data measured can be ordered or ranked
Example
Rank of teachers – Instructor, Assistant Professor, and Associate Professor
Winners of marathon – ranked first, second and third
3. Interval Level – precise differences between measures but there is no true zero
Example
Temperatures of 40°C and 50°C have a meaningful difference of 10°C, but 0°C does not mean
that there is no temperature.
*Same as the ordinal level, with the additional property that meaningful amounts of difference between
the data values can be determined.
Population
This is a set of data consisting of all possible observations of a certain phenomenon. In the study of
The Factors Affecting the Academic Performance of the Students of the College of Teacher Education, every
college student of CTE belongs to the population. The population is often large, reaching
thousands or millions. Its size is denoted by N.
Sample
This is a portion taken from a population, possessing identical characteristics. A sample is usually
taken in cases where it is not possible to take all the members of the population as subjects or respondents in
a study. It is represented by n.
The sample size can be computed from the formula
$$n = \frac{N}{1 + N e^2}$$
where n is the sample size, N is the population size, and e is the margin of error, which is usually from 0 to
0.1.
Example:
Determine the sample size in a study where the actual population is 450 if a margin of error of 0.05 is
desired.
$$n = \frac{450}{1 + (450)(0.05)^2} = 211.76 \approx 212$$
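The arithmetic in this example can be checked with a short Python sketch. The function name `sample_size` is an assumption made for illustration; the formula n = N / (1 + N e²) is the one applied in the worked example above.

```python
def sample_size(population: int, margin_of_error: float) -> float:
    """Return n = N / (1 + N * e**2), as applied in the worked example."""
    return population / (1 + population * margin_of_error ** 2)


n = sample_size(450, 0.05)
print(round(n, 2))  # 211.76
print(round(n))     # 212
```

The unrounded value is kept as the return value so that the caller decides how to round it.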
Lynch Formula:
$$n = \frac{N z^2 p(1-p)}{N d^2 + z^2 p(1-p)}$$
where z = 1.96, p = 0.50, d is the margin of error, n is the sample size, and N is the population.
Example:
Determine the sample in the previous example using the same margin of error.
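The notes do not show the worked answer for this example. As a hedged sketch, the Lynch formula can be evaluated in Python with the stated defaults z = 1.96 and p = 0.50; the function name `lynch_sample_size` is an assumption made for illustration.

```python
def lynch_sample_size(N: int, d: float, z: float = 1.96, p: float = 0.50) -> float:
    """Return n = N z^2 p(1-p) / (N d^2 + z^2 p(1-p))."""
    numerator = N * z ** 2 * p * (1 - p)
    denominator = N * d ** 2 + z ** 2 * p * (1 - p)
    return numerator / denominator


n = lynch_sample_size(450, 0.05)
print(round(n, 2))  # 207.24
```

Under these assumptions the formula gives a sample of about 207 for N = 450 and d = 0.05.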
A researcher wants to decide the number of respondents to be taken from the population of first-year
students of NLPSC with a 5% margin of error.
Course     Population   Sample
BSE        120
BEED       50
BCAED      30
BPEd       30
BSBA       350
BSOA       90
ABEL       40
BA POS     80
BSM        35
BSCS       120
BS Crim    150
Total      1095
$$n \approx 284$$
$$\frac{n}{N} = \frac{284.38}{1095} = 0.2597$$
Course     Population   Sample
BSE        120          31
BEED       50           13
BCAED      30           8
BPEd       30           8
BSBA       350          91
BSOA       90           23
ABEL       40           10
BA POS     80           21
BSM        35           9
BSCS       120          31
BS Crim    150          39
Total      1095         284
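The Sample column above is each course's population multiplied by the sampling fraction n/N and rounded to the nearest whole number. A minimal Python sketch of that allocation follows; the variable names are assumptions made for illustration, and the value 284.38 is taken from the text.

```python
# Population of each course, as listed in the table.
populations = {
    "BSE": 120, "BEED": 50, "BCAED": 30, "BPEd": 30, "BSBA": 350,
    "BSOA": 90, "ABEL": 40, "BA POS": 80, "BSM": 35, "BSCS": 120,
    "BS Crim": 150,
}

total = sum(populations.values())  # N = 1095
fraction = 284.38 / total          # sampling fraction n/N, about 0.2597

# Allocate the sample proportionally and round to whole respondents.
samples = {course: round(size * fraction) for course, size in populations.items()}

for course, size in populations.items():
    print(f"{course:8s} {size:5d} {samples[course]:4d}")
print(f"{'Total':8s} {total:5d} {sum(samples.values()):4d}")
```

Rounding each course separately does not always reproduce the overall sample size exactly; for this data the rounded values happen to sum to 284.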
MEASURES OF CENTRAL TENDENCY
Central Tendency determines a numerical value in the central region of a distribution of scores. It
refers to the center of a distribution of observations. There are three measures of central tendency: mean,
median and mode. These are used when the general or over-all performance of the class is compared to
other classes.
1. MEAN
The mean is the most commonly used measure of central tendency. It can be affected by extreme
scores. It is stable, varying less from sample to sample. It is used when the most reliable measure is desired and
when there are only a few very high values and a few very low values. The mean is the balance point of a
score distribution. When we speak of the average, we usually refer to the mean.
$$\mu = \frac{\sum X}{N} \qquad \bar{x} = \frac{\sum x}{n}$$
$$\text{mean } (\bar{x}) = \frac{\text{sum of the values}}{\text{number of values}}$$
Example 1: A researcher collects data on the ages of recipients of doctoral degree in science and
engineering, and his study yields the following:
37 37 24 28 43 44 36 41 33 27
Solution: The mean is determined by adding the ages and then dividing the sum by the total number of
recipients.
$$\bar{x} = \frac{350}{10} = 35$$
Determine the mean of the following: 24, 25, 33, 50, 53, 66, 78
$$\bar{x} = \frac{24+25+33+50+53+66+78}{7} = 47$$
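Both mean computations can be reproduced with a few lines of Python that follow the definition directly (sum of the values divided by the number of values). The variable names are assumptions made for illustration.

```python
ages = [37, 37, 24, 28, 43, 44, 36, 41, 33, 27]
scores = [24, 25, 33, 50, 53, 66, 78]

# Mean = sum of the values / number of values.
mean_ages = sum(ages) / len(ages)
mean_scores = sum(scores) / len(scores)

print(mean_ages)    # 35.0
print(mean_scores)  # 47.0
```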
2. MEDIAN
The median is the midpoint of the data array. Before finding this value, the data must be arranged, from
least to greatest or vice versa. The median will either be a specific value or will fall between two values.
Example
1. Seven mothers were selected and given a blood pressure check. Their systolic pressures were
recorded:
135 121 119 116 130 121 131
Arranged in order: 116 119 121 121 130 131 135
Median (Md) = 121
2. Eight novels were randomly selected and the numbers of pages were recorded as follows:
415 398 402 400 420 415 407 425
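The two cases described above (a middle value for an odd count, the average of the two middle values for an even count) can be sketched in Python as follows. The helper name `median` is an assumption made for illustration; the notes do not show the worked answer for the second example, so the value printed for it is what this sketch produces.

```python
def median(values):
    """Return the midpoint of the sorted data array."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the median is the single middle value.
        return ordered[mid]
    # Even count: the median falls between the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


pressures = [135, 121, 119, 116, 130, 121, 131]
pages = [415, 398, 402, 400, 420, 415, 407, 425]

print(median(pressures))  # 121
print(median(pages))      # 411.0
```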
Examples
1. The scores 1, 2, 3, 2, 4, 7, 9, 2 have a mode of 2.
3. The scores 2, 3, 4, 4, 3, 2 have no mode since all the scores occur with the same frequency.
4. The scores 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5 have the modes 3 and 4 since they both occur with the same
highest frequency (we refer to such data as bimodal).
5. The scores 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5 have the modes 1, 2 and 3, which share the same highest
frequency; such data are called trimodal.
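The mode examples above can be reproduced with a small Python sketch. The helper name `modes` is an assumption made for illustration, and returning an empty list for "no mode" is a convention chosen here to match the examples, not something the notes prescribe.

```python
from collections import Counter


def modes(scores):
    """Return the most frequent scores, or [] when every score is equally frequent."""
    counts = Counter(scores)
    highest = max(counts.values())
    # If all distinct scores share the same frequency, the data have no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([1, 2, 3, 2, 4, 7, 9, 2]))           # [2]
print(modes([2, 3, 4, 4, 3, 2]))                 # []
print(modes([1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5]))  # [3, 4]
print(modes([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5]))  # [1, 2, 3]
```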
1 5 6 9 11 15 17
2 5 7 9 12 15 17
4 5 7 9 12 15 18
4 6 8 12 10 16 18
4 6 9 12 11 16 18
For example: 37, 46, 38, 27, 43, 40, 42, 31, 50, 30, 27, 39, 35, 43, 33, 33, 38, 38, 40, 20
Range: HS – LS = 50 – 20 = 30
$$c = \frac{30}{10} = 3$$
Class interval   Tally     Frequency (f)   Class Boundary   Class Mark
50 – 52          I         1               49.5 – 52.5      51
47 – 49                    0               46.5 – 49.5      48
44 – 46          I         1               43.5 – 46.5      45
41 – 43          III       3               40.5 – 43.5      42
38 – 40          IIIII-I   6               37.5 – 40.5      39
35 – 37          II        2               34.5 – 37.5      36
32 – 34          II        2               31.5 – 34.5      33
29 – 31          II        2               28.5 – 31.5      30
26 – 28          II        2               25.5 – 28.5      27
23 – 25                    0               22.5 – 25.5      24
20 – 22          I         1               19.5 – 22.5      21
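The frequency distribution above can be rebuilt from the raw scores with a short Python sketch. It assumes integer scores, a class size of c = 3 and classes that start at the lowest score, as in the table; the variable names are assumptions made for illustration.

```python
data = [37, 46, 38, 27, 43, 40, 42, 31, 50, 30,
        27, 39, 35, 43, 33, 33, 38, 38, 40, 20]

width = 3        # class size c from the text
rows = []        # one (lower, upper, frequency, boundaries, class mark) per class

lower = min(data)                 # lowest score, 20
while lower <= max(data):
    upper = lower + width - 1     # class limits, e.g. 20 and 22
    freq = sum(lower <= x <= upper for x in data)
    boundaries = (lower - 0.5, upper + 0.5)
    class_mark = (lower + upper) / 2
    rows.append((lower, upper, freq, boundaries, class_mark))
    lower += width

# Print from the highest class down, matching the table layout.
for lower, upper, freq, (lb, ub), mark in reversed(rows):
    print(f"{lower}-{upper}  f={freq}  {lb}-{ub}  mark={mark}")
```

With a class size of 3 and a range of 30, the loop produces 11 classes, the same number that appears in the table.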
B. Grouped Data
1. Mean
When data is grouped in a frequency distribution, the midpoint of each class is used as an
approximation of all values contained in the class. The symbol x represents the class mark or class midpoint.
The symbol f represents the observed frequency of values. The formulas for sample mean and population
mean are:
$$\bar{x} = \frac{\sum fx}{n} \quad \text{and} \quad \mu = \frac{\sum fx}{N}$$
Example:
1. Consider the frequency distribution of test scores in Algebra below and find the mean.
Scores 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49
Frequency 2 5 4 6 12 10 8 3
Solution:
Scores     Frequency (f)   Class Mark (x)   fx
45 – 49    3               47               141
40 – 44    8               42               336
35 – 39    10              37               370
30 – 34    12              32               384
25 – 29    6               27               162
20 – 24    4               22               88
15 – 19    5               17               85
10 – 14    2               12               24
           n = 50                           ∑fx = 1590
$$\bar{x} = \frac{\sum fx}{n} = \frac{1590}{50} = 31.8$$
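The grouped mean can be verified with a short Python sketch that uses each class midpoint as the class mark. The tuple layout `(lower limit, upper limit, frequency)` is an assumption made for illustration.

```python
# (lower limit, upper limit, frequency) for the Algebra test scores.
classes = [(10, 14, 2), (15, 19, 5), (20, 24, 4), (25, 29, 6),
           (30, 34, 12), (35, 39, 10), (40, 44, 8), (45, 49, 3)]

n = sum(f for _, _, f in classes)
# The class mark x is the midpoint of the class limits.
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
grouped_mean = sum_fx / n

print(n, sum_fx, grouped_mean)  # 50 1590.0 31.8
```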
2. Median
For grouped data, the class which contains the median value has to be determined first, and then the
position of the median within the class interval is determined by interpolation. The class which contains the
median is the first class for which the cumulative frequency equals or exceeds one-half the total number of
observations. Once this class is identified, the specific value of the median is determined by the formula:
$$Md = L_1 + \left( \frac{\frac{n}{2} - cf_b}{f_m} \right) c$$
Where:
$L_1$ = lower class boundary of the median class
$n$ = total number of observations
$cf_b$ = cumulative frequency before the median class
$f_m$ = frequency of the median class
$c$ = size of the class interval
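For the Algebra test scores above, n/2 = 25, so the median class is 30 – 34, with $L_1$ = 29.5, $cf_b$ = 17,
$f_m$ = 12 and $c$ = 5:
$$Md = L_1 + \left( \frac{\frac{n}{2} - cf_b}{f_m} \right) c$$
$$Md = 29.5 + \left( \frac{\frac{50}{2} - 17}{12} \right) 5$$
$$Md = 29.5 + (0.67)5$$
$$Md = 29.5 + 3.33$$
$$Md = 32.83$$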
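The interpolation can be sketched in Python as follows. The function name `grouped_median` and the tuple layout are assumptions made for illustration; the sketch also assumes integer class limits listed from the lowest class to the highest, so that each class boundary lies 0.5 below the lower limit.

```python
def grouped_median(classes):
    """Median of grouped data given (lower limit, upper limit, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # cumulative frequency before the current class (cf_b)
    for lo, hi, f in classes:
        # The median class is the first whose cumulative frequency reaches n/2.
        if cumulative + f >= half:
            lower_boundary = lo - 0.5  # L1
            size = hi - lo + 1         # c
            return lower_boundary + (half - cumulative) / f * size
        cumulative += f


classes = [(10, 14, 2), (15, 19, 5), (20, 24, 4), (25, 29, 6),
           (30, 34, 12), (35, 39, 10), (40, 44, 8), (45, 49, 3)]

print(round(grouped_median(classes), 2))  # 32.83
```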
3. Mode
From a frequency distribution, the mode can be obtained from the formula:
$$Mo = L_1 + \left( \frac{d_1}{d_1 + d_2} \right) c$$
Where:
$L_1$ = lower class boundary of the modal class
$d_1$ = difference between the frequency of the modal class and the frequency of the next lower class
$d_2$ = difference between the frequency of the modal class and the frequency of the next higher class
$c$ = size of the class interval
$$Mo = L_1 + \left( \frac{d_1}{d_1 + d_2} \right) c$$
$L_1$ = 29.5
$d_1$ = 12 – 6 = 6
$d_2$ = 12 – 10 = 2
$c$ = 5
$$Mo = 29.5 + \left( \frac{6}{6+2} \right) 5$$
$$Mo = 29.5 + (0.75)5$$
$$Mo = 29.5 + 3.75$$
$$Mo = 33.25$$
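The mode computation can be checked with a minimal Python sketch. The function name `grouped_mode` and the tuple layout are assumptions made for illustration. The sketch assumes integer class limits listed from the lowest class to the highest and a single modal class, and it treats a missing neighbouring class as having frequency 0, which is a choice made here rather than something stated in the notes.

```python
def grouped_mode(classes):
    """Mode of grouped data given (lower limit, upper limit, frequency) tuples."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))  # position of the modal class
    lo, hi, f = classes[i]
    # Frequencies of the neighbouring classes (0 if there is no neighbour).
    below = freqs[i - 1] if i > 0 else 0
    above = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = f - below
    d2 = f - above
    lower_boundary = lo - 0.5    # L1
    size = hi - lo + 1           # c
    return lower_boundary + d1 / (d1 + d2) * size


classes = [(10, 14, 2), (15, 19, 5), (20, 24, 4), (25, 29, 6),
           (30, 34, 12), (35, 39, 10), (40, 44, 8), (45, 49, 3)]

print(grouped_mode(classes))  # 33.25
```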