
DEFINITION OF STATISTICS

STATISTICS – A collection of methods for planning experiments, obtaining data and then analyzing,
interpreting and drawing conclusions based on the data.

Aspects
• Theoretical – deals with the development, derivation and proof of statistical theorems, formulas,
rules and laws.
• Applied – involves the application of those theorems, rules and laws to solve real-world problems.

In order to gain information, a statistician collects data for the VARIABLES used to describe an event. DATA
are the values that the variables can assume. VARIABLES whose values are determined by chance are
called RANDOM VARIABLES.

TYPES OF VARIABLES
1. Qualitative Variables – words or codes that represent a class or category.
Sex is a qualitative variable where male can be assigned the value 1 while female is assigned 0,
as the case may be. In surveying the opinion of respondents on a given issue, the respondents may be
“for,” “against,” or “undecided.” Arbitrary values can be assigned to these responses for computational
purposes.

2. Quantitative Variables – numbers that represent an amount or a count


Classification
• Discrete – can assume only countable values such as 0, 1, 2, 3, …
• Continuous – can assume all values between any two specific values, like 0.5, 1.2, etc.

Example:
The temperature of a given person is a continuous variable, while the number of persons in a room is a
discrete variable.

FIELDS OF STATISTICS

1. Descriptive Statistics – This is concerned with the collection, classification, and presentation
of data. This is done by determining percentages and by using measures of central tendency, measures of
variability, measures of location, skewness, and kurtosis.

2. Inferential Statistics – This is concerned with the analysis and interpretation of data. Tools such as the
z-test, t-test, F-test, analysis of variance, chi square, and correlation are utilized under this field.

METHODS OF COLLECTING DATA

1. Direct Method – collected through the use of interviews
2. Indirect Method – collected through the use of questionnaires
3. Observation
4. Experimentation – gathered through experiments in laboratories and classrooms
5. Registration – acquired from private and government agencies such as NSO

WAYS OF PRESENTING DATA

1. Textual form – data and information are presented in paragraph and narrative form.
2. Tabular form – quantitative data are summarized in rows and columns.
3. Graphical form – data are presented in charts, graphs or pictures

LEVEL OF MEASUREMENT

1. Nominal level – characterized by data that consist of names, labels, or categories only.
Example
Classifying survey subjects by gender (male, female), marital status (single, married, separated)
and employment (business, construction, engineering, education, etc.).
2. Ordinal Level – data measured can be ordered or ranked
Example
Rank of teachers – Instructor, Assistant Professor, and Associate Professor
Winners of marathon – ranked first, second and third
3. Interval Level – same as the ordinal level, with the additional property that the differences between
data values are precise and meaningful; however, there is no true zero.
Example
Temperatures of 40°C and 50°C have a meaningful difference of 10°C, but 0°C does not mean
that there is no temperature.

4. Ratio Level – the highest level of measurement; an interval level modified to include an inherent
zero starting point.
Example
Measurements of height, weight, area and volume.

POPULATION AND SAMPLE

Population

This is a set of data consisting of all possible observations of a certain phenomenon. In the study of
The Factors Affecting the Academic Performance of the Students of the College of Teacher Education, every
college student of CTE belongs to the population. Oftentimes, the population is large, reaching
thousands or millions. The population size is denoted by N.

Sample

This is a portion taken from a population, possessing the same characteristics as the population. A sample
is usually taken in cases where it is not possible to take all the members of the population as subjects or
respondents in a study. The sample size is represented by n.

Methods of Determining the Sample Size


Slovin’s Formula:

\[ n = \frac{N}{1 + N e^{2}} \]

where n is the sample size, N is the population, and e is the margin of error, which is usually from 0 to
0.1.

Example:
Determine the sample size in a study where the actual population is 450 if a margin of error of 0.05 is
desired.

\[ n = \frac{450}{1 + (450)(0.05)^{2}} = 211.76 \approx 212 \]
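
The computation above can be checked with a short Python sketch. The function name `slovin_sample_size` is only an illustrative choice and is not part of the original notes.

```python
def slovin_sample_size(population, margin_of_error):
    """Slovin's formula: n = N / (1 + N * e**2)."""
    return population / (1 + population * margin_of_error ** 2)


# Worked example from the text: N = 450, e = 0.05
n = slovin_sample_size(450, 0.05)
print(round(n, 2))  # 211.76
print(round(n))     # 212
```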

Lynch Formula:

\[ n = \frac{N z^{2} p(1-p)}{N d^{2} + z^{2} p(1-p)} \]

where z = 1.96, p = 0.50, d is the margin of error, n is the sample size, and N is the population.

Example:

Determine the sample size in the previous example using the same margin of error.

\[ n = \frac{(450)(1.96)^{2}(0.50)(1-0.50)}{(450)(0.05)^{2} + (1.96)^{2}(0.50)(1-0.50)} = 207.24 \approx 207 \]

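
As a check on the arithmetic, the Lynch formula can be sketched in Python as follows. The function name `lynch_sample_size` and its keyword defaults are illustrative assumptions; the defaults simply mirror the z = 1.96 and p = 0.50 stated in the text.

```python
def lynch_sample_size(population, margin_of_error, z=1.96, p=0.50):
    """Lynch formula: n = N z^2 p(1-p) / (N d^2 + z^2 p(1-p))."""
    spread = z ** 2 * p * (1 - p)  # the z^2 p(1-p) term shared by both parts
    return population * spread / (population * margin_of_error ** 2 + spread)


# Worked example from the text: N = 450, d = 0.05
n = lynch_sample_size(450, 0.05)
print(round(n, 2))  # 207.24
print(round(n))     # 207
```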
Another Example:

A researcher wants to decide the number of respondents to be taken from the population of first-year
students of NLPSC with a 5% margin of error.

Program      Population
BSE          120
BEED         50
BCAED        30
BPEd         30
BSBA         350
BSOA         90
ABEL         40
BA POS       80
BSM          35
BSCS         120
BS Crim      150
Total        1095

\[ n = \frac{(1095)(1.96)^{2}(0.50)(1-0.50)}{(1095)(0.05)^{2} + (1.96)^{2}(0.50)(1-0.50)} \approx 284.39 \approx 284 \]

The fraction of each program's population to be sampled is

\[ \frac{n}{N} = \frac{284.39}{1095} = 0.2597 \]

BSE/BSCS     0.2597 × 120 = 31.16
BEED         0.2597 × 50  = 12.99
BCAED/BPEd   0.2597 × 30  = 7.79
BSBA         0.2597 × 350 = 90.90
BSOA         0.2597 × 90  = 23.37
ABEL         0.2597 × 40  = 10.39
BA POS       0.2597 × 80  = 20.78
BSM          0.2597 × 35  = 9.09
BS Crim      0.2597 × 150 = 38.96

Program      Population   Sample
BSE          120          31
BEED         50           13
BCAED        30           8
BPEd         30           8
BSBA         350          91
BSOA         90           23
ABEL         40           10
BA POS       80           21
BSM          35           9
BSCS         120          31
BS Crim      150          39
Total        1095         284
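
The proportional allocation in the table can be reproduced with the sketch below. It recomputes n with the Lynch formula and multiplies each program's population by n/N; the names `populations`, `lynch_sample_size` and `allocation` are illustrative. Rounding each program separately happens to add up to 284 here, but that is not guaranteed for other data.

```python
# Program populations taken from the example in the text.
populations = {
    "BSE": 120, "BEED": 50, "BCAED": 30, "BPEd": 30, "BSBA": 350, "BSOA": 90,
    "ABEL": 40, "BA POS": 80, "BSM": 35, "BSCS": 120, "BS Crim": 150,
}


def lynch_sample_size(population, margin_of_error, z=1.96, p=0.50):
    """Lynch formula as stated in the text."""
    spread = z ** 2 * p * (1 - p)
    return population * spread / (population * margin_of_error ** 2 + spread)


N = sum(populations.values())   # total population
n = lynch_sample_size(N, 0.05)  # overall sample size before rounding
share = n / N                   # fraction sampled from every program

# Each program contributes the same share of its own population.
allocation = {name: round(size * share) for name, size in populations.items()}

for name, size in populations.items():
    print(f"{name:<8} {size:>5} {allocation[name]:>4}")
print(f"{'Total':<8} {N:>5} {sum(allocation.values()):>4}")
```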
MEASURES OF CENTRAL TENDENCY

Central Tendency determines a numerical value in the central region of a distribution of scores. It
refers to the center of a distribution of observations. There are three measures of central tendency: mean,
median and mode. These are used when the general or over-all performance of the class is compared to
other classes.

A. COMPUTING MEAN, MEDIAN AND MODE OF UNGROUPED DATA

1. MEAN
The mean is the most commonly used measure of central tendency. It is affected by extreme
scores. It is stable, varying less from sample to sample than the other measures. It is used when the most
reliable measure is desired and when there are only a few very high values and a few very low values. The
mean is the balance point of a score distribution. When we speak of the average, we usually refer to the mean.

Population mean:

\[ \mu = \frac{\sum X}{N} \]

Sample mean:

\[ \bar{x} = \frac{\sum x}{n} \]

Where N – total number of observations in the population

n – total number of observations in the sample

\[ \text{mean } (\bar{x}) = \frac{\text{sum of the values}}{\text{the number of values}} \]

Example 1: A researcher collects data on the ages of recipients of doctoral degree in science and
engineering, and his study yields the following:

37 37 24 28 43 44 36 41 33 27

Solution: The mean is determined by adding the ages and then dividing the sum by the total number of
recipients.

\[ \text{Mean} = \frac{37+37+24+28+43+44+36+41+33+27}{10} = \frac{350}{10} = 35 \]

Example 2: Determine the mean of the following: 24, 25, 33, 50, 53, 66, 78

\[ \bar{x} = \frac{24+25+33+50+53+66+78}{7} = \frac{329}{7} = 47 \]
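
Both examples can be verified with a few lines of Python. This is a minimal sketch that uses plain `sum` and `len`; the helper name `mean` is an illustrative choice.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


ages = [37, 37, 24, 28, 43, 44, 36, 41, 33, 27]
scores = [24, 25, 33, 50, 53, 66, 78]

print(mean(ages))    # 35.0
print(mean(scores))  # 47.0
```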

2. MEDIAN
The median is the midpoint of the data array. Before finding this value, the data must be arranged, from
least to greatest or vice versa. The median will either be a specific value or will fall between two values.
Example

1. Seven mothers were selected and given a blood pressure check. Their systolic pressures were
recorded:
135 121 119 116 130 121 131

Arrange: 116 119 121 121 130 131 135

Median (Md) = 121

2. Eight novels were randomly selected and the numbers of pages were recorded as follows:
415 398 402 400 420 415 407 425

Arrange: 398 400 402 407 415 415 420 425

Median = (407 + 415)/2 = 411
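
The two cases, an odd and an even number of values, can be sketched in Python as below. The function name `median` is illustrative; Python's standard library also offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Midpoint of the sorted data array."""
    ordered = sorted(values)  # arrange from least to greatest
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]  # odd count: the single middle value
    # even count: average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


pressures = [135, 121, 119, 116, 130, 121, 131]
pages = [415, 398, 402, 400, 420, 415, 407, 425]

print(median(pressures))  # 121
print(median(pages))      # 411.0
```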


3. MODE
The third measure of average is the mode. It is the value with the largest frequency. It is the value that
occurs most often in the data set. This is used when the quickest estimate of typical performance is wanted. A
distribution can be unimodal with one mode value, bimodal with two mode values and trimodal with three
mode values. In other words, it can have more than one mode.

Examples
1. The scores 1, 2, 3, 2, 4, 7, 9, 2 have a mode 2

2. The scores 1, 2, 3, 4, 5, 6 have no mode since no score is repeated

3. The scores 2, 3, 4, 4, 3, 2 have no mode since all the scores have the same frequency

4. The scores 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5 have the modes 3 and 4 since they both occur with the same
highest frequency (we refer to such data as bimodal)

5. The scores 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5 have the modes 1, 2 and 3 with the same highest frequency and
is called trimodal.
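
The five examples above can be reproduced with the following sketch. It follows the convention used in these notes, where a data set in which every distinct score has the same frequency is said to have no mode; the function name `modes` and the choice to return an empty list in that case are illustrative assumptions.

```python
from collections import Counter


def modes(scores):
    """Return the sorted list of modes, or an empty list when there is no mode."""
    counts = Counter(scores)
    highest = max(counts.values())
    # Convention from the notes: if all distinct scores are equally frequent,
    # no score stands out, so there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([1, 2, 3, 2, 4, 7, 9, 2]))           # [2]
print(modes([1, 2, 3, 4, 5, 6]))                 # []
print(modes([2, 3, 4, 4, 3, 2]))                 # []
print(modes([1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5]))  # [3, 4]
print(modes([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5]))  # [1, 2, 3]
```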

Determine the mode:

1 5 6 9 11 15 17
2 5 7 9 12 15 17
4 5 7 9 12 15 18
4 6 8 12 10 16 18
4 6 9 12 11 16 18

FREQUENCY DISTRIBUTION TABLE


A frequency distribution is a table in which possible values for a variable are grouped into classes,
and the number of observed values which fall into each class is recorded.
To construct a frequency table, the following steps should be followed.
Step 1: Determine the range. Range = Highest Score – Lowest Score.
Step 2: Determine the number of class intervals. The number of intervals is dependent on the number of
scores and the purpose of organizing the frequency table. The number of class intervals is usually taken
between 5 and 20, depending on the data.
When the number of class intervals has been decided, the size of the class interval (c) can be found by
dividing the range of the scores by the number of intervals wanted.
Step 3: Determine the limits of the bottom class interval. The class mark or class midpoint is in the middle of
the class interval
Step 4: Construct the table. The remaining class intervals are formed by increasing each interval by the size of
c until an interval is reached that includes the highest score.
Step 5: Tally the scores. The scores are counted one at a time and a tally mark is placed to the right of the
appropriate interval.
Step 6: Record the tallies under the column headed f (frequency). Sum the frequencies (∑f = n).

For example: 37, 46, 38, 27, 43, 40, 42, 31, 50, 30, 27, 39, 35, 43, 33, 33, 38, 38, 40, 20
Range: HS – LS = 50 – 20 = 30

\[ c = \frac{30}{10} = 3 \]

Class interval   Tally     Frequency (f)   Class Boundary   Class Mark
50 – 52          I         1               49.5 – 52.5      51
47 – 49                    0               46.5 – 49.5      48
44 – 46          I         1               43.5 – 46.5      45
41 – 43          III       3               40.5 – 43.5      42
38 – 40          IIIII-I   6               37.5 – 40.5      39
35 – 37          II        2               34.5 – 37.5      36
32 – 34          II        2               31.5 – 34.5      33
29 – 31          II        2               28.5 – 31.5      30
26 – 28          II        2               25.5 – 28.5      27
23 – 25                    0               22.5 – 25.5      24
20 – 22          I         1               19.5 – 22.5      21
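
The table above can be rebuilt with the sketch below. It assumes integer scores, takes the class size c = 3 and the lowest score as the bottom class limit from the example, and sets each class boundary half a unit beyond the class limits. The variable names are illustrative.

```python
scores = [37, 46, 38, 27, 43, 40, 42, 31, 50, 30,
          27, 39, 35, 43, 33, 33, 38, 38, 40, 20]

c = 3                # class size from the example (range 30 divided by 10)
lower = min(scores)  # lower limit of the bottom class interval

rows = []  # built from the bottom class upward
while lower <= max(scores):
    upper = lower + c - 1
    frequency = sum(lower <= s <= upper for s in scores)  # tally the scores
    class_mark = (lower + upper) / 2
    rows.append((lower, upper, frequency, lower - 0.5, upper + 0.5, class_mark))
    lower += c

# Print with the highest class first, as in the table.
for lo, hi, f, lb, ub, mark in reversed(rows):
    print(f"{lo} - {hi}   f = {f}   boundary {lb} - {ub}   mark {mark}")

print("total frequency:", sum(row[2] for row in rows))
```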

B. Grouped Data
1. Mean
When data is grouped in a frequency distribution, the midpoint of each class is used as an
approximation of all values contained in the class. The symbol x represents the class mark or class midpoint.
The symbol f represents the observed frequency of values. The formulas for sample mean and population
mean are:

\[ \bar{x} = \frac{\sum fx}{n} \quad \text{and} \quad \mu = \frac{\sum fx}{N} \]

Example:
1. Consider the given frequency distribution of test scores in Algebra below, find the mean.
Scores 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49
Frequency 2 5 4 6 12 10 8 3

Solution:
Scores Frequency (f) Class Mark (x) fx
45 – 49 3 47 141
40 – 44 8 42 336
35 – 39 10 37 370
30 – 34 12 32 384
25 – 29 6 27 162
20 – 24 4 22 88
15 – 19 5 17 85
10 – 14 2 12 24
n = 50 ∑ fx=1590

\[ \bar{x} = \frac{\sum fx}{n} = \frac{1590}{50} = 31.8 \]
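
The grouped mean can be checked with a short sketch in which every class is represented by its class mark. The list name `classes` and its (lower limit, upper limit, frequency) layout are illustrative choices.

```python
# (lower limit, upper limit, frequency) for each class in the Algebra example
classes = [
    (10, 14, 2), (15, 19, 5), (20, 24, 4), (25, 29, 6),
    (30, 34, 12), (35, 39, 10), (40, 44, 8), (45, 49, 3),
]

n = sum(f for _, _, f in classes)
# The class mark x is the midpoint of the class limits.
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
grouped_mean = sum_fx / n

print(n)             # 50
print(sum_fx)        # 1590.0
print(grouped_mean)  # 31.8
```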

2. Median
For grouped data, the class which contains the median value has to be determined first, and then the
position of the median within the class interval is determined by interpolation. The class which contains the
median is the first class for which the cumulative frequency equals or exceeds one-half the total number of
observations. Once this class is identified, the specific value of the median is determined by the formula:

\[ \text{median } (Md) = L_1 + \left( \frac{\frac{n}{2} - cf_b}{f_m} \right) c \]

Where:
L_1 = lower class boundary of the median class
n = total number of observations
cf_b = cumulative frequency before the median class
f_m = frequency of the median class
c = size of the class interval

Using the example above:


Scores     Frequency (f)   Class Mark (x)   fx     Cumulative Frequency (cf)
45 – 49    3               47               141    50
40 – 44    8               42               336    47
35 – 39    10              37               370    39
30 – 34    12              32               384    29
25 – 29    6               27               162    17
20 – 24    4               22               88     11
15 – 19    5               17               85     7
10 – 14    2               12               24     2
           n = 50                           ∑fx = 1590

\[ \frac{n}{2} = \frac{50}{2} = 25 \]

The median class is 30 – 34, the first class whose cumulative frequency (29) equals or exceeds 25.

L_1 = 29.5, n = 50, cf_b = 17, f_m = 12, c = 5

\[ Md = L_1 + \left( \frac{\frac{n}{2} - cf_b}{f_m} \right) c \]

\[ Md = 29.5 + \left( \frac{\frac{50}{2} - 17}{12} \right) 5 \]

Md = 29.5 + (0.67)(5)
Md = 29.5 + 3.33
Md = 32.83
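
The interpolation can be sketched in Python as follows. The sketch assumes integer class limits listed in ascending order, so that the lower class boundary is the lower limit minus 0.5 and the class size is the upper limit minus the lower limit plus 1. The function name `grouped_median` is illustrative.

```python
def grouped_median(classes):
    """Median of grouped data given (lower limit, upper limit, frequency) rows."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cumulative frequency before the current class
    for lo, hi, f in classes:
        # The median class is the first one whose cumulative frequency
        # equals or exceeds n / 2.
        if cumulative + f >= n / 2:
            lower_boundary = lo - 0.5
            class_size = hi - lo + 1
            return lower_boundary + (n / 2 - cumulative) / f * class_size
        cumulative += f


classes = [
    (10, 14, 2), (15, 19, 5), (20, 24, 4), (25, 29, 6),
    (30, 34, 12), (35, 39, 10), (40, 44, 8), (45, 49, 3),
]

print(round(grouped_median(classes), 2))  # 32.83
```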

3. Mode
From a frequency distribution, the mode can be obtained from the formula:

\[ Mo = L_1 + \left( \frac{d_1}{d_1 + d_2} \right) c \]

Where:
L_1 = lower class boundary of the modal class
d_1 = difference between the frequency of the modal class and the frequency of the next lower class
d_2 = difference between the frequency of the modal class and the frequency of the next higher class
c = size of the class interval

Using the example above:


Scores Frequency (f) Class Mark (x)
45 – 49 3 47
40 – 44 8 42
35 – 39 10 37
30 – 34 12 32
25 – 29 6 27
20 – 24 4 22
15 – 19 5 17
10 – 14 2 12
n = 50

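The modal class is the class with the highest frequency. Here it is 30 – 34, with a frequency of 12.

\[ Mo = L_1 + \left( \frac{d_1}{d_1 + d_2} \right) c \]

L_1 = 29.5
d_1 = 12 – 6 = 6
d_2 = 12 – 10 = 2
c = 5

\[ Mo = 29.5 + \left( \frac{6}{6 + 2} \right) 5 \]

Mo = 29.5 + (0.75)(5)
Mo = 29.5 + 3.75
Mo = 33.25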
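
A minimal Python sketch of the same computation is given below. It assumes integer class limits in ascending order, uses the first class with the highest frequency as the modal class, and treats a missing neighbouring class as having frequency 0. These choices, and the name `grouped_mode`, are assumptions made for illustration rather than rules stated in the notes.

```python
def grouped_mode(classes):
    """Mode of grouped data given (lower limit, upper limit, frequency) rows."""
    frequencies = [f for _, _, f in classes]
    i = frequencies.index(max(frequencies))  # position of the modal class
    lo, hi, f = classes[i]
    # Frequencies of the neighbouring classes (0 if there is no neighbour).
    f_lower = frequencies[i - 1] if i > 0 else 0
    f_higher = frequencies[i + 1] if i < len(classes) - 1 else 0
    d1 = f - f_lower
    d2 = f - f_higher
    lower_boundary = lo - 0.5
    class_size = hi - lo + 1
    return lower_boundary + d1 / (d1 + d2) * class_size


classes = [
    (10, 14, 2), (15, 19, 5), (20, 24, 4), (25, 29, 6),
    (30, 34, 12), (35, 39, 10), (40, 44, 8), (45, 49, 3),
]

print(grouped_mode(classes))  # 33.25
```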
