Statistics Lecture
Course Requirements
Reaction Paper (Film Clip Analysis)
Statistics defined . . .
• STATISTICS is a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting and drawing conclusions based on the data.

Main Divisions
Descriptive Statistics
- summarize or describe the important characteristics of a known set of population data
Inferential Statistics
Qualitative vs. Quantitative Data
• Qualitative (categorical or attribute) data can be separated into different categories that are distinguished by some non-numerical characteristics
• Quantitative data consist of numbers representing counts or measurements

Discrete vs Continuous Data
• Discrete data result from either a finite number of possible values or a countable number of possible values (that is, the number of possible values is 0, 1, 2, or more)
• Continuous data result from infinitely many possible values that can be associated with points on a continuous scale in such a way that there are no gaps or interruptions
• Dependent variable – the variable that is being affected
 - the variable that is being explained

• The nominal level of measurement is characterized by data that consists of names, labels or categories only. The data cannot be arranged in an ordering scheme.
Ordinal Level of Measurement
• The ordinal level of measurement involves data that may be arranged in some order, but differences between data values are either meaningless or cannot be determined.
• Examples:
good, better or best speakers; 1 star, 2 star or 3 star movie; rank of an employee

Interval Level of Measurement
• The interval level of measurement is like the ordinal level, with the additional property that meaningful amounts of differences between data can be determined. However, there is no inherent (natural) zero starting point.
• Examples:
body temperature, year (2007, 2008, 2013, etc)
Measures of Central Tendency (UNGROUPED DATA)

The Mean
• Two Forms
 – Simple mean
 – Weighted mean

The Mean
Arithmetic Mean (Mean)
“balancing point” of a set of scores
If you have a
Population: $\mu = \frac{\sum x}{N}$
Sample: $\bar{X} = \frac{\sum x}{n}$
Simple Arithmetic Mean
$$\bar{X} = \frac{\sum x}{n}$$
Where:
x = an individual score
n = the number of scores/cases
$\sum x$ (Sigma x) = sum of the individual score values

Example:
Consider the following data set:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Solution:
$$\bar{X} = \frac{\sum x}{n} = \frac{1+2+3+4+5+6+7+8+9+10}{10}$$
Mean = 5.5
Example:
• The following data represent the ages of the mothers of Paulinian Graders randomly selected from four different grade levels who attended a session on Counseling. What is the mean age of the mothers per grade level?
• Grade 1: 35, 37, 45, 54, 39, 48
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47

Solution:
• To obtain the mean age of the mothers of the Grade 1 pupils, we have
$$\bar{X} = \frac{35+37+45+54+39+48}{6} = \frac{258}{6} = 43$$
**This means that the mothers of the Grade 1 pupils are relatively young.
Example:
• Find the mean of the other grade levels. Round off your answers to the nearest hundredth.
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47

Answers:
• Grade 2: 53.73
• Grade 3: 50
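The simple mean in the examples above can be checked with a few lines of Python. This is only a sketch: the dictionary `ages` and the helper `simple_mean` are names introduced here for illustration, and the data are copied from the four grade levels listed in the example.

```python
# Ages of the mothers, copied from the example (one list per grade level).
ages = {
    "Grade 1": [35, 37, 45, 54, 39, 48],
    "Grade 2": [54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63],
    "Grade 3": [56, 48, 39, 48, 55, 57, 41, 56],
    "Grade 4": [53, 47, 49, 59, 60, 45, 53, 59, 47],
}


def simple_mean(scores):
    """Sum of the individual scores divided by the number of scores."""
    return sum(scores) / len(scores)


# Round to the nearest hundredth, as the exercise asks.
means = {grade: round(simple_mean(x), 2) for grade, x in ages.items()}

for grade, value in means.items():
    print(grade, value)
```

The same helper works for any ungrouped list of scores.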
Weighted Mean
Example:
• The following are the responses of 30 randomly chosen respondents in one item of a research questionnaire.
Solution:
• To obtain the weighted response, we have
$$\bar{X} = \frac{5(7) + 4(11) + 3(9) + 2(2) + 1(1)}{30} = \frac{111}{30} = 3.70 \text{ (strongly agree)}$$

Interpretation of Values
Range          Verbal Description
4.20 – 5.00    Very strongly agree
3.40 – 4.19    Strongly agree
2.60 – 3.39    Agree
1.80 – 2.59    Disagree
1.00 – 1.79    Strongly disagree
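As a rough sketch, the weighted mean and its verbal interpretation can be computed as follows. The frequencies are read off the solution above (7 respondents chose 5, 11 chose 4, and so on); the names `responses`, `weighted_mean`, `SCALE` and `describe` are illustrative only.

```python
# weight -> number of respondents, read off the solution 5(7) + 4(11) + ...
responses = {5: 7, 4: 11, 3: 9, 2: 2, 1: 1}


def weighted_mean(freq_by_weight):
    """Sum of weight * frequency divided by the total number of respondents."""
    total = sum(weight * freq for weight, freq in freq_by_weight.items())
    n = sum(freq_by_weight.values())
    return total / n


# Lower bound of each range in the interpretation table, highest first.
SCALE = [
    (4.20, "Very strongly agree"),
    (3.40, "Strongly agree"),
    (2.60, "Agree"),
    (1.80, "Disagree"),
    (1.00, "Strongly disagree"),
]


def describe(value):
    """Return the verbal description whose range contains the value."""
    for lower_bound, label in SCALE:
        if value >= lower_bound:
            return label
    raise ValueError("value is below the lowest range of the scale")


wm = weighted_mean(responses)
print(round(wm, 2), describe(wm))
```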
Exercise:
• Construct a Likert scale to interpret items of a questionnaire with weights 1 – 4.

Example:
• The following are the grades of one student in one summer term.
Subject    No. of Units    Grade
Characteristics of the Mean
- an interval statistic
- calculated average
- value is determined by every case in the distribution
- sum of the deviations about the mean is zero
  (–1)+(–2)+(–2)+1+4=0
[Figure: points A to E plotted on a number line from 3 to 9, with their deviations about the mean]
Median
- the value at which 1/2 of the ordered scores fall above and 1/2 of the scores fall below
- the value that lies in the middle after ranking all the scores
- positional measure
- the midpoint or the 50th percentile of a distribution

n = odd: 1 2 3 4 5, Median = 3
n = even: 1 2 3 4, Median = 2.5
Example:
5.40 1.10 0.42 0.73 0.48 1.10
0.42 0.48 0.73 1.10 1.10 5.40

Example
5.40 1.10 0.42 0.73 0.48 1.10 0.66
0.42 0.48 0.66 0.73 1.10 1.10 5.40
(in order - odd number of values)
exact middle: "I am the 4th observation. I am the median."
MEDIAN is 0.73

Characteristics of the Median
- not affected by extreme values
- can be subjected to a few mathematical computations
- less widely used than the mean
- represents a typical score
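The positional rule for the median (middle value for an odd count, average of the two middle values for an even count) can be sketched as below. The function name `median` and the two list names are illustrative; the numbers are the two data sets from the examples above.

```python
def median(values):
    """Middle value after ranking; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]   # 7 values
even_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]        # 6 values

print(median(odd_case))
print(median(even_case))
```

Note that the extreme value 5.40 has no effect on either result, which illustrates the first characteristic listed above.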
Exercise
• The following data represent the ages of the mothers of Paulinian Graders randomly selected from four different grade levels who attended a session on Counseling. What is the median of the ages of the mothers per grade level?

Mode
- the value which occurs most frequently in a given data set
- does not involve any calculation or ordering of data

Example
Consider the following data set:
1    5
2    7
3    3
4    8
5    7

Examples
b. 27 27 27 55 55 55 88 88 99
   Bimodal - 27 & 55
c. 1 2 3 6 7 8 9 10
   No Mode
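Finding the mode is a matter of counting occurrences. The sketch below, with the illustrative helper name `modes`, returns every value tied for the highest frequency and an empty list when no value repeats. Treating "all values occur once" as "no mode" follows example c; other tie conventions exist.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs exactly once
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))  # bimodal case (example b)
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))             # no mode (example c)
```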
Characteristics of the Mode
- an inspection average
- most frequently occurring value

Which is best?
Mode: Quick and easy to calculate. May not be representative of the whole sample.

When to use . . .
Mean - an interval interpretation is needed
     - the value of each score is desired
     - further statistical computation is expected

Measures of Central Tendency (GROUPED DATA)
The Mean
i.) Class mark method
$$\bar{X} = \frac{\sum f X_m}{n}$$
Where:
Xm – class mark / class midpoint
f – frequency
n – number of cases / observations

The Mean
ii.) Coded – deviation method
$$\bar{X} = AM + \left(\frac{\sum f d}{n}\right) i$$
Where:
AM – assumed mean (Xm of where the zero deviation is set)
f – frequency
d – deviation
n – number of cases / observations
i – class size/width

The Median
$$\text{Median} = X_{LB} + \left(\frac{\frac{n}{2} - cfp}{f}\right) i$$
Where:
XLB – lower boundary of the median class
cfp – cumulative frequency preceding the median class
f – frequency of the median class
n – number of cases

X        F
24–26    3
21–23    12
18–20    10
15–17    6
12–14    6
9–11     5
6–8      5
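A minimal sketch of the grouped-data mean and median, applied to the frequency table above. Several things here are assumptions rather than part of the slides: the class boundary is taken as the lower class limit minus 0.5 (reasonable for integer limits), the assumed mean AM = 16 is an arbitrary choice, and the median uses the usual interpolation formula, lower boundary + ((n/2 − cfp) / f) × i, which matches the symbols defined above.

```python
# (lower limit, upper limit, frequency), lowest class first.
classes = [(6, 8, 5), (9, 11, 5), (12, 14, 6), (15, 17, 6),
           (18, 20, 10), (21, 23, 12), (24, 26, 3)]
width = 3                                  # class size i
n = sum(f for _, _, f in classes)          # number of cases


def class_mark(lo, hi):
    """Midpoint of a class interval."""
    return (lo + hi) / 2


# i.) Class mark method: sum of f * Xm divided by n.
mean_class_mark = sum(f * class_mark(lo, hi) for lo, hi, f in classes) / n

# ii.) Coded-deviation method: d counts class widths away from the assumed mean.
assumed_mean = 16                          # class mark of 15-17 (arbitrary choice)
sum_fd = sum(f * (class_mark(lo, hi) - assumed_mean) / width
             for lo, hi, f in classes)
mean_coded = assumed_mean + (sum_fd / n) * width


def grouped_median(classes, width):
    """Interpolate inside the class that contains the (n/2)-th case."""
    total = sum(f for _, _, f in classes)
    cum = 0                                # cumulative frequency preceding the class
    for lo, _, f in classes:
        if cum + f >= total / 2:
            return (lo - 0.5) + ((total / 2 - cum) / f) * width
        cum += f


print(round(mean_class_mark, 2), round(mean_coded, 2),
      round(grouped_median(classes, width), 2))
```

Both mean methods give the same value by construction; the coded method only shortens hand computation.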
The Mode
$$Mo = X_{LB} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$$
Where:
XLB – lower boundary of the modal class
∆1 – difference between frequency of the modal class and frequency below it
∆2 – difference between frequency of the modal class and frequency above it
i – class size/width

Exercise
**Find the mean, median and mode of the following data set:
X        F
56–62    4
49–55    9
42–48    12
21–27    8
14–20    6
7–13     4
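The grouped mode formula can be sketched as follows. The data are the 6–8 to 24–26 frequency table from the grouped mean slide, "below" and "above" are read as the adjacent lower and higher classes, and the lower boundary is assumed to be the lower class limit minus 0.5. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(classes, width):
    """classes: (lower limit, upper limit, frequency), lowest class first."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                      # modal class (first if tied)
    below = freqs[k - 1] if k > 0 else 0             # frequency of the next lower class
    above = freqs[k + 1] if k + 1 < len(freqs) else 0  # frequency of the next higher class
    d1 = freqs[k] - below
    d2 = freqs[k] - above
    lower_boundary = classes[k][0] - 0.5
    return lower_boundary + d1 / (d1 + d2) * width


classes = [(6, 8, 5), (9, 11, 5), (12, 14, 6), (15, 17, 6),
           (18, 20, 10), (21, 23, 12), (24, 26, 3)]

mode_value = grouped_mode(classes, 3)
print(round(mode_value, 2))
```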
The Decile
$$D_k = X_{LB} + \left(\frac{\frac{kn}{10} - cfp}{f}\right) i$$
Where:
XLB – lower boundary of the decile class
cfp – cumulative frequency preceding the decile class
n – number of cases

The Percentile
$$P_k = X_{LB} + \left(\frac{\frac{kn}{100} - cfp}{f}\right) i$$
Where:
XLB – lower boundary of the percentile class
cfp – cumulative frequency preceding the percentile class
n – number of cases
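Deciles and percentiles share one interpolation rule and differ only in the divisor (10 or 100). The sketch below makes that explicit with a hypothetical helper `grouped_quantile`; the data are the 6–8 to 24–26 frequency table from the grouped mean slide, and the lower boundary is assumed to be the lower class limit minus 0.5.

```python
def grouped_quantile(classes, width, k, parts):
    """k-th of `parts` equal divisions (parts=10 for deciles, 100 for percentiles)."""
    n = sum(f for _, _, f in classes)
    target = k * n / parts                 # position kn/10 or kn/100
    cum = 0                                # cumulative frequency preceding the class
    for lo, _, f in classes:
        if f > 0 and cum + f >= target:
            return (lo - 0.5) + ((target - cum) / f) * width
        cum += f
    raise ValueError("k is out of range for this table")


def decile(classes, width, k):
    return grouped_quantile(classes, width, k, 10)


def percentile(classes, width, k):
    return grouped_quantile(classes, width, k, 100)


# (lower limit, upper limit, frequency), lowest class first.
classes = [(6, 8, 5), (9, 11, 5), (12, 14, 6), (15, 17, 6),
           (18, 20, 10), (21, 23, 12), (24, 26, 3)]

print(round(decile(classes, 3, 3), 2))
print(round(percentile(classes, 3, 50), 2))
```

The 5th decile and the 50th percentile coincide with the median, which is a quick sanity check on any hand computation.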
42–48    10
35–41    12
28–34    10
21–27    8
14–20    6
7–13     4

1. range
2. mean absolute deviation
3. semi – interquartile range/ quartile deviation
4. variance
5. standard deviation
Formulas (Ungrouped Data)
1. Range
$$R = HOV - LOV$$
2. Mean absolute deviation
$$MAD = \frac{\sum |X - \bar{X}|}{n}$$
3. Semi – interquartile range/ quartile deviation
$$QD = \frac{Q_3 - Q_1}{2}$$
4. Variance
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
5. Standard deviation
$$s = \sqrt{s^2}$$
Exercise:
• Given the following data, find the range, MAD, variance and the standard deviation.
20, 26, 40, 39, 35

Application:
• Two seemingly equally excellent students are vying for an academic honor, and only one can be chosen to get the award. The following are their grades, which are used as a basis for giving the award.
• Student A: 90, 92, 92, 94, 95
• Student B: 90, 91, 93, 94, 95
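The ungrouped formulas for range, MAD, variance and standard deviation translate directly into code. This is a sketch with illustrative function names, applied to the exercise data and to the two students' grades; the quartile deviation is left out because it depends on which quartile convention is used.

```python
from math import sqrt


def value_range(x):
    """Highest observed value minus lowest observed value."""
    return max(x) - min(x)


def mean_absolute_deviation(x):
    m = sum(x) / len(x)
    return sum(abs(v - m) for v in x) / len(x)


def sample_variance(x):
    """Sum of squared deviations divided by n - 1."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)


def sample_sd(x):
    return sqrt(sample_variance(x))


data = [20, 26, 40, 39, 35]
student_a = [90, 92, 92, 94, 95]
student_b = [90, 91, 93, 94, 95]

print(value_range(data), mean_absolute_deviation(data),
      sample_variance(data), round(sample_sd(data), 2))
print(round(sample_sd(student_a), 2), round(sample_sd(student_b), 2))
```

Both students have the same mean (92.6), so the comparison rests entirely on the spread of their grades.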
Guiding Principle
• The lesser the value of the measure, the more consistent, the more homogeneous and the less scattered are the observations in the set of data.

Formulas (Grouped Data)
1. Range
$$R = HOV - LOV$$
2. Mean absolute deviation
$$MAD = \frac{\sum f\,|X_m - \bar{X}|}{n}$$
3. Semi – interquartile range/ quartile deviation
$$QD = \frac{Q_3 - Q_1}{2}$$
4. Variance
$$s^2 = \frac{\sum f\,(X_m - \bar{X})^2}{n - 1}$$
5. Standard deviation
$$s = \sqrt{s^2}$$

2. MAD
4. variance
X        F
56–62    6
49–55    9
42–48    10
35–41    12
28–34    10
21–27    8
14–20    6
7–13     4
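For grouped data, each class mark stands in for the observations of its class, weighted by the class frequency. The following sketch applies the grouped MAD, variance and standard deviation to the table above; the variance is assumed to use the same n − 1 divisor as the ungrouped formula, and all variable names are illustrative.

```python
from math import sqrt

# (lower limit, upper limit, frequency), lowest class first.
classes = [(7, 13, 4), (14, 20, 6), (21, 27, 8), (28, 34, 10),
           (35, 41, 12), (42, 48, 10), (49, 55, 9), (56, 62, 6)]

n = sum(f for _, _, f in classes)
marks = [((lo + hi) / 2, f) for lo, hi, f in classes]   # (class mark, frequency)

mean = sum(f * xm for xm, f in marks) / n
mad = sum(f * abs(xm - mean) for xm, f in marks) / n
variance = sum(f * (xm - mean) ** 2 for xm, f in marks) / (n - 1)
sd = sqrt(variance)

print(n, round(mean, 2), round(mad, 2), round(variance, 2), round(sd, 2))
```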
Tests of Hypothesis

Hypothesis
• A statement or tentative theory which aims to explain facts about the real world
• An educated guess
• It is subject to testing. If it is found to be statistically true, it is accepted. Otherwise, it gets rejected.

Kinds of Hypotheses
1. Null Hypothesis (Ho)
• It serves as the working hypothesis
• It is that which one hopes to accept or reject
• It must always express the idea of no significant difference

2. Alternative Hypothesis (H1 or Ha)
• It generally represents the hypothetical statement that the researcher wants to prove.
Type I and Type II Errors
A Type I error is the mistake of rejecting the null hypothesis when it is true.
The symbol α (alpha) is used to represent the probability of a Type I error.
A Type II error is the mistake of failing to reject the null hypothesis when it is false.
The symbol β (beta) is used to represent the probability of a Type II error.
Level of Significance
The probability of making a Type I error or alpha error in a test is called the significance level of the test. The significance level of a test is the maximum value of the probability of rejecting the null hypothesis (Ho) when in fact it is true.

Critical Region
The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject the null hypothesis.
[Figure: sampling distribution showing the region of rejection and the region of acceptance]
Critical Value
A critical value is any value that separates the critical region (where we reject the null hypothesis) from the values of the test statistic that do not lead to rejection of the null hypothesis. The critical value depends on the sampling distribution that applies and the significance level α.

P - Value
The P-value (probability value) is the probability of getting a value of the test statistic that is at least as extreme as the one representing the sample data, assuming that the null hypothesis is true. The null hypothesis is rejected if the P-value is very small, such as 0.05 or less.
Right – tailed Tests
Given:
$H_0: \mu = \mu_0$ ; $H_1: \mu > \mu_0$

Left – tailed Tests
Given:
$H_0: \mu = \mu_0$ ; $H_1: \mu < \mu_0$
Steps in Hypothesis Testing
4. Determine the tabular value of the test.
***For a t – test, one must first compute the degree/s of freedom (df), then look for the tabular value from the table of Student's t – Distribution.
   i. For a single sample
      df = n – 1
   ii. For two samples
      df = n1 + n2 – 2

Steps in Hypothesis Testing
5. Compute for z or t as needed. Vary your solutions using the formulas:
For z – test
   i. Sample mean compared with a population mean
   ii. Comparing two sample means
   iii. Comparing two sample proportions
For t – test
   i. Sample mean compared with a population mean
   ii. Comparing two sample means
Steps in Hypothesis Testing
6. Compare the computed value with its corresponding tabular value, then state your conclusions based on the following guidelines:
Reject Ho if the absolute computed value is equal to or greater than the absolute tabular value
Accept Ho if the absolute computed value is less than the absolute tabular value

Decision Criterion
Traditional Method:
***Reject H0 (Accept H1) if the test statistic falls within the critical region.
***Fail to reject H0 (Accept Ho) if the test statistic does not fall within the critical region.
Decision Criterion
P - value method:
***Reject Ho (Accept H1) if P-value ≤ α (where α is the significance level, such as 0.05)
***Fail to reject H0 (Accept Ho) if P-value > α

Decision Criterion
Another option:
Instead of using a significance level such as 0.05, simply identify the P-value and leave the decision to the reader.
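The traditional method and the P-value method always lead to the same decision. The sketch below illustrates this for a right-tailed z test, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of a printed normal table. The function names and the sample z values 2.0 and 1.0 are illustrative, not taken from the lecture.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # mean 0, standard deviation 1


def right_tail_critical_value(alpha):
    """z value that cuts off an area of alpha in the right tail."""
    return STD_NORMAL.inv_cdf(1 - alpha)


def right_tail_p_value(z):
    """Probability of a test statistic at least as large as z when Ho is true."""
    return 1 - STD_NORMAL.cdf(z)


def decide(p_value, alpha):
    return "Reject Ho" if p_value <= alpha else "Fail to reject Ho"


alpha = 0.05
critical = right_tail_critical_value(alpha)
for z in (2.0, 1.0):
    p = right_tail_p_value(z)
    # Traditional method (z vs critical value) and P-value method agree.
    print(z, round(p, 4), decide(p, alpha), z >= critical)
```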
Z - TEST
1. Sample Mean ($\bar{X}$) Compared with a Population Mean (μ)
$$Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma}$$
Where:
$\bar{X}$ – sample mean
μ – population mean
n – number of items in the sample
σ – population standard deviation

Z - TEST
2. Comparing Two Sample Means ($\bar{X}_1$ & $\bar{X}_2$)
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Where:
$\bar{X}_1$ – mean of the first sample
$\bar{X}_2$ – mean of the second sample
n1 – number of items in the first sample
n2 – number of items in the second sample
σ – population standard deviation
Z - TEST
3. Comparing Two Sample Proportions (P1 & P2)
$$Z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$
Where:
p1 – proportion of the first sample
p2 – proportion of the second sample
q1, q2 – 1 – p1 and 1 – p2
n1 – number of items in the first sample
n2 – number of items in the second sample

T - TEST
4. Sample Mean ($\bar{X}$) Compared with a Population Mean (μ)
$$t = \frac{(\bar{X} - \mu)\sqrt{n - 1}}{s}$$
Where:
$\bar{X}$ – sample mean
μ – population mean
n – number of items in the sample
s – standard deviation of the sample
T - TEST
5. Comparing Two Sample Means ($\bar{X}_1$ & $\bar{X}_2$)
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$$
Where:
$\bar{X}_1$ – mean of the first sample
$\bar{X}_2$ – mean of the second sample
n1 – number of items in the first sample
n2 – number of items in the second sample
s1 – standard deviation of the first sample
s2 – standard deviation of the second sample

Example 1
Data from a school census show that the mean weight of college students is 45 kilos with a standard deviation of 3 kilos. A sample of 100 college students was found to have a mean of 47 kilos. Are the college students really heavier than the rest using the 0.05 level of significance?
Example 2
A researcher wishes to find out whether or not there is a significant difference in the monthly allowance of morning and afternoon students in his school. By random sampling, he took a sample of 239 students in the morning session. The students were found to have a mean monthly allowance of P142.00. The researcher also took a sample of 209 students in the afternoon session. They were found to have a mean monthly allowance of P148.00. The population of students in that school has a standard deviation of P40.00. Is there a significant difference between the two samples at 0.01 level?

Example 3
A sample survey of television programs in Metro Manila shows that 80 out of 200 men and 75 out of 250 women dislike the “May Bukas Pa” program. One would like to know whether the difference between the two sample proportions, 80/200 = 0.40 and 75/250 = 0.30, is significant or not at 0.05 level.
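As a hedged sketch, the three z formulas of the lecture can be applied to Examples 1 to 3 as follows. The function names are illustrative. The choice of tails is an assumption made here: Example 1 is treated as right-tailed ("heavier"), Examples 2 and 3 as two-tailed ("difference"). Critical values come from `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_one_sample(sample_mean, pop_mean, pop_sd, n):
    """Sample mean compared with a population mean."""
    return (sample_mean - pop_mean) * sqrt(n) / pop_sd


def z_two_means(mean1, mean2, pop_sd, n1, n2):
    """Two sample means with a common population standard deviation."""
    return (mean1 - mean2) / (pop_sd * sqrt(1 / n1 + 1 / n2))


def z_two_proportions(p1, p2, n1, n2):
    """Two sample proportions, with q = 1 - p."""
    return (p1 - p2) / sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


z1 = z_one_sample(47, 45, 3, 100)                    # Example 1
z2 = z_two_means(142, 148, 40, 239, 209)             # Example 2
z3 = z_two_proportions(80 / 200, 75 / 250, 200, 250) # Example 3

crit_right_05 = NormalDist().inv_cdf(0.95)    # right-tailed, alpha = 0.05
crit_two_01 = NormalDist().inv_cdf(0.995)     # two-tailed, alpha = 0.01
crit_two_05 = NormalDist().inv_cdf(0.975)     # two-tailed, alpha = 0.05

print("Example 1:", round(z1, 2), "reject Ho" if z1 >= crit_right_05 else "fail to reject Ho")
print("Example 2:", round(z2, 2), "reject Ho" if abs(z2) >= crit_two_01 else "fail to reject Ho")
print("Example 3:", round(z3, 2), "reject Ho" if abs(z3) >= crit_two_05 else "fail to reject Ho")
```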
Example 4
A researcher knows that the average height of Filipino women is 1.525 meters. A random sample of 26 women was taken and was found to have a mean height of 1.56 meters, with a standard deviation of 0.10 meters. Is there reason to believe that the 26 women are significantly taller than the rest using the 0.05 level of significance?

Example 5
Beta company is manufacturing steel wire with an average tensile strength of 50 kilos. The laboratory tests 16 pieces and finds that the mean is 47 kilos with a standard deviation of 15 kilos. Are the results in accordance with the hypothesis that the population mean is 50 kilos?
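A sketch of the one-sample t statistic for Examples 4 and 5, using the formula exactly as this lecture writes it, with the factor sqrt(n − 1). Many textbooks use sqrt(n) together with a standard deviation computed on n − 1, so the convention is worth checking against the course table. The tabular value 1.708 (one-tailed, α = 0.05, df = 25) is the standard Student's t table entry, entered by hand because the Python standard library has no t distribution. Example 5 states no significance level, so only its statistic is computed.

```python
from math import sqrt


def t_one_sample(sample_mean, pop_mean, s, n):
    """t as written in this lecture: (mean - mu) * sqrt(n - 1) / s."""
    return (sample_mean - pop_mean) * sqrt(n - 1) / s


# Example 4: heights of 26 women, right-tailed test at the 0.05 level.
t4 = t_one_sample(1.56, 1.525, 0.10, 26)
df4 = 26 - 1
TABULAR_T_DF25_ONE_TAIL_05 = 1.708   # looked up in a t table (assumption)

# Example 5: tensile strength of 16 pieces of wire.
t5 = t_one_sample(47, 50, 15, 16)
df5 = 16 - 1

print("Example 4:", round(t4, 3), "df =", df4,
      "reject Ho" if abs(t4) >= TABULAR_T_DF25_ONE_TAIL_05 else "accept Ho")
print("Example 5:", round(t5, 3), "df =", df5)
```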
Example 6
It is known from the records of the city schools that the standard deviation of math test scores on ABC test is 5. A sample of 200 students from the system was taken and it was found out that the sample mean is 75. Previous tests showed the population mean to be 70. Is it safe to conclude that the sample is significantly different from the population at 0.01 level?

Example 7
Two types of rice varieties are being considered for yield and a comparison is needed. Thirty hectares were planted with the rice varieties exposed to fairly uniform conditions. The results are tabulated below:
                   Variety A      Variety B
Average yield      80 sack/hec    85 sack/hec
Sample Variance    5.90           12.10
Is there a significant difference in the yield of the two varieties at 0.05 level of significance?
Example 8
A manufacturer of flashlight batteries claims that the average life of his product will exceed 40 hours. A company is willing to buy a very large shipment of batteries provided the claim is true. A random sample of 36 batteries is tested, and it was found out that the sample mean is 45 hours. If the population of batteries has a standard deviation of 5 hours, is it likely that the batteries will be bought?

Example 9
A company is trying to decide which of two brands of tires to buy for their trucks. They would like to adopt Brand C unless there is some evidence that Brand D is better. An experiment was conducted where 16 tires from each brand were used. The tires were run under uniform conditions until they wore out. The results are:
Brand C: $\bar{X}_1$ = 40,000 km    s1 = 5,400 km
Brand D: $\bar{X}_2$ = 38,000 km    s2 = 3,200 km
What conclusion can be drawn?
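The two-sample t statistic pools the two sample variances before dividing the difference of the means by its standard error. Below is a minimal sketch applied to the Brand C and Brand D figures of Example 9. The function name is illustrative, and no decision is printed because the example does not state a significance level.

```python
from math import sqrt


def t_two_samples(mean1, mean2, s1, s2, n1, n2):
    """Pooled-variance t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Example 9: 16 tires from each brand.
t9, df9 = t_two_samples(40_000, 38_000, 5_400, 3_200, 16, 16)
print(round(t9, 3), "with df =", df9)
```

The computed t is then compared with the tabular t value for df = 30 at whichever significance level is chosen.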
Example 10
All freshmen in a particular school were found to have a variability in grades expressed as a standard deviation of 3. Two samples among these freshmen, made up of 20 and 50 students each, were found to have means of 88 and 85, respectively. Based on their grades, is the first group really brighter than the second group using 0.01 level of significance?

Analysis of Variance (F - Test)
- A test that was developed by Ronald A. Fisher
- A technique in inferential statistics designed to test whether or not more than two samples (or groups) are significantly different from each other
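$$TSS = \sum x^2 - \frac{(\sum x)^2}{N} \qquad df_t = rk - 1 = N - 1$$
$$SSB = \frac{1}{r}\sum_{j}\Big(\sum_{i} x_{ij}\Big)^2 - \frac{(\sum x)^2}{N} \qquad df_b = k - 1$$
$$SSW = TSS - SSB \qquad df_w = df_t - df_b$$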
Analysis of Variance
3. Compute for the mean sum of squares
$$MSSB = \frac{SSB}{df_b}$$
$$MSSW = \frac{SSW}{df_w}$$

Contingency Table for ANOVA
Sources of Variation    Sum of Squares    Degree of Freedom (df)    Mean Sum of Squares    F – Ratio
Between Column          SSB               dfb                       MSSB
Exercise
1. The weights in kilograms of three groups of 5 members each are shown in the table below. Is there unusual variation among the groups? (use α = 0.05)
Members    Group
           A     B     C
1          50    60    53
2          48    40    55
3          55    50    40
4          50    60    40
5          46    52    47

Exercise
2. The following are the mileage obtained after several road tests were run using 5 different kinds of gasoline on a Toyota Car.
Road Test    Type of Gasoline
             A     B     C     D     E
1ST          35    61    38    65    56
2ND          31    63    54    60    69
3RD          42    50    47    57    70
4TH          48    42    60    55    50
5TH          40    49    55    60    48
Is there a significant difference among the mileage yields, at 1% level?
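The sums of squares of a one-way ANOVA can be sketched in a few lines, here applied to the weights of groups A, B and C in Exercise 1. The symbols follow the lecture (TSS, SSB, SSW, dfb, dfw, MSSB, MSSW), with r members per group and k groups. The F ratio is taken as MSSB / MSSW, the usual definition, which is an assumption here since the slide's F column did not survive. The function name is illustrative.

```python
groups = {
    "A": [50, 48, 55, 50, 46],
    "B": [60, 40, 50, 60, 52],
    "C": [53, 55, 40, 40, 47],
}


def one_way_anova(columns):
    """columns: one equal-sized list of observations per group."""
    k = len(columns)              # number of groups (columns)
    r = len(columns[0])           # members per group (rows)
    N = r * k
    all_values = [x for col in columns for x in col]
    correction = sum(all_values) ** 2 / N

    tss = sum(x * x for x in all_values) - correction
    ssb = sum(sum(col) ** 2 for col in columns) / r - correction
    ssw = tss - ssb

    dfb = k - 1
    dfw = (N - 1) - dfb
    mssb = ssb / dfb
    mssw = ssw / dfw
    return {"TSS": tss, "SSB": ssb, "SSW": ssw, "dfb": dfb, "dfw": dfw,
            "MSSB": mssb, "MSSW": mssw, "F": mssb / mssw}


result = one_way_anova(list(groups.values()))
for name, value in result.items():
    print(name, round(value, 4))
```

The computed F is compared with the tabular F value for (dfb, dfw) degrees of freedom at the chosen significance level.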
Exercise
3. Below are the bowling scores of four groups of four members each. At 5% significance level, find out if there is unusual variation among the groups.
Members    Group
           A      B      C      D
1          98     100    87     90
2          78     95     92     93
3          95     90     105    95
4          110    85     88     97

Chi – Square Test (χ²)
- Used to test significant difference or relationship
- Used if data are in frequencies (enumeration data)
USES:
1. to test the goodness of fit of a normal curve; that is, to find out whether or not a sample distribution conforms with the hypothetical normal distribution
2. to find out whether or not an observed proportion is equal to some given ideal or expected proportion
3. to test the independence of one variable from another variable.
Formulas:
i. For a 2 x 2 table (with Yates' correction for continuity)

Exercise
1. Test the hypothesis that educational attainment does not depend on socio – economic status for the following 100 persons in a particular community.
Exercise
2. At 1% significance level, does college academic grade depend on the high school NSAT results for the following 200 students?
Academic Grade    NSAT Rating
                  Low    Average    High
Above 85          13     25         21
75–85             18     31         38
Below 75          14     20         20

Exercise
3. At ABC Company, there are 28 males and 32 females. Out of the 28 males, 10 hold executive posts and the others do clerical work. Of the 32 females, only 5 hold executive positions and the others do clerical work. Prepare a contingency table, then test the hypothesis that position is independent of sex.
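A sketch of the chi-square test of independence for the NSAT table in Exercise 2. It uses the standard statistic, the sum of (observed − expected)² / expected with expected = row total × column total / N, which is assumed here because the general formula is not shown in the slides. The tabular value 13.277 (df = 4, α = 0.01) is the standard chi-square table entry, typed in by hand; the function name is illustrative.

```python
# Rows: Above 85, 75-85, Below 75. Columns: Low, Average, High NSAT rating.
observed = [
    [13, 25, 21],
    [18, 31, 38],
    [14, 20, 20],
]


def chi_square_independence(table):
    """Return the chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_square_independence(observed)
TABULAR_CHI2_DF4_ALPHA_01 = 13.277   # from a chi-square table (assumption)

print(round(chi2, 3), "df =", df,
      "reject Ho" if chi2 >= TABULAR_CHI2_DF4_ALPHA_01 else "accept Ho")
```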
Exercise
4. To determine whether type of personality is related to academic performance, a random sample of 180 high school students from a certain college was taken and the data are as follows:
             Low Average    Average    High Average
Introvert    35             30         25
Extrovert    31             23         36

Correlation and Regression Analysis
Regression Analysis
- concerned with the problem of estimation and forecasting
FORMULA:
$$y = a + bx$$
Where:
y – predicted score
a – y – intercept
b – slope of the line

Regression Analysis
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
$$a = \bar{Y} - b\bar{X}$$
Where:
$\bar{Y}$ – mean of the y values
$\bar{X}$ – mean of the x values
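The slope and intercept formulas can be sketched as below. The five (x, y) pairs are hypothetical numbers chosen only to illustrate the arithmetic; they do not come from the lecture, and the function name `fit_line` is likewise illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for y = a + bx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)      # mean of y minus b times mean of x
    return a, b


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
predicted = a + b * 6                    # predicted score for x = 6
print(round(a, 2), round(b, 2), round(predicted, 2))
```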
Interpretation:
Pearson r          Qualitative Description
±1                 Perfect Correlation
±0.91 – ±0.99      Very High
±0.71 – ±0.90      High
0 – ±0.20          Negligible

Testing the Significance of r
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
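The significance of a correlation coefficient is usually tested with t = r·sqrt((n − 2) / (1 − r²)) on n − 2 degrees of freedom. A small sketch follows; the values r = 0.8 and n = 10 are hypothetical and serve only as an illustration.

```python
from math import sqrt


def t_for_r(r, n):
    """t statistic for testing whether a Pearson r differs from zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Hypothetical values, for illustration only.
r, n = 0.8, 10
t_value = t_for_r(r, n)
print(round(t_value, 3), "with df =", n - 2)
```

The result is compared with the tabular t value for n − 2 degrees of freedom.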
Exercise
1. It is generally known that the number of road accidents is inversely proportional to road width. The following data show the result of a study indicating the number of accidents occurring per hundred thousand vehicles.

Exercise
2. The following table shows the final grades of ten students in Algebra and Statistics.
Pilar B. Acorda
Email Address : pbacorda@yahoo.com
Mobile Number: 09359547319