Statistics Lecture

St. Paul University Philippines
Graduate School
A Course Presentation in Statistics

Course Content
• Basic Concepts in Statistics
• Measures of Central Tendency
• Measures of Variability
• Correlation and Regression Analysis
• Test of Hypothesis
– Z – Test
– T – Test
– Chi – Square Test
– Analysis of Variance (ANOVA)
• Exploring the SPSS

Course Requirements
• Reaction Paper / Film Clip Analysis
• Problem Set
• Final Examination

Reaction Paper (Film Clip Analysis)
Lies, Damned Lies and Statistics: The Misapplication of Statistics in Everyday Life
Statistics defined . . .
• STATISTICS is a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting and drawing conclusions based on the data.

Main Divisions
• Descriptive Statistics
- summarize or describe the important characteristics of a known set of population data
• Inferential Statistics
- use sample data to make inferences (or generalizations) about a population
Population vs. Sample
• A POPULATION is the complete collection of elements (scores, people, measurements, and so on)
• A SAMPLE is a portion / subset of elements drawn from a population

Parameter vs. Statistic
• A PARAMETER is a numerical measurement describing some characteristic of a population
• A STATISTIC is a numerical measurement describing some characteristic of a sample
Qualitative vs. Quantitative Data
• Qualitative (categorical or attribute) data can be separated into different categories that are distinguished by some non-numerical characteristic
• Quantitative data consist of numbers representing counts or measurements

Discrete vs. Continuous Data
• Discrete data result from either a finite number of possible values or a countable number of possible values (that is, the number of possible values is 0, 1, 2, or more)
• Continuous data result from infinitely many possible values that can be associated with points on a continuous scale in such a way that there are no gaps or interruptions
Dependent vs. Independent Variable
• Dependent variable – the variable that is being affected
- the variable that is being explained
• Independent variable – the variable that affects
- the variable that explains

Nominal Level of Measurement
• The nominal level of measurement is characterized by data that consist of names, labels or categories only. The data cannot be arranged in an ordering scheme.
• Examples:
gender of employees, civil status, nationality, religion, etc.
Ordinal Level of Measurement
• The ordinal level of measurement involves data that may be arranged in some order, but differences between data values are either meaningless or cannot be determined.
• Examples:
good, better or best speakers; 1 star, 2 star or 3 star movie; rank of an employee

Interval Level of Measurement
• The interval level of measurement is like the ordinal level, with the additional property that meaningful amounts of differences between data can be determined. However, there is no inherent (natural) zero starting point.
• Examples:
body temperature, year (2007, 2008, 2013, etc.)
Ratio Level of Measurement
• The ratio level of measurement is the interval level modified to include the inherent zero starting point. For values at this level, differences and ratios are meaningful.
• Examples:
weights, lengths, distance traveled

Visual Summary of the Scales of Measurement
[Flowchart with YES/NO branches. Questions asked in order: "Are there named categories?", "Are the scores ranked?", "Are there equal intervals with a meaningful zero point?". Outcomes: nominal, ordinal, ratio and interval scales of measurement.]
Measures of Central Tendency (UNGROUPED DATA)
Mean, Median, Mode

The Mean
• Two Forms
– Simple mean
– Weighted mean
The mean takes the symbol $\bar{X}$.

The Mean
• the "balancing point" of a set of scores
• the "average score"

Arithmetic Mean (Mean)
If you have a population:
• Total number of cases is N
• Sum of the scores is ΣX
• Compute the mean of the population: $\mu = \frac{\sum X}{N}$
If you have a sample:
• Total number of cases is n
• Sum of the scores is ΣX
• Compute the mean of the sample: $\bar{X} = \frac{\sum X}{n}$
Simple Arithmetic Mean
$\bar{X} = \frac{\sum X}{n}$
Where:
X = an individual score
n = the number of scores/cases
ΣX (sigma X) = sum of the individual score values

Example:
Consider the following data set:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Solution:
$\bar{X} = \frac{\sum X}{n} = \frac{1+2+3+4+5+6+7+8+9+10}{10} = \frac{55}{10}$
Mean = 5.5
Example:
• The following data represent the ages of the mothers of Paulinian Graders randomly selected from four different grade levels who attended a session on Counseling. What is the mean age of the mothers per grade level?
• Grade 1: 35, 37, 45, 54, 39, 48
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47

Solution:
• To obtain the mean age of the mothers of the Grade 1 pupils, we have
$\bar{X} = \frac{35+37+45+54+39+48}{6} = \frac{258}{6}$
$\bar{X} = 43$
**This means that the mothers of the Grade 1 pupils are relatively young.
Example:
• Find the mean of the other grade levels. Round off your answers to the nearest hundredth.
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47

Answers:
• Grade 2: ANSWER: 53.73
• Grade 3: ANSWER: 50
• Grade 4: ANSWER: 52.44
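
The means above can be checked with a short script. This is a minimal sketch in Python; the helper name simple_mean and the dictionary layout are illustrative choices, not part of the lecture.

```python
# Ages of the mothers per grade level, copied from the example above.
ages = {
    "Grade 1": [35, 37, 45, 54, 39, 48],
    "Grade 2": [54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63],
    "Grade 3": [56, 48, 39, 48, 55, 57, 41, 56],
    "Grade 4": [53, 47, 49, 59, 60, 45, 53, 59, 47],
}

def simple_mean(scores):
    """Sum of the scores divided by the number of cases."""
    return sum(scores) / len(scores)

# Round to the nearest hundredth, as the example asks.
means = {grade: round(simple_mean(xs), 2) for grade, xs in ages.items()}

for grade, value in means.items():
    print(grade, value)
```

The rounded values agree with the answers listed on the slide (43, 53.73, 50 and 52.44).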
Weighted Mean
$\bar{X}_w = \frac{w_1X_1 + w_2X_2 + w_3X_3 + \dots + w_nX_n}{\text{total number of weights}}$
Where:
w = weight per item value
X = individual score values

Example:
• The following are the responses of 30 randomly chosen respondents in one item of a research questionnaire.
Verbal Description      Weight   No. of Responses
Very strongly agree     5        7
Strongly agree          4        11
Agree                   3        9
Disagree                2        2
Strongly disagree       1        1
* Find the weighted response of the respondents and interpret the result.
Solution:
• To obtain the weighted response, we have
$\bar{X}_w = \frac{5(7) + 4(11) + 3(9) + 2(2) + 1(1)}{30} = \frac{111}{30}$
$\bar{X}_w = 3.70$, which is interpreted as "strongly agree"

Interpretation of Values
Range          Verbal Description
4.20 – 5.00    Very strongly agree
3.40 – 4.19    Strongly agree
2.60 – 3.39    Agree
1.80 – 2.59    Disagree
1.00 – 1.79    Strongly disagree
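
The weighted response and its verbal interpretation can be sketched as follows. The function names weighted_mean and interpret are hypothetical, and the lookup assumes the interpretation ranges shown in the table above.

```python
# (weight, number of responses) pairs from the questionnaire item above.
responses = [(5, 7), (4, 11), (3, 9), (2, 2), (1, 1)]

# Lower bound of each interpretation range and its verbal description.
scale = [
    (4.20, "Very strongly agree"),
    (3.40, "Strongly agree"),
    (2.60, "Agree"),
    (1.80, "Disagree"),
    (1.00, "Strongly disagree"),
]

def weighted_mean(pairs):
    """Sum of weight * count divided by the total number of responses."""
    total = sum(weight * count for weight, count in pairs)
    n = sum(count for _, count in pairs)
    return total / n

def interpret(value):
    """Return the description of the first range whose lower bound is met."""
    for lower, description in scale:
        if value >= lower:
            return description
    raise ValueError("value is below the lowest range")

xw = weighted_mean(responses)
print(round(xw, 2), interpret(xw))
```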
Exercise:
• Construct a Likert scale to interpret items of a questionnaire with weights 1 – 4.
• Assume the following descriptions were used:
4 – always
3 – sometimes
2 – seldom
1 – never

Example:
• The following are the grades of one student in one summer term.
Subject       No. of Units   Grade
Statistics    3              98
PE            2              90
Chemistry     5              93
* Find the weighted average of the student.
* What would the student's average have been if all his subjects had equal weights?
Characteristics of the Mean
• an interval statistic
• calculated average
• value is determined by every case in the distribution
• affected by extreme values
• most widely used
• most sensitive measure
• sum of the deviations about the mean is zero
[Figure: cases A to E plotted on a number line from 3 to 9, with deviations from the mean of (–1), (–2), (–2), (+1) and (+4).]
(–1)+(–2)+(–2)+1+4=0
Median
• the value at which 1/2 of the ordered scores fall above and 1/2 of the scores fall below
• the value that lies in the middle after ranking all the scores
• positional measure
• the midpoint or the 50th percentile of a distribution

n = odd: 1 2 3 4 5, Median = 3
n = even: 1 2 3 4, Median = 2.5
Example
5.40 1.10 0.42 0.73 0.48 1.10 0.66
0.42 0.48 0.66 0.73 1.10 1.10 5.40
(in order - odd number of values)
exact middle: 0.73 is the 4th observation, so it is the median
MEDIAN is 0.73

Example:
5.40 1.10 0.42 0.73 0.48 1.10
0.42 0.48 0.73 1.10 1.10 5.40
(in order - even number of values – no exact middle; the middle is shared by two numbers)
$\frac{0.73 + 1.10}{2}$
MEDIAN is 0.915

Characteristics of the Median
• an ordinal statistic
• rank or position average
• not affected by extreme values
• can be subjected to a few mathematical computations
• less widely used than the mean
• represents a typical score
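
Both median cases (odd and even number of values) can be sketched in a few lines. The function below is an illustrative helper written for this note, not a routine from the lecture.

```python
def median(scores):
    """Middle value after ranking; average of the two middle values if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # exact middle
    return (ordered[middle - 1] + ordered[middle]) / 2  # shared by two numbers

odd_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]
even_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]

print(median(odd_case))             # the 4th ranked observation
print(round(median(even_case), 3))  # mean of the 3rd and 4th ranked observations
```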
Exercise
• The following data represent the ages of the mothers of Paulinian Graders randomly selected from four different grade levels who attended a session on Counseling. What is the median of the ages of the mothers per grade level?
• Grade 1: 35, 37, 45, 54, 39, 48
• Grade 2: 54, 63, 47, 63, 45, 53, 52, 48, 55, 48, 63
• Grade 3: 56, 48, 39, 48, 55, 57, 41, 56
• Grade 4: 53, 47, 49, 59, 60, 45, 53, 59, 47

Mode
• the value which occurs most frequently in a given data set
• does not involve any calculation or ordering of data
Example
Consider the following data set:
Observation   Value/Score
1             5
2             7
3             3
4             8
5             7

Examples
a. 5.40 1.10 0.42 0.73 0.48 1.10
Mode is 1.10
b. 27 27 27 55 55 55 88 88 99
Bimodal - 27 & 55
c. 1 2 3 6 7 8 9 10
No Mode
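
The three mode cases above (one mode, bimodal, no mode) can be reproduced with a frequency count. This is a sketch under one stated assumption: a data set in which every value occurs exactly once is reported as having no mode, matching example c.

```python
from collections import Counter

def modes(scores):
    """Return every most-frequent value; an empty list means no mode."""
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)

example_a = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
example_b = [27, 27, 27, 55, 55, 55, 88, 88, 99]
example_c = [1, 2, 3, 6, 7, 8, 9, 10]

print(modes(example_a), modes(example_b), modes(example_c))
```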
Characteristics of the Mode
• a nominal statistic
• an inspection average
• most frequently occurring value
• cannot be manipulated mathematically
• rarely used
• most "popular" score

Which is best?
Mode
Advantages: Quick and easy to calculate.
Disadvantages: May not be representative of the whole sample.
Median
Advantages: Fairly easy to calculate. Half of the scores lie above the median.
Disadvantages: Tedious to find for a large set of numbers or for a set that is not in order.
Mean
Advantages: Takes all numbers into account.
Disadvantages: Can be affected by outliers.
When to use . . .
Mean
- an interval interpretation is needed
- the value of each score is desired
- further statistical computation is expected
Median
- an ordinal interpretation is needed
- the middle score is desired
- avoidance of the influence of extreme values is needed
Mode
- a nominal interpretation is needed
- a quick approximation of a central tendency measure is desired
- most frequently occurring score is needed

Measures of Central Tendency (GROUPED DATA)
Mean, Median, Mode
The Mean
i.) Classmark method
$\bar{X} = \frac{\sum f X_m}{n}$
Where:
Xm – class mark / class midpoint
f – frequency
n – number of cases / observations

The Mean
ii.) Coded – deviation method
$\bar{X} = AM + \left(\frac{\sum fd}{n}\right) i$
Where:
AM – assumed mean (Xm of the class where the zero deviation is set)
f – frequency
d – deviation
n – number of cases / observations
i – class size/width
Example
**Find the mean, median and mode of the following data set:
X          F
24 – 26    3
21 – 23    12
18 – 20    10
15 – 17    6
12 – 14    6
9 – 11     5
6 – 8      5
3 – 5      3

The Median
$Md = X_{LB} + \left(\frac{\frac{n}{2} - cf_p}{f}\right) i$
Where:
XLB – lower boundary of the median class
cfp – cumulative frequency preceding the median class
n – number of cases
f – frequency of the median class
i – class size/width
The Mode
$Mo = X_{LB} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$
Where:
XLB – lower boundary of the modal class
∆1 – difference between frequency of the modal class and frequency below it
∆2 – difference between frequency of the modal class and frequency above it
i – class size/width

Exercise
**Find the mean, median and mode of the following data set:
X        F
56–62    4
49–55    9
42–48    12
35–41    12
28–34    10
21–27    8
14–20    6
7–13     4
```
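
The grouped-data formulas for the mean, median and mode can be sketched on the Example data set given with the median formula (classes 3–5 up to 24–26). Several choices here are assumptions: the lower boundary is taken as the lower class limit minus 0.5, the median class is the first class whose cumulative frequency reaches n/2, and "the frequency below" the modal class is read as the frequency of the next lower class.

```python
# (lower limit, upper limit, frequency), from the lowest class upward.
classes = [
    (3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
    (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3),
]
n = sum(f for _, _, f in classes)    # number of cases
i = classes[1][0] - classes[0][0]    # class size/width

def grouped_mean():
    # Classmark method: sum of f * Xm divided by n.
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median():
    cfp = 0  # cumulative frequency preceding the current class
    for lo, _, f in classes:
        if cfp + f >= n / 2:         # first class reaching n/2
            return (lo - 0.5) + (n / 2 - cfp) / f * i
        cfp += f

def grouped_mode():
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))      # modal class (first one if tied)
    lower = freqs[m - 1] if m > 0 else 0
    higher = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = freqs[m] - lower, freqs[m] - higher
    return (classes[m][0] - 0.5) + d1 / (d1 + d2) * i

print(grouped_mean(), grouped_median(), round(grouped_mode(), 2))
```

The same three functions can be pointed at the Exercise table by replacing the classes list.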
Other Measures of Position (QUANTILES)
1. Quartile (Qk) – divides the distribution into 4 equal parts
2. Decile (Dk) – divides the distribution into 10 equal parts
3. Percentile (Pk) – divides the distribution into 100 equal parts

The Quartile
$Q_k = X_{LB} + \left(\frac{\frac{kn}{4} - cf_p}{f}\right) i$
Where:
XLB – lower boundary of the quartile class
cfp – cumulative frequency preceding the quartile class
n – number of cases
f – frequency of the quartile class
i – class size/width
The Decile
$D_k = X_{LB} + \left(\frac{\frac{kn}{10} - cf_p}{f}\right) i$
Where:
XLB – lower boundary of the decile class
cfp – cumulative frequency preceding the decile class
n – number of cases
f – frequency of the decile class
i – class size/width

The Percentile
$P_k = X_{LB} + \left(\frac{\frac{kn}{100} - cf_p}{f}\right) i$
Where:
XLB – lower boundary of the percentile class
cfp – cumulative frequency preceding the percentile class
n – number of cases
f – frequency of the percentile class
i – class size/width
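
The quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), so one function can cover all three. The sketch below uses the earlier Example data set (classes 3–5 up to 24–26) and assumes that lower boundaries are the lower class limits minus 0.5; grouped_quantile is a hypothetical name.

```python
# (lower limit, upper limit, frequency), from the lowest class upward.
classes = [
    (3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
    (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3),
]

def grouped_quantile(classes, k, parts):
    """k-th quantile when the distribution is divided into `parts` equal parts."""
    n = sum(f for _, _, f in classes)
    width = classes[1][0] - classes[0][0]
    target = k * n / parts           # kn/4, kn/10 or kn/100
    cfp = 0                          # cumulative frequency preceding the class
    for lo, _, f in classes:
        if cfp + f >= target:        # first class reaching the target count
            return (lo - 0.5) + (target - cfp) / f * width
        cfp += f
    raise ValueError("k must not exceed parts")

q1 = grouped_quantile(classes, 1, 4)
d6 = grouped_quantile(classes, 6, 10)
p78 = grouped_quantile(classes, 78, 100)
print(q1, d6, p78)
```

Calling grouped_quantile(classes, 2, 4) reproduces the median formula, since the second quartile is the 50th percentile.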
Exercise
**Using the frequency distribution below, find:
1. Q1
2. D6
3. D3
4. P78
5. P3
X        F
56–62    6
49–55    9
42–48    10
35–41    12
28–34    10
21–27    8
14–20    6
7–13     4

Measures of Variability
• The statistical tool used to describe the degree to which scores/observations are scattered.
• It is used to determine the degree of consistency / homogeneity of scores.
1. range
2. mean absolute deviation
3. semi – interquartile range / quartile deviation
4. variance
5. standard deviation
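Formulas (Ungrouped Data)
1. Range
$R = HOV - LOV$ (highest observed value minus lowest observed value)
2. Mean absolute deviation
$MAD = \frac{\sum |X - \bar{X}|}{n}$
3. Semi – interquartile range / quartile deviation
$QD = \frac{Q_3 - Q_1}{2}$

Formulas (Ungrouped Data)
4. Variance
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
5. Standard deviation
$s = \sqrt{s^2}$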
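
A minimal sketch of the ungrouped formulas, applied to the Grade 1 mothers' ages from the earlier mean example (mean 43). The quartile deviation is left out because the lecture does not fix a rule for locating quartiles in ungrouped data.

```python
from math import sqrt

scores = [35, 37, 45, 54, 39, 48]   # Grade 1 ages from the earlier example
n = len(scores)
mean = sum(scores) / n

value_range = max(scores) - min(scores)                    # HOV - LOV
mad = sum(abs(x - mean) for x in scores) / n               # mean absolute deviation
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)  # sample variance
std_dev = sqrt(variance)

print(value_range, mad, variance, round(std_dev, 2))
```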
Exercise:
• Given the following data, find the range, MAD, variance and the standard deviation.
20, 26, 40, 39, 35

Application:
• Two seemingly equally excellent students are vying for an academic honor where only one can be chosen to get the award. The following are their grades, which are used as a basis for giving the award.
• Student A: 90, 92, 92, 94, 95
• Student B: 90, 91, 93, 94, 95
• Who do you think deserves the award? Why?
Guiding Principle
• The lesser the value of the measure, the more consistent, the more homogeneous and the less scattered are the observations in the set of data.

Formulas (Grouped Data)
1. Range
$R = HOV - LOV$
2. Mean absolute deviation
$MAD = \frac{\sum f\,|X_m - \bar{X}|}{n}$
3. Semi – interquartile range / quartile deviation
$QD = \frac{Q_3 - Q_1}{2}$
Formulas (Grouped Data)
4. Variance
$s^2 = \frac{\sum f (X_m - \bar{X})^2}{n - 1}$
5. Standard deviation
$s = \sqrt{s^2}$

Exercise:
**Using the frequency distribution below, find:
1. Range
2. MAD
3. QD
4. variance
5. Standard Deviation
X        F
56–62    6
49–55    9
42–48    10
35–41    12
28–34    10
21–27    8
14–20    6
7–13     4
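
The grouped MAD, variance and standard deviation can be sketched on the earlier Example data set (classes 3–5 up to 24–26), using class marks in place of individual scores. The range and quartile deviation are omitted to keep the sketch short.

```python
from math import sqrt

# (lower limit, upper limit, frequency), from the lowest class upward.
classes = [
    (3, 5, 3), (6, 8, 5), (9, 11, 5), (12, 14, 6),
    (15, 17, 6), (18, 20, 10), (21, 23, 12), (24, 26, 3),
]
n = sum(f for _, _, f in classes)

# Pair each class mark Xm with its frequency f.
marks = [((lo + hi) / 2, f) for lo, hi, f in classes]
mean = sum(f * xm for xm, f in marks) / n

mad = sum(f * abs(xm - mean) for xm, f in marks) / n
variance = sum(f * (xm - mean) ** 2 for xm, f in marks) / (n - 1)
std_dev = sqrt(variance)

print(mean, mad, round(variance, 2), round(std_dev, 2))
```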
Tests of Hypothesis
Hypothesis
• A statement or tentative theory which aims to explain facts about the real world
• An educated guess
• It is subject to testing. If it is found to be statistically true, it is accepted. Otherwise, it gets rejected.

Kinds of Hypotheses
1. Null Hypothesis (Ho)
• It serves as the working hypothesis
• It is that which one hopes to accept or reject
• It must always express the idea of no significant difference
2. Alternative Hypothesis (H1 or Ha)
• It generally represents the hypothetical statement that the researcher wants to prove.
Types of Alternative Hypotheses (Ha)
1. Directional hypothesis
• expresses direction
• one – tailed
• uses order relation of "greater than" or "less than"
2. Non – directional hypothesis
• does not express direction
• two – tailed
• uses the "not equal to"

Type I and Type II Errors
• When making a decision about a proposed hypothesis based on the sample data, one runs the risk of making an error. The next slide summarizes the possibilities:
Type I and Type II Errors
• A Type I error is the mistake of rejecting the null hypothesis when it is true.
• The symbol α (alpha) is used to represent the probability of a type I error.
• A Type II error is the mistake of failing to reject the null hypothesis when it is false.
• The symbol β (beta) is used to represent the probability of a type II error.
Level of Significance
The probability of making a Type I error or alpha error in a test is called the significance level of the test. The significance level of a test is the maximum value of the probability of rejecting the null hypothesis (Ho) when in fact it is true.

Critical Region
The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject the null hypothesis.
[Figure: distribution curve with the region of acceptance, the region of rejection, the critical value and the P-value marked.]
Critical Value
A critical value is any value that separates the critical region (where we reject the null hypothesis) from the values of the test statistic that do not lead to rejection of the null hypothesis. A critical value depends on the sampling distribution that applies and the significance level α.

P - Value
The P-value (probability value) is the probability of getting a value of the test statistic that is at least as extreme as the one representing the sample data, assuming that the null hypothesis is true. The null hypothesis is rejected if the P-value is very small, such as 0.05 or less.
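Two-tailed, Right-tailed and Left-tailed Tests
• The tails in a distribution are the extreme regions bounded by critical values.

Two-tailed Tests
Given:
H0: parameter = claimed value; H1: parameter ≠ claimed value

Right – tailed Tests
Given:
H0: parameter = claimed value; H1: parameter > claimed value

Left – tailed Tests
Given:
H0: parameter = claimed value; H1: parameter < claimed value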
Steps in Hypothesis Testing
1. Formulate the null hypothesis (Ho) that there is no significant difference between the items compared. State the alternative hypothesis (Ha), which is used in case Ho is rejected.
2. Set the level of significance of the test, α.
3. Determine the test to be used.
• Z – TEST – used if the population standard deviation is given
• T – TEST – used if the sample standard deviation is given

Steps in Hypothesis Testing
4. Determine the tabular value of the test.
***For a Z – test, the table below summarizes the critical values at varying significance levels.
Type of Test    Level of Significance
                0.10      0.05      0.025     0.01
One – Tailed    ±1.28     ±1.645    ±1.96     ±2.33
Two – Tailed    ±1.645    ±1.96     ±2.24     ±2.58
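
The tabular z values come from the inverse of the standard normal distribution: a one-tailed test places all of α in one tail, while a two-tailed test places α/2 in each tail. A minimal sketch using Python's standard library follows; z_critical is a hypothetical helper name.

```python
from statistics import NormalDist

def z_critical(alpha, tails):
    """Critical z for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    return NormalDist().inv_cdf(1 - alpha / tails)

# Print one-tailed and two-tailed critical values for each significance level.
for alpha in (0.10, 0.05, 0.025, 0.01):
    print(alpha, round(z_critical(alpha, 1), 3), round(z_critical(alpha, 2), 3))
```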
Steps in Hypothesis Testing
4. Determine the tabular value of the test.
***For a T – test, one must first compute the degree/s of freedom (df), then look for the tabular value from the table of Student's T – Distribution.
i. For a single sample
df = n – 1
ii. For two samples
df = n1 + n2 – 2

Steps in Hypothesis Testing
5. Compute for z or t as needed. Vary your solutions using the formulas:
For z – test
i. Sample mean compared with a population mean
ii. Comparing two sample means
iii. Comparing two sample proportions
For t – test
i. Sample mean compared with a population mean
ii. Comparing two sample means
Steps in Hypothesis Testing
6. Compare the computed value with its corresponding tabular value, then state your conclusions based on the following guidelines:
• Reject Ho if the absolute computed value is equal to or greater than the absolute tabular value
• Accept Ho if the absolute computed value is less than the absolute tabular value

Decision Criterion
Traditional Method:
***Reject H0 (Accept H1) if the test statistic falls within the critical region.
***Fail to reject H0 (Accept Ho) if the test statistic does not fall within the critical region.
Decision Criterion
P - value method:
***Reject Ho (Accept H1) if P-value ≤ α (where α is the significance level, such as 0.05)
***Fail to reject H0 (Accept Ho) if P-value > α

Decision Criterion
Another option:
Instead of using a significance level such as 0.05, simply identify the P-value and leave the decision to the reader.
Z - TEST
1. Sample Mean ($\bar{X}$) Compared with a Population Mean (μ)
$Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma}$
Where:
$\bar{X}$ – sample mean
μ – population mean
n – number of items in the sample
σ – population standard deviation

Z - TEST
2. Comparing Two Sample Means ($\bar{X}_1$ & $\bar{X}_2$)
$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Where:
$\bar{X}_1$ – mean of the first sample
$\bar{X}_2$ – mean of the second sample
n1 – number of items in the first sample
n2 – number of items in the second sample
σ – population standard deviation
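Z - TEST
3. Comparing Two Sample Proportions (p1 & p2)
$Z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$
Where:
p1 – proportion of the first sample
p2 – proportion of the second sample
n1 – number of items in the first sample
n2 – number of items in the second sample
q1 = 1 – p1
q2 = 1 – p2

T - TEST
4. Sample Mean ($\bar{X}$) Compared with a Population Mean (μ)
$t = \frac{(\bar{X} - \mu)\sqrt{n - 1}}{s}$
Where:
$\bar{X}$ – sample mean
μ – population mean
n – number of items in the sample
s – sample standard deviation
This form assumes s was computed with n as the divisor; when s uses the divisor n – 1, as in the variance formula given earlier in this lecture, use $\sqrt{n}$ in place of $\sqrt{n - 1}$.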
T - TEST
5. Comparing Two Sample Means ($\bar{X}_1$ & $\bar{X}_2$)
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$
Where:
$\bar{X}_1$ – mean of the first sample
$\bar{X}_2$ – mean of the second sample
n1 – number of items in the first sample
n2 – number of items in the second sample
s1 – standard deviation of the first sample
s2 – standard deviation of the second sample

Example 1
Data from a school census show that the mean weight of college students is 45 kilos with a standard deviation of 3 kilos. A sample of 100 college students was found to have a mean of 47 kilos. Are the college students really heavier than the rest using the 0.05 level of significance?
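
Example 1 can be worked as a one-sample z-test, since the population standard deviation is given. The sketch below treats it as a right-tailed test at the 0.05 level (the question asks whether the students are heavier); the function name and the way the critical value is obtained are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def z_one_sample(sample_mean, pop_mean, pop_sd, n):
    """z for a sample mean compared with a population mean."""
    return (sample_mean - pop_mean) * sqrt(n) / pop_sd

z = z_one_sample(47, 45, 3, 100)
critical = NormalDist().inv_cdf(1 - 0.05)   # one-tailed critical value, about 1.645
reject_h0 = z >= critical                   # decision rule from step 6

print(round(z, 2), round(critical, 3), reject_h0)
```

Because the computed z is far larger than the tabular value, the null hypothesis of no difference is rejected under these assumptions.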
Example 2
A researcher wishes to find out whether or not there is a significant difference in the monthly allowance of morning and afternoon students in his school. By random sampling, he took a sample of 239 students in the morning session. The students were found to have a mean monthly allowance of P142.00. The researcher also took a sample of 209 students in the afternoon session. They were found to have a mean monthly allowance of P148.00. The population of students in that school has a standard deviation of P40.00. Is there a significant difference between the two samples at the 0.01 level?

Example 3
A sample survey of television programs in Metro Manila shows that 80 out of 200 men and 75 out of 250 women dislike the "May Bukas Pa" program. One would like to know whether the difference between the two sample proportions, 80/200 = 0.40 and 75/250 = 0.30, is significant or not at the 0.05 level.
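
Examples 2 and 3 can be sketched with the two-sample z formulas for means and for proportions. Both are treated here as two-tailed tests because the questions ask only whether a difference exists; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_two_means(mean1, mean2, pop_sd, n1, n2):
    """z for comparing two sample means with a known population sd."""
    return (mean1 - mean2) / (pop_sd * sqrt(1 / n1 + 1 / n2))

def z_two_proportions(p1, p2, n1, n2):
    """z for comparing two sample proportions."""
    q1, q2 = 1 - p1, 1 - p2
    return (p1 - p2) / sqrt(p1 * q1 / n1 + p2 * q2 / n2)

def two_tailed_critical(alpha):
    return NormalDist().inv_cdf(1 - alpha / 2)

# Example 2: morning vs. afternoon allowance, alpha = 0.01.
z2 = z_two_means(142, 148, 40, 239, 209)
reject_2 = abs(z2) >= two_tailed_critical(0.01)

# Example 3: men vs. women disliking the program, alpha = 0.05.
z3 = z_two_proportions(80 / 200, 75 / 250, 200, 250)
reject_3 = abs(z3) >= two_tailed_critical(0.05)

print(round(z2, 3), reject_2)
print(round(z3, 3), reject_3)
```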
Example 4
A researcher knows that the average height of Filipino women is 1.525 meters. A random sample of 26 women was taken and was found to have a mean height of 1.56 meters, with a standard deviation of 0.10 meters. Is there reason to believe that the 26 women are significantly taller than the rest using the 0.05 level of significance?

Example 5
Beta company is manufacturing steel wire with an average tensile strength of 50 kilos. The laboratory tests 16 pieces and finds that the mean is 47 kilos with a standard deviation of 15 kilos. Are the results in accordance with the hypothesis that the population mean is 50 kilos?
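
Example 4 can be sketched with the one-sample t formula as written in this lecture, with the square root of n – 1 in the numerator. The tabular value 1.708 is an assumption read from a standard t-table (one-tailed, 0.05 level, df = 25); it does not appear in the slides.

```python
from math import sqrt

def t_one_sample(sample_mean, pop_mean, s, n):
    # Form used in this lecture: sqrt(n - 1) in the numerator.
    return (sample_mean - pop_mean) * sqrt(n - 1) / s

t = t_one_sample(1.56, 1.525, 0.10, 26)
df = 26 - 1                      # single sample: df = n - 1
t_tabular = 1.708                # assumed t-table value, one-tailed, alpha 0.05, df 25
reject_h0 = abs(t) >= t_tabular

print(round(t, 3), df, reject_h0)
```

The computed t only slightly exceeds the tabular value, so the conclusion is sensitive to which form of the formula and which standard deviation convention is used.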
Example 6
It is known from the records of the city schools that the standard deviation of math test scores on the ABC test is 5. A sample of 200 students from the system was taken and it was found that the sample mean is 75. Previous tests showed the population mean to be 70. Is it safe to conclude that the sample is significantly different from the population at the 0.01 level?

Example 7
Two types of rice varieties are being considered for yield and a comparison is needed. Thirty hectares were planted with the rice varieties exposed to fairly uniform conditions. The results are tabulated below:
                   Variety A      Variety B
Average yield      80 sack/hec    85 sack/hec
Sample Variance    5.90           12.10
Is there a significant difference in the yield of the two varieties at the 0.05 level of significance?
Example 8
A manufacturer of flashlight batteries claims that the average life of his product will exceed 40 hours. A company is willing to buy a very large shipment of batteries provided the claim is true. A random sample of 36 batteries is tested, and it was found that the sample mean is 45 hours. If the population of batteries has a standard deviation of 5 hours, is it likely that the batteries will be bought?

Example 9
A company is trying to decide which of two brands of tires to buy for their trucks. They would like to adopt Brand C unless there is some evidence that Brand D is better. An experiment was conducted where 16 tires from each brand were used. The tires were run under uniform conditions until they wore out. The results are:
Brand C: $\bar{X}_1$ = 40,000 km, s1 = 5,400 km
Brand D: $\bar{X}_2$ = 38,000 km, s2 = 3,200 km
What conclusion can be drawn?
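
The two-sample t formula can be sketched on the tire data of Example 9. The example does not state a significance level, so the sketch stops at the computed t and the degrees of freedom; the tabular value still has to be read from a t-table.

```python
from math import sqrt

def t_two_sample(mean1, mean2, s1, s2, n1, n2):
    """t for comparing two sample means with a pooled variance."""
    pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled * (1 / n1 + 1 / n2))

# Brand C (first sample) and Brand D (second sample), 16 tires each.
t = t_two_sample(40_000, 38_000, 5_400, 3_200, 16, 16)
df = 16 + 16 - 2                 # two samples: df = n1 + n2 - 2

print(round(t, 3), df)
```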
Example 10
All freshmen in a particular school were found to have a variability in grades expressed as a standard deviation of 3. Two samples among these freshmen, made up of 20 and 50 students each, were found to have means of 88 and 85 respectively. Based on their grades, is the first group really brighter than the second group using the 0.01 level of significance?

Analysis of Variance (F - Test)
- A test that was developed by Ronald A. Fisher
- A technique in inferential statistics designed to test whether or not more than two samples (or groups) are significantly different from each other
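Analysis of Variance
Steps:
1. Compute for the sum of squares
$TSS = \sum x^2 - \frac{(\sum x)^2}{N}$
$SSB = \frac{1}{r}\sum_j \left(\sum_i x_{ij}\right)^2 - \frac{(\sum x)^2}{N}$
$SSW = TSS - SSB$

Analysis of Variance
2. Compute degrees of freedom
$df_t = rk - 1 = N - 1$
$df_b = k - 1$
$df_w = df_t - df_b$
Here r is the number of rows (observations per column), k is the number of columns (groups) and N = rk is the total number of observations.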
Analysis of Variance
3. Compute for the mean sum of squares
$MSSB = \frac{SSB}{df_b}$
$MSSW = \frac{SSW}{df_w}$
4. Compute for the F – Ratio
$F = \frac{MSSB}{MSSW}$

Contingency Table for ANOVA
Sources of Variation   Sum of Squares   Degree of Freedom (df)   Mean Sum of Squares   F – Ratio
Between Column         SSB              dfb                      MSSB
Within Column          SSW              dfw                      MSSW
Total                  TSS              dft
Exercise
1. The weights in kilograms of three groups of 5 members each are shown in the table below. Is there unusual variation among the groups? (use α = 0.05)
Members   Group
          A     B     C
1         50    60    53
2         48    40    55
3         55    50    40
4         50    60    40
5         46    52    47

Exercise
2. The following are the mileage obtained after several road tests were run using 5 different kinds of gasoline on a Toyota car.
Road Test   Type of Gasoline
            A     B     C     D     E
1ST         35    61    38    65    56
2ND         31    63    54    60    69
3RD         42    50    47    57    70
4TH         48    42    60    55    50
5TH         40    49    55    60    48
Is there a significant difference among the mileage yields, at the 1% level?
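
The ANOVA steps (sums of squares, degrees of freedom, mean squares, F ratio) can be sketched on the three-group weight data of Exercise 1. The function assumes equal group sizes, as the lecture's formulas do; the tabular F value still has to be looked up for the stated significance level.

```python
groups = {
    "A": [50, 48, 55, 50, 46],
    "B": [60, 40, 50, 60, 52],
    "C": [53, 55, 40, 40, 47],
}

def one_way_anova(groups):
    columns = list(groups.values())
    r = len(columns[0])              # rows: members per group
    k = len(columns)                 # columns: number of groups
    N = r * k
    all_x = [x for col in columns for x in col]
    correction = sum(all_x) ** 2 / N                 # (sum of x)^2 / N
    tss = sum(x * x for x in all_x) - correction
    ssb = sum(sum(col) ** 2 for col in columns) / r - correction
    ssw = tss - ssb
    dft, dfb = N - 1, k - 1
    dfw = dft - dfb
    mssb, mssw = ssb / dfb, ssw / dfw
    return {"TSS": tss, "SSB": ssb, "SSW": ssw,
            "dfb": dfb, "dfw": dfw, "F": mssb / mssw}

result = one_way_anova(groups)
print({name: round(value, 4) for name, value in result.items()})
```

The same function can be applied to the gasoline and bowling exercises by replacing the groups dictionary.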
Exercise
3. Below are the bowling scores of four groups of four members each. At the 5% significance level, find out if there is unusual variation among the groups.
Members   Group
          A      B      C      D
1         98     100    87     90
2         78     95     92     93
3         95     90     105    95
4         110    85     88     97

Chi – Square Test ($\chi^2$)
- Used to test significant difference or relationship
- Used if data are in frequencies (enumeration data)
USES:
1. to test the goodness of fit of a normal curve; that is, to find out whether or not a sample distribution conforms with the hypothetical normal distribution
2. to find out whether or not an observed proportion is equal to some given ideal or expected proportion
3. to test the independence of one variable from another variable.
Formulas:
i. For a 2 x 2 table (with Yates' correction for continuity)
$\chi^2 = \sum \frac{(|OF - EF| - 0.5)^2}{EF}$
ii. For a non 2 x 2 table
$\chi^2 = \sum \frac{(OF - EF)^2}{EF}$

Exercise
1. Test the hypothesis that educational attainment does not depend on socio – economic status for the following 100 persons in a particular community.
Socio – economic status   Educational Attainment
                          Finished College   Did Not Finish College
Poor                      18                 10
Middle Class              28                 25
Rich                      14                 5
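
The chi-square formula for a non 2 x 2 table can be sketched on the 3 x 2 table of Exercise 1. Two points are assumptions not stated on the slides: each expected frequency (EF) is computed as row total times column total divided by the grand total, and the degrees of freedom are (rows – 1)(columns – 1).

```python
observed = [
    [18, 10],   # Poor: finished college, did not finish
    [28, 25],   # Middle Class
    [14, 5],    # Rich
]

def chi_square_independence(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, of in enumerate(row):
            ef = row_totals[r] * col_totals[c] / total   # expected frequency
            chi2 += (of - ef) ** 2 / ef
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_independence(observed)
print(round(chi2, 3), df)
```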
Exercise
2. At the 1% significance level, does college academic grade depend on the high school NSAT results for the following 200 students?
Academic Grade   NSAT Rating
                 Low    Average   High
Above 85         13     25        21
75–85            18     31        38
Below 75         14     20        20

Exercise
3. At ABC Company, there are 28 males and 32 females. Out of the 28 males, 10 hold executive posts and the others do clerical work. Of the 32 females, only 5 hold executive positions and the others do clerical work. Prepare a contingency table, then test the hypothesis that position is independent of sex.
Exercise
4. To determine whether type of personality is related to academic performance, a random sample of 180 high school students from a certain college was taken and the data are as follows:
            Low Average   Average   High Average
Introvert   35            30        25
Extrovert   31            23        36
Is there a significant relationship between personality type and academic performance?

Correlation and Regression Analysis
Regression Analysis
- concerned with the problem of estimation and forecasting
FORMULA:
$y = a + bx$
Where:
y – predicted score
a – y – intercept
b – slope of the line

Regression Analysis
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
$a = \bar{Y} - b\bar{X}$
Where:
$\bar{Y}$ – mean of the y values
$\bar{X}$ – mean of the x values
Correlation Analysis
- concerned with the relationship between the changes of the variables
Formula: Pearson Product Moment Correlation (r)
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$

Range of Values: $-1 \le r \le 1$
• (+) r – shows a direct positive relationship
• (–) r – shows a negative or inverse relationship
• r = 0 – this indicates no relationship
• r = 1 – perfect positive relationship
• r = –1 – perfect negative relationship
Interpretation:
Pearson r          Qualitative Description
±1                 Perfect Correlation
±0.91 – ±0.99      Very High
±0.71 – ±0.90      High
±0.41 – ±0.70      Marked
±0.21 – ±0.40      Slight/Low
0 – ±0.20          Negligible

Testing the Significance of r
$t = r\sqrt{\frac{n - 2}{1 - r^2}}$
Exercise
1. It is generally known that the number of road accidents is inversely proportional to road width. The following data show the result of a study indicating the number of accidents occurring per hundred thousand vehicles.
Road width (in feet) (x)    75   52   60   33   22
Number of accidents (y)     40   84   55   92   90
a. draw a scatter diagram
b. find the equation of the LSRL
c. predict accident frequency for a road whose width is 55 feet; 48 feet
d. find the degree of relationship between road width and accident frequency.

Exercise
2. The following table shows the final grades of ten students in Algebra and Statistics.
Algebra (x)       75   80   93   65   87   71
Statistics (y)    82   78   86   72   91   80
a. draw a scatter diagram
b. find the equation of the LSRL
c. predict grade in Statistics if grade in Algebra is 78; 82; 89; 95; 100
d. find the degree of relationship between grades in Algebra and Statistics
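
The regression and correlation formulas can be sketched on the road-width data of Exercise 1. The helper name fit_line is hypothetical; the code follows the formulas for b, a, r and the t test of r as given in this lecture.

```python
from math import sqrt

x = [75, 52, 60, 33, 22]   # road width (in feet)
y = [40, 84, 55, 92, 90]   # number of accidents

def fit_line(x, y):
    """Return slope b, intercept a and Pearson r for paired data."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return b, a, r

b, a, r = fit_line(x, y)
predicted_55 = a + b * 55                      # y = a + bx at a width of 55 feet
t = r * sqrt((len(x) - 2) / (1 - r ** 2))      # significance test of r

print(round(b, 4), round(a, 3), round(r, 4), round(predicted_55, 2), round(t, 2))
```

The negative slope and the high negative r are in line with the inverse relationship described in the exercise.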
Pilar B. Acorda
Email Address : pbacorda@yahoo.com
Mobile Number: 09359547319