MTH 313 Advanced Statistics Arnie Boy Guitguit (Complete Weeks)

UNIVERSITY OF MINDANAO
College of Teacher Education

Secondary Education Program
Physically Distanced but Academically Engaged
Self-Instructional Manual (SIM) for Self-Directed Learning

(SDL)
Course/Subject: MTH 313 – ADVANCED STATISTICS
Name of Teacher: ARNIE BOY O. GUITGUIT
THIS SIM/SDL MANUAL IS A DRAFT VERSION ONLY, NOT

FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS
INTENDED USE. THIS IS INTENDED ONLY FOR
THE USE OF THE STUDENTS WHO ARE
OFFICIALLY ENROLLED IN THE
COURSE/SUBJECT.
EXPECT REVISIONS OF THE MANUAL.

University of Mindanao 2
College of Teacher Education “Physically Distanced but Academically Engaged”
TABLE OF CONTENTS
Course Outline 4
Course Outline Policy 4
Course Coordinator’s (CC) Voice 7
Course Outcome (CO) 7
WEEK 1 – 3: INTRODUCTION TO STATISTICS

BIG PICTURE: Week 1 – 3 ULOs 8
Steps in Hypothesis Testing (Traditional
Lesson 1.1 9
Method
Lesson 1.2 z Test for Mean 20
Lesson 1.3
t Test for Mean 31
Testing the Differences Between Two
Lesson 1.4 39
Means: Using the z Test
WEEK 4 – 5: TESTING TWO MEANS AND CORRELATION

Testing the Difference Between Two Means
Lesson 2.1 46
of Independent Samples: Using the t Test
Testing the Difference Between Two Means
Lesson 2.2 51
of Dependent Samples
Lesson 2.3 Correlation and Regression 58
WEEK 6 – 7: CHI-SQUARE AND ANOVA
Lesson 3.1 Test for Goodness of Fit 73
Lesson 3.2 Test for Independence 81
Lesson 3.3 One-way Analysis of Variance 91
WEEK 8 – 9: NONPARAMETRIC TESTS
Advantages and Disadvantages of
Lesson 4.1 99
Nonparametric Methods
QUESTION AND ANSWER (Q & A) LIST 101
KEYWORD’S INDEX 102

REFERENCES 103
APPENDIX
TABLE E: The z Distribution 104
TABLE F: The t Distribution 106
TABLE G: The Chi-square Distribution 107
Course Outline: MTH 313 – Advanced Statistics
Course Coordinator: Arnie Boy O. Guitguit

Email: aguitguit@umindanao.edu.ph
Student Consultation: By LMS Blackboard message / text message
Mobile: 0910-6775105 / 0927-2981372
Phone: None
Effectivity Date: August 2020
Mode of Delivery: Blended (On-Line with face to face or virtual sessions)
Time Frame: 54 Hours
Student Workload: Expected Self-Directed Learning
Requisites: MTH 212, MTH 213
Credit: 3.0 units
Attendance Requirements: A minimum of 95% attendance is required at all
scheduled Virtual or face to face sessions.
Course Outline Policy
Areas of Concern Details

Contact and Non-contact Hours This 3-unit course self-instructional manual is
designed for blended learning mode of instructional
delivery with scheduled face to face or virtual
sessions. The expected number of hours will be 54
including the face to face or virtual sessions. The face
to face sessions shall include the summative
assessment tasks (exams) since this course is crucial
in the licensure examination for teachers.
Assessment Task Submission Submission of assessment tasks shall be on 3rd, 5th,
7th and 9th week of the term. The assessment paper
shall be attached with a cover page indicating the title
of the assessment task (if the task is performance),
the name of the course coordinator, date of
submission and name of the student. The document
should be emailed to the course coordinator. It is also
expected that you already paid your tuition and other
fees before the submission of the assessment task.
If the assessment task is done in real time through

the features in the Blackboard Learning Management
System, the schedule shall be arranged ahead of time
by the course coordinator.
Since this course is included in the licensure

examination for teachers, you will be required to take
the Multiple-Choice Question exam inside the
University. This should be scheduled ahead of time by
your course coordinator. This is non-negotiable for all
licensure-based programs.
Turnitin Submission (IF To ensure honesty and authenticity, all assessment

NECESSARY) tasks are required to be submitted through Turnitin
with a maximum similarity index of 30% allowed. This
means that if your paper goes beyond 30%, the
students will either opt to redo her/his paper or
explain in writing addressed to the course coordinator
the reasons for the similarity. In addition, if the paper
has reached more than 30% similarity index, the
student may be called for a disciplinary action in
accordance with the University’s OPM on Intellectual
and Academic Honesty.
Please note that academic dishonesty such as

cheating and commissioning other students or people
to complete the task for you have severe
punishments (reprimand, warning, expulsion).
Penalties for Late The score for an assessment item submitted after the
Assignments/Assessments designated time on the due date, without an approved
extension of time, will be reduced by 5% of the
possible maximum score for that assessment item for
each day or part day that the assessment item is late.
However, if the late submission of assessment paper

has a valid reason, a letter of explanation should be
submitted and approved by the course coordinator. If
necessary, you will also be required to present/attach
evidences.
Return of Assessment tasks will be returned to you two (2)
Assignments/Assessments weeks after the submission. This will be returned by
email or via Blackboard portal.
For group assessment tasks, the course coordinator

will require some or few of the students for online or
virtual sessions to ask clarificatory questions to
validate the originality of the assessment task
submitted and to ensure that all the group members
are involved.
Assignment Resubmission You should request in writing addressed to the course
coordinator his/her intention to resubmit an
assessment task. The resubmission is premised on
the student’s failure to comply with the similarity index
and other reasonable grounds such as academic
literacy standards or other reasonable circumstances
e.g. illness, accidents financial constraints.
Re-marking of Assessment You should request in writing addressed to the
Papers and Appeal program coordinator your intention to appeal or
contest the score given to an assessment task. The
letter should explicitly explain the reasons/points to
contest the grade. The program coordinator shall
communicate with the students on the approval and
disapproval of the request.
If disapproved by the course coordinator, you can

elevate your case to the program head or the dean
with the original letter of request. The final decision

will come from the dean of the college.
Grading System Course exercises – 30% (including BlackBoard
forum)
1st exam – 10%
2nd exam – 10%
3rd exam – 10%
Final exam – 40%
Preferred Referencing Style APA 6th Edition

(IF THE TASK REQUIRES)
Student Communication You are required to create a umindanao email
account which is a requirement to access the
BlackBoard portal. Then, the course coordinator
shall enroll the students to have access to the
materials and resources of the course. All
communication formats: chat, submission of
assessment tasks, requests etc. shall be through the
portal and other university recognized platforms.
You can also meet the course coordinator in person

through the scheduled face to face sessions to raise
your issues and concerns.
For students who have not created their student

email, please contact the course coordinator or
program head.
Contact Details of the Dean Dr. Jocelyn Bacasmot
Email: jbacasmot@umindanao.edu.ph
Phone: 082-3050647 local 102
Contact Details of the Program Maribel Abalos, MaEd

Head Email: mabalos@umindanao.edu.ph
Phone: 082-3050647 local 102
Students with a Special Needs Students with special needs shall communicate with
the course coordinator about the nature of his or her
special needs. Depending on the nature of the need,
the course coordinator with the approval of the
program coordinator may provide alternative
assessment tasks or extension of the deadline of
submission of assessment tasks. However, the
alternative assessment tasks should still be in the
service of achieving the desired course learning
outcomes.
Online Tutorial Registration (IF You are required to enroll in a specific tutorial time for
NECESSARY) this course via the www.cte.edu.ph portal. Please
note that there is a deadline for enrollment to the
tutorial.
Instructional Help Desk Dr. Jocelyn Bacasmot
Contact Details Email: jbacasmot@umindanao.edu.ph
Phone: 082-3050647 local 102
Phyll Jhann Gildore, MaEd

Email: pgildore@umindanao.edu.ph
Phone: 082-3050647 local 102
BEED Concerns
Email: BEED@umindanao.edu.ph
Mobile: 0923-6028780
Library Contact Details Brigida E. Bacani
Head, LIC
Email: library@umindanao.edu.ph
Phone: +63 305-0640
Mobile: 0951-3766681
Well-being Welfare Support Ronadora E. Deala
Held Desk Contact Details Head, GSTC
Email: ronadora_deala@umindanao.edu.ph
Mobile: 0921-2122846
Ivy Jane S. Regidor

Facilitator, CTE GSTC
Email: gstcmain@umindanao.edu.ph
Mobile: 0910-5681081
Course Information – see/download course syllabus in the Black Board LMS
CC’s Voice: Hello future Math teachers! Welcome to this course MTH313 –
Advanced Statistics. By now, I am confident that you really wanted to
become a Math teacher and that you have visualized yourself already
being in front of the classroom teaching. As you go through with this
module, always remember to broaden your perspective about the
course. Moreover, some inputs and discussion were hereby
acknowledged from the book of Allan Bluman (2011) and Edward Vieira,
Jr. (2017).
CO Before the actual teaching performance, you must deal with one of the
major qualities of a teacher which is having an expert power in the subject
matter you are going to discuss. Hence, the ultimate course outcome (CO)
of this course is for you to be acquainted with statistics (statistical tools
and methods). Talking about advanced statistics, the most crucial part is
how we find the correct representation of the following concept. Thus, in
this course you are expected to recall your knowledge or read in advance
the rudiments of statistics, parameters, z test, t test, as well as ANOVA.
These include the process how to calculate mean, variance, and standard
deviation.
Let us begin!
Big Picture
Week 1-3: Unit Learning Outcomes (ULO): At the end of the unit, you are expected to:
a. Understand the five steps used in hypothesis testing,

b. Test means using the z test,
c. Test means using the t test, and
d. Test the difference between sample means using the z test.
Big Picture in Focus!

ULOa: Understand the five steps used in hypothesis testing
Metalanguage
In this section, the most essential terms relevant to the advanced statistics will
be defined. You will encounter these terms as we go through this course. Please refer
to these definitions in case you will encounter difficulty in understanding some
concepts.
For ULOa, the terms below will be defined to establish a common frame of
reference as to how the terms are addressed in this course.
TERMS DEFINITIONS
1. Sample ➢ It is a set of individuals or objects collected or
selected from a statistical population.
2. Population ➢ It is the entire pool from which the statistical sample

is drawn.
3. Hypothesis ➢ It is an assumption that needs to be validated using

statistical methods.
Essential Knowledge
To perform the aforesaid big picture (unit learning outcomes) for the first three
(3) weeks of the course, you need to fully understand the essential knowledge that will
be laid down in the succeeding pages. Please note that you are not limited to
exclusively refer to these resources. Thus, you are expected to utilize other books,
research articles and other resources that are available in the university’s library e.g.
ebrary, search.proquest.com etc.
For this module, several concepts will be tackled and explained. Let us start
with the systems of linear equations. In this part, the most critical words or terms used
in studying analytic geometry will be introduced. Kindly look closely to some terms
especially those bolded or italicized.
INTRODUCTION TO STATISTICS
Statistics is a branch of applied mathematics that focuses on studying sample data

that are representative of a specific population (Vieira, 2017). Researchers collect and
analyze sample data. The data analysis provides a useful information for decision-
making.
Statistics has two general purposes, perhaps best thought of as types of statistics. We
have descriptive statistics that summarizes and describes sample data
representative of the population. Next, we have inferential statistics that examines
the relationships between and among variables and helps us draw conclusions that
can be generalized to the population.
Having sample data that represent the broader targeted population is critical to our
ability to use our findings to make informed decisions. A sample is a relatively small
representative number of a larger population selected for statistical analysis.
Lesson 1.1 Steps in Hypothesis Testing (Traditional Method)
Every hypothesis-testing situation begins with the statement of a hypothesis. A

statistical hypothesis is a conjecture about a population parameter. This conjecture
may or may not be true.
There are two statistical hypotheses for each situation: the null hypothesis and the
alternative hypothesis.
1. Null Hypothesis. It is symbolized by H0. It is a statistical hypothesis that
states that there is no difference between a parameter and a specific value, or
that there is no difference between two parameters.
2. Alternative Hypothesis. It is symbolized by H1 or Ha. It is a statistical
hypothesis that states the existence of a difference between a parameter and
a specific value, or states that there is a difference between two parameters.
As an illustration of how hypothesis should be stated, three statistical studies will be

used as examples.
EXAMPLE 1
A medical researcher is interested in finding out whether a new medication will have
any undesirable side effects. The researcher is particularly concerned with the pulse
rate of the patients who take the medication. Will the pulse rate increase, decrease or
remain unchanged after a patient takes the medication?
Since the researcher knows that the mean pulse rate for the population under study is
82 beats per minute, the hypothesis for this situation are
𝐻0 : 𝜇 = 82 and 𝐻1 : 𝜇 ≠ 82
The null hypothesis specifies that the mean will remain unchanged, and the alternative
hypothesis states that it will be different. This test is called a two-tailed test since the
possible side effects of the medicine could be to raise or lower the pulse rate.
EXAMPLE 2
A chemist invents an additive to increase the life of an automobile battery. If the mean
lifetime of the automobile battery without the additive is 36 months, then her
hypotheses are
𝐻0 : 𝜇 = 36 and 𝐻1 : 𝜇 > 36
In this situation, the chemist is interested only in increasing the lifetime of the batteries,
so her alternative hypothesis is that the mean is greater than 36 months. The null
hypothesis is that the mean is equal to 36 months. This test is called right-tailed test
since the interest is in an increase only.
EXAMPLE 3
A contractor wishes to lower heating bills by using a special type of insulation in
houses. If the average of the monthly heating bills is P78, her hypotheses about
heating costs with the use of insulation are
𝐻0 : 𝜇 = 𝑃78 and 𝐻1 : 𝜇 < 𝑃78
This test is a left-tailed test since the contractor is interested only in lowering heating
costs.
To state hypotheses correctly, researchers must translate the conjecture or claim from
words into mathematical symbols. The basic symbols used are as follows:
Equal = Greater than >

Not equal to ≠ Less than <
The null and alternative hypotheses are stated together, and the null hypothesis
contains the equal sign, as shown below (where k represents a specified number).
Two-tailed test Right-tailed test Left-tailed test

𝐻0 : 𝜇 = 𝑘 𝐻0 : 𝜇 = 𝑘 𝐻0 : 𝜇 = 𝑘
𝐻1 : 𝜇 ≠ 𝑘 𝐻1 : 𝜇 > 𝑘 𝐻1 : 𝜇 < 𝑘
The null hypothesis is always stated using the equal sign. This is done because most
professionals journals, and when we test the null hypothesis, the assumption is that
the mean, proportion, or standard deviation is equal to a given specific value. Also,
when the researcher conducts a study, he or she is generally looking for evidence to
support a claim. Therefore, the claim should be stated as the alternative hypothesis,
i.e. using < or > or ≠. Because of this, the alternative hypothesis is sometimes called
the research hypothesis.
A claim, though, can be stated as either the null hypothesis or the alternative
hypothesis; however, the statistical evidence can only support the claim if it is the
alternative hypothesis. Statistical evidence can be used to reject the claim if the claim
is the null hypothesis. These facts are important when you are stating the conclusion
of a statistical study.
After stating the hypothesis, the researcher designs the study. The researcher selects
the correct statistical tests, chooses an appropriate level of significance, and
formulates a plan for conducting the study.
A statistical test uses the data obtained from a sample to decide about whether the
null hypothesis should be rejected. The numerical value obtained from a statistical test
is called the test value.
In the hypothesis-testing situation, there are four possible outcomes. In reality, the null
hypothesis may or may not be true, and a decision is made to reject or not reject it
based on the data obtained from a sample. The four possible outcomes are shown
below. Notice that there two possibilities for a correct decision and two possibilities for
an incorrect decision.
H0 true H0 false
Reject Error Correct

H0 Type I decision
Do not
Correct Error
Reject
H0 decision Type II
A type I error occurs if you reject the null hypothesis when it is true.
A type II error occurs if you do not reject the null hypothesis when it is false.
The hypothesis-testing situation can be likened to a jury trial. In a jury trial, there are
four possible outcomes. The defendant is either guilty or innocent, and he or she will
be convicted or acquitted. Now the hypotheses are
H0: The defendant is innocent.

H1: The defendant is not innocent (i.e. guilty).
If the defendant is convicted but he or she does not commit the crime, then a type I
error has been committed. On the other hand, if the defendant is convicted and he or
she has committed the crime, then a correct decision has been made.
If the defendant is acquitted and he or she did not commit the crime, a correct decision
has been made. However, if the defendant is acquitted and he or she did commit the
crime, then a type II error has been made.
Now, the question is ‘How large a difference is necessary to reject the null hypothesis?’
Here is where the level of significance is used.
The level of significance is the maximum probability of committing a type I error.

This probability is symbolized by 𝛼 (Greek letter alpha). That is, P(type I error) = 𝛼.
The probability of a type II error is symbolized by 𝛽, the Greek letter beta. That is,
P(type II error) = 𝛽. In most hypothesis-testing situation, 𝛽 cannot be easily computed;
however, 𝛼 and 𝛽 are related in that decreasing one increases the other.
Statisticians generally agree on using three arbitrary significance levels: the 0.10, 0.05,
and 0.01 levels. That is, if the null hypothesis is rejected, the probability of a type I
error will be 10%, 5%, or 1%, depending on which level of significance is used. Here
is another way of putting it: When 𝛼 = 0.10, there is a 10% chance of rejecting a true
null hypothesis; when 𝛼 = 0.05, there is 5% chance of rejecting a true null hypothesis;
and when 𝛼 = 0.01, there is 1% chance of rejecting a true null hypothesis.
In a hypothesis-testing situation, the researcher decides what level of significance to

use. It does not have to be the 0.10, 0.05, or 0.01 level. It can be any level, depending
on the seriousness of the type I error. After a significance level Is chosen, a critical
value is selected from a table for the appropriate test. If a z test is used, for example,
the z table is consulted to find the critical value. The critical value determines the
critical and noncritical regions.
The critical value separates the critical region from the noncritical region. The
symbol for critical value is C.V.
The critical or rejection region is the range of values of the test value that indicates
that there is significant difference and that the null hypothesis should be rejected.
The noncritical or nonrejection region is the range of values of the test value that
indicates that the difference was probably due to chance and that the null hypothesis
should not be rejected.
The critical value can be on the right side of the mean or on the right side of the mean
for a one-tailed test. Its location depends on the inequality sign of the alternative
hypothesis.
A one-tailed test indicates that the null hypothesis should be rejected when the test
value is in the critical region on one side of the mean. A one-tailed test is either a
right-tailed test or left-tailed test, depending on the direction of the inequality of
the alternative hypothesis.
Figure 1. Finding the Critical Value for 𝛼 = 0.01 (Right-tailed Test)
To obtain the critical value, the researcher must choose an alpha level. Suppose the
researcher chose 𝛼 = 0.01. Then the researcher must find a z value such that 1% of
the area falls to the right of the z value and 99% falls to the left of the z value, as shown
in Figure 1.
Next, the researcher must find the area value in z table closest to 0.9900. The critical
z value is 2.33, since that value gives the area closest to 0.9900 (that is, 0.9901), as
shown in Figure 1.
The critical and noncritical regions and the critical value are shown in Figure 2.
Figure 2. Critical and Noncritical Regions for 𝛼 = 0.01 (Right-tailed Test)
If we have an alternative hypothesis, such as 𝐻1 : 𝜇 < 𝑃78, then the critical value falls
to the left of the mean. This test is thus a left-tailed test. At α = 0.01, the critical value
is -2.33, since 0.0099 is the closest value to 0.01 as shown in Figure 3.
Figure 3. Critical and Noncritical Regions for 𝛼 = 0.01 (Left-tailed Test)
When a researcher conducts a two-tailed test, the null hypothesis can be rejected
when there is significant difference in either decision, above or below the mean.
In a two-tailed test, the null hypothesis should be rejected when the test value is in
either of the two critical regions.
For a two-tailed test, then, the critical region must be split into two equal parts. If α =
0.01, then one-half of the area or 0.005, must be to the right of the mean and one-half
must be to the left of the mean, as shown in Figure 4.
Figure 4. Finding the Critical Values for 𝛼 = 0.01 (Two-tailed Test)
In this case, the z value on the left side is found by looking up the z value
corresponding to an area of 0.0050. The z value falls about halfway between −2.57
and −2.58 corresponding to the areas 0.0049 and 0.0051. The average of −2.57 and
−2.58 is [(−2.57) + (−2.58)] ÷ 2 = −2.575 so is the z value is needed to three decimal
places, −2.575 is used; however, if the z value is rounded to two decimal places,
−2.58 is used.
On the right side, it is necessary to find the z value corresponding to 0.99 + 0.005, or
0.9950. Again, the value falls between 0.9949 and 0.9951, so +2.575 or 2.58 can be
used.
The critical values are +2.58 and −2.58, as shown in Figure 5.

Figure 5. Critical and Noncritical Regions for 𝛼 = 0.01 (Two-tailed Test)
Similar procedures are used to find other values of 𝛼. In Figure 6, it shows the critical
value (C.V.) for the three situations discussed for values 𝛼 = 0.10, 𝛼 = 0.05, and 𝛼 =
0.01 with shaded rejected regions.
Figure 6. Summary of Hypothesis Testing and Critical Values
In finding the critical values for specific 𝛼, take note of the following:
Step 1: Draw the figure and indicate the appropriate area.

a. If the test is left-tailed, the critical region, with an area equal to 𝛼, will be on
the left side of the mean.
b. If the test is right-tailed, the critical region, with an area of 𝛼, will be on the
right side of the mean.
c. If the test is two-tailed, 𝛼 must be divided by 2; one-half of the area will be to
the right of the mean, and one-half will be to the left of the mean.
Step 2: Identify the corresponding critical values.

a. For a left-tailed test, use the z value that corresponds to the area equivalent
to 𝛼.
b. For a right-tailed test, use the z value that corresponds to the area equivalent
to 1 − 𝛼.
c. For a two-tailed test, use the z value that corresponds to 𝛼/2 for the left value.
It will be negative. For the right value, use the z value that corresponds to the
area equivalent to 1 − 𝛼/2. It will be positive.
Self-Help! Refer to the sources to further understand the lesson.

• Goos, P., & Meintrup, D. (2015). Statistics with jmp: Graphs, descriptive
statistics, and probability. ProQuest Ebook Central
https://search.proquest.com
WRITE IT OUT!
ACTIVITY 1!
After the discussion on steps of hypothesis testing, you have now known the
preliminary concepts about advanced statistics. With that, let now us check your
knowledge by doing the following tasks. Now, I will require you to explain thoroughly
your answers about the given argument.
1. Define null and alternative hypotheses and give example of each.

___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
2. What is meant by a Type I error? A Type II error? How are they related?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
3. When should a one-tailed test be used? A two-tailed test?

___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
4. In hypothesis testing, why can’t the hypothesis be proved true?

___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
5. List the steps of hypothesis testing.

___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
APPLY THE CONCEPT!
ACTIVITY 1!
knowledge by doing the following tasks.
DIRECTION: Using the z table (Table E), find the critical value (or values) for each.
1. 𝛼 = 0.05, two-tailed test
2. 𝛼 = 0.01, left-tailed test
3. 𝛼 = 0.005, right-tailed test
8. 𝛼 = 0.01, two-tailed test
.
ACTIVITY 2!
DIRECTION: For each conjecture, state the null and alternative hypotheses.
1. The average age of community college students is 24.6 years.
2. The average income of accountants is $51,497.
3. The average age of attorneys is greater than 25.4 years.
4. The average score of high school basketball games is less than 88.
5. The average pulse rate of male marathon runners is less than 70 beats per
minute.
6. The average cost of DVD player is $79.95.
7. The average weight loss for a sample of people who exercise 30 minutes per
day for 6 weeks is 8.3 pounds.
GO BEYOND!
Based on the discussion about steps of hypothesis testing, it is necessary to go

further what was given by the book. You are tasked now to answer the given problem
as well as explain the reason behind your answers.
EGGS AND YOUR HEALTH

The Incredible Edible Egg company recently found that eating eggs does not increase
a person’s blood serum cholesterol. Five hundred subjects participated in a study that
lasted for 2 years. The participants were randomly assigned to either a no-egg group
or a moderate-egg group. The blood serum cholesterol levels were checked at the
beginning and at the end of the study. Overall, the groups’ levels were not significantly
different. The company reminds us that eating eggs is healthy if done in moderation.
Many of the previous studies relating eggs and high blood serum cholesterol jumped
to improper conclusions.
Using this information, answer the questions below.
1. What prompted the study?

2. What is the population under study?
3. Was a sample collected?
4. What was the hypothesis?
5. Were data collected?
6. Were any statistical tests run?
7. What was the conclusion?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________

ULOb: Test means using the z test
Metalanguage
To clearly understand the ULOb, you need to know the important terms relevant
to this course. Kindly adopt with the definitions below.
TERMS DEFINITIONS
1. Mean ➢ This refers to the average of the given sample
data.
2. z test ➢ It is any statistical test for which the distribution of

the test statistic under the null hypothesis can be
approximated by a normal distribution.
Essential Knowledge
In this part of the module, we will explore testing means using the z test. Follow
closely the discussion and answer the exercises afterwards.
LESSON 1.2 z Test for a Mean
We will explain two statistical tests: the z test is used when 𝜎 is known and the t test
is used when 𝜎 is unknown. We will discuss first the z test.
Many hypotheses are tested using a statistical test based on the following general
formula:
(𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑣𝑎𝑙𝑢𝑒) − (𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑣𝑎𝑙𝑢𝑒)

𝑇𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒 =
𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑒𝑟𝑟𝑜𝑟
The observed value is the statistic (such as the sample mean) that is computed from
the sample data. The expected value is the parameter (such as the population
standard mean) that you would expect to obtain if the null hypothesis were true – in
other words, the hypothesized value. The denominator is the standard error of the
statistics being tested (in this case, the standard error of the mean).
The z test is defined formally as follows.
The z test is a statistical test for the mean of a population. It can be used when 𝑛 ≥
30, or when the population is normally distributed and 𝜎 is known.
The formula for the z test is
𝑋−𝜇
𝑧=
𝜎 / √𝑛
where
𝑋 = sample mean
𝜇 = hypothesized population mean
𝜎 = population standard deviation
𝑛 = sample size
For the z test, the observed value is the value of the sample mean. The expected value
is the value of the population mean, assuming that the null hypothesis is true. The
denominator 𝜎 / √𝑛 is the standard error of the mean.
Assumptions for the z Test for a Mean when 𝝈 is Known

1. The sample is a random sample.
2. Either 𝑛 ≥ 30 or the population is normally distributed if 𝑛 < 30.
Note: Your first encounter with hypothesis testing can be somewhat challenging and
confusing, since there are many new concepts being introduced at the same time. To
understand all the concepts, you must carefully follow each step in the examples and
try each exercise that is assigned. Only after careful study and patience will these
concepts become clear.
There are five steps for solving hypothesis-testing problems:
Step 1 State the hypothesis and identify the claim.

Step 2 Find the critical value(s).
Step 3 Compute the test value.
Step 4 Make the decision to reject or not reject the null hypothesis.
Step 5 Summarize the results.
EXAMPLE 1 Days on Dealer’s Lots

A researcher wishes to see if the mean number of days that a basic, low-price, small
automobile sits on a dealer’s lot is 29. A sample of 30 automobile dealers has a mean
of 30.1 days for basic, low-price, small automobiles. At 𝛼 = 0.05, test the claim that the
mean time is greater than 29 days. The standard deviation of the population is 3.8
days.
Solution
Step 1 State the hypotheses and identify the claim.

𝐻0 : 𝜇 = 29 and 𝐻1 : 𝜇 > 29 (claim)
Step 2 Find the critical value.
Since 𝛼 = 0.05 and the test is a right-tailed test, the critical value is
z = +1.65.
𝑋−𝜇 30.1 − 29
𝑧= = = 𝟏. 𝟓𝟗
𝜎 / √𝑛 3.8 / √30
Step 4 Make the decision.

Since the test value, +1.59, is less than the critical value, +1.65, and is
not in the critical region, the decision is to not reject the null hypothesis.
See the illustration below.
There is not enough evidence to support the claim that mean time is
greater than 29 days.
EXAMPLE 2 Costs of Men’s Athletic Shoes

A researcher claims that the average cost of men’s athletic shoes is less than $80. He
selects a random sample of 36 pairs of shoes from a catalog and finds the following
costs (in dollars). (The costs have been rounded to the nearest dollar.) Is there enough
evidence to support the researcher’s claim at 𝛼 = 0.10? Assume 𝜎 = 19.2.
60 70 75 55 80 55
50 40 80 70 50 95
120 90 75 85 80 60
110 65 80 85 85 45
75 60 90 90 60 95
110 85 45 90 70 70
Solution

𝐻0 : 𝜇 = $80 and 𝐻1 : 𝜇 < $80 (claim)
Since 𝛼 = 0.10 and the test is a left-tailed test, the critical value is
z = −1.28.
Since the problem gives raw data, it is necessary to find the mean of the
data. Computing the mean, you will have 𝑋 = 75.0 and 𝜎 = 19.2.
Substituting to the formula
𝑋−𝜇 75 − 80
𝑧= = = −𝟏. 𝟓𝟔
𝜎 / √𝑛 19.2 / √36

Since the test value, −1.56, falls in the critical region, the decision is to
reject the null hypothesis. See the illustration below.
There is enough evidence to support the claim that the average cost of
men’s athletic shoes is less than $80.
EXAMPLE 3 Cost of Rehabilitation

The Medical Rehabilitation Education Foundation reports that the average cost of
rehabilitation for stroke victims is $24,672. To see if the average cost of rehabilitation
is different at a particular hospital, a researcher selects a random sample of 35 stroke
victims at the hospital and finds that the average cost of their rehabilitation is $26,343.
The standard deviation of the population is $3251. At 𝛼 = 0.01, can it be concluded
that the average cost of stroke rehabilitation at a particular hospital is different from
$24,672?
Solution

𝐻0 : 𝜇 = $24,672 and 𝐻1 : 𝜇 ≠ $24,672 (claim)
Since 𝛼 = 0.01 and the test is a two-tailed test, the critical values are
+𝟐. 𝟓𝟖 and −𝟐. 𝟓𝟖.
𝑋−𝜇 26343 − 24672

𝑧= = = 𝟑. 𝟎𝟒
𝜎 / √𝑛 3251 / √35

Reject the null hypothesis, since the test value falls in the critical region.
See the illustration below.
There is enough evidence to support the claim that the average cost of
rehabilitation at the particular hospital is different from $24,672.
Students sometimes have difficulty summarizing the results of a hypothesis test.

Figure 7 shows the four possible outcomes and the summary statement for each
situation.
Figure 7. Outcomes of a Hypothesis-Testing Situation
First, the claim can be either the null or alternative hypothesis, and one should identify
which it is. Second, after the study is completed, the null hypothesis is either rejected
or not rejected. From these two facts, the decision can be identified in the appropriate
block shown in Figure 7.
Again, remember that nothing is being proved true or false. The statistician is only
stating that there is or is not enough evidence to say that a claim is probably true or
false. As noted previously, the only way to prove something would be to use the entire
population under study, and usually this cannot be done, especially when the
population is large.
P-value Method for Hypothesis Testing
Statisticians usually test hypotheses at the common 𝛼 levels of 0.05 or 0.01 and
sometimes at 0.10. Recall that the choice of level depends on the seriousness of the
type I error. Besides listing an 𝛼 value, many computer statistical packages give a P-
value for the hypothesis testing.
The P-value (or probability value) is the probability of getting a sample statistic (such
as the mean) or a more extreme sample statistic in the direction of the alternative
hypothesis when the null hypothesis is true.
In other words, the P-value is the actual area under the standard normal distribution
curve (or other curve, depending on what statistical test is being used) representing
the probability of a particular sample statistic or a more extreme sample statistic
occurring if the null hypothesis is true.
For example, suppose that an alternative hypothesis is 𝐻1 : 𝜇 > 50 and the mean of a
sample of 𝑋 = 52. If the computer printed a P-value of 0.0356 for a statistical test, then
the probability of getting a sample mean of 52 or greater is 0.0356 if the true population
mean is 50 (for a given sample size and standard deviation). The relationship between
the P-value and the 𝛼 value can be explained in this manner. For P = 0.0356, the null
hypothesis would be rejected at 𝛼 = 0.05 but not at 𝛼 = 0.01. See Figure 8.
Figure 8. Comparison of 𝛼 Values and P-Values
When the hypothesis test is two-tailed, the area in one tail must be doubled. For a two-
tailed test, if 𝛼 = 0.05 and the area in one tail is 0.0356, the P-value will be 2(0.0356)
= 0.0712. That is, the null hypothesis should not be rejected at 𝛼 = 0.05, since 0.0712
is greater than 0.05. In summary, then, if the P-value is less than 𝛼, reject the null
hypothesis. If the P-value is greater than 𝛼, do not reject the null hypothesis.
The P-value for the z test can be found by using the Table E in the Appendix. First find
the area under the standard normal distribution curve corresponding to the z test
value. For a left-tailed test, use the area given in the table; for a right-tailed test, use
1.0000 minus the area given in the table. To get the P-value for a two-tailed test,
double the area you found in the tail. This procedure is shown in the succeeding
examples.
There are five steps for solving hypothesis-testing problems (P-value method):

Step 3 Find the P-value.
EXAMPLE 4 Cost of College Tuition

A researcher wishes to test the claim that the average cost of tuition and fees at a four-
year public college is greater than $5700. She selects a random sample of 36 four-
year public colleges and finds the mean to be $5950. The population standard
deviation is $659. Is there evidence to support the claim at 𝛼 = 0.05? Use the P-value
method.
Solution

𝐻0 : 𝜇 = $5700 and 𝐻1 : 𝜇 > $5700 (claim)
𝑋−𝜇 5950 − 5700

𝑧= = = 𝟐. 𝟐𝟖
𝜎 / √𝑛 659 / √36
Using Table E in the appendix, find the corresponding area under the
normal distribution for z = 2.28. It is 0.09887. Subtract this value for the
area from 1.0000 to find the area in the right tail.
1.0000 – 0.9887 = 0.0113
The P-value is 0.0113.

Since the P-value is less than 0.05, the decision is to reject the null
hypothesis. See illustration below.
There is enough evidence to support the claim that the tuition and fees at
four-year public colleges are greater than $5700.
EXAMPLE 5 Wind Speed

A researcher claims that the average wind speed in a certain city is 8 miles per hour.
A sample of 32 days has an average wind speed of 8.2 miles per hour. The standard
deviation of the population is 0.6 mile per hour. At 𝛼 = 0.05, is there enough evidence
to reject the claim? Use the P-value method.
Solution

𝐻0 : 𝜇 = 8 (claim) and 𝐻1 : 𝜇 ≠ 8
𝑋−𝜇 8.2 − 8
𝑧= = = 𝟏. 𝟖𝟗
𝜎 / √𝑛 0.6 / √32
Using Table E in the appendix, find the corresponding area under the
normal distribution for z = 1.89. It is 0.09706. Subtract the value from
1.0000.
1.0000 – 0.9706 = 0.0294
Since this is a two-tailed test, the area of 0.0294 must be doubled to get
the P-value.
2(0.0294) = 0.0588

The decision is to not reject the null hypothesis since the P-value is
greater than 0.05. See illustration below.
There is not enough evidence to reject the claim that the average wind
speed is 8 miles per hour.
When using the P-value method, you can use the following rule:
Decision Rule when Using the P-value

If P-value ≤ 𝛼, reject the null hypothesis.
If P-value > 𝛼, do not reject the null hypothesis.

APPLY THE CONCEPT!
ACTIVITY 1!
After the discussion on z test for a mean, you have now known the preliminary
concepts about advanced statistics. With that, let now us check your knowledge by
doing the following tasks. Kindly clearly write your solutions after each exercise.
DIRECTION: Perform each of the following steps for the set of questions.
a. State the hypotheses and identify the claim.
b. Find the critical value(s).
c. Compute the test value.
d. Make the decision.
e. Summarize the results.
1. Warming and Ice Melt. The average depth of the Hudson Bay is 305 feet.
Climatologists were interested in seeing if the effects of warming and ice melt were
affecting the water level. Fifty-five measurements over a period of weeks yielded a
sample mean of 306.2 feet. The population variance is known to be 3.57. Can it be
concluded at the 0.05 level of significance that the average depth has increased? Is
there evidence of what caused this to happen?
2. Moviegoers. The average “moviegoer” sees 8.5 movies a year. A moviegoer is

defined as a person who sees at least one movie in a theater in a 12-month period. A
random sample of 40 moviegoers from a large university revealed that the average
number of movies seen per person was 9.6. The population standard deviation is 3.2
movies. At the 0.05 level of significance, can it be concluded that this represents a
difference from the national average?
3. Nonparental Care. According to the Digest of Educational Statistics, a certain

group of preschool children under the age of one year each spends an average of 30.9
hours per week in nonparental care. A study of state university center-based programs
indicated that a random sample of 32 infants spent an average of 32.1 hours per week
in their care. The standard deviation of the population is 3.6 hours. At 𝛼 = 0.01 is there
sufficient evidence to conclude that the sample mean differs from the national mean?
4. Ages of U.S. Senators. The mean age of Senators in the 109th Congress was
60.35 years. A random sample of 40 senators from various state senates had an
average age of 55.4 years, and the population standard deviation is 6.5 years. At 𝛼 =
0.05, is there sufficient evidence that state senators are on average younger than the
Senators in Washington?
5. Peanut Production in Virginia. The average production of peanuts in Virginia is

3000 pounds per acre. A new plant food has been developed and is tested on 60
individual plots of land. The mean yield with the new plant food is 3120 pounds of
peanuts per acre, and the population standard deviation is 578 pounds. At 𝛼 =0.05,
can you conclude that the average production has increase?
6. Heights of 1-Year-Olds. The average 1-year-old (both genders) is 29 inches tall.

A random sample of 30 1-year-olds in a large day care franchise resulted in the
following heights. At 𝛼 =0.05, can it be concluded that the average height differs from
29 inches? Use 𝜎 = 2.61.
25 32 35 25 30 26.5 25.5 29.5 26 32
30 28.5 30 32 28 31.5 29.5 30 29 34
29 32 27 28 33 28 32 29 27 29.5
GO BEYOND!

CAR THEFTS
You recently received a job with a company that manufactures an automobile antitheft
device. To conduct an advertising campaign for the product, you need to make a claim
about the number of automobile thefts per year. Since the population of various cities
in the United States varies, you decide to use rates per 10,000 people. (The rates are
based on the number of people living in the cities.) Your boss said that last year the
theft rate per 10,000 people was 44 vehicles. You want to see if it has changed. The
following are rates per 10,000 people for 36 randomly selected locations in the United
States.
55 70 56 125 94 66 20 17 16
39 58 25 23 62 91 41 20 36
51 42 53 26 33 115 73 73 78
15 69 55 56 62 75 134 24 67
1. What hypotheses would you use?

2. Is the sample considered small or large?
3. What assumption must be met before the hypothesis test can be conducted?
4. Which probability distribution would you use?
5. Would you select a one- or two-tailed test? Why?
6. What critical value(s) would you use?
7. Conduct a hypothesis test. Use 𝜎 = 30.3.
8. What is your decision?
9. What is your conclusion?
10. Write a brief statement summarizing your conclusion.
11. If you lived in a city whose population was about 50,000, how many
automobile thefts per would you expect to occur?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________

ULOc: Test means using the t test
Metalanguage
To clearly understand the ULOc, you need to know the important terms relevant
to this course. Kindly adopt with the definition below.
TERMS DEFINITIONS
1. t test ➢ It is any statistical hypothesis test in which the
test statistic follows a student’s t-distribution
under the null hypothesis.
Essential Knowledge
In this part of the module, we will explore how to test sample means using the
t test. Follow closely the discussion and answer the exercises afterwards.
LESSON 1.3 t Test for a Mean
When the population standard deviation is unknown, the z test is not normally used
for testing hypothesis involving means. A different test, called the t test, is used. The
distribution of the variable should be approximately normal.
The t distribution is similar to the standard normal distribution in the following ways.
1. It is bell-shaped.
2. It is symmetric about the mean.
3. The mean, median, and mode are equal to 0 and are located at the center of
the distribution.
The t distribution differs from the standard normal distribution in the following ways.
1. The variance is greater than 1.
2. The t distribution is a family of curves based on the degrees of freedom (d.f.),
which is a number related to sample size.
3. As the sample size increases, the t distribution approaches the normal
distribution.
The t test is defined next.
The t test is a statistical test for the mean of a population and is used when the
population is normally or approximately normally distributed, and 𝜎 is unknown.
The formula for t test is

𝑋− 𝜇
𝑡=
𝑠 / √𝑛
where
𝑋 = sample mean
𝜇 = hypothesized population mean
𝑠 = sample standard deviation
𝑛 = sample size
The degrees of freedom are d.f. = 𝑛 – 1.
The formula for t test is similar to the formula for z test. But since the population
standard deviation 𝜎 is unknown, the sample standard deviation 𝑠 is used instead.
The critical values for the t test are given in Table F in the appendix. For a one-tailed
test, find the 𝛼 level by looking down the left-hand column.
Notice that the degrees of freedom are given for values from 1 through 30, then at
intervals above 30. When the degrees of freedom are above 30, some reference will
tell you to use the nearest table value; however, in this module, you should always
round down to the nearest table value. For example, if d.f. = 59, use d.f. = 55 to find
the critical value or values. This is a conservative approach.
As the degrees of freedom get larger, the critical values approach the z values. Hence
the bottom values (large sample size) are the same as the z values that were used in
the last section.
Assumptions for the t Test for a Mean when 𝝈 is Unknown

2. Either 𝑛 ≥ 30 or the population is normally distributed if 𝑛 < 30.
When you test hypotheses by using the t test (traditional method), follow the same
procedure as for the z test, except use Table F.

Step 2 Find the critical value(s) from Table F.
Remember that the t test should be used when the population is approximately
normally distributed, and the population standard deviation is unknown.
EXAMPLE 6 Hospital Infections

A medical investigation claims that the average number of infections per week at a
hospital in southwestern Visayas is 16.3. A random sample of 10 weeks had a mean
number of 17.7 infections. The sample standard deviation is 1.8. Is there enough
evidence to reject the investigator’s claim at 𝛼 = 0.05?
Solution

𝐻0 : 𝜇 = 16.3 (claim) and 𝐻1 : 𝜇 ≠ 16.3
The critical values are +𝟐. 𝟐𝟔𝟐 and −𝟐. 𝟐𝟔𝟐 for 𝛼 = 0.05 and d.f. = 9.
𝑋−𝜇 17.7 − 16.3

𝑡= = = 𝟐. 𝟒𝟔
𝑠 / √𝑛 1.8 / √10

Reject the null hypothesis since 2.46 > 2.262. See illustration below.
There is enough evidence to reject the claim that the average number of
infections is 16.3.
EXAMPLE 7 Substitute Teachers’ Salaries

An educator claims that the average salary of substitute teachers in school districts in
Allegheny County, Pennsylvania, is less than $60 per day. A random sample of eight
school districts is selected, and the daily salaries (in dollars) are shown. Is there
enough evidence to support the educator’s claim at 𝛼 = 0.10?
60 56 60 55 70 55 60 55
Solution

𝐻0 : 𝜇 = $60 and 𝐻1 : 𝜇 < $60 (claim)
At 𝛼 = 0.10 and d.f. = 7, the critical value is −𝟏. 𝟒𝟏𝟓.

To compute the test value, the mean and the standard deviation must be
found. Calculating the two, you will have 𝑋 = $58.88 and 𝑠 = 5.08.
Substituting,
𝑋−𝜇 58.88 − 60
𝑡= = = −𝟎. 𝟔𝟐𝟒
𝑠 / √𝑛 5.08 / √8

Do not reject the null hypothesis since –0.624 falls in the noncritical
region. See illustration below.
There is not enough evidence to support the educator’s claim that the
average salary od substitute teachers in Allegheny County is less than
$60 per day.
Students sometimes have difficulty deciding whether to use the z test or t test. The
rules are the same as those pertaining confidence intervals.
1. If 𝜎 is known, use the z test. The variable must be normally distributed if 𝑛 < 30.
2. If 𝜎 is unknown but 𝑛 ≥ 30, use the t test.
3. If 𝜎 is unknown and 𝑛 < 30, use the t test. The population must be approximately
normally distributed.
These rules are summarized in Figure 9.

Figure 9. Using the z or t test

WRITE IT OUT!
ACTIVITY 1!
After the discussion on t test a mean, you have now known the preliminary
doing the following tasks. Now, I will require you to explain thoroughly your answers
about the given argument.
1. In what ways is the t distribution similar to the standard normal distribution? In what
ways is the t distribution different from the standard normal distribution?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
2. What are the degrees of freedom for the t test?

___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
APPLY THE CONCEPT!
ACTIVITY 1!
After the discussion on t test for a mean, you have now known the preliminary
doing the following tasks.
DIRECTION: Using the t table (Table F), find the critical value (or values) for each.
1. 𝑛 = 10, 𝛼 = 0.05, two-tailed test
2. 𝑛 = 18, 𝛼 = 0.01, left-tailed test
3. 𝑛 = 6, 𝛼 = 0.005, right-tailed test
8. 𝑛 = 17, 𝛼 = 0.01, two-tailed test
.
ACTIVITY 2!
After the discussion on t test for a mean, you have now known the preliminary
1. Cell Phone Call Lengths. The average local cell phone call length was reported to
be 2.27 minutes. A random sample of 20 phone calls showed an average of 2.98
minutes in length with a standard deviation of 0.98 minute. At 𝛼 = 0.05 can it be
concluded that the average differs from the population average?
2. Heights of Tall Buildings. A researcher estimates that the average height of the
buildings of 30 or more stories in a large city is at least 700 feet. A random sample of
10 buildings is selected, and the heights in feet are shown. At 𝛼 = 0.025, is there
enough evidence to reject the claim?
485 511 841 725 615
520 535 635 616 582
3. Commute Time to Work. A survey of 15 large U.S. cities finds that the average
commute time one way is 25.4 minutes. A chamber of commerce executive feels that
the commute in his city is less and wants to publicize this. He randomly selects 25
commuters and finds the average is 22.1 minutes with a standard deviation of 5.3
minutes. At 𝛼 = 0.10, is he correct?
4. Exercise and Reading Time Spent by Men. Men spend an average of 29 minutes
per day on weekends and holidays exercising and playing sports. They spend an
average of 23 minutes per day reading. A random sample of 25 men resulted in a
mean of 35 minutes exercising with a standard deviation of 6.9 minutes and an
average of 20.5 minutes reading with 𝑠 = 7.2 minutes. At 𝛼 = 0.05 for both, is there
sufficient evidence that these two results differ from the national means?
5. Internet Visits. A U.S. Web Usage Snapshot indicated a monthly average of 36

Internet visits per user from home. A random sample of 24 Internet users yielded a
sample mean of 42.1 visits with a standard deviation of 5.3. At the 0.01 level of
significance can it be concluded that this differs from the national average?
GO BEYOND!

HOW MUCH NICOTINE IS IN THOSE CIGARETTES?

A tobacco company claims that its best-selling cigarettes contain at most 40 mg of
nicotine. This claim is tested at the 1% significance level by using the results of 15
randomly selected cigarettes. The mean is 42.6 mg and the standard deviation is 3.7
mg. Evidence suggests that nicotine is normally distributed. Information from a
computer output of the hypothesis test is listed.
Sample mean = 42.6 P-value = 0.008
Sample standard deviation = 3.7 Significance level = 0.01
Sample size = 15 Test statistic t = 2.72155
Degreed if freedom = 14 Critical value t = 2.62610
1. What are the degrees of freedom??

2. Is this z or t test?
3. Is this a comparison of one or more samples?
4. Is this a right-tailed, left-tailed, or two-tailed test?
5. By comparing the test statistic to the critical value, what would you conclude?
6. Is there a conflict in this output. Explain.
7. What has been proved in this study?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________

ULOd: Test the difference between sample means using the z test
Metalanguage
To clearly understand the ULOd, you need to know the important terms relevant
TERMS DEFINITIONS
1. Independent ➢ The data are from two different sources.
Populations
Essential Knowledge
In this part of the module, we will explore how to use z test in comparing two
means from independent populations.
LESSON 1.4 Testing the Difference Between Two Means: Using the z Test
Suppose the researcher wishes to determine whether there is a difference in the

average age of nursing students who enroll in a nursing program at a community
college and those who enroll in a nursing program at a university. In this case, the
researcher is not interested in the average age of all beginning nursing students;
instead, he is interested in comparing the means of the two groups. His research
question is, Does the mean age of nursing students who enroll at a community college
differ from the mean age of nursing students who enroll at a university? Here, the
hypotheses are
𝐻0 : 𝜇1 = 𝜇2
𝐻1 : 𝜇1 ≠ 𝜇2
where
𝜇1 = mean age of all beginning nursing students at the community college
𝜇2 = mean age of all beginning nursing students at the university
Another way of stating the hypothesis for this situation is
𝐻0 : 𝜇1 − 𝜇2 = 0
𝐻1 : 𝜇1 − 𝜇2 ≠ 0
If there is no difference in population means, subtracting them will give a difference of

zero. If they are different, subtracting will give a number other than zero. Both methods
of stating hypotheses are correct; however, the first method will be used in this module.
Assumptions for the z Test to Determine the Difference Between Two Means
1. Both samples are random samples.
2. The samples must be independent of each other. That is, there can be no
relationship between the subjects in each sample.
3. The standard deviation of both populations must be known, and if the sample
sizes are less than 30, the populations must be normally or approximately
The theory behind testing the difference between two means is based on selecting
pairs of samples and comparing the means of the pairs. The population means need
not be known.
All possible pairs of samples are taken from population. The means for each pair of
samples are computed and then subtracted, and the differences are plotted. If both
populations have the same men, the most of the differences will be zero or close to
zero.
Figure 10. Differences of Means of Pairs of Samples
Occasionally, there will be a few large differences due to chance alone, some positive
and others negative. If the differences are plotted, the curve will be shaped like a
normal distribution and have a mean of zero, as shown in Figure 10.
The variance of the difference 𝑋1 − 𝑋2 is equal to the sum of the individual variances
of 𝑋1 and 𝑋2. That is,
𝜎𝑋21 − 𝑋2 = 𝜎𝑋21 + 𝜎𝑋22
𝜎12 𝜎22 𝜎2 𝜎22

where 𝜎𝑋21 = and 𝜎𝑋22 = so the standard deviation of 𝑋1 − 𝑋2 is √𝑛1 +
𝑛1 𝑛2 1 𝑛2
Formula for the z Test for Comparing Two Means from Independent
Populations
(𝑋1 − 𝑋2 ) − (𝜇1 − 𝜇2)

𝑧=
𝜎12 𝜎22
√
𝑛1 + 𝑛2
This formula is based on the general format of

where 𝑋1 − 𝑋2 is the observed difference, and the expected difference 𝜇1 − 𝜇2 is zero

when the null hypothesis is 𝜇1 = 𝜇2 , since that is equivalent to 𝜇1 − 𝜇2 = 0. Finally, the
standard error of the difference is
𝜎12 𝜎22
√ +
𝑛1 𝑛2
In comparison of two sample means, the difference may be due to chance, in which
the null hypothesis will not be rejected, and the researcher can assume that the means
of the populations are basically the same. The difference in this case is not significant.
On the other hand, if the difference is significant, the null hypothesis is rejected, and
the researcher can conclude that the population means are different. To illustrate, see
Figure 11.
Figure 11. Hypothesis-Testing Situations in the Comparison of Means
These tests can also be one-tailed, using the following hypotheses:
Right-tailed test Left-tailed test
𝐻0 : 𝜇1 = 𝜇2 𝐻0 : 𝜇1 − 𝜇2 = 0 𝐻0 : 𝜇1 = 𝜇2 𝐻0 : 𝜇1 − 𝜇2 = 0
or or
𝐻1 : 𝜇1 > 𝜇2 r 𝐻1 : 𝜇1 − 𝜇2 > 0 𝐻1 : 𝜇1 < 𝜇2 r 𝐻1 : 𝜇1 − 𝜇2 < 0
The same critical values used in the previous examples. They can be obtained from
Table E in the appendix.
If 𝜎12 and 𝜎22 are no known, the researcher can use the variances from each sample 𝑠12
and 𝑠22 , but a t test must be used (this will be explained in the next topic). The basic
format for hypothesis testing using the traditional method is reviewed here.

Step 2 Find the critical value(s).
EXAMPLE 8 Hotel Room Cost

A survey found the average hotel room rate in New Orleans is $88.42 and the average
room rate in Phoenix is $80.61. Assume that the data we obtained from two samples
of 50 hotels each and that the standard deviations of the populations are $5.62 and
$4.83, respectively. At 𝛼 = 0.05, can it be concluded that there is a significant
difference in the rates?
Solution

𝐻0 : 𝜇1 = 𝜇2 and 𝐻1 : 𝜇1 ≠ 𝜇2 (claim)
Since 𝛼 = 0.05, the critical values are +𝟏. 𝟗𝟔 and −𝟏. 𝟗𝟔.
(𝑋1 − 𝑋2 ) − (𝜇1 − 𝜇2) (88.42 − 80.61) − 0

𝑧= = = 𝟕. 𝟒𝟓
2 2
𝜎12 𝜎22 √5.62 + 4.83
√ 50 50
𝑛1 + 𝑛2

Reject the null hypothesis at 𝛼 = 0.05, since 7.45 > 2.262. See illustration
below.
There is enough evidence to support the claim that the means are not
equal. Hence, there is a significant difference in the rates.

WRITE IT OUT!
ACTIVITY 1!
After the discussion on using z test on testing the difference between two
means, you have now known the preliminary concepts about advanced statistics. With
that, let now us check your knowledge by doing the following tasks. Now, I will require
you to explain thoroughly your answers about the given argument.
1. Explain the difference between testing a single mean and testing the difference
between two means.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
2. What two assumptions must be met when you are using the z test to test differences
between two means?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
3. Show two different ways to state that the means of two populations are equal.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
APPLY THE CONCEPT!
After the discussion on using z test on testing the difference between two
that, let now us check your knowledge by doing the following tasks.
1. ACT Scores. A survey of 1000 students nationwide showed a mean ACT score of
21.4. A survey of 500 Ohio scores showed a mean of 20.8. If the population standard
deviation in each case is 3, can we conclude that Ohio is below the national average?
Use 𝛼 = 0.05.
2. Lengths of Hospital Stays. The average length of “short hospital stays” for men is
slightly longer than that for women, 5.2 days versus 4.5 days. A random sample of
recent hospital stays for both men and women revealed the following. At 𝛼 = 0.01, is
there sufficient evidence to conclude that the average hospital stay for men is longer
than the average hospital stay for women?
Men Women
Sample Size 32 30
Sample mean 5.5 days 4.2 days
Population standard deviation 1.2 days 1.5 days
GO BEYOND!
Based on the discussion about using z test for testing the difference between
two means, it is necessary to go further what was given by the book. You are tasked
now to answer the given problem as well as explain the reason behind your answers.
HOME RUNS
For a sports radio talk show, you are asked to research the question whether more
home runs are hit by players in the National League or by players in the American
League. You decide to use the home run leaders from each league for a 40-year period
as your data. The numbers are shown.
National League American League
47 73 65 49 40 32 52 48 56 50
46 38 47 49 37 46 44 36 29 40
40 31 48 52 38 39 22 45 39 36
44 48 45 39 52 32 37 44 44 49
49 50 79 47 43 57 47 56 52 40
35 40 39 37 36 43 51 42 49 43
37 48 45 38 36 39 41 46 32 32
40 45 36 44 47 32 33 49 44 32
1. Define a population.
2. What kind of sample was used?
3. Do you feel that is representative?
4. What are your hypotheses?
5. What significance level will you use?
6. What statistical test will you use?
7. What are the test results? (Assume 𝜎1 = 8.8 and 𝜎2 = 7.8.
8. What is your decision?
9. What can you conclude?
10. Do you feel that using the data given really answers the original question
asked?
11. What other data might be used to answer the question?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
Big Picture
a. Test the difference between two means for independent samples using t test,
b. Test the difference between two means for dependent samples, and
c. Draw a scatter plot and compute the correlation coefficient.

ULOa: Test the difference between two means for independent
samples using t test
Metalanguage
TERMS DEFINITIONS
1. Independent ➢ These are samples which are not related to each
Samples other.
Essential Knowledge
To perform the aforesaid big picture (unit learning outcomes) for the fourth and
fifth weeks of the course, you need to fully understand the essential knowledge that
will be laid down in the succeeding pages. Please note that you are not limited to
LESSON 2.1 Testing the Difference Between Two Means of Independent

Samples: Using the t Test
The z test was used to test the difference between two means when the population
standard deviations were known and the variables were normally or approximately
normally distributed, or when both sample sizes were greater than or equal to 30. In
many situations, however, these conditions cannot be met — that is, the population
standard deviations are not known. In these cases, a t test is used to test the difference
between means when the two samples are independent and when the samples are
taken from two normally or approximately normally distributed populations. Samples
are independent samples when they are not related. Also, it will be assumed that the
variances are not equal.
Formula for the t Test – For Testing the Difference Between Two Means –
Independent Samples
Variances are assumed to be unequal
(𝑋1 − 𝑋2 ) – (𝜇1 − 𝜇2)

𝑡=
𝑠12 𝑠22
√
𝑛1 + 𝑛2
where the degrees of freedom are equal to the smaller of 𝑛1 − 1 or 𝑛2 − 1.
The formula
(𝑋1 − 𝑋2 ) – (𝜇1 − 𝜇2)
𝑡=
𝑠 2𝑠 2
√ 1+ 2
𝑛1 𝑛2
follows the format of

where 𝑋1 − 𝑋2 is the observed difference between sample means and where the
expected value 𝜇1 − 𝜇2 is zero when no difference between population means is
𝑠12 𝑠22
hypothesized. The denominator √ + is the standard error of the difference
𝑛1 𝑛2
between two means. Since mathematical derivation of the standard error is somewhat
complicated, it will be omitted here.
Assumptions for the t Test for Two Independent Means When 𝝈𝟏 and 𝝈𝟐 are
Unknown
1. The samples are random samples.
2. The sample data are independent of one another.
3. When the sample sizes are less than 30, the populations must be normally or
approximately normally distributed.
EXAMPLE 9 Farm Sizes

The average size of a farm in Indiana County, Pennsylvania, is 191 acres. The average
size of a farm in Greene County, Pennsylvania, is 199 acres. Assume the data were
obtained from two samples with standard deviations of 38 and 12 acres, respectively,
and sample sizes of 8 and 10, respectively. Can it be concluded at 𝛼 = 0.05 that the
average size of the farms in the two counties are different? Assume the populations
are normally distributed.
Solution

𝐻0 : 𝜇1 = 𝜇2 and 𝐻1 : 𝜇1 ≠ 𝜇2 (claim)
Since the test is two-tailed, since 𝛼 = 0.05, and since the variances are
unequal, the degrees of freedom are the smaller of 𝑛1 − 1 or 𝑛2 − 1. In
this case, the degrees of freedom are 8 – 1 = 7. Hence, from Table F in
the appendix, the critical values are +𝟐. 𝟑𝟔𝟓 and −𝟐. 𝟑𝟔𝟓.
(𝑋1 − 𝑋2 ) − (𝜇1 − 𝜇2) (191 − 199) − 0

𝑡= = = −𝟎. 𝟓𝟕
2 2
𝑠12 𝑠22 √38 + 12
√ + 8 10
𝑛1 𝑛2

Do not reject the null hypothesis, since –0.57 > –2.365. See illustration
below.
There is not enough evidence to support the claim that the average size
of the farms is different.

• Vieira, E. (2017). Introduction to Real World Statistics: With Step-by-Step
SPSS Instructions. United States of America: Routledge, Taylor & Francis
Group.
APPLY THE CONCEPT!
After the discussion on using t test on testing the difference between two
that, let now us check your knowledge by doing the following tasks.
DIRECTION: Perform each of the following steps for the set of questions. For these
exercises assume that the variables are normally or approximately normally distributed
and the variances are unequal.
1. Ages of Gamblers. The mean age of a sample of 25 people who were playing the
slot machines is 48.7 years, and the standard deviation is 6.8 years. The mean age of
a sample of 35 people who were playing roulette is 55.3 with a standard deviation of
3.2 years. Can it be concluded at 𝛼 = 0.05 that the mean age of those playing the slot
machines is less than those playing roulette?
2. Bestseller Books. The mean for the number of weeks 15 New York Times hard-
cover fiction books spent on the bestseller list is 22 weeks. The standard deviation is
6.17 weeks. The mean for the number of weeks 15 New York Times hard-cover
nonfiction books spent on the list is 28 weeks. The standard deviation is 13.2 weeks.
At 𝛼 = 0.10, can we conclude that there is a difference in the mean times for the
number of weeks the books were on the bestseller lists?
3. Weights of Vacuum Cleaners. Upright vacuum cleaners have either a hard body
type or a soft body type. Shown are the weights in pounds of a sample of each type.
At 𝛼 = 0.05, can it be concluded that the means of the weights are different?
Hard copy types Soft body types
21 18 15 16 17 24 13 11 13 12
16 17 17 20 18 15
23 17 15 20
13 16 17
4. Noise Levels in Hospitals. The mean noise level of 20 areas designated as

“casualty doors” was 63.1 dBA, and the standard deviation is 4.1 dBA. The mean noise
level for 24 areas designated as operating theaters was 56.3 dBA, and the standard
deviation was 7.5 dBA. At 𝛼 = 0.05, can it be concluded that there is a difference in
the means?
GO BEYOND!
Based on the discussion about using t test for testing the difference between
TOO LONG ON THE TELEPHONE

A company collects data on the lengths of telephone calls made by employees in two
different divisions. The mean and standard deviation for the sales division are 10.26
and 8.56, respectively. The mean and standard deviation for the shipping and
receiving division are 6.93 and 4.93, respectively. A hypothesis test was run, and the
computer output follows.
Degrees of freedom = 56
Test statistic t = 1.89566
Critical value t = -2.0037, 2.0037
P-value = 0.06317
Significance level = 0.05
1. Are the samples independent or dependent?

2. Which number from the output is compared to the significance level to check
if the null hypothesis should be rejected?
3. Which number form the output gives the probability of a type I error that is
calculated from the sample data?
4. Was a right, left, or two-tailed test done? Why?
5. What are your conclusions?
6. What would your conclusions be if the level of significance were initially set
at 0.10?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________

ULOb Test the difference between two means for dependent samples
Metalanguage
TERMS DEFINITIONS
1. Dependent ➢ Samples are considered to be dependent
Samples samples when the subjects are paired or
matched in some way.
Essential Knowledge
In this part of the module, we will explore how to test the difference between
two means for dependent samples. Follow closely the discussion and answer the
exercises.
LESSON 2.2 Testing the Difference Between Two Means of Dependent

Samples
The t test was used to compare two sample means when the samples were
independent. In this section, a different version of the t test is explained. This version
is used when the samples are dependent. Samples are considered to be dependent
samples when the subjects are paired or matched in some way.
For example, suppose a medical researcher wants to see whether a drug will affect
the reaction time of its users. To test this hypothesis, the researcher must pretest the
subjects in the sample first. That is, they are given a test to ascertain their normal
reaction times. Then after taking the drug, the subjects are tested again, using a
posttest. Finally, the means of the two tests are compared to see whether there is a
difference. Since the same subjects are used in both cases, the samples are related;
subjects scoring high on the pretest will generally score high on the posttest, even
after consuming the drug. Likewise, those scoring lower on the pretest will tend to
score lower on the posttest. To take this effect into account, the researcher employs
a t test, using the differences between the pretest values and the posttest values.
Thus, only the gain or loss in values is compared.
Here are some other examples of dependent samples. A researcher may want to
design an SAT preparation course to help students raise their test scores the second
time they take the SAT. Hence, the differences between the two exams are compared.
A medical specialist may want to see whether a new counseling program will help
subjects lose weight. Therefore, the pre weights of the subjects will be compared with
the post weights.
When the samples are dependent, a special t test for dependent means is used. This
test employs the difference in values of the matched pairs. The hypotheses are as
follows:
Two-tailed test Right-tailed test Left-tailed test

𝐻0 : 𝜇𝐷 = 0 𝐻0 : 𝜇𝐷 = 0 𝐻0 : 𝜇𝐷 = 0
𝐻1 : 𝜇𝐷 ≠ 0 𝐻1 : 𝜇𝐷 > 0 𝐻1 : 𝜇𝐷 < 0
where 𝜇𝐷 is the symbol for the expected mean of the difference of the matched pairs.
The general procedures for finding the test value involves several steps.
First, find the differences of the values of the pairs of data.
𝐷 = 𝑋1 − 𝑋2
̅ of the differences, using the formula

Second, find the mean 𝐷
∑𝐷
̅=
𝐷 where 𝑛 is the number of data pairs.
𝑛
Third, find the standard deviation 𝑠𝐷 of the differences, using the formula
2
𝑛 ∑ 𝐷 − (∑ 𝐷)2
𝑠𝐷 = √
𝑛(𝑛 − 1)
Fourth, find the estimated standard error 𝑠𝐷̅ of the differences, which is
𝑠𝐷
𝑠𝐷̅ =
√𝑛
Finally, find the test value, using the formula
̅ − 𝜇𝐷
𝐷
𝑡= with d. f. = 𝑛 − 1
𝑠𝐷 / √𝑛
The formula in the final step follows the basic format of

where the observed value is the mean of the differences. The expected value 𝜇𝐷 is
zero if the hypothesis is 𝜇𝐷 = 0. The standard error of the difference is the standard
deviation of the difference, divided by the square root of the sample size. Both
populations must be normally or approximately normally distributed.
Assumptions for the t Test for Two Independent Means When 𝝈𝟏 and 𝝈𝟐 are
Unknown
1. The sample or samples are random.
2. The sample data are dependent.
3. When the sample size or sample sizes are less than 30, the population must
be normally or approximately normally distributed.
EXAMPLE 10 Bank Deposits

A sample of nine local banks shows their deposits (in billions of dollars) 3 years ago
and their deposits (in billions of dollars) today. At 𝛼 = 0.05, can it be concluded that
the average in deposits for the banks is greater today than it was 3 years ago?
Bank 1 2 3 4 5 6 7 8 9
3 years ago 11.42 8.41 3.98 7.37 2.28 1.10 1.00 0.9 1.35
Today 16.69 9.44 6.53 5.58 2.92 1.88 1.78 1.5 1.22
Solution

Since we are interested to see of there has been an increase in deposits,
the deposits 3 years ago must be less than the deposits today; hence,
the differences must be significantly less 3 years ago than they are today.
Hence the mean of the differences must be less than zero.
𝐻0 : 𝜇𝐷 = 0 and 𝐻1 : 𝜇𝐷 < 0 (claim)

The degreed of freedom are n – 1, or 9 – 1 = 8. The critical value for a
left-tailed with 𝛼 = 0.05 is −𝟏. 𝟖𝟔𝟎.

a. Make a table.
3 years ago Now A B

(𝑿𝟏 ) (𝑿𝟐 ) 𝑫 = 𝑿𝟏 − 𝑿𝟐 𝑫 = (𝑿𝟏 − 𝑿𝟐 )𝟐
𝟐
11.42 16.69
8.41 9.44
3.98 6.53
7.37 5.58
2.28 2.92
1.10 1.88
1.00 1.78
0.90 1.50
1.35 1.22
b. Find the differences and place the results in column A.
3 years ago Now A B

(𝑿𝟏 ) (𝑿𝟐 ) 𝑫 = 𝑿𝟏 − 𝑿𝟐 𝑫𝟐 = (𝑿𝟏 − 𝑿𝟐 )𝟐
11.42 16.69 −5.27
8.41 9.44 −1.03
3.98 6.53 −2.55
7.37 5.58 +1.79
2.28 2.92 −0.64
1.10 1.88 −0.78
1.00 1.78 −0.78
0.90 1.50 −0.60
1.35 1.22 +0.13
∑𝐷 = −9.73
c. Find the means of the differences.
∑𝐷 −9.73
̅=
𝐷 = = −1.081
𝑛 9
d. Square the differences and place the results in Column B.
3 years ago Now A B

(𝑿𝟏 ) (𝑿𝟐 ) 𝑫 = 𝑿𝟏 − 𝑿𝟐 𝑫𝟐 = (𝑿𝟏 − 𝑿𝟐 )𝟐
11.42 16.69 −5.27 27.7729
8.41 9.44 −1.03 1.0609
3.98 6.53 −2.55 6.5025
7.37 5.58 +1.79 3.2041
2.28 2.92 −0.64 0.4096
1.10 1.88 −0.78 0.6084
1.00 1.78 −0.78 0.6084
0.90 1.50 −0.60 0.3600
1.35 1.22 +0.13 0.1690
2
∑𝐷 = −9.73 ∑𝐷 = 40.5437
e. Find the standard deviation of the differences.
2
𝑛 ∑ 𝐷 − (∑ 𝐷)2
𝑠𝐷 = √
𝑛(𝑛 − 1)
(9)(40.5437) − (−9.73)2
𝑠𝐷 = √
9(9 − 1)
270.2204
𝑠𝐷 = √
72
𝑠𝐷 = 1.937
f. Find the test value.
̅ − 𝜇𝐷
𝐷 −1.081 − 0
𝑡= = = −𝟏. 𝟔𝟕
𝑠𝐷 / √𝑛 1.937 / √9

Do not reject the null hypothesis, since the test value, –1.67, is greater
than the critical value, –1.860. See illustration below.
There is not enough evidence to show that the deposits have increased
over the last 3 years.

Group.
•
APPLY THE CONCEPT!
ACTIVITY 1!
Classify the following situations whether each is considered independent or
dependent samples.
1. Heights of identical twins
2. Test scores of the same students in English and psychology
3. The effectiveness of two different brands of aspirin
4. Effects of a drug on reaction time, measured by a before-and-after test
5. The effectiveness of two different diets on two different groups of individuals
ACTIVITY 2!
After the discussion on testing the difference between two means of dependent
samples, you have now known the preliminary concepts about advanced statistics.
With that, let now us check your knowledge by doing the following tasks.
DIRECTION: Perform each of the following steps for the set of questions. For these
exercises assume that the variables are normally or approximately normally
distributed.
1. Retention Test Scores. A sample of non-English majors at a selected college was

used in a study to see if the student retained more from reading a 19th-century novel
or by watching it in DVD form. Each student was assigned one novel to read and a
different one to watch, and then they were given a 20-point written quiz on each novel.
The test results are shown below. At 𝛼 = 0.05, can it be concluded that the book scores
are higher than the DVD scores?
Book 90 80 90 75 80 90 84
DVD 85 72 80 80 70 75 80
2. Bestseller Books. As an aid for improving students’ study habits, nine students
were randomly selected to attend a seminar on the importance of education in life. The
table shows the number of hours each student studied per week before and after the
seminar. At 0.10 (significance level), did attending the seminar increase the number
of hours the students studied per week?
Before 9 12 6 15 3 18 10 13 7
After 9 17 9 20 2 21 15 22 6
GO BEYOND!
Based on the discussion about using t test for testing the difference between
AIR QUALITY
As a researcher for the EPA, you have been asked to determine if the air quality in the
United States has changed over the past 2 years. You select a random sample of 10
metropolitan areas and find the number of days each year that the areas failed to meet
acceptable air quality standards. The data are shown.
Year 1 18 125 9 22 138 29 1 19 17 31

Year 2 24 152 13 21 152 23 6 31 34 20
1. What is the purpose of the study?

2. Are the samples independent or dependent?
3. What hypotheses would you use?
4. What is (are) the critical value(s) that you would use?
5. What statistical test would you use?
6. How many degrees of freedom are there?
8. Could an independent means test have been used?
9. Do you think this was a god way to answer the original question?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________

ULOc: Draw a scatter plot and compute the correlation coefficient
Metalanguage
TERMS DEFINITIONS
1. Correlation ➢ It is a statistical method used to determine
whether a linear relationship between variables
exists.
2. Regression ➢ It is a statistical method used to describe the

nature of the relationship between variables,
that is, positive or negative, linear or nonlinear.
Essential Knowledge
In this part of the module, we will explore about correlation and regression. In
addition, we will explore how to create a scatter plot and calculate the correlation
coefficient.
LESSON 2.3 Correlation and Regression
Another area of inferential statistics involves determining whether a relationship exists

between two or more numerical or quantitative variables. For example, a
businessperson may want to know whether the volume of sales for a given month is
related to the amount of advertising the firm does that month. Educators are interested
in determining whether the number of hours a student studies is related to the student’s
score on a particular exam. Medical researchers are interested in questions such as,
Is caffeine related to heart damage? or Is there a relationship between a person’s age
and his or her blood pressure? A zoologist may want to know whether the birth weight
of a certain animal is related to its life span. These are only a few of the many questions
that can be answered by using the techniques of correlation and regression analysis.
Correlation is a statistical method used to determine whether a linear relationship

between variables exists. Regression is a statistical method used to describe the
nature of the relationship between variables, that is, positive or negative, linear or
nonlinear.
The purpose of this topic is to answer these questions statistically:
1. Are two or more variables linearly related?

2. If so, what is the strength of the relationship?

3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?
To answer the first two questions, statisticians use a numerical measure to determine
whether two or more variables are linearly related and to determine the strength of the
relationship between or among the variables. This measure is called a correlation
coefficient. For example, there are many variables that contribute to heart disease,
among them lack exercise, smoking, heredity, age, stress, and diet. Of these
variables, some are more important than others; therefore, a physician who wants to
help a patient must know which factors are most important.
To answer the third question, you must ascertain what type of relationship exists.
There are two types of relationships: simple and multiple. In a simple relationship,
there are two variables — an independent variable, also called an explanatory
variable or a predictor variable, and a dependent variable, also called a response
variable. A simple relationship analysis is called simple regression, and there is one
independent variable that is used to predict the dependent variable. For example, a
manager may wish to see whether the number of years the salespeople have been
working for the company has anything to do with the amount of sales they make. This
type of study involves a simple relationship, since there are only two variables—years
of experience and amount of sales.
In a multiple relationship, called multiple regression, two or more independent

variables are used to predict one dependent variable. For example, an educator may
wish to investigate the relationship between a student’s success in college and factors
such as the number of hours devoted to studying, the student’s GPA, and the student’s
high school background. This type of study involves several variables.
Simple relationships can also be positive or negative. A positive relationship exists

when both variables increase or decrease at the same time. For instance, a person’s
height and weight are related; and the relationship is positive, since the taller a person
is, generally, the more the person weighs. In a negative relationship, as one variable
increases, the other variable decreases, and vice versa. For example, if you measure
the strength of people over 60 years of age, you will find that as age increases,
strength generally decreases. The word generally is used here because there are
exceptions.
Finally, the fourth question asks what type of predictions can be made. Predictions are
made in all areas and daily. Examples include weather forecasting, stock market
analyses, sales predictions, crop predictions, gasoline price predictions, and sports
predictions. Some predictions are more accurate than others, due to the strength of
the relationship. That is, the stronger the relationship is between variables, the more
accurate the prediction is.
LESSON 2.4 Scatter Plots and Correlation

In simple correlation and regression studies, the researcher collects data on two
numerical or quantitative variables to see whether a relationship exists between the
variables. For example, if a researcher wishes to see whether there is a relationship
between number of hours of study and test scores on an exam, she must select a
random sample of students, determine the hours each studied, and obtain their grades
on the exam. A table can be made for the data, as shown here.
Student Hours of study x Grade y (%)

A 6 82
B 2 63
C 1 57
D 5 88
E 2 68
F 3 75
As stated previously, the two variables for this study are called the independent
variable and the dependent variable. The independent variable is the variable in
regression that can be controlled or manipulated. In this case, the number of hours of
study is the independent variable and is designated as the x variable. The dependent
variable is the variable in regression that cannot be controlled or manipulated. The
grade the student received on the exam is the dependent variable, designated as the
y variable. The reason for this distinction between the variables is that you assume
that the grade the student earns depends on the number of hours the student studied.
Also, you assume that, to some extent, the student can regulate or control the number
of hours he or she studies for the exam.
The determination of the x and y variables is not always clear-cut and is sometimes
an arbitrary decision. For example, if a researcher studies the effects of age on a
person’s blood pressure, the researcher can generally assume that age affects blood
pressure. Hence, the variable age can be called the independent variable, and the
variable blood pressure can be called the dependent variable. On the other hand, if a
researcher is studying the attitudes of husbands on a certain issue and the attitudes
of their wives on the same issue, it is difficult to say which variable is the independent
variable and which is the dependent variable. In this study, the researcher can
arbitrarily designate the variables as independent and dependent.
The independent and dependent variables can be plotted on a graph called a scatter
plot. The independent variable x is plotted on the horizontal axis, and the dependent
variable y is plotted on the vertical axis.
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the
independent variable x and the dependent variable y.
The scatter plot is a visual way to describe the nature of the relationship between the
independent and dependent variables. The scales of the variables can be different,
and the coordinates of the axes are determined by the smallest and largest data values
of the variables.
EXAMPLE 11 Car Rental Companies

Construct a scatter plot for the data shown for car rental companies in the United
States for a recent year.
Student Cars (in ten thousands) Revenue (in billions)

A 63.0 $7.0
B 29.0 3.9
C 20.8 2.1
D 19.1 2.8
E 13.4 1.4
F 8.5 1.5
Solution
Step 1 Draw and label the x and y axes.
Step 2 Plot each point on the graph, as shown below.

EXAMPLE 12 Absences and Final Grades

Construct a scatter plot for the data obtained in a study on the number of absences
and the final grades of seven randomly selected students from a statistics class.
Student Number of absences x Final grade y (%)

A 6 82
B 2 86
C 15 43
D 9 74
E 12 58
F 5 90
G 8 78
Solution
EXAMPLE 13 Age and Wealth

A researcher wishes to see if there is a relationship between the ages and net worth
of the wealthiest people in America. The data for a specific year are shown.
Person Age x Net wealth y ($ billions)

A 73 16
B 65 26
C 53 50
D 54 21.5
E 79 40
F 69 16
G 61 19.6
H 65 19
Solution
After the plot is drawn, it should be analyzed to determine which type of relationship,
if any, exists. For example, the plot shown in Example 11 suggests a positive
relationship, since as the number of cars rented increases, revenue tends to increase
also. The plot of the data shown in Example 12 suggests a negative relationship, since
as the number of absences increases, the final grade decreases. Finally, the plot of
the data shown in Example 13 shows no specific type of relationship, since no pattern
is discernible. Note that the data shown in Examples 11 and 12 also suggest a linear
relationship, since the points seem to fit a straight line, although not perfectly.
LESSON 2.5 Correlation Coefficient
Statisticians use a measure called the correlation coefficient to determine the

strength of the linear relationship between two variables. There are several types of
correlation coefficients. The one explained in this section is called the Pearson
product moment correlation coefficient (PPMC), named after statistician Karl
Pearson, who pioneered the research in this area.
The correlation coefficient computed from the sample data measures the strength
and direction of a linear relationship between two quantitative variables. The symbol
for the sample correlation coefficient is r. The symbol for the population correlation
coefficient is 𝜌 (Greek letter rho).
The range of the correlation coefficient is from −1 to +1. If there is a strong positive
linear relationship between the variables, the value of r will be close to +1. If there is
a strong negative relationship between the variables, the value of r will be close to −1.
When there is no linear relationship between the variables or only a weak relationship,
the value of r will be close to 0. See Figure 12.
Figure 12. Range of Values for the Correlation Coefficient
In addition, the graphs in Figure 13 show the relationship between the correlation
coefficients and their corresponding scatter plots. Notice that as the value of the
correlation coefficient increase from 0 to +1 (parts a, b, and c), data values become
closer to an increasingly strong relationship. As the value of the correlation coefficient
decreases from 0 to −1 (parts d, e, and f), the data values also become closer to a
straight line. Again, this suggests a stronger relationship.
Figure 13. Relationship Between the Correlation Coefficient and the Scatter Plot
There are several ways to compute the value of the correlation coefficient. One
method is to use the formula shown here.
Formula for the Correlation Coefficient r
𝑛(∑ 𝑥𝑦) − (∑ 𝑥)(∑ 𝑦)

𝑟=
√[𝑛(∑ 𝑥 2 ) − (∑ 𝑥)2 ] [𝑛(∑ 𝑦2 ) − (∑ 𝑦)2 ]
where 𝑛 is the number of pairs.
Rounding Rule for the Correlation Coefficient: Round the value of r to threes
decimal places. The formula looks somewhat complicated, but using a table to
compute the values makes it somewhat easier to determine the value of r. There are
no units associated with r, and the value of r will remain unchanged if the x and y are
switched.
Assumptions for the Correlation Coefficient

2. The data pairs fall approximately on a straight line and are measured at the
interval or ratio level.
3. The variables have a joint normal distribution. (This means that given any
specific value of x, the y values are normally distributed; and given any
specific value of y, the x values are normally distributed.)
EXAMPLE 14 Car Rental Companies

Compute the correlation coefficient for the data shown for car rental companies in the
United States for a recent year.
Student Cars (in ten thousands) Revenue (in billions)

A 63.0 $7.0
B 29.0 3.9
C 20.8 2.1
D 19.1 2.8
E 13.4 1.4
F 8.5 1.5
Solution
Step 1 Make a table as shown here.
Cars x Revenue y
Company (in ten thousands) (in billions) xy x2 y2
A 63.0 7.0
B 29.0 3.9
C 20.8 2.1
D 19.1 2.8
E 13.4 1.4
F 8.5 1.5
Step 2 2 2
Find the values of xy, x , and y and place these values in the
corresponding columns of the table.
The completed table is shown.
Cars x
Revenue y
(in ten
Company (in billions) xy x2 y2
thousands)
A 63.0 7.0 441.00 3969.00 49.00
B 29.0 3.9 113.10 841.00 15.21
C 20.8 2.1 43.68 432.64 4.41
D 19.1 2.8 53.48 364.81 7.84
E 13.4 1.4 18.76 179.56 1.96
F 8.5 1.5 12.75 72.25 2.25
----------- ---------- --------- ---------- -------
TOTAL: 153.8 18.7 682.77 5859.26 80.67
∑ 𝑥 = 153.8
∑ 𝑦 = 18.7
∑ 𝑥𝑦 = 682.77
∑ 𝑥 2 = 5859.26
∑ 𝑦 2 = 80.67
Step 3 Substitute in the formula and solve for r.
𝑛(∑ 𝑥𝑦) − (∑ 𝑥)(∑ 𝑦)

𝑟=
√[𝑛(∑ 𝑥 2 ) − (∑ 𝑥)2 ] [𝑛(∑ 𝑦2 ) − (∑ 𝑦)2 ]
(6)(682.77) − (153.8)(18.7)
𝑟= = 𝟎. 𝟗𝟖𝟐
√[(6)(5859.26) − (153.8)2 ] [(6)(80.67) − (18.7)2 ]
The correlation coefficient suggests a strong relationship between the

number of cars a rental agency had and its annual revenue.
EXAMPLE 15 Age and Wealth

Compute the value of the correlation coefficient for the data of the ages and net worth
of the wealthiest people in America. The data for a specific year are shown.
Person Age x Net wealth y ($ billions)

A 73 16
B 65 26
C 53 50
D 54 21.5
E 79 40
F 69 16
G 61 19.6
H 65 19
Solution
Step 1 Make a table.
Step 2 Find the values of xy, x2, and y2 and place these values in the
corresponding columns of the table.
The completed table is shown.
Revenue
Cars x
y
(in ten
Company (in xy x2 y2
thousands)
billions)
A 73 16 1168 5329 256
B 65 26 1690 4225 676
C 53 50 2650 2809 2500
D 54 21.5 1161 2916 462.25
E 79 40 3160 6241 1600
F 69 16 1104 4761 256
G 61 19.6 1195.6 3721 384.16
H 65 19 1235 4225 361
----------- ---------- --------- --------- -------
TOTAL: 519 208.1 13363.6 34,227 6495.41
∑ 𝑥 = 519
∑ 𝑦 = 208.1
∑ 𝑥𝑦 = 13363.6
∑ 𝑥 2 = 34227
∑ 𝑦 2 = 6945.41
Step 3 Substitute in the formula and solve for r.
𝑛(∑ 𝑥𝑦) − (∑ 𝑥)(∑ 𝑦)

𝑟=
√[𝑛(∑ 𝑥 2 ) − (∑ 𝑥)2 ] [𝑛(∑ 𝑦2 ) − (∑ 𝑦)2 ]
(8)(13,363.6) − (519)(208.1)
𝑟= = −𝟎. 𝟏𝟕𝟔
√[(8)(34,227) − (519)2 ] [(8)(6495.41) − (208.1)2 ]
The value or r indicates a very weak negative relationship between the

variables.
In example 14, the value of r was high (close to 1.00); in example 15, the value of r
was much lower (closer to 0). This question then arises, When is the value of r due to
chance, and when does it suggest a significant linear relationship between the
variables?
The Significance of the Correlation Coefficient: As stated before, the range of the
correlation coefficient is between −1 and +1. When the value of r is near +1 or −1,
there is a strong linear relationship. When the value of r is near 0, the linear relationship
is weak or nonexistent. Since the value of r is computed from the data obtained from
samples, there are two possibilities when r is not equal to zero; either the value of r is
high enough to conclude that there is a significant linear relationship between the
variables, or the value of r is due to chance.

• Vieira, E. (2017). Introduction to Real World Statistics: With Step-by-Step SPSS
Instructions. United States of America: Routledge, Taylor & Francis Group.
• Goos, P., & Meintrup, D. (2015). Statistics with jmp: Graphs, descriptive statistics,
and probability. ProQuest Ebook Central https://search.proquest.com
WRITE IT OUT!
ACTIVITY 1!
After the discussion on correlation, regression, scatter plots, and correlation
coefficient, you have now known the preliminary concepts about advanced statistics.
With that, let now us check your knowledge by doing the following tasks. Now, I will
require you to explain thoroughly your answers about the given argument.
1. What is meant by the statement that two variables are related?

___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
2. How is linear relationship between two variables measured in statistics? Explain.

___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
3. What is meant when the relationship between two variables is called positive?
Negative?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
4. Give examples of two variables that are positively correlated and two variables
that are negatively correlated.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
5. What is the diagram of the independent and dependent variables called? Why is
drawing this diagram important?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
APPLY THE CONCEPT!
ACTIVITY 1!
After the discussion on correlation, regression, scatter plots, and correlation
coefficient, you have now known the preliminary concepts about advanced statistics.
With that, let now us check your knowledge by doing the following tasks.
a. Draw the scatter plot for the variables.
b. Compute the value of the correlation coefficient.
c. State the hypotheses.
d. Test the significance of the correlation coefficient at 𝛼 = 0.05, using Table I.
e. Give a brief explanation of the type of relationship.
1. Gas Tax and Fuel Use. The data below indicate the state gas tax in cents per
gallon and the fuel use per registered vehicle (in gallons). Is there a significant
relationship between these two variables?
Tax 21.5 23 18 24.5 26.4 19
Usage 1062 631 920 686 736 684
2. Forest Fires and Acres Burned. An environmentalist wants to determine the

relationships between the numbers (in thousands) of forest fires over the year and the
number (in hundred thousands) of acres burned. The data for 8 recent years are
shown. Describe the relationship.
Number of fires x 72 69 58 47 84 62 57 45
Number of acres
62 42 19 26 51 15 30 15
burned y
3. School Districts and Secondary Schools. A random sample of states yielded the
following numbers of local school districts and the corresponding numbers of
secondary schools. Is there a significant relationship between the data?
School Districts 53 19 24 17 95 68
Secondary Schools 50 27 187 84 143 216
4. Emergency Calls and Temperature. An emergency service wishes to see whether

a relationship exists between the outside temperature and the number of emergency
calls it receives for a 7-hour period. The data are shown
Temperature x 68 74 82 88 93 99 101
No. of Calls y 7 4 8 10 11 9 13
GO BEYOND!
Based on the discussion about using z test for testing the difference between
STOPPING DISTANCES
In a study on speed control, it was found that the main reasons for regulations were to
make traffic flow more efficient and to minimize the risk of danger. An area that was
focused on in the study was the distance required to completely stop a vehicle at
various speeds. Use the following table to answer the questions.
MPH Braking Distance (feet)

20 20
30 45
40 81
50 133
60 205
80 411
Assume MPH is going to be used to predict stopping distance.
1. Which of the two variables is the independent variable?

2. Which is the dependent variable?
3. What type of variable is the independent variable?
4. What type of variable is the dependent variable?
5. Construct a scatter plot for the data.
6. Is there a linear relationship between the two variables?
7. Redraw the scatter plot and change the distances between the independent
variable numbers. Does the relationship look different?
8. Is the relationship positive or negative?
9. Can braking distance be accurately predicted from MPH?
10. List some other variables that affect braking distance.
11. Compute the value of r.
12. Is r significant at 𝛼 = 0.05?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
Big Picture
a. Test a distribution for goodness of fit using chi-square,

b. Test two variables for independence using chi-square, and
c. Use the one-way ANOVA technique to determine significant difference among
three or more means.

ULOa: Test a distribution for goodness of fit using chi-square
Metalanguage
TERMS DEFINITIONS
1. Chi-square ➢ It is a statistical method assessing the goodness of
fit between observed values and those expected
theoretically.
2. Goodness of Fit ➢ It is used to see whether a frequency distribution fits

a specific pattern.
Essential Knowledge
To perform the aforesaid big picture (unit learning outcomes) for the sixth and
seventh weeks of the course, you need to fully understand the essential knowledge
that will be laid down in the succeeding pages. Please note that you are not limited to
CHI-SQUARE TESTS
The chi-square distribution can also be used for test concerning frequency
distributions, such as “If a sample of buyers is given a choice of automobile colors, will
each color be selected with the same frequency?” The chi-square distribution can be
used to test the independence of two variables, for example, “Are senators’ opinions
on gun control independent of party affiliations?” That is, do the republicans feel one
way and the Democrats feel differently, or do they have the same opinion?
Finally, the chi-square distribution can be used to test the homogeneity of proportions.
For example, is the proportion of high school seniors who attend college immediately
after graduating the same for the northern, southern, eastern, and western parts of the
United States? This section explains the chi-square distribution and its application.
LESSON 3.1 Test for Goodness of Fit
In addition to being used to test a single variance, the chi-square statistic can be used
to see whether a frequency distribution fits a specific pattern. For example, to meet
customer demands, a manufacturer of running shoes may wish to see whether buyers
show a preference for a specific style. A traffic engineer may wish to see whether
accidents occur more often on some days than on others, so that she can increase
police patrols accordingly. An emergency service may want to see whether it receives
more calls at certain times of the day than at others, so that it can provide adequate
staffing.
When you are testing to see whether a frequency distribution fits a specific pattern,
you can use the chi-square goodness-of-fit test. For example, suppose as a market
analyst you wished to see whether consumers have any preference among five flavors
of a new fruit soda. A sample of 100 people provided these data.
Cherry Strawberry Orange Lime Grape

32 28 16 14 10
If there were no preference, you would expect each flavor to be selected with equal
frequency. In this case, the equal frequency is 100 / 5 = 20. That is, approximately 20
people would select each flavor.
Since the frequencies for each flavor were obtained from a sample, these actual
frequencies are called the observed frequencies. The frequencies obtained by
calculation (as if there were no preference) are called the expected frequencies. A
completed table for the test is shown.
Frequency Cherry Strawberry Orange Lime Grape

Observed 32 28 16 14 10
Expected 20 20 20 20 20
The observed frequencies will almost differ from the expected frequencies due to
sampling error; that is, the values differ from sample to sample. But the question is:
Are these differences significant (a preference exists), or are they due to chance? The
chi-square goodness-of-fit test will enable the researcher to determine the answer.
Before computing the test value, you must state the hypotheses. The null hypothesis
should be a statement indicating that there is no difference or no change.
For this example, the hypotheses are as follows:
H0: Consumers show no preference for flavors of the fruit soda.

H1: Consumers show a preference.
In the goodness-of-fit test, the degrees of freedom are equal to the number of
categories minus 1. For this example, there are five categories (cherry, strawberry,
orange, lime, and grape); hence, the degrees of freedom are 5 – 1 = 4. This is so
because the number of subjects in each of the first four categories is free to vary. But
in order for the sum to be 100 – the total number of subjects – the number of subjects
in the last category is fixed.
Formula for the Chi-Square Goodness-of-Fit Test
(𝑂 − 𝐸)2
𝜒2 = ∑
𝐸
with degrees of freedom equal to the number of categories minus 1, and where
𝑂 = observed frequency
𝐸 = expected frequency
Two assumptions are needed for the goodness-of-fit test. These assumptions are
given next.
Assumptions for the Chi-Square Goodness-of-Fit Test

1. The data are obtained from a random sample.
2. The expected frequency for each category must be 5 or more.
This test is a right-tailed test, since when the 𝑂 − 𝐸 values are squared, the answer
will be positive or zero. Consider the next example.
EXAMPLE 16 Fruit Soda Flavor Preference

Is there enough evidence to reject the claim that there is no preference in the selection
of fruit soda flavors, using the data shown below? Let 𝛼 = 0.05.
Cherry Strawberry Orange Lime Grape

32 28 16 14 10
Solution

H0: Consumers show no preference for flavors. (claim)
H1: Consumers show a preference.

The degrees of freedom are 5 – 1 = 4, and 𝛼 = 0.05. Hence, the critical
value from Table G in the appendix is 9.488.

Subtract the expected value from the observed value, squaring the result
and dividing by the expected value, and finding the sum. The expected
value for each category is 20, as calculated earlier.
(𝑂 − 𝐸)2
𝜒2 = ∑
𝐸
2
(32 − 20)2 (28 − 20)2 (16 − 20)2 (14 − 20)2 (10 − 20)2
𝜒 = + + + +
20 20 20 20 20
𝜒 2 = 18.0

The decision is to reject the null hypothesis since 18.0 > 9.488. See
illustration below.
There is enough evidence to reject the claim that consumers show no

preference for the flavors.
To get some idea of why this test is called the goodness-of-fit test, examine graphs of
the observed values and expected values. See Figure 14. From the graphs, you can
see whether the observed values and expected values are close together or far apart.
Figure 14. Graphs of the Observed and Expected Values for Soda Flavors
When the observed values and expected values are close together, the chi-square
test value will be small. Then the decision will be to not reject the null hypothesis –
hence, there is “a good fit”. When the observed values and the expected values are
far apart, the chi-square test value will be large. Then the null hypothesis will be
rejected – hence, there is “not a good fit”. See Figure 15 for the illustration.
Figure 15. Results of the Goodness-of-Fit Test
When there is perfect agreement between the observed and the expected values,
𝜒 2 = 0. Also, 𝜒 2 can never be negative. Finally, the test is right-tailed because “H0:
Good fit” and “H1: Not a good fit” mean that 𝜒 2 will be small in the first case and
large in the second case.
Here is another example for chi-square goodness-of-fit test.
EXAMPLE 17 Firearm Deaths

A researcher read that firearm-related deaths for people aged 1 to 18 were distributed
as follows: 74% were accidental, 16% were homicides, and 10% were suicides. In her
district, there were 68 accidental deaths, 27 homicides, and 5 suicides during the past
year. At 𝛼 = 0.10, test the claim that the percentages are equal.
Solution

H0: The deaths due to firearms for people aged 1 through 18 are
distributed as follows: 74% accidental, 16% homicides, and 10%
suicides. (claim)
H1: The distribution is not the same as stated in the null hypothesis.

Since 𝛼 = 0.10 and the degrees of freedom are 3 – 1 = 2, the critical value
is 4.605.

The expected values are as follows:
0.74 × 100 = 74
0.16 × 100 = 16
0.10 × 100 = 10
(𝑂 − 𝐸)2
𝜒2 = ∑
𝐸
(68 − 74)2 (27 − 16)2 (5 − 10)2

𝜒2 = + +
74 16 10
𝜒 2 = 10.549

illustration below.
There is enough evidence to reject the claim that the distribution is 74%
accidental, 16% homicides, and 10% suicides.

statistics, and probability. ProQuest Ebook Central https://search.proquest.com
WRITE IT OUT!
ACTIVITY 1!
After the discussion on goodness-of-fit test, you have now known the
1. How are the degrees of freedom computed for the goodness-of-fit test?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
2. How are the expected values computed for the goodness-of-fit test?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
3. When the expected frequencies are less than 5 for a specific class, what should be
done so that you can use the goodness-of-fit test?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
APPLY THE CONCEPT!
ACTIVITY 1!
After the discussion on goodness-of-fit test, you have now known the
1. Home-Schooled Student Activities. Students who are home-schooled often

attend their local schools to participate in various types of activities such as sports or
musical ensembles. According to the government, 82% of home - schoolers receive
their education entirely at home, while 12% attend school up to 9 hours per week and
6% spend from 9 to 25 hours per week at school. A survey of 85 students who are
home-schooled revealed the following information about where they receive their
education
Entirely at home Up to 9 hours 9 to 25 hours
50 25 10
At 𝛼 = 0.05, is there sufficient evidence to conclude that the proportions differ from
those stated by the government?
2. Assessment of Mathematics Students. As part of the Mathematics Assessment,

eighth - graders were asked about the frequency with which they used calculators
while taking tests or quizzes. The results for national public schools were as follows:
never, 28%; sometimes, 51%, and always, 21%. A random sample of 140 eighth-
grade students in a large urban school district indicated that 30 said never, 78 said
sometimes, and 32 said always. At 𝛼 = 0.05 do these proportions differ from the
national report?
3. Music Sales. In a recent year, 77.8% of recorded music sales were full-length CDs,
12.8% digital downloads, 3.8% singles, and the rest a mixture of other formats. A
survey of college students and their recent music purchases indicated that out of 200
purchases, 48 were downloads, 135 were full-length CDs, 10 were singles, and the
rest fit into the “other” category. At the 0.05 level of significance is there sufficient
evidence to conclude that at least one of the proportions differs from its original value?
GO BEYOND!
Based on the discussion about goodness-of-fit test, it is necessary to go further

what was given by the book. You are tasked now to answer the given problem as well
as explain the reason behind your answers.
NEVER THE SAME AMOUNTS

M&M/Mars, the makers of Skittles candies, states that the flavor blend is 20% for each
flavor. Skittles is a combination of lemon, lime, orange, strawberry, and grape flavored
candies. The following data list the results of four randomly selected bags of Skittles
and their flavor blends. Use the data to answer the questions.
Flavor
Bag Green Orange Red Purple Yellow
1 7 20 10 7 14
2 20 5 5 13 17
3 4 16 13 21 4
4 12 9 16 3 17
Total 43 50 44 44 52
1. Are the variables quantitative or qualitative?

2. What type of test can be used to compare the observed values to the
expected values?
3. Perform a chi-square test on the total values.
4. What hypotheses did you use?
5. What were the degrees of freedom for the test? What is the critical value?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________

ULOb: Test two variables for independence using chi-square
Metalanguage
TERMS DEFINITIONS
1. Independence Test ➢ It used to determine whether two variables are
independent of or related to each other when a
single sample is selected.
2. Contingency Table ➢ It is when data are arranged in table form for the
chi-square independence test.
Essential Knowledge
In this part of the module, we will explore how to use chi-square distribution for
independence test. Follow closely the discussion and answer the exercises.
LESSON 3.2 Test for Independence
The chi-square independence test can be used to determine whether two variables
are independent of or related to each other when a single sample is selected. For
example, suppose a new postoperative procedure is administered to a number of
patients in a large hospital. The researcher can ask the question, Do the doctors feel
differently about this procedure from the nurses, or do they feel basically the same
way? Note that the question is not whether they prefer the procedure but whether there
is a difference of opinion between the two groups.
To answer this question, a researcher selects a sample of nurses and doctors and
tabulates the data in table form, as shown.
Prefer new Prefer old No

Group
procedure procedure preference
Nurses 100 80 20
Doctors 50 120 30
As the survey indicates, 100 nurses prefer the new procedure, 80 prefer the old
procedure, and 20 have no preference; 50 doctors prefer the new procedure, 120 like
the old procedure, and 30 have no preference. Since the main question is whether
there is a difference in opinion, the null and alternative hypothesis are stated as
follows:
H0: The opinion about the procedure is independent of the profession.

H1: The opinion about the procedure is dependent on the profession.
If the null hypothesis is not rejected, the test means that both professions feel basically
the same way about the procedure and the differences are due to chance. If the null
hypothesis is rejected, the test means that one group feels differently about the
procedure from the other. Remember that rejection does not mean that one group
favors the procedure and the other does not. Perhaps both groups favor it or both
dislike it, but in different proportions.
To test the null hypothesis by using the chi-square independence test, you must
compute the expected frequencies, assuming that the null hypothesis is true. These
frequencies are computed by using the observed frequencies given in the table.
When data are arranged in table form for the chi-square independence test, the table
is called a contingency table. The table is made up of R rows and C columns. The
table here has two rows and three columns.

Group
Nurses 100 80 20
Doctors 50 120 30
Note that row and column headings do not count in determining the number of rows
and columns.
A contingency table is designated as an R x C (rows by columns) table. In this case,

R = 2 and C = 3; hence, this table is a 2 x 3 contingency table. Each block in the table
is called a cell and is designated by its row and column position. For example, the cell
with a frequency of 80 is designated as C1, 2, or row 1, column 2. The cells are shown
below
Column 1 Column 2 Column 3

Row 1 C1, 1 C1, 2 C1, 3
Row 2 C2, 1 C2, 2 C2, 3
The degrees of freedom for any contingency table are (rows – 1) times (columns – 1);
that is, d.f. = (R – 1)(C – 1). In this case, (2 – 1)(3 – 1) = (1)(2) = 2. The reason for this
formula for d.f. is that all expected values except one are free vary in each row and in
each column.
Using the previous table, you can compute the expected frequencies for each block
(or cell), as shown next.
1. Find the sum of each row and each column, and find the grand total, as shown.

Group Total
Nurses 100 80 20 200

(Row 1 sum)
Doctors + 50 + 120 + 30 200

(Row 2 sum)
--------- ---------- --------- -----------------
Total 150 200 50 400
(Column 1 (Column 2 (Column 3 (Grand total)
sum) sum) sum)
2. For each cell, multiply the corresponding row sum by the column sum and divide by
the grand total, to get the expected value:
row sum × column sum

Expected value =
grand total
For example, for C1, 2, the expected value denoted by E1, 2, is
(200)(200)
𝐸1,2 = = 100
400
For each cell, the expected values are computed as follows:
(200)(150) (200)(200) (200)(50)

𝐸1,1 = = 75 𝐸1,2 = = 100 𝐸1,3 = = 25
400 400 400
(200)(150) (200)(200) (200)(50)

𝐸2,1 = = 75 𝐸2,2 = = 100 𝐸2,3 = = 25
400 400 400
The expected values can now be placed in corresponding cells along with the
observed values, as shown.

Group Total
Nurses 100 (75) 80 (100) 20 (25) 200
Doctors 50 (75) 120 (100) 30 (25) 200
--------- ---------- --------- -----------------
Total 150 200 50 400
The rationale for the computation of the expected frequencies for a contingency table
uses proportions. For C1, 1 a total of 150 out of 400 people prefer the new procedure.
Since there are 200 nurses, you would expect, if the null hypothesis were true, (150 /
400) (200), or 75, of the nurses to be in favor of the new procedure.
The formula for the test value for the independence test is the same as the one used
for the goodness-of-fit test. It is
2
(𝑂 − 𝐸)2
𝜒 = ∑
𝐸
For the previous example, compute the (𝑂 − 𝐸)2 / 𝐸 values for each cell, and then
find the sum.
(𝑂 − 𝐸)2
𝜒2 = ∑
𝐸
2
(100 − 75)2 (80 − 100)2 (20 − 25)2 (50 − 75)2
𝜒 = + + +
75 100 25 75
(120 − 100)2 (30 − 25)2
+ +
100 25
𝜒 2 = 26.67
The final steps are to make the decision and summarize the results. This test is always
a right-tailed test, and the degrees of freedom are (R – 1)(C – 1) = (2 – 1)(3 – 1) = 2.
If 𝛼 = 0.05, the critical value from Table G in the appendix is 5.991. Hence, the decision
is to reject the null hypothesis since 26.67 > 5.991. See illustration below.
The conclusion is that there is enough evidence to support the claim that opinion is
related to (dependent on) profession – that is, that the doctors and nurses differ in their
opinions about the procedure.
Consider the next example to further discussed the concept.

EXAMPLE 18 Hospitals and Infections

A researcher wishes to see if there is a relationship between the hospital and the
number of patient infections. A sample of 3 hospitals was selected, and the number of
infections for a specific year has been reported. The data are shown next
Surgical site Pneumonia Bloodstream

Hospital Total
infections infections infections
A 41 27 51 119
B 36 3 40 79
C 169 106 109 384
--------- ---------- --------- -----------------
Total 246 136 200 582
At 𝛼 = 0.05, can it be concluded that the number of infections is related to the hospital
where they occurred?
Solution

H0: The number of infections is independent of the hospital.
H1: The number of infections is dependent of the hospital. (claim)

The critical value at 𝛼 = 0.05 with (3 – 1)(3 – 1) = (2)(2) = 4 degrees of
freedom is 9.488.
a. First find the expected values.
(119)(246) (119)(136) (119)(200)

𝐸1,1 = = 50.30 𝐸1,2 = = 27.81 𝐸1,3 = = 40.89
582 582 582
(79)(246) (79)(136) (79)(200)
𝐸2,1 = 33.39 𝐸2,2 = = 18.46 𝐸2,3 = = 27.15
582 582 582
(384)(246) (384)(136) (384)(200)

𝐸3,1 = = 162.31 𝐸3,2 = = 89.73 𝐸3,3 = = 131.96
582 582 582
b. Complete the table with observed and expected values.
Surgical site Pneumonia Bloodstream

Hospital Total
infections infections infections
A 41 (50.30) 27 (27.81) 51 (40.89) 119
B 36 (33.39) 3 (18.46) 40 (27.15) 79
C 169 (162.31) 106 (89.73) 109 (131.96) 384

--------- ---------- --------- ------------
Total 246 136 200 582
c. Substitute in the formula and evaluate.
2
(𝑂 − 𝐸)2
𝜒 = ∑
𝐸
2
(41 − 50.30)2 (27 − 27.81)2 (51 − 40.89)2 (36 − 33.39)2
𝜒 = + + +
50.30 27.81 40.89 33.39
(3 − 18.46)2 (40 − 27.15)2 (169 − 162.31)2
+ + +
18.46 27.15 162.31
(106 − 89.73)2 (109 − 131.96)2
+ +
89.73 131.96
𝜒 2 = 30.698

illustration below.
There is enough evidence to support the claim that the number of

infections is related to the hospital where they occurred.

Group.
WRITE IT OUT!
ACTIVITY 1!
After the discussion on independence test, you have now known the preliminary
doing the following tasks. Now, I will require you to explain thoroughly your answers
about the given argument.
1. How is the chi-square independence test similar to the goodness-of-fit test? How is
it different?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
2. How are the degrees of freedom computed for the independence test?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
3. Generally, how would the null and alternative hypotheses be stated for the chi-
square independence test?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
4. How are the expected values computed for each cell in the table?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
APPLY THE CONCEPT!
After the discussion on independence test, you have now known the preliminary
1. Ethnicity and Movie Admission. Are movie admissions related to ethnicity? A

2007 study indicated the following numbers of admissions (in thousands) for two
different years. At the 0.05 level of significance can it be concluded that movie
attendance by year was dependent upon ethnicity?
Caucasian Hispanic African-American Other

2006 936 240 195 101
2007 909 297 150 105
2. Women in the Military. This table lists the numbers of officers and enlisted
personnel for women in the military. At 𝛼 = 0.05, is there sufficient evidence to
conclude that a relationship exists between rank and branch of the Armed Forces?
Officers Enlisted
Army 10, 791 62, 491
Navy 7, 816 42, 750
Marine Corps 932 9, 525
Air Force 11, 819 54, 344
3. Medal Counts for the Olympics. The 2010 Winter Olympics final medal counts for
the top four nations are shown below. At the 0.10 level of significance can it be
concluded that the type of medal won was dependent upon the competing country?
Gold Silver Bronze
United States 9 15 13
Germany 10 13 7
Canada 14 7 5
Norway 9 8 6
GO BEYOND!
Based on the discussion about independence test, it is necessary to go further

SATELLITE DISHES IN RESTRICTED AREAS

The Senate is expected to vote on a bill to allow the installation of satellite dishes of
any size in deed-restricted areas. The House had passed a similar bill. An opinion poll
was taken to see if how a person felt about satellite dish restrictions was related to his
or her age. A chi-square test was run, creating the following computer-generated
information.
Degrees of freedom d.f. = 6
Test statistic 𝜒 2 = 61.25
Critical value C.V. = 12.6
P-value = 0.00
Significance level = 0.05
18 - 29 30 - 49 50 - 64 65 and up
For 96 (79.5) 96 (79.5) 90 (79.5) 36 (79.5)
Against 201 (204.75) 189 (204.75) 195 (204.75) 234 (204.75)
Don’t Know 3 (15.75) 15 (15.75) 15 (15.75) 30 (15.75)
1. Which number from the output is compared to the significance level to check
if the null hypothesis should be rejected?
2. Which number from the output gives the probability of a type I error that is
calculated from your sample data?
3. Was a right-, left-, or two-tailed test run? Why?
4. Can you tell how many rows and columns there were by looking at the
degrees of freedom?
5. Does increasing the sample size change the degrees of freedom?
6. What are your conclusions? Look at the observed and expected frequencies
in the table to draw some of your own specific conclusions about response and
age.
7. What would your conclusions be if the level of significance were initially set
at 0.10?
8. Does chi-square tell you which cell’s observed and expected frequencies are
significantly different?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________

ULOc: Use the one-way ANOVA technique to determine
significant difference among three or more means
Metalanguage
TERMS DEFINITIONS
1. Analysis of ➢ It is used to test claims involving three or more
Variance (ANOVA) means.
2. One-way ANOVA ➢ It is the analysis of variance that is used to

compare three or more means since it contains
only one variable.
3. Two-way ANOVA ➢ It is the analysis of variance can be extended to

studies involving two variables.
Essential Knowledge
In this part of the module, we will explore analysis of variance. We will explore
one-way and two-way analysis of variance.
ANALYSIS OF VARIANCE
The F test, used to compare two variances, can also be used to compare three or
more means. This technique is called analysis of variance, or ANOVA. It is used to
test claims involving three or more means. (Note: The F test can also be used to test
the equality of two means. Since it is equivalent to the t test in this case, the t test is
usually used instead of the F test when there are only two means.) For example,
suppose a researcher wishes to see whether the means of the time it takes three
groups of students to solve a computer problem using Fortran, Basic, and Pascal are
different. The researcher will use the ANOVA technique for this test. The z and t tests
should not be used when three or more means are compared.
For three groups, the F test can only show whether a difference exists among the three
means. It cannot reveal where the difference lies—that is, between 𝑋̅1 and 𝑋̅2, or 𝑋̅1
and𝑋̅3, or 𝑋̅2 and 𝑋̅3. If the F test indicates that there is a difference among the means,
other statistical tests are used to find where the difference exists. The most commonly
used tests are the Scheffé test and the Tukey test.
The analysis of variance that is used to compare three or more means is called a one-
way analysis of variance since it contains only one variable. In the previous example,
the variable is the type of computer language used. The analysis of variance can be
extended to studies involving two variables, such as type of computer language used
and mathematical background of the students. These studies involve a two-way
analysis of variance.
LESSON 3.3 One-Way Analysis of Variance
When an F test is used to test a hypothesis concerning the means of three or more
populations, the technique is called analysis of variance (commonly abbreviated as
ANOVA). At first glance, you might think that to compare the means of three or more
samples, you can use the t test, comparing two means at a time. But there are several
reasons why the t test should not be done.
First, when you are comparing two means at a time, the rest of the means under study
are ignored. With the F test, all the means are compared simultaneously. Second,
when you are comparing two means at a time and making all pairwise comparisons,
the probability of rejecting the null hypothesis when it is true is increased, since the
more t tests that are conducted, the greater is the likelihood of getting significant
differences by chance alone. Third, the more means there are to compare, the more t
tests are needed. For example, for the comparison of 3 means two at a time, 3 t tests
are required. For the comparison of 5 means two at a time, 10 tests are required. And
for the comparison of 10 means two at a time, 45 tests are required.
Assumptions for the F Test for Comparing Three or More Means

1. The population from which the samples were obtained must be normally or
approximately normally distributed.
2. The samples must be independent of one another.
3. The variances of the populations must be equal.
Even though you are comparing three or more means in this use of the F test,
variances are used in the test instead of means.
With the F test, two different estimates of the population variance are made. The first
estimate is called the between-group variance, and it involves finding the variance
of the means. The second estimate, the within-group variance, is made by
computing the variance using all the data and is not affected by differences in the
means. If there is no difference in the means, the between-group variance estimate
will be approximately equal to the within-group variance estimate, and the F test value
will be approximately equal to 1. The null hypothesis will not be rejected. However,
when the means differ significantly, the between-group variance will be much larger
than the within-group variance; the F test value will be significantly greater than 1; and
the null hypothesis will be rejected. Since variances are compared, this procedure is
called analysis of variance (ANOVA).
For a test of the difference among three or more means, the following hypotheses
should be used:
𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘
𝐻1 : At least one mean is different from the others.
As stated previously, a significant test value means that there is a high probability that
this difference in means is not due to chance, but it does not indicate where the
difference lies.
The degrees of freedom for this F test are d.f.N. = 𝑘 − 1, where 𝑘 is the number of
groups, and d.f.D. = 𝑁 − 𝑘, where 𝑁 is the sum of the sample sizes of the groups 𝑁 =
𝑛1 + 𝑛2 + ⋯ + 𝑛𝑘 . The sample sizes need not be equal. The F test to compare
means is always right-tailed.
Consider the following examples.
EXAMPLE 19 Lowering Blood Pressure

A researcher wishes to try three different techniques to lower the blood pressure of
individuals diagnosed with high blood pressure. The subjects are randomly assigned
to three groups; the first group takes medication, the second group exercises, and the
third group follows a special diet. After four weeks, the reduction in each person’s
blood pressure is recorded. At 𝛼 = 0.05, test the claim that there is no difference among
the means. The data are shown.
Medication Exercise Diet

10 6 5
12 8 9
9 3 12
15 0 8
13 2 4
Solution

H0: 𝜇1 = 𝜇2 = 𝜇3 (claim)
H1: At least one mean is different from the others.

Since 𝑘 = 3 and 𝑁 = 15,
d.f.N. = 𝑘 − 1 = 3 – 1 = 2
d.f.D. = 𝑁 − 𝑘 = 15 – 3 = 12
The critical value is 3.89, obtained from Table H in the appendix with 𝛼 =
0.05.
a. Find the mean and the variance of each sample.
Medication Exercise Diet

10 6 5
12 8 9
9 3 12
15 0 8
13 2 4
𝑋̅1 = 11.8 𝑋̅2 = 3.8 𝑋̅3 = 7.6
𝑠12 = 5.7 𝑠22 = 10.2 𝑠32 = 10.3
b. Find the grand mean. The grand mean, denoted by 𝑋̅𝐺𝑀 , is the mean
of all values in the samples.
∑𝑋 10 + 12 + 9 + 15 + 13 + 6 + 8 + ⋯ + 4 116
𝑋̅𝐺𝑀 = = = = 7.73
𝑁 15 15
When samples are equal in size, find 𝑋̅𝐺𝑀 by summing the 𝑋̅’s and
dividing by 𝑘, where 𝑘 = the number of groups.
c. Find the between-group variance, denoted by 𝑠𝐵2 .
∑ 𝑛𝑖 (𝑋̅𝑖 − 𝑋̅𝐺𝑀 )2
𝑠𝐵2 =
𝑘−1
5(11.8 − 7.73)2 + 5(3.8 − 7.73)2 + 5(7.6 − 7.73)2

𝑠𝐵2 =
3−1
160.13
𝑠𝐵2 = = 80.07
2
Note: This formula finds the variance among the means by using the
sample sizes as weights and considers the differences in the means.
2
d. Find the within-group variance, denoted by 𝑠𝑊 .
2
∑(𝑛𝑖 − 1)𝑠𝑖2
𝑠𝑊 =
∑(𝑛𝑖 − 1)
2
(5 − 1)(5.7) + (5 − 1)(10.2) + (5 − 1)(10.3)
𝑠𝑊 =
(5 − 1)(5 − 1)(5 − 1)
2
104.80
𝑠𝑊 = = 8.73
12
Note: This formula finds an overall variance by calculating a weighted

average of the individual variances. It does not involve using differences
of the means.
e. Find the F test value.
𝑠𝐵2 80.07
𝐹= 2 = 8.73 = 𝟗. 𝟏𝟕
𝑠𝑊

The decision is to reject the null hypothesis since 9.17 > 3.89.
There is enough evidence to reject the claim and conclude that at least
one mean is different from the others.
The numerator of the fraction obtained in step 3, part c, of the computational process
is called the sum of squares between groups, denoted by SS𝐵 . The numerator of the
fraction obtained in step 3, part d, of the computational procedure is called the sum of
squares within groups, denoted by SS𝑊 . This statistic is also called the sum of
squares for the error. SS𝐵 is divided by d.f.N. to obtain the between-group variance.
SS𝑊 is divided by 𝑁 − 𝑘 to obtain the within-group or error variance. These two
variances are sometimes called the mean squares, denoted by MS𝐵 and MS𝑊 . These
terms are used to summarize the analysis of variance and are placed in a summary
table.
Analysis of Variance Summary Table
Sum of Mean
Source squares d.f. square F
Between SS𝐵 𝑘−1 MS𝐵
Within (error) SS𝑊 𝑁−𝑘 MS𝑊
Total
In the table,
SS𝐵 = sum of squares between groups

SS𝑊 = sum of squares within groups
𝑘 = number of groups
𝑁 = 𝑛1 + 𝑛2 + ⋯ + 𝑛𝑘 = sum of sample sizes for groups
SS𝐵
MS𝐵 =
𝑘−1
SS𝑊
MS𝑊 =
𝑁−𝑘
MS𝐵
𝐹=
MS𝑊
The totals are obtained by adding the corresponding columns. For example 19, the
ANOVA summary table is shown below.
Analysis of Variance Summary Table
Sum of Mean
Source squares d.f. square F
Between 160.13 2 80.07 9.17
Within (error) 104.80 12 8.73
Total 264.93 14

statistics, and probability. ProQuest Ebook Central https://search.proquest.com
WRITE IT OUT!
After the discussion on analysis of variance, you have now known the
1. Define between-group variance and within-group variance.

___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
APPLY THE CONCEPT!
After the discussion on analysis of variance, you have now known the
e. Summarize the results and explain where the differences in the means are.
f. Create the analysis of variance summary table.
1. Sodium Contents of Foods. The amount of sodium (in milligrams) in one serving
for a random sample of three different kinds of foods is listed here. At the 0.05 level of
significance, is there sufficient evidence to conclude that a difference in mean sodium
amounts exists among condiments, cereals, and desserts?
Condiments Cereals Desserts
270 260 100
130 220 180
230 290 250
180 290 250
80 200 300
70 320 360
200 140 300
160
2. Healthy Eating. Americans appear to be eating healthier. Between 1970 and 2007
the per capita consumption of broccoli increased 1000% from 0.5 to 5.5 pounds. A
nutritionist followed a group of people randomly assigned to one of three groups and
noted their monthly broccoli intake (in pounds). At 𝛼 = 0.05, is there a difference in
means?
Group A Group B Group C

2.0 2.0 3.7
1.5 1.5 2.5
0.75 4.0 4.0
1.0 3.0 5.1
1.3 2.5 3.8
3.0 2.0 2.9
GO BEYOND!
Based on the discussion about analysis of variance, it is necessary to go further

COLOR THAT MAKE YOU SMARTER
The following set of data values was obtained from a study of people’s perceptions on
whether the color of a person’s clothing is related to how intelligent the person looks.
The subjects rated the person’s intelligence on a scale of 1 to 10. Group 1 subjects
were randomly shown people with clothing in shades of blue and gray. Group 2
subjects were randomly shown people with clothing in shades of brown and yellow.
Group 3 subjects were randomly shown people with clothing in shades of pink and
orange. The results follow.
Group 1 Group 2 Group 3

8 7 4
7 8 9
7 7 6
7 7 7
8 5 9
8 8 8
6 5 5
8 8 8
8 7 7
7 6 5
7 6 4
8 6 5
8 6 4
1. Use ANOVA to test for any significant differences between the means.
2. What is the purpose of this study?
3. Explain why separate t tests are not accepted in this situation.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
Big Picture
a. Identify the advantages and disadvantages of nonparametric methods.

.

ULOa: Identify the advantages and disadvantages of
nonparametric methods
Metalanguage
For ULOa, the term below will be defined to establish a common frame of
TERM DEFINITION
1. Parametric tests ➢ These are statistical tests for population parameters
such as means, variances, and proportions that
involve assumptions about the populations from
which the samples were selected.
2. Nonparametric ➢ These are statistical tests used when the population

tests from which the samples are selected is not normally
distributed.
Essential Knowledge
In this part of the module, we will continue to explore nonparametric tests. Also,
we will identify the advantages and disadvantages of nonparametric tests.
NONPARAMETRIC STATISTICS
Statistical tests, such as the z, t, and F tests, are called parametric tests. Parametric
tests are statistical tests for population parameters such as means, variances, and
proportions that involve assumptions about the populations from which the samples
were selected. One assumption is that these populations are normally distributed. But
what if the population in a particular hypothesis-testing situation is not normally
distributed? Statisticians have developed a branch of statistics known as
nonparametric statistics or distribution-free statistics to use when the population
from which the samples are selected is not normally distributed. Nonparametric
statistics can also be used to test hypotheses that do not involve specific population
parameters, such as 𝜇, 𝜎, or p.
For example, a sportswriter may wish to know whether there is a relationship between
the rankings of two judges on the diving abilities of 10 Olympic swimmers. In another
situation, a sociologist may wish to determine whether men and women enroll at
random for a specific drug rehabilitation program. The statistical tests used in these
situations are nonparametric or distribution-free tests. The term nonparametric is used
for both situations.
LESSON 4.1 Advantages and Disadvantages of Nonparametric Methods
As stated previously, nonparametric tests and statistics can be used in place of their
parametric counterparts (z, t, and F) when the assumption of normality cannot be met.
However, you should not assume that these statistics are a better alternative than the
parametric statistics. There are both advantages and disadvantages in the use of
nonparametric methods.
ADVANTAGES
There are five advantages that nonparametric methods have over parametric
methods:
1. They can be used to test population parameters when the variable is not
2. They can be used when the data are nominal or ordinal.
3. They can be used to test hypotheses that do not involve population
parameters.
4. In some cases, the computations are easier than those for the parametric
counterparts.
5. They are easy to understand.
DISADVANTAGES
There are three disadvantages of nonparametric methods:
1. They are less sensitive than their parametric counterparts when the
assumptions of the parametric methods are met. Therefore, larger
differences are needed before the null hypothesis can be rejected.
2. They tend to use less information than the parametric tests. For example, the
sign test requires the researcher to determine only whether the data values
are above or below the median, not how much above or below the median each
value is.
3. They are less efficient than their parametric counterparts when the
assumptions of the parametric methods are met. That is, larger sample sizes
are needed to overcome the loss of information. For example, the
nonparametric sign test is about 60% as efficient as its parametric counterpart,
the z test. Thus, a sample size of 100 is needed for use of the sign test,
compared with a sample size of 60 for use of the z test to obtain the same
results.

WRITE IT UP!
ACTIVITY 1
After the discussion on nonparametric tests, you have now known the
1. Explain the advantages and disadvantages of using nonparametric methods.

___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
ACTIVITY 2
After the discussion on nonparametric tests, research the use and interpretation
of the following nonparametric tests. Write your answers in a table.
Nonparametric Tests When to Use / Interpretation
1. Sign Test
2. Wilcoxon Rank Sum Test
3. Wilcoxon Signed-rank Test
4. Kruskal-Wallis Test
5. Runs Test
QUESTION AND ANSWER (Q & A) LIST
In this section, kindly list down all your emerging questions and issues related
to this course. These questions or issues can be raised through LMS Blackboard
message and email. Once questions or issues are clarified by the course coordinator,
don’t forget to write the answers to help you review concepts and essential knowledge.
Do you have questions for clarifications?
Questions / Issues Answers
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
KEYWORDS’ INDEX
This section contains all necessary keywords introduced in this course. These
includes concepts (a word or phrase), ideas, theories, names of people, and other vital
terms to remember. This index will help you recall and review.
A M
Alternative Hypothesis Mean
Analysis of variance Multiple regression
B N
Between-group variance Negative relationship
Noncritical region
C Nonparametric tests
Chi-square Null Hypothesis
Contingency Table
Correlation O
Correlation coefficient Observed value
Critical region One-tailed test
Critical value One-way ANOVA
D P
Degrees of freedom Parametric tests
Dependent samples Population
Dependent variable Positive relationship
P-value
E
Expected value R
Regression
G Right-tailed test
Goodness of fit
S
H Sample
Hypothesis Sample size
Scatter plot
I Simple regression
Independence test Standard deviation
Independent populations Standard error
Independent samples
Independent variable T
t test
L Test value
Left-tailed test Two-tailed Test
Level of significance Two-way ANOVA
Type I error
Type II error
REFERENCES

Group.
• Bluman, G. (2011). Elementary Statistics: A Step by Step Approach (8 th ed.).

United States of America: McGraw-Hill Education.
APPENDIX
APPENDIX
APPENDIX
APPENDIX

MTH 313 Advanced Statistics Arnie Boy Guitguit (Complete Weeks)

Uploaded by

Copyright:

Available Formats

You might also like

MTH 313 Advanced Statistics Arnie Boy Guitguit (Complete Weeks)

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

MTH 313 Advanced Statistics Arnie Boy Guitguit (Complete Weeks)

Uploaded by

Copyright:

Available Formats

UNIVERSITY OF MINDANAO

College of Teacher Education

Physically Distanced but Academically Engaged

Self-Instructional Manual (SIM) for Self-Directed Learning

Course/Subject: MTH 313 – ADVANCED STATISTICS

Name of Teacher: ARNIE BOY O. GUITGUIT

THIS SIM/SDL MANUAL IS A DRAFT VERSION ONLY, NOT

EXPECT REVISIONS OF THE MANUAL.

WEEK 1 – 3: INTRODUCTION TO STATISTICS

BIG PICTURE: Week 4 – 5 ULOs 46

KEYWORD’S INDEX 102

Course Outline: MTH 313 – Advanced Statistics

Course Coordinator: Arnie Boy O. Guitguit

Course Outline Policy

Areas of Concern Details

If the assessment task is done in real time through

Since this course is included in the licensure

Turnitin Submission (IF To ensure honesty and authenticity, all assessment

Please note that academic dishonesty such as

However, if the late submission of assessment paper

For group assessment tasks, the course coordinator

If disapproved by the course coordinator, you can

with the original letter of request. The final decision

Preferred Referencing Style APA 6th Edition

You can also meet the course coordinator in person

For students who have not created their student

Contact Details of the Program Maribel Abalos, MaEd

Phyll Jhann Gildore, MaEd

Ivy Jane S. Regidor

Course Information – see/download course syllabus in the Black Board LMS

a. Understand the five steps used in hypothesis testing,

Big Picture in Focus!

2. Population ➢ It is the entire pool from which the statistical sample

3. Hypothesis ➢ It is an assumption that needs to be validated using

Statistics is a branch of applied mathematics that focuses on studying sample data

Lesson 1.1 Steps in Hypothesis Testing (Traditional Method)

Every hypothesis-testing situation begins with the statement of a hypothesis. A

As an illustration of how hypothesis should be stated, three statistical studies will be

𝐻0 : 𝜇 = 𝑃78 and 𝐻1 : 𝜇 < 𝑃78

Equal = Greater than >

Two-tailed test Right-tailed test Left-tailed test

Reject Error Correct

H0: The defendant is innocent.

The level of significance is the maximum probability of committing a type I error.

In a hypothesis-testing situation, the researcher decides what level of significance to

Figure 1. Finding the Critical Value for 𝛼 = 0.01 (Right-tailed Test)

Figure 2. Critical and Noncritical Regions for 𝛼 = 0.01 (Right-tailed Test)

Figure 3. Critical and Noncritical Regions for 𝛼 = 0.01 (Left-tailed Test)

Figure 4. Finding the Critical Values for 𝛼 = 0.01 (Two-tailed Test)

The critical values are +2.58 and −2.58, as shown in Figure 5.

Figure 5. Critical and Noncritical Regions for 𝛼 = 0.01 (Two-tailed Test)

Figure 6. Summary of Hypothesis Testing and Critical Values

Step 1: Draw the figure and indicate the appropriate area.

Step 2: Identify the corresponding critical values.

Self-Help! Refer to the sources to further understand the lesson.

1. Define null and alternative hypotheses and give example of each.

3. When should a one-tailed test be used? A two-tailed test?