Professional Documents
Culture Documents
MTH 313 Advanced Statistics Arnie Boy Guitguit (Complete Weeks)
MTH 313 Advanced Statistics Arnie Boy Guitguit (Complete Weeks)
MTH 313 Advanced Statistics Arnie Boy Guitguit (Complete Weeks)
TABLE OF CONTENTS
Course Outline 4
Course Outline Policy 4
Course Coordinator’s (CC) Voice 7
Course Outcome (CO) 7
Students with a Special Needs Students with special needs shall communicate with
the course coordinator about the nature of his or her
special needs. Depending on the nature of the need,
the course coordinator with the approval of the
program coordinator may provide alternative
assessment tasks or extension of the deadline of
submission of assessment tasks. However, the
alternative assessment tasks should still be in the
service of achieving the desired course learning
outcomes.
Online Tutorial Registration (IF You are required to enroll in a specific tutorial time for
NECESSARY) this course via the www.cte.edu.ph portal. Please
note that there is a deadline for enrollment to the
tutorial.
Instructional Help Desk Dr. Jocelyn Bacasmot
Contact Details Email: jbacasmot@umindanao.edu.ph
Phone: 082-3050647 local 102
University of Mindanao 7
College of Teacher Education “Physically Distanced but Academically Engaged”
BEED Concerns
Email: BEED@umindanao.edu.ph
Mobile: 0923-6028780
Library Contact Details Brigida E. Bacani
Head, LIC
Email: library@umindanao.edu.ph
Phone: +63 305-0640
Mobile: 0951-3766681
Well-being Welfare Support Ronadora E. Deala
Held Desk Contact Details Head, GSTC
Email: ronadora_deala@umindanao.edu.ph
Mobile: 0921-2122846
CC’s Voice: Hello future Math teachers! Welcome to this course MTH313 –
Advanced Statistics. By now, I am confident that you really wanted to
become a Math teacher and that you have visualized yourself already
being in front of the classroom teaching. As you go through with this
module, always remember to broaden your perspective about the
course. Moreover, some inputs and discussion were hereby
acknowledged from the book of Allan Bluman (2011) and Edward Vieira,
Jr. (2017).
CO Before the actual teaching performance, you must deal with one of the
major qualities of a teacher which is having an expert power in the subject
matter you are going to discuss. Hence, the ultimate course outcome (CO)
of this course is for you to be acquainted with statistics (statistical tools
and methods). Talking about advanced statistics, the most crucial part is
how we find the correct representation of the following concept. Thus, in
this course you are expected to recall your knowledge or read in advance
the rudiments of statistics, parameters, z test, t test, as well as ANOVA.
These include the process how to calculate mean, variance, and standard
deviation.
Let us begin!
University of Mindanao 8
College of Teacher Education “Physically Distanced but Academically Engaged”
Big Picture
Week 1-3: Unit Learning Outcomes (ULO): At the end of the unit, you are expected to:
Metalanguage
In this section, the most essential terms relevant to the advanced statistics will
be defined. You will encounter these terms as we go through this course. Please refer
to these definitions in case you will encounter difficulty in understanding some
concepts.
For ULOa, the terms below will be defined to establish a common frame of
reference as to how the terms are addressed in this course.
TERMS DEFINITIONS
1. Sample ➢ It is a set of individuals or objects collected or
selected from a statistical population.
Essential Knowledge
To perform the aforesaid big picture (unit learning outcomes) for the first three
(3) weeks of the course, you need to fully understand the essential knowledge that will
be laid down in the succeeding pages. Please note that you are not limited to
exclusively refer to these resources. Thus, you are expected to utilize other books,
research articles and other resources that are available in the university’s library e.g.
ebrary, search.proquest.com etc.
For this module, several concepts will be tackled and explained. Let us start
with the systems of linear equations. In this part, the most critical words or terms used
University of Mindanao 9
College of Teacher Education “Physically Distanced but Academically Engaged”
in studying analytic geometry will be introduced. Kindly look closely to some terms
especially those bolded or italicized.
INTRODUCTION TO STATISTICS
Statistics has two general purposes, perhaps best thought of as types of statistics. We
have descriptive statistics that summarizes and describes sample data
representative of the population. Next, we have inferential statistics that examines
the relationships between and among variables and helps us draw conclusions that
can be generalized to the population.
Having sample data that represent the broader targeted population is critical to our
ability to use our findings to make informed decisions. A sample is a relatively small
representative number of a larger population selected for statistical analysis.
There are two statistical hypotheses for each situation: the null hypothesis and the
alternative hypothesis.
1. Null Hypothesis. It is symbolized by H0. It is a statistical hypothesis that
states that there is no difference between a parameter and a specific value, or
that there is no difference between two parameters.
2. Alternative Hypothesis. It is symbolized by H1 or Ha. It is a statistical
hypothesis that states the existence of a difference between a parameter and
a specific value, or states that there is a difference between two parameters.
EXAMPLE 1
A medical researcher is interested in finding out whether a new medication will have
any undesirable side effects. The researcher is particularly concerned with the pulse
rate of the patients who take the medication. Will the pulse rate increase, decrease or
remain unchanged after a patient takes the medication?
University of Mindanao 10
College of Teacher Education “Physically Distanced but Academically Engaged”
Since the researcher knows that the mean pulse rate for the population under study is
82 beats per minute, the hypothesis for this situation are
𝐻0 : 𝜇 = 82 and 𝐻1 : 𝜇 ≠ 82
The null hypothesis specifies that the mean will remain unchanged, and the alternative
hypothesis states that it will be different. This test is called a two-tailed test since the
possible side effects of the medicine could be to raise or lower the pulse rate.
EXAMPLE 2
A chemist invents an additive to increase the life of an automobile battery. If the mean
lifetime of the automobile battery without the additive is 36 months, then her
hypotheses are
𝐻0 : 𝜇 = 36 and 𝐻1 : 𝜇 > 36
In this situation, the chemist is interested only in increasing the lifetime of the batteries,
so her alternative hypothesis is that the mean is greater than 36 months. The null
hypothesis is that the mean is equal to 36 months. This test is called right-tailed test
since the interest is in an increase only.
EXAMPLE 3
A contractor wishes to lower heating bills by using a special type of insulation in
houses. If the average of the monthly heating bills is P78, her hypotheses about
heating costs with the use of insulation are
This test is a left-tailed test since the contractor is interested only in lowering heating
costs.
To state hypotheses correctly, researchers must translate the conjecture or claim from
words into mathematical symbols. The basic symbols used are as follows:
The null and alternative hypotheses are stated together, and the null hypothesis
contains the equal sign, as shown below (where k represents a specified number).
The null hypothesis is always stated using the equal sign. This is done because most
professionals journals, and when we test the null hypothesis, the assumption is that
University of Mindanao 11
College of Teacher Education “Physically Distanced but Academically Engaged”
the mean, proportion, or standard deviation is equal to a given specific value. Also,
when the researcher conducts a study, he or she is generally looking for evidence to
support a claim. Therefore, the claim should be stated as the alternative hypothesis,
i.e. using < or > or ≠. Because of this, the alternative hypothesis is sometimes called
the research hypothesis.
A claim, though, can be stated as either the null hypothesis or the alternative
hypothesis; however, the statistical evidence can only support the claim if it is the
alternative hypothesis. Statistical evidence can be used to reject the claim if the claim
is the null hypothesis. These facts are important when you are stating the conclusion
of a statistical study.
After stating the hypothesis, the researcher designs the study. The researcher selects
the correct statistical tests, chooses an appropriate level of significance, and
formulates a plan for conducting the study.
A statistical test uses the data obtained from a sample to decide about whether the
null hypothesis should be rejected. The numerical value obtained from a statistical test
is called the test value.
In the hypothesis-testing situation, there are four possible outcomes. In reality, the null
hypothesis may or may not be true, and a decision is made to reject or not reject it
based on the data obtained from a sample. The four possible outcomes are shown
below. Notice that there two possibilities for a correct decision and two possibilities for
an incorrect decision.
H0 true H0 false
Do not
Correct Error
Reject
H0 decision Type II
A type I error occurs if you reject the null hypothesis when it is true.
A type II error occurs if you do not reject the null hypothesis when it is false.
The hypothesis-testing situation can be likened to a jury trial. In a jury trial, there are
four possible outcomes. The defendant is either guilty or innocent, and he or she will
be convicted or acquitted. Now the hypotheses are
If the defendant is convicted but he or she does not commit the crime, then a type I
error has been committed. On the other hand, if the defendant is convicted and he or
she has committed the crime, then a correct decision has been made.
If the defendant is acquitted and he or she did not commit the crime, a correct decision
has been made. However, if the defendant is acquitted and he or she did commit the
crime, then a type II error has been made.
Now, the question is ‘How large a difference is necessary to reject the null hypothesis?’
Here is where the level of significance is used.
The probability of a type II error is symbolized by 𝛽, the Greek letter beta. That is,
P(type II error) = 𝛽. In most hypothesis-testing situation, 𝛽 cannot be easily computed;
however, 𝛼 and 𝛽 are related in that decreasing one increases the other.
Statisticians generally agree on using three arbitrary significance levels: the 0.10, 0.05,
and 0.01 levels. That is, if the null hypothesis is rejected, the probability of a type I
error will be 10%, 5%, or 1%, depending on which level of significance is used. Here
is another way of putting it: When 𝛼 = 0.10, there is a 10% chance of rejecting a true
null hypothesis; when 𝛼 = 0.05, there is 5% chance of rejecting a true null hypothesis;
and when 𝛼 = 0.01, there is 1% chance of rejecting a true null hypothesis.
The critical value separates the critical region from the noncritical region. The
symbol for critical value is C.V.
The critical or rejection region is the range of values of the test value that indicates
that there is significant difference and that the null hypothesis should be rejected.
The noncritical or nonrejection region is the range of values of the test value that
indicates that the difference was probably due to chance and that the null hypothesis
should not be rejected.
The critical value can be on the right side of the mean or on the right side of the mean
for a one-tailed test. Its location depends on the inequality sign of the alternative
hypothesis.
University of Mindanao 13
College of Teacher Education “Physically Distanced but Academically Engaged”
A one-tailed test indicates that the null hypothesis should be rejected when the test
value is in the critical region on one side of the mean. A one-tailed test is either a
right-tailed test or left-tailed test, depending on the direction of the inequality of
the alternative hypothesis.
To obtain the critical value, the researcher must choose an alpha level. Suppose the
researcher chose 𝛼 = 0.01. Then the researcher must find a z value such that 1% of
the area falls to the right of the z value and 99% falls to the left of the z value, as shown
in Figure 1.
Next, the researcher must find the area value in z table closest to 0.9900. The critical
z value is 2.33, since that value gives the area closest to 0.9900 (that is, 0.9901), as
shown in Figure 1.
The critical and noncritical regions and the critical value are shown in Figure 2.
If we have an alternative hypothesis, such as 𝐻1 : 𝜇 < 𝑃78, then the critical value falls
to the left of the mean. This test is thus a left-tailed test. At α = 0.01, the critical value
is -2.33, since 0.0099 is the closest value to 0.01 as shown in Figure 3.
University of Mindanao 14
College of Teacher Education “Physically Distanced but Academically Engaged”
When a researcher conducts a two-tailed test, the null hypothesis can be rejected
when there is significant difference in either decision, above or below the mean.
In a two-tailed test, the null hypothesis should be rejected when the test value is in
either of the two critical regions.
For a two-tailed test, then, the critical region must be split into two equal parts. If α =
0.01, then one-half of the area or 0.005, must be to the right of the mean and one-half
must be to the left of the mean, as shown in Figure 4.
In this case, the z value on the left side is found by looking up the z value
corresponding to an area of 0.0050. The z value falls about halfway between −2.57
and −2.58 corresponding to the areas 0.0049 and 0.0051. The average of −2.57 and
−2.58 is [(−2.57) + (−2.58)] ÷ 2 = −2.575 so is the z value is needed to three decimal
places, −2.575 is used; however, if the z value is rounded to two decimal places,
−2.58 is used.
On the right side, it is necessary to find the z value corresponding to 0.99 + 0.005, or
0.9950. Again, the value falls between 0.9949 and 0.9951, so +2.575 or 2.58 can be
used.
Similar procedures are used to find other values of 𝛼. In Figure 6, it shows the critical
value (C.V.) for the three situations discussed for values 𝛼 = 0.10, 𝛼 = 0.05, and 𝛼 =
0.01 with shaded rejected regions.
In finding the critical values for specific 𝛼, take note of the following:
b. If the test is right-tailed, the critical region, with an area of 𝛼, will be on the
right side of the mean.
c. If the test is two-tailed, 𝛼 must be divided by 2; one-half of the area will be to
the right of the mean, and one-half will be to the left of the mean.
WRITE IT OUT!
ACTIVITY 1!
After the discussion on steps of hypothesis testing, you have now known the
preliminary concepts about advanced statistics. With that, let now us check your
knowledge by doing the following tasks. Now, I will require you to explain thoroughly
your answers about the given argument.
2. What is meant by a Type I error? A Type II error? How are they related?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
ACTIVITY 1!
After the discussion on steps of hypothesis testing, you have now known the
preliminary concepts about advanced statistics. With that, let now us check your
knowledge by doing the following tasks.
DIRECTION: Using the z table (Table E), find the critical value (or values) for each.
1. 𝛼 = 0.05, two-tailed test
2. 𝛼 = 0.01, left-tailed test
3. 𝛼 = 0.005, right-tailed test
4. 𝛼 = 0.01, right-tailed test
5. 𝛼 = 0.05, left-tailed test
6. 𝛼 = 0.02, left-tailed test
7. 𝛼 = 0.05, right-tailed test
8. 𝛼 = 0.01, two-tailed test
9. 𝛼 = 0.04, left-tailed test
10. 𝛼 = 0.02, right-tailed test
.
ACTIVITY 2!
After the discussion on steps of hypothesis testing, you have now known the
preliminary concepts about advanced statistics. With that, let now us check your
knowledge by doing the following tasks.
DIRECTION: For each conjecture, state the null and alternative hypotheses.
1. The average age of community college students is 24.6 years.
2. The average income of accountants is $51,497.
3. The average age of attorneys is greater than 25.4 years.
4. The average score of high school basketball games is less than 88.
5. The average pulse rate of male marathon runners is less than 70 beats per
minute.
6. The average cost of DVD player is $79.95.
7. The average weight loss for a sample of people who exercise 30 minutes per
day for 6 weeks is 8.3 pounds.
University of Mindanao 19
College of Teacher Education “Physically Distanced but Academically Engaged”
GO BEYOND!
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 20
College of Teacher Education “Physically Distanced but Academically Engaged”
Metalanguage
To clearly understand the ULOb, you need to know the important terms relevant
to this course. Kindly adopt with the definitions below.
TERMS DEFINITIONS
1. Mean ➢ This refers to the average of the given sample
data.
Essential Knowledge
In this part of the module, we will explore testing means using the z test. Follow
closely the discussion and answer the exercises afterwards.
We will explain two statistical tests: the z test is used when 𝜎 is known and the t test
is used when 𝜎 is unknown. We will discuss first the z test.
Many hypotheses are tested using a statistical test based on the following general
formula:
The observed value is the statistic (such as the sample mean) that is computed from
the sample data. The expected value is the parameter (such as the population
standard mean) that you would expect to obtain if the null hypothesis were true – in
other words, the hypothesized value. The denominator is the standard error of the
statistics being tested (in this case, the standard error of the mean).
The z test is a statistical test for the mean of a population. It can be used when 𝑛 ≥
30, or when the population is normally distributed and 𝜎 is known.
The formula for the z test is
University of Mindanao 21
College of Teacher Education “Physically Distanced but Academically Engaged”
𝑋−𝜇
𝑧=
𝜎 / √𝑛
where
𝑋 = sample mean
𝜇 = hypothesized population mean
𝜎 = population standard deviation
𝑛 = sample size
For the z test, the observed value is the value of the sample mean. The expected value
is the value of the population mean, assuming that the null hypothesis is true. The
denominator 𝜎 / √𝑛 is the standard error of the mean.
Note: Your first encounter with hypothesis testing can be somewhat challenging and
confusing, since there are many new concepts being introduced at the same time. To
understand all the concepts, you must carefully follow each step in the examples and
try each exercise that is assigned. Only after careful study and patience will these
concepts become clear.
Solution
𝑋−𝜇 30.1 − 29
𝑧= = = 𝟏. 𝟓𝟗
𝜎 / √𝑛 3.8 / √30
There is not enough evidence to support the claim that mean time is
greater than 29 days.
Solution
𝑋−𝜇 75 − 80
𝑧= = = −𝟏. 𝟓𝟔
𝜎 / √𝑛 19.2 / √36
There is enough evidence to support the claim that the average cost of
men’s athletic shoes is less than $80.
Solution
There is enough evidence to support the claim that the average cost of
rehabilitation at the particular hospital is different from $24,672.
First, the claim can be either the null or alternative hypothesis, and one should identify
which it is. Second, after the study is completed, the null hypothesis is either rejected
or not rejected. From these two facts, the decision can be identified in the appropriate
block shown in Figure 7.
Again, remember that nothing is being proved true or false. The statistician is only
stating that there is or is not enough evidence to say that a claim is probably true or
false. As noted previously, the only way to prove something would be to use the entire
population under study, and usually this cannot be done, especially when the
population is large.
University of Mindanao 25
College of Teacher Education “Physically Distanced but Academically Engaged”
Statisticians usually test hypotheses at the common 𝛼 levels of 0.05 or 0.01 and
sometimes at 0.10. Recall that the choice of level depends on the seriousness of the
type I error. Besides listing an 𝛼 value, many computer statistical packages give a P-
value for the hypothesis testing.
The P-value (or probability value) is the probability of getting a sample statistic (such
as the mean) or a more extreme sample statistic in the direction of the alternative
hypothesis when the null hypothesis is true.
In other words, the P-value is the actual area under the standard normal distribution
curve (or other curve, depending on what statistical test is being used) representing
the probability of a particular sample statistic or a more extreme sample statistic
occurring if the null hypothesis is true.
For example, suppose that an alternative hypothesis is 𝐻1 : 𝜇 > 50 and the mean of a
sample of 𝑋 = 52. If the computer printed a P-value of 0.0356 for a statistical test, then
the probability of getting a sample mean of 52 or greater is 0.0356 if the true population
mean is 50 (for a given sample size and standard deviation). The relationship between
the P-value and the 𝛼 value can be explained in this manner. For P = 0.0356, the null
hypothesis would be rejected at 𝛼 = 0.05 but not at 𝛼 = 0.01. See Figure 8.
When the hypothesis test is two-tailed, the area in one tail must be doubled. For a two-
tailed test, if 𝛼 = 0.05 and the area in one tail is 0.0356, the P-value will be 2(0.0356)
= 0.0712. That is, the null hypothesis should not be rejected at 𝛼 = 0.05, since 0.0712
is greater than 0.05. In summary, then, if the P-value is less than 𝛼, reject the null
hypothesis. If the P-value is greater than 𝛼, do not reject the null hypothesis.
The P-value for the z test can be found by using the Table E in the Appendix. First find
the area under the standard normal distribution curve corresponding to the z test
value. For a left-tailed test, use the area given in the table; for a right-tailed test, use
1.0000 minus the area given in the table. To get the P-value for a two-tailed test,
University of Mindanao 26
College of Teacher Education “Physically Distanced but Academically Engaged”
double the area you found in the tail. This procedure is shown in the succeeding
examples.
There are five steps for solving hypothesis-testing problems (P-value method):
Solution
Using Table E in the appendix, find the corresponding area under the
normal distribution for z = 2.28. It is 0.09887. Subtract this value for the
area from 1.0000 to find the area in the right tail.
There is enough evidence to support the claim that the tuition and fees at
four-year public colleges are greater than $5700.
Solution
𝑋−𝜇 8.2 − 8
𝑧= = = 𝟏. 𝟖𝟗
𝜎 / √𝑛 0.6 / √32
Using Table E in the appendix, find the corresponding area under the
normal distribution for z = 1.89. It is 0.09706. Subtract the value from
1.0000.
Since this is a two-tailed test, the area of 0.0294 must be doubled to get
the P-value.
2(0.0294) = 0.0588
There is not enough evidence to reject the claim that the average wind
speed is 8 miles per hour.
When using the P-value method, you can use the following rule:
ACTIVITY 1!
After the discussion on z test for a mean, you have now known the preliminary
concepts about advanced statistics. With that, let now us check your knowledge by
doing the following tasks. Kindly clearly write your solutions after each exercise.
DIRECTION: Perform each of the following steps for the set of questions.
a. State the hypotheses and identify the claim.
b. Find the critical value(s).
c. Compute the test value.
d. Make the decision.
University of Mindanao 29
College of Teacher Education “Physically Distanced but Academically Engaged”
1. Warming and Ice Melt. The average depth of the Hudson Bay is 305 feet.
Climatologists were interested in seeing if the effects of warming and ice melt were
affecting the water level. Fifty-five measurements over a period of weeks yielded a
sample mean of 306.2 feet. The population variance is known to be 3.57. Can it be
concluded at the 0.05 level of significance that the average depth has increased? Is
there evidence of what caused this to happen?
4. Ages of U.S. Senators. The mean age of Senators in the 109th Congress was
60.35 years. A random sample of 40 senators from various state senates had an
average age of 55.4 years, and the population standard deviation is 6.5 years. At 𝛼 =
0.05, is there sufficient evidence that state senators are on average younger than the
Senators in Washington?
GO BEYOND!
CAR THEFTS
You recently received a job with a company that manufactures an automobile antitheft
device. To conduct an advertising campaign for the product, you need to make a claim
about the number of automobile thefts per year. Since the population of various cities
in the United States varies, you decide to use rates per 10,000 people. (The rates are
based on the number of people living in the cities.) Your boss said that last year the
theft rate per 10,000 people was 44 vehicles. You want to see if it has changed. The
following are rates per 10,000 people for 36 randomly selected locations in the United
States.
55 70 56 125 94 66 20 17 16
39 58 25 23 62 91 41 20 36
51 42 53 26 33 115 73 73 78
15 69 55 56 62 75 134 24 67
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 31
College of Teacher Education “Physically Distanced but Academically Engaged”
Metalanguage
To clearly understand the ULOc, you need to know the important terms relevant
to this course. Kindly adopt with the definition below.
TERMS DEFINITIONS
1. t test ➢ It is any statistical hypothesis test in which the
test statistic follows a student’s t-distribution
under the null hypothesis.
Essential Knowledge
In this part of the module, we will explore how to test sample means using the
t test. Follow closely the discussion and answer the exercises afterwards.
When the population standard deviation is unknown, the z test is not normally used
for testing hypothesis involving means. A different test, called the t test, is used. The
distribution of the variable should be approximately normal.
The t distribution is similar to the standard normal distribution in the following ways.
1. It is bell-shaped.
2. It is symmetric about the mean.
3. The mean, median, and mode are equal to 0 and are located at the center of
the distribution.
The t distribution differs from the standard normal distribution in the following ways.
1. The variance is greater than 1.
2. The t distribution is a family of curves based on the degrees of freedom (d.f.),
which is a number related to sample size.
3. As the sample size increases, the t distribution approaches the normal
distribution.
The t test is a statistical test for the mean of a population and is used when the
population is normally or approximately normally distributed, and 𝜎 is unknown.
University of Mindanao 32
College of Teacher Education “Physically Distanced but Academically Engaged”
The formula for t test is similar to the formula for z test. But since the population
standard deviation 𝜎 is unknown, the sample standard deviation 𝑠 is used instead.
The critical values for the t test are given in Table F in the appendix. For a one-tailed
test, find the 𝛼 level by looking down the left-hand column.
Notice that the degrees of freedom are given for values from 1 through 30, then at
intervals above 30. When the degrees of freedom are above 30, some reference will
tell you to use the nearest table value; however, in this module, you should always
round down to the nearest table value. For example, if d.f. = 59, use d.f. = 55 to find
the critical value or values. This is a conservative approach.
As the degrees of freedom get larger, the critical values approach the z values. Hence
the bottom values (large sample size) are the same as the z values that were used in
the last section.
When you test hypotheses by using the t test (traditional method), follow the same
procedure as for the z test, except use Table F.
Remember that the t test should be used when the population is approximately
normally distributed, and the population standard deviation is unknown.
University of Mindanao 33
College of Teacher Education “Physically Distanced but Academically Engaged”
Solution
There is enough evidence to reject the claim that the average number of
infections is 16.3.
Solution
𝑋−𝜇 58.88 − 60
𝑡= = = −𝟎. 𝟔𝟐𝟒
𝑠 / √𝑛 5.08 / √8
There is not enough evidence to support the educator’s claim that the
average salary od substitute teachers in Allegheny County is less than
$60 per day.
Students sometimes have difficulty deciding whether to use the z test or t test. The
rules are the same as those pertaining confidence intervals.
1. If 𝜎 is known, use the z test. The variable must be normally distributed if 𝑛 < 30.
2. If 𝜎 is unknown but 𝑛 ≥ 30, use the t test.
3. If 𝜎 is unknown and 𝑛 < 30, use the t test. The population must be approximately
normally distributed.
WRITE IT OUT!
ACTIVITY 1!
After the discussion on t test a mean, you have now known the preliminary
concepts about advanced statistics. With that, let now us check your knowledge by
doing the following tasks. Now, I will require you to explain thoroughly your answers
about the given argument.
1. In what ways is the t distribution similar to the standard normal distribution? In what
ways is the t distribution different from the standard normal distribution?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 36
College of Teacher Education “Physically Distanced but Academically Engaged”
ACTIVITY 1!
After the discussion on t test for a mean, you have now known the preliminary
concepts about advanced statistics. With that, let now us check your knowledge by
doing the following tasks.
DIRECTION: Using the t table (Table F), find the critical value (or values) for each.
1. 𝑛 = 10, 𝛼 = 0.05, two-tailed test
2. 𝑛 = 18, 𝛼 = 0.01, left-tailed test
3. 𝑛 = 6, 𝛼 = 0.005, right-tailed test
4. 𝑛 = 9, 𝛼 = 0.01, right-tailed test
5. 𝑛 = 15, 𝛼 = 0.05, left-tailed test
6. 𝑛 = 23, 𝛼 = 0.02, left-tailed test
7. 𝑛 = 28, 𝛼 = 0.05, right-tailed test
8. 𝑛 = 17, 𝛼 = 0.01, two-tailed test
9. 𝑛 = 28, 𝛼 = 0.04, left-tailed test
10. 𝑛 = 12, 𝛼 = 0.02, right-tailed test
.
ACTIVITY 2!
After the discussion on t test for a mean, you have now known the preliminary
concepts about advanced statistics. With that, let now us check your knowledge by
doing the following tasks.
University of Mindanao 37
College of Teacher Education “Physically Distanced but Academically Engaged”
DIRECTION: Perform each of the following steps for the set of questions.
a. State the hypotheses and identify the claim.
b. Find the critical value(s).
c. Compute the test value.
d. Make the decision.
e. Summarize the results.
1. Cell Phone Call Lengths. The average local cell phone call length was reported to
be 2.27 minutes. A random sample of 20 phone calls showed an average of 2.98
minutes in length with a standard deviation of 0.98 minute. At 𝛼 = 0.05 can it be
concluded that the average differs from the population average?
2. Heights of Tall Buildings. A researcher estimates that the average height of the
buildings of 30 or more stories in a large city is at least 700 feet. A random sample of
10 buildings is selected, and the heights in feet are shown. At 𝛼 = 0.025, is there
enough evidence to reject the claim?
485 511 841 725 615
520 535 635 616 582
3. Commute Time to Work. A survey of 15 large U.S. cities finds that the average
commute time one way is 25.4 minutes. A chamber of commerce executive feels that
the commute in his city is less and wants to publicize this. He randomly selects 25
commuters and finds the average is 22.1 minutes with a standard deviation of 5.3
minutes. At 𝛼 = 0.10, is he correct?
4. Exercise and Reading Time Spent by Men. Men spend an average of 29 minutes
per day on weekends and holidays exercising and playing sports. They spend an
average of 23 minutes per day reading. A random sample of 25 men resulted in a
mean of 35 minutes exercising with a standard deviation of 6.9 minutes and an
average of 20.5 minutes reading with 𝑠 = 7.2 minutes. At 𝛼 = 0.05 for both, is there
sufficient evidence that these two results differ from the national means?
GO BEYOND!
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 39
College of Teacher Education “Physically Distanced but Academically Engaged”
Metalanguage
To clearly understand the ULOd, you need to know the important terms relevant
to this course. Kindly adopt with the definitions below.
TERMS DEFINITIONS
1. Independent ➢ The data are from two different sources.
Populations
Essential Knowledge
In this part of the module, we will explore how to use z test in comparing two
means from independent populations.
LESSON 1.4 Testing the Difference Between Two Means: Using the z Test
𝐻0 : 𝜇1 = 𝜇2
𝐻1 : 𝜇1 ≠ 𝜇2
where
𝜇1 = mean age of all beginning nursing students at the community college
𝜇2 = mean age of all beginning nursing students at the university
𝐻0 : 𝜇1 − 𝜇2 = 0
𝐻1 : 𝜇1 − 𝜇2 ≠ 0
Assumptions for the z Test to Determine the Difference Between Two Means
1. Both samples are random samples.
2. The samples must be independent of each other. That is, there can be no
relationship between the subjects in each sample.
3. The standard deviation of both populations must be known, and if the sample
sizes are less than 30, the populations must be normally or approximately
normally distributed.
The theory behind testing the difference between two means is based on selecting
pairs of samples and comparing the means of the pairs. The population means need
not be known.
All possible pairs of samples are taken from population. The means for each pair of
samples are computed and then subtracted, and the differences are plotted. If both
populations have the same men, the most of the differences will be zero or close to
zero.
Occasionally, there will be a few large differences due to chance alone, some positive
and others negative. If the differences are plotted, the curve will be shaped like a
normal distribution and have a mean of zero, as shown in Figure 10.
The variance of the difference 𝑋1 − 𝑋2 is equal to the sum of the individual variances
of 𝑋1 and 𝑋2. That is,
Formula for the z Test for Comparing Two Means from Independent
Populations
University of Mindanao 41
College of Teacher Education “Physically Distanced but Academically Engaged”
In comparison of two sample means, the difference may be due to chance, in which
the null hypothesis will not be rejected, and the researcher can assume that the means
of the populations are basically the same. The difference in this case is not significant.
On the other hand, if the difference is significant, the null hypothesis is rejected, and
the researcher can conclude that the population means are different. To illustrate, see
Figure 11.
Figure 11. Hypothesis-Testing Situations in the Comparison of Means
𝐻0 : 𝜇1 = 𝜇2 𝐻0 : 𝜇1 − 𝜇2 = 0 𝐻0 : 𝜇1 = 𝜇2 𝐻0 : 𝜇1 − 𝜇2 = 0
or or
𝐻1 : 𝜇1 > 𝜇2 r 𝐻1 : 𝜇1 − 𝜇2 > 0 𝐻1 : 𝜇1 < 𝜇2 r 𝐻1 : 𝜇1 − 𝜇2 < 0
University of Mindanao 42
College of Teacher Education “Physically Distanced but Academically Engaged”
The same critical values used in the previous examples. They can be obtained from
Table E in the appendix.
If 𝜎12 and 𝜎22 are no known, the researcher can use the variances from each sample 𝑠12
and 𝑠22 , but a t test must be used (this will be explained in the next topic). The basic
format for hypothesis testing using the traditional method is reviewed here.
Solution
There is enough evidence to support the claim that the means are not
equal. Hence, there is a significant difference in the rates.
WRITE IT OUT!
ACTIVITY 1!
After the discussion on using z test on testing the difference between two
means, you have now known the preliminary concepts about advanced statistics. With
that, let now us check your knowledge by doing the following tasks. Now, I will require
you to explain thoroughly your answers about the given argument.
1. Explain the difference between testing a single mean and testing the difference
between two means.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 44
College of Teacher Education “Physically Distanced but Academically Engaged”
2. What two assumptions must be met when you are using the z test to test differences
between two means?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
3. Show two different ways to state that the means of two populations are equal.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
After the discussion on using z test on testing the difference between two
means, you have now known the preliminary concepts about advanced statistics. With
that, let now us check your knowledge by doing the following tasks.
DIRECTION: Perform each of the following steps for the set of questions.
a. State the hypotheses and identify the claim.
b. Find the critical value(s).
c. Compute the test value.
d. Make the decision.
e. Summarize the results.
1. ACT Scores. A survey of 1000 students nationwide showed a mean ACT score of
21.4. A survey of 500 Ohio scores showed a mean of 20.8. If the population standard
deviation in each case is 3, can we conclude that Ohio is below the national average?
Use 𝛼 = 0.05.
2. Lengths of Hospital Stays. The average length of “short hospital stays” for men is
slightly longer than that for women, 5.2 days versus 4.5 days. A random sample of
recent hospital stays for both men and women revealed the following. At 𝛼 = 0.01, is
there sufficient evidence to conclude that the average hospital stay for men is longer
than the average hospital stay for women?
Men Women
Sample Size 32 30
Sample mean 5.5 days 4.2 days
Population standard deviation 1.2 days 1.5 days
University of Mindanao 45
College of Teacher Education “Physically Distanced but Academically Engaged”
GO BEYOND!
Based on the discussion about using z test for testing the difference between
two means, it is necessary to go further what was given by the book. You are tasked
now to answer the given problem as well as explain the reason behind your answers.
HOME RUNS
For a sports radio talk show, you are asked to research the question whether more
home runs are hit by players in the National League or by players in the American
League. You decide to use the home run leaders from each league for a 40-year period
as your data. The numbers are shown.
National League American League
47 73 65 49 40 32 52 48 56 50
46 38 47 49 37 46 44 36 29 40
40 31 48 52 38 39 22 45 39 36
44 48 45 39 52 32 37 44 44 49
49 50 79 47 43 57 47 56 52 40
35 40 39 37 36 43 51 42 49 43
37 48 45 38 36 39 41 46 32 32
40 45 36 44 47 32 33 49 44 32
1. Define a population.
2. What kind of sample was used?
3. Do you feel that is representative?
4. What are your hypotheses?
5. What significance level will you use?
6. What statistical test will you use?
7. What are the test results? (Assume 𝜎1 = 8.8 and 𝜎2 = 7.8.
8. What is your decision?
9. What can you conclude?
10. Do you feel that using the data given really answers the original question
asked?
11. What other data might be used to answer the question?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 46
College of Teacher Education “Physically Distanced but Academically Engaged”
Big Picture
Week 4-5: Unit Learning Outcomes (ULO): At the end of the unit, you are expected to:
a. Test the difference between two means for independent samples using t test,
b. Test the difference between two means for dependent samples, and
c. Draw a scatter plot and compute the correlation coefficient.
Metalanguage
For ULOa, the terms below will be defined to establish a common frame of
reference as to how the terms are addressed in this course.
TERMS DEFINITIONS
1. Independent ➢ These are samples which are not related to each
Samples other.
Essential Knowledge
To perform the aforesaid big picture (unit learning outcomes) for the fourth and
fifth weeks of the course, you need to fully understand the essential knowledge that
will be laid down in the succeeding pages. Please note that you are not limited to
exclusively refer to these resources. Thus, you are expected to utilize other books,
research articles and other resources that are available in the university’s library e.g.
ebrary, search.proquest.com etc.
The z test was used to test the difference between two means when the population
standard deviations were known and the variables were normally or approximately
normally distributed, or when both sample sizes were greater than or equal to 30. In
many situations, however, these conditions cannot be met — that is, the population
standard deviations are not known. In these cases, a t test is used to test the difference
between means when the two samples are independent and when the samples are
taken from two normally or approximately normally distributed populations. Samples
are independent samples when they are not related. Also, it will be assumed that the
variances are not equal.
University of Mindanao 47
College of Teacher Education “Physically Distanced but Academically Engaged”
Formula for the t Test – For Testing the Difference Between Two Means –
Independent Samples
The formula
(𝑋1 − 𝑋2 ) – (𝜇1 − 𝜇2)
𝑡=
𝑠 2𝑠 2
√ 1+ 2
𝑛1 𝑛2
where 𝑋1 − 𝑋2 is the observed difference between sample means and where the
expected value 𝜇1 − 𝜇2 is zero when no difference between population means is
𝑠12 𝑠22
hypothesized. The denominator √ + is the standard error of the difference
𝑛1 𝑛2
between two means. Since mathematical derivation of the standard error is somewhat
complicated, it will be omitted here.
Assumptions for the t Test for Two Independent Means When 𝝈𝟏 and 𝝈𝟐 are
Unknown
1. The samples are random samples.
2. The sample data are independent of one another.
3. When the sample sizes are less than 30, the populations must be normally or
approximately normally distributed.
Solution
There is not enough evidence to support the claim that the average size
of the farms is different.
After the discussion on using t test on testing the difference between two
means, you have now known the preliminary concepts about advanced statistics. With
that, let now us check your knowledge by doing the following tasks.
DIRECTION: Perform each of the following steps for the set of questions. For these
exercises assume that the variables are normally or approximately normally distributed
and the variances are unequal.
a. State the hypotheses and identify the claim.
b. Find the critical value(s).
c. Compute the test value.
d. Make the decision.
e. Summarize the results.
1. Ages of Gamblers. The mean age of a sample of 25 people who were playing the
slot machines is 48.7 years, and the standard deviation is 6.8 years. The mean age of
a sample of 35 people who were playing roulette is 55.3 with a standard deviation of
3.2 years. Can it be concluded at 𝛼 = 0.05 that the mean age of those playing the slot
machines is less than those playing roulette?
2. Bestseller Books. The mean for the number of weeks 15 New York Times hard-
cover fiction books spent on the bestseller list is 22 weeks. The standard deviation is
6.17 weeks. The mean for the number of weeks 15 New York Times hard-cover
nonfiction books spent on the list is 28 weeks. The standard deviation is 13.2 weeks.
At 𝛼 = 0.10, can we conclude that there is a difference in the mean times for the
number of weeks the books were on the bestseller lists?
3. Weights of Vacuum Cleaners. Upright vacuum cleaners have either a hard body
type or a soft body type. Shown are the weights in pounds of a sample of each type.
At 𝛼 = 0.05, can it be concluded that the means of the weights are different?
Hard copy types Soft body types
21 18 15 16 17 24 13 11 13 12
16 17 17 20 18 15
23 17 15 20
13 16 17
GO BEYOND!
Based on the discussion about using t test for testing the difference between
two means, it is necessary to go further what was given by the book. You are tasked
now to answer the given problem as well as explain the reason behind your answers.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 51
College of Teacher Education “Physically Distanced but Academically Engaged”
Metalanguage
To clearly understand the ULOb, you need to know the important terms relevant
to this course. Kindly adopt with the definitions below.
TERMS DEFINITIONS
1. Dependent ➢ Samples are considered to be dependent
Samples samples when the subjects are paired or
matched in some way.
Essential Knowledge
In this part of the module, we will explore how to test the difference between
two means for dependent samples. Follow closely the discussion and answer the
exercises.
The t test was used to compare two sample means when the samples were
independent. In this section, a different version of the t test is explained. This version
is used when the samples are dependent. Samples are considered to be dependent
samples when the subjects are paired or matched in some way.
For example, suppose a medical researcher wants to see whether a drug will affect
the reaction time of its users. To test this hypothesis, the researcher must pretest the
subjects in the sample first. That is, they are given a test to ascertain their normal
reaction times. Then after taking the drug, the subjects are tested again, using a
posttest. Finally, the means of the two tests are compared to see whether there is a
difference. Since the same subjects are used in both cases, the samples are related;
subjects scoring high on the pretest will generally score high on the posttest, even
after consuming the drug. Likewise, those scoring lower on the pretest will tend to
score lower on the posttest. To take this effect into account, the researcher employs
a t test, using the differences between the pretest values and the posttest values.
Thus, only the gain or loss in values is compared.
Here are some other examples of dependent samples. A researcher may want to
design an SAT preparation course to help students raise their test scores the second
time they take the SAT. Hence, the differences between the two exams are compared.
University of Mindanao 52
College of Teacher Education “Physically Distanced but Academically Engaged”
A medical specialist may want to see whether a new counseling program will help
subjects lose weight. Therefore, the pre weights of the subjects will be compared with
the post weights.
When the samples are dependent, a special t test for dependent means is used. This
test employs the difference in values of the matched pairs. The hypotheses are as
follows:
where 𝜇𝐷 is the symbol for the expected mean of the difference of the matched pairs.
The general procedures for finding the test value involves several steps.
𝐷 = 𝑋1 − 𝑋2
∑𝐷
̅=
𝐷 where 𝑛 is the number of data pairs.
𝑛
Third, find the standard deviation 𝑠𝐷 of the differences, using the formula
2
𝑛 ∑ 𝐷 − (∑ 𝐷)2
𝑠𝐷 = √
𝑛(𝑛 − 1)
Fourth, find the estimated standard error 𝑠𝐷̅ of the differences, which is
𝑠𝐷
𝑠𝐷̅ =
√𝑛
̅ − 𝜇𝐷
𝐷
𝑡= with d. f. = 𝑛 − 1
𝑠𝐷 / √𝑛
where the observed value is the mean of the differences. The expected value 𝜇𝐷 is
zero if the hypothesis is 𝜇𝐷 = 0. The standard error of the difference is the standard
deviation of the difference, divided by the square root of the sample size. Both
populations must be normally or approximately normally distributed.
Assumptions for the t Test for Two Independent Means When 𝝈𝟏 and 𝝈𝟐 are
Unknown
1. The sample or samples are random.
2. The sample data are dependent.
3. When the sample size or sample sizes are less than 30, the population must
be normally or approximately normally distributed.
Bank 1 2 3 4 5 6 7 8 9
3 years ago 11.42 8.41 3.98 7.37 2.28 1.10 1.00 0.9 1.35
Today 16.69 9.44 6.53 5.58 2.92 1.88 1.78 1.5 1.22
Solution
11.42 16.69
8.41 9.44
3.98 6.53
7.37 5.58
2.28 2.92
University of Mindanao 54
College of Teacher Education “Physically Distanced but Academically Engaged”
1.10 1.88
1.00 1.78
0.90 1.50
1.35 1.22
∑𝐷 −9.73
̅=
𝐷 = = −1.081
𝑛 9
2
𝑛 ∑ 𝐷 − (∑ 𝐷)2
𝑠𝐷 = √
𝑛(𝑛 − 1)
(9)(40.5437) − (−9.73)2
𝑠𝐷 = √
9(9 − 1)
University of Mindanao 55
College of Teacher Education “Physically Distanced but Academically Engaged”
270.2204
𝑠𝐷 = √
72
𝑠𝐷 = 1.937
̅ − 𝜇𝐷
𝐷 −1.081 − 0
𝑡= = = −𝟏. 𝟔𝟕
𝑠𝐷 / √𝑛 1.937 / √9
There is not enough evidence to show that the deposits have increased
over the last 3 years.
ACTIVITY 1!
Classify the following situations whether each is considered independent or
dependent samples.
1. Heights of identical twins
2. Test scores of the same students in English and psychology
3. The effectiveness of two different brands of aspirin
4. Effects of a drug on reaction time, measured by a before-and-after test
5. The effectiveness of two different diets on two different groups of individuals
ACTIVITY 2!
After the discussion on testing the difference between two means of dependent
samples, you have now known the preliminary concepts about advanced statistics.
With that, let now us check your knowledge by doing the following tasks.
DIRECTION: Perform each of the following steps for the set of questions. For these
exercises assume that the variables are normally or approximately normally
distributed.
a. State the hypotheses and identify the claim.
b. Find the critical value(s).
c. Compute the test value.
d. Make the decision.
e. Summarize the results.
2. Bestseller Books. As an aid for improving students’ study habits, nine students
were randomly selected to attend a seminar on the importance of education in life. The
table shows the number of hours each student studied per week before and after the
seminar. At 0.10 (significance level), did attending the seminar increase the number
of hours the students studied per week?
Before 9 12 6 15 3 18 10 13 7
After 9 17 9 20 2 21 15 22 6
University of Mindanao 57
College of Teacher Education “Physically Distanced but Academically Engaged”
GO BEYOND!
Based on the discussion about using t test for testing the difference between
two means, it is necessary to go further what was given by the book. You are tasked
now to answer the given problem as well as explain the reason behind your answers.
AIR QUALITY
As a researcher for the EPA, you have been asked to determine if the air quality in the
United States has changed over the past 2 years. You select a random sample of 10
metropolitan areas and find the number of days each year that the areas failed to meet
acceptable air quality standards. The data are shown.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 58
College of Teacher Education “Physically Distanced but Academically Engaged”
Metalanguage
To clearly understand the ULOc, you need to know the important terms relevant
to this course. Kindly adopt with the definitions below.
TERMS DEFINITIONS
1. Correlation ➢ It is a statistical method used to determine
whether a linear relationship between variables
exists.
Essential Knowledge
In this part of the module, we will explore about correlation and regression. In
addition, we will explore how to create a scatter plot and calculate the correlation
coefficient.
To answer the first two questions, statisticians use a numerical measure to determine
whether two or more variables are linearly related and to determine the strength of the
relationship between or among the variables. This measure is called a correlation
coefficient. For example, there are many variables that contribute to heart disease,
among them lack exercise, smoking, heredity, age, stress, and diet. Of these
variables, some are more important than others; therefore, a physician who wants to
help a patient must know which factors are most important.
To answer the third question, you must ascertain what type of relationship exists.
There are two types of relationships: simple and multiple. In a simple relationship,
there are two variables — an independent variable, also called an explanatory
variable or a predictor variable, and a dependent variable, also called a response
variable. A simple relationship analysis is called simple regression, and there is one
independent variable that is used to predict the dependent variable. For example, a
manager may wish to see whether the number of years the salespeople have been
working for the company has anything to do with the amount of sales they make. This
type of study involves a simple relationship, since there are only two variables—years
of experience and amount of sales.
Finally, the fourth question asks what type of predictions can be made. Predictions are
made in all areas and daily. Examples include weather forecasting, stock market
analyses, sales predictions, crop predictions, gasoline price predictions, and sports
predictions. Some predictions are more accurate than others, due to the strength of
the relationship. That is, the stronger the relationship is between variables, the more
accurate the prediction is.
University of Mindanao 60
College of Teacher Education “Physically Distanced but Academically Engaged”
As stated previously, the two variables for this study are called the independent
variable and the dependent variable. The independent variable is the variable in
regression that can be controlled or manipulated. In this case, the number of hours of
study is the independent variable and is designated as the x variable. The dependent
variable is the variable in regression that cannot be controlled or manipulated. The
grade the student received on the exam is the dependent variable, designated as the
y variable. The reason for this distinction between the variables is that you assume
that the grade the student earns depends on the number of hours the student studied.
Also, you assume that, to some extent, the student can regulate or control the number
of hours he or she studies for the exam.
The determination of the x and y variables is not always clear-cut and is sometimes
an arbitrary decision. For example, if a researcher studies the effects of age on a
person’s blood pressure, the researcher can generally assume that age affects blood
pressure. Hence, the variable age can be called the independent variable, and the
variable blood pressure can be called the dependent variable. On the other hand, if a
researcher is studying the attitudes of husbands on a certain issue and the attitudes
of their wives on the same issue, it is difficult to say which variable is the independent
variable and which is the dependent variable. In this study, the researcher can
arbitrarily designate the variables as independent and dependent.
The independent and dependent variables can be plotted on a graph called a scatter
plot. The independent variable x is plotted on the horizontal axis, and the dependent
variable y is plotted on the vertical axis.
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the
independent variable x and the dependent variable y.
University of Mindanao 61
College of Teacher Education “Physically Distanced but Academically Engaged”
The scatter plot is a visual way to describe the nature of the relationship between the
independent and dependent variables. The scales of the variables can be different,
and the coordinates of the axes are determined by the smallest and largest data values
of the variables.
Solution
Solution
D 54 21.5
E 79 40
F 69 16
G 61 19.6
H 65 19
Solution
After the plot is drawn, it should be analyzed to determine which type of relationship,
if any, exists. For example, the plot shown in Example 11 suggests a positive
relationship, since as the number of cars rented increases, revenue tends to increase
also. The plot of the data shown in Example 12 suggests a negative relationship, since
as the number of absences increases, the final grade decreases. Finally, the plot of
the data shown in Example 13 shows no specific type of relationship, since no pattern
is discernible. Note that the data shown in Examples 11 and 12 also suggest a linear
relationship, since the points seem to fit a straight line, although not perfectly.
The correlation coefficient computed from the sample data measures the strength
and direction of a linear relationship between two quantitative variables. The symbol
for the sample correlation coefficient is r. The symbol for the population correlation
coefficient is 𝜌 (Greek letter rho).
The range of the correlation coefficient is from −1 to +1. If there is a strong positive
linear relationship between the variables, the value of r will be close to +1. If there is
a strong negative relationship between the variables, the value of r will be close to −1.
When there is no linear relationship between the variables or only a weak relationship,
the value of r will be close to 0. See Figure 12.
In addition, the graphs in Figure 13 show the relationship between the correlation
coefficients and their corresponding scatter plots. Notice that as the value of the
correlation coefficient increase from 0 to +1 (parts a, b, and c), data values become
closer to an increasingly strong relationship. As the value of the correlation coefficient
decreases from 0 to −1 (parts d, e, and f), the data values also become closer to a
straight line. Again, this suggests a stronger relationship.
Figure 13. Relationship Between the Correlation Coefficient and the Scatter Plot
University of Mindanao 65
College of Teacher Education “Physically Distanced but Academically Engaged”
There are several ways to compute the value of the correlation coefficient. One
method is to use the formula shown here.
Rounding Rule for the Correlation Coefficient: Round the value of r to threes
decimal places. The formula looks somewhat complicated, but using a table to
compute the values makes it somewhat easier to determine the value of r. There are
no units associated with r, and the value of r will remain unchanged if the x and y are
switched.
Cars x Revenue y
Company (in ten thousands) (in billions) xy x2 y2
A 63.0 7.0
University of Mindanao 66
College of Teacher Education “Physically Distanced but Academically Engaged”
B 29.0 3.9
C 20.8 2.1
D 19.1 2.8
E 13.4 1.4
F 8.5 1.5
Step 2 2 2
Find the values of xy, x , and y and place these values in the
corresponding columns of the table.
The completed table is shown.
Cars x
Revenue y
(in ten
Company (in billions) xy x2 y2
thousands)
A 63.0 7.0 441.00 3969.00 49.00
B 29.0 3.9 113.10 841.00 15.21
C 20.8 2.1 43.68 432.64 4.41
D 19.1 2.8 53.48 364.81 7.84
E 13.4 1.4 18.76 179.56 1.96
F 8.5 1.5 12.75 72.25 2.25
----------- ---------- --------- ---------- -------
TOTAL: 153.8 18.7 682.77 5859.26 80.67
∑ 𝑥 = 153.8
∑ 𝑦 = 18.7
∑ 𝑥𝑦 = 682.77
∑ 𝑥 2 = 5859.26
∑ 𝑦 2 = 80.67
(6)(682.77) − (153.8)(18.7)
𝑟= = 𝟎. 𝟗𝟖𝟐
√[(6)(5859.26) − (153.8)2 ] [(6)(80.67) − (18.7)2 ]
B 65 26
C 53 50
D 54 21.5
E 79 40
F 69 16
G 61 19.6
H 65 19
Solution
Step 2 Find the values of xy, x2, and y2 and place these values in the
corresponding columns of the table.
The completed table is shown.
Revenue
Cars x
y
(in ten
Company (in xy x2 y2
thousands)
billions)
A 73 16 1168 5329 256
B 65 26 1690 4225 676
C 53 50 2650 2809 2500
D 54 21.5 1161 2916 462.25
E 79 40 3160 6241 1600
F 69 16 1104 4761 256
G 61 19.6 1195.6 3721 384.16
H 65 19 1235 4225 361
----------- ---------- --------- --------- -------
TOTAL: 519 208.1 13363.6 34,227 6495.41
∑ 𝑥 = 519
∑ 𝑦 = 208.1
∑ 𝑥𝑦 = 13363.6
∑ 𝑥 2 = 34227
∑ 𝑦 2 = 6945.41
(8)(13,363.6) − (519)(208.1)
𝑟= = −𝟎. 𝟏𝟕𝟔
√[(8)(34,227) − (519)2 ] [(8)(6495.41) − (208.1)2 ]
University of Mindanao 68
College of Teacher Education “Physically Distanced but Academically Engaged”
In example 14, the value of r was high (close to 1.00); in example 15, the value of r
was much lower (closer to 0). This question then arises, When is the value of r due to
chance, and when does it suggest a significant linear relationship between the
variables?
The Significance of the Correlation Coefficient: As stated before, the range of the
correlation coefficient is between −1 and +1. When the value of r is near +1 or −1,
there is a strong linear relationship. When the value of r is near 0, the linear relationship
is weak or nonexistent. Since the value of r is computed from the data obtained from
samples, there are two possibilities when r is not equal to zero; either the value of r is
high enough to conclude that there is a significant linear relationship between the
variables, or the value of r is due to chance.
WRITE IT OUT!
ACTIVITY 1!
After the discussion on correlation, regression, scatter plots, and correlation
coefficient, you have now known the preliminary concepts about advanced statistics.
With that, let now us check your knowledge by doing the following tasks. Now, I will
require you to explain thoroughly your answers about the given argument.
3. What is meant when the relationship between two variables is called positive?
Negative?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
4. Give examples of two variables that are positively correlated and two variables
that are negatively correlated.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
5. What is the diagram of the independent and dependent variables called? Why is
drawing this diagram important?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 70
College of Teacher Education “Physically Distanced but Academically Engaged”
ACTIVITY 1!
After the discussion on correlation, regression, scatter plots, and correlation
coefficient, you have now known the preliminary concepts about advanced statistics.
With that, let now us check your knowledge by doing the following tasks.
DIRECTION: Perform each of the following steps for the set of questions.
a. Draw the scatter plot for the variables.
b. Compute the value of the correlation coefficient.
c. State the hypotheses.
d. Test the significance of the correlation coefficient at 𝛼 = 0.05, using Table I.
e. Give a brief explanation of the type of relationship.
1. Gas Tax and Fuel Use. The data below indicate the state gas tax in cents per
gallon and the fuel use per registered vehicle (in gallons). Is there a significant
relationship between these two variables?
Tax 21.5 23 18 24.5 26.4 19
Usage 1062 631 920 686 736 684
3. School Districts and Secondary Schools. A random sample of states yielded the
following numbers of local school districts and the corresponding numbers of
secondary schools. Is there a significant relationship between the data?
School Districts 53 19 24 17 95 68
Secondary Schools 50 27 187 84 143 216
GO BEYOND!
Based on the discussion about using z test for testing the difference between
two means, it is necessary to go further what was given by the book. You are tasked
now to answer the given problem as well as explain the reason behind your answers.
STOPPING DISTANCES
In a study on speed control, it was found that the main reasons for regulations were to
make traffic flow more efficient and to minimize the risk of danger. An area that was
focused on in the study was the distance required to completely stop a vehicle at
various speeds. Use the following table to answer the questions.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 72
College of Teacher Education “Physically Distanced but Academically Engaged”
Big Picture
Week 6-7: Unit Learning Outcomes (ULO): At the end of the unit, you are expected to:
Metalanguage
For ULOa, the terms below will be defined to establish a common frame of
reference as to how the terms are addressed in this course.
TERMS DEFINITIONS
1. Chi-square ➢ It is a statistical method assessing the goodness of
fit between observed values and those expected
theoretically.
Essential Knowledge
To perform the aforesaid big picture (unit learning outcomes) for the sixth and
seventh weeks of the course, you need to fully understand the essential knowledge
that will be laid down in the succeeding pages. Please note that you are not limited to
exclusively refer to these resources. Thus, you are expected to utilize other books,
research articles and other resources that are available in the university’s library e.g.
ebrary, search.proquest.com etc.
CHI-SQUARE TESTS
The chi-square distribution can also be used for test concerning frequency
distributions, such as “If a sample of buyers is given a choice of automobile colors, will
each color be selected with the same frequency?” The chi-square distribution can be
used to test the independence of two variables, for example, “Are senators’ opinions
on gun control independent of party affiliations?” That is, do the republicans feel one
way and the Democrats feel differently, or do they have the same opinion?
University of Mindanao 73
College of Teacher Education “Physically Distanced but Academically Engaged”
Finally, the chi-square distribution can be used to test the homogeneity of proportions.
For example, is the proportion of high school seniors who attend college immediately
after graduating the same for the northern, southern, eastern, and western parts of the
United States? This section explains the chi-square distribution and its application.
In addition to being used to test a single variance, the chi-square statistic can be used
to see whether a frequency distribution fits a specific pattern. For example, to meet
customer demands, a manufacturer of running shoes may wish to see whether buyers
show a preference for a specific style. A traffic engineer may wish to see whether
accidents occur more often on some days than on others, so that she can increase
police patrols accordingly. An emergency service may want to see whether it receives
more calls at certain times of the day than at others, so that it can provide adequate
staffing.
When you are testing to see whether a frequency distribution fits a specific pattern,
you can use the chi-square goodness-of-fit test. For example, suppose as a market
analyst you wished to see whether consumers have any preference among five flavors
of a new fruit soda. A sample of 100 people provided these data.
If there were no preference, you would expect each flavor to be selected with equal
frequency. In this case, the equal frequency is 100 / 5 = 20. That is, approximately 20
people would select each flavor.
Since the frequencies for each flavor were obtained from a sample, these actual
frequencies are called the observed frequencies. The frequencies obtained by
calculation (as if there were no preference) are called the expected frequencies. A
completed table for the test is shown.
The observed frequencies will almost differ from the expected frequencies due to
sampling error; that is, the values differ from sample to sample. But the question is:
Are these differences significant (a preference exists), or are they due to chance? The
chi-square goodness-of-fit test will enable the researcher to determine the answer.
Before computing the test value, you must state the hypotheses. The null hypothesis
should be a statement indicating that there is no difference or no change.
University of Mindanao 74
College of Teacher Education “Physically Distanced but Academically Engaged”
In the goodness-of-fit test, the degrees of freedom are equal to the number of
categories minus 1. For this example, there are five categories (cherry, strawberry,
orange, lime, and grape); hence, the degrees of freedom are 5 – 1 = 4. This is so
because the number of subjects in each of the first four categories is free to vary. But
in order for the sum to be 100 – the total number of subjects – the number of subjects
in the last category is fixed.
(𝑂 − 𝐸)2
𝜒2 = ∑
𝐸
with degrees of freedom equal to the number of categories minus 1, and where
𝑂 = observed frequency
𝐸 = expected frequency
Two assumptions are needed for the goodness-of-fit test. These assumptions are
given next.
This test is a right-tailed test, since when the 𝑂 − 𝐸 values are squared, the answer
will be positive or zero. Consider the next example.
(𝑂 − 𝐸)2
𝜒2 = ∑
𝐸
2
(32 − 20)2 (28 − 20)2 (16 − 20)2 (14 − 20)2 (10 − 20)2
𝜒 = + + + +
20 20 20 20 20
𝜒 2 = 18.0
To get some idea of why this test is called the goodness-of-fit test, examine graphs of
the observed values and expected values. See Figure 14. From the graphs, you can
see whether the observed values and expected values are close together or far apart.
University of Mindanao 76
College of Teacher Education “Physically Distanced but Academically Engaged”
Figure 14. Graphs of the Observed and Expected Values for Soda Flavors
When the observed values and expected values are close together, the chi-square
test value will be small. Then the decision will be to not reject the null hypothesis –
hence, there is “a good fit”. When the observed values and the expected values are
far apart, the chi-square test value will be large. Then the null hypothesis will be
rejected – hence, there is “not a good fit”. See Figure 15 for the illustration.
When there is perfect agreement between the observed and the expected values,
𝜒 2 = 0. Also, 𝜒 2 can never be negative. Finally, the test is right-tailed because “H0:
Good fit” and “H1: Not a good fit” mean that 𝜒 2 will be small in the first case and
large in the second case.
district, there were 68 accidental deaths, 27 homicides, and 5 suicides during the past
year. At 𝛼 = 0.10, test the claim that the percentages are equal.
Solution
0.74 × 100 = 74
0.16 × 100 = 16
0.10 × 100 = 10
(𝑂 − 𝐸)2
𝜒2 = ∑
𝐸
𝜒 2 = 10.549
There is enough evidence to reject the claim that the distribution is 74%
accidental, 16% homicides, and 10% suicides.
University of Mindanao 78
College of Teacher Education “Physically Distanced but Academically Engaged”
WRITE IT OUT!
ACTIVITY 1!
After the discussion on goodness-of-fit test, you have now known the
preliminary concepts about advanced statistics. With that, let now us check your
knowledge by doing the following tasks. Now, I will require you to explain thoroughly
your answers about the given argument.
1. How are the degrees of freedom computed for the goodness-of-fit test?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
2. How are the expected values computed for the goodness-of-fit test?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
3. When the expected frequencies are less than 5 for a specific class, what should be
done so that you can use the goodness-of-fit test?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 79
College of Teacher Education “Physically Distanced but Academically Engaged”
ACTIVITY 1!
After the discussion on goodness-of-fit test, you have now known the
preliminary concepts about advanced statistics. With that, let now us check your
knowledge by doing the following tasks.
DIRECTION: Perform each of the following steps for the set of questions.
a. State the hypotheses and identify the claim.
b. Find the critical value(s).
c. Compute the test value.
d. Make the decision.
e. Summarize the results.
3. Music Sales. In a recent year, 77.8% of recorded music sales were full-length CDs,
12.8% digital downloads, 3.8% singles, and the rest a mixture of other formats. A
survey of college students and their recent music purchases indicated that out of 200
purchases, 48 were downloads, 135 were full-length CDs, 10 were singles, and the
rest fit into the “other” category. At the 0.05 level of significance is there sufficient
evidence to conclude that at least one of the proportions differs from its original value?
University of Mindanao 80
College of Teacher Education “Physically Distanced but Academically Engaged”
GO BEYOND!
Flavor
Bag Green Orange Red Purple Yellow
1 7 20 10 7 14
2 20 5 5 13 17
3 4 16 13 21 4
4 12 9 16 3 17
Total 43 50 44 44 52
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 81
College of Teacher Education “Physically Distanced but Academically Engaged”
Metalanguage
To clearly understand the ULOb, you need to know the important terms relevant
to this course. Kindly adopt with the definitions below.
TERMS DEFINITIONS
1. Independence Test ➢ It used to determine whether two variables are
independent of or related to each other when a
single sample is selected.
2. Contingency Table ➢ It is when data are arranged in table form for the
chi-square independence test.
Essential Knowledge
In this part of the module, we will explore how to use chi-square distribution for
independence test. Follow closely the discussion and answer the exercises.
The chi-square independence test can be used to determine whether two variables
are independent of or related to each other when a single sample is selected. For
example, suppose a new postoperative procedure is administered to a number of
patients in a large hospital. The researcher can ask the question, Do the doctors feel
differently about this procedure from the nurses, or do they feel basically the same
way? Note that the question is not whether they prefer the procedure but whether there
is a difference of opinion between the two groups.
To answer this question, a researcher selects a sample of nurses and doctors and
tabulates the data in table form, as shown.
As the survey indicates, 100 nurses prefer the new procedure, 80 prefer the old
procedure, and 20 have no preference; 50 doctors prefer the new procedure, 120 like
the old procedure, and 30 have no preference. Since the main question is whether
University of Mindanao 82
College of Teacher Education “Physically Distanced but Academically Engaged”
there is a difference in opinion, the null and alternative hypothesis are stated as
follows:
If the null hypothesis is not rejected, the test means that both professions feel basically
the same way about the procedure and the differences are due to chance. If the null
hypothesis is rejected, the test means that one group feels differently about the
procedure from the other. Remember that rejection does not mean that one group
favors the procedure and the other does not. Perhaps both groups favor it or both
dislike it, but in different proportions.
To test the null hypothesis by using the chi-square independence test, you must
compute the expected frequencies, assuming that the null hypothesis is true. These
frequencies are computed by using the observed frequencies given in the table.
When data are arranged in table form for the chi-square independence test, the table
is called a contingency table. The table is made up of R rows and C columns. The
table here has two rows and three columns.
Note that row and column headings do not count in determining the number of rows
and columns.
The degrees of freedom for any contingency table are (rows – 1) times (columns – 1);
that is, d.f. = (R – 1)(C – 1). In this case, (2 – 1)(3 – 1) = (1)(2) = 2. The reason for this
formula for d.f. is that all expected values except one are free vary in each row and in
each column.
University of Mindanao 83
College of Teacher Education “Physically Distanced but Academically Engaged”
Using the previous table, you can compute the expected frequencies for each block
(or cell), as shown next.
1. Find the sum of each row and each column, and find the grand total, as shown.
2. For each cell, multiply the corresponding row sum by the column sum and divide by
the grand total, to get the expected value:
(200)(200)
𝐸1,2 = = 100
400
The expected values can now be placed in corresponding cells along with the
observed values, as shown.
The rationale for the computation of the expected frequencies for a contingency table
uses proportions. For C1, 1 a total of 150 out of 400 people prefer the new procedure.
Since there are 200 nurses, you would expect, if the null hypothesis were true, (150 /
400) (200), or 75, of the nurses to be in favor of the new procedure.
The formula for the test value for the independence test is the same as the one used
for the goodness-of-fit test. It is
2
(𝑂 − 𝐸)2
𝜒 = ∑
𝐸
For the previous example, compute the (𝑂 − 𝐸)2 / 𝐸 values for each cell, and then
find the sum.
(𝑂 − 𝐸)2
𝜒2 = ∑
𝐸
2
(100 − 75)2 (80 − 100)2 (20 − 25)2 (50 − 75)2
𝜒 = + + +
75 100 25 75
(120 − 100)2 (30 − 25)2
+ +
100 25
𝜒 2 = 26.67
The final steps are to make the decision and summarize the results. This test is always
a right-tailed test, and the degrees of freedom are (R – 1)(C – 1) = (2 – 1)(3 – 1) = 2.
If 𝛼 = 0.05, the critical value from Table G in the appendix is 5.991. Hence, the decision
is to reject the null hypothesis since 26.67 > 5.991. See illustration below.
The conclusion is that there is enough evidence to support the claim that opinion is
related to (dependent on) profession – that is, that the doctors and nurses differ in their
opinions about the procedure.
At 𝛼 = 0.05, can it be concluded that the number of infections is related to the hospital
where they occurred?
Solution
2
(𝑂 − 𝐸)2
𝜒 = ∑
𝐸
2
(41 − 50.30)2 (27 − 27.81)2 (51 − 40.89)2 (36 − 33.39)2
𝜒 = + + +
50.30 27.81 40.89 33.39
(3 − 18.46)2 (40 − 27.15)2 (169 − 162.31)2
+ + +
18.46 27.15 162.31
(106 − 89.73)2 (109 − 131.96)2
+ +
89.73 131.96
𝜒 2 = 30.698
WRITE IT OUT!
ACTIVITY 1!
After the discussion on independence test, you have now known the preliminary
concepts about advanced statistics. With that, let now us check your knowledge by
doing the following tasks. Now, I will require you to explain thoroughly your answers
about the given argument.
1. How is the chi-square independence test similar to the goodness-of-fit test? How is
it different?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
2. How are the degrees of freedom computed for the independence test?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
3. Generally, how would the null and alternative hypotheses be stated for the chi-
square independence test?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
4. How are the expected values computed for each cell in the table?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 88
College of Teacher Education “Physically Distanced but Academically Engaged”
After the discussion on independence test, you have now known the preliminary
concepts about advanced statistics. With that, let now us check your knowledge by
doing the following tasks.
DIRECTION: Perform each of the following steps for the set of questions.
a. State the hypotheses and identify the claim.
b. Find the critical value(s).
c. Compute the test value.
d. Make the decision.
e. Summarize the results.
2. Women in the Military. This table lists the numbers of officers and enlisted
personnel for women in the military. At 𝛼 = 0.05, is there sufficient evidence to
conclude that a relationship exists between rank and branch of the Armed Forces?
Officers Enlisted
Army 10, 791 62, 491
Navy 7, 816 42, 750
Marine Corps 932 9, 525
Air Force 11, 819 54, 344
3. Medal Counts for the Olympics. The 2010 Winter Olympics final medal counts for
the top four nations are shown below. At the 0.10 level of significance can it be
concluded that the type of medal won was dependent upon the competing country?
Gold Silver Bronze
United States 9 15 13
Germany 10 13 7
Canada 14 7 5
Norway 9 8 6
University of Mindanao 89
College of Teacher Education “Physically Distanced but Academically Engaged”
GO BEYOND!
18 - 29 30 - 49 50 - 64 65 and up
For 96 (79.5) 96 (79.5) 90 (79.5) 36 (79.5)
Against 201 (204.75) 189 (204.75) 195 (204.75) 234 (204.75)
Don’t Know 3 (15.75) 15 (15.75) 15 (15.75) 30 (15.75)
1. Which number from the output is compared to the significance level to check
if the null hypothesis should be rejected?
2. Which number from the output gives the probability of a type I error that is
calculated from your sample data?
3. Was a right-, left-, or two-tailed test run? Why?
4. Can you tell how many rows and columns there were by looking at the
degrees of freedom?
5. Does increasing the sample size change the degrees of freedom?
6. What are your conclusions? Look at the observed and expected frequencies
in the table to draw some of your own specific conclusions about response and
age.
7. What would your conclusions be if the level of significance were initially set
at 0.10?
8. Does chi-square tell you which cell’s observed and expected frequencies are
significantly different?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 90
College of Teacher Education “Physically Distanced but Academically Engaged”
Metalanguage
To clearly understand the ULOc, you need to know the important terms relevant
to this course. Kindly adopt with the definitions below.
TERMS DEFINITIONS
1. Analysis of ➢ It is used to test claims involving three or more
Variance (ANOVA) means.
Essential Knowledge
In this part of the module, we will explore analysis of variance. We will explore
one-way and two-way analysis of variance.
ANALYSIS OF VARIANCE
The F test, used to compare two variances, can also be used to compare three or
more means. This technique is called analysis of variance, or ANOVA. It is used to
test claims involving three or more means. (Note: The F test can also be used to test
the equality of two means. Since it is equivalent to the t test in this case, the t test is
usually used instead of the F test when there are only two means.) For example,
suppose a researcher wishes to see whether the means of the time it takes three
groups of students to solve a computer problem using Fortran, Basic, and Pascal are
different. The researcher will use the ANOVA technique for this test. The z and t tests
should not be used when three or more means are compared.
For three groups, the F test can only show whether a difference exists among the three
means. It cannot reveal where the difference lies—that is, between 𝑋̅1 and 𝑋̅2, or 𝑋̅1
and𝑋̅3, or 𝑋̅2 and 𝑋̅3. If the F test indicates that there is a difference among the means,
other statistical tests are used to find where the difference exists. The most commonly
used tests are the Scheffé test and the Tukey test.
University of Mindanao 91
College of Teacher Education “Physically Distanced but Academically Engaged”
The analysis of variance that is used to compare three or more means is called a one-
way analysis of variance since it contains only one variable. In the previous example,
the variable is the type of computer language used. The analysis of variance can be
extended to studies involving two variables, such as type of computer language used
and mathematical background of the students. These studies involve a two-way
analysis of variance.
When an F test is used to test a hypothesis concerning the means of three or more
populations, the technique is called analysis of variance (commonly abbreviated as
ANOVA). At first glance, you might think that to compare the means of three or more
samples, you can use the t test, comparing two means at a time. But there are several
reasons why the t test should not be done.
First, when you are comparing two means at a time, the rest of the means under study
are ignored. With the F test, all the means are compared simultaneously. Second,
when you are comparing two means at a time and making all pairwise comparisons,
the probability of rejecting the null hypothesis when it is true is increased, since the
more t tests that are conducted, the greater is the likelihood of getting significant
differences by chance alone. Third, the more means there are to compare, the more t
tests are needed. For example, for the comparison of 3 means two at a time, 3 t tests
are required. For the comparison of 5 means two at a time, 10 tests are required. And
for the comparison of 10 means two at a time, 45 tests are required.
Even though you are comparing three or more means in this use of the F test,
variances are used in the test instead of means.
With the F test, two different estimates of the population variance are made. The first
estimate is called the between-group variance, and it involves finding the variance
of the means. The second estimate, the within-group variance, is made by
computing the variance using all the data and is not affected by differences in the
means. If there is no difference in the means, the between-group variance estimate
will be approximately equal to the within-group variance estimate, and the F test value
will be approximately equal to 1. The null hypothesis will not be rejected. However,
when the means differ significantly, the between-group variance will be much larger
than the within-group variance; the F test value will be significantly greater than 1; and
the null hypothesis will be rejected. Since variances are compared, this procedure is
called analysis of variance (ANOVA).
University of Mindanao 92
College of Teacher Education “Physically Distanced but Academically Engaged”
For a test of the difference among three or more means, the following hypotheses
should be used:
𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘
𝐻1 : At least one mean is different from the others.
As stated previously, a significant test value means that there is a high probability that
this difference in means is not due to chance, but it does not indicate where the
difference lies.
The degrees of freedom for this F test are d.f.N. = 𝑘 − 1, where 𝑘 is the number of
groups, and d.f.D. = 𝑁 − 𝑘, where 𝑁 is the sum of the sample sizes of the groups 𝑁 =
𝑛1 + 𝑛2 + ⋯ + 𝑛𝑘 . The sample sizes need not be equal. The F test to compare
means is always right-tailed.
Solution
The critical value is 3.89, obtained from Table H in the appendix with 𝛼 =
0.05.
University of Mindanao 93
College of Teacher Education “Physically Distanced but Academically Engaged”
b. Find the grand mean. The grand mean, denoted by 𝑋̅𝐺𝑀 , is the mean
of all values in the samples.
∑𝑋 10 + 12 + 9 + 15 + 13 + 6 + 8 + ⋯ + 4 116
𝑋̅𝐺𝑀 = = = = 7.73
𝑁 15 15
When samples are equal in size, find 𝑋̅𝐺𝑀 by summing the 𝑋̅’s and
dividing by 𝑘, where 𝑘 = the number of groups.
∑ 𝑛𝑖 (𝑋̅𝑖 − 𝑋̅𝐺𝑀 )2
𝑠𝐵2 =
𝑘−1
Note: This formula finds the variance among the means by using the
sample sizes as weights and considers the differences in the means.
2
d. Find the within-group variance, denoted by 𝑠𝑊 .
2
∑(𝑛𝑖 − 1)𝑠𝑖2
𝑠𝑊 =
∑(𝑛𝑖 − 1)
2
(5 − 1)(5.7) + (5 − 1)(10.2) + (5 − 1)(10.3)
𝑠𝑊 =
(5 − 1)(5 − 1)(5 − 1)
2
104.80
𝑠𝑊 = = 8.73
12
University of Mindanao 94
College of Teacher Education “Physically Distanced but Academically Engaged”
𝑠𝐵2 80.07
𝐹= 2 = 8.73 = 𝟗. 𝟏𝟕
𝑠𝑊
There is enough evidence to reject the claim and conclude that at least
one mean is different from the others.
The numerator of the fraction obtained in step 3, part c, of the computational process
is called the sum of squares between groups, denoted by SS𝐵 . The numerator of the
fraction obtained in step 3, part d, of the computational procedure is called the sum of
squares within groups, denoted by SS𝑊 . This statistic is also called the sum of
squares for the error. SS𝐵 is divided by d.f.N. to obtain the between-group variance.
SS𝑊 is divided by 𝑁 − 𝑘 to obtain the within-group or error variance. These two
variances are sometimes called the mean squares, denoted by MS𝐵 and MS𝑊 . These
terms are used to summarize the analysis of variance and are placed in a summary
table.
Sum of Mean
Source squares d.f. square F
Between SS𝐵 𝑘−1 MS𝐵
Within (error) SS𝑊 𝑁−𝑘 MS𝑊
Total
In the table,
SS𝑊
MS𝑊 =
𝑁−𝑘
MS𝐵
𝐹=
MS𝑊
The totals are obtained by adding the corresponding columns. For example 19, the
ANOVA summary table is shown below.
Sum of Mean
Source squares d.f. square F
Between 160.13 2 80.07 9.17
Within (error) 104.80 12 8.73
Total 264.93 14
WRITE IT OUT!
After the discussion on analysis of variance, you have now known the
preliminary concepts about advanced statistics. With that, let now us check your
knowledge by doing the following tasks. Now, I will require you to explain thoroughly
your answers about the given argument.
After the discussion on analysis of variance, you have now known the
preliminary concepts about advanced statistics. With that, let now us check your
knowledge by doing the following tasks.
DIRECTION: Perform each of the following steps for the set of questions.
a. State the hypotheses and identify the claim.
b. Find the critical value(s).
c. Compute the test value.
d. Make the decision.
e. Summarize the results and explain where the differences in the means are.
f. Create the analysis of variance summary table.
1. Sodium Contents of Foods. The amount of sodium (in milligrams) in one serving
for a random sample of three different kinds of foods is listed here. At the 0.05 level of
significance, is there sufficient evidence to conclude that a difference in mean sodium
amounts exists among condiments, cereals, and desserts?
Condiments Cereals Desserts
270 260 100
130 220 180
230 290 250
180 290 250
80 200 300
70 320 360
200 140 300
160
2. Healthy Eating. Americans appear to be eating healthier. Between 1970 and 2007
the per capita consumption of broccoli increased 1000% from 0.5 to 5.5 pounds. A
nutritionist followed a group of people randomly assigned to one of three groups and
noted their monthly broccoli intake (in pounds). At 𝛼 = 0.05, is there a difference in
means?
GO BEYOND!
1. Use ANOVA to test for any significant differences between the means.
2. What is the purpose of this study?
3. Explain why separate t tests are not accepted in this situation.
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
University of Mindanao 98
College of Teacher Education “Physically Distanced but Academically Engaged”
Big Picture
Week 8-9: Unit Learning Outcomes (ULO): At the end of the unit, you are expected to:
Metalanguage
For ULOa, the term below will be defined to establish a common frame of
reference as to how the terms are addressed in this course.
TERM DEFINITION
1. Parametric tests ➢ These are statistical tests for population parameters
such as means, variances, and proportions that
involve assumptions about the populations from
which the samples were selected.
Essential Knowledge
In this part of the module, we will continue to explore nonparametric tests. Also,
we will identify the advantages and disadvantages of nonparametric tests.
NONPARAMETRIC STATISTICS
Statistical tests, such as the z, t, and F tests, are called parametric tests. Parametric
tests are statistical tests for population parameters such as means, variances, and
proportions that involve assumptions about the populations from which the samples
were selected. One assumption is that these populations are normally distributed. But
what if the population in a particular hypothesis-testing situation is not normally
distributed? Statisticians have developed a branch of statistics known as
nonparametric statistics or distribution-free statistics to use when the population
from which the samples are selected is not normally distributed. Nonparametric
University of Mindanao 99
College of Teacher Education “Physically Distanced but Academically Engaged”
statistics can also be used to test hypotheses that do not involve specific population
parameters, such as 𝜇, 𝜎, or p.
For example, a sportswriter may wish to know whether there is a relationship between
the rankings of two judges on the diving abilities of 10 Olympic swimmers. In another
situation, a sociologist may wish to determine whether men and women enroll at
random for a specific drug rehabilitation program. The statistical tests used in these
situations are nonparametric or distribution-free tests. The term nonparametric is used
for both situations.
As stated previously, nonparametric tests and statistics can be used in place of their
parametric counterparts (z, t, and F) when the assumption of normality cannot be met.
However, you should not assume that these statistics are a better alternative than the
parametric statistics. There are both advantages and disadvantages in the use of
nonparametric methods.
ADVANTAGES
There are five advantages that nonparametric methods have over parametric
methods:
1. They can be used to test population parameters when the variable is not
normally distributed.
2. They can be used when the data are nominal or ordinal.
3. They can be used to test hypotheses that do not involve population
parameters.
4. In some cases, the computations are easier than those for the parametric
counterparts.
5. They are easy to understand.
DISADVANTAGES
There are three disadvantages of nonparametric methods:
1. They are less sensitive than their parametric counterparts when the
assumptions of the parametric methods are met. Therefore, larger
differences are needed before the null hypothesis can be rejected.
2. They tend to use less information than the parametric tests. For example, the
sign test requires the researcher to determine only whether the data values
are above or below the median, not how much above or below the median each
value is.
3. They are less efficient than their parametric counterparts when the
assumptions of the parametric methods are met. That is, larger sample sizes
are needed to overcome the loss of information. For example, the
nonparametric sign test is about 60% as efficient as its parametric counterpart,
the z test. Thus, a sample size of 100 is needed for use of the sign test,
compared with a sample size of 60 for use of the z test to obtain the same
results.
University of Mindanao 100
College of Teacher Education “Physically Distanced but Academically Engaged”
WRITE IT UP!
ACTIVITY 1
After the discussion on nonparametric tests, you have now known the
preliminary concepts about advanced statistics. With that, let now us check your
knowledge by doing the following tasks. Now, I will require you to explain thoroughly
your answers about the given argument.
ACTIVITY 2
After the discussion on nonparametric tests, research the use and interpretation
of the following nonparametric tests. Write your answers in a table.
1. Sign Test
4. Kruskal-Wallis Test
5. Runs Test
University of Mindanao 101
College of Teacher Education “Physically Distanced but Academically Engaged”
In this section, kindly list down all your emerging questions and issues related
to this course. These questions or issues can be raised through LMS Blackboard
message and email. Once questions or issues are clarified by the course coordinator,
don’t forget to write the answers to help you review concepts and essential knowledge.
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
University of Mindanao 102
College of Teacher Education “Physically Distanced but Academically Engaged”
KEYWORDS’ INDEX
This section contains all necessary keywords introduced in this course. These
includes concepts (a word or phrase), ideas, theories, names of people, and other vital
terms to remember. This index will help you recall and review.
A M
Alternative Hypothesis Mean
Analysis of variance Multiple regression
B N
Between-group variance Negative relationship
Noncritical region
C Nonparametric tests
Chi-square Null Hypothesis
Contingency Table
Correlation O
Correlation coefficient Observed value
Critical region One-tailed test
Critical value One-way ANOVA
D P
Degrees of freedom Parametric tests
Dependent samples Population
Dependent variable Positive relationship
P-value
E
Expected value R
Regression
G Right-tailed test
Goodness of fit
S
H Sample
Hypothesis Sample size
Scatter plot
I Simple regression
Independence test Standard deviation
Independent populations Standard error
Independent samples
Independent variable T
t test
L Test value
Left-tailed test Two-tailed Test
Level of significance Two-way ANOVA
Type I error
Type II error
University of Mindanao 103
College of Teacher Education “Physically Distanced but Academically Engaged”
REFERENCES
• Goos, P., & Meintrup, D. (2015). Statistics with jmp: Graphs, descriptive
statistics, and probability. ProQuest Ebook Central
https://search.proquest.com
University of Mindanao 104
College of Teacher Education “Physically Distanced but Academically Engaged”
APPENDIX
University of Mindanao 105
College of Teacher Education “Physically Distanced but Academically Engaged”
APPENDIX
University of Mindanao 106
College of Teacher Education “Physically Distanced but Academically Engaged”
APPENDIX
University of Mindanao 107
College of Teacher Education “Physically Distanced but Academically Engaged”
APPENDIX