Group 6 Inferential Statistics Module
INTRODUCTION
Inferential statistics is used to analyze results and draw conclusions that go
beyond the data at hand. Experts describe inferential statistics as the mathematics
and logic of how a generalization from sample to population can be made
(Kolawole, 2001). These procedures can be used to estimate the likelihood that the
collected data occurred by chance and to draw conclusions about the larger
population from which the samples were collected.
Before starting with inferential statistics let us get the basic idea of population
and sample.
POPULATION
• is the group that is targeted to collect the data from. Our data is the
information collected from the population. The population is always defined
first, before starting the data collection process for any statistical study. A
population does not necessarily consist of people; it could be a batch of
batteries, measurements of rainfall in an area, or a group of people.
SAMPLE:
• is the part of the population that is selected randomly for the study. The
sample should be selected such that it represents all the characteristics of
the population. The process of selecting the subset from the population is
called sampling, and the subset selected is called the sample.
OBJECTIVES
Page 1 of 34
DISCUSSION
In terms of theoretical structure, inferential statistics infer from the sample
to the population. They estimate the probability of characteristics of the population
based on the characteristics of the sample, and they help assess the strength of the
relationship between independent (causal) variables and dependent (effect) variables.
Fundamentally, all inferential statistics procedures are the same: they seek
to determine whether the observed (sample) characteristics deviate sufficiently from
the null hypothesis to justify rejecting it.
Flow chart of inferential statistics
THE INGREDIENTS FOR MAKING THIS CALCULATION ARE THE SAME FOR ALL
STATISTICAL PROCEDURES:
REASON FOR USING INFERENTIAL STATISTICS
Many top-level journals will not publish articles that do not use inferential
statistics. It allows the analyst to generalize findings to the larger population.
It can determine not just what can happen, but what tends to happen in programs.
Inferential statistics helps assess the strength of the relationship between independent
(causal) variables and dependent (effect) variables. It can also assess the relative
impact of various programs.
Strictly, inferential statistics can only be used when statisticians have a complete
list of the members of the population, draw a random sample from this population, and
use a pre-established formula to determine that the sample size is large enough.
Under these conditions, inferential statistics can help to determine the strength of the
relationship within the sample, and statisticians can assess the strength of the impact
of independent variables (program inputs) on outcomes (program outputs). In practice,
however, it is often difficult to obtain a population list and/or draw a random sample.
The following types of inferential statistics are extensively used and relatively
easy to interpret:
In short, inferential statistics are used to make generalizations from a sample
to a population. Two sources of error may cause a sample to differ from the
population from which it is drawn: sampling error and sampling bias. Inferential
statistics are based on taking a random sample from a larger population and
attempting to draw conclusions from that data about the larger population and about
the probability that the relations between measured variables are consistent.
PROBABILITY SAMPLING
When you conduct research about a group of people, it’s rarely possible to
collect data from every person in that group. Instead, you select a sample. The
sample is the group of individuals who will actually participate in the research.
To draw valid conclusions from your results, you have to carefully decide
how you will select a sample that is representative of the group as a whole.
KEY TERMS:
Population vs Sample
• The population is the entire group that you want to draw conclusions
about.
• The sample is the specific group of individuals that you will collect data
from.
Sampling Frame
The sampling frame is the actual list of individuals that the sample will be
drawn from. Ideally, it should include the entire target population (and nobody
who is not part of that population).
Sample Size
TYPES OF PROBABILITY SAMPLING TECHNIQUES
B. Non-Probability Sampling
1. Judgmental Sampling
2. Quota Sampling
3. Convenience Sampling
4. Snowball Sampling
Example 1: Simple Random Sampling (SRS)
Imagine that you own a movie theatre and you are offering a
special horror movie film festival next month. To decide which horror
movies to show, you survey moviegoers asking them which of the listed
movies are their favorites. To create the list of movies needed for your
survey, you decide to sample 100 of the 1,000 best horror movies of all
time.
a) The horror movie population is divided evenly into classic movies (those
filmed in or before 1990) and modern movies (those filmed in or later
than 2000).
b) Write out all of the movie titles on slips of paper and place them in
an empty box.
c) Draw out 100 titles at random and you will have your sample. With this
approach, each movie has an equal chance of selection.
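The slips-of-paper procedure above can be sketched in a few lines of Python. This is only an illustration: the placeholder titles, the variable names, and the fixed seed are assumptions, not part of the original example.

```python
import random

# Hypothetical sampling frame: placeholder titles standing in for the
# 1,000 best horror movies (the real titles are not given in the text).
population = [f"Movie {i}" for i in range(1, 1001)]

rng = random.Random(42)  # fixed seed only so the draw can be repeated

# Draw 100 titles without replacement; every title has the same
# chance of ending up in the sample.
sample = rng.sample(population, k=100)

print(len(sample), "titles drawn, first five:", sample[:5])
```

Because `random.sample` draws without replacement, no title can appear twice, which mirrors pulling slips out of the box.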
Example 2: Simple Random Sampling (SRS)
2. SYSTEMATIC SAMPLING
➢ In systematic (random) sampling, there is a gap, or interval, between each
selected unit in the sample.
➢ Selection of units is based on the sampling interval k, starting from a
randomly determined point, where k = N/n (N is the population size and n is
the sample size).
Steps:
i. Arrange the population in some order and number the units on your
frame from 1 to N.
ii. Draw the first unit randomly between 1 and k (the random start).
iii. Afterwards, draw every kth unit until the total sample has
been drawn.
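The three steps can be sketched as follows. The function name `systematic_sample` and the frame of 100 numbered units are hypothetical, and the sketch assumes N is at least n and uses the integer part of N/n as the interval.

```python
import random

def systematic_sample(frame, n, rng):
    """Return n units from frame using a random start and a fixed interval."""
    N = len(frame)
    k = N // n                 # sampling interval k = N/n (integer part)
    start = rng.randint(1, k)  # random start between 1 and k, inclusive
    # Take units start, start + k, start + 2k, ... (positions are 1-based).
    return [frame[start - 1 + i * k] for i in range(n)]

# Hypothetical frame: units numbered 1 to 100, from which 10 are wanted.
frame = list(range(1, 101))
sample = systematic_sample(frame, n=10, rng=random.Random(0))
print(sample)
```

With N = 100 and n = 10 the interval is k = 10, so the selected unit numbers always differ by exactly 10.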
Example 1: Systematic Sampling
Example 2: Systematic Sampling
3. STRATIFIED SAMPLING
➢ The stratified sampling technique is generally applied in order to obtain a
representative sample. Under stratified sampling, the population is divided
into several sub-populations, called ‘strata’, that are individually more
homogeneous than the population as a whole. Items are then selected from
each stratum to constitute the sample.
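A minimal sketch of this idea is shown below. It assumes proportional allocation (the same sampling fraction in every stratum), which is one common choice and is not stated in the text; the strata names and sizes are hypothetical and simply reuse the classic/modern split from the movie example.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    sample = {}
    for name, units in strata.items():
        n_h = round(len(units) * fraction)    # sample size for this stratum
        sample[name] = rng.sample(units, n_h)  # SRS within the stratum
    return sample

# Hypothetical strata: 500 classic and 500 modern movies.
strata = {
    "classic": [f"classic_{i}" for i in range(500)],
    "modern": [f"modern_{i}" for i in range(500)],
}
sample = stratified_sample(strata, fraction=0.10, rng=random.Random(1))
print({name: len(units) for name, units in sample.items()})
```

Each stratum contributes 50 movies here, so the sample keeps the same classic/modern balance as the population.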
Example 2: Stratified Sampling
Solution:
4. CLUSTER RANDOM SAMPLING
➢ A cluster sample is a simple random sample, in which each sampling unit
is a collection or cluster of elements. When the sampling unit is a cluster,
the procedure is called cluster sampling.
2. Double-Stage Cluster Sampling
➢ Only some units from each selected cluster are taken, using simple random or
systematic random sampling.
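A two-stage draw of this kind can be sketched as below. The schools, the student labels, and the function name `two_stage_cluster_sample` are hypothetical; simple random sampling is used at both stages.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Hypothetical population: 10 schools (clusters) with 30 students each.
clusters = {
    f"School {i}": [f"S{i}-student{j}" for j in range(1, 31)]
    for i in range(1, 11)
}
sample = two_stage_cluster_sample(
    clusters, n_clusters=3, n_per_cluster=5, rng=random.Random(5)
)
for school, students in sample.items():
    print(school, students)
```

Taking every student in the chosen schools instead of five per school would turn this into a single-stage cluster sample.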
Example 1: Cluster Random Sampling
5. MULTISTAGE SAMPLING
Example 1: Multistage Sampling
Advantages of Using Multistage Sampling
DISCUSSION:
Probability methods are suitable for large-scale studies concerned with
representativeness; nonprobability approaches are more suitable for in-depth
qualitative research in which the focus is often to understand complex social
phenomena (e.g., Marshall 1996; Small 2009). One of the advantages of
nonprobability sampling is its lower cost compared to probability sampling.
Moreover, the in-depth analysis of a small-N purposive sample or a case study
enables the "discovery" and identification of patterns and causal mechanisms
that do not draw time- and context-free assumptions.
2. Consecutive sampling
3. Snowball sampling
The first respondent refers an acquaintance. That acquaintance refers a friend, and
so on. Such samples are biased because they give people with more social
connections an unknown but higher chance of selection.
4. Judgemental sampling or Purposive sampling
The researcher chooses the sample based on who they think would be
appropriate for the study. This is used primarily when there is a limited number of
people who have expertise in the area being researched, or when the interest
of the research is in a specific field or a small group.
a) Deviant case – The researcher obtains cases that substantially differ from
the dominant pattern (a special type of purposive sample). The case is
selected in order to obtain information on unusual cases that can be
especially problematic or especially good.
b) Case study – The research is limited to one group, often with similar
characteristics or of small size.
c) Ad hoc quotas – A quota is established (e.g., 65% women) and researchers are
free to choose any respondent they wish as long as the quota is met.
Nonprobability sampling is not intended to obtain the same types of results as
probability sampling, nor should it be held to the same quality standards (Steinke, 2004).
The statistical model one uses can also render the data a nonprobability sample.
For example, Lucas (2014b) notes that several published studies that use multilevel
modeling have been based on samples that are probability samples in general, but
nonprobability samples for one or more of the levels of analysis in the study. Evidence
indicates that in such cases the bias is poorly behaved, such that inferences from such
analyses are unjustified.
These problems occur in the academic literature, but they may be more
common in non-academic research.
SAMPLING DISTRIBUTION
Important: Each sample has its own sample mean, and the distribution of
the sample means is known as the sampling distribution.
The average weights computed for the sample sets together form the sampling
distribution of the mean. The mean is not the only statistic that can be
calculated from a sample.
How does it work?
2. Calculate a statistic for the sample, such as the mean, median, or standard
deviation.
4. Plot the frequency distribution of each sample statistic that you developed
from the step above. The resulting graph will be the sampling distribution.
You can calculate the mean of every sample group chosen from the population
and plot out all the data points. For sufficiently large samples the graph will
show an approximately normal distribution, and the center will be the mean of the
sampling distribution, which is the mean of the entire population.
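The procedure can be imitated with a small simulation. The population of weights below is hypothetical (uniform between 40 and 100 kg), as are the sample size of 30 and the 2,000 repetitions; a text frequency table stands in for the plot.

```python
import random
import statistics
from collections import Counter

rng = random.Random(1)

# Hypothetical population: 10,000 weights in kg (deliberately not normal).
population = [rng.uniform(40, 100) for _ in range(10_000)]

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.mean(rng.sample(population, 30)) for _ in range(2_000)
]

population_mean = statistics.mean(population)
center = statistics.mean(sample_means)  # center of the sampling distribution
print(f"population mean = {population_mean:.2f}, "
      f"mean of sample means = {center:.2f}")

# Frequency distribution of the sample means, rounded to the nearest 2 kg.
frequencies = Counter(2 * round(m / 2) for m in sample_means)
for value in sorted(frequencies):
    print(f"{value:>4} kg  {'#' * (frequencies[value] // 20)}")
```

Even though the individual weights are spread evenly across the range, the sample means pile up around the population mean.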
3. T-distribution
The t-distribution is used when the sample size is very small or when not much is
known about the population (for example, when the population standard deviation is
unknown). It is used in estimating the population mean, constructing confidence
intervals, testing statistical differences, and linear regression.
Central Limit Theorem
The larger the samples you draw, the less variable the sample means will be:
when the sample size increases, the standard error decreases. Therefore, the
center of the sampling distribution is fairly close to the actual mean of the
population.
It also helps make the data easier to manage and builds a foundation for
statistical inference, which leads to making inferences about the whole population.
Understanding statistical inference is important because it helps individuals
understand the spread of frequencies and what various outcomes are like within
a dataset.
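The link between sample size and standard error can be checked with a short simulation. The exponential population (whose standard deviation is 1), the sample sizes, and the number of repetitions are assumptions chosen for illustration; in theory the standard error of the mean is σ/√n.

```python
import random
import statistics

rng = random.Random(3)

def sd_of_sample_means(n, repetitions=2000):
    """Estimate the standard error: the spread of means of samples of size n."""
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]
    return statistics.stdev(means)

# Hypothetical sample sizes; the skewed population has sigma = 1.
standard_errors = {n: sd_of_sample_means(n) for n in (10, 40, 160)}
for n, se in standard_errors.items():
    print(f"n = {n:>3}: simulated SE = {se:.3f}, sigma/sqrt(n) = {n ** -0.5:.3f}")
```

Each fourfold increase in n roughly halves the simulated standard error, as the square-root rule suggests.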
HYPOTHESIS TESTING
4. Using the sample statistic to evaluate the hypothesis (how likely is it that
our hypothesized parameter is correct?). To test the validity of our
assumption, we determine the difference between the hypothesized
parameter value and the sample value.
NULL HYPOTHESIS
The null hypothesis H0 represents a theory that has been put forward either
because it is believed to be true or because it is used as a basis for an argument and
has not been proven. For example, in a clinical trial of a new drug, the null hypothesis
might be that the new drug is no better, on average, than the current drug. We would
write H0: there is no difference between the two drugs on average.
ALTERNATIVE HYPOTHESIS
TYPE I AND TYPE II ERRORS
1. Type I error refers to the situation when we reject the null hypothesis when it is
true (H0 is wrongly rejected).
e.g., H0: there is no difference between the two drugs on average.
Type I error will occur if we conclude that the two drugs produce different
effects when actually there isn't a difference.
Prob(Type I error) = significance level = α
2. Type II error refers to the situation when we accept (fail to reject) the null
hypothesis when it is false.
e.g., H0: there is no difference between the two drugs on average.
Type II error will occur if we conclude that the two drugs produce the same
effect when actually there is a difference.
Prob(Type II error) = β
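The statement Prob(Type I error) = α can be illustrated by simulation: if H0 is true and we test at α = 0.05 many times, about 5% of the tests should wrongly reject it. The sketch below assumes a normal population with known standard deviation and a two-tailed z test; all numeric settings are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

ALPHA = 0.05
N, SIGMA, MU0 = 30, 1.0, 0.0  # hypothetical sample size and population values
TRIALS = 4000

rng = random.Random(7)
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical value

rejections = 0
for _ in range(TRIALS):
    # H0 is true here: every sample really comes from a population with mean MU0.
    sample = [rng.gauss(MU0, SIGMA) for _ in range(N)]
    z = math.sqrt(N) * (mean(sample) - MU0) / SIGMA
    if abs(z) > z_crit:
        rejections += 1  # rejecting a true H0 is a Type I error

type_i_rate = rejections / TRIALS
print(f"Observed Type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
```

Lowering ALPHA makes Type I errors rarer, at the price of making it harder to detect a real difference (a higher β).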
Your null hypothesis is that the battery for a heart pacemaker has an average
life of 300 days, with the alternative hypothesis that the average life is more than 300
days. You are the quality control manager for the battery manufacturer.
(a) Would you rather make a Type I error or a Type II error?
(b) Based on your answer to part (a), should you use a high or low significance
level?
Given H0: average life of pacemaker = 300 days, and HA: average life of pacemaker
> 300 days
(a) It is better to make a Type II error (where H0 is false, i.e., the average life is
actually more than 300 days, but we accept H0 and assume that the average
life is equal to 300 days).
TWO TAIL TEST
Two tailed tests will reject the null hypothesis if the sample mean is significantly
higher or lower than the hypothesized mean.
Appropriate when H0: µ = µ0 and HA: µ ≠ µ0
e.g., The manufacturer of light bulbs wants to produce light bulbs with a mean
life of 1000 hours. If the lifetime is shorter, he will lose customers to the competition,
and if it is longer then he will incur a high cost of production. He does not want to
deviate significantly from 1000 hours in either direction. Thus, he selects the
hypotheses as
H0: µ = 1000 hours and HA: µ ≠ 1000 hours and uses a two tail test.
Lower tailed test will reject the null hypothesis if the sample mean is significantly
lower than the hypothesized mean. Appropriate when H0: µ = µ0 and HA: µ < µ0
e.g., A wholesaler buys light bulbs from the manufacturer in large lots and decides
not to accept a lot unless the mean life is at least 1000 hours.
H0: µ = 1000 hours and HA: µ < 1000 hours and uses a lower tail test.
i.e., he rejects H0 only if the mean life of sampled bulbs is significantly below 1000
hours. (He accepts HA and rejects the lot.)
Upper tailed test will reject the null hypothesis if the sample mean is
significantly higher than the hypothesized mean. Appropriate when H0: µ = µ0 and HA:
µ > µ0
e.g., A highway safety engineer decides to test the load bearing capacity of a 20-
year-old bridge. The minimum load-bearing capacity of the bridge must be at least
10 tons.
H0: µ = 10 tons and HA: µ > 10 tons and uses an upper tail test.
i.e., he rejects H0 only if the mean load bearing capacity of the bridge is significantly
higher than 10 tons.
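The three rejection rules can be put side by side in one small function. This sketch assumes the test statistic follows a standard normal distribution (a z test); the function name `reject_h0` and the tail labels are hypothetical.

```python
from statistics import NormalDist

def reject_h0(z, alpha, tail):
    """Decide whether to reject H0 for a z statistic and a given tail."""
    std_normal = NormalDist()
    if tail == "two":      # HA: mu != mu0
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "lower":    # HA: mu < mu0
        return z < std_normal.inv_cdf(alpha)
    if tail == "upper":    # HA: mu > mu0
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'lower' or 'upper'")

# The same statistic can lead to different decisions depending on HA.
print(reject_h0(-1.7, alpha=0.05, tail="lower"))  # beyond the lower cutoff
print(reject_h0(-1.7, alpha=0.05, tail="two"))    # inside the two-tailed cutoffs
```

At α = 0.05 the lower-tail cutoff is about -1.645, while the two-tailed cutoffs are about ±1.96, which is why z = -1.7 is rejected in one case and not in the other.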
HYPOTHESIS TEST FOR POPULATION MEAN
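H0: µ = µ0 and test statistic $\Delta = \dfrac{\sqrt{n}\,(\bar{x} - \mu_0)}{s}$,
where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation and $n$ is the sample size.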
EXAMPLE:
A weight reducing program that includes a strict diet and exercise claims on its
online advertisement that it can help an average overweight person lose 10 pounds
in three months. Following the program's method, a group of twelve overweight
persons have lost 8.1, 5.7, 11.6, 12.9, 3.8, 5.9, 7.8, 9.1, 7.0, 8.2, 9.3 and 8.0
pounds in three months. Test at 5% level of significance whether the program's
advertisement is overstating the reality.
SOLUTION:
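One way to carry out this test is sketched below. It assumes the hypotheses H0: µ = 10 pounds against HA: µ < 10 pounds (the advertisement overstates the loss) and, because the sample is small and the population standard deviation is unknown, compares the statistic with a t value for 11 degrees of freedom. The critical value is hardcoded from a standard t table rather than computed.

```python
import math
import statistics

# Weight losses (pounds) of the twelve participants, as given in the example.
losses = [8.1, 5.7, 11.6, 12.9, 3.8, 5.9, 7.8, 9.1, 7.0, 8.2, 9.3, 8.0]
mu0 = 10.0  # advertised average loss under H0

n = len(losses)
x_bar = statistics.mean(losses)
s = statistics.stdev(losses)  # sample standard deviation (n - 1 denominator)
t_stat = math.sqrt(n) * (x_bar - mu0) / s

# Lower-tail critical value for 11 degrees of freedom at alpha = 0.05,
# taken from a standard t table (an assumption of this sketch).
t_crit = -1.796
reject_h0 = t_stat < t_crit

print(f"mean = {x_bar:.3f}, s = {s:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic is about -2.62, which falls below -1.796, so the data support the view that the advertisement overstates the average loss.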
HYPOTHESIS TEST FOR POPULATION PROPORTION
H0: p = p0 and test statistic $\Delta = \dfrac{\sqrt{n}\,(\hat{p} - p_0)}{\sqrt{p_0(1 - p_0)}}$
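The statistic can be computed directly from the formula. The numbers used here (116 successes in a sample of 200, tested against p0 = 0.5) are hypothetical and are not taken from the module; the sketch also assumes the sample is large enough for the normal approximation.

```python
import math
from statistics import NormalDist

def proportion_z(successes, n, p0):
    """Test statistic sqrt(n) * (p_hat - p0) / sqrt(p0 * (1 - p0))."""
    p_hat = successes / n
    return math.sqrt(n) * (p_hat - p0) / math.sqrt(p0 * (1 - p0))

# Hypothetical data: 116 successes out of 200, H0: p = 0.5, HA: p > 0.5.
z = proportion_z(successes=116, n=200, p0=0.5)
p_value_upper = 1 - NormalDist().cdf(z)  # upper-tail p-value

print(f"z = {z:.4f}, upper-tail p-value = {p_value_upper:.4f}")
```

For an upper-tail test at α = 0.05 this hypothetical result would lead to rejecting H0, since the p-value is below 0.05.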
EXAMPLE:
SOLUTION:
HYPOTHESIS TEST FOR POPULATION STANDARD DEVIATION
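H0: σ = σ0 and test statistic $\Delta = \dfrac{(n-1)s^2}{\sigma_0^2}$
For HA: σ > σ0, reject H0 if $\Delta > \chi^{2(R)}_{(n-1),\,\alpha}$
For HA: σ < σ0, reject H0 if $\Delta < \chi^{2(R)}_{(n-1),\,1-\alpha}$
For HA: σ ≠ σ0, reject H0 if $\Delta < \chi^{2(R)}_{(n-1),\,1-\alpha/2}$ or $\Delta > \chi^{2(R)}_{(n-1),\,\alpha/2}$
Here $\chi^{2(R)}_{(n-1),\,a}$ is the chi-square value with n − 1 degrees of freedom that has right-tail area a.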
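A short numeric sketch of the upper-tail case follows. The sample values (n = 10, s = 2.5, σ0 = 1.5) are hypothetical, and the critical value is hardcoded from a standard chi-square table because the Python standard library has no chi-square quantile function.

```python
# Hypothetical inputs: sample size, sample standard deviation, and the
# standard deviation claimed under H0.
n = 10
s = 2.5
sigma0 = 1.5

# Test statistic (n - 1) * s^2 / sigma0^2.
chi_sq_stat = (n - 1) * s**2 / sigma0**2

# Right-tail critical value for 9 degrees of freedom at alpha = 0.05,
# taken from a standard chi-square table (an assumption of this sketch).
chi_sq_crit = 16.919

# HA: sigma > sigma0, so reject H0 when the statistic exceeds the cutoff.
reject_h0 = chi_sq_stat > chi_sq_crit
print(f"statistic = {chi_sq_stat:.2f}, critical value = {chi_sq_crit}, "
      f"reject H0: {reject_h0}")
```

For the lower-tail and two-tailed cases only the cutoffs change; the statistic itself is computed the same way.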
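Consider two populations with means µ1, µ2 and standard deviations σ1 and σ2, and
let $\bar{x}_1$ and $\bar{x}_2$ be the means of samples of sizes $n_1$ and $n_2$ drawn from
population 1 and population 2 respectively. $\sigma_{\bar{x}_1}$ and $\sigma_{\bar{x}_2}$ denote the
standard errors of the sampling distributions of the means.
$\mu_{\bar{x}_1 - \bar{x}_2}$ is the mean of the difference between sample means and
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ is the corresponding standard error.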
EXAMPLE:
A sample of 32 money market mutual funds was chosen on January 1, 1996 and
the average annual rate of return over the past 30 days was found to be 3.23% and
the sample standard deviation was 0.51%. A year earlier a sample of 38 money-market
funds showed an average rate of return of 4.36% and the sample standard deviation
was 0.84%. Is it reasonable to conclude (at α = 0.05) that money-market interest rates
declined during 1995?
SOLUTION:
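One possible way to work this example is sketched below. It assumes H0: µ(1996) = µ(1995) against HA: µ(1996) < µ(1995), and, since both samples have more than 30 funds, it substitutes the sample standard deviations for σ1 and σ2 and uses the normal distribution. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary figures from the example (rates of return in percent).
n_1996, mean_1996, sd_1996 = 32, 3.23, 0.51
n_1995, mean_1995, sd_1995 = 38, 4.36, 0.84
alpha = 0.05

# Standard error of the difference between the two sample means.
std_error = math.sqrt(sd_1996**2 / n_1996 + sd_1995**2 / n_1995)

# Under H0 the hypothesized difference between the means is zero.
z = (mean_1996 - mean_1995) / std_error

z_crit = NormalDist().inv_cdf(alpha)  # lower-tail cutoff, about -1.645
reject_h0 = z < z_crit

print(f"standard error = {std_error:.4f}, z = {z:.3f}, "
      f"critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions z is roughly -6.9, far below the cutoff, which supports the conclusion that rates declined during 1995.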
Consider two samples of sizes $n_1$ and $n_2$ with $\bar{p}_1$ and $\bar{p}_2$ as the respective
proportions of successes. Then
$\hat{p} = \dfrac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$ is the estimated overall proportion of successes in the two
populations and, with $\hat{q} = 1 - \hat{p}$,
$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\dfrac{\hat{p}\hat{q}}{n_1} + \dfrac{\hat{p}\hat{q}}{n_2}}$ is the estimated standard error of the difference between the
two proportions.
$H_0: p_1 = p_2$ and test statistic $\Delta = \dfrac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)_{H_0}}{\sigma_{\bar{p}_1 - \bar{p}_2}}$
EXAMPLE:
A large hotel chain is trying to decide whether to convert more of its rooms into
non-smoking rooms. In a random sample of 400 guests last year, 166 had requested
non-smoking rooms. This year 205 guests in a sample of 380 preferred the non-smoking
rooms. Would you recommend that the hotel chain convert more rooms to non-
smoking? Support your recommendation by testing the appropriate hypotheses at a
0.01 level of significance.
SOLUTION:
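A sketch of one way to test this follows. It assumes H0: p(this year) = p(last year) against HA: p(this year) > p(last year), uses the pooled proportion for the standard error, and relies on the normal approximation; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Counts from the example.
n_last, x_last = 400, 166  # last year: guests sampled, non-smoking requests
n_this, x_this = 380, 205  # this year: guests sampled, non-smoking requests
alpha = 0.01

p_last = x_last / n_last
p_this = x_this / n_this

# Pooled (overall) proportion of successes and its complement.
p_hat = (x_last + x_this) / (n_last + n_this)
q_hat = 1 - p_hat

# Estimated standard error of the difference between the two proportions.
std_error = math.sqrt(p_hat * q_hat / n_last + p_hat * q_hat / n_this)

# Under H0 the hypothesized difference p1 - p2 is zero.
z = (p_this - p_last) / std_error

z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail cutoff, about 2.326
reject_h0 = z > z_crit

print(f"p_last = {p_last:.3f}, p_this = {p_this:.3f}, z = {z:.3f}, "
      f"critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions z is about 3.48, above the 0.01-level cutoff, so the increase in demand for non-smoking rooms is statistically significant and supports converting more rooms.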
SUMMARY
The null hypothesis H0 represents a theory that has been put forward either because
it is believed to be true or because it is used as a basis for an argument and has not
been proven. The alternative hypothesis, HA, is a statement of what a statistical
hypothesis test is set up to establish. Researchers test the alternative hypothesis by
comparing it with the null hypothesis. The null hypothesis is rejected only if the
probability of the observed result under it falls below a predetermined significance
level, in which case the result is said to be significant at that level. Type I or Type II
errors can occur throughout this process. A Type I error refers to the situation when we
reject the null hypothesis when it is true (H0 is wrongly rejected), while a Type II error
refers to the situation when we accept the null hypothesis when it is false.
Prepared by:
GROUP 6
Capangpangan, Lilia
Diolola, Florida
Ellado, Princess Mae
Gemino, Anthony
Gemino, Jefren
Jaromay, Ricky
Moscosa, Analiza