Group 6 Inferential Statistics Module

INFERENTIAL STATISTICS
INTRODUCTION
Inferential statistics is used to analyze results from a sample and draw conclusions
about the population the sample came from. Experts have described it as the
mathematics and logic of how generalization from sample to population can be
made (Kolawole, 2001). These procedures can be used to estimate the likelihood
that the collected data occurred by chance and to draw conclusions about the
larger population from which the samples were collected.
Before starting with inferential statistics, let us review the basic ideas of population
and sample.
POPULATION
• is the group that is targeted for data collection. Our data is the
information collected from the population. The population is always defined first,
before the data collection process for any statistical study begins. A population
does not have to consist of people; it could be a batch of batteries,
measurements of rainfall in an area, or a group of people.
SAMPLE
• is the part of the population that is selected randomly for the study. The
sample should be selected such that it represents all the characteristics of
the population. The process of selecting the subset from the population is
called sampling, and the subset selected is called the sample.
OBJECTIVES
• Explain the purpose of inferential statistics in terms of generalizing from a
sample to a population
• Define and explain the basic techniques of random sampling
• Explain and define these key terms: population, sample, parameter, statistic,
representative, EPSEM sampling techniques
• Differentiate between the sampling distribution, the sample, and the
population
• Explain the two theorems presented
Page 1 of 34
DISCUSSION
Inferential statistics infer from the sample to the population. They estimate the
probability of population characteristics based on the characteristics of the sample,
and they help assess the strength of the relationship between independent (causal)
variables and dependent (effect) variables. Inferential statistics is strongly
associated with the logic of hypothesis testing.
A hypothesis is an empirically verifiable declarative statement concerning the
relationship between independent and dependent variables and their
corresponding measures. Hypothesis testing is an inferential procedure that uses
sample data to evaluate the credibility of a hypothesis about a population. In
hypothesis testing, the main aim is usually to reject the null hypothesis.
Hypotheses are stated in two ways. A null hypothesis (H0) is a statement that
denies that there is a statistical difference between the status quo and the
experimental condition. It states that the independent variable being studied
makes no difference to the end result. The null hypothesis is the null condition: no
difference between means or no relationship between variables. Data are
collected that allow the statistician to decide whether the null hypothesis can be
rejected, and to do so with some confidence that a mistake is not being made. The
alternative hypothesis (H1) would be that there is a relationship between the two
variables.
Fundamentally, all inferential statistics procedures are the same: they seek
to determine if the observed (sample) characteristics are sufficiently deviant from
the null hypothesis to justify rejecting it.
Flow chart of inferential statistics
(Source: Asadoorian, 2015)
THE INGREDIENTS FOR MAKING THIS CALCULATION ARE THE SAME FOR ALL
STATISTICAL PROCEDURES:
1. The size of the observed difference(s)

2. The variability in the sample
3. The sample size.
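As an illustration of how the three ingredients combine, here is a minimal sketch using the one-sample t statistic, chosen only as a familiar example. The function name `one_sample_t`, the hypothesized mean of 50, and the `scores` data are invented for illustration and do not come from the module.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic for H0: population mean == mu0."""
    n = len(sample)                          # ingredient 3: the sample size
    diff = statistics.mean(sample) - mu0     # ingredient 1: size of the observed difference
    s = statistics.stdev(sample)             # ingredient 2: variability in the sample
    return diff / (s / math.sqrt(n))

# Made-up scores, used only to show the arithmetic.
scores = [52, 55, 49, 58, 51, 54, 53, 56]
t_small = one_sample_t(scores, mu0=50)

# Same difference and similar variability, but twice as many observations:
# the statistic grows, so the same difference becomes easier to detect.
t_large = one_sample_t(scores * 2, mu0=50)

print(f"n={len(scores)}:  t = {t_small:.2f}")
print(f"n={len(scores) * 2}: t = {t_large:.2f}")
```

A larger difference, less variability, or a larger sample each push the statistic further from zero.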
PROCEDURE FOR PERFORMING AN INFERENTIAL TEST:
An inferential test involves the following steps:
• Start with a theory
• Make a research hypothesis
• Operationalize the variables
• Identify the population to which the study results should apply
• Form a null hypothesis for this population
• Collect a sample from the population and run the study
• Perform statistical tests to see if the obtained sample characteristics are
sufficiently different from what would be expected under the null
hypothesis to be able to reject the null hypothesis.
REASON FOR USING INFERENTIAL STATISTICS
Many top-level journals will not publish articles that do not use inferential
statistics. Inferential statistics allows the analyst to generalize findings to the larger
population. It can determine not just what can happen, but what tends to happen in
programs. It helps assess the strength of the relationship between independent
(causal) variables and dependent (effect) variables, and it can assess the relative
impact of various programs.
Ideally, inferential statistics is used when statisticians have a complete list of
the members of the population and draw a random sample from it. Using a
pre-established formula, they determine whether the sample size is large enough.
Inferential statistics can then help to determine the strength of the relationship within
the sample, for example the strength of the impact of independent variables
(program inputs) on outcomes (program outputs). In practice, the main limitation is
that it is often difficult to obtain a population list and/or draw a random sample.
The following types of inferential statistics are extensively used and relatively
easy to interpret:
1. One sample test of difference/One sample hypothesis test
2. Confidence Interval
3. Contingency Tables and Chi Square Statistic
4. T-test or ANOVA
5. Pearson Correlation
6. Bivariate Regression
7. Multivariate Regression
In short, inferential statistics are used to make generalizations from a sample to a
population. Two sources of error may cause a sample to differ from the population
from which it is drawn: sampling error and sampling bias. Inferential statistics are
based on taking a random sample from a larger population and attempting to draw
conclusions from that data about the larger population and about the probability
that the relations between measured variables are consistent.
PROBABILITY SAMPLING
When you conduct research about a group of people, it’s rarely possible to
collect data from every person in that group. Instead, you select a sample. The
sample is the group of individuals who will actually participate in the research.
To draw valid conclusions from your results, you have to carefully decide
how you will select a sample that is representative of the group as a whole.
There are two types of sampling methods:
• Probability Sampling involves random selection, allowing you to make
strong statistical inferences about the whole group.
• Non-probability Sampling involves non-random selection based on
convenience or other criteria, allowing you to easily collect data.
KEY TERMS:
Population vs Sample
First, you need to understand the difference between a population and
a sample, and identify the target population of your research.
• The population is the entire group that you want to draw conclusions
about.
• The sample is the specific group of individuals that you will collect data
from.
Sampling Frame
The sampling frame is the actual list of individuals that the sample will be
drawn from. Ideally, it should include the entire target population (and nobody
who is not part of that population).
Sample Size
• A sample is a smaller collection of units from a population used to
determine truths and facts about that population.
• A small group of people taken from a larger group and used to
represent the larger group is called a sample.
• The sample size (n) is the number of units in the sample.
TYPES OF SAMPLING TECHNIQUES
A. Probability Sampling Technique
1. Simple Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
5. Multi-Stage Sampling
B. Non-Probability Sampling
1. Judgmental Sampling
2. Quota Sampling
3. Convenience Sampling
4. Snowball Sampling
PROBABILITY SAMPLING TECHNIQUE
1. SIMPLE RANDOM SAMPLING (SRS)
➢ Ensures that each element in the population has an equal chance of
being selected.
➢ Selection is free from bias.
➢ The probability of selection can be calculated from the sample size (n)
and the population size (N): probability = n/N
➢ Can be done with or without replacement:
   Without replacement: more convenient, more precise result
   With replacement: possibility of selecting the same item as a sample twice
Several Ways of Selecting a Simple Random Sample
1. Lottery draw. The name or identifying number of each item in the
population is recorded on a slip of paper and placed in a box. The slips
are shuffled, and the required sample size is drawn at random from the box.
2. Each item is numbered and a table of random numbers is used to
select the members of the sample.
3. Many software programs, such as MINITAB and Excel, have routines
that will randomly select a given number of items from the population.
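A software routine of the kind mentioned in item 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `simple_random_sample`, the population of 500 numbered units, and the sample size of 70 are assumptions chosen for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)          # seed is optional; it only makes the draw repeatable
    return rng.sample(population, n)   # random.sample draws WITHOUT replacement

# Hypothetical frame: units numbered 1..500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 70, seed=42)

# Probability that any given unit ends up in the sample: n / N.
inclusion_probability = len(sample) / len(frame)

print(sorted(sample)[:10], "...")
print("n/N =", inclusion_probability)
```

To sample with replacement instead, `rng.choices(population, k=n)` could be used, in which case the same unit may appear more than once.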
Example 1: Simple Random Sampling (SRS)
Imagine that you own a movie theatre and you are offering a
special horror movie film festival next month. To decide which horror
movies to show, you survey moviegoers asking them which of the listed
movies are their favorites. To create the list of movies needed for your
survey, you decide to sample 100 of the 1,000 best horror movies of all
time.
a) The horror movie population is divided evenly into classic movies (those
filmed in or before 1990) and modern movies (those filmed in or later
than 2000).
b) Write all of the movie titles on slips of paper and place them in
an empty box.
c) Draw 100 titles and you will have your sample. By using this
approach, you will have ensured that each movie had an equal
chance of selection.
Example 2: Simple Random Sampling (SRS)
Suppose your college has 500 students (population) and you need
to conduct a short survey on the quality of the food served in the
cafeteria. You decide that a sample of 70 students (sample) should be
sufficient for your purposes.
In order to get your sample, you:
a) Assign a number from 001 to 500 to each student,
b) Use a table of randomly generated numbers (Random Number Tables):
1. Randomly pick a starting point in the table, and look at the random
number that appears there.
2. In this case, the data run into three digits (500), so the random number
needs to contain three digits as well.
3. Ignore all random numbers greater than 500 because they do not
correspond to any of the students in the college.
4. Remember that sampling is without replacement, so if a number recurs,
skip over it and use the next random number.
5. The first 70 different numbers between 001 and 500 make up your
sample.
Advantages of using simple random sampling
1. Easy to conduct & conceptualize
2. High probability of achieving a representative sample
3. Meets assumptions of many statistical procedures
4. No need of prior information of population
5. Equal and independent chance of selection to every element.
Disadvantages of using simple random sampling
1. Identification of all members of the population can be difficult
2. Contacting all members of the sample can be difficult
3. Expensive and time consuming
4. Low frequency of use
5. Larger risk of random error.
2. SYSTEMATIC SAMPLING
➢ In Systematic (Random) Sampling, there is a gap, or interval, between each
selected unit in the sample.
➢ Selection of units is based on the sampling interval k, starting from a
randomly determined point, where k = N/n
Steps:
i. Arrange the population in some order and number the units on your
frame from 1 to N.
ii. Draw the first unit at random between 1 and k (the random start).
iii. Afterwards, draw every k-th unit until the total sample has
been drawn.
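The three steps can be sketched as follows. This is a minimal illustration, not a prescribed routine: the function name `systematic_sample` and the frame of 500 units are assumptions, and the interval is taken as the integer part of N/n, which is one common convention when N is not an exact multiple of n.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select n units at a fixed interval k after a random start."""
    N = len(units)
    k = N // n                        # sampling interval (integer part of N/n; an assumption)
    rng = random.Random(seed)
    start = rng.randint(1, k)         # random start between 1 and k, inclusive
    # Take positions start, start + k, start + 2k, ... (1-based positions on the frame).
    return [units[start - 1 + i * k] for i in range(n)]

# Hypothetical frame: units numbered 1..500, sample of 50, so k = 10.
frame = list(range(1, 501))
sample = systematic_sample(frame, 50, seed=3)
print(sample[:5], "...", sample[-1])
```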
Example 1: Systematic Sampling
Example 2: Systematic Sampling
Advantages of Using Systematic Sampling
1. Simple to draw sample
2. Moderate cost & usage
3. Easy to verify
4. Suitable sampling frame can be identified easily
5. Sample evenly spread over entire reference population
Disadvantages of Using Systematic Sampling
1. Periodic ordering required,

2. Contacting
3. Sample may be biased if hidden periodicity in population coincides with
that of selection.
4. Difficult to assess precision of estimate from one survey.
3. STRATIFIED SAMPLING
➢ The stratified sampling technique is generally applied in order to obtain a
representative sample. Under stratified sampling, the population is divided
into several sub-populations, called ‘strata’, that are individually more
homogeneous than the population as a whole. Items are then selected
from each stratum to constitute the sample.
How to form strata?
Strata are formed on the basis of common characteristics of the items
in each stratum. They are usually based on the past experience and
personal judgment of the researcher. One approach is to take small
samples of equal size from the proposed strata and then examine
the variances among the possible stratifications.
How to allocate the sample size of each stratum?
We usually follow the method of proportional allocation, under
which the sizes of the samples from the different strata are kept
proportional to the sizes of the strata.
Example 1: Stratified Sampling
Example 2: Stratified Sampling
A stratified (random) sample can be stratified by any variable that is
available, such as Gender (Male & Female), Educational Level (SPM, diploma,
1st degree), etc.
A sample of 70 is to be drawn from a population of 500, which is divided
into 2 strata by gender: 150 male and 350 female.
Solution:
Male = (150/500)(70) = 21 male students
Female = (350/500)(70) = 49 female students
Total Sample = 21 + 49 = 70 students
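The proportional allocation used in this solution can be sketched in code. The function name `proportional_allocation` is hypothetical, and rounding each stratum to the nearest whole unit is an assumption of this sketch; with other stratum sizes the rounded counts may not add up exactly to n and would need adjusting.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of n across strata in proportion to stratum size."""
    N = sum(strata_sizes.values())
    # Each stratum receives n * (stratum size / population size), rounded to a whole unit.
    return {name: round(n * size / N) for name, size in strata_sizes.items()}

# Figures from the example: 500 students, 150 male and 350 female, sample of 70.
allocation = proportional_allocation({"male": 150, "female": 350}, 70)
print(allocation)
print("total:", sum(allocation.values()))
```

A simple random sample of the allocated size would then be drawn separately within each stratum.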
Advantages of Using Stratified Sampling
1. More accurate sample
2. Can be used for both proportional and non-proportional samples
3. Representation of subgroups in the sample
Disadvantages of Using Stratified Sampling
1. Identifying members of all subgroups can be difficult
2. Identification of all members of the population can be difficult
3. Stratified lists costly to prepare
4. CLUSTER RANDOM SAMPLING
➢ A cluster sample is a simple random sample, in which each sampling unit
is a collection or cluster of elements. When the sampling unit is a cluster,
the procedure is called cluster sampling.
Steps to Create a Cluster Random Sampling
1. Identify and define the population.
2. Determine the desired sample size.
3. Identify and define the logical cluster.
4. List all clusters (or obtain a list) that make up the population of
clusters.
5. Estimate the average number of population members per cluster.
6. Determine the number of clusters needed by dividing the sample
size by the estimated size of a cluster.
7. Randomly select the needed number of clusters by using a table of
random numbers.
8. Include in the study all population members in each selected
cluster.
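Steps 5 to 8 can be sketched as below. The sketch makes several assumptions that are not in the module: the function name `cluster_sample`, the hypothetical classrooms used as clusters, rounding the number of clusters up, and using Python's random number generator in place of a printed table of random numbers.

```python
import math
import random

def cluster_sample(clusters, desired_n, seed=None):
    """clusters maps a cluster name to the list of its members."""
    # Step 5: estimate the average number of population members per cluster.
    avg_size = sum(len(members) for members in clusters.values()) / len(clusters)
    # Step 6: clusters needed = desired sample size / average cluster size (rounded up here).
    n_clusters = math.ceil(desired_n / avg_size)
    # Step 7: randomly select that many clusters.
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 8: include every member of each selected cluster.
    return chosen, [m for name in chosen for m in clusters[name]]

# Hypothetical population: 20 classrooms of 25 pupils each.
classrooms = {
    f"class_{i:02d}": [f"pupil_{i:02d}_{j:02d}" for j in range(25)]
    for i in range(20)
}
chosen, sample = cluster_sample(classrooms, desired_n=100, seed=1)
print(chosen)
print("sample size:", len(sample))
```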
Types of Cluster Random Sampling
1. Single Stage Clustering Sampling
➢ when all units in the selected cluster are selected.
2. Double-Stage Clustering Sampling
➢ only some units from a selected cluster are taken using simple random or systematic
random sampling.
Example 1: Cluster Random Sampling
• A special form of cluster sampling, called “30 x 7 cluster
sampling”, has been recommended by the WHO for field
studies assessing vaccination coverage.
• A list of all villages (clusters) for a given geographical area
is made.
• 30 clusters are selected using Probability Proportional to Size
(PPS).
• From each of the selected clusters, 7 subjects are randomly
chosen.
• Thus, a total sample of 30 x 7 = 210 subjects is chosen.
• The advantage of cluster sampling is that a sampling frame is not
required.
Advantages of Cluster Random Sampling
✓ Simple, as a complete list of sampling units within the population is not
required
✓ Low cost
✓ Can estimate characteristics of both cluster and population
✓ Less travel/resources required
Disadvantages of Cluster Random Sampling
✓ A potential problem is that members of the same cluster are more likely
to be alike than members of different clusters (homogeneous).
✓ Each stage in cluster sampling introduces sampling error: the more
stages there are, the more error there tends to be.
✓ Usually less expensive than SRS but not as accurate.
5. MULTISTAGE SAMPLING
➢ Multi-stage sampling is like cluster sampling, but involves selecting a
sample within each chosen cluster, rather than including all units in the
cluster.
➢ It involves selecting a sample in at least two stages.
✓ In the first stage, large groups or clusters are selected. These
clusters are designed to contain more population units than are
required for the final sample.
✓ In the second stage, population units are chosen from selected
clusters to derive a final sample. If more than two stages are
used, the process of choosing population units within clusters
continues until the final sample is achieved.
➢ It is a combination of the methods previously described.
Four Steps to Conduct Multistage Sampling:
1. Choose a sampling frame, considering the population of interest. The
researcher allocates a number to every group and selects a small
sample of relevant separate groups.
2. Select a sampling frame of relevant separate sub-groups. Do this from
the related, different discrete groups selected in the previous stage.
3. Repeat the second step if necessary.
4. Using some variation of probability sampling, choose the members of
the sample group from the sub-groups.
Example 1: Multistage Sampling
Using a combination of sampling methods at various stages:
✓ Stratify the population by region of the country.
✓ For each region, stratify by urban, suburban, and rural and take
a random sample of communities within those strata.
✓ Divide the selected communities into city blocks as clusters,
and sample some blocks.
✓ Everyone on the blocks or within the fixed areas may then be
sampled.
▪ A researcher wants to understand pet feeding habits among
people living in the USA.
▪ For this, he/she requires a sample size of 200 respondents.
▪ The researcher selects 10 states out of 50 at random.
▪ Further, he/she randomly picks out 5 districts per state.
▪ From each of these 5 districts, he/she then chooses 4 pet-owning
households to conduct the research, giving 10 x 5 x 4 = 200 respondents.
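The pet-feeding example can be sketched as a three-stage draw. Everything about the frame below is hypothetical (20 districts per state, 30 pet-owning households per district, and the naming scheme); only the stage sizes 10, 5 and 4 come from the example.

```python
import random

rng = random.Random(7)  # fixed seed only so the illustration is repeatable

# Hypothetical frame: 50 states -> 20 districts each -> 30 pet-owning households each.
frame = {
    f"state_{s:02d}": {
        f"district_{s:02d}_{d:02d}": [f"household_{s:02d}_{d:02d}_{h:02d}" for h in range(30)]
        for d in range(20)
    }
    for s in range(50)
}

sample = []
for state in rng.sample(sorted(frame), 10):                    # stage 1: 10 of 50 states
    for district in rng.sample(sorted(frame[state]), 5):       # stage 2: 5 districts per state
        sample.extend(rng.sample(frame[state][district], 4))   # stage 3: 4 households per district

print("households sampled:", len(sample))
print(sample[:3])
```

Note that only the selected states and districts need a detailed list at the next stage, which is the practical appeal of the design.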
A systematic sample of households within Enumeration Areas (EAs)
within Districts:
✓ Districts – Strata – 1st Stage
✓ EAs – Clusters – 2nd Stage
✓ Households – 3rd Stage
Within each District, take a sample of EAs.
Within each EA, take a sample of households.
Within each household, sample 2 individuals.
Advantages of Using Multistage Sampling
1. Cost- and time-effective
2. High level of flexibility
3. Fewer investigators are needed
4. Normally more accurate than cluster sampling for the same sample size
Disadvantages of Using Multistage Sampling
1. Further analysis is difficult
2. High level of subjectivity
3. Research findings can never be 100% representative of the
population
4. Less accurate than SRS of the same size (but more accurate for the same
cost)
5. There is a possibility of bias if, for example, only a small number
of regions are selected.
NON-PROBABILITY SAMPLING TECHNIQUE

In sampling theory, the sampling probability (also known as inclusion
probability) of an element or member of the population is its probability of
becoming part of the sample during the drawing of a single sample. Probability
sampling requires that every element have a known, nonzero inclusion
probability.
Non-probability sampling does not meet this criterion. Nonprobability
sampling techniques are not intended to be used to infer from the sample to the
general population in statistical terms. Instead, for example, grounded theory
can be produced through iterative nonprobability sampling until theoretical
saturation is reached (Strauss and Corbin, 1990).
DISCUSSION:
Probability methods are suitable for large-scale studies concerned with
representativeness; nonprobability approaches are more suitable for in-depth
qualitative research in which the focus is often to understand complex social
phenomena (e.g., Marshall 1996; Small 2009). One of the advantages of
nonprobability sampling is its lower cost compared to probability sampling.
Moreover, the in-depth analysis of a small-N purposive sample or a case study
enables the “discovery” and identification of patterns and causal mechanisms
that do not make time- and context-free assumptions.
Non-probability sampling is often not appropriate in statistical quantitative
research, though, as these assertions raise some questions: how can one
understand a complex social phenomenon by drawing only the most
convenient expressions of that phenomenon into consideration? What
assumption about homogeneity in the world must one make to justify such
assertions? The view that research can only be based on
statistical inference focuses on the problems of bias linked to nonprobability
sampling and acknowledges only one situation in which a nonprobability sample
can be appropriate: if one is interested only in the specific cases studied, one
does not need to draw a probability sample from similar cases (Lucas 2014a).
Nonprobability sampling is, however, widely used in qualitative research.
TYPES OF NONPROBABILITY SAMPLING

1. Convenience, haphazard or accidental sampling
Members of the population are chosen based on their relative ease of access.
Sampling friends, co-workers, or shoppers at a single mall are all examples of
convenience sampling. Such samples are biased because researchers may
unconsciously approach some kinds of respondents and avoid others, and
respondents who volunteer for a study may differ in unknown but important ways
from others (Wiederman 1999).
2. Consecutive sampling
Also known as total enumerative sampling, this is a sampling technique in which
every subject meeting the criteria of inclusion is selected until the required
sample size is achieved.
3. Snowball sampling
The first respondent refers an acquaintance. The friend also refers a friend, and
so on. Such samples are biased because they give people with more social
connections an unknown but higher chance of selection.
4. Judgemental sampling or Purposive sampling
The researcher chooses the sample based on who they think would be
appropriate for the study. This is used primarily when there is a limited number of
people who have expertise in the area being researched, or when the interest
of the research is in a specific field or small group.
Types of purposive sampling
a) Deviant case – The researcher obtains cases that substantially differ from
the dominant pattern (a special type of purposive sample). The case is
selected in order to obtain information on unusual cases that can be
especially problematic or especially good.
b) Case study – The research is limited to one group, often with similar
characteristics or of small size.
c) Ad hoc quotas – A quota is established (e.g., 65% women) and researchers are
free to choose any respondent they wish as long as the quota is met.
Nonprobability sampling should not be expected to obtain the same types of results
or be held to the same quality standards as probability sampling (Steinke, 2004).
Studies intended to use probability sampling sometimes end up using
nonprobability samples because of characteristics of the sampling method.
The statistical model one uses can also render the data a nonprobability sample.
For example, Lucas (2014b) notes that several published studies that use multilevel
modeling have been based on samples that are probability samples in general, but
nonprobability samples for one or more of the levels of analysis in the study.
Evidence indicates that in such cases the bias is poorly behaved, such that
inferences from such analyses are unjustified.
These problems occur in the academic literature, but they may be more
common in non-academic research.
SAMPLING DISTRIBUTION
➢ is a distribution of a sample statistic (Lovric 2010)
➢ A SAMPLING DISTRIBUTION of a given population is the distribution of
frequencies of a range of different outcomes that could possibly occur
for a statistic of a population.
➢ It is the distribution of a statistic that is arrived at through repeated
sampling from a larger population.
➢ It describes a range of possible outcomes of a statistic (such as the
mean or mode of some variable).
➢ It is the probability distribution of a statistic based on a large number of
samples of size n from a given population.
UNDERSTANDING SAMPLING DISTRIBUTION
A sample is a subset of a population.
A medical researcher who wanted to compare the average weight of all
babies born in the Province of Eastern Samar from 1995 to 2005 to that of babies
born in Western Samar within the same time period cannot, within a reasonable
amount of time, draw the data for the entire population of over a hundred
childbirths that occurred over the ten-year time frame. He will instead use the
weights of, say, 100 babies in each of the two provinces to reach a conclusion. The
weights of the 200 babies used are the sample, and the average weight calculated
is the sample mean.
Important: Each sample has its own sample mean, and the distribution of
the sample means is known as the sampling distribution.
The distribution of the average weights computed for each sample set is the
sampling distribution of the mean. The mean is not the only statistic that can be
calculated from a sample; others, such as the median or standard deviation,
have sampling distributions too.
The standard deviation of a sampling distribution is called the standard
error. While the mean of the sampling distribution is equal to the mean of the
population, the standard error depends on the standard deviation of the
population, the size of the population, and the size of the sample. The standard
error of the sampling distribution decreases as the sample size increases.
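The dependence of the standard error on these three quantities can be written out. The formula below is the standard textbook expression for the mean of a simple random sample drawn without replacement; it is not stated in the module itself. Here σ is the population standard deviation, N the population size, and n the sample size.

```latex
\mathrm{SE}(\bar{x}) \;=\; \frac{\sigma}{\sqrt{n}} \,\sqrt{\frac{N - n}{N - 1}}
\qquad\text{and, when } N \gg n,\qquad
\mathrm{SE}(\bar{x}) \;\approx\; \frac{\sigma}{\sqrt{n}}
```

Because n appears in the denominator, quadrupling the sample size roughly halves the standard error.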
How does it work?
1. Select a random sample of a specific size from a given population.
2. Calculate a statistic for the sample, such as the mean, median, or standard
deviation.
3. Repeat steps 1 and 2 many times, then develop a frequency distribution of
the sample statistics that you calculated.
4. Plot the frequency distribution of the sample statistics that you developed
in the step above. The resulting graph will be the sampling distribution.
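The procedure can be simulated to see a sampling distribution take shape. This is a rough sketch under invented assumptions: the population is 10,000 simulated die rolls, each sample has 30 units, the draw is repeated 2,000 times, and the "plot" is a crude text histogram.

```python
import random
import statistics
from collections import Counter

rng = random.Random(0)

# Hypothetical population: 10,000 die rolls (values 1..6).
population = [rng.randint(1, 6) for _ in range(10_000)]

def sampling_distribution(population, n, repeats, statistic=statistics.mean):
    """Steps 1 and 2, repeated: draw a sample of size n and compute its statistic."""
    return [statistic(rng.sample(population, n)) for _ in range(repeats)]

means = sampling_distribution(population, n=30, repeats=2_000)

# Step 3: frequency distribution of the sample means, grouped to one decimal place.
freq = Counter(round(m, 1) for m in means)

# Step 4: a crude text plot, one '#' per 10 samples.
for value in sorted(freq):
    print(f"{value:4.1f} {'#' * (freq[value] // 10)}")

print("population mean:      ", round(statistics.mean(population), 3))
print("mean of sample means: ", round(statistics.mean(means), 3))
```

Passing `statistic=statistics.median` would give the sampling distribution of the median instead.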
Types of Sampling Distribution
1. Sampling distribution of mean
You can calculate the mean of every sample group chosen from the population
and plot out all the data points. The graph will show an approximately normal
distribution when the samples are large enough, and its center will be the mean of
the sampling distribution, which is the mean of the entire population.
2. Sampling distribution of proportion
It gives you information about proportions in a population. You would select
samples from the population and get the sample proportion. The mean of all the
sample proportions that you calculate from each sample group estimates
the proportion of the entire population.
3. T-distribution
The t-distribution is used when the sample size is very small or not much is known
about the population. It is used to estimate the mean of the population, and in
confidence intervals, tests of statistical differences, and linear regression.
Central Limit Theorem
The central limit theorem helps in constructing the sampling distribution of
the mean. The theorem describes how the shape of the sampling distribution of the
mean approaches a normal distribution as the sample size increases. In other
words, the larger each sample, the closer the plot of the sample means will be to
the shape of a bell curve.
The larger the samples, the less variable the sample means will be: when the
sample size increases, the standard error decreases. Therefore, the center of the
sampling distribution is fairly close to the actual mean of the population.
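Both effects, the shrinking standard error and the move toward a bell shape, can be checked with a small simulation. The setup is an assumption made for illustration: a strongly right-skewed population of simulated exponential values, sample sizes of 5, 25 and 100, and skewness (0 for a symmetric bell curve) as a rough measure of shape.

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical, strongly right-skewed population (simulated waiting times).
population = [rng.expovariate(1.0) for _ in range(20_000)]
sigma = statistics.pstdev(population)

def skewness(values):
    """Average cubed z-score: about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, repeats=2_000):
    """Means of `repeats` random samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

summary = {}
for n in (5, 25, 100):
    means = sample_means(n)
    summary[n] = (statistics.stdev(means), skewness(means))
    print(f"n={n:3d}  simulated SE={summary[n][0]:.3f}  "
          f"sigma/sqrt(n)={sigma / math.sqrt(n):.3f}  skewness={summary[n][1]:.2f}")
```

As n grows, the simulated standard error should track sigma/sqrt(n) and the skewness of the sample means should move toward zero.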
Importance of Using a Sampling Distribution
Since populations are typically large, it is usually only practical to study a
randomly selected subset of the entire population. The sampling distribution
describes how much a statistic varies from one such subset to another, which helps
account for variability when you are doing research or gathering statistical data.
It also helps make the data easier to manage and builds a foundation for
statistical inference, which leads to making inferences about the whole population.
Understanding statistical inference is important because it helps individuals
understand the spread of frequencies and what various outcomes are like within
a dataset.
HYPOTHESIS TESTING
WHAT IS HYPOTHESIS TESTING?
• Hypothesis testing refers to
1. Making an assumption, called a hypothesis, about a population
parameter.
2. Collecting sample data.
3. Calculating a sample statistic.
4. Using the sample statistic to evaluate the hypothesis (how likely is it that
our hypothesized parameter is correct?). To test the validity of our
assumption, we determine the difference between the hypothesized
parameter value and the sample value.
NULL HYPOTHESIS
The null hypothesis, H0, represents a theory that has been put forward either
because it is believed to be true or because it is used as a basis for an argument and
has not been proven. For example, in a clinical trial of a new drug, the null hypothesis
might be that the new drug is no better, on average, than the current drug. We would
write H0: there is no difference between the two drugs, on average.
ALTERNATIVE HYPOTHESIS
The alternative hypothesis, HA, is a statement of what a statistical hypothesis test is
set up to establish. For example, in the clinical trial of a new drug, the alternative
hypothesis might be that the new drug has a different effect, on average, compared
to that of the current drug. We would write
HA: the two drugs have different effects, on average.
Or
HA: the new drug is better than the current drug, on average.
The result of a hypothesis test:
`Reject H0 in favour of HA' OR `Do not reject H0'
SELECTING AND INTERPRETING SIGNIFICANCE LEVEL
1. The significance level is the criterion for deciding whether to reject the null
hypothesis.
2. Significance level refers to the percentage of sample means that is outside
certain prescribed limits. E.g., testing a hypothesis at the 5% level of significance
(two-tailed) means
➢ that we reject the null hypothesis if the sample statistic falls in either of the
two tail regions, each of area 0.025.
➢ that we do not reject the null hypothesis if it falls within the central region of
area 0.95.
3. The higher the level of significance, the higher the probability of rejecting the
null hypothesis when it is true. (The acceptance region narrows.)
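The tail areas translate into cut-off values for the test statistic. As a small sketch, assuming the statistic follows a standard normal distribution under H0 (a z test, which is an assumption of this example), the two-tailed critical values can be computed with Python's standard library. The function name `two_tailed_critical_value` is hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical_value(alpha):
    """Cut-off c such that each tail beyond -c and +c has area alpha / 2."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.01, 0.05, 0.10):
    c = two_tailed_critical_value(alpha)
    print(f"alpha = {alpha:.2f}: reject H0 if |z| > {c:.3f} "
          f"(central region of area {1 - alpha:.2f})")
```

The cut-off shrinks as alpha grows, which is the narrowing of the acceptance region described in point 3.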
TYPE I AND TYPE II ERRORS
1. Type I error refers to the situation when we reject the null hypothesis when it is
true (H0 is wrongly rejected).
e.g., H0: there is no difference between the two drugs on average.
Type I error will occur if we conclude that the two drugs produce different
effects when actually there isn't a difference.
Prob(Type I error) = significance level = α
2. Type II error refers to the situation when we accept (fail to reject) the null
hypothesis when it is false.
e.g., H0: there is no difference between the two drugs on average.
Type II error will occur if we conclude that the two drugs produce the same
effect when actually there is a difference.
Prob(Type II error) = β
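Both error rates can be estimated by simulation. The sketch below is illustrative only and rests on invented assumptions: a two-tailed z test with known σ = 1, samples of 30 normally distributed values, a true mean of 0 when H0 holds, and a true mean of 0.3 when it does not.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(3)
alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cut-off for a z test

def rejects_h0(sample, mu0, sigma):
    """Two-tailed z test with known sigma: True if H0: mean == mu0 is rejected."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return abs(z) > critical

def draw(true_mean, n=30):
    return [rng.gauss(true_mean, 1) for _ in range(n)]

trials = 5_000

# H0 is TRUE here (true mean is 0): every rejection is a Type I error.
type_i_rate = sum(rejects_h0(draw(0.0), mu0=0, sigma=1) for _ in range(trials)) / trials

# H0 is FALSE here (true mean is 0.3): every non-rejection is a Type II error.
type_ii_rate = sum(not rejects_h0(draw(0.3), mu0=0, sigma=1) for _ in range(trials)) / trials

print(f"estimated P(Type I error)  = {type_i_rate:.3f}  (alpha = {alpha})")
print(f"estimated P(Type II error) = {type_ii_rate:.3f}  (beta for a true mean of 0.3)")
```

The Type I rate should land near α, while β depends on how far the true mean is from the hypothesized one and on the sample size.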
TYPE I AND TYPE II ERRORS — Example
Your null hypothesis is that the battery for a heart pacemaker has an average
life of 300 days, with the alternative hypothesis that the average life is more than 300
days. You are the Quality control manager for the battery manufacturer.
(a) Would you rather make a Type I error or a Type II error?
(b) Based on your answer to part (a), should you use a high or low significance
level?
Solution:
Given H0: average life of pacemaker = 300 days, and HA: average life of pacemaker
> 300 days
(a) It is better to make a Type II error (where H0 is false, i.e., the average life is
actually more than 300 days, but we accept H0 and assume that the average
life is equal to 300 days).
(b) As we increase the significance level (α), we increase the chances of
making a Type I error. Since here it is better to make a Type II error, we shall
choose a low α.
TWO TAIL TEST
A two-tailed test will reject the null hypothesis if the sample mean is significantly
higher or lower than the hypothesized mean.
Appropriate when H0: µ = µ0 and HA: µ ≠ µ0
e.g., The manufacturer of light bulbs wants to produce light bulbs with a mean
life of 1000 hours. If the lifetime is shorter, he will lose customers to the competition,
and if it is longer, he will incur a high cost of production. He does not want to
deviate significantly from 1000 hours in either direction. Thus, he selects the
hypotheses as
H0: µ = 1000 hours and HA: µ ≠ 1000 hours and uses a two-tailed test.
ONE TAIL TEST

A one-sided test is a statistical hypothesis test in which the values for which we
can reject the null hypothesis, H0, are located entirely in one tail of the probability
distribution.
Lower tailed test will reject the null hypothesis if the sample mean is significantly
lower than the hypothesized mean. Appropriate when H0: µ = µ0 and HA: µ < µ0
e.g., A wholesaler buys light bulbs from the manufacturer in large lots and decides
not to accept a lot unless the mean life is at least 1000 hours.
He sets H0: µ = 1000 hours and HA: µ < 1000 hours and uses a lower tail test,
i.e., he rejects H0 only if the mean life of sampled bulbs is significantly below 1000
hours. (He accepts HA and rejects the lot.)
Upper tailed test will reject the null hypothesis if the sample mean is
significantly higher than the hypothesized mean. Appropriate when H0: µ = µ0 and HA:
µ > µ0
e.g., A highway safety engineer decides to test the load bearing capacity of a 20-
year-old bridge. The minimum load-bearing capacity of the bridge must be at least
10 tons.
He sets H0: µ = 10 tons and HA: µ > 10 tons and uses an upper tail test,
i.e., he rejects H0 only if the mean load bearing capacity of the bridge is significantly
higher than 10 tons.
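The three rejection regions described above can be sketched in a few lines of Python. This is only an illustration for a test statistic that follows the standard normal distribution; the function name reject_h0 and its tail labels are hypothetical, not part of the module.

```python
from statistics import NormalDist


def reject_h0(z, alpha, tail):
    """Return True if the z statistic falls in the rejection region.

    tail is "two", "lower", or "upper", matching the form of HA.
    """
    std_normal = NormalDist()
    if tail == "two":      # HA: mu != mu0, alpha is split between both tails
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "lower":    # HA: mu < mu0, all of alpha is in the lower tail
        return z < -std_normal.inv_cdf(1 - alpha)
    if tail == "upper":    # HA: mu > mu0, all of alpha is in the upper tail
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'lower', or 'upper'")


# The same statistic can lead to different decisions depending on the tail.
for tail in ("two", "lower", "upper"):
    print(tail, reject_h0(1.8, alpha=0.05, tail=tail))
```

For example, z = 1.8 at α = 0.05 lies beyond the upper-tail cutoff of about 1.645 but inside the two-tail cutoff of about 1.96, which is why the form of HA has to be fixed before looking at the data.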
HYPOTHESIS TEST FOR POPULATION MEAN
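H0: µ = µ0 and test statistic $\Delta = \dfrac{\sqrt{n}\,(\bar{x} - \mu_0)}{S}$
where $\bar{x}$ is the sample mean, $S$ is the sample standard deviation, and $n$ is the sample size.
For HA: µ > µ0, reject H0 if $\Delta > t_{n-1,\alpha}$
For HA: µ < µ0, reject H0 if $\Delta < -t_{n-1,\alpha}$
For HA: µ ≠ µ0, reject H0 if $|\Delta| > t_{n-1,\alpha/2}$
For n ≥ 30, replace $t_{n-1,\alpha}$ by $z_{\alpha}$ and $t_{n-1,\alpha/2}$ by $z_{\alpha/2}$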
EXAMPLE:
A weight reducing program that includes a strict diet and exercise claims on its
online advertisement that it can help an average overweight person lose 10 pounds
in three months. Following the program's method, a group of twelve overweight
persons have lost 8.1, 5.7, 11.6, 12.9, 3.8, 5.9, 7.8, 9.1, 7.0, 8.2, 9.3, and 8.0
pounds in three months. Test at the 5% level of significance whether the program's
advertisement is overstating the reality.
SOLUTION:
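One way to carry out this test is sketched below in Python. It assumes H0: µ = 10 against HA: µ < 10, since the advertisement overstates reality if the true mean loss is below 10 pounds. The critical value 1.796 is the tabulated t value for 11 degrees of freedom and an upper-tail area of 0.05; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Weight losses (pounds) of the twelve participants, from the example.
losses = [8.1, 5.7, 11.6, 12.9, 3.8, 5.9, 7.8, 9.1, 7.0, 8.2, 9.3, 8.0]

mu0 = 10.0           # mean weight loss claimed under H0
n = len(losses)
x_bar = mean(losses)
s = stdev(losses)    # sample standard deviation (divisor n - 1)

# Test statistic: sqrt(n) * (x_bar - mu0) / s
delta = sqrt(n) * (x_bar - mu0) / s

# Assumption: critical value t_{11, 0.05} read from a t table.
t_crit = 1.796

# Lower-tailed test: reject H0 if delta < -t_crit.
reject_h0 = delta < -t_crit
print(f"x_bar = {x_bar:.3f}, s = {s:.3f}, delta = {delta:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the sample mean is about 8.12 pounds and the statistic is about −2.62, which is below −1.796. H0 is therefore rejected at the 5% level, and the data suggest that the advertisement overstates the average weight loss.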
HYPOTHESIS TEST FOR POPULATION PROPORTION
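H0: p = p0 and test statistic $\Delta = \dfrac{\sqrt{n}\,(\hat{p} - p_0)}{\sqrt{p_0(1 - p_0)}}$
where $\hat{p}$ is the sample proportion and $n$ is the sample size.
For HA: p > p0, reject H0 if $\Delta > z_{\alpha}$
For HA: p < p0, reject H0 if $\Delta < -z_{\alpha}$
For HA: p ≠ p0, reject H0 if $|\Delta| > z_{\alpha/2}$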
EXAMPLE:
A ketchup manufacturer is in the process of deciding whether to produce an
extra spicy brand. The company's marketing research department used a national
telephone survey of 6000 households and found the extra spicy ketchup would be
purchased by 335 of them. A much more extensive study made two years ago showed
that 5% of the households would purchase the brand then. At a 2% significance level,
should the company conclude that there is an increased interest in the extra-spicy
flavor?
SOLUTION:
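A possible way to work this example is sketched below in Python, assuming H0: p = 0.05 against HA: p > 0.05 (increased interest) and using the standard normal quantile from the standard library. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 6000          # households surveyed
buyers = 335      # households that would purchase the extra spicy brand
p0 = 0.05         # proportion found in the study two years ago (H0)
alpha = 0.02      # significance level

p_hat = buyers / n

# Test statistic: sqrt(n) * (p_hat - p0) / sqrt(p0 * (1 - p0))
delta = sqrt(n) * (p_hat - p0) / sqrt(p0 * (1 - p0))

# Upper-tailed test: reject H0 if delta > z_alpha.
z_alpha = NormalDist().inv_cdf(1 - alpha)
reject_h0 = delta > z_alpha
print(f"p_hat = {p_hat:.4f}, delta = {delta:.3f}, z_alpha = {z_alpha:.3f}, reject H0: {reject_h0}")
```

With these figures the statistic is about 2.07 and the critical value is about 2.05, so H0 is rejected, although only narrowly. At the 2% level the company can conclude that interest in the extra spicy flavor has increased.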
HYPOTHESIS TEST FOR POPULATION STANDARD DEVIATION
H0: σ = σ0 and test statistic $\Delta = \dfrac{(n-1)s^2}{\sigma_0^2}$
For HA: σ > σ0, reject H0 if $\Delta > \chi^{2(R)}_{(n-1),\alpha}$
For HA: σ < σ0, reject H0 if $\Delta < \chi^{2(R)}_{(n-1),1-\alpha}$
For HA: σ ≠ σ0, reject H0 if $\Delta < \chi^{2(R)}_{(n-1),1-\alpha/2}$ or $\Delta > \chi^{2(R)}_{(n-1),\alpha/2}$
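The module gives no worked example for this test, so the following Python sketch uses made-up numbers purely for illustration: n = 11 observations, s = 2.6, and σ0 = 2.0, tested against HA: σ > σ0 at α = 0.05. It assumes that the superscript (R) marks a chi-square value defined by its right-tail area, and it takes 18.307 from a chi-square table for 10 degrees of freedom. The function name is hypothetical.

```python
def chi_square_statistic(n, s, sigma0):
    """Test statistic (n - 1) * s^2 / sigma0^2 for H0: sigma = sigma0."""
    return (n - 1) * s ** 2 / sigma0 ** 2


# Hypothetical data (not from the module).
n, s, sigma0 = 11, 2.6, 2.0
delta = chi_square_statistic(n, s, sigma0)

# Assumption: right-tail critical value for 10 degrees of freedom at
# alpha = 0.05, read from a chi-square table.
chi2_crit = 18.307

# Upper-tailed test (HA: sigma > sigma0): reject H0 if delta > chi2_crit.
reject_h0 = delta > chi2_crit
print(f"delta = {delta:.2f}, critical value = {chi2_crit}, reject H0: {reject_h0}")
```

Here the statistic is 16.9, which does not exceed 18.307, so with these illustrative numbers H0 would not be rejected.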
HYPOTHESIS TEST FOR COMPARING TWO POPULATION MEANS
Consider two populations with means µ1, µ2 and standard deviations σ1 and σ2 for
population 1 and population 2 respectively. $\sigma_{\bar{x}_1}$ and $\sigma_{\bar{x}_2}$ denote the standard errors of
the sampling distributions of the means.
$\mu_{\bar{x}_1 - \bar{x}_2}$ is the mean of the difference between sample means and
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ is the corresponding standard error.
H0: µ1 = µ2 and test statistic $\Delta = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\sigma_{\bar{x}_1 - \bar{x}_2}}$
Here, ∆ denotes the standardized difference of sample means.
For HA: µ1 > µ2, reject H0 if $\Delta > z_{\alpha}$
For HA: µ1 < µ2, reject H0 if $\Delta < -z_{\alpha}$
For HA: µ1 ≠ µ2, reject H0 if $|\Delta| > z_{\alpha/2}$
(Decision makers may be concerned with parameters of two populations, e.g., do
female employees receive a lower salary than their male counterparts for the same
job?)
EXAMPLE:
A sample of 32 money market mutual funds was chosen on January 1, 1996 and
the average annual rate of return over the past 30 days was found to be 3.23% and
the sample standard deviation was 0.51%. A year earlier a sample of 38 money-market
funds showed an average rate of return of 4.36% and the sample standard deviation
was 0.84%. Is it reasonable to conclude (at α = 0.05) that money-market interest rates
declined during 1995?
SOLUTION:
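A possible way to work this example is sketched below in Python. It assumes that population 1 is the January 1996 sample and population 2 is the sample from a year earlier, so a decline in rates corresponds to HA: µ1 < µ2. Because both samples are large, the sample standard deviations are used in place of σ1 and σ2; this substitution is an assumption of the sketch.

```python
from math import sqrt
from statistics import NormalDist

# Population 1: funds sampled on January 1, 1996. Population 2: a year earlier.
n1, x_bar1, s1 = 32, 3.23, 0.51
n2, x_bar2, s2 = 38, 4.36, 0.84
alpha = 0.05

# Estimated standard error of the difference between the sample means.
se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Under H0 the hypothesized difference mu1 - mu2 is zero.
delta = ((x_bar1 - x_bar2) - 0.0) / se

# Lower-tailed test: reject H0 if delta < -z_alpha.
z_alpha = NormalDist().inv_cdf(1 - alpha)
reject_h0 = delta < -z_alpha
print(f"se = {se:.4f}, delta = {delta:.3f}, z_alpha = {z_alpha:.3f}, reject H0: {reject_h0}")
```

The statistic is about −6.92, far below −1.645, so H0 is rejected. It is reasonable to conclude at α = 0.05 that money-market interest rates declined during 1995.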
HYPOTHESIS TEST FOR COMPARING POPULATION PROPORTIONS
Consider two samples of sizes $n_1$ and $n_2$ with $\bar{p}_1$ and $\bar{p}_2$ as the respective
proportions of successes. Then
$\hat{p} = \dfrac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$ is the estimated overall proportion of successes in the two
populations, with $\hat{q} = 1 - \hat{p}$, and
$\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\dfrac{\hat{p}\hat{q}}{n_1} + \dfrac{\hat{p}\hat{q}}{n_2}}$ is the estimated standard error of the difference between the
two proportions.
H0: p1 = p2 and test statistic $\Delta = \dfrac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)_{H_0}}{\hat{\sigma}_{\bar{p}_1 - \bar{p}_2}}$
For HA: p1 > p2, reject H0 if $\Delta > z_{\alpha}$
For HA: p1 < p2, reject H0 if $\Delta < -z_{\alpha}$
For HA: p1 ≠ p2, reject H0 if $|\Delta| > z_{\alpha/2}$
(A training director may wish to determine if the proportion of promotable
employees at one office is different from that of another.)
EXAMPLE:
A large hotel chain is trying to decide whether to convert more of its rooms into
non-smoking rooms. In a random sample of 400 guests last year, 166 had requested
non-smoking rooms. This year 205 guests in a sample of 380 preferred the non-smoking
rooms. Would you recommend that the hotel chain convert more rooms to non-
smoking? Support your recommendation by testing the appropriate hypotheses at a
0.01 level of significance.
SOLUTION:
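A possible way to work this example is sketched below in Python. It assumes that population 1 is last year's guests and population 2 is this year's guests, so growing demand for non-smoking rooms corresponds to HA: p1 < p2. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sample 1: last year's guests. Sample 2: this year's guests.
n1, requests1 = 400, 166
n2, requests2 = 380, 205
alpha = 0.01

p1_bar = requests1 / n1
p2_bar = requests2 / n2

# Pooled (overall) proportion of successes and its complement.
p_hat = (n1 * p1_bar + n2 * p2_bar) / (n1 + n2)
q_hat = 1 - p_hat

# Estimated standard error of the difference between the two proportions.
se = sqrt(p_hat * q_hat / n1 + p_hat * q_hat / n2)

# Under H0 the hypothesized difference p1 - p2 is zero.
delta = ((p1_bar - p2_bar) - 0.0) / se

# Lower-tailed test: reject H0 if delta < -z_alpha.
z_alpha = NormalDist().inv_cdf(1 - alpha)
reject_h0 = delta < -z_alpha
print(f"p_hat = {p_hat:.4f}, se = {se:.4f}, delta = {delta:.3f}, reject H0: {reject_h0}")
```

The statistic is about −3.48, which is below −2.326, so H0 is rejected at the 0.01 level. The data support the view that a larger proportion of guests now prefer non-smoking rooms, which favors converting more rooms.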
SUMMARY
In this report, we discussed that inferential statistics are used to analyze
the results and draw conclusions about a larger population from sample data. Experts
described inferential statistics as the mathematics and logic of how this generalization
from sample to population can be made. There are many ways to perform inferential
statistics but one way is to start with a theory, make a research hypothesis,
operationalize the variables, identify the population to which the study results should
apply, form a null hypothesis for this population, collect a sample, and perform
statistical tests to compare against the null hypothesis.
In conducting research, you select a sample instead of collecting data from
every member of the population because that is usually impractical. This is where
sampling methods come in. There are
two types of sampling methods: Probability Sampling, which involves random
selection, allowing you to make strong statistical inferences about the whole group,
and Non-probability Sampling, which involves non-random selection based on
convenience or other criteria, allowing you to easily collect data. The
Probability Sampling techniques are Simple Random Sampling, Systematic
Sampling, Stratified Sampling, Cluster Sampling, and Multi-Stage Sampling, while
Non-Probability Sampling includes Judgmental Sampling, Quota Sampling, Convenience
Sampling, and Snowball Sampling.
The sampling distribution, which refers to the distribution of a sample statistic (Lovirc,
2010), is important for drawing conclusions from large samples. The standard deviation of a sampling
distribution is called the standard error. While the mean of a sampling distribution is
equal to the mean of the population, the standard error depends on the standard
deviation of the population, the size of the population, and the size of the sample. The
standard error of the sampling distribution decreases as the sample size increases.
Hypothesis testing refers to making an assumption, called a hypothesis, about a
population parameter, collecting sample data, calculating a sample statistic, and
using the sample statistic to evaluate the hypothesis (how likely is it that our
hypothesized parameter is correct?). To test the validity of our assumption, we determine
the difference between the hypothesized parameter value and the sample value.
There are two kinds of hypotheses: null and alternative. The null
hypothesis, H0, represents a theory that has been put forward either because it is
believed to be true or because it is used as a basis for an argument and has not been
proven. The alternative hypothesis, HA, is a statement of what a statistical hypothesis
test is set up to establish. Researchers test the hypothesis by comparing it with the null
hypothesis. The null hypothesis is only rejected if its probability falls below a
predetermined significance level, in which case the hypothesis being tested is said to
have that level of significance. Type I or Type II errors can occur throughout this
process. A Type I error occurs when we reject the null hypothesis when it is
true (H0 is wrongly rejected), while a Type II error occurs when we accept
the null hypothesis when it is false.
Prepared by:
GROUP 6
Capangpangan, Lilia
Diolola, Florida
Ellado, Princess Mae
Gemino, Anthony
Gemino, Jefren
Jaromay, Ricky
Moscosa, Analiza