Hypothesis Test and Significance Level




Prepared by
Vaishnavi - 001
Kavya - 016
Topics that will be covered
● What is hypothesis testing?
● Null hypothesis and Alternate hypothesis
● Simple and composite hypothesis testing
● One tailed and two tailed hypothesis testing
● Type I and Type II error
● Level of significance
● P value
● How to find level of significance?
● How is the level of significance used in hypothesis testing?
What is Hypothesis Testing?
Hypothesis testing is a type of statistical analysis in which you put your
assumptions about a population parameter to the test. It is used to judge
whether sample data provide enough evidence for a claim about the population,
for example a claim about the relationship between two statistical variables.

Hypothesis testing refers to:

1. Making an assumption, called a hypothesis, about a population parameter
2. Collecting sample data
3. Calculating a sample statistic
4. Using the sample statistic to evaluate the hypothesis
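
The four steps can be sketched in a few lines of Python. This is only an illustration: the claim that 10% of a population is left-handed is taken as the hypothesis, the sample counts (`n`, `left_handed`) are made up, and a one-sample z-test for a proportion is assumed as the evaluation method.

```python
import math
from statistics import NormalDist

# Step 1: state the hypothesis about the population parameter.
p0 = 0.10  # H0: the proportion of left-handed people is 10%

# Step 2: collect sample data (hypothetical counts, for illustration only).
n = 100
left_handed = 14

# Step 3: calculate a sample statistic.
p_hat = left_handed / n
standard_error = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error

# Step 4: use the statistic to evaluate the hypothesis (two-tailed p-value).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.3f}, decision: {decision}")
```

With these made-up counts the p-value stays above 0.05, so the 10% claim is not rejected.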

Some Real life examples:

● A doctor believes that 3D (Diet, Dose, and Discipline) is 90% effective
for diabetic patients
● A report says that 35% of people have already given up on their New Year
resolutions (I know you are one of them)
● About 10% of the human population is left-handed

Null Hypothesis
● Null Hypothesis represents a theory that has been put forward either because
it is believed to be true or because it is used as a basis for an argument and
has not been proven
● H0 is the symbol for it, and it is pronounced H-naught
● The null hypothesis, also known as the conjecture, is used in quantitative
analysis to test theories about markets, investing strategies, or economies to
decide if an idea is true or false.
● Under the null hypothesis, the observations are the result of chance

Alternate Hypothesis
● The alternative hypothesis is a statement used in statistical inference
● It is contradictory to the null hypothesis and denoted by Ha or H1
● The Alternate Hypothesis is the logical opposite of the null hypothesis. The
acceptance of the alternative hypothesis follows the rejection of the null
● Under the alternate hypothesis, the observations are the result of a real effect

1. Researchers observe the water quality of a river for one year. As per the
null hypothesis, there is no change in water quality in the first half of the
year as compared to the second half. But as per the alternative hypothesis, the
quality of water is poorer in the second half when compared to the first half.
2. A sanitizer manufacturer claims that its product kills 99.99 percent of germs
on average. To put this company's claim to the test, create a null and
alternate hypothesis

H0 (Null Hypothesis): Average = 99.99%

H1 (Alternative Hypothesis): Average < 99.99%

Simple and Composite Hypothesis Testing
Depending on the population distribution, you can classify the statistical hypothesis into two types:

Simple Hypothesis: A simple hypothesis specifies an exact value for the parameter

Composite Hypothesis: A composite hypothesis specifies a range of values

A company claims that its average sales for this quarter are 1000 units. This is
an example of a simple hypothesis. Suppose the company claims that the sales are in
the range of 900 to 1000 units. Then this is a case of a composite hypothesis.

One tailed and two tailed hypothesis testing

● In a one-tailed test, the critical region of the distribution is one-sided,
meaning the test checks whether the sample statistic is either greater than
or less than a specific value, but not both
● In a two-tailed test, the sample statistic is checked for being either
greater or less than a range of values, so the critical region is
two-sided
● If the sample statistic falls within the critical region, the null
hypothesis is rejected in favor of the alternate hypothesis
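
The difference between the tails can be seen by computing p-values for the same statistic. The sketch below assumes the test statistic follows a standard normal distribution under the null hypothesis. The helper `p_value_from_z` and the value `z = 1.8` are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def p_value_from_z(z, tail="two-sided"):
    """Return the p-value of a z statistic under a standard normal null distribution."""
    std_normal = NormalDist()
    if tail == "right":      # H1: parameter is greater than the hypothesized value
        return 1 - std_normal.cdf(z)
    if tail == "left":       # H1: parameter is less than the hypothesized value
        return std_normal.cdf(z)
    if tail == "two-sided":  # H1: parameter differs in either direction
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two-sided'")

z = 1.8  # hypothetical test statistic
for tail in ("left", "right", "two-sided"):
    print(tail, round(p_value_from_z(z, tail), 4))
```

For z = 1.8 the right-tailed p-value is below 0.05 while the two-tailed p-value is above it, so the choice of tails can change the decision.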

Type I and Type II errors

A hypothesis test can result in two types of errors:

Type I Error refers to the situation when we reject the null
hypothesis when it is true

Type II Error refers to the situation when we fail to reject the null
hypothesis when it is false

Suppose a teacher evaluates the examination paper to decide whether a
student passes or fails

H0: Student has passed

H1: Student has failed

Type I Error will be the teacher failing the student [rejects H0] although the
student scored the passing marks [H0 was true]

Type II Error will be the case where the teacher passes the student [does not
reject H0] although the student did not score the passing marks [H1 is true]
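
Both error types can be estimated with a small simulation. The sketch below is an illustration under assumed conditions: a two-tailed z-test for a mean with known standard deviation, and made-up population values (`MU0`, `SIGMA`, `N`, and the alternative mean 53.0). Samples are drawn once with H0 true and once with H0 false, and wrong decisions are counted.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable

ALPHA = 0.05
MU0, SIGMA, N = 50.0, 10.0, 30  # hypothetical population values
TRIALS = 5000

def two_sided_p(sample, mu0, sigma):
    """p-value of a two-tailed z-test for the mean with known sigma."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mu):
    """Fraction of simulated samples in which H0: mu = MU0 is rejected."""
    rejections = 0
    for _ in range(TRIALS):
        sample = [random.gauss(true_mu, SIGMA) for _ in range(N)]
        if two_sided_p(sample, MU0, SIGMA) < ALPHA:
            rejections += 1
    return rejections / TRIALS

# H0 is true: every rejection is a Type I error.
type_1_rate = rejection_rate(true_mu=MU0)
# H0 is false: every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_mu=53.0)
print(f"Type I rate ~ {type_1_rate:.3f}, Type II rate ~ {type_2_rate:.3f}")
```

The estimated Type I rate should land close to `ALPHA`, while the Type II rate depends on how far the true mean is from `MU0` and on the sample size.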

Level of Significance

The level of significance is a fixed probability of wrongly rejecting
the null hypothesis. It is the probability of a Type I error, and the
statistician sets it before collecting the data, taking the
consequences of such an error into account. It serves as the yardstick
of statistical significance against which the null hypothesis is
retained or rejected: a result is declared statistically significant
only if the evidence against the null hypothesis meets this threshold.
The lower the level of significance, the stronger the evidence needed
to reject the null hypothesis.

The level of significance is represented by the Greek symbol α
(alpha): Level of significance = α.
The p-value is the probability of obtaining an effect at least as
extreme as the one in the sample data, assuming the null hypothesis to
be true.
A Type I error occurs when the null hypothesis is rejected even though
it is true. It is also called a false positive, and its probability can
be controlled only by choosing an appropriate level of significance.

For research purposes, the 5% significance level is the most
commonly chosen level.
A lower p-value means that the observed values differ significantly
from the population value that was hypothesized in the
beginning. The results are highly significant if the p-value is very
small, i.e. well below 0.05, since such values rarely occur when the
null hypothesis is true.

Example: A result that is significant at the 5% level has a p-value less
than 0.05, i.e. p < 0.05. Similarly, significant at the 1% level means
that the p-value is less than 0.01.
The level of significance is usually taken as 0.05 or 5%. When the p-value
is low, it means that the observed values are significantly different
from the population value that was hypothesised in the beginning.

The lower the p-value, the more significant the result, and the result
is highly significant if the p-value is very small. Most generally,
p-values smaller than 0.05 are treated as significant, since a p-value
below 0.05 occurs only rarely when the null hypothesis is true.

How to find the level of significance?

To measure the statistical significance of a result, the investigator
first needs to calculate the p-value. It is the probability of observing
an effect at least as extreme as the one in the sample, given that the
null hypothesis is true. When the p-value is less than the level of
significance (α), the null hypothesis is rejected. If the observed
p-value is not less than the significance level α, then in theory the
null hypothesis is not rejected. In practice, we often increase the
sample size and check whether the result then reaches the significance
level. The general interpretation of the p-value, based upon a level of
significance of 10%, is:
● If p > 0.1, there is no evidence against the null hypothesis.
● If p > 0.05 and p ≤ 0.1, there is weak evidence against the null
hypothesis.
● If p > 0.01 and p ≤ 0.05, there is strong evidence against the
null hypothesis.
● If p ≤ 0.01, there is very strong evidence against the null
hypothesis.
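
The four p-value bands can be sketched as a simple lookup. The bands are read here as strength of evidence against the null hypothesis. That reading, the function name `evidence_against_null`, and the sample p-values are assumptions made for illustration.

```python
def evidence_against_null(p):
    """Map a p-value to the strength of evidence against H0, using the four bands."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p > 0.1:
        return "no evidence"
    if p > 0.05:
        return "weak evidence"
    if p > 0.01:
        return "strong evidence"
    return "very strong evidence"

# Hypothetical p-values, one from each band.
for p in (0.62, 0.08, 0.03, 0.004):
    print(p, "->", evidence_against_null(p), "against H0")
```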
The outcome of the hypothesis test is evaluated with the help of the
p-value. If the p-value is less than the level of significance, the
outcome is statistically significant and the null hypothesis is
rejected. If the p-value is more than the level of significance, the
outcome is not statistically significant and we fail to reject the null
hypothesis. The same is represented in the picture below for a
right-tailed test.

The picture below represents the concept for a two-tailed hypothesis test

For example: Let’s say that a school principal wants to find out whether
extra coaching of 2 hours after school helps students do better in their
exams. The hypotheses would be as follows:

● Null hypothesis: There is no difference in the performance of
students even after providing extra coaching of 2 hours after school.
● Alternate hypothesis: Students perform better when they get
extra coaching of 2 hours after school.
This hypothesis testing example uses a level of significance of 0.05.
Simply put, the evidence must be strong enough to conclude that
there’s actually a difference in the performance of students
based on whether they take extra coaching.
Now, let’s say that we conduct this experiment with 100 students and
measure their scores in exams. The test statistic is computed to be
z = -0.50 (p-value = 0.62). Since the p-value is more than 0.05, we fail to
reject the null hypothesis. There is not enough evidence to show that there’s a
difference in the performance of students based on whether they get extra
coaching.
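
The quoted numbers can be checked with the standard normal distribution. This is a minimal sketch that assumes the statistic z = -0.50 comes from a z-test. It shows that the quoted p-value of 0.62 matches a two-tailed test, and that the right-tailed p-value also leads to the same decision at the 0.05 level.

```python
from statistics import NormalDist

z = -0.50     # test statistic quoted in the example
alpha = 0.05  # level of significance quoted in the example

std_normal = NormalDist()
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))  # H1: performance differs
p_right_tailed = 1 - std_normal.cdf(z)           # H1: performance improves

print(f"two-tailed p = {p_two_tailed:.2f}")
print(f"right-tailed p = {p_right_tailed:.2f}")
print("reject H0" if p_two_tailed < alpha else "fail to reject H0")
```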

While performing hypothesis tests or experiments, it is important to keep
the level of significance in mind.

How is the level of significance used in hypothesis testing?
If the test statistic falls within the critical region, you reject the null
hypothesis. This means that your findings are statistically significant and
support the alternate hypothesis. The p-value tells you how likely it is to
find an outcome at least this extreme if, in fact, the null hypothesis were
true. If the p-value is less than or equal to the level of significance, you
reject the null hypothesis. This means that your hypothesis testing
outcome was statistically significant at that level and in favor of
the alternate hypothesis.

If, on the other hand, the p-value is greater than the alpha level
(significance level), then you fail to reject the null hypothesis.
These findings are not statistically significant enough for one to
reject the null hypothesis. The same is represented in the diagram below.
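
The critical-region rule and the p-value rule lead to the same decision. The sketch below assumes a two-tailed z-test. The helper `z_test_decision` and the statistic 2.30 are hypothetical and used only for illustration.

```python
from statistics import NormalDist

def z_test_decision(z, alpha=0.05):
    """Decide a two-tailed z-test by critical region and by p-value."""
    std_normal = NormalDist()
    critical_value = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    in_critical_region = abs(z) > critical_value
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return {
        "critical_value": critical_value,
        "in_critical_region": in_critical_region,
        "p_value": p_value,
        "reject_h0": p_value <= alpha,
    }

for z in (-0.50, 2.30):  # 2.30 is a hypothetical statistic
    print(z, z_test_decision(z))
```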


from scipy.stats import ttest_1samp
import numpy as np

# Creating a sample of ages
ages = [45, 89, 23, 46, 12, 69, 45, 24, 34, 67]

# Calculating the mean of the sample
mean = np.mean(ages)
print("Sample mean is: ", mean)

# Performing a one-sample, two-sided t-test against a hypothesized mean of 50
t_stat, p_val = ttest_1samp(ages, 50)
print("P-value is: ", p_val)

# Taking the threshold value as 0.05 or 5%
if p_val < 0.05:
    print("We can reject the null hypothesis")
else:
    print("We fail to reject the null hypothesis")


For the above code:
I am determining whether the average age of the population from which the 10 people were sampled is 50 or not
NULL HYPOTHESIS (H0): The average age is 50
ALTERNATE HYPOTHESIS (H1): The average age deviates from 50
The sample mean is 45.4, and the p-value obtained from the t-test on the given sample is about 0.55
Since the p-value > 0.05, we fail to reject the null hypothesis


