Q2 Module 5 - Data Analysis Using Statistics and Hypothesis Testing

Project MIMs

Grade 12 – Practical Research 2

G12 MIMs LC 18 & 22


DATA ANALYSIS USING STATISTICS AND HYPOTHESIS TESTING

Learning Competencies:
• plans data analysis using statistics and hypothesis testing (if appropriate)
• uses statistical techniques to analyze data: study of differences and relationships,
limited to bivariate analysis

Objectives:
• prepare or finalize a plan for data analysis using statistics and hypothesis testing (if
appropriate)
• use an appropriate statistical technique in doing bivariate analysis of data

REMEMBER:

HYPOTHESIS TESTING

Hypothesis testing is a form of inferential statistics that allows us to draw conclusions
about an entire population based on a representative sample. You gain tremendous benefits
by working with a sample. In most cases, it is simply impossible to observe the entire
population to understand its properties. The only alternative is to collect a random sample
and then use statistics to analyze it.

While samples are much more practical and less expensive to work with, there are
trade-offs. When you estimate the properties of a population from a sample, the sample
statistics are unlikely to equal the actual population value exactly. For instance, your sample
mean is unlikely to equal the population mean. The difference between the sample statistic
and the population value is the sampling error.
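
To make the idea of sampling error concrete, here is a minimal Python sketch. The population of 10,000 test scores is made up for illustration, and `sampling_errors` is a hypothetical helper name, not part of any library. It draws several random samples and reports how far each sample mean lands from the population mean.

```python
import random
import statistics

def sampling_errors(population, sample_size, n_samples, seed=0):
    """Return (sample mean - population mean) for repeated random samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pop_mean = statistics.mean(population)
    errors = []
    for _ in range(n_samples):
        # simple random sample without replacement
        sample = rng.sample(population, sample_size)
        errors.append(statistics.mean(sample) - pop_mean)
    return errors

# Hypothetical population: 10,000 test scores centered near 75 (assumed values)
pop_rng = random.Random(42)
population = [pop_rng.gauss(75, 10) for _ in range(10_000)]

for i, err in enumerate(sampling_errors(population, sample_size=30, n_samples=5), start=1):
    print(f"sample {i}: sampling error = {err:+.2f}")
```

Each sample gives a slightly different error even though the population never changes. Larger samples tend to produce smaller errors, which you can check by raising `sample_size`.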

Differences that researchers observe in samples might be due to sampling error rather
than representing a true effect at the population level. If sampling error causes the observed
difference, the next time someone performs the same experiment the results might be
different. Hypothesis testing incorporates estimates of the sampling error to help you make
the correct decision.

For example, if you are studying the proportion of defects produced by two
manufacturing methods, any difference you observe between the two sample proportions
might be sampling error rather than a true difference. If the difference does not exist at the
population level, you will not obtain the benefits that you expect based on the sample
statistics. That can be a costly mistake!

ENLIGHTEN:

HYPOTHESIS TESTING

Hypothesis testing is a statistical analysis that uses sample data to assess two
mutually exclusive theories about the properties of a population. Statisticians call these
theories the null hypothesis and the alternative hypothesis. A hypothesis test assesses your
sample statistic and factors in an estimate of the sampling error to determine which
hypothesis the data support.

When you can reject the null hypothesis, the results are statistically significant, and
your data support the theory that an effect exists at the population level.

EFFECT

The effect is the difference between the population value and the null hypothesis
value. The effect is also known as population effect or the difference. For example, the mean
difference between the health outcome for a treatment group and a control group is the
effect.

Typically, you do not know the size of the actual effect. However, you can use a
hypothesis test to help you determine whether an effect exists and to estimate its size.

An effect can be statistically significant, but that does not necessarily indicate that it
is important in a real-world, practical sense.

NULL HYPOTHESIS

The null hypothesis is one of two mutually exclusive theories about the properties of
the population in hypothesis testing. Typically, the null hypothesis states that there is no
effect (i.e., the effect size equals zero). The null is often signified by H0.

In all hypothesis testing, the researchers are testing an effect of some sort. The effect
can be the effectiveness of a new vaccination, the durability of a new product, the proportion
of defects in a manufacturing process, and so on. There is some benefit or difference that the
researchers hope to identify.

However, it is possible that there is no effect or no difference between the
experimental groups. In statistics, we call this lack of an effect the null hypothesis. Therefore,
if you can reject the null, you can favor the alternative hypothesis, which states that the effect
exists (does not equal zero) at the population level.

You can think of the null as the default theory: you need sufficiently strong
evidence against it before you can reject it.

For example, in a 2-sample t-test, the null often states that the difference between
the two means equals zero.

ALTERNATIVE HYPOTHESIS

The alternative hypothesis is the other theory about the properties of the population
in hypothesis testing. Typically, the alternative hypothesis states that a population parameter
does not equal the null hypothesis value. In other words, there is a non-zero effect. If your
sample contains sufficient evidence, you can reject the null and favor the alternative
hypothesis. The alternative is often identified with H1 or HA.

For example, in a 2-sample t-test, the alternative often states that the difference
between the two means does not equal zero.
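
As an illustration of what a 2-sample t-test measures, the sketch below computes a t statistic for H0: the difference between the two means equals zero, against HA: the difference does not equal zero. It uses Welch's version of the test, which does not assume equal variances; that choice, the function name `welch_t`, and the two sets of scores are assumptions made for this example only. Turning the statistic into a p-value requires a t table or statistical software.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance gives the sample variance (divides by n - 1)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    se_sq = var_a / n_a + var_b / n_b            # squared standard error of the difference
    t = (mean_a - mean_b) / math.sqrt(se_sq)     # observed difference in standard-error units
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se_sq ** 2 / ((var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1))
    return t, df

# Made-up scores for two independent groups
group_a = [78, 85, 90, 72, 88, 81, 79, 84]
group_b = [70, 75, 80, 68, 74, 77, 73, 71]

t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A t statistic near zero is what the null hypothesis predicts. The farther it is from zero in either direction, the stronger the evidence for a two-tailed alternative.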

You can specify either a one- or two-tailed alternative hypothesis:

If you perform a two-tailed hypothesis test, the alternative states that the population
parameter does not equal the null value. For example, when the alternative hypothesis is
HA: μ ≠ 0, the test can detect differences both greater than and less than the null value.

A one-tailed alternative has more power to detect an effect, but it can test for a
difference in only one direction. For example, HA: μ > 0 can only test for differences that are
greater than zero.

P-VALUES

A p-value is the probability that you would obtain an effect at least as large as the one
observed in your sample if the null hypothesis is correct. In simpler terms, p-values tell you
how strongly your sample data contradict the null. Lower p-values represent stronger
evidence against the null. You use p-values in conjunction with the significance level to
determine whether your data provide enough evidence to reject the null in favor of the
alternative hypothesis.
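
The sketch below shows one way a p-value can be computed, using a one-sample z-test as the simplest case. It assumes the population standard deviation is known, which is rarely true in practice. The function name `z_test_p_values` and the numbers are hypothetical. It returns both the two-tailed p-value (HA: μ ≠ null value) and the one-tailed p-value (HA: μ > null value).

```python
import math
from statistics import NormalDist

def z_test_p_values(sample_mean, null_mean, pop_sd, n):
    """Return (z, two-tailed p, one-tailed 'greater than' p) for a one-sample z-test."""
    z = (sample_mean - null_mean) / (pop_sd / math.sqrt(n))
    std_normal = NormalDist()                      # standard normal: mean 0, sd 1
    p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    p_greater = 1 - std_normal.cdf(z)
    return z, p_two_tailed, p_greater

# Made-up study: 40 students, sample mean 78, null value 75, assumed population sd 10
z, p_two, p_one = z_test_p_values(sample_mean=78, null_mean=75, pop_sd=10, n=40)
print(f"z = {z:.3f}")
print(f"two-tailed p = {p_two:.4f}")
print(f"one-tailed p = {p_one:.4f}")
```

When the observed effect lies in the hypothesized direction, the one-tailed p-value is half the two-tailed one. That is why a one-tailed test has more power, at the cost of being unable to detect an effect in the other direction.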

SIGNIFICANCE LEVEL (ALPHA)

The significance level, also known as alpha or α, is an evidentiary standard that
researchers set before the study. It specifies how strongly the sample evidence must
contradict the null hypothesis before you can reject the null for the entire population. This
standard is defined by the probability of rejecting a null hypothesis that is true. In other words,
it is the probability that you say there is an effect when there is no effect. Lower significance
levels indicate that you require stronger evidence before you will reject the null.

For instance, a significance level of 0.05 signifies a 5% risk of deciding that an effect
exists when it does not exist.

Use p-values and significance levels together to help you determine which
hypothesis the data support. If the p-value is less than your significance level, you can reject
the null and conclude that the effect is statistically significant. In other words, the evidence
in your sample is strong enough to be able to reject the null hypothesis at the population
level.
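
The decision rule described above can be written as a short function. This is only a sketch: `decide` is a hypothetical name, and the default of 0.05 is a common convention rather than a requirement.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject the null when the p-value is below alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis (statistically significant)"
    return "fail to reject the null hypothesis"

for p in (0.003, 0.049, 0.051, 0.40):
    print(f"p = {p}: {decide(p)}")
```

Note that the function says "fail to reject" rather than "accept": a large p-value means the evidence is too weak to reject the null, not that the null has been proven.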

TYPES OF ERRORS IN HYPOTHESIS TESTING

Statistical hypothesis tests are not 100% accurate because they use a random
sample to draw conclusions about entire populations. There are two types of errors related
to drawing an incorrect conclusion.

• False positives: You reject a null that is true. Statisticians call this a Type I error. The
Type I error rate equals your significance level or alpha (α).

• False negatives: You fail to reject a null that is false. Statisticians call this a Type II
error. Generally, you do not know the Type II error rate. However, it is a larger risk
when you have a small sample size, noisy data, or a small effect size. The Type II error
rate is also known as beta (β).

Statistical power is the probability that a hypothesis test correctly detects an effect
that exists in the population. In other words, the test correctly rejects a false null
hypothesis. Consequently, power is inversely related to the Type II error rate: Power = 1 - β.
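
Both error rates can be estimated by simulating many studies. The sketch below is an illustration under simplifying assumptions: a two-tailed z-test of H0: mean = 0, a population standard deviation known to be 1, and a made-up true effect of 0.5 for the power calculation. `rejection_rate` is a hypothetical helper name.

```python
import math
import random
from statistics import NormalDist, mean

def rejection_rate(true_mean, n=30, alpha=0.05, n_sims=2000, seed=1):
    """Share of simulated studies in which a two-tailed z-test rejects H0: mean = 0."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, 1) for _ in range(n)]  # population sd assumed to be 1
        z = mean(sample) * math.sqrt(n)        # (sample mean - 0) / (1 / sqrt(n))
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p < alpha:
            rejections += 1
    return rejections / n_sims

type_i_rate = rejection_rate(true_mean=0.0)  # null is true: every rejection is a false positive
power = rejection_rate(true_mean=0.5)        # null is false: every rejection is correct
print(f"estimated Type I error rate: {type_i_rate:.3f}")
print(f"estimated power: {power:.3f}, estimated beta: {1 - power:.3f}")
```

The estimated Type I error rate should land close to the chosen alpha. Power depends on the sample size and the size of the true effect, so try changing `n` or `true_mean` to see it move.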

CHOOSING THE RIGHT STATISTICAL TEST

There are many different types of procedures you can use. The correct choice
depends on your research goals and the data you collect. Do you need to understand the
mean or the differences between means? Or, perhaps you need to assess proportions. You
can even use hypothesis testing to determine whether the relationships between variables
are statistically significant.

To choose the proper statistical procedure, you will need to assess your study
objectives and collect the correct type of data. This background research is necessary before
you begin a study.

Statistical tests are crucial when you want to use sample data to make conclusions
about a population because these tests account for sampling error. Using significance levels
and p-values to determine when to reject the null hypothesis improves the probability that
you will draw the correct conclusion.

TYPES OF STATISTICAL TESTS

After looking at the distribution of data and perhaps conducting some descriptive
statistics to find out the mean, median, or mode, it is time to make some inferences about
the data. As mentioned previously, inferential statistics are the set of statistical tests
researchers use to make inferences about data. These statistical tests allow researchers to
make inferences because they can show whether an observed pattern is due to intervention
or chance. There is a wide range of statistical tests. The decision of which statistical test to
use depends on the research design, the distribution of the data, and the type of variable. In
general, if the data is normally distributed, parametric tests should be used. If the data is
non-normal, non-parametric tests should be used. Below is a list of just a few common
statistical tests and their uses.

TYPE OF TEST: USE

Correlational: these tests look for an association between variables
• Pearson Correlation: Tests for the strength of the association between two continuous
variables
• Spearman Correlation: Tests for the strength of the association between two ordinal
variables (does not rely on the assumption of normally distributed data)
• Chi-Square: Tests for the strength of the association between two categorical variables

Comparison of Means: these tests look for the difference between the means of variables
• Paired T-Test: Tests for the difference between two variables from the same population
(e.g., a pre- and posttest score)
• Independent T-Test: Tests for the difference between the same variable from different
populations (e.g., comparing boys to girls)
• ANOVA: Tests for the difference between group means after any other variance in the
outcome variable is accounted for (e.g., controlling for sex, income, or age)

Regression: these tests assess if change in one variable predicts change in another variable
• Simple Regression: Tests how change in the predictor variable predicts the level of
change in the outcome variable
• Multiple Regression: Tests how changes in the combination of two or more predictor
variables predict the level of change in the outcome variable

Non-Parametric: these tests are used when the data does not meet the assumptions
required for parametric tests
• Wilcoxon Rank-Sum Test: Tests for the difference between two independent variables;
takes into account magnitude and direction of difference
• Wilcoxon Sign-Rank Test: Tests for the difference between two related variables; takes
into account the magnitude and direction of difference
• Sign Test: Tests if two related variables are different; ignores the magnitude of change
and only takes into account direction
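
To show how two of the correlational tests in the list differ, the sketch below computes a Pearson correlation directly from its formula and a Spearman correlation as the Pearson correlation of the ranks. The function names and the data are made up for illustration. It computes only the coefficients, so a significance test would still need statistical software or a table.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the run of tied values
        avg = (i + j) / 2 + 1           # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data: hours studied and quiz scores for eight students
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 70, 74, 79, 95]
print(f"Pearson r    = {pearson_r(hours, scores):.3f}")
print(f"Spearman rho = {spearman_rho(hours, scores):.3f}")
```

Because Spearman works on ranks, it responds only to the order of the values. This is why it suits ordinal data and does not rely on normally distributed data.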

STATISTICAL TEST EXAMPLE

The exercise below provides practice in statistical test selection.

The examples below illustrate how a researcher might go about selecting a statistical
test. They also show that based on the study design and the distribution of the data, the best
statistical test for analysis can change.

Example 1
Study design: Pretest/Posttest: Studying an eight-week tutoring component at an
after-school program. Assessing student satisfaction of 40 participants. Comparing the same
students’ satisfaction using a pretest before the tutoring begins and a posttest after the
tutoring component ends.
Groups: One
Time points: Two
Data distribution: Normal
Variable type: Ordinal: 1=Very satisfied, 2=Satisfied, 3=Not at all satisfied
Test: Paired T-Test

Example 2
Study design: Same as above.
Groups: One
Time points: Two
Data distribution: Non-normal
Variable type: Same as above
Test: Wilcoxon Sign-Rank Test

Example 3
Study design: Pretest, Posttest, and Control Group: Comparing the satisfaction of two
groups of students in different after-school programs. Each group has 25 participants.
Comparing the satisfaction scores using a pretest before the intervention and a posttest after
the intervention.
Groups: Two
Time points: Two
Data distribution: Non-normal
Variable type: Same as above
Test: Wilcoxon Rank-Sum Test

Example 4
Study design: Pretest, Posttest: Assessing weight loss after a nutrition intervention among
the one group of 50 students who receive the intervention. Would like to determine if there is
a relationship between participation in the intervention and weight loss. Weight is measured
before and after the intervention.
Groups: One
Time points: Two
Data distribution: Normal
Variable type: Continuous (ratio): weight in pounds
Test: Paired T-Test
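
The selection logic in these examples can be summarized as a small lookup. The sketch below is a simplification written for this module's examples only. `choose_test` is a hypothetical helper, it covers just one or two groups, and it takes the "Independent T-Test" entry from the list of tests above for the two-group, normal case that the examples do not show.

```python
def choose_test(groups, normal):
    """Suggest a test for comparing scores, following the module's examples.

    groups: 1 (same participants measured twice) or 2 (independent groups)
    normal: True if the data are approximately normally distributed
    """
    if groups == 1:
        return "Paired T-Test" if normal else "Wilcoxon Sign-Rank Test"
    if groups == 2:
        return "Independent T-Test" if normal else "Wilcoxon Rank-Sum Test"
    raise ValueError("this sketch only covers one or two groups")

print(choose_test(groups=1, normal=True))
print(choose_test(groups=1, normal=False))
print(choose_test(groups=2, normal=False))
```

A real study needs more than these two inputs, such as the variable type and the research question, so treat the function as a memory aid rather than a rule.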

LET’S TRY:

Instructions: Write a comprehensive account of what you have learned about the following.

1. What are the processes and things to be considered in writing the statistical
treatment of data of a research study?
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________

REINFORCEMENT:

Instructions: Based on your approved research topic and title, write the statistical treatment
of data of your research as part of the final requirement in Practical Research
2, applying what you have learned in this module. Use the format
below.

Statistical Treatment of Data


Challenge!

Find four (4) different quantitative research studies and read the statistical treatment of
data of each study. Critique the statistical treatment of data of each study based on
what you learned in this module. Follow the format below.

1. Research Title:

Remarks on the Statistical Treatment of Data

2. Research Title:

Remarks on the Statistical Treatment of Data


3. Research Title:

Remarks on the Statistical Treatment of Data

4. Research Title:

Remarks on the Statistical Treatment of Data

Prepared by:

MR. JESTER G. DE LEON
Master Teacher I, MNHS – SHS
