New Normal MPA Statistics Chapter 2
According to Majaski (2019), hypothesis testing is an act in statistics whereby an analyst tests an assumption
regarding a population parameter. The methodology employed by the analyst depends on the nature of the
data used and the reason for the analysis. Lastly, hypothesis testing uses sample data drawn from a larger population
to assess whether a hypothesis about that population is plausible.
While samples are much more practical and less expensive to work with, there are tradeoffs. When you
estimate the properties of a population from a sample, the sample statistics are unlikely to equal the actual population
value exactly. For instance, your sample mean is unlikely to equal the population mean. The difference between the
sample statistic and the population value is the sampling error.
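The idea of sampling error can be illustrated with a short simulation. This is a minimal sketch: the population (10,000 normally distributed values), the sample size of 30, and the random seed are all hypothetical choices made only for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values from a normal distribution (assumed).
population = [random.gauss(100, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw a simple random sample of 30 units without replacement.
sample = random.sample(population, 30)
sample_mean = statistics.mean(sample)

# Sampling error: the gap between the sample statistic and the population value.
sampling_error = sample_mean - population_mean

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"sampling error  = {sampling_error:.2f}")
```

Rerunning the sketch with a different seed gives a different sample mean, and therefore a different sampling error.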
In statistics, according to Beers (2019), the p-value is the probability of obtaining results at least as extreme as
the observed results of a test, assuming that the null hypothesis is correct. It is the level of marginal significance
within a statistical hypothesis test representing the probability of the occurrence of a given event.
Moreover, the p-value is used as an alternative to rejection points to provide the smallest level of significance at
which the null hypothesis would be rejected. A smaller p-value means that there is stronger evidence in favor of the
alternative hypothesis.
Figure 1
In Figure 2, the sample mean (330.6) falls within the critical region, which indicates it is statistically
significant at the 0.05 level. We can also see if it is statistically significant using the other common significance
level of 0.01.
Figure 2
This comparison shows why you need to choose your significance level before you begin your study. The
graphs let us determine that our results are statistically significant at the 0.05 level without using a P value. However,
when you use the numeric output produced by statistical software, you compare the P value to your significance level to
make this determination.
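The comparison of a P value with a significance level can be sketched in a few lines. The function name `decide` and the P value of 0.03 are hypothetical and used only to illustrate how the same result leads to different decisions at the 0.05 and 0.01 levels.

```python
def decide(p_value: float, alpha: float) -> str:
    """Return the test decision for a given P value and significance level."""
    if not (0.0 <= p_value <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    # Reject the null hypothesis when the P value does not exceed alpha.
    return "reject H0" if p_value <= alpha else "fail to reject H0"


p = 0.03  # hypothetical P value taken from software output
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p, alpha)}")
```

Because the decision flips between the two levels, fixing the significance level before looking at the data keeps the conclusion from depending on a choice made after the fact.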
Parametric tests assume underlying statistical distributions in the data. Therefore, several conditions of validity
must be met so that the result of a parametric test is reliable. For example, Student’s t-test for two independent
samples is reliable only if each sample follows a normal distribution and if sample variances are homogeneous.
Nonparametric tests do not assume that the data follow a specific distribution. They can thus be applied even if
parametric conditions of validity are not met.
Parametric tests often have nonparametric equivalents (Campbell and Swinscow, 2009).
A test statistic is used in a hypothesis test when you are deciding to support or reject the null hypothesis. The
test statistic takes your data from an experiment or survey and compares your results to the results you would expect
from the null hypothesis (WP, 2020).
The single sample t method tests a null hypothesis that the population mean is equal to a specified value. If this
value is zero (or not entered) then the confidence interval for the sample mean is given (Altman, 1991; Armitage and
Berry, 1994).
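As a reference sketch, the statistic and confidence interval behind this method can be written as follows. The symbols are notation assumed here rather than taken from the cited sources: \bar{x} is the sample mean, s the sample standard deviation, n the sample size, and \mu_0 the specified value.

```latex
\[
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
\]
\[
\bar{x} \pm t_{\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}}
\]
```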
Example: Consider 20 first-year resident female doctors drawn at random from one area. Their resting systolic blood
pressures, measured using an electronic sphygmomanometer, were:
Sample BP Sample BP
1. 128 11. 127
2. 118 12. 115
3. 144 13. 142
4. 133 14. 140
5. 132 15. 131
6. 111 16. 132
7. 149 17. 122
8. 139 18. 119
9. 136 19. 129
10. 126 20. 128
Can we conclude that the instrument is effective? (using 0.05 level of significance)
2. Level of Significance
α = 0.05
3. Statistical Tool
One-sample t-test
4. Computation
Using JASP
4. Go to Results
One Sample T-Test
t df p
BP 58.392 19 < .001
Based on the results, there was a significant difference in the systolic blood pressure of the participants [t(19) =
58.39, p < 0.001] at the 0.05 level of significance. Therefore, the instrument is effective.
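The t statistic in the JASP output can be reproduced by hand. This is a minimal sketch using only the Python standard library; it assumes a test value of 0, which is the value that reproduces t = 58.392, and it takes the critical value 2.093 (two-tailed, df = 19, α = 0.05) from a standard t table instead of computing a p-value.

```python
import math
import statistics

# Resting systolic blood pressures of the 20 doctors in the example.
bp = [128, 118, 144, 133, 132, 111, 149, 139, 136, 126,
      127, 115, 142, 140, 131, 132, 122, 119, 129, 128]

test_value = 0  # assumption: the test value that matches the JASP output

n = len(bp)
mean = statistics.mean(bp)
sd = statistics.stdev(bp)       # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)          # standard error of the mean
t_stat = (mean - test_value) / se
df = n - 1

T_CRIT = 2.093  # two-tailed critical t for df = 19, alpha = 0.05 (t table)

print(f"mean = {mean:.2f}, sd = {sd:.3f}, se = {se:.3f}")
print(f"t({df}) = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > T_CRIT else "fail to reject H0")
```

With a test value of 0, the test only asks whether the mean blood pressure differs from zero, which is why the t statistic is so large.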
Activity:
1. A weight reduction program claims to be effective in treating obesity. To test this claim, 12 PNP personnel were put on
the program and the number of pounds of weight gain/loss was recorded for each person after two years. Can we
conclude that the weight loss program is effective? (use 0.05 level of significance)
Example: Researchers give each of a random sample of 15 employees with high absenteeism records (Group A) a test to
measure level of hostility. They give the same test to an independent random sample of 22 employees with low
absenteeism records (Group B). Is there significant difference in the level of hostility of the employees when grouped
according to absenteeism record? (use 0.05 level of significance.)
Absenteeism Hostility
Group A 62
Group A 93
Group A 71
Group A 90
Group A 69
Group A 90
Group A 71
Group A 76
Group A 86
Group A 71
Group A 81
Group A 84
Group A 65
Group A 61
Group A 69
Group B 67
Group B 66
Group B 64
Group B 42
Group B 59
Group B 70
Group B 75
Group B 69
Group B 72
Group B 74
Group B 55
Group B 55
Group B 56
Group B 57
Group B 60
Group B 48
Group B 60
Group B 53
Group B 65
Group B 64
Group B 46
Group B 41
Is there significant difference in the level of hostility of the employees when grouped according to absenteeism
record?
2. Level of significance
α = 0.05
3. Statistical Tool
Independent sample t-test
4. Computation
Using JASP
3. Direct Absenteeism to Grouping Variables and Hostility to Dependent Variables, and click Descriptives
4. Go to Results
Group Descriptives
Group N Mean SD SE
Hostility Group A 15 75.933 10.640 2.747
Group B 22 59.909 9.851 2.100
5. Making of decision and conclusion.
The results show that there was a significant difference in the level of hostility of the employees when grouped
according to absenteeism record [t(35) = 4.704, p < 0.001] at the 0.05 level of significance. This implies that the
participants with a higher number of absences are more hostile than the participants with a lower number of absences.
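The reported t(35) = 4.704 can be checked with a pooled-variance (Student's) independent-samples t-test. This is a minimal sketch using only the standard library; it assumes equal variances in the two groups, and the critical value 2.030 (two-tailed, df = 35, α = 0.05) is taken from a standard t table.

```python
import math
import statistics

# Hostility scores from the example, grouped by absenteeism record.
group_a = [62, 93, 71, 90, 69, 90, 71, 76, 86, 71, 81, 84, 65, 61, 69]
group_b = [67, 66, 64, 42, 59, 70, 75, 69, 72, 74, 55,
           55, 56, 57, 60, 48, 60, 53, 65, 64, 46, 41]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

# Pooled variance: weighted average of the two sample variances.
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df

# Standard error of the difference between the two means.
se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (mean_a - mean_b) / se_diff

T_CRIT = 2.030  # two-tailed critical t for df = 35, alpha = 0.05 (t table)

print(f"Group A: n = {n_a}, mean = {mean_a:.3f}, sd = {math.sqrt(var_a):.3f}")
print(f"Group B: n = {n_b}, mean = {mean_b:.3f}, sd = {math.sqrt(var_b):.3f}")
print(f"t({df}) = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > T_CRIT else "fail to reject H0")
```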
Activity:
A research study was conducted to examine the differences between older and younger adults on perceived life
satisfaction. A pilot study was conducted to examine this hypothesis. Ten older adults (over the age of 70) and ten
younger adults (between 20 and 30) were given a life satisfaction test (known to have high reliability and validity). Scores
on the measure range from 0 to 60, with high scores indicative of high life satisfaction and low scores indicative of low
life satisfaction. Is there a significant difference in perceived life satisfaction when the participants are grouped according
to age? (use 0.05 level of significance). (http://faculty.webster.edu/woolflm/ttest.html)
Participants Scores
Older 45
Older 38
Older 52
Older 48
Older 25
Older 39
Older 51
Older 46
Older 55
Older 46
Younger 34
Younger 22
Younger 15
Younger 27
Younger 37
Younger 41
Younger 24
Younger 19
Younger 26
Younger 36
Paired t-test
The Paired Samples t-test compares two means that are from the same individual, object, or related units
(https://libguides.library.kent.edu/SPSS/PairedSamplestTest). The two means typically represent two measurements of
the same subjects, such as a measurement taken before and after a treatment.
Moreover, the purpose of the test is to determine whether there is statistical evidence that the mean difference
between paired observations on a particular outcome is significantly different from zero. The Paired Samples t Test is a
parametric test. This test is not appropriate for analyses involving the following:
1) unpaired data;
2) comparisons between more than two units/groups;
3) a continuous outcome that is not normally distributed; and
4) an ordinal/ranked outcome.
Example:
A dose of the drug Captropil, designed to lower systolic blood pressure, is administered to 10 randomly selected
volunteers, with the following results. Test the effectiveness of the drug. (use 0.01 level of significance)
Before After
120 118
136 122
160 143
98 105
115 98
110 98
180 180
190 175
138 105
128 112
Is there significant difference in the blood pressure of the volunteers before and after taking Captropil?
Ho: There is no significant difference in the blood pressure of the volunteers before and after taking
Captropil.
H1: There is significant difference in the blood pressure of the volunteers before and after taking
Captropil.
2. Level of significance
α = 0.01
3. Statistical tool
Paired sample t-test
4. Computation
Using JASP
3. Direct the Before and After Data to the Variable Pairs, and click Descriptives
4. Go to Results
Descriptives
N Mean SD SE
Before 10 137.500 30.351 9.598
After 10 125.600 30.424 9.621
5. Making of decision and conclusion.
The results show that there was a significant difference in the systolic blood pressure of the participants before and
after the administration of Captropil [t(9) = 3.366, p = 0.008] at the 0.01 level of significance. This implies that Captropil is
effective in lowering the systolic blood pressure of the participants.
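The reported t(9) = 3.366 can be reproduced from the Before and After columns. This is a minimal sketch using only the standard library; it assumes the differences are computed as Before minus After, and the critical value 3.250 (two-tailed, df = 9, α = 0.01) is taken from a standard t table.

```python
import math
import statistics

# Systolic blood pressure of the 10 volunteers before and after the drug.
before = [120, 136, 160, 98, 115, 110, 180, 190, 138, 128]
after = [118, 122, 143, 105, 98, 98, 180, 175, 105, 112]

# The paired t-test works on the per-subject differences.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)   # sample SD of the differences
se_diff = sd_diff / math.sqrt(n)    # standard error of the mean difference
t_stat = mean_diff / se_diff
df = n - 1

T_CRIT = 3.250  # two-tailed critical t for df = 9, alpha = 0.01 (t table)

print(f"mean difference = {mean_diff:.1f}, sd = {sd_diff:.3f}")
print(f"t({df}) = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > T_CRIT else "fail to reject H0")
```

A positive mean difference here means that blood pressure was lower after the drug than before it.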
Activity:
Based on “An analysis of Factors that contribute to the Efficacy of Hypnotic Analgesia”, by Price and Barber,
Journal of Abnormal Psychology, Vol.96, No.1. A study was conducted to investigate the effectiveness of hypnotism in
reducing pain. Using the results for randomly selected subjects, test at the 0.05 level of significance the claim that the
sensory measurements are lower after hypnotism. (The values are before and after hypnosis; the measurements are in
centimeters on a pain scale.) Does hypnotism appear effective in reducing pain?
One-way ANOVA
The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant
differences between the means of two or more independent (unrelated) groups (although you tend to only see it used
when there is a minimum of three, rather than two groups) (Leard, 2018).
Furthermore, the one-way ANOVA is an omnibus test statistic and cannot tell you which specific groups were
statistically significantly different from each other; it only tells you that at least two groups were different. Since you
may have three, four, five or more groups in your study design, determining which of these groups differ from each
other is important. You can do this using a post hoc test.
Example: Researchers wish to study the effect of crowding on the productivity of office workers. Office workers of the
same age, sex, and level of training and experience are randomly assigned to one of three groups representing three
levels of crowding: Severe, Moderate, or None. The following shows the results. Can we conclude from these data that
crowding affects productivity? Let α = 0.05.
Crowding Productivity
Severe 22
Severe 49
Severe 32
Severe 37
Severe 32
Severe 22
Moderate 31
Moderate 30
Moderate 43
Moderate 30
Moderate 46
None 68
None 73
None 78
None 47
None 56
None 59
Is there significant difference on the level of productivity of the participants when grouped according to
crowding?
Ho: There is no significant difference on the level of productivity of the participants when grouped
according to crowding.
H1: There is significant difference on the level of productivity of the participants when grouped according
to crowding.
2. Level of significance
α = 0.05
3. Statistical tool
one-way ANOVA
4. Computation
Using JASP
4. Click Post Hoc Test and direct Crowding to the other box
ANOVA - Productivity
Cases Sum of Squares df Mean Square F p
Crowding 3415.284 2 1707.642 16.732 < .001
Residuals 1428.833 14 102.060
Descriptives - Productivity
Crowding Mean SD N
Moderate 36.000 7.842 5
None 63.500 11.572 6
Severe 32.333 10.132 6
Post Hoc Comparisons - Crowding
Comparison            Mean Difference   SE      t        p (Scheffe)
Moderate vs. None     -27.500           6.117   -4.495   0.002
Moderate vs. Severe   3.667             6.117   0.599    0.837
None vs. Severe       31.167            5.833   5.343    < .001
Based on the results, there was a significant difference in the level of productivity when the participants are
grouped according to crowding [F(2, 14) = 16.732, p < 0.001] at the 0.05 level of significance.
Using the Scheffe method, the group with no crowding differs significantly from the moderate and severe crowding
groups, with mean differences of 27.5 and 31.167 respectively, while the moderate and severe groups do not differ
significantly from each other (p = 0.837). This implies that workers in an uncrowded office are more productive than
those in moderately or severely crowded working areas.
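The ANOVA table and the post hoc comparisons can be reproduced from the raw scores. This is a minimal sketch using only the standard library. Instead of computing p-values, it judges each pair with the Scheffe criterion, t² / (k − 1) > F critical, where the critical value 3.74 for F(2, 14) at α = 0.05 is taken from a standard F table.

```python
import math
import statistics
from itertools import combinations

# Productivity scores from the example, grouped by level of crowding.
groups = {
    "Severe": [22, 49, 32, 37, 32, 22],
    "Moderate": [31, 30, 43, 30, 46],
    "None": [68, 73, 78, 47, 56, 59],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = statistics.mean(all_scores)
k = len(groups)            # number of groups
n_total = len(all_scores)  # total number of observations

# Between-groups and within-groups (residual) sums of squares.
ss_between = sum(len(s) * (statistics.mean(s) - grand_mean) ** 2
                 for s in groups.values())
ss_within = sum((x - statistics.mean(s)) ** 2
                for s in groups.values() for x in s)

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

F_CRIT = 3.74  # critical F(2, 14) at alpha = 0.05 (F table)

print(f"F({df_between}, {df_within}) = {f_stat:.3f}")

# Scheffe post hoc: pairwise t based on the residual mean square.
scheffe = {}
for a, b in combinations(sorted(groups), 2):
    diff = statistics.mean(groups[a]) - statistics.mean(groups[b])
    se = math.sqrt(ms_within * (1 / len(groups[a]) + 1 / len(groups[b])))
    t = diff / se
    significant = t ** 2 / df_between > F_CRIT
    scheffe[(a, b)] = (diff, se, t, significant)
    print(f"{a} vs. {b}: diff = {diff:.3f}, se = {se:.3f}, "
          f"t = {t:.3f}, significant = {significant}")
```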
Activity:
A firm wishes to compare four programs for training workers to perform a certain manual task. Twenty new
employees are randomly assigned to the training programs, with 5 in each program. At the end of the training period, a
test is conducted to see how quickly trainees can perform the task. The number of times the task is performed per
minute is recorded for each trainee, with the following results:
Program Score
Program 1 9
Program 1 12
Program 1 14
Program 1 11
Program 1 13
Program 2 10
Program 2 6
Program 2 9
Program 2 9
Program 2 10
Program 3 12
Program 3 14
Program 3 11
Program 3 13
Program 3 11
Program 4 9
Program 4 8
Program 4 11
Program 4 7
Program 4 8
References:
Campbell and Swinscow (2009). Parametric and Non-parametric tests for comparing two or more groups.
https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests. Retrieved January 9, 2020.
Ellise, et al. (2015). Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics.
https://blog.minitab.com/blog/adventures-in-statistics-2/understanding-hypothesis-tests-significance-levels-alpha-and-p-values-in-statistics. Retrieved January 9, 2020.
Mangiafico, S. (2016). Assumptions in parametric statistics. https://rcompanion.org/handbook/I_01.html. Retrieved
January 9, 2020.
Statistics Solutions (SS) (2013). Hypothesis Testing [WWW Document]. Retrieved from
http://www.statisticssolutions.com/academic-solutions/resources/directory-of-statistical-analyses/hypothesis-testing/
W.P. Word Press (2020). Test Statistic: What is it? Types of Test Statistic.
https://www.statisticshowto.datasciencecentral.com/test-statistic/. Retrieved January 9, 2020.