
Statistical Analysis Report using

IBM SPSS

RESEARCH METHODS – CA 1

MSc in Cybersecurity

Jaganath Kaliyamoorthy
Student ID: 19198868

School of Computing

National College of Ireland

Lecturer: Dr. Vladimir Milosavljevic

Contents
1. Introduction:
1.1 Tool Used:
1.2 Independent Sample t test:
1.3 Dataset:
1.3.1 Assumptions:
1.3.2 Hypothesis:
1.3.3 Group Statistics:
1.3.4 Independent sample test:
1.3.5 Conclusion:
1.4 Mann-Whitney U Test
1.4.1 Assumptions:
1.4.2 Hypothesis:
1.4.3 Mann-Whitney U Test:
1.4.4 Conclusion:
1.5 Chi-Square test:
1.5.1 Assumptions:
1.5.2 Hypothesis:
1.5.3 Chi-Square Test:
1.5.4 Conclusion:
2.1 Dataset:
2.2 Assumptions:
2.3 Hypothesis:
2.4 Data Cleaning:
2.5 Descriptive statistics:
2.6 Homogeneity of variances and ANOVA table:
2.7 Post Hoc test:
2.8 Conclusion:
REFERENCE:

1. Introduction:
This report provides an overview of multiple statistical tests conducted on two datasets. The
tests include the independent-samples t-test, the Mann-Whitney U test, the chi-square test and
one-way ANOVA. One of the datasets was downloaded from Kaggle, and the link is provided.

1.1 Tool Used:


IBM SPSS has been used for analysing the dataset.

1.2 Independent Sample t test:


The independent-samples t-test is used to compare the mean values of two groups on a
continuous variable.[1]

Formula:
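For reference, the standard equal-variances (pooled) form of the independent-samples t statistic can be written as below. The report does not state which form it intended, so the choice of the pooled form and the symbols (group means \bar{x}_1, \bar{x}_2, sample variances s_1^2, s_2^2, group sizes n_1, n_2) are assumptions.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
df = n_1 + n_2 - 2
```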


1.3 Dataset:
Dataset link: https://www.kaggle.com/muraleetharan/college-student-dataset
The dataset is a college student dataset containing demographics such as height, age, marital
status, hours of study, the student's current GPA, whether they have children, etc. Using these
variables, we perform a number of tests, calculate group means and test whether the results are
significant.

1.3.1 Assumptions:
• The dependent variable must be quantified in a continuous scale.
• The independent variable must consist of two categorical, independent groups.
• The observations must be independent, with no relationship between the observations in
each group or between the groups.
• There should be no substantial outliers.
• Variances ought to be homogeneous.
• The dependent variable should be approximately normally distributed for each group of
the independent variable.

1.3.2 Hypothesis:
• Null hypothesis (H0): The mean hours of study per week are equal for single and
married students.
• Alternate hypothesis (H1): The mean hours of study per week are not equal for single
and married students.

1.3.3 Group Statistics:


The table below compares the sample size, mean, standard deviation and standard error of the
mean for hours of study per week. There are 20 samples in the single group and 18 samples in
the married group. The mean hours of study per week are 11.50 for single students and 18.06
for married students.

Fig. 1 Group statistics

1.3.4 Independent sample test:


The independent samples test table reports Levene's test for equality of variances (the F
statistic and its significance) together with the t-test for equality of means. From the
findings we observe that the variance of hours of study per week for single students is
significantly different from that for married students. Also, the value of Sig. (2-tailed) is
.007, so the mean values of single and married students are unequal. Consistent with this
small p value, the confidence interval (CI) for the mean difference does not contain 0.

Fig 2 Independent sample test
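The statistics behind such a table can be sketched in plain Python. The function below is a minimal illustration, not the SPSS procedure itself: it returns the equal-variances (pooled) t statistic and the Welch statistic used when equal variances are not assumed. The function name and the sample values are hypothetical and are not taken from the Kaggle dataset.

```python
import math
import statistics

def t_statistics(a, b):
    """Return (pooled_t, pooled_df, welch_t, welch_df) for two independent samples."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)

    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    pooled_t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    pooled_df = n1 + n2 - 2

    # Equal variances not assumed: Welch statistic with Welch-Satterthwaite df
    se_sq = v1 / n1 + v2 / n2
    welch_t = (m1 - m2) / math.sqrt(se_sq)
    welch_df = se_sq ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return pooled_t, pooled_df, welch_t, welch_df

# Hypothetical hours of study per week (illustration only)
single = [8, 10, 12, 9, 14, 11]
married = [15, 20, 18, 22, 16, 17]
pooled_t, pooled_df, welch_t, welch_df = t_statistics(single, married)
print(f"pooled t = {pooled_t:.3f} (df = {pooled_df}), Welch t = {welch_t:.3f} (df = {welch_df:.1f})")
```

Converting either statistic to a two-tailed p value requires the t distribution, which the standard library does not provide; that step is left to SPSS or a statistics package.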

1.3.5 Conclusion:
Levene's test was performed to test the equality of variances. Its p value is 0.016, which is
less than 0.05, so the variances of the two groups cannot be assumed equal. For the comparison
of means, the Sig. (2-tailed) value of .007 is also less than 0.05, so the null hypothesis is
rejected: the hours of study per week are statistically different for the married and single
categories.
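As an illustration of what Levene's test computes, the sketch below uses the mean-centred form: each observation is replaced by its absolute deviation from its group mean, and a one-way ANOVA F statistic is then calculated on those deviations. The function names and the data are hypothetical, and the p value (which needs the F distribution) is not computed here.

```python
import statistics

def one_way_f(groups):
    """F statistic of a one-way ANOVA for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def levene_w(groups):
    """Levene's W (mean-centred): ANOVA F on absolute deviations from each group mean."""
    abs_dev = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    return one_way_f(abs_dev)

# Hypothetical hours of study per week (illustration only)
single = [8, 10, 12, 9, 14, 11]
married = [10, 25, 18, 30, 12, 17]
print(f"Levene W = {levene_w([single, married]):.3f}")
```

A large W (small p value) indicates unequal variances, in which case the "equal variances not assumed" row of the t-test output is the appropriate one to read.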

1.4 Mann-Whitney U Test
The Mann-Whitney U test is used to test for a significant difference between two independent
categories. It is usually performed when the dependent attribute consists of ordinal or
continuous values that are not normally distributed.[2]
1.4.1 Assumptions:
• The dependent variable must be measured on an ordinal or continuous scale.
• The independent variable must consist of two categorical, independent groups.
• The observations must be independent, with no relationship between the observations in
each group or between the groups.
• This test can be conducted when the dependent variable is not normally distributed in
the two groups.

1.4.2 Hypothesis:
• Null hypothesis (H0): The hours of study per week of students who have children and
students who do not have children are equal.
• Alternate hypothesis (H1): The hours of study per week of students who have children
and students who do not have children are not equal.

1.4.3 Mann-Whitney U Test:

The table below shows that the hours of study per week are higher for the students who have
children than for the students who do not have children.

Fig 3 Mann-Whitney test

The table below shows the test statistics, including the U statistic and the Asymp. Sig.
(2-tailed) p-value. From the table we can conclude that the hours of study per week are
significantly higher for students who have children than for students who do not have
children (U = 142, p = .001).

Fig 4 Test statistics
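The U statistic can be sketched with rank sums. The code below is a minimal illustration under the usual definition (pool both samples, assign average ranks to ties, subtract n(n + 1)/2 from the rank sum); the function names and the data are hypothetical, and the smaller of the two U values is returned, which is assumed here to correspond to the U that SPSS reports.

```python
def average_ranks(values):
    """Map each distinct value to its average 1-based rank in the pooled sample."""
    positions = {}
    for index, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(index)
    return {value: sum(idx) / len(idx) for value, idx in positions.items()}

def mann_whitney_u(a, b):
    """Return (U, mean_rank_a, mean_rank_b) with U = min(U_a, U_b)."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[x] for x in a)
    rank_sum_b = sum(ranks[x] for x in b)
    n1, n2 = len(a), len(b)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u_b = n1 * n2 - u_a
    return min(u_a, u_b), rank_sum_a / n1, rank_sum_b / n2

# Hypothetical hours of study per week (illustration only)
with_children = [18, 20, 15, 22, 19]
without_children = [10, 12, 15, 9, 11]
u, mean_rank_with, mean_rank_without = mann_whitney_u(with_children, without_children)
print(f"U = {u}, mean ranks = {mean_rank_with:.2f} vs {mean_rank_without:.2f}")
```

The group with the higher mean rank is the one with the higher values overall, which is how the ranks table is read.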

1.4.4 Conclusion:
Since the p-value reported in the test statistics is less than 0.05, the null hypothesis is
rejected: the hours of study per week of students who have children and students who do not
have children are significantly different.
1.5 Chi-Square test:
This test is used to determine whether there is an association between two categorical variables.[3]

1.5.1 Assumptions:
• We should consider two categorical attributes for the chi-square test.
• Each of the two attributes must consist of two or more categorical, independent
groups.

1.5.2 Hypothesis:
• Null hypothesis (H0): The student’s current GPA is not associated with the age group
(less than 22, 22-28, 30 or more).
• Alternate hypothesis (H1): The student’s current GPA is associated with the age group
(less than 22, 22-28, 30 or more).

1.5.3 Chi-Square Test:

Fig 5 Case processing summary

In the above table, the number of missing values is 0 and the number of valid samples is 50.

……

Fig 6 Chi-Square tests

The table shows a Pearson chi-square value of 57.37 with a significance value of 0.002, which
is much less than 0.05. Fig 7 shows the phi and Cramer's V statistics and their significance.
These measures are used to check how strong the association between the variables is.

Fig 7 Symmetric Measures
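The Pearson chi-square statistic and the two symmetric measures can be sketched from a table of observed counts. This is an illustration only: the function name is hypothetical, the counts are invented (chosen to total 50, not taken from the dataset), and every row and column total is assumed to be greater than zero.

```python
import math

def chi_square_measures(table):
    """Return (chi2, df, phi, cramers_v) for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    n_rows, n_cols = len(table), len(table[0])
    df = (n_rows - 1) * (n_cols - 1)
    phi = math.sqrt(chi2 / n)
    cramers_v = math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))
    return chi2, df, phi, cramers_v

# Hypothetical counts: rows are age groups, columns are GPA bands (illustration only)
observed = [
    [8, 5, 2],
    [4, 9, 5],
    [2, 6, 9],
]
chi2, df, phi, cramers_v = chi_square_measures(observed)
print(f"chi2 = {chi2:.2f}, df = {df}, phi = {phi:.3f}, Cramer's V = {cramers_v:.3f}")
```

Cramer's V rescales the chi-square value to the range 0 to 1, so it can be compared across tables of different sizes; for a 2 x 2 table it equals phi.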

1.5.4 Conclusion:
Since the significance p value is .002, which is less than 0.05, the null hypothesis is
rejected. We therefore conclude that the student's current GPA is associated with the age
group.

2. ANOVA TEST:
The difference between the means of independent categories is tested with the ANOVA test,
that is, a one-way analysis of variance. [4]

2.1 Dataset:
Group       Bone density (mg/cm³)
Control     611 621 614 593 593 653 600 554 603 569
Low jump    635 605 638 594 599 632 631 588 607 596
High jump   650 622 626 626 631 622 643 674 643 650

The dataset comes from a study of rats that undergo one of three treatments: a control with
no jumping, a low jump with a height of up to 30 centimeters, and a high jump of up to 60
centimeters. The bone density of the rats after 8 weeks is provided.

2.2 Assumptions:
• The dependent variable must be quantified in a continuous scale.
• The independent variable must have more than two independent groups which are
categorical.
• The observations must be independent, with no relationship between the observations in
each group or between the groups.
• There should be no considerable outliers.
• The dependent variable should be normally distributed for each group of the
independent variable.
• Variances ought to be homogeneous.

2.3 Hypothesis:
• Null hypothesis (H0): The means of the groups are equal with respect to the bone density
of rats.
• Alternate hypothesis (H1): The means of the groups are not all equal with respect to the
bone density of rats.

2.4 Data Cleaning:


The data has been cleaned and the group categories have been recoded from strings to numeric
values. The cleaning was done in IBM SPSS, and a screenshot is provided in the figure below.

Fig 8 Data Cleaning
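The same recoding step can be sketched outside SPSS. The numeric codes below are hypothetical, since the report does not state which codes were assigned; the function name is also an assumption.

```python
# Hypothetical numeric codes; the report does not state which codes were used in SPSS
GROUP_CODES = {"Control": 1, "Low jump": 2, "High jump": 3}

def recode_groups(labels):
    """Replace string group labels with numeric codes, ignoring case and surrounding spaces."""
    lookup = {name.lower(): code for name, code in GROUP_CODES.items()}
    return [lookup[label.strip().lower()] for label in labels]

print(recode_groups(["Control", "Low jump", "High jump", "control"]))
```

An unknown label raises a KeyError, which is a deliberate choice here: a misspelt group name surfaces immediately instead of being silently coded as missing.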

2.5 Descriptive statistics:


The table below shows the mean, standard deviation, standard error and lower/upper bounds of
the confidence interval for the mean of the three categories (control, low jump and high
jump). The mean values of control, low jump and high jump are 601.10, 612.50 and 638.70
respectively. The high jump group has the highest mean bone density of the three.

Fig 9 Descriptive of Mean, Std. Deviation and Std Error.
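The group means can be checked directly from the values listed in section 2.1. The sketch below uses only the Python standard library; the variable and function names are assumptions, and the confidence bounds are omitted because they need a t critical value.

```python
import math
import statistics

# Bone density (mg/cm3) as listed in section 2.1
bone_density = {
    "Control":   [611, 621, 614, 593, 593, 653, 600, 554, 603, 569],
    "Low jump":  [635, 605, 638, 594, 599, 632, 631, 588, 607, 596],
    "High jump": [650, 622, 626, 626, 631, 622, 643, 674, 643, 650],
}

def describe(values):
    """Return (n, mean, sample standard deviation, standard error of the mean)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # divides by n - 1
    return n, mean, sd, sd / math.sqrt(n)

descriptives = {group: describe(values) for group, values in bone_density.items()}
for group, (n, mean, sd, se) in descriptives.items():
    print(f"{group:<10} n={n} mean={mean:.2f} sd={sd:.2f} se={se:.2f}")
```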

2.6 Homogeneity of variances and ANOVA table:
Homogeneity has been examined through the Tukey method, and we assume that the variances of
the groups are equal. In the table, control and low jump have mean values of 601.10 and
612.50, which lie close together, whereas the high jump mean of 638.70 lies a long way from
the other two groups and is therefore set apart from them.

Fig 10 Homogeneity of variance

The ANOVA table below shows that the difference in means between the groups is statistically
significant, since the values of F and significance are 7.978 and 0.002 respectively. Together
with the descriptive statistics, this indicates that the bone density is higher for the rats
which underwent high jump.

Fig 11 ANOVA Test
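The F statistic can be recomputed from the values in section 2.1 by splitting the variation into between-group and within-group sums of squares. This is a minimal sketch with assumed names, not the SPSS procedure; the significance value needs the F distribution and is not computed here.

```python
import statistics

# Bone density (mg/cm3) as listed in section 2.1
groups = {
    "Control":   [611, 621, 614, 593, 593, 653, 600, 554, 603, 569],
    "Low jump":  [635, 605, 638, 594, 599, 632, 631, 588, 607, 596],
    "High jump": [650, 622, 626, 626, 631, 622, 643, 674, 643, 650],
}

def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of independent samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    ss_between = sum(len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - statistics.mean(s)) ** 2 for s in samples for x in s)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_between, df_within = one_way_anova(list(groups.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

With three groups of ten rats the degrees of freedom are 2 and 27, and the computed F agrees with the value of 7.978 in the ANOVA table.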

2.7 Post Hoc test:


The Tukey and Bonferroni post hoc tests have been conducted, and they show the differences
among each pair of groups (control, low jump, high jump). From the table below we can see
that the significance of the mean difference between control and high jump (and between high
jump and control) is 0.002, which is less than 0.05. For both tests the value is the same.
Hence, the high jump group supports the alternate hypothesis.

Fig 12 Post Hoc Tests – Tukey and Bonferroni
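The quantities that both post hoc tests start from can be sketched from the values in section 2.1: each pairwise mean difference and its standard error based on the pooled within-group mean square. The names below are assumptions, and the adjusted significance values are not computed. Bonferroni multiplies each unadjusted p value by the number of comparisons (three here), while Tukey refers the difference to the studentized range distribution; both need distribution functions outside the standard library.

```python
import itertools
import math
import statistics

# Bone density (mg/cm3) as listed in section 2.1
groups = {
    "Control":   [611, 621, 614, 593, 593, 653, 600, 554, 603, 569],
    "Low jump":  [635, 605, 638, 594, 599, 632, 631, 588, 607, 596],
    "High jump": [650, 622, 626, 626, 631, 622, 643, 674, 643, 650],
}

def pairwise_comparisons(samples):
    """Return {(a, b): (mean_a - mean_b, standard_error, t)} using the pooled within-group variance."""
    n_total = sum(len(v) for v in samples.values())
    ss_within = sum((x - statistics.mean(v)) ** 2 for v in samples.values() for x in v)
    ms_within = ss_within / (n_total - len(samples))
    results = {}
    for a, b in itertools.combinations(samples, 2):
        diff = statistics.mean(samples[a]) - statistics.mean(samples[b])
        se = math.sqrt(ms_within * (1 / len(samples[a]) + 1 / len(samples[b])))
        results[(a, b)] = (diff, se, diff / se)
    return results

comparisons = pairwise_comparisons(groups)
for (a, b), (diff, se, t) in comparisons.items():
    print(f"{a} vs {b}: difference = {diff:.2f}, SE = {se:.3f}, t = {t:.3f}")
```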

2.8 Conclusion:
From the analysis, the one-way ANOVA demonstrates a statistically significant difference
between the groups, with F (2,27) = 7.978 and p = 0.002, which is less than 0.05. We conclude
that the bone density of rats which perform high jump is higher than that of the other two
groups. Hence, the result for the high jump group supports the rejection of the null
hypothesis.

REFERENCE:
[1] Independent t-test in SPSS Statistics - Procedure, output and interpretation of the output using a relevant example | Laerd Statistics [WWW Document], n.d. URL https://statistics.laerd.com/spss-tutorials/independent-t-test-using-spss-statistics.php (accessed 11.1.20).
[2] Mann-Whitney U Test in SPSS Statistics | Setup, Procedure & Interpretation | Laerd Statistics [WWW Document], n.d. URL https://statistics.laerd.com/spss-tutorials/mann-whitney-u-test-using-spss-statistics.php (accessed 11.1.20).
[3] Yeager, K., n.d. LibGuides: SPSS Tutorials: Chi-Square Test of Independence [WWW Document]. URL https://libguides.library.kent.edu/SPSS/ChiSquare (accessed 11.1.20).
[4] One-way ANOVA in SPSS Statistics - Step-by-step procedure including testing of assumptions. [WWW Document], n.d. URL https://statistics.laerd.com/spss-tutorials/one-way-anova-using-spss-statistics.php (accessed 11.1.20).
