
INTRODUCTION TO SPSS

SPSS (Statistical Package for the Social Sciences) was initially developed by Norman H. Nie, a
social scientist, along with his colleagues Dale H. Bent and C. Hadlai Hull in 1968 at
Stanford University (McCormick et al., 2016). The software was acquired by IBM in 2009 and is
now called IBM SPSS Statistics. SPSS is a powerful and user-friendly software package for a
wide range of statistical analyses (Levesque, 2007). The program is widely used by students and
researchers in sociology, psychology, economics, business studies, medicine, engineering, and
other disciplines. Various public, private, and non-governmental organizations also use SPSS
for their projects, and it is a strong choice for marketing and survey companies analysing
consumer behaviour and forecasting (Vorhies, 2017).

The biggest advantage of SPSS is that the program is designed to handle large data sets with
many associated variables (Jasrai, 2020). It offers the flexibility of running multiple
analyses of the same data along with graphical representation (Garth, 2008). SPSS is well
suited to quantitative statistical analysis (Arkkelin, 2014) such as correlation, regression,
ANOVA, t-tests, and factor analysis, of both parametric and non-parametric nature. It also has
strong report-generation capabilities (Wagner, 2019). Survey data can be used for statistical
data mining while being stored securely. Another major advantage of SPSS is its Graphical User
Interface (GUI), which is easy to learn and use (Landau and Everitt, 2004). It can also create
new variables from existing information (MacInnes, 2016). One can open a variety of file
formats such as Excel, SAS, and Stata. SPSS runs on Windows, macOS, and Linux platforms
(URI, 2019).

SPSS has disadvantages too. The program requires a commercial licence, so users must pay for
the software. For some users, especially those accustomed to MS Excel, the interface may feel
unfamiliar. Editing graphs is somewhat difficult, relatively few chart options are available,
and graphic quality is poor (UT Austin, 2012). The program can run slowly depending on the
machine it is installed on, and certain add-on modules are not as easy to use as they should be.

Despite the disadvantages, it is the most reliable statistical package for researchers to date.
References:

 SPSS. (2022). IBM. https://www.ibm.com/products/spss-statistics


 Pallant, J. (2021). SPSS Survival Manual: A Step by Step Guide to Data Analysis Using IBM
SPSS. McGraw-Hill Education.
 Field, A. (2021). Discovering Statistics Using IBM SPSS Statistics. Sage Publications.

INTRODUCTION TO STATISTICS

Statistics refers to numerical facts. The summarized figures of numerical facts such as
percentages, averages, means, medians, modes, and standard deviations are referred to as
statistics. Statistics also signifies the method or methods of dealing with numerical facts. In
this sense, it is considered as a science of collecting, summarizing, analysing, and interpreting
numerical facts (Mangal, 2002). It involves the examination of data collected from samples
within populations as well as the use of probabilistic models to make inferences and draw
conclusions (APA, 2015). Statistics is now regarded as an indispensable instrument in the fields
of education and psychology, especially where any sort of measurement or evaluation is
involved (Mangal, 2002).

TYPES OF STATISTICS

Statistics can mainly be bifurcated into two types – descriptive and inferential statistics. These
are described below.

Descriptive Statistics: Descriptive statistics summarize large volumes of data. They provide
the reader with an understanding of what the data look like by using a few indicative or
typical values. Depending on the type of data, this can involve a measure of central tendency
(e.g. mean) and a measure of spread (e.g. standard deviation) (Harrison et al., 2020).

Inferential Statistics: An inference is a conclusion made according to some criteria.
Inferential statistics is a broad class of statistical techniques that allow inferences about
characteristics of a population to be drawn from a sample of data from that population. These
techniques include approaches for testing hypotheses (APA, 2015).
References:

 American Psychological Association, & VandenBos, G. R. (2015). APA Dictionary of
Psychology (2nd ed.). American Psychological Association.
 Harrison, V., Kemp, R., Brace, N., & Snelgar, R. (2020). SPSS for Psychologists (7th ed.).
Bloomsbury Academic.
 Mangal, S. K. (2002). Statistics in Psychology and Education. PHI Learning Pvt. Ltd.

INTRODUCTION TO PARAMETRIC AND NON-PARAMETRIC TESTS

Parametric Tests

Parametric tests are based on the assumption that the data follows a specific distribution,
usually the normal distribution. These tests require certain assumptions about the population,
including that the data is normally distributed, has equal variances, and is independent. Some
examples of parametric tests include the t-test, ANOVA, and regression analysis. These tests
are more powerful than non-parametric tests and provide more accurate results when the
assumptions are met.

Pearson Correlation Method

The Pearson correlation method is a measure of the strength and direction of the relationship
between two continuous variables. It ranges from -1 (a perfect negative correlation) to 1 (a
perfect positive correlation), with 0 indicating no correlation. The Pearson correlation
coefficient is widely used in many fields, including psychology, economics, and engineering.
It is used to test the relationship between two continuous variables, such as age and income or
height and weight.
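As an illustration outside SPSS, the same coefficient can be computed in Python with `scipy.stats.pearsonr`; the age and income values below are made-up data for the sketch.

```python
from scipy.stats import pearsonr

# Hypothetical data: age (years) and income (in thousands)
age = [23, 29, 31, 38, 45, 47, 52, 61]
income = [28, 33, 35, 44, 52, 55, 60, 71]

r, p = pearsonr(age, income)  # r: correlation coefficient, p: two-tailed p-value
print(f"r = {r:.3f}, p = {p:.4f}")
```

Because the made-up incomes rise almost linearly with age, r comes out close to +1.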
Independent Sample t-Test

The independent sample t-test is used to compare the means of two independent groups. It is a
parametric test that assumes normal distribution and equal variances. The independent sample
t-test is widely used in many fields, including medicine, psychology, and economics. It is used
to test the difference between two groups, such as the effect of a drug treatment on two groups
of patients.
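A minimal sketch of the same comparison with `scipy.stats.ttest_ind`, using invented scores for two independent patient groups:

```python
from scipy.stats import ttest_ind

# Hypothetical outcome scores for two independent groups of patients
drug_group = [35, 38, 30, 33, 36, 34, 37, 32]
placebo_group = [45, 48, 44, 47, 50, 46, 43, 49]

# ttest_ind assumes equal variances by default (use equal_var=False otherwise)
t, p = ttest_ind(drug_group, placebo_group)
```

Since the drug group's scores sit well below the placebo group's, the t statistic is large and negative and p is very small.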

Paired Sample t-Test

The paired sample t-test is used to compare the means of two related groups. It is a parametric
test that assumes normal distribution and equal variances. The paired sample t-test is widely
used in many fields, including medicine, psychology, and education. It is used to test the
difference between two related groups, such as the effect of a drug treatment on the same group
of patients before and after the treatment.
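The paired comparison can be sketched with `scipy.stats.ttest_rel`; the before/after scores below are made up to illustrate a consistent drop after treatment:

```python
from scipy.stats import ttest_rel

# Hypothetical stress scores for the same patients before and after treatment
before = [35, 40, 33, 38, 36, 42, 39, 37]
after  = [30, 34, 29, 33, 31, 36, 35, 32]

t, p = ttest_rel(before, after)  # tests whether the mean difference is zero
```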

Linear Regression

Linear regression is a statistical method used to analyze the relationship between two or more
continuous variables. It involves fitting a linear equation to the data, which can be used to make
predictions and analyze the relationship between the variables. Linear regression is widely used
in many fields, including biology, economics, and engineering. It is used to analyze the
relationship between two or more continuous variables, such as the relationship between height
and weight or the relationship between temperature and energy consumption.
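Fitting a line and using it for prediction can be sketched with `scipy.stats.linregress`; the hours/score pairs are invented for the example:

```python
from scipy.stats import linregress

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 73]

fit = linregress(hours, score)                   # least-squares line fit
predicted_at_7 = fit.intercept + fit.slope * 7   # predict score for 7 hours
```

`fit.slope` and `fit.intercept` define the fitted line, and `fit.rvalue` is the correlation between the two variables.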
Multiple Regression

Multiple regression is a statistical method used to analyze the relationship between a dependent
variable and multiple independent variables. The technique allows researchers to determine
how much of the variation in the dependent variable can be explained by the independent
variables. Multiple regression is often used to make predictions, identify the most important
predictors, and test hypotheses about the relationships between variables. The method can be
used in various contexts, such as in psychology to study the relationship between personality
traits and academic performance, or in economics to analyze the impact of different factors on
business success.
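A minimal sketch of fitting such a model with ordinary least squares via `numpy.linalg.lstsq`; the data are constructed so that the true intercept and coefficients (1, 2 and 3) are known and should be recovered:

```python
import numpy as np

# Hypothetical data: y is an exact linear combination of two predictors,
# so least squares should recover intercept 1 and coefficients 2 and 3
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

A = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
intercept, b1, b2 = coef
```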

One-Way ANOVA

One-way ANOVA, or analysis of variance, is a statistical technique used to test for differences
between two or more groups on a single dependent variable. The goal of one-way ANOVA is
to determine whether the means of the groups are significantly different from each other. The
method is often used in experimental and quasi-experimental research designs to test
hypotheses about the effects of an independent variable on a dependent variable. One-way
ANOVA can be applied in various areas, such as in healthcare to evaluate the effect of different
treatments on patient outcomes, or in marketing to assess the impact of different advertising
campaigns on sales.
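The same comparison of three group means can be sketched with `scipy.stats.f_oneway`, using invented treatment scores:

```python
from scipy.stats import f_oneway

# Hypothetical outcome scores under three different treatments
treatment_a = [20, 22, 19, 21, 20]
treatment_b = [30, 29, 31, 32, 30]
treatment_c = [25, 26, 24, 27, 25]

F, p = f_oneway(treatment_a, treatment_b, treatment_c)
```

With group means this far apart relative to the within-group spread, F is large and p is far below .05.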

Two-Way ANOVA

Two-way ANOVA is a statistical technique used to test for differences between two or more
groups on two independent variables, also known as factors. The method aims to determine
whether there are main effects of each factor, as well as an interaction effect between the two
factors. Two-way ANOVA is often used in experimental and quasi-experimental research
designs to test hypotheses about the effects of two independent variables on a dependent
variable. Two-way ANOVA can be applied in various fields, such as in agriculture to examine
the impact of different fertilizers and irrigation systems on crop yields, or in education to assess
the effect of different teaching methods and classroom environments on student performance.
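SciPy has no single two-way ANOVA function, so the sketch below computes the sums of squares by hand for a balanced 2x2 design with made-up data; the cell means are chosen so that both main effects are large and the interaction is exactly zero:

```python
import numpy as np

# Hypothetical balanced 2x2 design with n = 3 replicates per cell.
# data[i, j] holds the replicate scores for factor A level i, factor B level j.
data = np.array([[[ 9, 10, 11], [19, 20, 21]],
                 [[29, 30, 31], [39, 40, 41]]], dtype=float)

a, b, n = data.shape
grand = data.mean()
mean_a = data.mean(axis=(1, 2))              # factor A level means
mean_b = data.mean(axis=(0, 2))              # factor B level means
cell = data.mean(axis=2)                     # cell means

ss_a = b * n * ((mean_a - grand) ** 2).sum()
ss_b = a * n * ((mean_b - grand) ** 2).sum()
ss_ab = n * ((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum()
ss_err = ((data - cell[:, :, None]) ** 2).sum()

df_a, df_b = a - 1, b - 1
df_ab, df_err = df_a * df_b, a * b * (n - 1)
f_a = (ss_a / df_a) / (ss_err / df_err)      # main effect of factor A
f_b = (ss_b / df_b) / (ss_err / df_err)      # main effect of factor B
f_ab = (ss_ab / df_ab) / (ss_err / df_err)   # interaction A x B
```

Each F ratio is the effect's mean square divided by the error mean square, exactly as reported in an SPSS two-way ANOVA table.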

Non-Parametric Tests

Non-parametric tests are used when the data does not follow a specific distribution or when the
assumptions of parametric tests are not met. These tests do not require any assumptions about
the population, except that the data is ordinal or nominal. Some examples of non-parametric
tests include the Mann-Whitney U test, Kruskal-Wallis test, and Wilcoxon signed-rank test.
Non-parametric tests are less powerful than parametric tests but are more robust to outliers and
provide more accurate results when the assumptions of parametric tests are not met.

Spearman Rank Order Method

The Spearman rank order method is a non-parametric statistical method used to measure the
correlation between two variables. It is similar to the Pearson correlation method, but it is
used for monotonic (not necessarily linear) relationships or when the data are not normally
distributed. The Spearman rank order coefficient ranges from -1 to 1, where 0 indicates no
correlation. It is widely used in many fields, including psychology, sociology, and economics.
It is used to test the correlation between two variables, such as the relationship between
education level and income.
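A minimal sketch with `scipy.stats.spearmanr`, using invented education/income values that rise together monotonically but not linearly:

```python
from scipy.stats import spearmanr

# Hypothetical data: years of education vs. income (monotone but non-linear)
education_years = [8, 10, 12, 12, 14, 16, 18, 21]
income = [22, 25, 30, 28, 40, 55, 52, 80]

rho, p = spearmanr(education_years, income)  # correlation of the ranks
```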

Mann Whitney U Test

The Mann Whitney U test is a non-parametric statistical method used to test the difference
between two independent groups. It is used when the data is not normally distributed or when
the assumption of equal variances is not met. The Mann Whitney U test is widely used in many
fields, including medicine, psychology, and education. It is used to test the difference between
two groups, such as the effect of a drug treatment on two groups of patients.
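The same comparison can be sketched with `scipy.stats.mannwhitneyu`; the two invented groups do not overlap at all, so U is 0 and the difference is clearly significant:

```python
from scipy.stats import mannwhitneyu

# Hypothetical scores for two independent groups (no overlap, for clarity)
group_a = [12, 14, 11, 15, 13, 16, 10]
group_b = [22, 25, 21, 24, 23, 26, 27]

u, p = mannwhitneyu(group_a, group_b)  # two-sided by default in recent SciPy
```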
Wilcoxon Signed Rank Method

The Wilcoxon signed rank method is a non-parametric statistical method used to test the
difference between two related groups. It is used when the data is not normally distributed or
when the assumption of equal variances is not met. The Wilcoxon signed rank method is widely
used in many fields, including medicine, psychology, and education. It is used to test the
difference between two related groups, such as the effect of a drug treatment on the same group
of patients before and after the treatment.
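A sketch of the paired non-parametric comparison with `scipy.stats.wilcoxon`, using invented before/after scores in which every participant improves:

```python
from scipy.stats import wilcoxon

# Hypothetical anxiety scores for the same group before and after treatment
before = [52, 60, 48, 55, 57, 64, 50, 59, 53, 63]
after  = [45, 51, 44, 47, 51, 53, 47, 49, 48, 51]

stat, p = wilcoxon(before, after)  # signed-rank test on the paired differences
```

Because all ten differences have the same sign, the signed-rank statistic is 0, its smallest possible value.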

Chi-Square Method

The chi-square method is a non-parametric statistical method used to test the relationship
between two categorical variables. It is widely used in many fields, including sociology,
psychology, and biology. The chi-square method is used to test the relationship between two
categorical variables, such as the relationship between gender and smoking.
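The test of independence for a contingency table can be sketched with `scipy.stats.chi2_contingency`; the gender-by-smoking counts below are made up:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: rows = gender, columns = smoker / non-smoker
table = [[30, 10],
         [15, 25]]

# Returns the chi-square statistic, p-value, degrees of freedom, and
# the table of expected counts (Yates' correction is applied for 2x2 tables)
chi2, p, dof, expected = chi2_contingency(table)
```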

Friedman Test

The Friedman test is a non-parametric statistical method used to test the difference between
three or more related groups. It is used when the data is not normally distributed or when the
assumption of equal variances is not met. The Friedman test is widely used in many fields,
including medicine, psychology, and education. It is used to test the difference between three
or more related groups, such as the effect of three different drugs on the same group of patients.
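The three-drug example can be sketched with `scipy.stats.friedmanchisquare`, using invented scores in which every patient ranks the drugs the same way:

```python
from scipy.stats import friedmanchisquare

# Hypothetical scores of the same six patients under three different drugs
drug_a = [8, 7, 9, 8, 7, 9]
drug_b = [5, 6, 5, 6, 5, 6]
drug_c = [2, 3, 2, 3, 2, 3]

stat, p = friedmanchisquare(drug_a, drug_b, drug_c)
```

With perfectly consistent within-subject rankings, the Friedman statistic reaches its maximum for six subjects and three conditions.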

Kruskal Wallis Test

The Kruskal Wallis test is a non-parametric statistical method used to test the difference
between three or more independent groups. It is used when the data is not normally distributed
or when the assumption of equal variances is not met. The Kruskal Wallis test is widely used
in many fields, including medicine, psychology, and education. It is used to test the difference
between three or more independent groups, such as the effect of three different drugs on three
groups of patients.
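A minimal sketch with `scipy.stats.kruskal`, using three invented, non-overlapping independent groups:

```python
from scipy.stats import kruskal

# Hypothetical scores for three independent groups of patients
group_a = [3, 4, 2, 5, 4]
group_b = [10, 12, 11, 13, 12]
group_c = [20, 22, 21, 23, 22]

h, p = kruskal(group_a, group_b, group_c)  # rank-based analogue of one-way ANOVA
```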

Scatter Plot

A scatter plot is a graphical representation of the relationship between two continuous
variables. It is widely used in many fields, including biology, economics, and engineering. A
scatter plot is used to analyze the relationship between two continuous variables, such as the
relationship between height and weight or the relationship between temperature and energy
consumption.
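Outside SPSS, a scatter plot can be sketched with matplotlib; the height/weight pairs below are made up, and the plot is written to a hypothetical file `scatter.png`:

```python
import matplotlib
matplotlib.use("Agg")            # render off-screen, no display needed
import matplotlib.pyplot as plt
from pathlib import Path

# Hypothetical height (cm) and weight (kg) measurements
height = [150, 155, 160, 165, 170, 175, 180]
weight = [50, 54, 59, 63, 68, 74, 80]

plt.scatter(height, weight)
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.title("Height vs. Weight")
plt.savefig("scatter.png")
saved = Path("scatter.png").exists()
```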

Significance of Parametric and Non-Parametric Tests

Parametric and non-parametric tests are essential tools for hypothesis testing in many fields.
Parametric tests are more powerful and accurate than non-parametric tests when the
assumptions of the tests are met. However, when the assumptions of the tests are not met, non-
parametric tests are more accurate and robust than parametric tests. The choice of the
appropriate test depends on the data and the assumptions of the test.

References:
Gould, R. (2015). Introduction to Statistics. John Wiley & Sons.
Dodge, Y. (2008). The Oxford Dictionary of Statistical Terms. Oxford University Press.
Mangal, S. K. (2002). Statistics in Psychology and Education. PHI Learning.
INTRODUCTION TO CENTRAL TENDENCY

Central tendency measures describe the central location of an entire distribution of
observations. They are useful in comparing the performance of a group with that of a standard
reference group, and they simplify comparison of two or more groups tested under different
conditions. For example, we may be interested in whether video games can improve mental or
physical functioning in the elderly. One study found that the average IQ of elderly
participants who played video games over a 2-month period rose from 101.8 to 108.3 after the
intervention (Drew and Waters, 1985). Thus, measures of central tendency like the mean (the
average of the scores), median (the value that divides the distribution in half, i.e., the
midpoint of the scores), and mode (the score with the greatest frequency) form the essence of
any research study that looks at a deeper analysis of the variables being studied.
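These three measures can be computed directly with Python's standard `statistics` module; the scores below are a small made-up sample:

```python
import statistics as st

# Hypothetical life-satisfaction scores for a small sample
scores = [20, 20, 20, 21, 22, 23, 25, 27, 30, 34]

mean = st.mean(scores)      # arithmetic average of the scores
median = st.median(scores)  # midpoint of the sorted scores
mode = st.mode(scores)      # most frequent score
sd = st.stdev(scores)       # sample standard deviation
```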

References:

King, B. M., & Minium, E. W. (2008). Statistical reasoning in the behavioral sciences (5th ed.).
John Wiley & Sons Inc.
PATH FOR DESCRIPTIVE STATISTICS:

Enter data in Data and Variable view

Go to "Analyze"

Select "Descriptive Statistics"

Select "Frequencies"

Choose the Variables

Click on Statistics

Select mean, median, mode, standard deviation, minimum, maximum, skewness and kurtosis

Click on Continue
Central Tendency Table:

Table 1: Statistics
N Mean Median Mode Std. Deviation Skewness Kurtosis Minimum Maximum

19 25.05 22.00 20 5.317 .661 -1.349 20 34

Frequency Tables:

Table 2: Age

Frequency Percent Valid Percent

Valid 21 5 26.3 26.3

22 6 31.6 31.6

23 7 36.8 36.8

24 1 5.3 5.3

Total 19 100.0 100.0

Table 3: Gender

Frequency Percent Valid Percent

Valid Male 10 52.6 52.6

Female 9 47.4 47.4

Total 19 100.0 100.0


Interpretation:

The total number of observations (N) is 19. The mean score for life satisfaction is 25.05 and
the median score is 22. The mode of the data is 20. The standard deviation for the life
satisfaction score is 5.317. The minimum and maximum scores are 20 and 34 respectively. The
skewness in the data set is .661, which falls within the prescribed range of skewness (+/- 1.96).
The kurtosis is -1.349, which falls within the prescribed range of kurtosis (+/- 1.96).

FREQUENCIES:
PATH TO MEASURE THE FREQUENCIES:

Enter data in Data and Variable view

Go to "Analyze"

Select "Descriptive Statistics"

Select "Frequencies"

Choose the Variables

Click on Statistics

Select mean, median, mode, standard deviation, minimum, maximum, skewness and kurtosis

Click on Continue
DATA TABLES FOR FREQUENCIES:

Table 1: Socio Economic Status

Frequency Percent Valid Percent

Valid Lower SES 7 36.8 36.8

Middle SES 7 36.8 36.8

Higher SES 5 26.3 26.3

Total 19 100.0 100.0

Table 2: Age

Frequency Percent Valid Percent

Valid 21 5 26.3 26.3

22 6 31.6 31.6

23 7 36.8 36.8

24 1 5.3 5.3

Total 19 100.0 100.0

Table 3: Gender

Frequency Percent Valid Percent

Valid Male 10 52.6 52.6

Female 9 47.4 47.4

Total 19 100.0 100.0


Interpretation:

Age

The total number of observations (N) is 19. 26.3% of the sample, i.e., 5 individuals, are 21
years old; 31.6% (6 individuals) are 22 years old; 36.8% (7 individuals) are 23 years old; and
5.3% (1 individual) is 24 years old.

Gender

The total number of observations (N) is 19. Out of this 52.6% are males and 47.4% are females.
In terms of absolute numbers, there are 10 males and 9 females in the sample.

Socio Economic Status

The total number of observations (N) is 19. Lower and Middle SES make up equal portions of
the sample with 36.8% or 7 observations each. The Higher SES group makes up 26.3% of the
sample with 5 observations.
PEARSON CORRELATION

It is a parametric test used to measure the strength and direction of the relationship between
two continuous variables. Such a correlation reveals how a change in one variable is
accompanied by a change in the other, or to what extent an increase or decrease in one is
accompanied by an increase or decrease in the other (Mangal, 2002). Correlation is used in many
research studies, in gauging the effects of government policies and programmes, and in studying
organizational behaviour.

References:

 Best, J.W. and Kahn, J.V. (1998). Research in Education (8th ed.). Butler University.
 Bordens, K.S. & Abbott, B.B. (2011). Research Designs and Methods: A Process Approach
(8th ed.). Mayfield Publications.

Question: The following data describes the family environment and decision-making scores of
professionals. Find if there is a significant relationship between the two variables.

Hypothesis: There is no significant relationship between family environment scores and
decision-making scores.

Variables:

Variable 1- Family environment scores

Variable 2- Decision-making scores


PROCEDURE FOR PEARSON CORRELATION

Enter data in Data and Variable view

Go to "Analyze"

Select "Correlate"

Select "Bivariate"

Choose the Variables

Check 'Pearson'

Select 'Options'

Select 'Means' and 'Standard Deviation'

Click on Continue

Click Okay
DATA TABLES FOR PEARSON CORRELATION:

Table 1: Descriptive Statistics

Mean Std. Deviation N


Family_Env_Scores 19.9333 7.24418 30
Decision_Making_Scores 36.2667 17.86141 30

Table 2: Correlations

                                              Family_Env_Scores   Decision_Making_Scores
Family_Env_Scores        Pearson Correlation  1                   -.504**
                         Sig. (2-tailed)                          .005
                         N                    30                  30
Decision_Making_Scores   Pearson Correlation  -.504**             1
                         Sig. (2-tailed)      .005
                         N                    30                  30

Note: **. Correlation is significant at the 0.01 level (2-tailed).

Interpretation:

Table 1 above shows the mean and standard deviation of the family environment and decision-
making scores. The mean of the family environment scores is 19.933 with an SD of 7.244, and the
mean of the decision-making scores is 36.267 with an SD of 17.861, each with a sample size of
30 (N = 30).

Table 2 shows that there is a significant, moderate negative relationship (r = -.504, p < .01)
between family environment scores and decision-making scores. Therefore, the null hypothesis is
rejected.
REGRESSION

Correlation estimates the direction and strength of the relationship between two variables.
Regression, on the other hand, can estimate the values of one variable based on knowledge of
the values of the other variables. It indicates how much a change in the predictor is
associated with a change in the criterion; note, however, that regression alone does not
establish a cause-effect relationship unless the study design is experimental.

References:

 Best, J.W. and Kahn, J.V. (1998). Research in Education (8th ed.). Butler University.

 Bordens, K.S. & Abbott, B.B. (2011). Research Designs and Methods: A Process Approach
(8th ed.). Mayfield Publications.

LINEAR REGRESSION

In linear regression, the variables must be linearly related so that a straight line can be
fitted to the data; that line (equation) is then used to predict how much effect variable X has
on variable Y.

References:

 Best, J.W. and Kahn, J.V (1998). Research in Education. (8th ed). Butler University.

Question: To test the effect of stress on work performance.

Variables:

Predictor Variable- Stress Scores

Criterion Variable- Work performance Scores

Hypothesis: Stress would positively and significantly affect work performance.


Enter data in Data and Variable view

Go to "Analyze"

Select "Regression"

Select "Linear"

Choose the Independent and Dependent Variable(s)

Click on Statistics

Select R Squared change, Confidence Intervals and Descriptives

Click on Continue

Click On Okay
DATA TABLES FOR ANALYSING REGRESSION:

Table 1: Descriptive Statistics

                          Mean      Std. Deviation   N
Work_Performance_Scores   75.8333   9.60274          30
Stress_Scores             37.2000   10.60384         30

Table 2: Model Summary

Model   R       R Square   F Change   df1   Sig. F Change   Std. Coefficient Beta
1       .208a   .043       1.265      1     .270            -.208

a. Predictors: (Constant), Stress_Score

Interpretation:

Table 1 shows that the mean and standard deviation of the work performance scores are 75.833
and 9.603, respectively, and that the mean and standard deviation of the stress scores are
37.20 and 10.604, respectively. Table 2 shows that R2 = .043, which indicates that stress
accounts for only 4.3% of the variance in work performance. The F ratio was 1.265 with p =
.270, indicating that the result is not statistically significant; a larger sample, or other
variables, may be needed to account for a higher percentage of work performance. In addition,
the correlation between the two variables was weak and not statistically significant
(r = -.208, p = .27 > .05). Other factors may be influencing work performance and need to be
investigated.
MULTIPLE REGRESSION

Multiple regression is a statistical technique used to analyze the relationship between a
dependent variable and two or more independent variables. It is an extension of simple linear
regression, which involves only one independent variable. In multiple regression, the
relationship between the dependent variable and each independent variable is examined while
controlling for the effects of the other independent variables.
Multiple regression is used in various fields, including finance, economics, social sciences, and
engineering. It is used to understand the factors that affect the dependent variable and to predict
its value based on the values of the independent variables. For example, a company may use
multiple regression to determine how factors such as advertising, price, and competition affect
sales of a product. By controlling for the effects of other variables, it can provide insights into
the factors that affect the dependent variable and can be used to predict its value based on the
values of the independent variables.

References:
Kutner, M. H., Nachtsheim, C. J., Neter, J., and Li, W. (2005). Applied Linear Statistical
Models (5th ed.). New York: McGraw-Hill.

Question: The following data shows age, reading age, standardized reading score, standardized
spelling score, and percentage of correct spelling. Determine if there is an effect of age,
reading age, standardized reading score, and standardized spelling score on the percentage of
correct spelling.

Hypothesis: There is no significant prediction of the percentage of correct spellings by age,
reading age, standard reading score, and standard spelling score.

Criterion Variable:
Percentage of correct spelling

Predictor Variables:
Age
Reading age
Standardized reading score
Standardized spelling score
PROCEDURE FOR MULTIPLE REGRESSION:

Enter data in Data and Variable view

Go to "Analyze"

Select "Regression"

Select "Linear"

Choose the Independent and Dependent Variable(s)

Click on Statistics

Select R square, descriptives, confidence intervals

Click on Continue

DATA TABLES FOR MULTIPLE REGRESSION:

Table 1: Descriptive Statistics

Mean Std. Deviation N

pr_correct_spelling 59.7660 23.93307 47

Age 93.4043 7.49104 47

Reading_age 89.0213 21.36483 47

Std_reading_score 95.5745 17.78341 47

Std_spelling_score 107.0851 14.98815 47


Table 2: Correlations (Pearson, upper triangle)

                              Percentage          Age     Reading   Standardized   Standardized
                              Correct Spelling            Age       Reading        Spelling Score
Percentage Correct Spelling   1.000               -.074   .623      .778           .847
Age                                               1.000   .124      -.344          -.416
Reading Age                                               1.000     .683           .570
Standardized Reading                                                1.000          .793
Standardized Spelling Score                                                        1.000

Table 3: Model Summary

        R      R       Adjusted   Std. Error of   R Square   F        df1   Sig. F
Model          Square  R Square   the Estimate    Change     Change         Change
1       .923a  .852    .838       9.63766         .852       60.417   4     .000

a. Predictors: (Constant), Standardised Spelling Score, Age, Reading Age, Standardised Reading

Interpretation:

The total number of valid observations is 47. The predictor variables were age (M=93.40,
SD=7.49), reading age (M=89.02, SD=21.36), standard reading score (M=95.57, SD=17.78), and
standard spelling score (M=107.09, SD=14.99). The criterion variable was the percentage of
correct spellings (M=59.77, SD=23.93). The correlation between age and the percentage of
correct spellings was negative and negligible (r=-.074). Reading age was positively and
substantially correlated with the percentage of correct spellings (r=.623). The standard
reading score (r=.778) and standard spelling score (r=.847) were positively and highly
correlated with the criterion variable. Overall, the model shows that the predictor variables
together explain 85.2% of the variance in the percentage of correct spellings (R2=.852). The F
value was 60.417, which is significant at the 0.01 level (p<.001).

Thus, the null hypothesis is rejected, and the alternative hypothesis is adopted, which states
that there is a significant prediction of the percentage of correct spellings by age, reading
age, standard reading score, and standard spelling score.
INDEPENDENT SAMPLE T-TEST

The t-test was first described by William Sealy Gosset in 1908, when he published his article
under the pseudonym 'Student' while working for a brewery (Drummond and Tom, 2011). A Student's
t-test, or independent sample t-test, is a ratio that quantifies how significant the difference
is between the means of two groups while taking their variance or distribution into account. It
allows testing the values of a statistic between two groups (Wadhwa and Marappa-Ganeshan,
2022). It is used in varied clinical settings, medical research, companies, and other contexts
to carry out research suited to the development of those fields.

References:

 Wadhwa RR, Marappa-Ganeshan R. T Test. [Updated 2022 Jan 19]. In: StatPearls
[Internet]. Treasure Island (FL): StatPearls Publishing; 2022 Jan-. Available from:
https://www.ncbi.nlm.nih.gov/books/NBK553048/?report=classic.

 Drummond GB, Tom BD. Statistics, probability, significance, likelihood: words mean
what we define them to mean. J Physiol. 2011 Aug 15;589(Pt 16):3901-4.

Question: To find out whether there exist significant differences between male and female
stress scores.

Hypothesis: There will be no significant differences between the stress scores of males and
females.

Variables:
Stress score- Dependent variable/Test variable
Gender- Independent variable/Grouping variable
PROCEDURE FOR INDEPENDENT SAMPLE T-TEST:

Enter data in Data and Variable view

Go to "Analyze"

Select "Compare Means"

Click on the "Independent Samples-T Test"

Click on "Define Groups" and Define your groups

Click on OK
DATA TABLE FOR INDEPENDENT SAMPLE T-TEST:

Independent Samples Test

                   Gender   N    Mean    Std. Deviation   t         df   Sig. (2-tailed)
Occ_Stress_Score   Male     30   35.73   7.852            -16.890   58   .000
                   Female   30   74.80   9.943

Interpretation:
The sample size (N) is 60, with 30 males and 30 females. The mean score for males is 35.73 with
an SD of 7.85, while the mean for females is 74.80 with an SD of 9.94. The t value is -16.89,
which is significant at the 0.01 level (p < .01). Hence, the null hypothesis is rejected and
the alternative hypothesis is accepted, which states that there is a significant difference
between males and females in occupational stress.

PAIRED SAMPLE T-TEST

The paired t-test is used in scenarios where the measurements from the two groups are linked to
one another. A paired two-sample t-test can be used to capture the dependence of measurements
between the two groups (Wadhwa and Marappa-Ganeshan, 2022). Both variations of the Student's
t-test use observed or collected data to calculate a test statistic, which can then be used to
calculate a p-value. Often misinterpreted, the p-value is equal to the probability of
collecting data at least as extreme as the observed data in the study, assuming that the null
hypothesis is true (Andrade, 2019). The paired sample t-test is mostly used in experiments that
include an intervention, to examine the pre- and post-test differences of a sample.

References:

 Andrade C. The P Value and Statistical Significance: Misunderstandings, Explanations,


Challenges, and Alternatives. Indian J Psychol Med. 2019 May-Jun;41(3):210-215.

 Wadhwa RR, Marappa-Ganeshan R. T Test. [Updated 2022 Jan 19]. In: StatPearls
[Internet]. Treasure Island (FL): StatPearls Publishing; 2022 Jan-. Available from:
https://www.ncbi.nlm.nih.gov/books/NBK553048/?report=classic.
Question: To find out whether there exists a significant difference between the pre and post
test scores.

Hypothesis: There will be no significant difference in the pre and post test scores

Variables:
Independent variable- Stress management technique
Dependent Variable- Difference in the Stress scores of pre and post tests
PROCEDURE FOR PAIRED SAMPLE T-TEST:

Enter data in Data and Variable view

Go to "Analyze"

Select "Compare Means"

Click on the "Paired Samples-T Test"

Insert the paired variables

Click on OK

DATA TABLE FOR PAIRED SAMPLE T-TEST:

Paired Samples Test

                   Mean    N    Std. Deviation   t       df   Sig. (2-tailed)
Pre-test Scores    35.17   30   13.631           1.934   29   .063
Post-test Scores   31.70   30   14.042

Interpretation:

The sample size N for the pre-test and post-test is 30 each. The mean score for the pre-test
turned out to be 35.17, while the mean for post-test scores is 31.70. The standard deviation for
pre-test scores is 13.63 whereas the standard deviation for post-test scores is 14.04. The t value
turns out to be 1.93 and is not significant (p=.063>0.05). It is observed that there is no
significant difference in the pre-test and post-test scores in stress levels. Hence the null
hypothesis is accepted.

Practice Question for Independent sample t-test

To demonstrate the use of an independent t-test, we are going to analyze the data of two
groups, one of which received mnemonic instructions. It was hypothesized that the group
receiving mnemonic instructions would remember more than the group that did not receive any
specific mnemonic instructions.

Hypothesis:

The group receiving mnemonic instructions would remember more than the group who did
not receive any specific mnemonic instructions.

Independent Samples Test

          Group    N     Mean     Std. Deviation     t        df     Sig. (2-tailed)
Scores    1        11    17.73    2.867              2.578    19     .018
          2        10    14.10    3.573

Interpretation:

The sample size (N) is 21 with 11 subjects in the mnemonic condition group (Group 1) and 10
subjects in the non-mnemonic group (Group 2). The mean score for group 1 turned out to be
17.73 with an SD of 2.867 while the mean for Group 2 is 14.10 with SD being 3.573. The t
value turns out to be 2.578 and is significant at 0.05 level. Hence, the alternative hypothesis is
accepted which states that the group receiving mnemonic instructions would remember more
than the group who did not receive any specific mnemonic instructions.
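The independent-samples t value in the table above rests on a pooled-variance formula, which can be checked outside SPSS. A minimal sketch in Python, using made-up recall scores (an illustrative assumption, not the tabled data):

```python
import math
import statistics

def independent_t(group1, group2):
    """Pooled-variance independent-samples t statistic."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Pooled variance weights each group's variance by its degrees of freedom
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical recall scores (illustrative only)
mnemonic    = [18, 20, 16, 19, 17]
no_mnemonic = [14, 15, 13, 16, 12]
t, df = independent_t(mnemonic, no_mnemonic)
```

Note that df = n1 + n2 - 2, which matches the df column of 19 for the 11 + 10 participants in the table above.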
Practice Question for Paired Sample t-test

To demonstrate the use of paired t- test, we are going to analyze the data from the mental
imagery experiment. It was hypothesized that, as participants would compare their mental
images of two animals to determine which was larger, their decision times for small size
difference trials would be longer than for the large size difference trials.

Hypothesis:

H0: There is no significant difference in decision times between the large-size-difference and
small-size-difference trials.

Ha (one-tailed): Decision times for small-size-difference trials will be longer than for
large-size-difference trials.

Paired Samples Test

                    Mean       N     Std. Deviation     t         df     Sig. (2-tailed)
Large_Size_Diff     1156.69    16    290.049            -4.459    15     .000
Small_Size_Diff     1462.44    16    500.496

Interpretation:

The sample size N for the Large Size Difference and Small Size Difference is 16 each. The
mean score for the Large Size Difference turned out to be 1156.69 with SD being 290.049,
while the mean for Small Size Difference is 1462.44 with an SD of 500.496. The t value turns
out to be -4.459 and is significant at 0.01 level. The null hypothesis is rejected and the
alternative hypothesis is adopted which states that decision times for small-size difference trials
would be longer than for the large-size difference trials.
MANN WHITNEY U TEST

The Mann–Whitney U test is a nonparametric test of centrality for ordinal data that contrasts
scores from two independent samples to assess whether there are significant differences between
the two sets of rankings. The statistic obtained from this test, U, is calculated by counting,
for each rank in one group, the number of ranks in the other group that are smaller, and
summing these counts. A Mann–Whitney U test is analogous to an independent-samples t-test,
except that the former is conducted with ranked data and the latter is conducted with continuous
data.
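The counting definition of U can be computed directly, which makes the statistic concrete. A minimal sketch in Python with made-up scores (an illustrative assumption; ties are credited as half a count, the usual convention):

```python
def mann_whitney_u(group_a, group_b):
    """U for group_a: for each score in group_a, count the scores in
    group_b that it exceeds (ties count as 0.5), then sum the counts."""
    u = 0.0
    for x in group_a:
        for y in group_b:
            if x > y:
                u += 1
            elif x == y:
                u += 0.5
    return u

# Hypothetical intelligence scores (illustrative only)
males   = [110, 95, 120, 102]
females = [98, 105, 99]
u_males = mann_whitney_u(males, females)
u_females = mann_whitney_u(females, males)
# The two U values always sum to n1 * n2; SPSS reports the smaller one
```

SPSS performs this computation (via ranks) and additionally supplies the significance value for the observed U.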

References:

Norris, G., Qureshi, F., Howitt, D., & Cramer, D. (2014). Introduction to Statistics with SPSS
for Social Science. Routledge.

Question:
The following data shows the intelligence scores of male and female students. Determine if
there is any difference between them.
Variables:
Independent variable: Gender
Dependent variable: Intelligence scores

Hypothesis:
There is no significant difference between intelligence scores of male and female students.

PROCEDURE FOR MANN WHITNEY U TEST

Enter data in Variable view and Data view

Go to Analyze

Select Nonparametric Tests

Go to Legacy Dialogs and select 2 Independent Samples...

Choose Scores as the test variable and Gender as the grouping variable

Define the groups

Select Mann-Whitney U test

Click OK
DATA TABLE FOR MANN-WHITNEY U-TEST

Mann-Whitney U Test

         Gender    N     Mean Rank    Mann-Whitney U    Asymp. Sig. (2-tailed)
Score    Male      20    21.65        137.00            .207
         Female    18    17.11

Interpretation:

The sample size (N) is 38, with 20 males and 18 females. The mean rank for males turns out
to be 21.65, while the mean rank for females is 17.11. The Mann-Whitney U value turns out to be
137.00 and the Asymptotic Significance (2-tailed) value is .207, which is not significant at the
0.05 level. Hence the null hypothesis is accepted, which states that there is no difference
between the intelligence scores of male and female students.

Practice Question 1 for Mann Whitney-U Test


Question:

To find out whether Gender has any influence on the intelligence scores of males and
females.

Hypothesis:

There is no significant difference between the scores of male and female students.

Variables:

Independent Variable- Gender


Dependent Variable- Intelligence

Data Tables for Mann Whitney U-Test

Descriptive Statistics

          N     Mean       Std. Deviation    Minimum    Maximum
Scores    26    28.6538    7.89401           15.00      50.00
Gender    26    1.5000     .50990            1.00       2.00

Mann-Whitney U Ranks

          Gender    N     Mean Rank    Sum of Ranks    Mann-Whitney U    Z       Asymp. Sig. (2-tailed)
Scores    Male      13    13.65        177.50          82.500            -.103   .918
          Female    13    13.35        173.50
          Total     26

Interpretation:

The descriptive statistics show that the overall mean score is 28.654, with the SD being 7.894.
In the Mann-Whitney U ranks table, the sample size (N) is 26, with 13 males and 13 females. The
mean rank for males turns out to be 13.65, while the mean rank for females is 13.35. The
Mann-Whitney U value turns out to be 82.500 and the Asymptotic Significance (2-tailed) value is
.918, which is not significant at the 0.05 level. Hence the null hypothesis is accepted, which
states that there is no difference between the intelligence scores of male and female students.

Practice Question-2 for Mann Whitney Test:

Question:
To see whether Gender has an effect on the rating of males and females on their physiques.
Hypothesis:
Men and Women will not differ in the importance they attach to physique.

Variables:
Independent Variable- Gender
Dependent Variable- Rating on Physique

Data Tables:

Descriptive Statistics
N Mean Std. Deviation Minimum Maximum
Rating_Physique 40 5.0250 1.68686 2.00 8.00
Gender 40 1.5000 .50637 1.00 2.00

Mann-Whitney U Ranks

                   Gender    N     Mean Rank    Sum of Ranks    Mann-Whitney U    Z        Asymp. Sig. (2-tailed)
Rating_Physique    Male      20    17.88        357.50          147.500           -1.441   .150
                   Female    20    23.12        462.50
                   Total     40

Interpretation:

The descriptive statistics show that the overall mean physique rating is 5.025, with the SD
being 1.687. In the Mann-Whitney U ranks table, the sample size (N) is 40, with 20 males and 20
females. The mean rank for males turns out to be 17.88, while the mean rank for females is
23.12. The Mann-Whitney U value turns out to be 147.500 and the Asymptotic Significance
(2-tailed) value is .150, which is not significant at the 0.05 level. Hence the null hypothesis
is accepted, which states that there is no difference between the physique ratings of males and
females.
WILCOXON SIGNED RANK TEST

The Wilcoxon signed rank test is a nonparametric procedure used to determine whether a single
sample is derived from a population in which the median equals a specified value. The data are
values obtained using a ratio scale; each value is subtracted from the hypothesized value of the
population median, and the difference scores are then ranked. (For paired data, the differences
between the paired observations are ranked instead.) The test takes into account the direction
of the differences and gives more weight to large differences than to small differences. The
symbol for the test statistic is T, and the procedure is also called the Wilcoxon T test.
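The T statistic can be computed by hand for paired data: rank the absolute differences (dropping zeros, averaging tied ranks), then sum the ranks of the positive and of the negative differences. A minimal sketch in Python with made-up scores (the language and data are illustrative assumptions):

```python
def wilcoxon_t(pre, post):
    """Wilcoxon signed-rank T: rank the absolute pre-post differences
    (zeros dropped, ties given average ranks), sum the ranks of the
    positive and negative differences, and take the smaller sum as T."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]   # drop zero diffs
    ordered = sorted(diffs, key=abs)
    # Assign average ranks to tied absolute differences
    ranks = []
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg = (i + 1 + j) / 2            # mean of ranks i+1 .. j
        ranks.extend([avg] * (j - i))
        i = j
    pos = sum(r for d, r in zip(ordered, ranks) if d > 0)
    neg = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(pos, neg), pos, neg

# Hypothetical happiness scores before/after an intervention (illustrative)
before = [30, 28, 35, 32, 31, 6 * 0 + 29]
after  = [33, 27, 35, 36, 30, 34]
t_stat, pos_sum, neg_sum = wilcoxon_t(before, after)
```

The positive and negative rank sums here correspond to the Positive Ranks and Negative Ranks rows of the SPSS output; SPSS converts them to the Z statistic and its significance value.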

References:

American Psychological Association, & VandenBos, G. R. (2015). APA Dictionary of Psychology
(2nd ed.). American Psychological Association.

Question:

The following data describes the Pre-test and Post-test scores of a group of students on a
Happiness Scale. Find if the intervention has an effect on the happiness of the group.

Variables:
Independent variable: Intervention
Dependent variable: happiness scores
Hypothesis:
There is no significant difference between pre and post-test happiness scores.

PROCEDURE FOR WILCOXON SIGNED RANK TEST

Enter data in Variable view and Data view

Go to Analyze

Select Nonparametric Tests

Go to Legacy Dialogs and select 2 Related Samples...

Choose the variables

Select Wilcoxon test

Keep the default selection of Asymptotic under Exact

Click OK
DATA TABLES FOR WILCOXON SIGNED RANK TEST

Descriptive Statistics

                                           N     Mean     Std. Deviation
Scores on happiness before intervention    15    32.53    6.685
Scores on happiness after intervention     15    28.00    8.220

Wilcoxon Signed Rank Test

                                          N      Mean Rank    Sum of Ranks    Z         Asymp. Sig. (2-tailed)
Post-intervention -    Negative Ranks     10a    7.70         77.00           -1.538    .124
pre-intervention       Positive Ranks     4b     7.00         28.00
scores                 Ties               1c
                       Total              15

Interpretation:

The sample size (N) is 15. The mean happiness score before the intervention is 32.53 (SD =
6.685), and the mean after the intervention is 28.00 (SD = 8.220). The negative ranks for the
subjects turned out to be 10, while the positive ranks are 4 and there is 1 tie, implying that
the intervention was effective for only 4 individuals. For the Z value of -1.538, the Asymptotic
Significance (2-tailed) value is .124, which is not significant at the 0.05 level. Hence the
null hypothesis is retained, which states that there is no difference between the pre-test and
post-test happiness scores.
Practice Question-1 for Wilcoxon Signed Ranks Test:

Question:
To find whether there is any difference in the E-fit ratings between the pre-test and post-test
conditions.

Hypothesis:
There exists no difference between the pre-test and post-test scores.

Variables:
Independent Variable- Intervention
Dependent Variable- E-Fit Scores
DATA TABLES

Descriptive Statistics

N Mean Std. Deviation Minimum Maximum


E_Fit_Rating_1 48 3.7083 1.09074 2.00 6.00

E_Fit_Rating_2 48 3.7500 1.31278 2.00 6.00

Wilcoxon Signed Rank Test

                                          N      Mean Rank    Sum of Ranks    Z        Asymp. Sig. (2-tailed)
Post-intervention -    Negative Ranks     19a    20.26        385.00          -0.72    .943
pre-intervention       Positive Ranks     20b    19.75        395.00
scores                 Ties               9c
                       Total              48

Interpretation:

The sample size (N) is 48. The mean E-fit rating before the intervention is 3.708 (SD = 1.091),
and the mean after the intervention is 3.750 (SD = 1.313). The negative ranks for the subjects
turned out to be 19, while the positive ranks are 20 and there are 9 ties, implying that the
intervention was effective for only 20 individuals. For the Z value of -0.72, the Asymptotic
Significance (2-tailed) value is .943, which is not significant at the 0.05 level. Hence the
null hypothesis is retained, which states that there is no difference between the pre-test and
post-test E-fit ratings.
Practice Question-2 for Wilcoxon Signed Ranks Test:

Question:
To find whether there is any difference in the Job Satisfaction Level scores for pre-test and
post-test conditions
Hypothesis:
There exists no difference between the pre and post test scores.
Variables:
Independent Variable- Intervention
Dependent Variable- Job satisfaction Scores
DATA TABLES

Descriptive Statistics

N Mean Std. Deviation Minimum Maximum


Pre_Test_Scores 15 63.7333 17.81038 34.00 88.00

Post_Test_Scores 15 66.0667 17.77425 30.00 93.00

Wilcoxon Signed Rank Test

                                          N     Mean Rank    Sum of Ranks    Z         Asymp. Sig. (2-tailed)
Post-intervention -    Negative Ranks     6a    7.17         43.00           -0.967    .334
pre-intervention       Positive Ranks     9b    8.56         77.00
scores                 Ties               0c
                       Total              15

Interpretation:

The sample size (N) is 15. The mean score before the intervention is 63.733 (SD = 17.810), and
the mean after the intervention is 66.067 (SD = 17.774). The negative ranks for the subjects
turned out to be 6, while the positive ranks are 9 and there are 0 ties, implying that the
intervention was effective for only 9 individuals. For the Z value of -0.967, the Asymptotic
Significance (2-tailed) value is .334, which is not significant at the 0.05 level. Hence the
null hypothesis is retained, which states that there is no difference between the pre-test and
post-test scores.
ONE-WAY ANOVA

We can determine whether or not the means of our experimental conditions differ using the
parametric test known as an ANOVA. This allows us to determine whether the dependent variable
has been affected by our experimental manipulation. The phrase "one-way" simply refers to its
use for analysing data from studies with a single IV. One-way ANOVA enables us to compare
participant performance across a number of groups or conditions, whereas t-tests can only test
for differences between two groups or conditions. One-way ANOVA, however, will not specifically
state which pairs of conditions are significantly different from one another (e.g., whether
condition 1 is significantly different from condition 2, whether condition 2 is significantly
different from condition 3, or whether condition 1 is significantly different from condition 3).
Instead, it will simply state whether the scores vary significantly across our conditions (i.e.,
whether our IV has had a significant effect on participants' scores). Additional statistical
techniques, known as planned and unplanned (post hoc) comparisons, are needed for these pairwise
comparisons.
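The F ratio that one-way ANOVA produces is the between-groups mean square divided by the within-groups mean square. A minimal sketch in Python with made-up scores for three conditions (the language, group sizes, and numbers are illustrative assumptions, not the manual's data):

```python
import statistics

def one_way_anova_f(groups):
    """F ratio for a one-way ANOVA: between-groups mean square divided
    by within-groups mean square."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    k = len(groups)
    n = len(all_scores)
    # Between-groups SS: each group mean's deviation from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Within-groups SS: each score's deviation from its own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical memory scores under three illustrative conditions
groups = [[66, 68, 64], [55, 57, 53], [57, 59, 58]]
f, df_b, df_w = one_way_anova_f(groups)
```

A large F means the group means vary much more than would be expected from the within-group scatter alone; SPSS supplies the corresponding significance value in the ANOVA table.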

References:

Harrison, V., Kemp, R., Brace, N., & Snelgar, R. (2020). SPSS for Psychologists.

Red Globe Press.


Question 1: The following data describes the conditions of greyblob, pixelation, negation and
unmasking applied to 4 groups of witnesses and the resulting memory recall under those
conditions. Determine whether there is an effect of these conditions on memory recall.

Null Hypothesis: There is no significant effect of masking condition on memory.

Variables:

Independent Variables– Masking conditions (Unmasked, Greyblob, Pixelated and Negated)

Dependent Variable– Memory


PROCEDURE FOR ONE-WAY ANOVA

Enter data in Data view and Variable view

Go to Analyze

Click on Compare Means

Select One-Way ANOVA

Select the independent variable as Factor and the dependent variable in the Dependent List

Go to Options

Under Statistics, select Descriptive statistics and Homogeneity of variance test

Click on Continue

Click on Post Hoc

Select Tukey and Games-Howell

Click on Continue and OK


DATA TABLES FOR ONE-WAY ANOVA

Test of Homogeneity of Variances


Levene’s Statistic df1 df2 Sig.

.490 3 36 .692

Descriptive Statistics

Presentation Condition Mean Std. Deviation N


Unmasked 66.7000 5.33437 10
Greyblob 55.7000 3.80205 10
Pixelated 57.7000 5.41705 10
Negated 67.2000 4.58984 10
Total 61.8250 7.00142 40

ANOVA

                  Sum of Squares    df    Mean Square    F         Sig.
Between Groups    1071.875          3     357.292        15.314    .000
Within Groups     839.900           36    23.331
Total             1911.775          39
Post Hoc Tests

Multiple Comparisons
Dependent Variable: Memory

                (I) Presentation Condition    (J) Presentation Condition    Mean Difference (I-J)    Sig.
Tukey HSD       Unmasked                      Greyblob                      11.0000*                 .000
                                              Pixelated                     9.0000*                  .001
                                              Negated                       -.5000                   .996
                Greyblob                      Unmasked                      -11.0000*                .000
                                              Pixelated                     -2.0000                  .791
                                              Negated                       -11.5000*                .000
                Pixelated                     Unmasked                      -9.0000*                 .001
                                              Greyblob                      2.0000                   .791
                                              Negated                       -9.5000*                 .001
                Negated                       Unmasked                      .5000                    .996
                                              Greyblob                      11.5000*                 .000
                                              Pixelated                     9.5000*                  .001
Games-Howell    Unmasked                      Greyblob                      11.0000*                 .000
                                              Pixelated                     9.0000*                  .007
                                              Negated                       -.5000                   .996
                Greyblob                      Unmasked                      -11.0000*                .000
                                              Pixelated                     -2.0000                  .776
                                              Negated                       -11.5000*                .000
                Pixelated                     Unmasked                      -9.0000*                 .007
                                              Greyblob                      2.0000                   .776
                                              Negated                       -9.5000*                 .003
                Negated                       Unmasked                      .5000                    .996
                                              Greyblob                      11.5000*                 .000
                                              Pixelated                     9.5000*                  .003

Based on observed means.
The error term is Mean Square(Error) = 23.331.
*. The mean difference is significant at the .05 level.

Interpretation:

The sample of the study N is 40. The mean memory score for unmasked condition (M= 66.70,
.SD= 5.334, n=10), greyblob (M= 55.70, SD=3.802, n=10), pixelated (M= 57.70, SD= 5.417,
n=10) and negated condition (M= 67.20, SD= 4.590, n=10) indicates that greyblob condition
yielded the lowest mean memory score. The Levene's test is not significant (p = 0.692), which
indicates that the variances of the four groups are not significantly different. Hence the
assumption of homogeneity of variance is met, and the parametric test can be used to study the
effect. The memory scores differ significantly between groups (F = 15.314, p = .000 < 0.05).
Hence, the alternative hypothesis is accepted, stating that there is a significant effect of
presentation condition (masking) on memory.

Tukey and Games-Howell post hoc tests were computed; these multiple comparisons show which
specific groups differ. The Tukey post hoc test shows that there is a significant difference
between the unmasked and greyblob groups (p=.000); the unmasked and pixelated groups (p=.001);
the greyblob and negated groups (p=.000); and the pixelated and negated groups (p=.001). No
significant difference was seen between the unmasked and negated groups (p=.996) or the greyblob
and pixelated groups (p=.791).

The Games-Howell post hoc test revealed the same results as the Tukey test for all pairings,
with negligible variations in the significance values.
Question 2: The following data describes the Psychological Wellbeing of adults. Find out
whether marital status of adults has any influence on their psychological wellbeing.

Null Hypothesis: There is no significant difference in the psychological wellbeing of adults
across different marital statuses.

Variables:

Independent Variable- marital status (married, unmarried, widowed and separated)

Dependent Variable- psychological wellbeing


DATA TABLES FOR ONE-WAY ANOVA

Descriptive Statistics
Dependent Variable: Psychological wellbeing
Marital status N Mean Std. Deviation
Married 20 9.55 1.638
Unmarried 20 17.35 1.755
Widowed 20 18.65 1.226
Separated 20 17.25 1.773
Total 80 15.70 3.947

ANOVA

Psychological wellbeing

Sum of Squares df Mean Square F Sig.

Between Groups 1033.000 3 344.333 132.302 .000

Within Groups 197.800 76 2.603

Total 1230.800 79

Levene's Test of Equality of Error Variances


Levene Statistic df1 df2 Sig.

Psychological wellbeing Based on Mean .580 3 76 .630


Post Hoc Test

Multiple Comparisons

Dependent Variable: psychological wellbeing

(I) marital status (J) marital status Mean Difference (I-J) Sig.

Tukey HSD Married Unmarried -7.800* .000

Widowed -9.100* .000

Separated -7.700* .000

Unmarried Married 7.800* .000

Widowed -1.300 .061

Separated .100 .997

Widowed Married 9.100* .000

Unmarried 1.300 .061

Separated 1.400* .037

Separated Married 7.700* .000

Unmarried -.100 .997

Widowed -1.400* .037

Games-Howell Married Unmarried -7.800* .000

Widowed -9.100* .000

Separated -7.700* .000

Unmarried Married 7.800* .000

Widowed -1.300* .048

Separated .100 .998

Widowed Married 9.100* .000

Unmarried 1.300* .048

Separated 1.400* .031

Separated Married 7.700* .000

Unmarried -.100 .998

Widowed -1.400* .031

*. The mean difference is significant at the 0.05 level.


Interpretation:

The sample of the study N is 80. The mean score of psychological wellbeing in married adults
(M=9.55, .SD= 1.638, n=20), unmarried adults (M= 17.35, SD=1.755, n=20), widowed adults
(M=18.65, SD=1.226, n=20) and separated adults (M=17.25, SD= 1.773, n=20) indicates that
married adults have the lowest mean score. The Levene's test is not significant (p = 0.630),
which indicates that the variances of the four groups are not significantly different. Hence the
assumption of homogeneity of variance is met, and the parametric test can be used to study the
effect. The psychological wellbeing scores differ significantly between groups (F = 132.302,
p = .000 < 0.05). Hence, the alternative hypothesis is accepted, stating that there is a
significant difference in the psychological wellbeing of adults across different marital
statuses.

A multiple comparison test was done to identify which specific groups differed; Tukey and
Games-Howell are generally the preferred post hoc tests for a one-way ANOVA. The Tukey post hoc
table shows that there is a statistically significant difference between the married and
unmarried groups (p=.000). The same was seen for the married and widowed (p=.000), married and
separated (p=.000), and widowed and separated groups (p=.037). There is no statistically
significant difference between the unmarried and widowed groups (p=.061) or between the
unmarried and separated groups (p=.997). The Games-Howell test yielded the same results, except
that the unmarried and widowed groups showed a significant difference (p=.048).
TWO-WAY ANOVA

A two-way ANOVA ("analysis of variance") is performed to evaluate whether or not there is a
statistically significant difference between the means of three or more independent groups that
have been divided on two variables (also known as "factors"). A two-way ANOVA is used to examine
the effects of two factors on a response variable and to ascertain whether there is an
interaction between the two factors in their effect on the response variable. The following
assumptions must be true in order for the findings of a two-way ANOVA to be considered valid:

1. Normality - For each group, the response variable is roughly normally distributed.

2. Equal Variances - The variances of the groups should be approximately equal.

3. Independence - Each group's observations are independent of one another, and a random
sample was used to collect the data within each group.
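For a balanced design, the sums of squares behind the Tests of Between-Subjects Effects table can be computed by hand: the total variability splits into the two main effects, the interaction, and error. The sketch below is a minimal Python illustration on a tiny made-up 2 x 2 dataset (both the language and the numbers are assumptions, not the manual's data):

```python
import statistics

def two_way_anova_ss(cells):
    """Sums of squares for a balanced two-way design.
    cells[i][j] holds the scores for level i of factor A and level j of
    factor B; every cell must have the same number of scores."""
    a_levels, b_levels = len(cells), len(cells[0])
    n_per_cell = len(cells[0][0])
    all_scores = [x for row in cells for cell in row for x in cell]
    grand = statistics.mean(all_scores)

    # Marginal means for each factor, and the individual cell means
    a_means = [statistics.mean([x for cell in row for x in cell]) for row in cells]
    b_means = [statistics.mean([x for row in cells for x in row[j]])
               for j in range(b_levels)]
    cell_means = [[statistics.mean(cell) for cell in row] for row in cells]

    ss_a = b_levels * n_per_cell * sum((m - grand) ** 2 for m in a_means)
    ss_b = a_levels * n_per_cell * sum((m - grand) ** 2 for m in b_means)
    # Interaction: cell deviations not explained by the two main effects
    ss_ab = n_per_cell * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a_levels) for j in range(b_levels))
    # Error: each score's deviation from its own cell mean
    ss_error = sum((x - cell_means[i][j]) ** 2
                   for i in range(a_levels) for j in range(b_levels)
                   for x in cells[i][j])
    return ss_a, ss_b, ss_ab, ss_error

# 2 x 2 illustrative design: factor A (rows) x factor B (columns)
cells = [[[10, 12], [14, 16]],
         [[11, 13], [19, 21]]]
ss_a, ss_b, ss_ab, ss_err = two_way_anova_ss(cells)
```

Each SS divided by its degrees of freedom gives a mean square, and each effect's mean square divided by the error mean square gives the F values SPSS reports.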

REFERENCES-

Z. (2020, June 8). How to Perform a Two-Way ANOVA in SPSS. Statology.
https://www.statology.org/two-way-anova-spss/
PROCEDURE FOR TWO-WAY ANOVA

Enter data in Data and Variable view

Go to "Analyze"

Select "General Linear Model"

Click on the "Univariate"

Insert your Fixed factors and Dependent Variable

Click on "Post Hoc"

Click on "Tukey" and "Bonferroni"

Click on Continue

Click on "Options"

Choose Descriptive Statistics, Homogeneity tests, and Estimates of Effect Size

Click on Continue

Click "OK"
Question: The following data describes the court sentence scores together with the
attractiveness condition and the gender of the prisoner awarded each sentence. Find out whether
attractiveness of prisoners has an influence on the sentence received, whether gender has an
influence on the sentence received, and whether attractiveness and gender interact in
influencing the sentence.

Hypotheses:

There is a significant influence of Gender on Sentence among prisoners.

There is a significant influence of Attractiveness on Sentence among prisoners.

There is a significant interaction effect of Gender and Attractiveness on Sentence among
prisoners.

Variables:

Independent Variable-

Attractiveness (Attractive, Unattractive, No Picture)


Gender (Male, Female)

Dependent Variable-

Sentences given to the prisoners.

DATA TABLE FOR TWO-WAY ANOVA

Descriptive Statistics
N Mean Std. Deviation Skewness Kurtosis
Statistic Statistic Statistic Statistic Statistic

Sentences 60 10.7500 3.28646 -.174 -.947

Levene's Test of Equality of Error Variances


Levene Statistic df1 df2 Sig.
Sentences Based on Mean 1.509 5 54 .202

Tests of Between-Subjects Effects

Dependent Variable: Sentences

Source                     Type III Sum of Squares    df    Mean Square    F         Sig.
Gender                     6.017                      1     6.017          1.579     .214
Attractiveness             422.500                    2     211.250        55.457    .000
Gender * Attractiveness    3.033                      2     1.517          .398      .674
Multiple Comparisons

Dependent Variable: Sentences

              (I) Attractiveness    (J) Attractiveness    Mean Difference (I-J)    Sig.
Tukey HSD     Attractive            Unattractive          -3.2500*                 .000
                                    No Picture            -6.5000*                 .000
              Unattractive          Attractive            3.2500*                  .000
                                    No Picture            -3.2500*                 .000
              No Picture            Attractive            6.5000*                  .000
                                    Unattractive          3.2500*                  .000
Bonferroni    Attractive            Unattractive          -3.2500*                 .000
                                    No Picture            -6.5000*                 .000
              Unattractive          Attractive            3.2500*                  .000
                                    No Picture            -3.2500*                 .000
              No Picture            Attractive            6.5000*                  .000
                                    Unattractive          3.2500*                  .000

Based on observed means.
The error term is Mean Square (Error) = 3.809.
*. The mean difference is significant at the .05 level.

Interpretation:

For the sentences given to the 60 prisoners, the mean was found to be 10.75 with SD 3.29. The
skewness and kurtosis values are -0.174 and -0.947, which lie within ±1.96, indicating that the
data are not markedly skewed and approximate a normal distribution. Hence, a parametric test can
be performed. The Levene's statistic for sentences was found to be 1.509 with a significance
value of 0.202, which is not significant at the 0.05 level (p = 0.20 > 0.05). This indicates
that the error variances are approximately equal across groups, satisfying the
homogeneity-of-variance assumption.

Here, a two-way ANOVA was performed since the hypotheses call for testing an interaction effect
(between attractiveness and gender) as well as the effects of those variables on the dependent
variable (sentences). In the Tests of Between-Subjects Effects table, Gender was not seen to be
significantly affecting the sentences (F = 1.579, p = 0.214 > 0.05). Attractiveness was found to
be significantly affecting the sentences (F = 55.45, p = 0.00 < 0.01). For the interaction
effect between attractiveness and gender, the significance value was found to be 0.674, which is
not significant (F = 0.398, p = 0.67 > 0.05). Hence, attractiveness has an effect on sentences
but does not interact with gender, whereas gender does not affect sentences.

Therefore, the first hypothesis, stating that there is a significant influence of Gender on
Sentence among prisoners, is not supported. The second hypothesis, stating that there is a
significant influence of Attractiveness on Sentence among prisoners, is accepted. And the third
hypothesis, stating that there is a significant interaction effect of Gender and Attractiveness
on Sentence among prisoners, is also not supported.
SPEARMAN RANK-ORDER METHOD

The Spearman rank-order correlation is the nonparametric equivalent of the Pearson
product-moment correlation. Spearman's correlation coefficient, denoted rs (or ρ), measures the
strength and direction of the relationship between two ranked variables. It requires two
variables that are ordinal, interval, or ratio. Although one would typically apply a Pearson
product-moment correlation to interval or ratio data, the Spearman correlation can be employed
when the Pearson correlation's assumptions are markedly violated. However, unlike Pearson's
correlation, which identifies the strength and direction of the linear relationship between two
variables, Spearman's correlation determines the strength and direction of the monotonic
relationship between the two variables. The coefficient rs can take values from +1 to -1: an rs
of +1 indicates a perfect positive association of ranks, an rs of zero indicates no association
between ranks, and an rs of -1 indicates a perfect negative association of ranks. The closer rs
is to zero, the weaker the association between the ranks.
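Spearman's rs is simply the Pearson correlation computed on the ranks of the two variables, which is easy to verify by hand. The sketch below is a minimal Python illustration with made-up scores (the language and data are assumptions); because the made-up relationship is perfectly monotonic, rs comes out as exactly 1 even though the relationship need not be linear:

```python
import math

def rank(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    srt = sorted(values)
    return [(srt.index(v) + 1 + srt.index(v) + srt.count(v)) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman's rs is the Pearson correlation of the two sets of ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical locus-of-control and job-satisfaction scores (illustrative)
loc = [20, 35, 15, 40, 28]
js  = [60, 75, 55, 90, 70]   # rank order matches loc exactly
rho = spearman_rho(loc, js)
```

SPSS performs the same ranking internally and additionally reports the two-tailed significance of the obtained rs.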

References:
Spearman's Rank-Order Correlation - A guide to when to use it, what it does and what the
assumptions are. (n.d.). Laerd Statistics. https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide.php
PROCEDURE FOR SPEARMAN RANK-ORDER METHOD

Enter data in Data view and Variable view

Go to Analyze

Click on Descriptive Statistics

Click on Descriptives

Select the variables

Go to Options

Select Mean, SD, Kurtosis and Skewness

Click on Continue

Click on OK

Go to Analyze

Click on Correlate

Click on Bivariate

Select the variables

Unselect Pearson

Select Spearman

Click on OK
Question: Following data describes the Locus of Control and Job Satisfaction scores of 11
individuals. Find out whether there is a relationship between these two.

Hypothesis- There is no significant relationship between locus of control and job satisfaction.

Variables:
Locus of Control
Job satisfaction
DATA TABLE FOR SPEARMAN RANK-ORDER METHOD

Descriptive Statistics

N Mean Std. Deviation Skewness Kurtosis


Statistic Statistic Statistic Statistic Statistic
Locus Of Control 11 28.7273 10.64980 .063 -1.322

Job Satisfaction 11 70.8182 13.92708 -.740 -.866

Correlations

                                                             Locus Of Control    Job Satisfaction
Spearman's rho    Locus Of Control    Correlation Coefficient    1.000               .808**
                                      Sig. (2-tailed)            .                   .003
                                      N                          11                  11
                  Job Satisfaction    Correlation Coefficient    .808**              1.000
                                      Sig. (2-tailed)            .003                .
                                      N                          11                  11

**. Correlation is significant at the 0.01 level (2-tailed).

Interpretation:
The data above show the Locus of Control and Job Satisfaction scores of the participants. For
Locus of Control (N = 11, Mean = 28.73, SD = 10.65) the values of skewness and kurtosis were
found to be .063 and -1.32 respectively, and for Job Satisfaction (N = 11, Mean = 70.82, SD =
13.93) they were -0.74 and -0.87 respectively. These values lie between ±1.96, indicating that
the data are normally distributed. However, since the sample size is less than 30, a
non-parametric test was run. For this purpose, the Spearman rank-order method was used. In the
correlations table, for 11 observations the correlation coefficient was found to be .808, which
is close to +1, indicating a strong positive relationship between Job Satisfaction and Locus of
Control. The significance value was found to be .003, which is significant at the 0.01 level
(p = 0.003 < 0.01). Hence, the null hypothesis, which states that there is no significant
relationship between locus of control and job satisfaction, is rejected.
SCATTER PLOT

Scatterplots display the relationship between a pair of continuous variables. These graphs plot
a symbol for each case at the X and Y coordinates given by its paired values. Other names for
scatterplots include scattergrams and scatter charts. The dot pattern on a scatterplot is used
to determine whether there is a relationship or correlation between two continuous variables. If
a relationship is present, the scatterplot shows its direction as well as whether it is straight
or curved. A special kind of scatterplot called a fitted line plot shows the data points
together with a fitted line for a simple regression model; this graph helps you assess how well
the model fits the data. Use scatterplots to evaluate the following characteristics of your
dataset:

Analyze the connection between the two variables.

Look out for anomalies and strange observations.

Plot a time series with irregular, time-dependent data.

Analyze the regression model's fit.

Stronger relationships produce a tighter clustering of data points, correlation coefficients
closer to -1 or +1, and regression models with higher R-squared values.
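The fitted line drawn through a scatterplot comes from ordinary least squares: the slope (b1) and intercept (the constant in the SPSS parameter estimates) are computed from the paired scores. A minimal sketch in Python with made-up data (the language and the numbers are illustrative assumptions; the y values are constructed to lie exactly on a line so the fit is exact):

```python
def least_squares(x, y):
    """Slope (b1) and intercept (constant) of the ordinary least-squares
    fitted line y = constant + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope: covariance of x and y divided by the variance of x
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    constant = my - b1 * mx        # the line passes through (mean x, mean y)
    return constant, b1

# Hypothetical paired scores (illustrative only)
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]               # exactly y = 1 + 2x
constant, b1 = least_squares(x, y)
```

These two numbers are what the Constant and b1 columns of the Model Summary and Parameter Estimates table report for the linear model.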

References:
Frost, J. (2023, January 13). Scatterplots: Using, Examples, and Interpreting. Statistics by
Jim. https://statisticsbyjim.com/graphs/scatterplots/
Question: The following data describes the scores of Locus of Control and Job Satisfaction. Find
out whether there is a relationship between the two.

Hypothesis: Locus of Control is a negative predictor of Job Satisfaction.


Variables:
Dependent Variable- Job Satisfaction
Independent Variable- Locus of Control
PROCEDURE FOR SCATTER PLOT

Enter data in Data and Variable view

Go to "Analyze"

Select "Regression"

Select "Curve Estimation"

Select the Independent and Dependent variables

Select "Linear" in Models

Click on "OK"

Enter data in data view and variable view

Go to Analyze

Click on 'Descriptives Statistics'

Click on 'Descriptives'

Select the variables

Go to options

Select mean, SD, Kurtosis and Skewness

Click on 'Continue'

Click on OK
DATA TABLE FOR SCATTER PLOT

Descriptive Statistics

                     N     Mean      Std. Deviation   Skewness   Kurtosis
Locus of Control     11    28.7273   10.64980          .063      -1.322
Job Satisfaction     11    70.8182   13.92708         -.740       -.866
Valid N (listwise)   11

Model Summary and Parameter Estimates

           ------- Model Summary -------   Parameter Estimates
Equation   N     R Square   F       Sig.   Constant   b1
Linear     11    .485       8.460   .017   44.668     .910


SCATTER-PLOT GRAPH

[Scatterplot of Job Satisfaction against Locus of Control, with the fitted linear regression line]

Interpretation:

The data above show the Locus of Control and Job Satisfaction scores of the participants. For
Locus of Control (N = 11, Mean = 28.73, SD = 10.65) the values of skewness and kurtosis were
found to be 0.063 and -1.32 respectively, and for Job Satisfaction (N = 11, Mean = 70.82,
SD = 13.93) they were -0.74 and -0.87 respectively. These values lie within ±1.96, indicating
that the data are approximately normally distributed, so a parametric statistical test can be
performed, though with caution given the small sample (N = 11). Accordingly, a simple linear
regression was run using the Curve Estimation procedure.

In the Model Summary table, for 11 observations the R square is .485, which indicates that the
independent variable accounts for about 48.5% of the variance in the dependent variable. The
F value is 8.460, with a significance value of .017, which is significant at the 0.05 level, i.e.,
p < .05. In the Parameter Estimates, the Constant is the intercept and b1 is the slope of the
fitted line in the scatterplot. Here the Constant (the point where the line meets the Y-axis) was
found to be 44.67, meaning that when Locus of Control is zero the predicted Job Satisfaction
is 44.67. Similarly, the b1 value was found to be .910, indicating that each one-point increase
in Locus of Control predicts a .910-point increase in Job Satisfaction; accordingly, a 10-point
increase in LOC predicts a 9.10-point increase in Job Satisfaction. In short, the Constant gives
the baseline Job Satisfaction score when the predictor (LOC) is zero, and b1 gives the predicted
increase in the criterion (Job Satisfaction) for each unit increase in the predictor.
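The fitted equation can be applied directly as arithmetic: predicted Job Satisfaction = Constant + b1 × Locus of Control. The sketch below plugs in the values from the Parameter Estimates table; `predict_job_satisfaction` is an illustrative helper name, not SPSS output.

```python
# Applying the fitted regression equation from the Parameter Estimates table:
# predicted Job Satisfaction = 44.668 + 0.910 * Locus of Control
CONSTANT = 44.668   # intercept reported by SPSS
B1 = 0.910          # slope reported by SPSS

def predict_job_satisfaction(loc_score):
    """Predicted Job Satisfaction for a given Locus of Control score."""
    return CONSTANT + B1 * loc_score

print(round(predict_job_satisfaction(0), 2))        # prints 44.67 (baseline, LOC = 0)
print(round(predict_job_satisfaction(10)
            - predict_job_satisfaction(0), 2))      # prints 9.1 (gain per 10 LOC points)
print(round(predict_job_satisfaction(28.7273), 2))  # prints 70.81 (at the mean LOC)
```

Note that at the mean Locus of Control score (28.73) the prediction is 70.81, essentially the mean Job Satisfaction score (70.82), as expected: a least-squares line always passes through the point of means.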

In the scatter plot, Locus of Control is plotted against Job Satisfaction with a fitted line drawn
through the points. The line slopes upward from left to right, showing a positive correlation.
The observed points cluster fairly closely around the upper portion of the line, indicating that
participants with higher Locus of Control scores also tend to report higher job satisfaction.

The p-value was found to be significant at the 0.05 level, and the scatterplot also shows a
positive relationship between the two variables. The stated hypothesis, that Locus of Control
is a negative predictor of Job Satisfaction, is therefore not supported; instead, Locus of Control
emerged as a significant positive predictor of Job Satisfaction.
