
SPSS

FOR BEGINNERS
IN 45 MINUTES
A 2021 Quick Reference Guide to Research Methods, Data Analysis
and Interpretation of Statistical Data

Bill Wesley
Copyright
Copyright©2021 Bill Wesley
All rights reserved. No part of this book may be reproduced or used in any manner without the
prior written permission of the copyright owner, except for the use of brief quotations in a book
review.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal
responsibility for any errors or omissions that may be made. The publisher makes no warranty,
express or implied, with respect to the material contained herein.
Printed on acid-free paper.

Printed in the United States of America


© 2021 by Bill Wesley
Table of Contents
Copyright
Introduction to SPSS
History of SPSS
How did the Statistical Package for the Social Sciences come about?
Main Features of SPSS
Benefits of Using SPSS
How to install the SPSS software?
Advantages of SPSS over other Tools
CHAPTER ONE
Versions of the SPSS
Editions of the SPSS
SPSS MENU
The File Menu
The Edit Menu
The View Menu
The Data Menu
The Transform Menu
The Analyze Menu
Direct Marketing Menu
The Graphs Menu
Other Menu
CHAPTER TWO
Getting familiar with the SPSS Data Editor Environment
How to Open the SPSS Software
Tabs in the SPSS Editor
How to Use the File Menu
CHAPTER THREE
Analyzing Statistical Data
Introduction to Descriptive Analysis
Frequencies
Descriptive
Explore
Graphs
CHAPTER FOUR
REGRESSION AND CORRELATION
Regression and Linear Correlation
Validation and diagnosis of the model
Normality
Histogram
Normal Probability Plot
Normality test: Kolmogorov-Smirnov test
Homoscedasticity
Independence of residuals: Durbin-Watson test
Quadratic Regression and Correlation
CHAPTER FIVE
PROBABILITY DISTRIBUTIONS: Binomial, Poisson and Normal
How to Carry out Probability Distribution in SPSS
Probability Mass Function
Distribution Function
How to Calculate Quantiles
Generate random values from a given distribution
Practical Questions and Answers
CHAPTER 6
Confidence Intervals
Practical Question
Solution Procedure
Confidence Intervals for the difference of means in independent samples
Practical Question
Solution Procedure
Confidence Intervals for the difference of means in related samples
CHAPTER SEVEN
Contrasts of Hypothesis
Basic Concept of Hypothesis
Types of contrasts
Types of contrast hypotheses
The Decision Rules
Type I and II errors
Parametric hypothesis testing
Hypothesis Testing for a Sample
Practical Question
Solution Procedure
Hypothesis Testing for Two Independent Samples
Practical Question 2
Solution Procedure
Hypothesis testing for paired samples
Practical Question
Solution Procedure
The Chi-square Test Procedure
Practical Question
Solution Procedure
CHAPTER EIGHT
Statistical Design of Experiments
Completely Randomized Design
How to carry out Contrast
Randomized Complete Block Design
How to Import a Data Source from Excel to SPSS
How to Import Text to SPSS
About the Author

Introduction to SPSS
IBM SPSS is one of the best statistical software packages in the world. It offers a wide variety of statistical procedures, and it is very easy to learn because it is user-friendly and free of much of the complexity found in other statistical software. SPSS uses advanced statistical procedures to ensure high accuracy and support quality decision-making. It can also be used across all facets of the analytics lifecycle, such as data preparation, data management, data analysis, and reporting.

History of SPSS
SPSS is computer software owned and managed by IBM Inc. for the analysis of data. It is important to note that IBM did not initially create the program; it was created by three social scientists, Norman H. Nie, C. Hadlai Hull, and Dale H. Bent, in 1968. The three creators were recent graduates when they developed the software to analyze large amounts of data obtained from different research methods quickly and efficiently. At first, the program was used at the University of Chicago, after which it gained popularity and other American universities adopted it. The rights to the software were acquired by IBM in 2009, and it has since carried the brand name IBM SPSS Statistics.

How did the Statistical Package for the Social Sciences come about?
Nie started working on the SPSS project at the University of Chicago. He invited Hull to participate as director of the project to develop other components of the software, and Dale H. Bent was also involved in the development of the program. These events occurred between 1969 and 1975. The three social scientists continued to improve the software until SPSS Inc. was formally incorporated in 1975. In 2009, SPSS was sold to IBM Inc., later adopting the name IBM SPSS Statistics. The program has since undergone further improvements, and several versions have been released. In 2014, the program was described as "the world's leading statistical software for companies, government, research organizations, and academics" and as "an easy-to-use set of predictive analytics tools and data for business users, analysts, and statistical programmers."

Main Features of SPSS


This software includes four major programs that assist researchers in their investigations:
statistics program, modeler program, text analysis program for surveys, and visualization
designer. The statistics program contains a large number of essential statistical functions.
Likewise, some of its operations include frequencies, cross-tabulation, and bivariate statistics,
among other aspects.
In this sense, several statistical methods can be performed in SPSS, such as descriptive statistics.
There are also bivariate statistics, including analysis of variance, means, correlation, and non-parametric tests. Likewise, procedures such as linear regression can predict numerical outcomes, while cluster and factor analysis can be used to identify groups.
The modeling program has the main objective of constructing and validating production models
using advanced statistical procedures. With the program text analysis surveys, surveyors or
administrators can discover valuable information from the user responses to questions. In this
way, it provides feedback analysis, allowing these managers to get real insight.
Visualization Designer allows researchers to use data to create a wide variety of visual elements, such as density charts and radial box plots, with ease. In addition to these four programs, SPSS
also offers the possibility of documenting data; that is, it allows researchers to store a dictionary
of metadata. This dictionary acts as a centralized repository of information related to the data,
such as its meaning, origin, use, and format. Added to this is the option of managing data, such
as selecting cases, creating derived data, and reshaping files.

Benefits of Using SPSS


Although it seems like very complex software, you should know that SPSS is very easy to learn, use, and apply. You have two types of views: Variable View and Data View. The first one, Variable View, allows you to customize the data type for a thorough analysis. To do this, you must fill in the different column headers, which are the attributes that help characterize the data. These are Name, Label, Type, Width, Decimals, Values, Missing, Columns, Align, and Measure.
On the other hand, the data view is structured in rows and columns, and it is where you can
import data from a file or add it manually. Another feature of SPSS is that it helps us have the
data management system and editing tools at hand and be able to design, plot, report, and present
functions for greater clarity. All this allows us to analyze the exact result of the data through
detailed statistics.
The data is stored in SPSS's native SAV format and can easily be loaded back into SPSS for detailed and targeted analysis. This makes manipulating, analyzing, and extracting data very simple since part of this
process is automated. The automated process gives researchers more time to do what they like:
identify trends, develop predictive models, and draw informed conclusions.

How to install the SPSS software?


When installing SPSS, you should remember that it costs money, although it has a free trial
version. The first thing you should do is verify the minimum hardware and software requirements that your system must meet to install this program, which are the following:
● Operating system:
Microsoft Windows XP (32-bit version), Windows Vista (32-bit and 64-bit versions), or
Windows 7 (32-bit and 64-bit versions) or higher. It can also be installed on Mac.
● Configuration:
It requires the following system configurations for it to perform optimally;
i. Intel or AMD processor at 1 gigahertz (GHz) or higher.
ii. 1 gigabyte (GB) of RAM or more.
iii. 800 MB of available disk space.
iv. If you install more than one help language, each language requires 60/70 MB of free disk
space.
v. Monitor with Super VGA resolution (800x600) or higher.

Advantages of SPSS over other Tools


SPSS is recognized internationally. One of the software characteristics that make it more popular
and highly valuable is its ability to work with very large databases. This program is capable of
handling databases with more than 30,000 variables. It has a data processing capacity only
limited by the storage capacity of the computer's disk. Also, it does not round-off or carry out
approximation; rather, it provides more exact calculations or results.
Thanks to this speed and ability to analyze these large amounts of data, the program saves
considerable time and effort, making it very useful for researchers and allowing more agile
decision-making.
In this way, SPSS does not focus only on the analysis of the data and the cross-tabulation of variables, but also supports decisions about the process and the interpretation of the results, allowing more critical analyses to be carried out. In addition, this tool is the most widely used in the research fields of the social sciences because it supports the transfer of data to and from other programs.
CHAPTER ONE
Versions of the SPSS
SPSS is a software product widely used all over the world. This software was first introduced in
1968, and it was formerly known as Statistical Package for the Social Sciences (SPSS). In 2009,
IBM acquired the software and branded it as IBM Statistical Product and Service Solutions. Here
are some of the versions released since the software was introduced;
SPSS Version One: This version was released in 1968. It was the first release of the software, and it was mainly used to carry out statistical analysis in sociology.
SPSS Version Two: This version was released in 1983, and it has several improvements from
the former version.
SPSS version five: This version of SPSS was released in 1993, and it came preloaded with
several functions to make statistical analysis easy. It is important to note that there were no
versions three and four of the SPSS software.
SPSS version 6.1: This version of SPSS was released in the year 1995. It was a huge improvement over the previous releases.
SPSS version 7.5: This version of SPSS was released in the year 1997
SPSS version 8: This version of SPSS was released in the year 1998
SPSS version 9: This version of SPSS was released in 1999
SPSS version 10: This version of SPSS was released in 1999. Version 9 and 10 were the first
two versions of SPSS that were released the same year.
SPSS version 11: This version of SPSS was released in the year 2002
SPSS version 12: This version of SPSS was released in the year 2004
SPSS version 13: This version of SPSS was released in the year 2005
SPSS version 14: This version of SPSS was released in the year 2006
SPSS version 15: This version of SPSS was released in the year 2006
SPSS version 16: This version of SPSS was released in the year 2007
SPSS version 17: This version of SPSS was released in the year 2008
PASW version 17: This version came as a result of copyright issues. It was released in the year
2009
PASW version 18: This version was released in the year 2009. Even when the issue of copyright
was still on, the company still released updated software.
SPSS version 19: After the issue of copyright had been resolved, a newer version was released
in the year 2010
SPSS version 20: This version was released in the year 2011
SPSS version 21: This version was released in the year 2012
SPSS version 22: This version was released in the year 2013
SPSS version 23: This version was released in the year 2015
SPSS version 24: This version was released in the year 2016
SPSS version 25: This version was released in the year 2017. It was the first version of the software to receive a significant overhaul.
SPSS version 26: This version of SPSS was released in the year 2018
SPSS version 27: This version of SPSS was released in the year 2019
SPSS version 28: This version of SPSS was released in 2021. At the time of writing, it is the latest version of SPSS, and it comes with several new pre-installed features.

Editions of the SPSS


There are four types of editions of the SPSS software, and they are;
● The base edition
● The standard edition
● The professional edition
● The premium edition
These editions differ in price as well as in the functions included in each edition. For example, the premium edition is the most complete and has all the functions, unlike the base edition.
Base Edition has the following functionalities: Basic statistics, linear regression, clustering, and
factor analysis.
The Standard edition has the following functionalities: logistic regression, generalized linear models, survival analysis, and drag-and-drop tables. It also has the functionalities of the base edition, such as linear regression, clustering, and factor analysis.
The Professional edition has the following functionalities: data preparation, forecasting, decision trees, and imputation. It also has the standard and base edition functionalities, such as logistic regression, generalized linear models, survival analysis, drag-and-drop tables, linear regression, clustering, and factor analysis.
Premium edition of SPSS has the following functionalities: Bootstrapping, complex sampling,
exact tests, and SEM (Structural equation modeling). It also has all the other functionalities of
the rest editions.

SPSS MENU
There are several menu options in the software. These menus are located at the top of the screen so they are easy to find. Here are the menus and what they contain;

The File Menu


The file menu is located at the far left of the screen, and this menu is used mainly for the following;
● Open existing files
● Create New files
● Print files
● Save anything that you are working on.
This menu also contains details of recently used data that you are working on. Also, you can use
the file menu to rename a file you have already saved. The shortcut to the file menu is by
pressing the Alt button on your keyboard as well as the F button (Alt + F).

The Edit Menu


Immediately after the file menu is the edit menu. This menu houses several functions in the
SPSS software. The edit menu is used mainly for the following;
● Undo changes
● Redo changes
● Copy data/spreadsheet
● Paste data/spreadsheet
● Find data
● Go to Case
● Replace
● Clear
One of the most important features is the Go to Case, as it allows you to locate a particular data
score or participant when dealing with a large set of data. The shortcut for opening the edit menu
is by pressing the Alt button and the E button (Alt + E)

The View Menu


The View menu comes right after the Edit menu. This menu deals with the visual aspects of the spreadsheet. It has the following functions;
● Status bar
● Tools bar
● Fonts
● Gridlines
● Value Labels
● Variables
As you can observe, most of these functions allow you to either view or enter details of the
different variables you have used. The shortcut to use the view menu is to press the Alt button on
your keyboard and the V button (Alt + V).
The Data Menu
The data menu is one of the most exciting menus as it performs the function of organizing your
data. With this menu, you can identify potential mistakes you have made in your files. This menu
has the following features;
● Identify Duplicate Cases
● Sort Cases
● Sort Variables.
● Transpose
● Merge Files
● Split File
● Copy Dataset
The shortcut to the data menu is by pressing the Alt button of your keyboard and the D button
(Alt + D).

The Transform Menu


With this menu, you can manipulate the variables in your data sets. It has the following
functions;
● Compute variable
● Recode into Same Variables
● Recode into Different Variables
Compute Variable allows you to create a new variable from one or more already existing variables, while the Recode options allow you to change the values of specific variables, as the sketch below illustrates. The shortcut for the transform menu is to press the Alt key together with the T key (Alt + T).
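For readers who prefer the syntax window, here is a minimal sketch of the kind of commands these dialogs generate behind the scenes; the variable names age, age_months, and age_group are placeholders, and the exact syntax SPSS pastes depends on your dialog choices.

* Transform > Compute Variable: build a new variable from an existing one.
COMPUTE age_months = age * 12.
* Transform > Recode into Different Variables: map ranges of age onto codes.
RECODE age (LOWEST THRU 17=1) (18 THRU 64=2) (65 THRU HIGHEST=3) INTO age_group.
EXECUTE.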

The Analyze Menu


This menu is found immediately after the transform menu, and it constitutes analytical tools to
analyze and compare the data in your files. This menu contains the following features;
● Descriptive Statistics
● Compare Means
● General Linear Model
● Correlation and Regression.
The shortcut to using this menu is pressing the Alt button and the A button on the keyboard (Alt
+ A).

Direct Marketing Menu

This menu is meant for large corporations that carry out market research. It consists of several features that help you analyze your customers or contacts and improve your results. With this
menu, you can do the following;
● Identify the right contacts and improve campaign ROI
● Easily uncover customer groups
● Get an all-in-one marketing analysis
● Connect to Salesforce.com for further insight
● Access to a range of features

The Graphs Menu


The graph menu is used to present data in graphical format. This is to ensure that viewers
understand the presented data better. In this menu, graphs can be represented in several ways,
such as;
● The Chart Builder
● The Legacy Dialogs
The Chart Builder allows you to build charts by dragging and dropping variables onto a virtual canvas, which gives greater flexibility in graphing. The Legacy Dialogs allow you to create basic, simple graphs in an easy-to-use manner.

Other Menu
Other menus that can be found in the SPSS menu bar include;
● Utilities Menu
● Add Ons menu
● Window Menu
● Help Menu
CHAPTER TWO
Getting familiar with the SPSS Data Editor
Environment
SPSS is a unique software that is used for statistical calculations and analysis. This software has
become essential in different fields, especially the Social Sciences, due to its multiple uses. It can
be used for statistical calculations, descriptive and inferential analysis, graphs, correlations, and
time series. Research companies also make use of SPSS to analyze their data efficiently. All
professionals from different areas who need to apply statistical analysis and data analysis make
use of SPSS.
In addition, university students with prior knowledge of statistics use SPSS to complete their assignments or end-of-degree and end-of-master's projects. Therefore, the use of SPSS
software facilitates the collection and organization of data, makes it possible to know if the work
hypotheses have been fulfilled, facilitates decision-making, and allows the best strategy to be
adopted.

How to Open the SPSS Software


Opening the SPSS software is very easy: open the folder that contains the SPSS program, click on the application in the folder, and SPSS will open.

When starting SPSS for the first time, a window will ask you what you want to do. Thus, if you are going to analyze a new data set, check the Enter data option.
Suppose you choose to analyze a new data set, then the SPSS Data Editor will be displayed. The
SPSS Data Editor is the initial framework used to enter data and select the appropriate procedure
for analysis. This window is made up of:
● The Menu Bar which contains the SPSS Main Menu with all its options. Each of these
options has different procedures that are displayed by clicking on each of them.
● The Toolbar, which is made up of different icons that allow direct access to the most common procedures. The name of each of these icons is shown by positioning the mouse pointer over the icon.
The SPSS Data Editor itself, which is made up of cells. Each row represents an element of the data set, and each column represents a variable; this grid is displayed when the Data View tab is selected. Selecting one of these variables and pressing the right mouse button shows the following options;
● Insert variables
● Sort the data in ascending order
● Sort the data in descending order
Similarly, you can add cases or variables (columns) if you forget something. To insert a case, click the right (secondary) mouse button on a row and select the option to insert a case.

Tabs in the SPSS Editor


The SPSS Data Editor has two tabs:
1. Data view: This tab shows the data values
2. Variables view: This tab shows the characteristics of the variables

In the Variables view, each row corresponds to a variable, and each column determines its
characteristic. It has the following features;
● Name: The name of the variable is entered.
● Type: The type of variable is chosen from the possibilities offered by clicking on Type.
● Numeric: A variable whose values are numbers. Values are shown in standard numeric
format (Width and Decimal places are set).
● Comma: A numeric variable where commas determine the thousands.
● Point: A numerical variable where the points determine the thousands.
● Scientific notation: A numeric variable whose values are shown with an embedded E
and an exponential sign that represents a power of base ten.
● Date: A numeric variable whose values are displayed in one of several calendar-date or clock-time formats.
● Custom Currency: A numeric variable whose values are displayed in one of the various
custom currency formats.
● String: Variables whose values are not numeric. They are also known as alphanumeric
variables.
● Width: Determines the width of the column.
● Decimals: Determines the number of decimal places that appear on the screen.
● Label: Variables can be labeled so that this label appears in subsequent investigations.
● Values: This allows you to enter value labels for the categories of the variable.
● Missing Values: SPSS enables you to encode missing values as discrete values or as a specific range.
● Columns: This will enable you to enter the width of the column. It can also be changed
in Data View by clicking and dragging the edges of the column.
● Alignment: This will enable you to choose between aligning the entered data to the Left,
Right, or Centered.
● Measure: It helps to define the level of measurement of the variable as Scale, Ordinal, or Nominal.
● Scale: It is used to represent numeric data measured on an interval or ratio scale (Ex: age, height, income).
● Ordinal: It is used to represent data with an intrinsic order (Ex: great, medium, small; failing, passing, notable, outstanding).
● Nominal: It is used to represent data without an intrinsic order (Ex: red, yellow, green).
● Role: Functions that can be assigned to variables for analysis.
● Input: The variable is used as an input (for example, predictor, independent variable).
● Target: The variable is used as an output or target (for example, dependent variable).
● Both: The variable is used as input and output.
● None: The variable does not have a role assignment.
● Partition: The variable is used to divide the data into separate samples.
● Split: Variables with this role are not used as split-file variables.

How to Use the File Menu


From the Main Menu bar, you can access all the menus of the Data Editor. The first menus: File,
Edit, View, Window, and Help, are common in Windows programs. The rest of the menus are
specific to SPSS; they allow making changes to the data, obtaining statistical, numerical,
graphical results. The File menu is one of the important menus you can use in the SPSS software.
It has the following options;
● New: Open a new data, syntax, results, or process file.
● Open: Open an existing data, syntax, results, or process file.
● Open database: Create, edit, and run database queries.
● Read text data: Open text files.
● Close: Close the current file.
● Save: Save the current file.
● Save As: Save the current file with another name.
● Show data file information: (Job file or external file).
● Data cache: It is used to create a temporary copy of data. It is used to improve the
performance of large files read from an external source. It is also used to reduce the
amount of temporary disk space due to a temporary copy of the active file.
● Repository: Connect, Store from SPSS Statistics, Publish to Web, Add file, Retrieve to
SPSS Statistics, Download a file.
● Preview: Displays the current task in full screen.
● Print: Print the current task.
● Recently used data: Shows recently used data.
● Recently used files: Show recently used files.
● Exit: Exit SPSS.
CHAPTER THREE
Analyzing Statistical Data
Introduction to Descriptive Analysis
Once the data has been entered, the first step in data analysis is to perform a descriptive
analysis. This initial analysis provides an idea of the shape of the distribution of the observations.
It allows obtaining statistics of central tendency (mean, median, and mode), dispersion (variance,
standard deviation, range), shape (skewness, kurtosis), position (percentiles), as well as bar, pie,
and histogram charts.
SPSS provides several tools to perform this description under the Analyze menu and the
Descriptive statistics menu. The procedures for carrying out the descriptive analysis are
Frequencies, Descriptive, and Explore.

Frequencies

The Frequencies procedure provides statistical and graphical representations that are useful for
describing different types of variables. It allows obtaining a variable description from the
frequency tables, histograms, bar graphs, percentiles, central tendency indices, and dispersion
indices. Here is how to access the Frequencies procedure;
● Proceed to the Main menu
● Ensure that you select the Analyze menu
● Then select the Descriptive Statistics option
● Click on Frequencies
In the dialogue box of frequencies, the variable or variables (categorical or quantitative) to be
analyzed are introduced. In this window, there are four command buttons:
I. Statistics: It is used to obtain descriptive statistics for quantitative variables
II. Charts: It is used to make bar graphs, pie graphs, and histograms
III. Format: It is used to choose the order in which you want the result to show
IV. Style: It is used to apply conditional formatting to the output tables. In addition, the option Show frequency tables is checked by default.

Click on Statistics, and a dialogue box will show up. This dialogue is where sets of descriptive
measures are displayed grouped in: Percentile values, Central tendency, Dispersion, and
Distribution. In this window, the descriptive statistics you want to study are marked and click
on Continue.
Click on Charts, and a dialogue box will show up to choose the Chart Type and the Chart
Values. In this window, select the graph you want to make (bar graphs, pie graphs, and
histograms) and click Continue.

Click on format, and a dialogue box will show up to choose the frequency tables and how you
want them to be ordered. You can order them according to the values of the variable or according
to the observed frequencies. In addition, you can choose to suppress the frequency tables of variables with a large number of distinct values or with more than n categories.
If you click on Style, you will be taken to a dialogue box that allows you to apply conditional formatting to the output tables.
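As a rough illustration, pasting a typical Frequencies run into the syntax window produces commands along these lines; the variable name score is a placeholder, and the subcommands depend on the statistics and charts you ticked.

* Analyze > Descriptive Statistics > Frequencies.
FREQUENCIES VARIABLES=score
  /STATISTICS=MEAN MEDIAN MODE STDDEV VARIANCE RANGE SKEWNESS KURTOSIS
  /PERCENTILES=25 50 75
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.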

Descriptive
The Descriptive procedure calculates central tendency statistics, dispersion, and distribution for
several variables, displaying them in a single table and calculating standardized values (z scores).
To access this procedure, you have to follow the steps;
● Proceed to the Main menu

● Ensure that you select the Analyze menu


● Select the Descriptive Statistics
● In the corresponding dialogue box, enter the variable or variables to be analyzed.

If you select Save Standardized Values as Variables, the z scores are saved and added to the data
in the Data Editor for further analysis. The z- score transformations allow the comparison of
variables that are recorded in different units of measurements. Here are some of the options that
will be displayed when you click on the descriptive statistics;

i. Reset allows you to return all the selected options to their default values
ii. Cancel lets you discard the selected variables and close the dialogue box
iii. Paste sends the syntax of the procedure to the syntax window
iv. OK: having chosen the specifications, press the OK button to obtain the results of the procedure.
Also, if you click on the Options button, you can choose statistics of central tendency, dispersion, and distribution, and order the variables by the size of their means (in ascending or descending order), alphabetically, or by the order in which the variables are selected (the default value). Click on the Continue button and then OK to generate the result of the descriptive statistics.
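A comparable sketch of the syntax behind the Descriptives procedure is shown below; score is again a placeholder variable, and the /SAVE subcommand is what stores the z scores mentioned above.

* Analyze > Descriptive Statistics > Descriptives.
DESCRIPTIVES VARIABLES=score
  /SAVE
  /STATISTICS=MEAN STDDEV MIN MAX SKEWNESS KURTOSIS.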

Explore
The Explore procedure generates summary statistics and graphical representations such as Box
Plots, Stem-and-Leaf Plots, Histograms, Normality Plots, Level Scatter Plots, etc. To explore the
data with this feature,
● Choose Analyze in the main menu
● Select Descriptive statistics
● Choose Explore from the drop menu.

A dialogue box will be displayed where one or more dependent variables are selected (dependent
list). Also, you can choose one or more variable factors (Factors list). You can also choose
values to define groups of cases or select an identification variable to label cases (Label cases
by).

In this dialogue box, there are three command buttons:


● Statistics: It is used to look for Confidence intervals for the mean, Central robust
estimators, Outliers, and Percentiles
● Plots: It is used to look for box, stem, and leaf plots, histograms, tests, and normal
probability plots, and level scatter plots with Levene's test
● Options: It is used for treating missing values.
It is important to note that, under Display, if Both is checked, statistical and graphical results are shown. However, if you check Statistics, only the statistical results are shown; if you check Plots, only graphical results are displayed.
Suppose you click on Statistics; then the following option will be displayed for you to select
from;
● M-estimator
● Outliers
● Percentile

You can also set the confidence interval for the mean. After you have selected the necessary options, click on the Continue button.
Suppose you click on plots; you get the following dialogue box;

i. Boxplot

● Factor level together


● Dependents together
● None

ii. Descriptive

● Stem-and-leaf
● Histogram
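Putting the Explore choices above together, the pasted syntax might look roughly like the following sketch, assuming a dependent variable score and a factor group (both placeholders).

* Analyze > Descriptive Statistics > Explore.
EXAMINE VARIABLES=score BY group
  /PLOT BOXPLOT STEMLEAF HISTOGRAM NPPLOT
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.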
The Box Plot is a form of graphical representation to summarize the distribution of the values of
a variable. In this representation, instead of displaying the individual values, basic statistics of
the distribution are represented: the median, the 25th percentile, the 75th percentile, and the
extremes of the distribution. This graphic representation is based on these five summary statistics. Information that can be obtained from this type of graph:
● The position of the median determines the central tendency

● The width of the box gives an idea of the variability of the observations. If the median is
not in the center of the box, you can deduce that the distribution is asymmetric (if it is
close to the lower limit of the box, positive asymmetric, and if it is close to the upper
limit, negative asymmetric)
● These graphs are handy for comparing the distribution of values between different groups.
A stem and leaf diagram is a technique used to observe the shape of the frequency distribution
table. The stem-and-leaf plot is a graphical representation in which the data is placed in two
levels so that you can visualize the shape of the distribution. A stem-and-leaf plot consists of a
series of horizontal rows of numbers.
The first column, known as stems, consists of a vertical line drawn, and to its right are the
corresponding leaves in each row. The number used to designate a row is its stem; the rest of the
numbers in the row are called leaves. The stem is the largest portion of the number. The leaves
give secondary information about the number.
Graphs with normality test: This procedure checks whether the data come from a normal population, and for this, it uses two graphs and an analytical test. (This procedure will be used in more advanced practices.)
Graphs
In addition to the graphs produced by the previous descriptive procedures, SPSS has a menu
specifically dedicated to obtaining graphical results. Here is how to use the Graphs menu;
● Proceed to Graphs in the Main Menu
● The following window will open up displaying;

i. Chart Builder
ii. Graphboard Template Chooser
iii. Legacy Dialogs

To proceed with making interactive graphs, select the Chart Builder from the Graphs menu, and a new window will be shown for you to make selections. Suppose you select the Legacy Dialogs; then you can choose the type of graph that you want to make. Here you can choose to make a Simple, Grouped, or Stacked Bar Chart. If you choose Grouped, click on Define, and the next window is displayed.
To create a grouped bar chart, you must select a category variable and a grouping variable. In
this way, once the OK button is clicked, a bar chart of the chosen categories is generated,
grouped by the chosen grouping variable.
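For reference, a grouped bar chart of counts can also be requested with syntax along these lines; category and group are placeholder variable names.

* Graphs > Legacy Dialogs > Bar > grouped (clustered) bar chart of counts.
GRAPH /BAR(GROUPED)=COUNT BY category BY group.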
The standard procedure for generating graphs begins with choosing from the Main Menu of the
desired type of graph. After this choice, the program requests more information about the
characteristics of the desired graph. This is usually done through the Legacy dialogue. Once the
definition of the graph has been confirmed by clicking the OK button, the created graph appears in the Results Viewer. Selecting it with the mouse and clicking the right button opens a context menu where you can choose Edit Content.
CHAPTER FOUR
REGRESSION AND CORRELATION
Regression theory searches for a function that best expresses the relationship between two or more variables. This practice studies only the situation of two variables. One of the most interesting applications of regression is prediction, that is, knowing the value of one of the variables, estimating the value that the other, related variable will present.
Correlation theory studies the degree of dependence between the variables; its objective is to
measure the degree of fit between the theoretical function (fitted function) and the point cloud.
When the functional relationship linking the variables X and Y is a line, the regression and
correlation are called Linear Regression and Linear Correlation. Pearson's Linear Correlation
Coefficient gives a measure of Linear Correlation.

Regression and Linear Correlation


The objective of Regression Analysis is to find a simple mathematical function that best
expresses the type of relationship between two or more variables that describes the behavior of a
variable given the values of one or more other variables.
Simple Regression Analysis studies and explains the behavior of a variable Y, called the dependent variable or variable of interest, based on another variable X, which is called the explanatory, predictor, or independent variable.
Simple linear regression assumes that the values of the dependent variable, which can be noted as yi, can be written as a function of the values of a single independent variable, which can be noted as xi. All of these can be represented in the linear model:

yi = b0 + b1xi + εi

where b0 and b1 are the unknown parameters that we will estimate and εi and yi are random variables. εi is called a random error or disturbance. When starting a simple linear regression study, the first step for the researcher is to plot the observations of both variables on a graph called a scatter plot or point cloud. From this representation, the researcher determines if there really is a linear relationship between both variables.


First, before performing the regression, we are going to visualize the points using a Scatter/Dot chart. To do this,
● Proceed to Graphs
● Select Chart Builder
● Proceed to Scatter/Dot in the gallery

● Select or drag the Scatter/Dot option you want to represent (simple scatter, matrix scatter, simple dot, overlay scatter, or 3-D scatter).
● Drag the variables you want onto the X and Y axes of the graph
● Click OK, and the scatter plot is displayed
● The graph will show the possible adequacy of the linear model and its increasing trend.
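The same scatter plot can be sketched in syntax, assuming placeholder variables x and y.

* Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter.
GRAPH /SCATTERPLOT(BIVAR)=x WITH y
  /MISSING=LISTWISE.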

To obtain the least-squares regression line of Y on X, y = b0 + b1x, the Linear Regression procedure must be chosen. To do this;
● Select Analyze
● Proceed to Regression
● Select the Linear option

● A new window will be displayed for you to move the variables X and Y to their corresponding fields

● To carry out further settings, you need to click on the statistics option
● You will be taken to a new window where you need to select from several options

1. Regression coefficients: Estimates and Confidence intervals


2. Model fit: R Square change, Descriptive, Part and Partial correlation,
Collinearity diagnostic
3. Residual: Durbin-Watson, Casewise Diagnostics

● Click Continue.

● Ensure that you click on the plot button.


● You will be taken to a window where you must choose *ZRESID for Y and *ZPRED for X. Lastly, choose the graph options you need.
● Press Continue and proceed to click on the OK button. You can also press the save button
to include other features.
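Collecting the choices above, the Linear Regression dialog pastes syntax roughly like the sketch below (x and y are placeholder variables); the /SAVE ZRESID line stores the standardized residuals, which the validation steps in the following sections reuse.

* Analyze > Regression > Linear.
REGRESSION
  /STATISTICS COEFF OUTS CI(95) R ANOVA
  /DEPENDENT y
  /METHOD=ENTER x
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS DURBIN HISTOGRAM(ZRESID) NORMPROB(ZRESID)
  /SAVE ZRESID.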
Graphical representations are a way to visually judge and detect strange behaviors of individual
observations and outliers. Several model assumptions can be used: Normality, Linearity,
Homoscedasticity (Equality of Variances), and Independence of Residues. In addition to
representing a Histogram and a normal probability graph, various graphs can also be made to
provide information on the model's hypotheses.
However, the scatter diagram is very important because it can be used for any combination of the following variables: the dependent variable (DEPENDNT), standardized predicted values (*ZPRED), standardized residuals (*ZRESID), deleted residuals (*DRESID), adjusted predicted values (*ADJPRED), studentized residuals (*SRESID), and studentized deleted residuals (*SDRESID).
For instance:
Standardized residuals vs. predicted values plot: This graph is used to test the hypotheses of linearity and Homoscedasticity and to study whether the model is appropriate or not.
Observed values vs. predicted values plot: This graph includes a line of slope 1. If the points lie on the line, it indicates that all predictions are perfect. Like the previous graph, it is also used to test the hypothesis of equality of variances, thus detecting cases where the variance is not constant and determining whether it is necessary to transform the data to guarantee Homoscedasticity.
Residuals vs. independent variable plot: This graph represents the residuals against an independent variable. It is used to detect the adequacy of the model with respect to the selected independent variable. It is also used to detect whether the variance of the residuals is constant in relation to the selected independent variable.

Validation and diagnosis of the model


This is used to verify the linear regression model's assumptions such as normality,
Homoscedasticity (equality of variances), and linearity.

Normality
You can perform the normality analysis graphically, using a histogram and a normal probability plot, and analytically, using the Kolmogorov-Smirnov test.

Histogram
You can superimpose a normal curve into a histogram graph. If the residuals follow a normal
distribution, the histogram bars should represent an appearance similar to that of said curve. Here
is how to go about it;
● Proceed to Graphs
● Select Chart Builder
● Proceed to select Histogram
● Open the Element Properties panel; in the window that appears, select the variable that represents the standardized residuals and mark the option to show the normal curve

● Press Continue and OK, and the histogram is displayed with the normal curve
superimposed.
If you observe that the data does not reasonably approximate a normal curve, keep in mind that the sample size considered here is very small; this kind of representation is not advisable for small sample sizes.
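In syntax form, and assuming the standardized residuals were saved under SPSS's default name ZRE_1, the histogram with a superimposed normal curve can be sketched as follows.

* Histogram of the standardized residuals with a normal curve overlaid.
GRAPH /HISTOGRAM(NORMAL)=ZRE_1.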

Normal Probability Plot


It is the most widely used graphical procedure to check the normality of a data set. Here is how to obtain this graph;
● Proceed to Analyze
● Select the option Descriptive Statistics
● Proceed to select Q-Q Plots
● In the resulting dialog box, the variable that represents the standardized residuals is selected
A graph will be shown which represents the theoretical and empirical distribution functions of the standardized residuals. The theoretical function, under the assumption of normality, is represented on the ordinate axis, and the empirical function on the abscissa axis. Deviations of the points of the graph from the diagonal indicate departures from normality. We observe the location of the points on the graph; if these points lie reasonably close to the diagonal, the hypothesis of normality is supported.
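A rough syntax equivalent, again assuming the standardized residuals are stored in ZRE_1, is the following sketch.

* Normal Q-Q plot of the standardized residuals.
PPLOT /VARIABLES=ZRE_1
  /TYPE=Q-Q
  /DIST=NORMAL.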

Normality test: Kolmogorov-Smirnov test


The analytical study of the normality of the residuals will be carried out using the non-parametric Kolmogorov-Smirnov test. Here is how to go about it;
● Proceed to select Analyze
● Choose Nonparametric Tests, then Legacy Dialogs, and select 1-Sample K-S
● In the resulting dialog box, the variable that represents the standardized residuals is selected
● The results table will be shown
Another method of testing for normality with the Kolmogorov-Smirnov test is to proceed to Analyze. The next step is to select Descriptive Statistics and choose the option Explore. Follow the prompts by entering the variable into the dialogue box and clicking on the Plots option, where Normality plots with tests can be checked.
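For reference, the legacy one-sample Kolmogorov-Smirnov test can be sketched in syntax as follows, assuming the standardized residuals are stored in ZRE_1.

* Legacy one-sample Kolmogorov-Smirnov test against the normal distribution.
NPAR TESTS /K-S(NORMAL)=ZRE_1
  /MISSING ANALYSIS.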

Homoscedasticity
Here is how to go about Homoscedasticity;
● Select Analyze
● Proceed to Regression
● Select Linear
● Press the Plots button and, in the corresponding dialog box, select the variable *ZRESID for the Y-axis (this variable represents the standardized residuals) and the variable *ZPRED (the variable that represents the standardized predicted values) for the X-axis
● Press Continue and OK, and the graph will be displayed
It is important to note that this graph is handy for detecting inadequacy of the proposed model to the data and possible deviations from the linearity hypothesis. If you observe non-random behavior in the trajectories, this indicates that the proposed model does not adequately describe the data.

Independence of residuals: Durbin-Watson test


You can carry out the hypothesis of independence of the residuals using the Durbin-Watson test.
Here is how to go about it;
● Select Analyze Menu
● Proceed to regression and select Linear
● In the pop-up window, press the Statistics button
● In the resulting dialog box, check Durbin-Watson under Residuals and click on Continue and OK

It is important to note that SPSS provides the value of the Durbin-Watson statistic but does not
show the associated p-value, so the corresponding tables must be used. The Durbin-Watson
statistic measures the degree of autocorrelation between the residual corresponding to each
observation and the previous one.

Quadratic Regression and Correlation


To fit a quadratic or parabolic model, y = b0 + b1x + b2x^2, here is how to go about it;
● Select Analyze.
● Proceed to Regression.
● Select Curve Estimation.
● A new window will be displayed.
● The variables X and Y are placed in their corresponding fields, and the Quadratic option is marked under Models.
● Ensure that the Include constant in equation option is checked. This includes the model's constant term (b0).
● Press OK, and the outputs will be displayed.
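A sketch of the corresponding Curve Estimation syntax, with placeholder variables x and y, might be the following.

* Analyze > Regression > Curve Estimation with a quadratic model.
CURVEFIT /VARIABLES=y WITH x
  /CONSTANT
  /MODEL=QUADRATIC
  /PLOT FIT.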
CHAPTER FIVE
PROBABILITY DISTRIBUTIONS: Binomial,
Poisson and Normal
There are many theoretical models in probability theory that are useful in a wide variety of
practical situations. In this chapter, three theoretical models are considered: Binomial, Poisson,
and Normal. Here are some tips to note before carrying out probability distribution.
● It is necessary to activate the Data Editor, that is, open a data file or enter a number in a
box; otherwise, an error message will be displayed.
● You must introduce a number into the box before opening the dialogue box.

How to Carry out Probability Distribution in SPSS
In the main menu, choose Transform and select Compute Variable; as a result of this action, the
dialogue box is displayed. From this dialogue box, you can carry out the following actions;
● Calculate values for numeric or string (alphanumeric) variables.
● Create new variables or replace existing variable values.
● Selectively calculate values for subsets of data based on logical conditions.
● Use more than 70 built-in functions, including arithmetic, statistical, distribution, string,
etc.

Probability Mass Function


A random variable is not perfectly defined if the values it can take are not known. Since the
behavior of a random variable is governed by chance, you must determine the behavior in terms
of probabilities. For this, two functions are used: the Probability Mass Function and
the Distribution Function.
Probability Mass Function:
The probability mass function of a discrete random variable is a function that assigns a
probability to each possible value of the said variable. The Probability Mass Function of the
discrete random variable X is denoted by Pi. Here is an example of probability mass function:

● Let X = the top face of a coin, coded so that X takes the values X = {1, 0} with probabilities P(X) = {1/2, 1/2}. Thus, the probability of X if it takes:
● The value 1, which is denoted by P[X = 1], will be 1/2 (P[X = 1] = 1/2)

● The value 0, which is denoted by P[X = 0], will be 1/2 (P[X = 0] = 1/2).


Probability Density Function:
It is important to note that, for a continuous random variable, it does not make sense to assign a probability to each individual value in the way a mass function does. For this reason, you have to define a function that allows you to calculate probabilities over intervals. This function is called the Probability Density Function and is denoted by f(x).
Here is how to obtain the probability density function of a specific distribution in SPSS;
● Proceed to the main menu
● Select Transform and click Compute variable

● In the dialogue box, proceed to Function group and select PDF & Noncentral PDF.
● In Functions and Special Variables, the corresponding distribution can be selected, which is the binomial pdf. Ensure that you double-click it so that it displays in the numeric expression box (PDF.BINOM(quant, n, prob)).

● Input the values for the different variables in the numeric expression box.
● Ensure that you type a name for the target variable and click on the OK button below.
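As a concrete sketch, computing P[X = 2] for a Binomial distribution with n = 5 and prob = 0.8 could look like the lines below; the target variable name p_exact is a placeholder, and at least one case must already exist in the Data Editor.

* P[X = 2] for X ~ Binomial(n = 5, prob = 0.8).
COMPUTE p_exact = PDF.BINOM(2, 5, 0.8).
EXECUTE.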

Distribution Function
The Distribution Function of the random variable X is denoted F(x). It is defined as the probability that X takes a value less than or equal to x, that is, F(x) = P[X ≤ x]. To obtain values of the distribution function of a specific model in SPSS, the option CDF & Noncentral CDF is selected in the Function group. You must know the value of the variable and the parameters that determine the model. You can find the various models in Functions and Special Variables:
● CDF.BERNOULLI (c, prob): Numeric. Returns the cumulative probability that a value from the Bernoulli distribution, with the given probability parameter, is less than or equal to c. That is, the probability that the variable X is less than or equal to c, P[X ≤ c], where X is a random variable with a Bernoulli distribution.
● CDF.BINOM (c, n, prob): Numeric. Returns the cumulative probability that the number of successes in n trials, with probability of success prob in each of them, is less than or equal to c. That is, the probability that the variable X is less than or equal to c, P[X ≤ c], where X is a random variable with a Binomial distribution of parameters n and prob. When n is 1, the value is the same as that of CDF.BERNOULLI.
● CDF.POISSON (c, mean): Numeric. Returns the cumulative probability that a value from the Poisson distribution, with the specified mean or rate parameter, is less than or equal to c. That is, the probability that the variable X is less than or equal to c, P[X ≤ c], where X is a random variable with a Poisson distribution with the given mean parameter.
● CDF.NORMAL (c, mean, stddev): Numeric. Returns the cumulative probability that a value from the Normal distribution, with the specified mean and standard deviation, is less than or equal to c. That is, the probability that the variable X is less than or equal to c, P[X ≤ c], where X is a random variable with a Normal distribution of parameters mean and stddev.
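A few illustrative sketches of these functions used from Compute Variable (target variable names are placeholders):

* P[X <= 2] for X ~ Binomial(n = 5, prob = 0.8).
COMPUTE p_binom = CDF.BINOM(2, 5, 0.8).
* P[X <= 3] for X ~ Poisson(mean = 2.5).
COMPUTE p_pois = CDF.POISSON(3, 2.5).
* P[X <= 50] for X ~ Normal(mean = 40, sd = 6).
COMPUTE p_norm = CDF.NORMAL(50, 40, 6).
EXECUTE.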

How to Calculate Quantiles


To calculate quantiles of a specific distribution, select the Inverse DF option in the Function group. Given a cumulative probability, this allows obtaining the value of the variable that accumulates said probability in a given model. You need to know the cumulative probability and the parameters of the model.
● IDF.NORMAL (p, mean, stddev): Numeric. Returns the value from the Normal distribution, with the specified mean and stddev parameters, whose cumulative probability is p. It calculates a value x such that P[X ≤ x] = p, where X is a random variable with a Normal distribution of parameters mean and stddev.
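For example, the 95th percentile of a Normal distribution with mean 40 and standard deviation 6 can be sketched as follows (x95 is a placeholder name).

* Value x such that P[X <= x] = 0.95 for X ~ Normal(mean = 40, sd = 6).
COMPUTE x95 = IDF.NORMAL(0.95, 40, 6).
EXECUTE.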
Generate random values from a given
distribution
To generate a set of random values from a specific model, select the Random numbers in the
Function group. The number of values generated will depend on the number of active rows in
the Data Editor, so you have to activate as many rows as random numbers you want to
generate.
● RV.BERNOULLI (p): Numeric. Returns a random value from a Bernoulli distribution
with the specified probability parameter p
● RV.BINOM (n, p): Numeric. Returns result from a random value from a Binomial
distribution with the specified number of trials n and the probability parameter p.
● RV.POISSON (mean): Numeric. Returns a random value from a Poisson distribution of
specified rate or mean parameter.
● RV.NORMAL (mean, typical_dev): Numeric. Returns a random value from a Normal
distribution of specified mean and typical_dev parameters
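A minimal sketch, assuming you have already activated the rows you need and want reproducible draws, might be:

* Fix the seed so the draws can be reproduced, then generate one
* Binomial(n = 5, prob = 0.8) value for every active row.
SET SEED=12345.
COMPUTE draw = RV.BINOM(5, 0.8).
EXECUTE.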

Practical Questions and Answers


Questions 1:
A company dedicated to manufacturing electronic calculators sells, on the same day, five identical calculators to different businesses in the same locality. The probability that a calculator will be up and running three years later is 0.8. Calculate the probability that:
a) All five calculators are out of service three years later
b) All five calculators are in service three years later
c) At most, two calculators are out of order
d) Three calculators are out of order
e) Generate a sample of size 15.
Solution Procedure
Success event: "Calculator that works three years later" => P[success] = 0.8
The following random variable is defined: X = "number of calculators, out of the 5 sold, that work three years later". This random variable has a Binomial distribution of parameters n = 5 and prob = 0.8.
Note: Remember that it is necessary to activate the Data Editor, that is, open a data file or enter
a number in a box; otherwise, an error message appears.
A. All five calculators are out of service three years later:
Here is how to go about solving this problem;
● Proceed to the main menu
● Select Transform and click Compute variable
● In the dialogue box, proceed to Function group, select PDF & Noncentral PDF.
● In functions and special variables, the corresponding distribution can be selected, which
is the pdf binomial. Ensure that you double-click it so that it can display in the numeric
expression box (PDF.BINOM(quant, n, prob)).
● Input the values for the different variables in the numeric expression box. That is
PDF.BINOM (0,5,0.8)
● Ensure that you type a name for the target variable, click on the OK, and continue
buttons.
● The answer will be displayed as P [X = 0] = 0.00032
B. All five calculators are in service three years later:
Here is how to go about solving this problem;
● Proceed to the main menu.
● Select Transform and click Compute variable.
● In the dialogue box, proceed to Function group, select PDF & Noncentral PDF.
● In functions and special variables, the corresponding distribution can be selected, which
is the pdf binomial. Ensure that you double-click it so that it can display in the numeric
expression box (PDF.BINOM(quant, n, prob)).
● Input the values for the different variables in the numeric expression box. That is P [X =
5] = PDF.BINOM (5,5,0.8).
● Ensure that you type a name for the target variable, click on the OK, and continue
buttons.
● The answer will be displayed as P [X = 5] = 0.32768.
C. At most, two calculators are out of order:
Here is how to go about solving this problem;
● Proceed to the main menu.
● Select Transform and click Compute variable.
● In the dialogue box, proceed to Function group, select CDF & Noncentral CDF.
● In functions and special variables, the corresponding distribution can be selected, which
is the CDF binomial. Ensure that you double-click it so that it can display in the
numeric expression box (CDF.BINOM(c, n, p).
● Input the values for the different variables in the numeric expression box. That is P[X ≥ 3] = 1 − P[X < 3] = 1 − CDF.BINOM(2,5,0.8).
● Ensure that you type a name for the target variable, click on the OK, and continue
buttons.
● The answer will be displayed as P [X ≥ 3] = 1- P [X <3] = 0.94208.
D. Three calculators are out of order:
Here is how to go about solving this problem;
● Proceed to the main menu.
● Select Transform and click Compute variable.
● In the dialogue box, proceed to Function group, select PDF & Noncentral PDF.
● In functions and special variables, the corresponding distribution can be selected, which
is the pdf binomial. Ensure that you double-click it so that it can display in the numeric
expression box (PDF.BINOM(quant, n, prob)).
● Input the values for the different variables in the numeric expression box. That is P [X =
5 -3 = 2] = PDF.BINOM (2,5,0.8).
● Ensure that you type a name for the target variable, click on the OK, and continue
buttons.
● The answer will be displayed as P [X = 2] = 0.05120.
E. Generate a sample of size 15:
Remember that to generate random numbers; you have to activate as many rows in the Data
Editor as there are random numbers you want to generate. In this case, 15. Here is how to go
about solving this problem;
● Proceed to the main menu.
● Select Transform and click Compute variable.
● In the dialogue box, proceed to the Function group, select Random Number, and select
the corresponding distribution, which is the Rv.Binom. Ensure that you double-click it
so that it can display in the numeric expression box (RV.BINOM (n, p).
● Input the values for the different variables in the numeric expression box. That is
RV.BINOM (5, 0.8).
● Ensure that you type a name for the target variable, click on the OK, and continue
buttons.
● The numbers will be displayed.
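The five parts of this exercise can also be collected into one short syntax sketch (the target variable names are placeholders).

* (a) P[X = 0]: all five calculators out of service.
COMPUTE p_a = PDF.BINOM(0, 5, 0.8).
* (b) P[X = 5]: all five calculators in service.
COMPUTE p_b = PDF.BINOM(5, 5, 0.8).
* (c) P[X >= 3]: at most two calculators out of order.
COMPUTE p_c = 1 - CDF.BINOM(2, 5, 0.8).
* (d) P[X = 2]: exactly three calculators out of order.
COMPUTE p_d = PDF.BINOM(2, 5, 0.8).
* (e) One Binomial(5, 0.8) draw per active row; activate 15 rows first.
COMPUTE sample = RV.BINOM(5, 0.8).
EXECUTE.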
Question 2
The probability that an individual suffers a reaction when injecting a certain serum is 0.1.
a. If the serum is injected into a sample of 30 people, calculate the probability that
fewer than 2 have a reaction
b. Calculate the probability of a reaction between 33 and 51 people out of a sample of
400.
Solution Procedure
Each individual to whom the serum is administered has or does not have a reaction independently of the rest; therefore, the number of individuals who have a reaction in a sample of n individuals is distributed according to a Binomial with parameters n and p.
1. If the serum is injected into a sample of 30 people, calculate the probability that fewer than
two will have a reaction.
● X: {Number of individuals suffering a reaction}; X → B (30, 0.1)
● P[X < 2] = CDF.BINOM(1,30,0.1)
● P[X < 2] = P[X = 0] + P[X = 1] = 0.1836950
2. Calculate the probability of a reaction between 33 and 51 people out of a sample of 400.
Y: {Number of individuals who suffer a reaction from a sample of 400}; Y → B(400, 0.1)
Since n = 400 is large, np = 40 > 5, and n(1 − p) = 360 > 5, the Binomial distribution can be approximated by a Normal distribution with mean np = 40 and standard deviation sqrt(np(1 − p)) = 6. Therefore:
● P[33 < X < 51] ≈ P[X < 50] − P[X < 33] = CDF.NORMAL(50,40,6) − CDF.NORMAL(33,40,6)
● P[33 < X < 51] ≈ 0.830537
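Both parts can be reproduced with a short Compute Variable sketch (target variable names are placeholders).

* (a) P[X < 2] for X ~ Binomial(30, 0.1).
COMPUTE p_a = CDF.BINOM(1, 30, 0.1).
* (b) Normal approximation N(40, 6) to Binomial(400, 0.1).
COMPUTE p_b = CDF.NORMAL(50, 40, 6) - CDF.NORMAL(33, 40, 6).
EXECUTE.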
CHAPTER 6
Confidence Intervals
Confidence intervals are carried out to obtain an interval in which the true value of the parameter
is found with a certain probability. This probability is called the confidence level (1 − α), where
α is the significance level. Here is how to create a confidence interval for the mean of a normal
population using SPSS;
● Proceed to the main menu and select Analyze.
● Ensure that you select Descriptive statistics.
● Click on the option Explore.

● A dialogue box will be displayed where the variable to be analyzed is passed to the
Dependents window.
● By default, SPSS calculates the confidence interval at a level of 95%. To modify this
level, press Statistics and choose your desired level of the confidence interval.

● To obtain the interval, press Continue and then OK
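The Explore procedure just described corresponds approximately to the following syntax, where score is only an illustrative name for the analyzed variable:
EXAMINE VARIABLES=score
  /PLOT NONE
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95.
The value 95 in /CINTERVAL can be replaced by any other desired confidence level.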


It is important to note that there is another alternative to obtaining the confidence interval, and
here is how to go about it;
● Proceed to the main menu and select Analyze.
● From the option presented, select Compare means and click on T-test for a sample.

● A dialogue box will be displayed where the variable is passed to the Contrast variables
window.
● The confidence level can be modified by pressing the Options button.
● To obtain the interval, press Continue and then OK.

Practical Question
In a sample of 9 tomato juice preparations, the following data has been obtained on vitamin C
content in mg / 100 cc: 21.60; 19.72; 18.92; 23.01; 17.98; 22.06; 25.01; 21.98; 20.80. Assuming
that the vitamin C content of tomato juice is normally distributed:
1. Estimate the mean vitamin C content of tomato juice.
2. Calculate a 95% confidence interval for said quantity.

Solution Procedure
1. Estimate the average vitamin C content of tomato juice.
The mean is obtained by adding the respective vitamin content and dividing by the number of
observations. This will give you 21.23 as the answer.

2. Calculate a 95% confidence interval for said quantity.


Here is how to go about with the calculation;
● Proceed to the main menu and select Analyze
● From the option presented, select Compare means and click on T-test for a sample
● A dialogue box will be displayed, and in the field Contrast variables: enter the
variable Conte_VitaminC and in the field Test value, leave the value at 0.
● The confidence level can be modified by pressing the Options button
● To obtain the interval, press Continue and then OK
● The 95% confidence interval obtained is (19.5734, 22.8888), which contains the mean
vitamin C content of tomato juice.
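As a sketch, the same interval can be requested from a syntax window. The variable name Conte_VitaminC is the one used above, and the test value 0 is kept only so that the reported interval is the interval for the mean:
T-TEST
  /TESTVAL=0
  /VARIABLES=Conte_VitaminC
  /CRITERIA=CI(.95).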

Confidence Intervals for the difference of means in independent samples
SPSS constructs confidence intervals for the difference of means in the case of unknown
population variances. To carry out these intervals, the data must be entered as follows: Two
variables are created. One of them contains all the observations, and the other variable is an
indicator variable of the group to which each of the observed values belongs. Here is how to go
about problems involving confidence interval for difference of means in independent samples;
● In the main menu, select Analyze
● Choose the option Compare means and select T-test for independent samples
● A dialogue box will be displayed for you to enter in the Contrast variables field: the
variable that contains the observed values and in the Grouping Variable field: the
variable that indicates the sample to which each of the values belongs
● The two groups that determine each of the samples are defined next; to do this, click on
the Define Groups button
● Enter the values assigned to each sample and press Continue. The confidence level can be
modified in Options.
● Press Continue and Accept.

Practical Question
Two laboratories A and B carry out nicotine determinations in 4 tobacco units, with the
following results: Lab. A: 16, 14, 13, 17. Lab. B: 18, 21, 18, 19
Assuming that the two populations examined are normal and independent with equal variance,
estimate the difference in the mean nicotine content of tobacco at a 95% confidence level.

Solution Procedure
Here is the step of how to solve the problem;
● Ensure that you enter the data correctly in the Data Editor: one column with all the nicotine values and another column indicating the laboratory (a sketch of this layout and the equivalent syntax is shown after this solution).
● In the main menu, select Analyze.
● Choose the option Compare means and select T-test for independent samples.
● A dialogue box will be displayed; enter the variable containing the nicotine determinations in the Contrast variables field and the variable indicating the laboratory (A or B) in the Grouping Variable field.
● The next step is to define the group by clicking on the Define Groups button.
● Enter the values assigned to each sample and press Continue.
● Select Accept to display the output or result.
In the output, the p-value of the test for equality of variances is 0.356, which is greater than the
0.05 significance level, so the hypothesis of equal variances is not rejected at the 95% confidence
level. The 95% confidence interval for the difference of means does not contain zero, from which
it can be deduced that the mean nicotine content differs from one laboratory to the other: it is
higher in laboratory B than in laboratory A.
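A sketch of the long-format data entry and the equivalent syntax, assuming the grouping variable lab is coded 1 for laboratory A and 2 for laboratory B, and the measurements are stored in nicotine (both names are illustrative):
* Each case is a pair (lab code, nicotine value).
DATA LIST FREE / lab nicotine.
BEGIN DATA
1 16  1 14  1 13  1 17
2 18  2 21  2 18  2 19
END DATA.
T-TEST GROUPS=lab(1 2)
  /VARIABLES=nicotine
  /CRITERIA=CI(.95).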
Confidence Intervals for the difference of means in related samples
In this case, the observations are entered so that each sample is in a column of the SPSS Data
Editor. Here is how to perform a confidence interval for the difference of means in related
samples using SPSS;
● In the main menu, select Analyze
● Proceed to select compare means
● Select T-test for related samples, and a new dialogue will be displayed
● The pairs of variables to be compared are selected simultaneously and passed to Related
variables
● In Options, you can change the confidence level
● Click Continue and OK.
CHAPTER SEVEN
Contrasts of Hypothesis
Hypothesis testing is a statistical process by which an investigation is carried out to verify a
claim of an observed result. All hypothesis testing is based on two mutually exclusive
propositions:

1. Null hypothesis (H0).


2. Alternative hypothesis (H1).

The hypothesis H0 consists of a specific statement about the probability distribution or the value
of its parameters. The name null means without effect or consequence: H0 represents the
hypothesis that is maintained unless the data provide sufficient evidence against it. In contrast,
the hypothesis H1 negates the null and includes everything that H0 excludes. Therefore, the
researcher's interest is usually focused on H1.

Basic Concept of Hypothesis


The decision rule is the criterion you will use to decide whether or not the null hypothesis
should be rejected. This criterion is based on the partition of the sampling distribution of the test
statistic into two mutually exclusive regions or zones: Critical region or rejection region and
Non-rejection region.
Non-rejection region: This is the area of the sampling distribution containing the test statistic
values compatible with H0. Its probability is called the confidence level and is represented by 1 − α.
Rejection region or critical region: This is the area of the sampling distribution containing the
test statistic values incompatible with H0. Its probability is called the level of significance or risk
and is represented by the letter α.
Once the two zones have been defined, the decision rule consists of rejecting H0 if the contrast
statistic takes a value belonging to the rejection zone or maintaining H0 if the contrast statistic
takes a value belonging to the non-rejection zone.

Types of contrasts
Parametric tests: The observations are assumed to follow a known distribution, and statements
are made about the parameters of that distribution.
Non-parametric tests: The opposite of parametric tests; the distribution of the observations is
unknown, so no statements are made about its parameters.

Types of contrast hypotheses


Simple hypotheses: The hypothesis assigns a single value to the unknown parameter, H: θ = θ0.
Composite hypotheses: The hypothesis assigns several possible values to the unknown
parameter, H: θ ∈ (θ1, θ2).

The Decision Rules


Two-tailed test: If the hypothesis gives rise to a critical region "on both sides" of the parameter
value, we will say that the test is two-tailed.

One-sided test: If the hypothesis gives rise to a critical region "on one side of the parameter
value," we will say that the test is one-sided or one-tailed.

Type I and II errors


Type I error is committed when you decide to reject the null hypothesis H0, which is actually
true. The probability of making this mistake is α.
Type II error is committed when you decide not to reject the null hypothesis H0, which is
actually false. The probability of making this mistake is β.

Parametric hypothesis testing


The purpose of hypothesis testing is to determine whether a proposed (hypothetical) value for a
parameter or other characteristic of the population should be accepted as plausible based on the
sample evidence. We can consider the following stages in carrying out a contrast:
● The researcher formulates a hypothesis about a population parameter
● Select a sample of the population
● Check whether or not the data agree with the hypothesis, that is, compare the observation
with the theory
● If the observed is incompatible with the theoretical, then the researcher can reject the
hypothesis and propose a new theory
● If the observations are compatible with the theoretical, the researcher can continue as if
the hypothesis were true.

Hypothesis Testing for a Sample


The hypothesis contrasts that SPSS builds are those provided by the T-tests, which are of three
types: T-test for one sample, T-test for independent samples, and T-test for related samples. Here
is how to obtain a T-test for a sample;
● In the main menu, select Analyze
● Proceed to select Compare means and tap T-test for a sample

● In the corresponding window, one or more quantitative variables are selected to contrast
them against the same assumed value
● By pressing Options, you can choose the confidence level.
● Click Continue and OK. A statistical summary for the sample and the output of the
procedure are obtained

Practical Question
An experiment is conducted to study the time (in minutes) required for a desert lizard's body
temperature to reach 45º from its normal body temperature while in the shade. The following
observations were obtained: 10.1; 12.5; 12.2; 10.2; 12.8; 12.1; 11.2; 11.4; 10.7; 14.9; 13.9; 13.3.
Suppose that the variable X: Time to reach 45º follows a Normal law

1. Can it be concluded that the mean time required to reach the lethal temperature is 15
minutes?
2. Can it be concluded that the mean time required to reach the lethal temperature is less
than 13 minutes?

Solution Procedure
1. The following hypothesis test is carried out:

Here is how to solve the problem;


● Proceed to the main menu and select Analyze
● Choose the option Compare means and select T-test for a sample
● In the corresponding window, time is selected for the variable to contrast, and the value of
the test is set to 15
● Press OK, and the result will be displayed
2. The following hypothesis test is carried out:

● Proceed to the main menu and select Analyze


● Choose the option Compare means and select T-test for a sample
● In the corresponding window, time is selected for the variable to contrast, and the value of
the test is set to 13
● Press OK, and the result will be displayed
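Both contrasts can also be run from a syntax window; the variable name time is an assumption based on the description above. SPSS reports a two-sided p-value, so for the one-sided contrast in part 2 the usual approach is to halve the reported p-value when the sample mean falls on the side of the alternative:
* Part 1: test H0 mu = 15.
T-TEST /TESTVAL=15 /VARIABLES=time /CRITERIA=CI(.95).
* Part 2: test H0 mu = 13 against a one-sided alternative.
T-TEST /TESTVAL=13 /VARIABLES=time /CRITERIA=CI(.95).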

Hypothesis Testing for Two Independent Samples
Two samples are said to be independent when the observations of one of them do not condition
the observations of the other at all. The SPSS statistical package performs the T-Test procedure
for independent samples; in this procedure, the means of two normal and independent populations
are compared. To perform this contrast, subjects must be randomly assigned to the two
populations so that any difference in response is due to treatment (or lack of treatment) and not
to other factors. Here is how to obtain a T-test for independent samples;
● Select Analyze in the main menu
● Select Compare means and click on T-test for independent samples

● A new window will be displayed to select one or more quantitative variables, and a
different T-Test is calculated for each variable.

● Next, select a single grouping variable and click Define Groups to specify the codes of the
groups that you want to compare. When you select Define Groups, a new screen is
displayed where the codes of the two groups to be compared are entered.

● Press Continue and then Accept. The statistical summary for the two samples and the
output of the procedure will be displayed.

Practical Question 2
Say you are asked to compare two isolated frog populations. The lengths of the two samples are
measured and expressed in millimeters as follows;
Population 1: 20.1; 22.5; 22.2; 30.2; 22.8; 22.1; 21.2; 21.4; 20.7; 24.9; 23.9; 23.3
Population 2: 25.3; 31.2; 22.4; 23.1; 26.4; 28.2; 21.3; 31.1; 26.2; 21.4
Test the hypothesis of equality of means at a significance level of 1%. (Assuming the length is a
normal distribution).
Solution Procedure
Here is how to solve the problem
● To perform a contrast of independent samples, the data must be entered in the SPSS
Editor in two columns: a variable (length) containing all the observations and a grouping
variable (frog) that indicates the population (1 or 2) to which each value belongs.
● Proceed to the main menu and select Analyze.
● Ensure that you select Compare means and click T-test for independent samples, and a
new window will be displayed
● In the test variable window, ensure that you input length as the variable. While in the
grouping variable, select frog as group variable.
● Click on Define Groups and input 1 in group 1 and 2 in group 2
● Press Continue and select Options. The box for the percentage of the confidence interval
is filled with 99.
● Press Continue and Accept, and the result will be displayed.
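A rough syntax equivalent, using the variable names length and frog from the steps above and a 99% confidence interval:
T-TEST GROUPS=frog(1 2)
  /VARIABLES=length
  /CRITERIA=CI(.99).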

Hypothesis testing for paired samples


In paired samples, each observation from one sample is paired with an observation from the
other sample; therefore, they are considered as pairs (x, y). The SPSS statistical package
performs the T-Test procedure for paired samples; in this procedure, the means of two variables
of a single group are compared. Here is how to go about with it;
● Proceed to the main menu and select Analyze
● Ensure that you choose Compare means and tap paired samples T-test

● A new window will be displayed for you to select the variable that you want to compare.
It is important to note that the test can be performed simultaneously for more than one
pair of variables.

● Click Continue and then the OK button to provide the statistical summary for the two
samples.

Practical Question
A study was carried out, in which ten individuals participated, to investigate the effect of
physical exercise on the level of cholesterol in plasma. Before exercise, blood samples were
taken to determine the cholesterol level of each individual. Afterward, the participants were put
through an exercise program. Blood samples were taken again at the end of the exercises, and a
second cholesterol level reading was obtained. The results are shown below.
First Cholesterol level: 182; 230; 160; 200; 160; 240; 260; 480; 263; 240
Second Cholesterol level: 190; 220; 166; 150; 140; 220; 156; 312; 240; 250
Test, at a 95% confidence level, whether physical exercise has lowered the cholesterol level.

Solution Procedure
Here is the procedure of how to solve the problem presented above;
● To perform a paired sample contrast, the data must be entered in the SPSS Editor with the
first and second cholesterol readings in two separate columns. Then select Analyze in the
main menu.

● Ensure that you choose Compare means and tap paired samples T-test
● A new window will be displayed for you to select the variables that you want to compare,
such as the first cholesterol level and the second cholesterol level
● Click Continue and then the OK button to provide the statistical summary for the two
samples.
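A sketch of the corresponding syntax, assuming the two readings were entered as variables named chol_before and chol_after (illustrative names):
* Paired comparison of the cholesterol readings before and after the exercise program.
T-TEST PAIRS=chol_before WITH chol_after (PAIRED)
  /CRITERIA=CI(.95).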

The Chi-square Test Procedure


This test compares the observed and expected frequencies in each category to test whether all
categories contain the same proportion of values or if each category contains a user-specified
proportion of values. Here is how to solve a Chi-square problem using the SPSS statistical
program;
● Proceed to the main menu and select Analyze
● Ensure that you select Non-parametric tests
● Click legacy dialog boxes and select Chi-square
● In the corresponding window, one or more contrast variables are selected into the test
variable window. It is important to note that each variable generates an independent test.

● Proceed to select the Options button to choose any from the following options; descriptive
statistics, quartiles, and control the treatment of data
● Press the Continue button and choose Accept. The outputs will be displayed

Practical Question
Suppose you roll a die 720 times and get the results shown in the table.

Test the hypothesis that the die is well constructed.

Solution Procedure
Here is how to tackle the problem using SPSS software;
● The first step required is that you need to enter the data in the SPSS editor

● The next step is to weight the cases by the observed frequencies. You can do this by
proceeding to Data and selecting Weight Cases. Choose Weight cases by and move the
frequency variable into the box. Click on the OK button
● The next step is to proceed to the main menu bar and select Analyze.
● Ensure that you select Non-parametric tests.
● Click legacy dialog boxes and select Chi-square.
● In the corresponding window, select frequency into the test variable window.
● Proceed to select the Options button to choose any from the following options;
descriptive statistics, quartiles, and control the treatment of data.
● Press the Continue button and choose Accept. The outputs will be displayed.
● The experimental value of the Chi-square test statistic is equal to 0.683, and the associated
p-value is 0.984 (greater than 0.05); therefore, the null hypothesis is not rejected.
Consequently, the die is well constructed.
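The weighting and the test can also be written as syntax; the variable names face and freq are illustrative (the die face and its observed frequency):
* Weight each face by its observed frequency, then test equal expected proportions.
WEIGHT BY freq.
NPAR TESTS
  /CHISQUARE=face
  /EXPECTED=EQUAL.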
CHAPTER EIGHT
Statistical Design of Experiments
The statistical design of experiments includes sets of techniques or statistical construction
methods that allow researchers to carry out the complete process of planning an experiment to
obtain appropriate data and valid or objective conclusions. Different statistical models can be
used in an experiment, but we will consider only the completely randomized design (CRD) and
the randomized complete block design (RCBD). These two are the most widely used experimental
designs in statistics.

Completely Randomized Design


This statistical technique is used when more than two groups have to be compared and the
response variable is numerical. To properly apply this design, the experimental units must be as
homogeneous as possible. Here is how to use the SPSS to carry out a completely randomized
design experiment;
● Select Analyze in the main menu.
● Proceed to Compare means.
● Select One-Way ANOVA.
● Enter the response variable in the Dependent List field and the grouping factor in the Factor field.
● After entering the variables above, press the accept button, and the ANOVA Table will be
displayed.
An alternative way of carrying out similar calculations using SPSS is by;
● Select Analyze in the main menu.
● Proceed to click on the General Linear Model and select Univariate.
● In the corresponding window, enter the response variable in the Dependent variable field
and the grouping factor in the Fixed factors field.
● Proceed to click on the Accept button, and the ANOVA table will be displayed for the
completely randomized design.
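Both routes correspond roughly to the following syntax, with illustrative variable names response (the numerical response) and treatment (the factor):
* One-Way ANOVA route.
ONEWAY response BY treatment
  /STATISTICS DESCRIPTIVES.
* General Linear Model route.
UNIANOVA response BY treatment
  /DESIGN=treatment.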

How to carry out Contrast


A contrast is a linear combination C of the treatment means in the analysis of variance model. It
is used to compare a specific treatment with another treatment or group of treatments. An example
of a contrast is the comparison of days, e.g., Monday vs. Tuesday. To carry out contrasts with
SPSS (a syntax sketch follows this list);
● Ensure that you select Analyze.
● Proceed to select compare means
● Choose One-way ANOVA and click on Contrasts.
● A window will appear, input the Contrast and its coefficient
● If you want to carry out another contrast, press Next and enter the coefficients of the
second Contrast
● Press Continue and Accept, and the table of contrasts is displayed with the coefficients
indicating the contrasts to be performed
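As a sketch, assuming a treatment factor with five levels (for example, the working days) and a contrast comparing the first level with the second (Monday vs. Tuesday); the names response and day are illustrative:
* Coefficients 1 -1 0 0 0 compare level 1 (Monday) with level 2 (Tuesday).
ONEWAY response BY day
  /CONTRAST=1 -1 0 0 0
  /STATISTICS DESCRIPTIVES.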

Randomized Complete Block Design


RCBD works differently from the CRD design. Randomized refers to the fact that treatments are
randomly assigned within blocks. Complete implies that each treatment is used precisely once
within each block. The word block refers to the fact that the experimental units have been
grouped according to a known nuisance (blocking) variable. That is, the blocks are not formed at
random, while the treatments are randomized within each block.
Furthermore, a characterization of this design is that the block and treatment effects are additive;
that is, there is no interaction between the blocks and the treatments. To perform RCBD using
SPSS, you need to start by defining the variables and entering the data. Take, for example:
● Name: Number_seeds ; Type: Numeric; Width: 2; Decimals: 0

● Name: Treatments ; Type: Numeric; Width: 1; Decimals: 0

● Name: Fir trees ; Type: Numeric; Width: 1; Decimals: 0.


Here is how to resolve the contrasts;
● Proceed to the main menu and select Analyze
● Ensure that you select the General linear Model and click on Univariate
● In the corresponding window, enter the response variable Number_seeds in the
Dependent variable field.

● Also, in the Fixed factors field, enter the Treatments factor and the Firs block.
● To indicate that it is a model without interaction between the treatments and the blocks,
click on Model and specify an additive model in the corresponding window.
● By default, SPSS has the Full factorial model marked, so Custom should be selected instead.
● Since you are studying only the main effects of the two factors, select Type: Main effects
and move the two factors, Treatments and Firs, into the Model field.
● In this additive model no interaction term appears, so the two factors enter only as main
effects (a syntax sketch follows this list).
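Using the variables defined above (with Firs standing for the blocking factor), the additive, main-effects-only model corresponds approximately to this syntax:
* Only main effects of the treatment and the block; no interaction term.
UNIANOVA Number_seeds BY Treatments Firs
  /DESIGN=Treatments Firs.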
How to Import a Data Source from Excel to
SPSS
Importing an Excel file into SPSS as a data source is very easy. Here is how to go about it;
● Select the File menu, go to Open and then click data.

● From the drop-down menu at the bottom of this window, select the file type (for example,
.xls, .xlsx, or .xlsm format) and navigate to the Excel file you want to import.

● Click Open
● The Opening Excel Data Source dialog should appear, allowing you to select an Excel
spreadsheet to import. Check the box Read variable names from the first row of data only
if the Excel spreadsheet you have selected has variable names in the first row. When you
are satisfied with your selection, click OK
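The same import can be scripted; this is only a sketch, and the file path and sheet name are assumptions to adapt to your own workbook:
* Hypothetical file path and sheet name; READNAMES=ON reads variable names from the first row.
GET DATA
  /TYPE=XLSX
  /FILE='C:\data\example.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.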

How to Import Text to SPSS


Importing text files to SPSS is very easy, and here is how to go about it;
● Select the File menu and go to Read Text Data.
● Select the file type (txt format) from the drop-down menu at the bottom of this window,
● After selecting the format, you need to navigate to the text file you want to import.
● Click Open.
● A new window (the Text Import Wizard) will be displayed asking whether your text file
matches a predefined format. You can answer either yes or no.
● Click Continue, and you will be taken to a new window where you will be asked about
the arrangement of your file. You will also be asked if variable names are included at the
top of your file.

● You will also be asked further questions, such as how the values are delimited and how the variables are formatted.


Click continue, and your text will be displayed.
About the Author

Bill Wesley is a researcher, writer, and teacher with over thirteen years of experience. He started
using the IBM SPSS Statistics software as far back as 2009. He has conducted several kinds of
research and built models using this great software. Bill graduated from the University of
California, where he studied Economics and Statistics. He furthered his studies with a Master's
degree in Applied Statistics.

Bill is married and the father of three children. His hobbies are research, singing, writing, and
meditating. He is a statistics teacher by profession, and he has spent over thirteen years teaching
high school students. He loves research and has spent half of his whole life studying different
concepts and how they affect human lives. Bill has written several books as well as research
publications and he has won diverse awards for his contributions to humanity.
