
SPSS LAB MANUAL

F:/Academic/27
Refer/WI/ACAD/18

SHRI RAMSWAROOP MEMORIAL COLLEGE OF ENGG. & MANAGEMENT

MBA [SEM III]

LAB ACTIVITY CHART

(SESSION: 2013-14)

SPSS LAB

(MBA-ALL GROUPS)

Room No.: D-102

Name of Faculty: Mr. Vijay Singh / Mr. Shujauddin Niyazi

Name of Lab Instructor: Mr. Nusrat Ali

S.No. Name of Experiment

1 Preparing the data file:

Creating a data file and entering data-Changing the SPSS 'Options'; Defining the variables;
Entering data; Modifying the data file; Data entry using Excel; Merging two data files.

2 Generating Descriptive Statistics in SPSS:

Mean, Median, Mode, Range, Mean Deviation, Standard Deviation, Variance, Skewness
and kurtosis.

3 Statistical techniques to explore relationships among variables:

Correlation: Obtaining correlation coefficients between groups of variables and presenting the
results from correlation. Interpretation of output from correlation.

4 Statistical techniques to explore relationships among variables:

Regression: Establishing average relationships between two variables through regression
analysis. Interpretation of output from regression.

5 Statistical Parametric tests:

T-tests-Independent-samples t-test; Paired-samples t-test; T-test for difference of mean.


6 Z-test- Z-test for difference between means. Test of significance for single mean

7 Statistical Non-parametric techniques:

- Chi-square; Kappa Measure of Agreement

8 Analysis of variance (ANOVA)

One-way analysis of variance- One-way between-groups ANOVA with post-hoc tests;
One-way between-groups ANOVA with planned comparisons.

9 Graphical presentation of data through SPSS:

How to Generate Scatter Plots/How to Generate A Histogram/How to Generate A Stem and Leaf
Plot/How to Generate A Box Plot.

10 Lab Mid Semester Examination:

11 Analysis of student’s summer training project work through SPSS

Note: The Lab Test would consist of all the lab assignments prior to that week.

[Mr. Vijay Singh/ Mr Shujauddin Niyazi] [Signature of HOD]

Introduction
SPSS (Statistical Package for the Social Sciences) is a statistical analysis and data
management software package. SPSS can take data from almost any type of file and use them
to generate tabulated reports, charts, plots of distributions and trends, and descriptive
statistics, and to conduct complex statistical analyses.


Reading data from an existing file


Data can be entered directly into SPSS, or they can be imported from a number of different
sources. This section describes how to read data stored in SPSS data files, in spreadsheet
applications like Microsoft Excel, and in text files.

Basic Structure of an SPSS Data File

SPSS data files are organized by cases (rows) and variables (columns). In this data file, cases
represent individual respondents to a survey. Variables represent each question asked in the
survey.

Reading an SPSS Data File

SPSS data files, which have a .sav file extension, contain saved data. To open the
demo.sav file:

1. From the menus choose: File->Open->Data

2. SPSS (*.sav) should be selected in the Files of Type drop-down list.


3. Navigate to the sample_files folder.

4. Select demo.sav and click Open.

The data are now displayed in the Data Editor.

Reading Data from Spreadsheets


Rather than typing all of your data directly into the Data Editor, you can read data from
applications like Microsoft Excel. We can also read column headings as variable names.

Steps:-
1. From the menus choose:-File->Open->Data

2. Select Excel (*.xls) from the Files of Type drop-down list.


Opened File dialog box

3. Select demo.xls and click Open to read this spreadsheet.

The Opening Excel Data Source dialog box is displayed, allowing you to specify whether
variable names are to be included in the spreadsheet, as well as the cells that you want to
import. In Excel 5 or later, you can also specify which worksheets you want to import.

Opening Excel Data Source dialog box

Make sure Read variable names from first row of data is selected. This option reads column
headings as variable names. If the column headings do not conform to the SPSS variable-
naming rules, they are converted into valid variable names and the original column headings
are saved as variable labels. If you want to import only a portion of the spreadsheet, specify
the range of cells to be imported in the Range field. Click OK to read the Excel file.

The data now appear in the Data Editor, with the column headings used as variable names. If
you're using a spreadsheet application other than Excel or Lotus, you should be able to export
your data to a supported format that can then be read into SPSS.


Imported Excel data

Reading Data from a Text File


Text files are another common source of data. Many spreadsheet programs and databases can
save their contents in one of many text file formats. In comma- or tab-delimited files, each
row of data uses commas or tabs to separate the values of the variables. In this example, the
data are tab delimited.

1. From the menus choose:-File->Read Text Data

2. Choose Text (*.txt) from the Files of Type list.

Opened File dialog box


3. Select demo.txt and click Open to read the selected file.


The Text Import Wizard guides you through the process of defining how the specified text
file should be interpreted.

Text Import Wizard - Step 1 of 6

4. In Step 1, you can choose a predefined format or create a new format in the wizard.


5. Select No to indicate that a new format should be created.

6. Click Next to continue.

As stated earlier, this file uses tab-delimited formatting. Also, the variable names are defined
on the top line of this file.

Text Import Wizard - Step 2 of 6

7. Select Delimited to indicate that the data uses a delimited formatting structure.

8. Select Yes to indicate that variable names should be read from the top of the file.

9. Click Next to continue. Type 2 in the top section of the next dialog box to indicate that the
first row of data starts on the second line of the text file.

Text Import Wizard - Step 3 of 6


10. Keep the default values for the remainder of this dialog box and click Next to continue.

The Data preview in Step 4 provides you with a quick way to ensure that your data are being
properly read by SPSS.

Text Import Wizard - Step 4 of 6


11. Select Tab and deselect the other options.

12. Click Next to continue. Because the variable names may have been truncated to fit SPSS
formatting requirements, this dialog box gives you the opportunity to edit any undesirable
names.


Text Import Wizard - Step 5 of 6

13. Data types can be defined here as well. For example, it's safe to assume that the income
variable is meant to contain a certain dollar amount. To change a data type, select the
variable you want to change under Data preview, which is Income in this case.

14. Select Dollar from the Data format drop-down list.

Change the data type

15. Click Next to continue.


Text Import Wizard - Step 6 of 6

16. Leave the default selections in this dialog box, and click Finish to import the data.
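
For readers who want to see what the wizard's choices amount to, here is a minimal sketch in Python (not SPSS) of reading a tab-delimited text file whose first line holds the variable names. The sample text, the variable names age and income, and the helper read_tab_delimited are assumptions made for illustration; they are not the contents of demo.txt.

```python
import csv
import io

# Hypothetical sample standing in for a tab-delimited file:
# variable names on line 1, data starting on line 2.
SAMPLE_TEXT = "age\tincome\n55\t72000\n56\t153000\n28\t28000\n"


def read_tab_delimited(text):
    """Return a list of cases (dicts), one per data row."""
    # DictReader takes the first line as the variable names.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    cases = []
    for row in reader:
        # Every column in this sample is numeric, so convert each field.
        cases.append({name: float(value) for name, value in row.items()})
    return cases


cases = read_tab_delimited(SAMPLE_TEXT)
print(len(cases), "cases read")
print(cases[0])
```

Each dictionary plays the role of one case (row) in the Data Editor, and each key plays the role of one variable (column).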

Saving Data:-
To save an SPSS data file, the Data Editor window must be the active window.
Steps:-
1. From the menus choose File->Save, then browse to the desired directory.

2. Type a name for the file in the File name text box. The Variables button can be used to
select which variables in the Data Editor are saved to the SPSS data file. By default, all
variables in the Data Editor are retained. Click Save.

3. The name in the title bar of the Data Editor will change to the filename you specified.
This confirms that the file has been successfully saved as an SPSS data file. The file
contains both variable information (names, type, and, if provided, labels and missing
value codes), and all data values.


WEEK-1 Assignment-1

Q1:- View the SPSS file (demo.sav), which is included as a sample file in SPSS, and
observe the Variable View and Data View of the file.

Q2:- Create an Excel file as described below and then import that Excel sheet into SPSS.

Q3:- Create a Microsoft Access database file and import the file into SPSS.

Q4:- Create a text file and import the file into SPSS.


Creating a data file and entering data


There are a number of stages in the process of setting up a data file and analyzing the data. I
will lead you through the process of creating a data file and entering the data using SPSS.
To prepare a data file, three key steps are covered:

1. The first step is to check and modify, where necessary, the options (or preferences, as they
were referred to in earlier versions of SPSS) that SPSS uses to display the data and the
output that is produced.

2. The next step is to set up the structure of the data file by 'defining' the variables.

3. The final step is to enter the data, that is, the values obtained from each participant or
respondent for each variable.

Data files can also be 'imported' from other spreadsheet-type programs (e.g. Excel). This can
make the data entry process much more convenient, particularly for students who don't have
SPSS on their home computers. You can set up a basic data file in Excel and enter the data at
home. When complete, you can then import the file into SPSS and proceed with the data
manipulation and data analysis stages.

Changing SPSS options:-


Before you set up your data file, it is a good idea to check the SPSS options that govern the
way your data and output are displayed. The options allow you to define how your variables
will be displayed, the size of your charts, the type of tables that will be displayed in the
output, and many other aspects of the program. Some of this will seem confusing at first, but
once you have used the program to enter data and run some analyses you may want to refer
back to this section. If you are sharing a computer with other people (e.g. in a computer lab),
it is worth being aware of these options. Sometimes other students will change these options,
which can dramatically influence how the program appears. It is useful to know how to change
things back to the way you want them when you come to use the machine. To open the Options
screen, click on Edit from the menu at the top of the screen and then choose Options. The
screen shown in the figure below should appear.


There are a lot of choices listed, many of which you won't need to change. I have described
the key ones below, organized by the tab they appear under. To move between the various
tabs, just click on the one you want. Don't click on OK until you have finished all the
changes you want to make, across all the tabs.

General tab:-
When you come to do your analyses, you can ask for your variables to be listed in
alphabetical order, or by the order in which they appear in the file. I always use the file
order, because this is consistent with the order of the questionnaire items and the codebook.
To keep the variables in file order, just click on the circle next to File in the Variable Lists
section. In the Output Notification section, make sure there is a tick next to Raise viewer
window and Scroll to new output. This means that when you conduct an analysis the Viewer
window will appear, and the new output will be displayed on the screen. In the Output
section on the right-hand side, place a tick in the box No scientific notation for small
numbers in tables. This will stop you getting some very strange numbers in your output for
the statistical analyses. In the Session Journal section, make sure the Append option is ticked.
This allows you to record all the SPSS procedures that you undertake in a journal file
(spss.jnl). Click on the Browse button and choose a folder for this to be stored in.

Data tab:-
Click on the Data tab to make changes to the way that your data file is displayed. Make sure
there is a tick in the Calculate values immediately option. This means that when you
calculate a total score the values will be displayed in your data file immediately. If your
variables do not involve values with decimal places, you may like to change the display
format for all your variables. In the section labeled Display format for new numeric
variables, change the decimal place value to 0. This means that all new variables will not
display any decimal places. This reduces the size of your data file and simplifies its
appearance.

Output Labels tab:-
The options in this section allow you to customize how you want the variable names and
value labels displayed in your output. In the very bottom section, under Variable values in
labels are shown as:, choose Values and Labels from the drop-down options. This will allow
you to see both the numerical values and the explanatory labels in the tables that are
generated in the SPSS Viewer window.

Charts tab:-
Click on the Charts tab if you wish to change the appearance of your charts. You can alter the
Chart Aspect Ratio if you wish. You can also make other changes to the way in which the
chart is displayed (e.g. font, color).

Pivot Tables tab:-


SPSS presents most of the results of the statistical analyses in tables called Pivot Tables.
Under the Pivot Tables tab you can choose the format of these tables from an extensive list. It
is a matter of experimenting to find a style that best suits your needs.

Defining the variables:-


Before you can enter your data, you need to tell SPSS about your variable names and coding
instructions. This is called 'defining the variables'. You will do this in the Data Editor window
(see Figure below). The Data Editor window consists of two different views: Data View and
Variable View. You can move between these two views using the little tabs at the bottom left-
hand side of the screen. You will notice that in the Data View window each of the columns is
labeled var.

Procedure for defining your variables:-


To define each of the variables that make up your data file, you first need to click on the
Variable View tab at the bottom of your screen. In this view the variables are listed down the
side, with their characteristics listed along the top (name, type, width, decimals, label etc.).


Your job now is to define each of your variables by specifying the required information for
each variable listed in your codebook. Some of the information you will need to provide
yourself (e.g. name); other bits are provided automatically by SPSS using default values.
These default values can be changed if necessary.

Name:-
In this column, type in the brief variable name that will be used to identify each of the
variables in the data file. Keep these variable names as short as possible and not exceeding
64 characters (SPSS Version 12 onwards) or eight characters (earlier versions of SPSS). They
must follow the naming conventions specified by SPSS. Each variable name must be unique,
must start with a letter, and cannot contain spaces or symbols.
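
The naming rules above can be turned into a quick check before the names are typed into Variable View. The following is a minimal sketch in Python (not SPSS) that encodes only the rules as stated in this manual. Allowing the underscore, comparing names without regard to case, and the function name find_invalid_names are assumptions of the sketch; SPSS's own rules may be more permissive.

```python
import re

MAX_NAME_LENGTH = 64  # limit quoted above for SPSS Version 12 onwards

# Simplified pattern: a letter first, then letters, digits or underscores.
# Allowing the underscore is an assumption of this sketch.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def find_invalid_names(names):
    """Return a dict mapping each rejected name to the reason it was rejected."""
    problems = {}
    seen = set()
    for name in names:
        key = name.lower()  # assumption: names differing only in case count as duplicates
        if len(name) > MAX_NAME_LENGTH:
            problems[name] = "too long"
        elif not NAME_PATTERN.fullmatch(name):
            problems[name] = "must start with a letter and contain no spaces or symbols"
        elif key in seen:
            problems[name] = "duplicate name"
        seen.add(key)
    return problems


print(find_invalid_names(["id", "sex", "tmast", "1st_score", "nick name", "SEX"]))
```

In this hypothetical list, the last three names are rejected: one starts with a digit, one contains a space, and one repeats an earlier name.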

Type:-
The default value for Type that will appear automatically as you enter your first variable
name is Numeric. For most purposes, this is all you will need to use. There are some
circumstances where other options may be appropriate. For example, if you need to enter text
information (e.g. a person's surname), you need to change the type to String. To change the
variable type, click on the right-hand side of the cell, and a box with three dots should appear
giving you the options available. You can also use this window to adjust the width of the
variable and the number of decimal places.

Width:-
The default value for Width is 8. This is usually sufficient for most data. If your variable has
very large values (or you have requested a string variable), you may need to change this
default value; otherwise, leave it as is.

Decimals:-
The default value for Decimals is 0 (if you set the display format for new numeric variables
to 0 on the Data tab, as described above). If your variable has decimal places, change this to
suit your needs.

Label:-
The Label column allows you to provide a longer description for your variable than the short
variable name allows. This will be used in the output generated from the analyses conducted
by SPSS. For example, you may wish to give the label Total Mastery to your variable TMAST.

Values:-
In the Values column you can define the meaning of the values you have used to code your
variables.

Missing:-
Sometimes researchers assign specific values to indicate missing values for their data. This is
not essential; SPSS will recognize any blank cell as missing data. So if you intend to leave a
blank when a piece of information is not available, it is not necessary to do anything with this
Variable View column. If you do intend to use specific missing value codes (e.g. 99=not
applicable), you must specify this value in the Missing Values section, otherwise SPSS will
use the value as a legitimate value in any statistical analyses. Choose the option Discrete
missing values and type the value (e.g. 99) in the space provided. Up to three values can be
specified. If you are using these special codes, it is also a good idea to label these values in
the Values column.
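
To see why an undeclared missing-value code matters, consider this minimal sketch in Python (not SPSS). The responses, the code 99 and the helper valid_values are hypothetical; the point is only that a code left undeclared is averaged in as if it were a real score.

```python
from statistics import mean

MISSING_CODES = {99}  # hypothetical code meaning "not applicable"

# Hypothetical answers on a 1-5 scale; None stands for a blank cell.
responses = [4, 5, 99, 3, None, 4]


def valid_values(values, missing_codes):
    """Drop blanks and declared missing-value codes before analysis."""
    return [v for v in values if v is not None and v not in missing_codes]


declared = valid_values(responses, MISSING_CODES)
undeclared = valid_values(responses, set())

print("mean with 99 declared missing:", mean(declared))
print("mean with 99 treated as a real score:", mean(undeclared))
```

With 99 declared as missing, the mean stays within the 1-5 scale; without the declaration, the single 99 drags the mean far outside it.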


Columns:-
The default column width is usually set at 8, which is sufficient for most purposes. Change it
only if necessary to accommodate your values or long variable names.

Align:-
The alignment of the columns is usually set at 'right' alignment. There is no need to change
this.

Measure:-
The column heading Measure refers to the level of measurement of each of your variables.
The default is Scale, which refers to continuous data measured at interval or ratio level of
measurement. If your variable consists of categories (e.g. sex), click in the cell, and then on
the arrow key that appears. Choose Nominal for categorical data and Ordinal if your data
involve rankings or ordered values.

Modifying Data File:-


After you have created a data file, you may need to make changes to it (e.g. to add, delete or
move variables; or to add or delete cases). Make sure you have the Data Editor Window open
on the screen, showing Data View.

Delete a case:-
Move down to the case (row) you wish to delete. Position your cursor in the shaded section
on the left-hand side that displays the case number. Click once to highlight the row. Press the
Delete button on your computer keyboard. You can also click on the Edit menu and click on
Clear.

Insert a case between existing cases:-


Move your cursor to a cell in the case (row) immediately below where you would like the
new case to appear. Click on the Data menu and choose Insert Case. An empty row will
appear in which you can enter the data of the new case.

Delete a variable:-
Position your cursor in the shaded section (which contains the variable name) above the
column you wish to delete. Click once to highlight the whole column. Press the Delete button
on your keyboard. You can also click on the Edit menu and click on Clear.

Insert a variable between existing variables:-


Position your cursor in a cell in the column (variable) to the right of where you would like
the new variable to appear. Click on the Data menu and choose Insert Variable. An empty
column will appear in which you can enter the data of the new variable.

Move an existing variable(s):-


In the Data Editor window, have the Variable View showing. Highlight the variable you wish
to move by clicking in the left-hand margin. Click and hold your left mouse button and then
drag the variable to the new position (a red line will appear as you drag). Release the left
mouse button when you get to the desired spot.


Merge Files:-
There are times when it is necessary to merge different SPSS data files. SPSS allows you to
merge files by adding additional cases at the end of your file, or to merge additional variables
for each of the cases in an existing data file (e.g. when Time 2 data becomes available). This
second option is particularly useful when you have Excel files with information spread across
different spreadsheets that need to be merged by ID.

Steps of merging files:

To merge files by adding cases:-


This procedure will allow you to merge files that have the same variables, but different cases;
for example, where the same information is recorded at two different sites (e.g. clinic
settings) or entered by two different people. The two files should have the same variable
names for the data you wish to merge (although other non-equivalent information can exist in
each file). If the ID numbers used in each file are the same (starting at ID=1, 2, 3), you will
need to change the ID numbers in one of the files before merging so that each case is still
uniquely identified. To do this, open one of the files, choose Transform from the menu, and
then Compute Variable. Type ID in the Target Variable box, and then ID + 1000 in the
Numeric Expression box (or some number that is bigger than the number of cases in the file).
Click on the OK button, and then on OK in the dialogue box that asks if you wish to change
the variable. This will create new ID numbers for this file starting at 1001, 1002 and so on.
Note this in your codebook for future reference. Then you are ready to merge the files.
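
The ID adjustment and the merge itself can be pictured with a minimal sketch in Python (not SPSS). The two small files, the variable names id and age, and the helper merge_by_adding_cases are hypothetical; the offset of 1000 mirrors the ID + 1000 expression described above.

```python
ID_OFFSET = 1000  # must be bigger than the number of cases in the file

# Hypothetical cases from two sites, both numbered from 1.
site_a = [{"id": 1, "age": 39}, {"id": 2, "age": 43}]
site_b = [{"id": 1, "age": 27}, {"id": 2, "age": 24}]


def merge_by_adding_cases(first, second, offset):
    """Append the cases of `second` to `first`, shifting the second file's IDs."""
    shifted = [{**case, "id": case["id"] + offset} for case in second]
    merged = first + shifted
    ids = [case["id"] for case in merged]
    if len(ids) != len(set(ids)):
        raise ValueError("IDs are still not unique; choose a larger offset")
    return merged


merged = merge_by_adding_cases(site_a, site_b, ID_OFFSET)
print([case["id"] for case in merged])
```

After the shift, every case in the merged file keeps a unique ID, which is the condition the paragraph above asks you to guarantee before merging.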

Screening and cleaning the data:-


Before you start to analyze your data, it is essential that you check your data set for errors. It
is very easy to make mistakes when entering data and unfortunately some errors can
completely mess up your analyses. For example, entering 35 when you mean to enter 3 can
distort the results of a correlation analysis. Some analyses are very sensitive to what are
known as 'outliers'; that is, values that are well below or well above the other scores. So it is
important to spend the time checking for mistakes initially, rather than trying to repair the
damage later. Although boring, and a threat to your eyesight if you have large data sets, this
process is essential.

Step 1: Checking for errors. First, you need to check each of your variables for scores that
are out of range (i.e. not within the range of possible scores).


Step 2: Finding and correcting the error in the data file. Second, you need to find where
in the data file this error occurred (i.e. which case is involved) and correct or delete the value.
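
The two steps can be sketched in Python (not SPSS) as a single pass over a variable that reports which cases hold impossible values. The scores, the 1-5 range and the helper out_of_range_cases are hypothetical.

```python
# Hypothetical data: an item scored on a 1-5 scale, keyed by case ID.
scores = {1: 3, 2: 5, 3: 35, 4: 1, 5: 0}

VALID_MIN, VALID_MAX = 1, 5


def out_of_range_cases(values_by_id, low, high):
    """Return {case ID: value} for every value outside low..high."""
    return {
        case_id: value
        for case_id, value in values_by_id.items()
        if not low <= value <= high
    }


errors = out_of_range_cases(scores, VALID_MIN, VALID_MAX)
print(errors)
```

The result names both the offending value (Step 1) and the case it belongs to (Step 2), so the entry can be corrected against the original questionnaire.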

WEEK-2 Assignment-2

1) The following data regarding a person’s name, age and weight must be entered into a
data set using SPSS.

Name      Age   Weight
Mark      39    250
Allison   43    125
Tom       27    180
Cindy     24    130

2) Using the DEFINE VARIABLE WINDOW, give each variable a VARIABLE NAME
(e.g., COLOR, SEX), and a VARIABLE LABEL.
>>> Give the VALUES of the variable SEX the LABELS "1 = male" and "2 = female".
>>> Give the VALUES of the variable COLOR the LABELS "1 = white" and "2 = blue".
>>> Be sure to SAVE the file onto a personal storage device (e.g., memory stick); you
will need it for future assignments.

3) Students are required to modify the file in question 1 by performing the following tasks:
a) Delete the case Tom.
b) Insert the case Shyam between the cases Mark and Allison.
c) Insert a new variable Nick Name between Name and Age.
d) Move the variable Nick Name after the variable Name.

4) Students are required to merge two files by adding cases.


Descriptive Statistics
Once you are sure there are no errors in the data file (or at least no out-of-range values on
any of the variables), you can begin the descriptive phase of your data analysis. Descriptive
statistics have a number of uses. These include:

• To describe the characteristics of your sample in the Method section of your report.

• To check your variables for any violation of the assumptions underlying the statistical
techniques that you will use to address your research questions.

• To address specific research questions.

Testing of assumptions usually involves obtaining descriptive statistics on
your variables. These descriptive statistics include the mean, standard deviation, range of
scores, skewness and kurtosis. Descriptive statistics can be obtained in a number of different
ways, using Frequencies, Descriptives or Explore. These are all procedures listed under the
Analyze, Descriptive Statistics drop-down menu. There are, however, different procedures
depending on whether you have a categorical or continuous variable. Some of the statistics
(e.g. mean, standard deviation) are not appropriate if you have a categorical variable.

Central Tendency:-
Statistics that describe the location of the distribution include the mean, median, mode, and
sum of all the values.

Mean:-
A measure of central tendency. The arithmetic average: the sum divided by the number of cases.

Median:-
The value above and below which half of the cases fall, the 50th percentile. If there is an
even number of cases, the median is the average of the two middle cases when they are
sorted in ascending or descending order. The median is a measure of central tendency not
sensitive to outlying values (unlike the mean, which can be affected by a few extremely
high or low values).

Mode:-
The most frequently occurring value. If several values share the greatest frequency of
occurrence, each of them is a mode. The Frequencies procedure reports only the smallest of
such multiple modes.
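
These definitions can be checked on a small example. The following is a minimal sketch in Python (not SPSS) using the standard statistics module; the scores are hypothetical and chosen so that two values tie for the highest frequency.

```python
from statistics import mean, median, multimode

# Hypothetical scores; 2 and 5 both occur three times.
scores = [1, 2, 2, 2, 5, 5, 5, 10]

modes = multimode(scores)  # every value tied for the highest frequency

print("sum   :", sum(scores))
print("mean  :", mean(scores))
print("median:", median(scores))  # average of the two middle values (even n)
print("modes :", modes)
print("smallest of the tied modes:", min(modes))
```

Because the number of cases is even, the median is the average of the two middle values, and taking the smallest of the tied modes mirrors what the Frequencies procedure is described as reporting.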

Sum:-
The sum or total of the values, across all cases with no missing values.

Dispersion:-
Statistics that measure the amount of variation or spread in the data include the standard
deviation, variance, range, minimum, maximum, and standard error of the mean.

Std. deviation:-
A measure of dispersion around the mean. In a normal distribution, 68% of cases fall within
one standard deviation of the mean and 95% of cases fall within two standard deviations.
For example, if the mean age is 45, with a standard deviation of 10, 95% of the cases would
be between 25 and 65 in a normal distribution.

Variance:-
A measure of dispersion around the mean, equal to the sum of squared deviations from the
mean divided by one less than the number of cases. The variance is measured in units that are
the square of those of the variable itself.

Range:-
The difference between the largest and smallest values of a numeric variable, the maximum
minus the minimum.

Minimum:-
The smallest value of a numeric variable.

Maximum:-
The largest value of a numeric variable.

S. E. mean:-
A measure of how much the value of the mean may vary from sample to sample taken from
the same distribution. It can be used to roughly compare the observed mean to a hypothesized
value (that is, you can conclude the two values are different if the ratio of the difference to
the standard error is less than -2 or greater than +2).
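
The dispersion measures defined above can be worked through on a small example. This is a minimal sketch in Python (not SPSS); the ages and the hypothesized value of 40 are hypothetical, and the standard error is computed with the usual formula, standard deviation divided by the square root of the number of cases.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical ages of nine respondents.
ages = [25, 35, 40, 45, 45, 45, 50, 55, 65]

n = len(ages)
m = mean(ages)
var = variance(ages)                  # sum of squared deviations / (n - 1)
sd = stdev(ages)                      # square root of the variance
value_range = max(ages) - min(ages)   # maximum minus minimum
se_mean = sd / sqrt(n)                # standard error of the mean

hypothesized = 40                     # hypothetical comparison value
ratio = (m - hypothesized) / se_mean  # difference measured in standard errors

print(f"mean={m}, variance={var}, sd={sd:.3f}, range={value_range}")
print(f"S.E. mean={se_mean:.3f}, ratio={ratio:.2f}")
print("differs from hypothesized value:", abs(ratio) > 2)
```

Here the ratio stays between -2 and +2, so by the rough rule above the observed mean would not be judged different from the hypothesized value.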

Skewness:-
A measure of the asymmetry of a distribution. The normal distribution is symmetric and has a
skewness value of 0. A distribution with a significant positive skewness has a long right tail.
A distribution with a significant negative skewness has a long left tail. As a guideline, a
skewness value more than twice its standard error is taken to indicate a departure from
symmetry.
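
The guideline can be tried out with a minimal sketch in Python (not SPSS). It uses the adjusted Fisher-Pearson sample skewness and the standard error formula for samples from a normal population; that these are the exact formulas SPSS applies is an assumption of the sketch, and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(z**3)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


def skewness_standard_error(n):
    """Standard error of skewness for a sample of size n from a normal population."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Hypothetical scores with one large value pulling the right tail out.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

skew = sample_skewness(scores)
se = skewness_standard_error(len(scores))
print(f"skewness={skew:.3f}, standard error={se:.3f}")
print("departs from symmetry:", abs(skew) > 2 * se)
```

The single large score gives a positive skewness well above twice its standard error, which matches the description of a long right tail.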

Kurtosis:-
A measure of the extent to which observations cluster around a central point. For a normal
distribution, the value of the kurtosis statistic is zero. Positive kurtosis indicates that the
observations cluster more and have longer tails than those in the normal distribution, and
negative kurtosis indicates that the observations cluster less and have shorter tails.

SPSS uses two basic methods for performing analyses and other tasks. The first is menu driven
while the second makes use of one of several programming languages. We will cover only
the use of menus for analyses in this manual. Before beginning the analysis, be sure your data
are displayed in the Data Editor. Your data will not be displayed if you’ve ended the SPSS
session or closed the worksheet. If they are not displayed, select File->Open->Data, navigate
to the directory in which you saved the file, and open it.

Let’s begin by calculating a few descriptive statistics on the variable we’ve called x. To do
this, select Analyze->Descriptive Statistics->Descriptives.


The Descriptives window opens. Notice that the variable x is highlighted in the left-hand
window. If it isn’t highlighted, select it by clicking on it. Now move it to the Variable(s):
window by clicking the arrow button. In general, all variables in the data set will be listed in
the left-hand window and you will move those on which you wish to perform analyses to the
right-hand window as you did with x.

You can determine which statistics will be displayed by clicking the Options... button. If you
use this option, a window opens listing all the statistics that may be displayed with this
command. Those that are checked will be displayed. Be sure that checks are placed by Mean,
Std. deviation, Minimum and Maximum.

As may be seen in the output, the Viewer window is divided into two sections, or as SPSS
refers to them, panes. The left-hand pane is called the Outline Pane while the right-hand pane
is termed the Display Pane. You can resize these panes by placing the pointer on the vertical
gray bar that separates the two. When the resize symbol appears, click and drag the bar to the
left or right. We will make limited use of the Outline Pane later but for the moment will
concentrate on the Display Pane.

Earlier we indicated that commands can be issued to SPSS by clicking various menu items or by
use of one of several scripting languages. We are using the menu approach in this manual, but
SPSS implements our menu-driven commands by writing the appropriate scripting language
commands. This is what you see as the first three lines at the top of the output page.


Screen appearance for display of some basic statistics

The results of our analysis appear in the table under the heading Descriptive Statistics. As
may be seen in this table, the variable analyzed is named x and there are 11 valid observations
in the data set, the minimum and maximum values of which are 3 and 5 respectively. The
mean and standard deviation of the 11 observations are 4.0 and .775 respectively.
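
The figures in this table can be checked by hand. The minimal sketch below, in Python (not SPSS), uses a hypothetical set of 11 values (three 3s, five 4s and three 5s) chosen only because it is consistent with the summary above; the manual does not list the actual values of x.

```python
from statistics import mean, stdev

# Hypothetical values consistent with the reported summary; not the original data.
x = [3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5]

print("N       :", len(x))
print("Minimum :", min(x))
print("Maximum :", max(x))
print("Mean    :", mean(x))
print("Std. dev:", round(stdev(x), 3))
```

The standard deviation here divides the sum of squared deviations by one less than the number of cases, in line with the definition of variance given earlier.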


WEEK-3 Assignment-3

Q1) Students are required to prepare a file in SPSS named Part1.sav on the basis of the
survey questions given below. Be sure to SAVE the file onto a personal storage
device (e.g., memory stick); you will need it for future assignments.

Using the file Part1.sav, answer the following questions.

Q2) What kind of computer do people prefer to own?

Q3) What color do people prefer for their computer?

Q4) Find the mean, standard deviation, variance, minimum and maximum of employees’
salary.

Q5) Find the mean, standard deviation, variance, minimum and maximum of employees’
age.


Crosstabulation
Crosstabulation tables (contingency tables) display the relationship between two or more
categorical (nominal or ordinal) variables. The size of the table is determined by the number
of distinct values for each variable, with each cell in the table representing a unique
combination of values. Numerous statistical tests are available to determine whether there is
a relationship between the variables in a table.

This chapter uses the file demo.sav.

In this example, we'll examine the relationship between income level and PDA (personal
digital assistant) ownership.

1. From the menus choose Analyze->Descriptive Statistics->Crosstabs.


2. Select Income category in thousands (inccat) as the row variable.

3. Select Owns PDA (ownpda) as the column variable. Click OK to run the procedure.

The cells of the table show the count or number of cases for each joint combination of values.
For example, 455 people in the income range $25,000–$49,000 own PDAs.


None of the numbers in this table, however, stands out in a way that indicates an obvious
relationship between the variables.

Counts vs Percentage:-

It is often difficult to analyze a crosstabulation simply by looking at the simple counts in each cell.

The fact that there are more than twice as many PDA owners in the $25,000–$49,000 income
category as in the under $25,000 category may not mean much (or anything), since there
are also more than twice as many people in that income category.

1. Open the Crosstabs dialog box again. Click Cells.


2. Click (check) Row in the Percentages group.

3. Click Continue and then click OK in the main dialog box to run the procedure.

A clearer picture now starts to emerge. The percentage of people who own PDAs rises as the
income category rises.
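
The effect of switching from counts to row percentages can also be sketched outside SPSS. The Python fragment below is only an illustration: the income labels imitate demo.sav, but the counts are invented for the example and are not the values in the file, and the function name row_percentages is hypothetical.

```python
from collections import Counter

# Hypothetical (income category, owns PDA) pairs; NOT the demo.sav data.
cases = (
    [("Under $25", "No")] * 90 + [("Under $25", "Yes")] * 10
    + [("$25 - $49", "No")] * 160 + [("$25 - $49", "Yes")] * 40
)

counts = Counter(cases)  # cell counts, keyed by (row value, column value)


def row_percentages(cell_counts):
    """Convert cell counts to percentages of each row total."""
    row_totals = Counter()
    for (row, _col), n in cell_counts.items():
        row_totals[row] += n
    return {
        (row, col): 100.0 * n / row_totals[row]
        for (row, col), n in cell_counts.items()
    }


pct = row_percentages(counts)
for (row, col), value in sorted(pct.items()):
    print(f"{row:10s} {col:3s} {counts[(row, col)]:4d} {value:5.1f}%")
```

In this invented table the second income group has four times as many owners as the first, but because it also contains twice as many people, its ownership rate is only twice as high. That is the kind of distinction the Row percentages option makes visible.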


Frequencies:-

The Frequencies procedure provides statistics and graphical displays that are useful for
describing many types of variables. The Frequencies procedure is a good place to start
looking at your data.

For a frequency report and bar chart, you can arrange the distinct values in ascending or
descending order, or you can order the categories by their frequencies. The frequencies report
can be suppressed when a variable has many distinct values. You can label charts with
frequencies (the default) or percentages.

Example.

What is the distribution of a company's customers by industry type? From the output, you
might learn that 37.5% of your customers are in government agencies, 24.9% are in
corporations, 28.1% are in academic institutions, and 9.4% are in the healthcare industry. For
continuous, quantitative data, such as sales revenue, you might learn that the average product
sale is $3,576, with a standard deviation of $1,078.

From the menus choose Analyze->Descriptive Statistics->Frequencies.
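
As a rough picture of what a frequency report contains, the following Python sketch counts each distinct value and converts the counts to percentages, ordering the categories by descending frequency. The eight customer records and the function name frequency_table are invented for illustration; this is not SPSS output.

```python
from collections import Counter

# Hypothetical industry type for 8 customers (illustrative only).
industry = ["Government", "Corporate", "Academic", "Government",
            "Healthcare", "Government", "Corporate", "Academic"]


def frequency_table(values):
    """Return (value, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(value, n, 100.0 * n / total)
            for value, n in counts.most_common()]


table = frequency_table(industry)
for value, n, percent in table:
    print(f"{value:12s} {n:3d} {percent:5.1f}%")
```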


WEEK-4 Assignment-4

Q1:- Apply crosstabulation to the file demo.sav. Choose the two fields ‘income category’ and ‘owns
pda’ and analyze the relationship between them.

Q2:- Apply the same analysis using percentage crosstabulation and assess the advantage of
simple crosstabulation over percentage crosstabulation.

Q3:- Using the file demo.sav, apply the Frequencies tool to the fields ‘owns pda’ and ‘owns TV’.
Interpret the result and draw a conclusion.

Q4:- Apply the same tool and interpret the results graphically using a bar chart, pie chart and
histogram.


Charts
This chapter provides examples of various graphical data displays and shows you how to
build the graphs you’re probably most familiar with. You can use the procedures
presented here to produce some nifty-looking graphs. And once you get the basic idea of
producing graphs, you should have no problem branching out and making fancy graphs of
your own.

Line Chart:-
A line chart works well as a visual summary of categorical values. Line charts are also useful
for displaying timelines because they demonstrate up and down trends so well. Line graphs
are popular because they’re easy to read. If they’re not the most common type of statistical
chart, they’re a contender for the title.

Simple line charts:-


The following steps generate a simple line chart displaying a single timeline:
1. Choose File➪Open➪ Data and open the Employee data.sav file, which is in the SPSS
installation directory.

2. Choose Graphs➪ Chart Builder. The Chart Builder dialog box appears.

3. In the Choose From list, select Line.


4. Drag the first diagram (the one with the Simple Line tooltip) to the panel at the top. An
Element Properties dialog box appears. You can simply close it because this example uses
the default settings.

5. In the Variables list, drag Current Salary to the Y-Axis rectangle in the panel at the top.

6. Again in the Variables list, drag Date of Birth to the X-Axis rectangle in the panel.

7. Click the OK button.

Charts with multiple lines:-


You can have more than one line appear on a chart by adding more than one variable name to
an axis. But the variables must contain a similar range of values before they can be
represented by the same axis. For example, if one variable ranges from 0 to 1,000 pounds and
another variable ranges from 1 to 2 pounds, the values of the second variable will show up as
a straight line, regardless of how much it actually fluctuates.
The following steps generate a multiline graph:-

1. Choose File➪ Open➪ Data and open the Cars.sav file. The file is in the SPSS installation
directory.
2. Choose Graphs➪ Chart Builder.

3. In the Choose From list, select Line to specify the general type of graph to be constructed.

4. To specify that this graph should contain multiple lines, select the second diagram (the one
with the Multiple Line tooltip) and drag it to the panel at the top. The Element Properties
dialog box pops up, but you can close it because the default values work fine.

5. In the Variables list, select Number of Cylinders and drag it to the rectangle named X-Axis
in the diagram.

6. In the Variables list, select Engine Displacement and drag it to the Y-Axis rectangle in the
panel at the top. The word Mean is added to the annotation because the values displayed
on this axis will be the mean values of the engine displacement.

7. In the Variables list, select Horsepower and drag it to the Y-axis also.
Be careful how you drop Horsepower: it must be added to the axis as a new variable.

Scatterplots:-
A scatterplot is simply an X-Y plot where you don’t care about interpolating
the values; that is, the points are not joined with lines. Instead, a disconnected
dot appears for each data point. The overall arrangement of these scattered dots often exposes a
pattern or trend.

Simple scatterplots:-
The following steps show you how to construct a simple scatterplot:-

1. Choose File➪Open➪Data and open the Employee data.sav file. The file is in the SPSS
installation directory.

2. Choose Graphs➪Chart Builder.

3. In the Choose From list, select Scatter/Dot.

4. Select the simplest scatterplot diagram (the one with the Simple Scatter tooltip), and drag
it to the panel at the top.

5. In the Variables list, select Beginning Salary and drag it to the rectangle labeled X-Axis in
the diagram.

6. In the Variables list, select Current Salary and drag it to the rectangle labeled Y-Axis in the
diagram. Click the OK button.


Each dot on the scatterplot in the figure above represents both the starting salary and the current
salary of one employee. The most obvious fact you can derive from this is that the current
salary depends largely on the starting salary. In the pattern of the dots, it’s easy to see a
normal line from the lower left to the upper right. Any dot on that imaginary line represents
the salary of an employee who received a normal raise. The dots above the line are the
employees who got above-average raises, and those below the line are those with below-
average raises. This plot has the shortcoming that the length of service is not considered.

Histograms:-
A histogram represents the number of items that appear within a range of values (or within a
bin, statistically speaking — see Chapter 7). You can use a histogram to look at a graphic
representation of the frequency distribution of the values of a variable. Histograms are useful
for demonstrating the patterns in your data when you want to display information to others
rather than discover data patterns for yourself.

Simple histograms:-
You can use the following steps to create a simple histogram that displays the number of
automobiles (in the survey used in the example) that have particular gas mileage capabilities
for each of several years:-

1. Choose File➪Open➪Data and open the Cars.sav file, which is in the SPSS installation
directory.

2. Choose Graphs➪Chart Builder. The Chart Builder dialog box appears.


3. In the Choose From list, select Histogram.

4. Drag the first graph diagram (the one with Simple Histogram tooltip) to the panel at the
top of the window.

5. In the Variables list, select the Model Year variable and drag it to the X-Axis rectangle in
the panel. Then select Miles Per Gallon and drag it to the Count rectangle on the left side of
the panel, and click the OK button.

The histogram shown in the figure below appears.

Stacked histograms:-
You can create a histogram that is more like a bar chart. In a stacked histogram, the overall
extent of the bars represents the sum of the values in each category, and different categories
of a third variable are indicated by displaying portions of the bars in different colors. In this
type of histogram, the scale on the left can be used to gauge the relative sizes of each of the
colored segments of a bar. The overall height of each bar is the sum of the miles per gallon in
each model year (as it would be in a bar chart). Here each bar is a stack of rectangles, each
one representing a portion of the
total number of cars — in this case, cars with a certain number of cylinders. The following
steps produce a stacked histogram displaying the same information as shown in the preceding
simple histogram, but this one displays sums instead of means:

1. Choose File➪Open➪Data and open the Cars.sav file.

2. Choose Graphs➪Chart Builder.

3. In the Choose From list, select Histogram.

4. Drag the second graph diagram to the panel at the top of the window.

5. In the Variables list, select the Model Year variable and drag it to the X-Axis rectangle.
Select Miles Per Gallon and drag it to the Count rectangle on the left side of the panel.
Select Number of Cylinders and drag it to the Stack Set Color rectangle at the upper right,
then click the OK button.

The histogram shown in the figure below appears.

Boxplots:-
A boxplot uses graphic elements to display five statistics at one time within each categorical
value. The statistics are the minimum value, first quartile, median value, third quartile, and
maximum value. A boxplot is particularly good for helping you spot values lying well outside
the range of normal values.
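
The five statistics a boxplot draws can be computed by hand to see what each element stands for. The sketch below uses Python's statistics.quantiles with the "inclusive" method; quartile conventions differ between programs, so SPSS may report slightly different quartile values for the same data. The salary figures and the function name five_number_summary are invented for illustration.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical salaries in thousands (illustrative only).
salaries = [21, 24, 25, 27, 30, 31, 35, 40, 90]
summary = five_number_summary(salaries)
print(summary)
```

The single value of 90 lies far above the third quartile, which is the kind of case a boxplot flags as lying well outside the normal range.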

Simple boxplots:-
A simple boxplot displays the range of values of a single scale variable for all values of a
categorical variable. The following steps guide you through the creation of a simple box plot:

1. Choose File➪Open➪Data and open the Employee data.sav file, which is in the SPSS
installation directory.

2. Choose Graphs➪Chart Builder.

3. In the Choose From list, select Boxplot.

4. Drag the first graph diagram (the one with the Simple Boxplot tooltip) to the panel at the
top of the window.

5. In the Variables list, do the following:


a. Select the Educational Level variable and drag it to the X-Axis rectangle.
b. Select the Current Salary variable and drag it to the Y-Axis rectangle.

6. Click the OK button.


The box plot shown in the figure below appears.

Each vertical column of graphics represents all the values for a category. Values lying well beyond
the first and third quartiles are marked with circles or stars; those marked with
stars represent extremes. You can look at a box plot of this type to find where your data may
be out of whack.

Clustered box plots:-


A clustered box plot displays the values of three variables in one graph. A box plot displays a
lot of information — and if it’s displaying three variables, it can get very busy visually.
Fortunately, it’s easier to read on-screen in color than it is here on this page in shades of gray.
The legend in the upper-right corner assigns colors to the categorical values; those colors
appear in the boxes to show you which is which. You’re also shown the ID numbers of cases
with extreme values. Use the following steps to construct a clustered box plot.

1. Choose File➪Open➪Data and open the Employee data.sav file.

2. Choose Graphs➪Chart Builder.

3. In the Choose From list, select Boxplot.

4. Drag the second graph diagram (the one with the Clustered Boxplot tooltip) to the panel at
the top of the window.

5. In the Variables list, do the following:


a. Drag the Minority Classification variable to the X-Axis rectangle.
b. Drag the Current Salary variable to the Y-Axis rectangle.
c. Drag the Educational Level variable to the Cluster on X rectangle. Click the OK button.


The boxplot shown in Figure below appears.

WEEK-5 Assignment-5


1) Using file staffsurvey3ED.sav, generate a histogram to explore the distribution of
scores on the Staff Satisfaction Scale (totsatis).

2) Generate a bar graph to assess the staff satisfaction levels for permanent versus casual
staff employed for less than or equal to 2 years, 3 to 5 years and 6 or more years. The
variables you will need are totsatis, employstatus and servicegp3.

3) Generate a boxplot to explore the distribution of scores on the staff satisfaction scale
(totsatis) for the different age groups (age).

4) Using data file staffsurvey3ED.sav, generate a scatterplot to explore the relationship
between years of service and staff satisfaction. Try first using the service variable
(which is very skewed) and then try again with the variable towards the bottom of the
list of variables (logservice).

Sorting and Selecting Data


Data files are not always organized in the ideal form for your specific needs. To prepare data
for analysis, you can select from a wide range of file transformations, including the ability to:

1. Sort data. You can sort cases based on the value of one or more variables.
2. Select subsets of cases. You can restrict your analysis to a subset of cases or perform
simultaneous analyses on different subsets.

The examples in this chapter use the data file demo.sav.

Sorting Data:-
Sorting cases (sorting rows of the data file) is often useful and sometimes necessary for
certain types of analysis. To reorder the sequence of cases in the data file based on the value
of one or more sorting variables:

1. From the menus choose Data->Sort Cases. This opens the Sort Cases dialog box.

2. Add the Age in years (age) and Household income in thousands (income) variables to the
Sort By list.

If you select multiple sort variables, the order in which they appear on the Sort By list
determines the order in which cases are sorted. In this example, based on the entries in the
Sort By list, cases will be sorted by the value of Household income in thousands (income)
within categories of Age in years (age). For string variables, uppercase letters precede their
lowercase counterparts in sort order (for example, the string value Yes comes before yes in
the sort order).
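
The way the order of the Sort By list controls the result can be illustrated with a two-key sort. In this Python sketch the field names echo demo.sav, but the four cases are invented for illustration: age is the first key, so income is only compared between cases that have the same age.

```python
# Hypothetical cases; field names echo demo.sav but the values are invented.
cases = [
    {"age": 40, "income": 72.0},
    {"age": 25, "income": 31.0},
    {"age": 40, "income": 19.0},
    {"age": 25, "income": 55.0},
]

# Sort by age first, then by income within each age.
sorted_cases = sorted(cases, key=lambda c: (c["age"], c["income"]))

for case in sorted_cases:
    print(case["age"], case["income"])
```

Swapping the two entries in the key tuple would sort by income first and use age only to break ties, just as reordering the Sort By list would.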

Split-File Processing:-


To split your data file into separate groups for analysis:

1. From the menus choose Data->Split File. This opens the Split File dialog box.

Split File dialog box

2. Select Compare groups or Organize output by groups. The examples following these steps
show the differences between these two options.

3. Select Gender (gender) to split the file into separate groups for this variable. You can
use numeric, short string, and long string variables as grouping variables. A separate analysis
is performed for each subgroup defined by the grouping variables. If you select multiple
grouping variables, the order in which they appear on the Groups Based On list determines
the manner in which cases are grouped. If you select Compare groups and run the
Frequencies procedure, a single pivot table is created.

Split File output with single pivot table


4. If you select Organize output by groups and run the Frequencies procedure, two pivot
tables are created: one for females and one for males.

Split File output with pivot table for females

Split File output with pivot table for males

Sorting Cases for Split-File Processing:-


The Split File procedure creates a new subgroup each time it encounters a different value for
one of the grouping variables. Therefore, it is important to sort cases based on the values of
the grouping variables before invoking split-file processing. By default, Split File
automatically sorts the data file based on the values of the grouping variables. If the file is
already sorted in the proper order, you can save processing time if you select File is already
sorted.
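
Why sorting matters for this kind of grouping can be seen with a small analogy. Python's itertools.groupby also starts a new group every time the key value changes, so it fragments unsorted data in the same way. This is an analogy only, not a description of how SPSS is implemented, and the gender codes and the function name count_runs are invented for illustration.

```python
from itertools import groupby

# Hypothetical gender codes in file order (illustrative only).
unsorted_gender = ["f", "m", "f", "m", "m"]


def count_runs(values):
    """Start a new group each time the value changes and count its cases."""
    return [(value, len(list(run))) for value, run in groupby(values)]


runs_unsorted = count_runs(unsorted_gender)
runs_sorted = count_runs(sorted(unsorted_gender))
print(runs_unsorted)  # four fragments, because the values alternate
print(runs_sorted)    # one group per gender once the data are sorted
```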

Turning Split-File Processing On and Off:-


Once you invoke split-file processing, it remains in effect for the rest of the session unless
you turn it off. Selecting Analyze all cases turns split-file processing off. Selecting Compare groups or Organize
output by groups turns split-file processing on. If split-file processing is in effect, the
message Split File on appears on the status bar at the bottom of the application window.

Selecting Subsets of Cases:-


You can restrict your analysis to a specific subgroup based on criteria that include variables
and complex expressions. You can also select a random sample of cases. The criteria used to
define a subgroup can include:

1. Variable values and ranges


2. Date and time ranges
3. Case (row) numbers
4. Arithmetic expressions
5. Logical expressions
6. Functions

To select a subset of cases for analysis, from the menus choose Data->Select Cases.

Select Cases dialog box


Selecting Cases Based on Conditional Expressions:-


To select cases based on a conditional expression: Select If condition is satisfied and click If
in the Select Cases dialog box. This opens the Select Cases If dialog box.

Select Cases If dialog box

The conditional expression can use existing variable names, constants, arithmetic operators,
logical operators, relational operators, and functions. You can type and edit the expression in
the text box just like text in an output window. You can also use the calculator pad, variable
list, and function list to paste elements into the expression.
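
A conditional expression of this kind combines relational operators with logical operators. The Python sketch below shows the same idea on four invented cases; the field names, the coding of marital status and the function name condition are assumptions made for the example, not taken from demo.sav.

```python
# Hypothetical cases; field names and codes are assumptions for illustration.
cases = [
    {"id": 1, "age": 28, "marital": 1, "income": 45.0},
    {"id": 2, "age": 41, "marital": 1, "income": 80.0},
    {"id": 3, "age": 36, "marital": 0, "income": 52.0},
    {"id": 4, "age": 52, "marital": 1, "income": 38.0},
]


def condition(case):
    """Relational operators (==, >=) joined by a logical operator (and)."""
    return case["marital"] == 1 and case["age"] >= 35


selected = [case for case in cases if condition(case)]
print([case["id"] for case in selected])
```

Only cases for which the whole expression is true are selected; a case that satisfies one part of the condition but not the other is left out.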

Selecting a Random Sample:-

To obtain a random sample:-


1. Select Random sample of cases in the Select Cases dialog box.
2. Click Sample. This opens the Select Cases Random Sample dialog box.

Select Cases Random Sample dialog box


You can select one of the following alternatives for sample size.

Approximately:-
A user-specified percentage. This option generates a random sample of approximately the
specified percentage of cases.

Exactly:-
A user-specified number of cases. You must also specify the number of cases from which to
generate the sample. This second number should be less than or equal to the total number of
cases in the data file. If the number exceeds the total number of cases in the data file, the
sample will contain proportionally fewer cases than the requested number.
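
The difference between the two alternatives can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the SPSS algorithm: "approximately" is modelled as an independent random decision for each case, and "exactly" as a draw without replacement from the first cases of the file. The function names, the seed and the case numbers are all invented for the example.

```python
import random


def sample_approximately(cases, percent, rng):
    """Keep each case independently with probability percent/100."""
    return [case for case in cases if rng.random() < percent / 100.0]


def sample_exactly(cases, n, first_m, rng):
    """Draw n cases without replacement from the first first_m cases."""
    pool = cases[:first_m]
    # Guard: a sample cannot contain more cases than the pool holds.
    return rng.sample(pool, min(n, len(pool)))


rng = random.Random(2013)       # fixed seed so the run is repeatable
cases = list(range(1, 1001))    # hypothetical case numbers 1..1000

approx = sample_approximately(cases, 10, rng)
exact = sample_exactly(cases, 25, 100, rng)
print(len(approx), len(exact))
```

The "approximately" sample will usually contain close to, but not exactly, 10% of the cases, whereas the "exactly" sample always contains 25 cases, all taken from the first hundred.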

Selecting a Time Range or Case Range:-


To select a range of cases based on dates, times, or observation (row) numbers:
Select Based on time or case range and click Range in the Select Cases dialog box. This
opens the Select Cases Range dialog box, in which you can select a range of observation
(row) numbers.

Select Cases Range dialog box

First Case:-
Enter the starting date and/or time values for the range. If no date variables are defined, enter
the starting observation number (row number in the Data Editor, unless Split File is on). If
you do not specify a Last Case value, all cases from the starting date/time to the end of the
time series are selected.

Last Case:-
Enter the ending date and/or time values for the range. If no date variables are defined, enter
the ending observation number (row number in the Data Editor, unless Split File is on). If
you do not specify a First Case value, all cases from the beginning of the time series up to the
ending date/time are selected. For time series data with defined date variables, you can select
a range of dates and/or times based on the defined date variables. Each case represents
observations at a different time, and the file is sorted in chronological order.


Select Cases Range dialog box (time series)

To generate date variables for time series data, from the menus choose Data->Define Dates.

Unselected Cases:-
You can choose one of the following alternatives for the treatment of unselected cases.

Filtered:-
Unselected cases are not included in the analysis but remain in the data file. You can
use the unselected cases later in the session if you turn filtering off. If you select a random
sample or if you select cases based on a conditional expression, this generates a variable
named filter_$ with a value of 1 for selected cases and a value of 0 for unselected cases.

Deleted:-
Unselected cases are deleted from the data file. By reducing the number of cases in the open
data file, you can save processing time. Deleted cases can be recovered only by exiting from
the file without saving any changes and then reopening the file. The deletion of cases is
permanent if you save the changes to the data file.
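
The contrast between filtering and deleting can be sketched in a few lines of Python. The three cases are invented, the key name selected merely stands in for the filter_$ variable, and the function names are hypothetical.

```python
# Hypothetical cases (illustrative only).
cases = [{"age": 28}, {"age": 41}, {"age": 36}]


def is_35_or_older(case):
    """Selection condition used for both alternatives."""
    return case["age"] >= 35


def add_filter_flag(all_cases, condition):
    """Filtered: keep every case and record 1 (selected) or 0 (unselected)."""
    return [dict(case, selected=1 if condition(case) else 0)
            for case in all_cases]


def delete_unselected(all_cases, condition):
    """Deleted: keep only the selected cases; the others are dropped."""
    return [case for case in all_cases if condition(case)]


filtered = add_filter_flag(cases, is_35_or_older)
deleted = delete_unselected(cases, is_35_or_older)
print(filtered)
print(deleted)
```

After filtering, all three cases are still present and the flag can simply be ignored later; after deleting, the unselected case cannot be recovered from the result.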

Case Selection Status:-


If you have selected a subset of cases but have not discarded unselected cases, unselected
cases are marked in the Data Editor with a diagonal line through the row number.

Case selection status


WEEK-6 Assignment-6

Choose demo.sav as data file and solve these problems:-

Q1:- Sort the data file according to age of people.

Q2:- Sort the data file on the basis of two or more fields using split file tool. For example you
can choose ‘income category’ and ‘gender’ fields and describe the advantage of choosing
two or more fields for split file tool.

Q3:- Filter the data file according to the following conditions using conditional filtering.

(i) Find the records of people who have marital status=1 and age>=35.

(ii) Find the records of people who are more than 30 years old and unmarried.

(iii) Find the records of people who are male and have a household income between 40 (thousand) and
60 (thousand).

Q4:- Filter the data file using a random sample of cases.

(i) Select at random 10% of all the records available.

(ii) Select at random 25 records from the first hundred records.

(iii) Filter the data file according to the age of people using the Based on time or case range option.

(iv) Reset the whole data file to its initial format.


Correlation
Correlation analysis is used to describe the strength and direction of the linear relationship
between two variables. There are a number of different statistics available from SPSS,
depending on the level of measurement and the nature of your data. In this chapter, the
procedure for obtaining and interpreting a Pearson product-moment correlation coefficient is
presented along with Spearman rho. The Pearson product-moment coefficient is designed for
interval level (continuous) variables. It can also be used if you have one continuous variable
(e.g. scores on a measure of self-esteem) and one dichotomous variable (e.g. sex:
M/F). Spearman rank order correlation is designed for use with ordinal level or ranked data
and is particularly useful when your data do not meet the criteria for Pearson's correlation.

SPSS will calculate two types of correlation for you. First, it will give you a simple bivariate
correlation (which just means between two variables), also known as zero-order correlation.
SPSS will also allow you to explore the relationship between two variables while controlling
for another variable. This is known as partial correlation.

Pearson correlation coefficients (r)
can only take on values from -1 to +1. The sign in front indicates whether there is a
positive correlation (as one variable increases, so too does the other) or a negative correlation
(as one variable increases, the other decreases). The size of the absolute value (ignoring the
sign) provides an indication of the strength of the relationship. A perfect correlation of 1 or -1
indicates that the value of one variable can be determined exactly by knowing the value on
the other variable. A scatter plot of this relationship would show a straight line. On the other
hand, a correlation of 0 indicates no linear relationship between the two variables. Knowing the
value on one of the variables provides no assistance in predicting the value on the second
variable.
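
To make the meaning of r concrete, the following Python sketch computes the Pearson coefficient directly from its definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The scores and the function name pearson_r are invented for illustration; SPSS performs the same calculation for you.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores (illustrative only).
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases exactly in step with x
falling = [10, 8, 6, 4, 2]   # decreases exactly as x increases

r_up = pearson_r(x, rising)
r_down = pearson_r(x, falling)
print(r_up, r_down)
```

Both sets of points fall on a straight line, so the two coefficients have the same size and differ only in sign.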

Example:-
To demonstrate the use of correlation, I will explore the interrelationships among some of the
variables included in the survey3ED.sav data file provided on the website accompanying this
book. The survey was designed to explore the factors that affect respondents' psychological
adjustment and wellbeing (see the Appendix for a full description of the study). In this
example, I am interested in assessing the correlation between respondents' feelings of control
and their level of perceived stress. Details of the two variables I will be using are provided in
the following table. If you wish to follow along with this example, you should start SPSS and
open the survey3ED.sav file.

Interpretation of output from correlation:-


For both Pearson and Spearman results, SPSS provides you with a table giving the
correlation coefficients between each pair of variables listed, the significance level and the
number of cases. The results for Pearson correlation are shown in the section headed
Correlation. If you requested Spearman rho, these results are shown in the section labelled
Nonparametric Correlations. The way in which you interpret the output from the parametric
and non-parametric approaches is the same.

Step 1: Checking the information about the sample.


The first thing to look at in the table labelled Correlations is the N (number of cases). Is this
correct? If there are a lot of missing data, you need to find out why. Did you forget to tick the
Exclude cases pairwise in the missing data option? Using listwise deletion (the other option),
any case with missing data on any of the variables will be removed from the analysis. This
can sometimes severely restrict your N. In the above example we have 426 cases that had
scores on both of the scales used in this analysis. If a case was missing information on either
of these variables, it would have been excluded from the analysis.

Step 2: Determining the direction of the relationship.


The second thing to consider is the direction of the relationship between the variables. Is
there a negative sign in front of the correlation coefficient value? If there is, this means there
is a negative correlation between the two variables (i.e. high scores on one are associated
with low scores on the other).
The interpretation of this depends on the way the variables are scored. Always check with your
questionnaire, and remember to take into account that for many scales some items are
negatively worded and therefore are reversed before scoring. What do high values really
mean? This is one of the major areas of confusion for students, so make sure you get this
clear in your mind before you interpret the correlation output.
In the example given here, the Pearson correlation coefficient (-.58) and Spearman rho value
(-.56) are negative, indicating a negative correlation between perceived control and stress.
The more control people feel they have, the less stress they experience.

Step 3: Determining the strength of the relationship.


The third thing to consider in the output is the size of the value of the correlation coefficient.
This can range from -1.00 to 1.00. This value will indicate the strength of the relationship
between your two variables. A correlation of 0 indicates no relationship at all, a correlation of
1.0 indicates a perfect positive correlation, and a value of -1.0 indicates a perfect negative
correlation. How do you interpret values between 0 and 1? Different authors suggest different
interpretations; however, Cohen (1988, pp. 79-81) suggests the following guidelines:

small: r = .10 to .29
medium: r = .30 to .49
large: r = .50 to 1.0

These guidelines apply whether or not there is a negative sign in front of your r value.
Remember, the negative sign refers only to the direction of the relationship, not the strength.
The strength of correlation of r=.5 and r= -.5 is the same. It is only in a different direction.
In the example presented above, there is a large correlation between the two variables
(above .5), suggesting quite a strong relationship between perceived control and stress.

Step 4: Calculating the coefficient of determination.


To get an idea of how much variance your two variables share, you can also
calculate what is referred to as the coefficient of determination. Sounds impressive, but all
you need to do is square your r value (multiply it by itself). To convert this to 'percentage of
variance', just multiply by 100 (shift the decimal place two columns to the right). For
example, two variables that correlate r=.2 share only .2 x .2 = .04 = 4 per cent of their
variance. There is not much overlap between the two variables. A correlation of r=.5,
however, means 25 per cent shared variance (.5 x.5 = .25).
In our example, the Pearson correlation is .581, which when squared indicates 33.76 per cent
shared variance. Perceived control helps to explain nearly 34 per cent of the variance in
respondents' scores on the perceived stress scale. This is quite a respectable amount of
variance explained when compared with a lot of the research conducted in the social sciences.
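
The arithmetic in this step is small enough to check directly. The Python sketch below squares each r value quoted in the text and multiplies by 100; the function name shared_variance_percent is hypothetical, and the example value is entered with its negative sign to show that the sign disappears on squaring.

```python
def shared_variance_percent(r):
    """Square r and express the result as a percentage of shared variance."""
    return r * r * 100.0


for r in (0.2, 0.5, -0.581):
    print(f"r = {r:+.3f} -> {shared_variance_percent(r):.2f} per cent shared variance")
```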

Step 5: Assessing the significance level.


The next thing to consider is the significance level (listed as Sig. 2 tailed). This is a
frequently misinterpreted area, so care should be exercised here. The level of statistical
significance does not indicate how strongly the two variables are associated (this is given by
r or rho), but instead it indicates how much confidence we should have in the results
obtained. The significance of r or rho is strongly influenced by the size of the sample. In a
small sample (e.g. n=30), you may have moderate correlations that do not reach statistical
significance at the traditional p<.05 level. In large samples (N=100+), however, very small
correlations may reach statistical significance. While you need to report statistical
significance, you should focus on the strength of the relationship and the amount of shared
variance (see Step 4).
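
The dependence of significance on sample size can be illustrated with the standard t statistic for testing a Pearson r against zero, t = r * sqrt((n - 2) / (1 - r^2)), which has n - 2 degrees of freedom. The sketch below is an illustration, not SPSS output: the two r values are invented, the function name t_for_r is hypothetical, and the comparison values are the approximate two-tailed .05 critical values from a standard t table (about 2.05 for 28 degrees of freedom and about 1.98 for 98).

```python
import math


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


t_small_sample = t_for_r(0.30, 30)    # medium-sized r, small sample
t_large_sample = t_for_r(0.20, 100)   # smaller r, larger sample
print(round(t_small_sample, 2), round(t_large_sample, 2))
```

The larger correlation in the sample of 30 falls short of its critical value, while the smaller correlation in the sample of 100 exceeds its critical value. This is why the strength of the relationship should be judged from r itself and not from the significance level.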

Presenting the results from correlation:-


The results of the above example using Pearson's correlation could be presented in a research
report as follows. If you need to report the results for Spearman's, just replace the r value
with the rho value shown in the output.

The relationship between perceived control of
internal states (as measured by the PCOISS) and perceived stress (as measured by the
Perceived Stress scale) was investigated using Pearson product-moment correlation
coefficient. Preliminary analyses were performed to ensure no violation of the assumptions of
normality, linearity and homoscedasticity. There was a strong, negative correlation between
the two variables, r = -.58, n = 426, p < .0005, with high levels of perceived control
associated with lower levels of perceived stress.

Correlation is often used to explore the
relationship among a group of variables, rather than just two as described above. In this case,
it would be awkward to report all the individual correlation coefficients in a paragraph; it
would be better to present them in a table.

Obtaining correlation coefficient between group of variables:-


In the previous procedures section, I showed you how to obtain correlation coefficients
between two continuous variables. If you have a group of variables and you wish to explore
the interrelationships among all of them, you can ask SPSS to do this all in one procedure.
Just include all the variables in the Variables box. This can, however, result in an enormous
correlation matrix that can be difficult to read and interpret.
Sometimes you want to look at only a subset of all these possible relationships. For example,
you might want to look at the relationship between control measures (Mastery, PCOISS) and
a number of different adjustment measures (positive affect, negative affect, life satisfaction).
You don't want a full correlation matrix, because this would give you correlation coefficients
among all the variables, including between each of the various pairs of adjustment measures.
There is a way that you can limit the correlation coefficients that are displayed. This involves
using Syntax Editor (described in Chapter 3). The following procedure uses the Syntax
Editor to limit the correlation coefficients that are produced by SPSS.
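
To make the idea of a restricted correlation table concrete, here is a small Python sketch (not SPSS syntax). It correlates each variable in one list only with the variables in a second list, which is the same layout the Syntax Editor approach produces. The data values, the variable names and the helper `cross_correlations` are all hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlations(data, row_vars, col_vars):
    """Correlate every variable in row_vars with every variable in col_vars.

    Pairs within row_vars or within col_vars are deliberately left out.
    """
    return {(r, c): pearson_r(data[r], data[c])
            for r in row_vars for c in col_vars}


# Hypothetical data; names and values are for illustration only
data = {
    "mastery": [20, 22, 18, 25, 21, 19],
    "pcoiss": [55, 60, 50, 68, 58, 52],
    "posaff": [30, 33, 28, 38, 31, 29],
    "negaff": [22, 18, 25, 14, 20, 24],
    "lifesat": [24, 27, 20, 31, 25, 21],
}

table = cross_correlations(data,
                           ["mastery", "pcoiss"],
                           ["posaff", "negaff", "lifesat"])
for (row, col), r in table.items():
    print(f"{row:8s} x {col:8s} r = {r:+.2f}")
```

Two control measures crossed with three adjustment measures give six coefficients, instead of the ten distinct pairs in a full five-variable matrix.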


WEEK-7 Assignment-7

1) Using the file survey3ED.sav, students are required to analyze the following:


"Is there a relationship between the amount of control people have over their internal
states and their levels of perceived stress? Do people with high levels of perceived
control experience lower levels of perceived stress?"

2) Using the data in the file anxiety.sav that is located in ~/SPSSlnc/, determine the
correlation between anxiety and score.

3) Using data file sleep3ED.sav, check the strength of the correlation between scores on
the Sleepiness and Associated Sensations Scale (totSAS) and the Epworth Sleepiness
Scale (ess).


Chi Square

There are two different types of chi-square test, both involving categorical data.

1. The chi-square test for goodness of fit explores the proportion of cases that fall into the
various categories of a single variable, and compares these with hypothesized values.

2. The chi-square test for independence is used to determine whether two categorical
variables are related. It compares the frequency of cases found in the various categories of
one variable across the different categories of another variable. For example: is the
proportion of smokers to non-smokers the same for males and females? Or, expressed
another way: are males more likely than females to be smokers?

Chi-square test for goodness-of-fit:-


This test, which is also referred to as the one-sample chi-square, is often used to compare the
proportion of cases from a sample with hypothesized values or those obtained previously
from a comparison population. All that is needed in the data file is one categorical variable
and a specific proportion against which you wish to test the observed frequencies. This may
test that there is no difference in the proportion in each category (50%/50%) or a specific
proportion obtained from a previous study.

Example of research question: We will test whether the number of smokers in the
survey3ED.sav data file is equivalent to that reported in the literature from a previous larger
nationwide study (20%).

What is needed:
• One categorical variable, with two or more categories: smoker (Yes/No).
• A hypothesised proportion (20% smokers; 80% non-smokers or .2/.8).
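
The arithmetic behind the goodness-of-fit test can be sketched in a few lines of Python. This is an illustration only: the observed counts are hypothetical (not taken from survey3ED.sav), the function names are our own, and the p value shortcut is valid only for one degree of freedom, that is, a variable with two categories.

```python
import math


def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic and the expected counts."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


def p_value_df1(chi2):
    """Upper-tail p value for chi-square with 1 degree of freedom only."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical observed counts: [smokers, non-smokers]
observed = [30, 70]
# Hypothesised proportions from the text: 20% smokers, 80% non-smokers
chi2, expected = chi_square_gof(observed, [0.2, 0.8])
p = p_value_df1(chi2)

print("expected counts:", expected)
print(f"chi-square = {chi2:.2f}, df = 1, p = {p:.3f}")
```

With these made-up counts, 30 observed smokers are compared with 20 expected, which gives a chi-square of 6.25.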


Chi-square test for independence:-


This test is used when you wish to explore the relationship between two categorical
variables. Each of these variables can have two or more categories. This test compares the
observed frequencies or proportions of cases that occur in each of the categories, with the
values that would be expected if there was no association between the two variables being
measured. It is based on a cross tabulation table, with cases classified according to the
categories in each variable (e.g. male/female; smoker/non-smoker).

When a 2 by 2 table (two categories in each variable) is encountered by SPSS, the output
from chi-square includes an additional correction value (Yates' Correction for Continuity).
In the following procedure, using the survey3ED.sav data file, I will demonstrate the use of
chi-square using a 2 by 2 design. If your study involves variables with more than two
categories (e.g. 2 by 3, 4 by 4), you will notice some slight differences in the output.
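
The following Python sketch shows how the chi-square statistic for a 2 by 2 table is built from observed and expected counts, with and without Yates' Correction for Continuity. The cell counts are hypothetical, the function name is our own, and the p value uses a shortcut that applies only to one degree of freedom (a 2 by 2 table).

```python
import math


def chi_square_2x2(table, yates=True):
    """Chi-square test for independence on a 2 by 2 table of counts.

    Returns (chi-square, p value). With yates=True, 0.5 is subtracted
    from each absolute difference between observed and expected counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # valid for 1 degree of freedom
    return chi2, p


# Hypothetical counts: rows = male, female; columns = smoker, non-smoker
table = [[20, 30],
         [10, 40]]

chi2_plain, p_plain = chi_square_2x2(table, yates=False)
chi2_yates, p_yates = chi_square_2x2(table, yates=True)
print(f"Pearson chi-square    = {chi2_plain:.3f}, p = {p_plain:.3f}")
print(f"Continuity correction = {chi2_yates:.3f}, p = {p_yates:.3f}")
```

The corrected value is always a little smaller than the uncorrected one, which makes the test more conservative for 2 by 2 tables.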


Summary for chi-square test for independence:-


Example of research question: There are a variety of ways questions can be phrased: is there
an association between gender and smoking behaviour? Are males more likely to be smokers
than females? Is the proportion of males that smoke the same as the proportion of females?

What you need: Two categorical variables, with two or more categories in each:

• Gender (Male/Female)

• Smoker (Yes/No)


Interpretation of output from chi-square for independence:-


Assumptions: The first thing you should check is whether you have violated one of the
assumptions of chi-square concerning the 'minimum expected cell frequency', which should be 5
or greater (or at least 80 per cent of cells should have expected frequencies of 5 or more).
This information is given in a footnote below the Chi-Square Tests table. Footnote b in the
example indicates that '0 cells (.0%) have expected count less than 5'. This means that we have
not violated the assumption, as all our expected cell sizes are greater than 5 (in our case,
greater than 35.87).
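
The footnote check described above can be reproduced by computing the expected count for every cell. This is a rough Python sketch with hypothetical counts and our own function names; expected counts are taken as row total times column total divided by the grand total.

```python
def expected_counts(table):
    """Expected cell counts under no association between rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def small_cell_summary(table, threshold=5):
    """Return (number of small cells, percentage of small cells, minimum)."""
    cells = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in cells if e < threshold)
    return small, 100 * small / len(cells), min(cells)


# Hypothetical counts: rows = male, female; columns = smoker, non-smoker
table = [[20, 30],
         [10, 40]]

small, percent, minimum = small_cell_summary(table)
print(f"{small} cells ({percent:.1f}%) have expected count less than 5. "
      f"The minimum expected count is {minimum:.2f}.")
```

If more than 20 per cent of cells fall below 5, the assumption described above is violated.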

Kappa Measure of agreement:-


One of the other statistics for categorical data available within the SPSS Crosstabs procedure
is the Kappa Measure of Agreement. This is commonly used in the medical literature to
assess inter-rater agreement (e.g. diagnosis from two different clinicians) or the consistency
of two different diagnostic tests (a newly developed test versus a gold standard).

Summary for Kappa Measure of Agreement:-


Example of research question: How consistent are the diagnostic classifications of the
Edinburgh Postnatal Depression Scale and the Depression, Anxiety and Stress Scale?

What you need: Two categorical variables with an equal number of categories (e.g. diagnostic
classification from Rater 1 or Test 1: 0 = not depressed, 1 = depressed; and the diagnostic
classification of the same person from Rater 2 or Test 2).

Assumptions: Assumes equal number of categories from Rater 1 and Rater 2.

Parametric alternative: None.

In the example below, we will test the degree of agreement between two measures of
depression in a sample of postnatal women. In the depress3ED.sav file, each woman's scores
on the Edinburgh Postnatal Depression Scale (EPDS: Cox, Holden & Sagovsky 1987) and
the Depression, Anxiety and Stress Scales (DASS-Dep: Lovibond & Lovibond 1995) were
classified according to the recommended cut points for each scale. This resulted in two
variables with scores of 0 (not depressed) and 1 (depressed). The aim here was to see if the
women identified with depression on the EPDS were also classified as depressed on the
DASS Depression scale (DASS-Dep).


Interpretation of output from Kappa:-


The main piece of information we are interested in is the table Symmetric Measures, which
shows that the Kappa Measure of Agreement value is .56, with a significance of p < .0005.
According to Peat (2001, p. 228) a value of .5 for Kappa represents moderate agreement,
above .7 represents good agreement, and above .8 represents very good agreement. So in this
example the level of agreement between the classification of cases as depressed using the
EPDS and the DASS-Dep is moderate.
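
For readers who want to see where a Kappa value comes from, here is a minimal Python sketch of Cohen's kappa computed from a cross tabulation. The counts are hypothetical (they are not the depress3ED.sav results) and the function names are our own. Kappa compares the observed proportion of agreement with the agreement expected by chance; the labels follow the cut-offs quoted from Peat above.

```python
def cohen_kappa(table):
    """Cohen's kappa from a square table (rows = Test 1, columns = Test 2)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Proportion of cases on the diagonal, where both tests agree
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Agreement expected by chance from the marginal totals
    expected = sum(rt * ct for rt, ct in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Label a kappa value using the cut-offs quoted in the text."""
    if kappa > 0.8:
        return "very good"
    if kappa > 0.7:
        return "good"
    if kappa >= 0.5:
        return "moderate"
    return "below moderate"


# Hypothetical counts: rows = Test 1 (0, 1); columns = Test 2 (0, 1)
table = [[40, 10],
         [10, 40]]

kappa = cohen_kappa(table)
print(f"kappa = {kappa:.2f} ({agreement_label(kappa)} agreement)")
```

In this made-up table the two tests agree on 80 of 100 cases, 50 agreements would be expected by chance, and kappa is therefore .60.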


WEEK-8 Assignment-8

Q1) Using data file staffsurvey3ED.sav, apply the chi-square test for independence to
compare the proportion of permanent versus casual staff (employ status) who indicate
they would recommend the organization as a good place to work (recommend).

Q2) Using data file staffsurvey3ED.sav, apply the chi-square test for independence to
compare the proportion of males and females (gender) who indicate they have a sleep
problem (problem).

Q3) Using data file staffsurvey3ED.sav, conduct a Kruskal-Wallis Test to compare staff
satisfaction scores (totsatis) across each of the length of service categories (use the
servicegp3 variable).

Q4) Using data file staffsurvey3ED.sav, conduct a Kruskal-Wallis Test to compare the
mean sleepiness ratings (Sleepiness and Associated Sensations Scale total score:
totSAS) for the three age groups defined by the variable agegp3 (<=37, 38-50, 51+).


Linear regression
Linear Regression estimates the coefficients of the linear equation, involving one or more
independent variables, that best predict the value of the dependent variable. For example, you
can try to predict a salesperson’s total yearly sales (the dependent variable) from independent
variables such as age, education, and years of experience.

Example: Is the number of games won by a basketball team in a season related to the average
number of points the team scores per game? A scatterplot indicates that these variables are
linearly related. The number of games won and the average number of points scored by the
opponent are also linearly related. These variables have a negative relationship. As the
number of games won increases, the average number of points scored by the opponent
decreases. With linear regression, you can model the relationship of these variables. A good
model can be used to predict how many games teams will win.

Statistics: For each variable: number of valid cases, mean, and standard deviation. For each
model: regression coefficients, correlation matrix, part and partial correlations, multiple R,
R², adjusted R², change in R², standard error of the estimate, analysis-of-variance table,
predicted values, and residuals. Also, 95% confidence intervals for each regression
coefficient, variance-covariance matrix, variance inflation factor, tolerance, Durbin-Watson
test, distance measures (Mahalanobis, Cook, and leverage values), DfBeta, DfFit, prediction
intervals, and casewise diagnostics.

Plots: Scatterplots, partial plots, histograms, and normal probability plots.

Data: The dependent and independent variables should be quantitative. Categorical
variables, such as religion, major field of study, or region of residence, need to be recoded to
binary (dummy) variables or other types of contrast variables.

Assumptions: For each value of the independent variable, the distribution of the dependent
variable must be normal. The variance of the distribution of the dependent variable should be
constant for all values of the independent variable. The relationship between the dependent
variable and each independent variable should be linear, and all observations should be
independent.

Generating Regression Lines:-


In this topic you will learn how to determine the coefficient and constant (slope and
intercept) for a regression line and how to add a regression line to a scatter plot. You will also
learn more about formatting scatter plots. We will use the data as below. If you have saved
this data, you can double click on it to start SPSS and load the data. Otherwise, start SPSS as
before and go to the data entry screen, SPSS Data Editor. In the first four columns, enter the
data. Click on the Variable View tab at the bottom. Change the variable names to x, y1, y2,
and y3 and the number of decimal places to 0 for each. When you return to the Data View
tab, your screen should look something like this:


1. From the Analyze menu, select Descriptive Statistics->Descriptives.

2. In turn, click on each variable and then the arrow button between panes to add each
variable to the Variable(s) column. Your screen should look like this:

3. Then click on OK. This will add the mean and standard deviation to your output window
for each variable.

4. Go back to the SPSS Data Editor. From the Analyze menu, select Regression->Linear. In the
new window, select y1 as the dependent variable and x as the independent variable and click
on OK.


This will add a fair amount of information to your output window. In column B, under
Unstandardized Coefficients, you will find the constant term or intercept and the x coefficient
or slope (–1.6).

5. Now to graph the line. Go back to the SPSS Data Editor. From the Graphs menu, select
Interactive Scatter Plot…. Drag x to the horizontal axis and y1 to the vertical axis. Next click
on the Fit tab. In the pull down menu under Method, select Regression.


6. Now go to the Titles tab and enter the same information that you entered in the last
exercise. Then click on OK.


You should now have a plot on the output window shown in SPSS Viewer. Notice that the
regression line has been added along with its equation. Now, double click on the graph in the
SPSS Viewer. This will allow you to edit the graph. In the bottom caption, change the name
of your text so that it uses italic. Just below the graph is the equation for the line. Move the
equation so that it is in a more open area of the graph and doesn’t obscure the x-axis labels.
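
The constant (intercept) and slope that SPSS reports in column B come from an ordinary least-squares fit. The Python sketch below shows that calculation on hypothetical x and y1 values; these are not the data used in the exercise above, and the function name is our own.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y1 = [9, 7, 6, 4, 3]

slope, intercept = fit_line(x, y1)
print(f"y1 = {intercept:.2f} + ({slope:.2f}) * x")

# Predicted values lie on the fitted line that is drawn over the scatter plot
predicted = [intercept + slope * value for value in x]
print("predicted:", [round(p, 2) for p in predicted])
```

For these made-up values the fitted line is y1 = 10.30 - 1.50x, the same kind of equation SPSS prints below the graph.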

To Obtain a Linear Regression Analysis:

1. From the menus choose Analyze->Regression->Linear.

Linear Regression dialog box

2. In the Linear Regression dialog box, select a numeric dependent variable.

3. Select one or more numeric independent variables.

Optionally, you can:

• Group independent variables into blocks and specify different entry methods for different
subsets of variables.

• Choose a selection variable to limit the analysis to a subset of cases having a particular
value(s) for this variable.

• Select a case identification variable for identifying points on plots.

• Select a numeric WLS Weight variable for a weighted least squares analysis.

WLS: Allows you to obtain a weighted least-squares model. Data points are weighted by the
reciprocal of their variances. This means that observations with large variances have less
impact on the analysis than observations associated with small variances. If the value of the
weighting variable is zero, negative, or missing, the case is excluded from the analysis.
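
The effect of a WLS weight can be illustrated with a short Python sketch for the one-predictor case. This is an assumption-laden illustration, not the SPSS implementation: the data and variances are hypothetical, the function name is our own, and cases with a zero, negative or missing weight are dropped as described above.

```python
def fit_weighted_line(x, y, weights):
    """Weighted least-squares line; returns (slope, intercept, cases used).

    Cases whose weight is None, zero or negative are excluded.
    """
    cases = [(xi, yi, w) for xi, yi, w in zip(x, y, weights)
             if w is not None and w > 0]
    total_w = sum(w for _, _, w in cases)
    mean_x = sum(w * xi for xi, _, w in cases) / total_w
    mean_y = sum(w * yi for _, yi, w in cases) / total_w
    sxy = sum(w * (xi - mean_x) * (yi - mean_y) for xi, yi, w in cases)
    sxx = sum(w * (xi - mean_x) ** 2 for xi, _, w in cases)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, len(cases)


# Hypothetical data: the last observation is noisy (large variance)
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 15.0]
variances = [0.5, 0.5, 0.5, 0.5, 8.0]
weights = [1 / v for v in variances]  # weight = reciprocal of the variance

w_slope, w_intercept, used = fit_weighted_line(x, y, weights)
o_slope, o_intercept, _ = fit_weighted_line(x, y, [1] * len(x))
print(f"weighted:   y = {w_intercept:.2f} + {w_slope:.2f} * x ({used} cases)")
print(f"unweighted: y = {o_intercept:.2f} + {o_slope:.2f} * x")
```

Because the noisy last case gets a small weight, the weighted line is pulled towards it less strongly than the unweighted line.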

Linear Regression Variable Selection Methods:-

Method selection allows you to specify how independent variables are entered into the
analysis. Using different methods, you can construct a variety of regression models from the
same set of variables.

Enter (Regression)
A procedure for variable selection in which all variables in a block are entered in a single
step.

Stepwise
At each step, the independent variable not in the equation that has the smallest probability of
F is entered, if that probability is sufficiently small. Variables already in the regression
equation are removed if their probability of F becomes sufficiently large. The method
terminates when no more variables are eligible for inclusion or removal.

Remove
A procedure for variable selection in which all variables in a block are removed in a single
step.

Backward Elimination
A variable selection procedure in which all variables are entered into the equation and then
sequentially removed. The variable with the smallest partial correlation with the dependent
variable is considered first for removal. If it meets the criterion for elimination, it is removed.
After the first variable is removed, the variable remaining in the equation with the smallest
partial correlation is considered next. The procedure stops when there are no variables in the
equation that satisfy the removal criteria.

Forward Selection
A stepwise variable selection procedure in which variables are sequentially entered into the
model. The first variable considered for entry into the equation is the one with the largest
positive or negative correlation with the dependent variable. This variable is entered into the
equation only if it satisfies the criterion for entry. If the first variable is entered, the
independent variable not in the equation that has the largest partial correlation is considered
next. The procedure stops when there are no variables that meet the entry criterion.

The significance values in your output are based on fitting a single model. Therefore, the
significance values are generally invalid when a stepwise method (stepwise, forward, or
backward) is used. All variables must pass the tolerance criterion to be entered in the
equation, regardless of the entry method specified. The default tolerance level is 0.0001.
Also, a variable is not entered if it would cause the tolerance of another variable already in
the model to drop below the tolerance criterion.

All independent variables selected are added to a single regression model. However, you can
specify different entry methods for different subsets of variables. For example, you can enter
one block of variables into the regression model using stepwise selection and a second block
using forward selection. To add a second block of variables to the regression model, click
Next.
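
As a small illustration of the first step of forward selection described above, the Python sketch below picks the candidate predictor with the largest positive or negative correlation with the dependent variable. It is only a sketch: the data and variable names are hypothetical, the function names are our own, and the entry criterion (probability of F) and the tolerance check that SPSS applies are not implemented here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_forward_candidate(dependent, predictors):
    """Name and r of the predictor with the largest absolute correlation."""
    correlations = {name: pearson_r(values, dependent)
                    for name, values in predictors.items()}
    best = max(correlations, key=lambda name: abs(correlations[name]))
    return best, correlations[best]


# Hypothetical yearly sales and candidate predictors
sales = [12, 15, 11, 19, 17, 14]
predictors = {
    "age": [25, 41, 33, 38, 29, 45],
    "education": [12, 16, 14, 16, 12, 13],
    "experience": [2, 6, 1, 10, 8, 5],
}

best, r = first_forward_candidate(sales, predictors)
print(f"first variable considered for entry: {best} (r = {r:+.2f})")
```

In a full forward selection, this variable would be entered only if it met the entry criterion, and later steps would use partial correlations rather than simple correlations.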

Multiple regression:-
Multiple regression is not just one technique but a family of techniques that can be used to
explore the relationship between one continuous dependent variable and a number of
independent variables or predictors (usually continuous).

Multiple regression is based on correlation, but allows a more sophisticated exploration of
the interrelationship among a set of variables. This makes it ideal for the investigation of
more complex real-life, rather than laboratory-based, research questions. However, you
cannot just throw variables into a multiple regression and hope that, magically, answers
will appear. You should have a sound theoretical or conceptual reason for the analysis and, in
particular, the order of variables entering the equation. Don't use multiple regression as a
fishing expedition.

Multiple regression can be used to address a variety of research questions. It can tell you how
well a set of variables is able to predict a particular outcome. For example, you may be
interested in exploring how well a set of subscales on an intelligence test is able to predict
performance on a specific task. Multiple regression will provide you with information about
the model as a whole (all subscales) and the relative contribution of each of the variables that
make up the model (individual subscales). As an extension of this, multiple regression
will allow you to test whether adding a variable (e.g. motivation) contributes to the predictive
ability of the model, over and above those variables already included in the model. Multiple
regression can also be used to statistically control for an additional variable (or variables)
when exploring the predictive ability of the model. Some of the main types of research
questions that multiple regression can be used to address are:
• how well a set of variables is able to predict a particular outcome;
• which variable in a set of variables is the best predictor of an outcome; and
• whether a particular predictor variable is still able to predict an outcome when the effects of
another variable are controlled for (e.g. socially desirable responding).

Types:-
There are a number of different types of multiple regression analyses that you can use,
depending on the nature of the question you wish to address. The three main types of
multiple regression analyses are:

• Standard or simultaneous;

• Hierarchical or sequential; and

• Stepwise.

Typical of the statistical literature, you will find different authors using different terms when
describing these three main types of multiple regression, which is very confusing for an
experienced researcher, let alone a beginner to the area!

Standard multiple regression:-


In standard multiple regression, all the independent (or predictor) variables are entered into
the equation simultaneously. Each independent variable is evaluated in terms of its predictive
power, over and above that offered by all the other independent variables. This is the most
commonly used multiple regression analysis. You would use this approach if you had a set of
variables (e.g. various personality scales) and wanted to know how much variance in a dependent
variable (e.g. anxiety) they were able to explain as a group or block. This approach would
also tell you how much unique variance in the dependent variable each of the independent
variables explained.

Hierarchical multiple regression:-


In hierarchical regression (also called sequential regression), the independent variables are
entered into the equation in the order specified by the researcher based on theoretical
grounds. Variables or sets of variables are entered in steps (or blocks), with each independent
variable being assessed in terms of what it adds to the prediction of the dependent variable,
after the previous variables have been controlled for. For example, if you wanted to know
how well optimism predicts life satisfaction, after the effect of age is controlled for, you
would enter age in Block 1 and then Optimism in Block 2. Once all sets of variables are
entered, the overall model is assessed in terms of its ability to predict the dependent measure.
The relative contribution of each block of variables is also assessed.
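
What "adds to the prediction" means can be shown numerically for the two-block example above (age in Block 1, optimism in Block 2). The Python sketch below uses the standard formula for R squared with two predictors, written in terms of the three pairwise correlations. The correlation values are hypothetical and the function names are our own.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R squared for a dependent variable y regressed on predictors 1 and 2.

    r_y1 and r_y2 are the correlations of y with each predictor;
    r_12 is the correlation between the two predictors.
    """
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def r_squared_change(r_y1, r_y2, r_12):
    """R squared at Block 1, at Block 2, and the change between them."""
    block1 = r_y1 ** 2  # only predictor 1 in the model
    block2 = r_squared_two_predictors(r_y1, r_y2, r_12)
    return block1, block2, block2 - block1


# Hypothetical correlations, for illustration only
r_lifesat_age = 0.2
r_lifesat_optimism = 0.5
r_age_optimism = 0.1

block1, block2, change = r_squared_change(
    r_lifesat_age, r_lifesat_optimism, r_age_optimism)
print(f"Block 1 (age):            R squared = {block1:.3f}")
print(f"Block 2 (age + optimism): R squared = {block2:.3f}")
print(f"R squared change for optimism       = {change:.3f}")
```

The R squared change is the additional variance in life satisfaction explained by optimism after age has been controlled for.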

Assumption of Multiple Regression:-


Multiple regression is one of the fussier of the statistical techniques. It makes a number of
assumptions about the data, and it is not all that forgiving if they are violated. It is not the
technique to use on small samples, where the distribution of scores is very skewed.

Sample size: A commonly cited rule of thumb is N > 50 + 8m, where m is the number of
independent variables; with five independent variables, for example, you will need 90 cases.
More cases are needed if the dependent variable is skewed. For stepwise regression, there
should be a ratio of 40 cases for every independent variable.

Multicollinearity and singularity:-


This refers to the relationship among the independent variables. Multicollinearity
exists when the independent variables are highly correlated (r = .9 and above). Singularity
occurs when one independent variable is actually a combination of other independent
variables (e.g. when both subscale scores and the total score of a scale are included). Multiple
regression doesn't like multicollinearity or singularity, and these certainly don't contribute to
a good regression model, so always check for these problems before you start.

Outliers:-


Multiple regression is very sensitive to outliers (very high or very low scores).
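
A simple first check for multicollinearity is to correlate every pair of independent variables and flag any pair at or above the .9 cut-off mentioned above. The Python sketch below does this for hypothetical data; the variable names and the helper `flag_multicollinearity` are our own, and the check does not detect singularity, which involves combinations of several variables.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_multicollinearity(predictors, cutoff=0.9):
    """List (name1, name2, r) for predictor pairs with |r| at or above cutoff."""
    flagged = []
    for a, b in combinations(predictors, 2):
        r = pearson_r(predictors[a], predictors[b])
        if abs(r) >= cutoff:
            flagged.append((a, b, r))
    return flagged


# Hypothetical independent variables; scale_a and scale_b are nearly identical
predictors = {
    "scale_a": [1, 2, 3, 4, 5, 6],
    "scale_b": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "scale_c": [4, 1, 6, 2, 5, 3],
}

for a, b, r in flag_multicollinearity(predictors):
    print(f"{a} and {b} are highly correlated (r = {r:.2f})")
```

Any flagged pair is a candidate for dropping one variable or combining the two before running the regression.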


WEEK-9 Assignment-9

Q1) A researcher is examining the relationship between stress levels and performance on a
test of cognitive performance. She hypothesizes that stress levels lead to an increase in
performance up to a point, and that increased stress then decreases performance. She tests 10
participants, who have the following levels of stress:
10.94, 12.76, 7.62, 8.17, 7.83, 12.22, 9.23, 11.17, 11.88 and 8.18. When she tests their
levels of mental performance, she finds the following cognitive performance scores:
5.24, 4.64, 4.68, 5.04, 4.17, 6.20, 4.54, 6.55, 5.79 and 3.17. Perform a linear regression to
examine the relationship between these variables. What do these results mean? Create a
scatterplot of the variables.

Q2) The same researcher tests 10 more participants, who have the following levels of
stress:
16, 20, 14, 21, 23, 19, 14, 20, 17 and 10. Their cognitive performance scores are: 5.24,
4.64, 4.68, 5.04, 4.17, 6.20, 4.54, 6.55, 5.79 and 3.17. Perform a linear regression to
examine the relationship between these variables. What do these results mean? Create a
scatterplot of the variables.

Q3) Using the file divorce.sav, test for linear and curvilinear relations between:
a) Physical closeness (close) and life satisfaction (lsatisfy)
b) Attributional style (asq) and life satisfaction (lsatisfy)
c) Create a scatterplot of the variables.

Q4) Using the file anxiety.sav, test for linear and curvilinear relations between anxiety and
exam and create a scatterplot of the variables.
