
Lab Class 3
Review of Basic Concepts in SPSS

Purpose of the class


• To review:
  - Setting up SPSS data files
  - Entering data into SPSS
  - Generating simple descriptive statistics to view in the output window
  - Generating simple graphs of your data
  - Recoding
  - Random samples
  - Producing correlations
  - Performing t-tests
  - Performing chi square
Exercise 1

Data was collected by a real estate agent in Tullamarine about some characteristics of all houses sold in the area in
the last month.

The data below contains information on 4 variables.

Column 1 is the size of the house in squares.
Column 2 is the sale price in thousands (000s).
Column 3 is the age of the house in years.
Column 4 is the land size in square metres.
20 260 5 420
15 240 8 640
20 245 9 600
13 210 12 590
18 230 9 700
14 242 7 720
28 295 1 624
16 235 4 590
24 287 2 710
20 252 11 630
23 270 5 700
25 275 2 710
18 290 10 600
15 265 7 570
18 255 10 610
17 245 8 540

1. Enter the data into SPSS with appropriate labels.
2. Describe the characteristics of the houses sold, using graphs and descriptive statistics.
3. Produce a scattergram of the relationship between the house size and the sale price.
4. Produce a scattergram of the relationship between the age of the house and the sale price.
5. Produce a scattergram of the relationship between the land size and the sale price.
6. Find the correlation between the house size and the sale price.
7. Find the correlation between the age of the house and the sale price.
8. Find the correlation between the land size and the sale price.
9. What kinds of relationships are reflected in the data?
10. Recode the sale price into a new variable with two categories (houses with a sale price under 250 = 1, houses
with a sale price over 250 = 2).
11. Find the average size in squares of the houses, the average age of the houses and the average land size of the
houses for each category of the variable you created in the previous step.
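
As an optional cross-check on the SPSS output, the correlation, recode and group-mean steps can be reproduced by hand. The following is a minimal Python sketch, not part of the lab itself. The names `houses`, `pearson_r`, `price_group` and `group_means` are illustrative assumptions; the only data used are the 16 rows listed above.

```python
from math import sqrt

# Exercise 1 data, one tuple per house:
# (size in squares, price in thousands, age in years, land size in square metres)
houses = [
    (20, 260, 5, 420), (15, 240, 8, 640), (20, 245, 9, 600), (13, 210, 12, 590),
    (18, 230, 9, 700), (14, 242, 7, 720), (28, 295, 1, 624), (16, 235, 4, 590),
    (24, 287, 2, 710), (20, 252, 11, 630), (23, 270, 5, 700), (25, 275, 2, 710),
    (18, 290, 10, 600), (15, 265, 7, 570), (18, 255, 10, 610), (17, 245, 8, 540),
]
size, price, age, land = (list(column) for column in zip(*houses))


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Correlation of each characteristic with the sale price.
correlations = {
    "size": pearson_r(size, price),
    "age": pearson_r(age, price),
    "land": pearson_r(land, price),
}
for name, r in correlations.items():
    print(f"r({name}, price) = {r:.3f}")

# Recode price into two categories: 1 = under 250, 2 = over 250.
# No house in this data set sold for exactly 250, so the boundary case does not arise.
price_group = [1 if p < 250 else 2 for p in price]

# Mean size, age and land size within each price category.
group_means = {}
for group in (1, 2):
    rows = [h for h, g in zip(houses, price_group) if g == group]
    group_means[group] = {
        "n": len(rows),
        "size": mean([h[0] for h in rows]),
        "age": mean([h[2] for h in rows]),
        "land": mean([h[3] for h in rows]),
    }
    print(group, group_means[group])
```

The printed values should agree with the Pearson correlations and the group means that SPSS reports, up to rounding.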
Exercise 2

Data was collected on 40 people to indicate whether they had a high or low income and whether they lived in an
urban or rural area.

The data below contains information on these two variables.

Column 1 indicates whether the respondent lived in an urban or rural area (urban=1, rural=2).
Column 2 indicates whether the respondent had a “high” or a “low” income (low=1, high=2).

1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 2
1 2
1 2
1 2
1 2
1 2
1 2
1 2
1 2
1 2
1 2
1 2
1 2
1 2
1 2
2 1
2 1
2 1
2 1
2 1
2 1
2 1
2 1
2 1
2 1
2 1
2 1
2 2
2 2
2 2
2 2
2 2
2 2

1. Enter the data into SPSS

2. Run a crosstabs and find: a) the frequency (and percentage) of urban respondents who had a high and a
low income, and b) the frequency (and percentage) of rural respondents who had a high and a low income.

3. A chi square statistic tests for an association between two categorical variables. You may be interested in
knowing, for instance, whether your level of income is dependent upon whether you happen to be in an urban or
rural area, or whether income is INDEPENDENT of locality.
This is a really useful statistic to have some familiarity with because in social research we’re often dealing with
purely categorical variables. The chi square statistic compares the OBSERVED frequencies (i.e. those that you
obtained in your data) with the frequencies that would be EXPECTED if there were no relationship between the two
variables (i.e. if they were INDEPENDENT).
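
To make the OBSERVED versus EXPECTED comparison concrete, here is a minimal Python sketch that works through it for the Exercise 2 data. It is offered as a cross-check on the SPSS output, not as a replacement for it. The counts in `observed` were tallied from the 40 rows above; the variable names are illustrative assumptions. It computes the Pearson chi square without a continuity correction.

```python
from math import erfc, sqrt

# Observed counts tallied from the 40 rows of Exercise 2 data.
# Rows: locality (urban, rural). Columns: income (low, high).
row_labels = ["urban", "rural"]
col_labels = ["low", "high"]
observed = [
    [7, 15],   # urban: 7 low income, 15 high income
    [12, 6],   # rural: 12 low income, 6 high income
]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Row percentages, as in a crosstabs table with row percentages requested.
for label, row, total in zip(row_labels, observed, row_totals):
    shares = ", ".join(
        f"{c} {count} ({100 * count / total:.1f}%)"
        for c, count in zip(col_labels, row)
    )
    print(f"{label}: {shares}")

# Expected count under independence = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# A 2x2 table has 1 degree of freedom, and the upper-tail probability
# of a chi square with 1 df equals erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_square / 2))

print(f"chi square = {chi_square:.3f}, df = 1, p = {p_value:.3f}")
```

The chi square value should match the "Pearson Chi-Square" row of the SPSS output; the `erfc` shortcut for the significance level only applies to tables with 1 degree of freedom.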

Remember that in chi square we’re interested in the issue of INDEPENDENCE vs. DEPENDENCE. So if
you’re just as likely to have a high income in an urban area as in a rural area, we can say that income is INDEPENDENT of
LOCALITY. In this case your chi square statistic WILL NOT be significant (i.e. sig > 0.05, the conventional cut-off).

To produce the chi square statistic go back to CROSSTABS. Click on STATISTICS, check CHI SQUARE and click
CONTINUE. Then click on CELLS and, under RESIDUALS, check ADJ. STANDARDIZED.

A significance level less than 0.05 indicates that there is a relationship between income and locality.
The adjusted standardized residuals give you an indication of where the OBSERVED frequencies differ
significantly from the EXPECTED frequencies.
As a rough guide you should look for adjusted standardized residuals greater than +2 or less than -2.
A large positive residual tells you that there are more observations in that cell than you’d expect if the variables
were independent. A large negative residual tells you that there are fewer observations in that cell than you’d
expect if the variables were independent.
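
The residual rule of thumb can be checked by hand for the Exercise 2 table. The sketch below is a rough illustration in Python, assuming the usual formula for an adjusted standardized residual: (observed - expected) divided by the square root of expected × (1 - row proportion) × (1 - column proportion). The counts are tallied from the data above and the variable names are illustrative.

```python
from math import sqrt

# Observed counts tallied from the Exercise 2 data.
# Rows: locality (urban, rural). Columns: income (low, high).
row_labels = ["urban", "rural"]
col_labels = ["low", "high"]
observed = [
    [7, 15],
    [12, 6],
]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Adjusted standardized residual for each cell:
# (O - E) / sqrt(E * (1 - row_total / n) * (1 - col_total / n))
adjusted = []
for i, row in enumerate(observed):
    adjusted_row = []
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        scale = sqrt(e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
        adjusted_row.append((o - e) / scale)
    adjusted.append(adjusted_row)

for label, residuals in zip(row_labels, adjusted):
    cells = ", ".join(f"{c}: {r:+.2f}" for c, r in zip(col_labels, residuals))
    print(f"{label} -> {cells}")
```

In a 2x2 table every cell has the same residual in absolute size, so the signs are what matter: a positive value marks a cell with more respondents than independence would predict, a negative value a cell with fewer.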

Exercise 3

Data was collected on 20 people who enrolled in a new fitness program at a gym.

The data below contains information on 3 variables.

Column 1 is the gender of the respondent (1=female, 2=male).
Column 2 is the health rating of the respondent (1=unhealthy, 2=healthy, 3=very healthy).
Column 3 is the age of the respondent in years.

2 3 18
2 3 26
1 2 20
2 1 19
1 3 25
2 1 22
1 1 23
2 3 19
1 2 18
2 2 24
2 1 21
1 3 27
2 3 22
2 3 19
1 1 23
1 2 25
2 1 23
1 1 19
2 2 21
2 3 20
1. Enter the data into SPSS
2. Produce some simple descriptions of the people in your sample, and produce graphs as well.
3. What percentage of males and females are in the different health categories?
4. Is there a relationship between gender and health rating?
5. Find out if the mean age of males and females is significantly different.
6. Perform an ANOVA to see if the average age of the participants varies according to health rating.
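
For the comparison of mean age between males and females, the test statistic can be cross-checked by hand. The following Python sketch assumes an independent-samples t-test with pooled (equal) variances, which corresponds to the "equal variances assumed" row of the SPSS output. The names `people`, `female_ages` and `male_ages` are illustrative, and the data are the 20 rows above.

```python
from math import sqrt

# Exercise 3 data: (gender, health rating, age); gender 1 = female, 2 = male.
people = [
    (2, 3, 18), (2, 3, 26), (1, 2, 20), (2, 1, 19), (1, 3, 25),
    (2, 1, 22), (1, 1, 23), (2, 3, 19), (1, 2, 18), (2, 2, 24),
    (2, 1, 21), (1, 3, 27), (2, 3, 22), (2, 3, 19), (1, 1, 23),
    (1, 2, 25), (2, 1, 23), (1, 1, 19), (2, 2, 21), (2, 3, 20),
]
female_ages = [age for gender, _, age in people if gender == 1]
male_ages = [age for gender, _, age in people if gender == 2]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


n_f, n_m = len(female_ages), len(male_ages)
df = n_f + n_m - 2

# Pooled variance combines the within-group variation of both groups.
pooled_var = (sum_sq_dev(female_ages) + sum_sq_dev(male_ages)) / df
std_error = sqrt(pooled_var * (1 / n_f + 1 / n_m))
t_stat = (mean(female_ages) - mean(male_ages)) / std_error

print(f"mean age: females {mean(female_ages):.2f}, males {mean(male_ages):.2f}")
print(f"t = {t_stat:.3f}, df = {df}")
```

With 18 degrees of freedom, the two-tailed critical value at the 0.05 level is about 2.10, so a t statistic smaller than that in absolute size would not be significant. The exact significance level is best read from the SPSS output.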

Exercise 4

Open the SPSS data file DIETSTUDY.SAV. To find it, click FILE → OPEN → DATA, double click on the
TUTORIAL folder, double click on the SAMPLE_FILES folder, then open DIETSTUDY.SAV.

This file contains data from 16 individuals who underwent a weight reduction program. The file records each
participant’s age and gender, their triglyceride level at the start of the program (tg0) and their weight in pounds (it
is an American study) at the start of the program (wgt0). The subsequent triglyceride and weight measurements
were taken at 2 weekly intervals after the commencement of the program.

Note: Triglycerides are a type of fat that shows up in a blood test (you’re probably more familiar with the term cholesterol
test). Higher readings indicate higher levels of the fat and are said to be a risk factor for heart disease, stroke etc.

a) Write a paragraph that describes the main attributes of the sample at the beginning of the study.
b) Perform a test to determine if the average triglyceride reading for the participants decreases between the first
and the last measurement.

Describe the test you performed and the findings below.

c) Perform a test to determine if the average weight for the participants decreases between the first and the last
measurement.

Describe the test you performed and the findings below.

d) Perform tests to determine if males and females have i) different triglyceride levels at the start of the program
and ii) different weights at the start of the program.

Describe the tests you performed and the findings below.


e) Is there a relationship between triglyceride level and weight on the first reading (tg0 and wgt0)?
Below, describe the relationship and the two methods you can use to assess it. Is the relationship the same
on the final measurement of triglycerides and weight (tg4 and wgt4)? How do you know?
