Lab 3 Solutions

35100 Introduction to Sample Surveys – Lab 3

How to Analyse Questionnaire Data using Inference

3.1. Recall our Attitudes to the Library data set. In lab 1 we looked at how to enter data
into SPSS and how to analyse the data using descriptive statistics. We found from our
descriptive analysis that there was a tendency within the survey group for females to have
a more positive attitude to the service offered by the Library. We may want to use our
sample to determine whether this result can be generalised. We would do this using
inferential statistics. (Note that our Attitudes to the Library sample is only composed of
12 respondents, and is really too small for statistical inference.)

If we consider the Likert scale in Question 2 as representing a categorical variable, the
appropriate method to investigate the relationship between this variable and Question 1 is
to construct a cross tabulation and perform a chi-square test. Produce a cross tabulation
using Analyze>Descriptive Statistics>Crosstabs and within the dialogue box click on
Statistics and choose Chi-Square.

Is the difference between the males and females significant (i.e. can the conclusion be
generalised to the wider population)? The SPSS output gives us a warning that the results
of our hypothesis test may not be reliable. What is this warning?

There is no significant difference between males and females in terms of their opinion of
the library (Chi-Square=4.20, p=0.241).
The warning tells us that 100% of the cells have an expected count of less than 5. When
more than 20% of the cells have an expected count below 5, the test is considered
unreliable because there are too few data.
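
The expected-count rule behind this warning can be sketched in Python. The handout does not list the individual responses, so the table of counts below is hypothetical and serves only to show the calculation; the function names are also illustrative.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_of_small_cells(table, threshold=5):
    """Proportion of cells whose expected count is below the threshold."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)


# HYPOTHETICAL counts (not the lab data): rows are gender, columns are
# the response categories of Question 2, 12 respondents in total.
hypothetical_table = [
    [2, 2, 1, 1],
    [0, 1, 2, 3],
]

small_share = share_of_small_cells(hypothetical_table)
print(f"{small_share:.0%} of cells have an expected count below 5")
```

With only 12 respondents spread over eight cells, the expected counts are all small, which is why the warning appears.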

3.2. The literature is divided as to whether Likert scales must be considered categorical or
can be treated as quantitative. If we assume that the distance between each consecutive
category in the Likert scale is constant, we can treat Question 2 as a quantitative variable.
In this case, the appropriate method to investigate the relationship between this variable
and Question 1 is to carry out an independent samples t test. Click on Analyze>Compare
Means>Independent Samples T Test. Select Q2 as the Test Variable and Q1 as the
Grouping Variable. You will need to Define Groups for Q1 as 1 and 2.

Can we assume equal variances?


What is your conclusion about differences between the mean scores on Question 2 of
males and females?

Levene’s test gives no evidence against equal variances (F=0.000, p-value = 1.000), so we
can assume that the variances of the two groups are equal. We then find that there is a
significant gender difference in the opinion score of Q2 (Independent samples t,
t=-2.301, p=0.044).

35100 Introduction to Sample Surveys week 3 1


3.3. Say we have asked our respondents their age in years. The ages of the 12 survey
respondents are respectively: 17, 19, 20, 18, 18, 38, 30, 18, 26, 24, 21 and 29. Enter these
ages into the Attitudes to the Library data and use Analyze > Descriptive Statistics >
Explore to examine the age structure of the respondents.
What are the main features of the respondents’ age structure?

The ages range from 17 years old to 38 years old, with a mean of about 23. The
distribution appears to be moderately skewed to the right.
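
As a cross-check outside SPSS, the same summary figures can be computed with a short Python sketch. This is not part of the lab procedure; it uses only the twelve ages listed above.

```python
import statistics

# Ages of the 12 respondents, as listed in question 3.3.
ages = [17, 19, 20, 18, 18, 38, 30, 18, 26, 24, 21, 29]

summary = {
    "n": len(ages),
    "min": min(ages),
    "max": max(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
}

for name, value in summary.items():
    print(name, value)
```

The mean lies above the median, which is consistent with a distribution skewed to the right.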

3.4. We might find it more useful to recode our age data into age categories. Click on
Transform>Recode Into Different Variables. Select age as the Input Variable and
agegp (label age group) as the Output Variable. Click on Old and New Values. Use
this dialogue window to transform the old values 11-19 to the new value of 1; 20-29 to
the new value of 2; and 30-39 to the new value of 3. (Use the Range option for Old
Value, and don’t forget to click Add to confirm each range before starting the next.) Click
Continue. You will need to click Change and then OK. You have now created a new
variable with values 1, 2 and 3 representing the three age groups 11-19, 20-29 and 30-39.
You will need to define the variable in the Variable View window and define the
appropriate value labels.
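
The recoding rule itself can be sketched in Python as a check on the SPSS result. The function name age_group is illustrative; the ranges and codes are the ones given above.

```python
from collections import Counter

# Ages of the 12 respondents, as listed in question 3.3.
ages = [17, 19, 20, 18, 18, 38, 30, 18, 26, 24, 21, 29]


def age_group(age):
    """Map an age in years to the group code used for agegp."""
    if 11 <= age <= 19:
        return 1
    if 20 <= age <= 29:
        return 2
    if 30 <= age <= 39:
        return 3
    return None  # outside the recoded ranges


agegp = [age_group(age) for age in ages]
group_sizes = Counter(agegp)
print(agegp)
print(dict(group_sizes))
```

The oldest group contains only two respondents, which is worth bearing in mind when comparing groups.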

What technique would you use to test whether the mean score on Question 2 (assuming
quantitative) significantly differs between age groups?
Find this technique and carry out the analysis. What are the results?

In this situation, we would use Analysis of Variance (ANOVA) to analyse the data.

We find that there are no significant differences in the opinions of the users between the
three age groups (ANOVA, F=1.893, p=0.206).

3.5. Suppose instead of asking the respondents their age in years we had asked them their
date of birth. These are the responses:
29 March 1996, 14 June 1994, 5 January 1993, 16 November 1994, 8 March 1995,
25 October 1974, 13 May 1983, 12 December 1994, 16 September 1986, 23 April 1989,
18 June 1992, 4 November 1983.
Create a variable in the Variable View Window called dob (date of birth). The Variable
Type will be Date. Select the format appropriate for entering the dates as, for example,
29.03.1996. Enter the dates of birth in the Data Window.

3.6. Suppose the date of the survey was 1st July 2013. We can calculate the age of the
respondents at this date.

Create a column named now in Date format with every entry being the date 1st July 2013.
Click on Transform>Compute Variable, write years as the Target Variable and the
Numeric Expression (now-dob)/(60*60*24*365).

Alternatively, just click on Transform>Compute Variable, write years as the Target
Variable and the Numeric Expression
(DATE.DMY(01,07,2013)-DOB)/(60*60*24*365)

What do the figures in the column years represent, and how do they arise from the
numeric expression? The following link might be of assistance:
http://www.ats.ucla.edu/stat/spss/library/dates.htm

The figures in the column “years” give the age of the person as at July 1 2013. SPSS
measures time in seconds, so the original difference in time is also measured in seconds.
The denominator in the expression above converts seconds to years, taking a year as 365
days (so leap days are ignored). We assume that both measurements are taken at the
same time of day.
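
The same arithmetic can be sketched in Python, assuming the dates of birth from question 3.5 and the same 365-day denominator. This illustrates the calculation only; it is not how SPSS stores dates internally.

```python
from datetime import datetime

# Same denominator as the SPSS expression: seconds in a 365-day year.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

survey_date = datetime(2013, 7, 1)

# Dates of birth from question 3.5, in respondent order.
dobs = [
    datetime(1996, 3, 29), datetime(1994, 6, 14), datetime(1993, 1, 5),
    datetime(1994, 11, 16), datetime(1995, 3, 8), datetime(1974, 10, 25),
    datetime(1983, 5, 13), datetime(1994, 12, 12), datetime(1986, 9, 16),
    datetime(1989, 4, 23), datetime(1992, 6, 18), datetime(1983, 11, 4),
]

# Difference in seconds, converted to (fractional) years.
years = [(survey_date - dob).total_seconds() / SECONDS_PER_YEAR for dob in dobs]

for dob, y in zip(dobs, years):
    print(dob.date(), round(y, 3))
```

The whole-number parts of these values agree with the ages given in question 3.3.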

3.7. The values in years are not in exactly the right form for our analysis. Try Transform
> Compute Variable, write rnd as the Target Variable and the Numeric Expression
RND(years). Also, try Transform > Compute, write trunc as the Target Variable and
the Numeric Expression TRUNC(years).

What is the difference between rnd and trunc?


Which would be the most appropriate for our survey?

TRUNC discards the fractional part, which for positive values such as ages gives the
largest whole number that is less than or equal to the value, whereas RND rounds the
number to the nearest whole number. These two functions will give a different result
when the fractional part of the year is greater than or equal to 0.5.

For our purposes, we would use TRUNC. We do this because age is conventionally quoted
as the number of birthdays that you have had, and a birthday is only counted once the
year is complete.
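
The difference can be illustrated with a small Python sketch using two of the dates of birth from question 3.5. The helper rnd below is an assumption that mimics rounding to the nearest whole number for positive values; it is not the SPSS function itself, and Python's built-in round is avoided because it rounds exact halves to the nearest even number.

```python
import math
from datetime import datetime

# Seconds in a 365-day year, as in the numeric expression of question 3.6.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def years_at(dob, survey_date=datetime(2013, 7, 1)):
    """Age in fractional years at the survey date."""
    return (survey_date - dob).total_seconds() / SECONDS_PER_YEAR


def rnd(x):
    """Round a positive value to the nearest whole number (halves go up)."""
    return math.floor(x + 0.5)


first = years_at(datetime(1996, 3, 29))    # fractional part below 0.5
fourth = years_at(datetime(1994, 11, 16))  # fractional part above 0.5

for y in (first, fourth):
    print(round(y, 3), "rnd:", rnd(y), "trunc:", math.trunc(y))
```

For the first respondent both functions give 17, but for the fourth respondent rounding gives 19 while truncation gives 18, which is the age that respondent would actually quote.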

Answer the following question using the Excel spreadsheet ‘One Sample Power.xlsx’.

3.8 A researcher at a university wants to determine whether there has been an increase in
the average cost per student of textbooks. Previous data suggests that the average cost per
student per semester was $350 with a standard deviation of $80. The researcher considers
that an increase of $50 to $400 per semester would be considered of practical importance,
and he would like to be 90% sure of being able to detect this difference should it exist.
Assuming a type one error rate of 5% and a population standard deviation of $80, what
sample size would he need?

We need a sample size of at least 22.

What sample size would he need if he wanted to be 99% sure of being able to detect this
difference?

We now need a sample size of at least 41.
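
The formula used inside the spreadsheet is not shown here. As a hedged cross-check, the standard sample size formula for a one-sided one-sample z test with known standard deviation, n ≥ ((z_alpha + z_power) × sigma / delta)^2, reproduces both answers. The function name is illustrative.

```python
import math
from statistics import NormalDist


def sample_size_one_sided_z(delta, sigma, alpha, power):
    """Smallest n for a one-sided one-sample z test (assumed formula).

    delta: difference to detect; sigma: population standard deviation;
    alpha: type one error rate; power: probability of detecting delta.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sigma / delta) ** 2)


n_90 = sample_size_one_sided_z(delta=50, sigma=80, alpha=0.05, power=0.90)
n_99 = sample_size_one_sided_z(delta=50, sigma=80, alpha=0.05, power=0.99)
print(n_90, n_99)
```

The one-sided form is assumed because the researcher is only interested in an increase in cost.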

Answer the following question calculating by hand

3.9 A chain of fitness centres wants to be able to estimate the proportion of its clients
Australia wide who would use an on-site health food restaurant if it were provided. The
chain is prepared to accept a margin of error of 10% and a level of confidence of 90%.
They have no prior idea of what the actual proportion might be. What sample size is
required?

N ≥ (z_{α/2}^2 × p(1-p)) / m^2
= (1.64^2 × 0.5 × (1-0.5)) / 0.1^2
= 67.24

So we need a sample size of at least 68.
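
The hand calculation can be checked with a short Python sketch. The function name is illustrative, and z = 1.64 is the rounded value used in the working above.

```python
import math


def sample_size_for_proportion(z, margin, p=0.5):
    """Smallest n giving the required margin of error for a proportion.

    p = 0.5 is the conservative choice when nothing is known about the
    true proportion, because it maximises p * (1 - p).
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


n_required = sample_size_for_proportion(z=1.64, margin=0.10)
print(n_required)
```

The result is always rounded up, since rounding down would give a margin of error slightly larger than the one the chain is prepared to accept.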
