
Assignment.

Multiple Regression Analysis

Dear student,
Below you will find the Week 5 assignment for Research and Statistics.
- Complete the assignment in SPSS and put your answers and tables in this file.
- Upload the completed assignment as a PDF to Canvas.
- Go to the online quiz https://tueindhoven.limequery.com/488385?newtest=Y&lang=en
and enter your answers there.

Please enter the ID number shown in the orange section of your student ID card.

The system will process your answers, and you will receive personalized feedback on what
you did correctly or incorrectly.

Contents:
• multiple regression for predicting and explaining
• stepwise and enter methods
• multicollinearity
• dummy regression

Roman Oana
1656309

Case 1: predicting thermal sensation of museum visitors
Open the file ‘museum2_s.sav’. It contains the following variables:
therm_sens : perceived warmth (how do you feel at the moment?):
-3=cold, -2=cool, -1=slightly cool, 0=neutral, 1=slightly warm, 2=warm and 3=hot
temp_in : the current temperature inside the building (degrees Celsius)
rh_in : the relative humidity of the air inside the building (%)
gender : 0=male, 1=female
age : age of the person in years
clo : index from 1 to 10 of how warm the clothes of the visitor are

We want to predict therm_sens using as predictors the variables: gender, age, clo, temp_in and rh_in.

1. Study the variables (mean, st.dev, frequency distribution).


What percentage of the visitors is male? 35.9
What is the average humidity in the data? 50.4
How many children (<18) were there among the visitors? 19
How many retired people (>=65) were there among the visitors? 492
For the last two questions, make a new variable age2 that takes values “under 18”, “between 18
and 64” and “65 and older”.
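The age2 recode can be sketched as a small Python function; in SPSS the same step is done with Transform > Recode into Different Variables. The function name and the sample ages below are illustrative only (not taken from museum2_s.sav), and the sketch assumes age is recorded in whole years.

```python
from collections import Counter

def age_group(age):
    """Map an age in whole years to the three age2 categories."""
    if age < 18:
        return "under 18"
    if age < 65:  # 18 up to and including 64
        return "between 18 and 64"
    return "65 and older"

# Hypothetical ages, chosen to hit the category boundaries
ages = [9, 17, 18, 40, 64, 65, 91]
counts = Counter(age_group(a) for a in ages)
print(counts)
```

Counting the recoded values gives the frequencies that answer the two questions above.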

Descriptive Statistics
N Minimum Maximum Mean Std. Deviation
Age 1140 9 91 55.82 17.827
gender 1140 0 1 0.64 0.480
therm_sens 1140 -3 3 -0.7 0.902
Clo 1140 1 10 10 1.276
temp_in 1140 19 24 21.18 1.107
rh_in 1140 40 60 50.40 3.285
Valid N (listwise) 1140

age2

Frequency Percent Valid Percent Cumulative Percent

Valid under 18

between 18 and 64

65 years or older

Total

gender
Frequency Percent Valid Percent Cumulative Percent
Valid 0 409 35.9 35.9 35.9
1 731 64.1 64.1 100
Total 1140 100 100

2. Make a correlation matrix of all variables.


Is there a problem of multicollinearity? no
Explain.
A. There is no correlation of 0.5 or higher between the predictors
B. There is no correlation of 0.7 or higher between the predictors
C. There is no correlation of 0.9 or higher between the predictors

Correlations
therm_sens gender age clo temp_in rh_in
therm_sens Pearson Correlation 1 -0.079 -0.028 0.140 0.211 -0.164
Sig. (2-tailed) 0.008 0.344 0.001 0.001 0.001
N 1140 1140 1140 1140 1140 1140
gender Pearson Correlation -0.079 1 0.057 -0.042 0.002 0.013
Sig. (2-tailed) 0.008 0.054 0.158 0.956 0.655
N 1140 1140 1140 1140 1140 1140
Age Pearson Correlation -0.028 0.057 1 0.045 -0.064 -0.071
Sig. (2-tailed) 0.344 0.054 0.126 0.031 0.061
N 1140 1140 1140 1140 1140 1140
Clo Pearson Correlation 0.140 -0.042 0.045 1 -0.099 -0.225
Sig. (2-tailed) 0.001 0.158 0.126 0.001 0.001
N 1140 1140 1140 1140 1140 1140
temp_in Pearson Correlation 0.211 0.002 -0.064 -0.099 1 0.115
Sig. (2-tailed) 0.001 0.956 0.031 0.001 0.001
N 1140 1140 1140 1140 1140 1140
rh_in Pearson Correlation -0.164 0.013 -0.071 -0.225 0.115 1
Sig. (2-tailed) 0.001 0.655 0.061 0.001 0.001
N 1140 1140 1140 1140 1140 1140
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
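The multicollinearity check in question 2 amounts to scanning the predictor-predictor correlations (therm_sens excluded) against a rule-of-thumb threshold. A minimal sketch, with the values copied from the matrix above; the thresholds mirror answer options A to C. Note that this bivariate screen cannot detect collinearity involving three or more predictors at once; SPSS can report tolerance and VIF for that through the Collinearity diagnostics option of Linear Regression.

```python
# Predictor-predictor correlations copied from the correlation matrix above
PREDICTOR_CORR = {
    ("gender", "age"): 0.057,
    ("gender", "clo"): -0.042,
    ("gender", "temp_in"): 0.002,
    ("gender", "rh_in"): 0.013,
    ("age", "clo"): 0.045,
    ("age", "temp_in"): -0.064,
    ("age", "rh_in"): -0.071,
    ("clo", "temp_in"): -0.099,
    ("clo", "rh_in"): -0.225,
    ("temp_in", "rh_in"): 0.115,
}

def flag_pairs(corr, threshold):
    """Return predictor pairs whose absolute correlation reaches the threshold."""
    return [pair for pair, r in corr.items() if abs(r) >= threshold]

# Pair with the largest absolute correlation
strongest = max(PREDICTOR_CORR, key=lambda pair: abs(PREDICTOR_CORR[pair]))
for threshold in (0.5, 0.7, 0.9):
    print(threshold, flag_pairs(PREDICTOR_CORR, threshold))
print("strongest pair:", strongest, PREDICTOR_CORR[strongest])
```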

3. Make a regression model to predict therm_sens based on the other available variables
(gender, age, clo, temp_in and rh_in). Use the ENTER method.
In table below:
- Report the (unstandardized) coefficients.
- Report the significance level:
A. Not significant
B. Significant at 10%
C. Significant at 5%
D. Significant at 1%

coeff significance
temp_in 0.196 D
rh_in -0.045 D
gender -0.133 C
Age -0.001 A
Clo 0.088 D

Coefficients (a)
Model B Std. Error Beta t Sig.
1 (Constant) -2.109 0.637 -3.311 0.001
Gender -0.133 0.053 -0.071 -2.501 0.013
Age -0.001 0.001 -0.026 -0.919 0.358
Clo 0.088 0.020 0.125 4.302 0.001
temp_in 0.196 0.023 0.241 8.466 0.001
rh_in -0.045 0.008 -0.164 -5.653 0.001
a. Dependent Variable: therm_sens
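Assigning the significance codes of question 3 is a threshold rule on the Sig. column. A small sketch, assuming "significant at x%" means p below x/100; the letters follow the A to D coding used in Case 1 (Case 2 uses a different lettering).

```python
def significance_category(p):
    """Classify a two-tailed p-value with the A-D codes of question 3."""
    if p < 0.01:
        return "D"  # significant at 1%
    if p < 0.05:
        return "C"  # significant at 5%
    if p < 0.10:
        return "B"  # significant at 10%
    return "A"      # not significant

# Sig. column of the Coefficients table above
sig = {"gender": 0.013, "age": 0.358, "clo": 0.001, "temp_in": 0.001, "rh_in": 0.001}
print({name: significance_category(p) for name, p in sig.items()})
```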

4. Interpret the coefficients:


temp_in:
A. Controlling for all other variables, the score on thermal sensation increases by 0.196 if the indoor
temperature increases by 1 degree;
B. Controlling for all other variables, the score on thermal sensation decreases by 0.196 if the indoor
temperature increases by 1 degree;
C. Controlling for all other variables, the score on thermal sensation increases by 19.6% if the indoor
temperature increases by 1 degree;
D. Controlling for all other variables, the score on thermal sensation decreases by 19.6% if the indoor
temperature increases by 1 degree.

gender:
A. Controlling for all other variables, the score on thermal sensation decreases by 0.133 if the visitor
is male;
B. Controlling for all other variables, the score on thermal sensation decreases by 0.133 if the visitor
is female;

5. What is the goodness-of-fit of the model? 0.102
Is this a high or a low fit?
A High
B Low
Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 0.319 0.102 0.098 0.857
a. Predictors: (Constant), rh_in, gender, age, temp_in, clo
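The Adjusted R Square corrects R² for the number of predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), with n observations and k predictors (the constant not counted). A minimal sketch reproducing the 0.098 above from R² = 0.102, n = 1140 and k = 5:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Case 1: R^2 = 0.102, n = 1140 visitors, k = 5 predictors
print(round(adjusted_r2(0.102, 1140, 5), 3))  # → 0.098
```

Because R² is itself rounded to three decimals in the output, recomputed values can differ from SPSS in the last digit.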

6. Predict the thermal sensation.


Predict the thermal sensation for a 90 year old female visitor wearing clothes with index 5, when the
inside temperature is 20 degrees and humidity is 50%.
Only include significant coefficients in your model. _-0.122_
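The prediction in question 6 is the constant plus each significant coefficient times the visitor's value; age (p = 0.358) is left out as instructed. A sketch using the rounded coefficients from the table. With these rounded values the result is -0.132; because temp_in and rh_in are multiplied by 20 and 50, rounding in the third decimal of their coefficients can shift the prediction by a few hundredths, so a value computed from SPSS's unrounded coefficients may differ slightly.

```python
# Significant unstandardized coefficients (B) from the Case 1 Coefficients table;
# age (p = 0.358) is excluded as the question instructs
COEF = {"gender": -0.133, "clo": 0.088, "temp_in": 0.196, "rh_in": -0.045}
CONSTANT = -2.109

def predict(values):
    """Linear prediction: constant plus the sum of B * x over the included predictors."""
    return CONSTANT + sum(COEF[name] * x for name, x in values.items())

# 90-year-old female visitor (gender = 1), clo index 5, 20 degrees Celsius, 50% humidity
visitor = {"gender": 1, "clo": 5, "temp_in": 20, "rh_in": 50}
print(round(predict(visitor), 3))  # → -0.132
```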

7. Compare males with females


How much would the thermal sensation of a female visitor differ from that of a comparable male visitor, other
things (age, temperature, etc.) being equal? -0.133
Would the female in question feel colder or warmer than the comparable male?
A. Colder
B. Warmer

Case 2: predicting people’s satisfaction with green in neighborhood public spaces.
The dataset you are going to analyze in this case is gathered from the questionnaire ‘green in public
spaces’.

The survey includes an experiment in which respondents are asked to evaluate hypothetical
neighborhoods. The neighborhoods presented vary on a number of attributes. Participants are
randomly assigned to one of two versions of the experiment. Both versions use VR to present the
neighborhoods, but they differ in mode: a game-like mode in which respondents navigate by themselves,
and a movie mode in which respondents are taken on a walk through the environment without being
able to navigate.

The main research questions are: what are the effects of the attributes varied in the experiment on the
satisfaction judgement, and what is the effect of the VR mode on this judgement?

Each respondent evaluated four neighborhood variants. Each row in the SPSS data file is one person’s
judgement of one neighborhood (so there are four rows per person); the judgement is the unit of
analysis.

Open the file ‘greenspace.sav’

1. satgreen Degree of satisfaction with the neighborhood (increasing scale from 0 to 24)
2. gender Gender (1=Male, 0=Female)
3. garden Private garden around your house (1=yes, 0=no)
4. size Size of the public space (1=1500 m², 0=750 m²)
5. surface Surface of the public space (1=grass, 0=pavement)
6. water Water element in open space (1=yes, 0=no)
7. streetgrass Grass along the street (1=yes, 0=no)
8. tree Trees (1=yes, 0=no)
9. vertgreen Vertical greening on facades (1=yes, 0=no)
10. avestories Average number of surrounding building stories (1=six stories, 0=three stories)
11. mode Mode of the presented virtual environment (0=movie, 1=game)

The variables 4-10 are attributes of the neighborhoods that are varied in the satisfaction judgement
task.

8. Report the mean and standard deviation of the variable ‘satgreen’.


Mean st.dev.
Satgreen 14.64 5.024

Report the frequency distribution of the variable ‘mode’

Mode
Frequency Percent Valid Percent Cumulative Percent
Valid movie 524 52.2 52.2 52.2
game 480 47.8 47.8 100.0
Total 1004 100.0 100.0

9. Conduct an independent samples t-test to determine whether there is a difference in the
average satisfaction score between the two modes.
Define mode as the grouping variable and satgreen as the test (dependent) variable. Conduct the test
two-tailed, since we have no a priori expectation about the direction in which the modes differ.

9a. What is the average satisfaction score in the movie-mode group? __14.47___

9b. What is the average satisfaction score in the game-mode group? __14.82___

10. Report the t-value of this test (choose the right t-value depending on whether equal
variances may or may not be assumed). The t-value is __-1.089___

11. Is the t-value significant? What is your conclusion?


A. The t-value is significant. The conclusion is: there is a difference between the two groups in
average satisfaction score
B. The t-value is not significant. The conclusion is: there is no difference between the two groups in
average satisfaction score
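SPSS's Independent-Samples T Test prints two rows, "Equal variances assumed" (pooled t) and "Equal variances not assumed" (Welch t), and Levene's test indicates which row applies. A plain-Python sketch of both statistics follows; the function name and the sample scores are illustrative, not from greenspace.sav.

```python
import math
from statistics import mean, variance

def independent_t(a, b):
    """Two-sample t statistics for mean(a) - mean(b).

    Returns (t_pooled, df_pooled, t_welch, df_welch): the pooled version assumes
    equal variances, the Welch version does not.
    """
    n1, n2 = len(a), len(b)
    diff = mean(a) - mean(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 in the denominator)

    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Equal variances not assumed: Welch t with Welch-Satterthwaite df
    q1, q2 = v1 / n1, v2 / n2  # squared standard errors of the two means
    t_welch = diff / math.sqrt(q1 + q2)
    df_welch = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t_pooled, df_pooled, t_welch, df_welch

# Hypothetical scores for two groups
movie = [1, 2, 3, 4, 5]
game = [2, 4, 6, 8, 10]
print(independent_t(movie, game))
```

With equal group sizes the two t values coincide and only the degrees of freedom differ. The two-tailed p-value then comes from the t distribution, for example via SPSS itself or `scipy.stats.ttest_ind(a, b, equal_var=False)` for the Welch version.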

12. Make a correlation matrix of all variables in the file.


Report the correlations of all variables with the variable ‘satgreen’. Note: it is possible to use a correlation
analysis provided that the nominal/ordinal variables are binary (0/1 variables).

Correlation coefficient
Gender -0.049
garden 0.061
Size of the public space -0.020
Surface of the public space 0.183
water 0.094
Grass along the street 0.097
Trees 0.471
Vertical greening on facades 0.206
Average number of surrounding building stories 0.053
Mode 0.280
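For a 0/1 variable the Pearson correlation is the point-biserial correlation, and it is tied to the pooled two-sample t statistic by r = t / sqrt(t² + df). The sketch below checks this identity on toy data made up for illustration. Applied to the mode t-test of question 10 (|t| = 1.089, assuming the pooled version with df = 524 + 480 - 2 = 1002), it implies a correlation of about 0.034 between mode and satgreen, which is a useful cross-check on the Mode row above.

```python
import math

def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_from_t(t, df):
    """Point-biserial r implied by a pooled two-sample t statistic with df degrees of freedom."""
    return t / math.sqrt(t * t + df)

# Toy data: a 0/1 group indicator and a score (pooled t = 2 * sqrt(2), df = 2)
group = [0, 0, 1, 1]
score = [1, 2, 3, 4]
print(round(pearson(group, score), 4))
print(round(r_from_t(2 * math.sqrt(2), 2), 4))

# Mode vs satgreen, using |t| from question 10
print(round(r_from_t(1.089, 1002), 3))
```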

13. Which of the variables has the strongest correlation with satisfaction and which one the
weakest?

13a. The strongest …


A. Gender
B. Garden
C. Size of the public space
D. Surface of the public space
E. water
F. Grass along the street
G. Trees
H. Vertical greening on facades
I. Average number of surrounding building stories
J. Mode

13b. The weakest …


A. Gender
B. Garden
C. Size of the public space
D. Surface of the public space
E. water
F. Grass along the street
G. Trees

H. Vertical greening on facades
I. Average number of surrounding building stories
J. Mode

Focus now on the correlation coefficients of the variables that are related to attributes of the
neighborhood.
14. Do all the correlations that ARE significant have the expected sign?
Report your answers in the table below.

A. Not significant
B. No
C. Yes
D. No prior expectation

Expected sign

Size of the public space A


Surface of the public space C
Water C
Grass along the street C
Trees C
Vertical greening on facades C
Average number of surrounding building stories A

15. Check the correlations among the independent variables.


Is there a risk of multicollinearity in the regression analysis? Explain.
A. No, there is no correlation of 0.5 or higher between the predictors
B. No, there is no correlation of 0.7 or higher between the predictors
C. No, there is no correlation of 0.9 or higher between the predictors

16. Conduct a multiple regression-analysis with satgreen as dependent variable and all other
variables as predictors.
Use the ENTER method.
Report the unstandardized coefficients from the final model, and
report the significance level:
A. Not significant
B. Significant at 5%
C. Significant at 1%
coefficient significance
Gender -0.356 A
Garden 0.768 C
Size of the public space -0.034 A
Surface of the public space 1.962 C
water 0.656 B
Grass along the street 0.735 C
Trees 4.684 C
Vertical greening on facades 2.063 C
Average number of surrounding building stories 0.384 A
Mode 0.191 A

Coefficients (a)
Model B (unstandardized) Std. Error Beta (standardized) t Sig.
1 (Constant) 9.108 0.456 19.962 0.001
gender -0.356 0.269 -0.035 -1.324 0.186
garden 0.768 0.282 0.071 2.718 0.007
size -0.034 0.264 -0.003 -0.130 0.896
surface 1.962 0.264 0.196 7.438 0.001
water 0.656 0.264 0.065 2.490 0.013
streetgrass 0.735 0.264 0.073 2.782 0.006
tree 4.684 0.264 0.467 17.770 0.001
vertgreen 2.063 0.264 0.206 7.827 0.001
avestories 0.384 0.264 0.038 1.455 0.146
mode 0.191 0.264 0.019 0.723 0.470
a. Dependent Variable: Satgreen

A first question is whether VR mode influences the average satisfaction score (satgreen). The
independent samples t-test already provided an answer to this question. However, the t-test is a
bivariate analysis, whereas the multiple regression analysis is multivariate. The regression provides a
more definite answer, because it controls for any differences between the two mode groups on the
other independent variables.

17. Interpret the coefficients


17a. Which of the following statements is true?
A. Controlling for all other variables, the satisfaction score (satgreen) increases by XXX points
when the movie mode is applied.
B. Controlling for all other variables, the satisfaction score (satgreen) decreases by XXX points
when the movie mode is applied.
C. There is no significant effect of mode on the satisfaction judgment (assume XXX = 0).

17b. The value of XXX is __0__ (fill in the value of XXX that is referred to in answer
options A, B and C)

A second question concerns people’s preferences regarding the different attributes (here, the people
are students of the Research and Statistics course).

Indicate in the table below which attributes have a significant effect on satgreen, use:
A. Not significant
B. Significant at 5%
C. Significant at 1%

Significance
size Size of the public space (1=1500 m², 0=750 m²) A
surface Surface of the public space (1=grass, 0=pavement) C
water Water element in open space (1=water, 0=no water) B
streetgrass Grass along the street (1=yes, 0=no) C
tree Trees (1=yes, 0=no) C
vertgreen Vertical greening on facades (1=yes, 0=no) C
avestories Average number of surrounding building stories (1=six stories, 0=three stories) A

18. Interpret the coefficient for size (size of the public space).


18a. Which of the following statements is true?
A. Controlling for all other variables, the satisfaction score increases by XXX points when the size of the
public space is 1500 m² compared to 750 m²
B. Controlling for all other variables, the satisfaction score decreases by XXX points when the size of the
public space is 1500 m² compared to 750 m²
C. There is no significant effect of the size of the public space on the satisfaction score (assume XXX = 0).

18b. The value of XXX is __0__ (fill in the value of XXX that is referred to in answer
options A, B and C)

Which attribute has the largest influence on the degree of satisfaction? (Make sure that you look at the
standardized beta coefficient).
19. The most important attribute is: ___Trees___
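Question 19 relies on the standardized coefficient, which rescales B by the standard deviations of predictor and outcome: Beta = B × sd(x) / sd(y). The sketch below assumes each 0/1 attribute equals 1 in roughly half of the judgements (sd ≈ 0.5). That balance is an assumption here, although the identical standard errors of the attribute coefficients (0.264) are consistent with it. Under this assumption the computed values land close to the Beta column of the full model, and tree comes out largest.

```python
SD_SATGREEN = 5.024   # standard deviation of satgreen (question 8)
SD_ATTRIBUTE = 0.5    # assumption: each 0/1 attribute is 1 in about half of the judgements

def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized coefficient B into a standardized Beta."""
    return b * sd_x / sd_y

# Unstandardized coefficients from the full model (question 16)
attributes = {"tree": 4.684, "vertgreen": 2.063, "surface": 1.962, "size": -0.034}
betas = {name: standardized_beta(b, SD_ATTRIBUTE, SD_SATGREEN) for name, b in attributes.items()}
for name, beta in sorted(betas.items(), key=lambda item: abs(item[1]), reverse=True):
    print(f"{name}: {beta:.3f}")
```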

20. Given a public space which has the following characteristics:


Size of the public space 750 m²
Surface of the public space grass
Water No
Grass along the street No
Trees Yes
Vertical greening on facades Yes
Average number of surrounding building stories 6
Showing mode movie

Predict the satisfaction score that a female student who has a private garden would give this
neighborhood.
The predicted satisfaction score is ___18.969___
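The question 20 prediction is the constant plus each coefficient times its 0/1 code (note that in Case 2, gender is coded 1 = male, 0 = female). This sketch includes every coefficient of the full model, which reproduces 18.969; leaving out the non-significant terms, as question 6 did for Case 1, would drop avestories' 0.384 and give 18.585.

```python
# Unstandardized coefficients of the full Case 2 model (question 16)
COEF = {
    "gender": -0.356, "garden": 0.768, "size": -0.034, "surface": 1.962,
    "water": 0.656, "streetgrass": 0.735, "tree": 4.684, "vertgreen": 2.063,
    "avestories": 0.384, "mode": 0.191,
}
CONSTANT = 9.108

# Question 20 profile, using the 0/1 codes from the variable list
profile = {
    "gender": 0,       # female (1 = male, 0 = female)
    "garden": 1,       # has a private garden
    "size": 0,         # 750 m2
    "surface": 1,      # grass
    "water": 0,        # no water element
    "streetgrass": 0,  # no grass along the street
    "tree": 1,         # trees present
    "vertgreen": 1,    # vertical greening present
    "avestories": 1,   # six stories
    "mode": 0,         # movie
}

prediction = CONSTANT + sum(COEF[name] * value for name, value in profile.items())
print(round(prediction, 3))  # → 18.969
```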

21. Run the last regression again using only the game mode data, and report the Model Summary (R²)
and Coefficients tables. (Choose Data > Select Cases from the menu and set the condition mode = 1.)

Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 0.539 0.291 0.277 3.947

a. Predictors:

Coefficients (a)
Model B (unstandardized) Std. Error Beta (standardized) t Sig.
1 (Constant) 9.985 0.606 16.470 0.001
gender -0.762 0.372 -0.081 -2.047 0.041
garden 1.597 0.403 0.158 3.960 0.001
size -0.359 0.362 -0.039 -0.990 0.323
surface 1.655 0.363 0.178 4.566 0.001
water 0.294 0.362 0.032 0.812 0.417
streetgrass 0.386 0.363 0.042 1.063 0.288
tree 3.991 0.362 0.430 11.009 0.001
vertgreen 2.230 0.363 0.240 6.150 0.001
avestories 0.194 0.364 0.021 0.535 0.593
a. Dependent Variable: Satgreen

Then run the regression analysis again, now using only the movie mode data.

Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 0.605 0.366 0.354 4.288

a. Predictors:

Coefficients (a)
Model B (unstandardized) Std. Error Beta (standardized) t Sig.
1 (Constant) 8.487 0.636 13.352 0.001
gender -0.151 0.395 -0.014 -0.384 0.701
garden 0.128 0.402 0.011 0.318 0.751
size 0.279 0.378 0.026 0.738 0.461
surface 2.193 0.378 0.206 5.803 0.001
water 1.015 0.378 0.095 2.686 0.007
streetgrass 0.955 0.378 0.089 2.520 0.012
tree 5.370 0.378 0.503 14.177 0.001
vertgreen 1.849 0.378 0.173 4.893 0.001
avestories 0.658 0.378 0.062 1.742 0.082
a. Dependent Variable: Satgreen

Compare the two sets of estimated values for the coefficients of the 7 attributes.

Just based on the size, sign and significance of the coefficients …

22. Are there notable differences between the two sets of estimates? And what does that tell
you?
A. The estimates agree more or less with each other. This indicates that the two modes do not have
a big impact on the measurement of preferences
B. There are substantial differences between the estimates. This indicates that the two modes do
have an impact on the measurement of preferences
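The comparison in question 22 can be made mechanical by listing the attributes whose sign, or whose significance at the 5% level, differs between the game-mode and movie-mode fits. The sketch below copies (B, Sig.) from the two Coefficients tables; whether the differences it finds count as "substantial" remains a judgement call, and a formal test would need a different analysis (for example, mode × attribute interaction terms in the pooled model).

```python
ALPHA = 0.05  # significance level used for the comparison

# (B, Sig.) per attribute, copied from the game-mode and movie-mode Coefficients tables
GAME = {
    "size": (-0.359, 0.323), "surface": (1.655, 0.001), "water": (0.294, 0.417),
    "streetgrass": (0.386, 0.288), "tree": (3.991, 0.001), "vertgreen": (2.230, 0.001),
    "avestories": (0.194, 0.593),
}
MOVIE = {
    "size": (0.279, 0.461), "surface": (2.193, 0.001), "water": (1.015, 0.007),
    "streetgrass": (0.955, 0.012), "tree": (5.370, 0.001), "vertgreen": (1.849, 0.001),
    "avestories": (0.658, 0.082),
}

def compare_estimates(a, b, alpha=ALPHA):
    """List attributes whose coefficient sign, or significance at alpha, differs between two fits."""
    sign_differs = [k for k in a if (a[k][0] > 0) != (b[k][0] > 0)]
    significance_differs = [k for k in a if (a[k][1] < alpha) != (b[k][1] < alpha)]
    return sign_differs, significance_differs

signs, significance = compare_estimates(GAME, MOVIE)
print("sign differs:", signs)
print("significance differs:", significance)
```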

23. Report the explained variance.


23a Game mode____0.291____
23b Movie mode____0.366____

24. Which mode offers a more accurate prediction of the satisfaction score for the
neighborhood?
A. Game mode
B. Movie mode

25. What does this tell you?

A. The Game mode is probably a more reliable method to measure preferences


B. The Movie mode is probably a more reliable method to measure preferences
C. No such conclusion is possible

