Ester - Paksuniemi - Assignment3


Quantitative methods course Spring 2024 - Assignment 3, Ester Paksuniemi

PART 1

MODEL 1

1. a) Model 1 is estimated in RStudio as a linear regression with lm().

> model1 <- lm(nmbrpartners ~ female + nmbrcohabitant + reg2 + reg3 + drunkdummy + homobi +
edu2 + edu3 + age + debutage + single + children, data = sexb, subset = (nmbrpartners>0))

> summary(model1)

Call:
lm(formula = nmbrpartners ~ female + nmbrcohabitant + reg2 +
reg3 + drunkdummy + homobi + edu2 + edu3 + age + debutage +
single + children, data = sexb, subset = (nmbrpartners >
0))

Residuals:
Min 1Q Median 3Q Max
-41.27 -7.85 -2.52 2.82 763.24

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.88119 3.92999 6.077 1.43e-09 ***
female -7.69157 1.10841 -6.939 5.12e-12 ***
nmbrcohabitant 5.68312 0.68938 8.244 2.78e-16 ***
reg2 -2.90352 1.34557 -2.158 0.031045 *
reg3 -3.40169 1.39393 -2.440 0.014748 *
drunkdummy 6.50534 1.50032 4.336 1.51e-05 ***
homobi -1.61647 3.51650 -0.460 0.645789
edu2 3.03007 1.45789 2.078 0.037784 *
edu3 5.33388 1.67270 3.189 0.001448 **
age 0.16949 0.04904 3.456 0.000558 ***
debutage -1.51317 0.18112 -8.354 < 2e-16 ***
single 4.56887 1.41635 3.226 0.001274 **
children 0.26029 1.45084 0.179 0.857632
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 25.4 on 2281 degrees of freedom


(452 observations deleted due to missingness)
Multiple R-squared: 0.1117, Adjusted R-squared: 0.107
F-statistic: 23.9 on 12 and 2281 DF, p-value: < 2.2e-16

1. b) Adding lengthrel to model1 and rerunning the regression in RStudio. The following
commands are run in RStudio:
> model1 <- lm(nmbrpartners ~ female + nmbrcohabitant + reg2 + reg3 + drunkdummy + homobi +
edu2 + edu3 + age + debutage + single + children + lengthrel, data = sexb, subset =
(nmbrpartners>0)) # adding lengthrel to model1

> summary(model1)

Showing only a part of the output for age and lengthrel:


Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.79261 4.14114 6.228 5.63e-10 ***
age 0.26457 0.06496 4.073 4.81e-05 ***
lengthrel -0.15396 0.06858 -2.245 0.02488 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 25.71 on 2192 degrees of freedom


(540 observations deleted due to missingness)
Multiple R-squared: 0.1148, Adjusted R-squared: 0.1096
F-statistic: 21.88 on 13 and 2192 DF, p-value: < 2.2e-16

1. c) Generating timeberel as age minus lengthrel. Only the commands and the output rows for age and timeberel are shown.
> sexb$timeberel <- sexb$age - sexb$lengthrel

> model1 <- lm(nmbrpartners ~ female + nmbrcohabitant + reg2 + reg3 + drunkdummy + homobi +
edu2 + edu3 + age + debutage + single + children + timeberel, data = sexb, subset =
(nmbrpartners>0)) # removing lengthrel and adding timeberel

> summary(model1)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.79261 4.14114 6.228 5.63e-10 ***
age 0.11061 0.05759 1.921 0.05492 .
timeberel 0.15396 0.06858 2.245 0.02488 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 25.71 on 2192 degrees of freedom


(540 observations deleted due to missingness)
Multiple R-squared: 0.1148, Adjusted R-squared: 0.1096
F-statistic: 21.88 on 13 and 2192 DF, p-value: < 2.2e-16

1. d) Removing age and adding lengthrel to the regression in RStudio. Only the
commands and the output rows for lengthrel and timeberel are shown.

> model1 <- lm(nmbrpartners ~ female + nmbrcohabitant + reg2 + reg3 + drunkdummy + homobi +
edu2 + edu3 + debutage + single + children + timeberel + lengthrel, data = sexb, subset =
(nmbrpartners>0)) # removing age and adding lengthrel

> summary(model1)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.79261 4.14114 6.228 5.63e-10 ***
timeberel 0.26457 0.06496 4.073 4.81e-05 ***
lengthrel 0.11061 0.05759 1.921 0.05492 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 25.71 on 2192 degrees of freedom


(540 observations deleted due to missingness)
Multiple R-squared: 0.1148, Adjusted R-squared: 0.1096
F-statistic: 21.88 on 13 and 2192 DF, p-value: < 2.2e-16

1. e) Adding age to the regression in RStudio with the following commands and output:
> model1 <- lm(nmbrpartners ~ female + nmbrcohabitant + reg2 + reg3 + drunkdummy + homobi +
edu2 + edu3 + debutage + single + children + timeberel + lengthrel + age, data = sexb,
subset = (nmbrpartners>0)) # adding age

> summary(model1)

Call:
lm(formula = nmbrpartners ~ female + nmbrcohabitant + reg2 +
reg3 + drunkdummy + homobi + edu2 + edu3 + debutage + single +
children + timeberel + lengthrel + age, data = sexb, subset = (nmbrpartners >
0))

Residuals:
Min 1Q Median 3Q Max
-38.94 -7.91 -2.48 2.71 765.14

Coefficients: (1 not defined because of singularities)


Estimate Std. Error t value Pr(>|t|)
(Intercept) 25.79261 4.14114 6.228 5.63e-10 ***
female -7.71120 1.14271 -6.748 1.91e-11 ***
nmbrcohabitant 5.02618 0.76217 6.595 5.33e-11 ***
reg2 -3.09453 1.38141 -2.240 0.02518 *
reg3 -3.55065 1.44220 -2.462 0.01389 *
drunkdummy 6.37658 1.55449 4.102 4.24e-05 ***
homobi -1.68769 3.62933 -0.465 0.64197
edu2 3.23204 1.51133 2.139 0.03258 *
edu3 5.56830 1.73378 3.212 0.00134 **
debutage -1.68160 0.19400 -8.668 < 2e-16 ***
single 2.55909 1.71031 1.496 0.13473
children 0.54958 1.50090 0.366 0.71428
timeberel 0.26457 0.06496 4.073 4.81e-05 ***
lengthrel 0.11061 0.05759 1.921 0.05492 .
age NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 25.71 on 2192 degrees of freedom


(540 observations deleted due to missingness)
Multiple R-squared: 0.1148, Adjusted R-squared: 0.1096
F-statistic: 21.88 on 13 and 2192 DF, p-value: < 2.2e-16

1. f) Creating a correlation matrix that shows the correlations between age, timeberel,
and lengthrel. Including only the observations that were included in regression 1e
above. The following commands and output come from RStudio:
> cor(model1$model)
> cor(model1$model[ c("age", "timeberel", "lengthrel") ], use ="complete" )

                age  timeberel  lengthrel
age       1.0000000  0.4722650  0.6566955
timeberel 0.4722650  1.0000000 -0.3546213
lengthrel 0.6566955 -0.3546213  1.0000000
1. g)

In model 1b the intercept is 25.79261, and it is the same in models 1c and 1d. The intercept is
the predicted value of the dependent variable nmbrpartners when all independent variables are
zero. In 1b, the coefficient for age is 0.26457. This indicates that for every one-unit increase
in age, holding the other variables (including lengthrel) constant, the predicted value of
nmbrpartners increases by 0.26457. The coefficient for lengthrel is -0.15396. This indicates
that for every one-unit increase in lengthrel, holding age and the other variables constant, the
predicted value of the dependent variable decreases by 0.15396.

In model 1c, the coefficient for age is 0.11061, lower than in model 1b. It means that for every
one-unit increase in age, holding timeberel and the other variables constant, the predicted
value of nmbrpartners increases by 0.11061. The coefficient for timeberel is 0.15396, which
means that for every one-unit increase in timeberel, holding age constant, the predicted value
of the dependent variable increases by 0.15396.

In model 1d, the coefficient for timeberel is 0.26457, indicating that for every one-unit
increase in timeberel, holding lengthrel constant, the predicted value of nmbrpartners
increases by 0.26457. The coefficient for lengthrel is 0.11061, which means that for every
one-unit increase in lengthrel, holding timeberel constant, the predicted value of the
dependent variable increases by 0.11061.

Models 1b, 1c and 1d have the same intercept, the same Multiple R-squared (0.1148), the same
residual standard error and the same F-statistic and p-value. Each of them explains 11.5% of
the variability in the dependent variable. The explanation is that timeberel is defined as
age - lengthrel, so any two of the three variables contain the same information as any other
two. The three models are therefore the same model written in three different ways. The
coefficients for age, lengthrel and timeberel differ between the models only because the
variable that is held constant changes. For example, model 1c gives
0.11061*age + 0.15396*(age - lengthrel) = 0.26457*age - 0.15396*lengthrel, which is exactly
model 1b. In the same way, model 1d gives 0.26457*(age - lengthrel) + 0.11061*lengthrel =
0.26457*age - 0.15396*lengthrel.
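
The link between the coefficients in 1b, 1c and 1d can be checked with a few lines of arithmetic. The sketch below only uses the estimates printed in the three summaries and the definition timeberel = age - lengthrel; the object names are chosen here for illustration.

```r
# Estimates as printed in the summaries for 1b, 1c and 1d
b_age_1b <- 0.26457; b_len_1b <- -0.15396
b_age_1c <- 0.11061; b_tim_1c <- 0.15396
b_tim_1d <- 0.26457; b_len_1d <- 0.11061

# Rewrite 1c and 1d in terms of age and lengthrel,
# substituting timeberel = age - lengthrel
implied_1c <- c(age = b_age_1c + b_tim_1c, lengthrel = -b_tim_1c)
implied_1d <- c(age = b_tim_1d, lengthrel = b_len_1d - b_tim_1d)

print(rbind(model_1b = c(age = b_age_1b, lengthrel = b_len_1b),
            implied_1c = implied_1c,
            implied_1d = implied_1d))
```

All three rows carry the same implied age and lengthrel effects, which is consistent with the identical R-squared and F-statistic in the three outputs.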

The regression model 1e predicts the number of partners (nmbrpartners) from timeberel,
lengthrel and age together with the other predictor variables. The coefficient for age is
reported as NA and the output notes "1 not defined because of singularities". This is because
age = timeberel + lengthrel exactly, so age adds no new information and R drops it. The
remaining estimates are therefore identical to model 1d. The intercept is estimated to be
25.79261. This represents the expected number of partners when all predictor variables are
zero. The coefficients represent the estimated change in the number of partners associated
with a one-unit increase in each predictor variable, holding all other variables constant.
For example, holding all other variables constant, being female is associated with
approximately 7.71 fewer partners compared to being male. Similarly, a one-unit increase in
nmbrcohabitant is associated with an increase of approximately 5.03 partners. The variable
lengthrel has a coefficient of 0.11061, indicating that a one-unit increase in relationship
length is associated with an increase of approximately 0.11 partners, but this result is only
marginally significant (p = 0.05492). Model 1e explains approximately 11.48% of the
variability in the number of partners, as indicated by the Multiple R-squared value. The
adjusted R-squared value, which accounts for the number of predictors, is 10.96%. The
F-statistic tests the overall significance of the model and shows that the model is
statistically significant (p < 2.2e-16), indicating that at least one of the predictor
variables has a significant relationship with the number of partners. Finally, the residuals
are the differences between the observed values of the dependent variable and the values
predicted by the model. They indicate how well the model fits the data, with a lower residual
standard error indicating a better model fit.
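
The NA for age in 1e can be reproduced on simulated data. The sketch below does not use the sexb dataset; the sample size, value ranges and true coefficients are made-up assumptions, and only the structure (timeberel = age - lengthrel) mirrors the assignment.

```r
set.seed(1)
n <- 200

# Simulated stand-ins for the three variables (not the real data)
age       <- round(runif(n, 18, 70))
lengthrel <- round(runif(n, 0, age - 17))
timeberel <- age - lengthrel            # exact linear combination, as in 1c
y <- 5 + 0.3 * age - 0.15 * lengthrel + rnorm(n)

# Same ordering as in 1e: age enters last and is perfectly collinear
fit_1e <- lm(y ~ timeberel + lengthrel + age)
fit_1d <- lm(y ~ timeberel + lengthrel)

print(coef(fit_1e))                     # age is reported as NA
print(all.equal(fitted(fit_1e), fitted(fit_1d)))
```

Dropping the redundant variable leaves the fitted values unchanged, which is why 1e repeats the estimates from 1d.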

MODEL 2

2. a) Making a regression for model2 in RStudio:

> model2 <- lm(nmbrpartners ~ female + nmbrcohabitant + reg2 + reg3 + drunkdummy + homobi +
edu2 + edu3 + age + debutage + single + children, data = sexb, subset = (nmbrpartners>0))

> summary(model2)

Call:
lm(formula = nmbrpartners ~ female + nmbrcohabitant + reg2 +
reg3 + drunkdummy + homobi + edu2 + edu3 + age + debutage +
single + children, data = sexb, subset = (nmbrpartners >
0))

Residuals:
Min 1Q Median 3Q Max
-41.27 -7.85 -2.52 2.82 763.24

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.88119 3.92999 6.077 1.43e-09 ***
female -7.69157 1.10841 -6.939 5.12e-12 ***
nmbrcohabitant 5.68312 0.68938 8.244 2.78e-16 ***
reg2 -2.90352 1.34557 -2.158 0.031045 *
reg3 -3.40169 1.39393 -2.440 0.014748 *
drunkdummy 6.50534 1.50032 4.336 1.51e-05 ***
homobi -1.61647 3.51650 -0.460 0.645789
edu2 3.03007 1.45789 2.078 0.037784 *
edu3 5.33388 1.67270 3.189 0.001448 **
age 0.16949 0.04904 3.456 0.000558 ***
debutage -1.51317 0.18112 -8.354 < 2e-16 ***
single 4.56887 1.41635 3.226 0.001274 **
children 0.26029 1.45084 0.179 0.857632
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 25.4 on 2281 degrees of freedom


(452 observations deleted due to missingness)
Multiple R-squared: 0.1117, Adjusted R-squared: 0.107
F-statistic: 23.9 on 12 and 2281 DF, p-value: < 2.2e-16
2. b)
Model 2 is restricted to respondents with at least one sex partner (subset = nmbrpartners > 0).
The reason is that the coefficients are interpreted as differences in the number of sex
partners among people who have had sex partners. If respondents with zero partners were
included, the model would treat them as if they belonged to the same group, even though they
have never had a partner, and this would distort the estimates. When we exclude all
respondents with zero sex partners, we know for sure that we only include those who have had
at least one sex partner. For example, women are predicted to have about 7.7 fewer sex
partners than men, and this comparison covers only the women and men in the dataset who have
had at least one sex partner. I therefore conclude that respondents with zero sex partners
should not be included in the models. Including them would be necessary if the dependent
variable were a different variable, which is not the case here.

2. c) Predicted number of partners for a 24-year-old woman who has had homosexual
intercourse, is in no relationship at present, has a high educational level (edu3), lives in a big
city (reg1), made her sex debut at the age of 18, has been cohabitant once, has no
children, and has been sober during the last month:

> 23.88119 - 7.69157*1 + 5.68312*1 - 2.90352*0 - 3.40169*0 + 6.50534*0 - 1.61647*1 +
3.03007*0 + 5.33388*1 + 0.16949*24 - 1.51317*18 + 4.56887*1 + 0.26029*0

[1] 6.98972

The predicted number of sex partners is approximately 7 (6.99).

2. d) Calculating the same as above with the exception that the woman has been drunk during
the last month.

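> 23.88119 - 7.69157*1 + 5.68312*1 - 2.90352*0 - 3.40169*0 + 6.50534*1 - 1.61647*1 +
3.03007*0 + 5.33388*1 + 0.16949*24 - 1.51317*18 + 4.56887*1 + 0.26029*0

[1] 13.49506

The predicted number of sex partners is approximately 13.5. The difference from 2c is 6.51,
which is the coefficient for drunkdummy.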
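
The two hand computations in 2c and 2d can also be written with named vectors, which makes it easier to see which value is multiplied by which coefficient. This is a sketch using the estimates printed in 2a; the vector names are chosen here for illustration. With the fitted model object available, predict(model2, newdata = ...) would be the usual route.

```r
# Estimates from the summary of model 2 (2a)
coefs <- c(intercept = 23.88119, female = -7.69157, nmbrcohabitant = 5.68312,
           reg2 = -2.90352, reg3 = -3.40169, drunkdummy = 6.50534,
           homobi = -1.61647, edu2 = 3.03007, edu3 = 5.33388,
           age = 0.16949, debutage = -1.51317, single = 4.56887,
           children = 0.26029)

# Profile from 2c: 24-year-old woman, sober during the last month
profile_sober <- c(intercept = 1, female = 1, nmbrcohabitant = 1,
                   reg2 = 0, reg3 = 0, drunkdummy = 0, homobi = 1,
                   edu2 = 0, edu3 = 1, age = 24, debutage = 18,
                   single = 1, children = 0)

# Profile from 2d: identical except for drunkdummy
profile_drunk <- replace(profile_sober, "drunkdummy", 1)

pred_sober <- sum(coefs * profile_sober[names(coefs)])
pred_drunk <- sum(coefs * profile_drunk[names(coefs)])
print(c(sober = pred_sober, drunk = pred_drunk))
```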

2. e) A graph is made in RStudio with age on the x-axis and the predicted number of sex
partners on the y-axis. The predictions for the profile in 2c are shown by the red line. The
predictions for the profile in 2d are shown by the blue line. Age 24, the age of the
respondent in 2c and 2d, is also marked in the graph.
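
A graph of this kind could be drawn along the following lines. This is only a sketch: it uses the estimates from 2a, while the age range and the dashed marker at age 24 are assumptions made here, not taken from the original figure.

```r
# Estimates from model 2 (2a) that matter for the 2c/2d profile
intercept <- 23.88119
b <- c(female = -7.69157, nmbrcohabitant = 5.68312, homobi = -1.61647,
       edu3 = 5.33388, debutage = -1.51317, single = 4.56887,
       age = 0.16949, drunkdummy = 6.50534)

ages <- 18:70  # assumed plotting range

# Everything in the profile except age and drunkdummy
base <- intercept + b[["female"]] + b[["nmbrcohabitant"]] + b[["homobi"]] +
  b[["edu3"]] + 18 * b[["debutage"]] + b[["single"]]

pred_sober <- base + b[["age"]] * ages          # profile from 2c
pred_drunk <- pred_sober + b[["drunkdummy"]]    # profile from 2d

plot(ages, pred_sober, type = "l", col = "red",
     ylim = range(pred_sober, pred_drunk),
     xlab = "Age", ylab = "Predicted number of sex partners")
lines(ages, pred_drunk, col = "blue")
abline(v = 24, lty = 2)                         # marks age 24
```

Because drunkdummy enters the model without an interaction, the two lines are parallel and 6.51 apart at every age.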
PART 2

MODEL 3
3. a) Rerunning model 2 but adding unfaithful as an independent variable in model 3.
The following commands and output are from RStudio:

> model3 <- lm(nmbrpartners ~ female + nmbrcohabitant + reg2 + reg3 + drunkdummy + homobi +
edu2 + edu3 + age + debutage + single + children + unfaithful, data = sexb, subset =
(nmbrpartners>0))

> summary(model3)

Call:
lm(formula = nmbrpartners ~ female + nmbrcohabitant + reg2 +
reg3 + drunkdummy + homobi + edu2 + edu3 + age + debutage +
single + children + unfaithful, data = sexb, subset = (nmbrpartners >
0))

Residuals:
Min 1Q Median 3Q Max
-42.89 -7.76 -2.10 2.84 760.06

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.05770 4.43066 4.301 1.78e-05 ***
female -6.80892 1.13369 -6.006 2.26e-09 ***
nmbrcohabitant 5.14470 0.76431 6.731 2.20e-11 ***
reg2 -1.97132 1.38877 -1.419 0.155921
reg3 -1.93433 1.42479 -1.358 0.174739
drunkdummy 7.90624 1.62854 4.855 1.30e-06 ***
homobi -3.03437 3.68733 -0.823 0.410655
edu2 4.21209 1.50387 2.801 0.005147 **
edu3 6.14017 1.70915 3.593 0.000336 ***
age 0.13849 0.04944 2.801 0.005143 **
debutage -1.41333 0.19437 -7.271 5.12e-13 ***
single 2.54162 1.64277 1.547 0.121988
children 1.23662 1.46393 0.845 0.398367
unfaithful 5.26935 1.23464 4.268 2.07e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.97 on 1958 degrees of freedom


(774 observations deleted due to missingness)
Multiple R-squared: 0.1345, Adjusted R-squared: 0.1287
F-statistic: 23.4 on 13 and 1958 DF, p-value: < 2.2e-16

In model 3 the intercept is 19.05770, compared with 23.88119 in 2a. The intercept is the
predicted number of partners when all independent variables are zero. It changes because
unfaithful is now controlled for and because the model is estimated on fewer observations.
The new variable unfaithful has a coefficient of 5.26935 (p = 2.07e-05), so respondents who
have been unfaithful are predicted to have about 5.27 more partners, holding the other
variables constant. The other coefficients also differ slightly from 2a. For example, the
coefficient for female was -7.69157 in 2a and is now -6.80892, meaning that female
respondents are predicted to have 6.81 fewer partners than male respondents when all other
variables are held constant. In addition, reg2, reg3 and single are no longer significant at
the 0.05 level.

In model 3, more observations are deleted due to missingness (774 compared with 452 in 2a).
The Multiple R-squared went from 0.1117 in 2a to 0.1345 in model 3, which means that 13.45%
of the variability in the dependent variable is explained by the independent variables. The
adjusted R-squared is a more suitable measure for comparing the two models because it takes
the number of estimated coefficients into account, and we have now added unfaithful. The
adjusted R-squared has changed from 0.107 in 2a to 0.1287 in model 3. The F-statistics are
almost the same: 23.4 on 13 and 1958 degrees of freedom in model 3, compared with 23.9 on 12
and 2281 degrees of freedom in 2a. The F-statistic tests whether the regression model fits
the data better than a model with no predictors, and a larger value is stronger evidence
against the hypothesis that all slope coefficients are zero. Lastly, the p-value is very small
in both 2a and model 3 (< 2.2e-16), indicating that both regression models are statistically
significant overall.
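
The reported degrees of freedom can be used to recover the number of observations behind each model and to check the adjusted R-squared values. The helper adj_r2() below is written here for illustration; it applies the standard formula 1 - (1 - R2)(n - 1)/(residual df) to the rounded figures printed in the two summaries.

```r
# Adjusted R-squared from R-squared, residual df and number of coefficients
# (number of coefficients includes the intercept)
adj_r2 <- function(r2, resid_df, n_coef) {
  n <- resid_df + n_coef
  1 - (1 - r2) * (n - 1) / resid_df
}

# Observations used: residual df + estimated coefficients
n_model2 <- 2281 + 13   # model 2 (2a)
n_model3 <- 1958 + 14   # model 3

adj_model2 <- adj_r2(0.1117, 2281, 13)
adj_model3 <- adj_r2(0.1345, 1958, 14)

print(c(n_model2 = n_model2, n_model3 = n_model3))
print(round(c(adj_model2 = adj_model2, adj_model3 = adj_model3), 4))
```

The two models are fitted to samples of different size, so differences in coefficients and R-squared between 2a and model 3 reflect both the added variable and the changed sample.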

3. b)
To compare the number of partners between women and men while controlling for other
independent variables in the model, we need to look at the coefficient for the female variable.
In the provided model 3, the coefficient for female is -6.80892.

This coefficient represents the difference in the expected number of partners between women
and men, controlling for the other variables in the model. Since the coefficient is negative, it
suggests that, on average, women have fewer partners compared to men. Therefore, the
estimate is that women have approximately 6.80892 fewer partners than men, after accounting
for the effects of other variables in the model.
3. c)
To determine the difference in the number of partners between individuals with education
class 3 compared to those with education class 2, while controlling for other independent
variables in the model, we look at the coefficients for the variables representing these
education levels: edu2 and edu3.

In the provided model 3, the coefficients for edu2 and edu3 are 4.21209 and 6.14017.
The difference in the number of partners between education class 3 and education class 2
individuals can be calculated as follows:
> 6.14017 - 4.21209
[1] 1.92808

So, individuals with education class 3 have approximately 1.92808 more partners compared
to those with education class 2, controlling for other variables in the model.
Whether this difference is statistically significant has to be tested. This is done with
a linear hypothesis test in RStudio as follows:

> library(car) # provides linearHypothesis()
> linearHypothesis(model3, "edu3 - edu2")


Linear hypothesis test

Hypothesis:
- edu2 + edu3 = 0

Model 1: restricted model
Model 2: nmbrpartners ~ female + nmbrcohabitant + reg2 + reg3 + drunkdummy +
    homobi + edu2 + edu3 + age + debutage + single + children +
    unfaithful

  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1   1959 1125898
2   1958 1124626  1    1272.4 2.2153 0.1368

The p-value associated with the F-statistic for this hypothesis test is p = 0.1368. Since the
p-value is greater than the significance level (usually 0.05), we fail to reject the hypothesis
that the coefficients of edu3 and edu2 are equal. Therefore, we do not have sufficient
evidence to conclude that there is a statistically significant difference between the
coefficients of edu3 and edu2 at the 0.05 level of significance.

Based on this test, we cannot say with confidence that there is a significant difference in the
number of sex partners between individuals with education class 3 and education class 2,
controlling for the other variables in the model.
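
The F-statistic and p-value in the test output follow from the two residual sums of squares. The sketch below recomputes them from the rounded RSS values printed above, so the result matches the output only up to rounding; the variable names are chosen here for illustration.

```r
# Values printed in the linear hypothesis test for edu3 - edu2 = 0
rss_restricted <- 1125898   # model with edu2 and edu3 forced to be equal
rss_full       <- 1124626   # model 3
df_full        <- 1958      # residual degrees of freedom of model 3
q              <- 1         # number of restrictions tested

# F = (increase in RSS per restriction) / (residual variance of full model)
f_stat  <- ((rss_restricted - rss_full) / q) / (rss_full / df_full)
p_value <- pf(f_stat, q, df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```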

3. d) To compare how many partners a 25-year-old person has had with someone aged 30,
the following computation shows the difference:
> 0.13849 * 25 - 0.13849 * 30
[1] -0.69245
According to model 3, when all other variables are held constant, someone aged 25 is
predicted to have had 0.69 fewer partners than a 30-year-old.

3. e) To compare how many partners a 55-year-old person has had with someone aged 60,
the computation is the following:

> 0.13849 * 55 - 0.13849 * 60
[1] -0.69245
According to model 3, when all other variables are held constant, someone aged 55 is
predicted to have had 0.69 fewer partners than a 60-year-old.

3. f)
The results in 3d and 3e compare different ages, yet the computed difference is the same for
25- and 30-year-old respondents as for 55- and 60-year-old respondents. The reason is that age
enters model 3 as a single linear term: each additional year adds 0.13849 predicted partners
regardless of the starting age, so any five-year gap gives 5 * 0.13849 = 0.69 when the other
independent variables are held constant. This is not very reasonable, since it implies that
the difference in number of partners depends only on the size of the age gap and never on the
ages being compared. Controlling for further variables, such as employment or income, could
change the size of the age coefficient, but the gap would still be the same at every age. To
let the difference vary with age, the model would need a non-linear age effect, for example a
squared age term or age categories.
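
The equal gaps in 3d and 3e follow directly from the linear age term. The sketch below first repeats that computation with the age coefficient from model 3, and then shows how a squared age term would break the equality. The quadratic coefficients b1 and b2 are made-up numbers for illustration only; they are not estimated from the data.

```r
# Linear age effect, coefficient from model 3
b_age <- 0.13849
gap_linear <- function(a1, a2) b_age * a1 - b_age * a2

gap_linear(25, 30)   # same value as in 3d
gap_linear(55, 60)   # same value as in 3e

# Hypothetical quadratic age effect: b1 * age + b2 * age^2
b1 <- 0.5      # assumed, not estimated
b2 <- -0.004   # assumed, not estimated
gap_quadratic <- function(a1, a2) (b1 * a1 + b2 * a1^2) - (b1 * a2 + b2 * a2^2)

print(c(linear_25_30 = gap_linear(25, 30), linear_55_60 = gap_linear(55, 60)))
print(c(quad_25_30 = gap_quadratic(25, 30), quad_55_60 = gap_quadratic(55, 60)))
```

With the squared term the five-year gap depends on where on the age scale it is measured, which a purely linear specification cannot show.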

MODEL 4

4. a) To study the predicted difference in number of sexual partners between men who
have been unfaithful and men who have not been unfaithful, I add an interaction
between female and unfaithful to the model in question 3a.

> model4 <- lm(nmbrpartners ~ female * unfaithful + nmbrcohabitant + reg2 + reg3 +
drunkdummy + homobi + edu2 + edu3 + age + debutage + single + children,
data = sexb, subset = (nmbrpartners > 0))

> summary(model4)

Call:
lm(formula = nmbrpartners ~ female * unfaithful + nmbrcohabitant +
reg2 + reg3 + drunkdummy + homobi + edu2 + edu3 + age + debutage +
single + children, data = sexb, subset = (nmbrpartners >
0))

Residuals:
Min 1Q Median 3Q Max
-44.99 -7.29 -2.13 2.71 758.37

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.59572 4.44858 3.955 7.91e-05 ***
female -4.54624 1.36068 -3.341 0.000850 ***
unfaithful 8.35914 1.60738 5.200 2.20e-07 ***
nmbrcohabitant 5.21500 0.76312 6.834 1.10e-11 ***
reg2 -1.83418 1.38671 -1.323 0.186096
reg3 -1.75245 1.42320 -1.231 0.218342
drunkdummy 7.75983 1.62598 4.772 1.96e-06 ***
homobi -2.67639 3.68180 -0.727 0.467359
edu2 4.03786 1.50195 2.688 0.007240 **
edu3 5.95518 1.70681 3.489 0.000495 ***
age 0.13182 0.04939 2.669 0.007671 **
debutage -1.38803 0.19416 -7.149 1.23e-12 ***
single 2.44613 1.63975 1.492 0.135921
children 1.24335 1.46096 0.851 0.394847
female:unfaithful -7.02812 2.34797 -2.993 0.002795 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.92 on 1957 degrees of freedom


(774 observations deleted due to missingness)
Multiple R-squared: 0.1384, Adjusted R-squared: 0.1322
F-statistic: 22.45 on 14 and 1957 DF, p-value: < 2.2e-16

> model4$coefficients["unfaithful"]
unfaithful
8.359136

To find the predicted difference in the number of sexual partners between men who have been
unfaithful and men who have not been unfaithful based on model4, we look at the coefficient
for unfaithful, which is 8.35914. This means that, holding all other variables constant, men who
have been unfaithful are predicted to have approximately 8.36 more sexual partners compared
to men who have not been unfaithful.

We determine whether the difference is statistically significant by examining the p-values
associated with the coefficients in the model. In model4, the coefficient for unfaithful has a
p-value of 2.20e-07, which is much smaller than 0.05. This indicates that the
coefficient for unfaithful is statistically significant.

4. b) Checking if there is a significant difference between men who have been unfaithful and
women who have been unfaithful. That difference is female + female:unfaithful, which is
computed and tested in 4c. The sum computed here, unfaithful + female:unfaithful, is the
effect of unfaithfulness among women:

> model4$coefficients["unfaithful"]+model4$coefficients["female:unfaithful"]
unfaithful
1.331019
The coefficient for unfaithful (8.36) applies to men, and adding the interaction (-7.03)
gives the corresponding difference for women. The sum (1.331) suggests that, on average,
women who have been unfaithful are predicted to have approximately 1.33 more sexual partners
than women who have not been unfaithful, holding the other variables constant. We check
whether the sum differs significantly from zero:

> linearHypothesis(model4, "unfaithful + female:unfaithful")


Linear hypothesis test

Hypothesis:
unfaithful + female:unfaithful = 0

Model 1: restricted model
Model 2: nmbrpartners ~ female * unfaithful + nmbrcohabitant + reg2 +
    reg3 + drunkdummy + homobi + edu2 + edu3 + age + debutage +
    single + children

  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1   1958 1119812
2   1957 1119500  1     311.9 0.5452 0.4604

OR

> library(multcomp) # provides glht()
> test2 <- glht(model4, linfct = c("unfaithful + female:unfaithful=0"))
> summary(test2)

Simultaneous Tests for General Linear Hypotheses

Fit: lm(formula = nmbrpartners ~ female * unfaithful + nmbrcohabitant +
reg2 + reg3 + drunkdummy + homobi + edu2 + edu3 + age + debutage +
single + children, data = sexb, subset = (nmbrpartners >
0))

Linear Hypotheses:
                                     Estimate Std. Error t value Pr(>|t|)
unfaithful + female:unfaithful == 0     1.331      1.803   0.738     0.46
(Adjusted p values reported -- single-step method)

The sum is not significant (p = 0.46 in both tests, above the conventional 0.05 level), so
we fail to reject the hypothesis that women who have been unfaithful and women who have not
been unfaithful have the same predicted number of sexual partners, holding the other
variables constant.

From 4a, men who have been unfaithful are predicted to have approximately 8.36 more sexual
partners than men who have not been unfaithful. The interaction coefficient shows that this
effect of unfaithfulness is approximately 7.03 partners smaller for women than for men. The
predicted difference between unfaithful women and unfaithful men is the sum
female + female:unfaithful = -4.55 - 7.03 = -11.57, which is tested in 4c
(p = 3.657e-09). Thus, based on the model, there is a significant difference between men and
women who have been unfaithful: men who have been unfaithful are predicted to have
approximately 11.57 more partners than women who have been unfaithful.

We determine whether the effect of unfaithfulness differs between men and women by
examining the p-value of the interaction coefficient. In model4 seen in 4a, the coefficient for
female:unfaithful has a p-value of 0.002795, which is less than the conventional significance
level of 0.05. This indicates that the effect of unfaithfulness on the number of partners is
significantly different for women and men.

4. c) The difference between women who have been unfaithful and women who have not been
unfaithful is unfaithful + female:unfaithful = 1.33, which was computed and tested in 4b
(p = 0.46, not significant). The sum computed here, female + female:unfaithful, is the
difference between unfaithful women and unfaithful men:

> model4$coefficients["female:unfaithful"]+model4$coefficients["female"]
female:unfaithful
-11.57436
The computation shows that the predicted difference in the number of sexual partners
between women who have been unfaithful and men who have been unfaithful is
approximately -11.57. This indicates that, controlling for other variables in the model, women
who have been unfaithful are predicted to have approximately 11.57 fewer sexual partners
than men who have been unfaithful.

Checking the significance of the difference with two different tests:


> linearHypothesis(model4, "female:unfaithful + female")
Linear hypothesis test

Hypothesis:
female + female:unfaithful = 0

Model 1: restricted model
Model 2: nmbrpartners ~ female * unfaithful + nmbrcohabitant + reg2 +
    reg3 + drunkdummy + homobi + edu2 + edu3 + age + debutage +
    single + children

  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1   1958 1139590
2   1957 1119500  1     20090 35.119 3.657e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

OR

> test2 <- glht(model4, linfct = c("female:unfaithful + female=0"))
> summary(test2)

Simultaneous Tests for General Linear Hypotheses

Fit: lm(formula = nmbrpartners ~ female * unfaithful + nmbrcohabitant +
reg2 + reg3 + drunkdummy + homobi + edu2 + edu3 + age + debutage +
single + children, data = sexb, subset = (nmbrpartners >
0))

Linear Hypotheses:
                                 Estimate Std. Error t value Pr(>|t|)
female:unfaithful + female == 0  -11.574      1.953  -5.926 3.66e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

The difference is significant according to both the linear hypothesis test and the glht test.
The results show that the sum female + female:unfaithful is statistically different from
zero, with Pr(>F) = 3.657e-09 and Pr(>|t|) = 3.66e-09 (p < 0.001). These tests suggest that
there is a significant difference between women who have been unfaithful and men who have
been unfaithful in terms of their number of sexual partners. The difference between women
who have been unfaithful and women who have not (1.33) was tested in 4b and is not
significant (p = 0.46).

4. d)
According to model 4, women who have been unfaithful are predicted to have only about 1.33
more partners than women who have not been unfaithful (unfaithful + female:unfaithful), and
this difference is not significant (p = 0.46). The figure of -11.57 from 4c compares
unfaithful women with unfaithful men, not with women who have not been unfaithful. The small
difference among women seems a bit odd. From a theoretical point of view, I would expect
women who have been unfaithful to have clearly more partners, since unfaithfulness implies
seeking other partners and the respondent has most likely not stayed with the same partner
for a long time. For men the corresponding difference is 8.36 and significant. One thing one
could examine to understand the empirical result better is marital status. Women who are or
have been married may have had fewer sexual partners regardless of their unfaithfulness,
whereas women who have not been unfaithful and have never been married may have had more
sexual partners during their lifetime.

4. e)
Yes, the effect of unfaithfulness on the number of partners is different for men and women.
From 4a, unfaithful men are predicted to have about 8.36 more sexual partners than men who
have not been unfaithful, while the corresponding difference for women is only about 1.33
(8.36 - 7.03) and is not significant. The difference between these two effects is the
interaction coefficient female:unfaithful (-7.03), which is significant (p = 0.002795). As a
result, men who have been unfaithful are predicted to have approximately 11.57 more partners
than women who have been unfaithful.
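
The comparisons in 4a to 4e can be laid out as four group predictions. The sketch below uses the three relevant estimates from model 4 and sets all other variables aside, since they cancel in every difference. The helper cell() and the object names are chosen here for illustration.

```r
# Estimates from the summary of model 4
b <- c(female = -4.54624, unfaithful = 8.35914, interaction = -7.02812)

# Predicted contribution for a given combination of the two dummies,
# relative to men who have not been unfaithful
cell <- function(female, unfaithful) {
  b[["female"]] * female +
    b[["unfaithful"]] * unfaithful +
    b[["interaction"]] * female * unfaithful
}

effect_men     <- cell(0, 1) - cell(0, 0)  # unfaithful vs faithful men
effect_women   <- cell(1, 1) - cell(1, 0)  # unfaithful vs faithful women
gap_unfaithful <- cell(1, 1) - cell(0, 1)  # unfaithful women vs unfaithful men
gap_faithful   <- cell(1, 0) - cell(0, 0)  # faithful women vs faithful men

print(round(c(effect_men = effect_men, effect_women = effect_women,
              gap_unfaithful = gap_unfaithful, gap_faithful = gap_faithful), 2))
```

The interaction coefficient is the difference between effect_women and effect_men, and equally the difference between gap_unfaithful and gap_faithful.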
