
COVER PAGE FOR A WRITTEN EXAM/TEST

Mandatory part:

Course name : Statistiek voor Pre-master

Course code : 350931

Exam date : 6 December 2017

Exam duration : 2 hours

Lecturer : Daniël Roelfsema (internal tel.: 013-4668475)

Students are expected to behave properly during the exam and to follow the instructions
of the examiner and the invigilator. Any fraud that is detected will be dealt with
severely.

Optional part:
- Open-book exam: only an English-Dutch (EN-NE) dictionary is allowed
- Use of a regular calculator: yes; only the Casio FX82 series and the Texas
Instruments TI30 series are allowed. All
other types/brands are NOT allowed.
- Use of a graphing calculator: no
- Double-sided printing: yes
- Scrap paper available: yes
- Do not forget to fill in your name and student number (SNR, 7 digits) on the
answer form.
- Write your answer (including your working) in the designated box on the
answer form. If an explanation in words is asked for, give it
in English.
- If you round (including intermediate results), round to 4 decimals.
- If you cannot answer a question but need that answer for
a later question, choose a reasonable value (and write this value down clearly,
so that it can be taken into account during grading).
- You may use scrap paper to work out the exercises. Scrap paper is not
graded.
- If you want to correct an answer, clearly cross out the wrong answer
and write down the new answer. If necessary, you can draw a new box around
this answer to make clear that it is the answer that
should be graded.
- The exam consists of 5 exercises (on 7 pages); your grade is the number of points
obtained divided by 4.3.
- There are two appendices: a formula sheet (1 page) and an appendix with the tables
of the various distributions (9 pages). If, when using the tables,
the required number of degrees of freedom is not in the table, use the
nearest number of degrees of freedom.
- The date and time of the exam review will be announced via BlackBoard.

All students must hand in the answer form, the scrap paper and the exam!

Exercise 1 (2+3 = 5 points)
A factory produces parts for vacuum cleaners. To produce the tubes for the vacuum
cleaners, two machines are used, machine A and B.

Unfortunately, sometimes the machines go down (don’t function) during the day. Let X be
the number of times that machine A goes down during a specific day, and let Y be the
number of times machine B goes down during a specific day. In Table 1 you can find the
joint probability distribution of X and Y.

                 Y
  X        0       1       2
  0      0.2    0.05    0.15
  1     0.15    0.05     0.1
  2     0.05     0.1    0.15
Table 1: Joint probability distribution of X and Y

Questions:
a. Determine $P(\{X \ge 1\} \cap \{Y \le 1\})$ and $P(\{X \ge 1\} \cup \{Y \le 1\})$.
b. Calculate $\sigma_{X,Y}$ and $\rho_{X,Y}$. You may use that $E(X) = 0.9$, $E(Y) = 1$, $E(X^2) = 1.5$
and $E(Y^2) = 1.8$.
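
For checking hand calculations, the quantities asked for in Exercise 1 can be computed directly from Table 1. The Python sketch below is not part of the exam. It assumes that the rows of Table 1 correspond to X and the columns to Y (this reproduces the given E(X) = 0.9 and E(Y) = 1), and all variable names are illustrative.

```python
from math import sqrt

# Joint probabilities from Table 1: joint[x][y] = P(X = x, Y = y).
# Assumption: rows are values of X, columns are values of Y.
joint = [
    [0.20, 0.05, 0.15],  # X = 0
    [0.15, 0.05, 0.10],  # X = 1
    [0.05, 0.10, 0.15],  # X = 2
]
cells = [(x, y, p) for x, row in enumerate(joint) for y, p in enumerate(row)]

# Question a: add up the cells that belong to each event.
p_intersection = sum(p for x, y, p in cells if x >= 1 and y <= 1)
p_union = sum(p for x, y, p in cells if x >= 1 or y <= 1)

# Question b: sigma_XY = E(XY) - E(X)E(Y) and
# rho_XY = sigma_XY / (sigma_X * sigma_Y), with Var = E(X^2) - E(X)^2.
e_x, e_y, e_x2, e_y2 = 0.9, 1.0, 1.5, 1.8  # moments given in the question
e_xy = sum(x * y * p for x, y, p in cells)
cov_xy = e_xy - e_x * e_y
rho_xy = cov_xy / sqrt((e_x2 - e_x**2) * (e_y2 - e_y**2))

print(round(p_intersection, 4), round(p_union, 4))
print(round(cov_xy, 4), round(rho_xy, 4))
```

The union can also be checked with the addition rule, P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which should give the same value as the direct cell sum.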

Exercise 2 (5 points)
A store owner wants to have a better understanding of ‘who her customers are’. She is
particularly interested in the age of the customers, here denoted with the variable AGE (in
years). To improve her understanding, she gathered data for a random sample of 101
customers and imported all the data into SPSS. In Table 2 you can find the descriptive
statistics of AGE.

Descriptive Statistics

                       N     Minimum   Maximum   Mean     Std. Deviation
AGE                    101   10        68        40.27    10.003
Valid N (listwise)     101
Table 2: Descriptive statistics of AGE

Question:
Test whether the population standard deviation of AGE is larger than 9.7. Use 𝛼 = 0.01
and write down all five steps of the test procedure.
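
As an aid for checking the arithmetic of the test statistic, the sketch below evaluates the usual chi-square statistic for a test on a variance from the values in Table 2. It is not part of the exam. It assumes that AGE is (approximately) normally distributed in the population, which the chi-square test requires, and it leaves the critical value to the distribution tables in the appendix.

```python
n = 101         # sample size (Table 2)
s = 10.003      # sample standard deviation of AGE (Table 2)
sigma_0 = 9.7   # hypothesised population standard deviation

# Under H0 (and assuming a normal population), (n - 1) * S^2 / sigma_0^2
# follows a chi-square distribution with n - 1 degrees of freedom.
df = n - 1
chi2_stat = df * s**2 / sigma_0**2

print(df, round(chi2_stat, 4))
```

Because the alternative hypothesis is "larger than 9.7", the statistic is compared with the upper-tail critical value for the chosen 𝛼.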

Exercise 3 (1+5+2+2+3+1= 14 points)

In this exercise we analyse the amount of time someone remains unemployed after losing
his/her job. Below you can find the results (SPSS output) of a study that analyses a random
sample of 50 manufacturing workers. The data include the following variables:

Weeks = ‘unemployment duration (in weeks)’ = ‘number of weeks a worker has
been jobless due to a layoff’ (layoff = temporary unemployment).
Age = ‘the age of the worker (in years)’
Educ = ‘the number of years of education of the worker’
Tenure = ‘the number of years in the previous job’
Married : dummy variable; 1 if married, 0 otherwise
Head : dummy variable; 1 if the head of household, 0 otherwise
Manager : dummy variable; 1 if management role, 0 otherwise
Sales : dummy variable; 1 if working at sales department, 0 otherwise

Weeks is the dependent variable and all other variables are used as independent variables.

ANOVA^a

Model            Sum of Squares   df   Mean Square   F     Sig.
1  Regression    16250.440         7   …             …     ...
   Residual      11227.080        42   267.311
   Total         27477.520        49

a. Dependent Variable: Weeks
b. Predictors: (Constant), Sales, Head, Tenure, Educ, Manager, Age, Married
Table 3: ANOVA table for the model of unemployment duration

Coefficients^a

                 Unstandardized Coefficients   Standardized Coefficients
Model            B           Std. Error        Beta                        t         Sig.
1  (Constant)     22.851     18.867                                         1.211    .233
   Age             1.509       .304             .577                        4.964    .000
   Educ            -.613       .936            -.070                        -.655    .516
   Married       -10.743      6.012            -.210                       -1.787    .081
   Head          -19.779      5.837            -.394                       -3.389    .002
   Tenure           .426       .467             .105                         .913    .366
   Manager       -26.742      8.326            -.342                       -3.212    .003
   Sales         -18.561      6.281            -.304                       -2.955    .005

a. Dependent Variable: Weeks


Table 4: Coefficient estimates for the model of unemployment duration

Questions
a. Write down the basic assumption of the model for which the estimates of the
coefficients are given in Table 4.
b. Test if the model is useful. Use 𝛼 = 1% and write down all five steps of the testing
procedure.
c. Give a precise interpretation of the estimated coefficient of the variable Age.
d. Write down which variables are individually significant, and show that they are
significant by using p-values. Use 𝛼 = 5%.
e. An employee of an unemployment agency conjectures that becoming one year older
on average leads to being at least one more week unemployed, ceteris paribus. Test
whether the data support this conjecture using a 95% confidence interval. Write
down the general formula of the confidence interval, the interval obtained for the
sample and carefully draw your conclusion.
f. Use the model to predict the unemployment duration (Weeks) of a 45-year-old
married man who is head of the household, worked at a sales department for 12
years, did not have a management role and has 10 years of education.
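
For checking the arithmetic in questions b and f, the F statistic and the prediction can be computed from Tables 3 and 4. The sketch below is not part of the exam. The dictionary `worker` is an assumed encoding of the person described in question f (Tenure = 12, Sales = 1, Manager = 0), and all names are illustrative.

```python
# Question b: F = MSR / MSE, with the values taken from Table 3.
ssr, df_regression = 16250.440, 7
mse, df_residual = 267.311, 42
f_stat = (ssr / df_regression) / mse

# Question f: plug the worker's characteristics into the fitted equation.
coefficients = {  # column B of Table 4
    "Constant": 22.851, "Age": 1.509, "Educ": -0.613, "Married": -10.743,
    "Head": -19.779, "Tenure": 0.426, "Manager": -26.742, "Sales": -18.561,
}
worker = {  # assumed encoding of the worker described in question f
    "Constant": 1, "Age": 45, "Educ": 10, "Married": 1,
    "Head": 1, "Tenure": 12, "Manager": 0, "Sales": 1,
}
predicted_weeks = sum(coefficients[name] * worker[name] for name in coefficients)

print(round(f_stat, 4), round(predicted_weeks, 4))
```

The F statistic is compared with the critical value of the F distribution with 7 and 42 degrees of freedom from the appendix tables.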

Exercise 4 (2+5+2= 9 points)
A mobility consultant performed a research study on the fuel consumption of cars. He
gathered information about 64 cars. In his study, he used the following explanatory
variables: weight (weight of the car in kg), length (length of the car in cm) and price (price of the car
in dollars), for explaining the dependent variable mpg (miles per gallon).

Below you can find three ANOVA tables he produced while performing his regression
analysis. After running the regression model with all three independent variables, he finds
$r^2 = 0.738$.

ANOVA^a

Model            Sum of Squares   df   Mean Square   F         Sig.
1  Regression    …                 1   …             163.683   .000^b
   Residual      452.634          62   7.301
   Total         …                63

a. Dependent Variable: mpg
b. Predictors: (Constant), weight

ANOVA^a

Model            Sum of Squares   df   Mean Square   F         Sig.
1  Regression    …                 2   …             76.641    .000^b
   Residual      469.030          61   7.689
   Total         …                63

a. Dependent Variable: mpg
b. Predictors: (Constant), length, price

ANOVA^a

Model            Sum of Squares   df   Mean Square   F         Sig.
1  Regression    …                 3   …             56.417    .000^b
   Residual      431.215          60   7.187
   Total         …                63

a. Dependent Variable: mpg
b. Predictors: (Constant), price, length, weight
Table 5: ANOVA tables of regressions concerning fuel consumption

Questions:
a. What is the SST of the model that regresses mpg on price, length and weight?
b. Are the two variables price and length jointly useful within the model in which all
the independent variables are used? Use 𝛼 = 5% and write down all five steps of the
test procedure.
c. Explain two possible model violations of this regression model and describe how
they can be detected.
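
For checking the arithmetic in questions a and b, the sketch below recovers SST from the reported r² and computes the partial F statistic from the residual sums of squares in Table 5. It is not part of the exam. It assumes that the reduced model for question b is the regression on weight only, and it uses r² as rounded in the text, so SST is approximate.

```python
r2_full = 0.738        # r^2 of the model with weight, length and price
sse_full = 431.215     # residual sum of squares, model with all three variables
mse_full = 7.187       # residual mean square, model with all three variables
sse_reduced = 452.634  # residual sum of squares, model with weight only

# Question a: r^2 = 1 - SSE / SST, so SST = SSE / (1 - r^2).
sst = sse_full / (1 - r2_full)

# Question b: partial F statistic for dropping price and length
# (2 restrictions) from the model with all three variables.
restrictions = 2
partial_f = ((sse_reduced - sse_full) / restrictions) / mse_full

print(round(sst, 3), round(partial_f, 4))
```

The partial F statistic is compared with the critical value of the F distribution with 2 and 60 degrees of freedom.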

Exercise 5 (6+4= 10 points)
To investigate variation in gross hourly wages, we study the wages of 150 randomly
selected participants in the labour force. In addition to the wage, we also observe the age
and education level of each individual in the sample. We focus on differences in the average
wage across education levels. The variable W denotes the gross hourly wage in Euros and
the education level is measured by the variable Edl. Education is measured in levels ranging
from 1 (lowest) to 5 (highest). In Table 6 you can find the results of an analysis performed
in SPSS:

Report

Edl      Mean       N     Std. Deviation
1        17.1625     24    6.51297
2        16.8031     36    6.49230
3        22.1466     53    8.62589
4        24.7448     25    6.65768
5        38.2017     12   17.84314
Total    21.7841    150   10.28203
Table 6: Mean wages (in Euros) at different education levels

Question:
a. Use a hypothesis test with 𝛼 = 10% to draw a conclusion about the conjecture that
the mean gross hourly wage of participants of the labour force with education level
3 is more than 3 Euros higher than the mean gross hourly wage at education level
2. Write down all five steps of the testing procedure. You may use the
equal-variance approach.

Furthermore, it is given that in this sample 12 participants with education level 2 have a
gross hourly wage higher than 20 Euros, whereas 22 participants with education level 3
have a gross hourly wage higher than 20 Euros.

Question:
b. Use a 90% confidence interval to draw a conclusion about the conjecture that the
percentage of participants in the labour market having a gross hourly wage higher
than 20 Euros is higher for participants with education level 3 than for participants
with education level 2. Write down the general formula, the interval obtained for
this sample and carefully draw your conclusion.
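
For checking the arithmetic in questions a and b, the sketch below computes the pooled-variance t statistic and a 90% confidence interval for the difference in proportions from the values given above. It is not part of the exam. It assumes the normal approximation for the proportions and takes the z-value from Python's standard library instead of the appendix tables; all names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from Table 6 for education levels 3 and 2.
mean_3, n_3, sd_3 = 22.1466, 53, 8.62589
mean_2, n_2, sd_2 = 16.8031, 36, 6.49230

# Question a: pooled-variance t statistic for H1: mu_3 - mu_2 > 3.
df = n_3 + n_2 - 2
pooled_var = ((n_3 - 1) * sd_3**2 + (n_2 - 1) * sd_2**2) / df
t_stat = (mean_3 - mean_2 - 3) / sqrt(pooled_var * (1 / n_3 + 1 / n_2))

# Question b: 90% confidence interval for p_3 - p_2 (normal approximation).
p_3, p_2 = 22 / n_3, 12 / n_2
z = NormalDist().inv_cdf(0.95)  # 5% in each tail for a 90% interval
se = sqrt(p_3 * (1 - p_3) / n_3 + p_2 * (1 - p_2) / n_2)
lower = (p_3 - p_2) - z * se
upper = (p_3 - p_2) + z * se

print(df, round(t_stat, 4))
print(round(lower, 4), round(upper, 4))
```

The t statistic is compared with the upper-tail critical value for the degrees of freedom printed by the sketch (or the nearest value in the table), and the conclusion in question b depends on whether the interval contains 0.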

Formulae sheet for Final exam of Statistiek voor Premaster; 2017

General formulae:
- For two events A and B, the conditional probability $P(A \mid B)$ is defined as
$P(A \cap B) / P(B)$.

- Sample covariance (data-version):
$s_{X,Y} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1} \left( \left( \sum_{i=1}^{n} x_i y_i \right) - n \bar{x} \bar{y} \right)$

- Sample variance of the $x$-data:
$s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1} \left( \left( \sum_{i=1}^{n} x_i^2 \right) - n (\bar{x})^2 \right)$

Formulae for simple linear regression ($k = 1$):
- Sum of Squared Errors (data-version): $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

- $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, which equals $b_1^2 (n-1) s_X^2$

- $SST = (n-1) s_Y^2$

- Standard error of $B_1$: $SE(B_1) = S_{B_1} = \dfrac{S}{\sqrt{(n-1) s_X^2}}$
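
The identities on this sheet can be verified numerically. The Python sketch below is not part of the exam and uses a small made-up data set. It computes the slope as b1 = s_XY / s_X², which is the standard least-squares slope but is an assumption here, since it is not printed on the sheet.

```python
from math import isclose

# Made-up illustrative data (not from the exam).
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [2.0, 3.0, 5.0, 4.0, 9.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample covariance and variance: definition versus shortcut form.
cov_def = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
cov_short = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (n - 1)
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
var_x_short = (sum(xi**2 for xi in x) - n * x_bar**2) / (n - 1)
var_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

# Least-squares line y_hat = b0 + b1 * x.
# Assumption: b1 = s_XY / s_X^2 (standard result, not on the formula sheet).
b1 = cov_def / var_x
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sst = (n - 1) * var_y

# Each comparison checks one identity from the sheet (plus SST = SSR + SSE).
print(isclose(cov_def, cov_short), isclose(var_x, var_x_short))
print(isclose(ssr, b1**2 * (n - 1) * var_x), isclose(sst, ssr + sse))
```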
