2019 - 01 - 29 - Second - Partial - Exam - C - Solutions


STUDENT’S SIGNATURE

SECOND PARTIAL EXAM OF STATISTICS


30001 (6045/5047/4038/371/377)
January 29, 2019
Last name First name ID (Matr.) ______________
Course code Degree program Class_____________

VERSION C - SOLUTIONS

Only work appearing inside the spaces provided below will be graded. An outline of the
procedure used to solve each problem and of the calculations performed is required. At the end
of the exam, all sheets (including all scrap paper, WHICH WILL HOWEVER NOT BE
GRADED) must be turned in.
PLEASE ROUND ALL CALCULATIONS TO THE FOURTH DECIMAL DIGIT

(USE THIS SPACE AS ADDITIONAL SCRAP PAPER OR FOR YOUR ANSWERS)


EXERCISE 1 (5 points)
What is the percentage p of young people who have more than one tattoo? To answer this question, a survey is
conducted on a sample of young people, obtaining the following results:
Number of tattoos    Number of young individuals
None                 45
1                    38
2                    27
3 or more            30
a)   Determine the confidence interval for p at 90% level.
b)   Suppose that in the past the percentage of young people with more than one tattoo was 35%; on the basis of the
reported sample, can we assume that this percentage has increased? Answer by specifying the hypotheses to be
tested, calculating the p-value and providing the decision corresponding to a significance level of 0.01.

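a)   The point estimate is:

$\hat{p} = \frac{27 + 30}{140} = \frac{57}{140} = 0.4071$

Since n = 140 is large, the approximate 90% confidence interval for p is:

$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$, where $z_{\alpha/2} = z_{0.05} = 1.645$

then:

$0.4071 \pm 1.645 \sqrt{\frac{0.4071 \cdot 0.5929}{140}} = 0.4071 \pm 0.0683 = [0.3388,\ 0.4754]$
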
b)   The hypotheses to be tested are:

$H_0: p = 0.35$ (or $p \leq 0.35$) versus $H_1: p > 0.35$

the value of the test statistic is:

$z = \frac{\hat{p} - 0.35}{\sqrt{\frac{0.35(1 - 0.35)}{140}}} = 1.4165$

the p-value is:

𝑃(𝑍 > 𝑧) = 𝑃(𝑍 > 1.4165) ≈ 1 − 𝑃(𝑍 < 1.42) = 1 − 0.9222 = 0.0778  

Since the p-value is greater than the level of significance, indeed 0.0778 > 0.01, the null hypothesis is not rejected,
which means that on the basis of the reported sample we cannot assume that the percentage of young people with more
than one tattoo has increased compared to the past.
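
The calculations for parts a) and b) can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the exam. It works with unrounded intermediate values, so the last digits may differ slightly from the hand calculation, which rounds to four decimals.

```python
from math import sqrt
from statistics import NormalDist

# Survey counts from the exercise text
counts = {"none": 45, "1": 38, "2": 27, "3 or more": 30}
n = sum(counts.values())
p_hat = (counts["2"] + counts["3 or more"]) / n  # share with more than one tattoo

# a) approximate 90% confidence interval (normal approximation)
z_crit = NormalDist().inv_cdf(0.95)              # about 1.645
half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)

# b) one-sided test of H0: p = 0.35 against H1: p > 0.35
p0 = 0.35
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)

print(f"p_hat = {p_hat:.4f}, CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"z = {z:.4f}, p-value = {p_value:.4f}, reject at 0.01: {p_value < 0.01}")
```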
EXERCISE 2 (5 points)
The production manager of a company wants to establish whether the productivity of the employees varies according to
the work shift (daytime or evening). To this purpose he extracts a sample of 12 daytime workers, for which he records
an average time to assemble a piece of 74.4 seconds with a standard deviation of 4.3 and a second sample of 20 evening
workers, for which he records an average time of 70.2 seconds with a standard deviation of 5.2.
a)   After having specified the necessary assumptions and the hypotheses to be tested, perform a test at level α = 0.01
to answer the manager's question.
b)   Considering now only the sample of daytime workers, do the data provide empirical evidence that the average
time to assemble a piece is less than 76? (answer using a test of level α = 0.1)

a)   To test the difference between the means, it is necessary to assume that:


i) the two samples are independent,

ii) the two populations are normally distributed with means $\mu_1$ and $\mu_2$,

iii) the variance (not known) in the two populations is the same.

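The hypotheses to be tested are:

$H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$

the test statistic is:

$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

where:
$n_1$ and $\bar{x}_1$ are the number and the sample mean for daytime workers,
$n_2$ and $\bar{x}_2$ are the number and the sample mean for evening workers,
and $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{11 \cdot 4.3^2 + 19 \cdot 5.2^2}{30} = 23.905$.

Therefore:

$t = \frac{74.4 - 70.2}{\sqrt{23.905} \sqrt{\frac{1}{12} + \frac{1}{20}}} = \frac{4.2}{1.7853} = 2.3525$

The critical value is:

$t_{n_1 + n_2 - 2,\ \alpha/2} = t_{30,\ 0.005} = 2.750$

Since $|t| < t_{30,\ 0.005}$, indeed 2.3525 < 2.750, the null hypothesis is not rejected, which
means that we have not found sufficient empirical evidence to state that average productivity changes between daytime
workers and evening workers.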
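
As a numerical check of the pooled-variance test in part a), here is a short sketch in plain Python. The critical value 2.750 is an assumption: it is the tabulated $t_{30,\ 0.005}$ and is hard-coded because the standard library has no t distribution. Variable names are illustrative.

```python
from math import sqrt

# Sample summaries from the exercise text
n1, mean1, sd1 = 12, 74.4, 4.3   # daytime workers
n2, mean2, sd2 = 20, 70.2, 5.2   # evening workers

# Pooled variance under the equal-variance assumption
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)

# Two-sample t statistic with n1 + n2 - 2 = 30 degrees of freedom
t_stat = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

T_CRIT = 2.750  # assumed table value t_{30, 0.005} for a two-sided test at alpha = 0.01
reject = abs(t_stat) > T_CRIT

print(f"pooled variance = {pooled_var:.4f}, t = {t_stat:.4f}, reject H0: {reject}")
```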

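b)   Suppose that the population is normal and we test the following hypotheses:

$H_0: \mu = 76$ versus $H_1: \mu < 76$

Or:

$H_0: \mu \geq 76$ versus $H_1: \mu < 76$

the value of the test statistic is:

$t = \frac{\bar{x} - 76}{s / \sqrt{n}} = \frac{74.4 - 76}{4.3 / \sqrt{12}} = -1.2890$

The critical value is:

$-t_{n - 1,\ \alpha} = -t_{11,\ 0.1} = -1.363$

Since $t > -t_{11,\ 0.1}$, indeed −1.2890 > −1.363, the null hypothesis is not rejected, which means that we have not
found enough empirical evidence to state that the average time to assemble a piece for daytime workers is less than 76
seconds.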
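
The one-sample, left-tailed test in part b) can be checked the same way. This is a sketch under the same caveat: the critical value 1.363 is the tabulated $t_{11,\ 0.1}$, entered by hand as an assumption.

```python
from math import sqrt

# Daytime sample summaries from the exercise text
n, mean, sd = 12, 74.4, 4.3
mu0 = 76.0  # value under the null hypothesis

# One-sample t statistic with n - 1 = 11 degrees of freedom
t_stat = (mean - mu0) / (sd / sqrt(n))

T_CRIT = 1.363  # assumed table value t_{11, 0.1}
reject = t_stat < -T_CRIT  # left-tailed rejection region

print(f"t = {t_stat:.4f}, reject H0: {reject}")
```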
EXERCISE 3 (4 points)
A survey conducted in 2000 had highlighted the following preferences of readers of a certain segment: 30% Italian
narrative, 35% foreign narrative, 25% essay and 10% various. A publisher, in order to decide his editorial line, wants to
understand if these percentages have changed in the last years so he orders an analysis that is conducted on a sample of
readers, providing the following preferences:

Favorite literary genre    Italian narrative    foreign narrative    essay    various
N. preferences             46                   64                   36       28

a)   Write the statistical hypotheses to be tested to answer the publisher's request.


b)   Provide indications regarding the p-value of the test for the specified hypotheses.
c)   Based on the results obtained, what answer can the editor give? Set α = 0.05.

a)   The hypotheses to be tested are:

$H_0: p_1 = 0.30,\ p_2 = 0.35,\ p_3 = 0.25,\ p_4 = 0.10$
$H_1: p_i \neq p_i^0$ for at least one $i$

where:

$p_1$ =  probability that a reader randomly extracted from the population would prefer Italian narrative;
$p_2$ =  probability that a reader randomly extracted from the population would prefer foreign narrative;
$p_3$ =  probability that a reader randomly extracted from the population would prefer essay;
$p_4$ =  probability that a reader randomly extracted from the population would prefer another genre.
while $p_i^0$ indicates the probability specified under $H_0$, for i = 1, 2, 3, 4.

b)   The following table contains the value of the observed frequencies, the theoretical frequencies and the
addends to calculate the value of the chi-square test statistic:

                                   Italian narrative   foreign narrative   essay              various           total
Observed frequency O_i             46                  64                  36                 28                174
Theoretical frequency E_i          174·0.30 = 52.2     174·0.35 = 60.9     174·0.25 = 43.5    174·0.10 = 17.4   174
(O_i − E_i)² / E_i                 0.7364              0.1578              1.2931             6.4575            8.6448

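Hence the value of the test statistic is: $\chi^2 = 8.6448$.

The p-value is $P(\chi^2_3 > 8.6448)$, therefore, from the tables: $\chi^2_{3,\ 0.05} = 7.815 < 8.6448 < 9.348 = \chi^2_{3,\ 0.025}$,

so: 0.025 < p-value < 0.05.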

c)   Since p-value < α = 0.05, the null hypothesis is rejected, which means that the survey provides sufficient
empirical evidence to conclude that readers' preferences have changed compared to 2000.
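
The goodness-of-fit calculation can be reproduced with a short sketch in plain Python. The helper `chi2_sf_df3` is a hypothetical name introduced here; it uses the closed form of the chi-square upper tail, which is valid only for 3 degrees of freedom, so that an exact p-value can be obtained without statistical tables.

```python
from math import erfc, exp, pi, sqrt

# Observed preferences and the proportions from the 2000 survey (null hypothesis)
observed = [46, 64, 36, 28]
null_probs = [0.30, 0.35, 0.25, 0.10]

n = sum(observed)
expected = [n * p for p in null_probs]

# Pearson chi-square statistic, with 4 - 1 = 3 degrees of freedom
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square with exactly 3 d.o.f."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_value = chi2_sf_df3(chi2)

print(f"chi-square = {chi2:.4f}, p-value = {p_value:.4f}, reject at 0.05: {p_value < 0.05}")
```

The exact p-value falls inside the interval bracketed by the table values, which is consistent with the decision to reject at α = 0.05.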
EXERCISE 4 (7 points)
The marketing department of a new chain of stores wants to evaluate the effects of the size of the exhibition space (in
square meters) on the daily sales (in thousands of Euros). For this purpose, a random sample of 12 stores is selected,
obtaining the following results:
$\bar{x} = 382.5$,  $\bar{y} = 2400$,  $\sum_{i=1}^{12} (x_i - \bar{x})^2 = 303017$,  $\sum_{i=1}^{12} (y_i - \bar{y})^2 = 4\,200\,000$,  $\sum_{i=1}^{12} (x_i - \bar{x})(y_i - \bar{y}) = 1\,075\,400$

a)   Write the linear regression model estimated to explain the daily sales as a function of the size of the exhibition
space.
b)   Estimate the variance of the error component of the linear model previously considered.
c)   Do you find it useful to use the size of the exhibition space to make predictions on daily sales? Answer by
using an appropriate test of level α = 0.05 and specifying the potential assumptions to be made.
d)   Based on the estimated model, provide the 90% prediction interval for the daily sales of a single store that has
an exhibition space of 420 square meters.

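a)   The estimate of the slope (angular coefficient) is: $\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{1075400}{303017} = 3.5490$;

The estimate of the intercept is: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2400 - 3.5490 \cdot 382.5 = 1042.5075$.

So the estimated regression model is: $\hat{y}_i = 1042.5075 + 3.5490\, x_i$.

b)   The estimate for the variance of the error is: $s_e^2 = \frac{SSE}{n - 2} = \frac{\sum (y_i - \bar{y})^2 - \hat{\beta}_1 \sum (x_i - \bar{x})(y_i - \bar{y})}{n - 2} = \frac{4200000 - 3.5490 \cdot 1075400}{10} \approx 38340.54$ (using the slope rounded to four decimals).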
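
The estimates in parts a) and b) follow directly from the summary statistics. Below is a minimal sketch in plain Python with illustrative variable names. It keeps the slope unrounded, so the error variance comes out near 38343 rather than the value obtained by hand with the slope rounded to four decimals.

```python
# Summary statistics from the exercise text
n = 12
x_bar, y_bar = 382.5, 2400.0
sxx = 303017.0     # sum of (x_i - x_bar)^2
syy = 4200000.0    # sum of (y_i - y_bar)^2
sxy = 1075400.0    # sum of (x_i - x_bar)(y_i - y_bar)

# a) least-squares slope and intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# b) residual sum of squares and error variance estimate (n - 2 d.o.f.)
sse = syy - b1 * sxy
s2 = sse / (n - 2)

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}, s2 = {s2:.2f}")
```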

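c)   To answer, it is necessary to perform a t-test on the slope (angular coefficient) $\beta_1$.

The hypotheses to be tested are:

$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$

To perform the test it is necessary to assume that the standard assumptions for the model are verified, that is
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with independent and identically distributed error terms $\varepsilon_i$ according to a normal
distribution with zero mean and variance $\sigma^2$.

Before calculating the test statistic we estimate the variance of $\hat{\beta}_1$:

$s_{\hat{\beta}_1}^2 = \frac{s_e^2}{\sum (x_i - \bar{x})^2} = \frac{38340.54}{303017} = 0.1265$

The test statistic is:

$t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = \frac{3.5490}{\sqrt{0.1265}} \approx 9.98$

The critical value is: $t_{n - 2,\ \alpha/2} = t_{10,\ 0.025} = 2.228$.

Since $|t| > t_{10,\ 0.025}$, indeed 9.98 > 2.228, the null hypothesis is rejected, which means that there is a linear
relationship between the size of the exhibition space and the daily sales.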
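
The slope test in part c) can be checked with the sketch below, again in plain Python with illustrative names. The critical value 2.228 is an assumption: it is the tabulated $t_{10,\ 0.025}$, entered by hand. Intermediate values are not rounded, so the statistic may differ from the hand calculation in the second decimal.

```python
from math import sqrt

# Summary statistics from the exercise text
n = 12
sxx, syy, sxy = 303017.0, 4200000.0, 1075400.0

b1 = sxy / sxx                     # estimated slope
s2 = (syy - b1 * sxy) / (n - 2)    # estimated error variance
se_b1 = sqrt(s2 / sxx)             # standard error of the slope

t_stat = b1 / se_b1                # test statistic for H0: beta1 = 0

T_CRIT = 2.228  # assumed table value t_{10, 0.025} for a two-sided test at alpha = 0.05
reject = abs(t_stat) > T_CRIT

print(f"se(b1) = {se_b1:.4f}, t = {t_stat:.4f}, reject H0: {reject}")
```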
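d)   The point estimate for the daily sales of a single store that has an exhibition space of 420 square meters is:

$\hat{y} = 1042.5075 + 3.5490 \cdot 420 = 2533.0875$.

The prediction interval at 90% level is:

$\hat{y} \pm t_{n - 2,\ \alpha/2}\ s_e \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

where $t_{n - 2,\ \alpha/2} = t_{10,\ 0.05} = 1.812$.

So: $2533.0875 \pm 1.812 \cdot \sqrt{38340.54} \cdot \sqrt{1 + \frac{1}{12} + \frac{(420 - 382.5)^2}{303017}} \approx 2533.09 \pm 370.08 = [2163.01,\ 2903.17]$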
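
The prediction interval in part d) can be sketched as follows, in plain Python with illustrative names. The critical value 1.812 is an assumption: it is the tabulated $t_{10,\ 0.05}$, entered by hand. Because nothing is rounded along the way, the bounds may differ from a hand calculation in the first or second decimal.

```python
from math import sqrt

# Summary statistics from the exercise text
n = 12
x_bar, y_bar = 382.5, 2400.0
sxx, syy, sxy = 303017.0, 4200000.0, 1075400.0

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
s_e = sqrt((syy - b1 * sxy) / (n - 2))   # residual standard error

x_new = 420.0                            # exhibition space of the new store
y_hat = b0 + b1 * x_new                  # point prediction

T_CRIT = 1.812  # assumed table value t_{10, 0.05} for a 90% interval

# Standard error for predicting a single new observation
se_pred = s_e * sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
lower, upper = y_hat - T_CRIT * se_pred, y_hat + T_CRIT * se_pred

print(f"y_hat = {y_hat:.2f}, 90% prediction interval = ({lower:.2f}, {upper:.2f})")
```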

EXERCISE 5 (3 points)
Provide the definition of point estimator of a generic parameter θ.
Define and explain the properties of unbiasedness and asymptotic unbiasedness of an estimator.
Let T1 and T2 be two estimators of the same parameter θ with Var (T1) <Var (T2). Is it possible to conclude that T1 is
more efficient than T2?

[See the course material.]

EXERCISE 6 (3 points)
On the basis of which indicator can you compare two multiple linear regression models? Return the formula (explaining
all its components) and justify its use.

[See the course material.]
