STA 2311 Statistical Programming II - MARCH 2016
(a) Distinguish between a matrix and a data frame as used in R, giving the functions used
to create them and their main arguments. (4 marks)
(b) Create a vector named vec containing the numbers 22, 11, 76, 10, and 56. Then ask R
to print the third entry in vec. Throw out the entry 11 from vec, and place the result
in a vector named vec1. (3 marks)
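One possible answer to the task above is sketched here in base R. Dropping the value 11 by logical comparison is just one choice; negative indexing (`vec[-2]`) would work equally well for this vector.

```r
# Create the vector and print its third entry.
vec <- c(22, 11, 76, 10, 56)
print(vec[3])

# Remove the entry equal to 11 and store the result in vec1.
vec1 <- vec[vec != 11]
print(vec1)
```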
(c) You decide to go on vacation to either Nairobi, Kisumu, or Mombasa. The cost of a
7-day vacation to each spot is 10,000, 12,000, and 15,000 dollars respectively. The travel
times to each spot are 13, 21, and 10 hours respectively. Write the R code to create a
data frame named vacation that will output the following:
(4 marks)
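A minimal sketch of such a data frame follows. The expected output is not reproduced in the text above, so the column names `destination`, `cost` and `travel_time` are assumptions chosen for illustration only.

```r
# Column names are assumed; the required layout is not shown in the question text.
vacation <- data.frame(
  destination = c("Nairobi", "Kisumu", "Mombasa"),
  cost        = c(10000, 12000, 15000),  # cost of a 7-day vacation, in dollars
  travel_time = c(13, 21, 10)            # travel time, in hours
)
print(vacation)
```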
√
(d) Show that the equation x2 + 1 + x = 3 has a root in the interval (1, 1.4).
Use the bisection method to obtain an estimate of the root with maximum possible
error 0.025.
Determine how many additional iterations of the bisection process would be required to
reduce the maximum possible error to less than 0.005. (6 marks)
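The bisection steps can be sketched in R as below. The position of the radical in the equation is not recoverable from the text, so the sketch assumes the reading f(x) = sqrt(x^2 + 1) + x - 3; replace `f` if the intended equation differs. The helper `bisect_n` is a hypothetical name. It takes the midpoint of the final interval as the estimate, so the maximum possible error after n bisections of (1, 1.4) is 0.4 / 2^(n + 1).

```r
# ASSUMED reading of the garbled equation: sqrt(x^2 + 1) + x = 3.
f <- function(x) sqrt(x^2 + 1) + x - 3

# Run n bisection steps on [a, b]; return the midpoint and its error bound.
bisect_n <- function(f, a, b, n) {
  stopifnot(f(a) * f(b) < 0)        # a sign change guarantees a root in (a, b)
  for (i in seq_len(n)) {
    m <- (a + b) / 2
    if (f(a) * f(m) <= 0) b <- m else a <- m
  }
  list(estimate = (a + b) / 2, max_error = (b - a) / 2)
}

res3 <- bisect_n(f, 1, 1.4, 3)      # error bound 0.4 / 2^4 = 0.025
res5 <- bisect_n(f, 1, 1.4, 5)      # error bound 0.4 / 2^6 = 0.00625
res6 <- bisect_n(f, 1, 1.4, 6)      # error bound 0.4 / 2^7 = 0.003125
print(res3)
print(res6)
```

Under this convention three bisections give the 0.025 bound, and three further bisections are needed before the bound drops below 0.005. The error bound depends only on the interval width, not on the assumed form of `f`.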
(e) You have ordered 10 bags of cement, which are supposed to weigh 94 kg each. The
average weight of the 10 bags is 93.5 kg. Assuming that the 10 weights can be viewed as
a realization of a random sample from a normal distribution with unknown parameters,
construct a 95% confidence interval for the expected weight of a bag. The sample
standard deviation of the 10 weights is 0.75. Write the R code to construct the confidence
interval. (5 marks)
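Because the population variance is unknown and the sample is assumed normal, a t-interval with n - 1 = 9 degrees of freedom is the natural choice. A short sketch using only the summary statistics given in the question:

```r
n    <- 10     # number of bags
xbar <- 93.5   # sample mean weight (kg)
s    <- 0.75   # sample standard deviation (kg)

# 95% two-sided interval: xbar +/- t_{0.975, n-1} * s / sqrt(n)
t_crit <- qt(0.975, df = n - 1)
ci <- xbar + c(-1, 1) * t_crit * s / sqrt(n)
print(ci)
```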
(a) Write R code to generate n = 100 values from a Normal distribution with µ = 50
and σ² = 15. Assuming a univariate normal distribution, write an R program that will
carry out maximum likelihood estimation for the mean and variance. Use the
Newton-Raphson optimization algorithm with analytic first and second partial derivatives.
(8 marks)
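One way to set this up is sketched below, parameterising the likelihood by theta = (µ, v) with v = σ². The function names, the seed and the starting values are assumptions made for illustration. Newton-Raphson is sensitive to the starting value of the variance, so the sketch deliberately starts below the sample variance.

```r
set.seed(1)                                 # arbitrary seed, for reproducibility
n <- 100
x <- rnorm(n, mean = 50, sd = sqrt(15))     # variance 15, so sd = sqrt(15)

# First partial derivatives of the log-likelihood with respect to (mu, v).
score <- function(theta, x) {
  mu <- theta[1]; v <- theta[2]; n <- length(x)
  c(sum(x - mu) / v,
    -n / (2 * v) + sum((x - mu)^2) / (2 * v^2))
}

# Second partial derivatives (Hessian matrix).
hessian <- function(theta, x) {
  mu <- theta[1]; v <- theta[2]; n <- length(x)
  h12 <- -sum(x - mu) / v^2
  matrix(c(-n / v, h12,
           h12, n / (2 * v^2) - sum((x - mu)^2) / v^3), nrow = 2)
}

# Newton-Raphson iteration: theta_new = theta - H^{-1} g.
newton_raphson <- function(x, start, tol = 1e-10, max_iter = 100) {
  theta <- start
  for (i in seq_len(max_iter)) {
    step  <- solve(hessian(theta, x), score(theta, x))
    theta <- theta - step
    if (sqrt(sum(step^2)) < tol) break
  }
  theta
}

start <- c(median(x), var(x) / 2)           # assumed starting values
mle   <- newton_raphson(x, start)
print(mle)
```

The result can be checked against the closed-form estimates `mean(x)` and `mean((x - mean(x))^2)`.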
where θ is a parameter
(i) Write an R code to generate 50 random numbers from a Cauchy distribution with
θ = 1. (1 mark)
(ii) Treat the data you get from step (i) as sample observations from a Cauchy
distribution with an unknown parameter θ. Obtain the log-likelihood function for θ and
write the R function for the log-likelihood function for θ.
(5 marks)
(iii) Using the bisection method write an R function to find the maximum likelihood
estimator of θ.
(6 marks)
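Parts (i) to (iii) can be sketched as follows. The density is not reproduced in the text above, so the sketch assumes θ is the location of a Cauchy distribution with unit scale, f(x; θ) = 1 / (π [1 + (x - θ)²]), which gives the log-likelihood l(θ) = -n log(π) - Σ log(1 + (x_i - θ)²). The function names and the seed are illustrative assumptions.

```r
set.seed(2016)                                # arbitrary seed
x <- rcauchy(50, location = 1, scale = 1)     # (i) 50 draws with theta = 1

# (ii) Log-likelihood under the ASSUMED location-Cauchy density with unit scale.
loglik <- function(theta, x) {
  -length(x) * log(pi) - sum(log(1 + (x - theta)^2))
}

# Derivative of the log-likelihood (score); the MLE is a root of this function.
score <- function(theta, x) {
  sum(2 * (x - theta) / (1 + (x - theta)^2))
}

# (iii) Bisection on the score. The score is positive at min(x) and negative
# at max(x), so [min(x), max(x)] always brackets a root.
cauchy_mle_bisect <- function(x, tol = 1e-8, max_iter = 200) {
  a <- min(x); b <- max(x)
  stopifnot(score(a, x) > 0, score(b, x) < 0)
  for (i in seq_len(max_iter)) {
    m <- (a + b) / 2
    if (score(m, x) > 0) a <- m else b <- m
    if (b - a < tol) break
  }
  (a + b) / 2
}

theta_hat <- cauchy_mle_bisect(x)
print(theta_hat)
print(loglik(theta_hat, x))
```

The Cauchy score can have several roots, so bisection finds a stationary point that is not guaranteed to be the global maximum; comparing `loglik(theta_hat, x)` with the log-likelihood at `median(x)` is a reasonable sanity check.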
(a) Let X <- matrix(c(1,2,3,1,4,9), ncol=2). Calculate the matrix H = X (X^T X)^(-1) X^T,
where X is as defined above. (4 marks)
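A direct translation of the formula into base R is sketched below. H is the projection ("hat") matrix onto the column space of X, so it should be symmetric and idempotent, with trace equal to the number of columns of X.

```r
X <- matrix(c(1, 2, 3, 1, 4, 9), ncol = 2)

# H = X (X^T X)^(-1) X^T
H <- X %*% solve(t(X) %*% X) %*% t(X)
print(H)

# Sanity checks: symmetry, idempotence, trace equal to ncol(X).
print(all.equal(H, t(H)))
print(all.equal(H %*% H, H))
print(sum(diag(H)))
```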
(b) Given the model y_i = β0 + β1 x_1i + β2 x_2i + ε_i and the following data:
y    5  20  27  38  53  57  62  66
x1   3   5   6   7   9  10  12  12
x2   0  -1   0   1  -1   0  -1   2
(i) Write down the predictor matrix X and the response vector y. (2 marks)
(ii) Compute X^T X, (X^T X)^(-1) and X^T y. (8 marks)
(iii) Estimate β0 , β1 and β2 by OLS using the matrix notations. (3 marks)
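Parts (i) to (iii) can be checked numerically with a short sketch. The predictor matrix is assumed to include a leading column of ones for the intercept β0, and the OLS estimate is computed from the normal equations (X^T X) β = X^T y.

```r
y  <- c(5, 20, 27, 38, 53, 57, 62, 66)
x1 <- c(3, 5, 6, 7, 9, 10, 12, 12)
x2 <- c(0, -1, 0, 1, -1, 0, -1, 2)

# (i) Predictor matrix with an intercept column, and the response vector.
X <- cbind(1, x1, x2)

# (ii) Building blocks of the normal equations.
XtX     <- t(X) %*% X
XtX_inv <- solve(XtX)
Xty     <- t(X) %*% y

# (iii) OLS estimate: beta_hat = (X^T X)^(-1) X^T y
beta_hat <- XtX_inv %*% Xty
print(beta_hat)

# Cross-check against R's built-in linear model fit.
print(coef(lm(y ~ x1 + x2)))
```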
(3 marks)
(i) For n = 10 and Xi ∼ N(7, 4), use Monte Carlo integration to estimate the
confidence level of this confidence interval for c = 2.262. (4 marks)
(ii) Similarly, estimate the corresponding confidence level for Xi ∼ Exp(1). What do
you observe? Comment on your result. (3 marks)
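A simulation along these lines is sketched below. The interval itself is not reproduced in the text above, so the sketch assumes the usual form x̄ ± c·s/√n (c = 2.262 matches the 0.975 quantile of the t distribution with 9 degrees of freedom) and reads N(7, 4) as mean 7 and variance 4. The function name `coverage`, the seed and the number of replicates are assumptions.

```r
# Estimate the coverage probability of the ASSUMED interval xbar +/- c * s / sqrt(n).
# rgen: function(n) returning a sample of size n; mu: the true mean.
coverage <- function(rgen, mu, n = 10, c_val = 2.262, m = 10000) {
  hits <- replicate(m, {
    x <- rgen(n)
    half_width <- c_val * sd(x) / sqrt(n)
    abs(mean(x) - mu) <= half_width      # TRUE if the interval covers mu
  })
  mean(hits)                             # Monte Carlo estimate of the confidence level
}

set.seed(123)                            # arbitrary seed
cov_norm <- coverage(function(n) rnorm(n, mean = 7, sd = 2), mu = 7)
cov_exp  <- coverage(function(n) rexp(n, rate = 1), mu = 1)
print(c(normal = cov_norm, exponential = cov_exp))
```

For normal data the estimate should sit close to the nominal 0.95. For the skewed exponential case with such a small n, the estimated level typically falls below the nominal value, which is the kind of observation the question asks for.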
(b) The recovery time (in days) is measured for 10 patients taking a new drug and for 10
patients taking a placebo. We wish to test the hypothesis that the mean recovery time
for patients taking the drug is less than for those taking a placebo. The data are:
For our test, we will assume that the two population means are equal.
(i) Input the data into R platform. (3 marks)
(ii) Write the R code to carry out this analysis. (4 marks)
(c) Trying to encourage people to stop driving to campus, the university claims that on
average it takes people 30 minutes to find a parking space on campus and that the
standard deviation is minutes. You, however, don’t think it takes so long to find a spot.
In fact, based on the last 5 times you drove to campus, you found the mean time to find
a parking spot to be 20 minutes. Assuming that the time it takes to find a parking spot
is normal, perform a hypothesis test with level of significance of 0.10 to see if your claim
is correct. (6 marks)
QUESTION FIVE (20 Marks)
(a) An insurance company has four types of policies, which we will label A, B, C, and D.
Write a function to do these calculations, and do it once for the overall company income
and claims, and once for each of the four types of policy. (4 marks)