CS2B Mock 1
1. Data file: TimeSeriesData.csv
(i) (a) Import the data and convert it to a time series object.
(c) Comment on whether there is any seasonality present in the data. [5]
(ii) Determine how many times the data should be differenced before fitting a model by examining
relevant sample ACFs for the first 30 lags. [7]
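Part (ii) rests on two operations: differencing the series and computing its sample ACF. The sketch below is in Python rather than the R the exam expects; in R the equivalents are diff() and acf(x, lag.max = 30). The function names are illustrative. A sample ACF that decays slowly from values near 1 suggests another round of differencing. One that drops quickly towards zero suggests the series is already stationary.

```python
def difference(series, d=1):
    """Return the series differenced d times, each pass forming x[t] - x[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out[:-1], out[1:])]
    return out


def sample_acf(series, max_lag=30):
    """Sample autocorrelations r_1, ..., r_max_lag using
    r_k = sum_t (x_t - xbar)(x_{t+k} - xbar) / sum_t (x_t - xbar)^2."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


# Illustration on a pure linear trend (not the exam data): the ACF starts
# near 1 and decays slowly, while one difference leaves a constant series.
trend = [float(t) for t in range(100)]
print([round(r, 2) for r in sample_acf(trend, 5)])
print(difference(trend)[:5])
```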
(iii) Fit an MA(1) model to your differenced data, writing down the equation of your fitted model. [2]
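One way to see what part (iii) asks for is to fit $X_t = \mu + e_t + \beta e_{t-1}$ by conditional sum of squares (CSS). This Python sketch fixes $\mu$ at the sample mean and grid-searches $\beta$ over the invertible range. It is a simplified stand-in for R's arima(x, order = c(0, 0, 1)), which by default uses CSS followed by maximum likelihood, so its estimates will generally differ slightly. The helper names and the simulated series are illustrative only.

```python
import random


def ma1_residuals(x, mu, beta):
    """Conditional residuals e_t = x_t - mu - beta * e_{t-1}, starting from e_0 = 0."""
    resid, prev = [], 0.0
    for value in x:
        e = value - mu - beta * prev
        resid.append(e)
        prev = e
    return resid


def fit_ma1_css(x, step=0.01):
    """Fit X_t = mu + e_t + beta * e_{t-1} by conditional sum of squares.
    mu is fixed at the sample mean; beta is grid-searched over [-0.99, 0.99]."""
    mu = sum(x) / len(x)
    n_steps = int(round(0.99 / step))
    grid = [k * step for k in range(-n_steps, n_steps + 1)]
    best_beta = min(grid, key=lambda b: sum(e * e for e in ma1_residuals(x, mu, b)))
    resid = ma1_residuals(x, mu, best_beta)
    sigma2 = sum(e * e for e in resid) / len(resid)
    return mu, best_beta, sigma2


# Simulated MA(1) data with beta = 0.6 (illustrative, not the exam data).
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(2001)]
sim = [z[t] + 0.6 * z[t - 1] for t in range(1, 2001)]
mu_hat, beta_hat, s2 = fit_ma1_css(sim)
print(f"X_t = {mu_hat:.4f} + e_t + {beta_hat:.2f} e_(t-1), sigma^2 = {s2:.4f}")
```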
(iv) (a) Plot the residuals and the ACF of the residuals for the fitted MA(1) model.
(b) Comment on your graphs, comparing them to the theoretical behavior of the residuals if the
model is a good fit. [7]
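For part (iv)(b), residuals from a well-fitting model should behave like white noise. Roughly 95% of the sample ACF values at lags 1 to 30 should therefore fall inside the approximate bounds $\pm 1.96/\sqrt{n}$, which are the dashed lines R's acf() plot draws. The Python helper below lists the lags that breach those bounds; its name is illustrative rather than anything from the exam. A handful of isolated breaches is consistent with white noise. A large spike, at lag 1 say, points to structure the model has missed.

```python
import math
import random


def sample_acf(series, max_lag=30):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def lags_outside_bounds(resid, max_lag=30, z=1.96):
    """Return the lags k whose sample ACF r_k lies outside +/- z / sqrt(n).
    For white noise about 5% of lags (1 or 2 of 30) breach the bounds by chance."""
    bound = z / math.sqrt(len(resid))
    return [k for k, r in enumerate(sample_acf(resid, max_lag), start=1) if abs(r) > bound]


# Illustrative residual series: simulated white noise versus a strongly
# alternating series whose autocorrelation a good model would have removed.
rng = random.Random(2024)
white = [rng.gauss(0, 1) for _ in range(1000)]
alternating = [1.0, -1.0] * 50
print(lags_outside_bounds(white))
print(lags_outside_bounds(alternating)[:5])
```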
(v) Fit an MA(3) model to your differenced data, stating the parameters of your fitted model. [2]
(vi) (a) Plot the residuals and the ACF of the residuals for the fitted MA(3) model.
(b) Comment on your graphs, comparing them to the graphs from part (iv)(a). [7]
[Total 31]
2. The mortality of a population has been found to follow Makeham's law of mortality, $\mu_x = A + Bc^x$, with
parameter values $A = 0.0025$, $B = 0.00004$ and $c = 1.11$ for ages $50 \le x \le 100$, and to have a limiting age of
100.
(i) (a) Construct a function with one input, x, that returns the value of $\mu_x$.
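Part (i)(a) amounts to coding the force of mortality directly. Below is a minimal Python sketch with the given parameters; in R the same thing would be a one-line function(x) A + B * c^x. The constant names are illustrative.

```python
A, B, C = 0.0025, 0.00004, 1.11  # Makeham parameters A, B and c from the question


def mu(x):
    """Makeham force of mortality mu_x = A + B * c**x."""
    return A + B * C ** x


print(mu(50), mu(100))
```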
If mortality follows Makeham's law of mortality, then survival probabilities are given by:

$${}_tp_x = s^t \, g^{c^x (c^t - 1)}$$

where $s = \exp(-A)$ and $g = \exp\left(-\dfrac{B}{\log c}\right)$.
[8]
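The survival formula follows from integrating the Makeham force of mortality, since ${}_tp_x = \exp\left(-\int_0^t \mu_{x+r}\,dr\right)$:

```latex
{}_tp_x = \exp\left(-\int_0^t \left(A + Bc^{x+r}\right) dr\right)
        = \exp\left(-At - \frac{B}{\log c}\, c^x\left(c^t - 1\right)\right)
        = \left(e^{-A}\right)^t \left(\exp\left(-\frac{B}{\log c}\right)\right)^{c^x(c^t - 1)}
        = s^t \, g^{c^x(c^t - 1)}
```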
(iii) (a) Construct a function that takes two inputs, x and t, and calculates ${}_tp_x$. Your function should
also output an error if the inputted value of x is less than 50.
(c) Show the output of your function when inputting x = 45 and t = 2. [6]
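Below is a Python sketch of the part (iii)(a) function. Raising ValueError plays the role that stop() would play in R, and the message text is an assumption. Calling it with x = 45 and t = 2, as in part (iii)(c), triggers the error.

```python
import math

A, B, C = 0.0025, 0.00004, 1.11  # Makeham parameters A, B and c
S = math.exp(-A)                 # s = exp(-A)
G = math.exp(-B / math.log(C))   # g = exp(-B / log c)


def tpx(x, t):
    """Makeham survival probability tp_x = s**t * g**(c**x * (c**t - 1)).
    Raises an error for ages below 50, where the law is not given."""
    if x < 50:
        raise ValueError("x must be at least 50: the Makeham parameters apply from age 50")
    return S ** t * G ** (C ** x * (C ** t - 1))


try:
    tpx(45, 2)
except ValueError as err:
    print("Error:", err)
```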
(iv) Calculate the following numerical values, using your function for ${}_tp_{50}$:
(a) the probability that a life aged exactly 50 will survive to age 100
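Part (iv)(a) is the single value ${}_{50}p_{50}$. The Python sketch below evaluates the closed form and cross-checks it against a midpoint-rule integral of $\mu_{50+r}$; both helper names are illustrative. The result is very small but not exactly zero, because the Makeham formula itself does not impose the limiting age of 100.

```python
import math

A, B, C = 0.0025, 0.00004, 1.11  # Makeham parameters A, B and c


def tpx(x, t):
    """Closed form tp_x = s**t * g**(c**x * (c**t - 1))."""
    s = math.exp(-A)
    g = math.exp(-B / math.log(C))
    return s ** t * g ** (C ** x * (C ** t - 1))


def tpx_numeric(x, t, steps=10000):
    """Cross-check: exp(-integral of mu_{x+r} over [0, t]) by the midpoint rule."""
    h = t / steps
    integral = sum(A + B * C ** (x + (i + 0.5) * h) for i in range(steps)) * h
    return math.exp(-integral)


p = tpx(50, 50)
print(p, tpx_numeric(50, 50))
```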
There are 10,000 individuals in the population that are currently aged 50.
(v) (a) Plot a line graph showing the expected number of lives alive at ages 50 to 100, based on the
given Makeham's law of mortality.
(b) Update your graph to include a line showing the expected number of lives alive at ages 50 to
100 based on the uniform distribution of deaths (UDD) assumption and a limiting age of
100. [6]
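For part (v), the expected number alive at age $50 + k$ is $10{,}000 \times {}_kp_{50}$. The sketch below builds both curves in Python. It assumes that UDD with a limiting age of 100 means deaths are spread uniformly between ages 50 and 100, so the expected count falls linearly to zero. The lists can then be drawn with any plotting tool, for example plot(..., type = "l") followed by lines(...) in R.

```python
import math

A, B, C = 0.0025, 0.00004, 1.11  # Makeham parameters A, B and c
L50 = 10_000                     # lives currently aged 50


def tpx(x, t):
    """Makeham survival probability tp_x = s**t * g**(c**x * (c**t - 1))."""
    s = math.exp(-A)
    g = math.exp(-B / math.log(C))
    return s ** t * g ** (C ** x * (C ** t - 1))


ages = list(range(50, 101))
makeham_lives = [L50 * tpx(50, age - 50) for age in ages]
# Assumed UDD reading: deaths uniform over ages 50 to 100, so survivors
# fall linearly from 10,000 at age 50 to 0 at the limiting age of 100.
udd_lives = [L50 * (100 - age) / 50 for age in ages]

for age, m, u in list(zip(ages, makeham_lives, udd_lives))[::10]:
    print(age, round(m, 1), round(u, 1))
```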
CA PRAVEEN PATWARI 3 JAI SHREE RAM
CS2B MOCK 1 ACTUATORS EDUCATIONAL INSTITUTE
(vi) (a) State the complete expected future lifetime for individuals currently aged 50 based on the
UDD assumption and a limiting age of 100.
(b) Compare your answers to parts (iv)(a) and (vi)(a) using your graph from part (v)(b). [3]
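Under the same uniform-deaths assumption, ${}_tp_{50} = 1 - t/50$ for $0 \le t \le 50$. The complete expected future lifetime is therefore $\int_0^{50} (1 - t/50)\,dt = 25$ years. The Python sketch below checks this numerically with the midpoint rule; the helper name is illustrative.

```python
def complete_expectation_udd(x, omega=100, steps=1000):
    """Complete expectation of life: integral over t in [0, omega - x] of tp_x,
    assuming tp_x = 1 - t / (omega - x) (deaths uniform up to the limiting age)."""
    n = omega - x
    h = n / steps
    return sum(1 - (i + 0.5) * h / n for i in range(steps)) * h


print(complete_expectation_udd(50))
```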
[Total 35]
3. Happy Life insurance company is assessing a list of potential customers provided by a data analysis
company.
Happy Life sent marketing material to a representative sample of approximately 100 people and
recorded whether or not they made a purchase in the month that followed. The company then
produced a file containing information about each individual (such as age, income etc) as well as
whether or not a successful sale was made.
The marketing department is considering using a decision tree to gauge the prospects of the full list of
potential customers, based on the data collected for the representative sample.
They have provided you with a data file ‘HappyLife.txt’ containing information on the representative
sample of approximately 100 potential customers, which includes the following columns:
● SEX (M or F)
● MARRIED (recorded as Y or N)
Some fields are recorded as NA where the data was not available or was considered unreliable.
You are given that the command na.omit(<data>) removes rows with NA from <data>.
(i) (a) Read the data file HappyLife.txt into an object called happy. You should ensure that
character columns are read as factors.
(c) Show that the number of rows in the updated object is 100.
(ii) Create a training data set called happy_train by randomly selecting 60% of the rows of happy, after
setting a seed of 38328. You should store the selected row indices in the vector training_rows. [2]
To construct the tree, the marketing department is considering using either the column SEX or the column
CHILDREN for the first split.
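The 60/40 split in part (ii) can be sketched in Python as below. Python's random module will not reproduce the rows that R selects after set.seed(38328), so these indices differ from the intended answer. The sketch only shows the mechanics; one way to do this in R is set.seed(38328) followed by training_rows <- sample(1:nrow(happy), 0.6 * nrow(happy)). The helper name is illustrative, and row indices here are 0-based.

```python
import random


def training_split(n_rows, fraction=0.6, seed=38328):
    """Randomly choose round(fraction * n_rows) row indices for training;
    the remaining rows form the test set."""
    rng = random.Random(seed)
    size = round(fraction * n_rows)
    training_rows = sorted(rng.sample(range(n_rows), size))
    chosen = set(training_rows)
    test_rows = [i for i in range(n_rows) if i not in chosen]
    return training_rows, test_rows


training_rows, test_rows = training_split(100)
print(len(training_rows), len(test_rows))
```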
(iii) (a) Calculate the Gini index after splitting the 60 training individuals by the column SEX.
(b) Calculate the Gini index after splitting the 60 training individuals by the column CHILDREN.
(c) Explain which split would be preferred when using the greedy approach. [9]
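For part (iii), the Gini index of a node is $1 - \sum_k p_k^2$, where $p_k$ is the proportion of each outcome in the node. The index after a split is the size-weighted average over the child nodes. Below is a Python sketch with illustrative helper names; under the greedy approach the split with the lower weighted Gini index is preferred. The groups would be the SALE values within each SEX or CHILDREN category of happy_train, which are not reproduced here.

```python
from collections import Counter


def gini(labels):
    """Gini index 1 - sum_k p_k^2 of a single node."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(groups):
    """Size-weighted Gini index across the child nodes of a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)


# Made-up outcome labels, purely to show the calculation.
print(split_gini([["Y", "N"], ["Y", "Y"]]))
```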
The marketing department decides to try the following tree in order to predict sales:
(iv) Construct a function in R that will determine the predicted outcome using this tree based on the
five input variables.
Your function should take five inputs (values for AGE, SEX, MARRIED, CHILDREN and HIGH) and
return a value of either “Y” for a sale or “N” for no sale.
(v) Test that your function is working correctly by running it for the following two individuals and
comparing the output to the given decision tree:
● a 45-year-old male who is married with children and has an income lower than the
specified level
● a 45-year-old female who is not married, has no children and has an income higher than
the specified level. [2]
(vi) (a) Construct a confusion matrix for comparing the tree's predictions with the actual outcomes
for the test data (ie the customers not in the training data).
(b) Calculate the precision and recall metrics for the tree's performance on the test data
(treating SALE as the positive outcome). [6]
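For parts (vi) and (vii), precision is TP/(TP + FP) and recall is TP/(TP + FN). The Python sketch below tallies a confusion matrix from predicted and actual outcome vectors. It assumes outcomes are coded "Y"/"N", as in the part (iv) function, with "Y" (a sale) as the positive class; in R, table(predicted, actual) gives the same counts. The helper names and example vectors are made up for illustration.

```python
def confusion_counts(predicted, actual, positive="Y"):
    """Count true/false positives and negatives, treating `positive` as a sale."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, a in zip(predicted, actual):
        if p == positive and a == positive:
            counts["TP"] += 1
        elif p == positive:
            counts["FP"] += 1
        elif a == positive:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def precision_recall(counts):
    """Precision TP/(TP+FP) and recall TP/(TP+FN); NaN when a denominator is zero."""
    pred_pos = counts["TP"] + counts["FP"]
    actual_pos = counts["TP"] + counts["FN"]
    precision = counts["TP"] / pred_pos if pred_pos else float("nan")
    recall = counts["TP"] / actual_pos if actual_pos else float("nan")
    return precision, recall


counts = confusion_counts(["Y", "Y", "N", "N", "Y"], ["Y", "N", "N", "Y", "Y"])
print(counts, precision_recall(counts))
```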
(vii) (a) Construct a decision tree from the training data called package.tree using the tree() function
from the tree package.
(b) Construct a confusion matrix for comparing the predictions using package.tree with the
actual outcomes for the test data (treating SALE as the positive outcome).
(c) Calculate the precision and recall metrics for the performance of package.tree on the test
data. [6]
The marketing department wants to use one of these two trees for deciding who to market
to from the full list of potential customers.
(viii) Discuss which tree the department may wish to use, using your answers to part (vi) and part
(vii). [2]
[Total 34]