
Data analysis class 2022/2023

Exercise sheet 8

Exercise
The file "SP500.csv" contains daily prices of 21 stocks from the S&P 500 over the
last 20 years.
1) Open the file, attach it to the working memory of R and ask for its structure.
2) Give the correlation matrix of the quantitative variables. From which value
upwards are the correlations statistically significant? Which variables are
significantly correlated with Amazon?
3) Create the scatterplot with the line number on the x axis and the prices of Amazon
in red on the y axis (use the parameter type="l" to get lines instead of points). Add
the prices of Apple to the graph in blue. What do you observe?
4) Create the subset “Old” of all stock prices before June 1, 2011 (the first 3000
lines) and the subset “New” of all stock prices newer than that date.
5) Test whether the prices of Amazon before June 2011 and after June 2011 are normally
distributed. Use the appropriate test to check whether the average prices of Amazon
were different in the two sub-periods.
6) Perform a multiple regression to explain the prices of Amazon as a function of the
prices of the other stocks (variables 4 to 23). Eliminate multicollinearity problems
(exclude variables with variance inflation factors larger than 10) and, if
necessary, remove variables which do not have enough explanatory power. What is the
final model and how much of the variance of the prices of Amazon does it explain? Is
that surprising?
7) Compute the standardized residuals of the linear model and add them to the data
set. Check the descriptive summary statistics of the residuals. Are there outliers to
the regression model?
8) Do a principal component analysis of the quantitative variables (variables 3 to 23).
How many components should we keep?
9) Give an interpretation of the first two principal components. Save the first two
factors as variables and add them to the data file. What do the points on the
scatterplot of the first two factors represent? What special fact do you observe on
the plot?
10) Perform an agglomerative (ascending) hierarchical cluster analysis on the results
of the above principal component analysis. Choose a four-group solution, give an
interpretation of the groups, save the group membership of the data points and add
it to the data file.
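
As a rough orientation for questions 1, 2 and 4 to 10, the sketch below runs the same sequence of steps in base R on synthetic data. It is not a solution to the sheet: the four random-walk series, the column names, the split at row 300 and the object names (prices, r_crit, vif, n_keep and so on) are all assumptions made for illustration, and the real SP500.csv will have different columns and sizes. The variance inflation factors are computed by hand as 1 / (1 - R^2), and the clustering uses hclust with Ward's method, which is one possible choice among several.

```r
# Synthetic stand-in for SP500.csv: four random-walk "price" series.
# Column names and sizes are illustrative assumptions, not the real file.
set.seed(1)
n <- 500
prices <- data.frame(
  Amazon = 100 + cumsum(rnorm(n)),
  Apple  = 100 + cumsum(rnorm(n)),
  IBM    = 100 + cumsum(rnorm(n)),
  Exxon  = 100 + cumsum(rnorm(n))
)
str(prices)  # question 1: structure of the data frame

# Question 2: smallest |r| that is significant at the 5% level
# (two-sided t test on a correlation with n - 2 degrees of freedom)
t_crit <- qt(0.975, df = n - 2)
r_crit <- t_crit / sqrt(t_crit^2 + n - 2)
cor_mat <- cor(prices)
amazon_cor <- cor_mat["Amazon", colnames(cor_mat) != "Amazon"]
signif_with_amazon <- names(amazon_cor)[abs(amazon_cor) > r_crit]

# Question 4: split by row number (the sheet uses row 3000; 300 here)
split_row <- 300
Old <- prices[1:split_row, ]
New <- prices[(split_row + 1):n, ]

# Question 5: normality tests, then a two-sample comparison.
# The Wilcoxon rank-sum test is shown; t.test() would apply if
# normality is not rejected in either sub-period.
shapiro_old <- shapiro.test(Old$Amazon)
shapiro_new <- shapiro.test(New$Amazon)
location_test <- wilcox.test(Old$Amazon, New$Amazon)

# Question 6: multiple regression and hand-computed VIFs.
# VIF of a predictor = 1 / (1 - R^2) from regressing it on the others.
predictors <- setdiff(names(prices), "Amazon")
model <- lm(reformulate(predictors, response = "Amazon"), data = prices)
vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = prices))$r.squared
  1 / (1 - r2)
})
r_squared <- summary(model)$r.squared

# Question 7: standardized residuals added to the data set
prices$std_resid <- rstandard(model)
summary(prices$std_resid)

# Questions 8 and 9: PCA on standardized variables; the count of
# eigenvalues above 1 (Kaiser rule) is one common retention criterion
pca <- prcomp(prices[, c("Amazon", predictors)], scale. = TRUE)
eigenvalues <- pca$sdev^2
n_keep <- sum(eigenvalues > 1)
prices$PC1 <- pca$x[, 1]
prices$PC2 <- pca$x[, 2]

# Question 10: hierarchical clustering on the first two components
tree <- hclust(dist(pca$x[, 1:2]), method = "ward.D2")
prices$group <- cutree(tree, k = 4)
table(prices$group)
```

With the real file, the data frame would come from read.csv, and the predictors with a VIF above 10 would be dropped one at a time, refitting the model after each removal.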
