Stat Learning Template


CFAS420 Coursework Template: You should come up with an informative title.


A. Gibberd∗, Name 2†, Name 3‡ and Name 4§
∗ 12345678
† Student ID 2
‡ Student ID 3
§ Student ID 4

Abstract—Around 100-150 words

I. INTRODUCTION
You should do some kind of motivational literature review and put your research in context! Use citations to show you
have actually done some thinking [1]. You should focus your
projects on understanding a few key concepts in detail. You do
not need to cite too much, but must show some understanding
of the papers you cite.
II. METHODS
What methods are you looking at? This should be a substantial portion of your paper given the focus of the projects.
III. EXPERIMENT/RESULTS
Are you doing an experiment or a real-data application? Put
the details of your application/experiment here and detail the
results.
IV. DISCUSSION
What do your results show, and what have you learned during the project? Is there anything interesting which you would like to look at more? Try to reflect on how your examples/results relate to the literature you cited in the introduction.
V. REFERENCES
You can place references on a further separate (5th) page.
VI. GETTING YOU STARTED
The topics for this year's group-work are given below. You will need to choose one of these topics for your group to focus on. I have decided to leave the choice of topic up to you; however, it would be nice to get a good spread of projects so we can have a varied class conference. Please let me know when you have decided on a topic before the end of week 9 (Friday 10th December); once you have settled on this you should not change your topic. In your presentation you should investigate some aspect of the topic. You should also perform at least one example of a synthetic experiment or a real data-analysis to demonstrate the concepts. Your projects are expected to go beyond the taught material and assimilate information from various sources, including peer-reviewed papers.
The sections below give some ideas for potential projects, but you can also choose your own, though it is recommended that you discuss this with us first. For each task some initial references are listed, which will help you get started once you settle on a topic. These vary in their complexity, and some will be harder to understand. In some cases it may be worth looking at alternative sources, or subsequent works, to get an alternative view on the concepts. It is good practice to get used to this kind of reading around to grasp tricky papers.
A. Regularisation (Generalisations)
For this project you can look at the concept of regularisation more broadly. What other kinds of regularisation penalties have been developed, and what is the motivation for these? You should try to assess at least one further method of regularisation beyond those covered in the course. You can examine the performance of the regularised estimates vs the non-regularised estimates in a synthetic setting, i.e. generate some fake data from a known model (this can be as simple as a Gaussian linear regression) and try to estimate the model using a standard estimator as well as regularised ones. Discuss your results and how they compare to those found in the literature. Where do you think the benefits of the particular type of regularisation you looked at may come in useful? How is this used in the literature? A minimal synthetic comparison of this kind is sketched below.
• Original lasso paper - [1]
• Sparse Group Lasso - There is also a group-lasso variant [2]
• Adaptive Lasso - Debiasing the lasso [3]
• Fused Lasso - For smoothing parameter estimates [4]
• Wainwright Book - Excellent reference for regularisation [5]
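As a concrete starting point, the sketch below implements the synthetic comparison described above in R, assuming the glmnet package is used for the lasso; the dimensions, sparsity pattern, and noise level are arbitrary illustrative choices, and any other regularised estimator from your reading could be swapped in.

library(glmnet)   # assumed available: install.packages("glmnet")

set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))        # only the first 5 covariates are active
y <- drop(X %*% beta) + rnorm(n)           # Gaussian linear regression

# Standard (un-regularised) least-squares estimate
beta_ols <- coef(lm(y ~ X - 1))

# Lasso estimate, with the penalty level chosen by cross-validation
cv_fit <- cv.glmnet(X, y, alpha = 1)
beta_lasso <- as.numeric(coef(cv_fit, s = "lambda.min"))[-1]   # drop the intercept

# Estimation error relative to the known truth
c(ols = sum((beta_ols - beta)^2), lasso = sum((beta_lasso - beta)^2))

Repeating this over many simulated datasets, and varying how sparse the true coefficient vector is, gives a fairer picture of when the regularised estimate actually helps.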
B. Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are not really introduced in the taught component. However, they are very popular in tasks such as computer vision and learning with large data sets. Your task is to do more reading around these ideas, and demonstrate how a convolutional network can work in practice on some real image data. You should feel free to use libraries to estimate the models (e.g. keras), but you must perform some real data analysis. Discuss your results: what advantages do the convolutional networks offer? You may want to highlight and discuss any challenges you faced whilst completing your experiments. A minimal sketch of fitting a small convolutional network is given below.
Some useful resources can be found below:
• Convolutional Neural Network (Chapter of Goodfellow book) - [6]
• Conv nets in R - [7]
• Conv nets under the hood (multipart) - [8]
• Some famous convolutional nets: VGG16, AlexNet, ImageNet
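For illustration only, the sketch below fits a very small convolutional network with the keras R package on the MNIST digits; it assumes keras and TensorFlow have already been installed (e.g. via keras::install_keras()), and the tiny architecture and two epochs are placeholders rather than recommended settings.

library(keras)

mnist <- dataset_mnist()                                  # 28x28 grayscale digit images
x_train <- array_reshape(mnist$train$x / 255, c(60000, 28, 28, 1))
y_train <- to_categorical(mnist$train$y, 10)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%           # a single convolutional layer
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 10, activation = "softmax")         # class probabilities

model %>% compile(optimizer = "adam", loss = "categorical_crossentropy",
                  metrics = "accuracy")
model %>% fit(x_train, y_train, epochs = 2, batch_size = 128, validation_split = 0.2)

Comparing this against a plain fully-connected network of similar size is one simple way to show what the convolutional layers add.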
C. Optimisation
We will briefly discuss optimisation via stochastic gradient descent in the taught material. Your task is to further investigate optimisation methods. You will need to focus on one or two methods, for instance ADAM vs standard SGD. You could even do a rigorous analysis of SGD with different parameterisations. You do not need to study neural networks as an application, but can rather look at a simpler cost function. For instance, you may want to consider the least-squares linear-regression model, but code up your own stochastic-gradient descent algorithm. Discuss how the optimiser performs in practice and how this works, i.e. how many steps does it take to converge to a given level? Does this depend on step-size? Does the loss function keep decreasing at the same rate, i.e. as a function of iterations/steps? You may want to compare both convex and non-convex objectives to illustrate some of the issues with non-convexity and the importance of having a start-value strategy. A minimal hand-coded SGD example is sketched below.
• Why momentum works - A way to augment standard gradient descent (nice website) [9]
• Overview of the popular ADAM method - see link on website for full paper [10]
• Code ADAM from scratch - An adaptive-stepsize version of gradient descent [11]
• Alternating Direction Method of Multipliers - Convex optimisation method [12]
• Convex Optimisation - Popular book by Boyd [13]
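The sketch below is a minimal hand-coded mini-batch SGD for the least-squares objective in base R; the step size, batch size, and iteration count are arbitrary choices you would want to vary as part of the investigation.

set.seed(2)
n <- 500; p <- 5
X <- matrix(rnorm(n * p), n, p)
beta_true <- rnorm(p)
y <- drop(X %*% beta_true) + rnorm(n)

sgd_ls <- function(X, y, step = 0.01, n_steps = 5000, batch = 10) {
  beta <- rep(0, ncol(X))                  # start from the origin
  loss <- numeric(n_steps)
  for (t in seq_len(n_steps)) {
    idx <- sample(nrow(X), batch)          # random mini-batch
    resid <- y[idx] - drop(X[idx, , drop = FALSE] %*% beta)
    grad <- -2 * drop(t(X[idx, , drop = FALSE]) %*% resid) / batch
    beta <- beta - step * grad
    loss[t] <- mean((y - drop(X %*% beta))^2)   # full-data loss, for the convergence plot
  }
  list(beta = beta, loss = loss)
}

fit <- sgd_ls(X, y)
plot(fit$loss, type = "l", log = "y", xlab = "iteration", ylab = "mean squared error")

Plotting the loss against iterations for a few different step sizes is a direct way to answer the convergence questions above.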
D. Boosted Decision Trees (XGBoost)
Boosting is a way of combining many weak models by iteratively modelling the residuals obtained from fitting a model. Basically, we fit one model to the data, look at the residuals, and then fit a model to them. We then add the two models together to create a model for the original data.
We introduce tree methods (decision trees) in week 9. Boosting techniques are extensions of these, and a further interesting, widely used extension is the extreme gradient boosting tree (XGBoost). Your task is to take an in-depth look at the methodology: what are the features of the method which make it so successful? I would recommend you demonstrate the package on a real-world data-set. However, it could also be interesting to look at how the method can approximate a simulated function with additive noise. For instance, a piecewise constant function with multiple covariates... can you provide some intuition into what the method is doing? A minimal sketch of such a simulated example is given below.
• Introduction to Boosting (general) - Very influential author [14]
• Paper describing XGBoost [15]
• Basic introduction to boosted trees - XGBoost website [16]
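A minimal sketch of the simulated example, assuming the xgboost R package is installed; the piecewise-constant signal, noise level, and tuning parameters are all illustrative choices.

library(xgboost)   # assumed available: install.packages("xgboost")

set.seed(3)
n <- 1000
x1 <- runif(n); x2 <- runif(n)
f <- ifelse(x1 > 0.5, 2, 0) + ifelse(x2 > 0.3, -1, 1)   # piecewise-constant signal
y <- f + rnorm(n, sd = 0.3)                             # additive noise

dtrain <- xgb.DMatrix(data = cbind(x1, x2), label = y)
fit <- xgb.train(params = list(objective = "reg:squarederror", max_depth = 2, eta = 0.1),
                 data = dtrain, nrounds = 200)

# How well do the fitted values recover the true (noise-free) signal?
yhat <- predict(fit, dtrain)
mean((yhat - f)^2)

Plotting the fitted values against each covariate shows the step-like structure the trees recover, which is a useful source of intuition about what the method is doing.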
E. Dimensionality Reduction
Techniques such as PCA, ICA, and Probabilistic PCA were introduced in the first week as a means of summarising high-dimensional data in terms of fewer variables. We discussed these fundamental methods, but there exist more complex variants which prioritise sparsity of the principal components (a similar idea to the Lasso), allow for the combination of variables to be non-linear, or are tailored to ensure robustness to outliers. The task in this project would be to explore and understand one of these, or some other variant of a dimensionality reduction technique, and compare it to the simpler methods. It would be good to consider application to a real-world dataset and explore ideas such as the patterns which can be identified in the low-dimensional outputs, the efficiency of the methods, and the suitability of the output for subsequent modelling/analyses. A minimal sketch of the standard-PCA baseline you would compare against is given below.
• A (relatively recent) review of PCA [17]
• Sparse PCA [18]
• Robust PCA [19]
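As a baseline for such a comparison, the sketch below runs standard PCA with base R's prcomp on a built-in dataset used purely for illustration; your chosen variant (sparse, robust, or non-linear) would be applied to the same data and compared against this.

X <- scale(iris[, 1:4])                       # four numeric measurements, standardised
pca <- prcomp(X)

summary(pca)                                  # proportion of variance explained per component
plot(pca$x[, 1:2], col = iris$Species,        # two-dimensional scores for each observation
     xlab = "PC1", ylab = "PC2")

Producing the same scores and loadings summaries for the more complex variant makes the comparison concrete.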
REFERENCES
[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Statistical Methodology), 1996. [Online]. Available: http://www.jstor.org/stable/10.2307/2346178
[2] N. Simon, J. Friedman, T. Hastie, and R. Tibshirani, "A Sparse-Group Lasso," Journal of Computational and Graphical Statistics, vol. 22, no. 2, pp. 231–245, Apr. 2013. [Online]. Available: http://www.tandfonline.com/doi/abs/10.1080/10618600.2012.681250
[3] H. Zou, "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418–1429, Dec. 2006. [Online]. Available: http://www.tandfonline.com/doi/abs/10.1198/016214506000000735
[4] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, "Sparsity and smoothness via the fused lasso," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 1, pp. 91–108, Feb. 2005. [Online]. Available: http://doi.wiley.com/10.1111/j.1467-9868.2005.00490.x
[5] T. Hastie, R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The Lasso and Generalizations. [Online]. Available: https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS_corrected_1.4.16.pdf
[6] I. Goodfellow, "Convolutional Networks," in Deep Learning. [Online]. Available: https://www.deeplearningbook.org/contents/convnets.html
[7] L. Francisco, "Convolutional Networks in R." [Online]. Available: https://www.r-bloggers.com/2018/07/convolutional-neural-networks-in-r/
[8] A. Nagdev, "Convolutional networks under the hood (part 2)." [Online]. Available: https://www.r-bloggers.com/2020/02/convolutional-neural-network-under-the-hood-2/
[9] G. Goh, "Why Momentum Really Works," Distill. [Online]. Available: https://distill.pub/2017/momentum/
[10] D. P. Kingma and J. L. Ba, "Overview of ADAM." [Online]. Available: https://theberkeleyview.wordpress.com/2015/11/19/berkeleyview-for-adam-a-method-for-stochastic-optimization/
[11] J. Brownlee, "Code ADAM from Scratch." [Online]. Available: https://machinelearningmastery.com/adam-optimization-from-scratch/
[12] S. Boyd, N. Parikh, and E. Chu, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[13] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[14] R. E. Schapire, "The Boosting Approach to Machine Learning: An Overview," Nonlinear Estimation and Classification, 2003.
[15] T. Chen and C. Guestrin, "XGBoost: A Scalable Tree Boosting System," ACM Knowledge Discovery in Data, 2016.
[16] XGBoost, "Introduction to Boosted Trees." [Online]. Available: https://xgboost.readthedocs.io/en/latest/tutorials/model.html
[17] I. T. Jolliffe and J. Cadima, "Principal component analysis: a review and recent developments," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 374, no. 2065, p. 20150202, 2016.
[18] H. Zou, T. Hastie, and R. Tibshirani, "Sparse principal component analysis," Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 265–286, 2006.
[19] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" Journal of the ACM (JACM), vol. 58, no. 3, pp. 1–37, 2011.
