
The course features five modules:

Module 1: Making sense of unstructured data

Modern businesses, scientific and engineering laboratories, and Web 2.0 generate vast
quantities of data, often without existing labels. To make sense of this data, a principal
challenge is to discover patterns or latent structure where none is known
beforehand. For instance, we might want to discover an organic organization of
documents, such as articles collected from the New York Times or Wikipedia, into
distinct groups representing topics or themes. We might want to discover latent
communities in social networks, such as Facebook or Twitter. We might want to figure out
which aspects of text or images, such as those on Imgur or Google images, capture the
important information encapsulated in these data formats. In this module, we offer an
overview of modern techniques for addressing these problems across a variety of
data types. We demonstrate the usefulness of these methods in a number of
case studies.


Spectral Clustering, Components and Embeddings
Case Studies

Module 2: Regression and Prediction

The module provides an introduction to regression, combining both classical and modern
views. We will begin with bivariate and multivariate regression for purposes of
prediction and causal inference, followed by logistic and nonlinear regression. We then
go over a menu of modern prediction methods that aim to solve prediction problems well
using high-dimensional data, namely lasso, ridge, and their various modifications. We shall
discuss regression trees, boosted trees, and random forests, followed by a basic view of
neural networks, all for prediction purposes. We will discuss the assessment of
prediction performance using validation samples and cross-validation. We will conclude
with a brief discussion of how to use these methods for inferring causal effects of a
treatment in randomized controlled trials and in the presence of confounding.


Classical Linear and Nonlinear Regression and Extensions

Modern Regression with High-Dimensional Data
The Use of Modern Regression for Causal Inference
Case Studies

Module 3: Classification, Hypothesis Testing and Anomaly Detection

This module provides a basic introduction to statistical methods of classification and
hypothesis testing and their applications, including the detection of statistical anomalies,
fraud, spam, and other malicious behavior. The course will begin by describing
informally the range of applications of these techniques and then move on to the methods
themselves, which mostly revolve around classification. These include binary
classification, logistic and probit regression, the perceptron, neural networks,
support vector machines, and others. Several examples will be introduced to
illustrate the application of these methods. Finally, the course will discuss the
limitations of the methods, the importance of careful usage, and the dangers of
misuse.


Hypothesis Testing and Classification

Deep Learning
Case Studies

Module 4: Recommendation Systems

Recommendation systems have become a primary way to discover relevant information
from vast amounts of data. Examples include media recommendations by Netflix,
YouTube and Spotify; online dating suggestions by Tinder; news feeds by Facebook; and
product recommendations by Amazon. This module provides a systematic
overview of principles and algorithms for designing and developing recommendation
systems. The content is exemplified using concrete case studies.


Recommendations and ranking

Collaborative filtering
Personalized recommendations
Case Studies
Wrap-up: Parting remarks and challenges

Module 5: Networks and Graphical Models

From social networks to gene regulatory networks, networks form the backbone for many
of the processes we care about. Local interactions between basic entities in a network
give rise to large-scale network effects such as the spread of information or ideas. How
do we make use of network data to understand the behavior or functionality of the
network? This module provides a systematic overview of methods for analyzing large
networks, determining important structure in such networks, and inferring missing
data. An emphasis is placed on graphical models, both as a powerful way to model
network processes and as a means of efficient statistical computation. The course content is
illustrated via case studies.

Graphical Models
Case Studies
