2021 10 11 - Intro ML - Inserm
Chloé-Agathe Azencott
CBIO, Mines ParisTech & Institut Curie
http://cazencott.info
What is learning?
● Learning: acquiring a skill through experience and practice
  – skill = algorithm / model
  – experience = data
● Machine learning: using data to build an algorithm / a model
Artificial intelligence
● Reproduce (with machines) behaviours of life we perceive as intelligent
● Involves much more than machine learning!
● Perception, reasoning, language, motion, etc.
[Diagram: deep learning ⊂ machine learning ⊂ artificial intelligence]
1. Examples of machine learning problems
1. Supervised machine learning
Making predictions
[Diagram: Data + Labels → ML → Predictor]
1. Supervised machine learning
Problem 1: binary classification
Example: Identification of metastases in lymph nodes biopsies
Babak Ehteshami Bejnordi et al. (2017), Diagnostic Assessment of Deep Learning Algorithms for
Detection of Lymph Node Metastases in Women With Breast Cancer, JAMA.
1. Supervised machine learning
Problem 2: regression
Example: solubility of a molecule in ethanol
[Figure: measured solubilities (25 and 80 mg/mL) of acetaminophen and aspirin]
Chloé-Agathe Azencott et al. (2007). One- to four-dimensional kernels for virtual screening and the prediction of physical, chemical and biological properties. Journal of Chemical Information and Modeling.
2. Unsupervised learning
Data exploration: better understand your data
[Diagram: Data → ML → Data!]
2. Unsupervised learning
Problem 1: Clustering
Group similar samples together
[Diagram: Data → ML → clustered data]
2. Unsupervised learning
Problem 1: Clustering
Example: disease subtype identification
Hege G. Russnes (2017) Breast Cancer Molecular Stratification: From Intrinsic Subtypes to Integrative
Clusters, The American Journal of Pathology
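A clustering step like the subtype-identification example above can be sketched with k-means. This is a minimal illustration; the three synthetic Gaussian "subtypes" and all parameters are made up for the example, not taken from the cited study.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated Gaussian groups stand in for disease subtypes
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5]])
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])

# k-means groups similar samples together, without using any labels
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
n_clusters_found = len(set(labels))
```

Note that k-means only sees the feature vectors: the "subtype" of each sample is never given to the algorithm, which is what makes this unsupervised.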
2. Unsupervised learning
Problem 2: Dimensionality reduction
Represent your data with fewer features
[Diagram: n × m data matrix X → ML → matrix with fewer columns]
2. Unsupervised learning
Problem 2: Dimensionality reduction
Example: Project SNP data on 2 dimensions
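A projection like this can be sketched with principal component analysis computed from an SVD. The random matrix below merely stands in for a genotype matrix; real SNP data would of course look different.

```python
import numpy as np

# Project high-dimensional data onto its first 2 principal components
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))   # e.g. 100 individuals, 50 (toy) features

Xc = X - X.mean(axis=0)                        # centre each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_2d = Xc @ Vt[:2].T                           # n x 2 representation
```

The rows of `Vt` are the principal directions, ordered by decreasing explained variance, so keeping the first two gives the best 2-dimensional linear summary of the data in the least-squares sense.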
Learning a supervised learning model
1. Training data D: n samples x1, x2, …, xn and their labels y1, y2, …, yn
2. Hypothesis space H: the shape of the model f = what kind of model we can learn
3. Loss function L: L(y, f(x)) = the error made by predicting f(x) instead of y
Empirical risk minimization:
Find the model f in the hypothesis space H that minimizes the loss L on average over the training data D
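Empirical risk minimization can be made concrete with the simplest instance: linear models under the squared loss, where the minimizer has a direct least-squares solution. This is a minimal sketch on synthetic data; all names and values are illustrative.

```python
import numpy as np

# Hypothesis space H: linear models f(x) = w . x
# Loss L(y, f(x)) = (y - f(x))^2  (squared error)
# ERM: pick the w that minimizes the average loss on the training data D

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # n = 50 samples, p = 3 features
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                        # noiseless labels, for illustration

# For the squared loss, the empirical risk minimizer is the
# least-squares solution
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
empirical_risk = np.mean((y - X @ w_hat) ** 2)
```

With noiseless labels generated by a linear model, ERM recovers the true weights and drives the empirical risk to (numerically) zero.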
Linear models
● Linear/logistic regression, support vector machines
● Hypothesis space: linear models (a weighted sum of the features)
● Optimization procedure: usually fast, easy, accurate
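One member of this family, logistic regression, can be sketched with scikit-learn (assuming scikit-learn is available; the two-Gaussian toy data are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two Gaussian clouds: an (almost) linearly separable binary problem
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(30, 2)),
               rng.normal(2, 1, size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)

# The learned decision boundary is a line: a weighted sum of the
# features compared to a threshold
clf = LogisticRegression().fit(X, y)
train_accuracy = clf.score(X, y)
```

Because the hypothesis space is so constrained, fitting is a convex optimization problem, which is why these methods are typically fast and reliable.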
Non-linear models
Idea 1: Create new features
● Map (x1, x2, …, xp) to φ(x1, x2, …, xp)
● Use a linear approach
● Hypothesis space: linear models of φ(x)
● Optimization procedure: same as before
● Example: quadratic regression
  φ(x1, x2, …, xp) = (x1, x2, …, xp, x1², x1x2, …, xp²)
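The quadratic feature map above can be written out explicitly; a linear model fitted on φ(x) then captures quadratic relationships in the original inputs. A minimal sketch (the helper name `quadratic_map` and the toy target are illustrative):

```python
import numpy as np

def quadratic_map(X):
    """phi(x) = (x1, ..., xp, all degree-2 products xi*xj)."""
    n, p = X.shape
    cross = [X[:, i] * X[:, j] for i in range(p) for j in range(i, p)]
    return np.hstack([X, np.column_stack(cross)])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X[:, 0] ** 2 - X[:, 1]     # a quadratic function of the inputs

# A *linear* model on phi(X) fits this non-linear relationship exactly
Phi = quadratic_map(X)          # columns: x1, x2, x1^2, x1*x2, x2^2
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
residual = np.mean((Phi @ w - y) ** 2)
```

The optimization step is unchanged from plain linear regression; only the representation of the data changed.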
● Hypothesis space: a non-linear function of a linear combination of the inputs
● Optimization procedure: no guarantee to find the solution (see L. Ralaivola's talk)
Idea 4: Use a tree-based hypothesis space
● Decision trees
● Hypothesis space:
  – models look like: if (x1 > 0.3) and [(x3 = 1) and … or (x4 < 2.9)…] then label = y
  – Categorical and quantitative features
[Diagram: decision tree — Color? → Grey / Yellow; Horn?, Stripes? → Yes / No]
● Optimization procedure: heuristic! No guarantee
● Often perform poorly
Idea 4: Use a tree-based hypothesis space
● Random forests
● Hypothesis space:
  – Combination of many trees
  – (Ensemble learning)
● Optimization procedure:
  – Learn each tree independently from the others
  – Vote
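The tree-ensemble idea can be sketched with scikit-learn's `RandomForestClassifier` (assuming scikit-learn is available). The XOR-style toy problem below is illustrative: no single linear model can solve it, but a vote over many trees can.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# XOR-like labels: positive in two opposite quadrants
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

# Each of the 100 trees is learned independently (on a bootstrap
# sample of the data); the forest predicts by majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
accuracy = forest.score(X, y)
```

Averaging many independently grown trees is what makes the ensemble far more stable than any single tree, whose greedy construction has no optimality guarantee.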
3. How to avoid overfitting (≈ learning by heart)
Overfitting & generalization
● The true challenge of machine learning: learning a model that works on new data
● Overfitting: when the model is specific to the training data but doesn't generalize to new data
● Particularly likely to happen with few samples and very many features (hello, genomics!)
Regularization
● Regularized empirical risk minimization:
Find the model f in the hypothesis space H that minimizes the loss L on average over the training data D, under some constraints
● The constraints are meant to keep your model simple:
  – Weight decay / ridge: prevents coefficients from growing too large
  – Sparsity: sets some coefficients to zero (removes the corresponding features)
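Ridge regularization has a convenient closed form, which makes the weight-shrinking effect easy to demonstrate. A minimal sketch in the few-samples / many-features regime mentioned earlier (all values illustrative):

```python
import numpy as np

# Ridge: minimize  sum_i (y_i - w . x_i)^2 + alpha * ||w||^2
# Closed-form solution:  w = (X^T X + alpha * I)^{-1} X^T y

rng = np.random.default_rng(0)
n, p = 20, 50                    # fewer samples than features
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.1 * rng.normal(size=n)

def ridge(X, y, alpha):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

w_small = ridge(X, y, alpha=1e-6)   # barely regularized
w_large = ridge(X, y, alpha=10.0)   # strongly regularized
shrinkage = np.linalg.norm(w_large) < np.linalg.norm(w_small)
```

Increasing `alpha` tightens the constraint: the norm of the learned coefficient vector can only shrink, trading a worse fit on the training data for a simpler, more stable model.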
4. Evaluating & choosing a supervised ML model
Set aside a final test set
[Diagram: full data set split into a training set and a final test set]
● You are not allowed to touch the test set during training:
  – Not when deciding which ML algorithm to use (model selection)
  – Nor when fitting the model
  – Nor when pre-processing the features (feature engineering, feature selection)
  – Nor when removing outliers
● You need to choose an evaluation criterion:
  – Classification: accuracy, balanced accuracy, precision, recall, etc.
  – Regression: RMSE, R², etc.
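The split-then-evaluate workflow above can be sketched in a few lines of numpy. The 80/20 split, the trivial least-squares classifier, and all data below are illustrative; the point is that the model is fitted on the training indices only and the test set is touched exactly once, for the final score.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Shuffle, then set aside 20% as a final test set
perm = rng.permutation(len(X))
test_idx, train_idx = perm[:20], perm[20:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# All fitting happens on the training set only: here, a simple
# linear classifier via least squares on centred labels
w, *_ = np.linalg.lstsq(X_train, y_train - 0.5, rcond=None)

# The test set is used once, to report the chosen criterion (accuracy)
y_pred = (X_test @ w > 0).astype(int)
accuracy = np.mean(y_pred == y_test)
```

Any pre-processing (scaling, feature selection, outlier removal) would likewise have to be fitted on `X_train` only and then merely applied to `X_test`.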
Conclusion
ML = statistics + computing
● How it works under the hood:
  – Pick a hypothesis class (modeling)
  – Minimize a loss function (optimization)
  – Regularize to avoid overfitting (modeling again)
● How it works as a user:
  – Represent your data as input vectors (or choose kernels) (often 80% of the work)
  – Decide on a few ML algorithms to try out
  – Evaluate performance without bias on a left-out test set
Thanks!