
Data Science for Molecular Engineering
Week 7: Non-linear regression
ILOs
• Understand the principles of polynomial regression and the concept of regularization;
• Understand the principles of logistic regression and its use in binary classification;
• Understand the principles of multi-class classification
What if a linear trend is not present?
• You can always do a linear regression, but the model is not good enough (this is called underfitting)
• Recall linear regression: $\hat{y} = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$
• Non-linear relationships can take many different functional forms
Non-linear regression
• Solution 1: feature transformation
• If the relationship between y and x is not linear, what about y and $x^2$, or $x^3$?
• This is the idea of polynomial regression: using a linear model to fit non-linear data
Polynomial regression
• First, the original feature is transformed into polynomial features with a user-defined degree
• A linear model is then fitted using these transformed features
• The same solution as for linear regression can be used to solve polynomial regression (see the sketch below)
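A minimal sketch of these two steps, assuming scikit-learn; the quadratic dataset is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))                      # one original feature
y = 0.5 * X[:, 0]**2 + X[:, 0] + 2 + rng.normal(scale=0.5, size=100)

# Step 1: transform the feature into polynomial features (degree 2: x, x^2)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Step 2: fit an ordinary linear model on the transformed features
model = LinearRegression().fit(X_poly, y)
print(model.coef_, model.intercept_)                       # roughly [1, 0.5] and 2
```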
Overfitting
• What if a very high degree is selected? (see the sketch below)

[Figure: blue points are real data. A linear model underfits the data; a quadratic model is about right; what about a 300-degree model?]

• What is bad about overfitting?
• Why can overfitting happen?
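To make this concrete, a small sketch (scikit-learn, synthetic data; degree 15 stands in for the slide's degree 300 to keep the numbers stable):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = 0.5 * X[:, 0]**2 + rng.normal(scale=0.5, size=20)
X_new = rng.uniform(-3, 3, size=(100, 1))                  # fresh data from the same curve
y_new = 0.5 * X_new[:, 0]**2 + rng.normal(scale=0.5, size=100)

for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    print(degree,
          mean_squared_error(y, model.predict(X)),          # training error
          mean_squared_error(y_new, model.predict(X_new)))  # error on new data
```

The high-degree model drives the training error toward zero while the error on new data blows up; that is what is bad about overfitting.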
How to tell whether you are underfitting/overfitting?
• Validation (see the sketch below)
• Set aside part of the training data to evaluate model performance DURING training
• Validation data is not directly used to update model parameters (i.e., not used in gradient calculation)
• Validation data can be used to select the best training method
• Validation data can be used to tune the training process
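A minimal sketch of a train/validation split, assuming scikit-learn; data and split ratio are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0]**2 + rng.normal(scale=0.5, size=200)

# Hold out 20% of the training data for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)   # parameters updated on training data only
print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("val MSE:  ", mean_squared_error(y_val, model.predict(X_val)))
```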
Learning curves
• Learning curves: plots that compare the performance of a model on the training set with its performance on the validation set, for the same parameters
• Learning curves come in different types, with the following being the most common:
• 1) Learning curve over the amount of data
• 2) Learning curve over the number of iterations
Learning curves over the amount of data
• Retrain the model many times with an increasing number of data points and plot a performance metric on the training and validation datasets (see the sketch below)

Q1. Why is the training error increasing and the validation error decreasing?
Q2. Is this underfitting or overfitting, or neither?
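A minimal sketch using scikit-learn's learning_curve helper; the estimator and synthetic data are illustrative:

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = 0.5 * X[:, 0]**2 + rng.normal(scale=0.5, size=300)

sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),   # refit on growing subsets of the data
    cv=5, scoring="neg_mean_squared_error",
)
# Negate the scores to get errors, averaged over the CV folds
train_err = -train_scores.mean(axis=1)
val_err = -val_scores.mean(axis=1)
print(np.c_[sizes, train_err, val_err])
```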
Which is underfitting/overfitting/just right?
Learning curves over the number of iterations
• Train the model once; record the intermediate training and validation losses (see the sketch below)

Q1. Why is the training loss decreasing?
Q2. Why is the validation loss decreasing and then increasing?
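A minimal sketch with an SGD learner (scikit-learn); the model and data are illustrative:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0]**2 + rng.normal(scale=0.5, size=200)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

sgd = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
history = []
for epoch in range(100):
    sgd.partial_fit(X_tr, y_tr)              # one further pass of SGD updates
    history.append((epoch,
                    mean_squared_error(y_tr, sgd.predict(X_tr)),    # training loss
                    mean_squared_error(y_val, sgd.predict(X_val)))) # validation loss
# plot `history` to see the two loss curves over iterations
```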
How to mitigate overfitting?
• Model
• Model selection
• Model regularization
• Training process – early stopping

• Data
• Collect more data
Model selection
• Train different models on the training set and compare results on the validation set
• Cross-validation (usually when the dataset size is small; see the sketch below)
• Use different partitions of the training data to ensure that the model is generalizable

[Figure: k-fold cross-validation, rotating which partition is held out as the validation data]
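A minimal sketch of 5-fold cross-validation comparing two candidate models, assuming scikit-learn; the data is synthetic:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=100)

for model in (LinearRegression(), Ridge(alpha=1.0)):
    # Each call rotates through 5 train/validation partitions of the data
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(type(model).__name__, "mean CV MSE:", -scores.mean())
```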
Regularization
• Regularization is a good way to reduce overfitting
• Penalize large parameter values ($\theta_i$) when training the model (the bias $\theta_0$ is not penalized)
• Ridge regression (L2 penalty): $J(\theta) = \mathrm{MSE}(\theta) + \alpha \sum_{i=1}^{n} \theta_i^2$
• Lasso regression (L1 penalty): $J(\theta) = \mathrm{MSE}(\theta) + \alpha \sum_{i=1}^{n} |\theta_i|$
• Elastic Net (a weighted mix of the two penalties, with mixing ratio $r$): $J(\theta) = \mathrm{MSE}(\theta) + r\alpha \sum_{i=1}^{n} |\theta_i| + \frac{1-r}{2}\alpha \sum_{i=1}^{n} \theta_i^2$
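A minimal sketch of the three regularized models in scikit-learn; alpha (penalty strength) and l1_ratio (the L1/L2 mix, $r$ above) are illustrative values:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.1, size=100)

for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))

# The L1 penalty in Lasso/Elastic Net tends to drive irrelevant weights exactly to zero
```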
Early stopping (for iterative methods)
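Early stopping interrupts training once the validation loss stops improving. A minimal sketch, assuming an SGD learner and a hypothetical `patience` parameter (the number of non-improving epochs to tolerate):

```python
import copy
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0]**2 + rng.normal(scale=0.5, size=200)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

sgd = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
best_loss, best_model, patience, bad_epochs = np.inf, None, 10, 0
for epoch in range(1000):
    sgd.partial_fit(X_tr, y_tr)
    val_loss = mean_squared_error(y_val, sgd.predict(X_val))
    if val_loss < best_loss:                  # validation loss improved: keep a snapshot
        best_loss, best_model, bad_epochs = val_loss, copy.deepcopy(sgd), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:            # stop once it stalls for `patience` epochs
            break
# best_model holds the parameters from the lowest-validation-loss epoch
```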
Logistic regression
• Instead of transforming features, logistic regression transforms the output into a "probability" (a number between zero and one) using the sigmoid/logistic function:
$\hat{p} = \sigma(\theta^{T}x)$, where $\sigma(t) = \dfrac{1}{1 + e^{-t}}$
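A minimal sketch, assuming scikit-learn and synthetic binary labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(int)        # synthetic binary labels

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X[:3])[:, 1]             # P(y = 1 | x) for 3 samples
manual = sigmoid(X[:3] @ clf.coef_.ravel() + clf.intercept_)   # same number by hand
print(proba, manual)
```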
Logistic regression loss function
• Log loss / binary cross entropy:
$J(\theta) = -\dfrac{1}{m}\sum_{i=1}^{m}\left[\,y^{(i)} \log \hat{p}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{p}^{(i)}\right)\right]$
• Unfortunately, there is no closed-form solution; a gradient descent method must be used (see the sketch below)
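A minimal sketch of batch gradient descent on the log loss, whose gradient is $\frac{1}{m} X^{T}\!\left(\sigma(X\theta) - y\right)$; data and learning rate are illustrative:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(float)

Xb = np.c_[np.ones(len(X)), X]            # prepend a bias column
theta = np.zeros(Xb.shape[1])
eta, m = 0.5, len(X)                      # learning rate, number of samples
for _ in range(1000):
    gradient = Xb.T @ (sigmoid(Xb @ theta) - y) / m   # gradient of the log loss
    theta -= eta * gradient
print(theta)   # bias near 0; the weights grow along [1, 2], matching the labels' rule
```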
Softmax regression
• Logistic regression can be generalized to multiple classes:
• Binary: the sigmoid output is a single probability between 0 and 1
• Multiple class: the softmax function outputs K probabilities that sum up to 1
$\hat{p}_k = \dfrac{\exp(s_k(x))}{\sum_{j=1}^{K} \exp(s_j(x))}$, with class score $s_k(x) = \theta_k^{T} x$
• Softmax regression loss function: categorical cross entropy
$J(\Theta) = -\dfrac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)} \log \hat{p}_k^{(i)}$
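A minimal sketch of the softmax function and a multi-class fit, assuming scikit-learn (its LogisticRegression uses a multinomial/softmax formulation with the default lbfgs solver); the three-class data is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))   # ~[0.66, 0.24, 0.10], sums to 1

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.argmax(X @ rng.normal(size=(2, 3)), axis=1)   # 3 synthetic classes

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:2]).sum(axis=1))          # each row of probabilities sums to 1
```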