7 Regression
[Figure: predicted vs. expected values]
Mean Absolute Error
• Mean Absolute Error (MAE): the average absolute difference between the predicted and expected values
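A minimal sketch of the metric, assuming made-up `y_true` (expected) and `y_pred` (predicted) arrays that are not from the slides. MAE is the mean of the absolute residuals, and `sklearn.metrics.mean_absolute_error` should give the same number:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Illustrative values only (not from the slides).
y_true = np.array([3.0, 5.0, 2.0, 7.0])   # expected
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # predicted

# MAE = mean(|expected - predicted|)
mae_manual = np.mean(np.abs(y_true - y_pred))

# Same calculation using scikit-learn.
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)  # both 0.875
```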
A Note on Target Distributions: When to Use MSE vs MAE?
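• MSE is a good choice when your target variable follows a normal or otherwise symmetric distribution. Squaring the errors penalizes large misses heavily.
• MAE is a good choice when your target variable follows a skewed distribution or has outliers, because each error counts only in proportion to its size.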
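One way to see the difference, using made-up numbers: a single large miss, the kind a long-tailed target tends to produce, moves MSE far more than MAE. The arrays below are illustrative assumptions, not data from the course.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative values only.
y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])

# Every prediction is off by exactly 1.
y_pred_clean = np.array([11.0, 11.0, 12.0, 12.0, 13.0])

# Same predictions, except the last one is off by 20.
y_pred_outlier = y_pred_clean.copy()
y_pred_outlier[-1] = 32.0

mse_clean = mean_squared_error(y_true, y_pred_clean)      # 1.0
mae_clean = mean_absolute_error(y_true, y_pred_clean)     # 1.0
mse_outlier = mean_squared_error(y_true, y_pred_outlier)  # (4*1 + 400) / 5 = 80.8
mae_outlier = mean_absolute_error(y_true, y_pred_outlier) # (4*1 + 20) / 5 = 4.8

print(f"clean:   MSE={mse_clean:.1f}  MAE={mae_clean:.1f}")
print(f"outlier: MSE={mse_outlier:.1f}  MAE={mae_outlier:.1f}")
```

In this toy case one bad prediction multiplies MSE by about 80 but MAE by less than 5.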
Coefficient of Determination – R^2 score
(Goodness of Fit)
• Usually between 0 and 1
• Can be negative if the model fits worse than always predicting the mean of the target
• The closer to 1, the better the fit
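The three cases can be checked with `sklearn.metrics.r2_score` on a tiny made-up target (the values are assumptions chosen for easy arithmetic): a perfect fit, a model that always predicts the mean, and a model that predicts the values in reverse order.

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative target values only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Perfect predictions give R^2 = 1.
r2_perfect = r2_score(y_true, y_true)

# Always predicting the mean gives R^2 = 0.
r2_mean = r2_score(y_true, np.full(4, y_true.mean()))

# Predictions worse than the mean give a negative R^2.
# Here SS_res = 20 and SS_tot = 5, so R^2 = 1 - 20/5 = -3.
r2_bad = r2_score(y_true, y_true[::-1].copy())

print(r2_perfect, r2_mean, r2_bad)
```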
Random Forest Regressor
• Random forests are a collection of decision trees where the predictions of all trees are averaged.
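A small sketch of the averaging, assuming scikit-learn's `RandomForestRegressor` and synthetic data from `make_regression` (both are choices made for illustration). The fitted trees are available in `estimators_`, and averaging their predictions by hand should match `forest.predict`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, used only to have something to fit.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=25, random_state=0)
forest.fit(X, y)

# Predict with each individual tree, then average across trees.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
averaged = per_tree.mean(axis=0)

# The forest's own prediction for the same rows.
forest_pred = forest.predict(X[:5])

print(np.allclose(averaged, forest_pred))
```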
Which Model to Choose? Underfitting vs Overfitting
How to choose a model?
• Usually we try a few different models (algorithms), such as linear regression, decision tree, and random forest.
• We can do some hyperparameter tuning for each model to see which set of hyperparameters performs best. Then we compare the models.
• We stick to one evaluation metric for all models, like MSE or MAE. Using multiple metrics is fine as long as you compute them for every model.
• We also assess whether we want a simpler or a heavier model, depending on our application.
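The workflow above can be sketched as follows. This is one possible setup, not a prescribed one: the synthetic data, the small parameter grids, and the choice of MAE as the shared metric are all assumptions for illustration. `GridSearchCV` tunes the tree-based models on the training split, and every model is then scored with the same metric on the same held-out test split.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for a real dataset.
X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate models; the grids are deliberately tiny and purely illustrative.
candidates = {
    "linear regression": LinearRegression(),
    "decision tree": GridSearchCV(
        DecisionTreeRegressor(random_state=0),
        {"max_depth": [3, 5, None]},
        scoring="neg_mean_absolute_error",
        cv=3,
    ),
    "random forest": GridSearchCV(
        RandomForestRegressor(n_estimators=50, random_state=0),
        {"max_depth": [5, None]},
        scoring="neg_mean_absolute_error",
        cv=3,
    ),
}

# Fit each candidate and score all of them with the same metric (MAE).
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

best = min(scores, key=scores.get)  # lowest test MAE
for name, mae in scores.items():
    print(f"{name}: test MAE = {mae:.2f}")
print("lowest MAE:", best)
```

A large gap between training and test error for one model would suggest overfitting; high error on both would suggest underfitting.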
Housing Data Lab
• Load in housing data from Kaggle: https://www.kaggle.com/datasets/yasserh/housing-prices-dataset
• Fit a linear regression model using at least one feature to predict the price of the house (split into training and testing data first)
• Use model.coef_ to find the coefficients of the linear regression
• If you use categorical features, make sure you encode them
• What are the first 5 residuals?
• Calculate the MSE of the first 5 predictions.
• What is the R^2 value of the entire fit of the model? Use
sklearn.metrics.r2_score()
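A possible outline of the lab steps, not a reference solution. To stay runnable without the download, it uses a synthetic stand-in DataFrame. The file name `Housing.csv` and the column names `price`, `area`, and `mainroad` are assumptions about the Kaggle file, and residuals are taken as actual minus predicted.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# With the real data this would be something like:
#   df = pd.read_csv("Housing.csv")   # assumed file name
# The synthetic frame below only mimics the assumed columns.
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(1500, 10000, size=n)
mainroad = rng.choice(["yes", "no"], size=n)
price = 500 * area + 300000 * (mainroad == "yes") + rng.normal(0, 200000, size=n)
df = pd.DataFrame({"price": price, "area": area, "mainroad": mainroad})

# Encode the categorical feature as a 0/1 column.
X = pd.get_dummies(df[["area", "mainroad"]], drop_first=True, dtype=float)
y = df["price"]

# Split before fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
coefficients = dict(zip(X.columns, model.coef_))

# Residuals (actual - predicted) on the test set.
y_pred = model.predict(X_test)
residuals = y_test.to_numpy() - y_pred
first_5_residuals = residuals[:5]

# MSE of the first 5 predictions, and R^2 over the whole test set.
mse_first_5 = mean_squared_error(y_test.iloc[:5], y_pred[:5])
r2 = r2_score(y_test, y_pred)

print(coefficients)
print(first_5_residuals)
print(mse_first_5, r2)
```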
Time Series: Autoregression
• Autoregression: when you use one variable (the past) to predict the same variable at another time (the future).
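A minimal sketch of the idea with a single lag, assuming a simulated series (the 0.8 coefficient and the noise level are made up). The value at time t-1 is the feature and the value at time t is the target, so an ordinary `LinearRegression` can act as a one-lag autoregressive model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simulate a series where each value depends on the previous one.
rng = np.random.default_rng(0)
n = 500
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.8 * series[t - 1] + rng.normal(0, 1)

# Feature: the past value. Target: the same variable one step later.
X = series[:-1].reshape(-1, 1)
y = series[1:]

model = LinearRegression().fit(X, y)
estimated_coef = model.coef_[0]  # should land near the simulated 0.8

# One-step-ahead forecast from the last observed value.
next_value = model.predict(series[-1:].reshape(1, 1))[0]

print(estimated_coef, next_value)
```

More lags can be added as extra feature columns in the same way.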
Final Project Specifications
• 15-20 minute presentations + 5 minutes for questions
• Use slides: Introduction, Analysis, Conclusions, Future Work
• Show your code
• Data Analysis Project:
• At least 5 visualizations with conclusions.
• Make sure you tell a story with your data.
• Machine Learning Project:
• At least two models with comparisons (except if using neural networks), evaluated with
the same metric (MSE, precision, etc.)
• What challenges did you face? Talk about hyperparameter tuning, model
comparison, metrics, etc.