Black Box Machine Learning
David S. Rosenberg
Bloomberg ML EDU
P(pneumonia) = 0.7
P(flu) = 0.2
Fig 1-1 from Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron (2017).
David S. Rosenberg (Bloomberg ML EDU) September 20, 2017 12 / 67
Rule-Based Systems
Fig 1-2 from Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron (2017).
Key Concepts
Definition
Mapping a raw input x to R^d is called feature extraction or featurization.
From Percy Liang’s "Lecture 3" slides from Stanford’s CS221, Autumn 2014.
Feature Template: Last Three Characters Equal ___
From Percy Liang’s "Lecture 3" slides from Stanford’s CS221, Autumn 2014.
Feature Template: One-Hot Encoding
one-hot encoding: a set of binary features that always has exactly one nonzero value.
categorical variable: a variable that takes one of several discrete possible values:
NYC Boroughs: “Brooklyn”, “Bronx”, “Queens”, “Manhattan”, “Staten Island”
Categorical variables can be encoded numerically using one-hot encoding.
In statistics, this is called dummy variable encoding.
Concept Check: How many features to one-hot encode the boroughs?
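The concept check above can be answered by writing the encoding out. Below is a minimal sketch in plain Python; the function name and category ordering are my own choices, not from the slides.

```python
# One-hot encoding the five NYC boroughs: one binary feature per category,
# with exactly one nonzero entry per encoded value.
BOROUGHS = ["Brooklyn", "Bronx", "Queens", "Manhattan", "Staten Island"]

def one_hot(value, categories=BOROUGHS):
    """Return a binary vector with exactly one nonzero value."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]

print(one_hot("Queens"))  # -> [0, 0, 1, 0, 0]
```

Note the answer to the concept check falls out of the code: one feature per possible category value.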
Today is about what’s outside the “purple box”; the rest of the course is about what’s inside.
feature extraction
maps raw inputs into arrays of numeric values
ideally, extracts essential features of the input
A loss function scores how far off a prediction is from the desired “target” output.
loss(prediction, target) returns a number called “the loss”
Big Loss = Bad Error
Small Loss = Minor Error
Zero Loss = No Error
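As a concrete sketch of the loss(prediction, target) signature described above, here is one possible loss (absolute error, my choice of example):

```python
def loss(prediction, target):
    """Example loss (absolute error): returns a number called "the loss"."""
    return abs(prediction - target)

print(loss(7.0, 5.0))  # 2.0: big loss, bad error
print(loss(5.0, 5.0))  # 0.0: zero loss, no error
```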
e.g. Split labeled data randomly into 80% training and 20% test.
Train/Test:
Build model on training data (say 80% of all labeled data).
Get performance estimate on test data (remaining 20%).
Train/Deploy:
Build model on all labeled data.
Deploy model into wild.
Hope for the best.
Random split of labeled data into train/test is usually the right approach.
(why random?)
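The random 80/20 split above can be sketched in a few lines; the function name, seed, and toy data are illustrative assumptions, not from the slides.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Randomly split labeled data into (train, test) lists."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # random split avoids ordering bias
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = [(x, 2 * x) for x in range(10)]     # toy labeled data: (input, label)
train, test = train_test_split(data)
print(len(train), len(test))               # 8 2
```

Shuffling before splitting is what makes the split random; slicing a sorted or time-ordered dataset directly would give train and test sets with different distributions.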
Summary: What to Give your Data Science Intern
T̂ = Mean(T_1, ..., T_k)
SE(T̂) = SD(T_1, ..., T_k) / √k
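A quick numeric sketch of these two formulas, treating T_1, ..., T_k as per-fold test losses from k-fold cross-validation (the numbers below are invented for illustration):

```python
import statistics

# Per-fold test losses T_1..T_k from k = 5 folds (made-up values).
T = [0.21, 0.19, 0.25, 0.22, 0.18]
k = len(T)

T_hat = statistics.mean(T)            # T̂ = Mean(T_1, ..., T_k)
se = statistics.stdev(T) / k ** 0.5   # SE(T̂) = SD(T_1, ..., T_k) / √k

print(round(T_hat, 3), round(se, 4))  # 0.21 0.0122
```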
Provost and Fawcett, Data Science for Business, Figure 5-9.
Forward Chaining (Cross Validation for Time Series)
Jatin Garg (https://stats.stackexchange.com/users/123886/jatin-garg), Using k-fold cross-validation for time-series model selection, URL
(version: 2017-03-22): https://stats.stackexchange.com/q/268847
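Forward chaining can be sketched as follows: each fold trains on all data up to some time step and tests on the next one, so the model never sees the future. The function name and fold sizes are my own illustrative choices.

```python
def forward_chaining_folds(n, min_train=3):
    """Yield (train_indices, test_index) pairs in time order."""
    for t in range(min_train, n):
        yield list(range(t)), t  # train on everything before time t, test on t

for train_idx, test_idx in forward_chaining_folds(6):
    print(train_idx, "->", test_idx)
# [0, 1, 2] -> 3
# [0, 1, 2, 3] -> 4
# [0, 1, 2, 3, 4] -> 5
```

Contrast this with a random k-fold split, which would let training folds contain observations from after the test fold, leaking future information into a time-series model.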
Key Concepts
loss functions
e.g. 0/1 loss (for classification)
e.g. square loss (for regression)
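Both example losses follow directly from their standard definitions; a sketch (the function names are mine):

```python
def zero_one_loss(prediction, target):
    """0/1 loss for classification: 1 for any misclassification, 0 otherwise."""
    return 0 if prediction == target else 1

def square_loss(prediction, target):
    """Square loss for regression: penalizes large errors quadratically."""
    return (prediction - target) ** 2

print(zero_one_loss("cat", "dog"))  # 1
print(square_loss(2.5, 2.0))        # 0.25
```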
Leakage: information about the label sneaks into the features.
Examples:
identifying cat photos by using the title on the page
including sales commission as a feature when ranking sales leads
Sample bias: Test inputs and deployment inputs have different distributions.
Examples:
create a model to predict US voting patterns, but phone survey only dials landlines
building a stock forecasting model, but training using a random selection of companies that exist today – what’s the issue?
Concept drift: correct output for given input changes over time
e.g. season changes, and given person no longer interested in winter coats
e.g. last week I was looking for a new car, this week I’m not
f(x) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M
Fit with M = 0
Fit with M = 1
Fit with M = 3
PRETTY GOOD!
Fit with M = 9
Fix overfitting by
Reducing model complexity
Getting more training data
NAILED IT?
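The progression of fits above can be reproduced in a few lines. This is a sketch in the spirit of Bishop Ch. 1; the sine target, noise level, and sample size are my assumptions, not from the slides.

```python
import numpy as np

# Noisy samples of a smooth target function (10 points, as in Bishop's example).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

train_error = {}
for M in (0, 1, 3, 9):
    w = np.polyfit(x, y, deg=M)                       # least-squares fit of f(x)
    train_error[M] = np.mean((np.polyval(w, x) - y) ** 2)
    print(f"M={M}: training error {train_error[M]:.4f}")

# Training error only shrinks as M grows; M = 9 can pass through all 10 points.
# That is the overfitting trap: test error on fresh data would be far worse.
```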
From Bishop’s Pattern Recognition and Machine Learning, Ch 1.
Hyperparameters (or “Tuning Parameters”)