PyCDS 15 MachineLearning
15.1 Q2: Which of the following kinds of predictions are happening today with
machine learning?
a. Improving weather prediction, cancer diagnoses and treatment regimens to
save lives.
b. Predicting customer “churn,” what prices houses are likely to sell for, ticket
sales of new movies, and anticipated revenue of new products and services.
c. Predicting the best strategies for coaches and players to use to win more
games and championships, while experiencing fewer injuries.
d. All of the above.
Answer: d. All of the above.
Scikit-Learn
15.1 Q3: Which of the following statements is false?
a. Scikit-learn conveniently packages the most effective machine-learning
algorithms as evaluators.
b. Each scikit-learn algorithm is encapsulated, so you don’t see its intricate
details, including any heavy mathematics.
c. With scikit-learn and a small amount of Python code, you’ll create powerful
models quickly for analyzing data, extracting insights from the data and most
importantly making predictions.
d. All of the above statements are true.
Answer: a. Scikit-learn conveniently packages the most effective machine-
learning algorithms as evaluators. Actually, Scikit-learn, also called
sklearn, conveniently packages the most effective machine-learning
algorithms as estimators.
15.1 Q4: Which of the following statements is false?
a. With scikit-learn, you’ll train each model on a subset of your data, then test
each model on the rest to see how well your model works.
b. Once your models are trained, you’ll put them to work making predictions
based on data they have not seen. You’ll often be amazed at how accurate your
models will be.
c. With machine learning, your computer that you’ve used mostly on rote chores
will take on characteristics of intelligence.
d. Although you can specify parameters to customize scikit-learn models and
possibly improve their performance, if you use the models’ default parameters
for simplicity, you’ll often obtain mediocre results.
Answer: d. Although you can specify parameters to customize scikit-learn
models and possibly improve their performance, if you use the models’
default parameters for simplicity, you’ll often obtain mediocre results.
Actually, if you use the models’ default parameters for simplicity, you’ll
often obtain impressive results.
15.1 Q5: Which of the following statements about scikit-learn and the machine-
learning models you’ll build with it is false?
a. It’s difficult to know in advance which model(s) will perform best on your
data, so you typically try many models and pick the one that performs best—
scikit-learn makes this convenient for you.
b. You’ll rarely get to know the details of the complex mathematical algorithms
in the scikit-learn estimators, but with experience, you’ll be able to intuit the
best model for each new dataset.
c. It generally takes at most a few lines of code for you to create and use each
scikit-learn model.
d. The models report their performance so you can compare the results and pick
the model(s) with the best performance.
Answer: b. You’ll rarely get to know the details of the complex
mathematical algorithms in the scikit-learn estimators, but with
experience, you’ll be able to intuit the best model for each new dataset.
Actually, even with a great deal of experience with these models you won’t
be able to intuit the best model for each new dataset you’ll work with, so
scikit-learn makes it easy for you to try them all.
unlabeled photos it will recognize dogs and cats it has never seen before. The
more photos you train with, the greater the chance that your model will
accurately predict which new photos are dogs and which are cats.
d. In this era of big data and massive, economical computer power, you should
be able to build some pretty accurate machine learning models.
Answer: a. The two main types of machine learning are supervised
machine learning, which works with unlabeled data, and unsupervised
machine learning, which works with labeled data. Actually, the two main
types of machine learning are supervised machine learning, which works
with labeled data, and unsupervised machine learning, which works with
unlabeled data.
15.1 Q12: Which of the following are related to compressing a dataset’s large
number of features down to two for visualization purposes?
a. dimensionality reduction
b. TSNE estimator
c. both a) and b)
d. neither a) nor b)
Answer: c. both a) and b)
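For reference, a minimal sketch (assuming the Digits dataset bundled with
scikit-learn) of using the TSNE estimator to compress 64 features down to
two for visualization:
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()

# n_components=2 requests two output dimensions for a 2D scatter plot
tsne = TSNE(n_components=2, random_state=11)
reduced_data = tsne.fit_transform(digits.data)
print(reduced_data.shape)  # (1797, 2) -- one 2D point per sample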
d. In the k-nearest neighbors algorithm, the class with the most “votes” wins.
Answer: c. Always pick an even value of k. Actually, picking an odd k value
in the kNN algorithm avoids ties by ensuring there’s never an equal
number of votes.
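A minimal sketch of specifying the hyperparameter k when creating a
KNeighborsClassifier; the particular odd value here is illustrative:
from sklearn.neighbors import KNeighborsClassifier

# n_neighbors is the hyperparameter k; an odd value helps avoid tied votes
knn = KNeighborsClassifier(n_neighbors=5)
# knn.fit(X_train, y_train) would then train the estimator on labeled data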
train_size. Use floating-point values from 0.0 through 1.0 to specify the
percentages of the data to use for each.
c. You can use integer values to set the precise numbers of samples.
d. If you specify one of the keyword arguments test_size and train_size,
the other is inferred—for example, the statement
X_train, X_test, y_train, y_test = train_test_split(
digits.data, digits.target, random_state=11, test_size=0.20)
all the incorrect predictions for the entire test set—that is, the cases in which
the predicted and expected values do not match:
wrong = [(p, e) for (p, e) in zip(predicted, expected) if p != e]
d. All of the above statements are true.
Answer: d. All of the above statements are true.
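Pulling the snippets above together, a hedged end-to-end sketch (the choice
of KNeighborsClassifier as the estimator is an assumption) that splits the
Digits dataset, trains a model, then lists the incorrect predictions:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()

# specifying test_size=0.20 implies train_size=0.80
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=11, test_size=0.20)

knn = KNeighborsClassifier()
knn.fit(X=X_train, y=y_train)

predicted = knn.predict(X=X_test)
expected = y_test

# the (predicted, expected) pairs that do not match
wrong = [(p, e) for (p, e) in zip(predicted, expected) if p != e]
print(wrong)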
15.3 Q3: Consider the confusion matrix for the Digits dataset’s predictions:
array([[45, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 45, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 54, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 42, 0, 1, 0, 1, 0, 0],
[ 0, 0, 0, 0, 49, 0, 0, 1, 0, 0],
[ 0, 0, 0, 0, 0, 38, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 42, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 45, 0, 0],
[ 0, 1, 1, 2, 0, 0, 0, 0, 39, 1],
[ 0, 0, 0, 0, 1, 0, 0, 0, 1, 41]])
Which of the following statements is false?
a. The correct predictions are shown on the diagonal from top-left to bottom-
right—this is called the principal diagonal.
b. The nonzero values that are not on the principal diagonal indicate incorrect
predictions.
c. Each row represents one distinct class—that is, one of the digits 0–9.
d. The columns within a row specify how many of the test samples were
classified incorrectly into each distinct class.
Answer: d. The columns within a row specify how many of the test samples
were classified incorrectly into each distinct class. Actually, the columns
within a row specify how many of the test samples were classified into
each distinct class—either correctly (on the principal diagonal) or
incorrectly (off the principal diagonal).
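A minimal sketch (assuming the predicted and expected arrays from the
earlier Digits run) of producing such a confusion matrix:
from sklearn.metrics import confusion_matrix

# row i, column j counts the class-i test samples classified as class j
confusion = confusion_matrix(y_true=expected, y_pred=predicted)
print(confusion)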
15.3 Q5: When you display a confusion matrix as a heat map, the principal
diagonal and the incorrect predictions stand out nicely.
Which of the following statements about the above heat map version of a
confusion matrix is false?
a. Seaborn’s graphing functions work with two-dimensional data such as pandas
DataFrames.
b. The following code converts a confusion matrix into a DataFrame, then
graphs it as a heat map (the Seaborn call below is a plausible
reconstruction; the colormap choice is illustrative):
import pandas as pd
import seaborn as sns

confusion_df = pd.DataFrame(confusion, index=range(10),
                            columns=range(10))
axes = sns.heatmap(confusion_df, annot=True, cmap='nipy_spectral_r')
K-Fold Cross-Validation
15.3 Q6: Which of the following statements is false?
a. K-fold cross-validation enables you to use all of your data at once for training
your model.
b. K-fold cross-validation splits the dataset into k equal-size folds.
c. You then repeatedly train your model with k – 1 folds and test the model with
the remaining fold.
d. For example, consider using k = 10 with folds numbered 1 through 10. With
10 folds, we’d do 10 successive training and testing cycles:
First, we’d train with folds 1–9, then test with fold 10.
Next, we’d train with folds 1–8 and 10, then test with fold 9.
Next, we’d train with folds 1–7 and 9–10, then test with fold 8.
This training and testing cycle continues until each fold has been used to test the
model.
Answer: a. K-fold cross-validation enables you to use all of your data at
once for training your model. Actually, K-fold cross-validation enables you
to use all of your data for both training and testing, to get a better sense of
how well your model will make predictions for new data by repeatedly
training and testing the model with different portions of the dataset.
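A minimal sketch (assuming the Digits dataset and a kNN estimator) of
10-fold cross-validation with KFold and cross_val_score:
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
knn = KNeighborsClassifier()

# shuffle=True randomizes the sample order before creating the 10 folds
kfold = KFold(n_splits=10, random_state=11, shuffle=True)
scores = cross_val_score(estimator=knn, X=digits.data,
                         y=digits.target, cv=kfold)
print(scores.mean())  # mean accuracy across the 10 folds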
Hyperparameter Tuning
15.3 Q10: Which of the following statements is false?
value 1 in kNN produces the most accurate predictions for the Digits
dataset.
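A minimal sketch (assuming the Digits dataset) of the kind of
hyperparameter tuning referred to above: try odd values of k and compare
each one's mean cross-validation accuracy.
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()

for k in range(1, 20, 2):  # odd k values from 1 through 19
    kfold = KFold(n_splits=10, random_state=11, shuffle=True)
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(estimator=knn, X=digits.data,
                             y=digits.target, cv=kfold)
    print(f'k={k:<2}; mean accuracy={scores.mean():.2%}')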
a. Scikit-learn has separate classes for simple linear regression and multiple
linear regression.
b. To find the best fitting regression line for the data, a LinearRegression
estimator iteratively adjusts the slope and intercept values to minimize the sum
of the squares of the data points’ distances from the line.
c. Once LinearRegression is complete, you can use the slope and intercept in
the y = mx + b calculation to make predictions. The slope is stored in the
estimator’s coef_ attribute (m in the equation) and the intercept is stored in
the estimator’s intercept_ attribute (b in the equation).
d. All of the above are true.
Answer: a. Scikit-learn has separate classes for simple linear regression
and multiple linear regression. Actually, scikit-learn does not have a
separate class for simple linear regression because it’s just a special case
of multiple linear regression.
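A minimal sketch (with hypothetical one-feature data) showing that the same
LinearRegression class handles simple linear regression, and how coef_ and
intercept_ map onto y = mx + b:
import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical data; X must be two-dimensional (samples x features)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 4.2, 5.9, 8.1])

linear_regression = LinearRegression()
linear_regression.fit(X=X, y=y)

m = linear_regression.coef_[0]    # the slope m in y = mx + b
b = linear_regression.intercept_  # the intercept b
print(m * 5.0 + b)                # predict y for x = 5.0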
15.3 Q24: Consider the following visualization that you studied in Chapter 15,
Machine Learning:
Answer: d. When you make predictions with an overfit model, the model
won’t know what to do with new data that matches the training data, but
the model will make excellent predictions with data it has never seen.
Actually, when you make predictions with an overfit model, new data that
matches the training data will produce excellent predictions, but the
model will not know what to do with data it has never seen.
b. We can peek at some of the data in the DataFrame using the head method:
In [13]: california_df.head()
Out[13]:
   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  \
0  8.3252      41.0    6.9841     1.0238       322.0    2.5556
1  8.3014      21.0    6.2381     0.9719      2401.0    2.1098
2  7.2574      52.0    8.2881     1.0734       496.0    2.8023
3  5.6431      52.0    5.8174     1.0731       558.0    2.5479
4  3.8462      52.0    6.2819     1.0811       565.0    2.1815
c. The \ to the right of the column head "AveOccup" in Part (b)’s output
indicates that there are more columns displayed below. You’ll see the \ only if
the window in which IPython is running is too narrow to display all the
columns left-to-right.
d. All of the above statements are true.
Answer: d. All of the above statements are true.
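A minimal sketch of loading the California Housing dataset into a DataFrame
and peeking at it, as in Part (b):
import pandas as pd
from sklearn.datasets import fetch_california_housing

california = fetch_california_housing()  # downloaded on first use

california_df = pd.DataFrame(california.data,
                             columns=california.feature_names)
print(california_df.head())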
15.5 Q8: In the context of the California Housing dataset, which of the following
statements is false?
a. The following code creates a LinearRegression estimator and invokes its
fit method to train the estimator using X_train and y_train:
from sklearn.linear_model import LinearRegression
linear_regression = LinearRegression()
linear_regression.fit(X=X_train, y=y_train)
b. Multiple linear regression produces separate coefficients for each feature
(stored in coef_) in the dataset and one intercept (stored in intercept_).
c. For positive coefficients, the median house value increases as the feature
value increases. For negative coefficients, the median house value decreases as
the feature value decreases.
d. You can use the coefficient and intercept values with the following equation to
make predictions:
y = m1x1 + m2x2 + … + mnxn + b
where
m1, m2, …, mn are the feature coefficients,
b is the intercept,
x1, x2, …, xn are the feature values (that is, the values of the independent
variables), and
y is the predicted value (that is, the dependent variable).
Answer: c. For positive coefficients, the median house value increases as
the feature value increases. For negative coefficients, the median house
value decreases as the feature value decreases. Actually, for negative
coefficients, the median house value decreases as the feature value
increases.
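A hedged sketch (assuming the linear_regression estimator trained in Part
(a); predict_one is a hypothetical helper, not a scikit-learn function) of
applying that equation directly:
import numpy as np

def predict_one(features):
    # y = m1x1 + m2x2 + ... + mnxn + b, using the learned values
    return (np.dot(linear_regression.coef_, features) +
            linear_regression.intercept_)

# equivalent to linear_regression.predict([features])[0]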
In [32]: predicted[:5]
Out[32]: array([1.25396876, 2.34693107, 2.03794745, 1.8701254 , 2.53608339])
In [33]: expected[:5]
Out[33]: array([0.762, 1.732, 1.125, 1.37 , 1.856])
c. With classification, we saw that the predictions were distinct classes that
matched existing classes in the dataset. With regression, it’s tough to get exact
predictions, because you have continuous outputs. Every possible value of x1, x2
… xn in the calculation
y = m1x1 + m2x2 + … + mnxn + b
predicts a different value.
d. All of the above statements are true.
Answer: d. All of the above statements are true.
In [48]: estimators = {
...: 'LinearRegression': linear_regression,
...: 'ElasticNet': ElasticNet(),
...: 'Lasso': Lasso(),
...: 'Ridge': Ridge()
...: }
c. The following code from our example runs the estimators using k-fold cross-
validation with a KFold object and the cross_val_score function. The code
passes to cross_val_score the additional keyword argument scoring='r2',
which indicates that the function should report the R² scores for each fold—
again, 1.0 is the best, so it appears that LinearRegression and Ridge are the
best models for this dataset:
In [49]: from sklearn.model_selection import KFold, cross_val_score
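A hedged sketch (assuming the California Housing arrays and the estimators
dictionary shown earlier) of the comparison loop that could follow this
import:
for estimator_name, estimator_object in estimators.items():
    kfold = KFold(n_splits=10, random_state=11, shuffle=True)
    scores = cross_val_score(estimator=estimator_object,
                             X=california.data, y=california.target,
                             cv=kfold, scoring='r2')  # R² per fold
    print(f'{estimator_name:>16}: '
          f'mean r2 score={scores.mean():.3f}')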
15.7 Q2: Which of the following statements about the k-means clustering
algorithm is false?
a. Each cluster of samples is grouped around a centroid—the cluster’s center
point.
b. Initially, the algorithm chooses k centroids at random from the dataset’s
samples. Then the remaining samples are placed in the cluster whose centroid is
the closest.
c. The centroids are iteratively recalculated and the samples re-assigned to
clusters until, for all clusters, the distances from a given centroid to the samples
in its cluster are maximized.
d. The algorithm’s results are a one-dimensional array of labels indicating the
cluster to which each sample belongs, and a two-dimensional array of centroids
representing the center of each cluster.
Answer: c. The centroids are iteratively recalculated and the samples re-
assigned to clusters until, for all clusters, the distances from a given
centroid to the samples in its cluster are maximized. Actually, the
centroids are iteratively recalculated and the samples re-assigned to
clusters until, for all clusters, the distances from a given centroid to the
samples in its cluster are minimized.
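A minimal sketch (assuming the Iris dataset) of the KMeans estimator and the
two arrays the algorithm produces:
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()

kmeans = KMeans(n_clusters=3, random_state=11)
kmeans.fit(iris.data)

print(kmeans.labels_)           # 1D array: a cluster label per sample
print(kmeans.cluster_centers_)  # 2D array: one centroid per cluster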
c. The Iris dataset is referred to as a “toy dataset” because it has only 150
samples and four features. The dataset describes 50 samples for each of three
Iris flower species—Iris setosa, Iris versicolor and Iris virginica.
d. All of the above statements are true.
Answer: d. All of the above statements are true.
b. In the Iris dataset, the first 50 samples are Iris setosa, the next 50 are Iris
versicolor, and the last 50 are Iris virginica.
c. If the KMeans estimator chose the Iris dataset clusters perfectly, then each
group of 50 elements in the estimator’s labels_ array should have mostly the
same label.
d. All of the above statements are true.
Answer: c. If the KMeans estimator chose the Iris dataset clusters perfectly,
then each group of 50 elements in the estimator’s labels_ array should
have mostly the same label. Actually, if the KMeans estimator chose the Iris
dataset clusters perfectly, then each group of 50 elements in the
estimator’s labels_ array should have a distinct label.
c. The following code scatterplots the DataFrame in Part (b) using Seaborn:
from sklearn.cluster import (DBSCAN, MeanShift, SpectralClustering,
                             AgglomerativeClustering)

# kmeans is the KMeans estimator created earlier in the example
estimators = {
    'KMeans': kmeans,
    'DBSCAN': DBSCAN(),
    'MeanShift': MeanShift(),
    'SpectralClustering': SpectralClustering(n_clusters=3),
    'AgglomerativeClustering': AgglomerativeClustering(n_clusters=3)
}
c. Like KMeans, you specify the number of clusters in advance for the
SpectralClustering and AgglomerativeClustering, DBSCAN and
MeanShift estimators.
d. All of the above statements are true.
Answer: c. Like KMeans, you specify the number of clusters in advance for
the SpectralClustering and AgglomerativeClustering, DBSCAN and
MeanShift estimators. Actually, like KMeans, you specify the number of
clusters in advance for only the SpectralClustering and
AgglomerativeClustering estimators.
KMeans:
   0-50:
      label=1, count=50
   50-100:
      label=0, count=48
      label=2, count=2
   100-150:
      label=0, count=14
      label=2, count=36
DBSCAN:
   0-50:
      label=-1, count=1
      label=0, count=49
   50-100:
      label=-1, count=6
      label=1, count=44
   100-150:
      label=-1, count=10
      label=1, count=40
MeanShift:
   0-50:
      label=1, count=50
   50-100:
      label=0, count=49
      label=1, count=1
   100-150:
      label=0, count=50
SpectralClustering:
   0-50:
      label=2, count=50
   50-100:
      label=1, count=50
   100-150:
      label=0, count=35
      label=1, count=15
AgglomerativeClustering:
   0-50:
      label=1, count=50
   50-100:
      label=0, count=49
      label=2, count=1
   100-150:
      label=0, count=15
      label=2, count=35
b. Interestingly, the output in Part (a) shows that DBSCAN correctly predicted
three clusters (labeled -1, 0 and 1), though it placed 84 of the 100 Iris virginica
and Iris versicolor samples in the same cluster.
c. The output in Part (a) shows that the MeanShift estimator, on the other
hand, predicted only two clusters (labeled 0 and 1), and placed 99 of the 100
Iris virginica and Iris versicolor samples in the same cluster.
d. All of the above statements are true.
Answer: d. All of the above statements are true.
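A hedged sketch (assuming the Iris dataset; the grouping logic relies on the
dataset storing 50 samples per species in order) of a loop that could produce
per-estimator label counts like those in Part (a):
import numpy as np
from sklearn.cluster import (KMeans, DBSCAN, MeanShift,
                             SpectralClustering, AgglomerativeClustering)
from sklearn.datasets import load_iris

iris = load_iris()

estimators = {
    'KMeans': KMeans(n_clusters=3, random_state=11),
    'DBSCAN': DBSCAN(),
    'MeanShift': MeanShift(),
    'SpectralClustering': SpectralClustering(n_clusters=3),
    'AgglomerativeClustering': AgglomerativeClustering(n_clusters=3)
}

for name, estimator in estimators.items():
    estimator.fit(iris.data)  # cluster the 150 four-feature samples
    print(f'{name}:')
    for i in range(0, 150, 50):  # one 50-sample group per species
        print(f'   {i}-{i + 50}:')
        labels, counts = np.unique(
            estimator.labels_[i:i + 50], return_counts=True)
        for label, count in zip(labels, counts):
            print(f'      label={label}, count={count}')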