
Unsupervised Learning
UNSUPERVISED LEARNING IN PYTHON

Benjamin Wilson
Director of Research at lateral.io

Unsupervised learning
Unsupervised learning finds patterns in data
E.g., clustering customers by their purchases
Compressing the data using purchase patterns (dimension reduction)


Supervised vs unsupervised learning
Supervised learning finds patterns for a prediction task
E.g., classify tumors as benign or cancerous (labels)
Unsupervised learning finds patterns in data
... but without a specific prediction task in mind


Iris dataset
Measurements of many iris plants
Three species of iris: setosa, versicolor, virginica
Petal length, petal width, sepal length, sepal width (the features of the dataset)

1 http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html


Arrays, features & samples
2D NumPy array
Columns are measurements (the features)
Rows represent iris plants (the samples)


Iris data is 4-dimensional
Iris samples are points in 4-dimensional space
Dimension = number of features
Dimension too high to visualize!
... but unsupervised learning gives insight


k-means clustering
Finds clusters of samples
Number of clusters must be specified
Implemented in sklearn ("scikit-learn")


print(samples)

[[ 5.  3.3 1.4 0.2]
 [ 5.  3.5 1.3 0.3]
 ...
 [ 7.2 3.2 6.  1.8]]

from sklearn.cluster import KMeans

model = KMeans(n_clusters=3)
model.fit(samples)

KMeans(algorithm='auto', ...)

labels = model.predict(samples)
print(labels)

[0 0 1 1 0 1 2 1 0 1 ...]


Cluster labels for new samples
New samples can be assigned to existing clusters
k-means remembers the mean of each cluster (the "centroids")
Finds the nearest centroid to each new sample


Cluster labels for new samples
print(new_samples)

[[ 5.7 4.4 1.5 0.4]
 [ 6.5 3.  5.5 1.8]
 [ 5.8 2.7 5.1 1.9]]

new_labels = model.predict(new_samples)
print(new_labels)

[0 2 1]


Scatter plots
Scatter plot of sepal length vs. petal length
Each point represents an iris sample
Color points by cluster labels
PyPlot (matplotlib.pyplot)


Scatter plots
import matplotlib.pyplot as plt
xs = samples[:,0]  # sepal length
ys = samples[:,2]  # petal length
plt.scatter(xs, ys, c=labels)
plt.show()


Let's practice!

Evaluating a clustering

Benjamin Wilson
Director of Research at lateral.io

Evaluating a clustering
Can check correspondence with e.g. iris species
... but what if there are no species to check against?
Measure quality of a clustering
Informs choice of how many clusters to look for


Iris: clusters vs species
k-means found 3 clusters amongst the iris samples
Do the clusters correspond to the species?

species  setosa  versicolor  virginica
labels
0             0           2         36
1            50           0          0
2             0          48         14


Cross tabulation with pandas
Clusters vs species is a "cross-tabulation"
Use the pandas library
Given the species of each sample as a list species

print(species)

['setosa', 'setosa', 'versicolor', 'virginica', ... ]


Aligning labels and species
import pandas as pd
df = pd.DataFrame({'labels': labels, 'species': species})
print(df)

   labels     species
0       1      setosa
1       1      setosa
2       2  versicolor
3       2   virginica
4       1      setosa
...


Crosstab of labels and species
ct = pd.crosstab(df['labels'], df['species'])
print(ct)

species  setosa  versicolor  virginica
labels
0             0           2         36
1            50           0          0
2             0          48         14

How to evaluate a clustering, if there were no species information?


Measuring clustering quality
Using only samples and their cluster labels
A good clustering has tight clusters
Samples in each cluster bunched together


Inertia measures clustering quality
Measures how spread out the clusters are (lower is better)
Distance from each sample to centroid of its cluster
After fit(), available as attribute inertia_
k-means attempts to minimize the inertia when choosing clusters

from sklearn.cluster import KMeans

model = KMeans(n_clusters=3)
model.fit(samples)
print(model.inertia_)

78.9408414261


The number of clusters
Clusterings of the iris dataset with different numbers of clusters
More clusters means lower inertia
What is the best number of clusters?


How many clusters to choose?
A good clustering has tight clusters (so low inertia)
... but not too many clusters!
Choose an "elbow" in the inertia plot
Where inertia begins to decrease more slowly
E.g., for iris dataset, 3 is a good choice
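
A minimal sketch of how such an inertia ("elbow") plot could be produced, assuming the samples array from the earlier slides; the range of k values tried here is illustrative:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ks = range(1, 7)
inertias = []
for k in ks:
    # fit a k-means model and record its inertia
    model = KMeans(n_clusters=k)
    model.fit(samples)
    inertias.append(model.inertia_)

plt.plot(ks, inertias, '-o')
plt.xlabel('number of clusters, k')
plt.ylabel('inertia')
plt.show()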



Let's practice!

Transforming features for better clusterings

Benjamin Wilson
Director of Research at lateral.io

Piedmont wines dataset
178 samples from 3 distinct varieties of red wine: Barolo, Grignolino and Barbera
Features measure chemical composition e.g. alcohol content
Visual properties like "color intensity"

1 Source: https://archive.ics.uci.edu/ml/datasets/Wine


Clustering the wines
from sklearn.cluster import KMeans
model = KMeans(n_clusters=3)
labels = model.fit_predict(samples)


Clusters vs. varieties
df = pd.DataFrame({'labels': labels,
                   'varieties': varieties})
ct = pd.crosstab(df['labels'], df['varieties'])
print(ct)

varieties  Barbera  Barolo  Grignolino
labels
0               29      13          20
1                0      46           1
2               19       0          50


Feature variances
The wine features have very different variances!
Variance of a feature measures spread of its values

feature       variance
alcohol           0.65
malic_acid        1.24
...
od280             0.50
proline       99166.71




StandardScaler
In k-means: feature variance = feature influence
StandardScaler transforms each feature to have mean 0 and variance 1
Features are said to be "standardized"


sklearn StandardScaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(samples)

StandardScaler(copy=True, with_mean=True, with_std=True)

samples_scaled = scaler.transform(samples)


Similar methods
StandardScaler and KMeans have similar methods
Use fit() / transform() with StandardScaler
Use fit() / predict() with KMeans


StandardScaler, then KMeans
Need to perform two steps: StandardScaler, then KMeans
Use sklearn pipeline to combine multiple steps
Data flows from one step into the next


Pipelines combine multiple steps
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
scaler = StandardScaler()
kmeans = KMeans(n_clusters=3)
from sklearn.pipeline import make_pipeline
pipeline = make_pipeline(scaler, kmeans)
pipeline.fit(samples)

Pipeline(steps=...)

labels = pipeline.predict(samples)


Feature standardization improves clustering
With feature standardization:

varieties  Barbera  Barolo  Grignolino
labels
0                0      59           3
1               48       0           3
2                0       0          65

Without feature standardization was very bad:

varieties  Barbera  Barolo  Grignolino
labels
0               29      13          20
1                0      46           1
2               19       0          50
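
The crosstab above can be reproduced from the pipeline on the previous slide; a hedged sketch, assuming samples and the varieties list from the earlier slides:

import pandas as pd
labels = pipeline.predict(samples)  # pipeline = StandardScaler, then KMeans
df = pd.DataFrame({'labels': labels, 'varieties': varieties})
ct = pd.crosstab(df['labels'], df['varieties'])
print(ct)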



sklearn preprocessing steps
StandardScaler is a "preprocessing" step
MaxAbsScaler and Normalizer are other examples


Let's practice!

Visualizing hierarchies

Benjamin Wilson
Director of Research at lateral.io

Visualizations communicate insight
"t-SNE": Creates a 2D map of a dataset (later)
"Hierarchical clustering" (this video)


A hierarchy of groups
Groups of living things can form a hierarchy
Clusters are contained in one another


Eurovision scoring dataset
Countries gave scores to songs performed at the Eurovision 2016
2D array of scores
Rows are countries, columns are songs

1 http://www.eurovision.tv/page/results


Hierarchical clustering of voting countries


Hierarchical clustering
Every country begins in a separate cluster
At each step, the two closest clusters are merged
Continue until all countries in a single cluster
This is "agglomerative" hierarchical clustering


The dendrogram of a hierarchical clustering
Read from the bottom up
Vertical lines represent clusters


Dendrograms, step-by-step


Hierarchical clustering with SciPy
Given samples (the array of scores), and country_names

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
mergings = linkage(samples, method='complete')
dendrogram(mergings,
           labels=country_names,
           leaf_rotation=90,
           leaf_font_size=6)
plt.show()


Let's practice!

Cluster labels in hierarchical clustering

Benjamin Wilson
Director of Research at lateral.io

Cluster labels in hierarchical clustering
Not only a visualization tool!
Cluster labels at any intermediate stage can be recovered
For use in e.g. cross-tabulations


Intermediate clusterings & height on dendrogram
E.g. at height 15:
Bulgaria, Cyprus, Greece are one cluster
Russia and Moldova are another
Armenia in a cluster on its own


Dendrograms show cluster distances
Height on dendrogram = distance between merging clusters
E.g. clusters with only Cyprus and Greece had distance approx. 6
This new cluster distance approx. 12 from cluster with only Bulgaria


Intermediate clusterings & height on dendrogram
Height on dendrogram specifies max. distance between merging clusters
Don't merge clusters further apart than this (e.g. 15)


Distance between clusters
Defined by a "linkage method"
In "complete" linkage: distance between clusters is max. distance between their samples
Specified via method parameter, e.g. linkage(samples, method="complete")
Different linkage method, different hierarchical clustering!
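
A minimal sketch of comparing two linkage methods on the same data; 'single' linkage (min. distance between samples) is a standard SciPy option not covered above, and samples and country_names are assumed from the earlier slides:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# same data, two different linkage methods
mergings_complete = linkage(samples, method='complete')
mergings_single = linkage(samples, method='single')

dendrogram(mergings_single, labels=country_names,
           leaf_rotation=90, leaf_font_size=6)
plt.show()  # compare with the 'complete' dendrogram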



Extracting cluster labels
Use the fcluster() function
Returns a NumPy array of cluster labels


Extracting cluster labels using fcluster
from scipy.cluster.hierarchy import linkage, fcluster
mergings = linkage(samples, method='complete')
labels = fcluster(mergings, 15, criterion='distance')
print(labels)

[ 9  8 11 20  2  1 17 14 ... ]


Aligning cluster labels with country names
Given a list of strings country_names:

import pandas as pd
pairs = pd.DataFrame({'labels': labels, 'countries': country_names})
print(pairs.sort_values('labels'))

    countries  labels
5     Belarus       1
40    Ukraine       1
...
36      Spain       5
8    Bulgaria       6
19     Greece       6
10     Cyprus       6
28    Moldova       7
...


Let's practice!

t-SNE for 2-dimensional maps

Benjamin Wilson
Director of Research at lateral.io

t-SNE for 2-dimensional maps
t-SNE = "t-distributed stochastic neighbor embedding"
Maps samples to 2D space (or 3D)
Map approximately preserves nearness of samples
Great for inspecting datasets


t-SNE on the iris dataset
Iris dataset has 4 measurements, so samples are 4-dimensional
t-SNE maps samples to 2D space
t-SNE didn't know that there were different species
... yet kept the species mostly separate


Interpreting t-SNE scatter plots
"versicolor" and "virginica" harder to distinguish from one another
Consistent with k-means inertia plot: could argue for 2 clusters, or for 3


t-SNE in sklearn
2D NumPy array samples

print(samples)

[[ 5.  3.3 1.4 0.2]
 [ 5.  3.5 1.3 0.3]
 [ 4.9 2.4 3.3 1. ]
 [ 6.3 2.8 5.1 1.5]
 ...
 [ 4.9 3.1 1.5 0.1]]

List species giving species of each sample as number (0, 1, or 2)

print(species)

[0, 0, 1, 2, ..., 0]


t-SNE in sklearn
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
model = TSNE(learning_rate=100)
transformed = model.fit_transform(samples)
xs = transformed[:,0]
ys = transformed[:,1]
plt.scatter(xs, ys, c=species)
plt.show()


t-SNE has only fit_transform()
Has a fit_transform() method
Simultaneously fits the model and transforms the data
Has no separate fit() or transform() methods
Can't extend the map to include new data samples
Must start over each time!
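
A sketch of what "starting over" means in practice, assuming samples and new_samples as NumPy arrays from the earlier examples; the map must be re-fit on the combined data:

import numpy as np
from sklearn.manifold import TSNE

all_samples = np.vstack([samples, new_samples])  # old and new samples together
model = TSNE(learning_rate=100)
transformed = model.fit_transform(all_samples)   # no transform() to extend an old map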



t-SNE learning rate
Choose learning rate for the dataset
Wrong choice: points bunch together
Try values between 50 and 200
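
A hedged sketch of trying a few learning rates visually, assuming samples and species from the iris examples; the values tried are illustrative:

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

for rate in [50, 100, 200]:
    transformed = TSNE(learning_rate=rate).fit_transform(samples)
    plt.scatter(transformed[:,0], transformed[:,1], c=species)
    plt.title('learning_rate = {}'.format(rate))
    plt.show()  # badly chosen rates show points bunched together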



Different every time
t-SNE features are different every time
Piedmont wines, 3 runs, 3 different scatter plots!
... however: the wine varieties (=colors) have same position relative to one another


Let's practice!

Visualizing the PCA transformation

Benjamin Wilson
Director of Research at lateral.io

Dimension reduction
More efficient storage and computation
Remove less-informative "noise" features
... which cause problems for prediction tasks, e.g. classification, regression


Principal Component Analysis
PCA = "Principal Component Analysis"
Fundamental dimension reduction technique
First step "decorrelation" (considered here)
Second step reduces dimension (considered later)


PCA aligns data with axes
Rotates data samples to be aligned with axes
Shifts data samples so they have mean 0
No information is lost


PCA follows the fit/transform pattern
PCA is a scikit-learn component like KMeans or StandardScaler
fit() learns the transformation from given data
transform() applies the learned transformation
transform() can also be applied to new data


Using scikit-learn PCA
samples = array of two features (total_phenols & od280)

[[ 2.8  3.92]
 ...
 [ 2.05 1.6 ]]

from sklearn.decomposition import PCA

model = PCA()
model.fit(samples)

PCA(copy=True, ...)

transformed = model.transform(samples)


PCA features
Rows of transformed correspond to samples
Columns of transformed are the "PCA features"
Row gives PCA feature values of corresponding sample

print(transformed)

[[  1.32771994e+00   4.51396070e-01]
 [  8.32496068e-01   2.33099664e-01]
 ...
 [ -9.33526935e-01  -4.60559297e-01]]


PCA features are not correlated
Features of dataset are often correlated, e.g. total_phenols and od280
PCA aligns the data with axes
Resulting PCA features are not linearly correlated ("decorrelation")


Pearson correlation
Measures linear correlation of features
Value between -1 and 1
Value of 0 means no linear correlation
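
A minimal sketch of computing the Pearson correlation with SciPy, assuming a samples array whose two columns are the wine features from the previous slides:

from scipy.stats import pearsonr

# correlation of the two features (columns) of samples
correlation, pvalue = pearsonr(samples[:,0], samples[:,1])
print(correlation)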



Principal components
"Principal components" = directions of variance
PCA aligns principal components with the axes


Principal components
Available as components_ attribute of PCA object
Each row defines displacement from mean

print(model.components_)

[[ 0.64116665  0.76740167]
 [-0.76740167  0.64116665]]


Let's practice!

Intrinsic dimension

Benjamin Wilson
Director of Research at lateral.io

Intrinsic dimension of a flight path
2 features: longitude and latitude at points along a flight path
Dataset appears to be 2-dimensional
But can approximate using one feature: displacement along flight path
Is intrinsically 1-dimensional

latitude  longitude
50.529    41.513
50.360    41.672
50.196    41.835
...


Intrinsic dimension
Intrinsic dimension = number of features needed to approximate the dataset
Essential idea behind dimension reduction
What is the most compact representation of the samples?
Can be detected with PCA


Versicolor dataset
"versicolor", one of the iris species
Only 3 features: sepal length, sepal width, and petal width
Samples are points in 3D space


Versicolor dataset has intrinsic dimension 2
Samples lie close to a flat 2-dimensional sheet
So can be approximated using 2 features


PCA identifies intrinsic dimension
Scatter plots work only if samples have 2 or 3 features
PCA identifies intrinsic dimension when samples have any number of features
Intrinsic dimension = number of PCA features with significant variance


PCA of the versicolor samples


PCA features are ordered by descending variance


Variance and intrinsic dimension
Intrinsic dimension is number of PCA features with significant variance
In our example: the first two PCA features
So intrinsic dimension is 2


Plotting the variances of PCA features
samples = array of versicolor samples

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
pca = PCA()
pca.fit(samples)

PCA(copy=True, ... )

features = range(pca.n_components_)


Plotting the variances of PCA features
plt.bar(features, pca.explained_variance_)
plt.xticks(features)
plt.ylabel('variance')
plt.xlabel('PCA feature')
plt.show()


Intrinsic dimension can be ambiguous
Intrinsic dimension is an idealization
... there is not always one correct answer!
Piedmont wines: could argue for 2, or for 3, or more


Let's practice!

Dimension reduction with PCA

Benjamin Wilson
Director of Research at lateral.io

Dimension reduction
Represents same data, using fewer features
Important part of machine-learning pipelines
Can be performed using PCA


Dimension reduction with PCA
PCA features are in decreasing order of variance
Assumes the low variance features are "noise"
... and high variance features are informative


Dimension reduction with PCA
Specify how many features to keep
E.g. PCA(n_components=2)
Keeps the first 2 PCA features
Intrinsic dimension is a good choice


Dimension reduction of iris dataset
samples = array of iris measurements (4 features)
species = list of iris species numbers

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(samples)

PCA(copy=True, ... )

transformed = pca.transform(samples)
print(transformed.shape)

(150, 2)


Iris dataset in 2 dimensions
PCA has reduced the dimension to 2
Retained the 2 PCA features with highest variance
Important information preserved: species remain distinct

import matplotlib.pyplot as plt
xs = transformed[:,0]
ys = transformed[:,1]
plt.scatter(xs, ys, c=species)
plt.show()


Dimension reduction with PCA
Discards low variance PCA features
Assumes the high variance features are informative
Assumption typically holds in practice (e.g. for iris)


Word frequency arrays
Rows represent documents, columns represent words
Entries measure presence of each word in each document
... measure using "tf-idf" (more later)


Sparse arrays and csr_matrix
"Sparse": most entries are zero
Can use scipy.sparse.csr_matrix instead of NumPy array
csr_matrix remembers only the non-zero entries (saves space!)
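
A small sketch of building a csr_matrix from a mostly-zero NumPy array (the array here is made up for illustration):

import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[0., 0., 2.1, 0.],
                  [0., 1.5, 0., 0.]])
sparse = csr_matrix(dense)  # stores only the non-zero entries
print(sparse.nnz)           # 3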



TruncatedSVD and csr_matrix
scikit-learn PCA doesn't support csr_matrix
Use scikit-learn TruncatedSVD instead
Performs same transformation

from sklearn.decomposition import TruncatedSVD

model = TruncatedSVD(n_components=3)
model.fit(documents)  # documents is csr_matrix

TruncatedSVD(algorithm='randomized', ... )

transformed = model.transform(documents)


Let's practice!

Non-negative matrix factorization (NMF)

Benjamin Wilson
Director of Research at lateral.io

Non-negative matrix factorization
NMF = "non-negative matrix factorization"
Dimension reduction technique
NMF models are interpretable (unlike PCA)
Easy to interpret means easy to explain!
However, all sample features must be non-negative (>= 0)


Interpretable parts
NMF expresses documents as combinations of topics (or "themes")


Interpretable parts
NMF expresses images as combinations of patterns


Using scikit-learn NMF
Follows fit() / transform() pattern
Must specify number of components e.g. NMF(n_components=2)
Works with NumPy arrays and with csr_matrix


Example word-frequency array
Word frequency array, 4 words, many documents
Measure presence of words in each document using "tf-idf"
"tf" = frequency of word in document
"idf" reduces influence of frequent words



Example usage of NMF
samples is the word-frequency array

from sklearn.decomposition import NMF

model = NMF(n_components=2)
model.fit(samples)

NMF(alpha=0.0, ... )

nmf_features = model.transform(samples)


NMF components
NMF has components
... just like PCA has principal components
Dimension of components = dimension of samples
Entries are non-negative

print(model.components_)

[[ 0.01 0.   2.13 0.54]
 [ 0.99 1.47 0.   0.5 ]]


NMF features
NMF feature values are non-negative
Can be used to reconstruct the samples
... combine feature values with components

print(nmf_features)

[[ 0.   0.2 ]
 [ 0.19 0.  ]
 ...
 [ 0.15 0.12]]


Reconstruction of a sample
print(samples[i,:])

[ 0.12 0.18 0.32 0.14]

print(nmf_features[i,:])

[ 0.15 0.12]


Sample reconstruction
Multiply components by feature values, and add up
Can also be expressed as a product of matrices
This is the "Matrix Factorization" in "NMF"
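
A sketch of the reconstruction arithmetic, assuming samples, nmf_features, model and the index i from the previous slides:

import numpy as np

# weight the components by the sample's feature values, and add up
reconstruction = np.dot(nmf_features[i,:], model.components_)
print(reconstruction)  # approximately equal to samples[i,:]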



NMF fits to non-negative data only
Word frequencies in each document
Images encoded as arrays
Audio spectrograms
Purchase histories on e-commerce sites
... and many more!


Let's practice!

NMF learns interpretable parts

Benjamin Wilson
Director of Research at lateral.io

Example: NMF learns interpretable parts
Word-frequency array articles (tf-idf)
20,000 scientific articles (rows)
800 words (columns)


Applying NMF to the articles
print(articles.shape)

(20000, 800)

from sklearn.decomposition import NMF

nmf = NMF(n_components=10)
nmf.fit(articles)

NMF(alpha=0.0, ... )

print(nmf.components_.shape)

(10, 800)


NMF components are topics


NMF components
For documents:
NMF components represent topics
NMF features combine topics into documents
For images, NMF components are parts of images


Grayscale images
"Grayscale" image = no colors, only shades of gray
Measure pixel brightness
Represent with value between 0 and 1 (0 is black)
Convert to 2D array


Grayscale image example
An 8x8 grayscale image of the moon, written as an array


Grayscale images as flat arrays
Enumerate the entries
Row-by-row
From left to right, top to bottom




Encoding a collection of images
Collection of images of the same size
Encode as 2D array
Each row corresponds to an image
Each column corresponds to a pixel
... can apply NMF!


Visualizing samples
print(sample)

[ 0.  1.  0.5 1.  0.  1. ]

bitmap = sample.reshape((2, 3))
print(bitmap)

[[ 0.  1.  0.5]
 [ 1.  0.  1. ]]

from matplotlib import pyplot as plt

plt.imshow(bitmap, cmap='gray', interpolation='nearest')
plt.show()


Let's practice!

Building recommender systems using NMF

Benjamin Wilson
Director of Research at lateral.io

Finding similar articles
Engineer at a large online newspaper
Task: recommend articles similar to article being read by customer
Similar articles should have similar topics


Strategy
Apply NMF to the word-frequency array
NMF feature values describe the topics
... so similar documents have similar NMF feature values
Compare NMF feature values?


Apply NMF to the word-frequency array
articles is a word frequency array

from sklearn.decomposition import NMF

nmf = NMF(n_components=6)
nmf_features = nmf.fit_transform(articles)




Versions of articles
Different versions of the same document have same topic proportions
... exact feature values may be different!
E.g. because one version uses many meaningless words
But all versions lie on the same line through the origin


Cosine similarity
Uses the angle between the lines
Higher values means more similar
Maximum value is 1, when angle is 0 degrees


Calculating the cosine similarities
from sklearn.preprocessing import normalize
norm_features = normalize(nmf_features)
# if the current article has index 23
current_article = norm_features[23,:]
similarities = norm_features.dot(current_article)
print(similarities)

[ 0.7150569 0.26349967 ..., 0.20323616 0.05047817]


DataFrames and labels
Label similarities with the article titles, using a DataFrame
Titles given as a list: titles

import pandas as pd
norm_features = normalize(nmf_features)
df = pd.DataFrame(norm_features, index=titles)
current_article = df.loc['Dog bites man']
similarities = df.dot(current_article)


DataFrames and labels
print(similarities.nlargest())

Dog bites man                     1.000000
Hound mauls cat                   0.979946
Pets go wild!                     0.979708
Dachshunds are dangerous          0.949641
Our streets are no longer safe    0.900474
dtype: float64


Let's practice!

Final thoughts

Benjamin Wilson
Director of Research at lateral.io

Congratulations!