Chapter 3
Dimensionality reduction
AI FUNDAMENTALS
Nemanja Radojkovic
Senior Data Scientist
Definition
"Dimensionality reduction is the process of reducing the number of variables under consideration by
obtaining a set of principal variables."
Why?
Pros
Enable visualization
Cons
Types
Feature selection: B is a subset of the original features A
Feature extraction: B holds new features derived from A
Common algorithms
Linear (faster, deterministic)
Non-linear (slower, non-deterministic)
Principal Component Analysis (PCA)
Family: Linear methods.
Intuition:
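A minimal sketch of PCA with scikit-learn, assuming synthetic data in which the third feature is almost a linear combination of the first two. The data, the variable names and the choice of two components are illustrative assumptions, not taken from the slides.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (assumption): 200 samples, 3 features, where the third
# feature is nearly the sum of the first two, so it adds little new information.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
third = base[:, 0] + base[:, 1] + rng.normal(scale=0.05, size=200)
X = np.column_stack([base, third])

# Project the 3 original variables onto the 2 directions of highest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Share of the total variance kept by the 2 principal components.
retained = pca.explained_variance_ratio_.sum()
print(X_reduced.shape)
print(round(retained, 3))
```

Because the third feature is redundant here, two components keep nearly all of the variance. On real data the retained share should be checked before discarding components.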
Use it wisely!
Clustering
What is clustering?
Cluster = group of entities or events sharing similar attributes.
Popular clustering algorithms
KMeans clustering
Spectral clustering
DBSCAN
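A short sketch of how the three algorithms can be run side by side in scikit-learn. The blob data and every parameter value (`n_clusters=3`, `eps=0.8`, `min_samples=5`) are assumptions chosen for illustration and would need tuning on real data.

```python
from sklearn.cluster import DBSCAN, KMeans, SpectralClustering
from sklearn.datasets import make_blobs

# Synthetic data (assumption): 300 points drawn around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# KMeans and spectral clustering need the number of clusters up front;
# DBSCAN infers it from point density and labels noise points as -1.
models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=42),
    "Spectral clustering": SpectralClustering(n_clusters=3, random_state=42),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
}

labels = {name: model.fit_predict(X) for name, model in models.items()}
for name, found in labels.items():
    n_clusters = len(set(found) - {-1})  # ignore the DBSCAN noise label
    print(name, n_clusters)
```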
How many clusters do I have?
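The slide graphics for this question did not survive in the text, so the method below is an assumption: one common way to estimate the number of clusters is to fit KMeans for several values of k and compare the inertia and the silhouette score. The data and the range of k are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (assumption): 4 well-separated groups on a square grid.
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

inertias = {}
scores = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_                    # within-cluster sum of squares
    scores[k] = silhouette_score(X, model.labels_)  # higher means better separation

# Candidate number of clusters: the k with the highest silhouette score.
best_k = max(scores, key=scores.get)
print(best_k)
```

Inertia always falls as k grows, so it is read by looking for the "elbow" where the drop flattens. The silhouette score can be maximized directly.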
Cluster analysis and tuning
Unsupervised (no "ground truth", no expectations)
Explore, experiment and tune!
Anomaly detection
Definition and use cases
Detecting unusual entities or events.
Use cases
Credit card fraud detection
Heart-rate monitoring
Approaches: Thresholding
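A minimal sketch of thresholding, assuming a heart-rate signal and fixed lower and upper limits. The function name `threshold_anomalies`, the readings and the limits are hypothetical and only illustrate the idea.

```python
import numpy as np

def threshold_anomalies(values, lower, upper):
    """Flag every value that falls outside the closed range [lower, upper]."""
    values = np.asarray(values, dtype=float)
    return (values < lower) | (values > upper)

# Made-up heart-rate readings in beats per minute (assumption).
heart_rate = [72, 75, 71, 180, 74, 38, 73]
flags = threshold_anomalies(heart_rate, lower=40, upper=160)
print(flags.tolist())
```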
Approaches: Rate of change
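A minimal sketch of rate-of-change monitoring, assuming evenly spaced readings. Here a sample is flagged when it differs from the previous one by more than `max_step`. The function name, readings and limit are hypothetical.

```python
import numpy as np

def rate_of_change_anomalies(values, max_step):
    """Flag samples whose absolute change from the previous sample exceeds max_step."""
    values = np.asarray(values, dtype=float)
    steps = np.abs(np.diff(values))
    # The first sample has no predecessor, so it is never flagged.
    return np.concatenate([[False], steps > max_step])

# Made-up readings (assumption): the level jumps between the 3rd and 4th sample.
readings = [70, 72, 71, 110, 112, 111]
flags = rate_of_change_anomalies(readings, max_step=20)
print(flags.tolist())
```

Unlike thresholding, this flags the jump itself, not the new level: the readings after the jump are stable and therefore not flagged.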
Approaches: Shape monitoring
Algorithms
Robust covariance (assumes normal distribution)
Training and testing
Example: Isolation Forest
from sklearn.ensemble import IsolationForest
algorithm = IsolationForest()
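A fuller sketch of training and testing an Isolation Forest, assuming the model is fitted on mostly normal data and then applied to unseen points. The synthetic data and the two hand-placed outliers are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Training data (assumption): 500 "normal" points around the origin.
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Test data (assumption): 20 normal points plus 2 obvious outliers at the end.
X_test = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(20, 2)),
    np.array([[8.0, 8.0], [-9.0, 7.5]]),
])

algorithm = IsolationForest(random_state=0)
algorithm.fit(X_train)

predictions = algorithm.predict(X_test)   # +1 = inlier, -1 = outlier
scores = algorithm.score_samples(X_test)  # lower score = more abnormal
print(predictions[-2:])
```

Note that `predict` returns +1 and -1, not 0 and 1, so the output usually needs remapping before it is compared with ground-truth labels.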
Evaluation
Example: Arrhythmia detection
from sklearn.metrics import (confusion_matrix,
                             precision_score,
                             recall_score)
confusion_matrix(y_true, y_predicted)
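A self-contained sketch of the evaluation step, assuming ground-truth labels where 1 marks an anomaly and 0 marks a normal sample, and detector output in the +1/-1 convention used by `IsolationForest`. The label arrays are made up for illustration and are not the arrhythmia data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up ground truth (assumption): 1 = anomaly, 0 = normal.
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 0, 0])

# Made-up detector output in the +1 (inlier) / -1 (outlier) convention.
raw_predictions = np.array([1, 1, -1, 1, -1, -1, 1, 1, 1, 1])

# Remap to the same 0/1 coding as y_true before scoring.
y_predicted = np.where(raw_predictions == -1, 1, 0)

cm = confusion_matrix(y_true, y_predicted)          # rows: true class, columns: predicted class
precision = precision_score(y_true, y_predicted)    # share of flagged samples that are real anomalies
recall = recall_score(y_true, y_predicted)          # share of real anomalies that were flagged
print(cm)
print(precision, recall)
```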
Want to learn more?
Selecting the right model
Model-to-problem fit
Type of Learning
Regression
Clustering?
Anomaly Detection?
Defining the priorities
Interpretable models
Decision Trees
Simplicity first!
Using multiple metrics
Satisfying metrics
Multiple satisfying metrics are possible (e.g. minimum accuracy, maximum execution time, etc.)
Optimizing metrics
Illustrates the ultimate business priority (e.g. "minimize false positives", "maximize recall")
Final model:
Passes the bar on all satisfying metrics and has the best score on the optimization metric.
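The selection rule above can be sketched in a few lines of Python. The candidate models, their scores and the two bars (minimum accuracy, maximum latency) are made-up assumptions, and recall stands in for the optimizing metric.

```python
# Hypothetical candidates with made-up scores (assumption).
candidates = [
    {"name": "model_a", "accuracy": 0.91, "latency_ms": 40, "recall": 0.70},
    {"name": "model_b", "accuracy": 0.95, "latency_ms": 120, "recall": 0.85},
    {"name": "model_c", "accuracy": 0.92, "latency_ms": 35, "recall": 0.78},
    {"name": "model_d", "accuracy": 0.88, "latency_ms": 10, "recall": 0.90},
]

# Satisfying metrics: bars that every acceptable model must pass.
MIN_ACCURACY = 0.90
MAX_LATENCY_MS = 50

def select_model(models):
    """Return the model with the best optimizing metric among those passing all bars."""
    passing = [
        m for m in models
        if m["accuracy"] >= MIN_ACCURACY and m["latency_ms"] <= MAX_LATENCY_MS
    ]
    if not passing:
        return None  # no candidate satisfies every bar
    # Optimizing metric: recall, higher is better.
    return max(passing, key=lambda m: m["recall"])

final_model = select_model(candidates)
print(final_model["name"])
```

In this made-up set, the models with the highest accuracy and the highest recall are both rejected because each fails one satisfying metric.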
Interpretation
Global
Common approaches:
Decision tree visualization
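A minimal sketch of global interpretation through a decision tree, assuming the iris dataset bundled with scikit-learn and a shallow tree. The dataset and `max_depth=2` are illustrative choices, and `export_text` is used here as a plain-text stand-in for a graphical tree plot.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the learned rules short enough to read.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Text rendering of the learned decision rules.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)

# Global view of how much each feature contributes to the splits.
importances = dict(zip(iris.feature_names, tree.feature_importances_))
print(importances)
```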
Local
Model selection and interpretation