July4 SaketAnand FriendlyIntroToML

A Friendly Introduction to
Machine Learning
with Applications to Visual Analytics
Saket Anand
Asst. Professor
TCPD Summer School 2019

Is Machine Learning Really Affecting Us?
• Gmail and Google Search: Smart Reply, Personalized Ads
• Maps, Navigation and Travel: Google, Uber, Ola
• Banking and Finance: IVR, Fraud Protection
• Social Media Feeds: Facebook, LinkedIn, Instagram
• Online Shopping: Amazon, Flipkart
• Music and Media Streaming: Netflix, YouTube, Spotify
Autonomous Driving
Volvo’s driverless truck

Autonomous Driving
Natural and Spoken Language Processing
• Conversational Agents and Translators: Apple Siri, Google Assistant, Microsoft Cortana
• Optical Character Recognition and Translation
Screenshot from the movie Cars 3 on Netflix India

Image Analysis and Computer Vision
iPhone Xs vs. Pixel 3’s Night Sight Mode

Source: XDA Dev. review
Pitfalls and Perils of ML
• Chinese billionaire’s face identified as a jaywalker
• Surveillance system misinterpreted a billboard image for a jaywalker
• Google’s panorama generation thought a skier was a mountain!
Pitfalls and Perils of ML
• Claim: Machines have a better “gaydar”
• If true: privacy breach
• If false: reinforces prejudice
Source: Medium
• Uber hits and kills a pedestrian in Tempe, Arizona, US
• “DeepFakes” used to generate fake porn using Hollywood celebrity facial images
Source: The Guardian
Efforts towards AI for Social Good
• Microsoft AI for Earth
• Climate Change, Agriculture, Biodiversity and Water
• Google AI for Social Good
• Flood prediction
• Cardiac risk prediction
• Mapping global fishing activity
• Wadhwani Institute for AI
• Focus on societal problems in India: Health, Agriculture, Education, Financial Inclusion
• Center for AI in Society, USC
• Focus on public health, social work, conservation, safety and security
Tackling Climate Change with Machine Learning
Source: Rolnick et al., “Tackling Climate Change with Machine Learning”, arXiv, 10th Jun. 2019
- 22 authors from 16 organizations
Machine Learning Overview
What should you learn?
• Modelling a learning problem
• Various algorithms (techniques) for solving ML problems
• Pitfalls while designing ML systems
• Modelling, Generalization, Regularization & Model Selection, (hyper)parameter tuning, Overfitting, Underfitting
• Engineering Tricks (possibly the most important components)
• Debugging ML systems
• Importance of Domain Knowledge
• Not treating ML techniques as a black box
• Simplify the learning problem by using domain knowledge
Machine Learning Paradigms
• Supervised Learning
• Labelled data – (Data, target value)
• Target value could be category/class labels, real value, real vector, etc.
• Classification, Regression
• Unsupervised Learning
• Only data, no labels
• Density Estimation, Clustering
• Semi-supervised Learning
• Some labelled data and lots of unlabelled data
• Multiple-Instance Learning
The Supervised Learning Problem
• Described through three components
• Data samples from some unknown distribution
• A supervisor (oracle) provides labels
• A learning machine capable of implementing a set of functions
• The learning problem is to choose from the given set of functions the
one which “best” approximates the supervisor’s response.
• The selection is based on training samples
Vapnik, “Principles of Risk Minimization for Learning Theory”, NIPS 1992
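To make "choose from the given set of functions the one which best approximates the supervisor" concrete, here is a minimal sketch of empirical risk minimization. The setup is an assumption for illustration only: one-dimensional inputs, 0/1 labels, the 0-1 loss, and a hypothetical set of four threshold classifiers standing in for the functions the learning machine can implement.

```python
# Empirical risk minimisation over a small, finite set of candidate functions.
# Hypothetical setup: 1-D inputs, labels in {0, 1}, threshold classifiers
# f_t(x) = 1 if x > t else 0 as the "set of functions".

def zero_one_loss(y_true, y_pred):
    """0 if the prediction matches the supervisor's label, 1 otherwise."""
    return 0 if y_true == y_pred else 1


def empirical_risk(f, samples):
    """Average loss of f over the (x, y) training samples."""
    return sum(zero_one_loss(y, f(x)) for x, y in samples) / len(samples)


def make_threshold(t):
    return lambda x: 1 if x > t else 0


def choose_best(candidates, samples):
    """Return the (name, f) pair with the lowest empirical risk."""
    return min(candidates, key=lambda item: empirical_risk(item[1], samples))


# Training samples labelled by a supervisor unknown to the learner.
train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (3.5, 1)]
candidates = [(t, make_threshold(t)) for t in (0.0, 1.0, 2.0, 3.0)]

best_t, best_f = choose_best(candidates, train)
print(best_t, empirical_risk(best_f, train))
```

The threshold t = 2.0 separates the six samples perfectly, so it is selected with zero training error; whether it also does well on unseen data is exactly the question the evaluation slides address.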

Loss functions
• The best function minimizes a cost (or loss) between the desired response (the supervisor’s) and the predicted one
• Since we want to minimize the loss over all samples, we are interested in minimizing the expected (average) loss
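The slide's formula did not survive extraction. In the standard risk-minimization notation (an assumption, not the slide's exact symbols), with loss $L$, a candidate function $f$ and the unknown data distribution $P(\mathbf{x}, y)$, the expected risk and its empirical estimate from $N$ training samples are:

```latex
R(f) = \mathbb{E}_{(\mathbf{x}, y) \sim P}\big[ L\big(y, f(\mathbf{x})\big) \big]
     = \int L\big(y, f(\mathbf{x})\big)\, dP(\mathbf{x}, y),
\qquad
R_{\mathrm{emp}}(f) = \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, f(\mathbf{x}_i)\big)
```

Because $P$ is unknown, learning algorithms minimize $R_{\mathrm{emp}}$ as a proxy for $R$, which is why the gap between the two (overfitting) matters.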
Performance Evaluation of Learning Tasks
• Measuring Performance: How well does a learned model work?
• Performance measured by ‘estimating’ the TRUE ERROR RATE, the model’s
error rate on the ENTIRE POPULATION.
• The true error rate is estimated on an unseen ‘test’ set
• This differs from a typical optimization problem, where doing well on the given data is the goal itself
• Evaluation Metrics
• Classification: Accuracy
• Regression: Mean Squared Error
• Retrieval: Precision/Recall, F-Score
• Ranking: mean Average Precision
• Clustering: Normalized Mutual Information
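A few of the metrics listed above can be sketched without any library. These are minimal, illustrative implementations (the toy label lists are made up); in practice a library implementation would normally be used.

```python
# Minimal, dependency-free versions of some of the metrics listed above.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 score for one 'positive' (relevant) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))             # 4 of 6 labels correct
print(precision_recall_f1(y_true, y_pred))  # tp = 2, fp = 1, fn = 1
print(mean_squared_error([2.0, 3.0], [2.5, 2.0]))
```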
Performance Evaluation of Learning Tasks
• Entire population is unavailable
• Finite set of training data, usually smaller than desired
• Naïve approach: use all available data for both training and evaluation
• The final model will typically overfit the training data
• More pronounced with high-capacity models (e.g., neural nets)
• The true error rate is underestimated
• Not uncommon to have 100% accuracy on training data
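The last point is easy to reproduce. In this hypothetical illustration, a 1-nearest-neighbour classifier (a very high-capacity model that simply memorises its training set) reaches 100% training accuracy on labels that are pure coin flips, so only held-out data can reveal that nothing was learned.

```python
import random

# 1-nearest-neighbour "model": memorises the training set, so every training
# point is its own nearest neighbour and training accuracy is 100%.

def nn1_predict(train, x):
    """Label of the closest training point (1-D squared distance)."""
    return min(train, key=lambda item: (item[0] - x) ** 2)[1]


def accuracy(train, samples):
    return sum(nn1_predict(train, x) == y for x, y in samples) / len(samples)


rng = random.Random(0)
# Inputs are random and labels are coin flips unrelated to the inputs.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(200)]
train, test = data[:100], data[100:]

train_acc = accuracy(train, train)
test_acc = accuracy(train, test)
print(f"train accuracy = {train_acc:.2f}, test accuracy = {test_acc:.2f}")
```

The test accuracy hovers around chance (about 0.5), which is the "true error rate is underestimated" problem in miniature.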
Validation Method: K-Fold Cross-validation
• Create a K-fold partition of the dataset
• For each of K experiments, use K-1 folds for training and the remaining one
for testing
• True error is estimated as the average error rate over the K held-out folds
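A minimal sketch of the K-fold procedure above. `train_fn` and `error_fn` are hypothetical placeholders for whatever learner and error measure are in use; the toy "model" below just predicts the mean training target and is scored by mean squared error.

```python
# Minimal K-fold cross-validation loop (no shuffling, for readability).

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(data, k, train_fn, error_fn):
    """Average held-out error over k experiments: K-1 folds train, 1 fold tests."""
    folds = k_fold_indices(len(data), k)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_fn([data[j] for j in train_idx])
        errors.append(error_fn(model, [data[j] for j in test_idx]))
    return sum(errors) / k


# Toy learner: predict the mean training target; error = mean squared error.
def train_mean(samples):
    return sum(y for _, y in samples) / len(samples)


def mse(model, samples):
    return sum((y - model) ** 2 for _, y in samples) / len(samples)


data = [(x, 2.0 * x) for x in range(10)]
print(k_fold_indices(10, 3))
print(cross_validate(data, 5, train_mean, mse))
```

In practice the data would be shuffled before splitting (and, for classification, often stratified by class) so that each fold is representative.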

Debugging ML Algorithms
• Motivating Example: Binary Classification
• Often encountered in learning problems, e.g., dogs vs. cats
• A reasonable baseline approach is logistic regression
• Logistic Regression with gradient descent gives a test error of 40%
• What to do next?
How to Debug an ML Algorithm?
Hit and Try and Pray to God!
• Try getting more training examples.
• Try a smaller set of features.
• Try a larger set of features.
• Try changing the features.
• Run gradient descent for more iterations.
• Try Newton’s method.
• Use a different value for λ.
• Try using an SVM.
Systematic Diagnosis
• Analyse variance/bias
• Analyse the optimization algorithm
• Analyse the objective function
Bias vs. Variance Analysis
• Typical learning curve for high variance:
• Test error still decreasing as training set size increases.
• Suggests a larger training set will help.
• Large gap between training and test error
• Likely overfitting
Bias vs. Variance Analysis
• Typical learning curve for high bias:
• Even training error is unacceptably high.
• Features are not discriminative enough
• Small gap between training and test error.
• Likely underfitting: a higher-capacity model could be tried
Similar analyses could be done for the optimization algorithm and the objective function.
[Figure: error vs. degrees of freedom]
Source: Introduction to Statistical Learning, James et al.
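The two learning-curve patterns above can be turned into a rough rule of thumb. This is a hypothetical helper, not a standard procedure: `target_error` (the error you would accept) and `gap_tolerance` are knobs the practitioner chooses, and the suggested fixes are the ones from the diagnostics slide.

```python
# Rule-of-thumb bias/variance diagnosis from final training and test error.

def diagnose(train_error, test_error, target_error, gap_tolerance=0.05):
    """Return a list of likely problems; thresholds are illustrative only."""
    findings = []
    if train_error > target_error:
        # Even the training error is unacceptably high: underfitting.
        findings.append("high bias: try a larger/better feature set or a higher-capacity model")
    if test_error - train_error > gap_tolerance:
        # Large gap between training and test error: overfitting.
        findings.append("high variance: try more training examples or fewer features")
    return findings or ["no obvious bias/variance problem"]


print(diagnose(train_error=0.02, test_error=0.40, target_error=0.05))
print(diagnose(train_error=0.35, test_error=0.38, target_error=0.05))
```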

Diagnostics for ML Algorithms
• Try getting more training examples – fixes high variance.
• Try a smaller set of features – fixes high variance.
• Try a larger set of features – fixes high bias.
• Try changing the features – fixes high bias.
• Run gradient descent for more iterations – fixes optimization algorithm.
• Try Newton’s method instead of gradient descent – fixes optimization algorithm.
• Use a different value for the regularization parameter λ – fixes optimization objective.
• Try using a different model (e.g., SVM) – fixes optimization objective.
Machine Learning Tools
For Programmers
• Scikit-learn
• PyTorch
• TensorFlow
Python is the most popular; support exists for other languages like R, Java, C/C++, Ruby, Julia, etc.
For non-Programmers
• Google’s Cloud AutoML
• Microsoft Azure ML Studio
• IBM Watson Studio
• obviously.ai
And others…
Machine Learning Techniques
Regression: Linear Models
• Linear Regression
• Used for a continuous, normally distributed response variable ($Y \in \mathbb{R}^m$)
• Works with the squared-error (least-squares) loss function
Image source: https://alykhantejani.github.io
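For a single input variable, minimizing the squared-error loss has a closed-form solution. The sketch below assumes a scalar response (m = 1) and made-up, roughly linear data.

```python
# Ordinary least squares for one input variable: fit y ≈ w0 + w1 * x by
# minimising the sum of squared errors (closed-form solution).

def fit_simple_linear_regression(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = cov(x, y) / var(x); the intercept makes the fitted line pass
    # through the point of means (mean_x, mean_y).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w1 = cov_xy / var_x
    w0 = mean_y - w1 * mean_x
    return w0, w1


def squared_error_loss(xs, ys, w0, w1):
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]   # roughly y = 1 + 2x plus noise
w0, w1 = fit_simple_linear_regression(xs, ys)
print(round(w0, 3), round(w1, 3))
```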

Regression: Linear Models
• Fitting polynomials with linear models
• Key trick: change of variables
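The change-of-variables trick in code: map each scalar x to the features [1, x, x²] and solve an ordinary least-squares problem in the weights. The model stays linear in w even though it is quadratic in x. This sketch solves the normal equations with a small hand-written Gaussian elimination so that it needs no libraries; the noise-free quadratic data are made up.

```python
# Fitting a polynomial with a linear model via the features phi(x) = [1, x, x^2, ...].

def poly_features(x, degree):
    return [x ** d for d in range(degree + 1)]


def solve_linear_system(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_polynomial(xs, ys, degree):
    """Least squares via the normal equations (Phi^T Phi) w = Phi^T y."""
    Phi = [poly_features(x, degree) for x in xs]
    k = degree + 1
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(k)]
    return solve_linear_system(A, b)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]   # noise-free quadratic
print([round(w, 6) for w in fit_polynomial(xs, ys, degree=2)])
```

The normal equations are fine for a toy problem like this; for high degrees they become badly conditioned, and a QR- or SVD-based least-squares solver is the usual choice.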
Logistic Regression
• Useful when the target variable is binary / categorical
• Same as linear regression on log(odds)
• Change of variables via the logit function, logit(p) = log(p / (1 − p))
• Can be used for classification problems
Image source: https://songhuiming.github.io/
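A minimal sketch of binary logistic regression trained with batch gradient descent on the average log-loss (cross-entropy), the baseline named in the debugging example. The learning rate, epoch count and toy data are arbitrary illustrative choices.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.3, epochs=3000):
    """Batch gradient descent on the average log-loss. X: feature lists, y: 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            # log(p / (1 - p)) = w.x + b, so p = sigmoid(w.x + b)
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = p - target  # gradient of the log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold else 0


X = [[0.5, 1.0], [1.0, 0.5], [1.5, 1.5], [3.0, 3.5], [3.5, 2.5], [4.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(X, y)
print([predict(w, b, x) for x in X])
```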
Binary Classification
Linear Classifiers
What is the “best” w?
Support Vector Machines
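One way to read "What is the best w?": many (w, b) pairs separate linearly separable data, and an SVM prefers the one with the largest geometric margin, min over i of yᵢ(w·xᵢ + b)/‖w‖ with labels in {−1, +1}. The sketch below only compares two hand-picked, hypothetical separators; it does not train an SVM.

```python
import math

# Geometric margin of a linear separator (w, b) on labelled data.

def geometric_margin(w, b, X, y):
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, yi in zip(X, y))


X = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]]
y = [-1, -1, -1, 1, 1, 1]

candidates = {
    "diagonal": ([1.0, 1.0], -5.5),   # boundary x1 + x2 = 5.5
    "vertical": ([1.0, 0.0], -2.5),   # boundary x1 = 2.5
}
for name, (w, b) in candidates.items():
    print(name, round(geometric_margin(w, b, X, y), 3))

best = max(candidates, key=lambda k: geometric_margin(*candidates[k], X, y))
print("larger-margin choice:", best)
```

Both lines classify every point correctly, but the diagonal one stays farther from the closest points of each class; for this data it is in fact the maximum-margin separator, which is what a linear SVM would find.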
Decision Trees
• Given: Observations of certain variables and outcome of interest
• Devise a summary of rules to help predict outcomes, given new observations

Day  Weather  Temperature  Humidity  Wind    Play?
1    Sunny    Hot          High      Weak    No
2    Cloudy   Hot          High      Weak    Yes
3    Sunny    Mild         Normal    Strong  Yes
4    Cloudy   Mild         High      Strong  Yes
5    Rainy    Mild         High      Strong  No
6    Rainy    Cool         Normal    Strong  No
7    Rainy    Mild         High      Weak    Yes
8    Sunny    Hot          High      Strong  No
9    Cloudy   Hot          Normal    Weak    Yes
10   Rainy    Mild         High      Strong  No

• Intermediate decisions at a node
• Final predictions at leaf node
• Simple, yet powerful
• Interpretable
• Handles categorical variables well
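How does a tree learner pick the question at a node? One common criterion (used by ID3-style learners; Gini impurity is another common choice) is information gain, the drop in label entropy after splitting on an attribute. This sketch computes it for each attribute of the 10-day "Play?" table.

```python
import math
from collections import Counter

COLUMNS = ["Weather", "Temperature", "Humidity", "Wind"]
ROWS = [  # (Weather, Temperature, Humidity, Wind, Play?)
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Cloudy", "Hot", "High", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Cloudy", "Mild", "High", "Strong", "Yes"),
    ("Rainy", "Mild", "High", "Strong", "No"),
    ("Rainy", "Cool", "Normal", "Strong", "No"),
    ("Rainy", "Mild", "High", "Weak", "Yes"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Cloudy", "Hot", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of Play? minus the weighted entropy after splitting on one attribute."""
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS)}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {gain:.3f}")
```

Weather has the largest gain (every Cloudy day is a "Yes"), so a greedy learner would ask about Weather at the root and then recurse on the Sunny and Rainy branches.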
Random Decision Forests
• Train many different trees
• By repeatedly sampling from training set
• Evaluate each test sample using each of the trees
• Voting mechanism to combine outcomes
A general ensemble technique called Bagging
Bagging = Bootstrap Aggregating
Can be applied to any classifier
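A minimal sketch of bagging around a generic base learner. The base learner here is a hypothetical one-feature "decision stump" on 1-D data; a random forest would use full decision trees instead (and typically also randomises which features are considered at each split).

```python
import random
from collections import Counter

def fit_stump(samples):
    """Best threshold t for the rule 'predict 1 if x > t else 0' on 1-D data."""
    xs = sorted({x for x, _ in samples})
    candidates = [xs[0] - 1.0] + xs   # first candidate predicts 1 everywhere

    def n_errors(t):
        return sum((1 if x > t else 0) != y for x, y in samples)

    best_t = min(candidates, key=n_errors)
    return lambda x: 1 if x > best_t else 0


def bagging(samples, n_models, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(samples) points with replacement.
        boot = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(boot))

    def predict(x):
        votes = Counter(m(x) for m in models)   # one vote per model
        return votes.most_common(1)[0][0]       # majority vote

    return predict


train = [(0.1, 0), (0.4, 0), (0.8, 0), (1.2, 0), (2.1, 1), (2.5, 1), (3.0, 1), (3.3, 1)]
predict = bagging(train, n_models=25)
print([predict(x) for x in (0.0, 1.0, 2.0, 3.5)])
```

Using an odd number of models avoids ties in a two-class vote. Because each stump sees a different bootstrap sample, their thresholds differ, and averaging their votes reduces variance.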
The Perceptron
• The building blocks of Neural Networks
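• Inputs $x_1, \dots, x_n$ and a bias input $x_0 = 1$ are weighted by $W_0, \dots, W_n$, summed, and passed through an activation function to give the output $O$:
$$O(\mathbf{x}) = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} W_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$$
• Activation function: usually a smooth, nonlinear, squashing function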
Perceptron
• A single perceptron can model a linearly separable boundary, but not
a non-linear boundary
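This limitation shows up immediately with Rosenblatt's perceptron learning rule, sketched here with the conventions above (x₀ = 1, output +1 or −1). The rule converges on AND, which is linearly separable, but can never classify all four XOR points correctly.

```python
# Perceptron learning rule with a bias input x0 = 1 and outputs in {-1, +1}.

def output(W, x):
    return 1 if sum(wi * xi for wi, xi in zip(W, x)) > 0 else -1


def train_perceptron(samples, lr=1.0, max_epochs=100):
    """samples: list of (inputs, target) pairs with target in {-1, +1}."""
    W = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            x = [1.0] + list(inputs)          # prepend the bias input x0 = 1
            if output(W, x) != target:
                # Move the weights towards the correct side of the boundary.
                W = [wi + lr * target * xi for wi, xi in zip(W, x)]
                mistakes += 1
        if mistakes == 0:                     # every sample classified correctly
            break
    return W


def accuracy(W, samples):
    return sum(output(W, [1.0] + list(i)) == t for i, t in samples) / len(samples)


AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

print("AND:", accuracy(train_perceptron(AND), AND))   # linearly separable
print("XOR:", accuracy(train_perceptron(XOR), XOR))   # not linearly separable
```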
Multi-Layered Perceptron
• Combine Perceptrons to model nonlinear decision boundaries
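A two-layer combination of the same threshold units already represents XOR. The weights below are one illustrative, hand-set choice (not learned): one hidden unit computes OR, the other NAND, and the output unit ANDs them.

```python
# Two-layer network of perceptron units (bias weight W0 on x0 = 1,
# output 1 if the weighted sum is > 0, else -1) that computes XOR.

def unit(W, inputs):
    x = [1.0] + list(inputs)
    return 1 if sum(wi * xi for wi, xi in zip(W, x)) > 0 else -1


def xor_network(a, b):
    h_or = unit([-0.5, 1.0, 1.0], (a, b))     # +1 iff a OR b
    h_nand = unit([1.5, -1.0, -1.0], (a, b))  # +1 iff NOT (a AND b)
    # Output unit: AND of the two hidden units, whose values are +1/-1.
    return unit([-1.0, 1.0, 1.0], (h_or, h_nand))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Each hidden unit draws one line; the output unit keeps the region between the two lines, a nonlinear decision region that no single perceptron can produce.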
Neural Networks (NN) vs SVMs
Applications to Visual Analytics
Visual Recognition Tasks
Image Classification
• Is it a natural or man-made scene?
• Is it a forest or a beach?
• Does this image contain a building?
Object Detection
• Does this image contain a building? [where]
• Which objects does this image contain?
Object Detection [pixel-wise localization]
• Which pixels are building?
Applications: Object Attributes
• Building: 42 m height, 100 m away
• Car: Police Car, Frontal View
• Autonomous and Assistive Driving
Applications: Instance Recognition
• Does this image contain “India Gate”?
• Recognizing landmarks in images
• Recognizing products in supermarkets
Image Classification
• A core task in computer vision: assign an image one label from a given set of discrete labels, e.g. {dog, cat, truck, plane, ...}
• Example prediction: Cat
The Problem: Semantic Gap
• Images are represented as 3D arrays of numbers, with integers in [0, 255].
• E.g. 300 x 100 x 3 (3 for the three RGB color channels)
Challenges: View Point Variation
Challenges: Illumination Variation
Challenges: Deformation
Challenges: Occlusion
Challenges: Background Clutter
Challenges: Intra-class Variation
Image Classification with AlexNet
ImageNet Large-Scale Visual Recognition
Challenge (ILSVRC)
• 10,000,000 labelled images depicting 10,000+ object categories, collected from Flickr and via other search engines
• ILSVRC 2012
• Validation and test data: 150,000 photographs, hand-labelled with 1000 object categories
• A random subset of 50,000 of these images, with labels, released as validation data
• Training data: 1.2 million images covering the 1000 categories
• Evaluation
• Output a list of 5 object categories in descending order of confidence
• Two error rates: top-1 and top-5
http://image-net.org/challenges/LSVRC/2012/
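The two ILSVRC error rates can be computed from each image's confidence-ranked label list: an image counts as correct under top-k if its true label appears among the first k predictions. The tiny label lists below are made up for illustration.

```python
# Top-k error: fraction of images whose true label is NOT in the first k
# entries of the model's confidence-ranked prediction list.

def top_k_error(ranked_predictions, true_labels, k):
    misses = sum(label not in ranked[:k]
                 for ranked, label in zip(ranked_predictions, true_labels))
    return misses / len(true_labels)


ranked = [
    ["cat", "dog", "fox", "lynx", "tiger"],
    ["truck", "car", "bus", "van", "tractor"],
    ["plane", "bird", "kite", "drone", "balloon"],
    ["dog", "wolf", "cat", "fox", "bear"],
]
truth = ["cat", "bus", "helicopter", "cat"]
print(top_k_error(ranked, truth, k=1), top_k_error(ranked, truth, k=5))
```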
AlexNet Results on ImageNet Dataset
Convolutional Layers
Convolutional layers are locally connected
• A filter/kernel/window slides over the image or the previous feature map
• The position of the filter explicitly provides information for localization
• Local spatial information w.r.t. the window is encoded in the channels
Convolutional layers share weights spatially: translation-equivariant (often loosely called translation-invariant)
• A translated region will produce the same response at the correspondingly translated position
• A local pattern’s convolutional response can be re-used by different candidate regions
• Convolutional layers can be applied to images of any size, yielding proportionally sized outputs
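A plain 2-D "valid" convolution makes these properties concrete. Like most deep-learning libraries, this sketch actually computes cross-correlation (the kernel is not flipped). A single shared kernel is applied at every position, the output size follows the input size, and shifting the input shifts the response map by the same amount. The image and kernel are made-up toy values.

```python
# 2-D valid convolution (cross-correlation) on nested lists.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def shift_right(image, n):
    """Translate an image n pixels to the right, padding with zeros."""
    return [[0] * n + row[:len(row) - n] for row in image]


# A 5x6 image containing a short vertical bar, and a 3x3 vertical-bar kernel.
image = [[0] * 6 for _ in range(5)]
for r in range(1, 4):
    image[r][1] = 1
kernel = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]

response = conv2d(image, kernel)
shifted_response = conv2d(shift_right(image, 2), kernel)
for row in response:
    print(row)
print(shifted_response == shift_right(response, 2))  # True: response moved with the input
```

The equality holds here because the bar stays away from the image border; near borders, padding choices break exact equivariance.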
Object Detection
Semantic Segmentation
Vision for Wildlife: from Conservation to Conflict Management
Conservation: Need of the Hour!
International illegal demand for wildlife parts hit India hard in the mid-1990s…
…which led to local tiger extinction in Sariska & Panna, and yet the official tiger population was 3500!
Source: WII, NTCA

Monitoring India’s Tiger Population
• Need for scientific methodology
The Wildlife Institute of India’s grim 2008 report shocked India and the world with its findings: A far more accurate camera trap based survey counted just 1,411 adult tigers, after a $400 million investment over 34 years to save them under Project Tiger. – Media
• 2008: WII’s 4-phase pan-India protocol
Source: WII
Population: 3500 (official figure) vs. 1,411 (2008 survey)
Camera Trap based Demography
• Extensive use of camera traps!
• Irrefutable evidence of tiger presence
• Accurate, inexpensive and convenient (for the scientists and the tigers!)
• ~9500 camera traps in 2014; ~15000 in 2018
• Over 2 crore (20M) images expected
• Captures other species as well
Extract-Compare – Hiby et al., 2009
~31,000 images of ~1650 individual tigers photo-captured in 2013–14: over 70% of the estimated tiger population
Source: WII
Augmenting Anti-Poaching Patrols
• Aerial Surveillance of Forest Reserves with UAVs
• Detection of poachers / animals in thermal infrared videos
• Very Challenging!!
Source: Center for AI in Society, University of Southern California

Urban Conflicts : Rhesus Macaque
• Rapid population growth
• Low male:female ratios in troops
• Promiscuous mating and short gestation periods
• Native ranges (India)
• Aggressive species:
• Threat to bonnet macaque
• ~1000 human bites/day nationally
• Huge agricultural losses due to crop raiding
• ~ ₹500 crores annually
Source: Mid Career Training Programme (MCT) Phase IV for Indian Forest Officers, 2014; Video Source: Youtube - World's Sneakiest Animals: Episode 2 Preview - BBC
Managing Macaques in Himachal
• Initiatives by Government
• Translocation
• Commensals find nearby human settlements
• Mass awareness & legislation against feeding of monkeys
• Too difficult to enforce
• Effective Solution: Mass sterilisation campaign
• Effective, slow, long term, large scale, but invasive
• Population reduction from 317K in 2003-04 to 226K in 2013
• Existing strategy of capture is tedious
• Identification of capture area, census of macaque population, identification of feeding sites, etc.
• Trap / bait / capture team familiarization
• Effective Solution: Crowd-sourcing monitoring and reporting
• Identify individuals, troop members, location, behaviour, etc.
• Key technology challenge
Source: Mid Career Training Programme (MCT) Phase IV for Indian Forest Officers, 2014; Slide Credit: Dr. Sandeep Rattan
Our Efforts in Visual Wildlife Monitoring
• Tiger Census 2014
• ~9500 Camera Traps
• Tens of lakhs of images
• manually sorted by species
• Semi-automatic individual
tiger/leopard identification
• Goal: Tiger Census 2018
• ~15000 Camera Traps
• Crores of camera trap images
• Currently testing: automatic species categorization (~50 species)
[Images: tiger, macaque]
Unique Identification of Individuals
Monkey Population Monitoring and Control* | Tiger Census and Conservation
Challenges?
• Large number of classes with few samples per individual
• Background clutter, camouflage, illumination variations
• Pose variations
Where we are today
● Snap & send crowdsourcing app with a face alignment-free model
● No annotation
● Automatic flank matching followed by user validation
● Learning representations robust to pose variations
*Supported by MS AI for Earth
Acknowledgment
Prof. Y. V. Jhala, Prof. Qamar Qureshi, Prof. Milind Tambe, Dr. Bistra Dilkina, Liz Bondi (PhD Student), Ankita Shukla (PhD Student), Dr. Ryan Farrell
Thank You!
Big Cat Mortality in India
[Figures: Tiger Mortality; Leopard Mortality (2015-2017)]
Slide source: Wildlife Protection Society of India (WPSI)

India – WPSI Tiger Data         2014  2015  2016  2017  2018 (as of 20 Nov.)
Documented Poached Tigers         23    26    50    37    30
Other Documented Mortalities      58    65    82    78    68
Total                             81    91   132   115    98
Current Approach: SPOT
Method   Precision   Recall
SPOT     0.4235      0.3697
ESN      0.0024      0.0432
ESE      0.0573      0.2836
Too low for full automation
Bondi et al., “SPOT Poachers in Action: Augmenting Conservation Drones with Automatic Detection in Near Real Time”, IAAI 2018
Ren et al., “Faster R-CNN: towards real-time object detection with region proposal networks”, NIPS 2015
Vision for Wildlife
from conservation to conflict management
• Intelligent Wildlife Monitoring
• Goal: automatic indexing of image datasets by species and individuals
• Applications: camera-trap based population monitoring; crowd-sourced reporting of wildlife conflict
• Augmenting Manual Patrolling with Unmanned Aerial Surveillance
• Goal: automatic detection of animals and humans in Thermal IR aerial videos
• Applications: detecting illegal entry in restricted areas; estimating animal density
