IE506 IntrotoML 2024jan5
Applications
IE 506
Lecture 0
1 Introduction
Binary Classification
e-mail Spam Classification
Multi-class Classification
Handwriting Recognition
Action Recognition
Multi-label Classification
Object Recognition
Clustering
Part-of-Speech Tagging
Machine Translation
Computational Biology
Protein Structure Prediction
Supervised Learning
▶ Inputs and corresponding outputs are known during learning
▶ e.g. Regression, Classification (Binary, Multi-class, Multi-label)
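The three classification settings differ mainly in what the output looks like. The following is a minimal sketch of one common way to encode the targets; the label sets `DIGITS` and `OBJECTS` and the helper names are made up for illustration and are not part of the lecture.

```python
# Hypothetical label vocabularies, chosen only for illustration.
DIGITS = list(range(10))               # multi-class: exactly one of ten digits
OBJECTS = ["car", "person", "tree"]    # multi-label: any subset of these


def one_hot(label, classes):
    """Multi-class target: exactly one entry is 1."""
    return [1 if c == label else 0 for c in classes]


def multi_hot(labels, classes):
    """Multi-label target: one entry is 1 for every label that is present."""
    present = set(labels)
    return [1 if c in present else 0 for c in classes]


binary_target = 1                                        # spam (1) vs. not spam (0)
multiclass_target = one_hot(3, DIGITS)                   # the handwritten digit 3
multilabel_target = multi_hot(["car", "tree"], OBJECTS)  # two objects in one image
```

A multi-class target always has a single active entry, while a multi-label target may have none, one, or several.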
Unsupervised Learning
▶ Input objects are generally not labeled
▶ e.g. Clustering, Principal-component Analysis
Semi-supervised Learning
▶ Learning from a small amount of labeled data together with unlabeled data
▶ e.g. customer reviews are available for only a few products on a seller’s
website
Binary Classification
Recall: e-mail Spam Classification
Generally, many input/output pairs are given for training the machine
learning model.
Feature Extraction
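Feature extraction turns a raw e-mail into a numeric vector x that a model can consume. One simple possibility, sketched here as an assumption rather than as the method used in the lecture, is to count how often each word of a fixed vocabulary occurs. The vocabulary and the function name `extract_features` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical vocabulary; a real system would build it from training data.
VOCABULARY = ["free", "winner", "money", "meeting", "report"]


def extract_features(email_text, vocabulary=VOCABULARY):
    """Map raw e-mail text to a vector of word counts over the vocabulary."""
    tokens = re.findall(r"[a-z]+", email_text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur in the e-mail.
    return [counts[word] for word in vocabulary]


x = extract_features("FREE money! You are a winner. Claim your free prize.")
```

Each e-mail, whatever its length, is mapped to a vector with one entry per vocabulary word.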
Training
Input: Training data $D = \{(x_i, y_i)\}_{i=1}^{n}$
Aim: Learn a model h : X → Y
Testing
Given x̂, predict ŷ = h(x̂)
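The training and testing steps can be sketched as follows. The lecture does not fix a learning algorithm at this point, so this example assumes a nearest-centroid rule purely for illustration: `train` takes D and returns a function h, and testing is a call to h on a new input. The toy data and labels are made up.

```python
def train(D):
    """Learn h from D = [(x_i, y_i), ...] using a nearest-centroid rule."""
    sums, counts = {}, {}
    for x, y in D:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    # One centroid (mean feature vector) per label.
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def h(x_hat):
        # Predict the label whose centroid is closest in squared distance.
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x_hat, c))
        return min(centroids, key=lambda y: sq_dist(centroids[y]))

    return h


# Toy training data (made up): word-count vectors labeled "spam" or "ham".
D = [([2, 1, 1, 0, 0], "spam"), ([3, 0, 2, 0, 0], "spam"),
     ([0, 0, 0, 1, 1], "ham"), ([0, 0, 0, 2, 0], "ham")]

h = train(D)               # training
y_hat = h([2, 0, 1, 0, 0])  # testing on an unseen input
```

Any other classifier fits the same pattern: training produces h, and testing only evaluates it.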
P. Balamurugan A Broad Overview of Machine Learning and Applications
January 5 & 9, 2024. 35 / 49
Nature of Machine Learning Tasks Supervised Machine Learning
Classification Algorithms
SVM
Decision Tree
Regression
Fitting a curve
Model: $y(x; \beta) = \beta_0 + \sum_{j=1}^{n} \beta_j \phi_j(x)$
Equivalently, with $\phi_0(x) = 1$: $y(x; \beta) = \sum_{j=0}^{n} \beta_j \phi_j(x)$
Expressivity and Dimensionality:
▶ For $f_1$, the basis functions are $\{\phi_0(x) = 1, \phi_1(x) = x, \phi_2(x) = x^2\}$.
▶ For $f_2$, the basis functions are $\{\phi_0(x) = 1, \phi_1(x) = x, \phi_2(x) = x^2, \phi_3(x) = x^3, \phi_4(x) = x^4, \phi_5(x) = x^5\}$.
▶ $f_1$ is low-dimensional (only three basis functions) but less expressive.
▶ $f_2$ is high-dimensional (six basis functions) and more expressive than $f_1$.
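The trade-off between the two models can be tried out numerically. The sketch below assumes that the coefficients β are estimated by ordinary least squares (the slides do not state the fitting criterion) and uses synthetic data generated from a noisy sine curve, which is made up for illustration. The helper names are hypothetical.

```python
import numpy as np


def design_matrix(x, degree):
    """Columns are the basis functions phi_j(x) = x**j for j = 0..degree."""
    return np.vander(x, N=degree + 1, increasing=True)


def fit(x, y, degree):
    """Least-squares estimate of beta for the polynomial basis (an assumption)."""
    beta, *_ = np.linalg.lstsq(design_matrix(x, degree), y, rcond=None)
    return beta


def predict(x, beta):
    """Evaluate y(x; beta) = sum_j beta_j * phi_j(x)."""
    return design_matrix(x, len(beta) - 1) @ beta


# Synthetic data (made up for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 40)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

beta_f1 = fit(x, y, degree=2)   # three basis functions, like f1
beta_f2 = fit(x, y, degree=5)   # six basis functions, like f2

mse_f1 = np.mean((predict(x, beta_f1) - y) ** 2)
mse_f2 = np.mean((predict(x, beta_f2) - y) ** 2)
```

Because the basis of f1 is contained in the basis of f2, the training error of f2 cannot exceed that of f1. A lower training error alone does not guarantee better predictions on new data.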
Data Exploration
Homework:
Check the UCI data repository, https://archive.ics.uci.edu/.
Explore at least 5 different data sets. For each, do the following:
▶ Understand the number, type and description of features or attributes.
▶ Understand the number and nature of samples.
▶ Understand how the data was acquired, check if there are missing data,
etc.
▶ Check for any other relevant qualities of the data.
Write Python code to load these 5 data sets and create pandas
DataFrames from the loaded data.
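As a possible starting point for the loading step, here is a minimal sketch that assumes the data set is a comma-separated file. The function name `load_dataset`, the column names, and the sample rows are hypothetical. Treating `?` as a missing-value marker is also an assumption, so check each data set's description for the actual file format and conventions.

```python
import io

import pandas as pd


def load_dataset(source, column_names=None):
    """Read a comma-separated data file into a DataFrame.

    `source` may be a local path, a URL, or a file-like object.
    Pass `column_names` when the file has no header row.
    """
    header = None if column_names is not None else "infer"
    return pd.read_csv(source, header=header, names=column_names,
                       na_values=["?"])  # assumed missing-value marker


# Stand-in for a downloaded file (made-up rows; "?" marks a missing value).
sample = io.StringIO("5.1,3.5,a\n4.9,?,b\n6.2,2.9,a\n")
df = load_dataset(sample, column_names=["length", "width", "label"])

n_samples, n_features = df.shape       # number of samples and attributes
column_types = df.dtypes               # type of each attribute
missing_per_column = df.isna().sum()   # missing values per attribute
```

The last three lines correspond to the checks listed in the homework: number of samples and features, attribute types, and missing data.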
ACKNOWLEDGMENTS
Some content borrowed from various open-access resources
▶ Blogs
▶ Tutorials
▶ Free e-books
▶ Open-access Papers
▶ YouTube videos
▶ Scribe notes from my students