Supervised Learning - 2 - Classification - v2
Artificial Intelligence
(ME3181)
Supervised Learning
• Supervised Learning: learns from labeled training data to make predictions or decisions.
• Regression: finding the relationship between a dependent variable (label, target, output, outcome variable) and one or more independent variables (also known as predictors or features).
• Classification: assigning input data points to one of several predefined categories or classes.
• Unsupervised Learning: finds patterns, relationships, or structures in a dataset without labeled output or target variables.
[Diagram: Training set → Learning Algorithm → Hypothesis/Model ℎ; input data 𝑥 is mapped by ℎ to an estimated value 𝑦.]
https://www.amybergquist.com/
Lecture notes of Andrew Ng
Applications of AI (ME3181) 4
Classification
Goal: To learn a classification model from the data that can be used to predict the
classes of new (future, or test) cases/instances.
Each classification model can be referred to as a classifier.
https://datasciencedojo.com/
Types of classification
• Binary Classification: categorizes data into one of two classes or categories.
• Multiclass classification: data is classified into more than two classes or
categories.
• Multilabel classification: each data point can belong to multiple classes
simultaneously.
• Multioutput classification: a single model predicts several different targets for each input (one dataset, multiple prediction tasks).
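The label formats for these types differ in shape. A minimal sketch (illustrative NumPy arrays, not from the slides) of how binary, multiclass, and multilabel targets are typically encoded:

```python
import numpy as np

# Binary: each sample gets one of two labels (0 = negative, 1 = positive).
y_binary = np.array([0, 1, 1, 0])

# Multiclass: one label per sample, drawn from more than two classes.
y_multiclass = np.array([0, 2, 1, 2])

# Multilabel: each sample may belong to several classes at once,
# encoded as one binary indicator row per sample.
y_multilabel = np.array([[1, 0, 1],
                         [0, 1, 0],
                         [1, 1, 0],
                         [0, 0, 1]])

print(y_binary.shape, y_multiclass.shape, y_multilabel.shape)
```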
Simple classification: Binary
• 2 classes
• Designate one class as the reference class (1, Positive).
• The other class is the negative class (0 or -1, Negative).
[Figure: two classes separated by a classifier's decision boundary (separating hyperplane).]
https://machinelearningmastery.com/types-of-classification-in-machine-learning/
Categorical Encoding
Popular methods
Binary encoding
𝑦 = 1 if setosa, 0 if versicolor
Label encoding
Setosa class: y = ‘setosa’
Versicolor class: y = ‘versicolor’
Virginica class: y = ‘virginica’
Ordinal encoding
Setosa class: y=1
Versicolor class: y = 2
Virginica class: y=3
One-hot encoding
Setosa class: y = [ 1, 0, 0]T
Versicolor class: y = [ 0, 1, 0]T
Virginica class: y = [ 0, 0, 1]T
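The encodings above can be sketched in plain Python/NumPy (the mappings below mirror the slide; the helper `one_hot` is an illustrative name, not a library function):

```python
import numpy as np

classes = ["setosa", "versicolor", "virginica"]

# Label encoding: keep the class name itself as the target.
y_label = "setosa"

# Ordinal encoding: map each class to an integer (the order is arbitrary here).
ordinal = {c: i + 1 for i, c in enumerate(classes)}  # setosa→1, versicolor→2, virginica→3

# One-hot encoding: a unit vector with a 1 in the class's position.
def one_hot(name, classes):
    vec = np.zeros(len(classes), dtype=int)
    vec[classes.index(name)] = 1
    return vec

print(ordinal["versicolor"])          # 2
print(one_hot("virginica", classes))  # [0 0 1]
```

One-hot encoding avoids imposing a false ordering on the classes, which is why it is preferred for nominal categories.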
K-Nearest Neighbor (k-NN)
[Figure: k-NN classification examples in a two-dimensional feature space (𝑥1, 𝑥2).]
K-Nearest Neighbor (k-NN)
Pseudocode of the k-NN classifier

Input: 𝑿 - training data; 𝒀 - class labels of 𝑿; 𝐾 - number of nearest neighbors to consider; 𝒙_test - a new test sample.
1. For each 𝒙(𝑗) in 𝑿: calculate the distance 𝑑(𝒙_test, 𝒙(𝑗)).
2. Sort the calculated distances in increasing order.
3. Take the first 𝐾 points corresponding to the first 𝐾 sorted distances.
4. For each of the 𝐾 taken points: count the classes that appear.
5. Class of 𝒙_test = the most frequent class among the 𝐾 taken points.
Output: class of 𝒙_test.

Common distance metrics:
• Euclidean: 𝑑(𝒙_test, 𝒙(𝑗)) = √( Σ_k (𝑥_k^test − 𝑥_k^(𝑗))² )
• Manhattan: 𝑑(𝒙_test, 𝒙(𝑗)) = Σ_k |𝑥_k^test − 𝑥_k^(𝑗)|
• Minkowski: 𝑑(𝒙_test, 𝒙(𝑗)) = ( Σ_k |𝑥_k^test − 𝑥_k^(𝑗)|^𝑝 )^(1/𝑝)
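The steps above can be sketched as a short Python function, assuming Euclidean distance and a NumPy feature matrix (toy data below is illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X, Y, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from x_test to every training sample.
    dists = np.sqrt(((X - x_test) ** 2).sum(axis=1))
    # Steps 2-3: indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: count the classes that appear and return the most common one.
    return Counter(Y[i] for i in nearest).most_common(1)[0][0]

# Toy data: two small clusters.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
Y = ["A", "A", "B", "B"]
print(knn_predict(X, Y, np.array([0.1, 0.0])))  # "A"
```

Note that k-NN has no training phase beyond storing the data; all work happens at prediction time.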
Naïve Bayes Classifier
Probability Review
Random variable: a variable whose value occurs randomly.
𝐴: Ω → 𝐸
Ω is the set (space) of values that 𝐴 can take (the sample space). Each value 𝑎 ∈ Ω is a realization of 𝐴.
Probability: the likelihood that 𝐴 takes a particular value or range of values.
Notation: 𝑃(𝐴 ≤ 𝑎) - the probability that the condition (𝐴 ≤ 𝑎) occurs.
• Maximum likelihood: find a parameter 𝜃 that maximizes the probability of the observed data:
𝜃* = argmax_𝜃 𝑃(𝑥 | 𝜃)
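A minimal numerical sketch of maximum likelihood (illustrative, not from the slides): for i.i.d. Gaussian samples with known variance, the likelihood 𝑃(𝑥 | 𝜇) is maximized at the sample mean, which a simple grid search recovers.

```python
import numpy as np

# Observed data (illustrative).
x = np.array([2.0, 2.5, 3.0, 3.5, 4.0])

def neg_log_likelihood(mu, x, sigma=1.0):
    # Negative log-likelihood of Gaussian data, up to an additive constant.
    return np.sum((x - mu) ** 2) / (2 * sigma ** 2)

# Scan candidate mu values and pick the likelihood maximizer.
candidates = np.linspace(0, 6, 601)
mu_star = candidates[np.argmin([neg_log_likelihood(m, x) for m in candidates])]
print(mu_star)  # ≈ 3.0, the sample mean
```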
Naïve Bayes Classifier
• Naïve assumption: suppose 𝑥 = [𝑥1, …, 𝑥𝑖, …, 𝑥𝑛]ᵀ consists of 𝑛 features. We "naïvely" assume that the 𝑥𝑖 are mutually independent, so
𝑃(𝑥) = 𝑃(𝑥1) 𝑃(𝑥2) … 𝑃(𝑥𝑛)
• Gaussian Naïve Bayes: each feature follows a Gaussian distribution, characterized by
𝜇: mean
𝜎²: variance, 𝜎² = Σ(𝑥 − 𝜇)² / 𝑁
𝑥𝑖 ~ 𝑁(𝜇𝑖, 𝜎𝑖²) → 𝑃(𝑥𝑖) = (1 / (𝜎𝑖 √(2𝜋))) exp(−(𝑥𝑖 − 𝜇𝑖)² / (2𝜎𝑖²))
The method is then: for each class 𝑦 = 𝑐𝑘, compute
1. The frequency (prior) of 𝑐𝑘
2. The parameters needed to compute the probabilities → choose the class maximizing 𝑃(𝑦 = 𝑐𝑘 | 𝑥)
[Figure: decision boundaries of the Naïve Bayes classifier.]
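The two-step procedure above can be sketched as a minimal Gaussian Naïve Bayes class (an illustrative implementation, not scikit-learn's; the variance floor `1e-9` is an assumption to avoid division by zero):

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian Naive Bayes: per-class priors plus per-feature
    Gaussian parameters, assuming features are conditionally independent."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior, self.mu, self.var = {}, {}, {}
        for c in self.classes:
            Xc = X[y == c]
            self.prior[c] = len(Xc) / len(X)      # 1. frequency of class c_k
            self.mu[c] = Xc.mean(axis=0)          # 2. Gaussian parameters
            self.var[c] = Xc.var(axis=0) + 1e-9   # small floor avoids /0
        return self

    def predict(self, x):
        def log_posterior(c):
            # log P(c) + sum over features of log N(x_i; mu_i, sigma_i^2)
            return (np.log(self.prior[c])
                    - 0.5 * np.sum(np.log(2 * np.pi * self.var[c]))
                    - 0.5 * np.sum((x - self.mu[c]) ** 2 / self.var[c]))
        return max(self.classes, key=log_posterior)

# Toy 1-D data: two well-separated classes.
X = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])
y = np.array([0, 0, 0, 1, 1, 1])
model = GaussianNB().fit(X, y)
print(model.predict(np.array([1.1])))  # 0
```

Working in log space keeps the product 𝑃(𝑥1)…𝑃(𝑥𝑛) from underflowing when 𝑛 is large.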
Logistic Regression
• General form:
ŷ(𝑥) = 𝑓(𝑤ᵀ𝑥)
• 𝑓 is built from the logistic (sigmoid) function: 𝑓 = 𝑔(𝜎(𝑤ᵀ𝑥)), where
𝜎(𝑡) = 1 / (1 + exp(−𝑡))
As 𝑡 → +∞, 𝜎 → 1; as 𝑡 → −∞, 𝜎 → 0.
• We can interpret 𝜎 as a probability: 𝑝̂(𝑦 = 𝑐𝑘 | 𝑥) = 𝜎(𝑤ᵀ𝑥).
Logistic Regression
• Prediction form of ŷ:
ŷ(𝑥) = 𝑓(𝑤ᵀ𝑥) = 𝑔(𝜎(𝑤ᵀ𝑥))
ŷ(𝑥) = 1 if 𝑝̂ ≥ 0.5, 0 if 𝑝̂ < 0.5
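A minimal sketch of this prediction rule (the weight vector `w` below is illustrative, not a trained model):

```python
import numpy as np

def sigmoid(t):
    # Logistic function: maps any real t into (0, 1).
    return 1.0 / (1.0 + np.exp(-t))

def predict(w, x, threshold=0.5):
    """Logistic-regression prediction: p_hat = sigma(w^T x), thresholded at 0.5."""
    p_hat = sigmoid(w @ x)
    return 1 if p_hat >= threshold else 0

w = np.array([2.0, -1.0])                 # illustrative weights
print(predict(w, np.array([1.0, 0.5])))   # sigma(1.5) ≈ 0.82 → 1
print(predict(w, np.array([0.0, 2.0])))   # sigma(-2.0) ≈ 0.12 → 0
```

The 0.5 threshold is conventional; it can be moved to trade precision against recall.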
Softmax Regression
• Prediction form of ŷ:
ŷ(𝑥) = 𝑓(𝑤ᵀ𝑥) = 𝑔(𝜎(𝑤ᵀ𝑥))
𝜎(𝑧)ᵢ = exp(𝑧ᵢ) / Σⱼ₌₁ᴷ exp(𝑧ⱼ)
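The softmax formula can be sketched directly (the max-shift inside `exp` is a standard numerical-stability trick, not part of the formula itself; the scores `z` are illustrative):

```python
import numpy as np

def softmax(z):
    """Softmax: exp(z_i) / sum_j exp(z_j), shifted by max(z) for stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])  # illustrative scores for K = 3 classes
p = softmax(z)
print(p.argmax())  # 0 - the class with the largest score
```

The outputs are nonnegative and sum to 1, so they can be read as class probabilities; the predicted class is the argmax.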
https://machinelearningcoban.com/
Support Vector Machine (SVM)
• Suppose we have a hyperplane as a classifier.
• Support vectors are the data points nearest to the hyperplane.
• The distance between the support vectors and the hyperplane is
the margin.
𝑦 = 𝑤ᵀ𝑥 − 𝑤0
[Figure: separating hyperplane with margin boundaries 𝑦 = 1 and 𝑦 = −1.]
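A minimal sketch of this decision function (the hyperplane parameters `w`, `w0` below are illustrative, not a trained SVM):

```python
import numpy as np

def svm_decision(w, w0, x):
    """Signed decision value y = w^T x - w0. The sign gives the class;
    |y| / ||w|| is the distance from x to the hyperplane."""
    return w @ x - w0

w = np.array([1.0, 1.0])  # illustrative hyperplane normal
w0 = 1.0
x = np.array([2.0, 1.0])

y = svm_decision(w, w0, x)
print(np.sign(y))                  # 1.0 → positive class
print(abs(y) / np.linalg.norm(w))  # geometric distance to the hyperplane
```

Support vectors are exactly the training points with |𝑦| = 1, i.e., those lying on the margin boundaries.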
Other classification algorithms
Model Evaluation
https://www.researchgate.net/publication/347447352_Classification_of_stages_of_
Diabetic_Retinopathy_using_Deep_Learning