Professional Documents
Culture Documents
Introduction To Machine Learning and Big Data Management
Introduction To Machine Learning and Big Data Management
Machine Learning
Ratchainant Thammasudjarit, Ph.D.
Outline
• AL, ML and DL
• Taxonomies and Use Cases
AI, ML and DL
• From AI to DL
• Problem Complexity
• Computational Power
World is changing
• Task complexity
• Predefining relationship between variables are
difficult in many real world cases
Benefits of ML in Healthcare
Better life
Supervised Learning
• Prediction
• Classification
ML • Regression
Types of Machine Learning
Supervised Learning
• Classification
Microbial Keratitis Classification
Is a given image either bacteria or fungal
infection?
Supervised Learning
• Classification
Microbial Classification
Supervised Learning
• Regression
Early Detection
What is the conversion likelihood from
prediabetes to type-2 diabetes of a patient’s
clinical risk factors?
Types of Machine Learning
Supervised Learning
Supervised Learning
• Treatment plan
• Taking Glipizide to stimulate insulin generation
• Taking Metformin to reduce sugar produced by
liver
Supervised Learning
• Discussions
• What is the difference between classification
and regression
Classification Regression
Types of Machine Learning
Unsupervised Learning
• Tasks
• Structure discovery
ML • Clustering
• Dimensionality Reduction
Types of Machine Learning
Unsupervised Learning
• Clustering
Diagnosis Related Group
ML
Types of Machine Learning
Unsupervised Learning
Reinforcement Learning
• Self-learning
• Tasks
• Optimization
ML • Decision Making
Types of Machine Learning
Reinforcement Learning
• Decision Making
Robotic Navigation
Exploring landscape and self-decide to
perform action, e.g. moving forward, stop,
turn left, turn right
Types of Machine Learning
Reinforcement Learning
Reinforcement Learning
Clear Obstruct
Turn = 0.7
Types of Machine Learning
Supervised Learning
• Prediction
Summary • Classification
• Regression
Unsupervised Learning
• Structure Discovery
• Clustering
• Dimensionality Reduction
ML
Reinforcement Learning
• Optimization
• Decision Making
Machine Learning Components
Overall
Samples
Learning Trained
Algorithm Model
Model
Machine Learning Components
Samples
Population
Samples
Ramathibodi’s patient
from endocrine clinic
who has hyperglycemia
Machine Learning Components
• Supervised learning
• The sample 𝑖 is a pair of feature vector and its
Age Sex Diabetes
class label
55 Male yes • The standard notation is (𝐗 (𝑖) , 𝑦 (𝑖) )
37 Female no (𝐗 (2) , 𝑦 (2) )
48 Male yes
60 Female no
42 Male yes • Others
• The sample 𝑖 is a feature vector
• The standard notation is 𝐗 (𝑖)
Machine Learning Components
• Supervised Learning
Age Sex Diabetes • Dataset is a collection of 𝑚 pairs of feature
vector and its class label
55 Male yes
• The standard notation is (𝐗, 𝐲)
37 Female no
48 Male yes
60 Female no
42 Male yes
Machine Learning Components
• Summary
Age Sex Diabetes • Italic lowercase represents a scalar
55 Male yes • Non-italic lowercase represents a vector
37 Female no • Non-italic uppercase represents a matrix
48 Male yes
60 Female no
42 Male yes
Machine Learning Components
Model
6
• Expression of relationships between
𝜀4 variables
5 𝜀5
• Example: Linear relationship
4 𝜀2
3
𝐲 = 𝛽0 + 𝛽1 𝐱
𝜀3
𝜀1
2
1 • Goal
0 • Learn the optimal parameters 𝛉∗ = 𝛽0 , 𝛽1 from
0 1 2 3 4 5 dataset (𝐗, 𝐲) that minimize error
Machine Learning Components
Model
𝑓1
𝑓2
Error
𝑓1
𝑓2
𝛉∗
Machine Learning Components
Learning Algorithm
𝛉0 𝛉1 𝛉∗
Machine Learning Components
• Summary
• Input
Samples • Data
• Model
Learning Trained
Algorithm Model
• Process
• Learning Algorithm
Model • Output
• Model parameters (Trained Model)
Build a Machine Learning Model
Data Splitting
yes no
(𝐗 𝑡𝑠𝑡 , 𝐲𝑡𝑠𝑡 )
𝐲
yes no
Build a Machine Learning Model
Model
Build a Machine Learning Model
Trained model
60
Age 50
30
¬dm Sex
20
𝑀𝑎𝑙𝑒 𝐹𝑒𝑚𝑎𝑙𝑒
10
dm ¬dm
0
Male Female
Learning From Data
Age
dm ¬dm
Age
Model Evaluation 𝑎𝑔𝑒 < 40 𝑎𝑔𝑒 ≥ 40
¬dm Sex
Age Sex 𝐲 𝐲ො
60
20 Male ¬𝑑𝑚 ¬𝑑𝑚
55 Female 𝑑𝑚 ¬𝑑𝑚 50
42 Male 𝑑𝑚 𝑑𝑚
40
48 Male ¬𝑑𝑚 𝑑𝑚
45 Male 𝑑𝑚 𝑑𝑚 30
58 Male ¬𝑑𝑚 𝑑𝑚
20
60 Male 𝑑𝑚 𝑑𝑚
28 Female ¬𝑑𝑚 ¬𝑑𝑚 10
50 Male 𝑑𝑚 𝑑𝑚
0
50 Female ¬𝑑𝑚 ¬𝑑𝑚 Male Female
Model Evaluation
Confusion Matrix
• Type of Errors
Age Sex 𝐲 𝐲ො Error-Type • 𝑓𝑝: Type-I Error
20 Male ¬𝑑𝑚 ¬𝑑𝑚 𝑡𝑛 • 𝑓𝑛: Type-II Error
55 Female 𝑑𝑚 ¬𝑑𝑚 𝑓𝑛
42 Male 𝑑𝑚 𝑑𝑚 𝑡𝑝
Actual Actual
48 Male ¬𝑑𝑚 𝑑𝑚 𝑓𝑝
𝑦 𝑦ො 𝑦 𝑦ො
45 Male 𝑑𝑚 𝑑𝑚 𝑡𝑝
58 Male ¬𝑑𝑚 𝑑𝑚 𝑓𝑝 𝑦 𝑡𝑝 𝑓𝑝 𝑦 4 2
Predict
Predict
60 Male 𝑑𝑚 𝑑𝑚 𝑡𝑝
28 Female ¬𝑑𝑚 ¬𝑑𝑚 𝑡𝑛
50 Male 𝑑𝑚 𝑑𝑚 𝑡𝑝 𝑦ො 𝑓𝑛 𝑡𝑛 𝑦ො 1 3
50 Female ¬𝑑𝑚 ¬𝑑𝑚 𝑡𝑛
Model Evaluation
Type of Errors
Model Evaluation • Sensitivity and Specificity
𝑡𝑝
Evaluation Metrics 𝑠𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦 =
𝑡𝑝 + 𝑓𝑛
𝑡𝑛
𝑠𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑖𝑡𝑦 =
𝑡𝑛 + 𝑓𝑝
Model Evaluation
AUC = 0.5
60
Age 50
𝐗 𝑛𝑒𝑤
𝑎𝑔𝑒 < 40 𝑎𝑔𝑒 ≥ 40 40
30
¬dm Sex
20
𝑀𝑎𝑙𝑒 𝐹𝑒𝑚𝑎𝑙𝑒
10
dm ¬dm
0
Male Female
ML Tools
Colab
Python R Jupyter
Good for training
Good for writing tutorial
Big Data Management
Ratchainant Thammasudjarit, Ph.D.
Big Data Management
Overview
• Input
• Acquire data
• Read data
• Process
BDM • Manipulate data
• Output
• Write data
Input
Acquiring data
Patient Information
Profile
HN A1234567
• Design of user interface is very important to
CID 1001020450634 data quality
Name คฤจภัคฐ์ คิศถฤงคิษแคธฎ์
Date of Birth 1978-01-01
Blood Type
Height
A
172
BMI
Weight
32
72
• What’s wrong with this interface
Address
Cancel OK
Input
Patient Information
Profile
Acquiring data HN A1234567
CID 1001020450634
Cancel OK
Cancel OK
Input
Patient Information Patient Information
Profile Profile
HN A1234567 HN A1234567
CID 1001020450634 CID 1001020450634
Acquiring data
Title นาย Other Title นาย Other
100/10 Tiwanont Road, Bangpood, Pakkred, 100/10 Tiwanont Road, Bangpood, Pakkred,
Nonthaburi, 11120, Thailand Nonthaburi, 11120, Thailand
Cancel OK Cancel OK
Patient Information Patient Information
Profile Profile
HN A1234567 HN A1234567
CID 1001020450634 CID 1001020450634
Title นาย Other Title นาย Other
Cancel OK Cancel OK
Patient Information Patient Information
Profile Profile
HN A1234567 HN A1234567
CID 1001020450634 CID 1001020450634
Title นาย Other Title นาย Other
Address Address
100/10 Tiwanont Road, Bangpood, Pakkred, 100/10 Tiwanont Road, Bangpood, Pakkred,
Nonthaburi, 11120, Thailand Nonthaburi, 11120, Thailand
Cancel OK Cancel OK
Patient Information
• Location information
Patient Information Profile
should input from
largest to smallest
HN A1234567
Profile • Country
CID 1001020450634
HN A1234567 • Province
Title นาย Other • District
CID 1001020450634
Title นาย Other
First Name คฤจภัคฐ์
• Zip Code should be
Middle Name
First Name คฤจภัคฐ์ automatically
Last Name คิศถฤงคิษแคธฎ์
Middle Name generated from
Last Name คิศถฤงคิษแคธฎ์
Date of Birth 1978-01-01 country, province,
BMI 24.33 kg/m2
district
Date of Birth 1978-01-01 Blood Type A
Height 172 cm Weight 72 kg
Blood Type A BMI 24.33 kg/m2 • Country, Province,
cm
Address District, Subdistrict
Height 172 Weight 72 kg
Country Thailand should be selected
Address from drop down for
Province Nonthaburi
100/10 Tiwanont Road, Bangpood, Pakkred, District Pakkred data integrity
Nonthaburi, 11120, Thailand
Zip Code 11120
• Address Number,
Subdistrict Bangpood
Address Name, and
Address Number 100/10
Road are
Address Name
Cancel OK acceptable to be
Road Tiwanont free text input
Cancel OK
Patient Information
• Location information
Patient Information Profile
should support
geographical tag
HN A1234567
Profile from online map,
CID 1001020450634
HN A1234567 e.g. Google Map
Title นาย Other
CID 1001020450634
Title นาย Other
First Name คฤจภัคฐ์
Middle Name
First Name คฤจภัคฐ์
Last Name คิศถฤงคิษแคธฎ์
Middle Name
Date of Birth 1978-01-01
Last Name คิศถฤงคิษแคธฎ์
Blood Type A BMI 24.33 kg/m2
Date of Birth 1978-01-01
Height 172 cm Weight 72 kg
Blood Type A BMI 24.33 kg/m2
Address
Height 172 cm Weight 72 kg
Country Thailand
Address
Province Nonthaburi
100/10 Tiwanont Road, Bangpood, Pakkred, District Pakkred
Nonthaburi, 11120, Thailand
Zip Code 11120
Subdistrict Bangpood
Address Number 100/10
Address Name
Cancel OK
Road Tiwanont
Cancel OK
Input
• Required modules
• mysql-connector
• mysqlclient
Data Processing
Data Processing
Data Processing
• Mode
• replace
• append