
Machine learning

What is machine learning?

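Normally, a person writes the logic (code) that tells a machine what to do.

Some tasks are hard to express as hand-written logic. Spam filtering is one example: how would we write rules that decide whether an email is spam?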

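Machine learning draws on both pure-science and pure-mathematics, pieced together like a jigsaw.

In machine learning, instead of writing the logic ourselves, we give the machine input and the expected output and let it work out the logic.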

Types of machine learning

Machine learning is usually divided into 3 types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

Supervised learning

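In supervised learning, the ML algorithm learns from examples that already come with the correct answer. That answer is called the Class Label.

In ML, supervised learning covers Classification and Regression; classification predicts a class label, while regression predicts a continuous label.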

Unsupervised learning

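Unlike supervised learning, the data has no label.

Because there is no label, the ML algorithm has to find a pattern in the data without a label to guide it.

In ML, a typical unsupervised learning task is Clustering.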


Reinforcement learning

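In reinforcement learning, the ML algorithm learns by trial and error.

It tries an action for a given input and receives a reward, for example reward=1 for a good result and reward=0 for a bad one. The reward is then fed back as input for the next attempt.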


Example: an AI learning to play Flappy Bird
https://www.youtube.com/watch?v=xM62SpKAZHU

Basic concepts in ML

Dataset

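A dataset is a collection of data in which each item is described by its properties (feature). To build a dataset, we digitize real-world objects into numbers.

A classic dataset in ML is Iris.

The dataset contains samples of 3 species of iris: Setosa, Versicolor and Virginica, 150 samples in total. Each sample has 4 features:

1. Petal width
2. Petal length
3. Sepal width
4. Sepal length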

Source: Wikipedia, Iris flower dataset
https://en.wikipedia.org/wiki/Iris_flower_data_set


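The columns of the table are called Features or Attributes.

The species (for example Setosa) is the Class label.

The rows are called Samples or Instances. Each sample has 4 features and a class label.

Because every sample has the same features, the whole dataset can be stored as a matrix.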
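As a rough illustration of the samples-by-features layout described above, here is a minimal Python sketch. The three rows are illustrative values in the style of the Iris data, and the names `FEATURES`, `X` and `y` are assumptions made for this example only.

```python
# Feature (attribute) names, one per column. Names are assumed for this sketch.
FEATURES = ["petal width", "petal length", "sepal width", "sepal length"]

# Each row is one sample (instance); each column is one feature.
# Values are illustrative, not the full 150-sample dataset.
X = [
    [0.2, 1.4, 3.5, 5.1],
    [1.4, 4.7, 3.2, 7.0],
    [2.5, 6.0, 3.3, 6.3],
]

# One class label per sample, in the same order as the rows of X.
y = ["Setosa", "Versicolor", "Virginica"]

n_samples = len(X)
n_features = len(X[0])
print(n_samples, "samples x", n_features, "features")
```

Keeping the class labels in a separate list from the feature matrix is a common convention, because the learning step treats them differently.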
Steps in Machine learning

1. Pre-processing
2. Learning
3. Evaluation

Pre-processing

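Some data already has a structure. The Iris flower dataset is an example: its structure is a table in which each column is a feature.

Other data is unstructured, such as an image, which is only a grid of pixels.

For unstructured data, we take the attributes of the input and generate features that are useful for prediction. Building features from raw data is called Feature extraction. Choosing which features to keep is called Feature selection.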

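For classification, a common Feature selection method uses Information gain and Entropy (entropy is a concept from information theory). Entropy is used to judge how much a feature helps prediction: a feature that separates the samples into groups with a stable class (in classification) is a useful feature.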

Learning

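In this step, data is the input to a learning algorithm, which learns from that data input.

Training

There are many algorithms that can be used for training: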

Ref: http://www.datasciencecentral.com/profiles/blogs/a-tour-of-machine-learning-algorithms-1

What we get after we train is a Model, which is then used for prediction.

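Cross-validation checks a model by measuring its accuracy (error) on data it was not trained on. It helps detect Overfitting, where the model is trained so closely on the features of the training data that it becomes sensitive to noise and cannot predict new data well.

Two common kinds of cross-validation:

Leave-one-out splits the data into 2 parts: one sample is left out for testing and the rest is used to train.

k-fold splits the dataset into k parts. With 5-fold, for example, we can use 4 folds to train and 1 fold to test, or 3 folds to train and 2 folds to test, and repeat over the combinations of folds.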
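The k-fold idea can be sketched in a few lines of plain Python. This is a minimal sketch, assuming the simplest variant in which each fold is used as the test set exactly once; the function name `k_fold_splits` is hypothetical and no shuffling is done.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(k_fold_splits(10, 5))
for train, test in splits:
    print("train:", train, "test:", test)
```

Calling `k_fold_splits(n, n)` leaves exactly one sample out per round, which corresponds to Leave-one-out.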

Evaluation
Accuracy

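Accuracy measures how often the prediction of a model is right: the number of correct predictions divided by the total number of predictions.

For example, if we classify 100 emails as spam or not spam and the model predicts 90 of them correctly, the accuracy is 90%.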

Confusion matrix

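Accuracy is a single number, so when we classify several classes it does not show which class the model gets wrong. A confusion matrix shows, for each actual class, how the model's predictions were distributed. For 2 classes, a confusion matrix is also called a confusion table.

              Predicted Cat   Predicted Dog
Actual Cat          80              20
Actual Dog           5             195

Actual class Cat = 80 + 20 = 100 samples: the model predicted 80 as Cat and 20 as Dog.
Actual class Dog = 5 + 195 = 200 samples: the model predicted 5 as Cat and 195 as Dog.
The accuracy is (80 + 195) / 300, about 91.7%.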
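The arithmetic for the Cat/Dog example can be checked with a short Python sketch. It assumes that 80 and 195 are the correctly classified counts; the per-class figure computed here (recall) is an addition for illustration, and the variable names are hypothetical.

```python
# Rows are the actual class, columns are the predicted class.
# Assumption: 80 Cats and 195 Dogs were classified correctly.
confusion = {
    "Cat": {"Cat": 80, "Dog": 20},
    "Dog": {"Cat": 5, "Dog": 195},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[c][c] for c in confusion)
accuracy = correct / total

# Per-class recall: of the samples that really are class c, the share predicted as c.
recall = {c: confusion[c][c] / sum(confusion[c].values()) for c in confusion}

print(f"accuracy = {accuracy:.3f}")
for c, r in recall.items():
    print(f"recall[{c}] = {r:.3f}")
```

The overall accuracy looks high, yet the per-class view shows that Cat is recognised less reliably than Dog, which is the kind of detail a single accuracy number hides.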

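When one class is treated as positive and the other class as negative, the confusion matrix shows two kinds of error:

false positive (false alarm): the model predicts positive but the actual class is negative.

false negative (miss): the model predicts negative but the actual class is positive.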

Ref: http://9gag.com/gag/aRVbMvy/false-positive-false-negative-in-a-nutshell

ML




learning

Entropy

Entropy is the quantity used in Feature selection by entropy. Here, Entropy means Information Entropy.

The concept of entropy in this sense comes from Claude Shannon, known as the Father of Information Theory, from his work on communication and information.


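Entropy measures impurity. Think of the transmission of data over a channel: the data starts out with full purity, but the channel may add noise or some data may be lost.

For example, we send 11111111 through a channel and receive 11110011. The 00 in the middle is impurity: the content is now a mix of 1 and 0.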

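For two classes, Entropy ranges over 0-1.

- If 100% of the items belong to one class, the entropy is 0.
- If the items are split 50:50 between the two classes, the entropy is 1 (the maximum).

Compare two cases with ratios 90:10 and 70:30. Case 1 is closer to a single class, so it has the lower entropy. Case 2 is closer to 50:50, so it has the higher entropy.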

Entropy

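Information entropy, or Shannon's entropy, is, when every outcome is equally likely, the log-base-2 of number of possible outcomes.

Take a coin with heads (H) and tails (T) at 50:50 (a fair coin). Tossing 1 coin gives possible outcomes = 2 (H, T). Tossing 2 coins gives possible outcomes = 4 (HH, HT, TH, TT). Since log2(2) = 1 and log2(4) = 2, we need 1 bit to represent the result of 1 coin and 2 bits to represent the 4 results of 2 coins.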

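The same idea applies to the classes in any content: if all classes are equally likely, the entropy is log2 of the number of classes.

For the letters A-Z there are 26 classes, each with probability 1/26. For an alphabet of 44 letters, each has probability 1/44. In both cases the entropy is log2 of the number of letters.

When the classes are not equally likely, let x be a class out of n classes. The entropy is written in terms of the probability of each class, in this form:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$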

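What is entropy used for?

In classification we have many instances/samples, and each instance has many features.

Not all features are equally useful. The features we train on decide how well the model can predict, and therefore its accuracy. When the content has many features, we want to know which features matter most.

Entropy gives a way to measure this before we train.

With log base 2, the entropy of two classes lies between 0 and 1, so entropy fits classification problems that are binary-classification, that is, 2-classes classification such as (Yes or No) or (Male or Female). It is also what a Decision tree uses to choose how each node splits into Yes and No.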
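To make the entropy values concrete, here is a minimal Python sketch of Shannon entropy in bits. The function name `entropy` and the choice to pass class probabilities as a list are assumptions for this example.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits: the sum of -p * log2(p) over classes with p > 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


print(entropy([1.0]))           # all items in one class
print(entropy([0.5, 0.5]))      # 50:50 split
print(entropy([0.9, 0.1]))      # 90:10 split
print(entropy([0.7, 0.3]))      # 70:30 split
print(entropy([1 / 26] * 26))   # 26 equally likely letters
```

Classes with probability 0 are skipped, following the usual convention that they contribute nothing to the sum.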

Information gain
Information gain is calculated from entropy.

Resources:

StackOverflow, "What is entropy and information gain?"
https://stackoverflow.com/questions/1859554/what-is-entropy-and-information-gain

Data Mining, feature selection with information gain
http://dataminingtrend.com/2014/data-mining-techniques/feature-selection-information-gain/

Entropy (Information Theory)
https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/information-entropy
https://en.wikipedia.org/wiki/Entropy_(information_theory)

Entropy and Information gain

In Machine learning, Entropy is used for feature selection through Information gain, which is calculated from entropy.

Information gain

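Information gain (IG) is the entropy before a split minus the entropy after the split.

Information gain > 0: we gain information (the feature is useful).
Information gain < 0: we lose information (the feature is not useful).
Information gain = 0: no information is gained (the feature does not help classify).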

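Example of calculating Entropy and IG

Take 10 samples and split them into 2 groups (child 1 and child 2).

1. Calculate the entropy of the parent, before the split.
2. Calculate the entropy of child 1 and child 2.
3. Weight the entropy of each child by the share of samples that fall in that child.
4. Subtract the weighted child entropy from the parent entropy to get IG:

$$IG = H(\text{parent}) - \sum_{k} \frac{n_k}{n} H(\text{child}_k)$$

In this example, IG = 0.256.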
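The weighted calculation can be sketched in Python as follows. The split used here (5 Yes and 5 No, divided into children of 4 and 6 samples) is a hypothetical one chosen for illustration; it is not the split behind the 0.256 figure, and the function names are assumptions.

```python
import math


def entropy_from_counts(counts):
    """Entropy in bits of a group, given the number of samples in each class."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy_from_counts(child)
                   for child in children_counts)
    return entropy_from_counts(parent_counts) - weighted


# Hypothetical split: 10 samples (5 Yes, 5 No) into child 1 and child 2.
parent = [5, 5]
children = [[3, 1], [2, 4]]
ig = information_gain(parent, children)
print(round(ig, 4))
```

A split that separates the classes perfectly, such as `[[5, 0], [0, 5]]`, gives an information gain equal to the parent entropy, while a split that changes nothing gives 0.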
Real-world examples

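The sample space in the example has only 10 samples. What about a real dataset?

Explanations of entropy and IG are available as text, slides and video.

The idea stays the same on real data: calculate the IG of each feature, then keep the features with high IG to train the model.

Choosing features this way is feature selection by Information Gain. Better features give a better model.

For more on entropy, see the top comment on the StackOverflow question "What is entropy and information gain?"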

Linear Regression 1
Linear Regression is a predictive model for continuous values, such as 1-100 or 0.01-0.99. This is different from a classification model, which predicts discrete values such as 0 or 1.

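Suppose we have data with 2 variables, y (the value to predict) and x (the input), and the relationship between the 2 is a straight line (linear).

If x=15, what is y? To answer, we need a model of the relationship.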

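What linear regression does is find the line that best matches (fit) the data points.

The known data points are called Training data (also training set or examples).

Once the line is fit to the training data, we can use it on data we have not seen (Unknown data/Unseen data). For x=15, the line gives y ~= 12.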


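How do we know which line is the best?

We measure the Error: the distance between the line and the actual points. The smaller the error, the better the line (which should make sense). So the task is to optimize the model until its error is as small as possible.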


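h is the hypothesis, the function that maps x to a predicted y:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

$\theta_0$ and $\theta_1$ are the parameters that we optimize using the training data.

In the cost function, i is the index of a training example and m is the number of training examples. For each example from 1 to m, we take the error between h and the actual y and raise it to the power of 2, then sum over the training data (Sum square error). This is the Cost function, also called the Square Error function, J:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

The smaller J is, the smaller the error of the line, so the goal is to optimize (minimize) J.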
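A small Python sketch can make the cost function tangible. It assumes the common convention of dividing the sum of squared errors by 2m, and it uses a hypothetical training set that lies exactly on the line y = x; the function names are also assumptions.

```python
def hypothesis(theta0, theta1, x):
    """Straight-line hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J: sum of squared errors divided by 2m (assumed convention)."""
    m = len(xs)
    squared_errors = ((hypothesis(theta0, theta1, x) - y) ** 2
                      for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


# Hypothetical training set lying exactly on y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

for theta1 in (1.0, 0.5, 0.0):
    print(theta1, cost(0.0, theta1, xs, ys))
```

The cost is 0 only for the slope that passes through every point and grows as the slope moves away from it.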

Linear Regression 2

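Recap of the concept:

1. Define the hypothesis
2. Represent the error with a cost function
3. Minimize the cost function

The hypothesis is h.

The cost function is the Sum Square Error.

The lower the cost function, the better the model fits the training set.

To see the concept, try different parameter values for h, such as 1, 0.5 and 0, and compute the sum of squares for each. Plotting the results gives the cost function, which has a bowl shape. J is 0 when the line passes through every point of the training set.

To optimize the model, we can either optimize directly with math or optimize with an iterative algorithm. The iterative algorithm used here is Gradient Descent.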

Gradient Descent


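The task in Linear Regression is to optimize the Cost function so that the model fits the training set.

With 1 parameter, the cost function is a bowl shape in 2 dimensions. With 2 parameters, it becomes a bowl-shaped surface in 3 dimensions.

The lowest point of the bowl is the Global minimum. To optimize is to find the parameters at the Global Minimum, and Gradient Descent is one way to get there.

Gradient Descent uses the Partial Derivative of the cost function (diff the cost function with respect to each parameter) to find which direction goes downhill.

The proof is not covered here.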

Gradient Descent Algorithm

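gradient descent is an iterative algorithm:

repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$    (for j = 0 and j = 1)
}

Substituting the cost function of linear regression gives:

repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}

Both parameters must be updated Simultaneously. The symbol := means assign.

The second form follows from the first by the chain rule. Because every step sums over the whole training set, this is called Batch Gradient Descent.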
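The update rule can be sketched in plain Python. This is a minimal sketch under stated assumptions: it runs a fixed number of iterations instead of testing for convergence, it uses a hypothetical training set on the line y = 2x + 1, and the learning rate 0.05 is picked by hand for this data.

```python
def batch_gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Errors for every training example, using the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients come from the old parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical training set on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = batch_gradient_descent(xs, ys)
print(theta0, theta1)
```

Computing both gradients before assigning either parameter is what keeps the update simultaneous; updating theta0 first and then reusing it for theta1 would be a different algorithm.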

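$\alpha$ (alpha) is the Learning rate. It sets how big a step we take along the gradient. If the learning rate is too small, each step along the gradient is tiny and the descent is slow. If it is too large, a step can Overshoot the minimum of the cost function, and the algorithm fails to optimize.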
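The effect of the learning rate can be seen on a toy cost function. The sketch below assumes J(theta) = theta squared, chosen only because its gradient, 2 * theta, is easy to follow; the two alpha values are illustrative.

```python
def descend(alpha, theta=1.0, steps=20):
    """Gradient descent on J(theta) = theta**2, whose gradient is 2 * theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta


small = descend(alpha=0.1)   # each step shrinks theta towards the minimum at 0
large = descend(alpha=1.1)   # each step jumps past 0 and lands further away
print(small, large)
```

With the small learning rate theta moves steadily towards 0; with the large one every step overshoots, so theta grows in size instead of shrinking.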

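The shape of the cost function depends on the model and its parameters.

For some models, the surface we optimize is not a simple bowl shape. It can have several valleys.

The lowest valley over the training set is the global minimum.

A valley that is lower than its surroundings but not the lowest overall is a local minimum. Depending on where we start to optimize, the algorithm may stop at a local minimum and never reach the global minimum, because the gradient there is already zero.

For linear regression, the cost function is always a bowl shape (convex), so it has only one minimum, the global minimum.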

Underfitting VS Overfitting

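Cross-validation was introduced earlier as a way to check a model.

A model can be too simple or too complex for the data it has to handle. In both cases it fails to generalize and cannot predict new data well, and cross-validation is how we find out which kind of model we have.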

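Take a linear regression model as an example. The training data is a set of x,y points.

We fit the model by minimizing the cost function on the training data (train).

After we train the model, the cost function tells us how well the line fits the training data.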




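If the data is too complex for the model to handle, the result is Underfitting. With underfitting, the error is high even on the data used to train.

One way to fix this is to use a hypothesis with terms of degree 2 or 3 in the cost function, so that the model can follow the shape of the data.

In short, an underfitting model does not even fit the training data.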

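A model should also make sense on testing data. A model that is too complex can reach a very low error on the training data by fitting the noise and outlier points, the edge cases of the training data. This is Overfitting.

An overfitting model has a low error on training data but a high error on testing data.

Two ways to reduce overfitting:
1) Use more data to train the model, so that it does not depend on a few points of the training data.
2) Avoid a model that is too custom to the training data, so that it can generalize.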

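Another way to reduce overfitting is to add a Regularization term to the cost function so that the model can generalize (but if the Regularization term is too strong, the model will underfit the training data).

A model that can generalize is able to handle new data, including data with noise.

In short, an overfitting model does well on training data but poorly on testing data.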
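One common form of regularization term is an L2 penalty on the parameters. The text does not say which form is meant, so the sketch below is an assumption: it adds an L2 penalty to a squared-error cost for a polynomial hypothesis, leaves the intercept unpenalised, and uses the hypothetical names `regularized_cost` and `lam`.

```python
def regularized_cost(thetas, xs, ys, lam):
    """Squared-error cost for a polynomial hypothesis plus an L2 penalty.

    thetas[j] multiplies x**j. The intercept thetas[0] is not penalised.
    lam is the regularization strength (assumed name); lam = 0 disables it.
    """
    m = len(xs)

    def h(x):
        return sum(t * x ** j for j, t in enumerate(thetas))

    fit = sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
    penalty = lam * sum(t ** 2 for t in thetas[1:]) / (2 * m)
    return fit + penalty


# Hypothetical training set on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
print(regularized_cost([0.0, 1.0], xs, ys, lam=0.0))
print(regularized_cost([0.0, 1.0], xs, ys, lam=6.0))
```

The penalty makes large parameters expensive, which pushes the optimizer towards simpler curves. A very large `lam` would make even a well-fitting line look costly, which is the underfitting risk mentioned above.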
Where to start with Machine learning?


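Many people start by reading a document or two about machine learning concepts, then download a machine learning library and run its sample code by following the document.

This works in any programming language that has a machine learning library. But after following the document of a library, questions come up:

Which feature should we train on?
How do we train and optimize?
How do we read the math in a paper?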


Topics worth knowing:

linear algebra, discrete math, calculus: matrix, vector, probability, set, logic, combination, graph, partial derivative etc.
probability, conditional probability, normal distribution etc.
cross-validation, confusion matrix, error, precision, recall, F1 score etc.
information theory


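With the math in place, the steps become clear:

1. Define the hypothesis
2. Define the cost function
3. Optimize the cost function so that the model fits
4. Train
5. Tune parameters with cross-validation
6. Check whether the model is under fit or over fit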

Machine learning is more than programming. Once the concepts behind machine learning are clear, it becomes easier to connect the dots.
