Chapter 4: Machine Learning


Can Machines Learn?
 Learning? ~ to improve automatically with experience.
 We do not yet know how to make computers learn nearly as well as people learn ~ machines and humans are two different things.
 "Concept" ~ humans learn from their experience (trial and error, or being guided, like an infant or a student).
 Example: a baby keeps attempting to walk after falling down several times. Pain, and learning how to balance, are the best guidance.
 Problem ~ how can a machine learn these? Do we need to fit sensory devices to detect pain? How would we represent pain ~ as an "electronic pulse" of pain?
Machine Learning
 Machine learning ~ draws on concepts from statistics, artificial intelligence, philosophy, information theory, biology, cognitive science, computational complexity and control theory (and many more!).
 How does a machine learn? A computer is said to learn from experience E with respect to tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
 Need a well-defined learning problem, based on three features: the class of tasks, the measure of performance and the source of experience.
Well-Posed Learning Problems
 A checkers learning problem:
 Task T: playing checkers.
 Performance measure P: percent of games won against opponents.
 Training experience E: playing practice games against itself.
 A handwriting recognition problem:
 Task T: recognizing and classifying handwritten words within images.
 Performance measure P: percent of words correctly classified.
 Training experience E: a database of handwritten word images.
General Machine Learning Model

[Figure: a database of training experience feeds a machine learning algorithm; the algorithm's performance is evaluated, its learning parameters are adjusted based on the performance results, and the trained model is then applied to new problems.]
Designing a Learning System
• Basic four steps to design a learning system:

• Determine the type of training experience ~ "what is the best training approach?"
• Determine the target function ~ "what is to be solved, and how can the performance be evaluated?"
• Determine the representation of the learned function ~ "how to represent the learning process?"
• Determine the learning algorithm ~ "how will the learning process take place?"
Types of Machine Learning
•Basic machine learning methods:
•Concept learning
•Decision tree learning
•Supervised and unsupervised learning
•Statistical learning & computational learning theory
•Instance based learning
•Explanation based learning
•Evolutionary learning
•Reinforcement learning
1) Decision Tree Learning
 One of the most widely used and practical methods of classification.
 Approximates discrete-valued functions ~ using "divide and conquer".
 Concept: node → "tests an attribute"; branch → "a possible value of the attribute".
 Classification method → an instance is sorted down the tree from the root to a particular leaf node.
 Decision tree algorithms: ID3, ASSISTANT, C4.5, C5.0, CART, SPRINT.
Decision Tree Learning
[Figure: an example decision tree. The root node tests Gender; the Male and Female branches each lead to a Height test node whose branches (thresholds such as <1.3m, <1.5m, >1.8m, >2.0m) end in leaf nodes labelled Short, Medium or Tall.]
Example: Decision Tree Learning

ID | Refund | Marital  | Tax | Cheat
1  | Yes    | Single   | 125 | No
2  | No     | Married  | 100 | No
3  | No     | Single   | 70  | No
4  | Yes    | Married  | 120 | No
5  | No     | Divorced | 95  | Yes

The learned tree:

REFUND = Yes → No
REFUND = No → test MARITAL:
    MARITAL = Married → No
    MARITAL = Single or Divorced → test TAX:
        TAX < 80 → No
        TAX > 80 → Yes
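As an illustration (an addition, not part of the original slides), here is a minimal sketch that fits the table above using scikit-learn's DecisionTreeClassifier. The one-hot encoding of the categorical attributes is an assumption of this sketch, and with only five rows the induced tree may come out simpler than the one drawn above.

```python
# A minimal sketch: learning a tree from the Refund/Marital/Tax table above.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Refund":  ["Yes", "No", "No", "Yes", "No"],
    "Marital": ["Single", "Married", "Single", "Married", "Divorced"],
    "Tax":     [125, 100, 70, 120, 95],
    "Cheat":   ["No", "No", "No", "No", "Yes"],
})

# One-hot encode the categorical attributes; Tax is already numeric.
X = pd.get_dummies(data[["Refund", "Marital", "Tax"]])
y = data["Cheat"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```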
Why Decision Tree ?
•Advantages:
•Easy to use and efficient.
•Tree structures are easy to interpret and understand.
•Direct representation of the learned rules.
•Disadvantages:
•Do not easily handle continuous data.
•Difficult to handle missing data.
•Correlations between attributes are ignored by the decision tree process.
•Subtrees may be replicated, making the tree unnecessarily large.
2) Instance Based Learning
•Instance-based learning ~ a straightforward approach to approximating the target function.
•Basic idea: when a new query instance is encountered, a set of similar related instances is retrieved from memory and used to classify the new instance.
•Sometimes referred to as a "lazy learner" ~ the learning process takes place only when a new instance must be classified.
•Main concept ~ find the nearest existing example that might be similar to the new one!
•Common methods ~ K-Nearest Neighbor and Case-Based Reasoning.
K-Nearest Neighbor
•A "lazy learner" method: it requires comparison with the training set and is primarily based on the "nearest" distance.
•Calculate the similarity of two points (the new data and the training data).
•The training point with the lowest distance is voted as the neighbor of the new data, assigning it to the respective class.
•Assumption: if the nearest neighbor's class is A, then the class of the new data is also A.

[Figure: a new point lying between two clusters, GROUP A and GROUP B ~ "Group A or B?"]
K-Nearest Neighbor (cont)

•Assumption: for n-dimensional Euclidean space, the distance between two points, x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn), is defined through:

Euclidean distance: $E_d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

Manhattan distance: $M_d = \sum_{i=1}^{n} |x_i - y_i|$

Minkowski distance: $Mink_d = \left( \sum_{i=1}^{n} |x_i - y_i|^q \right)^{1/q}$
K-Nearest Neighbor (cont)
Attributes | X1 | X2 | X3 | CLASS
A          | 5  | 1  | 3  | GOOD
B          | 3  | 1  | 3  | GOOD
C          | 4  | 1  | 5  | BAD

•Given a new set of data, D = (2, 1, 3), find the possible class for D using KNN (K = 1).

$d(D,A) = \sqrt{(2-5)^2 + (1-1)^2 + (3-3)^2} = 3$
$d(D,B) = \sqrt{(2-3)^2 + (1-1)^2 + (3-3)^2} = 1$
$d(D,C) = \sqrt{(2-4)^2 + (1-1)^2 + (3-5)^2} = 2.83$

•Based on the calculation, the distance d(D,B) is the minimum. Since K = 1, only one neighbor is involved, therefore dataset D is classified as GOOD (based on dataset B).
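The same classification can be reproduced in a few lines. This sketch (an addition, not from the slides) uses NumPy to compute the Euclidean distances and pick the single nearest neighbour.

```python
# A minimal sketch: 1-NN classification of D = (2, 1, 3) against A, B, C.
import numpy as np

train = np.array([[5, 1, 3],   # A
                  [3, 1, 3],   # B
                  [4, 1, 5]])  # C
labels = ["GOOD", "GOOD", "BAD"]
D = np.array([2, 1, 3])

dist = np.sqrt(((train - D) ** 2).sum(axis=1))  # [3.0, 1.0, 2.83]
print(labels[int(dist.argmin())])               # GOOD (nearest is B)
```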
K-Nearest Neighbor (cont)
•Advantages:
•Easy to program.
•No optimization / training is required.
•Incremental learning (information is retained).
•Robust to noisy data (only the nearest data are involved).
•Disadvantages:
•Exhaustive learning (more data, more memory).
•"Curse of dimensionality" → what happens if the dimension is too big, or infinite?
Case Based Reasoning
Uses various techniques to match a situation or a problem description with the most similar stored cases → similarity assessment.

•Definition (Schank and Abelson, 1977): a "technique to solve new problems by adapting solutions that were used to solve old problems".

•Refers to both a cognitive and a computational model of reasoning by analogy.

•Basic idea → many problems are not unique, but rather variations of a problem type.
Case Based Reasoning (cont)
•All cases are independent from each other → each case describes one particular situation.

•Widely implemented in legal, medical and diagnostic domains.

•Example of a case (case study: paddy disease):

•CASE 1:
Leaf color → green with yellow stripes
Stalk color → green
Spot → yes
Spot condition → stripes
Panicle → yes
DISEASE = bacterial leaf blight
Case Based Reasoning (cont)
[Figure: the CBR cycle. A new problem case (1) RETRIEVEs the most similar cases from the stored cases; the retrieved case is (2) REUSEd to produce a suggested solution; the suggested solution is (3) REVISEd (tested and repaired) into a confirmed solution; and the solved case is (4) RETAINed by adding it to the stored cases as a learned case.]
Case Based Reasoning (cont)
CASE F12:
Leaf color → green
Stalk color → green
Spot → yes
Spot condition → stripes
Panicle → yes
Disease: Bacterial Leaf Streak

CASE B3:
Leaf color → yellowish
Stalk color → green
Spot → no
Spot condition → no
Panicle → no
Disease: Bakanae

NEW CASE:
Leaf color → green
Stalk color → green
Spot → no
Spot condition → stripes
Panicle → yes
Disease: ?

Compare similarities (local): the new case is almost similar to case F12 → possibly Bacterial Leaf Streak disease.
Case Based Reasoning (cont)
•Advantages:
•Easy to represent (by case representation).
•Incremental learning (reuse, retention and adaptation process).
•Capable of handling missing values.
•Disadvantages:
•Exhaustive learning (more cases, more memory).
•Cases should be updated regularly.
•Complex cases are sometimes hard to represent.
3) Supervised Learning
Supervised Learning
Essential ingredient: the availability of an external indicator ("teacher") → the teacher provides the desired or target response for a particular training vector.

[Figure: block diagram of supervised learning. The environment supplies a vector describing its state to both the teacher and the learning system; the teacher produces the desired response, the learning system produces the actual response, and their difference (desired − actual) forms the error signal fed back to the learning system.]
Supervised Learning
Example: Multilayer Perceptron Neural Networks
•Inspired by the observation that biological learning systems are built from very complex networks of interconnected neurons.
•Learning algorithm: error-correction learning (error signal), where d_k(n) is the desired response and y_k(n) is the actual response:

$e_k(n) = d_k(n) - y_k(n)$

•Aim: to minimize the cost function:

$\mathcal{E}(n) = \frac{1}{2} \sum_k e_k^2(n)$

•Minimized by the gradient descent method.
MLP Neural Networks
[Figure: an MLP with an input layer, a hidden layer and an output layer; weighted connections link the input nodes to the hidden nodes, and the hidden nodes to the output nodes.]
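To show the error-correction rule in action, here is a sketch of a single gradient-descent step for one linear output neuron (a deliberate simplification of a full MLP, added here; the input, target and learning rate are made-up values).

```python
# A minimal sketch: one error-correction step, e_k(n) = d_k(n) - y_k(n),
# followed by a gradient-descent update on E(n) = 0.5 * e**2.
import numpy as np

w = np.array([0.1, -0.2, 0.05])  # weights (arbitrary start)
x = np.array([0.5, -1.0, 2.0])   # input vector
d = 1.0                          # desired response
eta = 0.1                        # learning rate

y = w @ x                        # actual response
e = d - y                        # error signal
w = w + eta * e * x              # dE/dw = -e*x, so the step is +eta*e*x
print(e, w)
```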
3) Unsupervised Learning
Unsupervised Learning
Essential ingredient: no external "teacher" oversees the learning process (no specific examples of the function to be learned by the network).
•A sequence of input vectors is provided, but NO target vectors.
•Basically, similar groups of data will be clustered together (self-organized learning) ~ "winner takes all" strategies ~ clustering.
•Examples: Kohonen Self Organizing Map (SOM), Adaptive Resonance Theory (ART).
Unsupervised Learning
Example: Kohonen Self Organizing Map
•Also known as a "topology preserving map".
•The weight vector of a cluster unit serves as an exemplar of the input patterns associated with that cluster.
•Basically ~ the cluster unit whose weight vector matches the input pattern most closely is chosen as the winner.
•Euclidean distance ~ the unit with the minimum distance is the winner:

$E_d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
Kohonen SOM

[Figure: a Kohonen SOM. Input nodes in the input layer are fully connected by weights to the cluster units in the output layer; each output unit represents one cluster.]
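A sketch of the winner-takes-all step (an addition to the slides; the network size, input and learning rate are arbitrary, and the neighbourhood update around the winner is omitted for brevity).

```python
# A minimal sketch: SOM winner selection and weight update.
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((3, 2))      # one weight vector per cluster unit (3 units)
x = np.array([0.2, 0.7])    # input pattern
eta = 0.5                   # learning rate

# The winner is the unit whose weight vector is closest to the input.
winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
W[winner] += eta * (x - W[winner])  # move the winner toward the input
print(winner, W[winner])
```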
4) Reinforcement Learning
Reinforcement Learning
•Addresses the question of how an autonomous agent that senses and acts in its environment can learn to choose optimal actions → to achieve its goals.
•Concept ~ each time the agent performs an action in its environment, a reward or penalty is given (based on the desirability of the resulting state).
•Task ~ the agent must learn which actions gain the most reward (reinforcement signal) ~ a strengthened signal or reward indicates satisfactory actions.
•Learning algorithms: Q learning, adaptive heuristic critic and temporal-difference methods.
Reinforcement Learning
Agent: Reinforcement Learning

[Figure: the agent-environment loop. The agent observes state s and reward r from the environment and responds with action a; the interaction produces a sequence s0 →(a0) s1 →(a1) s2 ... with rewards r0, r1, r2, ...]

•Aim: to maximize $r_0 + \gamma r_1 + \gamma^2 r_2 + \ldots$, where $0 \le \gamma < 1$.
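Finally, a toy tabular Q-learning sketch (an addition; the two-state environment, rewards and hyperparameters are invented for illustration) showing how the agent learns which action gains the most discounted reward.

```python
# A minimal sketch: tabular Q-learning on a made-up 2-state environment
# where action 1 taken in state 1 yields reward 1, everything else 0.
import numpy as np

Q = np.zeros((2, 2))        # Q[state, action]
alpha, gamma = 0.5, 0.9
rng = np.random.default_rng(2)

def step(s, a):
    return a, (1.0 if (s, a) == (1, 1) else 0.0)  # next state, reward

s = 0
for _ in range(500):
    a = int(rng.integers(2))                     # random exploration
    s2, r = step(s, a)
    # Move Q(s, a) toward r + gamma * max_a' Q(s2, a').
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2
print(Q)  # Q[1, 1] ends up largest
```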
Machine Learning Applications
•Pattern recognition ~ recognizing handwriting, signatures, biometrics, textures, stock-exchange patterns, signals, or even human emotions.
•Control ~ autonomous robots, self-guided underwater rovers, manufacturing, autonomous vehicles, washing machines, smart homes.
•Medical applications ~ automated heart attack detection, cancer cell analysis, disease diagnosis, outbreak analysis.
•Gaming ~ chess-playing programs, soccer-playing games, etc.