
Setting Up an ML Problem
Rao Vemuri
UC Davis
rvemuri@gmail.com

Definition of ML
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at T, as measured by P, improves with experience E.

Example:
T: Playing chess
P: Percentage of games won
E: Number of games played

Three Different Tasks

Quantifying P
Percentage of games won
A score f(won, lost, draw), e.g.
Win = 1 point
Loss = -1 point
Draw = 0 points
Amount of money won
Championship titles won
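As a sketch, the first two quantifications of P can be computed from a game record. The function names and the example counts are hypothetical; the point values (win = 1, loss = -1, draw = 0) are the ones listed above.

```python
def percentage_won(won: int, lost: int, draw: int) -> float:
    """P as the percentage of games won (draws count as games played)."""
    played = won + lost + draw
    if played == 0:
        raise ValueError("no games played yet")
    return 100.0 * won / played


def points_score(won: int, lost: int, draw: int) -> int:
    """P as f(won, lost, draw) with win = 1, loss = -1, draw = 0 points."""
    return 1 * won + (-1) * lost + 0 * draw


# Hypothetical record: 6 wins, 3 losses, 1 draw over 10 games.
print(percentage_won(6, 3, 1))  # 60.0
print(points_score(6, 3, 1))    # 3
```

In the sense of the definition above, the program is learning if either number tends to rise as experience E (games played) grows.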

Capturing Experience E:
A Data Set

#   Outlook    Temperature   Humidity   Windy   Surfing
1   Sunny      Mild          Normal     True    Yes
2   Sunny      Hot           High       False   No
3   Rainy      Mild          Normal     False   No
4   Overcast   Cool          High       True    Yes
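One way to hold this experience in a program is a list of rows whose last column is the label. The names `ATTRIBUTES`, `DATA`, `X` and `y` are our own choices, not part of the slides; Windy is stored as a Python bool.

```python
# Attribute (feature) names, in column order.
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]

# The four instances from the table; the last field is the Surfing label.
DATA = [
    ("Sunny",    "Mild", "Normal", True,  "Yes"),
    ("Sunny",    "Hot",  "High",   False, "No"),
    ("Rainy",    "Mild", "Normal", False, "No"),
    ("Overcast", "Cool", "High",   True,  "Yes"),
]

# Split each row into its feature values x and its label y.
X = [row[:-1] for row in DATA]
y = [row[-1] for row in DATA]

print(dict(zip(ATTRIBUTES, X[0])))
# {'Outlook': 'Sunny', 'Temperature': 'Mild', 'Humidity': 'Normal', 'Windy': True}
print(y)  # ['Yes', 'No', 'No', 'Yes']
```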

Attributes or Features
In the data set above, the columns Outlook, Temperature, Humidity and Windy are the attributes (features) of each instance; the last column, Surfing?, is the label.

Features: Medical Example
#   Blood Pres.   Cholesterol   Glucose   …      Heart Attack?
1   normal        high          high      True   Yes


Notation: Instance
Each row of the data set, e.g. row 1 (Sunny, Mild, Normal, True → Yes), is one instance.

Note on Features
Each instance is described by the same set of features.
The features may be:
continuous (e.g. Temperature)
discrete (e.g. Cost in $)
binary (e.g. True/False)
categorical (e.g. Red/Blue/Yellow)
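Because a learner usually works on numbers, each feature type tends to be encoded differently. The sketch below assumes a hypothetical instance with one feature of each type from the list (temperature, cost, windy, color); one-hot encoding for the categorical feature is one common choice, not the only one.

```python
from typing import Dict, List, Union

Value = Union[float, int, bool, str]

# Hypothetical schema: one feature of each type named above.
SCHEMA = {
    "temperature": "continuous",  # any real number
    "cost": "discrete",           # whole dollars
    "windy": "binary",            # True/False
    "color": "categorical",       # one of a fixed set of names
}
CATEGORIES = {"color": ["Red", "Blue", "Yellow"]}


def encode(instance: Dict[str, Value]) -> List[float]:
    """Turn one instance into a numeric vector, feature by feature."""
    vector: List[float] = []
    for name, kind in SCHEMA.items():
        value = instance[name]
        if kind in ("continuous", "discrete"):
            vector.append(float(value))
        elif kind == "binary":
            vector.append(1.0 if value else 0.0)
        elif kind == "categorical":
            # One-hot: one 0/1 slot per allowed category.
            vector.extend(1.0 if value == c else 0.0 for c in CATEGORIES[name])
        else:
            raise ValueError(f"unknown feature type {kind!r}")
    return vector


print(encode({"temperature": 21.5, "cost": 12, "windy": True, "color": "Blue"}))
# [21.5, 12.0, 1.0, 0.0, 1.0, 0.0]
```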

Notation: Data Set
The whole collection of instances (rows 1 to 4 of the table above) is the data set.

Training & Test Sets
Training Set: E (Experience)
The last column has labels like YES or NO.
These labels are either given or inserted by an expert.

Test Set: t
The last column has no labels.
Our job is to find those labels.
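To make "find those labels" concrete, here is a minimal sketch that gives each test instance the label of its most similar training instance (1-nearest-neighbour, counting mismatched attributes). The technique and the two test rows are our own illustrative choices; the slides do not prescribe an algorithm.

```python
# Training set E: labeled rows (Outlook, Temperature, Humidity, Windy) -> Surfing.
TRAIN = [
    (("Sunny", "Mild", "Normal", True), "Yes"),
    (("Sunny", "Hot", "High", False), "No"),
    (("Rainy", "Mild", "Normal", False), "No"),
    (("Overcast", "Cool", "High", True), "Yes"),
]

# Test set t: hypothetical unlabeled rows whose labels we must find.
TEST = [
    ("Sunny", "Cool", "Normal", True),
    ("Rainy", "Hot", "High", False),
]


def hamming(a, b):
    """Number of attributes on which two instances disagree."""
    return sum(u != v for u, v in zip(a, b))


def predict_1nn(x):
    """Copy the label of the closest training instance (ties: first one wins)."""
    _, label = min(TRAIN, key=lambda pair: hamming(pair[0], x))
    return label


for x in TEST:
    print(x, "->", predict_1nn(x))  # Yes for the first row, No for the second
```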

Hypothesis
A combination of attribute values, together with our guess of what the label should be for that combination.
For example, (Outlook = don't care) ∧ (Temperature = Cool) ∧ (Humidity = Normal) ∧ (Windy = True) is one possible hypothesis.
For this hypothesis, our machine should answer either YES or NO.
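A conjunctive hypothesis of this kind can be checked mechanically: an instance gets YES if it satisfies every condition and NO otherwise. The sketch encodes "don't care" as `?` (our convention) and tests the slide's hypothesis against the four training instances, plus a second hypothesis we made up for comparison.

```python
ANY = "?"  # "don't care": matches every value of that attribute

DATA = [
    (("Sunny", "Mild", "Normal", True), "Yes"),
    (("Sunny", "Hot", "High", False), "No"),
    (("Rainy", "Mild", "Normal", False), "No"),
    (("Overcast", "Cool", "High", True), "Yes"),
]


def matches(h, x):
    """True if instance x satisfies every condition of conjunctive hypothesis h."""
    return all(c == ANY or c == v for c, v in zip(h, x))


def classify(h, x):
    return "Yes" if matches(h, x) else "No"


def consistent(h, data):
    """True if h reproduces every label in the training data."""
    return all(classify(h, x) == label for x, label in data)


h_slide = (ANY, "Cool", "Normal", True)  # the hypothesis from the slide
h_windy = (ANY, ANY, ANY, True)          # an alternative: "surf when windy"

print([classify(h_slide, x) for x, _ in DATA])  # ['No', 'No', 'No', 'No']
print(consistent(h_slide, DATA))  # False
print(consistent(h_windy, DATA))  # True
```

Neither hypothesis is "the" answer: the slide's hypothesis happens to match none of the four rows, while `h_windy` agrees with all of them.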

Types of ML Algorithms
Supervised: You are given labeled training data. Create a function that fits the data.
Classification (looking for discrete categories)
Regression (looking for a continuous function)

Unsupervised: You are given unlabeled training data. Discover unknown, but useful, classes.
Reinforcement: The learner is not told which actions to take. It discovers which actions yield the best reward in the long run.
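To contrast supervised and unsupervised learning, here is a sketch of a supervised regression (least-squares line fit on labeled pairs) next to an unsupervised one (two-cluster k-means on unlabeled values). Both data sets are made up for illustration; reinforcement learning needs an environment loop and is not shown.

```python
from statistics import mean


def fit_line(xs, ys):
    """Supervised regression: least-squares slope a and intercept b for y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means, k = 2).

    Assumes both clusters stay non-empty (mean() of an empty list raises).
    """
    c1, c2 = min(values), max(values)  # start the centres at the extremes
    for _ in range(iterations):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = mean(g1), mean(g2)
    return sorted(g1), sorted(g2)


# Hypothetical labeled pairs (x, y) lying exactly on y = 2x + 1.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)

# Hypothetical unlabeled values with two obvious groups.
print(two_means([1.0, 1.5, 2.0, 9.0, 9.5, 10.0]))
# ([1.0, 1.5, 2.0], [9.0, 9.5, 10.0])
```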

WEKA
W(aikato) E(nvironment) for K(nowledge) A(nalysis)
Developed by the University of Waikato in New Zealand
Machine learning tools and techniques in Java
Comprehensive suite of Java class libraries
Implements many state-of-the-art machine learning and data mining algorithms

http://www.cs.waikato.ac.nz/~ml/index.html
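WEKA's tools commonly read data in its ARFF text format: a header declaring each attribute, then an `@data` section of comma-separated rows. As a sketch, the surfing table can be written out like this; the relation name `surfing` and the lower-case attribute names are our own choices, and Windy is declared as a nominal TRUE/FALSE attribute.

```python
ATTRIBUTES = [
    ("outlook", ["Sunny", "Overcast", "Rainy"]),
    ("temperature", ["Hot", "Mild", "Cool"]),
    ("humidity", ["High", "Normal"]),
    ("windy", ["TRUE", "FALSE"]),
    ("surfing", ["Yes", "No"]),  # label column, kept last
]

ROWS = [
    ["Sunny", "Mild", "Normal", "TRUE", "Yes"],
    ["Sunny", "Hot", "High", "FALSE", "No"],
    ["Rainy", "Mild", "Normal", "FALSE", "No"],
    ["Overcast", "Cool", "High", "TRUE", "Yes"],
]


def to_arff(relation, attributes, rows):
    """Render a table of nominal attributes as ARFF text."""
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        # Nominal attributes list their allowed values in braces.
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


print(to_arff("surfing", ATTRIBUTES, ROWS))
```

Saved under a name such as `surfing.arff` (hypothetical), the file can be opened from the Explorer's Preprocess tab.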

WEKA consists of

Explorer
Experimenter
Knowledge Flow
Simple Command Line Interface
Java Interface

Explorer
WEKA's main graphical user interface
The WEKA package consists of
Filters
Classifiers
Clusterers
Associations
Attribute Selection
Visualization tool

Pre-Processing
Data can be loaded from a URL or a database (DB)
Preprocessing routines in WEKA are called filters
MergeAttributeValuesFilter
NominalToBinaryFilter
DiscretiseFilter
ReplaceMissingValuesFilter
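The idea behind two of these filters can be sketched in a few lines of plain Python. This is not WEKA's code: `replace_missing` follows the mean (numeric) / mode (nominal) strategy for filling gaps, and `equal_width_bins` shows equal-width discretisation; the function names and example columns are hypothetical.

```python
from statistics import mean, mode
from typing import List, Optional


def replace_missing(column: List[Optional[object]], numeric: bool) -> List[object]:
    """Fill None entries with the column mean (numeric) or most frequent value (nominal)."""
    present = [v for v in column if v is not None]
    fill = mean(present) if numeric else mode(present)
    return [fill if v is None else v for v in column]


def equal_width_bins(column: List[float], bins: int) -> List[int]:
    """Map each numeric value to a bin index 0..bins-1; all bins have equal width."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins
    if width == 0:
        return [0] * len(column)
    # min(..., bins - 1) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in column]


# Hypothetical columns; missing entries are marked as None.
print(replace_missing(["Sunny", None, "Sunny", "Rainy"], numeric=False))
# ['Sunny', 'Sunny', 'Sunny', 'Rainy']
print(replace_missing([20.0, None, 30.0], numeric=True))  # [20.0, 25.0, 30.0]
print(equal_width_bins([10.0, 12.0, 15.0, 19.0, 20.0], bins=2))  # [0, 0, 1, 1, 1]
```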

Homework Assignment 1
Search for WEKA on the Web and write
(a) 4 short sentences about what the best features
of WEKA are.
(b) One sentence on where WEKA is useful.

Assignment Due: A week from today (Sep 13).


Write your answers in English and submit on
one sheet of paper.
