
Machine Learning: Introduction
Why “Learn”?
◼ Machine learning is programming computers to optimize a performance criterion using example data or past experience.
◼ There is no need to “learn” to calculate payroll.
◼ Learning is used when:
 Human expertise does not exist (navigating on Mars)
 Humans are unable to explain their expertise (speech recognition)
 The solution changes over time (routing on a computer network)
 The solution needs to be adapted to particular cases (user biometrics)

What We Talk About When We Talk About “Learning”
◼ Learning general models from data of particular examples
◼ Data is cheap and abundant (data warehouses, data
marts); knowledge is expensive and scarce.
◼ Example in retail: from customer transactions to consumer behavior:
 People who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com)
◼ Build a model that is a good and useful
approximation to the data.

Data Mining/KDD
Definition := “KDD is the non-trivial process of
identifying valid, novel, potentially useful, and
ultimately understandable patterns in data” (Fayyad)
Applications:
◼ Retail: Market basket analysis, Customer relationship management (CRM)
◼ Finance: Credit scoring, fraud detection
◼ Manufacturing: Optimization, troubleshooting
◼ Medicine: Medical diagnosis
◼ Telecommunications: Quality of service optimization
◼ Bioinformatics: Motifs, alignment
◼ Web mining: Search engines
◼ ...
Knowledge Discovery (KDD) Process
 Data mining is the core of the knowledge discovery process
[Diagram: the KDD pipeline, from raw data to patterns:
Databases → Data Integration → Data Cleaning → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
What is Machine Learning?
◼ Machine Learning
 Study of algorithms that
 improve their performance
 at some task
 with experience
◼ Optimize a performance criterion using example
data or past experience.
◼ Role of Statistics: Inference from a sample
◼ Role of Computer Science: Efficient algorithms to
 Solve the optimization problem
 Represent and evaluate the model for inference
Growth of Machine Learning
◼ Machine learning is the preferred approach to
 Speech recognition, Natural language processing
 Computer vision
 Medical outcomes analysis
 Robot control
 Computational biology
◼ This trend is accelerating
 Improved machine learning algorithms
 Improved data capture, networking, faster computers
 Software too complex to write by hand
 New sensors / IO devices
 Demand for self-customization to user, environment
 It turns out to be difficult to extract knowledge from human experts → the failure of expert systems in the 1980s
Applications
◼ Association Analysis
◼ Supervised Learning
 Classification
 Regression/Prediction
◼ Unsupervised Learning
◼ Semi-supervised Learning
◼ Reinforcement Learning

Learning Associations
◼ Basket analysis:
 P(Y | X): the probability that somebody who buys X also buys Y, where X and Y are products/services.
 Example: P(chips | beer) = 0.7

Market-basket transactions:
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
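
A minimal sketch (plain Python; an illustration we add, not part of the slides) of estimating P(Y | X) as rule confidence from the transactions above:

    # Transactions from the market-basket table above
    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]

    def confidence(x, y):
        """Estimate P(y | x) = count(baskets with x and y) / count(baskets with x)."""
        with_x = [t for t in transactions if x in t]
        return sum(1 for t in with_x if y in t) / len(with_x)

    print(confidence("Beer", "Diaper"))  # all 3 Beer baskets contain Diaper -> 1.0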
Classification: Definition
◼ Given a collection of records (training set)
 Each record contains a set of attributes; one of the attributes is the class.
◼ Find a model for the class attribute as a function of the values of the other attributes.
◼ Goal: previously unseen records should be assigned a class as accurately as possible.
 A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
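
A minimal sketch of this train/test methodology using scikit-learn (our choice of library; the iris data stands in for any labeled record set):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    # Divide the labeled data: the training set builds the model,
    # the held-out test set estimates accuracy on unseen records.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    model = DecisionTreeClassifier().fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))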
Classification
◼ Example: Credit scoring
◼ Differentiating between low-risk and high-risk customers from their income and savings
[Figure: customers plotted by income and savings; a rectangular region separates low-risk from high-risk]

Discriminant (the model): IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
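
The discriminant above translates directly into code; a toy sketch (the threshold values θ1 and θ2 are made up):

    THETA1 = 30_000  # θ1: income threshold (hypothetical value)
    THETA2 = 10_000  # θ2: savings threshold (hypothetical value)

    def credit_risk(income, savings):
        """IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk."""
        if income > THETA1 and savings > THETA2:
            return "low-risk"
        return "high-risk"

    print(credit_risk(45_000, 12_000))  # low-risk
    print(credit_risk(45_000, 5_000))   # high-risk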
Examples of Classification Task
◼ Predicting tumor cells as benign or malignant
◼ Classifying credit card transactions as legitimate or fraudulent
◼ Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
◼ Categorizing news stories as finance, weather, entertainment, sports, etc.
Illustrating a Basic Classification Task
◼ Instances are described as a set of features or attributes and their values
◼ The class that the instance belongs to is also called its “label”
◼ Input is a set of “labeled instances”

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

[Diagram: the Training Set feeds a learning algorithm (Induction) that learns a Model; the Model is then applied to the Test Set (Deduction) to predict the missing labels.]
Example of a Decision Tree

Training Data:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)
Refund?
├─ Yes → NO
└─ No → MarSt?
     ├─ Married → NO
     └─ Single, Divorced → TaxInc?
          ├─ < 80K → NO
          └─ > 80K → YES
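
This tree can be transcribed directly as nested conditionals; a sketch (attribute values follow the table above):

    def cheat(refund, marital_status, taxable_income):
        """The decision tree above, written as if/else rules."""
        if refund == "Yes":
            return "No"
        if marital_status == "Married":
            return "No"
        # Single or Divorced: split on taxable income
        return "No" if taxable_income < 80_000 else "Yes"

    print(cheat("No", "Single", 90_000))  # Yes (matches Tid 10)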


Another Example of a Decision Tree

A different tree that fits the same training data:
MarSt?
├─ Married → NO
└─ Single, Divorced → Refund?
     ├─ Yes → NO
     └─ No → TaxInc?
          ├─ < 80K → NO
          └─ > 80K → YES

◼ There could be more than one tree that fits the same data!


Decision Tree Classification Task
The same workflow as the basic classification task above, with a tree learner: the Training Set (Tids 1-10) is fed to a Tree Induction algorithm (Induction) to learn a Model, here a Decision Tree; the Model is then applied to the Test Set (Tids 11-15) to deduce their class labels.
Apply Model to Test Data

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Walking the tree: Refund is No → take the No branch to MarSt; MarSt is Married → leaf NO.
→ Assign Cheat to “No”
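
Tracing the same record through the cheat() sketch from above reproduces this deduction:

    print(cheat("No", "Married", 80_000))  # -> "No"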
Types of Modelers/Models
[Diagram: a Modeler learns a Model (classifier) from training instances; given a new instance, the classifier outputs a class.]
◼ Logistic regression
◼ Naïve Bayes classifiers
◼ Support vector machines (SVMs)
◼ Decision trees
◼ Random forests
◼ Kernel methods
◼ Genetic algorithms
◼ Neural networks
What Modeler to Choose?
◼ Logistic regression
◼ Naïve Bayes classifiers
◼ Support vector machines (SVMs)
◼ Decision trees
◼ Random forests
◼ Kernel methods
◼ Genetic algorithms (GAs)
◼ Neural networks: perceptrons
◼ Data scientists try different modelers, with different parameters, and check the accuracy to figure out which one works best for the data at hand (see the sketch below)
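
A minimal sketch of this try-and-compare workflow, using scikit-learn with cross-validated accuracy (the tooling and dataset are our choices):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    modelers = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "naive Bayes": GaussianNB(),
        "SVM": SVC(),
        "decision tree": DecisionTreeClassifier(),
        "random forest": RandomForestClassifier(),
    }
    for name, model in modelers.items():
        scores = cross_val_score(model, X, y, cv=5)  # 5-fold accuracy
        print(f"{name}: mean accuracy {scores.mean():.3f}")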
Ensembles
◼ An ensemble method uses several algorithms that do the same task, and combines their results
 “Ensemble learning”
◼ A combination function joins the results
 Majority vote: each algorithm gets a vote
 Weighted voting: each algorithm’s vote has a weight
 Other complex combination functions
[Diagram: Modelers A, B, and C each learn a model from the instances; a combination function merges the three models’ outputs into a final model.]
http://magizbox.com/index.php/machine-learning/ds-model-building/ensemble/
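
A minimal majority-vote sketch in plain Python (the three toy classifiers and their thresholds are invented for illustration):

    from collections import Counter

    def majority_vote(models, instance):
        """Combination function: each model gets one unweighted vote."""
        votes = [m(instance) for m in models]
        return Counter(votes).most_common(1)[0][0]

    # Three stand-in classifiers over a single numeric feature
    models = [
        lambda x: "Yes" if x > 80 else "No",
        lambda x: "Yes" if x > 90 else "No",
        lambda x: "No",
    ]
    print(majority_vote(models, 95))  # two of three vote "Yes" -> "Yes"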
Evaluating a Classifier
Classification Accuracy
◼ Accuracy: percentage of correct classifications

  Accuracy = (total test instances classified correctly) / (total number of test instances)

◼ Many more metrics exist; we will study them later.
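
The formula above in code, as a small sketch:

    def accuracy(y_true, y_pred):
        """Total test instances classified correctly / total number of test instances."""
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        return correct / len(y_true)

    print(accuracy(["No", "Yes", "No", "Yes"],
                   ["No", "Yes", "Yes", "Yes"]))  # 3 of 4 correct -> 0.75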
Prediction: Regression
◼ Example: price of a used car
◼ x: car attributes; y: price
 Linear model: y = wx + w0
 In general: y = g(x | θ), where g(·) is the model and θ its parameters
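
A minimal sketch of fitting the linear model y = wx + w0 by least squares (NumPy; the mileage/price numbers are made up):

    import numpy as np

    # Hypothetical used-car data: x = mileage (1000 km), y = price ($1000)
    x = np.array([10, 40, 80, 120, 160])
    y = np.array([22, 18, 13, 9, 6])

    w, w0 = np.polyfit(x, y, deg=1)  # least-squares fit of y = w*x + w0
    print(f"price ≈ {w:.3f} * mileage + {w0:.2f}")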
Regression Applications
◼ Navigating a car: angle of the steering wheel (CMU NavLab)
◼ Kinematics of a robot arm: given the gripper position (x, y), predict the joint angles
 α1 = g1(x, y)
 α2 = g2(x, y)
[Figure: two-joint robot arm with joint angles α1 and α2 reaching the point (x, y)]
Supervised Learning: Uses
Example: decision tree tools create rules
◼ Prediction of future cases: use the rule to predict the output for future inputs
◼ Knowledge extraction: the rule is easy to understand
◼ Compression: the rule is simpler than the data it explains
◼ Outlier detection: exceptions that are not covered by the rule, e.g., fraud
Unsupervised Learning
◼ Learning “what normally happens”
◼ No labeled output
◼ Clustering: Grouping similar instances
◼ Other applications: Summarization, Association
Analysis
◼ Example applications
 Customer segmentation in CRM
 Image compression: Color quantization
 Bioinformatics: Learning motifs

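A minimal clustering sketch using k-means from scikit-learn (the 2-D points and k = 2 are illustrative choices):

    import numpy as np
    from sklearn.cluster import KMeans

    # Two visually obvious groups of 2-D points
    X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                  [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)  # similar instances get the same cluster label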
Reinforcement Learning
◼ Topics:
 Policies: what actions should an agent take in a particular situation
 Utility estimation: how good is a state (used by the policy)
◼ No supervised output, but a delayed reward (see the sketch after the list below)
◼ Credit assignment problem (what was responsible for the outcome)
◼ Applications:
 Game playing
 Robot in a maze
 Multiple agents, partial observability, ...
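
A tiny tabular Q-learning sketch showing delayed reward and credit assignment (the 1-D corridor world is our toy example, not from the slides):

    import random

    # States 0..4 in a corridor; only reaching state 4 pays reward 1 (delayed).
    N, GOAL = 5, 4
    ACTIONS = [+1, -1]                    # move right / move left
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    alpha, gamma, eps = 0.5, 0.9, 0.2     # learning rate, discount, exploration

    for _ in range(200):                  # episodes
        s = 0
        while s != GOAL:
            if random.random() < eps:     # epsilon-greedy action choice
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a: Q[(s, a)])
            s2 = min(max(s + a, 0), N - 1)
            r = 1.0 if s2 == GOAL else 0.0
            # Q-learning update: credit for the delayed reward propagates backward
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
            s = s2

    policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
    print(policy)  # learned policy: +1 (move right) in states 0..3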
Resources: Datasets
◼ UCI Repository:
http://www.ics.uci.edu/~mlearn/MLRepository.html
◼ UCI KDD Archive:
http://kdd.ics.uci.edu/summary.data.application.html
◼ Statlib: http://lib.stat.cmu.edu/
◼ Delve: http://www.cs.utoronto.ca/~delve/

Resources: Journals
◼ Journal of Machine Learning Research www.jmlr.org
◼ Machine Learning
◼ IEEE Transactions on Neural Networks
◼ IEEE Transactions on Pattern Analysis and Machine
Intelligence
◼ Annals of Statistics
◼ Journal of the American Statistical Association
◼ ...

Resources: Conferences
◼ International Conference on Machine Learning (ICML)
◼ European Conference on Machine Learning (ECML)
◼ Neural Information Processing Systems (NIPS)
◼ Computational Learning Theory (COLT)
◼ International Joint Conference on Artificial Intelligence (IJCAI)
◼ ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD)
◼ IEEE Int. Conf. on Data Mining (ICDM)

