
Machine Learning Theory

CSE 250C
Introductory Lecture
General Info
•  Instructor: Raef Bassily (rbassily@ucsd.edu)
–  Office Hours: Thu 5-6 PM (4111 Atkinson Hall)
•  TA: Shuang Song (shs037@ucsd.edu)
–  Office Hours: Tue 10-11 AM (CSE Basement: B260A)
•  Website:
http://cseweb.ucsd.edu/classes/sp16/cse250C-a/
Also available at
http://rbassily.eng.ucsd.edu/home/teaching/cse-250c
What is Machine Learning?
•  The automated process of “making sense” out of data:
–  A tool to extract information from data and use it.
•  ML has invaded our daily lives:
–  Search engines, recommendation systems,
–  Email spam detection, fraud detection in credit cards,
–  Personal assistants in smartphones, face detection in digital cameras,
–  Navigation, military applications, medicine, bioinformatics, astronomy, …
•  How is ML different from traditional programming?
–  Endowing programs with the ability to “learn” and adapt to data on their own.
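The contrast can be sketched with a toy example (the rule and data here are made up for illustration): a traditional program hard-codes its rule, while a learning program derives an equivalent rule from labeled examples.

```python
def traditional_is_even(n):
    # Traditional programming: the programmer writes the rule by hand.
    return n % 2 == 0

def learn_is_even(examples):
    # "Learning": infer from labeled examples which residues mod 2
    # carry the label True, and return the induced rule.
    true_residues = {n % 2 for n, label in examples if label}
    return lambda n: (n % 2) in true_residues

# The learned rule is induced entirely from the (made-up) labeled data.
learned_is_even = learn_is_even([(2, True), (3, False), (4, True), (7, False)])
```

Both rules behave identically on new inputs, but only the second would adapt if the labels in the data changed.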
Topics to be covered
•  Part 1: Fundamentals
"  Preliminaries: Tools from Probability
"  PAC Learning
"  Occam’s Razor
"  Learnability via Uniform Convergence
"  The VC-Dimension
•  Goal: to answer fundamental questions of learning:
–  What is learning? How can a machine learn?
–  How do we quantify the amount of data required to learn a certain concept?
–  How can we evaluate the success of the learning process?
–  Is learning always possible?
Topics to be covered
•  Part 2: Key Algorithmic Techniques
"  Boosting: Weak vs. Strong Learnability
"  Convex Learning Problems
"  Regularization and Stability
"  Stochastic Gradient Descent Algorithm
"  One of the following topics:
"  Support Vector Machines (SVMs)
"  Introduction to Online Learning
•  Goal: to present algorithmic techniques widely used in
practice.
Useful Readings
•  M. J. Kearns, U. V. Vazirani, An Introduction to Computational Learning Theory.
•  S. Shalev-Shwartz, S. Ben-David, Understanding Machine Learning: From Theory to Algorithms.
•  Other specific readings may also be suggested in class.
•  There is no textbook for this class.
What is Learning?
•  The process of transforming experience into expertise or knowledge.
Training Data (Experience) → [Learning Algorithm] → Learned Concept or Rule (Expertise)
•  Example: Spam Detection
–  Input: a set of emails, each labeled Spam or Not Spam.
–  Output: a prediction rule to classify emails.
•  How do we evaluate a learning algorithm?
–  Test the output (e.g., the prediction rule) on new, unseen data (called test data).
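The spam-detection workflow can be sketched end to end with made-up emails and a deliberately naive learned rule (flag as spam any email containing a word seen only in spam during training):

```python
def learn_rule(train):
    # Collect the words that appear exclusively in spam training emails.
    spam_words, ham_words = set(), set()
    for text, label in train:
        (spam_words if label == "spam" else ham_words).update(text.lower().split())
    return spam_words - ham_words

def predict(rule, text):
    # Classify an email as spam if it contains any learned spam-only word.
    return "spam" if set(text.lower().split()) & rule else "not spam"

train = [
    ("win a free prize now", "spam"),
    ("claim your free reward", "spam"),
    ("meeting notes attached", "not spam"),
    ("lunch at noon tomorrow", "not spam"),
]
test = [
    ("free prize inside", "spam"),
    ("project meeting tomorrow", "not spam"),
]

rule = learn_rule(train)
# Evaluate the learned rule on the unseen test emails.
test_accuracy = sum(predict(rule, t) == y for t, y in test) / len(test)
```

The point is the split, not the rule: the rule is learned from the training set only, and its quality is judged on the held-out test set.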
Fundamental Questions

Training Data (Experience) → [Learning Algorithm] → Learned Concept or Rule (Expertise)

•  What assumptions do we need for learning to be possible?
–  Training and test data are “similar” in some sense.
–  Some restriction is placed on the class of possible concepts (e.g., prediction rules) to be learned.
Example: Binary Classifiers
•  Should we consider linear classifiers?
•  Should we consider 2nd-degree polynomial classifiers?
•  Should we consider higher-degree polynomial classifiers, or other, more complex functions?
•  Given a fixed model (e.g., a fixed type of prediction rule), how can the machine output the “right” prediction rule?
•  How many data samples are needed to ensure that the output prediction rule generalizes well to unseen data?
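As a preview of one classical answer developed later in the course: for a finite hypothesis class H in the realizable PAC setting, with accuracy parameter ε and confidence parameter δ, a standard bound states that

```latex
% Sample complexity of a finite hypothesis class H (realizable PAC setting):
% with probability at least 1 - \delta, any hypothesis consistent with
% m labeled samples has error at most \epsilon, provided
m \;\ge\; \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)
```

Note the logarithmic dependence on |H|: richer hypothesis classes demand more data, which is one way of making the next two bullets precise.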
"  The ability to generalize is the essence of any learning system.
"  Simpler prediction rules are easier to generalize.
Administrative Information
Prerequisites
•  Decent knowledge of probability and multivariate calculus.
•  Need to be comfortable working with mathematical abstractions and proofs.
•  Previous exposure to machine learning is useful, but not strictly required.
Assessment
•  Homeworks (including one mini-project): 45%
•  Midterm (in class): 20%
•  Final (take-home): 35%
•  Bonus for top answers on Piazza: up to 5%
Exams
•  Midterm: in class on Mon, May 9.
•  Final: will be posted on the class webpage on the weekend after the last day of class.
–  The deadline for submitting the answers will be decided later (tentatively, due in ~2-3 days).
–  The answers should be returned to me or the TA by the specified due date/time.
Homeworks
•  Homeworks must be returned in class, before the lecture starts, on the specified due date.
–  If you arrive late, please wait until the end of the lecture.
•  No late homeworks will be accepted except in the case of emergencies:
–  Even then, except for documented medical emergencies, 1/3 of the homework grade will be deducted for every day of late submission.
•  The last homework will include a mini-project.
Collaboration Policy
•  Each student may choose one collaborator to work with on the homework.
–  Choosing a collaborator is optional.
•  Each homework group must email me their names by April 13.
–  If you choose to work alone, you still need to email me to confirm your choice.
•  If you are looking for a collaborator, please post on the course group on Piazza.
•  No collaboration is allowed on the final (or the midterm)!
Collaboration Policy
•  Each homework must include a brief account of each collaborator’s contribution.
•  You must not look for homework solutions on the internet.
•  If you receive help from anyone other than me or the TA, or happen to see a solution somewhere, please acknowledge the source.
Grading Policy
•  Solutions to most problems involve proofs. Grading will be based on both correctness and clarity.
•  Be concise. Excessively long proofs are probably incorrect.
•  Show your reasoning in a clear and precise way:
–  A clearly written partial solution earns more partial credit than an attempted full solution with many holes in the argument.
Class Participation
•  Please sign up today for the class forum on Piazza:
http://piazza.com/ucsd/spring2016/cse250c
•  Please engage with your classmates in discussing the course material on the forum!
•  A 5% bonus goes to students with the best answers to the posted questions.
Other Related Courses
•  CSE 250A: Principles of AI: Probabilistic Reasoning and Learning
–  Instructor: Lawrence Saul
•  CSE 291-D: Latent Variable Models
–  Instructor: James Foulds
Calibration Quiz!
