
Problem Solutions

Chapter 5
Pierre Paquay

Problem 5.1

(a) Let $f$ be an arbitrary binary function. If $\mathcal{H}$ shatters $x_1, \dots, x_N$, there exists $h \in \mathcal{H}$ such that $h(x_n) = f(x_n)$
for all $n = 1, \dots, N$, and consequently $E_{in}(h) = 0$; so the proposition is not falsifiable in this case.
(b) Since the data is generated from a random (arbitrary) target function, every dichotomy is equally
likely, which means that each has probability $1/2^N$. With that in mind, we have that

$$
\begin{aligned}
P[\text{falsification}] &= 1 - P[\exists h \in \mathcal{H} : E_{in}(h) = 0] \\
&= 1 - P[\exists h \in \mathcal{H} : h(x_n) = f(x_n)\ \forall 1 \le n \le N] \\
&= 1 - \frac{\text{number of dichotomies on } x_1, \dots, x_N \text{ such that } h(x_n) = f(x_n)\ \forall 1 \le n \le N}{\text{number of dichotomies}} \\
&\ge 1 - \frac{m_{\mathcal{H}}(N)}{2^N}.
\end{aligned}
$$

(c) If $d_{VC} = 10$ and $N = 100$, we get that

$$
P[\text{falsification}] \ge 1 - \frac{N^{d_{VC}} + 1}{2^N} = 1 - \frac{100^{10} + 1}{2^{100}} \approx 1.
$$
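As a quick numerical sanity check, this lower bound can be evaluated exactly; here is a minimal Python sketch (the helper name `falsification_bound` is ours, purely illustrative):

```python
from fractions import Fraction

def falsification_bound(d_vc, n):
    """Lower bound on P[falsification], using m_H(N) <= N^d_vc + 1."""
    return 1 - Fraction(n**d_vc + 1, 2**n)

print(float(falsification_bound(10, 100)))  # ~0.99999999992, essentially 1
```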

Problem 5.2

(a) Since $\mathcal{H}_i \subset \mathcal{H}_{i+1}$, we know that $|\mathcal{H}_i| \le |\mathcal{H}_{i+1}|$, and

$$
E_{in}(g_i) = \min_{h \in \mathcal{H}_i} E_{in}(h) \ge \min_{h \in \mathcal{H}_{i+1}} E_{in}(h) = E_{in}(g_{i+1})
$$
for any $i = 1, 2, \dots$
(b) Let $p_i = P[g^* \in \mathcal{H}_i] = P[g^* = g_i]$; if $p_i$ is small, then $\Omega(\mathcal{H}_i)$ is large, which implies that the model is
complex.
(c) It is obvious that
$$
g^* \in \mathcal{H}_i \Rightarrow g^* \in \mathcal{H}_{i+1},
$$
thus we get that
$$
p_i = P[g^* \in \mathcal{H}_i] \le P[g^* \in \mathcal{H}_{i+1}] = p_{i+1}
$$
for any $i = 1, 2, \dots$
(d) We know from the generalization bound that
$$
\begin{aligned}
P[|E_{in}(g_i) - E_{out}(g_i)| > \epsilon \mid g^* = g_i] &= \frac{P[\{|E_{in}(g_i) - E_{out}(g_i)| > \epsilon\} \cap \{g^* = g_i\}]}{P[g^* = g_i]} \\
&\le \frac{P[|E_{in}(g_i) - E_{out}(g_i)| > \epsilon]}{P[g^* = g_i]} \\
&\le \frac{4\, m_{\mathcal{H}_i}(2N)\, e^{-\epsilon^2 N / 8}}{p_i}.
\end{aligned}
$$
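To make the role of $p_i$ concrete, here is a hedged Python sketch of this bound; it assumes, purely for illustration, a polynomial growth-function bound $m_{\mathcal{H}_i}(2N) \le (2N)^{d_{VC}} + 1$, and the function name `conditional_vc_bound` is ours:

```python
import math

def conditional_vc_bound(n, d_vc, eps, p_i):
    """Upper bound on P[|Ein(g_i) - Eout(g_i)| > eps | g* = g_i],
    assuming the illustrative growth-function bound m_H(2N) <= (2N)^d_vc + 1."""
    m_2n = (2 * n) ** d_vc + 1
    return 4 * m_2n * math.exp(-eps**2 * n / 8) / p_i

# Halving p_i doubles the bound: a less likely (more complex) model
# gets a weaker conditional guarantee.
print(conditional_vc_bound(n=100000, d_vc=3, eps=0.1, p_i=0.5))
print(conditional_vc_bound(n=100000, d_vc=3, eps=0.1, p_i=0.05))
```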

Problem 5.3

(a) Here, we consider only one model, so $M = 1$.
(b) For $M = 1$, $N = 10000$, and $\epsilon = 0.02$, the Hoeffding inequality tells us that
$$
P[|E_{in}(g) - E_{out}(g)| > 0.02] \le 2 \cdot 1 \cdot e^{-2 \cdot 0.02^2 \cdot 10000} \approx 6.7 \times 10^{-4}.
$$
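A quick numerical check of this value in Python (a minimal sketch):

```python
import math

# Hoeffding bound: 2 * M * exp(-2 * eps^2 * N)
M, N, eps = 1, 10000, 0.02
print(2 * M * math.exp(-2 * eps**2 * N))  # ~6.7e-4, matching the bound above
```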

(c) One possible reason is sampling bias: the data set contains only data about people who did get a
credit card; we have no information on people who were rejected. So, when such people are passed to our
function $g$, the results are not as good as predicted.
(d) Yes: we should have used our function $g$ on the entire data set of applicants (approved and rejected);
only in that case would we get a meaningful probabilistic guarantee.

Problem 5.4

(a)(i) The problem here is that we used $N = 12500$ days (50 years) of data while fixing $M = 500$.
This choice involves data snooping, since these 500 stocks were themselves selected beforehand
(by definition of the S&P 500) by looking at the whole data set. Moreover, for many of the 50000 stocks we
do not have the full 12500 days of data, but much less.
(a)(ii) As stated above, the correct value should be $M = 50000$. Under these conditions, we get
$$
P[|E_{in} - E_{out}| > 0.02] \le 2 \cdot 50000 \cdot e^{-2 \cdot 0.02^2 \cdot 12500} \approx 4.539993,
$$
which exceeds 1 and therefore gives no guarantee at all.
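The same numerical check with the corrected $M$ (again a minimal sketch):

```python
import math

# Hoeffding bound with the corrected number of hypotheses
M, N, eps = 50000, 12500, 0.02
print(2 * M * math.exp(-2 * eps**2 * N))  # ~4.54 > 1: the bound is vacuous
```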

(b)(i) As in point (a), we cannot conclude with any certainty that buying and holding stocks is a good general
strategy, since we based this decision only on the 500 stocks that were selected beforehand.
(b)(ii) We would be able to say something about the performance of buy-and-hold trading only if we
considered all 50000 stocks currently trading, not just the 500 largest.

Problem 5.5

(a) No; as stated in Problem 5.4, the S&P 500 stocks are selected by looking at the data, so there
is data snooping involved, and we must take this contamination into account to get a reliable performance
estimate.
(b) We should use a data set that is representative of the whole stock market, not only of the largest
companies.

Problem 5.6

One reason that extrapolation is harder than interpolation is sampling bias: the training distribution
may be quite different from the test distribution, and consequently the in-sample performance may be
very different from the out-of-sample performance. Another reason is data snooping: if we use a test set
both for model selection and for model evaluation, we might get over-optimistic results.
