Welcome to Scribd!

0% found this document useful (0 votes)

7 views

Slides Basics Optimization

Uploaded by

This document introduces machine learning as parameter optimization to minimize the empirical risk function. It discusses how gradient descent can be used as an optimization algorithm to find the optimal parameters by iteratively moving in the direction of steepest descent according to the negative gradient. Gradient descent takes small steps proportional to the learning rate to converge to a local minimum.

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

UNSW NonlinearCon - Soln
Document4 pages
UNSW NonlinearCon - Soln
Steve Goke Ayeni
No ratings yet
Dettol Project Report
Document23 pages
Dettol Project Report
urshila bhuyan
60% (15)
CPR Study Guide 2021: Online CPR Cheat Sheet
Document14 pages
CPR Study Guide 2021: Online CPR Cheat Sheet
Henz Freeman
0% (2)
Rutherford Scattering Derivation
Document3 pages
Rutherford Scattering Derivation
terrence5353
No ratings yet
CH 4
Document41 pages
CH 4
Tran Kim Toai
No ratings yet
Slides Basics Riskminimization
Document10 pages
Slides Basics Riskminimization
Chaitanya Parale
No ratings yet
5.5 Other Desirable Properties of A Point Estimator
Document5 pages
5.5 Other Desirable Properties of A Point Estimator
Seshan
No ratings yet
Ex Regularization 2
Document3 pages
Ex Regularization 2
toluei.s29
No ratings yet
Introduction To Optimization: CBMM Summer School Aug 12, 2018
Document64 pages
Introduction To Optimization: CBMM Summer School Aug 12, 2018
Carlos Alonso Aznarán Laos
No ratings yet
The Traveling Salesman Problem
Document19 pages
The Traveling Salesman Problem
wwpretty face
No ratings yet
CS115 Optimization
Document160 pages
CS115 Optimization
Tran Nhan
No ratings yet
Garrett P. - Simple Proof of The Prime Number Theorem
Document9 pages
Garrett P. - Simple Proof of The Prime Number Theorem
cleobulo
No ratings yet
771 A18 Lec17
Document120 pages
771 A18 Lec17
DUDEKULA VIDYASAGAR
No ratings yet
Chapter 5 - Design
Document20 pages
Chapter 5 - Design
lemoke0731
No ratings yet
Desymm
Document13 pages
Desymm
fuck you
No ratings yet
p20 Reprint
Document21 pages
p20 Reprint
aldo
No ratings yet
Slides
Document64 pages
Slides
Ul Rich
No ratings yet
771 A18 Lec16
Document99 pages
771 A18 Lec16
DUDEKULA VIDYASAGAR
No ratings yet
1997 Best Constants For Uncentred Maximal Functions
Document7 pages
1997 Best Constants For Uncentred Maximal Functions
UMAIR ASHFAQ
No ratings yet
03 Growth v3 PDF
Document28 pages
03 Growth v3 PDF
Gate Cse
No ratings yet
Fuzzy Aggregation
Document4 pages
Fuzzy Aggregation
Andi Chamri
No ratings yet
Robot 1
Document5 pages
Robot 1
spamakuta
No ratings yet
09 Prime Number Theorem
Document8 pages
09 Prime Number Theorem
ANURAG RAGHUWANSHI
No ratings yet
Approximation Algorithms
Document9 pages
Approximation Algorithms
hela
No ratings yet
GRS
Document9 pages
GRS
Neema
No ratings yet
20 Notes 6250 f13
Document8 pages
20 Notes 6250 f13
uranub2787
No ratings yet
DSP Lab 6 Z-Transform in MATLAB
Document3 pages
DSP Lab 6 Z-Transform in MATLAB
rajaalidadkayani.rock
No ratings yet
RH 1247
Document6 pages
RH 1247
Mike
No ratings yet
ISS Extra Work 4
Document17 pages
ISS Extra Work 4
CHIRAG GUPTA PCE19IT010
No ratings yet
Derivatives, Integrals, Multiplication and Division by T Laplce Transforms by Michael Pantonilla
Document21 pages
Derivatives, Integrals, Multiplication and Division by T Laplce Transforms by Michael Pantonilla
Michael Pantonilla
No ratings yet
The Miller-Rabin Randomized Primality Test
Document10 pages
The Miller-Rabin Randomized Primality Test
Hung Nguyen
0% (1)
Lec 3
Document22 pages
Lec 3
mohammed.elbakkalielammari
No ratings yet
Agricultural Land Use in Kerala
Document5 pages
Agricultural Land Use in Kerala
avsanthosh
No ratings yet
Problems of The Millennium: The Riemann Hypothesis (2004)
Document9 pages
Problems of The Millennium: The Riemann Hypothesis (2004)
manish9339405
No ratings yet
Linear Matrix Inequalities in Control
Document60 pages
Linear Matrix Inequalities in Control
david
No ratings yet
Dalalyan - 2017 - Theoretical Guarantees For Approximate Sampling From Smooth and Log-Concave Densities
Document26 pages
Dalalyan - 2017 - Theoretical Guarantees For Approximate Sampling From Smooth and Log-Concave Densities
Alex Leviyev
No ratings yet
Introduction To Stochastic Approximation Algorithms
Document14 pages
Introduction To Stochastic Approximation Algorithms
Massimo Tormen
No ratings yet
Lectures 10-11
Document17 pages
Lectures 10-11
kamdemnelson31
No ratings yet
Gradient Descent Algorithm
Document22 pages
Gradient Descent Algorithm
Saqlain Arshad
No ratings yet
SSRN Id4355794
Document11 pages
SSRN Id4355794
Abul Kalam Faruk
No ratings yet
6-Diffusion Equation - C
Document23 pages
6-Diffusion Equation - C
alagarg137691
No ratings yet
Trigonometric Functions
Document1 page
Trigonometric Functions
jiorjo99
No ratings yet
Cheatsheet Supervised Learning
Document4 pages
Cheatsheet Supervised Learning
an7l7a
No ratings yet
Maximum Likelihood Estimation: Guy Lebanon February 19, 2011
Document6 pages
Maximum Likelihood Estimation: Guy Lebanon February 19, 2011
nimishth925
No ratings yet
Chapter 2 - Mathematical Modelling
Document19 pages
Chapter 2 - Mathematical Modelling
ANDREW LEONG CHUN TATT STUDENT
No ratings yet
Crrao
Document7 pages
Crrao
Souparna Mukhopadhyay
No ratings yet
Cheatsheet Supervised Learning
Document4 pages
Cheatsheet Supervised Learning
Amar Kumar
100% (1)
DGM 2023 Endterm
Document12 pages
DGM 2023 Endterm
Ghaith Chebil
No ratings yet
Chapt7-Final Proof
Document10 pages
Chapt7-Final Proof
Muhammad Abu Bakar
No ratings yet
Problem Set 3a
Document2 pages
Problem Set 3a
Yumeji
No ratings yet
Parametric Equations and Polar Coordinates 11.4. Graphing in Polar Coordinates
Document6 pages
Parametric Equations and Polar Coordinates 11.4. Graphing in Polar Coordinates
King Creative
No ratings yet
CS115 Optimization
Document148 pages
CS115 Optimization
Quân Võ Đình Minh
No ratings yet
Full Download Random Processes For Engineers 1st Hajek Solution Manual PDF Full Chapter
Document36 pages
Full Download Random Processes For Engineers 1st Hajek Solution Manual PDF Full Chapter
mochila.loyalty.e50vpq
100% (17)
Lab 05-The Z Transform: Objectives: Time Required: 3 Hrs Programming Language: MATLAB Software Required
Document5 pages
Lab 05-The Z Transform: Objectives: Time Required: 3 Hrs Programming Language: MATLAB Software Required
Aleena Qureshi
No ratings yet
Prime PDF
Document43 pages
Prime PDF
Khokon Gayen
No ratings yet
n i=1 α/2 i σ n n
Document4 pages
n i=1 α/2 i σ n n
Alexander CTO
100% (1)
Homework Set No. 4: 1. Trapezoid and Simpson's Methods
Document3 pages
Homework Set No. 4: 1. Trapezoid and Simpson's Methods
kelbmuts
No ratings yet
The Spectral Theory of Toeplitz Operators. (AM-99), Volume 99
From Everand
The Spectral Theory of Toeplitz Operators. (AM-99), Volume 99
L. Boutet de Monvel
No ratings yet
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
Rating: 3.5 out of 5 stars
3.5/5 (8)
Hadamard Transform: Unveiling the Power of Hadamard Transform in Computer Vision
From Everand
Hadamard Transform: Unveiling the Power of Hadamard Transform in Computer Vision
Fouad Sabry
No ratings yet
Long-Memory Time Series: Theory and Methods
From Everand
Long-Memory Time Series: Theory and Methods
Wilfredo Palma
No ratings yet
Calculus: Maths of the Gods
From Everand
Calculus: Maths of the Gods
Bill Todorovich
No ratings yet
Slides Classification Discranalysis
Document11 pages
Slides Classification Discranalysis
Chaitanya Parale
No ratings yet
Slides Classification Basicdefs
Document11 pages
Slides Classification Basicdefs
Chaitanya Parale
No ratings yet
Slides Classification Tasks
Document8 pages
Slides Classification Tasks
Chaitanya Parale
No ratings yet
Slides Basics Data
Document9 pages
Slides Basics Data
Chaitanya Parale
No ratings yet
Slides Basics Task
Document6 pages
Slides Basics Task
Chaitanya Parale
No ratings yet
1-Simple Linear Regression
Document4 pages
1-Simple Linear Regression
Chaitanya Parale
No ratings yet
Slides Basics Riskminimization
Document10 pages
Slides Basics Riskminimization
Chaitanya Parale
No ratings yet
Ebook BigData Beginners
Document15 pages
Ebook BigData Beginners
Chaitanya Parale
No ratings yet
Slides Basics Learnercomponents Hro
Document9 pages
Slides Basics Learnercomponents Hro
Chaitanya Parale
No ratings yet
Time Delay Report
Document2 pages
Time Delay Report
Chaitanya Parale
No ratings yet
Slides Basics Whatisml
Document10 pages
Slides Basics Whatisml
Chaitanya Parale
No ratings yet
An Embedded System For Collection and Real-Time Classification of A Tactile Dataset
Document12 pages
An Embedded System For Collection and Real-Time Classification of A Tactile Dataset
Chaitanya Parale
No ratings yet
Data Preparation For Machine Learning Sample
Document30 pages
Data Preparation For Machine Learning Sample
Chaitanya Parale
No ratings yet
Detecting False Alarms From Automatic Static Analysis Tools: How Far Are We?
Document12 pages
Detecting False Alarms From Automatic Static Analysis Tools: How Far Are We?
Chaitanya Parale
No ratings yet
Documentation
Document1 page
Documentation
Chaitanya Parale
No ratings yet
1.41. String New
Document2 pages
1.41. String New
Chaitanya Parale
No ratings yet
1.3.1 Scope of Variables
Document1 page
1.3.1 Scope of Variables
Chaitanya Parale
No ratings yet
1.2.1. Variables (1) - Jupyter Notebook
Document6 pages
1.2.1. Variables (1) - Jupyter Notebook
Chaitanya Parale
No ratings yet
1.3.1 Scope of Variables
Document2 pages
1.3.1 Scope of Variables
Chaitanya Parale
No ratings yet
Legitimacy Theory
Document6 pages
Legitimacy Theory
Fakoyede Oluwapamilerin
No ratings yet
The Identification of Criteria Used For Infrastructure Sustainability Tools
Document7 pages
The Identification of Criteria Used For Infrastructure Sustainability Tools
Nur Shahirah Nadhilah Wahab
No ratings yet
Summaryof Standards
Document2 pages
Summaryof Standards
roman_maximo
No ratings yet
Novel Zero-Current-Switching (ZCS) PWM Switch Cell Minimizing Additional Conduction Loss
Document6 pages
Novel Zero-Current-Switching (ZCS) PWM Switch Cell Minimizing Additional Conduction Loss
ashish88bhardwaj_314
No ratings yet
کورونا وائرس کے ظہور کی پیشن گوئی احادیث نبوی میں
Document56 pages
کورونا وائرس کے ظہور کی پیشن گوئی احادیث نبوی میں
Waheeduddin Mahbub
No ratings yet
2thesisfinal 170811170156 PDF
Document72 pages
2thesisfinal 170811170156 PDF
Joshua Bergonia
No ratings yet
Extruder Presentation 2
Document63 pages
Extruder Presentation 2
Dede R Bakhtiyar
No ratings yet
Pumps, Valves and Compressor
Document52 pages
Pumps, Valves and Compressor
Anonymous zUreJLmN
100% (1)
Recent Legislations - Cabarles
Document11 pages
Recent Legislations - Cabarles
Karl Cabarles
No ratings yet
15 Percy Street Hanley Stoke On Trent: St1 1na
Document2 pages
15 Percy Street Hanley Stoke On Trent: St1 1na
doctor12345
No ratings yet
Apache Airflow Fundamentals Study Guide
Document7 pages
Apache Airflow Fundamentals Study Guide
Don
No ratings yet
Transgenic Crops
Document13 pages
Transgenic Crops
rana shahid
100% (1)
Feasibility Study - Chapter I
Document3 pages
Feasibility Study - Chapter I
Bhenil Bon Velasquez
No ratings yet
Reduce: Bharti Infratel Bhin in
Document11 pages
Reduce: Bharti Infratel Bhin in
ashok yadav
No ratings yet
10TMSS04R2
Document32 pages
10TMSS04R2
mogbel1
No ratings yet
Fumigation Planning Guide: Facts
Document7 pages
Fumigation Planning Guide: Facts
tahir
No ratings yet
SAFe Agilist 6.0 Real Dumps
Document11 pages
SAFe Agilist 6.0 Real Dumps
Romrom
No ratings yet
Reducer: Sweat Reducer Hydraulic Concentric Eccentric Reducers
Document1 page
Reducer: Sweat Reducer Hydraulic Concentric Eccentric Reducers
Mayur Mandrekar
No ratings yet
B3-201-2018 - Developing and Using Justifiable Asset Health Indices For Tactical and Strategic Risk Management
Document10 pages
B3-201-2018 - Developing and Using Justifiable Asset Health Indices For Tactical and Strategic Risk Management
NamLe
No ratings yet
NCKH ĐỊNH TÍNH
Document75 pages
NCKH ĐỊNH TÍNH
nguyennguyen.31221022808
No ratings yet
Bank of America Vs Phil Racing Club
Document2 pages
Bank of America Vs Phil Racing Club
Mary Joyce Lacambra Aquino
No ratings yet
Monthly Questions (February) (E-Math)
Document48 pages
Monthly Questions (February) (E-Math)
Wei Ting Chui
No ratings yet
Accounting Problems
Document31 pages
Accounting Problems
Janna Gunio
100% (1)
2012rap BRC PDF
Document185 pages
2012rap BRC PDF
Farikh
No ratings yet
S111 EN 12 Aluminium Standard Inclinometric Casing
Document4 pages
S111 EN 12 Aluminium Standard Inclinometric Casing
Irwan Darmawan
No ratings yet
LKS1
Document34 pages
LKS1
Vijay Saroj
No ratings yet
Fti Consulting Document
Document20 pages
Fti Consulting Document
tarunchat
No ratings yet
Academic Calendar 1441H (2019-2020) : Second Semester (192) : DAY Week Hijri Date Gregorian Date Events
Document1 page
Academic Calendar 1441H (2019-2020) : Second Semester (192) : DAY Week Hijri Date Gregorian Date Events
Jason Franklin
No ratings yet

Slides Basics Optimization

Uploaded by

Chaitanya Parale

0% found this document useful (0 votes)

7 views12 pages

Original Description:

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

0% found this document useful (0 votes)

7 views12 pages

Slides Basics Optimization

Uploaded by

Chaitanya Parale

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

Jump to Page

You are on page 1of 12

Search inside document

Introduction to Machine Learning

ML-Basics: Optimization

Learning goals
Understand how the risk function
R em

is optimized to learn the optimal

parameters of a model
Understand the idea of gradient
θ2

θ1
descent as a basic risk optimizer
LEARNING AS PARAMETER OPTIMIZATION

We have seen, we can operationalize the search for a model f that

matches training data best, by looking for its parametrization
θ ∈ Θ with lowest empirical risk Remp (θ).
Therefore, we usually traverse the error surface downwards; often
by local search from a starting point to its minimum.
R em
p

θ2

θ1

c Introduction to Machine Learning – 1 / 11
LEARNING AS PARAMETER OPTIMIZATION
The ERM optimization problem is:

θ̂ = arg min Remp (θ).

θ∈Θ

For a (global) minimum θ̂ it obviously holds that

∀θ ∈ Θ : Remp (θ̂) ≤ Remp (θ).

This does not imply that θ̂ is unique.

Which kind of numerical technique is reasonable for this problem

strongly depends on model and parameter structure (continuous
params? uni-modal Remp (θ)?). Here, we will only discuss very simple
scenarios.

c Introduction to Machine Learning – 2 / 11
LOCAL MINIMA
If Remp is continuous in θ we can define a local minimum θ̂ :

∃ > 0 ∀θ with θ̂ − θ < : Remp (θ̂) ≤ Remp (θ).

Clearly every global minimum is also a local minimum. Finding a local

minimum is easier than finding a global minimum.

minimum
Remp(θ)

● global
●

● local

c Introduction to Machine Learning – 3 / 11
LOCAL MINIMA AND STATIONARY POINTS
If Remp is continuously differentiable in θ then a sufficient condition for a local
minimum is that θ̂ is stationary with 0 gradient, so no local improvement is possible:

∂
Remp (θ̂) = 0
∂θ
2
∂
and the Hessian ∂θ 2 Remp (θ̂) is positive definite. While the neg. gradient points into

the direction of fastest local decrease, the Hessian measures local curvature of Remp .

Remp
R em

100

R em
p

75
θ2

p
50

25
θ2

θ2
θ1 θ1
θ1

∂
∂θ
Remp (θ) const. pos. def. Hessian const. neg. def. Hessian

c Introduction to Machine Learning – 4 / 11
LEAST SQUARES ESTIMATOR
Now, for given features X ∈ Rn×p and target y ∈ Rn , we want to find
the best linear model regarding the squared error loss, i.e.,
n
X
Remp (θ) = kXθ − yk22 = (θ > x(i ) − y (i ) )2 .
i =1

With the sufficient condition for continously differentiable functions it

can be shown that the least squares estimator

θ̂ = (X> X)−1 X> y.

is a local minimum of Remp . If X is full-rank, Remp is strictly convex and

there is only one local minimum - which is also global.

Note: Often such analytical solutions in ML are not possible, and we

rather have to use iterative numerical optimization.

c Introduction to Machine Learning – 5 / 11
GRADIENT DESCENT
The simple idea of GD is to iteratively go from the current candidate θ [t ]
in the direction of the negative gradient, i.e., the direction of the
steepest descent, with learning rate α to the next θ [t +1] :

∂
θ [t +1] = θ [t ] − α Remp (θ [t ] ).
∂θ

θ[0]
R em

We choose a random start θ [0] with risk

Remp (θ [0] ) = 76.25.

θ2

θ1

c Introduction to Machine Learning – 6 / 11
GRADIENT DESCENT - EXAMPLE

∂
−α ∂θ Remp (θ[0] )
R em

Now we follow in the direction of the

negative gradient at θ [0] .

θ2

θ1

We arrive at θ [1] with risk

R em

[1]
θ Remp (θ [1] ) ≈ 42.73.
p

We improved:
Remp (θ [1] ) < Remp (θ [0] ).
θ2

θ1

c Introduction to Machine Learning – 7 / 11
GRADIENT DESCENT - EXAMPLE
R em

∂
−α ∂θ Remp (θ[1] ) Again we follow in the direction of the
p

negative gradient, but now at θ [1] .

θ2

θ1
R em
p

θ[2] Now θ [2] has risk Remp (θ [2] ) ≈ 25.08.

θ2

θ1

c Introduction to Machine Learning – 8 / 11
GRADIENT DESCENT - EXAMPLE
R em

We iterate this until some form of con-

vergence or termination.
θ2

θ1
R em
p

We arrive close to a stationary θ̂ which

θ̂ is hopefully at least a local minimum.
θ2

θ1

c Introduction to Machine Learning – 9 / 11
GRADIENT DESCENT - LEARNING RATE
The negative gradient is a direction that looks locally promising to reduce Remp .
Hence it weights components higher in which Remp decreases more.
∂
However, the length of − ∂θ Remp measures only the local decrease rate, i.e.,
there are no guarantees that we will not go "too far".
We use a learning rate α to scale the step length in each iteration. Too much can
lead to overstepping and no converge, too low leads to slow convergence.
Usually, a simple constant rate or rate-decrease mechanisms to enforce local
convergence are used

θ●[0] θ●[0] θ●[0]

θ2

θ2
θ1 θ1 θ1

good convergence for α1 poor convergence for α2 (< α1 ) no convergence for α3 (> α1 )

c Introduction to Machine Learning – 10 / 11
FURTHER TOPICS

GD is a so-called first-order method. Second-order methods use

the Hessian to refine the search direction for faster convergence.
There exist many improvements of GD, e.g., to smartly control the
learn rate, to escape saddle points, to mimic second order
behavior without computing the expensive Hessian.
If the gradient of GD is not derived from the empirical risk of the
whole data set, but instead from a randomly selected subset, we
call this stochastic gradient descent (SGD). For large-scale
problems this can lead to higher computational efficiency.

c Introduction to Machine Learning – 11 / 11

UNSW NonlinearCon - Soln
Document4 pages
UNSW NonlinearCon - Soln
Steve Goke Ayeni
No ratings yet
Dettol Project Report
Document23 pages
Dettol Project Report
urshila bhuyan
60% (15)
CPR Study Guide 2021: Online CPR Cheat Sheet
Document14 pages
CPR Study Guide 2021: Online CPR Cheat Sheet
Henz Freeman
0% (2)
Rutherford Scattering Derivation
Document3 pages
Rutherford Scattering Derivation
terrence5353
No ratings yet
CH 4
Document41 pages
CH 4
Tran Kim Toai
No ratings yet
Slides Basics Riskminimization
Document10 pages
Slides Basics Riskminimization
Chaitanya Parale
No ratings yet
5.5 Other Desirable Properties of A Point Estimator
Document5 pages
5.5 Other Desirable Properties of A Point Estimator
Seshan
No ratings yet
Ex Regularization 2
Document3 pages
Ex Regularization 2
toluei.s29
No ratings yet
Introduction To Optimization: CBMM Summer School Aug 12, 2018
Document64 pages
Introduction To Optimization: CBMM Summer School Aug 12, 2018
Carlos Alonso Aznarán Laos
No ratings yet
The Traveling Salesman Problem
Document19 pages
The Traveling Salesman Problem
wwpretty face
No ratings yet
CS115 Optimization
Document160 pages
CS115 Optimization
Tran Nhan
No ratings yet
Garrett P. - Simple Proof of The Prime Number Theorem
Document9 pages
Garrett P. - Simple Proof of The Prime Number Theorem
cleobulo
No ratings yet
771 A18 Lec17
Document120 pages
771 A18 Lec17
DUDEKULA VIDYASAGAR
No ratings yet
Chapter 5 - Design
Document20 pages
Chapter 5 - Design
lemoke0731
No ratings yet
Desymm
Document13 pages
Desymm
fuck you
No ratings yet
p20 Reprint
Document21 pages
p20 Reprint
aldo
No ratings yet
Slides
Document64 pages
Slides
Ul Rich
No ratings yet
771 A18 Lec16
Document99 pages
771 A18 Lec16
DUDEKULA VIDYASAGAR
No ratings yet
1997 Best Constants For Uncentred Maximal Functions
Document7 pages
1997 Best Constants For Uncentred Maximal Functions
UMAIR ASHFAQ
No ratings yet
03 Growth v3 PDF
Document28 pages
03 Growth v3 PDF
Gate Cse
No ratings yet
Fuzzy Aggregation
Document4 pages
Fuzzy Aggregation
Andi Chamri
No ratings yet
Robot 1
Document5 pages
Robot 1
spamakuta
No ratings yet
09 Prime Number Theorem
Document8 pages
09 Prime Number Theorem
ANURAG RAGHUWANSHI
No ratings yet
Approximation Algorithms
Document9 pages
Approximation Algorithms
hela
No ratings yet
GRS
Document9 pages
GRS
Neema
No ratings yet
20 Notes 6250 f13
Document8 pages
20 Notes 6250 f13
uranub2787
No ratings yet
DSP Lab 6 Z-Transform in MATLAB
Document3 pages
DSP Lab 6 Z-Transform in MATLAB
rajaalidadkayani.rock
No ratings yet
RH 1247
Document6 pages
RH 1247
Mike
No ratings yet
ISS Extra Work 4
Document17 pages
ISS Extra Work 4
CHIRAG GUPTA PCE19IT010
No ratings yet
Derivatives, Integrals, Multiplication and Division by T Laplce Transforms by Michael Pantonilla
Document21 pages
Derivatives, Integrals, Multiplication and Division by T Laplce Transforms by Michael Pantonilla
Michael Pantonilla
No ratings yet
The Miller-Rabin Randomized Primality Test
Document10 pages
The Miller-Rabin Randomized Primality Test
Hung Nguyen
0% (1)
Lec 3
Document22 pages
Lec 3
mohammed.elbakkalielammari
No ratings yet
Agricultural Land Use in Kerala
Document5 pages
Agricultural Land Use in Kerala
avsanthosh
No ratings yet
Problems of The Millennium: The Riemann Hypothesis (2004)
Document9 pages
Problems of The Millennium: The Riemann Hypothesis (2004)
manish9339405
No ratings yet
Linear Matrix Inequalities in Control
Document60 pages
Linear Matrix Inequalities in Control
david
No ratings yet
Dalalyan - 2017 - Theoretical Guarantees For Approximate Sampling From Smooth and Log-Concave Densities
Document26 pages
Dalalyan - 2017 - Theoretical Guarantees For Approximate Sampling From Smooth and Log-Concave Densities
Alex Leviyev
No ratings yet
Introduction To Stochastic Approximation Algorithms
Document14 pages
Introduction To Stochastic Approximation Algorithms
Massimo Tormen
No ratings yet
Lectures 10-11
Document17 pages
Lectures 10-11
kamdemnelson31
No ratings yet
Gradient Descent Algorithm
Document22 pages
Gradient Descent Algorithm
Saqlain Arshad
No ratings yet
SSRN Id4355794
Document11 pages
SSRN Id4355794
Abul Kalam Faruk
No ratings yet
6-Diffusion Equation - C
Document23 pages
6-Diffusion Equation - C
alagarg137691
No ratings yet
Trigonometric Functions
Document1 page
Trigonometric Functions
jiorjo99
No ratings yet
Cheatsheet Supervised Learning
Document4 pages
Cheatsheet Supervised Learning
an7l7a
No ratings yet
Maximum Likelihood Estimation: Guy Lebanon February 19, 2011
Document6 pages
Maximum Likelihood Estimation: Guy Lebanon February 19, 2011
nimishth925
No ratings yet
Chapter 2 - Mathematical Modelling
Document19 pages
Chapter 2 - Mathematical Modelling
ANDREW LEONG CHUN TATT STUDENT
No ratings yet
Crrao
Document7 pages
Crrao
Souparna Mukhopadhyay
No ratings yet
Cheatsheet Supervised Learning
Document4 pages
Cheatsheet Supervised Learning
Amar Kumar
100% (1)
DGM 2023 Endterm
Document12 pages
DGM 2023 Endterm
Ghaith Chebil
No ratings yet
Chapt7-Final Proof
Document10 pages
Chapt7-Final Proof
Muhammad Abu Bakar
No ratings yet
Problem Set 3a
Document2 pages
Problem Set 3a
Yumeji
No ratings yet
Parametric Equations and Polar Coordinates 11.4. Graphing in Polar Coordinates
Document6 pages
Parametric Equations and Polar Coordinates 11.4. Graphing in Polar Coordinates
King Creative
No ratings yet
CS115 Optimization
Document148 pages
CS115 Optimization
Quân Võ Đình Minh
No ratings yet
Full Download Random Processes For Engineers 1st Hajek Solution Manual PDF Full Chapter
Document36 pages
Full Download Random Processes For Engineers 1st Hajek Solution Manual PDF Full Chapter
mochila.loyalty.e50vpq
100% (17)
Lab 05-The Z Transform: Objectives: Time Required: 3 Hrs Programming Language: MATLAB Software Required
Document5 pages
Lab 05-The Z Transform: Objectives: Time Required: 3 Hrs Programming Language: MATLAB Software Required
Aleena Qureshi
No ratings yet
Prime PDF
Document43 pages
Prime PDF
Khokon Gayen
No ratings yet
n i=1 α/2 i σ n n
Document4 pages
n i=1 α/2 i σ n n
Alexander CTO
100% (1)
Homework Set No. 4: 1. Trapezoid and Simpson's Methods
Document3 pages
Homework Set No. 4: 1. Trapezoid and Simpson's Methods
kelbmuts
No ratings yet
The Spectral Theory of Toeplitz Operators. (AM-99), Volume 99
From Everand
The Spectral Theory of Toeplitz Operators. (AM-99), Volume 99
L. Boutet de Monvel
No ratings yet
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
Rating: 3.5 out of 5 stars
3.5/5 (8)
Hadamard Transform: Unveiling the Power of Hadamard Transform in Computer Vision
From Everand
Hadamard Transform: Unveiling the Power of Hadamard Transform in Computer Vision
Fouad Sabry
No ratings yet
Long-Memory Time Series: Theory and Methods
From Everand
Long-Memory Time Series: Theory and Methods
Wilfredo Palma
No ratings yet
Calculus: Maths of the Gods
From Everand
Calculus: Maths of the Gods
Bill Todorovich
No ratings yet
Slides Classification Discranalysis
Document11 pages
Slides Classification Discranalysis
Chaitanya Parale
No ratings yet
Slides Classification Basicdefs
Document11 pages
Slides Classification Basicdefs
Chaitanya Parale
No ratings yet
Slides Classification Tasks
Document8 pages
Slides Classification Tasks
Chaitanya Parale
No ratings yet
Slides Basics Data
Document9 pages
Slides Basics Data
Chaitanya Parale
No ratings yet
Slides Basics Task
Document6 pages
Slides Basics Task
Chaitanya Parale
No ratings yet
1-Simple Linear Regression
Document4 pages
1-Simple Linear Regression
Chaitanya Parale
No ratings yet
Slides Basics Riskminimization
Document10 pages
Slides Basics Riskminimization
Chaitanya Parale
No ratings yet
Ebook BigData Beginners
Document15 pages
Ebook BigData Beginners
Chaitanya Parale
No ratings yet
Slides Basics Learnercomponents Hro
Document9 pages
Slides Basics Learnercomponents Hro
Chaitanya Parale
No ratings yet
Time Delay Report
Document2 pages
Time Delay Report
Chaitanya Parale
No ratings yet
Slides Basics Whatisml
Document10 pages
Slides Basics Whatisml
Chaitanya Parale
No ratings yet
An Embedded System For Collection and Real-Time Classification of A Tactile Dataset
Document12 pages
An Embedded System For Collection and Real-Time Classification of A Tactile Dataset
Chaitanya Parale
No ratings yet
Data Preparation For Machine Learning Sample
Document30 pages
Data Preparation For Machine Learning Sample
Chaitanya Parale
No ratings yet
Detecting False Alarms From Automatic Static Analysis Tools: How Far Are We?
Document12 pages
Detecting False Alarms From Automatic Static Analysis Tools: How Far Are We?
Chaitanya Parale
No ratings yet
Documentation
Document1 page
Documentation
Chaitanya Parale
No ratings yet
1.41. String New
Document2 pages
1.41. String New
Chaitanya Parale
No ratings yet
1.3.1 Scope of Variables
Document1 page
1.3.1 Scope of Variables
Chaitanya Parale
No ratings yet
1.2.1. Variables (1) - Jupyter Notebook
Document6 pages
1.2.1. Variables (1) - Jupyter Notebook
Chaitanya Parale
No ratings yet
1.3.1 Scope of Variables
Document2 pages
1.3.1 Scope of Variables
Chaitanya Parale
No ratings yet
Legitimacy Theory
Document6 pages
Legitimacy Theory
Fakoyede Oluwapamilerin
No ratings yet
The Identification of Criteria Used For Infrastructure Sustainability Tools
Document7 pages
The Identification of Criteria Used For Infrastructure Sustainability Tools
Nur Shahirah Nadhilah Wahab
No ratings yet
Summaryof Standards
Document2 pages
Summaryof Standards
roman_maximo
No ratings yet
Novel Zero-Current-Switching (ZCS) PWM Switch Cell Minimizing Additional Conduction Loss
Document6 pages
Novel Zero-Current-Switching (ZCS) PWM Switch Cell Minimizing Additional Conduction Loss
ashish88bhardwaj_314
No ratings yet
کورونا وائرس کے ظہور کی پیشن گوئی احادیث نبوی میں
Document56 pages
کورونا وائرس کے ظہور کی پیشن گوئی احادیث نبوی میں
Waheeduddin Mahbub
No ratings yet
2thesisfinal 170811170156 PDF
Document72 pages
2thesisfinal 170811170156 PDF
Joshua Bergonia
No ratings yet
Extruder Presentation 2
Document63 pages
Extruder Presentation 2
Dede R Bakhtiyar
No ratings yet
Pumps, Valves and Compressor
Document52 pages
Pumps, Valves and Compressor
Anonymous zUreJLmN
100% (1)
Recent Legislations - Cabarles
Document11 pages
Recent Legislations - Cabarles
Karl Cabarles
No ratings yet
15 Percy Street Hanley Stoke On Trent: St1 1na
Document2 pages
15 Percy Street Hanley Stoke On Trent: St1 1na
doctor12345
No ratings yet
Apache Airflow Fundamentals Study Guide
Document7 pages
Apache Airflow Fundamentals Study Guide
Don
No ratings yet
Transgenic Crops
Document13 pages
Transgenic Crops
rana shahid
100% (1)
Feasibility Study - Chapter I
Document3 pages
Feasibility Study - Chapter I
Bhenil Bon Velasquez
No ratings yet
Reduce: Bharti Infratel Bhin in
Document11 pages
Reduce: Bharti Infratel Bhin in
ashok yadav
No ratings yet
10TMSS04R2
Document32 pages
10TMSS04R2
mogbel1
No ratings yet
Fumigation Planning Guide: Facts
Document7 pages
Fumigation Planning Guide: Facts
tahir
No ratings yet
SAFe Agilist 6.0 Real Dumps
Document11 pages
SAFe Agilist 6.0 Real Dumps
Romrom
No ratings yet
Reducer: Sweat Reducer Hydraulic Concentric Eccentric Reducers
Document1 page
Reducer: Sweat Reducer Hydraulic Concentric Eccentric Reducers
Mayur Mandrekar
No ratings yet
B3-201-2018 - Developing and Using Justifiable Asset Health Indices For Tactical and Strategic Risk Management
Document10 pages
B3-201-2018 - Developing and Using Justifiable Asset Health Indices For Tactical and Strategic Risk Management
NamLe
No ratings yet
NCKH ĐỊNH TÍNH
Document75 pages
NCKH ĐỊNH TÍNH
nguyennguyen.31221022808
No ratings yet
Bank of America Vs Phil Racing Club
Document2 pages
Bank of America Vs Phil Racing Club
Mary Joyce Lacambra Aquino
No ratings yet
Monthly Questions (February) (E-Math)
Document48 pages
Monthly Questions (February) (E-Math)
Wei Ting Chui
No ratings yet
Accounting Problems
Document31 pages
Accounting Problems
Janna Gunio
100% (1)
2012rap BRC PDF
Document185 pages
2012rap BRC PDF
Farikh
No ratings yet
S111 EN 12 Aluminium Standard Inclinometric Casing
Document4 pages
S111 EN 12 Aluminium Standard Inclinometric Casing
Irwan Darmawan
No ratings yet
LKS1
Document34 pages
LKS1
Vijay Saroj
No ratings yet
Fti Consulting Document
Document20 pages
Fti Consulting Document
tarunchat
No ratings yet
Academic Calendar 1441H (2019-2020) : Second Semester (192) : DAY Week Hijri Date Gregorian Date Events
Document1 page
Academic Calendar 1441H (2019-2020) : Second Semester (192) : DAY Week Hijri Date Gregorian Date Events
Jason Franklin
No ratings yet

Slides Basics Optimization

Uploaded by

Copyright:

Available Formats

You might also like

Slides Basics Optimization

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Slides Basics Optimization

Uploaded by

Copyright:

Available Formats

Introduction to Machine Learning

is optimized to learn the optimal

We have seen, we can operationalize the search for a model f that

θ̂ = arg min Remp (θ).

For a (global) minimum θ̂ it obviously holds that

∀θ ∈ Θ : Remp (θ̂) ≤ Remp (θ).

This does not imply that θ̂ is unique.

Which kind of numerical technique is reasonable for this problem

Clearly every global minimum is also a local minimum. Finding a local

With the sufficient condition for continously differentiable functions it

θ̂ = (X> X)−1 X> y.

is a local minimum of Remp . If X is full-rank, Remp is strictly convex and

Note: Often such analytical solutions in ML are not possible, and we

We choose a random start θ [0] with risk

Remp (θ [0] ) = 76.25.

Now we follow in the direction of the

negative gradient at θ [0] .

We arrive at θ [1] with risk

negative gradient, but now at θ [1] .

θ[2] Now θ [2] has risk Remp (θ [2] ) ≈ 25.08.

We iterate this until some form of con-

We arrive close to a stationary θ̂ which

θ●[0] θ●[0] θ●[0]

GD is a so-called first-order method. Second-order methods use

You might also like