
Lecture 05
Fitting Neurons with Gradient Descent

STAT 479: Deep Learning, Spring 2019
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat479-ss2019/

DISCUSS HOMEWORK

Sebastian Raschka, STAT 479: Deep Learning, SS 2019


https://twitter.com/hardmaru/status/876303574900264960

Also known as "graduate student descent"




Our Goals

• A learning rule that is more robust than the perceptron: always converges, even if the data is not (linearly) separable  [This lecture]

• Combine multiple neurons and layers of neurons ("deep neural nets") to learn more complex decision boundaries (because most real-world problems are not "linear" problems!)  [Next lecture(s)]

• Handle multiple categories (not just binary) in classification  [Next lecture(s)]

• Do even fancier things, like generating NEW images and text  [More towards the end of the course]

All based on the same learning algorithm and extensions thereof. So, this is probably the most fundamental lecture!
Good news:

• After this lecture, there won't be any "new" mathematical concepts.

• Everything in DL will be extensions & applications of these basic concepts.


Perceptron Recap

The inputs x_1, ..., x_m are combined with the weights w_1, ..., w_m and the bias unit b into the net input

    z = Σ_{i=1}^{m} w_i x_i + b = x^T w + b,

and the activation function σ produces the output (prediction)

    ŷ = σ(z),   where σ(z) = 0 if z < 0, and 1 otherwise.

Let D = {⟨x[1], y[1]⟩, ⟨x[2], y[2]⟩, ..., ⟨x[n], y[n]⟩} ∈ (R^m × {0, 1})^n

1. Initialize w := 0^m, b := 0
2. For every training epoch:
   A. For every ⟨x[i], y[i]⟩ ∈ D:
      (a) ŷ[i] := σ(x[i]^T w + b)              Compute output (prediction)
      (b) err := (y[i] − ŷ[i])                 Calculate error
      (c) w := w + err × x[i],  b := b + err   Update parameters
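The recap above maps directly to a few lines of NumPy. This is a minimal sketch, not the lecture's code; the function names and the toy AND dataset are mine for illustration.

```python
import numpy as np

def sigma(z):
    """Threshold activation: 0 if z < 0, 1 otherwise."""
    return np.where(z < 0.0, 0, 1)

def train_perceptron(X, y, epochs=10):
    """Perceptron learning rule, on-line: predict, compute the error
    err = y - yhat, then update w by err * x and b by err."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            yhat = sigma(x_i.dot(w) + b)   # (a) compute output
            err = y_i - yhat               # (b) calculate error
            w, b = w + err * x_i, b + err  # (c) update parameters
    return w, b

# Toy, linearly separable data: logical AND
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
preds = sigma(X.dot(w) + b)
```

Because AND is linearly separable, the rule converges here; on non-separable data the perceptron keeps updating forever, which is exactly the motivation for the more robust rule developed in this lecture.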
General Learning Principle

Let D = {⟨x[1], y[1]⟩, ⟨x[2], y[2]⟩, ..., ⟨x[n], y[n]⟩} ∈ (R^m × {0, 1})^n

"On-line" mode:

1. Initialize w := 0^m, b := 0
2. For every training epoch:
   A. For every ⟨x[i], y[i]⟩ ∈ D:
      (a) Compute output (prediction)
      (b) Calculate error
      (c) Update w, b

This applies to all common neuron models and (deep) neural network architectures! There are some variants of it, namely the "batch mode" and the "minibatch mode," which we will briefly go over in the next slides and then discuss more later.
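The "on-line" template can be sketched as a generic loop in which the three model-specific steps, (a) predict, (b) error, (c) update, are plugged in. The function names and the perceptron plug-ins below are illustrative, not from the slides.

```python
def train_online(data, predict, update, params, epochs=1):
    """Generic on-line learning loop: for every training example,
    (a) compute the output, (b) calculate the error, (c) update
    the parameters immediately."""
    for _ in range(epochs):
        for x, y in data:
            yhat = predict(params, x)        # (a) compute output
            err = y - yhat                   # (b) calculate error
            params = update(params, x, err)  # (c) update w, b
    return params

# Perceptron plug-ins (illustrative):
def predict(params, x):
    z = sum(w_i * x_i for w_i, x_i in zip(params["w"], x)) + params["b"]
    return 1 if z >= 0 else 0

def update(params, x, err):
    return {"w": [w_i + err * x_i for w_i, x_i in zip(params["w"], x)],
            "b": params["b"] + err}

data = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((1.0, 1.0), 1)]
params = train_online(data, predict, update, {"w": [0.0, 0.0], "b": 0.0}, epochs=10)
```

Swapping in a different predict/update pair (say, a linear neuron with a gradient-based update) reuses the same loop, which is the point of the slide.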


General Learning Principle

Batch mode:

1. Initialize w := 0^m, b := 0
2. For every training epoch:
   A. Initialize Δw := 0, Δb := 0
   B. For every ⟨x[i], y[i]⟩ ∈ D:
      (a) Compute output (prediction)
      (b) Calculate error
      (c) Update Δw, Δb
   C. Update w, b:  w := w + Δw,  b := b + Δb
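The difference between the two modes is where the update happens. A NumPy sketch of one epoch of each, using a linear output for brevity (no activation; all names here are illustrative):

```python
import numpy as np

def epoch_online(X, y, w, b):
    """'On-line' mode: w and b change after every single example,
    so later examples in the epoch see already-updated parameters."""
    for x_i, y_i in zip(X, y):
        err = y_i - (x_i.dot(w) + b)       # error w.r.t. current params
        w, b = w + err * x_i, b + err      # update immediately
    return w, b

def epoch_batch(X, y, w, b):
    """Batch mode: accumulate Delta-w and Delta-b over the whole
    epoch (with the parameters held fixed), then apply them once."""
    dw, db = np.zeros_like(w), 0.0
    for x_i, y_i in zip(X, y):
        err = y_i - (x_i.dot(w) + b)       # error w.r.t. start-of-epoch params
        dw, db = dw + err * x_i, db + err  # accumulate Delta-w, Delta-b
    return w + dw, b + db                  # update once per epoch

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 0.0])
w0, b0 = np.zeros(2), 0.0
w_on, b_on = epoch_online(X, y, w0, b0)
w_ba, b_ba = epoch_batch(X, y, w0, b0)
```

Even on this two-example toy set the two epochs end in different parameters, because the on-line version's second update already sees the effect of the first.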


In practice, we usually shuffle the
 C. Update w, b : <latexit sha1_base64="fvrM1fkDFlOVkxHoGkYgMiCFIOA=">AAAB9HicbVDLSgMxFL2pr1pfVZdugkVwIWVGBV0W3bisYB/QDiWTZtrQTGZMMpUy9DvcuFDErR/jzr8x085CWw8EDufcyz05fiy4No7zjQorq2vrG8XN0tb2zu5eef+gqaNEUdagkYhU2yeaCS5Zw3AjWDtWjIS+YC1/dJv5rTFTmkfywUxi5oVkIHnAKTFW8rohMUM/SJ+mZ9jvlStO1ZkBLxM3JxXIUe+Vv7r9iCYhk4YKonXHdWLjpUQZTgWblrqJZjGhIzJgHUslCZn20lnoKT6xSh8HkbJPGjxTf2+kJNR6Evp2MgupF71M/M/rJCa49lIu48QwSeeHgkRgE+GsAdznilEjJpYQqrjNiumQKEKN7alkS3AXv7xMmudV96Lq3F9Wajd5HUU4gmM4BReuoAZ3UIcGUHiEZ3iFNzRGL+gdfcxHCyjfOYQ/QJ8/gEuR6Q==</latexit>

dataset prior to each epoch
 w := w + w, b := + b


to prevent cycles
<latexit sha1_base64="cm3FQz8Hc8HW9gDuCX9rtFlcymA=">AAACKHicbVDLSgMxFM34rPU16tJNsAiCUmZUUASxqAuXFewDOqUkaaYNzWSGJKOUoZ/jxl9xI6JIt36JmXbA2nogcHLOvdx7D444U9pxhtbc/MLi0nJuJb+6tr6xaW9tV1UYS0IrJOShrGOkKGeCVjTTnNYjSVGAOa3h3k3q1x6pVCwUD7of0WaAOoL5jCBtpJZ95QVId7GfPA3gxSWc+B1C75ZyjSa1I4jTql8Pt+yCU3RGgLPEzUgBZCi37HevHZI4oEITjpRquE6kmwmSmhFOB3kvVjRCpIc6tGGoQAFVzWR06ADuG6UN/VCaJzQcqZMdCQqU6gfYVKZLq2kvFf/zGrH2z5sJE1GsqSDjQX7MoQ5hmhpsM0mJ5n1DEJHM7ApJF0lEtMk2b0Jwp0+eJdXjontSdO5PC6XrLI4c2AV74AC44AyUwB0ogwog4Bm8gg/wab1Yb9aXNRyXzllZzw74A+v7BybXpBo=</latexit>

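The two loops above can be sketched in NumPy. This is an illustrative sketch only: the prediction, error, and update steps are filled in with a simple linear unit and an LMS-style gradient step, and the function names and learning rate `eta` are assumptions, not from the slides.

```python
import numpy as np

def train_online(X, y, eta=0.1, epochs=200, seed=1):
    """'On-line' mode: update w, b immediately after each example."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        # shuffle the dataset prior to each epoch to prevent cycles
        for i in rng.permutation(len(y)):
            yhat = X[i] @ w + b          # (a) compute output (prediction)
            error = y[i] - yhat          # (b) calculate error
            w += eta * error * X[i]      # (c) update w, b
            b += eta * error
    return w, b

def train_batch(X, y, eta=0.1, epochs=200):
    """Batch mode: accumulate Δw, Δb over the epoch, then update once."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        dw, db = np.zeros_like(w), 0.0   # A. initialize Δw := 0, Δb := 0
        for i in range(len(y)):          # B. for every ⟨x^[i], y^[i]⟩ ∈ D
            yhat = X[i] @ w + b          #    (a) compute output (prediction)
            error = y[i] - yhat          #    (b) calculate error
            dw += eta * error * X[i]     #    (c) update Δw, Δb
            db += eta * error
        w, b = w + dw, b + db            # C. update w := w + Δw, b := b + Δb
    return w, b
```

On a toy linear dataset both variants converge to the same weights; the difference is only when the updates are applied.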


General Learning Principle

Let D = (⟨x^[1], y^[1]⟩, ⟨x^[2], y^[2]⟩, ..., ⟨x^[n], y^[n]⟩) ∈ (ℝ^m × {0, 1})^n

"On-line" mode "On-line" mode II (alternative)


m 1 m 1
1. Initialize w := 0
<latexit sha1_base64="oErrezlcnZNNwNypy+oJZI2iT/E=">AAAB/nicbVDLSsNAFL3xWesrKq7cDBbBjSVRQRGEohuXFewD2lgm00k7dCYJMxOlhIC/4saFIm79Dnf+jZO2C209MHA4517umePHnCntON/W3PzC4tJyYaW4ura+sWlvbddVlEhCayTikWz6WFHOQlrTTHPajCXFwue04Q+uc7/xQKViUXinhzH1BO6FLGAEayN17N22wLrvB+ljhi4ukXOfiiM369glp+yMgGaJOyElmKDasb/a3YgkgoaacKxUy3Vi7aVYakY4zYrtRNEYkwHu0ZahIRZUeekofoYOjNJFQSTNCzUaqb83UiyUGgrfTOZh1bSXi/95rUQH517KwjjRNCTjQ0HCkY5Q3gXqMkmJ5kNDMJHMZEWkjyUm2jRWNCW401+eJfXjsntSdm5PS5WrSR0F2IN9OAQXzqACN1CFGhBI4Rle4c16sl6sd+tjPDpnTXZ24A+szx/fu5TE</latexit>
, b := 0
<latexit sha1_base64="NlqcCPP+x7/BYO7u8+fZM6fVAMw=">AAAB+HicbVDLSsNAFL2pr1ofjbp0M1gKrkqiBUUQim5cVrAPaEOZTCft0MkkzEyEGvolblwo4tZPceffOGmz0NYDFw7n3MucOX7MmdKO820V1tY3NreK26Wd3b39sn1w2FZRIgltkYhHsutjRTkTtKWZ5rQbS4pDn9OOP7nN/M4jlYpF4kFPY+qFeCRYwAjWRhrY5X6I9dgPUn+Grq6RM7ArTs2ZA60SNycVyNEc2F/9YUSSkApNOFaq5zqx9lIsNSOczkr9RNEYkwke0Z6hAodUeek8+AxVjTJEQSTNCI3m6u+LFIdKTUOTrZrFVMteJv7n9RIdXHopE3GiqSCLh4KEIx2hrAU0ZJISzaeGYCKZyYrIGEtMtOmqZEpwl7+8StpnNfe85tzXK42bvI4iHMMJnIILF9CAO2hCCwgk8Ayv8GY9WS/Wu/WxWC1Y+c0R/IH1+QOB5ZJS</latexit>
1. Initialize w := 0
<latexit sha1_base64="oErrezlcnZNNwNypy+oJZI2iT/E=">AAAB/nicbVDLSsNAFL3xWesrKq7cDBbBjSVRQRGEohuXFewD2lgm00k7dCYJMxOlhIC/4saFIm79Dnf+jZO2C209MHA4517umePHnCntON/W3PzC4tJyYaW4ura+sWlvbddVlEhCayTikWz6WFHOQlrTTHPajCXFwue04Q+uc7/xQKViUXinhzH1BO6FLGAEayN17N22wLrvB+ljhi4ukXOfiiM369glp+yMgGaJOyElmKDasb/a3YgkgoaacKxUy3Vi7aVYakY4zYrtRNEYkwHu0ZahIRZUeekofoYOjNJFQSTNCzUaqb83UiyUGgrfTOZh1bSXi/95rUQH517KwjjRNCTjQ0HCkY5Q3gXqMkmJ5kNDMJHMZEWkjyUm2jRWNCW401+eJfXjsntSdm5PS5WrSR0F2IN9OAQXzqACN1CFGhBI4Rle4c16sl6sd+tjPDpnTXZ24A+szx/fu5TE</latexit>
, b := 0
<latexit sha1_base64="NlqcCPP+x7/BYO7u8+fZM6fVAMw=">AAAB+HicbVDLSsNAFL2pr1ofjbp0M1gKrkqiBUUQim5cVrAPaEOZTCft0MkkzEyEGvolblwo4tZPceffOGmz0NYDFw7n3MucOX7MmdKO820V1tY3NreK26Wd3b39sn1w2FZRIgltkYhHsutjRTkTtKWZ5rQbS4pDn9OOP7nN/M4jlYpF4kFPY+qFeCRYwAjWRhrY5X6I9dgPUn+Grq6RM7ArTs2ZA60SNycVyNEc2F/9YUSSkApNOFaq5zqx9lIsNSOczkr9RNEYkwke0Z6hAodUeek8+AxVjTJEQSTNCI3m6u+LFIdKTUOTrZrFVMteJv7n9RIdXHopE3GiqSCLh4KEIx2hrAU0ZJISzaeGYCKZyYrIGEtMtOmqZEpwl7+8StpnNfe85tzXK42bvI4iHMMJnIILF9CAO2hCCwgk8Ayv8GY9WS/Wu/WxWC1Y+c0R/IH1+QOB5ZJS</latexit>

2. For every training epoch: 2. For for t iterations:


A. For every hx[i] , y [i] i 2 D <latexit sha1_base64="KiRtzBWzB6OcgzuVeAuk4oyYAnU=">AAACJHicbZDLSsNAFIYn9VbrLerSzWARXEhJRFBwU9SFywr2Akksk+mkHTqZhJmJGEIexo2v4saFF1y48VmctFlo64GBj//8hznn92NGpbKsL6OysLi0vFJdra2tb2xumds7HRklApM2jlgkej6ShFFO2ooqRnqxICj0Gen648ui370nQtKI36o0Jl6IhpwGFCOlpb557jLEh4xAN0Rq5AfZQ36XOdTLj2CWlghdUXoon/owYtlV3jfrVsOaFJwHu4Q6KKvVN9/dQYSTkHCFGZLSsa1YeRkSimJG8pqbSBIjPEZD4mjkKCTSyyZH5vBAKwMYREI/ruBE/T2RoVDKNPS1s1hRzvYK8b+ek6jgzMsojxNFOJ5+FCQMqggWicEBFQQrlmpAWFC9K8QjJBBWOteaDsGePXkeOscNW/PNSb15UcZRBXtgHxwCG5yCJrgGLdAGGDyCZ/AK3own48X4MD6n1opRzuyCP2V8/wAUwKWr</latexit>
: A. Pick random hx[i] , y [i] i 2 D : <latexit sha1_base64="KiRtzBWzB6OcgzuVeAuk4oyYAnU=">AAACJHicbZDLSsNAFIYn9VbrLerSzWARXEhJRFBwU9SFywr2Akksk+mkHTqZhJmJGEIexo2v4saFF1y48VmctFlo64GBj//8hznn92NGpbKsL6OysLi0vFJdra2tb2xumds7HRklApM2jlgkej6ShFFO2ooqRnqxICj0Gen648ui370nQtKI36o0Jl6IhpwGFCOlpb557jLEh4xAN0Rq5AfZQ36XOdTLj2CWlghdUXoon/owYtlV3jfrVsOaFJwHu4Q6KKvVN9/dQYSTkHCFGZLSsa1YeRkSimJG8pqbSBIjPEZD4mjkKCTSyyZH5vBAKwMYREI/ruBE/T2RoVDKNPS1s1hRzvYK8b+ek6jgzMsojxNFOJ5+FCQMqggWicEBFQQrlmpAWFC9K8QjJBBWOteaDsGePXkeOscNW/PNSb15UcZRBXtgHxwCG5yCJrgGLdAGGDyCZ/AK3own48X4MD6n1opRzuyCP2V8/wAUwKWr</latexit>

(a) Compute output (prediction) (a) Compute output (prediction)


(b) Calculate error (b) Calculate error

(c) Update w, b <latexit sha1_base64="fvrM1fkDFlOVkxHoGkYgMiCFIOA=">AAAB9HicbVDLSgMxFL2pr1pfVZdugkVwIWVGBV0W3bisYB/QDiWTZtrQTGZMMpUy9DvcuFDErR/jzr8x085CWw8EDufcyz05fiy4No7zjQorq2vrG8XN0tb2zu5eef+gqaNEUdagkYhU2yeaCS5Zw3AjWDtWjIS+YC1/dJv5rTFTmkfywUxi5oVkIHnAKTFW8rohMUM/SJ+mZ9jvlStO1ZkBLxM3JxXIUe+Vv7r9iCYhk4YKonXHdWLjpUQZTgWblrqJZjGhIzJgHUslCZn20lnoKT6xSh8HkbJPGjxTf2+kJNR6Evp2MgupF71M/M/rJCa49lIu48QwSeeHgkRgE+GsAdznilEjJpYQqrjNiumQKEKN7alkS3AXv7xMmudV96Lq3F9Wajd5HUU4gmM4BReuoAZ3UIcGUHiEZ3iFNzRGL+gdfcxHCyjfOYQ/QJ8/gEuR6Q==</latexit>


(c) Update w, b <latexit sha1_base64="fvrM1fkDFlOVkxHoGkYgMiCFIOA=">AAAB9HicbVDLSgMxFL2pr1pfVZdugkVwIWVGBV0W3bisYB/QDiWTZtrQTGZMMpUy9DvcuFDErR/jzr8x085CWw8EDufcyz05fiy4No7zjQorq2vrG8XN0tb2zu5eef+gqaNEUdagkYhU2yeaCS5Zw3AjWDtWjIS+YC1/dJv5rTFTmkfywUxi5oVkIHnAKTFW8rohMUM/SJ+mZ9jvlStO1ZkBLxM3JxXIUe+Vv7r9iCYhk4YKonXHdWLjpUQZTgWblrqJZjGhIzJgHUslCZn20lnoKT6xSh8HkbJPGjxTf2+kJNR6Evp2MgupF71M/M/rJCa49lIu48QwSeeHgkRgE+GsAdznilEjJpYQqrjNiumQKEKN7alkS3AXv7xMmudV96Lq3F9Wajd5HUU4gmM4BReuoAZ3UIcGUHiEZ3iFNzRGL+gdfcxHCyjfOYQ/QJ8/gEuR6Q==</latexit>

stochastic
"semi"-stochastic (actually, not really stochastic because a fixed training
set instead of sampling from the population)

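Mode II differs only in how examples are drawn: instead of sweeping shuffled epochs, one random example is picked in each of the t iterations. A sketch under the same assumptions as before (placeholder linear unit, LMS-style step, assumed function name):

```python
import numpy as np

def train_online_random(X, y, eta=0.1, t=5000, seed=1):
    """'On-line' mode II: t iterations, one random example per iteration."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(t):
        i = rng.integers(len(y))     # A. pick a random ⟨x^[i], y^[i]⟩ ∈ D
        yhat = X[i] @ w + b          # (a) compute output (prediction)
        error = y[i] - yhat          # (b) calculate error
        w += eta * error * X[i]      # (c) update w, b
        b += eta * error
    return w, b
```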


General Learning Principle
Let D = (⟨x^[1], y^[1]⟩, ⟨x^[2], y^[2]⟩, ..., ⟨x^[n], y^[n]⟩) ∈ (ℝ^m × {0, 1})^n

Minibatch mode
(mix between on-line and batch)

The most common mode in deep learning. Any ideas why?

   1. Initialize w := 0^m, b := 0
   2. For every training epoch:
      A. Initialize Δw := 0, Δb := 0
      B. For every {⟨x^[i], y^[i]⟩, ..., ⟨x^[i+k], y^[i+k]⟩} ⊂ D:
         (a) Compute output (prediction)
         (b) Calculate error
         (c) Update Δw, Δb
      C. Update w, b:
         w := w + Δw,   b := b + Δb



General Learning Principle
Let D = (⟨x^[1], y^[1]⟩, ⟨x^[2], y^[2]⟩, ..., ⟨x^[n], y^[n]⟩) ∈ (ℝ^m × {0, 1})^n

Minibatch mode
(mix between on-line and batch)

Most commonly used in DL, because

   1. Choosing a subset (vs. 1 example at a time) takes advantage of
      vectorization (faster iteration through an epoch than "on-line").

   2. Having fewer updates than "on-line" makes the updates less noisy.

   3. It makes more updates per epoch than "batch" and is thus faster.

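Point 1 is easiest to see in code: each minibatch is processed with one matrix-vector product instead of k separate dot products. A sketch with the same assumed placeholder linear unit and LMS-style step; note that, as is standard practice, the update here is applied after every minibatch rather than once per epoch:

```python
import numpy as np

def train_minibatch(X, y, eta=0.1, epochs=300, k=2, seed=1):
    """Minibatch mode: one vectorized update per batch of k examples."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        idx = rng.permutation(len(y))        # shuffle before each epoch
        for start in range(0, len(y), k):
            batch = idx[start:start + k]
            yhat = X[batch] @ w + b          # (a) one vectorized forward pass
            error = y[batch] - yhat          # (b) errors for the whole batch
            w += eta * X[batch].T @ error    # (c) accumulated Δw and Δb,
            b += eta * error.sum()           #     applied per minibatch
    return w, b
```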
Linear Regression

Perceptron: Activation function is the threshold function.
The output is a binary label ŷ ∈ {0, 1}.

[Figure: single-neuron diagram. Inputs x_1, x_2, ..., x_m with weights
w_1, w_2, ..., w_m and bias b are summed into the net input (Σ), which is
passed through the activation function to produce the output.]
Linear Regression: Activation function is the identity function,

   σ(x) = x

The output is a real number ŷ ∈ ℝ.

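The contrast between the two activations can be made concrete with a minimal sketch; the `>= 0` threshold convention and the function names here are my own assumptions:

```python
import numpy as np

def net_input(x, w, b):
    """Weighted sum z = w^T x + b (the Σ node in the diagram)."""
    return np.dot(x, w) + b

def perceptron_output(x, w, b):
    """Threshold activation: binary label ŷ ∈ {0, 1}."""
    return int(net_input(x, w, b) >= 0.0)

def linreg_output(x, w, b):
    """Identity activation σ(z) = z: real-valued output ŷ ∈ ℝ."""
    return net_input(x, w, b)
```

With x = (1, 2), w = (0.5, -1), b = 0.25, the net input is -1.25, so the perceptron outputs the label 0 while the linear neuron outputs -1.25 itself.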


Linear Regression

Perceptron: Activation function is the threshold function.
The output is a binary label ŷ ∈ {0, 1}.

[Figure: the same single-neuron diagram as above, annotated:
"You can think of this as a linear neuron!"]

Inputs
<latexit sha1_base64="kW2ZbIA+FSvwKPaKbZPllX8WYNo=">AAAB9HicbZBNS8NAEIYnftb6VfXoZbEInkoigh6LXvRWwX5AG8pmu2mXbjZxd1Isob/DiwdFvPpjvPlv3LY5aOsLCw/vzLAzb5BIYdB1v52V1bX1jc3CVnF7Z3dvv3Rw2DBxqhmvs1jGuhVQw6VQvI4CJW8lmtMokLwZDG+m9eaIayNi9YDjhPsR7SsRCkbRWn4H+RNmdypJ0Uy6pbJbcWciy+DlUIZctW7pq9OLWRpxhUxSY9qem6CfUY2CST4pdlLDE8qGtM/bFhWNuPGz2dITcmqdHgljbZ9CMnN/T2Q0MmYcBbYzojgwi7Wp+V+tnWJ45WdiehNXbP5RmEqCMZkmQHpCc4ZybIEyLeyuhA2opgxtTkUbgrd48jI0ziue5fuLcvU6j6MAx3ACZ+DBJVThFmpQBwaP8Ayv8OaMnBfn3fmYt644+cwR/JHz+QONXpKY</latexit>
Linear Regression: Activation function is the identity function

<latexit sha1_base64="Ydlldv6MD7qsnCb2aSsw8EVIUWA=">AAAB9HicbVBNSwMxEJ31s9avqkcvwSLUS9lVQS9C0YvHCvYD2qVk02wbmmTXJFtalv4OLx4U8eqP8ea/MW33oK0PBh7vzTAzL4g508Z1v52V1bX1jc3cVn57Z3dvv3BwWNdRogitkYhHqhlgTTmTtGaY4bQZK4pFwGkjGNxN/caQKs0i+WjGMfUF7kkWMoKNlfy2Zj2BS6MzdINGnULRLbszoGXiZaQIGaqdwle7G5FEUGkIx1q3PDc2foqVYYTTSb6daBpjMsA92rJUYkG1n86OnqBTq3RRGClb0qCZ+nsixULrsQhsp8Cmrxe9qfif10pMeO2nTMaJoZLMF4UJRyZC0wRQlylKDB9bgoli9lZE+lhhYmxOeRuCt/jyMqmfl72LsvtwWazcZnHk4BhOoAQeXEEF7qEKNSDwBM/wCm/O0Hlx3p2PeeuKk80cwR84nz9ZypEp</latexit>
(x) = x

The output is a real number ŷ 2 R


<latexit sha1_base64="34cYQ5eewXcqLxIiHqs6CduP9zk=">AAAB/3icbVDLSsNAFJ3UV62vqODGzWARXJVEBV0W3bisYh/QhDKZTtuhk0mYuRFKzMJfceNCEbf+hjv/xkmbhbYeGDiccy/3zAliwTU4zrdVWlpeWV0rr1c2Nre2d+zdvZaOEkVZk0YiUp2AaCa4ZE3gIFgnVoyEgWDtYHyd++0HpjSP5D1MYuaHZCj5gFMCRurZB96IQDrJsMcl9kICoyBI77KeXXVqzhR4kbgFqaICjZ795fUjmoRMAhVE667rxOCnRAGngmUVL9EsJnRMhqxrqCQh0346zZ/hY6P08SBS5knAU/X3RkpCrSdhYCbzhHrey8X/vG4Cg0s/5TJOgEk6OzRIBIYI52XgPleMgpgYQqjiJiumI6IIBVNZxZTgzn95kbROa+5Zzbk9r9avijrK6BAdoRPkogtURzeogZqIokf0jF7Rm/VkvVjv1sdstGQVO/voD6zPHwgUlho=</latexit>

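The linear neuron above can be sketched in a few lines of NumPy (a minimal illustration; the input values, weights, and bias below are made up):

```python
import numpy as np

def linear_neuron(x, w, b):
    """Net input w^T x + b followed by the identity activation."""
    net_input = np.dot(x, w) + b
    return net_input  # sigma(z) = z, so the output is the net input itself

# made-up example values
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.1
print(linear_neuron(x, w, b))  # 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1
```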




(Least-Squares) Linear Regression

In earlier statistics classes, you probably fit a model like this:

ŷ = Xw

using the "normal equations:"

w = (X^T X)^{-1} X^T y

(implying that the bias is included, and the
design matrix has an additional vector of 1's)

• Generally, this is the best approach for linear regression (although
  the matrix inversion might be problematic on large datasets)
• However, we will now learn about another way to learn these
  parameters iteratively
• Why? Because this is what we will be doing in deep neural nets
  later, where we have large datasets, many connections, and
  non-convex loss functions

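As a sketch, the normal equations can be evaluated directly in NumPy (the data here is synthetic; in practice `np.linalg.lstsq` or `np.linalg.solve` is preferable to an explicit matrix inverse):

```python
import numpy as np

# synthetic, noise-free data with known weights and bias (illustrative)
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0

# append a column of 1's so the bias becomes part of w
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# normal equations: w = (X^T X)^{-1} X^T y
w = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y
print(w)  # recovers [1.5, -2.0, 0.5, 4.0] up to numerical precision
```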


(Least-Squares) Linear Regression

• A very naive way to fit a linear regression model (and any neural net)
  is to start with all-zero or random parameters
• Then, for k rounds:
  • Choose another random set of weights
  • If the model performs better, keep those weights
  • If the model performs worse, discard the weights

This approach is guaranteed to find the optimal solution
for very large k, but it would be terribly slow.

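A sketch of this naive procedure on toy data (here each "random set of weights" is drawn as a random perturbation of the current best, one common variant; the data, step size, and round count are made up):

```python
import numpy as np

rng = np.random.RandomState(1)
X = rng.rand(50, 2)
y = X @ np.array([2.0, -1.0]) + 0.5  # synthetic targets

def mse(w, b):
    return np.mean((y - (X @ w + b)) ** 2)

w, b = np.zeros(2), 0.0
best = mse(w, b)
for _ in range(5000):                 # k rounds
    w_new = w + rng.randn(2) * 0.1    # random candidate weights
    b_new = b + rng.randn() * 0.1
    loss = mse(w_new, b_new)
    if loss < best:                   # better: keep, worse: discard
        w, b, best = w_new, b_new, loss
print(best)  # much smaller than the initial loss, but wastefully slow
```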


(Least-Squares) Linear Regression

• There's a better way!

• We will analyze what effect a change of a parameter has on the
  predictive performance (loss) of the model;
  then, we change the weight a little bit in the direction that
  improves the performance (minimizes the loss) the most
• We do this in several (small) steps until the loss does not further
  decrease



(Least-Squares) Linear Regression

The update rule turns out to be this:

"On-line" mode

1. Initialize w := 0^m, b := 0
2. For every training epoch:
   A. For every ⟨x^[i], y^[i]⟩ ∈ D:
      (a) ŷ^[i] := x^[i]ᵀ w + b
      (b) err := (y^[i] - ŷ^[i])
      (c) w := w + err × x^[i]
          b := b + err

written in terms of the gradient of the loss L:

      (b) ∇_w L = -(y^[i] - ŷ^[i]) x^[i]
          ∇_b L = -(y^[i] - ŷ^[i])
      (c) w := w + η × (-∇_w L)        [η: learning rate]
          b := b + η × (-∇_b L)        [-∇L: negative gradient]
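The on-line update rule above can be sketched in NumPy as follows (the toy data, learning rate, and epoch count are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.RandomState(2)
X = rng.rand(200, 2)
y = X @ np.array([2.0, -1.0]) + 0.5   # noise-free synthetic data

w, b = np.zeros(2), 0.0               # 1. initialize w := 0^m, b := 0
eta = 0.1                             # learning rate
for epoch in range(50):               # 2. for every training epoch
    for x_i, y_i in zip(X, y):        #    A. for every <x^[i], y^[i]> in D
        y_hat = x_i @ w + b           #       (a) prediction
        grad_w = -(y_i - y_hat) * x_i #       (b) gradient of the loss
        grad_b = -(y_i - y_hat)
        w += eta * -grad_w            #       (c) step along the negative gradient
        b += eta * -grad_b
print(w, b)  # close to the true parameters [2.0, -1.0] and 0.5
```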
(Least-Squares) Linear Regression

The update rule turns out to be this ("on-line" mode), written two ways:

Vectorized:
1. Initialize w := 0^m, b := 0
2. For every training epoch:
   A. For every ⟨x^[i], y^[i]⟩ ∈ D:
      (a) ŷ^[i] := x^[i]ᵀ w + b
      (b) ∇_w L = -(y^[i] - ŷ^[i]) x^[i];  ∇_b L = -(y^[i] - ŷ^[i])
      (c) w := w + η × (-∇_w L);  b := b + η × (-∇_b L)

For-Loop:
1. Initialize w := 0^m, b := 0
2. For every training epoch:
   A. For every ⟨x^[i], y^[i]⟩ ∈ D:
      (a) ŷ^[i] := x^[i]ᵀ w + b
      B. For weight j in {1, ..., m}:
         (b) ∂L/∂w_j = -(y^[i] - ŷ^[i]) x_j^[i]
         (c) w_j := w_j + η × (-∂L/∂w_j)
      C. ∂L/∂b = -(y^[i] - ŷ^[i])
         b := b + η × (-∂L/∂b)

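A small check (with made-up numbers) that the vectorized column and the per-weight for-loop column compute the same update:

```python
import numpy as np

x = np.array([0.5, 2.0, -1.0])        # one training example (made up)
y, eta = 1.0, 0.1
w, b = np.array([0.1, -0.2, 0.3]), 0.05

y_hat = x @ w + b

# vectorized update
w_vec = w + eta * (y - y_hat) * x

# for-loop update, one weight at a time
w_loop = w.copy()
for j in range(len(w)):
    dL_dwj = -(y - y_hat) * x[j]      # partial derivative dL/dw_j
    w_loop[j] = w_loop[j] + eta * (-dL_dwj)

print(np.allclose(w_vec, w_loop))  # True
```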


(Least-Squares) Linear Regression

The update rule ("on-line" mode), once more:

      ŷ^[i] := x^[i]ᵀ w + b
      w := w + η × (y^[i] - ŷ^[i]) × x^[i]
      b := b + η × (y^[i] - ŷ^[i])

Coincidentally, this appears almost
the same as the perceptron rule,
except that the prediction is a real number
and we have a learning rate.



This learning rule (from the previous slide) is called (stochastic) gradient descent.
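As a concrete sketch, the per-example update rule can be written in plain Python (the function name `sgd_epoch`, the learning-rate value, and the squared-error-loss assumption are illustrative, not taken from the slides):

```python
import random

def sgd_epoch(w, b, X, y, eta=0.01):
    """One stochastic-gradient-descent pass over the dataset for a single
    linear neuron with squared-error loss; updates after every example."""
    indices = list(range(len(X)))
    random.shuffle(indices)  # "stochastic": visit training examples in random order
    for i in indices:
        y_hat = sum(wj * xj for wj, xj in zip(w, X[i])) + b  # net input
        error = y[i] - y_hat                                 # y[i] - ŷ[i]
        # w_j := w_j + η × (−∂L/∂w_j) = w_j + η × error × x_j
        w = [wj + eta * error * xj for wj, xj in zip(w, X[i])]
        # b := b + η × (−∂L/∂b) = b + η × error
        b = b + eta * error
    return w, b
```

Calling `sgd_epoch` repeatedly on data generated from y = 2x, for instance, drives the weight toward 2 and the bias toward 0.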

So, how did we get there?




DISCUSS HOMEWORK
Due next Thursday (Feb 21) 11:59 pm

https://github.com/rasbt/stat479-deep-learning-ss19/blob/master/hw2/hw2.ipynb

(explain LaTeX editing)



First, let's briefly cover relevant background info ...



Differential Calculus Refresher
Derivative of a function = "rate of change" = "slope"
Example: f(x) = 2x

Slope = [f(a + Δa) − f(a)] / [(a + Δa) − a] = [f(a + Δa) − f(a)] / Δa



Definition of the derivative:

f′(x) = df/dx = lim_{Δx→0} [f(x + Δx) − f(x)] / Δx

Example 1: f(x) = 2x

df/dx = lim_{Δx→0} [f(x + Δx) − f(x)] / Δx
      = lim_{Δx→0} [2x + 2Δx − 2x] / Δx
      = lim_{Δx→0} 2Δx / Δx
      = 2.



Numerical vs Analytical/Symbolical Derivatives
f′(x) = df/dx = lim_{Δx→0} [f(x + Δx) − f(x)] / Δx

Example 2: f(x) = x²

df/dx = lim_{Δx→0} [f(x + Δx) − f(x)] / Δx
      = lim_{Δx→0} [x² + 2xΔx + (Δx)² − x²] / Δx
      = lim_{Δx→0} [2xΔx + (Δx)²] / Δx
      = lim_{Δx→0} (2x + Δx)
      = 2x.
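A minimal sketch contrasting the numerical (secant-slope) estimate with the analytical result d/dx x² = 2x; the helper name `numerical_derivative` and the choice Δx = 1e−6 are illustrative:

```python
def numerical_derivative(f, x, delta=1e-6):
    """Approximate df/dx by the slope of a secant through x and x + delta."""
    return (f(x + delta) - f(x)) / delta

f = lambda x: x**2
x = 3.0
numeric = numerical_derivative(f, x)  # ≈ 2x + delta, for small delta
analytic = 2 * x                      # exact result from the limit above
```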



Numerical vs Analytical/Symbolical Derivatives

Conceptually, we obtained the derivative d/dx x² = 2x analytically (symbolically).

Numerically, we can instead approximate the slope of the tangent by a secant between two points (as before).
A Cheatsheet for You (1)
     Function f(x)    Derivative with respect to x

 1   a                0
 2   x                1
 3   ax               a
 4   x²               2x
 5   x^a              a·x^(a−1)
 6   a^x              log(a)·a^x
 7   log(x)           1/x
 8   log_a(x)         1/(x·log(a))
 9   sin(x)           cos(x)
10   cos(x)           −sin(x)
11   tan(x)           sec²(x)

Table D.1: Derivatives of common functions.


A Cheatsheet for You (2)
                   Function        Derivative

Sum Rule           f(x) + g(x)     f′(x) + g′(x)
Difference Rule    f(x) − g(x)     f′(x) − g′(x)
Product Rule       f(x)g(x)        f′(x)g(x) + f(x)g′(x)
Quotient Rule      f(x)/g(x)       [g(x)f′(x) − f(x)g′(x)] / [g(x)]²
Reciprocal Rule    1/f(x)          −f′(x) / [f(x)]²
Chain Rule         f(g(x))         f′(g(x))·g′(x)

Table D.2: Common differentiation rules.
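As a quick sanity check, the product rule from Table D.2 can be verified numerically; the sin/exp function pair and the central-difference helper `d` are arbitrary illustrative choices:

```python
import math

def d(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f, g = math.sin, math.exp
x = 0.7
lhs = d(lambda t: f(t) * g(t), x)        # derivative of the product
rhs = d(f, x) * g(x) + f(x) * d(g, x)    # f'(x)g(x) + f(x)g'(x)
```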

D.4 The Chain Rule – Computing the Derivative of a Composition of Functions
Chain Rule

• The chain rule is basically the essence of training (deep) neural networks

• If you understand and learn how to apply the chain rule to various function decompositions, deep learning will be super easy and even seem trivial to you from now on

• In fact, neural networks will become even easier to understand than any algorithm you learned about in my previous ML class



Chain Rule & "Computation Graph" Intuition
Decomposition of some (nested) function F(x) = f(g(x)) into an "inner" part g(x) and an "outer" part f(·).

Figure D.3: Visual decomposition of a function

Using the chain rule, Figure D.4 illustrates how we can derive F′(x) via two parallel steps: we compute the derivative of the inner function g (i.e., g′(x)) and multiply it by the outer derivative f′(g(x)).

Derivative of that nested function: F′(x) = f′(g(x))·g′(x)


Figure D.4: Concept of the chain rule

Chain Rule & "Computation Graph" Intuition

Later, we will see that PyTorch can do that automatically for us :)

(PyTorch literally keeps a computation graph in the background)

Also, PyTorch can compute the derivatives of most (differentiable) functions automatically



Chain Rule & "Computation Graph" Intuition

In text, for efficiency, we will mostly use the Leibniz notation:

d/dx [f(g(x))] = df/dg · dg/dx     (D.12)

(Remember that the equation above is equivalent to writing F′(x) = f′(g(x))g′(x).)

D.4.1 A Chain Rule Example

Let us now walk through an application of the chain rule.
Chain Rule Example

d/dx [f(g(x))] = df/dg · dg/dx

Example: f(x) = log(√x)

Substituting into the chain rule: df/dx = d/dg log(g) · d/dx √x

with

d/dg log(g) = 1/g = 1/√x   and   d/dx √x = d/dx x^(1/2) = (1/2)·x^(−1/2) = 1/(2√x),

which leads us to the solution: df/dx = (1/√x) · 1/(2√x) = 1/(2x).
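The worked result df/dx = 1/(2x) can be checked against a finite-difference estimate (the helper name `numeric_grad` and the evaluation point x = 4 are illustrative):

```python
import math

def numeric_grad(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: math.log(math.sqrt(x))
x = 4.0
analytic = 1 / (2 * x)        # the result derived above: 1/(2x)
numeric = numeric_grad(f, x)  # finite-difference estimate
```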



Chain Rule for Arbitrarily Long Function Compositions

F(x) = f(g(h(u(v(x)))))

dF/dx = d/dx F(x) = d/dx f(g(h(u(v(x)))))

      = df/dg · dg/dh · dh/du · du/dv · dv/dx



Chain Rule for Arbitrarily Long Function Compositions
dF/dx = d/dx f(g(h(u(v(x)))))
      = df/dg · dg/dh · dh/du · du/dv · dv/dx

Also called "reverse mode," as we start with the outer function. In neural nets, this will be from right to left.

We could also start from the inner parts ("forward mode"):

dv/dx · du/dv · dh/du · dg/dh · df/dg

• Backpropagation (covered later) is basically "reverse" mode auto-differentiation

• It is cheaper than forward mode if we work with gradients, since then we have matrix-"vector" multiplications instead of matrix-matrix multiplications
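A toy sketch of reverse-mode accumulation for a chain of one-argument functions. The particular chain sin(exp(x²)) and all names are illustrative; this is the idea behind backpropagation, not what PyTorch does internally verbatim:

```python
import math

# F(x) = sin(exp(x**2)), written as a chain of elementary functions
funcs  = [lambda t: t**2, math.exp, math.sin]  # inner -> outer
derivs = [lambda t: 2 * t, math.exp, math.cos]  # matching local derivatives

def forward_with_grad(x):
    """Forward pass recording intermediate inputs, then reverse-mode
    accumulation of the chain rule (outer derivatives first)."""
    inputs = []
    for fn in funcs:
        inputs.append(x)  # remember the input each function received
        x = fn(x)
    grad = 1.0
    for inp, d in zip(reversed(inputs), reversed(derivs)):
        grad *= d(inp)    # multiply local derivatives from the outside in
    return x, grad
```

The reverse loop multiplies df/dg, dg/dh, ... one factor at a time, which is exactly the right-to-left product in the equation above.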



Gradients: Derivatives of Multivariable* Functions
*Note that in some fields, the terms "multivariable" and "multivariate" are used interchangeably, but here we really mean "multivariable," because "multivariate" means "multiple outputs," which is not the case here -- similarly, most DL applications output one prediction value, or one prediction value per training example.

f(x, y, z, ...)

∇f = [∂f/∂x, ∂f/∂y, ∂f/∂z, ...]ᵀ

For gradients, we use the "partial" symbol ∂ to denote partial derivatives; this is more of a notational convention, and the concept is the same as before when we were computing ordinary derivatives (denoted by "d").



Gradients: Derivatives of Multivariable Functions

Example: f(x, y) = x²y + y

[3D surface plot of f(x, y)]


Gradients: Derivatives of Multivariable Functions

Intuitively, we can think of the two graphs in Figure 9 as slices of the multivariable function graph shown in Figure 8. And computing the partial derivative of a multivariable function -- with respect to one of the function's arguments -- means that we compute the slope of the corresponding slice of the multivariable function graph.

Now, to compute the gradient of f(x, y) = x²y + y, we compute the two partial derivatives of that function as follows:

∇f(x, y) = [∂f/∂x, ∂f/∂y]ᵀ,     (27)

where

∂f/∂x = ∂/∂x (x²y + y) = 2xy     (28)

(via the power rule and constant rule), and

∂f/∂y = ∂/∂y (x²y + y) = x² + 1.     (29)

So, the gradient of the function f is defined as

∇f(x, y) = [2xy, x² + 1]ᵀ.     (30)
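The gradient in Eq. (30) can be expressed and spot-checked in code (function names are illustrative):

```python
def f(x, y):
    """The example function f(x, y) = x**2 * y + y."""
    return x**2 * y + y

def grad_f(x, y):
    """Analytic gradient of f: [∂f/∂x, ∂f/∂y] = [2xy, x**2 + 1]."""
    return [2 * x * y, x**2 + 1]
```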



Gradients: Derivatives of Multivariable Functions

Example: f(x, y) = x²y + y

[Slice plots: f(x, y) = x²y + y where x = 9 (as a function of y), and f(x, y) = x²y + y where y = 9 (as a function of x)]



Gradients & the Multivariable Chain Rule
Suppose we have a composite function like this:

f(g(x), h(x))

Remember the regular chain rule for a single input:

d/dx [f(g(x))] = df/dg · dg/dx

For two inputs, we now have:

d/dx [f(g(x), h(x))] = ∂f/∂g · dg/dx + ∂f/∂h · dh/dx



Gradients & the Multivariable Chain Rule
Example: f(g, h) = g²h + h, where g(x) = 3x and h(x) = x²

d/dx [f(g(x), h(x))] = ∂f/∂g · dg/dx + ∂f/∂h · dh/dx

with

∂f/∂g = 2gh,   ∂f/∂h = g² + 1,

dg/dx = d/dx 3x = 3,   dh/dx = d/dx x² = 2x.

Putting it together:

d/dx [f(g(x), h(x))] = 2gh · 3 + (g² + 1) · 2x = 18x³ + (9x² + 1) · 2x = 36x³ + 2x.

d ⇥ ⇤
f (g(x)) = [2gh · 3] + [(g 2 + 1) · 2x]
dx
<latexit sha1_base64="nhOHjNJVUiq5mZxAYVCDika55Zk=">AAACR3icbVBNS8MwGE7n9/yqevQSHMqGMNoq6kUQvXhUcCq03UjTtAtLP0hS2Sj7d168evMvePGgiEfTrgedvhDyvM/zvLzJ46WMCmkYL1ptZnZufmFxqb68srq2rm9s3ook45h0cMISfu8hQRiNSUdSych9ygmKPEbuvMFFod89EC5oEt/IUUrcCIUxDShGUlE9vesEHOHcH+f+cAwdj4Z20Aybw1arbFy4dwptK+xDB/uJhAcu3Id2M+xa6jZbFWsNXcepK6c1nChHamBfdfWe3jDaRlnwLzAr0ABVXfX0Z8dPcBaRWGKGhLBNI5VujrikmJFx3ckESREeoJDYCsYoIsLNyxzGcFcxPgwSrk4sYcn+nMhRJMQo8pQzQrIvprWC/E+zMxmcuDmN00ySGE8WBRmDMoFFqNCnnGDJRgogzKl6K8R9pIKVKvoiBHP6y3/BrdU2D9rm9WHj7LyKYxFsgx3QBCY4BmfgElyBDsDgEbyCd/ChPWlv2qf2NbHWtGpmC/yqmvYNMlWqvw==</latexit>
= 2xg 2 + 6gh + 2x

Sebastian Raschka STAT 479: Deep Learning SS 2019 !44
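As a sanity check, this chain-rule result can be verified numerically with a central finite difference. This is a small sketch; the concrete function f(g, h) = g²h + h is an assumption chosen so that ∂f/∂g = 2gh and ∂f/∂h = g² + 1.

```python
def f(g, h):
    # assumed example function with df/dg = 2*g*h and df/dh = g**2 + 1
    return g**2 * h + h

def df_dx_chain(x):
    # multivariable chain rule: df/dg * dg/dx + df/dh * dh/dx
    g, h = 3*x, x**2
    return (2*g*h) * 3 + (g**2 + 1) * (2*x)   # = 2x*g^2 + 6*g*h + 2x

def df_dx_numeric(x, eps=1e-6):
    # central finite difference of x -> f(g(x), h(x))
    fp = f(3*(x + eps), (x + eps)**2)
    fm = f(3*(x - eps), (x - eps)**2)
    return (fp - fm) / (2*eps)

print(df_dx_chain(2.0))    # 292.0
print(df_dx_numeric(2.0))  # numerically close to the same value
```

Since f(g(x), h(x)) = 9x⁴ + x² here, the exact derivative 36x³ + 2x at x = 2 is 292, and both functions agree.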


Gradients & the Multivariable Chain Rule
in Vector Form
f(g(x), h(x))

d/dx f(g(x), h(x)) = ∂f/∂g · dg/dx + ∂f/∂h · dh/dx

In vector form, d/dx f(g(x), h(x)) = ∇f · v'(x), where

v(x) = [g(x), h(x)]ᵀ   and   v'(x) = d/dx [g(x), h(x)]ᵀ = [dg/dx, dh/dx]ᵀ

Putting it together:

∇f · v'(x) = [∂f/∂g, ∂f/∂h] · [dg/dx, dh/dx]ᵀ = ∂f/∂g · dg/dx + ∂f/∂h · dh/dx
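The dot-product form is one line of NumPy. A short sketch, again assuming f(g, h) = g²h + h for the gradient entries:

```python
import numpy as np

x = 2.0
g, h = 3*x, x**2

grad_f = np.array([2*g*h, g**2 + 1])   # ∇f = [df/dg, df/dh], assuming f(g, h) = g²h + h
v_prime = np.array([3.0, 2*x])         # v'(x) = [dg/dx, dh/dx]

df_dx = grad_f @ v_prime               # the chain rule as a dot product
print(df_dx)                           # equals 2x*g² + 6*g*h + 2x = 292.0 at x = 2
```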


The Jacobian (Matrix)

For a vector-valued function of m variables,

f(x_1, x_2, ..., x_m) = [f_1(x_1, x_2, x_3, ..., x_m),
                         f_2(x_1, x_2, x_3, ..., x_m),
                         f_3(x_1, x_2, x_3, ..., x_m),
                         ...,
                         f_m(x_1, x_2, x_3, ..., x_m)]ᵀ

the Jacobian collects all first-order partial derivatives:

J(x_1, x_2, x_3, ..., x_m) =
[ ∂f_1/∂x_1  ∂f_1/∂x_2  ∂f_1/∂x_3  ...  ∂f_1/∂x_m ]
[ ∂f_2/∂x_1  ∂f_2/∂x_2  ∂f_2/∂x_3  ...  ∂f_2/∂x_m ]
[ ∂f_3/∂x_1  ∂f_3/∂x_2  ∂f_3/∂x_3  ...  ∂f_3/∂x_m ]
[    ...        ...        ...      ...     ...    ]
[ ∂f_m/∂x_1  ∂f_m/∂x_2  ∂f_m/∂x_3  ...  ∂f_m/∂x_m ]

Note that each row of the Jacobian is the transposed gradient of one component function: the first row is (∇f_1)ᵀ, and so on.
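A Jacobian can be approximated column by column with finite differences, one partial derivative per entry; row i then holds the (transposed) gradient of f_i. This sketch uses a hypothetical f: ℝ² → ℝ² purely for illustration:

```python
import numpy as np

def f(x):
    # hypothetical f: R^2 -> R^2, used only to illustrate the Jacobian
    return np.array([x[0]**2 * x[1], 3*x[0] + np.sin(x[1])])

def jacobian_fd(f, x, eps=1e-6):
    # numerical Jacobian: column j holds df/dx_j via central differences
    x = np.asarray(x, dtype=float)
    m = len(x)
    J = np.zeros((len(f(x)), m))
    for j in range(m):
        dx = np.zeros(m)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2*eps)
    return J

J = jacobian_fd(f, [1.0, 2.0])
print(J)   # analytic Jacobian: [[2*x0*x1, x0**2], [3, cos(x1)]] = [[4, 1], [3, cos(2)]]
```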


Second Order Derivatives



Lucky for you, we won't need second
order derivatives in this class ;)



Back to Linear Regression

[Figure: a single linear neuron — the inputs x_1, x_2, ..., x_m are multiplied by the weights w_1, w_2, ..., w_m and summed together with the bias b into the net input Σ; the activation then produces the output]

[Figure: a convex loss function, plotted as L over w_1]

L(w, b) = Σ_i (ŷ[i] − y[i])²
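The neuron's forward pass and the SSE loss fit in a few lines of NumPy. The inputs, weights, bias, and targets below are made-up values for illustration:

```python
import numpy as np

def net_input(X, w, b):
    # weighted sum z = w^T x + b for every row of X
    return X @ w + b

def activation(z):
    # identity activation in linear regression
    return z

def sse_loss(y_hat, y):
    # sum squared error L = sum_i (yhat[i] - y[i])^2
    return np.sum((y_hat - y)**2)

# made-up inputs, weights, bias, and targets
X = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([0.5, -0.5])
b = 0.1
y = np.array([0.0, 1.0])

y_hat = activation(net_input(X, w, b))
print(sse_loss(y_hat, y))   # 2.12 (up to floating-point rounding)
```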


Gradient Descent

[Figure: the convex loss function L(w, b) = Σ_i (ŷ[i] − y[i])², plotted as L over w_1]

The learning rate and the steepness of the gradient determine how much we update.


Gradient Descent

[Figure: loss L over w_1 with large update steps jumping back and forth across the minimum]

If the learning rate is too large, we can overshoot.

[Figure: loss L over w_1 with many tiny update steps creeping toward the minimum]

If the learning rate is too small, convergence is very slow.
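Both failure modes are easy to reproduce on a toy one-dimensional loss, L(w) = w², whose gradient is dL/dw = 2w. This is a minimal sketch, not part of the lecture code:

```python
def gradient_descent(lr, w=1.0, steps=20):
    # minimize L(w) = w**2, whose gradient is dL/dw = 2*w
    for _ in range(steps):
        w = w - lr * 2*w   # w := w - lr * dL/dw
    return w

print(gradient_descent(lr=0.1))   # small steps: approaches the minimum at w = 0
print(gradient_descent(lr=1.1))   # too large: every step overshoots, and w diverges
```

With lr = 0.1 each step multiplies w by 0.8 (steady shrinkage); with lr = 1.1 each step multiplies w by −1.2, so the iterates oscillate with growing magnitude.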


Linear Regression Loss Derivative
L(w, b) = Σ_i (ŷ[i] − y[i])²        Sum Squared Error (SSE) loss

∂L/∂w_j = ∂/∂w_j Σ_i (ŷ[i] − y[i])²
        = ∂/∂w_j Σ_i (σ(wᵀx[i]) − y[i])²
        = Σ_i 2 (σ(wᵀx[i]) − y[i]) · ∂/∂w_j (σ(wᵀx[i]) − y[i])
        = Σ_i 2 (σ(wᵀx[i]) − y[i]) · dσ/d(wᵀx[i]) · ∂/∂w_j wᵀx[i]
        = Σ_i 2 (σ(wᵀx[i]) − y[i]) · dσ/d(wᵀx[i]) · x_j[i]
        = Σ_i 2 (σ(wᵀx[i]) − y[i]) · x_j[i]

(Note that the activation function σ is the identity function in linear regression, so dσ/d(wᵀx[i]) = 1.)
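The final expression for ∂L/∂w_j is a single matrix product in NumPy, and the analytic gradient can be verified against a finite-difference approximation of the loss. The data and weights below are made-up values:

```python
import numpy as np

def sse_loss(X, y, w):
    # L(w) = sum_i (sigma(w^T x[i]) - y[i])^2, with sigma = identity
    return np.sum((X @ w - y)**2)

def sse_gradient(X, y, w):
    # dL/dw_j = sum_i 2 * (sigma(w^T x[i]) - y[i]) * x_j[i], vectorized over j
    return 2 * X.T @ (X @ w - y)

# made-up data and weights
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.1, -0.2])

g = sse_gradient(X, y, w)

# finite-difference check of the analytic gradient
eps = 1e-6
g_num = np.array([
    (sse_loss(X, y, w + eps*np.eye(2)[j]) - sse_loss(X, y, w - eps*np.eye(2)[j])) / (2*eps)
    for j in range(2)
])
print(np.allclose(g, g_num, atol=1e-4))   # True
```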


Linear Regression Loss Derivative (alt.)

L(w, b) = 1/(2n) Σ_i (ŷ[i] − y[i])²        Mean Squared Error (MSE) loss, often scaled by a factor 1/2 for convenience

∂L/∂w_j = ∂/∂w_j 1/(2n) Σ_i (ŷ[i] − y[i])²
        = ∂/∂w_j Σ_i 1/(2n) (σ(wᵀx[i]) − y[i])²
        = Σ_i (1/n) (σ(wᵀx[i]) − y[i]) · ∂/∂w_j (σ(wᵀx[i]) − y[i])
        = (1/n) Σ_i (σ(wᵀx[i]) − y[i]) · dσ/d(wᵀx[i]) · ∂/∂w_j wᵀx[i]
        = (1/n) Σ_i (σ(wᵀx[i]) − y[i]) · dσ/d(wᵀx[i]) · x_j[i]
        = (1/n) Σ_i (σ(wᵀx[i]) − y[i]) · x_j[i]

(Note that the activation function σ is the identity function in linear regression.)


Batch vs Stochastic
The minibatch and on-line modes are stochastic

versions of gradient descent (batch mode)
Most commonly used in DL, because
Minibatch mode

1. Choosing a subset (vs 1
(mix between on-line and batch)
example at a time)

m 1 takes advantage of

1. Initialize w := 0
<latexit sha1_base64="oErrezlcnZNNwNypy+oJZI2iT/E=">AAAB/nicbVDLSsNAFL3xWesrKq7cDBbBjSVRQRGEohuXFewD2lgm00k7dCYJMxOlhIC/4saFIm79Dnf+jZO2C209MHA4517umePHnCntON/W3PzC4tJyYaW4ura+sWlvbddVlEhCayTikWz6WFHOQlrTTHPajCXFwue04Q+uc7/xQKViUXinhzH1BO6FLGAEayN17N22wLrvB+ljhi4ukXOfiiM369glp+yMgGaJOyElmKDasb/a3YgkgoaacKxUy3Vi7aVYakY4zYrtRNEYkwHu0ZahIRZUeekofoYOjNJFQSTNCzUaqb83UiyUGgrfTOZh1bSXi/95rUQH517KwjjRNCTjQ0HCkY5Q3gXqMkmJ5kNDMJHMZEWkjyUm2jRWNCW401+eJfXjsntSdm5PS5WrSR0F2IN9OAQXzqACN1CFGhBI4Rle4c16sl6sd+tjPDpnTXZ24A+szx/fu5TE</latexit>
, b := 0
<latexit sha1_base64="NlqcCPP+x7/BYO7u8+fZM6fVAMw=">AAAB+HicbVDLSsNAFL2pr1ofjbp0M1gKrkqiBUUQim5cVrAPaEOZTCft0MkkzEyEGvolblwo4tZPceffOGmz0NYDFw7n3MucOX7MmdKO820V1tY3NreK26Wd3b39sn1w2FZRIgltkYhHsutjRTkTtKWZ5rQbS4pDn9OOP7nN/M4jlYpF4kFPY+qFeCRYwAjWRhrY5X6I9dgPUn+Grq6RM7ArTs2ZA60SNycVyNEc2F/9YUSSkApNOFaq5zqx9lIsNSOczkr9RNEYkwke0Z6hAodUeek8+AxVjTJEQSTNCI3m6u+LFIdKTUOTrZrFVMteJv7n9RIdXHopE3GiqSCLh4KEIx2hrAU0ZJISzaeGYCKZyYrIGEtMtOmqZEpwl7+8StpnNfe85tzXK42bvI4iHMMJnIILF9CAO2hCCwgk8Ayv8GY9WS/Wu/WxWC1Y+c0R/IH1+QOB5ZJS</latexit>

2. For every training epoch: vectorization (faster


A. Initialize w := 0, b := 0 iteration through epoch
than on-line)
Minibatch Gradient Descent

B. For every minibatch {⟨x[i], y[i]⟩, ..., ⟨x[i+k], y[i+k]⟩} ⊂ D:
   (a) Compute output (prediction)
   (b) Calculate error
   (c) Compute Δw, Δb

C. Update w := w + Δw, b := b + Δb

2. Having fewer updates than "on-line" mode makes the updates less noisy
3. Makes more updates per epoch than "batch" mode and is thus faster

Sebastian Raschka STAT 479: Deep Learning SS 2019 !55
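The minibatch loop above can be sketched in NumPy as follows (the function and variable names, and the use of a squared-error loss, are my own choices for illustration, not from the slides):

```python
import numpy as np

def minibatch_gd(X, y, lr=0.01, epochs=50, batch_size=10, seed=1):
    """Minibatch gradient descent for a linear neuron with squared-error loss."""
    rng = np.random.RandomState(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        idx = rng.permutation(len(y))                # shuffle before each epoch
        for start in range(0, len(y), batch_size):
            batch = idx[start:start + batch_size]
            output = X[batch].dot(w) + b             # (a) compute output
            error = y[batch] - output                # (b) calculate error
            delta_w = lr * X[batch].T.dot(error) / len(batch)  # (c) compute
            delta_b = lr * error.mean()              #     Delta w, Delta b
            w += delta_w                             # C. update w, b
            b += delta_b
    return w, b
```

Each minibatch gradient is averaged over the batch, so the step size stays comparable across batch sizes.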
Batch Gradient Descent as Surface Plot

[Figure: contour plot of the loss surface over the weights w1 and w2,
with the update path stepping toward the minimum Lmin]

Updates are perpendicular to the contour lines


Stochastic Gradient Descent as Surface Plot

[Figure: contour plot of the loss surface over the weights w1 and w2,
with a zig-zagging update path toward the minimum Lmin]

Stochastic updates are a bit noisier, because each batch is an
approximation of the overall loss on the training set.

(Later, in deep neural nets, we will see why noisier updates
are actually helpful.)


Batch Gradient Descent as Surface Plot

[Figure: contour plot of the loss surface over the weights w1 and w2,
with minimum Lmin]

If inputs are on very different scales, some weights will update more
than others ... and it will also harm convergence
(always normalize inputs!)

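A common way to put inputs on the same scale is z-scoring each feature; the slide only says "normalize", so treating that as standardization (zero mean, unit variance per feature) is my assumption in this sketch:

```python
import numpy as np

def standardize(X):
    """Z-score each feature column: zero mean, unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Two features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])
X_std, mu, sigma = standardize(X)
```

Remember to reuse the training-set `mu` and `sigma` when transforming test data.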


Linear Regression

Code example:
https://github.com/rasbt/stat479-deep-learning-ss19/blob/master/L05_grad-descent/code/linear-regr-gd.ipynb

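In the spirit of that notebook (this is a minimal sketch of my own, not the notebook's code), full-batch gradient descent for linear regression can look like:

```python
import numpy as np

def linear_regr_gd(X, y, lr=0.05, epochs=500):
    """Fit y ~ Xw + b by full-batch gradient descent on the MSE loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        output = X.dot(w) + b               # predictions on the full batch
        error = y - output                  # residuals
        w += lr * X.T.dot(error) / len(y)   # gradient step for the weights
        b += lr * error.mean()              # gradient step for the bias
    return w, b
```

Note that w and b are only updated once per epoch, after the gradient has been computed over the whole training set.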


ADALINE
Widrow and Hoff's ADALINE (1960)
A nicely differentiable neuron model
Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits (No. TR-1553-1). Stanford Univ Ca Stanford Electronics Labs.
Widrow, B. (1960). Adaptive "Adaline" neuron using chemical "memistors".

Image source: https://www.researchgate.net/profile/Alexander_Magoun2/publication/265789430/figure/fig2/AS:392335251787780@1470551421849/ADALINE-An-adaptive-linear-neuron-Manually-adapted-synapses-Designed-and-built-by-Ted.png



ADALINE
ADAptive LInear NEuron

[Diagram: Perceptron — inputs x1, x2, ..., xm with weights w1, w2, ..., wm
(plus a bias input 1 with weight w0) feed the net input function Σ;
a threshold function turns the net input into the output, and the error
for the weight update is computed from that thresholded output.]

[Diagram: ADALINE — same structure, but an activation function sits between
the net input function and the threshold function; the error for the weight
update is computed from the continuous activation output, before
thresholding. With a linear activation, this part of the model is
linear regression.]
ADALINE

Code example:
https://github.com/rasbt/stat479-deep-learning-ss19/blob/master/L05_grad-descent/code/adaline-sgd.ipynb

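As a sketch of the idea (my own minimal implementation, not the notebook's code): ADALINE uses the continuous net input for the weight update and applies the threshold only when predicting class labels:

```python
import numpy as np

class Adaline:
    """ADALINE with identity activation; SGD on the squared error."""
    def __init__(self, n_features, lr=0.01, epochs=20, seed=1):
        self.lr, self.epochs = lr, epochs
        self.rng = np.random.RandomState(seed)
        self.w = np.zeros(n_features)
        self.b = 0.0

    def net_input(self, x):
        return x.dot(self.w) + self.b

    def fit(self, X, y):                       # y in {-1, 1}
        for _ in range(self.epochs):
            for i in self.rng.permutation(len(y)):
                # error uses the *continuous* activation, not the threshold
                error = y[i] - self.net_input(X[i])
                self.w += self.lr * error * X[i]
                self.b += self.lr * error
        return self

    def predict(self, X):
        # threshold function is applied only for class-label predictions
        return np.where(self.net_input(X) >= 0.0, 1, -1)
```

The key contrast with the perceptron is in `fit`: the error `y[i] - net_input(x)` is continuous, which makes the loss differentiable and the updates well-behaved even when the data is not linearly separable.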


Next Lecture:

Neurons with non-linear activation functions



Ungraded HW assignment

See last cell in the linear regression Jupyter Notebook

https://github.com/rasbt/stat479-deep-learning-ss19/blob/master/L05_grad-descent/code/linear-regr-gd.ipynb



Reminder: GRADED HW assignment
• Due on Thu (Feb 21) at 11:59 pm
• Don't need to submit the whole folder; just submit the .ipynb and .html
  files, like last time

https://github.com/rasbt/stat479-deep-learning-ss19/tree/master/hw2

