Omar Arif
omar.arif@seecs.edu.pk
National University of Sciences and Technology
The Perceptron – Basic Building Block of Neural Networks
$$\mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} w_0 \\ w_1 \\ w_2 \end{bmatrix}$$

$$h_{\mathbf{w}}(\mathbf{x}) = g(\mathbf{w}^{T}\mathbf{x})$$
OR function:
x1  x2  h
0   0   0
0   1   1
1   0   1
1   1   1

AND function:
x1  x2  h
0   0   0
0   1   0
1   0   0
1   1   1
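As a rough sketch (not code from the slides), a single perceptron with a step activation g can reproduce the OR and AND tables above with hand-picked weights; the weight values below are assumptions chosen only for illustration.

import numpy as np

def perceptron(x, w):
    # h_w(x) = g(w^T x) with a step activation g
    return 1 if np.dot(w, x) >= 0 else 0

# x = [x0, x1, x2] with x0 = 1 as the bias input.
# Hand-picked weights (illustrative values, not from the slides):
w_or  = np.array([-0.5, 1.0, 1.0])   # fires when x1 + x2 >= 0.5
w_and = np.array([-1.5, 1.0, 1.0])   # fires when x1 + x2 >= 1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        x = np.array([1, x1, x2])
        print(x1, x2, perceptron(x, w_or), perceptron(x, w_and))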
Non-Linear Decision Boundary
XOR function:
x1  x2  h
0   0   0
0   1   1
1   0   1
1   1   0

Using the basic perceptron, we cannot approximate a non-linear function such as XOR.
Feature engineering: use higher-order features such as $x^2$ or $x^3$ to obtain a non-linear decision function.
Problem: we don't know which features to choose.
[Truth table illustrating a higher-order feature such as $x_1 x_2$; the header row was lost in extraction]
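To make the feature-engineering idea concrete, here is a hedged sketch (not from the slides): adding the product feature x1*x2 makes XOR linearly separable, so a single perceptron over the extended input can represent it. The weights below are assumed values chosen for the example.

import numpy as np

def perceptron(x, w):
    return 1 if np.dot(w, x) >= 0 else 0

# Extended input [1, x1, x2, x1*x2]; hand-picked illustrative weights.
# x1 + x2 - 2*x1*x2 equals 1 exactly on the XOR-true inputs; threshold at 0.5.
w_xor = np.array([-0.5, 1.0, 1.0, -2.0])

for x1 in (0, 1):
    for x2 in (0, 1):
        x = np.array([1, x1, x2, x1 * x2])
        print(x1, x2, perceptron(x, w_xor))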
The loss function tells us how good our neural network is.
Optimization problem

Training data: $D_{train} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$

$$J(\mathbf{w}, D_{train}) = \frac{1}{m} \sum_{(x, y) \in D_{train}} Loss(x, y, \mathbf{w})$$

Gradient: $\nabla_{\mathbf{w}} J(\mathbf{w}, D_{train})$
Mean squared error loss
See Backpropagation_examples.pdf
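As a minimal sketch of the objective above, assuming a linear model $h_{\mathbf{w}}(x) = \mathbf{w}^T x$ and a toy training set (both are assumptions for illustration, not taken from the slides), the mean squared error version of $J(\mathbf{w}, D_{train})$ can be computed as:

import numpy as np

def h(w, x):
    # Assumed linear model: h_w(x) = w^T x
    return np.dot(w, x)

def mse_loss(w, D_train):
    # J(w, D_train) = (1/m) * sum over (x, y) of (h_w(x) - y)^2
    m = len(D_train)
    return sum((h(w, x) - y) ** 2 for x, y in D_train) / m

# Toy training set with bias input x0 = 1 (illustrative values).
D_train = [(np.array([1.0, 2.0]), 5.0),
           (np.array([1.0, 3.0]), 7.0)]
w = np.array([1.0, 2.0])
print(mse_loss(w, D_train))   # average squared error under w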
Batch Gradient Descent
  Initialize weights randomly
  Loop:
    Compute the gradient $\partial J(\mathbf{w}) / \partial \mathbf{w}$
    Update $\mathbf{w}$:  $\mathbf{w} := \mathbf{w} - \alpha \, \partial J(\mathbf{w}) / \partial \mathbf{w}$

Stochastic Gradient Descent
  Initialize weights randomly
  Loop:
    For each data point $(x, y)$ in $D_{train}$:
      Compute the gradient $\partial Loss(x, y, \mathbf{w}) / \partial \mathbf{w}$
      Update $\mathbf{w}$:  $\mathbf{w} := \mathbf{w} - \alpha \, \partial Loss(x, y, \mathbf{w}) / \partial \mathbf{w}$
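A hedged PyTorch sketch of both procedures follows; the model h, the random data, and the learning rate are placeholders chosen for illustration, not code from the course notebooks.

import torch

# Placeholder setup: a linear model and a toy dataset.
h = torch.nn.Linear(2, 1)
X = torch.randn(8, 2)
Y = torch.randn(8, 1)
loss_fn = torch.nn.MSELoss()
opt = torch.optim.SGD(h.parameters(), lr=0.01)

# Batch gradient descent: one update per pass over all of D_train.
for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(h(X), Y)      # J(w) averaged over the whole training set
    loss.backward()              # compute dJ/dw
    opt.step()                   # w := w - alpha * dJ/dw

# Stochastic gradient descent: one update per training example.
for epoch in range(100):
    for x, y in zip(X, Y):
        opt.zero_grad()
        loss = loss_fn(h(x), y)  # Loss(x, y, w) for a single example
        loss.backward()
        opt.step()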
CIFAR10
MNIST
Fashion-MNIST
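The three datasets named above are commonly loaded through torchvision; the sketch below is an assumption about how one might fetch them (the data paths are illustrative), not code from the slides.

import torchvision
from torchvision import transforms

to_tensor = transforms.ToTensor()

# Download the three datasets mentioned above (paths are illustrative).
mnist   = torchvision.datasets.MNIST("./data", train=True, download=True, transform=to_tensor)
fashion = torchvision.datasets.FashionMNIST("./data", train=True, download=True, transform=to_tensor)
cifar10 = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)

print(len(mnist), len(fashion), len(cifar10))   # 60000 60000 50000 training images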
The softmax function takes as input a vector of $k$ real numbers and normalizes it into a probability distribution.
$$softmax(y_i) = \frac{e^{y_i}}{\sum_{j=1}^{k} e^{y_j}}$$
$$softmax\left(\begin{bmatrix} y_1 \\ \vdots \\ y_k \end{bmatrix}\right) = \frac{1}{\sum_{j=1}^{k} e^{y_j}} \begin{bmatrix} e^{y_1} \\ \vdots \\ e^{y_k} \end{bmatrix} = \begin{bmatrix} p(y = 1 \mid x, w) \\ \vdots \\ p(y = k \mid x, w) \end{bmatrix}$$
Negative log-likelihood loss $= -\sum_{i=1}^{k} t_i \log(softmax(y_i))$, where $t$ is the one-hot encoding of the true class.
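A small numerical sketch of the two formulas in PyTorch; the class scores and the true label below are made-up values for illustration.

import torch
import torch.nn.functional as F

# Made-up scores for k = 3 classes and a true label of class 2.
y = torch.tensor([[2.0, 1.0, 0.1]])
target = torch.tensor([2])

probs = F.softmax(y, dim=1)            # e^{y_i} / sum_j e^{y_j}
nll = -torch.log(probs[0, target])     # -log p(true class | x, w)

# Same value via log_softmax + nll_loss, as used with a LogSoftmax output layer.
log_probs = F.log_softmax(y, dim=1)
print(nll.item(), F.nll_loss(log_probs, target).item())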
The activation function of all neurons in the hidden layer is ReLU
The output neurons implement LogSoftmax.
For complete code see cifar10linear.ipynb
See mnist_classification.ipynb
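A minimal sketch of a network matching that description (ReLU hidden layer, LogSoftmax output); the layer sizes are assumptions for a 28x28, 10-class input and are not taken from cifar10linear.ipynb or mnist_classification.ipynb.

import torch.nn as nn

# Illustrative sizes: 28x28 = 784 inputs (MNIST-style), 128 hidden units, 10 classes.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 128),
    nn.ReLU(),               # hidden-layer activation
    nn.Linear(128, 10),
    nn.LogSoftmax(dim=1),    # output layer; pair with nn.NLLLoss during training
)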
Labels
● 0 T-shirt/top
● 1 Trouser
● 2 Pullover
● 3 Dress
● 4 Coat
● 5 Sandal
● 6 Shirt
● 7 Sneaker
● 8 Bag
● 9 Ankle boot
$$\mathbf{w} := \mathbf{w} - \alpha \frac{\partial J}{\partial \mathbf{w}}$$
A small learning rate converges slowly, while a large learning rate overshoots.
$$v_t = \gamma v_{t-1} + \alpha \nabla_{\mathbf{w}} J$$
$$\mathbf{w} := \mathbf{w} - v_t$$
optimizer = optim.SGD(h.parameters(), lr=0.001, momentum=0.9)
http://ruder.io/optimizing-gradient-descent/
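A plain-Python version of the two momentum equations above, as a hedged sketch; the weights, gradient, and hyperparameter values are made up for illustration.

import numpy as np

def momentum_step(w, v, grad, alpha=0.001, gamma=0.9):
    # One momentum update: v_t = gamma * v_{t-1} + alpha * grad;  w := w - v_t
    v = gamma * v + alpha * grad
    w = w - v
    return w, v

# Illustrative usage with a made-up gradient.
w = np.array([0.5, -0.3])
v = np.zeros_like(w)
grad = np.array([0.2, 0.1])
w, v = momentum_step(w, v, grad)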
The learning rate is not fixed; it is adapted per parameter.
Adam (Adaptive Moment Estimation)
torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999))
https://pytorch.org/docs/stable/optim.html
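A short sketch of swapping plain SGD for Adam in a training step; the placeholder model h and the random data are assumptions carried over from the earlier sketch, not code from the notebooks.

import torch

h = torch.nn.Linear(2, 1)                # placeholder model
optimizer = torch.optim.Adam(h.parameters(), lr=0.001, betas=(0.9, 0.999))

x, y = torch.randn(4, 2), torch.randn(4, 1)
loss = torch.nn.functional.mse_loss(h(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()                          # Adam update instead of plain SGD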
1. L2 weight regularization:
$$J(w) = Loss(x, y, w) + \lambda \sum w^2$$
2. Dropout:
torch.nn.functional.dropout(input, p=0.5)
3. Early stopping:
Stop training before the network starts to overfit (see the sketch after this list).
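A hedged sketch of how the three techniques typically appear in PyTorch; the model, the weight_decay value (which implements the L2 penalty), and the early-stopping patience are assumptions for illustration, not values from the slides.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                      nn.Dropout(p=0.5),          # 2. dropout on the hidden layer
                      nn.Linear(128, 10), nn.LogSoftmax(dim=1))

# 1. L2 regularization: weight_decay adds an L2 penalty on the weights
#    (the lambda * sum(w^2) term, up to a constant factor).
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)

# 3. Early stopping: stop when validation loss has not improved for `patience` epochs.
best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    # ... train for one epoch, then compute val_loss on held-out data ...
    val_loss = 0.0                                # placeholder value
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break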