Machine Learning Lecture 11


Building Blocks: Neurons

A neuron takes inputs, does some math with them, and produces one output.

First, each input is multiplied by a weight.

Second, all the weighted inputs are added together with a bias b:
(w · x) + b = w1*x1 + w2*x2 + b

Finally, the sum is passed through an activation function f:
y = f((w · x) + b) = f(w1*x1 + w2*x2 + b)
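A rough sketch of this computation in Python with numpy (not from the slides; the name neuron_output and passing the activation f as an argument are just for illustration):

import numpy as np

def neuron_output(w, x, b, f):
    # Weighted sum of the inputs plus the bias, passed through the activation f
    return f(np.dot(w, x) + b)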


Activation Function and Examples
The activation function is used to turn an unbounded input into an
output that has a nice, predictable form.
A commonly used activation function is the sigmoid function:

sigmoid(x) = 1 / (1 + e^(-x))

The sigmoid function only outputs numbers in the range (0, 1). You can think of it as compressing (−∞, +∞) to (0, 1): big negative numbers become ~0 and big positive numbers become ~1.
Example
w1 = 0, w2 = 1, and b = 4
x1 = 2, and x2 = 3
(w · x) + b = 0*2 + 1*3 + 4 = 7, so y = f(7) = sigmoid(7) ≈ 0.999
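A quick check of this example in numpy (a sketch, not slide code): the weighted sum is 0*2 + 1*3 + 4 = 7, and sigmoid(7) is about 0.999.

import numpy as np

def sigmoid(x):
    # Sigmoid activation: compresses (-inf, +inf) into (0, 1)
    return 1 / (1 + np.exp(-x))

w = np.array([0, 1])   # w1 = 0, w2 = 1
x = np.array([2, 3])   # x1 = 2, x2 = 3
b = 4

total = np.dot(w, x) + b       # 0*2 + 1*3 + 4 = 7
print(total, sigmoid(total))   # 7  ~0.999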
Example
Does the output change if we use other activation functions?

Function             Value at x = 7    Value at x = -7
Unit Step            1                 0
Linear Function      7                 -7
Hyperbolic Tangent   0.9999            -0.9999
Sigmoid              0.999             ~0
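This comparison can be reproduced with a few lines of numpy (a sketch; the unit step here uses the common convention step(x) = 1 for x >= 0 and 0 otherwise):

import numpy as np

def unit_step(x):
    return 1.0 if x >= 0 else 0.0

def linear(x):
    return x

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

for x in (7, -7):
    print(x, unit_step(x), linear(x), np.tanh(x), sigmoid(x))
#  7: step 1, linear  7, tanh  0.99999, sigmoid 0.99909
# -7: step 0, linear -7, tanh -0.99999, sigmoid 0.00091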
Linear Classifier
The simple neuron can solve linearly separable classification problems such as OR.

There are many lines that can be used to separate the two classes in this problem.


XOR problem

The simple neuron (perceptron) cannot classify problems such as XOR.

There is no single line that can separate the two classes in this problem.


Hidden Layer
The solution to this problem is to expand beyond the single-layer
architecture by adding an additional layer of units without any direct access
to the outside world, known as a hidden layer. This resolves the XOR problem.
This kind of architecture is another feed-forward network known as a
multilayer perceptron (MLP).
Combining Neurons into a Neural Network
A neural network is nothing more than a bunch of neurons connected together.

This network has 2 inputs, a hidden layer with 2 neurons (h1 and h2), and an output
layer with 1 neuron (o1).
Notice that the inputs for o1 are the outputs from h1 and h2
A hidden layer is any layer between the input (first) layer and output (last) layer.
There can be multiple hidden layers! (This is called Deep Learning.)
Example
Given this neural network:
w1 = w3 = 0, w2 = w4 = 1, and b = 0 for h1, h2, and o1.

Neurons h1, h2, and o1 have the same activation function.

x1 = 2 and x2 = 3. Calculate the output y.
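A sketch of this calculation with the sigmoid activation (assumption: o1 also uses the weights [0, 1] and bias 0, since the slide gives b = 0 for o1 but does not name its weights explicitly):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([2, 3])          # x1 = 2, x2 = 3
w_hidden = np.array([0, 1])   # w1 = w3 = 0, w2 = w4 = 1
b = 0

h1 = sigmoid(np.dot(w_hidden, x) + b)                   # sigmoid(3) ~ 0.9526
h2 = sigmoid(np.dot(w_hidden, x) + b)                   # same weights, same value
o1 = sigmoid(np.dot(w_hidden, np.array([h1, h2])) + b)  # sigmoid(h2) ~ 0.7216
print(h1, h2, o1)   # y = o1 ~ 0.7216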


Training a Neural Network
Build a neural network to predict someone’s sex given their weight and
height from this data
Sex   Weight   Height   ID
F     59       164      1
M     72       183      2
M     71       179      3
F     54       154      4

First Step: Construct the neural network topology:

Inputs (2 inputs: Weight and Height)
One hidden layer with two neurons, h1 and h2
One output neuron o1 (Sex)
Second Step (repeated task): train the network, that is, determine the weights and biases of the network:

w1, w2, w3, w4, w5, w6, b1, b2, b3


Training a Neural Network
First: Data preparation. We'll represent Male with a 0 and Female with a 1, and we'll also shift the data (Weight - 64, Height - 170) to make it easier to use:

y_true (Sex)   Weight - 64   Height - 170
1              -5            -6
0              8             13
0              7             9
1              -10           -16
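As a sketch (not slide code), the shifted dataset as numpy arrays, with the column order matching the feedforward code that follows (weight first, then height):

import numpy as np

# Each row: [weight - 64, height - 170]
data = np.array([
    [-5,  -6],   # ID 1, F  -> y_true = 1
    [ 8,  13],   # ID 2, M  -> y_true = 0
    [ 7,   9],   # ID 3, M  -> y_true = 0
    [-10, -16],  # ID 4, F  -> y_true = 1
])
all_y_trues = np.array([1, 0, 0, 1])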

Second: we initialize w1, w2, w3, w4, w5, w6, b1, b2, b3 with random values.

Third: we determine the predicted Sex for the given Weight and Height as follows.

o1 is called y_pred.
Training a Neural Network
Loss
We first need a way to quantify how “good” the network is doing so that it can try to do “better”.
That’s what the loss is.
We’ll use the mean squared error (MSE) loss:

MSE = (1/n) Σ (y_true − y_pred)²
Where
• n is the number of samples, which is 4
• y represents the variable being predicted, which is Sex.
• y_true is the true value of the variable (the “correct answer”).
• y_pred is the predicted value of the variable. It’s whatever our network outputs.
(y_true−y_pred)² is known as the squared error.
Our loss function is simply taking the average over all squared errors.
The better our predictions are, the lower our loss will be!
Better predictions = Lower loss.
Training a network = trying to minimize its loss.
Training a Neural Network
The process of calculating h1, h2, and o1 is called feedforward: the calculation goes from the input layer to the hidden layer and then to the output layer.

The process of modifying the weights (w1, w2, w3, ..., w6) and biases is called backpropagation: the modifications go from the output layer to the hidden layer and then back to the input layer.

Feed Forward Calculation


[Figure: a fully connected feedforward network with inputs x1 ... xn, hidden units H1 ... Hk, a single output O1, input-to-hidden weights w11 ... wnk, and hidden-to-output weights wH11 ... wHk1.]

So this type of neural network (NN) is named:
• Multilayer NN
• Feedforward NN
• Backpropagation NN

Why are these names used?
• Multilayer NN: the network is organized into layers (input, hidden, output).
• Feedforward NN: the feedforward calculation goes from the inputs through the hidden layer to the output.
• Backpropagation NN: the backpropagation weight modifications go from the output back towards the inputs.
Training a Neural Network

In feedforward we calculate:
1. h1 and h2
2. o1

In backpropagation we calculate (update):
1. b3
2. w5 and w6
3. b2 and b1
4. w1, w2, w3, and w4
Training Algorithm
1. Initialize w1, w2, w3, w4, w5, w6, b1, b2, b3
2. Repeat this part N times:
   I. Determine h1, h2, o1
   II. Determine the mean squared error (MSE) loss
   III. Modify w1, w2, w3, w4, w5, w6, b1, b2, b3
3. If the loss is acceptable:
   the neural network is constructed
   Else:
   try to re-build a new NN topology (increase the number of hidden nodes)
Training Algorithm (Trial and Error)

[Flowchart: Start -> Initialize weights and biases -> repeat for the specified number of iterations: determine hidden values (h1, h2) and output (o1) -> determine the mean squared error (MSE) loss -> modify weights and biases. When all iterations are done: if the loss is acceptable -> Stop; if not -> construct another topology (add nodes to the hidden layer or add a new hidden layer) and train again.]
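The loop in this flowchart can be sketched in Python. This is not the slides' exact method: the slides derive analytic derivatives in the next sections, while this sketch uses numerical (finite-difference) gradients only so it can run on its own; the learning rate of 0.1 and the 1000 iterations are assumed values.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def mse_loss(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean()

# Shifted data from the slides: [weight - 64, height - 170] and y_true (F = 1, M = 0)
data = np.array([[-5, -6], [8, 13], [7, 9], [-10, -16]], dtype=float)
all_y_trues = np.array([1, 0, 0, 1], dtype=float)

# params = [w1, w2, w3, w4, w5, w6, b1, b2, b3]
params = np.random.normal(size=9)

def feedforward(p, x):
    w1, w2, w3, w4, w5, w6, b1, b2, b3 = p
    h1 = sigmoid(w1 * x[0] + w2 * x[1] + b1)
    h2 = sigmoid(w3 * x[0] + w4 * x[1] + b2)
    return sigmoid(w5 * h1 + w6 * h2 + b3)

def loss(p):
    y_preds = np.array([feedforward(p, x) for x in data])
    return mse_loss(all_y_trues, y_preds)

learn_rate = 0.1      # assumed value, not given in the slides
for iteration in range(1000):
    # Numerical gradient as a stand-in for the analytic backprop derivatives
    grad = np.zeros_like(params)
    eps = 1e-6
    base = loss(params)
    for i in range(len(params)):
        p_plus = params.copy()
        p_plus[i] += eps
        grad[i] = (loss(p_plus) - base) / eps
    params -= learn_rate * grad           # modify weights and biases
    if iteration % 100 == 0:
        print(iteration, loss(params))

# If the final loss is still not acceptable, rebuild the topology
# (add hidden nodes or another hidden layer) and train again.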
First Step: initialize w1, w2, w3, w4, w5, w6, b1, b2, b3

# Weights
self.w1 = np.random.normal()
self.w2 = np.random.normal()
self.w3 = np.random.normal()
self.w4 = np.random.normal()
self.w5 = np.random.normal()
self.w6 = np.random.normal()

# Biases
self.b1 = np.random.normal()
self.b2 = np.random.normal()
self.b3 = np.random.normal()

Example values drawn for this run:
w1 = -0.17078787256065822
w2 = 0.8018910260223238
w3 = 2.042028648489558
w4 = 0.9472266245782457
w5 = 0.11610745255156371
w6 = -0.04474672280574983
b1 = 0.049210103523584146
b2 = -0.7372822297715569
b3 = 0.6148445824873799
Second Step: Determine hidden values (h1, h2) and output (o1)

x1 = w1 * weight + w2 * height + b1,   h1 = f(x1)
x2 = w3 * weight + w4 * height + b2,   h2 = f(x2)
x3 = w5 * h1 + w6 * h2 + b3,           o1 = f(x3)

def feedforward(self, x):


# x is a numpy array with 2 elements.
h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
o1 = sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)
return o1
Apply the feedforward to the first row: weight = -5, height = -6, y_true = 1 (SE = 0 is the initial value).

w1 = -0.170
w2 = 0.802
b1 = 0.049

w3 = 2.042
w4 = 0.947
b2 = -0.737

w5 = 0.116
w6 = -0.045
b3 = 0.615
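Plugging the rounded weights above and the first shifted row (weight = -5, height = -6) into the feedforward calculation reproduces the first y_pred shown on the next slide (a sketch, not slide code):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

w1, w2, b1 = -0.170, 0.802, 0.049
w3, w4, b2 = 2.042, 0.947, -0.737
w5, w6, b3 = 0.116, -0.045, 0.615

weight, height = -5, -6        # first row of the shifted data (y_true = 1)

h1 = sigmoid(w1 * weight + w2 * height + b1)   # ~0.02
h2 = sigmoid(w3 * weight + w4 * height + b2)   # ~0.00
o1 = sigmoid(w5 * h1 + w6 * h2 + b3)           # ~0.65, this is y_pred
print(h1, h2, o1)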
Determine mean squared error (MSE) loss

def mse_loss(y_true, y_pred):


# y_true and y_pred are numpy arrays of the same length.
return ((y_true - y_pred) ** 2).mean()

Ypred Ytrue
0.65 1
0.67 0
0.67 0
0.65 1

MSE = ¼ [(1-0.65)² + (0-0.67)² + (0-0.67)² + (1-0.65)²]

MSE = ¼ (1.1428) ≈ 0.286
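The same numbers run through the mse_loss function above give the same result (a quick check):

import numpy as np

y_true = np.array([1, 0, 0, 1])
y_pred = np.array([0.65, 0.67, 0.67, 0.65])

print(((y_true - y_pred) ** 2).mean())   # ~0.286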
Third Step: Modify Weights And Biases
We write the loss (MSE) as a function of the weights and biases:
MSE = L(w1, w2, w3, w4, w5, w6, b1, b2, b3)

We use partial differentiation to determine how much each weight affects the loss, so that:

new weight = old weight - learning rate * (partial derivative of L with respect to that weight)
Third Step: Modify Weights And Biases
y_pred = o1

L = (y_true - y_pred)²

∂L/∂y_pred = ∂(y_true - y_pred)²/∂y_pred = 2(y_true - y_pred) × (-1) = -2(y_true - y_pred)
Third Step: Modify Weights And Biases
Imagine we wanted to tweak w1. How would the loss L change if we changed w1?
That's a question the partial derivative ∂L/∂w1 can answer. How do we calculate it?
To start, let's rewrite the partial derivative in terms of ∂y_pred/∂w1 instead:

∂L/∂w1 = ∂L/∂y_pred × ∂y_pred/∂w1

where L = (1 - y_pred)², since y_true = 1 in this example.
Third Step: Modify Weights And Biases
We can break down ∂L/∂w1 into several parts we can calculate:

∂L/∂w1 = ∂L/∂y_pred × ∂y_pred/∂w1

Since w1 only affects h1 (not h2), we can write:

∂y_pred/∂w1 = ∂y_pred/∂h1 × ∂h1/∂w1
∂y_pred/∂h1 = w5 × f'(w5·h1 + w6·h2 + b3)
∂h1/∂w1 = x1 × f'(w1·x1 + w2·x2 + b1)

where f' is the derivative of the sigmoid, f'(x) = f(x)·(1 - f(x)).

This system of calculating partial derivatives by working backwards is known as backpropagation, or “backprop”.
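A sketch of that chain rule in code, reusing the rounded weights and the first training row from earlier. The helper deriv_sigmoid implements the sigmoid derivative f'(x) = f(x) * (1 - f(x)) and is not shown on the slides:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    # f'(x) = f(x) * (1 - f(x)) for the sigmoid
    fx = sigmoid(x)
    return fx * (1 - fx)

w1, w2, b1 = -0.170, 0.802, 0.049
w3, w4, b2 = 2.042, 0.947, -0.737
w5, w6, b3 = 0.116, -0.045, 0.615
x1, x2, y_true = -5, -6, 1         # first row: weight, height, sex

sum_h1 = w1 * x1 + w2 * x2 + b1
h1 = sigmoid(sum_h1)
sum_h2 = w3 * x1 + w4 * x2 + b2
h2 = sigmoid(sum_h2)
sum_o1 = w5 * h1 + w6 * h2 + b3
y_pred = sigmoid(sum_o1)

# Chain rule: dL/dw1 = dL/dy_pred * dy_pred/dh1 * dh1/dw1
d_L_d_ypred = -2 * (y_true - y_pred)
d_ypred_d_h1 = w5 * deriv_sigmoid(sum_o1)
d_h1_d_w1 = x1 * deriv_sigmoid(sum_h1)
d_L_d_w1 = d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1
print(d_L_d_w1)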
Third Step: Modify Weights And Biases
The same approach gives the partial derivatives with respect to w2, w3, w4, b1, and b2.
Third Step: Modify Weights And Biases
For w5, the chain is shorter: w5 affects y_pred, which affects the loss L.

So:

∂L/∂w5 = ∂L/∂y_pred × ∂y_pred/∂w5,  with  ∂y_pred/∂w5 = h1 × f'(w5·h1 + w6·h2 + b3)

The same applies to w6 and b3.
