
So what are neural networks??

[Figure: three “N.Net” boxes: Voice signal → Transcription; Image → Text caption; Game state → Next move]

• What are these boxes?


– Functions that take an input and produce an output
– What’s in these functions?
Brain: Interconnected Neurons

• Many neurons connect in to each neuron


• Each neuron connects out to many neurons
Connectionism lives on…
• The human brain is a connectionist machine
– Bain, A. (1873). Mind and Body: The Theories of Their Relation. London: Henry King.
– Ferrier, D. (1876). The Functions of the Brain. London: Smith, Elder and Co.

• Neurons connect to other neurons.


The processing/capacity of the brain
is a function of these connections

• Connectionist machines emulate this structure

Connectionist Machines

• Network of processing elements


• All world knowledge is stored in the connections
between the elements
Connectionist Machines
• Neural networks are connectionist machines
– As opposed to Von Neumann Machines

[Figure: a Von Neumann/Princeton machine (a processing unit reading the program and data from memory) beside a neural network (a single network in which both “program” and data live in the connections)]

• The machine has many non-linear processing units


– The program is the connections between these units
• Connections may also define memory
Modelling the brain
• What are the units?
• A neuron:
[Figure: a neuron, with the soma, dendrites, and axon labeled]
• Signals come in through the dendrites into the Soma


• A signal goes out via the axon to other neurons
– Only one axon per neuron
• Factoid that may only interest me: Neurons do not undergo cell
division
– Neurogenesis occurs from neuronal stem cells, and is minimal after birth
Perceptron: Simplified model

• A number of inputs combine linearly
– Threshold logic: fire if the combined input exceeds a threshold, i.e. output 1 if w1x1 + … + wNxN ≥ T
The Universal Model
• Originally assumed it could represent any Boolean circuit and perform any logic
– “the embryo of an electronic computer that [the Navy] expects
will be able to walk, talk, see, write, reproduce itself and be
conscious of its existence,” New York Times (8 July) 1958
– “Frankenstein Monster Designed by Navy That Thinks,” Tulsa,
Oklahoma Times 1958

Also provided a learning algorithm

Sequential learning:
w ← w + (d(x) − y(x)) x
where d(x) is the desired output in response to input x, and y(x) is the actual output in response to x
• Boolean tasks
• Update the weights whenever the perceptron output is
wrong
– Update the weight by the product of the input and the
error between the desired and actual outputs
• Proved convergence for linearly separable classes
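A minimal sketch of this update rule in Python (the function name, the learning-rate default, and the AND example are illustrative assumptions, not from the slides):

```python
import numpy as np

def train_perceptron(X, d, epochs=100, eta=1.0):
    """Perceptron learning: X holds input rows, d the desired 0/1 outputs."""
    X = np.hstack([X, np.ones((len(X), 1))])    # fold the bias in as a fixed input
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            y = 1 if w @ x >= 0 else 0          # threshold activation
            w += eta * (target - y) * x         # update only when the output is wrong
    return w

# Converges for linearly separable classes, e.g. Boolean AND:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w = train_perceptron(X, np.array([0, 0, 0, 1]))
```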
Perceptron
[Figure: perceptrons implementing elementary gates over inputs X and Y, e.g. weights 1, 1 with threshold 2 (AND), weights 1, 1 with threshold 1 (OR), and weight -1 with threshold 0 (NOT). Values shown on edges are weights; numbers in the circles are thresholds.]

• Easily shown to mimic any Boolean gate


• But…
Perceptron

No solution for XOR! Not universal!

[Figure: a single perceptron over X and Y with unknown weights “?”: no assignment of weights computes XOR]
• Minsky and Papert, 1969
A single neuron is not enough

• Individual elements are weak computational elements


– Marvin Minsky and Seymour Papert, 1969, Perceptrons:
An Introduction to Computational Geometry

• Networked elements are required


Multi-layer Perceptron!
[Figure: a two-layer network over X and Y computing XOR: two hidden threshold units feed a single output unit; the middle units form the “Hidden Layer”]
• XOR
– The first layer is a “hidden” layer
– Also originally suggested by Minsky and Papert, 1969
A more generic model

[Figure: a larger multi-layer network of threshold units over inputs X, Y, Z, A; numbers in the circles are thresholds, numbers on the edges are weights]

• A “multi-layer” perceptron
• Can compose arbitrarily complicated Boolean functions!
– In cognitive terms: Can compute arbitrary Boolean functions over
sensory input
– More on this in the next class
But our brain is not Boolean

• We have real inputs


• We make non-Boolean inferences/predictions
The perceptron with real inputs

[Figure: a perceptron with real inputs x1 … xN and weights w1 … wN feeding a threshold unit]

• x1…xN are real valued


• w1…wN are real valued
• Unit “fires” if weighted input matches (or exceeds)
a threshold
The perceptron with real inputs

[Figure: the same perceptron, drawn with the bias b as an extra input to the threshold unit]

• Alternate view:
– A threshold “activation” operates on the weighted sum of inputs plus a bias, z = w1x1 + … + wNxN + b
• An affine function of the inputs
– The unit outputs a 1 if z is non-negative, 0 otherwise
• The unit “fires” if the weighted input matches or exceeds a threshold
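As a sketch in code (the function name is an illustrative assumption):

```python
import numpy as np

def threshold_unit(x, w, b):
    z = np.dot(w, x) + b        # affine function of the inputs
    return 1 if z >= 0 else 0   # fire iff z is non-negative
```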
The perceptron with real inputs
and a real output
[Figure: the same unit with a sigmoid activation in place of the hard threshold; b is the bias]

• x1…xN are real valued


• w1…wN are real valued
• The output y can also be real valued
– Sometimes viewed as the “probability” of firing
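A sketch of such a “soft” perceptron, assuming the standard logistic sigmoid:

```python
import numpy as np

def sigmoid_unit(x, w, b):
    z = np.dot(w, x) + b              # same affine input as before
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1): "probability" of firing
```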
The “real” valued perceptron
[Figure: the unit with a generic activation f applied to the affine sum of the inputs; b is the bias]

• Any real-valued “activation” function may operate on the affine function of the input
– We will see several later
– The output will be real valued
• The perceptron maps real-valued inputs to real-valued outputs
• It is useful to continue assuming Boolean outputs, though, for interpretation
A Perceptron on Reals

[Figure: a perceptron over real inputs x1 … xN, and its decision boundary w1x1 + w2x2 = T in the (x1, x2) plane: output 1 on one side, 0 on the other]

• A perceptron operates on real-valued vectors
– It is a linear classifier
Boolean functions with a real
perceptron
[Figure: three Boolean functions over (x1, x2) ∈ {0,1}², each computed by a line separating the corners of the unit square]

• Boolean perceptrons are also linear classifiers
– Purple regions have output 1 in the figures
– What are these functions?
– Why can we not compose an XOR?
Composing complicated “decision”
boundaries

[Figure: a pentagonal region in the (x1, x2) plane]

Perceptrons can now be composed into “networks” to compute arbitrary classification “boundaries”.

• Build a network of units with a single output that fires if the input is in the coloured area
Booleans over the reals

[Figure: the pentagonal region in the (x1, x2) plane, bounded by five lines]

• The network must fire if the input is in the coloured area
Booleans over the reals

[Figure: five half-plane perceptrons y1 … y5, one per bounding line, feed an AND unit that tests y1 + … + y5 ≥ 5; the sum equals 5 only inside the pentagon, and is 3 or 4 in the surrounding regions]

• The network must fire if the input is in the coloured area
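A sketch of this construction in code (the square region and helper names below are illustrative assumptions):

```python
import numpy as np

def step(z):
    return (z >= 0).astype(int)

# One half-plane test a·x + b >= 0 per edge, here for the unit square
A = np.array([[ 1,  0],    # x1 >= 0
              [-1,  0],    # x1 <= 1
              [ 0,  1],    # x2 >= 0
              [ 0, -1]])   # x2 <= 1
b = np.array([0, 1, 0, 1])

def in_region(x):
    y = step(A @ np.asarray(x) + b)    # hidden layer: one unit per edge
    return int(y.sum() >= len(b))      # output unit: AND of all the edges

print(in_region([0.5, 0.5]), in_region([2.0, 0.5]))   # 1 0
```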
More complex decision boundaries

[Figure: two polygon sub-networks over (x1, x2), each an AND over its half-plane units, combined by an OR unit in a third layer]
• Network to fire if the input is in the yellow area
– “OR” two polygons
– A third layer is required
Complex decision boundaries

• Can compose very complex decision boundaries


– How complex exactly? More on this in the next class

Complex decision boundaries

[Figure: a one-hidden-layer network over x1, x2: many half-plane units feed a single OR-style output unit]

• Can compose arbitrarily complex decision boundaries
– With only one hidden layer!
– How?
Exercise: compose this with one
hidden layer

[Figure: a non-convex region in the (x1, x2) plane]

• How would you compose the decision boundary to the left with only one hidden layer?
Composing a Square decision
boundary
[Figure: four half-plane perceptrons y1 … y4 over (x1, x2); their sum is 4 only inside the square (smaller outside), so the output unit tests y1 + y2 + y3 + y4 ≥ 4]

• The polygon net
Composing a pentagon
[Figure: five half-plane perceptrons y1 … y5; their sum equals 5 only inside the pentagon (2 to 4 outside), so the output unit tests y1 + … + y5 ≥ 5]

• The polygon net
Composing a hexagon
[Figure: six half-plane perceptrons y1 … y6; their sum equals 6 only inside the hexagon (3 to 5 outside), so the output unit tests y1 + … + y6 ≥ 6]

• The polygon net
The multi-layer perceptron

• A network of perceptrons
– Perceptrons “feed” other
perceptrons
– We give you the “formal” definition of a layer later
Defining “depth”

• What is a “deep” network?
Deep Structures
• In any directed graph with input source nodes and
output sink nodes, “depth” is the length of the longest
path from a source to a sink
– A “source” node in a directed graph is a node that has only
outgoing edges
– A “sink” node is a node that has only incoming edges

[Figure: two example directed graphs]

• Left: Depth = 2. Right: Depth = 3
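A small sketch of this definition (the example graph and function names are assumptions for illustration):

```python
from functools import lru_cache

# Each node maps to the nodes it feeds; sources have no incoming edges.
graph = {"in1": ["h1"], "in2": ["h1", "h2"],
         "h1": ["out"], "h2": ["out"], "out": []}

@lru_cache(maxsize=None)
def longest_path_from(node):
    succ = graph[node]
    return 0 if not succ else 1 + max(longest_path_from(s) for s in succ)

sources = [n for n in graph if not any(n in v for v in graph.values())]
print(max(longest_path_from(s) for s in sources))   # depth = 2 for this graph
```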
Deep Structures
• Layered deep structure
– The input is the “source”,
– The output nodes are “sinks”

[Figure: a layered network; input in black, layer 1 red, layer 2 green, layer 3 yellow, layer 4 blue]

• “Deep”  Depth greater than 2


• “Depth” of a layer: the depth of the neurons in the layer w.r.t. the input
The multi-layer perceptron

[Figure: a generic N.Net box mapping several inputs to several outputs]

• Inputs are real or Boolean stimuli


• Outputs are real or Boolean values
– Can have multiple outputs for a single input
• What can this network compute?
– What kinds of input/output relationships can it model?
MLPs approximate functions

[Figure: left, the Boolean network over X, Y, Z, A from before; right, an MLP approximating a real-valued function h(x)]

• MLPs can compose Boolean functions


• MLPs can compose real-valued functions
• What are the limitations?
Today
• Multi-layer Perceptrons as universal Boolean
functions
– The need for depth
• MLPs as universal classifiers
– The need for depth
• MLPs as universal approximators
• A discussion of optimal depth and width
• Brief segue: RBF networks
The MLP as a Boolean function
• How well do MLPs model Boolean functions?

The perceptron as a Boolean gate
[Figure: perceptrons implementing elementary gates over X and Y. Values in the circles are thresholds; values on edges are weights.]

• A perceptron can model any simple binary Boolean gate
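A sketch of these gates as threshold units (the weights and thresholds follow the pattern above; the helper names are illustrative):

```python
def gate(inputs, weights, threshold):
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

AND = lambda x, y: gate([x, y], [1, 1], 2)    # fires only when both inputs are 1
OR  = lambda x, y: gate([x, y], [1, 1], 1)    # fires when either input is 1
NOT = lambda x:    gate([x],    [-1],   0)    # weight -1, threshold 0

# The majority gates below work the same way: all-1 weights, threshold K
MAJ3 = lambda x, y, z: gate([x, y, z], [1, 1, 1], 2)   # at least 2 of 3

assert AND(1, 1) == 1 and AND(1, 0) == 0
assert OR(0, 1) == 1 and NOT(1) == 0 and MAJ3(1, 1, 0) == 1
```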
Perceptron as a Boolean gate
[Figure: a perceptron with weights 1 on inputs X1 … XL, weights -1 on XL+1 … XN, and threshold L]

Will fire only if X1 … XL are all 1 and XL+1 … XN are all 0

• The universal AND gate
– AND any number of inputs
• Any subset of which may be negated
Perceptron as a Boolean gate
[Figure: the same perceptron with threshold L-N+1]

Will fire if any of X1 … XL is 1, or any of XL+1 … XN is 0

• The universal OR gate
– OR any number of inputs
• Any subset of which may be negated
Perceptron as a Boolean Gate
[Figure: a perceptron with weights 1 on all inputs and threshold K]

Will fire only if at least K inputs are 1

• Generalized majority gate


– Fire if at least K inputs are of the desired polarity

Perceptron as a Boolean Gate
[Figure: a perceptron with weights 1 on X1 … XL, weights -1 on XL+1 … XN, and threshold L-N+K]

Will fire only if the total number of X1 … XL that are 1 and XL+1 … XN that are 0 is at least K

• Generalized majority gate


– Fire if at least K inputs are of the desired polarity

The perceptron is not enough

[Figure: a single perceptron over X and Y with unknown weights “?”]

• Cannot compute an XOR

Multi-layer perceptron
[Figure: the two-layer XOR network from before: two hidden threshold units over X and Y feed one output unit; the middle units form the “Hidden Layer”]

• MLPs can compute the XOR

Multi-layer perceptron XOR
[Figure: XOR with two neurons: a hidden unit h that fires when X + Y ≥ 1.5, and an output unit that fires when X + Y − 2h ≥ 0.5; five weights (1, 1 into h; 1, 1, -2 into the output) and two thresholds (1.5, 0.5). Thanks to Gerald Friedland]

• With 2 neurons
– 5 weights and two thresholds
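A sketch of this two-neuron XOR (the function names are illustrative):

```python
def step(z):
    return int(z >= 0)

def xor(x, y):
    h = step(x + y - 1.5)              # hidden unit: fires only for (1, 1)
    return step(x + y - 2 * h - 0.5)   # output: the sum minus twice AND(x, y)

assert [xor(0, 0), xor(0, 1), xor(1, 0), xor(1, 1)] == [0, 1, 1, 0]
```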
Multi-layer perceptron
[Figure: a deeper network of threshold units over inputs X, Y, Z, A]
• MLPs can compute more complex Boolean functions
• MLPs can compute any Boolean function
– Since they can emulate individual gates
• MLPs are universal Boolean functions
MLP as Boolean Functions
[Figure: the same network over X, Y, Z, A]

• MLPs are universal Boolean functions
– Any function over any number of inputs and any number of outputs
• But how many “layers” will they need?
How many layers for a Boolean MLP?
Truth table, showing all input combinations for which the output is 1:

X1 X2 X3 X4 X5 | Y
 0  0  1  1  0 | 1
 0  1  0  1  1 | 1
 0  1  1  0  0 | 1
 1  0  0  0  1 | 1
 1  0  1  1  1 | 1
 1  1  0  0  1 | 1

• A Boolean function is just a truth table
How many layers for a Boolean MLP?

Expressed in disjunctive normal form (X' denotes NOT X):

Y = X1'X2'X3X4X5' + X1'X2X3'X4X5 + X1'X2X3X4'X5' + X1X2'X3'X4'X5 + X1X2'X3X4X5 + X1X2X3'X4'X5

[Figure: a one-hidden-layer network over X1 … X5, with one hidden AND unit per row of the truth table feeding an output OR unit]
How many layers for a Boolean MLP?

• Any truth table can be expressed in this manner!
• A one-hidden-layer MLP is a universal Boolean function

• But what is the largest number of perceptrons required in the single hidden layer for an N-input-variable function?
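A sketch of the construction, building the one-hidden-layer network directly from the truth table (the helper names are illustrative assumptions):

```python
import numpy as np

# One hidden AND-style unit per truth-table row with output 1;
# the output unit ORs them.
def step(z):
    return (z >= 0).astype(int)

def dnf_mlp(true_rows):
    W = np.array([[1 if bit else -1 for bit in row] for row in true_rows])
    b = -np.array([sum(row) for row in true_rows])   # each unit fires only on its own row
    def net(x):
        hidden = step(W @ np.asarray(x) + b)   # one hidden unit per DNF term
        return int(hidden.sum() >= 1)          # output unit: OR of the terms
    return net

# The 5-input function from the truth table above:
rows = [(0,0,1,1,0), (0,1,0,1,1), (0,1,1,0,0),
        (1,0,0,0,1), (1,0,1,1,1), (1,1,0,0,1)]
f = dnf_mlp(rows)
print(f((0,0,1,1,0)), f((0,0,0,0,0)))   # 1 0
```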
Reducing a Boolean Function
[Figure: a Karnaugh map: a 4 × 4 grid with YZ = 00, 01, 11, 10 across the top and WX = 00, 01, 11, 10 down the side]

This is a “Karnaugh map”: it represents a truth table as a grid. Filled boxes represent input combinations for which the output is 1; blank boxes have output 0. Adjacent boxes can be “grouped” to reduce the complexity of the DNF formula for the table.

• DNF form:
– Find groups
– Express as reduced DNF
Reducing a Boolean Function
[Figure: the Karnaugh map with seven filled cells]

The basic DNF formula will require 7 terms.
Reducing a Boolean Function
[Figure: the same map, with adjacent filled cells grouped into larger blocks]

• Reduced DNF form:
– Find groups
– Express as reduced DNF
Reducing a Boolean Function
[Figure: the grouped map and the corresponding one-hidden-layer network over W, X, Y, Z]

• Reduced DNF form:
– Find groups
– Express as reduced DNF
– The Boolean network for this function needs only 3 hidden units
• Reduction of the DNF reduces the size of the one-hidden-layer network
Largest irreducible DNF?

[Figure: the 4-variable Karnaugh map filled in a checkerboard pattern; red = 0, white = 1]

• What arrangement of ones and zeros simply cannot be reduced further?
• How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function?
Width of a one-hidden-layer Boolean MLP
[Figure: a 6-variable checkerboard Karnaugh map: a stack of 4-variable maps over W, X, Y, Z indexed by UV = 00, 01, 11, 10; red = 0, white = 1]

• How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function of 6 variables?
Width of a one-hidden-layer Boolean MLP

[Figure: the checkerboard Karnaugh maps from before]

Can be generalized: this will require 2^(N-1) perceptrons in the hidden layer. Exponential in N!

• How many neurons in a DNF (one-hidden-layer) MLP for this Boolean function?
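The checkerboard map is the parity (N-way XOR) function, so a quick sketch can confirm the count (an illustrative check, not from the slides):

```python
from itertools import product

# Parity outputs 1 on inputs with an odd number of 1s. No two such
# minterms are adjacent on the Karnaugh map, so none can be grouped:
# the DNF network needs one hidden unit per minterm, 2**(N-1) in all.
N = 6
minterms = [bits for bits in product([0, 1], repeat=N) if sum(bits) % 2 == 1]
print(len(minterms), 2 ** (N - 1))   # 32 32
```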
Poll 2

How many neurons will be required in the hidden layer of a one-hidden-layer network that models a Boolean function over 10 inputs, where the output for two input bit patterns that differ in only one bit is always different? (I.e. the checkerboard Karnaugh map)

• 20
• 256
• 512
• 1024
Width of a one-hidden-layer Boolean MLP

[Figure: the checkerboard Karnaugh maps from before]

• A single hidden layer requires 2^(N-1) perceptrons for this function
• How many units if we use multiple hidden layers?
Size of a deep MLP
[Figure: the 4-variable and 6-variable checkerboard Karnaugh maps]
Multi-layer perceptron XOR
[Figure: the two-layer XOR network from before; the middle units form the “Hidden Layer”]

• An XOR takes three perceptrons


Size of a deep MLP
[Figure: the 4-variable checkerboard function computed as a tree of XORs over W, X, Y, Z: three 3-perceptron XOR blocks, 9 perceptrons in total]

• An XOR needs 3 perceptrons
• This network will require 3 × 3 = 9 perceptrons
Size of a deep MLP
[Figure: the 6-variable checkerboard function computed as a tree of XORs over U, V, W, X, Y, Z: five 3-perceptron XOR blocks, 15 perceptrons in total]

• An XOR needs 3 perceptrons
• This network will require 3 × 5 = 15 perceptrons
Size of a deep MLP
[Figure: the same 6-variable XOR network over U, V, W, X, Y, Z]

More generally, the XOR of N variables will require 3(N-1) perceptrons!!

• An XOR needs 3 perceptrons
• This network will require 3 × 5 = 15 perceptrons
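A sketch of the chained construction, assuming the 3-perceptron XOR decomposition from before (OR and NAND feeding an AND):

```python
def step(z):
    return int(z >= 0)

def xor3(x, y):
    """XOR from 3 threshold perceptrons."""
    o = step(x + y - 1)        # OR
    n = step(-x - y + 1.5)     # NAND: fires unless both inputs are 1
    return step(o + n - 2)     # AND of the two

def parity(bits):
    acc = bits[0]
    for b in bits[1:]:         # N-1 XOR blocks: 3*(N-1) perceptrons in total
        acc = xor3(acc, b)
    return acc

print(parity([1, 0, 1, 1]))   # 1
```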
One-hidden-layer vs. deep Boolean MLP

[Figure: the checkerboard Karnaugh maps from before]

• Single hidden layer: will require 2^(N-1) + 1 perceptrons in all (including the output unit). Exponential in N!
• A deep network will require 3(N-1) perceptrons. Linear in N!!!
– Can be arranged in only 2 log2(N) layers
A better representation

[Figure: a tree of pairwise XORs computing X1 ⊕ X2 ⊕ … ⊕ XN]

• Only 2 log2(N) layers
– By pairing terms
– 2 layers per XOR …
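A sketch of the pairing arrangement (again reusing the 3-perceptron XOR; the names are illustrative):

```python
def step(z):
    return int(z >= 0)

def xor3(x, y):   # the 3-perceptron XOR from before
    return step(step(x + y - 1) + step(-x - y + 1.5) - 2)

def parity_tree(bits):
    layer = list(bits)
    while len(layer) > 1:                       # log2(N) pairing levels
        nxt = [xor3(a, b) for a, b in zip(layer[::2], layer[1::2])]
        if len(layer) % 2:                      # odd leftover passes through
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

print(parity_tree([1, 0, 1, 1]))   # 1
```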
The human perspective

[Figure: the three “N.Net” boxes again: Voice signal → Transcription; Image → Text caption; Game state → Next move]

• In a human, those functions are computed by the brain…
These tasks are functions

[Figure: the same three “N.Net” boxes]

• Each of these boxes is actually a function
– E.g. f: Image → Caption
These tasks are functions

[Figure: the same three tasks, drawn as plain input → output boxes]
• Each box is actually a function
– E.g. f: Image → Caption
– It can be approximated by a neural network
Complex decision boundaries

[Figure: a decision boundary in a 784-dimensional input space (MNIST digit images)]
• Classification problems: finding decision boundaries in high-dimensional space
– Can be performed by an MLP
• MLPs can classify real-valued inputs
Story so far
• MLPs are connectionist computational models
– Individual perceptrons are the computational equivalent of neurons
– The MLP is a layered composition of many perceptrons

• MLPs can model Boolean functions


– Individual perceptrons can act as Boolean gates
– Networks of perceptrons are Boolean functions

• MLPs are Boolean machines


– They represent Boolean functions over linear boundaries
– They can represent arbitrary decision boundaries
– They can be used to classify data

