DL 2021 Chapter02
Chapter 2
Connectionism
Fernando Perez-Cruz
Based on Thomas Hofmann's slides
Section 1
McCulloch-Pitts (1943)
McCulloch & Pitts
McCulloch-Pitts Neuron
def → $$ f(x; \sigma, \theta) := \begin{cases} 1 & \text{if } \sum_{i=1}^{n} \sigma_i x_i \ge \theta \\ 0 & \text{otherwise} \end{cases} \qquad \text{(M-P neuron)} $$
M-P Neuron: AND and NAND
              AND (σ1 = σ2 = 1, θ = 2)        NAND (σ1 = σ2 = −1, θ = −1)
x1  x2        σ1 x1 + σ2 x2        f          σ1 x1 + σ2 x2        f
0   0               0              0                 0             1
0   1               1              0                −1             1
1   0               1              0                −1             1
1   1               2              1                −2             0
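As a quick check, here is a minimal Python sketch of the M-P neuron (the function name mp_neuron is ours, not from the slides) reproducing the AND and NAND columns of the table above:

    # Minimal sketch of a McCulloch-Pitts neuron.
    def mp_neuron(x, sigma, theta):
        """Fire (return 1) iff the signed sum of the inputs reaches the threshold."""
        return 1 if sum(s * xi for s, xi in zip(sigma, x)) >= theta else 0

    # AND: sigma = (1, 1), theta = 2;  NAND: sigma = (-1, -1), theta = -1
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, mp_neuron(x, (1, 1), 2), mp_neuron(x, (-1, -1), -1))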
M-P Neuron: Sums and DNFs – Example
M-P Neuron: Sums and DNFs – General
Any Boolean function can be written as a disjunction (OR) over index sets $I \in \mathcal{I}$ of conjunctions (ANDs) of literals, each realizable by an M-P neuron:

$$ f(x; \sigma, \theta) \;=\; \bigvee_{I \in \mathcal{I}} \Big( \bigwedge_{i \in I} x_i \;\wedge\; \bigwedge_{i \notin I} \bar{x}_i \Big) $$
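A small sketch of the construction, with XOR as the example DNF. We assume the usual trick of encoding a negated literal by weight −1 and setting the threshold to the number of positive literals (helper names are ours):

    # Sketch: realizing a DNF with two layers of M-P neurons (XOR as example).
    def mp(x, sigma, theta):
        return 1 if sum(s * xi for s, xi in zip(sigma, x)) >= theta else 0

    def dnf_xor(x1, x2):
        # First layer: one AND-neuron per conjunction; negated literals get
        # weight -1, threshold = number of positive literals.
        c1 = mp((x1, x2), (+1, -1), 1)   # x1 AND (NOT x2)
        c2 = mp((x1, x2), (-1, +1), 1)   # (NOT x1) AND x2
        # Second layer: an OR-neuron (fires if at least one conjunction fires).
        return mp((c1, c2), (+1, +1), 1)

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, dnf_xor(*x))  # 0, 1, 1, 0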
Section 2
Hebb (1949)
Hebbian Learning
def → $$ \Delta\theta_{ij}^{t} := (x_i^t - \bar{x}_i)\,(x_j^t - \bar{x}_j) $$

(Hebbian covariance rule: the connection $\theta_{ij}$ is strengthened when units $i$ and $j$ are co-active relative to their mean activities $\bar{x}_i$, $\bar{x}_j$.)
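A minimal numpy sketch of this update (array names and the random data are ours). Accumulated over many patterns, the updates equal the biased empirical covariance of the inputs:

    # Sketch of the Hebbian covariance update above.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(100, 5)).astype(float)  # 100 binary patterns, 5 units
    x_bar = X.mean(axis=0)                               # empirical means x̄

    theta = np.zeros((5, 5))
    for x_t in X:
        d = x_t - x_bar
        theta += np.outer(d, d)    # Δθ_ij = (x_i − x̄_i)(x_j − x̄_j)

    # theta / len(X) equals the (biased) empirical covariance of the inputs.
    print(np.allclose(theta / len(X), np.cov(X.T, bias=True)))  # True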
Minsky’s Hebbian Learning Machine: SNARC
Minsky’s Neuron
Section 3
Perceptron (1958+)
Perceptron
Perceptron unit: inputs $x \in \mathbb{R}^n$; weights $\theta \in \mathbb{R}^n$

def → $$ (x, \theta) \mapsto \operatorname{sign}(x \cdot \theta) = \begin{cases} +1 & \text{if } \sum_i x_i \theta_i \ge 0 \\ -1 & \text{else} \end{cases} \qquad \text{(sign unit)} $$
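A minimal sketch of the sign unit together with the mistake-driven update $\Delta\theta^t = y^t x^t$ used on the following slides (numpy; function names and the toy data are ours):

    # Sketch: perceptron unit and mistake-driven training loop.
    import numpy as np

    def sign_unit(x, theta):
        return 1.0 if x @ theta >= 0 else -1.0

    def perceptron_train(X, y, epochs=100):
        theta = np.zeros(X.shape[1])
        for _ in range(epochs):
            mistakes = 0
            for x_t, y_t in zip(X, y):
                if sign_unit(x_t, theta) != y_t:   # update only on mistakes
                    theta += y_t * x_t             # Δθ^t = y^t x^t
                    mistakes += 1
            if mistakes == 0:                      # converged on the sample
                break
        return theta

    X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0])           # linearly separable toy set
    print(perceptron_train(X, y))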
Perceptron Iterate Trajectory
[Figure: trajectory of the perceptron iterates θ^0, θ^1, ..., θ^5 during training.]

Python notebook: https://colab.research.google.com/drive/1fCQa7UGZn5pj5OPoA8IcKODzbohvbgwi?usp=sharing, c/o Antonio Orvieto
Perceptron: Norm Growth

$$ \|\theta^s\|^2 \;\le\; \sum_{t=1}^{s} \|x^t\|^2 $$

Proof idea: on a mistake, $\|\theta^t\|^2 = \|\theta^{t-1} + y^t x^t\|^2 = \|\theta^{t-1}\|^2 + 2\, y^t (x^t \cdot \theta^{t-1}) + \|x^t\|^2 \le \|\theta^{t-1}\|^2 + \|x^t\|^2$, since updates happen only when $y^t (x^t \cdot \theta^{t-1}) \le 0$.

Corollary. If $\|x^t\| \le 1$ $(\forall t)$, then $\|\Delta\theta^t\| \le 1$ and $\|\theta^s\| \le \sqrt{s}$.
Perceptron: Linear (γ-)Separability

Definition. A sample $S = \{(x^t, y^t)\}$ is linearly $\gamma$-separable if there exists $\theta^*$ with $\|\theta^*\| = 1$ such that $y^t \, (x^t \cdot \theta^*) \ge \gamma > 0$ for all $t$.
Perceptron: Convergence Theorem
Assume $\|x^t\| \le 1$ and linear $\gamma$-separability with witness $\theta^*$, $\|\theta^*\| = 1$. After $s$ mistake-driven updates:

$$ \theta^s \cdot \theta^* \;=\; \sum_{t=1}^{s} \Delta\theta^t \cdot \theta^* \;=\; \sum_{t=1}^{s} (y^t x^t) \cdot \theta^* \;\ge\; \gamma s $$

On the other hand, by Cauchy-Schwarz and the norm-growth corollary:

$$ \theta^s \cdot \theta^* \;\le\; \|\theta^s\|\,\|\theta^*\| \;=\; \|\theta^s\| \;\le\; \sqrt{s} $$

Combining, $\gamma s \le \sqrt{s}$, so the perceptron makes at most $s \le 1/\gamma^2$ mistakes.
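A small simulation, assuming synthetic data with margin γ = 0.1 and ‖x‖ ≤ 1 (the setup and names are ours), to check the mistake bound empirically:

    # Sketch: empirically checking the 1/γ² mistake bound on separable data.
    import numpy as np

    rng = np.random.default_rng(0)
    theta_star = np.array([1.0, 0.0])          # unit-norm separator
    gamma = 0.1

    # Sample points with margin at least gamma and norm at most 1.
    X, y = [], []
    while len(X) < 200:
        x = rng.uniform(-1, 1, size=2)
        m = x @ theta_star
        if abs(m) >= gamma and np.linalg.norm(x) <= 1.0:
            X.append(x)
            y.append(np.sign(m))
    X, y = np.array(X), np.array(y)

    theta, mistakes = np.zeros(2), 0
    for _ in range(1000):                      # cycle until an error-free epoch
        errs = 0
        for x_t, y_t in zip(X, y):
            if y_t * (x_t @ theta) <= 0:
                theta += y_t * x_t
                errs += 1
        mistakes += errs
        if errs == 0:
            break

    print(mistakes, "<=", 1 / gamma**2)        # bound from the theorem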
Perceptron: Rosenblatt’s View
Perceptron: XOR Depression
def → $$ S = \left\{ \Big(\tbinom{0}{0}, +1\Big),\ \Big(\tbinom{1}{1}, +1\Big),\ \Big(\tbinom{0}{1}, -1\Big),\ \Big(\tbinom{1}{0}, -1\Big) \right\} \qquad \text{(XOR problem)} $$
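A short sketch (setup ours; a constant bias feature is appended so the failure is not just an artifact of the missing offset) showing that the perceptron keeps making mistakes on XOR indefinitely:

    # Sketch: the perceptron never converges on XOR, as no linear separator exists.
    import numpy as np

    # XOR inputs with an appended constant-1 bias feature.
    X = np.array([[0, 0, 1], [1, 1, 1], [0, 1, 1], [1, 0, 1]], dtype=float)
    y = np.array([+1, +1, -1, -1], dtype=float)

    theta = np.zeros(3)
    for epoch in range(100):
        errs = 0
        for x_t, y_t in zip(X, y):
            if y_t * (x_t @ theta) <= 0:
                theta += y_t * x_t
                errs += 1
        if errs == 0:
            break
    print(epoch, errs)  # errs never reaches 0: some point is always misclassified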
Linear Dichotomies

Definition. A dichotomy of points $\{x^1, \dots, x^s\} \subset \mathbb{R}^n$ assigns each point to one of two classes; it is linear if some hyperplane through the origin separates the classes, i.e. there is a $\theta$ with $x^t \cdot \theta > 0$ exactly for the points of the first class. $C(s, n)$ denotes the number of linear dichotomies of $s$ points in general position.
Cover’s Theorem
⇒ $$ C(s, n) \;=\; 2 \sum_{i=0}^{n-1} \binom{s-1}{i} $$
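A minimal sketch of the counting function (names ours), confirming that $C(s,n) = 2^s$ whenever $s \le n$, i.e. that all dichotomies of at most $n$ points are linear:

    # Sketch: Cover's counting function C(s, n).
    from math import comb

    def C(s, n):
        return 2 * sum(comb(s - 1, i) for i in range(n))

    for n in [3, 5, 10]:
        for s in range(1, n + 1):
            assert C(s, n) == 2 ** s     # all 2^s dichotomies linearly realizable
    print(C(20, 10), 2 ** 20)            # s = 2n: exactly half of all dichotomies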
Cover’s Theorem
By Pascal's rule $\binom{s-1}{i} + \binom{s-1}{i-1} = \binom{s}{i}$:

$$ \sum_{i=0}^{n-1}\binom{s-1}{i} + \sum_{i=0}^{n-2}\binom{s-1}{i} \;=\; \sum_{i=1}^{n-1}\left[\binom{s-1}{i} + \binom{s-1}{i-1}\right] + \binom{s-1}{0} \;=\; \sum_{i=1}^{n-1}\binom{s}{i} + 1 \;=\; \sum_{i=0}^{n-1}\binom{s}{i} $$

which verifies the recursion $C(s, n) + C(s, n-1) = C(s+1, n)$.
The (“Messy”) Intermediate Regime
$$ \frac{C(s, n)}{2^s} \;=\; \begin{cases} 1 & \text{if } s \le n \\ 1 - O(e^{-n}) & \text{if } n < s < 2n \\ \tfrac{1}{2} & \text{if } s = 2n \\ O(e^{-n}) & \text{otherwise} \end{cases} $$
In plain English:
• All dichotomies are linear, as long as s ≤ n.
• For n < s < 2n, a vanishingly small (as n → ∞) fraction of dichotomies is not linearly realizable.
• For s > 2n, almost all dichotomies are not linearly realizable (see the sketch below).
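A small numeric sketch of the four regimes (the choice n = 50 is ours), reusing the counting function from the previous sketch:

    # Sketch: evaluating C(s, n) / 2^s in the four regimes above.
    from math import comb

    def C(s, n):
        return 2 * sum(comb(s - 1, i) for i in range(n))

    n = 50
    for s in [n, int(1.5 * n), 2 * n, 3 * n]:
        print(s, C(s, n) / 2 ** s)
    # ≈ 1.0, ≈ 1.0 minus an exponentially small term, exactly 0.5, ≈ 0.0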
Section 4
Willshaw (1969)
Making (Associative) Memories
Sparse Binary Patterns
def → $$ \mathcal{B}_n^r := \left\{ x \in \{0,1\}^n : \sum_{i=1}^{n} x_i = r \right\} $$
Information & Capacity
• Memory capacity: $n^2$ bits (one bit per binary synapse).
• Upper bound on the number $s$ of storable pattern pairs: since each pattern carries about $\lg\binom{n}{r} \approx r \lg n$ bits,
$$ s \;\le\; \frac{n^2}{r \lg n} $$
Willshaw Memory: Learning
def → $$ \Theta_{ji} := \min\Big\{1,\; \sum_{t=1}^{s} y_j^t x_i^t\Big\} $$

• Equivalently, with outer products $\Theta^t := y^t (x^t)^\top$:
$$ \Theta = \min\Big\{1,\; \sum_{t=1}^{s} \Theta^t\Big\} \quad \text{(elementwise)} $$
Willshaw Memory: Retrieval
$$ \Theta^t x^t = (x^t \cdot x^t)\, y^t = r\, y^t $$

so thresholding $\Theta x^t$ at $r$ recovers $y^t$, up to possible spurious ones (see monotonicity below).
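A minimal numpy sketch of the full Willshaw memory, learning rule and thresholded retrieval together (the pattern sampling and names are ours):

    # Minimal sketch of a Willshaw memory.
    import numpy as np

    rng = np.random.default_rng(0)
    n, r, s = 256, 8, 50

    def sparse_pattern():
        x = np.zeros(n, dtype=int)
        x[rng.choice(n, size=r, replace=False)] = 1   # x ∈ B_n^r
        return x

    pairs = [(sparse_pattern(), sparse_pattern()) for _ in range(s)]

    # Learning: clipped sum of outer products, Θ = min{1, Σ_t y^t (x^t)^T}.
    Theta = np.zeros((n, n), dtype=int)
    for x_t, y_t in pairs:
        Theta = np.maximum(Theta, np.outer(y_t, x_t))

    # Retrieval: threshold Θx at r (monotonicity guarantees ŷ >= y^t elementwise).
    x0, y0 = pairs[0]
    y_hat = (Theta @ x0 >= r).astype(int)
    print(np.all(y_hat >= y0), int(y_hat.sum() - y0.sum()))  # True, # spurious ones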
Willshaw Memory: Example
Willshaw Memory: Monotonicity
Claim: $x^t \mapsto \hat{y} \ge y^t$ (elementwise), $\forall (x^t, y^t) \in S$.

1. Note that for any $t$: $\Theta = \min\{1, \sum_\tau \Theta^\tau\} \ge \min\{1, \Theta^t\} = \Theta^t$.
2. Hence $\Theta x \ge \Theta^t x$ for any $x \in \mathcal{B}_n^r$, as $x \ge 0$.
3. Specifically, $z^t = \Theta x^t \ge \Theta^t x^t = r\, y^t$: every component with $y_j^t = 1$ reaches the threshold $r$, so $\hat{y} \ge y^t$.
Willshaw Memory: Errors?
Pattern-per-Pattern Storage
• Incremental storage:
$$ \theta_{ji}^{t} = \max\{\theta_{ji}^{t-1},\; y_j^t x_i^t\} $$
• Guided by the analysis (to follow): monitor the number of non-zero entries $\sum_{i,j} \theta_{ji}$ and stay well below $n^2/2$; otherwise the memory is full (see the sketch below).
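A sketch of incremental storage with the suggested fill-level monitor (the $n^2/2$ stop criterion is from the slide; everything else is our setup):

    # Sketch: pattern-per-pattern storage with a fill-level monitor.
    import numpy as np

    rng = np.random.default_rng(1)
    n, r = 256, 8

    def sparse_pattern():
        x = np.zeros(n, dtype=int)
        x[rng.choice(n, size=r, replace=False)] = 1
        return x

    Theta = np.zeros((n, n), dtype=int)
    stored = 0
    while True:
        x_t, y_t = sparse_pattern(), sparse_pattern()
        Theta = np.maximum(Theta, np.outer(y_t, x_t))   # θ_ji = max{θ_ji, y_j x_i}
        stored += 1
        if Theta.sum() >= n * n / 2:                    # memory considered full
            break
    print(stored, "patterns stored before half the synapses were on")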
Analysis: Step 1
How many bits of $\Theta$ are turned on (on average) after storing $s$ random patterns?

⇒ $$ q := P\{\Theta_{ji} = 0\} = \Big(1 - \frac{r^2}{n^2}\Big)^s $$

since pattern $t$ switches entry $(j, i)$ on iff $y_j^t = x_i^t = 1$, which happens with probability $(r/n)^2$.

Taylor approximation:

⇒ $$ \ln q = s \ln\Big(1 - \frac{r^2}{n^2}\Big) \approx -s\,\frac{r^2}{n^2} $$
Analysis: Step 2
⇒ $$ p := P\{\hat{y}_j = 1\} = (1 - q)^r $$

A spurious output unit $j$ fires iff all $r$ synapses $\Theta_{ji}$ with $x_i = 1$ are on; each is on with probability $1 - q$.
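A Monte Carlo sketch checking the formulas of Steps 1-2 (the simulation parameters are ours; agreement for p is only rough, since the entries of Θ are not exactly independent):

    # Sketch: Monte Carlo check of q = (1 - r²/n²)^s and p = (1 - q)^r.
    import numpy as np

    rng = np.random.default_rng(2)
    n, r, s = 128, 6, 500

    def sparse_pattern():
        x = np.zeros(n, dtype=int)
        x[rng.choice(n, size=r, replace=False)] = 1
        return x

    Theta = np.zeros((n, n), dtype=int)
    for _ in range(s):
        Theta = np.maximum(Theta, np.outer(sparse_pattern(), sparse_pattern()))

    q_emp = 1 - Theta.mean()                 # empirical fraction of zero entries
    q_th = (1 - r**2 / n**2) ** s
    print(q_emp, q_th)                       # should be close

    # p: probability that a fresh random input lights up a given output unit.
    x = sparse_pattern()
    p_emp = ((Theta @ x) >= r).mean()
    print(p_emp, (1 - q_th) ** r)            # rough agreement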
Analysis: Step 3
For reliable retrieval, demand that the expected number of spurious ones per retrieved pattern, $\approx n\,p$, stays $O(1)$: set $p = (1-q)^r \le \frac{1}{n}$.
Analysis: Step 4
Solving $(1-q)^r = \frac{1}{n}$ for $r$:

⇒ $$ r^*(n, q) = \alpha^* \lg n, \qquad \alpha^* = \frac{1}{-\lg(1-q)} $$
Capacity
⇒ $$ I(\text{pattern}) = \lg \binom{n}{r^*} \approx r^* \lg n \approx \frac{\lg^2 n}{-\lg(1-q)} $$

3. Maximal capacity: at fill level $q = \tfrac{1}{2}$ we get $\alpha^* = 1$ and $r^* = \lg n$, so $s = -n^2 \ln q / r^{*2} = n^2 \ln 2 / \lg^2 n$ patterns can be stored, for a total of $s \cdot I(\text{pattern}) \approx n^2 \ln 2 \approx 0.69\, n^2$ bits, i.e. roughly $\ln 2$ bits per binary synapse.
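Plugging in concrete numbers (the choice n = 2^16 is ours), using the slide's approximation $I(\text{pattern}) \approx r^* \lg n$:

    # Sketch: capacity numbers for a concrete memory size.
    from math import log, log2

    n = 2 ** 16                              # number of units
    q = 0.5                                  # target fill level
    alpha = 1 / -log2(1 - q)                 # α* = 1 for q = 1/2
    r_star = alpha * log2(n)                 # optimal sparsity r* = α* lg n
    s = -n**2 * log(q) / r_star**2           # storable patterns, from ln q = -s r²/n²
    info = r_star * log2(n)                  # I(pattern) ≈ r* lg n bits
    print(r_star, int(s), s * info / n**2)   # 16, ~11.6M patterns, ≈ ln 2 bits/synapse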
Summary
Was it worth it? :)