lec2 - Deep Learning Basics

Contents
1. Linear Classifier
2. Multilayer Neural Networks
3. Convolutional Neural Networks
1 Autonomous Intelligent Systems (AIS) LAB

Linear Classifier
■ Linear Discriminant Functions and Decision Surfaces
■ Discriminant function
$$g(\mathbf{x}) = \mathbf{w}^t \mathbf{x} + \omega_0$$
$\mathbf{w}$ : weight vector
$\omega_0$ : threshold weight
■ 2-class classification

Sample  x1   x2
S1      1.2  2.3
S2      1.0  2.2
S3      1.5  2.9
S4      4.0  3.0
S5      5.0  4.2
S6      6.0  5.0

[Figure: class 1 and class 2 samples in the (x1, x2) plane with the decision boundary g(x) = 0]

Two-Category Case
$g(\mathbf{x}) > 0$ → select $\omega_1$
$g(\mathbf{x}) < 0$ → select $\omega_2$
$g(\mathbf{x}) = 0$ → decision surface
Example: $g(\mathbf{x}) = 2x_1 - 3x_2 - 5$
$g(\mathbf{x})$ is linear → the decision surface is a hyperplane
Source: Pattern Classification, Richard O. Duda, Peter E. Hart, and David G. Stork, John Wiley & Sons, Inc. 2001
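The two-category decision rule can be sketched in a few lines. This is a minimal illustration, assuming the slide's example discriminant g(x) = 2x1 − 3x2 − 5; the function names `g` and `decide` and the three query points are hypothetical and do not come from the slides.

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^t x + w0."""
    return float(np.dot(w, x) + w0)

def decide(x, w, w0):
    """Return 'omega_1' if g(x) > 0, 'omega_2' if g(x) < 0, else 'boundary'."""
    value = g(x, w, w0)
    if value > 0:
        return "omega_1"
    if value < 0:
        return "omega_2"
    return "boundary"

# Weights of the slide's example g(x) = 2*x1 - 3*x2 - 5.
w = np.array([2.0, -3.0])
w0 = -5.0

# Hypothetical query points (not taken from the slides).
points = [np.array([4.0, 0.0]), np.array([0.0, 1.0]), np.array([2.5, 0.0])]
labels = [decide(x, w, w0) for x in points]
print(labels)  # ['omega_1', 'omega_2', 'boundary']
```

The third point satisfies g(x) = 0 exactly, so it lies on the hyperplane and is assigned to neither class.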

Linear Classifier
■ Linear Discriminant Functions and Decision Surfaces
$$g(\mathbf{x}) = \mathbf{w}^t \mathbf{x} + \omega_0$$
$\mathbf{w}$ : weight vector
$\omega_0$ : threshold weight

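Linear Classifier
■ Multicategory Case
1) Reduce to $c$ two-class problems: the $i$-th discriminant separates $\omega_i$ from not-$\omega_i$
2) Use one discriminant for every pair of classes: $c(c-1)/2$ discriminants
Both approaches can leave regions with no classification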

Linear Classifier
■ Two-Category Linearly Separable Case
$n$ samples $y_1, \ldots, y_n \in \{\omega_1, \omega_2\}$
find the weights $a$ in $g(x) = a^t y$
→ weight vector that classifies all of the samples correctly
→ if such a weight vector exists
→ samples are said to be linearly separable
Linear Classifier
■ Two-Category Linearly Separable Case
$y_i \to \omega_1$ if $a^t y_i > 0$
$y_i \to \omega_2$ if $a^t y_i < 0$
Normalization: replace all samples labeled $\omega_2$ by their negatives
→ find a weight vector $a$ such that $a^t y_i > 0$ for all of the samples
→ such an $a$ is called a separating vector or solution vector
Linear Classifier
■ Gradient Descent Procedures
$a^* = \arg\min_a J(a)$ : solution to the set of linear inequalities $a^t y_i > 0$
Gradient descent procedure: $a(k+1) = a(k) - \eta(k)\nabla J(a(k))$
$\eta$ : learning rate
Algorithm 1. (Basic Gradient Descent)
begin initialize $a$, threshold $\theta$, $\eta(\cdot)$, $k \leftarrow 0$
    do $k \leftarrow k + 1$
        $a \leftarrow a - \eta(k)\nabla J(a)$
    until $|\eta(k)\nabla J(a)| < \theta$
    return $a$
end
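Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions: the quadratic criterion J(a) = 0.5·||a − c||², the constant learning rate 0.1, and the `max_iter` safeguard are hypothetical choices, not part of the slides.

```python
import numpy as np

def gradient_descent(grad_J, a0, eta, theta=1e-8, max_iter=10_000):
    """Basic gradient descent: a <- a - eta(k) * grad_J(a).

    Stops once the step eta(k) * grad_J(a) is shorter than theta.
    max_iter is an extra safeguard that Algorithm 1 does not have.
    """
    a = np.asarray(a0, dtype=float)
    for k in range(1, max_iter + 1):
        step = eta(k) * grad_J(a)
        a = a - step
        if np.linalg.norm(step) < theta:
            break
    return a, k

# Hypothetical criterion J(a) = 0.5 * ||a - c||^2, whose gradient is a - c.
c = np.array([1.0, -2.0])
a_star, n_iter = gradient_descent(lambda a: a - c, a0=[0.0, 0.0], eta=lambda k: 0.1)
print(a_star, n_iter)
```

With this criterion every step shrinks the distance to `c` by a factor of 0.9, so the loop stops well before `max_iter`.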
Linear Classifier
■ Minimizing the Perceptron Criterion Function
■ Choosing $J(a)$: $a^* = \arg\min_a J(a)$ should be a solution to the set of linear inequalities $a^t y_i > 0$
■ The Perceptron Criterion Function
$$J_p(a) = J_p(a; y_1, \ldots, y_n) = \sum_{y \in Y(a)} (-a^t y)$$
$Y(a)$ : the set of samples misclassified by $a$
Geometrical interpretation of $J_p(a)$: proportional to the sum of the distances from the misclassified samples to the decision boundary
$$\nabla J_p = \left[\frac{\partial J_p}{\partial a_j}\right] = \sum_{y \in Y} (-y)$$
update rule: $$a(k+1) = a(k) - \eta(k)\sum_{y \in Y_k} (-y) = a(k) + \eta(k)\sum_{y \in Y_k} y$$
$Y_k$ : the set of samples misclassified by $a(k)$
Linear Classifier
■ Minimizing the Perceptron Criterion Function
Algorithm 3. (Batch Perceptron): uses all misclassified samples for each parameter update
begin initialize $a$, $\eta(\cdot)$, criterion $\theta$, $k \leftarrow 0$
    do $k \leftarrow k + 1$
        $a \leftarrow a + \eta(k)\sum_{y \in Y_k} y$
    until $|\eta(k)\sum_{y \in Y_k} y| < \theta$
    return $a$
end
Algorithm 4. (Fixed-Increment Single-Sample Perceptron): uses one sample for each parameter update
begin initialize $a$, $k \leftarrow 0$
    do $k \leftarrow (k + 1) \bmod n$
        if $y_k$ is misclassified by $a$ then $a \leftarrow a + y_k$
    until all patterns properly classified
    return $a$
end
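Both perceptron algorithms can be sketched on the six samples S1 to S6 from the earlier table. The class labels are an assumption (S1 to S3 as class 1, S4 to S6 as class 2), since the extracted table does not record them. The helper names, the constant learning rate, and the iteration caps are likewise hypothetical. A sample with a^t y ≤ 0 is treated as misclassified.

```python
import numpy as np

def normalize(X, labels):
    """Augment each sample with a leading 1 and negate the class-2 samples."""
    Y = np.hstack([np.ones((len(X), 1)), np.asarray(X, dtype=float)])
    Y[np.asarray(labels) == 2] *= -1.0
    return Y

def batch_perceptron(Y, eta=1.0, theta=1e-12, max_iter=100_000):
    """Algorithm 3: add eta times the sum of all misclassified samples."""
    a = np.zeros(Y.shape[1])
    for _ in range(max_iter):
        update = eta * Y[Y @ a <= 0].sum(axis=0)  # sum over Y_k
        if np.linalg.norm(update) < theta:        # nothing left to correct
            break
        a = a + update
    return a

def single_sample_perceptron(Y, max_epochs=100_000):
    """Algorithm 4: cycle through the samples, adding each misclassified one."""
    a = np.zeros(Y.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for y in Y:
            if a @ y <= 0:
                a = a + y
                errors += 1
        if errors == 0:  # all patterns properly classified
            break
    return a

# Samples S1..S6 from the table; the labels below are an ASSUMPTION.
X = [[1.2, 2.3], [1.0, 2.2], [1.5, 2.9], [4.0, 3.0], [5.0, 4.2], [6.0, 5.0]]
labels = [1, 1, 1, 2, 2, 2]
Y = normalize(X, labels)

a_batch = batch_perceptron(Y)
a_single = single_sample_perceptron(Y)
print(a_batch, Y @ a_batch)
print(a_single, Y @ a_single)
```

Under the assumed labels the samples are linearly separable, so both loops end with a^t y > 0 for every normalized sample. The two solution vectors generally differ, because any vector in the solution region is acceptable.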
Multilayer Neural Networks
• Classifiers: input units connected by modifiable weights to output units
• With a clever choice of nonlinear functions → decision region leading to minimum error → the difficulty lies in choosing the appropriate nonlinear functions
• ex) choose a complete basis set such as all polynomials → too many free parameters to be determined from a limited number of training patterns
• ex) use prior knowledge relevant to the classification problem → guide our choice of nonlinearity
Goal: a way to learn the nonlinearity at the same time as the linear discriminant → multilayer neural networks, or multilayer perceptrons
■ Multilayer Neural Networks: Feedforward Operation and Classification
$$net_j = \sum_{i=1}^{d} x_i \omega_{ji} + \omega_{j0} = \sum_{i=0}^{d} x_i \omega_{ji} \equiv \mathbf{w}_j^t \mathbf{x} \quad (x_0 = 1)$$
$i$ : indexes units in the input layer
$j$ : indexes units in the hidden layer
$\omega_{ji}$ : input-to-hidden layer weights at the hidden unit $j$
$$y_j = f(net_j)$$
$$f(net) \equiv \mathrm{Sgn}(net) \equiv \begin{cases} 1 & \text{if } net \ge 0 \\ -1 & \text{if } net < 0 \end{cases}$$
activation function $f(\cdot)$ → "nonlinearity" of a unit: serves as a $\varphi$ function
$$net_k = \sum_{j=1}^{n_H} y_j \omega_{kj} + \omega_{k0} = \sum_{j=0}^{n_H} y_j \omega_{kj} = \mathbf{w}_k^t \mathbf{y} \quad (y_0 = 1)$$
$$z_k = f(net_k)$$
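The feedforward equations can be sketched as two matrix-vector products with the Sgn activation. The 2-2-1 network size and the hand-picked weight values are assumptions for illustration only. They happen to reproduce the XOR-style truth table used in the toy problem later in these slides.

```python
import numpy as np

def sgn(net):
    """Sign activation from the slide: +1 if net >= 0, otherwise -1."""
    return np.where(net >= 0, 1.0, -1.0)

def forward(x, W_hidden, W_output, f=sgn):
    """Three-layer feedforward pass; column 0 of each matrix holds the bias weight."""
    x_aug = np.concatenate(([1.0], x))   # x_0 = 1 absorbs the bias w_j0
    y = f(W_hidden @ x_aug)              # y_j = f(net_j)
    y_aug = np.concatenate(([1.0], y))   # y_0 = 1 absorbs the bias w_k0
    z = f(W_output @ y_aug)              # z_k = f(net_k)
    return y, z

# Hand-picked (hypothetical) weights for a 2-2-1 network.
W_hidden = np.array([[0.5, 1.0, 1.0],
                     [-1.5, 1.0, 1.0]])
W_output = np.array([[-1.0, 0.7, -0.4]])

inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
outputs = [float(forward(np.array(x, dtype=float), W_hidden, W_output)[1][0])
           for x in inputs]
print(outputs)  # [-1.0, 1.0, 1.0, -1.0]
```

Folding the bias into column 0 mirrors the second form of each sum, where the index starts at 0.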
 Multilayer Neural Networks: Expressive Power
• Sufficient number of hidden units → any function can be represented
• Can every decision be implemented by such a three-layer network?
• "Yes"
• any continuous function from input to output can be implemented in a three-layer net, given a sufficient number of hidden units, proper nonlinearities, and weights
• any posterior probabilities can be represented by a three-layer net
$$g_k(\mathbf{x}) \equiv z_k = f\left(\sum_{j=1}^{n_H} \omega_{kj}\, f\left(\sum_{i=1}^{d} \omega_{ji} x_i + \omega_{j0}\right) + \omega_{k0}\right)$$
Kolmogorov: any continuous function $g(\mathbf{x})$ defined on the unit hypercube $I^n$ ($I = [0,1]$ and $n \ge 2$) can be expressed in the form
$$g(\mathbf{x}) = \sum_{j=1}^{2n+1} \Xi_j\left(\sum_{i=1}^{d} \Psi_{ij}(x_i)\right)$$
Fourier's theorem: any continuous function can be approximated arbitrarily closely by a possibly infinite sum of harmonic functions
■ Multilayer Neural Networks: Backpropagation Algorithm
• The simplest and most general method for supervised training of multilayer neural networks
• Natural extension of the LMS algorithm for linear systems
training error: $$J(\mathbf{w}) \equiv \frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2 = \frac{1}{2}\lVert \mathbf{t} - \mathbf{z} \rVert^2$$
$t_k$ : desired output
$z_k$ : actual output
1) weights are initialized with random values
2) $\Delta \mathbf{w} = -\eta \frac{\partial J}{\partial \mathbf{w}}$ or, in component form, $\Delta w_{pq} = -\eta \frac{\partial J}{\partial w_{pq}}$
3) $\mathbf{w}(m+1) = \mathbf{w}(m) + \Delta \mathbf{w}(m)$
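hidden-to-output ($w_{kj}$ is the hidden-to-output weight written $\omega_{kj}$ in the feedforward equations)
$$\frac{\partial J}{\partial w_{kj}} = \frac{\partial J}{\partial net_k}\frac{\partial net_k}{\partial w_{kj}} = -\delta_k \frac{\partial net_k}{\partial w_{kj}}$$
$\delta_k \equiv -\partial J / \partial net_k$ : sensitivity of unit $k$
$$\delta_k = -\frac{\partial J}{\partial net_k} = -\frac{\partial J}{\partial z_k}\frac{\partial z_k}{\partial net_k} = (t_k - z_k) f'(net_k)$$
$$\frac{\partial net_k}{\partial w_{kj}} = y_j$$
$$\Delta w_{kj} = \eta \delta_k y_j = \eta (t_k - z_k) f'(net_k) y_j$$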
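The hidden-to-output rule above, together with the input-to-hidden rule Δw_ji = η x_i δ_j derived next, can be checked numerically. The sketch below is an illustration under assumptions: it uses tanh as a differentiable stand-in for f (Sgn has no useful derivative), omits the bias weights for brevity, and uses random weights and hypothetical layer sizes. With η = 1 each update should equal the negative gradient of J.

```python
import numpy as np

def f(net):
    return np.tanh(net)

def f_prime(net):
    return 1.0 - np.tanh(net) ** 2

def loss(x, t, W_ji, W_kj):
    """J = 0.5 * ||t - z||^2 for one pattern (bias weights omitted)."""
    z = f(W_kj @ f(W_ji @ x))
    return 0.5 * np.sum((t - z) ** 2)

def backprop_updates(x, t, W_ji, W_kj, eta):
    """Return (delta_W_ji, delta_W_kj) for one pattern."""
    net_j = W_ji @ x
    y = f(net_j)
    net_k = W_kj @ y
    z = f(net_k)
    delta_k = (t - z) * f_prime(net_k)             # output sensitivities
    delta_j = f_prime(net_j) * (W_kj.T @ delta_k)  # hidden sensitivities
    return eta * np.outer(delta_j, x), eta * np.outer(delta_k, y)

def numerical_gradient(J, W, eps=1e-6):
    """Central-difference estimate of dJ/dW, one entry at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (J(W_plus) - J(W_minus)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x, t = rng.normal(size=3), np.array([1.0, -1.0])
W_ji, W_kj = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

dW_ji, dW_kj = backprop_updates(x, t, W_ji, W_kj, eta=1.0)
num_ji = numerical_gradient(lambda W: loss(x, t, W, W_kj), W_ji)
num_kj = numerical_gradient(lambda W: loss(x, t, W_ji, W), W_kj)
print(np.allclose(dW_ji, -num_ji, atol=1e-6), np.allclose(dW_kj, -num_kj, atol=1e-6))
```

A finite-difference comparison like this is a common way to catch sign or index mistakes in a hand-derived gradient.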
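input-to-hidden
$$\frac{\partial J}{\partial w_{ji}} = \frac{\partial J}{\partial y_j}\frac{\partial y_j}{\partial net_j}\frac{\partial net_j}{\partial w_{ji}}$$
$$\frac{\partial J}{\partial y_j} = \frac{\partial}{\partial y_j}\left[\frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2\right] = -\sum_{k=1}^{c}(t_k - z_k)\frac{\partial z_k}{\partial y_j}$$
$$= -\sum_{k=1}^{c}(t_k - z_k)\frac{\partial z_k}{\partial net_k}\frac{\partial net_k}{\partial y_j} = -\sum_{k=1}^{c}(t_k - z_k) f'(net_k) w_{kj}$$
$$\delta_j \equiv f'(net_j)\sum_{k=1}^{c} w_{kj}\delta_k$$
$$\Delta w_{ji} = \eta x_i \delta_j = \eta\left[\sum_{k=1}^{c} w_{kj}\delta_k\right] f'(net_j) x_i$$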
 Multilayer Neural Networks: Training Protocols
Three most useful training protocols: stochastic, batch, and on-line
• Stochastic training: patterns are chosen randomly from the training data set, and the network weights are updated after each pattern presentation
• Batch training: all patterns are presented to the network before learning takes place
• On-line training: each pattern is presented once and only once, with no use of memory for storing patterns
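The difference between the three protocols comes down to which patterns contribute to each weight update. The sketch below only builds those index schedules, where each inner list is one update. The function names and the fixed seed are hypothetical, and the weight update itself is left out.

```python
import random

def stochastic_schedule(n_patterns, n_updates, seed=0):
    """Stochastic: one randomly chosen pattern per weight update."""
    rng = random.Random(seed)
    return [[rng.randrange(n_patterns)] for _ in range(n_updates)]

def batch_schedule(n_patterns, n_epochs):
    """Batch: every pattern is presented before each weight update."""
    return [list(range(n_patterns)) for _ in range(n_epochs)]

def online_schedule(n_patterns):
    """On-line: each pattern drives exactly one update, in arrival order."""
    return [[i] for i in range(n_patterns)]

print(stochastic_schedule(4, 5))
print(batch_schedule(4, 2))   # [[0, 1, 2, 3], [0, 1, 2, 3]]
print(online_schedule(3))     # [[0], [1], [2]]
```

In the stochastic schedule a pattern may repeat or be skipped. In the on-line schedule no pattern is ever revisited, which is why no pattern memory is needed.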
 Multilayer Neural Networks: Learning Curve
Multilayer Neural Networks: Toy Problem (1)

x1   x2   Output
-1   -1   -1
-1    1    1
 1   -1    1
 1    1   -1

[Figure: network diagram; parameters listed in three groups]
z: network output
t: label/ground truth

Getting parameters
(1) parameter initialization (random)
(2) parameter update
    (1) forward
    (2) backward
Values shown on the slide: 0.249 0.1 0.248 1
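The two steps above (random initialization, then repeated forward and backward passes) can be sketched on the four patterns of the truth table. Everything beyond the table is an assumption: the 2-2-1 layout, the tanh activation, batch updates, the learning rate, the epoch count, and the seed are hypothetical choices, not the values used on the slides.

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)  # inputs
T = np.array([[-1], [1], [1], [-1]], dtype=float)                # labels t

def forward(X, W1, W2):
    """Forward pass for all patterns; a column of ones supplies the bias inputs."""
    X_aug = np.hstack([np.ones((len(X), 1)), X])
    Y = np.tanh(X_aug @ W1.T)
    Y_aug = np.hstack([np.ones((len(Y), 1)), Y])
    Z = np.tanh(Y_aug @ W2.T)
    return X_aug, Y, Y_aug, Z

def train(X, T, n_hidden=2, eta=0.05, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    # (1) random initialization
    W1 = rng.uniform(-1, 1, size=(n_hidden, X.shape[1] + 1))
    W2 = rng.uniform(-1, 1, size=(T.shape[1], n_hidden + 1))
    losses = []
    # (2) parameter update
    for _ in range(epochs):
        X_aug, Y, Y_aug, Z = forward(X, W1, W2)           # forward
        losses.append(0.5 * np.sum((T - Z) ** 2))         # J summed over patterns
        delta_k = (T - Z) * (1 - Z ** 2)                  # backward: output sensitivities
        delta_j = (delta_k @ W2[:, 1:]) * (1 - Y ** 2)    # hidden sensitivities
        W2 += eta * delta_k.T @ Y_aug
        W1 += eta * delta_j.T @ X_aug
    return W1, W2, losses

W1, W2, losses = train(X, T)
Z = forward(X, W1, W2)[3]
print(losses[0], losses[-1])
print(np.round(Z.ravel(), 3))
```

Depending on the seed, a network with only two hidden units may or may not fit all four patterns, so the printed outputs should be inspected and not assumed.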

Convolutional Neural Networks (CNN)
■ Convolution:
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau = \int_{-\infty}^{\infty} f(t - \tau)\, g(\tau)\, d\tau$$
Source: https://en.wikipedia.org/wiki/Convolution

■ Convolution
[Figure: image * mask = result]

# of learning parameters: 3×3×3 (weights) + 1 (bias) = 28

from: Fei-Fei Li, Justin Johnson, and Serena Yeung, http://cs231n.stanford.edu
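A discrete 2-D version of the convolution integral can be sketched with explicit loops. The tiny image and mask are hypothetical, and reading 3×3×3 as kernel height × kernel width × input channels is an assumption. CNN layers usually apply the mask without flipping it (cross-correlation), whereas this sketch flips it to match the formula.

```python
import numpy as np

def conv2d_valid(image, mask):
    """Discrete 2-D convolution over the 'valid' region: flip the mask, slide, sum."""
    flipped = mask[::-1, ::-1]
    kh, kw = mask.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    result = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            result[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return result

def n_learnable_parameters(kernel_h, kernel_w, in_channels):
    """Weights of one filter plus one bias."""
    return kernel_h * kernel_w * in_channels + 1

image = np.arange(9, dtype=float).reshape(3, 3)   # hypothetical 3x3 image
mask = np.array([[1.0, 0.0], [0.0, -1.0]])        # hypothetical 2x2 mask
result = conv2d_valid(image, mask)
print(result)                           # every entry is 4.0
print(n_learnable_parameters(3, 3, 3))  # 28
```

Each output entry here equals image[i+1, j+1] − image[i, j], which is 4 everywhere for this image.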
CNN Structures
