
UNIT II

Associative Memory and Unsupervised Learning Networks

Syllabus
Training Algorithms for Pattern Association - Associative Memory Network - Autoassociative Memory Network - Heteroassociative Memory Network - Bidirectional Associative Memory (BAM) - Hopfield Networks - Iterative Autoassociative Memory Networks - Temporal Associative Memory Network - Kohonen Self-Organizing Feature Maps - Learning Vector Quantization - Counter Propagation Networks - Adaptive Resonance Theory Network - Fixed Weight Competitive Nets.

Contents
2.1 Training Algorithms for Pattern Association
2.2 Associative Memory Network
2.3 Kohonen Self-Organizing Feature Maps
2.4 Learning Vector Quantization
2.5 Counter Propagation Networks
2.6 Adaptive Resonance Theory Network
2.7 Two Marks Questions with Answers
2.1 Training Algorithms for Pattern Association

• An association is a link between the activation of two sets of units. Pattern association is the process of memorizing input-output patterns in a neural network so that, when an input pattern is presented, the associated output pattern can be recalled. One of the most common uses of neural networks is as associative memories, and learning in these networks consists of memorizing a set of associations.

• A pattern association network has a basic two-layer architecture : an input layer and an output layer. Each input unit u_i is connected to each output unit a_j, and the connection carries a weight w_ij. The activation of an output unit is the weighted sum of its inputs, a_j = SUM(w_ij u_i). Such pattern associators can be auto-associative (the stored output pattern is the input pattern itself) or hetero-associative (input and output patterns are different).

• Training adjusts the weights of the connections between the input and output units so that, when a certain stimulus pattern is presented on the input side, the units on the output side respond with the associated pattern. Once trained, the memory is distributed over the weights and is sensitive enough to respond even to inputs that are only near (similar to) a stored pattern.

• The Hebb rule and the delta rule are the two learning rules most commonly used to train pattern association networks.

2.1.1 Hebb Rule

• The Hebb rule is the simplest and most common method of determining the weights for an associative memory neural net. It can be used with patterns represented as either binary or bipolar vectors.

• Fig. 2.1.1 shows the architecture of a pattern association (hetero-associative) network trained with the Hebb rule : a two-layer network in which every input unit is connected to every output unit.

• The Hebb rule states that the weight of the connection between input unit i and output unit j changes proportionately to the product of their activations : if the two units are active together when a training pattern pair is presented, the connection between them is strengthened,

  w_ij(new) = w_ij(old) + x_i y_j

• Hebb's Law can be represented in the form of two rules :
  1. If two neurons on either side of a connection are activated synchronously, then the weight of that connection is increased.
  2. If two neurons on either side of a connection are activated asynchronously, then the weight of that connection is decreased.

• Fig. 2.1.2 illustrates Hebbian learning (Hebb's Law). The change of a synaptic weight w_ij depends only on the activities of the two neurons it connects, i.e. on the incoming (presynaptic) signal x_i and the outgoing (postsynaptic) signal y_j. In other words, learning is a local phenomenon, and repeated application of an input pattern drives the growth of the synaptic strengths.

• Using Hebb's Law, we can express the adjustment applied to the weight w_ij at iteration p in the following general form :

  ΔW_ij(p) = F[y_j(p), x_i(p)]

• As a special case, Hebb's Law can be written as the activity product rule

  ΔW_ij(p) = α y_j(p) x_i(p)

  where α is the learning rate parameter.

• Hebbian learning implies that the weights can only increase, so repeated presentation of the training patterns may drive the weights into saturation. To resolve this problem, we might impose a limit on the growth of the synaptic weights, which can be done by introducing a non-linear forgetting factor into Hebb's Law.
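The activity product rule above is easy to state in code. The following is a minimal sketch of my own (not the textbook's program), assuming bipolar input/output vectors stored in NumPy arrays; the optional forgetting factor phi corresponds to the non-linear decay term mentioned above.

```python
import numpy as np

def hebb_train(inputs, targets, alpha=1.0, phi=0.0):
    """Train a two-layer pattern associator with the Hebb rule.

    inputs  : array of shape (P, n) - bipolar input patterns x
    targets : array of shape (P, m) - bipolar output patterns y
    alpha   : learning rate
    phi     : forgetting factor (0 gives the plain Hebb rule)
    """
    n, m = inputs.shape[1], targets.shape[1]
    W = np.zeros((n, m))
    for x, y in zip(inputs, targets):
        # Activity product rule: dW_ij = alpha * x_i * y_j,
        # with an optional decay term -phi * y_j * W_ij.
        W += alpha * np.outer(x, y) - phi * W * y
    return W

def recall(W, x):
    """Recall the associated output pattern for input x (bipolar)."""
    return np.where(x @ W >= 0, 1, -1)

# Example: associate two bipolar pattern pairs.
X = np.array([[1, -1, 1, -1], [1, 1, -1, -1]])
Y = np.array([[1, -1], [-1, 1]])
W = hebb_train(X, Y)
print(recall(W, X[0]))   # expected: [ 1 -1]
```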
• With the forgetting factor Ø the weight update becomes Δw_ij(p) = α y_j(p) x_i(p) − Ø y_j(p) w_ij(p). The forgetting factor specifies the rate of weight decay, and its value usually lies in the interval [0, 1]. If the forgetting factor is 0, the network is capable only of strengthening its synaptic weights and, as a result, they grow towards infinity. On the other hand, if the forgetting factor is close to 1, the network remembers very little of what it learns. Therefore, a rather small forgetting factor should be chosen, typically between 0.01 and 0.1, to allow only a little "forgetting" while limiting the weight growth.

• Hebbian learning is unsupervised : it requires no target outputs, and it reflects a fundamental principle of learning in biological neural networks.

• Fig. 2.1.3 shows the flow chart of the generalized Hebb training algorithm :
  1. Initialization : Set the initial synaptic weights and thresholds to small random values, and assign a small positive value to the learning rate parameter.
  2. Activation : For each training pair s : t, activate the input units, x_i = s_i, compute the neuron output y(p) = Σ_{i=1..n} x_i(p) w_i(p) − θ, and activate the output unit, y = t.
  3. Learning : Update the weights, w_i(new) = w_i(old) + x_i y, and the bias, b(new) = b(old) + y.
  4. Iteration : Repeat for the next training pair; stop when all training pairs have been presented (or when the weights reach steady-state values).

2.1.2 Delta Rule

• The delta rule was developed by Widrow and Hoff for the Adaline ("adaptive linear element"), which is why it is also named the Widrow-Hoff learning rule; because it minimizes the mean of the squared errors it is also known as the least mean square (LMS) learning rule.

• The Adaline has been applied to pattern classification, pattern association and categorization, and to other areas such as adaptive signal processing.

• Fig. 2.1.4 shows a physical implementation of the Adaline. The inputs x_i, i = 0, 1, 2, ..., n, take the values +1 or -1 and are set by switches; the gains (weights) w_i, including the bias weight w_0, are implemented as controllable conductances (resistors), so the currents caused by the input voltages are combined by the summer to give the linear output Σ w_i x_i + w_0. A reference (desired) level is compared with the output to produce the error signal used for learning.

• The delta rule differs from the perceptron learning rule : the perceptron rule uses the output of the threshold (quantizer) function for learning, whereas the delta rule uses the net (linear) output of the summer, without further mapping into the output values -1 or +1.

• Training proceeds iteratively. At iteration p the weight correction Δw_i(p) is computed from the error between the desired and the actual output, the weights are updated as w_i(p + 1) = w_i(p) + Δw_i(p), the iteration counter p is increased by one, and the procedure is repeated until the weights reach their steady-state values.
• Usually the central block, the summer, is followed by a quantizer which outputs either +1 or -1, depending on the polarity of the sum.

• The problem is to determine the coefficients w_i, i = 0, 1, ..., n, in such a way that the input-output response is correct for a large number of arbitrarily chosen signal sets. If an exact mapping is not possible, the average error must be minimized, for instance in the sense of least squares. An adaptive operation means that there exists a mechanism by which the w_i can be adjusted, usually iteratively, to attain the correct values.

• For the Adaline, Widrow introduced the delta rule to adjust the weights. For the p-th input-output pattern, the error measure of a single-output Adaline can be expressed as

  E_p = (t_p − o_p)^2

  where t_p is the target output and o_p is the actual output of the Adaline.

• The derivative of E_p with respect to each weight w_i is

  ∂E_p / ∂w_i = −2 (t_p − o_p) x_i

• To decrease E_p by gradient descent, the update formula for w_i on the p-th input-output pattern is

  Δ_p w_i = η (t_p − o_p) x_i

• Because the delta rule tries to minimize squared errors, it is also referred to as the least mean square learning procedure or the Widrow-Hoff learning rule.

• Features of the delta rule are as follows :
  1. Simplicity.
  2. Distributed learning : Learning is not reliant on central control of the network.
  3. Online learning : Weights are updated after the presentation of each pattern.
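As a concrete illustration of the update Δ_p w_i = η (t_p − o_p) x_i, here is a minimal LMS training loop (my own sketch, not taken from the text), assuming bipolar inputs with a bias input x_0 = 1 and a small fixed learning rate.

```python
import numpy as np

def adaline_train(X, t, eta=0.05, epochs=50):
    """Train a single-output Adaline with the delta (LMS) rule.

    X : array (P, n) of input patterns (a bias input of +1 is added here)
    t : array (P,) of target outputs (+1 / -1)
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # x_0 = 1 bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, t):
            o = w @ x                       # linear (net) output, no quantizer
            w += eta * (target - o) * x     # delta rule update
    return w

def adaline_output(w, x):
    """Quantized output of the trained Adaline."""
    xb = np.concatenate([[1.0], x])
    return 1 if w @ xb >= 0 else -1

# Example: learn the AND function with bipolar inputs.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
t = np.array([-1, -1, -1, 1])
w = adaline_train(X, t)
print([adaline_output(w, x) for x in X])    # expected: [-1, -1, -1, 1]
```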
2.2 Associative Memory Network

• One of the primary functions of the brain is associative memory. Learning can be considered as a process of forming associations between related patterns. The associative memory is composed of a cluster of units which represent a simple model of a real biological neuron.

• An associative memory, also known as Content-Addressable Memory (CAM), can be searched for a value in a single memory cycle rather than by using a software loop. Associative memories can be implemented using networks with or without feedback.

• Associative neural networks are used to associate one set of vectors with another set of vectors, say input and output patterns. The aim of an associative memory is to produce the associated output pattern whenever one of the input patterns is applied to the neural network. The input pattern may be applied to the network either as an input or as an initial state, and the output pattern is observed at the outputs of some neurons constituting the network.

• Associative memories belong to a class of neural networks that learn according to a certain recording algorithm. They require a priori information, and their connectivity matrices most often need to be formed in advance. Writing into memory produces changes in the neural interconnections. Reading of the stored information from memory, named recall, is a transformation of the input signals by the network.

• All memory information is spatially distributed throughout the network. Associative memory enables a parallel search within the stored data. The purpose of the search is to output one or all stored items that match the search argument and to retrieve them entirely or partially.

• Fig. 2.2.1 shows a block diagram of an associative memory : an input pattern x is applied to the associative memory block, which produces the associated output pattern v.

• In the initialization phase of the associative memory no information is stored; because the information is represented in the weights w, they are all set to zero.

• The advantage of neural associative memories over other pattern storage algorithms, such as lookup tables or hash codes, is that the memory access can be fault tolerant with respect to variations of the input pattern.

• In associative memories many associations can be stored at the same time. There are different schemes of superposition of the memory traces formed by the different associations. The superposition can be a simple linear addition of the synaptic changes required for each association (as in the Hopfield model) or nonlinear.

• The performance of neural associative memories is usually measured by a quantity called information capacity, that is, the information content that can be learned and retrieved, divided by the number of synapses required.

• Artificial neural networks can be used as associative memories. One of the simplest artificial neural associative memories is the linear associator. The Hopfield model and the Bidirectional Associative Memory (BAM) model are some of the other popular artificial neural network models used as associative memories.

• There are two classes of associative memory : auto-associative and hetero-associative. Whether the net is auto-associative or hetero-associative, it can associate not only the exact pattern pairs used in training, but it is also able to recall an association if the input is merely similar to one on which it has been trained.

• An associative memory network can be static or dynamic :
  1. Static : feed-forward networks which recall an output response after an input has been applied in one feed-forward pass, theoretically without delay; they were termed instantaneous.
  2. Dynamic : memory networks which produce recall as a result of output/input feedback interaction, which requires time.

2.2.1 Auto-associative Memory

• Auto-associative networks are a special subset of the hetero-associative networks, in which each vector is associated with itself, i.e. y^i = x^i for i = 1, ..., m. The function of such networks is to correct noisy input vectors.

• Fig. 2.2.2 shows an auto-associative memory network.

• Auto-associative memories are content-based memories which can recall a stored pattern when they are presented with a fragment or a noisy version of it. They are very effective in de-noising the input, or removing interference from the input, which makes them a promising first step in solving the cocktail party problem.

• The simplest version of auto-associative memory is the linear associator, which is a two-layer, fully connected, feed-forward neural network in which the output is constructed in a single feed-forward computation.

• Fig. 2.2.3 shows the auto-associative memory recalling a stored pattern after the memory has been trained.

2.2.2 Hetero-associative Memory Network

• A content-addressable memory is a structure that maps specific input representations to specific output representations. It is a system that associates two patterns (X, Y) such that, when one is encountered, the other can be recalled.

• Hetero-associative networks map "m" input vectors x^1, x^2, ..., x^m in n-dimensional space to m output vectors y^1, y^2, ..., y^m in k-dimensional space, so that x^i → y^i. If an input x' is close to a stored vector x^i (x' = x^i + Δ), then the recalled output should be y^i. This should be achieved by the learning algorithm, but recall becomes very hard when the number m of vectors to be learned is too high.

• Fig. 2.2.4 and Fig. 2.2.5 show the structure of a hetero-associative network without feedback.
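A hetero-associative memory of this kind can be built with the Hebb (outer-product) rule from Section 2.1.1. The sketch below is illustrative only and assumes bipolar pattern vectors; storage simply sums the outer products of the pattern pairs, and recall multiplies a (possibly noisy) input by the weight matrix and thresholds the result.

```python
import numpy as np

def store(pairs):
    """Build the weight matrix of a hetero-associative memory.

    pairs : list of (x, y) bipolar vectors; W = sum_i outer(x_i, y_i)
    """
    n, k = len(pairs[0][0]), len(pairs[0][1])
    W = np.zeros((n, k))
    for x, y in pairs:
        W += np.outer(x, y)
    return W

def recall(W, x):
    """Recall the output associated with input x (bipolar thresholding)."""
    return np.where(x @ W >= 0, 1, -1)

# Store two associations and recall from a noisy version of the first input.
x1 = np.array([1, 1, -1, -1, 1, -1]); y1 = np.array([1, -1, 1])
x2 = np.array([-1, 1, 1, -1, -1, 1]); y2 = np.array([-1, 1, 1])
W = store([(x1, y1), (x2, y2)])

noisy = x1.copy(); noisy[0] = -1          # flip one bit of x1
print(recall(W, noisy))                   # expected: [ 1 -1  1]
```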
2.2.3 The Hopfield Network

• The Hopfield model is a single-layered recurrent network. Like the associative memory, it is usually initialized with appropriate weights instead of being trained.

• The Hopfield Neural Network (HNN) is a model of auto-associative memory. It is a single-layer neural network with feedback. Fig. 2.2.6 shows a Hopfield network of three units. The Hopfield network is created by supplying input data vectors, or pattern vectors, corresponding to the different classes. These patterns are called class patterns.

• The Hopfield model consists of a single layer of processing elements where each unit is connected to every other unit in the network other than itself.

• The output of each neuron is a binary number in {-1, 1}. The output vector is the state vector. Starting from an initial state (given as the input vector), the state of the network changes from one state to another like an automaton. If the state converges, the point to which it converges is called the attractor.

• In its simplest form, the output function is the sign function, which yields 1 for arguments ≥ 0 and -1 otherwise.

• The connection weight matrix W of this type of network is square and symmetric. The units in the Hopfield model act as both input and output units.

• A Hopfield network consists of "n" totally coupled units. Each unit is connected to all other units except itself. The network is symmetric because the weight w_ij for the connection between unit i and unit j is equal to the weight w_ji of the connection from unit j to unit i. The absence of a connection from each unit to itself avoids a permanent feedback of its own state value.

• Hopfield networks are typically used for classification problems with binary pattern vectors.

• The Hopfield model is classified into two categories :
  1. Discrete Hopfield Model
  2. Continuous Hopfield Model

• In both the discrete and the continuous Hopfield network the weights are trained in a one-shot fashion, not incrementally as was done in the case of the Perceptron and the MLP.

• In the discrete Hopfield model, the units use a slightly modified bipolar output function : the states of the units, i.e. the outputs of the units, remain the same if the current net input is equal to the threshold value.

• The continuous Hopfield model is just a generalization of the discrete case. Here, the units use a continuous output function such as the sigmoid or hyperbolic tangent function. In the continuous Hopfield model, each unit has an associated capacitor C_i and resistance r_i that model the capacitance and resistance of a real neuron's cell membrane, respectively.

2.2.4 Bidirectional Associative Memory (BAM)

• BAM consists of two layers, X and Y. Signals are sent back and forth between the two layers until an equilibrium is reached. Equilibrium is reached when the x and y vectors no longer change. Given an x vector the BAM is able to produce the corresponding y vector, and vice versa.

• BAM consists of bi-directional edges, so information can flow in either direction. Since the network has bidirectional edges, propagation moves in both directions : first from one layer to the other, and then back to the first layer. Propagation continues until the nodes are no longer changing values.

• Fig. 2.2.7 shows the BAM network, with a first layer X (units X_1, ..., X_n) and a second layer Y (units Y_1, ..., Y_m).

• Since the BAM also uses the traditional Hebb learning rule to build the connection weight matrix that stores the associated pattern pairs, it too has a severely low memory capacity.
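The back-and-forth recall described above can be sketched in a few lines. This is an illustrative example (not the author's code) assuming bipolar patterns and a weight matrix built with the Hebb outer-product rule; recall alternates between the X and Y layers until neither vector changes. Ties (net input of zero) are broken towards +1 in this simplified sketch.

```python
import numpy as np

def bam_weights(pairs):
    """W = sum_i outer(x_i, y_i); X->Y recall uses W, Y->X uses W transpose."""
    W = np.zeros((len(pairs[0][0]), len(pairs[0][1])))
    for x, y in pairs:
        W += np.outer(x, y)
    return W

def bam_recall(W, x):
    """Bidirectional recall: iterate x -> y -> x until equilibrium."""
    y = np.where(x @ W >= 0, 1, -1)
    while True:
        x_new = np.where(W @ y >= 0, 1, -1)      # Y layer drives the X layer
        y_new = np.where(x_new @ W >= 0, 1, -1)  # X layer drives the Y layer
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            return x_new, y_new                  # no unit changed: equilibrium
        x, y = x_new, y_new

# Two stored pattern pairs; recall the second pair from its x pattern.
x1 = np.array([1, 1, 1, -1, -1, -1]); y1 = np.array([1, -1, 1, -1])
x2 = np.array([-1, -1, 1, 1, -1, 1]); y2 = np.array([-1, 1, -1, 1])
W = bam_weights([(x1, y1), (x2, y2)])
print(bam_recall(W, x2)[1])               # expected: [-1  1 -1  1]
```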
• BAM can be classified into two categories :
  1. Discrete BAM : The network propagates an input pattern X to the Y layer, where the units in the Y layer compute their net input and apply a bipolar output function.
  2. Continuous BAM : The units use a continuous output function, such as the sigmoid or hyperbolic tangent function.

2.2.5 Difference between Auto-associative and Hetero-associative Memory

• Auto-associative memory : Recalls the same pattern as the one that was stored; the units act as both input and output, so no extra external output units are needed. Example : seeing a fragment of a familiar picture evokes a vivid mental image of the same picture.

• Hetero-associative memory (as in the BAM) : Retrieves a stored pattern that is different from the input pattern; the model uses extra external units, i.e. a separate output layer. Example : the smell or sound of a particular event might evoke a vivid visual memory of that past event, or an object might evoke the memory of a favorite color.

2.3 Kohonen Self-Organizing Feature Maps

• Feature mapping is a process which converts patterns of arbitrary dimensionality into the responses of a one- or two-dimensional array of neurons, i.e. it converts a higher-dimensional input space into a lower-dimensional feature space while preserving the topological relationships of the input data.

• The Kohonen self-organizing feature map provides such a topology-preserving mapping. It is a simple two-layer network : an input layer fully connected to a competitive output (Kohonen) layer. Self-organizing maps are used mainly for dimensionality reduction and data clustering; unlike fixed transforms such as Fourier or PCA, the mapping is learned from the training data.

• Fig. 2.3.1 shows a simple Kohonen self-organizing map network with 2 inputs and 49 outputs.

• Training is based on competition. When an input vector is presented, the output neurons compete, and the winning neuron (the one whose weight vector is most similar to the input) together with the neurons in its neighbourhood have their weights shifted towards the input. As learning proceeds, the size of the neighbourhood around the winning neuron gradually decreases : it begins fairly large and then narrows, NB_j(t = 0) ⊃ NB_j(t = 1) ⊃ NB_j(t = 2) ⊃ ..., as illustrated by the shrinking square regions around the winner in the lattice diagram of Fig. 2.3.1.

• Competitive learning algorithm (Kohonen self-organizing map) :
  1. Initialization : Assign small random values to the weight vectors w_j, j = 1, ..., m, where m is the number of neurons in the Kohonen layer, and choose the learning rate parameter η.
  2. Similarity matching : Apply an n-dimensional input vector x and determine the winning neuron i at the current iteration using the minimum-distance Euclidean criterion
     || x − w_i || = min_k || x − w_k ||,  k = 1, ..., m.
  3. Learning : Update the weights. For the winning neuron i (and the neurons in its neighbourhood) the weights are shifted towards the input,
     w_i := w_i + η (x − w_i);
     otherwise the weights remain unchanged.
  4. Continuation : Slowly decrease the learning rate and the neighbourhood size.
  5. Iteration : If the convergence criterion is not met, go back to step 2 with the next input vector; otherwise STOP.

• To illustrate competitive learning, consider a Kohonen network with 100 neurons arranged in the form of a two-dimensional lattice with 10 rows and 10 columns. The network is required to classify two-dimensional input vectors : each neuron in the network should respond only to input vectors occurring in its region. The network is trained with 1000 two-dimensional input vectors generated randomly in a square region in the interval between -1 and +1, with the learning rate parameter η equal to 0.1.
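The 10 × 10 lattice example above is easy to reproduce. The following is a small illustrative script (my own sketch, not the book's code) implementing the competitive learning steps of Section 2.3, with a Gaussian neighbourhood that shrinks over time; parameter values such as sigma0 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

rows, cols, dim = 10, 10, 2                          # 10 x 10 Kohonen lattice, 2-D inputs
W = rng.uniform(-1, 1, size=(rows, cols, dim))       # random initial weights
X = rng.uniform(-1, 1, size=(1000, dim))             # 1000 random input vectors

# Lattice coordinates of every neuron, used for the neighbourhood function.
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

eta0, sigma0, iterations = 0.1, 3.0, 10_000
for t in range(iterations):
    x = X[t % len(X)]
    # Step 2 : winning neuron = smallest Euclidean distance to x.
    d = np.linalg.norm(W - x, axis=-1)
    win = np.unravel_index(np.argmin(d), d.shape)
    # Shrinking neighbourhood and decaying learning rate.
    eta = eta0 * np.exp(-t / iterations)
    sigma = sigma0 * np.exp(-t / iterations)
    h = np.exp(-np.sum((grid - np.array(win)) ** 2, axis=-1) / (2 * sigma ** 2))
    # Step 3 : move weights of the winner and its neighbours towards x.
    W += eta * h[..., None] * (x - W)

print(W.reshape(-1, 2)[:5])   # trained weights spread over the square [-1, 1]^2
```

After training, plotting W(1, j) against W(2, j) reproduces the behaviour shown in the figures that follow: the initially random weights spread out to cover the input square.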
Fig. 2.3.2 : Initial random network weights, plotted as W(2, j) against W(1, j) in the interval [-1, +1].

Fig. 2.3.3 : Network weights after 10,000 iterations of training.
2.4 Learning Vector Quantization

• Learning Vector Quantization (LVQ) is an adaptive data classification method. It is based on training data with desired class information. Although LVQ is a supervised training method, it employs unsupervised data-clustering techniques to preprocess the data set and obtain the cluster centers.

• LVQ training involves two steps. In the first step, an unsupervised clustering method is used to locate several cluster centers, without using any class information. In the second step, the class information is used to fine tune the cluster centers so as to minimize the number of misclassified cases.

• Fig. 2.4.1 shows an LVQ network. Here the input dimension is 2 and the input space is divided into six clusters, represented by the weight vectors w_1, ..., w_6 of the cluster units. The first two clusters belong to class 1 and the other four clusters belong to class 2.

• The number of clusters can either be specified a priori or determined via a clustering technique, and new clusters can be added when necessary. Once the clusters are obtained, their classes must be labeled before moving to the second step. Such labeling is achieved by the voting method : a cluster is labeled as class k if it contains the largest number of training data belonging to class k.

• After the clusters are labeled, each training input vector x is compared with the closest cluster center w :
  1. If x and w belong to the same class, w is moved towards x :
     w(new) = w(old) + η (x − w(old)).
  2. Otherwise, w is moved away from x :
     w(new) = w(old) − η (x − w(old)).

• The learning rate η is reduced as training proceeds, and the procedure is repeated until the cluster centers stop changing or a maximum number of iterations is reached.
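A compact way to see the LVQ update is in code. The sketch below is illustrative (not from the text): it assumes the cluster centers have already been initialized and labeled, and then applies the move-towards / move-away rule for each labeled training vector.

```python
import numpy as np

def lvq_train(X, labels, centers, center_labels, eta=0.1, epochs=20, decay=0.95):
    """Fine tune labeled cluster centers with the LVQ rule.

    X             : (P, d) training vectors
    labels        : (P,)  class of each training vector
    centers       : (C, d) initial cluster centers (e.g. from k-means)
    center_labels : (C,)  class assigned to each center by voting
    """
    W = centers.astype(float).copy()
    for _ in range(epochs):
        for x, c in zip(X, labels):
            j = np.argmin(np.linalg.norm(W - x, axis=1))   # closest center
            if center_labels[j] == c:
                W[j] += eta * (x - W[j])     # same class: move towards x
            else:
                W[j] -= eta * (x - W[j])     # different class: move away
        eta *= decay                         # reduce the learning rate
    return W

def lvq_classify(x, W, center_labels):
    return center_labels[np.argmin(np.linalg.norm(W - x, axis=1))]
```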
2.5 Counter Propagation Networks

• The counter propagation network (CPN) was developed in 1986 by Robert Hecht-Nielsen. It is a hybrid network that combines an unsupervised Kohonen (clustering) layer with a Grossberg (outstar) output layer.

• Counter propagation networks are multilayer networks based on a combination of input, clustering and output layers. They can be used to compress data, to approximate functions or to associate patterns.

• The counter propagation network is basically constructed from an instar-outstar model. It is a three-layer neural network that performs input-output data mapping, producing an output vector y in response to an input vector x, on the basis of competitive learning.

• The three layers of the network are :
  1. Input layer.
  2. Hidden (Kohonen / cluster) layer : a winner-take-all competitive layer in which the input vectors are clustered.
  3. Output (Grossberg / outstar) layer : produces the desired output pattern for the winning cluster.

• Functions of the major components :
  1. Instar : the weights from the input layer to a cluster unit. The cluster units are trained with unsupervised, winner-take-all competitive learning; for the winning unit J the update is w_J(new) = w_J(old) + η (x − w_J(old)).
  2. Outstar : the weights from a cluster unit to the output layer. These weights are adjusted (for the winning cluster unit only) so that the output layer reproduces the desired output pattern, using a Widrow-Hoff / Grossberg type learning rule.

• Counter propagation nets are trained in two stages :
  1. First stage : the input vectors are clustered. The clusters may be formed using either the dot product metric or the Euclidean norm metric.
  2. Second stage : the weights from the cluster units to the output units are adapted to produce the desired response.

• There are two types of counter propagation network : full counter propagation and forward-only counter propagation.

1. Full counter propagation network (full CPN)
  • The full CPN provides an efficient way of representing a large number of vector pairs x : y by adaptively constructing a lookup-table-like approximation of the mapping. It produces an approximation x* : y* based on an input x alone (with y missing), an input y alone (with x missing), or an input pair x : y, possibly with some distorted or missing elements in either or both vectors.
  • In the first phase of training, the x-input and y-input vectors are used to form the clusters on the z-cluster (Kohonen) units, using either the dot product or the Euclidean distance. With the dot product the winner is the cluster unit with the largest net input (the vectors should then be normalized); with the Euclidean distance the winner is the unit closest to the input. During this phase only the weights of the winning cluster unit are updated.
  • In the second phase of training, only the weights from the winning cluster unit to the output units (the outstar weights) are updated, so that the output layers reproduce the associated x* and y* patterns.

2. Forward-only counter propagation network
  • Forward-only counter propagation is a simplified version of full counter propagation : only the x vectors are used to form the clusters on the Kohonen units during the first stage of training, and the network approximates the mapping from x to y in the forward direction only.
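The two training stages can be sketched as follows. This is an illustrative forward-only CPN example of my own (the structure and parameter values are assumptions, not the book's code): stage one clusters the x vectors with winner-take-all competitive learning, and stage two trains the outstar weights of the winning unit towards the target y.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_forward_only_cpn(X, Y, n_clusters=4, alpha=0.3, beta=0.1, epochs=30):
    """Forward-only counter propagation: Kohonen stage, then outstar stage."""
    d_in, d_out = X.shape[1], Y.shape[1]
    V = rng.normal(size=(n_clusters, d_in))    # instar (input -> cluster) weights
    W = np.zeros((n_clusters, d_out))          # outstar (cluster -> output) weights

    # Stage 1 : cluster the x vectors (winner-take-all, Euclidean metric).
    for _ in range(epochs):
        for x in X:
            j = np.argmin(np.linalg.norm(V - x, axis=1))
            V[j] += alpha * (x - V[j])

    # Stage 2 : adapt the winning unit's outstar weights towards the target y.
    for _ in range(epochs):
        for x, y in zip(X, Y):
            j = np.argmin(np.linalg.norm(V - x, axis=1))
            W[j] += beta * (y - W[j])
    return V, W

def cpn_predict(x, V, W):
    j = np.argmin(np.linalg.norm(V - x, axis=1))
    return W[j]
```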
2.6 Adaptive Resonance Theory Network

• Adaptive Resonance Theory (ART) networks were developed by Stephen Grossberg and Gail Carpenter (Boston University). ART is a family of neural network models for stable category learning and pattern recognition.

• Basic unsupervised architectures such as the Kohonen network and the counter propagation network have a drawback : they are generally composed of a fixed number of cluster neurons, which cannot grow with new data, and training with new inputs can destroy previously learned information. They therefore have difficulty with non-stationary data, where new, previously unseen input patterns keep arriving.

• The intuition behind ART is the stability-plasticity dilemma : a learning system must be plastic, i.e. capable of adapting to and learning new inputs, while remaining stable, i.e. capable of retaining the knowledge it has already learned.
  1. Plasticity : if a new input cannot be categorized into one of the existing clusters (depending on the vigilance parameter ρ), a new cluster is added for it.
  2. Stability : existing clusters are not deleted, so previously learned knowledge is retained.
  3. A possible problem is that the number of cluster neurons can grow with the number of distinct input patterns.

• The primary ART architectures are :
  1. ART 1 : an unsupervised model for binary (discrete) inputs.
  2. ART 2 : an unsupervised model for continuous inputs.
  3. ARTMAP (including Fuzzy ARTMAP, FARTMAP) : a supervised version combining two ART modules, typically used for classification.

• Types of ART, with remarks :
  - ART 1 : The simplest variety of ART networks, accepting only binary inputs.
  - ART 2 : Extends the network capabilities to support continuous inputs.
  - ART 3 : Builds on ART-2 by simulating rudimentary neurotransmitter regulation of synaptic activity, incorporating simulated concentrations of sodium (Na+) and calcium (Ca2+) ions into the system's equations; this results in a more physiologically realistic means of partially inhibiting categories that trigger mismatch resets.
  - Fuzzy ART : Implements fuzzy logic into ART's pattern recognition, thus enhancing the generalizability of the stored memories.

• Fig. 2.6.1 shows the basic ART network, which comprises the following components :
  1. Input (comparison) layer F1, which receives the input vectors.
  2. Output (recognition) layer F2, whose cluster units contain the long term memory of the system, i.e. the established category prototypes.
  3. Vigilance parameter ρ : a parameter that controls the generality of the stored memories. A larger ρ produces more detailed memories, while a smaller ρ produces more general memories.
  Bottom-up weights connect the F1 layer to the F2 layer, top-down weights connect the F2 layer back to the F1 layer, and a reset signal is generated when the match between the input and the best candidate prototype is not close enough.

• When an input is presented, the F2 candidate prototypes compete and the best matching prototype is compared with the input. "Resonance" is established, and learning takes place, only if the match is within the tolerance defined by the vigilance parameter; a mismatch triggers a reset and the next candidate is tried. If no established prototype matches the input closely enough, a new category (cluster) is created for it.
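The resonance/reset cycle can be illustrated with a highly simplified ART-1-style clustering routine. This is my own sketch, not Carpenter and Grossberg's algorithm: prototypes are stored as binary vectors, a candidate is accepted only if the overlap with the input passes the vigilance test |x AND p| / |x| >= rho, and otherwise a new cluster is created.

```python
import numpy as np

def art1_like_cluster(inputs, rho=0.7):
    """Simplified ART-1 style clustering of binary vectors.

    inputs : iterable of binary (0/1) NumPy vectors
    rho    : vigilance parameter in (0, 1]; larger -> more, finer clusters
    """
    prototypes = []                      # long term memory (F2 prototypes)
    assignments = []
    for x in inputs:
        chosen = None
        # Try candidates in order of decreasing overlap with the input.
        order = sorted(range(len(prototypes)),
                       key=lambda j: -np.sum(prototypes[j] & x))
        for j in order:
            match = np.sum(prototypes[j] & x) / max(np.sum(x), 1)
            if match >= rho:             # vigilance test passed: resonance
                prototypes[j] = prototypes[j] & x    # refine the prototype
                chosen = j
                break                    # otherwise: reset, try next candidate
        if chosen is None:               # no prototype is close enough
            prototypes.append(x.copy())  # create a new category
            chosen = len(prototypes) - 1
        assignments.append(chosen)
    return prototypes, assignments

X = [np.array(v) for v in ([1,1,0,0,1], [1,1,0,0,0], [0,0,1,1,0], [0,1,1,1,0])]
print(art1_like_cluster(X, rho=0.6)[1])   # e.g. [0, 0, 1, 1]
```

Raising rho towards 1 forces near-exact matches and so creates more, finer categories; lowering it merges inputs into fewer, more general categories, which is exactly the role of the vigilance parameter described above.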
2.7 Two Marks Questions with Answers

Q.1 What is ARTMAP ?
Ans. : ARTMAP, also known as predictive ART, combines two slightly modified ART-1 or ART-2 units into a supervised learning structure, where the first unit takes the input data and the second unit takes the correct output data. The vigilance parameter of the first unit is then adjusted by the minimum possible amount required to make the correct classification.

Q.2 What is Fuzzy ARTMAP ?
Ans. : Fuzzy ARTMAP is merely ARTMAP using fuzzy ART units, resulting in a corresponding increase in efficiency.

Q.3 Define learning vector quantization.
Ans. : Learning Vector Quantization (LVQ) is an adaptive data classification method based on training data with desired class information. It uses unsupervised data-clustering techniques to preprocess the data set and obtain the cluster centers, which are then fine tuned using the class labels.

Q.4 What is meant by associative memory ?
Ans. : An associative memory is a content-addressable structure that stores a set of patterns as memories. When the network is presented with a key pattern, it responds by producing whichever stored pattern most closely resembles the key, even when the key is corrupted or incomplete.

Q.5 What is the Hebb rule ?
Ans. : The Hebb rule is the simplest and most common method of determining the weights for an associative memory neural net : the weight on a connection is increased in proportion to the product of the activations of the two units it connects. It can be used with patterns represented as either binary or bipolar vectors.

Q.6 What is BAM ?
Ans. : Bidirectional Associative Memory (BAM), proposed by Bart Kosko, is a hetero-associative memory. Like the Hopfield network, it is a recurrent network; it associates patterns from one set, A, with patterns from another set, B, and vice versa, and it can recall the stored associations despite corrupted or incomplete inputs.

Q.7 List the problems of BAM.
Ans. : 1. Storage capacity : the number of associations to be stored in the BAM should not exceed the number of neurons in the smaller layer. 2. Incorrect convergence : the BAM may not always produce the closest association; the recalled pattern is not guaranteed to be the stored pattern nearest to the input.

Q.8 What is continuous BAM ?
Ans. : A continuous BAM transforms the input smoothly and continuously into output in the range [0, 1], using the logistic sigmoid function as the activation function for all units.

Q.9 What is content-addressable memory ?
Ans. : The associative memory is also called content-addressable memory. It refers to a memory organization in which the memory is accessed by its contents, as opposed to an explicit address as in the traditional computer memory system. It allows the recall of stored data based on the degree of similarity between the input pattern and the patterns stored in memory.

Q.10 Which two rules are used in Hebb's law ?
Ans. : 1. If two neurons on either side of a connection are activated synchronously, then the weight of that connection is increased. 2. If two neurons on either side of a connection are activated asynchronously, then the weight of that connection is decreased.

Q.11 When using the delta rule, is it necessary for the input vectors to be linearly independent ?
Ans. : No. Whether the input vectors are linearly independent or not, the delta rule optimizes the weights : I. If the input vectors are linearly independent, the rule can produce an exact solution with zero error. II. If they are not, it produces the least squares solution, i.e. the solution with the lowest sum of squared errors.

Q.12 Why is the delta rule also known as the least mean squares rule ?
Ans. : Because the delta rule adjusts the weights so as to minimize the squared errors between the actual and the target outputs, which is equivalent to minimizing the mean squared error over the training patterns.

Q.13 What do you mean by the Hopfield model ?
Ans. : The Hopfield model is a single-layered recurrent network used as an auto-associative memory. Like the associative memory, it is usually initialized with appropriate weights instead of being trained incrementally.

Q.14 Define counter propagation network.
Ans. : A counter propagation network is a hybrid network that combines an unsupervised Kohonen (clustering) layer with a Grossberg (outstar) output layer; the clustering stage needs no category information, while the output stage learns to produce the associated output pattern.

Q.15 What is the principal goal of the self-organizing map ?
Ans. : The principal goal of the self-organizing map (SOM) is to transform an incoming signal pattern of arbitrary dimension into a one- or two-dimensional discrete map, and to perform this transformation adaptively in a topologically ordered fashion.

Q.16 What is a self-organizing map ?
Ans. : The self-organizing map is a popular unsupervised neural network : it is trained without human intervention, and little needs to be known about the characteristics of the input data. SOMs are used for clustering data and for data compression.

Q.17 List the essential ingredients and parameters of the SOM algorithm.
Ans. : 1. A continuous input space of activation patterns that are generated in accordance with a certain probability distribution. 2. A topology of the network in the form of a lattice of neurons, which defines a discrete output space. 3. A time-varying neighbourhood function h_{j,i(x)}(n) defined around a winning neuron i(x). 4. A learning-rate parameter η(n) that starts at an initial value and then decreases gradually with time n, but never goes to zero.

Q.18 List the stages of the SOM algorithm.
Ans. : 1. Initialization - Choose random values for the initial weight vectors w_j. 2. Sampling - Draw a sample training input vector x from the input space. 3. Matching - Find the winning neuron I(x) whose weight vector is closest to the input vector. 4. Updating - Apply the weight update equation Δw_ji = η(t) T_{j,I(x)}(t) (x_i − w_ji). 5. Continuation - Keep returning to the sampling step until the feature map stops changing.

Q.19 How are counter-propagation nets trained ?
Ans. : Counter-propagation nets are trained in two stages : 1. First stage : the input vectors are clustered; the clusters may be formed using either the dot product metric or the Euclidean norm metric. 2. Second stage : the weights from the cluster units to the output units are adapted to produce the desired response.

Q.20 How does forward-only counter-propagation differ from full counter-propagation ?
Ans. : Forward-only counter-propagation is a simplified version of full counter-propagation : only the x vectors are used to form the clusters on the Kohonen units during the first stage of training, whereas full counter-propagation uses both the x and the y vectors.

Q.21 Is the counter-propagation network able to approximate mappings that are not invertible ?
Ans. : Full counter-propagation learns the mapping in both directions (from x to y and from y to x), so it works well only when the inverse of the function y = f(x) exists; forward-only counter-propagation approximates y = f(x) only and can therefore be used even when the mapping is not invertible.

Q.22 What is the drawback of counter-propagation networks ?
Ans. : The accuracy of the approximation depends on the number of Kohonen (cluster) neurons; a large number of cluster units may be required, since only the winning unit learns for each presentation of a training vector.

Q.23 Define plasticity.
Ans. : The ability of a net to respond to (learn) a new pattern equally well at any stage of learning is called plasticity.

Q.24 List the basic components of the ART1 network.
Ans. : 1. The short term memory layer F1 (comparison / input layer). 2. The recognition layer F2, which contains the long term memory of the system. 3. The vigilance parameter ρ, which controls the generality of the memory : a larger ρ means more detailed memories, while a smaller ρ produces more general memories.
UNIT III

Third-Generation Neural Networks

Syllabus
Spiking Neural Networks - Convolutional Neural Networks - Deep Learning Neural Networks - Extreme Learning Machine Model - Convolutional Neural Networks : The Convolution Operation - Motivation - Pooling - Variants of the Basic Convolution Function - Structured Outputs - Data Types - Efficient Convolution Algorithms - Neuroscientific Basis - Applications : Computer Vision, Image Generation, Image Compression.

Contents
3.1 Spiking Neural Networks
3.2 Convolutional Neural Networks
3.3 Extreme Learning Machine Model
3.4 The Convolution Operation
3.5 Motivation
3.6 Pooling
3.7 Variants of the Basic Convolution Function
3.8 Structured Outputs
3.9 Data Types
3.10 Efficient Convolution Algorithms
3.11 Neuroscientific Basis and Applications
3.12 Two Marks Questions with Answers
3.1 Spiking Neural Network

• Spiking Neural Networks (SNNs) were originally inspired by the brain and by the communication scheme that neurons use for information transformation via discrete action potentials (spikes) in time, through adaptive synapses.

• An SNN is an artificial neural network designed to simulate more closely how a biological brain operates : it models how neurons fire and how the strength of the connections between them changes in response to input.

• In a spiking neural network, the neuron's current state is defined by its membrane potential. Spikes are discrete events that take place at specific points in time, rather than continuous values that change at every step. The occurrence of a spike is determined by differential equations that represent various biological processes, the most critical of which is the membrane potential of the neuron.

• When a neuron's membrane potential reaches its firing threshold, the neuron fires (emits a spike) and the membrane potential is reset. After firing, the membrane potential decreases and the neuron returns to a stable state; for some interval of time, known as the refractory period, it cannot fire again.

• An SNN architecture consists of spiking neurons and interconnecting synapses that are modeled by adjustable scalar weights. The first step in implementing an SNN is to encode the analog input data into spike trains, using either a rate-based coding scheme or some form of temporal (spike-time or population) coding.

• A spiking neuron, just like a traditional artificial neuron, processes its input and produces an output; however, both the input and the output are spike trains rather than scalar values. Synapses can be excitatory, increasing the membrane potential of the post-synaptic neuron upon receiving input, or inhibitory, decreasing it.

• The strength of the synaptic connections (the weights) determines the behavior of the network, and learning takes place by updating these weights. In traditional (deep) neural networks this is done with the backpropagation algorithm : the error between the actual and the target output is propagated back through the network and the weights are updated so that the error is minimized. Because spike trains are non-differentiable, backpropagation of this form is difficult to apply to SNNs, which makes training them more challenging.

3.1.1 Training Challenges

1. Unsupervised learning via STDP : Common learning mechanisms in SNNs are forms of spike-timing-dependent plasticity (STDP). In STDP, the weight of the synapse between a pre-synaptic and a post-synaptic neuron is strengthened if the pre-synaptic neuron fires shortly before the post-synaptic neuron, and weakened if it fires soon after. STDP follows the Hebbian principle : it reflects correlations between the activities of connected neurons, and it is a physiologically well-understood mechanism. Detecting statistical correlations in the input data is a generic goal of unsupervised learning; just as in conventional unsupervised learning, the knowledge STDP extracts (for example cluster structure) can later be used, e.g. to classify data.

2. Supervised learning : In supervised learning, the data (the inputs) are accompanied by labels (the targets), and the goal of learning is to correlate the network's outputs with the target outputs. The error between the actual and the target output can be computed and used to update the weights. In SNNs, supervised learning can be achieved with generalizations of STDP in which the weight update is carried out only if it is accompanied by a feedback ("reward") signal; such mechanisms also provide a form of reinforcement learning, since the feedback need not specify the target output directly.

3. Deployment : training and running SNNs with conventional software can be relatively time-consuming and power-hungry, which is a problem for real-world applications such as battery-powered mobile devices. SNNs are therefore a natural fit for specialized neuromorphic hardware, where the line between hardware and software is blurred, but they can still be difficult to deploy on ordinary devices.
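The pair-based STDP rule described in point 1 can be written down compactly. The following is an illustrative sketch (not from the text, and with arbitrary constants): for each pre/post spike-time pair, the weight is potentiated when the pre-synaptic spike precedes the post-synaptic spike and depressed otherwise, with an exponential dependence on the time difference.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0,
                w_min=0.0, w_max=1.0):
    """Pair-based STDP update for one synapse.

    t_pre, t_post : spike times (ms) of the pre- and post-synaptic neurons
    dt > 0 (pre fires before post)  -> potentiation (weight increases)
    dt < 0 (pre fires after post)   -> depression  (weight decreases)
    """
    dt = t_post - t_pre
    if dt > 0:
        w += a_plus * np.exp(-dt / tau)
    else:
        w -= a_minus * np.exp(dt / tau)
    return float(np.clip(w, w_min, w_max))

w = 0.5
w = stdp_update(w, t_pre=10.0, t_post=15.0)   # pre before post: w goes up
w = stdp_update(w, t_pre=30.0, t_post=22.0)   # pre after post:  w goes down
print(w)
```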
3.1.2 Benefits of SNN

• SNNs transmit information only when spikes occur, so far less computation and energy are required than for traditional neural networks (TNNs); this makes them well suited to efficient hardware implementation.

• SNNs can process information in real time and are relatively robust to noise in the input.

• Because they are energy efficient and fast, SNNs are attractive for applications such as robots, control systems and other machines that must operate in real time on a limited power budget.

3.2 Convolutional Neural Networks

• A Convolutional Neural Network (CNN or ConvNet) is a type of deep, feed-forward neural network designed for processing structured arrays of data such as images. Its name comes from the convolution operation : a mathematical operation on two functions (in practice, two arrays of numbers whose overlapping elements are multiplied and summed) that expresses how one is modified by the other.

• A CNN automatically and adaptively learns spatial hierarchies of features from input images : the early layers extract simple features such as edges, textures and shapes, and the deeper layers combine them into higher-level features. Instead of preprocessing the data to derive such features, a CNN takes the image's raw pixel data as input and "learns" how to extract them.

• Each neuron in a convolutional layer processes data only for its receptive field, a small portion of the input image; the layers are arranged so that together they cover the entire image. The network assigns learnable weights to these local connections, which greatly reduces the number of parameters compared with a fully connected network.

• A CNN is made up of numerous layers, such as convolutional layers, pooling layers and fully connected layers, and it uses a back-propagation algorithm to learn the weights. The goal of a CNN is to reduce the image into a form that is easier to process and interpret, without losing the features that are critical for making an accurate prediction.

• As soon as we see an image, our brain starts categorizing it based on its color, its shape and the message it conveys. For a machine, an image is merely an array of pixels. Similar to the way humans recognize patterns, a convolutional neural network can be trained to find the unique patterns present in these pixel arrays (which are strongly correlated between neighbouring pixels) and so determine what an image contains.

3.2.1 Advantages and Disadvantages of CNN

Advantages :
1. A CNN automatically detects the important features of an image without any human supervision : after rigorous training on a large amount of labelled data, it finds the unique patterns that define each class.
2. Weight sharing in the convolutional layers reduces the number of parameters, which makes the network computationally efficient.
3. CNNs give higher accuracy than ordinary fully connected networks on image data.

Disadvantages :
1. Adversarial attacks : slightly (and sometimes imperceptibly) modified images can cause a CNN to misclassify its input.
2. A CNN requires tons of labelled training data, and training is computationally rigorous; it is also slower because of operations such as maxpool.
3.2.2 Applications of CNN

• Most of the big companies, such as Google, Amazon, Instagram and Facebook, use CNN-based models, for example for recognizing objects and people in the photos shared on their platforms.

• Typical application areas include :
  1. Image classification and pattern recognition, for example classification of satellite images and handwriting recognition.
  2. Object detection : self-driving cars and AI-powered surveillance systems use CNNs to identify and mark objects in real time.
  3. Voice synthesis : the voice synthesizer of Google Assistant uses DeepMind's WaveNet ConvNet model.
  4. Astrophysics : CNNs are used to make sense of radio telescope data and to predict the probable visual image that the data represent.
  5. Signal and image processing tasks in general, such as image segmentation.

3.2.3 Basic Structure of CNN

• Fig. 3.2.1 shows the basic structure of a CNN : an input image passes through a feature-extraction stage (convolution, activation and pooling layers) and then through a classification stage (fully connected layers) that produces the output class scores.

• The network is built from the following types of layers :
  1. Input layer : This layer accepts the raw pixel data of the image, for example a volume of size 12 × 12 × 4 (width × height × depth).
  2. Convolution layer : This layer computes the output volume by computing the dot product between the filters and patches of the image. If 10 filters are used, the output volume has a depth equal to 10, i.e. dimensions 12 × 12 × 10.
  3. Activation function layer : This layer applies an element-wise activation function to the output of the convolution layer. Some activation functions are Sigmoid, Tanh, ReLU, Leaky ReLU, etc. The dimensions of the volume remain unchanged, so the output is still 12 × 12 × 10.
  4. Pool layer : This layer is inserted periodically between convolution layers. It reduces the size of the volume, which makes the computation fast, reduces the memory required and also prevents overfitting.
  5. Fully connected (classification) layer : This layer computes the class scores from the extracted features, producing the final classification output.
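To make the layer-by-layer volume sizes above concrete, here is a tiny sketch of the same stack written with NumPy only (an illustration of my own, not the book's code): a 'same'-padded convolution with 10 filters, a ReLU, and a 2 × 2 max pool, applied to a 12 × 12 × 4 input.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_same(x, kernels):
    """'Same' convolution: x is (H, W, C_in), kernels is (k, k, C_in, C_out)."""
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, kernels.shape[-1]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]                      # image patch
            out[i, j, :] = np.tensordot(patch, kernels, axes=3)  # dot products
    return out

def max_pool(x, size=2):
    H, W, C = x.shape
    return x[:H - H % size, :W - W % size, :].reshape(
        H // size, size, W // size, size, C).max(axis=(1, 3))

image = rng.normal(size=(12, 12, 4))          # input layer: 12 x 12 x 4
filters = rng.normal(size=(3, 3, 4, 10))      # 10 filters of size 3 x 3 x 4
conv = conv2d_same(image, filters)            # -> 12 x 12 x 10
relu = np.maximum(conv, 0)                    # activation layer, same shape
pooled = max_pool(relu)                       # pool layer -> 6 x 6 x 10
print(conv.shape, relu.shape, pooled.shape)
```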
3.3 Extreme Learning Machine

• The Extreme Learning Machine (ELM) was proposed by Guang-Bin Huang and Qin-Yu Zhu. ELM is a learning algorithm for Single-hidden Layer Feedforward Neural Networks (SLFNs) which converges much faster than traditional gradient-based methods, because the network is trained in a single step rather than by iteration.

• The key idea of ELM is that the parameters of the hidden layer (the input weights w_j and the biases b_j) are randomly generated and then frozen : they are not tuned during training. Only the output weights β are learned.

3.3.1 ELM Architecture

• Fig. 3.3.1 shows the architecture of the basic ELM model : an input layer, a single hidden layer of L neurons with nonlinear activation functions, and an output layer. Training can be regarded as two parts :

  1. Random feature mapping : the input vector x is mapped into the hidden-layer feature space by a nonlinear mapping that is fixed after random initialization,
     h(x) = [h_1(x), ..., h_L(x)],  with  h_j(x) = g(w_j · x + b_j),  j = 1, ..., L,
     where g is the activation function and w_j, b_j are the random hidden-node parameters.

  2. Linear solution for the output weights : the output weight vector β = [β_1, ..., β_L]^T is obtained in one step by solving the linear system
     H β = T,
     where H is the hidden-layer output matrix (its rows are h(x) for each training sample) and T is the matrix of target outputs. The solution β = H† T uses the Moore-Penrose generalized (pseudo) inverse H† of H, i.e. it is the least-squares solution of the system.

• Because the hidden-layer parameters are generated randomly and never updated, training the ELM involves no iteration, which is the main reason it trains much faster than gradient-based methods. With a nonlinear, piecewise-continuous activation function in the hidden layer, the network retains a universal approximation capability.
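The two ELM steps translate almost directly into code. Below is a minimal regression example of my own (the sigmoid activation and the small ridge term are my choices, not prescriptions from the text): the hidden weights are drawn at random and fixed, and the output weights come from a single least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(42)

def elm_fit(X, T, n_hidden=50, reg=1e-6):
    """Train an Extreme Learning Machine: random hidden layer + linear solve."""
    d = X.shape[1]
    W = rng.normal(size=(d, n_hidden))          # random input weights (frozen)
    b = rng.normal(size=n_hidden)               # random biases (frozen)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # hidden layer output matrix H
    # beta solves H beta = T in the least-squares sense (ridge term for stability)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Fit y = sin(x) on [0, 2*pi] from 200 noisy samples.
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
T = np.sin(X) + 0.05 * rng.normal(size=X.shape)
W, b, beta = elm_fit(X, T)
print(float(np.abs(elm_predict(X, W, b, beta) - np.sin(X)).mean()))  # small error
```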
3.4 Convolution Operation

• Convolution is an operation on two functions of a real-valued argument. To motivate the definition, suppose we are tracking the location of a spaceship with a laser sensor. The laser sensor provides a single output x(t), the position of the spaceship at time t; both x and t are real valued, i.e. we can get a different reading from the sensor at any instant in time.

• Now suppose that the laser sensor is somewhat noisy. To obtain a less noisy estimate of the spaceship's position, we would like to average together several measurements. Of course, more recent measurements are more relevant, so we want a weighted average that gives more weight to recent measurements. We can do this with a weighting function w(a), where a is the age of a measurement. If we apply such a weighted average operation at every moment, we obtain a new function s providing a smoothed estimate of the position of the spaceship :

  s(t) = ∫ x(a) w(t − a) da = (x ∗ w)(t)

  This operation is called convolution.

• In convolutional network terminology, the first argument (here the function x) is referred to as the input and the second argument (the function w) as the kernel; the output is referred to as the feature map. For images, the input is the matrix of pixel values, the kernel is a small matrix of weights (the feature detector or filter), and convolving the two produces the feature map, as shown in Fig. 3.4.1.

• Kernels are used for extracting or preserving important features from the input, and different kernels detect different types of features. For example, some kernels detect edges : a horizontal, vertical or diagonal edge is a transition from a darker area to a brighter area (or vice versa) in a grayscale image, and an edge-detecting kernel responds strongly at those locations. Other kernels act as more abstract, high-level feature detectors.
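The discrete analogue of the smoothing example is a short loop. This sketch (mine, not the book's) computes a 1-D 'valid' convolution of a noisy signal with a small averaging kernel, which is exactly the weighted-average estimate s(t) described above.

```python
import numpy as np

def conv1d_valid(x, w):
    """Discrete convolution s[t] = sum_a x[a] * w[t - a], 'valid' mode."""
    k = len(w)
    w_flipped = w[::-1]                      # convolution flips the kernel
    return np.array([np.dot(x[t:t + k], w_flipped)
                     for t in range(len(x) - k + 1)])

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
position = np.sin(t) + 0.2 * rng.normal(size=t.size)   # noisy sensor readings x(t)
kernel = np.ones(5) / 5            # uniform weighting function w(a) over 5 readings
smoothed = conv1d_valid(position, kernel)              # s = x * w
print(smoothed[:5])
# The same result is given by NumPy's built-in:
# np.convolve(position, kernel, mode="valid")
```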
activation
Third-Generation
Networks
- A kernel (also referred to as a "filter" or a "feature detector") is a small matrix of numbers, for example a 3 x 3 matrix, whose entries can take arbitrary values. Kernels are used to extract certain kinds of patterns or transformations from the image data; different kernels act as different feature detectors.
- Fig. 3.4.2 shows an example of convolution : a 3 x 3 kernel is slid over a small binary input image. At each step, each element of the kernel is multiplied with the corresponding element of the image patch under it and the results are summed to produce a single entry of the output, which is called the convoluted feature or feature map.
- Sometimes the feature maps produced in this way are also referred to as activations, and the output of a filter as an activation map.
- Generally, a grayscale image can be interpreted as a matrix whose elements are numbers between 0 and 255. A colour image has multiple colour channels, so the input to a convolutional layer is a volume of size image width * image height * number of channels.
- In reality, kernels of several different sizes are used; differently sized kernels are shown in Fig. 3.4.3. The convolutional layers of a convolutional neural network apply multiple kernels and therefore develop several feature maps from the same input volume.
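The multiply-and-sum step of the figure can be reproduced in a few lines. This is a NumPy sketch; the 5 x 5 binary image and the 3 x 3 kernel values are illustrative only and are not taken from the text.

```python
# A minimal sketch of the element-wise multiply-and-sum step of Fig. 3.4.2,
# using NumPy; the 5x5 image and 3x3 kernel values are illustrative only.
import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

h, w = image.shape
k = kernel.shape[0]
feature_map = np.zeros((h - k + 1, w - k + 1), dtype=int)
for i in range(h - k + 1):
    for j in range(w - k + 1):
        patch = image[i:i + k, j:j + k]             # region under the kernel
        feature_map[i, j] = np.sum(patch * kernel)  # multiply and sum
print(feature_map)   # 3x3 convoluted feature (feature map)
```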
- The convolutional layer is the core building block of a CNN : it contains a set of learnable filters (feature detectors) that are convolved with the input volume to produce activation maps (feature maps). During training, the training images and their class labels (scores) are used, so that the network learns which feature detectors find the features that are important for categorizing the images accurately.
- Fig. 3.4.1 shows the components of a convolutional layer : a) the input image (or input tensor), b) the filters, whose weights are the parameters to be learned, and c) the activation maps obtained by sliding (convolving) each filter across the width and height of the input volume during the forward pass. The filter weights are updated by gradient descent during training.
- Layer-specific hyper-parameters : each convolutional layer has hyper-parameters such as the filter count and the spatial dimensions (width, height) of the filters, which are smaller than the width and height of the input they convolve. These hyper-parameters control the dimensions of the output tensor produced by the layer.
- Parameter sharing : parameter sharing is the technique used to control the total parameter count of a convolutional layer. Instead of using a separate set of weights at every spatial location, the same filter weights are used consistently across the whole input volume, so far fewer parameters have to be learned and stored. This further reduces the parameter count of the CNN.
- During back-propagation, the gradient is computed for every neuron's weights, but because all the neurons in a single depth slice of the output volume share the same parameters, these gradients are added up across each depth slice and only a single set of weights per slice is updated. This is also why the forward pass of the layer can be computed as a convolution of the neuron's weights with the input volume, and why the shared set of weights is commonly referred to as a filter (or kernel) that is convolved with the input.

3.4.2 Equivariant Representation
- Parameter sharing makes the convolutional layer equivariant to translation. A representation is equivariant to a transform when applying the transform to the input and then computing the representation gives the same result as computing the representation first and then transforming it :
  representation(transform(x)) = transform(representation(x))
- In other words, if the input changes, the output changes in the same way : shifting an object in the input image shifts its representation in the feature map by the same amount across the spatial locations.
- This property is useful when detecting structures that are common in the input, such as edges, and it is the reason why convolution in the early layers, especially when combined with operations such as max-pooling, achieves a degree of translation invariance.
- Note that convolution is equivariant only to translation; it is not naturally equivariant to other transformations such as changes in the scale or rotation of an image.
- Fig. 3.4.4 illustrates the definition of an equivariant representation.
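The equivariance property can be checked numerically. The sketch below uses a made-up 1-D signal and kernel and a circular convolution (so that shifting wraps around and border effects do not interfere).

```python
# Numerical check of translation equivariance: convolving a shifted input
# equals shifting the convolved output. NumPy sketch with made-up values.
import numpy as np

def circular_conv(x, k):
    """1-D circular convolution of signal x with kernel k."""
    n = len(x)
    return np.array([sum(x[(i - j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

x = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 1.0, 4.0])
k = np.array([1.0, -1.0, 0.5])
shift = 2

lhs = circular_conv(np.roll(x, shift), k)   # transform then convolve
rhs = np.roll(circular_conv(x, k), shift)   # convolve then transform
print(np.allclose(lhs, rhs))                # True: conv(shift(x)) == shift(conv(x))
```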
- The convolution used in neural network layers differs slightly from the standard mathematical definition found in the literature : the input and the kernel are multi-channel, real-valued tensors rather than single vectors, several kernels are applied in parallel to produce several feature maps, and strided convolutions may be used. Because of this, the operation used in practice is generally not commutative, even though the underlying mathematical convolution is.

3.4.4 Stride
- The stride indicates the pace by which the filter moves horizontally and vertically over the pixels of the input image during convolution.
- Stride values are usually small (a stride of 1 is the most common), and the stride is usually the same in the horizontal and vertical directions (square strides).
- The choice of stride depends on what we expect from the features at a given stage of the network : if we are interested in fine-grained features we use a stride of 1, whereas if only coarser, macro-level features are required, a larger stride can be used.
- A stride larger than 1 performs down sampling : the output is smaller than the input, so strided convolution also acts as a form of compression of the representation. This is used, for example, to reduce the size of images or video frames during feature extraction.
- Fig. 3.4.5 shows the effect of different stride settings (stride = 1 and stride = 2) on the output of a strided convolution.
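The down-sampling effect of the stride is easy to see in a small 1-D example. The input values and the filter below are made up for illustration.

```python
# Sketch of a strided 1-D convolution in NumPy: with stride s the filter is
# applied only every s-th position, so the output is down-sampled.
import numpy as np

def strided_conv1d(x, k, stride):
    out = []
    for start in range(0, len(x) - len(k) + 1, stride):
        out.append(np.dot(x[start:start + len(k)], k))
    return np.array(out)

x = np.arange(10, dtype=float)          # input of length 10
k = np.array([1.0, 0.0, -1.0])          # a small edge-like filter

print(strided_conv1d(x, k, stride=1))   # 8 outputs (fine-grained features)
print(strided_conv1d(x, k, stride=2))   # 4 outputs (coarser, down-sampled)
```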
3.4.3 Padding
- Without padding, convolution shrinks the representation : if the input has width m and the kernel has width k, the output has width m - k + 1, because the kernel is only allowed to visit positions where it is entirely contained within the image. This shrinkage at every layer restricts the number of convolutional layers the network can contain.
- Zero padding : zeros are added around the borders (edges) of the input so that the kernel can also visit positions near the boundaries. Zero padding the input allows us to control the kernel width and the size of the output independently, and it prevents the representation from shrinking at every layer. Essentially, each hidden convolutional layer is then composed of the (padded) convolution operation followed by a nonlinearity (activation function).
- Three special cases of the zero-padding setting are commonly used :
a) Valid convolution : no zero padding is used whatsoever; the kernel visits only positions where it fits entirely inside the image, so the output has size m - k + 1. Every output pixel is a function of the same number of input pixels, but the output shrinks at each layer.
b) Same convolution : just enough zero padding is added to keep the size of the output equal to the size of the input. The network can then contain as many convolutional layers as desired, but input pixels near the border influence fewer output pixels than pixels near the centre.
c) Full convolution : enough zeroes are added for every input pixel to be visited k times in each direction, resulting in an output of width m + k - 1. In this extreme case the output pixels near the border are a function of fewer input pixels, which can make it difficult to learn a single kernel that performs well at all positions.
- In general, the optimal amount of zero padding for a given problem lies somewhere between valid and same convolution and is usually found experimentally.
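The output widths for the three padding cases follow directly from the formulas above. A small sketch, with an arbitrary example input width and kernel width :

```python
# Output widths for the three zero-padding cases, for an input of width m
# and a kernel of width k (the values of m and k here are arbitrary).
def output_width(m, k, padding):
    if padding == "valid":   # no zero padding
        return m - k + 1
    if padding == "same":    # pad just enough to keep the size unchanged
        return m
    if padding == "full":    # pad so every pixel is visited k times
        return m + k - 1
    raise ValueError("unknown padding mode")

m, k = 32, 5
for mode in ("valid", "same", "full"):
    print(mode, output_width(m, k, mode))   # 28, 32, 36
```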
3.4.6 ReLU Layer
- The ReLU layer applies the rectifier function, element by element, to the output of the convolution layer. The rectifier outputs zero whenever its input is below zero and has a linear relationship with the input variable when the input rises above zero, so it effectively removes all negative values from the filtered image (the activation map) by replacing them with zero.
- This increases the nonlinearity of the network. Because the rectifier is such a simple threshold function, it is faster to compute than other nonlinear activation functions, which accelerates training and improves the processing efficiency of the network; a node with a ReLU activation only activates when its input rises above the threshold.

3.4.7 Sparse Interactions
- Traditional neural network layers use matrix multiplication by a matrix of parameters, with a separate parameter describing the interaction between each input unit and each output unit : every output unit interacts with every input unit. Convolutional networks, in contrast, typically have sparse interactions (also called sparse connectivity or sparse weights). This is accomplished by making the kernel smaller than the input.
- For example, when processing an image, the input picture may have thousands or hundreds of thousands of pixels (e.g. a 256 x 256 grayscale image), but we can detect small, meaningful features such as edges with kernels that occupy only tens or hundreds of pixels. To detect an edge in a certain region of the picture, only the pixels of that local region (patch) are needed, not the whole picture : each output node connects only to a subset of the input, its receptive field.
- This means fewer parameters need to be stored, which reduces the memory requirements of the model and improves its statistical efficiency. It also means that computing the output requires fewer operations and less processing power, which improves computational efficiency. Sparse connectivity is therefore one of the key ideas that make convolutional networks efficient compared with fully connected networks.
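The ReLU layer itself is a one-line operation. A minimal NumPy sketch with made-up activation values :

```python
# Sketch of the ReLU layer: negative values in the activation map are
# replaced by zero, everything else passes through linearly.
import numpy as np

def relu(x):
    return np.maximum(0, x)   # element-wise rectifier

activation_map = np.array([[-1.5, 0.2, -0.1],
                           [ 3.0, -2.0, 0.7],
                           [ 0.0, 1.2, -0.3]])
print(relu(activation_map))
# [[0.  0.2 0. ]
#  [3.  0.  0.7]
#  [0.  1.2 0. ]]
```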
3.5 Pooling
- A pooling layer is another building block of a CNN, commonly added after a convolutional layer. Its function is to progressively reduce the spatial size of the representation (the feature maps), which reduces the number of parameters and the amount of computation in the network and thereby also helps to control overfitting.
- The pooling operation replaces the output of the network at a certain position with a summary statistic of the nearby outputs : it computes a single value (the "winning unit") for each small N x N neighbourhood (receptive field) of the feature map. Max pooling selects the maximum value within the neighbourhood, while average pooling takes the average of the values; other common pooling functions include the L2 norm of the neighbourhood and a weighted average. Max pooling and average pooling are the two main types.
- The pooling layer operates on each feature map separately and creates a new set of pooled feature maps, equal in number but smaller in size. Like a convolutional filter, the pooling kernel sweeps across the entire input, but it has no weights : no parameters are learned during pooling. Instead, the kernel applies an aggregation function (e.g. the maximum) to the values within its receptive field and uses the result to populate the output array.
- For example, with a 2 x 2 pooling filter applied with a stride of 2, the width and height of each feature map are halved : a 6 x 6 feature map (36 pixels) is reduced to a 3 x 3 pooled map (9 pixels), i.e. a quarter of the original number of values, while the number of feature maps stays the same.
- Pooling makes the representation become approximately invariant to small translations of the input : if the input is translated slightly, the values of most of the pooled outputs do not change. This is useful when we care about whether a certain feature is present rather than exactly where it is.
- The main difference between the convolution operation and the pooling operation is that no weights are learned in pooling; during back-propagation, the error is passed back only through the "winning unit" that produced the selected value (in max pooling).
- Max pooling selects the maximum value at every location of the pooling filter, whereas average pooling calculates the average of the values in the receptive field. Compared with average pooling, max pooling tends to be used more often in practice because it keeps the strongest response of the feature detectors.

3.6 Variants of the Basic Convolution Function
- As noted earlier, the convolution used in practice in neural networks differs slightly from the standard mathematical definition (multi-channel inputs, several kernels applied in parallel, no commutativity). The main variants of the basic convolution function are as follows :
1. Strided convolution : the filter is moved by more than one pixel at a time, which down samples the output compared to the input.
2. Zero padding : zeros are added around the input so that the output size and the kernel width can be controlled independently; without it the representation would shrink at every layer.
3. Unshared convolution (locally connected layer) : every output location uses its own separate set of weights rather than sharing a single kernel across all locations. This is useful when we know that each feature should be a function of only a small local part of the input, but there is no reason to expect the same feature to occur at every spatial location. The memory requirements are much larger, because a separate set of weights must be stored for each location.
4. Convolution with partial connectivity between channels : each output channel is computed from only a subset of the input channels, which reduces the number of parameters, the memory requirements and the amount of computation.
5. Tiled convolution : offers a compromise between a convolutional layer and a locally connected layer. Rather than learning a separate set of weights at every spatial location, a small set of kernels is learned and the layer rotates (cycles) through this set as it moves across space, so that neighbouring locations use different kernels while the memory requirements grow only by a factor equal to the size of the kernel set. When combined with max pooling, tiled convolution allows the units to become invariant to the transformations that this set of kernels has learned.
- Two further types of convolution appear in many applications : transposed convolutions and dilated convolutions.
- Transposed convolutions (also known as deconvolutions or fractionally strided convolutions) : a transposed convolution expands the spatial dimensions of its input, i.e. it transforms a small input tensor into a larger output tensor, which is the reverse of what an ordinary (strided) convolution does. Rather than being a new kind of operation, it can be carried out by inserting zero-values at intermediate, empty locations between (and around) the elements of the input tensor and then performing an ordinary convolution with the kernel; because of this intermediate expanded tensor, the operation is computation and memory intensive. Fig. 3.6.1 shows a transposed convolution with a 2 x 2 kernel, where the shaded portions indicate which input element each output location is computed from.
- Dilated convolution (also called atrous convolution) : dilated convolution introduces one more parameter, the dilation factor d, which decides the spacing between the values of the kernel. The kernel is expanded by filling the locations between its original weights with zeros; for example, a dilation factor d = 2 inserts one empty location between each pair of original kernel values. This gives the kernel a wider window (a larger receptive field) without increasing the number of weights or the amount of computation, and requires less power than simply using a larger kernel. Fig. 3.6.2 shows dilated convolution for different dilation factors.
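The dilation step itself is just a re-spacing of the kernel weights. A NumPy sketch with made-up kernel values :

```python
# Sketch of kernel dilation: a 3x3 kernel with dilation factor d = 2 is
# expanded by inserting zeros between its original values, widening the
# window it covers without adding weights.
import numpy as np

def dilate_kernel(kernel, d):
    k = kernel.shape[0]
    size = d * (k - 1) + 1                 # width of the dilated kernel
    dilated = np.zeros((size, size), dtype=kernel.dtype)
    dilated[::d, ::d] = kernel             # original values, spaced d apart
    return dilated

kernel = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
print(dilate_kernel(kernel, d=2))
# 5x5 kernel covering a wider window but still with only 9 non-zero weights
```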
- Fig. 3.6.2 : Dilated convolution - (a) to (d) show the output tensor obtained by convolving the input with a filter for increasing dilation factors d.

3.7 Convolutional Networks and Structured Outputs
- Convolutional networks can be used to output a high-dimensional structured object, rather than just predicting a class label for a classification task or a real value for a regression task. Typically this object is simply a tensor, emitted by a standard convolutional layer.
- For example, the model might emit a tensor S, where S(i, j, k) is the probability that pixel (j, k) of the input image belongs to class i. This allows the model to label every individual pixel of an image and to draw precise masks that follow the outlines of individual objects, rather than producing a single label (such as "Cat") for the whole image.
- Rather than emitting the pixel labels in a single pass, the network can iteratively refine its estimate : an initial estimate of the image labels is produced, and a recurrent convolutional network then refines this estimate using both the input image and the previous estimate, with the same convolution kernels (shared parameters) used at every step of the refinement.
- Fig. 3.7.1 shows such a recurrent convolutional network architecture for pixel labelling : the input image tensor has axes corresponding to the image rows, image columns and channels (red, green, blue), and the output is a tensor of label probabilities for every pixel, refined over successive steps.
3.8 Data Types
- The data used with a convolutional network usually consists of several channels, each channel being the observation of a different quantity at some point in space or time, for example the red, green and blue channels of a colour image.
- One advantage of convolutional networks is that they can process inputs with varying spatial extents, e.g. images with different widths and heights. A traditional matrix-multiplication-based network cannot model such inputs, because its weight matrix has a fixed size. With convolution, the kernel is simply applied a different number of times depending on the size of the input, and the size of the output scales accordingly.
- Convolution itself may be viewed as matrix multiplication : for each input size, the same convolution kernel corresponds to a different doubly block circulant matrix, so a single kernel gives a whole collection of matrix multiplications of compatible sizes (see the sketch below).
- When the output is also allowed to have variable size, for example when assigning a class label to each pixel of the input, no further design work is necessary. When the network must produce a fixed-size output, such as a single class label for the entire image, an additional design step is needed, for example a pooling layer whose pooling regions scale in size proportionally to the size of the input.
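The "convolution as matrix multiplication" view can be demonstrated for the 1-D case. The sketch below builds the banded matrix for a valid 1-D convolution (for 2-D inputs this generalizes to a doubly block circulant matrix); the signal and kernel values are made up.

```python
# Sketch of "convolution as matrix multiplication": for a fixed input size,
# a 1-D valid convolution with kernel k equals multiplication by a banded
# matrix built from k.
import numpy as np

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0])
k = np.array([1.0, 0.5, -1.0])

n, m = len(x), len(k)
C = np.zeros((n - m + 1, n))
for i in range(n - m + 1):
    C[i, i:i + m] = k[::-1]        # each row holds the (flipped) kernel

print(C @ x)                            # matrix-multiplication view
print(np.convolve(x, k, mode='valid'))  # same values from np.convolve
```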
- Note that using convolution for variably sized inputs only makes sense when the input has variable size because it contains varying amounts of observation of the same kind of thing, such as recordings of different lengths. It does not make sense when the input may optionally include different kinds of observations : for example, if we are processing college applications whose features consist of both grades and standardized test scores, but not every applicant took the standardized test, then it does not make sense to convolve the same weights over the features corresponding to the grades and the features corresponding to the test scores.

3.9 Efficient Convolution Algorithms
- Convolution is equivalent to converting both the input and the kernel to the frequency domain using a Fourier transform, performing point-wise multiplication of the two signals, and converting back to the time domain using an inverse Fourier transform. For some problem sizes this can be faster than the naive implementation of discrete convolution.
- When a d-dimensional kernel can be expressed as the outer product of d vectors, one vector per dimension, the kernel is called separable. When the kernel is separable, naive convolution is inefficient : convolving with the kernel is equivalent to composing d one-dimensional convolutions with each of these vectors, and the composed approach is significantly faster than performing one d-dimensional convolution with their outer product. The kernel also takes fewer parameters to represent as vectors : if the kernel is w elements wide in each dimension, naive multidimensional convolution requires O(w^d) runtime and parameter storage space, whereas separable convolution requires O(w x d). Of course, not every convolution can be represented in this way.
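Separability can be checked numerically for a 2-D kernel built as an outer product. The sketch below uses cross-correlation (the form actually used in CNN layers, as noted earlier); the image and the two 1-D kernel vectors are made up.

```python
# Sketch of a separable 2-D convolution: if the kernel is the outer product
# of two vectors, applying the two 1-D kernels (rows, then columns) gives
# the same result as one 2-D pass, but with O(w*d) instead of O(w**d) work.
import numpy as np

def conv2d_valid(img, ker):
    H, W = img.shape; h, w = ker.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * ker)
    return out

rng = np.random.default_rng(1)
img = rng.standard_normal((6, 6))
u = np.array([1.0, 2.0, 1.0])           # vertical 1-D kernel
v = np.array([1.0, 0.0, -1.0])          # horizontal 1-D kernel
K = np.outer(u, v)                       # separable 3x3 kernel (9 weights)

full = conv2d_valid(img, K)
rows = np.apply_along_axis(lambda r: np.correlate(r, v, mode='valid'), 1, img)
sep = np.apply_along_axis(lambda c: np.correlate(c, u, mode='valid'), 0, rows)
print(np.allclose(full, sep))            # True: same output, fewer operations
```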
3.10 Neuroscientific Basis for Convolutional Networks
- Convolutional networks are perhaps the greatest success story of biologically inspired artificial intelligence. Their history begins with neuroscientific experiments carried out long before the relevant computational models were developed, and some of the key design principles of convolutional networks come from what those experiments revealed about the brain's visual system.
- A convolutional network layer is designed to capture three properties of V1, the primary visual cortex. Light arriving at the eye stimulates the retina; after some simple preprocessing in the retina, the image information passes through the optic nerve and a brain region called the lateral geniculate nucleus and eventually reaches V1, the first area of the brain that performs significantly advanced processing of visual input :
1. V1 is arranged in a spatial map : it has a two-dimensional structure mirroring the structure of the image in the retina. Convolutional networks capture this property by defining their features in terms of two-dimensional maps.
2. V1 contains many simple cells : a simple cell's activity can, to some extent, be characterized by a linear function of the image within a small, spatially localized receptive field. The detector units (filters) of a convolutional network are designed to emulate these properties of simple cells.
3. V1 also contains many complex cells : these cells respond to features similar to those detected by simple cells, but they are invariant to small shifts in the position of the feature. This inspires the pooling units of convolutional networks.

3.11 Applications of Convolutional Networks
- Nowadays, deep learning algorithms can perform image recognition and object classification with very high accuracy (around 99 % on some benchmarks), and computer vision is one of the areas where convolutional networks have found the greatest success. Important application tasks include the following.
a) Object detection and recognition : identifying, classifying and locating objects in an image, including objects in a dynamically changing scene, often with the help of a database of known object classes.
b) Face recognition : the identification of a specific person from photographs or video, by comparing new images of faces with the faces stored in a database; it is used, for example, in surveillance and access control.
c) Motion detection : detecting a change in the position of an object relative to its surroundings. In its simplest form, a motion detector in a surveillance system compares successive frames of a video; when the change exceeds a certain threshold, it can trigger an alarm, send a notification or start recording.
d) Human pose estimation : estimating the location of key human skeletal joints from images or video sequences. This is challenging due to variations in appearance and illumination, and it is especially difficult in crowded scenes where people may be partially hidden.
e) Semantic segmentation : classifying each pixel of an image into one of several classes (such as road, sky or grass), so that the image is segmented into distinct, labelled parts.

3.11.1 Image Compression
- The amount of new image data being produced is increasing significantly faster than the growth in storage and transfer capacity associated with Moore's Law, so image compression plays an increasingly important role.
- Deep learning based image compression uses two convolutional networks, as shown in Fig. 3.11.1 : a compact representation CNN (ComCNN), which acts as the image encoder and produces a compact representation of the original image, and a reconstruction CNN (RecCNN), which acts as the image decoder and reconstructs the decoded image as accurately as possible.
- Fig. 3.11.1 : Image compression architecture - Original image -> ComCNN (image encoder) -> Compact representation -> channel / storage -> Decoded image -> RecCNN (image decoder) -> Reconstructed image.
- The ComCNN is a multi-layer convolutional network; in one reported design it consists of three convolutional layers, with the first and second layers followed by batch normalization and a stride of 2 in the first layer so that the spatial size of the image is reduced by half. The RecCNN removes the compression artifacts and reconstructs the image. Trained for about 50 epochs on grayscale images, networks of this type have been reported to maintain the structural features of the original images and to achieve better reconstruction quality, measured by the PSNR and SSIM metrics, than standard JPEG compression.

3.12 Two Marks Questions with Answers

Q.1 What is a spiking neural network ?
Ans. : A spiking neural network is a type of artificial neural network that more closely simulates a natural (biological) neural network : its neurons communicate through discrete firing events (spikes) rather than through continuous activation values.

Q.2 What is sparse interaction (sparse connectivity) in convolutional networks ?
Ans. : In a traditional neural network layer, every output unit interacts with every input unit through matrix multiplication. Convolutional networks instead have sparse interactions, which is accomplished by making the kernel smaller than the input, so that each output value depends only on a small local region of the input.

Q.3 Why are sparse interactions beneficial ?
Ans. : Fewer parameters have to be stored, which reduces the memory requirements of the model and improves its statistical efficiency; computing the output also requires fewer operations, which improves computational efficiency.

Q.4 Would convolutional networks be useful for lossy image compression ?
Ans. : Yes. In general, a pair of convolutional networks (an encoder such as ComCNN and a decoder such as RecCNN) can compress an image into a compact representation and reconstruct it, removing quantization and compression artifacts better than traditional lossy schemes such as JPEG.
Q.6 What are the four main operations in a CNN ?
Ans. : 1. Convolution 2. Non-linearity (ReLU) 3. Pooling or sub-sampling 4. Classification (fully connected layer).

Q.7 What is pooling and what are its types ?
Ans. : Pooling reduces the spatial size of the feature maps. The main types are max pooling, which takes the maximum value in each neighbourhood, average pooling, which takes the (weighted) average, and L2-norm pooling.

Q.8 What is equivariance to translation ?
Ans. : Parameter sharing in convolution causes the layer to have a property called equivariance to translation : if the input is translated, the output representation changes in the same way.

Q.9 What is tiled convolution ?
Ans. : Tiled convolution offers a compromise between a convolutional layer and a locally connected layer : a small set of kernels is learned and rotated through as we move across space, so that memory requirements increase only by a factor of the size of this set of kernels.

Q.10 What are the types of zero padding used with convolution ?
Ans. : Three cases are commonly used : valid convolution (no zero padding; the kernel visits only positions completely inside the image), same convolution (just enough padding to keep the output size equal to the input size) and full convolution (enough zeroes are added for every pixel to be visited k times in each direction).

Q.11 What is an autoencoder ?
Ans. : An autoencoder is a neural network that learns to compress its input into a lower-dimensional representation (encoding) and then to reconstruct the input from this representation.

Q.12 What are the two main parts of an autoencoder ?
Ans. : The encoder, which compresses the input into a lower-dimensional code, and the decoder, which reconstructs the input from the code.

Q.13 What is the aim of an autoencoder ?
Ans. : The aim of an autoencoder is to learn a lower-dimensional representation for higher-dimensional data, such as images, by training the network to capture the most important parts of the input.
Q.14 What is an autoencoder used for ?
Ans. : An autoencoder is a neural network model that seeks to learn a compressed, lower-dimensional representation (encoding) of its input and then uses that representation to reconstruct the input at its output; it is typically used for dimensionality reduction, for example of an image.

Q.15 Are autoencoders supervised or unsupervised ?
Ans. : Autoencoders are trained with unsupervised learning methods : they do not require labels. Because the training target is the input itself, they are also referred to as self-supervised.

Q.16 What is a regularized autoencoder ?
Ans. : A regularized autoencoder uses a loss function that encourages the model to have important properties besides the ability to copy its input to its output, for example sparsity of the representation or robustness to noise, so that it captures the most important properties of the data.
UNIT IV

4  Deep Feedforward Networks

Syllabus
Deep Feedforward Networks - History of Deep Learning - A Probabilistic Theory of Deep Learning - Gradient Learning - Chain Rule and Backpropagation - Regularization : Dataset Augmentation - Noise Robustness - Early Stopping, Bagging and Dropout - Batch Normalization - VC Dimension and Neural Nets.

Contents
4.1 History of Deep Learning
4.2 A Probabilistic Theory of Deep Learning
4.3 Deep Learning
4.4 Challenges Motivation
4.5 Gradient Learning
4.6 Chain Rule and Backpropagation
4.7 Regularization : Dataset Augmentation and Noise Robustness
4.8 Early Stopping, Bagging and Dropout
4.9 Batch Normalization
4.10 VC Dimension and Neural Nets
Two Marks Questions with Answers
4.1 History of Deep Learning
- The history of deep learning can be traced back to 1943, when Warren McCulloch and Walter Pitts created a computer model based on the neural networks of the human brain. They used a combination of algorithms and mathematics they called "threshold logic" to mimic the thought process. Since that time, deep learning has evolved steadily.
- Henry J. Kelley is credited with developing the basics of a continuous back propagation model in 1960, in the context of control theory. In 1962, a simpler version based only on the chain rule was developed by Stuart Dreyfus. Back propagation, which uses errors to train deep models, took a significant step forward when a 1985 report described its use for training multi-layer neural networks.
- In 1999, computers started becoming significantly faster and GPUs (graphics processing units) were developed; the increase in processing speed meant that neural networks could be trained much faster than before.
- Around the year 2000, the vanishing gradient problem appeared : the "features" learned in the lower layers of a deep network were not being learned properly, because the learning signal (gradient) became vanishingly small by the time it reached those layers. This was not a fundamental problem for all networks, and it was later addressed by techniques such as layer-by-layer pre-training.
- In 2001, a research report by the META Group described the challenges and opportunities of the three-dimensional growth of data : the increasing volume and speed of data and the increasing range of data sources and types. This was the coming of Big Data.
- In 2009, Fei-Fei Li, an AI professor at Stanford, launched ImageNet, a free database of more than 14 million labeled images. The Internet was full of unlabeled images, but labeled images were needed to "train" neural nets.
- By 2011, the speed of GPUs had increased significantly, making it possible to train convolutional neural networks "without" the layer-by-layer pre-training, and the combination of faster hardware and large labeled datasets gave deep learning significant advantages in efficiency and speed.
- In 2014, Generative Adversarial Networks (GANs) were introduced.
- In 2016, Google DeepMind's AlphaGo program beat Lee Sedol, the world champion, at the board game Go, a game long considered too hard for computers to win against humans. Its successor AlfaZero followed in 2016 - 2017, the period in which the transformer model for Natural Language Processing (NLP) was also introduced.
- In 2018, Geoffrey Hinton, Yann LeCun and Yoshua Bengio received the Turing award for their contributions to deep learning; much of the deep learning applied today is still based on principles they developed over the preceding decades.
- Coming up to the present day, these systems have had an obvious and significant impact on the machine learning community and on the history of artificial intelligence.

4.2 A Probabilistic Theory of Deep Learning
- Bayes' theorem is a way to calculate conditional probabilities : it relates the conditional probability P(A | B) of a hypothesis A given observed data B to the "reverse" conditional probability P(B | A), together with the prior probabilities of A and B. Given new or additional evidence, it tells us how a prior probability should be revised.
- Although Bayes' theorem predates modern computing, it is still widely used today. The Naive Bayes classifier, which applies Bayes' theorem under the assumption that all input features are conditionally independent, is one of the earliest and best-known machine-learning algorithms, sometimes considered the "hello world" of probabilistic data analysis. Logistic regression, another widely used method for classification and regression analysis, is likewise based on conditional probability.
- Bayes' theorem gives the relationship between the probability of a hypothesis h before seeing the evidence (the training data I) and the probability of the hypothesis after seeing it :
  P(h | I) = P(I | h) P(h) / P(I)
  where
  P(h) = prior probability of the hypothesis h,
  P(I) = prior probability of the training data I,
  P(h | I) = probability of h given I (the posterior probability), and
  P(I | h) = probability of the data I given h.
- Written for two events A and B, the same relation reads P(A | B) = P(B | A) P(A) / P(B).
- A prior probability is the initial probability of an event, obtained before any new data is collected. The posterior probability is the revised probability of the event occurring, obtained by updating the prior belief in the light of the new information; Bayes' theorem is the rule used to perform this revision.
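A small numeric sketch shows how the rule revises a prior. All the probability values below are made up for illustration.

```python
# Numeric sketch of Bayes' rule P(h|I) = P(I|h) * P(h) / P(I):
# a prior belief in a hypothesis h is revised after observing data I.
p_h = 0.3                      # prior probability of the hypothesis h
p_I_given_h = 0.8              # probability of the data I if h is true
p_I_given_not_h = 0.2          # probability of the data I if h is false

# Total probability of observing the data I.
p_I = p_I_given_h * p_h + p_I_given_not_h * (1 - p_h)

# Posterior: revised probability of h after seeing the data.
p_h_given_I = p_I_given_h * p_h / p_I
print(round(p_h_given_I, 3))   # 0.632 - the prior 0.3 is revised upward
```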
4.3 Deep Learning
- Deep learning is a subset of machine learning. The term "deep" usually refers to the number of hidden layers (of computational units) in the neural network : a deep model is a composition of many layers of functions.
- Deep learning is predicated on the idea that, instead of teaching a computer a massive list of rules to solve a problem, we can give it a model that can evaluate examples and a small set of instructions to modify the model when it makes a mistake. It also eliminates some of the data pre-processing that is typically involved in machine learning, because the relevant features are learned from the data itself.
- Probability concepts are basic to deep learning. A random variable is a variable whose possible values are numerical outcomes of a random phenomenon; it can be discrete (taking a countable number of possible states) or continuous (taking real values). A probability distribution is a description of how likely a random variable, or a set of random variables, is to take on each of its possible states.

4.3.1 Relation between AI, ML and DL
- Fig. 4.3.1 shows the relation between artificial intelligence, machine learning and deep learning : deep learning is a subset of machine learning, which in turn is a subset of AI.
- In deep learning, a model learns to perform classification tasks directly from images, text or sound. For example, to categorize photos of pets as "cat" or "dog", or to distinguish between birds such as a crow, a raven and a chicken, we do not hand-design the distinguishing features (such as the shape of the ears); the network is trained with a large set of labeled photos and learns the relevant features by itself.
- Models are trained using large sets of labeled data and neural network architectures that contain many layers. Deep learning models can achieve state-of-the-art accuracy, sometimes exceeding human-level performance.
- Deep learning architectures include : a) Convolutional neural networks b) Recurrent neural networks c) Recursive neural networks d) unsupervised architectures such as Boltzmann machines, auto-encoders and generative adversarial networks. Deep learning methods may use supervised or unsupervised learning.
- One important reason for using deep learning is its ability to analyse unstructured data : deep learning algorithms can be trained on different data formats and can extract valuable business insights from text such as social media posts, news items and customer surveys, providing more precise feedback than traditional methods.
4.3.2 Applications of Deep Learning
1. Aerospace and defense : Deep learning is used to identify objects from satellite images, to locate areas of interest and to identify safe or unsafe zones for soldiers.
2. Medical research : Cancer researchers use deep learning to automatically detect cancer cells.
3. Industrial automation : Deep learning helps to improve worker safety around heavy machinery by automatically detecting when people or objects are within an unsafe radius of the machines.
4. Facial recognition and security : Deep learning based facial recognition is used by airports for seamless, paperless check-ins, by stores to enable purchases, and to detect fraud.
5. Financial services : Financial institutions use predictive analytics to drive algorithmic trading of stocks, assess business risks for loan approvals, detect fraud and help manage credit and investment portfolios for clients.

4.3.3 Difference between Machine Learning and Deep Learning
Sr. No. | Machine Learning | Deep Learning
1. Works well with smaller amounts of training data. | Requires a large amount of data to train.
2. The relevant features must be identified and labelled manually by humans before training. | Learns high-level features directly from raw data; feature extraction is done automatically.
3. Algorithms train comparatively quickly and need less computation. | Training is computation-heavy and takes much more time, but once trained the model can be applied to new data quickly.
4. A model built for one specific task usually has to be re-engineered for a new task. | A trained deep learning model can be retrained and adapted to new data and different types of applications more easily.
5. Often depends on manually labelled training examples. | Can exploit many different data types and reduce the amount of manual labelling needed.
Feedforward
Deep
Networks not perform to adapt identifydriveand For of number
sucha rangeis
applied presence of recognition layers
it does fraud, extensively. in radius that on
trained, can to a check-ins. decisions
soldiers.
satellites largesafety for in network"
it can be model analytics algorithms
because detect unsafe just
to the a
Once
it worker not Facial Learming
Deep intelligent
trained, ability
learning help for approvals, detect requires neural
predictive learning the used paperless Learning
unsafe structures
training.time to of within "artificial
the extensively to thatimprovement beingstores. make knowledge
save properly deep have or deep used seamless,
own. safe loan one comes
for humans.a use is at Deep learmingand
can data. Additionally,
learning for uses
clients. is is learning
purchases an for
as regularly learning sector learn
create
data its algorithm
on
raw is them
utilized risks that own,
its up-thrust
algorithm field enableand Deep
labeled
data from than for the objects can
machinery deep to
deep classify business with Learning
of institutions research
portfolios deep enable
to an
manuallyfaster applications. is
learning utilizing airports data, PUBLICA
types
requires in learning helpsor
-6 learning
again,
used and research, parse
assess person soon informed
4 different heavy learned.
Learning interest investment medical to
features over networks Deep Financial
learning feature in Machine
learningdeep anddata. stocks, any will used Machine algorithms
Learning
deep cancer The makehas
identify
A extract and typesnew : of The automatically.
cells
cancer DeepdetectingThisbut extensively TECHNICA
a over neural Deepdefense areas of and
:automation
and it
Deep :engineering
Learning : uses what
When data with services
trading : ongoing measures. : purposes between
creditresearch recognition
and to tasks or by data,
: The different of andobjects
specific machine.
heavy learningon
labelling humans it manage environnments being that based
Deep data : of
Application
retraining Aerospace algorithmic in
Training: Financial Industrial security Difference
Efficiency
newFeature thousands example, safety
Medical fromdecisions
Machine
and require Facial already
many help learn
Datalabel
Networks by of of
4.3.2 2. 3. 4. 5.
4.3.3
Neural 3. 4. 5. Sr. No. 1.
2.
- Data science is a broadly encompassing field that uses mathematics, statistics, data analytics, data wrangling, big data and machine learning; its objective is the collection, analysis and dissemination of knowledge, i.e. answering questions and extracting actionable insights from data.

4.3.5 Difference between AI, ML and DL
Sr. No. | Artificial Intelligence | Machine Learning | Deep Learning
1. Objective : to build intelligent machines that can think and act like humans. | Objective : to build systems that learn from data and maximize their chance of success on a problem through experience, without being explicitly programmed. | Objective : to learn features and solve the problem end-to-end from large amounts of data using deep neural networks.
2. Types : a) Artificial Narrow Intelligence b) Artificial General Intelligence c) Artificial Super Intelligence. | Categories : a) supervised learning b) unsupervised learning c) reinforcement learning. | Algorithms include : a) Convolutional Neural Networks b) Recurrent Neural Networks c) Recursive Neural Networks.
3. May combine many approaches and knowledge engineering. | Can work with a normal amount of data and low computational power. | Needs a large amount of data and high computational power (GPUs).

4.3.6 Advantages and Disadvantages of Deep Learning
Advantages :
1. Features are learned automatically from data, so there is no need for manual feature engineering.
2. It gives high accuracy on many problems.
3. It solves the problem end-to-end rather than breaking it into sub-problems.
Disadvantages :
1. Not every problem is best solved by deep learning; it needs a massive volume of data to train.
2. It needs expensive, high-performance hardware (GPUs) to train in a reasonable time.
3. The trained models are hard to understand, which makes it difficult to assess exactly what they have learned.
4. Training takes much more time than for traditional algorithms.

4.4 Challenges Motivation
- The development of deep learning was motivated in part by the failure of traditional machine learning algorithms to generalize well on central AI tasks, and by the phenomena that make learning from high-dimensional, real-world data hard. The major such phenomenon is the curse of dimensionality.

4.4.1 Curse of Dimensionality
- Many machine learning problems become exceedingly difficult when the number of dimensions in the data is high. This phenomenon is known as the curse of dimensionality : as the number of dimensions (features) increases, the volume of the space grows so quickly that the available data becomes sparse.
- The number of possible distinct configurations of a set of variables increases exponentially as the number of variables grows, so the amount of data needed to cover the space also grows exponentially.
- Fig. 4.4.1 : Data sparsity in (a) one dimension, (b) two dimensions and (c) three dimensions - the same data points fill less and less of the space as the dimensionality increases, leaving most regions empty.
- Analyzing and organizing data in high-dimensional spaces is therefore difficult : the "closeness" of data points loses statistical significance, most regions of the space contain no training examples at all, and it becomes hard for a model that relies only on the training data to generalize accurately.
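The exponential growth described above can be made concrete with a quick calculation. The bin count and data-set size below are arbitrary illustrative numbers.

```python
# Sketch of the curse of dimensionality: if each axis is split into 10
# regions, the number of distinct regions grows exponentially with the
# number of dimensions, so a fixed data set quickly becomes sparse.
bins_per_axis = 10
n_points = 1000                      # a fixed number of data points

for d in (1, 2, 3, 5, 10):
    regions = bins_per_axis ** d
    print(d, regions, n_points / regions)   # average points per region
# d=1 -> 10 regions (100 points each); d=3 -> 1000; d=10 -> 10**10 (mostly empty)
```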
4.4.2 Local Constancy and Smoothness Regularization
- To generalize well, machine learning algorithms need to be guided by prior beliefs about what kind of function they should learn. Among the most widely used of these priors is the smoothness prior, or local constancy prior : the function we learn should not change very much within a small region, i.e. f*(x) ≈ f*(x + ε) for most configurations x and small changes ε. Many simpler algorithms rely exclusively on this prior to generalize well and, as a result, fail to scale to the statistical challenges of AI-level tasks.
- An extreme example of the local constancy approach is the k-nearest neighbours family of learning algorithms, which literally copy the outputs of nearby training examples : they predict the same answer for all points in a region that shares the same set of nearest neighbours.
- Kernel machines interpolate between the training-set outputs associated with nearby training examples. An important class of kernels is the family of local kernels, where k(u, v) is large when u = v and decreases as u and v grow farther apart from each other; a local kernel can be thought of as a similarity function that performs template matching, measuring how closely a test example x resembles each training example.
- Decision trees also suffer from the limitations of exclusively smoothness-based learning, because they break the input space into as many regions as there are leaves and use a separate parameter (answer) for each region.
- In all of these cases, the learned function can distinguish roughly as many regions as there are training examples. To represent more complex functions and to generalize to regions not covered by the training set, additional priors about the structure of the data must be introduced, such as the manifold assumption described next.

4.4.3 Manifold Learning
- Manifold learning is an approach to machine learning and non-linear dimensionality reduction based on the assumption that the data of interest lies on or near a low-dimensional manifold embedded in the high-dimensional input space, and that interesting variations occur only along that manifold.
- Manifold learning was introduced for continuous-valued data and the unsupervised learning setting, although this probability concentration idea can be generalized to both discrete data and the supervised learning setting : the key assumption remains that probability mass is highly concentrated.
- High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show its inherent structure in an intuitive way, equivalent high-dimensional plots are much less intuitive. To aid visualization, the dimensionality must be reduced in some way, for example by taking a random projection of the data down to two or three co-ordinates. This allows some degree of visualization, but much of the interesting structure within the data is lost. If the data actually lies on or near a low-dimensional manifold within R^n, a dimensionality reduction that respects this manifold preserves much more of the desired structure.

4.5 Gradient-Based Learning
- Designing and training a neural network is not much different from training any other machine learning model with gradient descent. The largest difference is that the nonlinearity of a neural network causes most interesting loss functions to become non-convex.
- Neural networks are therefore usually trained with iterative, gradient-based optimizers that merely drive the cost function to a very low value, rather than with the exact linear solvers or the convex optimization algorithms (with their global convergence guarantees) used for models such as linear regression. Stochastic gradient descent applied to non-convex loss functions has no such convergence guarantee, and it is sensitive to the values of the initial parameters.
- For feedforward neural networks, it is important to initialize all the weights to small random values; the biases may be initialized to zero or to small positive values.
- For gradient-based learning we must choose : a) a cost function, b) the form of the output units used to represent the output (discrete or continuous) and c) the optimization procedure, together with the initial parameter values.

4.5.1 Cost Function
- An important aspect of the design of a deep neural network is the choice of the cost function. In most cases our parametric model defines a conditional distribution p(y | x ; θ) and we simply use the principle of maximum likelihood, so the cost functions for neural networks are mostly the same as those for other parametric models, such as linear models.
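The training procedure just described, small random initial weights, zero bias and iterative gradient steps that drive the cost down, can be sketched on a toy model. The data, learning rate and number of steps below are all made up for illustration.

```python
# Minimal sketch of iterative gradient-based learning: weights start at
# small random values, the bias at zero, and gradient descent drives a
# squared-error cost to a low value.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(100)

w = 0.01 * rng.standard_normal(3)   # small random initial weights
b = 0.0                             # bias initialised to zero
lr = 0.1

for step in range(200):
    y_hat = X @ w + b
    err = y_hat - y
    grad_w = X.T @ err / len(y)     # gradient of the (half) mean squared error
    grad_b = err.mean()
    w -= lr * grad_w                # gradient descent update
    b -= lr * grad_b

print(np.round(w, 2), round(b, 2))  # close to the true parameters
```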
4.5.1 Cost Functions

• An important aspect of designing a deep neural network is the choice of the cost function. In most cases our parametric model defines a conditional distribution p(y | x; θ) and we simply use the principle of maximum likelihood. The cost function is then the cross-entropy between the training data and the model distribution :
  J(θ) = − E(x, y ~ p̂data) log pmodel(y | x)
• The specific form of the cost changes from model to model, depending on the form of log pmodel. For example, if pmodel(y | x) = N(y ; f(x; θ), I), then maximum likelihood recovers the mean squared error cost,
  J(θ) = ½ E(x, y ~ p̂data) || y − f(x; θ) ||² + const,
  where the constant does not depend on the parameters.
• An advantage of deriving the cost from maximum likelihood is that it removes the burden of designing a cost function for each model : specifying a model p(y | x) automatically determines the cost function log p(y | x).
• A desirable property of the cost function is that its gradient must be large and predictable enough to serve as a good guide for the learning algorithm. Functions that saturate (become very flat) make the gradient very small; the negative log-likelihood helps avoid this for many models, because the log undoes the exponentiation performed by many output units.
• One unusual property of the cross-entropy cost is that it usually does not have a minimum value for the models commonly used. For discrete output variables the model cannot assign probability exactly zero or one, but it can come arbitrarily close; for real-valued outputs the model can assign extremely high density to the correct training outputs (for example by letting the variance shrink), driving the cross-entropy towards negative infinity. Regularization provides several ways of preventing this.
• Learning conditional statistics : Instead of learning a full distribution p(y | x; θ), we often want to learn just one conditional statistic of y given x, for example a predictor f(x; θ) of the expected value (mean) of y. If the neural network is sufficiently powerful to represent any function from a wide class - limited only by properties such as continuity and boundedness rather than by a specific parametric form - we can view the cost as a functional, a mapping from functions to real numbers, and view learning as choosing a function rather than merely a set of parameters.
4.5.2 Output Units

• The choice of cost function is tightly coupled with the choice of output unit. Most of the time we use the cross-entropy between the data distribution and the model distribution; the choice of how to represent the output then determines the form of the cross-entropy function. Any unit that can be used as an output unit can also be used as a hidden unit.
• Types of output units :
1. Linear units for Gaussian output distributions
2. Sigmoid units for Bernoulli output distributions
3. Softmax units for Multinoulli output distributions
4. Other output types.
• Linear units : one simple kind of output unit is based on an affine transformation with no nonlinearity. Given features h, a layer of linear output units produces a vector ŷ = Wᵀh + b.
• Linear output layers are often used to produce the mean of a conditional Gaussian distribution, p(y | x) = N(y ; ŷ, I). Maximizing the log-likelihood is then equivalent to minimizing the mean squared error.
• Maximum likelihood also makes it possible to learn the covariance of the Gaussian, or to make the covariance a function of the input; however, the covariance must be constrained to be a positive definite matrix for all inputs, which is difficult with a linear output layer, so other kinds of output unit are normally used for that role.
• Because linear units do not saturate, they pose little difficulty for gradient-based optimization.
• In contrast, the mean squared error and mean absolute error often lead to poor results when combined with output units that saturate, because saturating units produce very small gradients. This is one reason the cross-entropy cost is more popular than mean squared or mean absolute error, even when it is not necessary to estimate an entire distribution p(y | x).
• Sigmoid units for Bernoulli output distributions : many tasks require predicting the value of a binary variable y; classification problems with two classes can be cast in this form. The maximum-likelihood approach is to define a Bernoulli distribution over y conditioned on x. The network then needs to predict only P(y = 1 | x), and for this to be a valid probability it must lie in the interval [0, 1].
• A sigmoid output unit has two components : a linear layer computes z = wᵀh + b, and the sigmoid activation function then converts z into a probability, ŷ = σ(z). The quantity z is called a logit. When the sigmoid is used with the maximum-likelihood (cross-entropy) cost, the loss contains a softplus-style term based on log(1 + exp(z)) that saturates only when the model already has the right answer, so the gradient does not vanish whenever the unit's answer is very wrong. With other cost functions, such as mean squared error, the sigmoid can saturate and learning becomes extremely slow.
• Softmax units for Multinoulli output distributions : whenever we wish to represent a probability distribution over a discrete variable with n possible values (a classifier over n classes), we use the softmax function, which can be seen as a generalization of the sigmoid to many classes.
• A linear layer first computes unnormalized log probabilities z = Wᵀh + b; the softmax then exponentiates and normalizes them :
  softmax(z)i = exp(zi) / Σj exp(zj)
  Exponentiation makes every component positive and the normalization makes the components sum to one, so the output is a valid probability distribution.
• Like the sigmoid, the softmax can saturate when the differences between its input values become extreme : an output saturates to 1 when its input is much larger than all the others, and to 0 when its input is much smaller. Cost functions that are not based on the log-likelihood may then fail to learn. The softmax is invariant to adding the same scalar to all of its inputs, a property used to compute it in a numerically stable way.
• Softplus : the softplus function ζ(x) = log(1 + exp(x)) is a smooth version of the positive-part function and appears in the loss associated with the sigmoid output unit.
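A minimal numerical sketch of these two output units (added for illustration; the function names are my own). It exploits the shift-invariance of the softmax for numerical stability and computes the cross-entropy from the resulting probabilities.

    import numpy as np

    def softmax(z):
        # subtracting the maximum does not change the result but avoids overflow
        z = z - np.max(z)
        e = np.exp(z)
        return e / np.sum(e)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cross_entropy(probs, target_index):
        # negative log-likelihood of the correct class
        return -np.log(probs[target_index])

    z = np.array([2.0, 1.0, -1.0])           # unnormalized log probabilities (logits)
    p = softmax(z)
    print(p, p.sum())                         # valid distribution: positive, sums to 1
    print(cross_entropy(p, target_index=0))   # small when the correct class gets high probability
    print(sigmoid(0.0))                       # 0.5 : the binary case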
4.6 Backpropagation

• Backpropagation is a systematic method for training multilayer artificial neural networks. It is the most widely used neural-network learning technique, applied in roughly 80 % of ANN applications.
• The backpropagation learning rule is a generalized delta rule : it extends the Widrow-Hoff (delta) error-correction rule to networks with one or more hidden layers. A backpropagation net is therefore a multilayer, feedforward network trained by backpropagation.
• Training looks for a set of weights that minimizes the total squared error of the output computed by the net. The weights are adjusted by the method of gradient descent, and the error is propagated from the output layer back through the hidden layers using the chain rule of calculus.
• Fig. 4.6.1 shows a simple neuron : the inputs x1, …, xn are multiplied by synaptic weights w1, …, wn and added, together with a bias, at a summing junction, sum = Σ wi xi + b; an activation (threshold) function F(sum) then produces the neuron's output.
• Because gradient descent requires derivatives, the activation function used with backpropagation must be differentiable everywhere. Examples are the logistic (sigmoid) function and the hyperbolic tangent function.
• Need for a non-linear activation function : if only linear activation functions are used, a multilayer network has no more representational power than a single-layer network, whatever the number of layers. The non-linearity introduced in the hidden layers is what gives a multilayer network its greater representational power; a solution that a single layer cannot represent may exist once hidden layers with non-linear activations are introduced.
• Backpropagation is applied to layered feedforward networks : an input layer, one or more hidden (middle) layers and an output layer, with the units of each layer connected by weights to the next. Because the hidden layers use non-linear activations, such a network can learn complex, non-linearly separable input-output mappings; however, if the desired mapping is discontinuous or the training data are inconsistent, the network may not be able to learn it, and the input values are usually scaled so that the units are not saturated.
• Training procedure :
1. Choose a pair from the training set : an input vector and the corresponding desired (target) output vector.
2. Generate the initial weights of the network as small random values (both positive and negative), so that no unit is saturated by a large weighted input.
3. Apply the input vector to the network input layer and calculate the network output (forward pass).
4. Calculate the error, i.e. the difference between the network output and the desired output.
5. Adjust the weights of the network in the way that minimizes this error, propagating the adjustment backwards from the output layer (backward pass) using the delta rule.
6. Repeat steps 3 to 5 for each pair of input-output vectors in the training set.
7. Repeat the whole procedure until the error for the entire training set is acceptably low.
• The algorithm therefore makes two passes through the network for every training pair (a short sketch follows this list) :
1. Forward pass : the input signals are applied at the input layer, propagate forward through the hidden layer(s), and produce the network output; no weights are changed during this pass.
2. Backward pass : the error calculated at the output is propagated in the reverse direction.
a. The weights connecting the hidden layer to the output layer are adjusted first;
b. then the weights connecting the input layer to the hidden layer (and any hidden-to-hidden weights) are adjusted, moving backwards towards the input.
3. In this way every neuron's weights are updated using the error values propagated back to it, and the next training pair is then presented.
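The following sketch (my own illustrative code, not the book's) carries out one forward pass and one backward pass of this procedure for a network with a single sigmoid hidden layer and a linear output, using the squared-error cost.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 1))            # one input vector (3 features)
    t = np.array([[1.0]])                  # desired (target) output
    W1, b1 = rng.normal(size=(4, 3)) * 0.1, np.zeros((4, 1))   # small random initial weights
    W2, b2 = rng.normal(size=(1, 4)) * 0.1, np.zeros((1, 1))
    lr = 0.1                               # learning rate

    # forward pass
    h = sigmoid(W1 @ x + b1)               # hidden layer activations
    y = W2 @ h + b2                        # network output
    error = y - t                          # output error

    # backward pass: output layer first, then the hidden layer (delta rule + chain rule)
    dW2 = error @ h.T
    db2 = error
    delta_h = (W2.T @ error) * h * (1 - h) # error propagated through the sigmoid derivative
    dW1 = delta_h @ x.T
    db1 = delta_h

    # weight adjustment by gradient descent
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
    print(0.5 * float(error[0, 0]) ** 2)   # squared error for this training pair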
• Factors influencing backpropagation training :
1. Selection of the number of hidden units : the number of hidden units required depends on the task and on the number of input units. Useful rules of thumb are never to choose the number of hidden units to be more than twice the number of input units, and to make sure that the number of training patterns is several times larger than the number of weights to be learned, otherwise the network simply memorizes the training set. More classes and more input elements generally require more hidden units, and training time grows with the number of units.
2. Momentum : the momentum parameter adds a fraction of the previous weight change to the current weight change. Instead of reacting to every irregularity of the error surface, the weight vector keeps moving in the same general direction; this enhances the stability of the training process and usually makes learning faster, which is especially helpful when the gradient is computed from noisy mini-batches. (A short sketch of the momentum update is given at the end of this subsection.)
4.6.1 Advantages and Disadvantages of Backpropagation
• Advantages :
1. Backpropagation is fast, simple and easy to program.
2. Apart from the numbers of input, hidden and output units (and the learning rate), it has no special parameters that the user must tune.
3. It is a flexible method : it does not require prior knowledge about the network or the data.
4. It is a standard method that generally works well and does not need any special mention of the features of the function to be learned.
• Disadvantages :
1. The actual performance of backpropagation is highly reliant on the quality of the training data; it can be sensitive to noisy and irregular data.
2. Training may require excessive time.
3. It uses a matrix-based, gradient computation and therefore needs differentiable activation functions.
4.7 Regularization : Dataset Augmentation, Noise Injection and Early Stopping

• The best way to make a model generalize better is to train it on more data. In practice the amount of data available is limited, and collecting the thousands or millions of labelled examples typically needed to train deep convolutional networks for image classification is often not feasible. Data augmentation comes to the rescue : it creates new, plausible training data by manipulating the data we already have.
• For images, augmentation includes rotating, resizing, cropping and flipping the image, and randomly erasing (occluding) rectangular regions of the input. These transformations increase the amount and the diversity of the training data without actually collecting new data, and they make the model robust to the variability and occlusions already present in real images.
• Data augmentation is an integral part of the deep learning training process; it acts as a regularizer and reduces the chances of overfitting.
• Noise injection : adding noise with a tiny variance to the inputs is another way of creating new data and is essentially a form of data augmentation. Noise can also be added to the hidden units, which injects noise at multiple levels of abstraction and can be even more powerful; the denoising autoencoder is an example of a network trained on noise-corrupted inputs.
4.7.1 Early Stopping
• Early stopping is probably the most commonly used form of regularization in deep learning.
• When a model is trained with an iterative, gradient-descent-based method (such as the backpropagation-based algorithms), a small portion of the training data is held out as a validation set and the error on this held-out set is continuously monitored. The training error keeps decreasing, but at some point the validation error stops improving and begins to rise : the model is starting to overfit. Training is terminated at that point, and the parameters that gave the best validation error so far are kept. (A sketch of such a training loop is given below.)
• From the parameter-space point of view, early stopping restricts the solution to a neighbourhood of the initial parameter values, because it effectively limits the number of gradient-descent iterations; it therefore acts as a regularizer that controls the effective capacity of the model.
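The loop below is a small illustration of early stopping (my own sketch; `train_one_epoch`, `validation_error` and the model's `copy()` method are assumed placeholders for whatever training, evaluation and checkpointing routines are actually used).

    def early_stopping_train(model, train_one_epoch, validation_error,
                             patience=5, max_epochs=200):
        """Stop when the validation error has not improved for `patience` epochs."""
        best_err = float("inf")
        best_state = None
        epochs_without_improvement = 0
        for epoch in range(max_epochs):
            train_one_epoch(model)            # one pass of gradient descent over the training set
            err = validation_error(model)     # error on the held-out validation set
            if err < best_err:
                best_err = err
                best_state = model.copy()     # remember the best parameters seen so far
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break                     # validation error has started to rise: stop
        return best_state, best_err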
4.8 Bagging and Boosting

• Bagging (short for bootstrap aggregating) is a technique for reducing generalization error by combining several models : several different models are trained separately and then all of them vote on the output for each test example. Techniques of this kind are called ensemble methods or model averaging, and the procedures that construct them are sometimes called meta-algorithms. Dropout can also be viewed as an inexpensive ensemble method.
• Bagging was originally designed for decision trees in classification and regression, but it can be used with any type of base model. It is a special case of the more general idea of adaptively resampling and combining ("arching"), to which boosting also belongs.
• Bootstrap replicates : suppose the original training set consists of n examples. A bootstrap replicate is a new training set, also of size n, formed by drawing n examples uniformly at random with replacement from the original set. Each replicate contains, on average, about 63.2 % of the unique original examples; some examples appear multiple times and others do not appear at all.
• Bagging trains one base model on each bootstrap replicate and then combines their predictions - by averaging for regression, or by majority vote for classification. Because each model sees a slightly different training set, the individual models make partly independent errors, and combining them reduces the overall error.
• Steps of Bagging :
1. Given a training set (x1, y1), …, (xn, yn), form M bootstrap replicates by repeatedly selecting n observations with replacement.
2. Train a base classifier (or regressor) hm on each replicate Sm, m = 1, …, M. Random subsets of the features may also be selected, and whichever split gives the best prediction is used (as in random decision trees).
3. Output the combined model H(x) = majority vote of (h1(x), …, hM(x)) for classification, or the mean of the individual predictions for regression. (A small sketch of this procedure follows.)
• Bagging works best with unstable learners such as decision trees, whose predictions can change dramatically when the training data are changed only slightly : it decreases the variance of the prediction without increasing the bias.
• Advantages of Bagging :
1. It reduces over-fitting (variance) of the model and usually improves accuracy.
2. It maintains accuracy for missing data and handles higher dimensionality well.
• Disadvantages of Bagging :
1. Since the final prediction is the mean (or vote) of the predictions from many base models, bagging gives a rough rather than a precise value and the combined model is less interpretable.
2. It is computationally more expensive, because many base models must be trained.
4.8.1 Boosting
• Boosting refers to a family of ensemble algorithms that convert weak learners into a strong learner by training models iteratively : each new model is fitted to a re-weighted version of the training data that emphasizes the examples the previous models got wrong. Whereas bagging mainly decreases variance, boosting mainly decreases bias.
• Boosting was originally developed as a device for proving results in computational learning theory : theorists asked whether a "weak" learning algorithm - one that produces classifiers only slightly better than random guessing, i.e. with accuracy only a little greater than 0.5 - can be boosted into an arbitrarily accurate "strong" learning algorithm. Boosting answers this question constructively, and in hindsight it has also turned out to be a very useful practical technique.
• The boosting algorithm repeatedly calls the weak learner, each time feeding it a reweighted version of the training data. At every round the examples that the current classifier gets wrong are given larger weights, so the next weak hypothesis is forced to focus on the hard examples. After T rounds the weak hypotheses h1, …, hT are combined into a single final classifier H, typically by a weighted (confidence-rated) majority vote, which hopefully is much more accurate than any single weak classifier.
• AdaBoost is the best-known boosting algorithm. In practice it is often run with small decision trees - trees with only a few terminal nodes, capturing low-order interactions ("rules of thumb") - as the weak learners; the resulting additive model frequently improves generalization performance and can be viewed as a form of regularization, since a complex classifier is built out of many simple ones.
• Fig. 4.8.1 shows the scheme : at each round a weighted training sample is passed to the weak learner, the learned hypothesis ht is added to the ensemble, and the sample weights are revised before the next round.
• Steps of AdaBoost (a simple three-learner variant) :
1. Draw a random subset of training samples d1 without replacement from the training set D and train a weak learner C1 on it.
2. Draw a second random training subset d2 without replacement and add to it half of the samples that were previously falsely classified by C1; train a weak learner C2 on this set.
3. Find the training samples d3 in D on which C1 and C2 disagree, and train a third weak learner C3 on them.
4. Combine the weak learners C1, C2 and C3 via majority voting.
• Advantages of Boosting :
1. It is easy to implement and can be combined with many different weak learners.
2. It supports different loss functions.
3. It works well with interactions and usually achieves high accuracy.
• Disadvantages of Boosting :
1. Boosting is sensitive to outliers and noisy data, because misclassified examples keep receiving larger weights.
2. It may converge to a suboptimal solution, and because the learners are trained sequentially it is hard to parallelize.
(The reweighting idea behind the general AdaBoost algorithm is sketched below.)
4.9 VC Dimension

• By combining many weak hypotheses trained on reweighted samples, boosting reduces the bias of the combined model; whether such a model will generalize can be analysed in terms of its capacity.
• The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity (complexity, expressive power) of a family of functions f(θ).
• The VC dimension VC(f) of a family of functions is defined as the size of the largest set of points X that can be shattered by members of the family : for every possible way of labelling the points, some function in the family assigns exactly those labels. If arbitrarily large finite sets can be shattered, the VC dimension is infinite.
• The VC dimension gives a way of reasoning about generalization. If a model explains (fits) the training data and its capacity, measured by the VC dimension, is small compared with the number of training examples, then with high probability it will also work well on unseen data from the same distribution - a "probably approximately correct" style of argument. On the other hand, a family with very high capacity can fit every possible labelling of a large training set, so the fact that it fits the data gives little reason to believe it will generalize.
• The basic argument is that models with small capacity are very unlikely to explain the data by chance : if such a model explains the training set, we expect it to have captured genuine structure.
4.9.1 Shattering and Examples of VC Dimension

• Shattering : a set of N points can be labelled as positive / negative in 2^N possible ways; each labelling is called a dichotomy. A class of functions (hypothesis space) H is said to shatter the set if, for every one of these dichotomies, there exists a hypothesis h in H that is consistent with it, i.e. assigns "+" to every point labelled positive and "−" to every point labelled negative.
• For instance, with N = 1 the possible labelings are {(0), (1)}; with N = 2 they are {(0, 0), (0, 1), (1, 0), (1, 1)}; with N = 3 there are 2³ = 8 labelings : (000), (001), (010), (011), (100), (101), (110), (111). If for every one of these there is a consistent hypothesis in H, the corresponding set of three points is shattered.
• VC dimension : VCdim(H) is the size of the largest finite set of points that can be shattered by H. It is enough that one set of that size can be shattered; not every set of that size has to be shattered. If arbitrarily large sets can be shattered, the VC dimension is infinite.
• Example (axis-aligned rectangles) : consider classifiers that label the inside of an axis-aligned rectangle "+" and the outside "−". A suitably placed set of four points can be shattered, so the VC dimension is at least 4. No set of five points can be shattered : the minimal axis-aligned rectangle containing the five points is determined by at most four extreme (boundary) points, and the labelling that assigns "+" to those boundary points and "−" to a remaining point cannot be realized, because any rectangle containing the four boundary points must also contain the fifth. Therefore the VC dimension of axis-aligned rectangles is 4.
• Neural networks : a feedforward network corresponds to a directed acyclic graph (it contains no cycles); its VC dimension is finite and can be bounded in terms of the number of edges (weights) and units, which relates the size of the network to the amount of data required for generalization.
4.10 Two Marks Questions with Answers

Q.1 What is deep learning ?
Ans. : Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans : learn by example. Deep learning models are used, for instance, to recognize objects in images and to recognize speech.

Q.2 Why are the models called feedforward networks ?
Ans. : These models are called feedforward because information flows from the input x, through the intermediate computations used to define the function f, and finally to the output y. There are no feedback connections in which outputs of the model are fed back into itself.

Q.3 Why are feedforward models called networks ?
Ans. : Because they are typically represented by composing together many different functions. The model is associated with a directed acyclic graph describing how the functions are composed; for example, three functions f(1), f(2) and f(3) may be connected in a chain to form f(x) = f(3)(f(2)(f(1)(x))). Here f(1) is called the first layer, f(2) the second layer, and so on.

Q.4 Define a multilayer (layered) feedforward network.
Ans. : A multilayer network is one whose nodes are partitioned into layers 0, 1, …, L, such that a node in layer l receives its inputs only from nodes in layer l − 1. The function computed at each node is chosen from a fixed class, for example F = { f : f(u) = sgn(wᵀu − θ) }, where w is a weight vector and θ is a threshold. Layer 0 holds the inputs, layer L produces the outputs, and the layers in between are the hidden layers.
Q.5 What is gradient descent ?
Ans. : Gradient descent is a first-order iterative optimization algorithm used to find a local minimum of a function. The algorithm takes steps proportional to the negative of the gradient of the function at the current point.

Q.6 Define the cost function of a neural network.
Ans. : The cost function measures how wrong the network's predictions are on the training set. For a network with weights w and biases b, a common (quadratic) cost is
  C(w, b) = (1/2N) Σx || y(x) − a(x) ||²,
where N is the number of training examples, y(x) is the desired output for input x and a(x) is the output actually produced by the network.

Q.7 What are hidden units ?
Ans. : Hidden units are the units of the layers between the input and the output. A hidden unit accepts a vector of inputs x, computes an affine transformation z = wᵀx + b, and then applies an element-wise nonlinear function g(z).

Q.8 What is a sigmoid unit ? When is it used ?
Ans. : A sigmoid unit is composed of two parts : a linear layer that computes z = wᵀh + b, and the sigmoid activation function that converts z into a probability. Sigmoid output units are used when predicting the value of a binary variable y, i.e. when the output follows a Bernoulli distribution.

Q.9 What is forward propagation ?
Ans. : Forward propagation refers to the flow of information from the input x through the hidden units of each layer, finally producing the network output ŷ; during training it continues until it produces the value of the cost function.

Q.10 What is backpropagation ?
Ans. : The backpropagation algorithm allows the information from the cost to flow backwards through the network in order to compute the gradient of the cost with respect to the weights. It refers only to the method for computing the gradient; another algorithm, such as gradient descent, then uses this gradient to perform the learning.

Q.11 What is regularization ?
Ans. : Regularization is a set of techniques that prevent overfitting in neural networks during supervised learning and thereby improve the performance of the model on unseen data, typically by lowering the effective complexity of the model.
Q.12 What is Lasso (L1) regression ?
Ans. : Lasso regression is a regularization technique that adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function; that is, the L1 regularizer penalizes the sum of the absolute values of the weights.

Q.13 What is the difference between L1 and L2 regularization ?
Ans. :
1. L1 penalizes the sum of the absolute values of the weights; L2 penalizes the sum of the squared values of the weights.
2. L1 has a sparse solution; L2 has a non-sparse solution.
3. L1 has built-in feature selection; L2 has no feature selection.
4. L1 is robust to outliers; L2 is not robust to outliers.
5. L1 can have multiple solutions; L2 has one solution.
6. L1 generates simple, interpretable models but cannot learn complex patterns; L2 is able to learn complex data patterns.
(A small numerical sketch of the two penalties is given after Q.15.)

Q.14 What is data augmentation ?
Ans. : Data augmentation is the process of artificially increasing the amount and diversity of training data by transforming the data we already have (for images : rotation, cropping, flipping and so on). Deep learning models often need thousands or millions of examples; when collecting that much new data is not feasible, data augmentation comes to the rescue. It is an integral process in deep learning, especially for image data.

Q.15 Discuss early stopping.
Ans. : Early stopping is a regularization method used when a model is trained with an iterative method such as gradient descent. A portion of the training data is held out as a validation set, and training is stopped when the error on this set stops improving and begins to rise, i.e. before the training process has fully converged. By limiting the number of iterations, early stopping keeps the weights close to their initial values and thereby controls overfitting.
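A small numerical sketch of the two penalty terms discussed in Q.12 and Q.13 (my own illustration; the data-loss value is arbitrary) :

    import numpy as np

    def l1_penalty(w, lam=0.01):
        # Lasso-style term: sum of absolute values of the weights
        return lam * np.sum(np.abs(w))

    def l2_penalty(w, lam=0.01):
        # Ridge / weight-decay term: sum of squared weights
        return lam * np.sum(w ** 2)

    w = np.array([0.5, -0.2, 0.0, 1.5])
    data_loss = 0.37                      # whatever loss the model produces on the data
    print(data_loss + l1_penalty(w))      # regularized objective with L1
    print(data_loss + l2_penalty(w))      # regularized objective with L2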
UNIT V

Recurrent Neural Networks (5 - 1)

Syllabus
Recurrent Neural Networks : Introduction - Recursive Neural Networks - Bidirectional RNNs - Deep Recurrent Networks - Applications : Image Generation, Image Compression, Natural Language Processing - Complete Auto encoder, Regularized Autoencoder, Stochastic Encoders and Decoders, Contractive Encoders.

Contents
5.1 Introduction of Recurrent Neural Networks
5.2 Recursive Neural Networks
5.3 Bidirectional RNNs
5.4 Deep Recurrent Networks
5.5 Applications : Natural Language Processing
5.6 Complete Auto Encoders
5.7 Regularized Autoencoders
5.8 Stochastic Encoders and Decoders
5.9 Contractive Encoders
5.10 Two Marks Questions with Answers
5.1 Introduction of Recurrent Neural Networks

• A recurrent neural network (RNN) is a class of artificial neural networks specialized for processing sequential data x(1), …, x(τ). Just as convolutional networks are specialized for grid-like data such as images, recurrent networks are specialized for sequences of values.
• Unlike standard feedforward networks, in which information flows only one way, RNNs contain connections that form directed cycles : the state computed at the previous time step is fed back and used, together with the current input, to compute the new hidden state. The hidden nodes therefore act as a "memory" that captures information about what has been calculated so far in the sequence.
• The same parameters (weights) are shared across every time step, so an RNN can process sequences of arbitrary length and can generalize what it learns at one position of the sequence to other positions.
• While a feedforward network maps one input to one output, an RNN makes its decision for the current element of the sequence by considering both the current input and the hidden state produced from the previous inputs. This allows the network to capture the dynamic, historical characteristics of sequential data such as time series, text and speech.
• Fig. 5.1.1 shows the architecture : an input layer x, a hidden (recurrent) layer whose nodes have connections to themselves across time steps, and an output layer producing the classes. The network can be thought of as the same set of nodes unrolled once per time step, each copy passing its hidden state to the next. (A minimal sketch of this computation is given below.)
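The following minimal sketch (my own, using tanh as the hidden nonlinearity) shows the recurrence just described : the same weight matrices are reused at every time step and the hidden state carries information forward.

    import numpy as np

    def rnn_forward(xs, Wxh, Whh, Why, bh, by):
        """Run a plain RNN over the list of input vectors xs and return the outputs."""
        h = np.zeros(Whh.shape[0])                      # initial hidden state ("memory")
        outputs = []
        for x in xs:                                    # same weights reused at every step
            h = np.tanh(Wxh @ x + Whh @ h + bh)         # new state from current input + old state
            outputs.append(Why @ h + by)                # output computed from the hidden state
        return outputs, h

    rng = np.random.default_rng(0)
    Wxh, Whh, Why = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), rng.normal(size=(2, 5))
    bh, by = np.zeros(5), np.zeros(2)
    xs = [rng.normal(size=3) for _ in range(4)]         # a toy sequence of 4 input vectors
    outputs, final_state = rnn_forward(xs, Wxh, Whh, Why, bh, by)
    print(len(outputs), final_state.shape)              # 4 outputs, hidden state of size 5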
• Two properties make recurrent networks powerful :
1. a distributed hidden state, which allows them to store a lot of information about the past efficiently; and
2. non-linear dynamics, which allow them to update that hidden state in complicated ways.
With enough neurons and enough time, an RNN can in principle compute anything that can be computed by a computer.
• The hidden state is what lets the model use context. For example, to predict the next word in a sentence, information about the previous words is needed; the RNN passes this information from one time step to the next through the hidden state, so the prediction at step t can depend on words seen much earlier.
• Training RNNs, however, faces two major obstacles, both connected with how gradients propagate backwards through many time steps (equivalently, through the many adjacent layers of the unrolled network) :
a) Exploding gradients : the gradient - the slope of the loss with respect to the weights - grows exponentially as it is propagated back through time, so the algorithm assigns extremely high importance to the weights and the updates become unstable. Exploding gradients can be tackled by clipping the gradient when it exceeds a threshold, by truncated backpropagation through time (truncated BPTT), or by adapting the learning rate (e.g. with RMSprop).
b) Vanishing gradients : the gradient values become too small - the slope approaches zero - so the weights of the earlier time steps barely change; the model stops learning, or learns extremely slowly, and information from the distant past is effectively lost. Vanishing gradients are harder to overcome than exploding gradients; they are addressed with gated architectures such as LSTM and GRU, with ReLU activation functions, and with careful weight initialization.
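A minimal sketch of gradient clipping by global norm, as mentioned for exploding gradients (my own illustration) :

    import numpy as np

    def clip_gradients(grads, threshold=5.0):
        """Rescale the whole list of gradient arrays if their global norm exceeds threshold."""
        total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
        if total_norm > threshold:
            scale = threshold / total_norm
            grads = [g * scale for g in grads]     # direction kept, magnitude limited
        return grads

    grads = [np.array([30.0, -40.0]), np.array([5.0])]   # an "exploding" gradient
    clipped = clip_gradients(grads, threshold=5.0)
    print(clipped, np.sqrt(sum(np.sum(g ** 2) for g in clipped)))   # norm is now 5.0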
5.1.2 Types of RNN

Recurrent architectures are usually grouped by the number of inputs and outputs they map between; Fig. 5.1.2 (a) to (d) show the four cases.

1. One-to-one : a single input is associated with a single output, with fixed input and output sizes. This is the traditional ("vanilla") neural network, used for example for image classification. (Fig. 5.1.2 (a))
2. One-to-many : a single input produces several outputs. A famous example is image captioning, where a single image is used to generate a sentence of several words; music generation is another example. (Fig. 5.1.2 (b))
3. Many-to-one : several inputs - for example the word tokens of a movie review - are used to produce a single output, such as the review's sentiment category. Sentiment analysis is the classic example. (Fig. 5.1.2 (c))
4. Many-to-many : several inputs produce several outputs, as in machine translation or named-entity recognition, where each element of the input sequence contributes to an output sequence. (Fig. 5.1.2 (d))
5.1.3 Advantages and Disadvantages of RNN

• Advantages :
a) An RNN can process an input of any length, and the model size does not increase with the size of the input.
b) The weights are shared across the time steps.
c) An RNN remembers information throughout time using its internal memory, which is helpful in time-series prediction.
d) An RNN can be combined with convolutional layers (CNN) to extend the effective pixel neighbourhood when processing images and video.
• Disadvantages :
a) Training an RNN is slow, and the computation for long sequences is difficult.
b) It is prone to the vanishing and exploding gradient problems.
c) It has difficulty remembering information from many time steps ago when the sequences become very long.
• Difference between RNN and CNN :
1. RNN is suitable for sequential data, while CNN is suitable for spatial data such as images.
2. RNN can handle inputs and outputs of arbitrary length, while CNN has finite, fixed-size inputs and outputs.
3. RNN uses its internal memory and feedback loops to model dependence across time, while CNN is a simple feedforward network with no internal state.
4. RNN is used primarily for text, speech and time-series analysis, while CNN is used primarily for image and video processing.
5. RNNs come in several types (one-to-one, one-to-many, many-to-one, many-to-many), while a CNN maps a fixed-size input to a fixed-size output.
5.1.4 Difference between RNN and Feed-forward Neural Network

• In a feed-forward neural network, information moves in only one direction : from the input layer, through the hidden layers, to the output layer. Such a network has no memory of the inputs it received previously and no notion of order in time; when it makes a decision it considers only the current input and what it learned during training, so it is bad at predicting what comes next in a sequence.
• In a recurrent neural network, information cycles through a loop. When the network makes a decision, it considers the current input and also what it has learned from the inputs it received previously; this is what allows it to remember the past and model sequences.
• Fig. 5.1.3 shows the two cases : a recurrent network with a feedback loop and a feed-forward network without one.

5.2 Recursive Neural Networks

• A recursive neural network (RvNN) is a non-linear adaptive model that can learn deep, structured information. Instead of processing a flat sequence, it processes data structured as a tree (for example the parse tree of a sentence), applying the same set of weights recursively at every node of the structure. Because the weights are shared and the learned representation has a constant dimensionality at every node, representations of words can be combined into representations of phrases and whole sentences.
• Recursive networks have been used effectively for structure prediction in natural language processing - learning tree structures, word embeddings and sentence representations - and also in computer vision, for example for parsing natural scenes.
• Recursive neural networks operate on tree-like structures : the vector representations of two child nodes (for example two words or phrases) are combined by the network into a representation of their parent, and this composition is applied repeatedly until a single vector represents the whole sentence or scene. Fig. 5.2.1 shows such a tree, with the parent representation computed from its two children.
• The word (token) vectors used at the leaves are low-dimensional, real-valued dense vectors learned from a large text corpus; they capture syntactic and semantic information about the words, which is necessary for capturing the compositional meaning of longer phrases.
• Recursive models have been applied successfully to a variety of NLP tasks, such as parsing natural-language sentences and learning phrase- and sentence-level representations. They can be trained with supervised, semi-supervised or unsupervised objectives, and some variants are formulated as probabilistic, generative models; deep recursive nets combine more than one recursive layer.

5.3 Bidirectional RNNs

• In a standard (unidirectional) RNN the hidden state at time t summarizes only the inputs seen so far, from the start of the sequence up to t; the model cannot use information that appears later in the sequence. For many tasks, however, a reliable prediction requires looking at both the past and the future context.
• A bidirectional RNN (BRNN) therefore combines two RNNs : one processes the input sequence in the forward direction (from start to end) and the other processes it in the opposite direction (from end to start). The two hidden layers are independent - they do not interact with each other directly - and their outputs are combined to produce the final output, so that at every position the network has aggregated information from both directions.
• Because the two directional layers are independent, back-propagation cannot update the input and output layers at once : the forward and backward passes over the whole sequence must be completed before the output and the weight updates can be computed.
• Working of a bidirectional RNN : it consists of a forward RNN and a backward RNN applied to the same input sequence.
1. The forward RNN processes the inputs in their original order (left to right) and produces a forward hidden state at each time step.
2. The backward RNN takes the same inputs in reverse order (right to left) and produces a backward hidden state at each step.
3. At each time step the forward and backward hidden states are combined - usually by concatenating the two vectors, although element-wise addition or multiplication can also be used - and the combined state is used to compute the output for that step. The output therefore captures both the left and the right context of the current input.
• The outputs of the two RNNs are not connected to each other's inputs. During training, the complete passes in both directions are performed first, and only then are the output neurons and the weights updated.
• Bidirectional RNNs are useful for tasks such as sequence labelling, speech recognition and sentence classification, where the whole input sequence is available and local context on both sides improves the decision.
• Example : consider the three sentences
1. Apple is my favourite fruit.
2. Apple is my favourite company.
3. I am going to buy an Apple phone.
To decide whether "Apple" refers to the fruit or to the company, the model must take into account the words to the right of "Apple". A traditional left-to-right RNN cannot do this, but a bidirectional RNN, which processes the sentence in both directions, can. Fig. 5.3.1 shows the structure of a bidirectional recurrent network.
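A tiny, self-contained sketch of the combination step described above (my own illustration; `step_fwd` and `step_bwd` stand for any recurrent update functions) :

    import numpy as np

    def bidirectional_states(xs, step_fwd, step_bwd, state_size):
        """Concatenate forward and backward hidden states for every position of xs."""
        h_f = np.zeros(state_size)
        forward = []
        for x in xs:                         # left-to-right pass
            h_f = step_fwd(x, h_f)
            forward.append(h_f)
        h_b = np.zeros(state_size)
        backward = []
        for x in reversed(xs):               # right-to-left pass
            h_b = step_bwd(x, h_b)
            backward.append(h_b)
        backward.reverse()                   # align backward states with the original order
        return [np.concatenate([f, b]) for f, b in zip(forward, backward)]

    step = lambda x, h: np.tanh(0.5 * x + 0.5 * h)     # toy recurrent update
    xs = [np.ones(3), 2 * np.ones(3), 3 * np.ones(3)]
    states = bidirectional_states(xs, step, step, state_size=3)
    print(len(states), states[0].shape)                # 3 positions, each combined state of size 6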
• Need for bidirectional processing : in a unidirectional RNN, information flows only from the left (the past) to the right, so the hidden state at time t depends on the earlier words but not on the words that come after it. For many natural-language processing tasks - including speech recognition, image captioning and next-word prediction - the meaning of a word may be influenced by context on either side, so limiting the computation to a single direction may not give the most accurate prediction. Bidirectional RNNs remove this limitation by using the entire input sequence, capturing contextual information from both the left and the right; this is particularly important when an RNN is used to predict or label each word of a sentence in turn.

5.4 Deep Recurrent Networks

• The computation in most recurrent networks can be decomposed into three blocks of parameters, each with an associated transformation :
1. from the input to the hidden state,
2. from the previous hidden state to the next hidden state,
3. from the hidden state to the output.
• With the usual RNN architecture, each of these three blocks is associated with a single weight matrix. When the network is unfolded in time, each block therefore corresponds to a shallow transformation - one that could be represented by a single layer within a deep MLP, i.e. an affine transformation by a fixed weight matrix followed by a nonlinearity.
• Would it be advantageous to introduce depth in each of these operations ? Experimental evidence strongly suggests so. Fig. 5.4.1 shows the different ways in which an RNN can be made deep, discussed next.
• An RNN can be made deep in three different ways (Fig. 5.4.1) :
1. The hidden recurrent state can be broken down into groups organized hierarchically; the lower levels of the hierarchy can then play the role of transforming the raw input into a representation that is more appropriate for the higher levels of the hidden state.
2. Deeper computation - for example an MLP, possibly with several hidden layers - can be introduced in the input-to-hidden, hidden-to-hidden and hidden-to-output transformations. This increases the representational capacity of the network, but it also lengthens the shortest path between variables at different time steps, which can make optimization more difficult and may hurt learning.
3. The path-lengthening effect can be mitigated by introducing skip connections. For example, if an extra hidden layer is introduced in the state-to-state transition, the shortest path between a variable at time step t and a variable at time step t + 1 is doubled compared with an ordinary RNN; adding skip connections from the state at time t directly to the state at time t + 1 restores the shorter path.
5.5 Applications : Natural Language Processing

• Natural Language Processing (NLP) is an area of Artificial Intelligence (AI) in which computers are used to analyze, understand, interpret and manipulate human language - text written in, or speech spoken in, natural languages such as English or Hindi - and to derive meaning from it.
• A wide range of applications use NLP : machine translation (e.g. Google Translate), voice assistants (e.g. Alexa, Siri), search engines, speech recognition, text summarization, automatic text generation, Named Entity Recognition (NER), dialogue systems and sentiment analysis.
• An NLP system has two main phases :
1. Data preprocessing : the raw text is cleaned and made suitable for the algorithm, using steps such as tokenization, stop-word removal, stemming and lemmatization.
2. Algorithm development : the preprocessed data are processed by an NLP algorithm. Algorithms are of two main types - rule-based systems, which use carefully designed linguistic rules, and machine-learning-based systems, which use statistical methods and learn from training data.
5.5.1 Deep Learning for NLP
• Traditional machine learning methods for NLP (e.g. SVM or logistic regression models) rely on hand-crafted features; such feature representations are high-dimensional and sparse, which leads to the curse of dimensionality, and designing the features requires considerable effort.
• Deep learning instead learns distributed, low-dimensional (dense) representations of words - word embeddings - and learns feature representations automatically, at multiple levels of abstraction, directly from the data; representations learned for one task can be shared across related tasks. Repeated comparisons have highlighted the superior performance of such neural models over traditional methods across many NLP tasks.
5.5.2 Convolutional Neural Network (CNN) based Framework for NLP

• Fig. 5.5.1 shows a CNN-based framework for sentence modelling and classification. The layers are : input sentence → lookup table (word embeddings) → convolution layer → max-pooling over time → fully connected layer → softmax output.
• The input sentence is tokenized into words w0, w1, …, wN−1. Each word is transformed, through a look-up table (embedding layer) whose weights are learned during training, into a word-embedding vector of dimension d. Word embeddings are dense, low-dimensional vectors that carry contextual and semantic information and greatly reduce the dimensionality of the input representation.
• The embedded words form the input feature matrix. Convolution filters are applied over windows of k consecutive words (n-grams), producing a feature map for each filter; a max-pooling-over-time layer then keeps the strongest response of each filter, so a fixed-length feature vector is produced regardless of the sentence length.
• This feature vector is passed to a fully connected layer and finally to a softmax layer, which outputs the classification result, for example the sentiment of the sentence.
• Drawback : a CNN cannot easily handle long-distance contextual information while preserving the sequential order of the words, so modelling long-range dependencies this way becomes inefficient. Recurrent networks are better suited to this, which leads to the RNN-based framework described next.
5.5.3 Recurrent Neural Network (RNN) based Framework for NLP

• An RNN processes the input text word by word (token by token). The result of the previous computation - the hidden state - is fed back and combined with the current word, so the network memorizes a representation of everything it has read so far. Fig. 5.5.2 shows the framework, with the recurrent unit unfolded over the input sequence.
• Because the recurrent composition is applied recursively to inputs of arbitrary length, an RNN can represent a whole sentence by a fixed-size vector and can capture long-distance dependencies between words - the main advantage over the CNN-based framework, where this is possible only inefficiently.
5.5.4 Applications of NLP
Deep-learning-based NLP is used in many common applications :
1. Machine translation : translating text from one natural language into another (e.g. Google Translate).
2. Sentiment analysis : determining the emotional attitude or polarity of a piece of text (e.g. sentence-level classification of product or movie reviews); companies use it to analyze customer behaviour from reviews and social media.
3. Question answering : automatically answering questions asked in natural language, for example in web search or customer-service systems.
4. Word-level classification and semantic matching, e.g. Named Entity Recognition (NER), which labels each word of a text with the type of entity it refers to.
5. Speech recognition and dictation : converting spoken language into text, used in voice-based user interfaces.
6. Chatbots : many organizations use NLP-powered chatbots to provide customer service automatically.
7. Image captioning : describing an image in natural language by combining visual features with a language model.
encoder encoded to gradients ais he example, featurecalculate
Recurrent
Neural
Networks Siilar Code in
input nodes latentWe
Output The following networks. For
the values.techniques
the of identify
input.
decoder. decodes number :constraints
reconstructs inputs.
autoencoder.
input the
Decoder Reconstructed then descent neural
training The to recreateknowledge
and the somethe itencode
-2 input code then It to feedforward
choice. to
autoencoder f. identical gradient the undernoise
function ,networks to for
representation
decoder
Encoder, training addinginput, up-thrust
training. user outputs
Decoder values
minibatch fully-connected
the a feedforward ofdimensionality
before or the take
? an
TIONS
of :components usingoutput
representation
code, for its space
Compressed
Architecture
5-
15 Code x, used set to feature
- values inputs
latentWe
the create as be we PUBLIC
produces of such can are that its latent
three the the work
Encoder 5.6.1 inputto case hyperparameter
back-propagation
decoder copy
g, network with of TECHNI
special
function Decode
the
dimensionality
the autoencoders
Fig. of and the
Learning
consists ANN to
and learns
2 Original input encodes a neural
Encoder input a encoder
an
usingis An representati
Deep autoencoder
this
using
the code.
autoencoder of a autoencoder
Autoencoders by
feedforward layeris
and compresses f(x), Computed layer does
Vetworks he limiting
values DonSingiecode How
Input An As
Neal "
5.6.1 Architecture of an Autoencoder

• An autoencoder consists of three components : the encoder, the code and the decoder. Fig. 5.6.1 shows the architecture :
  Original input → Encoder → Compressed code (latent representation) → Decoder → Reconstructed output
• The encoder compresses the input x and produces the code h = f(x); the decoder then reconstructs the input from this code only, producing the output r = g(h). The goal is an output as close to the input as possible, so the decoder architecture is typically the mirror image of the encoder.
• Both the encoder and the decoder are usually fully-connected feedforward networks, and the code is a single layer whose dimensionality (the code size) is a hyperparameter chosen before training. The whole system is trained exactly like an ordinary ANN, by back-propagating the reconstruction error, usually with mini-batch gradient descent.
• How does an autoencoder avoid simply copying its input ? Constraints are imposed so that it must learn useful features of the data rather than the identity function. The two most common constraints are :
1. limiting the dimensionality of the code so that it is smaller than the input (an undercomplete autoencoder), which creates a bottleneck; and
2. adding noise to the inputs and training the network to recover the original, noise-free input (a denoising autoencoder).
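The sketch below (my own illustration, NumPy only) shows the smallest possible undercomplete autoencoder of this kind : a linear encoder to a 2-dimensional code and a linear decoder back to the 4-dimensional input, trained by gradient descent on the reconstruction error.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))                 # toy data: 100 examples, 4 features
    W_enc = rng.normal(size=(4, 2)) * 0.1         # encoder weights: 4 -> 2 (the bottleneck code)
    W_dec = rng.normal(size=(2, 4)) * 0.1         # decoder weights: 2 -> 4
    lr = 0.01

    for epoch in range(200):
        code = X @ W_enc                          # encoder: compressed latent representation
        recon = code @ W_dec                      # decoder: reconstruction of the input
        err = recon - X                           # reconstruction error (target = input itself)
        # gradients of the mean squared reconstruction error
        grad_dec = code.T @ err / len(X)
        grad_enc = X.T @ (err @ W_dec.T) / len(X)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc

    print(np.mean(err ** 2))                      # reconstruction error decreases during training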
5.6.2 Hyperparameters and Types of Autoencoders

• Four hyperparameters must be set before training an autoencoder :
1. Code size : the number of nodes in the middle (bottleneck) layer; a smaller code size gives more compression.
2. Number of layers : the autoencoder can be as deep as we like, with multiple layers in both the encoder and the decoder.
3. Number of nodes per layer : the number of nodes typically decreases with each successive layer of the encoder and then increases again in the decoder, which is usually built symmetric to the encoder.
4. Loss function : mean squared error or binary cross-entropy is used to compare the output with the input; cross-entropy is used when the input values are in the range [0, 1], otherwise the mean squared error is used. The reconstruction error is propagated backwards and the weights are updated by backpropagation, as in any other feedforward network.
• Different types of autoencoders :
1. Undercomplete autoencoders  2. Sparse autoencoders  3. Contractive autoencoders  4. Denoising autoencoders  5. Variational autoencoders.
Several autoencoders can also be trained and stacked on top of one another to form a deep (stacked) autoencoder, which learns increasingly abstract features.
• Applications of autoencoders :
1. Dimensionality reduction : the encoded code is a reduced-dimension representation of the input; because the encoder and decoder are non-linear, an autoencoder is a more powerful generalization of linear methods such as PCA.
2. Feature extraction : the encoder learns the important hidden features present in the input and can be used to generate new, more useful features for a later task such as image recognition.
3. Image denoising and image compression, discussed in the next section.
4. Recommendation engines : based on the items a user has liked (for example movies or books), the learned latent representation helps identify new items to recommend.
Recurrent
Neural
Networks
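• As an illustration of the last two hyperparameters (the loss choice and the weight update by backpropagating the reconstruction error), the following minimal sketch performs one training step; the sizes, learning rate and dummy data are assumptions made only for illustration.

```python
# One training step showing the reconstruction loss and weight update (illustrative sketch).
import torch
import torch.nn as nn

# A tiny autoencoder: 784 -> 32 (code) -> 784; sizes are assumptions for illustration.
model = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(),     # encoder
    nn.Linear(32, 784), nn.Sigmoid())  # decoder (Sigmoid because inputs are in [0, 1])

criterion = nn.BCELoss()               # inputs in [0, 1] -> binary cross-entropy
# criterion = nn.MSELoss()             # for real-valued inputs, use mean squared error instead
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 784)                # dummy batch of inputs scaled to [0, 1]
x_hat = model(x)
loss = criterion(x_hat, x)             # reconstruction error: compare output with the input
optimizer.zero_grad()
loss.backward()                        # backpropagate the reconstruction error
optimizer.step()                       # update the encoder and decoder weights
```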
5.6.2 Image Denoising and Anomaly Detection

• Two common applications that rely on the compressed, dataset-specific representation learned by an autoencoder are image denoising and anomaly detection.
1. Image denoising : The autoencoder is trained with a noisy image as input and the original image as the desired output. Because the data has to pass through the bottleneck and is then decompressed, the network cannot simply copy the input; it is forced to extract only the relevant information, so the reconstructed output is a denoised version of the image.
2. Anomaly detection : An autoencoder trained on a specific dataset "sees" only data with those attributes and learns to reconstruct it with a low reconstruction loss. When an unusual (anomalous) input is fed to the network, the reconstruction loss is very high, so anomalies can be identified by putting a threshold on the reconstruction error. Such autoencoders are used for powering anomaly detection systems and image search systems.
• Since the learned representation is specific to the training dataset, it is difficult to use the same autoencoder for data with very different attributes; accurate reconstruction can be expected only for inputs that are close to the training data.

5.6.3 Undercomplete Autoencoders

• An undercomplete autoencoder is an autoencoder whose code (latent) dimension is less than the input dimension. The compressed code thus forms a bottleneck, and the network is forced to learn a compressed representation of the data.
• The compression performed by an undercomplete autoencoder is specific to the data it was trained on, so it is not a good substitute for general-purpose compression methods; its primary use is representation learning and dimensionality reduction.
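• The thresholding idea for anomaly detection can be sketched as follows; the model, the 3-sigma threshold rule and the dummy data are illustrative assumptions rather than a prescribed recipe.

```python
# Anomaly detection by thresholding the reconstruction error (illustrative sketch).
import torch
import torch.nn as nn

def reconstruction_error(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-sample mean squared reconstruction error."""
    with torch.no_grad():
        x_hat = model(x)
    return ((x - x_hat) ** 2).mean(dim=1)

# Stand-in for an autoencoder that would be trained on normal data only.
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784), nn.Sigmoid())

normal_data = torch.rand(1000, 784)          # stand-in for the (normal) training data
new_batch = torch.rand(8, 784)               # new inputs to be checked

errors_on_normal = reconstruction_error(model, normal_data)
threshold = errors_on_normal.mean() + 3 * errors_on_normal.std()   # one simple choice

is_anomaly = reconstruction_error(model, new_batch) > threshold
print(is_anomaly)                            # True where the reconstruction loss is unusually high
```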
• An undercomplete autoencoder has no explicit regularization term : we simply train the model according to the reconstruction loss L(x, g(f(x))), where f is the encoder and g is the decoder. The loss function is typically the mean squared error between the input x and its reconstruction.
• Fig. 5.6.2 shows an undercomplete autoencoder, with an input layer, hidden layers that form the encoder (f(x)) and the decoder (g(f(x))), and an output layer of the same size as the input.

Fig. 5.6.2 : Undercomplete autoencoder

• When the decoder is linear and L is the mean squared error, an undercomplete autoencoder learns to span the same subspace as Principal Component Analysis (PCA). In this case, the autoencoder trained to perform the copying task has learned dimensionality reduction as a side-effect.
• Autoencoders with non-linear encoder and decoder functions can learn a more powerful non-linear generalization of PCA. Thus, an undercomplete autoencoder can be thought of as a non-linear form of dimensionality reduction : instead of simply memorizing the training data, it is forced to capture the most salient features and relationships in the data.
• Effectively, if the data lies on or near a lower-dimensional manifold rather than a linear hyperplane, a non-linear undercomplete autoencoder can represent the higher-dimensional data in this lower-dimensional space without losing important information, which linear methods like PCA cannot do as well.
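• To make the PCA connection concrete, the following sketch trains a purely linear undercomplete autoencoder with the mean squared error loss and compares its reconstruction error with a rank-2 PCA reconstruction; the data, sizes and optimizer settings are assumptions made only for illustration.

```python
# A linear undercomplete autoencoder trained with MSE spans the PCA subspace (sketch).
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2000, 10) @ torch.randn(10, 10)   # correlated 10-dimensional dummy data
x = x - x.mean(dim=0)                             # center the data, as PCA does

encoder = nn.Linear(10, 2, bias=False)            # code dimension 2 < input dimension 10
decoder = nn.Linear(2, 10, bias=False)            # linear decoder
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

for step in range(2000):                          # minimize the reconstruction error
    x_hat = decoder(encoder(x))
    loss = ((x - x_hat) ** 2).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# Compare with the top-2 PCA reconstruction error.
U, S, V = torch.pca_lowrank(x, q=2)
x_pca = (x @ V) @ V.T
print(loss.item(), ((x - x_pca) ** 2).mean().item())   # the two errors come out close
```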
• Learning an undercomplete representation works well only if the capacity of the encoder and decoder is limited. If the encoder and decoder are given too much capacity, the autoencoder can learn to perform the copying task without extracting any useful information about the distribution of the data; it then fails to learn anything useful and simply memorizes the inputs.
• A similar problem occurs if the hidden code is allowed a dimension equal to the input, and in the overcomplete case in which the hidden code has a dimension greater than the input. In these cases even a linear encoder and linear decoder can learn to copy the input to the output (the trivial identity function) without learning anything useful about the data distribution.
• Ideally, one would like to train any architecture of autoencoder successfully by choosing the code dimension and the capacity of the encoder and decoder based on the complexity of the distribution to be modeled. Regularized autoencoders provide the ability to do so : instead of limiting the model capacity by keeping the encoder and decoder shallow and the code size small, regularized autoencoders use a loss function that encourages the model to have other properties besides the ability to copy its input to its output.

5.7 Regularized Autoencoder

5.7.1 Sparse Autoencoders

• A sparse autoencoder is a regularized autoencoder whose training criterion involves a sparsity penalty Ω(h) on the code layer h in addition to the reconstruction error : L(x, g(f(x))) + Ω(h).
• A sparse autoencoder may have a number of hidden nodes greater than the number of input nodes, but only a small number of the hidden nodes are allowed to be active at once. This sparsity constraint is introduced on the hidden layer, for example by keeping only the strongest hidden unit activations and zeroing the others, or by comparing the probability of a unit being active with a small desired value and penalizing the difference.
• The reconstruction x̂ is still obtained by comparing the decoder output with the input; even with a large hidden layer, the sparsity constraint prevents the autoencoder from learning the trivial identity function, so it is forced to capture the most salient features of the data.
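• One simple way to impose the sparsity penalty Ω(h) is an L1 penalty on the code activations. The sketch below adds such a term to the reconstruction loss; the penalty weight and layer sizes are illustrative assumptions.

```python
# Sparse autoencoder: reconstruction loss plus an L1 sparsity penalty on the code (sketch).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU())   # hidden layer larger than the input code needs
decoder = nn.Sequential(nn.Linear(256, 784), nn.Sigmoid())
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
sparsity_weight = 1e-3                                     # weight of the penalty term (assumed)

x = torch.rand(64, 784)                                    # dummy batch
h = encoder(x)                                             # code activations h = f(x)
x_hat = decoder(h)

reconstruction = ((x - x_hat) ** 2).mean()                 # L(x, g(f(x)))
sparsity_penalty = h.abs().mean()                          # Omega(h): pushes activations toward zero
loss = reconstruction + sparsity_weight * sparsity_penalty

optimizer.zero_grad()
loss.backward()
optimizer.step()
```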
• Fig. 5.7.1 shows a simple single-layer sparse autoencoder. The sparsity penalty is applied to the hidden layer in addition to the reconstruction error : it forces the individual hidden nodes to have activation values close to zero, so that for a given input only the nodes with the highest activations remain active and the rest are forced towards zero.

Fig. 5.7.1 : Simple single-layer sparse autoencoder

• Advantages : The sparsity penalty prevents an overcomplete autoencoder from simply copying the input through the network, and keeping only a few nodes active forces the network to learn the essential features of the data; this reduces the risk of overfitting.
• Disadvantages : The desired level of sparsity is an additional hyperparameter that must be chosen, and training becomes harder if it is set too aggressively.

5.7.2 Denoising Autoencoders (DAE)

• Denoising autoencoders are another type of regularized autoencoder. Denoising refers to intentionally adding noise to the raw input before providing it to the network; this corruption can be achieved using a stochastic mapping, for example by randomly setting some of the input values to zero.
• The denoising autoencoder is then trained to recover the original, noise-free input from its corrupted version. It cannot simply copy the input to its output, because the input also contains random noise; we are essentially asking it to subtract the noise and produce the meaningful underlying data. In this way the autoencoder is forced to learn useful features and an intelligent representation of the data rather than the identity function.
• A denoising autoencoder is therefore a stochastic version of the standard autoencoder, and the mapping it learns is dependent on the data itself rather than on externally provided labels.
• Fig. 5.7.2 shows a denoising autoencoder : the corrupted input is fed to the encoder, the decoder attempts to reconstruct the original input, and the loss is measured against the clean data.

Fig. 5.7.2 : Denoising autoencoder

• If an autoencoder has more hidden layers and nodes than it needs, there is a risk that it simply learns the identity function, where the output equals the input, and the autoencoder becomes useless. Denoising autoencoders remove this risk : by randomly corrupting the input (introducing noise) and asking the network to reconstruct the original, they
1. Reduce the risk of learning the identity function, and
2. Prevent overfitting, since the network can no longer simply memorize or copy the training inputs.
• During training, a denoising autoencoder learns more refined and robust filters than a standard autoencoder, and the learned representation is more robust to noise in the input. Although no external labels are needed, training resembles a supervised setup in which the corrupted input must be mapped to the desired clean output.
• Denoising is generally used for tasks such as removing noise from images and speech, for data compression and dimensionality reduction, and it also helps in anomaly detection. While removing noise directly from an image seems difficult, the autoencoder performs this by mapping the input data into a lower-dimensional manifold (like other non-linear autoencoders), where filtering out the noise becomes much easier.
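• The corruption and training procedure described above can be sketched as follows; the masking probability of about 30 % and the layer sizes are assumptions chosen only for illustration.

```python
# Denoising autoencoder training step: corrupt the input, reconstruct the clean input (sketch).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),     # encoder
    nn.Linear(64, 784), nn.Sigmoid())  # decoder
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x_clean = torch.rand(64, 784)                       # dummy clean batch
mask = (torch.rand_like(x_clean) > 0.3).float()     # stochastic mapping: zero ~30% of the values
x_noisy = x_clean * mask                            # corrupted input fed to the network

x_hat = model(x_noisy)
loss = criterion(x_hat, x_clean)                    # target is the original, noise-free input
optimizer.zero_grad()
loss.backward()
optimizer.step()
```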
5.8 Stochastic Encoders and Decoders

• Autoencoders are just feedforward networks, so the same loss functions and output unit types that can be used for traditional feedforward networks can also be used for autoencoders.
• A general strategy for designing the output units and the loss function of a feedforward network is to define an output distribution p(y | x) and minimize the negative log-likelihood −log p(y | x), where y is a vector of targets such as class labels.
• In an autoencoder, x is now the target as well as the input. Given a hidden code h, we may think of the decoder as providing a conditional distribution p_decoder(x | h), and we can train the autoencoder by minimizing −log p_decoder(x | h). The exact form of this loss function changes depending on the form of p_decoder :
1. For real-valued x, linear output units are used to parameterize the mean of a Gaussian distribution; minimizing the negative log-likelihood then yields the mean squared error criterion.
2. Binary x values correspond to a Bernoulli distribution whose parameters are given by a sigmoid output unit.
3. Discrete x values correspond to a softmax distribution, and so on.
• The output variables are treated as being conditionally independent given h, so that this probability distribution is inexpensive to evaluate.
• We can also generalize the notion of an encoding function f(x) to an encoding distribution p_encoder(h | x). Fig. 5.8.1 shows the structure of a stochastic autoencoder, in which both the encoder and the decoder involve some noise injection rather than a simple deterministic function : their outputs are sampled from the distributions p_encoder(h | x) for the encoder and p_decoder(x | h) for the decoder.

Fig. 5.8.1 : Structure of a stochastic autoencoder

• Any latent variable model p_model(h, x) defines a stochastic encoder p_encoder(h | x) = p_model(h | x) and a stochastic decoder p_decoder(x | h) = p_model(x | h).
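• For binary-valued x, minimizing the Bernoulli negative log-likelihood with a sigmoid output unit is exactly the binary cross-entropy loss, as the following minimal sketch shows; the layer sizes and dummy data are illustrative assumptions.

```python
# Negative log-likelihood of a Bernoulli decoder = binary cross-entropy (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(784, 32)
decoder = nn.Linear(32, 784)             # produces the logits of the Bernoulli parameters

x = (torch.rand(8, 784) > 0.5).float()   # dummy binary inputs
h = torch.relu(encoder(x))               # code h = f(x)
logits = decoder(h)                      # sigmoid(logits) parameterizes p_decoder(x | h)

# -log p_decoder(x | h) for conditionally independent Bernoulli outputs:
nll = F.binary_cross_entropy_with_logits(logits, x, reduction='mean')
print(nll.item())
```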
5.9 Contractive Autoencoders

• Similar to other regularized autoencoders, a contractive autoencoder (CAE) adds an explicit regularizer (penalty term) to the reconstruction loss. The penalty encourages the derivatives of the encoder code h = f(x) with respect to the input to be as small as possible, so that the learned representation is less sensitive to small variations in the training instances, i.e. the model becomes robust to slight perturbations of its input.
• This is achieved by penalizing the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input. The loss function of a contractive autoencoder has the form

    Loss = ||x − g(f(x))||² + λ ||J_f(x)||²_F

  where r = g(f(x)) is the reconstruction, J_f(x) is the Jacobian matrix of the hidden (code) activations with respect to the input x, and λ controls the strength of the penalty.
• The Frobenius norm of the Jacobian is calculated as the sum of the squares of all of its elements, i.e. of the partial derivatives of every hidden activation with respect to every input value. If this term is zero, then changing the input values slightly produces no change at all in the learned encodings; a large value means that the encoding is very sensitive to the input.
• By penalizing this term, the model is forced to learn an encoding in which similar inputs, i.e. inputs within a small neighborhood of each other, are mapped to similar encodings : the neighborhood of inputs is contracted to a smaller neighborhood of outputs. Fig. 5.9.1 and Fig. 5.9.2 indicate this in terms of the slope dh(x)/dx of the encoding function : instead of a linear identity mapping (which would give perfect reconstruction), the encoding is encouraged to be locally constant, i.e. to have a small slope in the neighborhood of the training observations.

Fig. 5.9.1 and Fig. 5.9.2 : Effect of the contractive penalty on the learned encoding

• The CAE is trained to resist infinitesimal perturbations of the input, while a denoising autoencoder is trained so that the reconstruction function resists small but finite-sized perturbations (such as added Gaussian noise) of the input. For feature extraction, the contractive penalty often surpasses the results obtained by regularizing autoencoders with weight decay or by denoising.
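• The Frobenius norm of the encoder Jacobian can be computed with automatic differentiation. The following sketch adds the contractive penalty for one small batch; the layer sizes and the value of λ are illustrative assumptions.

```python
# Contractive autoencoder: reconstruction loss + Frobenius norm of the encoder Jacobian (sketch).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(20, 5), nn.Sigmoid())   # small sizes keep the Jacobian cheap
decoder = nn.Linear(5, 20)
lam = 1e-2                                                 # penalty weight lambda (assumed)

x = torch.rand(8, 20, requires_grad=True)                  # track gradients w.r.t. the input
h = encoder(x)                                             # code activations h = f(x)
x_hat = decoder(h)
reconstruction = ((x - x_hat) ** 2).mean()

# ||J_f(x)||_F^2 = sum over code units i of ||d h_i / d x||^2, averaged over the batch.
frob_sq = 0.0
for i in range(h.shape[1]):
    grads = torch.autograd.grad(h[:, i].sum(), x, create_graph=True)[0]
    frob_sq = frob_sq + (grads ** 2).sum(dim=1)
contractive_penalty = frob_sq.mean()

loss = reconstruction + lam * contractive_penalty
loss.backward()                                            # gradients now include the penalty term
```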
• Loss function : The regularization term added to the reconstruction loss is the squared Frobenius norm of the Jacobian of the hidden layer activations with respect to the input :

    L = ||x − x̂||² + λ Σ_i ||∇_x a_i^(h)(x)||²                         ...(5.9.1)

  where a_i^(h)(x) is the value of the i-th hidden (code) unit for input x, and ∇_x a_i^(h)(x) is the vector of its partial derivatives with respect to the input values.
• The Frobenius norm of a matrix A is calculated as the square root of the sum of the squares of all of its elements :

    ||A||_F = sqrt( Σ_i Σ_j |a_ij|² )                                   ...(5.9.2)

• The penalty in equation (5.9.1) is therefore the squared Frobenius norm of the Jacobian matrix of the hidden layer, a vector-valued function of the input :

    J = [ ∂a_1^(h)(x)/∂x_1   ...   ∂a_1^(h)(x)/∂x_m
          ...
          ∂a_n^(h)(x)/∂x_1   ...   ∂a_n^(h)(x)/∂x_m ]                   ...(5.9.3)

  where, n : number of hidden layer units, m : number of input values.

5.10 Two Marks Questions with Answers

Q.1 What are recurrent neural networks ?
Ans. : Recurrent neural networks (RNNs) are a class of neural networks specialized for processing sequential data. Just as convolutional networks are specialized for processing a grid of values X such as an image, recurrent neural networks are specialized for processing a sequence of values x(1), ..., x(t).

Q.2 Why are recurrent neural networks called recurrent ?
Ans. : They are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations. RNNs have a "memory" which captures information about what has been calculated so far.
Q.3 Explain the advantages and disadvantages of RNN.
Ans. :
Advantages of RNN :
1. An RNN can process inputs of any (arbitrary) length.
2. The model size does not increase with the size of the input.
3. An RNN remembers information throughout time, which is helpful in any time series predictor.
4. The weights are shared across the time steps.
Disadvantages of RNN :
1. Due to its recurrent nature, the computation is slow.
2. Training RNN models can be difficult.
3. RNNs are prone to problems such as exploding and vanishing gradients.
4. They cannot process very long sequences easily if tanh or ReLU is used as the activation function.

Q.4 What is an echo state network ?
Ans. : An echo state network is a recurrent neural network with a large, sparsely connected hidden layer whose connectivity and weights are fixed and randomly assigned. The input signal drives the non-linear responses of these hidden neurons, and the desired output is obtained from a trainable linear combination of the response signals, which is the only part learned by supervised training. The fixed recurrent hidden layer is referred to as the "reservoir".

Q.5 Explain the main idea of an autoencoder.
Ans. : The main idea of an autoencoder is to compress the input into a lower-dimensional code and then reconstruct the output from this representation. The autoencoder consists of an encoder and a decoder; the code is a compact "summary" of the input and corresponds to a level of features learned from the data.

Q.6 List the different types of autoencoders.
Ans. : Different types of autoencoders are :
1. Undercomplete autoencoders  2. Sparse autoencoders  3. Denoising autoencoders  4. Contractive autoencoders  5. Variational autoencoders.

Q.7 What is a sparse autoencoder ?
Ans. : A sparse autoencoder may have more hidden nodes than input nodes, but a sparsity constraint allows only a small number of the hidden nodes to be active at once. The sparsity penalty on the hidden activations prevents the network from simply copying the input, so it still learns useful features of the data.

Q.8 What is an undercomplete autoencoder ?
Ans. : An undercomplete autoencoder is an autoencoder whose code (hidden) dimension is less than the input dimension. The bottleneck forces the model to capture the most salient features of the training data instead of memorizing it, so it can be used for dimensionality reduction and feature learning.
Q.9 What are recursive neural networks ?
Ans. : Recursive neural networks (RvNNs) are non-linear adaptive models that can learn in-depth structured information by applying the same set of trainable weights recursively over a structure such as a tree. They were primarily used in natural language processing, where they combine words and phrases into sentence representations (for example, for inducing parse trees), although they have also been used in computer vision.

Q.10 What is the aim of an autoencoder ?
Ans. : The aim of an autoencoder is to learn a lower-dimensional representation (encoding) for higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input image.

Q.11 Why do we use regularized autoencoders ?
Ans. : Regularized autoencoders use a loss function that encourages the model to have useful properties besides the ability to copy its input to its output, such as sparsity of the representation or robustness to noise. This prevents the autoencoder from learning a trivial identity mapping even when its capacity is large.

Q.12 Is an autoencoder a supervised or an unsupervised learning method ?
Ans. : An autoencoder is an unsupervised learning method, since it does not need labelled data; technically, it is trained in a self-supervised manner because the desired output is generated from the input data itself.

Q.13 What is an embedding ?
Ans. : An embedding is a learned lower-dimensional representation into which higher-dimensional data (for example a word or an image) is mapped, so that items with similar meaning have similar representations.
SOLVED MODEL QUESTION PAPER
[As Per New Syllabus]

Neural Networks and Deep Learning

Semester - VI (CSE / IT), Professional Elective (Artificial Intelligence and Data Science / AI&DS),
Vertical - 1 (CSE / IT), Vertical - 3 (Emerging Technologies) (CS & BS),
Vertical - 4 (Artificial Intelligence and Machine Learning) (ECE),
Vertical - 6 (Emerging Technologies) (Open Elective), Vertical - 7 (Data Science) (CS & BS)

Time : Three Hours]                                            [Maximum Marks : 100

PART A - (10 × 2 = 20 Marks)
Answer ALL questions.

Q.1  What is the difference between supervised and unsupervised learning ? (Refer section 1.11.3)
Q.2  What is the necessity of activation functions ? (Refer Two Marks Questions of Chapter - 1)
Q.3  What is gradient descent ? (Refer Two Marks Questions of Chapter - 4)
Q.4  What is the principle of the Hopfield model ? (Refer Two Marks Questions of Chapter - 2)
Q.5  What is a self-organizing map ? (Refer Two Marks Questions of Chapter - 2)
Q.6  Define convolution. (Refer Two Marks Questions of Chapter - 3)
Q.7  What is the aim of an autoencoder ? (Refer Two Marks Q.10 of Chapter - 5)
Q.8  Why are recurrent neural networks called recurrent ? (Refer Two Marks Q.2 of Chapter - 5)
Q.9  What is the difference between feedforward and recurrent neural networks ? (Refer Two Marks Questions of Chapter - 5)
Q.10 Is an autoencoder a supervised or an unsupervised model ? (Refer Two Marks Questions of Chapter - 5)

PART B - (5 × 13 = 65 Marks)
Q.11 a) i) What is machine learning ? Explain the difference between machine learning and deep learning. (Refer section 1.4.3) [6]
        ii) What is an activation function ? Explain any one activation function in detail. (Refer section 1.7) [7]
     OR
     b) What is VC dimension ? Explain it in detail. (Refer section 1.8) [13]

Q.12 a) i) Explain the Hebb rule and the delta rule. (Refer section 2.1) [6]
        ii) Explain hetero-associative memory network. (Refer section 2.3) [7]
     OR
     b) i) Explain auto-associative memory network. (Refer section 2.2) [6]
        ii) Write a short note on Kohonen self-organizing maps and adaptive resonance theory. (Refer section 2.6) [7]

Q.13 a) i) Explain the basic structure of a spiking neural network (SNN). (Refer section 3.1) [6]
        ii) Explain the variants of spiking neural networks. (Refer section 3.2) [7]
     OR
     b) Explain the architecture of a convolutional neural network (CNN) in detail. (Refer section 3.6) [13]

Q.14 a) i) Explain L1 regularization. (Refer section 4.3) [6]
        ii) Write a short note on dataset augmentation. (Refer section 4.4.1) [7]
     OR
     b) i) Explain the curse of dimensionality. (Refer section 4.7) [6]
        ii) Discuss the challenges in deep learning. (Refer section 4.9) [7]

Q.15 a) i) What is an autoencoder ? Explain the architecture of an autoencoder. (Refer section 5.6) [6]
        ii) Explain denoising autoencoders. (Refer section 5.7.2) [7]
     OR
     b) i) Explain bidirectional RNNs. (Refer section 5.3) [6]
        ii) What are contractive encoders ? Explain. (Refer section 5.9) [7]
PART C - (1 × 15 = 15 Marks)

Q.16 a) Explain the convolution operation in CNN, and write short notes on :
        i) Padding and stride  ii) Parameter sharing  iii) Equivariant representation.
        (Refer section 3.4) [15]
     OR
     b) i) Explain the RNN based framework for natural language processing (NLP). (Refer section 5.5) [7]
        ii) Explain counter propagation network. (Refer section 2.5) [8]