RNN Part 2

Word embeddings → RNN (Recurrent Neural Network)

- The number of parameters is constant irrespective of the size of the context window.
- Symmetry in how the weights are processed: the same weights are applied at every time step.
Forward Propagation

a(t) = b + W h(t-1) + U x(t)
h(t) = tanh(a(t))
o(t) = c + V h(t)
ŷ(t) = softmax(o(t))

Loss: cross entropy. How will the parameters be updated?
Pawan Goyal (IIT Kharagpur) Neural Language Model, RNNs February 8th, 2022 9 / 38
This maps an input sequence to an output sequence of the same length.
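The forward propagation above can be sketched in NumPy. All sizes and the random weight initialization below are illustrative assumptions, not values from the lecture.

```python
# One forward-propagation step of a vanilla RNN, following the equations above.
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())          # shift for numerical stability
    return e / e.sum()

def rnn_step(x_t, h_prev, W, U, V, b, c):
    """One time step: returns the new hidden state and the output distribution."""
    a_t = b + W @ h_prev + U @ x_t   # a(t) = b + W h(t-1) + U x(t)
    h_t = np.tanh(a_t)               # h(t) = tanh(a(t))
    o_t = c + V @ h_t                # o(t) = c + V h(t)
    y_t = softmax(o_t)               # ŷ(t) = softmax(o(t))
    return h_t, y_t

rng = np.random.default_rng(0)
H, X, V_size = 4, 3, 5               # hidden, input, vocabulary sizes (illustrative)
W = rng.normal(size=(H, H))
U = rng.normal(size=(H, X))
Vm = rng.normal(size=(V_size, H))
b, c = np.zeros(H), np.zeros(V_size)

h = np.zeros(H)
for x in rng.normal(size=(6, X)):    # the SAME weights are reused at every step
    h, y_hat = rnn_step(x, h, W, U, Vm, b, c)
```

Note that the loop reuses one set of parameters (W, U, Vm, b, c) for every position, which is the symmetry property mentioned earlier.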
Loss Function

Total loss is the sum of the losses over all the time steps.
So, if L(t) is the negative log likelihood of y(t) given x(1), ..., x(t), then

L = Σ_t L(t) = -Σ_t log p_model(y(t) | x(1), ..., x(t))

where p_model(y(t) | {x(1), ..., x(t)}) is given by reading the entry for y(t) from the model's output vector ŷ(t).

Backpropagation runs right to left: backpropagation through time (BPTT).
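The total loss can be computed as below; the output distributions ŷ(t) and the target indices are made-up illustrative values.

```python
# Sketch: total loss as the sum of per-timestep negative log likelihoods,
# L = -sum_t log p_model(y(t) | x(1), ..., x(t)).
import numpy as np

y_hat = np.array([[0.70, 0.20, 0.10],    # ŷ(1): model's distribution at step 1
                  [0.10, 0.80, 0.10],    # ŷ(2)
                  [0.25, 0.25, 0.50]])   # ŷ(3)
targets = [0, 1, 2]                      # observed y(t) at each step (made up)

# L(t) = -log of the entry for y(t) read from ŷ(t); the total sums over t
per_step = [-np.log(y_hat[t, y]) for t, y in enumerate(targets)]
total_loss = sum(per_step)
```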
Recurrent Neural Networks

(figure: an RNN unrolled over time, producing a prediction at each step)
Example RNN

(figure: an example RNN predicting the probability of the next word from the hidden state at each step)
Training an RNN language model
(figure: training the RNN language model on a sequence ending in <EOS>, then backpropagating the loss)
Representation learning with RNNs: the hidden states serve as learned representations, usable for classification and for sequence labeling.

Variants of RNN
Effect of vanishing gradient on RNN LM

(figure)
How to fix vanishing gradient problem?

The main problem is that it is too difficult for the RNN to learn to preserve information over many timesteps.

In a vanilla RNN, the hidden state is constantly being rewritten:

h(t) = tanh(W h(t-1) + U x(t) + b)
Using Gates for better RNN units

Gates are computed with a sigmoid, so their values lie between 0 and 1.

The gates are dynamic: their value is computed based on the current context.
Gated Recurrent Units (GRU)

Update gate: u(t) = σ(W_u h(t-1) + U_u x(t) + b_u)   (same dimension as h)
Reset gate: r(t) = σ(W_r h(t-1) + U_r x(t) + b_r)
New (candidate) state: h̃(t) = tanh(W (r(t) ∘ h(t-1)) + U x(t) + b_h)
New hidden state: h(t) = u(t) ∘ h̃(t) + (1 - u(t)) ∘ h(t-1)

If some dimension of u(t) is close to 1, that dimension of the state is overwritten with new information; if it is close to 0, the old content from h(t-1) is carried over. Because the gate values are learned from the current context, the network gains more flexibility and control over what it keeps and what it updates.
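A minimal NumPy sketch of a single GRU step, using the convention h(t) = u(t) ∘ h̃(t) + (1 - u(t)) ∘ h(t-1) with u the update gate and r the reset gate. All weights are random illustrative values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    u = sigmoid(p["Wu"] @ h_prev + p["Uu"] @ x_t + p["bu"])   # update gate
    r = sigmoid(p["Wr"] @ h_prev + p["Ur"] @ x_t + p["br"])   # reset gate
    # candidate state: reset gate decides how much of h(t-1) to read
    h_tilde = np.tanh(p["W"] @ (r * h_prev) + p["U"] @ x_t + p["b"])
    return u * h_tilde + (1.0 - u) * h_prev                   # mix new and old

rng = np.random.default_rng(1)
H, X = 4, 3                                                   # illustrative sizes
p = {k: rng.normal(scale=0.5, size=(H, H)) for k in ("Wu", "Wr", "W")}
p.update({k: rng.normal(scale=0.5, size=(H, X)) for k in ("Uu", "Ur", "U")})
p.update({k: np.zeros(H) for k in ("bu", "br", "b")})

h = np.zeros(H)
h = gru_step(rng.normal(size=X), h, p)
```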
Long Short Term Memory (LSTM)
Input gate: i_t = σ(W_i h_{t-1} + U_i x_t + b_i)
Forget gate: f_t = σ(W_f h_{t-1} + U_f x_t + b_f)
Output gate: o_t = σ(W_o h_{t-1} + U_o x_t + b_o)
New cell content: c̃_t = tanh(W_c h_{t-1} + U_c x_t + b_c)
Cell state: c_t = f_t ∘ c_{t-1} + i_t ∘ c̃_t
Hidden state: h_t = o_t ∘ tanh(c_t)
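A minimal NumPy sketch of one LSTM step with input, forget, and output gates; the weights are random illustrative values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    i = sigmoid(p["Wi"] @ h_prev + p["Ui"] @ x_t + p["bi"])        # input gate
    f = sigmoid(p["Wf"] @ h_prev + p["Uf"] @ x_t + p["bf"])        # forget gate
    o = sigmoid(p["Wo"] @ h_prev + p["Uo"] @ x_t + p["bo"])        # output gate
    c_tilde = np.tanh(p["Wc"] @ h_prev + p["Uc"] @ x_t + p["bc"])  # new cell content
    c = f * c_prev + i * c_tilde                                   # cell state update
    h = o * np.tanh(c)                                             # hidden state
    return h, c

rng = np.random.default_rng(2)
H, X = 4, 3                                                        # illustrative sizes
p = {f"W{g}": rng.normal(scale=0.5, size=(H, H)) for g in "ifoc"}
p.update({f"U{g}": rng.normal(scale=0.5, size=(H, X)) for g in "ifoc"})
p.update({f"b{g}": np.zeros(H) for g in "ifoc"})

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, X)):
    h, c = lstm_step(x, h, c, p)
```

Unlike the vanilla RNN, the cell state c is updated additively (f_t ∘ c_{t-1} + i_t ∘ c̃_t) rather than fully rewritten at every step.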
How does LSTM solve vanishing gradients?
The LSTM architecture makes it easier for the RNN to preserve information over many timesteps: for example, if the forget gate is set to 1 and the input gate to 0, the information in the cell is preserved indefinitely.
RNNs can be used for sentence classification
Encode the sentence with an RNN and use the final hidden state h_n as the representation of the sentence for the classifier.
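A sketch of sentence classification from the final hidden state; the embeddings, weights, and the two-class head are illustrative assumptions.

```python
# Run an RNN over a sentence and classify from the final hidden state h_n.
import numpy as np

rng = np.random.default_rng(3)
H, X, C = 4, 3, 2                           # hidden, embedding, num classes
W = rng.normal(scale=0.5, size=(H, H))
U = rng.normal(scale=0.5, size=(H, X))
W_cls, b_cls = rng.normal(size=(C, H)), np.zeros(C)

sentence = rng.normal(size=(7, X))          # 7 word embeddings (made up)
h = np.zeros(H)
for x in sentence:                          # encode the whole sentence
    h = np.tanh(W @ h + U @ x)

logits = W_cls @ h + b_cls                  # classifier on the final state h_n
probs = np.exp(logits - logits.max())
probs /= probs.sum()
label = int(np.argmax(probs))
```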
RNNs can be used for sequence labeling

Approach: for each word, classify it into one of the entity classes, e.g. {Name, Place, Organization} plus O (none).

Example input: "Rahul likes to play football"; the desired output is one tag per word (here, "Rahul" → Name and the remaining words → O).
RNNs can be used to find sentence similarity

Siamese Architecture
Pass both sentences through RNNs; the parameters are shared across these RNNs.
Use (a function of) the difference between the final hidden states.

(figure: two RNNs sharing their parameters, Siamese-style)
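A sketch of the Siamese setup: the same RNN weights encode both sentences, and a function of the difference between the final hidden states scores similarity. The exponentiated negative Manhattan distance used here is one illustrative choice of that function, as are all weights.

```python
import numpy as np

rng = np.random.default_rng(4)
H, X = 4, 3
W = rng.normal(scale=0.5, size=(H, H))
U = rng.normal(scale=0.5, size=(H, X))

def encode(sentence):
    """Both sentences go through this one encoder: W and U are shared."""
    h = np.zeros(H)
    for x in sentence:
        h = np.tanh(W @ h + U @ x)
    return h

s1 = rng.normal(size=(5, X))                 # two made-up embedded sentences
s2 = rng.normal(size=(6, X))
h1, h2 = encode(s1), encode(s2)
similarity = np.exp(-np.abs(h1 - h2).sum())  # in (0, 1]; 1 = identical states
```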
RNNs can be used for tagging

e.g., part-of-speech tagging, named entity recognition

(figure: hidden states h_1, ..., h_t each feeding a tag prediction)
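A sketch of RNN tagging: a softmax over tag classes at every timestep, so each word receives its own label. The three-tag set and all weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
H, X, T = 4, 3, 3                            # hidden, embedding, num tags
W = rng.normal(scale=0.5, size=(H, H))
U = rng.normal(scale=0.5, size=(H, X))
V = rng.normal(size=(T, H))

words = rng.normal(size=(5, X))              # embeddings for a 5-word sentence
h, tags = np.zeros(H), []
for x in words:
    h = np.tanh(W @ h + U @ x)
    o = V @ h                                # tag scores at THIS position
    p = np.exp(o - o.max())
    p /= p.sum()                             # tag distribution for this word
    tags.append(int(np.argmax(p)))
```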
RNNs for comparison mining in Reviews

(figure: labeling review sentences as comparisons, with the sequence processed bidirectionally)
Bidirectional RNNs

(figure: a forward RNN and a backward RNN over the sequence, producing labels at each position)
Bidirectional RNNs: motivation for sentiment classification
(figure: a sentiment example where the context to the right of a word is needed to interpret it correctly)
The backward RNN reads the sequence right to left with its own parameters (e.g., W_b, U_b), and the representation of position t concatenates the forward and backward hidden states:

h(t) = [h_fwd(t) ; h_bwd(t)]
Bidirectional RNNs: simplified diagram
Bi-RNNs are only applicable if you have access to the entire input sequence; thus they are not applicable to Language Modeling, where only the left context is available.
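A sketch of a bidirectional RNN: one pass left to right, one right to left with separate parameters, concatenating the two hidden states at each position. Note that the backward pass needs the whole sequence up front. Sizes and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
H, X = 4, 3
Wf, Uf = rng.normal(scale=0.5, size=(H, H)), rng.normal(scale=0.5, size=(H, X))
Wb, Ub = rng.normal(scale=0.5, size=(H, H)), rng.normal(scale=0.5, size=(H, X))

def run(seq, W, U):
    """Run a vanilla RNN over seq, returning the hidden state at every step."""
    h, states = np.zeros(H), []
    for x in seq:
        h = np.tanh(W @ h + U @ x)
        states.append(h)
    return states

seq = rng.normal(size=(5, X))                # entire input must be available
fwd = run(seq, Wf, Uf)                       # left-to-right pass
bwd = run(seq[::-1], Wb, Ub)[::-1]           # right-to-left pass, realigned
states = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

Each entry of `states` has dimension 2H, one half summarizing the left context and the other half the right context of that position.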
Using Char-RNN for word embeddings
(figure: the characters c_1 ... c_m of each word are fed through an LSTM, whose output serves as the word's embedding; the word embeddings w_1 ... w_n then represent the sentence)
Multi-layer RNNs
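A sketch of a multi-layer (stacked) RNN, where the hidden-state sequence of one layer becomes the input sequence of the next; two layers with illustrative sizes and random weights.

```python
import numpy as np

rng = np.random.default_rng(7)
H = 4
layers = [
    # layer 1 reads the 3-dimensional input embeddings
    (rng.normal(scale=0.5, size=(H, H)), rng.normal(scale=0.5, size=(H, 3))),
    # layer 2 reads layer 1's H-dimensional hidden states
    (rng.normal(scale=0.5, size=(H, H)), rng.normal(scale=0.5, size=(H, H))),
]

seq = rng.normal(size=(5, 3))                  # a 5-step input sequence
for W, U in layers:
    h, out = np.zeros(H), []
    for x in seq:
        h = np.tanh(W @ h + U @ x)
        out.append(h)
    seq = out                                  # feed this layer's states upward
```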
What other paradigms can RNNs be used for?

- Fixed-length input → variable-length output: e.g., image captioning (an image in, a sentence out).
- Variable-length input → variable-length output: e.g., translation, summarization.
- Variable-length input → fixed-length output: e.g., classification.
Text classification: a sentence (or sequence) is mapped to labels; translation maps a sequence to another sequence.

RNNs vs. Transformers: Transformers have largely replaced RNNs for these tasks.