
Neural Language Model, RNNs. Pawan Goyal (IIT Kharagpur), February 8th, 2022.

Word Embedding → RNN (Recurrent NN)

- The number of parameters is constant irrespective of the size of the context window.
- There is symmetry in how the weights are processed: the same weights are applied at every position.
Forward Propagation

a(t) = b + W h(t-1) + U x(t)
h(t) = tanh(a(t))
o(t) = c + V h(t)
ŷ(t) = softmax(o(t))

This maps an input sequence to an output sequence of the same length. Training uses a cross-entropy loss over ŷ(t); how the parameters are updated is covered next.
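As a concrete illustration, here is a minimal NumPy sketch of these equations; the sizes and random parameters are assumptions for demonstration only.

```python
# Minimal sketch of one RNN forward pass; sizes and random weights are
# hypothetical, chosen only to make the slide's equations runnable.
import numpy as np

rng = np.random.default_rng(0)
d_h, d_x, V_size = 16, 8, 100          # hidden, input, vocabulary sizes

b = np.zeros(d_h); c = np.zeros(V_size)
W = rng.normal(0, 0.1, (d_h, d_h))     # hidden-to-hidden
U = rng.normal(0, 0.1, (d_h, d_x))     # input-to-hidden
V = rng.normal(0, 0.1, (V_size, d_h))  # hidden-to-output

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def rnn_step(h_prev, x_t):
    a_t = b + W @ h_prev + U @ x_t     # a(t) = b + W h(t-1) + U x(t)
    h_t = np.tanh(a_t)                 # h(t) = tanh(a(t))
    o_t = c + V @ h_t                  # o(t) = c + V h(t)
    return h_t, softmax(o_t)           # y_hat(t) = softmax(o(t))

h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_x)):  # a dummy 5-step input sequence
    h, y_hat = rnn_step(h, x_t)
```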
Loss Function

Total loss is the sum of the losses over all the time steps.
So, if L(t) is the negative log-likelihood of y(t) given x(1), . . . , x(t), then

L({x(1), . . . , x(t)}, {y(1), . . . , y(t)}) = Σ_t L(t) = −Σ_t log p_model(y(t) | {x(1), . . . , x(t)})

where p_model(y(t) | {x(1), . . . , x(t)}) is given by reading the entry for y(t) from the model's output vector ŷ(t).
Backpropagation proceeds right to left through the unrolled network: backpropagation through time (BPTT).
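A short sketch of this loss in Python, assuming y_hats holds the per-step softmax outputs ŷ(t) and targets the gold word indices y(t) (both names are illustrative, not from the slides):

```python
import numpy as np

# L = sum_t L(t) = -sum_t log p_model(y(t) | x(1..t)); each probability is
# read off the gold entry of the model's output vector y_hat(t).
# y_hats and targets are assumed inputs, not defined in the slides.
def sequence_nll(y_hats, targets):
    return -sum(np.log(y_hat[y_t]) for y_hat, y_t in zip(y_hats, targets))
```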
Recurrent Neural Networks

[Figure: an unrolled RNN producing a prediction at each time step.]
Example RNN

[Figure: an example RNN language model; at each step the hidden state is used to predict a probability distribution over the next word.]
Training an RNN language model

Compute the loss at each time step, sum over the sequence, and use backpropagation (BPTT) to update the parameters U, V, W and the biases.
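A minimal training-step sketch, assuming PyTorch (the slides do not prescribe a framework; all sizes are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative sizes; the framework choice (PyTorch) is an assumption.
vocab, d_emb, d_h = 100, 32, 64
emb = nn.Embedding(vocab, d_emb)
rnn = nn.RNN(d_emb, d_h, batch_first=True)
out = nn.Linear(d_h, vocab)                  # plays the role of V and c
params = [*emb.parameters(), *rnn.parameters(), *out.parameters()]
opt = torch.optim.SGD(params, lr=0.1)
loss_fn = nn.CrossEntropyLoss()              # cross-entropy over the vocabulary

x = torch.randint(0, vocab, (1, 20))         # dummy token ids
inp, tgt = x[:, :-1], x[:, 1:]               # predict each next word
h, _ = rnn(emb(inp))
loss = loss_fn(out(h).reshape(-1, vocab), tgt.reshape(-1))
loss.backward()                              # BPTT: gradients flow right to left
opt.step()
```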
Generating text with an RNN Language Model

Sample a word from ŷ(t), feed it back in as the next input, and repeat, until the model generates the end-of-sequence token <EOS>.
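A sampling-loop sketch under stated assumptions: rnn_step and d_h come from the forward-propagation sketch above, while embed, BOS_ID, and EOS_ID are hypothetical helpers for mapping tokens to input vectors.

```python
import numpy as np

# embed, BOS_ID, EOS_ID are hypothetical helpers, not defined in the slides.
def generate(max_len=50):
    rng = np.random.default_rng(0)
    h, tokens, tok = np.zeros(d_h), [], BOS_ID   # start-of-sequence token
    for _ in range(max_len):
        h, y_hat = rnn_step(h, embed(tok))       # one RNN step
        tok = rng.choice(len(y_hat), p=y_hat)    # sample the next word
        if tok == EOS_ID:                        # stop at end-of-sequence
            break
        tokens.append(tok)
    return tokens
```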
What can RNNs be used for?

- Sequence classification
- Sequence labeling
- Representation learning

Variants of RNNs have been state-of-the-art (SOTA) for many of these tasks.
Vanishing Gradient Problem with RNNs

Backpropagating L(t) to an early hidden state multiplies the gradient by the Jacobian ∂h(i)/∂h(i-1) once per intervening time step, and each Jacobian involves the recurrent matrix W. If the singular values of W are small (below 1), this repeated multiplication shrinks the gradient exponentially: it vanishes over long spans. (Large singular values cause the opposite problem, exploding gradients.)
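A small numeric illustration of this claim (not from the slides): repeatedly multiplying by a matrix whose singular values are below 1 shrinks a gradient-like vector exponentially.

```python
import numpy as np

# Illustration only: W is constructed so every singular value equals 0.9.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # orthogonal matrix
W = 0.9 * Q
g = rng.normal(size=16)
for _ in range(100):                # backprop through 100 time steps
    g = W.T @ g
print(np.linalg.norm(g))            # ~0.9**100 of the start: vanished
```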
Effect of vanishing gradient on RNN LM

When the gradient vanishes over long spans, the training signal from distant words is lost, so the RNN LM fails to learn long-distance dependencies and is dominated by nearby context.
How to fix vanishing gradient problem?

The main problem is that it is too difficult for the RNN to learn to preserve information over many timesteps.
In a vanilla RNN, the hidden state is constantly being rewritten:

h(t) = tanh(W h(t-1) + U x(t) + b)

How about better RNN units?
Using Gates for better RNN units

The gates are vectors, computed with a sigmoid.
On each timestep, each element of a gate can be open (1), closed (0), or somewhere in between.
The gates are dynamic: their value is computed based on the current context.

Two famous architectures: GRUs and LSTMs.
Gated Recurrent Units (GRU)

Update gate: u(t) = σ(W_u h(t-1) + U_u x(t) + b_u)
Reset gate: r(t) = σ(W_r h(t-1) + U_r x(t) + b_r)
Candidate state: h̃(t) = tanh(W_h (r(t) ∘ h(t-1)) + U_h x(t) + b_h)
New hidden state: h(t) = u(t) ∘ h̃(t) + (1 − u(t)) ∘ h(t-1)

The gates have the same dimension as h.
- If u(t) ≈ 0 in some dimension, the new candidate is ignored there and the old state is carried over unchanged (the input is effectively ignored, so information is preserved).
- If r(t) ≈ 0, the candidate is computed from the new input alone, ignoring the previous state.

This gives the network more flexibility and control: it learns, based on context, how much of the state to keep and how much to overwrite with new information.
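A one-step NumPy sketch of these GRU equations; the parameter dictionary p and all shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# p holds hypothetical weight matrices/biases keyed by name (an assumption).
def gru_step(h_prev, x_t, p):
    u = sigmoid(p["Wu"] @ h_prev + p["Uu"] @ x_t + p["bu"])  # update gate
    r = sigmoid(p["Wr"] @ h_prev + p["Ur"] @ x_t + p["br"])  # reset gate
    h_tilde = np.tanh(p["Wh"] @ (r * h_prev) + p["Uh"] @ x_t + p["bh"])
    return u * h_tilde + (1.0 - u) * h_prev  # u(t)=0 keeps the old state
```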
Long Short Term Memory (LSTM)

An LSTM maintains a separate cell state c(t) alongside the hidden state h(t). Input, forget, and output gates control what new information is written to the cell, what old content is kept, and what is exposed as the hidden state (used, e.g., to predict the next word).

[Figure: the LSTM cell, with the three gates acting on the cell state.]
The LSTM computes, at each time step:

Input gate: i(t) = σ(W_i h(t-1) + U_i x(t) + b_i)
Forget gate: f(t) = σ(W_f h(t-1) + U_f x(t) + b_f)
Output gate: o(t) = σ(W_o h(t-1) + U_o x(t) + b_o)
Candidate cell content: c̃(t) = tanh(W_c h(t-1) + U_c x(t) + b_c)
Cell state: c(t) = f(t) ∘ c(t-1) + i(t) ∘ c̃(t)
Hidden state: h(t) = o(t) ∘ tanh(c(t))
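A matching one-step NumPy sketch of the LSTM cell; again the parameter dictionary p and all shapes are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# p holds hypothetical weight matrices/biases keyed by name (an assumption).
def lstm_step(h_prev, c_prev, x_t, p):
    i = sigmoid(p["Wi"] @ h_prev + p["Ui"] @ x_t + p["bi"])  # input gate
    f = sigmoid(p["Wf"] @ h_prev + p["Uf"] @ x_t + p["bf"])  # forget gate
    o = sigmoid(p["Wo"] @ h_prev + p["Uo"] @ x_t + p["bo"])  # output gate
    c_tilde = np.tanh(p["Wc"] @ h_prev + p["Uc"] @ x_t + p["bc"])
    c = f * c_prev + i * c_tilde     # forget old content, write new content
    h = o * np.tanh(c)               # expose a gated view of the cell
    return h, c
```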
How does LSTM solve vanishing gradients?

The LSTM architecture makes it easier for the RNN to preserve information over many timesteps.
e.g., if the forget gate is set to remember everything on every timestep, then the info in the cell is preserved indefinitely.
By contrast, it is harder for a vanilla RNN to learn a recurrent weight matrix W that preserves info in the hidden state.
RNNs can be used for sentence classification

e.g., sentiment classification: run the RNN over the sentence and take the final hidden state as the representation of the sentence, then classify from that vector.
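A hedged PyTorch sketch of classifying from the final hidden state (the class name and all sizes are assumptions):

```python
import torch
import torch.nn as nn

# Illustrative sketch; SentenceClassifier and its sizes are assumptions.
class SentenceClassifier(nn.Module):
    def __init__(self, vocab=100, d_emb=32, d_h=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_emb)
        self.rnn = nn.RNN(d_emb, d_h, batch_first=True)
        self.clf = nn.Linear(d_h, n_classes)

    def forward(self, tokens):
        _, h_n = self.rnn(self.emb(tokens))  # h_n: final hidden state
        return self.clf(h_n[-1])             # the sentence representation

logits = SentenceClassifier()(torch.randint(0, 100, (1, 7)))
```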
Example: entity tagging with an RNN

Input: Rahul likes to play football
Output: NAME O O O O

Approach: each input word is passed through the RNN, and the hidden state at each position is used to classify that word into one of the entity classes (e.g., Name / Place / Organization, or Gene / Disease in biomedical text).
RNNs can be used to find sentence similarity

Siamese Architecture
Pass both sentences through RNNs; the parameters are shared across these (hence "Siamese").
Use (a function of) the difference between the final hidden states.
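A sketch of the Siamese setup, assuming PyTorch; the exp of the negative L1 distance is one common similarity choice (as in the Manhattan-LSTM), not one prescribed by the slides.

```python
import torch
import torch.nn as nn

# Illustrative sketch; SiameseRNN and its sizes are assumptions.
class SiameseRNN(nn.Module):
    def __init__(self, vocab=100, d_emb=32, d_h=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_emb)
        self.rnn = nn.RNN(d_emb, d_h, batch_first=True)  # shared parameters

    def encode(self, tokens):
        _, h_n = self.rnn(self.emb(tokens))              # final hidden state
        return h_n[-1]

    def forward(self, s1, s2):
        diff = torch.abs(self.encode(s1) - self.encode(s2))
        return torch.exp(-diff.sum(dim=-1))              # similarity in (0, 1]

score = SiameseRNN()(torch.randint(0, 100, (1, 5)),
                     torch.randint(0, 100, (1, 6)))
```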
RNNs can be used for tagging

e.g., part-of-speech tagging, named entity recognition: the hidden state at each position (h1, . . . , hn) is used to predict the tag for that word.
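A per-token tagging sketch (PyTorch usage and the tag count are assumptions):

```python
import torch
import torch.nn as nn

# Illustrative sizes; n_tags and the token ids are assumptions.
vocab, d_emb, d_h, n_tags = 100, 32, 64, 9
emb = nn.Embedding(vocab, d_emb)
rnn = nn.RNN(d_emb, d_h, batch_first=True)
tagger = nn.Linear(d_h, n_tags)

tokens = torch.randint(0, vocab, (1, 5))   # e.g., "Rahul likes to play football"
h, _ = rnn(emb(tokens))                    # one hidden state per position
tags = tagger(h).argmax(-1)                # one predicted tag per word
```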
RNNs for comparison mining in Reviews

Review sentences are processed bidirectionally with RNNs to detect comparative statements.
Bidirectional RNNs

What we have seen till now?

The state at time t only captures information from the past x(1), . . . , x(t-1), and the present input x(t).
Some models also allow information from past y values to affect the current state when the y values are available.

However, in many applications we want to output a prediction of y(t) which may depend on the whole input sequence, e.g., sentiment analysis.
Bidirectional RNNs: motivation for sentiment classification

[Figure: a sentiment example in which the interpretation of a word depends on context to its right as well as its left, which a left-to-right RNN cannot see.]
A bidirectional RNN runs two RNNs with separate parameters: a forward RNN over x(1), . . . , x(n) and a backward RNN over x(n), . . . , x(1). The representation at each position concatenates the two hidden states:

h(t) = [h→(t) ; h←(t)]

This concatenated state, which sees both left and right context, is then used for the task.
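In PyTorch this concatenation is what bidirectional=True produces (a sketch, with illustrative sizes):

```python
import torch
import torch.nn as nn

# Illustrative sizes; the framework choice (PyTorch) is an assumption.
rnn = nn.RNN(32, 64, batch_first=True, bidirectional=True)
h, _ = rnn(torch.randn(1, 10, 32))   # h[:, t] = [h_fwd(t) ; h_bwd(t)]
assert h.shape == (1, 10, 128)       # 2 * 64: both directions concatenated
```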

Bidirectional RNNs: simplified diagram

Bi-RNNs are only applicable if you have access to the entire input sequence; they are thus not applicable to Language Modeling, where only the left context is available.
Using Char-RNN for word embeddings

Run an LSTM over the characters c1, . . . , cm of each word; the resulting states are used to build embeddings for the words w1, . . . , wn of the sentence.
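A hedged sketch of a character-LSTM word encoder; the ord-based character ids, the helper name, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch; word_embedding and its sizes are assumptions.
n_chars, d_c, d_w = 128, 16, 64
char_emb = nn.Embedding(n_chars, d_c)
char_lstm = nn.LSTM(d_c, d_w, batch_first=True)

def word_embedding(word: str) -> torch.Tensor:
    ids = torch.tensor([[min(ord(ch), n_chars - 1) for ch in word]])
    _, (h_n, _) = char_lstm(char_emb(ids))   # run over the characters
    return h_n[-1, 0]                        # final state = word vector

vec = word_embedding("football")
```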
Multi-layer RNNs

RNN layers can be stacked: the sequence of hidden states from one layer serves as the input sequence to the next, giving a deeper network.
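A one-line sketch of stacking, assuming PyTorch: num_layers=2 feeds the first layer's hidden-state sequence into the second.

```python
import torch
import torch.nn as nn

# Illustrative sizes; the framework choice (PyTorch) is an assumption.
rnn = nn.RNN(32, 64, num_layers=2, batch_first=True)
h, h_n = rnn(torch.randn(1, 10, 32))   # h: top layer's states, per step
```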
What other paradigms can RNNs be used for?

- Fixed-length input → variable-length output, e.g., image captioning (one-to-many).
- Variable-length input → fixed-length output, e.g., sentence classification (many-to-one).
- Variable-length input → variable-length output, e.g., translation and summarization (many-to-many).
Text classification maps a sentence to a label.
For translation and other sequence-to-sequence tasks, RNN-based models have largely been superseded by Transformers.