Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
GOOGLE AI RESEARCHERS
TRANSFORMER-XL
ATTENTIVE LANGUAGE MODELS BEYOND
A FIXED-LENGTH CONTEXT
PRESENTERS
SALMAN YOUNUS & BILAL SHABIR
LANGUAGE MODELING
Language modeling (LM) is the use of various statistical and probabilistic techniques to determine the probability of a given sequence of words occurring in a sentence.
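A minimal sketch of that idea, using a toy bigram model (the corpus, the chain-rule factorization into bigram conditionals, and all names here are illustrative assumptions, not part of the original slides):

```python
from collections import Counter

# Toy corpus; a real language model is trained on vastly more text.
corpus = "the cat sat on the mat the cat ran".split()

# Bigram counts give P(w_i | w_{i-1}) ~ count(w_{i-1}, w_i) / count(w_{i-1}).
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def sequence_probability(words):
    """Chain rule: P(w_1..w_n) approximated as a product of bigram conditionals."""
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

print(sequence_probability("the cat sat".split()))  # P(cat|the) * P(sat|cat)
```

The same chain-rule factorization underlies neural language models; they only replace the bigram counts with a learned conditional distribution over the full previous context.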
PREDICT THE LAST WORD IN THE TEXT
In the above example, the previous words are used to predict the next word of the sentence.
Hence the model needs to remember the previous words.
RECURRENT NEURAL NETWORKS
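One way to "remember the previous words" is the recurrent update an RNN applies at every step: the hidden state is a running summary of everything seen so far. A minimal sketch (the dimensions, random weights, and token sequence are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
vocab, hidden = 5, 4
Wxh = rng.normal(size=(hidden, vocab)) * 0.1  # input-to-hidden weights
Whh = rng.normal(size=(hidden, hidden)) * 0.1  # hidden-to-hidden (recurrent) weights

def step(h, token_id):
    """One RNN step: the new state mixes the current token with the
    summary of all previous tokens carried in h."""
    x = np.zeros(vocab)
    x[token_id] = 1.0  # one-hot encode the current word
    return np.tanh(Wxh @ x + Whh @ h)

h = np.zeros(hidden)
for token in [0, 3, 1]:  # feed a short token sequence
    h = step(h, token)
# h now summarizes the whole prefix, so a next-word predictor
# reading h can depend on all previous words.
```

Transformer-XL keeps the same intuition of carrying state across segments, but does it by caching and attending over the hidden states of the previous segment rather than by a step-wise recurrence.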
How can we keep the positional information coherent when we reuse the states?
The original positional encoding handles each segment separately and, as a result, tokens from
different segments have the same positional encoding.
For example, the first token of the first and the second segments will have the same encoding, although
their position and importance are different.
Segment 1: positions 0, 1
Segment 2: positions 0, 1
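The collision can be demonstrated directly with the vanilla sinusoidal encoding from "Attention Is All You Need" (the function below is a standard textbook implementation, not code from the Transformer-XL paper): encoding each segment separately gives the first tokens of both segments identical vectors, even though they sit at different global positions.

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """Absolute sinusoidal positional encoding: sin on even dims, cos on odd dims."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model)[None, :]            # (1, d_model)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

seg1 = sinusoidal_encoding(2, 8)  # segment 1: positions 0, 1
seg2 = sinusoidal_encoding(2, 8)  # segment 2: also positions 0, 1

# First token of each segment gets the SAME encoding,
# although globally they are at positions 0 and 2.
print(np.allclose(seg1[0], seg2[0]))  # True

# Encoding the concatenated sequence would distinguish them.
full = sinusoidal_encoding(4, 8)
print(np.allclose(full[0], full[2]))  # False
```

This is why Transformer-XL replaces absolute positional encodings with relative ones: attention then depends on the distance between tokens, which stays meaningful when cached states from a previous segment are reused.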
Thanks