Transformer Numerical

A step-by-step numerical walk-through of a Transformer encoder-decoder, with input "How are you" and output "I am fine".
Encoder

Step 1: Input embedding + positional embedding.
Input: "How are you" (positions 1, 2, 3). The positional embedding must be of the same dimension as the input; the sum X = input embedding + positional embedding feeds the key projection:

K = W_k × X

[Figure: numerical 3×3 values of the input embedding, positional embedding, W_k, and K]
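The input step can be sketched as below; the embedding, positional, and weight values are random stand-ins, not the figure's numbers, and the column-per-token layout matches K = W_k × X.

```python
import numpy as np

# Minimal sketch of the encoder input step: token embeddings for
# "How", "are", "you" plus a positional embedding of the same
# dimension, followed by the key projection K = W_k x X.
rng = np.random.default_rng(0)
d = 3                                # illustrative model dimension
embedding = rng.random((d, 3))       # columns: How, are, you (stand-ins)
positional = rng.random((d, 3))      # must match the input's dimension
X = embedding + positional

W_k = rng.random((d, d))             # learned in practice; random here
K = W_k @ X                          # K = W_k x X
print(K.shape)                       # (3, 3)
```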
step 2: Norm of (Add & Norm − FFN)

X_e:
How   1.36  -1.02   0.29
are  -1.03   1.36   1.05
you  -0.33  -0.34  -1.34

This is the final encoder output.
Q = W_q × X,  V = W_v × X

[Figure: numerical values of W_q, W_v, Q^T × K, and the scaled scores (Q^T × K)/√2]

f(Q, K, V) = softmax((Q^T × K)/√2) × V^T
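The attention formula above can be sketched as follows; Q, K, V are random stand-ins shaped (dim, tokens) to match the column-per-token layout, and the √2 scaling matches the figure.

```python
import numpy as np

# Sketch of f(Q, K, V) = softmax((Q^T x K)/sqrt(2)) x V^T.
rng = np.random.default_rng(0)
Q, K, V = (rng.random((3, 3)) for _ in range(3))

scores = (Q.T @ K) / np.sqrt(2)
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
attn_out = weights @ V.T                        # one row per query token
```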
FFN_ei = relu(W_e1 × (Add & Norm − Attn)^T)
FFN_e = W_e2 × FFN_ei

(FFN_e)^T:
How  0.14  0.15  0.43
are  0.66  0.60  1.07
you  0     0     0

[Figure: numerical values of W_e1, W_e2, and the intermediate FFN_ei]
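The two-layer feed-forward step can be sketched as below; the weights and the Add & Norm output H are random stand-ins, not the figure's values.

```python
import numpy as np

# Sketch of the encoder feed-forward block:
# FFN_ei = relu(W_e1 x (Add & Norm - Attn)^T), FFN_e = W_e2 x FFN_ei.
rng = np.random.default_rng(1)
W_e1, W_e2 = rng.random((3, 3)), rng.random((3, 3))
H = rng.standard_normal((3, 3))         # stand-in for Add & Norm output

FFN_ei = np.maximum(0.0, W_e1 @ H.T)    # relu clips negatives to 0
FFN_e = W_e2 @ FFN_ei
```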
[Figure: K and V derived from the encoder output (K-ed, V-ed), V^T, and the attention output softmax((Q^T × K)/√2) × V^T]
Decoder

Output so far: "I" (position 1). The positional embedding must be of the same dimension as the output.

Y = output + positional embedding

[Figure: numerical output embeddings, positional embeddings, and Y^T for "I", "am", "fine"]
f(Q, K, V) = Attention(Q, K) × V^T
Attention(Q, K) = softmax((Q^T × K)/√2)

X_e (the final encoder output):
How   1.36  -1.02   0.29
are  -1.03   1.36   1.05
you  -0.33  -0.34  -1.34

In the encoder-decoder attention, Q is provided by the decoder, while K and V are provided by the encoder.
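The encoder-decoder attention can be sketched as below; X_e uses the encoder-output values from the text, while the decoder state and projection weights are random stand-ins.

```python
import numpy as np

# Cross attention: Q is projected from the decoder state,
# K and V are projected from the encoder output X_e.
X_e = np.array([[ 1.36, -1.02,  0.29],
                [-1.03,  1.36,  1.05],
                [-0.33, -0.34, -1.34]]).T   # columns: How, are, you
rng = np.random.default_rng(2)
D = rng.standard_normal((3, 3))             # decoder state (stand-in)
W_q, W_k, W_v = (rng.random((3, 3)) for _ in range(3))

Q = W_q @ D          # from the decoder
K = W_k @ X_e        # from the encoder
V = W_v @ X_e        # from the encoder
scores = (Q.T @ K) / np.sqrt(2)
w = np.exp(scores)
w /= w.sum(axis=1, keepdims=True)
cross_out = w @ V.T
```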
[Figure: X_e^T and the decoder tokens "I", "am", "fine" attending over "How", "are", "you"]
Masked self-attention over the decoder input:

K = W_k × Y,  Q = W_q × Y

[Figure: numerical values of W_k, W_q, Q^T, Q^T × K, and e^((Q^T × K)/√2 + mask) for "I", "am", "fine"]
Mask (0 on and below the diagonal, -1E+099 above it):
I     0        -1E+099  -1E+099
am    0        0        -1E+099
fine  0        0        0

Attention(Q, K) = softmax((Q^T × K)/√2 + mask), and f(Q, K, V) = Attention(Q, K) × V^T.
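The mask above can be built and checked as follows; the scores are arbitrary stand-ins, since the point is only that softmax sends every above-diagonal (future) position to zero.

```python
import numpy as np

# Decoder mask: zeros on and below the diagonal, -1e99 above it,
# so after softmax no position can attend to future tokens.
n = 3
mask = np.triu(np.full((n, n), -1e99), k=1)

scores = np.ones((n, n)) + mask                  # arbitrary scores + mask
w = np.exp(scores - scores.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)
# rows: [1, 0, 0], [0.5, 0.5, 0], [0.33..., 0.33..., 0.33...]
```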
Step 1: Add of (Add & Norm − Attn); step 2: Norm of (Add & Norm − Attn):
I    -1.37   1.41   0.95
am    0.99  -0.71  -1.38
fine  0.38  -0.71   0.43

[Figure: the Add-step values that precede this normalization]
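The Norm half of Add & Norm can be sketched as below; the input matrix is hypothetical, and the learned scale/shift of a full layer norm is omitted. Each column is standardized to zero mean and unit variance, matching the near-zero column sums of the Norm results in the figure.

```python
import numpy as np

# Norm step: standardize each column to mean 0 and variance 1.
def norm(M):
    return (M - M.mean(axis=0)) / M.std(axis=0)

added = np.array([[0.10, 2.12,  1.33],    # hypothetical Add-step output
                  [2.54, 0.00, -0.62],
                  [1.50, 0.71,  0.65]])
N = norm(added)
```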
FFN_d = W_d2 × FFN_di

[Figure: numerical values of W_d2 and FFN_di for "I", "am", "fine"]
step 2: Norm of (Add & Norm − FFN):
I    -1.28   1.41   1.21
am    1.16  -0.71  -1.24
fine  0.12  -0.71   0.03

(Norm of (Add & Norm − FFN))^T:
-1.28   1.16   0.12
 1.41  -0.71  -0.71
 1.21  -1.24   0.03
e^Output (rows: vocabulary; columns: output positions I, am, fine):
How   1.127497  2.559981  1.040811
are   1.915541  2.691234  1.632316
you   1.349859  2.270500  1.083287
I     2.718282  1.632316  1.161834
am    1.390968  2.718282  1.094174
fine  1.185305  2.033991  2.718282
Q = W_q × X,  V = W_v × X

e^((Q^T × K)/√2 + mask):
I     512.86    0        0
am     44.26   31.5      0
fine  188.67  117.92   214.86

softmax((Q^T × K)/√2 + mask):
I     1      0      0
am    0.58   0.42   0
fine  0.36   0.23   0.41

[Figure: numerical values of W_v]
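The two panels above can be reproduced directly: dividing each row of e^((Q^T × K)/√2 + mask) by its row sum recovers the softmax table, with zeros above the diagonal coming from the mask.

```python
import numpy as np

# Row-wise softmax of the masked, exponentiated scores from the figure.
e = np.array([[512.86,   0.00,   0.00],   # I
              [ 44.26,  31.50,   0.00],   # am
              [188.67, 117.92, 214.86]])  # fine
softmax = e / e.sum(axis=1, keepdims=True)
# rows: [1, 0, 0], [0.58, 0.42, 0], [0.36, 0.23, 0.41]
```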
FFN_di = relu(W_d1 × (Add & Norm − Attn)^T)

[Figure: numerical values of the decoder query projection W_(q−de) and FFN_di for "I", "am", "fine"]
(FFN_d)^T:
0.92  0.81  1.17
0     0     0
0     0     0
Final Output = softmax(Output), taken over the vocabulary for each output position:
       I     am    fine
How   0.12  0.18  0.12
are   0.20  0.19  0.19
you   0.14  0.16  0.12
I     0.28  0.12  0.13
am    0.14  0.20  0.13
fine  0.12  0.15  0.31

The largest entry in each column gives the predicted sentence: "I am fine".
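The final prediction can be reproduced from the e^Output table above: for each output position the softmax runs over the six-word vocabulary, and the per-column argmax yields the predicted sentence.

```python
import numpy as np

# Final Output = softmax(Output), column-wise over the vocabulary,
# using the e^Output values from the table above.
e_out = np.array([
    [1.127497, 2.559981, 1.040811],   # How
    [1.915541, 2.691234, 1.632316],   # are
    [1.349859, 2.270500, 1.083287],   # you
    [2.718282, 1.632316, 1.161834],   # I
    [1.390968, 2.718282, 1.094174],   # am
    [1.185305, 2.033991, 2.718282],   # fine
])
final = e_out / e_out.sum(axis=0, keepdims=True)
vocab = ["How", "are", "you", "I", "am", "fine"]
predicted = [vocab[i] for i in final.argmax(axis=0)]
print(predicted)   # ['I', 'am', 'fine']
```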