Hi, after completing week 4 + the assignment, I have a few questions about how this transformer works at test time (as opposed to training with the decoder look-ahead mask). I'm going to write out my high-level assumptions; please let me know if they're right.
Encoder:
Once we've trained the model, we have the learned weight matrices W_Q, W_K, W_V.
Now during test time, when we receive an input X, we can compute Attention(W_Q · X, W_K · X, W_V · X).
Question 1. So during test time, we are able to compute the attention for each x^<i> concurrently?
**(Technically it's already vectorized in the matrix form above.)**
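To make Question 1 concrete, here's a minimal NumPy sketch of that vectorized computation (the function name `self_attention` and the toy shapes are my own, not from the assignment). Every position's query, key, and value come out of one set of matrix multiplies, and the softmax is applied row-wise, so nothing here runs sequentially:

```python
import numpy as np

def self_attention(X, W_Q, W_K, W_V):
    """Scaled dot-product self-attention over a whole sequence at once.
    X: (seq_len, d_model); W_Q, W_K, W_V: (d_model, d_k), learned during training."""
    Q = X @ W_Q                      # queries for every position in one multiply
    K = X @ W_K                      # keys for every position
    V = X @ W_V                      # values for every position
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V               # (seq_len, d_k) attention output

# toy run: 4 input tokens, d_model = d_k = 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_Q, W_K, W_V = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_Q, W_K, W_V).shape)  # -> (4, 8)
```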
Question 2. As I understand it, the decoder still has to run sequentially at test time? While W_Q is fixed after training, the target (the decoder's output from the previous steps) still changes as we make predictions, so at step i the query is Q = W_Q · ŷ^<i>. The decoder input grows one token at a time (see the sketch after this example):
Jane _ _ _ _
Jane visits _ _ _
Jane visits Africa _ _
…
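If my assumption is right, test-time decoding looks roughly like this greedy loop. This is a sketch under that assumption: `greedy_decode`, `decoder`, `vocab`, and the stub below are hypothetical placeholders I made up, not the assignment's actual API.

```python
import numpy as np

def greedy_decode(encoder_output, decoder, vocab, max_len=10, end_token="<EOS>"):
    """Test-time autoregressive decoding: sequential by necessity, because
    step i's query Q = W_Q · ŷ^<i> depends on the tokens predicted at
    steps 1..i-1.  `decoder(prefix, encoder_output)` is a hypothetical
    callable returning next-token probabilities over `vocab`; it stands
    in for the masked-decoder forward pass from the assignment."""
    y_hat = ["<SOS>"]                               # start-of-sequence token
    for _ in range(max_len):
        probs = decoder(y_hat, encoder_output)      # uses the growing prefix
        next_token = vocab[int(np.argmax(probs))]   # greedy argmax pick
        if next_token == end_token:
            break
        y_hat.append(next_token)  # Jane -> Jane visits -> Jane visits Africa ...
    return y_hat[1:]

# toy run with a stub decoder that always predicts the next word of a fixed sentence
vocab = ["Jane", "visits", "Africa", "in", "September", "<EOS>"]
target = ["Jane", "visits", "Africa", "in", "September", "<EOS>"]

def stub_decoder(prefix, enc_out):
    probs = np.zeros(len(vocab))
    probs[vocab.index(target[len(prefix) - 1])] = 1.0  # deterministic next word
    return probs

print(greedy_decode(None, stub_decoder, vocab))
# -> ['Jane', 'visits', 'Africa', 'in', 'September']
```

(And my understanding is that this is exactly why the look-ahead mask only matters during training, when teacher forcing feeds the whole target sentence in at once.)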