In the attached picture, how do we repeat the encoder and decoder N times?
Are the N repetitions of the encoder and decoder applied in parallel or sequentially? And what is the input and output of each stack?
What is the video title and what is the time mark?
The video name is “Transformer Network”.
At two points, “N” repetition has been mentioned.
- at 1:45
- at 3:57
Sorry, I’m not able to access any of the Week 4 materials right now. I don’t understand why.
Hmmm, I just checked and I can see the lectures. I haven’t watched them in a while, so I’ll need to go through them again, but I won’t be able to do that until later today because of other commitments.
If I had to guess from memory, the N probably refers to the timesteps in the input. One of the big differences that makes Attention more powerful than the classic RNN/GRU/LSTM is that it handles the timesteps in parallel. In the typical case with a sentence as input, "timesteps" map to "words", of course.
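To illustrate the "timesteps in parallel" point: in scaled dot-product attention, all positions are compared against all other positions in a single matrix multiply, with no step-by-step recurrence. This is a minimal NumPy sketch of that idea (single head, no masking or projections):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # All timesteps are processed at once via matrix multiplies,
    # unlike an RNN/GRU/LSTM, which must walk through positions
    # one at a time.
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                          # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                                       # (seq_len, d_k)

seq_len, d_k = 5, 8
q = k = v = np.random.randn(seq_len, d_k)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (5, 8): one output vector per timestep, computed jointly
```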
Thanks for your reply.
The N refers to the number of repetitions of the encoder and decoder blocks in the transformer structure.
The thing I don't get is how they are repeated: is it sequential or parallel, and in either case, what exactly is the input and output of each block?
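For what it's worth, in the original "Attention Is All You Need" architecture the N encoder blocks are stacked sequentially: each block's output (a sequence of vectors of shape (seq_len, d_model)) is the next block's input, and the last encoder block's output feeds the decoder's cross-attention. A rough sketch, with `encoder_block` as a placeholder stand-in for the real self-attention + feed-forward sublayers:

```python
import numpy as np

def encoder_block(x):
    # Placeholder: a real block applies self-attention and a
    # feed-forward network, each with a residual connection and
    # layer norm. What matters here is that it preserves the
    # (seq_len, d_model) shape, so blocks can be chained.
    return x

def encoder_stack(x, n=6):
    # The N blocks run sequentially: each block consumes the
    # previous block's output. They are not N parallel copies.
    for _ in range(n):
        x = encoder_block(x)
    return x

x = np.zeros((10, 512))      # 10 input tokens, d_model = 512
out = encoder_stack(x, n=6)  # N = 6 in the original paper
print(out.shape)             # (10, 512): shape is preserved through the stack
```

So the parallelism in a transformer is across the timesteps *within* each block; the N blocks themselves form a sequential pipeline.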