How does the GRULM really work?


I also posted this question in the course forum, but it seems a bit inactive compared to this platform, so I'm posting it here too…

When I did the final assignment of the 2nd week, I looked for a construct like the scan function that was introduced in this week's labs, but I didn't find one.

Instead, I found this ShiftRight layer, with little documentation about it. Its interface is not clear to me. How do training and evaluation know to unroll the model for every word, and which inputs should connect to which outputs?

Is there an explanation for this? I didn't find one in the lectures of the 2nd week.



Hi Roee,

As mentioned in the notebook, you can find the source code for trax.layers.attention, which contains ShiftRight, here. It should clarify how the layer handles the difference between the train and eval modes on the one hand, which include a shift, and the predict mode on the other, which does not include a shift (if mode == 'predict': return x).
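For intuition, here is a minimal numpy sketch (not the Trax source) of the behavior that a ShiftRight-style layer implements for a batch of token-id sequences; the function name, the zero padding value, and the assumed shape [batch, seq_len] are my own choices for illustration:

```python
import numpy as np

def shift_right(x, n_positions=1, mode="train"):
    """Sketch of a ShiftRight-style layer for token-id arrays
    of assumed shape [batch, seq_len]."""
    if mode == "predict":
        # In predict mode tokens arrive one at a time, already offset,
        # so the layer is a no-op and just returns its input.
        return x
    # In train/eval mode, pad with zeros on the left and drop the last
    # n_positions tokens, so position t sees only earlier tokens as input.
    padded = np.pad(x, ((0, 0), (n_positions, 0)), constant_values=0)
    return padded[:, : x.shape[1]]

tokens = np.array([[5, 7, 9, 11]])
print(shift_right(tokens))                   # [[ 0  5  7  9]]
print(shift_right(tokens, mode="predict"))   # [[ 5  7  9 11]]
```

This is why no explicit scan shows up in the model definition: the shift only prepares the inputs so that the target at each position is the next token.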