model = tl.Serial(
    tl.ShiftRight(mode=mode),                               # Shift the input right by one position
    tl.Embedding(vocab_size=vocab_size, d_feature=d_model), # Token embedding layer
    [tl.GRU(n_units=d_model) for _ in range(n_layers)],     # n_layers stacked GRU layers, each with d_model units
    tl.Dense(n_units=vocab_size),                           # Project back to vocabulary size
    tl.LogSoftmax()                                         # Log-probabilities over the vocabulary
)
It took me a while to understand the purpose of this layer, but I think I get it now. Note that, in the Week 2 assignment, the inputs and targets are the same. If the input were “I am hungry” and the target were also “I am hungry,” the RNN would have the easy task of simply copying the input to the output at every step. The ShiftRight layer prepends a padding token and drops the last token, changing the input to " I am hungry." Thus the RNN now tries to predict “I” from “”, “am” from " I", and “hungry” from " I am."
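To make this concrete, here is a small NumPy sketch of what ShiftRight does to a batch of token IDs (this mimics the layer's behavior rather than calling Trax itself; the token IDs and the `shift_right` helper are made up for illustration):

```python
import numpy as np

def shift_right(x, n_positions=1, pad_value=0):
    # Mimic trax.layers.ShiftRight: prepend pad tokens along the
    # sequence axis and drop the same number of tokens from the end,
    # so the sequence length is unchanged.
    pad = np.full(x.shape[:-1] + (n_positions,), pad_value, dtype=x.dtype)
    return np.concatenate([pad, x], axis=-1)[..., :x.shape[-1]]

# Hypothetical IDs for "I am hungry"
tokens = np.array([[12, 7, 9]])
print(shift_right(tokens))  # [[ 0 12  7]]
```

So at each step the model sees only the tokens before the one it must predict: pad → “I”, pad “I” → “am”, pad “I am” → “hungry”.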