C4W1_Assignment difference between train and reference

During training, the post-attention model is an LSTM and considers the state of the previous timestep. However, during reference, it does not. Is this a problem?

Sorry, but I’m not sure I understand your point. The only difference between training and inference mode (which is what I assume you mean by “reference”) is that in inference mode you don’t have labels or gradients to apply. But the “forward propagation” of the model works the same way, right?

The post-attention layer is an LSTM, but the way it is called is not the same in both modes. Notice how it is called and used during inference.

To make your doubt clearer, it would be better to share a screenshot of your error (without sharing your code).

Hi @Jaimx ,

During training, we do NOT pass the LSTM cell state to the pre-attention decoder, as we want the cell states to be 0 while the decoder trains on the right-shifted ground truth (teacher forcing).

During inference, however, we DO pass the LSTM cell state to the pre-attention decoder, since we are predicting each token based on the previous one. For that, we want the LSTM cell to be initialized with the previous cell state (generated while processing the previous token) instead of 0.
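
To make this concrete, here is a minimal sketch of the difference (illustrative only, not the assignment's exact code; the layer names, sizes, and token handling below are assumptions). During training the LSTM consumes the whole right-shifted target sequence in one call, starting from the default zero state; during inference we call it one token at a time and feed each step's returned state into the next call:

```python
import tensorflow as tf

units, vocab_size, embed_dim = 64, 100, 32  # illustrative sizes

embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
decoder_lstm = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)

# --- Training: one call over the whole right-shifted target sequence. ---
# initial_state defaults to zeros, so every sequence starts from state 0.
tgt = tf.random.uniform((8, 10), maxval=vocab_size, dtype=tf.int32)  # (batch, seq_len)
seq_out, h, c = decoder_lstm(embedding(tgt))

# --- Inference: one token per call, carrying the state forward. ---
token = tf.constant([[1]])   # e.g. a start-of-sequence id (placeholder)
state = None                 # first step starts from zeros, same as training
for _ in range(5):           # generate a few tokens (illustrative)
    out, h, c = decoder_lstm(embedding(token), initial_state=state)
    state = [h, c]           # previous step's state initializes the next call
    # ... project `out` to the vocabulary and pick the next token (omitted)
    token = tf.constant([[2]])  # placeholder next token for this sketch
```

The key line is `initial_state=state`: omit it (or pass zeros) and each step forgets everything before it, which is exactly what happens if the inference loop is wired up like the training call.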