Can you please explain the concept of Teacher Forcing? Why is the target sequence right-shifted, and what effect does this have?
Training compute/time for deep recurrent neural networks and transformers scales linearly with the number of epochs, so anything that slows convergence is costly. If, during training, the network's own prediction at one token/time-step is fed back as the input to the next step, an early mistake compounds: every subsequent step is conditioned on a wrong history. Backpropagation then computes gradients from this compounded error, which makes the weight updates noisy and can increase the number of epochs it takes for training to converge. Teacher forcing alleviates this by feeding the actual token/true value (rather than the model's prediction) as the input at each training step, so every prediction is conditioned on a correct history. In practice this is implemented by right-shifting the decoder input by one position: at step i the model receives the true token from step i-1 and is trained to predict the token at step i. This tends to stabilize the gradient signal, may decrease the number of epochs required for convergence, and may avoid other pitfalls of gradient-based methods.
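To make the right shift concrete, here is a minimal PyTorch sketch of one teacher-forced training step. Everything in it (TinyDecoder, the BOS token id, the vocabulary/batch/sequence sizes) is invented for illustration and is not from the classroom notebook; the point is only how decoder_input is built from target.

```python
import torch
import torch.nn as nn

# Hypothetical toy decoder (TinyDecoder is made up for illustration):
# embeds tokens, runs a GRU, and predicts a distribution over the next token.
class TinyDecoder(nn.Module):
    def __init__(self, vocab_size=100, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)                        # (batch, seq_len, vocab)

model = TinyDecoder()
loss_fn = nn.CrossEntropyLoss()

BOS = 1                                           # assumed <bos> token id
target = torch.randint(2, 100, (4, 10))           # fake batch of target ids

# Teacher forcing: the decoder INPUT is the target right-shifted by one,
# with <bos> filling the vacated first slot. At position i the model is
# conditioned on the TRUE tokens 0..i-1 and trained to predict token i.
decoder_input = torch.cat([torch.full((4, 1), BOS), target[:, :-1]], dim=1)

logits = model(decoder_input)
loss = loss_fn(logits.reshape(-1, logits.size(-1)), target.reshape(-1))
loss.backward()   # gradients flow through predictions made from true history
```

Note that at inference time there is no target to shift, so the same model runs autoregressively: each predicted token is fed back as the next input, which is exactly the error-compounding regime that teacher forcing sidesteps during training.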
Thank you for your explanation. Why is it right-shifted by one? Why isn't it right-shifted by 2 or more?