W4 - Help with training the Transformer model built in the assignment

Hi,
Here is an example. The Transformer implementation is almost the same, except for the padding and look-ahead masks; please refer to this thread.
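Because the masks are where implementations usually differ, here is a minimal, framework-free sketch of what they do. It is not the assignment's code. It assumes token id 0 is padding and uses the convention "1 = may attend, 0 = masked". Masked logits are pushed toward -inf with `(1 - mask) * -1e9` before the softmax. Other implementations invert this convention, and mixing the two is a common source of bugs when porting code. All function names here are hypothetical.

```python
import math

PAD_ID = 0  # assumption: token id 0 is reserved for padding

def create_padding_mask(token_ids):
    """Return 1.0 for real tokens and 0.0 for padding (assumed: 1 = may attend)."""
    return [[0.0 if t == PAD_ID else 1.0 for t in seq] for seq in token_ids]

def create_look_ahead_mask(size):
    """Lower-triangular mask: query position i may attend to key positions j <= i."""
    return [[1.0 if j <= i else 0.0 for j in range(size)] for i in range(size)]

def combine_masks(padding_row, look_ahead):
    """Allow a key position only if both masks allow it (element-wise minimum)."""
    return [[min(la, p) for la, p in zip(row, padding_row)] for row in look_ahead]

def masked_softmax(logits, mask_row):
    """Push masked logits toward -inf so they receive ~0 attention weight."""
    adjusted = [x + (1.0 - m) * -1e9 for x, m in zip(logits, mask_row)]
    top = max(adjusted)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in adjusted]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    batch = [[5, 7, 9, 0]]  # one sequence; the last position is padding
    pad = create_padding_mask(batch)
    look_ahead = create_look_ahead_mask(4)
    combined = combine_masks(pad[0], look_ahead)
    print(pad)          # [[1.0, 1.0, 1.0, 0.0]]
    print(combined[3])  # [1.0, 1.0, 1.0, 0.0]
    # Query position 1 can only see key positions 0 and 1.
    print(masked_softmax([1.0, 2.0, 3.0, 4.0], combined[1]))
```

If your training loss behaves strangely after swapping in a different Transformer implementation, first check which of the two conventions each mask follows.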
