W4 - Help with training the Transformer model built in the assignment

I’m wondering if anyone could help me find some resources that might explain how to train the Transformer model built in the assignment.

As soon as I finished the assignment, I wanted to see if I could use this implementation to train a transformer on a dataset of indexed input/output sentences I already had prepared.

I tried different variations of
model = Transformer(…)
optimizer = …
model.compile(…)
model.fit(…)
and invariably get the following error:
"Models passed to fit can only have training and the first argument in call as positional arguments, found: ['tar', 'enc_padding_mask', 'look_ahead_mask', 'dec_padding_mask']"

In the TensorFlow documentation for tf.keras.Model, I see that the call method should use a list of inputs and a list of masks. Should I try changing the assignment code to match this structure, or am I wasting my time trying to train the model in this way?

I’m a bit frustrated that this assignment uses the APIs in ways that are completely unprecedented and unexplained in the DeepLearning.AI courses, and that it doesn’t provide an example of training on some data.

Hi,
Here is an example. The Transformer implementation is almost the same, except for the padding and look-ahead masks; for those, please refer to this thread.
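For reference, here is a minimal sketch of one way around the fit error, not the linked example itself. It assumes the assignment model's call signature is (inp, tar, training, enc_padding_mask, look_ahead_mask, dec_padding_mask), as the error message suggests. ToyTransformer is a hypothetical stand-in with that signature so the snippet runs on its own, and FitWrapper is a hypothetical wrapper whose call takes a single (inp, tar_in) tuple, which is the shape fit expects. The masks are passed as None here; in real use they would be built inside the wrapper with the assignment's own mask helpers.

```python
import numpy as np
import tensorflow as tf


class ToyTransformer(tf.keras.Model):
    """Hypothetical stand-in with the same call signature as the assignment model."""

    def __init__(self, vocab_size, d_model=16):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, d_model)
        self.final_layer = tf.keras.layers.Dense(vocab_size)

    def call(self, inp, tar, training=False, enc_padding_mask=None,
             look_ahead_mask=None, dec_padding_mask=None):
        # The masks are accepted but ignored; the real model would use them.
        context = tf.reduce_mean(self.embedding(inp), axis=1, keepdims=True)
        return self.final_layer(self.embedding(tar) + context)


class FitWrapper(tf.keras.Model):
    """Hypothetical wrapper: call() takes one positional argument, a tuple."""

    def __init__(self, transformer):
        super().__init__()
        self.transformer = transformer

    def call(self, inputs, training=False):
        inp, tar_in = inputs
        # Assumption: build the real masks here with the assignment's helpers.
        return self.transformer(
            inp, tar_in, training=training,
            enc_padding_mask=None, look_ahead_mask=None, dec_padding_mask=None)


vocab_size = 20
inp = np.random.randint(1, vocab_size, size=(8, 5))  # indexed input sentences
tar = np.random.randint(1, vocab_size, size=(8, 6))  # indexed output sentences
# Teacher forcing: the decoder sees tar[:, :-1] and predicts tar[:, 1:].
tar_in, tar_out = tar[:, :-1], tar[:, 1:]

model = FitWrapper(ToyTransformer(vocab_size))
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
history = model.fit((inp, tar_in), tar_out, epochs=1, batch_size=4, verbose=0)
```

To try this with the assignment code, the idea would be to pass the assignment's Transformer instance to FitWrapper instead of ToyTransformer. If the real model ends in a softmax rather than logits, the loss would need from_logits=False, and padded positions would normally be masked out of the loss as well.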


That’s very helpful. Thanks!