W4 - Help with training the Transformer model built in the assignment

branderhorst · May 16, 2021, 4:56pm

I’m wondering if anyone could help me find some resources that might explain how to train the Transformer model built in the assignment.

As soon as I finished the assignment, I wanted to see if I could use this implementation to train a transformer on a dataset of indexed input/output sentences I already had prepared.

I tried different variations of
model = Transformer(…)
optimizer = …
model.compile(…)
model.fit(…)
and invariably get the following error:
"Models passed to fit can only have training and the first argument in call as positional arguments, found: ['tar', 'enc_padding_mask', 'look_ahead_mask', 'dec_padding_mask']"

In the TensorFlow documentation for tf.keras.Model, I see that the call method should use a list of inputs and a list of masks. Should I try changing the assignment code to match this structure, or am I wasting my time trying to train the model in this way?

I’m a bit frustrated that this assignment uses the APIs in ways that are completely unprecedented and unexplained in the DeepLearning.AI courses, and also doesn’t provide an example of training on some data.

edwardyu · May 17, 2021, 9:11am

Hi,
Here is an example. The Transformer implementation there is almost the same as the assignment's, except for the padding and look-ahead masks; for those, please refer to this thread.
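
In case it helps, here is a minimal sketch of one way around the fit error: skip model.fit and write the training step by hand with tf.GradientTape, so the model can keep its multi-argument call. Everything below is an assumption for illustration, not the assignment's code: ToySeq2Seq is a hypothetical stand-in that only copies the argument names from the error message, masked_loss and train_step are made-up helper names, padding is assumed to use token id 0, and the masks are passed as None where you would pass the ones built by the assignment's mask helpers.

```python
import tensorflow as tf


class ToySeq2Seq(tf.keras.Model):
    """Hypothetical stand-in that mimics the call signature from the error message."""

    def __init__(self, vocab_size, d_model=16):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(vocab_size, d_model)
        self.out = tf.keras.layers.Dense(vocab_size)  # returns logits, no softmax

    def call(self, inp, tar, training, enc_padding_mask,
             look_ahead_mask, dec_padding_mask):
        # The masks are ignored here; a real Transformer uses them in attention.
        context = tf.reduce_mean(self.embed(inp), axis=1, keepdims=True)
        logits = self.out(self.embed(tar) + context)
        return logits, {}  # (predictions, placeholder for attention weights)


def masked_loss(real, logits, pad_id=0):
    """Cross-entropy averaged over non-padding target tokens only."""
    per_token = tf.keras.losses.sparse_categorical_crossentropy(
        real, logits, from_logits=True)
    mask = tf.cast(tf.not_equal(real, pad_id), per_token.dtype)
    return tf.reduce_sum(per_token * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)


def train_step(model, optimizer, inp, tar):
    # Teacher forcing: the decoder sees tar[:, :-1] and predicts tar[:, 1:].
    tar_inp, tar_real = tar[:, :-1], tar[:, 1:]
    with tf.GradientTape() as tape:
        logits, _ = model(
            inp, tar_inp, training=True,
            enc_padding_mask=None,   # assumption: replace with real masks
            look_ahead_mask=None,
            dec_padding_mask=None)
        loss = masked_loss(tar_real, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss


# Tiny random batch of token ids, just to exercise the loop.
tf.random.set_seed(0)
inp = tf.random.uniform((8, 6), minval=1, maxval=20, dtype=tf.int32)
tar = tf.random.uniform((8, 7), minval=1, maxval=20, dtype=tf.int32)

model = ToySeq2Seq(vocab_size=20)
optimizer = tf.keras.optimizers.Adam(1e-2)
losses = [float(train_step(model, optimizer, inp, tar)) for _ in range(30)]
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

If your model's final layer already applies a softmax, set from_logits=False in the loss. The other route is the one you suggested: change call so it takes a single packed argument such as (inp, tar) and builds the masks inside, which lets compile and fit work as usual.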

branderhorst · May 17, 2021, 1:42pm

That’s very helpful. Thanks!
