C1W1 Training, compile_and_train(translator) fails in Epoch 1. All prior unit tests passed

I have tried refreshing my workspace, and I retyped/reran all the prior code cells in a fresh notebook. I have made NO modifications outside the designated sections. All unit tests pass, and all output shapes match the expected values. One other note: I tried submitting to the grader to see if I would get any useful feedback, and I get zero on all sections with the message, “There was a problem compiling the code from your notebook. Details:
invalid syntax (, line 497)” I’ve never submitted before without completing the entire assignment, so I’m not sure whether that failure is just because the assignment is incomplete or whether it is a real/related issue.

Here is the snippet from the train step error:

TypeError: in user code:

    File "/usr/local/lib/python3.8/dist-packages/keras/src/engine/training.py", line 1338, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.8/dist-packages/keras/src/engine/training.py", line 1322, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.8/dist-packages/keras/src/engine/training.py", line 1303, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.8/dist-packages/keras/src/engine/training.py", line 1081, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "/usr/local/lib/python3.8/dist-packages/keras/src/engine/training.py", line 1139, in compute_loss
        return self.compiled_loss(
    File "/usr/local/lib/python3.8/dist-packages/keras/src/engine/compile_utils.py", line 317, in __call__
        self._total_loss_mean.update_state(
    File "/usr/local/lib/python3.8/dist-packages/keras/src/utils/metrics_utils.py", line 77, in decorated
        update_op = update_state_fn(*args, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/keras/src/metrics/base_metric.py", line 140, in update_state_fn
        return ag_update_state(*args, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/keras/src/metrics/base_metric.py", line 509, in update_state  **
        sample_weight = tf.__internal__.ops.broadcast_weights(
    File "/usr/local/lib/python3.8/dist-packages/keras/src/engine/keras_tensor.py", line 285, in __array__
        raise TypeError(

    TypeError: You are passing KerasTensor(type_spec=TensorSpec(shape=(), dtype=tf.float32, name=None), name='Placeholder:0', description="created by layer 'tf.cast_23'"), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. Keras Functional model construction only supports TF API calls that *do* support dispatching, such as `tf.math.add` or `tf.reshape`. Other APIs cannot be called directly on symbolic Kerasinputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer `call` and calling that layer on this symbolic input/output.

Any help would be greatly appreciated.

Thanks!

Hi @WiseRaptor

I’m not sure I fully understand the problem. Which exercise are you struggling with? From what I understand, your code passes all the tests (and you haven’t completed the rest), but you wanted to know what the grader evaluation means?

Regards

No, my problem is that I can’t complete the rest of the assignment because the training step fails. The training step is provided code, so I haven’t touched it other than to try to execute it. I completed all the exercises BEFORE the training step, Exercises 1-4, and they passed the unit tests. The heart of the error message is this:

TypeError: You are passing KerasTensor(type_spec=TensorSpec(shape=(), dtype=tf.float32, name=None), name='Placeholder:0', description="created by layer 'tf.cast_23'"), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. Keras Functional model construction only supports TF API calls that *do* support dispatching, such as `tf.math.add` or `tf.reshape`. Other APIs cannot be called directly on symbolic Kerasinputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer `call` and calling that layer on this symbolic input/output.

I only tried the grader to see if I had made a mistake upstream that had gotten past the unit tests. When I Google this error message, I find a lot of posts saying that disabling eager execution is the issue, but since others have passed this assignment with the same configuration, I doubt that’s it.

I suspect either there is some weird notebook issue, or that perhaps the issue lies in my implementation of the Encoder. I really wrestled with getting it to work. My final implementation was simple: just passing the context tensor explicitly through keras.Input to the embedding layer - a single line of code. Do I need to implement that differently? I wonder if I need to define that using the shape= instead?

Thanks!

I don’t know this assignment, but the point is that even though you didn’t write the training cell, it uses the earlier code you wrote, right? It’s training the model that you defined. So the error suggests there is something wrong with how you wrote it, having to do with a scalar tensor that it looks like you created with tf.cast. I’m just guessing from the error message, but you are passing that tensor to a TF API that doesn’t accept such an input, because it’s not a Keras “Layer”. Here’s a thread which gives a nice explanation of the Keras Sequential and Functional APIs.
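
The workaround the error message itself suggests looks roughly like this (a generic sketch, not assignment code): wrap the offending op in a custom Layer, so that by the time the op runs it sees a real tensor rather than a symbolic KerasTensor.

    import tensorflow as tf

    # Illustrative only: tf.map_fn is one of the APIs the error message
    # names as non-dispatchable on symbolic KerasTensors.
    class MapFnLayer(tf.keras.layers.Layer):
        def call(self, inputs):
            # Inside call(), inputs is an ordinary Tensor, so tf.map_fn is fine.
            return tf.map_fn(tf.square, inputs)

    inputs = tf.keras.Input(shape=(5,))
    # outputs = tf.map_fn(tf.square, inputs)  # would raise the TypeError above
    outputs = MapFnLayer()(inputs)            # works
    model = tf.keras.Model(inputs, outputs)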

The reason the grader fails altogether is that it can’t even compile your code, because the later parts are incomplete. Something there is a Python syntax error, so the grader can’t get as far as running any of your earlier functions.
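
For example (purely hypothetical, but typical of these notebooks), a later cell still containing an unfinished stub like this fails with “invalid syntax” the moment the grader compiles the notebook:

    # a hypothetical unfinished cell further down the notebook
    logits =    # YOUR CODE HERE   <-- "invalid syntax" at compile time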

Sorry that I can only give “generic” as opposed to assignment-specific guidance, but I’m not sure what timezone Arvydas is in, so I thought it was worth a try …

As you do not need tf.cast(..) in this assignment, I would guess this error comes from deeper in the training code.

As Paul mentioned, passing tests does not guarantee that your code is correct.

There are many ways this error could be caused. For example, when you implement class CrossAttention(..), there is a line:

        ### END CODE HERE ###

        x = self.add([target, attn_output])

If your attn_output is a “layer”, then it cannot be added to a tensor.
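
To illustrate the distinction generically (this is not the assignment’s code):

    import tensorflow as tf

    # A Layer object vs. the tensor that calling the layer produces.
    mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)  # a Layer
    add = tf.keras.layers.Add()

    target = tf.keras.Input(shape=(None, 16))
    context = tf.keras.Input(shape=(None, 16))

    attn_output = mha(query=target, value=context)  # a tensor (the Layer's output)
    x = add([target, attn_output])   # OK: adding two tensors
    # x = add([target, mha])         # wrong: mha itself is a Layer, not a tensor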

Paul and Arvy, first of all, thank you for your help with this. This is my first time asking for help on Coursera, but I have benefited greatly from your replies to others and learned a lot. I just wanted to say thanks!

Now back to the problem: I added tracing statements to my code to see where it is failing, and the training is apparently failing after the ‘return logits’ statement in the translator. It looks like it’s getting through all my code. I added a print statement for what is being returned: the tensor returned during the unit test looks fine, but the training is returning a tensor with None in the dims. Like this:

Unit test:
logits KerasTensor(type_spec=TensorSpec(shape=(64, 14, 10000), dtype=tf.float32, name=None), name='decoder_30/dense_30/LogSoftmax:0', description="created by layer 'decoder_30'")
All tests passed!

Training:
logits KerasTensor(type_spec=TensorSpec(shape=(None, None, 12000), dtype=tf.float32, name=None), name='decoder_28/dense_28/LogSoftmax:0', description="created by layer 'decoder_28'")

It turns out that the tensor being passed to the encoder during training is:
context: Tensor("IteratorGetNext:0", shape=(None, None), dtype=int64).

This is because the trainer is passing this to the translator to begin with:
t inputs (<tf.Tensor 'IteratorGetNext:0' shape=(None, None) dtype=int64>, <tf.Tensor 'IteratorGetNext:1' shape=(None, None) dtype=int64>)

Any suggestions? The training step doesn’t seem to be passing any data?

Thanks again!

Given that the training code and data input code are provided, and it’s not passing any training data (and the data seems to have imported successfully, given that the cells that explored the data worked), should I be contacting tech support somehow?

Sorry, but I don’t think there is any “Tech Support” other than the mentors. Maybe Arvydas knows how to contact the course authors for NLP. The next step is to look at your actual notebook, but we are not allowed to do that publicly, and note that the mentors cannot directly look at your notebooks: those are private to you. We can, however, share it on a Direct Message thread. I will start one and cc Arvydas. Check your messages. You can recognize DMs by the little envelope icon.

It is using the training data and validation data that it imported from utils.py. Here’s the training cell that is given to you:

def compile_and_train(model, epochs=20, steps_per_epoch=500):
    model.compile(optimizer="adam", loss=masked_loss, metrics=[masked_acc, masked_loss])

    history = model.fit(
        train_data.repeat(),
        epochs=epochs,
        steps_per_epoch=steps_per_epoch,
        validation_data=val_data,
        validation_steps=50,
        callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)],
    )

    return model, history

It’s the “fit()” call there where the real action happens. Notice that it passes the imported training data and validation data as input. And it’s training the model that you defined in the earlier cells, so it’s the point I made before: just because you didn’t write the training code does not mean that it is independent of your code.
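
As an aside, the (None, None) shapes in your trace are probably not a sign that no data is being passed: tf.data pipelines with padded, variable-length batches report dynamic dimensions when Keras traces the training graph. Here’s a toy sketch (not the course’s utils.py pipeline) showing the same thing:

    import tensorflow as tf

    # Toy stand-in for train_data: variable-length sequences padded into batches.
    seqs = tf.ragged.constant([[1, 2], [3, 4, 5], [6]], dtype=tf.int64)
    ds = tf.data.Dataset.from_tensor_slices(seqs).padded_batch(2)

    print(ds.element_spec)
    # TensorSpec(shape=(None, None), dtype=tf.int64, name=None)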

But then all your previous cells defining the components of the model pass their test cases. So that leaves us with a theory that there are two bugs:

  1. There is somehow a bug in how you defined the various layers that form the model.
  2. There is a bug in the test cases because they do not catch whatever your bug is.

But I admit that I don’t have a clue yet what the nature of the problem is. Stay tuned …

Of course the other high level point here is that this version of the course that uses TensorFlow instead of Trax is very new: I think they released it in December. So it’s entirely possible that there are bugs in the course material and that you as an “early adopter” have stepped on one of the landmines in the code.

Paul

If I make this simple change, then the training runs:

{moderator edit - solution code removed}

I commented out your original code and supplied the new line.

Thanks Paul! That worked! I would have sworn I had already tried that. I really appreciate your help!

Great! But as I mentioned, this still leaves us with the question of why your code passes the “unit tests” in the notebook. I’ll have to take a look at those next …

Oops, sorry, I was confused and thought I was responding on the private DM thread where you sent us the notebook. I shouldn’t have posted the code here on the public thread, but it’s only one line. Now that you have it fixed, I have edited the post to hide the code.

If someone else in the future has the exact same error - the OP had a mistake in Exercise 1 - Encoder:

        # Pass the context through the embedding layer
        x = self.embedding(tf.keras.Input(tensor=context))

This would pass the unit tests but would cause the error during training, since this is not the correct way of doing the embedding.
We use the call method to define the forward pass of our layer, and we would expect the input to be passed as an argument to the call method, not created within it.

So the correct way of implementing the embedding should be:

        # Pass the context through the embedding layer
        x = self.embedding(context)
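
As a generic sketch of the pattern (not the assignment’s actual Encoder), the input should already be a tensor by the time call() runs:

    import tensorflow as tf

    # Generic illustration only: call() receives the input tensor as an
    # argument and simply uses it.
    class TinyEncoder(tf.keras.layers.Layer):
        def __init__(self, vocab_size, units):
            super().__init__()
            self.embedding = tf.keras.layers.Embedding(vocab_size, units)

        def call(self, context):
            # context is already a tensor; never create a tf.keras.Input here
            return self.embedding(context)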

Cheers
