C3W2 - Problem fitting the model

I get all the expected outputs and pass all the prior tests, but when I fit my model (running the given cell in section 4.4) I get the following error:
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 104 for ‘{{node Squeeze}} = SqueezeT=DT_FLOAT, squeeze_dims=[-1]’ with input shapes: [?,104].

104 is the length of each sentence in the training set.
The function that fails is:
    model.fit(train_dataset.batch(BATCH_SIZE),
              validation_data=val_dataset.batch(BATCH_SIZE),
              shuffle=True,
              epochs=2)

I’m stuck and can’t figure out what’s wrong. I would really appreciate some help.
Thanks.

I suspect the issue is in my masked_accuracy function. It passes all the tests but not the grader.

Hi @Alexandre_Duriez,

Thank you for your post. I wanted to remind you that sharing graded code publicly on the forum is against our community guidelines. We encourage discussing concepts, errors, and coding techniques, but direct sharing of code that is part of an assignment or test can compromise academic integrity.

Kindly remove the posted graded code. If you’re encountering specific errors or need guidance on a particular aspect of your code, feel free to describe the issue or error messages you’re facing. Our mentors can provide help based on that information.

If there’s a need for a detailed code review or if you wish to share your code for personalized assistance, please use the private messaging feature to share it with a mentor. We’re here to help while ensuring that we maintain a fair and honest learning environment.

Thank you for understanding and for contributing to our community!

Hi @Alexandre_Duriez

On the last line:

    # Compute masked accuracy (quotient between the total matches and the total valid values, i.e., the amount of non-masked values)

we do not specify the axis parameter. In other words, we want to count all the correctly predicted labels (in our case by summing matches_true_pred, which already accounts for padded tokens) and divide by the number of valid, non-masked elements (in our case by summing mask, which also accounts for padded tokens). Both of these sums are scalars, not vectors.
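To make the scalar-versus-vector point concrete, here is a small toy sketch. It is not the assignment code: it uses NumPy instead of TensorFlow, the array values are made up, and it assumes padding is marked with the label -1. Summing without an axis collapses everything to one number, while summing with axis=-1 leaves one value per sentence. As far as I know, tf.reduce_sum behaves the same way when axis is omitted.

```python
import numpy as np

# Toy batch: 2 sentences, 4 tokens each.
# Assumption for this sketch only: the label -1 marks a padded token.
y_true = np.array([[1, 2, 0, -1],
                   [3, 1, -1, -1]])
y_pred = np.array([[1, 2, 1, 0],
                   [3, 0, 0, 0]])  # hypothetical predicted class per token

# 1.0 for real tokens, 0.0 for padding
mask = (y_true != -1).astype(np.float32)

# 1.0 where the prediction is right AND the token is not padding
matches = (y_true == y_pred).astype(np.float32) * mask

# No axis given: sum over every element, so both totals are scalars
total_matches = matches.sum()
total_valid = mask.sum()
accuracy = total_matches / total_valid

# With axis=-1: one value per sentence, so the result is a vector of shape (2,)
per_sentence = matches.sum(axis=-1) / mask.sum(axis=-1)

print(np.ndim(accuracy), per_sentence.shape)
```

The first quantity is a single number for the whole batch, which is what the comment in the notebook describes. The second keeps a per-sentence dimension, which is the kind of shape mismatch worth checking for when the fit step complains about dimensions.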

Cheers

Thank you very much.