C2W4 assignment, training model

Hi,

I am trying to pass the C2W4 assignment and got the error when training the model:

    File "/usr/local/lib/python3.10/dist-packages/keras/src/losses.py", line 2221, in categorical_crossentropy
        return backend.categorical_crossentropy(
    File "/usr/local/lib/python3.10/dist-packages/keras/src/backend.py", line 5575, in categorical_crossentropy
        target.shape.assert_is_compatible_with(output.shape)

    ValueError: Shapes (None, 1) and (None, 26) are incompatible

When I created my model like this, I still need a last layer of 26 nodes (or 25, which gives a slight improvement). So I don't understand why a shape of (None, 1) is expected.

{moderator edit - solution code removed}

I had the correct output in the previous execution:

**Expected Output:**

    Images of training generator have shape: (27455, 28, 28, 1)
    Labels of training generator have shape: (27455,)
    Images of validation generator have shape: (7172, 28, 28, 1)
    Labels of validation generator have shape: (7172,)

Any suggestions? Thank you.

I can’t seem to find this code in the C2W4 assignment you mentioned. You filed this issue under Deep Learning Specialization > Convolutional Neural Networks - is that correct? Can you give us the name of the course and the name of the assignment?

From the error message and expected output you provided, I think it expects the last layer in your code to output a 1-dimensional value, so the commented-out line for Dense(1, …) seems necessary, but I can't tell what is correct without knowing the actual assignment you're referring to.

Yes, it is “Multi-class Classification” under the DeepLearning.AI TensorFlow Developer Professional Certificate > Convolutional Neural Networks in TensorFlow Coursera course.

It is true that I don't get any error if I use a 1-dimensional last layer. But since this is a multi-class problem, that doesn't look adequate, no?

{moderator edit - solution code removed}

I think the problem is that train_generator does not provide the labels in multi-class (one-hot) form, so I tried to convert them to categorical, but the execution takes too long and never finishes in Colab:

    # Save your model
    model = create_model()
    tf.keras.utils.to_categorical(train_generator, 26)

    # Train your model
    history = model.fit(train_generator,
                        epochs=15,
                        validation_data=validation_generator)

Well, that explains why I could not find this assignment in the DLS courses.

You can use the “pencil” icon in the thread title to move this to the correct forum area.

Once there, you’ll probably get a warning about posting your code on the forum. That’s not allowed by the Code of Conduct.

If the model is outputting a 26-class softmax multiclass classifier, then you're right that it doesn't really make sense to reduce that to one neuron for the output. Maybe the problem is that your labels are in “categorical” form (one element with a value between 0 and 25 inclusive) and you're just using the wrong loss function. In this kind of multiclass case, you have two choices:

  1. Convert your Y values to “one hot” form and use CategoricalCrossEntropy as the loss function.
  2. Leave the Y values in categorical form and use SparseCategoricalCrossEntropy as the loss function, which will do the one hot conversion internally.

My guess is that you used CategoricalCrossEntropy but your labels are in categorical form, which is why you get that shape mismatch error.
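
To make those two options concrete, here is a minimal sketch. The model and the labels here are hypothetical stand-ins, just to make the sketch self-contained; note also that to_categorical is applied to the label array itself, never to the generator object:

    import numpy as np
    import tensorflow as tf

    # A trivial stand-in model with a 26-node softmax output
    # (hypothetical, not the assignment's architecture).
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(26, activation='softmax'),
    ])

    # Hypothetical integer labels in [0, 25], shape (m,)
    train_labels = np.array([0, 3, 25])

    # Option 1: one-hot encode the labels ((m,) -> (m, 26)) and use the
    # non-sparse loss.
    one_hot_labels = tf.keras.utils.to_categorical(train_labels, num_classes=26)
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=['accuracy'])

    # Option 2: leave the labels as integers and use the sparse loss,
    # which does the one-hot conversion internally.
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])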

But I’m in the same situation as hackyon: I don’t know this course, so don’t really know the details of what this assignment is doing.


Hi Paul,
Yes, I think you're right about the labels. The labels are already in categorical form, which I didn't notice at the beginning of the assignment: “The first value is the label (the numeric representation of each letter)”. So that explains why a one-dimensional shape is expected at the end.

You can do “argmax” on the (m, 26) softmax outputs to get the actual class predictions, but note that the loss function is based on softmax, so it needs all 26 values in the “probability distribution” form for the \hat{Y} values. So the option is not to make your network output an (m, 1) tensor, but to make your cost function able to cope with (m, 1) labels, right? That was what I was trying to say in my previous post. :nerd_face:
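
As a minimal sketch of that argmax step (model and x_val are hypothetical placeholders for a trained model and some validation images):

    import numpy as np

    # probs has shape (m, 26): one softmax probability distribution per row
    probs = model.predict(x_val)

    # Collapse the class axis to get (m,) integer class predictions
    predicted_classes = np.argmax(probs, axis=1)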

Thank you, yes, my cost function has to be able to cope with (m, 1) labels. “Sparse Categorical Crossentropy” was the appropriate categorical loss for this purpose; together with an (m, 25) softmax last layer, it resolved the case.
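
For future readers, a minimal sketch of that final combination, assuming the 28x28x1 grayscale inputs from this thread; the model body is a hypothetical placeholder, not the assignment's solution:

    import tensorflow as tf

    # Hypothetical sketch: the essential points are the 25-node softmax
    # output and the sparse loss; the Flatten body is just a placeholder.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(25, activation='softmax'),  # one node per label
    ])

    # The sparse loss accepts integer labels of shape (m,) directly and
    # performs the one-hot conversion internally.
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])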