Coursera Grader ValueError: Shapes (None, 1) and (None, 24) are incompatible

In Google Colab, the shapes have to be made compatible for the model to run, as already mentioned in a similar topic and here: python - ValueError: Shapes (None, 1) and (None, 3) are incompatible - Stack Overflow

So, in my first submission I tried importing to_categorical from keras.utils, and the grader gave an error:

There was a problem compiling the code from your notebook. Details:
cannot import name ‘to_categorical’ from ‘keras.utils’ (/opt/conda/lib/python3.7/site-packages/keras/utils/

Then I changed the syntax to:

training_labels_encoded = tf.keras.utils.to_categorical(training_labels, num_classes=num_classes)
validation_labels_encoded = tf.keras.utils.to_categorical(validation_labels, num_classes=num_classes)

Now I am getting the Coursera grader error below, while in Google Colab everything works well and the model reaches all the required benchmarks:

All tests passed for parse_data_from_input!
All tests passed for train_val_generators!

Details of failed tests for create_model

Failed test case: your model could not be used for inference. Details shown in 'got' value below:.
Expected: no exceptions, but got: in user code:

    File "/opt/conda/lib/python3.7/site-packages/keras/engine/", line 1366, in test_function *
        return step_function(self, iterator)
    File "/opt/conda/lib/python3.7/site-packages/keras/engine/", line 1356, in step_function **
        outputs =, args=(data,))
    File "/opt/conda/lib/python3.7/site-packages/keras/engine/", line 1349, in run_step **
        outputs = model.test_step(data)
    File "/opt/conda/lib/python3.7/site-packages/keras/engine/", line 1306, in test_step
        y, y_pred, sample_weight, regularization_losses=self.losses)
    File "/opt/conda/lib/python3.7/site-packages/keras/engine/", line 201, in call
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "/opt/conda/lib/python3.7/site-packages/keras/", line 141, in call
        losses = call_fn(y_true, y_pred)
    File "/opt/conda/lib/python3.7/site-packages/keras/", line 245, in call **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "/opt/conda/lib/python3.7/site-packages/keras/", line 1665, in categorical_crossentropy
        y_true, y_pred, from_logits=from_logits, axis=axis)
    File "/opt/conda/lib/python3.7/site-packages/keras/", line 4994, in categorical_crossentropy
        target.shape.assert_is_compatible_with(output.shape)

    ValueError: Shapes (None, 1) and (None, 24) are incompatible .

Please help if anyone knows a workaround that satisfies both model training in Google Colab and the Coursera grader.

Please look at this output:

Images of training generator have shape: (27455, 28, 28, 1)
Labels of training generator have shape: (27455,)
Images of validation generator have shape: (7172, 28, 28, 1)
Labels of validation generator have shape: (7172,)

This suggests that the labels aren’t one-hot encoded. Keeping this in mind:

  1. Number of outputs in the NN should match the number of classes.
  2. When labels aren’t one-hot encoded, use the proper version of the loss function to build the model and make the grader happy (see course 1 week 2 ungraded lab beyond_hello_world.ipynb to jog your memory).
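For reference, here is a NumPy-only sketch of the two losses. It is an illustration, not Keras's implementation, and the helper names categorical_ce and sparse_categorical_ce are hypothetical. The traceback above shows the grader evaluating the model with integer labels of shape (None, 1), which the categorical loss rejects against a 24-unit softmax output; the sparse version instead uses each integer label to pick one predicted probability, and gives the same value as the categorical loss applied to one-hot labels.

```python
import numpy as np

def categorical_ce(y_true, y_pred):
    """Mean cross-entropy for one-hot targets; shapes must match exactly."""
    if y_true.shape != y_pred.shape:
        raise ValueError(f"Shapes {y_true.shape} and {y_pred.shape} are incompatible")
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=-1)))

def sparse_categorical_ce(y_true, y_pred):
    """Mean cross-entropy for integer targets of shape (N,) or (N, 1)."""
    labels = np.asarray(y_true).reshape(-1).astype(int)
    picked = y_pred[np.arange(len(labels)), labels]  # probability of the true class
    return float(-np.mean(np.log(picked)))

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 24))
y_pred = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax rows
labels = np.array([[0], [5], [23], [10]])  # integer labels, shape (4, 1)

# The sparse loss accepts the integer labels directly.
sparse_loss = sparse_categorical_ce(labels, y_pred)

# The categorical loss on the same integer labels reproduces the shape error...
try:
    categorical_ce(labels.astype(float), y_pred)
except ValueError as err:
    print(err)  # Shapes (4, 1) and (4, 24) are incompatible

# ...but gives the same value once the labels are one-hot encoded.
one_hot = np.eye(24)[labels.reshape(-1)]
assert np.isclose(categorical_ce(one_hot, y_pred), sparse_loss)
```

In other words, one-hot encoding inside your own generators does not help when the grader feeds integer labels to a model compiled with the categorical loss; compiling with the sparse loss works with integer labels in both places.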

I followed your instructions and used the compile settings from C1_W2, but my accuracy is very low (~4%).

From C1_W2:

  model.compile(optimizer=tf.optimizers.Adam(),
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

From what I can see, everything looks correct up to the point where I define my model.

For the last layer of my model I use a Dense layer with 24 units and a softmax activation, because I know there are only 24 categories in this dataset:

unique_values, counts = np.unique(training_labels, return_counts=True)
print(f"Number of unique labels: {len(unique_values)}")

>> Number of unique labels: 24

I am afraid something is totally wrong: this is an incredibly low accuracy, and 15 epochs do not bring the score above 5%.

While the number of unique labels in the training data is 24, the maximum label value is also 24:

>>> 24.0

Please read the description of the dataset, which points out that a few characters aren’t present. An easier approach would be to follow this hint in the notebook:
which contains 28x28 images of hands depicting the 26 letters of the English alphabet.

You are correct that there are 26 letters in the English alphabet, but wouldn’t I train my model only on the number of categories (24 letters) available in the training set?

I added this to my train_val_generators function to one-hot encode the labels passed to the generators, but I am still getting less than 6% accuracy from my model. I had to use 26 because when I used 24 it threw an error.

training_labels = tf.keras.utils.to_categorical(training_labels, num_classes=26)
validation_labels = tf.keras.utils.to_categorical(validation_labels, num_classes=26)
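The error with num_classes=24 is expected: one-hot encoding needs at least max label + 1 columns, so a label of 24 requires num_classes of 25 or more. Below is a NumPy sketch of a hypothetical one_hot helper that roughly mirrors what to_categorical does (it is not the Keras source, and the exact exception Keras raises may differ). It uses a few float labels as a stand-in for the CSV-derived labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Illustrative stand-in for to_categorical: one row per label, one column per class."""
    labels = np.asarray(labels, dtype=int).reshape(-1)
    if labels.max() >= num_classes:
        raise IndexError(f"label {labels.max()} needs num_classes >= {labels.max() + 1}")
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0  # set the column of each row's label
    return encoded

labels = np.array([0.0, 3.0, 24.0])  # float labels, as read from the CSV; max is 24

for n in (24, 25, 26):
    try:
        print(n, one_hot(labels, n).shape)
    except IndexError as err:
        print(n, "->", err)
# 24 -> label 24 needs num_classes >= 25
# 25 (3, 25)
# 26 (3, 26)
```

With 26 the encoding succeeds, but the extra column is never set, and the width of the one-hot labels then has to match the number of output units.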

Your argument about training a model on the number of categories holds true only when len(np.unique(training_labels)) == np.max(training_labels) + 1. As you’ve observed in this lab, there are missing labels, which make the highest training label value equal to the number of unique labels.

It’s safer to use np.max(training_labels) + 1 as the starting point for the number of classes so that the model can process the training data properly.
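A quick way to see the difference is to compare both counts on labels with a gap. The array below is synthetic (classes 0 to 24 with class 9 missing, repeated a few times), not the real training_labels.

```python
import numpy as np

# Synthetic stand-in for training_labels: classes 0..24 with class 9 missing.
training_labels = np.array([c for c in range(25) if c != 9] * 3, dtype=np.float64)

num_unique = len(np.unique(training_labels))
num_classes = int(np.max(training_labels)) + 1  # labels are zero-indexed

print(num_unique)   # 24
print(num_classes)  # 25

# The "number of unique labels" shortcut is only safe when no label is missing.
assert num_unique != num_classes
```

Under this reasoning, num_classes is the number of units for the final layer, for example Dense(num_classes, activation='softmax').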

When the test set contains labels that are unseen in the training data, two ways of dealing with them are:

  1. Report error that the label is unknown.
  2. Remove rows from test set whose labels aren’t present in training data prior to evaluating model performance.
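Both options can be sketched in a few lines of NumPy. The function names check_known_labels and drop_unknown_labels are hypothetical, and the arrays are toy data rather than the Sign Language MNIST files.

```python
import numpy as np

def check_known_labels(test_labels, training_labels):
    """Option 1: fail loudly if the test set contains labels never seen in training."""
    unknown = np.setdiff1d(np.unique(test_labels), np.unique(training_labels))
    if unknown.size:
        raise ValueError(f"Unknown labels in test set: {unknown.tolist()}")

def drop_unknown_labels(test_images, test_labels, training_labels):
    """Option 2: keep only test rows whose label also appears in the training data."""
    mask = np.isin(test_labels, training_labels)
    return test_images[mask], test_labels[mask]

training_labels = np.array([0, 1, 2, 4])
test_labels = np.array([0, 3, 4, 3, 1])
test_images = np.arange(len(test_labels) * 4).reshape(-1, 2, 2)  # dummy 2x2 "images"

try:
    check_known_labels(test_labels, training_labels)
except ValueError as err:
    print(err)  # Unknown labels in test set: [3]

kept_images, kept_labels = drop_unknown_labels(test_images, test_labels, training_labels)
print(kept_labels)  # [0 4 1]
```

Filtering images and labels with the same boolean mask keeps each image aligned with its label.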

Thank you for this feedback. I don’t see this anywhere in the notebook or the course material, and I’m wondering whether it makes sense to update the documentation to include this information. Thanks so much for your help!

You’re welcome. The staff have been notified regarding your recommendation.

Just to follow up on your post, I ran the suggested code and got the following:

Here is another example with an array that has more missing labels:

My understanding is that the highest integer label in the training data is 24. We know that the training data contains 24 distinct letters, which happens to equal np.max(training_labels) because the labels are zero-indexed.

When we create a model, our last layer needs to be a dense layer with the number of nodes equal to the number of categorical values present in our training labels.

Where I am confused is that the code above shows 24 distinct letters in our training labels. However, if we converted them to one-hot encoding, we would actually have 25 columns, because the letter at index 9 is missing.
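This can be checked directly on a synthetic array of zero-indexed labels 0 to 24 with 9 absent (an assumption mirroring the description above, not the real data). Encoding with np.eye sized from the maximum label gives 25 columns, and column 9 is never set.

```python
import numpy as np

# Synthetic zero-indexed labels: 24 distinct classes in 0..24, with class 9 absent.
labels = np.array([c for c in range(25) if c != 9])

one_hot = np.eye(int(labels.max()) + 1)[labels]  # one column per possible class index

print(len(np.unique(labels)))  # 24 distinct labels
print(one_hot.shape)           # (24, 25): 25 columns
print(one_hot[:, 9].sum())     # 0.0: column 9 is never used
```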

So I think the question we need to answer is whether our final Dense layer should have as many nodes as the one-hot encoding has columns, or only as many as the categories seen in the training data set.

Firstly, it should be len(np.unique(training_labels)) == np.max(training_labels) + 1 and this has been fixed in my previous reply. Sorry about the typo.

As you noticed, labels start with 0. Since the maximum label is 24, there are 25 categories.

The number of units in the output Dense layer should correspond to the number of categories. You don’t need to one-hot encode the labels, since the sparse version of the loss takes care of integer-encoded true labels.