Invalid output shape required by the grader?

Hi Everyone,

I’m working on the assignment for course 2, week 4.

I am quite confused about the setup.
The data shape requirements are:

Expected Output:

Training images has shape: (27455, 28, 28) and dtype: float64
Training labels has shape: (27455,) and dtype: float64
Validation images has shape: (7172, 28, 28) and dtype: float64
Validation labels has shape: (7172,) and dtype: float64

The output label seems to be defined as an actual letter ‘a’-‘z’ (or a number 1-26).

I can’t use that as is to define the network.
I tried one-hot encoding the training and validation labels into 26 classes, and used a matching 26-unit softmax layer as the last layer of the network:

training_categorical_labels = to_categorical(training_labels, num_classes=26)

tf.keras.layers.Dense(26, activation='softmax')

[snippet removed by mentor]

As a result, I get decent training/validation accuracy, but the grader then reports:

Failed test case: your model could not be used for inference. Details shown in ‘got’ value below:.
Expected:
no exceptions,
but got:
in user code:

File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1366, in test_function  *
    return step_function(self, iterator)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1356, in step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1349, in run_step  **
    outputs = model.test_step(data)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1306, in test_step
    y, y_pred, sample_weight, regularization_losses=self.losses)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/compile_utils.py", line 201, in __call__
    loss_value = loss_obj(y_t, y_p, sample_weight=sw)
File "/opt/conda/lib/python3.7/site-packages/keras/losses.py", line 141, in __call__
    losses = call_fn(y_true, y_pred)
File "/opt/conda/lib/python3.7/site-packages/keras/losses.py", line 245, in call  **
    return ag_fn(y_true, y_pred, **self._fn_kwargs)
File "/opt/conda/lib/python3.7/site-packages/keras/losses.py", line 1665, in categorical_crossentropy
    y_true, y_pred, from_logits=from_logits, axis=axis)
File "/opt/conda/lib/python3.7/site-packages/keras/backend.py", line 4994, in categorical_crossentropy
    target.shape.assert_is_compatible_with(output.shape)

ValueError: Shapes (None, 1) and (None, 26) are incompatible

Is there an actual error in the setup, or should I use some custom/advanced code or layers?

The number of neurons in the output layer should equal the number of classes in a multi-class classification problem.
As for the loss function: if you explicitly one-hot encode the labels, then categorical cross-entropy is correct. There is also a variant of this loss function that accepts the true labels as integers instead of their one-hot encoded versions.
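
To illustrate the point about the two loss variants, here is a minimal NumPy sketch (not the assignment solution, and not how Keras implements it internally) showing that cross-entropy computed from an integer label gives the same value as cross-entropy computed from its one-hot version. In Keras the two variants are named `categorical_crossentropy` and `sparse_categorical_crossentropy`; the helper names and sample numbers below are made up for illustration.

```python
import numpy as np

def one_hot_cross_entropy(y_true_one_hot, y_pred):
    # y_true_one_hot: (batch, classes) array of 0s and 1s
    # y_pred: (batch, classes) array of predicted probabilities
    return -np.sum(y_true_one_hot * np.log(y_pred), axis=1)

def integer_cross_entropy(y_true_int, y_pred):
    # y_true_int: (batch,) array of integer class indices
    rows = np.arange(len(y_true_int))
    # Pick the predicted probability of the true class in each row.
    return -np.log(y_pred[rows, y_true_int])

# Hypothetical predictions for 2 samples and 3 classes.
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
labels_int = np.array([0, 2])
labels_one_hot = np.eye(3)[labels_int]

loss_a = one_hot_cross_entropy(labels_one_hot, y_pred)
loss_b = integer_cross_entropy(labels_int, y_pred)
print(np.allclose(loss_a, loss_b))
```

The one-hot variant needs labels with the same shape as the predictions, which would be consistent with the `Shapes (None, 1) and (None, 26) are incompatible` message in the traceback if the grader feeds integer labels to a model compiled with it.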


Basically my question is: in this assignment, do you expect a single integer value as the output of the network (1)?
Or do you expect a single array of ones and zeroes of size 26 (2)?
I did the second, and the grader seems to expect the first.

If you are suggesting that the output should be one integer number - then how does it relate to the earlier requirement of converting the data to float64?

There’s actually a trick with the assignment for week 4 of C2 - it requires us to use a different image generator method (.flow instead of .flow_from…) - and this one requires additional preprocessing of the labels that is not mentioned anywhere in the course or in the assignment.
You actually have to convert the labels to arrays of 23 zeroes and a single one, using LabelBinarizer from the sklearn package, so that the generator passes them on in the correct form.
Probably because of that, I don’t get a 100% score on that assignment even though my model goes above 99% on the training dataset and 95% on the validation dataset.
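
As a rough sketch of the label preprocessing described in this post, the snippet below one-hot encodes integer-valued labels with plain NumPy. It is a stand-in for `LabelBinarizer` from scikit-learn, not the assignment's expected code; the sample labels and the helper name `one_hot_labels` are invented for illustration.

```python
import numpy as np

def one_hot_labels(labels):
    # Find the distinct label values (sorted ascending) and, for each
    # sample, the index of its label within that sorted list.
    classes, indices = np.unique(labels, return_inverse=True)
    # Row i of the identity matrix is the one-hot vector for class i.
    encoded = np.eye(len(classes))[indices]
    return classes, encoded

# Hypothetical float64 labels with gaps between the values.
labels = np.array([3.0, 0.0, 24.0, 3.0, 10.0])
classes, encoded = one_hot_labels(labels)
print(classes)
print(encoded.shape)
```

Because the columns are built only from label values that actually occur, the encoded width equals the number of distinct labels present rather than the largest label plus one. That may be why this approach ends up with 24 columns here instead of 26.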

Balaji.ambresh is right. There absolutely is a loss function that lets the model use a single integer label instead of an array of 0s and a 1.

As far as I can tell, the requirements do specifically mention that the number of model outputs should equal the number of categories.
float64 can preserve the number as an integer if the conversion is done properly. My first attempt produced values that were multiples of 10^(-312).
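
One way a conversion can go wrong like this (an assumption, since the original code is not shown) is reinterpreting the raw bytes of an integer array as float64 instead of converting the values. A minimal NumPy sketch of the difference, with made-up label values:

```python
import numpy as np

labels_int = np.array([3, 7, 24], dtype=np.int64)

# Value conversion: each integer becomes the float with the same value.
converted = labels_int.astype(np.float64)

# Byte reinterpretation: the same 8 bytes per element are read as float64
# bit patterns, which turns small integers into tiny subnormal numbers.
reinterpreted = labels_int.view(np.float64)

print(converted)
print(reinterpreted)
```

With `astype`, the labels stay whole numbers stored as float64, so they can still be compared against integer class indices.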

Just to clarify - the shape of the labels that passes the grader requirements is easily achievable, but it’s NOT the shape of the labels that the model will accept (of course, the activation function is changed to the one required for multi-class labelling).
There’s an additional step that has to be applied to the labels BEFORE you pass them to generators. It’s a tricky one.

If that is all “tricky” and “not mentioned anywhere in the course” and “there exists a loss function that does this thing” - can I ask for more specific hints? Are there any examples I can find somewhere else? It is funny that we are learning neural networks, but there are no examples available :)

Thanks Everyone for the hints!

I suppose we can’t explicitly share the answers here, but actually my approach was not the best (as I just realized from the following course). There IS an activation function that works well with the desired output and it was not covered in the course. The idea is most probably to make learners do some digging in the documentation for themselves. It’s pretty easy though, just search for implementations of dense layers for multi-categorical labels.