Strangely, I am just confused by the following part of code used in the assignment:

## TEST CODE:
import tensorflow as tf

def base_model():
    inputs = tf.keras.layers.Input(shape=(2,))  # each sample has 2 features
    x = tf.keras.layers.Dense(64, activation='relu')(inputs)
    outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)  # a single output unit
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model

test_model = base_model()
test_image = tf.ones((2, 2))  # batch of 2 samples, 2 features each
test_label = tf.ones((1,))    # a single label with value 1

What I understand from base_model is that it requires each input sample to have 2 features, so the input has shape (batch_size, 2). The “test_image” is a (2,2) matrix, i.e. two samples, each with value [1,1], so batch_size=2. However, “test_label” is defined as a 1D vector with just one entry of value 1, i.e. label=1. This is why I am confused. Shouldn’t there be 2 labels in “test_label”, matching the batch_size=2 of the inputs? I tried giving two labels, but I get an error:

#if I do
test_label = tf.ones((2,1))
InvalidArgumentError: Received a label value of 1 which is outside the valid range of [0, 1). Label values: 1 1 [Op:SparseSoftmaxCrossEntropyWithLogits]

Could you please let me know where I am going wrong?

This is a tricky one, but it boils down to the fact that SparseCategoricalCrossentropy() is intended specifically for the situation where there are two or more label classes, as described in the TensorFlow Core v2.9.1 documentation for tf.keras.losses.SparseCategoricalCrossentropy. The model in the test code, however, effectively sets the number of classes to 1 by using a “1” as the first parameter to the final Dense layer in base_model. With only one class, the only valid label value is 0, which is why your labels of 1 are reported as outside the range [0, 1).
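To make the error message more concrete, here is a minimal pure-Python sketch of what a sparse categorical cross-entropy does with integer labels. This is not TensorFlow's implementation; the function name sparse_cce and the probability values are made up purely for illustration.

```python
import math

def sparse_cce(labels, probs):
    """Mean of -log(p[label]) over the batch.

    labels: list of integer class indices, one per sample.
    probs:  list of rows, each row holding one probability per class.
    Hypothetical helper for illustration, not a TensorFlow API.
    """
    losses = []
    for label, row in zip(labels, probs):
        num_classes = len(row)
        # A label is an index into the row, so it must be in [0, num_classes).
        if not 0 <= label < num_classes:
            raise ValueError(
                f"label {label} is outside the valid range [0, {num_classes})"
            )
        losses.append(-math.log(row[label]))
    return sum(losses) / len(losses)

# One output unit means one class, so only label 0 would be valid.
try:
    sparse_cce([1, 1], [[0.7], [0.7]])
    one_class_error = None
except ValueError as err:
    one_class_error = str(err)

# Two output units: label 1 is now a valid class index.
two_class_loss = sparse_cce([1, 1], [[0.3, 0.7], [0.3, 0.7]])
```

The first call fails for the same reason as your (2,1) labels: each row has a single entry, so index 1 does not exist.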

I did a little experimenting, and it looks like if you pass a y_pred of shape (n,1), the loss function flips it to shape (1,n). I’m not sure if this is a bug or a feature, but it’s possible it’s intentional. Since the function specifically expects 2 or more class values in each y_pred row, if it only gets one, it may assume you accidentally flipped the tensor and then “correct” it for you. That would explain why the original test code runs with a single label: the (2,1) prediction is treated as one sample with two classes, so one label of value 1 is valid. In any case, it’s atypical behavior.

As a test, you can get more typical behavior if you change the “1” in the final Dense layer in base_model to a “2”. Then SparseCategoricalCrossentropy won’t flip the y_pred, and you can change test_label to tf.ones((2,)), giving one label per input sample as you expected.
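For anyone who wants to try that experiment, here is a rough sketch of it. Only the final Dense layer and the shape of test_label differ from the test code. The name two_class_model and the loss_object line are my assumptions, since the test code quoted above does not show how the loss is actually called in the assignment.

```python
import tensorflow as tf

def two_class_model():
    # Same layers as base_model, except the final Dense layer has 2 units.
    inputs = tf.keras.layers.Input(shape=(2,))
    x = tf.keras.layers.Dense(64, activation='relu')(inputs)
    # Sigmoid is kept to match the test code; softmax is the more common
    # choice for a multi-class output layer.
    outputs = tf.keras.layers.Dense(2, activation='sigmoid')(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)

test_model = two_class_model()
test_image = tf.ones((2, 2))  # batch of 2 samples, 2 features each
test_label = tf.ones((2,))    # one label per sample, both class 1

# Assumed loss setup; the assignment's own loss call may differ.
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()

y_pred = test_model(test_image)         # shape (2, 2): one row per sample
loss = loss_object(test_label, y_pred)  # scalar mean loss over the batch
```

With two classes per row, a label of 1 is a valid class index, so two labels no longer trigger the range error.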

I’ll submit a ticket for the developers to ask them to update this test code to use 2 or more classes for the final layer so we don’t hit this weird anomaly in SparseCategoricalCrossentropy. It will be confusing for any future students who look at this as carefully as you did.