CNN Week1 2nd Assignment - 'categorical_crossentropy' applied twice?

Categorical cross entropy is not the same thing as softmax: softmax is the activation function, and categorical cross entropy is the loss function that is used when softmax is the activation.

What we always do is omit the explicit softmax (or sigmoid in the binary case) from the output layer and instead use the `from_logits=True` mode of the corresponding cross entropy loss function, which gives better numeric behavior. That flag tells the loss function to apply softmax internally. But it also means that when we want to use the trained network in inference (prediction) mode, we have to apply softmax to the output manually to get the prediction values.
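As a rough numpy sketch of what `from_logits=True` buys numerically (the function names here are just for illustration, not the actual Keras internals): folding softmax into the loss lets the framework use the log-sum-exp trick, which stays finite even when an explicit softmax-then-log would overflow.

```python
import numpy as np

def naive_cross_entropy(logits, label):
    # Explicit softmax, then log: exp() can overflow for large logits.
    probs = np.exp(logits) / np.sum(np.exp(logits))
    return -np.log(probs[label])

def from_logits_cross_entropy(logits, label):
    # Roughly what from_logits=True does internally: fold softmax into
    # the loss via the log-sum-exp trick, shifting by the max logit
    # so every exp() argument is <= 0 and cannot overflow.
    shifted = logits - np.max(logits)
    return np.log(np.sum(np.exp(shifted))) - shifted[label]

def softmax(logits):
    # The manual step needed at inference time to turn the network's
    # raw logits back into probabilities.
    shifted = logits - np.max(logits)
    e = np.exp(shifted)
    return e / e.sum()

moderate = np.array([2.0, 1.0, 0.1])
extreme = np.array([1000.0, 1.0, 0.1])

# Both versions agree on well-behaved logits...
print(naive_cross_entropy(moderate, 0))
print(from_logits_cross_entropy(moderate, 0))

# ...but the naive version overflows to nan on extreme logits,
# while the from_logits version stays finite.
print(naive_cross_entropy(extreme, 0))
print(from_logits_cross_entropy(extreme, 0))

# Inference: recover probabilities from logits manually.
print(softmax(moderate))
```

The `softmax` helper at the end is the manual post-processing step the paragraph above refers to: since the trained network outputs raw logits, you apply softmax yourself to read off class probabilities.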

Here’s a thread which explains why it is done that way.