CNN Week1 2nd Assignment - 'categorical_crossentropy' applied twice?

I noticed that the model gets compiled when I run the test for ‘convolutional_model’, and during this compilation ‘categorical_crossentropy’ is specified as the loss function.

At the same time, the output of the ‘convolutional_model’ already applies ‘activation=Softmax’ in its last layer.
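For concreteness, here is a minimal sketch of the combination being described (the layer sizes are illustrative, not the actual assignment architecture): softmax in the last layer, and ‘categorical_crossentropy’ named again at compile time.

```python
import tensorflow as tf

# Sketch only: last layer applies softmax ...
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    tf.keras.layers.Dense(6, activation='softmax')   # softmax in the output layer
])

# ... and the loss is specified separately when compiling
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```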

  1. Are we applying ‘categorical_crossentropy’ (Softmax) twice? What is the actual effect of this?
  2. As a best practice, should we specify the activation of the last layer in the model, in the compilation, or in both?

Categorical cross entropy is not the same thing as softmax. Softmax is the activation function and categorical cross entropy is the loss function that is used when softmax is the activation.
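To make the distinction concrete, here is a small hand-computed example with hypothetical logits for one example and three classes: softmax is the activation that turns logits into probabilities, and categorical cross entropy is the loss that compares those probabilities to the one-hot label.

```python
import numpy as np

# Hypothetical logits for one example with 3 classes
logits = np.array([2.0, 1.0, 0.1])

# Softmax (the activation): turns logits into probabilities that sum to 1
probs = np.exp(logits) / np.sum(np.exp(logits))   # ~[0.659, 0.242, 0.099]

# Categorical cross entropy (the loss): compares probabilities to the one-hot label
label = np.array([1.0, 0.0, 0.0])
loss = -np.sum(label * np.log(probs))             # ~0.417

print(probs, loss)
```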

What we always do is not add an explicit softmax (or sigmoid in the binary case) to the output layer, and instead use the from_logits = True mode of the corresponding cross entropy loss function in order to get better numeric behavior. That tells the loss function to run softmax internally. But it also means that when we want to use the trained network in inference (prediction) mode, we have to apply softmax to the output manually to get the prediction values.
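Here is a minimal sketch of that pattern in Keras (the architecture and shapes are made up for illustration; the point is only the structure: no activation in the output layer, from_logits=True in the loss, and manual softmax at inference time):

```python
import tensorflow as tf

# The output layer has no activation, so the model emits raw logits
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(6)                        # no softmax here
])

# from_logits=True tells the loss to apply softmax internally
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# At inference time the outputs are logits, so softmax is applied manually
dummy_batch = tf.random.uniform((4, 64, 64, 3))     # placeholder input
logits = model(dummy_batch)
probabilities = tf.nn.softmax(logits, axis=-1)
predicted_classes = tf.argmax(probabilities, axis=-1)
```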

Here’s a thread which explains why it is done that way.

Thanks @paulinpaloalto! I was confusing the activation and the loss functions.

If I understand correctly, the recommendation for building models is:

  1. not to include an explicit activation in the last layer of the network
  2. when compiling, to specify the loss function to be used with ‘from_logits = True’, which makes the loss function compute both the activation and the loss as a single unified computation (with the additional benefit of numerical stability; see the sketch after this list)
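As a rough illustration of the numerical-stability point, with made-up extreme logits (this is not the exact fused formula TensorFlow uses internally, just the standard log-sum-exp idea behind it): computing softmax and then the log separately can overflow, while the fused form stays finite.

```python
import numpy as np

# Extreme illustrative logits; true class is the first one
logits = np.array([1000.0, 0.0, -1000.0])
label = np.array([1.0, 0.0, 0.0])

# Two-step computation: exp(1000) overflows, so the loss comes out as nan
probs = np.exp(logits) / np.sum(np.exp(logits))
two_step_loss = -np.sum(label * np.log(probs))

# Fused computation: -logit_true + log-sum-exp, with the max subtracted first
m = np.max(logits)
fused_loss = -logits[0] + (m + np.log(np.sum(np.exp(logits - m))))

print(two_step_loss, fused_loss)   # nan  vs  0.0
```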

Is the above correct regardless of which loss function is used when compiling?

Yes, as long as you are talking only about classification problems. Not all loss functions provide the from_logits capability. As far as I know, only the various variants of cross entropy loss offer from_logits, but that covers all the classification problems that we see in these courses other than perhaps YOLO.
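For reference, a sketch of the Keras cross entropy variants that accept from_logits, contrasted with a loss that does not (listing only the common ones; check the docs for your TF version):

```python
import tensorflow as tf

# Cross entropy losses that accept from_logits=True
cce  = tf.keras.losses.CategoricalCrossentropy(from_logits=True)        # one-hot labels
scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)  # integer labels
bce  = tf.keras.losses.BinaryCrossentropy(from_logits=True)             # binary / multi-label

# A regression loss such as MSE has no from_logits argument,
# because there is no output activation to fold into the loss
mse = tf.keras.losses.MeanSquaredError()
```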