Update: Please note that after this thread was created, the Course Staff made some significant updates to this assignment, which include switching to CategoricalCrossentropy for the loss function in this section.
Because we specify the from_logits = True argument, the loss logic will first apply either sigmoid or softmax to the logits to compute the actual \hat{y} values and then compute the cross entropy loss between those predictions and the labels. In the binary case, that loss is:
-y_true * log(sigmoid(y_pred)) - (1 - y_true) * log(1 - sigmoid(y_pred))
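To make that concrete, here is a small sketch of my own (the label and logit values are made up, not from the assignment) showing that handing raw logits to the loss with from_logits = True gives the same answer as applying softmax yourself and passing probabilities:

```python
import tensorflow as tf

# Hypothetical example values: 3-class one-hot labels and raw logits
# from a linear output layer.
y_true = tf.constant([[0., 1., 0.],
                      [1., 0., 0.]])
logits = tf.constant([[1.0, 2.5, -0.3],
                      [2.0, 0.1, 0.5]])

# Option 1: pass the raw logits and let the loss apply softmax internally.
loss_from_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(loss_from_logits(y_true, logits).numpy())

# Option 2: apply softmax yourself and pass probabilities
# (from_logits=False is the default).
probs = tf.nn.softmax(logits)
loss_from_probs = tf.keras.losses.CategoricalCrossentropy()
print(loss_from_probs(y_true, probs).numpy())

# The two printed values agree up to floating point, but Option 1 is the
# numerically safer path because the softmax and the log are fused internally.
```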
Note that I think there's a big question here that they don't explain: they tell us to call BinaryCrossentropy loss, but we've actually got a multi-class problem here. So I think technically we should be calling CategoricalCrossentropy loss, which takes the same from_logits argument. If you read the TF docs, it sounds like what they are doing here should be a bug.

They also don't really give you any way to assess the results of the training. I went ahead and added logic to compute the prediction accuracy just for my own edification, and it turns out the training here works just fine, although it works a lot better if you use the Adam optimization suggested in the instructions rather than the SGD that the code template actually uses. I conclude from this that the Keras BinaryCrossentropy function is actually smart enough to see that this is not a binary case and just does "The Right Thing™". I have filed a request with the Course Staff to clarify this.
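For anyone curious, here is roughly what that accuracy check can look like. This is my own sketch, not the assignment code: the function name and the assumption that the model outputs raw logits with integer class labels are mine.

```python
import tensorflow as tf

def prediction_accuracy(model, X, y):
    """Fraction of examples whose argmax prediction matches the label.

    Assumes the model outputs raw logits (one column per class) and that
    y holds integer class labels. Names here are illustrative only.
    """
    logits = model(X)                          # shape (m, num_classes)
    preds = tf.argmax(logits, axis=1)          # predicted class per example
    labels = tf.cast(tf.reshape(y, [-1]), preds.dtype)
    return tf.reduce_mean(tf.cast(preds == labels, tf.float32)).numpy()
```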
As to the question of when to use from_logits = True, I think it's just your choice. But if you read the documentation or do a little googling, you find that the reason they added this feature is that it gives better efficiency (one fused operation instead of two) and also allows them to implement the computation in a way that is more numerically stable: consider, for example, the case in which you have saturated sigmoid values, where the log would otherwise be applied to numbers extremely close to 0 or 1. In all the cases I've seen in this course and others, they always seem to choose from_logits = True. It's fewer lines of code and it apparently works better, so it seems like the way to go.
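Here is what that choice looks like at the model level. This is my own illustration rather than the assignment code, and the layer sizes are made up; the only thing that matters is the activation of the last layer paired with the from_logits setting.

```python
import tensorflow as tf

# from_logits=True: linear output layer, the loss applies softmax internally.
model_logits = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(4, activation='linear'),   # raw logits out
])
model_logits.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)

# from_logits=False (the default): softmax in the last layer,
# so the loss receives probabilities instead of logits.
model_probs = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(4, activation='softmax'),  # probabilities out
])
model_probs.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)
```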