I forget whether Prof Ng discusses this anywhere in the lectures, but it turns out that the TF/Keras cross entropy loss functions all let you specify whether the inputs are “logits” (meaning the linear activation output) or actual “post activation” values. The argument that controls this is *from_logits*. It takes a Boolean value and defaults to *False*. Have a look at the documentation for TF categorical cross entropy loss. The reason they offer the *from_logits = True* mode is that it is more efficient and more “numerically stable” to compute the activation and the loss at the same time. For example, it becomes easier to deal with the “saturation” case in which some of the outputs turn out to be exactly 0 or exactly 1. That never happens from a “pure math” point of view, but we are dealing with finite floating point representations here, so it can actually happen. In those cases the loss formula ends up taking the log of 0, so the result is NaN or Inf unless you handle that case specially.
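
To make the saturation point concrete, here is a minimal pure-Python sketch for the binary (sigmoid) case. It is not the TF source code: the function names are made up for illustration, and `ieee_log` is a hypothetical helper that mimics the way NumPy/TF return -inf for log(0) instead of raising an error. The stable formula is the one given in the documentation for `tf.nn.sigmoid_cross_entropy_with_logits`.

```python
import math

def sigmoid(z):
    """Plain sigmoid; adequate for the z values used in this demo."""
    return 1.0 / (1.0 + math.exp(-z))

def ieee_log(p):
    """Hypothetical helper: log that returns -inf at 0, like NumPy/TF do."""
    return math.log(p) if p > 0.0 else float("-inf")

def bce_from_probability(y, a):
    """Binary cross entropy from the post-activation value a (from_logits=False style)."""
    return -(y * ieee_log(a) + (1 - y) * ieee_log(1.0 - a))

def bce_from_logit(y, z):
    """Binary cross entropy computed directly from the logit z (from_logits=True style).

    Algebraically the same loss, rewritten as max(z, 0) - z*y + log(1 + exp(-|z|))
    so that log(0) never appears.
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

z = 40.0                # a large logit: the sigmoid rounds to exactly 1.0 in float64
a = sigmoid(z)
for y in (0, 1):
    print(y, bce_from_probability(y, a), bce_from_logit(y, z))
```

With the saturated value, the two-step version gives Inf for one label and NaN for the other (from 0 times -inf), while the logit version stays finite for both. For moderate values of z the two functions agree.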

So Prof Ng always uses *from_logits = True* mode from this point forward. The activation function is still being applied, but it happens “inside” the loss function. The same option exists for binary cross entropy loss and the sparse version of categorical cross entropy.
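
One practical consequence is that the model's output layer then produces logits, so if you want probabilities when making predictions, you have to apply softmax (or sigmoid) yourself. Here is a rough pure-Python sketch of the idea for the sparse categorical case, using the usual “subtract the max” (log-sum-exp) trick. The function names are my own, and this is only an illustration of what happens “inside” the loss under that assumption, not the actual Keras implementation.

```python
import math

def log_softmax(logits):
    """Log of the softmax, computed stably by subtracting the max logit first."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def sparse_cce_from_logits(label, logits):
    """Loss for one example: minus the log-probability of the true class.

    The softmax is applied here, inside the loss, not in the model.
    """
    return -log_softmax(logits)[label]

def softmax(logits):
    """What you apply yourself at prediction time to turn logits into probabilities."""
    return [math.exp(v) for v in log_softmax(logits)]

logits = [2.0, 1.0, 0.1]          # made-up output of a final linear layer
probs = softmax(logits)
print("loss for label 0:", sparse_cce_from_logits(0, logits))
print("probabilities:", probs)
print("predicted class:", probs.index(max(probs)))

# Extreme logits are still fine, even though math.exp(1000.0) alone would overflow.
print(sparse_cce_from_logits(2, [1000.0, 0.0, -1000.0]))
```

In Keras terms, this corresponds to giving the final Dense layer a linear activation, passing *from_logits = True* to the loss, and calling softmax on the model's output when you need actual probabilities.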