Hi! I have a question about the forward_propagation function in the Week 3 assignment. Why do we return Z4 without applying any activation function at that layer?

I forget whether Prof Ng discusses this anywhere in the lectures, but it turns out that the TF/Keras loss functions all let you choose whether the inputs are "logits" (meaning the linear activation output) or actual "post-activation" values. The argument that controls this is *from_logits*; it takes a Boolean value and defaults to *False*. Have a look at the documentation for TF categorical cross entropy loss. The reason they offer the *from_logits = True* mode is that it is more efficient and more "numerically stable" to compute the activation and the loss at the same time. For example, it becomes easier to deal with the "saturation" case in which some of the outputs come out as exactly 0 or exactly 1. That never happens from a "pure math" point of view, but we are dealing with finite floating point representations here, so it can actually happen. In those cases, the loss would be undefined (NaN or Inf) if you don't handle them.
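To see why saturation matters, here is a small NumPy sketch (my own illustration, not TF's actual source, though it uses the same log-sum-exp trick) comparing a naive "softmax, then log" cross entropy against a fused log-softmax computed directly from the logits:

```python
import numpy as np

def naive_ce(z, y):
    # softmax first, then log: exp() overflows for large logits,
    # and saturated probabilities give log(0) = -inf
    p = np.exp(z) / np.sum(np.exp(z))
    return -np.sum(y * np.log(p))

def stable_ce(z, y):
    # fused log-softmax via the log-sum-exp trick:
    # subtracting max(z) keeps every exp() argument <= 0
    shifted = z - np.max(z)
    log_p = shifted - np.log(np.sum(np.exp(shifted)))
    return -np.sum(y * log_p)

z = np.array([1000.0, 0.0, -1000.0])  # extreme logits: softmax saturates to [1, 0, 0]
y = np.array([0.0, 1.0, 0.0])         # one-hot true label

print(naive_ce(z, y))   # nan (overflow, then 0 * -inf)
print(stable_ce(z, y))  # 1000.0, a finite, correct loss
```

The loss value of 1000.0 is exactly what the math says it should be for these logits; the naive version destroys it in floating point long before you get there.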

So Prof Ng always uses *from_logits = True* mode from this point forward. The activation function is still being applied, but it happens "inside" the loss function. The same option exists for binary cross entropy loss and the sparse version of categorical cross entropy.