Question about week 3 assignment

Hi! I have a question about the forward_propagation function in the week 3 assignment. Why do we return Z4 without applying any activation function to that layer?

I forget whether Prof Ng discusses this anywhere in the lectures, but it turns out that the TF/Keras cross-entropy loss functions let you specify whether the inputs are “logits” (the linear output of the last layer, before any activation) or actual “post-activation” values. The argument that controls this is from_logits; it takes a Boolean value and defaults to False. Have a look at the documentation for the TF categorical cross-entropy loss.

The reason they offer the from_logits = True mode is that it is more efficient and more “numerically stable” to compute the activation and the loss together. For example, it becomes easier to deal with the “saturation” case, in which some of the outputs come out as exactly 0 or exactly 1. That never happens from a “pure math” point of view, since sigmoid and softmax outputs lie strictly between 0 and 1 for finite inputs, but we are dealing with finite floating-point representations here, so it can actually happen. In those cases the loss is undefined (log(0) gives -Inf, which can then turn into NaN in later computations) unless the code handles that case explicitly.
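To make the saturation problem concrete, here is a small NumPy sketch (not the actual TF implementation) comparing two ways of computing categorical cross-entropy for one example. The function names are made up for illustration. The “naive” version applies softmax first and then takes the log; the fused version works directly on the logits using the identity log(softmax(z))_k = z_k - logsumexp(z), with the usual max-subtraction trick so exp() never overflows.

```python
import numpy as np

def naive_cross_entropy(logits, label):
    # Hypothetical helper: softmax first, then log of the true-class probability.
    # Warnings are silenced so the overflow/underflow shows up only in the result.
    with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
        exp = np.exp(logits)
        probs = exp / np.sum(exp)
        return -np.log(probs[label])

def cross_entropy_from_logits(logits, label):
    # Hypothetical helper: fused version using log(softmax(z))_k = z_k - logsumexp(z).
    # Subtracting the max before exponentiating keeps exp() from overflowing.
    m = np.max(logits)
    log_sum_exp = m + np.log(np.sum(np.exp(logits - m)))
    return -(logits[label] - log_sum_exp)

# Ordinary logits: both versions agree (up to rounding).
logits = np.array([2.0, 1.0, 0.1], dtype=np.float32)
print(naive_cross_entropy(logits, 0), cross_entropy_from_logits(logits, 0))

# Saturated logits: in float32, exp(200) overflows, and the softmax output for
# class 2 rounds to exactly 0, so the naive loss is -log(0) = inf.
saturated = np.array([200.0, 0.0, -200.0], dtype=np.float32)
print(naive_cross_entropy(saturated, 2))        # inf
print(cross_entropy_from_logits(saturated, 2))  # 400.0
```

The fused version still reports a very large but finite loss in the saturated case, which is what gradient descent needs in order to push the weights back in the right direction.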

So from this point forward, Prof Ng always uses the from_logits = True mode. The activation function is still applied, but it happens “inside” the loss function. The same option exists for the binary cross-entropy loss and for the sparse version of categorical cross-entropy (BinaryCrossentropy and SparseCategoricalCrossentropy in tf.keras.losses).
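Here is a minimal sketch of the pattern with the tf.keras.losses classes mentioned above. The logits and labels are made-up values standing in for the linear output returned by a forward_propagation-style function; this is not the assignment’s code. It shows that passing logits with from_logits=True gives the same loss as applying softmax yourself and using the default from_logits=False, and that the sparse and binary variants take the same flag.

```python
import numpy as np
import tensorflow as tf

# Made-up batch of raw linear outputs ("logits") from a final layer with no activation.
logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, -1.0]])
one_hot = tf.constant([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
int_labels = tf.constant([0, 1])

# Softmax is applied inside the loss.
cce_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
loss_a = cce_logits(one_hot, logits)

# Same quantity via the less stable route: softmax first, default from_logits=False.
cce_probs = tf.keras.losses.CategoricalCrossentropy()
loss_b = cce_probs(one_hot, tf.nn.softmax(logits))

# Sparse version: integer class labels instead of one-hot vectors.
scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_c = scce(int_labels, logits)

# Binary version: sigmoid is applied inside the loss.
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
loss_d = bce(tf.constant([[1.0], [0.0]]), tf.constant([[3.0], [-1.5]]))

print(float(loss_a), float(loss_b), float(loss_c), float(loss_d))

# For predictions you apply the activation yourself if you want probabilities.
# Softmax preserves ordering, so argmax over logits picks the same class.
probs = tf.nn.softmax(logits)
predicted = tf.argmax(logits, axis=-1)
```

The design point is that the model’s last layer stays linear during training; the activation only gets applied explicitly when you actually need probabilities.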
