Week 3 Exercise: compute_cost(logits, labels) function

I had trouble completing this function and finally got it to work using @paulinpaloalto 's advice from another thread. Still, I don’t think I fully understand why I was using the transpose.

Is the logits argument the same thing as Z3 (i.e., the output of the last linear unit)? If that's the case, then why are we transposing? I ask because the labels argument is the same shape.

You are right that the outputs are Z3, the “logits”, meaning the linear outputs of the last layer, as opposed to the full activation outputs (A3). But that is not the reason for the transpose: the activation functions are always applied “elementwise”, so they don’t change the dimensions or orientation of the data. The transpose is needed because the network was defined to take data oriented as n_x x m, where n_x is the number of features and m is the number of samples. That’s the way Prof Ng chose to orient the data in Course 1 and earlier in Course 2. But the TensorFlow functions we are now switching to assume that the “samples” dimension is the first dimension. That is why you have to do the transpose, to get m as the first dimension. They mention this in the instructions for the compute_cost section of the assignment.
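To make the orientation issue concrete, here is a minimal sketch of what such a cost function might look like, assuming TensorFlow 2.x and one-hot labels. The shapes and toy values are illustrative only, not the graded solution:

```python
import tensorflow as tf

def compute_cost(logits, labels):
    # logits and labels arrive as (n_y, m): classes first, samples second,
    # following the Course 1/2 convention. TF's loss functions expect the
    # samples dimension first, hence the transposes.
    per_sample = tf.keras.losses.categorical_crossentropy(
        tf.transpose(labels),   # (m, n_y)
        tf.transpose(logits),   # (m, n_y)
        from_logits=True)       # softmax is applied inside the loss
    return tf.reduce_mean(per_sample)

# Toy check: 3 classes, 4 samples, oriented (n_y, m) as in the course.
Z3 = tf.constant([[2.0, -1.0, 0.1, 0.0],
                  [0.5,  3.0, 0.2, 0.0],
                  [-1.0, 0.0, 2.5, 1.0]])
Y  = tf.constant([[1.0, 0.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
cost = compute_cost(Z3, Y)
```

Without the transposes, the loss function would treat each row of Z3 as one sample with m "classes", which is why the shapes must be flipped before the call.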

Note that the reason the network outputs the logits instead of the activation outputs is that Prof Ng chose to use the from_logits = True mode of the various cross entropy loss functions here. This is the way it will be whenever we’re using TF. Here’s another recent thread which discusses that.

Thank you for the reply…I missed the note on the shape in the instructions.

Is there any benefit of using the logits? Or is it simply a preference?

Please read the thread that I linked in the second paragraph of my previous reply. That gives some explanation about that point …
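To give a concrete flavor of the numerical-stability argument usually made for from_logits = True, here is a small numpy sketch (the values are illustrative): computing softmax first and then taking the log can overflow for large logits, while the fused log-sum-exp form that the loss uses internally stays finite.

```python
import numpy as np

z = np.array([1000.0, 0.0, -1000.0])  # extreme logits
y = np.array([1.0, 0.0, 0.0])         # one-hot label

# Naive route: softmax first, then log. exp(1000) overflows to inf,
# the softmax becomes nan, and so does the loss.
naive_probs = np.exp(z) / np.sum(np.exp(z))
naive_loss = -np.sum(y * np.log(naive_probs))

# Fused route (roughly what from_logits=True does internally): shift by
# max(z) before exponentiating, so every intermediate value stays in range.
shifted = z - np.max(z)
log_probs = shifted - np.log(np.sum(np.exp(shifted)))
stable_loss = -np.sum(y * log_probs)
```

Here naive_loss comes out as nan while stable_loss is a well-defined number, which is one practical benefit of passing logits to the loss rather than activations.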