Course 2, Week 3, compute_total_loss(logits, labels)

Well, this is your first encounter with TensorFlow, which is pretty deep waters. It turns out that Prof Ng, for his own reasons, has chosen a representation for the “sample” data in which the first dimension is the “features” dimension and the second dimension is the “samples” dimension. He has used that consistently through Course 1 and Course 2, right up to exactly this point, where we call a TF loss function for the first time. TF is a general platform, so it has to handle much more complex cases than the vector inputs to a Feed Forward Network, which is all we have seen up to this point. We are about to graduate to Convolutional Nets in Course 4 (Course 3 is not a programming course), in which the inputs are 3D image tensors. For those more complex inputs, everyone orients the data so that the first dimension is the “samples” dimension, and that is how TF works, because it needs to support the general case. The practical upshot here is that both the logits and the labels need to be transposed before being passed to the TF loss function.
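To make the orientation issue concrete, here is a small sketch in plain numpy (the numbers are made up for illustration). It starts with data in Prof Ng's “features first” layout, transposes to the “samples first” layout that TF-style loss functions expect, and then computes a softmax cross-entropy from the logits the way `from_logits=True` would:

```python
import numpy as np

# Hypothetical example: 4 classes, 3 samples.
# Prof Ng's orientation: shape (n_classes, n_samples) -- features first.
logits = np.array([[ 2.0, 0.5, -1.0],
                   [ 0.1, 1.5,  0.3],
                   [-0.5, 0.2,  2.2],
                   [ 0.0, -1.0, 0.5]])          # shape (4, 3)
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]], dtype=float)     # one-hot, shape (4, 3)

# TF-style loss functions expect (n_samples, n_classes), so transpose first.
logits_t = logits.T                              # shape (3, 4)
labels_t = labels.T                              # shape (3, 4)

# Softmax cross-entropy computed directly from logits, per sample (per row).
z = logits_t - logits_t.max(axis=1, keepdims=True)     # subtract max for stability
log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
per_sample_loss = -(labels_t * log_softmax).sum(axis=1)  # shape (3,): one loss per sample

total_loss = per_sample_loss.sum()
```

Note that if you forget the transpose, the per-sample reduction runs over the wrong axis: you get one number per class instead of one per sample, and the result is silently wrong rather than an error, which is exactly why this trips people up.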

They do explicitly mention in the instructions for that section that you need to be cognizant of that. Perhaps they should have made that section a bit more verbose. :nerd_face:

So you’re right that this has never been mentioned before, but it was never necessary until literally this moment. As for why Prof Ng chooses to do it this way, my guess is that when we write the code for ourselves in python, the vectorized formulas (e.g. Z = W·A + b) come out cleaner with the orientation he has chosen up to this point. But we will soon graduate to doing things the way everyone does in “real” applications: using frameworks like TF or PyTorch to build things. Prof Ng does have an important pedagogical reason for showing us the details of how the algorithms work first: it gives us much better intuitions than we could get by starting with the frameworks and viewing all the algorithms as “black boxes”.