It is not documented on the loss function page because it’s a higher-level concept in TensorFlow. TF is heavily object-oriented, and they can’t afford to document every single property that is inherited from higher levels at every leaf node in the inheritance graph. TF uses a “samples first” orientation for all data, meaning the first dimension of a tensor indexes the samples, but that is not the way Prof Ng has done it up to this point.
Here’s a thread which talks about this in a bit more detail.
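
To make the orientation point concrete, here is a minimal NumPy sketch. It assumes the loss in question is a categorical cross-entropy, and `softmax_cross_entropy` is a hypothetical stand-in written for illustration, not TensorFlow's implementation. The array values are made up. It only shows that data stored one column per sample has to be transposed before being handed to a function that expects samples first.

```python
import numpy as np

def softmax_cross_entropy(labels, logits):
    """Hypothetical helper: per-sample cross-entropy.

    Both inputs are expected samples-first, i.e. shape (n_samples, n_classes).
    """
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # One loss value per sample (per row).
    return -(labels * log_probs).sum(axis=1)

# Column-per-sample layout: shape (n_classes, n_samples) = (3, 4).
# Illustrative values only.
logits_cols = np.array([[2.0, 0.1, 0.3, 1.2],
                        [0.5, 1.5, 0.2, 0.1],
                        [0.1, 0.2, 3.0, 0.4]])
labels_cols = np.array([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 0.0]])

# Transpose to samples-first, shape (n_samples, n_classes) = (4, 3).
losses = softmax_cross_entropy(labels_cols.T, logits_cols.T)
print(losses.shape)  # (4,): one loss per sample

# Without the transpose the call still runs, but it silently treats the
# 3 classes as 3 samples and returns the wrong number of losses.
wrong = softmax_cross_entropy(labels_cols, logits_cols)
print(wrong.shape)  # (3,)
```

In TensorFlow itself the analogous step would be applying `tf.transpose` to the labels and logits before passing them to a loss such as `tf.keras.losses.categorical_crossentropy`. Note that the mistake does not raise an error in the sketch above, which is why the orientation is easy to get wrong.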