Which dimension of the data matrix holds the samples is just a choice that Prof Ng can make, and he makes it differently in MLS (the Machine Learning Specialization) than in DLS (the Deep Learning Specialization). In DLS each sample is a column of X; in MLS each sample is a row. Of course, once you make that choice, it has lots of implications for how you express the math formulas as linear algebra operations. The MLS orientation is more common and matches the convention of platforms like TensorFlow (the “samples” dimension is the first dimension).
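To see how the choice changes the formulas, here is a minimal NumPy sketch of one linear layer written both ways. The sizes and variable names are arbitrary and chosen only for illustration. In the samples-as-columns (DLS) form the layer is Z = W X + b. In the samples-as-rows form it is Z = X W + b, which is the layout Keras's Dense layer uses (its kernel has shape (input_dim, units)). The two results are just transposes of each other.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_x, n_h = 5, 3, 4  # samples, input features, hidden units (arbitrary sizes)

# DLS orientation: samples are columns
X_dls = rng.standard_normal((n_x, m))     # shape (n_x, m)
W_dls = rng.standard_normal((n_h, n_x))   # shape (n_h, n_x)
b_dls = rng.standard_normal((n_h, 1))     # broadcast across the m columns
Z_dls = W_dls @ X_dls + b_dls             # shape (n_h, m)

# Samples-first orientation (MLS / TensorFlow style): samples are rows
X_rows = X_dls.T                          # shape (m, n_x)
W_rows = W_dls.T                          # shape (n_x, n_h)
b_rows = b_dls.T                          # shape (1, n_h), broadcast across the m rows
Z_rows = X_rows @ W_rows + b_rows         # shape (m, n_h)

print(Z_dls.shape, Z_rows.shape)          # (4, 5) (5, 4)
print(np.allclose(Z_dls.T, Z_rows))       # True, since (W X + b)^T = X^T W^T + b^T
```

Nothing about the math changes except the order of the matrix product and which way everything is transposed.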

I can’t speak for Prof Ng, of course, but my guess is that he uses the DLS orientation because the formulas are a bit more intuitive for feed-forward networks: the vectorized layer Z = WX + b looks just like the single-sample formula z = Wx + b. But note that once he gets to ConvNets in DLS C4, he switches to the “samples first” orientation, because those assignments use TensorFlow, which requires samples first. The issue actually first comes up in DLS C2, in the Week 3 assignment that introduces TensorFlow. Here’s a thread that discusses the point in a bit more detail.
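As a rough illustration of moving between the two orientations with image data, here is a small NumPy sketch. The batch of images is hypothetical (filled with dummy values), and it assumes images stored samples first as (m, height, width, channels), the channels-last layout TensorFlow uses by default. Flattening and transposing gives the DLS-style (n_x, m) matrix; transposing back and reshaping recovers the samples-first batch that TensorFlow layers expect.

```python
import numpy as np

# Hypothetical batch of m RGB images, stored samples first: (m, height, width, channels)
m, h, w, c = 10, 64, 64, 3
images = np.arange(m * h * w * c, dtype=float).reshape(m, h, w, c)

# DLS feed-forward style: flatten each image into a vector, then transpose
# so that each sample becomes a column
X_dls = images.reshape(m, -1).T             # shape (h*w*c, m) = (12288, 10)

# Back to samples first (what TensorFlow layers expect): transpose, then reshape
images_back = X_dls.T.reshape(m, h, w, c)   # shape (10, 64, 64, 3)

print(X_dls.shape, images_back.shape)       # (12288, 10) (10, 64, 64, 3)
print(np.array_equal(images, images_back))  # True
```

The round trip is lossless, so the orientation really is just a bookkeeping convention; what matters is being consistent with whatever the framework or formula expects.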