It’s a good question that has come up before. Here’s an earlier thread that discusses the same points.
The point is that most of the formulas Prof Ng shows are for “layer” level Chain Rule factors, and the \frac {1}{m} only comes in when you finally put all the Chain Rule factors together to compute the actual gradients of the weight and bias values. You could have structured things differently, but you need to make sure you don’t end up with multiple factors of \frac {1}{m}.
Of course computing that last factor \displaystyle \frac {\partial J}{\partial L} is easy: since J = \displaystyle \frac {1}{m} \sum_{i=1}^{m} L^{(i)}, we have \displaystyle \frac {\partial J}{\partial L^{(i)}} = \frac {1}{m}. In other words, the gradient of the average is the average of the gradients. Think about it for a second and that should make sense.
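If it helps to see it concretely, here’s a minimal NumPy sketch (all names are just for illustration, not from the course notebooks) that checks the claim numerically: it takes the per-example cross entropy losses L^{(i)}, averages them to get J, and verifies by finite differences that \frac {\partial J}{\partial a_i} is exactly \frac {1}{m} times the per-example gradient \frac {\partial L^{(i)}}{\partial a_i}.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
a = 0.1 + 0.8 * rng.random(m)   # activations kept safely inside (0, 1)
y = rng.integers(0, 2, m)       # binary labels

def loss(a, y):
    # Per-example cross entropy L_i (an array of m values)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def dloss_da(a, y):
    # Analytic per-example gradient dL_i/da_i
    return -(y / a) + (1 - y) / (1 - a)

# J = (1/m) * sum_i L_i, so dJ/da_i should equal (1/m) * dL_i/da_i
analytic = dloss_da(a, y) / m

# Numeric check: central finite difference of J w.r.t. each a_i
eps = 1e-6
numeric = np.empty(m)
for i in range(m):
    a_plus, a_minus = a.copy(), a.copy()
    a_plus[i] += eps
    a_minus[i] -= eps
    numeric[i] = (loss(a_plus, y).mean() - loss(a_minus, y).mean()) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

The single \frac {1}{m} shows up exactly once, in the `.mean()` that forms J; every other Chain Rule factor in the check is a plain per-example derivative.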