You can interpolate LaTeX by bracketing the expression with single dollar signs. This is covered on the FAQ Thread.

With respect to your larger points, I suggest you think more carefully about what we are actually doing here. Prof Ng is showing you how to break down the computation into multiple steps. We are using the Chain Rule everywhere, but remember that $J$ is the very last step, right? Think about what happens in the computation for a layer other than the last. How about the very first hidden layer? How many other layers do you have to go through in order to get to $J$? We can only include the factor of $\frac{1}{m}$ once for each $dW^{[l]}$ and $db^{[l]}$ value, right? Otherwise we end up with $\frac{1}{m^n}$, where $n$ is the number of subsequent layers.
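
If it helps to see this numerically, here is a minimal sketch in plain Python. It is not taken from the course notebooks: the tiny two-layer network with scalar weights, the function names `cost` and `grad_w1`, the `extra_factor` flag, and all of the numbers are assumptions I made up for illustration. It compares the backprop gradient for the first-layer weight against a finite-difference estimate of $\frac{\partial J}{\partial w_1}$, once with the $\frac{1}{m}$ applied a single time and once with it wrongly applied at the output layer as well.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w1, b1, w2, b2, xs, ys):
    # Cross-entropy cost J, averaged over the m samples (the only 1/m in J).
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        a1 = math.tanh(w1 * x + b1)      # hidden layer (tanh)
        a2 = sigmoid(w2 * a1 + b2)       # output layer (sigmoid)
        total += -(y * math.log(a2) + (1 - y) * math.log(1 - a2))
    return total / m


def grad_w1(w1, b1, w2, b2, xs, ys, extra_factor=False):
    # Backprop for dJ/dw1. The per-sample dz values carry no 1/m;
    # the 1/m is applied once, when the samples are averaged at the end.
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        a1 = math.tanh(w1 * x + b1)
        a2 = sigmoid(w2 * a1 + b2)
        dz2 = a2 - y                     # sigmoid + cross-entropy
        if extra_factor:
            dz2 /= m                     # deliberate mistake: 1/m at layer 2 too
        dz1 = w2 * dz2 * (1 - a1 ** 2)   # chain rule back through tanh
        total += dz1 * x
    return total / m                     # the single, correct 1/m


# Made-up data and parameters, purely for illustration.
xs = [0.5, -1.2, 2.0, 0.3]
ys = [1, 0, 1, 0]
w1, b1, w2, b2 = 0.7, -0.1, -1.3, 0.2

# Central finite difference as an independent estimate of dJ/dw1.
eps = 1e-5
numeric = (cost(w1 + eps, b1, w2, b2, xs, ys)
           - cost(w1 - eps, b1, w2, b2, xs, ys)) / (2 * eps)

analytic = grad_w1(w1, b1, w2, b2, xs, ys)
doubled = grad_w1(w1, b1, w2, b2, xs, ys, extra_factor=True)

print("finite difference:", numeric)
print("1/m applied once: ", analytic)
print("1/m applied twice:", doubled)
```

With the factor applied once, the backprop value should agree with the finite-difference estimate to within numerical error. With the extra division at the output layer, the first-layer gradient comes out smaller by a further factor of $m$, which is the $\frac{1}{m^2}$ case for a network with one layer after the one you are computing.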