The other point to realize as you go through this is that everything here is a derivative of either J or L, right? The question is “with respect to what”? So just using the notation \nabla J is going to be ambiguous. E.g. what do you call what Prof Ng calls db?
But the higher-level point here is that ML notation is not the same as math notation. I also came to this from the math side of the world, so I had to adjust a bit. Another example is that when they say log here, they always mean natural log.
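In case it helps, here is a quick check of the log convention in NumPy, which is where this usually bites people coming from the math or engineering side. The variable names are just for illustration.

```python
import numpy as np

x = 10.0

natural = np.log(x)    # np.log is the natural log (base e)
base10 = np.log10(x)   # base 10 has to be requested explicitly

print(natural, base10)
```

So any cost function written with `np.log` is using base e, which matches the convention above.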
Prof Ng is the boss here, so he gets to choose his own notation and we just have to deal with it. The real thing to be aware of is that when he says dSomething, you have to be careful to realize whether he means a derivative of J or of L w.r.t. the Something. That’s the real ambiguity. Here’s a thread which discusses that point w.r.t. the factor of \frac {1}{m} that you see in dW and db, but not in other gradients.
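To make the J versus L distinction concrete, here is a minimal sketch using logistic regression as the example. It assumes the usual setup of sigmoid activation, cross-entropy loss, and J defined as the mean of L over the m samples. The data is random and the names (`cost`, `dW_numeric`, and so on) are mine, not from the course code. The point is that dZ = A - Y is a derivative of the per-sample loss L, so it has no 1/m, while dW and db are derivatives of J, which is where the 1/m comes from. A finite-difference check on J confirms it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, Y):
    """J: the mean over the m samples of the per-sample loss L."""
    A = sigmoid(w.T @ X + b)
    L = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # shape (1, m)
    return L.mean()

# Made-up data purely for illustration: n features, m samples.
rng = np.random.default_rng(0)
n, m = 3, 5
X = rng.normal(size=(n, m))
Y = (rng.random(size=(1, m)) > 0.5).astype(float)
w = rng.normal(size=(n, 1))
b = 0.1

A = sigmoid(w.T @ X + b)
dZ = A - Y             # dL/dZ for each sample: no factor of 1/m
dW = (X @ dZ.T) / m    # dJ/dw: the average over samples brings in 1/m
db = dZ.sum() / m      # dJ/db: same story

# Central finite differences on J to check dW and db numerically.
eps = 1e-6
db_numeric = (cost(w, b + eps, X, Y) - cost(w, b - eps, X, Y)) / (2 * eps)
dW_numeric = np.zeros_like(w)
for i in range(n):
    step = np.zeros_like(w)
    step[i] = eps
    dW_numeric[i] = (cost(w + step, b, X, Y) - cost(w - step, b, X, Y)) / (2 * eps)

print(np.allclose(dW, dW_numeric, atol=1e-6), np.isclose(db, db_numeric, atol=1e-6))
```

If you instead wanted the derivative of J w.r.t. Z, it would be `dZ / m` in this sketch, which is exactly the kind of thing the dSomething notation does not tell you.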