Hi,
I don’t know if this is really a typo or a flaw in my understanding.
In the C1W4 video “Forward and Backward Propagation”, at 7:40, where you show the vectorized formula for dA[L], I think you missed dividing it by the batch size (m).
In MLS (linear and logistic regression), we learned that the cost and its gradient are always divided by m to get the average cost/gradient across all training examples.
Could it be that we are not doing that in DLS, and are compensating for it by dividing dW and db by m instead?
Your understanding of the notation is incorrect: dA^{[L]} is a vector. No averaging takes place there, and averaging is the only point at which the factor of \frac{1}{m} would be necessary. The only sums are within each element of the vector; the entries are separated by commas, not + signs, although perhaps that’s a bit hard to see on the slide.
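To make this concrete, here is a minimal NumPy sketch of the output-layer backward step for a sigmoid output with cross-entropy loss, in the spirit of the course assignments (the variable names AL, Y and A_prev are my own choices, not taken from the slide). Notice that dAL is computed purely elementwise, with no averaging, while the \frac{1}{m} shows up only in dW and db, which do sum over the m examples:

```python
import numpy as np

def output_layer_backward(AL, Y, A_prev):
    """Backward step for a sigmoid output layer with cross-entropy loss.

    AL     -- output-layer activations, shape (1, m)
    Y      -- true labels, shape (1, m)
    A_prev -- previous-layer activations, shape (n_prev, m)
    """
    m = Y.shape[1]

    # dA^{[L]}: one entry per training example, purely elementwise.
    # No average is taken here, hence no 1/m factor.
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # For sigmoid + cross-entropy this simplifies to dZ^{[L]} = A^{[L]} - Y.
    dZ = AL - Y

    # The 1/m appears only where we sum over the m examples:
    dW = (1.0 / m) * np.dot(dZ, A_prev.T)               # shape (1, n_prev)
    db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)  # shape (1, 1)

    return dAL, dW, db
```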
This is a common source of confusion with the way Prof Ng uses his notation: it is a bit ambiguous, and you have to understand the context to know which quantity is in the numerator of the partial derivative. Here’s another very recent thread on this general point.
Oh, right, that makes sense. A[L] is a vector, so keeping dA[L] as a vector keeps the notation consistent and easier to remember.
I guess I got confused by the round brackets.
Thanks!
Glad to hear the explanation helped. Note that the parentheses instead of brackets there are because he’s writing math, not Python code. It’s just one of those things to keep in mind: am I looking at code or math? Unfortunately the conventions are not always the same.
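As a small illustration of that math-vs-code distinction (the dictionary key below is how I recall the assignments storing gradients, so treat it as an assumption rather than gospel):

```python
import numpy as np

# In the math on the slides, superscripts index layers and examples:
#   square brackets:  A^{[l]} is the activation of layer l
#   round brackets:   x^{(i)} is the i-th training example
# In Python, brackets are just indexing, so the layer index typically
# moves into a dictionary key instead:
grads = {}
grads["dA2"] = np.zeros((4, 5))  # what the math writes as dA^{[2]}
```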