C1W4 (deep learning spec): Why does dA^{[l]} sum up all the da^{[l](i)}s in the last layer instead of stacking them as a vector?

Hello,

I am referring to the equation at around the 8-minute mark of the "Forward and Backward Propagation" video. In the vectorized implementation, why are we summing all the da^{[l](i)}s into a single term when computing dA^{[l]} in the last layer? Here i indexes the training examples.

I have not thought deeply about this, so forgive me if I am wrong :frowning:

Thank you!

Hi @whitecode

Here, each training example i contributes its own gradient da^{[l](i)}, and in the vectorized implementation these contributions are combined across all training examples because batch gradient descent updates the parameters with the gradient averaged over the entire batch. Summing the per-example gradients (together with the 1/m factor) is what accounts for the contribution of every example in the batch.
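To make the combination over examples concrete, here is a minimal NumPy sketch of the vectorized backward step for the output layer, assuming a sigmoid output unit with cross-entropy loss; the variable names (`AL`, `Y`, `A_prev`, `W`) are just illustrative, not the course's exact code. The sum over the m examples, together with the 1/m averaging factor, appears when computing dW^{[l]} and db^{[l]}:

```python
import numpy as np

# Sketch of the vectorized backward step for the output layer,
# assuming a sigmoid output unit with cross-entropy loss.
# AL:     activations of the last layer, shape (1, m)
# Y:      labels, shape (1, m)
# A_prev: activations of the previous layer, shape (n_prev, m)
# W:      weights of the last layer, shape (1, n_prev)

def last_layer_backward(AL, Y, A_prev, W):
    m = Y.shape[1]

    # Per-example gradient of the loss w.r.t. the last-layer activations,
    # one column for each training example i.
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Elementwise sigmoid derivative: dZ = dA * g'(Z) = dA * A * (1 - A).
    dZL = dAL * AL * (1 - AL)

    # The contributions of all m examples are combined here:
    # the matrix product and np.sum add them up, and 1/m averages them.
    dW = (1.0 / m) * dZL @ A_prev.T
    db = (1.0 / m) * np.sum(dZL, axis=1, keepdims=True)

    # Gradient propagated back to the previous layer.
    dA_prev = W.T @ dZL
    return dW, db, dA_prev
```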

Hope this helps, feel free to ask if you need further assistance!
