Backpropagation in Convolutional Neural Networks - dW overall derivative

I have a question about calculating dW for the conv layer. The formula goes as follows:

$$dW_c \mathrel{+}= \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} a_{slice} \times dZ_{hw}$$
When we calculate this gradient in the code, we add up the values for every example. I think this gradient should be divided by the number of examples at the end (as it usually is, because of the cost function). Is that true and there is a mistake in this programming assignment, or have I just misunderstood something?
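In code, the accumulation I mean looks roughly like this (a sketch with my own variable names, assuming the usual conv_backward loop structure, not the exact assignment code):

```python
import numpy as np

def conv_backward_dW(dZ, A_prev_padded, W, stride):
    """Sketch of only the dW accumulation (names and shapes are assumptions)."""
    m, n_H, n_W, n_C = dZ.shape
    f = W.shape[0]
    dW = np.zeros_like(W)
    for i in range(m):                      # loop over training examples
        a_prev_pad = A_prev_padded[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    horiz_start = w * stride
                    a_slice = a_prev_pad[vert_start:vert_start + f,
                                         horiz_start:horiz_start + f, :]
                    # the gradient is summed over every example i;
                    # there is no division by m anywhere here
                    dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
    return dW
```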

Please provide the Week and Assignment number.

Assuming this is from C4 W1 A1:

If the gradients will be used in full-batch gradient descent, then typically they would be divided by m.

If the gradients will be used in stochastic GD or with smaller batches, then the 1/m may not be used.

Particularly for SGD, it’s a real-time process and we don’t know what ‘m’ to use.

If m is a constant, then 1/m is also a constant, and you can compensate for not having 1/m by using a smaller learning rate.
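Here is a toy sketch of that equivalence (my own numbers and names, not from the assignment), showing that dividing the summed gradient by m and shrinking the learning rate by 1/m give the same update:

```python
import numpy as np

m = 64
dW_sum = np.random.randn(3, 3, 3, 8)    # gradient summed over m examples

# Option 1: average the gradient, use the normal learning rate
lr = 0.01
update_averaged = lr * (dW_sum / m)

# Option 2: keep the summed gradient, scale the learning rate by 1/m
lr_scaled = lr / m
update_summed = lr_scaled * dW_sum

# Both produce the same parameter update
assert np.allclose(update_averaged, update_summed)
```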