Backpropagation in Convolutional Neural Networks - dW overall derivative

I have a question about calculating dW for the conv layer. The formula goes as follows:

$$dW_c \mathrel{+}= \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} a_{slice} \times dZ_{hw}$$
When we calculate this gradient in the code, we add up the values for every example. I think this gradient should be divided by the number of examples at the end (as it usually is, because of the cost function). Is that true and there is a mistake in this programming assignment, or have I just misunderstood something?
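In code, the accumulation I mean looks roughly like this (a sketch with my own variable names, assuming the usual conv_backward loop structure, not the exact assignment code):

```python
import numpy as np

def conv_backward_dW(dZ, A_prev_padded, W, stride):
    """Sketch of only the dW accumulation (names and shapes are assumptions)."""
    m, n_H, n_W, n_C = dZ.shape
    f = W.shape[0]
    dW = np.zeros_like(W)
    for i in range(m):                      # loop over training examples
        a_prev_pad = A_prev_padded[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    horiz_start = w * stride
                    a_slice = a_prev_pad[vert_start:vert_start + f,
                                         horiz_start:horiz_start + f, :]
                    # the gradient is summed over every example i;
                    # there is no division by m anywhere here
                    dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
    return dW
```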

Please provide the Week and Assignment number.

Assuming this is from C4 W1 A1:

If the gradients will be used in full-batch gradient descent, then typically they would be divided by m.

If the gradients will be used in stochastic GD or with smaller batches, then the 1/m may not be used.

Particularly for SGD, it’s a real-time process and we don’t know what ‘m’ to use.

If m is a constant, then 1/m is also a constant, and you can compensate for not having 1/m by using a smaller learning rate.
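Here is a toy sketch of that equivalence (my own numbers and names, not from the assignment), showing that dividing the summed gradient by m and shrinking the learning rate by 1/m give the same update:

```python
import numpy as np

m = 64
dW_sum = np.random.randn(3, 3, 3, 8)    # gradient summed over m examples

# Option 1: average the gradient, use the normal learning rate
lr = 0.01
update_averaged = lr * (dW_sum / m)

# Option 2: keep the summed gradient, scale the learning rate by 1/m
lr_scaled = lr / m
update_summed = lr_scaled * dW_sum

# Both produce the same parameter update
assert np.allclose(update_averaged, update_summed)
```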