I think we should average the derivatives of the cost w.r.t. each training example, i.e. add them up and divide by the number of training examples.

But in the from-scratch implementation of backprop for the ConvNet (the first assignment in Week 1 of Convolutional Neural Networks), the derivatives for all training examples are simply added, without dividing by the number of training examples, and this is treated as correct.

Can anyone explain the reason for not dividing?

Or is the cost in a CNN just the total over all training examples rather than the average?

Hi brabeem,

As far as I can see, the addition of the derivatives is concerned with calculating the gradients for a single training example. To understand this it may help to realize that backprop in a CNN entails a convolution of a filter with loss gradients, which involves addition over the loss gradients. This is explained here and here.
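
To make the summation concrete, here is a minimal NumPy sketch of the filter gradient for one training example. It is not the assignment's code: it assumes a single channel, stride 1, and no padding, and the function names `conv_forward_single` and `filter_grad_single` are made up for illustration. Because the same filter is reused at every output position, its gradient collects one contribution per position, and those contributions are added.

```python
import numpy as np

def conv_forward_single(x, w):
    """Valid cross-correlation of one 2D input x with one f x f filter w (stride 1)."""
    f = w.shape[0]
    out = np.zeros((x.shape[0] - f + 1, x.shape[1] - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + f, j:j + f] * w)
    return out

def filter_grad_single(x, dZ, f):
    """Gradient of the loss w.r.t. the filter for ONE example.

    dZ holds the loss gradient at each output position. The filter touches
    every position, so the per-position terms are summed, not averaged.
    """
    dW = np.zeros((f, f))
    for i in range(dZ.shape[0]):
        for j in range(dZ.shape[1]):
            dW += x[i:i + f, j:j + f] * dZ[i, j]
    return dW

# Toy values chosen only for illustration.
x = np.arange(16.0).reshape(4, 4)
w = np.ones((2, 2))
dZ = np.ones((3, 3))          # pretend upstream gradient
dW = filter_grad_single(x, dZ, f=2)
```

The sum here runs over output positions within a single example, which is a different sum from the one over training examples that the question asks about.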
