There is a question in the quiz asking about the benefit of “parameter sharing” in convolutional neural nets. I chose “It allows gradient descent to set many of the parameters to zero, thus making the connections sparse”. I thought this was true because backpropagation through a max pooling layer makes the gradient quite sparse, and also because the filters only apply to one small patch at a time.

However, the system said that this answer was wrong. Could anyone help me understand? Is my understanding of sparsity wrong or incomplete, or is it just that this sparsity is not related to “parameter sharing”?

As part of the programming assignment in week 1, you’ll implement both the forward and backward passes of conv and pooling layers. It should help you see the difference between having fewer parameters and having sparse weights. A conv layer has fewer parameters than the Dense layer you would need to replace it.
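To make the "fewer parameters" point concrete, here is a rough sketch of the count. The layer sizes (a 32x32x3 input, six 5x5 filters, stride 1, no padding) and the helper names `conv_params` and `dense_params` are assumptions picked for illustration; they are not taken from the quiz or the assignment.

```python
def conv_params(f, n_c_prev, n_c):
    """Parameters in a conv layer: each of the n_c filters has
    f * f * n_c_prev weights plus one bias."""
    return (f * f * n_c_prev + 1) * n_c


def dense_params(n_in, n_out):
    """Parameters in a fully connected layer: one weight per
    (input, output) pair plus one bias per output."""
    return (n_in + 1) * n_out


# Assumed example sizes (illustrative only).
n_h, n_w, n_c_prev = 32, 32, 3   # input volume
f, n_c = 5, 6                    # filter size and number of filters

# Output size for stride 1 and no padding.
out_h, out_w = n_h - f + 1, n_w - f + 1   # 28 x 28

conv = conv_params(f, n_c_prev, n_c)
dense = dense_params(n_h * n_w * n_c_prev, out_h * out_w * n_c)

print("conv layer parameters: ", conv)    # 456
print("dense layer parameters:", dense)   # 14455392
```

Note that none of the 456 conv weights has to be zero. The saving comes from reusing the same filter at every position, not from zeroing anything out.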

As far as the pooling layer is concerned, it influences the gradient calculation; it does not set the actual weights of the conv layer to zero. I hope the computational graph from the 1st course helps here: picture a pooling layer (which has 0 learnable parameters) following a conv layer.

It’s important to realize that even though pooling layers do not have learnable (trainable) parameters, they do still pass gradients through during back propagation as Balaji described. In a max pooling layer, the gradient is passed back to only one of the inputs (the maximum) in each window covered by one step. The other inputs get a zero gradient from that window, but nothing is set to zero: the weights behind them are simply not modified by that part of the gradient, so that does not encourage sparsity. A zero gradient does not imply a zero weight, right? In the case of average pooling, the gradients will be evenly distributed over all the inputs in a given “step” of pooling.
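Here is a minimal sketch of that behavior for a single 2x2 pooling window, in plain Python. The function names, the window values, and the weight `w` are all made up for illustration; this is not the assignment's code.

```python
def max_pool_backward(window, d_out):
    """Send the upstream gradient d_out to the position(s) holding the
    maximum of the window; every other position gets a zero gradient."""
    max_val = max(v for row in window for v in row)
    return [[d_out if v == max_val else 0.0 for v in row] for row in window]


def avg_pool_backward(window, d_out):
    """Spread the upstream gradient d_out evenly over all window positions."""
    n = sum(len(row) for row in window)
    return [[d_out / n for _ in row] for row in window]


# Hypothetical activations inside one pooling window.
window = [[1.0, 3.0],
          [2.0, 0.5]]

d_max = max_pool_backward(window, 1.0)
d_avg = avg_pool_backward(window, 1.0)
print(d_max)  # [[0.0, 1.0], [0.0, 0.0]]
print(d_avg)  # [[0.25, 0.25], [0.25, 0.25]]

# A zero gradient leaves a weight where it was; it does not zero it.
learning_rate = 0.1
w = 0.7        # hypothetical weight value
grad_w = 0.0   # the gradient that reached this weight
w_new = w - learning_rate * grad_w
print(w_new)   # 0.7
```

The gradient coming out of max pooling is sparse, but the update rule only subtracts the gradient from the weight, so a zero gradient means "no change" rather than "set to zero".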

Thank you so much, both @balaji.ambresh and @paulinpaloalto! I learned a lot from your comments. I am now clear that 1) fewer parameters != sparsity in weights, and 2) a zero gradient leaves the weights unmodified; it absolutely does not change them to zeros!

Thank you again for your clarification, and for saving me the many hours I could have wasted on wrong assumptions.