Question on the benefit of CNN: sparsity?

Hi all,

There is a question in the quiz asking about the benefit of using convolutional neural nets with regard to “parameter sharing”. I chose “It allows gradient descent to set many of the parameters to zero, thus making the connections sparse”. I thought this was true because the gradient of a max pooling layer is quite sparse, and also because each filter only applies to one small patch at a time.

However, the system said that this answer was wrong. Could anyone help me understand? Is my understanding of sparsity wrong/incomplete, or is it just that this sparsity is not related to “parameter sharing”?

Thank you in advance!

Best,
Shawn

As part of the programming assignment in week 1, you’ll implement both the forward and backward passes of conv and pooling layers. That should help you see the difference between having fewer parameters and having sparsity in the weights. A conv layer has far fewer parameters than a Dense layer producing the same output volume.
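To make “fewer parameters” concrete, here is a rough back-of-the-envelope sketch; the shapes below (a 32x32x3 input, 5x5 filters, 8 output channels) are just illustrative assumptions, not the assignment’s actual ones:

```python
# Illustrative shapes only: a 32x32x3 input volume mapped to a 28x28x8 output volume.
f, n_C_prev, n_C = 5, 3, 8                  # 5x5 filters, 3 input channels, 8 filters
conv_params = f * f * n_C_prev * n_C + n_C  # shared filter weights + one bias per filter
print("conv parameters: ", conv_params)     # 608

# A Dense layer producing the same number of output units needs one weight per
# (input unit, output unit) pair -- nothing is shared across spatial positions.
n_in, n_out = 32 * 32 * 3, 28 * 28 * 8
dense_params = n_in * n_out + n_out
print("dense parameters:", dense_params)    # 19,273,856
```

The huge gap comes from parameter sharing: the same 608 numbers are reused at every spatial position, which is a different thing from those numbers being driven to zero.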

As far as the pooling layer is concerned, it influences the gradient calculation; it does not set the actual weights of the conv layer to zero. It may help to draw the computational graph from the 1st course for the case where a pooling layer (which has 0 learnable parameters) follows a conv layer.

It’s important to realize that even though pooling layers have no learnable (trainable) parameters, they still pass gradients through during back propagation, as Balaji described. In a max pooling layer, the gradient only flows back through the input that was the maximum in each window covered by one step, but the weights feeding the other inputs are not set to zero: they are simply not modified by that gradient, so this does not encourage sparsity. A zero gradient does not imply a zero weight, right? In the case of average pooling, the gradient is distributed evenly over all the inputs in a given “step” of pooling.
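If it helps, here is a minimal numpy sketch of the backward step through a single 2x2 max-pooling window; the variable names are my own, not the assignment’s notation:

```python
import numpy as np

a_slice = np.array([[1.0, 3.0],
                    [0.5, 2.0]])      # activations covered by one pooling step
dA = 5.0                              # gradient arriving from the layer above

mask = (a_slice == np.max(a_slice))   # True only at the position of the max
dA_slice = mask * dA                  # gradient is routed only to the max input
print(dA_slice)
# [[0. 5.]
#  [0. 0.]]
# The zeros here are gradients, not weights: the other activations (and the
# weights upstream of them) are left unchanged this step, not set to zero.
# For average pooling you would instead spread dA evenly: np.ones((2, 2)) * dA / 4
```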

Thank you so much, @balaji.ambresh and @paulinpaloalto! I learned a lot from your comments. I am now clear that 1) fewer parameters != sparsity in weights, and 2) not modifying a weight via the gradient is absolutely not the same as setting it to zero!

Thank you again for the clarification, and for saving me the many hours that could have been wasted assuming the wrong things :slight_smile: