Deep Learning Specialization, Course 4, Week 1 Quiz, Question 8:
Because pooling layers do not have parameters, they do not affect the backpropagation (derivatives) calculation.
I am a bit confused: why is this False? In my opinion, pooling layers have no parameters, so we can't take derivatives with respect to parameters. I would be very grateful if anyone could explain.
This question trips up a lot of people. The question is not "Does backpropagation affect the pooling layers?" It does not affect them, because they don't have parameters. But read the question again: it is asking, in effect, "Do the pooling layers affect backpropagation?" Yes, they do: they don't have parameters, but they do have derivatives. The derivative of every function in the forward propagation affects backpropagation by the Chain Rule, right? Or to put it in more colloquial terms: the gradients have to propagate through the pooling layers. It's not a NOP, right?
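Here's a tiny numerical sketch of that Chain Rule point (my own toy example, not from the course): a 1-D "max pool" feeding a squaring layer. The pooling step has no parameters, yet its derivative still shows up as a factor in the gradient.

```python
# Toy chain: x -> max-pool -> square. The pooling step has no
# trainable parameters, but it still participates in the chain rule.
xs = [1.0, 3.0, 2.0]          # forward: one pooling window
m = max(xs)                   # max-pool output, m = 3.0
loss = m ** 2                 # downstream layer, loss = 9.0

dloss_dm = 2 * m              # derivative of the downstream layer: 6.0
# derivative of max w.r.t. each input: 1 at the argmax, 0 elsewhere
dm_dx = [1.0 if v == m else 0.0 for v in xs]
dloss_dx = [dloss_dm * d for d in dm_dx]
print(dloss_dx)  # [0.0, 6.0, 0.0] -- the gradient flows only to the max
```

Notice the pooling layer's derivative (the 0/1 mask) multiplies the downstream gradient, so it very much affects the result even with zero parameters.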
So the derivatives with respect to pooling layers are always 1?
No, the pooling layers are not the identity function, right? For a max pooling layer, all the backward gradient goes to the maximum input. For an average pooling layer, the gradient is divided up and applied equally to all the inputs. But that does not necessarily imply the gradients are all == 1, right?
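Here's what those two backward rules look like for a single pooling window, as a hedged NumPy sketch (the function names and the single-window framing are mine, just for illustration; a real implementation would loop over all windows and handle strides):

```python
import numpy as np

def max_pool_backward(x, dout):
    # Illustrative: route the upstream gradient dout entirely to the
    # position that held the maximum during the forward pass.
    dx = np.zeros_like(x, dtype=float)
    dx[np.unravel_index(np.argmax(x), x.shape)] = dout
    return dx

def avg_pool_backward(x, dout):
    # Illustrative: spread the upstream gradient dout equally over
    # every element of the window.
    return np.full(x.shape, dout / x.size, dtype=float)

window = np.array([[1.0, 3.0],
                   [2.0, 4.0]])
print(max_pool_backward(window, 5.0))  # only the 4.0 entry gets the 5.0
print(avg_pool_backward(window, 5.0))  # every entry gets 5.0 / 4 = 1.25
```

So the "local derivative" of max pooling is a 0/1 mask, and for average pooling it is 1/n everywhere, and in both cases it gets multiplied by whatever upstream gradient arrives, so nothing here is "always 1".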
Note that we don’t actually have to implement any of this, since by this point we’re using TensorFlow and Keras. They take care of all this “under the covers”. You can google “back propagation pooling layers” to find more explanations if you want to dig deeper. As I recall, there’s a good article on StackExchange that you should find with that search.