In the video, Pooling Layers, Prof Ng says that there are no parameters to learn. Why is that? Is it
- technically impossible, or
- a design choice for computational efficiency, which is the purpose of pooling?
Probably this: they just condense the information into a smaller data size!
There are literally no trainable parameters in a pooling layer: all it does is apply the chosen operation (either average or max) to the inputs in a fixed way, based on the filter size and stride that are specified. Those are hyperparameters that you choose, not values learned by gradient descent. There are no “weights” in the pooling layer. But that does not mean that backward propagation skips that layer: the gradients from the later layers propagate backward through the pooling layer. For average pooling, each output gradient is spread equally across all the inputs in its window; for max pooling, it is passed only to the input that was the max of its window. That will be covered in the first assignment for C4 W1.
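
To make that concrete, here is a minimal NumPy sketch of a pooling layer on a single 2-D input. It is not the assignment's implementation: the function names `pool_forward` and `pool_backward`, the argument names `f`, `stride` and `mode`, and the 4x4 example input are all made up for illustration. The point to notice is that neither function has any weights to update; the backward pass only routes the incoming gradient to the inputs.

```python
import numpy as np

def pool_forward(x, f=2, stride=2, mode="max"):
    """Pool a 2-D array. f, stride and mode are fixed hyperparameters;
    there are no weights or biases anywhere in this function."""
    h_out = (x.shape[0] - f) // stride + 1
    w_out = (x.shape[1] - f) // stride + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = x[i * stride:i * stride + f, j * stride:j * stride + f]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

def pool_backward(dout, x, f=2, stride=2, mode="max"):
    """Route the upstream gradient dout back to the inputs.
    Only dx is returned: there is no gradient with respect to any parameter."""
    dx = np.zeros_like(x, dtype=float)
    for i in range(dout.shape[0]):
        for j in range(dout.shape[1]):
            rows = slice(i * stride, i * stride + f)
            cols = slice(j * stride, j * stride + f)
            if mode == "max":
                window = x[rows, cols]
                # Only the max position gets the gradient.
                # (In this sketch, tied maxima would each receive it.)
                mask = (window == window.max())
                dx[rows, cols] += mask * dout[i, j]
            else:
                # Average pooling: spread the gradient equally over the window.
                dx[rows, cols] += dout[i, j] / (f * f)
    return dx

# Hypothetical 4x4 input and an upstream gradient of all ones.
x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 3., 2.],
              [2., 6., 1., 4.]])
dout = np.ones((2, 2))

print(pool_forward(x, mode="max"))             # 2x2 output of window maxima
print(pool_backward(dout, x, mode="max"))      # nonzero only at the max positions
print(pool_backward(dout, x, mode="average"))  # 0.25 everywhere
```

With max pooling, the gradient lands only on the four positions that held the window maxima; with average pooling, every input receives one quarter of its window's gradient. In both cases the total gradient passed back equals the total gradient that came in.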