Hi, I have a doubt: why do we increase the channel depth (number of feature channels) after downsampling with a pooling operation? What is the significance of this, and what would happen if we did not increase the depth after pooling?
Before answering your query, let's agree on the fact that the pooling operation itself doesn't increase or decrease the depth. It is the convolution operation (or some form of reshuffling like PixelShuffle) that changes the depth.
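To make this concrete, here is a minimal sketch in numpy (the shapes and filter counts are hypothetical, chosen just for illustration): pooling shrinks the spatial dimensions but leaves the depth untouched, while a convolution's filter count sets the output depth.

```python
import numpy as np

x = np.random.rand(8, 8, 16)  # height 8, width 8, depth (channels) 16

# 2x2 max pooling with stride 2: spatial dims halve, depth stays 16.
pooled = x.reshape(4, 2, 4, 2, 16).max(axis=(1, 3))
print(pooled.shape)  # (4, 4, 16)

# A 1x1 convolution with 32 filters is just a matmul over the channel axis:
# the output depth equals the number of filters, here 32.
W = np.random.rand(16, 32)
conv = pooled @ W
print(conv.shape)  # (4, 4, 32)
```

So it is the layer that follows the pooling, not the pooling itself, that decides the new depth.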
Now, coming back to your query: here, the depth represents the number of channels. Taking the convolution operation as the centre of my answer, if we want the number of output channels to be, say, 128, we would require 128 different filters. These different filters can intuitively be thought of as looking for different features, like horizontal patterns, vertical patterns, etc. In practice, the features which the filters are looking for are much more non-trivial.
So, when we increase the depth, it basically means that we are trying to learn many more (distinct) features, which ultimately help the model make the final predictions. The explanation that I have provided is more of an intuitive one rather than a theoretical one.
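A toy illustration of the "one filter per feature" intuition, using two hand-made 3x3 Sobel kernels (a simplification of what learned filters do): each filter produces one output map, so stacking the maps gives one channel per filter.

```python
import numpy as np

# One filter that responds to horizontal edges, one for vertical edges.
horizontal = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
vertical = horizontal.T

img = np.zeros((8, 8))
img[4:, :] = 1.0  # a horizontal edge across the middle

def conv2d_valid(image, kernel):
    """Plain 'valid' cross-correlation, no padding."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

# 2 filters -> depth 2: one output channel per filter.
feature_maps = np.stack(
    [conv2d_valid(img, horizontal), conv2d_valid(img, vertical)], axis=-1
)
print(feature_maps.shape)  # (6, 6, 2)

# The horizontal-edge filter fires at the edge; the vertical one sees nothing.
print(np.abs(feature_maps[..., 0]).max())  # 4.0
print(np.abs(feature_maps[..., 1]).max())  # 0.0
```

Adding more filters (more depth) means more such detectors running in parallel, which is what "learning many more distinct features" cashes out to.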
You can resolve your query with a simple experiment. Make a simple network with a couple of layers, but don't increase the depth as you go down the network. You will find that your model won't perform as well as it does when you increase the depth as you move further down the network. I hope this helps.
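One way to see why the constant-depth network tends to underperform, before even training anything, is to compare capacity. The sketch below (channel plans are hypothetical, not from the course) counts parameters for 3x3 convolutions under two plans; keeping the depth constant while the spatial dims shrink leaves later layers with far less capacity.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights + biases for one k x k conv layer."""
    return out_ch * (k * k * in_ch + 1)

constant_depth = [3, 16, 16, 16]  # depth stays at 16 throughout
growing_depth = [3, 16, 32, 64]   # depth doubles after each pooling stage

for plan in (constant_depth, growing_depth):
    total = sum(conv_params(i, o) for i, o in zip(plan, plan[1:]))
    print(plan, "->", total, "parameters")
# [3, 16, 16, 16] -> 5088 parameters
# [3, 16, 32, 64] -> 23584 parameters
```

Parameter count is only a proxy, of course; the real test is the experiment itself, but this shows how much representational budget the growing-depth design gives the deeper layers.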
Thanks Shahid for bringing this up,
Hi Elemento @Elemento ,
I wonder: on what basis do we choose the channel sizes?
E.g. 3 → 8 → 16, as in one of the examples in the course?
Welcome to the community.
The course discusses a ton of examples, so I am not exactly sure which example you are referring to. But if in that example 3, 8 and 16 refer to the number of channels, then yes indeed: 3 would be the depth of the input (e.g. the RGB channels of an image), the first conv layer would learn 8 different features (following my intuitive explanation above), the next layer 16 different features, and so on. I hope this helps.
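A quick shape walk-through of such a 3 → 8 → 16 progression, again as a hypothetical sketch using 1x1 convolutions (channel-mixing matmuls) so the code stays short; real networks would use larger kernels, but the channel bookkeeping is identical.

```python
import numpy as np

x = np.random.rand(32, 32, 3)   # RGB input: depth 3

W1 = np.random.rand(3, 8)       # 8 filters over 3 input channels
h1 = x @ W1                     # (32, 32, 8)

# 2x2 max pooling keeps the depth at 8 while halving the spatial dims.
pooled = h1.reshape(16, 2, 16, 2, 8).max(axis=(1, 3))  # (16, 16, 8)

W2 = np.random.rand(8, 16)      # 16 filters over 8 input channels
h2 = pooled @ W2                # (16, 16, 16)

print(h1.shape, pooled.shape, h2.shape)
```

Each stage's depth is simply the number of filters you chose for it; the input's 3 comes for free from the image format.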
The number of output channels you define as you go deeper into the network is just based on what works. Everything here is “experimental”: as Elemento mentioned in an earlier reply, you can try using fewer channels and see what happens. As Prof Ng discusses at many points in the lectures, it makes sense to learn from what has worked in other problems, meaning you start by choosing an architecture that has shown good results on what you hope is a similar type of problem. And then you see how that works and modify the architecture if the results are not good enough in your new case.