I was reading the article linked in the course, "A Simple Guide to the Versions of the Inception Network" by Bharath Raj on Towards Data Science, and I came across this point: "As stated before, deep neural networks are computationally expensive. To make it cheaper, the authors limit the number of input channels by adding an extra 1x1 convolution before the 3x3 and 5x5 convolutions." I can't wrap my head around why the 1x1 convolution is added and how it helps reduce the overall computation required.
Here is a picture comparison from the article; it should help clarify what I am asking.
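To make my question concrete, here is a rough sketch of how I am trying to count the multiplications for a single 5x5 branch, with and without the 1x1 bottleneck. The feature-map size and channel counts (28x28 input, 192 input channels, 32 output channels, 16 bottleneck channels) are just illustrative numbers I picked, not necessarily the article's exact figures:

```python
# Rough multiply-count comparison for one 5x5 branch of an Inception block.
# All dimensions below are my own illustrative assumptions.

H, W = 28, 28          # spatial size of the feature map
c_in, c_out = 192, 32  # input and output channels of the branch
c_mid = 16             # channels after the 1x1 "bottleneck" convolution

# Naive 5x5 convolution applied directly to the input:
naive = H * W * 5 * 5 * c_in * c_out

# 1x1 convolution to shrink the channels, then 5x5 on the smaller tensor:
bottleneck = (H * W * 1 * 1 * c_in * c_mid) + (H * W * 5 * 5 * c_mid * c_out)

print(f"naive 5x5:    {naive:,} multiplies")       # 120,422,400
print(f"1x1 then 5x5: {bottleneck:,} multiplies")  # 12,443,648
```

If my counting is right, the bottleneck version is roughly 10x cheaper here, so is this the kind of saving the article means? I am not sure whether this is the correct way to account for the cost.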