Networks in Networks and 1x1 Convolutions

Hi Sir,

We have some doubts about the lecture "Networks in Networks and 1x1 Convolutions". Could you please help clarify them?

  1. At 2:50, we cannot understand this statement: "One way to think about a one-by-one convolution is that it is basically having a fully connected neural network that applies to each of the 62 different positions." How is it related to a fully connected network? And what are the 62 different positions?

My understanding is that all 32 input numbers are connected to the different filters (filter 1, filter 2, filter 3, …), but each filter produces a single output activation for that position.

  2. What is a non-trivial computation? This phrase is used often in the lecture.

  3. At 5:38, we cannot understand the statement below. How does it learn a more complex function and add nonlinearity? Could you please help us understand it?

"The effect of a one-by-one convolution is it just has nonlinearity. It allows you to learn a more complex function of your network by adding another layer, the inputs 20 by 20 by 192, and outputs 20 by 20 by 192."

For 1, I’m pretty sure the 62 was a mistake; what was meant was 32, which is the number of channels in the original input.

Now, to explain the idea with a simpler example, let’s say we use four 1x1 filters (instead of the 32 filters used in the video) and the original input is 6x6x5. At each position we take a 1x1x5 slice, or as you’d say, 5 different input numbers.

A 1x1 convolution passes that slice to a layer with 4 units (since we’re using 4 filters), followed by a ReLU activation as stated in the video. So essentially you’re looking at this network:

A fully connected one layer network with 5 inputs and 4 units.
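The equivalence can be checked numerically. Below is a minimal NumPy sketch, assuming random weights and the illustrative 6x6x5 input with four 1x1 filters from this reply; the function names `conv1x1` and `dense_per_position` are made up for this example and do not come from the lecture. It applies the same 5-input, 4-unit dense layer at each of the 6 x 6 = 36 spatial positions and compares that with a single batched matrix multiply.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes from the reply above: a 6x6x5 input and four 1x1 filters.
H, W, C_in, C_out = 6, 6, 5, 4
x = rng.standard_normal((H, W, C_in))
weights = rng.standard_normal((C_in, C_out))  # one column per 1x1 filter
bias = rng.standard_normal(C_out)


def relu(z):
    return np.maximum(z, 0.0)


def conv1x1(x, weights, bias):
    """1x1 convolution + ReLU, written as an explicit loop over positions."""
    h, w, _ = x.shape
    out = np.empty((h, w, weights.shape[1]))
    for i in range(h):
        for j in range(w):
            # The 1x1xC_in slice at (i, j) goes through the same dense layer.
            out[i, j] = relu(x[i, j] @ weights + bias)
    return out


def dense_per_position(x, weights, bias):
    """Same computation: treat every position as one row of a batch."""
    h, w, c = x.shape
    flat = x.reshape(h * w, c)  # one row per spatial position
    return relu(flat @ weights + bias).reshape(h, w, -1)


out_conv = conv1x1(x, weights, bias)
out_dense = dense_per_position(x, weights, bias)
num_positions = H * W

print(out_conv.shape, num_positions, np.allclose(out_conv, out_dense))
```

The weights are shared across positions, so the layer has only 5 x 4 weights plus 4 biases no matter how large the spatial grid is.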

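For 2, a non-trivial computation is any computation that does something useful, like the 1x1 convolution with multiple channels and filters. By contrast, a 1x1 convolution on a single-channel input just multiplies every value by one number, which is trivial.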

For 3, it introduces nonlinearity because the activation we use is a nonlinear function, and this nonlinearity is what lets neural networks learn anything beyond a linear map. If the number of filters matches the number of input channels, the network learns a more complex function without changing the shape and without applying any spatial convolution.
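To see why the activation matters, here is a small NumPy sketch under assumed sizes (a 20x20 grid with 8 channels and random weights, chosen only for illustration). Two stacked 1x1 layers with no activation collapse into a single linear layer, while putting a ReLU between them gives a function that one linear layer cannot reproduce.

```python
import numpy as np

rng = np.random.default_rng(1)

C = 8  # assumed channel count, for illustration only
x = rng.standard_normal((20, 20, C))
w1 = rng.standard_normal((C, C))  # first 1x1 layer: C filters over C channels
w2 = rng.standard_normal((C, C))  # second 1x1 layer

# Without an activation, two 1x1 layers equal one layer with weights w1 @ w2.
linear_stack = (x @ w1) @ w2
collapsed = x @ (w1 @ w2)

# With a ReLU in between, the result is no longer that single linear map.
nonlinear_stack = np.maximum(x @ w1, 0.0) @ w2

collapses = np.allclose(linear_stack, collapsed)
differs = not np.allclose(nonlinear_stack, collapsed)
print(collapses, differs)
```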

Feel free to ask anything you don’t understand here.


Hi Sir,

Regarding the third answer, we have a couple of doubts. Could you please help clarify?

  1. How does matching the number of channels and filters, without changing the shape, help to learn a more complex function?

Well, it’s not the matching itself that helps. You can have different numbers of filters and channels; matching them just makes the output the same shape as the input. You can use any number of filters, depending on your use case. It’s the behaviour of the 1x1 convolution as a fully connected layer applied at each position that helps it learn more complex functions.
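A quick shape check makes this concrete. This is a sketch with random weights and a hypothetical helper `conv1x1_relu`, using the 20 by 20 by 192 volume quoted in the question: the number of filters only sets the number of output channels, and the spatial size never changes.

```python
import numpy as np


def conv1x1_relu(x, weights):
    """Hypothetical helper: 1x1 convolution (no bias) followed by ReLU."""
    return np.maximum(x @ weights, 0.0)


rng = np.random.default_rng(2)
x = rng.standard_normal((20, 20, 192))

# 192 filters: output has the same shape as the input.
same = conv1x1_relu(x, rng.standard_normal((192, 192)))

# 32 filters: same 20x20 grid, but only 32 output channels.
shrunk = conv1x1_relu(x, rng.standard_normal((192, 32)))

print(same.shape, shrunk.shape)
```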

Would it be 36 different positions?