Question - Trying to understand how convolution operation works w.r.t input feature volume & output feature volume

Prathamesh_Joshi1 · March 17, 2024, 8:23am

Current understanding: when we say 10 * 10 * 16 w.r.t. feature maps or kernels, 10 * 10 is the spatial dimension (height * width) and 16 is the number of such 10 * 10 feature maps or kernels.
Correct me if I am wrong in my understanding.

Now my doubt:
For example, suppose I have a 10 * 10 * 6 input feature volume and I want a 10 * 10 * 16 output feature volume (padding=same), using kernels with 3 * 3 spatial dimensions. My questions are below.

What kernel volume is required? I am guessing it's 3 * 3 * 16, i.e. 16 different kernels of 3 * 3 spatial dimensions, but I am not sure.
If it's 3 * 3 * 16, how does the convolution operation take place? That is, how do we get 16 output feature maps by convolving 6 input feature maps (each 10 * 10) with 16 kernels (each 3 * 3)?
Theory 1 - Is it that one 3 * 3 * 1 kernel convolves over all 6 input channels (the 10 * 10 * 6 volume) to give one 10 * 10 * 1 output channel, then the next 3 * 3 * 1 kernel convolves over the same 10 * 10 * 6 input to give another 10 * 10 * 1 output channel, and we stack them to get 10 * 10 * 2 output channels after using 2 kernels? So after using all 16 kernels we get 16 output channels. Is that the case?
If Theory 1 is correct, how does the convolution operation take place?
For the sake of simplicity, say we reduce the input to 10 * 10 * 2: channel 1 - Red, channel 2 - Green.
We reduce the filter to just one 3 * 3 * 1 kernel.

I know that when I place the 3 * 3 * 1 filter on the top-left corner of the Red channel and perform the convolution, I get a single value as output. The same happens when I apply the same kernel to the top-left corner of the Green channel. But now I have 2 values, one from each channel, and I know the output should be 1 value. How do we get from these 2 values to 1 output value? Are we using some aggregation like avg/sum/max etc.?

Some reference docs/links would be appreciated if you could share them.
Thanks.

balaji.ambresh · March 17, 2024, 2:31pm

Please pick the correct week-x tag for your question.
Do look at the 1st assignment for week 1 where you implement the forward pass for a conv2d and pool layers. That should clear the doubts you have on this topic.

paulinpaloalto · March 17, 2024, 5:18pm

If the inputs to a given layer are 10 x 10 x 6, that means you have 6 “channels” in the input. So at the current layer each filter must match the number of channels on the inputs. So if you choose f = 3, then each filter at the current layer will be 3 x 3 x 6. And if you want to have 16 output channels from the current layer, then you need 16 of those 3 x 3 x 6 filters. Then you compute the output shape in the h and w dimensions by using the formula Prof Ng gave us in the lectures:

$$n_{out} = \left\lfloor \frac{n_{in} + 2p - f}{s} \right\rfloor + 1$$

Applying that with s = 1 and p = 0 we get:

$$n_{out} = \left\lfloor \frac{10 + 2 \cdot 0 - 3}{1} \right\rfloor + 1 = 8$$

In that case, the output of the current layer would be 8 x 8 x 16.

Or if you wanted “same” padding, try p = 1 and you get $n_{out} = 10$.
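
The shape bookkeeping above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `conv_output_size` and `conv_layer_shapes` are made up for this example, and the `(f, f, c_in, c_out)` ordering of the filter bank is an assumed layout, not something any particular framework requires.

```python
def conv_output_size(n_in, f, p=0, s=1):
    """Spatial output size: floor((n_in + 2p - f) / s) + 1."""
    return (n_in + 2 * p - f) // s + 1


def conv_layer_shapes(h, w, c_in, f, c_out, p=0, s=1):
    """Return (filter bank shape, output shape) for one conv layer.

    Each filter must span every input channel, so one filter is f x f x c_in,
    and c_out such filters are needed to produce c_out output channels.
    """
    filter_bank = (f, f, c_in, c_out)  # assumed layout, for illustration only
    output = (conv_output_size(h, f, p, s), conv_output_size(w, f, p, s), c_out)
    return filter_bank, output


# 10 x 10 x 6 input, f = 3, 16 filters, no padding
print(conv_layer_shapes(10, 10, 6, f=3, c_out=16, p=0))
# Same setup with p = 1 ("same" padding for f = 3, s = 1)
print(conv_layer_shapes(10, 10, 6, f=3, c_out=16, p=1))
```

With these inputs the filter bank is 16 filters of 3 x 3 x 6 in both cases; only the spatial size of the output changes with the padding.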

Prathamesh_Joshi1 · March 18, 2024, 2:51am

Thank you all for the help. I also tried to find the answer with some quick code. The results are below.

It seems that the convolution outputs from the two channels are summed to give a single number as the output.
The code and a screenshot are attached here for reference.
nn_Conv2d.py (1.9 KB)

[Screenshot: nn_Conv2D_results]
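
For readers without the attachment, the summing behaviour can be reproduced with a minimal pure-Python sketch. This is not the attached `nn_Conv2d.py`; the function names and the toy 4 x 4 two-channel input are made up for illustration. It assumes stride 1, no padding, and the deep-learning convention in which "convolution" does not flip the kernel. Note that a single filter here is 3 x 3 x 2, so each input channel gets its own 3 x 3 slice of weights rather than sharing one kernel.

```python
def conv2d_single_channel(image, kernel):
    """Stride-1, no-padding sliding dot product of one channel with one 2-D kernel."""
    f = len(kernel)
    out_h = len(image) - f + 1
    out_w = len(image[0]) - f + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(f) for b in range(f))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv2d_multi_channel(channels, kernel_slices, bias=0):
    """Apply ONE filter: convolve each channel with its own slice, then sum over channels."""
    per_channel = [conv2d_single_channel(c, k) for c, k in zip(channels, kernel_slices)]
    rows, cols = len(per_channel[0]), len(per_channel[0][0])
    return [
        [sum(m[i][j] for m in per_channel) + bias for j in range(cols)]
        for i in range(rows)
    ]


# Toy 4 x 4 x 2 input (made-up values): a "red" and a "green" channel.
red = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
green = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]

# One 3 x 3 x 2 filter, written as one 3 x 3 slice per input channel.
k_red = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
k_green = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

red_out = conv2d_single_channel(red, k_red)        # [[-6, -6], [-6, -6]]
green_out = conv2d_single_channel(green, k_green)  # [[5, 4], [4, 5]]
out = conv2d_multi_channel([red, green], [k_red, k_green])
print(out)  # [[-1, -2], [-2, -1]]
```

Each entry of `out` is the sum of the corresponding entries of `red_out` and `green_out`, so one filter yields one output channel no matter how many input channels there are. Using 16 such filters and stacking their outputs gives 16 output channels.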
