Well, as Andrew taught in the Convolutions over Volume video that we get 2d output for a convoluted RGB image considering a single filter.
i.e. 6x6x3 * 3x3x3 = 4x4
Why can’t the output be 4x4x3?
Hello @pushkarp6
This is how the convolution over volume is defined.
In this case you have 6x6 image with 3 layers (r,g,b) and you are using 3x3 filter with 3 identical layers thus summing on each step 3x3x3 = 27 number yielding a flat 4x4 image
you can define a new operation and call it differently where on each step you would sum 9 values 3 times keeping three layers. the only question is why ?
hth
Hii @pushkarp6 ,
Basically reason behind doing these convolutions is to achieve main goal of extracting features and intuitvely when we will combine the representations over all the 3 channels than only we can get best representation of the extracted feature becuse in general analogy with humans also, if you see any image than you don’t observe its features on different scales but you do as a whole to build abstract idea of patterns, colors etc. That’s my intuition! Also mathematically, @Viktoriia reply is perfect and thought provoking!
@Viktoriia Thank you so much for the explanation!!
The last statement is exactly where I am stuck
@kushaldev Great explanation.
Thank you so much!!!
@pushkarp6 my pleasure! You are welcome! Good luck for the rest exercises.
@pushkarp6 always a pleasure to think and discuss about possibilities and ideas!