Why convolution operation?

Soumitra_Das · September 25, 2022, 2:01am

In a normal neural network (perceptron):
Suppose I am given a vector x and I want to predict y with a perceptron. Here, our basic assumption is y = W^{t}*x, i.e., we assume y is a linear combination of the components of x, which is a natural way of thinking.

In a convolutional neural network:
For images we want something more than a perceptron: a model that not only predicts y as a linear combination of the pixel values (X) but also treats neighboring pixels together. For that purpose we use the convolution operator.
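To make "treating neighboring pixels together" concrete, here is a minimal pure-Python sketch of the sliding-window operation a CNN layer computes. It assumes a single channel, stride 1 and no padding; the function name conv2d_valid and the hand-picked edge kernel are illustrative only. As in most deep-learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding, kernel not flipped).

    Each output cell is a weighted sum of one kh x kw neighborhood of the input,
    so neighboring pixels are always combined together.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = 0
            for a in range(kh):
                for b in range(kw):
                    s += kernel[a][b] * image[i + a][j + b]
            row.append(s)
        out.append(row)
    return out

# A tiny image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical kernel: responds to a left-to-right increase in intensity.
kernel = [
    [-1, 1],
    [-1, 1],
]
print(conv2d_valid(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The large response appears only where the 2 x 2 window straddles the edge, which is exactly the kind of local pattern a single pixel-wise linear combination cannot localize.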

So my question is: why does the convolution operation achieve this goal? I would like a detailed explanation of the mathematical intuition behind it. Why not some other operation? Is there any proof that convolution is the best choice? I have the same questions about pooling.

alvaroramajo · September 25, 2022, 9:19am

Hi, @Soumitra_Das !

To answer that, I’ll highlight the main differences between the two approaches.

Convolutions are not densely connected, so not every input node affects every output node: each output depends only on a small neighborhood of the input. This builds into the layer the assumption that nearby pixels belong together, which suits image data. In addition, a fully-connected layer’s weight matrix is tied to the exact size of the training images, which might not be good for the overall model, whereas a convolution kernel can slide over inputs of any size.
Moreover, the number of weights per layer is much smaller in a CNN, because the same small kernel is reused at every position of the input; this helps a lot with high-dimensional inputs such as image data. These advantages are what give CNNs their well-known ability to learn features in the data, such as shapes and textures in images. Fully-connected (FC) layers have many more weights, which makes them highly prone to overfitting, whereas a convolution reduces the number of parameters significantly, which makes it less prone to overfitting.
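One way to see the "not densely connected" point, and how it relates back to the perceptron’s W^{t}*x: a convolution is a linear layer whose weight matrix is forced to be mostly zeros and to reuse the same few weights in every row. The sketch below (pure Python, 1D, stride 1, no padding; the helper names are hypothetical) builds that matrix explicitly and checks that multiplying by it gives the same result as sliding the kernel.

```python
def conv1d_valid(x, w):
    """1D sliding dot product (stride 1, no padding, kernel not flipped)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]

def conv_as_matrix(w, n):
    """Build the (n-k+1) x n matrix W such that W x equals conv1d_valid(x, w).

    Row i holds the same kernel w shifted i positions to the right and zeros
    elsewhere: the zeros are the sparse connectivity, the repeated w is
    weight sharing.
    """
    k = len(w)
    rows = []
    for i in range(n - k + 1):
        row = [0] * n
        for j in range(k):
            row[i + j] = w[j]
        rows.append(row)
    return rows

def matvec(W, x):
    # Ordinary dense matrix-vector product, as in a fully-connected layer.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

x = [1, 2, 3, 4, 5, 6]
w = [1, 0, -1]
W = conv_as_matrix(w, len(x))
for row in W:
    print(row)
# [1, 0, -1, 0, 0, 0]
# [0, 1, 0, -1, 0, 0]
# [0, 0, 1, 0, -1, 0]
# [0, 0, 0, 1, 0, -1]
print(matvec(W, x) == conv1d_valid(x, w))  # True
```

A fully-connected layer would be free to put a different value in all 24 entries of W; the convolution only ever learns the 3 values of w.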
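To put rough numbers on the parameter savings, here is a small counting sketch. It assumes a hypothetical 32 x 32 RGB input mapped to 64 feature maps of the same spatial size, a 3 x 3 kernel with "same" padding, and one bias per output unit or channel; the function names are illustrative.

```python
def conv_params(in_ch, out_ch, kh, kw):
    # One kh x kw x in_ch kernel per output channel, plus one bias per output channel.
    # Note that the image height and width do not appear at all.
    return kh * kw * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    # Every input unit connects to every output unit, plus one bias per output unit.
    return n_in * n_out + n_out

h, w, c_in, c_out = 32, 32, 3, 64
print(conv_params(c_in, c_out, 3, 3))             # 1792
print(dense_params(h * w * c_in, h * w * c_out))  # 201392128
```

Doubling the image to 64 x 64 leaves the convolution count unchanged but multiplies the dense count by roughly 16, which is also why the FC layer is tied to the training image size.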

Soumitra_Das · September 25, 2022, 3:48pm

You are describing the benefits of a convolutional layer over a fully connected layer, which are mainly:

each cell of the next layer depends only on a small portion of the previous layer;
the number of parameters is much smaller.
However, my main question was why we compute a cell’s value as the sum of each cell of the previous layer multiplied by its corresponding weight, just as we use W^{t} * x in the perceptron.
By the way, it’s clear to me now: it is just a replica of W^{t} * x.
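This observation can be checked directly: if each kh x kw patch of the input is flattened into a vector x and the kernel into a vector w, every output cell is exactly w^{t} x, the same weighted sum a perceptron computes, just repeated with the same w at every position. Below is a minimal pure-Python sketch (single channel, stride 1, no padding; helper names and the box-filter kernel are hypothetical). Flattening patches like this is often called im2col.

```python
def patches(image, kh, kw):
    """Yield every kh x kw patch of `image`, flattened row by row (stride 1, no padding)."""
    ih, iw = len(image), len(image[0])
    for i in range(ih - kh + 1):
        for j in range(iw - kw + 1):
            yield [image[i + a][j + b] for a in range(kh) for b in range(kw)]

def dot(u, v):
    # The perceptron-style W^{t} x for a single output cell.
    return sum(ui * vi for ui, vi in zip(u, v))

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 1],
    [1, 1],
]
w = [v for row in kernel for v in row]  # flatten the kernel in the same order as the patches
outputs = [dot(w, p) for p in patches(image, 2, 2)]
print(outputs)  # [12, 16, 24, 28], i.e. the 2 x 2 output map [[12, 16], [24, 28]]
```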

Christian_Simonis · September 25, 2022, 4:25pm

In addition, for other learners who read this and want more information on convolution:

You might want to take a look at this thread: How to Calculate the Convolution?

Best regards
Christian

Soumitra_Das · September 28, 2022, 8:00am

Hi, @alvaroramajo,
Can you please tell me why we use max/average pooling? How does it help? Of course it reduces the size of the feature maps (and with it the number of parameters in later layers), but what is the main reason for introducing this idea?
Can you suggest some papers or books where I can see the mathematical details behind this?

alvaroramajo · September 28, 2022, 8:26am

That’s basically the whole idea of pooling. Convolving the inputs at full resolution all the way to the final layers becomes computationally prohibitive for large input sizes, so reducing the dimensionality is a necessity.
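As a sketch of the dimensionality reduction described above: the example assumes 2 x 2 windows with stride 2 (a common choice, not the only one), and the function name pool2d is hypothetical. A 4 x 4 feature map becomes 2 x 2, i.e. four times fewer values for the following layers to process, and the pooling step itself has no weights to learn.

```python
def pool2d(fmap, size=2, stride=2, op=max):
    """Apply `op` (e.g. max, or an averaging function) to each size x size window.

    Pooling has no learnable weights; it only shrinks the feature map.
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            window = [fmap[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(op(window))
        out.append(row)
    return out

def mean(values):
    return sum(values) / len(values)

fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(pool2d(fmap, op=max))   # [[4, 2], [2, 8]]
print(pool2d(fmap, op=mean))  # [[2.5, 1.0], [1.25, 6.5]]
```

With max pooling, moving the largest value to another position inside the same window leaves the output unchanged, which is why pooling is also often described as adding a little robustness to small shifts.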

This paper (2011) may be one of the first to address this type of solution, while this one (2020) discusses and compares several methods.
