Does the size of the filter have to be the same as the number of channels in the previous image?

In this picture f2 is 5, but would it not have to be 10 to match the number of channels in a1?

Hey @Stephano_Cotsoradis,

The size of the filter (also known as the kernel) in a convolutional layer does not have to be the same as the number of channels. The kernel's height and width and the number of channels in the previous layer are independent parameters in a convolutional neural network (CNN).

In an earlier video it said that they had to be equal. I might be misunderstanding something though.

I would think that, to be consistent with this diagram showing the 6x6x3 * 3x3x3 convolution, since a1 is 37x37x10, f2 should be 10, since it is the size of the filter?

The “number of channels” in the input and the “number of channels” in each filter (kernel) have to be the same for a standard convolution. This ensures that each filter operates on all channels of the input data, and that’s what picture number 2 shows.

However, the “number of filters” can indeed be different. You can use a different number of filters to extract different features from the input. Each filter is responsible for capturing specific patterns or features, and having a variety of filters allows a convolutional layer to learn a diverse set of features. That’s why you see different numbers in the first image: the “10” and “20” there are not the number of channels in a filter but the number of filters in each layer.

For example, in a convolutional layer, you might have:

  • Input Image: (Height, Width, Number of Channels) e.g., (64, 64, 3) for an RGB image.
  • Convolutional Filters: (Filter Height, Filter Width, Number of Input Channels, Number of Output Channels) e.g., (3, 3, 3, 64).

In this case, you have 64 filters, each with a depth of 3 to match the input’s 3 channels. Each filter produces one channel in the output feature map, resulting in an output with 64 channels.
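To make the shapes above concrete, here is a minimal NumPy sketch of a naive convolution with stride 1 and no padding. The function name `conv2d_valid` and the random input are assumptions made for illustration only; real frameworks implement this far more efficiently.

```python
import numpy as np

def conv2d_valid(x, filters):
    """Naive convolution (cross-correlation, as used in CNNs), stride 1, no padding.

    x:       (H, W, C_in)
    filters: (f_h, f_w, C_in, C_out)
    returns: (H - f_h + 1, W - f_w + 1, C_out)
    """
    h, w, c_in = x.shape
    f_h, f_w, f_c, c_out = filters.shape
    # The filter depth must equal the number of input channels.
    assert f_c == c_in, "filter depth must match input channels"

    out = np.zeros((h - f_h + 1, w - f_w + 1, c_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + f_h, j:j + f_w, :]  # (f_h, f_w, C_in)
            # Contract height, width, and channels against every filter at once.
            out[i, j, :] = np.tensordot(patch, filters, axes=3)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64, 3))   # RGB-like input
bank = rng.standard_normal((3, 3, 3, 64))  # 64 filters, each 3x3x3
features = conv2d_valid(image, bank)
print(features.shape)  # (62, 62, 64)
```

The kernel size (3x3) only affects the output height and width, the filter depth (3) is fixed by the input, and the number of filters (64) becomes the number of output channels.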

I hope it makes sense now and feel free to ask for more clarifications.

Yes, that makes perfect sense! So in the screenshot I sent with the original question, shouldn’t f2 be equal to 10 instead of 5?

The 5 here is the size of the kernel: at f1 it’s a 3x3 kernel and at f2 it’s a 5x5 kernel. That’s why you get the smaller “17x17x20” output after applying the output-size formula with a stride of 2.
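The formula is not spelled out in this thread; the usual one for the output height or width is floor((n + 2p - f) / s) + 1. Below is a minimal sketch of it, assuming zero padding (the padding is not stated here) and using a hypothetical helper name `conv_output_size`.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Output height/width of a conv layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1

# Layer 2 from the screenshot: 37x37 input, 5x5 kernel, stride 2.
# padding=0 is an assumption; it is not stated in the thread.
size = conv_output_size(37, f=5, stride=2, padding=0)
print(size)  # 17
```

Note that the number of channels does not appear in this formula at all: it only governs the filter depth and the output depth, not the output height or width.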

I will try to make it clear for you.

  1. Kernel Size (Filter Size):

    • The “kernel size” or “filter size” refers to the dimensions of the convolutional filter (kernel) used in a convolutional layer.
    • It determines how many pixels the filter considers at a time when sliding over the input.
    • Common kernel sizes are 3x3, 5x5, or 7x7, and they are specified as (height, width).
  2. Number of Channels:

    • The “number of channels” represents the depth or the number of feature maps in the input data.
    • In the context of an RGB image, there are typically three color channels: Red, Green, and Blue (RGB).
    • For grayscale images, there is only one channel.
    • In the input tensor, the number of channels is usually denoted as the last dimension (e.g., (Height, Width, Number of Channels)).
  3. Number of Filters:

    • The “number of filters” (also known as “number of output channels”) refers to how many individual convolutional filters are applied to the input.
    • Each filter is responsible for learning a set of spatial patterns or features from the input.
    • The number of filters determines the depth or the number of channels in the output feature map.

Ohhhh… So f2 means 5x5, the 10 (the number of channels) is filled in automatically to give a filter size of 5x5x10, and 20 is the total number of filters?

Yes, f2 means 5x5. The filter depth is 10 because the previous layer used 10 filters, and the input to the third layer has 20 channels because layer 2 used 20 filters, and so on.
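This bookkeeping can be sketched as a small shape tracer. The function `trace_shapes` is a hypothetical helper written for this illustration, and it assumes zero padding, which is not stated in the thread.

```python
def trace_shapes(input_shape, layers):
    """Return the activation shape after each conv layer.

    input_shape: (height, width, channels)
    layers:      list of (kernel_size, stride, num_filters); padding assumed 0.
    """
    h, w, c = input_shape
    shapes = [input_shape]
    for f, s, n_filters in layers:
        filter_shape = (f, f, c)  # depth is inherited from the input channels
        h = (h - f) // s + 1
        w = (w - f) // s + 1
        c = n_filters             # one output channel per filter
        print(f"{n_filters} filters of shape {filter_shape} -> output {(h, w, c)}")
        shapes.append((h, w, c))
    return shapes

# a1 is 37x37x10; layer 2 uses f2 = 5, stride 2, and 20 filters.
shapes = trace_shapes((37, 37, 10), [(5, 2, 20)])
# prints: 20 filters of shape (5, 5, 10) -> output (17, 17, 20)
```

You only ever choose the kernel size, the stride, and the number of filters; the third filter dimension is never chosen, it just follows from the previous layer.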

Thank you @Jamal022

You’re welcome!

Happy Learning!!!
