In this picture f2 is 5, but wouldn't it have to be 10 to match the number of channels in a1?
Hey @Stephano_Cotsoradis,
Well, the spatial size of the filter (also known as the kernel) in a convolutional layer does not have to match the number of channels. The spatial size of the filter and the number of channels in the previous layer are independent parameters in a convolutional neural network (CNN).
In an earlier video it said that they had to be equal. I might be misunderstanding something though.
I would think that, to be consistent with this diagram showing the 6x6x3 * 3x3x3 convolution, since a1 is 37x37x10… then f2 should be 10, since it is the size of the filter?
Well, the “number of channels” in the input and the “number of channels” in the filters (kernels) must be the same for the convolution operation to work. This ensures that each filter can operate on all channels of the input data, and that’s what the second picture shows.
However, the “number of filters” can indeed be different. You can use any number of filters to extract different features from the input. Each filter is responsible for capturing specific patterns or features, and having a variety of filters allows a convolutional layer to learn a diverse set of features. That’s why, in the first image, the “10” and “20” are not the number of channels but the number of filters.
For example, in a convolutional layer, you might have:
- Input Image: (Height, Width, Number of Channels) e.g., (64, 64, 3) for an RGB image.
- Convolutional Filters: (Filter Height, Filter Width, Number of Input Channels, Number of Output Channels) e.g., (3, 3, 3, 64).
In this case, you have 64 filters, each with a depth of 3 to match the input’s 3 channels. Each filter produces one channel in the output feature map, resulting in an output with 64 channels.
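If it helps to see those shapes in code, here is a minimal sketch of that exact example (assuming TensorFlow/Keras; any framework would show the same shapes):

```python
import tensorflow as tf

x = tf.random.normal((1, 64, 64, 3))  # (batch, height, width, channels) for one RGB image
conv = tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding="same")
y = conv(x)  # calling the layer builds its weights

print(conv.kernel.shape)  # (3, 3, 3, 64): each filter is 3x3 with depth 3 to match the 3 input channels
print(y.shape)            # (1, 64, 64, 64): one output channel per filter
```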
I hope it makes sense now, and feel free to ask for more clarification.
Cheers,
Jamal
Yes, that makes perfect sense! So in the original question I sent in the screenshot, shouldn't f2 be equal to 10 instead of 5?
Well, 5 here is the size of the kernel. At f1 the kernel size is 3x3, and at f2 it's 5x5; that's why you get the lower dimension “17x17x20” after applying the output-size formula with a stride of 2.
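For reference, the output size per spatial dimension is floor((n + 2p − f) / s) + 1, so plugging in layer 2's numbers (n = 37, f2 = 5, s = 2, and assuming no padding, p = 0):

```
floor((37 + 2*0 - 5) / 2) + 1 = floor(32 / 2) + 1 = 17
```

and since that layer uses 20 filters, the output is 17x17x20.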
I will try to make it clear for you.
- Kernel Size (Filter Size):
  - The “kernel size” or “filter size” refers to the dimensions of the convolutional filter (kernel) used in a convolutional layer.
  - It determines how many pixels the filter considers at a time when sliding over the input.
  - Common kernel sizes are 3x3, 5x5, or 7x7, and they are specified as (height, width).
- Number of Channels:
  - The “number of channels” represents the depth, or the number of feature maps, in the input data.
  - In the context of an RGB image, there are typically three color channels: Red, Green, and Blue (RGB).
  - For grayscale images, there is only one channel.
  - In the input tensor, the number of channels is usually the last dimension (e.g., (Height, Width, Number of Channels)).
- Number of Filters:
  - The “number of filters” (also known as the “number of output channels”) refers to how many individual convolutional filters are applied to the input.
  - Each filter is responsible for learning a set of spatial patterns or features from the input.
  - The number of filters determines the depth (the number of channels) of the output feature map.
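To tie those three terms together, here is a rough NumPy sketch of one step of the layer-2 convolution from the screenshot (the random values are just placeholders):

```python
import numpy as np

# a1 has 10 channels, so each layer-2 filter must also have depth 10.
a1 = np.random.randn(37, 37, 10)    # activation from layer 1
W2 = np.random.randn(5, 5, 10, 20)  # 20 filters, each 5x5x10 (f2 = 5)

# One output value = elementwise product of a 5x5x10 patch with one
# 5x5x10 filter, summed over ALL 10 channels:
patch = a1[0:5, 0:5, :]             # top-left 5x5 window, all channels
z = np.sum(patch * W2[:, :, :, 0])  # scalar: first filter, first position

# Output spatial size with stride 2: floor((37 - 5) / 2) + 1 = 17
out_h = (37 - 5) // 2 + 1
print(out_h)  # 17 -> the full output is 17x17x20
```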
Ohhhh… So f2 means 5x5, and the 10 (the number of channels) is just automatically included, making each filter 5x5x10, and then 20 is the total number of filters?
Yeah, f2 means 5x5, and you get the 10 because you already used 10 filters in the previous layer. The 20 at the third layer is because we used 20 filters at layer 2, and so on.
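In code, the whole progression from the screenshot looks like this (a sketch assuming TensorFlow/Keras and the lecture's 39x39x3 input, which is what makes a1 come out as 37x37x10):

```python
import tensorflow as tf

x = tf.random.normal((1, 39, 39, 3))  # assumed 39x39x3 input from the lecture
a1 = tf.keras.layers.Conv2D(10, kernel_size=3, strides=1, padding="valid")(x)
a2 = tf.keras.layers.Conv2D(20, kernel_size=5, strides=2, padding="valid")(a1)

print(a1.shape)  # (1, 37, 37, 10): f1 = 3, stride 1, 10 filters
print(a2.shape)  # (1, 17, 17, 20): f2 = 5, stride 2, 20 filters
```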
Thank you @Jamal022
You’re welcome!
Happy Learning!!!
Hey, thanks for the clarification. I wonder how that applies to 3D images. For example, a medical image of a brain is usually 3D and has a size of `256*256*256`, with information carried equally in all three dimensions. Taking a filter size of `[3*3*256]` with ‘same’ padding and maybe 128 filters will give us an output of size `[256*256*128]` in one of the 3D convolutional layers. Does that affect the neural network’s performance if the filter size in the third dimension is too big?
@QuantumQuest, this thread has been cold for 11 months, and I am not certain whether Mentor Jamal022 is still active here.
You might have better luck with a reply if you start a new thread for your question.
I’ve never built a system that takes medical images as inputs, so I’m just speaking on general principles here. The point is that if the image is in 3 dimensions, then the channels will be the 4th dimension, right? So what does each voxel look like (the 3D equivalent of a pixel)? How many values does it have? In 2D images, we typically have 1 channel for greyscale images, 3 channels for RGB images, but there are some that include alpha to get RGBA and then there is CMYK, although I’ve never worked with that.
So if the input is 256 x 256 x 256 and each voxel has 4 channels then each image tensor is 256 x 256 x 256 x 4. And also note that the filters will have 3 spatial dimensions and a channel dimension as well. And the convolution process will involve “stepping” the filter through in all three spatial dimensions.
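To see those shapes concretely, here is a small sketch (assuming TensorFlow/Keras, the hypothetical 4-channel voxel from above, and a volume scaled down from 256³ to 32³ so it runs quickly; the shapes generalize):

```python
import tensorflow as tf

# Scaled-down sketch: 32^3 volume, 4 channels per voxel (both assumptions).
x = tf.random.normal((1, 32, 32, 32, 4))  # (batch, D, H, W, channels)
conv = tf.keras.layers.Conv3D(filters=128, kernel_size=3, padding="same")
y = conv(x)

print(conv.kernel.shape)  # (3, 3, 3, 4, 128): three spatial dims plus the channel depth
print(y.shape)            # (1, 32, 32, 32, 128): "same" padding keeps the spatial size
```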
PS - also note that I edited your post and used the “{}” formatting tool for all your expressions that include *. That’s because * means italic in markdown, so your original version was not very readable.
PPS - also as Tom notes, this is a dormant thread and your topic really is something new, so if you want to discuss further it probably would help to open a new thread with a title that expresses the fact that it’s about 3D medical images, so that other people can find it and benefit from the discussion.
@paulinpaloalto I found this useful:
Or this is your typical radiology format, and it is difficult to decode.
Or I guess I should just edit: I don’t think I’ve ever tried the raw file. But I know it ‘ain’t JPEG’.