Hello,
I have a few questions regarding pool size, kernel size, and strides. Suppose we have an image of size 64 x 64: what should the kernel size, pool size, and strides be, approximately?
Also, how is the output shape defined in the following image? In the format [(None, 64, 84, 1)], is 64, 84, 1 the image size, the filter size, or the pool size?
Hey @sandeep_goshika,
I'll try to break your questions into points, and I hope it helps.
First, let's start with kernel size: the kernel size, also known as the filter size, is typically a small square matrix that slides over the input image during the convolution operation. Common kernel sizes are 3x3, 5x5, and 7x7. The choice of kernel size depends on the complexity of the features you're trying to capture in the data.
So for your 64 x 64 example, common kernel sizes would be 3x3 or 5x5.
Always remember that hyperparameters require experimentation until you find the combination that best fits your case.
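Here is a minimal sketch, assuming TensorFlow/Keras (which matches the [(None, 64, 84, 1)] shape format in your question), of what setting the kernel size looks like in code; the filter count and activation are just illustrative:

```python
import tensorflow as tf

# 64x64 grayscale input, as in your example
inputs = tf.keras.Input(shape=(64, 64, 1))

x = tf.keras.layers.Conv2D(
    filters=32,          # number of kernels (feature maps) to learn
    kernel_size=(3, 3),  # a common starting point for 64x64 inputs
    padding="same",      # pad so the 64x64 spatial size is preserved
    activation="relu",
)(inputs)

print(x.shape)  # (None, 64, 64, 32)
```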
Second, pool size: pooling layers are used to downsample the spatial dimensions of the input data while retaining important information.
Common pool sizes are 2x2 or 3x3, and again, this needs experimentation.
Okay, maybe you are wondering how reducing the spatial dimensions would help you.
The answer: reducing the spatial dimensions of the data reduces the number of parameters and the amount of computation in the layers that follow.
So for your 64 x 64 example, a common pool size would be 2x2, which halves each spatial dimension (64 becomes 32).
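As a quick illustration (again assuming Keras; the layer sizes are made up), a 2x2 max pool halves the height and width:

```python
import tensorflow as tf

# e.g. the output of a conv layer: 64x64 with 32 feature maps
x = tf.keras.Input(shape=(64, 64, 32))

# pool_size=(2, 2) with its default stride reduces 64x64 to 32x32
pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)

print(pooled.shape)  # (None, 32, 32, 32)
```

Every layer after the pool now works on a quarter of the pixels, which is where the savings in parameters and computation come from.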
Finally, strides: strides determine the step size at which the convolution kernel or pooling window moves across the input data. A stride of 1 means the kernel or window moves one pixel at a time, while a stride of 2 means it moves two pixels at a time. Larger strides result in a greater reduction in spatial dimensions. In many cases, a stride of 1 is used for convolutions, and a stride of 2 is used for pooling to achieve downsampling.
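You can see the effect of the stride directly in the output shapes; this sketch again assumes Keras, with arbitrary filter counts:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(64, 64, 1))

# stride 1: the kernel moves one pixel at a time
s1 = tf.keras.layers.Conv2D(16, (3, 3), strides=1, padding="same")(inputs)

# stride 2: the kernel skips every other pixel, halving height and width
s2 = tf.keras.layers.Conv2D(16, (3, 3), strides=2, padding="same")(inputs)

print(s1.shape)  # (None, 64, 64, 16)
print(s2.shape)  # (None, 32, 32, 16)
```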
Always remember to build your model as quickly as possible first, then start tuning to improve it. So when I say a kernel size or pool size is common, that doesn't mean it will work best in your case; we just start with common choices to build the model, then tune from there.
Let's now break down the formatted shape, which is [(None, 64, 84, 1)]:
None: Represents the batch size; it can vary depending on how many samples you're processing at once during training or inference.
64: Represents the height of the image (vertical dimension).
84: Represents the width of the image (horizontal dimension).
1: Represents the number of channels; in this case it's a grayscale image.
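You can reproduce that exact shape yourself; here is a minimal sketch (assuming your inputs really are 64x84 grayscale images, with an arbitrary conv layer just so the model has something in it):

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(64, 84, 1))  # height=64, width=84, channels=1
outputs = tf.keras.layers.Conv2D(8, (3, 3), padding="same")(inputs)
model = tf.keras.Model(inputs, outputs)

# The InputLayer row reports the input shape with batch size None;
# older Keras versions print it exactly as [(None, 64, 84, 1)]
model.summary()
```

The batch dimension stays None until you actually call the model on data; for example, model(tf.zeros((8, 64, 84, 1))) produces an output with batch size 8.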
I hope this helps and that things are clearer now.
Best Regards,
Jamal