Convolutional Sliding window example only works for a stride of 2

Nilanjan_Banik · June 8, 2023, 11:52pm

A similar question was asked here https://community.deeplearning.ai/t/varying-the-strides-with-convolutional-implementation-of-sliding-windows/87871?u=nilanjan_banik with no satisfactory answer yet.

So I am posting the question again. Andrew slides the 14x14 window over the 16x16 test image with a stride of 2. If I slide it with a stride of 1, I should end up with a 3x3 final shape (ignoring the number of channels), since 1 + (16 - 14)/1 = 3. But simply passing the 16x16 image through the trained architecture results in a 2x2 final layer, as shown in the video.

So how do we understand this discrepancy?

If I train a network with a 14x14 input shape, will that restrict the stride of sliding the window depending on the shape of the test image on which we will make predictions?
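For reference, the window count used in the question can be checked with a few lines of Python. This is only a sketch of the arithmetic; `num_windows` is a hypothetical helper name, not something from the course.

```python
def num_windows(image_size: int, window_size: int, stride: int) -> int:
    """Number of window positions along one axis when sliding without padding."""
    if window_size > image_size:
        return 0
    return 1 + (image_size - window_size) // stride


for stride in (1, 2):
    n = num_windows(16, 14, stride)
    print(f"stride {stride}: {n} x {n} = {n * n} windows")
# stride 1: 3 x 3 = 9 windows
# stride 2: 2 x 2 = 4 windows
```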

rmwkwok · June 9, 2023, 2:18am

Hello @Nilanjan_Banik,

Here is how I would suggest thinking about it. Please stay with me for a while:

0. Forget about sliding at a stride of 2 over the 16 x 16 input. Forget about it for now.
1. The first Conv2D layer of the model is set to a stride of 1. Verify this yourself by working out how the shape changes from 14 x 14 to 10 x 10 (in the first row), and from 16 x 16 to 12 x 12 (in the second row).
2. Verify the rest of the shapes in the slide, so that you can convince yourself that, whatever the input shape is, all operations are the same with the same stride settings, and yet they produce the outputs shown in the slide.
3. Realize that there are 2 ways to make predictions on an image larger than the model was designed for. (A) Manually slice 14 x 14 crops out of the image one at a time; at a stride of 2 we end up with four 14 x 14 images and 4 predictions. (B) Do not slice, so there is NO stride of 2; just feed the 16 x 16 input into the model, and it also gives 4 predictions.
4. If you use method A with a stride of 2, you get 4 predictions.
5. If you use method A with a stride of 1, you get 9 predictions.
6. If you use method B (no need to specify any stride), you get 4 predictions.
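The shape checks in the list above can be sketched in Python. The thread only states 14 to 10, 16 to 12, and the final 2 x 2; the rest of the layer stack below (2x2 max pool with stride 2, then a 5x5 convolution standing in for the first fully connected layer, then two 1x1 convolutions) is my assumption of what the lecture slide shows, so please check it against the slide.

```python
def out_size(n: int, kernel: int, stride: int = 1) -> int:
    """Output size along one axis for a 'valid' (no padding) conv or pool."""
    return (n - kernel) // stride + 1


# (name, kernel, stride): assumed layer stack, not confirmed by the thread
LAYERS = [
    ("conv 5x5", 5, 1),
    ("max pool 2x2", 2, 2),
    ("conv 5x5 (FC as conv)", 5, 1),
    ("conv 1x1", 1, 1),
    ("conv 1x1", 1, 1),
]


def trace(n: int) -> list:
    """Spatial size after each layer, starting from an n x n input."""
    sizes = [n]
    for _name, kernel, stride in LAYERS:
        n = out_size(n, kernel, stride)
        sizes.append(n)
    return sizes


print(trace(14))  # [14, 10, 5, 1, 1, 1]
print(trace(16))  # [16, 12, 6, 2, 2, 2]
```

The same layers with the same strides are applied in both rows; only the input size differs.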

Don’t mix up methods A and B. If you use method B, you don’t (can’t) specify any sliding-window stride, and you get 4 predictions as a result of the model’s configuration (as you verified in steps 1 and 2 above).

In step 0 above, I asked you to forget about the stride of 2 because mixing up methods A and B is wrong.

As said above, if you use method B, there is no such thing as a sliding-window stride. If you use method A, however, there is no restriction on that stride.
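One way to see where the 4 predictions of method B come from is to compute, from the layers' own kernel sizes and strides, how large an input region each output cell sees and how far apart neighbouring regions are. This is a rough sketch using the standard receptive-field recurrence; the layer list is my assumption about the slide (5x5 conv, 2x2 max pool with stride 2, 5x5 conv, two 1x1 convs), and `receptive_field` is a hypothetical helper.

```python
def receptive_field(layers):
    """Return (receptive field size, spacing between neighbouring fields).

    layers: list of (kernel, stride) pairs, applied in order, no padding.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) * current spacing
        jump *= stride             # strides multiply through the stack
    return rf, jump


# Assumed stack: conv 5x5, max pool 2x2 (stride 2), conv 5x5, conv 1x1, conv 1x1
LAYERS = [(5, 1), (2, 2), (5, 1), (1, 1), (1, 1)]

rf, jump = receptive_field(LAYERS)
print(rf, jump)  # 14 2
```

Under these assumptions each output cell sees a 14 x 14 region, and neighbouring cells see regions 2 pixels apart. Nobody chooses that spacing at prediction time; it falls out of the max pool's stride, which is consistent with method B having no stride to specify.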

Cheers,
Raymond

Nilanjan_Banik · June 9, 2023, 2:40am

Thanks @rmwkwok, your response is very helpful! I agree with all your points.

So, to summarize, method A and method B are just two distinct ways of making predictions on a larger image. Method A is computationally more expensive since it repeats the same computations multiple times, while method B is much more efficient in the sense that it handles all the computations in a single pass through the network.
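The summary can be checked on a toy version of the network. The sketch below is a deliberate simplification, not the course's model: a single channel, one random filter per layer, no biases or activations, and the 1x1 layers left out since they do not change the spatial layout. It compares method A (four 14x14 crops at a stride of 2) with method B (one pass over the 16x16 image) and counts multiplications for each.

```python
import math
import random


def conv2d(x, k):
    """Valid, stride-1, single-channel 2D cross-correlation on nested lists."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


def maxpool2(x):
    """2x2 max pooling with stride 2."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]) - 1, 2)]
            for i in range(0, len(x) - 1, 2)]


def model(x, k1, k2):
    """Toy stack: conv 5x5 -> max pool 2x2 -> conv 5x5."""
    return conv2d(maxpool2(conv2d(x, k1)), k2)


def mults(n):
    """Multiplications the toy model needs for an n x n input."""
    first = (n - 4) ** 2 * 25
    pooled = (n - 4) // 2
    return first + (pooled - 4) ** 2 * 25


def rand_grid(n):
    return [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]


random.seed(0)
k1, k2 = rand_grid(5), rand_grid(5)
image = rand_grid(16)

# Method B: feed the whole 16x16 image through once -> 2x2 output
out_b = model(image, k1, k2)

# Method A: crop 14x14 windows at a stride of 2 and run each one separately
out_a = [[model([row[c:c + 14] for row in image[r:r + 14]], k1, k2)[0][0]
          for c in (0, 2)] for r in (0, 2)]

same = all(math.isclose(out_a[i][j], out_b[i][j])
           for i in range(2) for j in range(2))
print(same)                      # True
print(4 * mults(14), mults(16))  # 10100 3700
```

In this toy setting the two methods give the same four numbers, and method A spends 10100 multiplications where method B spends 3700, because the four crops overlap heavily and method A recomputes the shared region each time.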

rmwkwok · June 9, 2023, 2:49am

Hello @Nilanjan_Banik,

You are welcome, and your summary is perfect.

Cheers,
Raymond

Aditya_Ranganath · January 7, 2024, 12:41am

@rmwkwok What happens with different window sizes? Here we are assuming a sliding window size of 14x14. What if we want a window size of 10x10 or 20x20? The CNN is trained to handle only images of size 14x14. Doesn’t the sliding window algorithm have to work on images of many sizes, not just 14x14? Would we have to train as many CNNs as the number of window sizes we are using?

Thanks in advance for the reply!
