Hi All
I have a question regarding how YOLO actually implements its grid (the 3 by 3 grid in the lecture video).
From this picture, we can see that if we have a ConvNet (with all fully connected layers rewritten as convolutional layers) trained on 14 by 14 images and apply it to 16 by 16 images, we get a 2 by 2 output. By the same logic, applying that ConvNet to 18 by 18 images gives a 3 by 3 output.
That is to say, the picture above corresponds to a sliding window with a stride of 2.
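To make the arithmetic concrete, here is a quick PyTorch sketch of a toy network in this style (my own reconstruction of a 14 by 14 net with the FC layers rewritten as convolutions, not the actual lecture architecture):

```python
# Toy "FC layers rewritten as convolutions" network trained on 14x14 inputs
# (my own sketch for checking spatial sizes, not the actual lecture network).
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),     # 14x14 -> 10x10
    nn.MaxPool2d(2),                     # 10x10 -> 5x5
    nn.Conv2d(16, 400, kernel_size=5),   # former FC layer: 5x5 -> 1x1
    nn.Conv2d(400, 400, kernel_size=1),  # former FC layer
    nn.Conv2d(400, 4, kernel_size=1),    # final predictions
)

for size in (14, 16, 18):
    out = net(torch.zeros(1, 3, size, size))
    print(size, tuple(out.shape[-2:]))   # 14 -> (1, 1), 16 -> (2, 2), 18 -> (3, 3)
```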
But the claim in the “bounding box prediction” lecture video seems to be that if you input a 3 by 3 grid of 14 by 14 images (42 by 42 in total), you should get a 3 by 3 output. This is different from the calculation above.
So my question is: if I apply a ConvNet trained on a small window size to larger images, how is the output size determined? And how should we convolutionally implement a grid if we truly want to start from 42 by 42 and arrive at 3 by 3?
Update: after playing with some numbers, it seems that the stride of the sliding window is determined by the number of pooling layers. Here, the pooling layers are 2 by 2. If there is one such pooling layer, the stride of the sliding window is 2; if there are two such pooling layers, the stride is 4.
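A small sketch to check this (it assumes valid convolutions/pooling with no padding; the two layer stacks below are made up purely for illustration):

```python
# Spatial output size after applying (kernel, stride) layers in order,
# assuming "valid" convolutions/pooling (no padding).
def out_size(n, layers):
    for k, s in layers:
        n = (n - k) // s + 1
    return n

one_pool  = [(5, 1), (2, 2), (5, 1)]                  # conv5 -> pool2 -> conv5; 14 -> 1
two_pools = [(3, 1), (2, 2), (3, 1), (2, 2), (2, 1)]  # conv3 -> pool2 -> conv3 -> pool2 -> conv2; 14 -> 1

for n in (14, 16, 18, 22):
    print(n, out_size(n, one_pool), out_size(n, two_pools))
# one pool : 14 -> 1, 16 -> 2, 18 -> 3, 22 -> 5   (window slides with stride 2)
# two pools: 14 -> 1, 16 -> 1, 18 -> 2, 22 -> 3   (window slides with stride 4)
```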
Hi @O_Sub_Kwon
The output size is determined by how the convolutional and pooling layers transform the input's spatial dimensions. Convolutional layers preserve the spatial layout, while pooling layers shrink it by a factor given by their stride. In your example, the 2x2 pooling layers set the stride of the implicit sliding window: the effect is cumulative, so two such layers give a larger stride (and a smaller output) than one. To turn a 42x42 input into a 3x3 grid, you would need to choose the pooling strategy, padding, strides, and kernel sizes so that the network downsamples by the right overall factor.
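To make that concrete, here is a rough sketch of the bookkeeping (my own illustration, not course code): the per-layer size formula composes across the network, and the effective stride of the sliding window is the product of the per-layer strides.

```python
from math import prod

# Per-layer size formula: out = floor((in + 2*pad - kernel) / stride) + 1
def net_out(n, layers):                      # layers: (kernel, stride, pad) tuples
    for k, s, p in layers:
        n = (n + 2 * p - k) // s + 1
    return n

# Lecture-style stack trained on 14x14: conv5 -> pool2 -> conv5 -> 1x1 convs
lecture = [(5, 1, 0), (2, 2, 0), (5, 1, 0), (1, 1, 0), (1, 1, 0)]
print(net_out(42, lecture))                  # 15: a 15x15 map, i.e. stride-2 windows
print(prod(s for _, s, _ in lecture))        # 2: effective stride = product of strides

# A true 3x3 grid over 42x42 with 14-pixel windows needs the strides to
# multiply to 14, i.e. non-overlapping windows: (42 - 14) / 14 + 1 = 3.
# Toy example using only pooling (receptive field 14, effective stride 14):
toy = [(2, 2, 0), (7, 7, 0)]
print(net_out(14, toy), net_out(42, toy))    # 1 3
```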
Hope this helps! Feel free to ask if you need further assistance.