Convolutional Implementation of Sliding Window with zoom

The traditional sliding-window implementation changes the size of the window between passes, and that makes sense: the same object can appear at different sizes in the image (e.g., a car further from the camera looks smaller).
Why don't we do the same in the convolutional implementation? We can still run into the same problem, right?

Changing the window size means inspecting the same image many times (per layer), once for each window size, and re-running the same layer over and over like that is computationally expensive. The window size can instead vary from one layer to the next as you go deeper into the model. In theory you could pass the same image to several independent layers with different window sizes, but the purpose of a CNN is to extract lower-level and then higher-level features as you move deeper into the model.
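To make the "different window sizes as you progress in the model" point concrete, here is a rough sketch that computes how large a patch of the input image each layer effectively looks at (its receptive field). The function name `receptive_fields` and the layer stack are made up for illustration and are not taken from any particular model; the sketch also ignores padding and dilation.

```python
def receptive_fields(layers):
    """Effective window size on the input image after each layer.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    Hypothetical helper for illustration; padding and dilation are ignored.
    """
    size = 1  # a single input pixel covers itself
    jump = 1  # spacing, in input pixels, between neighbouring units of the current layer
    sizes = []
    for kernel, stride in layers:
        size += (kernel - 1) * jump  # each extra kernel tap reaches `jump` pixels further
        jump *= stride               # striding spreads the units further apart
        sizes.append(size)
    return sizes


# Assumed example stack: 5x5 conv, 2x2 max-pool (stride 2), 5x5 conv, 2x2 max-pool (stride 2)
example_layers = [(5, 1), (2, 2), (5, 1), (2, 2)]
print(receptive_fields(example_layers))  # [5, 6, 14, 16]
```

So even with fixed kernel sizes, each deeper layer corresponds to a larger window on the original image, and all of those windows are evaluated in a single forward pass.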

You could experiment to see which window sizes benefit which layers in your application, but I don't think there is a one-size-fits-all rule for every image detection problem.