Convolutional implementation of sliding window

Noam_Mizrachi · October 1, 2021, 12:23pm

Hi,

In week 3, in the lecture “Convolutional implementation of sliding window”, Andrew explained how to reduce the computational cost by running the network on the entire image instead of on one small window at a time.

I don’t understand how it is possible to run the network on a 28x28 image if the input needs to be 14x14. Does it mean that the network architecture needs to be changed?
Can anyone explain the implementation details?

Thanks!

ai_curious · October 1, 2021, 3:02pm

In sliding windows, you cut up the original input image and pass the subregions into a classifier one at a time. The algorithm runs a complete forward pass for each subregion. In a convolutional implementation, you pass the entire image in and instead slide the kernel over it during the convolutions. The convolutional neural net runs only one forward pass.

Both approaches can produce the same number of outputs from the same original input. But because all the outputs are produced in parallel in a single pass, instead of one after another, the convolutional approach runs faster. It also has some other advantages regarding the number, location, and size of objects, which you will uncover when you get to the YOLO discussions.

And it’s a ‘yes’ to the question about the network architecture needing to be changed. The input shape is the size of the entire training (or test) image instead of the subregion size, the kernel shape is the size of the filter you want to use (often an odd number of pixels, so maybe not exactly 14), and your layers are now 2D convolutions (plus pooling, activations, etc.).
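To make the shapes concrete, here is a minimal sketch of the size arithmetic. It assumes the small example network as I recall it from the lecture: a 5x5 convolution, a 2x2 max pool, and the former fully connected layers rewritten as a 5x5 convolution followed by 1x1 convolutions. The layer list and the helper names (`out_size`, `spatial_size`) are illustrative assumptions, not course code.

```python
def out_size(n, kernel, stride=1):
    """Spatial size after an unpadded ('valid') convolution or pooling layer."""
    return (n - kernel) // stride + 1


# (name, kernel size, stride) for each layer; assumed for illustration
LAYERS = [
    ("conv 5x5", 5, 1),
    ("max pool 2x2", 2, 2),
    ("conv 5x5 (was fully connected)", 5, 1),
    ("conv 1x1 (was fully connected)", 1, 1),
    ("conv 1x1 (was output layer)", 1, 1),
]


def spatial_size(n):
    """Height/width of the output grid for an n x n input image."""
    for _name, kernel, stride in LAYERS:
        n = out_size(n, kernel, stride)
    return n


for n in (14, 16, 28):
    s = spatial_size(n)
    print(f"{n}x{n} input -> {s}x{s} output grid")
# 14x14 input -> 1x1 output grid
# 16x16 input -> 2x2 output grid
# 28x28 input -> 8x8 output grid
```

Under these assumptions, each cell of the output grid corresponds to one 14x14 window of the input, and neighbouring cells are 2 pixels apart because of the 2x2 max pool.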

Noam_Mizrachi · October 1, 2021, 7:11pm

Thanks for the reply.

I’d like to focus on the network architecture change.
So if we’re changing the network input shape (to be the whole image instead of a small subregion), it means that we need to train the model on the full image size in the first place, right? If so, doesn’t that mean that the sliding window is just an automatic process that is part of training the first layers?
Thanks

ai_curious · October 1, 2021, 10:50pm

Maybe refer to the lecture video for the respective network architectures, in particular the section from about 6 minutes through about 8:40. Sliding windows takes a 14x14 region as input and produces a single floating point value as output. To cover a 16x16 image, you would repeat that complete forward pass 4 times, once per window position. The convolutional network takes the entire 16x16 image as input and produces 4 floating point values from its single forward pass. You can see that the outputs are equivalent, but how they produce them, the layer shapes and computations, are completely different. I don’t think it is correct or helpful to think of sliding windows as part of a convolutional architecture. They are apples vs oranges, or maybe cats vs non-cats.
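The equivalence of the outputs can be checked numerically. The following is a rough sketch, not the course's implementation: it assumes a toy all-convolutional network (5x5 conv, 2x2 max pool, 5x5 conv standing in for the fully connected layer, 1x1 conv for the output) with random weights and made-up channel counts, and it assumes the four windows sit at row/column offsets 0 and 2. All function names here are hypothetical.

```python
import numpy as np


def conv2d(x, w, b):
    """Unpadded cross-correlation. x: (H, W, Cin), w: (k, k, Cin, Cout), b: (Cout,)."""
    k = w.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((h_out, w_out, w.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return out


def max_pool(x, size=2):
    """Non-overlapping max pooling with stride equal to the pool size."""
    h, w, c = x.shape
    h2, w2 = h // size, w // size
    x = x[:h2 * size, :w2 * size]
    return x.reshape(h2, size, w2, size, c).max(axis=(1, 3))


def forward(x, params):
    (w1, b1), (w2, b2), (w3, b3) = params
    h = max_pool(np.maximum(conv2d(x, w1, b1), 0))  # 5x5 conv, ReLU, 2x2 pool
    h = np.maximum(conv2d(h, w2, b2), 0)            # former FC layer as a 5x5 conv
    return conv2d(h, w3, b3)                        # former output layer as a 1x1 conv


rng = np.random.default_rng(0)
params = [
    (rng.normal(size=(5, 5, 3, 4)), rng.normal(size=4)),
    (rng.normal(size=(5, 5, 4, 8)), rng.normal(size=8)),
    (rng.normal(size=(1, 1, 8, 1)), rng.normal(size=1)),
]
image = rng.normal(size=(16, 16, 3))

# One forward pass over the whole 16x16 image gives a 2x2 grid of outputs.
full = forward(image, params)[..., 0]

# Four separate forward passes, one per 14x14 window (offsets 0 and 2).
windows = np.array([
    [forward(image[r:r + 14, c:c + 14], params)[0, 0, 0] for c in (0, 2)]
    for r in (0, 2)
])

assert full.shape == (2, 2)
assert np.allclose(full, windows)
```

In this sketch the same weights are used for both routes; the single pass simply avoids recomputing the parts of the convolutions that overlapping windows have in common.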
