I have a question about the convolutional implementation of sliding windows.
I understand the point the lecture makes: by converting the FC layers to convolutional layers, the output becomes a volume that summarizes the prediction for each window instead of a single value, so I don't have to propagate each sliding window sequentially, and the computation over overlapping regions is shared, which reduces the computational cost.
But I’m not sure how it works in practice.
I'm wondering how it's possible for the sliding windows to be propagated simultaneously rather than sequentially.
More specifically, how does the ConvNet partition the image into windows and propagate them simultaneously when the training and test images have different input sizes, given that only the FC layers have been converted to convolutional layers?
For example, if I build an architecture where only the FC layers are converted to convolutional layers, will the network automatically work with windows the same size as the training input images and a stride of 2, as in the video?
If so, how would it partition the image into windows when the image is the same size as the window?
Or do I need to add a new layer that performs the windowing when the network receives the input?
Perhaps my question isn’t clear enough.
So my question is: does partitioning the image into windows and propagating them simultaneously require any special handling at the input layer?
Or is that just how a ConvNet's convolution and pooling layers work, so no change is needed at the input, and I only need to convert the FC layers to get the desired output shape?
The point is that we do not need to implement sliding windows directly. Convolutions are better in terms of both efficiency and flexibility. No change is needed at the input.
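As a minimal sketch of what that looks like (my own toy Keras example using the layer sizes I remember from the video, not the course notebook): the "FC" part is expressed as 5x5 and 1x1 convolutions, the input layer simply leaves the spatial size unspecified, and the same network then produces a 1 x 1 x 4 map for a 14 x 14 x 3 image and a 2 x 2 x 4 map for a 16 x 16 x 3 image.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Spatial size left as None: nothing special is done at the input.
inputs = tf.keras.Input(shape=(None, None, 3))
x = layers.Conv2D(16, 5, activation="relu")(inputs)     # 14x14x3 -> 10x10x16
x = layers.MaxPooling2D(2)(x)                           # -> 5x5x16
x = layers.Conv2D(400, 5, activation="relu")(x)         # former FC(400), now a 5x5 conv -> 1x1x400
x = layers.Conv2D(400, 1, activation="relu")(x)         # former FC(400), now a 1x1 conv
outputs = layers.Conv2D(4, 1, activation="softmax")(x)  # former softmax layer -> 1x1x4
model = tf.keras.Model(inputs, outputs)

print(model(tf.zeros((1, 14, 14, 3))).shape)  # (1, 1, 1, 4): a single "window"
print(model(tf.zeros((1, 16, 16, 3))).shape)  # (1, 2, 2, 4): a 2x2 grid of windows
```

Each position of the 2 x 2 map plays the role of one sliding-window position; the effective stride of 2 comes from the 2 x 2 max pool inside the network, not from anything you add at the input.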
Also note the point from Tom's question: all the architectures we deal with here require that the size and type of all images (training, validation, test, or images seen when the model is actually deployed) be the same.
In my mind, sliding windows are presented as an 'introductory concept'. Convolutions, in contrast, are able to look over everything in parallel, all at once. Personally, it is just a guess, but I think that is why they called it 'YOLO'.
First, let me make sure I’m understanding the point correctly.
If I convert the FC layers to convolutional layers, the convolutions themselves will give the same output as if I had implemented sliding windows directly.
Is that correct?
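To be concrete, this is roughly what I picture that equivalence to mean (a toy check I made up, with random weights and layer sizes like the sketch above; nothing here comes from the course notebook): the 2 x 2 x 4 map from a 16 x 16 x 3 image should match running the same network separately on each 14 x 14 crop taken with a stride of 2.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

net = tf.keras.Sequential([
    tf.keras.Input(shape=(None, None, 3)),
    layers.Conv2D(16, 5, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(400, 5, activation="relu"),   # former FC layer as a 5x5 conv
    layers.Conv2D(4, 1, activation="softmax"),  # former softmax layer as a 1x1 conv
])

img = tf.random.uniform((1, 16, 16, 3))
full = net(img)                                   # (1, 2, 2, 4): all four windows at once
for i in range(2):
    for j in range(2):
        crop = img[:, 2*i:2*i + 14, 2*j:2*j + 14, :]  # one 14x14 window, stride 2
        single = net(crop)                            # (1, 1, 1, 4): that window on its own
        # Expected to print True for each window if the equivalence holds.
        print(np.allclose(single[0, 0, 0], full[0, i, j], atol=1e-5))
```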
If that's correct, my question is: for the same 14 x 14 x 3 test image, does the architecture automatically add a yellow border around the image, as in the video, making it 16 x 16 x 3 as if it were padded?
(Because if it doesn't, and the original 14 x 14 x 3 image is propagated, then the output will be 1 x 1 x 4 instead of the 2 x 2 x 4 I want.)
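(Here is the size arithmetic I'm doing, assuming "valid" layers with the 5x5 convolutions and the 2x2 max pool from the video; the little helper is just my own illustration, and the 1x1 convolutions are omitted since they don't change the spatial size.)

```python
def out_size(n, f, stride=1):
    # "valid" conv/pool: nothing is added around the input
    return (n - f) // stride + 1

for n in (14, 16):
    x = out_size(n, 5)      # 5x5 conv:       14 -> 10,  16 -> 12
    x = out_size(x, 2, 2)   # 2x2 max pool:   10 -> 5,   12 -> 6
    x = out_size(x, 5)      # 5x5 "FC" conv:   5 -> 1,    6 -> 2
    print(f"{n} x {n} x 3 input -> {x} x {x} x 4 output")
```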
So why does the way it works change, even though the input layer hasn’t changed?