Just to Clarify the "Object Detection" Video

Hey guys! I created a post called “Object Detection in C4 W3”, but I forgot to mention one question I have.
Is it correct that if the window size and the stride are small, there will be a computational cost problem, and that if the window size and the stride are large, the accuracy will be a little worse?

Hey @BryanEL,
I guess you can find Prof Andrew explicitly answering your question in the video entitled “Object Detection”. I have borrowed the following from the video’s transcript:

“Now there’s a huge disadvantage of Sliding Windows Detection, which is the computational cost. Because you’re cropping out so many different square regions in the image and running each of them independently through a ConvNet. And if you use a very coarse stride, a very big stride, a very big step size, then that will reduce the number of windows you need to pass through the ConvNet, but that coarser granularity may hurt performance. Whereas if you use a very fine granularity or a very small stride, then the huge number of all these little regions you’re passing through the ConvNet means that there is a very high computational cost. So, before the rise of Neural Networks people used to use much simpler classifiers like a simple linear classifier over hand-engineered features in order to perform object detection. And in that era because each classifier was relatively cheap to compute, it was just a linear function, Sliding Windows Detection ran okay. It was not a bad method, but with ConvNets now, running a single classification task is much more expensive and sliding windows this way is infeasibly slow. And unless you use a very fine granularity or a very small stride, you end up not able to localize the objects that accurately within the image as well.”
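
To get a feel for the numbers behind that trade-off, here is a rough sketch that counts how many crops a naive sliding window would send through the ConvNet. The function name `num_windows` and the sizes (a 256x256 image, a 64x64 window) are my own made-up values for illustration, not something from the video, and it assumes a square image with no padding.

```python
def num_windows(image_size: int, window_size: int, stride: int) -> int:
    """Count the square crops a sliding window produces on a square image.

    Assumes no padding: the window must fit entirely inside the image.
    """
    if window_size > image_size:
        return 0
    # Number of window positions along one axis, then squared for both axes.
    per_axis = (image_size - window_size) // stride + 1
    return per_axis * per_axis


# Hypothetical sizes: each crop would need its own ConvNet forward pass.
for stride in (2, 8, 32):
    print(f"stride={stride}: {num_windows(256, 64, stride)} windows")
```

The count grows roughly with the inverse square of the stride, which is why a fine stride gets expensive so quickly when every crop is classified independently.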

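So, to sum up, a small stride leads to computational cost issues and a large stride leads to accuracy issues, because the objects are localized less precisely. However, this changes when we use the convolutional implementation of sliding windows, which shares computation across overlapping windows instead of running the ConvNet on each crop independently. In that case, we can use a fine stride at a much lower computational cost. I hope this helps.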


Thanks for your help! I don’t know why I thought that performance issues and accuracy issues are the same thing. My bad!