Object Detection in C4 W3

Hey guys! I just started watching the videos and paused at the “Object Detection” video. So far I am excited, since I have always been curious about object detection. I do need some clarification on what I’m about to summarize, though, just to make sure I understand it.

Summary:
To start, I’ll take the same input image as the “Object Detection” video: 2 cars with a traffic light and the street. First, you pick a window size and place that window on the image, which marks out a region. That region, which I’ll call a Small Crop Image, is fed as input through the ConvNet, which outputs y. Then you move the window to the next position by some stride you choose, get another Small Crop Image, feed it through the ConvNet again, and the algorithm outputs y. You do that until the window has covered every position. Then you repeat the whole process with a larger window size, and finally with an even larger window size, each time going through every position in the image and feeding every Small Crop Image through the ConvNet to output y.
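The steps in this summary can be sketched roughly as below. This is only a minimal sketch under some assumptions: the image is a plain 2D list, the windows are square, and `classify` is a hypothetical stand-in for the ConvNet (the names `sliding_windows` and `detect` are made up for illustration and are not from the course).

```python
def sliding_windows(image_h, image_w, window_sizes, stride):
    """Yield (top, left, size) for every square window position."""
    for size in window_sizes:                      # one full pass per window size
        for top in range(0, image_h - size + 1, stride):
            for left in range(0, image_w - size + 1, stride):
                yield top, left, size


def detect(image, window_sizes, stride, classify):
    """Crop each window and pass it to `classify` (stand-in for the ConvNet).

    Returns a list of (top, left, size, y) tuples, one per crop.
    """
    h, w = len(image), len(image[0])
    results = []
    for top, left, size in sliding_windows(h, w, window_sizes, stride):
        # The "Small Crop Image": rows top..top+size, columns left..left+size.
        crop = [row[left:left + size] for row in image[top:top + size]]
        results.append((top, left, size, classify(crop)))
    return results


# Toy usage: a 4x4 "image" with a single non-zero pixel, and a dummy
# classifier that outputs y = 1 if the crop contains any non-zero pixel.
toy_image = [[0] * 4 for _ in range(4)]
toy_image[2][2] = 1
toy_classifier = lambda crop: int(any(any(row) for row in crop))
for result in detect(toy_image, window_sizes=[2], stride=2, classify=toy_classifier):
    print(result)
```

Passing more than one value in `window_sizes` reproduces the "repeat with a larger window" part of the summary.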

Is the summary correct? I felt that something was off…
Anyway… Thanks in advance!

Hey @BryanEL,
Your summary is correct, though I would like to mention a few points to make sure your interpretation matches the one Prof. Andrew intends.

Running the sliding windows with different window sizes is not an inherent part of the algorithm, i.e., in general, you only run the algorithm with one particular window size. Prof. Andrew used multiple windows of different sizes in the lecture video entitled “Object Detection” just to illustrate the pros and cons of both large and small window sizes.

In the very next video, entitled “Convolutional implementation of sliding windows”, you will see Prof. Andrew use only a single window size.
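To get a feel for what a single window size means in practice, here is a small sketch that counts how many crops one window size produces. The function name `count_positions` and the 28x28 image, 14x14 window, and stride of 2 are illustrative assumptions, not values taken from this thread.

```python
def count_positions(image_h, image_w, size, stride):
    """Number of square window positions for one window size."""
    if size > image_h or size > image_w:
        return 0                                   # window does not fit at all
    rows = (image_h - size) // stride + 1          # positions along the height
    cols = (image_w - size) // stride + 1          # positions along the width
    return rows * cols


# Assumed example: 28x28 image, 14x14 window, stride 2.
print(count_positions(28, 28, size=14, stride=2))  # → 64 (an 8 x 8 grid)
```

Each additional window size adds its own full set of positions on top of this, so the total number of crops grows with every size you include.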

So your summary is absolutely correct as a description of the steps Prof. Andrew follows in the first video, but it differs slightly from a summary of the algorithm itself; the difference is the point about window sizes above. I hope this helps.

Regards,
Elemento

Hey @Elemento, thank you for checking my summary and for pointing out that we only need to use one particular window size. Thank you for the comprehensive answer!