Hey guys! I just started watching the videos and stopped at the “Object Detection” video. So far I'm excited, since I've always been curious about object detection. However, I need some clarification on what I'm about to summarize, just to make sure I understand it.
Basically, you start off with an image. I'll use the same input image as in the “Object Detection” video, which shows 2 cars, a traffic light, and the street. First, you pick a window size and place that window on the image, which marks out a region. You take that region, which I'll call a Small Crop Image, feed it as input through the ConvNet, and the ConvNet outputs y. Then you move the window to the next position using some stride you choose, which gives you another Small Crop Image; you feed that through the ConvNet again and the algorithm outputs y. You keep doing that until the window has covered every position. Then you repeat with a larger window size, following the same steps until every Small Crop Image at that size has been fed through the ConvNet. Finally, you can use an even larger window size, again making sure it visits every position in the image and that each resulting Small Crop Image is fed through the ConvNet to output y.
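For reference, here is a minimal sketch of the loop summarized above (sliding windows detection) in plain Python. It is only an illustration under assumptions: the image is a 2D list of grayscale values, `toy_classifier` is a hypothetical stand-in for the trained ConvNet (it just thresholds the mean pixel value), and the window sizes and stride are arbitrary example values.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Detection = Tuple[int, int, int]  # (top, left, window_size)


def crop(image: Image, top: int, left: int, size: int) -> Image:
    """Cut out the size x size region (the 'Small Crop Image') at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def sliding_window_detect(
    image: Image,
    classify: Callable[[Image], bool],
    window_sizes: List[int],
    stride: int,
) -> List[Detection]:
    """Run the classifier on every window position, for each window size in turn."""
    height, width = len(image), len(image[0])
    detections: List[Detection] = []
    for size in window_sizes:  # e.g. small window first, then larger ones
        # Step the window across the image by `stride` until every position is covered.
        for top in range(0, height - size + 1, stride):
            for left in range(0, width - size + 1, stride):
                patch = crop(image, top, left, size)
                y = classify(patch)  # one ConvNet pass per crop
                if y:
                    detections.append((top, left, size))
    return detections


def toy_classifier(patch: Image) -> bool:
    """Hypothetical stand-in for the ConvNet: 'object' if mean pixel value > 0.5."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values) > 0.5


# Usage: a 6x6 image of zeros with a 2x2 bright "object" whose top-left corner is (2, 2).
image = [[0.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 1.0

print(sliding_window_detect(image, toy_classifier, window_sizes=[2, 4], stride=2))
# prints [(2, 2, 2)]
```

Note that every crop is a separate ConvNet pass (9 crops for the 2x2 window and 4 for the 4x4 window here), so the cost grows quickly with smaller strides and more window sizes.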
Is this summary correct? Something about it felt a bit odd to me…
Anyway… thanks in advance!