Week 3: Image size for object detection and localization

I noticed that the input image size for YOLO is 608×608, which is pretty big compared to ResNet and the other networks taught in Week 2.

Is the amount of training data required one of the reasons people use a pre-trained YOLO instead of training a new one?

YOLO requires a ton of computation to train. Image resolution is one driver, because the bigger the input image, the more grid cells and anchor boxes you're going to want. The network in the code used for this course makes over 150K predictions per forward pass. With the full ImageNet class set, that number is an order of magnitude larger. Additionally, the more parameters you need to train, the more training samples you need, which is a big part of why people start from pre-trained weights. In the original YOLO papers you will find a reference to them training for 'a week' on what was at the time a state-of-the-art GPU.
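To make the arithmetic behind that count concrete, here is a rough sketch. It assumes the configuration I believe the course notebook uses (a 608×608 input downsampled by a factor of 32 to a 19×19 grid, 5 anchor boxes per cell, and 80 classes), none of which is stated above, and it counts every output value (box confidence, 4 box coordinates, and one score per class) as a prediction. The function name and its parameters are made up for illustration.

```python
def yolo_output_values(image_size=608, stride=32, num_anchors=5, num_classes=80):
    """Count the values a YOLO-style head emits in one forward pass.

    All defaults are assumptions about the course setup, not values
    taken from this thread.
    """
    # The network downsamples the image, so the grid is image_size / stride.
    grid = image_size // stride
    # Each anchor box predicts: 1 confidence + 4 box coordinates + class scores.
    values_per_box = 5 + num_classes
    return grid * grid * num_anchors * values_per_box


# Assumed course configuration: 19 x 19 x 5 x 85
print(yolo_output_values())                   # 153425

# Same grid, but with 1000 ImageNet classes instead of 80
print(yolo_output_values(num_classes=1000))   # 1814025

# A smaller 416 x 416 input gives a 13 x 13 grid and far fewer outputs
print(yolo_output_values(image_size=416))     # 71825
```

Under those assumptions the first number lines up with the "over 150K" figure, and swapping in 1000 classes gives roughly 1.8 million, which is the order-of-magnitude jump mentioned above. The 416×416 case shows how quickly the output shrinks when the input resolution drops.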
