Why do we use YOLO to detect multiple objects in an image? What I was thinking is to use a pretrained model like VGG16 and modify its output layer to have one neuron per class with a sigmoid activation, so it can classify multiple objects in the image, plus additional neurons for the coordinates of each object. So why do we need YOLO?
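Roughly what I had in mind, just as a sketch (the class count and the cap on the number of objects are made-up numbers, not from the course):

```python
# Rough sketch of the idea above (untested). NUM_CLASSES and MAX_OBJECTS are
# hypothetical; needing a fixed cap on the number of objects is one of the
# obvious limitations of this approach.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

NUM_CLASSES = 20   # hypothetical number of classes
MAX_OBJECTS = 3    # hypothetical fixed cap on objects per image

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
x = layers.Flatten()(base.output)
x = layers.Dense(256, activation="relu")(x)

# One sigmoid per class, so several classes can be "on" at once
class_out = layers.Dense(NUM_CLASSES, activation="sigmoid", name="classes")(x)
# 4 numbers (x, y, w, h) for each of the MAX_OBJECTS slots
box_out = layers.Dense(4 * MAX_OBJECTS, activation="sigmoid", name="boxes")(x)

model = models.Model(inputs=base.input, outputs=[class_out, box_out])
model.summary()
```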
Have you actually been through all the lectures about Object Detection and the YOLO programming assignment? YOLO does a lot more than just tell you that there are multiple objects in an image: it actually draws bounding boxes around them and gives you confidence estimates of the type of object identified in each bounding box. It’s also very efficient in terms of the computational resources required to perform that task at runtime. How would you do that with VGG in the way that you described? VGG is good at telling you there is a cat in the image, but how would you get it to draw a bounding box around the cat?
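To make the contrast concrete, here is roughly what a single YOLO grid cell predicts. The shapes below assume the setup used in the course's YOLO assignment (a 19x19 grid, 5 anchor boxes per cell, 80 classes); this is an illustration, not the assignment code:

```python
# Illustration of the YOLO output encoding: every grid cell predicts, for each
# anchor box, a confidence, a bounding box, and a class distribution.
import numpy as np

S, B, C = 19, 5, 80
yolo_output = np.random.rand(S, S, B, 5 + C)  # stand-in for the network output

cell = yolo_output[0, 0, 0]        # one anchor box in one grid cell
p_c = cell[0]                      # confidence that an object is present
bx, by, bw, bh = cell[1:5]         # bounding-box coordinates for that box
class_probs = cell[5:]             # 80 class probabilities
scores = p_c * class_probs         # per-class box scores used for filtering
print(scores.shape)                # (80,)
```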
This idea came to me because when Prof. Andrew explained object localization, he used the same approach: a dataset containing labels and the coordinates of the object. So I am wondering why that is not applicable to multiple objects in an image?
If I understand correctly, you're proposing that the network architecture (the shape of the output layer) is directly dependent on the number of objects it can detect in the input image. That is, a final layer that can detect 2 objects has 2x the number of neurons of an architecture designed to detect 1; 10 objects, 10x; 100 objects, 100x. That sounds challenging. You also need a way to tell the loss function which labels correspond to which objects. YOLO has such a mechanism designed in; VGG does not. Thought experiments are great, but you need to go deeper into the consequences of your design choices. Or, since Keras has prebuilt VGG-16 and VGG-19 models available (VGG16 and VGG19), why not grab one and start to play with it, as sketched below? Figure out what you would need to change from the default to support multi-object detection. Let us know!
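A possible starting point for that experiment (just a sketch, assuming TensorFlow's bundled Keras):

```python
# Load the prebuilt models mentioned above. include_top=False drops the
# original 1000-class ImageNet classifier head, leaving the convolutional
# base you would build your own multi-object detection head on top of.
from tensorflow.keras.applications import VGG16, VGG19

vgg16 = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
vgg19 = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

vgg16.summary()  # inspect the layers you would be modifying or extending
```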