MobileNetV2-YOLO for object detection

Hello, I am going through the course videos again and am finding it a bit hard to tell some things apart, so I am looking for clarification.
Suppose I am using a MobileNetV2-YOLO network for object detection. In this case MobileNetV2 acts as the backbone and YOLO is the detection head.

Now, thinking about what happens: my input is first fed into MobileNet, and the output of MobileNet is a set of feature maps. Do these also include a classification? In other words, will MobileNet say that I have a dog and a cat in my picture, and then YOLO will localize them and calculate the combined classification and localization loss?

Thank you so much
:slight_smile:

It would help if you provided a link to the MobileNetV2-YOLO model. I’d also recommend posting this question on the TensorFlow forum to improve the odds of getting an answer.

The YOLO architecture roughly consists of three stages:

Backbone -> Neck -> Head

The role of the “Backbone” is to extract feature maps. The default backbone for YOLO is Darknet, the original one. Your MobileNetV2 takes this role.

The “Head” performs classification and localization. The loss is calculated here, as you pointed out. Different versions of the YOLO head are used here.
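
To make the two jobs of the head concrete, here is a minimal sketch of how a YOLOv3-style head lays out its raw prediction for one grid cell and one anchor: four box values, one objectness score, then one score per class. The function names and the example numbers are made up for illustration, and the activation functions and anchor offsets that a real head applies are left out.

```python
def head_channels(num_anchors, num_classes):
    """Output channels per grid cell for a YOLOv3-style head."""
    # Each anchor predicts 4 box values + 1 objectness score + one score per class.
    return num_anchors * (5 + num_classes)


def split_prediction(vector, num_classes):
    """Split one anchor's raw prediction into localization and classification parts."""
    if len(vector) != 5 + num_classes:
        raise ValueError("expected 5 + num_classes values")
    box = vector[:4]            # raw box values (tx, ty, tw, th): localization
    objectness = vector[4]      # raw score for "is there an object here?"
    class_scores = vector[5:]   # raw per-class scores: classification
    return box, objectness, class_scores


# Hypothetical raw output for one anchor with 2 classes (say, dog and cat).
raw = [0.2, -0.1, 0.5, 0.3, 2.0, 1.5, -0.7]
box, objectness, class_scores = split_prediction(raw, num_classes=2)
print(head_channels(num_anchors=3, num_classes=80))  # 255, as in YOLOv3 on COCO
```

Both the box values and the class scores come out of the same head tensor, which is why the combined loss is computed at this stage.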

The key component for object detection is the “Neck”. Please see this picture from the paper Feature Pyramid Networks for Object Detection.

[image: figure from the FPN paper]

The typical output from a backbone looks like this. If we use this output directly for object detection, it is hard to detect objects of different sizes (small, medium, and large).
So we need a multi-scale solution that covers objects of different sizes effectively. That is the role of the “Neck”. Here is another figure from the FPN paper.

FPN creates feature maps at multiple sizes for better object detection.
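
As a rough illustration of how an FPN builds those multi-size maps, the sketch below shows the top-down merge step: a coarse map is upsampled by 2 and added element-wise to a finer map. This is a toy, single-channel version on plain Python lists. The function names are invented here, and the 1x1 lateral and 3x3 smoothing convolutions from the FPN paper are omitted.

```python
def upsample2x(feature_map):
    """Nearest-neighbour 2x upsampling of a 2D map given as a list of rows."""
    out = []
    for row in feature_map:
        wide = [value for value in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                             # repeat each row twice
    return out


def top_down_merge(coarse, fine):
    """FPN-style merge: upsample the coarse map and add it to the finer one."""
    up = upsample2x(coarse)
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


coarse = [[1, 2],
          [3, 4]]                      # toy 2x2 map from a deep layer
fine = [[10] * 4 for _ in range(4)]    # toy 4x4 map from a shallower layer
merged = top_down_merge(coarse, fine)
for row in merged:
    print(row)
```

The merged map keeps the finer 4x4 resolution while carrying information from the deeper layer, which is what helps with small objects.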

YOLOv3 used Darknet for the backbone, an FPN-style neck, and the YOLOv3 head. Since then, researchers have proposed many different solutions. Some use ResNet for the backbone, and some use PAN for the neck. The YOLO head is still the most common choice, but there are different versions.

I suppose that in your MobileNetV2-YOLO, MobileNetV2 is used only to create feature maps. An appropriate neck then creates feature maps at multiple scales, and the YOLO head classifies and localizes objects on them.
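
Here is a shape-only sketch of that pipeline, to show where class information first appears. Everything in it is an assumption for illustration, since I have not seen your model: the 416x416 input, the strides 8/16/32, the channel counts, 3 anchors per cell, and 2 classes (dog and cat). No real network is run; the functions only track tensor shapes.

```python
def backbone_shapes(input_size, channels_by_stride):
    """Shapes (height, width, channels) of the feature maps a backbone hands over."""
    return {stride: (input_size // stride, input_size // stride, ch)
            for stride, ch in channels_by_stride.items()}


def neck_shapes(shapes, neck_channels):
    """A neck keeps each spatial size; here it is assumed to unify the channel count."""
    return {stride: (h, w, neck_channels) for stride, (h, w, _) in shapes.items()}


def head_shapes(shapes, num_anchors, num_classes):
    """The head emits box, objectness and class values per anchor per cell."""
    out_ch = num_anchors * (5 + num_classes)
    return {stride: (h, w, out_ch) for stride, (h, w, _) in shapes.items()}


# Assumed values, not taken from any specific MobileNetV2-YOLO implementation.
features = backbone_shapes(416, {8: 32, 16: 96, 32: 320})
pyramid = neck_shapes(features, neck_channels=128)
predictions = head_shapes(pyramid, num_anchors=3, num_classes=2)
for stride in sorted(predictions):
    print(stride, features[stride], "->", predictions[stride])
```

The backbone output has no per-class channels at all. The `num_classes` term only enters in the head, so in this setup the backbone never says "dog" or "cat".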

Hope this helps.
