MobilenetV2-Yolo for object detection

talinez · February 1, 2022, 2:11pm

Hello, I am going through the course videos again and finding it a bit hard to tell some things apart, so I am looking for clarification.
Suppose I am using a MobileNetV2-YOLO network for object detection. In this case MobileNetV2 acts as the backbone and YOLO is the detection head.

Now, if I think about what is happening: first my input is fed into MobileNet, and the output of MobileNet is feature maps. But do they also include a classification? Meaning, will MobileNet say that I have a dog and a cat in my picture, and then YOLO will localize them and calculate the combined classification and localization loss?

Thank you so much

balaji.ambresh · June 3, 2022, 8:46am

It would help if you provided a link to the mobilenetv2-yolo model. I’d also recommend posting this question on the TensorFlow forum to improve the odds of getting an answer.

anon57530071 · June 4, 2022, 4:52am

The YOLO architecture roughly consists of three stages:

Backbone -> Neck -> Head

The role of the “Backbone” is to extract feature maps. The original backbone for YOLO is Darknet. Your MobileNetV2 takes its place here.
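To make the backbone's role a bit more concrete, here is a minimal sketch of the spatial sizes of the feature maps a backbone passes on. It assumes a 416x416 input and taps at strides 8, 16, and 32, which are common choices for YOLOv3-style detectors; your MobileNetV2-YOLO may use different values. The function name `backbone_feature_sizes` is hypothetical. Note that the result describes feature maps only; no class labels come out of this stage.

```python
def backbone_feature_sizes(input_size, strides=(8, 16, 32)):
    """Return the side length of the feature map at each stride.

    Assumption: the backbone downsamples by exactly these strides,
    and the input size is divisible by all of them.
    """
    if any(input_size % s for s in strides):
        raise ValueError("input_size should be divisible by every stride")
    return {s: input_size // s for s in strides}


sizes = backbone_feature_sizes(416)
print(sizes)  # {8: 52, 16: 26, 32: 13}
```

With a 224x224 input, the same arithmetic gives a 7x7 map at stride 32.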

The “Head” performs classification and localization. The loss is calculated here, as you pointed out. Different YOLO versions use different heads at this stage.

The key component for object detection is the “Neck”. Please see this picture from the paper Feature Pyramid Networks for Object Detection.

The typical output from a backbone looks like this. If we use this output directly for object detection, it is hard to detect objects of different sizes: small, medium, and large.
So we need a multi-scale solution to cover different object sizes effectively. That is the role of the “Neck”. Here is a figure from the FPN paper again.

FPN creates feature maps at multiple scales for better object detection.
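As a rough illustration of how an FPN-style neck combines scales, here is a toy sketch of the top-down pathway: each coarser map is upsampled 2x and added to the next finer map. It is heavily simplified. It uses single-channel maps stored as nested lists, and it leaves out the 1x1 lateral convolutions and 3x3 smoothing convolutions that the FPN paper applies. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(maps_fine_to_coarse):
    """Merge maps ordered fine to coarse; each must be half the previous size."""
    merged = [maps_fine_to_coarse[-1]]             # start from the coarsest map
    for lateral in reversed(maps_fine_to_coarse[:-1]):
        merged.append(add_maps(lateral, upsample2x(merged[-1])))
    return merged[::-1]                            # back to fine-to-coarse order


c3 = [[1] * 4 for _ in range(4)]    # fine map, e.g. stride 8
c4 = [[10] * 2 for _ in range(2)]   # medium map, e.g. stride 16
c5 = [[100]]                        # coarse map, e.g. stride 32
p3, p4, p5 = top_down([c3, c4, c5])
print(p3[0], p4[0], p5[0])
```

Each merged map keeps the resolution of its own level while carrying information from the coarser levels above it, which is what lets the head look for small, medium, and large objects on separate maps.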

YOLOv3 used Darknet-53 for the backbone, an FPN-style neck, and the YOLOv3 head. Since then, many different solutions have been proposed by different researchers. Some use ResNet for the backbone, and some use PAN for the neck. The YOLO head is still the most common choice, but there are different versions of it.

I suppose that in your MobileNetV2-YOLO, MobileNetV2 is used only to create feature maps; it does not output a classification. An appropriate neck then creates multi-scale feature maps, and the YOLO head classifies and localizes objects on them.
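To show where the classification actually lives, here is a small sketch of the output shape of a YOLOv3-style head on one scale. It assumes the YOLOv3 layout, where each grid cell predicts, per anchor, 4 box values, 1 objectness score, and one score per class. The anchor count of 3 and the two-class (dog, cat) setup are assumptions taken from your example, and the function name is hypothetical.

```python
def yolo_head_output_shape(grid_size, num_anchors, num_classes):
    """Shape of a YOLOv3-style head output on one scale.

    Per anchor: 4 box values + 1 objectness score + num_classes class scores.
    """
    per_anchor = 4 + 1 + num_classes
    return (grid_size, grid_size, num_anchors * per_anchor)


# Dog/cat detector on the coarsest 13x13 grid, 3 anchors per cell (assumed)
print(yolo_head_output_shape(13, num_anchors=3, num_classes=2))  # (13, 13, 21)
```

The class scores and the box coordinates sit side by side in the same head output, so the classification and localization losses are both computed from what the head predicts, not from anything the backbone outputs on its own.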

Hope this helps.
