How to modify the code for object detection?

Hello, I finished the course a few months ago and have started practicing on my own. My question is how to modify the Zombie Detection assignment for my project. Let me briefly describe the differences:

  • In the zombie detection assignment we have one class, but in my personal project I have 4 different classes. Which part of the code should be modified?
    (I believe the picture below is the part I have to rewrite, but I wanted to hear other people’s thoughts.)

  • In the zombie detection assignment there is always exactly one object in the picture. In my project, one to four different objects can appear in the same picture. How should I approach this?

  • Since the pictures contained only zombies, we did not give the bounding boxes any labels. What approach can we use to label them?

I would really appreciate a link to any example project that detects multiple objects in one picture. I searched a lot, but what I found was either very old tutorials or techniques we did not learn in this course. Thank you for your time.

Hi there,

Let me give you my thoughts on these.
The part of the code you picked out seems right to me. You need to annotate the images with four objects, so the Google annotation can’t be used. Maybe you can use another service to annotate them and have them ready as XML annotations plus the image files (search on Google for “image annotation multiple objects”). In this search you might also find how to do predictions with bounding box labels.
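
As a rough illustration of the "XML annotation plus image" idea, here is a minimal sketch that reads one annotation into a list of labelled boxes. It assumes the Pascal VOC style layout that many annotation tools export (an `object` element containing a `name` and a `bndbox`). The class names, the example file content and the `parse_voc_annotation` helper are made up for illustration, so adapt them to whatever your tool actually produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list for illustration; replace with your own four classes.
CLASS_NAMES = ["class_a", "class_b", "class_c", "class_d"]


def parse_voc_annotation(xml_text, class_names=CLASS_NAMES):
    """Return a list of (class_id, xmin, ymin, xmax, ymax) tuples.

    Assumes a Pascal VOC style layout: <object><name/><bndbox/></object>.
    """
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        # Map the text label to an integer class id.
        class_id = class_names.index(obj.findtext("name"))
        bndbox = obj.find("bndbox")
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((class_id, *coords))
    return boxes


# Made-up annotation with two objects in one image.
EXAMPLE_XML = """
<annotation>
  <filename>drawing_001.png</filename>
  <object>
    <name>class_b</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>class_d</name>
    <bndbox><xmin>150</xmin><ymin>30</ymin><xmax>300</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>
"""

print(parse_voc_annotation(EXAMPLE_XML))
```

Each image then yields a variable-length list of boxes, each with its own class id, which is the main difference from the single unlabelled box in the assignment.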

Another thing you can consider is checking out the YOLO model in the DLS specialization. It covers detecting multiple objects and their bounding boxes.

It is important to recognize that each value predicted by your system is an output of your neural network, so the number of objects you want to detect determines the shape of the output layer. If you are just classifying one object, you can have a single output. If you need both class and location, you need four additional outputs for the bounding box. If you need to detect multiple objects, you need at least 5 values for each object. Historically, approaches to multi-object detection have either subdivided the image (region-based approaches) or introduced some kind of detector ‘grid’ (e.g. YOLO, SSD and MobileNet). To the best of my knowledge there is no one approach to rule them all. Each has strengths and limitations, so your choice is driven by your project objectives and constraints. Do you need to run on-device on mobile? In near real time? Is avoiding false positives more or less important than average performance (mAP)? These questions will help you decide what characteristics your architecture must incorporate.
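
To make the output-size arithmetic above concrete, here is a small sketch of one simple (certainly not the only) way to lay out a fixed-size target for up to four objects and four classes. The slot layout `[presence, x, y, w, h, one-hot class]`, the zero padding for empty slots, and the names `encode_targets`, `MAX_OBJECTS` and `NUM_CLASSES` are my own assumptions, not something taken from the assignment.

```python
MAX_OBJECTS = 4   # assumed upper bound on objects per image
NUM_CLASSES = 4   # assumed number of classes
VALUES_PER_SLOT = 1 + 4 + NUM_CLASSES  # presence flag + box + one-hot class


def encode_targets(objects, max_objects=MAX_OBJECTS, num_classes=NUM_CLASSES):
    """Flatten a list of (class_id, x, y, w, h) into one fixed-length vector.

    Box values are expected to be normalised to [0, 1]. Unused slots stay zero.
    """
    if len(objects) > max_objects:
        raise ValueError("more objects than output slots")
    slot_size = 1 + 4 + num_classes
    target = [0.0] * (max_objects * slot_size)
    for slot, (class_id, x, y, w, h) in enumerate(objects):
        base = slot * slot_size
        target[base] = 1.0                        # an object is present in this slot
        target[base + 1:base + 5] = [x, y, w, h]  # bounding box
        target[base + 5 + class_id] = 1.0         # one-hot class
    return target


# One object of class 2, centred in the image.
target = encode_targets([(2, 0.5, 0.5, 0.2, 0.3)])
print(len(target))               # number of outputs the network would need
print(target[:VALUES_PER_SLOT])  # the first (filled) slot
```

With these assumed settings the output layer needs 4 × 9 = 36 values. A fixed slot layout like this forces an arbitrary ordering on the objects, which is part of why grid-based detectors assign objects to cells instead.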

Thanks for the answer. I did some research before asking, and according to the article here the best results can be obtained with a ResNet-50 architecture. Like the article, my project is about extracting information from technical drawings. I will search for image annotation tools as you suggest. Again, thanks for your time.

To be honest, I had not thought very carefully about the aspects you just raised. I will consider these questions before continuing. Thank you.