Hello! I am training a computer vision model to recognize 3 categories (a custom dataset) using YOLOv4. For the configuration file I am using the default one provided by the Darknet implementation on GitHub. I train my model for 6000 iterations (max_batches), as suggested by the official implementation (number of categories × 2000 → 3 × 2000 = 6000). I have also added negative samples as suggested (images that do not contain the desired objects, paired with empty .txt files) so that the model learns what does not belong to any category. However, no matter how much I increase the dataset or what else I try, the mAP doesn’t exceed 75%, and I need it to reach at least 85% to yield useful results. Any idea how I could boost my training?
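For reference, the values the Darknet README derives from the class count can be sanity-checked with a few lines. This is only a sketch of the README's arithmetic (the filters value is what it says to set in the convolutional layers just before each [yolo] layer); note that max_batches counts training iterations (batches), not epochs:

```python
# Sketch: Darknet/YOLOv4 config arithmetic for a 3-class model,
# following the AlexeyAB darknet README recommendations.
classes = 3

max_batches = classes * 2000                 # recommended training iterations (batches, not epochs)
steps = (int(max_batches * 0.8),             # learning-rate decay steps: 80% and 90% of max_batches
         int(max_batches * 0.9))
filters = (classes + 5) * 3                  # conv filters before each [yolo] layer

print(max_batches, steps, filters)
```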
That is by far the most frequently asked question in the AI world, and the most difficult to answer: how to make your model perform better. It really depends on the problem you are addressing, the data you have, the computational resources at hand…
Anyway, I would start with the data. How many samples (images, in this case) do you have? How are they distributed? Is there any class with very few examples? I don’t know if you’ve already done this, but I would suggest starting with a good exploratory data analysis.
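If it helps, a class-distribution check over the YOLO-format label files is a quick first EDA step. A minimal sketch (the directory path and class names below are placeholders, not from this thread):

```python
# Sketch: count boxes per class and empty (negative) label files in a
# directory of YOLO-format labels, where each line is
# "class_id x_center y_center width height".
from collections import Counter
from pathlib import Path

def class_distribution(label_dir, class_names):
    """Return (boxes-per-class Counter, number of empty negative files)."""
    counts = Counter()
    negatives = 0
    for txt in Path(label_dir).glob("*.txt"):
        lines = [l for l in txt.read_text().splitlines() if l.strip()]
        if not lines:
            negatives += 1        # empty .txt = negative sample
            continue
        for line in lines:
            class_id = int(line.split()[0])
            counts[class_names[class_id]] += 1
    return counts, negatives
```

Looking at boxes per class (rather than images per class) also matters here, since one image can contain many animals.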
Once you’re done, try different data augmentation methods, depending on your data; that can gain you a somewhat better mAP. I’ve never used YOLOv4, but YOLOv5 has a well-implemented evolutionary search over these augmentation hyperparameters.
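For what it’s worth, because YOLO labels are normalized to [0, 1], simple offline augmentations are easy to script yourself. A minimal sketch of the label side of a horizontal flip (the image itself would be mirrored with any image library):

```python
# Sketch: horizontal-flip augmentation for YOLO-format boxes.
# Boxes are (class_id, x_center, y_center, width, height), all normalized
# to [0, 1], so mirroring the image only requires mirroring x_center.
def hflip_yolo_boxes(boxes):
    return [(c, 1.0 - x, y, w, h) for (c, x, y, w, h) in boxes]
```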
These are typical suggestions for general-purpose deep learning training, but if you give us deeper insight into the problem you’re trying to solve, we may be able to help more.
Thank you for your reply!
I am trying to detect three species (goats, sheep, and cattle). I am using 1000 pictures per category and 3000 pictures as negative samples, so the dataset is split 50% positive and 50% negative. I am not sure that’s enough. As for data augmentation, YOLOv4 comes with some methods that I am already using (mosaic, horizontal and vertical flips…). I hope this gives you more insight; if more is needed, please let me know.
Again, thank you for your reply.
Well, just a thousand images per class might be too few. Can you get more samples?
Anyway, how is the accuracy on each class? Is it balanced across all 3 classes? And what do the training and validation losses look like?
Concerning the samples: I am having trouble finding labeled datasets, and manually labeling pictures is time-consuming. I labeled more than half of the dataset myself; the other half I took from the Google Open Images dataset and had to manually correct multiple labels.
Concerning the accuracy per class:
class_id = 0, name = Goat, ap = 71.88% (TP = 130, FP = 31)
class_id = 1, name = Sheep, ap = 72.96% (TP = 213, FP = 57)
class_id = 2, name = Cattle, ap = 72.50% (TP = 194, FP = 85)
Generally: for conf_thresh = 0.25, precision = 0.76, recall = 0.71, F1-score = 0.73
The average loss is 1.27
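As a sanity check, the aggregate numbers above can be re-derived from the per-class TP/FP counts (the reported recall also implies how many ground-truth boxes were missed). A quick sketch:

```python
# Sketch: re-deriving the aggregate metrics from the per-class counts
# printed by the darknet evaluation above.
per_class = {            # name: (TP, FP)
    "Goat":   (130, 31),
    "Sheep":  (213, 57),
    "Cattle": (194, 85),
}

tp = sum(t for t, _ in per_class.values())          # 537 true positives
fp = sum(f for _, f in per_class.values())          # 173 false positives
precision = tp / (tp + fp)                          # ~0.756, matching the reported 0.76

# Recall needs the false negatives; from the reported recall of 0.71,
# the implied number of missed ground-truth boxes is:
recall = 0.71
fn = tp / recall - tp                               # ~219 missed boxes
f1 = 2 * precision * recall / (precision + recall)  # ~0.73, matching the printout
```

So roughly a fifth of detections are wrong and over 200 animals are missed outright, which is consistent across all three classes.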
When I said
How is the loss function in training and validation?
I was referring to the loss curves (both training and validation), to see whether there is any sign of overfitting. Can you attach those curves?
Since my training stopped multiple times, I didn’t always save the chart, but I assembled what I have and it gives an overview. (I looked at the chart every time; it never spiked anywhere.)
I assume the little red line at the top is the validation loss. If so, it shows severe overfitting, which is a sign that the model is not generalizing well. The best ways to overcome that problem are more data and more aggressive data augmentation.
The red line actually indicates the mean average precision! I will add more data and see how the model performs. Thank you so much for your time!
Always good advice.
@Zineb_Attaoui The simple answer to how to raise mAP is to raise the constituent per-class AP scores, and the way to do that is to reduce false positives. You need to understand what is driving those high FP counts: is it poor localization or poor classification? The data-quality issues you describe might suggest the latter, but in any case the data will tell the story more clearly once you drill down further.
Thank you for your response! This is my first object detection project, so I am still learning. I manually checked the data using LabelImg; the boxes are correct, as are the labels. However, I noticed an inconsistency and wanted to double-check whether it could cause confusion: sometimes many sheep are grouped under one bounding box, while other times each sheep (in the same picture) has its own bounding box. The same goes for the other categories. Is that a problem?
If the predicted boxes and labels were truly correct, you wouldn’t be getting those high false positive counts. A high FP count means either the predicted bounding box is bad (hence a low IoU with the ground truth) or the classification is wrong. You need to look at the records flagged as FP and compare them with the ground truth.
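One rough way to do that comparison, assuming you can export predictions and ground truth as pixel-coordinate boxes: score each false positive by IoU against the ground truth and see whether it fails on overlap or on class. A sketch (the 0.5 threshold mirrors the common PASCAL-VOC-style mAP criterion):

```python
# Sketch: coarsely classifying a false positive as a localization error
# (no ground-truth box overlaps enough) or a classification error
# (good box, wrong class). Boxes are (x_min, y_min, x_max, y_max) pixels.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def diagnose_fp(pred_box, pred_class, gt_boxes, iou_thresh=0.5):
    """gt_boxes: list of (class_id, box). Returns a coarse error type."""
    best = max(gt_boxes, key=lambda g: iou(pred_box, g[1]), default=None)
    if best is None or iou(pred_box, best[1]) < iou_thresh:
        return "localization"     # prediction doesn't line up with any ground truth
    if best[0] != pred_class:
        return "classification"   # good box, wrong species
    return "match"                # this one would count as a true positive
```

Tallying these two error types over all FPs should make it obvious whether the grouped "many sheep in one box" labels (a localization mismatch) or species confusion is the bigger driver.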
Thank you for your time. I will double check!