Question on Model Architecture for small object Detection

I am trying to build a CNN to detect one type of small object.
In detail: I want to detect dead mites on a white plate (there is also other debris on it). An image may contain just 1 mite, but sometimes 10-200.

My problem: I need a model with a simple feature extractor (mites are not that complex) and a bounding box predictor that can output as many bounding boxes as the model finds.

In the courses and on the internet I only find very complex models like Fast R-CNN with ResNet-50 or ResNet-101. But I only have one class with limited features. Which models can you recommend, or how would you approach this?

Here you can see an image with approx. 15 mites.

Thanks in advance.

I'm not sure, but maybe a YOLO model without non-max suppression would work. What do you say, @ai_curious? Generally speaking, though, such an algorithm would be complex. There are some examples of using pretrained models for this in the TensorFlow Advanced Techniques Specialization that you can have a look at.
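For readers who have not met non-max suppression yet, here is a minimal pure-Python sketch of what that post-processing step does to a detector's raw output. It is an illustration under stated assumptions, not YOLO's actual implementation: the box format `(x1, y1, x2, y2)`, the function names, and the default thresholds are all hypothetical choices made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Return indices of boxes that survive thresholding and greedy NMS.

    Thresholds are illustrative defaults (assumptions), not tuned values.
    """
    # Step 1: keep only confident boxes, best score first.
    candidates = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    # Step 2: keep a box only if it does not overlap an already kept box too much.
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return kept
```

The number of returned boxes is not fixed: it depends on how many predictions pass the score threshold and survive the overlap check. Setting `iou_thresh=1.0` effectively disables suppression, so every confident box is kept, duplicates included. With many small mites lying close together, that overlap threshold is the parameter most likely to need care.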

It might help us who are new to your issue if you could highlight the mites in the image you show. Are they the shiny things that look like roasted coffee beans?

I don’t have any personal experience with trying to solve that kind of an image analysis problem, but the only algorithms we’ve seen in DLS that can recognize and localize multiple objects in an image are YOLO (as Gent says) and U-Net for Image Segmentation. Both of those are pretty complex algorithms and non-trivial to train, but perhaps you can take a Transfer Learning approach.
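If the segmentation route is taken, the model's output is a per-pixel mask rather than a list of boxes, so counting the mites becomes a separate post-processing step. A minimal sketch of one way to do that is to count connected blobs in the thresholded mask. The function name `count_blobs`, the list-of-lists mask format, and the `min_pixels` filter are assumptions made for illustration, not something taken from the course material.

```python
from collections import deque


def count_blobs(mask, min_pixels=1):
    """Count 4-connected groups of truthy pixels in a 2D list-of-lists mask.

    Blobs smaller than min_pixels are ignored (a crude filter for speckle).
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one blob with breadth-first search and measure its size.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                count += 1
    return count
```

One caveat with this approach: mites that touch each other merge into a single blob and are counted once, so dense images would be undercounted unless the blobs are split further.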

Varroa destructor

Aptly named. That picture is from one of my hives that I was tardy in treating a few years ago. It didn’t survive the following winter. It’s a pain to find and count the mites and determine whether a treatment is needed, though both of the pictures in this thread have so many it’s an easy call. What’s harder is when there are fewer mites and more other debris, either benign like pollen and propolis or some other negative sign, like waste from wax moth or small hive beetle larvae.

Before zeroing in on the architecture, we might need to know some other requirements. Are you planning to deploy on a mobile device so beekeepers can check right in the apiary? Would you consider including other diagnoses as well? Do you really need the computational expense of object detection, which requires bounding box coordinates, or is it enough to say ‘needs treatment’ or ‘does not need treatment’, which is an easier machine learning task?

I like this approach as well. What's the point of putting a bunch of bounding boxes around the image? If the mites can be removed, I guess you would remove them all at once, not one here and one there…

Just for background, when you see the mites on the bottom of a hive, they are almost always already dead. But what you really care about are the ones you can’t see, because they are on the bees or the larvae up in the hive. So you count the number you see, and use it as a proxy for how many are in the hive altogether. Then decide whether to apply a treatment, or not. Treatments are expensive for the beekeeper and somewhat destructive to the bees, so you only want to do it when really needed.


For those interested in the problem, this article has some very helpful background on the Varroa destructor problem in the US. As discussed there, visual inspection of the hive bottom board is just one approach to quantifying mite load in a hive, and not necessarily the most accurate or the best. There is a big market for an accurate, easy-to-use solution.

The signs of mite damage- How to identify progressed varroosis? – Bee Informed Partnership.