Description:
I am currently working on detecting defects in images using an Anomaly Detection algorithm. The defects range from small dots (5x5 pixels) to large defect areas (500x100 pixels) on an image with geometric structures. The ‘good’ images are overall very similar, with (in my opinion) only small variations, especially at the edges of the image. The original images are large: 5120x5120 pixels.
Problem:
With the current algorithm, I am reducing the image size to 2048x2048 pixels. Most of the defects are still visible to a human. So far, the results are rather sobering: the current F1 score is only 55.54%.
When looking at the anomaly image from the inference result, I can see that even the small defects are detected as anomalies. However, the whole image usually has an anomaly score about equal to the threshold, or slightly below it, which results in many false negatives.
Questions:
I do not have a lot of experience with anomaly detection, so before fruitlessly trying to improve the algorithm I wanted to ask: what are the most likely causes, and what are possible solutions?
Is the cause of the large number of false negatives the wide variety in defect sizes? Or is the problem that even the ‘good’ images still vary a lot?
Anomaly detection is only trained with ‘good’ images. Intuitively I’d say training only with ‘very good’ images will not help either, because then many of the ‘good’ images will be falsely labeled as anomalous. => Do I have a chance of improving the algorithm, or is a data set with too much variety even among the ‘good’ images unsuitable for anomaly detection?
Can data augmentation help with anomaly detection? After all, only the good images are used for training.
Is ‘tiling’ possible with anomaly detection, in an effort to improve detection of very small defects?
Without taking a look at the actual pictures, it’s hard to judge remotely. But if you have many false negatives, you want your model to learn ‘more’ about what an anomaly looks like, meaning:
A more complex model could potentially make sense to learn ‘anomalies’. You might want to try less dropout or less regularization. See also: Training set error?
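As a rough illustration only (a hypothetical PyTorch reconstruction-based detector, not your actual model), these are the usual places where the dropout and regularization knobs live:

```python
import torch
import torch.nn as nn

# Hypothetical small autoencoder; the point is only to show the usual knobs:
# the Dropout probability and the optimizer's weight_decay (L2 regularization).
class SmallAE(nn.Module):
    def __init__(self, dropout_p=0.1):          # try lowering dropout_p
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Dropout2d(dropout_p),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SmallAE(dropout_p=0.05)
# weight_decay is the L2-regularization strength; try reducing it as well
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
```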
The threshold can be tuned to reduce false negatives. This comes with a trade-off: the false positives would worsen in return…
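A minimal sketch of that tuning, assuming you have per-image anomaly scores and labels for a validation set (the function and variable names here are made up):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, scores):
    """Pick the score threshold that maximises F1 on a labelled validation set.

    y_true: 1 = defective, 0 = good; scores: per-image anomaly scores.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall have one more entry than thresholds; drop the last point
    f1 = 2 * precision[:-1] * recall[:-1] / np.clip(precision[:-1] + recall[:-1], 1e-12, None)
    best = np.argmax(f1)
    return thresholds[best], f1[best]

# usage: thr, f1 = best_f1_threshold(y_val, val_scores)
```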
There are several strategies, e.g. trying to get better, more, and ‘more realistic’ normal samples. You could also think of applying some artificial noise (e.g. white noise) to increase the overall robustness.
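For the noise idea, a minimal sketch assuming your images are normalised float arrays in [0, 1] (the sigma value is just a placeholder you would have to tune):

```python
import numpy as np

def add_white_noise(image, sigma=0.02, rng=None):
    """Add Gaussian (white) noise to a normalised float image in [0, 1].

    sigma is an assumed default; keep the noise well below the contrast of
    real defects, otherwise you teach the model to ignore them.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)
```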
There are cases where tilting collides with your business problem (e.g. think of tilting a 9 or a 6 too much if you do digit classification). But if tilting or flipping pictures is OK from your domain point of view, it might be worth a try to help the model generalise and learn what the defects are actually about. See also here some other techniques for augmentation.
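If flips and small rotations are acceptable in your domain, a sketch of such a pipeline with torchvision could look like this (the chosen transforms and parameter values are assumptions, not a recommendation for your data):

```python
from torchvision import transforms

# Only use transforms that produce images your process could really yield:
# flips/rotations are fine for many symmetric geometric structures, but not
# if orientation itself distinguishes 'good' from 'defective'.
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=5),   # small 'tilt'; value is a guess
    transforms.ToTensor(),
])
```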
What does ‘being detected as an anomaly’ actually mean here? After all, I understood that your problem is rather too many false negatives (FNs), i.e. the algorithm did not correctly identify the anomalies as such… How about the false positives (FPs)?
I am asking because I would be interested in how you chose your threshold.
Also: Did you do an ROC analysis with AUC?
I would highly recommend taking a look at it. It might also help you set your threshold well, considering both FPs and FNs.
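A minimal sketch of such an analysis with scikit-learn, again assuming labelled per-image validation scores (Youden’s J is just one simple criterion for picking a threshold from the ROC curve):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def roc_report(y_true, scores):
    """ROC analysis for per-image anomaly scores (1 = defective, 0 = good)."""
    auc = roc_auc_score(y_true, scores)
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    # Youden's J = TPR - FPR: one simple way to balance FNs against FPs
    j = tpr - fpr
    best = np.argmax(j)
    return auc, thresholds[best], tpr[best], fpr[best]

# usage: auc, thr, tpr_at_thr, fpr_at_thr = roc_report(y_val, val_scores)
```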
Note: a good friend of mine did his PhD in defect detection and reconstruction. Feel free to take a look at these repos: