Mobile application for image classification

Project: mushroom classifier
I am building an edge application: an image classifier similar to mobile apps like “PictureThis” or “Picture Mushroom”. Models trained on ImageNet either cannot identify various plants and mushrooms or cannot identify them with enough specificity. I am using an iNaturalist dataset from a computer vision competition that took place in 2021 (if memory serves). It is roughly 224 GB and contains 1,787,841 images. My goal is to use that dataset to train a model that classifies a picture as one of the 223 listed species of fungi, or as “not a fungus”.

I structured the dataset directory into one subdirectory per mushroom species, plus one subdirectory containing all of the specimens from the other kingdoms of life. I also noticed that there is now a MobileNetV3, so I switched to that architecture (V3Large). Next, I got the code working on a smaller dataset as a toy example, noticed a few things, such as overfitting after a number of epochs, and gained some intuition. Finally, I made a few edits that seemed reasonable, ran my code on the large dataset, and went to sleep.
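A minimal sketch of how a layout like the one described above could be indexed and sanity-checked before a long training run. The function names, the accepted file suffixes, and the idea of checking class sizes first are assumptions for illustration, not part of the original code.

```python
from pathlib import Path

# Assumed image file types; adjust to whatever the dataset actually contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_dataset(root):
    """Map each class subdirectory name to a sorted list of its image paths."""
    index = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class directories
        images = sorted(
            p for p in class_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        index[class_dir.name] = images
    return index


def summarize(index):
    """Return (number of classes, total images, smallest class, largest class)."""
    counts = {name: len(paths) for name, paths in index.items()}
    smallest = min(counts, key=counts.get)
    largest = max(counts, key=counts.get)
    return len(counts), sum(counts.values()), smallest, largest
```

With 223 species directories plus one “not a fungus” directory, the first value returned by `summarize` should be 224. The smallest and largest class names give a quick view of how imbalanced the classes are.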

The dataset is pretty big, so training is going to take a long time on my machine. I was running the code in the integrated terminal in VS Code, and it seems to have stopped running overnight. It was stuck at the end of the third epoch of training the output layer.
So, for one thing, do I really need to train the new output layer first while the ImageNet-pretrained layers stay frozen, or can I skip that step altogether if I am going to unfreeze some of the earlier layers anyway?
Further, should I be running this somewhere other than the terminal in VS Code, maybe CMD or some other way? I’ve never run a script that took nearly this long, and I suppose my current methods are a little hacky.
Do I even need all of this data? I feel like I do for identifying the different fungi, but for the “not a fungus” class, I’m not so sure. Is less sometimes more? Should I just focus on making this work with the full volume of data, or would I do well to shrink my dataset?
Does it make sense in this instance to still use a pretrained model? Would it make sense to train from scratch, or is ImageNet still a good starting place? My intuition tells me that, even though I have a lot of data, fine-tuning will need fewer training iterations to reach similar results than training from scratch would. I am curious, though, whether training from scratch might yield even better results than fine-tuning. Unfreezing some of the layers at least improved performance in our lab example.
Finally, I’m wondering if we can do even better than the MobileNetV3 architecture for this project.
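On the question of shrinking the “not a fungus” class: one way to try it without touching the files on disk is to subsample the list of image paths for that class only. The sketch below assumes a dictionary mapping class names to lists of image paths; `cap_class`, the class name `not_fungus`, and the cap value are hypothetical. Whether a smaller negative class helps or hurts accuracy is something only an experiment can answer. The code just makes the subset reproducible.

```python
import random


def cap_class(index, class_name, max_images, seed=0):
    """Return a copy of index with class_name randomly subsampled to max_images.

    index maps class names to lists of image paths; the input is not modified.
    """
    rng = random.Random(seed)  # fixed seed so the same subset is drawn every run
    capped = dict(index)
    items = list(index[class_name])
    if len(items) > max_images:
        capped[class_name] = sorted(rng.sample(items, max_images))
    return capped


# Toy example with made-up file names.
toy_index = {
    "not_fungus": [f"other_{i}.jpg" for i in range(100)],
    "amanita_muscaria": ["a.jpg", "b.jpg"],
}
smaller = cap_class(toy_index, "not_fungus", max_images=10)
```

Because the species classes are left alone, runs with different caps can be compared on the same validation set to see whether the extra negatives are actually buying anything.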

OpenAI’s CLIP is a zero-shot model that can be used for some classification tasks. In my experience it works best with a few broad classes, like “fungus or not a fungus” in your case, but it might not be as accurate for very specific classes or a large number of classes. You should give it a try.
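To make the zero-shot idea concrete, here is a rough sketch of the scoring step only: CLIP embeds the image and one text prompt per class, then picks the prompt whose embedding is most similar to the image embedding. The vectors below are made up and three-dimensional, standing in for real embeddings that would come from CLIP’s image and text encoders. The function names, the prompts, and the `scale` value are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """Softmax over scaled cosine similarities between one image and each prompt."""
    logits = {label: scale * cosine(image_emb, emb) for label, emb in text_embs.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Made-up vectors standing in for real CLIP embeddings.
text_embs = {
    "a photo of a fungus": [0.9, 0.1, 0.0],
    "a photo of something that is not a fungus": [0.1, 0.9, 0.0],
}
image_emb = [0.8, 0.2, 0.1]

probs = zero_shot_probs(image_emb, text_embs)
prediction = max(probs, key=probs.get)
```

With only two broad prompts the embeddings tend to be well separated. With 223 species prompts many of the text embeddings would likely sit close together, which is one plausible reason accuracy drops for fine-grained classes.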