How many images should I collect for a dataset to fine-tune a pre-trained model?
There is no "one size fits all" answer to a general question like this. The only general way to answer it is "enough to get the training to work successfully". Of course that depends on the nature of your problem, and there are other design choices that you need to get right in order for things to work (e.g. the type and architecture of the network you use).
Here's a recent thread that actually shows some experimental results with dataset size for a particular problem, although it is dealing with convolutional nets. Those are a more advanced topic that is covered in DLS Course 4, but the same principles for evaluating the performance of a network apply. At least it may give some concrete ideas for how to approach answering this question in a given situation.
In the thread linked above, I tried a simple experiment on an untrained model where I ran it over and over with different numbers of inputs drawn from a given overall training set. It was easy and relatively quick to do because the model architecture was quite small and the dataset was only 15,000 records. Two caveats apply.
First, I was not using any transfer learning: the model was trained from scratch each time on its own input subset. I did this in part because that was the question I was exploring (how much labelled training data would I need to train a model from scratch), and in part because I could; each training run executed in roughly a minute. It took far longer to screen capture and upload the images to this forum than to do all the training runs. If your model takes a week to train, you probably don't have the luxury of figuring this out using my naive, brute-force approach.
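The shape of that brute-force experiment is easy to reproduce. Here is a minimal sketch, not the actual model or data from the thread: a tiny logistic-regression classifier trained from scratch on progressively larger subsets of a synthetic two-blob dataset, with accuracy measured on a fixed held-out validation split. All names, sizes, and hyperparameters here are invented for illustration.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=200):
    """Train a tiny logistic-regression model from scratch with gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient of cross-entropy loss
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(w, b, X, y):
    return np.mean(((X @ w + b) > 0).astype(int) == y)

rng = np.random.default_rng(0)

# Synthetic stand-in for a labelled dataset: two Gaussian blobs in 2-D.
n = 2000
X = np.vstack([rng.normal(-1, 1, (n // 2, 2)), rng.normal(1, 1, (n // 2, 2))])
y = np.array([0] * (n // 2) + [1] * (n // 2))
perm = rng.permutation(n)
X, y = X[perm], y[perm]
X_val, y_val = X[:500], y[:500]            # fixed held-out validation split
X_train, y_train = X[500:], y[500:]

# Learning curve: retrain from scratch on progressively larger subsets.
for m in [10, 50, 200, 1000]:
    w, b = train_logreg(X_train[:m], y_train[:m])
    print(m, round(accuracy(w, b, X_val, y_val), 3))
```

The point is the loop structure, not the toy model: hold the validation set fixed, retrain from scratch at each subset size, and watch where the validation accuracy stops improving.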
The good news is that transfer learning exists to leverage and improve upon an investment already made in model training. The article linked by @Christian_Simonis in that thread reported good results from only a handful of training inputs. And he previously provided a link to a more sophisticated approach than mine, again leveraging transfer learning, based on measuring how much uncertainty each additional training input removes.
So my takeaway for an untrained model is "it depends", but don't be surprised if it takes on the order of magnitude of 10^4 inputs. For transfer learning, the answer is still "it depends". It depends on how many layers are frozen and how many are being retrained (or maybe the architecture is being extended?). It depends on how similar the new training inputs are to the original inputs. I'm sure there are more variables. Nevertheless, expect to need far fewer than the 10^4 suggested by my non-transfer-learning case study. Perhaps as few as order of magnitude 10.
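To make the "frozen layers vs. retrained layers" idea concrete, here is a toy sketch of the transfer-learning setup. A frozen random-projection + ReLU layer stands in for the frozen layers of a real pre-trained network (in a real framework you would load actual pre-trained weights and mark those layers untrainable); only the final linear "head" is retrained, on an order-of-magnitude-10 set of new labelled examples. Everything here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Pre-trained" feature extractor: a frozen layer that is never updated.
# In practice these weights would come from a real pre-trained model.
W_frozen = rng.normal(size=(2, 16))

def features(X):
    return np.maximum(X @ W_frozen, 0.0)   # frozen layer: weights stay fixed

# A handful of labelled examples from the *new* task (order of magnitude 10).
X_new = np.vstack([rng.normal(-1, 0.5, (8, 2)), rng.normal(1, 0.5, (8, 2))])
y_new = np.array([0] * 8 + [1] * 8)

# Retrain only the final linear layer (the "head") on the new data.
F = features(X_new)
w = np.zeros(F.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    w -= 0.5 * (F.T @ (p - y_new)) / len(y_new)
    b -= 0.5 * np.mean(p - y_new)

# Evaluate on fresh samples from the same new task.
X_test = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
y_test = np.array([0] * 100 + [1] * 100)
acc = np.mean(((features(X_test) @ w + b) > 0).astype(int) == y_test)
print(round(acc, 2))
```

Because the frozen layer already produces useful features, the head has very few parameters to fit, which is why a tiny new dataset can be enough. The more layers you unfreeze, the more parameters you retrain, and the more new data you should expect to need.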
Hope this helps
By the way, if you do come across some quality articles or do some of your own experiments, please share.