How many images to collect for a dataset to train a pre-trained model

There is no “one size fits all” answer to a general question like this. The only general way to answer that is “enough to get the training to work successfully”. Of course that depends on the nature of your problem and there are other design choices that you need to get right in order for things to work (e.g. the type and architecture of the network you use).

Here’s a recent thread that shows some experimental results with dataset size for a particular problem, although it deals with convolutional nets. Those are a more advanced topic covered in DLS Course 4, but the same principles for evaluating the performance of a network apply. At least it may give some concrete ideas for how to approach this question in a given situation.

In the thread linked above, I tried a simple experiment on an untrained model: I ran it over and over with different numbers of inputs drawn from a given overall training set. It was easy and relatively quick to do because the model architecture was quite small and the dataset was only 15,000 records. Two caveats apply.

First, I was not using any transfer learning: the model was trained from scratch each time on its own input subset. I did this partly because that was the question I was exploring (how much labelled training data I would need to train a model from scratch), and partly because I could: each training run executed in roughly a minute. It took far longer to screen-capture and upload the images to this forum than to do all the training runs. If your model takes a week to train, you probably don’t have the luxury of figuring this out using my naive, brute-force approach.
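The brute-force experiment above can be sketched roughly like this. Everything here is a toy stand-in for illustration, not the actual model or dataset from the thread: a tiny logistic-regression model trained from scratch on subsets of increasing size, recording validation accuracy at each size to build a learning curve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a labelled dataset (the real experiment used ~15,000 records).
n_total, n_features = 2000, 5
X = rng.normal(size=(n_total, n_features))
true_w = rng.normal(size=n_features)       # hidden "true" decision boundary
y = (X @ true_w > 0).astype(float)

X_val, y_val = X[-500:], y[-500:]          # held-out validation split
X_pool, y_pool = X[:-500], y[:-500]        # pool we draw training subsets from

def train_logreg(X, y, lr=0.1, epochs=200):
    """Train a logistic-regression model from scratch (no transfer learning)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)          # gradient step on weights
        b -= lr * np.mean(p - y)                  # gradient step on bias
    return w, b

def accuracy(w, b, X, y):
    return float(np.mean(((X @ w + b) > 0) == y))

# Re-train from scratch on subsets of increasing size and watch validation accuracy.
results = {}
for n in [10, 50, 100, 500, 1500]:
    w, b = train_logreg(X_pool[:n], y_pool[:n])
    results[n] = accuracy(w, b, X_val, y_val)
    print(f"trained on {n:5d} inputs -> validation accuracy {results[n]:.3f}")
```

Plotting validation accuracy against subset size shows where the curve flattens out, which is one concrete way to answer “how much data is enough” for a given problem.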

The good news is that transfer learning exists to leverage and improve upon an investment already made in model training. The article linked by @Christian_Simonis in that thread obtained good results from only a handful of training inputs. He had also previously provided a link to a more sophisticated approach than mine, again leveraging transfer learning, based on measuring how much uncertainty each additional training input removes.

So my takeaway for an untrained model is ‘it depends’, but don’t be surprised if it takes on the order of 10^4 inputs. For transfer learning, the answer is still ‘it depends’. It depends on how many layers are frozen and how many are being retrained (or whether the architecture is being extended). It depends on how similar the new training inputs are to the original inputs. I’m sure there are more variables. Nevertheless, expect to need far fewer than the 10^4 suggested by my non-transfer-learning case study; perhaps as few as on the order of 10.
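The frozen-layers idea can be illustrated with a minimal sketch. This is an assumption-laden toy, not any real pre-trained network: a fixed random projection stands in for the frozen early layers, and only a new linear head is trained on roughly ten labelled examples of the “new” task.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a pre-trained feature extractor: a fixed projection playing
# the role of frozen early layers (purely illustrative, not a real network).
n_features, n_hidden = 20, 8
W_frozen = rng.normal(size=(n_features, n_hidden))

def extract_features(X):
    """Frozen layers: never updated during fine-tuning."""
    return np.tanh(X @ W_frozen)

# A tiny labelled set for the *new* task; labels are linearly
# recoverable from the frozen features, so a small head can learn them.
X_new = rng.normal(size=(30, n_features))
head_true = rng.normal(size=n_hidden)
y_new = (extract_features(X_new) @ head_true > 0).astype(float)

X_train, y_train = X_new[:10], y_new[:10]   # order-of-magnitude 10 training inputs
X_test, y_test = X_new[10:], y_new[10:]

def train_head(F, y, lr=0.5, epochs=500):
    """Fine-tune only the new head (logistic regression) on frozen features."""
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
        w -= lr * F.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

w, b = train_head(extract_features(X_train), y_train)
train_acc = float(np.mean(((extract_features(X_train) @ w + b) > 0) == y_train))
test_acc = float(np.mean(((extract_features(X_test) @ w + b) > 0) == y_test))
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The head has only a handful of parameters to learn, which is why so few examples can suffice; how well this works in practice depends, as noted above, on how similar the new task is to what the frozen layers were originally trained on.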

Hope this helps

By the way, if you come across some quality articles or do some of your own experiments, please share.
