How to measure the quality of a dataset?

So far we are looking at the number of instances, the illumination levels, and the size of each file. But what are your ideas for measuring dataset quality in general terms?


Hi, @AlGo! :wave:

One great way to measure the quality of a dataset is to train a model on it and evaluate it. Look through the errors it makes on the test set yourself, and check the low-confidence examples. If the quality of your dataset is subpar, you’ll see that a lot of your low-confidence results are on samples that were mislabeled in the first place.
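As a rough sketch of that review step, here is one way to rank examples so the most suspicious ones come first. It assumes you already have per-class predicted probabilities from your model; the function name `rank_suspect_examples` and the toy numbers are made up for illustration, not taken from any library.

```python
def rank_suspect_examples(probs, labels, top_k=5):
    """Return (index, predicted_class, confidence_in_given_label) tuples,
    sorted so the examples whose given label got the lowest predicted
    probability come first."""
    scored = []
    for idx, (p, y) in enumerate(zip(probs, labels)):
        # Class the model actually prefers for this example.
        predicted = max(range(len(p)), key=p.__getitem__)
        # p[y] is how much probability the model put on the label in the dataset.
        scored.append((p[y], idx, predicted))
    scored.sort()
    return [(idx, predicted, conf) for conf, idx, predicted in scored[:top_k]]


# Toy data (hypothetical): 4 examples, 2 classes.
probs = [
    [0.95, 0.05],
    [0.10, 0.90],
    [0.85, 0.15],  # model is fairly sure this is class 0, but it is labeled 1
    [0.40, 0.60],
]
labels = [0, 1, 1, 1]

suspects = rank_suspect_examples(probs, labels, top_k=2)
for idx, predicted, conf in suspects:
    print(f"example {idx}: labeled {labels[idx]}, predicted {predicted}, "
          f"confidence in given label {conf:.2f}")
```

The examples at the top of this list are the ones worth opening and checking by eye; a low score alone does not prove the label is wrong, since the example may just be hard.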

Another thing you should pay attention to is whether your dataset is balanced or not; if not, you should make sure to handle it properly. Common techniques to handle imbalanced datasets are data augmentation, data generation, oversampling, and undersampling.
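To make the balance check concrete, here is a minimal sketch using only the standard library. It counts the classes, reports the ratio between the largest and smallest class, and does naive random oversampling. The helper names `imbalance_ratio` and `random_oversample` and the toy labels are hypothetical; in a real project you would probably use a dedicated library and augment images rather than duplicate them.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy dataset (hypothetical): 4 "cat" images and 1 "dog" image.
samples = ["img_a", "img_b", "img_c", "img_d", "img_e"]
labels = ["cat", "cat", "cat", "cat", "dog"]

print(Counter(labels), imbalance_ratio(labels))

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

If you oversample, do it on the training split only, so duplicated samples don't leak into your validation or test sets.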