Creating my own datasets or image sets

So far in the labs we have only been using datasets provided by the course.

However, I would like to know how I can make my own custom datasets to practise importing data and then applying algorithms to them.

Please let me know how I can do this.

Hi @Ajitesh_Kumar_Singh

Welcome to the community, and thanks for your question.

To create your own dataset, I see two possibilities:

  1. Either you measure data on your own, e.g. by running a real-life experiment (this can also involve running a large online survey tailored to your problem definition). For measured data, see also this thread, where a cool project is discussed by @ai_curious. There, data acquisition also involves taking pictures in interesting situations, which you can then use for training.

  2. Or you create artificial data, e.g. by using a model to simulate results and then adding noise; see also here for some inspiration.
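As a minimal sketch of option 2, the snippet below simulates a simple linear model, adds Gaussian noise, writes the result as CSV text, and reads it back so you can practise importing. The linear model, the function names (`make_synthetic_rows`, `rows_to_csv`, `csv_to_rows`, `fit_line`), and all parameter values are assumptions chosen for illustration; they are not taken from the course labs.

```python
import csv
import io
import random


def make_synthetic_rows(n_samples, slope, intercept, noise_std, seed=0):
    """Simulate y = slope * x + intercept, then add Gaussian noise to y."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    rows = []
    for _ in range(n_samples):
        x = rng.uniform(0.0, 10.0)
        y = slope * x + intercept + rng.gauss(0.0, noise_std)
        rows.append((x, y))
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["x", "y"])
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Import the CSV text again, converting every field back to float."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [(float(record["x"]), float(record["y"])) for record in reader]


def fit_line(rows):
    """Ordinary least-squares fit of a straight line; returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


rows = make_synthetic_rows(n_samples=200, slope=2.0, intercept=1.0, noise_std=0.5)
csv_text = rows_to_csv(rows)
loaded = csv_to_rows(csv_text)
slope_hat, intercept_hat = fit_line(loaded)
```

Because you chose the true slope and intercept yourself, you can check whether the fitted values land close to them, which is a handy sanity test when trying out a new algorithm. To practise with a real file, write `csv_text` to disk and load it with whatever tool the labs use.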

You can also combine the two methods, e.g. by augmenting your measured data with additional noise, as described in this thread.
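A rough sketch of that kind of augmentation, assuming the measured data is a list of numeric feature vectors: each original sample is kept, and noisy copies are appended. The function name `augment_with_noise`, the example values in `measured`, and the noise level are all hypothetical.

```python
import random


def augment_with_noise(samples, noise_std, copies=1, seed=0):
    """Return the original samples plus `copies` noisy variants of each one."""
    rng = random.Random(seed)  # fixed seed for reproducible augmentation
    augmented = [list(sample) for sample in samples]  # keep the originals first
    for _ in range(copies):
        for sample in samples:
            noisy = [value + rng.gauss(0.0, noise_std) for value in sample]
            augmented.append(noisy)
    return augmented


# Hypothetical "measured" data: three samples with two features each.
measured = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
augmented = augment_with_noise(measured, noise_std=0.1, copies=2)
```

The noise level should stay small relative to the spread of the real measurements; otherwise the augmented samples stop resembling the data you actually collected.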

In general, 1) is usually much more effort than 2). Of course, it is often necessary in order to tackle a specific problem. However, if you only want to work with data to learn and apply some ML methodology, 2) can be the faster way for you.
Hope that helps.

Best regards
Christian


It’s also very easy to find real datasets to experiment with.
Try an internet search for “UCI Machine Learning Repository”, for example.

That’s right!

Here are also some links to Kaggle and scikit-learn datasets: How to get datasets for practice?

Please let us know if this answers your question, @Ajitesh_Kumar_Singh.

Best regards
Christian


TensorFlow also provides some ‘built-in’ datasets, with the advantage that they often feature in tutorials or guides that provide starter code to load and learn from them.
