Welcome to the community, and thanks for your question.
To create your own data set, I see two possibilities:
- Either you measure the data yourself, e.g. by setting up a real-life experiment (this can also mean running a large online survey tailored to your problem definition). For an example of measured data, see also this thread, where a cool project is discussed by @ai_curious. There, data acquisition also involves taking pictures in interesting situations, which you can then use for training.
- Or you create artificial data, e.g. by using a model to simulate results and then adding noise; see also here for some inspiration.
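To make the artificial-data idea a bit more concrete, here is a minimal sketch. It assumes a simple linear model y = slope * x + intercept with Gaussian noise on top; the function name `make_synthetic_data` and all parameter values are just made up for illustration, so swap in whatever model fits your problem.

```python
import random


def make_synthetic_data(n_samples=100, slope=2.0, intercept=1.0,
                        noise_std=0.5, seed=0):
    """Simulate y = slope * x + intercept and add Gaussian noise.

    The linear model and all default values are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    # Draw the inputs uniformly from an arbitrary range [0, 10].
    xs = [rng.uniform(0.0, 10.0) for _ in range(n_samples)]
    # Evaluate the model and perturb each result with zero-mean noise.
    ys = [slope * x + intercept + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


xs, ys = make_synthetic_data()
print(len(xs), len(ys))
```

Because you know the true slope and intercept here, you can later check how well a fitted model recovers them, which is handy when you are learning a method.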
You can also combine the two methods, e.g. by augmenting your measured data with additional noise, as described in this thread.
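As a rough sketch of that kind of augmentation: the snippet below keeps the measured values and appends a few noisy copies of each. The helper `augment_with_noise`, the number of copies, and the noise level are my own assumptions for illustration, not something taken from the linked thread; the noise level in particular should be small compared to the real variation in your measurements.

```python
import random


def augment_with_noise(samples, copies=3, noise_std=0.1, seed=0):
    """Return the original samples plus `copies` noisy variants of each.

    Hypothetical helper; `copies` and `noise_std` are illustrative defaults.
    """
    rng = random.Random(seed)
    augmented = list(samples)  # keep the measured values unchanged
    for _ in range(copies):
        # Append one jittered copy of every measured value.
        augmented.extend(x + rng.gauss(0.0, noise_std) for x in samples)
    return augmented


measured = [1.0, 2.5, 4.0]  # stand-in for real measurements
augmented = augment_with_noise(measured)
print(len(augmented))
```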
In general, measuring your own data is usually far more effort than generating artificial data. Of course, it is often needed to tackle a specific problem. However, if you only want some data to learn and apply an ML methodology, artificial data can be the faster way for you.
Hope that helps.
Best regards
Christian