How to find specific datasets or models?

someone555777 · August 23, 2023, 9:08pm

I tried Hugging Face, Kaggle, and Google Dataset Search, and I find it really hard to find anything specific. Can anybody share tricks for searching effectively?

My current task for example:

I need datasets containing contact details in any format used worldwide (phones, emails, links, addresses). It is really hard to cover all the ways users can write a phone number, for example: with +1 at the start, with spaces, commas, brackets, etc. Or maybe there is an already trained model.

I know that I can generate random data with various libraries, but I am sure that will not be enough. Users can type really different things, and we cannot even expect their input to resemble anything we anticipated.

elirod · August 23, 2023, 9:34pm

Hi @someone555777

This kind of data is protected by GDPR and other data regulations around the world.

That might explain why this kind of data is unavailable.

An alternative is to use the Faker library to generate it.
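
As a rough illustration of the generation idea, here is a minimal sketch using only the Python standard library. It writes the same 10-digit number in several styles (+1 prefix, spaces, brackets, dashes, dots). The template list and the names `TEMPLATES`, `render`, `phone_variants`, and `random_phone` are hypothetical and made up for this example; they are not taken from any library, and the list is nowhere near a complete catalogue of what real users type.

```python
import random

# Hypothetical templates: each "#" is replaced by the next digit of the number.
# Illustrative only; real user input will contain formats not listed here.
TEMPLATES = [
    "+1 ### ### ####",
    "+1 (###) ###-####",
    "(###) ###-####",
    "###-###-####",
    "###.###.####",
    "### ### ####",
    "##########",
    "+1##########",
]


def render(digits, template):
    """Fill each '#' in the template with the next digit from `digits`."""
    out = []
    pos = 0
    for ch in template:
        if ch == "#":
            out.append(digits[pos])
            pos += 1
        else:
            out.append(ch)
    return "".join(out)


def phone_variants(digits):
    """Return the same 10-digit number written in every template."""
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected exactly 10 digits")
    return [render(digits, t) for t in TEMPLATES]


def random_phone(rng):
    """Pick 10 random digits and a random template from the list."""
    digits = "".join(rng.choice("0123456789") for _ in range(10))
    return render(digits, rng.choice(TEMPLATES))


if __name__ == "__main__":
    for variant in phone_variants("2025550143"):
        print(variant)
```

Passing a seeded `random.Random` to `random_phone` keeps the generated samples repeatable. Extending `TEMPLATES` with formats observed in real input is the obvious way to widen coverage.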

someone555777 · August 24, 2023, 8:42am

As I said, the data that users type themselves can be in formats other than the ones programmed into these libraries. I need as wide a variety of formats as possible.

And what about links, for example? A model really cannot tell a link apart from a typo where someone does not type a space after the period between sentences.
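
To make the ambiguity concrete, here is a small sketch of one possible heuristic in plain Python. It labels a `word.word` token as a link when the part after the dot is a lowercase known top-level domain, and as a typo when that part starts with a capital letter. The `KNOWN_TLDS` set and the `classify` function are assumptions invented for this example, and the rule fails in plenty of cases (for instance a sentence that starts in lowercase), which is exactly why real data is needed.

```python
import re

# Assumption: a tiny illustrative set; a real system would need the full TLD list.
KNOWN_TLDS = {"com", "org", "net", "io", "edu", "gov"}

# One dot separating a word-like left part from an alphabetic right part.
TOKEN = re.compile(r"([A-Za-z0-9-]+)\.([A-Za-z]+)")


def classify(token):
    """Label a 'word.word' token as 'link', 'typo', or 'unknown'."""
    m = TOKEN.fullmatch(token)
    if not m:
        return "unknown"
    right = m.group(2)
    # A lowercase known TLD after the dot suggests a domain name.
    if right.islower() and right in KNOWN_TLDS:
        return "link"
    # A capitalized word after the dot suggests a missing space between sentences.
    if right[0].isupper():
        return "typo"
    return "unknown"


if __name__ == "__main__":
    for tok in ["example.com", "done.Next", "foo.bar"]:
        print(tok, classify(tok))
```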

elirod · August 24, 2023, 11:07am

Yep!

This is something we must get used to. Collecting and processing data is the hard part of the world of data science, my friend.

Many companies even pay a fortune for datasets that they cannot collect in their own data lakes.

This is one of the challenges we will face on our journey.
