I tried Hugging Face, Kaggle, and Google Dataset Search, and I found it really hard to find anything specific. Can anybody share any tricks for searching effectively?
My current task, for example:
I need datasets containing contact details in any format used worldwide (phone numbers, emails, links, addresses). It is really hard to cover every way users can write a phone number, for example: with +1 at the start, with spaces, commas, brackets, etc. Alternatively, an already trained model would also work.
I know that I can generate random data with various libraries, but I am sure that will not be enough. Users can type really different things, and their input may not resemble anything we expect.
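To show what I mean by library-style generation, here is a minimal sketch using only the Python standard library. The `TEMPLATES` list and the `fake_phone` helper are hypothetical names I made up for illustration, not part of any real library; the point is that the output can only ever follow the formats someone has written down in advance.

```python
import random

# Hypothetical format templates; each "#" is replaced by a random digit.
# Real user input is not limited to a fixed list like this one.
TEMPLATES = [
    "+1 ### ### ####",
    "+1 (###) ###-####",
    "(###) ###-####",
    "###.###.####",
    "### ### ## ##",
]

def fake_phone(rng):
    """Fill one randomly chosen template with random digits."""
    template = rng.choice(TEMPLATES)
    return "".join(str(rng.randint(0, 9)) if ch == "#" else ch for ch in template)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [fake_phone(rng) for _ in range(5)]
print(samples)
```

Every sample comes from one of the five templates, so anything a real user writes outside them is never seen.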
As I said, what users type themselves can be in formats other than the ones programmed into these libraries. I need as wide a variety of formats as possible.
And what about links, for example? A link often cannot be distinguished from a typo where someone did not type a space after the period between two sentences.
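The link problem can be reproduced with a small sketch. The `NAIVE_LINK` pattern and the `find_link_candidates` helper are my own simplified assumptions, not a recommended detector; they only demonstrate that a bare "word.word" match treats a missing space between sentences the same way as a domain name.

```python
import re

# Naive assumption: a link is "word.word" with a suffix of 2+ letters,
# and no scheme such as "https://" is required.
NAIVE_LINK = re.compile(r"\b[A-Za-z0-9-]+\.[A-Za-z]{2,}\b")

def find_link_candidates(text):
    """Return every substring the naive pattern considers a link."""
    return NAIVE_LINK.findall(text)

# A real domain and a missing-space typo both match the pattern.
print(find_link_candidates("Visit example.com today"))
print(find_link_candidates("I called him yesterday.Then he left"))
```

Both calls return one candidate, so the pattern alone cannot tell the typo from the real link.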