Week-3: Session - More label ambiguity


At timestamp 2:48, "User ID merge example":
I understand that we can manually compare records to check whether two of them belong to the same person, but how is this done in real life, say when there are thousands or millions of records to compare?
Are there any tools or libraries for that?

Thank you.

The professor is giving an example of how the merge can be done. You first need a smaller labeled dataset. You train an ML model on this labeled data and then use the trained model to handle the larger dataset.
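
To make that a bit more concrete, here is a minimal sketch of the idea in Python. It is not the method from the lecture: the records, the field names (`name`, `email`), and the helper functions are all made up for illustration, and a single learned similarity threshold stands in for a real ML model. It only uses the standard library (`difflib`).

```python
from difflib import SequenceMatcher

# Hypothetical fields; real user records would have their own schema.
FIELDS = ("name", "email")

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] after light normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def pair_score(rec_a: dict, rec_b: dict) -> float:
    """Average field-by-field similarity of two records."""
    return sum(similarity(rec_a[f], rec_b[f]) for f in FIELDS) / len(FIELDS)

def fit_threshold(labeled_pairs) -> float:
    """Pick the score cut-off that classifies the most labeled pairs correctly.

    labeled_pairs: list of (record_a, record_b, is_same_person).
    Candidate cut-offs are midpoints between neighbouring scores.
    """
    scored = [(pair_score(a, b), label) for a, b, label in labeled_pairs]
    scores = sorted({s for s, _ in scored})
    candidates = [(lo + hi) / 2 for lo, hi in zip(scores, scores[1:])] or [0.5]
    best_t, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum((s >= t) == label for s, label in scored)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

def same_person(rec_a: dict, rec_b: dict, threshold: float) -> bool:
    """Predict whether two records refer to the same person."""
    return pair_score(rec_a, rec_b) >= threshold

# Small hand-labeled set (invented data, for illustration only).
labeled_pairs = [
    ({"name": "Ann Lee", "email": "ann.lee@example.com"},
     {"name": "Ann Lee", "email": "annlee@example.com"}, True),
    ({"name": "Bob Stone", "email": "bob@example.com"},
     {"name": "Robert Stone", "email": "bob@example.com"}, True),
    ({"name": "Ann Lee", "email": "ann.lee@example.com"},
     {"name": "Bob Stone", "email": "bob@example.com"}, False),
    ({"name": "Carl Diaz", "email": "carl@example.com"},
     {"name": "Dana Kim", "email": "dana@example.com"}, False),
]

# "Train" on the small labeled set, then apply to unseen records.
threshold = fit_threshold(labeled_pairs)

new_a = {"name": "Eve Park", "email": "eve.park@example.com"}
new_b = {"name": "Eve Park", "email": "evepark@example.com"}
new_c = {"name": "Finn Olsen", "email": "finn@example.com"}

print(same_person(new_a, new_b, threshold))
print(same_person(new_a, new_c, threshold))
```

In practice you would likely use more features and a proper classifier, and with millions of records you would not compare every pair against every other pair. As for tools, there are Python packages aimed at this kind of record matching, for example `recordlinkage` and `dedupe`, though I have not checked how well they fit this particular case.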

Hi @Jeet ,

Unfortunately, this is not my strongest area of expertise. Searching on Google, I found this interesting article; perhaps it will be useful to you. Depending on the amount of data, having humans do this task can become impossible, which is why the video mentions using a supervised learning algorithm instead.

Best, Arturo