Hi,
Toward the end of the Triplet Loss video, Andrew talks about how some companies might use “north of 100 million images” for training.
Also, prior to that, he talks about how we should choose the triplets A, P, N such that A is similar to N, so that the triplets are "hard" to train on.
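For context, here is a minimal NumPy sketch (my own illustration, not code from the course) of why that selection matters: the triplet loss max(‖f(A)−f(P)‖² − ‖f(A)−f(N)‖² + α, 0) only gives a useful training signal when d(A,N) is not already much larger than d(A,P).

```python
import numpy as np

def triplet_loss(a, p, n, alpha=0.2):
    """max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0) for one triplet."""
    d_ap = np.sum((a - p) ** 2)   # squared distance anchor-positive
    d_an = np.sum((a - n) ** 2)   # squared distance anchor-negative
    return max(d_ap - d_an + alpha, 0.0)

rng = np.random.default_rng(0)
anchor   = rng.normal(size=128)                      # embedding f(A)
positive = anchor + 0.1 * rng.normal(size=128)       # same person, slight variation

easy_negative = rng.normal(size=128)                 # random other person: d(A,N) is large
hard_negative = anchor + 0.3 * rng.normal(size=128)  # someone who "looks like" A: d(A,N) small

print(triplet_loss(anchor, positive, easy_negative))  # typically 0 -> no gradient signal
print(triplet_loss(anchor, positive, hard_negative))  # typically > 0 -> informative triplet
```

(The margin value 0.2 and the 128-dimensional embeddings are just assumptions for the sketch.)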
My question is: how do these companies put such training data together?
I don’t really know the answer here, but it’s probably based on scraping the web or, in the case of governments, on existing databases (e.g. photos submitted with applications for passports, driving licences and the like).
Here’s an article about the startup Clearview.
Another famous case of the successful deployment of face recognition (FR) technology is by the government of the PRC. A search will find articles about that as well, e.g. this one.