I can’t understand at all why this kind of machine learning algorithm is classified as unsupervised learning. Don’t these algorithms use the labels y to fit the model using some algorithm (like gradient descent)?
Yes, kind of, but not exactly. Here’s how I think of it…
Recommender systems use a distance function on features of either content (think product attributes) or population (the population demographics) to quantify similarity. You can use that distance measure to make recommendations such as “because people similar to you liked this, you might like it, too.” Or, “here’s something similar to what you have enjoyed/watched/purchased in the past.” But you’re not predicting the probability that a specific person will like a particular thing or act a certain way. Rather, you’re looking for internal structure or patterns in the features of the data itself.
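To make the "distance on features" idea concrete, here is a minimal sketch. The item names and feature values are made up for illustration, and Euclidean distance is just one possible choice of distance function. Note that no labels y appear anywhere; the recommendation comes purely from the structure of the features.

```python
import numpy as np

# Hypothetical items and made-up feature values (one row per item).
items = ["A", "B", "C", "D"]
features = np.array([
    [1.0, 0.0, 0.2],
    [0.9, 0.1, 0.3],
    [0.0, 1.0, 0.8],
    [0.1, 0.9, 0.7],
])

def most_similar(index, features, k=1):
    """Return indices of the k items closest to item `index` (Euclidean distance)."""
    dists = np.linalg.norm(features - features[index], axis=1)
    dists[index] = np.inf  # never recommend the item itself
    return np.argsort(dists)[:k]

# "Because you liked A, you might also like..."
nearest = most_similar(0, features, k=1)
print([items[i] for i in nearest])  # prints ['B']
```

In practice you would usually scale the features first so that no single attribute dominates the distance.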
I have also seen something similar used in healthcare. Based on many patient attributes such as age, social condition, tobacco history, and blood pressure, the system identifies a cohort or patient population. Then, based on medical histories for that cohort, it displays clinical outcomes: 25% developed diabetes, 34% had heart failure, 6% suffered from depression, etc. It doesn’t predict whether a given patient will experience those conditions; it only shows what ‘similar’ patients have experienced in the past. A related application of the tools is to study clinical trials and determine whether a new candidate is ‘similar’ to patients already enrolled or to those who have done well under the protocol. Again, it’s not predicting that a specific patient will do well, but comparing the patient’s characteristics with those of patients who have done well in the past.
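The cohort idea can be sketched the same way. Everything below is invented for illustration (the attribute values, the outcome columns, and the helper name `cohort_outcome_rates`), and it assumes the attributes have already been scaled to comparable ranges. It only reports what happened to the nearest historical patients; it does not fit a predictive model.

```python
import numpy as np

# Made-up, already-scaled attributes: age, blood pressure, smoker flag.
patients = np.array([
    [0.20, 0.30, 0.0],
    [0.25, 0.35, 0.0],
    [0.80, 0.90, 1.0],
    [0.75, 0.85, 1.0],
    [0.30, 0.30, 0.0],
])
# Made-up historical outcomes (1 = condition recorded): diabetes, heart failure.
outcomes = np.array([
    [0, 0],
    [1, 0],
    [1, 1],
    [1, 1],
    [0, 0],
])

def cohort_outcome_rates(new_patient, patients, outcomes, k=3):
    """Find the k most similar historical patients and summarize their outcomes."""
    dists = np.linalg.norm(patients - new_patient, axis=1)
    cohort = np.argsort(dists)[:k]
    # Fraction of the cohort with each recorded outcome.
    return cohort, outcomes[cohort].mean(axis=0)

cohort, rates = cohort_outcome_rates(np.array([0.22, 0.32, 0.0]), patients, outcomes)
print(cohort, rates)
```

The returned rates describe the cohort's history, which is exactly the "what happened to similar patients" view rather than a per-patient prediction.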
Unsupervised learning provides a different, but still very useful, type of insight from data.
@Marcos_Quintas_Perez, I agree with your observation. The cost function that we’re minimizing looks an awful lot like linear regression. It does no harm to view this as a supervised algorithm - it’s rather a bit of both.
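Here is a small sketch of why it looks like linear regression. I'm assuming the cost in question is the usual squared-error collaborative filtering cost with a mask R marking which ratings exist (the thread doesn't spell it out), and I've left out regularization. The function names and the random data are mine, purely for illustration.

```python
import numpy as np

def cofi_cost(X, W, b, Y, R):
    """Squared-error cost over rated entries only (regularization omitted).

    X: (n_items, n_features) item features
    W: (n_users, n_features), b: (n_users,) per-user parameters
    Y: (n_items, n_users) ratings, R: same shape, 1 where a rating exists
    """
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2)

def linreg_cost(X, w, b, y):
    """Unaveraged squared-error cost of linear regression for a single user."""
    return 0.5 * np.sum((X @ w + b - y) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))
W = rng.normal(size=(3, 2))
b = rng.normal(size=3)
Y = rng.integers(0, 6, size=(4, 3)).astype(float)
R = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]])

total = cofi_cost(X, W, b, Y, R)
# One linear regression per user, restricted to the items that user rated.
per_user = sum(
    linreg_cost(X[R[:, j] == 1], W[j], b[j], Y[R[:, j] == 1, j])
    for j in range(W.shape[0])
)
print(np.isclose(total, per_user))
```

With X held fixed, the cost is just a sum of per-user linear regressions on the ratings, which is the supervised flavor. The difference is that X is not given: the item features are learned along with W and b, and that is the part that resembles discovering structure in the data.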