In the “Collaborative filtering algorithm” video, Andrew says:
By the way, notice that this works only because we have parameters for four users. That’s what allows us to try to guess appropriate features, x_1. This is why in a typical linear regression application if you had just a single user, you don’t actually have enough information to figure out what would be the features, x_1 and x_2, which is why in the linear regression contexts that you saw in course 1, you can’t come up with features x_1 and x_2 from scratch. But in collaborative filtering, is because you have ratings from multiple users of the same item with the same movie. That’s what makes it possible to try to guess what are possible values for these features.
I’m struggling to understand this point. Why does the fact that we have parameters for multiple users allow us to guess the appropriate features? Why wasn’t this possible in linear regression?