Struggling with point on collaborative filtering

In the “Collaborative filtering algorithm” video, Andrew says:

By the way, notice that this works only because we have parameters for four users. That’s what allows us to try to guess appropriate features, x_1. This is why in a typical linear regression application if you had just a single user, you don’t actually have enough information to figure out what would be the features, x_1 and x_2, which is why in the linear regression contexts that you saw in course 1, you can’t come up with features x_1 and x_2 from scratch. But in collaborative filtering, is because you have ratings from multiple users of the same item with the same movie. That’s what makes it possible to try to guess what are possible values for these features.

I’m struggling to understand this point. Why does the fact that we have parameters for multiple users allow us to guess the appropriate features? Why wasn’t this possible in linear regression?

Mathematically, the difference is that the output ‘Y’ values are a matrix - there are multiple outputs for each row, not just a single output.

This gives us another dimension, which allows backpropagation to learn the X values as well as the weight values.

Thanks for the mathematical explanation. Is there an intuitive explanation as well? I wasn’t able to find any intuitive reasoning in the lecture.

I think I covered both the math and the intuition.
I’m not aware of any other materials on this.