In collaborative filtering, when the feature vectors of the movies are unknown and we try to learn the features for each movie, given the parameters of each user, how were the parameters for each user obtained in the first place? Does it have anything to do with matrix factorisation?

I presume what you mean here is that, in effect, we are dealing with a ‘sparse matrix’, correct?

Not clear on what you mean by this. In our database we have x_{[i]} number of users who have rated x_{[i,y]} number of films.

That part is ‘given’. What we *don’t know* is how they’d rate all the movies they haven’t seen (the ‘holes’ in the Swiss cheese matrix).

There are actually any number of ways you can solve this problem (you could even do it with linear regression if you wanted to, but I don’t recommend it).

Matrix factorization (or in some cases SVD) does a great job of filling in the bulk of the holes with good accuracy and speed, and that is where it comes in.
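To make the "filling in the holes" idea concrete, here is a minimal sketch of a masked matrix factorization in NumPy. The toy ratings, the `factorize` helper, and every hyperparameter (`k`, `lr`, `reg`, `epochs`) are assumptions made up for illustration, not anything from the course materials. Only the observed entries contribute to the error, so the unrated cells get filled in by the low-rank product.

```python
import numpy as np

def factorize(R, mask, k=2, lr=0.01, reg=0.01, epochs=5000, seed=0):
    """Fit R ~ U @ V.T by gradient descent, using only entries where mask == 1."""
    rng = np.random.default_rng(seed)
    n_users, n_movies = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))   # one row of parameters per user
    V = 0.1 * rng.standard_normal((n_movies, k))  # one row of features per movie
    for _ in range(epochs):
        err = mask * (U @ V.T - R)    # error is zeroed wherever there is no rating
        grad_U = err @ V + reg * U
        grad_V = err.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

# Toy ratings (hypothetical): rows are users, columns are movies, 0 = not rated.
R = np.array([[5, 4, 0, 1],
              [4, 0, 1, 1],
              [1, 1, 5, 0],
              [0, 1, 4, 5]], dtype=float)
mask = (R > 0).astype(float)

U, V = factorize(R, mask)
pred = U @ V.T   # dense matrix: every hole now has an estimated rating
```

Note that neither `U` nor `V` is given up front here; both start as small random numbers and are learned together from the ratings alone.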

He actually teaches the Maths class here (I haven’t taken it), but when I was learning recommenders I found his video useful:

Thanks @Nevermnd !

What I mean by the parameters for each user *j* to give a rating to movie *i* are these:

and these are the feature vectors for each movie:

My question is, in what scenarios would the parameters each user uses to rate a movie be given, so that we can use them to find the feature vector of each movie?

I hope I am understanding you right, but just to start, recall that our data looks something like this, which is what I mean by ‘sparse’.

And we are trying to fill in all the gaps. In a sense you are trying to find users with ‘similar interests’ and then estimate the missing parameters based on that.
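As a rough illustration of the "similar interests" intuition, here is a small neighborhood-style sketch. To be clear, this is a similarity-weighted average, not the factorization itself, and the ratings and the `cosine_on_shared` helper are hypothetical. It estimates a missing rating by weighting other users' ratings by how similar they are to the target user.

```python
import numpy as np

def cosine_on_shared(a, b):
    """Cosine similarity over the movies both users have rated (NaN = not rated)."""
    shared = ~np.isnan(a) & ~np.isnan(b)
    if not shared.any():
        return 0.0
    u, v = a[shared], b[shared]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy data (hypothetical): rows are users, columns are movies.
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],   # user 0: we want to estimate movie 2
    [5.0, 5.0, 1.0, 1.0],      # user 1: tastes close to user 0
    [1.0, 1.0, 5.0, 5.0],      # user 2: roughly opposite tastes
])

target, movie = 0, 2
sims = np.array([
    0.0 if u == target else cosine_on_shared(ratings[target], ratings[u])
    for u in range(len(ratings))
])

# Only users who actually rated the movie may contribute to the estimate.
rated = ~np.isnan(ratings[:, movie])
weights = sims * rated
estimate = float(weights @ np.nan_to_num(ratings[:, movie]) / weights.sum())
```

User 1 ends up with the larger weight, so the estimate is pulled toward that user's low rating for the movie.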

*Note here that some users, like (I don’t know) user 63 or something, have rated a *ton* of movies.

@slam22 Make sure you watch the video I sent. It would be easier to answer any questions you have based on that, as I took my ML studies elsewhere, so I haven’t taken that exact class here and don’t have access to the materials.

@slam22, in real life, you typically wouldn’t know parameter values for w & b. In the video, Professor Ng starts out with the example of assuming that you did have w & b, just as a simplification to help you get the concept of how we could use that to figure out values for features x. But, as the video continues on, he covers the situation where you don’t know w & b (the more realistic case), and explains how you can solve for w, b, AND x.
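For the simplified case described above, where w & b are assumed known, here is a minimal sketch of recovering one movie's features x by least squares. The numbers for `W`, `b`, and `y` are invented for illustration (two features per user), and the model assumed is rating = w · x + b. Only users who rated the movie enter the fit.

```python
import numpy as np

# Hypothetical known user parameters: 4 users, 2 features each.
W = np.array([[5.0, 0.0],
              [5.0, 0.0],
              [0.0, 5.0],
              [0.0, 5.0]])
b = np.zeros(4)

# Ratings for one movie; NaN marks a user who has not rated it.
y = np.array([5.0, np.nan, 0.0, 0.0])

rated = ~np.isnan(y)
# Solve min_x sum_j (w_j . x + b_j - y_j)^2 over the users who rated the movie.
x, *_ = np.linalg.lstsq(W[rated], y[rated] - b[rated], rcond=None)

# Use the learned features to predict the missing rating for user 1.
pred_user1 = float(W[1] @ x + b[1])
```

With these made-up numbers the fit gives x = [1, 0] and a predicted rating of 5 for user 1. In the realistic case nothing is given: the same squared error is minimized over w, b, and x together, starting from small random values.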

@Wendy thanks for the follow-up with them; I recall this is similar to how I was taught too, but we went even further and simply estimated it rather than performing the full-blown factorization.

I once even tried running it (and learned the hard way about ‘compute resource constraints’), failing after hours of running linear regression on it. The largest dataset I could run without everything crashing was around 120k, which was not good enough for the 10M we needed for our assignment.

I had to bootstrap and get creative. And finally, the full factorization: *soooooo* much faster.