Collaborative Filtering: Using per-item features

Would it be mathematically the same if we minimise J individually per customer?

Hello @Google_Google,

No. Let’s look at the error term of the cost function:

J = \frac{1}{2}\sum_{(i,j):\,r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2
It is essentially a sum of individual user’s cost:

J = \frac{1}{2}(J_0 + J_1 + ... + J_{n_u-1}),

where J_j = \sum_{i=0}^{n_m-1} r(i,j)\,(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2

Obviously, an optimized J does not guarantee that each J_j is also optimized. Moreover, one user’s w^{(j)} can affect another user’s parameters through their shared x^{(i)}, so the optimized overall J is a compromise.
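To make the coupling concrete, here is a minimal NumPy sketch with made-up numbers (3 movies, 2 users, 2 features per movie): the overall cost is half the sum of per-user costs, but the gradient with respect to a shared movie vector \mathbf{x}^{(i)} sums contributions from every user who rated movie i.

```python
import numpy as np

# Toy data (all numbers hypothetical).
X = np.array([[0.9, 0.1],   # x^(0)
              [0.2, 0.8],   # x^(1)
              [0.5, 0.5]])  # x^(2)
W = np.array([[1.0, 0.0],   # w^(0)
              [0.0, 1.0]])  # w^(1)
b = np.array([0.1, -0.2])   # b^(j)
Y = np.array([[5.0, 1.0],
              [1.0, 5.0],
              [3.0, 0.0]])  # y^(i,j); an entry matters only if r(i,j)=1
R = np.array([[1, 1],
              [1, 1],
              [1, 0]])      # r(i,j): 1 if user j rated movie i

def per_user_cost(j):
    """J_j = sum over movies i with r(i,j)=1 of (w^(j).x^(i) + b^(j) - y^(i,j))^2."""
    err = X @ W[j] + b[j] - Y[:, j]
    return np.sum(R[:, j] * err**2)

# Overall cost is half the sum of the per-user costs.
J = 0.5 * sum(per_user_cost(j) for j in range(W.shape[0]))

# The gradient of J w.r.t. a shared movie vector x^(i) sums contributions
# from *every* user who rated movie i -- this is the coupling between users.
i = 0
grad_x_i = sum(R[i, j] * (X[i] @ W[j] + b[j] - Y[i, j]) * W[j]
               for j in range(W.shape[0]))
```

With x fixed, each J_j touches only (w^{(j)}, b^{(j)}); it is the gradient through the shared x^{(i)} that ties users together once x is also trained.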


This seems odd to me. Aren’t each user’s parameters (i.e., w and b) independent of the other users’?
If we minimize each user’s cost function independently, we should reach their lowest values, and their sum would then also be minimal.

If not, why not minimize each cost function separately?
This should give us better predictions for each individual user.

Thanks for your support.

Hello @ljb1706,

No, they are not. Instead, users whose rating sets share some movies depend on each other during the training process; that is what makes it “collaborative”.

In this “world”, each movie has only one x vector, so we won’t train a separate x for each user, which would probably be a necessary consequence of your approach of optimizing user-by-user.


Hi Raymond,
Yes, I understand that for collaborative filtering. But here I thought we were discussing the “per-item features” approach.
What do you think in that situation?

Laurent @ljb1706, can you define what you mean by “per-item features”? Does each user have their own separate set of movie vectors?

Since the OP hasn’t had a chance to elaborate on it, please give as full a description of the approach as you can, or, if you think the approach comes from the lecture, please share the video name and the time mark and I will watch it.

I would also like to know whether any information about the movies is given in such an approach.

Sure. Here is the link:

That video describes exactly the collaborative filtering algorithm I have been talking about. What is the major difference between your understanding of it and my answers?

Did the video say each user is trained individually?

Thanks for following up on this :wink:
In the case of “per-item features”, we do have features describing movies.

I see.

There are a few points I must make for clarifying the purpose of the video:

  1. The approach you are suggesting is NOT Collaborative Filtering.

  2. The video is there to ultimately introduce Collaborative Filtering.

  3. The video is not suggesting your approach. The evidence is that the lecture has never shown a cost function that includes only one user.

  4. Listing some features there is probably only an intermediate step for us to transition from some “human-understandable features” to the “not-understandable features” generated by the collaborative filtering algorithm.

Knowing that the intention of the video is not to suggest a completely different algorithm that trains each user individually on a set of given movie features, we can talk about it now.

Your suggested approach (because it is not the lecture’s suggested approach) is going to be problematic because, in reality, most users have rated very few movies out of the whole movie database. If a user has rated only 4 out of the 100,000 movies in the database, and each movie has 10 features, then whatever model is trained on such a dataset of just 4 samples is going to be very inaccurate.

If you remember Course 1, your approach would be fitting a model of 10 weights to just 4 samples, and that is very vulnerable to overfitting.
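As a sketch of that failure mode (all numbers hypothetical): fitting 10 weights to 4 ratings is an underdetermined least-squares problem, so the training error can be driven to essentially zero while predictions on unseen movies are unconstrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a user rated only 4 movies, each described by 10 features.
X_train = rng.normal(size=(4, 10))
y_train = rng.uniform(1, 5, size=4)   # the user's 4 ratings

# Least-squares fit of 10 weights to 4 samples: more unknowns than equations,
# so the residual on the training set is (essentially) zero.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_err = np.max(np.abs(X_train @ w - y_train))

# On unseen movies the same w gives essentially arbitrary predictions --
# the model memorized the 4 ratings instead of learning the user's taste.
X_new = rng.normal(size=(3, 10))
preds = X_new @ w
```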

What do you think?

I get it and fully agree.
I just wanted to confirm my mathematical understanding that, as long as we concern ourselves only with the first video and the described “per-item features” approach (though it is only meant to introduce collaborative filtering), considering users independently would lead to the same parameters.
Thanks Raymond.

No problem, Laurent @ljb1706.

However, I am not sure what you mean by “same parameters”. Same parameters as what?


I meant that, in that case, optimizing individual users’ cost functions would lead to the same w and b parameters as optimizing a single cost function summed over all users.
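A small numerical check of that claim, with made-up ratings and features: when x is fixed and given, the summed cost decouples across users, so per-user least-squares fits and a joint fit recover the same (w, b) for each user.

```python
import numpy as np

rng = np.random.default_rng(1)
n_m, n_f, n_u = 6, 2, 3                  # movies, features, users (hypothetical)
X = rng.normal(size=(n_m, n_f))          # given per-item features
Y = rng.uniform(1, 5, size=(n_m, n_u))   # every user rated every movie here

A = np.hstack([X, np.ones((n_m, 1))])    # design matrix with a bias column

# Per-user fits: each user j solves their own least-squares problem.
per_user = [np.linalg.lstsq(A, Y[:, j], rcond=None)[0] for j in range(n_u)]

# "Joint" fit: minimizing the sum of squared errors over all users at once.
# np.linalg.lstsq solves each right-hand-side column independently, which is
# exactly the decoupling: nothing ties one user's (w, b) to another's.
joint = np.linalg.lstsq(A, Y, rcond=None)[0]   # shape (n_f + 1, n_u)

same = all(np.allclose(per_user[j], joint[:, j]) for j in range(n_u))
```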

Thanks for clarifying, Laurent. :wink:

However, collaborative filtering and your approach will not train to the same parameter values for w and b. There is just no driving force for that to happen.

In collaborative filtering, users influence each other, whereas in your approach there is no such influence, not to mention that collaborative filtering won’t possibly arrive at the same x that is given in your approach.

We can keep talking about differences in the factors that affect the resulting w and b values, but there is no driving force that keeps them at the same pair of values. That’s the problem. I bet you can’t think of any single reason that they must be the same.

( :sweat_smile: Sorry for keep calling it “your” approach. I can see why you would call it the “per-item features” approach, but I am reluctant to follow because it seems to suggest that such an approach exists, and my point was precisely that no such approach exists.)