Collaborative filtering can be seen as a linear regression model on steroids.
Since we don't have X, are we technically doing two linear regressions at the same time? One where we find the values of X (movie features), and another where we find the values of Wj (weights for each user), with both Wj and Xi chosen to minimize J(w,x,b)?

So it’s like applying one linear regression, then the next one for the other parameter, in a repeating cycle to minimize J(w,x,b)?

Yes. Collaborative filtering can be viewed as linear regression with extra complexity. Instead of relying on explicit input features, it learns both user preferences (Wj) and item features (Xi) concurrently, iteratively minimizing the loss function. This iterative process can be thought of as two simultaneous linear regressions whose parameters influence each other, both working to minimize J(w,x,b).
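For reference, J(w,x,b) is not written out in this thread. A common form, assuming a squared-error cost with L2 regularization, is sketched below. The symbols r(i,j) (1 if user j rated movie i), y^{(i,j)} (that rating) and λ (regularization strength) are assumptions added here for illustration:

```latex
J(w, x, b) = \frac{1}{2} \sum_{(i,j)\,:\,r(i,j)=1}
    \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2
  + \frac{\lambda}{2} \sum_{j} \lVert w^{(j)} \rVert^2
  + \frac{\lambda}{2} \sum_{i} \lVert x^{(i)} \rVert^2
```

With x held fixed, this is an ordinary regularized linear regression in each w^{(j)}, b^{(j)}; with w and b held fixed, it is a regularized linear regression in each x^{(i)}. That is where the "two regressions" picture comes from.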

@jjdniz Important to note: while the CF problem can be framed as a linear regression problem, in practice you will find other models/methods (e.g. SVD, singular value decomposition) to be much faster and to produce better overall results.
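To give a rough idea of what an SVD-based approach looks like, here is a minimal sketch using NumPy. Plain SVD needs a complete matrix, so this example fills the missing ratings with each user's mean first. That fill-in step is a crude assumption made for illustration, and the toy ratings and the choice of `k = 2` are made up:

```python
import numpy as np

# Toy ratings: rows are movies, columns are users; np.nan marks a missing rating.
Y = np.array([
    [5.0, 5.0, 0.0, 0.0],
    [5.0, np.nan, np.nan, 0.0],
    [np.nan, 4.0, 0.0, np.nan],
    [0.0, 0.0, 5.0, 4.0],
    [0.0, 0.0, 5.0, np.nan],
])

# Assumption: fill each hole with that user's mean rating so SVD can run.
col_means = np.nanmean(Y, axis=0)
filled = np.where(np.isnan(Y), col_means, Y)

# Keep only the k largest singular values to get a low-rank approximation.
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
approx = (U[:, :k] * s[:k]) @ Vt[:k]

# The entries of `approx` at the originally missing positions act as predictions.
print(np.round(approx, 2))
```

Practical recommenders usually rely on variants that fit only the observed entries rather than a mean-filled matrix, so treat this as a picture of the idea and not a recipe.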

Second, keep in mind that, in contrast to a traditional regression problem, your data matrix here is sparse: it has a lot of ‘holes’ with no data, like Swiss cheese, and those holes are what you have to fill in.
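One way to handle the holes during training is to skip them: the error is computed only over ratings that actually exist, and the model's predictions fill the holes afterwards. A small sketch, assuming missing ratings are stored as `np.nan` (the mask name `R` and the helper `masked_squared_error` are hypothetical names for this example):

```python
import numpy as np

# Toy ratings: rows are movies, columns are users; np.nan marks a missing rating.
Y = np.array([
    [5.0, np.nan, 1.0],
    [4.0, 5.0, np.nan],
    [np.nan, 1.0, 5.0],
])

# R[i, j] is True where user j actually rated movie i.
R = ~np.isnan(Y)

def masked_squared_error(pred, Y, R):
    """Half the sum of squared errors, counted over observed ratings only."""
    diff = np.where(R, pred - np.nan_to_num(Y), 0.0)
    return 0.5 * np.sum(diff ** 2)

# A constant guess of 3 stars for every (movie, user) pair.
pred = np.full(Y.shape, 3.0)
print(R.sum(), masked_squared_error(pred, Y, R))
```

The missing entries contribute nothing to the error, so the model is never penalized for what it predicts in a hole.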

Collaborative filtering is like “linear regression on steroids” because it does a lot of the same things but is more complicated.

Things that are like linear regression:

Predicting an outcome: Both methods try to predict a value (like a movie rating) from the information that is given.
Minimising a cost: The goal of both is to find the best model parameters by minimising the cost function.

Important Differences:

Missing Feature Matrix (X): In normal linear regression, there is a feature matrix (X) that holds details about each item. In collaborative filtering this feature matrix is missing or incomplete, so we need to use the data we already have (the ratings) to infer these hidden features.

Simultaneous optimization: As you said, we basically do two regressions at the same time:
Finding Movie Features (X): Based on user ratings, we try to find hidden features that best describe the movies.
Finding User Preferences (W): We estimate how much each user cares about these hidden features, based on the ratings they gave.

Iterative method: Both X and W are optimized using an iterative method. We keep updating our estimates of X and W to make the total error J(w, x, b) as small as possible.
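The iterative update above can be sketched as follows, assuming plain batch gradient descent on a squared-error cost with L2 regularization. The toy ratings, the names `cost` and `fit`, and the hyperparameters (`lam`, `lr`, `steps`) are illustrative assumptions, not something given in this thread:

```python
import numpy as np

def cost(X, W, b, Y, R, lam):
    """Squared error over observed ratings plus L2 penalty on X and W."""
    err = (X @ W.T + b - Y) * R            # R zeroes out unobserved entries
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(X ** 2) + np.sum(W ** 2))

def fit(Y, R, n_features=2, lam=0.1, lr=0.01, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    n_movies, n_users = Y.shape
    X = rng.normal(scale=0.1, size=(n_movies, n_features))  # movie features
    W = rng.normal(scale=0.1, size=(n_users, n_features))   # user weights
    b = np.zeros((1, n_users))                               # per-user bias
    history = []
    for _ in range(steps):
        err = (X @ W.T + b - Y) * R
        # Gradients of the cost with respect to each parameter group.
        grad_X = err @ W + lam * X
        grad_W = err.T @ X + lam * W
        grad_b = err.sum(axis=0, keepdims=True)
        # X, W and b are all updated in the same step.
        X -= lr * grad_X
        W -= lr * grad_W
        b -= lr * grad_b
        history.append(cost(X, W, b, Y, R, lam))
    return X, W, b, history

# Rows are movies, columns are users; 0 in Y where R is 0 is just a placeholder.
Y = np.array([[5, 5, 0, 0], [5, 0, 0, 0], [0, 4, 0, 0],
              [0, 0, 5, 4], [0, 0, 5, 0]], dtype=float)
R = np.array([[1, 1, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0],
              [1, 1, 1, 1], [1, 1, 1, 0]], dtype=float)

X, W, b, history = fit(Y, R)
print(history[0], history[-1])
print(np.round(X @ W.T + b, 1))   # predictions, including the unrated entries
```

Note that the gradient for X depends on the current W and vice versa, which is the "parameters influence each other" part of the explanation.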

So, collaborative filtering takes the main idea of linear regression (predict a value and keep the errors to a minimum) and applies it to situations where the input features are not given. This makes it a more powerful, and more complicated, method.

I have an issue with this method.
We are implementing regression to estimate wx + b = y, i.e. the relation between x and y.
But we don't know what x is. To estimate x we use w, but to know w we need an estimate of x.
So if we initialize both randomly, aren't w and x optimizing against random values? It's minimizing the error, but x cannot be random.

In collaborative filtering, we’re dealing with a chicken-and-egg problem: we’re trying to estimate both the user preferences (W_j) and the item features (X_i) simultaneously, without explicit knowledge of either.

Random initialization of both W_j and X_i can indeed lead to suboptimal solutions or even convergence issues, as you pointed out. To address this, initialization strategies, regularization techniques, and optimization algorithms are employed to guide the learning process, so that the learned values are driven by the observed ratings rather than by the random starting point.

Please do elaborate.
I think it’s random because, say, we start with random X and W.
Each Wj iteration tries to optimize towards Xj·w = y, but the Xj used is random, so Wj doesn’t go in a good direction.
Now we update Xj using Wj, which, as mentioned before, is not going in a good direction.
What’s the point of optimizing when we don’t even have the independent variable? What are we even building towards? We cannot reuse these parameters for prediction at all.

I just need an explanation which relates to the link I provided.

Initial weights here are always randomized. This is very similar to how the weights in a neural network must be initialized.

The key difference in this model vs. a simple linear regression is that the output isn’t a single value. It’s a matrix that contains a set of ratings (the labels) for each movie for each user.

This creates a solution space that allows for learning both the features X and the weights W to minimize the cost for this output matrix (which contains all of the user ratings for all of the movies).
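To make the "random start" concern concrete, here is a sketch that literally alternates two regressions. It uses alternating ridge regression (a swapped-in technique chosen because each half-step has a closed form, not the method from the lectures), drops the bias term for brevity, and uses made-up toy ratings. The names `ridge`, `cost` and `als` are hypothetical:

```python
import numpy as np

def ridge(A, y, lam):
    """Closed-form ridge regression: argmin ||A t - y||^2 + lam * ||t||^2."""
    k = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ y)

def cost(X, W, Y, R, lam):
    err = np.where(R, X @ W.T - Y, 0.0)   # only observed ratings count
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(X ** 2) + np.sum(W ** 2))

def als(Y, R, k=2, lam=0.1, sweeps=30, seed=0):
    rng = np.random.default_rng(seed)
    n_movies, n_users = Y.shape
    X = rng.normal(size=(n_movies, k))    # random movie features
    W = rng.normal(size=(n_users, k))     # random user weights
    history = [cost(X, W, Y, R, lam)]
    for _ in range(sweeps):
        # Regression 1: X held fixed, fit each user's w_j on the movies they rated.
        for j in range(n_users):
            rated = R[:, j]
            W[j] = ridge(X[rated], Y[rated, j], lam)
        # Regression 2: W held fixed, fit each movie's x_i on the users who rated it.
        for i in range(n_movies):
            raters = R[i, :]
            X[i] = ridge(W[raters], Y[i, raters], lam)
        history.append(cost(X, W, Y, R, lam))
    return X, W, history

Y = np.array([[5, 5, 0, 0], [5, 0, 0, 0], [0, 4, 0, 0],
              [0, 0, 5, 4], [0, 0, 5, 0]], dtype=float)
R = np.array([[1, 1, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0],
              [1, 1, 1, 1], [1, 1, 1, 0]], dtype=bool)

X0, W0, h0 = als(Y, R, seed=0)
X1, W1, h1 = als(Y, R, seed=1)
print(h0[0], h0[-1])
print(h1[0], h1[-1])
```

Even though X starts random, every half-step is measured against the real ratings Y, so the cost cannot go up from one sweep to the next. The ratings are the fixed target that both X and W are pulled towards. The learned features themselves need not match between seeds, since only the products X·W are tied to the data.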