Collaborative filtering: R matrix calculation

Hi,

I am working on building a recommender system. I am stuck on how to build the R matrix and Y matrix, and I am not clear how the segregation of the vectors has been done in the practice lab.

Hello @akshay_singh4,

Can you be more specific? Can you share the source of (and even better the formula that defines) R-matrix and Y-matrix? You said “segregation of the vectors” - what vectors?

Please provide references to the terms so as to make sure we are on the same page from the beginning.

Raymond

Hi Raymond,

Thanks for the quick response. Please see the attached image. In this setup we have:

X - feature vector
W - parameter vector
R - 1 if the user has rated the movie, 0 if not
Y - movie ratings
y(i,j) = R(i,j) * ( w(j) · x(i) + b(j) )

When I looked at the dataset, separate csv files for both R and Y were already present. How can I construct these matrices from (large) datasets? Is there any efficient way to tackle this? If it can be achieved using a sparse matrix to save space, then please share the implementation steps.

Hi Raymond,

Any update?

Hello @akshay_singh4,

So it seems to me you know the definitions of R and Y well. What have you tried for constructing and storing a sparse matrix, or did you google for some methods to do so? Please share your research with us and we can see what to improve from there.

As you also mentioned, the assignment uses *.csv format for file storage, which does not seem to be what you are looking for.

Raymond

For example, if you just google “python construct sparse matrix”, you can end up here, which shows you 7 ways to construct a sparse matrix. I believe scipy also offers methods to store a sparse matrix; you only need to google for them.

You probably want to study the different sparse matrix formats, and based on that (or on some experiments) figure out what works best in your case.
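Just as an illustration (a minimal sketch of one possible route, not necessarily the best format for your case), with scipy you could build and save Y and R from (movie, user, rating) triples like this:

import numpy as np
from scipy import sparse

# Hypothetical (movie index, user index, rating) triples read from a csv.
movie_idx = np.array([0, 0, 2, 5])
user_idx = np.array([1, 3, 3, 0])
ratings = np.array([4.0, 5.0, 3.0, 2.5])

num_movies, num_users = 6, 4

# Y holds the ratings; COO format is convenient to build, CSR to slice rows.
Y = sparse.coo_matrix((ratings, (movie_idx, user_idx)),
                      shape=(num_movies, num_users)).tocsr()

# R is the indicator matrix: 1 where a rating exists, 0 otherwise.
R = Y.copy()
R.data = np.ones_like(R.data)

# A sparse matrix can be stored and reloaded without ever densifying it.
sparse.save_npz("Y_sparse.npz", Y)
Y_loaded = sparse.load_npz("Y_sparse.npz")
print(Y_loaded[0:2, :].toarray())   # densify only the slice you need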

It is important that you take the lead in development work like this and share your progress with us; we would be happy to give suggestions.

Raymond

Hi Raymond,

I tried adding mini-batches to the same code and I am getting the error below. The same code without mini-batches runs fine.

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>
     25         # Run one step of gradient descent by updating
     26         # the value of the variables to minimize the loss.
---> 27         optimizer.apply_gradients(zip(grads, [X_batch, W, b]))
     28
     29     # Log periodically.

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py in apply_gradients(self, grads_and_vars, name)
    424       ValueError: If none of the variables have gradients.
    425     """
--> 426     grads_and_vars = _filter_grads(grads_and_vars)
    427     var_list = [v for (_, v) in grads_and_vars]
    428

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py in _filter_grads(grads_and_vars)
   1041     logging.warning(
   1042         ("Gradients do not exist for variables %s when minimizing the loss."),
-> 1043         ([v.name for v in vars_with_empty_grads]))
   1044   return filtered
   1045

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py in <listcomp>(.0)
   1041     logging.warning(
   1042         ("Gradients do not exist for variables %s when minimizing the loss."),
-> 1043         ([v.name for v in vars_with_empty_grads]))
   1044   return filtered
   1045

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py in name(self)
   1102   def name(self):
   1103     raise AttributeError(
-> 1104         "Tensor.name is meaningless when eager execution is enabled.")
   1105
   1106   @property

AttributeError: Tensor.name is meaningless when eager execution is enabled.

Hello @akshay_singh4,

Please share the code in text form. Also, did you google the error message “Tensor.name is meaningless when eager execution is enabled.”, and what did you try based on the results? Perhaps you can add “apply_gradients” to the search keywords.

Raymond

Hi Raymond,

iterations = 200
lambda_ = 1
batch_size = 32
num_batches = int(np.ceil(X.shape[0] / batch_size))

for iter in range(iterations):
    for batch_num in range(num_batches):
        # Get the batch data
        start = batch_num * batch_size
        end = min(start + batch_size, X.shape[0])
        X_batch = X[start:end, :]
        Y_batch = Ynorm[start:end, :]
        R_batch = R[start:end, :]

        # Use TensorFlow’s GradientTape
        # to record the operations used to compute the cost
        with tf.GradientTape() as tape:
            # Compute the cost (forward pass included in cost)
            cost_value = cofi_cost_func_v(X_batch, W, b, Y_batch, R_batch, lambda_)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss
        grads = tape.gradient(cost_value, [X_batch, W, b])

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, [X_batch, W, b]))

    # Log periodically.
    if iter % 20 == 0:
        print(f"Training loss at iteration {iter}: {cost_value:0.1f}")

The same code works without mini-batches. I don’t know if I have done something wrong in the code.

I googled and tried disabling eager execution using
tf.compat.v1.disable_eager_execution()

and used the code below to get the cost:

sess = tf.compat.v1.InteractiveSession()
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 1.5)
J_val = J.eval()
print(f"Cost (with regularization): {J_val}")
sess.close()

but I am still getting the error. Sometimes it also leads to a memory issue and the kernel dies.

Hello @akshay_singh4,

Just to share my debugging steps :wink:

First, I make sure I can reproduce an important observation of yours: that it works without mini-batches. I do that by making these minimal changes:

image

Then, I start to look at the differences between each pair of them by printing them out, and I find that

  1. X is a tf.Variable that has a name, and
  2. X[start:end, :] is a tf.Tensor that doesn’t come with a name. For example, running X[start:end, :].name will give us the same error.

Finally I am kind of able to guess the meaning of the error message “Tensor.name is meaningless when eager execution is enabled.” It is saying: “Your operation needs a name, but I do not have one, because I am not necessarily supposed to have one.”

Therefore, apparently, apply_gradients is an operation that requires a name (the doc suggests it requires a Variable), so a quick fix is to convert X_batch to a tf.Variable, which also automatically assigns a name, by adding this line:

X_batch = tf.Variable(X_batch)
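For example, here is a tiny standalone snippet (not the assignment code, just my illustration) that reproduces the naming issue and the fix:

import tensorflow as tf

X = tf.Variable(tf.random.normal((10, 4)))
print(X.name)                    # a tf.Variable carries a name

X_slice = X[0:5, :]              # slicing a Variable returns a tf.Tensor
# print(X_slice.name)            # would raise: Tensor.name is meaningless when eager execution is enabled.

X_batch = tf.Variable(X_slice)   # wrapping the slice in a Variable gives it a name again
print(X_batch.name)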

That should do it, but I make no promise that it is the best way of fixing it :wink:

Cheers,
Raymond

Hi Raymond,

Thanks a lot, this is working fine, but my cost is not decreasing. Any suggestions? :roll_eyes:

Hello @akshay_singh4,

Let me check one thing with you first. I assumed you had passed the assignment, but have you?

Raymond

Hi Raymond,

Yes I did; I am trying different ways to implement it. My target, as I have already mentioned in my previous posts, is to try the algorithm with large data using TensorFlow. I constructed a sparse matrix, but unfortunately it did not work with TensorFlow, and the dense matrix leads to memory issues.

I am assuming you have gone through my previous posts.

Then I thought of tweaking the algorithm to use mini-batches on my data. Without mini-batches, the cost is decreasing and the output gets closer to the desired output.

But with mini-batches, the cost is not decreasing.

image

The motivation behind using mini-batches is to convert the sparse matrix into a dense matrix (batch by batch) during training and use it with TensorFlow.

Not sure if I am trying the right things. I am new to data science and these questions might sound silly. I hope you won’t mind :grinning:

Hello @akshay_singh4,

Thank you for sharing your findings. Your findings are one of my motivations to continue :grin: :handshake:.

From your loss records, I think the mini-batch version is doing just fine, because it reaches a level similar to the batch version. Indeed, the mini-batch version is better in terms of training-set cost. Did it give you the impression that the cost of the mini-batch version does not drop because it starts off much better?


@akshay_singh4, I will try my best to understand your objective, but I ask you to please bear with me, because I read many posts every day and sometimes my brain :brain: is not efficient at recalling details from a post I read some time ago. I know this is not ideal, but can I rely on you to provide timely hints about your goal if I miss something? :handshake: You make sure everything stays on the right track towards your goal and take the lead in your work; this means that sometimes you might need to repeat your idea when I am stuck. On my side, I provide suggestions. If you agree, then we can continue. :smiley:


I want to analyze the source of the difficult situation you are in right now, and see if we are on the same page:

  1. You have a large R matrix and a large Y matrix.
  2. Following the problem formulation of the assignment, you will have to build a large X and a large W, and the computer has to compute a very large matrix when multiplying the already very large W and X; and of course, R is already very large by itself (see the quick size illustration below).
  3. Your mini-batch approach slices out only the necessary subset of X and subset of W (and of course, subset of R and subset of Y), so that at each mini-batch the computer operates on much smaller matrices.
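For example, a quick back-of-the-envelope illustration with made-up sizes (these numbers are hypothetical, not from your dataset):

# Hypothetical sizes, only to illustrate the scale of the dense matrices.
num_movies, num_users = 60_000, 280_000

# Y, R, and the (num_movies x num_users) prediction matrix would each need:
dense_bytes = num_movies * num_users * 8       # float64 entries
print(f"{dense_bytes / 1e9:.0f} GB per dense matrix")   # ~134 GB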

Does my analysis make sense?

Raymond

Hi Raymond,

Thanks a lot for helping out and i understand your situation. I will make sure from next time i will add all the necessary details.

Mini-batch

  1. Yes, I understand this. It starts off with a lower cost in comparison with batch gradient descent, but if you look at the cost when I train for more iterations, it is oscillating, i.e. increasing and decreasing. Is this the right behavior? Should it not decrease after a certain number of iterations?
  2. My predictions are incorrect and not even close to the predictions without mini-batches. Ideally this should not happen?

Source Problem.

Yes, you have correctly understood my problem and we are on the same page. One more point to add:
due to memory issues, it is not possible to create the dense Y and R matrices either. I have to either limit my data or use a sparse matrix.

Yes, this is the right behavior. In mini-batch gradient descent, as we approach the optimum, the oscillation becomes apparent. The oscillation indicates that the step size is a bit too large, so the model jumps around the actual optimum. The MLS does not talk about how to deal with it, and I think the oscillation is not a big issue here either.

Ideally it should not. However, I want to look at something more quantitative. In order to compare the mini-batch and batch versions, I would like a more representative measurement. For example,

  1. We reserve a set of samples that is reasonably large (e.g. 300? 500?) and comes with the true ratings.
  2. We make predictions on those samples with the two models, and we get the predicted ratings.
  3. We define a metric, such as the mean squared error (MSE), to compare the true ratings and the predicted ratings.
  4. We end up with an MSE for the batch version and an MSE for the mini-batch version to compare (a small sketch follows below).

I hope we can conclude which version is better with a quantitative approach.
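As a sketch of steps 3 and 4 (the arrays here are placeholders I made up; in practice they come from your reserved set and the two trained models):

import numpy as np

# Placeholder arrays standing in for the reserved samples' true ratings and
# the predictions from the batch and mini-batch models.
y_true = np.array([4.0, 3.5, 5.0, 2.0])
y_pred_batch = np.array([3.8, 3.9, 4.6, 2.4])
y_pred_minibatch = np.array([4.1, 3.2, 4.8, 2.1])

def mse(y_true, y_pred):
    # Mean squared error between true and predicted ratings.
    return np.mean((y_true - y_pred) ** 2)

print("Batch MSE:     ", mse(y_true, y_pred_batch))
print("Mini-batch MSE:", mse(y_true, y_pred_minibatch))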

Raymond

Hello @akshay_singh4,

You may find this reply (or perhaps the subsequent discussion) relevant to your goal.

Raymond

Hi Raymond,

Yes, I am currently following this approach and am able to implement the recommender system. But I want to use all the data available.

As per my last post, I did the analysis of why my mini-batch version is not showing the expected results.

Code:

iterations = 200
lambda_ = 1
batch_size = X.shape[0]
num_batches = int(np.ceil(X.shape[0] / batch_size))

cost_value = 0

for iter in range(iterations):
    for batch_num in range(num_batches):
        # Get the batch data
        start = batch_num * batch_size
        end = min(start + batch_size, X.shape[0])
        X_batch = X[start:end, :]
        Y_batch = Ynorm[start:end, :]
        R_batch = R[start:end, :]
        X_batch = tf.Variable(X_batch)

        # Use TensorFlow’s GradientTape
        # to record the operations used to compute the cost
        with tf.GradientTape() as tape:
            # Compute the cost (forward pass included in cost)
            cost_value = cofi_cost_func_v(X_batch, W, b, Y_batch, R_batch, lambda_)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss
        grads = tape.gradient(cost_value, [X_batch, W, b])

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, [X_batch, W, b]))

    # Log periodically.
    if iter % 20 == 0:
        print(f"Training loss at iteration {iter}: {cost_value}")

Analysis:

I found the issue in the mini-batch algorithm. If you look at

grads = tape.gradient(cost_value, [X_batch, W, b])

the values of X are not updating, since we are using X_batch. But they should update, as X is also a parameter in our case, and for each batch the algorithm should update the X values.

I tried to resolve the issue by updating the X values using
X = tf.stack(X_batch[start:end, :])
but it is not working.

Can you suggest any approach to resolve this?

Thanks
Akshay

Hello Akshay,

So X_batch gets updated, but the update is not reflected in X.

If you want to assign the values back to X, check out this small example for some ideas:

import tensorflow as tf

x = tf.Variable(tf.zeros((5, 5), tf.float32))   # the full matrix
y = tf.Variable(tf.ones((3, 5), tf.float32))    # new values for three of its rows
x[1:4, :].assign(y)                             # assign y into rows 1 to 3 of x, in place
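Connecting it back to your loop, the pattern would be roughly like this (a sketch with made-up shapes, where assign_add just stands in for the optimizer update):

import tensorflow as tf

X = tf.Variable(tf.zeros((5, 5), tf.float32))
start, end = 1, 4

X_batch = tf.Variable(X[start:end, :])   # the per-batch copy that gets trained
X_batch.assign_add(tf.ones((3, 5)))      # stand-in for optimizer.apply_gradients
X[start:end, :].assign(X_batch)          # write the updated rows back into X

print(X.numpy())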

Raymond