What do offline experiments and p(y(i, j)) = 1 mean in the lecture "Recommending from a large catalogue"?

I’d like someone to help me with more examples or intuition on offline experiments and p(y(i, j)) = 1. Thank you!!



If I understand the point correctly, offline and online refer to whether the recommender model is out of production (offline) or in production (online). Adding more items to your production model could result in slower recommendations, because a bigger matrix takes more computing time to produce a list of recommended items. So before doing that, you can add the items to a model that is not in production and check whether the recommendations from that “augmented model” are slower but more relevant. A recommendation counts as relevant if the user ends up picking it.

In summary, you need a metric to tell whether the recommendations from one model are better than those from another. If the improvement is big, the slower system may be a price worth paying. If the improvement is small, it probably does not justify updating the model in production.
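
To make the idea of "a metric to compare two models" concrete, here is a minimal sketch of one possible offline metric, hit rate at k, computed on held-out picks. The metric choice, the function name `hit_rate_at_k`, and all user and movie IDs are illustrative assumptions, not something taken from the lecture.

```python
from typing import Dict, List, Set


def hit_rate_at_k(recs: Dict[str, List[str]],
                  held_out: Dict[str, Set[str]],
                  k: int) -> float:
    """Fraction of users with at least one held-out pick in their top-k list."""
    users = [u for u in held_out if u in recs]
    if not users:
        return 0.0
    hits = sum(
        1 for u in users
        if any(item in held_out[u] for item in recs[u][:k])
    )
    return hits / len(users)


# Hypothetical held-out data: items each user actually picked later.
held_out = {"u1": {"m3"}, "u2": {"m7"}, "u3": {"m2"}, "u4": {"m9"}}

# Hypothetical outputs of the current model and of an "augmented" model.
recs_current = {"u1": ["m1", "m3"], "u2": ["m4", "m5"],
                "u3": ["m2", "m6"], "u4": ["m1", "m8"]}
recs_augmented = {"u1": ["m3", "m1"], "u2": ["m7", "m4"],
                  "u3": ["m2", "m6"], "u4": ["m1", "m8"]}

score_current = hit_rate_at_k(recs_current, held_out, k=2)
score_augmented = hit_rate_at_k(recs_augmented, held_out, k=2)
print(score_current, score_augmented)
```

With this toy data the augmented model scores higher. In a real comparison you would also measure how much slower it is before deciding whether the trade-off is worth it.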

I hope this long explanation helps you understand that point :)



Hello Son, how are you doing?

Let me also attempt this question.

First let us remind ourselves that

  1. we train a user network and a movie network so that the dot product of the user j vector and the movie i vector approximates the rating y^{(i,j)}, which is the training objective of the networks.

  2. we use (1) in any imaginable way to nominate good candidates (retrieval step), but candidates are not recommendations. We still need a ranking mechanism to shortlist the final recommendations (ranking step).
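
The two steps above can be sketched roughly as follows. This is only an illustration under assumptions: the vectors are made-up 2-dimensional numbers rather than outputs of trained networks, retrieval is simplified to "movies closest to the last watched movie", and the names `retrieve` and `rank` are hypothetical.

```python
from typing import Dict, List

# Hypothetical movie vectors; in practice these come from the movie network.
MOVIE_VECS: Dict[str, List[float]] = {
    "m1": [1.0, 0.0],
    "m2": [0.9, 0.1],
    "m3": [0.0, 1.0],
    "m4": [0.5, 0.5],
}
# Hypothetical user vector; in practice this comes from the user network.
USER_VEC: List[float] = [0.2, 0.8]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve(last_watched: str, n: int) -> List[str]:
    """Retrieval step: the n movies closest to the last watched movie."""
    others = [m for m in MOVIE_VECS if m != last_watched]
    others.sort(key=lambda m: sq_dist(MOVIE_VECS[m], MOVIE_VECS[last_watched]))
    return others[:n]


def rank(user_vec: List[float], candidates: List[str]) -> List[str]:
    """Ranking step: order candidates by predicted rating (dot product)."""
    return sorted(candidates,
                  key=lambda m: dot(user_vec, MOVIE_VECS[m]),
                  reverse=True)


candidates = retrieve("m1", n=2)
ranked = rank(USER_VEC, candidates)
print(candidates, ranked)
```

Note that the order coming out of retrieval and the order after ranking differ here, which is the point of keeping the two steps separate. The argument `n` plays the role of "the number of candidates".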

Now here are the points:

  1. the ranking mechanism itself is also a trainable mechanism

  2. in the video, Andrew suggested a way to rank by making use of (1). There are 2 things we need to remind ourselves of again: (A) we can suggest a different way to rank, and (B) even if we adopt Andrew’s way, there are “hyperparameters” to tune, including but not limited to “the number of candidates”. This is like modeling data: we can choose between a first-degree and a second-degree polynomial model, and we have hyperparameters such as the regularization factor to tune.

  3. for the case of modeling our data, we can evaluate our choice of model (first or second degree polynomial) and our choice of regularization factor. We make different choices, evaluate them, and come up with the best choice.

  4. however, for the case of ranking, how are we going to evaluate our choice of ranking method (Andrew’s way or other ways) and our choice of hyperparameters like the number of candidates?

  5. perhaps we can immediately realize the difficulty here. Unlike training a supervised model where we have labeled data for evaluation, we don’t always have “labeled” or “good” or “expected” ranking to evaluate our ranking mechanism, do we? Can you imagine how we can collect those “labels” or “standard answers”?

  6. That’s why we need to experiment with our ranking mechanism: it generates recommendations, and then we evaluate those recommendations. Andrew didn’t discuss how to design the experiments, so it is up to us to research and brainstorm.

  7. Lastly, although I have focused only on experimenting with the ranking mechanism, we would also need to justify our retrieval mechanism. Our validation process for the user and movie networks is only good for the networks themselves, and says nothing about how good the retrieval and ranking mechanisms are.

Now to answer your question:

p(y(i,j)) =1: First, let me rewrite it as p( y(i,j)=1 ). Second, y(i,j) = 1 means user j rates movie i as 1 (highest score). Third, p( ... ) should mean the predicted probability of the rating being the highest. Fourth, we care about p( y(i,j)=1 ) because movies with a high p value are the candidates for ranking.

offline experiment: as explained above, it is a process for us to evaluate the ranking mechanism, and the number of candidates can be seen as a hyperparameter to tune. The experiment is run offline so that it does not interfere with the online service that is visible to users.
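
As a rough illustration of treating the number of candidates as a hyperparameter, the sketch below checks offline how the average predicted p( y(i,j)=1 ) of the displayed items changes as more candidates are retrieved. Everything here is assumed for the example: the raw scores are made up, a sigmoid is used to turn a score into a probability, and `mean_displayed_probability` is a hypothetical helper.

```python
import math
from typing import Dict, List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mean_displayed_probability(logits: List[float],
                               n_candidates: int,
                               k: int) -> float:
    """Retrieve the first n_candidates items, display the k with the highest
    predicted probability, and return the mean probability of those k."""
    probs = [sigmoid(z) for z in logits[:n_candidates]]
    displayed = sorted(probs, reverse=True)[:k]
    return sum(displayed) / len(displayed)


# Hypothetical raw model scores for one user, listed in retrieval order.
LOGITS = [0.0, 2.0, -1.0, 1.0, 3.0, 0.5]

# Try several values of the "number of candidates" hyperparameter.
results: Dict[int, float] = {
    n: mean_displayed_probability(LOGITS, n, k=2) for n in (2, 4, 6)
}
for n, p in results.items():
    print(f"candidates={n}: mean p(y=1) of displayed items = {p:.3f}")
```

With a fixed k, this average can only stay the same or go up as more candidates are retrieved, so the real question is whether the gain is large enough to justify the extra computing time. These are also the model's own predictions, not observed user behavior, so they are a proxy at best.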

experiment: let me throw out a very basic way to experiment: assemble 2 groups of 100 people each, use ranking mechanism A to generate a list of recommendations for each person in group A, and use ranking mechanism B to generate recommendations for each person in group B. Ask each person to rate the recommendations with a well-designed evaluation form. Then analyze the collected forms to decide whether mechanism A or B is better. This is a pretty time-consuming and non-scalable way of experimenting.
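
One possible way to analyze the collected forms, sketched under assumptions: each form is reduced to a single 1 to 5 score, the scores below are invented (and far fewer than 100 per group), and Welch's t statistic is just one reasonable choice for comparing two group means.

```python
import math
from statistics import mean, variance
from typing import List


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for the difference between two group means."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical form scores (1 = poor, 5 = excellent) for each group.
ratings_a = [4, 5, 3, 4, 4]   # people who saw mechanism A's recommendations
ratings_b = [3, 2, 3, 4, 3]   # people who saw mechanism B's recommendations

mean_a = mean(ratings_a)
mean_b = mean(ratings_b)
t_stat = welch_t(ratings_a, ratings_b)
print(f"mean A = {mean_a}, mean B = {mean_b}, Welch t = {t_stat:.3f}")
```

A larger absolute t value means the gap between the two means is large relative to the noise in the ratings. Turning it into a p-value needs the t distribution, for which a statistics library would normally be used.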



What are hyperparameters and tuning?