Models for a large number of classes

I’m interested in models that could be used to estimate the likelihood of a list of customers shopping on my website at a given time of day. The customers are represented by customer IDs (e.g. 0, 1, …).

I’d like to build a model that can answer the question: at 1pm, how likely is it that customers [10, 202, 123489] are shopping? Note that there is a very large number of customer IDs (let’s say about 1 million).

I have data like the following (day, time, list of customers shopping). One example might be (‘05/10/20’, ‘13:00’, [10, 202, 123489]).

What type of models could be applicable to this?

Thanks.

Hello @GoodVision,

I would have a model that predicts “for a user with these features X, the likelihood Y of the user shopping in each hour of the day”. Y will be a vector of 24 numbers (for 24 hours).

I would then make a prediction for each user and invert the results to find which users are likely to be shopping in each hour.
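
As a rough sketch of that inversion step, assuming the model's output is already available as a mapping from customer ID to 24 hourly probabilities. The `hourly_probs` values, the `likely_users_by_hour` helper and the 0.5 threshold are all made up for illustration, not taken from a real model:

```python
# Hypothetical model output: customer ID -> 24 probabilities, one per hour (0-23).
# In practice these would come from the trained model's predictions.
hourly_probs = {
    10: [0.05] * 12 + [0.40, 0.70, 0.60] + [0.10] * 9,
    202: [0.02] * 12 + [0.30, 0.55, 0.20] + [0.05] * 9,
    123489: [0.01] * 12 + [0.10, 0.15, 0.65] + [0.03] * 9,
}


def likely_users_by_hour(hourly_probs, threshold=0.5):
    """Map each hour to the sorted customer IDs whose probability is >= threshold."""
    by_hour = {hour: [] for hour in range(24)}
    for user_id, probs in hourly_probs.items():
        for hour, p in enumerate(probs):
            if p >= threshold:
                by_hour[hour].append(user_id)
    return {hour: sorted(users) for hour, users in by_hour.items()}


by_hour = likely_users_by_hour(hourly_probs)
print(by_hour[13])  # customers likely to be shopping at 1pm
```

The threshold is a free choice here; ranking users by probability within each hour would work just as well.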

Cheers,
Raymond

Ok. That’s like computing P(Y | X, U_i) for different users U_i, correct?
But what if I want to know P(Y | X, U_i, U_j, U_k)?

Hello @GoodVision,

Let’s write X(U_i) since X is a function of U_i.

My model would predict P(Y_i | X(U_i)) for all i. Assuming Y_i | X(U_i) and Y_j | X(U_j) are independent, we can multiply their probabilities to get P(Y | X(U_i), X(U_j)), where Y is Y_i ∩ Y_j.
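
A minimal sketch of that product, again with made-up per-user probabilities standing in for the model's predictions (the `hourly_probs` mapping and the `joint_probability` helper are hypothetical):

```python
import math

# Hypothetical per-user predictions: customer ID -> 24 hourly probabilities.
hourly_probs = {
    10: [0.05] * 12 + [0.40, 0.70, 0.60] + [0.10] * 9,
    202: [0.02] * 12 + [0.30, 0.55, 0.20] + [0.05] * 9,
    123489: [0.01] * 12 + [0.10, 0.15, 0.65] + [0.03] * 9,
}


def joint_probability(hourly_probs, user_ids, hour):
    """Probability that all listed users shop in the given hour,
    assuming the users' behaviours are independent of each other."""
    return math.prod(hourly_probs[user_id][hour] for user_id in user_ids)


# "At 1pm, how likely is it that customers [10, 202, 123489] are all shopping?"
p_all = joint_probability(hourly_probs, [10, 202, 123489], hour=13)
print(p_all)  # 0.70 * 0.55 * 0.15
```

This only holds as far as the independence assumption does. If some customers tend to shop together, the product will misestimate the joint probability.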

Cheers,
Raymond