I’m interested in models that could be used to estimate the likelihood of a list of customers shopping on my website at a given time of day. The customers are represented by customer IDs (e.g., 0, 1, …).
I’d like to build a model that can answer the question: At 1pm, how likely is it that customers [10, 202, 123489] are all shopping? Note that there is a very large number of customer IDs (let’s say about 1 million).
I have data like the following (day, time, list of customers shopping). One example might be (‘05/10/20’, ‘13:00’, [10, 202, 123489]).
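As a rough starting point, records in that shape could be turned into a per-customer table of hourly shopping frequencies. This is only a sketch: the function name `hourly_rates` is made up for illustration, times are assumed to be 'HH:MM' strings, and the number of observed days is assumed to be the number of distinct days that appear in the data.

```python
from collections import defaultdict
from datetime import datetime

def hourly_rates(records):
    """Per customer, the fraction of observed days on which they shopped in each hour."""
    days = set()
    # seen[customer][hour] = set of days on which that customer shopped in that hour
    seen = defaultdict(lambda: defaultdict(set))
    for day, time, customers in records:
        hour = datetime.strptime(time, "%H:%M").hour  # assumes 'HH:MM' strings
        days.add(day)
        for customer in customers:
            seen[customer][hour].add(day)
    n_days = len(days)  # assumption: every observed day appears at least once
    return {
        customer: [len(by_hour.get(h, ())) / n_days for h in range(24)]
        for customer, by_hour in seen.items()
    }

# Toy records in the (day, time, customers) shape described above.
records = [
    ("05/10/20", "13:00", [10, 202, 123489]),
    ("06/10/20", "13:00", [10]),
    ("06/10/20", "14:00", [202]),
]
rates = hourly_rates(records)
print(rates[10][13], rates[202][13])
```

With about 1 million customers, a dense 24-column table like this is still manageable, and it can double as the training target for a feature-based model.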
My idea is a model that predicts, for a user with features X, the likelihood Y of that user shopping in each hour of the day. Y would be a vector of 24 numbers (one per hour).
I would then make a prediction for each user and invert the results to find which users are likely to be shopping in each hour.
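The step of converting per-user predictions back into per-hour lists might look like the sketch below. The name `shoppers_by_hour` and the `threshold` cutoff are assumptions for illustration, and the predictions are assumed to be a dict mapping each customer ID to its 24 hourly probabilities.

```python
def shoppers_by_hour(predictions, threshold=0.5):
    """Invert {customer_id: [24 probabilities]} into {hour: [likely customer_ids]}."""
    by_hour = {h: [] for h in range(24)}
    for customer, probs in predictions.items():
        for hour, p in enumerate(probs):
            if p >= threshold:  # hypothetical cutoff for "likely to be shopping"
                by_hour[hour].append(customer)
    return by_hour

# Toy predictions: two customers, mostly unlikely to shop except in a few hours.
p10 = [0.1] * 24
p10[13] = 0.9
p202 = [0.1] * 24
p202[13] = 0.6
p202[20] = 0.7

likely = shoppers_by_hour({10: p10, 202: p202})
print(likely[13], likely[20])
```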
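My model would predict P(Y_i | X(U_i)) for every user i, where Y_i is the event that user i is shopping at the given hour. Assuming that Y_i | X(U_i) and Y_j | X(U_j) are independent, I can multiply their probabilities to get the joint probability: P(Y | X(U_i), X(U_j)) = P(Y_i | X(U_i)) · P(Y_j | X(U_j)), where Y is Y_i ∩ Y_j.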
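Under that independence assumption, the probability that a whole list of customers is shopping at one hour is just the product of their individual probabilities. A minimal sketch follows, with a hypothetical `joint_probability` helper and the same assumed dict of 24-value predictions per customer. It sums logs instead of multiplying directly, which avoids underflow when the list is long.

```python
import math

def joint_probability(predictions, customers, hour):
    """P(all listed customers are shopping at `hour`), assuming independence."""
    log_p = 0.0
    for customer in customers:
        p = predictions[customer][hour]
        if p == 0.0:
            return 0.0  # one impossible event makes the whole intersection impossible
        log_p += math.log(p)
    return math.exp(log_p)

# Toy predictions for 1pm (hour 13); all other hours are left at 0.0.
predictions = {c: [0.0] * 24 for c in (10, 202, 123489)}
predictions[10][13] = 0.5
predictions[202][13] = 0.4
predictions[123489][13] = 0.1

p_all = joint_probability(predictions, [10, 202, 123489], hour=13)
print(p_all)  # approximately 0.5 * 0.4 * 0.1
```

If customers tend to shop together (for example, members of the same household), the independence assumption will misestimate the joint probability, so this product is best treated as a baseline.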