Collaborative filtering: Unary purchasing data

I wonder whether I can use the number of times a customer bought an item. Would that still work for collaborative filtering? If we use 1 for purchased and 0 for not purchased, it is hard to tell what 0 means, because a user not buying an item doesn’t mean they dislike it.

Hi @peachans

Do you mean the time the user spends on the product page before deciding whether to purchase? If so, I think that data would often be misleading: a user may leave the product page open and walk away, so the clock keeps running without telling you anything, or a user may open a product after searching for it and spend a long time reading its specifications, and so on. So I think the dividing line should be whether the customer buys or not.


If you mean the number of times the customer buys an item, I think it can be useful for data analysis that supports a decision. For example, if a customer buys many products, you could decide to give that customer an offer. For a recommendation system, though, it may or may not be useful, because you also have to consider customers who haven’t bought many products and customers who are new to your website, and decide whether they will buy a specific product. There we don’t consider whether a specific user buys many products. We consider the product: how many users bought it and what the comments on it say, in order to decide whether to rank it first in the search results or recommend it to new customers. In short, we look at the product, not the user: has it been purchased by many customers, and did they like it?



Hi @peachans, here are my 2 cents.

First, to answer your question: yes, you can do it. You can train a model with purchase counts, but I can’t guarantee that it will perform better or worse. It will be your job to try both and compare them.

You have already told us the advantage of using a numeric label instead of a binary label, so what we can do here is discuss the label itself. A label can have 3 meanings:

  1. 1 = purchased, 0 = not purchased
  2. 1 = purchased once or more times, 0 = otherwise
  3. 1 = very likely to purchase, 0 = very unlikely to purchase

(1) and (2) look equivalent, but I think (2) is a mind-changing approach, because it brings in a “threshold”, which here is “at least one”. The good thing about a threshold is that you can change it to anything greater than 1.

For example, if you want to stay binary, you can define 1 as “at least twice” and 0 as “otherwise”. If you want a numeric label, you can use the number of purchases directly, but clip anything larger than 10 down to 10. You might want to clip when you think purchasing 50 times should be treated no differently from purchasing 10 times, because 10 is already very exceptional.
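To make the two options concrete, here is a minimal sketch in plain Python. The `counts` matrix and the helper names `binary_labels` and `clipped_labels` are made up for illustration; they are not from the course material, and the threshold of 2 and cap of 10 are just the example values mentioned above.

```python
# Hypothetical purchase counts: rows are users, columns are items.
counts = [
    [0, 1, 50],
    [2, 0, 3],
    [0, 12, 1],
]

def binary_labels(counts, threshold=1):
    """Label 1 if the item was bought at least `threshold` times, else 0."""
    return [[1 if c >= threshold else 0 for c in row] for row in counts]

def clipped_labels(counts, cap=10):
    """Keep the raw purchase count, but clip anything above `cap` down to `cap`."""
    return [[min(c, cap) for c in row] for row in counts]

at_least_once = binary_labels(counts, threshold=1)   # meaning (2) in the list above
at_least_twice = binary_labels(counts, threshold=2)  # stricter binary label
clipped = clipped_labels(counts, cap=10)             # numeric label, capped at 10

print(at_least_once)
print(at_least_twice)
print(clipped)
```

Whichever label you pick, you would then train the same collaborative filtering model on it and compare the resulting metric, as suggested above.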

I hope you see that you always have the freedom to choose the label, and what ultimately matters is getting a good performance metric. No one can say you must do it this way or that way, but in the course we need to give you something as a starter, right? A binary purchase / not-purchase label is a very good choice because it is simple!

