You are helping a grocery store predict its revenue, and have data on its items sold per week, and price per item. What could be a useful engineered feature?
(A) For each product, calculate the number of items sold divided by the price per item.
(B) For each product, calculate the number of items sold times price per item.
I wrongly selected option A, and the feedback was: “Dividing the number of items by the price per item is a less useful engineered feature compared to the other option.”
I selected A because I thought that combining two features that may vary inversely with each other would “add more information”, rather than producing a feature that may stay roughly constant.
Why would the other option be more meaningful?
Welcome to the Community!
For the second option (B), when we multiply the number of items sold by the price per item, we get the total amount of money (revenue) that the seller gets from that specific item. That helps us solve the main problem, which is predicting the grocery store's revenue.
But if we choose the first option (A) and divide the number of items sold by the price per item, what do we gain from it? In other words, what would we even name that column to describe what it represents? So we choose option (B).
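To make the comparison concrete, here is a minimal sketch with made-up numbers. The product names, quantities, prices, and variable names are all assumptions for illustration, not data from the quiz.

```python
# Hypothetical weekly data for three products (made-up numbers).
items_sold = {"apples": 120, "bread": 80, "milk": 200}
price_per_item = {"apples": 0.5, "bread": 2.5, "milk": 1.25}

# Option B: items sold times price per item gives revenue per product.
revenue_per_product = {
    name: items_sold[name] * price_per_item[name] for name in items_sold
}

# Option A: items sold divided by price per item.
# The result has no obvious business meaning or natural column name.
ratio_per_product = {
    name: items_sold[name] / price_per_item[name] for name in items_sold
}

# Summing option B over all products gives the store's weekly revenue,
# which is exactly the quantity we want to predict.
total_revenue = sum(revenue_per_product.values())

print(revenue_per_product)
print(ratio_per_product)
print(total_revenue)
```

The option B column adds up to the target directly, while the option A column does not correspond to any quantity the store tracks.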
I lost sight of the target! Thank you so much for your response! <3
In practice, neither is very useful if the “price per item” is a constant. Both simply rescale the value, which doesn’t add information.
The question only makes sense if the ‘y’ values in the training set are the actual revenue, so that we could create a model that predicts the revenue.
But the scope of the question is feature engineering, not “how to create a useful model”.
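The point about a constant price can be checked with a small sketch. The weekly counts and the price below are made-up values, and the variable names are hypothetical.

```python
import math

# Hypothetical items sold per week for one product, with a fixed price.
weekly_items_sold = [120, 95, 143, 110]
constant_price = 2.5

# Option B and option A computed with the same constant price every week.
feature_b = [n * constant_price for n in weekly_items_sold]
feature_a = [n / constant_price for n in weekly_items_sold]

# Each feature is the original count multiplied by one fixed factor,
# so it carries the same information as the count itself.
b_is_rescaled = all(
    math.isclose(b / n, constant_price)
    for b, n in zip(feature_b, weekly_items_sold)
)
a_is_rescaled = all(
    math.isclose(a / n, 1 / constant_price)
    for a, n in zip(feature_a, weekly_items_sold)
)

print(b_is_rescaled, a_is_rescaled)
```

If the price changed from week to week or differed across products, the scaling factor would no longer be fixed and the two features would stop being simple rescalings of the count.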