Hello
I am almost at the end of the MLS course, and it got me wondering: if we have a dataset where labels y are each a vector rather than a scalar, what learning algorithm would be appropriate?
Say I want to predict pizza sales for a pizza shop: how many pizzas of which type and size, and how many pies per size, I would need for next week. I’d use previous sales for training, so it’d be a supervised learning algorithm, but I don’t remember seeing a case where an output label is a vector (such as [pizza type, # of small pies, # of large pies, etc.]) rather than a scalar. I can just imagine such a question being asked at an interview.
Professor Google has not been helpful in answering this question…
Thank you!
If you train the model on training data whose features are [pizza type, size, # of pies per size], and what you want to predict is pizza sales for the shop (profit), then after training you predict that value given the values of those features [pizza type, size, # of pies per size]…
But
If instead you want to predict how many pizzas of which type, size and # of pies per size you would need for next week, there are two options:
- Use a time-series approach, since you would be predicting according to trend and time,
- or use independent features, like the hours you would be open next week, staffing costs, etc., and train the model to predict these outputs [how many pizzas of which type, size and # of pies per size you would need for next week]. This is called multi-task learning - check this if you want.
Best Regards,
Abdelrahman
Thank you, Abdelrahman
No, that’s not what I had in mind… It is not profit (which is a scalar) I want to predict, but how many pizzas of each type I would need for next week, and the # of pies of small size, large size, etc., i.e. I want to predict the following output vector y: [type, # small pies, # med pies, # large pies, etc]. I’d also like to predict it for each day of the week, so I can plan what ingredients and how much of each I’d need to buy for each day of the week, and which pizzas/sizes/how many I’d need to cook for each day of the week (so I can plan employee hours), stuff like that.
My dataset would be past sales like this:
features x are [day of week (Mon), time of day (11am-1pm), online orders (1), dine-in (4), etc.] and
output labels [pepperoni, 5 sm, 3 med, 0] - for each training example.
Now I want to predict [pizza type, #sm, #med, #lg] for each day of next week or for a random week in the future.
I.e. my output labels y (not just the features x) are vectors (or matrices) rather than scalars.
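To make it concrete, here is roughly how I picture the data as arrays (just a sketch; the numeric encodings are made up):

```python
import numpy as np

# One row per training example.
# Features x: [day of week (Mon=0), time slot (11am-1pm=0), online orders, dine-in orders]
X = np.array([
    [0, 0, 1, 4],   # Monday, lunch slot
    [0, 1, 3, 2],   # Monday, dinner slot
])

# Labels y: [pizza type (pepperoni=0), # small, # medium, # large]
Y = np.array([
    [0, 5, 3, 0],
    [0, 2, 4, 1],
])
```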
Makes sense?
So my question is: what ML algorithm would I use - something we’ve covered in this course (or maybe a combination), or something we haven’t learned yet (and if so, what is it)?
The multi-task learning link you provided is a bit hard to digest for a non-mathematician, but it didn’t look like what I was looking for at first glance.
Thank you!
I thought of using the collaborative filtering algorithm, since it is used to generate vectors (movie preferences per user, a list of properties describing each movie, etc.), but it is an unsupervised learning algorithm, whereas I would have output labels in my training dataset to learn from.
Or can it still be used in my case?
Since some of your outputs are categorical, and some are quantities (regressions), you might instead try using multiple different models, each predicting one of your outputs, all using the same input data.
Or you could use the collaborative filtering method, as it’s essentially a bunch of linear regressions all running simultaneously.
Where it gets tricky is if you’re trying to mix classification and regression outputs.
Note that collaborative filtering is supervised learning.
Thank you, I was thinking of doing that.
But I’d probably just assume that every day every type of pizza would be ordered - so no need to learn categories.
So what I’d need to know is quantities of different sizes for each type of pizza, i.e. predict a vector y of [#sm, #md, # lg] for each category for each day/timeslot.
Still need to predict a vector rather than a scalar.
In our lectures the output vectors are completely unknown at the beginning, not learned from previously labeled training data. This is where I get confused.
I think Tom’s idea is that you have one model that predicts pizza A’s sales, another model for pizza B’s sales, and so on.
The day/timeslot can be a feature of each of those models, so it is like Model(feature1, feature2, ..., featureN, day, timeslot) = sales.
This way, the models are independent of each other.
If you want to learn models that are not independent of each other, i.e. they share something in common, then “multi-task learning” may be a good starting point for searching. You can expect neural network architectures that have multiple prediction “heads” (one head per task), where those heads are preceded by shared neural network layers. The DLS has one video on this - not much, but if the content is hard to digest, that means you might need to spend extra effort, or find more suitable references for yourself. The link to that video is here, and you may want to open it in an incognito browser.
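To give you a feel for the “shared layers + multiple heads” idea, here is a minimal Keras sketch (my own illustration, not from the lecture; the layer sizes and head names are made up):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Input: the feature vector (day, time slot, etc.); the size 6 is made up.
inputs = tf.keras.Input(shape=(6,))

# Shared "trunk" layers learn a representation common to all tasks.
shared = layers.Dense(32, activation="relu")(inputs)
shared = layers.Dense(16, activation="relu")(shared)

# One head per task, e.g. size counts [#sm, #md, #lg] for two pizza types.
head_pepperoni = layers.Dense(3, name="pepperoni_counts")(shared)
head_margherita = layers.Dense(3, name="margherita_counts")(shared)

model = Model(inputs=inputs, outputs=[head_pepperoni, head_margherita])

# Each head gets its own regression loss; the shared layers are trained by all of them.
model.compile(optimizer="adam",
              loss={"pepperoni_counts": "mse", "margherita_counts": "mse"})
```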
Cheers,
Raymond
If we’re talking about recommender systems, the movie ratings (the matrix of output values) are the labels. What we’re missing is the features X and the weights w, so we have to learn both of them. But it’s still supervised learning, because we know the true output values and we’re trying to learn to predict them for new inputs.
I’m really not sure why this is covered in the “Unsupervised learning” part of the course, because what we’re doing is really just a more complicated form of linear regression with multiple outputs.
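To make that concrete: the cost is just squared error over the known ratings, except that both the movie features X and the user parameters W, b are learned. A rough numpy sketch of that cost (my own paraphrase, not the course’s exact code):

```python
import numpy as np

def cofi_cost(X, W, b, Y, R, lam):
    """Collaborative-filtering cost: squared error over the known ratings only.

    X: (num_movies, num_features)  learned movie features
    W: (num_users,  num_features)  learned user weights
    b: (1, num_users)              learned user biases
    Y: (num_movies, num_users)     the ratings, i.e. the labels
    R: (num_movies, num_users)     1 where a rating exists, 0 otherwise
    """
    err = (X @ W.T + b - Y) * R  # only count entries that were actually rated
    return 0.5 * np.sum(err**2) + (lam / 2) * (np.sum(W**2) + np.sum(X**2))
```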
I don’t think a recommender system is the best method for the outputs you want from your pizza store model.
Thank you all for your thoughts. I will take a closer look at multi-task learning as you and Abdelrahman suggested.
I am also thinking about how my “pizza shop” model would generalize: the pizza shop example is quite small, and a large-scale version would be to predict a “purchase list” for a regional purchasing manager of all the Denny’s restaurants in the Chicago metro area, for example - i.e. which products to order for each week of the year for each restaurant, and how much of each product - so one can properly budget, place orders with various vendors, plan shipping/receiving/warehousing, etc.
It seems there are no quick and easy answers to this, so I’ll keep thinking on this…
I am also planning to take the DLS course next, maybe something I learn there will help.
Again, thank you all for your answers!
I think I may have found a relevant answer: Multi-Output Regression.
And it looks like there is some academic research that was done on this: A Survey on Multi-Output Regression.
That’s good! As it says:
Multioutput regression support can be added to any regressor with MultiOutputRegressor. This strategy consists of fitting one regressor per target
It builds one model for one target.
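For example, a minimal sketch with scikit-learn (the data here is made up just to show the shapes):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Made-up data: 100 examples, 6 input features, 3 targets ([#sm, #md, #lg]).
rng = np.random.default_rng(0)
X = rng.random((100, 6))
Y = rng.integers(0, 20, size=(100, 3))

# Wraps any single-output regressor and fits one copy per target column.
model = MultiOutputRegressor(GradientBoostingRegressor())
model.fit(X, Y)

print(model.predict(X[:2]))  # each prediction is a length-3 vector
```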
You are right, it still solves for one target at a time, not n targets in parallel. I’ll check the academic paper I linked to, which is a “survey on state-of-the-art multi-output regression methods” (circa 2015). Maybe someone has already solved it?
Outputting a vector as you say is not the best approach. Use a one-hot encoding to represent the pizza size as input features and there is one output: the demand for that size pizza (plus all the toppings and other features).
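Concretely, each (day, time slot, pizza type, size) combination becomes its own row with a single scalar target. A rough sketch of that layout (the column names and values are made up):

```python
import pandas as pd

# Each row is one (day, time slot, pizza type, size) combination; the single
# target is the demand (number of pies) for that combination.
df = pd.DataFrame({
    "day":    ["Mon", "Mon", "Mon"],
    "slot":   ["11am-1pm", "11am-1pm", "11am-1pm"],
    "type":   ["pepperoni", "pepperoni", "pepperoni"],
    "size":   ["small", "medium", "large"],
    "demand": [5, 3, 0],
})

# One-hot encode the categorical columns; "demand" is the single scalar output.
X = pd.get_dummies(df[["day", "slot", "type", "size"]])
y = df["demand"]
```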
Why do you want a single algorithm to predict all sales of all pizzas?
Let’s say there are N pizza types. Wouldn’t predicting a vector of length N (with each entry being a number) be equivalent to N very similar models, each predicting the sales for a different type of pizza?
As far as I know, that is, by the way, exactly what multi-output regressors do: they simply train one regressor for every separate target. The same goes for multi-class classifiers. The interesting things start happening when there are constraints on the output. I don’t know of any model that can predict multiple values that also satisfy some constraint (e.g. in your pizza problem: the sum of pizzas can never be greater than 200).
The pizza example was a thought exercise - something I wouldn’t be surprised to be asked at an interview at MSFT or Amazon or Google, for example (I used to work and conduct interviews at MSFT and AMZN).
As for real-life applications - see my examples of predicting a “shopping list” for a regional purchasing manager for a Denny’s chain of restaurants, or a grocery chain, or a hospital chain etc.
Is training thousands of models (one for each product) a practical solution? I have no idea. But I would certainly use it at an interview if asked.
It’s probably computationally easier to train 1000 linear models than one NN model with 1000 outputs. NN backpropagation is computationally intensive.