Tl;DR: Can I train a single layer Neural Network having n linear regression units where the weights, biases, and feature values (w,b,x) are all free parameters to model something like the housing prices example of course 1? Does such an algorithm already exist? How well does it generalize?
I just finished watching the collaborative filtering lecture linked above where Andrew mentions that in the linear regression examples of course 1, one couldn’t learn the features as one did not have access to multiple weights and biases.
However, if instead of a single linear regression unit, we have multiple linear regression units, like a single layer Neural network, would that not help us have multiple weights and biases?
For example, say the input training data I have is the price a house was sold at, but I do not know which features are important for the house. In this case, if I assume to choose n randomly initialized features, can I train this Neural network where everything (w,x,b) are variables to model the housing prices?
I would also like to know what is the relation between a collaborative filtering algorithm and a Neural network, and how I could convert one to another.
I apologize if this question is poorly structured as I too am yet to have clarity on what it is I’m missing.
On further thought, the possibility of training a neural network where everything except the final house price (and number of neurons in the layer) seems not possible. This is because I understand regression as a curve fitting, which is not possible without having both X and y values. There are an infinite vastly varying possibilities of functions that can fit such a description.
But this insight has me even more confused about how collaborative filtering works. Just because I’ve had multiple people price the same house, how am I able to do curve fitting in an n-dimensional space (where n is the number of features) such that it generalizes to new data?
The collaborative filter and recommender systems work because the output labels are not just a single value for each example. The labels are an entire 2D matrix of values.
This lets you learn both the weights and the features.
Even then, I’m a little confused as to why this should work. For example, if instead of a single person pricing a house, if I had multiple people pricing it, isn’t this still a regression problem where we only know Y values and no X values?
The number of possible functions that can fit such a system seems too vast to me to be generalizable in any sensible manner. So I still don’t understand what the ingredient is that makes this algorithm work.
So, you have 10 houses, and 20 people pricing each of those 10 houses, right?
In this case, yes, you can build a model that predict the house price where the model’s w, x and b are all trainable parameters. At the end of the training, you will be able to predict how much one of those 20 people will price one of the 20 houses. NOT just a house’s price, but someone’s price of a house.
Since it is NOT just a house’s price, it appears to be different from what you will expect from the way we model it in course 1. In course 1, when you model the houses, your model should be able to tell the price (instead of someone’s price) of any house (instead of just those 20 houses). What do you think is the impact going to be?
Also think about this: in course 1, we have one fixed w, but in your idea, you have one w for each person, what are the pros and cons? What difference do they make?
This is a very interesting discussion, and I believe we should first focus on these two narrow points first, and you should be given time to think through them and propose your view .
@Rishi_Sreedhar so also, for me just a bit TLDR (only because I have not taken the course you are referencing)-- But I know, particularly in the context of collaborative filtering PCA (principal component analysis) can be a very powerful technique for determining which (and ‘how much’) various independent variables affect your dependent one.
As to how you might apply PCA to a neural net though… To be honest that is beyond the scope of my knowledge at this time and you’d have to ask someone else.