However, an error of $1000 for an expensive house is not as critical as for a cheap house. Shouldn't we use some kind of relative error? Something like this:
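(I am only guessing at the exact form, but perhaps a mean squared relative error, where each error is divided by the true price:)

$$J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(\frac{f_{w,b}\left(x^{(i)}\right) - y^{(i)}}{y^{(i)}}\right)^2$$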
If so, how do we decide when to use absolute error and when to use relative error?
Additionally, when representing the model in the form:
$$f_{w,b}(x) = wx + b$$
we do not account for the fact that price and size cannot be less than zero.
But the model might produce negative results. Should the existence of such constraints be taken into consideration while fitting the model?
Interesting points! You're right that a specific dollar difference represents a smaller error for an expensive house, but the full range that the model has to handle isn't really that wide if you are applying it to a given housing market. Maybe 2 orders of magnitude of total range at the extreme. It's an interesting thought, but my guess would be that it wouldn't make that much difference to use the relative cost metric. But this is an experimental science: you could try it both ways with a given set of training data and see how much difference that makes in your results. Let us know if you learn anything either way by trying that. Science!
On the question about negative values, the sizes are inputs, so you don't have to constrain them to be non-negative. The prices are the output of the model, so you could apply ReLU as the output activation to constrain the prices to be non-negative. But that should not be necessary: the training data does not contain any negative prices, right? So if the model produces a negative price prediction, then the error term for that sample will be even greater than if you constrained the output to be zero. So the cost function will give an even bigger penalty and push the model not to generate such bad predictions. At least that's what you'd hope if your training is actually working properly. If that's not happening, then maybe you've got a bigger problem, e.g. there's a bug in your code or the model you've chosen is not applicable to your data.
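Here is a tiny numerical sketch of that last point (made-up numbers, not code from the course), showing that a raw negative prediction is already penalized more heavily than the same prediction clipped to zero:

```python
import numpy as np

y_true = 50_000.0                       # an actual (positive) price from the training set
pred_raw = -10_000.0                    # a hypothetical negative prediction from w*x + b
pred_relu = np.maximum(0.0, pred_raw)   # the same prediction with ReLU on the output

# The squared error is larger for the raw negative prediction,
# so gradient descent already pushes the model away from it.
err_raw = (pred_raw - y_true) ** 2      # (-60,000)^2 = 3.6e9
err_relu = (pred_relu - y_true) ** 2    # (-50,000)^2 = 2.5e9
print(err_raw > err_relu)               # True
```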
@paulinpaloalto I always get a little nervous when talking to you, but I am surprised standard deviation has not come up in this conversation -- I mean, that is your measure of variance, am I not right?
Plus, since we are talking about linear regression, and this is only a line… Well, whatever error measure you use, there are but a limited number of ways you can contort it…
Rather than invent a new cost function, a more effective method might be to perform some scaling on the target 'y' values - such as log compression or normalization.
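A rough sketch of what I mean (a toy example with numpy, not code from the labs; the numbers are made up):

```python
import numpy as np

# Toy data: house sizes (1000 sqft) and prices ($1000s)
x = np.array([1.0, 1.5, 2.0, 3.0])
y = np.array([300.0, 480.0, 630.0, 730.0])

# Fit the same linear model, but to log-compressed targets
y_log = np.log(y)
w, b = np.polyfit(x, y_log, 1)   # ordinary least squares on log(y)

# Predictions come back in log space, so invert the compression
pred = np.exp(w * x + b)
print(pred)
```

A side effect of this particular re-scaling is that exp() is always positive, so the predicted prices can never be negative.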
The behavior of these models differs most for cheap houses. One model predicts 10% higher prices for small (cheap) houses than the other. It was interesting to experiment, build models with different cost functions, and visualize them. However, I understand that this plot is not a proof; it is just one case, and on a different dataset we might get different results.
Also, thank you for the tip to think about orders of magnitude of the total range at the extreme. I find it helpful.
Regarding the question about negative prices: when specifying the model, we do not leverage the knowledge of the constraints, so our model lacks this 'understanding'. Actually, I do not know yet how to transfer this knowledge into the model and what it can do with it. But maybe real-world data with such constraints tends to have a non-normal distribution, and that could affect our training process.
I apologize, as I am just at the beginning of the specialization, and it is currently quite challenging for me to understand why it would be more effective.
I want to build a solid intuition about cost functions. Here is how I understand it: When we train the model, we use a cost function to determine what is "better" and what is "worse". If we choose a cost function that is more appropriate for a specific problem, then we get a model that predicts values more "accurately" for that problem. This is why I thought that we might sometimes choose different cost functions.
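For example, I imagine the "better/worse" criterion could be sketched like this, where only the cost function changes (a rough illustration with numpy, not my exact lab code):

```python
import numpy as np

def mse(pred, y):
    # Standard mean squared error (the cost used in the lab)
    return np.mean((pred - y) ** 2) / 2

def mse_log(pred_log, y):
    # Mean squared error when the model is fit to log-compressed targets,
    # so the predictions are already in log space
    return np.mean((pred_log - np.log(y)) ** 2) / 2

def msre(pred, y):
    # Mean squared *relative* error: each error is divided by the true price
    return np.mean(((pred - y) / y) ** 2) / 2
```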
Also, I took our lab and trained three models: 1) using mean squared error, 2) using mean squared error with y log-compressed, and 3) using mean squared relative error. I did this to understand how log-compression changes the model, to see whether it differs from the other models, and to bring more detail to the discussion. If there are no bugs in my code, the plots are as follows:
@kagudimov personally, I am not familiar with this lab (do you only have the two variables?) -- but otherwise you might also consider running PCA to see what components 'actually matter' in your analysis.
"More effective" in that rather than create a new cost function (which also requires you to derive the gradients for that cost function), you can simply re-scale the dataset and use the existing cost function.
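To make that concrete (my own sketch of the algebra, using the course notation): for the usual squared-error cost the gradient is

$$\frac{\partial J}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}\left(x^{(i)}\right) - y^{(i)}\right)x^{(i)}$$

while for a squared relative-error cost each term picks up an extra $1/\left(y^{(i)}\right)^2$ factor:

$$\frac{\partial J_{rel}}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}\frac{f_{w,b}\left(x^{(i)}\right) - y^{(i)}}{\left(y^{(i)}\right)^2}\,x^{(i)}$$

so both the cost and its gradients would have to be re-derived and re-implemented, whereas re-scaling y keeps all of the existing code.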
Yes, now I see that it is more effective. However, doesn't it solve a different problem? It does not minimize relative error, which might be more appropriate for price prediction (my original question). Nonetheless, in the test data above, it produced a "better" model (smaller relative error for small prices) than when training with mean squared error. I need to think more about it…
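(One connection I noticed, which may explain it: for small relative errors, the error in log space is approximately the relative error, since

$$\log f_{w,b}(x) - \log y = \log\!\left(1 + \frac{f_{w,b}(x) - y}{y}\right) \approx \frac{f_{w,b}(x) - y}{y},$$

so minimizing squared error on log-compressed prices should behave similarly to minimizing squared relative error. I may be missing something, though.)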