When finding the line of best fit for linear regression, couldn’t you average the slopes of the data points, starting from the first data point, instead of using a loss model? Then you could trace that slope back from each data point to find its y-intercept, and average those to get a single y-intercept. Wouldn’t this give a pretty good line of best fit, or am I missing something?

What is the definition of the slope of a data point? Kind of like the sound of one hand clapping, right? Remember that we don’t have a curve through the point yet; that’s what we’re trying to figure out. I guess you could take all possible pairs of points (some combinatorics there: n(n-1)/2 pairs for n points) and then compute the averages you are talking about. It seems intuitively plausible that it might actually work. Would it handle “outlier” cases as well as the “loss model” approach does? You could try your method and compare the results.
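If you want to try that comparison, here is a minimal sketch, assuming NumPy is available. The helper names pairwise_average_fit and least_squares_fit and the toy data are made up for illustration; the least-squares side just uses np.polyfit with degree 1, which minimizes MSE.

```python
from itertools import combinations

import numpy as np


def pairwise_average_fit(x, y):
    """Hypothetical sketch of the all-pairs idea from this thread.

    1. Compute the slope between every pair of points (n*(n-1)/2 pairs).
    2. Average those slopes to get m.
    3. Trace m back through each point (b_k = y_k - m * x_k) and average the b_k.
    Pairs that share an x value are skipped to avoid dividing by zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    m = float(np.mean(slopes))
    b = float(np.mean(y - m * x))
    return m, b


def least_squares_fit(x, y):
    """Ordinary least squares (minimizes MSE), via np.polyfit with degree 1."""
    m, b = np.polyfit(x, y, deg=1)
    return float(m), float(b)


# Made-up data on the line y = 2x + 1, then the same data with one outlier.
x = np.arange(10, dtype=float)
y_clean = 2 * x + 1
y_outlier = y_clean.copy()
y_outlier[-1] += 30  # push the last point far above the line

for label, y in [("clean", y_clean), ("outlier", y_outlier)]:
    print(label, "pairwise:", pairwise_average_fit(x, y))
    print(label, "least squares:", least_squares_fit(x, y))
```

On this toy data both methods recover y = 2x + 1 when there is no noise, and both get pulled upward by the single outlier (here the averaged pairwise slope ends up around 3.89 versus about 3.64 for least squares), but one made-up example doesn’t settle the question either way. For what it’s worth, taking the median of the pairwise slopes instead of the mean is a known robust method, the Theil–Sen estimator.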

I guess the other question is: how do you measure “best fit” in your case? In other words, you need some kind of loss or distance function, and “best fit” then means the solution that gives the minimum value of that function. Otherwise it’s just subjective, and judging by eye only works in up to 3 dimensions, since we have a hard time deciding what “looks nice” in 4 or more dimensions.
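To make “best fit” concrete, here is a minimal sketch that scores candidate lines with a loss. The data, the two candidate lines, and the helpers mse and mae are all made up for illustration. It also shows that the choice of loss matters: mean squared error and mean absolute error can disagree about which of two lines is “better”.

```python
import numpy as np


def mse(m, b, x, y):
    """Illustrative helper: mean squared error of the line y_hat = m * x + b."""
    return float(np.mean((y - (m * x + b)) ** 2))


def mae(m, b, x, y):
    """Illustrative helper: mean absolute error of the line y_hat = m * x + b."""
    return float(np.mean(np.abs(y - (m * x + b))))


# Made-up data: three points on y = 0 and one point far above.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 0.0, 10.0])

line_a = (0.0, 0.0)  # y = 0: ignores the high point entirely
line_b = (0.0, 2.5)  # y = 2.5: shifted up toward the high point

for name, (m, b) in [("A", line_a), ("B", line_b)]:
    print(name, "MSE:", mse(m, b, x, y), "MAE:", mae(m, b, x, y))
# MSE: A = 25.0, B = 18.75 (B wins); MAE: A = 2.5, B = 3.75 (A wins).
```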

Note that linear regression is not really covered in DLS as far as I can recall, but it’s an interesting question nonetheless. Also note that there is an actual closed-form solution in that case, called the Normal Equation, but it only applies if you use the MSE loss function.
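For the MSE case, here is a minimal NumPy sketch of the Normal Equation, theta = (X^T X)^(-1) X^T y, where X is the design matrix with a column of x values and a column of ones for the intercept. The function name normal_equation and the synthetic data are hypothetical, just for illustration. It solves the linear system with np.linalg.solve rather than forming the inverse explicitly, which is the usual numerically safer choice, and cross-checks the answer against np.polyfit.

```python
import numpy as np


def normal_equation(x, y):
    """Hypothetical helper: closed-form MSE-minimizing fit of y = m * x + b.

    Builds X = [x, 1] and solves (X^T X) theta = X^T y for theta = [m, b].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([x, np.ones_like(x)])
    theta = np.linalg.solve(X.T @ X, X.T @ y)
    return float(theta[0]), float(theta[1])


# Synthetic noisy data around y = 3x - 2 (made up for this example).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3 * x - 2 + rng.normal(0, 1, size=x.shape)

m, b = normal_equation(x, y)
print("Normal Equation:", m, b)
print("np.polyfit:     ", np.polyfit(x, y, deg=1))  # should agree closely
```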

The inability to compare it to other models makes sense. I guess I would have to add up the losses anyway to test it against any other function.

In terms of actually calculating the slope, I was thinking that you would calculate the slope from the first point to the second, from the first to the third, and so on, but maybe that would bias it to count the first point too heavily…
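Here is a minimal sketch of that first-point version (the function name and toy data are hypothetical), which makes the bias easy to see: point 0 appears in every slope, so a single bad first point shifts all of them, while a bad point elsewhere affects only one slope.

```python
import numpy as np


def first_point_anchor_fit(x, y):
    """Hypothetical sketch of the first-point method.

    Averages the slopes from point 0 to every other point, then averages the
    intercepts b_k = y_k - m * x_k. Assumes no other point shares x[0].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = (y[1:] - y[0]) / (x[1:] - x[0])  # every slope involves point 0
    m = float(np.mean(slopes))
    b = float(np.mean(y - m * x))
    return m, b


x = np.arange(10, dtype=float)
y_clean = 2 * x + 1  # true line: slope 2, intercept 1

y_bad_first = y_clean.copy()
y_bad_first[0] += 10  # outlier at the anchor point

y_bad_last = y_clean.copy()
y_bad_last[-1] += 10  # same-sized outlier at a non-anchor point

print("clean:    ", first_point_anchor_fit(x, y_clean))
print("bad first:", first_point_anchor_fit(x, y_bad_first))
print("bad last: ", first_point_anchor_fit(x, y_bad_last))
```

On this toy data the clean case recovers slope 2 exactly, the outlier at the first point drags the averaged slope down to roughly -1.14 (it even flips sign), and the same outlier on the last point only nudges it to about 2.12.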

Anyway, the more I think about it, the more complicated this method seems. Thanks for the reply!

Ah, sorry, I didn’t understand what you meant the first time, but now I get it. Yes, I think your objection to the first-point method is exactly the issue I would be concerned about too. What if that point is an “outlier”? Then you’d have to choose a better one, but then we’re back to needing to define what you mean by “better”. And once you do, you’ve got a loss function that you’re trying to minimize, so we’re back where we started. I think we’re agreeing on that, so it’s all good!
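As a sketch of why choosing a “better” anchor leads straight back to a loss function (hypothetical helper names, made-up data): to rank anchors you need some score, and MSE is the obvious one. The sketch tries every point as the anchor, keeps the one whose line has the lowest MSE, and compares against ordinary least squares, which by construction has MSE no larger than any line, anchored or not.

```python
import numpy as np


def anchored_fit(x, y, k):
    """Hypothetical helper: average slopes from point k to every other point,
    then average the intercepts b_i = y_i - m * x_i."""
    others = np.arange(len(x)) != k
    slopes = (y[others] - y[k]) / (x[others] - x[k])  # assumes distinct x values
    m = float(np.mean(slopes))
    b = float(np.mean(y - m * x))
    return m, b


def mse(m, b, x, y):
    """Mean squared error of the line y_hat = m * x + b."""
    return float(np.mean((y - (m * x + b)) ** 2))


def best_anchor(x, y):
    """Hypothetical helper: choose the anchor index whose line has the lowest MSE."""
    scores = [(mse(*anchored_fit(x, y, k), x, y), k) for k in range(len(x))]
    best_loss, best_k = min(scores)
    return best_k, best_loss


x = np.arange(10, dtype=float)
y = 2 * x + 1
y[0] += 10  # the first point is an outlier

k, anchor_loss = best_anchor(x, y)
m_ols, b_ols = np.polyfit(x, y, deg=1)
print("best anchor:", k, "MSE:", anchor_loss)
print("least squares MSE:", mse(m_ols, b_ols, x, y))  # never larger than any anchored line
```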