# The Difference Between Algebraic and Statistical Linearity in the Context of Regression

Hi everyone!

Iβm studying machine learning and came across a question that might seem simple, but itβs causing some confusion. I want to better understand the difference between the concept of linearity in algebra and statistical linearity, especially when it comes to regression. Algebraic Linearity: As far as I understand, algebraic linearity means that a function π¦ = π ( π₯ ) y=f(x) is linear if π₯ x appears only in the first degree, for example, π¦ = π π₯ + π y=mx+b. If we add terms like π₯ 2 x 2 or higher powers, the function becomes nonlinear. Statistical Linearity: On the other hand, in statistics, Iβve come across the claim that even if we have terms like π₯ 2 x 2 , π₯ 3 x 3 , and so on, the model can still be considered linear if itβs linear in the parameters (e.g., π¦ = π½ 0 + π½ 1 π₯ + π½ 2 π₯ 2 + π½ 3 π₯ 3 y=Ξ² 0 β +Ξ² 1 β x+Ξ² 2 β x 2 +Ξ² 3 β x 3 ). In this case, linearity is defined as linearity in the coefficients π½ Ξ². Question: Could someone explain in detail why the term βlinearityβ is used this way in statistics? Whatβs the reasoning behind this, and what advantages does it provide for data analysis? It would be helpful to see examples where understanding these differences is important in practice. Thanks!

@NikitAo
Hey, this is actually a great question, and it took me a while to clear up the confusion I had myself. To be honest, while thinking about how to explain it clearly in this answer, I came to understand it a bit better.

Algebraically, when you add powers to X, as in Y = X^2, the function becomes, as you know, quadratic. So the value of Y changes "quadratically" with X.

In statistics, the thinking is slightly different, because we are modeling (i.e., trying to approximate) the relationship between Y and the predictors X and X^2. So, when you have the linear regression equation:

Y = Beta_0 + Beta_1 * X + Beta_2 * (X^2)

we are trying to estimate how much Y changes when the predictor X changes by 1 unit, and also how much Y changes when X^2 changes by 1 unit. So we are estimating the relationship between Y and X, and also between Y and X^2. The term X^2 can be seen as a separate variable; you could rename it Z. The effect of X^2 on Y is then linear and is captured by Beta_2.
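The "rename it Z" idea can be shown concretely. Here is a minimal sketch with NumPy ordinary least squares; the data and true coefficient values (1.0, 2.0, 0.5) are made up purely for illustration:

```python
import numpy as np

# Simulated data with a known quadratic trend (values are illustrative)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.1, size=x.shape)

# Treat x^2 as just another predictor column: Z = x^2
Z = x**2
X_design = np.column_stack([np.ones_like(x), x, Z])  # columns: [1, x, x^2]

# Ordinary least squares: the model is linear in the betas,
# even though the column Z is a nonlinear function of x.
betas, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(betas)  # close to the true coefficients [1.0, 2.0, 0.5]
```

Note that the solver never "knows" Z came from squaring x; it only sees three columns and fits three coefficients, which is exactly why the model counts as linear.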

In algebra, in : Y = X^2, we are observing how Y changes when X changes by 1 unit.
In regression, with Beta_2, we are observing how Y changes when (X^2) changes by 1 unit.

We are adding the quadratic term to capture the curvature that could be present in the data.
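To see why the quadratic term matters in practice, you can fit the same curved data once without and once with the X^2 column and compare the sum of squared residuals. Again a minimal sketch with made-up data; the helper name `fit_sse` is just illustrative:

```python
import numpy as np

# Simulated data with real curvature in it (values are illustrative)
rng = np.random.default_rng(1)
x = np.linspace(0, 4, 60)
y = 3.0 - 1.0 * x + 0.8 * x**2 + rng.normal(0, 0.2, size=x.shape)

def fit_sse(design, y):
    """Fit OLS and return the sum of squared residuals."""
    betas, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ betas
    return float(resid @ resid)

sse_linear = fit_sse(np.column_stack([np.ones_like(x), x]), y)
sse_quadratic = fit_sse(np.column_stack([np.ones_like(x), x, x**2]), y)

# Adding the x^2 column lets the (still linear-in-parameters) model
# bend with the data, so its residuals shrink dramatically.
print(sse_linear, sse_quadratic)
```

Both fits are "linear regression" in the statistical sense; the second one simply gives the linear machinery an extra feature with which to capture the curvature.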

Hope this helps.

Nick