Including a square root in a polynomial regressor?

As far as I know, polynomials include only non-negative integer powers.
This means that expressions containing roots (which are fractional, not integer, powers) are not polynomials.

But Prof. Andrew said in the lecture about polynomial regression that we can include the square root of the input as one of the features.

Aside from the fact that polynomials don’t include roots, would the MSE cost function remain convex after including the square root?

As far as I know, to check whether a function is convex, we perform a concavity test: we take the second derivative of the function and check its sign. If it is non-negative for all of its allowed input values, then we can say the function is convex.
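As a quick illustration of this second-derivative test (a sketch using SymPy; the function names are my own examples, not from the lecture):

```python
import sympy as sp

x = sp.symbols('x', positive=True)

# Second-derivative test on two simple one-variable functions
f = x**2
print(sp.diff(f, x, 2))   # 2: non-negative everywhere, so x^2 is convex

g = sp.sqrt(x)
print(sp.diff(g, x, 2))   # -1/(4*x**(3/2)): negative for x > 0, so sqrt(x) is concave
```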

The second derivative of the function J = (sqrt(x) * w + b - y)^2 is 2x.
We can see that 2x can be negative or positive, which suggests that there is no single global minimum.

Am I missing something here?

Hello @OsamaAhmad, the test should be done in the parameter space, which is (w, b), and therefore the second derivative should be taken with respect to w and b as well. So whether the feature is sqrt(x) or any other power of x, the cost remains convex as long as the loss is the squared loss and the model is a linear combination of the x^k features.
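To make this concrete, here is a small sketch using SymPy (the symbol names are my own, not from the lecture) that treats x and y as fixed data and differentiates the squared loss with respect to the parameters w and b:

```python
import sympy as sp

# w and b are the variables we optimize; x and y are fixed data.
# x is declared positive because sqrt(x) requires x >= 0.
w, b = sp.symbols('w b', real=True)
x, y = sp.symbols('x y', positive=True)

J = (sp.sqrt(x) * w + b - y)**2   # squared loss with a sqrt(x) feature

print(sp.diff(J, w, 2))  # 2*x: non-negative wherever sqrt(x) is defined
print(sp.diff(J, b, 2))  # 2:   non-negative everywhere
```

The key point is that differentiating with respect to w squares the feature value sqrt(x), giving 2x, and x cannot be negative if sqrt(x) is to be a real-valued feature.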



Sorry for not mentioning this, but 2x is the second derivative with respect to w.

The second derivative with respect to b is 2, which is always positive, but the second derivative with respect to w can be negative.

Taking the second derivative of the squared loss over a linear combination of different powers of x will result in different powers of x as the second derivatives, which can be negative.

For example, the second derivatives d^2J/dw_i^2 of (w1*x^(1/6) + w2*x^(1/3) + w3*x^3 + b - y)^2 are [2x^(1/3), 2x^(2/3), 2x^6]

Am I missing something here?

Hello @OsamaAhmad, as you can see in my derivation below, the 2nd derivative is always non-negative, which means the cost is convex: whatever {x^{(i)}}^k is, positive or negative, the square makes it non-negative, and k can be anything such as 1/6, 1/3, …
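The same conclusion can be checked numerically for the multi-feature example above: for any linear-in-parameters model, the Hessian of the MSE is (2/m) X^T X, a Gram matrix, which is always positive semi-definite. A sketch with NumPy (the data here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 5.0, size=50)  # positive inputs so the root features are real

# Engineered feature matrix: x^(1/6), x^(1/3), x^3, plus a bias column
X = np.column_stack([x**(1/6), x**(1/3), x**3, np.ones_like(x)])

# Hessian of J(w) = mean((Xw - y)^2) is (2/m) X^T X, independent of y and w
H = 2 * X.T @ X / len(x)

eigvals = np.linalg.eigvalsh(H)
print(eigvals.min() >= -1e-9)  # True: all eigenvalues >= 0, so J is convex
```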


Great, thank you for the clarification!

@OsamaAhmad, you are welcome!


Hey @rmwkwok

((x^{(i)})^k)^2 = (x^{(i)})^{2k} = x^{(i)}, when k = 1/2

And now I am wondering where it leaves us on the convexity, purely going by the 2nd derivative rule.

P.S. It's past my bedtime so I don't know if I am still thinking straight :sleeping:


Well! Interesting!

However, {x^{(i)}}^k is a pre-computed feature value, so the 2nd derivative here is a sum of squared pre-computed feature values. If your x^{(i)} is negative to begin with, you can't compute {x^{(i)}}^{1/2}, and so you can't use it as a feature.
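For example, with NumPy (a sketch; the sample values are made up), pre-computing a square-root feature on a negative input produces nan, so that example simply can't use this feature:

```python
import numpy as np

x = np.array([4.0, 9.0, -4.0])

# np.sqrt of a negative real gives nan (and a RuntimeWarning, silenced here)
with np.errstate(invalid='ignore'):
    feature = np.sqrt(x)

print(feature)  # [ 2.  3. nan]
```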


Bang on @rmwkwok

That's exactly the line of thought I was having: whether x^k is taken first and then squared, or whether it is x^{2k} directly.

I'm wondering, wouldn't the model work the same if complex numbers were involved?
In that case, the squared value can be negative.

This is, of course, assuming that gradient descent will converge. Is there any other problem with complex numbers being involved?
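For what it's worth, NumPy will happily compute the root if you cast to a complex dtype, and then the "squared feature" in the second derivative really is negative, which is exactly what breaks the convexity argument above (a sketch):

```python
import numpy as np

# Casting to complex lets the square root go through instead of producing nan
z = np.sqrt(np.complex128(-4.0))
print(z)           # 2j

# The squared feature value is now negative
print((z**2).real)  # -4.0
```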

Actually, when I said "you couldn't compute the square root if it is negative", I was thinking "why not? it's just an imaginary number", and then I decided to stop thinking about it further… lazy me…

The same thought I had :smiley:

I have not come across imaginary numbers being used in the context of the feature value “x”, so I have not thought it through.

Consider a 2-D problem, say y = a*x^0.5. When x is a negative number, y becomes an imaginary number. Is this a 2-D problem anymore? If so, how do we represent x and y on a 2-D plot?

I can't find any reliable article about linear regression for complex-valued data (and parameters) right now; perhaps I'll need to go to a library for that. I have not learnt how to think about this problem in the complex-number world. :thinking:

This article suggests the condition for convexity is different for complex functions. That said, I am not sure how such convexity relates to the optimization problem. I probably need to find a good book for a more systematic view of this.

Anyways, I can’t think of an example where complex numbers are involved either :smiley:

Bottom line: assuming complex numbers aren't involved, the MSE cost is always convex for a model that is a linear combination of powers of the input.


@OsamaAhmad, exactly!

Yes, let's leave it at that! :grinning: