# Including a square root in a polynomial regressor?

As far as I know, polynomials include only non-negative integer powers.
This means that roots (which are not integer powers) are not polynomials.

But prof. Andrew said in the lecture about polynomial regression that we can include the square root of the input as one of the features.

Aside from the fact that polynomials don’t include roots, would the MSE cost function remain convex after including the square root?

As far as I know, to know whether a function is convex or not, we perform a concavity test, in which we take the second derivative of the function, and check whether the second derivative is positive or not. If it’s positive for all of its allowed input values, then we can say that this function is convex.

The second derivative for the function J(x) = (sqrt(x) * w + b - y)^2 = 2x
We can see that the value 2x can be negative, and can be positive, which suggests that there is no one single global minimum.

Am I missing something here?

Hello @OsamaAhmad, the test should be done against the parameter space which is w and b, and therefore the second derivative should be done with respect to w and b as well. Therefore, whether it is sqrt(x) or any order of x, it remains convex as long as the loss is squared loss and the model assumption is a linear combination of the x^?.

Cheers!

2 Likes

Sorry for not mentioning this, but 2x is the derivative with respect to w.

The derivative with respect to b = 2, which is always positive, but the derivative with respect to w can be negative.

Taking the second derivative of a linear combination of different powers of x will result in different powers of x as the second derivative, which can be negative.

For example, dJ/dw of (w1*x^(1/6) + w2*x^(1/3) + w3*x^3 + b - y)^2 = [2x^(1/3), 2x, 2x^6]

Am I missing something here?

Hello @OsamaAhmad, as you can see in my below deviation, the 2nd derivative is always non-negative which is convex because whatever {x^{(i)}}^k is - positive or negative, the square will make it non-negative, and k can be anything such as 1/6, 1/3, …

1 Like

Great, thank you for the clarification!

1 Like

Hey @rmwkwok

(x^{(i)^k})^2 = (x^{(i)})^{2k} = x^{(i)}, when k= 1/2

And now I am wondering where it leaves us on the convexity, purely going by the 2nd derivative rule.

P.S. Its past my bedtime so dont know if I am still thinking straight

2 Likes

Well! Interesting!

However, {x^{(i)} }^k is a pre-computed feature value, so the 2nd derivative here is the sum of squared pre-computed feature values. If your x^{(i)} is negative at the beginning, you couldn’t compute {x^{(i)} }^{\frac{1}{2}}, and so you can’t use it as a feature.

2 Likes

Bang on @rmwkwok

Thats exactly the line of thought I was having - whether the x^k is taken first and then the square of that or is it x^2k directly

I’m wondering, won’t the model work the same if the complex numbers are involved?
In this case, the squared value can be negative.

This is, of course, assuming that gradient descent will converge. Is there any other problem with complex numbers being involved?

actually, when I said “you couldn’t compute the square-root if it is negative”, I was thinking about “why not? it’s just an imaginary numbner”, and then I decided to stop thinking about it further… lazy me…

I have not come across imaginary numbers being used in the context of the feature value “x”, so I have not thought it through.

when you take a 2-d problem and say y=ax^0.5. When x is a negative number, then y becomes an imaginary number - Is this a 2-d problem anymore. If so, how do we represent x and y on a 2d plot?

I can’t google any reliable source of article about Linear regression model for complex-valued data (& parameters) now, perhaps will need to go to a library for that. I have not learnt how to consider this problem in the complex number world.

This article suggests the condition for convexity is different for complex function. Having said that, I am not sure how such convexity is related to the optimization problem. Probably need to find a good book for a more systematic view of this.

Anyways, I can’t think of an example where complex numbers are involved either

Bottom line: assuming that complex numbers aren’t involved, the MSE will always be convex for linear combinations of powers of the input.

1 Like