Including a square root in a polynomial regressor?

As far as I know, polynomials include only non-negative integer powers.
This means that expressions containing roots (which are fractional, not integer, powers) are not polynomials.

But Prof. Andrew said in the lecture about polynomial regression that we can include the square root of the input as one of the features.

Aside from the fact that polynomials don’t include roots, would the MSE cost function remain convex after including the square root?

As far as I know, to check whether a function is convex, we perform a concavity test: we take the second derivative of the function and check its sign. If it is non-negative for all of its allowed input values, then we can say the function is convex.
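As a quick illustration of this second-derivative test (a sketch using SymPy; the function names are my own examples, not from the lecture):

```python
import sympy as sp

x = sp.symbols('x', positive=True)

# Second-derivative test on two simple one-variable functions
f = x**2
print(sp.diff(f, x, 2))   # 2: non-negative everywhere, so x^2 is convex

g = sp.sqrt(x)
print(sp.diff(g, x, 2))   # -1/(4*x**(3/2)): negative for x > 0, so sqrt(x) is concave
```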

The second derivative of the function J = (sqrt(x) * w + b - y)^2 is 2x.
We can see that 2x can be negative or positive, which suggests that there is no single global minimum.

Am I missing something here?

Hello @OsamaAhmad, the test should be done in the parameter space, which is (w, b), and therefore the second derivative should be taken with respect to w and b as well. So whether the feature is sqrt(x) or any other power of x, the cost remains convex as long as the loss is the squared loss and the model is a linear combination of the x^k features.
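To make this concrete, here is a small sketch using SymPy (the symbol names are my own, not from the lecture) that treats x and y as fixed data and differentiates the squared loss with respect to the parameters w and b:

```python
import sympy as sp

# w and b are the variables we optimize; x and y are fixed data.
# x is declared positive because sqrt(x) requires x >= 0.
w, b = sp.symbols('w b', real=True)
x, y = sp.symbols('x y', positive=True)

J = (sp.sqrt(x) * w + b - y)**2   # squared loss with a sqrt(x) feature

print(sp.diff(J, w, 2))  # 2*x: non-negative wherever sqrt(x) is defined
print(sp.diff(J, b, 2))  # 2:   non-negative everywhere
```

The key point is that differentiating with respect to w squares the feature value sqrt(x), giving 2x, and x cannot be negative if sqrt(x) is to be a real-valued feature.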



Sorry for not mentioning this, but 2x is the second derivative with respect to w.

The second derivative with respect to b is 2, which is always positive, but the second derivative with respect to w can be negative.

Taking the second derivative of the squared loss over a linear combination of different powers of x will result in different powers of x as the second derivatives, which can be negative.

For example, the second derivatives d^2J/dw_i^2 of (w1*x^(1/6) + w2*x^(1/3) + w3*x^3 + b - y)^2 are [2x^(1/3), 2x^(2/3), 2x^6]

Am I missing something here?

Hello @OsamaAhmad, as you can see in my derivation below, the 2nd derivative is always non-negative, which means the cost is convex: whatever {x^{(i)}}^k is, positive or negative, the square makes it non-negative, and k can be anything such as 1/6, 1/3, …
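The same conclusion can be checked numerically for the multi-feature example above: for any linear-in-parameters model, the Hessian of the MSE is (2/m) X^T X, a Gram matrix, which is always positive semi-definite. A sketch with NumPy (the data here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 5.0, size=50)  # positive inputs so the root features are real

# Engineered feature matrix: x^(1/6), x^(1/3), x^3, plus a bias column
X = np.column_stack([x**(1/6), x**(1/3), x**3, np.ones_like(x)])

# Hessian of J(w) = mean((Xw - y)^2) is (2/m) X^T X, independent of y and w
H = 2 * X.T @ X / len(x)

eigvals = np.linalg.eigvalsh(H)
print(eigvals.min() >= -1e-9)  # True: all eigenvalues >= 0, so J is convex
```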


Great, thank you for the clarification!

@OsamaAhmad, you are welcome!


Hey @rmwkwok

((x^{(i)})^k)^2 = (x^{(i)})^{2k} = x^{(i)}, when k = 1/2

And now I am wondering where it leaves us on the convexity, purely going by the 2nd derivative rule.

P.S. It's past my bedtime so I don't know if I am still thinking straight :sleeping:


Well! Interesting!

However, {x^{(i)}}^k is a pre-computed feature value, so the 2nd derivative here is a sum of squared pre-computed feature values. If your x^{(i)} is negative to begin with, you can't compute {x^{(i)}}^{1/2}, and so you can't use it as a feature.
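For example, with NumPy (a sketch; the sample values are made up), pre-computing a square-root feature on a negative input produces nan, so that example simply can't use this feature:

```python
import numpy as np

x = np.array([4.0, 9.0, -4.0])

# np.sqrt of a negative real gives nan (and a RuntimeWarning, silenced here)
with np.errstate(invalid='ignore'):
    feature = np.sqrt(x)

print(feature)  # [ 2.  3. nan]
```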


Bang on @rmwkwok

That's exactly the line of thought I was having: whether x^k is taken first and then squared, or whether it is x^{2k} directly.

I'm wondering, wouldn't the model work the same if complex numbers were involved?
In that case, the squared value can be negative.

This is, of course, assuming that gradient descent will converge. Is there any other problem with complex numbers being involved?
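For what it's worth, NumPy will happily compute the root if you cast to a complex dtype, and then the "squared feature" in the second derivative really is negative, which is exactly what breaks the convexity argument above (a sketch):

```python
import numpy as np

# Casting to complex lets the square root go through instead of producing nan
z = np.sqrt(np.complex128(-4.0))
print(z)           # 2j

# The squared feature value is now negative
print((z**2).real)  # -4.0
```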

Actually, when I said "you couldn't compute the square root if it is negative", I was thinking "why not? it's just an imaginary number", and then I decided to stop thinking about it further… lazy me…

The same thought I had :smiley:

I have not come across imaginary numbers being used in the context of the feature value “x”, so I have not thought it through.

Consider a 2-D problem, say y = a*x^0.5. When x is a negative number, y becomes an imaginary number. Is this a 2-D problem anymore? If so, how do we represent x and y on a 2-D plot?

I can't find any reliable article about linear regression for complex-valued data (and parameters) right now; perhaps I'll need to go to a library for that. I have not learnt how to think about this problem in the complex-number world. :thinking:

This article suggests the condition for convexity is different for complex functions. That said, I am not sure how such convexity relates to the optimization problem. I probably need to find a good book for a more systematic view of this.

Anyways, I can’t think of an example where complex numbers are involved either :smiley:

Bottom line: assuming complex numbers aren't involved, the MSE cost is always convex for a model that is a linear combination of powers of the input.


@OsamaAhmad, exactly!

Yes, let's leave it at that! :grinning: