We have already found the best possible curve and learning rate for our model.
But, unfortunately, it still has high variance or high bias.
The “Deciding what to try next” video showed 6 different ways to make our model better.
However, three of the methods seem suspicious to me:
Increasing alpha
Decreasing alpha
Adding polynomial features
They seem suspicious because both the best possible learning rate alpha and the best possible curve were already considered during the iterations of the algorithm.
I guess you might be slightly confused between the regularization parameter \lambda and the learning rate \alpha. In the video, increasing and decreasing \lambda is suggested, while \alpha is treated as a hyper-parameter, which you can tune to get better performance. In my opinion, increasing and/or decreasing \alpha won’t have much effect on bias/variance.
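Just to make the distinction concrete, here is roughly where each one appears (this is the general form of the regularized cost and the gradient-descent update, so the exact notation may differ a bit from the slides):

J(\vec{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2

w_j := w_j - \alpha \frac{\partial J(\vec{w}, b)}{\partial w_j}

\lambda sits inside the cost J itself, so changing it changes which (\vec{w}, b) minimizes J, and hence the bias/variance trade-off. \alpha only scales the size of each update step; with a reasonably small \alpha, gradient descent heads towards the same minimum either way, just faster or slower, which is why tuning \alpha mostly affects training speed rather than bias/variance.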
As for the 3rd method, i.e., “Adding polynomial features”, I am not quite sure what exactly seems suspicious to you. We might have already found the best possible curve, but since it has high variance and/or high bias, how can we refer to it as the “best”? It’s just that the curve was “best” for the previous configuration, and it is not at all the “best” in terms of performance. If I am misunderstanding your query, please do correct me, and then we will discuss.
What I mean is that there is no point in increasing or decreasing lambda since our algorithm already estimated the best lambda for our function:
Well, it might have high variance and/or high bias, but we already tried our best by fitting all the possible polynomial configurations.
All the things I am saying are just my theoretical expectations, since I have not diagnosed an algorithm myself. I think my theory breaks because, in the real world, the estimates our algorithm makes of lambda and of the polynomial configuration are not perfect. In the end, it comes down to us adjusting them. Am I right?
Hey @popaqy,
Before beginning my reply, I would like to mention 2 things:
Prof Andrew discussed 6 methods in the lecture video, but I am assuming it is well established that this list of 6 methods is not exhaustive.
Another thing is that not all of these 6 methods are expected to work in every scenario. This is just a list of possible methods that you can try out; some might work and some might not, depending on your scenario.
So, if we consider your hypothetical scenario, the 3 methods you have listed might not work, and that’s it. This doesn’t mean that these methods aren’t useful in other scenarios. I guess that should clear your suspicions.
Now coming to the hypothetical scenario, as you mentioned, the theory indeed breaks in the real world, since there are infinitely many possible values of \lambda and infinitely many polynomial configurations. So, even with all the computation in the world, establishing this hypothetical scenario (at least exhaustively) is not possible. Some form of parameterisation or function approximation might perhaps establish it, but I am completely unfamiliar with either of these, so don’t take my word for it.
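Just to illustrate what is actually feasible, here is a rough sketch of the kind of finite search people run in practice; the data, the candidate degrees, and the candidate \lambda values below are my own made-up assumptions, not something from the lecture:

```python
import numpy as np

# Made-up 1-D dataset, split into training and cross-validation sets.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 60)
y = np.sin(x) + rng.normal(0, 0.3, 60)
x_tr, y_tr = x[:40], y[:40]
x_cv, y_cv = x[40:], y[40:]

def poly_features(x, degree):
    # Columns [1, x, x^2, ..., x^degree].
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    # Closed-form regularized least squares: w = (X^T X + lam*I)^(-1) X^T y.
    # (For simplicity this also regularizes the intercept column.)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

degrees = range(1, 7)  # a finite sample of "all possible polynomial configurations"
lambdas = [0.0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]

best = None
for d in degrees:
    X_tr, X_cv = poly_features(x_tr, d), poly_features(x_cv, d)
    for lam in lambdas:
        w = fit_ridge(X_tr, y_tr, lam)
        cv_err = mse(X_cv, y_cv, w)
        if best is None or cv_err < best[0]:
            best = (cv_err, d, lam)

print("lowest cross-validation error %.3f at degree=%d, lambda=%g" % best)
```

The \lambda grid roughly doubles at each step (a common choice for a coarse sweep), and the whole thing only ever tries a handful of (degree, \lambda) pairs, so the “best” combination it reports is only the best among the candidates we happened to include, which is exactly why the exhaustive version of your hypothetical scenario cannot be realized.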
Now, I am not sure what exactly you are expecting as the reply to this question. When Prof Andrew says that we can try increasing or decreasing \lambda, he doesn’t say by how much we should increase/decrease it or to what value we should set it. So, even at the start, it was about us adjusting them. I don’t think this hypothetical scenario affects that in any way.
And as to this: sure, in this hypothetical scenario, increasing or decreasing \lambda won’t work, and you can try the other 4, or even some methods other than these 6.