Do you see regularization the way I do?

I assume that we cannot know in advance the exact form of a hypothesis that would fit the training examples.

That’s why we initially give the hypothesis more freedom by making it non-linear and adding more parameters. That way we avoid the problem of underfitting.
However, this makes our model more vulnerable to overfitting 🙂. Overfitting is especially likely if we do not have a sufficient number of training examples.

We fight overfitting with regularization - a technique that prevents any single parameter from getting too much weight and instead distributes the weight more evenly. At least that is my assumption. What do you think?
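Here is a tiny numpy experiment that matches my intuition (the data and the closed-form ridge solution are my own sketch, not from the course): with two nearly identical features, plain least squares can put all the weight on one of them, while L2 regularization splits it between the two.

```python
import numpy as np

# Two nearly identical features; the target is explained by the first alone.
t = np.arange(10.0)
e = 0.01 * np.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1.0])
X = np.column_stack([t, t + e])  # second column is almost a copy of the first
y = 2 * t                        # fits exactly as 2*x1 + 0*x2

def fit(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam*I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_plain = fit(X, y, 0.0)  # ~[2, 0]: all weight on a single parameter
w_ridge = fit(X, y, 1.0)  # ~[1, 1]: weight shared between the two copies
```

With lam = 0 the exact fit puts all the weight on the first feature; with lam = 1 the penalty makes the near-duplicate features share it - the “distribute the weight more evenly” effect I meant.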

The critical part here would be to determine the correct value for lambda. I would personally run the cost function for different values of lambda and then take the one that gives the smallest minimum. I am looking forward to understanding what Andrew has to say about that in Course 2.
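In the meantime, here is how I would sketch that search in numpy (the names and the synthetic data are my own). One caveat I noticed while writing it: the smallest *training* cost is always at lambda = 0, so I compare candidate lambdas on a held-out validation set instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data expanded into degree-7 polynomial features.
x = np.linspace(-1, 1, 40)
y = x**3 - x + rng.normal(scale=0.1, size=x.size)
X = np.column_stack([x**d for d in range(1, 8)])

# Simple train / validation split (every other point).
X_tr, y_tr = X[::2], y[::2]
X_va, y_va = X[1::2], y[1::2]

def ridge_fit(X, y, lam):
    # Closed-form regularized least squares.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

# The training cost alone always favours lam = 0, so candidate
# lambdas are compared on the held-out validation set instead.
lambdas = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0]
best_lam = min(lambdas,
               key=lambda lam: mse(X_va, y_va, ridge_fit(X_tr, y_tr, lam)))
```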

Thanks for your post.

The purpose of regularization is to reduce model complexity by penalising it, and so reduce overfitting. It’s about reducing the model’s dependence on individual parameters, e.g. by:

  • driving weights exactly to zero (L1 regularization) or
  • driving weights close to zero (L2 regularization)
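As a sketch of how those two penalties enter the cost (the function name and the MSE data term are my choice, not the course’s notation):

```python
import numpy as np

def regularized_cost(w, X, y, lam, kind="l2"):
    """Mean squared error plus an L1 or L2 penalty on the weights w."""
    data_cost = np.mean((X @ w - y) ** 2)
    if kind == "l1":
        penalty = lam * np.sum(np.abs(w))  # L1: tends to drive weights exactly to 0
    else:
        penalty = lam * np.sum(w ** 2)     # L2: shrinks weights close to 0
    return data_cost + penalty
```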

Dropout is also a useful technique for tackling overfitting.
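A minimal sketch of (inverted) dropout on an activation vector, assuming numpy (the function and argument names are my own; deep-learning frameworks ship this built in):

```python
import numpy as np

def dropout(a, keep_prob, rng, training=True):
    # Inverted dropout: randomly zero units during training and rescale
    # the survivors so the expected activation stays unchanged.
    if not training:
        return a                    # no dropout at inference time
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob
```

Because the surviving units are scaled by 1/keep_prob, no rescaling is needed when dropout is switched off at inference time.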

I like this explanation, too. Feel free to take a look:
  • Regularization for Simplicity: L₂ Regularization  |  Machine Learning  |  Google for Developers

Best regards

Thanks for the link. It was very useful!