In the regularization example, where the model was a polynomial of degree 4, sir said that we could add 1000((w_3)^2) + 1000((w_4)^2) to the cost function, and that this would not disturb the other parameters w_1 and w_2.
We did this because we wanted smaller values for the coefficients of x^3 and x^4 to avoid overfitting, right?
If yes, then why does the generalized cost function use the same lambda for all the squared weights, i.e., lambda * ( (w_1)^2 + (w_2)^2 + (w_3)^2 + … + (w_n)^2 )? Why don't we have different values of lambda for different weight parameters?
Hello @shubby007,
You are correct. In that specific example, the idea behind using a penalty of 1000 is to reduce the contribution of the x^3 and x^4 features to the output. There, we can visualize the shape of the expected function, so we know which terms to penalize.
In practice, you might have multiple independent features and in most cases, it’s not possible to visualize the expected output function. As Prof. Andrew explains in the next slide, it may not be possible to decide beforehand which features are important. So usually, a single regularization parameter is used to penalize all the weights, leading to a function that is less prone to overfitting.
You could use a different value of the regularization parameter lambda for each feature, say lambda1, lambda2, lambda3 up to lambdaN, one for each of the weights. You would then have more hyperparameters to tune instead of a single value of lambda. You could experiment with your datasets and determine if it's worth the effort.
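To make the difference concrete, here is a minimal sketch of a cost function that accepts either a single lambda or one lambda per weight. It assumes the course's usual scaling, (1/2m) for the squared-error term and (lambda/2m) for the penalty; the function name `regularized_cost` and the toy data are made up for illustration.

```python
def regularized_cost(w, b, X, y, lam):
    """Squared-error cost plus an L2 penalty on the weights.

    lam may be a single number (same penalty for every weight) or a
    list with one value per weight (a separate penalty per weight).
    """
    m = len(X)
    n = len(w)
    # Expand a scalar lambda to one entry per weight.
    lams = [lam] * n if isinstance(lam, (int, float)) else list(lam)
    if len(lams) != n:
        raise ValueError("need exactly one lambda per weight")

    # Data term: (1 / 2m) * sum((f(x) - y)^2)
    sq_err = 0.0
    for x_i, y_i in zip(X, y):
        pred = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
        sq_err += (pred - y_i) ** 2
    data_term = sq_err / (2 * m)

    # Penalty term: (1 / 2m) * sum(lambda_j * w_j^2); b is not penalized.
    penalty = sum(l_j * w_j ** 2 for l_j, w_j in zip(lams, w)) / (2 * m)
    return data_term + penalty


# Toy degree-4 polynomial features [x, x^2, x^3, x^4] (made-up numbers).
xs = [0.5, 1.0, 1.5]
X = [[x, x ** 2, x ** 3, x ** 4] for x in xs]
y = [1.0, 2.0, 2.5]
w = [1.0, 0.5, 0.2, 0.1]
b = 0.0

same = regularized_cost(w, b, X, y, 1.0)                               # one lambda for all weights
selective = regularized_cost(w, b, X, y, [0.0, 0.0, 1000.0, 1000.0])   # penalize only w_3 and w_4
print(same, selective)
```

With the per-weight list, each lambda_j becomes its own hyperparameter, which is the extra tuning effort mentioned above.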
Thank you Sir!
I have a few more queries…
→ Firstly, let's say that we don't have overfitting in our model. By adding the regularization term to the cost function, could we be decreasing the accuracy of the resulting weight parameters?
→ Secondly, is regularization a kind of precaution, irrespective of whether we encounter overfitting or not?
→ Thirdly, how do lower values of the weight parameters contribute to a better model?
Thanks again
Firstly, let's say that we don't have overfitting in our model. By adding the regularization term to the cost function, could we be decreasing the accuracy of the resulting weight parameters?
Here’s a slide from Week 3 of the Advanced Learning Algorithms course in this specialization:
If the model doesn’t overfit the data, then we have two cases:
The model is a good fit: the current regularization parameter gives low error rates on both the training and cross-validation data.
The model underfits the data: we have high bias here, with high error rates on both the training and cross-validation data. Increasing the value of the regularization parameter would result in a higher error rate, so we lower the value of lambda to increase the accuracy of the model.
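As a rough illustration of the second case, here is a small sketch with a one-weight model f(x) = w*x that cannot overfit. It assumes the cost (1/2m)*sum((w*x - y)^2) + (lambda/2m)*w^2, whose minimizer has a simple closed form; the helper names and the data are made up.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form minimizer of (1/2m)*sum((w*x - y)^2) + (lam/2m)*w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Unregularized error (1/2m)*sum((w*x - y)^2), used only for evaluation."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


# Made-up data that roughly follows y = 2x.
x_train, y_train = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
x_cv, y_cv = [1.5, 2.5, 3.5], [3.0, 5.1, 6.9]

results = []
for lam in [0.0, 1.0, 10.0, 100.0]:
    w = fit_ridge_1d(x_train, y_train, lam)
    j_train = mse(w, x_train, y_train)
    j_cv = mse(w, x_cv, y_cv)
    results.append((lam, w, j_train, j_cv))
    print(f"lambda={lam:6.1f}  w={w:.3f}  J_train={j_train:.4f}  J_cv={j_cv:.4f}")
```

On this toy data, the weight shrinks toward zero as lambda grows, and both the training and cross-validation errors go up, which is why lambda is lowered when the model underfits.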
Secondly, is regularization a kind of precaution, irrespective of whether we encounter overfitting or not?
Regularization is a good practice for reducing high variance (overfitting) in the model. When a model has sufficient learning capacity, it may overfit the data.
Thirdly, how do lower values of weight parameters contribute to a better model?
High values of the weight parameters make the model sensitive: small changes in the inputs can lead to large changes in the output. This can be a sign of overfitting, where the model fails to generalize to new data.
The topic of bias and variance is covered in depth in Week 3 of the Advanced Learning Algorithms course in this specialization. This will help deepen your understanding of bias and variance (including the use of regularization).
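A quick way to see this sensitivity is to nudge the input slightly and compare how much the output moves. The sketch below uses a degree-4 polynomial with two hypothetical weight vectors; the numbers are invented purely for illustration.

```python
def predict(w, b, x):
    """Polynomial model f(x) = b + w[0]*x + w[1]*x^2 + w[2]*x^3 + ..."""
    return b + sum(w_j * x ** (j + 1) for j, w_j in enumerate(w))


# Hypothetical weights: same low-order terms, very different high-order terms.
weights = {
    "small": [1.0, 0.5, 0.01, 0.001],
    "large": [1.0, 0.5, 40.0, -25.0],
}

x, dx = 2.0, 0.01  # a small change in the input
changes = {}
for name, w in weights.items():
    changes[name] = predict(w, 0.0, x + dx) - predict(w, 0.0, x)
    print(f"{name} weights: output changes by {changes[name]:.4f}")
```

The same 0.01 change in x moves the output far more when the x^3 and x^4 weights are large, which is the behaviour regularization tries to discourage.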