I’m a bit confused about the whole idea of regularization: it seems too simplistic and to rely too much on the specificities of the outcome we want to get. What is the rationale for it, as opposed to more careful selection of the features?
Also, in a practical sense, how do we obtain convergence when the value of the coefficients is shrunk at each iteration by lambda/m? Is this because the added penalization term in the cost function is a convex function of w and so the cost function remains globally convex?
Regularization is a technique used in machine learning to prevent overfitting, the phenomenon where a model fits the training data too closely and fails to generalize to new data. It works by adding a penalty term to the cost function that discourages large coefficients, which keeps the model from fitting the noise in the training data.
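To make that concrete, here is a minimal sketch of an L2-penalized (ridge) cost, using the lambda/(2m) scaling from your question; the function and variable names are my own, not from any particular library:

```python
import numpy as np

def ridge_cost(w, X, y, lam):
    """Mean-squared-error cost plus an L2 penalty,
    with the lambda/(2m) scaling from the question."""
    m = len(y)
    residuals = X @ w - y
    mse = (residuals @ residuals) / (2 * m)
    penalty = lam / (2 * m) * (w @ w)  # grows with the size of the coefficients
    return mse + penalty
```

With `lam = 0` this is the plain least-squares cost; increasing `lam` raises the price of large coefficients without changing the data-fit term.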
There are a few reasons why regularization is useful:
- It helps to prevent overfitting: the penalty term reduces the effective complexity of the model, keeping it from fitting the training data too closely and improving its ability to generalize to new data.
- It can improve the interpretability of the model: a model with fewer or smaller coefficients is often easier to understand. L1 regularization in particular can drive some coefficients exactly to zero, making it clearer which features drive the predictions.
- It can improve the stability of the model: a flexible model can be sensitive to small changes in the training data. Shrinking the coefficients reduces this sensitivity, so the fitted model varies less from one training sample to another.
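The shrinkage behind these points can be seen directly in the closed-form ridge solution, w = (XᵀX + λI)⁻¹Xᵀy. A small sketch on synthetic data (my own names and numbers, just for illustration):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam*I)^{-1} X^T y.
    Larger lam pulls the coefficients toward zero."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + 0.1 * rng.normal(size=20)

w_plain = ridge_fit(X, y, lam=0.0)   # ordinary least squares
w_reg = ridge_fit(X, y, lam=10.0)    # penalized: strictly smaller norm
```

The penalized fit trades a little training error for smaller, more stable coefficients.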
As for your second question: the shrinkage does not prevent convergence. Each gradient-descent step both shrinks the coefficients (by a factor of 1 − αλ/m for the L2 penalty, where α is the learning rate) and moves them against the gradient of the data-fit term; the algorithm converges to the point where these two effects balance. Your intuition about convexity is also right: the L2 penalty is a convex function of the coefficients, so adding it to a convex cost keeps the overall cost globally convex (and in fact makes it strictly convex), which guarantees a single global minimum and helps the optimization algorithm reach it rather than getting stuck.
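A small NumPy sketch of this balance, under my own synthetic setup: the "weight decay" update converges to exactly the minimizer of the regularized cost, which for ridge has the closed form (XᵀX + λI)⁻¹Xᵀy.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.05 * rng.normal(size=50)

m, lam, alpha = len(y), 1.0, 0.1
w = np.zeros(3)
for _ in range(2000):
    grad = X.T @ (X @ w - y) / m  # gradient of the unpenalized MSE term
    # Regularized update in "weight decay" form: w is first shrunk by
    # (1 - alpha*lam/m), then moved down the data gradient.
    w = w * (1 - alpha * lam / m) - alpha * grad

# Closed-form minimizer of the same regularized cost, for comparison.
w_star = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
```

Despite the shrinkage at every step, `w` does not collapse to zero: it settles where the pull of the data gradient exactly offsets the decay.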
It’s important to note that regularization is just one tool for preventing overfitting. Other techniques, such as careful feature selection, can also improve a model’s performance. Regularization is useful in many cases, but it’s not a one-size-fits-all solution; it’s worth considering the specific characteristics of the data and the problem when deciding which techniques to use.
Manually selecting the features would require a lot of expert “human learning”.
We’re seeking a method that uses machine learning.
> Is this because the added penalization term in the cost function is a convex function
There is no contradiction. When using classic machine learning models you should definitely choose your features carefully: evaluate their importance, check distributions and correlations, consider transformations like PCA or PLS, and incorporate your domain knowledge with signal processing.
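As a small illustration of that kind of inspection (synthetic data and my own names, not a prescription): a correlation matrix flags redundant features, and the singular values from a PCA-style decomposition show how many directions actually carry variance.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=100)  # near-duplicate feature

# Correlation matrix: features 0 and 3 are almost perfectly correlated,
# so one of them is redundant.
corr = np.corrcoef(X, rowvar=False)

# PCA via SVD on the centered data: the explained-variance ratios
# reveal how many directions the data really occupies.
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
```

Checks like these are cheap and complement regularization rather than replacing it.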
Still, you might not have sufficient data to fit a robust model, which poses the risk of overfitting. In this case regularization is a powerful tool to reduce the model complexity, as @pastorsoto pointed out correctly in his very good answer. Dropout would be another potential measure to reduce overfitting; see also this thread.
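For completeness, dropout is easy to sketch in a few lines of NumPy (inverted-dropout variant; the function name and setup are mine, just to show the idea): each activation is zeroed with probability `p` during training and the survivors are rescaled so the expected activation is unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)

def dropout(a, p, training=True):
    """Inverted dropout: zero each activation with probability p during
    training and rescale by 1/(1-p); pass activations through at test time."""
    if not training:
        return a
    mask = rng.random(a.shape) >= p  # keep each unit with probability 1-p
    return a * mask / (1 - p)
```

Because of the rescaling, no adjustment is needed at inference time; `training=False` simply returns the activations unchanged.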