Regularization: Do larger weights imply a more complex model?

In L2 regularization, we say that a larger value of lambda shrinks the weights and thus saves the model from overfitting. However, when this is explained graphically, we are shown a higher-order polynomial being fit to a dataset. My question is how large weight values correlate with a more complex or higher-order polynomial. We can have a linear function with very large weights as well, e.g. y = 102863x1 + 347629x2 + 22222, right?

Your equation doesn’t have any higher-order features. It just has two features (x1 and x2), which are linearly combined.
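To make the point concrete, here is a minimal sketch (not taken from the course material) showing that the size of the weights in a linear model depends on the units of the features, not on how complex the model is. The data, the helper names `fit_linear` and `predict`, and the factor of 1000 are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: two features and a noisy linear target.
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.1, size=50)

def fit_linear(X, y):
    """Ordinary least squares with an intercept column appended."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(X, coef):
    """Apply the fitted weights and intercept to the features."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

coef_original = fit_linear(X, y)

# Express the same features in units 1000 times larger.
X_rescaled = X / 1000.0
coef_rescaled = fit_linear(X_rescaled, y)

same_predictions = np.allclose(
    predict(X, coef_original), predict(X_rescaled, coef_rescaled)
)

print("weights, original units:", coef_original[:2])
print("weights, rescaled units:", coef_rescaled[:2])
print("same predictions:", same_predictions)
```

The rescaled fit has weights roughly 1000 times larger, yet it is the same straight-line model and makes the same predictions. This is also why features are usually scaled before a regularization penalty is applied.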

Yes. That’s where my doubt is. Let me rephrase what I mentioned earlier. What’s the purpose of L2 regularization actually?

  1. To keep the weights smaller?
  2. To keep the model less complex?
  3. Does a more complex model / higher-order polynomial correlate with smaller weights?
  4. In the model that I mentioned earlier, applying L2 regularization can help reduce the weights, right?

First you add features to get a more complex model.

Then if you have overfitting, you add regularization to reduce the magnitude of the weights. This reduces overfitting.
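The two steps above can be sketched as follows. This is only an illustration under assumed settings: the noisy sine data, the degree-9 polynomial, the lambda values, and the helper `fit_ridge` (a closed-form L2-regularized least-squares solve) are all hypothetical choices, not something from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: a noisy sine curve sampled at 15 points.
x = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

# Step 1: add features (powers of x up to degree 9) to get a more complex model.
degree = 9
X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ..., x^9

def fit_ridge(X, y, lam):
    """Closed-form L2-regularized least squares; the intercept is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept column unregularized
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Step 2: increase lambda and watch the magnitude of the weights shrink.
lambdas = [1e-6, 1e-3, 1.0]  # the tiny value stands in for "almost no regularization"
weight_norms = [float(np.linalg.norm(fit_ridge(X, y, lam)[1:])) for lam in lambdas]

for lam, norm in zip(lambdas, weight_norms):
    print(f"lambda={lam:g}  ||w||_2={norm:.3f}")
```

The model keeps all ten polynomial features in every run; only the size of the weights changes. With small weights, the high-order terms cannot swing the curve as sharply between training points, which is what reduces overfitting.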

You might find this discussion interesting where I believe all your open points are being answered:

The purpose of L1 vs L2 regularization is also discussed here: "Do you see regularization the way I do?" (post #2 by Christian_Simonis)

Feel free to take a look!

Best regards
Christian

Applying L1 or L2 regularization will reduce the magnitude of the weights (answering your Q1 and Q4). Since L1 regularization is better at pushing weights all the way to zero, it also makes the model simpler, because a zero weight effectively removes the corresponding feature from the model (answering your Q2). By keeping the weights smaller, regularization prevents the model from fitting the training data points too closely, so the model is less capable of overfitting.
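Here is a small sketch of why L1 tends to produce exact zeros while L2 only shrinks. It uses a simplified one-weight-at-a-time setting (each weight is pulled toward a given unregularized value by a squared-error term), where the minimizers have known closed forms. The starting weights `w_unreg` and the value of `lam` are made up for illustration.

```python
import numpy as np

# Hypothetical unregularized weights for four features.
w_unreg = np.array([3.0, -0.4, 0.2, -1.5])
lam = 0.5

# L2: minimizing 0.5*(w - w0)**2 + 0.5*lam*w**2 rescales every weight by 1/(1 + lam).
w_l2 = w_unreg / (1.0 + lam)

# L1: minimizing 0.5*(w - w0)**2 + lam*abs(w) soft-thresholds every weight,
# so any weight with |w0| <= lam is set exactly to zero.
w_l1 = np.sign(w_unreg) * np.maximum(np.abs(w_unreg) - lam, 0.0)

zeros_l2 = int(np.sum(w_l2 == 0))
zeros_l1 = int(np.sum(w_l1 == 0))

print("L2 weights:", w_l2)
print("L1 weights:", w_l1)
print("exact zeros under L2:", zeros_l2, "| exact zeros under L1:", zeros_l1)
```

In this simplified setting, L2 makes every weight smaller but leaves all four features in the model, while L1 removes the two features whose unregularized weights were already small.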

For your Q3, I think there is no strong correlation. Remember that weights can be positive or negative, and a higher-order term can be greater or smaller than a lower-order term.

Raymond