Hello @renesultan,

I deleted yesterday’s reply because I wanted to add some experiment results…

We can visualize this.

Consider a simple linear model with only one weight f(x) = wx and using the squared loss,

our cost function is: J_0 = \frac{1}{2m}\sum_{i=1}^{m}(wx^{(i)}-y^{(i)})^2

and regularized cost is: J = J_0 + \frac{\lambda}{2m}w^2

One important observation here is that J_0, the regularization term \frac{\lambda}{2m}w^2, and J are all 2nd-degree polynomials of w, so they are parabolas. (For comparison, a 1st-degree polynomial is a line.)

Now we plot all three, J_0, \frac{\lambda}{2m}w^2 and J, on one graph to see how the regularization affects the original cost J_0:

Green is original cost J_0

Blue is regularization \frac{\lambda}{2m}w^2

Red is regularized cost J = J_0 + \frac{\lambda}{2m}w^2

The optimal w for the original and the regularized cost are at the minimum points of the green and the red curves respectively, so from the graph, the optimal w becomes smaller after being regularized. The same idea applies to a linear model with more than one weight. *The key takeaway here is that adding regularization pushes the optimal w closer to 0.*
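We can also check this numerically. For the single-weight model, setting the derivative of each cost to zero gives closed-form optima: w = \sum x y / \sum x^2 for J_0, and w = \sum x y / (\sum x^2 + \lambda) for J. The sketch below uses made-up data and an arbitrary *lambda* just to illustrate the shrinkage:

```python
import numpy as np

# Hypothetical 1-D data for the single-weight model f(x) = w*x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
lam = 5.0  # regularization parameter lambda (chosen arbitrarily)

# Minimizing J_0 = 1/(2m) * sum((w*x - y)^2) gives w = sum(x*y) / sum(x^2).
w_unreg = np.dot(x, y) / np.dot(x, x)

# Minimizing J = J_0 + lambda/(2m) * w^2 gives
# w = sum(x*y) / (sum(x^2) + lambda): same numerator, larger denominator,
# so the regularized optimum is always closer to 0.
w_reg = np.dot(x, y) / (np.dot(x, x) + lam)

print(w_unreg, w_reg)
```

Since the numerator is unchanged and the denominator grows by \lambda, the regularized optimum is pulled toward 0 for any \lambda > 0.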

The answer is no: regularization generally doesn’t preserve that, except in the special case where all features have the same variance and are uncorrelated with each other, which is rare in real-world data.
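The special case is easy to see with ridge regression in closed form: when X^T X \approx c I (uncorrelated, equal-variance features), the regularized solution is roughly c/(c+\lambda) times the unregularized one, i.e. every weight shrinks by the same factor, so the ordering by magnitude survives. A small sketch with synthetic data (the weights and sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 4

# Uncorrelated features with equal variance -- the special case above.
X = rng.standard_normal((m, n))
true_w = np.array([3.0, -1.0, 0.5, 2.0])
y = X @ true_w + 0.1 * rng.standard_normal(m)

lam = 50.0
# Closed-form solutions: ordinary least squares vs. ridge regression.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# With X.T @ X close to c*I, ridge scales every weight by about c/(c+lam),
# so sorting features by |weight| gives the same order before and after.
order_ols = np.argsort(-np.abs(w_ols))
order_ridge = np.argsort(-np.abs(w_ridge))
print(order_ols, order_ridge)
```

Once the features are correlated or have very different variances, X^T X is far from a scaled identity and this uniform-shrinkage argument breaks down.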

However, if you only care about the relative ordering (instead of the relative magnitudes) of the weights, then depending on the regularization parameter *lambda* and the variances of and correlations between features, you might see that some (or most) orderings are preserved. I did an experiment with a dataset of 100 features and plotted a histogram of ranking shifts over different choices of *lambda* and number of features involved.

When you have only 2 features, the ordering of the features is not changed. However, as we increase the number of features and/or the size of *lambda*, the histogram spreads out, which means there are more ranking shifts, and larger ones.
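If you want to reproduce this kind of experiment, here is a minimal sketch (smaller than my 100-feature run, with a made-up random covariance) that measures how far each feature’s rank by |weight| moves between the unregularized and ridge solutions:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 500, 10

# Correlated features with unequal variances: draw a random
# positive-definite covariance and sample from it.
A = rng.standard_normal((n, n))
cov = A @ A.T
X = rng.multivariate_normal(np.zeros(n), cov, size=m)
true_w = rng.standard_normal(n)
y = X @ true_w + 0.1 * rng.standard_normal(m)

def rank_by_magnitude(w):
    # ranks[i] = position of feature i when sorted by |weight|, largest first.
    order = np.argsort(-np.abs(w))
    ranks = np.empty(len(w), dtype=int)
    ranks[order] = np.arange(len(w))
    return ranks

w_ols = np.linalg.solve(X.T @ X, X.T @ y)
for lam in [0.1, 10.0, 1000.0]:
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    shifts = np.abs(rank_by_magnitude(w_ridge) - rank_by_magnitude(w_ols))
    print(lam, shifts.max())
```

Collecting `shifts` over many random datasets and plotting a histogram per *lambda* gives the spreading-out effect described above.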

Cheers!