Problem getting my Python code to perform regularization of linear regression

I cannot get the expected results for applying regularization to linear regression in Python. The weights get bigger with each loop iteration and not smaller as I expect.

Can anyone check my code and see where I am going wrong?

Thanks
main.py (1.1 KB)

Hi @ai_is_cool

You’re using L2 regularization, but it should only apply to the weights w, not to the bias b. Also, your reg_lambda is far too big!

Hope it helps!

I’m not applying Regularization to b.

Is the code correct?

Can you try changing the values of \lambda, \alpha and w to get it to work and reduce the weights with each iteration?

Comments:
You have a really large lambda value, a very small learning rate, and only 50 iterations.
Some experimentation seems to be worthwhile.

Rule of thumb: For fixed-rate gradient descent to work well numerically, the features should all be within an order of magnitude of each other (so roughly between -3.0 and +3.0).

The features in your example (including the polynomial terms) vary from -300 to 33,215 (or thereabouts).

So you might try pre-computing the polynomial terms (making X a matrix of size (m x p), where ‘p’ is the number of polynomial features you’re adding), then normalizing X, then running gradient descent.
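For reference, here is a minimal sketch of that pipeline. The data and variable names are illustrative only, not taken from the attached main.py:

```python
import numpy as np

def polynomial_features(x, degree):
    """Build an (m x p) matrix whose columns are x, x**2, ..., x**degree."""
    return np.column_stack([x ** k for k in range(1, degree + 1)])

def zscore_normalize(X):
    """Standardize each column to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# toy data spanning a wide range, like the original example
x = np.array([-3.0, 1.0, 5.0, 10.0])
y = np.array([9.0, 1.0, 25.0, 100.0])

X = polynomial_features(x, degree=4)     # shape (4, 4)
X_norm, mu, sigma = zscore_normalize(X)  # every column now roughly in [-3, 3]

# batch gradient descent with L2 regularization on w only (not on b)
m, p = X_norm.shape
w, b = np.zeros(p), 0.0
alpha, lam = 0.1, 1.0
for _ in range(1000):
    err = X_norm @ w + b - y
    dj_dw = (X_norm.T @ err) / m + (lam / m) * w
    dj_db = err.mean()
    w -= alpha * dj_dw
    b -= alpha * dj_db
```

Note that any new input at prediction time must be transformed with the same mu and sigma that were computed from the training set.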

I was setting \lambda= 10000 and \alpha = 0.001 because it was a value Prof. Ng set in his video lesson.

So I should be using z-score standardisation on a matrix of X and its integer powers up to 4?

It is certainly worth a try.

Do you know what particular situations, in terms of the values of X and w, regularization is best suited to combat?

Regularization is not a matter of the w and X values.

It’s a matter of the number of training examples (m) you have compared to the number of features (n).

The closer you are to 1:1, the more likely it is you will have overfitting, and the more regularization you will need to avoid it.
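As a toy numeric illustration of that m:n effect (random made-up data, not from the thread): when m is barely above n the least-squares fit chases noise, and even a modest L2 penalty shrinks the weight vector.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 5                      # m:n close to 1 -> prone to overfitting
X = rng.normal(size=(m, n))
y = rng.normal(size=m)           # pure-noise target: any fit here is overfitting

# ordinary least squares vs ridge regression (closed form, bias omitted for brevity)
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The ridge weight norm is always smaller than the unregularized one for \lambda > 0, which is exactly the shrinkage effect regularization is meant to provide.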

Hi @ai_is_cool,

You’re encountering a common issue in machine learning, especially with polynomial regression: exploding gradients. This happens if your features have very large values, which then lead to extremely large magnitude values in the gradient calculation:

dj_dw = [ 0.00000000e+00 -1.14980966e+06 -1.60325406e+07 -2.06758237e+08 -2.80586600e+09]
dj_db = -99443.36742287749

With gradients in the range of 10^6 to 10^9, your current learning rate of 0.001 is far too large. The easy fix is to set the learning rate to 1e-9 and increase the number of epochs. The better fix, as other mentors suggested, is to scale your features.
Another problem is with weight decay factor 1 - \alpha \frac{\lambda}{m}:
\alpha \frac{\lambda}{m} = 0.001 \times \frac{10000}{4} = 0.001 \times 2500 = 2.5
So, the factor becomes: 1 - 2.5 = -1.5. Your weight update rule effectively contains the component w_j \leftarrow -1.5 \cdot w_j - \text{gradient_descent_term}, which flips the sign of w_j and grows its magnitude by a factor of 1.5 in each iteration. Decreasing the learning rate also fixes this problem, but decreasing \lambda is the better solution.
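The sign flip is easy to verify numerically, using the same \alpha = 0.001, \lambda = 10000, m = 4 from the thread:

```python
alpha, lam, m = 0.001, 10000, 4

decay = 1 - alpha * lam / m          # 1 - 2.5 = -1.5: sign flips, magnitude grows
print(decay)                         # -1.5

# watch |w| blow up under w <- decay * w (ignoring the gradient term)
w = 1.0
for _ in range(5):
    w = decay * w
print(w)                             # (-1.5)**5 = -7.59375

# a small lambda keeps the factor just under 1, so |w| shrinks as intended
lam_small = 1.0
decay_small = 1 - alpha * lam_small / m   # 0.99975
```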


Ok thanks. I will try that.

It’s working now. I scaled the training example values to between -3 and 3 and set the number of epochs to 5e6.

Is it typical to have to set the number of epochs to such a high value?

When you have normalized features, you can use a larger learning rate (since the “exploding gradient” problem has been avoided), and then you will need fewer epochs.


Ok thanks.

I will try that.