Gradients dj_dw, dj_db not converging simultaneously

dj_dw, dj_db: the gradients of the weights and of the bias (a scalar), as highlighted in the Week 2 lecture on gradient descent.

I used gradient descent to implement multiple linear regression. The problem is that the gradient dj_dw converges to zero, but at the same time the gradient dj_db does not converge to zero and instead becomes constant.


For the training output above, the value of dj_dw is close to 0 while dj_db stays roughly constant at about -290.


Check if your model is overfitting.

Actually, I would like Raymond to reply to this; he would be the right person to explain it.
@rmwkwok, can you please have a look?

Also, Vivek, if this is not a course-related assignment, then kindly give details about your model and what you are looking for.

Regards
DP

If the linear regression model is y = wx + b, then dj_dw will keep changing as you fit the data; these w values are the tunable weights. dj_db corresponds to b, which is a constant offset!

There’s no rule that says that the weights and bias must converge at the same rate.
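To make that concrete, below is a minimal NumPy sketch (the toy data, the learning rate, and all variable names are purely illustrative, not the learner's actual setup) that runs plain gradient descent for y = \vec{w} \cdot \vec{x} + b and prints both gradients along the way:

```python
import numpy as np

# Toy multiple linear regression data (illustrative only, not the learner's dataset):
# y = 3*x1 + 5*x2 + 100 + noise, with features in [0, 10] but an intercept of 100.
rng = np.random.default_rng(0)
m = 100
X = rng.uniform(0, 10, size=(m, 2))
y = X @ np.array([3.0, 5.0]) + 100.0 + rng.normal(0.0, 1.0, m)

w = np.zeros(2)   # weight vector
b = 0.0           # bias (scalar)
alpha = 0.01      # learning rate, chosen just for this demo

for i in range(1001):
    err = X @ w + b - y        # per-sample prediction error
    dj_dw = X.T @ err / m      # gradient w.r.t. the weights
    dj_db = err.mean()         # gradient w.r.t. the bias
    w -= alpha * dj_dw
    b -= alpha * dj_db
    if i % 200 == 0:
        print(f"iter {i:4d}  |dj_dw| = {np.linalg.norm(dj_dw):9.3f}  dj_db = {dj_db:9.3f}")
```

In runs like this, |dj_dw| typically drops well below |dj_db| while b is still far from the true intercept, simply because the features and the constant term sit on very different scales; feature scaling or more iterations usually narrows that gap.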

Hello @Deepti_Prasad,

When we speak about “large” and “small”, we need to know what we are comparing with.

The cost values (assuming squared cost) are telling us that the averaged error per sample is pretty large (estimate: \sqrt{50000000} \approx 7000).

Assuming that the model's error is around 5% of the label values, the labels are on the order of 7000/0.05 = 140,000.

We know that the model is y = \vec{w} \cdot \vec{x} + b , and that each update changes b by roughly \alpha \times 290 = 290\alpha, so whether such a change is large or not should be judged against the labels: \frac{290\alpha}{140000} \approx \frac{\alpha}{480}.

The question is, is \frac{\alpha}{480} large?
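Just to make the arithmetic concrete, here is the same back-of-envelope estimate in a few lines of Python (the cost of 50,000,000, the 5% figure, and the learning rate below are assumed numbers for illustration, not the learner's actual values):

```python
import math

cost = 50_000_000                            # assumed squared-error cost read off the plot
err_per_sample = math.sqrt(cost)             # ~7000: average error per sample
label_scale = err_per_sample / 0.05          # if the error is ~5% of the labels -> ~140,000

dj_db = 290                                  # magnitude of the "stuck" bias gradient
alpha = 0.001                                # hypothetical learning rate, illustration only

change_in_b = alpha * dj_db                  # per-update change of b
relative_change = change_in_b / label_scale  # compare against the label scale

print(f"error per sample     ~ {err_per_sample:,.0f}")
print(f"label scale          ~ {label_scale:,.0f}")
print(f"per-step change of b ~ {change_in_b:.3f} ({relative_change:.2e} of the label scale)")
```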

Lastly, note that the above are only my estimates; the learner is responsible for providing the actual numbers.

Cheers,
Raymond

(I may not be able to follow up on this thread; please help me, @Deepti_Prasad)

Hello Raymond,

What do you mean by this? Even I couldn’t reply to the learner’s query, as only his training image was posted, without much information about the dataset, model, or batch size.

From the image, all I understood was that gradient descent learned enough about the model and then lost its way due to some error in the analysis; I cannot interpret further, as I do not have the full information.

Regards
DP

I mean that I may not be able to respond promptly to the learner’s future feedback, so if they do follow up, please help take care of it.

Cheers,
Raymond