dj_dw, dj_db: the gradients of the cost with respect to the weights and the scalar bias, as highlighted in the W2 lecture on gradient descent.

I used gradient descent to implement multiple linear regression. The problem is that the gradient dj_dw converges to zero, but the gradient dj_db does not converge to zero and instead becomes constant.

If the linear regression model is y = wx + b, then dj_dw will keep changing as you fit the data, since w holds the tunable weights. The dj_db, where b is a constant, will be an offset!
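For reference, here is a minimal sketch of how both gradients are typically computed for multiple linear regression. It assumes the squared-error cost J = (1/2m) Σ (w·x + b − y)², which is a common convention but is not confirmed by the learner's post. The function name `compute_gradient`, the toy data, and the learning rate are all hypothetical. On this noise-free, well-scaled toy data both dj_dw and dj_db shrink toward zero, because the bias b is updated in the same loop as the weights.

```python
import numpy as np

def compute_gradient(X, y, w, b):
    """Gradients of J = (1/2m) * sum((X @ w + b - y)**2) (assumed cost)."""
    m = X.shape[0]
    err = X @ w + b - y        # prediction error per sample, shape (m,)
    dj_dw = X.T @ err / m      # one entry per feature
    dj_db = err.sum() / m      # mean error, a scalar
    return dj_dw, dj_db

# Hypothetical toy data: 200 samples, 3 features, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0

w, b, alpha = np.zeros(3), 0.0, 0.1
for _ in range(2000):
    dj_dw, dj_db = compute_gradient(X, y, w, b)
    w -= alpha * dj_dw         # weight update
    b -= alpha * dj_db         # bias update, driven by dj_db

print(dj_dw, dj_db, w, b)
```

If dj_db stays flat on real data while dj_dw shrinks, printing both values every few hundred iterations alongside the cost is a cheap way to see whether b is still moving in a meaningful way.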

When we speak about “large” and “small”, we need to know what we are comparing with.

The cost values (assuming a squared-error cost) are telling us that the typical error per sample is pretty large (estimate: $\sqrt{50000000} \approx 7000$).

Assuming that the model has reached roughly 5% relative error, the labels are on the order of $7000/0.05 = 140000$.

We know that the model is $y = \vec{w} \cdot \vec{x} + b$, and that the change in $b$ per step is $\approx 290\alpha$, so whether such a change is large or not depends on how it compares with the labels: $\frac{290\alpha}{140000} \approx \frac{\alpha}{480}$.

The question is: is $\frac{\alpha}{480}$ large?
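The arithmetic behind this estimate can be reproduced in a few lines. This is only a sketch of the estimate above: the cost of 50,000,000, the 5% figure, and the value 290 (read here as the size of dj_db, since each step changes b by α · dj_db) are the rough numbers from this reply, not measured values, and the variable names are made up for illustration.

```python
import math

cost = 50_000_000      # rough cost value used in the estimate (assumed)
rel_error = 0.05       # assumed relative error of the model
dj_db = 290            # rough magnitude of dj_db (assumed)

rmse = math.sqrt(cost)               # typical per-sample error, about 7000
label_scale = 7000 / rel_error       # implied size of the labels
denominator = label_scale / dj_db    # change in b relative to labels is alpha / denominator

print(round(rmse), round(label_scale), round(denominator))
```

With the learner's actual cost, gradient, and learning rate plugged in, the same three lines show how big each update to b is relative to the labels.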

Lastly, note that the above are my estimations, but the learner is responsible for providing the actual numbers.

Cheers,
Raymond

(I may not be able to follow up on this thread, please help me, @Deepti_Prasad)

What do you mean by this? Even I couldn’t reply to the learner’s query, as only his training image was posted, without much information about the dataset, model, or batch size.

From the image, all I understood is that gradient descent learnt enough about the model and then lost its way due to some error in the analysis. I can’t interpret it any further because I do not have the full information.