Early stopping - large w

Hello

In the figure shown on the early-stopping slide, why is "large w" written below the figure? I can understand that early stopping prevents the weights from getting very large, but I can't understand why the weights get bigger with each iteration.

Hey @Erfan_Brv,
It could simply be due to exploding gradients: when the gradients become too large, the update steps are also comparatively large, and they can result in large values of w.
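
To see the effect of the update-step size, here is a minimal sketch in plain Python (made-up numbers, not the course code): the same learning rate produces a far larger change in w when the gradient is huge.

```python
alpha = 0.01           # learning rate
w = 2.3                # current weight value (hypothetical)

normal_grad = 0.5      # a typical gradient magnitude
exploded_grad = 5000.0 # an exploded gradient magnitude

print(w - alpha * normal_grad)    # 2.295 -> small change in w
print(w - alpha * exploded_grad)  # -47.7 -> |w| blows up in a single step
```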

Additionally, even if the network is free from the issue of exploding gradients, it could simply be due to a large number of update steps, i.e., a large number of iterations. Consider a case where we start with a small random value for w, say 2.3, and keep adding a small value like 0.01 for 10,000 iterations; the resulting value would be roughly 102.3, which is quite large compared to 2.3. I am sure you can imagine what happens when the number of iterations increases further, or when the value added in each iteration gets larger.
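
This second point is easy to verify with a couple of lines (a toy sketch, not actual training code):

```python
# Repeatedly adding a tiny amount still produces a large w after many iterations.
w = 2.3
for _ in range(10_000):
    w += 0.01      # pretend every update nudges w up by 0.01
print(w)           # ~102.3, i.e. much larger than where we started
```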

Now, these are just some of the reasons why this may happen; it doesn't mean they always hold. For instance, in the case of vanishing gradients, this may not be true. Additionally, it's highly unlikely that only positive (or only negative) values are added in every iteration. It's perfectly fine for some iterations to add positive values and others to add negative values, nullifying the overall effect.

I hope this helps.

Regards,
Elemento

Thank you so much Elemento

Hi, at every back propagation we update w as (for layer 1) w[1] = w[1] - alpha * (dJ/dw[1]), and so on for the other layers. This updated w is then passed from back propagation to forward propagation for the next iteration. So my doubt is: w should always decrease with every iteration compared to the previous one for that layer, because of the minus sign, unless dJ/dw is negative. Please clarify.

Hey @Ekta1,
Welcome to the community. You are correct.
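
For concreteness, here is a toy sketch (made-up numbers, not the course code) of the point you raised: with the update w = w - alpha * dJ/dw, w only shrinks as long as dJ/dw is positive.

```python
alpha = 0.1
w = 2.0   # hypothetical weight value

dJ_dw_positive = 3.0
dJ_dw_negative = -3.0

print(w - alpha * dJ_dw_positive)  # 1.7 -> w decreases when dJ/dw > 0
print(w - alpha * dJ_dw_negative)  # 2.3 -> w increases when dJ/dw < 0
```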

Cheers,
Elemento