Vanishing and exploding gradients

From my understanding, the reason gradients matter in deep learning is that they give the DIRECTION in which the loss function decreases fastest. If that is the case, then we don’t really care about the magnitude of the gradient, so why are we bothered by vanishing and exploding gradients? Is it because it becomes difficult to extract the direction when the entries are too small or too large (a floating-point precision problem)?

It’s not only the direction but also the magnitude by which the step towards an optimum is made. If the gradients explode, you can overshoot (jump past) the optimum and wander around for a long time without ever finding it. If the gradients vanish, each step makes very little progress, so reaching an optimum can take a very long time and waste a lot of resources for little gain.
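Here is a minimal sketch (my own illustration, not from the thread) of gradient descent on f(w) = w², where a `scale` factor stands in for a gradient that has been amplified or shrunk by many layers; the function and variable names are assumptions.

```python
# Sketch: gradient descent on f(w) = w^2, showing that the gradient's
# magnitude (not just its direction) sets the step size W_new = W_old - lr * grad.

def grad(w, scale):
    # scale mimics a gradient amplified or shrunk by many layers
    # (exploding / vanishing); the true gradient of w^2 is 2w.
    return scale * 2 * w

def descend(lr, scale, steps=5, w=1.0):
    for _ in range(steps):
        w = w - lr * grad(w, scale)
    return w

lr = 0.1
print(descend(lr, scale=1.0))    # healthy gradient: converges toward 0
print(descend(lr, scale=100.0))  # "exploding": steps overshoot and diverge
print(descend(lr, scale=1e-6))   # "vanishing": w barely moves from 1.0
```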

But can we not extract just the direction and use an arbitrary learning rate? It won’t quite reach the optimum, but given a small enough learning rate, it should get within some radius of it, I think?

The new weights after each step of gradient descent are W_new = W_old − (α * dE/dW), so the step is the learning rate multiplied by the gradient: if the gradient vanishes or explodes, the step shrinks or blows up no matter which α you pick.
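For illustration, here is a rough sketch of that update next to the “direction only” idea from the question above, using numpy; the function names and the normalize-by-norm choice are my assumptions, not something stated in the thread.

```python
# Sketch: the standard update from the reply above vs. a "direction only" update.
import numpy as np

def standard_update(w, grad, lr):
    # W_new = W_old - lr * dE/dW : step size scales with the gradient's magnitude
    return w - lr * grad

def normalized_update(w, grad, lr, eps=1e-12):
    # Keep only the gradient's direction; the step length is just lr.
    # This is essentially normalized gradient descent; a softer, more common
    # variant in practice is gradient (norm) clipping.
    return w - lr * grad / (np.linalg.norm(grad) + eps)

w = np.array([1.0, -2.0])
tiny_grad = np.array([1e-8, -2e-8])          # a "vanished" gradient
print(standard_update(w, tiny_grad, 0.1))    # barely moves: step ~1e-9
print(normalized_update(w, tiny_grad, 0.1))  # still takes a step of length 0.1
```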