Vanishing/Exploding Gradient

Won’t increasing the learning rate help?

If your problem is vanishing gradients, it might help. If you have the opposite problem of exploding gradients, it will make matters worse; in that case you could try reducing the learning rate instead. As with everything here, there is no “one size fits all” solution, and whether you can solve the problem just by manipulating the learning rate will depend on the particulars of your model architecture and your data.
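
To make that concrete, here is a minimal NumPy sketch of a single gradient descent step with made-up gradient magnitudes. It shows why scaling the learning rate pushes the two failure modes in opposite directions:

```python
import numpy as np

# Plain gradient-descent update: w_new = w - lr * grad.
# The gradient magnitudes below are made up for illustration.
def sgd_step(w, grad, lr):
    return w - lr * grad

w = np.array([0.5])
print(sgd_step(w, np.array([1e-10]), lr=0.01))  # step ~1e-12: weight barely moves
print(sgd_step(w, np.array([1e-10]), lr=1e8))   # much larger lr makes the step visible
print(sgd_step(w, np.array([1e10]),  lr=0.01))  # step ~1e8: already diverging
```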

Note that there are more sophisticated algorithms for managing the learning rate dynamically. Prof Ng will show us a couple of methods here in C2 Week 2. Then when we switch to using TensorFlow in Week 3, that will be managed internally for us.
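One such scheme is learning rate decay, where the rate shrinks as training progresses. Here is a minimal sketch of the 1 / (1 + decay_rate * epoch) form; the hyperparameter values are made up and would need tuning:

```python
# Learning rate decay: alpha = alpha0 / (1 + decay_rate * epoch).
# alpha0 and decay_rate are hyperparameters you would tune.
def decayed_lr(alpha0, decay_rate, epoch):
    return alpha0 / (1 + decay_rate * epoch)

for epoch in range(5):
    print(epoch, decayed_lr(alpha0=0.1, decay_rate=1.0, epoch=epoch))
```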


For exploding gradients, it is better to clip the gradients in case they exceed a certain threshold (see the sketch below).
Also, in a vanishing gradient scenario the gradients are on the order of 1e-10 or even smaller. Increasing the learning rate would not really help in those cases.
But, as noted above, there is never a “one size fits all” solution.
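
For reference, here is one common way to implement clipping: rescale the gradients whenever their global norm exceeds a threshold. This is a NumPy sketch, not the course's code, and max_norm is a hyperparameter you would choose:

```python
import numpy as np

# Clip a list of gradient arrays by their combined (global) norm.
# If the global norm exceeds max_norm, rescale all gradients so
# the combined norm equals max_norm; otherwise leave them alone.
def clip_by_global_norm(grads, max_norm):
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # global norm = 13
clipped = clip_by_global_norm(grads, max_norm=1.0)
print(clipped)  # rescaled so the combined norm is 1.0
```

In Keras you can get similar behavior without writing this yourself, by passing clipnorm or clipvalue when constructing an optimizer.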