This is my first post in this community, so please be gentle if I break any rules.
This question is in the context of [Deep Learning Specialization] → [Course 2 - Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization] → [Section: Setting up your Optimization Problem] → [Video: Vanishing/Exploding Gradients].
I think I understand why small derivatives (caused by very deep neural networks) can lead to slower training, since we have to take more steps to reach the optimum during gradient descent. However, I don't understand why a large derivative is a problem. As long as the large step follows the "curve" of the loss function J, it would get us to the "bottom of the valley" sooner, no? This was not explained in the video.
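To make the setting of my question concrete, here is a minimal sketch (my own toy example, not from the course) of the effect the video describes: in a deep network where each layer effectively multiplies by a weight slightly above 1 (I assume a single scalar weight w = 1.5 per layer for simplicity), the output, and hence the gradient scale, grows exponentially with depth.

```python
import numpy as np

w = 1.5   # assumed per-layer scalar weight, identical across layers
x = 1.0   # scalar input

for L in (10, 50, 100):
    # Forward pass of a "linear" deep net: y = w^L * x,
    # so the gradients also scale roughly like w^L.
    y = (w ** L) * x
    print(f"depth {L:3d}: output / gradient scale ~ {y:.3e}")
```

Running this shows the scale blowing up from ~5.8e1 at depth 10 to ~4.1e17 at depth 100, which I take to be the "exploding" part. My question is why that large gradient is actually harmful for training, rather than just a faster way downhill.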