The professor said that "very deep neural networks are difficult to train because of vanishing and exploding gradient problems," and that this leads to the result shown in the graph on the left (the rising training error).
Q1. I think that an "exploding gradient" could lead to "rising training error," because an exploding gradient makes the gradient descent step too big.
Is my reasoning correct? Is there any other way an "exploding gradient" could lead to "rising training error"?
Q2. I do not think that a "vanishing gradient" could lead to "rising training error," because a vanishing gradient makes the gradient descent step too small, which leads to almost no change in the parameters (and therefore in the training error).
Is my reasoning correct? Is there any way a "vanishing gradient" could lead to "rising training error"?
Yes, exactly: the oversized update steps make the weights grow too large, and they may eventually become NaN during training; when that happens, the training error skyrockets.
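To see the mechanism concretely, here is a minimal toy sketch (my own illustration, not from the lecture): plain gradient descent on the quadratic loss L(w) = w², with a step size so large that every update overshoots the minimum. The weight grows by a constant factor each iteration, so the loss rises and eventually overflows, which is exactly the "skyrocketing training error" behaviour.

```python
import math

# Toy illustration of the exploding-update mechanism (assumed setup, not from
# the lecture): gradient descent on L(w) = w^2 with a far-too-large step size.
w = 1.0
lr = 100.0                     # each update multiplies w by (1 - 2*lr) = -199

for step in range(80):
    grad = 2.0 * w             # dL/dw
    w = w - lr * grad          # gradient-descent update overshoots the minimum
    loss = w * w
    if not math.isfinite(loss):
        print(f"step {step:2d}: loss overflowed to {loss} -- in a real network "
              "the weights soon turn into NaN")
        break
    if step % 10 == 0:
        print(f"step {step:2d}: loss = {loss:.3e}")
```

In a real deep network the oversized steps come from the product of many layer Jacobians rather than from a single bad learning rate, but the effect on the training error is the same.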
Yes. Keep in mind that the parameters of the higher layers may still change significantly, whereas the parameters of the lower layers barely change (or do not change at all). Either way, the model is still learning, just very slowly, so the training error will flatten out rather than increase, as long as you do not do anything strange with the learning rate.
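To make the "lower layers barely move" part concrete, here is a small NumPy sketch (my own toy example, with made-up layer sizes): one forward/backward pass through a deep stack of sigmoid layers, printing the gradient norm of each weight matrix. The norms shrink roughly geometrically toward the input.

```python
import numpy as np

# Toy illustration (assumed architecture, not from the lecture): a deep stack
# of fully connected sigmoid layers; one forward/backward pass on a squared
# error, then print the gradient norm of every layer's weight matrix.
rng = np.random.default_rng(0)
n_layers, width = 20, 64

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

Ws = [rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
      for _ in range(n_layers)]

# Forward pass, keeping every activation for backprop.
activations = [rng.normal(size=(width, 1))]
for W in Ws:
    activations.append(sigmoid(W @ activations[-1]))

# Backward pass for L = 0.5 * ||a_top - y||^2 with target y = 0.
top = activations[-1]
delta = top * top * (1.0 - top)              # dL/dz at the top layer
grad_norms = [0.0] * n_layers
for l in reversed(range(n_layers)):
    grad_norms[l] = np.linalg.norm(delta @ activations[l].T)   # ||dL/dW_l||
    if l > 0:
        a = activations[l]                   # sigmoid output feeding layer l
        delta = (Ws[l].T @ delta) * a * (1.0 - a)

for l, g in enumerate(grad_norms):
    print(f"layer {l:2d}  ||dL/dW|| = {g:.3e}")
```

The top layers (large norms) still receive meaningful updates, which is why the error keeps inching down instead of rising.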