C4 W2 "Why ResNets Work?" Question about the insight

kens · February 5, 2022, 5:46pm

I loved the Course 4, Week 2 video “Why ResNets Work?” It gave me some intuition about why adding the identity via a skip connection protects against vanishing gradients.

Maybe the answer to this is in another video, but what about exploding gradients? It seems like to protect against that we’d want to add, I don’t know, some kind of log() function on the Wx+b term, in addition to the identity from the skip connection, e.g. g( (Wx+b) + (a_-1) + log(Wx+b) ), where a_-1 is the earlier activation carried over by the skip.

In other words, without something like this, does the skip connection protect ResNets against exploding gradients as well as vanishing ones?

Thank you Andrew & Mentors! this course is great!

Ken

(btw, I searched to see if this question has been asked already & couldn’t find it. Apologies if I just missed it somehow.)

paulinpaloalto · May 27, 2022, 8:45pm

It’s an interesting suggestion. It’s been a while since I listened to that lecture, but my memory is that Prof Ng says the skip connections help moderate the behavior of forward and back propagation in general, keeping things from “going off the rails” in either direction. In other words, the existing Residual Net architecture helps with both vanishing and exploding gradients. At least that’s what I remember. If you see anything in the lectures that you think disagrees with that, please post a reply with a pointer to where you found it (video and time offset). Thanks!
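If anyone wants to poke at this numerically, here is a minimal NumPy sketch, not anything from the course materials. It stacks toy layers of the form a_next = relu(W a) (plain) or a_next = relu(W a + a) (with the skip), backpropagates by hand, and returns the norm of the gradient with respect to the input. The function name `input_gradient_norm`, the parameter `weight_scale`, the random Gaussian weights, and the loss L = sum(a_out) are all assumptions made up for illustration.

```python
import numpy as np

def input_gradient_norm(depth, width, weight_scale, residual, seed=0):
    """Norm of dL/d(input) for a toy ReLU stack, with L = sum(final activations).

    Hypothetical helper for illustration only: random weights, no biases,
    and one skip per layer instead of the two-layer blocks used in the course.
    """
    rng = np.random.default_rng(seed)
    a = np.abs(rng.standard_normal(width))  # non-negative, like a post-ReLU activation
    weights, masks = [], []

    # Forward pass: remember each W and which units were active.
    for _ in range(depth):
        W = weight_scale * rng.standard_normal((width, width)) / np.sqrt(width)
        pre = W @ a + (a if residual else 0.0)  # the skip adds the identity term
        mask = (pre > 0).astype(float)          # derivative of ReLU
        a = pre * mask
        weights.append(W)
        masks.append(mask)

    # Backward pass: start from dL/da_out = 1 for every unit.
    grad = np.ones(width)
    for W, mask in zip(reversed(weights), reversed(masks)):
        grad = grad * mask                                # back through ReLU
        grad = W.T @ grad + (grad if residual else 0.0)   # back through W (+ identity)
    return np.linalg.norm(grad)

if __name__ == "__main__":
    for scale in (0.1, 1.0, 3.0):
        plain = input_gradient_norm(30, 32, scale, residual=False)
        skip = input_gradient_norm(30, 32, scale, residual=True)
        print(f"weight_scale={scale}: plain={plain:.3e}  residual={skip:.3e}")
```

With small weights, the plain stack's gradient shrinks layer by layer, while the identity term in the residual version keeps it from collapsing. Trying larger `weight_scale` values is a quick way to see how each variant behaves on the exploding side in this toy setting. It is only a toy, so it says nothing definitive about real ResNets with batch norm and trained weights.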
