C4 W2 "Why ResNets Work?" Question about the insight

I loved the Course 4, Week 2 video “Why ResNets Work?”. It gave me some intuition about why adding the identity via a skip connection protects against vanishing gradients.

Maybe the answer to this is in another video, but what about exploding gradients? It seems like to protect against those we’d want to add, I don’t know, some kind of log() function on the Wx+b term, in addition to the identity from the skip connection, e.g. g( (Wx+b) + (a_-1) + log(Wx+b) )

In other words, without something like this, do skip connections protect ResNets against exploding gradients as well as vanishing ones?

Thank you Andrew & Mentors! This course is great!


(Btw, I searched to see if this question had been asked already and couldn’t find it. Apologies if I just missed it somehow.)

It’s an interesting suggestion. It’s been a while since I listened to that lecture, but my memory is that Prof Ng says the skip connections help moderate the behavior of forward and back propagation in general, keeping things from “going off the rails” in either direction. That would mean the existing Residual Net architecture helps with both vanishing and exploding gradients. At least that’s what I remember. If you see anything in the lectures that you think disagrees with that, please post a reply with a pointer to where you found it (video and time offset). Thanks!
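If it helps to see the vanishing-gradient half of that intuition in numbers, here is a rough toy sketch in NumPy. It is not from the lecture: it assumes purely linear layers (no activation g), small random weights, and made-up names (`end_to_end_jacobian_norm`, `weight_scale`). For a plain stack the end-to-end Jacobian is the product of the W matrices, while for a residual stack each factor becomes I + W because of the skip connection.

```python
import numpy as np


def end_to_end_jacobian_norm(depth, dim, weight_scale, residual, seed=0):
    """Spectral norm of d(output)/d(input) for a stack of linear layers.

    Toy assumption: each layer is x -> W x (plain) or x -> x + W x (residual),
    with no bias and no nonlinearity.
    """
    rng = np.random.default_rng(seed)
    jac = np.eye(dim)
    for _ in range(depth):
        w = rng.normal(0.0, weight_scale, size=(dim, dim))
        # The skip connection adds the identity to each layer's Jacobian.
        layer_jac = np.eye(dim) + w if residual else w
        jac = layer_jac @ jac
    # ord=2 on a 2-D array gives the largest singular value.
    return np.linalg.norm(jac, ord=2)


plain_norm = end_to_end_jacobian_norm(50, 16, 0.01, residual=False)
residual_norm = end_to_end_jacobian_norm(50, 16, 0.01, residual=True)
print(f"plain stack:    {plain_norm:.3e}")
print(f"residual stack: {residual_norm:.3e}")
```

With small weights the plain product collapses toward zero, while the residual product stays within a few orders of magnitude of 1 because every factor is close to the identity. Note that this toy only covers the small-weight case. If you raise `weight_scale` a lot, both versions grow, so it does not by itself show that skip connections bound exploding gradients.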