I don’t quite follow Prof. Ng’s explanation of why ResNets work. In the lecture, Prof. Ng said it’s easy to learn the identity function. Does that mean a ResNet will tend to push the parameters W and b toward 0 and thus fall back on the identity mapping? And why does the ease of learning the identity function result in better performance for very deep NNs? Can anyone explain more on this topic?
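To make the “identity is easy” argument concrete, here is a minimal NumPy sketch of a residual block, a[l+2] = g(z[l+2] + a[l]), following the lecture’s notation with g = ReLU. The function name `residual_block` and the toy shapes are illustrative, not from the course code. The point: if weight decay pushes W and b toward 0, the block still outputs a[l], so stacking blocks can’t easily hurt the shallower network’s solution.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(a, W1, b1, W2, b2):
    # Main path: two linear layers with a ReLU in between.
    a1 = relu(W1 @ a + b1)
    z2 = W2 @ a1 + b2
    # Skip connection: a[l] is added back before the final activation,
    # so a[l+2] = g(z[l+2] + a[l]).
    return relu(z2 + a)

n = 4
a = np.array([1.0, 2.0, 0.5, 3.0])  # a[l], nonnegative post-ReLU activations

# When W and b shrink to 0, z[l+2] = 0, so the block reduces to g(a[l]) = a[l]:
W0 = np.zeros((n, n))
b0 = np.zeros(n)
out = residual_block(a, W0, b0, W0, b0)
print(np.allclose(out, a))  # -> True: the identity is recovered "for free"
```

So zeroed-out weights don’t collapse the layer to outputting zeros (as they would in a plain network); they collapse it to the identity, which is why adding extra residual blocks rarely makes a deep network worse.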
Hi 1157350959,
Maybe this explanation clarifies?
Yes! That helps a lot, thanks!
It would be a big help to add a pointer to this writeup in the course notes.
$0.02,
Nidhi