In the lecture video "Why ResNets Work", at the 4:56 mark, I cannot understand the statement below. Can someone please help clarify?
"And what goes wrong in very deep plain networks, that is, very deep networks without these residual (skip) connections, is that when you make the network deeper and deeper, it's actually very difficult for it to choose parameters that learn even the identity function, which is why a lot of layers end up making your result worse rather than better."
Doubt 1: What does "difficult for it to choose parameters" mean?
Doubt 2: In "it's actually very difficult for it to choose parameters that learn even the identity function", what does the phrase "that learn even the identity function" mean?
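For context, here is a minimal NumPy sketch (my own illustration, not code from the course) of the point I think the quote is making: a plain layer can only copy its input forward if its weights land on a very specific setting (the identity matrix), whereas a residual block `a[l+2] = g(z[l+2] + a[l])` already computes the identity at the easy-to-reach setting where the weights are zero.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
# a[l]: a hypothetical activation vector, non-negative as after a ReLU
a = relu(rng.standard_normal(4))

# Plain layer: output = relu(W @ a + b). To pass a[l] through unchanged,
# gradient descent must find W close to the identity matrix, a specific
# point in parameter space that is hard to hit exactly.
W, b = np.eye(4), np.zeros(4)
assert np.allclose(relu(W @ a + b), a)

# Residual block: output = relu(W @ a + b + a). At the trivial setting
# W = 0, b = 0 (the point weight decay pushes toward), the block already
# computes the identity, so extra layers can do no harm by default.
W0, b0 = np.zeros((4, 4)), np.zeros(4)
assert np.allclose(relu(W0 @ a + b0 + a), a)
```

So "learn even the identity function" means: in a plain deep network, just reproducing the input (doing nothing) requires the optimizer to find those specific identity-like weights, which it often fails to do, and that is why adding layers can hurt.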