ResNet Lecture Clarification

Hi Sir,

In the lecture video Why ResNets Work, at 4:56, we cannot understand the statement below. Can someone please help clarify?

Statement
What goes wrong in very deep plain networks, that is, very deep networks without the residual skip connections, is that when you make the network deeper and deeper, it's actually very difficult for it to choose parameters that learn even the identity function, which is why a lot of layers end up making your result worse rather than better.

Doubt 1
What does "difficult to choose parameters" mean?

Doubt 2

In "it's actually very difficult for it to choose parameters that learn even the identity function", what does the phrase "that learn even the identity function" mean?

"Difficult to choose parameters" here means it is harder for the network to learn the right weights and biases for its layers. That is because, by that depth, the gradients may have diminished to the point where the network can barely update anything. An identity function is a function that returns its input unchanged as its output. The point being made is that even something as simple as the identity function becomes hard for a deep plain network to learn, because the layers' weights would have to converge to exactly the identity mapping.
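A small NumPy sketch of this idea (the layer sizes and weight values are made up purely for illustration): a plain layer can only pass its input through unchanged if its weight matrix lands exactly on the identity, whereas a residual layer gets (near-)identity behavior for free when its weights shrink toward zero, since the skip connection adds the input back in.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# Plain layer: out = relu(W @ x). To reproduce x, W must be learned to
# equal the identity matrix exactly -- a very specific target for
# gradient descent to hit.
W_plain = np.eye(4)  # the precise weights the plain layer would have to learn
plain_out = np.maximum(W_plain @ x, 0)

# Residual layer: out = relu(W @ x + x). If training (e.g. weight decay)
# pushes W toward zero, the layer still outputs relu(x) -- it behaves as
# an identity (up to the nonlinearity) without learning anything special.
W_res = np.zeros((4, 4))  # "doing nothing" already gives identity behavior
res_out = np.maximum(W_res @ x + x, 0)

print(np.allclose(res_out, np.maximum(x, 0)))  # True
```

This is why, in the lecture's argument, extra residual layers do not easily hurt performance: their "do nothing" configuration (weights near zero) is trivial to reach, while a plain layer's "do nothing" configuration (weights equal to the identity) is not.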

Thank you, sir, for the first answer. Regarding the second answer: how does even the identity function become hard for the network to learn?