Week 2, ResNets (Identity Function)

What is the significance of identity functions in deep layers? How is it different from simply NOT having those intermediate layers whose activations are going to zero anyway, so that a layer just takes on the value of some earlier layer through the identity function?
What I basically want to ask is: what is the point of having a layer whose activation is going to reach zero anyway, if we are just going to give it the value of some previous layer? Isn’t that like starting from the beginning after traversing a certain distance? Isn’t that a waste of computational power?

Hi @Moneet_Mohan_Devadig, the intuition is that you need a very deep neural network in some cases, for example when interpreting or translating long text sequences. In those deep networks, the layers you refer to should not go to zero activation if you need higher accuracy from the model. E.g. you want words in an earlier part of the sentence to be related to a later part of the sentence. The identity function acts as a kind of “memory” of the earlier part of the sentence that you feed into the layers interpreting the later part, so it is helpful in improving the accuracy of the later layers of the network.

Hope this helps. I am sure Prof Ng explained this in the videos in a much better way than I do now, so it might be useful to re-watch the videos on the intuition of the identity functions.
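To make the skip connection concrete, here is a minimal sketch of the forward pass of a residual (“identity”) block in plain NumPy. This is just an illustration with made-up shapes and a hand-rolled `relu` helper, not the implementation from the course notebook: the point is that the earlier activation `a_prev` is added back in right before the final activation, so the block’s output can carry that “memory” forward even if its own weights produce very small values.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block_forward(a_prev, W1, b1, W2, b2):
    """Forward pass of a simple two-layer residual ("identity") block.

    The skip connection adds a_prev back in before the final activation:
    a = relu(z2 + a_prev).
    """
    z1 = W1 @ a_prev + b1
    a1 = relu(z1)
    z2 = W2 @ a1 + b2
    # Skip connection: the earlier activation is added back in here.
    return relu(z2 + a_prev)

# Toy example with made-up sizes: a 4-unit activation passed through the block.
rng = np.random.default_rng(0)
a_prev = relu(rng.standard_normal(4))          # pretend output of a previous ReLU layer
W1, b1 = 0.1 * rng.standard_normal((4, 4)), np.zeros(4)
W2, b2 = 0.1 * rng.standard_normal((4, 4)), np.zeros(4)
print(residual_block_forward(a_prev, W1, b1, W2, b2))
```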

So can we say that the identity blocks act as insurance in case the activation value of any layer goes to zero?

Hi @Moneet_Mohan_Devadig, yes, it’s a kind of insurance that helps the neural network remember information from earlier layers.

But this particular operation is simply a linear operation, so it has no significance other than ensuring that gradient values don’t reach extremes, isn’t it?

Hi @Moneet_Mohan_Devadig, I am not exactly sure I understand your question. The skip connection mainly ensures that gradients in the deeper part of the network don’t go to zero, and that the neural network can memorize constructs from earlier in the network. For values going to extremes/exploding (if that is what you mean), the other mechanisms discussed in Course 2 are available.

I understood it to mean that, in case there is something to be learnt, the layer will learn it; otherwise it can shut itself off by essentially turning into an identity function.

This gives us more flexibility at the ‘design’ phase: we can go with deeper networks and let the model figure out whether it really needs that depth or whether a shallower network will do.

Does that make sense?
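As a rough illustration of that “shut itself off” intuition (again a toy sketch with made-up shapes, not assignment code): if training drives the block’s weights and biases toward zero, the pre-activation collapses to `z2 + a_prev ≈ a_prev`, and since `a_prev` already came out of a ReLU it is non-negative, so `relu(a_prev) = a_prev` and the whole block just passes its input through unchanged.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block_forward(a_prev, W1, b1, W2, b2):
    a1 = relu(W1 @ a_prev + b1)
    z2 = W2 @ a1 + b2
    return relu(z2 + a_prev)   # skip connection

# If the block's weights and biases are driven to (near) zero, z2 vanishes and the
# block reduces to relu(a_prev). Because a_prev came out of a previous ReLU it is
# non-negative, so the block simply returns its input: an identity function.
a_prev = np.array([0.7, 0.0, 1.3, 0.2])
zero_W, zero_b = np.zeros((4, 4)), np.zeros(4)
out = residual_block_forward(a_prev, zero_W, zero_b, zero_W, zero_b)
print(np.allclose(out, a_prev))  # True: the block has learned to "do nothing"
```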

Yes, that sounds like a good intuitive way to describe how “skip” layers affect the ResNet architecture. If I recall what Prof Ng says in those lectures, he talks about the skip connections having a “moderating” influence on the gradients, making it easier to train a deeper network without running into vanishing or exploding gradient issues. I think you could interpret that as the same intuition as yours, only expressed in terms of the gradients, which are the “forcing function” for the learning process.
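To connect that gradient intuition to something checkable, here is a small numerical sketch (arbitrary tiny weights and a finite-difference Jacobian helper, nothing from the course materials): without the skip connection, the block’s Jacobian with respect to its input shrinks toward zero when the weights are tiny, so backpropagated gradients vanish; with the skip connection, the Jacobian stays close to the identity, so gradients pass back through the block almost untouched.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def plain_block(a, W1, W2):
    """Two layers WITHOUT a skip connection."""
    return relu(W2 @ relu(W1 @ a))

def residual_block(a, W1, W2):
    """The same two layers WITH a skip connection."""
    return relu(W2 @ relu(W1 @ a) + a)

def numerical_jacobian(f, a, eps=1e-5):
    """Finite-difference Jacobian of f at a (good enough for an illustration)."""
    n = a.size
    J = np.zeros((n, n))
    for i in range(n):
        da = np.zeros(n)
        da[i] = eps
        J[:, i] = (f(a + da) - f(a - da)) / (2 * eps)
    return J

rng = np.random.default_rng(1)
a = relu(rng.standard_normal(4)) + 0.1   # positive input, as after a ReLU
W1 = 0.01 * rng.standard_normal((4, 4))  # deliberately tiny weights
W2 = 0.01 * rng.standard_normal((4, 4))

# Without the skip path the Jacobian is ~0 (gradients vanish);
# with the skip path it stays ~I (gradients flow through the identity term).
print(np.linalg.norm(numerical_jacobian(lambda x: plain_block(x, W1, W2), a)))
print(np.linalg.norm(numerical_jacobian(lambda x: residual_block(x, W1, W2), a) - np.eye(4)))
```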