Hi!
I’m wondering what the point of the residual layers is. If a[l] is equal to a[l+2], then what is the purpose of those layers? Why not just connect the output of layer l directly to the input of layer l+2?
Thank you!
I am sure there are plenty of posts about residual connections in this part of the forum if you search. The basic idea is this:

Very deep networks suffer from vanishing/exploding gradients, so it helps to carry a copy of an earlier activation forward and feed it directly into a later layer, like a shortcut around the layers in between. That is what the skip connection does: the activation a[l] is added to the pre-activation of layer l+2, so a[l+2] = g(z[l+2] + a[l]). Note that a[l+2] is not forced to equal a[l]; the identity mapping is simply easy for the block to learn if the extra layers turn out not to help, while those layers can still learn something useful on top of it. This is what lets a very deep network keep propagating learning (and gradients) all the way through to the output.
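Here is a minimal sketch of such a residual block in Keras, assuming two fully connected layers of matching width (the function name `residual_block` and the `units` parameter are just illustrative, not anything from the course assignments):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, units):
    # Hypothetical two-layer residual block:
    # the shortcut carries a[l] forward and is added to z[l+2]
    # before the final activation, i.e. a[l+2] = g(z[l+2] + a[l]).
    shortcut = x                                    # a[l]
    x = layers.Dense(units, activation="relu")(x)   # layer l+1
    x = layers.Dense(units)(x)                      # layer l+2 (z[l+2], no activation yet)
    x = layers.Add()([x, shortcut])                 # skip connection: z[l+2] + a[l]
    return layers.Activation("relu")(x)             # a[l+2]

# Example usage in a functional model
inputs = tf.keras.Input(shape=(64,))
outputs = residual_block(inputs, units=64)
model = tf.keras.Model(inputs, outputs)
```

The point of the Add layer is that during backpropagation the gradient flows through the shortcut unchanged, so even if the two Dense layers squash the gradient, the earlier layers still receive a useful signal.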
Thank you, and sorry for not searching more carefully!