This residual connection, or skip connection, takes the output of an earlier layer and adds it directly to (or passes it unchanged into) a later layer; this allows gradients to propagate backward more efficiently.

Hey @karra1729,

You will find this thread somewhat similar to your query. In it, Paul Sir has described how back-propagation works in ResNets.

However, if you are clear on how back-propagation works in ResNets, but unclear on how the residual connections help the gradients back-propagate more efficiently, then here’s my two cents.

You are probably already familiar with the fact that deep neural networks tend to suffer from **vanishing gradients**: during back-propagation, the gradient is repeatedly multiplied by layer weights, and when those factors are small the product shrinks toward zero. Also, recall that an identity connection (*used by a residual block*) has no weights; equivalently, its derivative is 1. So when a gradient back-propagates via an identity connection, it is just multiplied by 1, and hence does not shrink in value; it can therefore propagate over longer distances, or in other words, **more efficiently**. Concretely, if a residual block computes `y = x + F(x)`, then `dy/dx = 1 + dF/dx`, so the identity path always contributes a direct, unattenuated term to the gradient, no matter how small `dF/dx` is.
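Here is a tiny one-dimensional sketch of that idea (the function names and numbers are mine, just for illustration, not from any library): a plain chain of layers multiplies the gradient by a small weight at every step, while a chain of residual blocks multiplies it by `1 + weight`, so the identity path keeps it from collapsing.

```python
# Toy illustration: how the backward gradient scales through 10 stacked
# linear "layers" with weight 0.1, with and without identity skips.

def plain_grad(weight, depth):
    # Plain chain: each layer multiplies the gradient by its weight,
    # so after `depth` layers the gradient is weight**depth.
    g = 1.0
    for _ in range(depth):
        g *= weight
    return g

def residual_grad(weight, depth):
    # Residual chain: each block computes x + f(x), so its local
    # derivative is 1 + weight -- the "1" comes from the identity path.
    g = 1.0
    for _ in range(depth):
        g *= (1.0 + weight)
    return g

print(plain_grad(0.1, 10))     # ~1e-10: the gradient has vanished
print(residual_grad(0.1, 10))  # ~2.59: the gradient survives
```

Of course, real residual blocks are nonlinear and multi-dimensional, but the same structure holds: the identity term in the derivative gives the gradient a direct path around each block.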

A query similar to yours can be found here as well. In that thread you will find this article, which provides an **Intuitive explanation of Skip Connections in Deep Learning**.

Let me know if this helps.

Cheers,

Elemento