Skip connections in ResNet

I get that it reduces the problem of vanishing gradients, but if the layers in the network are built with shortcut connections intended to skip over them, then why create that 'skipped layer' in the first place? Won't that defeat the purpose of 'creating a layer'?


Hi, @Arisha_Prasain!

I think you are seeing it as an analogy to an electrical circuit, but it is quite different here. Having this kind of connection gives the network both the input and the output of the layer, not just the input as I think you are suggesting. You can always check the original paper for further information.


The skip connection represents a second (shortcut) path through the residual block, but the main path, which does not go through the skip connection, is still computed.

The last layer in the residual block passes the sum of the shortcut path (A[k]) and the main path (Z[k+2]) to the activation function (g). The shortcut path (A[k]) is the input to the residual block.

A[k+2] = g( Z[k+2] + A[k] )
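
For concreteness, here is a minimal NumPy sketch of that formula for a fully connected block. The weight names (W1, b1, W2, b2) are illustrative, and it assumes W2 maps back to the dimension of A[k] so the two terms can be added:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(a_k, W1, b1, W2, b2):
    # Main path: two linear layers with a ReLU in between.
    z_k1 = W1 @ a_k + b1
    a_k1 = relu(z_k1)
    z_k2 = W2 @ a_k1 + b2          # this is Z[k+2]
    # Shortcut path: add the block's input A[k] before the final activation.
    return relu(z_k2 + a_k)        # A[k+2] = g(Z[k+2] + A[k])
```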

You can imagine 2 scenarios.

  • Scenario 1: Due to vanishing gradients, the main path (Z[k+2]) is 0. As a result, the value passed to the activation function is g( 0 + A[k] ), which is identical to the input, since A[k] is itself a ReLU output and applying g again leaves it unchanged (see the snippet after this list). In this scenario, the residual block acts as the identity function, returning as output whatever was passed as input.
  • Scenario 2: The main path (Z[k+2]) is nonzero. As a result, the value passed to the activation function is g( Z[k+2] + A[k] ). In this scenario, both the main path and the skip connection contribute to the output of the residual block.
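
A quick numeric illustration of both scenarios (the values are made up; it assumes A[k] is non-negative, as a ReLU output would be):

```python
import numpy as np

a_k = np.array([1.0, 2.0, 3.0])        # block input (already non-negative)

# Scenario 1: the main path has vanished, Z[k+2] == 0.
z_zero = np.zeros_like(a_k)
print(np.maximum(0, z_zero + a_k))     # [1. 2. 3.]  -> identity

# Scenario 2: a nonzero main path adds a learned correction.
z = np.array([0.5, -0.5, 1.0])
print(np.maximum(0, z + a_k))          # [1.5 1.5 4.]
```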

Figure 2 of the ResNet paper describes the residual block showing both the main path F(x) and shortcut path x. Both paths meet at the sum junction before being passed to the ReLU activation function.
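
As a sketch of how that figure translates into code, here is one way to express an identity residual block in Keras. It assumes the input already has `filters` channels so the shortcut can be added without a projection; the filter count and kernel size are illustrative:

```python
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                     # shortcut path: x
    f = layers.Conv2D(filters, 3, padding="same")(x) # main path F(x)...
    f = layers.Activation("relu")(f)
    f = layers.Conv2D(filters, 3, padding="same")(f)
    out = layers.Add()([f, shortcut])                # F(x) + x at the sum junction
    return layers.Activation("relu")(out)            # ReLU after the addition
```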


@Marco_Morais Since layer k can have multiple activations, how is A[k] added to Z[k+2]? Are all the activations in A[k] just added up? A[k] is a vector, not a single number.

They are two vectors (or matrices, in the case of multiple samples) of the same dimensions, so you can add them together elementwise, resulting in yet another vector (or matrix) of the same dimensions.
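
For example, a toy elementwise sum in NumPy, with each column standing in for one sample (the values are made up):

```python
import numpy as np

# A[k] and Z[k+2] with the same shape: (units, samples)
a_k  = np.array([[1.0, 0.5],
                 [2.0, 1.5]])
z_k2 = np.array([[0.1, -0.5],
                 [0.3,  0.0]])

print(z_k2 + a_k)   # elementwise sum, same shape as both inputs
# [[1.1 0. ]
#  [2.3 1.5]]
```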