Hello, I have one question about the “shortcut” in ResNet. I can’t understand what this operation is for. I mean, what’s the point of passing information around the skipped layers? For example: a[l] = [0.5, 0.4, 0.3] is transformed by the layers and we get (for example) z[l+2] = [0.01, 0.02, 0.03]. Then, to get a[l+2], we have to take g(a[l] + z[l+2]) = g([0.51, 0.42, 0.33]).
More generally, what is the point of computing z[l+2]? And why should the resulting z[l+2] be added to the original a[l]?
Hi, @s1rGAY !
These shortcut connections serve two purposes. They make the training process easier (check this paper) and mitigate the vanishing/exploding gradient issue, since the shortcut lets the derivatives “pass through” the skipped layers without being shrunk or amplified at each layer. With this mechanism you can build deeper networks while avoiding these problems.
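To make the addition concrete, here is a minimal NumPy sketch of one residual block; the layer sizes, weights, and the choice of ReLU are made-up for illustration, not the exact course code:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(a_l, W1, b1, W2, b2):
    """Two 'main path' layers plus the shortcut: a[l+2] = g(z[l+2] + a[l])."""
    z1 = W1 @ a_l + b1        # z[l+1]
    a1 = relu(z1)             # a[l+1]
    z2 = W2 @ a1 + b2         # z[l+2]
    return relu(z2 + a_l)     # the shortcut adds the original a[l] before the activation

# Toy numbers from the question (3-unit layers, made-up weights)
rng = np.random.default_rng(0)
a_l = np.array([0.5, 0.4, 0.3])
W1, b1 = rng.normal(size=(3, 3)) * 0.01, np.zeros(3)
W2, b2 = rng.normal(size=(3, 3)) * 0.01, np.zeros(3)
print(residual_block(a_l, W1, b1, W2, b2))
```

Because a[l] enters the output through a plain addition, the gradient with respect to a[l] contains an identity term, which is what lets the derivatives flow back through the block without vanishing.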
Thank you @alvaroramajo! Just to make sure I understand completely: when the model is used, do we skip Xidentity and just go through the model directly?
I’m not sure I understand your question, but when you “pass through” the layers with the shortcut connection, what you actually do is sum the input a[l] with the output z[l+2] of that block of layers, before applying the activation.
I mean, does the model use the shortcut only during training (the red arrows), while the trained model follows the path of the blue arrows?
Both paths are part of the model. Both are used during training (forward and back propagation) and during prediction (forward prop only), using whatever weights were learned during training.
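Here is a minimal PyTorch-style sketch of that point, assuming a simple fully connected block (the class name, layer sizes, and ReLU are made-up for illustration): the same forward pass, including the shortcut addition, runs in both training and prediction mode.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Main path (two linear layers) plus the identity shortcut."""
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x):
        z = self.fc2(torch.relu(self.fc1(x)))  # z[l+2] from the main path
        return torch.relu(z + x)               # the shortcut is always added

block = ResidualBlock(3)
x = torch.tensor([[0.5, 0.4, 0.3]])

block.train()                 # training mode: same forward computation
out_train = block(x)
block.eval()                  # prediction mode: same forward computation
with torch.no_grad():
    out_eval = block(x)
print(torch.allclose(out_train, out_eval))  # True: both paths used in both modes
```

The blue and red arrows are just two parts of the same forward computation; dropping the shortcut at prediction time would change the function the network computes.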