Hello,
Requesting a clarification…
To implement a Residual Network (ResNet), we need to add X and X_shortcut… won't this impact the performance of the network? And if the ResNet has many such blocks, could the performance impact be even greater?
Thanks,
Yes, it will impact the performance of the network. Otherwise we would not bother doing it.
Are you asking if there is a negative impact?
Yes, I was referring to negative impact.
Reason for asking: to calculate both X and X_shortcut, the system ends up taking more time/resources for computation.
So I am trying to understand whether this approach actually improves performance.
One extra vector addition is not a major impact on the performance of anything. Adding the “shortcut” layers is the point of the Residual Net architecture. Prof Ng goes into quite a bit of detail in the lectures about why this is helpful for training deep neural networks. If you missed those points, the best idea is just to watch those lectures again or watch them for the first time as the case may be.
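To make the cost concrete, here is a minimal NumPy sketch of a residual block's forward pass (the function and variable names are illustrative, not from the course code). The shortcut contributes only one elementwise addition on top of the two matrix multiplies, which is why its overhead is negligible:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(a_l, W1, b1, W2, b2):
    # Main path: two fully-connected layers.
    z1 = W1 @ a_l + b1
    a1 = relu(z1)
    z2 = W2 @ a1 + b2
    # Shortcut: add a[l] back before the final activation.
    # This is the only extra work compared to a plain block.
    return relu(z2 + a_l)
```

The two `@` matrix multiplies dominate the cost; the `z2 + a_l` addition is linear in the layer width.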
Thanks a lot for your inputs.
Yes, I understand it better now… when W[l+2] = 0 (and b[l+2] = 0), then z[l+2] = 0, so a[l+2] = g(a[l]) = a[l] (since a[l] is already a ReLU output, it is non-negative). Hence the identity function is easy for the residual block to learn, and adding the block doesn't hurt performance.
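That identity behavior can be checked numerically. A small sketch (shapes and names are my own, chosen for illustration): with W[l+2] and b[l+2] zeroed out, the block's output equals its input exactly, because ReLU is the identity on non-negative values:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

n = 4
a_l = relu(np.random.randn(n))        # a[l] is post-ReLU, so non-negative
W1, b1 = np.random.randn(n, n), np.random.randn(n)
W2, b2 = np.zeros((n, n)), np.zeros(n)  # W[l+2] = 0, b[l+2] = 0

a1 = relu(W1 @ a_l + b1)
a_l2 = relu(W2 @ a1 + b2 + a_l)       # z[l+2] = 0, so a[l+2] = relu(a[l]) = a[l]

assert np.allclose(a_l2, a_l)          # the block computes the identity
```

So in the worst case the extra layers do nothing, which is why depth added this way doesn't degrade training the way it can in a plain deep network.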