Are skip connections necessary in big networks because regularization drives down the values of W, which prevents the layers from learning the identity function? If this weren’t done, skip connections wouldn’t be needed, right?
Here is one recent post about this subject:
And I am sure there are plenty of others that discuss this issue if you search for it.
Hey, thanks for the reply. My question is about a more specific case, though, and I didn’t find anything that discusses it.
- Introducing a regularization term to prevent overfitting drives down the values of W. This means W[l] can only take small values, something like epsilon*W[l]. So even if the block learns the identity, it won’t make much difference when W[l] is very small, because regularization keeps shrinking the weights.
- So if there were no regularization term in the cost function, would skip connections still be useful?
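To make the mechanism in the question concrete, here is a minimal NumPy sketch. It assumes L2 weight decay (cost + (lambd/(2m))·||W||²) and ReLU activations, and it sets the data gradient to zero so that only the decay term acts on the weights. The names `plain_block`, `residual_block` and `decay_step` are hypothetical, chosen for illustration. It compares a two-layer block with and without the shortcut a[l] once W[l+1] and W[l+2] have been shrunk close to zero.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def plain_block(a, W1, b1, W2, b2):
    # Two layers, no shortcut: a[l+2] = g(W2 g(W1 a + b1) + b2)
    a1 = relu(W1 @ a + b1)
    return relu(W2 @ a1 + b2)

def residual_block(a, W1, b1, W2, b2):
    # Same two layers, with a[l] added before the final activation
    a1 = relu(W1 @ a + b1)
    return relu(W2 @ a1 + b2 + a)

def decay_step(W, dW, alpha, lambd, m):
    # One gradient step on cost + (lambd / (2m)) * ||W||^2,
    # i.e. W <- (1 - alpha*lambd/m) * W - alpha * dW
    return W - alpha * (dW + (lambd / m) * W)

rng = np.random.default_rng(0)
n = 4
a = relu(rng.standard_normal((n, 1)))  # non-negative, as after a ReLU layer
W1 = rng.standard_normal((n, n))
W2 = rng.standard_normal((n, n))
b1 = np.zeros((n, 1))
b2 = np.zeros((n, 1))

# Assumption: zero data gradient, so only weight decay shrinks W
for _ in range(2000):
    W1 = decay_step(W1, np.zeros_like(W1), alpha=0.1, lambd=0.5, m=1)
    W2 = decay_step(W2, np.zeros_like(W2), alpha=0.1, lambd=0.5, m=1)

print(np.allclose(residual_block(a, W1, b1, W2, b2), a))  # True
print(np.allclose(plain_block(a, W1, b1, W2, b2), 0.0))   # True
```

With the weights decayed toward zero, the block with the shortcut reduces to the identity (since relu(a[l]) = a[l] for non-negative a[l]), while the plain block's output collapses toward zero. This only illustrates the shrinkage effect described above under these assumptions; it does not by itself answer whether skip connections help without regularization.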