"So rather than needing to follow the main path, the information from a[l] can now follow a shortcut to go much deeper into the neural network. "
But doesn’t z[l+2] need to be calculated the “normal” way in either case, with or without a residual block? If so, calling this a “shortcut”, which suggests something easier and faster, confuses me. If we need to compute z[l+2] anyway, then adding a[l] to the result is an additional operation, not a shortcut.
I don’t want to be picky but I’m not sure if I misunderstood the wording or the concept of ResNets.
I think the term “shortcut” is about the connection - the purple line. a^{[l]} goes through both the main and the shortcut path. Relative to the main path, the shortcut path is a shortcut.
The shortcut doesn’t make the main path simpler, but it lets a copy of a^{[l]} reach the point where it is added to z^{[l+2]} without having to go through the main path.
Thanks. True, relative to the main path it is, but a[l] has to get to z[l+2] through the main path too and then, additionally, it gets there through the “shortcut”. If so, “rather than” from Andrew’s quote above could also mean “in addition to”. So we first calculate z[l+2] in the normal way and then add a[l]?
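Yes, that reading matches the formula from the lecture: a[l+2] = g(z[l+2] + a[l]). z[l+2] is computed the usual way along the main path, and a[l] is then added before the final activation. Here is a minimal NumPy sketch of that order of operations, assuming ReLU as g and equal layer sizes so the addition is shape-compatible. The function name `residual_block` and the random weights are made up for illustration and are not from the course code.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(a_l, W1, b1, W2, b2):
    # Main path: linear -> ReLU -> linear, exactly as in a plain network.
    z1 = W1 @ a_l + b1          # z[l+1]
    a1 = relu(z1)               # a[l+1]
    z2 = W2 @ a1 + b2           # z[l+2], computed the "normal" way
    # Shortcut: the untouched a[l] is added before the last ReLU.
    return relu(z2 + a_l)       # a[l+2] = g(z[l+2] + a[l])

rng = np.random.default_rng(0)
n = 4                                   # hypothetical layer width
a_l = relu(rng.normal(size=n))          # a[l] is non-negative, as after a ReLU
W1, b1 = rng.normal(size=(n, n)), rng.normal(size=n)
W2, b2 = rng.normal(size=(n, n)), rng.normal(size=n)

out = residual_block(a_l, W1, b1, W2, b2)

# If the last main-path layer contributes nothing (W2 = 0, b2 = 0),
# the block simply passes a[l] through: a[l+2] = relu(0 + a[l]) = a[l].
identity_out = residual_block(a_l, W1, b1, np.zeros((n, n)), np.zeros(n))
```

So “rather than” is best read as “in addition to” for the forward computation. The shortcut matters because a[l] reaches the output of the block even when the main path contributes little, as the zero-weight case at the end shows.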