I am a bit confused about what's going on with ResNets. We have the main path and the shortcut, as seen on the slide. Is the general flow of the neural network the main path, with the difference being that we copy a[l] and inject it right before the next ReLU?
So we calculate z[l+2], and when we apply the ReLU we have g(z[l+2] + a[l])? What about the calculations in the middle? Do we just copy a[l] while the flow of the NN continues as normal?
I'm not sure it's correct to think of the "general flow" as being the main path. The point of this architecture is that there are two parallel paths, both of which contribute (perhaps in different ways) to the result. The key is that the input to the last ReLU you show is the sum of the outputs of the two paths. And note that backward propagation happens through both paths as well: as always, it's the mirror image of forward propagation. So the gradients at the point where the shortcut branches off will be the sum of the gradients from the two separate paths.
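If it helps to see it as code, here is a minimal NumPy sketch of the forward pass through one residual block (the function and variable names are my own, and I'm assuming an identity shortcut, i.e. a[l] already has the same shape as z[l+2]):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block_forward(a_prev, W1, b1, W2, b2):
    """Forward pass through a simple two-layer residual block (sketch).

    Main path: two linear layers, ReLU after the first.
    Shortcut: a_prev is carried over unchanged and added to z2
    just before the final ReLU.
    """
    # Main path, first layer: z[l+1] and a[l+1]
    z1 = W1 @ a_prev + b1
    a1 = relu(z1)

    # Main path, second layer: z[l+2] (activation not applied yet)
    z2 = W2 @ a1 + b2

    # Shortcut joins here: a[l+2] = g(z[l+2] + a[l])
    a_out = relu(z2 + a_prev)
    return a_out

# Toy usage with square weight matrices so shapes match
n = 4
a_prev = np.random.randn(n, 1)
W1, b1 = np.random.randn(n, n), np.zeros((n, 1))
W2, b2 = np.random.randn(n, n), np.zeros((n, 1))
out = residual_block_forward(a_prev, W1, b1, W2, b2)
```

Notice that during backprop the gradient reaching a_prev comes from both terms of the sum z2 + a_prev, which is exactly the "two parallel paths" idea above.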
With the above thoughts in mind, you should go back and listen again to what Prof Ng says about why this is a helpful and interesting approach.
That was really helpful; the word "parallel" made it click, and I think I understand it now. Thank you!
Great! But I really recommend that you listen again to what Prof Ng actually says with that extra idea in mind. He is a really excellent teacher and I’m sure he explained this in the lectures way better than I just did.
Oh for sure, I always rewatch the lectures 2 or even 3 times haha. He really is an excellent teacher. Once I finished the ML course I came straight to this one, and I couldn't be happier!