Hey guys. I have just finished the “ResNets” video and haven’t gone further yet. I understand that we use skip connections to avoid vanishing/exploding gradients when the network gets very deep. But take, say, A[0]: it is skipped ahead to the computation of A[2], where A[2] = g[2](Z[2] + A[0]). My question is: is this solution really that effective? Because where does Z[2] come from? It comes from Z[2] = W[2]A[1] + b[2], right? And where does A[1] come from? What I mean is: doesn’t it still require the full chain of computation I showed above? If this doesn’t make sense, let me know and I can make it clearer.
“Skip connection” does not mean that no data goes into the next layer. The output from a previous layer still goes into the next layer as usual. In parallel, we store it as a “residual” and carry it forward to a later layer. So it’s like a backup path that keeps the “network flow” active.
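To make that concrete, here is a minimal NumPy sketch of a two-layer residual block (the names and shapes are just illustrative, not from the course code):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Illustrative shapes and random weights, only for demonstration
a0 = np.random.randn(4, 1)                        # input to the block, A[0]
W1, b1 = np.random.randn(4, 4), np.zeros((4, 1))
W2, b2 = np.random.randn(4, 4), np.zeros((4, 1))

# Main path: a0 flows through the layers as usual
z1 = W1 @ a0 + b1
a1 = relu(z1)
z2 = W2 @ a1 + b2

# Shortcut: the SAME a0 is also added just before the final activation
a2 = relu(z2 + a0)
```

So A[0] is used twice: once feeding the main path, and once added back at the end via the shortcut.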
Oh, but what I meant was: say you have A[0], and the skip connection carries it to the computation of A[2], where A[2] = g[2](Z[2] + A[0]). My question was: given that, how is it effective? To get Z[2] you have to use A[1], W[2], and b[2], right? And to get A[1] you apply g[1] to Z[1], which in turn needs W[1], A[0], and b[1]. So do you get Z[2] via exactly the computation I showed above? If so, how are ResNets useful? Thanks ahead!
You already have all the variables you mentioned. Which variable is it that you cannot get?
Using Andrew’s chart, let’s pin down the point you are confused about.
Take l = 0, so that a^{[l]} = A^{[0]}. Then this should match your equations.
At the first layer,
Z^{[1]} = W^{[1]}A^{[0]} + b^{[1]}, \quad A^{[1]} = g(Z^{[1]}) \quad (here, A^{[1]} = a^{[l+1]} in Andrew’s picture)
At the second layer,
Z^{[2]} = W^{[2]}A^{[1]} + b^{[2]}, \quad A^{[2]} = g(Z^{[2]} + A^{[0]}) \quad (here, A^{[2]} = a^{[l+2]} in Andrew’s picture)
We have everything, since A^{[0]} goes into the first layer and is also carried to the second layer as the “residual”.
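Just to spell it out, substituting the first layer’s output into the second shows the full dependence explicitly; both the main path and the shortcut are built from quantities you already have:

A^{[2]} = g\left( W^{[2]}\, g(W^{[1]}A^{[0]} + b^{[1]}) + b^{[2]} + A^{[0]} \right) \quad (\text{main path} + \text{shortcut})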
So the thing I can’t understand is the skip connection itself, and I was also wondering whether this skip-connection technique is really effective. Based on what I know, you take the information in A[0] and skip ahead to the computation of A[2], where A[2] = g[2](Z[2] + A[0]). Maybe the real question I was asking is: what is the difference between the skip connection and the main path, in terms of computation?
That is what Andrew explains in “Why ResNets work”. I would recommend watching it again for a better understanding.
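The short version of the argument from that video: because of the shortcut, the block can easily learn the identity function. If regularization pushes W^{[2]} and b^{[2]} toward zero, then

A^{[2]} = g(W^{[2]}A^{[1]} + b^{[2]} + A^{[0]}) \approx g(A^{[0]}) = A^{[0]}

(the last equality holds when g is ReLU and A^{[0]} is itself a ReLU output, hence non-negative). So adding the block does not easily hurt the network, and it can help if the extra layers learn something useful.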
On the other hand, there are some other threads discussing ResNets. Here is one: ResNets Question
Oh, sorry about that; the reason I was confused is that I hadn’t watched the “Why ResNets work” video yet. Thanks!