I understand the first part, but why is a^{[l+2]} = a^{[l]} when we apply the ReLU activation function? I understand that ReLU returns a non-negative value, but I still do not see how we can say that this is true. Please help!
Prof Ng mentions this a bit earlier in the video: he is assuming that the network uses ReLU at all layers, so all values of a^{[l]} \geq 0. Since ReLU(x) = \max(0, x), it returns x unchanged whenever x \geq 0, so applying ReLU to that input gives back the same values. ReLU only changes the values that are negative, right?
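A quick way to convince yourself numerically (a minimal sketch, assuming NumPy, not code from the course):

```python
import numpy as np

def relu(z):
    # ReLU(z) = max(0, z), applied elementwise
    return np.maximum(0, z)

# a_l plays the role of a^{[l]}: since it came out of a ReLU layer,
# all of its entries are already >= 0
a_l = relu(np.array([-1.5, 0.0, 2.3, 4.1]))   # -> [0. , 0. , 2.3, 4.1]

# Applying ReLU again leaves it unchanged: ReLU(a^{[l]}) = a^{[l]}
print(np.array_equal(relu(a_l), a_l))          # True
```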