Question about reversible residual layers

In the lecture, the following was used to describe how reversible residual layers back-calculate X1 and X2, but it was not clarified how X1 and X2 differ. I thought they were the same, one a duplicate of the other. Am I wrong? If so, why do we need to back-calculate both X1 and X2? If they are different, what is the difference?

“Reversible residual layers allow you to reconstruct the forward layer from the end of the network. Usually you have two similar branches in the network that you use to compute the network.”


If x1 and x2 were duplicates, it would be redundant to recompute both. The two branches start from the same input, but a different function is applied on each branch, so they diverge after the first layer; the branches are similar, not duplicates. The main point is that we have a way to recompute each layer's inputs during backpropagation from the forward-pass outputs, instead of storing all the intermediate activations. This improves memory efficiency.
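To make this concrete, here is a minimal sketch of the standard reversible-residual update (as used in RevNets and Reformer). The functions `f` and `g` below are toy stand-ins for the real blocks (e.g., attention and feed-forward in Reformer); the actual lecture implementation may differ in details:

```python
import numpy as np

def f(x):
    # Hypothetical stand-in for the first residual function F
    return np.tanh(x)

def g(x):
    # Hypothetical stand-in for the second residual function G
    return 0.5 * x

def forward(x1, x2):
    # Reversible residual step: each branch gets a different function
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def reverse(y1, y2):
    # Reconstruct the inputs from the outputs -- no stored activations needed
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

x1 = np.random.randn(4)
x2 = x1.copy()  # the two branches can start as copies of the same input
y1, y2 = forward(x1, x2)
rx1, rx2 = reverse(y1, y2)
assert np.allclose(rx1, x1) and np.allclose(rx2, x2)
```

Notice that even when x1 and x2 start out identical, y1 and y2 differ after one step because F and G are different functions. That is why both must be back-calculated: x2 is recovered first (it only needs y1 and y2), and then x1 is recovered using the reconstructed x2.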
