Even though flattening separates some pixels that were adjacent in the image, the network can still learn the geometric relationships. Remember that we also flatten 2D images in the plain vanilla FC network case, and those networks still learn to recognize spatial relationships. The point is that this works with any unrolling order, because the behavior is learned: each weight simply ends up associated with whichever input position the chosen order puts there. What matters is only that you are consistent: you decide which unrolling method you are going to use and then you use that one method everywhere, for training and for prediction. Here’s a thread which discusses this purely from the point of view of the Fully Connected networks in DLS C1. Make sure to read all the way through the thread to see the discussion about the different possible unrolling orders. I claim that the same reasoning applies to FC layers at the end of a series of Conv and Pooling layers: back prop will still “connect” across the “flatten” step using whichever flatten method you chose.
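If it helps to see the consistency argument in code, here is a minimal NumPy sketch. It is not from the course notebooks: the array shapes and the names `images`, `W`, `b` and `perm` are made up for illustration, and the "layer" is just a random linear map rather than a trained one. It shows that two unrolling orders differ only by a fixed permutation of the input positions, so a dense layer whose weight rows are permuted the same way computes exactly the same output, while mixing the two orders does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch: 4 "images", each 3 x 3 pixels with 2 channels.
images = rng.normal(size=(4, 3, 3, 2))

# Two possible unrolling orders for each image.
flat_c = images.reshape(4, -1)                                 # row-major ("C") order
flat_f = np.stack([img.flatten(order="F") for img in images])  # column-major ("F") order

# The two orders hold the same values in a different but fixed arrangement.
# perm[j] is the C-order position of the value found at F-order position j.
index_grid = np.arange(3 * 3 * 2).reshape(3, 3, 2)
perm = index_grid.flatten(order="F")
assert np.array_equal(flat_f, flat_c[:, perm])

# A stand-in dense layer, imagined as having been trained on C-order inputs.
W = rng.normal(size=(18, 5))
b = rng.normal(size=5)
out_c = flat_c @ W + b

# The equivalent layer for F-order inputs: same weights, rows permuted to match.
out_f = flat_f @ W[perm] + b
same_output = np.allclose(out_c, out_f)

# Mixing orders (weights for C order, inputs in F order) breaks the match.
mixed_output_matches = np.allclose(out_c, flat_f @ W + b)

print("consistent order gives the same output:", same_output)
print("mixed orders give the same output:", mixed_output_matches)
```

Gradient descent would find the permuted weights on its own if you trained with the other order, which is why the choice of order does not matter as long as you never switch between them.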