Neural style transfer, programming exercise, compute_content_cost reshape confusion

Yes, the point is that you need to do the reshape in a way that preserves the channel dimension and does not scramble the data between channels. You then need the channels dimension to be the first dimension, because the Gram matrix is $G = A A^T$ and you want it to express the correlations between the filters: with $A$ of shape $(n_C, n_H \times n_W)$, the entry $G_{ij}$ is the dot product of channel $i$ with channel $j$. Getting the channels dimension first requires a transpose, so it is a two step process: first reshape to $(n_H \times n_W, n_C)$, then transpose. If you reshape directly to the requisite shape instead, the data becomes garbage, because the $h$ and $w$ values get mixed across the channels.

If you want to understand why, use the same idea that I showed on that thread about "flattening" images: create a "telltale" tensor whose values reveal which channel they came from, run the two different algorithms, and compare the results. That will make it clear why the direct reshape without the transpose doesn't work.
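For concreteness, here is a minimal sketch of that experiment, assuming TensorFlow as in the exercise; the tensor `a` and the names `correct` and `wrong` are illustrative, not from the assignment:

```python
import tensorflow as tf

# A "telltale" tensor: every element of channel c equals c, so channel
# membership stays visible in the values after any reshape.
n_H, n_W, n_C = 2, 3, 4
a = tf.stack([tf.fill((n_H, n_W), float(c)) for c in range(n_C)], axis=-1)
# a has shape (n_H, n_W, n_C)

# Correct two-step version: reshape to (n_H * n_W, n_C), then transpose
# to (n_C, n_H * n_W). Each row holds exactly one channel's values.
correct = tf.transpose(tf.reshape(a, (n_H * n_W, n_C)))

# Incorrect one-step version: reshape directly to (n_C, n_H * n_W).
# The row-major flattening interleaves the channels, so each row now
# mixes values from different channels.
wrong = tf.reshape(a, (n_C, n_H * n_W))

print("correct rows:\n", correct.numpy())
print("wrong rows:\n", wrong.numpy())
```

Each row of `correct` is constant (all 0s, all 1s, ...), while the rows of `wrong` cycle through `0, 1, 2, 3, 0, 1`, so a Gram matrix computed from the one-step reshape would correlate meaningless mixtures rather than actual filter activations.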
