There are two “Additional Hints” sections about “unrolling” and “reshaping”. Try reading the one before the compute_content_cost section and then the shorter one before the compute_layer_style_cost section. The key point is that you can’t just directly reshape the inputs into the final shape that you want: you need to do a transpose as part of the process so that the “channels” dimension of the data is preserved. There are two orders in which you can accomplish this: transpose first and then reshape, or reshape first and then transpose. Of course the shape arguments to the reshape won’t be the same between the two cases.
In order to understand the reason for this, you need to delve into the details of how reshape actually works. There was a similar situation in Course 1 Week 2 Logistic Regression, in which we needed a transpose in order to get correct results when we “flattened” the 4D image tensor into a matrix. Here’s a thread which explains the background there, and the same fundamental ideas apply here as well. The key point is that you need to preserve the “samples” dimension of the data during the reshape: if you don’t, then you end up scrambling the data.
Update: here’s a more recent thread that shows a very specific example for the given case here of the right and wrong ways to do the reshaping in order to preserve the “channels” dimension.
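To make the “transpose then reshape” vs. “reshape only” distinction concrete, here is a small NumPy sketch on a hypothetical activation tensor of shape (1, n_H, n_W, 3) (the tiny sizes and values are made up for illustration). It shows that reshaping straight to (n_C, n_H * n_W) scrambles the channels, while either of the two correct orderings gives the same result:

```python
import numpy as np

# Hypothetical tiny activation: 1 sample, 2 x 2 spatial, 3 channels.
n_H, n_W, n_C = 2, 2, 3
a = np.arange(1 * n_H * n_W * n_C).reshape(1, n_H, n_W, n_C)

# WRONG: reshaping straight to (n_C, n_H * n_W) mixes values from
# different channels into the same row.
wrong = a.reshape(n_C, n_H * n_W)

# RIGHT (order 1): transpose so channels come first, then reshape.
right1 = a.transpose(0, 3, 1, 2).reshape(n_C, n_H * n_W)

# RIGHT (order 2): reshape keeping channels last, then transpose.
# Note the different shape argument: (n_H * n_W, n_C), not (n_C, n_H * n_W).
right2 = a.reshape(n_H * n_W, n_C).T

print(right1)
print(np.array_equal(right1, right2))  # True: both orderings agree
print(np.array_equal(right1, wrong))   # False: the direct reshape scrambled the data
```

Each row of right1 holds the values of one channel across all spatial positions, which is exactly what the Gram matrix computation in the style cost needs; in wrong, every row mixes values from all three channels.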