The input data for the Course 1 Week 2 Logistic Regression assignment consists of 64 x 64 RGB images with corresponding labels for whether each image contains a cat or not. There are 209 training images and 50 test images. That means the training images are given to us as a 4 dimensional numpy array with the shape:
209 x 64 x 64 x 3
The first dimension is the “samples” dimension. For each of the 209 samples there are 64 x 64 pixels and each pixel has 3 color values R, G and B. Note that:
64 x 64 x 3 = 12288
In order to process these images with either Logistic Regression or a Neural Network, we first need to “unroll” or “flatten” each image into a column vector, because we represent the input vectors as column vectors. That means we will end up with the training data as a 12288 x 209 matrix in which each column is one flattened image. So how do we convert the given 4 dimensional array into that 2 dimensional matrix? The assignment gives us the following Python statement to do the flatten operation:
X_flatten = X.reshape(X.shape[0], -1).T
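Note that the -1 there is not anything magic: it is just a shortcut that tells numpy to work out the size of that dimension from whatever is left over, that is, the total number of elements divided by the sizes of the other dimensions. Here that is 64 x 64 x 3 = 12288. You could get the same result with this statement: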
X_flatten = X.reshape(X.shape[0], X.shape[1] * X.shape[2] * X.shape[3]).T
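If you want to convince yourself that the two statements are equivalent, here is a minimal sketch. It assumes a placeholder array of random pixel values with the same shape as the training images (not the real assignment data), and the names flat_inferred and flat_explicit are made up for illustration:

```python
import numpy as np

# Placeholder data with the same shape as the training images
# (random values, NOT the real assignment dataset).
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(209, 64, 64, 3), dtype=np.uint8)

# Inferred size (-1) versus the explicit product of the trailing dimensions.
flat_inferred = X.reshape(X.shape[0], -1).T
flat_explicit = X.reshape(X.shape[0], X.shape[1] * X.shape[2] * X.shape[3]).T

print(flat_inferred.shape)                           # (12288, 209)
print(np.array_equal(flat_inferred, flat_explicit))  # True
```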
The really important point here is the transpose, not the -1. The first dimension is the “samples” dimension and that dimension must be preserved in the output. Without the transpose, we would end up with a 209 x 12288 matrix with the samples as the rows; the transpose gives us 12288 x 209 with the samples as the columns. Now you might think that an easier way to get a 12288 x 209 matrix would be to do the “reshape” this way:
X_flatten = X.reshape(-1, X.shape[0])
That does give you the same output shape, but it turns out that the results are garbage. That method “scrambles” the data by mixing pixels from multiple images into each column of the output. To see why, let’s construct a simpler example that will let us visualize what is happening. Open a new cell and enter the following function:
import numpy as np

# routine to generate a telltale 4D array to play with
def testarray(shape):
    (d1, d2, d3, d4) = shape
    A = np.zeros(shape)
    for ii1 in range(d1):
        for ii2 in range(d2):
            for ii3 in range(d3):
                for ii4 in range(d4):
                    # each index becomes one decimal digit of the stored value
                    A[ii1, ii2, ii3, ii4] = ii1 * 1000 + ii2 * 100 + ii3 * 10 + ii4
    return A
What that does is create a 4D array where the value in each position shows the index values of that position, one decimal digit per dimension in order. That is to say A[1,2,3,4] = 1234. Of course this is only readable if every dimension has size 10 or less, so that each index is a single digit. Using that function, we can create a toy 4D array of dimensions 3 x 2 x 2 x 3 (which you can think of as 3 samples, each of which is a 2 x 2 RGB image) and then unroll it in both ways:
A = testarray((3,2,2,3))
Ashape1 = A.reshape(A.shape[0],-1).T
Ashape2 = A.reshape(-1,A.shape[0])
np.set_printoptions(suppress=True)
print("A.shape = " + str(A.shape))
print(A)
print("Ashape1.shape = " + str(Ashape1.shape))
print(Ashape1)
print("Ashape2.shape = " + str(Ashape2.shape))
print(Ashape2)
When you run that, here is the output:
A.shape = (3, 2, 2, 3)
[[[[   0.    1.    2.]
   [  10.   11.   12.]]
  [[ 100.  101.  102.]
   [ 110.  111.  112.]]]
 [[[1000. 1001. 1002.]
   [1010. 1011. 1012.]]
  [[1100. 1101. 1102.]
   [1110. 1111. 1112.]]]
 [[[2000. 2001. 2002.]
   [2010. 2011. 2012.]]
  [[2100. 2101. 2102.]
   [2110. 2111. 2112.]]]]
Ashape1.shape = (12, 3)
[[ 0. 1000. 2000.]
[ 1. 1001. 2001.]
[ 2. 1002. 2002.]
[ 10. 1010. 2010.]
[ 11. 1011. 2011.]
[ 12. 1012. 2012.]
[ 100. 1100. 2100.]
[ 101. 1101. 2101.]
[ 102. 1102. 2102.]
[ 110. 1110. 2110.]
[ 111. 1111. 2111.]
[ 112. 1112. 2112.]]
Ashape2.shape = (12, 3)
[[ 0. 1. 2.]
[ 10. 11. 12.]
[ 100. 101. 102.]
[ 110. 111. 112.]
[1000. 1001. 1002.]
[1010. 1011. 1012.]
[1100. 1101. 1102.]
[1110. 1111. 1112.]
[2000. 2001. 2002.]
[2010. 2011. 2012.]
[2100. 2101. 2102.]
[2110. 2111. 2112.]]
Notice how each column of Ashape1 has a consistent value for the first index (the thousands digit), whereas that is not true for Ashape2. You can see that each column of Ashape2 contains entries from all three of the input samples (values of the first index of the input). Also if you read down the columns in Ashape1, you can see that the unrolling starts from the last dimension and works backwards: the last index varies fastest. That is the color dimension, so the order is all 3 color values at each position together, and then it steps through the width and then the height dimensions.
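You can also check the column contents programmatically instead of by eye. The sketch below rebuilds the same telltale values with np.indices (an equivalent construction that gives integers rather than floats) and counts how many distinct samples show up in each column. The names good and bad are just illustrative:

```python
import numpy as np

# Rebuild the telltale array: same values as testarray((3,2,2,3)), as integers.
i1, i2, i3, i4 = np.indices((3, 2, 2, 3))
A = i1 * 1000 + i2 * 100 + i3 * 10 + i4

good = A.reshape(A.shape[0], -1).T   # flatten each sample, then transpose
bad = A.reshape(-1, A.shape[0])      # same shape, wrong layout

# The thousands digit of each entry is the sample index it came from.
good_samples = good // 1000
bad_samples = bad // 1000

# Number of distinct sample indices found in each column.
print([len(np.unique(good_samples[:, j])) for j in range(good.shape[1])])  # [1, 1, 1]
print([len(np.unique(bad_samples[:, j])) for j in range(bad.shape[1])])    # [3, 3, 3]
```

One sample per column is what we want; three samples per column means the pixels of different images have been mixed together.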
This demonstrates why the second method actually scrambles the data, making it meaningless for our purposes. That is why you get 34% test accuracy if you run the training using the incorrectly flattened data: the images no longer make any sense.
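If you want a guard against this mistake in your own experiments, one option is a small sanity check like the sketch below. The helper check_flatten is hypothetical (it is not part of the assignment notebook), and the data is a small random placeholder array rather than real images. It confirms that column i of the flattened matrix is exactly sample i unrolled, and that undoing the transpose and reshape recovers the original array:

```python
import numpy as np

def check_flatten(X, X_flatten):
    """Hypothetical helper: True if column i of X_flatten is sample i of X, unrolled."""
    n_samples = X.shape[0]
    if X_flatten.shape != (X[0].size, n_samples):
        return False
    return all(
        np.array_equal(X_flatten[:, i], X[i].ravel()) for i in range(n_samples)
    )

# Placeholder data: 5 tiny 4 x 4 RGB "images" with random values.
rng = np.random.default_rng(1)
X = rng.random((5, 4, 4, 3))

correct = X.reshape(X.shape[0], -1).T
scrambled = X.reshape(-1, X.shape[0])

print(check_flatten(X, correct))     # True
print(check_flatten(X, scrambled))   # False

# The correct flatten is reversible: transpose back, then restore the 4D shape.
restored = correct.T.reshape(X.shape)
print(np.array_equal(restored, X))   # True
```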