Flattening Images in the Logistic Regression Assignment in Course 1 Week 2

The input data for the Course 1 Week 2 Logistic Regression assignment consists of 64 x 64 RGB images with labels indicating whether or not each image contains a cat. There are 209 training images and 50 test images. That means the training images are given to us as a 4-dimensional numpy array with the shape:

209 x 64 x 64 x 3

The first dimension is the “samples” dimension: for each of the 209 samples there are 64 x 64 pixels, and each pixel has 3 color values (R, G and B). Note that:

64 x 64 x 3 = 12288

In order to process these images with either Logistic Regression or a Neural Network, we first need to “unroll” or “flatten” each image into a column vector, because that is how we represent the input vectors. So we will end up with the training data as a 12288 x 209 matrix in which each column is one flattened image. How do we convert the given 4-dimensional array into that 2-dimensional matrix? They give us the following Python statement to do the flatten operation:

X_flatten = X.reshape(X.shape[0], -1).T

Note that the -1 there is not anything magic: it is just a shortcut that means “collapse all the remaining dimensions into this one”. You could get the same result with this statement:

X_flatten = X.reshape(X.shape[0], X.shape[1] * X.shape[2] * X.shape[3]).T
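A quick way to convince yourself the two statements are equivalent is to run both on a stand-in array (random data with the same shape as the real dataset):

```python
import numpy as np

# stand-in for the real training data: 209 samples of 64 x 64 RGB
X = np.random.rand(209, 64, 64, 3)

# the "-1" version and the explicit-product version of the flatten
X_flat_a = X.reshape(X.shape[0], -1).T
X_flat_b = X.reshape(X.shape[0], X.shape[1] * X.shape[2] * X.shape[3]).T

print(X_flat_a.shape)                      # (12288, 209)
print(np.array_equal(X_flat_a, X_flat_b))  # True
```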

The really important point here is the transpose, not the -1. The first dimension of the input is the “samples” dimension, and that dimension must be preserved in the output. Without the transpose we end up with a 209 x 12288 matrix with the samples as the rows; the transpose gives us 12288 x 209 with the samples as the columns. Now you might think that an easier way to get a 12288 x 209 matrix would be to do the “reshape” this way:

X_flatten = X.reshape(-1, X.shape[0])

That does give you the same output shape, but it turns out that the results are garbage. That method “scrambles” the data by mixing pixels from multiple images into each column of the output. To see why, let’s construct a simpler example that will let us visualize what is happening. Open a new cell and enter the following function:

import numpy as np

# routine to generate a telltale 4D array to play with
def testarray(shape):
    (d1, d2, d3, d4) = shape
    A = np.zeros(shape)

    for ii1 in range(d1):
        for ii2 in range(d2):
            for ii3 in range(d3):
                for ii4 in range(d4):
                    # encode the four indices as the four digits of the value
                    A[ii1, ii2, ii3, ii4] = ii1 * 1000 + ii2 * 100 + ii3 * 10 + ii4

    return A

What that does is create a 4D array where the value in each position encodes the index values of that position, one digit per dimension. That is to say, A[1,2,3,4] = 1234. Of course this is only understandable if every dimension is of single-digit size. Using that function, we can create a toy 4D array of dimensions 3 x 2 x 2 x 3 (which you can think of as 3 samples, each of which is a 2 x 2 RGB image) and then unroll it in both ways:

A = testarray((3,2,2,3))
Ashape1 = A.reshape(A.shape[0],-1).T
Ashape2 = A.reshape(-1,A.shape[0])
np.set_printoptions(suppress=True)
print("A.shape = " + str(A.shape))
print(A)
print("Ashape1.shape = " + str(Ashape1.shape))
print(Ashape1)
print("Ashape2.shape = " + str(Ashape2.shape))
print(Ashape2)

When you run that, here is the output:

A.shape = (3, 2, 2, 3)
[[[[   0.    1.    2.]
   [  10.   11.   12.]]

  [[ 100.  101.  102.]
   [ 110.  111.  112.]]]


 [[[1000. 1001. 1002.]
   [1010. 1011. 1012.]]

  [[1100. 1101. 1102.]
   [1110. 1111. 1112.]]]


 [[[2000. 2001. 2002.]
   [2010. 2011. 2012.]]

  [[2100. 2101. 2102.]
   [2110. 2111. 2112.]]]]
Ashape1.shape = (12, 3)
[[   0. 1000. 2000.]
 [   1. 1001. 2001.]
 [   2. 1002. 2002.]
 [  10. 1010. 2010.]
 [  11. 1011. 2011.]
 [  12. 1012. 2012.]
 [ 100. 1100. 2100.]
 [ 101. 1101. 2101.]
 [ 102. 1102. 2102.]
 [ 110. 1110. 2110.]
 [ 111. 1111. 2111.]
 [ 112. 1112. 2112.]]
Ashape2.shape = (12, 3)
[[   0.    1.    2.]
 [  10.   11.   12.]
 [ 100.  101.  102.]
 [ 110.  111.  112.]
 [1000. 1001. 1002.]
 [1010. 1011. 1012.]
 [1100. 1101. 1102.]
 [1110. 1111. 1112.]
 [2000. 2001. 2002.]
 [2010. 2011. 2012.]
 [2100. 2101. 2102.]
 [2110. 2111. 2112.]]

Notice how each column of Ashape1 has a consistent value for the first dimension, whereas that is not true for Ashape2: each column of Ashape2 contains entries from all three of the input samples (all three values of the first index). Also, if you read down the columns of Ashape1, you can see that the unrolling happens with the last dimension varying fastest. That is the color dimension, so the 3 color values of each pixel stay together, and then it steps through the width and height dimensions.
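Another way to see the same thing: with the correct method, column i of the flattened matrix is exactly sample i unrolled on its own. Here is a quick check, a sketch using random data in the same toy 3 x 2 x 2 x 3 shape:

```python
import numpy as np

A = np.random.rand(3, 2, 2, 3)
Ashape1 = A.reshape(A.shape[0], -1).T

# column i of the correct flatten is just sample i raveled in the default C order
for i in range(A.shape[0]):
    assert np.array_equal(Ashape1[:, i], A[i].ravel())
print("each column matches its own sample")
```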

This demonstrates why the second method actually scrambles the data, making it meaningless for our purposes. That is why you get 34% test accuracy if you run the training using the incorrectly flattened data: the images no longer make any sense.

32 Likes

Thanks Paul. The example you provided showed there is a big difference between
A.reshape(A.shape[0],-1).T
and
A.reshape(-1,A.shape[0])

However, I still find it very non-intuitive and hard to understand the reshape mechanism for a complicated matrix. Is it because the execution priority is different between A.reshape(A.shape[0], -1) and A.reshape(-1, A.shape[0])? Or is it purely the result of the transpose? If A.shape[0] first sorts out the outermost (highest) dimension, then shouldn’t A.reshape(-1, A.shape[0]) first group the highest-dimension matrix in each column and then fill out the remaining? If so, the results should be the same. If the reshape function is not sorting out the highest dimension, then A.reshape(A.shape[0], -1) should just reshape the original matrix to a 3 by 12 matrix, just like the result of Ashape2 (without grouping the highest dimension in each row). And in that case the result of A.reshape(A.shape[0], -1) should be:
[ 0 10 100 …]
[1 11 101…]
[2 12 102…]

And transposing the above matrix won’t give the right result either.
It seems that the two reshape calls are treating the structure of the same image matrix very, very differently.

2 Likes

@paulinpaloalto . Hi Paul, in case it got buried in your inbox… can you take a look at my previous response and see if you could answer my question? I just started to learn coding, and need some help to understand.

Thanks!

1 Like

Hi @chenf12, leaving aside the transpose for a moment, the reshape part is easier to understand.

Let’s start with a simple example defining a numpy array with 8 values:

>>> a = np.arange(8)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7])

You can reshape this array to have a different number of rows and columns. Either explicitly, defining the number of rows and columns, like so:

>>> a.reshape(2,4)
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

Or implicitly, letting the library decide: when you use -1, you fix either the number of rows or the number of columns, and the other one is automatically calculated.

>>> a.reshape(2, -1)
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

>>> a.reshape(-1, 2)
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])

Of course, the above only works when that “automatic calculation” is really possible. For example, there is no way to rearrange an array of 8 elements into something with 3 columns, as you can see below:

>>> a.reshape(-1, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot reshape array of size 8 into shape (3)

6 Likes

@chenf12: Why would you assume that’s the output order you would get? Python is an interactive language, so you can just try it and see what happens. Watch this:

A = testarray((3,2,2,3))
Ashape1 = A.reshape(A.shape[0],-1)
np.set_printoptions(suppress=True)
print("Ashape1.shape = " + str(Ashape1.shape))
print(Ashape1)

Ashape1.shape = (3, 12)
[[   0.    1.    2.   10.   11.   12.  100.  101.  102.  110.  111.  112.]
 [1000. 1001. 1002. 1010. 1011. 1012. 1100. 1101. 1102. 1110. 1111. 1112.]
 [2000. 2001. 2002. 2010. 2011. 2012. 2100. 2101. 2102. 2110. 2111. 2112.]]

So you can see that you get exactly what you’d expect from the transpose that I showed in my original example.

The unrolling done by reshape happens with the highest (last) dimension varying fastest. That is demonstrated by the two examples I gave. Notice that it’s easier to see what is happening without the transpose: it just starts at (0,0) and unrolls across the first row, and it hits the end of the first row at just the right time because we set the dimensions correctly. Also, have you tried reading the documentation for reshape? Just google “numpy reshape”.

Now, it turns out that what you will learn from reading the aforementioned documentation is that unrolling with the highest dimension fastest is just the default (“C”) behavior of reshape. You can use the “order” parameter to specify a different arrangement. Watch this:

Ashape2 = A.reshape(A.shape[0],-1,order='F')
print("Ashape2.shape = " + str(Ashape2.shape))
print(Ashape2)

Ashape2.shape = (3, 12)
[[   0.  100.   10.  110.    1.  101.   11.  111.    2.  102.   12.  112.]
 [1000. 1100. 1010. 1110. 1001. 1101. 1011. 1111. 1002. 1102. 1012. 1112.]
 [2000. 2100. 2010. 2110. 2001. 2101. 2011. 2111. 2002. 2102. 2012. 2112.]]

Notice that if you take the transpose of that matrix, it will still have the property that each column has a consistent first-dimension value (0 for the first column, 1 for the second and so forth). That means the data is not “scrambled”: it’s just a different way to order the pixels within each image. It turns out that you can run the experiment of using “F” order instead of the default “C” order, and the training works exactly as well as it does with “C”. As long as you do all the reshapes in the same consistent way, the algorithm can learn to recognize the patterns.

2 Likes

The way to think about the difference between “C” and “F” order for an image is to remember that the highest dimension is the RGB color dimension. So what that means is that with “C” order you get all three colors for each pixel together. With “F” order, what you get is all the Red pixel values in order across and down the image, followed by all the Green pixels, followed by all the Blue pixels. So it’s like three separate monochrome images back to back. It’s worth trying the experiment of using “F” order on all your reshapes and then running the training and confirming that you get the same accuracy results. In other words (as I said in my previous post), the algorithm can learn the patterns either way. It just matters that you are consistent in how you do the unrolling. You can even do the “negative” experiment to confirm the science here: unroll the training images in “F” order and the test images in “C” order and confirm that you get terrible test accuracy. Science! :nerd_face:
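You can check that layout difference directly on a tiny example: a sketch with one 2 x 2 “image” whose channel k holds the value k at every pixel, so the R, G and B planes are easy to tell apart in the output.

```python
import numpy as np

# one sample, 2 x 2 pixels, 3 channels; channel k holds value k at every pixel
A = np.zeros((1, 2, 2, 3))
for k in range(3):
    A[..., k] = k

c_flat = A.reshape(A.shape[0], -1)             # default "C" order
f_flat = A.reshape(A.shape[0], -1, order='F')  # Fortran order

print(c_flat[0])  # [0. 1. 2. 0. 1. 2. 0. 1. 2. 0. 1. 2.] -> R,G,B interleaved per pixel
print(f_flat[0])  # [0. 0. 0. 0. 1. 1. 1. 1. 2. 2. 2. 2.] -> all R, then all G, then all B
```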

4 Likes