I am having a really hard time understanding 3D arrays. I can’t wrap my head around this image:

I can understand 2D arrays, but not this. And then we have the training set that says

Remember that `train_set_x_orig` is a numpy-array of shape (m_train, num_px, num_px, 3).

So the shape is (209, 64, 64, 3). Does this mean it's 209 rows of arrays that are 64x64? I have been trying to understand this all day and I just can't grasp it.

`train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T`

Also, this line doesn’t make sense to me. The shape of our train_set_x_orig is (209, 64, 64, 3). So we are reshaping this to have 209 rows, -1? What’s -1?

I feel a bit lost.


Yes, it’s pretty tricky to make the jump from 2D arrays to 4D arrays. You can take it one step at a time:

A single image is a 3D array. In our case the images are 64 x 64 x 3, which means you can think of it as 64 x 64 pixels, and at each location you have 3 color values (R, G, B) that give you the exact color of the pixel at that location. So it's 64 x 64 positions with 3 values at each point. Or think of it as three layers stacked behind each other that are each 64 x 64: the red picture, the green picture and the blue picture.
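
To make that concrete, here is a minimal sketch with a made-up 2 x 2 "image" instead of a real 64 x 64 one. The pixel values and the variable names (`image`, `pixel`, `red_layer`) are just assumptions for illustration, not anything from the assignment:

```python
import numpy as np

# Hypothetical tiny image: 2 x 2 pixels, 3 color values (R, G, B) per pixel.
image = np.array([
    [[255, 0, 0], [0, 255, 0]],      # row 0: a red pixel, a green pixel
    [[0, 0, 255], [255, 255, 255]],  # row 1: a blue pixel, a white pixel
])

print(image.shape)  # (height, width, channels)

# "3 values at each point": the color of the pixel at row 0, column 1.
pixel = image[0, 1]
print(pixel)

# "Three layers stacked behind each other": the red layer is a 2D array.
red_layer = image[:, :, 0]
print(red_layer.shape)
```

Both views describe the same array: indexing by position gives you 3 color values, and indexing by channel gives you a 2D picture.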

Now when you take the next step up and handle multiple images in a batch, what we do is add a first dimension for the "samples". The number of samples is m = 209 in this case, so think of it as 209 images in a list, each of which has 64 x 64 pixels, and at each pixel location you have 3 color values. So it's 209 x 64 x 64 x 3.
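
If it helps, here is a small sketch of that "list of images" idea. It uses an array of zeros as a stand-in with the same shape as `train_set_x_orig` (the real data is not loaded here, and the names `batch` and `first_image` are hypothetical):

```python
import numpy as np

m, num_px = 209, 64

# Stand-in for train_set_x_orig: all zeros, but the same 4D shape.
batch = np.zeros((m, num_px, num_px, 3), dtype=np.uint8)
print(batch.shape)

# Indexing the first dimension picks out one sample, which is a 3D image.
first_image = batch[0]
print(first_image.shape)

# len() counts along the first dimension, i.e. the number of images.
print(len(batch))
```

So the first index selects which image, and the remaining three indices work exactly as they do for a single image.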

Now when we “unroll” or “flatten” the 4D array into a 2D array so that we can feed it to our neural network, we need to be careful how we do that. Here’s a thread which explains that process in detail.
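
On the `-1` question: in `reshape`, `-1` tells NumPy to work out that dimension from the total number of elements. Here is a rough sketch using random stand-in data rather than the actual dataset (`fake_train` and the other variable names are assumptions for illustration):

```python
import numpy as np

# Random stand-in with the same shape as train_set_x_orig.
rng = np.random.default_rng(0)
fake_train = rng.integers(0, 256, size=(209, 64, 64, 3))

# Keep 209 as the first dimension and let NumPy infer the second one.
# Each image has 64 * 64 * 3 = 12288 values, so -1 becomes 12288.
flat = fake_train.reshape(fake_train.shape[0], -1)
print(flat.shape)

# The transpose swaps rows and columns, so each column is one image.
flat_T = flat.T
print(flat_T.shape)

# Check: column 5 holds exactly the values of image 5, unrolled.
column_matches = np.array_equal(flat_T[:, 5], fake_train[5].ravel())
print(column_matches)
```

The last check is the part to be careful about: after the reshape and transpose, each column should still correspond to one whole image, with no mixing of pixels between images.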


So if I understood this correctly, this is my training set? Test set is the same with fewer examples.


Yes, the test set has different images, but they are the same size and shape. There are 50 images in the test set and 209 in the training set.
