W2_How is dimension shaped through flattening of images?

Hello, I have a question about the term “dimension”, which appears multiple times, and I want to make sure I understand its definition.

In the “Logistic Regression with a Neural Network mindset” exercise 5, the first step is to initialize parameters with zeros. The instructions say to use “shape” to get the first dimension of X_train. I believe the answer should be X_train.shape[0].

However, I am wondering why the dimension is not X_train.reshape((X_train.shape[0] * X_train.shape[1] * X_train.shape[2], 1))? Since we convert a 3D array of shape (𝑙𝑒𝑛𝑔𝑡ℎ,ℎ𝑒𝑖𝑔ℎ𝑡,𝑑𝑒𝑝𝑡ℎ=3) to a vector of shape (𝑙𝑒𝑛𝑔𝑡ℎ∗ℎ𝑒𝑖𝑔ℎ𝑡∗3,1), for an example of 100 images of size 64 x 64 x 3, I thought the dimension would be 100 x 64 x 64 x 3?

The images are flattened into 1D vectors.
Then they are all stacked into a 2D matrix.

Hi TMosh, could you please explain a bit more? I am still confused.

Yes, you’re right, although maybe I’m just nitpicking here: the terminology for that is “dimensions”, not “dimension”. Or you could say that “the shape is 100 x 64 x 64 x 3” if we have a batch of 100 images.

Then Tom’s point is that in order to process images with Logistic Regression or a Neural Network, we need to “flatten” or “unroll” the three dimensions of each image into a single vector. Notice that 64 * 64 * 3 = 12288, so each image has 12288 total values: 64 * 64 = 4096 pixels, each of which has 3 color values R, G and B.

So with a batch of 100 images, the resulting input matrix will have dimensions 12288 x 100, because Prof Ng orients the inputs so that each column of the input matrix is one flattened image vector.
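To make the shapes concrete, here is a minimal sketch of that flattening in NumPy. The array name `images` and the random pixel values are stand-ins I made up for illustration, not the assignment's actual data.

```python
import numpy as np

# Hypothetical batch: 100 RGB images of 64 x 64 pixels.
# Random values stand in for real pixel data.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 64, 64, 3))

# Flatten each image into one row of 64 * 64 * 3 = 12288 values,
# then transpose so that each column holds one flattened image.
X = images.reshape(images.shape[0], -1).T

print(images.shape)  # (100, 64, 64, 3)
print(X.shape)       # (12288, 100)
```

Column 0 of `X` holds the same values as `images[0]` read out in order, so nothing is lost by the reshape; only the layout changes.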

Here is a thread about how the flattening works.

Hello @ansonchantf,

So far, I have seen two definitions for “dimension”.

First definition. In numpy terminology, number of dimensions = number of axes = number of values in array.shape. If an array has a shape of (2,3,4,5,6), it has 5 axes, it has 5 dimensions, and it is a 5D array.

Second definition. For a sample’s dimension, since the sample is usually arranged in a 1D array, the meaning of dimension is simply the number of values. For the sample np.array([3, 3, 2, 2, 1, 1]), it is a 1D array and the sample has 6 dimensions.
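As a quick illustration of the two definitions, the sketch below uses the same example shapes as above. The variable names `a` and `sample` are just for this example.

```python
import numpy as np

# First definition: number of dimensions = number of axes = len(array.shape).
a = np.zeros((2, 3, 4, 5, 6))
print(a.ndim)        # 5
print(len(a.shape))  # 5

# Second definition: a sample stored as a 1D array has as many
# "dimensions" (features) as it has values.
sample = np.array([3, 3, 2, 2, 1, 1])
print(sample.ndim)   # 1  (it is a 1D array)
print(sample.size)   # 6  (the sample has 6 dimensions)
```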

In the assignment, there are two representations of the samples: train_set_x_orig (before flattening) and train_set_x (after flattening). We use the flattened version for Exercise 8, and there we are referring to the second definition of dimension.

Because we are already using the flattened version, the first value of its shape is already height x width x channels, which is why X_train.shape[0] gives the dimension we need.
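A small sketch of that point, assuming 100 flattened 64 x 64 x 3 images. The zero-filled `X_train` here is a stand-in with the right shape, and `dim`, `w` and `b` are illustrative names rather than a copy of the assignment's solution.

```python
import numpy as np

height, width, channels, m = 64, 64, 3, 100

# Stand-in for the already flattened training set: one column per image.
X_train = np.zeros((height * width * channels, m))

# The first value of the shape is the number of features per sample.
dim = X_train.shape[0]
print(dim)  # 12288

# Zero initialization: one weight per feature, plus a scalar bias.
w = np.zeros((dim, 1))
b = 0.0
print(w.shape)  # (12288, 1)
```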

Cheers,
Raymond

Thank you @rmwkwok @paulinpaloalto @TMosh . I understand now!