About a 3D array's shape in NumPy and CNNs

There is no “standard order of dimensions” in numpy, or in any other library or language. You just have to look at the meaning of the data. When dealing with image data, you have the choice of “channels first” or “channels last” orientation; here we use “channels last”. So the dimensions of a single image are:

height x width x channels

Height and width are measured in pixels. The channels dimension holds the color values for each pixel, so its size is 1 for greyscale images, 3 for RGB images, and 4 for CMYK or RGBA images.
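
To make the layout concrete, here is a minimal sketch of single-image arrays in “channels last” order, plus a conversion to “channels first”. The sizes (64 x 48) and the variable names are made up for illustration; they do not come from the course notebooks.

```python
import numpy as np

h, w = 64, 48  # hypothetical height and width in pixels

# "Channels last": the last axis holds the color values of each pixel.
grey = np.zeros((h, w, 1), dtype=np.uint8)  # greyscale: 1 channel
rgb = np.zeros((h, w, 3), dtype=np.uint8)   # RGB: 3 channels
rgba = np.zeros((h, w, 4), dtype=np.uint8)  # RGBA: 4 channels

# Moving the channel axis to the front gives "channels first": (c, h, w).
rgb_channels_first = np.transpose(rgb, (2, 0, 1))

print(rgb.shape)                 # (64, 48, 3)
print(rgb_channels_first.shape)  # (3, 64, 48)
```

Nothing in the array itself records which convention is in use, which is why you have to know the meaning of each axis.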

Then if you have multiple images, a first dimension is added for the samples. So if you have m images, the array will be 4D:

m x h x w x c
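
As a small sketch of how the samples dimension arises, the snippet below stacks m random images into one 4D array. The sizes and the name image_list are assumptions for illustration only.

```python
import numpy as np

m, h, w, c = 5, 64, 48, 3  # hypothetical sizes
rng = np.random.default_rng(0)

# A Python list of m separate images, each with shape (h, w, c).
image_list = [
    rng.integers(0, 256, size=(h, w, c), dtype=np.uint8) for _ in range(m)
]

# Stacking along a new axis 0 adds the samples dimension in front.
images = np.stack(image_list, axis=0)

print(images.shape)  # (5, 64, 48, 3)
```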

In Course 1, we needed to convert these 4D image arrays to 2D matrices with dimensions n_x x m, where n_x is the number of features and m is the number of samples. Of course for images we have this relationship:

n_x = h * w * c

You can see a detailed discussion of how the “flattening” operation is done to convert from 4D to 2D on this thread.
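
For reference, here is one common way to do that flattening in numpy, shown as a sketch with made-up sizes rather than as the exact code from the thread: reshape to m x n_x first, then transpose.

```python
import numpy as np

m, h, w, c = 5, 64, 48, 3  # hypothetical sizes
images = np.arange(m * h * w * c).reshape(m, h, w, c)

n_x = h * w * c  # number of features per image

# Keep the samples axis intact, flatten the other three, then transpose
# so that each column of X is one flattened image.
X = images.reshape(m, -1).T

print(X.shape)  # (9216, 5)

# Column j of X holds exactly the pixels of image j.
print(np.array_equal(X[:, 1], images[1].ravel()))  # True
```

Note that images.reshape(n_x, m) would also produce an array of shape n_x x m, but it would mix pixels from different images within each column, so the transpose step matters.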

When we get to ConvNets in Course 4, part of their power comes from the fact that they can handle the original geometric structure of the images: you don’t have to “flatten” them. The networks handle one image at a time, so you select on the first dimension (samples):

oneImage = images[i, :, :, :]

This gets us back to a single image with dimensions h x w x c. As we go through the layers of the ConvNet, the h, w and c values will typically change.
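
To illustrate how the shape can change from layer to layer, the sketch below selects one image and applies the usual output-size formula for a convolution, floor((n + 2p - f) / s) + 1. The helper conv_output_shape is hypothetical (it is not a numpy or course function) and the layer settings are made up.

```python
import numpy as np

def conv_output_shape(h, w, f, stride, pad, n_filters):
    """Hypothetical helper: output shape of a conv layer, channels last.

    Uses floor((n + 2*pad - f) / stride) + 1 for each spatial dimension.
    The output channel count equals the number of filters.
    """
    h_out = (h + 2 * pad - f) // stride + 1
    w_out = (w + 2 * pad - f) // stride + 1
    return (h_out, w_out, n_filters)

images = np.zeros((5, 64, 48, 3))  # m x h x w x c, made-up sizes

one_image = images[0]  # same as images[0, :, :, :]
print(one_image.shape)  # (64, 48, 3)

# A 3x3 convolution with 8 filters, stride 1, no padding.
h, w, _ = one_image.shape
print(conv_output_shape(h, w, f=3, stride=1, pad=0, n_filters=8))
# (62, 46, 8)
```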