Let a be a 3D array with shape (3, 2, 2). That means there are three 2 by 2 arrays. In the examples, each image has shape (num_px, num_px, 3), where num_px is the number of pixels along each side. An image in RGB format consists of 3 blocks of num_px by num_px arrays, but according to the shape described above it seems that there are num_px blocks of num_px by 3 arrays. I am confused about this. Please help me out.
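It takes some practice to get used to visualizing 3D arrays. There is no fixed way to look at them: it depends on what the data represent. In the case of images as we have here, I think it makes more sense to think of h x w pixels, where at each pixel you’ve got 3 color values R, G and B. Let’s use 64 x 64 to agree with the images we have here. So think of it as 64 x 64 pixels, with each pixel having three color values. Or you can think of it as three 64 x 64 monochrome images stacked on top of each other. Both readings describe the same data: in shape (64, 64, 3) the first index picks the row, the second the column and the third the color channel. The “64 blocks of 64 x 3” reading is just what you get when you group by the first index, which is how NumPy prints the array.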
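If it helps to see the different views side by side, here is a minimal NumPy sketch. It assumes a made-up 64 x 64 image filled with counting numbers rather than one of the real course images, and the variable names (`img`, `red`, `channels_first`) are only for illustration.

```python
import numpy as np

num_px = 64  # assumed side length, matching the 64 x 64 example above

# Hypothetical image in (height, width, channels) layout, filled with distinct values
img = np.arange(num_px * num_px * 3).reshape(num_px, num_px, 3)

pixel = img[10, 20]      # the three color values at row 10, column 20
red = img[:, :, 0]       # the 64 x 64 "monochrome" red channel
first_row = img[0]       # grouping by the first index: one row of pixels, 64 x 3

# Moving the channel axis to the front gives the "three 64 x 64 arrays" layout
channels_first = np.transpose(img, (2, 0, 1))

print(pixel.shape, red.shape, first_row.shape, channels_first.shape)
# (3,) (64, 64) (64, 3) (3, 64, 64)
```

Nothing about the data changes between these views. Which axis you slice along decides whether you see pixels, rows of pixels or whole color channels.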
You might also find this thread helpful, although it’s talking more about the process of “flattening” the 3D images into 1D vectors.
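For a rough idea of what “flattening” means here, this is a small sketch under the same assumptions as before (a made-up 64 x 64 x 3 array, hypothetical variable names). It uses a plain `reshape` with NumPy's default ordering, which may not be exactly the method that thread discusses.

```python
import numpy as np

num_px = 64  # assumed side length, as in the 64 x 64 example

# Hypothetical image in (height, width, channels) layout
img = np.arange(num_px * num_px * 3).reshape(num_px, num_px, 3)

# Unroll all values into a single column vector
flat = img.reshape(-1, 1)
print(flat.shape)  # (12288, 1)

# With the default (C) ordering the last axis varies fastest,
# so the first three entries are the R, G, B values of pixel (0, 0)
print(np.array_equal(flat[:3, 0], img[0, 0]))  # True

# Reshaping back recovers the original 3D image
restored = flat.reshape(num_px, num_px, 3)
print(np.array_equal(restored, img))  # True
```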