Cat V NotCat dataset Structure

I’m trying to understand the structure of the dataset used in the Week 2 assignment. These are the basic assumptions I’ve made (please correct me if I’m wrong):

  1. The shape of the training set is (209, 64, 64, 3). This means 209 images of 64x64 pixels, each with 3 color channels.

  2. Shape and size are synonymous and are used interchangeably throughout the course.

Here’s my problem though:
My understanding of this dataset's structure is that it’s [image1, image2, … image209].
Basically 209 images inside a list.
But when I do the following:

import h5py

t = h5py.File("datasets/train_catvnoncat.h5", "r")
train_set_x_orig = t["train_set_x"]
for i in train_set_x_orig:

I’m getting only the first image of the dataset plotted. Is this an issue with my code or with my understanding of the dataset? Also, please correct me if any of the assumptions I’ve listed above are wrong. Thanks!

Hey @Narayan,
Your first statement is correct. As for your second statement, it holds in some contexts but not in others.

For instance, when we are talking about a single image (considering it as a picture), we may use size more often, but when we are talking about a batch of images (considering each image as a matrix of values), we may use shape more often. But yes, I do believe that they are used interchangeably throughout the course, most of the time.
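One caveat worth checking for yourself: in NumPy code the two words map to different attributes, even if the lectures use them loosely. The sketch below uses a zero-filled stand-in array (an assumption, not the real dataset) with the same dimensions as the training set.

```python
import numpy as np

# Stand-in for the training images: same dimensions as stated above,
# but filled with zeros instead of the real pixel values.
images = np.zeros((209, 64, 64, 3), dtype=np.uint8)

shape = images.shape   # tuple of dimension lengths
size = images.size     # total number of elements (product of the shape)
ndim = images.ndim     # number of dimensions

print(shape)  # (209, 64, 64, 3)
print(size)   # 2568192
print(ndim)   # 4
```

So `.shape` answers "how is the data laid out?" while `.size` answers "how many numbers are there in total?".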

As for your code, you are not creating a separate figure for each image, which is why you see only a single image. Use the following code, and you will get all the images:

for i in train_set_x_orig:

Since you are not creating a figure explicitly, matplotlib plots each of the images in the same figure, so ultimately you only get to see the last image. I hope this helps.
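For anyone reading later, here is a minimal sketch of the "one figure per image" idea. It is an illustration only: the helper name `plot_each_in_own_figure`, the `max_images` cap, and the random stand-in images are all assumptions made for this example, not the code from the assignment.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_each_in_own_figure(images, max_images=5):
    """Draw each image in its own figure and return the list of figures.

    Hypothetical helper: max_images only keeps the number of open
    figures small.
    """
    figures = []
    for i, image in enumerate(images[:max_images]):
        fig = plt.figure()       # start a new figure for this image
        plt.imshow(image)        # draws into the figure just created
        plt.title(f"image {i}")
        figures.append(fig)
    return figures


# Random stand-in data shaped like three 64x64 RGB images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 64, 64, 3), dtype=np.uint8)

figs = plot_each_in_own_figure(fake_images)
```

In a notebook the figures should appear one after another; in a plain script you would typically call `plt.show()` after the loop. With the real data, you would pass the loaded image array in place of `fake_images`.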



Elemento has covered all the important points, but just one minor addition:

That is not a list: it is a numpy array that happens to have 4 separate dimensions. The first dimension is the “samples” dimension. When you write a plain for loop over a 4D array, it iterates over the first dimension, which is what you want for your purposes here.
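A quick way to convince yourself of this is to loop over a stand-in array with the same dimensions (zeros here, as an assumption in place of the real pixels) and look at what each iteration yields.

```python
import numpy as np

# Zero-filled stand-in with the same dimensions as the training set.
batch = np.zeros((209, 64, 64, 3), dtype=np.uint8)

# Iterating yields one slice along the first ("samples") dimension at a time.
first_item = next(iter(batch))
num_items = sum(1 for _ in batch)

print(first_item.shape)                      # (64, 64, 3)
print(num_items)                             # 209
print(np.array_equal(first_item, batch[0]))  # True
```

Each item of the loop is a single 64x64x3 image, so `for image in batch` and `for i in range(len(batch)): batch[i]` visit the same data.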

Note that we have to do some rearrangement of that data in order to use it as the input to the type of algorithm we are using here: a Feed Forward Neural Network needs the inputs as individual vectors and cannot handle 3D (for one sample) or 4D (for a batch of samples) arrays as inputs. They give us the logic to “flatten” the 4D array into a 2D array (a.k.a. a “matrix”) in which the “samples” dimension is defined to be the second dimension, in other words the columns. Here’s a thread which discusses the flattening process in some detail.
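As a rough illustration of that flattening step, one common way to do it in NumPy is a reshape followed by a transpose. The sketch below uses a tiny made-up array (4 samples of 2x2x3) so the result is easy to check; the variable names are assumptions chosen for this example.

```python
import numpy as np

# Tiny stand-in: 4 "images" of 2x2 pixels with 3 channels each.
x_orig = np.arange(4 * 2 * 2 * 3).reshape(4, 2, 2, 3)

# Unroll each sample into a row, then transpose so samples become columns.
x_flat = x_orig.reshape(x_orig.shape[0], -1).T

print(x_orig.shape)  # (4, 2, 2, 3)
print(x_flat.shape)  # (12, 4)

# Column j of the result is sample j unrolled into a vector.
column_matches = np.array_equal(x_flat[:, 1], x_orig[1].ravel())
print(column_matches)  # True
```

Applied to an array of shape (209, 64, 64, 3), the same two operations give a matrix of shape (12288, 209), since 64 * 64 * 3 = 12288.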