I feel like the start of this assignment is a mystery because it’s not explained how the data is stored inside the dataset. If I could see the actual data, that would help me understand. I downloaded the data file to my hard drive, but I can only see HTML code and none of the actual data for the cat images.
If someone could show me what the first 2-3 rows of data look like, that would be very helpful. Thanks.
I see, thanks. So [17 31 56] are the RGB values for the pixel at (0,0) and [22 33 59] is for (0,1)? And where is the indicator that shows whether it’s a cat or not?
Yes, as Saif has explained, the data are 64 x 64 x 3 RGB images and the corresponding labels. So the individual numbers are the pixel color values.
If you’d also like to understand how that data is stored and retrieved in the file format, you can examine the logic in the load_data function by clicking “File → Open” and then opening the utility Python file that accompanies the notebook. You can find the name of that file by examining the “Import” block, which is the first code block in the notebook.
You’ll find that H5 (HDF5) is a file format that is frequently used for storing multiple objects in a single compound file. You can find out more by googling something like “how to create h5 files in python”.
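In case it helps, here is a minimal sketch of how you might open the training file directly with h5py and look at what is stored inside. The file name and dataset keys below are assumptions on my part; check them against the names actually used by load_data in the utility file.

```python
import h5py
import numpy as np

# Assumed file name; check the path used by load_data in your utility file.
with h5py.File("datasets/train_catvnoncat.h5", "r") as f:
    print(list(f.keys()))  # list the datasets stored in this compound H5 file

    # Assumed dataset keys; substitute whatever the line above actually prints.
    train_x = np.array(f["train_set_x"])  # e.g. shape (m, 64, 64, 3): m images of 64x64 RGB pixels
    train_y = np.array(f["train_set_y"])  # e.g. shape (m,): 1 = cat, 0 = non-cat

print(train_x.shape, train_y.shape)
print(train_x[0, 0, 0])  # the [R G B] values of pixel (0, 0) in the first image
print(train_y[0])        # the label of the first image
```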
I actually tried to visualize the training set data with HDFView.
It shows a beautiful tabulated format with the index, and I was able to see the corresponding y value simultaneously.
The only thing I could not figure out was the three-layer RGB structure in the data table.
I’m trying to use HDFView instead of the print command in Python, since print doesn’t show the data alongside its indices.
I don’t know anything about HDFView, but the pixel data are just 8-bit unsigned integers representing the RGB color values at various positions in the images. Good luck getting any meaning out of that when viewed in tabular or spreadsheet form. They show you how to render the images in the code in the notebook (see the sketch below); that is the way to “see” an image.
But the interesting meta point here is that the algorithms that we will learn about in this course can actually figure out patterns in the pixel data just viewed as raw numbers, even if they don’t mean anything to the human visual cortex in that form.
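For what it’s worth, here is a rough sketch of rendering one of the images with matplotlib, again assuming the same file name and dataset keys as in the earlier sketch (verify them against your own utility file):

```python
import h5py
import numpy as np
import matplotlib.pyplot as plt

# Assumed file name and keys; adjust to match your utility file.
with h5py.File("datasets/train_catvnoncat.h5", "r") as f:
    train_x = np.array(f["train_set_x"])  # (m, 64, 64, 3) uint8 RGB images
    train_y = np.array(f["train_set_y"])  # (m,) labels: 1 = cat, 0 = non-cat

index = 25  # pick any image index
plt.imshow(train_x[index])  # render the 64 x 64 x 3 array as a picture
plt.title(f"y = {train_y[index]} (1 = cat, 0 = non-cat)")
plt.show()
```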
So, basically we should not care about the raw data, right?
We are generating the raw data simply by feeding in an image, and that populates the data arrays (if I’m understanding correctly).
I’m trying to find a way to see the relationship between the raw data and the corresponding pixels. Let’s say I generate the raw data from a picture, and I want to randomly select a pixel and check its corresponding RGB value for verification purposes.
Any suggestions on how to do that?
The data are just images, so they are composed of pixel values. If you want to examine the values, note that the data are loaded for us by utility functions into numpy arrays, so you can write Python code to print out any elements of those 3D arrays. Give it a try; my guess is you will quickly conclude that there is not much to be learned by doing that.
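If the goal is to verify a particular pixel against a picture of your own, a minimal sketch along these lines might help. The file name my_cat_photo.jpg is just a placeholder, and this assumes you have Pillow and numpy installed; the same indexing works on the arrays the utility loader returns.

```python
import numpy as np
from PIL import Image

# Placeholder file name; substitute any picture on your disk.
img = np.array(Image.open("my_cat_photo.jpg").convert("RGB"))
print(img.shape)  # (height, width, 3)

# Randomly select a pixel and print its RGB value.
rng = np.random.default_rng()
row = int(rng.integers(0, img.shape[0]))
col = int(rng.integers(0, img.shape[1]))
print(f"pixel ({row}, {col}): RGB = {img[row, col].tolist()}")
```

The main point is that img[row, col] hands back the three color values for that pixel, which is exactly the kind of [R G B] triple discussed above.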