I feel like the start of this assignment is a mystery because it’s not explained how the data is stored inside the dataset. If I could see the actual data, that would help me understand. I downloaded the data file to my hard drive, but I can only see HTML code and none of the actual data for the cat images.
If someone could show me what the first 2-3 rows of data look like, that would be very helpful. Thanks.
I see, thanks. So [17 31 56] are the RGB values for the pixel at (0,0) and [22 33 59] is for (0,1)? And where is the indicator that shows whether it’s a cat or not?
Yes, as Saif has explained, the data are 64 x 64 x 3 RGB images and the corresponding labels. So the individual numbers are the pixel color values.
If you’d also like to understand how that data is stored and retrieved in the file format, you can examine the logic in the load_data function by clicking “File → Open” and then opening the utility Python file that accompanies the notebook. You can find the name of that file by examining the “Import” block, which is the first code block in the notebook.
You’ll find that H5 (HDF5) is a file format that is frequently used for storing multiple objects in a single compound file. You can find out more by googling something like “how to create h5 files in python”.
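In case it helps, here is a minimal sketch of how you might open the training file directly with h5py and look at what is stored inside. The file name and dataset keys below are assumptions on my part; check them against the names actually used by load_data in the utility file.

```python
import h5py
import numpy as np

# Assumed file name; check the path used by load_data in your utility file.
with h5py.File("datasets/train_catvnoncat.h5", "r") as f:
    print(list(f.keys()))  # list the datasets stored in this compound H5 file

    # Assumed dataset keys; substitute whatever the line above actually prints.
    train_x = np.array(f["train_set_x"])  # e.g. shape (m, 64, 64, 3): m images of 64x64 RGB pixels
    train_y = np.array(f["train_set_y"])  # e.g. shape (m,): 1 = cat, 0 = non-cat

print(train_x.shape, train_y.shape)
print(train_x[0, 0, 0])  # the [R G B] values of pixel (0, 0) in the first image
print(train_y[0])        # the label of the first image
```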
I actually tried to visualize the training set data with HDFView.
It shows a beautiful tabulated format with the index, and I was able to see the corresponding y value simultaneously.
The only thing I could not figure out was the three-layer RGB structure in the data table.
I’m trying to use HDFView instead of the print command in Python, since print doesn’t show the data alongside its indices.
I don’t know anything about HDFView, but the pixel data are just 8-bit unsigned integers representing the RGB color values at various positions in the images. Good luck getting any meaning out of that when viewed in tabular or spreadsheet form. They show you how to render the images in the code in the notebook (see the sketch below); that is the way to “see” an image.
But the interesting meta point here is that the algorithms that we will learn about in this course can actually figure out patterns in the pixel data just viewed as raw numbers, even if they don’t mean anything to the human visual cortex in that form.
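For what it’s worth, here is a rough sketch of rendering one of the images with matplotlib, again assuming the same file name and dataset keys as in the earlier sketch (verify them against your own utility file):

```python
import h5py
import numpy as np
import matplotlib.pyplot as plt

# Assumed file name and keys; adjust to match your utility file.
with h5py.File("datasets/train_catvnoncat.h5", "r") as f:
    train_x = np.array(f["train_set_x"])  # (m, 64, 64, 3) uint8 RGB images
    train_y = np.array(f["train_set_y"])  # (m,) labels: 1 = cat, 0 = non-cat

index = 25  # pick any image index
plt.imshow(train_x[index])  # render the 64 x 64 x 3 array as a picture
plt.title(f"y = {train_y[index]} (1 = cat, 0 = non-cat)")
plt.show()
```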
So, basically we should not care about the raw data, right?
We are generating the raw data simply by feeding in an image, and that populates the data arrays (if I’m understanding correctly).
I’m trying to find a way to see the relationship between the raw data and the corresponding pixels. Let’s say I generate the raw data from a picture, and I want to randomly select a pixel and check its corresponding RGB value for verification purposes.
Any suggestions on how to do that?
The data are just images, so they are composed of pixel values. If you want to examine the values, note that the data are loaded for us by utility functions into numpy arrays, so you can write Python code to print out any elements of those 3D arrays. Give it a try; my guess is you will quickly conclude that there is not much to be learned by doing that.
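If the goal is to verify a particular pixel against a picture of your own, a minimal sketch along these lines might help. The file name my_cat_photo.jpg is just a placeholder, and this assumes you have Pillow and numpy installed; the same indexing works on the arrays the utility loader returns.

```python
import numpy as np
from PIL import Image

# Placeholder file name; substitute any picture on your disk.
img = np.array(Image.open("my_cat_photo.jpg").convert("RGB"))
print(img.shape)  # (height, width, 3)

# Randomly select a pixel and print its RGB value.
rng = np.random.default_rng()
row = int(rng.integers(0, img.shape[0]))
col = int(rng.integers(0, img.shape[1]))
print(f"pixel ({row}, {col}): RGB = {img[row, col].tolist()}")
```

The main point is that img[row, col] hands back the three color values for that pixel, which is exactly the kind of [R G B] triple discussed above.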