In the Week 2 assignment we’re given a dataset with dimensions (m_train, num_px, num_px, 3). My question is: will every dataset we encounter outside this course be arranged in the same format, or is this just an example exclusive to this course?
Hi, @Narayan. The structure of the data set is determined by the nature of the task. In this case it is image classification. Although that task is a common one in ML/AI, it is far from the only one. That said, it is the main focus of Course 4, where you will be introduced to the convolutional neural network (CNN). But then you will move on to sequence-to-sequence modeling, where the dataset might represent a sequence of microphone inputs for, say, language translation or phrase identification (such as “Hey, Siri”). These will naturally have different data structures.
Ken has given you the overall answer here, but maybe it’s worth adding the comment that in the case that the inputs are images (which is very common in Course 4 as well) there are two common formats for representing image datasets as arrays or tensors:
The most common arrangement and the one that will be consistently used in Prof Ng’s courses is:
m x h x w x c
Where m is the number of samples, h is the number of pixels in the vertical dimension (height), w is the number of pixels in the horizontal dimension (width) and c is the number of color channels (typically 3 for an RGB image).
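To make the axis order concrete, here is a minimal NumPy sketch of an m x h x w x c array. The sizes and the random pixel values are made up purely for illustration; they are not the course dataset.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
m, h, w, c = 5, 64, 64, 3

rng = np.random.default_rng(0)
# A fake dataset of 8-bit images laid out as m x h x w x c.
X = rng.integers(0, 256, size=(m, h, w, c), dtype=np.uint8)

first_image = X[0]             # one sample: shape (h, w, c)
red_of_first = X[0, :, :, 0]   # one color channel of that sample: shape (h, w)
pixel = X[0, 10, 20]           # one pixel (row 10, column 20): shape (c,)

print(X.shape, first_image.shape, red_of_first.shape, pixel.shape)
```

Fixing an index on the first axis always selects a whole image, and each further index peels off one more dimension.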
Some people like to use this alternate “channels first” representation, but Prof Ng never uses it:
m x c x h x w
Also note that there are many types of images and not all of them have 2 spatial dimensions or 3 color channels. E.g. you can have greyscale images with one color channel and medical images may have 3 spatial dimensions (e.g. CT scans).
Even for “normal” images, you can also sometimes see a 4th channel called Alpha, which expresses transparency. This is common when your inputs are PNG files. There are also lots of file formats with different types of compression and the like, e.g. JPEG, PNG, TIFF and more. For each image file type, there will be a Python library for loading the files into memory.
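Once an image is in memory as an array, these variations just show up as a different size (or absence) of the channel axis. A small sketch, assuming the arrays are already loaded in h x w x c order; the tiny zero-filled arrays stand in for real image data:

```python
import numpy as np

h, w = 4, 6  # hypothetical tiny image size

# An RGBA image carries a 4th channel (alpha) after R, G and B.
rgba = np.zeros((h, w, 4), dtype=np.uint8)
rgb = rgba[..., :3]                    # drop alpha: shape (h, w, 3)

# A greyscale image is often loaded with no channel axis at all.
grey = np.zeros((h, w), dtype=np.uint8)
grey_hwc = grey[..., np.newaxis]       # add a channel axis: shape (h, w, 1)
grey_as_rgb = np.repeat(grey_hwc, 3, axis=-1)  # copy into 3 channels: (h, w, 3)

print(rgb.shape, grey_hwc.shape, grey_as_rgb.shape)
```

Whether to drop alpha or replicate a grey channel depends on what the model expects; the point is only that every image has to end up with the same shape before being stacked into one dataset array.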
Thank you so much, sirs, for the solutions. I have an issue regarding the point you make about the dimensions of the matrices. I’m not able to picture in my head what an (m x h x w x c) matrix looks like, which causes problems when I perform transpose operations. If you could provide a picture of what these matrices look like, I would get a better understanding.
Just from a terminology point of view, a matrix is an array with 2 dimensions. These are 4-dimensional arrays. Yes, they are hard to visualize. I think it’s helpful just to consider one “sample”, meaning that you pick a value for the first dimension. Then you’ve got 3 dimensions remaining, which are the height, width and the 3 colors. In the case of 64 x 64 x 3 images, you can think of it as a “stack” of three monochrome 64 x 64 images: one red, one green and one blue, positioned behind one another. They show some illustrations in the Week 2 Logistic Regression assignment notebook that have the kind of rendering I described above.
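The “stack of three monochrome images” picture can be reproduced in NumPy by taking one sample and moving the channel axis to the front. This is only a sketch with random data standing in for real images:

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake dataset: 10 samples of 64 x 64 RGB images (m x h x w x c).
X = rng.integers(0, 256, size=(10, 64, 64, 3), dtype=np.uint8)

sample = X[0]                        # shape (64, 64, 3)

# Move the channel axis first: new axis 0 is old axis 2, and so on.
planes = sample.transpose(2, 0, 1)   # shape (3, 64, 64)
red, green, blue = planes            # three monochrome 64 x 64 images

# A transpose only relabels axes: planes[k, i, j] is sample[i, j, k].
assert planes[1, 5, 7] == sample[5, 7, 1]
assert np.array_equal(red, sample[:, :, 0])
```

Thinking of a transpose as “which old axis goes where” rather than as flipping a picture tends to make the 3D and 4D cases easier to reason about.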
The other point here is that in order to process a 4D array with the type of networks we are using in Course 1, we have to “flatten” each image into a 1D vector, so that the input is a matrix (only 2 dimensions). Here is a thread which explains how that flattening works and that also might be useful to help you visualize how the 4D arrays work.
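As a rough sketch of that flattening, assuming the convention that each image becomes one column of the resulting matrix (the tiny sizes and the arange data are made up so the result is easy to inspect):

```python
import numpy as np

m, h, w, c = 4, 2, 2, 3  # hypothetical tiny dataset
X = np.arange(m * h * w * c).reshape(m, h, w, c)

# Flatten each image into a row, then transpose so samples are columns.
X_flat = X.reshape(m, -1).T          # shape (h*w*c, m) = (12, 4)

# Column j holds exactly the pixels of image j.
assert np.array_equal(X_flat[:, 2], X[2].ravel())

# Pitfall: this has the same shape but mixes pixels from different images.
X_wrong = X.reshape(-1, m)
assert not np.array_equal(X_wrong[:, 0], X[0].ravel())
```

Both results are 12 x 4, which is why the shape alone does not tell you whether the flattening kept each image intact.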
This was precisely my initial question: will I come across only 4D array datasets outside this course? Would we come across 5D, 6D, 7D, etc.? I ask only because I’m inexperienced in dealing with many datasets, so I’m not aware whether the course is giving us just one type of dataset.
There are lots of assignments in these courses. All the ones that are doing some form of image recognition or “computer vision” will use 2D, 3-color images as the inputs, so they all end up being 4D arrays (or “tensors” once we get to TensorFlow). In those cases the dimensions will be as I described:
m x h x w x c
But there are some assignments where the inputs are not pictures. E.g. in the Course 1, Week 3 Planar Data assignment, the inputs are a number of points in the plane, meaning (x, y) coordinates. There’s another one in Course 2 where the inputs are points in a plane. Then in Course 5, we are dealing with language models, so the inputs are various forms of arrays containing “one hot” encoded sequences of words or characters. Those are usually 3D arrays, because there is also the “timestep” dimension.
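To illustrate the 3D case, here is a small sketch of one-hot encoded sequences. The axis order (samples, timesteps, vocabulary size) and the tiny vocabulary are assumptions made for the example, not necessarily the layout used in the Course 5 assignments:

```python
import numpy as np

m, T, vocab = 2, 4, 5  # hypothetical: 2 sequences, 4 timesteps, 5 symbols

# Each entry is the index of a word or character in the vocabulary.
indices = np.array([[0, 3, 1, 4],
                    [2, 2, 0, 1]])                    # shape (m, T)

# Row k of the identity matrix is the one-hot vector for index k.
one_hot = np.eye(vocab, dtype=np.float32)[indices]    # shape (m, T, vocab)

print(one_hot.shape)
print(one_hot[0, 1])  # one-hot vector for sequence 0, timestep 1 (index 3)
```

Each sequence is a 2D slice (timesteps by vocabulary), and stacking m of them gives the third dimension.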
My point about medical images (e.g. CT scans) is that there do exist types of images with more than 3 dimensions. I think a CT scan has 4 dimensions: 3 spatial dimensions plus the actual density values. So if you had a batch of those images, it would be a 5D array or tensor. But we do not encounter any 5D arrays in these courses at least through the end of Course 4.
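For what it’s worth, a batch of volumetric scans would just add one more axis to the same pattern. A minimal sketch, assuming a layout of m x d x h x w x c with a single density channel; that ordering is my own extension of the image convention above, not something taken from the courses:

```python
import numpy as np

# Hypothetical sizes: 2 scans, 8 slices deep, 16 x 16 pixels, 1 density channel.
m, d, h, w, c = 2, 8, 16, 16, 1
scans = np.zeros((m, d, h, w, c), dtype=np.float32)

one_scan = scans[0]       # a single volume: shape (d, h, w, c)
one_slice = scans[0, 3]   # one 2D slice of that volume: shape (h, w, c)

print(scans.ndim, one_scan.shape, one_slice.shape)
```

The indexing logic is the same as in the 4D case: each index you fix removes one dimension.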