Programming Assignment 1, the shapes of training dataset and test dataset


Can someone tell me why the shape of the example data (2,300) and the shape of the true label vector (1,300) are different?
To which row of the example data does the true label correspond?

In other words, does the true label vector indicate blue or red only for example data in train_X[0] or train_X[1]?

Thank you in advance!

X[0] and X[1] are both features of the input, and each sample has a single label that covers both of them. 300 is the number of training samples, each with 2 features. If the data were laid out as a table with one sample per row, the true label would probably be the 3rd column.

You need to understand the data in each case. In general, each sample has some number of “features”, which are the elements of each sample vector. In this case there are 2, which happen to be the coordinates of a point in the X-Y plane. Then the “labels” have one value for each sample. In this case they tell you whether the point in the plane given by the X entry is colored red or blue. So the X inputs are 2 x 300 and the labels are 1 x 300, because you have 300 samples and each sample has 2 features and one label.
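To make the layout concrete, here is a minimal NumPy sketch. The data is randomly generated stand-in data, not the assignment's actual dataset, and the labelling rule is made up purely for illustration. The only thing it is meant to show is that column i of the inputs and column i of the labels belong to the same sample.

```python
import numpy as np

# Stand-in data with the same layout as in the question (NOT the real dataset).
rng = np.random.default_rng(0)
m = 300                                   # number of samples
train_X = rng.normal(size=(2, m))         # row 0: x-coordinates, row 1: y-coordinates
# Made-up rule just to get 0/1 labels: 1 if both coordinates have the same sign.
train_Y = (train_X[0] * train_X[1] > 0).astype(int).reshape(1, m)

print(train_X.shape)   # (2, 300)
print(train_Y.shape)   # (1, 300)

# Sample i is a column of train_X; its label is the matching column of train_Y.
i = 5
point = train_X[:, i]  # both features of sample i, shape (2,)
label = train_Y[0, i]  # the single label for that point (0 or 1)
print(point, label)
```

So the label in column i applies to the pair (train_X[0, i], train_X[1, i]) as a whole, not to either row on its own.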


Of course, at a high level your question doesn’t really make any sense. For example, in Course 1 Week 2 and Week 4, we had inputs that were 64 x 64 x 3 images, so each sample had 12288 features, which are the pixel values in the image, and we were looking for cats. So would it make any sense to ask “which one of those pixels does the ‘cat’ label apply to?” It applies to the whole image, right? So it’s the same here: the red or blue label applies to the point, which has two coordinates. If you modify either of the coordinates, then it’s a different point, right?
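The image case can be sketched the same way. This is only an illustration: the array of zeros stands in for real images, the batch size of 10 is arbitrary, and the variable names are my own rather than the notebook's.

```python
import numpy as np

# Hypothetical batch of 10 blank "images", each 64 x 64 pixels with 3 color channels.
m = 10
images = np.zeros((m, 64, 64, 3))
labels = np.zeros((1, m), dtype=int)   # one cat / non-cat label per image

# Flatten each image into one column of 64 * 64 * 3 = 12288 features.
X = images.reshape(m, -1).T

print(X.shape)       # (12288, 10)
print(labels.shape)  # (1, 10)
```

Again the number of columns matches between the inputs and the labels, while the number of rows is just however many features each sample happens to have.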

So the high level point is that you start by understanding the meaning of the data that you’ve got. In the notebook they graph it for you and explain the goal of the model.

Thank you for your answer!