W3_A1_Input Layer_mismatched

When we load the dataset using load_planar_dataset(), it returns a tuple (X, Y), where X is a numpy array of shape (2, 400) and Y is a numpy array of shape (1, 400). This means that we have 400 examples in the dataset, and each example is represented as a vector of 2 features (the x and y coordinates of the input) and 1 output label.

Therefore, when we compute n_x, we need to use X.shape[0], which gives us the number of features in the input, i.e., the size of the first dimension of X. In this case, X.shape[0] is equal to 2, which is why n_x should also be equal to 2.
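To make this concrete, here is a minimal sketch using placeholder NumPy arrays with the same shapes as the planar dataset (not the actual data returned by load_planar_dataset()):

```python
import numpy as np

# Placeholder arrays with the same shapes as the planar dataset
X = np.zeros((2, 400))  # 2 features per example, 400 examples
Y = np.zeros((1, 400))  # 1 output label per example

n_x = X.shape[0]  # size of the first dimension = number of input features
n_y = Y.shape[0]  # size of the first dimension = number of output units

print(n_x)  # 2
print(n_y)  # 1
```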

Similarly, when we compute n_y, we need to use Y.shape[0], which gives us the number of output labels, i.e., the size of the first dimension of Y. In this case, Y.shape[0] is equal to 1, which is why n_y should also be equal to 1.

However, when using:

n_x = X.shape[0]
n_h = 4
n_y = Y.shape[0]

it returns:

The size of the input layer is: n_x = 5
The size of the hidden layer is: n_h = 4
The size of the output layer is: n_y = 2

n_h is understandable. But why are n_x = 5 and n_y = 2, while, as I discussed earlier, I expected n_x = 2 and n_y = 1?

Any help would be appreciated.

Hello @Hamid_Reza_Hamedi! I hope you are doing well.

I agree with your intuition about the shapes of X and Y. However, the output you mentioned actually comes from the shapes of t_X and t_Y, as indicated in the attached figure. I hope this clears up any confusion you may have had.

If you have any further questions or concerns, please don’t hesitate to let me know.


Thanks, @saifkhanengr, for the response. I see your point now, but it is still a puzzle for me. As far as I know, and it comes from the neural network figure in this exercise, n_x = n[0] = the number of input features = 2, and n_y = the number of output-layer nodes = 1. This contradicts n_x = 5 and n_y = 2.

Essentially, I do not see any relationship between the layer sizes and t_X and t_Y, as the sizes come from

n_x = X.shape[0]
n_y = Y.shape[0]

I would be thankful if you could help clarify this for me.

The point is that we are supposed to be writing general code here. The layer_sizes function should work for any shape of the inputs, right? It should not be “hard-coded” to assume that all problems have 2 features. So they wrote a test case that makes sure you didn’t do any “hard-coding” by using different dimensions than those for the actual problem here.

This will be a continuing pattern throughout all these courses: it’s a mistake to hard-code anything unless there is literally no choice and they explicitly tell you to (as in the case of n_h here).
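A minimal sketch of what such a general function looks like (the layer_sizes signature and the fixed n_h = 4 come from the assignment; the exact number of columns in the test arrays below is an assumption, since only shape[0] matters here):

```python
import numpy as np

def layer_sizes(X, Y):
    """Return (n_x, n_h, n_y) for inputs of any shape."""
    n_x = X.shape[0]  # features come from the data, not a hard-coded 2
    n_h = 4           # the one value the assignment fixes explicitly
    n_y = Y.shape[0]  # output units come from the labels
    return n_x, n_h, n_y

# The actual planar dataset shapes: 2 features, 1 output
X = np.zeros((2, 400))
Y = np.zeros((1, 400))
print(layer_sizes(X, Y))      # (2, 4, 1)

# A test case with different dimensions, like t_X and t_Y
t_X = np.zeros((5, 3))
t_Y = np.zeros((2, 3))
print(layer_sizes(t_X, t_Y))  # (5, 4, 2)
```

The same un-hard-coded function returns the correct sizes for both problems, which is exactly what the test case checks.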


Hello Hamid!

I think you are mixing the X and Y of the load_planar_dataset() and the X and Y of the layer_sizes(X, Y). Both are different.

The X, Y = load_planar_dataset() means that X and Y are our data with shape (2, 400) and (1, 400), respectively. We can use any notation for that, like X_train, Y_train (but to pass the assignment, you have to use X and Y).

The X and Y of layer_sizes(X, Y) are not the planar dataset but the inputs (parameters) of the layer_sizes function. Once we define that function, it is not necessary to use the exact names (like X and Y) when calling it. For example, when we use layer_sizes(t_X, t_Y), the inputs to the function are t_X and t_Y, whose shapes are different from the X and Y of the planar dataset.
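To illustrate the point about parameter names versus the caller's variable names (the array shapes below are illustrative, chosen to match the n_x = 5, n_y = 2 output from the test case):

```python
import numpy as np

def layer_sizes(X, Y):
    # X and Y here are just parameter names; inside the function they
    # refer to whatever arrays the caller passes in, not the globals.
    return X.shape[0], 4, Y.shape[0]

t_X = np.zeros((5, 3))  # illustrative test shapes
t_Y = np.zeros((2, 3))

# The caller's variables do not need to be named X and Y:
print(layer_sizes(t_X, t_Y))  # (5, 4, 2)
```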

These are basic concepts of Python. If you are not familiar with them, I highly recommend taking a Python course first. Note: you don’t need to become an expert in Python to complete the DLS, just familiar.



Thanks, @saifkhanengr and @paulinpaloalto. It’s clear now; I got the point. Sorry for asking about such basic issues!

Hello Hamid,

No need to apologize for asking basic questions. We all start somewhere, and it’s important to have a strong foundation in the basics to build upon. I’m glad that your doubts are clear now, and if you have any more questions or concerns in the future, feel free to ask.



Hello Saif!

I also struggled with this! I must say your answer here is fundamental; I suggest giving this same answer whenever the question comes up, as it is very clear.

I am more than familiar with Python but still struggled for a few minutes. It is understandable that one confuses the X matrix and Y vector with the X and Y arguments to the function, as it is unintuitive!

I don’t know whether it was designed this way to test our understanding; if so, it’s an interesting test.