W3_A1_Ex-2_Layer_sizes_Dimension_of_output_layer

It is about Exercise 2 - layer_sizes. The instructions state that n_y is the size of the output layer. I am able to do the exercise, but I do not understand why the size of the output layer (n_y) is equal to 2. It is binary classification, so the outputs should be either 0 or 1. Why can the size of the output layer be 2? I do not understand this and cannot visualize it. Also, at the top (in Exercise 1), the instructions state that the shape of Y is (1, 400). Why is the size of the output layer not 1? Can anyone help? Thanks so much!

5 Likes

Hi, n_y is 1 for the planar dataset; however, it is 2 for the test case used after the layer_sizes function. As you can see, the layer_sizes function is applied to t_X and t_Y, not to X and Y. You can check the result when you apply it to X and Y.

t_X, t_Y = layer_sizes_test_case()
(n_x, n_h, n_y) = layer_sizes(t_X, t_Y)
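For comparison, you can also run it on the real planar dataset (assuming X and Y are already loaded in the notebook):

print(layer_sizes(X, Y))  # (2, 4, 1) for the planar dataset: n_y is 1, since Y.shape is (1, 400)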
6 Likes

This is quite confusing (why use random data that was not mentioned before?), and it does not quite work: “AssertionError: Wrong result. Expected (7, 4, 5) got (5, 4, 2)”

1 Like

If it doesn’t work, it means your code is not correct. Please share your full error.

1 Like

I was hard-coding the numbers because of how confusing (yet simple) this exercise is.
The scope of work should be better explained here.
Several people have had similar issues.

1 Like

What are t_X and t_Y? How are they related to X.shape and Y.shape?

1 Like

They aren’t related. Those are just test case values generated by calling that function. They are then passed to your function and the answers are checked.

Within the body of your layer_sizes function, you simply reference the shapes of your input parameters in order to determine the dimensions.

It’s always a mistake to reference global variables within the body of one of our functions here in these courses, if that’s what you are asking.
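For concreteness, a minimal sketch of such a shape-driven implementation (assuming the hidden layer size is fixed at 4, as in this assignment) might look like:

def layer_sizes(X, Y):
    # X has shape (n_x, number of examples); Y has shape (n_y, number of examples)
    n_x = X.shape[0]  # size of the input layer, read from the data
    n_h = 4           # size of the hidden layer, fixed at 4 for this exercise
    n_y = Y.shape[0]  # size of the output layer, read from the labels
    return (n_x, n_h, n_y)

Because the sizes are read from the shapes of the function's own parameters (not from the globals X, Y, t_X, or t_Y), the same function gives the right answer for the planar dataset and for any test case passed in.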

1 Like

I am not sure I understand. In Ex. 4, t_X, parameters = forward_propagation_test_case() just provides test cases for the forward_propagation function. In this test case, t_X.shape = (2, 3) and parameters["W1"].shape = (4, 2). So obviously I receive an error when trying to compute Z1 = np.dot(W1.T, t_X) + b1. It seems forward_propagation_test_case() is not created correctly.

1 Like

Remember this rule: To multiply two matrices, the number of columns in the first matrix must match the number of rows in the second matrix.

So, we have W1 (first matrix) and t_X (second matrix). Do you think we need to transpose any of them to match the above rule?
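To make the rule concrete with the shapes from the test case above, here is a quick sketch (the values are random; only the shapes matter):

import numpy as np

W1 = np.random.randn(4, 2)   # shape (4, 2)
t_X = np.random.randn(2, 3)  # shape (2, 3)

Z = np.dot(W1, t_X)          # (4, 2) times (2, 3) gives (4, 3): inner dimensions match
# np.dot(W1.T, t_X) would raise an error: (2, 4) times (2, 3) do not align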

1 Like

OK, so I rearranged the matrices in np.dot, and there is no error in the Z1 and Z2 calculations. But then in Ex. 4 my A2 = [[0.21442387 0.21436745 0.21442857]], which does not match the expected value A2 = [[0.21292656 0.21274673 0.21295976]]. What could be my mistake?

1 Like

It is worth checking the equations again:

Z^{[1]} = W^{[1]} X + b^{[1]}

A^{[1]} = \tanh(Z^{[1]})

Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}

\hat{Y} = A^{[2]} = \sigma(Z^{[2]})

Use sigmoid for A^{[2]}.
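In code, the four equations above translate into something like this sketch (assuming parameters is a dict with keys "W1", "b1", "W2", "b2"; a minimal sigmoid is included for self-containment):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_propagation(X, parameters):
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]
    Z1 = np.dot(W1, X) + b1   # W1 is (n_h, n_x), X is (n_x, m): no transpose needed
    A1 = np.tanh(Z1)          # hidden layer activation: tanh
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)          # output layer activation: sigmoid
    return A2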

1 Like

Thank you so much for your answer! I had defined A1 with sigmoid instead of tanh! It works well now.

1 Like

Great :+1: :+1:

1 Like

You must have already fixed this if you passed the tests, but there’s another mistake there besides the transpose. Remember what I said in my earlier reply:

In the local scope of the forward_propagation function, t_X is a global variable. You should never access it directly; instead, reference the relevant parameter of the function, which is X in this case.
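In other words (illustrative snippet):

# inside def forward_propagation(X, parameters):
Z1 = np.dot(W1, t_X) + b1  # wrong: t_X is a global that happens to exist in the notebook
Z1 = np.dot(W1, X) + b1    # right: use the function's own parameter X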

2 Likes

I agree with others reporting the same problem while solving Ex. 2: please clarify in the instructions that n_x and n_y should be dynamically determined from the shapes. I was so focused on solving the flower dataset problem itself that I hardcoded (2, 4, 1) and got totally confused by the error message.

Alternatively, add a bit of logic to create non-(2, 4, 1) test cases and an assert that informs the user that the return value seems to have been hardcoded (assert … = (2,4,1)).
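For concreteness, such a check might look something like this (a hypothetical sketch; the shapes are chosen to match the (7, 4, 5) expected value from the existing test):

import numpy as np

t_X = np.random.randn(7, 3)  # n_x deliberately not 2
t_Y = np.random.randn(5, 3)  # n_y deliberately not 1
result = layer_sizes(t_X, t_Y)
assert result != (2, 4, 1), "The return value appears to be hardcoded."
assert result == (7, 4, 5), f"Wrong result. Expected (7, 4, 5), got {result}"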

On behalf of some learners coming after me: pretty please?

There is an important lesson here that you are missing: we always want to write general-purpose code. Why even have the layer_sizes routine if you're just going to hard-code everything to (2, 4, 1)? We could just write one big function that does everything, but then for every problem we would have to write everything from scratch, instead of having modular support functions that are reusable for other problems. We'll see the best example of this when we get to Week 4 of DLS Course 1, so please stay tuned for that.

Then in later courses (C2 and C4) we will learn to use TensorFlow, which is written entirely with general-purpose functions that can handle any dimensions.