Week 4 - 2nd Assignment - layer_dims?


I just made my way through the 2-layer network. Now I’ve started working on the L-layer network and stumbled over this line:

layers_dims = [12288, 20, 7, 5, 1] #  4-layer model

I know the input vector X has been flattened. That would explain the numbers ‘12288’ and ‘1’ in this list. These two numbers represent the input images. But what about the other numbers in there, ‘20’, ‘7’ and ‘5’? What do they represent? I’m confused about what they mean and where they come from.

Many thanks.

Look at how layers_dims is used. That should make it clear what is going on. It is telling you the following things:

  1. The network you are specifying has 4 layers: 3 hidden layers and the output layer.
  2. The number of neurons in each layer, which is also the number of outputs of that layer.
  3. The size of the input vectors.

With that information, you now know the sizes and shapes of all the W^{[l]} and b^{[l]} parameters for all the layers.
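To make that concrete, here is a small sketch that derives every parameter shape from layers_dims using the course convention that W^{[l]} has shape (n^{[l]}, n^{[l-1]}) and b^{[l]} has shape (n^{[l]}, 1). The helper name parameter_shapes is hypothetical; it is not part of the notebook.

```python
def parameter_shapes(layers_dims):
    """Return the expected shapes of W[l] and b[l] for l = 1..L."""
    shapes = {}
    for l in range(1, len(layers_dims)):
        # W[l] maps the n[l-1] outputs of the previous layer to n[l] neurons
        shapes["W" + str(l)] = (layers_dims[l], layers_dims[l - 1])
        # b[l] holds one bias per neuron; it is broadcast across the m examples
        shapes["b" + str(l)] = (layers_dims[l], 1)
    return shapes


layers_dims = [12288, 20, 7, 5, 1]  # 4-layer model
for name, shape in parameter_shapes(layers_dims).items():
    print(name, shape)
# W1 (20, 12288)
# b1 (20, 1)
# W2 (7, 20)
# b2 (7, 1)
# W3 (5, 7)
# b3 (5, 1)
# W4 (1, 5)
# b4 (1, 1)
```

Note that the number of layers L is simply len(layers_dims) - 1, so nothing here depends on there being exactly 4 layers.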

The point about 12288 is that it is the number of values in each image (64 * 64 * 3: 64 x 64 pixels with 3 RGB color values each). The logic for flattening them is the same here as it was in Week 2 Logistic Regression. Here’s a thread which explains that.
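A minimal sketch of that flattening, assuming the images are stored as an array of shape (m, 64, 64, 3) as in the course data. Random pixels stand in for the real images here, so nothing below depends on the dataset.

```python
import numpy as np

# Stand-in for the image data: m RGB images of 64 x 64 pixels, shape (m, 64, 64, 3)
m = 5
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(m, 64, 64, 3), dtype=np.uint8)

# Flatten each image into one column: shape (64 * 64 * 3, m) = (12288, m)
X = images.reshape(images.shape[0], -1).T
# Scale the pixel values to the range [0, 1]
X = X / 255.0

print(X.shape)  # (12288, 5)
```

The reshape(m, -1) keeps one row per image, and the transpose turns each image into a column, which is the layout the W^{[1]} X product expects.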

So the first hidden layer takes 12288 inputs and has 20 neurons (20 outputs).

The second hidden layer takes 20 inputs and has 7 neurons (7 outputs).

The third hidden layer takes 7 inputs and has 5 neurons (5 outputs).

The output layer has 5 inputs and 1 output. We are doing a binary classification here, so you have one output which is interpreted as either “yes” or “no”, right?
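One way to see this chain of inputs and outputs is to push a few fake examples through the network and record the shape of each activation. This sketch assumes the course architecture of ReLU hidden layers and a sigmoid output layer; the weights are random, so only the shapes are meaningful, and forward_shapes is a hypothetical helper.

```python
import numpy as np


def relu(Z):
    return np.maximum(0, Z)


def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))


def forward_shapes(X, layers_dims, seed=1):
    """Run one forward pass with random weights and record each A[l] shape."""
    rng = np.random.default_rng(seed)
    L = len(layers_dims) - 1
    A = X
    shapes = [A.shape]
    for l in range(1, L + 1):
        W = rng.standard_normal((layers_dims[l], layers_dims[l - 1])) * 0.01
        b = np.zeros((layers_dims[l], 1))
        Z = W @ A + b
        # Hidden layers use ReLU; the output layer uses sigmoid to give a probability
        A = sigmoid(Z) if l == L else relu(Z)
        shapes.append(A.shape)
    return A, shapes


layers_dims = [12288, 20, 7, 5, 1]
X = np.random.default_rng(0).random((12288, 3))  # 3 fake flattened images
AL, shapes = forward_shapes(X, layers_dims)
print(shapes)  # [(12288, 3), (20, 3), (7, 3), (5, 3), (1, 3)]

# Threshold the single output per example: 1 means "yes", 0 means "no"
predictions = (AL > 0.5).astype(int)
```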

In terms of “where these came from”, you can see from the above that the input and output sizes are pre-determined. The sizes of the hidden layers are choices that you have to make as the system designer. In a typical feed-forward network the number of neurons per layer decreases as you go deeper, but beyond that it’s up to you. You have to experiment with these “hyperparameters” in order to determine what works for the particular problem at hand. Prof Ng will spend a lot of time discussing how to make that sort of choice in a systematic way in Week 1 of Course 2 in this series, so “hold that thought” and stay tuned!
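When comparing candidate hidden-layer sizes, one quick number to look at is how many parameters each choice creates, since every layer contributes n^{[l]} * n^{[l-1]} weights plus n^{[l]} biases. This sketch (count_parameters is a hypothetical helper) compares the two architectures from this assignment; it says nothing about which one will actually perform better, which still has to be settled by experiment.

```python
def count_parameters(layers_dims):
    """Total number of entries in all W[l] and b[l] for a given architecture."""
    return sum(layers_dims[l] * layers_dims[l - 1] + layers_dims[l]
               for l in range(1, len(layers_dims)))


candidates = {
    "two-layer [12288, 7, 1]": [12288, 7, 1],
    "four-layer [12288, 20, 7, 5, 1]": [12288, 20, 7, 5, 1],
}
for name, dims in candidates.items():
    print(name, count_parameters(dims))
# two-layer [12288, 7, 1] 86031
# four-layer [12288, 20, 7, 5, 1] 245973
```

Almost all of the parameters sit in the first layer, because it has to connect to all 12288 input values.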

Awesome! That was very helpful. Thanks.

Hello. I am having a very similar issue with Course 1, Week 4, Assignment 2:

But the test for L_layer_model_test(L_layer_model) seems to require quite different layers_dims (10,5,6,1):


where “target” is the L_layer_model, the dimensions of which differ from those in the test case.

Am I missing something regarding these different dimensions?

Same answer as in the two-layer case: if you copied over your functions from the Step by Step exercise, that will fail in exactly this way. The deep “init” routine they used here is more sophisticated: the simple function that they had us build in Step by Step gives really lousy convergence on this particular problem. They didn’t explain this because it’s a more advanced topic that is covered in Course 2.
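To illustrate the kind of difference involved, here is a sketch contrasting the fixed 0.01 scaling from Step by Step with a scaling by 1/sqrt(n^{[l-1]}), which is a Xavier-style choice. I am assuming the assignment's helper does something along these lines; check the utility file shipped with the notebook rather than treating this as its exact code. Both function names are hypothetical.

```python
import numpy as np


def init_step_by_step(layer_dims, seed=3):
    """Simple version: small random weights with a fixed 0.01 scale."""
    np.random.seed(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return parameters


def init_scaled(layer_dims, seed=3):
    """Assumed variant: divide by sqrt(n[l-1]) so the spread adapts to each layer's fan-in."""
    np.random.seed(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l - 1]) / np.sqrt(layer_dims[l - 1])
        parameters[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return parameters


layer_dims = [12288, 20, 7, 5, 1]
a = init_step_by_step(layer_dims)
b = init_scaled(layer_dims)
for l in range(1, len(layer_dims)):
    W_a, W_b = a[f"W{l}"], b[f"W{l}"]
    # Same shapes, different values
    print(f"W{l}", W_a.shape == W_b.shape, round(float(W_a.std()), 4), round(float(W_b.std()), 4))
```

Because the shapes agree and only the values differ, a test that compares values (rather than shapes) is exactly what would flag the copied routine.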

Note that the errors you are getting are not about the shapes of the values, but the values themselves.

Also note that all the code we are writing here is “general”: it can handle any number of layers and any combination of numbers of neurons in each layer. You just use what the layers_dims variable tells you to drive everything. So it is not “a problem” that different test cases use different numbers of layers. If you wrote your code in the correct general way, it should “just work”. But if it’s in any way “hard-wired” or “hard-coded”, then you’ve got a problem that you need to fix. And of course the point of the test cases is to detect such problems, so they try different combinations.
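Here is a small sketch of the difference, with hypothetical function names. The hard-coded version only works for the one architecture it was written for, while the general version reads everything from layers_dims, so it handles both [12288, 20, 7, 5, 1] and (10, 5, 6, 1). The hard-coded version uses zeros just to keep it short.

```python
import numpy as np


def init_hard_coded(layers_dims):
    # Anti-pattern: ignores layers_dims and assumes one specific 4-layer network
    return {
        "W1": np.zeros((20, 12288)), "b1": np.zeros((20, 1)),
        "W2": np.zeros((7, 20)), "b2": np.zeros((7, 1)),
        "W3": np.zeros((5, 7)), "b3": np.zeros((5, 1)),
        "W4": np.zeros((1, 5)), "b4": np.zeros((1, 1)),
    }


def init_general(layers_dims):
    # Both the number of layers and every shape come from layers_dims
    params = {}
    for l in range(1, len(layers_dims)):
        params[f"W{l}"] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * 0.01
        params[f"b{l}"] = np.zeros((layers_dims[l], 1))
    return params


def shapes_match(params, layers_dims):
    """Check that params has exactly W1..WL, b1..bL with the shapes layers_dims implies."""
    L = len(layers_dims) - 1
    expected_keys = {f"{p}{l}" for p in ("W", "b") for l in range(1, L + 1)}
    if set(params) != expected_keys:
        return False
    return all(params[f"W{l}"].shape == (layers_dims[l], layers_dims[l - 1])
               and params[f"b{l}"].shape == (layers_dims[l], 1)
               for l in range(1, L + 1))


for dims in ([12288, 20, 7, 5, 1], (10, 5, 6, 1)):
    print(dims, "hard-coded:", shapes_match(init_hard_coded(dims), dims),
          "general:", shapes_match(init_general(dims), dims))
# [12288, 20, 7, 5, 1] hard-coded: True general: True
# (10, 5, 6, 1) hard-coded: False general: True
```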

I really appreciate your advice, and will try to make my code flexible rather than “hard-wired”, as it surely is at present. I’m rather eager to see how a proper solution makes it “just work,” which I take at this point to mean just what we’ve been discussing with regard to dimensional analysis.

I really don’t think the problem in this particular instance is “hard-coding”. If you had the dimensions wrong, the error you get would be different. My guess is (as I said in my initial reply) that you’re still using your hand-copied initialize_parameters_deep routine from the Step by Step exercise. That’s the first thing to check.

Hi, I’ve been trying to track down what might have been copied over from the Step by Step code. However, in trying to flush this out, I’ve noticed a peculiarity with the first part of the 2nd assignment for two_layer_model_test(two_layer_model). First, let me display the decline in costs from the plot of the two_layer_model stats:

As you can see, it gets the right costs to very high precision over 2500 iterations, which should be good, right? That should only be possible with correct code, given the fixed seed used for generating the random values.

But I see the dimensions change just after iteration 0:

where W1,b1,W2,b2 = {[7,12288],[7,1],[1,7],[1,1]} becomes {[4,10],[4,1],[1,4],[1,1]}, which of course is from the first assignment test. I mean, I can see the code, and it’s doing exactly what it should for this test case. Also, the corresponding weights and biases are not in the two_layer_model_test.

But the test results for W1,b1,W2,b2 do match on the 2nd pass of iteration 1:

And this is just for the two_layer_model.

For the L_layer_model, I can’t even display the weights and biases, because the L_layer_model is initialized with layers_dims = [12288, 20, 7, 5, 1] for the 4-layer model, which is then changed to a 3-layer model with layers_dims = (10, 5, 6, 1) in L_layer_model_test(L_layer_model).

Whatever is causing this behavior will be illuminating as I progress through the courses.

Hello. I’m concerned that yesterday’s post had too many details to digest quickly, so I will now summarize the main issue with the two_layer_model parameters that I have so far been unable to solve. Note that the code shown here is from the original notebook, without any of my code.

The two_layer_model is initially parameterized with layers_dims set to n_x = 12288, n_h = 7, and n_y = 1 in the following cell:


and the function two_layer_model_test in public_tests.py takes two_layer_model as its target parameter, and then overwrites layers_dims with n_x = 10, n_h = 4, and n_y = 1 after initial iteration 0:


So the model effectively trains on these latter parameters and somehow, despite the discrepancy I describe, produces the correct cost to a very high precision when trained, as I described in yesterday’s post above.

I understand that these parameters are not to be changed, so I have to figure out a way to accommodate these initial parameters in the function initialize_parameters(n_x, n_h, n_y) in the two_layer_model.
Assuming that I’ve got the latest code, I am truly mystified how everyone other than me is able to work around this problem.

I did not go through your long post, but I think I have some advice for you.

and the function two_layer_model_test in public_tests.py takes two_layer_model as its target parameter, and then overwrites layers_dims with n_x = 10, n_h = 4, and n_y = 1 after initial iteration 0:

This is a separate test. So, two separate tests are conducted sequentially.

The first one uses the global variables n_x, n_h, and n_y. This is the one that you can see in the test cell.

parameters, costs = two_layer_model(train_x, train_y, layers_dims = (n_x, n_h, n_y), num_iterations = 2, print_cost=False)

As you see, only two iterations are specified.

Right after this “open” test finishes, the 2nd, “hidden” test starts. That is what you discovered. The 2nd test intentionally uses different dimensions. This is quite important for checking the flexibility of your implementation, i.e., to confirm there is no hard-coding, as Paul suggested.

This is really important in modern software development. I come from the computer industry, and worked on very large-scale projects as an R&D leader. The key point is that inputs can easily be changed or extended based on business needs. A single project can contain millions of lines of code. If some team members “hard-code” particular portions, then we need to search through millions of lines to see the impact of a change. And maintenance is typically done by different people. So inflexible, hard-coded implementations are not accepted in real commercial products and projects.

Just for your awareness.

There are two completely separate test cases, right? They don’t have to use the same parameters in all test cases. Look again at the code in that test cell:

  1. First it runs a test by directly invoking two_layer_model using the parameters as already defined by previous cells.
  2. Then it invokes two_layer_model_test, which runs a different test with different parameters.

Look at how two_layer_model_test uses its target parameter: target is a function reference, and the test invokes it after setting up the parameter values that it wants to use.
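Here is a sketch of that pattern. The names toy_model and model_test are hypothetical stand-ins: toy_model only builds parameters (no training), and model_test is written in the spirit of two_layer_model_test, not copied from public_tests.py. The key detail is that the model function itself is passed (no parentheses), so the harness can call it with its own data and dimensions.

```python
import numpy as np


def toy_model(X, Y, layers_dims, num_iterations=2):
    """Stand-in for two_layer_model: builds parameters from layers_dims only."""
    n_x, n_h, n_y = layers_dims
    assert X.shape[0] == n_x, "input size must match layers_dims[0]"
    params = {"W1": np.random.randn(n_h, n_x) * 0.01, "b1": np.zeros((n_h, 1)),
              "W2": np.random.randn(n_y, n_h) * 0.01, "b2": np.zeros((n_y, 1))}
    costs = []  # a real model would append costs while training
    return params, costs


def model_test(target):
    """Harness that chooses its own data and dimensions, then calls target."""
    np.random.seed(1)
    X = np.random.randn(10, 20)
    Y = (np.random.rand(1, 20) > 0.5).astype(int)
    params, costs = target(X, Y, layers_dims=(10, 4, 1), num_iterations=2)
    return {k: v.shape for k, v in params.items()}


# Test 1: the "open" call, using the notebook's own sizes
n_x, n_h, n_y = 12288, 7, 1
train_x = np.random.randn(n_x, 3)
train_y = np.ones((1, 3))
params, costs = toy_model(train_x, train_y, layers_dims=(n_x, n_h, n_y), num_iterations=2)
print({k: v.shape for k, v in params.items()})
# {'W1': (7, 12288), 'b1': (7, 1), 'W2': (1, 7), 'b2': (1, 1)}

# Test 2: pass the function itself; the harness calls it with different sizes
print(model_test(toy_model))
# {'W1': (4, 10), 'b1': (4, 1), 'W2': (1, 4), 'b2': (1, 1)}
```

Nothing in the first test is "overwritten" by the second one: each call simply gets its own arguments.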

Hello, Nobu and Paul. I just wanted to thank you for the clarification. It allowed me to finish the course.

That’s great to hear. Congratulations! Do you plan to continue on with Course 2? There are plenty of interesting topics yet to be covered.


I am already working my way through the first week of Course 2. I got a Ph.D. in Computer Science in 2003, with a specialization in Cognitive Science applying neural networks to NLU, so I was familiar with most of the concepts in Course 1. But I had gotten side-tracked after a post-doc into other concerns I’ll not bore you with.
So, now I find myself trying to catch up with tools and methodologies new to me. As I progress, I anticipate a longer learning curve, but I really am looking forward to mastering the coursework.
Your kind guidance has really facilitated my studies.