# Course 1, Week 4, Assignment 1, Exercise 5 - finding L

Why is it that in Exercise 5, we find L by doing this:

L = len(parameters) // 2

But earlier, in Exercise 2, we did this to find L:

L = len(layer_dims)

Oh, wait, is it because if you have

layer_dims = [2, 4, 1]

you would end up with

L = len(layer_dims)

coming out to 3, while

L = len(parameters) // 2

would come out to 2? That way, you "leave out" the final layer, which needs a different activation function?

Hi,

I don't think that's the reason; if you had more layers, that logic would not work. Think, for example, about what would happen with a higher number of layers, e.g. 5.

I think the explanation is that in `parameters` you have `W` and `b` for each layer, e.g. `W1`, `b1`, `W2`, `b2`. That's `4` elements in the `parameters` dictionary, which corresponds to `2` layers in the network. So you have twice as many entries as layers, and that's why you need to divide by 2.
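To make that counting concrete, here is a minimal sketch (the shapes and values are made up; only the counting matters):

```python
import numpy as np

# Hypothetical parameters for a 2-layer network:
# each layer contributes one W and one b entry.
parameters = {
    "W1": np.random.randn(4, 2), "b1": np.zeros((4, 1)),
    "W2": np.random.randn(1, 4), "b2": np.zeros((1, 1)),
}

L = len(parameters) // 2
print(len(parameters), L)  # 4 entries, 2 layers
```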

Thanks for the reply! I guess what I initially couldn't figure out was the difference between

– The number of layers
vs.
– Double the number of layers, divided by 2.

Logically, that would be the same. So when I kept pondering, I remembered that in Python, `range()` excludes its second argument, so a length of 3 loops only 2 times when you use `range(1, len(layer_dims))`. When you use `range(1, len(parameters) // 2)`, you end up looping only 1 time, leaving your final layer "uncalculated" so you can use the sigmoid function on it separately. However, that still doesn't explain why you wouldn't just use `len(layer_dims) - 1`. Why change it up? Is it just a matter of preference? Is there a specific reason it has to be done this way?

Now, all of that being said, I can't get Exercise 5 to run correctly, so maybe my entire line of thinking is wrong. If so, please set me straight because I'm really having a rough time with this particular exercise.

P.S. - It would still happen with 5 layers. Let's say you had `layer_dims = [2, 4, 3, 3, 1]`.
`len(layer_dims)` would be 5, which would loop 4 times, resulting in the creation of `W1`, `b1`, `W2`, `b2`, `W3`, `b3`, `W4`, and `b4`. So then `len(parameters) // 2` would be 4, which would loop 3 times, leaving out the last layer so you could use sigmoid on it. Please correct me if I'm wrong, which is super-possible!
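That walkthrough can be verified with a short sketch of the initialization loop (the `0.01` scaling is a common placeholder here, not necessarily the graded code):

```python
import numpy as np

layer_dims = [2, 4, 3, 3, 1]
parameters = {}

# range(1, len(layer_dims)) yields l = 1, 2, 3, 4: four iterations,
# so W1..W4 and b1..b4 are created (eight entries in total).
for l in range(1, len(layer_dims)):
    parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))

print(sorted(parameters))    # ['W1', 'W2', 'W3', 'W4', 'b1', 'b2', 'b3', 'b4']
print(len(parameters) // 2)  # 4
```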

In previous exercises, the value of `L` was `len(layer_dims)` because `layer_dims` was an array containing the number of nodes in each layer; i.e. `layer_dims = [5, 4, 3]` meant a 3-layered network that has 5 units in the first layer, 4 units in the second, and 3 units in the third.

In the final exercise, the code changed to `len(parameters) // 2` because now we get a dictionary that contains the parameters for each layer. Since we have only two parameters per layer, the data would look like `len(parameters) = len([W1, b1, W2, b2, …, Wn, bn])`, that is, pairs of `W` and `b`. Therefore, if you divide the length by 2 you get the number of `(W, b)` pairs, which is the number of layers in the network.

I'm trying to figure out why we're using a completely different method to find the same piece of information (the number of layers) that we already knew from earlier. It just seems roundabout and strange to me, so I'm assuming there's a good reason for it, but I just don't get it.

I think the main difference between the two cases is the function where this information is used.

• In the first case, you are initializing the neural network in `initialize_parameters_deep`. That function in effect returns a representation of the neural network as a dictionary of weight and bias values (`Wi` / `bi`).
• In the other function, you are implementing forward propagation, `L_model_forward`, which only requires the neural network (the `parameters` input) and `X`, the data. You could pass the value of `layer_dims` as an input too; however, this value can be calculated from the `parameters` input, so it would be somewhat redundant.

So, in summary, at initialization time you need to know the number of layers and their dimensions in order to generate the dictionary of `W` and `b` values. In contrast, when you are performing the forward pass, the neural network is already defined, so the number of layers can be deduced from the `parameters` input.
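As an illustrative sketch of that deduction (simplified; not the assignment's exact `L_model_forward`, and with no caches):

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu(Z):
    return np.maximum(0, Z)

def model_forward_sketch(X, parameters):
    """Forward pass that deduces L from `parameters` alone."""
    L = len(parameters) // 2   # number of (W, b) pairs = number of layers
    A = X
    for l in range(1, L):      # layers 1 .. L-1 use ReLU
        A = relu(parameters["W" + str(l)] @ A + parameters["b" + str(l)])
    # layer L uses sigmoid
    return sigmoid(parameters["W" + str(L)] @ A + parameters["b" + str(L)])
```

Note that no `layer_dims` argument is needed: the loop bounds come entirely from the size of the dictionary.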

In any case, your logic is correct: in the first L-1 layers you apply ReLU, and in the last one you apply sigmoid. What kind of error are you getting?

I now have a follow-on question. In Exercise 9, it finds L yet another way:
`L = len(caches)`
I cannot understand why the number of layers needs to be recalculated repeatedly, and in different ways. Is there a purpose to this? Is there some chance the number of layers has changed partway through running forward/backward propagation? Does it just have something to do with the fact that we've broken our work apart into a bunch of small helper functions? Why isn't L just always the same?

Hi, the number of layers is calculated differently because the inputs of the functions are different too. In the function `L_model_backward` you don't have the `parameters` input; the function is defined as:

`def L_model_backward(AL, Y, caches):`

From those inputs, the only one from which you could calculate the number of layers is `caches`.

• The reason for the different calculations of `L` is simply that each function takes different inputs.
• It would be possible to rewrite the functions to take `L` as an input (or as a global variable), and then you wouldn't need to calculate it, but in general functions should have as few inputs as possible; since `L` can be calculated from the other inputs, it is not passed as a parameter.
• `L` is always the same, but it is calculated differently at each step.
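To illustrate the last point: forward propagation appends one cache per layer, so `len(caches)` recovers the same `L`. A toy sketch with placeholder cache contents:

```python
# Hypothetical 4-layer network: forward propagation pushes one cache
# (linear cache + activation cache) per layer onto the list.
L = 4
caches = []
for l in range(1, L + 1):
    linear_cache = "A_prev, W%d, b%d" % (l, l)
    activation_cache = "Z%d" % l
    caches.append((linear_cache, activation_cache))

# In L_model_backward, the same L is recovered without `parameters`:
print(len(caches))  # 4
```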