# Course 1, Week 4, Assignment 1, Exercise 5 - finding L

Why is it that in Exercise 5, we find L by doing this:

L = len(parameters) // 2

But earlier, in Exercise 2, we did this to find L:

L = len(layer_dims)

Oh, wait, is it because if you have

layer_dims = [2, 4, 1]

you would end up with

L = len(layer_dims)

coming out to 3, while

L = len(parameters) // 2

would come out to 2? That way, you "leave out" the final layer, which needs a different activation function?

Hi,

I don't think that's the reason; if you had more layers, that logic would not work. Think, for example, about what would happen with a higher number of layers, e.g. 5.

I think the explanation is that in `parameters` you have `W` and `b` for each layer, e.g. `W1`, `b1`, `W2`, `b2`. That's `4` elements in the `parameters` dictionary, which corresponds to `2` layers in the network. So you have twice as many entries as layers, and that's why you need to divide by 2.
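To make that counting concrete, here is a minimal sketch (the shapes and values are made up; only the counting matters):

```python
import numpy as np

# Hypothetical parameters for a 2-layer network:
# each layer contributes one W and one b entry.
parameters = {
    "W1": np.random.randn(4, 2), "b1": np.zeros((4, 1)),
    "W2": np.random.randn(1, 4), "b2": np.zeros((1, 1)),
}

L = len(parameters) // 2
print(len(parameters), L)  # 4 entries, 2 layers
```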

Thanks for the reply! I guess what I initially couldn't figure out was the difference between

– The number of layers
vs.
– Double the number of layers, divided by 2.

Logically, that would be the same. So when I kept pondering, I remembered that in Python, `range()` excludes its second argument, so a length of 3 loops only 2 times when you use `range(1, len(layer_dims))`. When you use `range(1, len(parameters) // 2)`, you end up looping only 1 time, leaving your final layer "uncalculated" so you can use the sigmoid function on it separately. However, that still doesn't explain why you wouldn't just use `len(layer_dims) - 1`. Why change it up? Is it just a matter of preference? Is there a specific reason it has to be done this way?

Now, all of that being said, I can't get Exercise 5 to run correctly, so maybe my entire line of thinking is wrong. If so, please set me straight because I'm really having a rough time with this particular exercise.

P.S. - It would still happen with 5 layers. Let's say you had `layer_dims = [2, 4, 3, 3, 1]`.
`len(layer_dims)` would be 5, which would loop 4 times, resulting in the creation of `W1`, `b1`, `W2`, `b2`, `W3`, `b3`, `W4`, and `b4`. So then `len(parameters) // 2` would be 4, which would loop 3 times, leaving out the last layer so you could use sigmoid on it. Please correct me if I'm wrong, which is super-possible!
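That walkthrough can be verified with a short sketch of the initialization loop (the `0.01` scaling is a common placeholder here, not necessarily the graded code):

```python
import numpy as np

layer_dims = [2, 4, 3, 3, 1]
parameters = {}

# range(1, len(layer_dims)) yields l = 1, 2, 3, 4: four iterations,
# so W1..W4 and b1..b4 are created (eight entries in total).
for l in range(1, len(layer_dims)):
    parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))

print(sorted(parameters))    # ['W1', 'W2', 'W3', 'W4', 'b1', 'b2', 'b3', 'b4']
print(len(parameters) // 2)  # 4
```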

In previous exercises, the value of `L` was `len(layer_dims)` because `layer_dims` was an array containing the number of nodes in each layer; i.e. `layer_dims = [5, 4, 3]` meant a 3-layered network that has 5 units in the first layer, 4 units in the second, and 3 units in the third.

In the final exercise, the code changed to `len(parameters) // 2` because now we get a dictionary that contains the parameters for each layer. Since we have only two parameters per layer, the data would look like `len(parameters) = len([W1, b1, W2, b2, …, Wn, bn])`, that is, pairs of `W` and `b`. Therefore, if you divide the length by 2 you get the number of `(W, b)` pairs, which is the number of layers in the network.

I'm trying to figure out why we're using a completely different method to find the same piece of information (the number of layers) that we already knew from earlier. It just seems roundabout and strange to me, so I'm assuming there's a good reason for it, but I just don't get it.

I think the main difference between the two cases is the function where this information is used.

• In the first case, you are initializing the neural network in `initialize_parameters_deep`. That function in effect returns a representation of the neural network as a dictionary of weight and bias values (`Wi` / `bi`).
• In the other function, you are implementing forward propagation, `L_model_forward`, which only requires the neural network (the `parameters` input) and `X`, the data. You could pass the value of `layer_dims` as an input too; however, this value can be calculated from the `parameters` input, so it would be somewhat redundant.

So, in summary, at initialization time you need to know the number of layers and their dimensions in order to generate the dictionary of `W` and `b` values. In contrast, when you are performing the forward pass, the neural network is already defined, so the number of layers can be deduced from the `parameters` input.
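As an illustrative sketch of that deduction (simplified; not the assignment's exact `L_model_forward`, and with no caches):

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu(Z):
    return np.maximum(0, Z)

def model_forward_sketch(X, parameters):
    """Forward pass that deduces L from `parameters` alone."""
    L = len(parameters) // 2   # number of (W, b) pairs = number of layers
    A = X
    for l in range(1, L):      # layers 1 .. L-1 use ReLU
        A = relu(parameters["W" + str(l)] @ A + parameters["b" + str(l)])
    # layer L uses sigmoid
    return sigmoid(parameters["W" + str(L)] @ A + parameters["b" + str(L)])
```

Note that no `layer_dims` argument is needed: the loop bounds come entirely from the size of the dictionary.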

In any case, your logic is correct: in the first L-1 layers you apply ReLU, and in the last one you apply sigmoid. What kind of error are you getting?

I now have a follow-on question. In Exercise 9, it finds L yet another way:
`L = len(caches)`
I cannot understand why the number of layers needs to be recalculated repeatedly, and in different ways. Is there a purpose to this? Is there some chance the number of layers has changed partway through running forward/backward propagation? Does it just have something to do with the fact that we've broken our work apart into a bunch of small helper functions? Why isn't L just always the same?

Hi, the number of layers is calculated differently because the inputs of the functions are different too. In the function `L_model_backward` you don't have the `parameters` input; the function is defined as:

`def L_model_backward(AL, Y, caches):`

From those inputs, the only one from which you could calculate the number of layers is `caches`.

• The reason for the different calculations of `L` is simply that each function takes different inputs.
• It would be possible to rewrite the functions to take `L` as an input (or as a global variable), and then you wouldn't need to calculate it, but in general functions should have as few inputs as possible; since `L` can be calculated from the other inputs, it is not passed as a parameter.
• `L` is always the same, but it is calculated differently at each step.
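To illustrate the last point: forward propagation appends one cache per layer, so `len(caches)` recovers the same `L`. A toy sketch with placeholder cache contents:

```python
# Hypothetical 4-layer network: forward propagation pushes one cache
# (linear cache + activation cache) per layer onto the list.
L = 4
caches = []
for l in range(1, L + 1):
    linear_cache = "A_prev, W%d, b%d" % (l, l)
    activation_cache = "Z%d" % l
    caches.append((linear_cache, activation_cache))

# In L_model_backward, the same L is recovered without `parameters`:
print(len(caches))  # 4
```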