[Programming assignment] Different definitions to determine L (number of layers) from input variables may lead to confusion

Hi,

I am a longtime student of Andrew Ng’s; this is probably the 8th or 9th of his courses I have enrolled in. I have completed Course 1 of the Deep Learning Specialization, and I figured I’d give some feedback on the first programming assignment of Week 4, in case it prompts some discussion.

Specifically, in the graded function “initialize_parameters_deep”, L is computed as the length of the input list “layer_dims”. The problem is that the end-of-line comment next to it describes L as the number of layers in the network (see the Case#1 picture below).

If we call L_code the value of L computed in this portion of the assignment code and L_class the definition given in class, then L_code = L_class + 1, because layer_dims also carries the dimension of the input features (nx). To be perfectly clear, the code works: the for loop that follows runs from l = 1 to L_class (that is, L_code - 1), and so initializes the W and b parameters of every layer (see Case#1 below).

Case#1

In Case#1, L (L_code) is determined as the length of the list that contains the dimensions of all A vectors (including A0 = x), and so L is actually the number of layers + 1.
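To make the off-by-one concrete, here is a minimal sketch of the counting in Case#1. It is not the assignment's code: the helper name parameter_shapes and the example layer_dims values are my own, and it only records the shapes of W and b instead of creating arrays.

```python
def parameter_shapes(layer_dims):
    """Return the shape each W and b would have (hypothetical helper)."""
    L = len(layer_dims)  # this is L_code: it counts the input layer too
    shapes = {}
    for l in range(1, L):  # l runs from 1 to L_code - 1, i.e. 1 to L_class
        shapes["W" + str(l)] = (layer_dims[l], layer_dims[l - 1])
        shapes["b" + str(l)] = (layer_dims[l], 1)
    return shapes


# Assumed example: 5 input features, then layers of 4, 3 and 1 units.
layer_dims = [5, 4, 3, 1]
shapes = parameter_shapes(layer_dims)

L_code = len(layer_dims)       # 4, includes the input layer
L_class = len(layer_dims) - 1  # 3, the number of layers as defined in class
```

With these values the loop fills in W1..W3 and b1..b3, so every layer is initialized even though the variable called L holds 4.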

This would all be OK if the convention were consistent across the programming assignment (for example, if L_code were commented as being L_class + 1). The issue is that in later exercises L_code is actually computed as L_class, so I am left unsure whether I should loop over particular sections of the code as if L_code were L_class or L_class + 1. See the two cases below (Case#2 and Case#3), in which L_code = L_class.

Case#2

For example, in Case#2, in the L_model_forward function, the variable “parameters” is intended to be the output of the previously discussed function “initialize_parameters_deep()”, so “parameters” is a dictionary with twice as many elements as there are layers in the network (one W and one b per layer). Half its length therefore gives L_code = L_class.
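A small sketch of the Case#2 counting, under the assumption that L is taken as half the number of entries in the dictionary. The dictionary here is a stand-in with placeholder values, not the real output of the assignment function.

```python
# Stand-in for the dictionary described above: one W and one b per layer.
# The None values are placeholders; only the keys matter for the count.
parameters = {
    "W1": None, "b1": None,
    "W2": None, "b2": None,
    "W3": None, "b3": None,
}

# Two entries per layer, so half the length is the number of layers.
L = len(parameters) // 2

# Here L matches the class definition, with no extra input layer counted.
layer_indices = list(range(1, L + 1))
```

For the same three-layer network, this L is 3, while len(layer_dims) in Case#1 would be 4.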

Let’s take a look at Case#3 below.

In Case#3 as well, L_code is the length of the list containing the caches of all layers, and so L_code = L_class.
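The Case#3 counting can be sketched the same way. The caches below are placeholder strings I made up, standing in for whatever each layer stores during the forward pass; the reversed loop is only an illustration of walking the layers backwards.

```python
# Stand-in list with one cache per layer (placeholder strings, not real caches).
caches = ["cache_layer_1", "cache_layer_2", "cache_layer_3"]

# One cache per layer, so the length is the number of layers (L_class).
L = len(caches)

# Walking backwards over the layers: list index l holds the cache of layer l + 1.
visited_layers = []
for l in reversed(range(L)):
    visited_layers.append(l + 1)
```

Again L is 3 for a three-layer network, matching Case#2 and differing from Case#1.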

I think there is one more case where L is retrieved from an input variable in a way that implies L_code = L_class, but I believe it is not necessary to list further cases. Having L commented as the actual number of layers in some places and as layers + 1 in others is counterintuitive. Could you please consider whether making the L concept (and the way it is represented) consistent across the assignment is worth the effort, especially since this is perhaps the most arduous programming assignment (or assignment in general) in the entire first course of the specialization?

Thanks for the consideration,

Hi, Eduardo.

Thanks very much for your careful analysis. You’ve clearly explained the different and somewhat inconsistent ways in which the number of layers is used in this assignment. I will make sure that your thread gets noticed by the course staff. I can’t guarantee if or when they will act on it, of course. Their attention and energy lately are focused on creating new content, as opposed to revising the existing courses, as you can see from the rapidly growing curriculum here.

Best regards,
Paul


Thank you, Paul, for replying so promptly and for bringing this to the staff’s attention.
