Hi,
I am a long time Andrew Ng’s student. This is probably the 8th or 9th course I have enrolled from his. I completed course 1 from Deep Learning Specialization. I figured I’d give some feedback regarding the first programming assignment of Week 4, in case it could bring about some discussions.
Specifically, in the graded function “initialize_parameters_deep”, L is found as the len of the layer_dims list (i.e, input variable “layer_dims”). The problem with this, is that the end-of-line comment next to it, refers to it as L: number of layers -see below on Case#1 picture.
If we were to define L_code as the definition of L given in this portion of code for the assignment and L_class as the definition given in class, then, L_code = (L_class+1), as layer_dims carries the dimension of the input features (nx) as well. To be perfectly clear, the code works, as the for loop that ensues cycles from l=1 till L_class (or L_code -1), hence initializing W and b parameters in all layers -see below Case#1.
Case#1
In Case#1, L (L_code) is determined as the length of list that contains all A vectors dimensions (including A0=x), and so L is actually the number of layers +1.
This would all be OK, if the methodology was consistent across the programming assignment (i.e, if L_code was commented as being L_class +1 . The issue arises from the fact that in further exercises down, L_code then actually is defined and found as L_class, and so in my case, I am left doubting and confused whether I should be looping particular sections of the code as if L_code was L_class or L_class+1. See below two cases (Case#2 and Case#3 in which L_code = L_class
Case#2
For example, on Case#2, on L_model_forward function, the variable “parameters” is intended to be the output of the previous discussed function “initialize_parameters_deep ()”, and so “parameters” will be a dictionary that has twice as many elements as the layers in the network (W and b per layer)
Let’s take a look at Case#3 below,
Here on Case#3 as well, L_code is the length of the list containing all caches for all layers, and so L_code = L_class.
I think there is one more case where L is retrieved from an input variable in a way that implies L_code = L_class, but I believe it is not necessary to list further cases. Having L commented as the real value of layers or as layers+1 in distinctive occasions is counterintuitive. Could you please consider whether homogenizing the L concept (and the way it is represented) across the assignment is worth the effort, specially since this is perhaps the most arduous programming assignment (or assignment in general) for the entire first course of the specialization?
Thanks for the consideration,


