Bug in reference initialize_parameters_zeros (week 1 assignment 1)

There seem to be two bugs in the reference code Initialization.ipynb for the week 1 assignment. (The bugs happen to cancel each other out.)

  1. The number L of layers is not len(layers_dims); instead it is len(layers_dims) - 1.

  2. The loop for l in range(1, L) should be for l in range(1, L+1). Otherwise, WL and bL are not set!

import numpy as np  # needed for np.zeros

def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.
    
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    
    parameters = {}
    L = len(layers_dims)            # number of layers in the network
    
    for l in range(1, L):
        #(≈ 2 lines of code)
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l - 1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters

You are correct that L is one greater than the number of layers, but that’s why the loop as written is correct. Remember that Python’s range(1, L) runs from 1 up to L - 1; the endpoint is excluded. Try running the following loop and watch what happens:

for ii in range(1, 5):
    print(f"ii = {ii}")
# prints ii = 1 through ii = 4 -- the endpoint 5 is excluded

So it’s really only the docstring comment that is wrong.
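To make that concrete, here is a quick check using a hypothetical layers_dims = [2, 4, 1] (a 2-layer network: 2 inputs, 4 hidden units, 1 output):

```python
layers_dims = [2, 4, 1]    # hypothetical example: 2 inputs, 4 hidden units, 1 output
L = len(layers_dims)       # 3 -- one more than the number of layers

# range(1, L) stops before L, so l takes the values 1 and 2,
# which are exactly the indices needed for W1/b1 and W2/b2
print(list(range(1, L)))   # [1, 2]
```

So WL and bL (here W2 and b2) do get set after all.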

Throughout the course, L denotes the number of layers. It would seem unwise to redefine its meaning just in this function.

Also note that the docstring entry on parameters, "WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])", is only correct for the normal meaning of L.

I’m not sure if this is the right place to bring it up, but searching for L = len(layers_dims) - 1 returned this topic.

The inconsistent way that L is set in the Initialization programming assignment confused me for a while:

L is assigned with len(layers_dims) in both the initialize_parameters_zeros and initialize_parameters_random functions, but it’s assigned with L = len(layers_dims) - 1 in the initialize_parameters_he function.

The function’s loops all handle their respective values of L correctly, so it’s not a bug in the code. But the same comment, "integer representing the number of layers", appears next to L in all three functions, so it has to be wrong in at least one of them.

My understanding is that this is correct:

L = len(layers_dims)  # integer representing the number of layers

That is counting layer 0 (the input layer) as a layer. I think that’s the approach that’s used in the teaching material, for example where A0 refers to the input layer activations.

If that’s correct, then the comment in the initialize_parameters_he function is wrong:

L = len(layers_dims) - 1 # integer representing the number of layers

L = len(layers_dims) - 1

gives the correct number of layers. The point is that you need to know the size of the inputs, but the inputs are not a “layer” in Prof Ng’s terminology.

So in a 2 layer network, you need to know the dimension of the inputs, the number of neurons in the first layer and the number of neurons output by the second layer. That is 3 numbers for a 2 layer network.
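Sketching that count with a hypothetical layers_dims = [3, 4, 1] (3 inputs, 4 neurons in layer 1, 1 neuron in layer 2):

```python
import numpy as np

layers_dims = [3, 4, 1]  # hypothetical: 3 inputs, 4 units in layer 1, 1 unit in layer 2

# len(layers_dims) is 3, but the network has only 2 layers:
# the inputs contribute a dimension, not a layer
for l in range(1, len(layers_dims)):
    W = np.zeros((layers_dims[l], layers_dims[l - 1]))
    b = np.zeros((layers_dims[l], 1))
    print(f"W{l}: {W.shape}, b{l}: {b.shape}")
# prints:
# W1: (4, 3), b1: (4, 1)
# W2: (1, 4), b2: (1, 1)
```

Three numbers in, two layers' worth of parameters out.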

But just because I have a variable called L in my function does not mean that it always means the same thing. You have to read the code and figure out how things are working in that particular instance.

It just seemed inconsistent to have two different ways of dealing with the L parameter in more or less identical functions in the same project. I wasn’t sure if that was intentional, so I spent some time trying to figure out whether L was assigned differently in initialize_parameters_he for some reason other than keeping people on their toes.

I don’t think it’s part of a teaching strategy. Consistency is rarely a characteristic of Machine Learning methods or instruction. There are very few firm standards.