DLS Course 1 week 4 assignment 1 exercise 9

Can anyone explain why we have caches[l] and caches[L-1]? I found in public_test.py that caches = (linear_cache_activation_1, linear_cache_activation_2), but I don't understand what it means.


Hi, @thangngxuan. The documentation (the "docstring") for the L_model_backward() function that you captured is revealing. Before getting lost in the details, remember what is going on here. In backward propagation, we are creating derivative functions and evaluating them so that we can compute the gradient for each iteration of the gradient descent algorithm. The caches store the results of forward propagation, i.e. the values needed to evaluate those derivatives during backward propagation. Also, remember that A^{[l]} = g(Z^{[l]}), where Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}.

The docstring indicates that it would be useful to study your previous functions and trace the cache history. In a nutshell:

  • linear_forward() produces the first cache: a length-3 tuple (A, W, b) for a single layer, where each element is an np.ndarray.
  • linear_activation_forward() produces the next cache, a length-2 tuple. For a single layer it calls linear_forward() to produce (Z, linear_cache), then the activation function passed in ('relu' or 'sigmoid') to produce (A, activation_cache), and finally sets cache = (linear_cache, activation_cache).
  • L_model_forward() then loops through the layers of the model, calling linear_activation_forward() for each layer and producing caches with L elements, each one the cache output of linear_activation_forward(). (See the sketch just after this list.)
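
To make the nesting concrete, here is a minimal sketch of how the three levels of caches fit together. It is not the graded assignment code; the helper bodies are simplified stand-ins that assume the usual signatures linear_forward(A_prev, W, b), linear_activation_forward(A_prev, W, b, activation), and L_model_forward(X, parameters).

    import numpy as np

    def linear_forward(A_prev, W, b):
        # Z = W · A_prev + b; the linear cache keeps the inputs needed later for dW, db, dA_prev
        Z = W @ A_prev + b
        linear_cache = (A_prev, W, b)                  # length-3 tuple
        return Z, linear_cache

    def linear_activation_forward(A_prev, W, b, activation):
        Z, linear_cache = linear_forward(A_prev, W, b)
        if activation == "relu":
            A = np.maximum(0, Z)
        else:                                          # "sigmoid"
            A = 1 / (1 + np.exp(-Z))
        activation_cache = Z                           # Z is what the backward pass needs
        cache = (linear_cache, activation_cache)       # length-2 tuple
        return A, cache

    def L_model_forward(X, parameters):
        caches = []
        A = X
        L = len(parameters) // 2                       # number of layers
        for l in range(1, L):                          # hidden layers: relu
            A, cache = linear_activation_forward(
                A, parameters["W" + str(l)], parameters["b" + str(l)], "relu")
            caches.append(cache)
        AL, cache = linear_activation_forward(         # output layer: sigmoid
            A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
        caches.append(cache)
        return AL, caches                              # caches[l] holds the cache for layer l + 1

So with a two-layer network (one relu hidden layer, one sigmoid output layer), caches has two elements: caches[0] is (linear_cache, activation_cache) for layer 1 and caches[1] is the same pair for layer 2, which is exactly the (linear_cache_activation_1, linear_cache_activation_2) pair you saw in public_test.py.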

Having stored away the Z values (the linear outputs) and the A, W, and b values that produced them, you are now ready for backprop and gradient descent (in Exercise 10 and the ensuing assignment).

I hope that I have encouraged you to peruse the functions in the manner that I have indicated. I remember this being a sticking point for me when I took the course! :thinking: -> :nerd_face:


Thank you, sir, now I understand. I really appreciate it.

Happy to hear it! BTW, I made a classic copy/paste error in my response, which I have corrected.

Previously: Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l-1]}
Corrected: Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}

Looking at the caches in the functions where they are created, and at the comments, yields:

    # caches = (cache[layer1], cache[layer2], ... , cache[layerL])
    # current_cache = (linear_cache[current_layer], activation_cache[current_layer])
    # linear_cache = (A, W, b) or (X, W, b) at layer 1
    # activation_cache = Z (the cache returned by relu(Z) or sigmoid(Z))
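
To tie this back to the original question about caches[L-1] and caches[l]: below is a rough, hypothetical sketch (not the graded solution) of how the backward pass walks that structure. The derivative computations are elided as comments; the point is only the indexing: the last cache, caches[L-1], belongs to the output (sigmoid) layer, and caches[l] inside the reversed loop belongs to hidden (relu) layer l + 1.

    def L_model_backward_sketch(AL, Y, caches):
        """Illustrates only how the caches list is indexed during backprop."""
        grads = {}
        L = len(caches)                      # one cache per layer

        # Output layer: its cache is the last element of the list.
        current_cache = caches[L - 1]
        linear_cache, activation_cache = current_cache
        # ... use activation_cache (Z) with the sigmoid backward step, and
        #     linear_cache (A_prev, W, b) to get dA_prev, dW, db for layer L ...

        # Hidden layers: loop backwards; layer l + 1 corresponds to caches[l].
        for l in reversed(range(L - 1)):
            current_cache = caches[l]
            linear_cache, activation_cache = current_cache
            # ... use activation_cache (Z) with the relu backward step, and
            #     linear_cache to get dA_prev, dW, db for layer l + 1 ...

        return grads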