DLS Course 1 week 4 assignment 1 exercise 9

Can anyone explain why we have caches[l] and caches[L-1]? I found in public_test.py that caches = (linear_cache_activation_1, linear_cache_activation_2), but I don't understand what it means.


Hi, @thangngxuan. The documentation (the "docstring") for the L_model_backward() function that you captured is revealing. Before getting lost in the details, remember what is going on here. In backward propagation, we are creating derivative functions and evaluating them so that we can compute the gradient for each iteration of the gradient descent algorithm. The caches store the results of forward propagation, i.e. the values needed to evaluate those derivatives during backward propagation. Also, remember that A^{[l]} = g(Z^{[l]}), where Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}.

The docstring indicates that it would be useful to study your previous functions and trace the cache history. In a nutshell:

  • linear_forward() produces the first cache: a length-3 tuple (A, W, b) for a single layer, where each element is an np.ndarray.
  • linear_activation_forward() produces the next cache, a length-2 tuple. For a single layer it calls linear_forward() to produce (Z, linear_cache), then the activation function passed in ('relu' or 'sigmoid') to produce (A, activation_cache), and finally sets cache = (linear_cache, activation_cache).
  • L_model_forward() then loops through the layers of the model, calling linear_activation_forward() for each layer and producing caches with L elements, each one the cache output of linear_activation_forward(). (See the sketch just after this list.)
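
To make the nesting concrete, here is a minimal sketch of how the three levels of caches fit together. It is not the graded assignment code; the helper bodies are simplified stand-ins that assume the usual signatures linear_forward(A_prev, W, b), linear_activation_forward(A_prev, W, b, activation), and L_model_forward(X, parameters).

    import numpy as np

    def linear_forward(A_prev, W, b):
        # Z = W · A_prev + b; the linear cache keeps the inputs needed later for dW, db, dA_prev
        Z = W @ A_prev + b
        linear_cache = (A_prev, W, b)                  # length-3 tuple
        return Z, linear_cache

    def linear_activation_forward(A_prev, W, b, activation):
        Z, linear_cache = linear_forward(A_prev, W, b)
        if activation == "relu":
            A = np.maximum(0, Z)
        else:                                          # "sigmoid"
            A = 1 / (1 + np.exp(-Z))
        activation_cache = Z                           # Z is what the backward pass needs
        cache = (linear_cache, activation_cache)       # length-2 tuple
        return A, cache

    def L_model_forward(X, parameters):
        caches = []
        A = X
        L = len(parameters) // 2                       # number of layers
        for l in range(1, L):                          # hidden layers: relu
            A, cache = linear_activation_forward(
                A, parameters["W" + str(l)], parameters["b" + str(l)], "relu")
            caches.append(cache)
        AL, cache = linear_activation_forward(         # output layer: sigmoid
            A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
        caches.append(cache)
        return AL, caches                              # caches[l] holds the cache for layer l + 1

So with a two-layer network (one relu hidden layer, one sigmoid output layer), caches has two elements: caches[0] is (linear_cache, activation_cache) for layer 1 and caches[1] is the same pair for layer 2, which is exactly the (linear_cache_activation_1, linear_cache_activation_2) pair you saw in public_test.py.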

Having stored away the Z values (the linear outputs) and the A, W, and b values that produced them, you are now ready for backprop and gradient descent (in Exercise 10 and the ensuing assignment).

I hope that I have encouraged you to peruse the functions in the manner that I have indicated. I remember this being a sticking point for me when I took the course! :thinking: -> :nerd_face:


Thank you, sir, now I understand. I really appreciate it.

Happy to hear it! BTW, I made a classic copy/paste error in my response, which I have corrected.

Previously: Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l-1]}
Corrected: Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}

Looking at the caches in the functions where they are created, and at the comments, yields:

    # caches = (cache[layer1], cache[layer2], ... , cache[layerL])
    # current_cache = (linear_cache[current_layer], activation_cache[current_layer])
    # linear_cache = (A, W, b) or (X, W, b) at layer 1
    # activation_cache = Z (the cache returned by relu(Z) or sigmoid(Z))
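
To tie this back to the original question about caches[L-1] and caches[l]: below is a rough, hypothetical sketch (not the graded solution) of how the backward pass walks that structure. The derivative computations are elided as comments; the point is only the indexing: the last cache, caches[L-1], belongs to the output (sigmoid) layer, and caches[l] inside the reversed loop belongs to hidden (relu) layer l + 1.

    def L_model_backward_sketch(AL, Y, caches):
        """Illustrates only how the caches list is indexed during backprop."""
        grads = {}
        L = len(caches)                      # one cache per layer

        # Output layer: its cache is the last element of the list.
        current_cache = caches[L - 1]
        linear_cache, activation_cache = current_cache
        # ... use activation_cache (Z) with the sigmoid backward step, and
        #     linear_cache (A_prev, W, b) to get dA_prev, dW, db for layer L ...

        # Hidden layers: loop backwards; layer l + 1 corresponds to caches[l].
        for l in reversed(range(L - 1)):
            current_cache = caches[l]
            linear_cache, activation_cache = current_cache
            # ... use activation_cache (Z) with the relu backward step, and
            #     linear_cache to get dA_prev, dW, db for layer l + 1 ...

        return grads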