Week 4: Exercise 4 - Initializing linear_cache

According to the information given for the linear_activation_forward function in exercise 4, we need to return cache, a tuple that contains two values: linear_cache and activation_cache. But I don’t understand the purpose of the “linear_cache” variable. What is its data type, and how do I initialize it? It isn’t given in the exercise.

Hi @LordBars and welcome to the Specialization. You need to complete this function using functions that you have either completed earlier in the notebook or imported in the very first cell. The linear outputs Z and the activations A need to be stored (or “cached”) for the backpropagation phase, in which the gradients (i.e. derivatives) will be evaluated. This will become clear later in the assignment.


linear_cache is ( A^{[l-1]}, W^{[l]}, b^{[l]} ), returned by the linear_forward(A_prev, W, b) function.
activation_cache is Z^{[l]} , returned by either the sigmoid(Z) or the relu(Z) function.
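To make the data types concrete, here is a minimal sketch of how those caches get built during the forward pass. This is not the graded solution; the helper names mirror the notebook's, but the bodies are illustrative (only the sigmoid branch is shown):

```python
import numpy as np

def linear_forward(A_prev, W, b):
    """Compute Z = W A_prev + b and cache the inputs for backprop."""
    Z = W @ A_prev + b
    linear_cache = (A_prev, W, b)      # a plain tuple: (A^[l-1], W^[l], b^[l])
    return Z, linear_cache

def sigmoid(Z):
    A = 1 / (1 + np.exp(-Z))
    activation_cache = Z               # Z^[l] is needed to evaluate sigmoid'(Z) later
    return A, activation_cache

def linear_activation_forward(A_prev, W, b, activation="sigmoid"):
    Z, linear_cache = linear_forward(A_prev, W, b)
    A, activation_cache = sigmoid(Z)   # the relu(Z) branch would look the same
    cache = (linear_cache, activation_cache)  # the tuple the exercise asks you to return
    return A, cache
```

So linear_cache is never "initialized" separately; it is just the tuple returned by linear_forward, which you bundle together with activation_cache.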

You need A^{[l-1]} for computing dW^{[l]}:
\frac{\partial L(a^{[L]}, y)}{\partial W^{[l]}} = \frac{\partial L(a^{[L]}, y)}{\partial z^{[l]}} \, \frac{\partial z^{[l]}}{\partial W^{[l]}} = \left[ dz^{[l]} \right] \left[ a^{[l-1]} \right]
and also W^{[l]} for computing dA^{[l-1]}:
\frac{\partial L(a^{[L]}, y)}{\partial a^{[l-1]}} = \frac{\partial L(a^{[L]}, y)}{\partial z^{[l]}} \, \frac{\partial z^{[l]}}{\partial a^{[l-1]}} = \left[ dz^{[l]} \right] \left[ W^{[l]} \right]

↑ These two computations are carried out in the linear_backward(dZ, cache) function:
dZ (dZ^{[l]}) gives you dz^{[l]}_{(i)} for each example (i),
cache ( A^{[l-1]}, W^{[l]}, b^{[l]} ) gives you a^{[l-1]}_{(i)} and W^{[l]}.
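In vectorized NumPy form (over all m examples at once, with the transposes that the per-example derivation above hides), linear_backward looks roughly like this. A sketch consistent with the formulas above, not necessarily identical to the notebook's solution:

```python
import numpy as np

def linear_backward(dZ, cache):
    """Given dZ^[l] and the linear cache, compute dA^[l-1], dW^[l], db^[l]."""
    A_prev, W, b = cache                 # unpack (A^[l-1], W^[l], b^[l])
    m = A_prev.shape[1]                  # number of examples
    dW = (dZ @ A_prev.T) / m             # uses A^[l-1] from the cache
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ                   # uses W^[l] from the cache
    return dA_prev, dW, db
```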

You need Z^{[l]} for computing dZ^{[l]}:
\frac{\partial L(a^{[L]}, y)}{\partial z^{[l]}} = \frac{\partial L(a^{[L]}, y)}{\partial a^{[l]}} \, \frac{\partial a^{[l]}}{\partial z^{[l]}} = \left[ da^{[l]} \right] \left[ f^{[l]'}(z^{[l]}) \right]

↑ This is the linear_activation_backward(dA, cache, activation) function:
dA (dA^{[l]}) gives you da^{[l]}_{(i)} for each example (i),
cache (Z^{[l]}) gives you z^{[l]}_{(i)},
and activation tells you what f^{[l]'}(\cdot) to use.
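Concretely, the da^{[l]} \cdot f^{[l]'}(z^{[l]}) step is what the imported sigmoid_backward and relu_backward helpers compute from the activation cache. A sketch under those assumed helper names, with linear_backward then taking over from dZ:

```python
import numpy as np

def sigmoid_backward(dA, activation_cache):
    Z = activation_cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)          # dA * sigmoid'(Z)

def relu_backward(dA, activation_cache):
    Z = activation_cache
    dZ = dA.copy()
    dZ[Z <= 0] = 0                   # relu'(Z) is 0 where Z <= 0, 1 elsewhere
    return dZ

def compute_dZ(dA, cache, activation):
    """First half of linear_activation_backward: recover dZ^[l] from the cache."""
    linear_cache, activation_cache = cache
    if activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
    else:                            # "relu"
        dZ = relu_backward(dA, activation_cache)
    # linear_backward(dZ, linear_cache) would then produce dA_prev, dW, db
    return dZ
```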