According to the information given for the `linear_activation_forward` function in Exercise 4, we need to return `cache`, a tuple that contains two values: `linear_cache` and `activation_cache`. But I don't understand the purpose of the `linear_cache` variable. What is its data type, and how do I initialize it? It isn't given in the exercise.

Hi @LordBars and welcome to the Specialization. You need to complete this function using functions that you have either previously completed further up in the notebook or imported in the very first cell. The linear outputs Z and the activations A need to be stored (or "cached") for the backpropagation phase, in which the gradients (i.e. derivatives) will need to be evaluated. This will become clear later on in the assignment.

`linear_cache` is ( A^{[l-1]}, W^{[l]}, b^{[l]} ), returned by the `linear_forward(A_prev, W, b)` function.

`activation_cache` is Z^{[l]}, returned by either the `sigmoid(Z)` or the `relu(Z)` function.
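Here is a minimal sketch of how the two caches fit together (illustrative only, not the official notebook solution; the function names match the assignment, but the bodies here are my own simplified versions):

```python
import numpy as np

def linear_forward(A_prev, W, b):
    # Z = W·A_prev + b; cache the inputs, which backprop will need
    Z = W @ A_prev + b
    linear_cache = (A_prev, W, b)   # a plain tuple -- nothing to "initialize"
    return Z, linear_cache

def sigmoid(Z):
    A = 1 / (1 + np.exp(-Z))
    activation_cache = Z            # Z is needed later to compute dZ
    return A, activation_cache

def relu(Z):
    A = np.maximum(0, Z)
    activation_cache = Z
    return A, activation_cache

def linear_activation_forward(A_prev, W, b, activation):
    Z, linear_cache = linear_forward(A_prev, W, b)
    if activation == "sigmoid":
        A, activation_cache = sigmoid(Z)
    else:  # "relu"
        A, activation_cache = relu(Z)
    cache = (linear_cache, activation_cache)  # tuple of (tuple, array)
    return A, cache
```

So `linear_cache` is simply a tuple of the three arrays that `linear_forward` already has in hand; you never construct it yourself in `linear_activation_forward`, you just receive it and pass it along inside `cache`.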

You need A^{[l-1]} for computing dW^{[l]}:

\frac{\partial \ L(a^{[L]}, y)}{\partial \ W^{[l]}} \ = \ \frac{\partial \ L}{\partial \ z^{[l]}} \ \frac{\partial \ z^{[l]}}{\partial \ W^{[l]}} \ = \ \left[ dz^{[l]} \right] \ \left[ a^{[l-1]} \right]

and also W^{[l]} for computing dA^{[l-1]}:

\frac{\partial \ L(a^{[L]}, y)}{\partial \ a^{[l-1]}} \ = \ \frac{\partial \ L(a^{[L]}, y)}{\partial \ z^{[l]}} \ \frac{\partial \ z^{[l]}}{\partial \ a^{[l-1]}} \ = \ \left[ dz^{[l]} \right] \ \left[ W^{[l]} \right]

↑ These two computation procedures are done in the `linear_backward(dZ, cache)` function:

dZ (dZ^{[l]}) gives you dz^{[l]}_{(i)} for each example (i),

cache ( A^{[l-1]}, W^{[l]}, b^{[l]} ) gives you a^{[l-1]}_{(i)} and W^{[l]}.
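In vectorized form those two equations become the matrix products below. This is a sketch under my own simplifications (averaging over the m examples, as the assignment's cost does), not the notebook's reference code:

```python
import numpy as np

def linear_backward(dZ, cache):
    A_prev, W, b = cache              # unpack linear_cache
    m = A_prev.shape[1]               # number of examples

    dW = (dZ @ A_prev.T) / m          # [dz^{l}] [a^{l-1}]^T, averaged over m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ                # [W^{l}]^T [dz^{l}]
    return dA_prev, dW, db
```

Note how the cache supplies exactly the two quantities the derivation calls for: A_prev for dW and W for dA_prev.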

You need Z^{[l]} for computing dZ^{[l]}:

\frac{\partial \ L(a^{[L]}, y)}{\partial \ z^{[l]}} \ = \ \frac{\partial \ L(a^{[L]}, y)}{\partial \ a^{[l]}} \ \frac{\partial \ a^{[l]}}{\partial \ z^{[l]}} \ = \ \left[ da^{[l]} \right] \ \left[ f^{[l]'}(z^{[l]}) \right]

↑ This is the `linear_activation_backward(dA, cache, activation)` function:

dA (dA^{[l]}) gives you da^{[l]}_{(i)} for each example (i),

cache (Z^{[l]}) gives you z^{[l]}_{(i)},

and activation tells you what f^{[l]'}(\cdot) to use.
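Putting it together, here is a sketch of that step (again illustrative, with my own simplified `linear_backward` included so the snippet is self-contained; the notebook instead gives you `relu_backward` and `sigmoid_backward` helpers for the dZ computation):

```python
import numpy as np

def linear_backward(dZ, cache):
    # simplified version: dW and db averaged over the m examples
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

def linear_activation_backward(dA, cache, activation):
    linear_cache, activation_cache = cache
    Z = activation_cache                  # the cached Z^{[l]}

    if activation == "relu":
        dZ = dA * (Z > 0)                 # f'(z) = 1 where z > 0, else 0
    else:                                 # "sigmoid"
        s = 1 / (1 + np.exp(-Z))
        dZ = dA * s * (1 - s)             # f'(z) = s(z)(1 - s(z))

    # chain into the linear part, which consumes linear_cache
    return linear_backward(dZ, linear_cache)
```

This mirrors the forward pass exactly: the activation part consumes `activation_cache` (Z) to get dZ, and the linear part consumes `linear_cache` (A_prev, W, b) to get dA_prev, dW, db.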