Week 4: Exercise 4 - Initializing linear_cache

According to the information given for the linear_activation_forward function in exercise 4, we need to return cache, a tuple that contains two values: linear_cache and activation_cache. But I don’t understand the purpose of the “linear_cache” variable. What is its data type, and how do I initialize it? It isn’t given in the exercise.

Hi @LordBars and welcome to the Specialization. You need to complete this function using functions that you have either completed earlier in the notebook or imported in the very first cell. The linear outputs Z and the activations A need to be stored (or “cached”) for the backpropagation phase, in which the gradients (i.e. derivatives) will be evaluated. This will become clear later in the assignment.


linear_cache is ( A^{[l-1]}, W^{[l]}, b^{[l]} ), returned by the linear_forward(A_prev, W, b) function.
activation_cache is Z^{[l]} , returned by either the sigmoid(Z) or the relu(Z) function.
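To make the data types concrete, here is a minimal sketch of how those caches get built during the forward pass. This is not the graded solution; the helper names mirror the notebook's, but the bodies are illustrative (only the sigmoid branch is shown):

```python
import numpy as np

def linear_forward(A_prev, W, b):
    """Compute Z = W A_prev + b and cache the inputs for backprop."""
    Z = W @ A_prev + b
    linear_cache = (A_prev, W, b)      # a plain tuple: (A^[l-1], W^[l], b^[l])
    return Z, linear_cache

def sigmoid(Z):
    A = 1 / (1 + np.exp(-Z))
    activation_cache = Z               # Z^[l] is needed to evaluate sigmoid'(Z) later
    return A, activation_cache

def linear_activation_forward(A_prev, W, b, activation="sigmoid"):
    Z, linear_cache = linear_forward(A_prev, W, b)
    A, activation_cache = sigmoid(Z)   # the relu(Z) branch would look the same
    cache = (linear_cache, activation_cache)  # the tuple the exercise asks you to return
    return A, cache
```

So linear_cache is never "initialized" separately; it is just the tuple returned by linear_forward, which you bundle together with activation_cache.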

You need A^{[l-1]} for computing dW^{[l]}:
\frac{\partial L(a^{[L]}, y)}{\partial W^{[l]}} = \frac{\partial L(a^{[L]}, y)}{\partial z^{[l]}} \, \frac{\partial z^{[l]}}{\partial W^{[l]}} = \left[ dz^{[l]} \right] \left[ a^{[l-1]} \right]
and also W^{[l]} for computing dA^{[l-1]}:
\frac{\partial L(a^{[L]}, y)}{\partial a^{[l-1]}} = \frac{\partial L(a^{[L]}, y)}{\partial z^{[l]}} \, \frac{\partial z^{[l]}}{\partial a^{[l-1]}} = \left[ dz^{[l]} \right] \left[ W^{[l]} \right]

↑ These two computations are carried out in the linear_backward(dZ, cache) function:
dZ (dZ^{[l]}) gives you dz^{[l]}_{(i)} for each example (i),
cache ( A^{[l-1]}, W^{[l]}, b^{[l]} ) gives you a^{[l-1]}_{(i)} and W^{[l]}.
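In vectorized NumPy form (over all m examples at once, with the transposes that the per-example derivation above hides), linear_backward looks roughly like this. A sketch consistent with the formulas above, not necessarily identical to the notebook's solution:

```python
import numpy as np

def linear_backward(dZ, cache):
    """Given dZ^[l] and the linear cache, compute dA^[l-1], dW^[l], db^[l]."""
    A_prev, W, b = cache                 # unpack (A^[l-1], W^[l], b^[l])
    m = A_prev.shape[1]                  # number of examples
    dW = (dZ @ A_prev.T) / m             # uses A^[l-1] from the cache
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ                   # uses W^[l] from the cache
    return dA_prev, dW, db
```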

You need Z^{[l]} for computing dZ^{[l]}:
\frac{\partial L(a^{[L]}, y)}{\partial z^{[l]}} = \frac{\partial L(a^{[L]}, y)}{\partial a^{[l]}} \, \frac{\partial a^{[l]}}{\partial z^{[l]}} = \left[ da^{[l]} \right] \left[ f^{[l]'}(z^{[l]}) \right]

↑ This is the linear_activation_backward(dA, cache, activation) function:
dA (dA^{[l]}) gives you da^{[l]}_{(i)} for each example (i),
cache (Z^{[l]}) gives you z^{[l]}_{(i)},
and activation tells you what f^{[l]'}(\cdot) to use.
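Concretely, the da^{[l]} \cdot f^{[l]'}(z^{[l]}) step is what the imported sigmoid_backward and relu_backward helpers compute from the activation cache. A sketch under those assumed helper names, with linear_backward then taking over from dZ:

```python
import numpy as np

def sigmoid_backward(dA, activation_cache):
    Z = activation_cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)          # dA * sigmoid'(Z)

def relu_backward(dA, activation_cache):
    Z = activation_cache
    dZ = dA.copy()
    dZ[Z <= 0] = 0                   # relu'(Z) is 0 where Z <= 0, 1 elsewhere
    return dZ

def compute_dZ(dA, cache, activation):
    """First half of linear_activation_backward: recover dZ^[l] from the cache."""
    linear_cache, activation_cache = cache
    if activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
    else:                            # "relu"
        dZ = relu_backward(dA, activation_cache)
    # linear_backward(dZ, linear_cache) would then produce dA_prev, dW, db
    return dZ
```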