Hey guys. I'm confused about linear_cache and activation_cache in the assignment. Can anyone tell me what they are and what's inside? Cheers!
Hey @CourseraFan, in general we use a cache to store values computed during forward propagation that we will need again during backward propagation. A very simple example can be found in the assignment itself, and I have included it here for your reference as well.
When doing backprop for the linear layer, computing the gradient of W requires A from the previous layer, as can be seen in the first image above. We get that value from the cache variable, as can be seen in the second image above. Here, linear_cache is the cache variable for the linear layer and activation_cache is the cache variable for the sigmoid (activation) layer. I hope this helps.
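For concreteness, here is a minimal sketch of what the linear part of backprop might look like (not the assignment's exact code; the function name, argument names, and shapes are assumptions following the usual course conventions). The point is simply that A_prev and W are pulled back out of linear_cache to compute the gradients:

```python
import numpy as np

def linear_backward(dZ, linear_cache):
    # linear_cache was stored during forward prop as (A_prev, W, b)
    A_prev, W, b = linear_cache
    m = A_prev.shape[1]  # number of examples

    dW = (1.0 / m) * np.dot(dZ, A_prev.T)               # uses A_prev from the cache
    db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)                           # uses W from the cache

    return dA_prev, dW, db
```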
Regards,
Elemento
In addition to @Elemento’s explanation, you can take a quick look at this short thread.
@Elemento has captured the main point of “caching” these values in the first place. I should add to that previous (linked) thread by noting that we need to evaluate the derivatives that make up the gradients at each “point” computed during forward propagation: A^{[l]} = g \left(Z^{[l]}\right) where Z^{[l]} = W^{[l]}A^{[l-1]}+b^{[l]}. The linear_cache stores the inputs to the linear step, A^{[l-1]}, W^{[l]}, and b^{[l]}, while the activation_cache stores Z^{[l]}.
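To make that concrete, here is a rough sketch of the forward side (assumed names, not necessarily the assignment's exact code): the linear step stashes its inputs in linear_cache, and the activation step stashes Z^{[l]} in activation_cache.

```python
import numpy as np

def linear_forward(A_prev, W, b):
    Z = np.dot(W, A_prev) + b
    linear_cache = (A_prev, W, b)   # saved for the linear part of backprop
    return Z, linear_cache

def sigmoid(Z):
    A = 1.0 / (1.0 + np.exp(-Z))
    activation_cache = Z            # saved so g'(Z) can be evaluated during backprop
    return A, activation_cache
```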
In elementary calculus (which is not a requirement), you may have derived (yup) the derivative function y=f^\prime \left(x\right) of f\left(x\right) and then evaluated it at a point x=x_0, i.e. f^\prime \left(x_0\right). For example, if y=f\left(x\right)=x^2, then f^\prime \left(x\right)=2x. Now let x_0=3, so that y_0 =f^\prime \left(3\right)=2\left(3\right) = 6. You now know the slope (i.e. gradient) of f at that point.
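Just to tie the analogy down numerically, here is a throwaway check (plain Python, nothing to do with the assignment) that the slope of f(x) = x^2 at x_0 = 3 is indeed 6:

```python
def f(x):
    return x ** 2

def f_prime(x):
    return 2 * x                                      # derivative worked out by hand

x0, h = 3.0, 1e-6
numerical_slope = (f(x0 + h) - f(x0 - h)) / (2 * h)   # central-difference estimate
print(f_prime(x0), round(numerical_slope, 4))         # 6.0 6.0
```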
Fundamentally, that is what is going on here. Forward prop computes the outputs of the “function” (the neural network), and those outputs become the inputs to the gradient functions during backprop. We have to store (“cache”) the results of forward prop so that we can evaluate the derivatives comprising the gradient during backward prop.
The linear_activation_forward function of Exercise 4 returns a single cache, which is itself a tuple holding two entries: the first is the linear cache, the second the activation cache.
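Putting the pieces together, and under the same assumptions as the sketches above (reusing the hypothetical linear_forward, sigmoid, and linear_backward helpers), the combined cache might be assembled and later unpacked roughly like this:

```python
import numpy as np

def linear_activation_forward(A_prev, W, b):
    Z, linear_cache = linear_forward(A_prev, W, b)   # linear_cache = (A_prev, W, b)
    A, activation_cache = sigmoid(Z)                 # activation_cache = Z
    cache = (linear_cache, activation_cache)         # one cache, two pieces inside
    return A, cache

def linear_activation_backward(dA, cache):
    linear_cache, activation_cache = cache           # unpack the two pieces
    Z = activation_cache
    s = 1.0 / (1.0 + np.exp(-Z))
    dZ = dA * s * (1.0 - s)                          # sigmoid'(Z) = s * (1 - s)
    return linear_backward(dZ, linear_cache)
```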
Thanks guys! Really helped me out!!