Confused about linear_cache and activation_cache

Hey guys. I'm confused about linear_cache and activation_cache while doing the assignment. Can anyone tell me what linear_cache and activation_cache are? What's inside them? Cheers!

Hey @CourseraFan, in general, we use a cache to store values during forward propagation that we will need later during backward propagation. A very simple example can be found in the assignment itself, and I have included it here for your reference as well.

[Screenshot from 2022-04-29 16-34-15: the gradient formulas for the linear layer, from the assignment]

[Screenshot from 2022-04-29 16-34-28: the code retrieving the stored values from the cache during backprop]

While doing backprop for the linear layer, to calculate the gradients of W we require A from the previous layer, as can be seen in the first image above. We retrieve it from the cache variable, as can be seen in the second image above. Here, linear_cache is the cache variable for the linear step and activation_cache is the cache variable for the sigmoid (activation) step. I hope this helps.
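
To make that concrete, here is a minimal sketch (not the graded solution; it just assumes the cache layout discussed in this thread, where linear_cache holds (A_prev, W, b)) of how the cached A_prev is used when computing dW:

```python
import numpy as np

def linear_backward(dZ, linear_cache):
    """Sketch: gradients for the linear step Z = W @ A_prev + b.

    Assumes linear_cache was stored during forward prop as (A_prev, W, b).
    """
    A_prev, W, b = linear_cache          # values saved during forward prop
    m = A_prev.shape[1]                  # number of examples

    dW = (1.0 / m) * np.dot(dZ, A_prev.T)               # needs A_prev from the cache
    db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)                           # gradient passed to the previous layer

    return dA_prev, dW, db
```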

Regards,
Elemento

In addition to @Elemento’s explanation, you can take a quick look at this short thread.

@Elemento has captured the main point of “caching” these values in the first place. I should add to that previous (linked) thread by noting that we need to evaluate the derivatives that make up the gradient at each “point” computed during forward propagation: A^{[l]} = g\left(Z^{[l]}\right) where Z^{[l]} = W^{[l]}A^{[l-1]}+b^{[l]}. The linear_cache stores the inputs to the linear step (the A^{[l-1]}'s together with W^{[l]} and b^{[l]}), and the activation_cache stores the Z^{[l]}'s, the inputs to the activation function.
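
As a rough sketch (written to match the equations above rather than the exact assignment code, which also supports ReLU), one layer's forward step and its two caches might look like this:

```python
import numpy as np

def sigmoid(Z):
    """Element-wise sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-Z))

def linear_activation_forward_sketch(A_prev, W, b):
    """One layer's forward step, showing what ends up in each cache."""
    Z = np.dot(W, A_prev) + b          # linear step: Z[l] = W[l] A[l-1] + b[l]
    linear_cache = (A_prev, W, b)      # what the linear backward step will need
    A = sigmoid(Z)                     # activation step: A[l] = g(Z[l])
    activation_cache = Z               # what the sigmoid backward step will need
    return A, (linear_cache, activation_cache)
```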

In elementary calculus (which is not a requirement), you may have derived the derivative function (yup) y=f^\prime\left(x\right) of f\left(x\right) and then evaluated it at a point x=x_0, i.e. f^\prime\left(x_0\right). For example, if y=f\left(x\right)=x^2, then f^\prime\left(x\right)=2x. Let x_0=3, so that y_0=f^\prime\left(3\right)=2\left(3\right)=6. Now you know the slope (i.e. the gradient) of f at that point.

Fundamentally, that is what is going on here. Forward prop computes the outputs of the “function” (the neural network), and those values become the inputs to the gradient computations during backprop. We have to store (“cache”) the results of forward prop so that we can evaluate the derivatives comprising the gradient during backward prop.

The linear_activation_forward function of Exercise 4 returns a single cache, which is itself a tuple of two parts: the first is the linear cache, the second is the activation cache.
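
To make the nesting concrete, here is a small self-contained sketch (shapes and variable names are illustrative, not the graded code) of how that single cache is built and later unpacked:

```python
import numpy as np

# Illustrative shapes only: a 2-unit layer fed by 3 units, over 5 examples.
A_prev = np.random.randn(3, 5)
W = np.random.randn(2, 3)
b = np.zeros((2, 1))

Z = np.dot(W, A_prev) + b
A = 1.0 / (1.0 + np.exp(-Z))                 # sigmoid activation

cache = ((A_prev, W, b), Z)                  # (linear_cache, activation_cache)

# During backprop the single cache is split back into its two parts:
linear_cache, activation_cache = cache
A_prev_c, W_c, b_c = linear_cache            # inputs to the linear step
Z_c = activation_cache                       # input to the activation function
print(A_prev_c.shape, W_c.shape, Z_c.shape)  # (3, 5) (2, 3) (2, 5)
```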

Thanks guys! Really helped me out!!