Hey guys. I'm confused about linear_cache and activation_cache in the assignment. Can anyone tell me what they are and what's inside? Cheers!
Hey @CourseraFan, in general we use a cache to store values computed during forward propagation that we will need again during backward propagation. A very simple example can be found in the assignment itself, and I have included it here for your reference as well.
When doing backprop for the linear layer, computing the gradient of W requires A from the previous layer, as can be seen in the first image above. We get that value from the cache variable, as can be seen in the second image above. Here, linear_cache is the cache variable for the linear layer and activation_cache is the cache variable for the sigmoid (activation) layer. I hope this helps.
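For concreteness, here is a minimal sketch of what the linear part of backprop might look like (not the assignment's exact code; the function name, argument names, and shapes are assumptions following the usual course conventions). The point is simply that A_prev and W are pulled back out of linear_cache to compute the gradients:

```python
import numpy as np

def linear_backward(dZ, linear_cache):
    # linear_cache was stored during forward prop as (A_prev, W, b)
    A_prev, W, b = linear_cache
    m = A_prev.shape[1]  # number of examples

    dW = (1.0 / m) * np.dot(dZ, A_prev.T)               # uses A_prev from the cache
    db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)                           # uses W from the cache

    return dA_prev, dW, db
```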
Regards,
Elemento
In addition to @Elemento’s explanation, you can take a quick look at this short thread.
@Elemento has captured the main point of “caching” these values in the first place. I should add to that previous (linked) thread by noting that we need to evaluate the derivatives that make up the gradients at each “point” computed during forward propagation: A^{[l]} = g \left(Z^{[l]}\right) where Z^{[l]} = W^{[l]}A^{[l-1]}+b^{[l]}. The linear_cache stores the inputs to the linear step, A^{[l-1]}, W^{[l]}, and b^{[l]}, while the activation_cache stores Z^{[l]}.
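To make that concrete, here is a rough sketch of the forward side (assumed names, not necessarily the assignment's exact code): the linear step stashes its inputs in linear_cache, and the activation step stashes Z^{[l]} in activation_cache.

```python
import numpy as np

def linear_forward(A_prev, W, b):
    Z = np.dot(W, A_prev) + b
    linear_cache = (A_prev, W, b)   # saved for the linear part of backprop
    return Z, linear_cache

def sigmoid(Z):
    A = 1.0 / (1.0 + np.exp(-Z))
    activation_cache = Z            # saved so g'(Z) can be evaluated during backprop
    return A, activation_cache
```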
In elementary calculus (which is not a requirement), you may have derived (yup) the derivative function y=f^\prime \left(x\right) of f\left(x\right) and then evaluated it at a point x=x_0, i.e. f^\prime \left(x_0\right). For example, if y=f\left(x\right)=x^2, then f^\prime \left(x\right)=2x. Now let x_0=3, so that y_0 =f^\prime \left(3\right)=2\left(3\right) = 6. You now know the slope (i.e. gradient) of f at that point.
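Just to tie the analogy down numerically, here is a throwaway check (plain Python, nothing to do with the assignment) that the slope of f(x) = x^2 at x_0 = 3 is indeed 6:

```python
def f(x):
    return x ** 2

def f_prime(x):
    return 2 * x                                      # derivative worked out by hand

x0, h = 3.0, 1e-6
numerical_slope = (f(x0 + h) - f(x0 - h)) / (2 * h)   # central-difference estimate
print(f_prime(x0), round(numerical_slope, 4))         # 6.0 6.0
```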
Fundamentally, that is what is going on here. Forward prop computes the outputs of the “function” (the neural network), and those outputs become the inputs to the gradient functions during backprop. We have to store (“cache”) the results of forward prop so that we can evaluate the derivatives comprising the gradient during backward prop.
The linear_activation_forward function of Exercise 4 returns a single cache, which is itself a tuple holding two entries: the first is the linear cache, the second the activation cache.
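Putting the pieces together, and under the same assumptions as the sketches above (reusing the hypothetical linear_forward, sigmoid, and linear_backward helpers), the combined cache might be assembled and later unpacked roughly like this:

```python
import numpy as np

def linear_activation_forward(A_prev, W, b):
    Z, linear_cache = linear_forward(A_prev, W, b)   # linear_cache = (A_prev, W, b)
    A, activation_cache = sigmoid(Z)                 # activation_cache = Z
    cache = (linear_cache, activation_cache)         # one cache, two pieces inside
    return A, cache

def linear_activation_backward(dA, cache):
    linear_cache, activation_cache = cache           # unpack the two pieces
    Z = activation_cache
    s = 1.0 / (1.0 + np.exp(-Z))
    dZ = dA * s * (1.0 - s)                          # sigmoid'(Z) = s * (1 - s)
    return linear_backward(dZ, linear_cache)
```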
Thanks guys! Really helped me out!!