I am confused about the “activation cache” part. The cache contains two tuples, `linear_cache` and `activation_cache`. `linear_cache` is pretty straightforward; it’s basically `(A_prev, W, b)`. But what does `activation_cache` contain?
In Exercise 4 of the first assignment, in which you complete the `linear_activation_forward()` function, the line before the return statement combines two separate caches into a bigger cache: `cache = (linear_cache, activation_cache)`. Formally, `cache` is a tuple with two elements (which themselves can have multiple elements).
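To make that concrete, here is a minimal sketch (with made-up toy shapes) of how the pieces could fit together. In this assignment, `activation_cache` is typically just the pre-activation value `Z`, which is what the `sigmoid()`/`relu()` helpers store:

```python
import numpy as np

# Toy shapes: 3 units in the previous layer, 2 in this layer, batch of 4.
A_prev = np.random.randn(3, 4)
W = np.random.randn(2, 3)
b = np.zeros((2, 1))

Z = W @ A_prev + b              # linear step

linear_cache = (A_prev, W, b)   # what the linear backward step needs
activation_cache = Z            # typically just Z (used by the activation backward step)

cache = (linear_cache, activation_cache)

# Later, e.g. in linear_activation_backward(), you unpack it again:
linear_cache, activation_cache = cache
```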
The values assigned to `activation_cache` depend on whether a `relu` activation or a `sigmoid` activation is used, as set by the `activation` argument of `linear_activation_forward()`. The mathematics behind this is explained in the prelude to Exercise 4. These values are “cached” so that they can be used to evaluate the gradients in the backward propagation step.
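As an illustration of why caching `Z` is useful, here is a sketch of what the backward helpers could look like (the assignment provides `relu_backward()` and `sigmoid_backward()` for you, so treat this as illustrative, not as the exact course code). Both compute `dZ` from `dA` using only the `Z` stored in `activation_cache`:

```python
import numpy as np

def relu_backward(dA, activation_cache):
    # ReLU derivative: 1 where Z > 0, else 0.
    Z = activation_cache
    return dA * (Z > 0)

def sigmoid_backward(dA, activation_cache):
    # Sigmoid derivative: s * (1 - s), recomputed from the cached Z.
    Z = activation_cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)
```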
I hope that this helps! @kenb