# Help with linear cache and activation cache

I am currently doing Week 4 Programming Assignment 1, and I came across the terms "linear cache" and "activation cache", which I am not familiar with. In the lectures, only Z, W, and b are mentioned as cached values, and those are enough to do back propagation correctly.

So, my question is: what are these terms, how are they computed, and where are they used in back propagation? Derivations would be helpful.

We have three terms: cache, linear_cache, and activation_cache.
cache – a Python tuple containing `linear_cache` and `activation_cache`.
linear_cache – a Python tuple containing A^{[l-1]}, W^{[l]} and b^{[l]}.
activation_cache – if I recall correctly, a Python dictionary containing A^{[l]}. Please see the `relu` function in the `dnn_utils.py` file.

Also, check the back propagation function arguments.

The "activation cache" contains Z^{[l]}, not A^{[l]}. The other general point is that the terms "linear cache" and "activation cache" are not industry-standard terminology: they are specific to how this particular notebook has us write this particular code. During forward propagation, we save the values that we are going to need later when we do backward propagation, so that we don't have to compute them twice.
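To make the cache layout concrete, here is a minimal sketch of a forward step. The function and variable names mirror the notebook's conventions, but this is an illustrative reconstruction, not the actual template code:

```python
import numpy as np

def relu(Z):
    """ReLU activation; returns A and caches Z^[l], which backprop needs."""
    A = np.maximum(0, Z)
    activation_cache = Z
    return A, activation_cache

def linear_activation_forward(A_prev, W, b):
    """One forward step for layer l, saving both caches (illustrative sketch)."""
    Z = W @ A_prev + b
    linear_cache = (A_prev, W, b)         # A^[l-1], W^[l], b^[l]
    A, activation_cache = relu(Z)         # activation_cache holds Z^[l]
    cache = (linear_cache, activation_cache)
    return A, cache
```

Note that `cache` is a tuple of the two sub-caches, which is exactly the nesting the backward pass later unpacks.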


All of this was explained in the notebook: you just have to read carefully, including studying all the template code they gave us. They actually did most of the cache-related work for us in the template code: e.g., in the `linear_activation_backward` template code, notice how they did the work of parsing the layer cache entry into the linear and activation cache variables. We just have to pay attention and understand what we are seeing there.
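For reference, the unpacking works along these lines. This is a hedged sketch of how such a backward step could look, assuming a ReLU layer and the cache nesting described above, not a copy of the notebook's template:

```python
import numpy as np

def relu_backward(dA, activation_cache):
    """dZ = dA * g'(Z); for ReLU, g'(Z) is 1 where Z > 0, else 0."""
    Z = activation_cache
    return dA * (Z > 0)

def linear_activation_backward(dA, cache):
    # The layer cache splits back into its two parts:
    linear_cache, activation_cache = cache
    A_prev, W, b = linear_cache          # saved during the forward pass
    dZ = relu_backward(dA, activation_cache)
    m = A_prev.shape[1]                  # number of examples
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db
```

The key line is `linear_cache, activation_cache = cache`: because forward propagation stored a tuple of tuples, backward propagation can recover A^{[l-1]}, W^{[l]}, b^{[l]} and Z^{[l]} without recomputing anything.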
