Why do we use caches ? These appear confusing is it possible to maintain an array of arrays ? for example we can have an array of Z and A arrays. This would also be useful in loops. same thing can be used in backpropogation
The caches are python “lists”, each element of which is a python “tuple” of numpy arrays.
The point of doing this is that there are a number of values that we compute in each iteration of forward propagation that will be required when we do the back propagation step for that iteration later. We could recompute those quantities during back propagation, but that would be more costly than just saving them during forward propagation and then using the cached values. It’s a tradeoff between compute cost (running parts of the algorithms twice in each iteration) versus some added runtime memory to pass the saved values to back prop. Memory is cheaper than cpu generally speaking, but of course it matters in any specific case how much memory and how much cpu is involved. All the computations here involve potentially very large arrays (thousands of elements), so it turns out in this case the memory is worth it to save work for the GPUs.
When starting and working on smaller projects, using an array of Z and A may be easier and more intuitive. However, as you progress to larger networks and more complex architectures, the cache approach will likely save you time and reduce errors. It is designed to help manage complexity and support different types of layers.