While backpropagating, we only stored Z in the cache, but we also take the derivative of the activation function. If the activation functions applied are different for each layer, should we store A as well? Please clarify this.
Hello, yes. I believe you need to store A (the previous layer's activations) and also the weights (W, b) in the cache for backprop; they are useful for computing dW[l] and dA[l-1].
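To make this concrete, here is a minimal sketch of one layer's forward and backward pass, assuming a ReLU activation and a cache layout of (A_prev, W, b, Z); the names are illustrative, not the exact ones from any particular assignment. It shows which cached quantity each gradient actually needs: Z for the activation derivative, A_prev for dW, and W for dA_prev.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def relu_backward(dA, Z):
    # ReLU's derivative only needs Z (1 where Z > 0, else 0)
    return dA * (Z > 0)

def layer_forward(A_prev, W, b):
    # Forward pass for one layer; the cache keeps everything backprop will need
    Z = W @ A_prev + b
    A = relu(Z)
    cache = (A_prev, W, b, Z)   # A_prev -> dW, W -> dA_prev, Z -> g'(Z)
    return A, cache

def layer_backward(dA, cache):
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    dZ = relu_backward(dA, Z)                    # activation derivative uses Z
    dW = (dZ @ A_prev.T) / m                     # uses A_prev from the cache
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ                           # uses W from the cache
    return dA_prev, dW, db

# tiny usage example
A_prev = np.random.randn(3, 5)
W, b = np.random.randn(4, 3), np.zeros((4, 1))
A, cache = layer_forward(A_prev, W, b)
dA_prev, dW, db = layer_backward(np.random.randn(*A.shape), cache)
```

Note that whether you also cache A[l] itself depends on the activation: for ReLU or tanh-style functions the derivative can be computed from Z alone, while for sigmoid it is often cheaper to reuse A, since g'(Z) = A(1 - A).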