Course 1 week 4

While backpropagating, we only stored Z in the cache, but we also take the derivative of the activation function. If the activation functions applied are different for each layer, should we store A as well? Please clarify.

Hello, yes. I believe you need to store A (specifically A[l-1], the previous layer's activations) and also the weights W and bias b in the cache for backprop; they are needed to compute dW[l] and dA[l-1].
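To make this concrete, here is a minimal NumPy sketch of one layer's forward and backward pass with ReLU (an assumed choice of activation; the function names are illustrative, not the exact ones from the course assignment). It shows why the cache needs Z (for the activation derivative), A[l-1] (for dW), and W (for dA[l-1]):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def relu_backward(dA, Z):
    # Derivative of ReLU needs the cached pre-activation Z
    return dA * (Z > 0)

def linear_activation_forward(A_prev, W, b):
    Z = W @ A_prev + b
    A = relu(Z)
    # Cache everything backprop will need later
    cache = (A_prev, W, b, Z)
    return A, cache

def linear_activation_backward(dA, cache):
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    dZ = relu_backward(dA, Z)                    # uses Z
    dW = (dZ @ A_prev.T) / m                     # uses A_prev, i.e. A[l-1]
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ                           # uses W
    return dA_prev, dW, db
```

If each layer used a different activation, you could store the layer's backward function (or its name) in the cache alongside Z, and dispatch to it during backprop; A[l] itself is only needed when the activation derivative is conveniently expressed in terms of A (e.g. sigmoid's a(1-a)) rather than Z.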
