The latter is true in the case of backprop. Let's understand it with the help of a very simple example. Consider the convolution operation. We will examine the conv_forward and conv_backward functions to see how this is the case. In the conv_forward function, we can see that A_prev, W and b are the inputs, and Z is the output (here, I am considering only some of the IO). A_prev denotes the output activations from the previous layer, and Z is used to compute the activations for the current layer, i.e., the input is on the left and the output is on the right (visually).
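To make this concrete, here is a minimal sketch of such a forward pass. It is a toy single-channel "valid" convolution with no stride or padding (those simplifications are my assumptions, not the actual assignment code), but the IO is the same: A_prev, W and b go in, Z comes out.

```python
import numpy as np

def conv_forward(A_prev, W, b):
    # Inputs: A_prev (activations from the previous layer), filter W, bias b.
    # Output: Z, plus a cache of the inputs for the backward pass.
    f_h, f_w = W.shape
    n_H = A_prev.shape[0] - f_h + 1
    n_W = A_prev.shape[1] - f_w + 1
    Z = np.zeros((n_H, n_W))
    for h in range(n_H):
        for w in range(n_W):
            # Each output value is one filter-sized window of the input,
            # weighted by W, plus the bias.
            Z[h, w] = np.sum(A_prev[h:h+f_h, w:w+f_w] * W) + b
    cache = (A_prev, W, b)
    return Z, cache

Z, cache = conv_forward(np.ones((4, 4)), np.ones((3, 3)), 0.0)
```

A 4x4 input convolved with a 3x3 filter yields a 2x2 output: the input sits on the left of the computation, the output on the right.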
On the contrary, in the conv_backward function, dZ is the input and dA_prev, dW and db are the outputs. Here, dZ represents the gradient with respect to the current conv layer's output, and dA_prev represents the gradient with respect to the activations of the previous layer, i.e., the input is on the right and the output is on the left (visually).
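The backward direction can be sketched the same way. This is a simplified single-channel version (my assumption, not the assignment's actual implementation), where the cache is taken to hold (A_prev, W, b) from the forward pass; note how the IO is mirrored: dZ goes in, and dA_prev, dW and db come out.

```python
import numpy as np

def conv_backward(dZ, cache):
    # Input: dZ (gradient w.r.t. this layer's output).
    # Outputs: dA_prev, dW, db.
    A_prev, W, b = cache
    f_h, f_w = W.shape
    dA_prev = np.zeros_like(A_prev)  # gradient w.r.t. previous layer's activations
    dW = np.zeros_like(W)
    db = np.sum(dZ)
    for h in range(dZ.shape[0]):
        for w in range(dZ.shape[1]):
            # Each output gradient dZ[h, w] flows back to the input window
            # it was computed from, and into the filter weights.
            dA_prev[h:h+f_h, w:w+f_w] += W * dZ[h, w]
            dW += A_prev[h:h+f_h, w:w+f_w] * dZ[h, w]
    return dA_prev, dW, db

cache = (np.ones((4, 4)), np.ones((3, 3)), 0.0)
dA_prev, dW, db = conv_backward(np.ones((2, 2)), cache)
```

Gradients flow from right to left: we start from dZ (the output side) and accumulate back into the input side.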
Now, since you have managed to confuse yourself, allow me to confuse you a little bit more, and if possible, dissolve that confusion as well.
In the docstring of the conv_backward function, it is mentioned that dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev). Now, here, you might get confused about what exactly input is referring to: the right or the left. Remember that in this docstring, input refers to A_prev, i.e., the activations and not any gradients; hence, it should naturally bring forward propagation to mind. This is just a reference to forward propagation made inside the backward propagation, so make sure it doesn't confuse you whatsoever. I hope this helps.
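To make "gradient of the cost with respect to the input of the conv layer" concrete, here is a small numerical check. It uses a toy single-channel convolution and the cost sum(Z) (both of these are my assumptions for illustration, not part of the assignment): the analytic dA_prev should agree with a finite-difference derivative of the cost with respect to an entry of A_prev.

```python
import numpy as np

def conv_forward(A_prev, W, b):
    # Toy single-channel "valid" convolution, no stride or padding.
    f_h, f_w = W.shape
    n_H, n_W = A_prev.shape[0] - f_h + 1, A_prev.shape[1] - f_w + 1
    Z = np.zeros((n_H, n_W))
    for h in range(n_H):
        for w in range(n_W):
            Z[h, w] = np.sum(A_prev[h:h+f_h, w:w+f_w] * W) + b
    return Z

def cost_fn(A_prev, W, b):
    # A deliberately simple cost: the sum of all output values,
    # so the upstream gradient dZ is a matrix of ones.
    return np.sum(conv_forward(A_prev, W, b))

rng = np.random.default_rng(0)
A_prev, W, b = rng.normal(size=(4, 4)), rng.normal(size=(3, 3)), 0.1

# Analytic dA_prev for this cost (dZ is all ones).
dZ = np.ones((2, 2))
dA_prev = np.zeros_like(A_prev)
for h in range(2):
    for w in range(2):
        dA_prev[h:h+3, w:w+3] += W * dZ[h, w]

# Numeric check: nudge one entry of A_prev and measure the cost change.
eps = 1e-6
i, j = 1, 2
A_plus = A_prev.copy(); A_plus[i, j] += eps
A_minus = A_prev.copy(); A_minus[i, j] -= eps
numeric = (cost_fn(A_plus, W, b) - cost_fn(A_minus, W, b)) / (2 * eps)
```

The two values match, which is exactly what the docstring claims: dA_prev really is the gradient of the cost with respect to A_prev, the forward-pass input.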