Hi,
Thanks for taking the time to read my post. I have a question about how the gradient is updated during backpropagation in Week 1, Assignment 2. In 'utils.py', specifically in the function 'rnn_backward(X, Y, parameters, cache)', why do we subtract 1 when updating 'dy' during backpropagation? If the activation function is softmax, shouldn't it be dy = y*(1-y)?
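To make it concrete, here is roughly the step I mean (a paraphrased sketch, not the actual 'utils.py' code, so the function and variable names here are my own):

```python
import numpy as np

def output_gradient_sketch(y_hat, true_index):
    # y_hat: softmax output at one time step, shape (vocab_size, 1)
    # true_index: index of the correct character at that time step
    dy = np.copy(y_hat)
    dy[true_index] -= 1  # this is the "subtract 1" step I don't understand
    return dy

# Example: 4-way softmax output where the correct class is index 2
y_hat = np.array([[0.1], [0.2], [0.6], [0.1]])
print(output_gradient_sketch(y_hat, 2))
```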
Thank you very much in advance for any help.