Why do we subtract 1 in rnn_backward provided in the C5W1 assignment 2?


# Backpropagate through time
for t in reversed(range(len(X))):
    # Gradient of cross-entropy loss w.r.t. the softmax logits is y_hat - y.
    # Y[t] is the index of the true character, so subtracting 1 at that index
    # implements y_hat - y with y as a one-hot vector.
    dy = np.copy(y_hat[t])
    dy[Y[t]] -= 1
    gradients = rnn_step_backward(dy, gradients, parameters, x[t], a[t], a[t-1])

Okay, I figured this out. It is the derivative of the softmax combined with the cross-entropy loss: the gradient with respect to the logits is (y_hat - y), and since y is one-hot, `dy[Y[t]] -= 1` subtracts the 1 at the true class index. There was no comment in the code, so it was difficult to guess this immediately.
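To convince yourself of the (y_hat - y) result, you can compare the `dy[label] -= 1` gradient against a numerical gradient of the cross-entropy loss. This is a standalone sketch, not the assignment code; `softmax`, `z`, and `label` are made up for the check:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax for a 1-D array of logits
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])  # example logits (assumed, not from the assignment)
label = 1                      # index of the true class

y_hat = softmax(z)

# Analytic gradient of cross-entropy w.r.t. the logits: y_hat - one_hot(label),
# i.e. copy y_hat and subtract 1 at the true index -- same as dy[Y[t]] -= 1
analytic = np.copy(y_hat)
analytic[label] -= 1

# Numerical gradient of loss = -log(softmax(z)[label]) via central differences
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (-np.log(softmax(zp)[label]) + np.log(softmax(zm)[label])) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

The two gradients agree, which is why the backward pass never computes the softmax Jacobian explicitly: combining softmax with cross-entropy collapses it to a single subtraction.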