I’m doing the optional backpropagation part of the RNN programming assignment 1 and I spent an embarrassing amount of time trying to figure out why my
`def rnn_backward(da, caches):`
function was returning the wrong answer. There is a longer 2021 thread on this in Week 1 Assignment 1 Backpropagation, but like others in that thread, I overlooked adding the cost derivatives from the output/fully connected layer at each time step in my implementation.
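For anyone else stuck on this, here is a minimal sketch of the loop, assuming the notebook's `rnn_cell_backward(da_next, cache)` helper is in scope and returns the usual gradient keys (names may differ slightly in your notebook version). The key point is that the gradient passed into each cell is `da[:, :, t] + da_prevt`, not `da[:, :, t]` alone:

```python
import numpy as np

def rnn_backward(da, caches):
    # Sketch only; assumes rnn_cell_backward(da_next, cache) from the
    # notebook is defined, returning a dict with keys
    # "dxt", "da_prev", "dWax", "dWaa", "dba".
    (caches_list, x) = caches
    (a1, a0, x1, parameters) = caches_list[0]

    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # Initialize accumulated gradients with zeros
    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros((n_a, n_x))
    dWaa = np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da_prevt = np.zeros((n_a, m))

    for t in reversed(range(T_x)):
        # The gradient entering the cell at step t is the SUM of the
        # gradient from the output layer at t (da[:, :, t]) and the
        # gradient flowing back from step t+1 (da_prevt). Omitting the
        # da[:, :, t] term is the mistake described above.
        gradients = rnn_cell_backward(da[:, :, t] + da_prevt, caches_list[t])
        dx[:, :, t] = gradients["dxt"]
        da_prevt = gradients["da_prev"]
        dWax += gradients["dWax"]
        dWaa += gradients["dWaa"]
        dba += gradients["dba"]

    da0 = da_prevt
    return {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}
```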
I realize the backpropagation section is optional/advanced, and I know the addition is pointed out in the note for Figure 7. But given how many people have been confused by it, it might be worth both adding a comment about it to the skeleton code of rnn_backward and going into a bit more depth on this aspect of the backpropagation in the assignment itself.