I’m unsure exactly what to do on the optional assignment for LSTM backpropagation, `lstm_backward`.
The `da` tensor was given (randomly generated), but `dc` was never given.
I tried initializing `dc_prev` to zero and setting the first `da_prev = da[:, :, T_x]`, then passing them into `lstm_cell_backward`, but that wasn’t correct.
I know I am iterating through the caches correctly, in reverse from `T_x - 1` down to `0`, and I have verified that my `lstm_cell_backward` is coded correctly, so what am I missing?
Reverse-order iteration means you start from the last time-step index.
As far as `dc` is concerned, there is no incoming value at the end when you start the backward propagation. So consider the initial values of `da_prevt` and `dc_prevt` to be the additive identity. Hint: if a + x = x, what is a?
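In code terms, the hint above amounts to starting both running gradients at zero before the reverse loop. A minimal sketch, assuming the assignment’s usual shape conventions (`n_a` hidden units, `m` examples, `T_x` time steps; the example dimensions here are arbitrary):

```python
import numpy as np

n_a, m, T_x = 5, 10, 4          # example dimensions, not the assignment's
da = np.random.randn(n_a, m, T_x)

# The additive identity: a + 0 = a, so both running gradients start at zero.
da_prevt = np.zeros((n_a, m))
dc_prevt = np.zeros((n_a, m))
```

Starting at zero means the first backward step receives `da[:, :, T_x - 1] + 0`, so no special-casing of the last time step is needed.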
I’m slightly confused: if `da_prevt` and `dc_prevt` are initialized to zero and treated initially as the additive identity, then wouldn’t setting `da_prevt` to the initial value of `da`, in this case `da_prev = da[:, :, T_x - 1]`, do the same thing?
I also tried passing `da[:, :, t] + da_prevt`, with `da_prevt = np.zeros((n_a, m))` initially, into the `lstm_cell_backward` function, but that wasn’t correct either.
Did you use the return value, i.e. `gradients`, to update the values of `da_prevt` and `dc_prevt`?
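A sketch of the loop pattern the question above is pointing at, with a hypothetical stub standing in for the assignment’s `lstm_cell_backward` (only the dictionary keys the loop relies on are modeled):

```python
import numpy as np

def lstm_cell_backward_stub(da_next, dc_next, cache):
    # Hypothetical stand-in for the assignment's lstm_cell_backward:
    # it only returns the gradient keys the outer loop needs.
    return {"da_prev": 0.5 * da_next, "dc_prev": 0.5 * dc_next}

n_a, m, T_x = 5, 10, 4
da = np.random.randn(n_a, m, T_x)
caches = [None] * T_x           # placeholders for the per-step caches

da_prevt = np.zeros((n_a, m))
dc_prevt = np.zeros((n_a, m))
for t in reversed(range(T_x)):
    gradients = lstm_cell_backward_stub(da[:, :, t] + da_prevt,
                                        dc_prevt, caches[t])
    # Crucial step: carry the returned gradients into the next iteration.
    da_prevt = gradients["da_prev"]
    dc_prevt = gradients["dc_prev"]
```

Without the last two assignments, every step would see `dc_prevt = 0` and only the current slice of `da`, which matches the symptom described above.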
Yes, I updated `da_prevt` and `dc_prevt` via `=`, not `+=`; the gradients were retrieved via their dictionary keys.
Please click my name and message your notebook as an attachment.
Please fix `lstm_cell_backward`. The calculation of `dc_prev` has a bug.
Thank you, that was the issue. It was hard to spot; I wasn’t expecting it to be the forget gate factor that I was missing, since the unit test given for `lstm_cell_backward` didn’t show a huge change in the value, so I thought it might just be a floating-point rounding error.
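For readers hitting the same bug: both terms of the standard `dc_prev` expression carry the forget gate, which is easy to drop from one of them. A sketch using the assignment’s usual per-cell symbols (`ft` and `ot` are the forget and output gate activations; all arrays here are randomly generated stand-ins):

```python
import numpy as np

n_a, m = 5, 10
rng = np.random.default_rng(0)
ft = 1 / (1 + np.exp(-rng.standard_normal((n_a, m))))  # forget gate (sigmoid)
ot = 1 / (1 + np.exp(-rng.standard_normal((n_a, m))))  # output gate (sigmoid)
c_next = rng.standard_normal((n_a, m))
da_next = rng.standard_normal((n_a, m))
dc_next = rng.standard_normal((n_a, m))

# Both terms carry ft; omitting it from either is the bug discussed above.
dc_prev = dc_next * ft + ot * (1 - np.tanh(c_next) ** 2) * ft * da_next
```

Since `ft` is a sigmoid output near 0.5 for small pre-activations, dropping it roughly halves one term rather than zeroing it, which is why the unit-test discrepancy looked small enough to pass for rounding error.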