For this function:

```python
# UNGRADED FUNCTION: lstm_backward
def lstm_backward(da, caches):
    ...
```
specifically this part of the instructions: "Compute all gradients using lstm_cell_backward. Choose wisely the `da_next` (same as done for Ex 6)."
When I call it like this:

`lstm_cell_backward(da[:, :, t], dc_prevt, caches[t])`

the values come out wrong. And when I call it like this:

`lstm_cell_backward(da[:, :, t] + da_prevt, dc_prevt, caches[t])`

I get an error:

`ValueError: operands could not be broadcast together with shapes (5,10) (8,10)`
By printing, I noticed that before the call the shape of da_prevt is (5, 10), but after the call it becomes (8, 10). The previous function, lstm_cell_backward, works fine and all its values print correctly. Please help.
Please click my name and message your notebook as an attachment.
The computation of da_prevt is incorrect inside lstm_cell_backward.

Expected: `gradients["da_prev"].shape = (5, 10)`
Actual: `gradients["da_prev"].shape = (8, 10)`
Here's a hint from the markdown for the exercise: "where the weights for equation 21 are from n_a to the end (i.e. W_f = W_f[:, n_a:] etc.)"

Another hint: consider only up to :n_a in the 2nd dimension (of the weight matrices) when computing da_prevt.
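To see why the slicing matters, here is a minimal NumPy sketch. The shapes n_a = 5, n_x = 3, m = 10 are hypothetical, chosen to match the (5,10) vs (8,10) shapes in the error above; W_f and dft are stand-ins for the forget-gate weights and the forget-gate gradient:

```python
import numpy as np

# Hypothetical shapes matching the thread: n_a = 5, n_x = 3, m = 10
n_a, n_x, m = 5, 3, 10

# Each gate weight matrix in the LSTM cell has shape (n_a, n_a + n_x)
W_f = np.random.randn(n_a, n_a + n_x)
dft = np.random.randn(n_a, m)  # stand-in gradient for the forget gate

# Using the full weight matrix gives the wrong shape:
da_prev_wrong = W_f.T @ dft
print(da_prev_wrong.shape)  # (8, 10) -> cannot be added to da[:, :, t] of shape (5, 10)

# Keeping only the first n_a columns (the hidden-state part) gives the right shape:
da_prev_right = W_f[:, :n_a].T @ dft
print(da_prev_right.shape)  # (5, 10)
```

The (8,10)-vs-(5,10) broadcast error in the original post is exactly the first case: an un-sliced weight matrix makes da_prev pick up the n_x input rows as well.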
Ah, that worked! Thanks a lot @balaji.ambresh for the valuable suggestion.

One correction, though (I mention it for future classmates who might get stuck): the problem lay not with lstm_cell_backward (I had done that correctly), but with lstm_backward.
Here is a summary of what went wrong:

- In the for loop, the first argument I supplied to lstm_cell_backward was `da[:, :, t]` instead of `da[:, :, t] + da_prevt`, so that was one point of fault. This can be understood from Exercise 6.
- As you pointed out, I had indexed da_prevt wrongly. One point I'd like to clarify, though: this variable lives in the lstm_backward function, not in lstm_cell_backward, which caused me a bit of confusion. In the for loop I had coded it as `gradients['da_prev']` when it should be `gradients['da_prev'][:n_a]`; the latter is the correct version.
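To pull the two fixes together, here is a rough sketch of the loop wiring, using the notebook's variable names but with a stub lstm_cell_backward that only reproduces the expected shapes (so the da_prevt / dc_prevt bookkeeping can be checked in isolation, not the actual gradient math):

```python
import numpy as np

# Hypothetical dimensions for the shape check
n_a, m, T_x = 5, 10, 4

def lstm_cell_backward(da_next, dc_next, cache):
    # Stub: a correct implementation returns 'da_prev' of shape (n_a, m),
    # never (n_a + n_x, m)
    return {'da_prev': np.zeros((n_a, m)), 'dc_prev': np.zeros((n_a, m))}

da = np.random.randn(n_a, m, T_x)
caches = [None] * T_x  # placeholder caches for the stub

da_prevt = np.zeros((n_a, m))
dc_prevt = np.zeros((n_a, m))
for t in reversed(range(T_x)):
    # Fix 1: add the gradient flowing back from the next time step
    gradients = lstm_cell_backward(da[:, :, t] + da_prevt, dc_prevt, caches[t])
    # Fix 2: da_prevt must keep shape (n_a, m) for the addition above to work
    da_prevt = gradients['da_prev']
    dc_prevt = gradients['dc_prev']

print(da_prevt.shape)  # (5, 10)
```

If da_prevt ever comes back as (n_a + n_x, m), the `da[:, :, t] + da_prevt` addition on the next iteration raises exactly the broadcast error from the original post.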
Thanks