For this function:

```python
# UNGRADED FUNCTION: lstm_backward
def lstm_backward(da, caches):
    ...
```
specifically this part of the instructions: "Compute all gradients using lstm_cell_backward. Choose wisely the `da_next` (same as done for Ex 6)."
When I call it like this:

`lstm_cell_backward(da[:, :, t], dc_prevt, caches[t])`

the values come out wrong. And when I call it like this:

`lstm_cell_backward(da[:, :, t] + da_prevt, dc_prevt, caches[t])`

I get an error:

`ValueError: operands could not be broadcast together with shapes (5,10) (8,10)`
By printing, I noticed that before the call the shape of da_prevt is (5, 10), but after the call it becomes (8, 10). The previous function, lstm_cell_backward, works fine and all its values print correctly. Please help.
Please click my name and message your notebook as an attachment.
The computation of da_prevt is incorrect inside lstm_cell_backward.

Expected: `gradients["da_prev"].shape = (5, 10)`
Actual: `gradients["da_prev"].shape = (8, 10)`
Here's a hint from the markdown for the exercise: "where the weights for equation 21 are from n_a to the end (i.e. W_f = W_f[:, n_a:] etc.)"

Another hint: consider only up to :n_a in the 2nd dimension (of the weight matrices) when computing da_prevt.
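To see why the slicing matters, here is a minimal NumPy sketch. The shapes n_a = 5, n_x = 3, m = 10 are hypothetical, chosen to match the (5,10) vs (8,10) shapes in the error above; W_f and dft are stand-ins for the forget-gate weights and the forget-gate gradient:

```python
import numpy as np

# Hypothetical shapes matching the thread: n_a = 5, n_x = 3, m = 10
n_a, n_x, m = 5, 3, 10

# Each gate weight matrix in the LSTM cell has shape (n_a, n_a + n_x)
W_f = np.random.randn(n_a, n_a + n_x)
dft = np.random.randn(n_a, m)  # stand-in gradient for the forget gate

# Using the full weight matrix gives the wrong shape:
da_prev_wrong = W_f.T @ dft
print(da_prev_wrong.shape)  # (8, 10) -> cannot be added to da[:, :, t] of shape (5, 10)

# Keeping only the first n_a columns (the hidden-state part) gives the right shape:
da_prev_right = W_f[:, :n_a].T @ dft
print(da_prev_right.shape)  # (5, 10)
```

The (8,10)-vs-(5,10) broadcast error in the original post is exactly the first case: an un-sliced weight matrix makes da_prev pick up the n_x input rows as well.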
Ah, that worked! Thanks a lot @balaji.ambresh for the valuable suggestion.

One correction, though (I mention it for future classmates who might get stuck): the problem lay not with lstm_cell_backward (I had done that correctly), but with lstm_backward.
Here is a summary of what went wrong:

- In the for loop, the first argument I supplied to lstm_cell_backward was `da[:, :, t]` instead of `da[:, :, t] + da_prevt`, so that was one point of fault. This can be understood from Exercise 6.
- As you pointed out, I had indexed da_prevt wrongly. One point I'd like to clarify, though: this variable lives in the lstm_backward function, not in lstm_cell_backward, which caused me a bit of confusion. In the for loop I had coded it as `gradients['da_prev']` when it should be `gradients['da_prev'][:n_a]`; the latter is the correct version.
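To pull the two fixes together, here is a rough sketch of the loop wiring, using the notebook's variable names but with a stub lstm_cell_backward that only reproduces the expected shapes (so the da_prevt / dc_prevt bookkeeping can be checked in isolation, not the actual gradient math):

```python
import numpy as np

# Hypothetical dimensions for the shape check
n_a, m, T_x = 5, 10, 4

def lstm_cell_backward(da_next, dc_next, cache):
    # Stub: a correct implementation returns 'da_prev' of shape (n_a, m),
    # never (n_a + n_x, m)
    return {'da_prev': np.zeros((n_a, m)), 'dc_prev': np.zeros((n_a, m))}

da = np.random.randn(n_a, m, T_x)
caches = [None] * T_x  # placeholder caches for the stub

da_prevt = np.zeros((n_a, m))
dc_prevt = np.zeros((n_a, m))
for t in reversed(range(T_x)):
    # Fix 1: add the gradient flowing back from the next time step
    gradients = lstm_cell_backward(da[:, :, t] + da_prevt, dc_prevt, caches[t])
    # Fix 2: da_prevt must keep shape (n_a, m) for the addition above to work
    da_prevt = gradients['da_prev']
    dc_prevt = gradients['dc_prev']

print(da_prevt.shape)  # (5, 10)
```

If da_prevt ever comes back as (n_a + n_x, m), the `da[:, :, t] + da_prevt` addition on the next iteration raises exactly the broadcast error from the original post.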
Thanks