D5W1 A1 Assignment Exercise 6 rnn_backward need help

In the D5W1 A1 assignment "Building_a_Recurrent_Neural_Network_Step_by_Step", Exercise 6 (rnn_backward), I ran into a strange problem and need help understanding it.

When I run rnn_backward, I get this error:


ValueError                                Traceback (most recent call last)
in
     11 a_tmp, y_tmp, caches_tmp = rnn_forward(x_tmp, a0_tmp, parameters_tmp)
     12 da_tmp = np.random.randn(5, 10, 4)
---> 13 gradients_tmp = rnn_backward(da_tmp, caches_tmp)
     14
     15 print("gradients[\"dx\"][1][2] =", gradients_tmp["dx"][1][2])

in rnn_backward(da, caches)
     39     for t in reversed(range(T_x)):
     40         # Compute gradients at time step t. Choose wisely the "da_next" and the "cache" to use in the backward propagation step. (≈1 line)
---> 41         gradients = rnn_cell_backward(da_prevt, caches[t])
     42         # Retrieve derivatives from gradients (≈ 1 line)
     43         dxt, da_prevt, dWaxt, dWaat, dbat = gradients["dxt"], gradients["da_prev"], gradients["dWax"], gradients["dWaa"], gradients["dba"]

in rnn_cell_backward(da_next, cache)
     30     ### START CODE HERE ###
     31     # compute the gradient of dtanh term using a_next and da_next (≈1 line)
---> 32     dtanh = da_next * (1 - np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)**2)
     33
     34     # compute the gradient of the loss with respect to Wax (≈2 lines)

ValueError: operands could not be broadcast together with shapes (10,4) (5,10)

However, a step earlier, when I worked on rnn_cell_backward, the function passed its tests perfectly:

gradients["dxt"][1][2] = -1.3872130506020925
gradients["dxt"].shape = (3, 10)
gradients["da_prev"][2][3] = -0.15239949377395495
gradients["da_prev"].shape = (5, 10)
gradients["dWax"][3][1] = 0.4107728249354584
gradients["dWax"].shape = (5, 3)
gradients["dWaa"][1][2] = 1.1503450668497135
gradients["dWaa"].shape = (5, 5)
gradients["dba"][4] = [0.20023491]
gradients["dba"].shape = (5, 1)

So I am not sure why the aggregation stage in rnn_backward ends up with mismatched shapes, since a shape error there is usually caused by the underlying operation in rnn_cell_backward, which works fine on its own. I need help understanding what I may have done wrong here.

My lab ID is pahlbtkn

There are potentially two errors in your code.
The first one is shape related. It looks like you passed a parameter with shape (10, 4). From the traceback, it must be da_prevt, so I suspect you initialized da_prevt with the wrong shape; it should be (n_a, m), i.e. (5, 10) in this test.

In addition, passing da_prevt alone to rnn_cell_backward() is not enough. The notebook's comment hints at this:

Choose wisely the "da_next" and the "cache"…

You have the upstream gradients da passed to rnn_backward as a parameter, so you can easily get the gradient of the loss at time step t, which should be added to da_prevt before calling rnn_cell_backward().

Please revisit the above two points; a rough sketch of the loop follows.
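
Here is a minimal sketch of what the loop could look like, assuming the notebook's variable names (da, caches, T_x, n_a, m, n_x) and that your rnn_cell_backward() is already defined; the helper name rnn_backward_sketch and the comments are illustrative, not the official solution.

import numpy as np

def rnn_backward_sketch(da, caches):
    # Hypothetical sketch; assumes caches = (list_of_caches, x) as returned by rnn_forward()
    (caches_list, x) = caches
    (a1, a0, x1, parameters) = caches_list[0]   # first cache, used only to read shapes
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # initialize every gradient with zeros of the right shape
    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros((n_a, n_x))
    dWaa = np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da_prevt = np.zeros((n_a, m))               # (n_a, m) = (5, 10) here, not (10, 4)

    for t in reversed(range(T_x)):
        # combine the gradient of the loss at step t with the gradient flowing
        # back from step t+1, then run one cell backward step
        gradients = rnn_cell_backward(da[:, :, t] + da_prevt, caches_list[t])
        dxt, da_prevt = gradients["dxt"], gradients["da_prev"]
        dWaxt, dWaat, dbat = gradients["dWax"], gradients["dWaa"], gradients["dba"]
        # store dx for step t and accumulate the parameter gradients over time
        dx[:, :, t] = dxt
        dWax += dWaxt
        dWaa += dWaat
        dba += dbat

    da0 = da_prevt   # gradient with respect to the initial hidden state
    return {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}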

In your rnn_cell_backward() function, please review this snippet:

     31 # compute the gradient of dtanh term using a_next and da_next (≈1 line)
---> 32 dtanh = da_next * (1 - np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)**2)

Remember that a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba) was already computed in the forward pass and is stored in the cache.
I think you should write this operation as: dtanh = da_next * (1 - a_next ** 2)
So be careful with this part, because otherwise you are computing the tanh activation function a second time!
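
For reference, here is a rough sketch of how rnn_cell_backward() could reuse a_next from the cache, assuming the cache layout (a_next, a_prev, xt, parameters) produced by the notebook's rnn_cell_forward(); the function name and comments are illustrative, not the exact graded solution.

import numpy as np

def rnn_cell_backward_sketch(da_next, cache):
    # Hypothetical sketch; assumes cache = (a_next, a_prev, xt, parameters)
    (a_next, a_prev, xt, parameters) = cache
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]

    # reuse a_next instead of recomputing the activation:
    # d/dz tanh(z) = 1 - tanh(z)**2 = 1 - a_next**2
    dtanh = da_next * (1 - a_next ** 2)

    # gradients of the loss with respect to the cell inputs and parameters
    dxt = np.dot(Wax.T, dtanh)                   # shape (n_x, m)
    dWax = np.dot(dtanh, xt.T)                   # shape (n_a, n_x)
    da_prev = np.dot(Waa.T, dtanh)               # shape (n_a, m)
    dWaa = np.dot(dtanh, a_prev.T)               # shape (n_a, n_a)
    dba = np.sum(dtanh, axis=1, keepdims=True)   # shape (n_a, 1)

    return {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}

Note how these shapes line up with the expected output you printed above: dxt is (n_x, m) = (3, 10), dWax is (n_a, n_x) = (5, 3), and so on.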