Hello, I'm trying to complete def rnn_backward(da, caches) in Exercise 6 - rnn_backward.
The results in Exercise 5 are correct with the test data, but in Exercise 6 they are not. The shapes are OK, but the values are not. I checked the posts here and saw I'm not the only one with this issue, but I can't figure out the problem.
I don't want to paste the code here.
Please fix your call to rnn_cell_backward to include the gradient at time t and the incoming gradient. In your implementation, only the incoming gradient is used.
Another hint: Look at da.
Hello everyone, first of all thanks to those of you who have posted here about this issue. It helped me fix some bugs in my rnn_cell_backward() function.
Now I have the same problem as Justin. In fact, my code produces the same gradients as his. I have already checked that the inputs to rnn_cell_backward(), da and cache, are sliced to time step t only, but the gradient values did not change. Also, rnn_cell_backward() passed all the previous tests, so I am quite lost as to how to solve this problem…
I would be very grateful for any ideas or comments!
It is stated in the notebook: "Choose wisely the 'da_next' and the 'cache' to use in the backward propagation step."
I guess the problem is with how you are choosing "da_next". It is not only da (the slice at t). You also have to add the gradient with respect to the hidden state that is passed back from the later time steps (da_prevt).
Please read the below text from the notebook:
Note that this notebook does not implement the backward path from the Loss ‘J’ backwards to ‘a’.
This would have included the dense layer and softmax which are a part of the forward path.
This is assumed to be calculated elsewhere and the result passed to rnn_backward in ‘da’.
You must combine this with the loss from the previous stages when calling rnn_cell_backward (see figure 7 above).
In other words, you have to add da_prevt to da (the slice at t). A rough sketch of how that fits into the loop is below.
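This is not a definitive solution, just a minimal sketch of where that addition goes, assuming the rnn_cell_backward() you wrote in Exercise 5 returns a gradients dictionary with keys like "dxt", "da_prev", "dWax", "dWaa", "dba" (check your own notebook for the exact names and cache layout):

```python
import numpy as np

def rnn_backward(da, caches):
    # Sketch only: shapes and dictionary keys are assumptions based on the notebook.
    (caches_list, x) = caches
    (a1, a0, x1, parameters) = caches_list[0]
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # Initialize the accumulated gradients with zeros of the right shapes.
    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros((n_a, n_x))
    dWaa = np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da_prevt = np.zeros((n_a, m))   # nothing has flowed back yet at the last time step

    # Walk backwards in time.
    for t in reversed(range(T_x)):
        # The key point of this thread: combine the "external" gradient for step t
        # with the gradient flowing back from the later time steps.
        gradients = rnn_cell_backward(da[:, :, t] + da_prevt, caches_list[t])
        da_prevt = gradients["da_prev"]
        dx[:, :, t] = gradients["dxt"]
        dWax += gradients["dWax"]   # parameter gradients are summed over all time steps
        dWaa += gradients["dWaa"]
        dba += gradients["dba"]

    da0 = da_prevt  # what finally flows back to a<0>
    return {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}
```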
That is what is happening in that “+” sign in the green oval that I added at the right hand side of the diagram. It is the da for the current timestep plus the cumulative sum of all the da values from the later timesteps, which is da_{prev} from the point of view of those later timesteps. You can see the current “step” feeding the next da_{prev} off the left side of the diagram to the previous timestep. Of course this is “back prop”, so we are going backwards and in an RNN it’s “backwards in time”, right? Because there is just one “layer” but we repeat it over and over and feed the results forward.
In addition to the explanation in the text that Saif highlighted there, it’s also visible in the diagram in my previous post here. Remember that there are two outputs from each timestep in an RNN: the feed forward of the hidden state to the next timestep and the output branch that generates the \hat{y}^{<t>} for that timestep. As the text in Saif’s post says, we don’t actually do the work to compute the da from the \hat{y} branch in this assignment: it’s just given to us as an input. Our work is to compute the other branch of the gradients and we just add the da they give us.
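To summarize the two posts above in one line, using the notebook's notation (just a restatement, not new material):

da_next^{<t>} = da^{<t>} + da_{prev}^{<t>}

where da^{<t>} is the slice of the given da (the gradient from the \hat{y}^{<t>} branch, computed elsewhere and handed to us), and da_{prev}^{<t>} is what the backward step at time t+1 passed back.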