Backpropagation in LSTM

Please clarify these points for me:

  1. So the matrix da consists of the gradients computed from the loss through the output layer (e.g., a softmax) with respect to a^{\langle t \rangle} at each time step, right?

  2. I understand that we can backpropagate from the loss through the softmax to a_next, which is then added to da_next. But at the final time step of the model there is no c_next, so how is dc_next calculated? Is it initialized with zeros, or do we take the connection between a_next and c_next?

  3. In the last block, where I believe I entered correct code, the result does not match the expected output.

{mentor edit: code images removed}

Please do not post your code on the forum. That is not allowed by the Code of Conduct.

If a mentor needs to see your code, we’ll contact you with instructions.

Sorry, my bad. I did not check that particular image.

To help me assist you, please click on my name to start a private message. Then, attach your notebook as a .ipynb file. Please note that mentors cannot access your Coursera Jupyter workspace, so sending the notebook in .ipynb format is essential.

Hi, I have sent the file over to you privately. If possible, can you help clarify the other questions in the post?

Please read this section of the markdown for lstm_forward and fix c_next (a short sketch of the difference follows the list):

  • Initialize c^{\langle t \rangle} with zeros.
    - The variable name is c_next
    - c^{\langle t \rangle} represents a single time step, so its shape is (n_{a}, m)
    - Note: create c_next as its own variable with its own location in memory. Do not initialize it as a slice of the 3D tensor c. In other words, don’t do c_next = c[:,:,0]
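
To illustrate the aliasing problem, here is a minimal sketch, assuming the assignment's conventions of n_a hidden units, m examples, T_x time steps, and a 3D cell-state tensor c of shape (n_a, m, T_x); the concrete dimensions below are made up:

```python
import numpy as np

# Assumed toy dimensions: n_a hidden units, m examples, T_x time steps.
n_a, m, T_x = 5, 10, 7
c = np.zeros((n_a, m, T_x))

# Wrong: this makes c_next a view into c, so later writes to c inside
# the loop (e.g., c[:, :, t] = c_next) can alias with c_next.
# c_next = c[:, :, 0]

# Right: c_next gets its own memory, independent of the 3D tensor c.
c_next = np.zeros((n_a, m))
```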

Hi, I was referring to the last exercise, Exercise 8. Even though I entered correct code, a few gradients are not matching.


And can you please clarify my other queries in this post? It would be helpful.

Did you make the fix I recommended and run the rest of the cells?

Sorry for the delay.

  1. Correct. The gradients flow from the output layer in the reverse direction.
  2. Just as da0 was set to the final value of da_prev_t (since the gradients accumulate over the time steps), dc0 should follow the same path; see the sketch below.
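
To make the "same path" idea concrete, here is a minimal sketch of the backward-loop structure only, not the assignment's solution. It assumes a per-step helper lstm_cell_backward returning a dict with "da_prev" and "dc_prev" keys, a list caches of per-step forward caches, and da of shape (n_a, m, T_x); these names are placeholders, and the stub below exists only so the sketch runs. The key point: nothing flows in from beyond the last time step, so the cell-state gradient starts as zeros, exactly like da_prev_t.

```python
import numpy as np

def lstm_cell_backward(da_next, dc_next, cache):
    # Placeholder stub so the sketch runs end to end; the real function
    # computes the per-step LSTM gradients from the forward-pass cache.
    n_a, m = da_next.shape
    return {"da_prev": np.zeros((n_a, m)), "dc_prev": np.zeros((n_a, m))}

def lstm_backward_sketch(da, caches):
    n_a, m, T_x = da.shape

    # No gradient flows in from beyond the last time step, so both the
    # hidden-state and the cell-state gradients start as zeros there.
    da_prev_t = np.zeros((n_a, m))
    dc_prev_t = np.zeros((n_a, m))

    for t in reversed(range(T_x)):
        # Incoming hidden-state gradient at step t: the piece from the
        # output layer (da[:, :, t]) plus the piece flowing back from
        # step t + 1 (da_prev_t). dc_prev_t travels on its own path.
        grads_t = lstm_cell_backward(da[:, :, t] + da_prev_t,
                                     dc_prev_t, caches[t])
        da_prev_t = grads_t["da_prev"]
        dc_prev_t = grads_t["dc_prev"]

    # After the loop, the accumulated values are the gradients with
    # respect to the initial hidden and cell states.
    da0 = da_prev_t
    dc0 = dc_prev_t
    return da0, dc0

# Toy call with made-up shapes:
da0, dc0 = lstm_backward_sketch(np.random.randn(5, 10, 4), [None] * 4)
```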

Adding @paulinpaloalto and @rmwkwok to confirm.

The recommendation you provided refers to another exercise of the assignment, which was working fine beforehand… and the mismatch is happening in Exercise 8.

The checks in the notebook are not exhaustive. So, just because your implementation of lstm_forward passes the test code that follows it, that doesn’t mean that lstm_forward is perfect.

The test for exercise 8 invokes lstm_forward before running lstm_backward. Do you have a good reason for not following my instructions?

You're right, I overlooked one step of the code. Since I thought the checks were passing, I did not recheck it.