At the end of backprop, I noticed that da0 is stored. Is this to update a0 (which was randomly initialized) for the next iteration?
In other words, is the initial hidden state a0 learned by the model, just like the weights?
I also noticed that c0, unlike a0, is initialized to zeros in lstm_forward. And since we don't keep track of dc0, I'm assuming we do not learn c0 (i.e., c0 stays 0 for every iteration).
Is there a reason we treat the initial hidden state & initial cell state differently? What is the interpretation of these at t=0 anyway?
If a represents the current state of the model, and c represents the long-term memory, what do a0 and c0 even mean, given that the model has not started yet? Thanks for any insight.
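For reference, my reading of the setup is roughly this (placeholder dimensions, not the actual assignment code):

```python
import numpy as np

n_a, m = 5, 10                 # placeholder dimensions
a0 = np.random.randn(n_a, m)   # initial hidden state: random, and da0 is accumulated in backprop
c0 = np.zeros((n_a, m))        # initial cell state: zeros inside lstm_forward, and no dc0 is kept
```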
This assignment doesn't give you a complete, functional RNN; that's why it ends without actually using the RNN to solve a real example.
Your question points out some of its defects and simplifications.
The “R” in “RNN” means that in spite of how the RNN is drawn (as a sequence of separate cells), when implemented it’s just one cell that is called repeatedly in a loop.
At the start of each forward pass, a0 and c0 are used to initialize the first call. After that, the a and c from the previous call are used as the inputs to the next call.
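A minimal sketch of that loop (not the assignment's exact code; the parameter names Wf, Wi, Wc, Wo and the shapes are my assumptions, in the style of the notebook):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(xt, a_prev, c_prev, p):
    """One LSTM time step. xt: (n_x, m); a_prev, c_prev: (n_a, m)."""
    concat = np.concatenate([a_prev, xt], axis=0)
    ft = sigmoid(p["Wf"] @ concat + p["bf"])   # forget gate
    it = sigmoid(p["Wi"] @ concat + p["bi"])   # update gate
    cct = np.tanh(p["Wc"] @ concat + p["bc"])  # candidate cell state
    c_next = ft * c_prev + it * cct
    ot = sigmoid(p["Wo"] @ concat + p["bo"])   # output gate
    a_next = ot * np.tanh(c_next)
    return a_next, c_next

def lstm_forward(x, a0, c0, p):
    """x: (n_x, m, T_x). a0 and c0 only seed the very first call."""
    n_x, m, T_x = x.shape
    n_a = a0.shape[0]
    a = np.zeros((n_a, m, T_x))
    a_prev, c_prev = a0, c0        # the one place the initial states are used
    for t in range(T_x):
        # Same single cell, same weights, called once per time step:
        a_prev, c_prev = lstm_cell(x[:, :, t], a_prev, c_prev, p)
        a[:, :, t] = a_prev
    return a
```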
There’s no reason (other than as a mistake in the assignment) why a0 and c0 are handled differently.
Thanks for your response. So just to be clear, both a0 and c0 should be 1) initialized to random values on the first pass, and 2) updated by da0 and dc0 after each pass?
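In other words, I'm picturing an update along these lines after backprop (illustrative only; the learning_rate and the "da0"/"dc0" keys in the gradients dict are my assumptions):

```python
def update_initial_states(a0, c0, gradients, learning_rate=0.01):
    # Treat the initial states like any other learned parameter:
    # take a gradient step using da0 and dc0 from backprop.
    a0 = a0 - learning_rate * gradients["da0"]
    c0 = c0 - learning_rate * gradients["dc0"]
    return a0, c0
```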