At the end of backprop, I noticed that da0 is stored. Is this to update a0 (which was randomly initialized) for the next iteration?
In other words, is the initial hidden state a0 learned by the model, just like the weights?
I also noticed that c0, unlike a0, is initialized to zeros in lstm_forward. And since we don't keep track of dc0, I'm assuming we do not learn c0 (i.e., c0 stays 0 for every iteration).
Is there a reason we treat the initial hidden state & initial cell state differently? What is the interpretation of these at t=0 anyway?
If a represents the current state of the model, and c represents the long-term memory, what do a0 and c0 even mean, given that the model has not started yet? Thanks for any insight.
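For reference, my reading of the setup is roughly this (placeholder dimensions, not the actual assignment code):

```python
import numpy as np

n_a, m = 5, 10                 # placeholder dimensions
a0 = np.random.randn(n_a, m)   # initial hidden state: random, and da0 is accumulated in backprop
c0 = np.zeros((n_a, m))        # initial cell state: zeros inside lstm_forward, and no dc0 is kept
```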
This assignment doesn't give you a complete, functional RNN; that's why it ends without actually using the RNN to solve a real example.
Your question points out some of its defects and simplifications.
The “R” in “RNN” means that in spite of how the RNN is drawn (as a sequence of separate cells), when implemented it’s just one cell that is called repeatedly in a loop.
At the start of each forward pass, a0 and c0 are used to initialize the first call. After that, the a and c from the previous call are used as the inputs to the next call.
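A minimal sketch of that loop (not the assignment's exact code; the parameter names Wf, Wi, Wc, Wo and the shapes are my assumptions, in the style of the notebook):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(xt, a_prev, c_prev, p):
    """One LSTM time step. xt: (n_x, m); a_prev, c_prev: (n_a, m)."""
    concat = np.concatenate([a_prev, xt], axis=0)
    ft = sigmoid(p["Wf"] @ concat + p["bf"])   # forget gate
    it = sigmoid(p["Wi"] @ concat + p["bi"])   # update gate
    cct = np.tanh(p["Wc"] @ concat + p["bc"])  # candidate cell state
    c_next = ft * c_prev + it * cct
    ot = sigmoid(p["Wo"] @ concat + p["bo"])   # output gate
    a_next = ot * np.tanh(c_next)
    return a_next, c_next

def lstm_forward(x, a0, c0, p):
    """x: (n_x, m, T_x). a0 and c0 only seed the very first call."""
    n_x, m, T_x = x.shape
    n_a = a0.shape[0]
    a = np.zeros((n_a, m, T_x))
    a_prev, c_prev = a0, c0        # the one place the initial states are used
    for t in range(T_x):
        # Same single cell, same weights, called once per time step:
        a_prev, c_prev = lstm_cell(x[:, :, t], a_prev, c_prev, p)
        a[:, :, t] = a_prev
    return a
```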
There’s no reason (other than as a mistake in the assignment) why a0 and c0 are handled differently.
Thanks for your response. So just to be clear, both a0 and c0 should be 1) initialized to random values on the first pass, and 2) updated by da0 and dc0 after each pass?
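In other words, I'm picturing an update along these lines after backprop (illustrative only; the learning_rate and the "da0"/"dc0" keys in the gradients dict are my assumptions):

```python
def update_initial_states(a0, c0, gradients, learning_rate=0.01):
    # Treat the initial states like any other learned parameter:
    # take a gradient step using da0 and dc0 from backprop.
    a0 = a0 - learning_rate * gradients["da0"]
    c0 = c0 - learning_rate * gradients["dc0"]
    return a0, c0
```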