So I understand the basic premise of this part of the lab, but I'm wondering why we even need the activations from the previous step in the post-attention LSTM. The previous character (which is a digit) doesn't really predict the next character, as also explained in the photo.
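For context, here's roughly the loop I mean, as a minimal self-contained Keras sketch. The names (`one_step_attention`, `post_activation_LSTM_cell`) and sizes are my approximation of the notebook, not the exact assignment code, and the attention step is stubbed out with a plain average just so it runs:

```python
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense

Tx, Ty, n_a, n_s, vocab = 30, 10, 32, 64, 11   # assumed sizes, not from the notebook
a = tf.random.normal((1, Tx, 2 * n_a))          # stand-in for the pre-attention BiLSTM outputs

post_activation_LSTM_cell = LSTM(n_s, return_state=True)
output_layer = Dense(vocab, activation="softmax")

def one_step_attention(a, s_prev):
    # Stub for the lab's attention: a uniform average over encoder states,
    # just to keep this runnable. s_prev would normally shape the weights.
    return tf.reduce_mean(a, axis=1, keepdims=True)  # (1, 1, 2*n_a)

s = tf.zeros((1, n_s))   # decoder hidden state
c = tf.zeros((1, n_s))   # decoder cell state
outputs = []
for t in range(Ty):
    context = one_step_attention(a, s)
    # The part I'm asking about: s and c from the previous step are fed back in.
    s, _, c = post_activation_LSTM_cell(context, initial_state=[s, c])
    outputs.append(output_layer(s))
```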
There is still some correlation in the digit sequences for years, months, and days (see the quick check after this list):
- Years probably start with a 1 or 2.
- The first digit of a month can only be 0 or 1.
- The first digit of a day can't be greater than 3.
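You can verify the month/day constraints with a few lines of plain Python. This is just my own illustration, enumerating an assumed 1950–2049 date range:

```python
from collections import Counter
from datetime import date, timedelta

d, end = date(1950, 1, 1), date(2049, 12, 31)
month_first, day_first = Counter(), Counter()
while d <= end:
    month_first[f"{d.month:02d}"[0]] += 1   # first digit of the zero-padded month
    day_first[f"{d.day:02d}"[0]] += 1       # first digit of the zero-padded day
    d += timedelta(days=1)

print(sorted(month_first))  # ['0', '1']            -> months start with 0 or 1
print(sorted(day_first))    # ['0', '1', '2', '3']  -> days never start above 3
```

So the previous output character does carry some signal about the next one, which is what the fed-back state lets the post-attention LSTM exploit.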
Ah, gotcha! Fair point, thanks for the answer!