Why not randomly set x0 and a_prev instead of setting them to zero?
It seems to me that setting them to zero will somehow bias the output of the first time step. On the other hand, randomly setting x0 and a_prev would ensure that the first character selected is not always the same.
This is analogous to using np.random.choice for the following steps. So my question is why use random selection in the following steps only and not also use a random selection for x0 and a_prev?
The sampling function is used to generate new random dinosaur names.
For the first character, there is no previous character to use as the input, so x0 and a_prev are set to zeros. I believe this is for consistency with how the system was trained.
You could try initializing them randomly, and see how it works. Please post back your results.
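If it helps, here is a rough sketch of that experiment for the first time step only. It is not the assignment's code: the parameter names (Wax, Waa, Wya, b, by), the sizes, and the random weights standing in for a trained model are all assumptions made for illustration. With zero inputs, the first-step distribution in this cell depends only on the biases, but the character is still drawn at random from that distribution, so the first character is not always the same.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a column vector.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def first_step_probs(params, x0, a_prev):
    # One forward step of a basic RNN cell (assumed form, not the assignment's code).
    a = np.tanh(params["Wax"] @ x0 + params["Waa"] @ a_prev + params["b"])
    return softmax(params["Wya"] @ a + params["by"]).ravel()

# Hypothetical sizes; random weights stand in for trained parameters.
vocab_size, n_a = 27, 50
rng = np.random.default_rng(0)
params = {
    "Wax": rng.normal(scale=0.1, size=(n_a, vocab_size)),
    "Waa": rng.normal(scale=0.1, size=(n_a, n_a)),
    "Wya": rng.normal(scale=0.1, size=(vocab_size, n_a)),
    "b": rng.normal(scale=0.1, size=(n_a, 1)),
    "by": rng.normal(scale=0.1, size=(vocab_size, 1)),
}

# Zero initialization: the distribution is fixed, but each draw is still random.
x0_zero = np.zeros((vocab_size, 1))
a_zero = np.zeros((n_a, 1))
p_zero = first_step_probs(params, x0_zero, a_zero)
first_chars = [int(rng.choice(vocab_size, p=p_zero)) for _ in range(10)]

# Random initialization: the distribution itself changes with every new x0 and a_prev.
x0_rand = rng.normal(size=(vocab_size, 1))
a_rand = rng.normal(size=(n_a, 1))
p_rand = first_step_probs(params, x0_rand, a_rand)

print("first characters sampled with zero init:", first_chars)
print("largest change in probability with random init:", np.abs(p_zero - p_rand).max())
```

With a trained model in place of the random weights, you could compare names generated both ways. Since the network presumably never saw random x0 or a_prev values during training, I would not be surprised if the random version produced odder first characters, but that is a guess until someone tries it.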