Why set x0 and a_prev to zero when sampling?

Why not randomly set x0 and a_prev instead of setting them to zero?

It seems to me that setting them to zero will somehow bias the output at the first time step. On the other hand, randomly setting x0 and a_prev would guarantee that the first character is not always drawn from the same distribution.

This is analogous to using np.random.choice for the subsequent steps. So my question is: why use random selection only for the subsequent steps, and not also for x0 and a_prev?
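To make the comparison concrete, here is roughly the first step of the sampling loop as I understand it (the parameter names and shapes are my guesses at the course notation, with toy values just so the snippet runs):

```python
import numpy as np

# Toy parameters; names (Wax, Waa, Wya, b, by) and shapes are assumptions
# chosen to mirror the course notation.
rng = np.random.default_rng(0)
n_a, vocab_size = 50, 27
Wax = rng.standard_normal((n_a, vocab_size)) * 0.01
Waa = rng.standard_normal((n_a, n_a)) * 0.01
Wya = rng.standard_normal((vocab_size, n_a)) * 0.01
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# First sampling step with the zero initialization:
x0 = np.zeros((vocab_size, 1))
a_prev = np.zeros((n_a, 1))

# With x0 = a_prev = 0, the pre-activation collapses to just the bias b,
# so this first distribution is identical on every call -- that is the
# "bias" I mean ...
a1 = np.tanh(Wax @ x0 + Waa @ a_prev + b)   # == np.tanh(b)
y1 = softmax(Wya @ a1 + by)

# ... while the actual pick is still random, via the draw itself:
first_char = np.random.choice(vocab_size, p=y1.ravel())
print(first_char)
```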

Which assignment are you working on? Week number and assignment name please.

The assignment is W1A2.

The sampling function is used to generate new random dinosaur names.

For the first character, there is no previous character to use as the input, so x0 and a_prev are set to zeros. I believe this is for consistency with how the system was trained.

You could try initializing them randomly and see how it works. Please post back your results.
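If you want to try it, the change is small. Here is a rough sketch, not the assignment's actual code: sample() below is my own reconstruction, and the parameter names, the char_to_ix mapping, and the 50-character cap are all assumptions on my part:

```python
import numpy as np

def sample(parameters, char_to_ix, rng, random_init=False):
    """Draw one name; random_init toggles the experiment discussed above."""
    Wax, Waa, Wya = parameters["Wax"], parameters["Waa"], parameters["Wya"]
    b, by = parameters["b"], parameters["by"]
    vocab_size, n_a = by.shape[0], Waa.shape[1]

    if random_init:
        # Experimental variant: small random starting point (the 0.01 scale
        # is an arbitrary choice of mine, to keep tanh near its linear region)
        x = rng.standard_normal((vocab_size, 1)) * 0.01
        a_prev = rng.standard_normal((n_a, 1)) * 0.01
    else:
        # Assignment's choice: zeros = "no previous character, no prior state"
        x = np.zeros((vocab_size, 1))
        a_prev = np.zeros((n_a, 1))

    indices, idx, newline = [], -1, char_to_ix["\n"]
    while idx != newline and len(indices) < 50:
        a = np.tanh(Wax @ x + Waa @ a_prev + b)   # hidden state update
        z = Wya @ a + by
        y = np.exp(z - z.max()); y /= y.sum()     # softmax over characters
        idx = rng.choice(vocab_size, p=y.ravel()) # the random draw
        indices.append(int(idx))
        x = np.zeros((vocab_size, 1)); x[idx] = 1 # one-hot next input
        a_prev = a
    return indices
```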

I also had a slightly different question in a similar area, so I tried the experiment.

If I reset a_prev to zero every time, the results were biased. The loss was similar, but I don't think it helps the model's performance.

However, when I initialized it with np.random.randn(), there was no bias and the final training results were at a similar level.

So my guess is that the reason for passing a_prev on to the next optimize() call is just execution performance.

Is this guess correct?
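Roughly, the two training-loop variants I compared look like this. Note that optimize() here just stands in for the assignment's per-example gradient step; its exact signature, and the carry_state flag, are my simplifications for illustration:

```python
import numpy as np

def train(examples, parameters, optimize, n_a=50, num_iterations=10000,
          carry_state=True):
    """carry_state=True passes a_prev between optimize() calls;
    carry_state=False resets it to zeros before every example."""
    a_prev = np.zeros((n_a, 1))
    for i in range(num_iterations):
        X, Y = examples[i % len(examples)]
        if not carry_state:
            a_prev = np.zeros((n_a, 1))  # reset-every-time variant
        # Assumed to run one SGD step and return (loss, last hidden state)
        loss, a_prev = optimize(X, Y, a_prev, parameters)
    return parameters
```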