In the example “Cats on average sleep 15 hours”, why is the input x1 to the RNN taken as a zero vector instead of “Cats”?
With x1 as a zero vector, Andrew assumes that the output y1_hat will be “Cats”, which is then fed to the RNN at the 2nd time step, so that x2 = y1_hat.
Why not take x1 as “Cats”?
If you want the model to generate a sentence about cats, there’s nothing wrong with feeding “cats” as the 1st input. With a zero vector as input, the output is most likely going to be the most frequent starting word in the corpus. There’s nothing stopping you from seeding the RNN with a custom prefix while maintaining the context of words seen so far.
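To make the difference concrete, here is a rough sketch of both options in plain Python. Everything in it is an assumption for illustration: the tiny vocabulary, the hidden size, the random untrained weights, and the helper names (`step`, `sample`) are not from the course code. With untrained weights the output is gibberish; the point is only the control flow: x1 is a zero vector, and a prefix is applied by forcing the chosen word as the next input while the hidden state keeps accumulating context.

```python
import math
import random

# Hypothetical toy vocabulary and sizes, chosen only for this illustration.
VOCAB = ["cats", "dogs", "on", "average", "sleep", "15", "hours", "<EOS>"]
V = len(VOCAB)
H = 8  # hidden state size (arbitrary)

_init = random.Random(0)

def rand_matrix(rows, cols):
    return [[_init.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Random, UNTRAINED parameters; a real model would learn these.
Wax, Waa, Wya = rand_matrix(H, V), rand_matrix(H, H), rand_matrix(V, H)
ba = [_init.uniform(-0.5, 0.5) for _ in range(H)]
by = [_init.uniform(-0.5, 0.5) for _ in range(V)]

def one_hot(idx):
    v = [0.0] * V
    v[idx] = 1.0
    return v

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    top = max(z)
    e = [math.exp(x - top) for x in z]
    total = sum(e)
    return [x / total for x in e]

def step(x, a_prev):
    """One RNN step: returns the new hidden state and the distribution y_hat."""
    pre = [p + q + b for p, q, b in zip(matvec(Wax, x), matvec(Waa, a_prev), ba)]
    a = [math.tanh(p) for p in pre]
    logits = [l + b for l, b in zip(matvec(Wya, a), by)]
    return a, softmax(logits)

def sample(prefix=(), max_len=10, seed=0):
    """Sample a word sequence, optionally forcing the first words to `prefix`."""
    sampler = random.Random(seed)
    a = [0.0] * H
    x = [0.0] * V  # x1 is the zero vector in both cases
    words = []
    # Seeding: run the step to update the hidden state, ignore the
    # prediction, and feed the chosen prefix word as the next input.
    for w in prefix:
        a, _ = step(x, a)
        words.append(w)
        x = one_hot(VOCAB.index(w))
    # Free sampling: each sampled word becomes the next input (x_t = y_hat_{t-1}).
    while len(words) < max_len:
        a, y_hat = step(x, a)
        idx = sampler.choices(range(V), weights=y_hat)[0]
        if VOCAB[idx] == "<EOS>":
            break
        words.append(VOCAB[idx])
        x = one_hot(idx)
    return words

print(sample())                  # first word is drawn from the model's start-word distribution
print(sample(prefix=("cats",)))  # first word is forced to "cats"
```

Note that with x1 and the initial hidden state both zero, the first-step distribution depends only on the bias terms, which is why a trained model tends to favour the starting words that were most common in the corpus.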
A typical use of RNNs in generative tasks is to emit multiple samples that resemble the training data. For instance, you could train an RNN to generate text in the style of a poet. In this case, we care more about the text as a whole than about whether the sentence starts with a word like “cats”.
thanks a lot