In the week 1 dinosaur assignment, the step 3 comment in the sample() function says:
Step 3: Sample the index of a character within the vocabulary from the probability distribution y
Why is sampling from the distribution y the right thing to do? I understand that we want to make sure the next letter selected won’t always be the same, so we want to sample from some distribution. But why y? I don’t understand.
y is the probability distribution over the next character, indexed by i. So that the same character is not chosen every time, the function np.random.choice is used to draw the next index i at random, weighted by y.
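Here is a minimal sketch of that sampling step. The four-character vocabulary and the values in y are made up for illustration (they are not from the assignment); only np.random.choice, np.argmax and np.bincount are real NumPy calls.

```python
import numpy as np

# Hypothetical vocabulary and next-character probabilities (illustration only).
vocab = ["a", "b", "c", "\n"]
y = np.array([0.6, 0.2, 0.15, 0.05])  # entries are non-negative and sum to 1

np.random.seed(0)  # fixed seed so the run is repeatable

# Sampling: draw an index i with probability y[i]. Likely characters come up
# often, but less likely ones still come up sometimes.
sampled = [np.random.choice(len(vocab), p=y) for _ in range(1000)]
counts = np.bincount(sampled, minlength=len(vocab))

# Greedy alternative: always take the most likely index. This would return
# the same character every time for the same y.
greedy = int(np.argmax(y))

print("sample counts per character:", counts)
print("greedy pick:", repr(vocab[greedy]))
```

The counts should come out roughly proportional to y, while the greedy pick never changes. That is the reason for sampling from y rather than always taking its largest entry.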
@Kic Thanks, I got what the instructions say it is. However, y is the softmax of z. I understand that y<t+1> is the prediction of the letter. How can it be a probability too?
In the 6th video of week 1, Language Model and Sequence Generation, Prof. Ng explained that the job of an RNN language model is to predict the probability of a word given the previous word or words. So, for y-hat at a given time step, each entry of the softmax output is the probability that a particular word from the corpus/dictionary comes next, i.e. P(next word | y1, y2), where y1 and y2 are the preceding words in the sentence. The "prediction" is then obtained by picking from that distribution. In the dinosaur assignment the same idea applies to characters instead of words.
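To see why a softmax output can be read as a probability distribution, here is a small sketch. The scores in z are made-up values, and softmax here is a helper written for this example, not the assignment's own function.

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw scores into values that are positive and sum to 1."""
    e = np.exp(z - np.max(z))  # subtracting the max avoids overflow
    return e / e.sum()

# Hypothetical raw scores z for a 4-entry vocabulary (illustration only).
z = np.array([2.0, 1.0, 0.1, -1.0])
y = softmax(z)

print("y =", y)
print("sum of y =", y.sum())
print("most likely index =", int(np.argmax(y)))
```

Every entry of y is between 0 and 1 and the entries sum to 1, so y can be used directly as the p argument of np.random.choice. The largest score in z still gets the largest probability in y.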
If you go back to an earlier video, where Prof. Ng talked about the ‘Apple and pear salad’ example, you will remember that he gave two sentences that sound the same:
Apple and pair salad
Apple and pear salad
How did the model choose which one to pick? By comparing the probabilities that the softmax outputs give to the two sentences: the second sentence has a higher probability than the first, so the second sentence was chosen.
You will find revisiting those videos helpful to reinforce these concepts.