I am confused about the notation x = y. x is the Tth example, which is a word, and y is a probability. Am I correct there?
So x<2> is “cat”, and y is the probability of a word, a float such as 0.7?
Can I get more clarification here as to what we are doing?
x<i> refers to the input to an RNN at time step i, and y<i> is the output the network produces at that step.
The input to an RNN is a representation of the word, say a one-hot encoding, and the output is the softmax output. So, in this case, when the one-hot representation of “Cats” is input to the network, the RNN outputs a probability for each word in the vocabulary, given that input. So y is a whole vector of probabilities, not a single float. From there, you can select the most likely word by taking the argmax of the output.
Let’s say you found “average” to have the highest probability. You can now use the representation of “average” as the input at the next time step, and so on.
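
If it helps to see the loop written out, here is a minimal NumPy sketch of the idea. It assumes a toy, untrained RNN with random weights and a made-up six-word vocabulary, so the words it picks are meaningless; the names `VOCAB`, `init_params`, `rnn_step`, and `greedy_decode` are hypothetical and only for illustration. The point is the shape of the data: a one-hot vector goes in, a softmax probability vector comes out, and the argmax word becomes the next input.

```python
import numpy as np

# Hypothetical toy vocabulary; a real model would have thousands of words.
VOCAB = ["cats", "average", "15", "hours", "of", "sleep"]


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at position `index`."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def init_params(vocab_size, hidden_size, seed=0):
    """Random, untrained weights (an assumption made for this sketch)."""
    rng = np.random.default_rng(seed)
    Waa = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    Wax = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
    Wya = rng.normal(scale=0.1, size=(vocab_size, hidden_size))
    ba = np.zeros(hidden_size)
    by = np.zeros(vocab_size)
    return Waa, Wax, Wya, ba, by


def rnn_step(x, a_prev, params):
    """One time step: input x<t> and previous state in, new state and y<t> out."""
    Waa, Wax, Wya, ba, by = params
    a = np.tanh(Waa @ a_prev + Wax @ x + ba)
    y = softmax(Wya @ a + by)  # one probability per vocabulary word
    return a, y


def greedy_decode(first_word, steps, params, hidden_size):
    """Feed the argmax word of each step back in as the next input."""
    a = np.zeros(hidden_size)
    x = one_hot(VOCAB.index(first_word), len(VOCAB))
    words = []
    for _ in range(steps):
        a, y = rnn_step(x, a, params)
        idx = int(np.argmax(y))       # most likely word at this step
        words.append(VOCAB[idx])
        x = one_hot(idx, len(VOCAB))  # x<t+1> is the word chosen from y<t>
    return words


if __name__ == "__main__":
    params = init_params(len(VOCAB), hidden_size=8)
    print(greedy_decode("cats", steps=4, params=params, hidden_size=8))
```

Note that this sketch always takes the argmax. Drawing the next word at random from the probability vector instead is a common alternative when you want varied output.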