Hi,
I am confused about the notation x = y. x is the t-th example, which is a word, and y is a probability. Am I correct there?
So x<2> is “cat”, and y is the probability of a word, something like a float, say 0.7.
Can I get more clarification here as to what we are doing?
x<i> refers to the input to an RNN at time i, and y<i> is the output based on a<i-1> and x<i>.
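Written out, one common way to express that dependence looks like the following. This is a sketch, not something stated in the thread: the weight names W_aa, W_ax, W_ya, the biases b_a, b_y, and the activation g (often tanh) are assumed here for illustration.

```latex
\begin{aligned}
a^{\langle i \rangle} &= g\left(W_{aa}\, a^{\langle i-1 \rangle} + W_{ax}\, x^{\langle i \rangle} + b_a\right) \\
y^{\langle i \rangle} &= \operatorname{softmax}\left(W_{ya}\, a^{\langle i \rangle} + b_y\right)
\end{aligned}
```

So y<i> is a whole vector of probabilities, one entry per word in the vocabulary, rather than a single float.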
The input to an RNN is a representation of the word, say a one-hot encoding, and the output is the softmax output. So, in this case, when the one-hot representation of Cats is input to the network, the RNN outputs a probability for each word in the vocabulary, given the input. From there, you can select the most likely word using argmax on the output.
Let’s say you found average to have the highest probability. You can now use the representation of average as the input at the next timestep, and so on.
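To make the loop described above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the four-word vocabulary, the hidden size, the helper names, and the random (untrained) weights, with biases left out for brevity. The words it prints are therefore meaningless; the point is only the flow of one-hot input, softmax output, argmax, and feeding the chosen word back in.

```python
import math
import random

# Hypothetical toy vocabulary (assumption); two of the words come from this thread.
VOCAB = ["cats", "average", "sleep", "<EOS>"]
HIDDEN = 5  # assumed hidden-state size


def one_hot(index, size):
    """Return a vector with 1.0 at `index` and 0.0 everywhere else."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_step(params, a_prev, x):
    """One timestep: new hidden state a<i> and output distribution y<i>."""
    Waa, Wax, Wya = params
    pre = [p + q for p, q in zip(matvec(Waa, a_prev), matvec(Wax, x))]
    a = [math.tanh(v) for v in pre]
    y = softmax(matvec(Wya, a))
    return a, y


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def generate(first_word, steps, seed=0):
    """Feed `first_word` in, then feed each argmax word back in as the next input."""
    rng = random.Random(seed)
    V = len(VOCAB)
    # Untrained random weights (assumption): a real model would learn these.
    params = (
        random_matrix(rng, HIDDEN, HIDDEN),  # Waa
        random_matrix(rng, HIDDEN, V),       # Wax
        random_matrix(rng, V, HIDDEN),       # Wya
    )
    a = [0.0] * HIDDEN  # initial hidden state a<0>
    idx = VOCAB.index(first_word)
    words, dists = [], []
    for _ in range(steps):
        x = one_hot(idx, V)                      # x<i>: one-hot of the current word
        a, y = rnn_step(params, a, x)            # y<i>: probability of each word
        idx = max(range(V), key=lambda i: y[i])  # argmax picks the most likely word
        words.append(VOCAB[idx])
        dists.append(y)
    return words, dists


words, dists = generate("cats", steps=3)
print(words)
```

Note that each entry of `dists` is a full probability vector over the vocabulary. A single number like 0.7 would be just one entry of that vector.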