Language Model and Sequence


I am confused about the notation x<t> = y<t-1>. x<t> is the t-th example, which is a word, and y is a probability. Am I correct there?

So for x<2>, it is “cat”, and y is the probability of a word, i.e. a float such as 0.7.

Can I get more clarification here as to what we are doing?

x<i> refers to the input to the RNN at time step i, and y<i> is the output computed from a<i-1> and x<i>.
The input to an RNN is a representation of the word, say a one-hot encoding, and the output is the softmax output. So, in this case, when the one-hot representation of “Cats” is fed into the network, the RNN outputs a probability for each word in the vocabulary, given the input. In other words, y<i> is a whole vector of probabilities that sums to 1, not a single float. From there, you can select the most likely word by taking the argmax of the output.
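To make the shapes concrete, here is a minimal NumPy sketch of a single RNN step. It assumes a hypothetical nine-word vocabulary, a hidden size of 16, a tanh activation, and randomly initialised (untrained) weights named Wax, Waa, Wya, ba, by; none of these come from the thread. It only shows that a one-hot x<i> goes in and a softmax vector over the whole vocabulary comes out, from which argmax picks one word.

```python
import numpy as np

# Hypothetical toy vocabulary; a real model would have thousands of words.
vocab = ["cats", "average", "15", "hours", "of", "sleep", "a", "day", "<EOS>"]
word_to_ix = {w: i for i, w in enumerate(vocab)}
V, H = len(vocab), 16  # vocabulary size, hidden-state size (assumed)

rng = np.random.default_rng(0)
# Randomly initialised, untrained weights: for illustration only.
Wax = rng.normal(scale=0.1, size=(H, V))
Waa = rng.normal(scale=0.1, size=(H, H))
Wya = rng.normal(scale=0.1, size=(V, H))
ba = np.zeros(H)
by = np.zeros(V)

def one_hot(word):
    """Length-V vector with a 1 at the word's index and 0 elsewhere."""
    x = np.zeros(V)
    x[word_to_ix[word]] = 1.0
    return x

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def rnn_step(x_t, a_prev):
    """One time step: a<t> from a<t-1> and x<t>, then y_hat<t> from a<t>."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    y_hat_t = softmax(Wya @ a_t + by)
    return a_t, y_hat_t

a_prev = np.zeros(H)
a_t, y_hat = rnn_step(one_hot("cats"), a_prev)
print(y_hat.shape)                   # one probability per vocabulary word
print(y_hat.sum())                   # the probabilities sum to (approximately) 1
print(vocab[int(np.argmax(y_hat))])  # most likely next word; arbitrary, since weights are untrained
```

After training, y_hat would put high probability on plausible next words such as “average”; with random weights the argmax word is meaningless, but the shapes are the same.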

Let’s say “average” has the highest probability. You can then use the representation of “average” as the input at the next time step, and so on.
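The “and so on” can be sketched as a loop: take the argmax word, one-hot encode it, and feed it back in as the next input. This is a minimal greedy-generation sketch under the same assumptions as before (hypothetical toy vocabulary, hidden size 16, tanh activation, untrained random weights); the function name generate_greedy and the max_len limit are illustrative choices, not part of any course or library API.

```python
import numpy as np

# Hypothetical toy vocabulary and untrained random weights, for illustration only.
vocab = ["cats", "average", "15", "hours", "of", "sleep", "a", "day", "<EOS>"]
word_to_ix = {w: i for i, w in enumerate(vocab)}
V, H = len(vocab), 16

rng = np.random.default_rng(0)
Wax = rng.normal(scale=0.1, size=(H, V))
Waa = rng.normal(scale=0.1, size=(H, H))
Wya = rng.normal(scale=0.1, size=(V, H))
ba = np.zeros(H)
by = np.zeros(V)

def one_hot(word):
    x = np.zeros(V)
    x[word_to_ix[word]] = 1.0
    return x

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def generate_greedy(first_word, max_len=10):
    """Feed each step's most likely output word back in as the next input."""
    a = np.zeros(H)          # initial hidden state a<0>
    word = first_word
    sentence = [word]
    for _ in range(max_len - 1):
        x = one_hot(word)                       # x<t>: the previous word, one-hot encoded
        a = np.tanh(Waa @ a + Wax @ x + ba)     # update hidden state
        y_hat = softmax(Wya @ a + by)           # distribution over the vocabulary
        word = vocab[int(np.argmax(y_hat))]     # greedy choice of the next word
        sentence.append(word)
        if word == "<EOS>":                     # stop at end-of-sentence token
            break
    return sentence

print(generate_greedy("cats"))
```

Greedy argmax is only one option: it can get stuck repeating the same word. A common alternative is to sample the next index from the distribution instead, e.g. np.random.choice(V, p=y_hat).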