Hi,

I am confused about the notation x = y. x is the Tth example, which is a word, and y is a probability. Am I correct there?

So for x<2>, it is “cat”, and y is the probability of a word, something like a float, say 0.7?

Can I get more clarification here as to what we are doing?

`x<i>` refers to the input to an RNN at time `i`, and `y<i>` is the output based on `a<i-1>` and `x<i>`.
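As a rough sketch of that dependence, here it is in the notation commonly used for a basic RNN cell. The weight names `W_aa`, `W_ax`, `W_ya`, the biases `b_a`, `b_y`, and the activation `g` (often tanh) are assumptions for illustration and do not come from the post above:

```latex
a^{\langle i \rangle} = g\left(W_{aa}\, a^{\langle i-1 \rangle} + W_{ax}\, x^{\langle i \rangle} + b_a\right)
\qquad
\hat{y}^{\langle i \rangle} = \mathrm{softmax}\left(W_{ya}\, a^{\langle i \rangle} + b_y\right)
```

Read this way, `y<i>` depends on `a<i-1>` and `x<i>` only through the new hidden state `a<i>`.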

Input to an RNN is a representation of the word, say a one-hot encoding, and the output corresponds to the softmax output. So, in this case, when the one-hot representation of `Cats` is input to the network, the RNN outputs the probability of each word in the vocabulary, given the input. From there, you can select the most likely word using `argmax` on the output.

Let’s say you found `average` to have the highest probability. You can now use the representation of `average` as input at the next timestep, and so on.
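To make the shapes concrete, here is a minimal NumPy sketch of that loop. Everything in it is an assumption for illustration: the five-word toy vocabulary, the hidden size, the random untrained weights, and the helper names `one_hot`, `rnn_step`, and `generate`. The point is only that `x<i>` is a one-hot vector, `y<i>` is a vector with one probability per vocabulary word (not a single float), and the `argmax` word becomes the next input.

```python
import numpy as np

# Hypothetical toy vocabulary; a real model would have thousands of words.
vocab = ["cats", "average", "15", "hours", "<EOS>"]
word_to_idx = {w: i for i, w in enumerate(vocab)}
n_vocab, n_hidden = len(vocab), 8

# Random, untrained weights (assumed names), so the predictions are meaningless.
rng = np.random.default_rng(0)
W_aa = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_ax = rng.normal(scale=0.1, size=(n_hidden, n_vocab))
W_ya = rng.normal(scale=0.1, size=(n_vocab, n_hidden))
b_a = np.zeros(n_hidden)
b_y = np.zeros(n_vocab)


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at `index`."""
    v = np.zeros(size)
    v[index] = 1.0
    return v


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def rnn_step(x, a_prev):
    """One timestep: new hidden state and a probability for every word."""
    a = np.tanh(W_aa @ a_prev + W_ax @ x + b_a)
    y = softmax(W_ya @ a + b_y)
    return a, y


def generate(first_word, n_steps):
    """Greedy decoding: feed the argmax word back in as the next input."""
    a = np.zeros(n_hidden)
    x = one_hot(word_to_idx[first_word], n_vocab)
    words = []
    for _ in range(n_steps):
        a, y = rnn_step(x, a)
        idx = int(np.argmax(y))      # index of the most likely word
        words.append(vocab[idx])
        x = one_hot(idx, n_vocab)    # becomes the input at the next timestep
    return words


# A single step with "cats" as input: y1 has one entry per vocabulary word.
a1, y1 = rnn_step(one_hot(word_to_idx["cats"], n_vocab), np.zeros(n_hidden))
generated = generate("cats", 3)
print(y1)
print(generated)
```

With trained weights, the largest entry of `y1` would correspond to a plausible next word such as `average`; with the random weights used here, the selected words are arbitrary.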