RNN Model - y Label Meaning

For the RNN model mentioned in the “Language Model and Sequence Generation” lecture, I am confused about the inputs y_<1>, y_<2>, etc. y labels are normally supposed to be located on the output side. Why are y_<1>, y_<2>, etc. on the input side of the RNN model?

Also, why is y_<1> equal to “Cats”? “Cats” is supposed to be an input token and not an output label.

Thank you in advance.

@hungng777 it is not entirely clear what you are stating here…

I mean, do recall that we are dealing with an explicitly defined training set. We are not inventing ‘language out of nowhere’.

And yes, at each step y^{<t>} is of course an output, but we feed it to the next step in order to ask: ‘Okay, I am here. So what (of the most probable outcomes) comes next?’

As I mentioned, it is all temporal.

Your understanding is correct as long as the RNN starts generating output based on user input. For a task like writing a story in someone's style based on the sentences seen so far, the RNN can start without any input at all. So, both approaches are okay.

Hi Balaji,
I don’t quite understand the answer at this time.

Take a look at the RNN model in the “Language Model and Sequence Generation” lecture. There is a picture of the RNN model at time 11:00 in the video. I am wondering why y_<1>, y_<2>, etc. are acting as inputs to the RNN. y labels are normally outputs.

Also, what should the values of y_<1>, y_<2>, etc. be? In the RNN model, x_<2> = y_<1> = “Cats” and x_<3> = y_<2> = “average”. x_<2>, x_<3>, etc. are input values. How could x_<2> be equal to y_<1>, which is an output y label?

The bottom line is that I am totally confused about the RNN model.

No worries.

The first thing to understand is that the weights of an RNN block are shared across all timesteps. The slide draws the time-unrolled view of the RNN only to make it easier to follow.
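
To make the weight sharing concrete, here is a minimal sketch in Python. It is not the course's implementation: the scalar hidden state and the names `rnn_step`, `rnn_unrolled`, and `params` are assumptions made purely for illustration. The point is only that the unrolled picture is a loop calling one step function with one set of weights.

```python
import math

def rnn_step(a_prev, x, params):
    """One RNN step with a scalar state: a<t> = tanh(w_aa * a<t-1> + w_ax * x<t> + b_a)."""
    return math.tanh(params["w_aa"] * a_prev + params["w_ax"] * x + params["b_a"])

def rnn_unrolled(xs, params, a0=0.0):
    """Apply the same step, with the same params, at every timestep."""
    a = a0
    states = []
    for x in xs:
        a = rnn_step(a, x, params)  # identical weights reused at each t
        states.append(a)
    return states

# Hypothetical weights and inputs, chosen only to have something to run.
params = {"w_aa": 0.5, "w_ax": 1.0, "b_a": 0.0}
states = rnn_unrolled([1.0, 0.0, -1.0], params)
```

Each box drawn on the slide corresponds to one pass through the loop body, so there is a single RNN block and not one block per word.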

With this in mind, consider the task where we want the RNN to generate the rest of a sentence, given a starting word or no word at all, so that it follows the style of the training corpus closely. The RNN generates one timestep at a time, i.e. the next word given the current word and the context that captures the words seen so far. This is why the RNN feeds the output y_{<i>} as input to generate the next word y_{<i + 1>}.
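
A small sketch may help with the x_<2> = y_<1> = “Cats” question. It assumes the inputs are simply the target sentence shifted right by one position, with a placeholder standing in for the empty first input. The names `make_training_pairs`, `generate`, `toy_next_word`, and the `<ZERO>` and `<EOS>` tokens are hypothetical, and the sentence beyond “Cats average” is an assumption added for illustration. The lookup table stands in for a trained RNN and ignores the context.

```python
def make_training_pairs(tokens, start="<ZERO>"):
    """Build (inputs, targets) for a language model from one sentence."""
    ys = list(tokens)          # targets y<1..T>: the sentence itself
    xs = [start] + ys[:-1]     # inputs x<1..T>: shifted right, so x<t> = y<t-1>
    return xs, ys

def generate(next_word, max_len=20, start="<ZERO>", eos="<EOS>"):
    """Generate words one timestep at a time, feeding each output back in."""
    words = []
    x = start
    for _ in range(max_len):
        y = next_word(x, words)  # stand-in for the RNN's pick at this step
        words.append(y)
        if y == eos:
            break
        x = y                    # the output y<t> becomes the next input x<t+1>
    return words

# Illustrative sentence; only "Cats" and "average" come from this thread.
sentence = ["Cats", "average", "15", "hours", "of", "sleep", "a", "day", "<EOS>"]
xs, ys = make_training_pairs(sentence)

# Toy "model": memorises which word followed which in the single sentence.
table = dict(zip(xs, ys))

def toy_next_word(x, context):
    return table[x]

generated = generate(toy_next_word)
```

Here `xs[1]` and `ys[0]` are both “Cats”: the same word is the label at step 1 and the input at step 2, which is what the slide is showing.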