In the video “Language model and sequence generation”, the output y_hat<2> is a vector of probabilities over the words in the dictionary, right? However, the input to the next time step, y<2>, is a one-hot vector, so I am not sure how a vector of probabilities gets transformed into a one-hot vector. Does the system set the word with the maximum probability to 1 and all the other words to 0?
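To make my question concrete, is the conversion something like the argmax option below, or is a word instead sampled from the distribution? (A minimal NumPy sketch of what I mean; vocab_size and the probability values are made up for illustration, not taken from the course code.)

```python
import numpy as np

vocab_size = 10000  # made-up dictionary size
rng = np.random.default_rng(0)

# y_hat<2>: probabilities over the dictionary (illustrative random values,
# normalized to sum to 1 like a softmax output)
y_hat = rng.random(vocab_size)
y_hat /= y_hat.sum()

# Option A: hard argmax -- set the most likely word to 1, the rest to 0
y_onehot = np.zeros(vocab_size)
y_onehot[np.argmax(y_hat)] = 1

# Option B: sample a word index from the distribution, then one-hot it
idx = rng.choice(vocab_size, p=y_hat)
y_sampled = np.zeros(vocab_size)
y_sampled[idx] = 1
```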
Another question: the one-to-many model is supposed to generate a whole sentence from just one starting word, right?
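For that one-to-many case, I picture the generation loop roughly like the sketch below: the chosen word at each step is fed back in as the next input. (Again just my assumption of the data flow; the weight names Wax, Waa, Wya, the sizes, and the starting word index are all hypothetical placeholders.)

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, hidden_size = 10000, 64  # made-up sizes

# Hypothetical randomly initialized weights, only to show the data flow
Wax = rng.normal(0, 0.01, (hidden_size, vocab_size))
Waa = rng.normal(0, 0.01, (hidden_size, hidden_size))
Wya = rng.normal(0, 0.01, (vocab_size, hidden_size))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

a = np.zeros(hidden_size)   # initial hidden state
x = np.zeros(vocab_size)    # one-hot vector for the single starting word
x[42] = 1                   # arbitrary starting word index

generated = []
for t in range(10):                        # generate 10 words
    a = np.tanh(Wax @ x + Waa @ a)         # RNN step
    y_hat = softmax(Wya @ a)               # probabilities over the dictionary
    idx = rng.choice(vocab_size, p=y_hat)  # or np.argmax(y_hat)?
    generated.append(idx)
    x = np.zeros(vocab_size)               # feed the chosen word back in
    x[idx] = 1
```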