I did the assignment according to the description and passed it. However, I am trying to understand each step properly.

In the assignment we generate a list of numbers that corresponds to a dinosaur name, e.g.
this is X
[None, 23, 5, 12, 12, 14, 8, 15, 6, 5, 18, 9, 1]
this is Y
[23, 5, 12, 12, 14, 8, 15, 6, 5, 18, 9, 1, 0]

Based on these two lists we calculate the loss. However, the lists are always identical. What determines the difference in loss, which varies from one iteration to another? And how does the model learn if X and Y are identical?

What is being learned is the internal weights for the LSTM states that allow the sequence of letters to be predicted. We’re not just looking at each X individually and using it to predict that Y.

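X is the input and Y is the expected output. They look identical, but Y is X shifted one position to the left: the target at each step is the next element of the sequence, and the final target is 0. The model makes a prediction at each step of X, compares it to the matching entry of Y, and the loss measures how far off those predictions are. The loss varies between iterations because the predictions change as the weights are updated, and because each iteration may use a different name. Adjusting the parameters to reduce this loss is how the model learns, even though the input and target sequences are structurally similar.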
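Here is a minimal sketch of that idea in plain Python. It is not the assignment's code. The vocabulary (newline = 0, 'a'..'z' = 1..26), the helper names make_xy and cross_entropy, and the two made-up sets of predictions are all assumptions chosen for illustration. It shows that Y is just X shifted by one, and that the loss depends on the probabilities the model assigns to the correct next character, not on how similar X and Y look.

```python
import math

# Assumed vocabulary: newline is 0, 'a'..'z' are 1..26.
char_to_ix = {"\n": 0}
char_to_ix.update({chr(ord("a") + i): i + 1 for i in range(26)})
vocab_size = len(char_to_ix)  # 27

def make_xy(name):
    """Build input X and target Y for one name (hypothetical helper)."""
    idx = [char_to_ix[c] for c in name]
    x = [None] + idx               # None stands for the empty input at the first step
    y = idx + [char_to_ix["\n"]]   # same indices shifted one step left, then newline
    return x, y

def cross_entropy(probs_per_step, y):
    """Sum over steps of -log(probability given to the correct next character)."""
    return -sum(math.log(p[target]) for p, target in zip(probs_per_step, y))

x, y = make_xy("wellnhoferia")

# Made-up predictions from an untrained model: uniform over all 27 characters.
uniform = [[1.0 / vocab_size] * vocab_size for _ in y]

# Made-up predictions from a better model: 0.9 on the correct next character.
confident = []
for target in y:
    p = [0.1 / (vocab_size - 1)] * vocab_size
    p[target] = 0.9
    confident.append(p)

loss_uniform = cross_entropy(uniform, y)
loss_confident = cross_entropy(confident, y)
print(x)
print(y)
print(loss_uniform > loss_confident)
```

X and Y are the same in both cases; only the predictions differ, and so does the loss.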

Hope it helps! Let me know if you have more questions.

The list contains integer indices of characters - the sequence of integers can be translated back to the sequence of characters by looking them up in the ‘dictionary’. As @TMosh mentioned, the hidden state weights are trained to ‘correctly’ predict the next character given the sequence of characters so far. Optimizing the loss function helps ‘correct’ the randomly initialized weights.
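To make the lookup concrete, here is a small sketch that decodes the Y list from the question. It assumes the indices refer to characters, with newline = 0 and 'a'..'z' = 1..26. The name ix_to_char is hypothetical, and the assignment's actual dictionary may be built differently.

```python
# Assumed mapping: 0 is newline, 1..26 are 'a'..'z'.
ix_to_char = {0: "\n"}
ix_to_char.update({i + 1: chr(ord("a") + i) for i in range(26)})

y = [23, 5, 12, 12, 14, 8, 15, 6, 5, 18, 9, 1, 0]

# Look up each index and join the characters back into a string.
name = "".join(ix_to_char[i] for i in y)
print(repr(name))  # 'wellnhoferia\n'
```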