W1 A2 Dinos: Are the params optimized per name or per letter?

The Dinos assignment takes one word (a dino name) at a time and then processes that word one letter at a time. For each letter, the parameters are computed by the function optimize (which is called from the function model), and the same is repeated for the next letter until the end of the word. After each letter, new (better) parameters are computed.

The same procedure is then repeated for the next word. I am wondering whether it is important to pass one word at a time to the function optimize. Would the resulting parameters be worse if we gave optimize one very long word, formed by concatenating all the dino names with just '\n' separating them?

The function optimize updates the parameters for each letter, not for each word. So perhaps it does not matter whether it receives one very long word or one name at a time, does it?

The assignment builds a character-level language model whose goal is to predict the next character. This choice makes sense to me: we want to generate a name, dino names are unique at the word level (based on my knowledge of dino names), and there is no relationship between one name and the next.
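To make "predict the next character" concrete, here is a minimal sketch of how one name could be turned into an input sequence and a target sequence shifted by one character. The helper names build_vocab and make_pairs are made up for this illustration, and the leading None (standing for an all-zero first input) and the '\n' end marker are assumptions about the setup, not the assignment's exact code.

```python
def build_vocab(names):
    """Map every character that occurs in the names, plus '\n', to an index."""
    chars = sorted(set("".join(names) + "\n"))
    return {ch: i for i, ch in enumerate(chars)}


def make_pairs(name, char_to_ix):
    """Return (X, Y) for one name.

    X starts with None (assumed placeholder for a zero input vector),
    Y is X shifted left by one step and ends with the '\n' index,
    so at every time step the target is the next character.
    """
    X = [None] + [char_to_ix[ch] for ch in name]
    Y = X[1:] + [char_to_ix["\n"]]
    return X, Y


names = ["trex", "rex"]          # toy data, not the real dataset
char_to_ix = build_vocab(names)  # {'\n': 0, 'e': 1, 'r': 2, 't': 3, 'x': 4}
X, Y = make_pairs("rex", char_to_ix)
print(X)  # [None, 2, 1, 4]
print(Y)  # [2, 1, 4, 0]
```

Each name yields its own short (X, Y) pair, and the final target '\n' is what teaches the model where a name ends.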

Recurrent layers like RNN / LSTM can handle long range dependencies, but they have limitations on how well they can learn on very long sequences.
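As a rough illustration of the sequence-length point, the sketch below compares how many time steps a single training sequence has when names are fed one at a time versus as one concatenated string. The function sequence_lengths and the toy names are hypothetical, and counting one extra step per name for the '\n' target is an assumption about the setup.

```python
def sequence_lengths(names):
    """Compare time steps per sequence: one name at a time vs. one long string."""
    # One sequence per name: its characters plus one step for the '\n' target.
    per_name = [len(name) + 1 for name in names]
    # One sequence for everything: names joined by '\n', plus a trailing '\n'.
    concatenated = len("\n".join(names)) + 1
    return per_name, concatenated


names = ["trex", "rex", "raptor"]  # toy data, not the real dataset
per_name, concatenated = sequence_lengths(names)
print(per_name)      # [5, 4, 7]
print(concatenated)  # 16
```

The total number of characters seen is the same either way, but with the concatenated string a single forward and backward pass has to span every name at once, so the sequence length grows with the size of the dataset instead of staying at the length of one name.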

Please keep this question in mind when you learn about attention mechanisms later in the course and reply to this topic should you still have doubts once you cover those lectures.