Conceptual aspect of Dinosaurus Island -- Character-level language model

Hello, I am a little confused about the conceptual aspect of the assignment (i.e., the way it generates characters, or the way the character-level language model generates text).

  1. First of all, the loss function is not clear to me. The notebook says: "Update the loss by subtracting the cross-entropy term of this time-step from it." Can you please explain?
  2. Please correct me if I am wrong. The network actually tries to learn the probability of generating a character given the characters that occurred in previous positions in the dataset. Essentially, it's a probabilistic model learned from the given names.
  3. When the value of the loss function is reduced, doesn't that mean the model has learned the probability distribution of the characters found in the example names?
  4. Can you provide a real-life example of a language model?

Thanks in advance.


Hello Shaikat,

Great to see your questions, because they help everyone understand the assignments better. I will try to answer your points as best I can:

1.- Sorry, I did not find that exact statement in my notebook version, but my understanding is that it refers to computing the cross-entropy loss. Cross-entropy measures the difference between two probability distributions: in our case, the distribution over the next character predicted by the softmax layer and the one-hot distribution of the desired character. For a one-hot label, the cross-entropy term at one time step reduces to the negative log of the probability the model assigned to the correct character. In code the update is written as `loss -= log(y_hat[correct])`; since the log of a probability is negative, subtracting it adds a positive cross-entropy term to the running loss.
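To make that concrete, here is a minimal sketch (with a made-up 5-character vocabulary and made-up probabilities, not the assignment's actual values) of the per-time-step loss update:

```python
import numpy as np

# Toy softmax output at one time step: predicted probabilities over a
# hypothetical 5-character vocabulary.
y_hat = np.array([0.10, 0.50, 0.15, 0.20, 0.05])

# Suppose the desired (label) character is the one at index 1.
correct_idx = 1

# For a one-hot label, cross-entropy reduces to the negative log of the
# probability assigned to the correct character.
step_loss = -np.log(y_hat[correct_idx])

# The running loss update from the notebook's wording, "subtract the
# cross-entropy term": loss -= log(p) is the same as loss += -log(p).
loss = 0.0
loss -= np.log(y_hat[correct_idx])

print(round(step_loss, 4))  # -ln(0.5) ~= 0.6931
```

The better the model's probability for the correct character, the closer `-log(p)` is to zero, which is why minimizing this loss pushes probability mass onto the right characters.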
2.- The goal is to train the RNN to predict the next letter in the name, so the labels are the list of characters that are one time-step ahead of the characters in the input X.
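A small sketch of that one-step-ahead labeling, following the assignment's convention of representing characters as indices (the name `"trex"` and the newline index used here are just illustrative placeholders):

```python
# Hypothetical example of building X and Y for one training name.
name = "trex"
char_to_ix = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}

# X starts with None (treated as a zero input vector at the first step),
# followed by the indices of the name's characters.
X = [None] + [char_to_ix[ch] for ch in name]

# Y is X shifted one time-step ahead, ending with a newline index that
# marks the end of the name (placeholder index 26 stands in for "\n").
ix_newline = 26
Y = X[1:] + [ix_newline]

print(X)  # [None, 19, 17, 4, 23]
print(Y)  # [19, 17, 4, 23, 26]
```

So at every step the label is simply "the next character of the name", which is what makes this a next-character prediction task.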
3.- Correct. The model learns, from the training samples, which combinations of characters are most probable given the previous characters, and a lower loss means its predicted distribution matches the data better.
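That learned distribution is also what drives generation: at sampling time the next character is drawn at random, weighted by the model's probabilities, rather than always taking the most likely one. A tiny sketch with made-up probabilities (not the assignment's actual `sample()` function):

```python
import numpy as np

# Toy softmax output over a hypothetical 5-character vocabulary.
probs = np.array([0.05, 0.60, 0.10, 0.20, 0.05])

# Draw the next character index at random, weighted by the learned
# probabilities; frequent character combinations in the training names
# therefore appear more often in the generated names.
rng = np.random.default_rng(0)
next_idx = rng.choice(len(probs), p=probs)
print(next_idx)  # an index in [0, 5), most often 1
```

Sampling (instead of always picking the argmax) is what lets the model produce varied, "new" names rather than the single most probable string every time.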
4.- Well, this assignment is an example of a creativity tool that generates "new" dinosaur names. Since these names do not exist, it is difficult to validate the results, but the same idea is useful in NLP scenarios where you want to generate creative content not seen before. The Transformer architectures behind GPT-2 and GPT-3 do something similar, right? They generate new pieces of text based on a lot of text samples, paying attention to words that often occur together.

Happy learning.
Best,