Hello, I am a little confused about the conceptual aspect of the assignment (i.e., the way it generates characters, or the way a character-level language model generates text).
- First of all, the loss function is not clear to me. The instructions say to "update the loss by subtracting the cross-entropy term of this time-step from it." Can you please explain?
- Please correct me if I am wrong: the network tries to learn the probability of generating a character given the characters that occurred in the previous positions of the sequence. Essentially, it is a probabilistic model learned from the given names.
- When the value of the loss function is reduced, does that not mean the model is learning the probability distributions of the characters found in the example names?
- Can you give me a real-life example of a language model?
Thanks in advance.
Hello Shaikat,
Great to see your questions, because they help us all understand the assignment better. I will try to answer your points as best I can:
1.- Sorry, I did not find that statement in my version of the notebook. My understanding is that you are computing the cross-entropy loss. Cross-entropy measures the mismatch between two probability distributions, in our case over characters: the distribution the softmax predicts for the next character and the one concentrated on the desired character. At each time step, the cross-entropy term is the negative log of the probability the model assigns to the correct character, so "subtracting" it from the loss accumulates the total negative log-likelihood over the sequence.
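For concreteness, here is a minimal numpy sketch of that update (the names `y_hat` and `Y` are illustrative; `y_hat[t]` is assumed to be the softmax output at time step t and `Y[t]` the index of the true next character):

```python
import numpy as np

def cross_entropy_loss(y_hat, Y):
    """Accumulate the cross-entropy loss over all time steps.

    y_hat -- list of softmax outputs, one per time step;
             y_hat[t] is a vector of probabilities over the vocabulary
    Y     -- list of integer indices of the true (next) characters
    """
    loss = 0.0
    for t in range(len(Y)):
        # Subtract the log-probability of the correct character:
        # this adds -log(p) to the loss, i.e. the cross-entropy
        # term for this time step.
        loss -= np.log(y_hat[t][Y[t]])
    return loss
```

So "subtracting the cross-entropy term" simply means adding -log p(correct character) to the running loss; the more probability the model puts on the right character, the smaller the loss.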
2.- The goal is to train the RNN to predict the next letter in the name, so the labels are the characters that are one time step ahead of the characters in the input X.
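A quick sketch of that shifting, with an assumed vocabulary of lowercase letters plus "\n" as the end-of-name token (the `char_to_ix` mapping here is hypothetical, for illustration):

```python
import string

# Hypothetical vocabulary: lowercase letters plus "\n" marking end of name
chars = sorted(set(string.ascii_lowercase + "\n"))
char_to_ix = {ch: i for i, ch in enumerate(chars)}

name = "velociraptor"
X = [char_to_ix[ch] for ch in name]     # inputs
Y = X[1:] + [char_to_ix["\n"]]          # labels: inputs shifted one step ahead
# At time step t, the model reads X[t] and is trained to predict Y[t],
# i.e. the character that comes next in the name.
```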
3.- Correct. The model learns, from the training samples, the probability of each character given the previous characters, i.e., the most probable combinations of characters in the given names.
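To make that concrete, here is a rough sketch of how generation could sample from the learned distribution at each step (the `forward_step` helper and its signature are assumptions for illustration, not the notebook's exact API):

```python
import numpy as np

def sample_name(forward_step, char_to_ix, ix_to_char, hidden, max_len=50):
    """Generate one name by repeatedly sampling the next character
    from the model's distribution p(next char | previous chars).

    forward_step -- assumed helper: given the previous character index
                    (or None at the start) and the hidden state, returns
                    (probs, new_hidden), where probs is the softmax
                    output over the vocabulary
    """
    newline = char_to_ix["\n"]   # end-of-name token
    x, name = None, []
    for _ in range(max_len):
        probs, hidden = forward_step(x, hidden)
        # Draw the next character index from the predicted distribution
        ix = np.random.choice(len(probs), p=probs)
        if ix == newline:
            break
        name.append(ix_to_char[ix])
        x = ix
    return "".join(name)
```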
4.- Well, this is an example of a creativity tool to generate "new" dinosaur names, but these names do not exist, so it is difficult to validate the results. I would say it is interesting for NLP scenarios where you want to generate creative content not seen before. The transformer architectures behind GPT-2 and GPT-3 are trying to do that, right? They generate new pieces of text based on a lot of text samples, paying attention to words that often go together.
Happy learning.
Best,
Rosa