Conceptual aspect of Dinosaurus Island -- Character level language model

Hello, I am a little confused about the conceptual aspect of the assignment (i.e., the way it generates each character, or the way the character-level language model generates text).

  1. First of all, the loss function is not clear: "Update the loss by subtracting the cross-entropy term of this time-step from it." Can you please explain?
  2. Please correct me if I am wrong. The network actually tries to learn the probability of generating a character given the characters that occurred at previous positions, based on the dataset. Essentially, it's a probabilistic model learned from the given names.
  3. When the value of the loss function is reduced, does that not mean the model is learning the probability distribution of the characters found in the example names?
  4. Can you provide a real-life example of a language model?

Thanks in advance.


Hello Shaikat,

Great to see your questions, because they help to better understand the assignments. I will try to answer your points as best I can:

1.- Sorry, I did not find that exact statement in my version of the notebook, but my understanding is that you are computing the cross-entropy loss. Cross-entropy compares two probability distributions; in our case, the softmax output of the RNN (a probability distribution over all possible next characters) and the actual next character. The "subtracting" wording is just how it looks in code: at each time step you subtract the log of the probability the model assigned to the correct next character from the running loss, which is the same as adding that time-step's cross-entropy term -log(y_hat[correct character]). There is a short sketch of this after the list.
2.- The goal is to train the RNN to predict the next letter in the name, so the labels Y are simply the characters of the input X shifted one time-step ahead (also shown in the sketch below).
3.- Correct. The model learns from the previous characters and the training samples, from which it picks up the most probable combinations of characters.
4.- Well, this is an example of a creativity tool to generate "new" dinosaur names, but these names do not exist, so it is difficult to validate the results. I would say it would be interesting in NLP scenarios where you want to generate new creative content not seen before. The transformer architectures in GPT-2 and GPT-3 are trying to do that, right? They generate new pieces of text based on a lot of text samples, paying attention to words that often go together.
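To make points 1 and 2 concrete, here is a minimal NumPy sketch. It is not the notebook's exact code: the 27-character vocabulary matches the assignment, but the hidden size of 50, the random untrained weights, and the example name are just assumptions for illustration. It builds X and Y for one name (Y is X shifted one step ahead, ending in the newline token) and accumulates the loss by subtracting the log-probability of the correct next character at each time step, which is exactly the per-step cross-entropy term:

```python
import numpy as np

# Character vocabulary: 'a'-'z' plus '\n' as the end-of-name token.
chars = sorted(list("abcdefghijklmnopqrstuvwxyz") + ["\n"])
char_to_ix = {ch: i for i, ch in enumerate(chars)}
vocab_size = len(chars)  # 27

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Inputs X and labels Y for one training example:
# Y is X shifted one time-step ahead, ending with the newline token.
name = "tyrannosaurus"          # hypothetical example name
X = [None] + [char_to_ix[ch] for ch in name]   # None -> zero vector at t=0
Y = [char_to_ix[ch] for ch in name] + [char_to_ix["\n"]]

# Tiny RNN with random (untrained) parameters, hidden size 50,
# just to show how the loss is accumulated: one cross-entropy term per step.
n_a = 50
rng = np.random.default_rng(0)
Wax = rng.standard_normal((n_a, vocab_size)) * 0.01
Waa = rng.standard_normal((n_a, n_a)) * 0.01
Wya = rng.standard_normal((vocab_size, n_a)) * 0.01
ba, by = np.zeros((n_a, 1)), np.zeros((vocab_size, 1))

a_prev = np.zeros((n_a, 1))
loss = 0.0
for t in range(len(X)):
    x_t = np.zeros((vocab_size, 1))
    if X[t] is not None:
        x_t[X[t]] = 1                              # one-hot previous character
    a_prev = np.tanh(Wax @ x_t + Waa @ a_prev + ba)
    y_hat = softmax(Wya @ a_prev + by)             # distribution over next char
    # "Subtract the cross-entropy term of this time-step":
    # loss -= log(probability assigned to the correct next character),
    # i.e. loss += -log(y_hat[Y[t]]), the cross-entropy for this step.
    loss -= np.log(y_hat[Y[t], 0])

print(f"loss for one example: {loss:.2f}")  # roughly len(Y) * log(27) for random weights
```

During training, gradient descent reduces this sum, which means the softmax distributions assign more and more probability to the characters that actually follow in the training names, which is your point 3.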

Happy learning.
Best,

Rosa