C4W1: EOS token has very low probability

Hi, I am having a strange problem with my trained model where the translated sentences don’t end. For example:


And:

I verified with print statements after the generate_next_token call in Exercise 5 that next_token is indeed never eos_id, so done is never (or very rarely) set to True.

Note that this issue/problem (not exactly an error) was not caught by either the unit tests or the grader, which gave me 100%. Any idea what’s going on?

My best guess is that there’s some problem with either the training data or my model implementation that’s preventing the model from seeing the EOS token during training, i.e. the EOS token is missing from somewhere it should be, or present somewhere it shouldn’t be.
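
For reference, the kind of loop being described looks roughly like the sketch below. This is only an illustration: generate_next_token_stub and logits_fn are hypothetical stand-ins, since the notebook’s real generate_next_token has a different signature and also handles decoder state.

```python
import tensorflow as tf

def generate_next_token_stub(logits_fn, context, current_token):
    """Hypothetical stand-in for the notebook's generate_next_token."""
    logits = logits_fn(context, current_token)    # shape (1, vocab_size)
    return int(tf.argmax(logits, axis=-1)[0])     # greedy choice

def translate_sketch(logits_fn, context, sos_id, eos_id, max_length=50):
    """Greedy decoding loop: `done` should flip to True once EOS is produced."""
    next_token, tokens, done = sos_id, [], False
    for _ in range(max_length):
        next_token = generate_next_token_stub(logits_fn, context, next_token)
        if next_token == eos_id:   # the check that the print statements show never firing
            done = True
            break
        tokens.append(next_token)
    return tokens, done

# Toy check with a fake decoder that always predicts token 3:
fake_logits = lambda context, tok: tf.one_hot([3], depth=10) * 10.0
print(translate_sketch(fake_logits, context=None, sos_id=1, eos_id=3))  # ([], True)
```

If done never becomes True even for short, simple inputs, the usual suspects are the training labels not ending with EOS or the loop comparing against the wrong eos_id.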


Hi @jakeb

Ensure that the EOS token is correctly included in your training sequences. Each sequence should have an EOS token at the end.
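
One quick way to check this (a sketch, assuming the label sequences are padded integer-ID tensors and that you know eos_id and the padding ID) is to look at the last non-padding token of every target row:

```python
import tensorflow as tf

def ends_with_eos(target_ids, eos_id, pad_id=0):
    """Per sequence: does the last non-padding token equal EOS?"""
    mask = tf.cast(target_ids != pad_id, tf.int32)            # 1 where real tokens
    last_idx = tf.reduce_sum(mask, axis=1) - 1                 # index of last real token
    last_tok = tf.gather(target_ids, last_idx, batch_dims=1)   # last real token per row
    return last_tok == eos_id

# Two padded example target rows, with eos_id = 2 and padding 0:
targets = tf.constant([[5, 7, 9, 2, 0, 0],
                       [4, 6, 2, 0, 0, 0]])
print(ends_with_eos(targets, eos_id=2))   # [ True  True ]
```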

Also, this issue can happen when your model only ever chooses the most probable token. Double-check the logic in your generate_next_token function and how you append tokens to the generated sequence.

Hope it helps! Feel free to ask if you need further assistance.


Hi @jakeb

Is your issue resolved?

The right-shifted Portuguese sentence tokenizations do not have an EOS token, but the English and non-right-shifted Portuguese sentences do:

I just used the training data as provided, because it’s not suggested anywhere that we modify it at all. Would that be the issue?
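
For what it’s worth, a split like that matches the usual teacher-forcing setup: the shifted decoder input keeps SOS and drops EOS, while the label keeps EOS and drops SOS. A minimal sketch with made-up IDs, not the notebook’s exact code:

```python
import tensorflow as tf

SOS_ID, EOS_ID = 1, 2
# Full tokenized target sentence, wrapped with SOS/EOS (made-up IDs):
full = tf.constant([[SOS_ID, 5, 7, 9, EOS_ID]])

decoder_input = full[:, :-1]   # right-shifted input: keeps SOS, drops EOS
labels        = full[:, 1:]    # training labels: drop SOS, keep EOS

print(decoder_input.numpy())   # [[1 5 7 9]]
print(labels.numpy())          # [[5 7 9 2]]
```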

“Also, this issue can happen when your model only ever chooses the most probable token.” In general, sure, but that is clearly not the issue here. There is no way that in a properly trained model the most likely token after “eu adoro idiomas” (translating “I love languages”) is anything other than the EOS token, even using greedy selection and zero temperature. It is certainly not “eu”, as in the first screenshot (“I love languages I”).

Also, the generate_next_token function is provided and not editable.

I don’t have access to the course notebooks, but adding <s> (Start of Sentence) and </s> (End of Sentence) tokens could help standardize the training data. Then, the model properly learns where sentences begin and end, reducing the likelihood of inappropriate token predictions.
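
In TensorFlow, one common way to do that (a sketch, not the notebook’s exact preprocessing; the marker strings and vectorizer settings here are assumptions) is to wrap each sentence with markers inside the standardize step of TextVectorization:

```python
import tensorflow as tf

def add_markers(text):
    # Simplified standardization: lowercase, trim, and wrap with explicit markers.
    text = tf.strings.strip(tf.strings.lower(text))
    return tf.strings.join(["[SOS]", text, "[EOS]"], separator=" ")

vectorizer = tf.keras.layers.TextVectorization(standardize=add_markers, max_tokens=5000)
vectorizer.adapt(["Eu adoro idiomas .", "Obrigado ."])
print(vectorizer(["Eu adoro idiomas ."]))  # the ID sequence starts with [SOS] and ends with [EOS]
```

The exact marker strings (e.g. <s>/</s> or [SOS]/[EOS]) don’t matter, as long as they are used consistently at training time and at decoding time.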

For more specific assistance, kindly share your code with me in Private Messages!

Can you send, via personal DM, a screenshot of the graded cell just previous to the cell where you are encountering the issue?

Chances are that in your translate graded cell you might have mixed up some code.

In case the other mentors can’t resolve it, you can send it to me, as I am an NLP mentor and have access to the course.

Hi @jakeb

Corrections required in your code:

  1. When converting the original string into a tensor, you are supposed to convert texts and not text to a tensor, since the instruction refers to the original string.

  2. The same goes for the next line of code, where you vectorize: you need to use texts there as well (see the sketch after this list).

Remember, in both of the lines above you are using only text, which is text (string): the sentence to translate; but the instructions refer to the original string, hence texts.

  3. For the next two corrections, refer to the comment below.
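
If it helps, the distinction might look something like the sketch below; the variable names and the vectorizer call are assumptions, not the notebook’s exact code. The single string text is wrapped into a batch-of-one tensor texts, and the later calls should then operate on texts:

```python
import tensorflow as tf

def to_batched_tensor(text):
    """Wrap one raw string (`text`) into a batch-of-one string tensor (`texts`)."""
    texts = tf.convert_to_tensor(text)[tf.newaxis]   # shape () -> shape (1,)
    return texts

texts = to_batched_tensor("I love languages.")
print(texts.shape)   # (1,)
# A tokenizer / vectorizer would then be applied to `texts`, not `text`:
# context = vectorizer(texts)
```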

Regards
DP

Hi, I have encountered the same problem and am struggling to correct it…
By “you are supposed to convert texts and not text to a tensor”, do you mean at the beginning of the translate function definition? I don’t see how that is any different from the case where every “texts” is “text”.

I also don’t understand why “for i in range(max_length)” would not work, since the name “token” is not used in the loop and technically I should be able to choose any label for the iteration variable, such as “_”.
I changed them anyway and the result is still missing eos_ids.

@VictorAildom, I did not understand @Deepti_Prasad’s comments either. I eventually gave up and moved on, since neither the unit tests nor the grader flagged the problem.

I have also come across a similar issue: the translator from Exercise 5 seems to fail to recognise the EOS token. I have tried debugging in multiple ways, going through the suggestions from different posts in this forum, but I still can’t get my head round why the generated output keeps repeating itself.

(My work passed all the unit tests and the grader returned 100% as well.)
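
One way to confirm what the thread title suggests, i.e. that the model really is assigning EOS a tiny probability, is to probe the per-step logits directly. A sketch, assuming you can get the logits for the current decoding step and know eos_id:

```python
import tensorflow as tf

def eos_probability(logits, eos_id):
    """Probability the decoder assigns to EOS at the current decoding step."""
    probs = tf.nn.softmax(logits, axis=-1)   # logits: shape (1, vocab_size)
    return float(probs[0, eos_id])

# Made-up logits for a 6-token vocabulary, with eos_id = 2:
step_logits = tf.constant([[0.1, 2.0, -3.0, 0.5, 1.2, 0.0]])
print(eos_probability(step_logits, eos_id=2))   # a very small number, as in the thread title
```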

Kindly send the code in question via personal DM.

Hi @jonkl2024

The comment thread below mentions all the errors and corrections you need to make in your code.

Also, your code has the same issues as the post creator’s:

When converting the original string into a tensor, you are supposed to convert texts and not text to a tensor, since the instruction refers to the original string.

The same goes for the next line of code, where you vectorize: you need to use texts.

Refer to the linked comment, which explains this part.

Next, your code is incorrect for the iteration step, as you have left it blank with _; it requires you to use the iteration variable.

Next, the check for whether the next token is EOS has been written incorrectly; refer to the linked comment, which has an image showing how to write it.

Regards
DP