Hi @Deepti_Prasad!
My understanding is that the different results are due to:
- inherent randomness in the training of a neural network (the random weight initialization, the optimizer - Adam in this case - and the order in which the batches are passed during training)
- this particular model is still (a bit) underfit when trained for 20 epochs. (Both points are illustrated in the toy sketch below.)
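A minimal, purely illustrative sketch (plain NumPy, not the course model): two runs on identical data that differ only in their random initialization end up with different weights, and 20 epochs of gradient descent are not enough for them to converge to the same solution:

```python
import numpy as np

def train(seed, epochs=20):
    rng = np.random.default_rng(seed)
    X = np.linspace(0.0, 1.0, 50)          # identical data in every run
    y = 3.0 * X + 1.0                      # target: w=3, b=1
    w, b = rng.normal(), rng.normal()      # only the initialization differs
    for _ in range(epochs):                # plain gradient descent on MSE
        pred = w * X + b
        w -= 0.1 * 2 * np.mean((pred - y) * X)
        b -= 0.1 * 2 * np.mean(pred - y)
    return w, b

print(train(seed=0))  # these weights differ from run to run because
print(train(seed=1))  # neither run has fully converged after 20 epochs
```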
My guess is that if the training continued for some more epochs, our results would look more similar, though still not identical (e.g., the logit values would still differ slightly, but the tokens and words would be the same).
If the training were done for actual experiments/production (and not for educational purposes), metrics would be monitored to decide when to stop training - e.g., the relationship between training loss and validation loss, and the ROUGE score, since the final task is translation.
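As an illustration, assuming a Keras-style model (the actual lab may be set up differently; `model`, `train_ds`, and `val_ds` here are placeholders):

```python
import tensorflow as tf

# Stop when validation loss stops improving, instead of using a fixed epoch count.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # tracks the train/validation-loss relationship
    patience=3,                  # tolerate 3 epochs without improvement
    restore_best_weights=True,   # roll back to the best weights seen
)

# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])
# ROUGE (or BLEU, also common for translation) would be computed separately on
# decoded validation outputs, e.g. once per epoch in a custom callback.
```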
Regarding the initial issue that @Fred_Hannoyer encountered: it is possible that everything was done correctly (which is why the tests raised no errors), and although the training seemed to go fine, the weights that were learned were simply not very good (underfit, actually). That would explain why restarting the notebook, rerunning it, and retraining the model solved the issue - the model just learned better weights that time.
Your understanding of logits is correct, and because, as you describe, it is a matter of one class having the highest value, the logits at each step do not have to be identical across runs to generate the same tokens through softmax - they just have to be higher than the logits for the other classes.
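A tiny demonstration with made-up logit values: the two "runs" produce different logits, but the argmax (and therefore the generated token) is the same:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())    # shift for numerical stability
    return e / e.sum()

logits_run1 = np.array([1.2, 4.7, 0.3])   # hypothetical values from run 1
logits_run2 = np.array([0.9, 5.1, 0.5])   # different values from run 2

print(softmax(logits_run1), np.argmax(logits_run1))  # class 1 wins
print(softmax(logits_run2), np.argmax(logits_run2))  # class 1 wins again
```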
Regarding your question about translation models not being universal: if I understand correctly that you are asking whether there is some randomness in the outputs for the same input, this is controlled by the temperature. For temperature = 0, the outputs (for the same model and the same input) should always be identical; the higher the temperature, the more randomness is introduced.
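To illustrate (toy logits and plain NumPy, not any particular library's sampler):

```python
import numpy as np

def sample(logits, temperature, rng):
    if temperature == 0:                   # greedy decoding: always the argmax
        return int(np.argmax(logits))
    z = np.asarray(logits) / temperature   # higher temperature flattens the distribution
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))    # sample a token id

rng = np.random.default_rng(0)
logits = [1.0, 3.0, 0.5]
print([sample(logits, 0, rng) for _ in range(5)])    # [1, 1, 1, 1, 1] - deterministic
print([sample(logits, 1.5, rng) for _ in range(5)])  # mostly 1, occasionally 0 or 2
```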
If you were asking something else, please correct me.
Regarding the tokens, I think they are actually OK: “eu” is always 9, “adoro” is 564, “idiomas” is 850. Then, when the words differ, the corresponding token ids differ as well.
Deepti, since the answer is pretty long and contains a lot of info, feel free to tag any other mentor to review it.
PS: the randomness in the notebook could be removed by setting a random seed - maybe we should open an issue to do this.
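A hedged sketch of what that could look like at the top of the notebook (which of these calls actually matter depends on the framework the lab uses):

```python
import os
import random
import numpy as np

SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)   # hash-based randomness in Python
random.seed(SEED)                          # Python's built-in RNG
np.random.seed(SEED)                       # NumPy's global RNG

# If the notebook uses TensorFlow:
# import tensorflow as tf
# tf.keras.utils.set_random_seed(SEED)     # seeds Python, NumPy and TF at once
# If it uses PyTorch:
# import torch
# torch.manual_seed(SEED)

# Caveat: some GPU ops are nondeterministic even with all seeds set, so tiny
# differences can remain between runs on GPU.
```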