C4W1 NMT with Attention(tensorflow) Assignment, Exercise 5 - translate - generate "eu eu eu "

When trying to translate "I love languages", my translate function fails miserably, producing "eu eu eu … 50x", even though I am passing all previous steps and their individual tests, and the model trains successfully for 20 epochs with a converging cost.



Lab ID: tgdixhjgmckd
Notebook: /notebooks/C4W1_Assignment.ipynb#ex4


Hi @Fred_Hannoyer,

are you also passing the unit test for Exercise 5, i.e. the cell with

w1_unittest.test_translate(translate, trained_translator)

?


I am passing all the tests (I even get 100% on the lab), but I know something is not working.


Could you check what happens if you “Restart and Run All” in the Kernel tab of the notebook?

Your training does indeed seem to have gone fine. For reference, the values in my notebook are:

[screenshot of training output]

and I get only a couple of "eu" for temp = 0.0.

Also, consider adding "Exercise 5" to the title; it makes it easier to find the part of the notebook where the issue occurs.


Check whether you used eng_sentence or texts in the Exercise 5 translate function.

Or you can share your Exercise 5 translate code by DM with Anna or me.

Also, one big difference is using tf.zeros (which should be used) instead of tf.random.
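For reference, a minimal NumPy sketch of what a zero-initialized decoder state looks like. The shapes and unit count here are hypothetical illustrations; in the notebook itself this would be done with tf.zeros:

```python
import numpy as np

def initial_decoder_state(batch_size, units):
    """Zero-initialized (hidden, cell) state for the decoder LSTM.

    Zeros make inference deterministic; seeding the state with random
    values would inject noise that changes every translation run.
    (Hypothetical shapes for illustration only.)
    """
    hidden = np.zeros((batch_size, units), dtype=np.float32)
    cell = np.zeros((batch_size, units), dtype=np.float32)
    return hidden, cell

h, c = initial_decoder_state(batch_size=1, units=256)
```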


After [Restart and Run All] under Kernel and going through the 20-epoch training, IT WORKED. Thank you :slight_smile: I now have the right translations for Exercise 5, and the final mbr_decode is working.

As I suffered from not having enough "Expected Output" for the intermediate states, I will share mine below wherever none is specified in the lab, for the next student (I believe this respects the code of conduct):

3 - Training

4. Using the model for inference

[screenshot of expected output]

Exercise 5 - translate

[screenshots of expected output]

5. Minimum Bayes-Risk Decoding

[screenshot of expected output]

def weighted_avg_overlap(samples, log_probs, similarity_fn):

[screenshot of expected output]

mbr_decode

[screenshot of expected output]


Maybe I went a bit too fast.
The translation "eu adoro idiomas aqui mundial." is still surprising: "I love languages here worldwide."
And the ones at the mbr as well
eu adoro idiomas ainda tem algumas linguas .
adoro idiomas vem eu algumas linguas .
eu adoro idiomas ainda tem a estante de idade .
eu adoro idiomas ainda tem comida .
eu adoro linguas perto de frequencia .
eu adoro idiomas ainda tem alguma idiomas .
eu adoro idiomas ainda tem pouco idiomas .
eu adoro idiomas ainda esta idiomas .
eu eu adoro idiomas nunca dorme .
eu adoro idiomas ainda tem algumas linguas .

  1. I love languages, there are still some languages.
  2. I love languages, come I some languages.
  3. I love languages, there is still the age shelf.
  4. I love languages, there is still food.
  5. I love languages close to frequency.
  6. I love languages, there are still some languages.
  7. I love languages, there are still few languages.
  8. I love languages, still is languages.
  9. I, I love languages, never sleep.
  10. I love languages, there are still some languages.
and the one selected:
eu adoro idiomas ainda tem algumas linguas. => "I love languages, there are still some languages."
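For the next reader, here is a minimal NumPy sketch of the MBR idea shown above: each candidate is scored by its probability-weighted overlap with the other samples, and the highest-scoring (most "consensus") one is selected. The similarity function and the toy samples below are illustrative, not the assignment's exact code:

```python
import numpy as np

def rouge1_similarity(a, b):
    # Hypothetical unigram-overlap similarity between two token lists
    # (a Dice-style score over the sets of tokens).
    a_set, b_set = set(a), set(b)
    return 2 * len(a_set & b_set) / (len(a_set) + len(b_set))

def weighted_avg_overlap(samples, log_probs, similarity_fn):
    # Score each sample by its probability-weighted similarity to every
    # other sample; MBR decoding then picks the highest-scoring one.
    probs = np.exp(np.array(log_probs))
    scores = {}
    for i, s in enumerate(samples):
        total, weight = 0.0, 0.0
        for j, t in enumerate(samples):
            if i == j:
                continue
            total += probs[j] * similarity_fn(s, t)
            weight += probs[j]
        scores[i] = total / weight
    return scores

# Toy candidates; the middle one is an outlier with low log-probability.
samples = [["eu", "adoro", "idiomas"],
           ["eu", "adoro", "linguas"],
           ["eu", "adoro", "idiomas", "."]]
log_probs = [-0.2, -1.5, -0.4]
scores = weighted_avg_overlap(samples, log_probs, rouge1_similarity)
best = samples[max(scores, key=scores.get)]  # the consensus translation
```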

I did initialize the state (hidden, cell) with tf.zeros in translate(),
and I used text, which is the input parameter, not eng_sentence or texts.
(PS: texts and eng_sentence refer to the previous function's test, generate_next_token; not sure I understand.)
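For context, the shape of such a translate loop can be sketched as below. The helper names are hypothetical, but it shows why a model that never emits the end-of-sequence token keeps generating until the cap is hit (hence "eu eu eu … 50x"):

```python
import numpy as np

def translate_sketch(next_token_fn, eos_id, max_len=50):
    """Hypothetical skeleton of a greedy decode loop: start from a
    zero-initialized (hidden, cell) state, generate up to max_len
    tokens, and stop early only when EOS is produced."""
    state = (np.zeros((1, 256), dtype=np.float32),
             np.zeros((1, 256), dtype=np.float32))
    tokens = []
    for _ in range(max_len):
        token, state = next_token_fn(tokens, state)
        if token == eos_id:
            break
        tokens.append(token)
    return tokens

# Toy next-token function that emits 9, 564, 850 and then EOS (id 2).
seq = iter([9, 564, 850, 2])
out = translate_sketch(lambda toks, st: (next(seq), st), eos_id=2)
```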


Anna, I tried, but I don't seem to be able to modify the thread title to add "Exercise 5".
My original post is no longer editable, and I don't know how to modify the thread title either.


Next to the header there is a small :memo: symbol; when you click it, you can edit the title. For now, I will edit the header here.


Hello @Anna_Kay

Your shared training output and temperature-0.0 image raised a doubt for me: why is there a difference in the logits and translation tokens between you and the learner? Mine also differs from both, with a logit output of Logit: -1.539.

Can you tell me why, when all three of our original sentences were "I love languages", we still got different translations and translation tokens?

Based on my understanding, logits help compute the probabilities of the output classes through the softmax function.
The higher the value of a logit for a particular class, the higher the probability of that class being the correct output.
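That understanding can be checked with a tiny NumPy example. The logit values below are made up for illustration (only the first echoes the -1.539 mentioned above): two models can produce different logit values and still select the same token, because the argmax depends only on the relative ordering of the logits, not their exact values:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = np.array(logits, dtype=np.float64) - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Two hypothetical models, different logit values, same winning class (index 0).
probs_a = softmax([-1.539, -3.2, -4.0])
probs_b = softmax([-1.1, -2.9, -3.8])
```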

Does this mean translation models are not universal, even between two similar attention models?

Also, I noticed a difference in the translation tokens. Does that mean our three tokenizers are working at a different pace? I came across another post, about a short course, which mentioned that omitting a part of the code can cause fast tokenization.

Sorry if I have asked too many questions.

Regards
DP


Hi @Deepti_Prasad!

My understanding is that the different results are due to:

  1. the fact that there is inherent randomness in the training of a neural network (different initialization parameters, the optimizer - Adam in this case, the order in which the batches are passed for training)
  2. this particular model is still (a bit) underfit when training for 20 epochs.

My guess is that if the training continued for some more epochs, our results would look more similar, but still not identical (e.g., the logit values would still differ slightly, but the tokens and words would be the same).

If the training were done for actual experiments/production (and not for educational purposes), there would be monitoring of metrics to decide when to stop the training (e.g., monitoring the relationship between train loss and validation loss, and the ROUGE score, since the final task is translation).
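As an illustration of such a stopping rule (not the assignment's code), a simple patience-based check on the validation loss could look like this; in Keras the ready-made equivalent is the tf.keras.callbacks.EarlyStopping callback:

```python
def should_stop(val_losses, patience=3):
    """Simple early-stopping rule: stop when the validation loss has
    not improved over the best earlier value for `patience`
    consecutive epochs. Illustrative sketch only."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before

# Loss keeps improving -> keep training; loss plateaus -> stop.
keep_going = should_stop([2.0, 1.5, 1.2, 1.0, 0.9])
plateaued = should_stop([2.0, 1.0, 1.1, 1.05, 1.2])
```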

Regarding the initial issue that @Fred_Hannoyer encountered, it is possible that everything was done correctly (that's why there were no errors in the testing), and although the training seemed to have gone fine, the weights that were learned were not very good (actually underfit). That would be why restarting and rerunning the notebook, and retraining the model, solved the issue: it simply learned better weights this time.

Your understanding of logits is correct, and because, as you describe, it is a matter of a higher value for a particular class, the logits at each step do not have to be identical to generate the same tokens through softmax; they just have to be higher than the logits for the other classes.

Regarding your question about translation models not being universal: if I understand correctly that you are asking whether there is some randomness in the outputs for the same input, this is controlled by the temperature. For temperature = 0, the outputs (for the same model and the same input) should always be identical. The higher the temperature, the more randomness is introduced.
If you were asking something else, please correct me.
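A minimal NumPy sketch of how temperature controls this (illustrative, not the notebook's exact generate_next_token): temperature 0 means a deterministic argmax, while higher temperatures flatten the softmax distribution and sample from it:

```python
import numpy as np

def sample_token(logits, temperature, rng):
    """Temperature sampling sketch: temperature 0 is greedy argmax
    (deterministic); temperature > 0 samples from the softmax of the
    temperature-scaled logits, adding randomness."""
    logits = np.array(logits, dtype=np.float64)
    if temperature == 0.0:
        return int(np.argmax(logits))
    scaled = logits / temperature
    scaled -= scaled.max()  # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(logits), p=probs))

rng = np.random.default_rng(0)
logits = [-1.5, -2.9, -0.7]
greedy = [sample_token(logits, 0.0, rng) for _ in range(5)]   # always the same
sampled = [sample_token(logits, 0.7, rng) for _ in range(5)]  # can vary per call
```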

Regarding the tokens, I think they are actually OK: "eu" is always 9, "adoro" 564, "idiomas" 850. When the words differ, the corresponding numbers also differ.

Deepti, since the answer is pretty long and contains a lot of info, feel free to tag any other mentor to review it. :nerd_face:

PS: the randomness in the notebook could be removed by using a random seed; maybe we should open an issue to do this.
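For example, seeding makes random draws reproducible. A NumPy sketch of the idea (the TensorFlow counterpart is tf.random.set_seed):

```python
import numpy as np

# Two generators created with the same seed produce identical draws,
# which is what would make the notebook's training runs reproducible.
a = np.random.default_rng(42).normal(size=3)
b = np.random.default_rng(42).normal(size=3)
```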


Hello @Anna_Kay

But I noticed the difference between our tokens and Fred's; in fact, the logit values for all three of us differ. Here I am sharing all three of our logit, translation, and translation-token outputs for temperature 0.0.

Fred's translate output at temp 0.0

[screenshot of Fred's output]

==========================================================

Anna's translate output at temp 0.0

[screenshot of Anna's output]

==========================================================

My translate output at temp 0.0

[screenshot of my output]

Regards
DP


It is because all 3 of us have slightly different models; although the training process is the same, the weights of the resulting models are not identical (due to the randomness).

So this is not a case of same model & same input, we actually have 3 different models (3 models with different weights), same input, but 3 different outputs.

You can also check that the training for the three of us does not end up at the exact same values; see val_masked_acc and val_masked_loss.

Temperature = 0.0 guarantees that every time you run the same model with the same input, you get the same result. You can check this by rerunning the cell with temp = 0.0.

On the other hand, rerunning the cell with temp = 0.7 will give a different output for the same input and the same model each time it is run.


Thank you :blush:
