Hi @Deepti_Prasad!
My understanding is that the different results are due to:
- inherent randomness in the training of a neural network (the random weight initialization, the optimizer - Adam in this case - and the order in which the batches are passed during training)
- this particular model is still (a bit) underfit when trained for 20 epochs. (Both points are illustrated in the toy sketch below.)
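A minimal, purely illustrative sketch (plain NumPy, not the course model): two runs on identical data that differ only in their random initialization end up with different weights, and 20 epochs of gradient descent are not enough for them to converge to the same solution:

```python
import numpy as np

def train(seed, epochs=20):
    rng = np.random.default_rng(seed)
    X = np.linspace(0.0, 1.0, 50)          # identical data in every run
    y = 3.0 * X + 1.0                      # target: w=3, b=1
    w, b = rng.normal(), rng.normal()      # only the initialization differs
    for _ in range(epochs):                # plain gradient descent on MSE
        pred = w * X + b
        w -= 0.1 * 2 * np.mean((pred - y) * X)
        b -= 0.1 * 2 * np.mean(pred - y)
    return w, b

print(train(seed=0))  # these weights differ from run to run because
print(train(seed=1))  # neither run has fully converged after 20 epochs
```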
My guess is that if the training continued for some more epochs, our results would look more similar, though still not identical (e.g., the logit values would still differ slightly, but the tokens and words would be the same).
If the training were done for actual experiments/production (and not for educational purposes), metrics would be monitored to decide when to stop training - e.g., the relationship between training loss and validation loss, and the ROUGE score, since the final task is translation.
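As an illustration, assuming a Keras-style model (the actual lab may be set up differently; `model`, `train_ds`, and `val_ds` here are placeholders):

```python
import tensorflow as tf

# Stop when validation loss stops improving, instead of using a fixed epoch count.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # tracks the train/validation-loss relationship
    patience=3,                  # tolerate 3 epochs without improvement
    restore_best_weights=True,   # roll back to the best weights seen
)

# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])
# ROUGE (or BLEU, also common for translation) would be computed separately on
# decoded validation outputs, e.g. once per epoch in a custom callback.
```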
Regarding the initial issue that @Fred_Hannoyer encountered: it is possible that everything was done correctly (which is why the tests raised no errors), and although the training seemed to go fine, the weights that were learned were simply not very good (underfit, actually). That would explain why restarting the notebook, rerunning it, and retraining the model solved the issue - the model just learned better weights that time.
Your understanding of logits is correct, and because, as you describe, it is a matter of one class having the highest value, the logits at each step do not have to be identical across runs to generate the same tokens through softmax - they just have to be higher than the logits for the other classes.
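A tiny demonstration with made-up logit values: the two "runs" produce different logits, but the argmax (and therefore the generated token) is the same:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())    # shift for numerical stability
    return e / e.sum()

logits_run1 = np.array([1.2, 4.7, 0.3])   # hypothetical values from run 1
logits_run2 = np.array([0.9, 5.1, 0.5])   # different values from run 2

print(softmax(logits_run1), np.argmax(logits_run1))  # class 1 wins
print(softmax(logits_run2), np.argmax(logits_run2))  # class 1 wins again
```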
Regarding your question about translation models not being universal: if I understand correctly that you are asking whether there is some randomness in the outputs for the same input, this is controlled by the temperature. For temperature = 0, the outputs (for the same model and the same input) should always be identical; the higher the temperature, the more randomness is introduced.
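To illustrate (toy logits and plain NumPy, not any particular library's sampler):

```python
import numpy as np

def sample(logits, temperature, rng):
    if temperature == 0:                   # greedy decoding: always the argmax
        return int(np.argmax(logits))
    z = np.asarray(logits) / temperature   # higher temperature flattens the distribution
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))    # sample a token id

rng = np.random.default_rng(0)
logits = [1.0, 3.0, 0.5]
print([sample(logits, 0, rng) for _ in range(5)])    # [1, 1, 1, 1, 1] - deterministic
print([sample(logits, 1.5, rng) for _ in range(5)])  # mostly 1, occasionally 0 or 2
```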
If you were asking something else, please correct me.
Regarding the tokens, I think they are actually OK: “eu” is always 9, “adoro” is 564, “idiomas” is 850. Then, when the words differ, the corresponding token ids differ as well.
Deepti, since the answer is pretty long and contains a lot of info, feel free to tag any other mentor to review it.
PS: the randomness in the notebook could be removed by setting a random seed - maybe we should open an issue to do this.
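A hedged sketch of what that could look like at the top of the notebook (which of these calls actually matter depends on the framework the lab uses):

```python
import os
import random
import numpy as np

SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)   # hash-based randomness in Python
random.seed(SEED)                          # Python's built-in RNG
np.random.seed(SEED)                       # NumPy's global RNG

# If the notebook uses TensorFlow:
# import tensorflow as tf
# tf.keras.utils.set_random_seed(SEED)     # seeds Python, NumPy and TF at once
# If it uses PyTorch:
# import torch
# torch.manual_seed(SEED)

# Caveat: some GPU ops are nondeterministic even with all seeds set, so tiny
# differences can remain between runs on GPU.
```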