Week 4 Transformer Network: package versions

If you are modifying the code to run with the current versions of the packages, then you may well get different numeric results. Note that with TF you generally can’t get identical results from run to run by default, even when you set the random seeds. The issue is that training is parallelized, and the order in which the parallel pieces of work complete is fundamentally non-deterministic. They may also have changed the behavior of the parallelization logic in later versions. There is a flag you can set to get deterministic results (in TF 2.8 and later, `tf.config.experimental.enable_op_determinism()`), but it basically disables most of the parallelization, so it slows everything down. Here’s a post from mentor Raymond that explains this point.
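To see why the order of parallel work matters at all, here is a minimal sketch in plain Python (no TF involved). Floating point addition is not associative, so splitting the same sum across "workers" can change the last bits of the result. The helper names `left_to_right` and `two_workers` are made up for this illustration and are not part of any library.

```python
import math

def left_to_right(xs):
    """Add the values one at a time, in order, like a single thread would."""
    total = 0.0
    for x in xs:
        total += x
    return total

def two_workers(xs):
    """Mimic a parallel reduction: each half is summed separately, then combined."""
    mid = len(xs) // 2
    return left_to_right(xs[:mid]) + left_to_right(xs[mid:])

values = [0.1, 0.2, 0.3]

a = left_to_right(values)  # computes (0.1 + 0.2) + 0.3
b = two_workers(values)    # computes 0.1 + (0.2 + 0.3)

print(a == b)              # False: same numbers, different grouping
print(math.isclose(a, b))  # True: they differ only in the last bits
```

A real training run does millions of these reductions per step, and the tiny differences get fed back through the weight updates, so they can grow into visibly different numbers.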

Or they could simply have changed the implementation in other, more direct ways that affect the numerical precision of the outputs. Of course, even if the new results are more numerically accurate, they would still be “different” from the old ones.
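If you are checking your own results against a reference from an older version, one option is to compare with a tolerance rather than requiring exact equality. This is only a sketch: the helper `all_close`, the tolerance values, and the sample numbers are all made up for illustration and do not come from the assignment.

```python
import math

def all_close(expected, actual, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if both sequences have the same length and match element-wise within tolerance."""
    if len(expected) != len(actual):
        return False
    return all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )

# Hypothetical values: a reference output and a rerun that differs slightly.
reference = [0.2716, -1.0432, 3.5001]
rerun = [0.27160001, -1.04319998, 3.50010002]

print(reference == rerun)           # False: exact comparison fails
print(all_close(reference, rerun))  # True: equal within the chosen tolerance
```

How loose the tolerance should be depends on the model and on how many training steps are involved, so treat the defaults here as placeholders.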
