Also note that even if you use Tom’s method to always start from the same state, the results are still not deterministic, even if you don’t change the code. I ran it three times from the “reset” state with the Add() implementation and got three different values for the Test Accuracy:
- 0.85
- 0.866666
- 0.766666
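Even when you set the random seeds for the PRNG algorithms, the results are still not deterministic, because the training is parallelized across multiple CPUs and GPUs. Parallelism is inherently non-deterministic, since exactly how the threads get scheduled depends on everything else that’s happening on the computer at the same time. That matters because floating-point addition is not associative: if the partial results from different threads are combined in a different order, the rounding errors differ, so the computed values can differ slightly from run to run, and those small differences compound over many training iterations. There are ways to artificially constrain the computation to be deterministic, but then you lose most of the advantages of parallelization and pay a real price in performance. Here’s a thread from mentor Raymond that discusses this point in much more detail.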
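To see why the order of combining results matters, here is a minimal pure-Python sketch. It is not how TensorFlow or a GPU actually schedules work; the list of "partial sums" and the shuffle standing in for thread scheduling are illustrative assumptions. The first part shows that the same three numbers give different totals depending only on the order in which they are added.

```python
import random

def sequential_sum(values):
    """Add values left to right, the way a single worker would accumulate them."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same three numbers, accumulated in two different orders.
order_a = [1e16, -1e16, 1.0]  # the large terms cancel first, so 1.0 survives
order_b = [1e16, 1.0, -1e16]  # 1.0 is rounded away when added to 1e16

print(sequential_sum(order_a))  # 1.0
print(sequential_sum(order_b))  # 0.0

# A rough stand-in for parallel training: many partial results (think of
# per-thread chunks of a gradient sum) combined in whatever order the
# simulated "threads" happen to finish on each run.
rng = random.Random(0)
partials = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]

totals = set()
for run in range(5):
    finish_order = partials[:]
    random.Random(run).shuffle(finish_order)  # simulated scheduling order
    totals.add(sequential_sum(finish_order))

print(f"{len(totals)} distinct totals from 5 runs over identical inputs")
```

The inputs are identical in every run; only the summation order changes. Differences like these are tiny at first, but in training they feed into the next weight update, which is one plausible way to end up with noticeably different test accuracies.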