At the end of the U-Net assignment, I plotted the model's training accuracy across 40 epochs. I noticed that the training accuracy increased gradually from epoch 1 to 30, but then dropped sharply at epoch 31, as shown in the screenshot. I'm wondering what could cause this drop, and how I should interpret this observation.
Hi, I see the same behavior - not always, but regularly. I ran this notebook both locally and on Colab, and the issue reproduces in both. Some suggest it is caused by unlucky shuffling, but it would be nice to get an answer from the mentors, please.
Yes, I have run a lot of experiments and I get varying behavior from the training here. My first thought was that the notebook doesn't set any random seeds anywhere, so maybe that's what causes the variability. But if I add:
tf.random.set_seed(42)
either early in the notebook or right before the model is instantiated, I still get varying results from the training. I read through the documentation on TF's "global" versus "operation-level" seeds, and my understanding from those documents is that setting the global seed as above should give reproducible results.
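For completeness, here is a slightly stricter seed block than the single line above (a sketch; 42 is arbitrary). tf.random.set_seed only fixes TF's global seed, so Python's and NumPy's RNGs still have their own seeds:

```python
import random
import numpy as np
import tensorflow as tf

SEED = 42

random.seed(SEED)         # Python's built-in RNG
np.random.seed(SEED)      # NumPy's global RNG
tf.random.set_seed(SEED)  # TF's global seed; op-level seeds are derived from it

# On TF >= 2.7 the three calls above can be replaced with one:
# tf.keras.utils.set_random_seed(SEED)
```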
I don't have enough samples yet to really conclude that the variability is smaller with the seed fixed, but that is the impression I get. Either way, the results still definitely vary.
I have not yet tried printing the initialized weights to confirm that my seed logic actually produces the same initialization each time, but that is my reading of this TF doc page.
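For anyone who wants to check that part without running the full notebook, here is a toy sketch; build_toy_model is just a stand-in for the assignment's model builder:

```python
import numpy as np
import tensorflow as tf

def build_toy_model():
    # Stand-in for the assignment's model builder; any small Keras model
    # works for checking whether the global seed fixes the initialization.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

tf.random.set_seed(42)
w1 = build_toy_model().get_weights()

tf.random.set_seed(42)
w2 = build_toy_model().get_weights()

# If the global seed controls initialization, this prints True.
print(all(np.array_equal(a, b) for a, b in zip(w1, w2)))
```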
So my next theory is that model.fit with Adam optimization has its own source of randomness that is not affected by setting the seed, but I have not had time to pursue that yet. The next step is to read the documentation on model.compile and model.fit carefully.
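One concrete source of per-run randomness I want to rule out is data shuffling rather than Adam itself. A toy sketch of the two knobs I have in mind (the data and model here are stand-ins, not the notebook's):

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(42)

# Toy stand-ins for the notebook's data and model, just to isolate shuffling.
x = np.random.rand(64, 8).astype("float32")
y = np.random.randint(0, 2, size=(64, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# model.fit shuffles array inputs every epoch by default (shuffle=True);
# turning it off removes one source of run-to-run variation.
model.fit(x, y, epochs=3, batch_size=16, shuffle=False, verbose=0)

# If the notebook feeds a tf.data pipeline instead, the shuffle argument above
# is ignored; the relevant knob is the dataset's own shuffle, which takes a seed.
ds = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(64, seed=42).batch(16)
model.fit(ds, epochs=3, verbose=0)
```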
The solution surfaces of neural networks are incredibly complicated, so there is never any guarantee that you'll get smooth, monotonic convergence from gradient descent: you can go off a cliff at any iteration, especially with a fixed learning rate. The internal algorithms that TF uses in model.fit presumably apply sophisticated techniques for modulating the learning rate based on the magnitudes of the gradients, but the evidence seems to show that even with state-of-the-art sophistication you can still get weird behavior on any given training run.
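Not part of the assignment as given, but if the cliff turns out to be a learning-rate effect, these are the knobs I would experiment with (the loss name and values are illustrative):

```python
import tensorflow as tf

# Adam's step size is configurable; a smaller value makes cliffs less likely
# at the cost of slower convergence.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

# Halve the learning rate whenever the training loss plateaus for 3 epochs.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="loss", factor=0.5,
                                                 patience=3)

# Hypothetical usage with the notebook's model and dataset (placeholders):
# model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(train_dataset, epochs=40, callbacks=[reduce_lr])
```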
With the seed set as described above, I started by doing:
1) Kernel → Restart and Clear Output
2) Cell → Run All
3) Go to 1).
When I still saw variations from run to run, I wondered whether there is some in-memory state of TF that is not cleared by "Kernel → Restart", so I beefed the procedure up to this:
1) Kernel → Restart and Clear Output
2) Cell → Run All
3) Kernel → Restart and Clear Output
4) Save
5) Close the notebook
6) Reopen from the "Work in Browser" link
7) Go to 1).
And I still see variability from run to run.
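On the "in-memory state" theory: Keras does keep some global state inside the Python process (layer name counters, cached graphs), and there is an API for clearing it without restarting the kernel. A sketch; unet_model is a placeholder for whatever builder the notebook actually uses:

```python
import tensorflow as tf

# Clears Keras's global state (layer name counters, cached graphs) so that
# rebuilding the model in the same kernel starts from a clean slate.
tf.keras.backend.clear_session()

# Re-set the seed after clearing, then rebuild and retrain as usual.
tf.random.set_seed(42)
# model = unet_model(...)   # placeholder for the notebook's model builder
```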
When I get time, I will dig into the question of whether model.fit can be made deterministic or, failing that, why it isn't even with the global seed set.
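For anyone else digging into the same question, newer TF releases expose an explicit determinism switch. A sketch of what I plan to try (version notes per the TF docs; I have not validated this against this notebook yet):

```python
import tensorflow as tf

# TF >= 2.7: sets the Python, NumPy, and TF seeds in one call.
tf.keras.utils.set_random_seed(42)

# TF >= 2.8: forces deterministic GPU/CPU kernels. Training may run slower,
# and ops with no deterministic implementation will raise an error.
tf.config.experimental.enable_op_determinism()

# On somewhat older TF 2.x versions, setting the environment variable
# TF_DETERMINISTIC_OPS=1 (before TF touches the GPU) requests the same thing.
```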