In the lesson and in the original training loop, we first train the discriminator and then the generator.

I went ahead and ran an experiment where I switched this: in each iteration of the loop, I trained the generator first and then the discriminator. Empirically, the generator's losses look a bit better with this new order:

Original:
Epoch 3, step 1500: Generator loss: 3.9927062740325936, discriminator loss: 0.024210385922342533

Experiment 1:
Epoch 3, step 1500: Generator loss: 2.067182400345801, discriminator loss: 0.30279242809861917
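For reference, the swapped order looks roughly like this. This is a standalone toy sketch, not the assignment's code: `gen`, `disc`, the dimensions, and the data are all placeholders.

```python
import torch
from torch import nn

torch.manual_seed(0)
z_dim, im_dim, batch = 16, 32, 8

# Stand-in models (the real assignment uses deeper networks)
gen = nn.Linear(z_dim, im_dim)
disc = nn.Linear(im_dim, 1)
gen_opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

real = torch.randn(batch, im_dim)  # stand-in for a real batch

# --- Generator step FIRST (the swapped order) ---
gen_opt.zero_grad()
fake_for_gen = gen(torch.randn(batch, z_dim))
gen_loss = criterion(disc(fake_for_gen), torch.ones(batch, 1))  # gen wants disc to say "real"
gen_loss.backward()
gen_opt.step()

# --- Discriminator step second, on a fresh fake ---
disc_opt.zero_grad()
fake_for_disc = gen(torch.randn(batch, z_dim)).detach()  # detach: no grads into gen here
disc_loss = (criterion(disc(fake_for_disc), torch.zeros(batch, 1))
             + criterion(disc(real), torch.ones(batch, 1))) / 2
disc_loss.backward()
disc_opt.step()
```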

Then I ran Experiment 2:
In the lecture and in the original training algorithm, the discriminator is trained on a fake image FAKE_1 from the generator, and the generator is then trained using a new fake image FAKE_2. In short, in each cycle the generator produces two images: one to train the discriminator and one to train itself. In Experiment 2, I used the same fake image for both updates.
The empirical result: the generator's losses are even better with this configuration:

Original:
Epoch 3, step 1500: Generator loss: 3.9927062740325936, discriminator loss: 0.024210385922342533

Experiment 2:
Epoch 3, step 1500: Generator loss: 1.6950061668455585, discriminator loss: 0.3041314163953065
Epoch 21, step 10000: Generator loss: 0.7340306047201158, discriminator loss: 0.6788859988451004
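Concretely, the single-fake variant can be sketched like this (again a standalone toy with placeholder models, shown here in the discriminator-first order; the same reuse works either way). The one subtlety is detaching the fake for the discriminator update, so no gradients flow back into the generator there, while the generator update reuses the same tensor with its graph intact:

```python
import torch
from torch import nn

torch.manual_seed(0)
z_dim, im_dim, batch = 16, 32, 8

gen = nn.Linear(z_dim, im_dim)   # placeholder generator
disc = nn.Linear(im_dim, 1)      # placeholder discriminator
gen_opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
real = torch.randn(batch, im_dim)

# Generate ONE fake batch and reuse it for both updates
fake = gen(torch.randn(batch, z_dim))

# Discriminator update: .detach() so this step doesn't touch gen's weights
disc_opt.zero_grad()
disc_loss = (criterion(disc(fake.detach()), torch.zeros(batch, 1))
             + criterion(disc(real), torch.ones(batch, 1))) / 2
disc_loss.backward()
disc_opt.step()

# Generator update: the SAME fake, now with gradients flowing through gen
gen_opt.zero_grad()
gen_loss = criterion(disc(fake), torch.ones(batch, 1))
gen_loss.backward()
gen_opt.step()
```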

I wonder whether this is just silly tinkering, or if there is something to be said about it:

  • Is it important to train the discriminator first and then the generator, or is the order irrelevant?
  • Is it important to use two different fake images, or is it irrelevant?



Interesting, @Juan_Olano!
I would have guessed, especially for Experiment #1, that it wouldn't make a significant difference. You're going back and forth between the generator and the discriminator thousands and thousands of times during training, so it seems surprising that it would matter whether you started with the generator or the discriminator.

But that's just what I would have thought intuitively. Maybe I'm overlooking something, which your results suggest. One thing, though: did you make sure to choose "Refresh and Clear Outputs" between each of your test cases, so that every test started from the same starting point?

Hi @Wendy, thank you for taking the time to answer my ticket!

Regarding your question about refreshing and clearing outputs: yes, absolutely, I did that. Relatedly, PyTorch accumulates gradients (not outputs) across iterations, so if they are not zeroed out each step, every batch would be affected by the gradients of all previous batches.
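For anyone else reading: here is a minimal standalone illustration of that accumulation behavior (toy code, not from the assignment). It's the `.grad` field that keeps summing across `backward()` calls until it is zeroed:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(2 * w).backward()          # d(2w)/dw = 2
assert w.grad.item() == 2.0

(2 * w).backward()          # without zeroing, gradients ADD UP: 2 + 2
assert w.grad.item() == 4.0

w.grad.zero_()              # this is what optimizer.zero_grad() does per parameter
(2 * w).backward()          # now only the current pass contributes
assert w.grad.item() == 2.0
```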

I let the original and both experiments run for a few thousand iterations, and in the end they were actually pretty similar, so maybe it doesn't really matter that much, at least for this application.

Thanks again,