Hi!
In the lesson and in the original training code, we first train the Discriminator and then the Generator.
I went ahead and ran an experiment that swaps this: in the loop, I trained the Generator first and then the Discriminator (see the sketch below the numbers). Empirically, the generator loss looks a bit better with this new order.
Original:
Epoch 3, step 1500: Generator loss: 3.9927062740325936, discriminator loss: 0.024210385922342533
Experiment 1:
Epoch 3, step 1500: Generator loss: 2.067182400345801, discriminator loss: 0.30279242809861917
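For reference, here is a minimal, self-contained sketch of what I mean by the swapped order. The toy MLP models and random data are just stand-ins for the course's actual architecture and dataloader, and names like gen, disc, gen_opt are mine, not necessarily the assignment's:

```python
# Experiment 1 sketch: Generator step first, then Discriminator step.
# Toy setup; the real experiment uses the course's models and data.
import torch
from torch import nn

z_dim, img_dim, batch_size = 64, 784, 128
gen = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, img_dim))
disc = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, 1))
gen_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)
disc_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

for step in range(100):
    real = torch.randn(batch_size, img_dim)  # stand-in for a real batch

    # --- Generator first: fool the current Disc ---
    gen_opt.zero_grad()
    fake_2 = gen(torch.randn(batch_size, z_dim))
    gen_pred = disc(fake_2)
    gen_loss = criterion(gen_pred, torch.ones_like(gen_pred))
    gen_loss.backward()
    gen_opt.step()

    # --- Discriminator second, on a fresh detached fake (still two
    # fakes per cycle, as in the original; only the order is swapped) ---
    disc_opt.zero_grad()
    fake_1 = gen(torch.randn(batch_size, z_dim)).detach()
    fake_pred, real_pred = disc(fake_1), disc(real)
    disc_loss = (criterion(fake_pred, torch.zeros_like(fake_pred))
                 + criterion(real_pred, torch.ones_like(real_pred))) / 2
    disc_loss.backward()
    disc_opt.step()
```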
Then I did Experiment 2:
In the lecture and in the original training algorithm, the Disc is trained on a FAKE_1 image from the Gen, and the Gen is trained on a new FAKE_2 image. In short, in each cycle the Gen generates two images: one to train the Disc and one to train itself. In my Experiment 2 I used the same image for both updates (sketch after the results).
The empirical result: the Gen losses are even better with this configuration:
Original:
Epoch 3, step 1500: Generator loss: 3.9927062740325936, discriminator loss: 0.024210385922342533
Experiment 2:
Epoch 3, step 1500: Generator loss: 1.6950061668455585, discriminator loss: 0.3041314163953065
Epoch 21, step 10000: Generator loss: 0.7340306047201158, discriminator loss: 0.6788859988451004
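And a sketch of Experiment 2, reusing the toy setup from the block above and assuming the original Disc-then-Gen order: a single fake per cycle, detached for the Disc step so that its graph stays intact for the Gen step:

```python
# Experiment 2 sketch: one fake serves both updates.
# Reuses gen, disc, optimizers, and criterion from the previous sketch.
for step in range(100):
    real = torch.randn(batch_size, img_dim)  # stand-in for a real batch

    # Single fake for this cycle
    fake = gen(torch.randn(batch_size, z_dim))

    # Disc step on fake.detach(): no gradients flow into the Gen here,
    # and the Gen's computation graph survives for the Gen step below
    disc_opt.zero_grad()
    fake_pred, real_pred = disc(fake.detach()), disc(real)
    disc_loss = (criterion(fake_pred, torch.zeros_like(fake_pred))
                 + criterion(real_pred, torch.ones_like(real_pred))) / 2
    disc_loss.backward()
    disc_opt.step()

    # Gen step reuses the very same fake, now judged by the
    # just-updated Discriminator
    gen_opt.zero_grad()
    gen_pred = disc(fake)
    gen_loss = criterion(gen_pred, torch.ones_like(gen_pred))
    gen_loss.backward()
    gen_opt.step()
```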
I wonder if this is just idle tinkering, or if there is something to be said about it:
- Is it important to train the Disc first and then the Gen, or is it irrelevant?
- Is it important to use two different fake images, or is it irrelevant?
Thanks!
Juan