I'm seeing some odd (to me, anyway) behavior from the AutoEncoding/Reconstruction loss and the Adam optimizer while building a VAE for anime faces.
To monitor training more closely, I modified the generate_and_save_images method so that, instead of a 4x4 grid of generated images, it outputs a 4x8 grid (4 rows of 8 images) with the following structure:
- Row 1: 8 images from the validation set.
- Row 2: Test of AutoEncoding/Reconstruction: the VAE's output for each of the images in Row 1.
- Row 3: Test of Image Generation: 8 copies of the decoder output when given the zero vector as input (i.e. the mean of the latent prior). This should give the 'average' face.
- Row 4: Test of Image Generation: decoder output for latent vectors randomly sampled from a Normal distribution (the same 8 random vectors are reused every epoch).
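For readers who want to reproduce this kind of monitoring, here is a minimal NumPy sketch of how the 4x8 grid could be assembled. It is not the actual modified generate_and_save_images: `build_monitor_grid` is a hypothetical helper, and `encode`/`decode` are hypothetical stand-ins for an encoder returning `(mean, logvar)` and a decoder returning images in [0, 1]. Row 2 here reconstructs from the posterior mean; decoding a reparameterized sample instead is an equally valid choice.

```python
import numpy as np


def build_monitor_grid(val_images, encode, decode, fixed_z, latent_dim):
    """Stack the four monitoring rows into one (4*H, 8*W, C) image.

    val_images: (8, H, W, C) validation images in [0, 1].
    encode: hypothetical callable, images -> (mean, logvar), each (8, latent_dim).
    decode: hypothetical callable, latent vectors (n, latent_dim) -> images (n, H, W, C).
    fixed_z: (8, latent_dim) random vectors drawn once and reused every epoch.
    """
    n = val_images.shape[0]
    mean, _ = encode(val_images)
    recon = decode(mean)                                  # Row 2: reconstructions
    zero_faces = decode(np.zeros((n, latent_dim)))        # Row 3: decode the prior mean
    samples = decode(fixed_z)                             # Row 4: fixed random samples
    rows = [val_images, recon, zero_faces, samples]
    # Place the 8 images of each row side by side, then stack the rows vertically
    return np.concatenate([np.concatenate(list(r), axis=1) for r in rows], axis=0)


# Usage with dummy (hypothetical) encoder/decoder stand-ins
rng = np.random.default_rng(0)
latent_dim = 16
fixed_z = rng.standard_normal((8, latent_dim))  # draw once, before training


def dummy_encode(x):
    return np.zeros((len(x), latent_dim)), np.zeros((len(x), latent_dim))


def dummy_decode(z):
    return np.full((len(z), 64, 64, 3), 0.5)


grid = build_monitor_grid(rng.random((8, 64, 64, 3)), dummy_encode, dummy_decode,
                          fixed_z, latent_dim)
print(grid.shape)  # (256, 512, 3)
```

Drawing `fixed_z` once outside the training loop is what makes Row 4 comparable across epochs.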
In addition to this grid of images, it also displays the current epoch, the step, and the AutoEncoding (i.e. Reconstruction) and Variational (i.e. KL) loss values.
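As a point of reference, the two displayed terms could be computed separately along these lines. This NumPy sketch assumes pixel values in [0, 1], a Bernoulli-style decoder scored with binary cross-entropy summed over pixels, and a diagonal-Gaussian encoder; the actual model may use a different reconstruction loss (e.g. MSE), and `vae_loss_terms` is a hypothetical name.

```python
import numpy as np


def vae_loss_terms(x, x_recon, mean, logvar, eps=1e-7):
    """Return (reconstruction_loss, kl_loss), each averaged over the batch.

    Assumes x and x_recon in [0, 1] with shape (batch, ...) and an encoder
    q(z|x) = N(mean, exp(logvar)) with mean/logvar of shape (batch, latent_dim).
    """
    x_recon = np.clip(x_recon, eps, 1.0 - eps)  # avoid log(0)
    pixel_axes = tuple(range(1, x.ndim))
    # Binary cross-entropy per pixel, summed over each image
    bce = -(x * np.log(x_recon) + (1.0 - x) * np.log(1.0 - x_recon))
    recon = bce.sum(axis=pixel_axes).mean()
    # Closed-form KL(N(mean, exp(logvar)) || N(0, I)), summed over latent dims
    kl = -0.5 * np.sum(1.0 + logvar - mean ** 2 - np.exp(logvar), axis=1).mean()
    return recon, kl
```

One consequence of this choice: with continuous pixel values, binary cross-entropy is minimized when `x_recon == x`, but that minimum is not zero (for a pixel value of 0.5 it is ln 2), so the absolute size of the reconstruction term depends on the images themselves and is best compared on the same validation batch.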
After about 76 epochs, it gives me the following result:
which is fine, as far as it goes. Shortly afterwards, however, it gives me this:
This puzzles me: the AutoEncoding/Reconstruction loss is marginally better, yet the reconstructions in the second row are significantly worse (they lose color and face orientation). If I let training continue, I end up with:
Here, the AutoEncoding (Reconstruction) loss has gotten slightly worse, but the Variational (KL) loss has exploded.
For comparison, an earlier output with a similar AutoEncoding/Reconstruction loss is this one:
Again, the AutoEncoding/Reconstruction loss term does not seem to reflect the quality of the reconstructed images accurately: the relative quality of the images at epochs 66 and 85 is not consistent with their respective losses.
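A toy example of one known reason a per-pixel loss can disagree with perceived quality: pixel-wise losses penalize small misalignments heavily and can prefer a flat, averaged output. The 8x8 stripe "image" below is made up for illustration, and this is offered as a possible contributor rather than a diagnosis of the model above.

```python
import numpy as np

# Made-up 8x8 grayscale "image": vertical stripes, one pixel wide
target = np.tile(np.array([0.0, 1.0]), (8, 4))  # shape (8, 8)

# Reconstruction A: the same stripe texture, offset by one pixel
shifted = np.roll(target, 1, axis=1)
# Reconstruction B: a flat gray image with no structure at all
flat_gray = np.full_like(target, target.mean())


def mse(a, b):
    """Mean squared error over all pixels."""
    return float(np.mean((a - b) ** 2))


print(mse(target, shifted))    # 1.0
print(mse(target, flat_gray))  # 0.25
```

By this pixel-wise measure the structureless gray image scores four times better than the offset stripes, which is the same direction of bias that can let washed-out reconstructions achieve a competitive loss.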
I suspect that the blow-up of the losses has something to do with instability in the Adam optimizer (if anyone agrees or disagrees, please let me know), but I do not understand why the AutoEncoding/Reconstruction loss term becomes such an inaccurate indicator of image quality.
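To reason about the Adam hypothesis, it may help to look at the update rule itself. Below is a minimal NumPy sketch of Adam with optional global-norm gradient clipping; the class and its `clip_norm` parameter are hypothetical and belong to this sketch only (Keras optimizers expose `clipnorm`/`global_clipnorm` arguments for a similar purpose). The default `eps=1e-7` follows Keras; the original paper uses 1e-8.

```python
import numpy as np


class Adam:
    """Minimal Adam for a list of float NumPy parameter arrays (updated in place).

    clip_norm (hypothetical): if set, gradients are rescaled together so their
    global L2 norm does not exceed it, before the moment estimates are updated.
    """

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7, clip_norm=None):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.clip_norm = clip_norm
        self.t = 0
        self.m = None
        self.v = None

    def step(self, params, grads):
        if self.m is None:
            self.m = [np.zeros_like(p) for p in params]
            self.v = [np.zeros_like(p) for p in params]
        if self.clip_norm is not None:
            norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
            if norm > self.clip_norm:
                grads = [g * (self.clip_norm / norm) for g in grads]
        self.t += 1
        for p, g, m, v in zip(params, grads, self.m, self.v):
            m[:] = self.b1 * m + (1 - self.b1) * g        # first-moment estimate
            v[:] = self.b2 * v + (1 - self.b2) * g ** 2   # second-moment estimate
            m_hat = m / (1 - self.b1 ** self.t)           # bias corrections
            v_hat = v / (1 - self.b2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

On the first step the update is about `lr` per parameter whatever the gradient's magnitude, since `m_hat / sqrt(v_hat)` is roughly the gradient's sign. Later, a sudden gradient spike after a long stretch of small gradients can push that ratio above 1, giving steps larger than `lr`; a lower learning rate or gradient clipping are commonly tried mitigations, though whether either helps in this case is untested.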



