Odd behavior of AutoEncode Loss function & Adam optimizer

I’m finding some odd (to me, anyway) behavior of the AutoEncoding/Reconstruction loss function and the Adam Optimizer, when building a VAE for Anime faces.

To better monitor my code, I modified the generate_and_save_images method so that, instead of printing out a 4x4 grid of generated images, it produces a 4x8 grid with the following structure:

  • Row 1: 8 images from the validation set.

  • Row 2: Test of AutoEncoding/Reconstruction - shows the VAE's output for each of the images in Row 1.

  • Row 3: Test of Image Generation - 8 copies of the VAE output when the decoder is given a zero vector as input (i.e., the mean of the latent distribution). This should give the 'average' face.

  • Row 4: Test of Image Generation - VAE output when given latent vectors randomly sampled from a normal distribution. (Note: the same 8 random vectors are used for every epoch.)

In addition to this grid of images, it also displays the current epoch, step, and the AutoEncoding (i.e., Reconstruction) and Variational (i.e., KL) losses.
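For reference, the two terms I'm logging can be computed as below. This is a minimal numpy sketch of the standard VAE loss, not the actual assignment code (which uses TensorFlow); the choice of MSE for the reconstruction term and the mean reduction are assumptions on my part:

```python
import numpy as np

def vae_losses(x, x_recon, mu, logvar):
    """Per-batch reconstruction (MSE) and KL losses for a diagonal-Gaussian VAE.

    Sketch only: the reduction (mean) and the use of MSE rather than
    binary cross-entropy are assumptions, not the course's exact choices.
    """
    # reconstruction: mean squared error over all pixels
    recon = np.mean((x - x_recon) ** 2)
    # closed-form KL divergence between N(mu, exp(logvar)) and N(0, I)
    kl = -0.5 * np.mean(np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
    return recon, kl
```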

After about 76 epochs, it gives me the following result:

which, is fine, as far as it goes. Shortly after, however, it gives me this:

This puzzles me, since the AutoEncoding/Reconstruction loss is marginally better, but the images in the second row are significantly worse (i.e., loss of color and face orientation). If I let the training proceed, I get to:

where the AutoEncoding (Reconstruction) loss has gotten slightly worse, but the Variational (KL) loss has exploded.

For comparison, an earlier output with a similar AutoEncoding/Reconstruction loss is this one:

Again, the AutoEncoding/Reconstruction loss term does not seem to accurately reflect the quality of the reconstructed images, since the relative qualities in epochs 66 & 85 are not consistent with their respective losses.

Although I suspect the losses blowing up has something to do with instability in the Adam optimizer (if anyone agrees/disagrees, please let me know), I do not understand why the AutoEncoding/Reconstruction loss term becomes such an inaccurate indicator of image quality.


Hi @Steven1

first of all, congrats on your generate_and_save_images function! Much better than the original one :slight_smile:

In a VAE, the values of the AutoEncoding and Variational losses don't need to be directly correlated with the quality of the generated images. In VAEs and GANs, the loss value can't be used as a measure of quality by itself. It's important to use other measures, like human perception, or metrics like the FID (Fréchet Inception Distance) between generated images and original ones.
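As a rough illustration of what FID computes, here is a hedged numpy sketch. It assumes you already have feature activations (normally from an Inception network) for real and generated images; it is not code from the course:

```python
import numpy as np

def _sqrtm_psd(m):
    # matrix square root of a symmetric positive semi-definite matrix
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def fid(act_real, act_fake):
    """Frechet Inception Distance between two sets of feature activations.

    Uses the symmetric form sqrtm(sqrt(C1) @ C2 @ sqrt(C1)), which has the
    same trace as sqrtm(C1 @ C2) but is numerically better behaved.
    """
    mu1, mu2 = act_real.mean(axis=0), act_fake.mean(axis=0)
    c1 = np.cov(act_real, rowvar=False)
    c2 = np.cov(act_fake, rowvar=False)
    s1 = _sqrtm_psd(c1)
    covmean = _sqrtm_psd(s1 @ c2 @ s1)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2.0 * covmean))
```

Lower is better: identical distributions give an FID near zero, and it grows as the generated distribution drifts away from the real one.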

Check this article if you want more information:

I see that the generated images seem to get worse at later epochs. I remember it was a really difficult and challenging assignment, and it's hard to know which problem, if any, you are having in your notebook.

Are you using BatchNormalization layers in the Encoder and Decoder? They can help with the stability of the model.
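Just to illustrate what BatchNormalization does for stability, here is a plain-numpy sketch of the training-time forward pass. It is not the Keras layer itself (which also tracks running statistics for inference and learns gamma/beta):

```python
import numpy as np

def batchnorm_forward(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-time batch norm over the batch axis: normalize each feature
    to zero mean / unit variance, then apply a scale and shift (gamma, beta,
    which are learned parameters in the real layer)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

Placing one of these after each Conv2D keeps activations in a consistent range from layer to layer, which is what helps the optimizer stay stable.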

Hope it can help!

I’m open to suggestions :slight_smile: please if someone thinks I’m wrong, just let me know :slight_smile:


Pretty informative, @Pere_Martra. I saw this post, but to be frank I had to revisit the VAE in order to give a more precise answer. Thinking about stability getting worse at later epochs, I was reminded of mode collapse in GANs; I have a feeling the problem here is related to something like that.

Hi @gent.spah, you are certainly right! I have more experience with GANs than with VAEs, but they share some problems, and instability is one of them. That's why I used a BatchNormalization layer after each Conv2D (as I remember).

Please, if you have another answer, feel free to share it! I think this is a complex field where different solutions and explanations can all be right.


@gent.spah & @Pere_Martra - I can see something analogous to mode collapse playing a part, since this really only happens once my variational loss drops below ~0.07. If I force my VAE to always place as much emphasis on the variational loss as on the reconstruction loss, I quickly get something that looks exactly like mode collapse.
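The weighting I'm describing is essentially the beta in a beta-VAE-style total loss, and a common way to control it is a KL warm-up schedule. A minimal sketch, where warmup_epochs and beta_max are assumed hyperparameters rather than course values:

```python
def kl_weight(epoch, warmup_epochs=20, beta_max=1.0):
    """Linear KL warm-up: start with no KL pressure, ramp to beta_max.

    warmup_epochs and beta_max are illustrative defaults, not values
    from the assignment.
    """
    if warmup_epochs <= 0:
        return beta_max
    return beta_max * min(1.0, epoch / warmup_epochs)

# the total loss per step would then be:
#   total = recon_loss + kl_weight(epoch) * kl_loss
```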

I recall Adam can get unstable if run too long (but I still need to hunt down the references). I don't think mode collapse would explain the explosion in the variational (KL) loss and the slower increase in the reconstruction loss. I would think mode collapse would only express itself by making my generated samples more similar to each other - which is also happening.
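For concreteness, this is the update rule in question, written in plain numpy. The epsilon term and optional gradient-norm clipping are the usual knobs people tune when Adam misbehaves late in a run; the hyperparameter values here are the common defaults, not necessarily what my run used:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
              eps=1e-8, clip=None):
    """One Adam update (t is 1-based). `clip` optionally caps the gradient
    norm, a common fix when training becomes unstable."""
    if clip is not None:
        norm = np.linalg.norm(grad)
        if norm > clip:
            grad = grad * (clip / norm)
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

The division by `sqrt(v_hat) + eps` is where tiny second-moment estimates can blow up the effective step size, which is one of the stock explanations for late-run instability.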

What mainly troubles me is how utterly unreliable the autoencoding loss becomes. The drop in reconstruction quality is fairly obvious if one notes the color bleaching. If I ever use a VAE on something I can't visually inspect, I could easily get into trouble.

I had another run where I saw this same effect, only without the variational loss exploding. Tragically, something crashed when I tried to download the files to my local machine, so I’ll have to try again.

PS - I am using BatchNorm in my VAE. I stuck pretty closely to the MNIST VAE, only making the suggested changes.