Poor reconstruction of images using autoencoder on STL10 dataset

Hi everyone, I am working on a project where I am trying to train an autoencoder on the STL10 dataset. Right now I am just experimenting with different architectures to get good reconstruction quality, but no matter which architecture I try, the reconstructions come out poor. The main issue is poor color reconstruction. I have tried using resnet18 as the encoder together with several decoders (10-15 layers, with and without skip connections).
Some details of the training procedure –

  1. using the ‘train+labelled’ split of STL10
  2. encoders used – resnet18 (the torchvision model, without pretrained weights) and smallAlexnet (link)
  3. decoders used – my own architectures (for both the resnet and alexnet encoders)
  4. hyperparameter values taken from the alexnet repo (link above) – batch size (b) = 512/768, lr = 0.12 * (b/256), lr decays by a factor of 0.1 at the 100th, 150th, and 180th epochs, total epochs = 200, latent dimension = 128, SGD optimizer
  5. currently using only pytorch’s nn.MSELoss() for the reconstruction loss
  6. center cropping images to 64x64
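In code, the optimizer and LR schedule above look roughly like this (a sketch of the listed hyperparameters, not my exact training script; the model is a stand-in):

```python
import torch
from torch import nn

batch_size = 512
base_lr = 0.12 * (batch_size / 256)   # linear LR scaling rule: 0.12 * (b/256)

model = nn.Linear(128, 128)           # stand-in for the actual autoencoder
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)
# decay by 0.1 at epochs 100, 150, 180
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150, 180], gamma=0.1)

recon_loss = nn.MSELoss()             # pixel-wise reconstruction loss

for epoch in range(200):
    # ... one epoch over 64x64 center-cropped STL10 batches ...
    scheduler.step()
```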

I have observed that the loss stagnates at about 0.007 around the 130-160th epoch. The reconstructed images have very poor color, while outline reconstruction is okay, though not great either. The FID score is 120-140.
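To put that loss value in perspective (my own back-of-the-envelope arithmetic, assuming pixels scaled to [0, 1]), an MSE of 0.007 corresponds to a per-pixel RMSE of roughly 8% of the full intensity range, which would be enough to visibly wash out colors even when outlines survive:

```python
import math

# An MSE of 0.007 on [0, 1]-scaled pixels implies a per-pixel RMSE of
# sqrt(0.007) ~= 0.084, i.e. about 8% of the intensity range.
rmse = math.sqrt(0.007)
print(round(rmse, 3))  # 0.084
```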

Any suggestions on what I may be doing wrong? I have tried experimenting with a lot of architectures, but the results aren’t improving (the rest of the training procedure stays the same; only the architecture changes).

Let me know if I should share the exact architecture of the decoder.


It would help if you shared a link to your work.
Do ask a GAN specialization mentor for help while you wait for a response.