While it feels intuitively right to start with low-resolution images and gradually increase the resolution, much like how we learn any topic by gradually increasing its complexity, I am not able to clearly put into words how it actually helps or how it even occurred to the authors.
Is it just pure experimentation, or is there a mathematical basis to it?
Hi, @Richeek_Arya! Thank you for your question.
What you’re referring to is called progressive growing. As mentioned in the video lecture, this technique of gradually increasing the image resolution during training was first proposed in ProgressiveGAN.
As for the intuition behind gradually increasing the input image resolution during training, I think there are a couple of reasons to do so (see also the short sketch after these two points).
More stable training. In one of the introductory lectures it was mentioned that it's critical for the generator and discriminator to learn in unison: the generator has the harder task of producing realistic output, whereas the discriminator only has to predict whether an image is real or fake. Starting training with lower-resolution images helps balance the learning of the generator and the discriminator.
Image as a hierarchy of features. As an analogy, think of deep convolutional networks, where the convolutional kernels in early layers learn coarse features (shapes, silhouettes, etc.), whereas kernels closer to the last layer learn fine features such as the high-frequency details of an image. Similarly, progressive growing aims to realistically produce low-level as well as high-level features.
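To make the first point a bit more concrete, here is a minimal sketch of a progressive-growing style resolution schedule. This is not the actual ProgressiveGAN code: the schedule values are arbitrary, and the real method also fades new layers into the generator and discriminator at each stage, which is only hinted at in the comments.

```python
import torch
import torch.nn.functional as F

# Illustrative resolution schedule: start coarse and grow toward the target size.
# The exact resolutions and the number of steps per stage are design choices,
# not the official ProgressiveGAN settings.
schedule = [4, 8, 16, 32, 64]

# Stand-in for a batch of full-resolution real images (batch, channels, H, W).
real_images = torch.randn(8, 3, 64, 64)

for resolution in schedule:
    # Downsample the real data to the resolution of the current training stage,
    # so the discriminator only has to judge coarse structure early on.
    real_lowres = F.interpolate(real_images, size=(resolution, resolution),
                                mode="bilinear", align_corners=False)
    # In a real implementation, new layers would be faded into both the
    # generator and the discriminator here, and training would run for many
    # steps at this resolution before moving to the next stage.
    print(resolution, real_lowres.shape)
```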
Hope this helps in your exploration of the matter. Please tag me if you have additional ideas or suggestions. Thank you.
Thanks for your response! I wanted to ask a couple of follow-up questions.
For point 1: That is based on intuition, right? Just checking if there is a mathematical basis to it as well.
For point 2: I have read that as well but never understood it. Could you please comment on how you arrived at this? Is it that we plot something at the end of every layer and deduce it?
There's definitely mathematical reasoning behind this; however, it's not something I can put into a single formula. Instead, please recall that in GANs we calculate two losses, one for the generator and one for the discriminator. Imagine a situation where the discriminator loss instantly drops (or explodes) and doesn't change throughout the training process. In such a case there's a high chance the generator won't be able to learn from the discriminator's feedback, and as a result GAN training becomes unstable. So there is a computational reason behind progressively growing the input images.
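To put a shape on those two losses, here is a minimal sketch using the standard binary cross-entropy GAN objective. The tiny `G` and `D` networks and batch sizes are placeholders, not a real architecture; the point is just what gets monitored during training.

```python
import torch
import torch.nn as nn

# Placeholder generator and discriminator, just to make the two losses concrete.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

bce = nn.BCEWithLogitsLoss()
real = torch.randn(32, 784)   # stand-in for a batch of real (flattened) images
z = torch.randn(32, 16)       # latent noise
fake = G(z)

# Discriminator loss: real images should score 1, generated images should score 0.
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))

# Generator loss: the generator wants its samples to be classified as real.
g_loss = bce(D(fake), torch.ones(32, 1))

# If d_loss collapses to ~0 (or explodes) and stays there, the discriminator's
# gradients give the generator almost no useful signal, which is exactly the
# instability described above.
print(d_loss.item(), g_loss.item())
```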
The input to the first layer is a normalised array of input pixels to which a set of convolution kernels is applied. The output of the first layer is a set of N (the number of output channels) feature maps that are fed to the next layer. As a result, the convolution kernels at each layer "learn" the features present in the feature maps they receive. As training advances, the convolution kernels across the layers produce a hierarchy of learned features.
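Regarding your "plot something at the end of every layer" question: yes, that is roughly what people do. A common trick is to register forward hooks and inspect the intermediate feature maps. Below is a rough sketch with a toy, untrained conv stack just to show the mechanics; in practice you would do this on a trained network.

```python
import torch
import torch.nn as nn

# Toy convolutional stack; in practice you would load a trained model.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # early layer: coarse features
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),  # deeper layer: finer features
)

feature_maps = {}

def save_output(name):
    # Forward hook that stores the feature maps produced by a given layer.
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()
    return hook

model[0].register_forward_hook(save_output("conv1"))
model[2].register_forward_hook(save_output("conv2"))

image = torch.randn(1, 3, 64, 64)   # stand-in for a normalised input image
model(image)

# Each entry is a set of feature maps (one per output channel) that you can
# plot layer by layer to see what kind of structure each layer responds to.
for name, fmap in feature_maps.items():
    print(name, fmap.shape)
```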
Hope I didn't confuse you even more. To summarise, and to answer your original question: there is a computational reason for progressively growing the images.
For a deeper dive on point 2) that Dmitriy is making here, one place to look would be the lecture “What Are Deep ConvNets Learning” from Prof Ng in DLS C4 W4. It’s also available on YouTube. There he describes some really interesting work that examines what is happening in the hidden layers of a trained ConvNet.
Hi, @Richeek_Arya! It's nice to see you are interested in a deeper understanding. While my colleagues have already made concise, good points, I would like to refer you to one of the most important models that highlighted the importance of upsampling: the U-Net.
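In case it helps, here is a rough sketch of the kind of upsample-plus-skip-connection block that the U-Net decoder is built around. This is a generic illustration with arbitrary channel counts, not the exact architecture from the U-Net paper.

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """Generic U-Net-style decoder block: upsample, concatenate the skip
    connection from the encoder, then refine with convolutions."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x, skip):
        x = self.up(x)                    # double the spatial resolution
        x = torch.cat([x, skip], dim=1)   # reuse fine detail from the encoder
        return self.conv(x)

# Quick shape check with dummy tensors.
block = UpBlock(in_ch=128, skip_ch=64, out_ch=64)
decoder_feat = torch.randn(1, 128, 16, 16)
encoder_skip = torch.randn(1, 64, 32, 32)
print(block(decoder_feat, encoder_skip).shape)   # -> (1, 64, 32, 32)
```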