Blocks for making a WGAN-GP to generate a fictional character

Hello, everyone. I’m trying to apply my recently acquired knowledge about GANs by building a WGAN-GP to generate a fictional character, but I have a very limited dataset (827 images) with a very broad range of image sizes and of ways in which this character is portrayed. With that said, I would like to know:

  • Is it easier to train the model if I resize the images to 64x64, or is it better to keep the resolution as high as possible (I was thinking of 256x256 at first)? I’d like to know of papers that use those resolutions, to get a better grasp of how many convolutions and transposed convolutions to use and what strides, paddings, etc. to choose.

  • Since I have a very limited dataset, would basic data augmentation help? I was thinking of horizontally flipping the images and then rotating both the flipped and non-flipped ones, making the dataset 4 times bigger.

  • Would it be better to select only images in which the fictional character is portrayed in the most similar way (for example, only images showing the full body of the character)?

  • Is WGAN-GP really a good architecture for this case?

Thanks in advance to anyone who tries to help.

Here are my thoughts:

At a higher resolution the model can pick up more features, but whether it is capable of learning that many features also depends on the number of parameters in the model.
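To make the resolution trade-off a bit more concrete, here is a small sketch of the usual size arithmetic for strided convolutions and transposed convolutions. The kernel size 4, stride 2, padding 1, and the 4x4 starting feature map are my assumptions (a common DCGAN-style choice), not something fixed by WGAN-GP itself, and the function names are just illustrative.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Output side length of a transposed convolution
    (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_out(size, kernel=4, stride=2, padding=1):
    """Output side length of a regular strided convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def upsampling_layers(start, target):
    """Count the transposed convolutions needed to grow start -> target."""
    size, layers = start, 0
    while size < target:
        size = conv_transpose_out(size)
        layers += 1
    if size != target:
        raise ValueError(f"cannot reach {target} exactly from {start}")
    return layers


# Assumed 4x4 starting feature map in the generator.
layers_for_64 = upsampling_layers(4, 64)
layers_for_256 = upsampling_layers(4, 256)
print(layers_for_64, layers_for_256)
```

With these assumed settings each layer doubles the side length, so going from 64x64 to 256x256 adds two more upsampling layers to the generator (and, mirrored, two more downsampling layers to the critic), which means more parameters to train on the same 827 images.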

Data augmentation is a good direction to take. You can look at TensorFlow’s image data augmentation library; it may be helpful to you.
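As a toy illustration of the flip-then-rotate plan from the question, here is a minimal pure-Python sketch that treats an image as a nested list of pixel values. The 90-degree rotation angle is my assumption (the question does not say which angle), and the helper names are hypothetical; in a real pipeline you would use a library’s image functions instead.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original, its flip, and a rotated copy of each."""
    flipped = flip_horizontal(image)
    return [image, flipped, rotate_90(image), rotate_90(flipped)]


# A tiny 2x2 stand-in for a real image.
sample = [[1, 2],
          [3, 4]]
variants = augment(sample)

# 827 is the dataset size mentioned in the question.
augmented_total = 827 * len(variants)
print(len(variants), augmented_total)
```

Keep in mind that the extra images are not independent samples, and a character rotated by 90 degrees may not look like anything in the real data, so smaller rotation angles might be a safer choice.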

If you only want to detect the fictional character, then it may be a good idea to concentrate on that, but you need to look at the real-life test cases: what backgrounds do they include?

  • Is WGAN-GP really a good architecture for this case? - I don’t know, but more advanced models probably exist…

Thanks for your reply @gent.spah !

I’ll certainly take a look and try to use the TensorFlow data augmentation library!

I understand that there are more powerful architectures like StyleGAN and CycleGAN, but I don’t know how to apply them to my case. I’m trying to generate images of a Brazilian folkloric character and, as I said, there aren’t many images available. The WGAN-GP architecture was the one that grabbed my attention because it was clearer to me what I should do to generate more images of this character. Could you give me a clearer idea of how I could use the more powerful architectures? I don’t know if I misunderstood the uses of those architectures in the other classes, so your answer would be of great help.

It’s been a while since I did this specialization, but I think they explain those architectures too. You should also search online for a pretrained GAN model (start with whatever comes to mind first) that you could fine-tune, since you don’t have a lot of data. These are my suggestions.

You are absolutely correct.

I’ve just entered the world of deep learning. It’s been not much more than two weeks since I started taking courses on this subject, and I don’t have a strong mathematical and statistical background, so it surely is a lot to digest. But I have to complete this project (it’s the final thesis for my postgraduate degree) as soon as possible, since the deadline is close. So the best approach I could think of was building this WGAN-GP to consolidate my recently obtained knowledge and to be able to complete my thesis as fast as possible with decent quality (for now these are just provisional results). Later, with more time and better mathematics, I’ll dive into these more sophisticated architectures.

Thank you so much!


Yes, that’s right. The most important point of a thesis is expounding an idea and the steps taken to carry it out, not the final result, which in any case is unknown in advance. So put your effort into the process and, if all goes well, you will achieve some or all of the projected goals.
