Last week in GANs course 3 (Cycle GANs), C3W3_assignment set the loss weights to 1, 0.1, and 10 for the adversarial, identity, and cycle losses, respectively, for image translation between zebras and horses. In a more complicated situation, like night-to-day traffic image translation, the original code doesn’t work well. Does anyone have experience with a task like that? How should the weights of the adversarial, identity, and cycle losses be set to get better results?
Welcome to the community. I hope you are doing well.
Setting the loss weight (lambda) in a CycleGAN depends on the specific problem you are trying to solve and the characteristics of the input and output images. Therefore, there is no fixed set of values that will work for every problem. To determine the optimal loss weight for a specific problem, you can experiment with different values and evaluate the results.
In the case of night-to-day traffic image transfer, the task is more complicated, and you may need to adjust the loss weights to ensure that the generated images preserve the essential characteristics of the input images. For example, you may want to increase the weight of the identity loss to ensure that the generator preserves the essential features of the input images, such as the position and orientation of the cars.
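To make the weighting concrete, here is a minimal sketch of how the three terms are usually combined into one generator loss. The function name, the parameter names, and the loss values are made up for illustration; only the default weights (1, 0.1, 10) come from the assignment as quoted in the question. It folds the cycle loss into a single term, so if you track the two cycle directions separately you would add one weighted term per direction.

```python
def total_generator_loss(adv_loss, identity_loss, cycle_loss,
                         lambda_adv=1.0, lambda_identity=0.1, lambda_cycle=10.0):
    """Weighted sum of the three generator loss terms.

    The defaults are the assignment weights quoted in this thread.
    The inputs can be plain floats or anything that supports * and +.
    """
    return (lambda_adv * adv_loss
            + lambda_identity * identity_loss
            + lambda_cycle * cycle_loss)


# Made-up loss values, only to show how the weights shift the balance.
adv, idt, cyc = 0.7, 0.4, 0.25

baseline = total_generator_loss(adv, idt, cyc)
more_identity = total_generator_loss(adv, idt, cyc, lambda_identity=5.0)

print(f"assignment weights: {baseline:.3f}")
print(f"identity weight 5:  {more_identity:.3f}")
```

Printing the weighted terms side by side like this is a cheap way to see which one dominates the total before committing to a long training run.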
Overall, setting the loss weights for a CycleGAN requires some trial and error to find the values that produce the desired output images. Maybe you can start with the weights from the assignment and fine-tune them. But to be honest, I don’t have any experience with such problems. This is just my opinion, and I am waiting to hear views from mentors and learners on this!
Thx, Nithin. Right now I am trying [ad, id, cyc1, cyc2] = [1, 5, 10, 10], but the result looks like a kind of mode collapse. I will increase the cycle weight further. Before training, I pre-processed the images to handle some over/under-exposed areas. I am waiting for the result. Thanks again.
Thanks for this info, Shenglin_li. Do share your results, experiments, and experience as you progress through your project; it would be a great source of knowledge for us.
I have another question about the batch size: why is it set to 1? That seems to make it very easy to get stuck in a local minimum.
Setting the batch size to 1 means that you are performing stochastic gradient descent, where you update the model weights after every single training example. Each update is based on one example only, so the gradient is noisy and, as you said, training can struggle to settle, especially early on.
On the other hand, using a larger batch size can help reduce the variance in the training process, as the model updates are based on the average of the gradients computed over a larger number of examples. This can help smooth out the optimization process and lead to more stable convergence, although it does not guarantee reaching a global minimum.
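The variance point can be checked with a toy simulation. This is only a sketch under simple assumptions: each per-example gradient is modelled as the true gradient plus independent Gaussian noise, and the function name and all numbers are hypothetical. It is not a CycleGAN, just an illustration of what averaging over a batch does to the gradient estimate.

```python
import random
import statistics


def minibatch_gradient_spread(batch_size, trials=2000, true_grad=0.5,
                              noise_std=1.0, seed=0):
    """Standard deviation of the batch-averaged gradient estimate.

    Assumption: each per-example gradient is the true gradient plus
    independent Gaussian noise with standard deviation noise_std.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        batch = [rng.gauss(true_grad, noise_std) for _ in range(batch_size)]
        estimates.append(sum(batch) / batch_size)
    return statistics.stdev(estimates)


spread_1 = minibatch_gradient_spread(batch_size=1)
spread_16 = minibatch_gradient_spread(batch_size=16)

print(f"batch size  1: spread of gradient estimate = {spread_1:.3f}")
print(f"batch size 16: spread of gradient estimate = {spread_16:.3f}")
```

Under this noise model the spread shrinks roughly with the square root of the batch size, so going from 1 to 16 should cut it to about a quarter.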
However, using a larger batch size also leads to memory issues if the computation is too demanding for the available resources. So we might have to be careful about the batch size. When the batch size is larger, more data samples are loaded into memory at once, which requires more memory. If the batch size is too large for the available RAM/VRAM, the program may not be able to load all the data samples into memory at once, causing an out-of-memory error. In this case, the program may stop running and crash.
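As a rough back-of-the-envelope check, the memory for the input tensor alone grows linearly with the batch size. The sketch below assumes 3-channel 256x256 images stored as 32-bit floats (4 bytes per value); the image size and the function name are assumptions, not taken from the assignment. In practice the intermediate activations of the generators and discriminators take far more memory than the inputs, so treat this as a lower bound.

```python
def input_batch_bytes(batch_size, channels=3, height=256, width=256,
                      bytes_per_value=4):
    """Bytes needed to hold one input batch (float32 by default)."""
    return batch_size * channels * height * width * bytes_per_value


for bs in (1, 4, 16):
    mib = input_batch_bytes(bs) / (1024 ** 2)
    print(f"batch size {bs:>2}: {mib:.2f} MiB for the input tensor alone")
```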
So when it comes to this trade-off, keeping the batch size smaller is usually preferred, in my opinion, because at least we can run our model! Training will be noisier and may take longer to converge, though. If your images are smaller, then maybe you can use a larger batch size, so again it depends on your problem and dataset.