Run time of C3W2: SRGAN (Optional) lab

Hello!

I am running the optional lab, C3W2: SRGAN, in Colab. The cell below has been running for about four hours and still hasn’t finished. Any thoughts?

Best,
Saif.

device = 'cuda' if torch.cuda.is_available() else 'cpu'
generator = Generator(n_res_blocks=16, n_ps_blocks=2)

# Uncomment the following lines if you're using ImageNet
# dataloader = torch.utils.data.DataLoader(
#     Dataset('data', 'train', download=True, hr_size=[384, 384], lr_size=[96, 96]),
#     batch_size=16, pin_memory=True, shuffle=True,
# )
# train_srresnet(generator, dataloader, device, lr=1e-4, total_steps=1e6, display_step=500)
# torch.save(generator, 'srresnet.pt')

# Uncomment the following lines if you're using STL
dataloader = torch.utils.data.DataLoader(
    Dataset('data', 'train', download=True, hr_size=[96, 96], lr_size=[24, 24]),
    batch_size=16, pin_memory=True, shuffle=True,
)
train_srresnet(generator, dataloader, device, lr=1e-4, total_steps=1e5, display_step=1000)
torch.save(generator, 'srresnet.pt')

7 hours of running but still not completed. I am shutting down my computer. Good night :sleeping: :sleeping:

Hi @saifkhanengr, that training IS slow, but I just gave it a try and it was faster for me than what you’re seeing. I ran that cell for about an hour and it had gotten through 50,000 steps, so halfway through the total. That would mean about 2 hours for the full 100,000 steps, unless for some reason it slowed way down towards the end. I was using Colab.
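If you want to sanity-check the run time before committing to the full run, here is a small helper I sometimes use (my own sketch, not part of the lab): time a handful of steps and extrapolate. The `step_fn` argument is a stand-in for whatever does one training step in your setup.

```python
import time

# Rough sketch (not from the lab): time a few training steps to
# estimate the full run before committing to 100,000 steps.
def estimate_run_hours(step_fn, n_probe=20, total_steps=100_000):
    start = time.perf_counter()
    for _ in range(n_probe):
        step_fn()  # one optimizer step on one batch
    per_step = (time.perf_counter() - start) / n_probe
    return per_step * total_steps / 3600.0
```

On a GPU you’d expect the estimate to come out around the 2-hour mark; if it comes out near 7+ hours, that’s a strong hint you’re on the CPU.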

If you were also using Colab, the only thing I can think of is that you somehow ended up with device = 'cpu'. Maybe try changing the first line of that cell so that it raises an error if torch.cuda.is_available() is False, just as a way to make doubly sure you’re running on the GPU.

One other suggestion would be to reduce the total_steps in the call to train_srresnet to something smaller. The model should be good enough for experimentation after 20,000 steps or so. Then you’ll at least have something to use to try out the next part, the SRGAN itself.

FYI, I noticed this old post about this optional lab. As far as I can tell, the adversarial loss calculations still look suspicious, since both the fake loss and the real loss are checking for fake predictions to be false:

g_loss calls: self.adv_loss(fake_preds_for_g, False)
...
d_loss calls: self.adv_loss(fake_preds_for_d, False)

Just something to be aware of when you get to the SRGAN part of the lab. If your results aren’t looking great, you may want to experiment with adjusting it so that the discriminator is trying to predict false for fakes and the generator is trying to get the discriminator to predict true for fakes. (And if you do find you need to change something for this, please post back here so I can ping the developers to remind them that they still need to take a look at this.)
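For reference, here is a minimal sketch of what the targets would conventionally look like, assuming `adv_loss` is a BCE-style loss that takes predictions and a boolean target (the function names mirror the lab’s, but the implementation here is my assumption, not the lab’s code):

```python
import torch
import torch.nn.functional as F

def adv_loss(preds, is_real: bool):
    # BCE-with-logits against an all-ones (real) or all-zeros (fake) target.
    # Assumed signature, mirroring the lab's adv_loss(preds, bool) calls.
    target = torch.ones_like(preds) if is_real else torch.zeros_like(preds)
    return F.binary_cross_entropy_with_logits(preds, target)

def d_adv_loss(real_preds, fake_preds):
    # Discriminator: reals should be classified real (True), fakes fake (False).
    return adv_loss(real_preds, True) + adv_loss(fake_preds, False)

def g_adv_loss(fake_preds_for_g):
    # Generator: wants the discriminator to call its fakes real (True) --
    # this is the call that looks wrong in the lab, which passes False here.
    return adv_loss(fake_preds_for_g, True)
```

In other words, the d_loss call with False looks fine; it’s the g_loss call passing False that seems backwards.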

Hello Wendy! Thank you for the reply.

I did this: I commented out the first line of code and added my own code (below).

#device = 'cuda' if torch.cuda.is_available() else 'cpu'
if torch.cuda.is_available():
    device = 'cuda'
else:
    print("CUDA is not available. Exiting.")
    exit()

It executed and took about 3 hours to complete the 100,000 steps.

Good suggestion.


Personally, this long training time demotivates me from exploring or playing with this notebook any further. Bye to this notebook! But I highly appreciate your suggestions…

I agree about the long time being demotivating!

I’ll submit a suggestion to the developers to load a pre-trained model for all or most of this first part (the SRResNet training). It’s really the SRGAN training after this that is the focus of this lab anyway.
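In the meantime, anyone who has already paid the 3 hours once can skip retraining by saving and reloading. The lab’s cell saves the whole module with torch.save(generator, 'srresnet.pt'); the state_dict round-trip shown below is the more portable pattern. This is a generic sketch (nn.Linear stands in for the lab’s Generator, and the filename is just an example):

```python
import torch
import torch.nn as nn

# Stand-in for Generator(n_res_blocks=16, n_ps_blocks=2) from the lab.
model = nn.Linear(4, 2)

# Save only the weights (state_dict), which is robust to code changes.
torch.save(model.state_dict(), 'srresnet_state.pt')

# To reload: rebuild the same architecture, then load the weights into it.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load('srresnet_state.pt'))
```

With a saved checkpoint, you can jump straight to the SRGAN section on a fresh Colab session instead of rerunning the SRResNet training.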