RuntimeError: CUDA error: out of memory

jerrychi · April 23, 2022, 7:50am

This error (RuntimeError: CUDA error: out of memory) is happening on various notebooks for the homework in the first GANs course including Week 2 and Week 3. What can we do about this?

alvaroramajo · April 23, 2022, 10:25am

Hi, @jerrychi!

A CUDA OOM error usually appears when the code tries to allocate more memory than the GPU on the server has available. It is common with a large batch size, so try reducing the batch size until all the data fits in memory.

If that is not the case, please describe which line throws that error and I’ll try to help.
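
The batch-size advice above could be sketched roughly as follows. This is only an illustration, not code from the course notebooks: `run_with_smaller_batches` and the `step` callable are hypothetical names, and the sketch relies on PyTorch reporting CUDA OOM as a `RuntimeError` whose message contains "out of memory".

```python
def run_with_smaller_batches(step, batch_size, min_batch_size=1):
    """Call step(batch_size), halving batch_size after each out-of-memory error.

    `step` is a hypothetical callable that runs one training step for the
    given batch size. Returns (result of step, batch size that worked).
    """
    while True:
        try:
            return step(batch_size), batch_size
        except RuntimeError as err:
            # Re-raise anything that is not an OOM error, and give up once
            # the batch size cannot be reduced any further.
            if "out of memory" not in str(err) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
```

In a notebook, `step` would wrap whatever builds a batch of that size and runs the forward and backward pass. If the error persists even at a batch size of 1, the batch size is probably not the cause.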

jerrychi · April 23, 2022, 10:51am

I was just executing the pre-written existing code so I don’t think it was trying to allocate a lot of memory. Anyway the problem disappeared after I just waited for a few hours and tried again. Thanks!

alvaroramajo · April 23, 2022, 6:10pm

That kind of problem normally disappears when you get assigned to a more powerful machine on the server.

jerrychi · April 24, 2022, 6:46am

What can I do to get assigned to a more powerful machine on the server? Thanks~

cynic · April 24, 2022, 6:53am

Hi @jerrychi. I also had this issue a few times a couple of days ago.
For me rebooting the notebook did the trick. It’s there under “Help” in the top right of the notebook.
Here’s the link to the original discussion we had.
https://community.deeplearning.ai/t/dls-course-4-week-2-cuda-runtime-implicit-initialization-on-gpu-0-failed-status-out-of-memory/121571?u=cynic

alvaroramajo · April 24, 2022, 9:14am

That happens automatically depending on the available resources and I think there’s nothing one can do to select a specific machine. As @cynic says, a reboot should be enough to solve the problem.

gorev.pv · April 30, 2022, 11:25am

It looks like a very common problem)) Maybe it is worth including it in the instructions if it cannot be fixed.

sharob.sinha · May 2, 2022, 1:10pm

Hi @jerrychi, Sorry for the inconvenience. We will investigate and fix this issue as soon as possible.

Thank you,
Sharob

David_HAIN · May 17, 2022, 2:26pm

I actually got the error when running the code check for the get_one_hot_labels function, and of course the data involved there is extremely small.

More specifically, for me the first assert passed and it was the second assert (see below) which caused the error.

if torch.cuda.is_available():
    assert str(get_one_hot_labels(torch.Tensor([[0]]).long().cuda(), 1).device).startswith("cuda")
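
A failure on a one-element tensor suggests the GPU may already have been full before this cell ran, rather than the tensor itself being too large. One way to check is to ask PyTorch how much device memory is free. The sketch below is an assumption-laden illustration: `gpu_memory_summary` is a hypothetical helper, and `torch.cuda.mem_get_info` may be missing from older PyTorch releases, which is why the call is guarded.

```python
def gpu_memory_summary():
    """Return a one-line description of free vs. total memory on the current GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device is visible to PyTorch"
    try:
        # Returns (free, total) in bytes for the current device.
        free_bytes, total_bytes = torch.cuda.mem_get_info()
    except (RuntimeError, AttributeError) as err:
        # RuntimeError: the device could not be initialised (e.g. it is out of memory).
        # AttributeError: this PyTorch version does not provide mem_get_info.
        return f"could not query the GPU: {err}"
    mib = 2 ** 20
    return f"{free_bytes // mib} MiB free of {total_bytes // mib} MiB"


print(gpu_memory_summary())
```

If the reported free memory is close to zero before any of your own cells have run, reducing the batch size will not help, which is consistent with the reboot advice earlier in this thread.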

dcbm · July 15, 2022, 3:55am

Hi, I am having the same problem when running the C1W3_WGAN_GP notebook. I didn’t make any changes in the cells …

The cell with error is:

gen = Generator(z_dim).to(device)
gen_opt = torch.optim.Adam(gen.parameters(), lr=lr, betas=(beta_1, beta_2))
crit = Critic().to(device) 
crit_opt = torch.optim.Adam(crit.parameters(), lr=lr, betas=(beta_1, beta_2))

def weights_init(m):
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
    if isinstance(m, nn.BatchNorm2d):
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
        torch.nn.init.constant_(m.bias, 0)
gen = gen.apply(weights_init)
crit = crit.apply(weights_init)

Thank you

sharob.sinha · July 18, 2022, 1:22pm

Hi @dcbm,

Kindly “Refresh the workspace” and re-submit the solution. (Please make sure to save your work before refreshing).

Thanks,
Sharob
