This error (RuntimeError: CUDA error: out of memory) is happening on various notebooks for the homework in the first GANs course including Week 2 and Week 3. What can we do about this?
Hi, @jerrychi!
A CUDA OOM error usually appears when the code tries to allocate more memory than the GPU has available. It is common with a large batch size, so try reducing the batch size until all the data fits in memory.
If that is not the case, please describe which line throws that error and I’ll try to help.
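As a rough illustration of "reduce it until it fits", here is a minimal sketch that halves the batch size whenever a step fails with an out-of-memory error. The names `largest_fitting_batch_size`, `run_step` and `fake_step` are hypothetical and not part of the course notebooks; `fake_step` only simulates the failure so the example runs without a GPU.

```python
def largest_fitting_batch_size(run_step, start=128, minimum=1):
    """Halve the batch size until run_step(batch_size) stops raising an OOM error.

    run_step is a hypothetical callable that runs one training step
    with the given batch size.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as err:
            # PyTorch reports CUDA OOM as a RuntimeError; re-raise anything else.
            if "out of memory" not in str(err):
                raise
            batch_size //= 2
    raise RuntimeError(f"no batch size >= {minimum} fits in GPU memory")


# Stand-in for a real training step (assumption for illustration only):
# it pretends that anything above 32 samples is too large for the GPU.
def fake_step(batch_size):
    if batch_size > 32:
        raise RuntimeError("CUDA error: out of memory")


print(largest_fitting_batch_size(fake_step, start=128))
```

In a real notebook, `run_step` would build one batch of that size and run the forward and backward pass on it.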
I was just executing the pre-written existing code so I don’t think it was trying to allocate a lot of memory. Anyway the problem disappeared after I just waited for a few hours and tried again. Thanks!
That kind of problem normally disappears when you get assigned to a more powerful machine on the server.
How can I do something to get assigned to a more powerful machine in the server? Thanks~
Hi @jerrychi. I also had this issue a few times a couple of days ago.
For me rebooting the notebook did the trick. It’s there under “Help” in the top right of the notebook.
Here’s the link to the original discussion we had.
https://community.deeplearning.ai/t/dls-course-4-week-2-cuda-runtime-implicit-initialization-on-gpu-0-failed-status-out-of-memory/121571?u=cynic
That happens automatically depending on the available resources and I think there’s nothing one can do to select a specific machine. As @cynic says, a reboot should be enough to solve the problem.
It looks like a very common problem. Maybe it would be worth including it in the instructions if it cannot be fixed.
Hi @jerrychi, Sorry for the inconvenience. We will investigate and fix this issue as soon as possible.
Thank you,
Sharob
I actually got the error when doing the code check for creating the get_one_hot_labels function and of course for this the size of the data is extremely small.
More specifically, for me the first assert passed and it was the second assert (see below) that caused the error.
if torch.cuda.is_available():
    assert str(get_one_hot_labels(torch.Tensor([[0]]).long().cuda(), 1).device).startswith("cuda")
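Since a one-element tensor is unlikely to exhaust a GPU by itself, one thing worth checking is how much memory is free before your own code allocates anything. The sketch below is only a diagnostic under assumptions: the helper name `gpu_memory_report` is made up for illustration, and it relies on `torch.cuda.mem_get_info()`, which is available in reasonably recent PyTorch versions and may not exist in the version the course notebooks use.

```python
def gpu_memory_report():
    """Return (free_bytes, total_bytes) for the current CUDA device, or None.

    None means PyTorch is missing, no CUDA device is visible, or the
    device could not be queried at all.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    try:
        free_bytes, total_bytes = torch.cuda.mem_get_info()
    except (RuntimeError, AttributeError):
        # RuntimeError: the device may already be too full to initialize.
        # AttributeError: older PyTorch without mem_get_info.
        return None
    return free_bytes, total_bytes


report = gpu_memory_report()
if report is None:
    print("Could not query GPU memory.")
else:
    free_bytes, total_bytes = report
    print(f"free: {free_bytes / 2**20:.0f} MiB of {total_bytes / 2**20:.0f} MiB")
```

If the free amount is already close to zero right after a fresh start, the memory is probably not being used by your own code.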
Hi, I am having the same problem when running the C1W3_WGAN_GP notebook. I didn’t make any changes in the cells …
The cell with error is:
gen = Generator(z_dim).to(device)
gen_opt = torch.optim.Adam(gen.parameters(), lr=lr, betas=(beta_1, beta_2))
crit = Critic().to(device)
crit_opt = torch.optim.Adam(crit.parameters(), lr=lr, betas=(beta_1, beta_2))
def weights_init(m):
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
    if isinstance(m, nn.BatchNorm2d):
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
        torch.nn.init.constant_(m.bias, 0)
gen = gen.apply(weights_init)
crit = crit.apply(weights_init)
Thank you
Hi @dcbm,
Kindly “Refresh the workspace” and re-submit the solution. (Please make sure to save your work before refreshing).
Thanks,
Sharob