C4 W3 Assignment 2 - Help! CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

I keep getting the error: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

It happens at Step 2.1 - Split Your Dataset into Unmasked and Masked Images.

I have not changed any code yet, but it keeps failing at this step. Please help!


InternalError                             Traceback (most recent call last)
<ipython-input> in <module>
----> 1 image_list_ds = tf.data.Dataset.list_files(image_list, shuffle=False)
      2 mask_list_ds = tf.data.Dataset.list_files(mask_list, shuffle=False)
      3
      4 for path in zip(image_list_ds.take(3), mask_list_ds.take(3)):
      5     print(path)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py in list_files(file_pattern, shuffle, seed)
   1101       shuffle = True
   1102     file_pattern = ops.convert_to_tensor(
-> 1103         file_pattern, dtype=dtypes.string, name="file_pattern")
   1104     matching_files = gen_io_ops.matching_files(file_pattern)
   1105

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
   1497
   1498   if ret is None:
-> 1499     ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1500
   1501   if ret is NotImplemented:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    336                                          as_ref=False):
    337   _ = as_ref
--> 338   return constant(v, dtype=dtype, name=name)
    339
    340

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
    262   """
    263   return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 264                         allow_broadcast=True)
    265
    266

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    273     with trace.Trace("tf.constant"):
    274       return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 275   return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    276
    277   g = ops.get_default_graph()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    298 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
    299   """Implementation of eager constant."""
--> 300   t = convert_to_eager_tensor(value, ctx, dtype)
    301   if shape is None:
    302     return t

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
     95   except AttributeError:
     96     dtype = dtypes.as_dtype(dtype).as_datatype_enum
---> 97   ctx.ensure_initialized()
     98   return ops.EagerTensor(value, ctx.device_name, dtype)
     99

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py in ensure_initialized(self)
    537       if self._use_tfrt is not None:
    538         pywrap_tfe.TFE_ContextOptionsSetTfrt(opts, self._use_tfrt)
--> 539       context_handle = pywrap_tfe.TFE_NewContext(opts)
    540     finally:
    541       pywrap_tfe.TFE_DeleteContextOptions(opts)

InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

I was able to resolve the issue by getting the latest version and rebooting.

Help → Get Latest Version
Help → Reboot

That error has happened a lot lately. The current understanding is that it is a resource utilization problem in the backend infrastructure: too many active users for the number of GPUs available. So it may simply be that waiting a few minutes to get the latest version put you at a point where the load on the servers was lower.
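
One clue from the traceback above: the error is raised inside ctx.ensure_initialized() / TFE_NewContext, i.e. at the moment TensorFlow first creates its GPU context, not in the dataset code itself. Any trivial eager op would hit the same failure while the GPU is exhausted; here is a minimal sketch of that check (not part of the assignment code):

import tensorflow as tf

# List the GPUs that TensorFlow can see.
print(tf.config.list_physical_devices('GPU'))

# Creating any eager tensor forces TensorFlow to initialize its GPU context,
# which is exactly the step that fails with "out of memory" in the traceback.
x = tf.constant(1.0)
print(x)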

Have you checked with the NVIDIA System Management Interface (nvidia-smi)? It reports the current status of the GPU devices, including memory usage.

Here is an example.

You can use this command-line interface from a cell in a Jupyter notebook by adding "!" in front of the nvidia-smi command. If "Memory-Usage" shows that someone is already using your resources, one way to reclaim the GPU is to kill the remaining processes, which can be slightly risky. Or, of course, just wait and get another instance that does not have any leftover processes.
In any case, knowing the GPU memory usage may help you decide on the next step if you get the same error.
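
For example, you could run the following in a notebook cell (<PID> below is just a placeholder for whatever process id the nvidia-smi table reports):

# In a Jupyter notebook cell, the "!" prefix runs a shell command on the server.
!nvidia-smi

# If the Processes table lists a leftover process holding GPU memory, it can be
# stopped by its PID -- use with care, since this kills that job outright.
# !kill -9 <PID>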


Very insightful, thank you.

Thanks for posting this command … I'm running into this error, and as far as I can see there are no active processes on the GPU and very little memory in use, yet it still complains about being out of memory. If anyone has any suggestions on how to unblock myself, that would be much appreciated.

Tue May 24 19:48:45 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:18.0 Off |                    1 |
| N/A   38C    P0    42W / 300W |      3MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Thanks!

Help → Reboot Server fixes this. I found another thread … sorry for the noise.

Nidhi