DLS/Course 4/Week 2/CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

Thomas_Bulka · April 22, 2022, 7:01pm

Hello everyone,

I’m working on the first programming assignment. Executing the test cell for the identity block gives the following message:

CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

To me, it looks like an infrastructure problem rather than a problem with my code, since I can’t spot any message that directly relates to my implementation of the identity block.

What can I do to resolve the problem so I can continue working on the assignment?

Regards,

Thomas

balaji.ambresh · April 22, 2022, 7:27pm

If this is an issue with your environment and not with the grader:
Please create a new cell and run tf.keras.backend.clear_session() just after you run imports.
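
A minimal sketch of such a cell, assuming TensorFlow is imported as tf as in the course notebooks. The helper name reset_keras_state is hypothetical, and the import guard is only there so the snippet also runs where TensorFlow is not installed. Note that clear_session() resets the global state Keras keeps; it does not release GPU memory held by another process.

```python
try:
    import tensorflow as tf
except ImportError:  # TensorFlow may be missing outside the course environment
    tf = None


def reset_keras_state():
    """Clear the global Keras state.

    Returns True if clear_session() ran, False if TensorFlow is unavailable.
    (Hypothetical helper, not part of the assignment.)
    """
    if tf is None:
        return False
    tf.keras.backend.clear_session()
    return True


# Run this right after the notebook's import cell.
print("Keras session cleared:", reset_keras_state())
```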

Thomas_Bulka · April 22, 2022, 7:30pm

Hi, I’ve done so and it does not change anything. The error message stays the same.

balaji.ambresh · April 22, 2022, 7:32pm

That’s odd. Please restart the kernel and try again.

Thomas_Bulka · April 22, 2022, 7:35pm

Done. However, the error message stays the same. Can I give you more information to make troubleshooting easier?

balaji.ambresh · April 22, 2022, 7:54pm

Please click my name and send me your notebook as an attachment in a direct message. Do provide any additional information you can.

cynic · April 22, 2022, 7:55pm

I’ve also had the same issue. The error first pops up when loading the parameters from the VGG-19 model. It’s the second cell in the notebook and I have not changed anything in the code.

The full error I receive is as follows:

---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
<ipython-input-2-c78346570a28> in <module>
      4 vgg = tf.keras.applications.VGG19(include_top=False,
      5                                   input_shape=(img_size, img_size, 3),
----> 6                                   weights='pretrained-model/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5')
      7 
      8 vgg.trainable = False

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/applications/vgg19.py in VGG19(include_top, weights, input_tensor, input_shape, pooling, classes, classifier_activation)
    142   x = layers.Conv2D(
    143       64, (3, 3), activation='relu', padding='same', name='block1_conv1')(
--> 144           img_input)
    145   x = layers.Conv2D(
    146       64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    924     if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
    925       return self._functional_construction_call(inputs, args, kwargs,
--> 926                                                 input_list)
    927 
    928     # Maintains info about the `Layer.call` stack.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
   1096         # Build layer if applicable (if the `build` method has been
   1097         # overridden).
-> 1098         self._maybe_build(inputs)
   1099         cast_inputs = self._maybe_cast_inputs(inputs, input_list)
   1100 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in _maybe_build(self, inputs)
   2641         # operations.
   2642         with tf_utils.maybe_init_scope(self):
-> 2643           self.build(input_shapes)  # pylint:disable=not-callable
   2644       # We must set also ensure that the layer is marked as built, and the build
   2645       # shape is stored since user defined build functions may not be calling

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/convolutional.py in build(self, input_shape)
    202         constraint=self.kernel_constraint,
    203         trainable=True,
--> 204         dtype=self.dtype)
    205     if self.use_bias:
    206       self.bias = self.add_weight(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint, partitioner, use_resource, synchronization, aggregation, **kwargs)
    612         synchronization=synchronization,
    613         aggregation=aggregation,
--> 614         caching_device=caching_device)
    615     if regularizer is not None:
    616       # TODO(fchollet): in the future, this should be handled at the

/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py in _add_variable_with_custom_getter(self, name, shape, dtype, initializer, getter, overwrite, **kwargs_for_getter)
    748         dtype=dtype,
    749         initializer=initializer,
--> 750         **kwargs_for_getter)
    751 
    752     # If we set an initializer and the variable processed it, tracking will not

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer_utils.py in make_variable(name, shape, dtype, initializer, trainable, caching_device, validate_shape, constraint, use_resource, collections, synchronization, aggregation, partitioner)
    143       synchronization=synchronization,
    144       aggregation=aggregation,
--> 145       shape=variable_shape if variable_shape else None)
    146 
    147 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in __call__(cls, *args, **kwargs)
    258   def __call__(cls, *args, **kwargs):
    259     if cls is VariableV1:
--> 260       return cls._variable_v1_call(*args, **kwargs)
    261     elif cls is Variable:
    262       return cls._variable_v2_call(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in _variable_v1_call(cls, initial_value, trainable, collections, validate_shape, caching_device, name, variable_def, dtype, expected_shape, import_scope, constraint, use_resource, synchronization, aggregation, shape)
    219         synchronization=synchronization,
    220         aggregation=aggregation,
--> 221         shape=shape)
    222 
    223   def _variable_v2_call(cls,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in <lambda>(**kwargs)
    197                         shape=None):
    198     """Call on Variable class. Useful to force the signature."""
--> 199     previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
    200     for _, getter in ops.get_default_graph()._variable_creator_stack:  # pylint: disable=protected-access
    201       previous_getter = _make_getter(getter, previous_getter)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in default_variable_creator(next_creator, **kwargs)
   2595         synchronization=synchronization,
   2596         aggregation=aggregation,
-> 2597         shape=shape)
   2598   else:
   2599     return variables.RefVariable(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in __call__(cls, *args, **kwargs)
    262       return cls._variable_v2_call(*args, **kwargs)
    263     else:
--> 264       return super(VariableMetaclass, cls).__call__(*args, **kwargs)
    265 
    266 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py in __init__(self, initial_value, trainable, collections, validate_shape, caching_device, name, dtype, variable_def, import_scope, constraint, distribute_strategy, synchronization, aggregation, shape)
   1516           aggregation=aggregation,
   1517           shape=shape,
-> 1518           distribute_strategy=distribute_strategy)
   1519 
   1520   def _init_from_args(self,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py in _init_from_args(self, initial_value, trainable, collections, caching_device, name, dtype, constraint, synchronization, aggregation, distribute_strategy, shape)
   1649           with ops.name_scope("Initializer"), device_context_manager(None):
   1650             initial_value = ops.convert_to_tensor(
-> 1651                 initial_value() if init_from_fn else initial_value,
   1652                 name="initial_value", dtype=dtype)
   1653           if shape is not None:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/initializers/initializers_v2.py in __call__(self, shape, dtype)
    395        (via `tf.keras.backend.set_floatx(float_dtype)`)
    396     """
--> 397     return super(VarianceScaling, self).__call__(shape, dtype=_get_dtype(dtype))
    398 
    399 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops_v2.py in __call__(self, shape, dtype)
    559     else:
    560       limit = math.sqrt(3.0 * scale)
--> 561       return self._random_generator.random_uniform(shape, -limit, limit, dtype)
    562 
    563   def get_config(self):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops_v2.py in random_uniform(self, shape, minval, maxval, dtype)
   1042       op = random_ops.random_uniform
   1043     return op(
-> 1044         shape=shape, minval=minval, maxval=maxval, dtype=dtype, seed=self.seed)
   1045 
   1046   def truncated_normal(self, shape, mean, stddev, dtype):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    199     """Call target, and fall back on dispatchers if there is a TypeError."""
    200     try:
--> 201       return target(*args, **kwargs)
    202     except (TypeError, ValueError):
    203       # Note: convert_to_eager_tensor currently raises a ValueError, not a

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/random_ops.py in random_uniform(shape, minval, maxval, dtype, seed, name)
    286     maxval = 1
    287   with ops.name_scope(name, "random_uniform", [shape, minval, maxval]) as name:
--> 288     shape = tensor_util.shape_tensor(shape)
    289     # In case of [0,1) floating results, minval and maxval is unused. We do an
    290     # `is` comparison here since this is cheaper than isinstance or  __eq__.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py in shape_tensor(shape)
   1027       # not convertible to Tensors because of mixed content.
   1028       shape = tuple(map(tensor_shape.dimension_value, shape))
-> 1029   return ops.convert_to_tensor(shape, dtype=dtype, name="shape")
   1030 
   1031 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
   1497 
   1498     if ret is None:
-> 1499       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1500 
   1501     if ret is NotImplemented:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    336                                          as_ref=False):
    337   _ = as_ref
--> 338   return constant(v, dtype=dtype, name=name)
    339 
    340 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
    262   """
    263   return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 264                         allow_broadcast=True)
    265 
    266 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    273       with trace.Trace("tf.constant"):
    274         return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 275     return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    276 
    277   g = ops.get_default_graph()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    298 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
    299   """Implementation of eager constant."""
--> 300   t = convert_to_eager_tensor(value, ctx, dtype)
    301   if shape is None:
    302     return t

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
     95     except AttributeError:
     96       dtype = dtypes.as_dtype(dtype).as_datatype_enum
---> 97   ctx.ensure_initialized()
     98   return ops.EagerTensor(value, ctx.device_name, dtype)
     99 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py in ensure_initialized(self)
    537         if self._use_tfrt is not None:
    538           pywrap_tfe.TFE_ContextOptionsSetTfrt(opts, self._use_tfrt)
--> 539         context_handle = pywrap_tfe.TFE_NewContext(opts)
    540       finally:
    541         pywrap_tfe.TFE_DeleteContextOptions(opts)

InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
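
The last frame of the traceback is inside ensure_initialized(), so the failure happens while TensorFlow is creating its GPU context, before any VGG-19 weights are loaded. One way to check whether GPU:0 is already full is to ask nvidia-smi, as sketched below. This assumes the nvidia-smi command is available in the environment, which may not be the case in the Coursera lab; the helper name gpu_memory_usage is hypothetical.

```python
import shutil
import subprocess


def gpu_memory_usage():
    """Return [(used_mib, total_mib), ...] per GPU, or None if it cannot be queried.

    Hypothetical helper; relies on the nvidia-smi CLI being on PATH.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # Python 3.6 compatible spelling of text=True
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return None

    usage = []
    for line in result.stdout.strip().splitlines():
        try:
            used, total = (int(field) for field in line.split(","))
        except ValueError:
            continue  # skip rows such as "[N/A]" that are not plain integers
        usage.append((used, total))
    return usage


print(gpu_memory_usage())
```

If the reported used memory is close to the total before the notebook has created any tensors, something other than the notebook's own code is holding the memory.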

balaji.ambresh · April 22, 2022, 8:10pm

@cynic
I just tried on the Coursera environment and was able to load the model.
Please try resetting your lab environment via Help (top right) → Reboot.
If that doesn’t resolve the problem, try Coursera help.

cynic · April 22, 2022, 8:17pm

Hey @balaji.ambresh,
The same error still persists in the graded exercise section, though. I’m still not sure whether it’s something wrong with my code or the server.
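
One way to tell the two apart, as a rough sketch: run a trivial TensorFlow operation in a fresh cell that contains none of the assignment code. In the traceback above the error is raised from ensure_initialized(), which runs the first time an eager tensor is created, so if even tf.constant(0) fails with the same InternalError, the environment is the likely cause rather than the graded function. The helper name probe_tf_context is hypothetical, and the import guard only exists so the snippet runs where TensorFlow is not installed.

```python
try:
    import tensorflow as tf
except ImportError:  # TensorFlow may be missing outside the course environment
    tf = None


def probe_tf_context():
    """Create one tiny tensor to force TensorFlow to initialize its runtime context.

    Hypothetical helper; returns a short status string instead of raising.
    """
    if tf is None:
        return "tensorflow not installed"
    try:
        tf.constant(0)  # first eager op triggers context (and GPU) initialization
    except tf.errors.InternalError as err:
        return "context initialization failed: {}".format(err)
    return "ok"


print(probe_tf_context())
```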

Thomas_Bulka · April 22, 2022, 8:18pm

It seems to work now, thanks! Now I’ll go on and fix the logic bugs in my code…

cynic · April 22, 2022, 8:19pm

Another reboot solved it for me too. Thank you!

Priyank_Mishra · April 24, 2022, 8:49am

Rebooting solved it for me as well.

Sectum · April 24, 2022, 11:24am

Help! I have already rebooted several times and tried all of the methods above, but none of them works for me.
I’m on Course 4, Week 3, homework 2. The problem happens in the 4th code block, which is provided as-is, when running image_list_ds = tf.data.Dataset.list_files(image_list, shuffle=False)

What should I do to solve this problem?

balaji.ambresh · April 24, 2022, 11:37am

What did Coursera help tell you?

John_Crane · April 25, 2022, 3:17am

I was having the same problem with identity_block in Week 2. Rebooting worked for me.

balaji.ambresh · April 25, 2022, 6:01am

Hi @Sectum

Please follow up with Coursera. The case reference is #03032484.

Hello Balaji,

Thank you for following up.

May I know which link you are referring to? This is the case number of your interaction: #03032484. In order for us to properly follow up with the learner, you can advise him to contact us and provide us with the following details:

-Name of the specific assignment.
-Screenshots of the issue.
-If possible the direct link to the item.

Please let me know if I can help you with any other questions or concerns. I will be more than happy to help!

Thanks.
Erwin,
Coursera Support Team.
ref:_00D1Uy4UJ._5008Wn8tjA:ref

Mohamed_Ehab · April 26, 2022, 11:36am

I had the same problem; Help → get latest version did it for me.

inam · April 27, 2022, 1:33pm

I am getting the same error. Did you solve it yet?

dheeraj_kura · May 1, 2022, 3:15pm

The problem starts from cell 2.

InternalError                             Traceback (most recent call last)
in
      4 vgg = tf.keras.applications.VGG19(include_top=False,
      5                                   input_shape=(img_size, img_size, 3),
----> 6                                   weights='pretrained-model/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5')
      7 
      8 vgg.trainable = False

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/applications/vgg19.py in VGG19(include_top, weights, input_tensor, input_shape, pooling, classes, classifier_activation)
    142   x = layers.Conv2D(
    143       64, (3, 3), activation='relu', padding='same', name='block1_conv1')(
--> 144           img_input)
    145   x = layers.Conv2D(
    146       64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)

<Note: mentor edit removing many lines of assert messages from the Tensorflow kernel>

InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
It’s the same for other cells too.

dheeraj_kura · May 1, 2022, 3:15pm

I am facing the same issue too.
