DLS Course 4 Week 2: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

Hello everyone,

I’m working on the first programming assignment. Executing the test cell for the identity block gives the following message:

CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

To me, it looks like an infrastructure problem rather than a problem with my code, since I can’t spot any message that relates directly to my implementation of the identity block.

What can I do to resolve the problem so I can continue working on the assignment?

Regards,

Thomas

If this is an issue with your environment and not with the grader:
Please create a new cell and run tf.keras.backend.clear_session() just after you run imports.
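In a notebook cell, that looks something like the following (a minimal sketch; tf.keras.backend.clear_session() is standard Keras API that resets the global state left behind by previously created models):

import tensorflow as tf

# Resets the global Keras state (graphs, layer name counters) and drops
# references to models/layers created earlier in this session.
tf.keras.backend.clear_session()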

Hi, I’ve done so and it does not change anything. The error message stays the same.

That’s odd. Please restart the kernel and try again.
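If it persists after a restart, you can check whether the GPU memory is actually exhausted before TensorFlow even starts. Assuming the lab image ships NVIDIA’s nvidia-smi utility, run it from a notebook cell:

# Run from a notebook cell: lists total/used GPU memory and the
# processes currently holding it. (Assumes nvidia-smi is installed
# in the lab environment.)
!nvidia-smi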

Done. However, the error message stays the same. Can I give you more information to make troubleshooting easier?

Please click my name and message your notebook as an attachment. Do provide any additional information you can.

I’ve also had the same issue. The error first pops up when loading the parameters from the VGG-19 model. It’s the second cell in the notebook and I have not changed anything in the code.

The full error I receive is as follows:

---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
<ipython-input-2-c78346570a28> in <module>
      4 vgg = tf.keras.applications.VGG19(include_top=False,
      5                                   input_shape=(img_size, img_size, 3),
----> 6                                   weights='pretrained-model/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5')
      7 
      8 vgg.trainable = False

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/applications/vgg19.py in VGG19(include_top, weights, input_tensor, input_shape, pooling, classes, classifier_activation)
    142   x = layers.Conv2D(
    143       64, (3, 3), activation='relu', padding='same', name='block1_conv1')(
--> 144           img_input)
    145   x = layers.Conv2D(
    146       64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    924     if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
    925       return self._functional_construction_call(inputs, args, kwargs,
--> 926                                                 input_list)
    927 
    928     # Maintains info about the `Layer.call` stack.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
   1096         # Build layer if applicable (if the `build` method has been
   1097         # overridden).
-> 1098         self._maybe_build(inputs)
   1099         cast_inputs = self._maybe_cast_inputs(inputs, input_list)
   1100 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in _maybe_build(self, inputs)
   2641         # operations.
   2642         with tf_utils.maybe_init_scope(self):
-> 2643           self.build(input_shapes)  # pylint:disable=not-callable
   2644       # We must set also ensure that the layer is marked as built, and the build
   2645       # shape is stored since user defined build functions may not be calling

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/convolutional.py in build(self, input_shape)
    202         constraint=self.kernel_constraint,
    203         trainable=True,
--> 204         dtype=self.dtype)
    205     if self.use_bias:
    206       self.bias = self.add_weight(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint, partitioner, use_resource, synchronization, aggregation, **kwargs)
    612         synchronization=synchronization,
    613         aggregation=aggregation,
--> 614         caching_device=caching_device)
    615     if regularizer is not None:
    616       # TODO(fchollet): in the future, this should be handled at the

/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py in _add_variable_with_custom_getter(self, name, shape, dtype, initializer, getter, overwrite, **kwargs_for_getter)
    748         dtype=dtype,
    749         initializer=initializer,
--> 750         **kwargs_for_getter)
    751 
    752     # If we set an initializer and the variable processed it, tracking will not

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer_utils.py in make_variable(name, shape, dtype, initializer, trainable, caching_device, validate_shape, constraint, use_resource, collections, synchronization, aggregation, partitioner)
    143       synchronization=synchronization,
    144       aggregation=aggregation,
--> 145       shape=variable_shape if variable_shape else None)
    146 
    147 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in __call__(cls, *args, **kwargs)
    258   def __call__(cls, *args, **kwargs):
    259     if cls is VariableV1:
--> 260       return cls._variable_v1_call(*args, **kwargs)
    261     elif cls is Variable:
    262       return cls._variable_v2_call(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in _variable_v1_call(cls, initial_value, trainable, collections, validate_shape, caching_device, name, variable_def, dtype, expected_shape, import_scope, constraint, use_resource, synchronization, aggregation, shape)
    219         synchronization=synchronization,
    220         aggregation=aggregation,
--> 221         shape=shape)
    222 
    223   def _variable_v2_call(cls,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in <lambda>(**kwargs)
    197                         shape=None):
    198     """Call on Variable class. Useful to force the signature."""
--> 199     previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
    200     for _, getter in ops.get_default_graph()._variable_creator_stack:  # pylint: disable=protected-access
    201       previous_getter = _make_getter(getter, previous_getter)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in default_variable_creator(next_creator, **kwargs)
   2595         synchronization=synchronization,
   2596         aggregation=aggregation,
-> 2597         shape=shape)
   2598   else:
   2599     return variables.RefVariable(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py in __call__(cls, *args, **kwargs)
    262       return cls._variable_v2_call(*args, **kwargs)
    263     else:
--> 264       return super(VariableMetaclass, cls).__call__(*args, **kwargs)
    265 
    266 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py in __init__(self, initial_value, trainable, collections, validate_shape, caching_device, name, dtype, variable_def, import_scope, constraint, distribute_strategy, synchronization, aggregation, shape)
   1516           aggregation=aggregation,
   1517           shape=shape,
-> 1518           distribute_strategy=distribute_strategy)
   1519 
   1520   def _init_from_args(self,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py in _init_from_args(self, initial_value, trainable, collections, caching_device, name, dtype, constraint, synchronization, aggregation, distribute_strategy, shape)
   1649           with ops.name_scope("Initializer"), device_context_manager(None):
   1650             initial_value = ops.convert_to_tensor(
-> 1651                 initial_value() if init_from_fn else initial_value,
   1652                 name="initial_value", dtype=dtype)
   1653           if shape is not None:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/initializers/initializers_v2.py in __call__(self, shape, dtype)
    395        (via `tf.keras.backend.set_floatx(float_dtype)`)
    396     """
--> 397     return super(VarianceScaling, self).__call__(shape, dtype=_get_dtype(dtype))
    398 
    399 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops_v2.py in __call__(self, shape, dtype)
    559     else:
    560       limit = math.sqrt(3.0 * scale)
--> 561       return self._random_generator.random_uniform(shape, -limit, limit, dtype)
    562 
    563   def get_config(self):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops_v2.py in random_uniform(self, shape, minval, maxval, dtype)
   1042       op = random_ops.random_uniform
   1043     return op(
-> 1044         shape=shape, minval=minval, maxval=maxval, dtype=dtype, seed=self.seed)
   1045 
   1046   def truncated_normal(self, shape, mean, stddev, dtype):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    199     """Call target, and fall back on dispatchers if there is a TypeError."""
    200     try:
--> 201       return target(*args, **kwargs)
    202     except (TypeError, ValueError):
    203       # Note: convert_to_eager_tensor currently raises a ValueError, not a

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/random_ops.py in random_uniform(shape, minval, maxval, dtype, seed, name)
    286     maxval = 1
    287   with ops.name_scope(name, "random_uniform", [shape, minval, maxval]) as name:
--> 288     shape = tensor_util.shape_tensor(shape)
    289     # In case of [0,1) floating results, minval and maxval is unused. We do an
    290     # `is` comparison here since this is cheaper than isinstance or  __eq__.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py in shape_tensor(shape)
   1027       # not convertible to Tensors because of mixed content.
   1028       shape = tuple(map(tensor_shape.dimension_value, shape))
-> 1029   return ops.convert_to_tensor(shape, dtype=dtype, name="shape")
   1030 
   1031 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
   1497 
   1498     if ret is None:
-> 1499       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1500 
   1501     if ret is NotImplemented:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    336                                          as_ref=False):
    337   _ = as_ref
--> 338   return constant(v, dtype=dtype, name=name)
    339 
    340 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
    262   """
    263   return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 264                         allow_broadcast=True)
    265 
    266 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    273       with trace.Trace("tf.constant"):
    274         return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 275     return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    276 
    277   g = ops.get_default_graph()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    298 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
    299   """Implementation of eager constant."""
--> 300   t = convert_to_eager_tensor(value, ctx, dtype)
    301   if shape is None:
    302     return t

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
     95     except AttributeError:
     96       dtype = dtypes.as_dtype(dtype).as_datatype_enum
---> 97   ctx.ensure_initialized()
     98   return ops.EagerTensor(value, ctx.device_name, dtype)
     99 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py in ensure_initialized(self)
    537         if self._use_tfrt is not None:
    538           pywrap_tfe.TFE_ContextOptionsSetTfrt(opts, self._use_tfrt)
--> 539         context_handle = pywrap_tfe.TFE_NewContext(opts)
    540       finally:
    541         pywrap_tfe.TFE_DeleteContextOptions(opts)

InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

@cynic
I just tried on the Coursera environment and was able to load the model.
Please try resetting your lab environment via Help (top right) → Reboot.
If that doesn’t resolve the problem, contact Coursera Help.
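If reboots keep failing, a standard TensorFlow-side workaround (an assumption worth trying, not an official course fix) is to enable memory growth, so TensorFlow allocates GPU memory on demand instead of reserving the whole device when the CUDA context is created. It has to run before anything else touches the GPU:

import tensorflow as tf

# Must run before the first GPU operation, i.e. before the CUDA context
# is initialized. With memory growth enabled, TensorFlow allocates GPU
# memory incrementally instead of grabbing (nearly) all of it up front.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)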

Hey @balaji.ambresh,
The same error still persists in the graded exercise section, though. I’m still not sure whether it’s something wrong with my code or with the server.

It seems to work now, thanks! Now I’ll go on and fix the logic bugs in my code…

Another reboot solved it for me too. Thank you!

Rebooting solved it for me as well :+1:

Help! I have already rebooted several times and tried all of the methods above, but none of them works for me.
I’m on Course 4 Week 3, homework 2. The problem happens in the 4th code block, which is already given, when running: image_list_ds = tf.data.Dataset.list_files(image_list, shuffle=False)

What should I do to solve this problem?

What did Coursera Help tell you?

I was having the same problem with identity_block in Week 2. Rebooting worked for me.

Hi @Sectum,

Please follow up with Coursera. The case reference is #03032484.

Hello Balaji,

Thank you for following up.

May I know which link you are referring to? This is the case number of your interaction: #03032484. In order for us to properly follow up with the learner, you can advise him to contact us and provide us with the following details:

- Name of the specific assignment.
- Screenshots of the issue.
- If possible, the direct link to the item.

Please let me know if I can help you with any other questions or concerns. I will be more than happy to help!

Thanks.
Erwin,
Coursera Support Team.

I had the same problem; Help → Get latest version did it for me.

I am getting the same error. Did you solve it yet?

The problem starts from cell 2:

InternalError                             Traceback (most recent call last)
<ipython-input-…> in <module>
      4 vgg = tf.keras.applications.VGG19(include_top=False,
      5                                   input_shape=(img_size, img_size, 3),
----> 6                                   weights='pretrained-model/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5')
      7 
      8 vgg.trainable = False

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/applications/vgg19.py in VGG19(include_top, weights, input_tensor, input_shape, pooling, classes, classifier_activation)
    142   x = layers.Conv2D(
    143       64, (3, 3), activation='relu', padding='same', name='block1_conv1')(
--> 144           img_input)
    145   x = layers.Conv2D(
    146       64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)

<Note: mentor edit removing many lines of assert messages from the TensorFlow kernel>

InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
And it’s the same for other cells, too.

I am facing the same issue, too.