PDS C3W1 - Assignment - Resource limit exceeded

Hi, I am getting a ResourceLimitExceeded error when running the lab for the Course 3, Week 1 assignment. In Exercise 3, launching the tuning job appears to request additional EC2 instances, which exceeds the limit for this account. What is the resolution, given that we don't have any control over the instances?


ResourceLimitExceeded Traceback (most recent call last)
in
4 ### END SOLUTION - DO NOT delete this comment for grading purposes
5 include_cls_metadata=False,
----> 6 wait=False
7 )

/opt/conda/lib/python3.7/site-packages/sagemaker/tuner.py in fit(self, inputs, job_name, include_cls_metadata, estimator_kwargs, wait, **kwargs)
442 """
443 if self.estimator is not None:
→ 444 self._fit_with_estimator(inputs, job_name, include_cls_metadata, **kwargs)
445 else:
446 self._fit_with_estimator_dict(inputs, job_name, include_cls_metadata, estimator_kwargs)

/opt/conda/lib/python3.7/site-packages/sagemaker/tuner.py in _fit_with_estimator(self, inputs, job_name, include_cls_metadata, **kwargs)
453 self._prepare_estimator_for_tuning(self.estimator, inputs, job_name, **kwargs)
454 self._prepare_for_tuning(job_name=job_name, include_cls_metadata=include_cls_metadata)
→ 455 self.latest_tuning_job = _TuningJob.start_new(self, inputs)
456
457 def _fit_with_estimator_dict(self, inputs, job_name, include_cls_metadata, estimator_kwargs):

/opt/conda/lib/python3.7/site-packages/sagemaker/tuner.py in start_new(cls, tuner, inputs)
1507 ]
1508
→ 1509 tuner.sagemaker_session.create_tuning_job(**tuner_args)
1510 return cls(tuner.sagemaker_session, tuner._current_job_name)
1511

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in create_tuning_job(self, job_name, tuning_config, training_config, training_config_list, warm_start_config, tags)
2038 LOGGER.info("Creating hyperparameter tuning job with name: %s", job_name)
2039 LOGGER.debug("tune request: %s", json.dumps(tune_request, indent=4))
→ 2040 self.sagemaker_client.create_hyper_parameter_tuning_job(**tune_request)
2041
2042 def describe_tuning_job(self, job_name):

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
384 "%s() only accepts keyword arguments." % py_operation_name)
385 # The "self" in this scope is referring to the BaseClient.
→ 386 return self._make_api_call(operation_name, kwargs)
387
388 _api_call.__name__ = str(py_operation_name)

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
703 error_code = parsed_response.get("Error", {}).get("Code")
704 error_class = self.exceptions.from_code(error_code)
→ 705 raise error_class(parsed_response, operation_name)
706 else:
707 return parsed_response

ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateHyperParameterTuningJob operation: The account-level service limit 'ml.c5.9xlarge for training job usage' is 2 Instances, with current utilization of 2 Instances and a request delta of 2 Instances. Please contact AWS support to request an increase for this limit.

Hello @cdhaeveloose,

The error can occur if you run the tuner.fit() cell more than once. Could you please check whether training jobs already exist, using the cell below?
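
The error message reports a limit of 2 instances with 2 already in use and 2 more requested, which is what you would expect if an earlier tuning job is still holding the instances. The exact cell is not reproduced in this thread, so what follows is only a minimal sketch of such a check. It assumes `boto3` is available in the lab notebook and that the lab credentials allow listing jobs; the helper name `find_active_jobs` is hypothetical, while `list_training_jobs` and `list_hyper_parameter_tuning_jobs` are standard SageMaker client calls.

```python
def find_active_jobs(sm_client):
    """Return (training_job_names, tuning_job_names) that are still InProgress.

    `sm_client` is assumed to behave like boto3.client("sagemaker").
    Only the first page of results is read, which should be enough
    for a small lab account.
    """
    training = sm_client.list_training_jobs(StatusEquals="InProgress")
    tuning = sm_client.list_hyper_parameter_tuning_jobs(StatusEquals="InProgress")

    training_names = [
        job["TrainingJobName"] for job in training["TrainingJobSummaries"]
    ]
    tuning_names = [
        job["HyperParameterTuningJobName"]
        for job in tuning["HyperParameterTuningJobSummaries"]
    ]
    return training_names, tuning_names


# In the lab notebook (assumes boto3 and the lab's AWS credentials):
# import boto3
# sm = boto3.client("sagemaker")
# training_names, tuning_names = find_active_jobs(sm)
# print("Training jobs in progress:", training_names)
# print("Tuning jobs in progress:", tuning_names)
```

If both lists are empty, the instances should be free and tuner.fit() can be run again. If a tuning job from an earlier run shows up, either wait for it to finish or stop it with `sm.stop_hyper_parameter_tuning_job(HyperParameterTuningJobName=name)` before re-running the cell.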

Best regards,


Hi, thanks for the follow-up. I believe that is what happened.

I waited until the lab session ended, restarted it the next day, and everything worked fine.
The second time I had to select the kernel (Python 3 Data Science), so the other possibility is that the wrong kernel was selected the first time; I didn't verify it then.

Anyway, all good now.

Thanks

Chris

Hi Chris,

Thanks for sharing your experience. It's good to know everything is working now :)

Happy learning