C2W3-Assignment 3: Testing Data Quality with Great Expectations - Empty S3 docs bucket

Hi, has anyone experienced similar issues with an empty S3 docs bucket in Section 7? I am able to pass the assignment with a score of 80/100, but would like to know why there were errors in validations. Details on the location and outputs with traceback:

7 - Checkpoints and Computing Validations over the Dataset

Validation error after running this:

checkpoint_result = checkpoint.run()

Cell outputs are as below.

Calculating Metrics: 0%| | 0/25 [00:00<?, ?it/s]

Instantiating class from config without an explicit class_name is dangerous. Consider adding an explicit class_name for None
Error running action with name update_data_docs
Traceback (most recent call last):
File "/home/coder/miniconda/lib/python3.12/site-packages/great_expectations/data_context/util.py", line 90, in instantiate_class_from_config
class_instance = class_(**config_with_defaults)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: SiteBuilder.__init__() missing 1 required positional argument: 'store_backend'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/coder/miniconda/lib/python3.12/site-packages/great_expectations/validation_operators/validation_operators.py", line 476, in _run_actions
action_result = self.actions[name].run(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/coder/miniconda/lib/python3.12/site-packages/great_expectations/checkpoint/actions.py", line 101, in run
return self._run(
^^^^^^^^^^
File "/home/coder/miniconda/lib/python3.12/site-packages/great_expectations/checkpoint/actions.py", line 1190, in _run
self.data_context.build_data_docs(
File "/home/coder/miniconda/lib/python3.12/site-packages/great_expectations/core/usage_statistics/usage_statistics.py", line 266, in usage_statistics_wrapped_method
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/coder/miniconda/lib/python3.12/site-packages/great_expectations/data_context/data_context/abstract_data_context.py", line 5318, in build_data_docs
return self._build_data_docs(
^^^^^^^^^^^^^^^^^^^^^^

“validations_store_name”: “validations_store”
}, ‘root_directory’: ‘/home/coder/project/gx’, ‘site_name’: ‘local_site’}

SiteBuilder.__init__() missing 1 required positional argument: 'store_backend'



TypeError Traceback (most recent call last)
File ~/miniconda/lib/python3.12/site-packages/great_expectations/data_context/util.py:90, in instantiate_class_from_config(config, runtime_environment, config_defaults)
89 try:
---> 90 class_instance = class_(**config_with_defaults)
91 except TypeError as e:

TypeError: SiteBuilder.__init__() missing 1 required positional argument: 'store_backend'

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
Cell In[27], line 1
----> 1 checkpoint_result = checkpoint.run()

File ~/miniconda/lib/python3.12/site-packages/great_expectations/core/usage_statistics/usage_statistics.py:266, in usage_statistics_enabled_method.<locals>.usage_statistics_wrapped_method(*args, **kwargs)
263 args_payload = args_payload_fn(*args, **kwargs) or {}
264 nested_update(event_payload, args_payload)
→ 266 result = func(*args, **kwargs)
267 message["success"] = True
268 except Exception:

File ~/miniconda/lib/python3.12/site-packages/great_expectations/checkpoint/checkpoint.py:305, in BaseCheckpoint.run(self, template_name, run_name_template, expectation_suite_name, batch_request, validator, action_list, evaluation_parameters, runtime_configuration, validations, profilers, run_id, run_name, run_time, result_format, expectation_suite_ge_cloud_id)
303 if len(validations) > 0:
304 for idx, validation_dict in enumerate(validations):
→ 305 self._run_validation(
306 substituted_runtime_config=substituted_runtime_config,
307 async_validation_operator_results=async_validation_operator_results,
308 async_executor=async_executor,
309 result_format=result_format,
310 run_id=run_id,
311 idx=idx,
312 validation_dict=validation_dict,
313 )
314 else:
315 self._run_validation(
316 substituted_runtime_config=substituted_runtime_config,
317 async_validation_operator_results=async_validation_operator_results,
(…)
320 run_id=run_id,
321 )

File ~/miniconda/lib/python3.12/site-packages/great_expectations/checkpoint/checkpoint.py:531, in BaseCheckpoint._run_validation(self, substituted_runtime_config, async_validation_operator_results, async_executor, result_format, run_id, idx, validation_dict)
527 operator_run_kwargs[“catch_exceptions”] = catch_exceptions_validation
529 validation_id: str | None = substituted_validation_dict.get(“id”)
→ 531 async_validation_operator_result = async_executor.submit(
532 action_list_validation_operator.run,
533 assets_to_validate=[validator],
534 run_id=run_id,
535 evaluation_parameters=substituted_validation_dict.get(
536 “evaluation_parameters”
537 ),
538 result_format=result_format,
539 checkpoint_identifier=checkpoint_identifier,
540 checkpoint_name=self.name,
541 validation_id=validation_id,
542 **operator_run_kwargs,
543 )
544 async_validation_operator_results.append(async_validation_operator_result)
545 except (
546 gx_exceptions.CheckpointError,
547 gx_exceptions.ExecutionEngineError,
548 gx_exceptions.MetricError,
549 ) as e:

File ~/miniconda/lib/python3.12/site-packages/great_expectations/core/async_executor.py:106, in AsyncExecutor.submit(self, fn, *args, **kwargs)
102 return AsyncResult(
103 future=self._thread_pool_executor.submit(fn, *args, **kwargs) # type: ignore[union-attr]
104 )
105 else:
→ 106 return AsyncResult(value=fn(*args, **kwargs))

File ~/miniconda/lib/python3.12/site-packages/great_expectations/validation_operators/validation_operators.py:413, in ActionListValidationOperator.run(self, assets_to_validate, run_id, evaluation_parameters, run_name, run_time, catch_exceptions, result_format, checkpoint_identifier, checkpoint_name, validation_id)
408 validation_result.meta[“validation_id”] = validation_id
409 validation_result.meta[“checkpoint_id”] = (
410 checkpoint_identifier.id if checkpoint_identifier else None
411 )
→ 413 batch_actions_results = self._run_actions(
414 batch=batch,
415 expectation_suite_identifier=expectation_suite_identifier,
416 expectation_suite=batch._expectation_suite,
417 batch_validation_result=validation_result,
418 run_id=run_id,
419 validation_result_id=validation_result_id,
420 checkpoint_identifier=checkpoint_identifier,
421 )
423 run_result_obj = {
424 “validation_result”: validation_result,
425 “actions_results”: batch_actions_results,
426 }
427 run_results[validation_result_id] = run_result_obj

File ~/miniconda/lib/python3.12/site-packages/great_expectations/validation_operators/validation_operators.py:506, in ActionListValidationOperator._run_actions(self, batch, expectation_suite_identifier, expectation_suite, batch_validation_result, run_id, validation_result_id, checkpoint_identifier)
504 except Exception as e:
505 logger.exception(f"Error running action with name {action[‘name’]}")
→ 506 raise e
508 return batch_actions_results

File ~/miniconda/lib/python3.12/site-packages/great_expectations/validation_operators/validation_operators.py:476, in ActionListValidationOperator._run_actions(self, batch, expectation_suite_identifier, expectation_suite, batch_validation_result, run_id, validation_result_id, checkpoint_identifier)
470 validation_result_id = ValidationResultIdentifier(
471 expectation_suite_identifier=expectation_suite_identifier,
472 run_id=run_id,
473 batch_identifier=batch_identifier,
474 )
475 try:
→ 476 action_result = self.actions[name].run(
477 validation_result_suite_identifier=validation_result_id,
478 validation_result_suite=batch_validation_result,
479 data_asset=batch,
480 payload=batch_actions_results,
481 expectation_suite_identifier=expectation_suite_identifier,
482 checkpoint_identifier=checkpoint_identifier,
483 )
485 # Transform action_result if it not a dictionary.
486 if isinstance(action_result, GXCloudResourceRef):

File ~/miniconda/lib/python3.12/site-packages/great_expectations/checkpoint/actions.py:101, in ValidationAction.run(self, validation_result_suite, validation_result_suite_identifier, data_asset, expectation_suite_identifier, checkpoint_identifier, **kwargs)
73 @public_api
74 def run(
75 self,
(…)
83 **kwargs,
84 ):
85 """Public entrypoint GX uses to trigger a ValidationAction.
86
87 When a ValidationAction is configured in a Checkpoint, this method gets called
(…)
99 A Dict describing the result of the Action.
100 """
→ 101 return self._run(
102 validation_result_suite=validation_result_suite,
103 validation_result_suite_identifier=validation_result_suite_identifier,
104 data_asset=data_asset,
105 expectation_suite_identifier=expectation_suite_identifier,
106 checkpoint_identifier=checkpoint_identifier,
107 **kwargs,
108 )

File ~/miniconda/lib/python3.12/site-packages/great_expectations/checkpoint/actions.py:1190, in UpdateDataDocsAction._run(self, validation_result_suite, validation_result_suite_identifier, data_asset, payload, expectation_suite_identifier, checkpoint_identifier)
1184 raise TypeError(
1185 f"validation_result_id must be of type ValidationResultIdentifier or GeCloudIdentifier, not {type(validation_result_suite_identifier)}"
1186 )
1188 # TODO Update for RenderedDataDocs
1189 # build_data_docs will return the index page for the validation results, but we want to return the url for the validation result using the code below
→ 1190 self.data_context.build_data_docs(
1191 site_names=self._site_names,
1192 resource_identifiers=[
1193 validation_result_suite_identifier,
1194 expectation_suite_identifier,
1195 ],
1196 )
1197 #
1198 data_docs_validation_results: dict = {}

File ~/miniconda/lib/python3.12/site-packages/great_expectations/core/usage_statistics/usage_statistics.py:266, in usage_statistics_enabled_method.<locals>.usage_statistics_wrapped_method(*args, **kwargs)
263 args_payload = args_payload_fn(*args, **kwargs) or {}
264 nested_update(event_payload, args_payload)
→ 266 result = func(*args, **kwargs)
267 message["success"] = True
268 except Exception:

File ~/miniconda/lib/python3.12/site-packages/great_expectations/data_context/data_context/abstract_data_context.py:5318, in AbstractDataContext.build_data_docs(self, site_names, resource_identifiers, dry_run, build_index)
5279 @usage_statistics_enabled_method(
5280 event_name=UsageStatsEvents.DATA_CONTEXT_BUILD_DATA_DOCS,
5281 )
(…)
5290 build_index: bool = True,
5291 ) -> dict[str, str]:
5292 """Build Data Docs for your project.
5293
5294 --Documentation--
(…)
5316 ClassInstantiationError: Site config in your Data Context config is not valid.
5317 """
→ 5318 return self._build_data_docs(
5319 site_names=site_names,
5320 resource_identifiers=resource_identifiers,
5321 dry_run=dry_run,
5322 build_index=build_index,
5323 )

File ~/miniconda/lib/python3.12/site-packages/great_expectations/data_context/data_context/abstract_data_context.py:5349, in AbstractDataContext._build_data_docs(self, site_names, resource_identifiers, dry_run, build_index)
5346 complete_site_config = site_config
5347 module_name = “great_expectations.render.renderer.site_builder”
5348 site_builder: SiteBuilder = (
→ 5349 self._init_site_builder_for_data_docs_site_creation(
5350 site_name=site_name,
5351 site_config=site_config,
5352 )
5353 )
5354 if not site_builder:
5355 raise gx_exceptions.ClassInstantiationError(
5356 module_name=module_name,
5357 package_name=None,
5358 class_name=complete_site_config[“class_name”],
5359 )

File ~/miniconda/lib/python3.12/site-packages/great_expectations/data_context/data_context/abstract_data_context.py:5384, in AbstractDataContext._init_site_builder_for_data_docs_site_creation(self, site_name, site_config)
5379 def _init_site_builder_for_data_docs_site_creation(
5380 self,
5381 site_name: str,
5382 site_config: dict,
5383 ) -> SiteBuilder:
→ 5384 site_builder: SiteBuilder = instantiate_class_from_config(
5385 config=site_config,
5386 runtime_environment={
5387 “data_context”: self,
5388 “root_directory”: self.root_directory,
5389 “site_name”: site_name,
5390 },
5391 config_defaults={
5392 “class_name”: “SiteBuilder”,
5393 “module_name”: “great_expectations.render.renderer.site_builder”,
5394 },
5395 )
5396 return site_builder

File ~/miniconda/lib/python3.12/site-packages/great_expectations/data_context/util.py:92, in instantiate_class_from_config(config, runtime_environment, config_defaults)
90 class_instance = class_(**config_with_defaults)
91 except TypeError as e:
---> 92 raise TypeError(
93 f"Couldn't instantiate class: {class_name} with config: \n\t{format_dict_for_error_message(config_with_defaults)}\n \n"
94 + str(e)
95 )
97 return class_instance

TypeError: Couldn’t instantiate class: SiteBuilder with config:
S3_site {‘class_name’: ‘SiteBuilder’, ‘store_backend’: {‘class_name’: ‘TupleS3StoreBackend’, ‘bucket’: ‘de-c2w3a1-359994195626-us-east-1-gx-docs’}, ‘site_index_builder’: {‘class_name’: ‘DefaultSiteIndexBuilder’}}
data_context {
“anonymous_usage_statistics”: {
“explicit_id”: true,
“data_context_id”: “243df769-99f4-460f-8029-9b26d719854f”,
“usage_statistics_url”: “https://stats.greatexpectations.io/great_expectations/v1/usage_statistics”,
“enabled”: true,
“explicit_url”: false
},
“checkpoint_store_name”: “checkpoint_store”,
“config_variables_file_path”: “uncommitted/config_variables.yml”,
“config_version”: 3.0,
“data_docs_sites”: {
“local_site”: {
“S3_site”: {
“class_name”: “SiteBuilder”,
“store_backend”: {
“class_name”: “TupleS3StoreBackend”,
“bucket”: “de-c2w3a1-359994195626-us-east-1-gx-docs”
},
“site_index_builder”: {
“class_name”: “DefaultSiteIndexBuilder”
}
}
}
},
“datasources”: {},
“evaluation_parameter_store_name”: “evaluation_parameter_store”,
“expectations_store_name”: “expectations_store”,
“fluent_datasources”: {},
“include_rendered_content”: {
“globally”: false,
“expectation_validation_result”: false,
“expectation_suite”: false
},
“plugins_directory”: “plugins/”,
“stores”: {
“expectations_store”: {
“class_name”: “ExpectationsStore”,
“store_backend”: {
“class_name”: “TupleS3StoreBackend”,
“bucket”: “de-c2w3a1-359994195626-us-east-1-gx-artifacts”,
“prefix”: “expectations/”
}
},
“validations_store”: {
“class_name”: “ValidationsStore”,
“store_backend”: {
“class_name”: “TupleS3StoreBackend”,
“bucket”: “de-c2w3a1-359994195626-us-east-1-gx-artifacts”,
“prefix”: “validations/”
}
},
“evaluation_parameter_store”: {
“class_name”: “EvaluationParameterStore”
},
“checkpoint_store”: {
“class_name”: “CheckpointStore”,
“store_backend”: {
“class_name”: “TupleS3StoreBackend”,
“suppress_store_backend_id”: false,
“bucket”: “de-c2w3a1-359994195626-us-east-1-gx-artifacts”,
“prefix”: “checkpoints/”
}
},
“profiler_store”: {
“class_name”: “ProfilerStore”,
“store_backend”: {
“class_name”: “TupleFilesystemStoreBackend”,
“suppress_store_backend_id”: true,
“base_directory”: “profilers/”
}
}
},
“validations_store_name”: “validations_store”
}
site_name local_site
runtime_environment {‘data_context’: {
“anonymous_usage_statistics”: {
“explicit_id”: true,
“data_context_id”: “243df769-99f4-460f-8029-9b26d719854f”,
“usage_statistics_url”: “https://stats.greatexpectations.io/great_expectations/v1/usage_statistics”,
“enabled”: true,
“explicit_url”: false
},
“checkpoint_store_name”: “checkpoint_store”,
“config_variables_file_path”: “uncommitted/config_variables.yml”,
“config_version”: 3.0,
“data_docs_sites”: {
“local_site”: {
“S3_site”: {
“class_name”: “SiteBuilder”,
“store_backend”: {
“class_name”: “TupleS3StoreBackend”,
“bucket”: “de-c2w3a1-359994195626-us-east-1-gx-docs”
},
“site_index_builder”: {
“class_name”: “DefaultSiteIndexBuilder”
}
}
}
},
“datasources”: {},
“evaluation_parameter_store_name”: “evaluation_parameter_store”,
“expectations_store_name”: “expectations_store”,
“fluent_datasources”: {},
“include_rendered_content”: {
“globally”: false,
“expectation_validation_result”: false,
“expectation_suite”: false
},
“plugins_directory”: “plugins/”,
“stores”: {
“expectations_store”: {
“class_name”: “ExpectationsStore”,
“store_backend”: {
“class_name”: “TupleS3StoreBackend”,
“bucket”: “de-c2w3a1-359994195626-us-east-1-gx-artifacts”,
“prefix”: “expectations/”
}
},
“validations_store”: {
“class_name”: “ValidationsStore”,
“store_backend”: {
“class_name”: “TupleS3StoreBackend”,
“bucket”: “de-c2w3a1-359994195626-us-east-1-gx-artifacts”,
“prefix”: “validations/”
}
},
“evaluation_parameter_store”: {
“class_name”: “EvaluationParameterStore”
},
“checkpoint_store”: {
“class_name”: “CheckpointStore”,
“store_backend”: {
“class_name”: “TupleS3StoreBackend”,
“suppress_store_backend_id”: false,
“bucket”: “de-c2w3a1-359994195626-us-east-1-gx-artifacts”,
“prefix”: “checkpoints/”
}
},
“profiler_store”: {
“class_name”: “ProfilerStore”,
“store_backend”: {
“class_name”: “TupleFilesystemStoreBackend”,
“suppress_store_backend_id”: true,
“base_directory”: “profilers/”
}
}
},
“validations_store_name”: “validations_store”
}, ‘root_directory’: ‘/home/coder/project/gx’, ‘site_name’: ‘local_site’}

SiteBuilder.__init__() missing 1 required positional argument: 'store_backend'

All the following cells failed to run correctly as well, including a 404 error on visiting the S3 docs bucket URL.

Did you change or remove a cell? When you ran the lab before submitting, did it pass?

No, not that I am aware of. Is there any way I could reset this file: C2_W3_Assignment.ipynb? The last part didn't pass the grader; the output is attached in the screenshot:

Yes, if you stop the lab, wait, and restart it, you will get a new environment. Note that resetting the notebook will lose all your progress.

It looks like you didn't set up the S3 task properly. Can you review the instructions again against your code?
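For what it's worth, the config dump in the traceback shows "S3_site" nested inside "local_site" under "data_docs_sites". That would explain the error: the site named local_site then has no "store_backend" key of its own, which is exactly what SiteBuilder complains about. Below is a minimal sketch using plain dictionaries only (no Great Expectations calls). The helper name sites_missing_store_backend is hypothetical, and the flat layout is an assumption about what the assignment intends, so check it against the lab instructions.

```python
# Shape copied from the config dump in the traceback: "S3_site" sits one
# level too deep, so the site called "local_site" has no "store_backend".
nested_sites = {
    "local_site": {
        "S3_site": {
            "class_name": "SiteBuilder",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "de-c2w3a1-359994195626-us-east-1-gx-docs",
            },
            "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
        }
    }
}

# Assumed intended shape: every key directly under data_docs_sites is a
# site name whose value is that site's own configuration.
flat_sites = {"S3_site": nested_sites["local_site"]["S3_site"]}


def sites_missing_store_backend(data_docs_sites):
    """Hypothetical helper: list site names whose config lacks 'store_backend'."""
    return [
        name
        for name, site_config in data_docs_sites.items()
        if "store_backend" not in site_config
    ]


print(sites_missing_store_backend(nested_sites))  # ['local_site']
print(sites_missing_store_backend(flat_sites))    # []
```

If the first check reports local_site for your own context config, the cell that sets up the S3 site is the one to compare against the instructions.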

Did you mean “Reboot Server”? I might have tried that. Would it be possible to send me a blank C2_W3_Assignment.ipynb file, just in case the reboot doesn't reset the files? I will probably save a copy of my version before doing that.

Yes, one approach you can take is to save the lab and all its files in a different folder and then reset the lab. This will create a fresh copy of the files without any of the work you did.