C2W3 Assignment 3: Testing Data Quality with Great Expectations

Course 2 Week 3: I am doing the Great Expectations lab.
Despite completing all the steps up to checkpoint.run() (which fails; the steps before it give no errors), when I submit the assignment the grader says I did not do any of the steps up to checkpoint.run(). When I check the S3 buckets, the expectation suite, the S3 artifacts bucket, and the validation folders are all created, but they are all ignored. I do not know why my checkpoint run fails, but even allowing for that I should be getting 85/100, yet I only get 30/100. I have done the lab twice. An additional note: the YAML file parameters to be set up are incorrect; I was getting a constant error when I added the bucket parameter, because the class_name provided in the parameter file is incorrect. Could you please have a look at the exercise? I am unable to go forward because I keep failing this assignment. Thanks

Hello @AQ_2023, I couldn't reproduce any errors, especially in the YAML file. Sorry you tried twice with no success, but make sure to replace the correct stores with the two different bucket names. Could you post your submission report so we can check further? It should look like this:

Hello Georgios, thanks for your fast reply.

In the configuration (YAML) file, for example, the guide sets the expectations_store with

class_name: TupleFilesystemStoreBackend

and this apparently results in an error once the bucket parameter is set.
When I change class_name to this type:

class_name: TupleS3StoreBackend

and also make sure that

base_directory: expectations/

is changed to

prefix: expectations/

then the YAML file works; otherwise I would continuously get an error.
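
For reference, my expectations_store section ended up looking roughly like this (a sketch; the bucket name here is a placeholder, not the value used in the lab):

stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: <your-artifacts-bucket>  # placeholder
      prefix: expectations/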

[Executed at: Mon Sep 30 8:47:54 PDT 2024]

Test 1 passed: Created Cloud9 environment.

Test 2 failed: No graded exercises found in the submission notebook. Please try again.

Test 3 failed: No graded exercises found in the submission notebook. Please try again.

Test 4 failed: The expectation file does not exist.

Test 5 failed: No graded exercises found in the submission notebook. Please try again.

Test 6 passed: The checkpoint file exists with the correct content.

Test 7 failed: No folders found in S3 docs bucket. Please try again.

Hello, thanks for the information. Could you try copying the YAML to a notebook first, just in case? In the new file you need "TupleS3StoreBackend" and "prefix: expectations"; you only change the artifact bucket (in 3 places) and the docs bucket (in one place).

In case of an error you can then revert to the copy you made, so you won't have to repeat everything.

Hint: the blocks in the instructions should have the correct indentation, so check against the original file; it has to look identical. Finally, add the names of the buckets and change nothing else. Thanks
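
For reference, the docs bucket goes in the data_docs_sites section; in a typical Great Expectations YAML it looks roughly like this (a sketch with a placeholder bucket name, not the exact file from the lab):

data_docs_sites:
  s3_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: <your-docs-bucket>  # placeholder
    site_index_builder:
      class_name: DefaultSiteIndexBuilder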

I have an issue with the solution I created for the validations: when it loops over the batches I get too much output.
This is the syntax I have used:

validations = [
    {"batch_request": batches, "expectation_suite_name": "expectation_suite_name"}
    for batch in batches
]
validations

This is my output (abridged here: the full output repeats the same three Batch objects inside each of the three validation entries):

[{'batch_request': [Batch(datasource=SQLDatasource(type='sql', name='de-c2w3a1-db-datasource', id=None, assets=[TableAsset(name='de-c2w3a1-trips', type='table', id=None, order_by=[], batch_metadata={}, splitter=SplitterColumnValue(column_name='vendor_id', method_name='split_on_column_value'), table_name='trips', schema_name=None)], connection_string=ConfigStr('${MYSQL_CONNECTION_STRING}'), create_temp_table=False, kwargs={}), data_asset=TableAsset(name='de-c2w3a1-trips', type='table', id=None, order_by=[], batch_metadata={}, splitter=SplitterColumnValue(column_name='vendor_id', method_name='split_on_column_value'), table_name='trips', schema_name=None), batch_request=BatchRequest(datasource_name='de-c2w3a1-db-datasource', data_asset_name='de-c2w3a1-trips', options={'vendor_id': 1}), data=<great_expectations.execution_engine.sqlalchemy_batch_data.SqlAlchemyBatchData object at 0x7fadd0861700>, id='de-c2w3a1-db-datasource-de-c2w3a1-trips-vendor_id_1', metadata={'vendor_id': 1}, batch_markers={'ge_load_time': '20240930T204247.156885Z'}, batch_spec={'type': 'table', 'data_asset_name': 'de-c2w3a1-trips', 'table_name': 'trips', 'schema_name': None, 'batch_identifiers': {'vendor_id': 1}, 'splitter_method': 'split_on_column_value', 'splitter_kwargs': {'column_name': 'vendor_id'}}, batch_definition={'datasource_name': 'de-c2w3a1-db-datasource', 'data_connector_name': 'fluent', 'data_asset_name': 'de-c2w3a1-trips', 'batch_identifiers': {'vendor_id': 1}}),
Batch(..., options={'vendor_id': 2}, ...),
Batch(..., options={'vendor_id': 4}, ...)],
'expectation_suite_name': 'expectation_suite_name'},
{'batch_request': [... the same three Batch objects ...], 'expectation_suite_name': 'expectation_suite_name'},
{'batch_request': [... the same three Batch objects ...], 'expectation_suite_name': 'expectation_suite_name'}]

Additionally, in the context exercise I am using the following code:

context.add_or_update_checkpoint(checkpoint_name)

and I get this output, which is less than what is expected from the assignment's sample output:

{
  "action_list": [
    {
      "name": "store_validation_result",
      "action": {
        "class_name": "StoreValidationResultAction"
      }
    },
    {
      "name": "store_evaluation_params",
      "action": {
        "class_name": "StoreEvaluationParametersAction"
      }
    },
    {
      "name": "update_data_docs",
      "action": {
        "class_name": "UpdateDataDocsAction"
      }
    }
  ],
  "batch_request": {},
  "class_name": "Checkpoint",
  "config_version": 1.0,
  "evaluation_parameters": {},
  "module_name": "great_expectations.checkpoint",
  "name": "de-c2w3a1-checkpoint-trips-1727729147.8883522",
  "profilers": [],
  "runtime_configuration": {},
  "validations": []
}

checkpoint_result = checkpoint.run()

results in an error, and I do not know what the reason for that is. Here is the error message:


TypeError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 checkpoint_result = checkpoint.run()

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/core/usage_statistics/usage_statistics.py:266, in usage_statistics_enabled_method.<locals>.usage_statistics_wrapped_method(*args, **kwargs)
--> 266 result = func(*args, **kwargs)

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/checkpoint/checkpoint.py:265, in BaseCheckpoint.run(self, template_name, run_name_template, expectation_suite_name, batch_request, validator, action_list, evaluation_parameters, runtime_configuration, validations, profilers, run_id, run_name, run_time, result_format, expectation_suite_ge_cloud_id)
--> 265 substituted_runtime_config: dict = self.get_substituted_config(runtime_kwargs=runtime_kwargs)

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/checkpoint/checkpoint.py:370, in BaseCheckpoint.get_substituted_config(self, runtime_kwargs)
--> 370 config_kwargs: dict = self.get_config(mode=ConfigOutputModes.JSON_DICT)

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/core/config_peer.py:69, in ConfigPeer.get_config(self, mode, **kwargs)
--> 69 config_kwargs = config.to_json_dict()

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/types/base.py:3065, in CheckpointConfig.to_json_dict(self)
--> 3065 dict_obj: dict = self.to_dict()

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/types/__init__.py:137, in DictDot.to_dict(self)
--> 137 for key in self.property_names(include_keys=self.include_field_names, exclude_keys=self.exclude_field_names)

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/types/__init__.py:230, in DictDot.property_names(self, include_keys, exclude_keys)
--> 230 assert_valid_keys(keys=exclude_keys, purpose="exclusion")

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/types/__init__.py:212, in DictDot.property_names.<locals>.assert_valid_keys(keys, purpose)
--> 212 _ = self[name]

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/types/__init__.py:70, in DictDot.__getitem__(self, item)
--> 70 return getattr(self, item)

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/types/base.py:181, in BaseYamlConfig.commented_map(self)
--> 181 return self._get_schema_validated_updated_commented_map()

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/types/base.py:152, in BaseYamlConfig._get_schema_validated_updated_commented_map(self)
--> 152 schema_validated_map: dict = self._get_schema_instance().dump(self)

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/marshmallow/schema.py:547, in Schema.dump(self, obj, many)
--> 547 processed_obj = self._invoke_dump_processors(PRE_DUMP, obj, many=many, original_data=obj)

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/marshmallow/schema.py:1068, in Schema._invoke_dump_processors(self, tag, data, many, original_data)
--> 1068 data = self._invoke_processors(tag, pass_many=False, data=data, many=many, original_data=original_data)

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/marshmallow/schema.py:1222, in Schema._invoke_processors(self, tag, pass_many, data, many, original_data, **kwargs)
--> 1222 data = processor(data, many=many, **kwargs)

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/types/base.py:2767, in CheckpointConfigSchema.prepare_dump(self, data, **kwargs)
--> 2767 data = copy.deepcopy(data)

File /usr/lib64/python3.9/copy.py:153, in deepcopy(x, memo, _nil)
--> 153 y = copier(memo)

File ~/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/types/base.py:3045, in CheckpointConfig.__deepcopy__(self, memo)
--> 3045 value_copy = safe_deep_copy(data=value, memo=memo)

[... recursive frames through great_expectations.types.safe_deep_copy and /usr/lib64/python3.9/copy.py (deepcopy, _reconstruct, _deepcopy_dict, _deepcopy_list) elided ...]

File /usr/lib64/python3.9/copy.py:161, in deepcopy(x, memo, _nil)
    159 reductor = getattr(x, "__reduce_ex__", None)
    160 if reductor is not None:
--> 161 rv = reductor(4)

TypeError: cannot pickle '_thread.lock' object


Hello, I am not sure why the grader is not giving you any points, since I can complete the lab without any issues. Just try to make sure you understand how to edit that expectations YAML file. Hope it helps.

Even though I have the validation and other outputs in the S3 buckets, the grader ignores them, and I am really stuck.

Is the syntax for the validations correct? This is what I had for the validations:

validations = [
    {"batch_request": batches, "expectation_suite_name": "expectation_suite_name"}
    for batch in batches
]
validations

Could anyone please help? I received my grading, and even though I do not get any error when I do the expectation suite exercise, the grader gives me zero for it. I use this syntax:

# Add an expectation suite name to the context
expectation_suite_name = f"{LAB_PREFIX}-expectation-suite-trips-taxi-db"

### START CODE HERE ### (~ 1 line of code)

# None.None(expectation_suite_name=None)
context.add_or_update_expectation_suite(expectation_suite_name="expectation_suite_name")

### END CODE HERE ###

And later, when I create a checkpoint, I get no error using this syntax:

context.add_or_update_checkpoint(checkpoint_name)

but the output is completely different from what the assignment shows. This is my output from executing that syntax:

{
  "action_list": [
    {
      "name": "store_validation_result",
      "action": {
        "class_name": "StoreValidationResultAction"
      }
    },
    {
      "name": "store_evaluation_params",
      "action": {
        "class_name": "StoreEvaluationParametersAction"
      }
    },
    {
      "name": "update_data_docs",
      "action": {
        "class_name": "UpdateDataDocsAction"
      }
    }
  ],
  "batch_request": {},
  "class_name": "Checkpoint",
  "config_version": 1.0,
  "evaluation_parameters": {},
  "module_name": "great_expectations.checkpoint",
  "name": "de-c2w3a1-checkpoint-trips-1727729147.8883522",
  "profilers": [],
  "runtime_configuration": {},
  "validations": []
}

Could anyone please tell me if I have done anything wrong with the expectation suite and with the checkpoint? Any hints or tips would be highly appreciated. Big thanks in advance.

Hello @AQ_2023, in exercise 5 you are supposed to get an error because you use the whole batches instead of batch.batch_request as the parameter. (That would also explain the "cannot pickle '_thread.lock'" TypeError above: a Batch object carries a live database connection that the checkpoint config cannot deep-copy, while a BatchRequest is plain configuration data.) Try fixing your code and re-submit. A screenshot of your submission report is helpful:

Hi Georgios,
I tried to resolve the issue as you mentioned, and this is my submission report.

Thanks

Test 3 failed: Output item in ex02 is incorrect: {'expectation_suite_name': 'expectation_suite_name', 'ge_cloud_id': None, 'expectations': [], 'data_asset_type': None, 'meta': {'great_expectations_version': '0.18.9'}}. Please try again

Hello @AQ_2023, yes, exercise 2 looks good. Did you get any errors after creating the YAML file this time (the code is provided, just change the bucket names)? If you have a bug early on, you will keep reproducing errors even if the later steps are correct. Does the submission report show that exercise 1 is now correct?

Also, as I said before, keep a copy of the YAML file before you change the bucket names, so you don't have to start over, but only if you are sure it's correct; we don't want to reproduce the same bugs.

About exercise 5: you have made a mistake in "batch_request": None.None. Try adding batch.batch_request instead of the whole batches. See if it can pass the grader. If not, it might be like exercise 2: correct, but affected by earlier issues.
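
In other words, the list comprehension should end up looking something like this (a sketch; here expectation_suite_name stands for the variable holding your suite name, not a quoted string):

validations = [
    {"batch_request": batch.batch_request, "expectation_suite_name": expectation_suite_name}
    for batch in batches
]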

Feel free to ask for any hints if you are stuck. Thanks

Hi Georgios, no, there were no errors after creating the YAML file this time. Thank you for that.
For the batches I used the batch_request syntax.

For exercise 6 I used this syntax:

context.add_or_update_checkpoint(checkpoint_name)

but the output I get is much smaller than what the assignment shows for exercise 6:
{
  "action_list": [
    {
      "name": "store_validation_result",
      "action": {
        "class_name": "StoreValidationResultAction"
      }
    },
    {
      "name": "store_evaluation_params",
      "action": {
        "class_name": "StoreEvaluationParametersAction"
      }
    },
    {
      "name": "update_data_docs",
      "action": {
        "class_name": "UpdateDataDocsAction"
      }
    }
  ],
  "batch_request": {},
  "class_name": "Checkpoint",
  "config_version": 1.0,
  "evaluation_parameters": {},
  "module_name": "great_expectations.checkpoint",
  "name": "de-c2w3a1-checkpoint-trips-1727729147.8883522",
  "profilers": [],
  "runtime_configuration": {},
  "validations": []
}

Any tips on how I could fix exercise 6 so that I get the exact output specified in the assignment?
Thanks

Hello @AQ_2023, exercise 6 has the same logic as exercise 2. The difference is that you use the checkpoint instead of the expectation_suite. Just use the same context with add_or_update checkpoint, and in the parentheses pass checkpoint=checkpoint as well. Thanks

Exercise 6
I tried exactly this syntax, as you said:

### START CODE HERE ### (~ 1 line of code)

checkpoint.context.add_or_update_checkpoint(checkpoint=checkpoint)

### END CODE HERE ###

and I get this error:

AttributeError                            Traceback (most recent call last)
Cell In[19], line 4
      1 ### START CODE HERE ### (~ 1 line of code)
      2
      3 # None.None(checkpoint=None)
----> 4 checkpoint.context.add_or_update_checkpoint(checkpoint=checkpoint)
      6 ### END CODE HERE ###

AttributeError: 'Checkpoint' object has no attribute 'context'

I do not know how I can resolve this, as I followed exactly what you said.

Thanks

Also, when I ran this syntax:

checkpoint_result = checkpoint.run()

I get a huge error:

Error running action with name update_data_docs
Traceback (most recent call last):
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/util.py", line 98, in instantiate_class_from_config
    class_instance = class_(**config_with_defaults)
TypeError: __init__() got an unexpected keyword argument 'base_directory'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/util.py", line 98, in instantiate_class_from_config
    class_instance = class_(**config_with_defaults)

You can try removing the first checkpoint before context: just use context and keep the rest of the cell as it is. Hope it will pass exercise 6 now. Thanks
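
That is, the line should end up looking something like:

context.add_or_update_checkpoint(checkpoint=checkpoint)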


Hi Georgios,
Many thanks, it worked!
Thank you
