Need help resetting the Week 3 graded assignment on Testing Data Quality with Great Expectations

Hello @mizou,
As Amir pointed out, we are not allowed to share solutions. If you run into a specific problem in a lab, describe it in detail and everyone will be happy to help you find the solution on your own.

C2W3A1: the initial syntax of the file great_expectations.yml, which is generated by the command great_expectations init, does not match the supplied lab instructions, and the keyword bucket is not recognized. Also, the key prefix no longer seems to be used; it looks like it has been replaced by base_directory.

The command great_expectations store list fails with the following error(s):

(jupyterlab-venv) voclabs:~/environment $ great_expectations store list
Traceback (most recent call last):
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/util.py", line 98, in instantiate_class_from_config
    class_instance = class_(**config_with_defaults)
TypeError: __init__() got an unexpected keyword argument 'bucket'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/util.py", line 98, in instantiate_class_from_config
    class_instance = class_(**config_with_defaults)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/store/expectations_store.py", line 151, in __init__
    super().__init__(
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/store/store.py", line 87, in __init__
    self._store_backend = instantiate_class_from_config(
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/util.py", line 100, in instantiate_class_from_config
    raise TypeError(
TypeError: Couldn't instantiate class: TupleFilesystemStoreBackend with config: 
        store_name              expectations_store
        bucket          de-c2w3a1-546183455181-us-east-1-gx-artifacts
        base_directory          expectations/
        manually_initialize_store_backend_id            d7853c25-2df6-475f-8963-8ef82548e6b4
        filepath_suffix         .json
        root_directory          /home/ec2-user/environment/gx
 
__init__() got an unexpected keyword argument 'bucket'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/environment/jupyterlab-venv/bin/great_expectations", line 8, in <module>
    sys.exit(main())
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/cli/cli.py", line 146, in main
    cli()
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/click/core.py", line 1685, in invoke
    super().invoke(ctx)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/cli/store.py", line 13, in store
    ctx.obj.data_context = ctx.obj.get_data_context_from_config_file()
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/cli/cli.py", line 43, in get_data_context_from_config_file
    context: FileDataContext = toolkit.load_data_context_with_error_handling(  # type: ignore[assignment] # will exit if error
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/cli/toolkit.py", line 420, in load_data_context_with_error_handling
    context = get_context(context_root_dir=directory)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/data_context/context_factory.py", line 263, in get_context
    context = _get_context(**kwargs)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/data_context/context_factory.py", line 302, in _get_context
    file_context = _get_file_context(
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/data_context/context_factory.py", line 383, in _get_file_context
    return FileDataContext(
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/data_context/file_data_context.py", line 67, in __init__
    super().__init__(
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/data_context/serializable_data_context.py", line 68, in __init__
    super().__init__(runtime_environment=runtime_environment)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/core/usage_statistics/usage_statistics.py", line 266, in usage_statistics_wrapped_method
    result = func(*args, **kwargs)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/data_context/abstract_data_context.py", line 301, in __init__
    self._init_primary_stores(self.project_config_with_variables_substituted.stores)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/data_context/abstract_data_context.py", line 4399, in _init_primary_stores
    self._build_store_from_config(store_name, store_config)
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/data_context/abstract_data_context.py", line 4332, in _build_store_from_config
    new_store = Store.build_store_from_config(
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/store/store.py", line 319, in build_store_from_config
    new_store = instantiate_class_from_config(
  File "/home/ec2-user/environment/jupyterlab-venv/lib64/python3.9/site-packages/great_expectations/data_context/util.py", line 100, in instantiate_class_from_config
    raise TypeError(
TypeError: Couldn't instantiate class: ExpectationsStore with config: 
        store_name              expectations_store
        store_backend           {'class_name': 'TupleFilesystemStoreBackend', 'bucket': 'de-c2w3a1-546183455181-us-east-1-gx-artifacts', 'base_directory': 'expectations/', 'manually_initialize_store_backend_id': 'd7853c25-2df6-475f-8963-8ef82548e6b4', 'filepath_suffix': '.json'}
        runtime_environment             {'root_directory': '/home/ec2-user/environment/gx'}
 
Couldn't instantiate class: TupleFilesystemStoreBackend with config: 
        store_name              expectations_store
        bucket          de-c2w3a1-546183455181-us-east-1-gx-artifacts
        base_directory          expectations/
        manually_initialize_store_backend_id            d7853c25-2df6-475f-8963-8ef82548e6b4
        filepath_suffix         .json
        root_directory          /home/ec2-user/environment/gx
 
__init__() got an unexpected keyword argument 'bucket'
(jupyterlab-venv) voclabs:~/environment $

The content of the great_expectations.yml file is shared below:

# Welcome to Great Expectations! Always know what to expect from your data.
#
# Here you can define datasources, batch kwargs generators, integrations and
# more. This file is intended to be committed to your repo. For help with
# configuration please:
#   - Read our docs: https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/connect_to_data_overview/#2-configure-your-datasource
#   - Join our slack channel: http://greatexpectations.io/slack

# config_version refers to the syntactic version of this config file, and is used in maintaining backwards compatibility
# It is auto-generated and usually does not need to be changed.
config_version: 3

# Datasources tell Great Expectations where your data lives and how to get it.
# Read more at https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/connect_to_data_overview
datasources: {}

# This config file supports variable substitution which enables: 1) keeping
# secrets out of source control & 2) environment-based configuration changes
# such as staging vs prod.
#
# When GX encounters substitution syntax (like `my_key: ${my_value}` or
# `my_key: $my_value`) in the great_expectations.yml file, it will attempt
# to replace the value of `my_key` with the value from an environment
# variable `my_value` or a corresponding key read from this config file,
# which is defined through the `config_variables_file_path`.
# Environment variables take precedence over variables defined here.
#
# Substitution values defined here can be a simple (non-nested) value,
# nested value such as a dictionary, or an environment variable (i.e. ${ENV_VAR})
#
#
# https://docs.greatexpectations.io/docs/guides/setup/configuring_data_contexts/how_to_configure_credentials


config_variables_file_path: uncommitted/config_variables.yml

# The plugins_directory will be added to your python path for custom modules
# used to override and extend Great Expectations.
plugins_directory: plugins/

stores:
# Stores are configurable places to store things like Expectations, Validations
# Data Docs, and more. These are for advanced users only - most users can simply
# leave this section alone.
#
# Three stores are required: expectations, validations, and
# evaluation_parameters, and must exist with a valid store entry. Additional
# stores can be configured for uses such as data_docs, etc.
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      bucket: de-c2w3a1-546183455181-us-east-1-gx-artifacts
      #prefix: expectations/
      base_directory: expectations/

  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      bucket: de-c2w3a1-546183455181-us-east-1-gx-artifacts
      #prefix: validations/      
      base_directory: uncommitted/validations/

  evaluation_parameter_store:
    # Evaluation Parameters enable dynamic expectations. Read more here:
    # https://docs.greatexpectations.io/docs/reference/evaluation_parameters/
    class_name: EvaluationParameterStore

  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      bucket: de-c2w3a1-546183455181-us-east-1-gx-artifacts
      #prefix: checkpoints/      
      base_directory: checkpoints/

  profiler_store:
    class_name: ProfilerStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: profilers/

expectations_store_name: expectations_store
validations_store_name: validations_store
evaluation_parameter_store_name: evaluation_parameter_store
checkpoint_store_name: checkpoint_store

data_docs_sites:
  # Data Docs make it simple to visualize data quality in your project. These
  # include Expectations, Validations & Profiles. The are built for all
  # Datasources from JSON artifacts in the local repo including validations &
  # profiles from the uncommitted directory. Read more at https://docs.greatexpectations.io/docs/terms/data_docs
  local_site:
    class_name: SiteBuilder
    # set to false to hide how-to buttons in Data Docs
    show_how_to_buttons: true
    store_backend:
        class_name: TupleFilesystemStoreBackend
        bucket: de-c2w3a1-546183455181-us-east-1-gx-docs
        base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
        class_name: DefaultSiteIndexBuilder

anonymous_usage_statistics:
  enabled: True

CONCLUSION: I am blocked and cannot complete this lab/assignment. Please share the appropriate configuration changes, which do not appear to be covered by the lab notebook. Thanks.

@stodic @mizou Thank you for your questions. When you run the command "great_expectations init", the stores for validations, expectations and checkpoints are configured as local files by default. This is why, when you open great_expectations.yml, you will see for example under expectations_store that the backend is of type file system ("TupleFilesystemStoreBackend"), and for that file system you need to define base_directory.

What the question asks you to do is update the configuration of the store backends: instead of using a local store for the validations, expectations and checkpoints stores, you need to define each backend as an S3 bucket. For that, you need to completely change the block under store_backend, as described next.

So class_name should be TupleS3StoreBackend, then you need to specify the bucket, and finally you need to provide the prefix (S3 has no directories like a regular file system; each object has a key that starts with a prefix).
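
As a minimal sketch of the block described above, the expectations_store entry might change from the generated filesystem form to an S3 form roughly as follows. The bucket name and prefix are assumptions taken from the configuration posted earlier in this thread (the prefix comes from the commented-out line there); the lab notebook remains the authoritative reference for the exact values.

```yaml
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      # Generated by `great_expectations init` (local filesystem):
      #   class_name: TupleFilesystemStoreBackend
      #   base_directory: expectations/
      # S3 form: bucket and prefix take the place of base_directory.
      class_name: TupleS3StoreBackend
      # Bucket name as posted above; substitute your own lab's bucket.
      bucket: de-c2w3a1-546183455181-us-east-1-gx-artifacts
      # Assumed from the commented-out `#prefix:` line in the posted file.
      prefix: expectations/
```

This also explains the TypeError in the traceback: TupleFilesystemStoreBackend does not accept a bucket argument, so adding bucket without changing class_name fails when the store is instantiated.
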

You need to do that for expectations_store, validations_store and checkpoint_store.

Finally, for data_docs_sites, you also need to change the block so that it matches the definition given in the Jupyter notebook.
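
For the Data Docs site, a sketch along the same lines could look like the following. It assumes the site keeps the local_site key from the posted file and uses the gx-docs bucket that appears there. Whether a prefix is needed, and what the site should be named, are not stated in this thread, so check the notebook for the exact definition.

```yaml
data_docs_sites:
  # Key kept from the posted file; the notebook may use a different site name.
  local_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleS3StoreBackend
      # Bucket name as posted above; substitute your own lab's bucket.
      bucket: de-c2w3a1-546183455181-us-east-1-gx-docs
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
```
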

Hope that helps! Please let us know if you have any further questions.