In this assignment I wanted to make sure that template.py and model_trip_duration_easy_destiny.py were working, so I ran both files. I got the following error in Cloud9:
File "/home/ec2-user/environment/src/templates/template.py", line 5, in <module>
    import great_expectations as gx
ModuleNotFoundError: No module named 'great_expectations'
I also got the error with model_trip_duration_easy_destiny.py:
voclabs:~/environment/src $ python3 model_trip_duration_easy_destiny.py
Traceback (most recent call last):
  File "/home/ec2-user/environment/src/model_trip_duration_easy_destiny.py", line 5, in <module>
    import great_expectations as gx
ModuleNotFoundError: No module named 'great_expectations'
Additionally, maybe I have missed something, but when I try to regenerate the DAGs, I still cannot get rid of this error in Airflow's DAGs page. It just keeps showing the previous errors.
When you read through the error logs for the 3 DAGs, you can see that Airflow cannot find model_trip_duration_dag because it is not defined in your code.
In your case, this likely happens because you replaced the function name occurrence with the Jinja template variable, but not at the end of your code, where the function needs to be called so that Airflow can recognize the DAG.
For easy replacement of all occurrences, you can use Ctrl+F with the Replace All feature.
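To make this concrete, the end of the template should look roughly like the following sketch. This assumes the TaskFlow @dag style; everything except the {{ dag_name }} placeholder is illustrative, not the lab's exact code:

from airflow.decorators import dag

@dag(schedule=None, catchup=False)
def {{ dag_name }}():
    ...  # task definitions go here

# This final call is what registers the DAG; if it still uses the old
# hard-coded name, Airflow reports the function as not defined.
{{ dag_name }}()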
Hi data_otter,
Okay, thanks.
Additionally, when I was running model_trip_duration_easy_destiny.py standalone,
I was getting the error that the great_expectations module was not found. Do I have to install this module myself?
Thanks
voclabs:~/environment/src $ python3 model_trip_duration_easy_destiny.py
Traceback (most recent call last):
  File "/home/ec2-user/environment/src/model_trip_duration_easy_destiny.py", line 5, in <module>
    import great_expectations as gx
ModuleNotFoundError: No module named 'great_expectations'
I don’t remember if Cloud9 instances come pre-installed with great-expectations, but you shouldn’t have to install it yourself; at least I didn’t when I went through the lab.
Since you are probably using a clean lab account at this point, try going through the lab first. If the error persists, then try pip install great-expectations.
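If you do install it, a quick sanity check from Python confirms the module is importable (just an illustrative check; great_expectations exposes a __version__ attribute):

import great_expectations as gx

# Prints the installed version if the import succeeds
print(gx.__version__)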
Hi, I am currently doing the lab from scratch. I finished up to exercise 4.
I ran model_trip_duration_easy_destiny.py
and got the error that great_expectations could not be imported.
I executed the command to install great_expectations:
pip install great-expectations
Then I ran model_trip_duration_easy_destiny.py again.
Now I get this error:
Traceback (most recent call last):
  File "/home/ec2-user/environment/src/model_trip_duration_easy_destiny.py", line 9, in <module>
    from airflow.decorators import (
ModuleNotFoundError: No module named 'airflow'
Do I need to install airflow.decorators?
Additionally, I am also trying to restart Apache Airflow (I do not know if that would help get rid of the error):
bash ./scripts/restart_airflow.sh
but for five minutes I have been getting messages that the service is not healthy.
Additionally, could anyone please provide a hint regarding Jinja templating? I am doing the replacement in template.py, but I get an error when using {{ dag_name }}.
The Python files you see in this lab are specifically designed to run in Airflow. I don’t know if it’s possible to run them in the terminal, but I don’t suggest doing so. There are dependencies, like the Airflow library and the bucket_name variable you create in exercise 1, that are only available when the code is run as a DAG in Airflow.
Moreover, Jinja templating is used to create multiple Python DAG files from a single template. See step 5.3.2 and the generate_dags.py file for how this is done; a sketch of the idea follows below. After creating the files from the template, you can try running them.
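For intuition, the generator does something along these lines (a minimal sketch with assumed paths and vendor names, not the lab's actual generate_dags.py):

from pathlib import Path
from jinja2 import Environment, FileSystemLoader

# Load the Jinja template from the templates folder (path is illustrative)
env = Environment(loader=FileSystemLoader("src/templates"))
template = env.get_template("template.py")

# Render one DAG file per vendor by substituting {{ dag_name }}
for vendor in ["easy_destiny", "alitran", "to_my_place_ai"]:
    rendered = template.render(dag_name=f"model_trip_duration_{vendor}")
    Path(f"src/dags/model_trip_duration_{vendor}.py").write_text(rendered)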
Hi again @AQ_2023
First of all, please refrain from posting your answers on the public forum since it is against the code of conduct.
Secondly, as I can see from the error you are getting, the address of the train.parquet file is incorrect. You can explore your S3 buckets from the AWS console to see where the files are:
As can be seen from the image, the sample address for alitran should look like de-c2w4a1-<Account_Id>-us-east-1-raw-data/work_zone/data_science_project/datasets/alitran/train.parquet, while the one you have written on line 85 of the model_trip_duration_easy_destiny.py file in exercise 2 leads to a different location.
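If you prefer checking from code instead of the console, you can list the keys under the prefix with boto3 (a sketch; the bucket name keeps the <Account_Id> placeholder from above, so substitute your own account id):

import boto3

s3 = boto3.client("s3")
# Replace <Account_Id> with your actual account id before running
resp = s3.list_objects_v2(
    Bucket="de-c2w4a1-<Account_Id>-us-east-1-raw-data",
    Prefix="work_zone/data_science_project/datasets/alitran/",
)
for obj in resp.get("Contents", []):
    print(obj["Key"])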
EDIT: Hmm, OK, I fixed that issue (I had s3 in the bucket_name variable in Airflow), but now it does not detect the variable and says that bucket_name is not defined.
I changed it several times and restarted Airflow… still the message is:
Broken DAG: [/opt/airflow/dags/.~c9_invoke_CckkG.py]
Traceback (most recent call last):
  File "/opt/airflow/dags/.~c9_invoke_CckkG.py", line 50, in model_trip_duration_to_my_place_ai
    f"s3://{Variable.get('bucket_name')}/work_zone/data_science_project/datasets/"
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/variable.py", line 145, in get
    raise KeyError(f"Variable {key} does not exist")
KeyError: 'Variable bucket_name does not exist'
Hello @hcara
From the error you are getting, I can see that the problem is with the bucket_name variable that is supposed to be created in the Airflow UI. Please double-check what you did in exercise 1 and verify that this variable exists.
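For reference, this is the pattern the DAG uses to read that variable (the default_var variant is just a debugging suggestion of mine, not part of the lab):

from airflow.models import Variable

# Raises KeyError('Variable bucket_name does not exist') when the key is missing
bucket_name = Variable.get("bucket_name")

# Debugging variant: a default lets the DAG import while you sort out the
# missing variable; remove it once bucket_name exists in Admin > Variables
bucket_name = Variable.get("bucket_name", default_var=None)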
I defined it, checked several times, restarted Airflow… In the end it worked, even though I didn’t change anything. It looks like Airflow simply wasn’t picking it up for some time.
Hello @sshetty
Seems like you are trying to run the DAG Python files from the terminal. You aren’t supposed to do this. Instead, follow the lab instructions to upload the DAG files into Airflow; then you can run them via the Airflow UI.
If the issue persists, please provide a screenshot of the error you see and specify in which step of the lab you are getting the error.
I am getting the same error for each of the 3 pipelines.
I’ve copied the error message from Airflow’s “DAG Import Errors” for easy_destiny, but the others are the same:
Broken DAG: [/opt/airflow/dags/model_trip_duration_easy_destiny.py]
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/pyarrow/dataset.py", line 465, in _filesystem_dataset
    fs, paths_or_selector = _ensure_single_source(source, filesystem)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/pyarrow/dataset.py", line 441, in _ensure_single_source
    raise FileNotFoundError(path)
FileNotFoundError: de-c2w4a1-182196532500-us-east-1-raw-data/work_zone/data_science_project/datasets/easy_destiny/train.parquet
The S3 URI as copied from the bucket is as below:
s3://de-c2w4a1-182196532500-us-east-1-raw-data/work-zone/data_science_project/datasets/easy_destiny/train.parquet
The code from line 48 in /src/dags/model_trip_duration_easy_destiny.py is:
data_asset_name="train_easy_destiny",
dataframe_to_validate=pd.read_parquet(
    f"s3://{Variable.get('bucket_name')}/work_zone/data_science_project/datasets/"
    f"{vendor_name}/train.parquet"
),
Hello @oisinom
Tasks should not run when the DAG is imported, but the error you are getting shows that this has been happening. It happens when you define a task but, instead of passing a function as its python_callable, you call the function. For instance, line 137 of the template.py file should be python_callable=_is_deployable, not python_callable=_is_deployable(). Please look out for these kinds of errors (see the sketch below) and the issue will go away. However, if you can’t find the source of it, send me your filled-out template.py privately, and I will try to figure out the problem.
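To illustrate the difference, here is a minimal sketch; the operator form, task id, and DAG scaffolding are assumptions for illustration, not the lab's exact code:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def _is_deployable():
    ...  # placeholder body

with DAG(dag_id="example", start_date=datetime(2024, 1, 1), schedule_interval=None) as dag:
    # Wrong: the trailing () executes _is_deployable at DAG import time
    # check = PythonOperator(task_id="is_deployable", python_callable=_is_deployable())

    # Right: pass the function object; Airflow calls it when the task runs
    check = PythonOperator(
        task_id="is_deployable",
        python_callable=_is_deployable,  # no parentheses
    )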