I am running generate_dags in the terminal and get an empty DAG folder…
Help
Hello @Chenko,
It seems you ran the command that creates the dag_configs folder from inside the src/templates folder. Your terminal should be in the project folder, as in step 5.2.1, so that the correct path is created:
mkdir -p src/templates/dag_configs
This should create the correct path. Then try copying the 3 config files into that folder. Thank you.
Thanks for the response, it worked!
Now I get an error in the code when running the DAG…
Here’s my code. What am I missing here?
Hello @Chenko,
As you can see, in the is_deployable task you are taking the performance value from the train_and_evaluate task. Could you check that line 109 of template.py returns performance? Hope it helps.
Hello @Chenko,
I had a similar issue before when there were non-integer values in the train set. It seems there is a bug in your code in the train_and_evaluate task: you used double curly braces, {{ vendor_name }}, instead of single ones:
train = pd.read_parquet(f"{datasets_path}/{{ vendor_name }}/train.parquet")
test = pd.read_parquet(f"{datasets_path}/{{ vendor_name }}/test.parquet")
Use {vendor_name} instead:
train = pd.read_parquet(f"{datasets_path}/{vendor_name}/train.parquet")
test = pd.read_parquet(f"{datasets_path}/{vendor_name}/test.parquet")
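To illustrate the difference, here is a minimal sketch of how plain Python treats the two spellings inside an f-string. The bucket path and vendor name are made-up values, not the ones from the lab, and this only shows f-string behavior on its own; if a templating step rewrites template.py before Python runs it, the outcome may differ.

```python
# Hypothetical values, used only for illustration.
datasets_path = "s3://example-bucket/datasets"
vendor_name = "vendor_a"

# Double braces are an escape in f-strings: they produce literal braces
# and the variable is NOT substituted.
wrong = f"{datasets_path}/{{ vendor_name }}/train.parquet"

# Single braces substitute the value of the variable.
right = f"{datasets_path}/{vendor_name}/train.parquet"

print(wrong)  # s3://example-bucket/datasets/{ vendor_name }/train.parquet
print(right)  # s3://example-bucket/datasets/vendor_a/train.parquet
```

With the double braces, the path points at a folder literally named "{ vendor_name }", which would not exist in the bucket.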
Hi, still not working…
Is everything defined correctly in the pictures I’ve added?
I tried all kinds of things and nothing worked (it only worked once, when I hardcoded 409 instead of the performance variable to check whether it runs).
Changed -
Thanks
Hello @Chenko,
Your change to lines 85-86 looks correct, and so do your dependencies. Could you check that you didn’t make the same mistake with {vendor_name} in line 51:
f"s3://{Variable.get('bucket_name')}/work_zone/data_science_project/datasets/"
f"{vendor_name}/train.parquet" <--- Did you use {{vendor_name}} here as well?
),
Unfortunately, I am waiting for a lab refresh since I made too many attempts. You could send me your template.py so I can check it. Thank you.
Hello @Chenko,
Yes, the code is complete and looks identical to mine. I would check that you used the raw_data bucket and not the dags bucket when you defined the Variable in the Airflow UI and when you copied the parquet files in step 3.2. Finally, check manually that the template I saw correctly updates the three files in the dags folder. Hope it helps. Are you still getting the same error after fixing the files?
TypeError: '<' not supported between instances of 'NoneType' and 'int'
@Chenko Found it in line 122, you used:
performance = ti.xcom_pull(task_ids="train_and_evalute")
instead of:
performance = ti.xcom_pull(task_ids="train_and_evaluate")
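For anyone hitting the same error, here is a rough sketch of why the typo surfaces as a TypeError rather than a "task not found" message. It does not use Airflow: fake_xcom_pull, the xcoms dictionary, the stored value, and the threshold are all hypothetical stand-ins, assuming (as the error message suggests) that pulling from an unknown task id yields None.

```python
# Stand-in for XCom storage: return values keyed by the task_id that produced them.
xcoms = {"train_and_evaluate": 0.5}  # hypothetical performance value
threshold = 1  # hypothetical integer threshold


def fake_xcom_pull(task_ids):
    """Mimic the lookup: an unknown task id gives None instead of raising."""
    return xcoms.get(task_ids)


# Misspelled task id: nothing is found, so performance is None.
performance = fake_xcom_pull(task_ids="train_and_evalute")

try:
    deployable = performance < threshold
except TypeError as err:
    message = str(err)
    print(message)

# Correct task id: the stored value comes back and the comparison works.
performance_ok = fake_xcom_pull(task_ids="train_and_evaluate")
print(performance_ok < threshold)
```

This also fits the earlier observation that hardcoding a number in place of performance made the task run: the comparison itself was fine, only the pulled value was missing.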
Thanks