Task 4.2.2: I set the following constants in the './dags/deftunes_songs_pipeline.py' file:
= "de-c4w4a2-[user]-us-east-1-data-lake"
= "de-c4w4a2-[user]-us-east-1-scripts"
= "arn:aws:iam::[user]:role/de-c4w4a2-glue-role"
When I run the DAG in Airflow, the rds_extract_glue_job task fails with the following log:
*** Found local files:
*** * /opt/airflow/logs/dag_id=deftunes_songs_pipeline_dag/run_id=scheduled__2020-02-01T00:00:00+00:00/task_id=rds_extract_glue_job/attempt=2.log
[2025-05-28, 03:01:34 UTC] {local_task_job_runner.py:123} ▶ Pre task execution logs
[2025-05-28, 03:01:35 UTC] {glue.py:188} INFO - Initializing AWS Glue Job: de-c4w4a2-rds-extract-job. Wait for completion: True
[2025-05-28, 03:01:35 UTC] {glue.py:365} INFO - Checking if job already exists: de-c4w4a2-rds-extract-job
[2025-05-28, 03:01:35 UTC] {base_aws.py:606} WARNING - Unable to find AWS Connection ID 'aws_default', switching to empty.
[2025-05-28, 03:01:35 UTC] {base_aws.py:180} INFO - No connection ID provided. Fallback on boto3 credential strategy (region_name='us-east-1'). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2025-05-28, 03:01:36 UTC] {credentials.py:1075} INFO - Found credentials from IAM Role: de-c4w4a2-ec2-role
[2025-05-28, 03:01:36 UTC] {glue.py:209} INFO - You can monitor this Glue Job run at: https://console.aws.amazon.com/gluestudio/home?region=us-east-1#/job/de-c4w4a2-rds-extract-job/run/jr_6df1992da086b15667edde29ed62a4d98fa7f571f5617310e11b27f3d9702a84
[2025-05-28, 03:01:37 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:01:43 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:01:49 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:01:55 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:02:01 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:02:07 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:02:13 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:02:19 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:02:25 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:02:31 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:02:37 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:02:44 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:02:50 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:02:56 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:03:02 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2025-05-28, 03:03:08 UTC] {glue.py:345} INFO - Exiting Job jr_6df1992da086b15667edde29ed62a4d98fa7f571f5617310e11b27f3d9702a84 Run State: FAILED
[2025-05-28, 03:03:08 UTC] {taskinstance.py:3310} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 767, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 733, in _execute_callable
return ExecutionCallableRunner(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/operator_helpers.py", line 252, in run
return self.func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/baseoperator.py", line 406, in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 223, in execute
glue_job_run = self.glue_job_hook.job_completion(self.job_name, self._job_run_id, self.verbose)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 297, in job_completion
ret = self._handle_state(job_run_state, job_name, run_id, verbose, next_log_tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 346, in _handle_state
raise AirflowException(job_error_message)
airflow.exceptions.AirflowException: Exiting Job jr_6df1992da086b15667edde29ed62a4d98fa7f571f5617310e11b27f3d9702a84 Run State: FAILED
[2025-05-28, 03:03:08 UTC] {taskinstance.py:1225} INFO - Marking task as FAILED. dag_id=deftunes_songs_pipeline_dag, task_id=rds_extract_glue_job, run_id=scheduled__2020-02-01T00:00:00+00:00, execution_date=20200201T000000, start_date=20250528T030134, end_date=20250528T030308
[2025-05-28, 03:03:08 UTC] {taskinstance.py:340} ▶ Post task execution logs
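The Airflow log above only reports that the Glue run ended in state FAILED; the underlying error is recorded on the Glue job run itself. Below is a minimal sketch of how that message could be retrieved with boto3's Glue `get_job_run` call. The helper names `glue_run_error` and `fetch_from_aws` are hypothetical, and the sketch assumes the credentials in use are allowed to call `glue:GetJobRun`, which has not been verified for this lab environment.

```python
def glue_run_error(glue_client, job_name, run_id):
    """Return (state, error_message) for a single Glue job run.

    glue_client is expected to behave like boto3.client("glue").
    """
    response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
    job_run = response["JobRun"]
    # ErrorMessage is typically only populated for unsuccessful runs,
    # so fall back to an empty string when it is absent.
    return job_run["JobRunState"], job_run.get("ErrorMessage", "")


def fetch_from_aws(job_name, run_id, region_name="us-east-1"):
    """Hypothetical convenience wrapper that builds a real boto3 client."""
    import boto3  # imported lazily so the helper above has no hard dependency

    glue = boto3.client("glue", region_name=region_name)
    return glue_run_error(glue, job_name, run_id)
```

With the job name and run ID taken from the log, a call such as `fetch_from_aws("de-c4w4a2-rds-extract-job", "jr_6df1992da086b15667edde29ed62a4d98fa7f571f5617310e11b27f3d9702a84")` should return the run state together with whatever error text Glue recorded, if the assumptions above hold.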
Any direction or guidance is appreciated! Thank you!