Hi,
I get the following error when running the Airflow DAG (4.2). It looks similar to the issue in "Capstone Project Part 2, 4.2 - DAG for Songs Data in RDS Source Airflow task failed - #3 by Amir_Zare". As far as I can see, SCRIPTS_BUCKET_NAME is correctly set to the value of the output.
I also see that the Job_Id is not the one that I got for de-c4w4a2-rds-extract-job when I ran “aws glue start-job-run …”. I really don’t understand the problem, and the error message says next to nothing that would help find the cause.
Do I need to fill out this form? Data Engineering Troubleshooting with AWS Issues
Airflow log output for the failed task:
af713470591
*** Found local files:
***   * /opt/airflow/logs/dag_id=deftunes_songs_pipeline_dag/run_id=scheduled__2020-03-01T00:00:00+00:00/task_id=rds_extract_glue_job/attempt=5.log
[2024-12-23, 20:49:52 UTC] {local_task_job_runner.py:123} ▶ Pre task execution logs
[2024-12-23, 20:49:52 UTC] {glue.py:188} INFO - Initializing AWS Glue Job: de-c4w4a2-rds-extract-job. Wait for completion: True
[2024-12-23, 20:49:52 UTC] {glue.py:365} INFO - Checking if job already exists: de-c4w4a2-rds-extract-job
[2024-12-23, 20:49:52 UTC] {base_aws.py:606} WARNING - Unable to find AWS Connection ID 'aws_default', switching to empty.
[2024-12-23, 20:49:52 UTC] {base_aws.py:180} INFO - No connection ID provided. Fallback on boto3 credential strategy (region_name='us-east-1'). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2024-12-23, 20:49:53 UTC] {credentials.py:1075} INFO - Found credentials from IAM Role: de-c4w4a2-ec2-role
[2024-12-23, 20:49:54 UTC] {glue.py:209} INFO - You can monitor this Glue Job run at: https://console.aws.amazon.com/gluestudio/home?region=us-east-1#/job/de-c4w4a2-rds-extract-job/run/jr_2ba1a6f6501826e6cd8cfbab5b67dbb9df976e561d97b0d02fd0aed664278c01
[2024-12-23, 20:49:54 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:00 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:06 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:12 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:18 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:24 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:30 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:36 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:43 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:49 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:55 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:51:01 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:51:07 UTC] {glue.py:345} INFO - Exiting Job jr_2ba1a6f6501826e6cd8cfbab5b67dbb9df976e561d97b0d02fd0aed664278c01 Run State: FAILED
[2024-12-23, 20:51:07 UTC] {taskinstance.py:3310} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 767, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 733, in _execute_callable
    return ExecutionCallableRunner(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/operator_helpers.py", line 252, in run
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/baseoperator.py", line 406, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 223, in execute
    glue_job_run = self.glue_job_hook.job_completion(self.job_name, self._job_run_id, self.verbose)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 297, in job_completion
    ret = self._handle_state(job_run_state, job_name, run_id, verbose, next_log_tokens)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 346, in _handle_state
    raise AirflowException(job_error_message)
airflow.exceptions.AirflowException: Exiting Job jr_2ba1a6f6501826e6cd8cfbab5b67dbb9df976e561d97b0d02fd0aed664278c01 Run State: FAILED
[2024-12-23, 20:51:07 UTC] {taskinstance.py:1225} INFO - Marking task as UP_FOR_RETRY. dag_id=deftunes_songs_pipeline_dag, task_id=rds_extract_glue_job, run_id=scheduled__2020-03-01T00:00:00+00:00, execution_date=20200301T000000, start_date=20241223T204952, end_date=20241223T205107
[2024-12-23, 20:51:07 UTC] {taskinstance.py:340} ▶ Post task execution logs
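
Since the Airflow log above only reports `Run State: FAILED`, the underlying reason probably has to come from Glue itself. A minimal sketch of one way to look it up, assuming boto3 is installed and the same credentials are available: it pulls the job name and run ID out of the monitoring URL in the log, then asks Glue's `get_job_run` for the run's state and `ErrorMessage`. The helper names `parse_failed_run` and `fetch_glue_error` are hypothetical, and the region is taken from the log.

```python
import re

# Matches the monitoring URL that appears in the task log:
#   ...#/job/<job-name>/run/<run-id>
RUN_URL = re.compile(r"#/job/(?P<job>[^/\s]+)/run/(?P<run>jr_[0-9a-f]+)")


def parse_failed_run(log_text):
    """Return (job_name, run_id) found in an Airflow task log, or None."""
    match = RUN_URL.search(log_text)
    if match is None:
        return None
    return match.group("job"), match.group("run")


def fetch_glue_error(job_name, run_id, client=None):
    """Return (state, error_message) for one Glue job run.

    `client` is any object with a boto3-style `get_job_run` method;
    if omitted, a real Glue client is created (assumes us-east-1).
    """
    if client is None:
        import boto3  # imported lazily so the parser works without boto3

        client = boto3.client("glue", region_name="us-east-1")
    job_run = client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
    # ErrorMessage may be absent, so fall back to an empty string.
    return job_run["JobRunState"], job_run.get("ErrorMessage", "")
```

Passing the text of the `attempt=5.log` file to `parse_failed_run` and the result to `fetch_glue_error` should return the state together with whatever error message Glue recorded for that run, which is usually more specific than the Airflow exception.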