Week 4 Part 2 - Error running DAG RDS

Hi,
I get the following error when running the Airflow DAG (4.2), similar to the one in the thread "Capstone Project Part 2, 4.2 - DAG for Songs Data in RDS Source Airflow task failed - #3 by Amir_Zare". As far as I can see, SCRIPTS_BUCKET_NAME is correctly set to the value of the output. I also see that the Job_Id in the log is not the one that I got for de-c4w4a2-rds-extract-job when I ran “aws glue start-job-run …”. I really don’t understand the problem, and the error message gives almost nothing to go on.
Do I need to fill out this form? Data Engineering Troubleshooting with AWS Issues

Airflow log output for the failed task:
af713470591
*** Found local files:
***   * /opt/airflow/logs/dag_id=deftunes_songs_pipeline_dag/run_id=scheduled__2020-03-01T00:00:00+00:00/task_id=rds_extract_glue_job/attempt=5.log
[2024-12-23, 20:49:52 UTC] {local_task_job_runner.py:123} ▶ Pre task execution logs
[2024-12-23, 20:49:52 UTC] {glue.py:188} INFO - Initializing AWS Glue Job: de-c4w4a2-rds-extract-job. Wait for completion: True
[2024-12-23, 20:49:52 UTC] {glue.py:365} INFO - Checking if job already exists: de-c4w4a2-rds-extract-job
[2024-12-23, 20:49:52 UTC] {base_aws.py:606} WARNING - Unable to find AWS Connection ID 'aws_default', switching to empty.
[2024-12-23, 20:49:52 UTC] {base_aws.py:180} INFO - No connection ID provided. Fallback on boto3 credential strategy (region_name='us-east-1'). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2024-12-23, 20:49:53 UTC] {credentials.py:1075} INFO - Found credentials from IAM Role: de-c4w4a2-ec2-role
[2024-12-23, 20:49:54 UTC] {glue.py:209} INFO - You can monitor this Glue Job run at: https://console.aws.amazon.com/gluestudio/home?region=us-east-1#/job/de-c4w4a2-rds-extract-job/run/jr_2ba1a6f6501826e6cd8cfbab5b67dbb9df976e561d97b0d02fd0aed664278c01
[2024-12-23, 20:49:54 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:00 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:06 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:12 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:18 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:24 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:30 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:36 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:43 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:49 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:50:55 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:51:01 UTC] {glue.py:348} INFO - Polling for AWS Glue Job de-c4w4a2-rds-extract-job current run state with status RUNNING
[2024-12-23, 20:51:07 UTC] {glue.py:345} INFO - Exiting Job jr_2ba1a6f6501826e6cd8cfbab5b67dbb9df976e561d97b0d02fd0aed664278c01 Run State: FAILED
[2024-12-23, 20:51:07 UTC] {taskinstance.py:3310} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 767, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 733, in _execute_callable
    return ExecutionCallableRunner(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/operator_helpers.py", line 252, in run
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/baseoperator.py", line 406, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 223, in execute
    glue_job_run = self.glue_job_hook.job_completion(self.job_name, self._job_run_id, self.verbose)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 297, in job_completion
    ret = self._handle_state(job_run_state, job_name, run_id, verbose, next_log_tokens)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 346, in _handle_state
    raise AirflowException(job_error_message)
airflow.exceptions.AirflowException: Exiting Job jr_2ba1a6f6501826e6cd8cfbab5b67dbb9df976e561d97b0d02fd0aed664278c01 Run State: FAILED
[2024-12-23, 20:51:07 UTC] {taskinstance.py:1225} INFO - Marking task as UP_FOR_RETRY. dag_id=deftunes_songs_pipeline_dag, task_id=rds_extract_glue_job, run_id=scheduled__2020-03-01T00:00:00+00:00, execution_date=20200301T000000, start_date=20241223T204952, end_date=20241223T205107
[2024-12-23, 20:51:07 UTC] {taskinstance.py:340} ▶ Post task execution logs

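Hello @Jannes_Klee,
It looks like the Airflow task rds_extract_glue_job is failing because the Glue job it starts, de-c4w4a2-rds-extract-job, ends in the FAILED state. The Airflow log only reports that final state, not the cause. To find the cause, open the AWS Glue console, look up the run ID shown in the log (jr_2ba1a6f6...), and check its logs and error message. Also, were you able to run this Glue job manually in step 2.4? Were there any errors then?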
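
If the console is awkward to navigate, the same lookup can probably be done from code. Below is a minimal sketch, assuming boto3 is installed and the environment has credentials that are allowed to read Glue job runs (for example the de-c4w4a2-ec2-role seen in the log). The helper name `describe_glue_run` and the `main` wrapper are made up for illustration; `get_job_run` is the boto3 Glue client call, and a failed run normally carries its reason in the `ErrorMessage` field of the response.

```python
def describe_glue_run(glue_client, job_name, run_id):
    """Return the final state and error message of a single Glue job run.

    glue_client is expected to be a boto3 Glue client (or anything exposing
    a compatible get_job_run method).
    """
    response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
    run = response["JobRun"]
    return {
        "state": run.get("JobRunState"),
        # ErrorMessage may be absent (e.g. for successful runs), so default to "".
        "error": run.get("ErrorMessage", ""),
    }


def main():
    # Imported here so the helper above has no hard dependency on boto3.
    import boto3

    # Region, job name, and run ID are taken from the Airflow log in this thread.
    glue = boto3.client("glue", region_name="us-east-1")
    info = describe_glue_run(
        glue,
        "de-c4w4a2-rds-extract-job",
        "jr_2ba1a6f6501826e6cd8cfbab5b67dbb9df976e561d97b0d02fd0aed664278c01",
    )
    print("State:", info["state"])
    print("Error:", info["error"])
```

Calling `main()` from the environment where Airflow runs should print the run state and, if Glue recorded one, the error message. Note that every start of the job creates a new run ID, so the ID in the Airflow log is expected to differ from the one returned by a manual `aws glue start-job-run`.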