L1 notebook throws error: Download Error

Here is the error I got when running this cell (Cell In[12]):

url = "https://www.youtube.com/watch?v=jGwO_UgTS7I"
save_dir = "docs/youtube/"
loader = GenericLoader(
    YoutubeAudioLoader([url], save_dir),
    OpenAIWhisperParser()
)
docs = loader.load()

Error:

JSONDecodeError                           Traceback (most recent call last)
File /usr/local/lib/python3.9/site-packages/openai/api_requestor.py:669, in APIRequestor._interpret_response_line(self, rbody, rcode, rheaders, stream)
    668 try:
--> 669     data = json.loads(rbody)
    670 except (JSONDecodeError, UnicodeDecodeError) as e:

File /usr/local/lib/python3.9/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File /usr/local/lib/python3.9/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of s (a str instance
    334 containing a JSON document).
    335
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File /usr/local/lib/python3.9/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

APIError                                  Traceback (most recent call last)
Cell In[12], line 7
      2 save_dir="docs/youtube/"
      3 loader = GenericLoader(
      4     YoutubeAudioLoader([url],save_dir),
      5     OpenAIWhisperParser()
      6 )
----> 7 docs = loader.load()

File /usr/local/lib/python3.9/site-packages/langchain/document_loaders/generic.py:90, in GenericLoader.load(self)
     88 def load(self) -> List[Document]:
     89     """Load all documents."""
---> 90     return list(self.lazy_load())

File /usr/local/lib/python3.9/site-packages/langchain/document_loaders/generic.py:86, in GenericLoader.lazy_load(self)
     84     """Load documents lazily. Use this when working at a large scale."""
     85     for blob in self.blob_loader.yield_blobs():
---> 86         yield from self.blob_parser.lazy_parse(blob)

File /usr/local/lib/python3.9/site-packages/langchain/document_loaders/parsers/audio.py:51, in OpenAIWhisperParser.lazy_parse(self, blob)
     49 # Transcribe
     50 print(f"Transcribing part {split_number+1}!")
---> 51 transcript = openai.Audio.transcribe("whisper-1", file_obj)
     53 yield Document(
     54     page_content=transcript.text,
     55     metadata={"source": blob.source, "chunk": split_number},
     56 )

File /usr/local/lib/python3.9/site-packages/openai/api_resources/audio.py:57, in Audio.transcribe(cls, model, file, api_key, api_base, api_type, api_version, organization, **params)
     55 requestor, files, data = cls._prepare_request(file, file.name, model, **params)
     56 url = cls._get_url("transcriptions")
---> 57 response, _, api_key = requestor.request("post", url, files=files, params=data)
     58 return util.convert_to_openai_object(
     59     response, api_key, api_version, organization
     60 )

File /usr/local/lib/python3.9/site-packages/openai/api_requestor.py:226, in APIRequestor.request(self, method, url, params, headers, files, stream, request_id, request_timeout)
    205 def request(
    206     self,
    207     method,
    (...)
    214     request_timeout: Optional[Union[float, Tuple[float, float]]] = None,
    215 ) -> Tuple[Union[OpenAIResponse, Iterator[OpenAIResponse]], bool, str]:
    216     result = self.request_raw(
    217         method.lower(),
    218         url,
    (...)
    224         request_timeout=request_timeout,
    225     )
--> 226     resp, got_stream = self._interpret_response(result, stream)
    227     return resp, got_stream, self.api_key

File /usr/local/lib/python3.9/site-packages/openai/api_requestor.py:619, in APIRequestor._interpret_response(self, result, stream)
    611     return (
    612         self._interpret_response_line(
    613             line, result.status_code, result.headers, stream=True
    614         )
    615         for line in parse_stream(result.iter_lines())
    616     ), True
    617 else:
    618     return (
--> 619         self._interpret_response_line(
    620             result.content.decode("utf-8"),
    621             result.status_code,
    622             result.headers,
    623             stream=False,
    624         ),
    625         False,
    626     )

File /usr/local/lib/python3.9/site-packages/openai/api_requestor.py:671, in APIRequestor._interpret_response_line(self, rbody, rcode, rheaders, stream)
    669     data = json.loads(rbody)
    670 except (JSONDecodeError, UnicodeDecodeError) as e:
--> 671     raise error.APIError(
    672         f"HTTP code {rcode} from API ({rbody})", rbody, rcode, headers=rheaders
    673     ) from e
    674 resp = OpenAIResponse(data, rheaders)
    675 # In the future, we might add a "status" parameter to errors
    676 # to better handle the "error while streaming" case.

APIError: HTTP code 504 from API (
504 Gateway Time-out
504 Gateway Time-out
)

I am also getting this error. Have you sorted it out?

I have this error too. Can anyone help? Thank you!

I also see the same exceptions when running the LangChain for LLMs / Document Loading notebook:

JSONDecodeError
APIError: HTTP code 504 from API

Trying a shorter 1-minute YouTube video didn’t resolve the 504 timeout. Any resolution? Thanks
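For anyone else blocked on the 504 while this is being looked at: the failure surfaces as an openai.error.APIError raised from a gateway timeout during the Whisper transcription call, so one possible stopgap is simply to retry the load a few times with a pause in between. This is only a sketch, not a confirmed fix; it assumes the url, save_dir, and loader objects from the failing cell are already defined and that the timeout is transient on the API side.

import time
import openai

# Sketch only: retry loader.load() a few times, assuming the HTTP 504 from
# the Whisper endpoint is transient. Reuses the `loader` object defined in
# the failing cell above.
last_err = None
for attempt in range(3):
    try:
        docs = loader.load()
        break
    except openai.error.APIError as err:
        last_err = err
        wait = 30 * (attempt + 1)
        print(f"Attempt {attempt + 1} failed ({err}); retrying in {wait}s...")
        time.sleep(wait)
else:
    raise last_err

Note that each retry re-runs the whole loader, so the audio may be downloaded again before transcription is attempted.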

Hi,

I have the same error… any ideas on a solution?

Simon

Hello,
I am also facing the same error. It looks to me like a library mismatch issue, though I am not sure. We don't have privileges to update the libraries. This is a powerful application, so please help rectify the issue at the earliest. If you need help with troubleshooting, do reach out.

I don't think anyone is looking into it; it would be great if someone did.

Thank you.

I am facing the same error.

I am getting a 403 error instead and wonder whether anyone else has seen the same?

[download] Got error: HTTP Error 403: Forbidden

The full stack trace is as follows:

[youtube] Extracting URL: https://www.youtube.com/watch?v=jGwO_UgTS7I
[youtube] jGwO_UgTS7I: Downloading webpage
[youtube] jGwO_UgTS7I: Downloading android player API JSON
[info] jGwO_UgTS7I: Downloading 1 format(s): 140
[dashsegments] Total fragments: 7
[download] Destination: docs/youtube//Stanford CS229: Machine Learning Course, Lecture 1 - Andrew Ng (Autumn 2018).m4a
[download] Got error: HTTP Error 403: Forbidden
---------------------------------------------------------------------------
DownloadError                             Traceback (most recent call last)
Cell In[15], line 7
      2 save_dir="docs/youtube/"
      3 loader = GenericLoader(
      4     YoutubeAudioLoader([url],save_dir),
      5     OpenAIWhisperParser()
      6 )
----> 7 docs = loader.load()

File /usr/local/lib/python3.9/site-packages/langchain/document_loaders/generic.py:90, in GenericLoader.load(self)
     88 def load(self) -> List[Document]:
     89     """Load all documents."""
---> 90     return list(self.lazy_load())

File /usr/local/lib/python3.9/site-packages/langchain/document_loaders/generic.py:85, in GenericLoader.lazy_load(self)
     81 def lazy_load(
     82     self,
     83 ) -> Iterator[Document]:
     84     """Load documents lazily. Use this when working at a large scale."""
---> 85     for blob in self.blob_loader.yield_blobs():
     86         yield from self.blob_parser.lazy_parse(blob)

File /usr/local/lib/python3.9/site-packages/langchain/document_loaders/blob_loaders/youtube_audio.py:45, in YoutubeAudioLoader.yield_blobs(self)
     42 for url in self.urls:
     43     # Download file
     44     with yt_dlp.YoutubeDL(ydl_opts) as ydl:
---> 45         ydl.download(url)
     47 # Yield the written blobs
     48 loader = FileSystemBlobLoader(self.save_dir, glob="*.m4a")

File /usr/local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py:3369, in YoutubeDL.download(self, url_list)
   3366     raise SameFileError(outtmpl)
   3368 for url in url_list:
-> 3369     self.__download_wrapper(self.extract_info)(
   3370         url, force_generic_extractor=self.params.get('force_generic_extractor', False))
   3372 return self._download_retcode

File /usr/local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py:3344, in YoutubeDL.__download_wrapper.<locals>.wrapper(*args, **kwargs)
   3341 @functools.wraps(func)
   3342 def wrapper(*args, **kwargs):
   3343     try:
-> 3344         res = func(*args, **kwargs)
   3345     except UnavailableVideoError as e:
   3346         self.report_error(e)

File /usr/local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py:1507, in YoutubeDL.extract_info(self, url, download, ie_key, extra_info, process, force_generic_extractor)
   1505             raise ExistingVideoReached()
   1506         break
-> 1507     return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
   1508 else:
   1509     extractors_restricted = self.params.get('allowed_extractors') not in (None, ['default'])

File /usr/local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py:1518, in YoutubeDL._handle_extraction_exceptions.<locals>.wrapper(self, *args, **kwargs)
   1516 while True:
   1517     try:
-> 1518         return func(self, *args, **kwargs)
   1519     except (DownloadCancelled, LazyList.IndexError, PagedList.IndexError):
   1520         raise

File /usr/local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py:1615, in YoutubeDL.__extract_info(self, url, ie, download, extra_info, process)
   1613 if process:
   1614     self._wait_for_video(ie_result)
-> 1615     return self.process_ie_result(ie_result, download, extra_info)
   1616 else:
   1617     return ie_result

File /usr/local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py:1674, in YoutubeDL.process_ie_result(self, ie_result, download, extra_info)
   1672 if result_type == 'video':
   1673     self.add_extra_info(ie_result, extra_info)
-> 1674     ie_result = self.process_video_result(ie_result, download=download)
   1675     self._raise_pending_errors(ie_result)
   1676     additional_urls = (ie_result or {}).get('additional_urls')

File /usr/local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py:2779, in YoutubeDL.process_video_result(self, info_dict, download)
   2777 downloaded_formats.append(new_info)
   2778 try:
-> 2779     self.process_info(new_info)
   2780 except MaxDownloadsReached:
   2781     max_downloads_reached = True

File /usr/local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py:3247, in YoutubeDL.process_info(self, info_dict)
   3243 dl_filename = existing_video_file(full_filename, temp_filename)
   3244 if dl_filename is None or dl_filename == temp_filename:
   3245     # dl_filename == temp_filename could mean that the file was partially downloaded with --no-part.
   3246     # So we should try to resume the download
-> 3247     success, real_download = self.dl(temp_filename, info_dict)
   3248     info_dict['__real_download'] = real_download
   3249 else:

File /usr/local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py:2970, in YoutubeDL.dl(self, name, info, subtitle, test)
   2968 if new_info.get('http_headers') is None:
   2969     new_info['http_headers'] = self._calc_headers(new_info)
-> 2970 return fd.download(name, new_info, subtitle)

File /usr/local/lib/python3.9/site-packages/yt_dlp/downloader/common.py:444, in FileDownloader.download(self, filename, info_dict, subtitle)
    441     self.to_screen(f'[download] Sleeping {sleep_interval:.2f} seconds ...')
    442     time.sleep(sleep_interval)
--> 444 ret = self.real_download(filename, info_dict)
    445 self._finish_multiline_status()
    446 return ret, True

File /usr/local/lib/python3.9/site-packages/yt_dlp/downloader/dash.py:60, in DashSegmentsFD.real_download(self, filename, info_dict)
     56         return fd.real_download(filename, info_dict)
     58     args.append([ctx, fragments_to_download, fmt])
---> 60 return self.download_and_append_fragments_multiple(*args, is_fatal=lambda idx: idx == 0)

File /usr/local/lib/python3.9/site-packages/yt_dlp/downloader/fragment.py:382, in FragmentFD.download_and_append_fragments_multiple(self, *args, **kwargs)
    380 max_progress = len(args)
    381 if max_progress == 1:
--> 382     return self.download_and_append_fragments(*args[0], **kwargs)
    383 max_workers = self.params.get('concurrent_fragment_downloads', 1)
    384 if max_progress > 1:

File /usr/local/lib/python3.9/site-packages/yt_dlp/downloader/fragment.py:521, in FragmentFD.download_and_append_fragments(self, ctx, fragments, info_dict, is_fatal, pack_func, finish_func, tpe, interrupt_trigger)
    519     break
    520 try:
--> 521     download_fragment(fragment, ctx)
    522     result = append_fragment(
    523         decrypt_fragment(fragment, self._read_fragment(ctx)), fragment['frag_index'], ctx)
    524 except KeyboardInterrupt:

File /usr/local/lib/python3.9/site-packages/yt_dlp/downloader/fragment.py:466, in FragmentFD.download_and_append_fragments.<locals>.download_fragment(fragment, ctx)
    463     self.report_retry(err, count, retries, frag_index, fatal)
    464     ctx['last_error'] = err
--> 466 for retry in RetryManager(self.params.get('fragment_retries'), error_callback):
    467     try:
    468         ctx['fragment_count'] = fragment.get('fragment_count')

File /usr/local/lib/python3.9/site-packages/yt_dlp/utils.py:6141, in RetryManager.__iter__(self)
   6139 yield self
   6140 if self.error:
-> 6141     self.error_callback(self.error, self.attempt, self.retries)

File /usr/local/lib/python3.9/site-packages/yt_dlp/downloader/fragment.py:463, in FragmentFD.download_and_append_fragments.<locals>.download_fragment.<locals>.error_callback(err, count, retries)
    461 if fatal and count > retries:
    462     ctx['dest_stream'].close()
--> 463 self.report_retry(err, count, retries, frag_index, fatal)
    464 ctx['last_error'] = err

File /usr/local/lib/python3.9/site-packages/yt_dlp/downloader/common.py:389, in FileDownloader.report_retry(self, err, count, retries, frag_index, fatal)
    387 """Report retry"""
    388 is_frag = False if frag_index is NO_DEFAULT else 'fragment'
--> 389 RetryManager.report_retry(
    390     err, count, retries, info=self.__to_screen,
    391     warn=lambda msg: self.__to_screen(f'[download] Got error: {msg}'),
    392     error=IDENTITY if not fatal else lambda e: self.report_error(f'\r[download] Got error: {e}'),
    393     sleep_func=self.params.get('retry_sleep_functions', {}).get(is_frag or 'http'),
    394     suffix=f'fragment{"s" if frag_index is None else f" {frag_index}"}' if is_frag else None)

File /usr/local/lib/python3.9/site-packages/yt_dlp/utils.py:6148, in RetryManager.report_retry(e, count, retries, sleep_func, info, warn, error, suffix)
   6146 if count > retries:
   6147     if error:
-> 6148         return error(f'{e}. Giving up after {count - 1} retries') if count > 1 else error(str(e))
   6149     raise e
   6151 if not count:

File /usr/local/lib/python3.9/site-packages/yt_dlp/downloader/common.py:392, in FileDownloader.report_retry.<locals>.<lambda>(e)
    387 """Report retry"""
    388 is_frag = False if frag_index is NO_DEFAULT else 'fragment'
    389 RetryManager.report_retry(
    390     err, count, retries, info=self.__to_screen,
    391     warn=lambda msg: self.__to_screen(f'[download] Got error: {msg}'),
--> 392     error=IDENTITY if not fatal else lambda e: self.report_error(f'\r[download] Got error: {e}'),
    393     sleep_func=self.params.get('retry_sleep_functions', {}).get(is_frag or 'http'),
    394     suffix=f'fragment{"s" if frag_index is None else f" {frag_index}"}' if is_frag else None)

File /usr/local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py:1015, in YoutubeDL.report_error(self, message, *args, **kwargs)
   1010 def report_error(self, message, *args, **kwargs):
   1011     '''
   1012     Do the same as trouble, but prefixes the message with 'ERROR:', colored
   1013     in red if stderr is a tty file.
   1014     '''
-> 1015     self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)

File /usr/local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py:955, in YoutubeDL.trouble(self, message, tb, is_error)
    953     else:
    954         exc_info = sys.exc_info()
--> 955     raise DownloadError(message, exc_info)
    956 self._download_retcode = 1

[download] Got error: HTTP Error 403: Forbidden
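
In case it helps while the lab is being updated: the 403 Forbidden here is raised inside yt_dlp while it fetches the audio fragments, and this error commonly appears when the installed yt-dlp release has fallen behind YouTube's frequently changing endpoints. If your environment allows installing packages (which may not be the case in the hosted lab), upgrading yt-dlp from a notebook cell, restarting the kernel, and re-running the download cell is a reasonable thing to try; this is a suggestion, not a confirmed fix.

# Possible workaround sketch (assumes you are allowed to modify the environment):
!pip install --upgrade yt-dlp
# Then restart the kernel and re-run the cell that calls loader.load().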

Hello @Mun_Chung_Wong ,

The team has updated our labs.
Could you kindly start afresh and let me know if the issue still persists?

With regards,
Nilosree Sengupta

Thanks Nilosree.

I've restarted the kernel, but the same HTTP 403 error persists.
Is there a new version of the notebook I should load? If so, I can't find it when I do
File → Open…
It just shows one notebook, presumably the same one I have been using for this module.

I have also tried a different link, to Andrew's lecture 2, but got the same error:
https://www.youtube.com/watch?v=4b4MUYve_U8

Didn't mean to show that video preview page in my reply post :wink:

Hello @Mun_Chung_Wong ,

You’re welcome.
I also reproduced it on my end and got the same error. I have reported this to our team.
It will be fixed soon!

With regards,
Nilosree Sengupta


Hello, I am facing the same error.
[download] Got error: HTTP Error 403: Forbidden

Hello @shigaraki_G ,

There are some permission issues, and our team is working on them.
In the meantime, please continue with the other parts of the course.
Happy learning!

With regards,
Nilosree Sengupta

Thank you for the reply. I have completed the course too.

Hello @shigaraki_G ,

You’re welcome!
That’s great!

With regards,
Nilosree Sengupta

Hello @Mun_Chung_Wong , @shigaraki_G ,

The notebook version is updated now and the issue has been fixed by the team.

With regards,
Nilosree Sengupta

Thank you!

Hello @shigaraki_G ,

You’re welcome!
Happy Learning!

With regards,
Nilosree Sengupta

Hi @nilosreesengupta,
I am still facing this issue, although I see that it has been fixed according to the conversation in this thread.
Attaching a screenshot for reference: