YouTube document loader

Hello
I am having trouble downloading YouTube videos with the document loader functionality of LangChain. Here are the details:

Course - LangChain: Chat with Your Data
Area - 01_document_loading
Subject - YouTube document loader
OS - MacOS M1 Max
Code - Loading documents from a YouTube url | 🦜️🔗 Langchain
Installed binaries - ffmpeg, ffplay, ffprobe

When I run the code below in a Jupyter notebook, I get this error:

ERROR: Postprocessing: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location

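from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

# Two Karpathy lecture videos
urls = ["Let's build GPT: from scratch, in code, spelled out. - YouTube", "The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube"]

# Directory to save audio files
save_dir = "~/Downloads/YouTube"

# Transcribe the videos to text
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())
docs = loader.load()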

Your guidance is greatly appreciated.

pydub needs ffprobe and ffmpeg, but you can't just pip install them from the Python script. You have to install them properly on your machine and add them to your PATH. This video worked for me: How To: Download+Install FFMPEG on Windows 10 | Full Guide - YouTube
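
Because a notebook kernel can see a different PATH than the terminal does, it may help to check what the Python process itself resolves. Here is a minimal sketch using only the standard library; the helper name find_tools is made up for illustration.

```python
import shutil

def find_tools(names=("ffmpeg", "ffprobe")):
    """Map each tool name to the path this Python process resolves, or None."""
    return {name: shutil.which(name) for name in names}

# Run this in the same notebook kernel that runs the loader
for name, path in find_tools().items():
    print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If either tool prints as not found here, the loader's post-processing step will most likely not find it either, even when the terminal does.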

I am on macOS. I followed the instructions, but I still have the same issue. Here is what I have done so far:

  1. I installed the three binaries (ffprobe, ffmpeg, ffplay), added their directory to my PATH, and confirmed that ffmpeg can be invoked from the command line in zsh (iTerminal)

ffmpeg
ffmpeg version N-111376-g13ef5025e3-tessus static FFmpeg binaries for macOS 64-bit Copyright (c) 2000-2023 the FFmpeg developers
built with Apple clang version 11.0.0 (clang-1100.0.33.17)

  2. The where command correctly locates the binary

where ffmpeg
/Users/john/Documents/Projects/PATH_Files/ffmpeg

  3. I tried to pass the location via loader.requests_kwargs

# Karpathy lecture videos
urls = ["https://youtu.be/kCc8FmEb1nY"]

# Directory to save audio files
save_dir = "~/Downloads/YouTube"

# Transcribe the videos to text
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())
loader.requests_kwargs = {'ffmpeg-location': '~/Documents/Projects/PATH_Files'}
loader.requests_kwargs = {'ffprobe-location': '~/Documents/Projects/PATH_Files'}
docs = loader.load()

  4. I set the PATH in the Jupyter notebook
    current_path = os.environ.get('PATH')
    new_path = '~/Documents/Projects/PATH_Files:' + current_path
    os.environ['PATH'] = new_path

  5. I can confirm that the PATH has been updated
    !echo $PATH
    ~/Documents/Projects/PATH_Files:/Users/John/miniconda3/envs/local_llm_env/bin:/opt/homebrew/anaconda3/condabin:/usr/bin:/bin:/usr/sbin:/sbin

  6. If I try to invoke it directly from the notebook, the command is not recognized
    !ffmpeg
    zsh:1: command not found: ffmpeg

  7. If I use the full path, then the notebook recognizes it

!~/Documents/Projects/PATH_Files/ffmpeg
ffmpeg version N-111376-g13ef5025e3-tessus static FFmpeg binaries for macOS 64-bit Copyright (c) 2000-2023 the FFmpeg developers

  8. The Python program still shows me an error
[youtube] Extracting URL: https://youtu.be/kCc8FmEb1nY
[youtube] kCc8FmEb1nY: Downloading webpage
[youtube] kCc8FmEb1nY: Downloading ios player API JSON
[youtube] kCc8FmEb1nY: Downloading android player API JSON
[youtube] kCc8FmEb1nY: Downloading m3u8 information
[info] kCc8FmEb1nY: Downloading 1 format(s): 140
[download] /Users/John/Downloads/YouTube/Let’s build GPT: from scratch, in code, spelled out…m4a has already been downloaded
[download] 100% of 107.73MiB
ERROR: Postprocessing: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location
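
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the PATH entry added in the notebook starts with a literal ~, and PATH lookups generally do not expand ~ (the shell only expands it when it parses an unquoted word). That would fit the "command not found" result even though the full path works. A minimal sketch that expands the directory before prepending it; the helper name prepend_to_path is hypothetical, and the directory is the one from the post above.

```python
import os

def prepend_to_path(directory, env=os.environ):
    """Put an absolute, tilde-expanded directory at the front of PATH."""
    expanded = os.path.abspath(os.path.expanduser(directory))
    # Drop empty entries and any earlier copy of the same directory
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != expanded]
    env["PATH"] = os.pathsep.join([expanded] + parts)
    return expanded

# Directory taken from the post above
ffmpeg_dir = prepend_to_path("~/Documents/Projects/PATH_Files")
print(ffmpeg_dir)
```

After running this in the notebook, !ffmpeg and the loader should both see the same absolute directory, assuming the binaries really live there.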

I am stumped at this time.
Thanks for your help.
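
Another option, sketched under assumptions: as far as I can tell, YoutubeAudioLoader builds its yt-dlp options internally and does not read requests_kwargs, so setting that attribute would have no effect. yt-dlp itself accepts an ffmpeg_location option, so the audio could be downloaded with yt-dlp directly and transcribed afterwards. The helper names build_ydl_opts and download_audio are made up for illustration, and the format and post-processor settings are my guess at reasonable values, not necessarily what the loader uses.

```python
import os

def build_ydl_opts(save_dir, ffmpeg_dir):
    """Build yt-dlp options that point post-processing at a given ffmpeg directory."""
    return {
        "format": "m4a/bestaudio/best",
        "noplaylist": True,
        "outtmpl": os.path.join(os.path.expanduser(save_dir), "%(title)s.%(ext)s"),
        # Directory containing ffmpeg and ffprobe, with ~ expanded
        "ffmpeg_location": os.path.expanduser(ffmpeg_dir),
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "m4a"}],
    }

def download_audio(urls, opts):
    """Download audio for each URL; requires the yt-dlp package."""
    import yt_dlp  # imported lazily so the options can be built without yt-dlp installed
    with yt_dlp.YoutubeDL(opts) as ydl:
        return ydl.download(urls)

opts = build_ydl_opts("~/Downloads/YouTube", "~/Documents/Projects/PATH_Files")
# download_audio(["https://youtu.be/kCc8FmEb1nY"], opts)  # uncomment to run the download
```

This bypasses the loader for the download step only; the saved .m4a files would still need to be passed to the Whisper parser separately.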