Audio Transcripts

I used the code below to transcribe a YouTube video to text, as described in the course:

from langchain_community.document_loaders.generic import GenericLoader, FileSystemBlobLoader
from langchain_community.document_loaders.parsers import OpenAIWhisperParser
from langchain_community.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

url = ""
save_dir = ""
loader = GenericLoader(
    YoutubeAudioLoader([url], save_dir),  # fetch from YouTube
    # FileSystemBlobLoader(save_dir, glob="*.m4a"),  # fetch locally
    OpenAIWhisperParser()
)
docs = loader.load()

docs[0].page_content[0:500]

On running the code, I see the audio file created at the desired location:

[youtube] Extracting URL:
[youtube] 5HcDJ8e9NwY: Downloading webpage
[youtube] 5HcDJ8e9NwY: Downloading tv client config
[youtube] 5HcDJ8e9NwY: Downloading tv player API JSON
[youtube] 5HcDJ8e9NwY: Downloading ios player API JSON
[youtube] 5HcDJ8e9NwY: Downloading m3u8 information
[info] 5HcDJ8e9NwY: Downloading 1 format(s): 140
[download] Data Quality Explained.m4a has already been downloaded
[download] 100% of 3.59MiB
[ExtractAudio] Not converting audio
Quality Explained.m4a; file is already in target format m4a
Transcribing part 1!
Transcribing part 1!

When I then run docs[0].page_content[0:500], I get the error NameError: name 'docs' is not defined.

No txt file with the audio transcription is created either.

Please help, as I am not sure what I am doing incorrectly.

Dear @mohit.tandon03,

Can you please share your notebook via DM? I will look into the issue and let you know the problems as well as the solutions.
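
In the meantime, two things may be worth checking. A NameError on docs usually means the line docs = loader.load() never finished (for example, it raised an exception or the cell was interrupted), so the name was never assigned. Also, as far as I know, the loader returns Document objects in memory and does not write a .txt file by itself. Below is a minimal sketch, not a confirmed fix: load_or_report and save_transcript are hypothetical helper names, and transcript.txt is an assumed file name.

```python
from pathlib import Path


def load_or_report(loader):
    """Call loader.load() and print any exception instead of losing it."""
    try:
        return loader.load()
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        print(f"load() failed: {type(exc).__name__}: {exc}")
        return []


def save_transcript(docs, out_path):
    """Write the page_content of every document to one UTF-8 text file."""
    text = "\n".join(doc.page_content for doc in docs)
    path = Path(out_path)
    path.write_text(text, encoding="utf-8")
    return path


# Usage with the loader from the question
# (needs network access and an OpenAI API key):
# docs = load_or_report(loader)
# if docs:
#     save_transcript(docs, "transcript.txt")  # assumed file name
```

If load_or_report prints an exception, that message is probably the real cause of the NameError and is worth posting here.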


Keep Learning AI with DeepLearning.AI - Girijesh