I was wondering if we can create a dataset from the transcripts of individual videos to build an LLM application with RAG + vector DB for educational purposes. I would like to practice some of the new skills that we have learned from the short courses.
In summary, would it be okay to create and share such a dataset in the following format?
Course: Advanced Retrieval for AI with Chroma
Video 1: Introduction
Transcript 1:
Video 2: Overview of embeddings-based retrieval
Transcript 2:
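A minimal sketch of how the format above might be stored, assuming one record per video and JSON Lines as the file format. The `TranscriptRecord` class, its field names, and the `to_jsonl` helper are hypothetical choices for illustration, and the transcript text is left as a placeholder:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TranscriptRecord:
    """One video's transcript plus the metadata needed to cite it later."""
    course: str
    video_number: int
    video_title: str
    transcript: str


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per video, one per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


# Placeholder records mirroring the course/video layout described in the post.
records = [
    TranscriptRecord(
        course="Advanced Retrieval for AI with Chroma",
        video_number=1,
        video_title="Introduction",
        transcript="<transcript text goes here>",
    ),
    TranscriptRecord(
        course="Advanced Retrieval for AI with Chroma",
        video_number=2,
        video_title="Overview of embeddings-based retrieval",
        transcript="<transcript text goes here>",
    ),
]

jsonl = to_jsonl(records)
print(jsonl)
```

Keeping the course name and video title on every record means each chunk can carry its own citation once it is loaded into a vector DB.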
Thank you for sharing. Yes, I did read the CoC but I was not sure about my intended use case. Using slides with proper citation is allowed but the transcript itself is not mentioned.
Yes, that would be nice. I could use a different dataset to do the same thing, but I thought it would be nice to share this with the community as well, and it would be a nice demonstration of the course content.
Thank you so much for your answer! All clear now. My only intention is to be able to retrieve the relevant information and where it is covered in the courses. It’s only for educational purposes with no monetization. For example, I will ask “What is RAG?” and I will retrieve the answer plus where it is covered (e.g. Course: Advanced Retrieval for AI with Chroma - lecture 1 - Introduction).
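A rough sketch of the "answer + where it's told" idea, using only the Python standard library. It assumes a bag-of-words cosine similarity as a stand-in for real embeddings and a vector DB, and the chunk texts are made-up placeholders, not actual transcript content. The `retrieve` and `cite` helpers are hypothetical names:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder chunks: the "text" values are invented for illustration only.
CHUNKS = [
    {
        "course": "Advanced Retrieval for AI with Chroma",
        "lecture": 1,
        "title": "Introduction",
        "text": "RAG stands for retrieval augmented generation: relevant "
                "documents are retrieved and passed to the LLM as context.",
    },
    {
        "course": "Advanced Retrieval for AI with Chroma",
        "lecture": 2,
        "title": "Overview of embeddings-based retrieval",
        "text": "Embeddings map text to vectors so that similar texts are "
                "close together in a vector store.",
    },
]


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c["text"]))),
        reverse=True,
    )
    return ranked[:k]


def cite(chunk):
    """Format the source of a chunk the way the post describes it."""
    return f"Course: {chunk['course']} - lecture {chunk['lecture']} - {chunk['title']}"


top = retrieve("What is RAG?", CHUNKS)[0]
print(top["text"])
print(cite(top))
```

In a real application the scoring step would presumably be replaced by an embedding model and a vector DB query, with the course and lecture fields stored as metadata alongside each chunk.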
@snnclsr, this is a great initiative! This is aligned with our code of conduct, as we really appreciate when learners collaborate to share knowledge.
However, just keep in mind that this will be fine as long as you stick with free resources that are available for everyone in this forum. This includes the slides and transcripts as well. This does not include any ungraded lab or assignment, since they are not accessible by everyone in this forum.
Thanks again for this initiative, and for sharing it with the community. We’re excited to learn the different ways that people use it to improve their learning.
Just my 1 cent worth, as a learner… I’m not sure that the transcripts in their raw source form will work well for @snnclsr's purpose. For myself, I’ve started downloading the transcripts and then I revise, edit, and simplify the text so that I can understand it better. I also get rid of various ‘asides’ and tangential comments that distract from the key points, and add cross-references to external content as needed. In other words, I think it could be useful to clean up the transcripts so that they work best for your purpose. Perhaps you’ve already considered doing this?
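A small sketch of what an automated first pass at that cleanup could look like, assuming the raw transcript contains `m:ss` timestamps and a few spoken fillers. The filler list and the `clean_transcript` helper are hypothetical; removing asides and adding cross-references would still need manual editing:

```python
import re

# Timestamps such as 0:05 or 1:02:33 (assumed format, adjust to the real files).
TIMESTAMP = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")

# A short, illustrative list of spoken fillers, with any surrounding commas.
FILLERS = re.compile(r",?\s*\b(?:um+|uh+|you know)\b,?", re.IGNORECASE)


def clean_transcript(raw):
    """Strip timestamps and fillers, then collapse runs of whitespace."""
    text = TIMESTAMP.sub(" ", raw)
    text = FILLERS.sub("", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


# Invented sample text, not taken from any course.
raw = "0:05 So, um, embeddings are, you know, vectors.\n0:09 Uh, they capture meaning."
cleaned = clean_transcript(raw)
print(cleaned)
```

A pass like this only removes mechanical noise. Sentence casing and punctuation may still need a manual touch-up afterwards.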
Hey @lucas.coutinho, is it okay to create transcripts of the deeplearning.ai short courses in order to do RAG? I ask because you mentioned that this is fine as long as one sticks to the free resources that are available to everyone in the forum, while deeplearning.ai says that the short courses are available for free for a limited period. Also, after completing the activity, is it OK to make a YouTube video sharing the learnings from the RAG application?
Hey @kwiseth, @snnclsr, or anybody else, can you please help me get the transcripts? As far as I can see, the transcript is an image. How can I get access to it? Any help would be appreciated.
@Shell167 In some of the early DeepLearning.AI short courses, the transcript was available to download. At the bottom of the video playback, you’d see choices for CC (closed captions), transcript, and some other options.
I haven’t looked at these in a while, but I just noticed that the neo4j-RAG course doesn’t have this option, and likely some of the other more recent courses don’t either. If you don’t see these options, AFAIK the transcript is simply not available, and I have no insight into how to get a transcript for the course. I tend to go back to the courses I’ve already taken and go through them again and again, until I ‘get it’. BTW, I’m not sure what you’re referring to when you say that the “transcript is an image.” I have no clue how to obtain a transcript from an image. Best of luck.