Can we create a dataset from video transcripts for educational purposes?

Hi,

I was wondering if we can create a dataset from the transcripts of individual videos to build an LLM application with RAG + vector DB for educational purposes. I would like to practice some of the new skills that we have learned from the short courses.

In summary, would it be okay to create and share such a dataset in the following format:
Course: Advanced Retrieval for AI with Chroma
Video 1: Introduction
Transcript 1:
Video 2: Overview of embeddings-based retrieval
Transcript 2:
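
To make the format concrete, here is a minimal sketch of how such records might be stored, one JSON object per line. The field names (`course`, `video_number`, `video_title`, `transcript`) and the `TranscriptRecord` class are my own illustrative choices, not an established schema, and the transcript text is left empty as in the outline above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TranscriptRecord:
    """One video's transcript plus the metadata needed to cite it (hypothetical schema)."""
    course: str
    video_number: int
    video_title: str
    transcript: str


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


records = [
    TranscriptRecord("Advanced Retrieval for AI with Chroma", 1, "Introduction", ""),
    TranscriptRecord(
        "Advanced Retrieval for AI with Chroma",
        2,
        "Overview of embeddings-based retrieval",
        "",
    ),
]

print(to_jsonl(records))
```

Keeping the course and video title on every record means each chunk can later be traced back to its source without a separate lookup table.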

Best,

4 Likes

I recommend you read the Code of Conduct.

1 Like

Thank you for sharing. Yes, I did read the CoC, but I was not sure whether it covers my intended use case. Using slides with proper citation is allowed, but transcripts are not mentioned.

1 Like

Maybe someone from the course staff will also contribute here.

1 Like

Yes, that would be nice. I could do the same thing with a different dataset, but I thought it would be nice to share this with the community, and it would be a good demonstration of the course content.

1 Like

Yes

1 Like

Looking for it!

1 Like

Hey there @Community-Team,

Thank you so much for your answer! All clear now. My only intention is to be able to retrieve the relevant information along with where it is covered in the courses. It's only for educational purposes, with no monetization. For example, I will ask "What is RAG?" and retrieve the answer plus where it is covered (e.g. Course: Advanced Retrieval for AI with Chroma - lecture 1 - Introduction).
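
As a rough sketch of the "answer + where it's told" idea: the snippet below uses a toy bag-of-words cosine similarity in place of real embeddings and a vector DB, so it only illustrates the flow. The `retrieve` and `format_source` helpers and the chunk fields are hypothetical, and the chunk texts are made-up placeholders, not actual transcript content.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=1):
    """Return the top_k chunks most similar to the question (toy stand-in for a vector DB query)."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c["text"]))),
        reverse=True,
    )
    return ranked[:top_k]


def format_source(chunk):
    """Build the citation string from the metadata stored with the chunk."""
    return f"Course: {chunk['course']} - lecture {chunk['lecture']} - {chunk['title']}"


# Placeholder chunks: the text is invented for illustration, not real transcript content.
chunks = [
    {
        "course": "Advanced Retrieval for AI with Chroma",
        "lecture": 1,
        "title": "Introduction",
        "text": "Placeholder text: RAG stands for retrieval augmented generation.",
    },
    {
        "course": "Advanced Retrieval for AI with Chroma",
        "lecture": 2,
        "title": "Overview of embeddings-based retrieval",
        "text": "Placeholder text: embeddings map documents to vectors.",
    },
]

best = retrieve("What is RAG?", chunks)[0]
print(format_source(best))
```

In a real setup the similarity step would be handled by an embedding model and the vector store, but the source metadata would travel with each chunk in the same way.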

I will share my project and results here soon!

3 Likes

@Community-Team, “creating transcripts” is not what’s happening here.

It’s using the transcripts to create an LLM application.

I think it’s important to clarify whether your answer really applies to this proposed use.

1 Like

Hi all!

@snnclsr, this is a great initiative! This is aligned with our code of conduct, as we really appreciate when learners collaborate to share knowledge.

However, keep in mind that this is fine only as long as you stick with free resources that are available to everyone in this forum. This includes the slides and transcripts. It does not include any ungraded lab or assignment, since those are not accessible to everyone in this forum.

Thanks again for this initiative, and for sharing it with the community. We’re excited to learn the different ways that people use it to improve their learning.

Cheers,
Lucas

4 Likes

Awesome! Thank you for clarifying the usage of resources @lucas.coutinho! I will share my project here as well.

Best,

1 Like

Just my 1 cent worth, as a learner… I’m not sure that the transcripts in their raw source form will work well for @snnclsr's purpose. For myself, I’ve started downloading the transcripts and then revising, editing, and simplifying the text so that I can understand it better. I also get rid of various ‘asides’ and tangential comments that distract from the key points, and add cross-references to external content as needed. In other words, I think it could be useful to clean up the transcripts so that they work best for your purpose. Perhaps you’ve already considered doing this?
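
For the purely mechanical part of that cleanup, something like the sketch below might be a starting point. It assumes the raw text contains timestamps such as `0:05` or `[00:01:23]` and simple fillers like "um" and "uh"; both patterns are my guesses about the raw format, and `clean_transcript` is a hypothetical helper. Removing asides and adding cross-references would still need manual editing.

```python
import re

# Assumed timestamp shapes: 0:05, 12:34, 00:01:23, optionally wrapped in brackets.
TIMESTAMP = re.compile(r"\[?\b\d{1,2}:\d{2}(?::\d{2})?\]?")
# Assumed fillers, together with the commas that usually surround them.
FILLERS = re.compile(r",?\s*\b(?:um+|uh+)\b,?", re.IGNORECASE)


def clean_transcript(raw):
    """Strip timestamps and filler words, then collapse whitespace."""
    text = TIMESTAMP.sub(" ", raw)
    text = FILLERS.sub("", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


raw = "[00:01] Embeddings are, um, vectors.\n0:05   They encode meaning."
print(clean_transcript(raw))
```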

1 Like