Are there any manga and anime fans out here?

Diarray · February 19, 2024, 6:44pm

Hi everyone, I want to fine-tune a large language model for question answering in the otaku universe. This culture isn't really present in Mali, and I'm pretty sure it will be impossible for me to get my QA pairs here. I'm also thinking about how to collect the data with minimal effort and cleaning, so I thought of creating a Google spreadsheet to collect a lot of QA pairs in the simplest way, but I need otakus to fill it out. Is anyone here interested in helping me collect the data?
And if someone has a better idea for data collection than a shared spreadsheet, please let me know!

balaji.ambresh · February 19, 2024, 7:40pm

Have you seen this?

chuaal · February 20, 2024, 3:12pm

That's creative

Diarray · February 20, 2024, 6:50pm

Sure, my first idea was to search for a dataset on Kaggle, but it turns out that those are mainly datasets for recommender systems. Nothing about text or QA, nothing to train an LLM.

Diarray · February 20, 2024, 6:51pm

Yes, but I’m struggling to collect data!

balaji.ambresh · February 21, 2024, 4:59am

Why can’t you turn details about the dataset into a QA dataset?
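One way to picture this suggestion is a small template-based converter. The sketch below is only an illustration: the field names (`title`, `genres`, `episodes`) are assumptions about what a recommender-style anime dataset might contain, and the question templates are made up for the example.

```python
def record_to_qa(record):
    """Turn one dataset row (a dict) into a list of question/answer pairs.

    The keys used here (title, genres, episodes) are hypothetical;
    adapt them to the columns of the actual dataset.
    """
    title = record["title"]
    pairs = []
    if record.get("genres"):
        pairs.append({
            "question": f"What genres does {title} belong to?",
            "answer": ", ".join(record["genres"]),
        })
    if record.get("episodes") is not None:
        pairs.append({
            "question": f"How many episodes does {title} have?",
            "answer": str(record["episodes"]),
        })
    return pairs


# Example usage with a made-up record
sample = {"title": "Example Show", "genres": ["Action", "Comedy"], "episodes": 12}
qa_pairs = record_to_qa(sample)
```

Pairs produced this way are uniform in style, so they would probably work better as a supplement to fan-written questions than as a replacement.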

Diarray · February 21, 2024, 11:11am

It would require too much time, and I wanted to get QA pairs from real otakus, because those are the ones the model would be trained for

balaji.ambresh · February 21, 2024, 11:36am

How about the following approaches for generating Q&A pairs:

Provide few-shot examples to an LLM and make it generate responses for new content.
Use a crowdsourcing / freelance platform.
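For the first approach, a minimal sketch of how a few-shot prompt could be assembled is shown here. The function name, the prompt wording, and the example data are all assumptions for illustration; the resulting string would be sent to whichever LLM is used, and that call is left out.

```python
def build_few_shot_prompt(examples, passage):
    """Build a few-shot prompt asking an LLM for a QA pair about `passage`.

    `examples` is a list of dicts with 'passage', 'question' and 'answer'
    keys (a hypothetical format chosen for this sketch).
    """
    parts = ["Write one question and its answer about the passage."]
    for ex in examples:
        parts.append(
            f"Passage: {ex['passage']}\n"
            f"Question: {ex['question']}\n"
            f"Answer: {ex['answer']}"
        )
    # Leave the last block open so the model completes the question and answer.
    parts.append(f"Passage: {passage}\nQuestion:")
    return "\n\n".join(parts)


# Example usage with placeholder content
examples = [
    {
        "passage": "Example Show follows a student who joins a cooking club.",
        "question": "Which club does the main character of Example Show join?",
        "answer": "A cooking club.",
    }
]
prompt = build_few_shot_prompt(examples, "Another Show is set in a floating city.")
```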

Diarray · February 22, 2024, 10:16am

I already thought about the first one, but I didn't want my data to contain such a pattern. I want data from real manga/anime fans, which is more likely to be unbiased, pertinent, and high quality.
And for your second idea, do you mean delegating the task to freelancers that I would pay to collect the data?

balaji.ambresh · February 22, 2024, 11:25am

Yes. Have you heard of mturk?

Diarray · February 22, 2024, 2:13pm

No, I had never heard of mturk before you mentioned it. It looks interesting! Thank you!

pranav7896 · February 24, 2024, 1:10pm

Seems interesting, I might add something if I find anything helpful!

Rayed786 · February 26, 2024, 4:29pm

Hi, I am interested in helping you out.

Diarray · February 28, 2024, 12:21pm

Thank you for your interest, and excuse the late reply. Here is the Google form I'm using to collect data. I appreciate your help! Thanks
