Hi team, I’m curious how the Lamini Q&A dataset used in the labs is generated. It has 1k+ question–answer pairs about Lamini. I’m thinking of creating a similar Q&A dataset for my own use case, but I only have documentation/READMEs. Did you use Alpaca to produce the data, some other approach, or was it human-generated?
Thank you!
Hello @Sophie668
Thank you for sharing your question here. Are you referring specifically to Lesson 4: Data Preparation?
Yes! It has 1,400 question–answer pairs.
One datapoint in the finetuning dataset:

```python
{'answer': 'Lamini has documentation on Getting Started, Authentication, '
           'Question Answer Model, Python Library, Batching, Error Handling, '
           'Advanced topics, and class documentation on LLM Engine available '
           'at https://lamini-ai.github.io/.',
 'question': '### Question:\n'
             'What are the different types of documents available in the '
             'repository (e.g., installation guide, API documentation, '
             "developer's guide)?\n"
             '\n'
             '### Answer:'}
```
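For reference, here is a minimal sketch of how pairs in this template might be assembled. The `format_qa_pair` helper is hypothetical, and the actual question generation (e.g., prompting an LLM over documentation chunks) is assumed and not shown:

```python
def format_qa_pair(question: str, answer: str) -> dict:
    """Wrap a raw question/answer pair in the '### Question / ### Answer'
    prompt template used by the datapoint above.

    This helper is a hypothetical sketch, not the course's actual code.
    """
    return {
        "question": f"### Question:\n{question}\n\n### Answer:",
        "answer": answer,
    }

# Example: turning one raw pair into the dataset's format.
pair = format_qa_pair(
    "What are the different types of documents available in the repository?",
    "Lamini has documentation on Getting Started, Authentication, and more.",
)
print(pair["question"])
```

The key design point is that the question field already ends with `### Answer:`, so at finetuning time the model learns to continue from that marker with the answer text.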