How is the Lamini Q&A dataset generated?

Hi team, I'm curious how the Lamini Q&A dataset used in the labs is generated. It has 1k+ question-answer pairs about Lamini. I'd like to create a similar Q&A dataset for my own use case, but I only have documentation/READMEs. Did you use Alpaca to produce the data, some other approach, or was it human-generated?

Thank you!

Hello @Sophie668

Thank you for sharing your question here. Are you referring specifically to Lesson 4: Data preparation?

Yes! It has 1400 question-answer pairs.

Here is one datapoint from the finetuning dataset:
{'answer': 'Lamini has documentation on Getting Started, Authentication, '
           'Question Answer Model, Python Library, Batching, Error Handling, '
           'Advanced topics, and class documentation on LLM Engine available '
           'at https://lamini-ai.github.io/.',
 'question': '### Question:\n'
             'What are the different types of documents available in the '
             'repository (e.g., installation guide, API documentation, '
             "developer's guide)?\n"
             '\n'
             '### Answer:'}
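For anyone wanting to build a similar dataset from their own docs, here is a minimal sketch of how a doc passage could be shaped into a datapoint matching the template above. This is not the actual Lamini pipeline; the function name and the idea of prompting an LLM to write Q&A pairs from each passage are assumptions on my part.

```python
def format_datapoint(question: str, answer: str) -> dict:
    """Wrap a Q&A pair in the '### Question / ### Answer' prompt
    template used by the datapoint shown above."""
    return {
        "question": f"### Question:\n{question}\n\n### Answer:",
        "answer": answer,
    }

# Example doc passage (taken from the datapoint above). In a real
# pipeline you would prompt an LLM with each such passage and ask it
# to write one or more question/answer pairs grounded in that text,
# then format each pair with format_datapoint.
doc_snippet = (
    "Lamini has documentation on Getting Started, Authentication, "
    "Question Answer Model, Python Library, Batching, Error Handling, "
    "Advanced topics, and class documentation on LLM Engine available "
    "at https://lamini-ai.github.io/."
)

datapoint = format_datapoint(
    "What are the different types of documents available in the repository?",
    doc_snippet,
)
print(datapoint["question"])
```

The key point is that the model is finetuned on prompts ending in `### Answer:`, so at inference time the same template tells it where its completion should begin.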