What if there's no data in question-answer format?

Can I use Lamini to finetune a model on my own documentation? My data isn’t in question-answer format, as it always is throughout the course, so I can’t use the same technique, instruction finetuning. I just have a 500-line txt file with the docs (actually it’s not even documentation, it’s the rulebook of a board game).

Why can’t you come up with question / answer pairs?

Here’s an example for Monopoly:
Question: What should I do when I land on a chance square?
Answer: Pick a card from the chance pile of cards and follow the instructions.
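
If you do write pairs by hand, one way to store them is one JSON object per line. This is only a sketch: the "question" and "answer" key names are an assumption, so check what your finetuning tool actually expects.

```python
import json

# Hand-written pairs; the key names here are an assumption, not a required schema.
pairs = [
    {
        "question": "What should I do when I land on a chance square?",
        "answer": "Pick a card from the chance pile of cards and follow the instructions.",
    },
]

def to_jsonl(pairs):
    """Serialize each pair as one JSON object per line."""
    return "\n".join(json.dumps(pair) for pair in pairs)

print(to_jsonl(pairs))
```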

I don’t see how hand-written question/answer pairs could possibly cover the entire rulebook.

I tried putting the rulebook into a prompt for the OpenAI GPT API, but it’s too big for their gpt-3.5-turbo-16k model (my prompt is around 29,500 tokens). gpt-4-32k could handle that prompt, but it’s not publicly available at the moment.

If the rules are in sections that are unrelated to each other, it’s possible to get good results by doing the following:

  1. Generate text embeddings for each section.
  2. When a query is received, find the rule section that’s closest in terms of embeddings.
  3. Submit a second query to OpenAI containing only that candidate section together with your original question.
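
The steps above can be sketched in plain Python, with no framework needed to pass one step's output to the next. Everything here is illustrative: `embed` is a toy bag-of-words stand-in for a real embedding model, the section names and texts are made up, and `build_prompt` only assembles the text you would send in the second query.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def closest_section(query, sections):
    """Return the name of the section whose embedding is nearest the query's."""
    q = embed(query)
    return max(sections, key=lambda name: cosine(q, embed(sections[name])))

def build_prompt(query, sections):
    """Assemble the second query: only the best-matching section plus the question."""
    name = closest_section(query, sections)
    return (
        "Answer using only this rule section.\n\n"
        f"[{name}]\n{sections[name]}\n\nQuestion: {query}"
    )

# Hypothetical rulebook sections, keyed by section name.
sections = {
    "Chance": "When you land on a chance square, pick a card from the chance pile and follow the instructions.",
    "Jail": "You go to jail when you land on the go to jail square or roll doubles three times in a row.",
}

print(build_prompt("What should I do when I land on a chance square?", sections))
```

With a real embedding model you would compute the section embeddings once and store them, rather than re-embedding every section on each query as this sketch does.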

Another way:

  1. Summarize the rulebook into a smaller document. Most of the rules should be in sections, so summarizing several sections at a time is a good place to start.
  2. Query this smaller document.
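
A rough sketch of the batching step, assuming the rulebook is already split into a list of section strings. `first_sentence` is only a placeholder summarizer so the example runs on its own; in practice you would pass in a function that calls a model. The character budget is a crude proxy for a token limit.

```python
def batch_sections(sections, max_chars):
    """Group consecutive sections so each batch stays within max_chars.

    A single section longer than max_chars still gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for text in sections:
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches

def first_sentence(text):
    """Placeholder summarizer: keeps only the first sentence of the text."""
    return text.split(". ")[0].rstrip(".") + "."

def condense(sections, max_chars, summarize=first_sentence):
    """Summarize each batch and join the summaries into one smaller document."""
    batches = batch_sections(sections, max_chars)
    return "\n".join(summarize(" ".join(batch)) for batch in batches)

# Hypothetical rulebook sections.
rules = [
    "Roll two dice. Move that many squares.",
    "Pay rent when you land on owned property. Rent doubles with a monopoly.",
]
print(condense(rules, max_chars=40))
```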

If none of these are helpful, please see this post on the OpenAI forum and consider posting your question there.

For your first approach, are there any techniques other than LangChain for passing the query output of one model to another model?

Also, how would you identify the section to which the query belongs?