Dataset for fine-tuning an SLM

Hey everyone, I am trying to fine-tune Llama 3.2 1B on some policies I have. The application I am building takes an application form, uses RAG to pull relevant context from the policy, and checks whether the application is eligible. Fine-tuning is also mostly for my own learning.

My confusion is what my dataset should look like:

  1. in terms of data format/structure
  2. in terms of data content

Should the data look exactly as it would at inference time (policy context plus application form in, eligibility decision out)?

Or should it be a simple question-answer format?

Or a question-answer format with the retrieved context included?

Also, I might (not decided yet) want the model to be able to answer further user questions after it gives the eligibility response. In that case the input structure to the model will be different from the earlier formats.
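If you do go that route, one way to avoid maintaining two different input structures is to fold both behaviours into multi-turn training samples: the eligibility verdict as the first assistant turn, a follow-up question after it. A sketch with placeholder content:

```python
import json

# Hypothetical multi-turn sample: eligibility verdict first, then a
# follow-up exchange, so one dataset teaches both behaviours.
followup_sample = {
    "messages": [
        {"role": "user",
         "content": "Policy:\nApplicants must be 18 or older.\n\n"
                    "Application:\nAge: 17\n\nIs this application eligible?"},
        {"role": "assistant",
         "content": "Not eligible: the applicant is under 18."},
        {"role": "user",
         "content": "What would make the application eligible?"},
        {"role": "assistant",
         "content": "The applicant would need to be at least 18 at the time of applying."},
    ]
}

# Serialized the same way as the single-turn samples
line = json.dumps(followup_sample)
```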

Can you guys let me know what the best approach to fine-tuning would be for my case?

Will Llama 3.2 1B or Phi-2 be able to handle all that policy context plus the application form details, and reason over them well enough to return an eligibility decision?