I believe it was mentioned that it’s best to fine-tune a model starting from a base model that’s closer to the domain we’re adapting it to. If I wanted to adapt a model for health content Q&A, how should I go about it? The content is not clinical, but it covers health topics; articles such as those from Mayo Clinic would be representative.
My thought is: I would look for pre-trained models that were trained on general text but also on health content. Then I would fine-tune one of them (using PEFT techniques) on more domain-specific labeled Q&A examples (on the order of 10-15k) for my use case.
Is that on the right course or is there a better approach?
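To make the PEFT part of the plan concrete, here is a back-of-the-envelope sketch of how many weights I would actually be training if I used LoRA (one PEFT method). The model shape, the rank, and the choice of which matrices to adapt are all made-up values for illustration, not taken from any specific model.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights LoRA adds to one (d_out x d_in) matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    while the original matrix stays frozen.
    """
    return rank * (d_in + d_out)


# Hypothetical values, assumed purely for illustration.
HIDDEN_SIZE = 4096               # width of each adapted projection matrix
NUM_LAYERS = 32                  # number of transformer layers
ADAPTED_MATRICES_PER_LAYER = 2   # e.g. the query and value projections
RANK = 8                         # LoRA rank

per_matrix = lora_trainable_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
total = per_matrix * ADAPTED_MATRICES_PER_LAYER * NUM_LAYERS

print(f"per matrix: {per_matrix:,}")   # per matrix: 65,536
print(f"total:      {total:,}")        # total:      4,194,304
```

If that arithmetic is roughly right, I would be training only a few million weights on top of a frozen base model, which is why PEFT seemed like a reasonable fit for a 10-15k example dataset.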
How can I discover datasets or pretrained models that were trained on specific types of content (e.g. mental health or heart health within the “health” topic, or contract law within the “law” topic)?
Thanks!