As per the course, we used the pythia-70M model and fine-tuned it on lamini_docs.jsonl, which is a Q&A dataset. Some of the limitations I found with this fine-tuning are:
-
After fine-tuning, when we give the model a question from the dataset, it generates an answer, but after the first few sentences it starts repeating them. I think this is because we set max_output_length = 100, so the model just tries to fill the output length somehow. Is there any way to make the model stop generating once it has produced the relevant output?
-
After fine-tuning, I was hoping the model would be able to answer questions that are not in the dataset but are related to the Lamini docs. E.g., the Lamini docs dataset has a total of 1400 sample questions and answers; if we ask a question that differs from those 1400 questions but whose answer is present among those 1400 answers, the model is unable to generate a sensible/relevant output. In my opinion, the model should have learned the context of the entire document and should be able to answer any question related to it. (I may be wrong; please explain.)
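
On the first point, my current understanding (which may be wrong) is that the output length is only an upper bound: a decoding loop normally stops as soon as the model emits an end-of-sequence token, and it only runs to the cap if that token never comes. Below is a toy sketch of that behavior in plain Python. Everything in it (`greedy_generate`, `make_scripted_model`, `EOS_ID`, the token ids) is made up for illustration and is not the course's or any library's code.

```python
from typing import Callable, List

EOS_ID = 0  # hypothetical end-of-sequence token id for this toy example


def greedy_generate(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    max_new_tokens: int = 100,
    eos_id: int = EOS_ID,
) -> List[int]:
    """Append tokens one at a time until EOS is produced or the cap is hit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == eos_id:
            # The model signalled it is done, so stop before the cap.
            break
        tokens.append(tok)
    return tokens


def make_scripted_model(script: List[int], prompt_len: int) -> Callable[[List[int]], int]:
    """Stand-in for a model: replays `script` in a loop, one token per step."""
    def next_token(tokens: List[int]) -> int:
        step = len(tokens) - prompt_len
        return script[step % len(script)]
    return next_token


prompt = [1]

# A "model" that has learned to end its answer with EOS stops early.
stops = greedy_generate(make_scripted_model([5, 6, 7, EOS_ID], len(prompt)), prompt)
print(stops)  # [1, 5, 6, 7]

# A "model" that never emits EOS keeps repeating until the cap is reached.
runs_on = greedy_generate(
    make_scripted_model([5, 6, 7], len(prompt)), prompt, max_new_tokens=10
)
print(runs_on)  # [1, 5, 6, 7, 5, 6, 7, 5, 6, 7, 5]
```

If this picture is right, the repetition would suggest that the fine-tuned model never predicts an end-of-sequence token, rather than that it is trying to fill 100 tokens, but I have not confirmed that this is what happens in the course setup.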