The example shown in the course was to fine-tune a model to answer questions about Lamini. The raw training data was in the form of {question} and {answer} pairs.
I would have expected the model to be a sequence-to-sequence model where the input is the question and the output is the answer.
Instead, the course used a causal language model: they concatenated the {question} and {answer} into a single text and trained the model to predict the next token over the whole sequence (see the sketch below).
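To make sure I'm describing it correctly, here is a minimal sketch of the concatenation approach as I understood it (the model name, field names, and lack of a prompt template are just my assumptions, not exactly what the course did):

```python
from transformers import AutoTokenizer

# Placeholder model/tokenizer for illustration
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")

example = {
    "question": "What is Lamini?",
    "answer": "Lamini is a platform for fine-tuning LLMs.",
}

# Concatenate question and answer into one string...
text = example["question"] + example["answer"]

# ...and tokenize it as a single sequence. During training the labels are
# just the input ids shifted by one, so the model learns next-token
# prediction over both the question and the answer.
tokens = tokenizer(text, truncation=True, max_length=512)
```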
Is this a common practice in real-world fine-tuning, or was it just done to simplify the course?
Thanks.