Why Causal LM for Q&A?

The example shown in the course was to fine-tune a model to answer questions about Lamini. The raw training data was in the form of {question} and {answer} pairs.

I would have expected the model to be a sequence-to-sequence model where the input is the question and the output is the answer.

Instead, in the course they used a causal language model: they concatenated the {question} and the {answer} and trained the model to predict the next token.
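For concreteness, this is roughly what I understood the formatting to look like (the function and field names are my guess, not the exact course code):

```python
# Rough sketch of the concatenation step for causal LM fine-tuning.
# Field names and template are assumptions, not the exact course code.
def format_example(example):
    # Question and answer become one string; the model is then trained
    # with a next-token prediction objective over the whole sequence.
    return example["question"] + " " + example["answer"]

sample = {
    "question": "What is Lamini?",
    "answer": "Lamini is a platform for fine-tuning language models.",
}
print(format_example(sample))
```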

Is this common practice in real life, or was it just done to simplify the course?

Thanks.

According to my understanding, the underlying models are auto-regressive, which means they are decoder-only models. Hence they use AutoModelForCausalLM. For seq2seq models like Google's T5, you can use AutoModelForSeq2SeqLM.
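For example (the model names here are just illustrative):

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Decoder-only (auto-regressive) model: trained on the concatenated
# "{question} {answer}" string with a next-token prediction objective.
causal_model = AutoModelForCausalLM.from_pretrained("gpt2")

# Encoder-decoder model: the question would be the encoder input and
# the answer the decoder target, i.e. a true seq2seq setup.
seq2seq_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```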

Right, but my question is why they chose a causal (auto-regressive) model for a question-answering task, where both the training data and the usage at inference time are of the form {question} and {answer}, i.e. a sequence-to-sequence mapping.
