Why Causal LM for Q&A?

The example shown in the course was to fine-tune a model to answer questions about Lamini. The raw training data was in the form of {question} and {answer} pairs.

I would have expected the model to be a sequence-to-sequence model where the input is the question and the output is the answer.

Instead, in the course they used a causal language model: they concatenated the {question} and the {answer} and trained the model to predict the next token.
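For concreteness, this is roughly what I understood the formatting to look like (the function and field names are my guess, not the exact course code):

```python
# Rough sketch of the concatenation step for causal LM fine-tuning.
# Field names and template are assumptions, not the exact course code.
def format_example(example):
    # Question and answer become one string; the model is then trained
    # with a next-token prediction objective over the whole sequence.
    return example["question"] + " " + example["answer"]

sample = {
    "question": "What is Lamini?",
    "answer": "Lamini is a platform for fine-tuning language models.",
}
print(format_example(sample))
```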

Is this common practice in real life, or was it just done to simplify the course?

Thanks.

According to my understanding, the underlying models are auto-regressive, which means they are decoder-only models. Hence they use AutoModelForCausalLM. For seq2seq models like Google's T5, you can use AutoModelForSeq2SeqLM.
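For example (the model names here are just illustrative):

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Decoder-only (auto-regressive) model: trained on the concatenated
# "{question} {answer}" string with a next-token prediction objective.
causal_model = AutoModelForCausalLM.from_pretrained("gpt2")

# Encoder-decoder model: the question would be the encoder input and
# the answer the decoder target, i.e. a true seq2seq setup.
seq2seq_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```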

Right, but my question is why they chose a causal (auto-regressive) model for a question-answering task, where both the training data and the usage at inference time are of the form {question} and {answer}, i.e. a sequence-to-sequence mapping.
