Supervised Fine-tuning

Hi, I have a question about the process of instruction fine-tuning. This approach consists of fine-tuning an LLM on a specific task or set of tasks by providing task-specific instructions or examples. What I have understood is that we feed the model an input that is a template with instruction, text, and completion, and the label is this same template. The model's output is compared with the label and the weights are updated (it is supervised training). Is this correct?

Thanks

Hello Luis, welcome to the Community!

Yes, you’re on the right track: this is a form of supervised training. Fine-tuning an LLM means that you train a model (usually a foundation model) on task-specific data so that it performs best on the task of interest. For this, you do need to update the model's weights based on this new data, usually with the same methods you already know from standard training from scratch.
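
To make that concrete, here is a minimal sketch of how a single training example is usually built for a causal LM. The `gpt2` checkpoint and the Alpaca-style template are just assumptions for illustration (Llama 2 works the same way with its own template):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # small stand-in checkpoint; not the actual Llama 2 recipe
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The whole template (instruction + input text + desired completion) is one sequence.
text = (
    "### Instruction:\nSummarize the text.\n"
    "### Input:\nThe cat sat on the mat.\n"
    "### Response:\nA cat sat on a mat."
)
enc = tokenizer(text, return_tensors="pt")

# For causal-LM training the labels are simply a copy of the input ids;
# the model shifts them internally so each position predicts the NEXT token.
enc["labels"] = enc["input_ids"].clone()

loss = model(**enc).loss  # supervised cross-entropy over next-token predictions
loss.backward()           # gradients used for the weight update
```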

However, while full fine-tuning can be performed, it is not always done, since it can be suboptimal compared with other approaches such as PEFT (Parameter-Efficient Fine-Tuning), which only updates a small percentage of all the weights while the rest remain unchanged (also known as 'freezing' weights); see the sketch below.
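
As a concrete example, here is a rough sketch of one popular PEFT method, LoRA, using the Hugging Face `peft` library. The checkpoint and the `target_modules` entries are assumptions for illustration; module names vary by architecture (`c_attn` is GPT-2's fused attention projection):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in checkpoint

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # which projections receive LoRA adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# The base weights are frozen; only the small adapter matrices are trained.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```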

Hope this helps!

Thank you, Charlie. However, it is not clear to me why, for instruction tuning, the desired output is also included in the model input (in Llama 2, for example, a template with instruction, input, and desired output is used). As it is supervised fine-tuning, wouldn't it make more sense to have only the instruction and input in the model input, and to compare the model output with the desired output?

Indeed, that would be a conceivable alternative, but including the desired output in the model input is how causal language models are actually trained, and it serves a specific purpose.

The whole sequence (instruction, input text, and desired output) is passed through the model once, and the label is that same sequence shifted by one token. At every position the model learns to predict the next token, conditioned on the ground-truth prefix rather than on its own earlier predictions; this is known as teacher forcing. In many implementations the loss is additionally masked so that only the tokens of the desired output contribute, meaning the model is supervised on the answer, not on reproducing the prompt.

If we didn't include the desired output in the input, the model would have to generate its answer token by token and only then compare it against the target. That can still work, but generation is sequential and therefore slow, and the resulting training signal is noisier. With teacher forcing, every output position is supervised in parallel in a single forward pass, which usually leads to faster and more stable fine-tuning results.
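
Here is a minimal sketch of that masking trick, assuming the same made-up template as above. `-100` is PyTorch's ignore index for cross-entropy, so the prompt positions contribute nothing to the gradient:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer

prompt = (
    "### Instruction:\nSummarize the text.\n"
    "### Input:\nThe cat sat on the mat.\n"
    "### Response:\n"
)
completion = "A cat sat on a mat."

prompt_ids = tokenizer(prompt)["input_ids"]
completion_ids = tokenizer(completion)["input_ids"]

input_ids = prompt_ids + completion_ids
# Mask the prompt: only the desired output is supervised.
labels = [-100] * len(prompt_ids) + completion_ids
```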