Instruct fine-tuning vs Vanilla fine-tuning

I have two questions here:

1: In instruct fine-tuning, are the pre-trained LM weights updated as in vanilla fine-tuning?
2: If the answer to the first question is YES, what is the difference between them?

If I want to increase model capability on a task like translation, why should I collect data, reformulate it into instruction format, and then apply instruct fine-tuning to it, when I could easily collect a translation dataset and fine-tune the pre-trained LM on it directly?
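To make the two options concrete, here is a hypothetical sketch of the difference in data format (the field names and the instruction wording are just illustrative):

```python
# Vanilla fine-tuning sample: a raw source/target translation pair.
plain_sample = {
    "source": "Das Wetter ist heute schön.",
    "target": "The weather is nice today.",
}

# Instruct fine-tuning sample: the same pair wrapped in a natural-language
# instruction, so the model also learns to act on the request itself.
instruct_sample = {
    "prompt": "Translate the following German sentence into English:\n"
              "Das Wetter ist heute schön.",
    "response": "The weather is nice today.",
}
```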

Hello @mohamed_abd_allah, welcome to the community!
I am not sure I understood your concern correctly, but let me try.

1: In instruct fine-tuning, are the pre-trained LM weights updated as in vanilla fine-tuning?
Instruct fine-tuning is just a type of fine-tuning where the dataset is designed to produce an LLM that responds properly to instructions. Instruction fine-tuning may be performed on a vanilla LLM. I understand "vanilla fine-tuning" as full fine-tuning (i.e. updating all LLM parameters), which is orthogonal to instruction fine-tuning: instruction fine-tuning may update all parameters, a subset of them, or none of them (e.g. by adding new layers or soft-prompting).
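To illustrate that orthogonality, here is a minimal sketch assuming the Hugging Face transformers and peft libraries (the model name is just a placeholder); the same instruction dataset could be trained with either setup:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

# Setup A: full ("vanilla") fine-tuning -- every pre-trained weight
# receives gradients.
for p in model.parameters():
    p.requires_grad = True

# Setup B: LoRA, a PEFT method -- the pre-trained weights stay frozen and
# only small adapter matrices are trained.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only a small fraction is trainable
```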

I hope it makes sense?

In my understanding, instruction fine-tuning is LLM full fine-tuning, which @mohamed_abd_allah may be referring to as vanilla fine-tuning; at least, it is one form of LLM full fine-tuning. In instruction fine-tuning we also calculate the loss and update (possibly all) the LLM weights in each epoch using standard backpropagation. Therefore, I cannot understand why you are saying that full fine-tuning is orthogonal to instruction fine-tuning!
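A minimal sketch of a single update step may help here, assuming PyTorch and Hugging Face transformers (the model name is a placeholder, and the response-only loss masking is a common but not universal instruct-tuning choice): the backprop step itself is identical to vanilla fine-tuning; only the data and label masking differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM works the same way here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

prompt = "Translate to English: Das Wetter ist heute schön.\n"
response = "The weather is nice today."

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
response_ids = tokenizer(response, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, response_ids], dim=1)

# Labels for the standard next-token loss. Setting the prompt positions to
# -100 (ignored by the loss) means only the response tokens are learned;
# drop the masking and this is exactly the same step as "vanilla" full
# fine-tuning on raw text.
labels = torch.cat([torch.full_like(prompt_ids, -100), response_ids], dim=1)

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()    # standard backprop over (possibly all) model weights
optimizer.step()
optimizer.zero_grad()
```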

Hi @KaushikDas, fine-tuning usually does not change all weights, as that is very expensive in both compute and memory. Please check PEFT.

Hi, I have not referred to fine-tuning in general; rather, I specifically talked about full fine-tuning (FFT) in relation to instruction fine-tuning. If instruction fine-tuning updates all parameters of the model during training, will it not be equivalent to FFT? Although I understand that, in the case of instruction fine-tuning, the training is targeted training using a dataset containing instruction-response pairs, and it is therefore possible that it may not update all parameters, in which case it will not be equivalent to FFT.

Please clarify if this understanding is correct.

On a side note, in the case of FFT, a very comprehensive dataset is used to train the entire LLM from scratch, updating all model parameters, and that is an important distinction of FFT compared to instruction fine-tuning.

Thank you!

I believe differentiating between full fine-tuning and instruction fine-tuning based on how many parameters are updated is wrong. That's not the differentiating factor of instruction fine-tuning. If instruction fine-tuning by some chance happens not to update a specific weight, the same is possible for full fine-tuning as well.

The key difference of instruction fine-tuning is the dataset being used. Through instruction fine-tuning, what we are teaching the LLM to do is to understand human instructions (which have a lot of variability) and then respond in a manner that satisfies human evaluation. This is something that is not explicitly taught during initial training, which just focuses on next-word prediction (or other similar training objectives).

But with instruction fine-tuning, the LLM learns to understand the context behind user questions/instructions and to respond to them effectively. For example, hard constraints like "Restrict your answer to 100 words" may not be explicitly clear to a base LLM but will be understood much better by an instruction fine-tuned LLM.
