Computational needs for fine-tuning LLMs

In week 1, there is a video explaining the “Computational challenges of LLMs”.

Since fine-tuning actually runs the model and updates the weights, much like the pre-training step, does fine-tuning an LLM also require the same large computational resources?

Hi @DanielCogzell

While fine-tuning typically requires fewer computational resources than the initial pre-training phase, it still demands substantial computing power, especially for LLMs.

The size of the model, the complexity of the task, the size of the dataset, and the available hardware all affect the computational demands of fine-tuning.

Hello @DanielCogzell, welcome to the Community!

Full fine-tuning does require a substantial amount of processing and memory, since you need to update all the weights of the model. While it is not as demanding as pre-training from scratch, which involves processing massive datasets over very long training runs, it is nevertheless quite expensive, depending on the total number of model parameters, the size of the dataset, and the computational resources available.
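
To get a feel for the memory side, here is a rough back-of-the-envelope sketch, assuming full fine-tuning in FP32 with the Adam optimizer and ignoring activation memory (which varies with batch size and sequence length):

```python
# Rough memory estimate for FULL fine-tuning with Adam in FP32.
# Per trainable parameter: 4 bytes (weights) + 4 bytes (gradients)
# + 8 bytes (Adam first/second moment states) = 16 bytes.
# Activations add more on top, depending on batch size and sequence length.

BYTES_PER_PARAM = 4 + 4 + 8  # weights + gradients + Adam states

def training_memory_gb(num_params: float) -> float:
    """Approximate GPU memory (GB) for training state, excluding activations."""
    return num_params * BYTES_PER_PARAM / 1e9

for n in (1e9, 7e9, 70e9):  # 1B, 7B, 70B parameter models
    print(f"{n / 1e9:>4.0f}B params -> ~{training_memory_gb(n):,.0f} GB")
# ~16 GB for 1B, ~112 GB for 7B, ~1,120 GB for 70B -- before activations
```

Even a modest 1B-parameter model already exceeds many consumer GPUs once activations are included, which is why full fine-tuning of larger models typically requires multi-GPU setups.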

For this reason, parameter-efficient fine-tuning (PEFT) techniques are usually preferred: they update only a small portion of the model's weights, making the process far less computationally expensive without compromising too much on model accuracy. Again, the results also depend on the model architecture and the size of the training dataset.
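
As a minimal sketch of what this looks like in practice, here is a LoRA setup using the Hugging Face peft library. The base model and hyperparameter values are purely illustrative:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model, TaskType

# Illustrative base model; any supported transformer works here.
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# LoRA freezes the original weights and trains small low-rank adapter
# matrices instead; r, alpha, and dropout below are typical example values.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                          # rank of the adapter matrices
    lora_alpha=32,                # scaling factor for the adapter output
    lora_dropout=0.05,
    target_modules=["q", "v"],    # attention query/value projections in T5
)

peft_model = get_peft_model(model, lora_config)
# Prints trainable vs. total parameters -- typically well under 1% trainable.
peft_model.print_trainable_parameters()
```

Since only the small adapter matrices receive gradients and optimizer states, the per-parameter training overhead from the estimate above applies to a tiny fraction of the model, which is where most of the memory savings come from.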