Fine-tune the model on GPU

In Week 2's Lab, section 2.2 - Fine-tune the model was run on CPU. I tried to run trainer.train() on my GPU, which has 32 GB of memory, and I get an OutOfMemoryError. How do I decide the GPU memory size required to fine-tune a given model?

Hi @Real_Emily ,

I understand that you are doing a full fine-tuning, right?

To calculate the memory required for this task, you have to consider several variables:

  1. The size of the base model. For starters, this is the thing that may consume the most memory, since all of the weights have to be loaded into GPU RAM.

  2. Batch size: You are in control of this one. The batch size also impacts memory usage: the bigger the batch, the more memory is needed.

  3. Precision: Models usually come in 32-bit precision, but you can use quantization to go down to 16 bits or even 8 bits. This will definitely reduce your memory usage, but it also affects precision.

There are other variables that also have an impact, such as gradient accumulation, sequence length, and the number of GPUs at your disposal. A rough back-of-the-envelope estimate is sketched below.
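As a very rough rule of thumb (the multipliers here are my own assumption, not something from the lab), full fine-tuning with the Adam optimizer needs memory for the weights, the gradients, and two optimizer states per weight, before you even count activations and batch data:

```python
# Rough rule-of-thumb estimate for full fine-tuning with Adam (assumption:
# fp32 weights, gradients, and optimizer states; activations are ignored).
def full_finetune_memory_gib(num_params: float, bytes_per_param: int = 4) -> float:
    weights = num_params * bytes_per_param         # the model weights themselves
    gradients = num_params * bytes_per_param       # one gradient per weight
    optimizer = num_params * bytes_per_param * 2   # Adam keeps 2 states per weight
    return (weights + gradients + optimizer) / 1024**3

print(f"~{full_finetune_memory_gib(1e9):.0f} GiB for a 1B-parameter model")
# -> ~15 GiB before activations, which is why even a 32 GB GPU fills up quickly
```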

HAVING SAID ALL THAT…
There are other ways to fine-tune a model that use much, much fewer resources. One very well-known and recommended option is PEFT + LoRA. With this, you can get almost the same quality as a full fine-tuning at a fraction of the resources.
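If you want to see what that looks like in code, here is a minimal sketch using the peft library; the base model (google/flan-t5-base) and the LoRA hyperparameters are just illustrative choices on my part:

```python
# Minimal PEFT + LoRA sketch (model name and hyperparameters are illustrative).
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model, TaskType

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=32,              # rank of the low-rank update matrices
    lora_alpha=32,     # scaling factor applied to the LoRA updates
    lora_dropout=0.05,
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # typically ~1% or less of all parameters
```

Because only those small LoRA matrices receive gradients and optimizer states, the training-time memory overhead drops dramatically.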

I hope this info is useful!

Juan

Thanks @Juan_Olano. Is it possible that a model is so big that the model itself cannot fit into one GPU? My understanding is that in order to use PEFT + LoRA, the model itself at least needs to be loaded onto one GPU.

Hi @Real_Emily ,

Yes, definitely. There are small models with, say, 1B parameters, and probably even 7B-parameter models, that can fit on one GPU, but beyond that you'll probably need more than one GPU. Large models with 30B or 70B parameters use several GPUs, and the biggest models like GPT may need on the order of hundreds of GPUs.
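To make that concrete, a quick weights-only calculation (ignoring gradients, optimizer states, and activations) shows why a 7B model can still fit on a single GPU at 16-bit precision while the larger ones cannot:

```python
# Weights-only memory at fp16 (2 bytes per parameter); a back-of-the-envelope check.
for billions in (1, 7, 30, 70):
    gib = billions * 1e9 * 2 / 1024**3
    print(f"{billions}B params -> ~{gib:.0f} GiB just for the weights")
# 1B  ->  ~2 GiB
# 7B  -> ~13 GiB  (fits on a single 16-24 GB GPU)
# 30B -> ~56 GiB  (needs multiple GPUs or aggressive quantization)
# 70B -> ~130 GiB
```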

I’ve done fine-tuning using PEFT + LoRA on my laptop with a model called DistilBERT, which is a small model. But when I tried it with BLOOMZ-7B1 it ran out of memory on my laptop; when I run it in Colab with one GPU, it works.

I had the same problem. I solved it by adding:
per_device_train_batch_size=2,
per_device_eval_batch_size=2
to training_args. A batch size of 1, 2, or 4 should be fine.
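In case the placement isn't obvious, here is a minimal sketch of where those arguments go; the output directory, learning rate, and the model/dataset variables are placeholders rather than the lab's exact values:

```python
# Minimal Trainer setup sketch; model, train_dataset, and eval_dataset are
# assumed to come from earlier cells in the lab (placeholders here).
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,   # small batches keep GPU memory usage down
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # simulate a larger effective batch size
    learning_rate=1e-5,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```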