Fine-tune the model on GPU

In Week 2's Lab, section 2.2 - Fine-tune the model was run on CPU. I tried to run trainer.train() on my GPU, which has 32 GB of memory, and I get an OutOfMemoryError. How do I decide the GPU memory size required to fine-tune a given model?

Hi @Real_Emily ,

I understand that you are doing a full fine-tuning, right?

To calculate the memory required for this task, you have to consider several variables:

  1. The size of the base model. For starters, this is the thing that may consume the most memory, since all of the weights have to be loaded into GPU RAM.

  2. Batch size: You are in control of this one. The batch size also impacts memory usage: the bigger the batch, the more memory is needed.

  3. Precision: Models usually come in 32-bit precision, but you can use quantization to go down to 16 bits or even 8 bits. This will definitely reduce your memory usage, but it also affects precision.

There are other variables that also have an impact, such as gradient accumulation, sequence length, and the number of GPUs at your disposal. A rough back-of-the-envelope estimate is sketched below.
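As a very rough rule of thumb (the multipliers here are my own assumption, not something from the lab), full fine-tuning with the Adam optimizer needs memory for the weights, the gradients, and two optimizer states per weight, before you even count activations and batch data:

```python
# Rough rule-of-thumb estimate for full fine-tuning with Adam (assumption:
# fp32 weights, gradients, and optimizer states; activations are ignored).
def full_finetune_memory_gib(num_params: float, bytes_per_param: int = 4) -> float:
    weights = num_params * bytes_per_param         # the model weights themselves
    gradients = num_params * bytes_per_param       # one gradient per weight
    optimizer = num_params * bytes_per_param * 2   # Adam keeps 2 states per weight
    return (weights + gradients + optimizer) / 1024**3

print(f"~{full_finetune_memory_gib(1e9):.0f} GiB for a 1B-parameter model")
# -> ~15 GiB before activations, which is why even a 32 GB GPU fills up quickly
```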

HAVING SAID ALL THAT…
There are other ways to fine-tune a model that use much, much fewer resources. One very well-known and recommended option is PEFT + LoRA. With this, you can get almost the same quality as a full fine-tuning at a fraction of the resources.
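If you want to see what that looks like in code, here is a minimal sketch using the peft library; the base model (google/flan-t5-base) and the LoRA hyperparameters are just illustrative choices on my part:

```python
# Minimal PEFT + LoRA sketch (model name and hyperparameters are illustrative).
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model, TaskType

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=32,              # rank of the low-rank update matrices
    lora_alpha=32,     # scaling factor applied to the LoRA updates
    lora_dropout=0.05,
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # typically ~1% or less of all parameters
```

Because only those small LoRA matrices receive gradients and optimizer states, the training-time memory overhead drops dramatically.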

I hope this info is useful!

Juan

Thanks @Juan_Olano. Is it possible that a model is so big that the model itself cannot fit into one GPU? My understanding is that in order to use PEFT + LoRA, the model itself at least needs to be loaded onto one GPU.

Hi @Real_Emily ,

Yes, definitely. There are small models with, say, 1B parameters, and probably even 7B-parameter models, that can fit on one GPU, but beyond that you'll probably need more than one GPU. Large models with 30B or 70B parameters use several GPUs, and the biggest models like GPT may need on the order of hundreds of GPUs.
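To make that concrete, a quick weights-only calculation (ignoring gradients, optimizer states, and activations) shows why a 7B model can still fit on a single GPU at 16-bit precision while the larger ones cannot:

```python
# Weights-only memory at fp16 (2 bytes per parameter); a back-of-the-envelope check.
for billions in (1, 7, 30, 70):
    gib = billions * 1e9 * 2 / 1024**3
    print(f"{billions}B params -> ~{gib:.0f} GiB just for the weights")
# 1B  ->  ~2 GiB
# 7B  -> ~13 GiB  (fits on a single 16-24 GB GPU)
# 30B -> ~56 GiB  (needs multiple GPUs or aggressive quantization)
# 70B -> ~130 GiB
```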

I’ve done fine-tuning using PEFT + LoRA on my laptop with a model called DistilBERT, which is a small model. But when I tried it with BLOOMZ-7B1 it ran out of memory on my laptop; when I run it in Colab with one GPU, it works.

I had the same problem. I solved it by adding:
per_device_train_batch_size=2,
per_device_eval_batch_size=2
to training_args. A batch size of 1, 2, or 4 should be fine.
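In case the placement isn't obvious, here is a minimal sketch of where those arguments go; the output directory, learning rate, and the model/dataset variables are placeholders rather than the lab's exact values:

```python
# Minimal Trainer setup sketch; model, train_dataset, and eval_dataset are
# assumed to come from earlier cells in the lab (placeholders here).
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,   # small batches keep GPU memory usage down
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # simulate a larger effective batch size
    learning_rate=1e-5,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```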