Questions about "GPU RAM size needed to train 1B parameters"

kjscop · July 25, 2023, 12:15pm

Hi. I’m taking the course ‘Generative AI with LLMs’, week 1, lecture “Computational challenges of training LLMs”.

At about 2:00 in the lecture, it says we need 80 GB at 32-bit full precision.

But what I understood from 1:30 ~ 2:00 of the lecture is that, for each parameter, we need:

- 4 bytes per parameter / Model parameters (= weights)
- 8 bytes per parameter / Adam optimizer (2 states)
- 4 bytes per parameter / Gradients
- 8 bytes per parameter / Activations and temp memory

So, during training, each parameter needs at most 4 + (8 + 4 + 8) = 24 bytes.

As a result, for a 1B-parameter model I calculate:
24 bytes x 1B = 24 GB needed.
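
The per-parameter arithmetic above can be sketched in a few lines of Python. This is only a restatement of the numbers in the post, not a measurement: the dictionary keys and the function name are made up for illustration, and 1 GB is assumed to mean 10^9 bytes.

```python
# Bytes per parameter at 32-bit full precision, as listed in the post.
# Key names are illustrative, not taken from the course material.
BYTES_PER_PARAM = {
    "weights": 4,
    "adam_states": 8,       # two optimizer states, 4 bytes each
    "gradients": 4,
    "activations_temp": 8,  # rough figure from the lecture slide
}


def training_memory_gb(num_params: int) -> float:
    """Estimated training memory in GB, assuming 1 GB = 1e9 bytes."""
    bytes_per_param = sum(BYTES_PER_PARAM.values())  # 4 + 8 + 4 + 8 = 24
    return num_params * bytes_per_param / 1e9


total_gb = training_memory_gb(1_000_000_000)
print(f"{total_gb:.0f} GB")  # 24 GB
```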

But at 2:00 in the lecture, it says we need 80 GB to train a 1B model…

So… what am I missing?
Can anyone correct me?

Thanks in advance

Juan_Olano · July 25, 2023, 3:32pm

Hi @kjscop, thank you for reporting this. We have noted the issue and sent it to the group in charge of content. It is being reviewed.

ketan_jogadankar · July 28, 2023, 6:00am

Hey, I have the same doubt. How do we get 80 GB at FP32 for 1B model parameters? Please share the math behind this calculation.

Juan_Olano · July 28, 2023, 1:25pm

I am still waiting on a reply from the group in charge of this

kjscop · July 29, 2023, 12:57am

Thank you for updating.

Minimum_Gravity · July 31, 2023, 3:15am

I believe the 24 GB refers to the memory needed to hold the pieces required to run the LLM, which doesn’t include the memory needed to train it. More memory is needed to train.

This is how I understood it.

kjscop · July 31, 2023, 10:01am

By “more memory”, do you mean the memory taken up by the training samples in a batch?

Minimum_Gravity · July 31, 2023, 5:41pm

It sounds like the extra components of training can easily lead to 20x the amount that the weights alone take up.

I’m not sure exactly what is taking up all this memory. I’m still learning.

SOURCE
Computational challenges of training LLMs
Minute 1:21

"If you want to train the model, you’ll have to plan for additional components that use
GPU memory during training. These include two Adam optimizer states,
gradients, activations, and temporary variables needed by your functions.

This can easily lead to 20 extra bytes of memory per model parameter.
In fact, to account for all of these overhead during training,
you’ll actually require approximately 20 times the amount
of GPU RAM that the model weights alone take up."

Hope this helps.
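
The quoted passage makes two statements that give different totals for a 1B-parameter model, which may be where the 24 GB vs. 80 GB gap comes from (the thread does not confirm this). A minimal sketch of both readings, assuming FP32 weights at 4 bytes each and 1 GB = 10^9 bytes; the variable names are illustrative only:

```python
NUM_PARAMS = 1_000_000_000
WEIGHT_BYTES = 4   # FP32 weight
EXTRA_BYTES = 20   # "20 extra bytes of memory per model parameter"

# Memory for the weights alone.
weights_gb = NUM_PARAMS * WEIGHT_BYTES / 1e9

# Reading 1: weights plus 20 extra bytes per parameter.
additive_gb = NUM_PARAMS * (WEIGHT_BYTES + EXTRA_BYTES) / 1e9

# Reading 2: "approximately 20 times the amount of GPU RAM
# that the model weights alone take up".
multiplicative_gb = 20 * weights_gb

print(f"weights only:        {weights_gb:.0f} GB")         # 4 GB
print(f"+20 bytes per param: {additive_gb:.0f} GB")        # 24 GB
print(f"20x the weights:     {multiplicative_gb:.0f} GB")  # 80 GB
```

Under these assumptions, only the "20 times" reading reproduces the 80 GB figure from the lecture, while the "20 extra bytes" reading gives the 24 GB from the original question.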

Iniyaan_Poongundran · February 20, 2024, 7:54am

Actually, 24 GB should be the right figure. Refer to this video posted on the DeepLearning.AI YouTube channel: Efficient Fine-Tuning for Llama-v2-7b on a Single GPU.
They take parameters, gradients and optimizer states into account, but not activations, and get 112 GB in total for 7 billion parameters.

If you take all four (parameters, gradients, optimizer states and activations) into account, you get 168 GB for 7 billion parameters.

So for 1 billion parameters it should be 24 GB
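
The 7B figures in this post can be checked the same way. This is a sketch under the thread's own assumptions (4 bytes each for parameters and gradients, 8 bytes each for optimizer states and activations, 1 GB = 10^9 bytes); `memory_gb` is a hypothetical helper, not something from the video.

```python
def memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory in GB for a given per-parameter cost, assuming 1 GB = 1e9 bytes."""
    return num_params * bytes_per_param / 1e9


PARAMS_7B = 7_000_000_000

# Parameters (4) + gradients (4) + optimizer states (8) = 16 bytes per parameter.
without_activations_gb = memory_gb(PARAMS_7B, 4 + 4 + 8)

# Adding the 8 bytes per parameter for activations gives 24 bytes per parameter.
with_activations_gb = memory_gb(PARAMS_7B, 4 + 4 + 8 + 8)

# Scale back down to a 1B-parameter model.
per_billion_gb = with_activations_gb / 7

print(without_activations_gb, with_activations_gb, per_billion_gb)
# 112.0 168.0 24.0
```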
