Hi,
In the Week 1 video lecture “Computational challenges of training LLMs”, it is mentioned that:
1 parameter = 4 bytes (32-bit full precision)
1B parameters = 4 GB
An extra 20 bytes per parameter are required for other things (activations, gradients, etc.).
How come 80 GB of RAM is required to train a 1B-parameter model? Shouldn't it be 24 GB, with the 20 bytes of extra memory per parameter?
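Here is the same arithmetic in code form, using the lecture's own figures:

```python
# Back-of-the-envelope memory estimate for training a 1B-parameter model
# in 32-bit full precision, using the lecture's figures quoted above.
num_params = 1_000_000_000       # 1B parameters
weight_bytes = 4                 # fp32 weight = 4 bytes
extra_bytes = 20                 # lecture's overhead estimate per parameter

weights_gb = num_params * weight_bytes / 1e9
training_gb = num_params * (weight_bytes + extra_bytes) / 1e9
print(f"weights alone: {weights_gb:.0f} GB")    # -> 4 GB
print(f"training total: {training_gb:.0f} GB")  # -> 24 GB, not 80 GB
```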
Please let me know if I am missing something here.
Thanks,
Pandu
Could that be the data (the inputs) and the embeddings? During training you also need memory for the tensors that get multiplied with the weights, not just for the quantities associated with the weights themselves.
WOW!!!
Hi @GUGULOTHU_PANDU! I think you may have found an error in the lecture! I've done the math and cross-checked it against other sources, and your numbers seem right! We would need more like 20 GB of RAM for 1B params!
I will raise this point internally and share any outcome with you.
Thanks!
Juan
@Juan_Olano What was the result of this? Does this mean that if I wanted to load the 1B-parameter model locally, out of the box with no quantization, I would need about 20-24 GB of RAM? I'm tempted to allocate the difference (80 GB - 20 GB = 60 GB) to a full fine-tune, because in Week 2 it's clearly stated that a full copy of the model is created for every task.
Hi, I have rewatched this part of the video a few times to get a better understanding, and it isn't completely clear to me either. The only thing that could hint at justifying the 80 GB for a 1B-parameter model is a statement in the video along these lines: due to the overhead that occurs during model training, we would need about 20 times more memory for training compared to the model size. Can a mentor please confirm and make sense of this? What exactly is the overhead? My rough guess at a breakdown is sketched below.
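This split is just an illustration I pieced together for plain fp32 training with Adam, not the lecture's official numbers:

```python
# Illustrative per-parameter memory breakdown for fp32 training with Adam.
# These figures are a common rule of thumb, not the lecture's official split.
overhead_bytes = {
    "gradients": 4,            # one fp32 gradient per weight
    "Adam first moment": 4,    # running average of gradients
    "Adam second moment": 4,   # running average of squared gradients
    "activations + temps": 8,  # varies with batch size and sequence length
}

weights = 4  # the fp32 weight itself
total = weights + sum(overhead_bytes.values())
print(f"~{total} bytes/parameter -> ~{total} GB for 1B parameters")  # ~24, not 80
```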
Hi Juan, do you have an update on this?
Hi everyone. This will be corrected soon. The number should only be about 24 GB instead of 80 GB. Thank you for reporting!
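For anyone who wants to sanity-check the overhead themselves, here is a minimal PyTorch experiment. It is only a rough sketch: it assumes a machine with a CUDA GPU and uses a small toy layer, so the ratio will not match a 1B-parameter transformer exactly:

```python
import torch

# Compare the memory of the weights alone with the peak memory after one
# fp32 training step with Adam. Toy layer, not a 1B-parameter model.
model = torch.nn.Linear(4096, 4096).cuda()
params_mb = sum(p.numel() * 4 for p in model.parameters()) / 1e6
opt = torch.optim.Adam(model.parameters())

x = torch.randn(64, 4096, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()  # allocates gradients
opt.step()       # Adam allocates its two moment buffers here

print(f"weights alone: {params_mb:.0f} MB")
print(f"peak during step: {torch.cuda.max_memory_allocated() / 1e6:.0f} MB")
```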