Can anyone explain the relationship between batch size, context window, and the amount of memory needed for a model? I can see that the higher the batch size and the longer the context window, the more memory is needed, but is there a calculation that approximates the impact?
I tried searching and asking LLMs, but I wasn't successful in getting an answer - or at least in fully understanding what I got.
Let's say a 7B model, but is there a relationship that gives a good estimate and takes into account the various options available? Does anything change if it's only inference rather than training?
For inference, the weights are the baseline: multiply the parameter count by the bytes per parameter (about 14 GB for a 7B model in FP16, half that in 8-bit). On top of that comes the KV cache, which is the part that scales with batch size and context length: roughly 2 × layers × batch_size × seq_len × hidden_size × bytes per element.
For training, you'll need at least double the memory for the trainable parameters, because a gradient is stored for every weight; optimizers like Adam keep additional state on top of that. The activations saved for backpropagation also scale with batch size × sequence length.
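A rough back-of-the-envelope sketch of that, assuming a LLaMA-style 7B config (32 layers, 4096 hidden size) and FP16 everywhere; the exact numbers depend on the model config, quantization, and framework overhead:

```python
def estimate_memory_gb(
    n_params=7e9,        # total parameters (7B assumed)
    n_layers=32,         # transformer layers (LLaMA-7B-like assumption)
    hidden_size=4096,    # model dimension (LLaMA-7B-like assumption)
    batch_size=1,
    seq_len=4096,        # context window in tokens
    bytes_per_elem=2,    # FP16/BF16; use 1 for int8, 0.5 for 4-bit
    training=False,
):
    GB = 1024 ** 3

    # Weights: independent of batch size and context length.
    weights = n_params * bytes_per_elem

    # KV cache: K and V tensors per layer, each batch x seq x hidden.
    kv_cache = 2 * n_layers * batch_size * seq_len * hidden_size * bytes_per_elem

    total = weights + kv_cache
    if training:
        # Gradients (~1x weights) plus Adam optimizer state (~2x weights).
        total += 3 * n_params * bytes_per_elem
        # Very rough activation estimate; ignores attention matrices,
        # gradient checkpointing, etc.
        total += 10 * n_layers * batch_size * seq_len * hidden_size * bytes_per_elem

    return total / GB


# Example: 7B model, 4K context, FP16 inference.
print(f"{estimate_memory_gb():.1f} GB")               # ~17 GB at batch 1
print(f"{estimate_memory_gb(batch_size=8):.1f} GB")   # KV cache grows with batch
```

The point is that the weight term is constant while the KV-cache and activation terms grow roughly linearly with batch_size × seq_len, which is why large batches and long contexts blow up memory even when the model itself fits comfortably.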