Relationship between batch size and GPU memory

Hi. I’m attending the course ‘Generative AI with LLMs’, Week 1 - “Computational challenges of training LLMs”. The following calculation is clear, but I wonder: is this a pure model-memory calculation, i.e. training on just 1 sample? What is the impact of different batch sizes on GPU memory? What are the specific impacts?

My thinking is that the intermediate results in each layer need to be stored at least temporarily, and that part should be proportional to the batch size. Can anyone show this with a specific example? Thank you.


Hello @liangyi

That is such a good question. I had the same thought when GPU usage was affecting my model training, but I couldn’t find a fully worked-out analysis relating parameters to bytes of GPU memory. Today, though, I found someone who did some digging.

Give me some more time; if I find more relevant material on this, I will share it.

Regards
DP


:grinning: haha thank you. There are a lot of guides on calculating GPU memory from the number of parameters, but much less information about batch size. I think most people just adjust it intuitively when they see OOM :joy:

On batch_size there is some information, but parameter-wise there isn’t much.

See this article, which explains how to track GPU usage.
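For example (my own quick sketch, assuming PyTorch on a CUDA device, not necessarily what the article uses), you can snapshot GPU memory from inside a training script like this, or just watch nvidia-smi from the shell:

```python
# Minimal sketch (assumes PyTorch + CUDA) for checking GPU memory from a training script.
import torch

def report_gpu_memory(tag=""):
    # Currently allocated tensor memory vs. the peak since the last reset
    allocated = torch.cuda.memory_allocated() / 1024**2
    peak = torch.cuda.max_memory_allocated() / 1024**2
    print(f"{tag} allocated: {allocated:.1f} MiB, peak: {peak:.1f} MiB")

torch.cuda.reset_peak_memory_stats()
# ... run a forward/backward pass here ...
report_gpu_memory("after one step")
```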

But your query gave me an idea: doing a parameter-wise analysis of GPU memory usage.

Thanks and regards
DP


OK, thank you :smile_cat: I’ll do some tests as well.

My question was related to this image, Paul @paulinpaloalto

If the question is how to add the “dimension” of batch size to the above memory usage computations, then it requires a bit more thought. We have to consider what happens in the various intermediate calculations that we do in forward and back prop.

For the parameters themselves, there is no change, right? Because the size of the parameters is not affected by the batch size.

The final gradients themselves are also the same size as the parameters, of course. So those are not affected.

But then we have to think about all the intermediate steps (linear and non-linear activations in forward prop) and all the Chain Rule formulas in back prop.

All the forward propagation calculations involve the minibatch, e.g.

Z^{[l]} = W^{[l]} \cdot A^{[l-1]} + b^{[l]}
A^{[l]} = g^{[l]}(Z^{[l]})

So all the A and Z values there have dimensions (number of neurons) × (number of samples in the batch).
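To put rough numbers on that (a back-of-the-envelope sketch of my own, assuming fp32 activations and ignoring framework overhead), the Z and A for a single layer already scale linearly with the batch size:

```python
# Rough sketch: activation memory per layer ~ neurons * batch_size * bytes_per_value.
# Assumes fp32 (4 bytes) and ignores framework overhead; layer size is made up.
bytes_per_value = 4  # fp32

def activation_bytes(neurons, batch_size):
    # Both Z[l] and A[l] have shape (neurons, batch_size), so count them twice
    return 2 * neurons * batch_size * bytes_per_value

for m in (1, 8, 64):
    mib = activation_bytes(neurons=4096, batch_size=m) / 1024**2
    print(f"batch_size={m:3d}: ~{mib:.1f} MiB for one layer's Z and A")
```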

And some of the back prop formulas also are affected by the batch size, e.g.:

dZ^{[L]} = A^{[L]} - Y
dW^{[L]} = \displaystyle \frac {1}{m} dZ^{[L]} \cdot A^{[L-1]T}

So the intermediate values there will be (neurons) × (samples), but those are just temporary values that can be discarded.
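To make the shapes concrete (a toy check with made-up layer sizes, not anything from the course materials): dZ grows with the batch size m, while dW comes out the same size as W regardless of m.

```python
# Toy shape check: dZ scales with batch size m, dW does not.
import numpy as np

n_L, n_prev = 10, 20                    # neurons in layers L and L-1 (made-up sizes)
for m in (1, 32):
    dZ = np.random.randn(n_L, m)        # shape (neurons, batch) -> scales with m
    A_prev = np.random.randn(n_prev, m) # activations from layer L-1
    dW = (1 / m) * dZ @ A_prev.T        # shape (n_L, n_prev) -> independent of m
    print(f"m={m:2d}: dZ {dZ.shape}, dW {dW.shape}")
```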

So basically this makes my head hurt. :grin: The simpler approach would be to evaluate this experimentally. It looks like Deepti has found some really valuable info about how to monitor or get the status of the GPU. So you could just turn on all the GPU statistics gathering or add that to your training scripts and then try running first with minibatch size = 1 (Stochastic GD) to get the baseline memory usage, which should correspond to your chart above. Then run again with batch size = 2 or 4 or 8 and see how much the memory usage increases. With that info, you can then estimate what the maximum batch size is that you can use without getting the dreaded OOM errors given the size of the memory on your particular GPU.
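Here is roughly what I have in mind (a sketch only, assuming PyTorch and a small stand-in MLP rather than the actual LLM from the course):

```python
# Sketch of the experiment: peak GPU memory for one training step at several batch sizes.
# Assumes PyTorch with CUDA; the toy MLP is a stand-in, not the course's model.
import torch
import torch.nn as nn

device = "cuda"
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

for batch_size in (1, 2, 4, 8):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(batch_size, 4096, device=device)
    y = torch.randint(0, 10, (batch_size,), device=device)
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    peak = torch.cuda.max_memory_allocated() / 1024**2
    print(f"batch_size={batch_size}: peak {peak:.1f} MiB")
```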

I have not actually read the articles pointed to above, but does what I’m suggesting there sound like it would be doable?


Thank you :grinning: from Guangzhou China
