Does model sharding fully utilize all GPUs?

When an HF transformers model gets sharded across multiple GPUs because it is too large to fit into the VRAM of a single GPU, I notice that only one GPU is at 100% utilization at a time: each GPU takes its turn at 100% while the others sit at 0%.
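To make the pattern concrete, here is a toy sketch of the execution order I seem to be observing. It assumes the library assigns each GPU a contiguous slice of layers and runs a forward pass through the slices strictly in order (the function and its behavior are my own illustration, not the library's actual code):

```python
# Toy model of layer-wise sharding: each "GPU" owns a contiguous slice of
# layers, and a single forward pass visits the slices sequentially, so only
# one device is doing work at any given moment.

def active_gpu_timeline(num_layers: int, num_gpus: int) -> list[int]:
    """Return which GPU is active at each layer step of one forward pass."""
    layers_per_gpu = num_layers // num_gpus
    return [min(layer // layers_per_gpu, num_gpus - 1)
            for layer in range(num_layers)]

# 8 layers split across 4 GPUs: GPU 0 runs layers 0-1, GPU 1 runs 2-3, etc.
print(active_gpu_timeline(num_layers=8, num_gpus=4))
# → [0, 0, 1, 1, 2, 2, 3, 3]
```

At every step exactly one GPU is busy and the other three are idle, which matches the utilization I see in `nvidia-smi`.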

What sharding method does the transformers library use? Would ZeRO or FSDP help fully utilize all GPUs?