FSDP Model Sharding: Where does Synchronization take place?

themightywolfie · September 30, 2024, 5:26am

If the model is too big to fit into a single GPU, where does the synchronization process take place? Is it carried out on yet another GPU node, or does it take place on the CPU?

gent.spah · September 30, 2024, 5:50am

It takes place in parallel on the GPUs themselves (either across the cores of one GPU or across the other GPUs present). Generally speaking, if the processing units are different, it's much harder to run the same operation in parallel!
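To make this concrete: in FSDP (PyTorch's `FullyShardedDataParallel`), synchronization is done by collective operations among the GPUs that hold the shards. Parameters are all-gathered before computation, and gradients are reduce-scattered after the backward pass, so no extra coordinator node or CPU is needed for it. The toy sketch below simulates this in a single process. Assumptions: each list entry stands in for one GPU ("rank"), the ring pattern is just one algorithm a backend such as NCCL may use, and the names `ring_all_gather`, `ring_reduce_scatter` and `fsdp_step` are hypothetical, not PyTorch APIs.

```python
# Toy, single-process simulation of the two collectives FSDP relies on.
# Assumption: each list entry stands in for one GPU ("rank"); real FSDP runs
# these collectives across devices through a backend such as NCCL.
# None of these functions are PyTorch APIs; all names are hypothetical.
from typing import Callable, List, Optional

Vector = List[float]


def split(vec: Vector, n: int) -> List[Vector]:
    """Split a flat vector into n equal shards."""
    assert len(vec) % n == 0, "length must divide evenly across ranks"
    size = len(vec) // n
    return [vec[i * size:(i + 1) * size] for i in range(n)]


def ring_reduce_scatter(grads: List[Vector]) -> List[Vector]:
    """Rank r ends up with shard r of the element-wise sum of all ranks' grads.

    Each step, every rank sends one chunk to its right neighbour only, so no
    central node or CPU performs the reduction.
    """
    n = len(grads)
    chunks = [split(g, n) for g in grads]  # chunks[rank][shard]
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends in a step happen "at once".
        sends = [((r - step - 1) % n, list(chunks[r][(r - step - 1) % n]))
                 for r in range(n)]
        for r in range(n):
            shard, data = sends[(r - 1) % n]  # receive from left neighbour
            chunks[r][shard] = [a + b for a, b in zip(chunks[r][shard], data)]
    return [chunks[r][r] for r in range(n)]


def ring_all_gather(shards: List[Vector]) -> List[Vector]:
    """Every rank ends up with the full vector, built from all ranks' shards."""
    n = len(shards)
    held: List[List[Optional[Vector]]] = [[None] * n for _ in range(n)]
    for r in range(n):
        held[r][r] = list(shards[r])  # each rank starts with its own shard
    for step in range(n - 1):
        # Forward the shard received in the previous step to the right.
        sends = [((r - step) % n, held[r][(r - step) % n]) for r in range(n)]
        for r in range(n):
            idx, data = sends[(r - 1) % n]
            held[r][idx] = list(data)
    return [[x for shard in h for x in shard] for h in held]


def fsdp_step(param_shards: List[Vector],
              grad_fns: List[Callable[[Vector], Vector]],
              lr: float) -> List[Vector]:
    """One simplified data-parallel SGD step with sharded parameters."""
    n = len(param_shards)
    # 1. All-gather: every rank temporarily materialises the full parameters.
    full_params = ring_all_gather(param_shards)
    # 2. Each rank computes gradients on its own (here: implicit) data batch.
    grads = [fn(p) for fn, p in zip(grad_fns, full_params)]
    # 3. Reduce-scatter: each rank receives the summed gradient for its shard.
    grad_shards = ring_reduce_scatter(grads)
    # 4. Each rank updates only the shard it owns, using the mean gradient.
    return [[p - lr * g / n for p, g in zip(ps, gs)]
            for ps, gs in zip(param_shards, grad_shards)]


if __name__ == "__main__":
    shards = [[1.0, 2.0], [3.0, 4.0]]  # 2 "GPUs", 4 parameters total
    grad_fns = [lambda w: [x * 1.0 for x in w],  # rank 0's local gradient
                lambda w: [x * 3.0 for x in w]]  # rank 1's local gradient
    print(fsdp_step(shards, grad_fns, lr=0.25))  # [[0.5, 1.0], [1.5, 2.0]]
```

Outside step 1, each rank stores only its own shard, which is why the model no longer has to fit on one GPU. Real FSDP is more fine-grained than this sketch: it gathers parameters one wrapped unit at a time and frees them again after use.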

Related topics

Topic	Category (tags)	Replies	Views	Activity
I have a question about the content of the lecture~	Generative AI with Large Language Models (week-module-2)	3	409	September 21, 2023
Does model sharding fully utilize all GPUs?	AI Discussions	0	170	June 28, 2023
Combining data parallelism with model parallelism	Generative AI with Large Language Models (quiz-help, week-module-1)	2	303	July 30, 2025
2D and 3D parallelism	Generative AI with Large Language Models (week-module-1)	1	449	July 7, 2023
Weights update on multi GPU mirrored strategy	Custom and Distributed Training with TF (week-module-4)	3	594	October 18, 2022