The gathered latency is not better than the loop latency in lesson 6

zoebei · June 14, 2024, 2:09am

I got the following result from the lesson 6 notebook. Why does latency increase, rather than improve, when torch.index_select is used to gather all the LoRA matrices needed for the whole batch in one go? There appears to be no benefit from parallel processing in this case.
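
For anyone who wants to reproduce the comparison outside the notebook, here is a minimal sketch of the two approaches. It is not the lesson's code: the function names, tensor shapes, and the `avg_latency` helper are assumptions made for illustration. One thing it makes visible is that `torch.index_select` returns a new tensor, so the gathered version copies one A and one B matrix per request before the batched matmul. With a small batch or small matrices, that copy may outweigh what is saved by dropping the Python loop, though whether that explains the notebook result depends on the sizes and hardware used.

```python
import time
import torch

def lora_loop(x, loras_a, loras_b, lora_indices):
    # x: (batch, hidden); loras_a: (n, hidden, rank); loras_b: (n, rank, hidden)
    # Apply each request's adapter one row at a time.
    y = torch.zeros_like(x)
    for i, idx in enumerate(lora_indices.tolist()):
        y[i] = x[i] @ loras_a[idx] @ loras_b[idx]
    return y

def lora_gathered(x, loras_a, loras_b, lora_indices):
    # index_select materializes a per-request copy of A and B:
    # a: (batch, hidden, rank), b: (batch, rank, hidden)
    a = torch.index_select(loras_a, 0, lora_indices)
    b = torch.index_select(loras_b, 0, lora_indices)
    # Batched matmul: (batch, 1, hidden) @ a @ b -> (batch, 1, hidden)
    return (x.unsqueeze(1) @ a @ b).squeeze(1)

def avg_latency(fn, *args, repeats=20):
    # Hypothetical timing helper. On a GPU, torch.cuda.synchronize() would be
    # needed before reading the clock; this sketch runs on CPU.
    fn(*args)  # warm-up call, not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats

# Illustrative sizes only, not the values used in the lesson.
torch.manual_seed(0)
batch, hidden, rank, num_loras = 8, 64, 4, 3
x = torch.randn(batch, hidden)
loras_a = torch.randn(num_loras, hidden, rank)
loras_b = torch.randn(num_loras, rank, hidden)
lora_indices = torch.randint(num_loras, (batch,))

loop_s = avg_latency(lora_loop, x, loras_a, loras_b, lora_indices)
gathered_s = avg_latency(lora_gathered, x, loras_a, loras_b, lora_indices)
print(f"loop: {loop_s * 1e6:.1f} us, gathered: {gathered_s * 1e6:.1f} us")
```

Both functions should return the same values, so any difference is purely in latency. Varying `batch`, `hidden`, and `num_loras` should show where, if anywhere, the gathered version starts to win on a given machine.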
