Multi-LoRA inference issue with gathered version

It seems there is an issue affecting the gathered version for serving multiple LoRAs. It takes much more time than expected compared with the loop version.
[image: timing comparison chart]

I’ve tested it on my local machine and got the expected behavior, so it is likely something related to the environment.
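For context, here is a minimal sketch of the two strategies being compared. The function names (`loop_lora`, `gathered_lora`) and tensor shapes are illustrative assumptions, not the course's actual code: the loop version applies each request's adapter one at a time, while the gathered version indexes all adapters at once and uses a single batched matmul. Normally the gathered version should be at least as fast on a batch of mixed-adapter requests.

```python
# Hypothetical sketch of loop vs. gathered multi-LoRA application.
# Shapes and names are illustrative, not from the course notebook.
import torch

torch.manual_seed(0)

num_loras, rank, hidden, batch = 4, 8, 64, 16

# One low-rank adapter pair (A, B) per LoRA.
A = torch.randn(num_loras, hidden, rank)
B = torch.randn(num_loras, rank, hidden)

x = torch.randn(batch, hidden)                     # one row per request
lora_ids = torch.randint(0, num_loras, (batch,))   # adapter chosen per request

def loop_lora(x, lora_ids):
    # Apply each request's adapter sequentially in a Python loop.
    out = torch.empty_like(x)
    for i in range(x.shape[0]):
        a, b = A[lora_ids[i]], B[lora_ids[i]]
        out[i] = x[i] @ a @ b
    return out

def gathered_lora(x, lora_ids):
    # Gather every request's adapter, then apply them in one batched matmul.
    a = A[lora_ids]                                # (batch, hidden, rank)
    b = B[lora_ids]                                # (batch, rank, hidden)
    return torch.bmm(torch.bmm(x.unsqueeze(1), a), b).squeeze(1)

# Both strategies should produce the same result (up to float rounding).
assert torch.allclose(loop_lora(x, lora_ids), gathered_lora(x, lora_ids), atol=1e-3)
```

The symptom reported above is that the gathered approach is slower than the loop on the platform notebook, which is the opposite of the expected trade-off.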

Hello @Daniel_Casals

Can you share the results from your local Jupyter notebook?

Did you notice any differences in the code or module versions?

Please share screenshots so that others can understand the issue and respond.

Regards
DP

Hello @Deepti_Prasad

Maybe I did not explain it well: my local code ran fine and I got the expected result.
The issue I reported, along with the shared chart, is from the online DeepLearning.AI Jupyter notebook for the course.

You should be able to reproduce the issue in the embedded notebook at https://learn.deeplearning.ai/courses/efficiently-serving-llms/lesson/7/multi-lora-inference by running all cells.

I confirm I am having the same issue: multi-LoRA runs properly on my local laptop, but not on the DeepLearning.AI platform notebook.