Hi, I am trying to scale up my NLP ensemble model (RETVec and GloVe embeddings). The prediction itself takes 30 ms, but the overall response time goes up to 70 ms because of loading data from the host CPU to the GPU. Is there any solution for this? Also, is there any way to decrease the prediction time further? I tried TensorRT and ONNX, but they did not help.