Short Course Q&A: Efficiently Serving LLMs
| Topic | Replies | Views | Activity |
|---|---|---|---|
| Lesson 3: functions init_batch, merge_batches, filter_batch used to implement continuous batching | 4 | 59 | June 19, 2024 |
| The gathered latency is not better than the loop latency in Lesson 6 | 0 | 41 | June 14, 2024 |
| Position_ids | 0 | 113 | May 10, 2024 |
| Latency will only increase | 1 | 119 | May 9, 2024 |
| Multi-LoRA inference issue with gathered version | 3 | 166 | April 19, 2024 |
| Error for Lesson 7 - LoRAX | 1 | 203 | April 10, 2024 |
| Run the initial model after quantization function (Lecture 4) | 0 | 90 | April 4, 2024 |
| KV Caching for Instruction Tuned models | 1 | 189 | March 21, 2024 |
| Execution time | 1 | 121 | March 21, 2024 |
| Why inference using dequantized model? | 1 | 241 | March 19, 2024 |