Short Course Q&A Efficiently Serving LLMs
| Topic | Replies | Views | Activity |
|---|---|---|---|
| Lesson 3: functions init_batch, merge_batches, filter_batch used to implement continuous batching | 4 | 50 | June 19, 2024 |
| The gathered latency is not better than the loop latency in lesson 6 | 0 | 38 | June 14, 2024 |
| Position_ids | 0 | 100 | May 10, 2024 |
| Latency will only increase | 1 | 113 | May 9, 2024 |
| Multi-LoRA inference issue with gathered version | 3 | 159 | April 19, 2024 |
| Error for Lesson 7 - LoRAX | 1 | 185 | April 10, 2024 |
| Run the initial model after quantization function (Lecture 4) | 0 | 85 | April 4, 2024 |
| KV Caching for Instruction Tuned models | 1 | 172 | March 21, 2024 |
| Execution time | 1 | 116 | March 21, 2024 |
| Why inference using dequantized model? | 1 | 197 | March 19, 2024 |