# Short Course Q&A: Efficiently Serving LLMs
| Topic | Replies | Views | Activity |
|---|---|---|---|
| Lesson 3: functions `init_batch`, `merge_batches`, `filter_batch` used to implement continuous batching | 4 | 54 | June 19, 2024 |
| The gathered latency is not better than the loop latency in Lesson 6 | 0 | 41 | June 14, 2024 |
| Position_ids | 0 | 107 | May 10, 2024 |
| Latency will only increase | 1 | 116 | May 9, 2024 |
| Multi-LoRA inference issue with gathered version | 3 | 161 | April 19, 2024 |
| Error for Lesson 7 - LoRAX | 1 | 196 | April 10, 2024 |
| Run the initial model after quantization function (Lecture 4) | 0 | 88 | April 4, 2024 |
| KV Caching for Instruction Tuned models | 1 | 180 | March 21, 2024 |
| Execution time | 1 | 119 | March 21, 2024 |
| Why inference using dequantized model? | 1 | 222 | March 19, 2024 |