✨ New course! Enroll in Efficient Inference with SGLang: Text and Image Generation

:arrow_forward: Enroll Now!

Introducing Efficient Inference with SGLang: Text and Image Generation, built in partnership with LMSys and RadixArk, and taught by Richard Chen, a Member of Technical Staff at RadixArk.

Running LLMs in production is expensive. Much of that cost comes from redundant computation: every new request forces the model to reprocess the same system prompt and shared context from scratch. SGLang is an open-source inference framework that eliminates that waste by caching computation that’s already been done and reusing it across future requests.

In this course, you’ll build a clear mental model of how inference works (from input tokens to generated output) and learn why the memory bottleneck exists. From there, you’ll implement the KV cache from scratch to store and reuse intermediate attention values within a single request. Then you’ll go further with RadixAttention, SGLang’s approach to sharing KV cache across requests by identifying common prefixes using a radix tree. Finally, you’ll apply these same optimization principles to image generation using diffusion models.
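To make the prefix-sharing idea concrete, here is a minimal illustrative sketch (not SGLang's actual implementation) of a radix-style trie keyed by token IDs. The `PrefixCache` class, its methods, and the string placeholders standing in for real KV tensors are all hypothetical names chosen for this example:

```python
# Illustrative sketch of cross-request prefix sharing with a trie over
# token IDs. Real systems like SGLang store attention KV tensors and use
# a compressed radix tree; strings here are simplified stand-ins.

class Node:
    def __init__(self):
        self.children = {}   # token ID -> child Node
        self.kv = None       # cached "KV" entry for this token position

class PrefixCache:
    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Count how many leading tokens already have cached KV entries."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched

    def insert(self, tokens):
        """Cache placeholder KV entries for tokens not already stored."""
        node = self.root
        for t in tokens:
            if t not in node.children:
                child = Node()
                child.kv = f"kv({t})"   # placeholder for a real KV tensor
                node.children[t] = child
            node = node.children[t]

cache = PrefixCache()
system_prompt = [1, 2, 3, 4]       # tokens of a shared system prompt
req_a = system_prompt + [10, 11]   # first request
req_b = system_prompt + [20]       # second request, same system prompt

cache.insert(req_a)
print(cache.match_prefix(req_b))   # 4: the shared prefix needn't be recomputed
```

A second request that starts with the same system prompt only needs fresh computation for the tokens past the matched prefix, which is where the cost savings described above come from.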

can you share the slides used in the videos?

Hello @mohamed_seyam ,

We do not offer slides for short courses at the moment. We will let learners know once this has changed.