DeepSeek released V3.2-exp, an experimental model with a new sparse attention system that cuts inference costs for long-context operations by up to 50 percent. The system employs a lightweight indexer to prioritize relevant excerpts and a token selection mechanism to choose which tokens to attend to, allowing the model to process long contexts with reduced server load. The open-weight model is available on Hugging Face under an MIT license, with an accompanying academic paper on GitHub, enabling third-party researchers to verify DeepSeek’s performance claims; it is also available via API at $0.28/$0.42 per million input/output tokens. This development addresses the growing challenge of inference costs, a critical bottleneck as AI applications scale. (DeepSeek)
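The indexer-plus-selection idea can be illustrated with a minimal sketch: a cheap scoring pass ranks every past token, and full attention then runs only over the top-scoring subset. This is an illustrative assumption, not DeepSeek's actual architecture; in particular, the dot-product indexer here stands in for whatever learned indexer the model uses, and `k_top` is a hypothetical budget parameter.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, K, V, k_top):
    """Attend from one query over only the k_top highest-scoring keys.

    A lightweight indexer scores every cached token; the expensive
    softmax attention then runs over just the selected subset, so cost
    scales with k_top rather than with the full context length.
    """
    # Indexer: cheap relevance score per token (a plain dot product here;
    # the real model's indexer is a separate learned module -- an assumption).
    scores = K @ q
    # Token selection: indices of the k_top most relevant positions.
    idx = np.argpartition(-scores, k_top - 1)[:k_top]
    # Standard scaled-dot-product attention, restricted to selected tokens.
    w = softmax(K[idx] @ q / np.sqrt(q.shape[0]))
    return w @ V[idx]

rng = np.random.default_rng(0)
d, n = 64, 4096                      # head dim, context length
q = rng.standard_normal(d)           # current query
K = rng.standard_normal((n, d))      # cached keys
V = rng.standard_normal((n, d))      # cached values
out = sparse_attention(q, K, V, k_top=256)  # attends to 256 of 4096 tokens
```

With a fixed selection budget, the attention step touches 256 key/value rows instead of 4,096, which is where the long-context savings would come from.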
