DeepSeek released V3.2-exp, an experimental model with a new sparse attention system that cuts inference costs for long-context operations by up to 50 percent. The system employs a lightweight indexer to prioritize relevant excerpts and a token selection mechanism to choose which tokens to attend to, allowing the model to process long contexts with reduced server load. The open-weight model is available on Hugging Face under an MIT license, with an accompanying academic paper on GitHub, enabling third-party researchers to verify DeepSeek’s performance claims; it is also available via API at $0.28/$0.42 per million input/output tokens. This development addresses the growing challenge of inference costs, a critical bottleneck as AI applications scale. (DeepSeek)
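The indexer-plus-selection idea can be illustrated with a minimal sketch: a cheap scoring pass ranks every past token, and full attention then runs only over the top-scoring subset. This is an illustrative assumption, not DeepSeek's actual architecture; in particular, the dot-product indexer here stands in for whatever learned indexer the model uses, and `k_top` is a hypothetical budget parameter.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, K, V, k_top):
    """Attend from one query over only the k_top highest-scoring keys.

    A lightweight indexer scores every cached token; the expensive
    softmax attention then runs over just the selected subset, so cost
    scales with k_top rather than with the full context length.
    """
    # Indexer: cheap relevance score per token (a plain dot product here;
    # the real model's indexer is a separate learned module -- an assumption).
    scores = K @ q
    # Token selection: indices of the k_top most relevant positions.
    idx = np.argpartition(-scores, k_top - 1)[:k_top]
    # Standard scaled-dot-product attention, restricted to selected tokens.
    w = softmax(K[idx] @ q / np.sqrt(q.shape[0]))
    return w @ V[idx]

rng = np.random.default_rng(0)
d, n = 64, 4096                      # head dim, context length
q = rng.standard_normal(d)           # current query
K = rng.standard_normal((n, d))      # cached keys
V = rng.standard_normal((n, d))      # cached values
out = sparse_attention(q, K, V, k_top=256)  # attends to 256 of 4096 tokens
```

With a fixed selection budget, the attention step touches 256 key/value rows instead of 4,096, which is where the long-context savings would come from.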
