Qwen3-Next employs hybrid attention for long-context inputs


Alibaba introduced Qwen3-Next-80B-A3B, a sparse mixture-of-experts model that activates only 3 billion of its 80 billion parameters during inference. The architecture interleaves Gated DeltaNet linear attention with standard attention layers in a 3:1 ratio, reaching performance comparable to dense 32-billion-parameter models while using less than 10 percent of their training compute. For contexts longer than 32,000 tokens, Qwen3-Next delivers more than 10 times faster inference and supports context lengths up to 256,000 tokens. The result suggests that sparse architectures with hybrid attention can match larger models' performance while drastically cutting computational costs. The models are available on Hugging Face and ModelScope, with API access through Alibaba Cloud and NVIDIA. (Qwen)
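The 3:1 interleaving of linear and standard attention layers can be illustrated with a minimal sketch. The `GatedLinearAttention` module below is a hypothetical, heavily simplified stand-in for Gated DeltaNet (a gated recurrent state built from rank-1 key-value updates in linear time), not Qwen3-Next's actual implementation; the module names, layer counts, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn


class FullAttention(nn.Module):
    """Standard softmax attention (single head, no masking, for brevity)."""

    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / (x.shape[-1] ** 0.5)
        return self.out(torch.softmax(scores, dim=-1) @ v)


class GatedLinearAttention(nn.Module):
    """Toy gated linear attention: a recurrent (dim x dim) state updated with
    gated rank-1 key-value outer products, so cost grows linearly with sequence
    length instead of quadratically. Illustrative only, not Gated DeltaNet."""

    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.gate = nn.Linear(dim, 1)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        g = torch.sigmoid(self.gate(x))        # per-token decay gate
        batch, seq, dim = x.shape
        state = x.new_zeros(batch, dim, dim)   # recurrent state
        outputs = []
        for t in range(seq):
            # Decay the state, then add the rank-1 update k_t^T v_t.
            state = g[:, t].unsqueeze(-1) * state + \
                k[:, t].unsqueeze(-1) * v[:, t].unsqueeze(-2)
            outputs.append(q[:, t].unsqueeze(-2) @ state)  # (batch, 1, dim)
        return self.out(torch.cat(outputs, dim=-2))


def build_hybrid_stack(dim, num_layers, ratio=3):
    """Interleave `ratio` linear-attention layers per one full-attention layer."""
    layers = []
    for i in range(num_layers):
        if (i + 1) % (ratio + 1) == 0:
            layers.append(FullAttention(dim))
        else:
            layers.append(GatedLinearAttention(dim))
    return nn.Sequential(*layers)


model = build_hybrid_stack(dim=64, num_layers=8)
x = torch.randn(2, 16, 64)
print(model(x).shape)  # torch.Size([2, 16, 64])
```

Because the linear-attention layers carry a fixed-size state rather than a growing attention matrix, they dominate the stack (three out of every four layers here), which is what makes the long-context speedup plausible while the occasional full-attention layer preserves global token-to-token interaction.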
