
Qwen/Qwen3-Coder-Next

Qwen3-Coder-Next is an open-weight MoE language model designed specifically for coding agents. With only 3B activated parameters out of 79.7B total, it matches the performance of models with 10-20x more active parameters. It combines a hybrid Gated Attention + Gated DeltaNet MoE architecture (512 experts, 10 active per token) with a native 262K context window, and scores 74.2% on SWE-Bench Verified, making it highly cost-effective for production agent deployment.
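The headline efficiency claim reduces to simple ratios; a quick sanity check in Python, using only the numbers quoted above:

```python
# Parameter counts from the model description above
total_params_b = 79.7   # total parameters, in billions
active_params_b = 3.0   # parameters activated per token, in billions
experts_total = 512
experts_active = 10

sparsity = total_params_b / active_params_b        # how sparse the MoE is
expert_fraction = experts_active / experts_total   # share of experts used per token

print(round(sparsity, 1))         # ~26.6x fewer params active than total
print(round(expert_fraction, 3))  # ~0.02 of the experts fire per token
```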

Alibaba Cloud · Code · 262K Tokens

api_example.sh

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer $QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Coder-Next",
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function to calculate the Fibonacci sequence"
      }
    ],
    "temperature": 1,
    "max_tokens": 8192,
    "stream": true,
    "top_p": 0.95
  }'
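Because `stream` is `true` in the example, the response arrives as server-sent events rather than one JSON body. A minimal Python sketch of parsing those chunks, assuming the OpenAI-compatible SSE format (`data: {...}` lines terminated by `data: [DONE]`; the exact field names are an assumption, not confirmed by this page):

```python
import json

def parse_sse_chunks(lines):
    """Extract content deltas from OpenAI-style SSE lines (assumed format)."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives / blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            parts.append(delta)
    return "".join(parts)

# Canned lines resembling a streamed completion:
sample = [
    'data: {"choices":[{"delta":{"content":"def fib"}}]}',
    'data: {"choices":[{"delta":{"content":"(n):"}}]}',
    'data: [DONE]',
]
print(parse_sse_chunks(sample))  # def fib(n):
```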

Technical Specifications

Model Architecture & Performance

Variant | Instruct
Model Size | 79.7B params (3B active)
Context Length | 262K Tokens
Quantization | FP8
Tokens/sec | 80
Architecture | Hybrid Gated Attention + Gated DeltaNet MoE Transformer, 512 experts / 10 active per token, 48 layers
Precision | FP8
License | Apache 2.0
Release Date | February 1, 2026
Developers | Alibaba Cloud (QwenLM)
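Two of the figures above combine into useful back-of-envelope numbers: how much of the 262K window remains for prompt and repository context once a completion budget is reserved, and how long a maximal completion takes at the quoted throughput. A rough sketch:

```python
context_window = 262_144   # native context length, tokens
completion_budget = 8_192  # matches max_tokens in the API example
tokens_per_sec = 80        # quoted throughput

prompt_budget = context_window - completion_budget
full_completion_secs = completion_budget / tokens_per_sec

print(prompt_budget)         # 253952 tokens left for prompt / repo context
print(full_completion_secs)  # 102.4 seconds for a maximal completion
```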

Pricing

Pay-per-use, no commitments

Input Tokens $0.30/1M Tokens
Output Tokens $1.50/1M Tokens
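At these rates, per-request cost is a one-liner. A small estimator using the prices from the table above (the example token counts are illustrative, not from this page):

```python
INPUT_PER_M = 0.30   # USD per 1M input tokens
OUTPUT_PER_M = 1.50  # USD per 1M output tokens

def cost_usd(input_tokens, output_tokens):
    """Estimated request cost at the listed pay-per-use rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a 200K-token repo prompt with an 8K-token completion:
print(round(cost_usd(200_000, 8_192), 4))  # 0.0723
```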

API Reference

Complete parameter documentation

Parameter | Type | Default | Description
stream | boolean | true | Enable streaming responses for real-time output.
temperature | number | 1 | Controls randomness in output.
max_tokens | number | 8192 | Maximum tokens to generate.
top_p | number | 0.95 | Controls nucleus sampling.

Performance

Strengths & considerations

Strengths:
- Only 3B active params from 79.7B total — performs like 30–60B models
- 74.2% on SWE-Bench Verified, 63.7% on SWE-Bench Multilingual
- Native 262K context length (262,144 tokens)
- Hybrid Gated Attention + Gated DeltaNet MoE, 512 experts / 10 active
- Advanced tool calling with complex function orchestration
- 10–20x parameter efficiency advantage for agent workloads

Considerations:
- Non-thinking mode only — no chain-of-thought reasoning blocks
- Not optimized for vision or multimodal tasks
- Best suited for agentic tasks; overkill for simple completions

Use cases

Recommended applications for this model

Agentic software development & long-horizon coding
Complex tool use & function orchestration
Execution failure recovery in dynamic workflows
Repository-scale navigation and bug fixing
Automated testing, refactoring & documentation
CI/CD pipeline integration for code generation

Enterprise
Platform Integration

Docker

Docker Support

Official Docker images for containerized deployments

Kubernetes

Kubernetes Ready

Production-grade Kubernetes manifests and Helm charts

SDK

SDK Libraries

Official SDKs for Python, JavaScript, Go, and Java

Don't let your AI control you. Control your AI the Qubrid way!

Have questions? Want to Partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.
