Open weights, so you pay an inference provider (Together $0.20/$0.60, DeepInfra ~$0.27 blended) or self-host on ~8x H100. No license fee.

Can I use it commercially?

Yes — Apache 2.0, no MAU threshold, full commercial and redistribution rights.

What about China data residency?

The official DashScope mainland endpoint routes through Alibaba Cloud in China; use the international endpoint, a US/EU-hosted provider, or self-host to avoid it.

Hosted is a drop-in OpenAI-compatible API. Self-hosting needs an 8x H100-class node and vLLM/SGLang — non-trivial but well-documented.

Yes, via an optional hybrid thinking mode with full visible CoT; toggle it per request.

How does it handle non-English?

Best-in-class among open weights for Chinese, Japanese, Korean, Arabic, and Indic languages across 119 supported languages.

Should I worry about content alignment?

On PRC-sensitive political topics, expect stricter refusals/deflection than Western models. For most workloads this is irrelevant; for political/news products, test it.

Qwen3-235B-A22B Review — Benchmarks, Pricing & AI Panel Verdict

Benchmark	Score	Source
MMLU-Pro	82.8%	Qwen3 Technical Report (arXiv 2505.09388)2025-05-14T00:00:00.000Z
AIME 2025	81.5%	Qwen3 Technical Report (arXiv 2505.09388)2025-05-14T00:00:00.000Z
LMArena Elo	1431	LMArena Text leaderboard (qwen3-235b-a22b-instruct-2507 checkpoint)2025-08-01T00:00:00.000Z
GPQA Diamond	70%	Qwen3 Technical Report (arXiv 2505.09388)2025-05-14T00:00:00.000Z
LiveCodeBench	70.7%	Qwen3 Technical Report (arXiv 2505.09388), LiveCodeBench v52025-05-14T00:00:00.000Z

Architecture

Qwen3-235B-A22B is a sparse MoE decoder. Total parameters are 235B; 22B activate per forward pass (8 of 128 experts routed per token). It uses 94 transformer layers, Grouped Query Attention, SwiGLU activations, Rotary Positional Embeddings with frequency scaling, and RMSNorm pre-normalization — the same modern stack as the Qwen3 dense models, scaled up and made sparse. Pre-training ran on roughly 36 trillion tokens spanning 119 languages and dialects, followed by a multi-stage post-training pipeline that fuses reasoning (long-CoT) and non-reasoning behavior into one model with a runtime switch. Native context is 131K. Alibaba disclosed these figures in the Qwen3 launch blog and Technical Report (arXiv 2505.09388), so unlike most frontier models the architecture is fully public.

Capabilities

The 22B active footprint keeps inference cost and latency closer to a 30B dense model than to a true 235B, while the sparse capacity delivers frontier-adjacent quality. Reasoning and math are the headline strengths (cap_reasoning 8.5, cap_math 8.7): MMLU-Pro 82.8, GPQA Diamond 70.0, AIME 2025 81.5 with thinking mode on. Coding is strong (cap_coding 8.0) — LiveCodeBench v5 70.7, best among open-weight at release. Tool-use, parallel function calls, and structured JSON output are natively trained (cap_function_calling 8.0, cap_agentic 7.5). The defining edge is multilingual coverage (cap_multilingual 9.0): Chinese, Japanese, Korean, Arabic, and major Indic-language performance materially exceeds DeepSeek and Llama peers — this is the open-weight model teams reach for when the workload mixes English and Asian languages. Long-context is honest to roughly 64-96K before degradation (cap_long_context 7.0). It has no vision or live-data access (cap_vision 0, cap_realtime_data 0). Safety calibration is mid (cap_safety_calibration 6.0): refusal behavior is Western-comparable except on PRC-sensitive political topics.

Benchmark analysis

Benchmark	Score	vs Predecessor	vs Top Competitor	Source
MMLU-Pro	82.8	+~12 vs Qwen2.5-72B	Near GPT-4o-class	Tech Report
GPQA Diamond	70.0	+~20 vs Qwen2.5-72B	Behind o1, ahead of DeepSeek-V3	Tech Report
AIME 2025	81.5	n/a (new)	Ahead of Gemini 2.5 Pro on AIME'25	Tech Report
LiveCodeBench v5	70.7	+~25 vs Qwen2.5-72B	Best open-weight at release	Tech Report
LMArena Elo	1431	n/a	Top-tier open-weight (2507 checkpoint)	LMArena

AIME 2024 was reported at 85.7 and Arena Hard at 95.6 in the Technical Report; CodeForces rating 2056, BFCL v3 70.8. The LMArena Elo of 1431 is for the refreshed 2507 checkpoint (Aug 2025), not the original April release. Scores reflect thinking mode on; non-thinking scores are lower. Never invented — keys without a verifiable score are null in the data layer.

Speed & latency

Median throughput around 68 tokens/sec with time-to-first-token near 0.78s on tuned inference stacks (llm-stats aggregate). In practice, latency varies sharply with the thinking toggle: non-thinking responses feel like a normal 30B-class model; thinking mode can run 5-30x longer because reasoning chains are long. For interactive product surfaces, gate thinking mode behind an explicit "reason" path. Batch throughput is strong on vLLM/SGLang with FP8.

Pricing analysis

Surface	Cost	Notes
API input (Together)	$0.20 / 1M tok	Most common Western provider
API output (Together)	$0.60 / 1M tok	No published cache discount
Fireworks (blended)	~$0.90 / 1M tok	Serverless flat-rate
DeepInfra	~$0.27 / 1M tok blended	Among cheapest mainstream
Alibaba Model Studio (DashScope)	Pay-as-you-go	First-party; intl endpoint available
Direct UI	Free at chat.qwen.ai	No SLA
Self-host (~8x H100)	~$15-30/hr	Breaks even vs API at ~2-3M tok/hr
Rate limits	provider-dependent	No fixed RPM/TPM on open weights

Deployment & access

Open weights on Hugging Face and ModelScope under Apache 2.0 — no MAU threshold, full commercial use, redistribution, and fine-tuning rights. FP16 serving needs roughly 470GB VRAM (8x H100 / H200 class); FP8 and AWQ quantizations bring the practical floor to roughly 240GB. GGUF exists for llama.cpp but the MoE is impractical on consumer hardware. Hosted by Together, Fireworks, DeepInfra, Hyperbolic, Novita, and OpenRouter; first-party via Alibaba Cloud Model Studio (DashScope), which offers an international endpoint outside mainland China. Data-residency note: the official DashScope mainland endpoint routes through Alibaba Cloud in China; the international endpoint and third-party providers (US/EU-hosted) avoid that, and self-hosting eliminates egress entirely.

Safety & privacy

No formal published safety framework or tier label. Alibaba does not train on third-party inference inputs when self-hosted; first-party API data handling follows Alibaba Cloud terms with opt-out. No SOC2/HIPAA certifications attach to the open weights themselves (a hosting provider may carry its own). No built-in content moderation layer. Refusal calibration is Western-comparable on general topics, with notably stricter refusal/deflection on PRC-sensitive political content — an honest enterprise consideration for US public-sector or politically sensitive consumer surfaces.

Ecosystem & tooling

SDKs via OpenAI-compatible clients (Python, TypeScript). Deep open-source support: vLLM, SGLang, Ollama, llama.cpp, MLX, Transformers, plus LangChain, LlamaIndex, Axolotl, and LLaMA-Factory for fine-tuning. Hosted by Together, Fireworks, DeepInfra, Hyperbolic, Novita, and OpenRouter; first-party via Alibaba Cloud Model Studio. Popularity is mainstream — Qwen3 is one of the most-downloaded open-weight families on Hugging Face and a frequent top-of-leaderboard open model.

Qwen3-235B-A22B

What's new

Benchmarks

AI Panel Review

Strengths

Limitations

Best use cases

Deep dive

Architecture

Capabilities

Benchmark analysis

Speed & latency

Pricing analysis

Deployment & access

Safety & privacy

Ecosystem & tooling

Buyer questions

Comparable models

Sources

Model specs

Other Qwen3 versions