1 version from Alibaba Cloud, reviewed by the TopReviewed AI panel.
| Model | Score | Status | Released | Price in/out (per 1M tokens) | Context | |
|---|---|---|---|---|---|---|
| Qwen2.5-VL-72B-Instruct: best open-weight VLM for document AI | 7.9 | GA | 2025-01-26 | $0.70 / $0.70 | 131K | Review → |

Qwen2.5-VL-72B-Instruct is the largest open-weight vision-language model from Alibaba, shipped 2025-01-26. It accepts interleaved image, video, and text and produces text, with document understanding (tables, forms, charts, OCR) competitive with closed-source frontier VLMs and standout multilingual document parsing. The buyer's sentence: the default open-weight VLM for document AI and Asian-market visual workloads, at roughly 1/10th the per-token cost of GPT-4o Vision.

- Provider: Alibaba Cloud (Qwen Team)
- Released: 2025-01-26 (GA)
- Tier: VL (vision-language flagship)
- Context: 131,072 tokens (32K native + YaRN)
- Max output: 8,192 tokens
- Modalities: text + image + video in, text out
- Knowledge cutoff: approx. 2024-10
- Headline price: approx. $0.70 in / $0.70 out per 1M tokens (blended for vision-capable open weights)
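
To make the headline price concrete, here is a minimal cost sketch built only from the approximate figures listed above ($0.70 in / $0.70 out per 1M tokens, 131,072-token context, 8,192-token max output). The function name `estimate_cost` is hypothetical, and the check that input and output share one context window is an assumption, not something the listing states. Actual provider pricing varies, and image and video inputs are billed as tokens whose count depends on resolution and length.

```python
# Approximate figures taken from the spec list above (USD per 1M tokens).
PRICE_IN_PER_M = 0.70
PRICE_OUT_PER_M = 0.70
CONTEXT_LIMIT = 131_072   # tokens, 32K native extended with YaRN
MAX_OUTPUT = 8_192        # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT:
        raise ValueError("output exceeds the listed max output")
    # Assumption: input and output tokens share the same context window.
    if input_tokens + output_tokens > CONTEXT_LIMIT:
        raise ValueError("request exceeds the listed context window")
    total = input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # A long document (100K input tokens) with a 2K-token answer.
    print(f"${estimate_cost(100_000, 2_000):.4f}")
```

At these listed rates, even a request that fills most of the context window costs well under ten cents, which is the arithmetic behind the per-token cost comparison above.
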