Codestral (25.08)

GA · Latest · Coder

by Mistral AI · Codestral family · best for low-latency code completion and FIM

Coding · Cost-Optimized

AI Panel Score: 7.6 · Value: 8.5/10

Codestral 25.08 (model ID codestral-2508, shipped at the end of July 2025) is Mistral's IDE code-completion specialist: a low-latency, high-frequency model built for tab-complete, fill-in-the-middle (FIM), code correction, and test generation across 80+ programming languages, exposed via a dedicated /v1/fim/completions endpoint. Important: it is a Premier (proprietary, closed-weight) model with a 128K context. It is not an Apache-2.0 open-weight model, and its context is not 256K; an earlier draft conflated it with the separately relicensed open "Codestral 2". Pricing is $0.30 input / $0.90 output per 1M tokens. The buyer's sentence: this is the right model to power an editor's autocomplete via API, not a self-hostable open coder.

What's new

128K context (4x the original Codestral's 32K), enabling broader repo-aware completion.
80+ programming languages supported.
Class-leading Fill-in-the-Middle (FIM) for IDE autocomplete via the /v1/fim/completions endpoint.
Shipped as part of the "Complete Mistral Coding Stack for Enterprise" (Codestral 25.08, Devstral, Codestral Embed, Mistral Code IDE extension).
LICENSE CLARIFICATION: Codestral 25.08 on La Plateforme is Premier/closed-weight. The original 2024 Codestral (22B) used the Mistral AI Non-Production License (MNPL); a separate model branded "Codestral 2" was relicensed to Apache 2.0 in April 2026. Codestral 25.08 itself is the proprietary API completion model.

Benchmarks

Benchmark	Score	Source
HumanEval	86.6%	mistral.ai (2025-07-31)

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker · 7.5/10

“The right brain for an in-product autocomplete feature via API — just don't confuse it with an agentic coder or expect to self-host it.”

Codestral 25.08 fills a specific slot: the low-latency completion engine behind an editor or product. For a SaaS that wants a "Copilot inside our product" experience, it is a clean API choice at $0.30/$0.90 with class-leading FIM. The strategic caveat is that it is Premier/closed — there is no self-host for this model, so the sovereignty story is limited to EU-hosted API. It is not a Devstral 2 / Medium 3.5 substitute; those are agents. Value-per-dollar is high for the right use case and low if you mistake it for a generalist coder. Pick it for autocomplete, route agentic work elsewhere.

Strategic Fit 8 · Vendor Risk 7 · Roadmap Confidence 7

Pros

best-in-class FIM
low cost
mature integrations

Cons

closed/no self-host
narrow scope

Right for: in-product autocomplete via API

Avoid if: you need self-host or agentic coding

Domain Strategist · 7.5/10

“It owns the IDE-completion slot in Mistral's coding stack, but its closed license caps the on-prem story that Devstral Small carries.”

Codestral anchors the completion layer of Mistral's enterprise coding stack (Codestral 25.08 + Devstral + Codestral Embed + Mistral Code). Strategically it competes with the model behind GitHub Copilot on FIM quality and with open coders on price. Its weakness as a strategic asset is the closed license: the self-host / on-prem completion story belongs to Devstral Small 2 (Apache 2.0) or the open Codestral 2, not codestral-2508. So as a differentiator it is "best-in-class FIM via API," a real but bounded position. Mature integrations (Continue.dev, Tabnine, Mistral Code) give it distribution.

Competitive Positioning 8 · Differentiation 8 · Market Timing 7

Pros

best FIM
stack anchor
broad integration

Cons

closed license caps on-prem story

Right for: completion-feature vendors

Avoid if: open/self-host is the requirement

Finance Lead · 8/10

“$0.30/$0.90 is cheap, and for a high-frequency completion workload the per-call economics are excellent — but there's no self-host lever here.”

At $0.30/$0.90 per 1M tokens (input/output), with a $0.03 cached-input rate and a ~50% batch discount, Codestral is inexpensive per call. That matters enormously for autocomplete, a workload of thousands of small calls per developer per day. The financial caveat, correcting an earlier assumption, is that there is no self-host capex lever for this specific model (it is closed), so it is a pure API-economics play. For the high-frequency completion use case the API cost is low enough to be a non-issue; for teams that wanted to amortise GPUs, the open path is Devstral Small 2. Excellent unit economics within its scope.

Cost Efficiency 8 · Pricing Transparency 8 · Value per Dollar 9

Pros

very low per-call cost
cache/batch discounts

Cons

no self-host option

Right for: high-frequency completion via API

Avoid if: you needed fixed-cost self-host

Domain Practitioner · 8/10

“This is the autocomplete I actually want — FIM quality is class-leading, latency is low, and the dedicated FIM endpoint is clean.”

As the autocomplete engine in an editor or product, Codestral does the job: FIM quality is genuinely class-leading, latency is low, and the /v1/fim/completions endpoint is purpose-built. 128K context means it sees enough repo to make sensible completions. I don't use it as a chat companion — that's Medium 3.5's job — but for tab-complete it's excellent. The constraint to internalise is that it's closed, so I can't fine-tune it on a private codebase the way I could with Devstral Small 2 (Apache 2.0). For pure completion ergonomics, a strong specialist.

API Ergonomics 8 · Tool/Agent Support 7 · Reliability 8

Pros

class-leading FIM
low latency
dedicated endpoint

Cons

can't fine-tune (closed)
text-only

Right for: editor/product autocomplete

Avoid if: you need private fine-tunes

Power User · 7.5/10

“I never see Codestral directly — I see fast, usually-correct completions in my editor, which is exactly the right standard for autocomplete.”

End users experience Codestral as autocomplete suggestions, not as a chatbot. Subjectively, completions feel snappy and are frequently correct for mainstream languages, but less reliable for newer languages or obscure frameworks. The UX is "felt only when wrong," which is the correct bar for autocomplete: a good completion model disappears into the workflow. It is good enough that I don't switch to alternatives. There is no conversational or vision dimension to evaluate; this is infrastructure that either helps invisibly or annoys when it misfires.

Output Quality 7.5 · Speed 8.5 · Everyday Usefulness 7.5

Pros

snappy, usually-correct completions

Cons

weaker on obscure languages
no chat

Right for: daily in-editor coding

Avoid if: you wanted a conversational coder

Skeptic · 7/10

“Solid FIM model — but the prior write-up claimed Apache 2.0 and 256K; it's actually closed and 128K. Verify the model, not the family.”

Codestral 25.08 is a genuinely good completion model, so the skepticism here is about provenance, not quality. The Codestral brand spans three things: the 2024 open-weight 22B (released under the MNPL), the 25.08 Premier API model, and a separate "Codestral 2" relicensed under Apache 2.0. They are easy to conflate; an earlier draft did, claiming 25.08 was Apache 2.0 and 256K. The verified reality: codestral-2508 is Premier/closed with a 128K context. The lesson is to check the exact model ID and its docs page, because the marketing umbrella blurs the licensing. On capability the FIM claim holds; on openness, it does not.

Claim Accuracy 7 · Weakness Severity 5 · Hype vs Reality 7

Pros

real FIM quality

Cons

brand confusion around license/context

Right for: buyers who verify the exact model

Avoid if: you assumed it was open from the family name

Strengths

Class-leading Fill-in-the-Middle quality — the right tool for IDE autocomplete.
Low latency and price tuned for thousands of completion calls per developer per day.
128K context fits multi-file repo windows.
Dedicated FIM endpoint and 80+ language coverage.
Mature, widely integrated (Mistral Code, Continue.dev, Tabnine).

Limitations

Premier/closed-weight — no self-host for this model (correcting a prior "Apache 2.0" error).
128K context, not 256K (correcting a prior error).
Not a chat or agent model — poor fit for conversational coding help.
Text-only — no screenshot-to-code workflows.
Outperformed by Devstral 2 / Medium 3.5 on agentic SWE-bench-style tasks (a different category).
Narrower natural-language multilingual breadth than the generalist Mistrals.

Best use cases

IDE / editor extensions for tab-complete and FIM (the core purpose).
Code-correction passes and lint-style suggestions at scale.
Automated test scaffolding for newly written functions.
High-volume code search / explanation where latency dominates.
In-product completion features served via API at low per-call cost.

Deep dive

The full research notes behind this review — verified against primary sources.

Architecture

Codestral 25.08 is a dense, code-specialised transformer. Because it is a Premier/closed-weight API model, Mistral does not disclose its parameter count, layer count, attention type, or training scale, so this review leaves those fields blank. What is verifiable: a 128K context, text-only input and output, and optimisation for low-latency FIM and completion rather than agentic loops or extended reasoning. It exposes both a chat-style completion path and the dedicated FIM endpoint (/v1/fim/completions). The original 2024 Codestral was a 22B open-weight model; the size of the 25.08 API model is not published.

Capabilities

Codestral 25.08 is built for low-latency, high-frequency coding tasks: tab-complete, FIM, code correction, and test generation (coding: 7.5). Where Devstral 2 and Medium 3.5 are agentic coders that plan multi-file edits and open PRs, Codestral is the editor-autocomplete brain: small, fast, focused. The 128K context lets it draw on broad repo context for completions (long context: 7.5). It supports 80+ languages including Python, Java, C/C++, JavaScript, TypeScript, Rust, Go, and Bash. It is not built for agent loops or extended reasoning (reasoning: 5.0; agentic: 5.0), is text-only with no vision (vision: 0.0), and is a poor fit for conversational or creative work (creative writing: 4.0). It has no native real-time retrieval (real-time data: 0.0). The cost/latency profile is tuned for thousands of small calls per developer per day.
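To make the FIM idea concrete, here is a minimal sketch of how an editor plugin might turn a buffer and cursor position into the prefix/suffix pair a fill-in-the-middle model expects. The function name, the character budget, and the 75/25 prefix/suffix split are illustrative assumptions, not anything Mistral documents; a real plugin would budget in tokens rather than characters.

```python
def split_for_fim(buffer: str, cursor: int,
                  max_chars: int = 8000,
                  prefix_share: float = 0.75) -> tuple[str, str]:
    """Split an editor buffer at the cursor into (prefix, suffix).

    When the buffer exceeds max_chars, keep the text closest to the
    cursor, since that context usually matters most for a completion.
    max_chars and prefix_share are hypothetical tuning knobs.
    """
    if not 0 <= cursor <= len(buffer):
        raise ValueError("cursor out of range")
    prefix_budget = int(max_chars * prefix_share)
    suffix_budget = max_chars - prefix_budget
    # Text before the cursor, trimmed from the far (top-of-file) end.
    prefix = buffer[max(0, cursor - prefix_budget):cursor]
    # Text after the cursor, trimmed from the far (end-of-file) end.
    suffix = buffer[cursor:cursor + suffix_budget]
    return prefix, suffix


if __name__ == "__main__":
    source = "def add(a, b):\n    return \n\nprint(add(1, 2))\n"
    cursor = source.index("return ") + len("return ")
    prefix, suffix = split_for_fim(source, cursor)
    print(repr(prefix))
    print(repr(suffix))
```

The model is then asked to generate only the text that belongs between the two halves, which is what distinguishes FIM from plain left-to-right completion.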

Benchmark analysis

Benchmark	Score	vs Predecessor	vs Top Competitor	Source
HumanEval pass@1	86.6%	about +2 pp	competitive with Copilot-class completion models	Mistral
FIM (open category)	class-leading	flat	best-in-class for completion	Mistral

Codestral's relevant benchmarks are completion-oriented (HumanEval, FIM quality) rather than the agentic SWE-bench suite; judging it on SWE-bench would be a category error. Coverage is partial: only the sourced completion metrics are recorded. General reasoning and math benchmarks are not the point of this model, so none are listed.

Speed & latency

Mistral does not publish official tokens-per-second (tps) or time-to-first-token (TTFT) figures, but Codestral is explicitly tuned for low latency: it is the autocomplete engine, where sub-second response is the requirement. This review places it in the fast latency tier. The economics and latency profile assume thousands of small completion calls per developer per day, which is why it is priced and engineered differently from the agentic coders.
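Since no official latency figures exist, teams evaluating the model will probably want to measure their own. The sketch below is one way to do that: it times any zero-argument callable and reports median, nearest-rank 95th percentile, and worst-case wall-clock latency. The function name and the stand-in workload are assumptions; in practice the callable would wrap a real completion request.

```python
import math
import statistics
import time
from typing import Callable


def measure_latency_ms(call: Callable[[], object], runs: int = 20) -> dict:
    """Time `call` repeatedly and summarise wall-clock latency in ms."""
    if runs < 2:
        raise ValueError("need at least 2 runs")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "p50": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a function that sends one
    # completion request and waits for the full response.
    stats = measure_latency_ms(lambda: time.sleep(0.005), runs=10)
    print({name: round(value, 1) for name, value in stats.items()})
```

For autocomplete the tail matters more than the mean, so the p95 figure is usually the one to compare against a sub-second budget.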

Pricing analysis

Surface	Cost	Notes
API input	$0.30 / 1M tok	La Plateforme
API output	$0.90 / 1M tok	La Plateforme
Cached input	$0.03 / 1M tok	cache read
Batch (in/out)	$0.15 / $0.45 per 1M tok	~50% async discount
FIM endpoint	included	`/v1/fim/completions`
Free tier	Codestral experimentation tier	daily quota
Cloud	Bedrock, Azure AI Foundry, Vertex AI	managed
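
The per-token prices above translate into a per-developer daily cost with simple arithmetic. The sketch below uses the listed input, output, and cached-input rates; the workload figures (calls per day, tokens per call, cache-hit fraction) are hypothetical and should be replaced with measured values.

```python
# USD per 1M tokens, taken from the pricing table.
PRICE_INPUT = 0.30
PRICE_OUTPUT = 0.90
PRICE_CACHED_INPUT = 0.03


def daily_cost_usd(calls: int, input_tokens: int, output_tokens: int,
                   cached_fraction: float = 0.0) -> float:
    """Estimated API cost per developer per day, in USD.

    input_tokens and output_tokens are per call; cached_fraction is the
    share of input tokens billed at the cached-input rate.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    per_call = (fresh * PRICE_INPUT
                + cached * PRICE_CACHED_INPUT
                + output_tokens * PRICE_OUTPUT) / 1_000_000
    return calls * per_call


if __name__ == "__main__":
    # Hypothetical workload: 2,000 calls/day, 1,500 input and
    # 40 output tokens per call.
    print(f"no cache:  ${daily_cost_usd(2000, 1500, 40):.2f}")
    print(f"50% cache: ${daily_cost_usd(2000, 1500, 40, 0.5):.2f}")
```

Under those assumed numbers the estimate is about $0.97 per developer per day without caching and about $0.57 with half the input served from cache, which illustrates why input tokens, not output, dominate autocomplete cost.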

Deployment & access

Codestral 25.08 is API-only and Premier/closed-weight — there is no Hugging Face weights download and no self-host for this specific model, contrary to an earlier draft that claimed an April 2026 Apache 2.0 relicensing. (That relicensing applies to a separate "Codestral 2" model, not codestral-2508.) It is available on La Plateforme (EU-hosted by default) and managed on Bedrock, Azure AI Foundry, and Vertex AI. For teams that need a self-hostable in-product completion model, the open paths are Devstral Small 2 (Apache 2.0) or the separate open Codestral 2 — not codestral-2508.

Safety & privacy

Standard Mistral posture: GDPR-native, SOC 2 Type II, ISO 27001/27701, EU AI Act aligned, EU residency by default, 30-day retention for abuse monitoring, no training on inputs unless the customer opts in, and zero data retention (ZDR) available. There is no built-in moderation. As a code-completion model it rarely encounters refusal-sensitive content, so the practical refusal rate is low.

Ecosystem & tooling

SDKs are available in Python and TypeScript/JavaScript. The model is integrated into the Mistral Code IDE extension, Continue.dev, and Tabnine, and the dedicated FIM endpoint makes it easy to wire into editor plugins. It is part of Mistral's enterprise coding stack alongside Devstral and Codestral Embed, and is mature and mainstream within the code-completion niche.
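
For plugins that call the endpoint directly rather than through an SDK, a request might be assembled roughly as follows. This is a sketch using only the Python standard library: the /v1/fim/completions path and the codestral-2508 model ID come from this review, while the api.mistral.ai host, the Bearer-token header, and the body fields (prompt, suffix, max_tokens, temperature) are assumptions that should be checked against Mistral's API reference before use.

```python
import json
import os
import urllib.request

# Path is from the review; the host is an assumption.
FIM_URL = "https://api.mistral.ai/v1/fim/completions"


def build_fim_request(prefix: str, suffix: str, api_key: str,
                      model: str = "codestral-2508",
                      max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a fill-in-the-middle request."""
    payload = {
        "model": model,
        "prompt": prefix,          # code before the cursor
        "suffix": suffix,          # code after the cursor
        "max_tokens": max_tokens,  # short completions keep latency low
        "temperature": 0.0,
    }
    return urllib.request.Request(
        FIM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_fim_request(
        prefix="def add(a, b):\n    return ",
        suffix="\n\nprint(add(1, 2))\n",
        api_key=os.environ.get("MISTRAL_API_KEY", "placeholder-key"),
    )
    # Sending needs a real key and network access:
    # with urllib.request.urlopen(request, timeout=10) as response:
    #     print(json.load(response))
    print(request.full_url)
    print(json.loads(request.data))
```

Building the request separately from sending it keeps the payload easy to inspect and unit-test without spending API credit.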

Buyer questions

Can I self-host Codestral 25.08?

No — this specific model is Premier/closed. For self-host code completion, use Devstral Small 2 (Apache 2.0) or the separate open Codestral 2.

Wasn't it relicensed to Apache 2.0?

No — that applies to a different "Codestral 2" model. codestral-2508 remains proprietary. Always check the model ID.

What's the context window?

128K (not 256K). Enough for multi-file completion context.

What is the FIM endpoint?

/v1/fim/completions — a dedicated fill-in-the-middle API for inserting code between a prefix and suffix, ideal for editor autocomplete.

Is it good for agentic coding?

No — it's a completion model. For planning multi-file edits and opening PRs, use Medium 3.5 or Devstral 2.

How much does autocomplete cost?

At $0.30/$0.90 with caching, thousands of small completion calls per developer per day stay inexpensive.

Which clouds host it?

Bedrock, Azure AI Foundry, and Vertex AI, plus La Plateforme.

Comparable models

Devstral 2 / Medium 3.5 (Mistral):

Bigger, agentic, multi-file edits and PRs — a different category from completion; higher cost per call.

Devstral Small 2 (Mistral):

The open-weight (Apache 2.0) coder for teams that need self-host/fine-tune at the small tier.

Copilot underlying model (GitHub):

Closed, comparable FIM quality, no self-host — the closest direct competitor for in-editor completion.

Qwen 3 Coder / DeepSeek Coder:

Open-weight coders; competitive but weaker FIM polish and EU-language code-comment quality.

Model specs

Input price: $0.30 / Mtok
Output price: $0.90 / Mtok
Cached input: $0.03 / Mtok
Batch (in/out): $0.15 / $0.45
Context window: 128K tokens
Max output: 16K tokens
Knowledge cutoff: 2025-05
Released: 2025-07-30
Modalities: text → text
Output speed: Not profiled
License: Proprietary
Clouds: Bedrock, Azure AI Foundry, Vertex AI

Does not train on API inputs by default

Last verified 2026-05-27