Cheapest LLM for vision
Vision tasks send images, which are billed as input tokens, and get back text, so input price usually dominates the bill. Only models that accept image input qualify. These are the cheapest generally-available vision-capable models, ranked by monthly cost on a typical workload.
Cheapest models for vision
Monthly cost for an image-understanding workload of ~20M input and ~5M output tokens a month. Sorted cheapest first.
| # | Model | Provider | Context | Input $/M | Output $/M | Monthly cost |
|---|---|---|---|---|---|---|
| 1 | Ministral 3 3B | Mistral | n/a | $0.04 | $0.04 | $1.00 |
| 2 | Ministral 3 8B | Mistral | 256K | $0.15 | $0.15 | $3.75 |
| 3 | Ministral 3 14B | Mistral | n/a | $0.20 | $0.20 | $5.00 |
| 4 | Mistral Small 4 | Mistral | 256K | $0.15 | $0.60 | $6.00 |
| 5 | Mistral Large 3 | Mistral | 256K | $0.50 | $1.50 | $17.50 |
| 6 | Grok Build 0.1 | xAI | 256K | $1.00 | $2.00 | $30.00 |
| 7 | Grok 4.3 | xAI | 1M | $1.25 | $2.50 | $37.50 |
| 8 | Grok 4.20 (0309) Reasoning | xAI | 1M | $1.25 | $2.50 | $37.50 |
| 9 | Grok 4.20 (0309) Non-Reasoning | xAI | 1M | $1.25 | $2.50 | $37.50 |
| 10 | Claude Haiku 4.5 | Anthropic | 200K | $1.00 | $5.00 | $45.00 |
| 11 | Mistral Medium 3.5 | Mistral | n/a | $1.50 | $7.50 | $67.50 |
| 12 | Claude Sonnet 4.6 | Anthropic | 1M | $3.00 | $15.00 | $135.00 |
Estimate only; excludes prompt caching, batch discounts and free tiers. Different volumes change the ranking, so run your own numbers. Prices verified against official docs · catalog updated 2026-06-28.
We include only models that accept image input (image understanding, not image generation), then rank them by cost on a 20M-in / 5M-out monthly mix. Image tokens land on the input side of the bill, so cheap input pricing is the biggest lever.
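To run your own numbers, the ranking can be reproduced with a few lines of Python. This is a minimal sketch rather than the site's actual method: the `Price` class and the `monthly_cost` and `rank` helpers are hypothetical names introduced here, the three rows are copied from the table above, and caching, batch discounts and free tiers are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical container for one table row (prices in USD per 1M tokens)."""
    model: str
    input_per_m: float
    output_per_m: float


# Three rows copied from the table above; extend with any others you need.
PRICES = [
    Price("Ministral 3 14B", 0.20, 0.20),
    Price("Mistral Small 4", 0.15, 0.60),
    Price("Claude Haiku 4.5", 1.00, 5.00),
]


def monthly_cost(price: Price, input_m: float, output_m: float) -> float:
    """Cost in USD for a month with the given volumes, in millions of tokens."""
    return price.input_per_m * input_m + price.output_per_m * output_m


def rank(prices: list[Price], input_m: float, output_m: float) -> list[tuple[str, float]]:
    """Return (model, cost) pairs sorted cheapest first."""
    costed = [(p.model, round(monthly_cost(p, input_m, output_m), 2)) for p in prices]
    return sorted(costed, key=lambda pair: pair[1])


if __name__ == "__main__":
    print(rank(PRICES, 20, 5))  # the page's 20M-in / 5M-out mix
    print(rank(PRICES, 20, 1))  # a more input-heavy mix
```

With the 20M-in / 5M-out mix, Ministral 3 14B comes out ahead of Mistral Small 4 ($5.00 vs. $6.00). At 20M in / 1M out the order flips ($3.60 vs. $4.20), because Small 4's cheaper input outweighs its pricier output.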
What is the cheapest LLM for vision?
Ministral 3 3B (Mistral) is the cheapest generally-available model we track for vision, at $0.04 per 1M input tokens and $0.04 per 1M output tokens. That comes to about $1.00/month for an image-understanding workload of ~20M input and ~5M output tokens. Ministral 3 8B is the next cheapest at $3.75/month.
How is "cheapest for vision" calculated?
We price a representative monthly workload (image understanding at ~20M input and ~5M output tokens) against every generally-available model, then rank by total cost. Only models that accept image input qualify. All prices are USD per 1M tokens, sourced from official provider documentation.
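As a worked instance of that calculation, take the Mistral Small 4 row of the table. The symbols are notation assumed here for illustration: $T_{\text{in}}$ and $T_{\text{out}}$ are monthly volumes in millions of tokens, and $p_{\text{in}}$ and $p_{\text{out}}$ are the USD prices per 1M tokens.

```latex
\[
\text{cost} = T_{\text{in}}\, p_{\text{in}} + T_{\text{out}}\, p_{\text{out}}
            = 20 \times 0.15 + 5 \times 0.60
            = 3.00 + 3.00
            = \$6.00 \text{ per month}
\]
```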
Is the cheapest model always the right choice for vision?
No. Price is one axis; quality, latency, rate limits and reliability matter too. Use this ranking to shortlist, then test the top candidates on your own vision workload before committing. Cost is easy to measure — fit is not.