Cheapest LLM for summarization

Summarization is almost pure input: you feed in long documents and get back a short digest, so input price and context-window size matter most. The models below all offer at least a 200K-token context window and are ranked cheapest first for that read-heavy load.

The cheapest pick: Qwen-Flash
$5.60/mo for a summarizer reading ~80M input and writing ~4M output tokens a month · $0.05 in / $0.40 out per 1M tokens · Alibaba
The ranking

Cheapest models for summarization

Monthly cost for a summarizer reading ~80M input and writing ~4M output tokens a month. Sorted cheapest first.

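| # | Model | Provider | Context | Input $/M | Output $/M | Monthly cost |
|---|-------|----------|---------|-----------|------------|--------------|
| 1 | Qwen-Flash | Alibaba | 1M | $0.05 | $0.40 | $5.60 |
| 2 | Amazon Nova Lite | Amazon | 300K | $0.06 | $0.24 | $5.76 |
| 3 | Llama 4 Scout (17B-16E Instruct) | Meta | 10M | $0.10 | $0.30 | $9.20 |
| 4 | Qwen3.5-Flash | Alibaba | 1M | $0.10 | $0.40 | $9.60 |
| 5 | Ministral 3 8B | Mistral | 256K | $0.15 | $0.15 | $12.60 |
| 6 | Llama 4 Maverick (17B-128E Instruct) | Meta | 1M | $0.15 | $0.60 | $14.40 |
| 7 | Mistral Small 4 | Mistral | 256K | $0.15 | $0.60 | $14.40 |
| 8 | GPT-5.4 nano | OpenAI | 400K | $0.20 | $1.25 | $21.00 |
| 9 | Gemini 3.1 Flash-Lite | Google | 1M | $0.25 | $1.50 | $26.00 |
| 10 | Qwen3.6-Flash | Alibaba | 1M | $0.25 | $1.50 | $26.00 |
| 11 | Amazon Nova 2 Lite | Amazon | 1M | $0.30 | $2.50 | $34.00 |
| 12 | Qwen-Plus (Qwen3-series) | Alibaba | 1M | $0.40 | $1.20 | $36.80 |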

Estimate only; excludes prompt caching, batch discounts and free tiers. Different volumes change the ranking, so run your own numbers. Prices verified against official docs · catalog updated 2026-06-28.
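
To see why the ranking depends on the mix, compare the top two rows. Qwen-Flash has the lower input price and Amazon Nova Lite the lower output price, so there is an input-to-output ratio at which the two cost the same. The sketch below uses the prices from the table; `monthly_cost` and `break_even_ratio` are hypothetical helper names chosen for illustration, not part of any tool behind this page.

```python
def monthly_cost(input_mtok, output_mtok, in_price, out_price):
    """USD per month; volumes in millions of tokens, prices in USD per 1M tokens."""
    return input_mtok * in_price + output_mtok * out_price


def break_even_ratio(a, b):
    """Input:output ratio at which models a and b cost the same.

    Each model is an (in_price, out_price) pair in USD per 1M tokens.
    Solves I*a_in + O*a_out == I*b_in + O*b_out for I/O.
    """
    a_in, a_out = a
    b_in, b_out = b
    if a_in == b_in:
        raise ValueError("equal input prices: no single break-even ratio")
    return (a_out - b_out) / (b_in - a_in)


# Prices copied from the table: (input $/M, output $/M).
QWEN_FLASH = (0.05, 0.40)
NOVA_LITE = (0.06, 0.24)

ratio = break_even_ratio(QWEN_FLASH, NOVA_LITE)
print(f"Break-even at roughly {ratio:.0f}:1 input:output")

# A less lopsided mix (80M in, 8M out, i.e. 10:1) flips the top two.
print(f"Qwen-Flash at 10:1: ${monthly_cost(80, 8, *QWEN_FLASH):.2f}")
print(f"Nova Lite at 10:1:  ${monthly_cost(80, 8, *NOVA_LITE):.2f}")
```

With these prices the break-even sits near 16:1. At the guide's 20:1 mix Qwen-Flash is cheaper, while a workload that writes proportionally more output (anything below about 16:1) would favor Amazon Nova Lite.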

Methodology

Summarization is the most lopsided workload — long source in, short summary out (~20:1). We rank an 80M-in / 4M-out monthly mix and require ≥200K context so whole documents fit in a single pass instead of being chunked.
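
The monthly figure for each model is simple arithmetic: input volume times input price, plus output volume times output price. A minimal Python sketch, assuming volumes are given in millions of tokens and prices in USD per 1M tokens as in the table; the function name `monthly_cost` is illustrative only.

```python
def monthly_cost(input_mtok, output_mtok, in_price, out_price):
    """Monthly USD cost; volumes in millions of tokens, prices in USD per 1M tokens."""
    return input_mtok * in_price + output_mtok * out_price


# The guide's reference mix: 80M tokens in, 4M tokens out (a 20:1 ratio).
qwen_flash = monthly_cost(80, 4, in_price=0.05, out_price=0.40)
nova_lite = monthly_cost(80, 4, in_price=0.06, out_price=0.24)

print(f"Qwen-Flash:       ${qwen_flash:.2f}")  # $5.60
print(f"Amazon Nova Lite: ${nova_lite:.2f}")   # $5.76
```

Swapping in your own token volumes is the quickest way to check whether the ordering holds for your workload.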

FAQ

What is the cheapest LLM for summarization?

Qwen-Flash (Alibaba) is the cheapest generally available model we track for summarization, at $0.05 per 1M input tokens and $0.40 per 1M output tokens. That works out to about $5.60/month for a summarizer reading ~80M input and writing ~4M output tokens a month. Amazon Nova Lite is the next cheapest at $5.76/month.

How is "cheapest for summarization" calculated?

We price a representative monthly workload (a summarizer reading ~80M input and writing ~4M output tokens) against every generally available model we track, then rank by total cost. Only models with at least a 200K-token context window are included. All prices are USD per 1M tokens, sourced from official provider documentation.
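
The procedure (filter by context window, price the workload, sort by total) might look like the following in Python. This is a sketch under assumptions: the `Model` record and `rank_models` function are hypothetical names, and the catalog holds only four rows copied from the table above rather than the full list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    context_tokens: int
    in_price: float   # USD per 1M input tokens
    out_price: float  # USD per 1M output tokens


def rank_models(models, input_mtok=80, output_mtok=4, min_context=200_000):
    """Return (name, monthly_cost) pairs, cheapest first, for models meeting the context floor."""
    eligible = [m for m in models if m.context_tokens >= min_context]
    priced = [
        (m.name, round(input_mtok * m.in_price + output_mtok * m.out_price, 2))
        for m in eligible
    ]
    return sorted(priced, key=lambda pair: pair[1])


# A few rows copied from the table (context sizes as listed there).
CATALOG = [
    Model("Ministral 3 8B", 256_000, 0.15, 0.15),
    Model("Qwen-Flash", 1_000_000, 0.05, 0.40),
    Model("Llama 4 Scout (17B-16E Instruct)", 10_000_000, 0.10, 0.30),
    Model("Amazon Nova Lite", 300_000, 0.06, 0.24),
]

for name, cost in rank_models(CATALOG):
    print(f"{name}: ${cost:.2f}")
```

Raising `min_context` to 1,000,000 would drop Amazon Nova Lite and Ministral 3 8B from this sample, which shows how the context floor shapes the shortlist before price is even considered.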

Is the cheapest model always the right choice for summarization?

No. Price is one axis; quality, latency, rate limits and reliability matter too. Use this ranking to shortlist, then test the top candidates on your own summarization workload before committing. Cost is easy to measure — fit is not.