Question 1

What is the cheapest LLM for summarization?

Accepted Answer

Qwen-Flash (Alibaba) is the cheapest generally available model we track for summarization, at $0.05 per 1M input tokens and $0.40 per 1M output tokens. That works out to about $5.60/month for a summarizer reading ~80M input tokens and writing ~4M output tokens a month. Amazon Nova Lite is the next cheapest at $5.76/month.

Question 2

How is "cheapest for summarization" calculated?

Accepted Answer

We price a representative monthly workload (a summarizer reading ~80M input tokens and writing ~4M output tokens a month) against every generally available model, then rank by total monthly cost. Only models with a context window of at least 200K tokens are included. All prices are in USD per 1M tokens and are sourced from official provider documentation.
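The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the ranking: the names `monthly_cost`, `rank` and `MIN_CONTEXT` are hypothetical, and the three sample rows are copied from the table on this page.

```python
# Workload from the answer above: ~80M input and ~4M output tokens per month.
INPUT_TOKENS_M = 80
OUTPUT_TOKENS_M = 4
MIN_CONTEXT = 200_000  # context-window cutoff stated in the answer

# (model, context window in tokens, input $/1M, output $/1M)
# Sample rows taken from the ranking table; deliberately listed out of order.
MODELS = [
    ("Amazon Nova Lite", 300_000, 0.06, 0.24),
    ("Qwen-Flash", 1_000_000, 0.05, 0.40),
    ("Llama 4 Scout (17B-16E Instruct)", 10_000_000, 0.10, 0.30),
]


def monthly_cost(input_price, output_price,
                 input_m=INPUT_TOKENS_M, output_m=OUTPUT_TOKENS_M):
    """Monthly USD cost: token volume (in millions) times price per 1M tokens."""
    return round(input_m * input_price + output_m * output_price, 2)


def rank(models, min_context=MIN_CONTEXT):
    """Drop models below the context cutoff, then sort by monthly cost."""
    eligible = [m for m in models if m[1] >= min_context]
    costed = [(name, monthly_cost(inp, out)) for name, _, inp, out in eligible]
    return sorted(costed, key=lambda row: row[1])


for name, cost in rank(MODELS):
    print(f"{name}: ${cost:.2f}/month")
```

For Qwen-Flash this is 80 × $0.05 + 4 × $0.40 = $4.00 + $1.60 = $5.60, matching the figure quoted in the first answer.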

Question 3

Is the cheapest model always the right choice for summarization?

Accepted Answer

No. Price is one axis; quality, latency, rate limits and reliability matter too. Use this ranking to shortlist, then test the top candidates on your own summarization workload before committing. Cost is easy to measure — fit is not.
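Even on the price axis alone, the ranking depends on your input-to-output mix, so it may be worth re-pricing the shortlist with your own token volumes before testing quality. A rough sketch using the Qwen-Flash and Amazon Nova Lite prices from this page; the `cost` and `cheapest` helpers and the second workload (20M input, 4M output) are illustrative assumptions, not measurements.

```python
# (input $/1M, output $/1M), copied from the ranking table on this page.
PRICES = {
    "Qwen-Flash": (0.05, 0.40),
    "Amazon Nova Lite": (0.06, 0.24),
}


def cost(model, input_m, output_m):
    """Monthly USD cost for a workload given in millions of tokens."""
    inp, out = PRICES[model]
    return round(input_m * inp + output_m * out, 2)


def cheapest(input_m, output_m):
    """Name of the lowest-cost model for this workload."""
    return min(PRICES, key=lambda m: cost(m, input_m, output_m))


print(cheapest(80, 4))  # the page's workload: $5.60 vs $5.76
print(cheapest(20, 4))  # hypothetical output-heavier workload: $2.60 vs $2.16
```

With these two price points the order flips once input volume falls below roughly 16 times output volume, because Nova Lite's lower output price then outweighs its higher input price.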

#	Model	Provider	Context	Input $/1M	Output $/1M	Monthly cost
1	Qwen-Flash	Alibaba	1M	$0.05	$0.40	$5.60
2	Amazon Nova Lite	Amazon	300K	$0.06	$0.24	$5.76
3	Llama 4 Scout (17B-16E Instruct)	Meta	10M	$0.10	$0.30	$9.20
4	Qwen3.5-Flash	Alibaba	1M	$0.10	$0.40	$9.60
5	Ministral 3 8B	Mistral	256K	$0.15	$0.15	$12.60
6	Llama 4 Maverick (17B-128E Instruct)	Meta	1M	$0.15	$0.60	$14.40
7	Mistral Small 4	Mistral	256K	$0.15	$0.60	$14.40
8	GPT-5.4 nano	OpenAI	400K	$0.20	$1.25	$21.00
9	Gemini 3.1 Flash-Lite	Google	1M	$0.25	$1.50	$26.00
10	Qwen3.6-Flash	Alibaba	1M	$0.25	$1.50	$26.00
11	Amazon Nova 2 Lite	Amazon	1M	$0.30	$2.50	$34.00
12	Qwen-Plus (Qwen3-series)	Alibaba	1M	$0.40	$1.20	$36.80

Cheapest LLM for summarization
