# AI Model Watch

> Prices, context windows, lifecycle status, and deprecation dates for every major AI/LLM model, compiled from official provider documentation and refreshed daily.

Last updated: 2026-06-28. Tracking 181 models across 12 providers.
Cheapest generally available (GA) model by blended cost: Llama 3.1 8B Instruct (Meta) at $0.022 per 1M tokens.

## Key pages
- [All models](https://aimodelwatch.dev/): sortable table of every model with price and context
- [Providers](https://aimodelwatch.dev/providers): per-provider lineups, cheapest pick and lifecycle notes
- [Compare](https://aimodelwatch.dev/compare): head-to-head model comparisons
- [Deprecations](https://aimodelwatch.dev/deprecations): models retiring and their migration targets
- [Cost calculator](https://aimodelwatch.dev/calculator): monthly cost per model for your token volume
- [Cheapest-LLM guides](https://aimodelwatch.dev/guides): cheapest model for chatbots, RAG, coding, summarization, vision
- [Changelog](https://aimodelwatch.dev/changelog): launches, price changes and deprecations

## Cheapest LLM by use case (computed from official prices)
- Cheapest for chatbots: Llama 3.1 8B Instruct (Meta), ~$0.29/mo for a busy chatbot handling ~10M input and ~3M output tokens a month — https://aimodelwatch.dev/guides/cheapest-llm-for-chatbots
- Cheapest for RAG: Llama 3.1 8B Instruct (Meta), ~$1.15/mo for a RAG app stuffing ~50M input and ~5M output tokens a month — https://aimodelwatch.dev/guides/cheapest-llm-for-rag
- Cheapest for coding: Amazon Nova Lite (Amazon), ~$11.40/mo for a coding agent burning ~90M input and ~25M output tokens a month — https://aimodelwatch.dev/guides/cheapest-llm-for-coding
- Cheapest for summarization: Qwen-Flash (Alibaba), ~$5.60/mo for a summarizer reading ~80M input and writing ~4M output tokens a month — https://aimodelwatch.dev/guides/cheapest-llm-for-summarization
- Cheapest for vision: Ministral 3 3B (Mistral), ~$1.00/mo for an image-understanding workload of ~20M input and ~5M output tokens a month — https://aimodelwatch.dev/guides/cheapest-llm-for-vision
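
The monthly figures above follow from per-1M-token prices: input volume times input price, plus output volume times output price. A minimal sketch of that arithmetic, using the prices listed for these models in the GA section of this file; the function name `monthly_cost` and the `WORKLOADS` table are illustrative and not part of any AI Model Watch API:

```python
def monthly_cost(input_price, output_price, input_tokens_m, output_tokens_m):
    """Return the monthly cost in USD.

    Prices are USD per 1M tokens; volumes are millions of tokens per month.
    """
    return input_price * input_tokens_m + output_price * output_tokens_m


# (input price, output price, input volume in M, output volume in M)
# Prices and volumes are copied from this file.
WORKLOADS = {
    "chatbots / Llama 3.1 8B Instruct": (0.02, 0.03, 10, 3),
    "RAG / Llama 3.1 8B Instruct": (0.02, 0.03, 50, 5),
    "coding / Amazon Nova Lite": (0.06, 0.24, 90, 25),
    "summarization / Qwen-Flash": (0.05, 0.40, 80, 4),
    "vision / Ministral 3 3B": (0.04, 0.04, 20, 5),
}

for name, args in WORKLOADS.items():
    print(f"{name}: ${monthly_cost(*args):.2f}/mo")
```

The same function works for any model in this file once you substitute your own token volumes.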

## GA models (cheapest first, blended USD per 1M tokens)
- Llama 3.1 8B Instruct (Meta): $0.02/$0.03 in/out, 128K context, blended $0.022/1M — https://aimodelwatch.dev/models/llama-3-1-8b-instruct
- Ministral 3 3B (Mistral): $0.04/$0.04 in/out, context n/a, blended $0.04/1M — https://aimodelwatch.dev/models/ministral-3-3b
- Amazon Nova Micro (Amazon): $0.035/$0.14 in/out, 128K context, blended $0.061/1M — https://aimodelwatch.dev/models/amazon-nova-micro
- Command R7B (Cohere): $0.0375/$0.15 in/out, 128K context, blended $0.066/1M — https://aimodelwatch.dev/models/command-r7b-12-2024
- Amazon Nova Lite (Amazon): $0.06/$0.24 in/out, 300K context, blended $0.105/1M — https://aimodelwatch.dev/models/amazon-nova-lite
- Qwen-Flash (Alibaba): $0.05/$0.4 in/out, 1M context, blended $0.138/1M — https://aimodelwatch.dev/models/qwen-flash
- Llama 4 Scout (17B-16E Instruct) (Meta): $0.1/$0.3 in/out, 10M context, blended $0.15/1M — https://aimodelwatch.dev/models/llama-4-scout
- Ministral 3 8B (Mistral): $0.15/$0.15 in/out, 256K context, blended $0.15/1M — https://aimodelwatch.dev/models/ministral-3-8b
- Llama 3.3 70B Instruct (Meta): $0.1/$0.32 in/out, 128K context, blended $0.155/1M — https://aimodelwatch.dev/models/llama-3-3-70b-instruct
- Qwen3.5-Flash (Alibaba): $0.1/$0.4 in/out, 1M context, blended $0.175/1M — https://aimodelwatch.dev/models/qwen3-5-flash
- Ministral 3 14B (Mistral): $0.2/$0.2 in/out, context n/a, blended $0.2/1M — https://aimodelwatch.dev/models/ministral-3-14b
- Llama 4 Maverick (17B-128E Instruct) (Meta): $0.15/$0.6 in/out, 1M context, blended $0.262/1M — https://aimodelwatch.dev/models/llama-4-maverick
- Mistral Small 4 (Mistral): $0.15/$0.6 in/out, 256K context, blended $0.262/1M — https://aimodelwatch.dev/models/mistral-small-4
- Command R (08-2024) (Cohere): $0.15/$0.6 in/out, 128K context, blended $0.262/1M — https://aimodelwatch.dev/models/command-r-08-2024
- Codestral (v25.08) (Mistral): $0.3/$0.9 in/out, 128K context, blended $0.45/1M — https://aimodelwatch.dev/models/codestral
- GPT-5.4 nano (OpenAI): $0.2/$1.25 in/out, 400K context, blended $0.463/1M — https://aimodelwatch.dev/models/gpt-5-4-nano
- Gemini 3.1 Flash-Lite (Google): $0.25/$1.5 in/out, 1M context, blended $0.563/1M — https://aimodelwatch.dev/models/gemini-3-1-flash-lite
- Qwen3.6-Flash (Alibaba): $0.25/$1.5 in/out, 1M context, blended $0.563/1M — https://aimodelwatch.dev/models/qwen3-6-flash
- Qwen-Plus (Qwen3-series) (Alibaba): $0.4/$1.2 in/out, 1M context, blended $0.6/1M — https://aimodelwatch.dev/models/qwen-plus
- Moonshot v1 8K (Moonshot): $0.2/$2 in/out, 8K context, blended $0.65/1M — https://aimodelwatch.dev/models/moonshot-v1-8k
- Qwen3.7-Plus (Alibaba): $0.4/$1.6 in/out, 1M context, blended $0.7/1M — https://aimodelwatch.dev/models/qwen3-7-plus
- Qwen3.7-Plus (snapshot 2026-05-26) (Alibaba): $0.4/$1.6 in/out, 1M context, blended $0.7/1M — https://aimodelwatch.dev/models/qwen3-7-plus-2026-05-26
- Mistral Large 3 (Mistral): $0.5/$1.5 in/out, 256K context, blended $0.75/1M — https://aimodelwatch.dev/models/mistral-large-3
- Amazon Nova 2 Lite (Amazon): $0.3/$2.5 in/out, 1M context, blended $0.85/1M — https://aimodelwatch.dev/models/amazon-nova-2-lite
- Qwen3.5-Plus (Alibaba): $0.4/$2.4 in/out, 262K context, blended $0.9/1M — https://aimodelwatch.dev/models/qwen3-5-plus
- Sonar (Perplexity): $1/$1 in/out, 128K context, blended $1/1M — https://aimodelwatch.dev/models/perplexity-sonar
- Qwen3.6-Plus (Alibaba): $0.5/$3 in/out, 1M context, blended $1.13/1M — https://aimodelwatch.dev/models/qwen3-6-plus
- Kimi K2.5 (Moonshot): $0.6/$3 in/out, 262K context, blended $1.20/1M — https://aimodelwatch.dev/models/kimi-k2-5
- Grok Build 0.1 (xAI): $1/$2 in/out, 256K context, blended $1.25/1M — https://aimodelwatch.dev/models/grok-build-0-1
- Amazon Nova Pro (Amazon): $0.8/$3.2 in/out, 300K context, blended $1.40/1M — https://aimodelwatch.dev/models/amazon-nova-pro
- Moonshot v1 32K (Moonshot): $1/$3 in/out, 33K context, blended $1.50/1M — https://aimodelwatch.dev/models/moonshot-v1-32k
- Grok 4.3 (xAI): $1.25/$2.5 in/out, 1M context, blended $1.56/1M — https://aimodelwatch.dev/models/grok-4-3
- Grok 4.20 (0309) Reasoning (xAI): $1.25/$2.5 in/out, 1M context, blended $1.56/1M — https://aimodelwatch.dev/models/grok-4-20-0309-reasoning
- Grok 4.20 (0309) Non-Reasoning (xAI): $1.25/$2.5 in/out, 1M context, blended $1.56/1M — https://aimodelwatch.dev/models/grok-4-20-0309-non-reasoning
- GPT-5.4 mini (OpenAI): $0.75/$4.5 in/out, 400K context, blended $1.69/1M — https://aimodelwatch.dev/models/gpt-5-4-mini
- Kimi K2.7 Code (Moonshot): $0.95/$4 in/out, 262K context, blended $1.71/1M — https://aimodelwatch.dev/models/kimi-k2-7-code
- Kimi K2.6 (Moonshot): $0.95/$4 in/out, 262K context, blended $1.71/1M — https://aimodelwatch.dev/models/kimi-k2-6
- Claude Haiku 4.5 (Anthropic): $1/$5 in/out, 200K context, blended $2/1M — https://aimodelwatch.dev/models/claude-haiku-4-5
- GPT-4o mini Transcribe (OpenAI): $1.25/$5 in/out, 16K context, blended $2.19/1M — https://aimodelwatch.dev/models/gpt-4o-mini-transcribe
- Qwen3-Max (Alibaba): $1.2/$6 in/out, 262K context, blended $2.40/1M — https://aimodelwatch.dev/models/qwen3-max
- Moonshot v1 128K (Moonshot): $2/$5 in/out, 131K context, blended $2.75/1M — https://aimodelwatch.dev/models/moonshot-v1-128k
- Qwen-Max (Qwen2.5-Max) (Alibaba): $1.6/$6.4 in/out, 33K context, blended $2.80/1M — https://aimodelwatch.dev/models/qwen-max
- Mistral Medium 3.5 (Mistral): $1.5/$7.5 in/out, context n/a, blended $3/1M — https://aimodelwatch.dev/models/mistral-medium-3-5
- Gemini 3.5 Flash (Google): $1.5/$9 in/out, 1M context, blended $3.38/1M — https://aimodelwatch.dev/models/gemini-3-5-flash
- Kimi K2.7 Code HighSpeed (Moonshot): $1.9/$8 in/out, 262K context, blended $3.42/1M — https://aimodelwatch.dev/models/kimi-k2-7-code-highspeed
- o4-mini-deep-research (OpenAI): $2/$8 in/out, 200K context, blended $3.50/1M — https://aimodelwatch.dev/models/o4-mini-deep-research
- Sonar Reasoning Pro (Perplexity): $2/$8 in/out, 128K context, blended $3.50/1M — https://aimodelwatch.dev/models/perplexity-sonar-reasoning-pro
- Sonar Deep Research (Perplexity): $2/$8 in/out, 128K context, blended $3.50/1M — https://aimodelwatch.dev/models/perplexity-sonar-deep-research
- Qwen3.7-Max (Alibaba): $2.5/$7.5 in/out, 1M context, blended $3.75/1M — https://aimodelwatch.dev/models/qwen3-7-max
- GPT Image 1 mini (OpenAI): $2.5/$8 in/out, context n/a, blended $3.88/1M — https://aimodelwatch.dev/models/gpt-image-1-mini
- GPT-4o Transcribe (OpenAI): $2.5/$10 in/out, 16K context, blended $4.38/1M — https://aimodelwatch.dev/models/gpt-4o-transcribe
- Command A (Cohere): $2.5/$10 in/out, 256K context, blended $4.38/1M — https://aimodelwatch.dev/models/command-a-03-2025
- Command R+ (08-2024) (Cohere): $2.5/$10 in/out, 128K context, blended $4.38/1M — https://aimodelwatch.dev/models/command-r-plus-08-2024
- GPT-5.3-Codex (OpenAI): $1.75/$14 in/out, 400K context, blended $4.81/1M — https://aimodelwatch.dev/models/gpt-5-3-codex
- GPT-5.4 (OpenAI): $2.5/$15 in/out, 1.1M context, blended $5.63/1M — https://aimodelwatch.dev/models/gpt-5-4
- Claude Sonnet 4.6 (Anthropic): $3/$15 in/out, 1M context, blended $6/1M — https://aimodelwatch.dev/models/claude-sonnet-4-6
- Claude Sonnet 4.5 (Anthropic): $3/$15 in/out, 200K context, blended $6/1M — https://aimodelwatch.dev/models/claude-sonnet-4-5
- Sonar Pro (Perplexity): $3/$15 in/out, 200K context, blended $6/1M — https://aimodelwatch.dev/models/perplexity-sonar-pro
- GPT-Realtime-2 (OpenAI): $4/$24 in/out, 128K context, blended $9/1M — https://aimodelwatch.dev/models/gpt-realtime-2
- Claude Opus 4.8 (Anthropic): $5/$25 in/out, 1M context, blended $10/1M — https://aimodelwatch.dev/models/claude-opus-4-8
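
This file does not state how the blended figure is computed. The values listed above are consistent, to within rounding, with a 3:1 input-to-output weighting, so the sketch below assumes that ratio. Treat it as an inference from the numbers, not as the site's documented formula; `blended_price` is a hypothetical helper name.

```python
def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average of input and output prices (USD per 1M tokens).

    The default 3:1 weighting is an assumption inferred from the listed
    figures; it is not stated in the source file.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Spot checks against entries in the list above (listed blended value in comments).
print(blended_price(1.0, 5.0))    # Claude Haiku 4.5, listed as $2/1M
print(blended_price(0.2, 1.25))   # GPT-5.4 nano, listed as $0.463/1M
print(blended_price(0.05, 0.4))   # Qwen-Flash, listed as $0.138/1M
```

If your workload has a different input-to-output ratio, pass your own weights: an output-heavy workload shifts the ranking toward models with cheap output tokens.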

## Providers
- OpenAI: 29 models (16 GA), cheapest GPT-5.4 nano at $0.463/1M — https://aimodelwatch.dev/providers/openai
- Anthropic: 16 models (8 GA), cheapest Claude Haiku 4.5 at $2/1M — https://aimodelwatch.dev/providers/anthropic
- Google: 9 models (2 GA), cheapest Gemini 3.1 Flash-Lite at $0.563/1M — https://aimodelwatch.dev/providers/google
- xAI: 17 models (8 GA), cheapest Grok Build 0.1 at $1.25/1M — https://aimodelwatch.dev/providers/xai
- Mistral: 20 models (8 GA), cheapest Ministral 3 3B at $0.04/1M — https://aimodelwatch.dev/providers/mistral
- Alibaba: 18 models (13 GA), cheapest Qwen-Flash at $0.138/1M — https://aimodelwatch.dev/providers/alibaba
- Amazon: 6 models (4 GA), cheapest Amazon Nova Micro at $0.061/1M — https://aimodelwatch.dev/providers/amazon
- Moonshot: 17 models (7 GA), cheapest Moonshot v1 8K at $0.65/1M — https://aimodelwatch.dev/providers/moonshot
- Perplexity: 5 models (4 GA), cheapest Sonar at $1/1M — https://aimodelwatch.dev/providers/perplexity
- DeepSeek: 4 models (0 GA), pricing n/a — https://aimodelwatch.dev/providers/deepseek
- Meta: 12 models (11 GA), cheapest Llama 3.1 8B Instruct at $0.022/1M — https://aimodelwatch.dev/providers/meta
- Cohere: 28 models (18 GA), cheapest Command R7B at $0.066/1M — https://aimodelwatch.dev/providers/cohere
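
The provider bullets share a regular shape (`- Name: N models (M GA), ...`), so they can be parsed and checked against the header totals. A minimal sketch, assuming that line shape holds; the regular expression and the name `parse_provider_counts` are illustrative, and the sample lines are the bullets above shortened to the part that carries the counts:

```python
import re

# Matches the leading "- Name: N models (M GA)" part of a provider bullet.
PROVIDER_LINE = re.compile(
    r"^- (?P<name>[^:]+): (?P<total>\d+) models \((?P<ga>\d+) GA\)"
)


def parse_provider_counts(text):
    """Return {provider: (total_models, ga_models)} for matching lines."""
    counts = {}
    for line in text.splitlines():
        match = PROVIDER_LINE.match(line.strip())
        if match:
            counts[match["name"]] = (int(match["total"]), int(match["ga"]))
    return counts


PROVIDER_LINES = """\
- OpenAI: 29 models (16 GA)
- Anthropic: 16 models (8 GA)
- Google: 9 models (2 GA)
- xAI: 17 models (8 GA)
- Mistral: 20 models (8 GA)
- Alibaba: 18 models (13 GA)
- Amazon: 6 models (4 GA)
- Moonshot: 17 models (7 GA)
- Perplexity: 5 models (4 GA)
- DeepSeek: 4 models (0 GA)
- Meta: 12 models (11 GA)
- Cohere: 28 models (18 GA)
"""

counts = parse_provider_counts(PROVIDER_LINES)
total_models = sum(total for total, _ in counts.values())
print(f"{len(counts)} providers, {total_models} models")
```

The totals match the header line of this file (181 models across 12 providers).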

Data provided as-is; verify against official provider pages before relying on it.