# AI Model Watch

> Live prices, context windows, lifecycle status and deprecation dates for every major AI/LLM model. Compiled from official provider documentation and refreshed daily.

Last updated: 2026-06-28. Tracking 181 models across 12 providers. Cheapest GA model by blended cost: Llama 3.1 8B Instruct (Meta) at $0.022 per 1M tokens.

## Key pages

- [All models](https://aimodelwatch.dev/): sortable table of every model with price and context
- [Providers](https://aimodelwatch.dev/providers): per-provider lineups, cheapest pick and lifecycle notes
- [Compare](https://aimodelwatch.dev/compare): head-to-head model comparisons
- [Deprecations](https://aimodelwatch.dev/deprecations): models retiring and their migration targets
- [Cost calculator](https://aimodelwatch.dev/calculator): monthly cost per model for your token volume
- [Cheapest-LLM guides](https://aimodelwatch.dev/guides): cheapest model for chatbots, RAG, coding, summarization, vision
- [Changelog](https://aimodelwatch.dev/changelog): launches, price changes and deprecations

## Cheapest LLM by use case (computed from official prices)

- Cheapest for chatbots: Llama 3.1 8B Instruct (Meta), ~$0.29/mo for a busy chatbot handling ~10M input and ~3M output tokens a month — https://aimodelwatch.dev/guides/cheapest-llm-for-chatbots
- Cheapest for RAG: Llama 3.1 8B Instruct (Meta), ~$1.15/mo for a RAG app stuffing ~50M input and ~5M output tokens a month — https://aimodelwatch.dev/guides/cheapest-llm-for-rag
- Cheapest for coding: Amazon Nova Lite (Amazon), ~$11.40/mo for a coding agent burning ~90M input and ~25M output tokens a month — https://aimodelwatch.dev/guides/cheapest-llm-for-coding
- Cheapest for summarization: Qwen-Flash (Alibaba), ~$5.60/mo for a summarizer reading ~80M input and writing ~4M output tokens a month — https://aimodelwatch.dev/guides/cheapest-llm-for-summarization
- Cheapest for vision: Ministral 3 3B (Mistral), ~$1.00/mo for an image-understanding workload of ~20M input and ~5M output tokens a month — https://aimodelwatch.dev/guides/cheapest-llm-for-vision

## GA models (cheapest first, blended USD per 1M tokens)

- Llama 3.1 8B Instruct (Meta): $0.02/$0.03 in/out, 128K context, blended $0.022/1M — https://aimodelwatch.dev/models/llama-3-1-8b-instruct
- Ministral 3 3B (Mistral): $0.04/$0.04 in/out, — context, blended $0.04/1M — https://aimodelwatch.dev/models/ministral-3-3b
- Amazon Nova Micro (Amazon): $0.035/$0.14 in/out, 128K context, blended $0.061/1M — https://aimodelwatch.dev/models/amazon-nova-micro
- Command R7B (Cohere): $0.0375/$0.15 in/out, 128K context, blended $0.066/1M — https://aimodelwatch.dev/models/command-r7b-12-2024
- Amazon Nova Lite (Amazon): $0.06/$0.24 in/out, 300K context, blended $0.105/1M — https://aimodelwatch.dev/models/amazon-nova-lite
- Qwen-Flash (Alibaba): $0.05/$0.4 in/out, 1M context, blended $0.138/1M — https://aimodelwatch.dev/models/qwen-flash
- Llama 4 Scout (17B-16E Instruct) (Meta): $0.1/$0.3 in/out, 10M context, blended $0.15/1M — https://aimodelwatch.dev/models/llama-4-scout
- Ministral 3 8B (Mistral): $0.15/$0.15 in/out, 256K context, blended $0.15/1M — https://aimodelwatch.dev/models/ministral-3-8b
- Llama 3.3 70B Instruct (Meta): $0.1/$0.32 in/out, 128K context, blended $0.155/1M — https://aimodelwatch.dev/models/llama-3-3-70b-instruct
- Qwen3.5-Flash (Alibaba): $0.1/$0.4 in/out, 1M context, blended $0.175/1M — https://aimodelwatch.dev/models/qwen3-5-flash
- Ministral 3 14B (Mistral): $0.2/$0.2 in/out, — context, blended $0.2/1M — https://aimodelwatch.dev/models/ministral-3-14b
- Llama 4 Maverick (17B-128E Instruct) (Meta): $0.15/$0.6 in/out, 1M context, blended $0.262/1M — https://aimodelwatch.dev/models/llama-4-maverick
- Mistral Small 4 (Mistral): $0.15/$0.6 in/out, 256K context, blended $0.262/1M — https://aimodelwatch.dev/models/mistral-small-4
- Command R (08-2024) (Cohere): $0.15/$0.6 in/out, 128K context, blended $0.262/1M — https://aimodelwatch.dev/models/command-r-08-2024
- Codestral (v25.08) (Mistral): $0.3/$0.9 in/out, 128K context, blended $0.45/1M — https://aimodelwatch.dev/models/codestral
- GPT-5.4 nano (OpenAI): $0.2/$1.25 in/out, 400K context, blended $0.463/1M — https://aimodelwatch.dev/models/gpt-5-4-nano
- Gemini 3.1 Flash-Lite (Google): $0.25/$1.5 in/out, 1.0M context, blended $0.563/1M — https://aimodelwatch.dev/models/gemini-3-1-flash-lite
- Qwen3.6-Flash (Alibaba): $0.25/$1.5 in/out, 1M context, blended $0.563/1M — https://aimodelwatch.dev/models/qwen3-6-flash
- Qwen-Plus (Qwen3-series) (Alibaba): $0.4/$1.2 in/out, 1M context, blended $0.6/1M — https://aimodelwatch.dev/models/qwen-plus
- Moonshot v1 8K (Moonshot): $0.2/$2 in/out, 8K context, blended $0.65/1M — https://aimodelwatch.dev/models/moonshot-v1-8k
- Qwen3.7-Plus (Alibaba): $0.4/$1.6 in/out, 1M context, blended $0.7/1M — https://aimodelwatch.dev/models/qwen3-7-plus
- Qwen3.7-Plus (snapshot 2026-05-26) (Alibaba): $0.4/$1.6 in/out, 1M context, blended $0.7/1M — https://aimodelwatch.dev/models/qwen3-7-plus-2026-05-26
- Mistral Large 3 (Mistral): $0.5/$1.5 in/out, 256K context, blended $0.75/1M — https://aimodelwatch.dev/models/mistral-large-3
- Amazon Nova 2 Lite (Amazon): $0.3/$2.5 in/out, 1M context, blended $0.85/1M — https://aimodelwatch.dev/models/amazon-nova-2-lite
- Qwen3.5-Plus (Alibaba): $0.4/$2.4 in/out, 262K context, blended $0.9/1M — https://aimodelwatch.dev/models/qwen3-5-plus
- Sonar (Perplexity): $1/$1 in/out, 128K context, blended $1/1M — https://aimodelwatch.dev/models/perplexity-sonar
- Qwen3.6-Plus (Alibaba): $0.5/$3 in/out, 1M context, blended $1.13/1M — https://aimodelwatch.dev/models/qwen3-6-plus
- Kimi K2.5 (Moonshot): $0.6/$3 in/out, 262K context, blended $1.20/1M — https://aimodelwatch.dev/models/kimi-k2-5
- Grok Build 0.1 (xAI): $1/$2 in/out, 256K context, blended $1.25/1M — https://aimodelwatch.dev/models/grok-build-0-1
- Amazon Nova Pro (Amazon): $0.8/$3.2 in/out, 300K context, blended $1.40/1M — https://aimodelwatch.dev/models/amazon-nova-pro
- Moonshot v1 32K (Moonshot): $1/$3 in/out, 33K context, blended $1.50/1M — https://aimodelwatch.dev/models/moonshot-v1-32k
- Grok 4.3 (xAI): $1.25/$2.5 in/out, 1M context, blended $1.56/1M — https://aimodelwatch.dev/models/grok-4-3
- Grok 4.20 (0309) Reasoning (xAI): $1.25/$2.5 in/out, 1M context, blended $1.56/1M — https://aimodelwatch.dev/models/grok-4-20-0309-reasoning
- Grok 4.20 (0309) Non-Reasoning (xAI): $1.25/$2.5 in/out, 1M context, blended $1.56/1M — https://aimodelwatch.dev/models/grok-4-20-0309-non-reasoning
- GPT-5.4 mini (OpenAI): $0.75/$4.5 in/out, 400K context, blended $1.69/1M — https://aimodelwatch.dev/models/gpt-5-4-mini
- Kimi K2.7 Code (Moonshot): $0.95/$4 in/out, 262K context, blended $1.71/1M — https://aimodelwatch.dev/models/kimi-k2-7-code
- Kimi K2.6 (Moonshot): $0.95/$4 in/out, 262K context, blended $1.71/1M — https://aimodelwatch.dev/models/kimi-k2-6
- Claude Haiku 4.5 (Anthropic): $1/$5 in/out, 200K context, blended $2/1M — https://aimodelwatch.dev/models/claude-haiku-4-5
- GPT-4o mini Transcribe (OpenAI): $1.25/$5 in/out, 16K context, blended $2.19/1M — https://aimodelwatch.dev/models/gpt-4o-mini-transcribe
- Qwen3-Max (Alibaba): $1.2/$6 in/out, 262K context, blended $2.40/1M — https://aimodelwatch.dev/models/qwen3-max
- Moonshot v1 128K (Moonshot): $2/$5 in/out, 131K context, blended $2.75/1M — https://aimodelwatch.dev/models/moonshot-v1-128k
- Qwen-Max (Qwen2.5-Max) (Alibaba): $1.6/$6.4 in/out, 33K context, blended $2.80/1M — https://aimodelwatch.dev/models/qwen-max
- Mistral Medium 3.5 (Mistral): $1.5/$7.5 in/out, — context, blended $3/1M — https://aimodelwatch.dev/models/mistral-medium-3-5
- Gemini 3.5 Flash (Google): $1.5/$9 in/out, 1.0M context, blended $3.38/1M — https://aimodelwatch.dev/models/gemini-3-5-flash
- Kimi K2.7 Code HighSpeed (Moonshot): $1.9/$8 in/out, 262K context, blended $3.42/1M — https://aimodelwatch.dev/models/kimi-k2-7-code-highspeed
- o4-mini-deep-research (OpenAI): $2/$8 in/out, 200K context, blended $3.50/1M — https://aimodelwatch.dev/models/o4-mini-deep-research
- Sonar Reasoning Pro (Perplexity): $2/$8 in/out, 128K context, blended $3.50/1M — https://aimodelwatch.dev/models/perplexity-sonar-reasoning-pro
- Sonar Deep Research (Perplexity): $2/$8 in/out, 128K context, blended $3.50/1M — https://aimodelwatch.dev/models/perplexity-sonar-deep-research
- Qwen3.7-Max (Alibaba): $2.5/$7.5 in/out, 1M context, blended $3.75/1M — https://aimodelwatch.dev/models/qwen3-7-max
- GPT Image 1 mini (OpenAI): $2.5/$8 in/out, — context, blended $3.88/1M — https://aimodelwatch.dev/models/gpt-image-1-mini
- GPT-4o Transcribe (OpenAI): $2.5/$10 in/out, 16K context, blended $4.38/1M — https://aimodelwatch.dev/models/gpt-4o-transcribe
- Command A (Cohere): $2.5/$10 in/out, 256K context, blended $4.38/1M — https://aimodelwatch.dev/models/command-a-03-2025
- Command R+ (08-2024) (Cohere): $2.5/$10 in/out, 128K context, blended $4.38/1M — https://aimodelwatch.dev/models/command-r-plus-08-2024
- GPT-5.3-Codex (OpenAI): $1.75/$14 in/out, 400K context, blended $4.81/1M — https://aimodelwatch.dev/models/gpt-5-3-codex
- GPT-5.4 (OpenAI): $2.5/$15 in/out, 1.1M context, blended $5.63/1M — https://aimodelwatch.dev/models/gpt-5-4
- Claude Sonnet 4.6 (Anthropic): $3/$15 in/out, 1M context, blended $6/1M — https://aimodelwatch.dev/models/claude-sonnet-4-6
- Claude Sonnet 4.5 (Anthropic): $3/$15 in/out, 200K context, blended $6/1M — https://aimodelwatch.dev/models/claude-sonnet-4-5
- Sonar Pro (Perplexity): $3/$15 in/out, 200K context, blended $6/1M — https://aimodelwatch.dev/models/perplexity-sonar-pro
- GPT-Realtime-2 (OpenAI): $4/$24 in/out, 128K context, blended $9/1M — https://aimodelwatch.dev/models/gpt-realtime-2
- Claude Opus 4.8 (Anthropic): $5/$25 in/out, 1M context, blended $10/1M — https://aimodelwatch.dev/models/claude-opus-4-8

## Providers

- OpenAI: 29 models (16 GA), cheapest GPT-5.4 nano at $0.463/1M — https://aimodelwatch.dev/providers/openai
- Anthropic: 16 models (8 GA), cheapest Claude Haiku 4.5 at $2/1M — https://aimodelwatch.dev/providers/anthropic
- Google: 9 models (2 GA), cheapest Gemini 3.1 Flash-Lite at $0.563/1M — https://aimodelwatch.dev/providers/google
- xAI: 17 models (8 GA), cheapest Grok Build 0.1 at $1.25/1M — https://aimodelwatch.dev/providers/xai
- Mistral: 20 models (8 GA), cheapest Ministral 3 3B at $0.04/1M — https://aimodelwatch.dev/providers/mistral
- Alibaba: 18 models (13 GA), cheapest Qwen-Flash at $0.138/1M — https://aimodelwatch.dev/providers/alibaba
- Amazon: 6 models (4 GA), cheapest Amazon Nova Micro at $0.061/1M — https://aimodelwatch.dev/providers/amazon
- Moonshot: 17 models (7 GA), cheapest Moonshot v1 8K at $0.65/1M — https://aimodelwatch.dev/providers/moonshot
- Perplexity: 5 models (4 GA), cheapest Sonar at $1/1M — https://aimodelwatch.dev/providers/perplexity
- DeepSeek: 4 models (0 GA), pricing n/a — https://aimodelwatch.dev/providers/deepseek
- Meta: 12 models (11 GA), cheapest Llama 3.1 8B Instruct at $0.022/1M — https://aimodelwatch.dev/providers/meta
- Cohere: 28 models (18 GA), cheapest Command R7B at $0.066/1M — https://aimodelwatch.dev/providers/cohere

Data provided as-is; verify against official provider pages before relying on it.
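
## Reproducing the figures (illustrative sketch)

The monthly costs in the use-case list can be reproduced from the listed per-1M-token prices: multiply each price by the matching monthly volume in millions of tokens and add the two. The blended prices appear consistent, up to rounding, with a 3:1 input-to-output weighting. That ratio is inferred from the listed numbers and is not stated anywhere in this file, so treat it as an assumption. The function names below are hypothetical and are not part of any aimodelwatch.dev API.

```python
# Prices are USD per 1M tokens (input, output), copied from the GA list above.
PRICES = {
    "Llama 3.1 8B Instruct": (0.02, 0.03),
    "Amazon Nova Lite": (0.06, 0.24),
    "Qwen-Flash": (0.05, 0.40),
}


def monthly_cost(input_price, output_price, input_mtok, output_mtok):
    """Monthly USD cost for volumes given in millions of tokens."""
    return input_price * input_mtok + output_price * output_mtok


def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average price per 1M tokens.

    The default 3:1 weighting is an assumption inferred from the listed
    blended figures, not a documented formula.
    """
    total_weight = input_weight + output_weight
    weighted = input_price * input_weight + output_price * output_weight
    return weighted / total_weight


# Chatbot profile from the use-case list: ~10M input, ~3M output per month.
chatbot = monthly_cost(*PRICES["Llama 3.1 8B Instruct"], input_mtok=10, output_mtok=3)

# Coding-agent profile: ~90M input, ~25M output per month.
coding = monthly_cost(*PRICES["Amazon Nova Lite"], input_mtok=90, output_mtok=25)

# Blended price for Amazon Nova Lite under the assumed 3:1 weighting.
nova_lite_blended = blended_price(*PRICES["Amazon Nova Lite"])

print(f"chatbot: ${chatbot:.2f}/mo")
print(f"coding agent: ${coding:.2f}/mo")
print(f"Amazon Nova Lite blended: ${nova_lite_blended:.3f}/1M")
```

For a different workload, swap in your own token volumes. The weights in `blended_price` are parameters, so the average can be recomputed for a traffic mix that is heavier on output.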