DeepSeek-V4-Flash
Preview
Smaller, cheaper V4 model (~284B total / ~13B active parameters, per authoritative third-party reports). Context length: 1M tokens; max output: 384K tokens. Input price: $0.14/M tokens on cache miss, $0.0028/M tokens on cache hit; output price: $0.28/M tokens (USD). Supports dual modes (Thinking / Non-Thinking), JSON output, tool calls, and chat prefix completion; FIM completion is available in non-thinking mode only. Concurrency limit: 2500. The legacy aliases deepseek-chat and deepseek-reasoner currently route to this model (non-thinking and thinking mode, respectively). Part of the 'DeepSeek V4 Preview' generation (released 2026-04-24), hence status=preview. Knowle
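The listed prices can be turned into a rough per-request cost estimate. The following is a minimal sketch, not an official billing formula: it assumes cost is a simple linear sum of cache-miss input, cache-hit input, and output tokens at the per-million rates quoted above. The function name `estimate_cost_usd` and its parameters are hypothetical, introduced only for illustration.

```python
# Prices in USD per million tokens, copied from the listing above.
INPUT_CACHE_MISS_PER_M = 0.14
INPUT_CACHE_HIT_PER_M = 0.0028
OUTPUT_PER_M = 0.28


def estimate_cost_usd(cache_miss_tokens: int,
                      cache_hit_tokens: int,
                      output_tokens: int) -> float:
    """Hypothetical helper: estimate the USD cost of one request.

    Assumes billing is linear in token counts (an assumption, not a
    documented billing rule).
    """
    total = (
        cache_miss_tokens * INPUT_CACHE_MISS_PER_M
        + cache_hit_tokens * INPUT_CACHE_HIT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 100K uncached input tokens, 900K cached input tokens,
    # and 2K output tokens.
    cost = estimate_cost_usd(100_000, 900_000, 2_000)
    print(f"{cost:.5f}")  # prints 0.01708
```

Under these assumptions, the cache-hit rate dominates the input cost of long-context requests: the cached 900K tokens in the example cost less than a fifth of the uncached 100K tokens.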