DeepSeek V4 Flash

DeepSeek · Text Generation
POST /v1/chat/completions

Lightweight mixture-of-experts (MoE) model with 284B total and 13B active parameters and a native 1M context window, tuned for low-latency, cost-effective, high-concurrency use.

At a glance

| Field | Value |
| --- | --- |
| Model id | deepseek-v4-flash |
| Input modalities | Text |
| Output modalities | Text |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 393,216 |
| Region | Germany (Frankfurt) |
| Features | reasoning |
| Native inference | No |
| New | Yes |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | $0.14 |
| Output | per 1M generated tokens | $0.28 |
| Web Search (Linkup) | per call when invoked | $0.013 |
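
As a rough illustration of how the rates above combine, the sketch below estimates the cost of one request to the base model. The helper name `estimate_cost` is hypothetical, and the sketch assumes the web search fee is added once per request in which search is invoked.

```python
# Rates copied from the pricing table above (USD).
INPUT_RATE_PER_M = 0.14    # per 1M prompt tokens
OUTPUT_RATE_PER_M = 0.28   # per 1M generated tokens
WEB_SEARCH_FEE = 0.013     # flat fee per call when Linkup search is invoked


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  web_search: bool = False) -> float:
    """Estimate the USD cost of one request (hypothetical helper, not an API call)."""
    cost = prompt_tokens / 1_000_000 * INPUT_RATE_PER_M
    cost += completion_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    if web_search:
        cost += WEB_SEARCH_FEE  # assumption: charged once per request
    return cost


if __name__ == "__main__":
    # 20,000 prompt tokens and 2,000 generated tokens, with web search invoked.
    print(f"${estimate_cost(20_000, 2_000, web_search=True):.6f}")
```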

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "deepseek-v4-flash", "messages": [{"role":"user","content":"Hello"}]}'

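The same request can be issued from Python using only the standard library. This is a minimal sketch: the helper names `build_chat_request` and `send` are hypothetical, and the response is returned as decoded JSON without assuming any particular response fields.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = {
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON body unchanged."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


if __name__ == "__main__":
    key = os.environ.get("EMPIRIOLABS_API_KEY")
    if key:  # only touch the network when a key is configured
        print(json.dumps(send(build_chat_request(key, "Hello")), indent=2))
```

Separating request construction from sending keeps the payload easy to inspect or log before any tokens are billed.
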
Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| temperature | number | no | 0.7 | Sampling temperature. 0 = deterministic, 2 = maximum randomness. Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower = more focused. Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. Range: 1 – 393216 |
| stop | string | no | - | Up to 4 strings where the model will stop generating further tokens. |
| enable_thinking | boolean | no | true | Enable step-by-step reasoning before answering. |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for the reasoning process. Range: 1 – 393216 |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field, translated into enable_thinking and thinking_budget for the model service. Allowed: none, low, medium, high, max |
| web_search_linkup | boolean | no | false | Optional web search powered by Linkup. When enabled, recent web sources are retrieved using your latest user message as the query and provided to the model as additional context. Adds a flat $0.013 per request on top of the model’s normal token cost. Disabled by default. |
| disable_formatting | boolean | no | false | When enabled, the gateway will not append the “Sources” footer to assistant responses that used Linkup web search. Useful when the model output is piped to another system that expects no decoration. |
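
The documented ranges and defaults can be checked client-side before a request is sent. The sketch below assumes these parameters are passed as top-level fields of the JSON request body alongside model and messages, which the table does not state explicitly. `build_payload` is a hypothetical helper, not part of the API.

```python
MAX_OUTPUT_TOKENS = 393_216  # documented upper bound for max_tokens
EFFORT_LEVELS = {"none", "low", "medium", "high", "max"}


def build_payload(prompt, *, temperature=0.7, top_p=0.9, max_tokens=4096,
                  reasoning_effort="medium", web_search_linkup=False):
    """Validate values against the documented ranges and return a request body."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    if reasoning_effort not in EFFORT_LEVELS:
        raise ValueError(f"reasoning_effort must be one of {sorted(EFFORT_LEVELS)}")

    payload = {
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        # Assumption: sent as a top-level field, as in OpenAI-style requests.
        "reasoning_effort": reasoning_effort,
    }
    if web_search_linkup:
        payload["web_search_linkup"] = True  # adds the flat per-request fee
    return payload


if __name__ == "__main__":
    # Disable thinking for a short, low-latency reply.
    print(build_payload("Hello", reasoning_effort="none", max_tokens=256))
```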

Variants

:variant1

| Field | Value |
| --- | --- |
| Model id | deepseek-v4-flash:variant1 |
| Region | Singapore |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 393,216 |
| Features | reasoning, web_search |
| Native inference | No |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | $0.20 |
| Output | per 1M generated tokens | $0.40 |
| Web search | per request when enabled | $0.02 |

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| temperature | number | no | 0.7 | Sampling temperature. 0 = deterministic, 2 = maximum randomness. Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower = more focused. Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. Range: 1 – 393216 |
| stop | string | no | - | Up to 4 strings where the model will stop generating further tokens. |
| enable_thinking | boolean | no | true | Enable step-by-step reasoning before answering. |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for the reasoning process. Range: 1 – 393216 |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field, translated into enable_thinking and thinking_budget for the model service. Allowed: none, low, medium, high, max |
| tool_web_search | boolean | no | false | Enable live web search. Adds a $0.02 surcharge to the request cost when enabled. |

:variant2

| Field | Value |
| --- | --- |
| Model id | deepseek-v4-flash:variant2 |
| Region | China |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 384,000 |
| Features | reasoning, function_calling, web_search, cache |
| Native inference | No |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | $0.138 (was $0.14) |
| Output | per 1M generated tokens | $0.275 (was $0.28) |
| Implicit cache read | per 1M cached input tokens | $0.028 |
| Web search | per request when enabled | $0.01 |
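
To see how the cache-read rate affects a bill, the sketch below estimates the cost of one :variant2 request. It assumes that cached prompt tokens are billed at the cache-read rate instead of the input rate, which the table implies but does not spell out. `estimate_variant2_cost` is a hypothetical helper.

```python
# Rates for deepseek-v4-flash:variant2, copied from the table above (USD).
V2_INPUT_RATE_PER_M = 0.138
V2_OUTPUT_RATE_PER_M = 0.275
V2_CACHE_READ_RATE_PER_M = 0.028
V2_WEB_SEARCH_FEE = 0.01


def estimate_variant2_cost(prompt_tokens: int, cached_tokens: int,
                           completion_tokens: int, web_search: bool = False) -> float:
    """Estimate the USD cost of one request, splitting the prompt into cached and uncached tokens."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    cost = (
        uncached_tokens * V2_INPUT_RATE_PER_M
        + cached_tokens * V2_CACHE_READ_RATE_PER_M  # assumption: replaces the input rate
        + completion_tokens * V2_OUTPUT_RATE_PER_M
    ) / 1_000_000
    if web_search:
        cost += V2_WEB_SEARCH_FEE
    return cost


if __name__ == "__main__":
    # A 100,000-token prompt with and without a 90% cache hit.
    print(estimate_variant2_cost(100_000, 0, 1_000))
    print(estimate_variant2_cost(100_000, 90_000, 1_000))
```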

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| temperature | number | no | 0.7 | Sampling temperature. 0 = deterministic, 2 = maximum randomness. Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower = more focused. Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. Range: 1 – 393216 |
| stop | string | no | - | Up to 4 strings where the model will stop generating further tokens. |
| enable_thinking | boolean | no | true | Enable step-by-step reasoning before answering. |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for the reasoning process. Range: 1 – 393216 |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field, translated into enable_thinking and thinking_budget for the model service. Allowed: none, low, medium, high, max |
| tool_web_search | boolean | no | false | Enable live web search. Adds $0.01 to the request cost when enabled. |
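
The web search switch is named differently across the model ids on this page: the base model uses web_search_linkup, while :variant1 and :variant2 use tool_web_search. A small lookup keeps that detail out of calling code. The helper `with_web_search` is hypothetical, and the sketch assumes the flag is a top-level field of the request body.

```python
# Flag names and per-request fees (USD) as listed in the parameter and pricing tables.
WEB_SEARCH_FLAG = {
    "deepseek-v4-flash": "web_search_linkup",
    "deepseek-v4-flash:variant1": "tool_web_search",
    "deepseek-v4-flash:variant2": "tool_web_search",
}
WEB_SEARCH_FEE_USD = {
    "deepseek-v4-flash": 0.013,
    "deepseek-v4-flash:variant1": 0.02,
    "deepseek-v4-flash:variant2": 0.01,
}


def with_web_search(payload: dict) -> dict:
    """Return a copy of payload with the web search flag for its model switched on."""
    model = payload["model"]
    if model not in WEB_SEARCH_FLAG:
        raise ValueError(f"unknown model id: {model}")
    return {**payload, WEB_SEARCH_FLAG[model]: True}


if __name__ == "__main__":
    base = {"model": "deepseek-v4-flash:variant2",
            "messages": [{"role": "user", "content": "Hello"}]}
    print(with_web_search(base))
```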

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/deepseek-v4-flash.
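
A minimal sketch of retrieving that schema from Python follows. It assumes the endpoint returns JSON. Whether the endpoint requires the Authorization header is not stated here, so the key is optional in this sketch. The helper names are hypothetical.

```python
import json
import urllib.request

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/deepseek-v4-flash"


def build_schema_request(api_key=None) -> urllib.request.Request:
    """Build the GET request for the machine-readable schema."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: the same bearer scheme as the chat endpoint, if auth is needed.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(SCHEMA_URL, headers=headers, method="GET")


def fetch_schema(api_key=None) -> dict:
    """Send the request and return the decoded JSON document unchanged."""
    with urllib.request.urlopen(build_schema_request(api_key), timeout=30) as response:
        return json.load(response)
```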