DeepSeek V4 Flash

DeepSeek · Text Generation
POST /v1/chat/completions

Lightweight MoE model with 284B total / 13B active parameters and native 1M context, tuned for low-latency, cost-effective high-concurrency use.
At a glance
| Field | Value |
|---|---|
| Model id | deepseek-v4-flash |
| Input modalities | Text |
| Output modalities | Text |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 393,216 |
| Region | Germany (Frankfurt) |
| Features | reasoning |
| Native inference | No |
| New | Yes |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |
Pricing
| Charge | Spec | Rate |
|---|---|---|
| Input | per 1M prompt tokens | $0.14 |
| Output | per 1M generated tokens | $0.28 |
| Web Search (Linkup) | per call when invoked | $0.013 |
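To make the rates above concrete, the following is a minimal sketch of a per-request cost estimate. The function name `estimate_cost` and its arguments are hypothetical helpers, not part of the API; the rates are copied from the pricing table, and the sketch assumes billing is strictly linear in token counts.

```python
from decimal import Decimal

# Rates copied from the pricing table above (USD).
INPUT_PER_M = Decimal("0.14")       # per 1M prompt tokens
OUTPUT_PER_M = Decimal("0.28")      # per 1M generated tokens
LINKUP_PER_CALL = Decimal("0.013")  # per Linkup web search call


def estimate_cost(prompt_tokens: int, output_tokens: int, web_search_calls: int = 0) -> Decimal:
    """Hypothetical helper: estimated USD cost of one request.

    Assumes cost scales linearly with token counts, as the table implies.
    """
    million = Decimal(1_000_000)
    token_cost = (prompt_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / million
    return token_cost + web_search_calls * LINKUP_PER_CALL
```

For example, `estimate_cost(200_000, 50_000, web_search_calls=1)` adds $0.028 for input, $0.014 for output, and $0.013 for the search call. `Decimal` is used so that small per-token amounts do not pick up floating-point rounding noise.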
Example request
    curl https://api.empiriolabs.ai/v1/chat/completions \
      -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
      -H 'Content-Type: application/json' \
      -d '{"model": "deepseek-v4-flash", "messages": [{"role":"user","content":"Hello"}]}'
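The same request can be sketched in Python using only the standard library. The helper names `build_request` and `send` are hypothetical; the URL, headers, and body mirror the curl example. The response format is not described on this page, so `send` simply returns the decoded JSON without assuming any particular fields.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: build the POST request without sending it."""
    body = {
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON body as-is."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

A typical call would be `send(build_request("Hello", os.environ["EMPIRIOLABS_API_KEY"]))`, after importing `os`. Keeping request construction separate from sending makes the payload easy to inspect or log before any network traffic occurs.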
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| temperature | number | no | 0.7 | Sampling temperature. 0 = deterministic, 2 = maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower = more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. · Range: 1 – 393216 |
| stop | string | no | - | Up to 4 strings at which the model stops generating further tokens. |
| enable_thinking | boolean | no | true | Enable step-by-step reasoning before answering. |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for the reasoning process. · Range: 1 – 393216 |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. "none" disables thinking; "low", "medium", "high", and "max" set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field and translated into enable_thinking and thinking_budget for the model service. · Allowed: none, low, medium, high, max |
| web_search_linkup | boolean | no | false | Optional web search powered by Linkup. When enabled, recent web sources are retrieved using your latest user message as the query and provided to the model as additional context. Adds a flat $0.013 per request on top of the model’s normal token cost. |
| disable_formatting | boolean | no | false | When enabled, the gateway does not append the “Sources” footer to assistant responses that used Linkup web search. Useful when the model output is piped to another system that expects no decoration. |
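The documented ranges can be checked on the client before a request is sent. Below is a minimal sketch of such a check; `validate_params` is a hypothetical helper, the bounds are taken from the table above, and passing `stop` as a list is an assumption (the table lists its type as string but allows up to 4 strings).

```python
ALLOWED_EFFORT = {"none", "low", "medium", "high", "max"}

# (parameter name, minimum, maximum) as documented in the table above.
NUMERIC_RANGES = [
    ("temperature", 0, 2),
    ("top_p", 0, 1),
    ("max_tokens", 1, 393216),
    ("thinking_budget", 1, 393216),
]


def validate_params(params: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    for name, low, high in NUMERIC_RANGES:
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")

    effort = params.get("reasoning_effort")
    if effort is not None and effort not in ALLOWED_EFFORT:
        problems.append("reasoning_effort must be one of none, low, medium, high, max")

    # Assumption: multiple stop sequences are passed as a list of strings.
    stop = params.get("stop")
    if isinstance(stop, (list, tuple)) and len(stop) > 4:
        problems.append("stop accepts at most 4 strings")
    return problems
```

Parameters that are omitted are skipped, since every parameter in the table is optional and falls back to its default on the server side.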
Variants
:variant1
| Field | Value |
|---|---|
| Model id | deepseek-v4-flash:variant1 |
| Region | Singapore |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 393,216 |
| Features | reasoning, web_search |
| Native inference | No |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |
Pricing
| Charge | Spec | Rate |
|---|---|---|
| Input | per 1M prompt tokens | $0.20 |
| Output | per 1M generated tokens | $0.40 |
| Web search | per request when enabled | $0.02 |
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| temperature | number | no | 0.7 | Sampling temperature. 0 = deterministic, 2 = maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower = more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. · Range: 1 – 393216 |
| stop | string | no | - | Up to 4 strings at which the model stops generating further tokens. |
| enable_thinking | boolean | no | true | Enable step-by-step reasoning before answering. |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for the reasoning process. · Range: 1 – 393216 |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. "none" disables thinking; "low", "medium", "high", and "max" set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field and translated into enable_thinking and thinking_budget for the model service. · Allowed: none, low, medium, high, max |
| tool_web_search | boolean | no | false | Enable live web search. Adds a $0.02 surcharge to the request cost when enabled. |
:variant2
| Field | Value |
|---|---|
| Model id | deepseek-v4-flash:variant2 |
| Region | China |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 384,000 |
| Features | reasoning, function_calling, web_search, cache |
| Native inference | No |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |
Pricing
| Charge | Spec | Rate |
|---|---|---|
| Input | per 1M prompt tokens | $0.138 (was $0.14) |
| Output | per 1M generated tokens | $0.275 (was $0.28) |
| Implicit cache read | per 1M cached input tokens | $0.028 |
| Web search | per request when enabled | $0.01 |
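The effect of the cache-read rate on a request's cost can be sketched as follows. The helper `estimate_variant2_cost` is hypothetical, and the sketch assumes that cached input tokens are billed at the cache-read rate instead of the regular input rate; the page does not spell out how the two interact, so treat the result as an estimate only. Rates are the current ones from the table above.

```python
from decimal import Decimal

# Current rates for deepseek-v4-flash:variant2, copied from the table above (USD).
INPUT_PER_M = Decimal("0.138")
OUTPUT_PER_M = Decimal("0.275")
CACHE_READ_PER_M = Decimal("0.028")
WEB_SEARCH_PER_REQUEST = Decimal("0.01")


def estimate_variant2_cost(prompt_tokens: int, cached_tokens: int,
                           output_tokens: int, web_search: bool = False) -> Decimal:
    """Hypothetical helper: estimated USD cost of one request.

    Assumption: cached prompt tokens replace (rather than add to) the
    regular input charge for those tokens.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    cost = (
        uncached_tokens * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / Decimal(1_000_000)
    if web_search:
        cost += WEB_SEARCH_PER_REQUEST
    return cost
```

Under that assumption, a 1M-token prompt with half of it served from cache costs $0.083 in input charges rather than $0.138.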
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| temperature | number | no | 0.7 | Sampling temperature. 0 = deterministic, 2 = maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower = more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. · Range: 1 – 393216 |
| stop | string | no | - | Up to 4 strings at which the model stops generating further tokens. |
| enable_thinking | boolean | no | true | Enable step-by-step reasoning before answering. |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for the reasoning process. · Range: 1 – 393216 |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. "none" disables thinking; "low", "medium", "high", and "max" set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field and translated into enable_thinking and thinking_budget for the model service. · Allowed: none, low, medium, high, max |
| tool_web_search | boolean | no | false | Enable live web search. Adds $0.01 to the request cost when enabled. |
Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/deepseek-v4-flash.
