
POST /v1/chat/completions

Cost-effective Qwen3.7 vision-language model for text, image, video, coding, tool use, GUI understanding, and 1M-context workflows.

| Field | Value |
|---|---|
| Model id | qwen3-7-plus |
| Input modalities | Text, Image, Video |
| Output modalities | Text |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 65,536 |
| Region | Singapore |
| Features | qwen3.7, reasoning, vision, video, web_search, code_interpreter, function_calling, structured_output, prefix_continuation, fine_tuning, agentic_coding |
| Native inference | No |
| New | Yes |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |

Pricing

| Charge | Spec | Rate |
|---|---|---|
| Input | per 1M prompt tokens | <=256K $0.40; 256K-1M $1.20 |
| Output | per 1M generated tokens | <=256K $1.60; 256K-1M $4.80 |
| Web Search | per call | $0.03 |
| Image Search | per call | $0.03 |

    curl https://api.empiriolabs.ai/v1/chat/completions \
      -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
      -H 'Content-Type: application/json' \
      -d '{"model": "qwen3-7-plus", "messages": [{"role":"user","content":"Hello"}]}'

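The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper name `build_chat_request` is hypothetical, and the function only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "model": "qwen3-7-plus",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response.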

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| temperature | number | no | 0.7 | Sampling temperature. 0 is deterministic and 2 is maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower values make outputs more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. · Range: 1 – 65536 |
| stop | string | no | - | Up to 4 strings where the model will stop generating further tokens. |
| enable_thinking | boolean | no | true | Enable reasoning before answering. |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. · Allowed: none, low, medium, high, max |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 256000 |
| response_format | object | no | - | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode for strict schemas. |
| vl_high_resolution_images | boolean | no | true | Use higher resolution processing for image inputs. |
| max_pixels | number | no | 2621440 | Maximum pixel count per image when high resolution processing is disabled. · Range: 4096 – 16777216 |
| video_fps | number | no | 2 | Frames per second to sample from video inputs. · Range: 0.1 – 10 |
| treat_images_as_video | boolean | no | false | Treat a sequence of images as video frames. |
| tool_web_search | boolean | no | true | Search the web for real-time information. Adds $0.03 to the request cost for each invoked call. |
| tool_web_extractor | boolean | no | true | Extract and read content from URLs. Requires Web Search and Thinking. |
| tool_code_interpreter | boolean | no | true | Run Python code in a sandbox. Requires Thinking. |
| tool_web_search_image | boolean | no | true | Search the web for images from text descriptions. Adds $0.03 to the request cost for each invoked call. |
| tool_image_search | boolean | no | true | Find similar images from an uploaded image. Adds $0.03 to the request cost for each invoked call. |
| disable_formatting | boolean | no | false | Return raw provider-style output without EmpirioLabs source formatting where supported. |
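
The ranges and dependencies in the parameter table can be checked on the client before a request is sent. The sketch below is a hypothetical lint helper (`check_params` is not part of the API). It assumes the parameters are passed as a flat dictionary and that a dependency only matters when the dependent tool is explicitly set to true. How the server itself reacts to conflicting values is not documented here.

```python
def check_params(params: dict) -> list[str]:
    """Return a list of problems found in a parameter dict (empty if none)."""
    problems = []

    # Numeric ranges copied from the parameter table.
    ranges = {
        "temperature": (0, 2),
        "top_p": (0, 1),
        "max_tokens": (1, 65536),
        "thinking_budget": (1, 256000),
        "max_pixels": (4096, 16777216),
        "video_fps": (0.1, 10),
    }
    for name, (low, high) in ranges.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name} must be between {low} and {high}")

    # Thinking is on by default; enable_thinking=false or
    # reasoning_effort="none" turns it off.
    thinking_on = (
        params.get("enable_thinking", True)
        and params.get("reasoning_effort", "medium") != "none"
    )
    if not thinking_on:
        for tool in ("tool_web_extractor", "tool_code_interpreter"):
            if params.get(tool) is True:
                problems.append(f"{tool} requires thinking")

    # The web extractor also depends on web search being enabled.
    if params.get("tool_web_extractor") is True and params.get("tool_web_search", True) is False:
        problems.append("tool_web_extractor requires tool_web_search")

    return problems
```

For example, `check_params({"enable_thinking": False, "tool_code_interpreter": True})` reports that the code interpreter requires thinking.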

Input and output rates in the 256K-1M tier are 3x the <=256K rates. Web Search, Text-to-Image Search (tool_web_search_image), and Image-to-Image Search (tool_image_search) are billed only when invoked, and both image searches use the Image Search pricing row. Thinking tokens are billed as output tokens.
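
The pricing rules above can be turned into a rough cost estimate. The sketch below rests on assumptions the page does not state: that "256K" means 256,000 tokens, that the tier is chosen by the request's prompt token count, and that the whole request is then billed at that tier's rates. The function name `estimate_cost_usd` is hypothetical. Treat the result as an approximation and rely on the billed cost in the response for exact figures.

```python
TIER_LIMIT = 256_000  # assumption: "256K" is read as 256,000 tokens

# USD per 1M tokens as (<=256K tier, 256K-1M tier), from the pricing table.
INPUT_RATES = (0.40, 1.20)
OUTPUT_RATES = (1.60, 4.80)
SEARCH_RATE = 0.03  # USD per Web Search or Image Search call


def estimate_cost_usd(prompt_tokens, output_tokens,
                      web_search_calls=0, image_search_calls=0):
    """Estimate request cost; output_tokens should include thinking tokens."""
    # Assumption: the prompt size selects the tier for the whole request.
    tier = 0 if prompt_tokens <= TIER_LIMIT else 1
    token_cost = (prompt_tokens / 1_000_000 * INPUT_RATES[tier]
                  + output_tokens / 1_000_000 * OUTPUT_RATES[tier])
    tool_cost = SEARCH_RATE * (web_search_calls + image_search_calls)
    return token_cost + tool_cost
```

Under these assumptions, a 100,000-token prompt with 2,000 output tokens comes to about $0.0432, while a 300,000-token prompt with 1,000 output tokens and two web searches comes to about $0.4248.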

Per-tool billing (usage.tool_usage)

When this model invokes built-in tools inside a single request, the response carries a normalized usage.tool_usage map alongside the token counts. Tool counts are already factored into cost_usd and are surfaced for transparency.
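
As an illustration of how the tool counts might be read back, the sketch below converts a `tool_usage` map into per-tool charges. The exact response shape is not given on this page, so everything about it here is an assumption: that `tool_usage` maps a tool name to a call count, and that the keys are `web_search` and `image_search`. The helper `tool_charges` is hypothetical.

```python
# Per-call rates from the pricing table; the key names are assumed,
# not confirmed by the API reference.
PER_CALL_RATES = {"web_search": 0.03, "image_search": 0.03}


def tool_charges(usage: dict, rates: dict = PER_CALL_RATES) -> dict:
    """Map each tool in usage["tool_usage"] to its estimated USD charge."""
    counts = usage.get("tool_usage") or {}
    # Tools without a known per-call rate are treated as free of charge.
    return {tool: count * rates.get(tool, 0.0) for tool, count in counts.items()}
```

Because tool counts are already factored into cost_usd, these charges are a breakdown of that total, not an amount to add on top of it.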

:variant1

| Field | Value |
|---|---|
| Model id | qwen3-7-plus:variant1 |
| Region | China |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 65,536 |
| Features | qwen3.7, reasoning, vision, video, web_search, code_interpreter, function_calling, structured_output, prefix_continuation, cache, fine_tuning, agentic_coding |
| Native inference | No |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |

Pricing

| Charge | Spec | Rate |
|---|---|---|
| Input | per 1M prompt tokens | <=256K $0.276 (was $0.40); 256K-1M $0.826 (was $1.20) |
| Output | per 1M generated tokens | <=256K $1.101 (was $1.60); 256K-1M $3.301 (was $4.80) |
| Implicit cache input | per 1M cached prompt tokens | <=256K $0.056 (was $0.08); 256K-1M $0.166 (was $0.24) |
| Web Search | per call | $0.01 |
| Image Search | per call | $0.01 |

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| temperature | number | no | 0.7 | Sampling temperature. 0 is deterministic and 2 is maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower values make outputs more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. · Range: 1 – 65536 |
| stop | string | no | - | Up to 4 strings where the model will stop generating further tokens. |
| enable_thinking | boolean | no | true | Enable reasoning before answering. |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. · Allowed: none, low, medium, high, max |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 256000 |
| response_format | object | no | - | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode for strict schemas. |
| vl_high_resolution_images | boolean | no | true | Use higher resolution processing for image inputs. |
| max_pixels | number | no | 2621440 | Maximum pixel count per image when high resolution processing is disabled. · Range: 4096 – 16777216 |
| video_fps | number | no | 2 | Frames per second to sample from video inputs. · Range: 0.1 – 10 |
| treat_images_as_video | boolean | no | false | Treat a sequence of images as video frames. |
| tool_web_search | boolean | no | true | Search the web for real-time information. Adds $0.01 to the request cost for each invoked call. |
| tool_web_extractor | boolean | no | true | Extract and read content from URLs. Requires Web Search and Thinking. |
| tool_code_interpreter | boolean | no | true | Run Python code in a sandbox. Requires Thinking. |
| tool_web_search_image | boolean | no | true | Search the web for images from text descriptions. Adds $0.01 to the request cost for each invoked call. |
| tool_image_search | boolean | no | true | Find similar images from an uploaded image. Adds $0.01 to the request cost for each invoked call. |
| disable_formatting | boolean | no | false | Return raw provider-style output without EmpirioLabs source formatting where supported. |

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/qwen3-7-plus.
