Qwen3.6-Flash

Alibaba Cloud · Text Generation
POST /v1/chat/completions
Fast Qwen3.6 vision-language model for agentic coding, math reasoning, spatial understanding, and OCR, with text, image, and video input.
At a glance
| Field | Value |
|---|---|
| Model id | qwen3-6-flash |
| Input modalities | Text, Image, Video |
| Output modalities | Text |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 65,536 |
| Region | Singapore |
| Features | reasoning, vision, video, web_search, function_calling, structured_output, agentic_coding |
| Native inference | No |
| New | Yes |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |
Pricing
| Charge | Spec | Rate |
|---|---|---|
| Input | per 1M prompt tokens | <=256K $0.25; 256K-1M $1.00 |
| Output | per 1M generated tokens | <=256K $1.50; 256K-1M $4.00 |
| Web search | per query when enabled | $0.02 |
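The tiered rates above can be turned into a rough per-request estimate. The sketch below is illustrative only and rests on two assumptions this page does not confirm: that "256K" means 256,000 tokens, and that the tier is selected by the prompt size of the request and then applies to every token in it. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Rates in USD per 1M tokens, copied from the pricing table above.
RATES = {
    "short": {"input": Decimal("0.25"), "output": Decimal("1.50")},
    "long": {"input": Decimal("1.00"), "output": Decimal("4.00")},
}
WEB_SEARCH_PER_QUERY = Decimal("0.02")
TIER_LIMIT = 256_000  # assumption: "256K" means 256,000 tokens
MILLION = Decimal(1_000_000)


def estimate_cost(prompt_tokens: int, output_tokens: int, web_queries: int = 0) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: the tier is chosen by the prompt size and applies to the
    whole request. Pass output_tokens as the total generated tokens.
    """
    tier = "short" if prompt_tokens <= TIER_LIMIT else "long"
    rate = RATES[tier]
    return (
        Decimal(prompt_tokens) * rate["input"] / MILLION
        + Decimal(output_tokens) * rate["output"] / MILLION
        + WEB_SEARCH_PER_QUERY * web_queries
    )
```

Under these assumptions, `estimate_cost(100_000, 2_000)` evaluates to 0.028 USD. If billing is marginal per tier rather than per request, the function would need to split the token counts at the tier boundary instead.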
Example request
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen3-6-flash", "messages": [{"role":"user","content":"Hello"}]}'
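For callers who prefer Python, the same request can be sketched with the standard library alone. This is a minimal illustration, not an official client: the helper names `build_chat_request` and `send` are hypothetical, and the response is returned as raw text because this page does not describe the response body.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST as the curl example."""
    body = {
        "model": "qwen3-6-flash",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A typical call would be `send(build_chat_request("Hello", os.environ["EMPIRIOLABS_API_KEY"]))` after `import os`. Keeping request construction separate from sending makes the payload easy to inspect before any network traffic occurs.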
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| temperature | number | no | 0.7 | Sampling temperature. 0 is deterministic and 2 is maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower values make outputs more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. · Range: 1 – 65536 |
| stop | string | no | — | Up to 4 strings where the model will stop generating further tokens. |
| enable_thinking | boolean | no | true | Enable reasoning before answering. |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field, translated into enable_thinking and thinking_budget for the model service. · Allowed: none, low, medium, high, max |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 64000 |
| response_format | object | no | — | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode for strict schemas. |
| vl_high_resolution_images | boolean | no | true | Use higher resolution processing for image inputs. |
| max_pixels | number | no | 2621440 | Maximum pixel count per image when high resolution processing is disabled. · Range: 4096 – 16777216 |
| video_fps | number | no | 2 | Frames per second to sample from video inputs. · Range: 0.1 – 10 |
| tool_web_search | boolean | no | false | Search the web for real-time information. Adds $0.02 to the request cost when enabled. |
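The documented ranges lend themselves to a client-side check before a request is sent. The following is a minimal sketch, assuming the bounds in the table above are inclusive; `validate_params` is a hypothetical helper, not part of any SDK, and the service may enforce additional rules.

```python
# Inclusive (min, max) bounds copied from the parameter table above.
NUMERIC_RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 65536),
    "thinking_budget": (1, 64000),
    "max_pixels": (4096, 16777216),
    "video_fps": (0.1, 10),
}
REASONING_EFFORTS = {"none", "low", "medium", "high", "max"}


def validate_params(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside {low}-{high}")
    effort = params.get("reasoning_effort")
    if effort is not None and effort not in REASONING_EFFORTS:
        problems.append(f"reasoning_effort={effort!r} is not an allowed value")
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 strings")
    return problems
```

For example, `validate_params({"temperature": 3})` returns `["temperature=3 is outside 0-2"]`, while a dictionary using only the defaults returns an empty list.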
Notes
Supports text, image, and video input. Web search is available through tool_web_search and adds $0.02 per query when enabled. Thinking tokens are billed as output tokens. Explicit cache controls are not supported.
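As an illustration of combining image input with web search, the sketch below builds a request body. It assumes the endpoint accepts OpenAI-style content parts (`image_url` and `text` entries), which this page does not show explicitly, and that `tool_web_search` is sent as a top-level field like the other parameters. The helper name and the example URL are hypothetical.

```python
import json


def build_vision_search_payload(question: str, image_url: str) -> dict:
    """Combine one image, one text question, and web search in a single request body."""
    return {
        "model": "qwen3-6-flash",
        "messages": [
            {
                "role": "user",
                # Assumption: OpenAI-style content parts; the exact shape for
                # image and video inputs is not documented on this page.
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        # Documented parameter; enabling it adds the per-query web search charge.
        "tool_web_search": True,
    }


payload = build_vision_search_payload(
    "What product is shown, and what does it currently cost?",
    "https://example.com/product.png",  # hypothetical image URL
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed as the `-d` body of a chat completions request once the content-part format has been confirmed against the machine-readable schema.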
Variants
:variant1
| Field | Value |
|---|---|
| Model id | qwen3-6-flash:variant1 |
| Region | China |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 65,536 |
| Features | reasoning, vision, video, web_search, function_calling, structured_output, agentic_coding |
| Native inference | No |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |
Pricing
| Charge | Spec | Rate |
|---|---|---|
| Input | per 1M prompt tokens | <=256K $0.165 (was $0.25); 256K-1M $0.66 (was $1.00) |
| Output | per 1M generated tokens | <=256K $0.99 (was $1.50); 256K-1M $3.961 (was $4.00) |
| Web search | per query when enabled | $0.01 |
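To see how the variant rates relate to the listed "was" rates, the arithmetic can be checked directly. The snippet below only restates the numbers from the table above; `discount_percent` is a hypothetical helper.

```python
from decimal import Decimal

# (variant rate, previous rate) in USD, copied from the pricing table above.
PRICES = {
    "input <=256K": (Decimal("0.165"), Decimal("0.25")),
    "input 256K-1M": (Decimal("0.66"), Decimal("1.00")),
    "output <=256K": (Decimal("0.99"), Decimal("1.50")),
    "output 256K-1M": (Decimal("3.961"), Decimal("4.00")),
}


def discount_percent(variant: Decimal, previous: Decimal) -> Decimal:
    """Percentage saved by the variant rate relative to the previous rate."""
    return ((previous - variant) / previous * 100).quantize(Decimal("0.01"))


for label, (variant, previous) in PRICES.items():
    print(f"{label}: {discount_percent(variant, previous)}% lower")
```

Three of the four rates work out to 34.00% lower, while the 256K-1M output rate works out to 0.98% lower. The page does not explain the difference, so it may be worth confirming that rate against the machine-readable schema before budgeting for long-context workloads.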
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| temperature | number | no | 0.7 | Sampling temperature. 0 is deterministic and 2 is maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower values make outputs more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. · Range: 1 – 65536 |
| stop | string | no | — | Up to 4 strings where the model will stop generating further tokens. |
| enable_thinking | boolean | no | true | Enable reasoning before answering. |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field, translated into enable_thinking and thinking_budget for the model service. · Allowed: none, low, medium, high, max |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 128000 |
| response_format | object | no | — | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode for strict schemas. |
| vl_high_resolution_images | boolean | no | true | Use higher resolution processing for image inputs. |
| max_pixels | number | no | 2621440 | Maximum pixel count per image when high resolution processing is disabled. · Range: 4096 – 16777216 |
| video_fps | number | no | 2 | Frames per second to sample from video inputs. · Range: 0.1 – 10 |
| tool_web_search | boolean | no | false | Search the web for real-time information. Adds $0.01 to the request cost when enabled. |
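The `response_format` row recommends non-thinking mode for strict schemas. A minimal sketch of such a request body for this variant follows. It assumes the OpenAI-style `json_schema` wrapper (`name`, `schema`, `strict`), and it sends only `enable_thinking` because the page does not say which field wins when `enable_thinking` and `reasoning_effort` are both present. The helper name and the example schema are hypothetical.

```python
def build_structured_payload(prompt: str, schema: dict,
                             model: str = "qwen3-6-flash:variant1") -> dict:
    """Build a request body that asks for schema-constrained JSON with thinking off."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Non-thinking mode, as the response_format row recommends for strict schemas.
        "enable_thinking": False,
        "response_format": {
            "type": "json_schema",
            # Assumption: OpenAI-style wrapper around the JSON Schema itself.
            "json_schema": {"name": "result", "schema": schema, "strict": True},
        },
    }


# Hypothetical schema used only for illustration.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "total"],
    "additionalProperties": False,
}
payload = build_structured_payload("Extract the vendor and total.", invoice_schema)
```

Passing `model="qwen3-6-flash"` targets the base model instead of the variant with the same body.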
Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/qwen3-6-flash.
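The schema endpoint can be queried with a plain GET. The sketch below uses only the standard library. Whether this endpoint requires authentication is not stated here, so the bearer header is optional and treated as an assumption, and the response is parsed as JSON without assuming any particular fields. Both helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/qwen3-6-flash"


def build_schema_request(api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the GET request for the machine-readable schema."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: the same bearer scheme as the chat endpoint, if auth is needed.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(SCHEMA_URL, headers=headers, method="GET")


def fetch_schema(api_key: Optional[str] = None) -> dict:
    """Send the request and parse the body as JSON."""
    with urllib.request.urlopen(build_schema_request(api_key)) as response:
        return json.load(response)
```

Calling `fetch_schema()` and inspecting the result is a reasonable way to confirm parameter ranges and prices programmatically rather than copying them from this page.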
