Qwen3.6-Flash

Alibaba Cloud · Text Generation
POST /v1/chat/completions

Fast Qwen3.6 vision-language model for agentic coding, mathematical reasoning, spatial understanding, and OCR. Accepts text, image, and video input.

At a glance

Field | Value
Model id | qwen3-6-flash
Input modalities | Text, Image, Video
Output modalities | Text
Context window | 1M tokens
Weight precision | -
Max output tokens | 65,536
Region | Singapore
Features | reasoning, vision, video, web_search, function_calling, structured_output, agentic_coding
Native inference | No
New | Yes
Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages

Pricing

Charge | Unit | Rate
Input | per 1M prompt tokens | <=256K: $0.25; 256K-1M: $1.00
Output | per 1M generated tokens | <=256K: $1.50; 256K-1M: $4.00
Web search | per query, when enabled | $0.02
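The tiered rates can be turned into a rough per-request estimate. The sketch below uses a hypothetical helper, estimate_cost (not part of the API), and rests on assumptions the rate table does not confirm: the tier is chosen by the request's prompt-token count and applied to all of that request's input and output tokens, and "256K" means 262,144 tokens. Treat the result as an approximation, not a billing figure.

```shell
#!/usr/bin/env bash
# Hypothetical cost estimator for qwen3-6-flash (base model, Singapore).
# Assumptions (not confirmed by the rate table):
#   - the tier is selected by the request's prompt-token count and applies
#     to that request's input and output tokens alike;
#   - "256K" means 262,144 tokens;
#   - output_tokens already includes thinking tokens (billed as output per the Notes).
estimate_cost() {
  local prompt_tokens=$1 output_tokens=$2 searches=${3:-0}
  awk -v p="$prompt_tokens" -v o="$output_tokens" -v s="$searches" 'BEGIN {
    tier_limit = 262144                                        # assumed size of the 256K tier
    if (p <= tier_limit) { in_rate = 0.25; out_rate = 1.50 }   # USD per 1M tokens
    else                 { in_rate = 1.00; out_rate = 4.00 }
    cost = p / 1e6 * in_rate + o / 1e6 * out_rate + s * 0.02   # $0.02 per search query
    printf "%.6f\n", cost
  }'
}

# 100K prompt tokens, 2K output tokens, no web search.
estimate_cost 100000 2000 0   # prints 0.028000
```

Passing the searches argument adds $0.02 per query; for example, estimate_cost 100000 2000 1 adds one search to the same request.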

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen3-6-flash", "messages": [{"role": "user", "content": "Hello"}]}'
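Image input can be sent by mixing text and image parts in one user message. This page lists image input but does not show the message format, so the sketch below assumes the OpenAI-style image_url content part is accepted on this endpoint. The helper name build_image_request and the image URL are hypothetical placeholders. The request is only sent when EMPIRIOLABS_API_KEY is set.

```shell
#!/usr/bin/env bash
# Build a chat request with one text part and one image part.
# Assumption: the endpoint accepts OpenAI-style "image_url" content parts.
# The URL is inserted verbatim, so it must not contain double quotes.
build_image_request() {
  local image_url=$1
  cat <<EOF
{
  "model": "qwen3-6-flash",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Transcribe any text visible in this image."},
      {"type": "image_url", "image_url": {"url": "$image_url"}}
    ]
  }],
  "vl_high_resolution_images": true
}
EOF
}

# Placeholder image URL.
body=$(build_image_request "https://example.com/receipt.png")

# Send only when an API key is available.
if [ -n "${EMPIRIOLABS_API_KEY:-}" ]; then
  curl https://api.empiriolabs.ai/v1/chat/completions \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$body"
fi
```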

Parameters

Parameter | Type | Required | Default | Description
temperature | number | no | 0.7 | Sampling temperature. Lower values give more deterministic output; 0 is the most deterministic and 2 the most random. · Range: 0 – 2
top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower values make output more focused. · Range: 0 – 1
max_tokens | number | no | 4096 | Maximum number of output tokens. · Range: 1 – 65536
stop | string | no | - | Up to 4 sequences at which the model stops generating further tokens.
enable_thinking | boolean | no | true | Enable reasoning before answering.
reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking; low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field and translated into enable_thinking and thinking_budget for the model service. · Allowed: none, low, medium, high, max
thinking_budget | number | no | 32768 | Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 64000
response_format | object | no | - | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode (enable_thinking false or reasoning_effort none) for strict schemas.
vl_high_resolution_images | boolean | no | true | Use higher-resolution processing for image inputs.
max_pixels | number | no | 2621440 | Maximum pixel count per image when high-resolution processing is disabled. · Range: 4096 – 16777216
video_fps | number | no | 2 | Frames per second to sample from video inputs. · Range: 0.1 – 10
tool_web_search | boolean | no | false | Search the web for real-time information. Adds $0.02 to the request cost when enabled.
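The table recommends non-thinking mode for strict JSON schemas. A minimal sketch of such a request follows. It assumes parameters are passed as top-level body fields and that response_format follows the OpenAI json_schema shape the table calls "OpenAI-compatible"; the location schema and the prompt are hypothetical. Per the reasoning_effort row, "reasoning_effort": "none" should have the same effect as "enable_thinking": false.

```shell
#!/usr/bin/env bash
# Structured JSON output with thinking disabled, as recommended for strict schemas.
# The "location" schema and the prompt are hypothetical examples.
body=$(cat <<'EOF'
{
  "model": "qwen3-6-flash",
  "messages": [
    {"role": "user", "content": "Extract the city and country from: 'I live in Lyon, France.'"}
  ],
  "enable_thinking": false,
  "temperature": 0.2,
  "max_tokens": 512,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "location",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "city": {"type": "string"},
          "country": {"type": "string"}
        },
        "required": ["city", "country"],
        "additionalProperties": false
      }
    }
  }
}
EOF
)

# Send only when an API key is available.
if [ -n "${EMPIRIOLABS_API_KEY:-}" ]; then
  curl https://api.empiriolabs.ai/v1/chat/completions \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$body"
fi
```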

Notes

Supports text, image, and video input. Web search is available through tool_web_search and adds $0.02 per query when enabled. Thinking tokens are billed as output tokens. Explicit cache controls are not supported.
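Enabling web search only adds the tool_web_search flag to an ordinary chat request. The sketch assumes the flag is a top-level body field like the other parameters; the prompt and the low reasoning effort are illustrative choices. Each search query adds $0.02, and any thinking tokens are billed as output.

```shell
#!/usr/bin/env bash
# Chat request with web search enabled.
# Assumption: tool_web_search is a top-level field of the request body.
body=$(cat <<'EOF'
{
  "model": "qwen3-6-flash",
  "messages": [
    {"role": "user", "content": "Summarize this week's major AI model releases."}
  ],
  "tool_web_search": true,
  "reasoning_effort": "low"
}
EOF
)

# Send only when an API key is available; each search query is billed at $0.02.
if [ -n "${EMPIRIOLABS_API_KEY:-}" ]; then
  curl https://api.empiriolabs.ai/v1/chat/completions \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$body"
fi
```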

Variants

:variant1

Field | Value
Model id | qwen3-6-flash:variant1
Region | China
Context window | 1M tokens
Weight precision | -
Max output tokens | 65,536
Features | reasoning, vision, video, web_search, function_calling, structured_output, agentic_coding
Native inference | No
Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages

Pricing

Charge | Unit | Rate
Input | per 1M prompt tokens | <=256K: $0.165 (was $0.25); 256K-1M: $0.66 (was $1.00)
Output | per 1M generated tokens | <=256K: $0.99 (was $1.50); 256K-1M: $3.961 (was $4.00)
Web search | per query, when enabled | $0.01
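To see what :variant1 saves on a given workload, both rate tables can be evaluated side by side. This is a hypothetical helper, compare_cost, built on the same unconfirmed assumptions as any tiered estimate here: the tier is picked by the request's prompt-token count (256K taken as 262,144 tokens) and applies to that request's input and output. The rates are copied from the base and variant pricing tables, including the lower $0.01 web search charge on the variant.

```shell
#!/usr/bin/env bash
# Hypothetical side-by-side estimate for the base model and :variant1.
# Assumptions: tier chosen by prompt tokens, 256K = 262,144 tokens,
# output_tokens includes thinking tokens.
compare_cost() {
  local p=$1 o=$2 s=${3:-0}
  awk -v p="$p" -v o="$o" -v s="$s" 'BEGIN {
    big = (p > 262144)   # 1 when the request falls in the 256K-1M tier
    # Rates in USD per 1M tokens; web search is a flat charge per query.
    base    = p / 1e6 * (big ? 1.00 : 0.25)  + o / 1e6 * (big ? 4.00 : 1.50)  + s * 0.02
    variant = p / 1e6 * (big ? 0.66 : 0.165) + o / 1e6 * (big ? 3.961 : 0.99) + s * 0.01
    printf "base=%.6f variant1=%.6f\n", base, variant
  }'
}

# 100K prompt tokens, 2K output tokens, one web search query.
compare_cost 100000 2000 1   # prints base=0.048000 variant1=0.028480
```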

Parameters

Parameter | Type | Required | Default | Description
temperature | number | no | 0.7 | Sampling temperature. Lower values give more deterministic output; 0 is the most deterministic and 2 the most random. · Range: 0 – 2
top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower values make output more focused. · Range: 0 – 1
max_tokens | number | no | 4096 | Maximum number of output tokens. · Range: 1 – 65536
stop | string | no | - | Up to 4 sequences at which the model stops generating further tokens.
enable_thinking | boolean | no | true | Enable reasoning before answering.
reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking; low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field and translated into enable_thinking and thinking_budget for the model service. · Allowed: none, low, medium, high, max
thinking_budget | number | no | 32768 | Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 128000
response_format | object | no | - | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode (enable_thinking false or reasoning_effort none) for strict schemas.
vl_high_resolution_images | boolean | no | true | Use higher-resolution processing for image inputs.
max_pixels | number | no | 2621440 | Maximum pixel count per image when high-resolution processing is disabled. · Range: 4096 – 16777216
video_fps | number | no | 2 | Frames per second to sample from video inputs. · Range: 0.1 – 10
tool_web_search | boolean | no | false | Search the web for real-time information. Adds $0.01 to the request cost when enabled.
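Instead of reasoning_effort, the thinking budget can be set directly. The sketch below targets the variant model id and sets enable_thinking and thinking_budget explicitly; it assumes these are top-level body fields, and the prompt and the 16384-token budget are arbitrary illustrative values within the documented range.

```shell
#!/usr/bin/env bash
# Explicit thinking budget on the China-region variant.
# Assumption: enable_thinking and thinking_budget are top-level body fields.
# The prompt and the 16384-token budget are illustrative values only.
body=$(cat <<'EOF'
{
  "model": "qwen3-6-flash:variant1",
  "messages": [
    {"role": "user", "content": "Prove that the sum of two even integers is even."}
  ],
  "enable_thinking": true,
  "thinking_budget": 16384
}
EOF
)

# Send only when an API key is available; thinking tokens are billed as output.
if [ -n "${EMPIRIOLABS_API_KEY:-}" ]; then
  curl https://api.empiriolabs.ai/v1/chat/completions \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "$body"
fi
```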

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/qwen3-6-flash.
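The schema can be fetched with curl. This page documents only the base model's schema URL, so the hypothetical schema_url helper below defaults to qwen3-6-flash; whether variant ids resolve the same way is not stated. Sending the bearer token is also an assumption, since the page does not say whether this endpoint requires authentication.

```shell
#!/usr/bin/env bash
# Build the documented schema URL (base model by default).
schema_url() {
  local model=${1:-qwen3-6-flash}
  printf 'https://api.empiriolabs.ai/v1/models/%s\n' "$model"
}

# Fetch the schema when a key is available.
# Assumption: the endpoint accepts the same bearer token as chat requests.
if [ -n "${EMPIRIOLABS_API_KEY:-}" ]; then
  curl "$(schema_url)" -H "Authorization: Bearer $EMPIRIOLABS_API_KEY"
fi
```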