Qwen3.6 Flash | EmpirioLabs AI Docs

POST /v1/chat/completions

Fast Qwen3.6 vision-language model for agentic coding, math reasoning, spatial understanding, OCR, and text, image, and video input.

At a glance

Field	Value
Model id	`qwen3-6-flash`
Model release date	2026-04-16
Input modalities	Text, Image, Video
Output modalities	Text
Context window	1M
Weight precision	-
Max output tokens	65,536
Region	Singapore
Features	reasoning, vision, video, web_search, function_calling, agentic_coding
Native inference	No
New	Yes
Structured output	JSON Mode
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `POST /v1beta/models/qwen3-6-flash:generateContent`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	<=256K $0.25; 256K-1M $1.00
Output	per 1M generated tokens	<=256K $1.50; 256K-1M $4.00
Web search	per query when enabled	$0.02
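
The tiered rates above can be turned into a rough per-request estimate. The sketch below is illustrative only and rests on two assumptions the table does not confirm: that "256K" means 256,000 tokens, and that the prompt length selects the tier for both the input and output rates of the whole request. The names `Rates`, `QWEN36_FLASH`, and `estimate_cost` are hypothetical helpers, not part of the API.

```python
from dataclasses import dataclass

# Assumption: "256K" in the pricing table means 256,000 tokens.
TIER_BOUNDARY_TOKENS = 256_000


@dataclass(frozen=True)
class Rates:
    input_low: float    # USD per 1M prompt tokens, prompt <= 256K
    input_high: float   # USD per 1M prompt tokens, prompt 256K-1M
    output_low: float   # USD per 1M generated tokens, prompt <= 256K
    output_high: float  # USD per 1M generated tokens, prompt 256K-1M
    web_search: float   # USD per web search query


# Rates copied from the pricing table for `qwen3-6-flash`.
QWEN36_FLASH = Rates(0.25, 1.00, 1.50, 4.00, 0.02)


def estimate_cost(prompt_tokens, output_tokens, web_search_queries=0,
                  rates=QWEN36_FLASH):
    """Estimate the USD cost of one request.

    Assumption: the prompt length picks the tier, and that tier's rates
    apply to every input and output token in the request.
    Thinking tokens count as output tokens, per the notes on this page.
    """
    long_context = prompt_tokens > TIER_BOUNDARY_TOKENS
    input_rate = rates.input_high if long_context else rates.input_low
    output_rate = rates.output_high if long_context else rates.output_low
    token_cost = (prompt_tokens * input_rate
                  + output_tokens * output_rate) / 1_000_000
    return token_cost + web_search_queries * rates.web_search
```

If billing turns out to be marginal (only the tokens beyond the boundary charged at the higher rate), the tier selection in `estimate_cost` would need to change accordingly.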

Example request

$ curl https://api.empiriolabs.ai/v1/chat/completions \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"model": "qwen3-6-flash", "messages": [{"role":"user","content":"Hello"}]}'
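
For callers who prefer Python, the same request can be sketched with the standard library alone. The URL, headers, and body mirror the curl example; the function name `build_chat_request` is a hypothetical helper, and the response shape is not documented on this page, so the sketch stops at building the request.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(prompt, api_key, model="qwen3-6-flash"):
    """Build (but do not send) a POST request matching the curl example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the body as JSON; reading the key from the `EMPIRIOLABS_API_KEY` environment variable keeps it out of source code.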

Parameters

Parameter	Type	Required	Default	Description
`temperature`	number	no	`0.7`	Sampling temperature. 0 is deterministic and 2 is maximum randomness. · Range: 0 – 2
`top_p`	number	no	`0.9`	Nucleus sampling probability mass. Lower values make outputs more focused. · Range: 0 – 1
`max_tokens`	number	no	`4096`	Maximum output tokens. · Range: 1 – 65536
`stop`	string	no	-	Up to 4 strings where the model will stop generating further tokens.
`enable_thinking`	boolean	no	true	Enable reasoning before answering.
`reasoning_effort`	enum	no	`"medium"`	Reasoning effort level. `none` disables thinking; `low`, `medium`, `high`, and `max` set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style `reasoning_effort` field and translated into `enable_thinking` and `thinking_budget` for the model service. · Allowed: `none`, `low`, `medium`, `high`, `max`
`thinking_budget`	number	no	`32768`	Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 64000
`vl_high_resolution_images`	boolean	no	true	Use higher resolution processing for image inputs.
`max_pixels`	number	no	`2621440`	Maximum pixel count per image when high resolution processing is disabled. · Range: 4096 – 16777216
`video_fps`	number	no	`2`	Frames per second to sample from video inputs. · Range: 0.1 – 10
`tool_web_search`	boolean	no	false	Search the web for real-time information. Adds $0.02 per query when enabled.
`response_format`	enum	no	-	Return the output as a valid JSON object (JSON mode). Describe the fields you want in your prompt.
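
The ranges in the table can be checked on the client before a request is sent. The sketch below is a minimal example under stated assumptions: that these parameters are passed as top-level fields of the request body (the table does not say where they go), and that `stop` accepts either one string or a list of up to 4 strings. `RANGES`, `REASONING_EFFORTS`, and `build_payload` are hypothetical names introduced for illustration.

```python
# Numeric ranges copied from the parameters table for `qwen3-6-flash`.
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 65536),
    "thinking_budget": (1, 64000),
    "max_pixels": (4096, 16777216),
    "video_fps": (0.1, 10),
}
REASONING_EFFORTS = {"none", "low", "medium", "high", "max"}


def build_payload(messages, model="qwen3-6-flash", **params):
    """Validate documented ranges, then return a request body dict.

    Assumption: parameters are sent as top-level fields of the body.
    """
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")

    effort = params.get("reasoning_effort")
    if effort is not None and effort not in REASONING_EFFORTS:
        raise ValueError(f"unknown reasoning_effort: {effort!r}")

    # Assumption: `stop` may be one string or a list of up to 4 strings.
    stop = params.get("stop")
    if isinstance(stop, (list, tuple)) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 strings")

    return {"model": model, "messages": messages, **params}
```

Parameters left out of the call are simply omitted from the body, so the service-side defaults listed in the table apply.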

Notes

Supports text, image, and video input. Web search is available through `tool_web_search` and adds $0.02 per query when enabled. Thinking tokens are billed as output tokens. Explicit cache controls are not supported.

Variants

`:variant1`

Field	Value
Model id	`qwen3-6-flash:variant1`
Model release date	2026-04-16
Region	China
Context window	1M
Weight precision	-
Max output tokens	65,536
Features	reasoning, vision, video, web_search, function_calling, agentic_coding
Native inference	No
Structured output	JSON Mode
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `POST /v1beta/models/qwen3-6-flash:variant1:generateContent`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	<=256K $0.165 (was $0.25); 256K-1M $0.66 (was $1.00)
Output	per 1M generated tokens	<=256K $0.99 (was $1.50); 256K-1M $3.961 (was $4.00)
Web search	per query when enabled	$0.01

Parameters

Parameter	Type	Required	Default	Description
`temperature`	number	no	`0.7`	Sampling temperature. 0 is deterministic and 2 is maximum randomness. · Range: 0 – 2
`top_p`	number	no	`0.9`	Nucleus sampling probability mass. Lower values make outputs more focused. · Range: 0 – 1
`max_tokens`	number	no	`4096`	Maximum output tokens. · Range: 1 – 65536
`stop`	string	no	-	Up to 4 strings where the model will stop generating further tokens.
`enable_thinking`	boolean	no	true	Enable reasoning before answering.
`reasoning_effort`	enum	no	`"medium"`	Reasoning effort level. `none` disables thinking; `low`, `medium`, `high`, and `max` set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style `reasoning_effort` field and translated into `enable_thinking` and `thinking_budget` for the model service. · Allowed: `none`, `low`, `medium`, `high`, `max`
`thinking_budget`	number	no	`32768`	Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 128000
`vl_high_resolution_images`	boolean	no	true	Use higher resolution processing for image inputs.
`max_pixels`	number	no	`2621440`	Maximum pixel count per image when high resolution processing is disabled. · Range: 4096 – 16777216
`video_fps`	number	no	`2`	Frames per second to sample from video inputs. · Range: 0.1 – 10
`tool_web_search`	boolean	no	false	Search the web for real-time information. Adds $0.01 per query when enabled.
`response_format`	enum	no	-	Return the output as a valid JSON object (JSON mode). Describe the fields you want in your prompt.

Notes

Supports text, image, and video input. Web search is available through `tool_web_search` and adds $0.01 per query when enabled. Thinking tokens are billed as output tokens. Explicit cache controls are not supported.

Machine-readable schema: `GET https://api.empiriolabs.ai/v1/models/qwen3-6-flash`