Qwen3.5 Flash | EmpirioLabs AI Docs

POST /v1/chat/completions

Vision-language model that combines hybrid linear attention with a sparse mixture-of-experts (MoE) architecture, offering a 1M-token context window and fast multimodal inference over text, image, and video input.

At a glance

Field	Value
Model id	`qwen3-5-flash`
Model release date	2026-02-24
Input modalities	Text, Image, Video
Output modalities	Text
Context window	1M
Weight precision	-
Max output tokens	32,768
Region	Singapore
Features	vision, web_search, code_interpreter, function_calling, reasoning
Native inference	No
New	No
Structured output	JSON Mode
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `POST /v1beta/models/qwen3-5-flash:generateContent`
Alternate model ids	`qwen3.5-flash`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	$0.090 (was $0.10)
Output	per 1M generated tokens	$0.368 (was $0.40)
Web search	per request when enabled	$0.015
Image Search	per call	$0.012

Example request

```shell
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen3-5-flash", "messages": [{"role":"user","content":"Hello"}]}'
```
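For callers who prefer Python, the same request can be sketched with only the standard library. This is a minimal illustration, not an official SDK: the helper name `build_chat_request` is hypothetical, and the body simply mirrors the curl example. The request is only sent when `EMPIRIOLABS_API_KEY` is set in the environment.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(prompt: str, api_key: str, model: str = "qwen3-5-flash") -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Only send a real request when an API key is available in the environment.
if __name__ == "__main__" and "EMPIRIOLABS_API_KEY" in os.environ:
    request = build_chat_request("Hello", os.environ["EMPIRIOLABS_API_KEY"])
    with urllib.request.urlopen(request) as response:
        print(json.load(response))
```

Separating request construction from sending keeps the payload easy to inspect or unit-test without network access.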

Parameters

Parameter	Type	Required	Default	Description
`temperature`	number	no	`0.7`	Sampling temperature. 0 = deterministic, 2 = maximum randomness. · Range: 0 – 2
`top_p`	number	no	`0.9`	Nucleus sampling probability mass. Lower = more focused. · Range: 0 – 1
`max_tokens`	number	no	`4096`	Maximum tokens in the response. · Range: 1 – 32768
`enable_thinking`	boolean	no	true	Enable extended thinking mode. Slower but improves reasoning-heavy tasks.
`vl_high_resolution_images`	boolean	no	true	Use higher resolution for input images. Better detail at higher cost.
`max_pixels`	number	no	`2621440`	Maximum pixels per input image. Larger = more detail but slower / more tokens. · Range: 1 – 99999999
`tool_web_search`	boolean	no	false	Search the web for real-time information.
`tool_web_extractor`	boolean	no	true	Extract and read content from URLs. Requires `tool_web_search` and `enable_thinking`.
`tool_code_interpreter`	boolean	no	true	Run Python code in a sandbox. Requires `enable_thinking`.
`tool_web_search_image`	boolean	no	true	Search the web for images from text descriptions.
`tool_image_search`	boolean	no	true	Find similar images from an uploaded image.
`video_fps`	number	no	`2`	Frames per second sampled from input video for analysis. · Range: 0.1 – 10
`treat_images_as_video`	boolean	no	false	Treat a sequence of input images as a video for temporal reasoning.
`response_format`	enum	no	-	Return the output as a valid JSON object (JSON mode). Describe the fields you want in your prompt.
`disable_formatting`	boolean	no	false	Skip EmpirioLabs' Markdown post-processing (rewriting [N] citations and appending a References block when web search or other tools were used) and return the raw upstream answer with plain [N] citations.
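The ranges and tool dependencies in this table can be checked client-side before a request is sent. The sketch below assumes these parameters travel as top-level fields of the JSON body next to `model` and `messages`, and that "Web Search" and "Thinking" refer to `tool_web_search` and `enable_thinking`. `build_payload` is a hypothetical helper; the service remains the authority on what it accepts.

```python
from typing import Any

# Inclusive ranges copied from the qwen3-5-flash parameter table.
RANGES: dict[str, tuple[float, float]] = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 32768),
    "max_pixels": (1, 99999999),
    "video_fps": (0.1, 10),
}


def build_payload(messages: list[dict[str, Any]], **params: Any) -> dict[str, Any]:
    """Validate documented ranges and tool dependencies, then build a request body."""
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            raise ValueError(f"{name}={params[name]!r} is outside [{low}, {high}]")

    # Fall back to the documented defaults when a flag is not set explicitly.
    thinking = params.get("enable_thinking", True)
    web_search = params.get("tool_web_search", False)

    # Only tools the caller enables explicitly are checked; defaults are left to the service.
    if params.get("tool_web_extractor") and not (web_search and thinking):
        raise ValueError("tool_web_extractor requires tool_web_search and enable_thinking")
    if params.get("tool_code_interpreter") and not thinking:
        raise ValueError("tool_code_interpreter requires enable_thinking")

    return {"model": "qwen3-5-flash", "messages": messages, **params}
```

Accepting `**params` keeps the helper open to fields it does not validate, such as `response_format`.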

Notes

Built-in tools (billed only when invoked)

Web search: $0.015/call
Web extractor: free
Code interpreter: free
Text-to-image search: $0.012/call
Image-to-image search: $0.012/call

Other

Thinking tokens are billed as output tokens.

Text-to-Image Search and Image-to-Image Search use the Image Search pricing row. Each invoked image search is billed at that listed per-call rate.
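Combining the pricing table with these notes, a per-request cost can be estimated as token charges plus per-call tool fees. This is a rough sketch: the tool key names are hypothetical, it assumes `output_tokens` already includes thinking tokens (since they are billed as output), and the authoritative figure is the `cost_usd` value the API returns.

```python
from typing import Optional

# USD rates from the qwen3-5-flash pricing table and built-in tool notes.
INPUT_PER_MILLION = 0.090
OUTPUT_PER_MILLION = 0.368
TOOL_FEE_PER_CALL = {
    "web_search": 0.015,
    "web_extractor": 0.0,
    "code_interpreter": 0.0,
    "text_to_image_search": 0.012,   # hypothetical key name
    "image_to_image_search": 0.012,  # hypothetical key name
}


def estimate_cost(prompt_tokens: int, output_tokens: int,
                  tool_calls: Optional[dict[str, int]] = None) -> float:
    """Estimate USD cost; output_tokens should include thinking tokens."""
    cost = (prompt_tokens * INPUT_PER_MILLION + output_tokens * OUTPUT_PER_MILLION) / 1_000_000
    for tool, calls in (tool_calls or {}).items():
        cost += calls * TOOL_FEE_PER_CALL[tool]  # raises KeyError for unknown tools
    return cost
```

For example, `estimate_cost(2000, 500, {"web_search": 2})` shows that two web searches outweigh the token charges of a short request by roughly two orders of magnitude.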

Per-tool billing (usage.tool_usage)

When this model invokes tools (web search, code interpreter, etc.) during a single request, the response includes a normalized `usage.tool_usage` map alongside the token counts. The example below shows its shape; exact field names, units, and which tools appear can vary slightly by provider:

```json
{
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "cost_usd": 0.0042,
    "tool_usage": {"web_search": 3, "code_interpreter": 1}
  }
}
```

Tool charges are already included in `cost_usd`; the counts are surfaced for transparency so you can audit per-tool billing. The `tool_usage` field is omitted when no tools were invoked.
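To audit tool charges, the `usage.tool_usage` map can be read with an empty default for responses where it is omitted. In this sketch `audit_tool_usage` is a hypothetical helper, the fees come from the Notes, and only the two keys from the example are mapped; because tool names can vary by provider, unknown keys raise instead of being silently priced.

```python
import json

# Per-call fees from the Notes. Only the keys shown in the example are mapped,
# because tool names in usage.tool_usage can vary by provider.
TOOL_FEES_USD = {"web_search": 0.015, "code_interpreter": 0.0}


def audit_tool_usage(response_body: str) -> dict[str, float]:
    """Return an estimated USD charge per tool listed in usage.tool_usage."""
    usage = json.loads(response_body).get("usage", {})
    tool_usage = usage.get("tool_usage", {})  # the field is omitted when no tools ran
    charges: dict[str, float] = {}
    for tool, calls in tool_usage.items():
        if tool not in TOOL_FEES_USD:
            raise KeyError(f"no known per-call fee for tool {tool!r}")
        charges[tool] = calls * TOOL_FEES_USD[tool]
    return charges
```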

Variants

`:variant1`

Field	Value
Model id	`qwen3-5-flash:variant1`
Model release date	2026-02-24
Region	China
Context window	1M
Weight precision	-
Max output tokens	65,536
Features	reasoning, vision, video, web_search, function_calling, agentic_coding
Native inference	No
Structured output	JSON Mode
Batch API	35% off list price
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `POST /v1beta/models/qwen3-5-flash:variant1:generateContent`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	<=128K $0.029 (was $0.090); 128K-256K $0.115; 256K-1M $0.172
Output	per 1M generated tokens	<=128K $0.287 (was $0.368); 128K-256K $1.147; 256K-1M $1.72
Web search	per query when enabled	$0.01
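The `:variant1` token rates are tiered. The sketch below assumes the tier is selected by the request's prompt-token count and that "128K" and "256K" mean 128,000 and 256,000 tokens; neither assumption is stated on this page. It also ignores the Batch API discount and web search fees.

```python
# (prompt-token upper bound, input $/1M, output $/1M) from the :variant1 pricing table.
# Assumption: tiers are selected by prompt length, and "128K" means 128,000 tokens.
VARIANT1_TIERS = [
    (128_000, 0.029, 0.287),
    (256_000, 0.115, 1.147),
    (1_000_000, 0.172, 1.72),
]


def variant1_token_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate token charges in USD for one qwen3-5-flash:variant1 request."""
    for upper_bound, input_rate, output_rate in VARIANT1_TIERS:
        if prompt_tokens <= upper_bound:
            return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    raise ValueError("prompt exceeds the 1M-token context window")
```

Under these assumptions, crossing a tier boundary raises both the input and the output rate, so a long prompt also makes every generated token roughly four times more expensive.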

Parameters

Parameter	Type	Required	Default	Description
`temperature`	number	no	`0.7`	Sampling temperature. 0 is deterministic and 2 is maximum randomness. · Range: 0 – 2
`top_p`	number	no	`0.9`	Nucleus sampling probability mass. Lower values make outputs more focused. · Range: 0 – 1
`max_tokens`	number	no	`4096`	Maximum output tokens. · Range: 1 – 65536
`stop`	string	no	-	Up to 4 strings where the model will stop generating further tokens.
`enable_thinking`	boolean	no	true	Enable reasoning before answering.
`reasoning_effort`	enum	no	`"medium"`	Reasoning effort level. `none` disables thinking; `low`, `medium`, `high`, and `max` set bounded thinking budgets sized to the selected model. Accepted as an OpenAI-style `reasoning_effort` field and translated into `enable_thinking` and `thinking_budget` for the upstream model service. · Allowed: `none`, `low`, `medium`, `high`, `max`
`thinking_budget`	number	no	`32768`	Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 80000
`vl_high_resolution_images`	boolean	no	true	Use higher resolution processing for image inputs.
`max_pixels`	number	no	`2621440`	Maximum pixel count per image when high resolution processing is disabled. · Range: 4096 – 16777216
`video_fps`	number	no	`2`	Frames per second to sample from video inputs. · Range: 0.1 – 10
`tool_web_search`	boolean	no	false	Search the web for real-time information. Adds $0.01 to the request cost when enabled.
`response_format`	enum	no	-	Return the output as a valid JSON object (JSON mode). Describe the fields you want in your prompt.
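A request body for the variant that uses the reasoning controls might be assembled as below. The helper name is hypothetical, and rejecting `reasoning_effort` and `thinking_budget` together is only a cautious design choice of this sketch, since this page does not say how the service resolves a conflict between them. Limits come from the `:variant1` parameter table.

```python
from typing import Any, Optional, Sequence, Union

ALLOWED_EFFORTS = {"none", "low", "medium", "high", "max"}


def build_variant1_payload(
    messages: list[dict[str, Any]],
    reasoning_effort: Optional[str] = None,
    thinking_budget: Optional[int] = None,
    stop: Union[str, Sequence[str], None] = None,
) -> dict[str, Any]:
    """Build a chat body for qwen3-5-flash:variant1 within the documented limits."""
    payload: dict[str, Any] = {"model": "qwen3-5-flash:variant1", "messages": messages}

    # Design choice of this sketch: control thinking through one field only.
    if reasoning_effort is not None and thinking_budget is not None:
        raise ValueError("set reasoning_effort or thinking_budget, not both")
    if reasoning_effort is not None:
        if reasoning_effort not in ALLOWED_EFFORTS:
            raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}")
        payload["reasoning_effort"] = reasoning_effort
    if thinking_budget is not None:
        if not 1 <= thinking_budget <= 80_000:
            raise ValueError("thinking_budget must be between 1 and 80000")
        payload["enable_thinking"] = True  # the budget applies only when thinking is on
        payload["thinking_budget"] = thinking_budget
    if stop is not None:
        # The table allows up to 4 stop strings.
        count = 1 if isinstance(stop, str) else len(stop)
        if count > 4:
            raise ValueError("at most 4 stop strings are allowed")
        payload["stop"] = stop if isinstance(stop, str) else list(stop)
    return payload
```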

Notes

Supports text, image, and video input. Web search is available through `tool_web_search` and adds $0.01 per query when enabled. Thinking tokens are billed as output tokens. Explicit cache controls are not supported.

Machine-readable schema: `GET https://api.empiriolabs.ai/v1/models/qwen3-5-flash`