Qwen3.5 Omni Flash | EmpirioLabs AI Docs

POST /v1/chat/completions

Cost-efficient omni-modal model handling text, image, audio, and video, with up to 3 hours of audio and 1 hour of video across 90+ languages.

At a glance

Field	Value
Model id	`qwen3-5-omni-flash`
Model release date	2026-03-30
Input modalities	Text, Image, Video, Audio
Output modalities	Text, Audio
Context window	256K
Weight precision	-
Max output tokens	32,768
Region	Singapore
Features	vision, audio_in, audio_out, multilingual, function_calling, web_search
Native inference	No
New	No
Structured output	JSON Schema
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `POST /v1/audio/speech`, `POST /v1beta/models/qwen3-5-omni-flash:generateContent`
Alternate model ids	`qwen3.5-omni-flash`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	$0.40; $3.00
Output	per 1M generated tokens	$2.20; $11.90
Web search	per request	$0.015
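
As a rough illustration of how the rates above combine, the sketch below estimates the cost of a single request. The function name `estimate_cost_usd` and its arguments are hypothetical. The defaults use the lower of the two listed rates for input and output; the table does not say which rate applies to which modality, so treat the defaults as an assumption and pass the higher rates explicitly where they apply.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens, web_searches=0,
                      input_rate=0.40, output_rate=2.20, search_rate=0.015):
    """Estimate the cost of one request in USD.

    input_rate and output_rate are USD per 1M tokens; search_rate is USD
    per web search request. The defaults are assumptions taken from the
    lower rates in the pricing table.
    """
    if min(prompt_tokens, completion_tokens, web_searches) < 0:
        raise ValueError("token and search counts must be non-negative")
    token_cost = (prompt_tokens * input_rate
                  + completion_tokens * output_rate) / 1_000_000
    return token_cost + web_searches * search_rate


# 10,000 prompt tokens, 2,000 generated tokens, one web search
print(f"{estimate_cost_usd(10_000, 2_000, web_searches=1):.4f}")
```

The `cost_usd` field returned by the API remains the authoritative figure; an estimate like this is only useful for budgeting before a request is sent.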

Example request

$ curl https://api.empiriolabs.ai/v1/chat/completions \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"model": "qwen3-5-omni-flash", "messages": [{"role":"user","content":"Hello"}]}'
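
The same request can be sketched in Python using only the standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the sketch assumes the endpoint returns a JSON body; it sends the same URL, headers, and payload as the curl command.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(prompt, api_key, model="qwen3-5-omni-flash"):
    """Build (but do not send) a POST request carrying one user message."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request):
    """Send the request and decode the response, assumed to be JSON."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To run it, read the key from the environment and call `send_chat_request(build_chat_request("Hello", os.environ["EMPIRIOLABS_API_KEY"]))`.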

Parameters

Parameter	Type	Required	Default	Description
`temperature`	number	no	`0.7`	Sampling temperature. 0 = deterministic, 2 = maximum randomness. · Range: 0 – 2
`top_p`	number	no	`0.9`	Nucleus sampling probability mass. Lower = more focused. · Range: 0 – 1
`max_tokens`	number	no	`4096`	Maximum tokens in the response. · Range: 1 – 32768
`output_mode`	enum	no	`"text"`	Output format mode. text = text only, text_audio = text plus synthesized speech. · Allowed: `text`, `text_audio`
`voice`	string	no	`"Tina"`	Voice name for audio output (when output_mode = text_audio).
`tool_web_search`	boolean	no	false	Allow the model to perform web searches when needed.
`video_fps`	number	no	`2`	Frames-per-second sampled from input video for analysis. · Range: 0.1 – 10
`vl_high_resolution_images`	boolean	no	true	Use higher resolution for input images. Better detail at higher cost.
`max_pixels`	number	no	`2621440`	Maximum pixels per input image. Larger = more detail but slower / more tokens. · Range: 1 – 99999999
`response_format`	enum	no	-	Constrain the output to JSON. Use JSON mode for any valid JSON object, or JSON schema to force output that matches a schema you provide.
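
The ranges in the table can be checked on the client before a request is sent. The sketch below is one way to do that; `build_payload`, `RANGES`, and `OUTPUT_MODES` are hypothetical names, and placing the parameters as top-level fields of the request body is an assumption, not something this page confirms.

```python
# Numeric ranges copied from the parameters table (inclusive bounds).
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 32768),
    "video_fps": (0.1, 10),
    "max_pixels": (1, 99999999),
}
OUTPUT_MODES = {"text", "text_audio"}


def build_payload(messages, **params):
    """Validate params against the documented ranges and build a body.

    Assumption: parameters are sent as top-level fields next to
    "model" and "messages".
    """
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")
    mode = params.get("output_mode", "text")
    if mode not in OUTPUT_MODES:
        raise ValueError(f"output_mode must be one of {sorted(OUTPUT_MODES)}")
    return {"model": "qwen3-5-omni-flash", "messages": messages, **params}


payload = build_payload(
    [{"role": "user", "content": "Summarize this clip."}],
    temperature=0.2,
    output_mode="text_audio",
    voice="Tina",
)
```

Parameters without a documented numeric range, such as `voice` or `tool_web_search`, are passed through unchanged.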

Notes

Audio billing

Audio is billed at a higher token rate than text, image, or video.
When audio output is enabled, output text is not charged; only audio tokens are billed.

Voice and language

55 voice timbres available
Audio output supports 29 languages and 7 dialects

Per-tool billing (usage.tool_usage)

When this model invokes tools (web search, code interpreter, etc.) inside a single request, the response carries a normalized usage.tool_usage map alongside the token counts. The example below shows the shape — exact field names, units, and which tools appear can vary slightly per provider:

"usage": {
  "prompt_tokens": 123,
  "completion_tokens": 456,
  "cost_usd": 0.0042,
  "tool_usage": {"web_search": 3, "code_interpreter": 1}
}

The tool counts are already factored into cost_usd — they are surfaced for transparency so you can audit per-tool billing. The field is omitted when no tools were invoked.
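
Because the field is omitted when no tools run, client code should treat it as optional. The sketch below reads the per-tool counts from a decoded response; `tool_usage_counts` is a hypothetical helper, and it assumes `usage` is a top-level key of the response JSON and that the map values are plain counts, as in the example above.

```python
def tool_usage_counts(response):
    """Return the usage.tool_usage map, or an empty dict when it is absent."""
    usage = response.get("usage") or {}
    return dict(usage.get("tool_usage") or {})


# Sample response fragment mirroring the documented shape.
sample = {
    "usage": {
        "prompt_tokens": 123,
        "completion_tokens": 456,
        "cost_usd": 0.0042,
        "tool_usage": {"web_search": 3, "code_interpreter": 1},
    }
}

for tool, count in tool_usage_counts(sample).items():
    print(f"{tool}: {count}")
```

Since the tool counts are already included in `cost_usd`, they should be logged for auditing and not added to the total a second time.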

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/qwen3-5-omni-flash.