Qwen3.5 Omni Flash

POST /v1/chat/completionsCost-efficient omni-modal model handling text, image, audio, and video, with up to 3 hours of audio and 1 hour of video across 90+ languages.
At a glance
Pricing
Example request
Parameters
Notes
Audio billing
- Audio is billed at a higher token rate than text/image/video
- When audio output is enabled, output text is NOT charged — only audio tokens
Voice and language
- 55 voice timbres available
- Audio output supports 29 languages, 7 dialects
Per-tool billing (usage.tool_usage)
When this model invokes tools (web search, code interpreter, etc.) inside a single request, the response carries a normalized usage.tool_usage map alongside the token counts. The example below shows the shape — exact field names, units, and which tools appear can vary slightly per provider:
The tool counts are already factored into cost_usd — they are surfaced for transparency so you can audit per-tool billing. The field is omitted when no tools were invoked.
Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/qwen3-5-omni-flash.
