Qwen3.5 Omni Plus

Alibaba Cloud · Text Generation
POST /v1/chat/completionsFlagship omni-modal model for text, image, audio, and video. 3h audio, 1h video, 90+ input and 30+ output languages, 55 voice timbres.
At a glance
Pricing
Example request
Parameters
Notes
Audio billing
- Audio is billed at a higher token rate than text/image/video
- When audio output is enabled, output text is NOT charged — only audio tokens
Voice and language
- 55 voice timbres available
- Audio output supports 29 languages, 7 dialects
Per-tool billing (usage.tool_usage)
When this model invokes tools (web search, code interpreter, etc.) inside a single request, the response carries a normalized usage.tool_usage map alongside the token counts. The example below shows the shape — exact field names, units, and which tools appear can vary slightly per provider:
The tool counts are already factored into cost_usd — they are surfaced for transparency so you can audit per-tool billing. The field is omitted when no tools were invoked.
Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/qwen3-5-omni-plus.
