Qwen3.5 Omni Plus

Alibaba Cloud · Text Generation
POST /v1/chat/completions

Flagship omni-modal model for text, image, audio, and video. Supports up to 3 hours of audio input and 1 hour of video input, 90+ input languages, 30+ output languages, and 55 voice timbres.

At a glance

| Field | Value |
| --- | --- |
| Model id | qwen3-5-omni-plus |
| Input modalities | Text, Image, Video, Audio |
| Output modalities | Text, Audio |
| Context window | 256K |
| Weight precision | - |
| Max output tokens | 32,768 |
| Region | Singapore |
| Features | vision, audio_in, audio_out, multilingual |
| Native inference | No |
| New | No |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages, POST /v1/audio/speech |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | $1.40; $11.00 |
| Output | per 1M generated tokens | $8.30; $44.00 |
| Web Search | per request | $0.015 |
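
As a rough illustration of how these rates combine, the sketch below estimates the cost of a single request. The function name `estimate_cost_usd` is hypothetical. The pricing table lists two rates per direction without stating which condition selects each, so the caller passes the rates in explicitly. The sketch also assumes the web search charge applies once per search request.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # token rates are quoted per 1M tokens
WEB_SEARCH_RATE = Decimal("0.015")    # USD per web search request (pricing table)


def estimate_cost_usd(prompt_tokens, completion_tokens,
                      input_rate, output_rate, web_search_requests=0):
    """Estimate request cost in USD from token counts and per-1M-token rates."""
    input_cost = Decimal(prompt_tokens) / TOKENS_PER_UNIT * Decimal(input_rate)
    output_cost = Decimal(completion_tokens) / TOKENS_PER_UNIT * Decimal(output_rate)
    search_cost = Decimal(web_search_requests) * WEB_SEARCH_RATE
    return input_cost + output_cost + search_cost


# Example using the lower listed rates; which tier applies is an assumption.
cost = estimate_cost_usd(10_000, 2_000, "1.40", "8.30", web_search_requests=1)
print(cost)
```

Rates are passed as strings and handled with `Decimal` so that small per-token amounts do not pick up binary floating-point error.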

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen3-5-omni-plus", "messages": [{"role":"user","content":"Hello"}]}'
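
The same request can be sketched in Python using only the standard library. The helper names `build_chat_request` and `send` are hypothetical, and the assumption that the response body is JSON is not stated on this page. The request is built but not sent here, so the snippet runs without network access.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(prompt, api_key, model="qwen3-5-omni-plus"):
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# The key is read from the environment; "sk-placeholder" is only a stand-in.
api_key = os.environ.get("EMPIRIOLABS_API_KEY", "sk-placeholder")
request = build_chat_request("Hello", api_key)
print(request.get_method(), request.full_url)
```

Calling `send(request)` with a valid key would perform the actual HTTP call.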

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| temperature | number | no | 0.7 | Sampling temperature. 0 = deterministic, 2 = maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower = more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum tokens in the response. · Range: 1 – 32768 |
| output_mode | enum | no | "text" | Output format mode. text = text only, text_audio = text plus synthesized speech. · Allowed: text, text_audio |
| voice | string | no | "Tina" | Voice name for audio output (when output_mode = text_audio). |
| tool_web_search | boolean | no | false | Allow the model to perform web searches when needed. |
| video_fps | number | no | 2 | Frames per second sampled from input video for analysis. · Range: 0.1 – 10 |
| vl_high_resolution_images | boolean | no | true | Use higher resolution for input images. Better detail at higher cost. |
| max_pixels | number | no | 2621440 | Maximum pixels per input image. Larger = more detail but slower and more tokens. · Range: 1 – 99999999 |
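
A client can check these ranges locally before sending a request. The sketch below is one way to do that. The helper names `validate_params` and `build_body` are hypothetical, and it assumes the parameters are sent as top-level fields of the JSON body next to `messages`, which this page does not confirm. The numeric ranges and the `output_mode` values are taken from the "Range" and "Allowed" entries in the table.

```python
# Ranges and allowed values copied from the parameter table.
NUMERIC_RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 32768),
    "video_fps": (0.1, 10),
    "max_pixels": (1, 99999999),
}
ALLOWED_OUTPUT_MODES = {"text", "text_audio"}
BOOLEAN_PARAMS = {"tool_web_search", "vl_high_resolution_images"}


def validate_params(params):
    """Raise ValueError for unknown parameters or values outside the table's limits."""
    for name, value in params.items():
        if name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{name} must be a number")
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name == "output_mode":
            if value not in ALLOWED_OUTPUT_MODES:
                raise ValueError("output_mode must be 'text' or 'text_audio'")
        elif name in BOOLEAN_PARAMS:
            if not isinstance(value, bool):
                raise ValueError(f"{name} must be a boolean")
        elif name == "voice":
            if not isinstance(value, str):
                raise ValueError("voice must be a string")
        else:
            raise ValueError(f"unknown parameter: {name}")
    return params


def build_body(messages, **params):
    # Assumption: parameters travel as top-level JSON fields next to "messages".
    return {"model": "qwen3-5-omni-plus", "messages": messages, **validate_params(params)}


body = build_body(
    [{"role": "user", "content": "Hello"}],
    temperature=0.2, output_mode="text_audio", voice="Tina",
)
```

Omitted parameters are left out of the body, so the server-side defaults listed in the table apply.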

Notes

Audio billing

  • Audio is billed at a higher token rate than text/image/video
  • When audio output is enabled, output text is NOT charged — only audio tokens

Voice and language

  • 55 voice timbres available
  • Audio output supports 29 languages, 7 dialects

Per-tool billing (usage.tool_usage)

When this model invokes tools (web search, code interpreter, etc.) inside a single request, the response carries a normalized usage.tool_usage map alongside the token counts. The example below shows the shape — exact field names, units, and which tools appear can vary slightly per provider:

"usage": {
  "prompt_tokens": 123,
  "completion_tokens": 456,
  "cost_usd": 0.0042,
  "tool_usage": {"web_search": 3, "code_interpreter": 1}
}

The tool counts are already factored into cost_usd — they are surfaced for transparency so you can audit per-tool billing. The field is omitted when no tools were invoked.
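
Because `tool_usage` can be absent, client code that audits billing should not assume the key exists. The sketch below shows one way to read the usage object defensively. The function name `summarize_usage` is hypothetical, and it assumes the per-tool values are integer call counts, as in the example above. Since units can vary per provider, treat the summed total as indicative only.

```python
def summarize_usage(usage):
    """Return token totals, cost, and per-tool counts from a usage object."""
    # tool_usage is omitted when no tools were invoked, so default to empty.
    tool_usage = usage.get("tool_usage") or {}
    return {
        "total_tokens": usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0),
        "cost_usd": usage.get("cost_usd"),
        # Assumption: each value is a call count that can be summed.
        "tool_calls": sum(tool_usage.values()),
        "tools": dict(tool_usage),
    }


example_usage = {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "cost_usd": 0.0042,
    "tool_usage": {"web_search": 3, "code_interpreter": 1},
}
summary = summarize_usage(example_usage)
print(summary)
```

`cost_usd` is passed through unchanged rather than recomputed, since the tool charges are already included in it.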


Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/qwen3-5-omni-plus.