Qwen3.5-Omni-Plus

Provider: Alibaba Cloud
Category: Text Generation
Endpoint: POST /v1/chat/completions
Context window: 256K
Served from: Singapore

Flagship omni-modal model for text, image, audio, and video. Handles up to 3 hours of audio and 1 hour of video, with 90+ input languages, 30+ output languages, and 55 voice timbres.

At a glance

Model id: qwen3-5-omni-plus
Input modalities: text, image, video, audio
Output modalities: text, audio
Context window: 256K
Region: Singapore
Features: vision, audio_in, audio_out, multilingual
New: No
Native inference: No

Pricing

Charge | Spec | Rate
Input (text/image/video) | per 1M tokens | $1.40
Input (audio) | per 1M tokens | $11.00
Output (text only) | per 1M tokens | $8.30
Output (text + audio) | per 1M tokens | $44.00
Web Search | per request | $0.015
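
The rates above can be combined into a rough per-request estimate. The sketch below is illustrative only: the function and dictionary names are hypothetical, and it assumes that every output token is billed at the single rate matching the chosen output_mode and that web search is charged once per request when enabled. Neither billing detail is confirmed on this page.

```python
# Rates in USD per 1M tokens, copied from the pricing table.
RATES_PER_MILLION = {
    "input_text_image_video": 1.40,
    "input_audio": 11.00,
    "output_text": 8.30,
    "output_text_audio": 44.00,
}
WEB_SEARCH_PER_REQUEST = 0.015  # USD


def estimate_cost(input_tokens=0, input_audio_tokens=0, output_tokens=0,
                  output_mode="text", web_search=False):
    """Return an estimated cost in USD for one request.

    Assumption: all output tokens are billed at the rate for output_mode.
    """
    if output_mode not in ("text", "text_audio"):
        raise ValueError(f"unknown output_mode: {output_mode!r}")
    output_key = "output_text_audio" if output_mode == "text_audio" else "output_text"
    cost = (
        input_tokens * RATES_PER_MILLION["input_text_image_video"]
        + input_audio_tokens * RATES_PER_MILLION["input_audio"]
        + output_tokens * RATES_PER_MILLION[output_key]
    ) / 1_000_000
    if web_search:
        # Assumption: one web search charge per request.
        cost += WEB_SEARCH_PER_REQUEST
    return cost
```

Under these assumptions, a request with 10,000 text input tokens and 2,000 text-only output tokens comes to 10,000 × $1.40 / 1M + 2,000 × $8.30 / 1M = $0.0306.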

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen3-5-omni-plus", "messages": [{"role": "user", "content": "Hello"}]}'

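For callers who prefer Python over curl, the same request can be sketched with the standard library alone. The helper names build_chat_request and send are hypothetical. The response format is not documented on this page, so the sketch returns the raw response text rather than assuming a particular structure.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(prompt, api_key, model="qwen3-5-omni-plus"):
    """Build (but do not send) the POST request shown in the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Usage (performs a real network call, so it is left commented out):
# import os
# print(send(build_chat_request("Hello", os.environ["EMPIRIOLABS_API_KEY"])))
```

Splitting construction from sending keeps the request easy to inspect or log before any network traffic happens.
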
Parameters

Parameter | Type | Required | Default | Description
temperature | number | no | 0.7 | Sampling temperature. · Range: 0 – 2
top_p | number | no | 0.9 | Nucleus sampling. · Range: 0 – 1
max_tokens | number | no | 4096 | Max output tokens. · Range: 1 – 32768
output_mode | enum | no | "text" | text: text-only response. text_audio: stream both text and synthesized speech. · Allowed: text, text_audio
voice | enum | no | "Cherry" | Voice timbre. Only when output_mode=text_audio. 55+ Qwen Omni voices available. · Allowed: Cherry, Ethan, Chelsie, Serena, Dylan, Jada, Sunny, Eric, Eric_FR, Eric_RU, Eric_DE, Eric_IT, Eric_ES, Eric_PT, Eric_JA, Eric_KO, Eric_AR, Eric_TH, Eric_VI, Eric_ID, Eric_TR, Eric_PL, Eric_NL, Eric_DA, Eric_FI, Eric_SV, Eric_CS, Eric_HU, Eric_RO, Eric_HE, Cherry_FR, Cherry_RU, Cherry_DE, Cherry_IT, Cherry_ES, Cherry_PT, Cherry_JA, Cherry_KO, Cherry_AR, Cherry_TH, Cherry_VI, Cherry_ID, Cherry_TR, Cherry_PL, Cherry_NL, Cherry_DA, Cherry_FI, Cherry_SV, Cherry_CS, Cherry_HU, Cherry_RO, Cherry_HE
enable_web_search | boolean | no | false | Allow real-time web search.
video_fps | number | no | 2 | Frames per second extracted from video input. · Range: 0.1 – 10
vl_high_resolution_images | boolean | no | true | Vision: process images at high resolution.
max_pixels | number | no | 2621440 | Vision pixel cap. Only when vl_high_resolution_images=false. · Range: 4096 – 16777216
disable_formatting | boolean | no | false | Return raw upstream response.
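
The ranges and the two "only when" conditions in the table can be checked on the client before a request is sent. The following is a minimal sketch with a hypothetical helper, build_payload. It assumes the parameters are passed as top-level JSON fields next to model and messages, and that the ranges are inclusive. This page confirms neither, nor does it say whether the server rejects or ignores a parameter that does not apply.

```python
# Numeric ranges copied from the parameter table (assumed inclusive).
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 32768),
    "video_fps": (0.1, 10),
    "max_pixels": (4096, 16777216),
}
OUTPUT_MODES = {"text", "text_audio"}


def build_payload(messages, **params):
    """Validate params against the table and return a request body dict."""
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")

    output_mode = params.get("output_mode", "text")
    if output_mode not in OUTPUT_MODES:
        raise ValueError(f"unknown output_mode: {output_mode!r}")

    # voice applies only when output_mode=text_audio.
    if "voice" in params and output_mode != "text_audio":
        raise ValueError("voice applies only when output_mode is text_audio")

    # max_pixels applies only when vl_high_resolution_images=false
    # (the documented default for that flag is true).
    if "max_pixels" in params and params.get("vl_high_resolution_images", True):
        raise ValueError(
            "max_pixels applies only when vl_high_resolution_images is false"
        )

    # Assumption: parameters sit at the top level of the JSON body.
    return {"model": "qwen3-5-omni-plus", "messages": messages, **params}
```

For example, build_payload(messages, output_mode="text_audio", voice="Ethan") passes validation, while build_payload(messages, voice="Ethan") raises ValueError because output_mode defaults to text.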

A live, machine-readable schema is also available at GET https://api.empiriolabs.ai/v1/models/qwen3-5-omni-plus.
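
Fetching that schema from Python might look like the sketch below. The helper names are hypothetical. The sketch assumes the endpoint returns JSON and treats the API key as optional, since this page does not say whether the schema endpoint requires authentication.

```python
import json
import urllib.request

MODELS_URL = "https://api.empiriolabs.ai/v1/models/"


def build_schema_request(model_id="qwen3-5-omni-plus", api_key=None):
    """Build the GET request for a model's schema."""
    # Assumption: an Authorization header is sent only if a key is supplied.
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return urllib.request.Request(MODELS_URL + model_id, headers=headers, method="GET")


def fetch_schema(model_id="qwen3-5-omni-plus", api_key=None):
    """Fetch and parse the schema (assumes a JSON response body)."""
    with urllib.request.urlopen(build_schema_request(model_id, api_key)) as response:
        return json.loads(response.read())
```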