Qwen3.5 Flash

Alibaba Cloud · Text Generation
POST /v1/chat/completions

Vision-language model that combines hybrid linear attention with a sparse mixture-of-experts (MoE) architecture. It offers a 1M-token context window and fast inference over text, image, and video inputs.

At a glance

| Field | Value |
| --- | --- |
| Model id | qwen3-5-flash |
| Input modalities | Text, Image, Video |
| Output modalities | Text |
| Context window | 1M tokens |
| Weight precision | - |
| Max output tokens | 32,768 |
| Region | Singapore |
| Features | vision, web_search, code_interpreter, function_calling |
| Native inference | No |
| New | No |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | $0.090 (was $0.10) |
| Output | per 1M generated tokens | $0.368 (was $0.40) |
| Web Search | per call | $0.015 |
| Image Search | per call | $0.012 |
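
As a rough check on the rates above, the following sketch estimates the cost of one request. The helper name `estimate_cost` and its arguments are illustrative and not part of the API. It uses the current (discounted) rates, and `completion_tokens` should include thinking tokens, since those are billed as output.

```python
# Rates copied from the pricing table above (USD).
INPUT_PER_MILLION = 0.090
OUTPUT_PER_MILLION = 0.368
WEB_SEARCH_PER_CALL = 0.015
IMAGE_SEARCH_PER_CALL = 0.012


def estimate_cost(prompt_tokens, completion_tokens, web_searches=0, image_searches=0):
    """Return an estimated request cost in USD (hypothetical helper)."""
    token_cost = (
        prompt_tokens * INPUT_PER_MILLION + completion_tokens * OUTPUT_PER_MILLION
    ) / 1_000_000
    tool_cost = web_searches * WEB_SEARCH_PER_CALL + image_searches * IMAGE_SEARCH_PER_CALL
    return token_cost + tool_cost


# 20,000 prompt tokens, 1,500 generated tokens, and one web search.
print(f"${estimate_cost(20_000, 1_500, web_searches=1):.6f}")  # prints $0.017352
```

The authoritative figure for a real request is the cost reported in the response; this estimate is only useful for budgeting ahead of time.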

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen3-5-flash", "messages": [{"role":"user","content":"Hello"}]}'
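
The same request can be sketched in Python with only the standard library. The helper names `build_request` and `send` are illustrative. The response is assumed to be a JSON body, which this page implies but does not spell out.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_request(api_key, prompt, model="qwen3-5-flash"):
    """Build (but do not send) the POST request from the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the parsed JSON response (assumed format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a network call and needs a valid key):
#   import os
#   reply = send(build_request(os.environ["EMPIRIOLABS_API_KEY"], "Hello"))
#   print(json.dumps(reply, indent=2))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any tokens are billed.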

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| temperature | number | no | 0.7 | Sampling temperature. 0 = deterministic, 2 = maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower = more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum tokens in the response. · Range: 1 – 32768 |
| enable_thinking | boolean | no | true | Enable extended thinking mode. Slower but improves reasoning-heavy tasks. |
| vl_high_resolution_images | boolean | no | true | Use higher resolution for input images. Better detail at higher cost. |
| max_pixels | number | no | 2621440 | Maximum pixels per input image. Larger = more detail but slower / more tokens. · Range: 1 – 99999999 |
| tool_web_search | boolean | no | false | Search the web for real-time information. |
| tool_web_extractor | boolean | no | true | Extract and read content from URLs. Requires Web Search and Thinking. |
| tool_code_interpreter | boolean | no | true | Run Python code in a sandbox. Requires Thinking. |
| tool_web_search_image | boolean | no | true | Search the web for images from text descriptions. |
| tool_image_search | boolean | no | true | Find similar images from an uploaded image. |
| video_fps | number | no | 2 | Frames-per-second sampled from input video for analysis. · Range: 0.1 – 10 |
| treat_images_as_video | boolean | no | false | Treat a sequence of input images as a video for temporal reasoning. |
| disable_formatting | boolean | no | false | Skip the EmpirioLabs Markdown formatting (citation [N] rewriting + References block when web search / tools were used). The raw upstream answer with plain [N] citations is returned. |
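
The ranges and tool dependencies in the table above can be checked on the client before a request is sent. The sketch below is one possible way to do that. `check_params` is a hypothetical helper, and how the service itself reacts to out-of-range or conflicting values is not documented here. Dependencies are only checked for tools the caller explicitly enables.

```python
# Ranges and tool dependencies transcribed from the parameter table above.
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 32768),
    "max_pixels": (1, 99999999),
    "video_fps": (0.1, 10),
}
# Documented defaults for the two settings that other tools depend on.
DEFAULTS = {"enable_thinking": True, "tool_web_search": False}
REQUIRES = {
    "tool_web_extractor": ["tool_web_search", "enable_thinking"],
    "tool_code_interpreter": ["enable_thinking"],
}


def check_params(params):
    """Return a list of problems found in a parameter dict (hypothetical helper)."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name}={params[name]} is outside {low} to {high}")
    effective = {**DEFAULTS, **params}
    for tool, needed in REQUIRES.items():
        if params.get(tool):  # only tools the caller explicitly turned on
            for dep in needed:
                if not effective.get(dep):
                    problems.append(f"{tool} requires {dep}=true")
    return problems


print(check_params({"temperature": 2.5, "tool_web_extractor": True}))
```

An empty list means the dict passed every check encoded here; it does not guarantee that the API will accept the request.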

Notes

Built-in tools (billed only when invoked)

  • Web search: $0.015/call
  • Web extractor: free
  • Code interpreter: free
  • Text-to-image search: $0.012/call
  • Image-to-image search: $0.012/call

Other

  • Thinking tokens are billed as output tokens

Text-to-Image Search and Image-to-Image Search use the Image Search pricing row. Each invoked image search is billed at that listed per-call rate.

Per-tool billing (usage.tool_usage)

When this model invokes tools (web search, code interpreter, etc.) inside a single request, the response carries a normalized usage.tool_usage map alongside the token counts. The example below shows the shape — exact field names, units, and which tools appear can vary slightly per provider:

"usage": {
  "prompt_tokens": 123,
  "completion_tokens": 456,
  "cost_usd": 0.0042,
  "tool_usage": {"web_search": 3, "code_interpreter": 1}
}

The tool counts are already factored into cost_usd — they are surfaced for transparency so you can audit per-tool billing. The field is omitted when no tools were invoked.
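
For auditing, the per-tool charges can be recomputed from usage.tool_usage. The sketch below assumes the map holds call counts keyed by the tool names shown in the example, and that the per-call rates from the Notes section apply. Both are assumptions, since field names and units can vary per provider. `tool_charges` is a hypothetical helper.

```python
# Per-call rates from the Notes section (USD); free tools are listed at 0.
# The key names are assumed to match the example response above.
TOOL_RATES_USD = {"web_search": 0.015, "code_interpreter": 0.0}


def tool_charges(usage, rates=TOOL_RATES_USD):
    """Return {tool: charge_usd} for a usage object (hypothetical helper)."""
    counts = usage.get("tool_usage", {})  # field is omitted when no tools ran
    charges = {}
    for tool, count in counts.items():
        if tool not in rates:
            raise KeyError(f"no rate configured for tool {tool!r}")
        charges[tool] = count * rates[tool]
    return charges


sample_usage = {
    "prompt_tokens": 900,
    "completion_tokens": 300,
    "tool_usage": {"web_search": 2, "code_interpreter": 1},
}
charges = tool_charges(sample_usage)
print(charges, "total:", round(sum(charges.values()), 6))
```

Raising on an unknown tool name, rather than silently treating it as free, keeps the audit from under-reporting when a provider adds or renames a tool.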

Variants

:variant1

| Field | Value |
| --- | --- |
| Model id | qwen3-5-flash:variant1 |
| Region | China |
| Context window | 1M tokens |
| Weight precision | - |
| Max output tokens | 65,536 |
| Features | reasoning, vision, video, web_search, function_calling, structured_output, agentic_coding |
| Native inference | No |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | <=128K $0.029 (was $0.090); 128K-256K $0.115; 256K-1M $0.172 |
| Output | per 1M generated tokens | <=128K $0.287 (was $0.368); 128K-256K $1.147; 256K-1M $1.72 |
| Web search | per query when enabled | $0.01 |
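
The tiered rates above can be turned into a rough estimator. This page does not say how a tier is selected, so the sketch makes two assumptions: the tier is chosen by the request's prompt token count and applies to all input and output tokens of that request, and "K" means 1,000 tokens. `variant1_token_cost` is a hypothetical helper and excludes the $0.01 web search charge.

```python
# (upper bound on prompt tokens, input rate, output rate); rates in USD per 1M tokens.
# Assumption: "K" means 1,000 tokens and the tier is chosen by prompt length.
TIERS = [
    (128_000, 0.029, 0.287),
    (256_000, 0.115, 1.147),
    (1_000_000, 0.172, 1.72),
]


def variant1_token_cost(prompt_tokens, completion_tokens):
    """Estimate token cost in USD for qwen3-5-flash:variant1 (hypothetical helper)."""
    for limit, input_rate, output_rate in TIERS:
        if prompt_tokens <= limit:
            return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000
    raise ValueError("prompt exceeds the 1M-token context window")


print(round(variant1_token_cost(100_000, 2_000), 6))  # lowest tier
print(round(variant1_token_cost(200_000, 2_000), 6))  # middle tier
```

Under these assumptions, crossing from 128K to just above it roughly quadruples the per-token rate for the whole request, so trimming long prompts near a tier boundary may be worthwhile.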

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| temperature | number | no | 0.7 | Sampling temperature. 0 is deterministic and 2 is maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower values make outputs more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. · Range: 1 – 65536 |
| stop | string | no | - | Up to 4 strings where the model will stop generating further tokens. |
| enable_thinking | boolean | no | true | Enable reasoning before answering. |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field, translated into enable_thinking and thinking_budget for the model service. · Allowed: none, low, medium, high, max |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 80000 |
| response_format | object | no | - | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode for strict schemas. |
| vl_high_resolution_images | boolean | no | true | Use higher resolution processing for image inputs. |
| max_pixels | number | no | 2621440 | Maximum pixel count per image when high resolution processing is disabled. · Range: 4096 – 16777216 |
| video_fps | number | no | 2 | Frames per second to sample from video inputs. · Range: 0.1 – 10 |
| tool_web_search | boolean | no | false | Search the web for real-time information. Adds $0.01 to the request cost when enabled. |
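
To illustrate the response_format note, the sketch below builds a request body that asks for schema-constrained JSON with thinking turned off. It assumes the parameters are sent as top-level fields of the request body and that response_format follows the OpenAI json_schema layout. The page says the format is OpenAI-compatible but does not show it. `build_structured_payload` and the sample schema are illustrative.

```python
import json


def build_structured_payload(prompt, schema, schema_name="result"):
    """Build a chat-completions body requesting schema-constrained JSON."""
    return {
        "model": "qwen3-5-flash:variant1",
        "messages": [{"role": "user", "content": prompt}],
        # The table above recommends non-thinking mode for strict schemas.
        "enable_thinking": False,
        "max_tokens": 1024,
        # Assumed OpenAI-style layout for a JSON schema response format.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": schema_name, "strict": True, "schema": schema},
        },
    }


city_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "country": {"type": "string"}},
    "required": ["city", "country"],
    "additionalProperties": False,
}
payload = build_structured_payload("Where is the Eiffel Tower?", city_schema)
print(json.dumps(payload, indent=2))
```

The resulting dict can be serialized and posted to /v1/chat/completions in the same way as the basic example request.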

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/qwen3-5-flash.