Qwen3-Max-Thinking

Provider: Alibaba Cloud
Category: Text Generation
Endpoint: POST /v1/chat/completions
Context window: 256K
Served from: Singapore

Reasoning model with adaptive tool use (search, memory, code interpreter) and test-time scaling for higher accuracy on complex tasks.

At a glance

Model id: qwen3-max-thinking
Input modalities: text
Output modalities: text
Context window: 256K
Region: Singapore
Features: reasoning, code_interpreter, web_search, thinking
New: No
Native inference: No

Pricing

Charge  Spec                      Rate
Input   ≤32K, per 1M tokens       $1.08 (was $1.20)
Input   32K-128K, per 1M tokens   $2.16 (was $2.40)
Input   128K-256K, per 1M tokens  $2.70 (was $3.00)
Output  ≤32K, per 1M tokens       $5.52 (was $6.00)
Output  32K-128K, per 1M tokens   $11.04 (was $12.00)
Output  128K-256K, per 1M tokens  $13.80 (was $15.00)
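As a worked example of the tiered rates above, a minimal cost estimator in Python (a sketch, not billing logic: it assumes the tier is selected by the request's input-token count, that the 32K/128K/256K boundaries fall at the usual powers of two, and that prices are in USD):

```python
# Tiered rates from the pricing table, in USD per 1M tokens.
# Assumption: the tier is chosen by the request's input-token count, and the
# 32K/128K/256K boundaries are taken as 32768/131072/262144 for illustration.
INPUT_RATES = [(32_768, 1.08), (131_072, 2.16), (262_144, 2.70)]
OUTPUT_RATES = [(32_768, 5.52), (131_072, 11.04), (262_144, 13.80)]

def tier_rate(input_tokens: int, rates: list[tuple[int, float]]) -> float:
    """Return the per-1M-token rate for the tier containing input_tokens."""
    for limit, rate in rates:
        if input_tokens <= limit:
            return rate
    raise ValueError("request exceeds the 256K context window")

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under the assumptions above."""
    return (input_tokens / 1e6 * tier_rate(input_tokens, INPUT_RATES)
            + output_tokens / 1e6 * tier_rate(input_tokens, OUTPUT_RATES))

# A 20K-token prompt with a 2K-token answer:
# 0.02 * 1.08 + 0.002 * 5.52 = 0.03264 USD
print(f"${estimate_cost(20_000, 2_000):.4f}")
```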

Example request

$ curl https://api.empiriolabs.ai/v1/chat/completions \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{"model": "qwen3-max-thinking", "messages": [{"role":"user","content":"Hello"}]}'
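The same call from Python using only the standard library (a sketch: the endpoint and model id are taken from this page, and reading choices[0].message.content assumes the usual chat-completions response shape):

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"

def build_request(api_key: str, user_message: str) -> urllib.request.Request:
    """Build the same chat-completions request as the curl example."""
    payload = {
        "model": "qwen3-max-thinking",
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__" and "EMPIRIOLABS_API_KEY" in os.environ:
    req = build_request(os.environ["EMPIRIOLABS_API_KEY"], "Hello")
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        # Chat-completions responses carry the generated text here:
        print(body["choices"][0]["message"]["content"])
```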

Parameters

Parameter           Type     Required  Default  Description
temperature         number   no        0.7      Sampling temperature · Range: 0 – 2
top_p               number   no        1        Nucleus sampling · Range: 0 – 1
max_tokens          number   no        4096     Max output tokens · Range: 1 – 65536
frequency_penalty   number   no        0        Range: -2 – 2
presence_penalty    number   no        0        Range: -2 – 2
stream              boolean  no        false    Server-Sent Events streaming
stop                string   no        -        Comma-separated stop sequences
disable_formatting  boolean  no        false    Return raw upstream response with no formatting wrappers
enable_thinking     boolean  no        true     Reason step-by-step before answering
thinking_budget     number   no        32768    Tokens reserved for thinking · Range: 1 – 393216
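The numeric ranges above can be checked client-side before a request is sent. A minimal sketch (the parameter names and bounds come from the table; the validate_params helper itself is hypothetical, not part of the API):

```python
# Bounds taken from the parameter table; the validator is illustrative only.
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 65536),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
    "thinking_budget": (1, 393216),
}

def validate_params(params: dict) -> dict:
    """Raise ValueError for any numeric parameter outside its documented range."""
    for name, value in params.items():
        if name in RANGES:
            lo, hi = RANGES[name]
            if not lo <= value <= hi:
                raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
    return params

payload = validate_params({
    "model": "qwen3-max-thinking",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "enable_thinking": True,
    "thinking_budget": 32768,
})
```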

A live, machine-readable schema is also available at GET https://api.empiriolabs.ai/v1/models/qwen3-max-thinking.