Qwen3.7 Max
Alibaba Cloud · Text Generation
POST /v1/chat/completions

Qwen3.7 Max is a flagship text model built for coding, productivity, long-running agents, deep thinking, and tool use, with a 1M-token context window.

At a glance

| Field | Value |
| --- | --- |
| Model id | qwen3-7-max |
| Input modalities | Text |
| Output modalities | Text |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 65,536 |
| Region | Singapore |
| Features | reasoning, web_search, code_interpreter, function_calling, agentic_coding |
| Native inference | No |
| New | Yes |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | $2.50 |
| Output | per 1M generated tokens | $7.50 |
| Web search | per call when invoked | $0.02 |
| Web extractor | per call when invoked | $0.02 |
| Code interpreter | per call when invoked | $0.02 |
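
As a rough illustration of how these rates combine, the sketch below estimates the cost of a single request. The helper name `estimate_cost_usd` is hypothetical and not part of the API; it assumes every built-in tool call is billed at the flat per-call rate listed above.

```python
from decimal import Decimal

# Rates copied from the pricing table above (USD).
INPUT_PER_MILLION = Decimal("2.50")
OUTPUT_PER_MILLION = Decimal("7.50")
TOOL_CALL_RATE = Decimal("0.02")  # web search, web extractor, code interpreter


def estimate_cost_usd(prompt_tokens, output_tokens, tool_calls=0):
    """Hypothetical helper: estimate the cost of one request in USD.

    output_tokens should include thinking tokens, since the Notes section
    says they are billed as output tokens.
    """
    million = Decimal(1_000_000)
    token_cost = (
        prompt_tokens * INPUT_PER_MILLION + output_tokens * OUTPUT_PER_MILLION
    ) / million
    return token_cost + tool_calls * TOOL_CALL_RATE


print(estimate_cost_usd(12_000, 3_000, tool_calls=2))
```

With these rates, 12,000 prompt tokens, 3,000 output tokens, and two tool calls come to $0.0925.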

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen3-7-max", "messages": [{"role":"user","content":"Hello"}]}'
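
For readers calling the endpoint from Python, the same request can be sketched with the standard library alone. The function names `build_chat_request` and `send` are illustrative, not part of any EmpirioLabs SDK, and the response is assumed to be a JSON body.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(prompt, api_key):
    """Build (but do not send) the same POST request as the curl example."""
    body = {
        "model": "qwen3-7-max",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the response, assumed to be JSON."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call would look like `send(build_chat_request("Hello", os.environ["EMPIRIOLABS_API_KEY"]))` after `import os`; keeping the key in an environment variable mirrors the curl example.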

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| temperature | number | no | 0.7 | Sampling temperature. 0 is deterministic and 2 is maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower values make outputs more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. · Range: 1 – 65536 |
| stop | string | no | - | Up to 4 strings where the model will stop generating further tokens. |
| enable_thinking | boolean | no | true | Enable reasoning before answering. |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. · Allowed: none, low, medium, high, max |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 64000 |
| tool_web_search | boolean | no | false | Search the web for real-time information. Adds $0.02 to the request cost for each invoked web search call. |
| tool_web_extractor | boolean | no | false | Extract and read content from URLs. Requires Web Search and Thinking. Adds $0.02 to the request cost for each invoked web extractor call. |
| tool_code_interpreter | boolean | no | false | Run Python code in a sandbox. Requires Thinking. Adds $0.02 to the request cost for each invoked code interpreter call. |
| disable_formatting | boolean | no | false | Return raw provider-style output without EmpirioLabs source formatting where supported. |
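
The ranges and tool prerequisites in this table can be checked on the client before a request is sent. The sketch below is one possible way to do that; `build_payload` is a hypothetical helper, and it assumes the parameters are passed as top-level fields of the JSON body, which this page does not state explicitly. The server's own validation may differ.

```python
def build_payload(messages, **params):
    """Hypothetical helper: check optional parameters against the table above."""
    # Numeric ranges as listed in the Parameters table.
    ranges = {
        "temperature": (0, 2),
        "top_p": (0, 1),
        "max_tokens": (1, 65536),
        "thinking_budget": (1, 64000),
    }
    for name, (low, high) in ranges.items():
        if name in params and not low <= params[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    effort = params.get("reasoning_effort", "medium")
    if effort not in {"none", "low", "medium", "high", "max"}:
        raise ValueError("reasoning_effort must be none, low, medium, high, or max")

    # Thinking is on by default; reasoning_effort "none" disables it.
    thinking = params.get("enable_thinking", True) and effort != "none"

    if params.get("tool_web_extractor") and not (
        params.get("tool_web_search") and thinking
    ):
        raise ValueError("tool_web_extractor requires tool_web_search and thinking")
    if params.get("tool_code_interpreter") and not thinking:
        raise ValueError("tool_code_interpreter requires thinking")

    # Assumption: parameters travel as top-level fields of the request body.
    return {"model": "qwen3-7-max", "messages": messages, **params}
```

For example, `build_payload(messages, tool_web_extractor=True)` raises `ValueError` because web search is not enabled, while adding `tool_web_search=True` passes the check.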

Notes

Text input only. Web search, web extractor, and code interpreter are optional built-in tools exposed through tool_* parameters. Each built-in tool call adds $0.02 when invoked. Thinking tokens are billed as output tokens.

Per-tool billing (usage.tool_usage)

When this model invokes built-in tools inside a single request, the response carries a normalized usage.tool_usage map alongside the token counts. Tool counts are already factored into cost_usd and are surfaced for transparency.
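
This page does not show the exact shape of `usage.tool_usage`, so the following is only a sketch under stated assumptions: that the map pairs a tool name with the number of invoked calls, and that `cost_usd` sits next to it inside `usage`. The sample response and the helper `tool_charges` are hypothetical.

```python
# Hypothetical response fragment: field placement and tool names are assumptions.
sample_response = {
    "usage": {
        "prompt_tokens": 1200,
        "completion_tokens": 800,
        "cost_usd": 0.049,
        "tool_usage": {"web_search": 2, "code_interpreter": 0},
    }
}


def tool_charges(response, rate_per_call=0.02):
    """Break out the per-tool share of the cost from the assumed count map."""
    usage = response.get("usage", {})
    counts = usage.get("tool_usage") or {}
    return {tool: round(count * rate_per_call, 4) for tool, count in counts.items()}


print(tool_charges(sample_response))
```

Because tool counts are already included in `cost_usd`, a breakdown like this is for reporting only and should not be added to the total again.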

Variants

:variant1

| Field | Value |
| --- | --- |
| Model id | qwen3-7-max:variant1 |
| Region | China |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 65,536 |
| Features | reasoning, web_search, code_interpreter, function_calling, agentic_coding |
| Native inference | No |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | $1.65 (was $2.50) |
| Output | per 1M generated tokens | $4.951 (was $7.50) |
| Web search | per call when invoked | $0.01 |
| Web extractor | per call when invoked | $0.01 |
| Code interpreter | per call when invoked | $0.01 |
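
To see what the variant pricing means for a given workload, the token rates of the two model ids can be compared side by side. This is a minimal sketch; the helper names are hypothetical, the rates are copied exactly as listed on this page, and tool call charges are left out.

```python
from decimal import Decimal

# USD rates per 1M tokens, copied as listed in the two pricing tables.
RATES = {
    "qwen3-7-max": {"input": Decimal("2.50"), "output": Decimal("7.50")},
    "qwen3-7-max:variant1": {"input": Decimal("1.65"), "output": Decimal("4.951")},
}


def token_cost_usd(model_id, prompt_tokens, output_tokens):
    """Token-only cost in USD for one model id (tool calls excluded)."""
    rates = RATES[model_id]
    total = prompt_tokens * rates["input"] + output_tokens * rates["output"]
    return total / Decimal(1_000_000)


def variant_saving_usd(prompt_tokens, output_tokens):
    """Difference between the base model and :variant1 for the same workload."""
    base = token_cost_usd("qwen3-7-max", prompt_tokens, output_tokens)
    variant = token_cost_usd("qwen3-7-max:variant1", prompt_tokens, output_tokens)
    return base - variant
```

Price is not the only difference between the two ids: the variant is served from the China region rather than Singapore, which may matter for latency or data residency.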

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| temperature | number | no | 0.7 | Sampling temperature. 0 is deterministic and 2 is maximum randomness. · Range: 0 – 2 |
| top_p | number | no | 0.9 | Nucleus sampling probability mass. Lower values make outputs more focused. · Range: 0 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens. · Range: 1 – 65536 |
| stop | string | no | - | Up to 4 strings where the model will stop generating further tokens. |
| enable_thinking | boolean | no | true | Enable reasoning before answering. |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. · Allowed: none, low, medium, high, max |
| thinking_budget | number | no | 32768 | Maximum tokens reserved for reasoning when thinking is enabled. · Range: 1 – 64000 |
| tool_web_search | boolean | no | false | Search the web for real-time information. Adds $0.01 to the request cost for each invoked web search call. |
| tool_web_extractor | boolean | no | false | Extract and read content from URLs. Requires Web Search and Thinking. Adds $0.01 to the request cost for each invoked web extractor call. |
| tool_code_interpreter | boolean | no | false | Run Python code in a sandbox. Requires Thinking. Adds $0.01 to the request cost for each invoked code interpreter call. |
| disable_formatting | boolean | no | false | Return raw provider-style output without EmpirioLabs source formatting where supported. |

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/qwen3-7-max.