GLM-5.1

Z.ai · Text Generation
/v1/chat/completions

Long-context reasoning model from Zhipu AI (Z.ai) with a 202K-token context window, up to 128K output tokens, tool calling, structured output, and cache support.

At a glance

| Field | Value |
| --- | --- |
| Model id | glm-5-1 |
| Input modalities | Text |
| Output modalities | Text |
| Context window | 202K |
| Weight precision | - |
| Region | China |
| Features | reasoning, function_calling, structured_output, cache |
| Native inference | No |
| New | Yes |
| Supported endpoints | /v1/chat/completions, /v1/responses, /v1/messages |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | <=32K: $0.825 (was $1.40); 32K-200K: $1.10 (was $1.40) |
| Output | per 1M generated tokens | <=32K: $3.301 (was $4.40); 32K-200K: $3.851 (was $4.40) |
| Implicit cache read | per 1M cached input tokens | <=32K: $0.165 (was $0.26); 32K-200K: $0.22 (was $0.26) |
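
The tiered rates above can be turned into a rough per-request cost estimate. The sketch below is illustrative only: `estimate_cost`, `RATES`, and `TIER_BOUNDARY` are hypothetical names, and it assumes that the tier is selected by the total prompt length, that "32K" means 32,000 tokens, and that cached prompt tokens are billed at the cache-read rate instead of the input rate. None of these billing rules is confirmed by this page.

```python
# Discounted rates from the pricing table, in USD per 1M tokens.
# The tier keys are names used by this sketch, not API identifiers.
RATES = {
    "short": {"input": 0.825, "output": 3.301, "cache_read": 0.165},  # <=32K
    "long": {"input": 1.10, "output": 3.851, "cache_read": 0.22},     # 32K-200K
}

TIER_BOUNDARY = 32_000  # assumption: "32K" means 32,000 prompt tokens


def estimate_cost(prompt_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumptions (not confirmed by the pricing table): the tier is chosen
    by total prompt length, and cached prompt tokens are billed at the
    cache-read rate instead of the input rate.
    """
    if cached_tokens > prompt_tokens:
        raise ValueError("cached_tokens cannot exceed prompt_tokens")
    tier = RATES["short"] if prompt_tokens <= TIER_BOUNDARY else RATES["long"]
    uncached_tokens = prompt_tokens - cached_tokens
    total = (
        uncached_tokens * tier["input"]
        + cached_tokens * tier["cache_read"]
        + output_tokens * tier["output"]
    )
    return total / 1_000_000
```

For example, `estimate_cost(10_000, 2_000)` prices a 10,000-token prompt and a 2,000-token reply at the <=32K rates.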

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "glm-5-1", "messages": [{"role":"user","content":"Hello"}]}'
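
The same request can be sketched in Python with only the standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical; the URL, headers, and body mirror the curl example, and the response is returned as parsed JSON without assuming any particular shape.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = {
        "model": "glm-5-1",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like `send_chat_request(build_chat_request("Hello", os.environ["EMPIRIOLABS_API_KEY"]))`, with `os` imported and the key exported in the environment.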

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| max_tokens | integer | no | 4096 | Maximum number of output tokens to generate. Range: 1 – 128000 |
| temperature | number | no | 1 | Controls randomness. Lower values make responses more deterministic. Range: 0 – 2 |
| top_p | number | no | 0.95 | Nucleus sampling cutoff. Range: 0 – 1 |
| top_k | integer | no | 20 | Limits sampling to the top K tokens. Range: 1 – 100 |
| repetition_penalty | number | no | 1 | Penalizes repeated tokens. Range: 0.1 – 2 |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. none disables thinking. low, medium, high, and max set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style reasoning_effort field, translated into enable_thinking and thinking_budget for the model service. Allowed: none, low, medium, high, max |
| enable_thinking | boolean | no | true | Allow the model to reason before answering. Disable this for strict structured output. |
| thinking_budget | integer | no | 32768 | Maximum tokens available for reasoning content when thinking is enabled. Range: 1 – 38912 |
| tool_stream | boolean | no | false | Stream function-call arguments incrementally when streaming. |
| tools | array | no | [] | OpenAI-compatible function calling tool definitions. |
| tool_choice | object | no | - | OpenAI-compatible tool choice control. |
| parallel_tool_calls | boolean | no | true | Allow multiple tool calls in a single assistant turn when supported. |
| response_format | object | no | - | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode for strict schemas. |
| stop | array | no | - | Optional stop sequences. |
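
The ranges and allowed values in the table lend themselves to a client-side check before a request is sent. The following is a minimal sketch, not part of the API: `validate_params`, `RANGES`, and `REASONING_EFFORTS` are hypothetical names, and the rule that thinking counts as disabled when either `enable_thinking` is false or `reasoning_effort` is `none` is an assumption about how the two fields interact.

```python
# Numeric ranges copied from the parameter table: name -> (minimum, maximum).
RANGES = {
    "max_tokens": (1, 128000),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "top_k": (1, 100),
    "repetition_penalty": (0.1, 2),
    "thinking_budget": (1, 38912),
}
REASONING_EFFORTS = {"none", "low", "medium", "high", "max"}


def validate_params(params: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside {low}..{high}")

    effort = params.get("reasoning_effort")
    if effort is not None and effort not in REASONING_EFFORTS:
        problems.append(f"reasoning_effort={effort!r} is not an allowed value")

    # Assumption: thinking is on unless enable_thinking is false or
    # reasoning_effort is "none" (enable_thinking defaults to true).
    thinking_on = params.get("enable_thinking", True) and effort != "none"
    if "response_format" in params and thinking_on:
        # The table recommends non-thinking mode for strict schemas.
        problems.append("response_format is set while thinking is enabled")
    return problems
```

Passing `{"response_format": {"type": "json_object"}, "enable_thinking": False}` returns an empty list, while omitting `enable_thinking` flags the combination because thinking is on by default.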

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/glm-5-1.
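
A sketch of fetching that schema with the Python standard library follows. The helper names are hypothetical, and whether this endpoint requires the `Authorization` header is not stated here, so the key is optional in the sketch. The response shape is not documented on this page, so the parsed JSON is returned unchanged.

```python
import json
import urllib.request
from typing import Optional

MODELS_URL = "https://api.empiriolabs.ai/v1/models/"


def build_schema_request(model_id: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the GET request for a model's schema."""
    headers = {}
    if api_key is not None:
        # Assumption: the same bearer token as for chat completions is accepted.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(MODELS_URL + model_id, headers=headers, method="GET")


def fetch_schema(model_id: str, api_key: Optional[str] = None) -> dict:
    """Fetch the schema and return the decoded JSON body as-is."""
    with urllib.request.urlopen(build_schema_request(model_id, api_key)) as response:
        return json.load(response)
```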