GLM-5.1

Z.ai · Text Generation
/v1/chat/completions
Long-context Zhipu AI reasoning model with 202K context, 128K output, tool calling, structured output, and cache support.
At a glance
| Field | Value |
|---|---|
| Model id | glm-5-1 |
| Input modalities | Text |
| Output modalities | Text |
| Context window | 202K |
| Weight precision | - |
| Region | China |
| Features | reasoning, function_calling, structured_output, cache |
| Native inference | No |
| New | Yes |
| Supported endpoints | /v1/chat/completions, /v1/responses, /v1/messages |
Pricing
| Charge | Spec | Rate |
|---|---|---|
| Input | per 1M prompt tokens | <=32K $0.825 (was $1.40); 32K-200K $1.10 (was $1.40) |
| Output | per 1M generated tokens | <=32K $3.301 (was $4.40); 32K-200K $3.851 (was $4.40) |
| Implicit cache read | per 1M cached input tokens | <=32K $0.165 (was $0.26); 32K-200K $0.22 (was $0.26) |
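
The tiered rates above can be turned into a rough per-request estimate. The sketch below is illustrative only: it uses the discounted rates from the table, and it assumes that the tier is selected by prompt length, that "32K" means 32,000 tokens, and that cached tokens are billed at the cache-read rate instead of the input rate. None of these billing rules are stated on this page, and the helper names `pick_tier` and `estimate_cost` are hypothetical.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Discounted rates copied from the pricing table, in USD per 1M tokens.
RATES = {
    "<=32K": {
        "input": Decimal("0.825"),
        "output": Decimal("3.301"),
        "cache_read": Decimal("0.165"),
    },
    "32K-200K": {
        "input": Decimal("1.10"),
        "output": Decimal("3.851"),
        "cache_read": Decimal("0.22"),
    },
}


def pick_tier(prompt_tokens: int) -> str:
    # ASSUMPTION: the tier depends on prompt length and 32K = 32,000 tokens.
    return "<=32K" if prompt_tokens <= 32_000 else "32K-200K"


def estimate_cost(prompt_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Return an estimated cost in USD for one request."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    rate = RATES[pick_tier(prompt_tokens)]
    # ASSUMPTION: cached tokens replace, rather than add to, normal input billing.
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * rate["input"]
        + cached_tokens * rate["cache_read"]
        + output_tokens * rate["output"]
    )
    return total / MILLION
```

Under these assumptions, `estimate_cost(10_000, 2_000)` comes to roughly $0.0149. Treat the result as an estimate and check it against actual billing.
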
Example request
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "glm-5-1", "messages": [{"role":"user","content":"Hello"}]}'
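
For readers who prefer Python, the same request can be sketched with the standard library alone. The URL, model id, and message body come from the curl example; the function names `build_request` and `send` are hypothetical, and the response is returned as parsed JSON without assuming anything about its fields.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the chat completion request."""
    body = {
        "model": "glm-5-1",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage (requires a real key and network access):
#   import os
#   reply = send(build_request("Hello", os.environ["EMPIRIOLABS_API_KEY"]))
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.
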
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| max_tokens | integer | no | 4096 | Maximum number of output tokens to generate. · Range: 1 – 128000 |
| temperature | number | no | 1 | Controls randomness. Lower values make responses more deterministic. · Range: 0 – 2 |
| top_p | number | no | 0.95 | Nucleus sampling cutoff. · Range: 0 – 1 |
| top_k | integer | no | 20 | Limits sampling to the top K tokens. · Range: 1 – 100 |
| repetition_penalty | number | no | 1 | Penalizes repeated tokens. · Range: 0.1 – 2 |
| reasoning_effort | enum | no | "medium" | Reasoning effort level. `none` disables thinking. `low`, `medium`, `high`, and `max` set bounded thinking budgets sized to the selected model. Sent as an OpenAI-style `reasoning_effort` field, translated into `enable_thinking` and `thinking_budget` for the model service. · Allowed: none, low, medium, high, max |
| enable_thinking | boolean | no | true | Allow the model to reason before answering. Disable this for strict structured output. |
| thinking_budget | integer | no | 32768 | Maximum tokens available for reasoning content when thinking is enabled. · Range: 1 – 38912 |
| tool_stream | boolean | no | false | Stream function-call arguments incrementally when streaming. |
| tools | array | no | [] | OpenAI-compatible function calling tool definitions. |
| tool_choice | object | no | - | OpenAI-compatible tool choice control. |
| parallel_tool_calls | boolean | no | true | Allow multiple tool calls in a single assistant turn when supported. |
| response_format | object | no | - | OpenAI-compatible JSON mode or JSON schema response format. Use non-thinking mode for strict schemas. |
| stop | array | no | - | Optional stop sequences. |
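
The table recommends non-thinking mode for strict schemas. A minimal sketch of a request body that follows that advice is shown below. It checks numeric parameters against the documented ranges on the client side and sets `enable_thinking` to false. The `response_format` layout follows the OpenAI `json_schema` convention, which is assumed here because the table describes the field as OpenAI-compatible; the helper names `check_ranges` and `strict_json_payload` and the schema name `"result"` are hypothetical.

```python
# Ranges copied from the parameter table: name -> (minimum, maximum).
RANGES = {
    "max_tokens": (1, 128000),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "top_k": (1, 100),
    "repetition_penalty": (0.1, 2),
    "thinking_budget": (1, 38912),
}


def check_ranges(params: dict) -> None:
    """Raise ValueError if a known numeric parameter is out of range."""
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")


def strict_json_payload(prompt: str, schema: dict, **sampling) -> dict:
    """Build a request body for schema-constrained output with thinking off."""
    check_ranges(sampling)
    return {
        "model": "glm-5-1",
        "messages": [{"role": "user", "content": prompt}],
        # The table advises disabling thinking for strict structured output.
        "enable_thinking": False,
        # ASSUMPTION: OpenAI-style json_schema layout is accepted as-is.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "result", "schema": schema, "strict": True},
        },
        **sampling,
    }
```

For example, `strict_json_payload("List three colors", {"type": "object"}, temperature=0, max_tokens=512)` yields a body that can be posted to `/v1/chat/completions`. Validating locally only catches obvious mistakes early; the service remains the authority on what it accepts.
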
Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/glm-5-1.
