GLM 5.2 | EmpirioLabs AI Docs

Z.ai · Text Generation

POST /v1/chat/completions

Reasoning and coding model with a 1M-token context window, up to 128K (131,072) output tokens, adjustable reasoning effort, native web search, and tool calling.

At a glance

Field	Value
Model id	`glm-5-2`
Model release date	2026-06-16
Input modalities	Text
Output modalities	Text
Context window	1M
Weight precision	-
Max output tokens	131,072
Region	Singapore
Features	reasoning, function_calling, web_search
Native inference	No
New	Yes
Structured output	JSON Mode
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `POST /v1beta/models/glm-5-2:generateContent`
Alternate model ids	`glm-5.2`, `zai/glm-5.2`, `zhipu/glm-5.2`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	$1.40
Output	per 1M generated tokens	$4.40
Web search	per request	$0.033

Example request

$ curl https://api.empiriolabs.ai/v1/chat/completions \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"model": "glm-5-2", "messages": [{"role":"user","content":"Hello"}]}'
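The same request can be built from Python with only the standard library. This is a sketch rather than an official client: `build_request` and `send` are hypothetical names, and the response is returned as parsed JSON because its shape is not documented on this page.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request from the curl example (hypothetical helper)."""
    payload = {"model": "glm-5-2", "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Call `send(build_request("Hello", os.environ["EMPIRIOLABS_API_KEY"]))` after `import os` to perform the request.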

Parameters

Parameter	Type	Required	Default	Description
`max_tokens`	integer	no	`65536`	Maximum number of output tokens to generate. · Range: 1 – 131072
`temperature`	number	no	`1`	Controls randomness. Lower values make responses more deterministic. · Range: 0 – 1
`top_p`	number	no	`0.95`	Nucleus sampling cutoff. · Range: 0.01 – 1
`reasoning_effort`	enum	no	`"max"`	GLM-5.2 reasoning effort. `none` disables thinking; `minimal` through `max` set how hard the model reasons before answering. `max` is recommended for complex coding. · Allowed: `none`, `minimal`, `low`, `medium`, `high`, `xhigh`, `max`
`enable_thinking`	boolean	no	`true`	Allow the model to reason before answering. Turn off for the lowest-latency replies or strict structured output.
`do_sample`	boolean	no	`true`	Enable sampling. Turn off for greedy, deterministic decoding, in which case `temperature` and `top_p` are ignored.
`tool_web_search`	boolean	no	`false`	Enable built-in web search. Adds $0.033 per request when used.
`search_recency_filter`	enum	no	`"noLimit"`	Limit web search results to a recency window. · Allowed: `oneDay`, `oneWeek`, `oneMonth`, `oneYear`, `noLimit`
`count`	integer	no	`10`	Number of web search results to retrieve when web search is enabled. · Range: 1 – 50
`search_domain_filter`	string	no	-	Restrict web search to a specific domain.
`search_prompt`	string	no	-	Optional prompt used to summarize retrieved web search results.
`search_result`	boolean	no	`true`	Return web search result metadata in the response when web search is enabled.
`tool_stream`	boolean	no	`false`	Stream function-call arguments incrementally when streaming.
`tools`	array	no	`[]`	OpenAI-compatible function calling tool definitions.
`tool_choice`	object	no	-	OpenAI-compatible tool choice control.
`stop`	array	no	-	Optional stop sequences (up to 4).
`response_format`	enum	no	-	Return the output as a valid JSON object (JSON mode). Describe the fields you want in your prompt.
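The ranges and allowed values in the table can be checked client-side before a request is sent. The sketch below assumes every parameter is a top-level field of the JSON request body, which this page implies but does not state. `build_payload` is a hypothetical helper and covers only a subset of the parameters.

```python
from typing import Any

# Allowed values copied from the parameter table above.
EFFORT_VALUES = {"none", "minimal", "low", "medium", "high", "xhigh", "max"}
RECENCY_VALUES = {"oneDay", "oneWeek", "oneMonth", "oneYear", "noLimit"}


def build_payload(
    prompt: str,
    *,
    max_tokens: int = 65536,
    temperature: float = 1.0,
    top_p: float = 0.95,
    reasoning_effort: str = "max",
    web_search: bool = False,
    recency: str = "noLimit",
    count: int = 10,
) -> dict[str, Any]:
    """Build a chat completions body for glm-5-2 (hypothetical helper)."""
    if not 1 <= max_tokens <= 131072:
        raise ValueError("max_tokens must be between 1 and 131072")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if not 0.01 <= top_p <= 1:
        raise ValueError("top_p must be between 0.01 and 1")
    if reasoning_effort not in EFFORT_VALUES:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")

    payload: dict[str, Any] = {
        "model": "glm-5-2",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "reasoning_effort": reasoning_effort,
    }
    if web_search:
        if recency not in RECENCY_VALUES:
            raise ValueError(f"unknown search_recency_filter: {recency!r}")
        if not 1 <= count <= 50:
            raise ValueError("count must be between 1 and 50")
        # Assumption: search options sit at the top level of the request body.
        payload.update(tool_web_search=True, search_recency_filter=recency, count=count)
    return payload
```

The search fields are added only when `web_search=True`, so a plain call leaves the built-in search off and avoids the per-request search fee.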

Variants

`:variant1`

Field	Value
Model id	`glm-5-2:variant1`
Model release date	2026-06-16
Region	Germany
Context window	1M
Weight precision	-
Max output tokens	131,072
Features	reasoning, function_calling, cache, web_search
Native inference	No
Structured output	JSON Schema
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `POST /v1beta/models/glm-5-2:variant1:generateContent`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	$1.10 (was $1.40)
Output	per 1M generated tokens	$3.851 (was $4.40)
Implicit cache read	per 1M cached input tokens	$0.275
Web Search (Linkup)	per call when invoked	$0.013
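With implicit caching, part of the prompt may be billed at the cache read rate. The sketch below assumes cached input tokens are charged at $0.275 per 1M instead of the $1.10 input rate, not in addition to it; this page does not confirm that. `estimate_variant_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Rates copied from the variant pricing table above (USD).
INPUT_PER_MILLION = Decimal("1.10")
OUTPUT_PER_MILLION = Decimal("3.851")
CACHE_READ_PER_MILLION = Decimal("0.275")
LINKUP_PER_CALL = Decimal("0.013")
MILLION = Decimal(1_000_000)


def estimate_variant_cost(
    prompt_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    web_search_calls: int = 0,
) -> Decimal:
    """Estimate the USD cost of one glm-5-2:variant1 request (hypothetical helper)."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    # Assumption: cached tokens replace, rather than add to, regular input billing.
    uncached_tokens = prompt_tokens - cached_tokens
    cost = Decimal(uncached_tokens) / MILLION * INPUT_PER_MILLION
    cost += Decimal(cached_tokens) / MILLION * CACHE_READ_PER_MILLION
    cost += Decimal(output_tokens) / MILLION * OUTPUT_PER_MILLION
    cost += LINKUP_PER_CALL * web_search_calls
    return cost
```

Under that assumption, a 1M-token prompt with half of it served from cache costs $0.55 + $0.1375 for input, compared with $1.10 uncached.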

Parameters

Parameter	Type	Required	Default	Description
`temperature`	number	no	`0.7`	Sampling temperature. 0 = deterministic, 2 = maximum randomness. · Range: 0 – 2
`top_p`	number	no	`0.9`	Nucleus sampling probability mass. Lower = more focused. · Range: 0 – 1
`max_tokens`	number	no	`4096`	Maximum output tokens. · Range: 1 – 131072
`stop`	string	no	-	Up to 4 strings where the model will stop generating further tokens.
`enable_thinking`	boolean	no	`true`	Enable step-by-step reasoning before answering.
`reasoning_effort`	enum	no	`"medium"`	Reasoning effort level. `none` disables thinking; `low`, `medium`, `high`, and `max` set bounded thinking budgets sized to the selected model. The value is sent as an OpenAI-style `reasoning_effort` field and translated into `enable_thinking` and `thinking_budget` for the model service. · Allowed: `none`, `low`, `medium`, `high`, `max`
`thinking_budget`	number	no	`32768`	Maximum tokens reserved for the reasoning process. · Range: 1 – 131072
`response_format`	enum	no	-	Constrain the output to JSON. Use JSON mode for any valid JSON object, or JSON schema to force output that matches a schema you provide.
`web_search_linkup`	boolean	no	`false`	Optional web search powered by Linkup. When enabled, recent web sources are retrieved using your latest user message as the query and provided to the model as additional context. Adds $0.013 per call when invoked, on top of the model's normal token cost.
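The variant accepts a smaller set of `reasoning_effort` values than the base model and adds `thinking_budget` and `web_search_linkup`. Because `reasoning_effort` is translated into `enable_thinking` and `thinking_budget`, and this page does not say which wins when both are sent, the sketch below sends only one of them. `build_variant_payload` is a hypothetical helper, and placing the parameters at the top level of the body is an assumption.

```python
from typing import Any, Optional

# Allowed values copied from the variant parameter table above.
VARIANT_EFFORTS = {"none", "low", "medium", "high", "max"}


def build_variant_payload(
    prompt: str,
    *,
    reasoning_effort: Optional[str] = None,
    thinking_budget: Optional[int] = None,
    web_search_linkup: bool = False,
) -> dict[str, Any]:
    """Build a request body for glm-5-2:variant1 (hypothetical helper)."""
    if reasoning_effort is not None and thinking_budget is not None:
        # Design choice of this sketch: precedence between the two is not documented.
        raise ValueError("set reasoning_effort or thinking_budget, not both")

    payload: dict[str, Any] = {
        "model": "glm-5-2:variant1",
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_effort is not None:
        if reasoning_effort not in VARIANT_EFFORTS:
            raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
        payload["reasoning_effort"] = reasoning_effort
    if thinking_budget is not None:
        if not 1 <= thinking_budget <= 131072:
            raise ValueError("thinking_budget must be between 1 and 131072")
        payload["enable_thinking"] = True
        payload["thinking_budget"] = thinking_budget
    if web_search_linkup:
        payload["web_search_linkup"] = True
    return payload
```

Values such as `minimal` and `xhigh` are valid for `glm-5-2` but are rejected here, since they are not in the variant's allowed list.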

Machine-readable schema: `GET https://api.empiriolabs.ai/v1/models/glm-5-2`
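A request for the schema can be prepared the same way as a chat request. This sketch only builds the request: whether the endpoint requires the Bearer token is an assumption, so the key is optional, and the response format is not described on this page. `build_schema_request` is a hypothetical helper.

```python
import urllib.request
from typing import Optional

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/glm-5-2"


def build_schema_request(api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the GET request for the model schema (hypothetical helper)."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: the same Bearer token as for chat completions is accepted here.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(SCHEMA_URL, headers=headers, method="GET")
```

Pass the result to `urllib.request.urlopen` to fetch the schema.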