GLM 4.7 Flash

Z.ai · Text Generation

POST /v1/chat/completions

Free lightweight GLM-4.7 text model for coding, reasoning, long-context writing, and general chat.

At a glance

Field	Value
Model id	`glm-4-7-flash`
Model release date	2026-01-19
Input modalities	Text
Output modalities	Text
Context window	200K
Weight precision	-
Max output tokens	131,072
Region	Singapore
Features	reasoning, function_calling, web_search
Native inference	No
New	Yes
Structured output	JSON Mode
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `POST /v1beta/models/glm-4-7-flash:generateContent`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	Free
Output	per 1M generated tokens	Free
Implicit cache read	per 1M cached input tokens	Free
Web search	per request when enabled	$0.033
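Since every token charge in the table is free, the only billable item is web search. A minimal sketch of that arithmetic follows; the function name `estimate_cost_usd` is hypothetical, and it assumes the $0.033 rate applies once per web-search-enabled request, as the table states.

```python
from decimal import Decimal

# Rate from the pricing table: charged per request when web search is enabled.
WEB_SEARCH_RATE_USD = Decimal("0.033")


def estimate_cost_usd(web_search_requests: int) -> Decimal:
    """Estimate spend in USD. Tokens are free, so only web-search requests count."""
    if web_search_requests < 0:
        raise ValueError("web_search_requests must be non-negative")
    return WEB_SEARCH_RATE_USD * web_search_requests


print(estimate_cost_usd(1000))  # prints 33.000
```

Requests sent without web search do not appear in the calculation because input, output, and cached tokens are all listed as free.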

Example request

$ curl https://api.empiriolabs.ai/v1/chat/completions \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"model": "glm-4-7-flash", "messages": [{"role":"user","content":"Hello"}]}'
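For readers who prefer Python, the curl request above can be sketched with the standard library alone. The helper names `build_chat_request` and `send` are hypothetical, and because this page does not document the response body, the sketch returns the raw text rather than parsing specific fields.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = {
        "model": "glm-4-7-flash",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `send(build_chat_request(os.environ["EMPIRIOLABS_API_KEY"], "Hello"))` (after `import os`) would perform the request. Keeping construction separate from sending makes the request easy to inspect without network access.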

Parameters

Parameter	Type	Required	Default	Description
`temperature`	number	no	`1`	Sampling temperature. Lower values are more deterministic. GLM-4.7-Flash and GLM-4.6V-Flash default to 1.0; GLM-4.5-Flash defaults to 0.6. · Range: 0 – 1
`top_p`	number	no	`0.95`	Nucleus sampling probability mass. Z.AI documents a 0.95 default for the GLM-4.7, GLM-4.6, and GLM-4.5 series. · Range: 0.01 – 1
`max_tokens`	number	no	`4096`	Maximum number of tokens to generate. GLM-4.7-Flash supports up to 131072. · Range: 1 – 131072
`stop`	array	no	-	Stop word list. Z.AI currently supports one stop string in array form.
`do_sample`	boolean	no	true	Enable sampling. When false, temperature and top_p do not affect generation.
`enable_thinking`	boolean	no	true	Controls Z.AI thinking mode. Enabled by default, so GLM-4.7-Flash reasons before answering; disable it for simple, low-latency turns.
`thinking`	object	no	-	Advanced thinking object. Use `{"type":"enabled"}` or `{"type":"disabled"}`. GLM-4.7-Flash thinks when enabled.
`tools`	array	no	-	Function tools and the built-in web_search tool are supported.
`tool_choice`	enum	no	`"auto"`	Controls whether the model may use tools. Z.AI documents auto tool selection; omit tools to disable tool use. · Allowed: `auto`
`tool_stream`	boolean	no	false	Stream function-call tool output when stream is true. Z.AI documents tool_stream for GLM-4.6 and newer models.
`tool_web_search`	boolean	no	false	Enable built-in web search. Adds $0.033 per request when enabled.
`search_result`	boolean	no	true	Return structured web search result metadata when web search is enabled.
`search_prompt`	string	no	-	Optional instruction for summarizing retrieved web search results.
`count`	number	no	`10`	Number of web search results to retrieve. · Range: 1 – 50
`search_domain_filter`	string	no	-	Optional domain whitelist for web search results.
`search_recency_filter`	enum	no	`"noLimit"`	Optional web search recency window. · Allowed: `oneDay`, `oneWeek`, `oneMonth`, `oneYear`, `noLimit`
`response_format`	enum	no	-	Return the output as a valid JSON object (JSON mode). Describe the fields you want in your prompt.
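The ranges and allowed values in the table can be checked client-side before a request is sent. The sketch below is one way to do that; `build_payload` is a hypothetical helper, the bounds are copied from the table, and options it does not recognize are passed through unchecked.

```python
from typing import Any, Dict, List

MODEL_ID = "glm-4-7-flash"

# Inclusive numeric ranges copied from the parameter table.
NUMERIC_RANGES = {
    "temperature": (0, 1),
    "top_p": (0.01, 1),
    "max_tokens": (1, 131072),
    "count": (1, 50),
}

# Allowed values for search_recency_filter, per the table.
RECENCY_VALUES = {"oneDay", "oneWeek", "oneMonth", "oneYear", "noLimit"}


def build_payload(messages: List[Dict[str, str]], **options: Any) -> Dict[str, Any]:
    """Return a request body after checking options against the documented limits."""
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    # The table says only one stop string is currently supported.
    if len(options.get("stop", [])) > 1:
        raise ValueError("stop accepts at most one string")
    recency = options.get("search_recency_filter", "noLimit")
    if recency not in RECENCY_VALUES:
        raise ValueError(f"unsupported search_recency_filter: {recency}")
    return {"model": MODEL_ID, "messages": messages, **options}


# Example: a short, low-latency turn with thinking disabled.
payload = build_payload(
    [{"role": "user", "content": "Summarize this file."}],
    temperature=0.2,
    max_tokens=2048,
    enable_thinking=False,
)
```

Failing fast on the client avoids a round trip for values the table already rules out; the server remains the authority on what is accepted.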

Notes

Base token use is free. Built-in web search is optional: set `tool_web_search` to true to enable it, which adds $0.033 per request.
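As an illustration of the billable case, the request body below turns on built-in web search for a single call. The field names and allowed values come from the parameter table; the prompt text and the domain are placeholders chosen for this sketch.

```python
import json

# Request body that enables built-in web search for one call.
# Sending it would incur the $0.033 per-request web search charge.
web_search_payload = {
    "model": "glm-4-7-flash",
    "messages": [{"role": "user", "content": "What changed in Python 3.13?"}],
    "tool_web_search": True,                   # default is false
    "count": 5,                                # documented range: 1 to 50
    "search_recency_filter": "oneMonth",       # one of the documented enum values
    "search_domain_filter": "docs.python.org", # placeholder domain
    "search_result": True,                     # return structured result metadata
}

print(json.dumps(web_search_payload, indent=2))
```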

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/glm-4-7-flash.
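The schema endpoint can be queried the same way. This sketch makes two assumptions that this page does not confirm: that the response body is JSON, and that an API key may or may not be required (so it is optional here). The helper names `schema_url` and `fetch_schema` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.empiriolabs.ai"


def schema_url(model_id: str) -> str:
    """Return the schema URL for a model id, following the pattern shown above."""
    return f"{BASE_URL}/v1/models/{model_id}"


def fetch_schema(model_id: str, api_key: Optional[str] = None) -> Any:
    """GET the schema and decode it, assuming the body is JSON."""
    # Assumption: authentication may be needed, so send the key only if given.
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    request = urllib.request.Request(schema_url(model_id), headers=headers, method="GET")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `fetch_schema("glm-4-7-flash")` would perform the request.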