GLM-4.7-Flash

Z.ai · Text Generation
POST /v1/chat/completions

Free lightweight GLM-4.7 text model for coding, reasoning, long-context writing, and general chat.

At a glance

Field | Value
Model id | glm-4-7-flash
Input modalities | Text
Output modalities | Text
Context window | 200K
Weight precision | -
Max output tokens | 131,072
Region | Singapore
Features | reasoning, function_calling, structured_output, web_search
Native inference | No
New | Yes
Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages

Pricing

Charge | Spec | Rate
Input | per 1M prompt tokens | Free
Output | per 1M generated tokens | Free
Implicit cache read | per 1M cached input tokens | Free
Web Search | per request when enabled | $0.033

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "glm-4-7-flash", "messages": [{"role":"user","content":"Hello"}]}'
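The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response is assumed to be a JSON document whose shape is not described in this section.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": "glm-4-7-flash",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> dict:
    """Send the request and parse the body, assuming the API returns JSON."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Passing the value of the EMPIRIOLABS_API_KEY environment variable to `build_chat_request` and handing the result to `send_chat_request` performs the actual call. Keeping the two steps separate lets the request be inspected without touching the network.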

Parameters

Parameter | Type | Required | Default | Description
temperature | number | no | 1 | Sampling temperature. Lower values are more deterministic. GLM-4.7-Flash and GLM-4.6V-Flash default to 1.0; GLM-4.5-Flash defaults to 0.6. · Range: 0 – 1
top_p | number | no | 0.95 | Nucleus sampling probability mass. Z.AI documents a 0.95 default for the GLM-4.7, GLM-4.6, and GLM-4.5 series. · Range: 0.01 – 1
max_tokens | number | no | 4096 | Maximum output tokens for GLM-4.7-Flash: 131072. · Range: 1 – 131072
stop | array | no | - | Stop word list. Z.AI currently supports one stop string in array form.
do_sample | boolean | no | true | Enable sampling. When false, temperature and top_p do not affect generation.
enable_thinking | boolean | no | true | Controls Z.AI thinking mode. Enabled is the default and makes GLM-4.7-Flash think; disable it for simple low-latency turns.
thinking | object | no | - | Advanced thinking object. Use {"type":"enabled"} or {"type":"disabled"}. GLM-4.7-Flash thinks when enabled.
response_format | object | no | - | Set {"type":"json_object"} for JSON mode or {"type":"text"} for plain text.
tools | array | no | - | Function tools and the built-in web_search tool are supported.
tool_choice | enum | no | "auto" | Controls whether the model may use tools. Z.AI documents auto tool selection; omit tools to disable tool use. · Allowed: auto
tool_stream | boolean | no | false | Stream function-call tool output when stream is true. Z.AI documents tool_stream for GLM-4.6 and newer models.
tool_web_search | boolean | no | false | Enable built-in web search. Adds $0.033 per request when enabled.
search_result | boolean | no | true | Return structured web search result metadata when web search is enabled.
search_prompt | string | no | - | Optional instruction for summarizing retrieved web search results.
count | number | no | 10 | Number of web search results to retrieve. · Range: 1 – 50
search_domain_filter | string | no | - | Optional domain whitelist for web search results.
search_recency_filter | enum | no | "noLimit" | Optional web search recency window. · Allowed: oneDay, oneWeek, oneMonth, oneYear, noLimit
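The documented ranges and defaults can be checked on the client before a request is sent. The sketch below is illustrative only: `build_payload` is a hypothetical helper, and it assumes each parameter is passed as a top-level field of the request body, as the table suggests.

```python
from typing import Any, Optional

MODEL_ID = "glm-4-7-flash"


def build_payload(
    prompt: str,
    *,
    temperature: float = 1.0,
    top_p: float = 0.95,
    max_tokens: int = 4096,
    stop: Optional[list] = None,
    enable_thinking: bool = True,
    json_mode: bool = False,
) -> dict:
    """Validate values against the documented ranges and return a request body."""
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be in the range 0 to 1")
    if not 0.01 <= top_p <= 1:
        raise ValueError("top_p must be in the range 0.01 to 1")
    if not 1 <= max_tokens <= 131072:
        raise ValueError("max_tokens must be in the range 1 to 131072")
    # The table states that only one stop string is currently supported.
    if stop is not None and len(stop) != 1:
        raise ValueError("stop must contain exactly one string")

    payload: dict = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "enable_thinking": enable_thinking,
    }
    if stop is not None:
        payload["stop"] = stop
    if json_mode:
        payload["response_format"] = {"type": "json_object"}
    return payload
```

For a simple low-latency turn, `build_payload("Summarize this in one line", enable_thinking=False, max_tokens=256)` switches thinking off and caps the output length.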

Notes

Base token use is free. Built-in web search is optional through tool_web_search and adds $0.033 per request when enabled.
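Because tokens are free and web search is billed per request, the cost of a workload depends only on how many requests enable search. The following is a rough sketch under stated assumptions: `with_web_search` and `estimate_cost_usd` are hypothetical helpers, and the search options are assumed to be top-level fields of the request body.

```python
from decimal import Decimal
from typing import Optional

WEB_SEARCH_FEE_USD = Decimal("0.033")  # per request, from the pricing table
RECENCY_VALUES = {"oneDay", "oneWeek", "oneMonth", "oneYear", "noLimit"}


def with_web_search(
    payload: dict,
    *,
    count: int = 10,
    recency: str = "noLimit",
    domain: Optional[str] = None,
) -> dict:
    """Return a copy of the request body with built-in web search enabled."""
    if not 1 <= count <= 50:
        raise ValueError("count must be in the range 1 to 50")
    if recency not in RECENCY_VALUES:
        raise ValueError(f"recency must be one of {sorted(RECENCY_VALUES)}")
    updated = dict(payload)
    updated["tool_web_search"] = True
    updated["count"] = count
    updated["search_recency_filter"] = recency
    if domain is not None:
        updated["search_domain_filter"] = domain
    return updated


def estimate_cost_usd(web_search_requests: int) -> Decimal:
    """Token usage is free, so only requests with web search enabled are billed."""
    if web_search_requests < 0:
        raise ValueError("web_search_requests must not be negative")
    return WEB_SEARCH_FEE_USD * web_search_requests
```

For example, a batch of 1,000 requests in which 200 enable web search costs 200 × $0.033 = $6.60; the remaining 800 requests are free.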


Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/glm-4-7-flash.
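A minimal sketch of retrieving that schema from Python follows. The helper names are hypothetical, the response is assumed to be JSON, and this page does not state whether the endpoint requires authentication, so the Authorization header is sent only when a key is supplied.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

MODELS_URL = "https://api.empiriolabs.ai/v1/models"


def schema_url(model_id: str) -> str:
    """Build the schema URL for a model id, percent-encoding unsafe characters."""
    return f"{MODELS_URL}/{quote(model_id, safe='')}"


def fetch_schema(model_id: str, api_key: Optional[str] = None) -> dict:
    """GET the schema; the JSON response shape is an assumption."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    request = urllib.request.Request(schema_url(model_id), headers=headers, method="GET")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```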