GLM-4.6V-Flash

Z.ai · Text Generation
POST /v1/chat/completions
Free multimodal GLM-4.6V model for image, video, file, and text understanding with native function calling.
At a glance
| Field | Value |
|---|---|
| Model id | glm-4-6v-flash |
| Input modalities | Text, Image, Video, File |
| Output modalities | Text |
| Context window | 128K |
| Weight precision | - |
| Max output tokens | 32,768 |
| Region | Singapore |
| Features | vision, video_understanding, document_understanding, function_calling, structured_output, web_search |
| Native inference | No |
| New | Yes |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |
Pricing
| Charge | Spec | Rate |
|---|---|---|
| Input | per 1M prompt tokens | Free |
| Output | per 1M generated tokens | Free |
| Implicit cache read | per 1M cached input tokens | Free |
| Web Search | per request when enabled | $0.033 |
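Because token usage is free, the only billable item in the table above is built-in web search. A minimal sketch of that arithmetic follows; the function name `estimate_web_search_cost_usd` is hypothetical and the rate is copied from the pricing table, so treat it as an illustration rather than a billing reference.

```python
from decimal import Decimal

# Rate taken from the pricing table: $0.033 per request with web search enabled.
WEB_SEARCH_RATE_USD = Decimal("0.033")


def estimate_web_search_cost_usd(web_search_requests: int) -> Decimal:
    """Estimate spend for requests that enable built-in web search.

    Input, output, and cached tokens are listed as free, so they add nothing.
    """
    if web_search_requests < 0:
        raise ValueError("web_search_requests must not be negative")
    return WEB_SEARCH_RATE_USD * web_search_requests
```

Under this reading of the table, 200 requests with web search enabled would come to 200 × $0.033 = $6.60, regardless of how many tokens they consume.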
Example request
curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "glm-4-6v-flash", "messages": [{"role":"user","content":"Hello"}]}'
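The same request can be sketched in Python using only the standard library. The helper names `build_request` and `send` are hypothetical, and the response is returned as parsed JSON without assuming any particular response schema, since none is documented here.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for glm-4-6v-flash."""
    body = {
        "model": "glm-4-6v-flash",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical call would be `send(build_request("Hello", os.environ["EMPIRIOLABS_API_KEY"]))` after `import os`. Keeping construction separate from sending makes the payload easy to inspect before any network call is made.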
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| temperature | number | no | 1 | Sampling temperature. Lower values are more deterministic. GLM-4.7-Flash and GLM-4.6V-Flash default to 1.0; GLM-4.5-Flash defaults to 0.6. · Range: 0 – 1 |
| top_p | number | no | 0.95 | Nucleus sampling probability mass. Z.AI documents a 0.95 default for the GLM-4.7, GLM-4.6, and GLM-4.5 series. · Range: 0.01 – 1 |
| max_tokens | number | no | 4096 | Maximum output tokens for GLM-4.6V-Flash: 32768. · Range: 1 – 32768 |
| stop | array | no | - | Stop word list. Z.AI currently supports one stop string in array form. |
| do_sample | boolean | no | true | Enable sampling. When false, temperature and top_p do not affect generation. |
| enable_thinking | boolean | no | true | Controls Z.AI thinking mode. Enabled is the default; GLM-4.6V-Flash automatically decides whether to think when enabled. |
| thinking | object | no | - | Advanced thinking object. Use {"type":"enabled"} or {"type":"disabled"}. GLM-4.6V-Flash automatically decides whether to think when enabled. |
| response_format | object | no | - | Set {"type":"json_object"} for JSON mode or {"type":"text"} for plain text. |
| tools | array | no | - | Function tools and the built-in web_search tool are supported. |
| tool_choice | enum | no | "auto" | Controls whether the model may use tools. Z.AI documents auto tool selection; omit tools to disable tool use. · Allowed: auto |
| tool_stream | boolean | no | false | Stream function-call tool output when stream is true. Z.AI documents tool_stream for GLM-4.6 and newer models. |
| tool_web_search | boolean | no | false | Enable built-in web search. Adds $0.033 per request when enabled. |
| search_result | boolean | no | true | Return structured web search result metadata when web search is enabled. |
| search_prompt | string | no | - | Optional instruction for summarizing retrieved web search results. |
| count | number | no | 10 | Number of web search results to retrieve. · Range: 1 – 50 |
| search_domain_filter | string | no | - | Optional domain whitelist for web search results. |
| search_recency_filter | enum | no | "noLimit" | Optional web search recency window. · Allowed: oneDay, oneWeek, oneMonth, oneYear, noLimit |
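The ranges and allowed values in the table can be checked client-side before a request is sent. The sketch below is one way to do that; `build_payload` and its keyword names are hypothetical, and it assumes each parameter is passed as a top-level field of the JSON request body, which the table does not state explicitly.

```python
from typing import Any

MODEL_ID = "glm-4-6v-flash"
# Allowed values for search_recency_filter, as listed in the table.
RECENCY_VALUES = {"oneDay", "oneWeek", "oneMonth", "oneYear", "noLimit"}


def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError when value falls outside the documented range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def build_payload(
    prompt: str,
    *,
    temperature: float = 1.0,
    top_p: float = 0.95,
    max_tokens: int = 4096,
    json_mode: bool = False,
    thinking: bool = True,
    web_search: bool = False,
    count: int = 10,
    search_recency_filter: str = "noLimit",
) -> dict[str, Any]:
    """Build a request body, validating values against the documented ranges."""
    _check_range("temperature", temperature, 0, 1)
    _check_range("top_p", top_p, 0.01, 1)
    _check_range("max_tokens", max_tokens, 1, 32768)

    # Assumption: parameters are top-level fields of the request body.
    payload: dict[str, Any] = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }
    if json_mode:
        payload["response_format"] = {"type": "json_object"}
    if web_search:
        # Web search is billed per request, so it is only added on request.
        _check_range("count", count, 1, 50)
        if search_recency_filter not in RECENCY_VALUES:
            raise ValueError(
                f"search_recency_filter must be one of {sorted(RECENCY_VALUES)}"
            )
        payload["tool_web_search"] = True
        payload["count"] = count
        payload["search_recency_filter"] = search_recency_filter
    return payload
```

For example, `build_payload("Summarize today's news", web_search=True, count=5, search_recency_filter="oneDay")` would produce a body with web search enabled, while the default call leaves the search fields out entirely so no per-request charge is incurred.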
Notes
Base token use is free. Built-in web search is optional through tool_web_search and adds $0.033 per request when enabled.
Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/glm-4-6v-flash.
