Kimi K2.7 Code | EmpirioLabs AI Docs

Moonshot AI · Text Generation

POST /v1/chat/completions

Kimi K2.7 Code is Moonshot’s trillion-parameter agentic coding model with 256K context, always-on reasoning, and text, image, and video inputs.

At a glance

Field	Value
Model id	`kimi-k2-7-code`
Model release date	2026-06-16
Input modalities	Text, Image, Video
Output modalities	Text
Context window	256K
Weight precision	-
Max output tokens	131,072
Region	International
Features	reasoning, function_calling, multimodal, agentic_coding, web_search
Native inference	No
New	Yes
Structured output	JSON Schema
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `POST /v1beta/models/kimi-k2-7-code:generateContent`
Alternate model ids	`kimi-k2.7-code`, `moonshotai/kimi-k2.7-code`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	$0.95
Output	per 1M generated tokens	$4.00
Web search	per call when invoked	$0.015
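As a rough worked example of the rates above, the sketch below estimates the cost of one request. The function name `estimate_cost_usd` and the sample token counts are illustrative assumptions, not part of the API; the only values taken from this page are the three rates. Because reasoning tokens are billed as output tokens, `output_tokens` here should include them.

```python
from decimal import Decimal

# Rates copied from the pricing table above (USD).
INPUT_PER_M = Decimal("0.95")           # per 1M prompt tokens
OUTPUT_PER_M = Decimal("4.00")          # per 1M generated tokens (reasoning included)
WEB_SEARCH_PER_CALL = Decimal("0.015")  # per invoked web search call


def estimate_cost_usd(prompt_tokens, output_tokens, web_search_calls=0):
    """Hypothetical helper: estimate the cost of a single request in USD."""
    million = Decimal(1_000_000)
    return (
        Decimal(prompt_tokens) / million * INPUT_PER_M
        + Decimal(output_tokens) / million * OUTPUT_PER_M
        + Decimal(web_search_calls) * WEB_SEARCH_PER_CALL
    )


# Illustrative numbers: 20,000 prompt tokens, 3,000 output tokens, 2 searches.
print(estimate_cost_usd(20_000, 3_000, web_search_calls=2))
```

`Decimal` is used instead of floats so that small per-token amounts do not pick up binary rounding noise. Actual billing may round differently, so treat the result as an estimate.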

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "kimi-k2-7-code", "messages": [{"role": "user", "content": "Hello"}]}'

Parameters

Parameter	Type	Required	Default	Description
`max_tokens`	number	no	`16384`	Maximum output tokens. Reasoning tokens count toward this limit. · Range: 1 – 131072
`stop`	string	no	-	Up to 4 strings where the model will stop generating further tokens.
`tool_web_search`	boolean	no	`false`	Search the web for real-time information. Adds $0.015 to the request cost for each invoked web search call.
`response_format`	enum	no	-	Constrain the output to JSON. Use JSON mode for any valid JSON object, or JSON schema to force output that matches a schema you provide.
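The parameters in the table can be combined into a request body along these lines. This is a minimal sketch, not an official client: `build_request` is a hypothetical helper, sending `stop` as a list of strings is an assumption (the table lists its type as string), and the `{"type": "json_object"}` shape for `response_format` assumes an OpenAI-compatible JSON mode. The range check mirrors the documented `max_tokens` range of 1 to 131072.

```python
import json

MODEL_ID = "kimi-k2-7-code"
MAX_OUTPUT_TOKENS = 131072  # upper bound of the documented max_tokens range


def build_request(prompt, max_tokens=16384, stop=None, web_search=False, json_mode=False):
    """Hypothetical helper: assemble a chat completions request body."""
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    if stop is not None and len(stop) > 4:
        raise ValueError("stop accepts at most 4 strings")

    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # reasoning tokens count toward this limit
    }
    if stop:
        body["stop"] = list(stop)  # assumption: sent as a list of strings
    if web_search:
        body["tool_web_search"] = True  # billed per invoked search call
    if json_mode:
        # Assumption: OpenAI-compatible JSON mode shape.
        body["response_format"] = {"type": "json_object"}
    return body


print(json.dumps(build_request("Summarize this diff.", max_tokens=4096, stop=["END"]), indent=2))
```

Since reasoning tokens count toward `max_tokens`, a limit that is too low can leave little room for the visible answer; it may be worth budgeting for both.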

Notes

The model accepts text, image, and video inputs within a 256K context and supports function calling, structured output (JSON mode or JSON Schema), and built-in web search at $0.015 per invoked call. Thinking is always on and cannot be disabled; reasoning tokens are billed as output tokens. Temperature and other sampling overrides are ignored because the model service uses fixed sampling settings. For multi-step function calling through the API, replay each assistant message with its `reasoning_content` field intact.
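The replay requirement for multi-step function calling can be sketched as follows. The message layout is an assumption based on the common OpenAI-style convention (`tool_calls`, `tool_call_id`, and a `tool` role); only the `reasoning_content` field name comes from this page. `append_tool_round` and the sample data are hypothetical.

```python
def append_tool_round(messages, assistant_message, tool_results):
    """Hypothetical helper: extend the conversation after a tool-calling turn.

    The assistant message is copied verbatim so that reasoning_content
    is replayed unchanged on the next request.
    """
    if "reasoning_content" not in assistant_message:
        raise ValueError("assistant message is missing reasoning_content")

    updated = list(messages)
    updated.append(dict(assistant_message))  # keep every field, do not strip reasoning
    for call in assistant_message.get("tool_calls", []):
        updated.append({
            "role": "tool",                  # assumption: OpenAI-style tool role
            "tool_call_id": call["id"],
            "content": tool_results[call["id"]],
        })
    return updated


# Illustrative data only; real responses may carry additional fields.
history = [{"role": "user", "content": "How many open issues are there?"}]
assistant = {
    "role": "assistant",
    "content": "",
    "reasoning_content": "I should call the issue counter tool.",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "count_issues", "arguments": "{}"},
    }],
}
history = append_tool_round(history, assistant, {"call_1": "17"})
print([m["role"] for m in history])
```

The design point is that the assistant turn is passed through untouched: rebuilding it from only `content` and `tool_calls` would drop `reasoning_content` and break the follow-up request.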

Per-tool billing (usage.tool_usage)

When this model invokes built-in tools inside a single request, the response carries a normalized `usage.tool_usage` map alongside the token counts. Tool charges are already included in `cost_usd`; the counts are surfaced for transparency only.
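A client might read these fields as in the sketch below. The exact response shape is not specified on this page, so the sample payload is hypothetical: it assumes `tool_usage` maps a tool name to an invocation count and that `cost_usd` sits next to it inside `usage`. The helper name `summarize_usage` is also illustrative.

```python
def summarize_usage(response):
    """Hypothetical helper: pull the cost and per-tool counts out of a response."""
    usage = response.get("usage") or {}
    tool_usage = usage.get("tool_usage") or {}  # absent when no built-in tool ran
    return {
        # cost_usd already includes tool charges, so nothing is added on top.
        "cost_usd": usage.get("cost_usd"),
        "tool_calls": dict(tool_usage),
        "total_tool_calls": sum(tool_usage.values()),
    }


# Hypothetical payload; field placement and values are assumptions.
sample_response = {
    "usage": {
        "prompt_tokens": 20000,
        "completion_tokens": 3000,
        "cost_usd": 0.061,
        "tool_usage": {"web_search": 2},
    }
}
print(summarize_usage(sample_response))
```

The main thing to avoid is double counting: multiplying the tool counts by the per-call rate and adding that to `cost_usd` would charge the tools twice in your own accounting.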

Variants

`:variant1`

Field	Value
Model id	`kimi-k2-7-code:variant1`
Model release date	2026-06-16
Region	Germany
Context window	256K
Weight precision	-
Max output tokens	16,384
Features	reasoning, function_calling, multimodal, agentic_coding, cache, web_search
Native inference	No
Structured output	Not supported
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `POST /v1beta/models/kimi-k2-7-code:variant1:generateContent`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	$0.8939 (was $0.95)
Output	per 1M generated tokens	$3.7131 (was $4.00)
Implicit cache read	per 1M cached input tokens	$0.1788
Web Search (Linkup)	per call when invoked	$0.013
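To see how the implicit cache rate affects a request on this variant, the sketch below splits prompt tokens into cached and uncached portions. It assumes that cached prompt tokens are billed at the cache read rate instead of the regular input rate; this page lists the rates but does not spell out that rule. `estimate_variant_cost_usd` and the sample numbers are illustrative.

```python
from decimal import Decimal

# Rates copied from the variant pricing table above (USD).
INPUT_PER_M = Decimal("0.8939")
OUTPUT_PER_M = Decimal("3.7131")
CACHE_READ_PER_M = Decimal("0.1788")
LINKUP_SEARCH_PER_CALL = Decimal("0.013")


def estimate_variant_cost_usd(prompt_tokens, cached_tokens, output_tokens, search_calls=0):
    """Hypothetical helper: estimate one request's cost on the :variant1 model."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    million = Decimal(1_000_000)
    uncached = prompt_tokens - cached_tokens
    return (
        Decimal(uncached) / million * INPUT_PER_M
        # Assumption: cached tokens are charged only at the cache read rate.
        + Decimal(cached_tokens) / million * CACHE_READ_PER_M
        + Decimal(output_tokens) / million * OUTPUT_PER_M
        + Decimal(search_calls) * LINKUP_SEARCH_PER_CALL
    )


# Illustrative numbers: 100,000 prompt tokens (60,000 cached), 2,000 output, 1 search.
print(estimate_variant_cost_usd(100_000, 60_000, 2_000, search_calls=1))
```

Under that assumption, long prompts with a stable prefix benefit most, since the cache read rate is roughly one fifth of the input rate.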

Parameters

Parameter	Type	Required	Default	Description
`max_tokens`	number	no	`16384`	Maximum output tokens. Reasoning tokens count toward this limit. · Range: 1 – 16384
`stop`	string	no	-	Up to 4 strings where the model will stop generating further tokens.
`response_format`	object	no	-	OpenAI-compatible JSON mode or JSON schema response format.
`web_search_linkup`	boolean	no	`false`	Optional web search powered by Linkup. When enabled, recent web sources are retrieved using your latest user message as the query and provided to the model as additional context. Each invoked call adds $0.013 on top of the model’s normal token cost.

Machine-readable schema: `GET https://api.empiriolabs.ai/v1/models/kimi-k2-7-code`