MiMo V2.5 | EmpirioLabs AI Docs

Xiaomi · Text Generation

POST /v1/chat/completions

Multimodal model with native visual and audio understanding on a 1M context, designed to reason and act across modalities in agentic workflows.

At a glance

Field	Value
Model id	`mimo-v2-5`
Model release date	2026-04-22
Input modalities	Text, Image, Video, Audio
Output modalities	Text
Context window	1M
Weight precision	-
Max output tokens	128,000
Features	vision, audio_in, function_calling, reasoning, web_search
Native inference	No
New	No
Structured output	JSON Mode
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`, `POST /v1beta/models/mimo-v2-5:generateContent`
Alternate model ids	`mimo-v2.5`, `mimo/v2.5`, `xiaomi/mimo-v2.5`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	$0.70
Output	per 1M generated tokens	$1.40
Implicit cache read	per 1M cached input tokens	$0.014
Web search	per request when enabled	$0.015
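
As a rough illustration of how the rates in the table combine, the sketch below estimates the cost of one request. The function name `estimate_cost_usd` and its arguments are hypothetical and not part of the API. It assumes cached and uncached input tokens are counted separately, and that the web search fee is passed in as a number of billable charges, since the table does not spell out how the fee is counted.

```python
from decimal import Decimal

# Rates copied from the pricing table above (USD).
INPUT_PER_M = Decimal("0.70")        # per 1M prompt tokens
OUTPUT_PER_M = Decimal("1.40")       # per 1M generated tokens
CACHE_READ_PER_M = Decimal("0.014")  # per 1M cached input tokens
WEB_SEARCH_FEE = Decimal("0.015")    # per billable web search charge
MILLION = Decimal(1_000_000)


def estimate_cost_usd(uncached_input_tokens, output_tokens,
                      cached_input_tokens=0, web_search_charges=0):
    """Estimate the cost of one request in USD (hypothetical helper).

    Assumption: cached input tokens are billed only at the cache-read rate
    and are not also counted as regular input tokens.
    """
    token_cost = (
        Decimal(uncached_input_tokens) * INPUT_PER_M
        + Decimal(output_tokens) * OUTPUT_PER_M
        + Decimal(cached_input_tokens) * CACHE_READ_PER_M
    ) / MILLION
    return token_cost + web_search_charges * WEB_SEARCH_FEE
```

Under these assumptions, `estimate_cost_usd(10_000, 2_000, 50_000, 1)` comes to 0.0255 USD: 0.0105 for tokens plus 0.015 for one web search charge. Treat the `cost_usd` value returned by the API as authoritative.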

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "mimo-v2-5", "messages": [{"role": "user", "content": "Hello"}]}'
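
The same request can be sketched in Python using only the standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical. The sketch also assumes the endpoint returns a JSON body, which this page does not state explicitly.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(prompt, api_key, model="mimo-v2-5"):
    """Build (but do not send) a POST request matching the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request):
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical call would read the key from the environment, for example `send_chat_request(build_chat_request("Hello", os.environ["EMPIRIOLABS_API_KEY"]))` after `import os`.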

Parameters

Parameter	Type	Required	Default	Description
`enable_thinking`	boolean	no	true	Enable extended thinking mode. Slower but improves reasoning-heavy tasks.
`tool_web_search`	boolean	no	false	Allow the model to perform web searches when needed.
`web_search_force`	boolean	no	false	Force the model to always run a web search before answering.
`web_search_max_keyword`	number	no	`3`	Max number of keywords the model can use across web searches. · Range: 1 – 5
`web_search_limit`	number	no	`5`	Max number of web searches the model can perform per request. · Range: 1 – 10
`video_fps`	number	no	`2`	Frames-per-second sampled from input video for analysis. · Range: 0.1 – 10
`video_resolution`	enum	no	`"default"`	Resolution at which input video is sampled (e.g. 360p, 480p, 720p). · Allowed: `default`, `max`
`temperature`	number	no	`0.7`	Sampling temperature. 0 = deterministic, 2 = maximum randomness. · Range: 0 – 2
`top_p`	number	no	`0.9`	Nucleus sampling probability mass. Lower = more focused. · Range: 0 – 1
`max_tokens`	number	no	`4096`	Maximum tokens in the response. · Range: 1 – 65536
`stop`	string	no	-	Up to 4 strings where the model will stop generating further tokens.
`response_format`	enum	no	-	Return the output as a valid JSON object (JSON mode). Describe the fields you want in your prompt.
`disable_formatting`	boolean	no	false	Skip the EmpirioLabs Markdown formatting (citation [N] rewriting + References block when web search was used). The raw upstream answer with plain [N] citations is returned.
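
The ranges and allowed values in the table can be checked client-side before a request is sent. The sketch below is one way to do that. The helper `build_body` is hypothetical, and it assumes these parameters are sent as top-level fields of the JSON body next to `model` and `messages`, which this page does not confirm. `stop` and `response_format` are passed through unchecked because their exact shapes are not documented here.

```python
# (min, max) ranges copied from the parameter table above.
NUMERIC_RANGES = {
    "web_search_max_keyword": (1, 5),
    "web_search_limit": (1, 10),
    "video_fps": (0.1, 10),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_tokens": (1, 65536),
}
BOOLEAN_PARAMS = {
    "enable_thinking", "tool_web_search", "web_search_force", "disable_formatting",
}
VIDEO_RESOLUTIONS = {"default", "max"}


def build_body(messages, model="mimo-v2-5", **params):
    """Validate documented parameters and return a request body dict.

    Assumption: parameters are top-level fields of the JSON body.
    """
    for name, value in params.items():
        if name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be a number")
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name in BOOLEAN_PARAMS:
            if not isinstance(value, bool):
                raise TypeError(f"{name} must be a boolean")
        elif name == "video_resolution":
            if value not in VIDEO_RESOLUTIONS:
                raise ValueError("video_resolution must be 'default' or 'max'")
        # Any other name (e.g. stop, response_format) is passed through as-is.
    return {"model": model, "messages": messages, **params}
```

For example, `build_body(messages, tool_web_search=True, web_search_limit=3)` returns a body with both fields set, while `build_body(messages, temperature=2.5)` raises `ValueError`.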

Notes

Accepts text, image, video, and audio input and returns text. Web search ($0.015/call) is charged only when invoked. Cached input tokens are billed at $0.014 per 1M, which is 2% of the standard $0.70 input rate.

Per-tool billing (usage.tool_usage)

When this model invokes tools (web search, code interpreter, etc.) inside a single request, the response carries a normalized `usage.tool_usage` map alongside the token counts. The example below shows the shape; exact field names, units, and which tools appear can vary slightly per provider:

"usage": {
  "prompt_tokens": 123,
  "completion_tokens": 456,
  "cost_usd": 0.0042,
  "tool_usage": {"web_search": 3, "code_interpreter": 1}
}

The tool counts are already factored into `cost_usd`; they are surfaced for transparency so you can audit per-tool billing. The field is omitted when no tools were invoked.
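
Because `tool_usage` may be absent, client code should not index into it directly. The sketch below shows one defensive way to read the `usage` object. The helper `summarize_usage` is hypothetical, and it assumes the `tool_usage` values are integer counts as in the example above, which may not hold for every provider.

```python
def summarize_usage(usage):
    """Summarize a usage object, tolerating a missing tool_usage map.

    Assumption: tool_usage values are integer call counts.
    """
    tool_usage = usage.get("tool_usage") or {}
    return {
        "total_tokens": usage.get("prompt_tokens", 0)
        + usage.get("completion_tokens", 0),
        "cost_usd": usage.get("cost_usd"),
        "tool_calls": dict(tool_usage),
        "total_tool_calls": sum(tool_usage.values()),
    }
```

Applied to the example object above, this gives 579 total tokens and 4 tool calls. For a response with no tools invoked, `tool_calls` is an empty dict and `total_tool_calls` is 0.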

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/mimo-v2-5.