MiMo V2.5

POST /v1/chat/completionsMultimodal model with native visual and audio understanding on a 1M context, designed to reason and act across modalities in agentic workflows.
At a glance
Pricing
Example request
Parameters
Notes
Omnimodal input (text, image, video, audio) with text output. Web search ($0.015/call) is charged only when invoked. Cached input tokens are billed at a steep discount.
Per-tool billing (usage.tool_usage)
When this model invokes tools (web search, code interpreter, etc.) inside a single request, the response carries a normalized usage.tool_usage map alongside the token counts. The example below shows the shape — exact field names, units, and which tools appear can vary slightly per provider:
The tool counts are already factored into cost_usd — they are surfaced for transparency so you can audit per-tool billing. The field is omitted when no tools were invoked.
Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/mimo-v2-5.
