MiMo-V2-Omni | EmpirioLabs AI Docs

Provider: Xiaomi
Category: Text Generation
Endpoint: POST /v1/chat/completions
Context window: 256K
Served from: —

Omni-modal foundation model that natively understands text, images, audio, and video, with support for deep reasoning, web search, and multi-step planning.

At a glance

Field	Value
Model id	`mimo-v2-omni`
Input modalities	text, image, audio
Output modalities	text, audio
Context window	256K
Region	—
Features	vision, audio_in
New	No
Native inference	No

Pricing

Charge	Spec	Rate
Input	per 1M tokens	$0.40
Output	per 1M tokens	$2.00
Web Search	per call	$0.015
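
As a rough illustration of how the rates above combine, the following sketch estimates the cost of a single request. It assumes billing is strictly linear in token count with no minimum charge or rounding, which this page does not state; `estimate_cost` is a hypothetical helper, not part of any EmpirioLabs SDK.

```python
from decimal import Decimal

# Rates in USD, copied from the pricing table above.
INPUT_RATE_PER_MILLION = Decimal("0.40")
OUTPUT_RATE_PER_MILLION = Decimal("2.00")
WEB_SEARCH_RATE_PER_CALL = Decimal("0.015")

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, web_search_calls: int = 0) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: charges scale linearly with usage and are simply summed.
    """
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_RATE_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_RATE_PER_MILLION
    search_cost = Decimal(web_search_calls) * WEB_SEARCH_RATE_PER_CALL
    return input_cost + output_cost + search_cost


# Example: 12,000 input tokens, 800 output tokens, one web search call.
example_cost = estimate_cost(12_000, 800, web_search_calls=1)
```

`Decimal` is used instead of `float` so that small per-token amounts add up without binary rounding error.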

Example request

$ curl https://api.empiriolabs.ai/v1/chat/completions \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"model": "mimo-v2-omni", "messages": [{"role":"user","content":"Hello"}]}'
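
For readers calling the endpoint from code, here is a minimal Python sketch of the same request using only the standard library. The URL, headers, and body mirror the curl example. The helper names `build_chat_request` and `send` are hypothetical, and the assumption that the response body is JSON is not confirmed on this page.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = {
        "model": "mimo-v2-omni",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0):
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `send(build_chat_request(os.environ["EMPIRIOLABS_API_KEY"], "Hello"))` after `import os`. Building the request separately from sending it makes the payload easy to inspect or log before any network traffic happens.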

Parameters

Parameter	Type	Required	Default	Description
`deep_thinking`	boolean	no	`true`	—
`web_search_enabled`	boolean	no	`false`	—
`web_search_force`	boolean	no	`false`	—
`web_search_max_keyword`	number	no	`3`	Range: 1 – 5
`web_search_limit`	number	no	`5`	Range: 1 – 10
`video_fps`	number	no	`2`	Frames per second extracted from video input · Range: 0.1 – 10
`video_resolution`	enum	no	`"default"`	Allowed: `default`, `max`
`temperature`	number	no	`0.7`	Range: 0 – 1
`top_p`	number	no	`1`	Range: 0 – 1
`max_tokens`	number	no	`4096`	Range: 1 – 32768
`disable_formatting`	boolean	no	`false`	—
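
The ranges and allowed values in the table can be checked on the client before a request is sent. The sketch below is one way to do that, under two assumptions this page does not confirm: the parameters are passed as top-level fields of the JSON request body, and the listed ranges are inclusive at both ends. `validate_params` and `build_payload` are hypothetical helpers.

```python
# Numeric ranges (min, max) copied from the parameter table; assumed inclusive.
NUMERIC_RANGES = {
    "web_search_max_keyword": (1, 5),
    "web_search_limit": (1, 10),
    "video_fps": (0.1, 10),
    "temperature": (0, 1),
    "top_p": (0, 1),
    "max_tokens": (1, 32768),
}
BOOLEAN_PARAMS = {
    "deep_thinking",
    "web_search_enabled",
    "web_search_force",
    "disable_formatting",
}
ENUM_PARAMS = {"video_resolution": {"default", "max"}}


def validate_params(params: dict) -> dict:
    """Return a copy of params, raising if any value violates the table."""
    for name, value in params.items():
        if name in BOOLEAN_PARAMS:
            if not isinstance(value, bool):
                raise TypeError(f"{name} must be a boolean")
        elif name in NUMERIC_RANGES:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be a number")
            low, high = NUMERIC_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name in ENUM_PARAMS:
            if value not in ENUM_PARAMS[name]:
                raise ValueError(f"{name} must be one of {sorted(ENUM_PARAMS[name])}")
        else:
            raise ValueError(f"unknown parameter: {name}")
    return dict(params)


def build_payload(messages: list, **params) -> dict:
    """Combine the model id, messages, and validated parameters.

    Assumption: parameters sit at the top level of the request body.
    """
    payload = {"model": "mimo-v2-omni", "messages": messages}
    payload.update(validate_params(params))
    return payload
```

For example, `build_payload([{"role": "user", "content": "Hi"}], temperature=0.2, web_search_enabled=True)` returns a dict ready to serialize, while `temperature=1.5` raises `ValueError`. Rejecting names that are not in the table is a deliberate choice for this sketch. Loosen it if the endpoint accepts other fields.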

A live, machine-readable schema for this model is also available via GET https://api.empiriolabs.ai/v1/models/mimo-v2-omni.