MiMo-V2-Omni

Provider: Xiaomi
Category: Text Generation
Endpoint: POST /v1/chat/completions
Context window: 256K
Served from:

Omni-modal foundation model that natively understands text, images, audio, and video with deep reasoning, web search, and multi-step planning.

At a glance

| Field | Value |
| --- | --- |
| Model id | mimo-v2-omni |
| Input modalities | text, image, audio |
| Output modalities | text, audio |
| Context window | 256K |
| Region | |
| Features | vision, audio_in |
| New | No |
| Native inference | No |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Input | per 1M tokens | $0.40 |
| Output | per 1M tokens | $2.00 |
| Web Search | per call | $0.015 |
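
The rates above combine into a per-request estimate by simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost` is hypothetical, and it assumes token counts are billed linearly at the listed rates and that each web search is billed as one call. How reasoning tokens or audio output are counted is not stated here, so treat the result as a rough estimate.

```python
# Rates copied from the pricing table (USD).
INPUT_PER_MILLION = 0.40
OUTPUT_PER_MILLION = 2.00
WEB_SEARCH_PER_CALL = 0.015


def estimate_cost(input_tokens: int, output_tokens: int, web_search_calls: int = 0) -> float:
    """Return an estimated charge in USD for one request (hypothetical helper)."""
    token_cost = (
        input_tokens * INPUT_PER_MILLION + output_tokens * OUTPUT_PER_MILLION
    ) / 1_000_000
    return token_cost + web_search_calls * WEB_SEARCH_PER_CALL


if __name__ == "__main__":
    # 12,000 input tokens, 1,500 output tokens, two web searches.
    print(f"${estimate_cost(12_000, 1_500, web_search_calls=2):.4f}")
```

Under these assumptions, 12,000 input tokens, 1,500 output tokens, and two web searches come to roughly $0.0378, most of which is the web search charge.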

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "mimo-v2-omni", "messages": [{"role":"user","content":"Hello"}]}'
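
The same request can be sketched in Python using only the standard library. The helper names `build_request` and `send_request` are hypothetical. The response format is not documented in this section, so the sketch only assumes the endpoint returns JSON and makes no assumption about its fields.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request from the curl example (hypothetical helper)."""
    body = {
        "model": "mimo-v2-omni",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(req: urllib.request.Request) -> dict:
    """Send the request and decode the body, assuming the response is JSON."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A typical call would be `send_request(build_request("Hello", os.environ["EMPIRIOLABS_API_KEY"]))` after `import os`, which reads the key from the same environment variable the curl example uses.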

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| deep_thinking | boolean | no | true | |
| web_search_enabled | boolean | no | false | |
| web_search_force | boolean | no | false | |
| web_search_max_keyword | number | no | 3 | Range: 1 – 5 |
| web_search_limit | number | no | 5 | Range: 1 – 10 |
| video_fps | number | no | 2 | Frames per second extracted from video input · Range: 0.1 – 10 |
| video_resolution | enum | no | "default" | Allowed: default, max |
| temperature | number | no | 0.7 | Range: 0 – 1 |
| top_p | number | no | 1 | Range: 0 – 1 |
| max_tokens | number | no | 4096 | Range: 1 – 32768 |
| disable_formatting | boolean | no | false | |
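
The documented ranges and allowed values can be checked on the client before a request is sent. The sketch below is a minimal illustration: `validate_params` and `build_body` are hypothetical helpers, the ranges are assumed to be inclusive, and the parameters are assumed to be passed as top-level fields of the JSON body alongside `model` and `messages`, which this section does not state explicitly.

```python
# Numeric ranges from the parameter table: name -> (min, max), assumed inclusive.
RANGES = {
    "web_search_max_keyword": (1, 5),
    "web_search_limit": (1, 10),
    "video_fps": (0.1, 10),
    "temperature": (0, 1),
    "top_p": (0, 1),
    "max_tokens": (1, 32768),
}
BOOLEANS = {"deep_thinking", "web_search_enabled", "web_search_force", "disable_formatting"}
ENUMS = {"video_resolution": {"default", "max"}}


def validate_params(params: dict) -> dict:
    """Check each parameter against the documented type and range."""
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be a number")
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name in BOOLEANS:
            if not isinstance(value, bool):
                raise TypeError(f"{name} must be a boolean")
        elif name in ENUMS:
            if value not in ENUMS[name]:
                raise ValueError(f"{name} must be one of {sorted(ENUMS[name])}")
        else:
            raise KeyError(f"unknown parameter: {name}")
    return dict(params)


def build_body(prompt: str, **params) -> dict:
    """Build a request body; top-level placement of params is an assumption."""
    body = {"model": "mimo-v2-omni", "messages": [{"role": "user", "content": prompt}]}
    body.update(validate_params(params))
    return body
```

For example, `build_body("Hello", web_search_enabled=True, web_search_limit=3)` returns a body with both options set, while `build_body("Hello", temperature=1.5)` raises `ValueError` because the documented range for `temperature` is 0 to 1.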

A live, machine-readable schema is also available via GET https://api.empiriolabs.ai/v1/models/mimo-v2-omni.