MiniMax M2.7 Highspeed

MiniMax · Text Generation
POST /v1/chat/completions

High-speed variant of M2.7, tuned for fast inference while keeping strong general-purpose performance and strong agentic capabilities.

At a glance

Model id: minimax-m2-7-highspeed
Input modalities: Text
Output modalities: Text
Context window: 200K tokens
Weight precision: not specified
Max output tokens: 32,768
Region: Singapore
Features: reasoning
Native inference: No
New: No
Supported endpoints: POST /v1/chat/completions, POST /v1/responses, POST /v1/messages
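The same model is exposed on three endpoints. As a sketch only, assuming the gateway follows the upstream request formats (OpenAI Chat Completions, OpenAI Responses, and Anthropic Messages) and accepts the same Bearer token on all three, which this page does not confirm, minimal request bodies might look like this. `body_for` is a hypothetical helper name.

```shell
# Hypothetical helper: print a minimal request body for each supported endpoint.
# Assumption: the gateway mirrors the upstream formats, i.e. OpenAI Chat
# Completions ("messages"), OpenAI Responses ("input"), and Anthropic Messages
# ("messages" plus a required "max_tokens").
body_for() {
  case "$1" in
    /v1/chat/completions)
      echo '{"model": "minimax-m2-7-highspeed", "messages": [{"role": "user", "content": "Hello"}]}' ;;
    /v1/responses)
      echo '{"model": "minimax-m2-7-highspeed", "input": "Hello"}' ;;
    /v1/messages)
      echo '{"model": "minimax-m2-7-highspeed", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hello"}]}' ;;
    *)
      echo "unsupported endpoint: $1" >&2
      return 1 ;;
  esac
}

# Send each body only when an API key is set (same Bearer auth assumed for all).
if [ -n "${EMPIRIOLABS_API_KEY:-}" ]; then
  for ep in /v1/chat/completions /v1/responses /v1/messages; do
    body_for "$ep" | curl -s "https://api.empiriolabs.ai$ep" \
      -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
      -H "Content-Type: application/json" \
      -d @-
    echo
  done
fi
```

If the gateway's /v1/messages route expects Anthropic-style headers instead of a Bearer token, the loop above would need adjusting for that endpoint.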

Pricing

Input: $0.30 per 1M prompt tokens (was $0.60)
Output: $1.20 per 1M generated tokens (was $2.40)
Implicit cache read: $0.03 per 1M cached input tokens (was $0.06)
Web Search (Linkup): $0.013 per call, charged only when invoked
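As a rough worked example of these rates, the function below estimates the cost of a single request. It assumes, which the pricing list does not state, that cached input tokens are a subset of the prompt tokens and are billed at the cache-read rate instead of the normal input rate. `estimate_cost` is a hypothetical helper name.

```shell
# Hypothetical helper: estimate one request's cost in USD from the listed rates.
# Args: prompt_tokens cached_tokens output_tokens [search_calls]
# Assumption: cached_tokens is part of prompt_tokens and billed at the cache rate.
estimate_cost() {
  local prompt_tokens=$1 cached_tokens=$2 output_tokens=$3 search_calls=${4:-0}
  awk -v p="$prompt_tokens" -v c="$cached_tokens" -v o="$output_tokens" -v s="$search_calls" 'BEGIN {
    cost  = (p - c) * 0.30 / 1e6   # uncached prompt tokens at the input rate
    cost += c * 0.03 / 1e6         # cached prompt tokens at the implicit cache read rate
    cost += o * 1.20 / 1e6         # generated tokens at the output rate
    cost += s * 0.013              # flat Linkup web search fee per invoked call
    printf "%.6f\n", cost
  }'
}
```

For example, 10,000 prompt tokens of which 8,000 hit the cache, 1,000 output tokens, and one Linkup call: `estimate_cost 10000 8000 1000 1` prints 0.015040, so at this size the web search fee dominates the token cost.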

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "minimax-m2-7-highspeed", "messages": [{"role":"user","content":"Hello"}]}'

Parameters

temperature (number, optional, default 1.0, range 0 to 2): Sampling temperature. 0 = deterministic, 2 = maximum randomness.
top_p (number, optional, default 0.95, range 0 to 1): Nucleus sampling probability mass. Lower values give more focused output.
max_tokens (number, optional, default 4096, range 1 to 131072): Maximum number of tokens in the response.
stop (string, optional, no default): Up to 4 sequences at which the model stops generating further tokens.
tools (array, optional, no default): OpenAI-style function-calling tool definitions. Each entry has a name, description, and parameters.
tool_choice (string or object, optional, no default): Controls whether the model must call a tool. One of "auto", "none", "required", or {"type": "function", "function": {"name": "…"}} to force a specific function.
web_search_linkup (boolean, optional, default false): Web search powered by Linkup. When enabled, recent web sources are retrieved using your latest user message as the query and passed to the model as additional context. Adds a flat $0.013 per request on top of the model's normal token cost.
disable_formatting (boolean, optional, default false): When enabled, the gateway does not append the "Sources" footer to assistant responses that used Linkup web search. Useful when the model output is piped to another system that expects undecorated text.
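The parameters above combine in a single JSON body. This is a sketch, not an official client: it builds two example bodies, one using the sampling, stop, and Linkup options and one using function calling, and only sends them when EMPIRIOLABS_API_KEY is set. The tool wrapper shape {"type": "function", "function": {...}} is assumed from the OpenAI convention the table refers to; `get_weather` and the helper names are hypothetical, and prompts are assumed to need no JSON escaping.

```shell
# 1) Sampling controls plus Linkup web search, with the "Sources" footer suppressed.
search_payload() {
  cat <<EOF
{
  "model": "minimax-m2-7-highspeed",
  "messages": [{"role": "user", "content": "$1"}],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 1024,
  "stop": "END",
  "web_search_linkup": true,
  "disable_formatting": true
}
EOF
}

# 2) OpenAI-style function calling; get_weather is a hypothetical tool.
tool_payload() {
  cat <<EOF
{
  "model": "minimax-m2-7-highspeed",
  "messages": [{"role": "user", "content": "$1"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "tool_choice": "auto"
}
EOF
}

# POST a JSON body read from stdin to the chat completions endpoint.
send_chat() {
  curl -s https://api.empiriolabs.ai/v1/chat/completions \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H "Content-Type: application/json" \
    -d @-
}

# Only make network calls when an API key is available.
if [ -n "${EMPIRIOLABS_API_KEY:-}" ]; then
  search_payload "What changed in the latest MiniMax release?" | send_chat
  echo
  tool_payload "What is the weather in Singapore?" | send_chat
fi
```

Setting "tool_choice" to "required" instead of "auto" would force the model to call a tool rather than answer directly.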

Notes

Offers the same frontier-level performance as M2.7 at roughly 100 output tokens/sec. Interleaved thinking is always on and cannot be disabled.
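To turn the ~100 tokens/sec figure into a latency budget, divide the expected number of generated tokens by the rate. This sketch ignores time to first token and network overhead, and assumes that reasoning tokens from the always-on interleaved thinking count toward the generated total. `estimate_seconds` is a hypothetical helper name.

```shell
# Hypothetical helper: approximate generation time in seconds.
# Args: generated_tokens [tokens_per_second, default 100 as stated in the notes]
estimate_seconds() {
  awk -v n="$1" -v rate="${2:-100}" 'BEGIN { printf "%.1f\n", n / rate }'
}
```

At that rate, a 4,096-token response (the max_tokens default) takes about 41 seconds, and a full 32,768-token output about 328 seconds.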


Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/minimax-m2-7-highspeed.