Fugu Ultra

Sakana AI · Text Generation
POST /v1/chat/completions

Multi-agent conductor that orchestrates frontier expert models for hard reasoning, coding, and research, with 1M context, image input, and web search.

At a glance

| Field | Value |
| --- | --- |
| Model id | fugu-ultra |
| Model release date | 2026-06-21 |
| Input modalities | Text, Image |
| Output modalities | Text |
| Context window | 1M |
| Weight precision | - |
| Max output tokens | 131,072 |
| Features | reasoning, multimodal, web_search, function_calling, structured_output, agentic_coding, cache |
| Native inference | No |
| New | Yes |
| Supported endpoints | POST /v1/chat/completions, POST /v1/responses, POST /v1/messages |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Input | per 1M prompt tokens | <=272K $7.50; >272K $15.00 |
| Output | per 1M generated tokens | <=272K $45.00; >272K $67.50 |
| Implicit cache read | per 1M cached input tokens | <=272K $1.50; >272K $3.00 |

Example request

curl https://api.empiriolabs.ai/v1/chat/completions \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "fugu-ultra", "messages": [{"role": "user", "content": "Hello"}]}'

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| max_tokens | integer | no | 32768 | Maximum number of output tokens for the final answer. The conductor needs room to work, so very small values can return empty output. · Range: 1 – 131072 |
| reasoning_effort | enum | no | "high" | How hard Fugu Ultra reasons. Reasoning is always on. The default is high; xhigh and max are aliases of the same maximum effort (more thorough and slower than high). · Allowed: high, xhigh, max |
| tool_web_search | boolean | no | false | Enable built-in web search. There is no separate fee; the search cost is reflected in the orchestration tokens billed for the request. |
| tools | array | no | [] | OpenAI-compatible function calling tool definitions. |
| tool_choice | object | no | - | OpenAI-compatible tool choice control. |
| response_format | object | no | - | OpenAI-compatible JSON mode for structured output. |
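
The parameters above can be combined into a single request body. The following is a minimal sketch, not an official client: `build_payload` and its keyword names are hypothetical helpers introduced here, and the `{"type": "json_object"}` value assumes that "OpenAI-compatible JSON mode" follows the usual OpenAI shape. The range and allowed values are taken from the table.

```python
import json

ALLOWED_EFFORT = {"high", "xhigh", "max"}  # allowed values listed for reasoning_effort
MAX_OUTPUT_TOKENS = 131072                 # documented upper bound for max_tokens


def build_payload(prompt, max_tokens=32768, reasoning_effort="high",
                  web_search=False, tools=None, response_format=None):
    """Build a chat-completions body for fugu-ultra (hypothetical helper)."""
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be in 1..{MAX_OUTPUT_TOKENS}")
    if reasoning_effort not in ALLOWED_EFFORT:
        raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORT)}")
    payload = {
        "model": "fugu-ultra",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "reasoning_effort": reasoning_effort,
        "tool_web_search": web_search,
    }
    # Optional fields are only sent when the caller supplies them.
    if tools:
        payload["tools"] = tools
    if response_format is not None:
        payload["response_format"] = response_format
    return payload


if __name__ == "__main__":
    body = build_payload(
        "Summarize today's AI news as JSON.",
        web_search=True,
        response_format={"type": "json_object"},  # assumed OpenAI-style JSON mode
    )
    print(json.dumps(body, indent=2))
```

Validating on the client side is a design choice for this sketch; the API may well perform its own validation and return a different error.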

Notes

Fugu Ultra is a multi-agent conductor: each request coordinates a pool of expert models and composes their work into a single answer.

Latency and streaming

  • Responses can take from a few seconds to a few minutes on complex prompts.
  • Answers are not streamed token by token. Requests with streaming enabled are accepted, but the complete response is delivered in one piece when the model finishes.
  • Leave generous max_tokens headroom, since very small limits can truncate or empty the answer.
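
Because a response may take minutes and arrives in one piece, a client will probably want a long timeout and a check for an empty answer. The sketch below uses only the Python standard library. The 600-second timeout is an arbitrary assumption, and `extract_text` assumes an OpenAI-style response body (`choices[0].message.content`), which this page does not spell out; `build_request`, `extract_text`, and `ask` are hypothetical names.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_request(prompt, api_key, max_tokens=32768):
    """Prepare a POST request; nothing is sent until urlopen is called."""
    body = {
        "model": "fugu-ultra",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # keep generous headroom, per the note above
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response):
    """Return the answer text, assuming an OpenAI-style response shape."""
    content = response["choices"][0]["message"].get("content") or ""
    if not content.strip():
        raise RuntimeError("empty answer; consider raising max_tokens")
    return content


def ask(prompt, api_key, timeout_s=600):
    """Send the request and wait for the whole answer (timeout is an assumption)."""
    request = build_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return extract_text(json.load(resp))
```

A call such as `ask("Hello", os.environ["EMPIRIOLABS_API_KEY"])` would block until the full answer is available, so interactive applications may want to run it in a background task.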

Capabilities

  • Text and image input, with a 1M token context.
  • Always-on reasoning. high is the default; xhigh and max are the same maximum effort.
  • Function calling, JSON mode, and built-in web search that cites its sources when available (no separate fee).
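
This page does not show the message format for image input. The sketch below assumes the endpoint accepts OpenAI-style content parts (`text` and `image_url` with a base64 data URL); that is an assumption based on the OpenAI-compatible wording elsewhere on this page, so check the machine-readable schema before relying on it. `image_message` is a hypothetical helper.

```python
import base64


def image_message(text, image_bytes, mime_type="image/png"):
    """Build one user message carrying text plus an inline image.

    Assumes OpenAI-style content parts; not confirmed by this page.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary would go into the `messages` array of a request in place of a plain-text user message.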

Billing

  • Billed on full token usage, including the orchestration tokens the model uses internally, so even short prompts carry some cost.
  • Context-tiered: requests above 272K total input tokens are billed at the higher rate listed under Pricing.
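
The tiered rates can be turned into a rough cost estimate. This sketch makes several assumptions that the page does not confirm: "272K" is read as 272,000 tokens, the higher tier applies to every token in the request once the input exceeds the threshold, and cached tokens are counted inside the input total and billed at the cache-read rate instead of the input rate. `estimate_cost` is a hypothetical helper. Since orchestration tokens are billed too, the token counts should come from the billed usage, not from the prompt length alone.

```python
TIER_THRESHOLD = 272_000  # assumption: "272K" means 272,000 tokens

# USD per 1M tokens, copied from the Pricing table
RATES = {
    "low": {"input": 7.50, "output": 45.00, "cache_read": 1.50},
    "high": {"input": 15.00, "output": 67.50, "cache_read": 3.00},
}


def estimate_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request under the assumptions above."""
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    # The tier is chosen from the total input size of the request.
    tier = "high" if input_tokens > TIER_THRESHOLD else "low"
    rate = RATES[tier]
    uncached = input_tokens - cached_tokens
    total = (
        uncached * rate["input"]
        + cached_tokens * rate["cache_read"]
        + output_tokens * rate["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 100,000 input tokens (20,000 of them cached) and 4,000 output tokens
    print(f"${estimate_cost(100_000, 4_000, cached_tokens=20_000):.2f}")  # $0.81
    # 300,000 input tokens crosses the threshold, so the higher rates apply
    print(f"${estimate_cost(300_000, 10_000):.3f}")  # $5.175
```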

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/fugu-ultra.
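
A minimal sketch of fetching that schema with the Python standard library follows. Whether the endpoint requires the `Authorization` header is not stated here, so the key is optional in this sketch. The response is returned as parsed JSON without assuming any particular fields. `schema_request` and `fetch_schema` are hypothetical names.

```python
import json
import urllib.request

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/fugu-ultra"


def schema_request(api_key=None):
    """Prepare the GET request; the auth header is added only if a key is given."""
    headers = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(SCHEMA_URL, headers=headers, method="GET")


def fetch_schema(api_key=None, timeout_s=30):
    """Fetch and parse the schema document (field names are not assumed)."""
    with urllib.request.urlopen(schema_request(api_key), timeout=timeout_s) as resp:
        return json.load(resp)
```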