Fugu Ultra

Sakana AI · Text Generation

POST /v1/chat/completions

Multi-agent conductor that orchestrates frontier expert models for hard reasoning, coding, and research, with 1M context, image input, and web search.

At a glance

Field	Value
Model id	`fugu-ultra`
Model release date	2026-06-21
Input modalities	Text, Image
Output modalities	Text
Context window	1M
Weight precision	-
Max output tokens	131,072
Features	reasoning, multimodal, web_search, function_calling, structured_output, agentic_coding, cache
Native inference	No
New	Yes
Supported endpoints	`POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/messages`

Pricing

Charge	Spec	Rate
Input	per 1M prompt tokens	<=272K $7.50; >272K $15.00
Output	per 1M generated tokens	<=272K $45.00; >272K $67.50
Implicit cache read	per 1M cached input tokens	<=272K $1.50; >272K $3.00

Example request

$ curl https://api.empiriolabs.ai/v1/chat/completions \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H 'Content-Type: application/json' \
>   -d '{"model": "fugu-ultra", "messages": [{"role":"user","content":"Hello"}]}'

Parameters

Parameter	Type	Required	Default	Description
`max_tokens`	integer	no	`32768`	Maximum number of output tokens for the final answer. The conductor needs room to work, so very small values can return empty output. · Range: 1 – 131072
`reasoning_effort`	enum	no	`"high"`	How hard Fugu Ultra reasons. Reasoning is always on and cannot be disabled. `xhigh` and `max` are aliases for the same maximum effort, which is more thorough and slower than `high`. · Allowed: `high`, `xhigh`, `max`
`tool_web_search`	boolean	no	`false`	Enable built-in web search. There is no separate fee; the search cost is reflected in the orchestration tokens billed for the request.
`tools`	array	no	`[]`	OpenAI-compatible function calling tool definitions.
`tool_choice`	object	no	-	OpenAI-compatible tool choice control.
`response_format`	object	no	-	OpenAI-compatible JSON mode for structured output.
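
The table above can be turned into a small client-side check before a request is sent. The following is a minimal sketch, not an official client: `build_payload` is a hypothetical helper, and it assumes the parameters sit at the top level of the JSON body next to `model` and `messages`, as in the example request.

```python
import json

# Limits copied from the Parameters table.
ALLOWED_EFFORT = {"high", "xhigh", "max"}
MAX_OUTPUT_TOKENS = 131072


def build_payload(prompt, max_tokens=32768, reasoning_effort="high", web_search=False):
    """Build a chat-completions body for fugu-ultra (hypothetical helper).

    Assumption: parameters are top-level fields of the request body.
    """
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    if reasoning_effort not in ALLOWED_EFFORT:
        raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORT)}")
    return {
        "model": "fugu-ultra",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "reasoning_effort": reasoning_effort,
        "tool_web_search": web_search,
    }


if __name__ == "__main__":
    payload = build_payload("Hello", reasoning_effort="max", web_search=True)
    print(json.dumps(payload, indent=2))
```

Rejecting out-of-range values locally avoids paying for a round trip that the API would refuse anyway.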

Notes

Fugu Ultra is a multi-agent conductor: each request coordinates a pool of expert models and composes their work into a single answer.

Latency and streaming

Responses can take from a few seconds to a few minutes on complex prompts.
The full answer is returned all at once when the model finishes, not token by token. Streaming requests are accepted, but the complete response is delivered in one piece at the end.
Leave generous `max_tokens` headroom: very small limits can truncate the answer or leave it empty.
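
Because a reply can take minutes and arrives in one piece, a client mainly needs a long read timeout rather than stream handling. Here is a minimal sketch using only the Python standard library; the helper names and the 600-second timeout are assumptions chosen for illustration, and the response is printed as raw JSON because its exact shape is not described here.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/chat/completions"


def build_request(prompt, api_key, max_tokens=32768):
    """Prepare (but do not send) a POST request for fugu-ultra."""
    body = {
        "model": "fugu-ultra",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # keep headroom so the answer is not cut off
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(prompt, api_key, timeout_s=600):
    """Send the request and wait up to timeout_s seconds (assumed value)."""
    request = build_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


if __name__ == "__main__":
    reply = ask("Hello", os.environ["EMPIRIOLABS_API_KEY"])
    print(json.dumps(reply, indent=2))
```

A default HTTP timeout of 30 to 60 seconds, common in many clients, would likely abort complex prompts before the answer arrives.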

Capabilities

Text and image input, with a 1M-token context.
Always-on reasoning. `high` is the default; `xhigh` and `max` are aliases for the same maximum effort.
Function calling, JSON mode, and built-in web search that cites its sources when available (no separate fee).
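
Function calling and JSON mode are described as OpenAI-compatible, so request bodies can probably follow the usual OpenAI shapes. The sketch below builds two such bodies; the `get_weather` tool and both helper functions are hypothetical, and whether every OpenAI option is honored by this endpoint is an assumption.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def tool_call_payload(prompt):
    """Body that offers one tool and forces the model to call it."""
    return {
        "model": "fugu-ultra",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [GET_WEATHER_TOOL],
        "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
    }


def json_mode_payload(prompt):
    """Body that requests JSON mode (OpenAI-style response_format)."""
    return {
        "model": "fugu-ultra",
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},
    }


if __name__ == "__main__":
    print(json.dumps(tool_call_payload("Weather in Tokyo?"), indent=2))
    print(json.dumps(json_mode_payload("List three colors as JSON."), indent=2))
```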

Billing

Billed on full token usage, including the orchestration tokens the model uses internally, so even short prompts carry some cost.
Context-tiered: requests above 272K total input tokens are billed at the higher (>272K) rates shown under Pricing.
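
The tiering rule can be made concrete with a rough cost estimator. This is a sketch under stated assumptions, not the billing implementation: it treats 272K as 272,000 tokens, assumes the tier is chosen by total prompt tokens (cached tokens included), assumes cached tokens are charged at the cache-read rate instead of the input rate, and expects token counts that already include any orchestration tokens. `estimate_cost` is a hypothetical helper.

```python
# Rates in USD per 1M tokens, copied from the Pricing table.
# Each tuple is (rate at or below the threshold, rate above it).
RATES = {
    "input": (7.50, 15.00),
    "output": (45.00, 67.50),
    "cache_read": (1.50, 3.00),
}
TIER_THRESHOLD = 272_000  # assumption: "272K" means 272,000 tokens


def estimate_cost(prompt_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request (hypothetical helper).

    prompt_tokens is the total input size; cached_tokens is the part of it
    served from the implicit cache.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    tier = 1 if prompt_tokens > TIER_THRESHOLD else 0
    fresh_tokens = prompt_tokens - cached_tokens
    total = (
        fresh_tokens * RATES["input"][tier]
        + cached_tokens * RATES["cache_read"][tier]
        + output_tokens * RATES["output"][tier]
    )
    return total / 1_000_000


if __name__ == "__main__":
    for prompt, output in [(100_000, 10_000), (300_000, 10_000)]:
        print(f"{prompt} in / {output} out: ${estimate_cost(prompt, output):.3f}")
```

Under these assumptions, crossing the threshold raises the rate for the whole request rather than only for the tokens above 272K, so trimming a prompt that sits just over the limit can reduce the bill noticeably.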

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/fugu-ultra.