API Reference
Complete REST surface — chat, embeddings, reranks, images, video, 3D, audio, transcription, search, detection, jobs
EmpirioLabs speaks OpenAI- and Anthropic-compatible request shapes. Drop in any SDK, point it at https://api.empiriolabs.ai, and authenticate with your EmpirioLabs API key. Every endpoint below works against any OpenAI or Anthropic client unchanged.
Authentication
Every request requires a bearer token. Either header is accepted on every endpoint:
Endpoint surface
OpenAI-compatible chat. Streaming, tool calling, vision, audio input, JSON mode, structured output, reasoning controls.
OpenAI-compatible prompt completions for models that advertise POST /v1/completions.
Drop-in for Anthropic SDK clients. tool_use / tool_result blocks round-trip cleanly.
Generate, edit, inpaint, image variations. Hosted CDN URLs, 7-day signed.
Async video generation. Returns a job_id; poll the jobs endpoint for the URL.
TTS plus real-time streaming TTS (Inworld), music / podcast / SFX generation, voice clone management.
Long-running tool-using agent tasks. Start, poll, stream messages, stop early.
Whisper / Deepgram / Parakeet. Multipart upload or file_url.
Exa, Tavily, Linkup, Perplexity Search. Domain filters, date ranges, geo bias.
Async image-to-3D asset generation. Returns a job_id; poll for the signed GLB URL.
POST /v1/detect — GPTZero AI-detection, bibliography scan, source analysis.
OpenAI-compatible embeddings. Multilingual text + multimodal embedders.
Semantic document reranking. Sort retrieval candidates by relevance for RAG and search refinement.
Pass any public URL on input fields. No upload, no re-sign — generated outputs are valid for 7 days.
Poll the status / result of any async generation. State retained 1 hour after completion.
Live catalog with pricing, parameter schema, capability flags, regions.
OpenAI- and Anthropic-compatible error envelopes.
Chat completions
POST /v1/chat/completions
Pass any chat-capable model from the catalog as model. Streaming uses Server-Sent Events with data: ... lines and a final data: [DONE].
Every model’s accepted parameters live on its docs page (e.g. temperature, top_p, enable_thinking, reasoning_effort, web_search_tier). Browse them under Providers and Models.
Model parameters across endpoints
Model-specific parameters advertised on the model page and in GET /v1/models/{id} can be sent to /v1/chat/completions, /v1/responses, and /v1/messages when that model supports the endpoint. The gateway adapts request shapes so the same controls reach the underlying model.
For thinking-capable models, enable_thinking and thinking_budget are accepted on all three text endpoints. On /v1/messages, you can also use Anthropic-style thinking:
That maps to the same enable_thinking=true and thinking_budget=1024 controls used by Chat Completions and Responses.
Legacy completions
POST /v1/completions
Use this endpoint for OpenAI-compatible clients that still send a raw prompt instead of chat messages. Only models that list POST /v1/completions in supported_endpoints accept this shape.
Streaming uses Server-Sent Events and includes usage when the model service reports it.
Anthropic Messages
POST /v1/messages
Drop-in for any Anthropic SDK client — the same models accessible on /v1/chat/completions and /v1/responses are reachable here under the Anthropic Messages shape.
tool_use and tool_result blocks round-trip cleanly. Mixed text-plus-tool_use content arrays are preserved.
Image generation
POST /v1/images/generations
Image-edit flows accept image: ["https://..."] with up to the model’s documented limit (3 for qwen-image-2-0, 9 for wan-2-7-image, 14 for seedream-5-0-lite). Image-set modes generate cohesive series — see each model’s page for the toggle.
Returned URLs live on https://media.empiriolabs.ai and expire after 7 days. Save anything you want to keep before the URL expires.
POST /v1/images/analysis runs vision-only analysis (no generation) on one or more input images. Use for layout extraction, object detection, OCR, and similar inspection tasks where the model returns text or JSON describing the image rather than a new picture.
Video generation
POST /v1/videos/generations
Always async — the endpoint returns a job_id and a polling URL.
Audio generation
POST /v1/audio/speech synchronous, returns a hosted URL by default; pass response_format: "b64_json" for inline audio bytes.
POST /v1/audio/speech:stream real-time TTS. Returns Server-Sent Events as the model synthesizes. Sub-130ms time-to-first-byte on Inworld TTS Mini, sub-250ms on Max. Use for voice agents and interactive playback. Currently supported on Inworld TTS Mini / Max; other TTS models use the synchronous endpoint.
POST /v1/audio/generations music, podcast, and sound-effect generation. Covers Stable Audio, GLM TTS, MOSS, SoulX Podcast where the prompt-to-audio shape differs from TTS.
GET /v1/voices list and manage voices, including custom voice clones for Inworld TTS. Use the returned voice_id on either speech endpoint.
Transcription
POST /v1/audio/transcriptions
Accepts either a multipart file upload or a JSON payload with file_url.
Long files (over 5 minutes) auto-route to the async job system — the response includes a job_id instead of inline text. Poll the jobs endpoint to retrieve the final transcript.
Search and research
POST /v1/search unified search surface for retrieval-style models. The exact accepted params per model live on each model’s page (e.g. exa-search exposes 28 params including category, livecrawl, subpages, summary_query, code_tokens).
POST /v1/research deep research / multi-step retrieval models (Exa Research, Perplexity Deep Research, Linkup Deep Search). Generates a structured research report with cited sources.
POST /v1/answer direct question-answering models (Exa Answer). Returns a concise answer plus citations without the full report shape.
Agents
Long-running, tool-using agent tasks (currently routed to Manus). Submit once, then poll for status and step-by-step messages, or stop early.
POST /v1/agents/run does double duty:
- With no
task_idit starts a fresh task. The response carries the newtask_id. - With
task_idit sends a follow-up message to an existing task. The agent picks it up on its next reasoning step.
GET /v1/agents/{task_id} retrieve the task’s current status and final result.
GET /v1/agents/{task_id}/messages list every step the agent has emitted so far. Useful for rendering a live reasoning trace alongside the final answer.
POST /v1/agents/{task_id}/stop stop a running task. Billing settles for the work the agent already completed.
3D Generation
POST /v1/3d/generations
Image-to-3D generation is async. The endpoint returns a job_id and a polling URL; poll the jobs endpoint to retrieve the final signed GLB URL.
trellis-2-4b exposes the full image, resolution, sampler, texture, and mesh export parameter surface on its model page.
Detection
POST /v1/detect
Specialized text-classification endpoint. Currently powers GPTZero (AI-detection, bibliography scan, source analysis). Each model’s scan_type enum picks the upstream path; see the per-model docs for the full parameter surface.
GPTZero is also reachable via /v1/chat/completions and /v1/responses — pass the text on the message body and the gateway adapts the call. The detection summary comes back as the assistant message; pass disable_formatting: true to receive the raw upstream JSON instead.
Embeddings
POST /v1/embeddings
OpenAI-compatible embeddings. Multilingual text and multimodal (text + image + video) embedders are available.
Reranks
POST /v1/reranks
Sort candidate documents by semantic relevance to a query. Returns each document’s original index plus a 0-1 relevance score (higher = more relevant). Use this to tighten the output of a vector store / BM25 / hybrid retriever before passing the top hits to a language model — the standard last step in a RAG pipeline.
The optional instruct parameter swaps between Q&A retrieval (default) and pure semantic-similarity sorting — see the qwen3-rerank model page for the full parameter table.
Usage object
Every endpoint that bills usage returns a usage field on the response (and on the terminal streaming chunk). Base shape:
cost_usd— exact amount your account was billed for the request. Authoritative.prompt_tokens/completion_tokens/total_tokens— for chat-style models.- Cache fields (
cache_read_input_tokens,cache_creation_input_tokens) — when prompt caching applies.
Models with tiered, per-call, or variant-priced upstreams stamp extra fields on usage so you can see which rate was applied:
- Tier / variant pricing. Workers stamp a tier discriminator on
usagewhen the same dimension has more than one rate. The primary field ispricing_tier_label(human-readable, e.g."Medium context"/"Pro"/"2K"). Older workers may stamp the raw dimension directly instead (resolution,quality,mode,rate_tier). The dashboard renders the badge from whichever is present. - Per-call pricing. Workers that bill per tool invocation (search, fetch, code execution, etc.) stamp counts under
tool_calls_details.<tool>.invocationortool_usage.<tool>. The dashboard expands these into a per-tool breakdown automatically. - Per-dimension pricing. Workers that bill multiple dimensions in one request (e.g. citation tokens + reasoning tokens + search queries on deep-research models) stamp each dimension as its own field (
citation_tokens,reasoning_tokens,num_search_queries, etc.).
The same fields drive the tier badge and per-tool breakdown on the dashboard usage logs, and they are also returned by the GET /v1/account/usage history endpoint under each event’s metadata.worker_usage (plus a structured tool_breakdown array for per-call models). So whether you read live response usage, account-usage history, or your dashboard, the tier and billing breakdown match exactly.
File URLs
EmpirioLabs does not host user uploads. Pass any public URL directly on the input field of the model endpoint:
For audio transcription specifically, the multipart-direct upload on /v1/audio/transcriptions is the supported path for private clips that aren’t on a URL — those bytes flow straight to the speech-to-text worker without persistent storage.
Generated output URLs are signed and expire 7 days after creation. There is no re-sign endpoint. Save anything you need to keep — both the URL and the binary — within that window.
Async jobs
GET /v1/jobs/<job-id> — poll the status / final result of any async generation or transcription job.
Job state is retained for 1 hour after completion.
When status is completed, the result field carries the full response in the same shape the synchronous endpoint would have returned.
Inbound HTTP timeout is 15 minutes. Synchronous chat completions running close to that limit should set stream=true so partial output flows back and the connection stays warm.
Models
GET /v1/models — list every available model.
GET /v1/models/<model-id> — full schema for one model, including its parameter table.
GET /v1/models?format=openrouter returns the OpenRouter model-listing shape for models marked ready for partner ingestion. See OpenRouter Model Listing for the exact response fields.
Each model returns:
disable_formatting flag
Many chat, search, research, and rerank endpoints accept a disable_formatting=true flag. When set on a supporting model, the worker skips EmpirioLabs server-side formatting (citation rewriting, References block, thinking-block Markdown, etc.) and returns the upstream payload shape verbatim.
Coverage is advertised per-model. Check supports_passthrough in GET /v1/models/{id} to confirm a specific model honors the flag. Models that advertise supports_passthrough: true also accept the aliases raw=true, passthrough=true, and raw_response=true. Models without that field accept only the canonical disable_formatting=true form (or do not honor passthrough at all). The model card lists which aliases each model accepts.
Image, video, audio-generation, transcription, and embedding endpoints do not accept this flag, since there is no formatting layer to disable on those endpoints.
Generated media retention
Generated images, videos, and audio are returned as signed URLs that are valid for 7 days. After that, the URL stops working and the asset is gone — there is no re-sign endpoint. Save anything you want to keep before the 7-day window expires.
Errors
OpenAI envelope on chat / responses / images / videos / audio / search / embeddings / reranks:
Anthropic envelope on /v1/messages:
Headers reference
Browse the per-model parameter schemas under Providers and Models. When you click into a specific model, every parameter the model accepts — type, default, range, allowed values, conditional flags — is documented in a table generated from the live database.
