Limits and API Keys

Production limits, API keys, GPU Cloud, hosted agents, playground saved chats, and increase requests

Each account receives default production limits and can request higher limits as usage grows. The defaults below are kept in sync with the live platform settings.

Default account limits

Limit                               Default
Requests per minute                 50 RPM
Tokens per minute                   2,000,000 TPM
API keys per account                50
GPU Cloud GPUs per account          10
Hosted agents per account           3
Saved playground chats per account  50

Email support@empiriolabs.ai if you need higher limits for production workloads.

API key format

API keys use the sk-empiriolabs- prefix:

Authorization: Bearer sk-empiriolabs-...

Keep API keys server-side only. Never expose them in browser code, mobile apps, public repos, or client logs.
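
One way to keep a key server-side is to read it from the process environment at request time instead of hard-coding it. The following is a minimal sketch, not an official client: the variable name EMPIRIOLABS_API_KEY and the helper auth_headers are assumptions made for illustration. Only the sk-empiriolabs- prefix and the Bearer header come from this page.

```python
import os

KEY_PREFIX = "sk-empiriolabs-"  # prefix documented above


def auth_headers(env_var: str = "EMPIRIOLABS_API_KEY") -> dict:
    """Build the Authorization header from a key stored in the environment.

    The variable name EMPIRIOLABS_API_KEY is an assumption for illustration.
    """
    key = os.environ.get(env_var, "")
    if not key.startswith(KEY_PREFIX):
        # Fail early rather than sending a missing or malformed key.
        raise RuntimeError(f"{env_var} is unset or lacks the {KEY_PREFIX} prefix")
    return {"Authorization": f"Bearer {key}"}
```

The error message names the variable but never echoes the key value, so a misconfiguration does not leak the key into logs.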

GPU Cloud limits

GPU Cloud limits are account-scoped. The default shown above comes from the live platform settings, and the dashboard settings page shows your effective limit.

Multi-GPU deployments count each GPU toward the limit. For example, one 2-GPU instance uses two GPU slots. Stopped instances keep their deploy spec and continue to count toward the GPU Cloud limit until they are destroyed.

Disk size can be set from 100 GB to 300 GB per instance. Deploying or starting a GPU also requires enough credit balance for the initial running window.
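
The counting rules above can be checked locally before you deploy. This sketch assumes a hypothetical Instance record and uses the default limit of 10 GPUs. It does not call the platform, and it does not model the credit-balance requirement.

```python
from dataclasses import dataclass

GPU_LIMIT = 10                      # default GPUs per account; your effective limit may differ
MIN_DISK_GB, MAX_DISK_GB = 100, 300  # allowed disk size per instance


@dataclass
class Instance:  # hypothetical record, not a platform type
    gpus: int
    disk_gb: int
    stopped: bool = False


def slots_used(instances):
    # Every GPU counts, and stopped instances still count until destroyed.
    return sum(i.gpus for i in instances)


def can_deploy(existing, new, limit=GPU_LIMIT):
    """Return True if the new instance fits the disk range and the GPU limit."""
    disk_ok = MIN_DISK_GB <= new.disk_gb <= MAX_DISK_GB
    return disk_ok and slots_used(existing) + new.gpus <= limit
```

For example, with one running 2-GPU instance and one stopped 4-GPU instance, six slots are in use, so a new 4-GPU instance fits under the default limit but a 5-GPU instance does not.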

Hosted agent limits

Hosted agent limits are account-scoped. The default shown above comes from the live platform settings, and the dashboard settings page shows your effective limit.

Stopped hosted agents keep their managed runtime state and continue to count toward the limit until they are destroyed. Creating or renewing a hosted agent requires enough credit balance for the selected monthly plan.

Managing API keys

  • Generate new keys from the dashboard
  • Each account can hold up to 50 API keys by default; contact support to raise this limit
  • Delete unused keys promptly to reduce your attack surface
  • Use separate keys for production, staging, and development to isolate environments
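
To keep environments isolated in practice, each deployment can load only its own key. The sketch below assumes one environment variable per environment, named EMPIRIOLABS_API_KEY_PRODUCTION and so on. These names and the key_for helper are hypothetical and not part of the platform.

```python
import os

ENVIRONMENTS = ("production", "staging", "development")


def key_for(environment: str, environ=os.environ) -> str:
    """Return the API key configured for one environment."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    name = f"EMPIRIOLABS_API_KEY_{environment.upper()}"  # hypothetical naming scheme
    try:
        return environ[name]
    except KeyError:
        # Refuse to fall back to another environment's key.
        raise RuntimeError(f"{name} is not set") from None
```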

Saved playground chats

The Playground auto-saves conversations so you can come back to useful model tests, prompts, and responses later.

Saved Playground chat history currently covers text conversations for supported chat models and modes. Media generation, search, transcription, agent/task, and other non-text Playground runs can still be reviewed through usage history even when no chat transcript is saved.

Setting                 Behavior
Default saved-chat cap  50 saved chats per account
At the cap              New chat turns still run, but additional conversations are not saved until you delete older chats or request a higher limit
Public API              Use GET /v1/playground/conversations to list saved chats and GET /v1/playground/conversations/{id} to load one

The Playground UI also shows a status chip in the chat header:

Chip        Meaning
Saved       The latest settled turn is persisted
Saving      The client is waiting for the 600 ms auto-save debounce
Not saving  The account hit the saved-chat cap, so new turns continue but are not stored

The public saved-chat API is read-only. Saving and deleting chats still happen in the dashboard Playground.

Method  Path                                Purpose
GET     /v1/playground/conversations        List saved conversations
GET     /v1/playground/conversations/{id}   Load one saved conversation with messages
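
As an illustration of the two read-only endpoints, the sketch below builds the requests with the Python standard library. The base URL is a placeholder because this page does not state the API host, and the response is assumed to be JSON with an undocumented schema. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.invalid"  # placeholder; substitute the real API host


def conversations_request(api_key, conversation_id=None, base_url=BASE_URL):
    """Build a GET request for the list endpoint or for one saved conversation."""
    path = "/v1/playground/conversations"
    if conversation_id is not None:
        # Percent-encode the id so it stays a single path segment.
        path += "/" + urllib.parse.quote(str(conversation_id), safe="")
    return urllib.request.Request(
        base_url + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_json(request):
    # Assumes a JSON response body; the schema is not documented on this page.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling fetch_json(conversations_request(key)) would list saved chats, and passing a conversation id would load one. Both calls should run server-side, since they carry the API key.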

Rate limit behavior

When you exceed a rate limit, the API returns a 429 Too Many Requests response. Use exponential backoff with jitter when retrying.
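
A minimal sketch of exponential backoff with jitter follows. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling. The base delay, cap, and retry count are illustrative choices, not platform recommendations, and send is a hypothetical callable that performs the request and returns a (status_code, body) pair.

```python
import random
import time


def backoff_delays(max_retries=5, base=1.0, cap=30.0, rng=random.random):
    """Yield one delay per retry: uniform in [0, min(cap, base * 2**attempt))."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retry(send, max_retries=5, sleep=time.sleep):
    """Call send() and retry while it returns 429, up to max_retries retries.

    send is a hypothetical callable returning (status_code, body).
    """
    status, body = send()
    for delay in backoff_delays(max_retries):
        if status != 429:
            break
        sleep(delay)  # wait before the next attempt
        status, body = send()
    return status, body
```

The jitter spreads retries from concurrent workers over time, so they do not all hit the shared account budget at the same instant. If every retry still returns 429, the last response is handed back to the caller to deal with.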

Rate limits are applied per account, not per API key. All keys on the same account share the same RPM and TPM budget.

Requesting higher limits

If your workload requires more than the default 50 RPM or 2,000,000 TPM, email support@empiriolabs.ai with:

  • Your account email or account ID
  • The limits you need and why
  • Expected traffic patterns (peak RPM, average request size)

Common errors

Code                  Meaning
missing_api_key       No bearer token was provided.
invalid_api_key       The token is malformed, inactive, expired, or not found.
insufficient_credits  The account needs more credits before making API calls.
model_not_found       The requested model does not exist or is not available.
rate_limit_exceeded   The account has exceeded its RPM or TPM limit. Retry with backoff.
gpu_limit_exceeded    The account has reached its GPU Cloud limit. Destroy an instance, reduce the GPU count, or request a higher limit.
agent_limit_reached   The account has reached its hosted-agent limit. Destroy an unused agent or request a higher limit.
model_unavailable     The model’s worker is temporarily offline. Retry shortly.
upstream_error        The model provider returned an error.
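
A client can use these codes to decide whether an automatic retry makes sense. The sketch below treats only the two codes whose descriptions mention retrying as retryable. Everything else, including upstream_error, is left for the caller to handle, which is a conservative assumption rather than documented behavior. How the code string is extracted from a response is not covered on this page, so the function takes it as a plain argument.

```python
# Codes whose descriptions above say to retry.
RETRYABLE = frozenset({"rate_limit_exceeded", "model_unavailable"})

# Codes that need a change on the account or in the request first.
NEEDS_ACTION = frozenset({
    "missing_api_key",
    "invalid_api_key",
    "insufficient_credits",
    "model_not_found",
    "gpu_limit_exceeded",
    "agent_limit_reached",
})


def should_retry(code: str) -> bool:
    """Return True only for codes this page describes as transient."""
    return code in RETRYABLE
```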