
GPU Cloud

Rent on-demand GPUs by the second, run any model or Docker workload, and reach it through one API.

GPU Cloud lets you rent on-demand GPUs and run anything on them: serve any open model, spin up JupyterLab or ComfyUI, or launch your own Docker image. Rentals are billed by the second against your credit balance, with no commitment. Every GPU stays private and is reachable only through your EmpirioLabs API key.

You can manage rentals from the GPU Cloud page in the dashboard or directly through the API documented here.

Deploying GPUs is rolling out to early-access accounts. Browsing the catalog and pricing is open to everyone. If your account is not in the early-access group yet, deploy calls return 404.

How it works

  1. Pick a GPU from the catalog (RTX 4090, RTX 5090, L40S, A100, H100, H200, and more). Each has a listed hourly price and live availability.
  2. Deploy a model, a template, or a custom Docker image onto it.
  3. Connect to the running workload through https://api.empiriolabs.ai/v1/gpu/connect/{id}/..., authenticated with your API key.
  4. Stop or destroy when you are done. Billing is metered by the second and stops the moment the GPU stops.

Pricing is listed per GPU per hour but billed by the second, so a ten-minute run is charged for exactly ten minutes, one sixth of the hourly price. Your running and lifetime GPU spend is shown on the GPU Cloud page of the dashboard.
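As a worked example of the per-second arithmetic, here is a minimal sketch. The function name rental_cost is hypothetical, and the sketch assumes the charge is simply the hourly price divided by 3600 and multiplied by elapsed seconds; this page does not say how fractions of a cent are rounded.

```python
def rental_cost(price_hourly: float, seconds: int) -> float:
    """Cost of one GPU billed per second at a listed hourly price.

    Assumption: linear proration with no rounding applied.
    """
    return price_hourly * seconds / 3600


# RTX 4090 at the example catalog price of 0.80 per hour, run for ten minutes.
cost = rental_cost(0.80, 10 * 60)
print(f"{cost:.4f}")  # prints 0.1333
```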

Browse the catalog

The catalog is public and cacheable, like GET /v1/models. Prices and availability update live as capacity changes.

GET /v1/gpu/catalog

$ curl https://api.empiriolabs.ai/v1/gpu/catalog

{
  "object": "list",
  "data": [
    {
      "slug": "rtx-4090",
      "name": "RTX 4090",
      "vram_gb": 24,
      "price_hourly": 0.80,
      "available": true,
      "available_count": 21,
      "max_gpus": 8,
      "regions": ["US", "EU"]
    }
  ]
}

GET /v1/gpu/catalog/:slug

$ curl https://api.empiriolabs.ai/v1/gpu/catalog/rtx-4090
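Because the catalog is public, a script can pick a GPU before deploying. The sketch below assumes the response shape shown in the sample response above ("data" list with "available", "vram_gb", and "price_hourly" fields). The helper names fetch_catalog and cheapest_available are hypothetical, and every sample entry except rtx-4090 is made up for illustration.

```python
import json
import urllib.request

CATALOG_URL = "https://api.empiriolabs.ai/v1/gpu/catalog"


def fetch_catalog(url: str = CATALOG_URL) -> list[dict]:
    """Fetch the public catalog and return its "data" list."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["data"]


def cheapest_available(gpus: list[dict], min_vram_gb: int = 0) -> list[dict]:
    """Keep GPUs that are available and have enough VRAM, cheapest first."""
    matches = [g for g in gpus if g["available"] and g["vram_gb"] >= min_vram_gb]
    return sorted(matches, key=lambda g: g["price_hourly"])


# Offline sample; only the first entry mirrors the documented response.
sample = [
    {"slug": "rtx-4090", "vram_gb": 24, "price_hourly": 0.80, "available": True},
    {"slug": "hypothetical-80gb", "vram_gb": 80, "price_hourly": 2.50, "available": True},
    {"slug": "hypothetical-sold-out", "vram_gb": 48, "price_hourly": 0.10, "available": False},
]
print([g["slug"] for g in cheapest_available(sample, min_vram_gb=24)])
```

Against the live API, the same filter would be called as cheapest_available(fetch_catalog(), min_vram_gb=24).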

Deploy a GPU

Deploying starts a rental. There are four ways to provision it.

POST /v1/gpu/instances

$ curl -X POST https://api.empiriolabs.ai/v1/gpu/instances \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "gpu_slug": "rtx-4090"
    }'

Deploy a model

Pass a curated template_slug or paste any Hugging Face repo id. The model is served with vLLM and is OpenAI-compatible at /v1.

$ curl https://api.empiriolabs.ai/v1/gpu/instances \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "gpu_slug": "rtx-4090",
      "mode": "model",
      "hf_id": "Qwen/Qwen2.5-7B-Instruct"
    }'

For gated repos, pass a token in env:

{
  "gpu_slug": "a100-80gb",
  "mode": "model",
  "hf_id": "meta-llama/Llama-3.1-8B-Instruct",
  "env": { "HF_TOKEN": "hf_..." }
}

Deploy a template

Templates are ready-to-run environments. Available templates: PyTorch + JupyterLab, ComfyUI, Web Terminal, and Ollama.

$ curl https://api.empiriolabs.ai/v1/gpu/instances \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{ "gpu_slug": "rtx-4090", "mode": "template", "template_slug": "pytorch-jupyter" }'

Deploy a custom Docker image

Run your own image. Use a CUDA base image (for example nvidia/cuda or pytorch/pytorch); CPU-only images may fail the GPU runtime.

$ curl https://api.empiriolabs.ai/v1/gpu/instances \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "gpu_slug": "rtx-4090",
      "mode": "custom",
      "image": "pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime",
      "ports": [8000],
      "env": { "MY_VAR": "value" }
    }'

The response returns the rental with status provisioning. Poll GET /v1/gpu/instances/{id} until status is running.
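The polling step could be sketched as follows. This assumes the instance response is a JSON object with a top-level "status" field, which this page implies but does not show. The names instance_status and wait_until_running, the five-second interval, and the ten-minute timeout are all illustrative choices rather than documented values.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.empiriolabs.ai/v1"


def instance_status(instance_id: str, api_key: str) -> str:
    """Read the rental's status (assumes a top-level "status" field)."""
    req = urllib.request.Request(
        f"{API_BASE}/gpu/instances/{instance_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["status"]


def wait_until_running(
    get_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it returns "running" or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"rental still {status!r} after {timeout} seconds")
        sleep(interval)
```

A caller would pass a closure over the rental id, for example wait_until_running(lambda: instance_status(instance_id, api_key)). Taking the status getter as a parameter keeps the loop testable without network access.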

Manage rentals

GET /v1/gpu/instances

$ curl https://api.empiriolabs.ai/v1/gpu/instances \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H "Content-Type: application/json"

GET /v1/gpu/instances/:instance_id

$ curl https://api.empiriolabs.ai/v1/gpu/instances/$ID \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H "Content-Type: application/json"

POST /v1/gpu/instances/:instance_id/:action

$ curl -X POST https://api.empiriolabs.ai/v1/gpu/instances/$ID/stop \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{}'

Use stop to pause billing while keeping the disk, start to resume, and refresh to re-sync live status.

# Stop (pauses billing, keeps the disk)
$ curl -X POST https://api.empiriolabs.ai/v1/gpu/instances/$ID/stop \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY"

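The three actions share one URL pattern, so a small helper can build the request and reject typos before anything is sent. This is a sketch: the name action_request is hypothetical, and the allowed set contains only the actions named on this page, so the API may accept others.

```python
import urllib.request

API_BASE = "https://api.empiriolabs.ai/v1"
# Only the actions named on this page.
ACTIONS = {"stop", "start", "refresh"}


def action_request(instance_id: str, action: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST for a rental lifecycle action."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(ACTIONS)}")
    return urllib.request.Request(
        f"{API_BASE}/gpu/instances/{instance_id}/{action}",
        data=b"{}",
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is then a matter of urllib.request.urlopen(action_request(instance_id, "stop", api_key)).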
DELETE /v1/gpu/instances/:instance_id

$ curl -X DELETE https://api.empiriolabs.ai/v1/gpu/instances/$ID \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{}'

Destroying a rental stops billing permanently and cannot be undone.

Connect to a running GPU

Your GPU has no public address. Reach the service running on it through the connect proxy, authenticated with your API key. The proxy supports GET, POST, PUT, PATCH, DELETE, and streaming responses.

GET /v1/gpu/connect/:instance_id/:path

$ curl https://api.empiriolabs.ai/v1/gpu/connect/instance_id/v1%2Fchat%2Fcompletions \
    -H "Authorization: Bearer <token>" \
    -H "Content-Type: application/json"

For a model deploy, the workload is OpenAI-compatible, so point any OpenAI client at the connect base:

$ curl https://api.empiriolabs.ai/v1/gpu/connect/$ID/v1/chat/completions \
    -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "Qwen/Qwen2.5-7B-Instruct",
      "messages": [{ "role": "user", "content": "Hello!" }]
    }'

import os

from openai import OpenAI

# Replace INSTANCE_ID with the id of your running rental.
client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1/gpu/connect/INSTANCE_ID/v1",
    api_key=os.environ["EMPIRIOLABS_API_KEY"],  # read the key from the environment
)

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
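Since the proxy supports streaming responses, the same client can also be used with stream=True. The sketch below is an assumption-laden illustration rather than a documented recipe: stream_reply is a hypothetical helper, and it presumes the deployed model server emits standard OpenAI-style streaming chunks through the proxy.

```python
def stream_reply(client, model: str, prompt: str) -> str:
    """Stream a chat completion, echoing tokens as they arrive, and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            text = chunk.choices[0].delta.content
            print(text, end="", flush=True)
            parts.append(text)
    return "".join(parts)
```

With an OpenAI client whose base_url points at the connect base, this would be called as stream_reply(client, "Qwen/Qwen2.5-7B-Instruct", "Hello!").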

For a JupyterLab, ComfyUI, Web Terminal, or Ollama template, open the connect base in your browser or point the relevant client at it.

Billing

  • Rentals are billed by the second at the listed hourly price, against your credit balance.
  • Billing starts when the GPU reaches running and stops when it stops or is destroyed.
  • If your balance reaches zero, running rentals are stopped automatically so a rental can never spend you into a deep hole.
  • Your running and lifetime GPU spend is shown on the dashboard GPU Cloud page.

Add credits on the Billing page before starting a GPU. A deploy with a zero balance returns 402.
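A deploy script can turn the two status codes this page documents into actionable messages. This is a minimal sketch: explain_deploy_error is a hypothetical helper, it covers only 402 and 404 as described here, and any other status is reported generically because this page does not list further error codes.

```python
def explain_deploy_error(status_code: int) -> str:
    """Map a failed deploy's HTTP status to a hint, using only codes this page documents."""
    if status_code == 402:
        return "Credit balance is zero: add credits on the Billing page, then retry."
    if status_code == 404:
        return "Deploys are not enabled for this account yet (early-access rollout)."
    return f"Deploy failed with unexpected HTTP status {status_code}."


for code in (402, 404, 500):
    print(code, explain_deploy_error(code))
```

When using urllib, the status is available as the code attribute of urllib.error.HTTPError raised by urlopen.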