GPU Cloud
Rent on-demand GPUs by the second, run any model or Docker workload, and reach it through one API.
Rent on-demand GPUs by the second, run any model or Docker workload, and reach it through one API.
GPU Cloud lets you rent on-demand GPUs and run anything on them: serve any open model, spin up JupyterLab or ComfyUI, or launch your own Docker image. Rentals are billed by the second against your credit balance, with no commitment. Every GPU stays private and is reachable only through your EmpirioLabs API key.
You can manage rentals from the GPU Cloud page in the dashboard or directly through the API documented here.
Deploying GPUs is rolling out to early-access accounts. Browsing the catalog and pricing is open to everyone. If your account is not in the early-access group yet, deploy calls return 404.
https://api.empiriolabs.ai/v1/gpu/connect/{id}/..., authenticated with your API key.Pricing is listed per GPU per hour but billed by the second, so a ten minute run costs ten minutes. Your running and lifetime GPU spend is shown on the dashboard GPU Cloud page.
The catalog is public and cacheable, like GET /v1/models. Prices and availability update live as capacity changes.
Deploying starts a rental. There are four ways to provision it.
Pass a curated template_slug or paste any Hugging Face repo id. The model is served with vLLM and is OpenAI-compatible at /v1.
For gated repos, pass a token in env:
Templates are ready-to-run environments. Available templates: PyTorch + JupyterLab, ComfyUI, Web Terminal, and Ollama.
Run your own image. Use a CUDA base image (for example nvidia/cuda or pytorch/pytorch); CPU-only images may fail the GPU runtime.
The response returns the rental in provisioning. Poll GET /v1/gpu/instances/{id} until status is running.
Use stop to pause billing while keeping the disk, start to resume, and refresh to re-sync live status.
Destroying a rental stops billing permanently and cannot be undone.
Your GPU has no public address. Reach the service running on it through the connect proxy, authenticated with your API key. The proxy supports GET, POST, PUT, PATCH, DELETE, and streaming responses.
For a model deploy, the workload is OpenAI-compatible, so point any OpenAI client at the connect base:
For a JupyterLab, ComfyUI, Web Terminal, or Ollama template, open the connect base in your browser or point the relevant client at it.
running and stops when it stops or is destroyed.Add credits on the Billing page before starting a GPU. A deploy with a zero balance returns 402.