GPU Cloud | EmpirioLabs AI Docs

GPU Cloud provisions managed GPU instances for model serving, notebooks, ComfyUI, Web Terminal, Ollama, or your own CUDA image. Billing is metered by the second against your credit balance. Connect to running workloads through authenticated EmpirioLabs API paths.

You can manage instances from the GPU Cloud page in the dashboard or through the API documented here.

How it works

Pick a GPU from the catalog. Each row shows VRAM, hourly pricing, and the exact available count.
Choose a workload: a curated model, a Hugging Face model id, a template, or a custom CUDA Docker image.
Deploy the instance. Your dashboard settings show the current GPU limit for your account.
Wait for readiness. New instances move through provisioning, then loading, then running.
Connect through the API using your EmpirioLabs API key.
Stop or destroy when you are done. Stopping releases the GPU and its disk and pauses billing, keeping only the deploy configuration so you can start a fresh instance later. Destroyed instances are permanently removed.

Stopping does not save the instance disk. A stopped instance keeps only its deploy configuration (image, GPU type, disk size, ports, and environment variables), not the files on disk. When you start it again it redeploys fresh, so anything written during the previous session, such as downloaded models, checkpoints, notebooks, or generated output, is not carried over. Move anything you want to keep to your own storage before stopping.

Pricing and limits

Prices are listed per GPU per hour and billed by the second.
Multi-GPU deployments are billed as listed hourly price x GPU count.
Billing starts when an instance reaches running.
Billing stops when an instance is stopped or destroyed.
Deploying and starting an instance require enough credit balance for the initial running window.
Running instances are stopped automatically when your credit balance falls below the threshold needed to keep them running.
GPU Cloud limits are account-scoped. Your dashboard settings show your effective limit.
Disk size can be requested from 100 GB to 300 GB.
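
As a rough illustration of the billing rules above, the sketch below estimates the charge for one instance from the listed hourly price, the GPU count, and the number of seconds spent in running status. The function name and the rounding to four decimal places are assumptions made for this example; the actual rounding applied to your balance is not documented here.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimate_cost(price_hourly: str, gpu_count: int, seconds_running: int) -> Decimal:
    """Estimate a charge as hourly price x GPU count, metered by the second."""
    if gpu_count < 1 or seconds_running < 0:
        raise ValueError("gpu_count must be >= 1 and seconds_running >= 0")
    # Multiply before dividing so common cases stay exact in Decimal arithmetic.
    raw = Decimal(price_hourly) * gpu_count * seconds_running / Decimal(3600)
    # Assumption: round to 4 decimal places; the real billing precision may differ.
    return raw.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Two RTX 4090s at the catalog's example price, running for 30 minutes.
    print(estimate_cost("0.65", 2, 1800))
```

Passing the price as a string keeps the arithmetic in Decimal and avoids binary floating-point drift when you reconcile many small charges.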

Browse the catalog

The catalog returns customer-safe GPU metadata, pricing, and current availability.

GET /v1/gpu/catalog

$ curl https://api.empiriolabs.ai/v1/gpu/catalog

{
  "object": "list",
  "data": [
    {
      "slug": "rtx-4090",
      "name": "RTX 4090",
      "vram_gb": 24,
      "price_hourly": 0.65,
      "available": true,
      "available_count": 21,
      "max_gpus": 8,
      "regions": ["US", "EU"]
    }
  ]
}
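
One way to use a catalog response like the one above is to pick the cheapest GPU that meets your requirements. The sketch below works on the parsed JSON and uses only the fields shown in the sample. The helper name is hypothetical, and reading max_gpus as the largest GPU count allowed per instance is an assumption.

```python
from typing import Optional


def cheapest_available(catalog: dict, min_vram_gb: int, gpu_count: int = 1) -> Optional[dict]:
    """Return the lowest-priced catalog entry that satisfies the request, or None."""
    candidates = [
        gpu
        for gpu in catalog.get("data", [])
        if gpu["available"]
        and gpu["vram_gb"] >= min_vram_gb
        and gpu["available_count"] >= gpu_count
        # Assumption: max_gpus caps the number of GPUs in a single deployment.
        and gpu["max_gpus"] >= gpu_count
    ]
    return min(candidates, key=lambda gpu: gpu["price_hourly"], default=None)


# Same shape as the sample catalog response.
sample_catalog = {
    "object": "list",
    "data": [
        {
            "slug": "rtx-4090",
            "name": "RTX 4090",
            "vram_gb": 24,
            "price_hourly": 0.65,
            "available": True,
            "available_count": 21,
            "max_gpus": 8,
            "regions": ["US", "EU"],
        }
    ],
}

if __name__ == "__main__":
    choice = cheapest_available(sample_catalog, min_vram_gb=16)
    print(choice["slug"] if choice else "no match")
```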

GET /v1/gpu/catalog/:slug

$ curl https://api.empiriolabs.ai/v1/gpu/catalog/rtx-4090

Deploy an instance

Deploying starts provisioning and returns an instance in provisioning status. Poll GET /v1/gpu/instances/{id} until status is running. If allocation or setup cannot become ready in time, the instance moves to error and the allocation is canceled automatically.

POST /v1/gpu/instances

$ curl -X POST https://api.empiriolabs.ai/v1/gpu/instances \
>   -H "Authorization: Bearer <token>" \
>   -H "Content-Type: application/json" \
>   -d '{
>     "gpu_slug": "rtx-4090"
>   }'
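
The polling step described above can be sketched as a small loop. This example takes a fetch_status callable so it stays independent of any HTTP library; in practice that callable would wrap GET /v1/gpu/instances/{id} and return the status value from the response. The function name, the default timeout, and the polling interval are assumptions chosen for illustration.

```python
import time
from typing import Callable


def wait_until_running(
    fetch_status: Callable[[], str],
    timeout_s: float = 900.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the instance reports running, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "running":
            return status
        # error means provisioning failed; destroyed means the instance is gone.
        if status in ("error", "destroyed"):
            raise RuntimeError(f"instance ended in status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated status sequence standing in for real API responses.
    statuses = iter(["provisioning", "loading", "running"])
    print(wait_until_running(lambda: next(statuses), sleep=lambda _: None))
```

Injecting the sleep function keeps the loop easy to test and lets you swap in a backoff strategy later.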

Deploy a model

Set mode to model and pass either a curated template_slug or any Hugging Face repo id as hf_id. Model deployments are served from an OpenAI-compatible /v1 endpoint on the instance.

$ curl https://api.empiriolabs.ai/v1/gpu/instances \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H "Content-Type: application/json" \
>   -d '{
>     "gpu_slug": "rtx-4090",
>     "mode": "model",
>     "hf_id": "Qwen/Qwen2.5-7B-Instruct"
>   }'

For gated repos, pass the token in env:

{
  "gpu_slug": "a100-80gb",
  "mode": "model",
  "hf_id": "meta-llama/Llama-3.1-8B-Instruct",
  "env": { "HF_TOKEN": "hf_..." }
}

Deploy a template

Templates are ready-to-run environments. Available templates include PyTorch + JupyterLab, ComfyUI, Web Terminal, and Ollama.

$ curl https://api.empiriolabs.ai/v1/gpu/instances \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H "Content-Type: application/json" \
>   -d '{
>     "gpu_slug": "rtx-4090",
>     "mode": "template",
>     "template_slug": "pytorch-jupyter",
>     "disk_gb": 150
>   }'

Deploy a custom Docker image

Run your own CUDA image. CPU-only images may fail because the runtime expects a GPU-compatible container.

$ curl https://api.empiriolabs.ai/v1/gpu/instances \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H "Content-Type: application/json" \
>   -d '{
>     "gpu_slug": "rtx-4090",
>     "mode": "custom",
>     "image": "pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime",
>     "ports": [8000],
>     "disk_gb": 150,
>     "env": { "MY_VAR": "value" }
>   }'
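
If you build deploy requests in code, a small helper can catch obvious mistakes before the request is sent. The sketch below assembles the JSON body from the fields used in the examples above and checks the 100 GB to 300 GB disk range. The helper name is hypothetical, and the rule for which fields each mode requires is inferred from the examples rather than taken from a published schema.

```python
from typing import Any, Dict, List, Optional

MIN_DISK_GB, MAX_DISK_GB = 100, 300  # range stated under "Pricing and limits"


def build_deploy_payload(
    gpu_slug: str,
    mode: Optional[str] = None,
    *,
    hf_id: Optional[str] = None,
    template_slug: Optional[str] = None,
    image: Optional[str] = None,
    ports: Optional[List[int]] = None,
    disk_gb: Optional[int] = None,
    env: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build a request body for POST /v1/gpu/instances, omitting unset fields."""
    if mode not in (None, "model", "template", "custom"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumed per-mode requirements, inferred from the documented examples.
    if mode == "model" and not (hf_id or template_slug):
        raise ValueError("model mode needs hf_id or template_slug")
    if mode == "template" and not template_slug:
        raise ValueError("template mode needs template_slug")
    if mode == "custom" and not image:
        raise ValueError("custom mode needs image")
    if disk_gb is not None and not MIN_DISK_GB <= disk_gb <= MAX_DISK_GB:
        raise ValueError(f"disk_gb must be between {MIN_DISK_GB} and {MAX_DISK_GB}")
    fields = {
        "gpu_slug": gpu_slug,
        "mode": mode,
        "hf_id": hf_id,
        "template_slug": template_slug,
        "image": image,
        "ports": ports,
        "disk_gb": disk_gb,
        "env": env,
    }
    return {key: value for key, value in fields.items() if value is not None}


if __name__ == "__main__":
    print(build_deploy_payload("rtx-4090", "template",
                               template_slug="pytorch-jupyter", disk_gb=150))
```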

Manage lifecycle

GET /v1/gpu/instances

$ curl https://api.empiriolabs.ai/v1/gpu/instances \
>   -H "Authorization: Bearer <token>" \
>   -H "Content-Type: application/json"

GET /v1/gpu/instances/:instance_id

$ curl https://api.empiriolabs.ai/v1/gpu/instances/instance_id \
>   -H "Authorization: Bearer <token>" \
>   -H "Content-Type: application/json"

POST /v1/gpu/instances/:instance_id/:action

$ curl -X POST https://api.empiriolabs.ai/v1/gpu/instances/instance_id/stop \
>   -H "Authorization: Bearer <token>" \
>   -H "Content-Type: application/json" \
>   -d '{}'

Use refresh to re-sync status, stop to release the running allocation and pause billing, and start to redeploy the saved instance spec.

$ curl -X POST https://api.empiriolabs.ai/v1/gpu/instances/$ID/stop \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY"
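
The same lifecycle calls can be prepared from Python with the standard library. The sketch below only builds the request object for the refresh, stop, and start actions named above; the helper name is hypothetical, and sending an empty JSON body mirrors the curl sample rather than a documented requirement.

```python
import urllib.request

INSTANCES_URL = "https://api.empiriolabs.ai/v1/gpu/instances"
ACTIONS = ("refresh", "stop", "start")  # actions named in this section


def lifecycle_request(instance_id: str, action: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a lifecycle action."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}, got {action!r}")
    return urllib.request.Request(
        f"{INSTANCES_URL}/{instance_id}/{action}",
        data=b"{}",  # empty JSON body, as in the curl sample
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = lifecycle_request("INSTANCE_ID", "stop", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

To actually send the request, pass it to urllib.request.urlopen and read the JSON response.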

DELETE /v1/gpu/instances/:instance_id

$ curl -X DELETE https://api.empiriolabs.ai/v1/gpu/instances/instance_id \
>   -H "Authorization: Bearer <token>" \
>   -H "Content-Type: application/json" \
>   -d '{}'

Destroying an instance stops billing permanently and cannot be undone.

Statuses

Status	Meaning
`provisioning`	Capacity is being allocated.
`loading`	The workload is starting or warming.
`running`	The workload is reachable through the connect path and billing is active.
`stopping`	A stop or destroy operation is being applied.
`stopped`	GPU billing is paused. Starting the instance redeploys the saved instance spec with a fresh runtime disk.
`error`	Provisioning or runtime setup failed, or allocation did not become ready in time. The instance can be refreshed or destroyed.
`destroyed`	The instance has been permanently removed.
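
Client code often needs to branch on these statuses. The sketch below models them as a Python enum with two helpers. The status values come from the table above; the helper names and the grouping of statuses into "settled" (no transition in progress) are this example's own and not part of the API.

```python
from enum import Enum


class InstanceStatus(str, Enum):
    PROVISIONING = "provisioning"
    LOADING = "loading"
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"
    ERROR = "error"
    DESTROYED = "destroyed"


# Assumption: statuses in which the instance is not moving toward another state.
SETTLED = {
    InstanceStatus.RUNNING,
    InstanceStatus.STOPPED,
    InstanceStatus.ERROR,
    InstanceStatus.DESTROYED,
}


def billing_active(status: str) -> bool:
    """Per the table, billing is active only while the instance is running."""
    return InstanceStatus(status) is InstanceStatus.RUNNING


def is_settled(status: str) -> bool:
    """True when there is nothing left to wait for in a polling loop."""
    return InstanceStatus(status) in SETTLED


if __name__ == "__main__":
    for value in ("loading", "running", "stopped"):
        print(value, billing_active(value), is_settled(value))
```

Constructing InstanceStatus from an unknown string raises ValueError, which surfaces unexpected statuses early instead of silently ignoring them.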

Connect to a running instance

Use the connect endpoint with your EmpirioLabs API key. It supports GET, POST, PUT, PATCH, DELETE, and streaming responses.

GET /v1/gpu/connect/:instance_id/:path

$ curl https://api.empiriolabs.ai/v1/gpu/connect/instance_id/v1%2Fchat%2Fcompletions \
>   -H "Authorization: Bearer <token>" \
>   -H "Content-Type: application/json"

For a model deployment, call the OpenAI-compatible endpoint on the instance:

$ curl https://api.empiriolabs.ai/v1/gpu/connect/$ID/v1/chat/completions \
>   -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
>   -H "Content-Type: application/json" \
>   -d '{
>     "model": "Qwen/Qwen2.5-7B-Instruct",
>     "messages": [{ "role": "user", "content": "Hello!" }]
>   }'

import os

from openai import OpenAI

# Replace INSTANCE_ID with the id of a running model deployment.
client = OpenAI(
    base_url="https://api.empiriolabs.ai/v1/gpu/connect/INSTANCE_ID/v1",
    api_key=os.environ["EMPIRIOLABS_API_KEY"],  # read the key from the environment
)

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
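
When you call several paths on the same instance, it can help to build connect URLs in one place. The sketch below follows the /v1/gpu/connect/{instance_id}/{path} pattern used in this section; the helper name is hypothetical, and percent-encoding the instance id is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote

CONNECT_BASE = "https://api.empiriolabs.ai/v1/gpu/connect"


def connect_url(instance_id: str, path: str = "") -> str:
    """Join the connect base, an instance id, and a path on the instance."""
    # Encode the id in case it contains reserved characters; keep slashes in path.
    return f"{CONNECT_BASE}/{quote(instance_id, safe='')}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Value suitable for base_url in an OpenAI-compatible client.
    print(connect_url("INSTANCE_ID", "v1"))
    # Full chat completions endpoint, as in the curl example.
    print(connect_url("INSTANCE_ID", "/v1/chat/completions"))
```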

For JupyterLab, ComfyUI, Web Terminal, or Ollama, open the instance connect URL from the dashboard or send requests to the relevant connect path.

Chat with your model in the dashboard

When you deploy a model (or any instance that serves an OpenAI-compatible API), the dashboard gives you a built-in chat page so you can try the model right away without writing any code. Open the instance from the GPU Cloud page and choose Chat with this model. The chat page streams responses, supports a system prompt and the usual sampling controls (temperature, top-p, max tokens), and lets you attach images or audio for multimodal models. It runs against the same authenticated connect path as the API, so there is no extra setup and no separate billing: the instance is already metered by the second.

SSH and shell access

Use the Web Terminal template when you need a shell inside the workload, or expose an HTTP service from a custom container and reach it through /v1/gpu/connect/{instance_id}/{path}.

Usage and billing records

The GPU Cloud dashboard shows running spend and lifetime GPU spend. API lifecycle responses include the instance price, GPU count, billing status, and billed amount so you can reconcile usage from your own systems.