GPU Cloud

GPUs that are actually in stock, by the hour.

H200, H100, A100 and L40S accelerators with NVLink, InfiniBand and a JupyterLab quickstart. Pay per second, snapshot mid-training, never sign an annual contract.

  • No credit card to start
  • Free migrations
  • Cancel any time
gpu_h100_18a · sjc1
CUDA 12.4
GPU
H100 80GB SXM5
Mem BW
3.35 TB/s
FP16 TFLOPs
989
NVLink
900 GB/s
vCPU
32 · EPYC 9354
RAM
256 GB DDR5
# reserve an H100 node and start jupyter
$ hostengine gpu create --plan "h100-80gb-x1"
✓ provisioned gpu_h100_18a in 92s
✓ jupyter at https://18a.gpu.hostengine.dev
>>> torch.cuda.get_device_name(0)
'NVIDIA H100 80GB HBM3'
H200 / H100
Latest accelerators in stock
$1.89/hr
L40S starting on-demand
4.8 TB/s
HBM3e on H200 nodes
InfiniBand
3.2 Tbps fabric on multi-GPU
Accelerator menu

The chip you need, in the region you need it.

Live capacity across all 14 regions. We post the per-SKU stock count on every dashboard refresh — never guess what is in inventory.

NVIDIA H200 141 GB
$5.49/hr
Memory
141 GB HBM3e
Bandwidth
4.8 TB/s
Regions
sjc1 · fra1 · sgp1
NVIDIA H100 80 GB SXM
$3.69/hr
Memory
80 GB HBM3
Bandwidth
3.35 TB/s
Regions
sjc1 · ash1 · fra1 · sgp1
NVIDIA H100 80 GB PCIe
$3.19/hr
Memory
80 GB HBM3
Bandwidth
2 TB/s
Regions
8 regions
NVIDIA A100 80 GB
$1.99/hr
Memory
80 GB HBM2e
Bandwidth
2 TB/s
Regions
10 regions
NVIDIA L40S 48 GB
$1.89/hr
Memory
48 GB GDDR6
Bandwidth
864 GB/s
Regions
11 regions
RTX 6000 Ada 48 GB
$1.39/hr
Memory
48 GB GDDR6
Bandwidth
960 GB/s
Regions
9 regions
Productivity

One link, JupyterLab on a real GPU.

No wrangling CUDA versions, no compiling Triton, no chasing kernel headers. The image boots straight into a JupyterLab tab with PyTorch, JAX, vLLM and your repo already cloned.

  • PyTorch 2.5, JAX 0.5, TensorFlow 2.16 — kept current
  • vLLM, TRL, axolotl, sglang and Hugging Face hub pre-warmed
  • GitHub repo auto-clone with one-line bootstrap
Container image
PyTorch 2.5.1 · cu124
CUDA Toolkit 12.4.1
Triton 3.1.0
vLLM 0.6.4 · pinned
Flash-Attn 2.7.0 · prebuilt
Jupyter 4.3 · password-locked
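A first-cell sanity check of the kind you might run once the JupyterLab tab opens. This is an illustrative sketch against the stack listed above, not an official notebook, and exact versions on your node may differ.

# sanity-check the pre-baked stack from the first notebook cell (illustrative)
import torch, flash_attn, vllm, triton

print(torch.__version__, torch.version.cuda)     # expect 2.5.1 / 12.4
print(torch.cuda.get_device_name(0))             # 'NVIDIA H100 80GB HBM3'
print(flash_attn.__version__, vllm.__version__, triton.__version__)

# quick FP16 matmul on the tensor cores to confirm the GPU is really yours
x = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)
torch.cuda.synchronize(); print((x @ x).norm())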
Networking

InfiniBand fabric for multi-GPU runs.

When you need eight H100s talking, you need 3.2 Tbps of NDR InfiniBand between them — not best-effort Ethernet. Our cluster nodes deliver it by default.

  • NDR ConnectX-7 · 400 Gb/s per link · GPUDirect RDMA
  • NCCL tuned for our exact fabric topology
  • Free 100 TB transfer from object storage to GPU node
8 × H100 80 GB · 3.2 Tbps NDR fabric · NVLink 900 GB/s
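To show what actually rides on that fabric, here is a minimal NCCL all-reduce sketch. The file name allreduce_check.py and the torchrun launch are illustrative, and this is the stock PyTorch distributed workflow rather than anything HostEngine-specific.

# allreduce_check.py - minimal NCCL all-reduce across the GPUs on one node
# launch: torchrun --nproc_per_node=8 allreduce_check.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")                  # NCCL picks NVLink / InfiniBand transports
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())
x = torch.ones(1 << 26, device="cuda")                   # ~256 MB of fp32 per rank
dist.all_reduce(x)                                       # sum across every rank
torch.cuda.synchronize()
if rank == 0:
    print(f"all_reduce across {dist.get_world_size()} GPUs ok, x[0] = {x[0].item()}")
dist.destroy_process_group()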
Economics

Per-second billing, weekly reservations.

Pay only for the seconds you train, or commit to a week and save 14%. No annual contracts, no upfront capex, no calls with sales unless you want one.

  • Per-second billing after first hour, on every SKU
  • Reservations from 1 week to 3 years, 14% – 47% off list
  • Spot pricing up to 65% off, with snapshot-on-preempt
H100 · pricing tiers
On-demand $3.69/hr
1 week reserved $3.18/hr
1 month reserved $2.79/hr
1 year reserved $2.34/hr
Spot (preempt) $1.29/hr
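To make the tiers concrete, here is a back-of-envelope sketch using the list prices above. The assumption that the first hour is billed in full and everything after is pro-rated per second is our reading of the plan copy, so treat the helper as illustrative.

# rough cost of a 9-hour single-H100 fine-tune under each tier (list prices above)
RATES_PER_HR = {"on-demand": 3.69, "1 week": 3.18, "1 month": 2.79, "1 year": 2.34, "spot": 1.29}

def run_cost(hours, rate):
    return max(hours, 1.0) * rate      # first hour billed in full, then per-second

for tier, rate in RATES_PER_HR.items():
    print(f"{tier:>10}: ${run_cost(9, rate):.2f}")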
Plans

Three lanes, all per-second.

Need a different shape — say A100 + 2 TB shared NVMe? Build a custom node in the dashboard or talk to us.

Inference

Mid-size models, batch generation, vector search.

$1.89 /hr · L40S
  • 1 × NVIDIA L40S 48 GB
  • 16 vCPU EPYC · 128 GB DDR5
  • 1.6 TB NVMe scratch
  • JupyterLab + Ollama pre-installed
  • Per-second billing after first hour
  • Free egress to HostEngine compute
Start with Inference
Training
Most popular

Single-node fine-tuning, quantized 70B inference.

$3.69 /hr · H100 80 GB
  • 1 × NVIDIA H100 80 GB SXM5
  • 32 vCPU EPYC Genoa · 256 GB DDR5
  • 3.84 TB NVMe Gen4 scratch
  • 200 Gbps network link to neighbouring nodes
  • PyTorch 2.5 + CUDA 12.4 image
  • Snapshot in-flight training state
Start with Training
Cluster

Pre-training, distributed RLHF, MoE models.

$28.80 /hr · 8 × H100
  • 8 × H100 80 GB SXM5 with NVLink
  • 192 vCPU · 2 TB DDR5
  • 30 TB local NVMe + 100 TB network
  • 3.2 Tbps NDR InfiniBand fabric
  • Reserved-week pricing −22%
  • Dedicated GPU SRE on Slack
Start with Cluster

All plans include CUDA 12.4, PyTorch 2.5, JupyterLab, snapshot-on-preempt and free egress to HostEngine object storage and CDN.
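Snapshot-on-preempt only helps if the training loop can pick up where it left off, so below is a minimal resumable-checkpoint sketch in plain PyTorch. The /scratch path and the CKPT environment variable are illustrative, not part of the platform API.

# resume-friendly checkpointing so a preempted spot node can restart cleanly (illustrative)
import os, torch

CKPT = os.environ.get("CKPT", "/scratch/ckpt.pt")

def save_state(model, optim, step):
    torch.save({"model": model.state_dict(), "optim": optim.state_dict(), "step": step}, CKPT)

def load_state(model, optim):
    if not os.path.exists(CKPT):
        return 0                                     # fresh run
    state = torch.load(CKPT, map_location="cuda")
    model.load_state_dict(state["model"])
    optim.load_state_dict(state["optim"])
    return state["step"]                             # resume from the saved step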

Who it's for

From notebooks to MoE pre-training.

ML researcher at a biotech

Fine-tunes a 13B protein model nightly

Spins an H100 node at 23:00, runs LoRA training, snapshots weights into object storage at 04:00 and tears it down, for a total cost of about $18 per run.

Generative-art platform

Serves 20k images/minute at peak

Auto-scales between 4 and 36 L40S replicas based on queue depth. Saves ~$18,000/month versus running 36 replicas 24/7 on a hyperscaler.

Indie LLM team

Pre-trains a 7B model from scratch

Reserves an 8×H100 node for 14 days at the weekly rate. Trains 12B tokens, ships the checkpoint to Hugging Face, releases the box.

Compare

The numbers that change a procurement deck.

Capability
HostEngine
Hyperscaler A
Legacy Host B
H100 80 GB on-demand · $3.69/hr · $8.50/hr · Reservation only
Per-second billing
InfiniBand fabric · included · +contract
Pre-baked PyTorch / CUDA image
Free egress to platform storage
Snapshot mid-training state
Reservations · from 1 week · 1 year minimum · 1 year minimum
Stack

The frameworks your team already loves.

Integrates with the stack you already use

  • PyTorch 2.5
  • CUDA 12.4
  • JAX
  • TensorFlow
  • Triton
  • vLLM
  • TRL
  • Hugging Face
  • Weights & Biases
  • JupyterLab
  • Slurm
  • Kubeflow
  • Ray
  • Determined
FAQ

Questions ML teams keep asking.

Which GPUs are available right now?
H200 141 GB, H100 80 GB SXM5, H100 80 GB PCIe, A100 80 GB, L40S 48 GB and RTX 6000 Ada 48 GB. Capacity for each SKU is listed live on the dashboard with ETA if a region is full.
Can I reserve a multi-GPU node?
Yes — reservations from one week up to three years. The minimum reservation discount is 14% (one-week) and the maximum is 47% (three-year, paid upfront).
What about networking between nodes?
Multi-node training runs over 3.2 Tbps NDR InfiniBand inside a single availability zone. Cross-zone goes through 400 GbE Ethernet with RoCE.
Do you support Kubernetes / Slurm?
Both. We expose the NVIDIA GPU Operator and its CRDs, and ship a managed Slurm option for traditional HPC teams. Kubeflow, Ray and Determined deploy with one Helm command.
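As a sketch of what those schedulers end up running, here is a one-GPU Ray task. It assumes you already have a Ray cluster to connect to and uses only stock Ray APIs, nothing HostEngine-specific.

# schedule a one-GPU task on the cluster via Ray (illustrative)
import ray

ray.init()                           # or ray.init(address="auto") from inside the cluster

@ray.remote(num_gpus=1)
def which_gpu():
    import torch
    return torch.cuda.get_device_name(0)

print(ray.get(which_gpu.remote()))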
How does scratch storage work?
Every GPU node has local NVMe scratch (1.6 – 30 TB). Persist artefacts to the platform object store (free egress) or to a managed parallel filesystem with 80 GB/s aggregate read.
Are GPUs MIG-able?
Yes. H100 and H200 nodes expose MIG, so you can split a single accelerator into up to seven isolated slices (for example 1g.10gb instances on an H100 80 GB). Useful for cheap inference replicas.
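If you want to see the slices from inside a node, a small nvidia-ml-py sketch like the one below lists them. It assumes MIG has already been enabled on device 0 and uses only stock NVML calls.

# list MIG slices on GPU 0 with nvidia-ml-py (assumes MIG already enabled)
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
current, pending = pynvml.nvmlDeviceGetMigMode(gpu)
print("MIG enabled:", current == pynvml.NVML_DEVICE_MIG_ENABLE)

for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
    try:
        mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
    except pynvml.NVMLError:
        continue                               # slot not populated
    print(pynvml.nvmlDeviceGetUUID(mig))       # MIG UUIDs you can pin via CUDA_VISIBLE_DEVICES

pynvml.nvmlShutdown()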

Used by 1,200+ ML teams and 38 frontier-model labs

Northwind
Cobalt Studio
Volcrest
Northbeam AI
Halcyon
Acme Cloud
Pinepoint
Verdant
Helix Labs
Riverstone
Iron Forge
Beacon
Ready when you are

Spin up an H100 by the time you finish coffee.

No reservation form, no quota request — pick a region, click deploy, get a JupyterLab link in 92 seconds.

  • No credit card to start
  • Free migration from any provider
  • 99.99% uptime SLA, in writing
Frankfurt · 3 nodes · healthy
38ms p99
# spin up a 4 vCPU / 8 GB cloud VPS in 55s
$ hostengine vps create --plan "performance-4x8" --region "fra1"
✓ provisioned vps_2x9k1q  (172.247.18.42)
✓ image debian-12 ready · ssh keys attached
✓ snapshot policy: hourly · backups: 30 days

$ hostengine domain attach "trading.acme.io" --ssl
✓ DNS verified · Let's Encrypt cert issued in 6.4s
55s
median provision
14
global regions
$200
welcome credit