GPUs that are actually in stock, by the hour.
H200, H100, A100 and L40S accelerators with NVLink, InfiniBand and a JupyterLab quickstart. Pay per second, snapshot mid-training, never sign an annual contract.
- No credit card to start
- Free migrations
- Cancel any time
```
# reserve an H100 node and start jupyter
$ hostengine gpu create --plan "h100-80gb-x1"
✓ provisioned gpu_h100_18a in 92s
✓ jupyter at https://18a.gpu.hostengine.dev
>>> torch.cuda.get_device_name(0)
'NVIDIA H100 80GB HBM3'
```
The chip you need, in the region you need it.
Live capacity across all 14 regions. We post the per-SKU stock count on every dashboard refresh — never guess what is in inventory.
| Memory | Bandwidth | Regions |
|---|---|---|
| 141 GB HBM3e | 4.8 TB/s | sjc1 · fra1 · sgp1 |
| 80 GB HBM3 | 3.35 TB/s | sjc1 · ash1 · fra1 · sgp1 |
| 80 GB HBM3 | 2 TB/s | 8 regions |
| 80 GB HBM2e | 2 TB/s | 10 regions |
| 48 GB GDDR6 | 864 GB/s | 11 regions |
| 48 GB GDDR6 | 960 GB/s | 9 regions |
One link, JupyterLab on a real GPU.
No wrangling CUDA versions, no compiling Triton, no chasing kernel headers. The image boots into a JupyterLab tab with PyTorch, JAX, vLLM and your repo already cloned.
- PyTorch 2.5, JAX 0.5, TensorFlow 2.16 — kept current
- vLLM, TRL, axolotl, sglang and Hugging Face hub pre-warmed
- GitHub repo auto-clone with one-line bootstrap
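A minimal first notebook cell to confirm the image is what it says it is (framework versions as listed above; the exact device string depends on the node you picked):

```python
# Sanity-check the pre-built image: GPU visible, frameworks importable.
import torch
import jax

print(torch.__version__)              # 2.5.x on the current image
print(torch.cuda.is_available())      # True on any GPU node
print(torch.cuda.get_device_name(0))  # e.g. 'NVIDIA H100 80GB HBM3'
print(jax.devices())                  # the CUDA device(s) JAX can see
```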
InfiniBand fabric for multi-GPU runs.
When you need eight H100s talking, you need 3.2 Tbps of NDR InfiniBand between them — not best-effort Ethernet. Our cluster nodes deliver it by default.
- NDR ConnectX-7 · 400 Gb/s per link · GPUDirect RDMA
- RCCL and NCCL tuned for our exact topology
- Free 100 TB transfer from object storage to GPU node
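A minimal sketch of what a multi-GPU job looks like on these nodes: a plain NCCL all-reduce with torch.distributed, launched with torchrun. Nothing here is HostEngine-specific; NCCL picks up the InfiniBand fabric and GPUDirect RDMA on its own.

```python
# allreduce_check.py: launch with  torchrun --nproc_per_node=8 allreduce_check.py
import os
import torch
import torch.distributed as dist

def main():
    # NCCL backend for GPU collectives; torchrun supplies the rank/world-size env vars.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    # Every rank contributes 1.0; after the all-reduce each rank holds the
    # world size, which exercises the full NVLink / InfiniBand path.
    x = torch.ones(1, device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()}: sum across ranks = {x.item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```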
Per-second billing, weekly reservations.
Pay only for the seconds you train, or commit to a week and save 22%. No annual contracts, no upfront capex, no calls with sales unless you want one.
- Per-second billing after first hour, on every SKU
- Reservations from 1 week to 3 years, 14% – 47% off list
- Spot pricing up to 65% off, with snapshot-on-preempt
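Back-of-the-envelope math, using the $3.69/hr H100 list price from the comparison table below and assuming the first hour is billed in full with per-second metering after that:

```python
# Illustrative only; the rate and discount are the figures quoted on this page,
# and the first-hour minimum is an assumption about how "after first hour" is metered.
H100_HOURLY = 3.69       # USD, on-demand list price (see comparison table)
WEEKLY_DISCOUNT = 0.22   # one-week reservation discount

def on_demand_cost(seconds: float) -> float:
    """Per-second billing with the first hour billed in full."""
    billable = max(seconds, 3600)
    return billable / 3600 * H100_HOURLY

print(round(on_demand_cost(5 * 3600 + 12 * 60), 2))            # 5h12m run -> 19.19
print(round(7 * 24 * H100_HOURLY * (1 - WEEKLY_DISCOUNT), 2))  # reserved week -> 483.54
```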
Three lanes, all per-second.
Need a different shape — say A100 + 2 TB shared NVMe? Build a custom node in the dashboard or talk to us.
Mid-size models, batch generation, vector search.
- 1 × NVIDIA L40S 48 GB
- 16 vCPU EPYC · 128 GB DDR5
- 1.6 TB NVMe scratch
- JupyterLab + Ollama pre-installed
- Per-second billing after first hour
- Free egress to HostEngine compute
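Since Ollama ships pre-installed on this plan, batch generation can be as small as a loop over its local REST API. A minimal sketch, assuming the default localhost:11434 endpoint and a model you have already pulled ("llama3" is just an example name):

```python
# Batch generation against the local Ollama server on an L40S node.
import json
import urllib.request

def generate(prompt: str, model: str = "llama3") -> str:
    # "llama3" is illustrative; substitute whatever model you actually serve.
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",   # Ollama's default endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

for prompt in ["Summarise NVLink in one sentence.", "What is GDDR6 memory?"]:
    print(generate(prompt)[:100])
```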
Single-node fine-tuning, 70B inference.
- 1 × NVIDIA H100 80 GB SXM5
- 32 vCPU EPYC Genoa · 256 GB DDR5
- 3.84 TB NVMe Gen4 scratch
- 200 Gbps interconnect to neighbouring nodes
- PyTorch 2.5 + CUDA 12.4 image
- Snapshot in-flight training state
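Platform snapshots capture the node; writing a periodic checkpoint to the NVMe scratch first is what makes the snapshotted state resumable. A minimal sketch (the /scratch path is an assumption, use whatever mount your node exposes):

```python
# Checkpoint to local NVMe scratch so a platform snapshot is resumable.
import torch

CKPT_PATH = "/scratch/ckpt.pt"   # assumed mount point for the NVMe scratch volume

def save_checkpoint(model, optimizer, step):
    torch.save(
        {"step": step, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
        CKPT_PATH,
    )

def load_checkpoint(model, optimizer):
    state = torch.load(CKPT_PATH, map_location="cuda")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]   # resume training from this step
```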
Pre-training, distributed RLHF, dense MoE.
- 8 × H100 80 GB SXM5 with NVLink
- 192 vCPU · 2 TB DDR5
- 30 TB local NVMe + 100 TB network
- 3.2 Tbps NDR InfiniBand fabric
- Reserved-week pricing −22%
- Dedicated GPU SRE on Slack
All plans include CUDA 12.4, PyTorch 2.5, JupyterLab, snapshot-on-preempt and free egress to HostEngine object storage and CDN.
From notebooks to dense MoE pre-training.
ML researcher at a biotech
Fine-tunes a 13B protein model nightly
Spins an H100 node at 23:00, runs LoRA training, snapshots weights into object storage at 04:00 and tears it down — total cost $34/run.
Generative-art platform
Serves 20k images/minute at peak
Auto-scales between 4 and 36 L40S replicas based on queue depth. Saves ~$18,000/month versus running 36 replicas 24/7 on a hyperscaler.
Indie LLM team
Pre-trains a 7B model from scratch
Reserves an 8×H100 node for 14 days at the weekly rate. Trains 12B tokens, ships the checkpoint to Hugging Face, releases the box.
The numbers that change a procurement deck.
| Capability | HostEngine | Hyperscaler A | Legacy Host B |
|---|---|---|---|
| H100 80 GB on-demand | $3.69/hr | $8.50/hr | Reservation only |
| Per-second billing | ✓ | | |
| InfiniBand fabric included | ✓ | +contract | |
| Pre-baked PyTorch / CUDA image | ✓ | | |
| Free egress to platform storage | ✓ | | |
| Snapshot mid-training state | ✓ | | |
| Reservations from 1 week | ✓ | 1 year minimum | 1 year minimum |
The frameworks your team already loves.
Integrates with the stack you already use
- PyTorch 2.5
- CUDA 12.4
- JAX
- TensorFlow
- Triton
- vLLM
- TRL
- Hugging Face
- Weights & Biases
- JupyterLab
- Slurm
- Kubeflow
- Ray
- Determined
Questions ML teams keep asking.
Which GPUs are available right now?
Can I reserve a multi-GPU node?
What about networking between nodes?
Do you support Kubernetes / Slurm?
How does scratch storage work?
Are GPUs MIG-able?
Used by 1,200+ ML teams and 38 frontier-model labs
Spin up an H100 by the time you finish coffee.
No reservation form, no quota request — pick a region, click deploy, get a JupyterLab link in 92 seconds.
- No credit card to start
- Free migration from any provider
- 99.99% uptime SLA, in writing
```
# spin up a 4 vCPU / 8 GB cloud VPS in 55s
$ hostengine vps create --plan "performance-4x8" --region "fra1"
✓ provisioned vps_2x9k1q (172.247.18.42)
✓ image debian-12 ready · ssh keys attached
✓ snapshot policy: hourly · backups: 30 days
$ hostengine domain attach "trading.acme.io" --ssl
✓ DNS verified · Let's Encrypt cert issued in 6.4s
```