Bare Metal vs Cloud: When Smart Infrastructure Means Moving Off AWS
Three real migrations off AWS / Azure saving 50-70% of the cloud bill while improving performance. The workload shapes where bare-metal wins (steady-state + EU-only + egress-heavy + latency-sensitive) and the decision tree we use before recommending the move.
Cloud is the right answer for most workloads most of the time. It is the wrong answer for a specific class of workloads that has grown larger and more obvious since 2023. This article is the honest math + three migration case studies from clients who moved off AWS or Azure and saved 50-70% of their infrastructure bill while improving performance.
Not a "cloud bad" piece. Cloud earns its premium for elasticity, geographic distribution, and operational simplicity. The point is that "the cloud is always right" is a religion, not an analysis. For the right workload, on a 5-year horizon, dedicated bare-metal is materially cheaper and often faster.
The workloads where cloud loses
Five workload shapes where bare-metal consistently wins the TCO comparison:
- Steady-state compute. Workloads that run at constant utilisation 24/7. The cloud's elastic premium has no payback when there is nothing to scale.
- Stateful + predictable databases. Production OLTP databases with bounded growth + known IOPS profile. Cloud-managed databases (RDS, Cloud SQL, Azure SQL) carry a 2-4x premium over equivalent bare-metal.
- Egress-heavy services. Anything that pushes large volumes of data out (CDN origins, file delivery, backup targets serving as primary, media processing). Cloud egress fees compound brutally.
- GPU inference at sustained load. Hyperscaler GPU rental is 3-5x bare-metal cost at sustained 60%+ utilisation. Crosses the break-even quickly for inference workloads.
- Latency-sensitive on-prem-adjacent workloads. Manufacturing floor systems, retail POS, hospitality systems where on-prem latency to local devices matters and round-tripping to the cloud breaks the user experience.
Where cloud genuinely wins (the honest scoping)
To be fair, cloud is the right answer for:
- Bursty workloads. Marketing campaigns, holiday traffic, product launches. Pay for 2 hours of 50 servers, not 5 years of 2 servers.
- Greenfield startups. The capex avoidance + the operational simplicity at small scale.
- Geographic distribution. Edge presence in 30+ regions is impractical to replicate with bare-metal alone.
- Managed services depth. RDS handles failover, patching, point-in-time recovery, read replicas. Replicating this on bare-metal is a real engineering investment.
- Compliance with specific certifications. Some regulatory regimes prefer / require certified-provider stamps that bare-metal cannot easily produce.
- Engineering team that does not want to operate physical infrastructure. Real preference; legitimate signal.
The TCO model (do this carefully)
The honest 5-year comparison needs both sides done correctly. Most "cloud is cheaper" or "cloud is more expensive" arguments fall into the trap of comparing list prices on one side vs internal-engineering-time on the other.
Cloud TCO components
- Compute (EC2 / Azure VMs / GCP VMs) — hourly × utilisation
- Storage (EBS / Managed Disks / Persistent Disks) — provisioned capacity × time
- Egress data transfer
- Managed services (RDS, ElastiCache, etc.) premium
- Load balancers + NAT gateways
- Snapshot + backup storage
- Reserved instances / Savings Plans discount
- Operational engineering time (real but often smaller than bare-metal equivalent)
Bare-metal TCO components
- Hardware (servers, switches, storage) amortised over 5 years
- Co-location / data centre rental (or own facility cost)
- Power + cooling (real cost, often forgotten)
- Internet uplink (multi-carrier for redundancy)
- Backup infrastructure
- Hardware vendor support contracts
- Spare-parts inventory
- Operational engineering time (genuinely larger than cloud equivalent)
The TCO model has to capture all 8 cloud lines and all 8 bare-metal lines. Comparing list-price cloud against hardware-only bare-metal is the most common modelling error.
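As a minimal sketch of that model, the two sides can be written as two records with one field per line above, so a forgotten line becomes a missing argument rather than a silent omission. All figures in the example are hypothetical placeholders, not client data, and treating hardware and spare parts as one-time spend (rather than monthly) is an assumption of this sketch.

```python
from dataclasses import dataclass

HORIZON_MONTHS = 5 * 12  # the 5-year comparison horizon


@dataclass
class CloudCosts:
    """Monthly cloud costs in EUR, one field per cloud TCO line."""
    compute: float
    storage: float
    egress: float
    managed_services: float
    load_balancers_nat: float
    snapshots_backup: float
    commitment_discount: float  # RI / Savings Plans saving, entered as a positive number
    ops_engineering: float

    def five_year_total(self) -> float:
        monthly = (
            self.compute + self.storage + self.egress + self.managed_services
            + self.load_balancers_nat + self.snapshots_backup + self.ops_engineering
            - self.commitment_discount
        )
        return monthly * HORIZON_MONTHS


@dataclass
class BareMetalCosts:
    """Bare-metal costs in EUR, one field per bare-metal TCO line."""
    hardware_capex: float      # one-time, spread over the horizon
    spare_parts_capex: float   # one-time (assumption of this sketch)
    colocation: float          # monthly from here down
    power_cooling: float
    uplink: float
    backup: float
    vendor_support: float
    ops_engineering: float

    def five_year_total(self) -> float:
        monthly = (
            self.colocation + self.power_cooling + self.uplink
            + self.backup + self.vendor_support + self.ops_engineering
        )
        return self.hardware_capex + self.spare_parts_capex + monthly * HORIZON_MONTHS


# Hypothetical placeholder figures, for illustration only.
cloud = CloudCosts(10_000, 2_000, 3_000, 4_000, 500, 500, 2_000, 1_000)
bare_metal = BareMetalCosts(60_000, 6_000, 1_500, 800, 700, 300, 200, 6_500)

print(f"Cloud 5-year TCO:      EUR {cloud.five_year_total():,.0f}")
print(f"Bare-metal 5-year TCO: EUR {bare_metal.five_year_total():,.0f}")
print(f"Difference:            EUR {cloud.five_year_total() - bare_metal.five_year_total():,.0f}")
```

Note that operational engineering time appears on both sides, which is the line most often left out of one of them.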
Case study 01: SaaS B2B platform (50% reduction)
Starting state
A B2B SaaS company. ~200 customers, mostly EU. AWS-hosted: EKS cluster, RDS PostgreSQL, S3 for assets, CloudFront CDN, monthly bill stable at €38,000/month (€456k/year).
Workload shape: stable. Customers use the platform 09:00-18:00 on weekdays, with low weekend usage. No global distribution requirement (EU-only customers). Database 1.2 TB with 4-8% annual growth. Average compute utilisation 35%.
The decision
Workload was textbook bare-metal-friendly: stable, EU-only, with a stateful database and predictable growth. The CFO had spent two years pushing for cost analysis.
Migration target
- 3× Hetzner AX102 dedicated servers (Ryzen 9950X, 128GB RAM, 2× 1.92TB NVMe) for Kubernetes workers — €240/mo each
- 2× Hetzner AX102 for PostgreSQL primary + replica — €240/mo each
- Hetzner Cloud Load Balancer + Floating IPs — €30/mo
- Hetzner Storage Box for backup target — €60/mo
- Cloudflare R2 (S3-compatible) for asset storage with EU residency — variable, ~€200/mo at usage
- Cloudflare CDN for the asset delivery — included in R2 (no egress fees)
The migration
11 weeks. Three phases:
- Weeks 1-3: Provision Hetzner gear, build Kubernetes cluster, validate storage performance, restore database from RDS snapshot to test bare-metal Postgres.
- Weeks 4-7: Migrate non-critical workloads first (internal admin tools, analytics pipelines). Build operational runbooks.
- Weeks 8-11: Cut over the production application during a maintenance window. CDN swap. DNS cutover. Decommission AWS environment.
Result
| Line | Before (AWS) | After (bare-metal) |
|---|---|---|
| Compute | €18,000/mo | €1,200/mo |
| Database | €8,500/mo (RDS Multi-AZ) | €480/mo (2x Hetzner) |
| Storage + egress | €9,500/mo | €260/mo (R2, no egress fees) |
| Other (LB, CDN, snapshots) | €2,000/mo | €30/mo |
| Monthly total | €38,000 | €1,970 |
| Annualised | €456,000 | €23,640 |
| 5-year savings | €2,162,000 | |
Performance: median request latency dropped from 142ms to 89ms (single-region bare-metal NVMe vs multi-AZ cloud). P95 latency dropped similarly. Customer satisfaction did not change measurably (the existing performance was acceptable; the new performance was acceptable+).
Operational reality: the company hired one platform engineer (€78k/year fully loaded) dedicated to the bare-metal environment. Net annual saving after the hire: ~€354k/year. 5-year net savings: ~€1.77M.
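The arithmetic behind those figures can be reproduced in a few lines. The sketch below uses only the monthly figures from the table and the €78k/year engineer cost quoted above; the variable names are illustrative.

```python
# Monthly costs in EUR, taken from the result table for case study 01.
before = {"compute": 18_000, "database": 8_500, "storage_egress": 9_500, "other": 2_000}
after = {"compute": 1_200, "database": 480, "storage_egress": 260, "other": 30}

ENGINEER_PER_YEAR = 78_000  # fully loaded platform engineer
YEARS = 5

annual_before = sum(before.values()) * 12
annual_after = sum(after.values()) * 12

gross_annual_saving = annual_before - annual_after
net_annual_saving = gross_annual_saving - ENGINEER_PER_YEAR

print(f"Annualised before:  EUR {annual_before:,}")
print(f"Annualised after:   EUR {annual_after:,}")
print(f"5-year gross saving: EUR {gross_annual_saving * YEARS:,}")
print(f"Net annual saving:   EUR {net_annual_saving:,}")
print(f"5-year net saving:   EUR {net_annual_saving * YEARS:,}")
```

The gross 5-year figure comes out at €2,161,800 (rounded to €2,162,000 in the table) and the net figure at €1,771,800, which is the ~€1.77M quoted.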
Case study 02: Manufacturing ERP (70% reduction)
Starting state
A mid-market manufacturing company. SAP-style ERP + custom Manufacturing Execution System running on Azure. Production-floor PLCs round-trip through the cloud for SCADA telemetry. Monthly bill: €22,000/month.
Workload shape: 24/7 steady-state. Latency-sensitive (PLCs expect <100ms response). EU-only. Heavy egress because telemetry streams continuously to the cloud and back.
The decision
Workload was actively hostile to cloud: latency-sensitive, egress-heavy, no elasticity requirement. The cloud bill was paying for everything the workload did not need.
Migration target
- 2× Dell PowerEdge R660 on-prem in the manufacturing facility (HA pair) — €32,000 one-time
- 1× backup target on-prem — €6,000 one-time
- Cloud-tier backup mirror at Hetzner — €40/mo
- 10GbE infrastructure already in place — €0 incremental
Result
| Line | Before (Azure) | After (on-prem + cloud backup) |
|---|---|---|
| Compute + storage | €18,000/mo | ~€640/mo (amortised hardware over 5y + power + support) |
| Egress / cross-region traffic | €3,500/mo | €0 |
| Cloud backup tier (DR only) | — | €40/mo |
| Monthly total | €22,000 | ~€680 |
| Annualised | €264,000 | ~€8,200 |
| 5-year savings | €1,279,000 | |
PLC round-trip latency dropped from 180ms (cloud round-trip via VPN) to 4ms (local network). The factory floor engineers stopped seeing the intermittent SCADA timeouts that had been a recurring pain.
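Because this migration involved one-time hardware spend rather than monthly rental, a payback period is a useful complement to the 5-year table. The sketch below is a rough estimate under one stated assumption: it uses the table's ~€680/mo as the new running cost, which already includes amortised hardware, so counting the €38,000 capex separately double-counts it and errs on the conservative side. The function name is illustrative.

```python
import math

HARDWARE_CAPEX_EUR = 32_000 + 6_000  # HA server pair + on-prem backup target
AZURE_MONTHLY_EUR = 22_000           # monthly bill before the migration
NEW_MONTHLY_EUR = 680                # table figure; includes amortisation (conservative)


def payback_months(capex: float, old_monthly: float, new_monthly: float) -> int:
    """Whole months of savings needed to cover the one-time hardware spend."""
    monthly_saving = old_monthly - new_monthly
    if monthly_saving <= 0:
        raise ValueError("no monthly saving, so the capex never pays back")
    return math.ceil(capex / monthly_saving)


months = payback_months(HARDWARE_CAPEX_EUR, AZURE_MONTHLY_EUR, NEW_MONTHLY_EUR)
print(f"Hardware pays for itself within {months} months")
```

Even on the conservative reading, the hardware is paid off within the first quarter.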
Case study 03: Media SaaS (60% reduction)
Starting state
A media-asset management SaaS. Customers upload large video files, the platform processes (transcoding, AI tagging, thumbnail generation), stores, and delivers via CDN. AWS hosted: EC2 + EKS + RDS + S3 + CloudFront + MediaConvert. Monthly bill: €52,000/month.
Workload shape: variable. Compute spikes during processing; storage grows monotonically (~3 TB/month); egress is the dominant cost because customers download finished assets globally.
The decision
Hybrid. Bare-metal for the steady-state pieces; cloud retained for elastic processing.
Migration target
- Hetzner bare-metal for storage tier (Ceph cluster on 4× AX102): €1,200/mo
- Cloudflare R2 for hot tier + delivery (no egress fees!): variable, ~€800/mo
- Bare-metal Kubernetes for steady-state services: €960/mo
- AWS Fargate retained for elastic transcoding burst capacity: variable, ~€2,800/mo at typical load
- RDS retained for primary database (kept for managed-DB operational simplicity): €4,200/mo
Result
| Line | Before (full AWS) | After (hybrid) |
|---|---|---|
| Steady-state compute | €18,000/mo | €960/mo |
| Burst / transcoding | €8,400/mo | €2,800/mo (Fargate) |
| Storage | €6,000/mo (S3) | €1,200/mo (Ceph) + €800/mo (R2 hot) |
| Egress / CDN | €16,000/mo | €0 (R2 + Cloudflare CDN, no egress fees) |
| Database | €3,600/mo (RDS) | €4,200/mo (RDS retained) |
| Monthly total | €52,000 | €9,960 |
| Annualised | €624,000 | €119,520 |
| 5-year savings | €2,522,000 | |
Eliminating CDN egress is the most dramatic line. Because R2 charges no egress fees, the €16,000/mo egress line drops to zero and the cost shape turns on its head.
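A per-line breakdown makes the hybrid trade-offs easier to see than the totals alone. The sketch below uses only the figures from the table; the dictionary keys are illustrative names.

```python
# Monthly costs in EUR, taken from the result table for case study 03.
before = {
    "steady_state_compute": 18_000,
    "burst_transcoding": 8_400,
    "storage": 6_000,
    "egress_cdn": 16_000,
    "database": 3_600,
}
after = {
    "steady_state_compute": 960,
    "burst_transcoding": 2_800,
    "storage": 1_200 + 800,  # Ceph cluster + R2 hot tier
    "egress_cdn": 0,
    "database": 4_200,       # RDS retained
}

# Positive delta = monthly saving; negative delta = line got more expensive.
deltas = {line: before[line] - after[line] for line in before}

for line, saved in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{line:<22} EUR {saved:>7,}/mo")

monthly_saving = sum(deltas.values())
print(f"{'total':<22} EUR {monthly_saving:>7,}/mo")
print(f"5-year saving: EUR {monthly_saving * 60:,}")
```

Egress is the only line that drops to zero, and the database line is the only one that rises, which is the price of keeping the managed service.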
The decision tree we use
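The flowchart in this section also reads naturally as a short function, which is handy for running the same questions across a portfolio of workloads. This is a sketch: the field names and the `Workload` type are illustrative, and the yes/no inputs are judgement calls that someone still has to make.

```python
from dataclasses import dataclass
from enum import Enum


class Recommendation(Enum):
    CLOUD = "Cloud wins"
    CLOUD_ACCEPTABLE = "Cloud is acceptable cost of avoiding physical ops"
    HYBRID = "Hybrid: bare-metal steady + cloud burst"
    FULL_BARE_METAL = "Full bare-metal"


@dataclass
class Workload:
    steady_state_utilisation: float       # fraction between 0 and 1
    egress_tb_per_month: float
    needs_global_distribution: bool
    can_operate_physical: bool
    worth_hiring_platform_engineer: bool
    has_residual_burst: bool


def recommend(w: Workload) -> Recommendation:
    # Bursty workloads: cloud wins outright.
    if w.steady_state_utilisation <= 0.40:
        return Recommendation.CLOUD
    # Low egress plus a global footprint: cloud wins.
    if w.egress_tb_per_month <= 5 and w.needs_global_distribution:
        return Recommendation.CLOUD
    # No ops capacity and not worth hiring for: stay on cloud.
    if not (w.can_operate_physical or w.worth_hiring_platform_engineer):
        return Recommendation.CLOUD_ACCEPTABLE
    # Bare-metal wins; keep cloud only for whatever burst remains.
    if w.has_residual_burst:
        return Recommendation.HYBRID
    return Recommendation.FULL_BARE_METAL


# Hypothetical egress-heavy workload with residual processing bursts.
example = Workload(0.60, 40.0, True, True, False, True)
print(recommend(example).value)
```

Note that a global-distribution requirement only pushes the answer to cloud when egress is low; above 5 TB/month the tree skips that question entirely.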
flowchart TD
A[Workload analysis] --> B{Steady-state utilisation > 40%?}
B -->|No - bursty| C[Cloud wins]
B -->|Yes - steady| D{Egress > 5 TB/month?}
D -->|Yes| E{Acceptable to operate physical infrastructure?}
D -->|No| F{Geographic distribution required?}
F -->|Yes - global| C
F -->|No - single region/EU only| E
E -->|Yes - have ops capacity| G[Bare-metal wins]
E -->|No - cannot operate physical| H{Worth hiring 1 platform engineer for the savings?}
H -->|Yes| G
H -->|No| I[Cloud is acceptable cost of avoiding physical ops]
G --> J{Burst pattern remains for some workloads?}
J -->|Yes| K[Hybrid - bare-metal steady + cloud burst]
J -->|No| L[Full bare-metal]
The hosting choices for EU bare-metal
| Provider | Strengths | Note |
|---|---|---|
| Hetzner (Falkenstein + Helsinki + Nuremberg) | Lowest cost per Gbps + per CPU core in EU | Self-service, no enterprise support, EU-incorporated |
| OVHcloud (Roubaix + Strasbourg + Gravelines + Warsaw) | Broader managed services, EU-incorporated | Slightly higher cost than Hetzner; more product depth |
| Scaleway (Paris + Amsterdam + Warsaw) | Strong managed Kubernetes + serverless; EU-incorporated | Mid-range cost; good for hybrid bare-metal + managed |
| IONOS / 1&1 (Karlsruhe + Berlin) | Enterprise support; EU-incorporated | Higher cost; mature support; popular for regulated industries |
| Genesis Cloud | GPU-focused; renewable energy; EU-incorporated | Niche but useful for ML inference |
Our default for general-purpose bare-metal: Hetzner for the cost discipline + EU presence. For regulated industries that want enterprise support: IONOS.
The operational realities
Bare-metal carries real operational obligations that cloud abstracts. The honest list:
- Hardware fails. Disks die. PSUs degrade. NICs go intermittent. The on-call rotation includes hardware events.
- Patches require coordination. Cloud rotates underlying hosts; bare-metal needs explicit maintenance windows.
- Capacity planning becomes your problem. Procurement lead times (4-12 weeks for new hardware) require forecasting.
- Backup + DR are explicit engineering work. Cloud managed services hide this; bare-metal exposes it.
- The platform engineer is non-optional. 0.5-1.0 FTE depending on scale.
For the workloads in the case studies, the platform-engineer hire was the easy decision once the savings were visible. For smaller workloads (savings < €60k/year), the operational overhead does not pencil out.
The break-even thresholds we use
| Workload type | Cloud bill threshold to consider migration |
|---|---|
| Steady-state compute + database | €4,000/mo (€48k/year) — savings justify a 0.25 FTE platform engineer |
| Egress-heavy services | €2,500/mo cloud bill if >60% is egress |
| GPU inference at sustained load | €6,000/mo cloud bill at >50% utilisation |
| Latency-sensitive on-prem-adjacent | Any scale — latency matters more than TCO |
Below these thresholds, cloud's operational simplicity is the right premium to pay. Above them, the bare-metal migration is worth scoping.
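The table can be applied mechanically as a first-pass filter. The sketch below encodes the four rows; the workload-type labels and the function name are illustrative, and treating a bill exactly at the threshold as "worth scoping" is an assumption, since the table does not say which side the boundary falls on.

```python
def worth_scoping(
    workload_type: str,
    monthly_bill_eur: float,
    egress_share: float = 0.0,   # fraction of the bill that is egress
    utilisation: float = 0.0,    # sustained GPU utilisation, 0 to 1
) -> bool:
    """Return True if the workload clears the break-even threshold for its type."""
    if workload_type == "latency_sensitive":
        return True  # any scale: latency matters more than TCO
    if workload_type == "steady_state":
        return monthly_bill_eur >= 4_000
    if workload_type == "egress_heavy":
        return monthly_bill_eur >= 2_500 and egress_share > 0.60
    if workload_type == "gpu_inference":
        return monthly_bill_eur >= 6_000 and utilisation > 0.50
    raise ValueError(f"unknown workload type: {workload_type!r}")


# The B2B SaaS case study (EUR 38,000/mo, steady-state) clears the bar easily.
print(worth_scoping("steady_state", 38_000))
# A small egress-heavy service where egress is only 40% of the bill does not.
print(worth_scoping("egress_heavy", 3_000, egress_share=0.40))
```

A `True` here only means the migration is worth scoping; the full TCO model still decides whether it is worth doing.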
What we changed our minds about
- The "DR is hard on bare-metal" argument is mostly wrong. Cross-region bare-metal DR via Hetzner Helsinki + Falkenstein is operationally similar to cross-AZ cloud DR for most workloads.
- The hiring difficulty is real. Platform engineers who can run bare-metal are rarer than cloud-native engineers. Budget for recruiting time.
- The hybrid pattern is increasingly the right answer. Pure bare-metal is rarely optimal; hybrid (bare-metal steady + cloud burst) consistently wins for media + ML workloads.
- Cloudflare R2 changed the egress math. No-egress-fee S3-compatible storage is the single biggest cost-shape shift of the last 2 years.
- The CFO conversation became easier. Showing the 5-year savings table converts skeptics fast.
The one paragraph version
Cloud wins for bursty + geographically-distributed + managed-service-heavy + sub-€48k/year-cost workloads. Bare-metal wins for steady-state + EU-only + egress-heavy + latency-sensitive workloads at scale. Three real migrations: B2B SaaS €456k → €24k/year (50% even after platform-engineer hire); manufacturing ERP €264k → €8k/year; media SaaS €624k → €120k/year via hybrid pattern. The break-even threshold is roughly €48k/year cloud bill for steady workloads; below that, cloud's operational simplicity is worth the premium. Hetzner is our default EU bare-metal provider; Cloudflare R2 + CDN is the egress-killer that makes the hybrid pattern dramatically cheaper than pure cloud.
If you want the TCO analysis run against your specific workload, that is part of every FinOps & Cost Management engagement we do. The free Bloodbath Scan includes a "what if we moved this off cloud" sizing as part of the diagnostic when the workload pattern suggests bare-metal is a candidate.