Site-to-Site VPN vs SD-WAN: Picking the Right Backbone for Multi-Office Operations
IPsec, WireGuard mesh, and SD-WAN compared against three real multi-office deployments. Decision matrix scoring throughput, latency, failover behaviour, operational complexity, vendor lock-in, and 5-year TCO from €13k to €265k.
"We need to connect three offices" sounds like a single problem with one right answer. It is not. The answer depends on the traffic patterns, the uplink mix, the operational headcount, and what happens when the primary uplink at the smallest site dies on a Friday evening. This article walks the decision matrix between IPsec site-to-site, WireGuard mesh, and full SD-WAN — with three real deployments documented honestly.
No vendor pitches. The three options have legitimate sweet spots and legitimate failure modes. Pick wrong and you spend the next 18 months paying for either capability you do not use or capability you needed and skipped.
The three architectures
Architecture A: IPsec site-to-site (the venerable default)
The classic. Each site has a firewall / VPN concentrator. Tunnels are configured pairwise (or hub-and-spoke) between sites. Encryption via IKEv2 + IPsec ESP. Traffic between sites encrypted; traffic to the internet exits the local site.
Mature, well-supported on every firewall on the market, no vendor lock-in. Throughput limited by the firewall CPU and the slowest uplink between two sites.
Architecture B: WireGuard mesh
The modern alternative. Each site has a Linux gateway running WireGuard. Mesh topology: every site has a direct tunnel to every other site. Encryption via WireGuard's Noise-protocol-derived crypto. Throughput typically 2-4x higher than IPsec on the same hardware.
Simpler config (a few dozen lines of WireGuard config vs hundreds of lines of strongSwan / Cisco IPsec). Operationally requires Linux familiarity. Less vendor support for "managed" deployments.
Architecture C: SD-WAN appliances
The vendor-managed approach. Each site has an SD-WAN appliance (Cisco Meraki MX, Fortinet FortiGate-SD-WAN, Palo Alto Prisma, Versa, VMware VeloCloud, Cato Networks SASE). Tunnels are auto-built; routing is application-aware; vendor cloud orchestration handles config, monitoring, and policy.
Adds application-aware routing (a Teams call uses a different path than a backup transfer), automated failover between multiple uplinks at each site, policy-driven QoS. Comes with per-appliance + per-site licensing.
The decision matrix
| Factor | IPsec S2S | WireGuard mesh | SD-WAN |
|---|---|---|---|
| CAPEX | Low (use existing firewalls) | Very low (small Linux box per site) | High (€2k-€8k per appliance + licensing) |
| OPEX | Low | Low (Linux ops time) | Medium (€100-€500/site/month licensing) |
| Setup time per site | 2-4 hours | 30-90 minutes | 30-60 minutes (zero-touch provisioning) |
| Throughput per tunnel | 200-800 Mbps typical | 800 Mbps - 4 Gbps | 1-10 Gbps depending on appliance |
| Latency overhead | ~1-3 ms | ~1-2 ms | ~1-3 ms |
| Application-aware routing | ❌ (manual policy) | ❌ (manual policy) | ✅ (built-in DPI) |
| Multi-uplink per site failover | Manual scripts | Manual scripts | Sub-second automated |
| Operational expertise | Network engineer | Linux + network engineer | Vendor portal user |
| Vendor lock-in | Low (standard protocols) | Very low (open source) | High (per-vendor cloud) |
| Best for | Stable, low-change, mid-size | Technical team, modest sites | Many sites, multi-uplink, complex apps |
Deployment 01: 3-office consultancy (IPsec winner)
Client profile: 80-person consultancy, three offices (HQ + two regional). Each office has a single fibre uplink. Traffic between offices is modest (file sharing, occasional video, no cross-site database replication). Operations team has one network engineer.
What we deployed
IPsec site-to-site in a hub-and-spoke topology with HQ as the hub. Sophos XGS firewalls at each site (existing investment). IKEv2 with AES-256-GCM, perfect forward secrecy enabled.
                  [HQ - Sophos XGS]
                   /             \
           IPsec  /               \  IPsec
           IKEv2 /                 \ IKEv2
                /                   \
[Regional A - Sophos XG]     [Regional B - Sophos XG]
What worked
- Setup completed in two engineering days
- Steady-state throughput: ~600 Mbps per tunnel (limited by regional uplink)
- No vendor lock-in beyond Sophos firewalls (which were already in place)
- Zero new licensing cost
- Per-tunnel monitoring via the existing Sophos cloud dashboard
What did not work
- Regional A's fibre had a 4-hour outage in month 3. No automated failover; the office was offline. The client accepted this in the design decision but it stung when it happened.
- Inter-regional traffic (Regional A → Regional B) hairpinned through HQ. Latency was acceptable (4-8 ms additional) but noticeable in Teams quality.
Verdict
Right architecture for this client. The 4-hour outage was acceptable for the operations profile. Total project cost €4,500 (engineering only). Annual operating cost €2,000 (monitoring + occasional config tweak).
Deployment 02: 5-office software vendor (WireGuard mesh)
Client profile: 220-person software company, five offices across the EU. Engineering teams need cross-site access to internal services (Git, CI/CD, internal package registries). Linux-native operations team. Existing investment in HashiCorp + Terraform.
What we deployed
WireGuard mesh — every site has a direct tunnel to every other site. Linux gateways (small Hetzner-class hardware: Intel NUC i5, 16GB RAM, 256GB SSD running Debian Stable). WireGuard config managed via Ansible. Routing via simple static routes pushed by Ansible.
Each gateway also runs Tailscale as a backup mesh — if the primary WireGuard mesh has an issue, Tailscale's coordination server can rebuild connectivity. The two stacks coexist without conflict.
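To make the full-mesh shape concrete, here is a minimal sketch of how per-site WireGuard configs could be rendered from one inventory. It is not the Ansible role used in this deployment. The site names, endpoints, subnets, and key placeholders are all hypothetical; only the config keys (`PrivateKey`, `ListenPort`, `PublicKey`, `Endpoint`, `AllowedIPs`, `PersistentKeepalive`) are standard WireGuard settings.

```python
from itertools import combinations

# Hypothetical inventory: names, endpoints, subnets, and keys are placeholders.
SITES = {
    "site1": {"endpoint": "192.0.2.1:51820", "lan": "10.10.1.0/24", "pubkey": "<site1-public-key>"},
    "site2": {"endpoint": "192.0.2.2:51820", "lan": "10.10.2.0/24", "pubkey": "<site2-public-key>"},
    "site3": {"endpoint": "198.51.100.3:51820", "lan": "10.10.3.0/24", "pubkey": "<site3-public-key>"},
    "site4": {"endpoint": "198.51.100.4:51820", "lan": "10.10.4.0/24", "pubkey": "<site4-public-key>"},
    "site5": {"endpoint": "203.0.113.5:51820", "lan": "10.10.5.0/24", "pubkey": "<site5-public-key>"},
}


def tunnel_count(n: int) -> int:
    """Number of point-to-point tunnels in a full mesh of n sites."""
    return n * (n - 1) // 2


def render_config(site: str, sites: dict) -> str:
    """Render one site's config: one [Interface] plus one [Peer] per other site."""
    lines = [
        "[Interface]",
        f"# gateway: {site}",
        "PrivateKey = <private-key-from-secret-store>",  # never templated in clear text
        "ListenPort = 51820",
        "",
    ]
    for peer, info in sites.items():
        if peer == site:
            continue  # a site does not peer with itself
        lines += [
            "[Peer]",
            f"# {peer}",
            f"PublicKey = {info['pubkey']}",
            f"Endpoint = {info['endpoint']}",
            f"AllowedIPs = {info['lan']}",  # also drives WireGuard's internal routing
            "PersistentKeepalive = 25",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(f"{len(SITES)} sites -> {tunnel_count(len(SITES))} tunnels")
    for a, b in combinations(SITES, 2):
        print(f"  {a} <-> {b}")
    print(render_config("site1", SITES))
```

Every site's file carries four `[Peer]` sections, so adding a sixth site means regenerating all five existing configs. That is why the inventory, not the individual files, should be the source of truth.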
What worked
- Per-tunnel throughput: 1.8-3.5 Gbps measured (gigabit-uplink limited at three of five sites)
- Setup per new site: 35 minutes including hardware power-on, Ansible play, validation
- Total infrastructure cost: 5 × €600 NUC + €0 software = €3,000 CAPEX
- Tailscale fallback caught two real failures where the primary mesh had a key-rotation issue
- Cross-site latency consistently low (mesh, no hairpinning)
What did not work
- The mesh-topology key rotation became operationally painful at 5 sites. n(n-1)/2 = 10 tunnels to update each rotation. Automated, but the moment of cutover required care.
- WireGuard's connectionless design means there is no explicit "the tunnel is down" state, only packets that stop coming back. Monitoring required custom Prometheus exporters that probe each peer.
- One site's ISP applied aggressive UDP rate limiting, breaking WireGuard's keep-alives. We worked around with TCP-tunnelled WireGuard via wireguard-go + custom userspace handler — non-trivial.
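The monitoring gap described above can be narrowed by reading the latest-handshake timestamp that `wg show <interface> dump` prints for each peer. The sketch below parses that tab-separated output and flags peers whose last handshake is missing or old. It is an illustration, not the exporter used in this deployment. The 180-second threshold is an assumption to tune, and it only makes sense when persistent keepalives are set, because an idle peer without keepalives can be healthy yet show an old handshake. The peer names and sample values are made up.

```python
import time
from typing import List, Optional

# Assumption: with keepalives enabled, a healthy peer re-handshakes every few minutes.
STALE_AFTER_SECONDS = 180


def stale_peers(dump: str, now: Optional[float] = None,
                max_age: int = STALE_AFTER_SECONDS) -> List[str]:
    """Return public keys of peers with no handshake or one older than max_age.

    Expects the output of `wg show <interface> dump` for a single interface:
    the first line describes the interface, every following line is a peer with
    8 tab-separated fields (public key, preshared key, endpoint, allowed IPs,
    latest handshake as a Unix timestamp, rx bytes, tx bytes, keepalive).
    """
    now = time.time() if now is None else now
    stale = []
    for line in dump.strip().splitlines()[1:]:  # skip the interface line
        fields = line.split("\t")
        if len(fields) != 8:
            continue  # not a peer line in the expected format
        public_key, latest_handshake = fields[0], int(fields[4])
        if latest_handshake == 0 or now - latest_handshake > max_age:
            stale.append(public_key)
    return stale


# Made-up sample: peerA is fresh, peerB is old, peerC has never completed a handshake.
SAMPLE_DUMP = "\n".join([
    "\t".join(["<private-key>", "<public-key>", "51820", "off"]),
    "\t".join(["peerA", "(none)", "198.51.100.2:51820", "10.10.2.0/24", "1700000000", "1024", "2048", "25"]),
    "\t".join(["peerB", "(none)", "198.51.100.3:51820", "10.10.3.0/24", "1699990000", "512", "256", "25"]),
    "\t".join(["peerC", "(none)", "(none)", "10.10.4.0/24", "0", "0", "0", "off"]),
])

if __name__ == "__main__":
    print(stale_peers(SAMPLE_DUMP, now=1700000060))
```

On a gateway, the input would come from something like `subprocess.run(["wg", "show", "wg0", "dump"], capture_output=True, text=True, check=True).stdout`, where `wg0` is a hypothetical interface name and the command needs root. A handshake check complements active probing rather than replacing it: it shows that the crypto session is alive, not that routed traffic actually arrives.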
Verdict
Right architecture for this client given the Linux operational depth. The ISP-rate-limiting incident would have been opaque on SD-WAN (vendor support ticket) and obvious on WireGuard (engineer with tcpdump). Total project cost €12,000 (engineering + hardware). Annual operating cost €4,000.
Deployment 03: 12-office retail chain (SD-WAN justified)
Client profile: 180-person retail chain, twelve store locations + central office. Each store has dual uplinks (fibre + 4G backup). Critical traffic: POS terminals to central database, in-store WiFi to internet, security cameras to cloud, video calls from manager workstations. Operations team is small and not network-specialised.
What we deployed
Cisco Meraki MX appliances at each site. AutoVPN (Meraki's SD-WAN overlay) for inter-site connectivity. Dual-uplink at every store; sub-second failover from fibre to 4G when fibre degrades. Application-aware routing (POS traffic prioritised over cameras; voice gets DSCP marking).
What worked
- Zero-touch provisioning. The store manager unboxes the Meraki MX, plugs it in, it phones home, downloads config, comes up live. No engineer visit.
- Automated failover caught 47 carrier events in the first year. Average disruption: 2-4 seconds. POS transactions survived.
- Application-aware routing produced measurable Teams call quality improvement during peak store hours
- Central operations dashboard (Meraki cloud) gave the small ops team visibility they would not have otherwise
- 4G failover data plan paid for itself in avoided downtime cost
What did not work
- Meraki licensing cost: €1,700/site/year × 13 sites = €22,100/year. Real money.
- Vendor lock-in: switching off Meraki would require replacing the appliances. Not a short-term issue but a strategic concern.
- Meraki's cloud went through a 4-hour control-plane outage in month 7. Data plane stayed up (existing tunnels worked) but the team could not push policy changes. The vendor SLA covered the incident; the operational anxiety was real.
Verdict
Right architecture for this client. The sub-second multi-uplink failover and the zero-touch provisioning were genuinely valuable. The licensing premium was justified by the operational simplicity (the ops team is two people, not five). Total project cost €52,000 (appliances + initial deployment). Annual operating cost €24,500 (licensing + monitoring + occasional re-config).
The cost comparison, honestly
For a 5-site SMB over a 5-year horizon:
| Cost line | IPsec S2S | WireGuard mesh | SD-WAN |
|---|---|---|---|
| CAPEX (initial) | €0-€8,000 (existing firewalls) | €2,500-€5,000 (small Linux gateways) | €12,000-€40,000 (appliances) |
| Engineering setup | €3,000-€8,000 | €5,000-€15,000 | €8,000-€20,000 |
| Annual licensing | €0-€1,500 (firewall renewals) | €0 | €8,000-€40,000 |
| Annual operations | €2,000-€5,000 | €3,000-€8,000 | €2,000-€5,000 |
| 5-year TCO | €13,000-€40,000 | €17,500-€55,000 | €70,000-€265,000 |
SD-WAN's premium is substantial. It is justified when operational simplicity, automated failover, and application awareness directly buy you something: usually multi-uplink failover with sub-second cutover, or many sites with limited operational headcount.
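As a cross-check against the table, the same five-year horizon can be applied to the three deployments described earlier, using only the project and annual operating costs quoted in their verdicts. This is a rough sketch: it assumes flat annual costs and ignores hardware refresh, price changes, and inflation. The site counts differ (3, 5, and 13), so the totals are not directly comparable with the 5-site ranges above.

```python
# (project cost, annual operating cost) in euros, taken from the three verdicts.
DEPLOYMENTS = {
    "01 IPsec, 3 offices": (4_500, 2_000),
    "02 WireGuard mesh, 5 offices": (12_000, 4_000),
    "03 SD-WAN, 13 sites": (52_000, 24_500),
}


def total_cost(project: int, annual: int, years: int = 5) -> int:
    """Project cost plus flat annual operating cost over the horizon."""
    return project + annual * years


if __name__ == "__main__":
    for name, (project, annual) in DEPLOYMENTS.items():
        print(f"{name}: €{total_cost(project, annual):,} over 5 years")
```

Under these assumptions the deployments come out at €14,500, €32,000, and €174,500 respectively.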
The hidden gotchas per architecture
IPsec gotchas
- MTU and fragmentation. IPsec overhead (~70 bytes) plus path MTU discovery issues produce subtle TCP performance problems. Set TCP MSS clamping on the firewall.
- NAT-T inconsistencies between vendors. Two different firewall vendors at endpoints occasionally produce frustrating debug sessions.
- Throughput ceiling on the firewall. Hardware accelerated IPsec on enterprise gear is fine; on commodity gear the CPU saturates at modest throughput.
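The MSS clamp mentioned in the first gotcha is simple arithmetic. The sketch below uses the ~70-byte overhead figure from above and assumes IPv4 and TCP headers without options (20 bytes each). Actual ESP overhead varies with cipher, padding, and NAT-T, so treat the result as a starting point and verify it against the real path.

```python
IPV4_HEADER = 20  # bytes, no IP options
TCP_HEADER = 20   # bytes, no TCP options


def clamped_mss(link_mtu: int = 1500, tunnel_overhead: int = 70) -> int:
    """Largest TCP payload that fits through the tunnel without fragmentation."""
    tunnel_mtu = link_mtu - tunnel_overhead  # what is left for the inner packet
    return tunnel_mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print(clamped_mss())               # plain Ethernet uplink
    print(clamped_mss(link_mtu=1492))  # PPPoE uplink
```

With a 1500-byte uplink this gives an MSS of 1390. A PPPoE uplink at 1492 bytes gives 1382, which shows why one clamp value copied across sites with different uplinks can still leave one of them fragmenting.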
WireGuard gotchas
- No native dynamic peering. Adding a site requires updating every other site's config. Automation makes this tractable but the operational discipline is non-negotiable.
- UDP-only. Some carrier networks rate-limit or block UDP. The TCP-wrapped variants (Tailscale, NetMaker, custom wireguard-go) handle this but add complexity.
- Key management is your problem. No CA, no certificate revocation. Key rotation is an explicit operational responsibility.
SD-WAN gotchas
- Cloud-control-plane dependency. If the vendor's cloud is unreachable, you cannot push policy changes. Data plane usually keeps working.
- Vendor lock-in is real. Switching SD-WAN vendors is a multi-month project at scale.
- Licensing complexity. Per-appliance + per-tier + per-feature licensing creates surprise costs at renewal time.
- "Application awareness" sometimes misclassifies. The DPI engines are good but not perfect. A custom application can get the wrong QoS class until you train the appliance.
The hybrid pattern (rarely discussed)
For clients with 5-15 sites, a hybrid pattern often wins: SD-WAN at the sites with multi-uplink + critical workloads, IPsec or WireGuard at the sites with single-uplink + tolerant workloads. The site-to-site tunnels terminate on a central concentrator that bridges both worlds.
The pattern adds operational complexity (two technology stacks to maintain) but produces real savings — the SD-WAN licensing applies only where the value is realised, not as a universal tax.
The decision in flowchart form
flowchart TD
A[Start: site-to-site connectivity decision] --> B{Sites > 8?}
B -->|Yes| C{Multi-uplink per site needed?}
C -->|Yes| D[SD-WAN]
C -->|No| E{Operational team has Linux depth?}
B -->|3-8 sites| F{Multi-uplink per site needed?}
F -->|Yes| D
F -->|No| E
E -->|Yes| G[WireGuard mesh]
E -->|No| H[IPsec S2S]
B -->|1-2 sites| H
G --> I[Add Tailscale fallback]
D --> J[Negotiate licensing aggressively]
H --> K[Plan for single-uplink failures]
What we would tell our past self
- The throughput numbers in vendor datasheets are theoretical. Real-world IPsec throughput is usually 40-60% of advertised; WireGuard usually hits 80-90%; SD-WAN sits between.
- Operational simplicity has a real euro value. A small ops team that does not have to debug VPN tunnels saves 5-10 hours/month. Multiply by their fully-loaded cost; SD-WAN's premium often pays for itself.
- Vendor lock-in is a risk you can quantify. The exit cost of an SD-WAN vendor is the replacement appliance cost + 4-8 weeks of engineering. Budget it as a contingency.
- Application-aware routing matters more than people expect. Real-time voice, video, and POS traffic all benefit. The DPI engine being "imperfect" is fine; the queueing it produces is still better than FIFO.
- The "do nothing" option deserves explicit evaluation. Sometimes the right answer is "the sites stay independent, users use ZTNA + cloud apps, no inter-site backbone needed". This is increasingly common as workloads move off-prem.
The one paragraph version
IPsec site-to-site wins for 1-3 stable sites with technical teams and no need for automated multi-uplink failover. WireGuard mesh wins for 3-8 sites with Linux operational depth, where throughput matters and the ops team can handle the manual scaling. SD-WAN wins for 5+ sites with multi-uplink per site, modest operational headcount, and traffic patterns that benefit from application-aware routing. The 5-year TCO ranges from €13k (IPsec, small) to €265k (SD-WAN, large) for the same number of sites; the value of "operational simplicity + automated failover" is what justifies the premium when it exists. Hybrid patterns are under-used and often optimal at mid-scale.