Cloud Cost Optimization: How We Cut a Client's Azure Bill by 42%
How we cut a 200-seat FinTech's Azure run-rate by 42% in six months — the eight specific levers (RIs, right-sizing, SQL elastic pools, storage lifecycle, SSD tier, Sentinel ingestion, egress, auto-shutdown), the tagging programme that made the savings durable, and the operate cadence that kept them.
A 200-seat EU FinTech engaged us for a FinOps audit. Six months later, their Azure run-rate was down 42% with no production impact. This is the full playbook — the audit, the prioritised backlog, the eight specific levers that moved the number, and the operational discipline that kept the savings from regressing.
Numbers are real. Vendor and persona details are anonymised but the architecture, tooling, and decisions are exactly what we shipped. Total annualised saving: €312,000 against a starting Azure spend of €62,000/month.
The starting state
The client at the start of the engagement:
- 200 employees, EU-only operation, regulated FinTech
- Azure-primary cloud estate (with smaller M365 + small AWS workload)
- Monthly Azure spend: €62,000 (annualised €744,000)
- Two-year cost growth rate: 87%, faster than revenue growth
- Internal team: 1 platform engineer + 1 part-time SRE; no dedicated FinOps capacity
- Tooling: Azure Cost Management default reports; no third-party FinOps platform
- Tagging: ~40% coverage, inconsistent taxonomy
- Reservations / Savings Plans: zero (all consumption pay-as-you-go)
- Last serious cost review: 16 months prior
Week 1-2: The audit
The first two weeks were diagnostic, not remedial. The goal: produce a prioritised savings backlog with realisable Euros attached, ranked by impact-per-engineering-hour.
Spend baseline
We ingested 24 months of Azure billing data into a BigQuery warehouse (the client did not have an existing data warehouse for cost data). We loaded the FOCUS-aligned cost and usage report, plus the Azure Resource Graph snapshots for resource metadata.
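As a rough illustration of the first-cut aggregation, the sketch below sums billed cost per service category for one billing period. The column names `ServiceCategory`, `BilledCost`, and `BillingPeriodStart` follow the FOCUS specification; the sample rows and the `monthly_cost_by_category` helper are hypothetical, not the client's data or the queries we ran in the warehouse.

```python
from collections import defaultdict

# Hypothetical FOCUS-style rows; a real export carries many more columns.
SAMPLE_ROWS = [
    {"BillingPeriodStart": "2024-01-01", "ServiceCategory": "Compute", "BilledCost": 120.50},
    {"BillingPeriodStart": "2024-01-01", "ServiceCategory": "Storage", "BilledCost": 40.25},
    {"BillingPeriodStart": "2024-01-01", "ServiceCategory": "Compute", "BilledCost": 79.50},
    {"BillingPeriodStart": "2024-02-01", "ServiceCategory": "Compute", "BilledCost": 10.00},
]


def monthly_cost_by_category(rows, billing_period_start):
    """Sum BilledCost per ServiceCategory for one billing period."""
    totals = defaultdict(float)
    for row in rows:
        if row["BillingPeriodStart"] == billing_period_start:
            totals[row["ServiceCategory"]] += row["BilledCost"]
    # Return the largest category first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


january = monthly_cost_by_category(SAMPLE_ROWS, "2024-01-01")
for category, cost in january:
    print(f"{category}: {cost:.2f}")
```

Running the same grouping per month across 24 months is what exposes which categories drive the growth rate.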
The first-cut analysis surfaced:
| Service category | Monthly € | % of total | Note |
|---|---|---|---|
| Virtual Machines | €18,400 | 30% | Mostly D-series + E-series; some legacy A-series |
| Azure SQL Database | €9,800 | 16% | 4 production + 7 dev/staging tiers, all on-demand |
| Storage (Blob + Files + Disks) | €7,100 | 11% | Heavy on premium SSDs; no lifecycle policies |
| App Service / Functions | €5,200 | 8% | Mixed Premium + Standard tiers |
| Networking (egress, vNET peering, VPN, etc.) | €6,800 | 11% | Significant cross-region egress |
| Log Analytics / Sentinel | €4,900 | 8% | All Analytics tier; no Basic / Archive routing |
| Backup + Site Recovery | €3,200 | 5% | Some misconfigured retention policies |
| Monitoring + Application Insights | €2,600 | 4% | Long retention on low-value metrics |
| Other (DNS, KeyVault, etc.) | €4,000 | 7% | Long tail of small services |
Tagging gap analysis
Of 1,847 resources, only 740 had complete required tags (Owner, CostCenter, Environment, Workload). Per-team showback was impossible because 60% of resources could not be attributed.
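The coverage figure is a simple count of resources carrying all four required tags. A minimal sketch, assuming resource rows shaped loosely like Azure Resource Graph output (the `tag_coverage` helper and the sample resources are hypothetical):

```python
REQUIRED_TAGS = ("Owner", "CostCenter", "Environment", "Workload")


def has_complete_tags(tags):
    """True when every required tag is present with a non-empty value."""
    return all(str(tags.get(name) or "").strip() for name in REQUIRED_TAGS)


def tag_coverage(resources):
    """Return (complete_count, total_count, coverage_fraction)."""
    total = len(resources)
    complete = sum(1 for r in resources if has_complete_tags(r.get("tags") or {}))
    return complete, total, (complete / total if total else 0.0)


# Hypothetical sample: one fully tagged, one partially tagged, one untagged.
resources = [
    {"name": "vm-app-01", "tags": {"Owner": "a@example.com", "CostCenter": "CC-100",
                                   "Environment": "prod", "Workload": "payments"}},
    {"name": "vm-app-02", "tags": {"Owner": "a@example.com", "Environment": "dev"}},
    {"name": "st-logs-01", "tags": None},
]
print(tag_coverage(resources))

# The audit figures from the text: 740 complete out of 1,847 resources.
audit_coverage = 740 / 1847
print(f"{audit_coverage:.1%}")
```

A partially tagged resource counts as untagged here, which is the stricter reading and the one that matters for showback.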
The prioritised backlog
22 distinct opportunities surfaced, ranked by realisable Euros relative to effort-to-realise. The top 10 captured 89% of the total opportunity:
| Rank | Opportunity | Monthly saving | Effort |
|---|---|---|---|
| 1 | RI / Savings Plan procurement for steady-state VMs | €5,200 | 1 week |
| 2 | VM right-sizing (D-series → smaller) | €4,300 | 2 weeks |
| 3 | Azure SQL elastic pools + dev-tier downsizing | €3,800 | 3 weeks |
| 4 | Storage lifecycle (Hot → Cool → Archive) | €2,400 | 2 weeks |
| 5 | Premium SSD → Standard SSD for non-perf workloads | €1,900 | 1 week |
| 6 | Sentinel Basic Logs tier for low-value sources | €1,700 | 2 weeks |
| 7 | Egress reduction (cross-region traffic patterns) | €1,800 | 3 weeks |
| 8 | App Service plan consolidation | €1,500 | 2 weeks |
| 9 | Dev/staging environment auto-shutdown | €1,400 | 1 week |
| 10 | Backup retention rationalisation | €900 | 1 week |
Total monthly opportunity in the top 10: €24,900. The 12 smaller items add another €3,000/month. Total identifiable savings: €27,900/month = €334,800/year. Risk-adjusted: roughly €280,000/year (assuming an 84% realisation rate).
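The backlog arithmetic can be reproduced directly from the table. The sketch below uses the figures quoted above; the variable names are ours, chosen for illustration.

```python
# (opportunity, monthly saving in EUR, effort in weeks) from the top-10 table.
BACKLOG = [
    ("RI / Savings Plan procurement", 5200, 1),
    ("VM right-sizing", 4300, 2),
    ("Azure SQL elastic pools + dev-tier downsizing", 3800, 3),
    ("Storage lifecycle", 2400, 2),
    ("Premium SSD -> Standard SSD", 1900, 1),
    ("Sentinel Basic Logs tier", 1700, 2),
    ("Egress reduction", 1800, 3),
    ("App Service plan consolidation", 1500, 2),
    ("Dev/staging auto-shutdown", 1400, 1),
    ("Backup retention rationalisation", 900, 1),
]
LONG_TAIL_MONTHLY = 3000  # the 12 smaller items combined
REALISATION_RATE = 0.84   # risk adjustment quoted in the text

top10_monthly = sum(saving for _, saving, _ in BACKLOG)
total_monthly = top10_monthly + LONG_TAIL_MONTHLY
total_annual = total_monthly * 12
top10_share = top10_monthly / total_monthly
risk_adjusted_annual = total_annual * REALISATION_RATE

print(f"Top 10: EUR {top10_monthly}/month ({top10_share:.0%} of total)")
print(f"Total: EUR {total_monthly}/month, EUR {total_annual}/year")
print(f"Risk-adjusted: EUR {risk_adjusted_annual:,.0f}/year")
```

At exactly 84% the risk-adjusted figure comes out slightly above €281,000, which the text rounds to €280,000.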
Lever 01: Reservation + Savings Plan procurement (€5,200/mo)
The single largest lever, and the lowest effort. Azure offers two commitment instruments, plus a licensing benefit that stacks with either:
- Reserved Instances (1 or 3 year): commit to a specific VM family + region. Up to 72% discount on hourly rate for 3-year commitment.
- Savings Plans for Compute (1 or 3 year): commit to an hourly euro amount; auto-applied across VM families and regions. Lower max discount (~65%) but more flexibility.
- Azure Hybrid Benefit: use existing Windows Server + SQL Server licenses on Azure VMs to avoid per-VM license cost. Independent of RI / SP.
The procurement decision
We analysed 24 months of historical usage. The steady-state was 22 VMs running 24/7 for ≥18 months. For these, 3-year Reserved Instances were a clear win — 60-72% discount, low risk because the workload had not moved in 18 months.
For the bursty workload (additional 10-15 VMs that ran 8-16 hours/day), 1-year Savings Plans were a better fit — lower commitment risk, still 40-55% discount.
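One way to sanity-check a commitment is its break-even utilisation: capacity billed for every hour at a given discount only beats pay-as-you-go if it is actually used for more than (1 - discount) of the hours. The sketch below works through that arithmetic. It is a simplified model under our own assumptions: the hourly rate is hypothetical, and it ignores instance size flexibility, partial coverage, and exchange or cancellation terms.

```python
def break_even_utilisation(discount):
    """Fraction of hours a commitment must be used to match pay-as-you-go.

    A commitment costs (1 - discount) of the on-demand rate for every hour,
    used or not; pay-as-you-go costs the full rate only for the hours used.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def monthly_saving(on_demand_hourly, discount, hours_used, hours_in_month=730):
    """Pay-as-you-go cost minus commitment cost for one VM-sized slice."""
    pay_as_you_go = on_demand_hourly * hours_used
    committed = on_demand_hourly * (1 - discount) * hours_in_month
    return pay_as_you_go - committed


HOURLY_RATE = 0.40  # hypothetical on-demand rate in EUR/hour

# A VM running 24/7 with a 60% discount: the commitment clearly wins.
always_on = monthly_saving(HOURLY_RATE, 0.60, hours_used=730)
# The same commitment against a VM used about 8 hours/day: it loses money.
eight_hours = monthly_saving(HOURLY_RATE, 0.60, hours_used=243)

print(f"break-even utilisation at 60% discount: {break_even_utilisation(0.60):.0%}")
print(f"always-on saving: {always_on:.2f}, 8h/day saving: {eight_hours:.2f}")
```

Under this model, a commitment sized to the always-on floor of a bursty fleet is safer than one sized to its peak, because a Savings Plan applies to whichever eligible VMs happen to be running in a given hour.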
The result
The 3-year RI commitment on the 22 core VMs saved €3,400/mo. A 1-year SP with an €18,000 total annual commitment on the bursty workload saved €1,800/mo. Total: €5,200/mo. We paid the commitments upfront and recovered the cash outlay in month 3.
Lever 02: VM right-sizing (€4,300/mo)
Azure Advisor's right-sizing recommendations are an underrated source of savings. They look at CPU + memory + network + disk utilisation over the last 14 days and recommend smaller SKUs where headroom exceeds 50%.
We pulled Advisor's recommendations, validated against application owners, and executed the safe ones. The conservative cut:
- 11 D8s_v5 → D4s_v5 (50% smaller; CPU utilisation was ~25%)
- 6 D16s_v5 → D8s_v5
- 3 E16s_v5 → E8s_v5 (memory-optimised; same logic)
- 4 D2s_v5 → B2s (burstable, dev-only workloads)
Total monthly saving: €4,300. Risk: low — Azure Advisor's 14-day analysis caught the actual usage; the underutilisation was real, not seasonal.
The post-resize monitoring
We instrumented every right-sized VM with explicit alerts on CPU sustained above 80% or memory above 85%. The alerts fired twice in the first month (both legitimate workload growth) and we resized those two VMs back up. The other 22 stayed at the new size permanently.
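The alert logic amounts to a sustained-threshold check. A minimal sketch follows; the 80% and 85% thresholds come from the text, while the definition of "sustained" (six consecutive samples) and the choice to apply it to memory as well are assumptions made for illustration, not the alert rules we deployed.

```python
CPU_THRESHOLD = 80.0     # percent, from the text
MEMORY_THRESHOLD = 85.0  # percent, from the text
SUSTAINED_SAMPLES = 6    # assumption: e.g. six consecutive 5-minute samples


def sustained_above(samples, threshold, min_consecutive=SUSTAINED_SAMPLES):
    """True if samples contain a run of at least min_consecutive
    consecutive values strictly above threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


def needs_resize_review(cpu_samples, memory_samples):
    """Flag a right-sized VM for review if either metric is sustained high."""
    return (
        sustained_above(cpu_samples, CPU_THRESHOLD)
        or sustained_above(memory_samples, MEMORY_THRESHOLD)
    )


# Hypothetical utilisation samples for one VM.
cpu = [35, 42, 88, 91, 86, 93, 90, 84, 40]
memory = [60] * 9
print(needs_resize_review(cpu, memory))
```

Requiring a run of samples rather than a single spike keeps short bursts from triggering a resize back up.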
Lever 03: Azure SQL elastic pools (€3,800/mo)
The estate had 7 dev/staging Azure SQL databases each on individual Standard S3 tiers. The combined cost was €3,400/mo. None of them ever hit 30% DTU utilisation.
Moving all 7 into a single Elastic Pool with shared DTU capacity brought the cost down to €1,400/mo, saving €2,000/mo. Separately, tier right-sizing reduced production costs by another €1,800/mo (production was on a Premium tier when General Purpose was sufficient).
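The reason pooling works here is that seven S3 databases provision 700 DTUs between them while their combined peak demand is far lower. The sketch below sizes a pool for the worst case where all peaks coincide. The list of Standard pool sizes is an assumption to verify against current Azure documentation, and the pool size it returns is illustrative, not the size we deployed.

```python
S3_DTU = 100  # DTUs provisioned by a single Standard S3 database
# Assumed Standard-tier pool sizes in eDTUs; verify before relying on them.
STANDARD_POOL_SIZES = (50, 100, 200, 300, 400, 800, 1200, 1600, 2000, 2500, 3000)


def smallest_pool_for(peak_dtus, pool_sizes=STANDARD_POOL_SIZES):
    """Smallest pool size covering the case where every peak coincides."""
    required = sum(peak_dtus)
    for size in sorted(pool_sizes):
        if size >= required:
            return size
    raise ValueError(f"no pool size covers {required} eDTUs")


# Seven databases, each peaking below 30% of an S3 (taken as 30 DTUs here).
peaks = [30] * 7
provisioned = 7 * S3_DTU
pool = smallest_pool_for(peaks)
print(f"provisioned: {provisioned} DTU, worst-case demand: {sum(peaks)} DTU, pool: {pool} eDTU")
```

Peaks rarely coincide in dev and staging, so the worst-case sum is a conservative starting point that can be trimmed after observing the pool for a few weeks.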
Lever 04: Storage lifecycle policies (€2,400/mo)
The audit found 1.2 TB of blob storage on the Hot tier with last-access dates more than 90 days old. We configured lifecycle policies to transition data automatically:
- Hot → Cool after 30 days of inactivity (Cool is ~50% cheaper)
- Cool → Archive after 90 days of inactivity (Archive is ~80% cheaper than Hot)
- Delete non-current versions after 60 days
Initial transition: most of the 1.2 TB moved to Cool / Archive over 30 days. Monthly storage cost dropped from €4,100 to €1,700.
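The three rules above map onto a single Azure Storage lifecycle management rule. The sketch below builds the policy document in Python and prints it as JSON. The thresholds come from the text; the rule name is hypothetical, and counting both tiering thresholds from last access time is our reading of "inactivity". Last-access-based conditions also require last access time tracking to be enabled on the storage account.

```python
import json

lifecycle_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "tier-down-inactive-blobs",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"]},
                "actions": {
                    "baseBlob": {
                        # Hot -> Cool after 30 days without access.
                        "tierToCool": {"daysAfterLastAccessTimeGreaterThan": 30},
                        # -> Archive after 90 days without access (assumed
                        # to be counted from last access, not from tiering).
                        "tierToArchive": {"daysAfterLastAccessTimeGreaterThan": 90},
                    },
                    "version": {
                        # Delete non-current versions 60 days after creation.
                        "delete": {"daysAfterCreationGreaterThan": 60},
                    },
                },
            },
        }
    ]
}

policy_json = json.dumps(lifecycle_policy, indent=2)
print(policy_json)
```

Archive has rehydration latency and early-deletion charges, so it is worth scoping the rule with a prefix filter if some containers are read unpredictably.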
Lever 05: Premium SSD → Standard SSD (€1,900/mo)
Premium SSDs (P30, P40 tier) were the default in the original deployment because "we don't want disk to be the bottleneck". Audit showed:
- 17 disks attached to non-performance-critical VMs (dev, staging, batch processing)
- Average IOPS used: 200-400 (well under Standard SSD's 500 IOPS ceiling)
Migrating these 17 to Standard SSD (E-tier), with full performance monitoring for the first month, captured €1,900/mo. No application complained.
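The candidate selection can be expressed as a filter over the disk inventory. This is a sketch under stated assumptions: the 500 IOPS ceiling is the figure quoted above, the 80% headroom factor is ours, and the inventory field names are hypothetical.

```python
STANDARD_SSD_IOPS_CEILING = 500  # ceiling quoted in the text
HEADROOM = 0.8                   # assumption: peak must stay under 80% of the ceiling
NON_CRITICAL = {"dev", "staging", "batch"}


def downgrade_candidates(disks):
    """Premium disks on non-critical workloads whose peak IOPS fit Standard SSD."""
    limit = STANDARD_SSD_IOPS_CEILING * HEADROOM
    return [
        d["name"]
        for d in disks
        if d["sku"].startswith("Premium")
        and d["environment"] in NON_CRITICAL
        and d["peak_iops"] <= limit
    ]


# Hypothetical inventory rows; field names are illustrative.
disks = [
    {"name": "disk-dev-01", "sku": "Premium_LRS", "environment": "dev", "peak_iops": 350},
    {"name": "disk-prod-01", "sku": "Premium_LRS", "environment": "prod", "peak_iops": 300},
    {"name": "disk-batch-01", "sku": "Premium_LRS", "environment": "batch", "peak_iops": 900},
    {"name": "disk-stg-01", "sku": "StandardSSD_LRS", "environment": "staging", "peak_iops": 120},
]
candidates = downgrade_candidates(disks)
print(candidates)
```

Filtering on peak rather than average IOPS is the more cautious choice, since a disk averaging 300 IOPS can still saturate a 500 IOPS ceiling during batch windows.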
Lever 06: Sentinel ingestion tier optimisation (€1,700/mo)
The Sentinel workspace ingested 180 GB/day, all in the Analytics tier (€2.30/GB ingested). The breakdown:
- Microsoft 365 audit logs: 35 GB/day, Analytics tier justified (security-critical, used in alerts)
- AAD sign-in logs: 25 GB/day, Analytics tier justified (sign-in risk detection)
- Firewall logs (rule hits, deny actions): 60 GB/day, moved to Basic Logs tier (€0.15/GB ingested, roughly 15x cheaper)
- App Gateway access logs: 25 GB/day, moved to Basic Logs tier
- VM performance counters (verbose): 35 GB/day, moved to Basic Logs tier
Plus a 90-day → 30-day retention policy on the Basic Logs tier sources, with longer retention via Archive. Monthly Sentinel cost dropped from €4,900 to €3,200. The security signal stayed; the cost did not.
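The routing decision is easiest to see as volume per tier. The sketch below tallies the daily volumes from the breakdown above. It deliberately stops at volumes rather than cost, since the effective per-GB rate depends on the pricing tier and any commitment discounts in place; the `volume_by_tier` helper is ours.

```python
# Daily ingestion per source (GB/day) and the tier each source ended up in.
SOURCES = {
    "Microsoft 365 audit logs": (35, "Analytics"),
    "AAD sign-in logs": (25, "Analytics"),
    "Firewall logs": (60, "Basic"),
    "App Gateway access logs": (25, "Basic"),
    "VM performance counters": (35, "Basic"),
}


def volume_by_tier(sources):
    """Sum GB/day per destination tier."""
    totals = {}
    for gb_per_day, tier in sources.values():
        totals[tier] = totals.get(tier, 0) + gb_per_day
    return totals


totals = volume_by_tier(SOURCES)
total_gb = sum(totals.values())
basic_share = totals["Basic"] / total_gb
print(totals, total_gb, f"{basic_share:.0%}")
```

Two thirds of the daily volume moved out of the Analytics tier, while the sources that feed detections stayed where they were.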
Lever 07: Egress reduction (€1,800/mo)
Cross-region egress traffic (West Europe ↔ North Europe) was €1,800/mo. Investigation showed two patterns:
- Backup traffic crossing regions because the backup vault was provisioned in a different region than the workload
- Cross-region replication for redundancy that the workload did not actually require (it was a dev environment)
Re-provisioning the backup vault in the workload's region + removing unnecessary geo-replication for dev: egress dropped to near-zero. The fix took 3 weeks (cutover required workload coordination) but the saving was permanent.
Lever 08: Dev / staging environment auto-shutdown (€1,400/mo)
The dev and staging environments were running 24/7. Engineers were active 9-18 on weekdays (≈45 hours/week of the 168 hours/week the VMs ran).
We implemented Azure DevTest Labs auto-shutdown plus Logic App-driven auto-start:
- Auto-shutdown at 19:00 daily (Monday-Friday)
- Auto-shutdown all weekend
- Auto-start at 08:30 Monday-Friday
- Engineers can self-serve "keep this VM running" via a Teams bot for occasional out-of-hours work
VM uptime dropped from 168 hours/week to 52.5 hours/week (a 69% reduction). Compute cost on these VMs reduced proportionally. Net saving: €1,400/mo.
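The uptime figure follows directly from the schedule. A small sketch of the schedule check and the arithmetic, ignoring the self-service override and public holidays (the function names are ours, and this is not the Logic App itself):

```python
from datetime import datetime, time

START = time(8, 30)     # auto-start, Monday-Friday
SHUTDOWN = time(19, 0)  # auto-shutdown, Monday-Friday
HOURS_PER_WEEK = 168


def should_be_running(moment):
    """True when a dev/staging VM is inside its scheduled window."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Saturday is 5
    return is_weekday and START <= moment.time() < SHUTDOWN


def weekly_uptime_hours():
    """Scheduled hours per week: daily window length times five weekdays."""
    start_h = START.hour + START.minute / 60
    stop_h = SHUTDOWN.hour + SHUTDOWN.minute / 60
    return (stop_h - start_h) * 5


hours = weekly_uptime_hours()
reduction = 1 - hours / HOURS_PER_WEEK
print(f"{hours} hours/week, {reduction:.0%} reduction")
```

The saving scales with compute hours only; disks attached to deallocated VMs keep billing, which is one reason the euro saving is smaller than the uptime reduction might suggest.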
The tagging programme that made the savings stick
None of the above is durable without tagging. Without tags, cost-by-team is impossible; without cost-by-team, no team owns the cost; without ownership, the savings drift back over 6-12 months.
We implemented Azure Policy-driven tag enforcement:
- Required tags: Owner (email), CostCenter (code), Environment (prod/staging/dev), Workload (canonical name from CMDB)
- Azure Policy deny effect: resources cannot be created without all four required tags
- Existing untagged resources: backfill workflow over 4 weeks; named owners attest to the tag values
- Drift reporting: any resource that loses a tag (e.g., through a manual edit) is flagged within 24 hours and rolled up into a monthly drift report
Result: tag coverage from 40% to 98% within 8 weeks. Per-team showback dashboards became possible. Teams started managing their own cost lines because the data was finally attributable.
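The deny rule has roughly the following shape in Azure Policy. The sketch builds the definition in Python and prints it as JSON; the display name is hypothetical, and this version only checks that each tag exists, so validating values (for example, restricting Environment to prod/staging/dev) would need additional conditions.

```python
import json

REQUIRED_TAGS = ["Owner", "CostCenter", "Environment", "Workload"]


def build_require_tags_rule(required_tags):
    """Build a policy rule that denies any resource missing a required tag."""
    return {
        "if": {
            "anyOf": [
                {"field": f"tags['{name}']", "exists": "false"}
                for name in required_tags
            ]
        },
        "then": {"effect": "deny"},
    }


policy_definition = {
    "properties": {
        "displayName": "Require cost-allocation tags",  # hypothetical name
        "mode": "Indexed",  # evaluate only resource types that support tags
        "policyRule": build_require_tags_rule(REQUIRED_TAGS),
    }
}
policy_json = json.dumps(policy_definition, indent=2)
print(policy_json)
```

Assigning the same rule with an audit effect first is a common way to measure the blast radius before switching it to deny.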
The ongoing operate cadence
The audit + remediation captured the savings. The cadence kept them.
Weekly
- Cost-anomaly alerts (Azure Cost Management): any service crossing 20% week-over-week increase pages the platform team
- RI / SP utilisation report: are we still using the commitments we paid for?
- New-resource review: any new resources deployed last week without tags?
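The week-over-week anomaly rule is simple enough to sketch outside Azure Cost Management. The 20% threshold is from the list above; the handling of services with no prior spend and the sample figures are assumptions for illustration.

```python
THRESHOLD = 0.20  # 20% week-over-week increase, from the text


def weekly_anomalies(previous_week, current_week, threshold=THRESHOLD):
    """Return {service: fractional_increase} for services over the threshold.

    Services with no spend in the previous week are reported with None,
    since there is no baseline to compute an increase against.
    """
    flagged = {}
    for service, current in current_week.items():
        previous = previous_week.get(service, 0.0)
        if previous <= 0:
            if current > 0:
                flagged[service] = None
            continue
        increase = (current - previous) / previous
        if increase > threshold:
            flagged[service] = increase
    return flagged


# Hypothetical weekly spend in EUR.
last_week = {"Virtual Machines": 4200.0, "Storage": 1600.0}
this_week = {"Virtual Machines": 4300.0, "Storage": 2100.0, "Container Apps": 90.0}
flagged = weekly_anomalies(last_week, this_week)
print(flagged)
```

A percentage-only rule pages on tiny services too, so in practice it helps to pair it with a minimum absolute increase.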
Monthly
- FinOps council meeting: platform lead + finance + 2-3 team leads. Review savings tracker, identify new opportunities, address anomalies.
- Per-team showback report distributed to team leads
- Right-sizing recommendations refresh from Azure Advisor
Quarterly
- RI / SP procurement review: do current commitments still match usage?
- Lifecycle policy audit: are the configured policies still appropriate?
- Forecast vs actual: 90-day rolling comparison surfaces drift
Annually
- Full architecture review: any new technologies (Azure Container Apps, Serverless SQL, etc.) that would change the cost shape?
- Enterprise Agreement renewal preparation
The 6-month outcome
| Metric | Before | After 6 months | Change |
|---|---|---|---|
| Monthly Azure spend | €62,000 | €36,000 | −42% |
| Annualised cost | €744,000 | €432,000 | −€312,000 |
| Cost-per-employee | €310/mo | €180/mo | −42% |
| Tag coverage | 40% | 98% | +58 points |
| RI / SP coverage | 0% | 72% | +72 points |
| Production incidents from cost changes | — | 0 | — |
| FinOps council attendance | — | Monthly, all stakeholders | — |
The €312,000 annual saving funded:
- One full-time platform engineer (€78,000)
- SaaS tooling investments deferred from prior year (€45,000)
- Cybersecurity insurance premium reduction the lower-risk posture earned (€12,000)
- Cash returned to operations: €177,000
What did not work (the honest section)
Three approaches we attempted and abandoned:
- Spot instances for batch workloads. The batch jobs were not interruption-tolerant; the engineering effort to make them interruption-tolerant was greater than the saving on offer.
- Aggressive auto-scale of production App Service plans. Scale-out latency caused noticeable response-time degradation under sudden traffic spikes. We backed off to a more conservative scale-out floor.
- Container migration of legacy VMs. Two VMs we evaluated for migration to Azure Container Apps would have saved ~€600/mo but required ~€8,000 of engineering work and meaningful regression-test scope. Returned to the future-improvements backlog.
The replicable playbook
The pattern transfers. For any Azure estate >€20,000/mo with no structured FinOps practice, expect:
- 20-35% identifiable savings on first pass
- 15-25% realised savings after 6 months
- 3-6 month payback on the audit + remediation effort
- Ongoing 5-10% year-on-year savings through the operate cadence (plus inflation offset)
The pattern fails when: the executive team does not back the FinOps council with authority to make trade-offs; tagging is treated as IT hygiene rather than a financial control; commitments (RI / SP) are made on workloads that turn out to be migrating; or the operate cadence stops happening 3 months in because everyone "got busy".
The one paragraph version
A 200-seat EU FinTech with €62,000/mo Azure spend, no FinOps practice, no tagging discipline, no commitments. Six-month engagement: 2-week audit producing a 22-item backlog ranked by realisable Euros. The top 10 items held 89% of the identified opportunity; the eight levers detailed above (RI/SP procurement, VM right-sizing, Azure SQL elastic pools, storage lifecycle, premium → standard SSD, Sentinel ingestion tier, egress reduction, dev/staging auto-shutdown) account for €22,500 of the €27,900/month identified. Tagging programme (Azure Policy-driven enforcement) made the savings durable. Monthly FinOps council kept the cadence alive. Result: 42% reduction, €312,000/year saved, zero production impact, fully replicable on any Azure estate >€20,000/mo. The pattern fails only when the operating discipline is not in place, and that is what separates a one-time audit from a durable FinOps practice.
If you want the same audit + remediation + operate cadence run against your estate, that is the engagement shape. FinOps & Cost Management covers the multi-cloud + SaaS practice; Azure Cloud Infrastructure covers the Azure-specific landing-zone hardening that makes FinOps controls enforceable. Start with the free Bloodbath Scan — read-only diagnostic, quantified savings plan with realisable Euros attached, delivered as a written report inside 48 hours.