Security & Infrastructure · Practitioner · 05 May 2026 · 10 min read

Cloud Cost Optimization: How We Cut a Client's Azure Bill by 42%

How we cut a 200-seat FinTech's Azure run-rate by 42% in six months — the eight specific levers (RIs, right-sizing, SQL elastic pools, storage lifecycle, SSD tier, Sentinel ingestion, egress, auto-shutdown), the tagging programme that made the savings durable, and the operate cadence that kept them.

#Azure #Cost Optimization #Cloud

ITSailor

Senior IT Consultant

A 200-seat EU FinTech engaged us for a FinOps audit. Six months later, their Azure run-rate was down 42% with no production impact. This is the full playbook — the audit, the prioritised backlog, the eight specific levers that moved the number, and the operational discipline that kept the savings from regressing.

Numbers are real. Vendor and persona details are anonymised but the architecture, tooling, and decisions are exactly what we shipped. Total annualised saving: €312,000 against a starting Azure spend of €62,000/month.

The starting state

The client at the start of the engagement:

200 employees, EU-only operation, regulated FinTech
Azure-primary cloud estate (with smaller M365 + small AWS workload)
Monthly Azure spend: €62,000 (annualised €744,000)
Two-year cost growth rate: 87%, faster than revenue growth
Internal team: 1 platform engineer + 1 part-time SRE; no dedicated FinOps capacity
Tooling: Azure Cost Management default reports; no third-party FinOps platform
Tagging: ~40% coverage, inconsistent taxonomy
Reservations / Savings Plans: zero (all consumption pay-as-you-go)
Last serious cost review: 16 months prior

Week 1-2: The audit

The first two weeks were diagnostic, not remedial. The goal: produce a prioritised savings backlog with realisable Euros attached, ranked by impact-per-engineering-hour.

Spend baseline

We ingested 24 months of Azure billing data into a BigQuery warehouse (the client did not have an existing data warehouse for cost data). We loaded the FOCUS-aligned cost and usage report, plus the Azure Resource Graph snapshots for resource metadata.

The first cut analysis surfaced:

Service category	Monthly €	% of total	Note
Virtual Machines	€18,400	30%	Mostly D-series + E-series; some legacy A-series
Azure SQL Database	€9,800	16%	4 production + 7 dev/staging tiers, all on-demand
Storage (Blob + Files + Disks)	€7,100	11%	Heavy on premium SSDs; no lifecycle policies
App Service / Functions	€5,200	8%	Mixed Premium + Standard tiers
Networking (egress, vNET peering, VPN, etc.)	€6,800	11%	Significant cross-region egress
Log Analytics / Sentinel	€4,900	8%	All Analytics tier; no Basic / Archive routing
Backup + Site Recovery	€3,200	5%	Some misconfigured retention policies
Monitoring + Application Insights	€2,600	4%	Long retention on low-value metrics
Other (DNS, KeyVault, etc.)	€4,000	7%	Long tail of small services

Tagging gap analysis

Of 1,847 resources, only 740 had complete required tags (Owner, CostCenter, Environment, Workload). Per-team showback was impossible because 60% of resources could not be attributed.
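The coverage figure comes down to a simple check per resource. The sketch below is illustrative only: the resource records are hypothetical and do not reflect the engagement's actual export format. The four tag names come from the audit.

```python
# Required tags as named in the audit; the sample records are hypothetical.
REQUIRED_TAGS = ("Owner", "CostCenter", "Environment", "Workload")


def missing_tags(tags):
    """Return the required tags that are absent or blank on one resource."""
    tags = tags or {}
    return [t for t in REQUIRED_TAGS if not str(tags.get(t) or "").strip()]


def tag_coverage(resources):
    """Fraction of resources that carry every required tag."""
    if not resources:
        return 0.0
    complete = sum(1 for r in resources if not missing_tags(r.get("tags")))
    return complete / len(resources)


sample = [
    {"name": "vm-app-01", "tags": {"Owner": "a@example.com", "CostCenter": "CC-100",
                                   "Environment": "prod", "Workload": "payments"}},
    {"name": "vm-app-02", "tags": {"Owner": "a@example.com", "Environment": "dev"}},
    {"name": "st-logs-01", "tags": None},
]

print(f"sample coverage: {tag_coverage(sample):.0%}")
print(f"audit coverage:  {740 / 1847:.0%}")  # 740 fully tagged out of 1,847
```

Listing the missing tags per resource, rather than a pass/fail flag, is what makes a backfill workflow assignable to named owners.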

The prioritised backlog

22 distinct opportunities surfaced, ranked by realisable euros weighed against effort to realise. The top 10 captured 89% of the total opportunity:

Rank	Opportunity	Monthly saving	Effort
1	RI / Savings Plan procurement for steady-state VMs	€5,200	1 week
2	VM right-sizing (D-series → smaller)	€4,300	2 weeks
3	Azure SQL elastic pools + dev-tier downsizing	€3,800	3 weeks
4	Storage lifecycle (Hot → Cool → Archive)	€2,400	2 weeks
5	Premium SSD → Standard SSD for non-perf workloads	€1,900	1 week
6	Sentinel Basic Logs tier for low-value sources	€1,700	2 weeks
7	Egress reduction (cross-region traffic patterns)	€1,800	3 weeks
8	App Service plan consolidation	€1,500	2 weeks
9	Dev/staging environment auto-shutdown	€1,400	1 week
10	Backup retention rationalisation	€900	1 week

Total monthly opportunity in top 10: €24,900. Plus 12 smaller items adding €3,000/month. Total identifiable savings: €27,900/month = €334,800/year. Risk-adjusted: €280,000/year (assuming 84% realisation rate).
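The totals can be re-derived from the top-10 table. This is a small arithmetic check using only figures from this section; the variable names are ours.

```python
# (opportunity, monthly saving in EUR, effort in weeks), copied from the top-10 table
BACKLOG = [
    ("RI / Savings Plan procurement", 5200, 1),
    ("VM right-sizing", 4300, 2),
    ("Azure SQL elastic pools + dev-tier downsizing", 3800, 3),
    ("Storage lifecycle", 2400, 2),
    ("Premium SSD -> Standard SSD", 1900, 1),
    ("Sentinel Basic Logs tier", 1700, 2),
    ("Egress reduction", 1800, 3),
    ("App Service plan consolidation", 1500, 2),
    ("Dev/staging auto-shutdown", 1400, 1),
    ("Backup retention rationalisation", 900, 1),
]
LONG_TAIL_MONTHLY = 3000   # the 12 smaller items
REALISATION_RATE = 0.84    # risk adjustment used in the audit

top10_monthly = sum(saving for _, saving, _ in BACKLOG)
total_monthly = top10_monthly + LONG_TAIL_MONTHLY
total_annual = total_monthly * 12
risk_adjusted_annual = total_annual * REALISATION_RATE

print(f"top 10:        EUR {top10_monthly:,}/mo")
print(f"total:         EUR {total_monthly:,}/mo = EUR {total_annual:,}/yr")
print(f"top-10 share:  {top10_monthly / total_monthly:.0%}")
print(f"risk-adjusted: EUR {risk_adjusted_annual:,.0f}/yr")
```

The risk-adjusted result lands at roughly €281,000, which the audit rounds to €280,000.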

Lever 01: Reservation + Savings Plan procurement (€5,200/mo)

The single largest lever, and the lowest effort. Azure offers three commitment tools:

Reserved Instances (1 or 3 year): commit to a specific VM family + region. Up to 72% discount on hourly rate for 3-year commitment.
Savings Plans for Compute (1 or 3 year): commit to an hourly euro amount; auto-applied across VM families and regions. Lower max discount (~65%) but more flexibility.
Azure Hybrid Benefit: use existing Windows Server + SQL Server licenses on Azure VMs to avoid per-VM license cost. Independent of RI / SP.

The procurement decision

We analysed 24 months of historical usage. The steady-state was 22 VMs running 24/7 for ≥18 months. For these, 3-year Reserved Instances were a clear win — 60-72% discount, low risk because the workload had not moved in 18 months.

For the bursty workload (additional 10-15 VMs that ran 8-16 hours/day), 1-year Savings Plans were a better fit — lower commitment risk, still 40-55% discount.
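One rough way to reason about the steady-state versus bursty split is a break-even utilisation. The sketch below assumes a simplified model that is ours, not Azure's pricing engine: a per-VM commitment is billed for every hour at (1 − discount) × the pay-as-you-go rate, whether or not the VM runs. Under that assumption the commitment wins once the VM runs more than (1 − discount) of the time.

```python
HOURS_PER_DAY = 24


def break_even_utilisation(discount):
    """Fraction of hours a VM must run before an always-billed commitment
    at `discount` off the pay-as-you-go rate becomes the cheaper option."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def commitment_is_cheaper(hours_per_day, discount):
    """Compare actual utilisation against the break-even point."""
    return hours_per_day / HOURS_PER_DAY > break_even_utilisation(discount)


# Steady-state VM running 24/7 at the low end of the 3-year RI discount range
print(commitment_is_cheaper(24, 0.60))   # True
# Bursty VM at 8 h/day with a 40% discount: a dedicated commitment loses
print(commitment_is_cheaper(8, 0.40))    # False
# Bursty VM at 16 h/day with a 55% discount: a commitment wins
print(commitment_is_cheaper(16, 0.55))   # True
```

This per-VM view understates Savings Plans somewhat: the hourly commitment applies to whichever eligible VMs are running in that hour, so it can be sized to the floor of a bursty fleet rather than to any single machine.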

The result

3-year RI commitment on 22 core VMs: €3,400/mo saving. 1-year SP for €18,000 total annual commitment on bursty workload: €1,800/mo saving. Total: €5,200/mo. We paid the commitments upfront (Azure also offers monthly billing for RIs and Savings Plans) and recovered the cash outlay in month 3.

Lever 02: VM right-sizing (€4,300/mo)

Azure Advisor's right-sizing recommendations are an underrated source of savings. They look at CPU + memory + network + disk utilisation over a look-back window (14 days in our case; the window is configurable) and recommend smaller SKUs where headroom exceeds 50%.

We pulled Advisor's recommendations, validated against application owners, and executed the safe ones. The conservative cut:

11 D8s_v5 → D4s_v5 (50% smaller; CPU utilisation was ~25%)
6 D16s_v5 → D8s_v5
3 E16s_v5 → E8s_v5 (memory-optimised; same logic)
4 D2s_v5 → B2s (burstable, dev-only workloads)

Total monthly saving: €4,300. Risk: low — Azure Advisor's 14-day analysis caught the actual usage; the underutilisation was real, not seasonal.

The post-resize monitoring

We instrumented every right-sized VM with explicit alerts on CPU sustained above 80% or memory above 85%. The alerts fired twice in the first month (both legitimate workload growth) and we resized those two VMs back up. The other 22 stayed at the new size permanently.
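The "sustained" part of that alert matters: a single spike should not trigger a resize. A minimal sketch of the check follows. The 80% and 85% thresholds are from the text. The three-consecutive-samples rule and the function names are assumptions for illustration, not the alert rules we deployed.

```python
CPU_THRESHOLD = 80.0   # percent, from the text
MEM_THRESHOLD = 85.0   # percent, from the text


def sustained_breach(samples, threshold, min_consecutive):
    """True if `samples` holds at least `min_consecutive` readings in a row
    strictly above `threshold`."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


def needs_resize_review(cpu, mem, min_consecutive=3):
    """Flag a right-sized VM when either metric stays above its threshold.
    `min_consecutive` is an assumed definition of 'sustained'."""
    return (sustained_breach(cpu, CPU_THRESHOLD, min_consecutive)
            or sustained_breach(mem, MEM_THRESHOLD, min_consecutive))


# Hypothetical samples: one VM with a sustained CPU climb, one with isolated spikes
print(needs_resize_review([50, 82, 85, 90, 60], [40, 42, 41, 40, 43]))  # True
print(needs_resize_review([50, 82, 60, 85, 70], [40, 42, 41, 40, 43]))  # False
```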

Lever 03: Azure SQL elastic pools (€3,800/mo)

The estate had 7 dev/staging Azure SQL databases each on individual Standard S3 tiers. The combined cost was €3,400/mo. None of them ever hit 30% DTU utilisation.

Moving all 7 into a single Elastic Pool with shared DTU capacity cut that to €1,400/mo, a saving of €2,000/mo. Right-sizing the production tier saved another €1,800/mo (production was on a Premium tier when General Purpose was sufficient).

Lever 04: Storage lifecycle policies (€2,400/mo)

1.2 TB of blob storage on Hot tier with last-access dates >90 days. Configuring lifecycle policies to transition automatically:

Hot → Cool after 30 days of inactivity (Cool is ~50% cheaper)
Cool → Archive after 90 days of inactivity (Archive is ~80% cheaper than Hot)
Delete non-current versions after 60 days

Initial transition: most of the 1.2 TB moved to Cool / Archive over 30 days. Monthly storage cost dropped from €4,100 to €1,700.
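The three rules above map onto a single Azure Storage lifecycle management rule. The sketch below builds that policy document as a Python dict and prints it as JSON. The rule name is hypothetical, and we read "90 days of inactivity" as 90 days since last access; adjust if your intent is 90 days after the move to Cool. Last-access-based conditions only work when last access time tracking is enabled on the storage account.

```python
import json


def build_lifecycle_policy(cool_after_days=30, archive_after_days=90,
                           delete_versions_after_days=60):
    """Build a lifecycle management policy document for block blobs."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-down-inactive-blobs",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"]},
                    "actions": {
                        "baseBlob": {
                            # Hot -> Cool, then Cool -> Archive, keyed on last access
                            "tierToCool": {
                                "daysAfterLastAccessTimeGreaterThan": cool_after_days
                            },
                            "tierToArchive": {
                                "daysAfterLastAccessTimeGreaterThan": archive_after_days
                            },
                        },
                        # Previous (non-current) versions are removed by age
                        "version": {
                            "delete": {
                                "daysAfterCreationGreaterThan": delete_versions_after_days
                            }
                        },
                    },
                },
            }
        ]
    }


policy = build_lifecycle_policy()
print(json.dumps(policy, indent=2))
```

The printed JSON is the shape accepted when defining a management policy on the storage account. Test it against a non-production account first, since Archive rehydration is slow and carries its own charges.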

Lever 05: Premium SSD → Standard SSD (€1,900/mo)

Premium SSDs (P30, P40 tier) were the default in the original deployment because "we don't want disk to be the bottleneck". Audit showed:

17 disks attached to non-performance-critical VMs (dev, staging, batch processing)
Average IOPS used: 200-400 (well under Standard SSD's 500 IOPS ceiling)

Migrating these 17 to Standard SSD (E-tier), with full performance monitoring for the first month, captured €1,900/mo. No application complained.

Lever 06: Sentinel ingestion tier optimisation (€1,700/mo)

The Sentinel workspace ingested 180 GB/day, all in the Analytics tier (€2.30/GB ingested). The breakdown:

Microsoft 365 audit logs: 35 GB/day, Analytics tier justified (security-critical, used in alerts)
AAD sign-in logs: 25 GB/day, Analytics tier justified (sign-in risk detection)
Firewall logs (rule hits, deny actions): 60 GB/day, moved to Basic Logs tier (€0.15/GB ingested, roughly 15x cheaper per GB)
App Gateway access logs: 25 GB/day, moved to Basic Logs tier
VM performance counters (verbose): 35 GB/day, moved to Basic Logs tier

Plus a 90-day → 30-day retention policy on the Basic Logs tier sources, with longer retention via Archive. Monthly Sentinel cost dropped from €4,900 to €3,200. The security signal stayed; the cost did not.
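The routing decision is easier to review as data. The sketch below tallies daily volume per tier from the breakdown above and compares the two per-GB prices quoted in this section. It deliberately stops short of a monthly euro figure, because the billed amount depends on pricing terms not given here.

```python
# (source, GB/day, tier after the change), from the breakdown above
SOURCES = [
    ("Microsoft 365 audit logs", 35, "analytics"),
    ("AAD sign-in logs", 25, "analytics"),
    ("Firewall logs", 60, "basic"),
    ("App Gateway access logs", 25, "basic"),
    ("VM performance counters", 35, "basic"),
]
PRICE_PER_GB = {"analytics": 2.30, "basic": 0.15}  # EUR, as quoted in the text


def gb_per_day_by_tier(sources):
    """Sum daily ingestion volume per target tier."""
    totals = {}
    for _name, gb, tier in sources:
        totals[tier] = totals.get(tier, 0) + gb
    return totals


volumes = gb_per_day_by_tier(SOURCES)
moved_share = volumes["basic"] / sum(volumes.values())
price_ratio = PRICE_PER_GB["analytics"] / PRICE_PER_GB["basic"]

print(volumes)
print(f"share of volume moved to Basic: {moved_share:.0%}")
print(f"Analytics vs Basic price per GB: {price_ratio:.1f}x")
```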

Lever 07: Egress reduction (€1,800/mo)

Cross-region egress traffic (West Europe ↔ North Europe) was €1,800/mo. Investigation showed two patterns:

Backup traffic crossing regions because the backup vault was provisioned in a different region than the workload
Cross-region replication for redundancy that the workload did not actually require (it was a dev environment)

Re-provisioning the backup vault in the workload's region + removing unnecessary geo-replication for dev: egress dropped to near-zero. The fix took 3 weeks (cutover required workload coordination) but the saving was permanent.

Lever 08: Dev / staging environment auto-shutdown (€1,400/mo)

The dev and staging environments were running 24/7. Engineers were active 9-18 on weekdays (≈45 hours/week of the 168 hours/week the VMs ran).

Implementing Azure DevTest Labs auto-shutdown + Logic App-driven auto-start:

Auto-shutdown at 19:00 daily (Monday-Friday)
Auto-shutdown all weekend
Auto-start at 08:30 Monday-Friday
Engineers can self-serve "keep this VM running" via a Teams bot for occasional out-of-hours work

VM uptime dropped from 168 hours/week to 52.5 hours/week (a 69% reduction). Compute cost on these VMs reduced proportionally. Net saving: €1,400/mo.
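The uptime figure follows directly from the schedule. The sketch below encodes the 08:30 to 19:00 weekday window and re-derives the weekly hours. The function names are ours, and this is not the Logic App itself, just the rule it implements.

```python
from datetime import datetime, time

START = time(8, 30)    # auto-start, Monday-Friday
STOP = time(19, 0)     # auto-shutdown, Monday-Friday
HOURS_PER_WEEK = 7 * 24


def should_be_running(now):
    """True inside the Monday-Friday 08:30-19:00 window (local time)."""
    return now.weekday() < 5 and START <= now.time() < STOP


def weekly_uptime_hours():
    """Scheduled hours per week: daily window length times five weekdays."""
    daily = (STOP.hour + STOP.minute / 60) - (START.hour + START.minute / 60)
    return daily * 5


uptime = weekly_uptime_hours()
reduction = 1 - uptime / HOURS_PER_WEEK

print(f"{uptime} h/week, {reduction:.0%} reduction")
print(should_be_running(datetime(2026, 5, 4, 9, 0)))   # a Monday morning
print(should_be_running(datetime(2026, 5, 2, 12, 0)))  # a Saturday
```

Out-of-hours overrides, such as the self-serve "keep running" request, would sit in front of `should_be_running` as an exception list; that part is not modelled here.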

The tagging programme that made the savings stick

None of the above is durable without tagging. Without tags, cost-by-team is impossible; without cost-by-team, no team owns the cost; without ownership, the savings drift back over 6-12 months.

We implemented Azure Policy-driven tag enforcement:

Required tags: Owner (email), CostCenter (code), Environment (prod/staging/dev), Workload (canonical name from CMDB)
Azure Policy deny effect: resources cannot be created without all four required tags
Existing untagged resources: backfill workflow over 4 weeks; named owners attest to the tag values
Drift report: any resource that has lost a tag (e.g., through a manual edit) is flagged within 24 hours and summarised in a monthly report

Result: tag coverage from 40% to 98% within 8 weeks. Per-team showback dashboards became possible. Teams started managing their own cost lines because the data was finally attributable.
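A deny rule for the four required tags can be expressed as one custom Azure Policy definition. The sketch below builds the definition as a Python dict and prints it as JSON. The display name is hypothetical, and this is one plausible shape rather than the exact definition we deployed; it checks tag presence only, not tag values.

```python
import json

REQUIRED_TAGS = ["Owner", "CostCenter", "Environment", "Workload"]


def build_require_tags_policy(tags):
    """Build a custom policy definition that denies resources missing any tag."""
    return {
        "properties": {
            "displayName": "Require cost-attribution tags",  # hypothetical name
            "policyType": "Custom",
            # "Indexed" limits evaluation to resource types that support tags
            "mode": "Indexed",
            "policyRule": {
                "if": {
                    "anyOf": [
                        {"field": f"tags['{tag}']", "exists": "false"}
                        for tag in tags
                    ]
                },
                "then": {"effect": "deny"},
            },
        }
    }


definition = build_require_tags_policy(REQUIRED_TAGS)
print(json.dumps(definition, indent=2))
```

Assigning the same rule with an audit effect first, then switching to deny once the backfill is complete, avoids blocking deployments while existing resources are still being tagged.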

The ongoing operate cadence

The audit + remediation captured the savings. The cadence kept them.

Weekly

Cost-anomaly alerts (Azure Cost Management): any service whose spend rises more than 20% week-over-week pages the platform team
RI / SP utilisation report: are we still using the commitments we paid for?
New-resource review: any new resources deployed last week without tags?
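The week-over-week rule is simple enough to express directly. The sketch below is a plain-Python illustration of the 20% threshold, not the Azure Cost Management alert configuration. The weekly figures are hypothetical, and treating brand-new spend as an anomaly is our assumption.

```python
def week_over_week_anomalies(previous, current, threshold=0.20):
    """Return {service: relative increase} for services whose weekly spend
    grew by more than `threshold`. Services with no prior spend are reported
    with an infinite increase so that new cost lines are not missed."""
    anomalies = {}
    for service, now in current.items():
        before = previous.get(service, 0.0)
        if before <= 0:
            if now > 0:
                anomalies[service] = float("inf")
            continue
        growth = (now - before) / before
        if growth > threshold:
            anomalies[service] = growth
    return anomalies


# Hypothetical weekly spend in EUR
last_week = {"Virtual Machines": 4200, "Storage": 1600, "Log Analytics": 1100}
this_week = {"Virtual Machines": 4300, "Storage": 2100, "Log Analytics": 1150,
             "Container Apps": 90}

for service, growth in week_over_week_anomalies(last_week, this_week).items():
    print(f"{service}: +{growth:.0%}" if growth != float("inf") else f"{service}: new spend")
```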

Monthly

FinOps council meeting: platform lead + finance + 2-3 team leads. Review savings tracker, identify new opportunities, address anomalies.
Per-team showback report distributed to team leads
Right-sizing recommendations refresh from Azure Advisor

Quarterly

RI / SP procurement review: do current commitments still match usage?
Lifecycle policy audit: are the configured policies still appropriate?
Forecast vs actual: 90-day rolling comparison surfaces drift

Annually

Full architecture review: any new technologies (Azure Container Apps, Serverless SQL, etc.) that would change the cost shape?
Enterprise Agreement renewal preparation

The 6-month outcome

Metric	Before	After 6 months	Change
Monthly Azure spend	€62,000	€36,000	−42%
Annualised cost	€744,000	€432,000	−€312,000
Cost-per-employee	€310/mo	€180/mo	−42%
Tag coverage	40%	98%	+58 points
RI / SP coverage	0%	72%	+72 points
Production incidents from cost changes	—	0	—
FinOps council attendance	—	Monthly, all stakeholders	—

The €312,000 annual saving funded:

One full-time platform engineer (€78,000)
SaaS tooling investments deferred from prior year (€45,000)
Cybersecurity insurance premium reduction the lower-risk posture earned (€12,000)
Cash returned to operations: €177,000

What did not work (the honest section)

Three approaches we attempted and abandoned:

Spot instances for batch workloads. The batch jobs were not interruption-tolerant; the engineering effort to make them interruption-tolerant was greater than the saving on offer.
Aggressive auto-scale of production App Service plans. Scale-up latency caused noticeable response-time degradation under sudden traffic spikes. We backed off to a more conservative scale-out floor.
Container migration of legacy VMs. Two VMs we evaluated for migration to Azure Container Apps would have saved ~€600/mo but required ~€8,000 of engineering work and meaningful regression-test scope. Returned to the future-improvements backlog.
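The container-migration call is a payback calculation. A minimal sketch using the two figures from that item follows; the function name is ours.

```python
def payback_months(one_off_cost, monthly_saving):
    """Months of saving needed to recover a one-off engineering cost."""
    if monthly_saving <= 0:
        raise ValueError("monthly_saving must be positive")
    return one_off_cost / monthly_saving


# ~EUR 8,000 of engineering work against ~EUR 600/mo saved
months = payback_months(8000, 600)
print(f"payback: {months:.1f} months")
```

At roughly 13 months before counting regression-test effort, the item sits well outside the 3-6 month payback seen on the rest of the programme, which is why it went back to the backlog.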

The replicable playbook

The pattern transfers. For any Azure estate >€20,000/mo with no structured FinOps practice, expect:

20-35% identifiable savings on first pass
15-25% realised savings after 6 months
3-6 month payback on the audit + remediation effort
Ongoing 5-10% year-on-year savings through the operate cadence (plus inflation offset)

The pattern fails when: the executive team does not back the FinOps council with authority to make trade-offs; tagging is treated as IT hygiene rather than a financial control; commitments (RI / SP) are made on workloads that turn out to be migrating; or the operate cadence stops happening 3 months in because everyone "got busy".

The one paragraph version

A 200-seat EU FinTech with €62,000/mo Azure spend, no FinOps practice, no tagging discipline, no commitments. Six-month engagement: 2-week audit producing a 22-item backlog ranked by realisable Euros. The top 10 items captured 89% of the identified opportunity, led by eight levers (RI/SP procurement, VM right-sizing, Azure SQL elastic pools, storage lifecycle, premium → standard SSD, Sentinel ingestion tier, egress reduction, dev/staging auto-shutdown). Tagging programme (Azure Policy-driven enforcement) made the savings durable. Monthly FinOps council kept the cadence alive. Result: 42% reduction, €312,000/year saved, zero production impact, and a pattern that transfers to Azure estates >€20,000/mo. The pattern fails when the operating discipline is not in place, and that is what separates a one-time audit from a durable FinOps practice.

If you want the same audit + remediation + operate cadence run against your estate, that is the engagement shape. FinOps & Cost Management covers the multi-cloud + SaaS practice; Azure Cloud Infrastructure covers the Azure-specific landing-zone hardening that makes FinOps controls enforceable. Start with the free Bloodbath Scan — read-only diagnostic, quantified savings plan with realisable Euros attached, delivered as a written report inside 48 hours.