Your DR Site Has a Problem: It Trusts the Same People

I design cloud systems that survive failure. Focused on resilience, real-world recoverability, and the hidden cost of technical debt — turning disaster recovery into a competitive advantage, not a checkbox.
We spend fortunes on disaster recovery sites.
Secondary regions. Backup systems. Failover procedures. RTO measured in minutes.
And then someone with a compromised Global Admin account deletes everything.
In both regions.
Why?
Because your DR site trusts the same identity provider.
The Scenario Nobody Plans For
Picture this:
It's Friday afternoon. Your monitoring goes crazy. Production is gone. Someone ran:
Get-AzResourceGroup | Remove-AzResourceGroup -Force
Panic. But wait—you have DR!
You fail over to the secondary region. Services come up. Crisis averted.
Monday morning: You realize the same admin account that nuked production? It had Contributor rights to DR too.
The attacker just waited.

Now both regions are gone.
The Uncomfortable Truth
We plan for datacenter fires that never happen.
We don't plan for the developer who got phished last Tuesday.
We plan for hardware failures.
We don't plan for the contractor whose last day is Friday and who still has Contributor rights.
We plan for natural disasters.
We don't plan for "I thought I was in DEV"—which happens weekly.
Why?
Because admitting "we need to protect against our own admins" feels wrong.
But your DR plan shouldn't be based on trust. It should be based on blast radius limitation.
The Problem With Shared Identity
Your production and DR share everything that matters:
Same identity provider
Same service principals
Same admin accounts
Same role-based access control
You built geographic redundancy on top of a single point of failure.
Two regions, one set of keys.
Real-World Attack Vectors
Vector 1: Compromised DevOps
Your CI/CD pipeline has a service principal with Contributor rights.
It deploys to production. Automatically. On every commit.
Now imagine: Someone compromises your DevOps account.
They have your service principal credentials.
They can deploy to production. And to DR.
One infrastructure destroy command later, you're explaining to the board why both regions are gone.
Vector 2: The Patient Ransomware
Modern ransomware doesn't encrypt immediately.
It waits. Learns your environment. Finds your backups. Finds your DR.
Then:
Day 1: Encrypts production
You: "No problem, we have DR!" (fails over)
Day 2: Ransomware encrypts DR
You: "How did they..."
Attacker: "Same credentials work everywhere. Thanks for the geo-redundancy though."
Vector 3: The Friday Deployment
Tired admin. 11 PM. Deployment script.
az group delete --name "test-rg" --yes --no-wait
Except the script ran that command in a loop over every subscription the account could see.
And "every subscription" includes your DR subscription.
Because the same account has access to both.
Weekend ruined.
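One cheap guard against this failure mode is to make destructive scripts refuse any target that is not explicitly allowlisted. Here is a minimal sketch in Python. The subscription names and the helpers `guard_delete` and `plan_deletes` are hypothetical, and the code only decides whether a delete may proceed. It does not call Azure.

```python
# Hypothetical allowlist: the only subscriptions this script may ever delete from.
DELETABLE_SUBSCRIPTIONS = frozenset({"sub-dev", "sub-test"})


class RefusedDelete(Exception):
    """Raised when a delete targets a subscription outside the allowlist."""


def guard_delete(subscription: str, resource_group: str) -> str:
    """Return a description of the delete if allowed, otherwise raise."""
    if subscription not in DELETABLE_SUBSCRIPTIONS:
        raise RefusedDelete(
            f"refusing to delete {resource_group!r} in {subscription!r}: "
            "subscription is not on the allowlist"
        )
    return f"delete {resource_group} in {subscription}"


def plan_deletes(subscriptions: list[str], resource_group: str) -> list[str]:
    """Validate every target before acting, so one bad target aborts the whole run."""
    return [guard_delete(sub, resource_group) for sub in subscriptions]
```

The point of validating the whole plan first is that a loop which reaches DR fails before it deletes anything, instead of halfway through. It is a seatbelt, not a substitute for removing the access.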
The "Just Use PIM" Myth
"Use Privileged Identity Management! Just-in-Time access!"
Yes. Do that. But it's not enough.
PIM means:
No standing admin privileges
Time-limited access
Approval workflows
PIM doesn't mean:
Admin can't do damage during their justified access window
Compromised credentials during active session are safe
DevOps service principals are protected
PIM reduces the attack window. It doesn't close it.
What Actually Works: Defense in Depth
Here's what actually works. Not a checklist—a philosophy: make it hard to lose everything at once.
Start with the pipeline. Your DevOps has credentials to production. Fine. But those credentials stop at production. DR subscription? Different keys. Compromised pipeline can't touch it.
Now the backups. Backup vault with Multi-User Authorization. Three people to approve critical operations. One compromised admin can't delete your backups. Ransomware hits a wall.
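The idea behind that approval quorum can be sketched in a few lines of Python. This is a toy model of "three distinct people must approve", not how any particular backup product implements Multi-User Authorization. The class name, the operation name, and the approver names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectedOperation:
    """A critical operation that is authorized only after enough distinct people approve."""

    name: str
    required_approvals: int = 3  # the "three people" from the text
    approvers: set[str] = field(default_factory=set)

    def approve(self, person: str) -> None:
        # Using a set means the same person approving twice still counts once.
        self.approvers.add(person)

    def is_authorized(self) -> bool:
        return len(self.approvers) >= self.required_approvals


op = ProtectedOperation("delete-backup-vault")
op.approve("alice")
op.approve("alice")  # a compromised account retrying does not help
op.approve("bob")
print(op.is_authorized())  # False: only two distinct approvers
op.approve("carol")
print(op.is_authorized())  # True
```

What matters is the counting rule: approvals are tied to distinct identities, so one stolen credential can never satisfy the quorum on its own.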
DR itself stays minimal. Not a hot copy of production—just networking, one database receiving log shipping, enough compute to keep the data flowing. When disaster strikes, you pull infrastructure code from backup (remember, MUA protected) and deploy. Takes longer. But attackers need to compromise multiple systems to get here.
Different service principals everywhere. Prod service principals can't touch DR. DR service principals can't touch prod. Sounds obvious. Almost nobody does it.
Your DR database? Local authentication enabled. Credentials printed, in a safe. If your identity provider burns, the database doesn't care. It keeps accepting log shipping.
And finally: break-glass accounts. Three of them. Printed passwords. Hardware MFA tokens. Physical safe. Any login triggers board-level alert. You use these when everything else has failed. Not for "I forgot my password." Not for Friday deployments.
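The "any login triggers an alert" rule is simple enough to sketch. The Python below assumes sign-in events arrive as dictionaries with hypothetical `user` and `time` fields, and the account names are invented. A real setup would hang this rule on whatever sign-in log your identity provider exports.

```python
# Hypothetical account names; in practice these are your real break-glass identities.
BREAK_GLASS_ACCOUNTS = frozenset({"bg-admin-1", "bg-admin-2", "bg-admin-3"})


def break_glass_alerts(sign_ins: list[dict]) -> list[str]:
    """Return one alert message for every sign-in by a break-glass account."""
    return [
        f"ALERT: break-glass account {event['user']} signed in at {event['time']}"
        for event in sign_ins
        if event["user"] in BREAK_GLASS_ACCOUNTS
    ]


events = [
    {"user": "alice@example.com", "time": "2026-01-09T16:02:00Z"},
    {"user": "bg-admin-2", "time": "2026-01-09T23:14:00Z"},
]
for alert in break_glass_alerts(events):
    print(alert)
```

There is deliberately no threshold and no "business hours" exception. These accounts should never log in during normal operations, so every single use is worth waking someone up for.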
What This Architecture Protects Against
Compromised DevOps: Can't access DR directly
Compromised Prod Admin: DR has different RBAC
Ransomware: MUA prevents backup deletion, DR isolated
"Oops" moment: Can't accidentally nuke DR
Patient attacker: Multiple systems need compromise
What This DOESN'T Protect Against
Full identity tenant takeover at Global Admin level.
If an attacker gets Global Admin on your identity tenant, they can technically access everything.
But:
MUA still requires 3 people
Break-glass alerts trigger
DB local auth still works
Audit logs show everything
Separate identity tenant for DR would solve this...
But let's be honest: How many organizations will actually do that?
Different tenant = different billing
Different admin portal
Different support contracts
Cross-tenant authentication nightmare
Nobody wants this complexity
This architecture is the realistic middle ground:
80% of the security benefit
20% of the complexity
Actually implementable
Let's Talk Money
"But this is expensive!"
Is it though?
Your alternative hot DR setup:
Hot site running 24/7: €50k/month
Same security posture as prod: €0 (because same)
RTO: 15 minutes
Risk: One compromised account = total loss
This architecture:
Minimal DR infrastructure: €5k/month
MUA, separate service principals, layered defense: €10k setup
RTO: 12-48 hours (honest number, not boardroom fantasy)
Risk: Requires compromise of multiple systems
Savings: €45k/month = €540k/year
And you're more secure.
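The arithmetic, spelled out in Python using the figures above. The one assumption is that the €10k setup is a one-time cost, which is what gives the first-year net figure.

```python
HOT_DR_MONTHLY_EUR = 50_000      # hot site running 24/7
MINIMAL_DR_MONTHLY_EUR = 5_000   # minimal DR infrastructure
SETUP_ONE_TIME_EUR = 10_000      # MUA, separate service principals, layered defense

monthly_savings = HOT_DR_MONTHLY_EUR - MINIMAL_DR_MONTHLY_EUR
annual_savings = monthly_savings * 12
first_year_net = annual_savings - SETUP_ONE_TIME_EUR  # assumes setup is paid once

print(monthly_savings)  # 45000
print(annual_savings)   # 540000
print(first_year_net)   # 530000
```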
Yeah, recovery is gonna take up to 2 days. That's the trade: a slower restore in exchange for a DR site that one stolen credential can't wipe out.
Before You Start: Just See What You Have
You're not rebuilding this tomorrow.
Start with visibility: who actually has admin access to both prod and DR?
That list will scare you. I promise.
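A first pass at that list can be sketched in Python. It assumes you have exported your role assignments as records with `principalName`, `roleDefinitionName`, and `scope` fields, so adjust the names to whatever your export actually contains. The subscription scopes and principals below are hypothetical. The sketch matches scopes by path only and does not resolve management groups, so treat the result as a lower bound.

```python
PRIVILEGED_ROLES = frozenset({"Owner", "Contributor", "User Access Administrator"})


def touches(assignment_scope: str, env_scope: str) -> bool:
    """True if an assignment at assignment_scope applies at, above, or inside env_scope."""
    a = assignment_scope.rstrip("/").lower()
    e = env_scope.rstrip("/").lower()
    return a == e or e.startswith(a + "/") or a.startswith(e + "/")


def principals_in_both(assignments: list[dict], prod_scope: str, dr_scope: str) -> list[str]:
    """Return principals holding a privileged role that reaches both prod and DR."""
    prod, dr = set(), set()
    for item in assignments:
        if item["roleDefinitionName"] not in PRIVILEGED_ROLES:
            continue
        if touches(item["scope"], prod_scope):
            prod.add(item["principalName"])
        if touches(item["scope"], dr_scope):
            dr.add(item["principalName"])
    return sorted(prod & dr)


# Hypothetical data for illustration only.
PROD = "/subscriptions/prod-sub"
DR = "/subscriptions/dr-sub"
sample = [
    {"principalName": "alice@example.com", "roleDefinitionName": "Owner", "scope": "/"},
    {"principalName": "ci-pipeline", "roleDefinitionName": "Contributor", "scope": PROD},
    {"principalName": "ci-pipeline", "roleDefinitionName": "Reader", "scope": DR},
    {"principalName": "bob@example.com", "roleDefinitionName": "Contributor",
     "scope": PROD + "/resourceGroups/app"},
    {"principalName": "bob@example.com", "roleDefinitionName": "Contributor", "scope": DR},
]
print(principals_in_both(sample, PROD, DR))
```

In the sample, the root-scope Owner and the person with Contributor on both sides get flagged. The pipeline does not, because it only has Reader on DR. Every name this prints is a single credential that can take out both regions.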
Then MUA on backup vaults—that's your cheapest win.
Then separate the service principals.
Then break-glass accounts.
Then actually test it.
Find what breaks. Fix it.
Repeat until it doesn't hurt anymore.
The Question You Need to Answer
Your board asks: "What's our disaster recovery plan?"
Version A: "We have a hot DR site. RTO is 15 minutes."
Sounds impressive. Until one compromised admin account takes down both regions.
Version B: "We have layered DR. Real RTO is 12-48 hours depending on complexity. It gets us about 80% of the security benefit for 20% of the complexity. An attacker has to compromise DevOps, bypass Multi-User Authorization, get separate DR credentials, and penetrate multiple security layers. Our risk of total loss is significantly lower."
Sounds realistic. And honest.
Which one do you want to say?
The Uncomfortable Truth
Geographic redundancy is not security.
Running the same vulnerable architecture in two regions just means you can lose twice as fast.
Real disaster recovery in 2026 means:
Accept that trust is not a security model
Design for compromise
Protect against your own infrastructure
Make it hard to lose everything at once
So, what's your DR plan protecting against?
Datacenter fires?
Or the more likely scenario: someone with legitimate access making an illegitimate decision?

P.S. If your immediate reaction is "this is too complex"—compare it to explaining to your CEO why both regions got deleted by the same attacker.
P.P.S. If you're thinking "but separate identity tenants would be better"—you're right. But will you actually do it? Or will you keep talking about it in meetings for the next 3 years?
Start with what's realistic. This is realistic.

