
1
00:00:00,000 –> 00:00:02,720
But most Azure professionals are learning the wrong skill right now.
2
00:00:02,720 –> 00:00:06,800
They’re chasing certifications in services that become obsolete every 18 months.
3
00:00:06,800 –> 00:00:10,740
They’re memorizing the Azure portal. They’re building expertise in specific workloads,
4
00:00:10,740 –> 00:00:15,560
AKS, Functions, Synapse, as if mastery of individual services is what the market actually
5
00:00:15,560 –> 00:00:16,560
rewards.
6
00:00:16,560 –> 00:00:17,560
It’s not.
7
00:00:17,560 –> 00:00:20,200
The real market value in 2026 isn’t in knowing Azure.
8
00:00:20,200 –> 00:00:22,520
It’s in preventing Azure from destroying itself.
9
00:00:22,520 –> 00:00:25,760
High-income cloud roles aren’t filled by people who can provision resources.
10
00:00:25,760 –> 00:00:28,840
They’re filled by people who prevent the wrong resources from being provisioned in the
11
00:00:28,840 –> 00:00:29,960
first place.
12
00:00:29,960 –> 00:00:34,240
The skill that compounds, the one that gets more valuable every year instead of less,
13
00:00:34,240 –> 00:00:37,160
is the ability to architect governance frameworks that scale.
14
00:00:37,160 –> 00:00:39,000
To design systems that don’t erode.
15
00:00:39,000 –> 00:00:42,880
To codify intent in a way that makes human oversight unnecessary because the architecture
16
00:00:42,880 –> 00:00:44,720
itself enforces what should happen.
17
00:00:44,720 –> 00:00:48,120
This is what separates the six-figure architects from the mid-level engineers who are still
18
00:00:48,120 –> 00:00:49,640
clicking buttons in the portal.
19
00:00:49,640 –> 00:00:51,600
This episode explains why.
20
00:00:51,600 –> 00:00:53,240
The fundamental misunderstanding.
21
00:00:53,240 –> 00:00:56,440
Why most Azure architects are already obsolete.
22
00:00:56,440 –> 00:00:58,360
Organizations treat Azure like a service catalog.
23
00:00:58,360 –> 00:01:00,360
You need compute, you pick a VM size.
24
00:01:00,360 –> 00:01:02,080
You need storage, you pick a tier.
25
00:01:02,080 –> 00:01:04,080
You need networking, you configure a subnet.
26
00:01:04,080 –> 00:01:07,120
It’s transactional, it’s reactive, it’s completely wrong.
27
00:01:07,120 –> 00:01:10,200
What they’re actually operating is a distributed decision engine.
28
00:01:10,200 –> 00:01:15,640
Every policy exception, every manual override, every just-this-once decision converts deterministic
29
00:01:15,640 –> 00:01:18,360
security into probabilistic chaos.
30
00:01:18,360 –> 00:01:21,440
Most Azure architects don’t understand this distinction and that’s why they’re already
31
00:01:21,440 –> 00:01:22,440
obsolete.
32
00:01:22,440 –> 00:01:26,800
The gap between knowing Azure services and architecting systems that don’t erode is widening
33
00:01:26,800 –> 00:01:29,280
faster than most professionals realize.
34
00:01:29,280 –> 00:01:30,520
It’s not a small gap anymore.
35
00:01:30,520 –> 00:01:31,520
It’s a chasm.
36
00:01:31,520 –> 00:01:34,000
On one side are the people who understand how to use Azure.
37
00:01:34,000 –> 00:01:37,760
On the other side are the people who understand how to prevent Azure from being misused at
38
00:01:37,760 –> 00:01:38,760
scale.
39
00:01:38,760 –> 00:01:40,960
The second group makes significantly more money.
40
00:01:40,960 –> 00:01:42,800
They also keep their jobs when things go wrong.
41
00:01:42,800 –> 00:01:44,040
Here’s why this matters.
42
00:01:44,040 –> 00:01:47,960
When you operate at scale, when you have hundreds of subscriptions, thousands of resources,
43
00:01:47,960 –> 00:01:51,320
dozens of teams, all provisioning infrastructure simultaneously,
44
00:01:51,320 –> 00:01:53,760
human oversight becomes mathematically impossible.
45
00:01:53,760 –> 00:01:55,840
You cannot manually review every deployment.
46
00:01:55,840 –> 00:01:57,600
You cannot audit every permission assignment.
47
00:01:57,600 –> 00:02:01,560
You cannot catch every configuration drift before it becomes a security incident.
48
00:02:01,560 –> 00:02:04,280
The only way to maintain control is through architecture.
49
00:02:04,280 –> 00:02:08,560
Through policy, through code that enforces what should happen before humans ever have the
50
00:02:08,560 –> 00:02:09,840
chance to make a mistake.
51
00:02:09,840 –> 00:02:12,200
The certifications teach you what Azure can do.
52
00:02:12,200 –> 00:02:13,480
They teach you the feature set.
53
00:02:13,480 –> 00:02:15,280
They teach you the capabilities.
54
00:02:15,280 –> 00:02:19,400
What they don’t teach you, what they fundamentally cannot teach you, is what Azure should do
55
00:02:19,400 –> 00:02:23,520
given your constraints, given your risk tolerance, given your regulatory requirements,
56
00:02:23,520 –> 00:02:25,160
given your organizational culture.
57
00:02:25,160 –> 00:02:26,480
That’s the skill that matters.
58
00:02:26,480 –> 00:02:27,760
That’s the skill that scales.
59
00:02:27,760 –> 00:02:29,320
That’s the skill that compounds.
60
00:02:29,320 –> 00:02:32,920
Most Azure architects are already obsolete because they’re still thinking like infrastructure
61
00:02:32,920 –> 00:02:33,920
engineers.
62
00:02:33,920 –> 00:02:36,400
They’re still thinking in terms of resources and configurations.
63
00:02:36,400 –> 00:02:39,320
They’re not thinking in terms of control planes.
64
00:02:39,320 –> 00:02:41,280
They’re not thinking in terms of erosion.
65
00:02:41,280 –> 00:02:44,920
They’re not thinking in terms of how to make the system enforce its own rules without human
66
00:02:44,920 –> 00:02:45,920
intervention.
67
00:02:45,920 –> 00:02:47,360
The uncomfortable truth is this.
68
00:02:47,360 –> 00:02:50,920
If your governance depends on humans to enforce it, it’s already failing.
69
00:02:50,920 –> 00:02:54,800
Somewhere right now, someone is bypassing your policies because they’re in a hurry.
70
00:02:54,800 –> 00:02:58,560
Someone is creating a resource that violates your tagging standards because they forgot.
71
00:02:58,560 –> 00:03:01,960
Someone is assigning permissions that are too broad because the alternative would require
72
00:03:01,960 –> 00:03:03,640
a conversation with security.
73
00:03:03,640 –> 00:03:05,520
These aren’t failures of individual judgment.
74
00:03:05,520 –> 00:03:07,240
These are failures of architecture.
75
00:03:07,240 –> 00:03:11,240
And if your architecture depends on perfect human behavior, your architecture is broken.
76
00:03:11,240 –> 00:03:16,400
The high-income roles in 2026 aren’t filled by people who understand every Azure service.
77
00:03:16,400 –> 00:03:19,880
They’re filled by people who understand how to design systems that make it impossible
78
00:03:19,880 –> 00:03:21,120
to do the wrong thing.
79
00:03:21,120 –> 00:03:24,520
People who can look at an organization’s chaos and see where the control plane is breaking
80
00:03:24,520 –> 00:03:25,520
down.
81
00:03:25,520 –> 00:03:29,120
People who can codify governance in a way that scales to hundreds of teams without requiring
82
00:03:29,120 –> 00:03:32,080
a governance team to manually review every decision.
83
00:03:32,080 –> 00:03:33,160
That skill is rare.
84
00:03:33,160 –> 00:03:34,280
That skill is valuable.
85
00:03:34,280 –> 00:03:36,960
That skill is what this episode is about.
86
00:03:36,960 –> 00:03:38,640
What cloud erosion actually means.
87
00:03:38,640 –> 00:03:43,160
Cloud erosion is the inevitable drift between intended state and actual state as organizations
88
00:03:43,160 –> 00:03:44,160
scale.
89
00:03:44,160 –> 00:03:45,160
It’s not a bug.
90
00:03:45,160 –> 00:03:46,400
It’s not a failure of specific people or teams.
91
00:03:46,400 –> 00:03:48,080
It’s a mathematical inevitability.
92
00:03:48,080 –> 00:03:51,800
And if you don’t architect against it, it will destroy your infrastructure from the inside
93
00:03:51,800 –> 00:03:52,800
out.
94
00:03:52,800 –> 00:03:54,240
Here’s what erosion looks like in practice.
95
00:03:54,240 –> 00:03:58,120
You define a policy that says all storage accounts must have encryption enabled.
96
00:03:58,120 –> 00:03:59,880
For the first month, it’s true.
97
00:03:59,880 –> 00:04:01,400
Every storage account has encryption.
98
00:04:01,400 –> 00:04:03,200
Then a team needs to move fast on a project.
99
00:04:03,200 –> 00:04:06,920
They create a storage account without encryption because the alternative would require waiting
100
00:04:06,920 –> 00:04:07,920
for approval.
101
00:04:07,920 –> 00:04:09,080
They’re planning to enable it later.
102
00:04:09,080 –> 00:04:10,080
They never do.
103
00:04:10,080 –> 00:04:12,320
Now your policy is violated, but it’s just one storage account.
104
00:04:12,320 –> 00:04:13,320
It’s not a big deal.
105
00:04:13,320 –> 00:04:15,080
Except it is, because now there’s precedent.
106
00:04:15,080 –> 00:04:19,120
Now the next team that needs to move fast knows it’s possible to bypass the policy.
107
00:04:19,120 –> 00:04:20,120
And they do.
108
00:04:20,120 –> 00:04:21,120
And the next team does.
109
00:04:21,120 –> 00:04:24,520
Within six months, 30% of your storage accounts don’t have encryption.
110
00:04:24,520 –> 00:04:25,760
Your policy still exists.
111
00:04:25,760 –> 00:04:27,040
It’s still in audit mode.
112
00:04:27,040 –> 00:04:29,080
It’s still being violated constantly.
113
00:04:29,080 –> 00:04:32,880
But nobody’s paying attention anymore because the violations are so common that they’ve become
114
00:04:32,880 –> 00:04:33,880
invisible.
115
00:04:33,880 –> 00:04:34,880
That’s erosion.
116
00:04:34,880 –> 00:04:35,880
It’s not a dramatic failure.
117
00:04:35,880 –> 00:04:40,040
It’s a slow drift where the gap between what you intended and what actually exists grows
118
00:04:40,040 –> 00:04:44,160
wider every single day until one day you run a compliance audit and realize you have
119
00:04:44,160 –> 00:04:47,800
no idea what your actual security posture is.
120
00:04:47,800 –> 00:04:49,680
The distinction that matters is this.
121
00:04:49,680 –> 00:04:53,040
Governance that depends on humans to enforce it is already failing.
122
00:04:53,040 –> 00:04:54,040
Not eventually.
123
00:04:54,040 –> 00:04:57,680
Right now, somewhere in your organization, someone is bypassing a policy because they’re
124
00:04:57,680 –> 00:04:58,680
in a hurry.
125
00:04:58,680 –> 00:05:01,600
Somewhere a permission is too broad because nobody reviewed it carefully.
126
00:05:01,600 –> 00:05:05,040
Somewhere a resource is misconfigured because the person who created it didn’t understand
127
00:05:05,040 –> 00:05:06,040
the requirement.
128
00:05:06,040 –> 00:05:08,040
These aren’t failures of individual competence.
129
00:05:08,040 –> 00:05:09,600
They’re failures of architecture.
130
00:05:09,600 –> 00:05:11,640
Cloud erosion has three primary drivers.
131
00:05:11,640 –> 00:05:12,960
The first is velocity.
132
00:05:12,960 –> 00:05:14,600
Teams move faster than policy can adapt.
133
00:05:14,600 –> 00:05:17,840
You create a policy and by the time it’s fully deployed, the business has already moved
134
00:05:17,840 –> 00:05:18,840
onto the next problem.
135
00:05:18,840 –> 00:05:20,600
The second driver is complexity.
136
00:05:20,600 –> 00:05:22,640
More services create more decision points.
137
00:05:22,640 –> 00:05:25,400
More decision points create more opportunities for drift.
138
00:05:25,400 –> 00:05:27,720
The third driver is incentive misalignment.
139
00:05:27,720 –> 00:05:29,360
Builders are rewarded for speed.
140
00:05:29,360 –> 00:05:31,160
Security is rewarded for compliance.
141
00:05:31,160 –> 00:05:33,360
Finance is rewarded for cost optimization.
142
00:05:33,360 –> 00:05:36,800
When these incentives conflict, and they always do, people optimize for what they’re measured
143
00:05:36,800 –> 00:05:39,400
on, not for what’s best for the system as a whole.
144
00:05:39,400 –> 00:05:41,800
Now add AI to this equation.
145
00:05:41,800 –> 00:05:44,680
Autonomous agents make decisions at machine speed.
146
00:05:44,680 –> 00:05:47,000
They can make thousands of decisions per second.
147
00:05:47,000 –> 00:05:49,920
But if those decisions aren’t pre-constrained by architecture,
148
00:05:49,920 –> 00:05:53,000
failures propagate exponentially faster than humans can detect them.
149
00:05:53,000 –> 00:05:58,120
A single misconfigured agent with over-privileged identity permissions can exfiltrate data, modify
150
00:05:58,120 –> 00:06:02,680
systems or trigger cost explosions faster than any human can notice something’s wrong.
151
00:06:02,680 –> 00:06:05,800
By the time you realize the agent is behaving badly, the damage is done.
152
00:06:05,800 –> 00:06:07,560
The uncomfortable truth is this.
153
00:06:07,560 –> 00:06:11,000
Most Azure environments are already in advanced erosion. They just don’t know it yet.
154
00:06:11,000 –> 00:06:12,000
You can measure it.
155
00:06:12,000 –> 00:06:15,320
Policy compliance rates below 85% indicate erosion.
156
00:06:15,320 –> 00:06:18,360
RBAC assignments that can’t be audited indicate erosion.
157
00:06:18,360 –> 00:06:22,200
Cost forecasts that diverge from actuals by more than 15% indicate erosion.
158
00:06:22,200 –> 00:06:26,200
When you see these signals, what you’re actually seeing is the gap between intended state and
159
00:06:26,200 –> 00:06:27,200
actual state.
160
00:06:27,200 –> 00:06:30,520
You’re seeing the architecture failing to enforce what should happen.
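As a rough illustration of those three signals, here is a minimal Python sketch. The 85% and 15% thresholds come from the episode and are rules of thumb, not Azure defaults. The `EnvironmentSnapshot` type and its field names are hypothetical; in practice the numbers would come from whatever compliance, RBAC, and cost reporting you already have.

```python
from dataclasses import dataclass

# Thresholds quoted in the episode; rules of thumb, not Azure defaults.
MIN_COMPLIANCE_RATE = 0.85
MAX_FORECAST_DIVERGENCE = 0.15


@dataclass
class EnvironmentSnapshot:
    """Hypothetical summary of one environment; field names are illustrative."""
    compliant_resources: int
    total_resources: int
    unauditable_role_assignments: int
    forecast_cost: float
    actual_cost: float


def erosion_signals(snapshot: EnvironmentSnapshot) -> list[str]:
    """Return a human-readable message for each erosion signal that fires."""
    signals = []

    # Signal 1: policy compliance rate below the threshold.
    if snapshot.total_resources > 0:
        rate = snapshot.compliant_resources / snapshot.total_resources
        if rate < MIN_COMPLIANCE_RATE:
            signals.append(f"policy compliance is {rate:.0%}, below {MIN_COMPLIANCE_RATE:.0%}")

    # Signal 2: any RBAC assignment that cannot be audited.
    if snapshot.unauditable_role_assignments > 0:
        signals.append(
            f"{snapshot.unauditable_role_assignments} RBAC assignments cannot be audited"
        )

    # Signal 3: actual cost diverges from forecast by more than the threshold.
    if snapshot.forecast_cost > 0:
        divergence = abs(snapshot.actual_cost - snapshot.forecast_cost) / snapshot.forecast_cost
        if divergence > MAX_FORECAST_DIVERGENCE:
            signals.append(f"cost diverges from forecast by {divergence:.0%}")

    return signals
```

An empty list means none of the three signals fired. It does not prove the environment is healthy.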
161
00:06:30,520 –> 00:06:34,080
The organizations that understand this are the ones that are winning in 2026.
162
00:06:34,080 –> 00:06:37,680
They’re not trying to prevent erosion through better training or stricter reviews.
163
00:06:37,680 –> 00:06:41,080
They’re designing systems where erosion is architecturally impossible.
164
00:06:41,080 –> 00:06:43,280
Where the system itself enforces what should happen.
165
00:06:43,280 –> 00:06:47,760
Where human oversight becomes a safety net instead of the primary control mechanism.
166
00:06:47,760 –> 00:06:48,760
That’s the shift.
167
00:06:48,760 –> 00:06:51,240
That’s what separates the six-figure architects from everyone else.
168
00:06:51,240 –> 00:06:54,600
The ability to look at an organization’s chaos and see where the control plane is breaking
169
00:06:54,600 –> 00:06:55,600
down.
170
00:06:55,600 –> 00:06:58,360
The ability to design systems that don’t erode because they can’t erode.
171
00:06:58,360 –> 00:07:02,080
The ability to codify governance in a way that makes human failure irrelevant because
172
00:07:02,080 –> 00:07:04,120
the architecture itself prevents it.
173
00:07:04,120 –> 00:07:06,120
The three layers of architectural control.
174
00:07:06,120 –> 00:07:09,280
There are three layers where governance actually happens in Azure.
175
00:07:09,280 –> 00:07:12,320
Understanding these layers is the difference between architects who prevent erosion and
176
00:07:12,320 –> 00:07:15,280
architects who react to it after the damage is done.
177
00:07:15,280 –> 00:07:16,760
Layer one is identity and access.
178
00:07:16,760 –> 00:07:17,760
This is Entra ID.
179
00:07:17,760 –> 00:07:19,560
This is where you decide who can do what.
180
00:07:19,560 –> 00:07:23,600
And this is where most organizations fail catastrophically because they treat identity as
181
00:07:23,600 –> 00:07:25,680
a user problem instead of a system problem.
182
00:07:25,680 –> 00:07:27,320
They think about humans logging in.
183
00:07:27,320 –> 00:07:31,640
They don’t think about the fact that non-human identities now outnumber human identities
184
00:07:31,640 –> 00:07:33,200
in most enterprises.
185
00:07:33,200 –> 00:07:34,200
Service principals.
186
00:07:34,200 –> 00:07:35,200
Managed identities.
187
00:07:35,200 –> 00:07:36,200
AI agents.
188
00:07:36,200 –> 00:07:37,200
These aren’t people.
189
00:07:37,200 –> 00:07:38,200
They don’t need passwords.
190
00:07:38,200 –> 00:07:39,200
They don’t need MFA.
191
00:07:39,200 –> 00:07:40,680
They need least privilege by default.
192
00:07:40,680 –> 00:07:42,200
They need just-in-time elevation.
193
00:07:42,200 –> 00:07:45,600
They need immutable audit trails that record every single action they take.
194
00:07:45,600 –> 00:07:48,080
Here’s the architecture that works at this layer.
195
00:07:48,080 –> 00:07:50,960
Every non-human identity gets a distinct service principal.
196
00:07:50,960 –> 00:07:53,320
Every service principal gets scoped permissions.
197
00:07:53,320 –> 00:07:56,400
Not broad roles, but specific permissions for specific resources.
198
00:07:56,400 –> 00:08:00,280
Every elevated operation requires explicit justification and approval.
199
00:08:00,280 –> 00:08:04,000
Every action gets logged in a way that cannot be modified after the fact. This is the
200
00:08:04,000 –> 00:08:05,160
first control plane.
201
00:08:05,160 –> 00:08:08,360
If identity is compromised, all downstream controls fail.
202
00:08:08,360 –> 00:08:10,320
So this layer has to be airtight.
203
00:08:10,320 –> 00:08:12,080
Layer two is policy and compliance.
204
00:08:12,080 –> 00:08:13,240
This is Azure Policy.
205
00:08:13,240 –> 00:08:17,280
This is where you prevent bad decisions from reaching infrastructure in the first place.
206
00:08:17,280 –> 00:08:20,360
Most organizations use Azure Policy in audit mode.
207
00:08:20,360 –> 00:08:24,000
They deploy a policy that says all storage accounts must have encryption enabled and set
208
00:08:24,000 –> 00:08:25,000
it to audit.
209
00:08:25,000 –> 00:08:26,000
The policy fires.
210
00:08:26,000 –> 00:08:27,000
It logs violations.
211
00:08:27,000 –> 00:08:28,520
It creates visibility.
212
00:08:28,520 –> 00:08:32,000
But it doesn’t actually stop anyone from creating unencrypted storage accounts.
213
00:08:32,000 –> 00:08:33,000
That’s not governance.
214
00:08:33,000 –> 00:08:34,400
That’s theatre.
215
00:08:34,400 –> 00:08:36,200
Real governance happens in deny mode.
216
00:08:36,200 –> 00:08:40,920
A policy in deny mode says you cannot create this resource because it violates our requirements.
217
00:08:40,920 –> 00:08:42,080
The deployment fails.
218
00:08:42,080 –> 00:08:43,920
The resource never gets created.
219
00:08:43,920 –> 00:08:47,720
The person who tried to create it learns immediately that this isn’t allowed.
220
00:08:47,720 –> 00:08:50,440
This is where the architecture actually enforces what should happen.
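The difference between audit and deny can be shown with a small simulation. This is a sketch in plain Python, not the Azure Policy engine or its SDK: `EncryptionPolicy`, `deploy`, and `DeploymentDenied` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Effect(Enum):
    AUDIT = "audit"  # record the violation, let the deployment through
    DENY = "deny"    # record the violation, block the deployment


class DeploymentDenied(Exception):
    """Raised when a deny-mode policy rejects a deployment."""


@dataclass
class EncryptionPolicy:
    """Toy policy: storage accounts must have encryption enabled."""
    effect: Effect
    violations: list[str] = field(default_factory=list)

    def evaluate(self, name: str, encrypted: bool) -> None:
        if encrypted:
            return
        # Both modes create visibility by logging the violation.
        self.violations.append(name)
        # Only deny mode actually stops the resource from being created.
        if self.effect is Effect.DENY:
            raise DeploymentDenied(f"{name}: encryption is required")


def deploy(policy: EncryptionPolicy, deployed: list[str], name: str, encrypted: bool) -> None:
    """Evaluate the policy first; the resource exists only if evaluation returns."""
    policy.evaluate(name, encrypted)
    deployed.append(name)
```

In audit mode the unencrypted account ends up in both `violations` and `deployed`. In deny mode it ends up only in `violations`, and the caller finds out immediately through the exception.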
221
00:08:50,440 –> 00:08:51,440
But here’s the hard part.
222
00:08:51,440 –> 00:08:53,400
Deny mode policies break things.
223
00:08:53,400 –> 00:08:54,400
They break workflows.
224
00:08:54,400 –> 00:08:55,400
They slow down teams.
225
00:08:55,400 –> 00:08:57,840
So most organizations are afraid to use them.
226
00:08:57,840 –> 00:08:59,840
They stay in audit mode forever.
227
00:08:59,840 –> 00:09:03,120
Watching violations accumulate, telling themselves they’ll tighten it up later.
228
00:09:03,120 –> 00:09:04,040
They never do.
229
00:09:04,040 –> 00:09:07,920
The scaling problem at this layer is that policy exceptions accumulate faster than policy
230
00:09:07,920 –> 00:09:08,920
rules.
231
00:09:08,920 –> 00:09:10,520
Every exception is governance debt.
232
00:09:10,520 –> 00:09:13,520
Every exception is a signal that your policy isn’t quite right.
233
00:09:13,520 –> 00:09:16,320
But instead of fixing the policy, teams just add exceptions.
234
00:09:16,320 –> 00:09:20,520
This team needs to create unencrypted storage accounts for testing purposes.
235
00:09:20,520 –> 00:09:21,720
So you add an exemption.
236
00:09:21,720 –> 00:09:23,560
Then another team needs the same exemption.
237
00:09:23,560 –> 00:09:27,000
Then another. Within a year, your exemption list is longer than your policy list.
238
00:09:27,000 –> 00:09:28,520
Your framework becomes unmaintainable.
239
00:09:28,520 –> 00:09:30,440
Layer three is operational enforcement.
240
00:09:30,440 –> 00:09:31,440
This is CI/CD gates.
241
00:09:31,440 –> 00:09:32,480
This is cost controls.
242
00:09:32,480 –> 00:09:33,680
This is drift detection.
243
00:09:33,680 –> 00:09:37,160
These are the systems that catch what the other two layers miss.
244
00:09:37,160 –> 00:09:40,200
Governance that isn’t automated is governance that isn’t enforced.
245
00:09:40,200 –> 00:09:44,560
Cost controls that depend on manual review are cost controls that fail at scale.
246
00:09:44,560 –> 00:09:45,560
Drift detection,
247
00:09:45,560 –> 00:09:50,400
the practice of continuously comparing actual state to intended state and flagging divergence,
248
00:09:50,400 –> 00:09:54,040
is the only way to catch the erosion that happens between deployments.
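At its core, drift detection is a diff between two descriptions of the same resource. A minimal sketch, assuming both intended and actual state are available as flat dictionaries. How you obtain them (IaC output, resource queries) is outside this example, and the property names are made up.

```python
ABSENT = "<absent>"  # sentinel for a property that exists on only one side


def detect_drift(intended: dict, actual: dict) -> dict:
    """Return {property: (intended_value, actual_value)} for every divergence."""
    drift = {}
    # Walk the union of keys so that both missing and unexpected properties are flagged.
    for key in sorted(intended.keys() | actual.keys()):
        want = intended.get(key, ABSENT)
        have = actual.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    intended = {"encryption_enabled": True, "public_access": False}
    actual = {"encryption_enabled": False, "public_access": False, "sku": "premium"}
    for prop, (want, have) in detect_drift(intended, actual).items():
        print(f"drift in {prop}: intended={want!r} actual={have!r}")
```

The comparison is the easy part. The discipline lies in running it continuously and deciding in advance what happens when it reports a difference.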
249
00:09:54,040 –> 00:09:57,480
The hardest part of this layer is that it requires discipline across teams.
250
00:09:57,480 –> 00:09:59,840
It requires discipline in your CI/CD pipelines.
251
00:09:59,840 –> 00:10:02,960
It requires discipline in how you define intended state.
252
00:10:02,960 –> 00:10:06,600
It requires discipline in how you respond when drift is detected.
253
00:10:06,600 –> 00:10:07,600
Discipline is expensive.
254
00:10:07,600 –> 00:10:09,080
Discipline is uncomfortable.
255
00:10:09,080 –> 00:10:12,160
But discipline is the only thing that prevents erosion at scale.
256
00:10:12,160 –> 00:10:13,880
These three layers work together.
257
00:10:13,880 –> 00:10:15,800
Identity prevents unauthorized access.
258
00:10:15,800 –> 00:10:18,640
Policy prevents bad configurations from being deployed.
259
00:10:18,640 –> 00:10:21,320
Operational enforcement catches what slips through the cracks.
260
00:10:21,320 –> 00:10:22,800
None of them work in isolation.
261
00:10:22,800 –> 00:10:24,040
All three have to be in place.
262
00:10:24,040 –> 00:10:25,640
All three have to be enforced.
263
00:10:25,640 –> 00:10:29,600
And all three have to be continuously monitored and adjusted as the organization changes.
264
00:10:29,600 –> 00:10:34,000
This is what separates the architects who prevent erosion from the architects who react to it.
265
00:10:34,000 –> 00:10:36,600
The ones who understand that governance isn’t a single control.
266
00:10:36,600 –> 00:10:38,680
It’s a system of controls working together.
267
00:10:38,680 –> 00:10:41,400
Each one compensating for the limitations of the others.
268
00:10:41,400 –> 00:10:45,920
Each one enforcing what should happen at a different point in the infrastructure life cycle.
269
00:10:45,920 –> 00:10:48,480
Why AI amplifies every governance mistake.
270
00:10:48,480 –> 00:10:50,440
AI agents operate at machine speed.
271
00:10:50,440 –> 00:10:52,760
They can make thousands of decisions per second.
272
00:10:52,760 –> 00:10:57,360
If those decisions aren’t pre-constrained by architecture, failures propagate exponentially.
273
00:10:57,360 –> 00:11:01,160
This is the critical insight that most organizations haven’t internalized yet.
274
00:11:01,160 –> 00:11:05,440
They are deploying AI agents into environments with governance frameworks designed for humans.
275
00:11:05,440 –> 00:11:09,240
And those frameworks are about to break under the weight of machine speed decision making.
276
00:11:09,240 –> 00:11:11,040
Here’s the distinction that matters.
277
00:11:11,040 –> 00:11:13,200
Traditional infrastructure is deterministic.
278
00:11:13,200 –> 00:11:17,320
If you provision a virtual machine with a specific configuration, you get that configuration.
279
00:11:17,320 –> 00:11:18,440
The outcome is predictable.
280
00:11:18,440 –> 00:11:19,600
You can reason about it.
281
00:11:19,600 –> 00:11:20,600
You can audit it.
282
00:11:20,600 –> 00:11:22,560
But AI introduces probabilistic layers.
283
00:11:22,560 –> 00:11:26,200
If you ask an agent to do something, it might do it one way or it might do it another way.
284
00:11:26,200 –> 00:11:29,080
Or it might do something slightly different that you didn’t anticipate.
285
00:11:29,080 –> 00:11:30,280
The agent isn’t malicious.
286
00:11:30,280 –> 00:11:33,320
It’s just operating probabilistically instead of deterministically.
287
00:11:33,320 –> 00:11:38,200
And if that probabilistic behavior isn’t constrained by architecture, it becomes chaos at scale.
288
00:11:38,200 –> 00:11:44,200
Most organizations still share human credentials with AI agents because they don’t have formal agent identity frameworks.
289
00:11:44,200 –> 00:11:45,520
Think about what that means.
290
00:11:45,520 –> 00:11:48,760
An AI agent is using the same identity as a human employee.
291
00:11:48,760 –> 00:11:53,520
The audit trail doesn’t distinguish between actions taken by the human and actions taken by the agent.
292
00:11:53,520 –> 00:11:56,400
If the agent does something wrong, you can’t tell who’s responsible.
293
00:11:56,400 –> 00:12:01,120
If the agent gets compromised, the attacker has access to everything the human has access to.
294
00:12:01,120 –> 00:12:02,680
This isn’t a governance framework.
295
00:12:02,680 –> 00:12:05,640
This is a security disaster waiting to happen.
296
00:12:05,640 –> 00:12:08,640
Entra Agent ID is Microsoft’s answer to this problem.
297
00:12:08,640 –> 00:12:13,800
It gives AI agents distinct identities with scoped permissions, audit trails, and lifecycle management.
298
00:12:13,800 –> 00:12:16,000
But most organizations haven’t implemented it yet.
299
00:12:16,000 –> 00:12:19,520
They’re still in the credential sharing phase, which means they’re running their infrastructure
300
00:12:19,520 –> 00:12:22,040
on shared credentials and hoping nobody notices.
301
00:12:22,040 –> 00:12:23,720
Here’s the real cost of this approach.
302
00:12:23,720 –> 00:12:30,760
An AI agent with over-privileged identity permissions can exfiltrate data, modify systems, or trigger cost explosions
303
00:12:30,760 –> 00:12:32,840
faster than any human can detect it.
304
00:12:32,840 –> 00:12:37,880
A single misconfigured agent can generate thousands of dollars in unexpected compute costs in minutes.
305
00:12:37,880 –> 00:12:41,920
Not through malice, not through compromise, just through the normal operation of an agent
306
00:12:41,920 –> 00:12:45,240
that’s been given too much permission and is operating at machine speed.
307
00:12:45,240 –> 00:12:48,520
The cost amplification problem is particularly acute with retry loops.
308
00:12:48,520 –> 00:12:51,000
An agent retries a failed operation automatically.
309
00:12:51,000 –> 00:12:55,480
If that retry isn’t bounded, a single misconfigured agent can generate exponential costs.
310
00:12:55,480 –> 00:12:57,280
The agent tries to execute something.
311
00:12:57,280 –> 00:12:58,280
It fails.
312
00:12:58,280 –> 00:12:59,280
It retries.
313
00:12:59,280 –> 00:13:00,280
It fails again.
314
00:13:00,280 –> 00:13:01,280
It retries again.
315
00:13:01,280 –> 00:13:03,080
Within minutes, you’ve got thousands of retry attempts.
316
00:13:03,080 –> 00:13:04,600
Each one consuming resources.
317
00:13:04,600 –> 00:13:06,280
Each one accumulating costs.
318
00:13:06,280 –> 00:13:09,080
By the time you notice something’s wrong, the damage is done.
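A bounded retry is the simplest guard against that loop. The sketch below is a generic Python pattern, not part of any agent framework: the function name, the default limits, and the injectable `sleep` parameter are all assumptions chosen for illustration.

```python
import time


class RetryBudgetExceeded(Exception):
    """Raised when an operation still fails after the allowed number of attempts."""


def run_with_bounded_retries(operation, max_attempts=5, base_delay=0.5,
                             max_delay=30.0, sleep=time.sleep):
    """Call operation() at most max_attempts times, backing off exponentially.

    The delay doubles after each failure and is capped at max_delay, so both
    the number of calls and the total waiting time have a hard upper bound.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:  # a real agent should catch narrower error types
            last_error = exc
            if attempt == max_attempts:
                break
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))

    # Fail loudly instead of looping forever and accumulating cost.
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts") from last_error
```

The important design choice is that giving up is an explicit, visible outcome. An agent that hits `RetryBudgetExceeded` stops spending and surfaces the failure to something that can decide what to do next.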
319
00:13:09,080 –> 00:13:13,800
The governance patterns that work at this layer are pre-execution gates that validate agent
320
00:13:13,800 –> 00:13:16,600
decisions before they’re allowed to execute.
321
00:13:16,600 –> 00:13:19,760
Cost estimators that block operations exceeding thresholds.
322
00:13:19,760 –> 00:13:23,360
Immutable logs that record every agent action. These aren’t optional. These aren’t nice
323
00:13:23,360 –> 00:13:27,920
to have. These are architectural requirements for running AI agents safely at scale.
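Those patterns can be combined in one small sketch: a gate that checks an estimated cost before an agent action runs and records every decision in a hash-chained log. Everything here is hypothetical (`AuditLog`, `pre_execution_gate`, the record fields). A hash chain also only makes tampering detectable. Real immutability needs write-once storage underneath it.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only log in which each entry commits to the hash of the previous one."""
    entries: list[dict] = field(default_factory=list)

    def append(self, record: dict) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append(
            {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
        )

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


def pre_execution_gate(agent_id: str, action: str, estimated_cost: float,
                       cost_limit: float, log: AuditLog) -> bool:
    """Decide before execution whether the action may run, and log the decision."""
    allowed = estimated_cost <= cost_limit
    log.append({"agent": agent_id, "action": action,
                "estimated_cost": estimated_cost, "allowed": allowed})
    return allowed
```

The caller executes the action only when the gate returns `True`. Blocked attempts are logged too, because denied requests are often the earliest sign that an agent is misbehaving.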
324
00:13:27,920 –> 00:13:29,680
The uncomfortable truth is this.
325
00:13:29,680 –> 00:13:32,600
Most organizations don’t have formal agent identity governance yet.
326
00:13:32,600 –> 00:13:37,000
They’re running their AI infrastructure on shared credentials, which means they’re operating
327
00:13:37,000 –> 00:13:41,600
in a state where a single misconfigured agent or compromised credential can cause exponential
328
00:13:41,600 –> 00:13:42,600
damage.
329
00:13:42,600 –> 00:13:46,760
They’re deploying AI into governance frameworks that were designed for humans, not machines.
330
00:13:46,760 –> 00:13:50,480
And those frameworks are about to fail under the weight of machine speed decision making.
331
00:13:50,480 –> 00:13:54,280
The organizations that understand this, that are building agent identity frameworks now,
332
00:13:54,280 –> 00:13:59,320
that are implementing pre-execution gates, that are treating agent governance as a first-class
333
00:13:59,320 –> 00:14:01,240
architectural concern.
334
00:14:01,240 –> 00:14:04,080
Those organizations are going to win in 2026.
335
00:14:04,080 –> 00:14:08,840
Everyone else is going to have incidents they don’t understand and costs they can’t explain.
336
00:14:08,840 –> 00:14:11,120
The shift from ClickOps to governance as code.
337
00:14:11,120 –> 00:14:13,880
ClickOps is what most Azure environments are built on right now.
338
00:14:13,880 –> 00:14:15,040
You open the Azure portal.
339
00:14:15,040 –> 00:14:16,320
You click through the UI.
340
00:14:16,320 –> 00:14:18,040
You configure resources one at a time.
341
00:14:18,040 –> 00:14:19,440
You create policies by hand.
342
00:14:19,440 –> 00:14:21,160
You assign permissions through the console.
343
00:14:21,160 –> 00:14:22,560
It works at small scale.
344
00:14:22,560 –> 00:14:25,520
It works when you have five subscriptions and one team.
345
00:14:25,520 –> 00:14:27,880
It fails catastrophically at enterprise scale.
346
00:14:27,880 –> 00:14:29,400
Every click is a decision point.
347
00:14:29,400 –> 00:14:33,920
Every decision made through the portal isn’t auditable, isn’t reproducible and isn’t scalable.
348
00:14:33,920 –> 00:14:34,920
You can’t version it.
349
00:14:34,920 –> 00:14:36,880
You can’t review it through a pull request.
350
00:14:36,880 –> 00:14:38,800
You can’t test it before it goes to production.
351
00:14:38,800 –> 00:14:40,640
You can’t roll it back if something goes wrong.
352
00:14:40,640 –> 00:14:42,840
You just have a resource in a certain state.
353
00:14:42,840 –> 00:14:47,400
And if you want to know why it’s in that state, you have to ask the person who clicked the buttons.
354
00:14:47,400 –> 00:14:50,840
If that person left the company six months ago, you’re out of luck.
355
00:14:50,840 –> 00:14:53,160
Infrastructure as code solves part of this problem.
356
00:14:53,160 –> 00:14:57,760
You define your infrastructure in code: Bicep, Terraform, ARM templates.
357
00:14:57,760 –> 00:14:58,960
And you version that code.
358
00:14:58,960 –> 00:15:00,000
You can review changes.
359
00:15:00,000 –> 00:15:01,000
You can track history.
360
00:15:01,000 –> 00:15:04,160
You can reproduce the exact same infrastructure in a different environment.
361
00:15:04,160 –> 00:15:05,800
You can roll back if something breaks.
362
00:15:05,800 –> 00:15:08,160
This is a massive improvement over ClickOps.
363
00:15:08,160 –> 00:15:11,080
Most serious organizations have moved to IaC by now.
364
00:15:11,080 –> 00:15:13,200
But IaC solves the reproducibility problem.
365
00:15:13,200 –> 00:15:14,920
It doesn’t solve the governance problem.
366
00:15:14,920 –> 00:15:15,880
Here’s the distinction.
367
00:15:15,880 –> 00:15:18,560
You can write IaC that violates your policies.
368
00:15:18,560 –> 00:15:22,200
You can write Bicep code that creates an unencrypted storage account.
369
00:15:22,200 –> 00:15:25,320
You can write Terraform that assigns overly broad permissions.
370
00:15:25,320 –> 00:15:28,400
The code is reproducible and auditable, but it’s still wrong.
371
00:15:28,400 –> 00:15:30,920
IaC doesn’t prevent you from making bad decisions.
372
00:15:30,920 –> 00:15:34,800
It just makes those bad decisions repeatable and auditable, which is actually worse, because
373
00:15:34,800 –> 00:15:36,920
now you’ve codified the mistake.
374
00:15:36,920 –> 00:15:38,560
Governance as code is the next evolution.
375
00:15:38,560 –> 00:15:42,360
You codify your governance rules and enforce them in your CI/CD pipelines.
376
00:15:42,360 –> 00:15:43,760
You define policies in code.
377
00:15:43,760 –> 00:15:44,880
You version them in Git.
378
00:15:44,880 –> 00:15:46,240
You test them in pre-production.
379
00:15:46,240 –> 00:15:48,120
You enforce them in production.
380
00:15:48,120 –> 00:15:51,800
Governance becomes as repeatable, auditable, and scalable as infrastructure.
381
00:15:51,800 –> 00:15:53,240
Here’s what the workflow looks like.
382
00:15:53,240 –> 00:15:56,600
A developer writes Bicep code that creates a new resource.
383
00:15:56,600 –> 00:15:59,680
They push it to a Git repository. A CI/CD pipeline runs.
384
00:15:59,680 –> 00:16:02,480
The pipeline validates the code against your governance policies.
385
00:16:02,480 –> 00:16:06,400
The policy check either passes or fails. If it passes, the code can be deployed.
386
00:16:06,400 –> 00:16:08,200
If it fails, the deployment is blocked.
387
00:16:08,200 –> 00:16:12,360
The developer sees the error, understands why their code violated the policy, and fixes
388
00:16:12,360 –> 00:16:13,360
it.
389
00:16:13,360 –> 00:16:14,360
They push the corrected code.
390
00:16:14,360 –> 00:16:15,360
The pipeline runs again.
391
00:16:15,360 –> 00:16:16,360
This time it passes.
392
00:16:16,360 –> 00:16:17,360
The code is deployed.
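The pass-or-fail gate described in this workflow can be sketched in a few lines of Python. This is a minimal illustration, not any real pipeline tool: the `Rule` class, the `RULES` list, and the resource dictionaries are all hypothetical, and a real pipeline would parse the Bicep or Terraform output instead of hand-written dictionaries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    # Returns True when the resource complies with the rule.
    check: Callable[[dict], bool]


# Hypothetical rules; a real pipeline would load these from versioned policy files.
RULES = [
    Rule("storage-must-be-encrypted",
         lambda r: r.get("type") != "storage" or r.get("encrypted") is True),
    Rule("owner-tag-required",
         lambda r: "owner" in r.get("tags", {})),
]


def validate(resources: list[dict]) -> list[str]:
    """Return one message per violation; an empty list means the gate passes."""
    return [
        f"{res['name']}: violates {rule.name}"
        for res in resources
        for rule in RULES
        if not rule.check(res)
    ]


def gate(resources: list[dict]) -> int:
    """Exit code for the CI step: 0 allows deployment, 1 blocks it."""
    violations = validate(resources)
    for message in violations:
        print(message)
    return 1 if violations else 0
```

The developer gets the violation message in the pipeline log, which is the immediate feedback loop the episode describes.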
393
00:16:17,360 –> 00:16:18,560
This is where the magic happens.
394
00:16:18,560 –> 00:16:21,400
The governance is enforced before the code reaches production.
395
00:16:21,400 –> 00:16:25,360
The developer learns immediately that their approach violates the policy.
396
00:16:25,360 –> 00:16:29,280
They fix it right away instead of six months later when an audit discovers the problem.
397
00:16:29,280 –> 00:16:32,040
The policy is applied consistently to every deployment.
398
00:16:32,040 –> 00:16:33,040
There are no exceptions.
399
00:16:33,040 –> 00:16:34,360
There are no manual reviews.
400
00:16:34,360 –> 00:16:35,360
There are no workarounds.
401
00:16:35,360 –> 00:16:37,680
The system enforces what should happen.
402
00:16:37,680 –> 00:16:39,120
The mental model shift is this.
403
00:16:39,120 –> 00:16:40,760
Instead of asking, can we do this?
404
00:16:40,760 –> 00:16:41,760
Ask, should we do this?
405
00:16:41,760 –> 00:16:43,800
And what would prevent someone from doing this wrong?
406
00:16:43,800 –> 00:16:46,200
You’re not trying to enable every possible use case.
407
00:16:46,200 –> 00:16:48,280
You’re trying to prevent every possible mistake.
408
00:16:48,280 –> 00:16:52,080
You’re designing the system so that doing the right thing is the path of least resistance
409
00:16:52,080 –> 00:16:54,480
and doing the wrong thing is architecturally impossible.
410
00:16:54,480 –> 00:16:56,360
Why does this skill compound in value?
411
00:16:56,360 –> 00:16:59,800
Because once you’ve designed a governance framework that works, you can apply it to new
412
00:16:59,800 –> 00:17:02,480
services, new teams, new regions without starting over.
413
00:17:02,480 –> 00:17:05,440
You don’t have to reinvent the wheel every time you onboard a new business unit.
414
00:17:05,440 –> 00:17:08,160
You don’t have to manually review every deployment.
415
00:17:08,160 –> 00:17:11,080
You don’t have to hope that people remember the policies.
416
00:17:11,080 –> 00:17:13,200
The system enforces them automatically.
417
00:17:13,200 –> 00:17:15,800
This is the shift happening right now in the market.
418
00:17:15,800 –> 00:17:19,240
Organizations are moving from click-ops to IaC to governance as code.
419
00:17:19,240 –> 00:17:23,600
The people who understand this progression, who can design governance frameworks that scale,
420
00:17:23,600 –> 00:17:26,560
those people are the ones who are valuable in 2026.
421
00:17:26,560 –> 00:17:29,960
Everyone else is still clicking buttons in the portal wondering why their infrastructure
422
00:17:29,960 –> 00:17:32,880
keeps drifting and their compliance audits keep failing.
423
00:17:32,880 –> 00:17:34,600
Landing zones as governance blueprints.
424
00:17:34,600 –> 00:17:38,760
A landing zone is a pre-configured Azure environment that embeds governance from the start.
425
00:17:38,760 –> 00:17:41,000
It’s not a resource group, it’s not a subscription.
426
00:17:41,000 –> 00:17:45,880
It’s a complete, opinionated blueprint for how an organization should operate in Azure.
427
00:17:45,880 –> 00:17:50,560
And it’s the difference between teams that inherit chaos and teams that inherit order.
428
00:17:50,560 –> 00:17:53,960
The Cloud Adoption Framework provides a reference architecture for landing zones.
429
00:17:53,960 –> 00:17:56,320
But what matters isn’t the specific architecture.
430
00:17:56,320 –> 00:17:57,800
What matters is the philosophy.
431
00:17:57,800 –> 00:18:01,520
A landing zone says, before you provision your first resource, before you deploy your
432
00:18:01,520 –> 00:18:06,200
first application, before you make your first decision about how to operate in Azure,
433
00:18:06,200 –> 00:18:08,360
here’s how we’ve decided things should work.
434
00:18:08,360 –> 00:18:10,320
Here are the policies that will be enforced.
435
00:18:10,320 –> 00:18:13,400
Here are the management groups that will organize your subscriptions.
436
00:18:13,400 –> 00:18:14,680
Here’s the network baseline.
437
00:18:14,680 –> 00:18:16,000
Here’s the identity baseline.
438
00:18:16,000 –> 00:18:18,800
Here’s how we’re going to monitor and audit everything you do.
439
00:18:18,800 –> 00:18:19,560
Why does this matter?
440
00:18:19,560 –> 00:18:21,720
Because it prevents the blank canvas problem.
441
00:18:21,720 –> 00:18:26,400
If you give a team a blank Azure subscription and say, go build, they will build.
442
00:18:26,400 –> 00:18:30,560
They’ll make a thousand small decisions about how to organize resources, how to name things,
443
00:18:30,560 –> 00:18:33,200
how to configure networking, how to assign permissions.
444
00:18:33,200 –> 00:18:36,880
Most of those decisions will be locally optimal but globally suboptimal.
445
00:18:36,880 –> 00:18:41,000
They’ll make sense for that team’s immediate needs but create problems for everyone else downstream.
446
00:18:41,000 –> 00:18:45,640
By the time you realize the decisions were wrong, the infrastructure is too entrenched to change.
447
00:18:45,640 –> 00:18:50,000
A landing zone prevents this by establishing constraints before anyone starts building.
448
00:18:50,000 –> 00:18:52,120
The management group hierarchy is already defined.
449
00:18:52,120 –> 00:18:53,720
The policies are already deployed.
450
00:18:53,720 –> 00:18:55,760
The network baselines are already in place.
451
00:18:55,760 –> 00:18:58,040
The identity baselines are already configured.
452
00:18:58,040 –> 00:18:59,720
Teams don’t have to make those decisions.
453
00:18:59,720 –> 00:19:00,760
They inherit them.
454
00:19:00,760 –> 00:19:05,800
And because those decisions were made by architects who understood the full scope of the organization’s requirements,
455
00:19:05,800 –> 00:19:09,240
they’re usually better than the decisions the team would have made on their own.
456
00:19:09,240 –> 00:19:12,720
The architecture of a landing zone includes several critical components.
457
00:19:12,720 –> 00:19:18,240
The management group hierarchy organizes subscriptions by function, environment and compliance level.
458
00:19:18,240 –> 00:19:23,960
Azure Policy assignments enforce tagging, encryption, network configuration and RBAC at scale.
459
00:19:23,960 –> 00:19:28,280
Network baselines define virtual networks, firewalls and private endpoints.
460
00:19:28,280 –> 00:19:33,320
Identity baselines define managed identities, role assignments and conditional access policies.
461
00:19:33,320 –> 00:19:38,520
Monitoring and compliance infrastructure provides logging, alerts and audit trails.
462
00:19:38,520 –> 00:19:39,760
Here’s the distinction that matters.
463
00:19:39,760 –> 00:19:42,040
A landing zone isn’t just infrastructure.
464
00:19:42,040 –> 00:19:45,920
It’s codified intent about how your organization wants to operate at scale.
465
00:19:45,920 –> 00:19:50,160
It’s saying we’ve thought about security, we’ve thought about compliance, we’ve thought about cost management.
466
00:19:50,160 –> 00:19:53,120
And here’s how we’ve decided to handle all of these concerns.
467
00:19:53,120 –> 00:19:54,800
Teams don’t have to reinvent the wheel.
468
00:19:54,800 –> 00:19:58,720
They inherit the decisions that architects made, tested and refined.
469
00:19:58,720 –> 00:20:00,080
Why does this prevent erosion?
470
00:20:00,080 –> 00:20:04,880
Because teams provisioning resources within a landing zone are constrained by policies they didn’t write.
471
00:20:04,880 –> 00:20:05,720
That’s the point.
472
00:20:05,720 –> 00:20:07,120
Those constraints prevent drift.
473
00:20:07,120 –> 00:20:12,920
They prevent teams from making locally optimal decisions that create globally suboptimal outcomes.
474
00:20:12,920 –> 00:20:17,840
They prevent the slow accumulation of exceptions and workarounds that characterizes eroded environments.
475
00:20:17,840 –> 00:20:19,440
The scaling pattern is elegant.
476
00:20:19,440 –> 00:20:25,480
Once you’ve built one landing zone, you can replicate it across teams, regions and business units without reinventing governance.
477
00:20:25,480 –> 00:20:28,560
You’re not creating governance from scratch for each new team.
478
00:20:28,560 –> 00:20:31,960
You’re instantiating a template that’s already been tested and proven.
479
00:20:31,960 –> 00:20:33,360
This is where the skill compounds.
480
00:20:33,360 –> 00:20:36,160
The first landing zone takes weeks to design and deploy.
481
00:20:36,160 –> 00:20:37,720
The second one takes days.
482
00:20:37,720 –> 00:20:39,320
The third one takes hours.
483
00:20:39,320 –> 00:20:43,240
By the time you’ve deployed your tenth landing zone, you’ve got a repeatable process that works.
484
00:20:43,240 –> 00:20:44,960
The common mistakes are instructive.
485
00:20:44,960 –> 00:20:47,800
Landing zones that are too permissive don’t prevent erosion.
486
00:20:47,800 –> 00:20:49,400
They just push the problem downstream.
487
00:20:49,400 –> 00:20:52,480
Landing zones that are too rigid slow down legitimate innovation.
488
00:20:52,480 –> 00:20:56,200
The sweet spot is landing zones that are permissive enough to enable business velocity,
489
00:20:56,200 –> 00:20:58,040
but constrained enough to prevent erosion.
490
00:20:58,040 –> 00:20:58,960
That’s the hard part.
491
00:20:58,960 –> 00:21:03,880
That’s the part that requires architects who understand both the technical constraints and the organizational culture.
492
00:21:03,880 –> 00:21:08,480
This is what separates the organizations that scale successfully from the ones that don’t.
493
00:21:08,480 –> 00:21:12,400
The ones that have landing zones that work scale faster and with fewer incidents.
494
00:21:12,400 –> 00:21:17,680
The ones that don’t have landing zones are constantly fighting fires, constantly discovering misconfigurations,
495
00:21:17,680 –> 00:21:22,240
constantly dealing with the accumulated debt of ad hoc decisions made under time pressure.
496
00:21:22,240 –> 00:21:24,560
Azure Policy as the enforcement engine.
497
00:21:24,560 –> 00:21:28,320
Azure Policy is the service that enforces your governance rules at scale.
498
00:21:28,320 –> 00:21:29,240
It’s not optional.
499
00:21:29,240 –> 00:21:30,160
It’s not a nice to have.
500
00:21:30,160 –> 00:21:33,880
If you’re operating Azure without Azure Policy, you’re operating without governance.
501
00:21:33,880 –> 00:21:36,280
You’re just hoping people make the right decisions.
502
00:21:36,280 –> 00:21:37,560
And they won’t.
503
00:21:37,560 –> 00:21:38,320
Here’s how it works.
504
00:21:38,320 –> 00:21:39,520
You define a policy.
505
00:21:39,520 –> 00:21:42,040
The policy is a JSON file that describes a rule.
506
00:21:42,040 –> 00:21:45,800
The rule might say, all storage accounts must have encryption enabled.
507
00:21:45,800 –> 00:21:48,840
Or all virtual machines must have a specific tag.
508
00:21:48,840 –> 00:21:52,280
Or all resources must be deployed to approved regions.
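As a rough illustration of the "approved regions" rule, here is a policy definition built as a Python dictionary and serialized to JSON. The `properties` / `policyRule` / `if` / `then` layout follows the general Azure Policy definition structure; the display name, the parameter name `allowedLocations`, and the helper `to_policy_file` are assumptions made for this sketch.

```python
import json

# Illustrative definition: deny any resource whose location is not in the allowed list.
allowed_locations_policy = {
    "properties": {
        "displayName": "Allowed locations (illustrative)",
        "mode": "Indexed",
        "parameters": {
            "allowedLocations": {"type": "Array"}
        },
        "policyRule": {
            "if": {
                "not": {
                    "field": "location",
                    "in": "[parameters('allowedLocations')]",
                }
            },
            "then": {"effect": "deny"},
        },
    }
}


def to_policy_file(definition: dict) -> str:
    """Serialize deterministically so Git diffs stay small and reviewable."""
    return json.dumps(definition, indent=2, sort_keys=True)
```

Sorting keys is a small design choice: it keeps the file stable across runs, so a pull request shows only the rule change being reviewed.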
509
00:21:52,280 –> 00:21:54,080
You store this policy definition in code.
510
00:21:54,080 –> 00:21:59,760
You version it, you review it, then you assign it to a scope, a subscription, a resource group, or a management group.
511
00:21:59,760 –> 00:22:03,040
Once assigned, the policy applies to every resource within that scope.
512
00:22:03,040 –> 00:22:06,640
The distinction between policy definitions and policy assignments matters.
513
00:22:06,640 –> 00:22:07,840
Definitions are the rules.
514
00:22:07,840 –> 00:22:09,800
Assignments apply those rules to scopes.
515
00:22:09,800 –> 00:22:12,480
You might have a definition that says require encryption,
516
00:22:12,480 –> 00:22:15,280
but that definition doesn’t do anything until you assign it to a scope.
517
00:22:15,280 –> 00:22:18,120
Once assigned, it applies everywhere within that scope.
518
00:22:18,120 –> 00:22:24,080
This is how you enforce governance at scale, without creating a separate rule for every subscription or every resource group.
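The split between definitions and assignments, and the way an assignment flows down to everything inside its scope, can be modeled with a toy hierarchy. The `Scope` class and the scope names below are hypothetical; this models inheritance only, not the Azure Policy engine.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    name: str
    parent: "Scope | None" = None
    # Names of policy definitions assigned directly at this scope.
    assignments: list[str] = field(default_factory=list)

    def effective_policies(self) -> list[str]:
        """Policies assigned here plus everything inherited from ancestors."""
        inherited = self.parent.effective_policies() if self.parent else []
        return inherited + self.assignments


# One assignment at the top of the hierarchy reaches every scope beneath it.
root = Scope("mg-root", assignments=["require-encryption"])
prod = Scope("mg-prod", parent=root, assignments=["allowed-locations"])
payments = Scope("sub-payments", parent=prod)
```

Here `payments.effective_policies()` contains both rules even though nothing was assigned to that subscription directly.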
519
00:22:24,080 –> 00:22:25,880
The effects are where the real power lives.
520
00:22:25,880 –> 00:22:28,040
Audit mode logs violations without blocking them.
521
00:22:28,040 –> 00:22:29,440
This is useful for detection.
522
00:22:29,440 –> 00:22:30,920
You deploy a policy in audit mode.
523
00:22:30,920 –> 00:22:31,720
You watch it fire.
524
00:22:31,720 –> 00:22:33,200
You see what violations exist.
525
00:22:33,200 –> 00:22:35,320
You understand the scope of the problem.
526
00:22:35,320 –> 00:22:37,160
But audit mode doesn’t actually prevent anything.
527
00:22:37,160 –> 00:22:38,480
It’s visibility, not control.
528
00:22:38,480 –> 00:22:42,720
Most organizations stay in audit mode forever because deny mode is uncomfortable.
529
00:22:42,720 –> 00:22:45,400
Deny mode blocks violations from reaching infrastructure.
530
00:22:45,400 –> 00:22:47,400
A deployment fails if it violates the policy.
531
00:22:47,400 –> 00:22:48,760
The resource never gets created.
532
00:22:48,760 –> 00:22:50,960
This is actual control, but deny mode breaks things.
533
00:22:50,960 –> 00:22:51,800
It breaks workflows.
534
00:22:51,800 –> 00:22:52,800
It slows down teams.
535
00:22:52,800 –> 00:22:54,800
So most organizations are afraid to use it.
536
00:22:54,800 –> 00:22:59,720
They stay in audit mode, watching violations accumulate, telling themselves they’ll tighten it up later.
537
00:22:59,720 –> 00:23:00,520
They never do.
538
00:23:00,520 –> 00:23:02,920
DeployIfNotExists is the pattern that scales.
539
00:23:02,920 –> 00:23:05,200
This effect automatically remediates violations.
540
00:23:05,200 –> 00:23:08,120
If a resource is missing a required tag, the policy adds it.
541
00:23:08,120 –> 00:23:10,440
If encryption isn’t enabled, the policy enables it.
542
00:23:10,440 –> 00:23:13,920
If a resource is created in an unapproved region, the policy moves it.
543
00:23:13,920 –> 00:23:15,560
This is where governance becomes invisible.
544
00:23:15,560 –> 00:23:17,280
Teams don’t have to think about compliance.
545
00:23:17,280 –> 00:23:19,800
The system enforces it automatically.
546
00:23:19,800 –> 00:23:21,920
Here’s why policy as code matters.
547
00:23:21,920 –> 00:23:24,280
Policy definitions are JSON files stored in Git.
548
00:23:24,280 –> 00:23:25,280
They’re versioned.
549
00:23:25,280 –> 00:23:26,880
They’re reviewed through pull requests.
550
00:23:26,880 –> 00:23:28,360
They’re tested before deployment.
551
00:23:28,360 –> 00:23:32,080
This is fundamentally different from policies created through the Azure Portal and stored
552
00:23:32,080 –> 00:23:33,080
nowhere.
553
00:23:33,080 –> 00:23:36,400
Code-based policies are auditable, repeatable, and scalable.
554
00:23:36,400 –> 00:23:38,160
Here’s the workflow that prevents erosion.
555
00:23:38,160 –> 00:23:40,040
You write policy definitions in code.
556
00:23:40,040 –> 00:23:43,000
You test them in pre-production against your actual resources.
557
00:23:43,000 –> 00:23:45,280
You identify false positives and false negatives.
558
00:23:45,280 –> 00:23:46,520
You refine the policy.
559
00:23:46,520 –> 00:23:49,320
You deploy it in audit mode first to understand the impact.
560
00:23:49,320 –> 00:23:52,440
You gradually shift to deny mode as confidence increases.
561
00:23:52,440 –> 00:23:56,160
You monitor compliance metrics and adjust policies as the organization changes.
562
00:23:56,160 –> 00:24:00,320
The scaling problem is that policy exceptions accumulate faster than policy rules.
563
00:24:00,320 –> 00:24:01,800
Every exception is governance debt.
564
00:24:01,800 –> 00:24:04,720
Every exception is a signal that your policy isn’t quite right.
565
00:24:04,720 –> 00:24:07,680
But instead of fixing the policy, teams just add exceptions.
566
00:24:07,680 –> 00:24:10,880
Before long, your exemption list is longer than your policy list.
567
00:24:10,880 –> 00:24:13,000
Your framework becomes unmaintainable.
568
00:24:13,000 –> 00:24:17,920
High-income architects design frameworks where exceptions are rare, documented, and time-bound.
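"Rare, documented, and time-bound" can be made concrete with a small record type. This is a sketch under assumptions: the `Exemption` fields and the ratio metric are invented for illustration and are not the Azure Policy exemption schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Exemption:
    policy: str
    resource: str
    reason: str       # documented justification
    expires_on: date  # every exemption is time-bound


def active_exemptions(exemptions: list[Exemption], today: date) -> list[Exemption]:
    """Keep only exemptions that have not yet expired."""
    return [e for e in exemptions if e.expires_on >= today]


def exemption_ratio(exemptions: list[Exemption], policy_count: int, today: date) -> float:
    """Active exemptions per policy; a rising value hints that policies need fixing."""
    if policy_count <= 0:
        raise ValueError("policy_count must be positive")
    return len(active_exemptions(exemptions, today)) / policy_count
```

Tracking the ratio over time gives an early signal of the failure mode described here, where the exemption list outgrows the policy list.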
569
00:24:17,920 –> 00:24:20,760
Here’s a real scenario that illustrates the pattern.
570
00:24:20,760 –> 00:24:25,200
You create a policy that requires all storage accounts to have encryption enabled.
571
00:24:25,200 –> 00:24:27,720
Audit mode identifies non-compliant storage accounts.
572
00:24:27,720 –> 00:24:31,160
Deny mode prevents creation of non-compliant storage accounts.
573
00:24:31,160 –> 00:24:35,400
DeployIfNotExists automatically enables encryption on non-compliant accounts.
574
00:24:35,400 –> 00:24:39,600
You start with audit, move to deny, use DeployIfNotExists as a safety net.
575
00:24:39,600 –> 00:24:41,520
The policy evolves as you learn what works.
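The audit, deny, and remediate progression in this scenario can be simulated as follows. This is a toy model of the three behaviors only: `Effect.REMEDIATE` is a simplified stand-in for auto-remediation and does not reflect how Azure Policy performs it, and all names here are hypothetical.

```python
from enum import Enum


class Effect(Enum):
    AUDIT = "audit"
    DENY = "deny"
    REMEDIATE = "remediate"  # simplified stand-in for auto-remediation


class PolicyViolation(Exception):
    """Raised when a deny-mode policy rejects a resource."""


def apply_encryption_policy(account: dict, effect: Effect, log: list[str]) -> dict:
    """Evaluate one storage account under the given effect and return the result."""
    if account.get("encrypted"):
        return account  # compliant: nothing to do
    if effect is Effect.AUDIT:
        log.append(f"non-compliant: {account['name']}")
        return account  # visibility only, nothing is blocked
    if effect is Effect.DENY:
        raise PolicyViolation(f"{account['name']} must have encryption enabled")
    # REMEDIATE: return a corrected copy instead of rejecting the request.
    log.append(f"remediated: {account['name']}")
    return {**account, "encrypted": True}
```

Running the same account through each effect in turn mirrors the rollout order described above: observe first, then block, then correct.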
576
00:24:41,520 –> 00:24:45,800
Why this skill is valuable is because designing policies that prevent problems without creating
577
00:24:45,800 –> 00:24:47,560
friction is harder than it sounds.
578
00:24:47,560 –> 00:24:50,600
A policy that’s too strict blocks legitimate use cases.
579
00:24:50,600 –> 00:24:53,240
A policy that’s too loose doesn’t prevent erosion.
580
00:24:53,240 –> 00:24:57,200
The sweet spot requires understanding both the technical requirements and the organizational
581
00:24:57,200 –> 00:24:58,200
workflow.
582
00:24:58,200 –> 00:24:59,200
That’s the skill that’s rare.
583
00:24:59,200 –> 00:25:00,520
That’s the skill that’s valuable.
584
00:25:00,520 –> 00:25:03,680
This is where governance shifts from theatre to reality.
585
00:25:03,680 –> 00:25:05,600
When policies are enforced through code,
586
00:25:05,600 –> 00:25:09,560
when violations are prevented before they reach production, when the system itself makes
587
00:25:09,560 –> 00:25:13,920
doing the right thing the path of least resistance, that’s when erosion stops.
588
00:25:13,920 –> 00:25:19,000
That’s when architects move from reacting to incidents to preventing them.
589
00:25:19,000 –> 00:25:21,040
Identity governance and Entra Agent ID.
590
00:25:21,040 –> 00:25:23,360
Identity is the control plane for everything else in Azure.
591
00:25:23,360 –> 00:25:27,200
If identity is compromised, all downstream controls fail. This is why identity governance
592
00:25:27,200 –> 00:25:28,480
has to be airtight.
593
00:25:28,480 –> 00:25:32,000
And this is where most organizations are making catastrophic mistakes because they’re still
594
00:25:32,000 –> 00:25:34,680
thinking about identity in human terms.
595
00:25:34,680 –> 00:25:37,440
Traditional identity governance focused on human users.
596
00:25:37,440 –> 00:25:41,960
Passwords, multi-factor authentication, conditional access policies. These are important,
597
00:25:41,960 –> 00:25:43,320
but they’re only half the problem.
598
00:25:43,320 –> 00:25:49,640
The new reality is that non-human identities now outnumber human identities in most enterprises.
599
00:25:49,640 –> 00:25:54,080
Service principals, managed identities, AI agents. These aren’t people.
600
00:25:54,080 –> 00:25:55,080
They don’t need passwords.
601
00:25:55,080 –> 00:25:56,960
They don’t need MFA in the traditional sense.
602
00:25:56,960 –> 00:25:58,600
They need something completely different.
603
00:25:58,600 –> 00:26:00,480
They need least privilege by default.
604
00:26:00,480 –> 00:26:02,440
They need just-in-time elevation.
605
00:26:02,440 –> 00:26:04,120
They need immutable audit trails.
606
00:26:04,120 –> 00:26:06,200
Here’s what most organizations are doing wrong.
607
00:26:06,200 –> 00:26:08,640
They’re sharing credentials between humans and AI agents.
608
00:26:08,640 –> 00:26:11,280
A team needs an AI agent to perform some task.
609
00:26:11,280 –> 00:26:14,960
Instead of creating a distinct service principal with scoped permissions, they give the agent
610
00:26:14,960 –> 00:26:18,320
a human’s credentials. Or they create a single service principal and share it across
611
00:26:18,320 –> 00:26:19,880
multiple agents.
612
00:26:19,880 –> 00:26:23,120
Or they store credentials in plain text in configuration files.
613
00:26:23,120 –> 00:26:24,640
These aren’t security oversights.
614
00:26:24,640 –> 00:26:26,440
These are architectural failures.
615
00:26:26,440 –> 00:26:29,040
And they’re creating massive vulnerabilities at scale.
616
00:26:29,040 –> 00:26:31,960
Entra Agent ID is Microsoft’s answer to this problem.
617
00:26:31,960 –> 00:26:37,240
It’s a framework that gives AI agents distinct identities with scoped permissions, audit trails,
618
00:26:37,240 –> 00:26:38,760
and lifecycle management.
619
00:26:38,760 –> 00:26:40,840
Each agent gets a unique service principal.
620
00:26:40,840 –> 00:26:44,880
Each service principal gets specific permissions for specific resources.
621
00:26:44,880 –> 00:26:47,600
Elevated operations require explicit justification.
622
00:26:47,600 –> 00:26:51,600
Every action gets logged in a way that cannot be modified after the fact.
623
00:26:51,600 –> 00:26:52,960
Here’s how this works in practice.
624
00:26:52,960 –> 00:26:56,200
An organization registers an AI agent in Entra ID.
625
00:26:56,200 –> 00:26:58,840
The agent gets a unique object ID and app ID.
626
00:26:58,840 –> 00:27:00,880
The organization assigns the agent to a group.
627
00:27:00,880 –> 00:27:04,920
They apply policies to that group: conditional access rules, permission boundaries, approval
628
00:27:04,920 –> 00:27:05,920
workflows.
629
00:27:05,920 –> 00:27:08,680
When the agent needs to perform an action, it requests a token.
630
00:27:08,680 –> 00:27:10,760
The token is issued with scoped permissions.
631
00:27:10,760 –> 00:27:11,760
The action is logged.
632
00:27:11,760 –> 00:27:15,720
If the agent behaves unexpectedly, it can be disabled immediately without affecting other
633
00:27:15,720 –> 00:27:17,240
agents or human users.
634
00:27:17,240 –> 00:27:20,760
The architecture that works at this layer has several components.
635
00:27:20,760 –> 00:27:23,920
Every agent gets registered in your identity system.
636
00:27:23,920 –> 00:27:26,200
Agents are assigned to groups based on their function.
637
00:27:26,200 –> 00:27:29,000
Policies are applied to agent groups, not individual agents.
638
00:27:29,000 –> 00:27:32,840
An agent that handles customer data might be in a different group than an agent that handles
639
00:27:32,840 –> 00:27:34,280
internal operations.
640
00:27:34,280 –> 00:27:36,160
Each group gets different permissions.
641
00:27:36,160 –> 00:27:39,840
Agents can be disabled, rotated, or revoked without touching human credentials.
642
00:27:39,840 –> 00:27:41,440
Every agent action is auditable.
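The components listed here (registration, group-based permissions, independent disablement, per-agent audit) fit into a compact in-memory model. This is a sketch only: `AgentRegistry` and its methods are hypothetical and do not represent the Entra ID API, and the group and action names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    agent_id: str
    group: str
    enabled: bool = True


@dataclass
class AgentRegistry:
    # Permissions attach to groups, not to individual agents.
    group_permissions: dict[str, set[str]]
    agents: dict[str, Agent] = field(default_factory=dict)
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def register(self, agent_id: str, group: str) -> None:
        if group not in self.group_permissions:
            raise KeyError(f"unknown group: {group}")
        self.agents[agent_id] = Agent(agent_id, group)

    def disable(self, agent_id: str) -> None:
        """Disable one agent without touching any other identity."""
        self.agents[agent_id].enabled = False

    def authorize(self, agent_id: str, action: str) -> bool:
        """Check an action and record the decision against the specific agent."""
        agent = self.agents.get(agent_id)
        allowed = bool(
            agent
            and agent.enabled
            and action in self.group_permissions[agent.group]
        )
        self.audit_log.append((agent_id, action, allowed))
        return allowed
```

Because every decision is logged against a distinct `agent_id`, the question "which agent took which action" always has an answer, which is exactly what shared credentials make impossible.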
643
00:27:41,440 –> 00:27:43,520
Why this prevents erosion is straightforward.
644
00:27:43,520 –> 00:27:47,320
Without formal agent identity governance, teams resort to sharing credentials.
645
00:27:47,320 –> 00:27:48,840
Shared credentials are unauditable.
646
00:27:48,840 –> 00:27:51,040
You can’t tell which agent took which action.
647
00:27:51,040 –> 00:27:54,840
You can’t revoke an agent’s permissions without revoking permissions for every other agent
648
00:27:54,840 –> 00:27:56,600
or human using that credential.
649
00:27:56,600 –> 00:28:00,720
You can’t implement least privilege because the credential is shared across multiple entities
650
00:28:00,720 –> 00:28:02,120
with different needs.
651
00:28:02,120 –> 00:28:03,920
The system becomes impossible to govern.
652
00:28:03,920 –> 00:28:05,680
The cost of not doing this is staggering.
653
00:28:05,680 –> 00:28:10,520
A single compromised agent credential can exfiltrate data, modify systems, or trigger cost
654
00:28:10,520 –> 00:28:12,800
explosions without anyone knowing who did it.
655
00:28:12,800 –> 00:28:16,520
An agent with overprivileged permissions can perform actions that violate your compliance
656
00:28:16,520 –> 00:28:17,520
requirements.
657
00:28:17,520 –> 00:28:21,440
An agent that can’t be disabled independently can force you to rotate credentials that
658
00:28:21,440 –> 00:28:23,200
affect dozens of other systems.
659
00:28:23,200 –> 00:28:24,840
The pattern that scales is elegant.
660
00:28:24,840 –> 00:28:28,520
Once you’ve designed identity governance for agents, you can apply it to new agents, new
661
00:28:28,520 –> 00:28:30,800
teams, new regions without starting over.
662
00:28:30,800 –> 00:28:33,600
You’re not creating governance from scratch for each agent.
663
00:28:33,600 –> 00:28:37,000
You’re instantiating a template that’s already been tested and proven.
664
00:28:37,000 –> 00:28:41,320
An organization with a mature agent identity framework can onboard a new agent in hours.
665
00:28:41,320 –> 00:28:44,520
An organization without one spends weeks trying to figure out how to give the agent the
666
00:28:44,520 –> 00:28:47,960
permissions it needs without creating security vulnerabilities.
667
00:28:47,960 –> 00:28:49,800
The uncomfortable truth is this.
668
00:28:49,800 –> 00:28:53,480
Most organizations don’t have formal agent identity governance yet.
669
00:28:53,480 –> 00:28:57,760
They’re running their AI infrastructure on shared credentials, which means they’re operating
670
00:28:57,760 –> 00:29:03,240
in a state where a single misconfigured agent or compromised credential can cause exponential
671
00:29:03,240 –> 00:29:04,240
damage.
672
00:29:04,240 –> 00:29:07,440
They’re treating agent identity as an afterthought instead of a first-class architectural
673
00:29:07,440 –> 00:29:08,440
concern.
674
00:29:08,440 –> 00:29:11,320
They’re about to discover how expensive that decision is.
675
00:29:11,320 –> 00:29:13,280
Cost governance and FinOps automation.
676
00:29:13,280 –> 00:29:17,000
Cost governance is governance that most organizations ignore until they get a bill that
677
00:29:17,000 –> 00:29:18,000
makes them panic.
678
00:29:18,000 –> 00:29:20,880
They treat cost as a finance problem instead of an architecture problem.
679
00:29:20,880 –> 00:29:21,880
It’s not.
680
00:29:21,880 –> 00:29:22,880
Cost is a governance problem.
681
00:29:22,880 –> 00:29:26,760
And if you don’t architect for cost control, you will discover very quickly how expensive
682
00:29:26,760 –> 00:29:28,360
it is to not have cost control.
683
00:29:28,360 –> 00:29:29,360
Here’s the pattern.
684
00:29:29,360 –> 00:29:30,880
Teams experiment with AI.
685
00:29:30,880 –> 00:29:32,960
Agents run retry loops, compute scales up.
686
00:29:32,960 –> 00:29:35,320
Suddenly you’re paying 10 times more than expected.
687
00:29:35,320 –> 00:29:36,320
Nobody knows why.
688
00:29:36,320 –> 00:29:37,320
Nobody can explain it.
689
00:29:37,320 –> 00:29:38,320
The bill just keeps growing.
690
00:29:38,320 –> 00:29:40,360
This isn’t a failure of the finance team.
691
00:29:40,360 –> 00:29:42,080
This is a failure of architecture.
692
00:29:42,080 –> 00:29:43,400
FinOps,
693
00:29:43,400 –> 00:29:44,840
financial operations for cloud,
694
00:29:44,840 –> 00:29:47,200
treats cost as a first-class governance concern,
695
00:29:47,200 –> 00:29:48,360
not an afterthought.
696
00:29:48,360 –> 00:29:51,600
The architecture that works includes cost allocation through tagging.
697
00:29:51,600 –> 00:29:54,040
Every resource tagged with cost center, owner, project.
698
00:29:54,040 –> 00:29:58,000
Budget controls at the subscription and resource group level with spending limits.
699
00:29:58,000 –> 00:30:01,200
Automated remediation that scales down underutilized resources, terminates orphaned
700
00:30:01,200 –> 00:30:03,440
assets, stops runaway processes.
701
00:30:03,440 –> 00:30:07,800
Forecasting and anomaly detection that predicts spend and alerts when deviations occur.
702
00:30:07,800 –> 00:30:10,000
The AI specific problem is acute.
703
00:30:10,000 –> 00:30:12,760
Agents can generate massive costs through retry loops.
704
00:30:12,760 –> 00:30:16,120
An agent that retries a failed operation a thousand times costs a hundred times more
705
00:30:16,120 –> 00:30:17,880
than an agent that retries ten times.
706
00:30:17,880 –> 00:30:22,360
Without cost controls, a single misconfigured agent can bankrupt a project in minutes.
707
00:30:22,360 –> 00:30:26,240
Not through malice, not through compromise, just through the normal operation of an agent
708
00:30:26,240 –> 00:30:30,120
operating at machine speed with unbounded retry logic.
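One way to bound the retry problem is to cap both the number of attempts and the total spend. The helper below is a minimal sketch assuming a fixed, known cost per attempt; `run_with_budget` and its parameters are hypothetical names, and real per-call costs are rarely this uniform.

```python
from typing import Callable


class RetryBudgetExceeded(Exception):
    """Raised when retries stop because of the attempt cap or the spend cap."""


def run_with_budget(
    operation: Callable[[], bool],
    cost_per_attempt: float,
    max_attempts: int,
    budget: float,
) -> tuple[int, float]:
    """Retry until success, the attempt cap, or the spend cap; return (attempts, spend)."""
    spent = 0.0
    for attempt in range(1, max_attempts + 1):
        # Refuse to start an attempt that would push spend past the budget.
        if spent + cost_per_attempt > budget:
            raise RetryBudgetExceeded(f"stopped before attempt {attempt}: spent {spent}")
        spent += cost_per_attempt
        if operation():
            return attempt, spent
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts: spent {spent}")
```

With a budget in place, an agent with a generous `max_attempts` still stops once the spend cap is reached, so a misconfiguration fails loudly instead of running up a bill.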
709
00:30:30,120 –> 00:30:33,880
The pattern that prevents erosion is pre-execution cost estimation.
710
00:30:33,880 –> 00:30:36,520
Before an agent executes an operation, it estimates the cost.
711
00:30:36,520 –> 00:30:39,320
If the cost exceeds a threshold, the operation is blocked.
712
00:30:39,320 –> 00:30:41,360
The agent is routed to cheaper infrastructure.
713
00:30:41,360 –> 00:30:42,840
The operation is deferred.
714
00:30:42,840 –> 00:30:46,240
Cost controls become architectural constraints, not post-hoc reviews.
715
00:30:46,240 –> 00:30:47,960
Here’s what this looks like in practice.
716
00:30:47,960 –> 00:30:52,680
Define cost classes (gold, silver, bronze) based on acceptable spending per agent or workload.
717
00:30:52,680 –> 00:30:55,000
Implement pre-execution cost estimation.
718
00:30:55,000 –> 00:30:57,200
Block operations that exceed thresholds.
719
00:30:57,200 –> 00:31:01,760
Monitor actual spend against forecasts. Alert on anomalies. A spike in costs that indicates
720
00:31:01,760 –> 00:31:04,360
misconfiguration triggers an immediate investigation.
721
00:31:04,360 –> 00:31:07,640
You don’t wait for the monthly bill, you catch it in real time.
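The pre-execution gate with cost classes can be sketched as a single decision function. The dollar limits, the `Decision` enum, and the `cheaper_estimate` parameter are assumptions chosen for illustration; the episode names the gold, silver, and bronze classes but gives no thresholds.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    REROUTE = "reroute"  # send to cheaper infrastructure
    BLOCK = "block"


# Hypothetical per-operation spending limits in dollars for each cost class.
COST_CLASS_LIMITS = {"gold": 100.0, "silver": 10.0, "bronze": 1.0}


def pre_execution_gate(
    cost_class: str,
    estimated_cost: float,
    cheaper_estimate: Optional[float] = None,
) -> Decision:
    """Decide before execution whether an operation may run."""
    limit = COST_CLASS_LIMITS[cost_class]
    if estimated_cost <= limit:
        return Decision.ALLOW
    # Over the limit: fall back to cheaper infrastructure if that fits the class.
    if cheaper_estimate is not None and cheaper_estimate <= limit:
        return Decision.REROUTE
    return Decision.BLOCK
```

The gate runs before any money is spent, which is the difference between an architectural constraint and a post-hoc review of the bill.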
722
00:31:07,640 –> 00:31:11,680
Why this skill is valuable is because designing cost governance that prevents explosions without
723
00:31:11,680 –> 00:31:13,800
stifling innovation is harder than it sounds.
724
00:31:13,800 –> 00:31:17,040
A cost control that’s too strict blocks legitimate use cases.
725
00:31:17,040 –> 00:31:20,200
A cost control that’s too loose doesn’t prevent erosion.
726
00:31:20,200 –> 00:31:24,200
The sweet spot requires understanding both the technical requirements and the business model.
727
00:31:24,200 –> 00:31:25,400
That’s the skill that’s rare.
728
00:31:25,400 –> 00:31:26,400
Real scenario.
729
00:31:26,400 –> 00:31:31,240
An AI agent is configured to search through 10 years of logs to answer a question.
730
00:31:31,240 –> 00:31:35,320
Without cost controls, the query runs for hours, costs thousands of dollars, and doesn’t even
731
00:31:35,320 –> 00:31:36,680
provide useful results.
732
00:31:36,680 –> 00:31:40,320
With cost controls, the query is blocked or routed to cheaper infrastructure.
733
00:31:40,320 –> 00:31:43,040
The agent learns that expensive queries aren’t allowed.
734
00:31:43,040 –> 00:31:44,600
It adapts its behavior.
735
00:31:44,600 –> 00:31:48,240
Cost governance becomes invisible because the system enforces it automatically.
736
00:31:48,240 –> 00:31:49,960
The scaling pattern is elegant.
737
00:31:49,960 –> 00:31:54,000
Once you’ve designed cost governance for one workload you can apply it to new workloads,
738
00:31:54,000 –> 00:31:55,960
new teams, new regions without reinventing.
739
00:31:55,960 –> 00:31:58,880
You’re not creating cost controls from scratch for each new agent.
740
00:31:58,880 –> 00:32:02,120
You’re instantiating a template that’s already been tested and proven.
741
00:32:02,120 –> 00:32:03,800
This is where most organizations fail.
742
00:32:03,800 –> 00:32:08,680
They treat cost as something to be managed reactively through better budgeting or stricter
743
00:32:08,680 –> 00:32:09,680
reviews.
744
00:32:09,680 –> 00:32:11,760
They don’t treat cost as an architectural concern.
745
00:32:11,760 –> 00:32:14,640
They don’t design systems where cost control is built in from the start.
746
00:32:14,640 –> 00:32:18,400
And then they get surprised when a single, misconfigured agent generates thousands of
747
00:32:18,400 –> 00:32:20,480
dollars in unexpected charges.
748
00:32:20,480 –> 00:32:24,880
The organizations that understand this, that are building cost governance into their architecture,
749
00:32:24,880 –> 00:32:29,640
that are implementing pre-execution gates, that are treating cost as a first-class architectural
750
00:32:29,640 –> 00:32:33,320
concern, those organizations are going to win in 2026.
751
00:32:33,320 –> 00:32:37,680
Everyone else is going to have bills they can’t explain and incidents they don’t understand.
752
00:32:37,680 –> 00:32:40,840
CI/CD governance pipelines and shift left security.
753
00:32:40,840 –> 00:32:42,480
Traditional security works like this.
754
00:32:42,480 –> 00:32:46,320
You build something, you deploy it to production, you find the problems, you fix them.
755
00:32:46,320 –> 00:32:50,120
This is reactive security. It’s expensive, it’s slow, and it’s the reason organizations are
756
00:32:50,120 –> 00:32:52,680
constantly dealing with incidents they didn’t see coming.
757
00:32:52,680 –> 00:32:54,920
Shift left security does something different.
758
00:32:54,920 –> 00:32:57,080
It prevents problems before they reach production.
759
00:32:57,080 –> 00:32:59,480
When they’re cheap to fix, when they’re still in code.
760
00:32:59,480 –> 00:33:02,800
When the developer who made the mistake is still thinking about the problem instead of
761
00:33:02,800 –> 00:33:04,880
three sprints ahead on something else.
762
00:33:04,880 –> 00:33:09,160
CI/CD governance pipelines are the mechanism that implements shift left security.
763
00:33:09,160 –> 00:33:11,680
Here’s how it works. A developer commits code to Git.
764
00:33:11,680 –> 00:33:13,120
A pipeline runs automatically.
765
00:33:13,120 –> 00:33:16,200
The pipeline validates the code against your governance policies.
766
00:33:16,200 –> 00:33:20,880
It checks compliance, it estimates costs, it scans for vulnerabilities, it validates against
767
00:33:20,880 –> 00:33:23,800
your security baselines. The pipeline either passes or fails.
768
00:33:23,800 –> 00:33:25,640
If it passes the code can be deployed.
769
00:33:25,640 –> 00:33:27,520
If it fails, the deployment is blocked.
770
00:33:27,520 –> 00:33:29,680
The developer sees the error immediately.
771
00:33:29,680 –> 00:33:32,440
They understand why their code violated the governance framework.
772
00:33:32,440 –> 00:33:34,560
They fix it, they push the corrected code.
773
00:33:34,560 –> 00:33:36,680
The pipeline runs again, this time it passes.
774
00:33:36,680 –> 00:33:38,760
This is where governance becomes invisible.
775
00:33:38,760 –> 00:33:42,360
Developers don’t have to think about whether their code complies with policies.
776
00:33:42,360 –> 00:33:46,040
The system tells them immediately if it doesn’t. They fix it right away instead of discovering
777
00:33:46,040 –> 00:33:48,840
the problem six months later during a compliance audit.
778
00:33:48,840 –> 00:33:51,280
The policies are applied consistently to every deployment.
779
00:33:51,280 –> 00:33:52,280
There are no exceptions.
780
00:33:52,280 –> 00:33:53,560
There are no manual reviews.
781
00:33:53,560 –> 00:33:54,720
There are no workarounds.
782
00:33:54,720 –> 00:33:58,200
The governance gates that matter include policy compliance checks.
783
00:33:58,200 –> 00:34:00,840
Does this infrastructure comply with our policies?
784
00:34:00,840 –> 00:34:01,840
Cost estimation.
785
00:34:01,840 –> 00:34:04,720
Will this infrastructure cost more than expected? Security scanning.
786
00:34:04,720 –> 00:34:06,760
Are there known vulnerabilities in this code?
787
00:34:06,760 –> 00:34:07,760
Compliance validation.
788
00:34:07,760 –> 00:34:10,480
Does this infrastructure meet our regulatory requirements?
789
00:34:10,480 –> 00:34:11,760
These gates run in parallel.
790
00:34:11,760 –> 00:34:12,760
They run fast.
791
00:34:12,760 –> 00:34:14,160
They provide immediate feedback.
792
00:34:14,160 –> 00:34:17,240
A developer knows within seconds whether their code is compliant or not.
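The gates above can be sketched as independent checks that run in parallel and report every violation at once. This is a rough illustration in Python, not a real pipeline definition: the resource dictionary, the two gate functions, and the 100 USD budget are hypothetical stand-ins for whatever policy and cost tooling an organization actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Gate = Callable[[dict], list[str]]  # a gate returns a list of violation messages


def policy_gate(resource: dict) -> list[str]:
    # Hypothetical rule: storage accounts must have encryption enabled.
    if resource.get("type") == "storageAccount" and not resource.get("encryption", False):
        return ["policy: storage account must enable encryption"]
    return []


def cost_gate(resource: dict) -> list[str]:
    # Hypothetical budget of 100 USD per month per resource.
    if resource.get("estimated_monthly_cost", 0) > 100:
        return ["cost: estimate exceeds monthly budget"]
    return []


def run_gates(resource: dict, gates: list[Gate]) -> list[str]:
    """Run every gate in parallel and collect all violations in gate order."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda gate: gate(resource), gates)
        return [violation for result in results for violation in result]


def pipeline_passes(resource: dict, gates: list[Gate]) -> bool:
    """The deployment is allowed only when no gate reports a violation."""
    return not run_gates(resource, gates)
```

Returning messages rather than a bare pass or fail is the design choice that matters here: the developer sees why the deployment was blocked, not just that it was.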
793
00:34:17,240 –> 00:34:22,000
This skill is valuable because designing pipelines that enforce governance without creating
794
00:34:22,000 –> 00:34:23,920
friction is harder than it sounds.
795
00:34:23,920 –> 00:34:28,800
A pipeline that’s too strict blocks legitimate use cases and slows down development.
796
00:34:28,800 –> 00:34:31,960
A pipeline that’s too loose doesn’t prevent erosion.
797
00:34:31,960 –> 00:34:35,400
The sweet spot requires understanding both the technical requirements and the development
798
00:34:35,400 –> 00:34:36,400
workflow.
799
00:34:36,400 –> 00:34:40,320
The anti-pattern is governance pipelines that are so strict they slow down development.
800
00:34:40,320 –> 00:34:42,920
This creates incentives for teams to bypass the pipeline.
801
00:34:42,920 –> 00:34:43,920
They find workarounds.
802
00:34:43,920 –> 00:34:45,680
They deploy directly to infrastructure.
803
00:34:45,680 –> 00:34:47,640
They skip the approval process.
804
00:34:47,640 –> 00:34:51,040
Bypassed pipelines are worse than no pipelines at all because now you have the overhead of
805
00:34:51,040 –> 00:34:52,880
a governance system that nobody is using.
806
00:34:52,880 –> 00:34:56,880
The pattern that scales is governance pipelines that are clear, fast and fair.
807
00:34:56,880 –> 00:35:00,880
Clear means teams understand why policies exist and what they’re trying to prevent.
808
00:35:00,880 –> 00:35:03,480
Fast means pipelines run in seconds, not minutes.
809
00:35:03,480 –> 00:35:06,360
Fair means policies apply equally to all teams.
810
00:35:06,360 –> 00:35:08,880
No special exceptions for high priority projects.
811
00:35:08,880 –> 00:35:10,560
No shortcuts for senior engineers.
812
00:35:10,560 –> 00:35:12,360
The system treats everyone the same.
813
00:35:12,360 –> 00:35:13,360
Real scenario.
814
00:35:13,360 –> 00:35:17,160
A pipeline that validates Azure policy compliance before deployment.
815
00:35:17,160 –> 00:35:21,040
A developer writes Bicep code that creates a storage account without encryption.
816
00:35:21,040 –> 00:35:22,760
The pipeline runs policy validation.
817
00:35:22,760 –> 00:35:23,760
The policy check fails.
818
00:35:23,760 –> 00:35:25,760
The developer sees the error immediately.
819
00:35:25,760 –> 00:35:28,040
They enable encryption in their code and resubmit.
820
00:35:28,040 –> 00:35:29,040
The pipeline passes.
821
00:35:29,040 –> 00:35:30,120
The code is deployed.
822
00:35:30,120 –> 00:35:31,280
This takes minutes.
823
00:35:31,280 –> 00:35:33,000
The developer learns the policy.
824
00:35:33,000 –> 00:35:34,240
They understand what’s required.
825
00:35:34,240 –> 00:35:35,560
They move on.
826
00:35:35,560 –> 00:35:39,800
Without this gate, non-compliant infrastructure reaches production and becomes harder to fix.
827
00:35:39,800 –> 00:35:41,240
You discover the problem later.
828
00:35:41,240 –> 00:35:42,920
You have to remediate in production.
829
00:35:42,920 –> 00:35:45,480
You have to explain the compliance violation to auditors.
830
00:35:45,480 –> 00:35:47,720
You have to figure out how it happened in the first place.
831
00:35:47,720 –> 00:35:49,040
All of this is expensive.
832
00:35:49,040 –> 00:35:51,800
All of it is preventable through shift-left security.
833
00:35:51,800 –> 00:35:53,600
The scaling problem is straightforward.
834
00:35:53,600 –> 00:35:56,880
As teams grow, manual compliance review becomes impossible.
835
00:35:56,880 –> 00:35:59,080
You cannot have a person review every deployment.
836
00:35:59,080 –> 00:36:02,040
You cannot have a security team approve every change.
837
00:36:02,040 –> 00:36:04,880
Automated pipelines are the only way to enforce governance at scale.
838
00:36:04,880 –> 00:36:06,800
They run the same checks for every deployment.
839
00:36:06,800 –> 00:36:08,880
They apply the same rules to every team.
840
00:36:08,880 –> 00:36:11,800
They provide consistent enforcement without human bottlenecks.
841
00:36:11,800 –> 00:36:16,120
This is where governance moves from manual process to automated enforcement.
842
00:36:16,120 –> 00:36:20,400
When policies are validated in CI/CD pipelines, when violations are prevented before they
843
00:36:20,400 –> 00:36:25,080
reach production, when the system itself makes doing the right thing the path of least resistance.
844
00:36:25,080 –> 00:36:26,840
That’s when erosion stops.
845
00:36:26,840 –> 00:36:31,200
That’s when architects move from reacting to incidents to preventing them at the source.
846
00:36:31,200 –> 00:36:33,600
Drift detection and continuous compliance.
847
00:36:33,600 –> 00:36:36,720
Drift is the gap between intended state and actual state.
848
00:36:36,720 –> 00:36:39,120
You define how your infrastructure should be configured.
849
00:36:39,120 –> 00:36:40,120
You deploy it.
850
00:36:40,120 –> 00:36:42,160
For a while, it matches your definition.
851
00:36:42,160 –> 00:36:43,160
Then something changes.
852
00:36:43,160 –> 00:36:46,800
A manual modification, an automatic update, a misconfigured resource.
853
00:36:46,800 –> 00:36:49,160
A permission that got assigned and never removed.
854
00:36:49,160 –> 00:36:51,840
Slowly the actual state diverges from the intended state.
855
00:36:51,840 –> 00:36:52,840
That’s drift.
856
00:36:52,840 –> 00:36:56,480
And if you’re not detecting it continuously, it’s accumulating silently while you’re not
857
00:36:56,480 –> 00:36:57,480
paying attention.
858
00:36:57,480 –> 00:36:59,200
Sources of drift are varied.
859
00:36:59,200 –> 00:37:02,040
Manual changes made through the portal instead of through code.
860
00:37:02,040 –> 00:37:05,840
Someone needs to troubleshoot an issue so they modify a configuration directly in the Azure
861
00:37:05,840 –> 00:37:06,840
console.
862
00:37:06,840 –> 00:37:08,320
They’re planning to update the code later.
863
00:37:08,320 –> 00:37:09,320
They never do.
864
00:37:09,320 –> 00:37:12,280
Now your actual infrastructure doesn’t match your IaC definition.
865
00:37:12,280 –> 00:37:13,960
Automatic updates applied by Azure.
866
00:37:13,960 –> 00:37:15,960
Microsoft patches a security vulnerability.
867
00:37:15,960 –> 00:37:17,560
Azure applies the patch automatically.
868
00:37:17,560 –> 00:37:21,040
Your infrastructure is now more secure but it doesn’t match your code anymore.
869
00:37:21,040 –> 00:37:23,640
Misconfigured resources that don’t match policy.
870
00:37:23,640 –> 00:37:27,280
A resource was created before the policy was deployed so it never got validated.
871
00:37:27,280 –> 00:37:29,240
Now it violates the policy but it’s still running.
872
00:37:29,240 –> 00:37:32,280
Abandoned resources that are no longer used but still incur costs.
873
00:37:32,280 –> 00:37:34,000
A project ended six months ago.
874
00:37:34,000 –> 00:37:35,520
The infrastructure is still running.
875
00:37:35,520 –> 00:37:36,960
Nobody remembers to clean it up.
876
00:37:36,960 –> 00:37:38,480
Why drift matters is this.
877
00:37:38,480 –> 00:37:41,040
Every unit of drift is a unit of governance failure.
878
00:37:41,040 –> 00:37:42,040
You intended one thing.
879
00:37:42,040 –> 00:37:43,040
You got something else.
880
00:37:43,040 –> 00:37:46,360
That gap is a signal that your architecture isn’t enforcing what should happen.
881
00:37:46,360 –> 00:37:49,760
It’s a signal that something is broken, and if you’re not detecting it, it’s compounding.
882
00:37:49,760 –> 00:37:52,080
The pattern that detects drift is straightforward.
883
00:37:52,080 –> 00:37:53,840
Define intended state in code.
884
00:37:53,840 –> 00:37:56,960
This is your IaC: Bicep, Terraform, whatever you’re using.
885
00:37:56,960 –> 00:37:58,320
This is your source of truth.
886
00:37:58,320 –> 00:38:00,080
Periodically scan actual state.
887
00:38:00,080 –> 00:38:02,640
Run a tool that looks at what’s actually deployed in Azure.
888
00:38:02,640 –> 00:38:04,320
Compare intended versus actual.
889
00:38:04,320 –> 00:38:05,480
Look for divergence.
890
00:38:05,480 –> 00:38:06,800
Alert on divergence.
891
00:38:06,800 –> 00:38:10,520
Automatically remediate or require manual approval depending on the severity.
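The steps above (define intended state, scan actual state, compare, then remediate or escalate by severity) can be sketched in a few lines of Python. The flat dictionaries stand in for whatever an IaC tool and a resource scan would really return, and the `LOW_RISK_KEYS` set is a hypothetical severity rule chosen for illustration.

```python
LOW_RISK_KEYS = {"tags"}  # hypothetical: settings considered safe to auto-remediate


def detect_drift(intended: dict, actual: dict) -> dict:
    """Return {setting: (intended_value, actual_value)} for every mismatch."""
    keys = intended.keys() | actual.keys()
    return {
        key: (intended.get(key), actual.get(key))
        for key in sorted(keys)
        if intended.get(key) != actual.get(key)
    }


def triage(drift: dict) -> dict:
    """Split drift into settings to auto-remediate and settings needing review."""
    auto = [key for key in drift if key in LOW_RISK_KEYS]
    review = [key for key in drift if key not in LOW_RISK_KEYS]
    return {"auto_remediate": auto, "manual_review": review}
```

For example, a missing tag lands in `auto_remediate`, while an SSH rule that was opened by hand lands in `manual_review`, because it might be a legitimate change that belongs in code.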
892
00:38:10,520 –> 00:38:15,960
The architecture that works at this layer includes infrastructure as code as the source of truth.
893
00:38:15,960 –> 00:38:18,440
Scheduled scans that compare code to actual resources.
894
00:38:18,440 –> 00:38:21,520
If you’re not scanning regularly, you’re not detecting drift.
895
00:38:21,520 –> 00:38:23,000
Alerts on divergence.
896
00:38:23,000 –> 00:38:24,320
Email, Slack, dashboard.
897
00:38:24,320 –> 00:38:26,800
However your organization communicates.
898
00:38:26,800 –> 00:38:28,640
Automated remediation for low-risk drift.
899
00:38:28,640 –> 00:38:30,080
If a tag is missing, add it.
900
00:38:30,080 –> 00:38:33,920
If a configuration drifts slightly, correct it. Manual approval for high-risk drift.
901
00:38:33,920 –> 00:38:38,760
If something changed in a way that might indicate a legitimate change, require a human
902
00:38:38,760 –> 00:38:40,920
to review it before reverting.
903
00:38:40,920 –> 00:38:45,080
This skill is valuable because designing drift detection that catches real problems
904
00:38:45,080 –> 00:38:47,720
without creating alert fatigue is harder than it sounds.
905
00:38:47,720 –> 00:38:50,400
Too sensitive and you’re alerting on every minor variation.
906
00:38:50,400 –> 00:38:51,640
You get alert fatigue.
907
00:38:51,640 –> 00:38:52,880
People stop paying attention.
908
00:38:52,880 –> 00:38:54,640
The signal disappears into noise.
909
00:38:54,640 –> 00:38:56,840
Too insensitive and you’re missing real drift.
910
00:38:56,840 –> 00:39:01,840
Resources diverge from intended state and nobody notices until an audit discovers the problem.
911
00:39:01,840 –> 00:39:02,840
Real scenario.
912
00:39:02,840 –> 00:39:07,800
A network security group is manually modified through the portal to allow SSH access for
913
00:39:07,800 –> 00:39:08,720
debugging.
914
00:39:08,720 –> 00:39:11,240
Drift detection finds the divergence and an alert is raised.
915
00:39:11,240 –> 00:39:14,280
The team reviews the change and decides whether it’s intentional.
916
00:39:14,280 –> 00:39:16,640
If intentional, the change is committed to code.
917
00:39:16,640 –> 00:39:19,480
Now your IaC matches your actual infrastructure.
918
00:39:19,480 –> 00:39:21,440
If unintentional, the change is reverted.
919
00:39:21,440 –> 00:39:23,360
The resource is restored to its intended state.
920
00:39:23,360 –> 00:39:26,120
Either way, the gap between intended and actual is closed.
921
00:39:26,120 –> 00:39:27,640
The scaling pattern is elegant.
922
00:39:27,640 –> 00:39:31,400
Once you’ve designed drift detection for one workload, you can apply it to new workloads
923
00:39:31,400 –> 00:39:33,800
and new teams, new regions, without starting over.
924
00:39:33,800 –> 00:39:37,040
You’re not creating drift detection from scratch for each new application.
925
00:39:37,040 –> 00:39:40,120
You’re instantiating a template that’s already been tested and proven.
926
00:39:40,120 –> 00:39:42,120
The cost of not doing this is substantial.
927
00:39:42,120 –> 00:39:43,520
Drift accumulates silently.
928
00:39:43,520 –> 00:39:46,640
You have no idea what your actual infrastructure looks like.
929
00:39:46,640 –> 00:39:49,720
Resources diverge from policy without anyone noticing.
930
00:39:49,720 –> 00:39:52,720
Compliance violations go undetected until an audit.
931
00:39:52,720 –> 00:39:55,400
Security vulnerabilities are introduced through manual changes.
932
00:39:55,400 –> 00:39:58,960
Cost optimization opportunities are missed because you don’t know what’s actually running.
933
00:39:58,960 –> 00:40:03,520
By the time you realize drift is a problem, you’ve got months or years of accumulated divergence
934
00:40:03,520 –> 00:40:04,720
to remediate.
935
00:40:04,720 –> 00:40:08,560
This is where continuous compliance becomes real: when drift is detected automatically,
936
00:40:08,560 –> 00:40:13,080
when divergence triggers alerts, when the system continuously compares actual to intended
937
00:40:13,080 –> 00:40:14,560
and flags mismatches.
938
00:40:14,560 –> 00:40:15,560
That’s when erosion stops.
939
00:40:15,560 –> 00:40:19,840
That’s when architects move from hoping people follow the rules to ensuring the system
940
00:40:19,840 –> 00:40:22,440
enforces them automatically.
941
00:40:22,440 –> 00:40:24,360
Management groups and hierarchical governance.
942
00:40:24,360 –> 00:40:28,760
A management group is a container for subscriptions that allows you to apply policies,
943
00:40:28,760 –> 00:40:31,240
RBAC, and other controls hierarchically.
944
00:40:31,240 –> 00:40:34,120
This is the organizational structure that makes governance scale.
945
00:40:34,120 –> 00:40:37,120
Without it, you’re managing governance at the subscription level, which means you’re
946
00:40:37,120 –> 00:40:39,640
duplicating rules across every subscription.
947
00:40:39,640 –> 00:40:44,080
With it, you define rules once at a high level and they cascade down automatically.
948
00:40:44,080 –> 00:40:45,800
Here’s why hierarchy matters.
949
00:40:45,800 –> 00:40:48,520
You have an organization with hundreds of subscriptions.
950
00:40:48,520 –> 00:40:52,760
You want to enforce a policy that says all resources must have encryption enabled.
951
00:40:52,760 –> 00:40:57,360
Without management groups, you have to apply that policy to every subscription individually.
952
00:40:57,360 –> 00:41:00,560
If you add a new subscription, you have to remember to apply the policy.
953
00:41:00,560 –> 00:41:04,040
If you want to update the policy, you have to update it in hundreds of places.
954
00:41:04,040 –> 00:41:05,040
This is not governance.
955
00:41:05,040 –> 00:41:06,560
This is chaos with extra steps.
956
00:41:06,560 –> 00:41:09,880
With management groups, you apply the policy once at the root level.
957
00:41:09,880 –> 00:41:11,720
Every subscription inherits it automatically.
958
00:41:11,720 –> 00:41:14,960
When you add a new subscription, it inherits the policy immediately.
959
00:41:14,960 –> 00:41:17,880
When you update the policy, the change propagates everywhere.
960
00:41:17,880 –> 00:41:19,360
This is governance that scales.
961
00:41:19,360 –> 00:41:21,320
The pattern that works has several levels.
962
00:41:21,320 –> 00:41:25,280
At the root management group, you define organization-wide policies.
963
00:41:25,280 –> 00:41:28,240
Encryption requirements, logging requirements, compliance frameworks.
964
00:41:28,240 –> 00:41:29,720
These are non-negotiable.
965
00:41:29,720 –> 00:41:31,840
Every part of the organization inherits them.
966
00:41:31,840 –> 00:41:34,720
Below that, you have business-unit management groups.
967
00:41:34,720 –> 00:41:36,440
Policies specific to that business unit.
968
00:41:36,440 –> 00:41:39,560
Maybe finance has different requirements than engineering.
969
00:41:39,560 –> 00:41:41,880
Maybe healthcare has different requirements than retail.
970
00:41:41,880 –> 00:41:46,160
Each business unit gets its own management group with policies tailored to its needs.
971
00:41:46,160 –> 00:41:48,320
Below that you have environment management groups.
972
00:41:48,320 –> 00:41:50,480
Production, staging, development.
973
00:41:50,480 –> 00:41:52,920
Each environment gets different policies.
974
00:41:52,920 –> 00:41:54,920
Production might require more stringent controls.
975
00:41:54,920 –> 00:41:57,760
Development might be more permissive to enable innovation.
976
00:41:57,760 –> 00:42:00,480
At the bottom, you have team management groups.
977
00:42:00,480 –> 00:42:02,240
Policies specific to that team’s needs.
978
00:42:02,240 –> 00:42:06,200
Why this prevents erosion is that governance is inherited down the hierarchy.
979
00:42:06,200 –> 00:42:08,360
You don’t have to redefine rules at every level.
980
00:42:08,360 –> 00:42:11,440
You don’t have to manually apply the same policy to every subscription.
981
00:42:11,440 –> 00:42:13,840
The system enforces hierarchy automatically.
982
00:42:13,840 –> 00:42:17,720
A policy defined at the root applies to every subscription in the organization.
983
00:42:17,720 –> 00:42:22,000
A policy defined at the business unit level applies to every subscription in that business
984
00:42:22,000 –> 00:42:23,000
unit.
985
00:42:23,000 –> 00:42:26,960
A policy defined at the environment level applies to every subscription in that environment.
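To make the inheritance rule concrete, here is a small Python sketch that computes the policies in effect at any scope by walking up to the root. The group names, the subscription name, and the policy names are all hypothetical; real assignments would come from Azure, not from a dictionary.

```python
# Hypothetical hierarchy: each management group or subscription maps to its parent.
PARENT = {
    "root": None,
    "finance": "root",
    "finance-prod": "finance",
    "sub-payments": "finance-prod",
}

# Hypothetical policy assignments made directly at each scope.
ASSIGNED = {
    "root": ["require-encryption", "require-cost-center-tag"],
    "finance": ["require-audit-logging"],
    "finance-prod": ["deny-public-ip"],
}


def effective_policies(scope: str) -> list[str]:
    """Collect policies from the root down to `scope`, mirroring inheritance."""
    chain = []
    node = scope
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    policies = []
    for ancestor in reversed(chain):  # root first, then each level below it
        policies.extend(ASSIGNED.get(ancestor, []))
    return policies
```

Nothing is assigned directly to `sub-payments`, yet it ends up with all four policies. Adding a new subscription means adding one entry to `PARENT`, which is the point of defining rules once at a high level.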
986
00:42:26,960 –> 00:42:30,320
The anti-pattern is a flat subscription structure with no management groups.
987
00:42:30,320 –> 00:42:34,280
This requires you to apply the same policies to every subscription manually.
988
00:42:34,280 –> 00:42:36,560
Policies are inconsistent across subscriptions.
989
00:42:36,560 –> 00:42:38,760
Some subscriptions have encryption enabled.
990
00:42:38,760 –> 00:42:39,760
Others don’t.
991
00:42:39,760 –> 00:42:41,520
Some subscriptions have logging configured.
992
00:42:41,520 –> 00:42:42,520
Others don’t.
993
00:42:42,520 –> 00:42:44,000
Governance becomes unmaintainable.
994
00:42:44,000 –> 00:42:48,240
You’re constantly discovering that a policy exists in some subscriptions but not others.
995
00:42:48,240 –> 00:42:53,000
You’re spending time on manual remediation instead of designing better governance.
996
00:42:53,000 –> 00:42:54,000
Real scenario.
997
00:42:54,000 –> 00:42:58,040
A policy that requires all resources to have a cost-center tag.
998
00:42:58,040 –> 00:43:00,600
The root management group policy is defined once.
999
00:43:00,600 –> 00:43:02,400
All subscriptions inherit the policy.
1000
00:43:02,400 –> 00:43:05,640
When the policy is updated, the change propagates to all subscriptions.
1001
00:43:05,640 –> 00:43:09,280
When a new subscription is created, it automatically inherits the policy.
1002
00:43:09,280 –> 00:43:10,520
No manual work required.
1003
00:43:10,520 –> 00:43:11,520
No inconsistency.
1004
00:43:11,520 –> 00:43:12,520
No exceptions.
1005
00:43:12,520 –> 00:43:15,680
The policy applies everywhere because it’s defined at the top and cascades down.
1006
00:43:15,680 –> 00:43:19,120
This skill is valuable because designing a management group hierarchy that scales
1007
00:43:19,120 –> 00:43:22,080
to hundreds of subscriptions and teams is harder than it sounds.
1008
00:43:22,080 –> 00:43:25,520
Too many levels and the hierarchy becomes unmaintainable.
1009
00:43:25,520 –> 00:43:28,600
You’ve got so many layers that nobody understands how policies cascade.
1010
00:43:28,600 –> 00:43:31,000
Too few levels and policies aren’t granular enough.
1011
00:43:31,000 –> 00:43:35,080
You’re forced to apply organization-wide policies that don’t fit every business unit’s
1012
00:43:35,080 –> 00:43:36,080
needs.
1013
00:43:36,080 –> 00:43:40,080
The sweet spot requires understanding both the organization structure and the technical
1014
00:43:40,080 –> 00:43:41,400
constraints of the system.
1015
00:43:41,400 –> 00:43:42,720
The scaling problem is real.
1016
00:43:42,720 –> 00:43:46,480
As organizations grow, management group hierarchies become complex.
1017
00:43:46,480 –> 00:43:50,040
You start with a simple three-level hierarchy, then you acquire another company.
1018
00:43:50,040 –> 00:43:51,800
Now you need to integrate their subscriptions.
1019
00:43:51,800 –> 00:43:53,600
Do you create a new branch in your hierarchy?
1020
00:43:53,600 –> 00:43:55,760
Do you reorganize the existing structure?
1021
00:43:55,760 –> 00:43:58,920
Do you create a separate hierarchy for the acquired company?
1022
00:43:58,920 –> 00:44:00,280
These decisions compound.
1023
00:44:00,280 –> 00:44:03,840
Before long your hierarchy is a mess of special cases and exceptions.
1024
00:44:03,840 –> 00:44:07,440
The pattern that scales is a hierarchy that’s deep enough to be granular but shallow enough
1025
00:44:07,440 –> 00:44:08,840
to be understandable.
1026
00:44:08,840 –> 00:44:11,080
Four or five levels is usually the sweet spot.
1027
00:44:11,080 –> 00:44:13,040
Root for organization-wide policies.
1028
00:44:13,040 –> 00:44:15,840
Business unit or geography for regional policies.
1029
00:44:15,840 –> 00:44:19,280
Environment for dev, test, production. Maybe one more level for specific applications or
1030
00:44:19,280 –> 00:44:20,280
teams.
1031
00:44:20,280 –> 00:44:23,040
Beyond that, you’re creating complexity that doesn’t add value.
1032
00:44:23,040 –> 00:44:27,600
Why this matters is that a well-designed hierarchy prevents governance from becoming a bottleneck
1033
00:44:27,600 –> 00:44:28,600
to innovation.
1034
00:44:28,600 –> 00:44:31,560
Teams can operate within their branch of the hierarchy with autonomy.
1035
00:44:31,560 –> 00:44:35,840
They inherit organization-wide policies that ensure security and compliance.
1036
00:44:35,840 –> 00:44:38,400
But they also get policies tailored to their needs.
1037
00:44:38,400 –> 00:44:41,520
This is where governance becomes an enabler instead of a blocker.
1038
00:44:41,520 –> 00:44:44,920
Teams move faster because the system enforces what should happen automatically.
1039
00:44:44,920 –> 00:44:46,440
They don’t have to think about compliance.
1040
00:44:46,440 –> 00:44:48,280
They don’t have to request exceptions.
1041
00:44:48,280 –> 00:44:52,280
The system is designed so that doing the right thing is the path of least resistance.
1042
00:44:52,280 –> 00:44:55,240
This is the foundation that makes everything else work.
1043
00:44:55,240 –> 00:44:58,520
Without a proper management group hierarchy, your policies are scattered.
1044
00:44:58,520 –> 00:44:59,920
Your controls are inconsistent.
1045
00:44:59,920 –> 00:45:01,480
Your governance is theater.
1046
00:45:01,480 –> 00:45:04,920
With a proper hierarchy, governance scales automatically.
1047
00:45:04,920 –> 00:45:06,720
Policies cascade down.
1048
00:45:06,720 –> 00:45:08,120
Controls are consistent.
1049
00:45:08,120 –> 00:45:11,240
The system enforces what should happen.
1050
00:45:11,240 –> 00:45:13,520
Bicep and infrastructure as code patterns.
1051
00:45:13,520 –> 00:45:18,120
Bicep is Microsoft’s domain-specific language for defining Azure infrastructure as code.
1052
00:45:18,120 –> 00:45:19,120
It’s not the only option.
1053
00:45:19,120 –> 00:45:20,120
You can use Terraform.
1054
00:45:20,120 –> 00:45:21,440
You can use ARM templates.
1055
00:45:21,440 –> 00:45:23,800
You can use CloudFormation if you’re on AWS.
1056
00:45:23,800 –> 00:45:28,280
But Bicep is what matters if you’re building on Azure because it’s designed specifically for Azure.
1057
00:45:28,280 –> 00:45:30,520
It understands Azure resources natively.
1058
00:45:30,520 –> 00:45:32,800
It integrates with Azure tooling seamlessly.
1059
00:45:32,800 –> 00:45:36,240
And most importantly, it allows you to define infrastructure in a way that’s readable,
1060
00:45:36,240 –> 00:45:37,920
maintainable, and version-controlled.
1061
00:45:37,920 –> 00:45:40,680
Why Bicep matters in the context of governance is this.
1062
00:45:40,680 –> 00:45:45,960
Infrastructure defined in Bicep is infrastructure that can be reviewed, tested, and enforced.
1063
00:45:45,960 –> 00:45:50,440
When infrastructure is defined in code, you can run it through your governance pipelines.
1064
00:45:50,440 –> 00:45:53,080
You can validate it against your policies before it’s deployed.
1065
00:45:53,080 –> 00:45:54,960
You can track changes through Git history.
1066
00:45:54,960 –> 00:45:57,480
You can understand exactly who changed what and when.
1067
00:45:57,480 –> 00:46:02,880
This is fundamentally different from infrastructure created through the portal or through ad hoc scripts.
1068
00:46:02,880 –> 00:46:08,080
The pattern that works includes defining reusable modules, a storage account module, a virtual machine module,
1069
00:46:08,080 –> 00:46:09,280
a network module.
1070
00:46:09,280 –> 00:46:13,040
These modules encapsulate the complexity of creating a resource correctly.
1071
00:46:13,040 –> 00:46:14,520
They enforce best practices.
1072
00:46:14,520 –> 00:46:18,800
They ensure consistency. When a team needs a storage account, they don’t create it from scratch.
1073
00:46:18,800 –> 00:46:23,400
They use the storage account module. The module enforces encryption, the module enforces tagging,
1074
00:46:23,400 –> 00:46:24,720
the module enforces logging.
1075
00:46:24,720 –> 00:46:28,280
The team doesn’t have to remember all these requirements. The module enforces them automatically.
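The idea of a module that bakes in its own requirements can be illustrated with a short Python sketch. A real module would be written in Bicep; Python is used here only to keep the example runnable, and the function name, the required tag names, and the dictionary shape are hypothetical.

```python
REQUIRED_TAGS = ("cost-center", "owner")  # hypothetical tagging standard


def storage_account_module(name: str, location: str, tags: dict) -> dict:
    """Build a storage account definition with governance settings baked in."""
    missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
    if missing:
        raise ValueError(f"missing required tags: {', '.join(missing)}")
    return {
        "type": "storageAccount",
        "name": name,
        "location": location,
        "tags": dict(tags),
        # Callers cannot switch these off; the module always sets them.
        "encryption": True,
        "diagnostic_logging": True,
    }
```

The caller supplies only a name, a location, and tags. Encryption and logging are not parameters at all, so there is no way to forget them or turn them off.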
1076
00:46:28,280 –> 00:46:32,280
You compose modules into larger templates: a landing zone template that includes storage,
1077
00:46:32,280 –> 00:46:34,480
networking, identity, and monitoring.
1078
00:46:34,480 –> 00:46:38,240
An application template that includes compute, databases, and load balancing.
1079
00:46:38,240 –> 00:46:40,000
These templates are versioned in Git.
1080
00:46:40,000 –> 00:46:41,600
They’re reviewed through pull requests.
1081
00:46:41,600 –> 00:46:43,040
They’re tested before deployment.
1082
00:46:43,040 –> 00:46:46,920
They are deployed through pipelines that validate them against your governance policies.
1083
00:46:46,920 –> 00:46:49,480
Why this prevents erosion is straightforward.
1084
00:46:49,480 –> 00:46:52,240
Infrastructure defined in code is infrastructure that’s repeatable.
1085
00:46:52,240 –> 00:46:55,400
You deploy the same landing zone to 10 different teams and it’s identical.
1086
00:46:55,400 –> 00:46:59,240
You deploy an application template to 10 different regions and it’s consistent.
1087
00:46:59,240 –> 00:47:01,120
You’re not relying on manual configuration.
1088
00:47:01,120 –> 00:47:04,240
You’re not relying on people remembering the right way to do things.
1089
00:47:04,240 –> 00:47:06,680
The code enforces consistency automatically.
1090
00:47:06,680 –> 00:47:10,960
The anti-pattern is infrastructure defined through the portal or through ad hoc scripts.
1091
00:47:10,960 –> 00:47:12,240
Changes aren’t reviewed.
1092
00:47:12,240 –> 00:47:13,440
Changes aren’t auditable.
1093
00:47:13,440 –> 00:47:14,960
Changes aren’t repeatable.
1094
00:47:14,960 –> 00:47:19,160
You create a resource one way in one subscription and a different way in another subscription.
1095
00:47:19,160 –> 00:47:24,200
By the time you realize the inconsistency, you’ve got technical debt spread across your entire environment.
1096
00:47:24,200 –> 00:47:25,200
Real scenario.
1097
00:47:25,200 –> 00:47:27,400
Defining a landing zone in Bicep.
1098
00:47:27,400 –> 00:47:32,120
The template defines management groups, subscriptions, policy assignments, network configuration,
1099
00:47:32,120 –> 00:47:34,920
identity configuration, monitoring infrastructure.
1100
00:47:34,920 –> 00:47:39,400
When the template is deployed, the entire landing zone is created consistently.
1101
00:47:39,400 –> 00:47:42,240
Every deployment of that template produces identical results.
1102
00:47:42,240 –> 00:47:45,520
When the template is updated, all landing zones inherit the change.
1103
00:47:45,520 –> 00:47:48,400
You’re not manually updating 10 different landing zones.
1104
00:47:48,400 –> 00:47:51,200
You update the template once and the change propagates everywhere.
1105
00:47:51,200 –> 00:47:56,360
Why this skill is valuable is because designing Bicep templates that are reusable, maintainable,
1106
00:47:56,360 –> 00:47:59,000
and enforce governance is harder than it sounds.
1107
00:47:59,000 –> 00:48:01,680
A template that’s too generic doesn’t enforce governance.
1108
00:48:01,680 –> 00:48:03,920
It’s just a collection of resources without constraints.
1109
00:48:03,920 –> 00:48:08,320
A template that’s too specific can’t be reused across different teams or regions.
1110
00:48:08,320 –> 00:48:12,800
The sweet spot requires understanding both the technical requirements and the organizational needs.
1111
00:48:12,800 –> 00:48:14,440
The scaling pattern is elegant.
1112
00:48:14,440 –> 00:48:18,160
Once you’ve designed a landing zone template, you can deploy it to new business units,
1113
00:48:18,160 –> 00:48:20,840
new regions, new teams without reinventing governance.
1114
00:48:20,840 –> 00:48:23,760
You’re not creating governance from scratch for each new deployment.
1115
00:48:23,760 –> 00:48:27,120
You’re instantiating a template that’s already been tested and proven.
1116
00:48:27,120 –> 00:48:29,840
The first landing zone takes weeks to design and deploy.
1117
00:48:29,840 –> 00:48:31,400
The second one takes days.
1118
00:48:31,400 –> 00:48:32,840
The third one takes hours.
1119
00:48:32,840 –> 00:48:37,520
By the time you’ve deployed your tenth landing zone, you’ve got a repeatable process that works.
1120
00:48:37,520 –> 00:48:39,200
The distinction that matters is this.
1121
00:48:39,200 –> 00:48:42,000
Bicep templates that define infrastructure are useful,
1122
00:48:42,000 –> 00:48:45,760
but Bicep templates that define infrastructure and enforce governance are valuable.
1123
00:48:45,760 –> 00:48:48,000
A template that creates a storage account is nice.
1124
00:48:48,000 –> 00:48:51,040
A template that creates a storage account with encryption enabled,
1125
00:48:51,040 –> 00:48:53,880
with the right tags, with the right logging, with the right access controls,
1126
00:48:53,880 –> 00:48:55,240
that’s a template that scales.
1127
00:48:55,240 –> 00:48:56,960
That’s a template that prevents erosion.
1128
00:48:56,960 –> 00:49:01,120
That’s the skill that commands premium compensation in 2026.
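To make that storage account distinction concrete, here is a minimal Python sketch. It is not Bicep and it does not use the real Azure resource schema. The names `StorageRequest`, `governed_storage_account`, and `REQUIRED_TAGS`, and the keys in the returned dictionary, are all hypothetical. The point it illustrates is that governance settings are fixed inside the template rather than exposed as parameters.

```python
from dataclasses import dataclass

# Hypothetical tag set; a real organization would define its own.
REQUIRED_TAGS = ("owner", "cost_center", "environment")


@dataclass(frozen=True)
class StorageRequest:
    """The only inputs a team is allowed to vary (hypothetical parameter set)."""
    name: str
    region: str
    tags: dict


def governed_storage_account(req: StorageRequest) -> dict:
    """Return a resource definition whose governance settings are fixed by the template."""
    missing = [tag for tag in REQUIRED_TAGS if tag not in req.tags]
    if missing:
        raise ValueError(f"missing required tags: {missing}")
    return {
        "name": req.name,
        "region": req.region,
        "tags": dict(req.tags),
        # Governance settings are not parameters, so callers cannot turn them off.
        "encryption_enabled": True,
        "diagnostic_logging": True,
        "public_access": False,
    }


account = governed_storage_account(
    StorageRequest(
        name="teamadata01",
        region="westeurope",
        tags={"owner": "team-a", "cost_center": "1234", "environment": "prod"},
    )
)
print(account["encryption_enabled"])  # True
```

A template that exposed `encryption_enabled` as a parameter would be the "too generic" case. One that hard-coded `name` and `region` would be the "too specific" case.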
1129
00:49:01,120 –> 00:49:03,480
Conditional access and zero trust architecture.
1130
00:49:03,480 –> 00:49:06,400
Conditional access is a policy engine that evaluates context
1131
00:49:06,400 –> 00:49:08,680
and makes access decisions based on that context.
1132
00:49:08,680 –> 00:49:12,960
Location, device, risk level, time of day, anomalies.
1133
00:49:12,960 –> 00:49:17,720
The system gathers signals about who’s trying to access what and where they’re trying to access it from.
1134
00:49:17,720 –> 00:49:18,960
Then it makes a decision.
1135
00:49:18,960 –> 00:49:21,840
Allow, block, require additional verification.
1136
00:49:21,840 –> 00:49:25,840
This is fundamentally different from static RBAC assignments that never change.
1137
00:49:25,840 –> 00:49:27,680
Why conditional access matters is this.
1138
00:49:27,680 –> 00:49:31,120
It allows you to enforce zero trust principles at scale.
1139
00:49:31,120 –> 00:49:32,840
Zero trust is a simple idea.
1140
00:49:32,840 –> 00:49:35,120
Assume breach, verify every access request.
1141
00:49:35,120 –> 00:49:37,840
Grant least privilege. Don’t trust anything by default.
1142
00:49:37,840 –> 00:49:39,760
Verify everything continuously.
1143
00:49:39,760 –> 00:49:42,400
Most organizations operate on the opposite principle.
1144
00:49:42,400 –> 00:49:44,080
They assume their network is secure.
1145
00:49:44,080 –> 00:49:46,680
They assume that if you’re inside the network, you’re trusted.
1146
00:49:46,680 –> 00:49:50,360
They assume that once you’ve been granted access, you keep that access forever.
1147
00:49:50,360 –> 00:49:52,800
These assumptions are wrong and they’re expensive.
1148
00:49:52,800 –> 00:49:55,840
The pattern that works at this layer includes baseline policies.
1149
00:49:55,840 –> 00:49:59,560
Multifactor authentication required, compliant device required.
1150
00:49:59,560 –> 00:50:01,880
The system evaluates risk in real time.
1151
00:50:01,880 –> 00:50:05,760
Impossible travel, anomalous sign-in location, suspicious activity.
1152
00:50:05,760 –> 00:50:09,640
If risk is high, the system blocks or requires additional verification.
1153
00:50:09,640 –> 00:50:11,560
The system grants least privilege access.
1154
00:50:11,560 –> 00:50:13,400
The minimum permissions needed for the task.
1155
00:50:13,400 –> 00:50:16,040
Not the maximum permissions the person might ever need.
1156
00:50:16,040 –> 00:50:18,200
Not the permissions they had in their last role.
1157
00:50:18,200 –> 00:50:21,120
Just the permissions they need right now for this specific task.
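The decision flow described here can be sketched in a few lines of Python. This is an illustration only, not how the real policy engine is configured. The `SignIn` fields, the `evaluate` function, and the choice of which signal maps to which outcome are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_VERIFICATION = "require additional verification"
    BLOCK = "block"


@dataclass(frozen=True)
class SignIn:
    """Signals gathered about one access attempt (hypothetical field names)."""
    mfa_passed: bool
    device_compliant: bool
    impossible_travel: bool
    unfamiliar_location: bool


def evaluate(sign_in: SignIn) -> Decision:
    # Real-time risk: impossible travel is treated as high risk here.
    if sign_in.impossible_travel:
        return Decision.BLOCK
    # Baseline policy: a compliant device is required.
    if not sign_in.device_compliant:
        return Decision.BLOCK
    # Baseline policy: MFA is required. An unfamiliar location also asks for a step-up.
    if not sign_in.mfa_passed or sign_in.unfamiliar_location:
        return Decision.REQUIRE_VERIFICATION
    return Decision.ALLOW


print(evaluate(SignIn(True, True, False, False)).value)  # allow
print(evaluate(SignIn(True, True, True, False)).value)   # block
```

The decision is recomputed on every attempt from the current signals, which is the difference from a static assignment.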
1158
00:50:21,120 –> 00:50:27,480
Why this prevents erosion is that access is continuously evaluated and adjusted based on context.
1159
00:50:27,480 –> 00:50:30,160
A user’s permissions don’t just stay the same forever.
1160
00:50:30,160 –> 00:50:31,400
They change based on risk.
1161
00:50:31,400 –> 00:50:33,840
Based on location, based on behavior.
1162
00:50:33,840 –> 00:50:38,360
If a user suddenly tries to access resources from a country they’ve never accessed from before,
1163
00:50:38,360 –> 00:50:39,880
the system notices.
1164
00:50:39,880 –> 00:50:43,680
If a user tries to access resources at three in the morning when they normally access them
1165
00:50:43,680 –> 00:50:45,680
at nine in the morning, the system notices.
1166
00:50:45,680 –> 00:50:50,320
If a user suddenly tries to access resources they’ve never accessed before, the system notices.
1167
00:50:50,320 –> 00:50:53,040
And the system responds. It might require additional verification.
1168
00:50:53,040 –> 00:50:54,760
It might block the access entirely.
1169
00:50:54,760 –> 00:50:57,400
It might grant temporary access with additional monitoring.
1170
00:50:57,400 –> 00:51:00,760
The anti-pattern is static RBAC assignments that never change.
1171
00:51:00,760 –> 00:51:04,040
A user is assigned the contributor role and keeps it forever.
1172
00:51:04,040 –> 00:51:07,120
When the user’s role changes, nobody remembers to update the assignment.
1173
00:51:07,120 –> 00:51:08,920
The user has permissions they no longer need.
1174
00:51:08,920 –> 00:51:12,920
The user leaves the company and their account is disabled, but the permissions are still assigned.
1175
00:51:12,920 –> 00:51:17,880
The user’s credentials are compromised and the attacker has access to everything the user had access to.
1176
00:51:17,880 –> 00:51:19,720
None of this is prevented by static RBAC.
1177
00:51:19,720 –> 00:51:21,360
Static RBAC is governance theater.
1178
00:51:21,360 –> 00:51:23,840
It looks like you’re controlling access, but you’re not.
1179
00:51:23,840 –> 00:51:24,760
Real scenario.
1180
00:51:24,760 –> 00:51:28,760
A developer needs temporary access to a production database to troubleshoot an issue.
1181
00:51:28,760 –> 00:51:32,960
Instead of assigning a permanent contributor role, use conditional access to grant temporary access.
1182
00:51:32,960 –> 00:51:33,760
One hour.
1183
00:51:33,760 –> 00:51:36,920
Require MFA. Require the request to be approved by a manager.
1184
00:51:36,920 –> 00:51:38,600
Log the access for audit purposes.
1185
00:51:38,600 –> 00:51:41,880
When the hour expires, access is revoked automatically.
1186
00:51:41,880 –> 00:51:44,040
The developer can’t access the database anymore.
1187
00:51:44,040 –> 00:51:46,480
If they need access again, they have to request it again.
1188
00:51:46,480 –> 00:51:47,480
This is least privilege.
1189
00:51:47,480 –> 00:51:48,560
This is zero trust.
1190
00:51:48,560 –> 00:51:50,000
This is how you prevent erosion.
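The developer scenario above can be modeled as a time-bound grant. What follows is a minimal Python sketch of the behavior, not an Azure API. In Microsoft Entra, time-bound and approval-gated role activation is typically handled by Privileged Identity Management. The names `Grant`, `request_access`, `has_access`, and `audit_log`, and the example identities, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    user: str
    resource: str
    expires_at: datetime


# Every successful grant is recorded for audit purposes.
audit_log = []


def request_access(user, resource, mfa_passed, approved_by, now,
                   duration=timedelta(hours=1)):
    """Issue a time-bound grant only if MFA passed and a manager approved."""
    if not mfa_passed:
        raise PermissionError("MFA required")
    if not approved_by:
        raise PermissionError("manager approval required")
    grant = Grant(user, resource, now + duration)
    audit_log.append(f"{now.isoformat()} {user} granted {resource} by {approved_by}")
    return grant


def has_access(grant, now):
    """Access lapses on its own once the grant expires; nobody has to revoke it."""
    return now < grant.expires_at


start = datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)
grant = request_access("dev@example.com", "prod-db", True, "manager@example.com", start)
print(has_access(grant, start + timedelta(minutes=30)))  # True
print(has_access(grant, start + timedelta(hours=1)))     # False
```

Expiry is checked at the moment of use rather than relying on a cleanup job, so a forgotten grant cannot outlive its hour.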
1191
00:51:50,000 –> 00:51:56,840
Why this skill is valuable is because designing conditional access policies that enforce zero trust without creating friction is harder than it sounds.
1192
00:51:56,840 –> 00:51:59,560
A policy that’s too strict blocks legitimate use cases.
1193
00:51:59,560 –> 00:52:00,840
Users can’t do their jobs.
1194
00:52:00,840 –> 00:52:03,440
A policy that’s too loose doesn’t prevent erosion.
1195
00:52:03,440 –> 00:52:05,560
Overprivileged identities persist.
1196
00:52:05,560 –> 00:52:10,120
The sweet spot requires understanding both the security requirements and the operational workflow.
1197
00:52:10,120 –> 00:52:11,560
The scaling pattern is elegant.
1198
00:52:11,560 –> 00:52:14,640
Once you’ve designed conditional access policies for one scenario,
1199
00:52:14,640 –> 00:52:18,800
you can apply them to new scenarios, new teams, new regions without starting over.
1200
00:52:18,800 –> 00:52:22,400
You’re not creating access controls from scratch for each new use case.
1201
00:52:22,400 –> 00:52:25,600
You’re instantiating a template that’s already been tested and proven.
1202
00:52:25,600 –> 00:52:30,080
An organization with mature conditional access policies can onboard a new user,
1203
00:52:30,080 –> 00:52:34,960
grant them appropriate access and revoke it when they leave, all without manual intervention.
1204
00:52:34,960 –> 00:52:38,920
An organization without conditional access is constantly dealing with access requests,
1205
00:52:38,920 –> 00:52:40,800
access reviews and access cleanup.
1206
00:52:40,800 –> 00:52:43,120
The cost of not doing this is substantial.
1207
00:52:43,120 –> 00:52:46,640
Overprivileged identities are a leading cause of security breaches.
1208
00:52:46,640 –> 00:52:51,320
Attackers compromise a single credential and suddenly have access to everything that credential had access to.
1209
00:52:51,320 –> 00:52:54,880
If that credential had excessive permissions, the blast radius is enormous.
1210
00:52:54,880 –> 00:52:59,000
Conditional access reduces that blast radius by ensuring that permissions are scoped
1211
00:52:59,000 –> 00:53:02,920
to what’s actually needed and continuously evaluated based on context.
1212
00:53:02,920 –> 00:53:06,600
This is where governance moves from static rules to dynamic enforcement.
1213
00:53:06,600 –> 00:53:10,560
When access is continuously evaluated, when risk triggers automatic responses,
1214
00:53:10,560 –> 00:53:14,920
when the system adjusts permissions based on context, that’s when erosion stops.
1215
00:53:14,920 –> 00:53:18,440
That’s when architects move from hoping people follow the rules to ensuring the system
1216
00:53:18,440 –> 00:53:21,200
enforces them automatically based on real-time signals.
1217
00:53:21,200 –> 00:53:23,440
Defender for cloud and compliance automation.
1218
00:53:23,440 –> 00:53:27,120
Defender for cloud is Azure’s security posture management service.
1219
00:53:27,120 –> 00:53:32,040
It continuously scans your environment and alerts on misconfigurations, vulnerabilities and compliance violations.
1220
00:53:32,040 –> 00:53:33,040
It’s not optional.
1221
00:53:33,040 –> 00:53:36,640
If you’re operating Azure without Defender for cloud, you’re operating blind.
1222
00:53:36,640 –> 00:53:39,280
You have no visibility into whether your infrastructure is secure.
1223
00:53:39,280 –> 00:53:41,880
You have no visibility into whether you’re compliant.
1224
00:53:41,880 –> 00:53:43,520
You’re just hoping nothing goes wrong.
1225
00:53:43,520 –> 00:53:44,480
Here’s how it works.
1226
00:53:44,480 –> 00:53:47,000
You enable Defender for cloud on all your subscriptions.
1227
00:53:47,000 –> 00:53:49,000
It immediately starts scanning your resources.
1228
00:53:49,000 –> 00:53:50,880
It looks at your configurations.
1229
00:53:50,880 –> 00:53:52,640
It compares them against security benchmarks.
1230
00:53:52,640 –> 00:53:58,120
Azure Security Benchmark, CIS controls, NIST, PCI DSS, whatever frameworks your organization cares about.
1231
00:53:58,120 –> 00:54:01,440
It identifies violations, resources that don’t match the benchmark,
1232
00:54:01,440 –> 00:54:05,760
resources that violate compliance requirements, resources that have known vulnerabilities.
1233
00:54:05,760 –> 00:54:07,400
It alerts you to every deviation.
1234
00:54:07,400 –> 00:54:11,440
The pattern that works includes enabling Defender for cloud on all subscriptions.
1235
00:54:11,440 –> 00:54:14,640
Not some subscriptions. All subscriptions. Configure security standards,
1236
00:54:14,640 –> 00:54:17,120
choose the frameworks that matter to your organization.
1237
00:54:17,120 –> 00:54:19,040
Monitor compliance against those standards.
1238
00:54:19,040 –> 00:54:21,520
Remediate violations automatically where possible.
1239
00:54:21,520 –> 00:54:24,000
Escalate violations that require manual review.
1240
00:54:24,000 –> 00:54:27,160
This is where governance becomes continuous instead of episodic.
1241
00:54:27,160 –> 00:54:29,480
Why Defender for cloud matters is this.
1242
00:54:29,480 –> 00:54:33,080
It detects problems continuously, not during annual audits.
1243
00:54:33,080 –> 00:54:36,000
You discover a compliance violation the day it happens,
1244
00:54:36,000 –> 00:54:38,240
not six months later when an auditor finds it.
1245
00:54:38,240 –> 00:54:41,080
You discover a vulnerability the moment it’s identified,
1246
00:54:41,080 –> 00:54:43,160
not after an attacker exploits it.
1247
00:54:43,160 –> 00:54:45,640
You discover a misconfiguration the instant it’s deployed,
1248
00:54:45,640 –> 00:54:47,880
not after it’s been running in production for months.
1249
00:54:47,880 –> 00:54:51,320
This is where governance moves from reactive to proactive.
1250
00:54:51,320 –> 00:54:52,800
The distinction that matters is this.
1251
00:54:52,800 –> 00:54:54,640
Defender for cloud is detection.
1252
00:54:54,640 –> 00:54:56,320
Azure policy is prevention.
1253
00:54:56,320 –> 00:54:58,280
Defender finds problems after they exist.
1254
00:54:58,280 –> 00:55:00,920
Azure policy prevents problems from being created.
1255
00:55:00,920 –> 00:55:03,680
Together, they form a defense-in-depth approach.
1256
00:55:03,680 –> 00:55:07,960
Azure policy stops non-compliant resources from being deployed in the first place.
1257
00:55:07,960 –> 00:55:11,200
Defender for cloud finds resources that somehow got deployed anyway.
1258
00:55:11,200 –> 00:55:13,400
Maybe they were created before the policy existed.
1259
00:55:13,400 –> 00:55:17,000
Maybe they were created through a manual process that bypassed the policy.
1260
00:55:17,000 –> 00:55:18,920
Maybe they drifted after deployment.
1261
00:55:18,920 –> 00:55:20,360
Defender catches all of these.
1262
00:55:20,360 –> 00:55:24,400
The combination of prevention and detection is what creates real governance.
1263
00:55:24,400 –> 00:55:27,320
Real scenario. A compliance requirement that all storage accounts
1264
00:55:27,320 –> 00:55:28,960
must have encryption enabled.
1265
00:55:28,960 –> 00:55:32,440
Azure policy prevents creation of non-encrypted storage accounts.
1266
00:55:32,440 –> 00:55:35,640
Defender for cloud detects existing non-encrypted storage accounts.
1267
00:55:35,640 –> 00:55:38,320
Together, they ensure encryption is always enabled.
1268
00:55:38,320 –> 00:55:39,960
Policy prevents new violations.
1269
00:55:39,960 –> 00:55:41,920
Defender finds old violations.
1270
00:55:41,920 –> 00:55:45,720
The organization gradually becomes compliant as old resources are remediated
1271
00:55:45,720 –> 00:55:48,400
and new resources are prevented from being non-compliant.
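The split between prevention and detection in that storage scenario can be sketched in Python. This is a toy model under stated assumptions, not the Azure Policy or Defender interface. The functions `deny_unencrypted`, `find_unencrypted`, and `deploy`, and the `inventory` list, are hypothetical.

```python
def deny_unencrypted(resource):
    """Prevention: reject a new storage account that is not encrypted."""
    if resource["type"] == "storage_account" and not resource.get("encrypted", False):
        raise PermissionError(f"{resource['name']}: encryption is required")


def find_unencrypted(inventory):
    """Detection: list existing storage accounts that are not encrypted."""
    return [
        r["name"]
        for r in inventory
        if r["type"] == "storage_account" and not r.get("encrypted", False)
    ]


# Resources that already exist, including one that predates the policy.
inventory = [
    {"name": "legacy01", "type": "storage_account", "encrypted": False},
    {"name": "newdata01", "type": "storage_account", "encrypted": True},
]


def deploy(resource):
    deny_unencrypted(resource)  # prevention runs before anything is created
    inventory.append(resource)


print(find_unencrypted(inventory))  # ['legacy01']
```

The deny check keeps the list of violations from growing, and the scan is what surfaces `legacy01` so it can be remediated.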
1272
00:55:48,400 –> 00:55:51,600
Why this skill is valuable is because designing compliance automation
1273
00:55:51,600 –> 00:55:55,080
that works across your entire Azure environment is harder than it sounds.
1274
00:55:55,080 –> 00:55:56,880
Defender for cloud generates alerts.
1275
00:55:56,880 –> 00:55:57,600
Lots of alerts.
1276
00:55:57,600 –> 00:56:00,480
If you don’t have a process for handling those alerts, they become noise.
1277
00:56:00,480 –> 00:56:01,640
You get alert fatigue.
1278
00:56:01,640 –> 00:56:02,920
People stop paying attention.
1279
00:56:02,920 –> 00:56:04,560
The signal disappears into the background.
1280
00:56:04,560 –> 00:56:07,520
The skill is designing a system where alerts are meaningful,
1281
00:56:07,520 –> 00:56:09,200
where violations are remediated,
1282
00:56:09,200 –> 00:56:11,400
where compliance becomes automatic instead of manual.
1283
00:56:11,400 –> 00:56:12,800
The scaling problem is real.
1284
00:56:12,800 –> 00:56:14,880
Compliance requirements vary by team,
1285
00:56:14,880 –> 00:56:17,680
by business unit, by region, by regulatory framework.
1286
00:56:17,680 –> 00:56:21,360
You need a system that can handle this complexity without becoming unmaintainable.
1287
00:56:21,360 –> 00:56:24,400
You need policies that are specific enough to catch real violations
1288
00:56:24,400 –> 00:56:26,760
but broad enough to apply across your organization.
1289
00:56:26,760 –> 00:56:29,760
You need alerts that are actionable, not theoretical.
1290
00:56:29,760 –> 00:56:32,280
You need remediation that’s automated, not manual.
1291
00:56:32,280 –> 00:56:34,880
The pattern that scales is governance frameworks
1292
00:56:34,880 –> 00:56:37,320
that are composed of smaller, reusable pieces.
1293
00:56:37,320 –> 00:56:40,840
A policy for encryption, a policy for tagging, a policy for logging.
1294
00:56:40,840 –> 00:56:42,800
Combine these into a compliance standard,
1295
00:56:42,800 –> 00:56:45,680
apply the standard to different scopes based on requirements.
1296
00:56:45,680 –> 00:56:48,200
Different business units might have different standards.
1297
00:56:48,200 –> 00:56:50,000
Different regions might have different requirements,
1298
00:56:50,000 –> 00:56:51,840
but the underlying policies are reusable.
1299
00:56:51,840 –> 00:56:54,960
You’re not creating compliance from scratch for each new team.
1300
00:56:54,960 –> 00:56:58,880
You’re combining existing policies into standards that fit the team’s needs.
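One way to picture that composition is as small policy functions combined into named standards, with each scope assigned a standard. The Python sketch below is illustrative only. The policy functions, the `STANDARDS` and `ASSIGNMENTS` tables, and the scope names are assumptions for the example, not Azure constructs.

```python
from typing import Callable, Dict, List

Policy = Callable[[dict], bool]


def encryption(resource: dict) -> bool:
    return resource.get("encrypted") is True


def tagging(resource: dict) -> bool:
    return "owner" in resource.get("tags", {})


def logging_enabled(resource: dict) -> bool:
    return resource.get("logging") is True


# Standards are named combinations of the same reusable policies.
STANDARDS: Dict[str, List[Policy]] = {
    "baseline": [encryption, tagging],
    "regulated": [encryption, tagging, logging_enabled],
}

# Hypothetical scopes mapped to the standard that applies to each.
ASSIGNMENTS = {"marketing": "baseline", "payments": "regulated"}


def violations(scope: str, resource: dict) -> List[str]:
    """Names of the policies this resource fails under its scope's standard."""
    standard = STANDARDS[ASSIGNMENTS[scope]]
    return [policy.__name__ for policy in standard if not policy(resource)]


resource = {"encrypted": True, "tags": {"owner": "team-a"}, "logging": False}
print(violations("marketing", resource))  # []
print(violations("payments", resource))   # ['logging_enabled']
```

Onboarding a new team then means adding one entry to `ASSIGNMENTS`, not writing new policies.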
1301
00:56:58,880 –> 00:57:01,880
Why this matters is that compliance automation is the only way
1302
00:57:01,880 –> 00:57:05,000
to enforce governance at scale without hiring a compliance team.
1303
00:57:05,000 –> 00:57:07,720
You cannot have a person review every resource in your environment.
1304
00:57:07,720 –> 00:57:10,880
You cannot have a security team audit every configuration.
1305
00:57:10,880 –> 00:57:13,960
Automated tools are the only way to enforce governance at scale.
1306
00:57:13,960 –> 00:57:15,760
Defender for cloud provides the visibility.
1307
00:57:15,760 –> 00:57:17,440
Azure policy provides the prevention.
1308
00:57:17,440 –> 00:57:21,120
Together they create a system where compliance is enforced automatically,
1309
00:57:21,120 –> 00:57:22,800
continuously, at scale.
1310
00:57:22,800 –> 00:57:26,440
This is where governance moves from manual audit to continuous enforcement.
1311
00:57:26,440 –> 00:57:29,040
When violations are detected automatically,
1312
00:57:29,040 –> 00:57:31,640
when remediation is triggered by policy,
1313
00:57:31,640 –> 00:57:34,880
when compliance is measured continuously instead of annually,
1314
00:57:34,880 –> 00:57:36,280
that’s when erosion stops.
1315
00:57:36,280 –> 00:57:38,760
That’s when organizations move from crossing their fingers
1316
00:57:38,760 –> 00:57:42,640
and hoping for the best to ensuring the system enforces what should happen.
1317
00:57:42,640 –> 00:57:45,280
The governance scorecard and measuring what matters.
1318
00:57:45,280 –> 00:57:46,920
You can’t improve what you don’t measure.
1319
00:57:46,920 –> 00:57:48,720
Most organizations measure the wrong things.
1320
00:57:48,720 –> 00:57:51,160
They measure the number of policies they’ve deployed.
1321
00:57:51,160 –> 00:57:53,360
They measure the number of deployments that happened.
1322
00:57:53,360 –> 00:57:54,240
They measure uptime.
1323
00:57:54,240 –> 00:57:57,080
These metrics don’t tell you whether governance is actually working.
1324
00:57:57,080 –> 00:57:58,640
They tell you that you have governance.
1325
00:57:58,640 –> 00:58:01,240
They don’t tell you whether it’s preventing erosion.
1326
00:58:01,240 –> 00:58:03,440
The metrics that matter for governance are different.
1327
00:58:03,440 –> 00:58:06,160
They measure whether the system is actually doing what it’s supposed to do.
1328
00:58:06,160 –> 00:58:07,160
Policy compliance rate.
1329
00:58:07,160 –> 00:58:09,800
What percentage of resources comply with policies?
1330
00:58:09,800 –> 00:58:14,080
If you’ve deployed a policy that says all storage accounts must have encryption enabled,
1331
00:58:14,080 –> 00:58:16,640
and 80% of your storage accounts are encrypted,
1332
00:58:16,640 –> 00:58:18,720
you have a compliance rate of 80%.
1333
00:58:18,720 –> 00:58:19,880
That’s not good enough.
1334
00:58:19,880 –> 00:58:22,080
You should be targeting above 95%.
1335
00:58:22,080 –> 00:58:23,760
The remaining resources are violations.
1336
00:58:23,760 –> 00:58:26,560
They’re either old resources that existed before the policy
1337
00:58:26,560 –> 00:58:30,320
or they’re new resources that somehow got created without the policy being enforced.
1338
00:58:30,320 –> 00:58:30,880
Drift rate.
1339
00:58:30,880 –> 00:58:33,600
What percentage of resources diverge from intended state?
1340
00:58:33,600 –> 00:58:36,040
You defined how your infrastructure should be configured.
1341
00:58:36,040 –> 00:58:36,800
You deployed it.
1342
00:58:36,800 –> 00:58:38,520
Now you’re comparing actual to intended.
1343
00:58:38,520 –> 00:58:40,680
If 5% of your resources have drifted,
1344
00:58:40,680 –> 00:58:43,680
that’s a signal that your architecture isn’t enforcing what should happen.
1345
00:58:43,680 –> 00:58:47,400
You’re not detecting drift quickly enough, or you’re not remediating it.
1346
00:58:47,400 –> 00:58:49,720
Or you’re not preventing it from happening in the first place.
1347
00:58:49,720 –> 00:58:51,480
Your target should be below 5%.
1348
00:58:51,480 –> 00:58:52,560
RBAC hygiene.
1349
00:58:52,560 –> 00:58:55,320
What percentage of identities have least privilege access?
1350
00:58:55,320 –> 00:58:57,960
This is harder to measure because least privilege is contextual.
1351
00:58:57,960 –> 00:58:58,800
But you can measure it.
1352
00:58:58,800 –> 00:59:02,080
How many users have the owner role when they only need reader?
1353
00:59:02,080 –> 00:59:04,440
How many service principals have contributor permissions
1354
00:59:04,440 –> 00:59:07,040
when they only need read access to specific resources?
1355
00:59:07,040 –> 00:59:09,240
How many identities have permissions they’re not using?
1356
00:59:09,240 –> 00:59:12,880
If your RBAC hygiene is below 80%, you’ve got significant overprivileging.
1357
00:59:12,880 –> 00:59:14,400
That’s a security vulnerability.
1358
00:59:14,400 –> 00:59:16,200
That’s erosion. Cost variance.
1359
00:59:16,200 –> 00:59:18,720
How much does actual spend diverge from forecast?
1360
00:59:18,720 –> 00:59:24,080
If you forecasted $10,000 a month and you spend 12,000, that’s a 20% variance.
1361
00:59:24,080 –> 00:59:25,120
That’s acceptable.
1362
00:59:25,120 –> 00:59:29,160
If you forecasted 10,000 and you spend 15,000, that’s a 50% variance.
1363
00:59:29,160 –> 00:59:31,040
That’s a signal that something is wrong.
1364
00:59:31,040 –> 00:59:34,960
Either your forecasting is broken or your cost controls aren’t working.
1365
00:59:34,960 –> 00:59:36,280
Either way, you need to fix it.
1366
00:59:36,280 –> 00:59:38,800
Your target should be under 10% variance.
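The variance figure is simple arithmetic: the difference between actual and forecast spend, divided by the forecast. Here is a minimal Python sketch. Treating overspend and underspend the same way, by taking the absolute value, is an assumption of this example, and the function name `cost_variance` is hypothetical.

```python
def cost_variance(forecast, actual):
    """Absolute deviation of actual spend from forecast, as a percentage of forecast."""
    if forecast <= 0:
        raise ValueError("forecast must be positive")
    return abs(actual - forecast) * 100 / forecast


print(cost_variance(10_000, 15_000))  # 50.0
print(cost_variance(10_000, 10_800))  # 8.0
```

Against a 10% target, the first month is a signal that something is wrong and the second is within tolerance.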
1367
00:59:38,800 –> 00:59:39,920
Remediation time.
1368
00:59:39,920 –> 00:59:42,720
How long does it take to fix a compliance violation?
1369
00:59:42,720 –> 00:59:46,760
If a violation is discovered on Monday and fixed on Friday, that’s five days of noncompliance.
1370
00:59:46,760 –> 00:59:49,920
If a violation is discovered and fixed the same day, that’s ideal.
1371
00:59:49,920 –> 00:59:51,680
Your target should be under 24 hours.
1372
00:59:51,680 –> 00:59:55,160
The faster you remediate, the less time your environment is noncompliant.
1373
00:59:55,160 –> 00:59:57,040
The less time erosion is accumulating.
1374
00:59:57,040 –> 00:59:59,320
The pattern that works is straightforward.
1375
00:59:59,320 –> 01:00:00,640
Define target metrics.
1376
01:00:00,640 –> 01:00:04,600
Policy compliance above 95%, drift rate below 5%.
1377
01:00:04,600 –> 01:00:10,480
RBAC hygiene above 85%, cost variance under 10%, remediation time under 24 hours.
1378
01:00:10,480 –> 01:00:12,280
Measure actual metrics.
1379
01:00:12,280 –> 01:00:15,840
Run reports that tell you where you stand against these targets.
1380
01:00:15,840 –> 01:00:16,840
Identify gaps.
1381
01:00:16,840 –> 01:00:21,120
If your policy compliance is 85% and your target is 95%, you’ve got a gap.
1382
01:00:21,120 –> 01:00:22,640
Design interventions to close gaps.
1383
01:00:22,640 –> 01:00:23,640
Tighten policies.
1384
01:00:23,640 –> 01:00:24,640
Remove exceptions.
1385
01:00:24,640 –> 01:00:25,640
Improve enforcement.
1386
01:00:25,640 –> 01:00:27,680
Measure again to verify improvement.
1387
01:00:27,680 –> 01:00:28,680
Real scenario.
1388
01:00:28,680 –> 01:00:30,800
A governance scorecard for a landing zone.
1389
01:00:30,800 –> 01:00:32,600
Policy compliance is 92%.
1390
01:00:32,600 –> 01:00:33,960
Target is 95%.
1391
01:00:33,960 –> 01:00:34,960
Gap of 3%.
1392
01:00:34,960 –> 01:00:35,960
Drift rate is 8%.
1393
01:00:35,960 –> 01:00:36,960
Target is 5%.
1394
01:00:36,960 –> 01:00:37,960
Gap of 3%.
1395
01:00:37,960 –> 01:00:39,960
RBAC hygiene is 78%.
1396
01:00:39,960 –> 01:00:40,960
Target is 85%.
1397
01:00:40,960 –> 01:00:41,960
Gap of 7%.
1398
01:00:41,960 –> 01:00:43,960
Cost variance is 12%.
1399
01:00:43,960 –> 01:00:44,960
Target is 10%.
1400
01:00:44,960 –> 01:00:45,960
Gap of 2%.
1401
01:00:45,960 –> 01:00:47,680
Remediation time is 36 hours.
1402
01:00:47,680 –> 01:00:48,920
Target is 24 hours.
1403
01:00:48,920 –> 01:00:50,200
Gap of 12 hours.
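That scorecard is easy to hold as data. The Python sketch below uses the actuals and targets from the scenario. The `Metric` class and its `gap` property are hypothetical names, and the only logic is that some metrics are better when higher and others when lower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    actual: float
    target: float
    higher_is_better: bool

    @property
    def gap(self) -> float:
        """Distance from target, or zero when the target is already met."""
        if self.higher_is_better:
            diff = self.target - self.actual
        else:
            diff = self.actual - self.target
        return max(diff, 0)


scorecard = [
    Metric("policy compliance (%)", 92, 95, True),
    Metric("drift rate (%)", 8, 5, False),
    Metric("RBAC hygiene (%)", 78, 85, True),
    Metric("cost variance (%)", 12, 10, False),
    Metric("remediation time (h)", 36, 24, False),
]

for m in scorecard:
    print(f"{m.name}: actual {m.actual}, target {m.target}, gap {m.gap}")
```

Rerunning the same report after each intervention gives the "measure again" step a consistent baseline.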
1404
01:00:50,200 –> 01:00:52,120
Now you have interventions. For policy compliance:
1405
01:00:52,120 –> 01:00:53,440
Tighten policies.
1406
01:00:53,440 –> 01:00:55,000
Identify which policies are failing.
1407
01:00:55,000 –> 01:00:56,000
Are they too broad?
1408
01:00:56,000 –> 01:00:58,080
Are they catching legitimate use cases?
1409
01:00:58,080 –> 01:00:59,080
Refine them.
1410
01:00:59,080 –> 01:01:00,080
Remove exceptions.
1411
01:01:00,080 –> 01:01:01,080
Exceptions are debt.
1412
01:01:01,080 –> 01:01:02,080
For drift rate:
1413
01:01:02,080 –> 01:01:03,080
Improve drift detection.
1414
01:01:03,080 –> 01:01:04,080
Maybe you’re only scanning weekly.
1415
01:01:04,080 –> 01:01:05,080
Scan daily.
1416
01:01:05,080 –> 01:01:06,680
Maybe you’re not remediating automatically.
1417
01:01:06,680 –> 01:01:08,080
Implement auto-remediation.
1418
01:01:08,080 –> 01:01:10,920
For RBAC hygiene, implement just-in-time access.
1419
01:01:10,920 –> 01:01:12,320
Remove permanent assignments.
1420
01:01:12,320 –> 01:01:14,680
For cost variance, improve cost estimation.
1421
01:01:14,680 –> 01:01:16,040
Maybe your forecasts are broken.
1422
01:01:16,040 –> 01:01:18,440
For remediation time, automate remediation.
1423
01:01:18,440 –> 01:01:20,040
Manual remediation is slow.
1424
01:01:20,040 –> 01:01:21,680
Automated remediation is fast.
1425
01:01:21,680 –> 01:01:25,080
Why this skill is valuable is because designing metrics that actually measure governance
1426
01:01:25,080 –> 01:01:26,960
effectiveness is harder than it sounds.
1427
01:01:26,960 –> 01:01:28,520
Easy metrics are meaningless.
1428
01:01:28,520 –> 01:01:30,000
Hard metrics are hard to measure.
1429
01:01:30,000 –> 01:01:33,240
The skill is finding the metrics that are meaningful and measurable.
1430
01:01:33,240 –> 01:01:37,160
That’s what separates governance that’s real from governance that’s theater.
1431
01:01:37,160 –> 01:01:39,960
Scaling governance across teams and organizations.
1432
01:01:39,960 –> 01:01:43,360
Governance that works for one team doesn’t automatically work for 10 teams.
1433
01:01:43,360 –> 01:01:47,520
As organizations scale, governance either becomes more systematic or it collapses.
1434
01:01:47,520 –> 01:01:49,960
You can’t scale governance through heroic effort.
1435
01:01:49,960 –> 01:01:53,200
You can’t scale it through a single person understanding all the rules.
1436
01:01:53,200 –> 01:01:58,120
You can’t scale it through manual processes that require human judgment at every step.
1437
01:01:58,120 –> 01:02:00,720
It scales through automation and delegation.
1438
01:02:00,720 –> 01:02:04,080
Through frameworks that teams instantiate instead of creating from scratch.
1439
01:02:04,080 –> 01:02:07,280
Through policies that are enforced by the system instead of by people.
1440
01:02:07,280 –> 01:02:08,520
Here’s the pattern that works.
1441
01:02:08,520 –> 01:02:10,200
You define governance principles.
1442
01:02:10,200 –> 01:02:12,280
Security, compliance, cost, agility.
1443
01:02:12,280 –> 01:02:13,560
These are the things you care about.
1444
01:02:13,560 –> 01:02:15,920
These are the things that matter to your organization.
1445
01:02:15,920 –> 01:02:20,000
You codify these principles into policies, standards and procedures.
1446
01:02:20,000 –> 01:02:24,760
Not as documentation, not as guidelines, as code, as enforcement mechanisms.
1447
01:02:24,760 –> 01:02:26,880
You distribute governance responsibility.
1448
01:02:26,880 –> 01:02:28,880
Teams own their own governance within frameworks.
1449
01:02:28,880 –> 01:02:30,080
They’re not asking for permission.
1450
01:02:30,080 –> 01:02:31,560
They’re not waiting for approval.
1451
01:02:31,560 –> 01:02:35,160
They’re operating within guardrails that the system enforces automatically.
1452
01:02:35,160 –> 01:02:37,240
You audit and adjust continuously.
1453
01:02:37,240 –> 01:02:38,240
Governance isn’t static.
1454
01:02:38,240 –> 01:02:39,720
Your organization changes.
1455
01:02:39,720 –> 01:02:40,800
Your requirements change.
1456
01:02:40,800 –> 01:02:41,800
Your threats change.
1457
01:02:41,800 –> 01:02:43,240
Your governance has to change with it.
1458
01:02:43,240 –> 01:02:44,240
You measure compliance.
1459
01:02:44,240 –> 01:02:45,240
You identify gaps.
1460
01:02:45,240 –> 01:02:46,400
You adjust policies.
1461
01:02:46,400 –> 01:02:47,400
You test changes.
1462
01:02:47,400 –> 01:02:48,400
You deploy them.
1463
01:02:48,400 –> 01:02:49,400
You measure again.
1464
01:02:49,400 –> 01:02:52,320
This is a continuous cycle, not a one-time project.
1465
01:02:52,320 –> 01:02:56,600
Why this prevents erosion is that governance scales through automation and delegation.
1466
01:02:56,600 –> 01:02:57,880
Not through centralized control.
1467
01:02:57,880 –> 01:03:01,320
A central governance team that approves every change becomes a bottleneck.
1468
01:03:01,320 –> 01:03:02,600
Teams wait for approval.
1469
01:03:02,600 –> 01:03:03,800
Teams get frustrated.
1470
01:03:03,800 –> 01:03:05,360
Teams find workarounds.
1471
01:03:05,360 –> 01:03:07,680
Teams bypass governance to move faster.
1472
01:03:07,680 –> 01:03:10,320
Governance becomes a blocker instead of an enabler.
1473
01:03:10,320 –> 01:03:13,480
The anti-pattern is a central governance team that controls everything.
1474
01:03:13,480 –> 01:03:14,400
They own the policies.
1475
01:03:14,400 –> 01:03:15,720
They approve every deployment.
1476
01:03:15,720 –> 01:03:16,920
They review every change.
1477
01:03:16,920 –> 01:03:17,960
They’re the gatekeepers.
1478
01:03:17,960 –> 01:03:19,360
This creates bottlenecks.
1479
01:03:19,360 –> 01:03:20,440
It creates resentment.
1480
01:03:20,440 –> 01:03:22,600
It creates incentives to bypass the system.
1481
01:03:22,600 –> 01:03:25,600
Teams that feel blocked by governance don’t comply with governance.
1482
01:03:25,600 –> 01:03:26,840
They find ways around it.
1483
01:03:26,840 –> 01:03:27,880
They use workarounds.
1484
01:03:27,880 –> 01:03:29,680
They operate outside the framework.
1485
01:03:29,680 –> 01:03:33,360
This is worse than no governance at all because now you have the overhead of a governance system
1486
01:03:33,360 –> 01:03:34,600
that nobody is using.
1487
01:03:34,600 –> 01:03:37,800
The pattern that scales is distributed governance with guardrails.
1488
01:03:37,800 –> 01:03:39,560
Teams have autonomy within frameworks.
1489
01:03:39,560 –> 01:03:40,400
They can innovate.
1490
01:03:40,400 –> 01:03:41,520
They can move fast.
1491
01:03:41,520 –> 01:03:44,760
But they’re operating within constraints that ensure security and compliance.
1492
01:03:44,760 –> 01:03:47,640
The constraints are enforced by the system, not by people.
1493
01:03:47,640 –> 01:03:51,680
A team can’t deploy non-compliant infrastructure because the system blocks it.
1494
01:03:51,680 –> 01:03:54,560
A team can’t exceed their budget because cost controls prevent it.
1495
01:03:54,560 –> 01:03:58,880
A team can’t create over-privileged identities because the system limits what’s possible.
1496
01:03:58,880 –> 01:04:00,560
The framework is enforced automatically.
1497
01:04:00,560 –> 01:04:02,040
Teams don’t have to ask for permission.
1498
01:04:02,040 –> 01:04:03,960
They just operate within the constraints.
1499
01:04:03,960 –> 01:04:04,960
Real scenario.
1500
01:04:04,960 –> 01:04:07,160
A governance framework for AI agents.
1501
01:04:07,160 –> 01:04:08,160
Principle.
1502
01:04:08,160 –> 01:04:10,800
AI agents should have least privilege access.
1503
01:04:10,800 –> 01:04:11,800
Policy.
1504
01:04:11,800 –> 01:04:14,200
AI agents must be registered in Entra Agent ID.
1505
01:04:14,200 –> 01:04:15,200
Policy.
1506
01:04:15,200 –> 01:04:17,240
AI agents must have scoped permissions.
1507
01:04:17,240 –> 01:04:18,240
Policy.
1508
01:04:18,240 –> 01:04:20,040
AI agent actions must be logged.
1509
01:04:20,040 –> 01:04:21,040
Policy.
1510
01:04:21,040 –> 01:04:22,360
AI agents must have human owners.
1511
01:04:22,360 –> 01:04:23,960
Teams can create AI agents.
1512
01:04:23,960 –> 01:04:25,440
They want to deploy new agents.
1513
01:04:25,440 –> 01:04:26,800
They don’t ask for permission.
1514
01:04:26,800 –> 01:04:27,800
They follow the framework.
1515
01:04:27,800 –> 01:04:28,920
They register the agent.
1516
01:04:28,920 –> 01:04:30,280
They define scoped permissions.
1517
01:04:30,280 –> 01:04:31,520
They assign a human owner.
1518
01:04:31,520 –> 01:04:34,760
The system validates that the agent complies with policies.
1519
01:04:34,760 –> 01:04:36,640
If it does, deployment proceeds automatically.
1520
01:04:36,640 –> 01:04:39,400
If it doesn’t, the system blocks it and tells the team what’s wrong.
1521
01:04:39,400 –> 01:04:41,400
The team fixes the issue and resubmits.
1522
01:04:41,400 –> 01:04:42,840
No approval process.
1523
01:04:42,840 –> 01:04:43,840
No bottleneck.
1524
01:04:43,840 –> 01:04:44,720
No delay.
1525
01:04:44,720 –> 01:04:46,840
Just automated enforcement.
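A minimal Python sketch of that validate-then-deploy flow. The `AgentSpec` record, the scope strings, and the wildcard rule are assumptions made for illustration; this is not the Entra Agent ID API, only a picture of how the four policies could be checked automatically before a deployment proceeds.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentSpec:
    """Hypothetical deployment request for an AI agent."""
    name: str
    registered: bool = False                          # registered in the identity directory
    scopes: List[str] = field(default_factory=list)   # e.g. "storage/reports:read" (made-up format)
    logging_enabled: bool = False                     # agent actions are logged
    owner: Optional[str] = None                       # accountable human owner


def validate_agent(agent: AgentSpec) -> List[str]:
    """Return policy violations; an empty list means the agent is compliant."""
    violations = []
    if not agent.registered:
        violations.append("agent is not registered")
    # Assumption: an empty scope list or any wildcard counts as unscoped.
    unscoped = not agent.scopes or any(
        s.strip() in ("", "*") or s.endswith(":*") for s in agent.scopes
    )
    if unscoped:
        violations.append("permissions must be scoped (no empty or wildcard scopes)")
    if not agent.logging_enabled:
        violations.append("action logging is disabled")
    if not agent.owner:
        violations.append("no human owner assigned")
    return violations


def deploy(agent: AgentSpec) -> str:
    """Deploy automatically when compliant; otherwise block and say what is wrong."""
    violations = validate_agent(agent)
    if violations:
        return f"BLOCKED {agent.name}: " + "; ".join(violations)
    return f"DEPLOYED {agent.name}"
```

The team never talks to an approver here. They get either a deployment or a list of what to fix, which is the whole point of the pattern.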
1526
01:04:46,840 –> 01:04:51,440
Why this skill is valuable is because designing governance frameworks that scale to hundreds
1527
01:04:51,440 –> 01:04:54,720
of teams without creating bottlenecks is harder than it sounds.
1528
01:04:54,720 –> 01:04:56,680
You have to balance autonomy with control.
1529
01:04:56,680 –> 01:05:00,200
You have to make constraints visible without making them burdensome.
1530
01:05:00,200 –> 01:05:02,560
You have to enforce policies without blocking innovation.
1531
01:05:02,560 –> 01:05:07,760
This is the skill that separates architects who understand scaling from architects who understand
1532
01:05:07,760 –> 01:05:08,760
Azure.
1533
01:05:08,760 –> 01:05:09,880
The scaling problem is real.
1534
01:05:09,880 –> 01:05:12,760
As organizations grow, governance frameworks become complex.
1535
01:05:12,760 –> 01:05:15,360
Too many policies and teams can’t remember them all.
1536
01:05:15,360 –> 01:05:17,600
Too few policies and governance isn’t granular enough.
1537
01:05:17,600 –> 01:05:21,320
You need frameworks that are simple at the core, but extensible for specific needs.
1538
01:05:21,320 –> 01:05:25,880
You need policies that are broadly applicable, but allow for context-specific variations.
1539
01:05:25,880 –> 01:05:29,920
You need guardrails that prevent the worst outcomes without preventing all outcomes.
1540
01:05:29,920 –> 01:05:34,960
The pattern that scales is governance frameworks composed of smaller, reusable pieces.
1541
01:05:34,960 –> 01:05:38,000
A core set of organization-wide policies that everyone follows.
1542
01:05:38,000 –> 01:05:41,120
A set of business-unit policies that apply to specific groups.
1543
01:05:41,120 –> 01:05:44,400
A set of team policies that apply to specific workloads.
1544
01:05:44,400 –> 01:05:46,120
Teams inherit policies from every level.
1545
01:05:46,120 –> 01:05:47,800
They’re constrained by all of them.
1546
01:05:47,800 –> 01:05:49,480
But they’re not overwhelmed by them.
1547
01:05:49,480 –> 01:05:50,840
The constraints are layered.
1548
01:05:50,840 –> 01:05:52,840
The system enforces them automatically.
1549
01:05:52,840 –> 01:05:55,560
Teams operate within the constraints without thinking about them.
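One way to picture layered, composable policies is as small reusable rules grouped by level, where a resource has to pass every layer it inherits. This is a sketch under assumptions: the rule helpers, tag names, and region names below are hypothetical and do not come from any Azure SDK.

```python
from typing import Callable, Dict, List, Optional

Resource = Dict[str, object]
Rule = Callable[[Resource], Optional[str]]  # returns a violation message, or None if the rule passes


def require_tag(tag: str) -> Rule:
    """Reusable rule: the resource must carry the given tag."""
    def rule(resource: Resource) -> Optional[str]:
        tags = resource.get("tags", {})
        return None if tag in tags else f"missing tag '{tag}'"
    return rule


def allowed_regions(*regions: str) -> Rule:
    """Reusable rule: the resource must live in one of the given regions."""
    def rule(resource: Resource) -> Optional[str]:
        region = resource.get("region")
        return None if region in regions else f"region '{region}' not allowed"
    return rule


# Hypothetical layers: organization-wide, business unit, team.
ORG_RULES = [require_tag("owner"), allowed_regions("westeurope", "northeurope")]
BUSINESS_UNIT_RULES = [require_tag("cost-center")]
TEAM_RULES = [allowed_regions("westeurope")]


def evaluate(resource: Resource, *layers: List[Rule]) -> List[str]:
    """Apply every rule from every inherited layer and collect the violations."""
    violations = []
    for layer in layers:
        for rule in layer:
            message = rule(resource)
            if message:
                violations.append(message)
    return violations
```

A team calls `evaluate(resource, ORG_RULES, BUSINESS_UNIT_RULES, TEAM_RULES)` and is constrained by all three layers at once, without having to remember which level a given rule came from.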
1550
01:05:55,560 –> 01:05:59,160
Why this matters is that most organizations have governance frameworks that don’t scale.
1551
01:05:59,160 –> 01:06:00,800
They start with manual processes.
1552
01:06:00,800 –> 01:06:02,280
They add policies as they grow.
1553
01:06:02,280 –> 01:06:03,880
They patch problems reactively.
1554
01:06:03,880 –> 01:06:07,560
By the time they realize governance isn’t scaling, they’ve got technical debt spread across
1555
01:06:07,560 –> 01:06:08,720
their entire environment.
1556
01:06:08,720 –> 01:06:12,480
The organizations that understand this, that design governance for scale from the beginning,
1557
01:06:12,480 –> 01:06:16,840
that build frameworks that are composable and extensible, that automate enforcement
1558
01:06:16,840 –> 01:06:19,320
so teams don’t have to think about compliance,
1559
01:06:19,320 –> 01:06:23,640
Those organizations are going to win in 2026.
1560
01:06:23,640 –> 01:06:25,000
The career path.
1561
01:06:25,000 –> 01:06:27,360
From infrastructure to governance architecture.
1562
01:06:27,360 –> 01:06:30,960
The traditional career path in cloud has always been straightforward.
1563
01:06:30,960 –> 01:06:34,000
Infrastructure engineer, you learn how to provision resources, you learn how to configure
1564
01:06:34,000 –> 01:06:38,960
networks, you learn how to deploy applications, you move up to cloud architect, you design larger
1565
01:06:38,960 –> 01:06:43,640
systems, you make decisions about how infrastructure should be organized, you move up to enterprise
1566
01:06:43,640 –> 01:06:47,000
architect, you make decisions about how the entire organization should operate in the
1567
01:06:47,000 –> 01:06:48,000
cloud.
1568
01:06:48,000 –> 01:06:49,000
This path exists.
1569
01:06:49,000 –> 01:06:50,760
It’s how most people think about cloud careers.
1570
01:06:50,760 –> 01:06:54,760
There’s a different path emerging and it’s the one that leads to higher compensation,
1571
01:06:54,760 –> 01:06:59,640
more interesting problems and genuine leverage over organizational outcomes.
1572
01:06:59,640 –> 01:07:03,320
Infrastructure engineer, you learn how to provision resources, you learn how to configure
1573
01:07:03,320 –> 01:07:08,040
networks, you learn how to deploy applications, then you pivot, you become a governance engineer,
1574
01:07:08,040 –> 01:07:11,560
you stop building new infrastructure and start designing the frameworks that prevent bad
1575
01:07:11,560 –> 01:07:15,480
infrastructure from being built, you design policies, you design identity controls, you
1576
01:07:15,480 –> 01:07:19,680
design cost governance, you design the systems that make doing the right thing the path of
1577
01:07:19,680 –> 01:07:23,920
least resistance, you move up to governance architect, you design governance frameworks that
1578
01:07:23,920 –> 01:07:29,040
scale across entire organizations, you design systems that prevent erosion at scale, you
1579
01:07:29,040 –> 01:07:33,120
design the control planes that enable innovation without creating chaos.
1580
01:07:33,120 –> 01:07:37,320
Why this path exists is because governance is becoming more valuable than infrastructure
1581
01:07:37,320 –> 01:07:38,880
as organizations scale.
1582
01:07:38,880 –> 01:07:43,040
When you’re a startup with 10 engineers and one Azure subscription, infrastructure skills
1583
01:07:43,040 –> 01:07:44,040
matter.
1584
01:07:44,040 –> 01:07:47,720
You need people who can provision resources quickly, you need people who understand how to
1585
01:07:47,720 –> 01:07:53,040
build systems, but when you’re an enterprise with a thousand engineers and a hundred subscriptions,
1586
01:07:53,040 –> 01:07:57,000
governance matters more, you need people who can prevent chaos, you need people who can
1587
01:07:57,000 –> 01:08:02,320
enforce standards at scale, you need people who can design systems where compliance is automatic
1588
01:08:02,320 –> 01:08:03,560
instead of manual.
1589
01:08:03,560 –> 01:08:08,360
The skills that matter at each level are different, a governance engineer can design and implement
1590
01:08:08,360 –> 01:08:12,760
governance frameworks for a single business unit or team, they understand Azure Policy,
1591
01:08:12,760 –> 01:08:16,840
they understand Bicep, they understand identity governance, they understand how to compose
1592
01:08:16,840 –> 01:08:21,760
these into a framework that works for a specific context, a governance architect can design
1593
01:08:21,760 –> 01:08:26,160
governance frameworks that scale across an entire organization, they understand how to
1594
01:08:26,160 –> 01:08:30,240
make governance composable, they understand how to balance autonomy with control, they understand
1595
01:08:30,240 –> 01:08:33,600
how to design systems that scale without becoming unwieldy.
1596
01:08:33,600 –> 01:08:38,000
A principal governance architect can design governance frameworks that work across multiple
1597
01:08:38,000 –> 01:08:42,560
organizations, multiple clouds, multiple regulatory frameworks, they understand how to
1598
01:08:42,560 –> 01:08:46,440
make governance portable, they understand how to design systems that adapt to different
1599
01:08:46,440 –> 01:08:51,000
contexts, why compensation increases along this path is straightforward, governance engineers
1600
01:08:51,000 –> 01:08:54,800
are scarce, most people want to build new things, they want to see their code running in
1601
01:08:54,800 –> 01:08:58,400
production, they want to solve problems, governance is about preventing problems, it’s about
1602
01:08:58,400 –> 01:09:02,600
making sure things don’t go wrong, it’s less glamorous, it’s less visible, most people
1603
01:09:02,600 –> 01:09:06,560
don’t want to do it, most organizations don’t have formal governance roles, the few
1604
01:09:06,560 –> 01:09:11,320
organizations that do have formal governance roles understand the value, they pay premium
1605
01:09:11,320 –> 01:09:15,680
salaries because they understand that a single governance engineer can prevent millions of
1606
01:09:15,680 –> 01:09:20,760
dollars in incidents, compliance violations, and architectural debt. Real scenario: a governance
1607
01:09:20,760 –> 01:09:27,240
engineer at a financial services company, salary range $150,000 to $200,000, responsibilities:
1608
01:09:27,240 –> 01:09:31,160
design and implement governance frameworks for regulatory compliance, prevent compliance
1609
01:09:31,160 –> 01:09:36,360
violations that could cost millions in fines, move to governance architect role, potential
1610
01:09:36,360 –> 01:09:42,160
salary $200,000 to $300,000, responsibilities: design governance frameworks that scale across
1611
01:09:42,160 –> 01:09:47,280
multiple business units, prevent architectural erosion across the entire organization, enable
1612
01:09:47,280 –> 01:09:52,040
innovation without creating chaos, this is where the compensation premium becomes substantial,
1613
01:09:52,040 –> 01:09:55,960
why this matters is this, if you’re building a career in Azure, governance is a more valuable
1614
01:09:55,960 –> 01:10:01,080
specialization than infrastructure, infrastructure skills become obsolete as Azure services change,
1615
01:10:01,080 –> 01:10:05,160
new services launch, old services are retired, the skills you learned two years ago might
1616
01:10:05,160 –> 01:10:09,760
be irrelevant today, but governance skills compound, the governance framework you designed for
1617
01:10:09,760 –> 01:10:14,600
one organization can be adapted for another, the policies you wrote can be reused, the
1618
01:10:14,600 –> 01:10:18,680
patterns you learned scale across contexts, your value increases as you accumulate experience
1619
01:10:18,680 –> 01:10:22,600
with governance patterns, the market opportunity is substantial, there are hundreds of thousands
1620
01:10:22,600 –> 01:10:26,320
of infrastructure engineers, there are tens of thousands of cloud architects, there are
1621
01:10:26,320 –> 01:10:30,200
thousands of governance engineers, the supply is tiny compared to demand, organizations
1622
01:10:30,200 –> 01:10:34,160
are desperately looking for people who can design governance frameworks that scale, they’re
1623
01:10:34,160 –> 01:10:38,120
looking for people who understand how to prevent erosion, they’re looking for people who can
1624
01:10:38,120 –> 01:10:41,640
make governance invisible because it’s so well designed that teams don’t even think
1625
01:10:41,640 –> 01:10:46,040
about it, how to transition is straightforward, start by designing governance for your current
1626
01:10:46,040 –> 01:10:51,400
team, design the policies, design the controls, design the frameworks, expand to larger scopes
1627
01:10:51,400 –> 01:10:55,320
as you gain experience, design governance for your business unit, design governance for
1628
01:10:55,320 –> 01:11:00,880
your organization, build a portfolio of governance frameworks you’ve designed, document the outcomes,
1629
01:11:00,880 –> 01:11:05,000
show the metrics, show the compliance rates, show the cost savings, show the incidents prevented,
1630
01:11:05,000 –> 01:11:09,240
this is how you build credibility as a governance architect, this is how you move from infrastructure
1631
01:11:09,240 –> 01:11:13,520
to governance, this is how you position yourself for the high income roles that are going
1632
01:11:13,520 –> 01:11:15,960
to dominate in 2026.
1633
01:11:15,960 –> 01:11:20,880
Building your governance foundation, the first 90 days, if you’re starting from scratch,
1634
01:11:20,880 –> 01:11:26,080
here’s the order to tackle governance, not all at once, not in parallel, in sequence,
1635
01:11:26,080 –> 01:11:31,360
month one, establish identity governance, month two, establish policy governance, month three,
1636
01:11:31,360 –> 01:11:34,720
establish operational governance, this is the order that works because each layer builds
1637
01:11:34,720 –> 01:11:38,440
on the previous one, month one is identity governance, you start here because everything
1638
01:11:38,440 –> 01:11:42,320
else depends on it, you can’t enforce policies without knowing who’s doing what, you can’t
1639
01:11:42,320 –> 01:11:46,520
audit actions without knowing who performed them, you can’t prevent erosion without controlling
1640
01:11:46,520 –> 01:11:50,720
who has access to what, so you start with identity, audit existing identities, who has
1641
01:11:50,720 –> 01:11:54,320
what access? This is harder than it sounds, you’re not just looking at human users,
1642
01:11:54,320 –> 01:11:58,600
you’re looking at service principals, managed identities, application registrations,
1643
01:11:58,600 –> 01:12:02,640
every non-human identity that has access to your resources, most organizations discover
1644
01:12:02,640 –> 01:12:06,640
they have far more identities than they thought, service principals created for automation
1645
01:12:06,640 –> 01:12:11,400
that nobody remembers, managed identities assigned to applications years ago, application
1646
01:12:11,400 –> 01:12:15,480
registrations for integrations that no longer exist, you’re going to find a lot of cruft,
1647
01:12:15,480 –> 01:12:18,800
identify overprivileged identities who has more access than they need, a user with the
1648
01:12:18,800 –> 01:12:23,680
owner role who only reads resources, a service principal with contributor permissions that
1649
01:12:23,680 –> 01:12:28,640
only needs read access to specific storage accounts, a managed identity with broad permissions
1650
01:12:28,640 –> 01:12:32,400
when it should have scoped permissions, you’re going to find a lot of overprivileging, this
1651
01:12:32,400 –> 01:12:35,680
is where most organizations are, they grant broad permissions to get something working
1652
01:12:35,680 –> 01:12:40,160
quickly, they plan to tighten it later, they never do, implement least privilege access,
1653
01:12:40,160 –> 01:12:44,080
remove unnecessary permissions, this is the hard part, you have to understand what each identity
1654
01:12:44,080 –> 01:12:47,680
actually needs, not what they might need, not what they had before, what they actually
1655
01:12:47,680 –> 01:12:51,720
need right now, you have to be ruthless about removing permissions, if an identity hasn’t
1656
01:12:51,720 –> 01:12:55,880
used a permission in 90 days, it probably doesn’t need it, remove it, if an identity has permissions
1657
01:12:55,880 –> 01:13:00,200
to resources, it doesn’t interact with, remove them, if an identity has broad permissions
1658
01:13:00,200 –> 01:13:04,680
when scoped permissions would work, scope them, implement conditional access policies,
1659
01:13:04,680 –> 01:13:09,280
enforce multi-factor authentication, require device compliance, block access from suspicious
1660
01:13:09,280 –> 01:13:13,560
locations, this is where you move from static access controls to dynamic ones, access is
1661
01:13:13,560 –> 01:13:18,880
no longer just a binary decision, it’s evaluated based on context, risk, location, device, time
1662
01:13:18,880 –> 01:13:24,320
of day, the system adjusts access based on these signals, implement just-in-time access.
1663
01:13:24,320 –> 01:13:27,640
Temporary elevation for privileged operations, a user needs to perform an administrative
1664
01:13:27,640 –> 01:13:32,240
task, they request temporary access, the system grants it for limited time, one hour, two
1665
01:13:32,240 –> 01:13:36,680
hours, whatever the task requires, when the time expires, access is revoked automatically,
1666
01:13:36,680 –> 01:13:40,520
the user can’t access the resource anymore, this is where you prevent standing privileges
1667
01:13:40,520 –> 01:13:43,040
from becoming permanent vulnerabilities.
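Just-in-time access can be sketched as a grant with an expiry that is checked on every use. The `JitAccess` class, the two-hour cap, and the role names are assumptions for illustration. In Azure you would normally rely on the platform's privileged access tooling rather than hand-rolled code like this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Grant:
    """A temporary elevation for one user and one role."""
    user: str
    role: str
    expires_at: datetime


class JitAccess:
    """Hypothetical in-memory just-in-time access broker."""

    MAX_DURATION = timedelta(hours=2)  # assumed upper bound on any elevation

    def __init__(self) -> None:
        self._grants: List[Grant] = []

    def request(self, user: str, role: str, duration: timedelta,
                now: Optional[datetime] = None) -> Grant:
        """Grant temporary access, refusing durations outside the allowed window."""
        if duration <= timedelta(0) or duration > self.MAX_DURATION:
            raise ValueError("duration must be positive and at most 2 hours")
        now = now or datetime.now(timezone.utc)
        grant = Grant(user, role, now + duration)
        self._grants.append(grant)
        return grant

    def is_allowed(self, user: str, role: str,
                   now: Optional[datetime] = None) -> bool:
        """Check access; expired grants are dropped so nothing becomes standing access."""
        now = now or datetime.now(timezone.utc)
        self._grants = [g for g in self._grants if g.expires_at > now]
        return any(g.user == user and g.role == role for g in self._grants)
```

Nobody has to remember to revoke anything. Once the clock passes `expires_at`, the next check simply fails.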
1668
01:13:43,040 –> 01:13:46,360
Month two is policy governance, now that you have identity controls in place, you can
1669
01:13:46,360 –> 01:13:50,360
enforce policies, define governance principles, what are you trying to prevent?
1670
01:13:50,360 –> 01:13:55,280
Unencrypted data, untagged resources, resources in unauthorized regions, resources without logging,
1671
01:13:55,280 –> 01:13:59,120
define your principles clearly, these are non-negotiable. Design the policy framework: what
1672
01:13:59,120 –> 01:14:04,240
policies enforce your principles, a policy that requires encryption on storage accounts,
1673
01:14:04,240 –> 01:14:09,040
a policy that requires tagging on all resources, a policy that restricts deployment to authorized
1674
01:14:09,040 –> 01:14:13,080
regions, a policy that requires logging on all resources, start with a small number
1675
01:14:13,080 –> 01:14:15,560
of core policies, you can add more later.
1676
01:14:15,560 –> 01:14:19,720
Start policies in audit mode: deploy your policies but don’t enforce them yet, just detect
1677
01:14:19,720 –> 01:14:23,840
violations, run this for a week or two, see what violations you find, some violations are
1678
01:14:23,840 –> 01:14:28,520
going to be legitimate, resources that existed before the policy, resources that need exceptions,
1679
01:14:28,520 –> 01:14:32,560
some violations are going to be mistakes, resources that were misconfigured, resources that
1680
01:14:32,560 –> 01:14:37,320
shouldn’t exist, identify which is which, test policies, identify false positives, a policy
1681
01:14:37,320 –> 01:14:41,600
that catches legitimate use cases, refine the policy, make it more specific, make it less
1682
01:14:41,600 –> 01:14:45,520
likely to catch false positives, identify false negatives, a policy that should catch
1683
01:14:45,520 –> 01:14:50,040
something but doesn’t, refine the policy, make it broader, make it catch what it’s supposed
1684
01:14:50,040 –> 01:14:54,960
to catch, shift to deny mode, now enforce the policies, block non-compliant deployments.
1685
01:14:54,960 –> 01:14:58,680
This is where governance becomes real, teams can’t deploy resources that violate policy,
1686
01:14:58,680 –> 01:15:01,160
they have to fix their deployments, they have to comply.
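The audit-then-deny rollout comes down to running the same checks in two modes: first report violations without blocking, then block. A rough Python sketch, assuming hypothetical tag names and regions. Real Azure Policy definitions are JSON documents with an effect setting, which this does not try to reproduce.

```python
from typing import Dict, List, Tuple

REQUIRED_TAGS = ("owner", "cost-center")          # assumed tag names
ALLOWED_REGIONS = ("westeurope", "northeurope")   # assumed authorized regions


def find_violations(resource: Dict[str, object]) -> List[str]:
    """Detect violations of the tagging and region policies."""
    violations = []
    tags = resource.get("tags", {})
    for tag in REQUIRED_TAGS:
        if tag not in tags:
            violations.append(f"missing tag '{tag}'")
    region = resource.get("region")
    if region not in ALLOWED_REGIONS:
        violations.append(f"region '{region}' not authorized")
    return violations


def evaluate(resource: Dict[str, object], mode: str) -> Tuple[bool, List[str]]:
    """Return (allowed, violations).

    In 'audit' mode violations are reported but the deployment is allowed.
    In 'deny' mode any violation blocks the deployment.
    """
    if mode not in ("audit", "deny"):
        raise ValueError("mode must be 'audit' or 'deny'")
    violations = find_violations(resource)
    allowed = mode == "audit" or not violations
    return allowed, violations
```

Running in `"audit"` for a week or two gives you the violation list to sort into legitimate exceptions and genuine mistakes. Switching the same call to `"deny"` is the moment the policy starts blocking.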
1687
01:15:01,160 –> 01:15:04,960
Month three is operational governance, now that you have identity controls and policies
1688
01:15:04,960 –> 01:15:10,000
in place, you can enforce governance operationally, design CI/CD pipelines, enforce governance
1689
01:15:10,000 –> 01:15:15,080
before deployment, implement drift detection, catch divergence from intended state, implement
1690
01:15:15,080 –> 01:15:19,840
monitoring and alerting, visibility into governance metrics, implement remediation, automatically
1691
01:15:19,840 –> 01:15:24,200
fix violations where possible, this is a 90 day sprint, by the end you have governance
1692
01:15:24,200 –> 01:15:28,280
foundations in place, identities are controlled, policies are enforced, operations are governed,
1693
01:15:28,280 –> 01:15:31,560
you’re not done, governance is never done, but you’ve established the foundations, you’ve
1694
01:15:31,560 –> 01:15:36,040
prevented the worst outcomes, you’ve created the frameworks that scale, from here you expand,
1695
01:15:36,040 –> 01:15:40,200
you add more policies, you extend governance to new teams, you refine controls based on what
1696
01:15:40,200 –> 01:15:45,040
you’ve learned, but the foundations are solid. The counterintuitive truth about governance:
1697
01:15:45,040 –> 01:15:49,480
governance is often seen as a blocker to innovation, it’s the thing that slows you down,
1698
01:15:49,480 –> 01:15:53,000
it’s the reason you can’t move fast, it’s the bureaucracy that prevents you from getting
1699
01:15:53,000 –> 01:15:57,200
things done, this perception is wrong, and it’s expensive to be wrong about this.
1700
01:15:57,200 –> 01:16:01,440
The counterintuitive truth is this: governance is an accelerator to innovation, not a blocker,
1701
01:16:01,440 –> 01:16:05,600
an accelerator. The best-run organizations move faster than poorly governed organizations,
1702
01:16:05,600 –> 01:16:09,200
they innovate more, they ship more, they achieve more, the difference isn’t that they have
1703
01:16:09,200 –> 01:16:13,120
fewer constraints, it’s that they have the right constraints, constraints that prevent
1704
01:16:13,120 –> 01:16:17,760
the worst outcomes without preventing all outcomes, here’s why, a team without governance moves
1705
01:16:17,760 –> 01:16:21,840
fast initially, they provision resources quickly, they deploy applications, they ship
1706
01:16:21,840 –> 01:16:26,480
features, they’re moving at full velocity, then they make mistakes, misconfigurations,
1707
01:16:26,480 –> 01:16:31,320
security gaps, cost overruns, they discover problems in production, they spend time fixing
1708
01:16:31,320 –> 01:16:36,160
mistakes, they spend time in incident response, they spend time remediating compliance violations,
1709
01:16:36,160 –> 01:16:40,160
they slow down, by the end of the quarter they’ve shipped fewer features than a team with
1710
01:16:40,160 –> 01:16:44,240
governance because they spend so much time fixing problems, a team with governance moves slightly
1711
01:16:44,240 –> 01:16:48,320
slower initially, they have to think about compliance, they have to follow policies, they have to
1712
01:16:48,320 –> 01:16:52,400
tag resources, they have to request approvals, but they make fewer mistakes, they spend less
1713
01:16:52,400 –> 01:16:56,400
time fixing problems, they spend less time in incident response, they spend less time
1714
01:16:56,400 –> 01:17:00,160
remediating violations, by the end of the quarter they’ve shipped more features because they
1715
01:17:00,160 –> 01:17:04,560
didn’t waste time fixing preventable problems, the distinction that matters is this, governance
1716
01:17:04,560 –> 01:17:08,640
that’s designed well accelerates innovation, governance that’s designed poorly blocks it,
1717
01:17:08,640 –> 01:17:12,720
most organizations have governance that’s designed poorly, that’s why they see it as a blocker,
1718
01:17:12,720 –> 01:17:16,640
they’ve implemented governance in a way that creates friction, without creating value,
1719
01:17:16,640 –> 01:17:20,400
they’ve implemented governance in a way that requires manual approval for every change,
1720
01:17:20,400 –> 01:17:24,640
they’ve implemented governance in a way that’s so strict it forces teams to find workarounds,
1721
01:17:24,640 –> 01:17:28,640
the pattern that works is governance that’s clear, fast and fair, clear means teams understand
1722
01:17:28,640 –> 01:17:32,960
why policies exist and what they’re trying to prevent, if a team understands that encryption is
1723
01:17:32,960 –> 01:17:37,600
required because unencrypted data creates compliance violations, they are more likely to comply
1724
01:17:37,600 –> 01:17:42,160
than if they’re just told encryption is required, fast means policies are enforced automatically,
1725
01:17:42,160 –> 01:17:46,320
not through manual approval processes, a developer commits code, a pipeline validates it,
1726
01:17:46,320 –> 01:17:50,320
the validation takes seconds, the developer knows immediately whether their code is compliant,
1727
01:17:50,320 –> 01:17:54,240
if it’s not they fix it, if it is it deploys, no waiting for approval, no bottleneck,
1728
01:17:54,240 –> 01:17:58,720
fair means policies apply equally to all teams, no special exceptions for high priority projects,
1729
01:17:58,720 –> 01:18:02,880
no shortcuts for senior engineers, the system treats everyone the same, real scenario,
1730
01:18:02,880 –> 01:18:07,120
a company with a policy that requires all resources to be tagged, poorly designed,
1731
01:18:07,120 –> 01:18:11,040
governance team reviews every deployment and requires tags before approval,
1732
01:18:11,040 –> 01:18:14,480
this is slow, this is manual, this creates a bottleneck, teams wait for approval,
1733
01:18:14,480 –> 01:18:20,240
teams get frustrated, teams find ways to bypass the system, well designed, policy automatically
1734
01:18:20,240 –> 01:18:25,200
blocks untagged resources, developers tag resources in their code, the policy validates the tags,
1735
01:18:25,200 –> 01:18:29,840
if tags are present and correct deployment proceeds automatically, if tags are missing or incorrect,
1736
01:18:29,840 –> 01:18:34,000
the policy blocks deployment and tells the developer what’s wrong, the developer fixes it,
1737
01:18:34,000 –> 01:18:38,720
no approval process, no bottleneck, no delay, just automated enforcement, the difference is that
1738
01:18:38,720 –> 01:18:43,120
the well designed governance system makes compliance the path of least resistance, developers don’t
1739
01:18:43,120 –> 01:18:46,880
have to ask for permission, they don’t have to wait for approval, they just follow the constraints
1740
01:18:46,880 –> 01:18:52,000
that the system enforces and because the constraints are clear and reasonable, they don’t feel like a burden,
1741
01:18:52,000 –> 01:18:55,840
they feel like guidance, they feel like the system is helping them do the right thing instead of
1742
01:18:55,840 –> 01:18:59,840
blocking them from doing it, why this skill is valuable is because designing governance that
1743
01:18:59,840 –> 01:19:03,600
accelerates innovation instead of blocking it is harder than it sounds, it requires
1744
01:19:03,600 –> 01:19:08,080
understanding both the technical requirements and the human factors, it requires understanding
1745
01:19:08,080 –> 01:19:12,080
what constraints are actually necessary and what constraints are just bureaucracy, it requires
1746
01:19:12,080 –> 01:19:16,400
designing systems where doing the right thing is easier than doing the wrong thing, this is the
1747
01:19:16,400 –> 01:19:20,960
skill that separates governance architects from people who just implement policies, the market
1748
01:19:20,960 –> 01:19:25,440
opportunity is this, organizations are desperate for people who can design governance that works,
1749
01:19:25,440 –> 01:19:29,520
not governance that’s theater, not governance that creates the illusion of control without
1750
01:19:29,520 –> 01:19:34,000
preventing erosion, governance that actually prevents problems while enabling innovation,
1751
01:19:34,000 –> 01:19:38,320
organizations that get this right move faster than their competitors, they ship more, they innovate
1752
01:19:38,320 –> 01:19:43,040
more, they win, organizations that get it wrong are constantly dealing with incidents and erosion,
1753
01:19:43,040 –> 01:19:47,520
they lose, the people who understand this, who can design governance that accelerates instead of
1754
01:19:47,520 –> 01:19:53,040
blocks, those people are going to be in very high demand in 2026, the skill that matters in 2026
1755
01:19:53,040 –> 01:19:58,160
isn’t knowing Azure services, it’s architecting governance frameworks that prevent erosion at scale,
1756
01:19:58,160 –> 01:20:02,400
this is the skill that commands premium compensation, this is the skill that creates genuine leverage
1757
01:20:02,400 –> 01:20:06,640
over organizational outcomes, this is the skill that separates the architects who understand cloud
1758
01:20:06,640 –> 01:20:10,560
from the architects who just know how to click buttons, if you’re building a career in Azure,
1759
01:20:10,560 –> 01:20:15,280
governance is the specialization that compounds, the frameworks you design scale, the patterns you
1760
01:20:15,280 –> 01:20:20,960
learn transfer, the value you create increases as you accumulate experience, start with identity,
1761
01:20:20,960 –> 01:20:26,000
move to policy, expand to operations, build governance that’s clear, fast and fair,
1762
01:20:26,000 –> 01:20:29,680
build governance that accelerates innovation, build governance that prevents erosion,
1763
01:20:29,680 –> 01:20:33,440
that’s the skill that matters in 2026.