
1
00:00:00,000 –> 00:00:02,120
Most organizations think Azure gets expensive
2
00:00:02,120 –> 00:00:03,960
because engineers waste money.
3
00:00:03,960 –> 00:00:04,800
They are wrong.
4
00:00:04,800 –> 00:00:07,080
Azure gets expensive because the platform is allowed
5
00:00:07,080 –> 00:00:09,040
to spend without an owner, without limits,
6
00:00:09,040 –> 00:00:10,280
and without consequences.
7
00:00:10,280 –> 00:00:11,600
That isn’t a savings problem.
8
00:00:11,600 –> 00:00:12,720
It’s cost entropy.
9
00:00:12,720 –> 00:00:15,000
Drift created by unowned deployment pathways
10
00:00:15,000 –> 00:00:16,920
that keep producing recurring spend
11
00:00:16,920 –> 00:00:19,360
long after the original decision got forgotten.
12
00:00:19,360 –> 00:00:21,680
This episode isn’t dashboards, savings hacks,
13
00:00:21,680 –> 00:00:23,160
or spot VM folklore.
14
00:00:23,160 –> 00:00:26,080
It’s the uncomfortable shift from “Why is Azure expensive?”
14
00:00:26,080 –> 00:00:27,600
to the only question that matters:
15
00:00:27,600 –> 00:00:29,800
“What did you allow, and why can nobody stop it?”
17
00:00:30,320 –> 00:00:32,080
The enterprise cost failure mode:
18
00:00:32,080 –> 00:00:33,760
unowned spend becomes normal.
19
00:00:33,760 –> 00:00:36,200
Cost overruns don’t show up as one dramatic mistake.
20
00:00:36,200 –> 00:00:37,480
They show up as a new normal:
21
00:00:37,480 –> 00:00:39,400
a temporary environment that never gets deleted
22
00:00:39,400 –> 00:00:41,600
because nobody can prove it safe.
23
00:00:41,600 –> 00:00:43,800
A premium SKU chosen for safety,
24
00:00:43,800 –> 00:00:46,120
because the engineer is accountable for outages,
25
00:00:46,120 –> 00:00:47,280
not for invoices.
26
00:00:47,280 –> 00:00:48,800
Silent egress during a migration
27
00:00:48,800 –> 00:00:50,680
because the network path changed,
28
00:00:50,680 –> 00:00:53,040
the data moved, and the bill kept arriving.
29
00:00:53,040 –> 00:00:54,760
None of these are exotic failures.
30
00:00:54,760 –> 00:00:57,600
They are the default behavior of a large Azure estate
31
00:00:57,600 –> 00:00:59,200
when intent is not enforced.
32
00:01:00,120 –> 00:01:01,160
Here’s what most people miss.
33
00:01:01,160 –> 00:01:03,760
Every one of those outcomes is locally rational.
34
00:01:03,760 –> 00:01:05,440
The engineer wants a stable deployment,
35
00:01:05,440 –> 00:01:07,000
so they select the higher tier.
36
00:01:07,000 –> 00:01:08,240
The team wants velocity,
37
00:01:08,240 –> 00:01:09,560
so they clone the environment
38
00:01:09,560 –> 00:01:11,240
and come back later to clean up.
39
00:01:11,240 –> 00:01:13,320
The platform team wants to unblock delivery,
40
00:01:13,320 –> 00:01:16,200
so they grant broad permissions temporarily.
41
00:01:16,200 –> 00:01:18,400
Each decision makes sense in isolation,
42
00:01:18,400 –> 00:01:20,720
but the enterprise doesn’t pay for isolated decisions.
43
00:01:20,720 –> 00:01:22,600
The enterprise pays for the aggregate,
44
00:01:22,600 –> 00:01:24,200
and that aggregate becomes chaos
45
00:01:24,200 –> 00:01:27,000
because cloud cost is not additive in the way leaders imagine.
46
00:01:27,000 –> 00:01:27,840
It is compounding.
47
00:01:27,840 –> 00:01:29,600
It accumulates from recurring resources,
48
00:01:29,600 –> 00:01:32,360
from idle capacity, from just-in-case redundancy,
49
00:01:32,360 –> 00:01:34,960
from shared services nobody can allocate,
50
00:01:34,960 –> 00:01:38,360
and from the quiet truth that Azure is a permissioned system.
51
00:01:38,360 –> 00:01:42,440
If something exists, some identity was allowed to create it.
52
00:01:42,440 –> 00:01:45,000
This is the part that should sound familiar to security people.
53
00:01:45,000 –> 00:01:46,280
Security drift doesn’t happen
54
00:01:46,280 –> 00:01:48,360
because everyone suddenly forgets security.
55
00:01:48,360 –> 00:01:50,440
It happens because exceptions accumulate.
56
00:01:50,440 –> 00:01:52,600
A Conditional Access policy gets an exclusion
57
00:01:52,600 –> 00:01:54,160
for this service account.
58
00:01:54,160 –> 00:01:56,320
An RBAC scope gets a temporary Owner.
59
00:01:56,320 –> 00:01:58,280
A firewall rule gets a one-day opening
60
00:01:58,280 –> 00:01:59,800
that survives three quarters.
61
00:01:59,800 –> 00:02:02,120
Over time, the system stops behaving deterministically
62
00:02:02,120 –> 00:02:04,200
and starts behaving probabilistically.
63
00:02:04,200 –> 00:02:05,520
Cost follows the same physics.
64
00:02:05,520 –> 00:02:08,160
If the platform allows teams to create resources
65
00:02:08,160 –> 00:02:09,680
without ownership metadata,
66
00:02:09,680 –> 00:02:10,840
without budget boundaries,
67
00:02:10,840 –> 00:02:12,800
and without constrained SKU choices,
68
00:02:12,800 –> 00:02:14,280
then drift is not a risk.
69
00:02:14,280 –> 00:02:15,240
Drift is guaranteed.
70
00:02:15,240 –> 00:02:18,040
You are operating a cost system with no memory of intent.
71
00:02:18,040 –> 00:02:20,240
That distinction matters.
72
00:02:20,240 –> 00:02:24,000
The typical enterprise response is predictable: reminders.
73
00:02:24,000 –> 00:02:25,880
We need to be more cost conscious.
74
00:02:25,880 –> 00:02:27,320
Please tag your resources.
75
00:02:27,320 –> 00:02:29,400
Here’s the monthly cost review deck.
76
00:02:29,400 –> 00:02:32,000
That approach feels mature because it looks organized,
77
00:02:32,000 –> 00:02:33,720
but awareness does not constrain behavior.
78
00:02:33,720 –> 00:02:34,560
It never did.
79
00:02:34,560 –> 00:02:36,800
Reminders don’t close deployment pathways.
80
00:02:36,800 –> 00:02:37,920
They don’t stop a pipeline
81
00:02:37,920 –> 00:02:40,120
from deploying a premium database tier.
82
00:02:40,120 –> 00:02:42,720
They don’t prevent a team from creating yet another subscription
83
00:02:42,720 –> 00:02:44,400
because procurement takes too long.
84
00:02:44,400 –> 00:02:46,480
They don’t shut down an abandoned environment
85
00:02:46,480 –> 00:02:47,320
on Friday night.
86
00:02:47,320 –> 00:02:48,760
Humans are not a control plane.
87
00:02:48,760 –> 00:02:50,840
The platform is. Azure Resource Manager is.
88
00:02:50,840 –> 00:02:52,920
RBAC is. Azure Policy is.
89
00:02:52,920 –> 00:02:54,960
Subscription boundaries are.
90
00:02:54,960 –> 00:02:56,960
Those are the things that decide what can exist
91
00:02:56,960 –> 00:02:57,800
and what cannot.
92
00:02:57,800 –> 00:03:00,200
If those layers do not encode financial intent,
93
00:03:00,200 –> 00:03:01,760
then the enterprise is basically
94
00:03:01,760 –> 00:03:03,480
running a distributed spending engine
95
00:03:03,480 –> 00:03:05,040
with no enforcement mechanism.
96
00:03:05,040 –> 00:03:07,080
So define the failure mode precisely.
97
00:03:07,080 –> 00:03:10,360
Unowned spend becomes normal because the system tolerates it.
98
00:03:10,360 –> 00:03:12,920
It tolerates resources that can’t be attributed to a product,
99
00:03:12,920 –> 00:03:15,280
a cost center, or a named owner.
100
00:03:15,280 –> 00:03:16,800
It tolerates platform spend
101
00:03:16,800 –> 00:03:19,720
smeared across shared subscriptions where nobody feels it.
102
00:03:19,720 –> 00:03:21,800
It tolerates environments that outlive the sprint
103
00:03:21,800 –> 00:03:23,080
they were created for.
104
00:03:23,080 –> 00:03:26,440
It tolerates premium defaults because nothing in the platform
105
00:03:26,440 –> 00:03:27,920
says prove you need this.
106
00:03:27,920 –> 00:03:30,440
And then eventually finance sees the invoice.
107
00:03:30,440 –> 00:03:32,440
By that point, the spend is no longer a decision.
108
00:03:32,440 –> 00:03:33,280
It’s dead.
109
00:03:33,280 –> 00:03:34,280
The service is running.
110
00:03:34,280 –> 00:03:35,720
The stakeholders are attached.
111
00:03:35,720 –> 00:03:37,360
The architecture has formed around it.
112
00:03:37,360 –> 00:03:39,240
Turning it off is now a risk discussion,
113
00:03:39,240 –> 00:03:40,520
not a cost discussion.
114
00:03:40,520 –> 00:03:42,360
That’s why invoice-time escalation fails.
115
00:03:42,360 –> 00:03:44,400
It’s always late and it’s always political.
116
00:03:44,400 –> 00:03:46,600
Cost entropy is the name for that trap.
117
00:03:46,600 –> 00:03:49,160
It is unmanaged pathways that generate recurring spend
118
00:03:49,160 –> 00:03:50,720
without decision review.
119
00:03:50,720 –> 00:03:54,800
It is the gradual conversion of cost control from a deterministic model
120
00:03:54,800 –> 00:03:57,880
where spending happens because someone explicitly intended it
121
00:03:57,880 –> 00:03:59,440
into a probabilistic one,
122
00:03:59,440 –> 00:04:03,760
where spending happens because the platform is allowed to do whatever it can.
123
00:04:03,760 –> 00:04:06,120
And if you’re wondering why waste cleanup never seems to finish,
124
00:04:06,120 –> 00:04:08,200
this is why: you are chasing symptoms
125
00:04:08,200 –> 00:04:10,920
after the authorization decision has already happened.
126
00:04:10,920 –> 00:04:12,560
The uncomfortable truth is simple.
127
00:04:12,560 –> 00:04:15,840
The enterprise cost failure mode is not the existence of waste.
128
00:04:15,840 –> 00:04:17,840
It’s the absence of enforceable ownership.
129
00:04:17,840 –> 00:04:20,960
Waste is just what unowned systems produce at scale.
130
00:04:20,960 –> 00:04:23,720
And that’s why most enterprises start FinOps backwards.
131
00:04:23,720 –> 00:04:27,280
They start with visibility tools, dashboards and reports,
132
00:04:27,280 –> 00:04:29,280
and then wonder why behavior doesn’t change.
133
00:04:29,280 –> 00:04:31,160
Visibility doesn’t enforce intent.
134
00:04:31,160 –> 00:04:32,320
Governance does.
135
00:04:32,320 –> 00:04:35,800
FinOps implemented backwards: tooling first, governance never.
136
00:04:35,800 –> 00:04:40,000
Most enterprises do FinOps the same way they do security awareness.
137
00:04:40,000 –> 00:04:43,720
They buy tooling, build dashboards, schedule a review meeting,
138
00:04:43,720 –> 00:04:46,200
and then act surprised when behavior doesn’t change.
139
00:04:46,200 –> 00:04:47,960
The usual sequence is almost scripted:
140
00:04:47,960 –> 00:04:52,280
first enable Azure Cost Management, then build reports,
141
00:04:52,280 –> 00:04:57,360
then export to Power BI, then argue about amortization, reservations,
142
00:04:57,360 –> 00:05:01,200
and whether the spend should be grouped by resource group, subscription, or tag.
143
00:05:01,200 –> 00:05:04,840
Somewhere in the middle, someone adds an email alert at 90% of budget.
144
00:05:04,840 –> 00:05:06,280
Everyone feels responsible.
145
00:05:06,280 –> 00:05:07,960
Nobody is constrained.
146
00:05:07,960 –> 00:05:09,160
That distinction matters.
147
00:05:09,160 –> 00:05:10,720
Observability is not governance.
148
00:05:10,720 –> 00:05:12,400
Observability tells you what happened.
149
00:05:12,400 –> 00:05:14,400
Governance decides what can happen.
150
00:05:14,400 –> 00:05:17,440
FinOps implemented backwards confuses the two and calls it progress.
151
00:05:17,440 –> 00:05:20,040
This is why so many FinOps programs turn into cost theatre.
152
00:05:20,040 –> 00:05:22,240
The reports get prettier, the decks get longer,
153
00:05:22,240 –> 00:05:24,400
the conversations get more sophisticated,
154
00:05:24,400 –> 00:05:27,440
but the platform remains permissive, so the spend keeps happening,
155
00:05:27,440 –> 00:05:30,280
and the FinOps team becomes a translation layer
156
00:05:30,280 –> 00:05:33,080
between invoices and engineers who never had to feel the cost decision
157
00:05:33,080 –> 00:05:34,480
in the moment it was made.
158
00:05:34,480 –> 00:05:36,680
Here’s the uncomfortable behavior pattern that follows.
159
00:05:36,680 –> 00:05:37,840
Alerts become noise.
160
00:05:37,840 –> 00:05:41,040
Budget alert hits, email goes out, nobody responds.
161
00:05:41,040 –> 00:05:42,520
Not because people are lazy,
162
00:05:42,520 –> 00:05:46,000
but because the alert is not attached to an owner with authority and consequence.
163
00:05:46,000 –> 00:05:47,600
The budget doesn’t change anything.
164
00:05:47,600 –> 00:05:50,280
It doesn’t block a deployment, it doesn’t require an exception,
165
00:05:50,280 –> 00:05:52,200
it doesn’t trigger escalation with teeth.
166
00:05:52,200 –> 00:05:56,600
It just creates another message in a mailbox already full of messages that sound urgent.
167
00:05:56,600 –> 00:05:58,520
And when alerts don’t trigger action,
168
00:05:58,520 –> 00:06:00,360
engineers learn the real policy.
169
00:06:00,360 –> 00:06:01,760
Ignore it.
170
00:06:01,760 –> 00:06:04,600
That is how cost entropy becomes a culture problem.
171
00:06:04,600 –> 00:06:06,280
Not because the people are irresponsible,
172
00:06:06,280 –> 00:06:08,040
but because the system teaches them
173
00:06:08,040 –> 00:06:10,040
that nothing happens when you exceed intent.
174
00:06:10,040 –> 00:06:12,600
The platform keeps running, the invoice arrives later.
175
00:06:12,600 –> 00:06:13,880
Somebody else argues about it.
176
00:06:13,880 –> 00:06:16,640
FinOps tooling is good at telling you where the money went.
177
00:06:16,640 –> 00:06:18,960
It is structurally bad at preventing the next dollar,
178
00:06:18,960 –> 00:06:22,160
unless you connect it to controls that shape deployment pathways.
179
00:06:22,160 –> 00:06:26,960
Most organizations don’t. They treat cost tooling as the control plane when it’s just telemetry.
180
00:06:26,960 –> 00:06:29,760
And nowhere does that failure hide better than shared services.
181
00:06:29,760 –> 00:06:32,360
Shared services is where cost accountability goes to die.
182
00:06:32,360 –> 00:06:34,600
Networking, logging, monitoring, security tooling,
183
00:06:34,600 –> 00:06:39,960
egress, private endpoints, everything that platform teams deploy in the name of standardization and safety.
184
00:06:39,960 –> 00:06:43,400
It’s also the perfect place for the organization to stop asking who owns spend
185
00:06:43,400 –> 00:06:44,840
because the answer is uncomfortable.
186
00:06:44,840 –> 00:06:46,920
Nobody owns it, everyone depends on it.
187
00:06:46,920 –> 00:06:50,680
So it becomes central IT spend and central IT becomes a cost sink.
188
00:06:50,680 –> 00:06:54,120
Every application team benefits, but no application team sees a direct bill.
189
00:06:54,120 –> 00:06:57,080
Therefore, nobody has an incentive to question retention, sampling,
190
00:06:57,080 –> 00:07:01,400
SKU tiers, or whether that cross-region log ingestion was actually required.
191
00:07:01,400 –> 00:07:04,040
The system behaves exactly as designed.
192
00:07:04,040 –> 00:07:06,200
Shared costs become invisible costs.
193
00:07:06,200 –> 00:07:10,120
Then finance asks why cloud is expensive and the platform team shows a dashboard.
194
00:07:10,120 –> 00:07:13,640
The foundational mistake is treating the cost problem like a visibility problem.
195
00:07:13,640 –> 00:07:15,880
Visibility is necessary; it is never sufficient.
196
00:07:15,880 –> 00:07:17,800
A dashboard does not create a boundary.
197
00:07:17,800 –> 00:07:19,800
A report does not create a consequence.
198
00:07:19,800 –> 00:07:24,200
A monthly review does not stop a pipeline from deploying a premium tier on Tuesday morning
199
00:07:24,200 –> 00:07:26,440
because the engineer wants to reduce operational risk.
200
00:07:26,440 –> 00:07:30,680
So the FinOps meeting becomes a recurring ritual where everyone agrees something should change
201
00:07:30,680 –> 00:07:33,160
and then the system keeps doing what it’s allowed to do.
202
00:07:33,160 –> 00:07:35,400
That’s the key phrase, what it’s allowed to do.
203
00:07:35,400 –> 00:07:40,200
Because the only place you can reliably change cost behavior at scale is the control plane,
204
00:07:40,200 –> 00:07:42,760
identity, policy, hierarchy and permissions.
205
00:07:42,760 –> 00:07:45,800
Azure doesn’t spend money; your authorization model spends money.
206
00:07:45,800 –> 00:07:49,480
The moment you accept that, the whole tooling first approach looks like putting a
207
00:07:49,480 –> 00:07:51,400
speedometer in a car and calling it braking.
208
00:07:51,400 –> 00:07:53,480
It’s useful information; it is not control.
209
00:07:53,480 –> 00:07:56,440
FinOps implemented correctly starts from a different question.
210
00:07:56,440 –> 00:08:00,120
Where is the enterprise allowing spend to occur without explicit intent?
211
00:08:00,120 –> 00:08:03,480
And how does the platform enforce that intent every time?
212
00:08:03,480 –> 00:08:05,080
That means budgets aren’t just numbers.
213
00:08:05,080 –> 00:08:06,760
They’re signals wired to owners.
214
00:08:06,760 –> 00:08:08,120
Tagging isn’t etiquette.
215
00:08:08,120 –> 00:08:09,720
It’s enforced metadata.
216
00:08:09,720 –> 00:08:11,320
SKU selection isn’t preference.
217
00:08:11,320 –> 00:08:11,960
It’s policy.
218
00:08:11,960 –> 00:08:13,560
Subscription creation isn’t convenience.
219
00:08:13,560 –> 00:08:15,640
It’s a gated act with declared accountability.
220
00:08:15,640 –> 00:08:19,240
In other words, cost isn’t a finance artifact you observe after the fact.
221
00:08:19,240 –> 00:08:20,520
It’s a control plane outcome.
222
00:08:20,520 –> 00:08:23,160
You either constrained it by design or you didn’t.
223
00:08:23,160 –> 00:08:26,280
And once you see cost that way, the next step becomes obvious.
224
00:08:26,280 –> 00:08:29,080
Every cloud dollar is an authorization decision.
225
00:08:29,080 –> 00:08:30,200
The reframe.
226
00:08:30,200 –> 00:08:32,440
Every cloud dollar is an authorization decision.
227
00:08:32,440 –> 00:08:35,800
Here’s the reframe that makes everything else painfully obvious.
228
00:08:35,800 –> 00:08:37,720
A cloud bill is not a finance event.
229
00:08:37,720 –> 00:08:40,280
It’s a runtime side effect of authorization.
230
00:08:40,280 –> 00:08:41,960
Before a dollar shows up on an invoice,
231
00:08:41,960 –> 00:08:44,920
something had to be created, scaled or left running.
232
00:08:44,920 –> 00:08:46,280
And before that could happen,
233
00:08:46,280 –> 00:08:49,480
the platform evaluated an allow/deny decision somewhere in the graph.
234
00:08:49,480 –> 00:08:51,160
A user, a service principal,
235
00:08:51,160 –> 00:08:53,320
a managed identity, a pipeline,
236
00:08:53,320 –> 00:08:55,160
a landing zone automation account.
237
00:08:55,160 –> 00:08:57,400
Azure didn’t get expensive.
238
00:08:57,400 –> 00:08:59,000
Azure did what it was allowed to do.
239
00:08:59,000 –> 00:09:01,720
That distinction matters because it moves the conversation away
240
00:09:01,720 –> 00:09:03,880
from feelings and toward mechanics.
241
00:09:03,880 –> 00:09:05,560
Cost isn’t a behavior you inspire.
242
00:09:05,560 –> 00:09:07,000
It’s a pathway you permit.
243
00:09:07,000 –> 00:09:08,440
If a resource exists,
244
00:09:08,440 –> 00:09:10,920
some identity had enough permission to create it.
245
00:09:10,920 –> 00:09:13,320
And the hierarchy had enough openness to accept it.
246
00:09:13,320 –> 00:09:14,760
So, put simply,
247
00:09:14,760 –> 00:09:17,960
every cloud dollar begins life as an authorization decision.
248
00:09:17,960 –> 00:09:20,920
Most enterprises pretend cost starts in cost management.
249
00:09:20,920 –> 00:09:21,640
It does not.
250
00:09:21,640 –> 00:09:23,960
Cost starts at deploy time and scale time.
251
00:09:23,960 –> 00:09:27,320
Cost starts when the system compiles intent into reality.
252
00:09:27,320 –> 00:09:28,600
RBAC allows the action,
253
00:09:28,600 –> 00:09:32,840
policy allows the configuration and the subscription boundary absorbs the blast radius.
254
00:09:32,840 –> 00:09:35,080
Think of Azure like an authorization compiler.
255
00:09:35,080 –> 00:09:36,360
You write intent as code,
256
00:09:36,360 –> 00:09:39,560
ARM templates, Bicep, Terraform, pipelines, portal clicks.
257
00:09:39,560 –> 00:09:42,200
The control plane evaluates that intent against rules.
258
00:09:42,200 –> 00:09:43,240
If it passes,
259
00:09:43,240 –> 00:09:46,920
the platform materializes capacity that burns money every hour
260
00:09:46,920 –> 00:09:48,360
until something stops it.
261
00:09:48,360 –> 00:09:50,680
If you want cost control, you don’t need more visibility.
262
00:09:50,680 –> 00:09:52,520
You need tighter compilation rules.
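To make the “authorization compiler” idea concrete, here is a toy model in Python of the gates described above: RBAC decides whether the identity may act, policy decides whether the configuration is acceptable, and only then does billable capacity exist. It is a sketch, not how Azure Resource Manager is implemented; the role names mimic Azure built-in roles, while `DeploymentRequest`, the subscription names, and the SKU allow-list are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeploymentRequest:
    # Hypothetical request shape: who wants to deploy which SKU, and where.
    principal: str
    sku: str
    subscription: str


# Toy RBAC: role assignments keyed by (principal, subscription). Illustrative only.
ROLE_ASSIGNMENTS = {
    ("pipeline-dev", "sub-dev"): {"Contributor"},
    ("pipeline-dev", "sub-sandbox"): {"Owner"},
    ("analyst", "sub-dev"): {"Reader"},
}
WRITE_ROLES = {"Owner", "Contributor"}

# Toy policy: per-subscription SKU allow-lists. "sub-sandbox" has none on purpose.
ALLOWED_SKUS = {"sub-dev": {"Standard_B2s", "Standard_D2s_v5"}}


def rbac_allows(req):
    """Gate 1: does the identity hold a role that permits writes at this scope?"""
    roles = ROLE_ASSIGNMENTS.get((req.principal, req.subscription), set())
    return bool(roles & WRITE_ROLES)


def policy_allows(req):
    """Gate 2: does the configuration satisfy the policy assigned at this scope?"""
    allowed = ALLOWED_SKUS.get(req.subscription)
    # No assignment means no constraint: the permissive default.
    return allowed is None or req.sku in allowed


def compile_intent(req):
    """Run every gate; capacity (and billing) only materializes if all pass."""
    if not rbac_allows(req):
        return False, "denied by RBAC"
    if not policy_allows(req):
        return False, "denied by policy"
    return True, "materialized: billing starts"


premium = "Standard_E64s_v5"
print(compile_intent(DeploymentRequest("pipeline-dev", premium, "sub-dev")))      # (False, 'denied by policy')
print(compile_intent(DeploymentRequest("pipeline-dev", premium, "sub-sandbox")))  # (True, 'materialized: billing starts')
```

The same identity and the same SKU produce opposite outcomes; the only difference is which subscription had a rule attached.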
263
00:09:52,520 –> 00:09:56,680
This is also why anonymous spending is the most dangerous anti-pattern in Azure.
264
00:09:56,680 –> 00:09:59,400
Anonymous spending isn’t literally anonymous: Azure logs everything,
265
00:09:59,400 –> 00:10:00,440
billing has line items.
266
00:10:00,440 –> 00:10:05,720
The issue is that the enterprise can’t map spend to a responsible decision maker in time to intervene.
267
00:10:05,720 –> 00:10:07,960
The cost is smeared across shared scopes
268
00:10:07,960 –> 00:10:11,320
or resources are created without enforceable ownership metadata
269
00:10:11,320 –> 00:10:13,640
or the owner left the company and the budget stayed.
270
00:10:13,640 –> 00:10:14,920
That’s not a reporting gap.
271
00:10:14,920 –> 00:10:18,760
That’s an authorization gap because cost control only works when the decision maker
272
00:10:18,760 –> 00:10:20,520
is inside the feedback loop.
273
00:10:20,520 –> 00:10:23,400
If engineering can deploy without owning the financial impact,
274
00:10:23,400 –> 00:10:25,880
you’ve built a system where accountability is optional.
275
00:10:25,880 –> 00:10:28,200
Optional accountability doesn’t survive scale.
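Mapping spend back to a decision maker is, at its simplest, a grouping problem. The sketch below assumes cost line items have already been exported with their resource tags; the record layout, the `owner` tag key, and the `active_owners` set are hypothetical conventions, not the Cost Management export schema. It separates spend with no owner tag from spend whose tagged owner has left, and reports how much of the total nobody currently owns.

```python
from collections import defaultdict


def attribute_spend(line_items, active_owners, owner_tag="owner"):
    """Sum cost per owner; return (totals, share of spend nobody currently owns)."""
    totals = defaultdict(float)
    for item in line_items:
        owner = (item.get("tags") or {}).get(owner_tag)
        if not owner:
            bucket = "UNATTRIBUTED"  # no owner tag at all, or an empty one
        elif owner not in active_owners:
            bucket = "ORPHANED"      # tagged, but the named owner is no longer around
        else:
            bucket = owner
        totals[bucket] += item["cost"]
    total = sum(totals.values())
    unowned = totals.get("UNATTRIBUTED", 0.0) + totals.get("ORPHANED", 0.0)
    return dict(totals), (unowned / total if total else 0.0)


items = [
    {"resource": "vm-api", "cost": 300.0, "tags": {"owner": "alice"}},
    {"resource": "sql-reporting", "cost": 100.0, "tags": {"owner": "bob"}},  # bob has left
    {"resource": "log-workspace", "cost": 500.0, "tags": {}},
    {"resource": "nat-gateway", "cost": 100.0},
]
totals, unowned_share = attribute_spend(items, active_owners={"alice"})
print(totals)         # {'alice': 300.0, 'ORPHANED': 100.0, 'UNATTRIBUTED': 600.0}
print(unowned_share)  # 0.7
```

Everything in the last two buckets is spend the enterprise can see but cannot route to anyone who can still decide about it.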
276
00:10:28,200 –> 00:10:29,480
Now here’s the weird part.
277
00:10:29,480 –> 00:10:31,080
The more exceptions you allow,
278
00:10:31,080 –> 00:10:33,640
the less predictable cost control becomes.
279
00:10:33,640 –> 00:10:36,520
Enterprises love exceptions because they sound pragmatic.
280
00:10:36,520 –> 00:10:38,120
This workload is special.
281
00:10:38,120 –> 00:10:39,240
This team is blocked.
282
00:10:39,240 –> 00:10:40,440
We’ll fix it later.
283
00:10:40,440 –> 00:10:45,080
And each exception converts your financial control model from deterministic to probabilistic.
284
00:10:45,080 –> 00:10:47,800
Deterministic means if you try to deploy X,
285
00:10:47,800 –> 00:10:50,200
the platform will deny it unless you meet Y.
286
00:10:50,200 –> 00:10:52,680
Probabilistic means sometimes X is denied,
287
00:10:52,680 –> 00:10:53,880
sometimes it passes,
288
00:10:53,880 –> 00:10:56,040
depending on who asked, what scope they used,
289
00:10:56,040 –> 00:10:57,880
which subscription they found,
290
00:10:57,880 –> 00:10:59,880
which policy is actually assigned,
291
00:10:59,880 –> 00:11:02,520
and which exemption was quietly granted six months ago.
292
00:11:02,520 –> 00:11:05,000
That isn’t governance. That’s conditional chaos.
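A toy evaluator shows how exemptions turn a deterministic rule into a probabilistic one. In this hypothetical Python sketch the rule is “deny premium SKUs”, and each exemption record carries a scope and an optional expiry date; the SKU set, the scope strings, and the record layout are illustrative assumptions, loosely modeled on how Azure Policy exemptions attach to a scope.

```python
from datetime import date

# Example premium sizes; which SKUs count as "premium" is an assumption.
PREMIUM_SKUS = {"Standard_E64s_v5", "Standard_M128s"}

# Hypothetical exemption records: a scope plus an optional expiry date.
EXEMPTIONS = [
    {"scope": "/subscriptions/sub-b", "expires": None},              # granted once, never reviewed
    {"scope": "/subscriptions/sub-c", "expires": date(2024, 1, 1)},  # time-boxed, now lapsed
]


def is_denied(sku, scope, exemptions, today):
    """Deny premium SKUs unless a live exemption is attached to this scope or a parent."""
    if sku not in PREMIUM_SKUS:
        return False
    for ex in exemptions:
        # The trailing "/" stops "/subscriptions/sub-b" from matching "sub-bb".
        covers = scope == ex["scope"] or scope.startswith(ex["scope"] + "/")
        live = ex["expires"] is None or today < ex["expires"]
        if covers and live:
            return False
    return True


today = date(2024, 6, 1)
for sub in ("sub-a", "sub-b", "sub-c"):
    print(sub, is_denied("Standard_E64s_v5", f"/subscriptions/{sub}", EXEMPTIONS, today))
# sub-a True
# sub-b False
# sub-c True
```

With no exemptions, the answer for a premium request is always “deny”. Each open-ended exemption adds a scope where the answer flips, and the person deploying cannot tell in advance which case they are in. An expiry date on every exemption is what stops a temporary exception from becoming permanent.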
293
00:11:05,000 –> 00:11:07,480
So what is financial intent in architectural terms?
294
00:11:07,480 –> 00:11:09,320
It’s not a spreadsheet. It’s not a forecast.
295
00:11:09,320 –> 00:11:12,120
It’s a set of constraints the platform enforces continuously.
296
00:11:12,120 –> 00:11:16,360
Ownership. Every deployable scope has a named accountable party.
297
00:11:16,360 –> 00:11:19,560
Boundaries. Budgets and thresholds exist where ownership exists.
298
00:11:19,560 –> 00:11:23,800
Constraints. Allowed SKUs, regions, and patterns match the environment’s purpose.
299
00:11:23,800 –> 00:11:28,040
Escalation. When spend deviates, something happens that is not an email.
300
00:11:28,040 –> 00:11:32,520
Financial intent is the enterprise’s decision logic encoded where decisions actually happen.
301
00:11:32,520 –> 00:11:34,840
And once you accept that cost is authorization,
302
00:11:34,840 –> 00:11:36,360
you also accept something else.
303
00:11:36,360 –> 00:11:38,920
FinOps lives with identity, policy,
304
00:11:38,920 –> 00:11:40,520
RBAC, and hierarchy.
305
00:11:40,520 –> 00:11:42,280
Not because finance wants to be technical,
306
00:11:42,280 –> 00:11:44,200
but because that’s where enforcement lives.
307
00:11:44,200 –> 00:11:45,800
Cost management can tell you what happened.
308
00:11:45,800 –> 00:11:49,800
It can’t stop the next deployment. Azure Policy can. RBAC can.
309
00:11:49,800 –> 00:11:51,640
A subscription boundary can.
310
00:11:51,640 –> 00:11:55,880
Exception governance can. This is why the enterprise should stop talking about saving money
311
00:11:55,880 –> 00:11:59,000
and start talking about removing unaudited spending pathways.
312
00:11:59,000 –> 00:12:01,320
The savings are a byproduct; control is the goal.
313
00:12:01,320 –> 00:12:04,040
And if you want one practical implication to hold on to,
314
00:12:04,040 –> 00:12:07,320
if you can’t point to the exact boundary where spend is owned and constrained,
315
00:12:07,320 –> 00:12:08,920
you don’t have financial governance.
316
00:12:08,920 –> 00:12:10,280
You have financial hope.
317
00:12:10,280 –> 00:12:14,440
So the question becomes, where is the first boundary that actually works at enterprise scale?
318
00:12:14,440 –> 00:12:16,440
It isn’t a resource group, it isn’t a tag.
319
00:12:16,440 –> 00:12:17,640
It isn’t a dashboard.
320
00:12:17,640 –> 00:12:18,840
It’s the subscription.
321
00:12:18,840 –> 00:12:21,320
Subscriptions are the primary cost governance boundary.
322
00:12:21,320 –> 00:12:23,640
Most people treat subscriptions as billing buckets.
323
00:12:23,640 –> 00:12:25,160
A place to put workloads.
324
00:12:25,160 –> 00:12:26,760
A line item you can move later.
325
00:12:26,760 –> 00:12:29,000
That mental model is why cost control fails.
326
00:12:29,000 –> 00:12:31,560
A subscription is not primarily a finance construct.
327
00:12:31,560 –> 00:12:34,120
It’s a governance boundary where three things collide.
328
00:12:34,120 –> 00:12:35,880
RBAC scope, policy scope,
329
00:12:35,880 –> 00:12:38,040
and a measurable financial blast radius.
330
00:12:38,040 –> 00:12:40,760
It is the first place where you can make ownership real.
331
00:12:40,760 –> 00:12:43,240
Because the platform can attach permissions, budgets,
332
00:12:43,240 –> 00:12:46,360
and policy enforcement to a scope that actually contains damage.
333
00:12:46,360 –> 00:12:48,520
Resource groups don’t do that, not reliably.
334
00:12:48,520 –> 00:12:50,440
Resource groups are operational containers.
335
00:12:50,440 –> 00:12:51,720
They help you organize.
336
00:12:51,720 –> 00:12:52,600
They help you deploy.
337
00:12:52,600 –> 00:12:55,400
They do not protect you from a team creating a second resource group
338
00:12:55,400 –> 00:12:58,200
with a different set of tags, a different naming convention,
339
00:12:58,200 –> 00:13:00,360
and a slightly different temporary story.
340
00:13:00,360 –> 00:13:03,560
And they absolutely don’t protect you from the oldest enterprise trick.
341
00:13:03,560 –> 00:13:07,800
Burying expensive shared services in a resource group nobody wants to touch.
342
00:13:07,800 –> 00:13:10,040
Management groups are higher level governance.
343
00:13:10,040 –> 00:13:11,560
They’re necessary for scale,
344
00:13:11,560 –> 00:13:14,280
but they’re not where cost accountability becomes personal.
345
00:13:14,280 –> 00:13:15,960
They’re where standards get inherited.
346
00:13:15,960 –> 00:13:18,760
The place where spend becomes owned is lower,
347
00:13:18,760 –> 00:13:21,480
where budgets and permissions map to actual teams.
348
00:13:21,480 –> 00:13:23,160
That’s the subscription.
349
00:13:23,160 –> 00:13:25,960
A well-designed subscription is a budget boundary first.
350
00:13:25,960 –> 00:13:27,560
It’s the unit where you can say,
351
00:13:27,560 –> 00:13:30,200
this is the maximum financial exposure we will tolerate
352
00:13:30,200 –> 00:13:31,880
for this workload or this team.
353
00:13:31,880 –> 00:13:35,240
And if it exceeds expected behavior, escalation happens immediately.
354
00:13:35,240 –> 00:13:37,400
Not at invoice time, at deviation time.
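Escalating at deviation time means comparing spend against early thresholds while the month is still running, including where the current run rate is heading. Azure Cost Management budgets can notify on both actual and forecasted cost; the sketch below only imitates that idea with a naive straight-line forecast. The function name, the forecast method, and the 50/75/100% defaults are assumptions for illustration, not the Azure Budgets API.

```python
def crossed_thresholds(budget, spend_to_date, day_of_month, days_in_month,
                       thresholds=(0.5, 0.75, 1.0)):
    """Return (kind, threshold) pairs crossed by actual spend or a straight-line forecast."""
    if budget <= 0 or not 1 <= day_of_month <= days_in_month:
        raise ValueError("need a positive budget and a valid day of month")
    # Assume the rest of the month runs at the average daily rate so far.
    forecast = spend_to_date / day_of_month * days_in_month
    events = []
    for t in sorted(thresholds):
        if spend_to_date >= budget * t:
            events.append(("actual", t))
        elif forecast >= budget * t:
            events.append(("forecast", t))
    return events


# Day 10 of 30: 40% spent, but the run rate points at 120% by month-end.
print(crossed_thresholds(10_000, 4_000, day_of_month=10, days_in_month=30))
# [('forecast', 0.5), ('forecast', 0.75), ('forecast', 1.0)]
```

In this example a single 90% alert on actual spend would stay silent until around day 23 at the same run rate; the forecast crossings give the owner almost two weeks to decide instead.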
355
00:13:37,400 –> 00:13:39,400
A subscription is also an RBAC boundary.
356
00:13:39,400 –> 00:13:41,080
If you want to stop anonymous spending,
357
00:13:41,080 –> 00:13:43,560
you need to stop handing out broad Contributor rights at scopes
358
00:13:43,560 –> 00:13:45,240
where nobody can be clearly blamed.
359
00:13:45,240 –> 00:13:47,320
Subscriptions let you define who can deploy,
360
00:13:47,320 –> 00:13:49,080
who can approve, who can see costs,
361
00:13:49,080 –> 00:13:50,360
and who can grant further rights.
362
00:13:50,360 –> 00:13:52,120
That separation matters because otherwise,
363
00:13:52,120 –> 00:13:54,680
the same identity that can create spend can also hide it.
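The separation described above can be checked mechanically by scanning role assignments for identities that can both create spend and grant access. The role names below are Azure built-in roles, but mapping each one to a small set of capabilities is a deliberate simplification (real role definitions are lists of permitted actions), and the assignment records are hypothetical. The sketch also ignores assignments inherited from management groups.

```python
from collections import defaultdict

# Simplified capabilities of a few Azure built-in roles (real roles are action sets).
ROLE_CAPABILITIES = {
    "Owner": {"deploy", "grant"},
    "Contributor": {"deploy"},
    "User Access Administrator": {"grant"},
    "Cost Management Reader": {"see_costs"},
}


def conflicting_principals(assignments):
    """Per subscription, list principals that can both create spend and grant access."""
    caps = defaultdict(set)
    for a in assignments:
        # Unknown (e.g. custom) roles contribute no capabilities in this model.
        caps[(a["subscription"], a["principal"])] |= ROLE_CAPABILITIES.get(a["role"], set())
    conflicts = defaultdict(list)
    for (sub, principal), granted in caps.items():
        if {"deploy", "grant"} <= granted:
            conflicts[sub].append(principal)
    return {sub: sorted(names) for sub, names in conflicts.items()}


assignments = [
    {"subscription": "sub-payments", "principal": "pipeline-sp", "role": "Contributor"},
    {"subscription": "sub-payments", "principal": "pipeline-sp", "role": "User Access Administrator"},
    {"subscription": "sub-payments", "principal": "alice", "role": "Cost Management Reader"},
    {"subscription": "sub-payments", "principal": "platform-admin", "role": "Owner"},
    {"subscription": "sub-dev", "principal": "bob", "role": "Contributor"},
]
print(conflicting_principals(assignments))
# {'sub-payments': ['pipeline-sp', 'platform-admin']}
```

Note that the pipeline identity only becomes a conflict through the combination of two individually reasonable roles, which is exactly how this kind of overlap tends to creep in.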
364
00:13:54,680 –> 00:13:57,240
And a subscription is a policy boundary.
365
00:13:57,240 –> 00:13:59,400
Azure Policy assignments at subscription scope
366
00:13:59,400 –> 00:14:01,960
are where enforcement stops being aspirational.
367
00:14:01,960 –> 00:14:03,640
You can deny premium SKUs in dev,
368
00:14:03,640 –> 00:14:06,200
you can restrict regions, you can require tags,
369
00:14:06,200 –> 00:14:07,800
you can force diagnostic settings
370
00:14:07,800 –> 00:14:09,880
that you’ve decided you’re willing to pay for.
371
00:14:09,880 –> 00:14:13,160
You can also carve exceptions with visibility and expiration
372
00:14:13,160 –> 00:14:16,920
instead of letting them live forever as silent entropy generators.
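The guardrails mentioned above map onto ordinary Azure Policy rules. This sketch builds three rule bodies as Python dictionaries and serializes them to JSON: allowed VM sizes, allowed regions, and a required owner tag. The `if`/`then` structure, the `allOf`, `not`, `in`, `notIn`, `notEquals`, and `exists` conditions, the `deny` effect, and the `Microsoft.Compute/virtualMachines/sku.name` alias follow the Azure Policy definition format, but the helper functions, the chosen sizes and regions, and the rule names are hypothetical; check the output against the current Azure Policy documentation before assigning anything.

```python
import json


def allowed_vm_sizes_rule(allowed_sizes):
    """Deny virtual machines whose size is not on the allow-list."""
    return {
        "if": {
            "allOf": [
                {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
                {"not": {"field": "Microsoft.Compute/virtualMachines/sku.name",
                         "in": sorted(allowed_sizes)}},
            ]
        },
        "then": {"effect": "deny"},
    }


def allowed_locations_rule(regions):
    """Deny resources outside the approved regions; 'global' resources are left alone."""
    return {
        "if": {
            "allOf": [
                {"field": "location", "notIn": sorted(regions)},
                {"field": "location", "notEquals": "global"},
            ]
        },
        "then": {"effect": "deny"},
    }


def require_tag_rule(tag_name):
    """Deny resources created without the given tag."""
    return {
        "if": {"field": f"tags['{tag_name}']", "exists": "false"},
        "then": {"effect": "deny"},
    }


# Hypothetical dev guardrails: small VM sizes, two regions, mandatory owner tag.
dev_rules = {
    "dev-allowed-vm-sizes": allowed_vm_sizes_rule({"Standard_B2s", "Standard_D2s_v5"}),
    "dev-allowed-locations": allowed_locations_rule({"westeurope", "northeurope"}),
    "require-owner-tag": require_tag_rule("owner"),
}
print(json.dumps(dev_rules["dev-allowed-vm-sizes"], indent=2))
```

Each rule body still has to be wrapped in a policy definition (with a `mode` such as `Indexed`) and assigned at the dev subscription’s scope before it enforces anything.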
373
00:14:16,920 –> 00:14:19,320
Now look at the failure mode most enterprises live in.
374
00:14:19,320 –> 00:14:20,600
Subscriptions sprawl.
375
00:14:20,600 –> 00:14:22,120
Subscriptions get created ad hoc.
376
00:14:22,120 –> 00:14:24,280
A team needs a sandbox, so they create one.
377
00:14:24,280 –> 00:14:26,440
And another team needs a POC, so they create one.
378
00:14:26,440 –> 00:14:28,680
The platform team needs to unblock delivery,
379
00:14:28,680 –> 00:14:29,800
so they create one.
380
00:14:29,800 –> 00:14:32,760
Over time, you get dozens or hundreds of subscriptions
381
00:14:32,760 –> 00:14:34,920
with inconsistent policies, inconsistent tagging,
382
00:14:34,920 –> 00:14:37,640
inconsistent permissions, and no coherent budget story.
383
00:14:37,640 –> 00:14:40,040
And when the bill spikes, nobody knows where to look first.
384
00:14:40,040 –> 00:14:42,040
Because the sprawl wasn’t just more subscriptions,
385
00:14:42,040 –> 00:14:44,440
it was more unreviewed pathways for spend,
386
00:14:44,440 –> 00:14:46,840
more identities, more places where policies weren’t assigned.
387
00:14:47,240 –> 00:14:49,880
More corners where a high tier resource could hide for months.
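Finding those unreviewed pathways in an existing estate is an inventory exercise. A minimal sketch, assuming you have already exported, for each subscription, its tags, whether a budget exists, and which policy assignments apply (how you collect that, and the policy names used here, are assumptions): it lists every subscription that is missing an owner, a budget, or a baseline policy.

```python
# Hypothetical names for the baseline policy assignments every subscription should carry.
REQUIRED_POLICIES = {"allowed-vm-sizes", "allowed-locations", "require-owner-tag"}


def audit_subscriptions(subscriptions):
    """Return {subscription name: [gaps]} for every subscription missing a control."""
    findings = {}
    for sub in subscriptions:
        gaps = []
        if not (sub.get("tags") or {}).get("owner"):
            gaps.append("no owner")
        if not sub.get("has_budget"):
            gaps.append("no budget")
        missing = REQUIRED_POLICIES - set(sub.get("policies", []))
        gaps.extend(f"missing policy: {name}" for name in sorted(missing))
        if gaps:
            findings[sub["name"]] = gaps
    return findings


inventory = [
    {"name": "sub-payments-prod", "tags": {"owner": "jane.doe"}, "has_budget": True,
     "policies": ["allowed-vm-sizes", "allowed-locations", "require-owner-tag"]},
    {"name": "sub-poc-2021", "tags": {}, "has_budget": False, "policies": []},
    {"name": "sub-sandbox-7", "tags": {"owner": "raj"}, "has_budget": True,
     "policies": ["require-owner-tag"]},
]
for name, gaps in audit_subscriptions(inventory).items():
    print(name, gaps)
```

Sorting the findings by number of gaps is a reasonable way to decide which subscriptions to bring under a vending model first.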
388
00:14:49,880 –> 00:14:52,520
So the real principle is not use fewer subscriptions.
389
00:14:52,520 –> 00:14:56,280
The principle is, a subscription should not exist without declared intent.
390
00:14:56,280 –> 00:14:58,680
That means subscription creation is not a convenience.
391
00:14:58,680 –> 00:14:59,720
It’s a governance event.
392
00:14:59,720 –> 00:15:02,280
It is the moment you decide who owns this,
393
00:15:02,280 –> 00:15:04,840
what it can spend, what it is allowed to deploy,
394
00:15:04,840 –> 00:15:06,440
and what happens when it deviates.
395
00:15:06,440 –> 00:15:07,880
Call it a vending model if you want.
396
00:15:07,880 –> 00:15:08,840
The name doesn’t matter.
397
00:15:08,840 –> 00:15:10,280
The enforcement does.
398
00:15:10,280 –> 00:15:13,160
Before a subscription is issued, four things must be true.
399
00:15:13,160 –> 00:15:15,160
First, there is an accountable owner.
400
00:15:15,160 –> 00:15:18,760
A human name, not “platform team,” not a distribution list:
401
00:15:18,760 –> 00:15:21,800
a role with escalation responsibility when budgets fire.
402
00:15:21,800 –> 00:15:24,120
Second, there is a budget with early thresholds,
403
00:15:24,120 –> 00:15:27,640
not 90% at month-end. Thresholds at 50% and 75%
404
00:15:27,640 –> 00:15:29,880
are governance interrupts, not failure notices.
405
00:15:29,880 –> 00:15:33,080
Third, there are allowed SKUs and regions aligned to purpose.
406
00:15:33,080 –> 00:15:34,040
Dev is not prod.
407
00:15:34,040 –> 00:15:36,440
Non-prod doesn’t get premium defaults just in case.
408
00:15:36,440 –> 00:15:40,200
Regions are constrained because global sprawl is both expensive
409
00:15:40,200 –> 00:15:41,400
and operationally chaotic.
410
00:15:41,400 –> 00:15:44,680
Fourth, there is an escalation workflow that actually routes.
411
00:15:45,160 –> 00:15:48,120
If a budget triggers, it creates a ticket, it pages the owner,
412
00:15:48,120 –> 00:15:49,400
it hits the right channel.
413
00:15:49,400 –> 00:15:51,240
Something happens that forces a decision.
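A minimal sketch of that vending gate, assuming a hypothetical `SubscriptionRequest` shape and a hypothetical list of shared mailboxes: it refuses to issue a subscription unless all four preconditions hold, then emits a budget body with early notifications at 50% and 75%. The budget field names (`category`, `amount`, `timeGrain`, `timePeriod`, `notifications`, `thresholdType`) follow the `Microsoft.Consumption/budgets` schema as I understand it; verify them against the API version you actually deploy with.

```python
from dataclasses import dataclass, field

# Hypothetical shared mailboxes that must never be accepted as an owner.
KNOWN_SHARED_MAILBOXES = {"platform-team@example.com", "cloud-ops@example.com"}


@dataclass
class SubscriptionRequest:
    # Illustrative request shape for a vending pipeline; not an Azure API type.
    name: str
    owner_email: str
    monthly_budget: float
    environment: str
    allowed_skus: list = field(default_factory=list)
    allowed_regions: list = field(default_factory=list)
    escalation_route: str = ""  # e.g. a ticket queue or on-call channel


def vending_errors(req: SubscriptionRequest) -> list:
    """Return every unmet precondition; an empty list means the subscription may be issued."""
    errors = []
    if "@" not in req.owner_email or req.owner_email.lower() in KNOWN_SHARED_MAILBOXES:
        errors.append("owner must be a named individual")
    if req.monthly_budget <= 0:
        errors.append("budget must be a positive amount")
    if not req.allowed_skus or not req.allowed_regions:
        errors.append("allowed SKUs and regions must be declared")
    if not req.escalation_route:
        errors.append("an escalation route is required")
    return errors


def budget_body(req: SubscriptionRequest, start_date: str) -> dict:
    """Budget resource body with early notifications at 50% and 75% of actual spend."""
    notifications = {
        f"Actual_GreaterThan_{pct}_Percent": {
            "enabled": True,
            "operator": "GreaterThan",
            "threshold": pct,
            "thresholdType": "Actual",
            "contactEmails": [req.owner_email],
        }
        for pct in (50, 75)
    }
    return {
        "properties": {
            "category": "Cost",
            "amount": req.monthly_budget,
            "timeGrain": "Monthly",
            "timePeriod": {"startDate": start_date},
            "notifications": notifications,
        }
    }
```

The ordering is the point: `vending_errors` runs before anything is created, so a request with no named owner or no escalation route never becomes a subscription at all.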
414
00:15:51,240 –> 00:15:52,280
This is the point.
415
00:15:52,280 –> 00:15:55,000
Subscriptions turn cost governance from a vague aspiration
416
00:15:55,000 –> 00:15:56,600
into an enforceable boundary.
417
00:15:56,600 –> 00:16:00,200
When you do this, sprawl collapses into an intentional structure.
418
00:16:00,200 –> 00:16:02,920
And that structure is the only thing that lets you scale
419
00:16:02,920 –> 00:16:05,800
Azure without scaling financial chaos,
420
00:16:05,800 –> 00:16:07,400
which brings up the first scenario,
421
00:16:07,400 –> 00:16:09,400
what happens when you don’t do any of this.
422
00:16:09,400 –> 00:16:12,200
Scenario one, subscription sprawl with no ownership.
423
00:16:12,200 –> 00:16:14,760
Here’s what subscription sprawl looks like in a real enterprise,
424
00:16:14,760 –> 00:16:16,920
not the PowerPoint version, the lived version.
425
00:16:16,920 –> 00:16:21,240
There are dozens of subscriptions because every project needed just one more.
426
00:16:21,240 –> 00:16:25,240
Some were created by Central IT, some by App Teams, some by an MSP,
427
00:16:25,240 –> 00:16:26,920
some by whoever still had the rights.
428
00:16:26,920 –> 00:16:29,160
A few are tied to products, many aren’t,
429
00:16:29,160 –> 00:16:32,200
and the ones that aren’t become the perfect hiding place for spend,
430
00:16:32,200 –> 00:16:34,920
because ambiguity is a financial shelter.
431
00:16:34,920 –> 00:16:38,520
In the before posture, budgets are either missing or decorative.
432
00:16:38,520 –> 00:16:41,960
Cost management exists, sure, but nobody owns the interpretation.
433
00:16:41,960 –> 00:16:44,040
Tags might exist, but they aren’t enforced,
434
00:16:44,040 –> 00:16:48,280
and the billing views are full of resources with blank owners or TBD cost centers.
435
00:16:48,280 –> 00:16:52,040
There’s usually at least one shared subscription called something like platform,
436
00:16:52,040 –> 00:16:53,880
connectivity or hub.
437
00:16:53,880 –> 00:16:56,280
And it contains everything expensive and unglamorous:
438
00:16:56,280 –> 00:17:00,120
firewalls, VPN gateways, private endpoints,
439
00:17:00,120 –> 00:17:04,120
log ingestion, cross-region replication, security tooling,
440
00:17:04,120 –> 00:17:06,360
the stuff that always grows and nobody wants to explain.
441
00:17:06,360 –> 00:17:07,880
And here’s the operational pattern.
442
00:17:07,880 –> 00:17:10,120
Anomalies are discovered at invoice time.
443
00:17:10,120 –> 00:17:14,040
Finance sees a spike and escalates it to IT, which forwards it to the cloud team.
444
00:17:14,040 –> 00:17:17,000
The cloud team opens cost analysis and starts doing archaeology,
445
00:17:17,000 –> 00:17:19,400
who created the resources? Why are they still running?
446
00:17:19,400 –> 00:17:20,920
Which subscription is this even in?
447
00:17:20,920 –> 00:17:22,520
Is this production? Is this a POC?
448
00:17:22,520 –> 00:17:23,560
Is it safe to delete?
449
00:17:23,560 –> 00:17:25,320
The answer is usually nobody knows,
450
00:17:25,320 –> 00:17:26,920
not because the logs are missing,
451
00:17:26,920 –> 00:17:30,040
because ownership was never declared in a way the platform could enforce.
452
00:17:30,040 –> 00:17:33,560
So the system is full of spend that is technically attributable,
453
00:17:33,560 –> 00:17:35,160
but practically unowned.
454
00:17:35,160 –> 00:17:36,360
That’s the pathology.
455
00:17:36,360 –> 00:17:38,120
The organization can see cost,
456
00:17:38,120 –> 00:17:41,000
but cannot assign responsibility fast enough to intervene.
457
00:17:41,000 –> 00:17:42,680
So every month becomes the same ritual.
458
00:17:42,680 –> 00:17:45,720
Finance wants a name, engineering wants proof it’s safe to change,
459
00:17:45,720 –> 00:17:49,240
the platform team wants to avoid breaking workloads it doesn’t own.
460
00:17:49,240 –> 00:17:51,080
And leadership wants the bill to stop growing
461
00:17:51,080 –> 00:17:53,480
without having to learn what a private endpoint is.
462
00:17:53,480 –> 00:17:56,280
Over time, the enterprise adapts in the worst possible way.
463
00:17:56,280 –> 00:17:57,720
It normalizes the unknown.
464
00:17:57,720 –> 00:17:59,320
Cloud is just expensive.
465
00:17:59,320 –> 00:18:00,680
It’s probably AI.
466
00:18:00,680 –> 00:18:02,280
It’s probably security logging.
467
00:18:02,280 –> 00:18:03,800
It’s probably the migration.
468
00:18:03,800 –> 00:18:07,240
The bill becomes weather: unpleasant, inevitable, and nobody’s fault.
469
00:18:07,880 –> 00:18:10,280
Now the after posture is not a dashboard upgrade.
470
00:18:10,280 –> 00:18:13,400
It’s a subscription-vending model with enforced preconditions.
471
00:18:13,400 –> 00:18:16,360
A subscription cannot be created until it declares intent
472
00:18:16,360 –> 00:18:18,120
in a machine readable way.
473
00:18:18,120 –> 00:18:18,920
Who owns it?
474
00:18:18,920 –> 00:18:20,120
What budget does it have?
475
00:18:20,120 –> 00:18:21,640
What environment is it?
476
00:18:21,640 –> 00:18:23,240
What is it allowed to deploy?
477
00:18:23,240 –> 00:18:25,000
And how does escalation work?
478
00:18:25,000 –> 00:18:26,440
Ownership is not a wiki page.
479
00:18:26,440 –> 00:18:28,920
It’s metadata tied to the subscription itself
480
00:18:28,920 –> 00:18:31,640
and referenced by policy, budget actions, and routing.
481
00:18:31,640 –> 00:18:34,360
This is where budget thresholds stop being polite notifications
482
00:18:34,360 –> 00:18:36,040
and start being governance interrupts.
483
00:18:36,040 –> 00:18:38,360
A budget alert at 50% doesn’t mean you’re failing.
484
00:18:38,360 –> 00:18:40,760
It means you are deviating from expected behavior early enough
485
00:18:40,760 –> 00:18:42,280
to still have options.
486
00:18:42,280 –> 00:18:45,720
At 75% it escalates harder, not by spamming more people.
487
00:18:45,720 –> 00:18:49,080
By triggering the next step in the enterprise workflow,
488
00:18:49,080 –> 00:18:51,240
a ticket, a routing rule,
489
00:18:51,240 –> 00:18:54,280
an accountable owner who has to either justify the spend,
490
00:18:54,280 –> 00:18:57,320
fix the drift, or request an exception with an expiry.
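The escalation ladder can be sketched as a mapping from thresholds to workflow steps rather than to longer mailing lists. The action names below are hypothetical placeholders for whatever your ticketing and paging tools expose, and treating 100% as "owner must decide" is this sketch’s assumption.

```python
from typing import Optional

# Hypothetical escalation ladder: percent of budget consumed -> next workflow step.
# Action names are placeholders for whatever your ticketing and paging tools expose.
LADDER = (
    (50, "notify_owner"),
    (75, "open_ticket_and_route"),
    (100, "require_decision"),  # justify the spend, fix the drift, or request an expiring exception
)


def escalation_steps(spend: float, budget: float) -> list:
    """Every step whose threshold the current spend has reached, lowest first."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    used_pct = 100 * spend / budget
    return [action for threshold, action in LADDER if used_pct >= threshold]


def current_step(spend: float, budget: float) -> Optional[str]:
    """The strongest triggered step; earlier steps are assumed to have fired already."""
    steps = escalation_steps(spend, budget)
    return steps[-1] if steps else None
```

The design choice is that crossing 75% changes who has to act, not how many inboxes receive the same message.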
491
00:18:57,320 –> 00:19:00,360
And yes, this is where the platform team will complain about friction.
492
00:19:00,360 –> 00:19:02,440
Good. Friction is how the system signals
493
00:19:02,440 –> 00:19:03,880
that a decision is happening.
494
00:19:03,880 –> 00:19:05,480
The goal isn’t to prevent spending.
495
00:19:05,480 –> 00:19:07,480
The goal is to prevent unreviewed spending.
496
00:19:07,480 –> 00:19:10,680
A subscription-vending model makes spend a conscious act again.
497
00:19:10,680 –> 00:19:13,880
Because it forces the enterprise to answer the questions it avoided,
498
00:19:13,880 –> 00:19:17,080
who owns this, what is it for, and what happens when it grows.
499
00:19:17,080 –> 00:19:18,680
You also get a second order effect
500
00:19:18,680 –> 00:19:20,920
that matters more than savings.
501
00:19:20,920 –> 00:19:23,880
Subscription sprawl collapses into a comprehensible structure.
502
00:19:23,880 –> 00:19:26,440
If every subscription has a named owner, a budget,
503
00:19:26,440 –> 00:19:28,360
and policy constraints aligned to purpose,
504
00:19:28,360 –> 00:19:31,080
then when an anomaly happens, the investigation path is short.
505
00:19:31,080 –> 00:19:32,680
It becomes days, not months.
506
00:19:32,680 –> 00:19:35,240
And the organization stops paying for unknown spend,
507
00:19:35,240 –> 00:19:36,840
simply because it cannot allocate it.
508
00:19:36,840 –> 00:19:39,720
One caveat, don’t pretend you can safely invent numbers here.
509
00:19:39,720 –> 00:19:43,320
The measurable outcome isn’t we save 37%.
510
00:19:43,320 –> 00:19:46,200
The measurable outcome is reduction of unknown spend categories,
511
00:19:46,200 –> 00:19:47,800
faster anomaly detection cycles,
512
00:19:47,800 –> 00:19:50,440
and fewer orphaned subscriptions that nobody can defend.
513
00:19:50,440 –> 00:19:52,440
And now the important transition,
514
00:19:52,440 –> 00:19:54,680
ownership alone doesn’t solve allocation.
515
00:19:54,680 –> 00:19:57,080
If costs can’t be attributed inside the subscription,
516
00:19:57,080 –> 00:19:59,160
down to product, environment, and life cycle,
517
00:19:59,160 –> 00:20:01,240
then subscription ownership becomes a blunt instrument.
518
00:20:01,240 –> 00:20:04,280
You end up with one owner holding a bag of costs they can’t explain,
519
00:20:04,280 –> 00:20:06,040
which is where the next slide shows up.
520
00:20:06,040 –> 00:20:09,240
Tagging, tagging fails because it’s treated as etiquette.
521
00:20:09,240 –> 00:20:12,040
Tagging is where most FinOps programs go to die
522
00:20:12,040 –> 00:20:14,680
because enterprises treat it like manners.
523
00:20:14,680 –> 00:20:16,440
Please tag your resources.
524
00:20:16,440 –> 00:20:18,440
Here’s the tagging standard.
525
00:20:18,440 –> 00:20:20,680
Don’t forget your cost center.
526
00:20:20,680 –> 00:20:22,360
That language reveals the real posture.
527
00:20:22,360 –> 00:20:24,280
They’re asking for compliance the way you ask people
528
00:20:24,280 –> 00:20:25,480
to rinse their dishes.
529
00:20:25,480 –> 00:20:27,880
And then they act surprised when the sink fills up.
530
00:20:27,880 –> 00:20:29,240
Tagging is not etiquette.
531
00:20:29,240 –> 00:20:31,160
Tagging is financial identity.
532
00:20:31,160 –> 00:20:33,720
If a resource doesn’t carry ownership metadata,
533
00:20:33,720 –> 00:20:35,880
then it is not a resource in a managed system.
534
00:20:35,880 –> 00:20:38,120
It’s a liability with an invoice attached.
535
00:20:38,120 –> 00:20:39,640
And if allocation depends on tags,
536
00:20:39,640 –> 00:20:42,280
then the platform must refuse to create resources without them.
537
00:20:42,280 –> 00:20:43,960
Humans will not do this consistently.
538
00:20:43,960 –> 00:20:44,760
They are busy.
539
00:20:44,760 –> 00:20:46,040
They are optimizing for delivery.
540
00:20:46,040 –> 00:20:47,080
They will forget.
541
00:20:47,080 –> 00:20:49,160
They will type “Prod” in one place,
542
00:20:49,160 –> 00:20:50,360
and “prod” in another.
543
00:20:50,360 –> 00:20:52,440
And Azure will treat those values as different
544
00:20:52,440 –> 00:20:55,080
because tag values are case sensitive.
545
00:20:55,080 –> 00:20:57,480
So you don’t get slightly imperfect tagging.
546
00:20:57,480 –> 00:20:59,080
You get allocation collapse.
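To see why casing alone collapses allocation, here is a small Python sketch over invented cost rows (the values and amounts are made up for illustration). Grouping on the raw tag value splits one environment into several buckets; mapping through a hypothetical controlled vocabulary recombines them.

```python
from collections import defaultdict

# Invented cost rows: (environment tag value, monthly cost).
ROWS = [("Prod", 1200.0), ("prod", 800.0), ("PRD", 300.0), ("production", 500.0), ("dev", 150.0)]

# Hypothetical controlled vocabulary: every accepted spelling maps to one canonical value.
CANONICAL_ENV = {"prod": "prod", "prd": "prod", "production": "prod", "dev": "dev"}


def group_raw(rows):
    """Group by the tag value exactly as stored, the way a case-sensitive report sees it."""
    totals = defaultdict(float)
    for env, cost in rows:
        totals[env] += cost
    return dict(totals)


def group_normalized(rows):
    """Lower-case, map through the vocabulary, and send unknown values to 'unallocated'."""
    totals = defaultdict(float)
    for env, cost in rows:
        totals[CANONICAL_ENV.get(env.strip().lower(), "unallocated")] += cost
    return dict(totals)
```

Normalizing at report time is a repair, not a fix; the argument here is to enforce the canonical value at deployment so the mapping table never has to exist.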
547
00:20:59,080 –> 00:21:01,560
This is what the failure actually looks like in an enterprise.
548
00:21:01,560 –> 00:21:03,400
Half the estate has tags, half doesn’t.
549
00:21:03,880 –> 00:21:06,680
The tagged half uses inconsistent keys and values.
550
00:21:06,680 –> 00:21:09,400
CostCenter, cost_center, costcenter.
551
00:21:09,400 –> 00:21:12,520
Owner tags contain emails that belong to people who left.
552
00:21:12,520 –> 00:21:15,800
Environment tags say “production”, “prod”, “PRD”, and “yes”.
553
00:21:15,800 –> 00:21:17,800
Some teams tag at the resource group,
554
00:21:17,800 –> 00:21:19,240
some tag at the resource,
555
00:21:19,240 –> 00:21:22,200
some rely on Terraform modules that never got updated,
556
00:21:22,200 –> 00:21:24,840
some deploy through the portal and don’t see the field at all
557
00:21:24,840 –> 00:21:25,800
or they do,
558
00:21:25,800 –> 00:21:28,040
and they skip it because nothing stops them.
559
00:21:28,040 –> 00:21:30,360
Then finance shows up and asks for showback.
560
00:21:30,360 –> 00:21:34,680
Engineering says sure and produces a report that is 40% unallocated.
561
00:21:34,680 –> 00:21:36,760
The conversation immediately turns political,
562
00:21:36,760 –> 00:21:37,960
not because people are emotional
563
00:21:37,960 –> 00:21:40,120
but because the data is unusable.
564
00:21:40,120 –> 00:21:42,520
Every chargeback discussion becomes an argument
565
00:21:42,520 –> 00:21:44,680
about whether the allocation rules are fair
566
00:21:44,680 –> 00:21:47,320
because the tags aren’t reliable enough to be treated as truth.
567
00:21:47,320 –> 00:21:49,800
This is where cost entropy hides again:
568
00:21:49,800 –> 00:21:51,640
in the fear of deletion.
569
00:21:51,640 –> 00:21:54,760
When a resource is untagged, nobody can confidently delete it
570
00:21:54,760 –> 00:21:56,440
because nobody can prove ownership.
571
00:21:56,440 –> 00:21:58,440
So the safest move is to keep paying.
572
00:21:58,440 –> 00:22:01,160
Untagged resources become financial fossils,
573
00:22:01,160 –> 00:22:04,920
expensive, old, and politically protected by ambiguity.
574
00:22:04,920 –> 00:22:06,200
The hard rule is simple.
575
00:22:06,200 –> 00:22:08,120
If you require a tag for allocation,
576
00:22:08,120 –> 00:22:09,960
then you require it for deployment.
577
00:22:09,960 –> 00:22:12,200
That means the platform refuses to create resources
578
00:22:12,200 –> 00:22:14,760
that don’t carry the minimum financial identity,
579
00:22:14,760 –> 00:22:18,600
owner, environment, and a product or cost center dimension.
580
00:22:18,600 –> 00:22:20,520
Not because those tags are magical,
581
00:22:20,520 –> 00:22:22,120
but because without them,
582
00:22:22,120 –> 00:22:24,600
the organization cannot route accountability
583
00:22:24,600 –> 00:22:28,120
and without routing, budgets and alerts become noise again.
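A sketch of that minimum financial identity as an Azure Policy rule, written as a Python dict. The `field`/`exists`/`anyOf` conditions and the `deny` effect are real policy-language constructs; the tag names `owner`, `environment`, and `costCenter` are this example’s assumption. The local `preflight_missing` check is a hypothetical stand-in a pipeline could run before deployment, not the Azure Policy engine.

```python
import json

REQUIRED_TAGS = ("owner", "environment", "costCenter")  # assumed taxonomy


def deny_missing_tags_rule(required=REQUIRED_TAGS) -> dict:
    """Policy rule: deny any resource that is missing at least one required tag."""
    return {
        "if": {
            "anyOf": [{"field": f"tags['{name}']", "exists": "false"} for name in required]
        },
        "then": {"effect": "deny"},
    }


def preflight_missing(tags: dict, required=REQUIRED_TAGS) -> list:
    """Local approximation of the deny rule, run before deployment.

    Tag names are compared case-insensitively (Azure treats tag names that way);
    empty values also count as missing, which is stricter than `exists`.
    """
    present = {key.lower() for key, value in tags.items() if value}
    return [name for name in required if name.lower() not in present]


if __name__ == "__main__":
    print(json.dumps(deny_missing_tags_rule(), indent=2))
```

Assigned at the production scope, the rule does the refusing; the pre-flight check only makes the failure arrive earlier and with a readable message.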
584
00:22:28,120 –> 00:22:30,680
Azure gives you the enforcement mechanisms to do this
585
00:22:30,680 –> 00:22:32,200
and most enterprises still don’t
586
00:22:32,200 –> 00:22:35,320
because they confuse being strict with being hostile.
587
00:22:35,320 –> 00:22:36,600
Here’s what most people miss.
588
00:22:36,600 –> 00:22:38,600
Enforcement doesn’t have to be punitive.
589
00:22:38,600 –> 00:22:39,960
It has to be deterministic.
590
00:22:39,960 –> 00:22:42,360
Use Azure Policy. Use deny where you must.
591
00:22:42,360 –> 00:22:44,280
Use modify where you can do it safely.
592
00:22:44,280 –> 00:22:46,600
Modify policies can auto-add baseline tags
593
00:22:46,600 –> 00:22:49,640
when they’re missing or inherit tags from the resource group.
594
00:22:49,640 –> 00:22:54,200
That’s a useful pattern when you can rely on a well-constructed resource group boundary
595
00:22:54,200 –> 00:22:56,760
but you can’t treat inheritance as a substitute for governance.
596
00:22:56,760 –> 00:22:59,240
If teams can create arbitrary resource groups,
597
00:22:59,240 –> 00:23:02,200
then inheritance just moves the problem one layer down,
598
00:23:02,200 –> 00:23:03,720
so the correct posture is layered.
599
00:23:03,720 –> 00:23:08,280
Subscription-level tags establish ownership and budget responsibility.
600
00:23:08,280 –> 00:23:11,960
Resource group tags establish workload grouping and life cycle intent,
601
00:23:11,960 –> 00:23:15,240
resource tags handle exceptions and resource specific dimensions.
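Azure does not merge tags across scopes by itself; modify policies such as the built-in “inherit a tag from the resource group if missing” and “inherit a tag from the subscription if missing” copy values down. The sketch below models the intended end state of the layered posture, assuming you configure precedence as resource over resource group over subscription (an assumption about your setup, not a guarantee of policy evaluation order). Tag names are illustrative.

```python
def effective_tags(subscription: dict, resource_group: dict, resource: dict) -> dict:
    """Intended end state of layered tagging.

    Precedence is resource > resource group > subscription: a value set on a
    narrower scope is kept, and missing keys are filled from wider scopes.
    Keys are assumed to already use one canonical casing.
    """
    resolved = dict(subscription)
    resolved.update(resource_group)
    resolved.update(resource)
    return resolved


def tag_sources(subscription: dict, resource_group: dict, resource: dict) -> dict:
    """Which scope supplied each effective tag, useful when auditing exceptions."""
    sources = {}
    scopes = (("subscription", subscription), ("resource_group", resource_group), ("resource", resource))
    for scope_name, tags in scopes:
        for key in tags:
            sources[key] = scope_name
    return sources
```

Owner and budget responsibility are declared once at the subscription; anything reported as coming from `resource` is, by construction, an exception worth looking at.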
602
00:23:15,240 –> 00:23:16,680
And yes, you will need a taxonomy,
603
00:23:16,680 –> 00:23:19,720
but keep it small, six to eight tags that actually matter.
604
00:23:19,720 –> 00:23:22,120
Every extra tag is another chance for entropy.
605
00:23:22,120 –> 00:23:25,080
There’s another uncomfortable truth hiding in tagging: people lie
606
00:23:25,080 –> 00:23:26,440
when tags are optional.
607
00:23:26,440 –> 00:23:28,040
Not maliciously, operationally.
608
00:23:28,040 –> 00:23:30,600
If someone is blocked by a tag policy
609
00:23:30,600 –> 00:23:31,960
and they don’t know the right value,
610
00:23:31,960 –> 00:23:33,960
they will pick something to get unblocked.
611
00:23:33,960 –> 00:23:36,040
That’s why controlled vocabularies matter.
612
00:23:36,040 –> 00:23:39,800
That’s why free-text owner tags turn into “unknown”, “TBD”, and “later”.
613
00:23:39,800 –> 00:23:41,160
So if you want tagging to work,
614
00:23:41,160 –> 00:23:42,760
you don’t just enforce presence.
615
00:23:42,760 –> 00:23:44,920
You enforce meaning, allowed values,
616
00:23:44,920 –> 00:23:47,480
normalized casing, clear ownership mapping,
617
00:23:47,480 –> 00:23:50,760
a real taxonomy tied to the org structure you actually operate,
618
00:23:50,760 –> 00:23:52,440
not the one in your HR system.
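Enforcing meaning rather than presence can be sketched as a validator with allowed values and format rules. The vocabulary, the `CC-1234` cost-center format, and the owner e-mail pattern are hypothetical; in Azure Policy the equivalent checks would typically use conditions such as `in`/`notIn` and `match`/`like` on the tag field.

```python
import re

# Hypothetical controlled vocabulary and formats; replace with the org structure you actually operate.
ALLOWED_ENVIRONMENTS = {"dev", "test", "prod"}
COST_CENTER_PATTERN = re.compile(r"^CC-\d{4}$")  # e.g. CC-1234
OWNER_PATTERN = re.compile(r"^[a-z]+\.[a-z]+@example\.com$")  # a named person, not a list
PLACEHOLDERS = {"tbd", "unknown", "later", "n/a", "none"}


def tag_value_errors(tags: dict) -> list:
    """Return human-readable reasons the tag values would be rejected."""
    errors = []
    for key, value in tags.items():
        if value.strip().lower() in PLACEHOLDERS:
            errors.append(f"{key}: placeholder value '{value}' is not allowed")
    env = tags.get("environment", "")
    if env not in ALLOWED_ENVIRONMENTS:  # exact match, so casing is enforced too
        errors.append(f"environment: '{env}' must be one of {sorted(ALLOWED_ENVIRONMENTS)}")
    if not COST_CENTER_PATTERN.match(tags.get("costCenter", "")):
        errors.append("costCenter: expected format CC-<4 digits>")
    if not OWNER_PATTERN.match(tags.get("owner", "")):
        errors.append("owner: expected firstname.lastname@example.com")
    return errors
```

Rejecting placeholder values explicitly matters because a presence-only rule is satisfied by “TBD”.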
619
00:23:52,440 –> 00:23:53,320
And once you do that,
620
00:23:53,320 –> 00:23:55,640
once the platform refuses untagged deployments,
621
00:23:55,640 –> 00:23:58,040
the entire FinOps conversation changes.
622
00:23:58,040 –> 00:24:00,440
Cost allocation stops being a quarterly negotiation.
623
00:24:00,440 –> 00:24:02,440
It becomes a boring mechanical process,
624
00:24:02,440 –> 00:24:03,640
which is exactly what you want,
625
00:24:03,640 –> 00:24:05,880
because boring means deterministic.
626
00:24:05,880 –> 00:24:07,960
Now to make this concrete,
627
00:24:07,960 –> 00:24:10,120
the next scenario is where tagging failure
628
00:24:10,120 –> 00:24:12,120
becomes financial archaeology,
629
00:24:12,120 –> 00:24:14,360
untagged resources, no ownership,
630
00:24:14,360 –> 00:24:16,600
and weeks of arguing about who pays.
631
00:24:16,600 –> 00:24:20,280
Scenario two, untagged resources and financial archaeology.
632
00:24:20,280 –> 00:24:22,600
Here’s the scenario every enterprise recognizes,
633
00:24:22,600 –> 00:24:24,120
even if they pretend they don’t.
634
00:24:24,120 –> 00:24:25,240
The cost spike shows up.
635
00:24:25,240 –> 00:24:26,840
Someone opens cost analysis.
636
00:24:26,840 –> 00:24:29,000
The top line item isn’t a clear application name.
637
00:24:29,000 –> 00:24:31,160
It’s a storage account with a random suffix,
638
00:24:31,160 –> 00:24:32,680
or a log analytics workspace,
639
00:24:32,680 –> 00:24:35,080
or a database server named like a developer sneezed
640
00:24:35,080 –> 00:24:36,040
on the keyboard,
641
00:24:36,040 –> 00:24:37,320
and the tags are empty.
642
00:24:37,320 –> 00:24:39,880
In the before posture, tagging was recommended,
643
00:24:39,880 –> 00:24:41,960
which means it was ignored whenever delivery pressure
644
00:24:41,960 –> 00:24:43,560
was higher than etiquette.
645
00:24:43,560 –> 00:24:45,240
Finance can’t allocate the cost.
646
00:24:45,240 –> 00:24:47,320
Engineering can’t tell who owns the resource.
647
00:24:47,320 –> 00:24:49,160
The platform team can’t safely delete it.
648
00:24:49,160 –> 00:24:52,760
So everyone does the only thing the enterprise teaches them to do.
649
00:24:52,760 –> 00:24:54,040
They investigate slowly,
650
00:24:54,040 –> 00:24:55,000
and they keep paying.
651
00:24:55,000 –> 00:24:57,560
This is what financial archaeology looks like in practice.
652
00:24:57,560 –> 00:25:01,160
First, someone tries to infer ownership from the resource name.
653
00:25:01,160 –> 00:25:03,640
That fails because naming standards are aspirational
654
00:25:03,640 –> 00:25:05,000
and time erodes them.
655
00:25:05,000 –> 00:25:07,880
Then they search activity logs for who created it.
656
00:25:07,880 –> 00:25:09,000
That fails in two common ways.
657
00:25:09,000 –> 00:25:10,600
The identity is a service principal shared
658
00:25:10,600 –> 00:25:11,800
by multiple pipelines,
659
00:25:11,800 –> 00:25:13,080
or the creator left the company.
660
00:25:13,080 –> 00:25:14,360
Then they look for connections,
661
00:25:14,360 –> 00:25:16,440
peering, private endpoints, diagnostic settings,
662
00:25:16,440 –> 00:25:18,520
linked workspaces to determine impact.
663
00:25:18,520 –> 00:25:19,720
That becomes a graph problem,
664
00:25:19,720 –> 00:25:21,960
and graph problems don’t finish in a meeting.
665
00:25:21,960 –> 00:25:24,360
So the resource survives, the cost continues.
666
00:25:24,360 –> 00:25:26,760
And a week later you have a second untagged resource
667
00:25:26,760 –> 00:25:28,200
because the system learned nothing.
668
00:25:28,200 –> 00:25:29,480
Here’s the key insight.
669
00:25:29,480 –> 00:25:31,960
Untagged resources don’t just prevent allocation.
670
00:25:31,960 –> 00:25:33,640
They prevent intervention.
671
00:25:33,640 –> 00:25:37,400
Because deletion in an enterprise is a political act disguised as a technical act.
672
00:25:37,400 –> 00:25:39,480
If you can’t name the owner, you can’t escalate.
673
00:25:39,480 –> 00:25:41,400
If you can’t escalate, you can’t get approval.
674
00:25:41,400 –> 00:25:43,320
If you can’t get approval, you don’t delete.
675
00:25:43,320 –> 00:25:45,640
The resource becomes too risky to touch,
676
00:25:45,640 –> 00:25:47,720
which is the most expensive category in Azure.
677
00:25:47,720 –> 00:25:51,560
Now the after posture is not “we reminded people harder.”
678
00:25:51,560 –> 00:25:54,280
It’s enforced tagging as a deployment precondition.
679
00:25:54,280 –> 00:25:57,160
In production, the platform denies resource creation
680
00:25:57,160 –> 00:25:59,400
when the minimum financial identity is missing.
681
00:25:59,400 –> 00:26:01,640
Not later, not in a monthly report.
682
00:26:01,640 –> 00:26:03,400
At the moment of creation. That sounds harsh
683
00:26:03,400 –> 00:26:05,320
until you realize what it actually does.
684
00:26:05,320 –> 00:26:07,640
It forces the ownership conversation to happen
685
00:26:07,640 –> 00:26:09,240
while change is still cheap.
686
00:26:09,240 –> 00:26:11,240
When the engineer is still at their keyboard.
687
00:26:11,240 –> 00:26:13,240
When the pipeline can still fail fast.
688
00:26:13,240 –> 00:26:16,200
When the workload is still a proposal, not a dependency.
689
00:26:16,200 –> 00:26:18,600
And yes, you will use two different policy effects
690
00:26:18,600 –> 00:26:20,520
depending on what you’re protecting.
691
00:26:20,520 –> 00:26:23,320
For baseline tags that are safe to apply universally,
692
00:26:23,320 –> 00:26:25,560
you use modify to add or normalize.
693
00:26:25,560 –> 00:26:28,440
A common pattern is to inherit owner and cost center
694
00:26:28,440 –> 00:26:30,280
from the subscription or resource group
695
00:26:30,280 –> 00:26:32,120
where that identity is already declared.
696
00:26:32,120 –> 00:26:34,120
That’s how you avoid making every engineer type
697
00:26:34,120 –> 00:26:36,280
the same metadata 400 times.
698
00:26:36,280 –> 00:26:37,560
But for production workloads,
699
00:26:37,560 –> 00:26:40,280
you also use deny for missing or invalid tags.
700
00:26:40,280 –> 00:26:42,360
Because allocation that depends on best effort
701
00:26:42,360 –> 00:26:44,920
is just a slower version of untagged chaos.
702
00:26:44,920 –> 00:26:46,520
This is where value standards matter.
703
00:26:46,520 –> 00:26:50,360
If the tag key exists, but the value is TBD, nothing improved.
704
00:26:50,360 –> 00:26:52,680
So you constrain values: controlled casing,
705
00:26:52,680 –> 00:26:55,240
allowed environments, approved cost centers,
706
00:26:55,240 –> 00:26:57,240
known owner formats. It’s not bureaucracy,
707
00:26:57,240 –> 00:26:59,560
it’s how you keep allocation deterministic.
708
00:26:59,560 –> 00:27:01,560
Now the architecture move that makes this stick
709
00:27:01,560 –> 00:27:04,120
is to normalize tags to ownership boundaries.
710
00:27:04,120 –> 00:27:05,960
The subscription holds accountable ownership
711
00:27:05,960 –> 00:27:07,720
and budget responsibility.
712
00:27:07,720 –> 00:27:09,720
The resource group holds workload grouping
713
00:27:09,720 –> 00:27:11,080
and life cycle intent.
714
00:27:11,080 –> 00:27:13,640
Individual resources only carry special cases
715
00:27:13,640 –> 00:27:15,400
because special cases multiply.
716
00:27:15,400 –> 00:27:19,000
If you don’t build that hierarchy, tags become another entropy generator,
717
00:27:19,000 –> 00:27:22,280
endlessly debated, inconsistently applied, and never trusted.
718
00:27:22,280 –> 00:27:25,480
With enforcement in place, the operational behavior changes immediately.
719
00:27:25,480 –> 00:27:27,640
Engineering stops treating tags like paperwork
720
00:27:27,640 –> 00:27:30,840
because the platform refuses to deploy without them.
721
00:27:30,840 –> 00:27:33,640
Finance stops treating allocation like a quarterly negotiation
722
00:27:33,640 –> 00:27:35,160
because the data is complete.
723
00:27:35,160 –> 00:27:38,360
And leadership stops hearing we can’t tell as an excuse
724
00:27:38,360 –> 00:27:41,640
because the system no longer allows we can’t tell resources
725
00:27:41,640 –> 00:27:43,240
to exist in the first place.
726
00:27:43,240 –> 00:27:45,400
The measurable outcome isn’t a fantasy percentage,
727
00:27:45,400 –> 00:27:47,160
it’s something you can actually verify.
728
00:27:47,160 –> 00:27:49,480
The unallocated bucket shrinks towards zero,
729
00:27:49,480 –> 00:27:50,680
not because people got better
730
00:27:50,680 –> 00:27:53,560
but because the control plane stopped accepting ambiguity.
731
00:27:53,560 –> 00:27:55,400
And once attribution becomes boring,
732
00:27:55,400 –> 00:27:57,080
showback and chargeback become boring too,
733
00:27:57,080 –> 00:27:58,120
which is the entire point,
734
00:27:58,120 –> 00:28:01,240
you want the cost conversation to be factual, not political.
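The verifiable metric is simple enough to compute from any cost export. A sketch, assuming rows of `(cost, tags)` pulled from your own export and assuming `owner` and `costCenter` are the allocation keys: spend on rows missing either key counts as unallocated, and the share should trend toward zero once deny is in place.

```python
REQUIRED = ("owner", "costCenter")  # assumed allocation keys


def unallocated_share(rows) -> float:
    """Fraction of total cost (0.0 to 1.0) on rows that lack any required tag value."""
    total = sum(cost for cost, _ in rows)
    if total == 0:
        return 0.0
    unallocated = sum(cost for cost, tags in rows if not all(tags.get(key) for key in REQUIRED))
    return unallocated / total
```

Tracked month over month, this one number is a more honest progress report than a savings percentage.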
735
00:28:01,240 –> 00:28:03,240
Now there’s a second order consequence
736
00:28:03,240 –> 00:28:05,400
that shows up right after tagging gets enforced.
737
00:28:05,400 –> 00:28:07,400
Once teams can’t hide behind ambiguity,
738
00:28:07,400 –> 00:28:09,160
they hide behind safety.
739
00:28:09,160 –> 00:28:11,240
They start over-provisioning by default
740
00:28:11,240 –> 00:28:12,920
because cost is now visible,
741
00:28:12,920 –> 00:28:15,720
but operational risk still hurts more than the bill.
742
00:28:15,720 –> 00:28:17,000
That’s the next failure mode,
743
00:28:17,000 –> 00:28:19,560
premium tiers and multi-region by reflex.
744
00:28:19,560 –> 00:28:22,440
Scenario three, PaaS over-provisioning by default.
745
00:28:22,440 –> 00:28:24,600
Once tagging and ownership become real,
746
00:28:24,600 –> 00:28:26,200
teams lose the ability to hide,
747
00:28:26,200 –> 00:28:27,480
so they switch strategies.
748
00:28:27,480 –> 00:28:29,000
They hide behind safety.
749
00:28:29,000 –> 00:28:31,640
This is where PaaS turns into a quiet budget murderer
750
00:28:31,640 –> 00:28:34,280
because PaaS defaults are easy to justify
751
00:28:34,280 –> 00:28:35,800
and hard to unwind.
752
00:28:35,800 –> 00:28:37,880
A developer doesn’t have to rack servers anymore.
753
00:28:37,880 –> 00:28:39,640
They click a tier, they pick redundancy,
754
00:28:39,640 –> 00:28:41,720
they enable the features that sound responsible
755
00:28:41,720 –> 00:28:44,840
and nobody stops them because the platform treats premium
756
00:28:44,840 –> 00:28:46,600
as just another valid choice.
757
00:28:46,600 –> 00:28:49,320
In the before posture, over-provisioning isn’t malicious.
758
00:28:49,320 –> 00:28:52,040
It’s rational. Engineers are accountable for availability.
759
00:28:52,040 –> 00:28:53,640
They get paged for latency,
760
00:28:53,640 –> 00:28:54,920
they get blamed for downtime,
761
00:28:54,920 –> 00:28:57,480
they do not get praised for choosing the cheaper SKU.
762
00:28:57,480 –> 00:28:59,160
So when faced with uncertainty,
763
00:28:59,160 –> 00:29:01,960
they pick the tier that reduces operational risk,
764
00:29:01,960 –> 00:29:04,120
premium database tier just in case.
765
00:29:04,120 –> 00:29:05,800
Multi-zone, because what if?
766
00:29:05,800 –> 00:29:09,000
Multi-region because the business might need it later.
767
00:29:09,000 –> 00:29:11,400
Diagnostic retention because security might ask.
768
00:29:11,400 –> 00:29:14,200
Each of those decisions can be individually defensible.
769
00:29:14,200 –> 00:29:16,200
Collectively, they are financial entropy.
770
00:29:16,200 –> 00:29:17,400
And PaaS makes it worse
771
00:29:17,400 –> 00:29:19,800
because it’s designed to abstract capacity decisions.
772
00:29:19,800 –> 00:29:21,080
That’s the selling point.
773
00:29:21,080 –> 00:29:22,760
But abstraction doesn’t remove cost.
774
00:29:22,760 –> 00:29:24,680
It removes friction. And when you remove friction
775
00:29:24,680 –> 00:29:27,560
in a large organization, consumption expands
776
00:29:27,560 –> 00:29:28,680
until a boundary stops it.
777
00:29:28,680 –> 00:29:30,280
Most enterprises don’t build that boundary.
778
00:29:30,280 –> 00:29:32,600
They treat PaaS like it’s inherently optimized
779
00:29:32,600 –> 00:29:34,680
because Microsoft marketing implies that it is.
780
00:29:34,680 –> 00:29:35,240
It is not.
781
00:29:35,240 –> 00:29:38,040
It is a set of cost curves you must choose deliberately.
782
00:29:38,040 –> 00:29:39,800
So here’s the real failure mechanism.
783
00:29:39,800 –> 00:29:42,120
Teams externalize the cost of safety.
784
00:29:42,120 –> 00:29:43,480
They buy safety with your budget
785
00:29:43,480 –> 00:29:44,600
and the platform lets them.
786
00:29:44,600 –> 00:29:46,520
The after-posture isn’t telling engineers
787
00:29:46,520 –> 00:29:47,640
to be more careful.
788
00:29:47,640 –> 00:29:49,640
It’s forcing the platform to distinguish
789
00:29:49,640 –> 00:29:51,400
between environments and intents.
790
00:29:51,400 –> 00:29:52,280
Dev is not prod.
791
00:29:52,280 –> 00:29:53,240
Test is not prod.
792
00:29:53,240 –> 00:29:55,320
A sandbox is not a customer facing service.
793
00:29:55,320 –> 00:29:57,720
If you allow the same SKU catalog everywhere,
794
00:29:57,720 –> 00:30:00,040
you are telling every team in every environment
795
00:30:00,040 –> 00:30:01,800
that the enterprise is comfortable paying
796
00:30:01,800 –> 00:30:03,640
for worst-case assumptions by default.
797
00:30:03,640 –> 00:30:05,480
That is not governance.
798
00:30:05,480 –> 00:30:07,240
That is surrender.
799
00:30:07,240 –> 00:30:09,880
The control is simple and it’s always unpopular at first.
800
00:30:09,880 –> 00:30:11,480
Allowed SKUs per environment.
801
00:30:11,480 –> 00:30:13,720
In non-production, deny premium tiers
802
00:30:13,720 –> 00:30:15,640
unless there is an explicit exception.
803
00:30:15,640 –> 00:30:18,200
In production, don’t deny everything.
804
00:30:18,200 –> 00:30:21,080
But constrain the choices to a set you can defend.
805
00:30:21,080 –> 00:30:23,080
Tiers that match real SLOs,
806
00:30:23,080 –> 00:30:24,840
actual throughput requirements
807
00:30:24,840 –> 00:30:27,400
and resilience patterns you’ve agreed to pay for.
808
00:30:27,400 –> 00:30:29,560
This is where Azure Policy stops being compliance
809
00:30:29,560 –> 00:30:31,400
and starts being cost engineering.
810
00:30:31,400 –> 00:30:34,360
You define policy rules that deny specific SKUs
811
00:30:34,360 –> 00:30:37,240
or deny specific features outside approved scopes.
812
00:30:37,240 –> 00:30:38,200
You restrict regions.
813
00:30:38,200 –> 00:30:41,000
You restrict redundancy options where they don’t make sense.
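Per-environment SKU and region allowlists can be sketched as a lookup the platform consults before deployment, plus the policy rule it would translate to. `Microsoft.Compute/virtualMachines/sku.name` is the policy alias used for VM size; the specific sizes, regions, and environment names are assumptions for illustration, and other services need their own aliases.

```python
# Hypothetical per-environment allowlists; the sizes and regions are examples only.
ALLOWED = {
    "dev": {"skus": {"Standard_B2s", "Standard_B4ms"}, "regions": {"westeurope"}},
    "test": {"skus": {"Standard_B2s", "Standard_D2s_v5"}, "regions": {"westeurope"}},
    "prod": {"skus": {"Standard_D4s_v5", "Standard_E8s_v5"}, "regions": {"westeurope", "northeurope"}},
}


def vm_sku_rule(environment: str) -> dict:
    """Azure Policy rule denying VM sizes outside the environment's allowlist."""
    return {
        "if": {
            "allOf": [
                {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
                {
                    "not": {
                        "field": "Microsoft.Compute/virtualMachines/sku.name",
                        "in": sorted(ALLOWED[environment]["skus"]),
                    }
                },
            ]
        },
        "then": {"effect": "deny"},
    }


def is_allowed(environment: str, sku: str, region: str) -> bool:
    """Local pre-flight check; an unknown environment allows nothing."""
    policy = ALLOWED.get(environment)
    return bool(policy) and sku in policy["skus"] and region in policy["regions"]
```

Defaulting unknown environments to “nothing allowed” is deliberate: a new environment name should force a conversation, not inherit the production catalog.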
814
00:30:41,000 –> 00:30:42,600
You can also enforce patterns.
815
00:30:42,600 –> 00:30:45,080
Prod databases must have backups configured.
816
00:30:45,080 –> 00:30:47,800
But dev databases must not be zone redundant
817
00:30:47,800 –> 00:30:50,680
because zone redundancy in dev is just expensive cosplay.
818
00:30:50,680 –> 00:30:53,080
And yes, someone will argue that policy can’t cover
819
00:30:53,080 –> 00:30:54,520
every past configuration perfectly.
820
00:30:54,520 –> 00:30:55,320
That’s true.
821
00:30:55,320 –> 00:30:57,080
But the point isn’t perfect coverage.
822
00:30:57,080 –> 00:30:59,000
The point is removing default freedom
823
00:30:59,000 –> 00:31:01,560
where default freedom creates default overspend.
824
00:31:01,560 –> 00:31:03,560
Now the weird part: exceptions don’t go away.
825
00:31:03,560 –> 00:31:04,520
They never do.
826
00:31:04,520 –> 00:31:06,840
So you treat exceptions as what they are.
827
00:31:06,840 –> 00:31:08,200
Entropy generators.
828
00:31:08,200 –> 00:31:11,160
An exception should be tracked, justified, and time-boxed.
829
00:31:11,160 –> 00:31:13,640
Not because the justification is morally important,
830
00:31:13,640 –> 00:31:15,400
but because time is the only thing
831
00:31:15,400 –> 00:31:17,960
that prevents an exception from becoming the new baseline.
832
00:31:17,960 –> 00:31:20,920
An exception without an expiry is policy rot.
833
00:31:20,920 –> 00:31:23,800
A premium SKU approval that lasts forever is not an approval.
834
00:31:23,800 –> 00:31:25,240
It’s a quiet surrender
835
00:31:25,240 –> 00:31:27,800
that the platform will remember longer than your org chart.
836
00:31:27,800 –> 00:31:30,600
So the after-posture includes an exception workflow.
837
00:31:30,600 –> 00:31:34,120
When a team needs a premium tier in a non-prod subscription,
838
00:31:34,120 –> 00:31:34,920
they request it.
839
00:31:34,920 –> 00:31:37,240
They state why. They pick an expiry date.
840
00:31:37,240 –> 00:31:39,800
The platform team grants an exemption at the policy layer
841
00:31:39,800 –> 00:31:42,600
not by giving someone the Owner role and hoping they behave.
842
00:31:42,600 –> 00:31:45,000
Then the exemption expires automatically
843
00:31:45,000 –> 00:31:47,160
and the team has to renew it with intent
844
00:31:47,160 –> 00:31:48,680
or fall back to the baseline.
845
00:31:48,680 –> 00:31:50,440
That’s how you stop premium sprawl
846
00:31:50,440 –> 00:31:52,200
from becoming permanent architecture.
847
00:31:52,200 –> 00:31:53,880
The outcome isn’t just cost reduction.
848
00:31:53,880 –> 00:31:55,480
It’s fewer emergency rollbacks.
849
00:31:55,480 –> 00:31:58,200
Because the enterprise stops discovering cost explosions
850
00:31:58,200 –> 00:31:59,000
after they happen.
851
00:31:59,000 –> 00:32:00,520
It discovers them at deploy time.
852
00:32:00,520 –> 00:32:01,640
The pipeline fails.
853
00:32:01,640 –> 00:32:03,160
The team sees the denial.
854
00:32:03,160 –> 00:32:05,080
They either adjust to the approved tier
855
00:32:05,080 –> 00:32:06,760
or escalate with a conscious decision.
856
00:32:06,760 –> 00:32:10,280
That is a healthier failure mode than an invoice-time surprise.
857
00:32:10,280 –> 00:32:12,040
And it has a second-order benefit.
858
00:32:12,040 –> 00:32:13,960
The team starts designing for efficiency
859
00:32:13,960 –> 00:32:15,880
because the platform forces them to.
860
00:32:15,880 –> 00:32:19,000
If premium is harder to get, engineers invest in better indexing,
861
00:32:19,000 –> 00:32:20,680
better caching, better query patterns,
862
00:32:20,680 –> 00:32:21,960
better scaling strategies.
863
00:32:21,960 –> 00:32:23,400
Not because they became saints,
864
00:32:23,400 –> 00:32:25,800
but because the control plane changed the incentives.
865
00:32:25,800 –> 00:32:27,640
Now there’s one more place where this pattern
866
00:32:27,640 –> 00:32:29,960
becomes pathological: non-production.
867
00:32:29,960 –> 00:32:32,360
Because non-prod is where teams feel the least financial pain,
868
00:32:32,360 –> 00:32:34,520
so it becomes the landfill for over-provisioning,
869
00:32:34,520 –> 00:32:36,840
abandoned experiments and temporary environments
870
00:32:36,840 –> 00:32:37,560
that never die.
871
00:32:37,560 –> 00:32:38,280
That’s next.
872
00:32:38,280 –> 00:32:39,240
Scenario 4.
873
00:32:39,240 –> 00:32:41,160
Unbounded non-production spend.
874
00:32:41,160 –> 00:32:44,040
Non-production is where Azure budgets go to be embarrassed.
875
00:32:44,040 –> 00:32:45,240
In the before posture,
876
00:32:45,240 –> 00:32:47,800
Dev and test are treated like free real estate.
877
00:32:47,800 –> 00:32:49,080
It’s not prod,
878
00:32:49,080 –> 00:32:51,080
so nobody bothers with budgets.
879
00:32:51,080 –> 00:32:52,280
Nobody sets thresholds.
880
00:32:52,280 –> 00:32:54,520
Nobody defines what done means for an environment.
881
00:32:54,520 –> 00:32:57,480
And because nobody gets paged when dev costs spike,
882
00:32:57,480 –> 00:32:59,400
the platform quietly turns non-prod
883
00:32:59,400 –> 00:33:02,760
into the largest, least defended surface area of recurring spend.
884
00:33:02,760 –> 00:33:04,440
The failure pattern is always the same.
885
00:33:04,440 –> 00:33:06,920
A team spins up a full stack environment for a sprint.
886
00:33:06,920 –> 00:33:08,360
It was supposed to be temporary.
887
00:33:08,360 –> 00:33:09,080
It isn’t.
888
00:33:09,080 –> 00:33:10,920
Another team creates a parallel environment
889
00:33:10,920 –> 00:33:12,600
because the first one is messy
890
00:33:12,600 –> 00:33:14,040
and they don’t want to touch it.
891
00:33:14,040 –> 00:33:17,000
Someone enables extra diagnostics just for troubleshooting
892
00:33:17,000 –> 00:33:18,280
and never turns it off.
893
00:33:18,280 –> 00:33:19,960
An internal demo needs more capacity
894
00:33:19,960 –> 00:33:22,520
so the tier gets bumped up and never comes back down.
895
00:33:22,520 –> 00:33:24,360
Then a few of these environments get connected
896
00:33:24,360 –> 00:33:25,720
to shared services,
897
00:33:25,720 –> 00:33:26,920
log ingestion,
898
00:33:26,920 –> 00:33:28,520
private endpoints, hubs,
899
00:33:28,520 –> 00:33:31,080
and now even deleting the compute doesn’t stop the bill
900
00:33:31,080 –> 00:33:33,000
because the dependencies keep running.
901
00:33:33,000 –> 00:33:35,800
And because non-prod is full of experimentation,
902
00:33:35,800 –> 00:33:38,200
nobody wants to be the person who deletes the wrong thing.
903
00:33:38,200 –> 00:33:39,400
So they don’t delete anything.
904
00:33:39,400 –> 00:33:41,400
That’s why non-prod becomes a cost landfill.
905
00:33:41,400 –> 00:33:43,400
It’s the intersection of low accountability,
906
00:33:43,400 –> 00:33:44,520
high change velocity,
907
00:33:44,520 –> 00:33:46,200
and fear-driven retention.
908
00:33:46,200 –> 00:33:48,760
Those three forces always produce the same outcome.
909
00:33:48,760 –> 00:33:51,320
Resources outlive the work that justified them.
910
00:33:51,320 –> 00:33:54,840
This is also where the enterprise commits its most common cost lie.
911
00:33:54,840 –> 00:33:56,600
It’s cheap compared to prod.
912
00:33:56,600 –> 00:33:58,680
That statement is usually true in isolation.
913
00:33:58,680 –> 00:34:00,280
It’s just irrelevant.
914
00:34:00,280 –> 00:34:01,880
Non-prod isn’t supposed to be cheap.
915
00:34:01,880 –> 00:34:03,320
It’s supposed to be bounded.
916
00:34:03,320 –> 00:34:06,120
The point of non-prod is to support delivery
917
00:34:06,120 –> 00:34:07,080
with a known purpose
918
00:34:07,080 –> 00:34:08,920
and a known financial blast radius.
919
00:34:08,920 –> 00:34:11,960
If dev/test can run indefinitely at any SKU
920
00:34:11,960 –> 00:34:14,040
with no budget and no life cycle rule,
921
00:34:14,040 –> 00:34:16,200
then it stops being a delivery capability
922
00:34:16,200 –> 00:34:18,760
and becomes a parallel cloud estate with no governance.
923
00:34:18,760 –> 00:34:20,920
In other words, you build a second Azure environment
924
00:34:20,920 –> 00:34:21,960
inside your first one.
925
00:34:21,960 –> 00:34:24,120
So the after-posture starts with the most boring
926
00:34:24,120 –> 00:34:25,800
but effective design move.
927
00:34:25,800 –> 00:34:27,560
Separate non-production subscriptions.
928
00:34:27,560 –> 00:34:29,320
Not resource groups, not tags.
929
00:34:29,320 –> 00:34:30,600
Subscriptions.
930
00:34:30,600 –> 00:34:34,040
When non-prod lives inside the same subscription as prod,
931
00:34:34,040 –> 00:34:36,200
it inherits prod-level permissions,
932
00:34:36,200 –> 00:34:39,080
prod-level SKU freedom and prod-level ambiguity.
933
00:34:39,080 –> 00:34:41,720
People will also use prod subscriptions
934
00:34:41,720 –> 00:34:43,720
for non-prod work temporarily
935
00:34:43,720 –> 00:34:44,760
because it’s convenient.
936
00:34:44,760 –> 00:34:47,080
Separate subscriptions remove that pathway.
937
00:34:47,080 –> 00:34:48,280
They create a clean boundary
938
00:34:48,280 –> 00:34:51,080
where policies can be strict without breaking production.
939
00:34:51,080 –> 00:34:53,320
And now you do something enterprises almost never do.
940
00:34:53,320 –> 00:34:56,600
You treat non-prod budgets as aggressive by design.
941
00:34:56,600 –> 00:34:58,360
Non-prod budgets are not there to track.
942
00:34:58,360 –> 00:35:00,200
They are there to interrupt behavior early.
943
00:35:00,200 –> 00:35:02,920
50% and 70% thresholds aren’t warnings.
944
00:35:02,920 –> 00:35:04,040
They are scheduled friction.
945
00:35:04,040 –> 00:35:06,200
They force someone to explain why dev is burning through
946
00:35:06,200 –> 00:35:08,520
its expected envelope halfway through the cycle.
947
00:35:08,520 –> 00:35:10,360
And because it’s non-prod, you can actually act.
948
00:35:10,360 –> 00:35:11,320
You can scale down.
949
00:35:11,320 –> 00:35:12,360
You can shut things off.
950
00:35:12,360 –> 00:35:13,480
You can delete environments.
951
00:35:13,480 –> 00:35:14,760
You can deny premium tiers.
952
00:35:14,760 –> 00:35:16,040
You can restrict regions.
953
00:35:16,040 –> 00:35:17,640
You can enforce short log retention.
954
00:35:17,640 –> 00:35:19,480
You can stop pretending that a dev environment
955
00:35:19,480 –> 00:35:22,840
needs the same resilience posture as customer-facing revenue.
956
00:35:22,840 –> 00:35:25,400
This is where automation stops being optimization
957
00:35:25,400 –> 00:35:27,720
and becomes enforcement amplification.
958
00:35:27,720 –> 00:35:29,160
The default posture in non-prod
959
00:35:29,160 –> 00:35:30,360
should be that things turn off
960
00:35:30,360 –> 00:35:32,120
unless someone actively keeps them on.
961
00:35:32,120 –> 00:35:35,640
Schedules, auto shutdown, life cycle rules, whatever mechanism you use,
962
00:35:35,640 –> 00:35:37,000
the intent is the same.
963
00:35:37,000 –> 00:35:40,760
The platform should require explicit justification for idle runtime
964
00:35:40,760 –> 00:35:42,760
because idle runtime is not innovation.
965
00:35:42,760 –> 00:35:43,560
It’s just billing.
966
00:35:43,560 –> 00:35:46,680
Now there’s a predictable pushback here.
967
00:35:46,680 –> 00:35:48,600
But developers need flexibility.
968
00:35:48,600 –> 00:35:49,240
Yes, they do.
969
00:35:49,240 –> 00:35:50,920
That’s why the goal isn’t to ban spend.
970
00:35:50,920 –> 00:35:52,120
It’s to encode purpose.
971
00:35:52,120 –> 00:35:55,400
If a team needs a larger environment for performance testing, fine.
972
00:35:55,400 –> 00:35:57,240
But it should happen through an approved path.
973
00:35:57,240 –> 00:35:59,320
Time boxed, budgeted, and visible.
974
00:35:59,320 –> 00:36:01,000
The platform can allow the exception
975
00:36:01,000 –> 00:36:02,760
while keeping the baseline strict.
976
00:36:02,760 –> 00:36:05,560
Without that, flexibility just becomes another word
977
00:36:05,560 –> 00:36:07,080
for architectural erosion.
978
00:36:07,080 –> 00:36:08,760
The outcome is also predictable
979
00:36:08,760 –> 00:36:11,720
and it’s measurable without inventing magic savings numbers.
980
00:36:11,720 –> 00:36:14,680
First, you reduce the number of long-lived idle environments
981
00:36:14,680 –> 00:36:16,600
because they get shut down by default.
982
00:36:16,600 –> 00:36:18,760
Second, you surface rogue environments early
983
00:36:18,760 –> 00:36:20,920
because budgets fire when spend deviates
984
00:36:20,920 –> 00:36:23,400
and the deviation can’t hide in a shared scope.
985
00:36:23,400 –> 00:36:25,800
Third, you force teams to make conscious choices
986
00:36:25,800 –> 00:36:27,320
about what they’re paying for in non-prod
987
00:36:27,320 –> 00:36:28,920
which changes behavior faster
988
00:36:28,920 –> 00:36:31,320
than any cost-awareness campaign ever will.
989
00:36:31,320 –> 00:36:32,680
And here’s the real payoff.
990
00:36:32,680 –> 00:36:35,640
The organization stops treating dev/test as a junk drawer.
991
00:36:35,640 –> 00:36:38,680
Non-production becomes what it was supposed to be:
992
00:36:38,680 –> 00:36:41,240
an intentionally bounded delivery capability.
993
00:36:41,240 –> 00:36:43,240
Not an unbounded parallel cloud estate.
994
00:36:43,240 –> 00:36:45,880
Now, even with clean non-prod boundaries,
995
00:36:45,880 –> 00:36:47,960
there’s still one category of spend
996
00:36:47,960 –> 00:36:51,560
that loves to evade accountability: shared platform services.
997
00:36:51,560 –> 00:36:54,760
Because once you centralize networking, logging, and security,
998
00:36:54,760 –> 00:36:57,160
you’ve built a cost engine that every team depends on
999
00:36:57,160 –> 00:36:58,280
but few teams can see.
1000
00:36:58,280 –> 00:36:59,560
That’s the next failure mode.
1001
00:36:59,560 –> 00:37:03,800
Scenario 5. Shared platform services with no cost signal.
1002
00:37:03,800 –> 00:37:07,560
Shared platform services are where good intentions go to inflate quietly.
1003
00:37:07,560 –> 00:37:10,360
In the before posture, the organization centralizes
1004
00:37:10,360 –> 00:37:14,200
the expensive fundamentals: hub networking, firewalls,
1005
00:37:14,200 –> 00:37:18,600
private DNS, log analytics, Sentinel, Defender plans,
1006
00:37:18,600 –> 00:37:21,800
central Key Vault patterns, shared container registries,
1007
00:37:21,800 –> 00:37:25,160
monitoring pipelines, maybe an enterprise API gateway.
1008
00:37:25,160 –> 00:37:27,400
All of it is deployed for everyone,
1009
00:37:27,400 –> 00:37:29,560
which sounds efficient and it can be.
1010
00:37:29,560 –> 00:37:32,760
But the cost signal usually disappears the moment it becomes shared
1011
00:37:32,760 –> 00:37:35,000
because those services are billed somewhere:
1012
00:37:35,000 –> 00:37:36,600
a platform subscription,
1013
00:37:36,600 –> 00:37:39,560
a connectivity subscription, a management subscription,
1014
00:37:39,560 –> 00:37:42,280
a catch all that’s treated like a necessary tax.
1015
00:37:42,280 –> 00:37:45,080
The application teams consume it, depend on it,
1016
00:37:45,080 –> 00:37:47,240
and then optimize only their own resource groups
1017
00:37:47,240 –> 00:37:49,560
because that’s what they can see and what they’re measured on.
1018
00:37:49,560 –> 00:37:53,240
So the platform spend becomes a black hole with a justification attached.
1019
00:37:53,240 –> 00:37:55,400
Here’s the operational behavior that follows.
1020
00:37:55,400 –> 00:37:58,680
Log ingestion grows because every team enables diagnostics
1021
00:37:58,680 –> 00:38:01,000
at maximum verbosity temporarily,
1022
00:38:01,000 –> 00:38:02,920
and nobody owns the retention curve.
1023
00:38:02,920 –> 00:38:06,600
Network egress grows because architectures sprawl across regions,
1024
00:38:06,600 –> 00:38:09,880
VNets peer like ivy, and traffic routes get fixed
1025
00:38:09,880 –> 00:38:11,640
in ways that are correct for availability
1026
00:38:11,640 –> 00:38:13,400
but catastrophic for cost.
1027
00:38:13,400 –> 00:38:17,640
Security tooling grows because every new capability adds another billing meter,
1028
00:38:17,640 –> 00:38:20,600
and the platform team is incentivized to be safer, not cheaper,
1029
00:38:20,600 –> 00:38:22,280
and because it’s shared, nobody feels it.
1030
00:38:22,280 –> 00:38:24,120
The app team doesn’t feel the firewall bill,
1031
00:38:24,120 –> 00:38:27,000
the platform team doesn’t feel the app team’s burst traffic,
1032
00:38:27,000 –> 00:38:29,720
finance sees a line item labeled platform,
1033
00:38:29,720 –> 00:38:32,440
and gets told, “It’s foundational,” which is true.
1034
00:38:32,440 –> 00:38:34,760
It’s also not an excuse for unbounded growth.
1035
00:38:34,760 –> 00:38:37,400
This is the core governance failure of shared services.
1036
00:38:37,400 –> 00:38:39,400
The enterprise funds a cost engine
1037
00:38:39,400 –> 00:38:42,600
without attaching economic feedback to the consumers of that engine.
1038
00:38:42,600 –> 00:38:45,800
Without feedback, consumption expands until it hits a crisis.
1039
00:38:45,800 –> 00:38:48,120
Then the crisis is framed as “Azure is expensive”
1040
00:38:48,120 –> 00:38:51,240
when it’s actually “shared services are unmetered internally.”
1041
00:38:51,240 –> 00:38:55,240
The after-posture is not “split everything into separate subscriptions.”
1042
00:38:55,240 –> 00:38:56,200
That’s not the point.
1043
00:38:56,200 –> 00:38:59,800
The point is to make shared costs legible and attributable
1044
00:38:59,800 –> 00:39:02,520
even when the underlying service must remain centralized.
1045
00:39:02,520 –> 00:39:05,080
So the first step is explicit platform subscriptions
1046
00:39:05,080 –> 00:39:06,840
with explicit accountability,
1047
00:39:06,840 –> 00:39:08,280
not “the cloud team owns it,”
1048
00:39:08,280 –> 00:39:10,840
a named platform owner, a documented service catalog,
1049
00:39:10,840 –> 00:39:13,320
a declared budget, early thresholds,
1050
00:39:13,320 –> 00:39:15,000
the same governance rules you demanded
1051
00:39:15,000 –> 00:39:16,600
for every workload subscription.
1052
00:39:16,600 –> 00:39:18,360
Shared services don’t get a pass.
1053
00:39:18,360 –> 00:39:21,880
They are the highest-risk spend category in the entire estate.
1054
00:39:21,880 –> 00:39:24,440
Then you add the piece everyone avoids.
1055
00:39:24,440 –> 00:39:27,720
An allocation model. Not perfect, not theoretically pure,
1056
00:39:27,720 –> 00:39:30,680
just consistent, defensible, and repeatable.
1057
00:39:30,680 –> 00:39:33,080
Some shared costs can be allocated by usage.
1058
00:39:33,080 –> 00:39:35,480
Log Analytics ingestion, Sentinel data,
1059
00:39:35,480 –> 00:39:37,800
firewall processing metrics in some cases,
1060
00:39:37,800 –> 00:39:40,040
egress by workload if you collect flow logs
1061
00:39:40,040 –> 00:39:42,600
and map them back to subscriptions or VNets.
1062
00:39:42,600 –> 00:39:45,000
If you can measure consumption, allocate by consumption.
1063
00:39:45,000 –> 00:39:47,960
But many shared costs can’t be allocated cleanly
1064
00:39:47,960 –> 00:39:49,720
without building an internal billing system
1065
00:39:49,720 –> 00:39:50,920
that nobody wants.
1066
00:39:50,920 –> 00:39:53,640
So you use proportional allocation where you must.
1067
00:39:53,640 –> 00:39:55,240
Percentage by subscription spend,
1068
00:39:55,240 –> 00:39:58,040
percentage by headcount, percentage by throughput class,
1069
00:39:58,040 –> 00:39:59,560
whatever the business will accept
1070
00:39:59,560 –> 00:40:02,280
and is stable enough to create a feedback loop.
1071
00:40:02,280 –> 00:40:04,840
The critical requirement isn’t mathematical perfection.
1072
00:40:04,840 –> 00:40:06,520
The requirement is that shared spend
1073
00:40:06,520 –> 00:40:07,880
stops being invisible.
1074
00:40:07,880 –> 00:40:09,480
Because invisibility is what lets it grow
1075
00:40:09,480 –> 00:40:10,440
without design review.
1076
00:40:10,440 –> 00:40:12,280
This is also where showback and chargeback
1077
00:40:12,280 –> 00:40:14,120
stop being philosophical arguments
1078
00:40:14,120 –> 00:40:15,880
and become engineering inputs.
1079
00:40:15,880 –> 00:40:18,280
If an application team sees that their architecture is driving
1080
00:40:18,280 –> 00:40:20,040
a disproportionate share of log ingestion
1081
00:40:20,040 –> 00:40:21,160
or cross-region traffic,
1082
00:40:21,160 –> 00:40:23,240
the next design discussion changes.
1083
00:40:23,240 –> 00:40:25,480
Suddenly retention settings, sampling strategies,
1084
00:40:25,480 –> 00:40:27,320
diagnostic scope, and topology
1085
00:40:27,320 –> 00:40:29,400
are not abstract platform concerns,
1086
00:40:29,400 –> 00:40:32,040
but product decisions with financial consequences.
1087
00:40:32,040 –> 00:40:34,440
And yes, some teams will complain that it’s unfair.
1088
00:40:34,440 –> 00:40:35,000
Good.
1089
00:40:35,000 –> 00:40:36,680
Fairness complaints are often the first sign
1090
00:40:36,680 –> 00:40:38,520
that cost signals are finally reaching the people
1091
00:40:38,520 –> 00:40:39,640
making the trade-offs.
1092
00:40:39,640 –> 00:40:42,280
Now, here’s the part that most enterprises miss.
1093
00:40:42,280 –> 00:40:44,200
Shared platform costs should be discussed
1094
00:40:44,200 –> 00:40:46,120
as architectural constraints, not as bills.
1095
00:40:46,120 –> 00:40:49,400
If you centralize logging, you must also centralize logging policy.
1096
00:40:49,400 –> 00:40:51,800
Retention limits by environment, sampling defaults,
1097
00:40:51,800 –> 00:40:53,800
what debug means in production,
1098
00:40:53,800 –> 00:40:55,560
what data is worth paying to store.
1099
00:40:55,560 –> 00:40:56,920
If you centralize networking,
1100
00:40:56,920 –> 00:40:58,840
you must centralize topology standards,
1101
00:40:58,840 –> 00:41:00,440
where traffic is allowed to flow,
1102
00:41:00,440 –> 00:41:02,440
when cross-region is justified,
1103
00:41:02,440 –> 00:41:04,920
what services are allowed to punch through the hub.
1104
00:41:04,920 –> 00:41:07,720
Otherwise, the platform team becomes the custodian
1105
00:41:07,720 –> 00:41:10,280
of a cost surface area that everyone can expand
1106
00:41:10,280 –> 00:41:11,400
and no one can shrink.
1107
00:41:11,400 –> 00:41:12,920
So the outcome of the after-posture
1108
00:41:12,920 –> 00:41:14,200
is not just predictability,
1109
00:41:14,200 –> 00:41:16,120
it’s earlier financial design decisions.
1110
00:41:16,120 –> 00:41:17,720
Platform cost becomes a known,
1111
00:41:17,720 –> 00:41:19,960
modeled component of architecture review.
1112
00:41:19,960 –> 00:41:21,800
It becomes part of landing zone standards,
1113
00:41:21,800 –> 00:41:23,960
it becomes part of exception governance.
1114
00:41:23,960 –> 00:41:25,240
And most importantly,
1115
00:41:25,240 –> 00:41:27,960
it stops being a mystery that shows up as a quarterly surprise.
1116
00:41:27,960 –> 00:41:29,720
Now, once platform costs are legible,
1117
00:41:29,720 –> 00:41:32,040
the enterprise usually makes its next mistake.
1118
00:41:32,040 –> 00:41:34,200
They treat budgets like household trackers.
1119
00:41:34,200 –> 00:41:36,360
That’s next. Budgets are intent signals,
1120
00:41:36,360 –> 00:41:37,640
not household trackers.
1121
00:41:37,640 –> 00:41:39,160
Budgets are the most misunderstood
1122
00:41:39,160 –> 00:41:40,360
FinOps control in Azure,
1123
00:41:40,360 –> 00:41:41,960
and it’s predictable why.
1124
00:41:41,960 –> 00:41:44,520
Most organizations use them like a household expense app.
1125
00:41:44,520 –> 00:41:47,080
Set a number, watch it, panic when it turns red,
1126
00:41:47,080 –> 00:41:48,600
then do nothing meaningful,
1127
00:41:48,600 –> 00:41:50,440
because the month is basically over.
1128
00:41:50,440 –> 00:41:53,080
That is not what budgets are for in an enterprise cloud.
1129
00:41:53,080 –> 00:41:54,600
A budget is not a spending limit.
1130
00:41:54,600 –> 00:41:56,120
Azure will not stop your workloads.
1131
00:41:56,120 –> 00:41:58,040
Azure will not shut down your platform.
1132
00:41:58,040 –> 00:41:59,240
A budget is a signal.
1133
00:41:59,240 –> 00:42:01,880
It’s the platform telling you that actual behavior
1134
00:42:01,880 –> 00:42:05,080
is diverging from declared intent early enough to intervene.
1135
00:42:05,080 –> 00:42:06,120
That distinction matters,
1136
00:42:06,120 –> 00:42:08,280
because budgets only work when they are attached
1137
00:42:08,280 –> 00:42:09,880
to ownership and action.
1138
00:42:09,880 –> 00:42:12,840
If a budget alert lands in a shared mailbox, it is theater.
1139
00:42:12,840 –> 00:42:14,520
If it lands with an accountable owner
1140
00:42:14,520 –> 00:42:16,280
who has both authority and consequence,
1141
00:42:16,280 –> 00:42:17,320
it becomes governance.
1142
00:42:17,320 –> 00:42:19,960
And if it lands early enough that changes are still cheap,
1143
00:42:19,960 –> 00:42:21,560
it becomes an operational interrupt,
1144
00:42:21,560 –> 00:42:23,000
not a finance post-mortem.
1145
00:42:23,000 –> 00:42:24,280
Most enterprises do the opposite.
1146
00:42:24,280 –> 00:42:27,160
They set budgets late, they set them at the wrong scope,
1147
00:42:27,160 –> 00:42:28,680
and they set thresholds that fire
1148
00:42:28,680 –> 00:42:30,600
after the enterprise has already burned the money.
1149
00:42:30,600 –> 00:42:32,200
The classic example is a monthly budget
1150
00:42:32,200 –> 00:42:33,800
with a 90% alert.
1151
00:42:33,800 –> 00:42:35,800
That alert triggers when the month is nearly finished,
1152
00:42:35,800 –> 00:42:37,000
the spend has already happened,
1153
00:42:37,000 –> 00:42:39,800
and your options are limited to eat it or break something.
1154
00:42:39,800 –> 00:42:40,760
That’s not a control.
1155
00:42:40,760 –> 00:42:43,320
That’s a notification that your control model failed.
1156
00:42:43,320 –> 00:42:45,640
So budgets need three rules that are non-negotiable.
1157
00:42:45,640 –> 00:42:47,960
First, align budgets to ownership boundaries.
1158
00:42:47,960 –> 00:42:50,840
If a team owns a subscription, that subscription needs a budget.
1159
00:42:50,840 –> 00:42:53,640
If a platform domain owns a shared services subscription,
1160
00:42:53,640 –> 00:42:55,160
that subscription needs a budget.
1161
00:42:55,160 –> 00:42:57,320
If you can’t name the owner, you can’t budget it,
1162
00:42:57,320 –> 00:42:59,960
because there is no decision maker to receive the signal.
1163
00:42:59,960 –> 00:43:02,120
Budgeting unowned spend is just documenting
1164
00:43:02,120 –> 00:43:03,720
a problem you refuse to fix.
1165
00:43:03,720 –> 00:43:05,720
Second, budgets must fire early.
1166
00:43:05,720 –> 00:43:08,680
50% and 70% thresholds aren’t warnings.
1167
00:43:08,680 –> 00:43:10,440
They are deliberately placed interrupts.
1168
00:43:10,440 –> 00:43:12,760
They force the question: is this spend consistent
1169
00:43:12,760 –> 00:43:14,520
with what we expected the subscription to do?
1170
00:43:14,520 –> 00:43:17,000
If yes, then the organization updates its intent,
1171
00:43:17,000 –> 00:43:18,600
budget, forecast or constraints.
1172
00:43:18,600 –> 00:43:20,680
If no, then the organization intervenes,
1173
00:43:20,680 –> 00:43:22,280
right sizing, disabling a feature,
1174
00:43:22,280 –> 00:43:23,560
killing a runaway environment
1175
00:43:23,560 –> 00:43:25,800
or denying the next scale out through policy.
1176
00:43:25,800 –> 00:43:27,400
Budgets are not there to shame people.
1177
00:43:27,400 –> 00:43:28,680
They’re there to force a decision
1178
00:43:28,680 –> 00:43:30,200
while the decision is still reversible.
1179
00:43:30,200 –> 00:43:33,960
Third, budget alerts must trigger action, not email.
1180
00:43:33,960 –> 00:43:36,440
Email is where accountability goes to die.
1181
00:43:36,440 –> 00:43:37,800
If you want budgets to matter,
1182
00:43:37,800 –> 00:43:39,720
route the alert into an escalation lane
1183
00:43:39,720 –> 00:43:41,720
that produces a tracked artifact,
1184
00:43:41,720 –> 00:43:43,640
a ticket in your ITSM tool,
1185
00:43:43,640 –> 00:43:46,920
a message into the right Teams channel with the owner tagged,
1186
00:43:46,920 –> 00:43:48,920
an incident workflow for spend spikes
1187
00:43:48,920 –> 00:43:50,600
that threaten financial controls.
1188
00:43:50,600 –> 00:43:51,960
The point is not the tool.
1189
00:43:51,960 –> 00:43:53,800
The point is that an alert becomes a queue item
1190
00:43:53,800 –> 00:43:56,200
with an owner and a response expectation.
1191
00:43:56,200 –> 00:43:57,800
And yes, you can do this in Azure
1192
00:43:57,800 –> 00:43:59,560
with action groups, webhooks, logic apps,
1193
00:43:59,560 –> 00:44:01,720
and whatever workflow system your org already pretends
1194
00:44:01,720 –> 00:44:02,840
is standardized.
1195
00:44:02,840 –> 00:44:05,080
The mechanism is implementation detail.
1196
00:44:05,080 –> 00:44:06,600
The model is what matters.
1197
00:44:06,600 –> 00:44:09,400
Budgets trigger governance, not awareness.
1198
00:44:09,400 –> 00:44:10,760
Now, there’s a common objection.
1199
00:44:10,760 –> 00:44:13,320
Budgets create alert fatigue.
1200
00:44:13,320 –> 00:44:14,920
They do, if you design them like spam.
1201
00:44:14,920 –> 00:44:16,280
If you create budgets everywhere
1202
00:44:16,280 –> 00:44:18,360
at every scope with noisy thresholds,
1203
00:44:18,360 –> 00:44:19,720
you will flood the organization
1204
00:44:19,720 –> 00:44:21,240
with alerts that represent nothing.
1205
00:44:21,240 –> 00:44:23,080
Then teams mute them
1206
00:44:23,080 –> 00:44:25,240
and your budgets turn into background radiation.
1207
00:44:25,240 –> 00:44:26,360
That’s not a people problem.
1208
00:44:26,360 –> 00:44:27,480
That’s a design problem.
1209
00:44:27,480 –> 00:44:29,720
Avoid alert fatigue by having fewer budgets
1210
00:44:29,720 –> 00:44:30,680
with sharper scopes.
1211
00:44:30,680 –> 00:44:32,440
Put budgets where there is real ownership
1212
00:44:32,440 –> 00:44:34,120
and real financial exposure,
1213
00:44:34,120 –> 00:44:35,080
subscriptions,
1214
00:44:35,080 –> 00:44:37,080
platform domains, high-risk workloads
1215
00:44:37,080 –> 00:44:39,960
like AI, data and egress-heavy architectures,
1216
00:44:39,960 –> 00:44:42,360
and non-prod estates that love to sprawl.
1217
00:44:42,360 –> 00:44:44,360
Don’t budget every resource group in the estate
1218
00:44:44,360 –> 00:44:45,720
because it feels thorough.
1219
00:44:45,720 –> 00:44:47,080
Thoroughness is not control.
1220
00:44:47,080 –> 00:44:49,160
Control is knowing which levers matter.
1221
00:44:49,160 –> 00:44:52,760
Also, don’t treat budget alerts as failures.
1222
00:44:52,760 –> 00:44:55,160
A fired budget alert is not an incident by default.
1223
00:44:55,160 –> 00:44:56,760
It’s an anomaly indicator.
1224
00:44:56,760 –> 00:44:59,240
Sometimes the anomaly is legitimate growth.
1225
00:44:59,240 –> 00:45:03,000
A new workload, a migration phase, a seasonal spike.
1226
00:45:03,000 –> 00:45:04,360
The alert still did its job
1227
00:45:04,360 –> 00:45:06,120
because it forced the organization to acknowledge
1228
00:45:06,120 –> 00:45:07,560
that intent changed.
1229
00:45:07,560 –> 00:45:09,560
The worst outcome is not budget exceeded.
1230
00:45:09,560 –> 00:45:11,160
The worst outcome is budget exceeded
1231
00:45:11,160 –> 00:45:13,240
and nobody noticed until the invoice.
1232
00:45:13,240 –> 00:45:15,960
So if budgets are signals, what are they signaling?
1233
00:45:15,960 –> 00:45:17,400
They’re signaling one of three things:
1234
00:45:17,400 –> 00:45:19,000
drift, growth, or fraud.
1235
00:45:19,000 –> 00:45:21,480
Drift means something is running that shouldn’t be.
1236
00:45:21,480 –> 00:45:23,640
Growth means your usage pattern changed
1237
00:45:23,640 –> 00:45:24,920
and your budget model is stale.
1238
00:45:24,920 –> 00:45:28,520
Fraud in the broad sense means an unexpected pathway
1239
00:45:28,520 –> 00:45:30,040
is consuming resources
1240
00:45:30,040 –> 00:45:33,000
and cost is acting as the earliest signal something is wrong.
1241
00:45:33,000 –> 00:45:34,600
Budgets can’t tell you which one it is.
1242
00:45:34,600 –> 00:45:37,320
That’s your job, but budgets can tell you when to look.
1243
00:45:37,320 –> 00:45:39,800
Early, reliably, at scale.
1244
00:45:39,800 –> 00:45:41,160
Now, here’s the catch.
1245
00:45:41,160 –> 00:45:44,040
None of this works unless budgets have an accountability model
1246
00:45:44,040 –> 00:45:44,840
behind them.
1247
00:45:44,840 –> 00:45:46,360
Otherwise, you’re just watching numbers move
1248
00:45:46,360 –> 00:45:47,880
and calling it governance.
1249
00:45:47,880 –> 00:45:50,520
Accountability models: showback, chargeback,
1250
00:45:50,520 –> 00:45:51,640
and the real point.
1251
00:45:51,640 –> 00:45:53,880
This is where every enterprise wastes six months.
1252
00:45:53,880 –> 00:45:56,360
The religious war between showback and chargeback.
1253
00:45:56,360 –> 00:45:58,600
Finance wants chargeback because it looks like control.
1254
00:45:58,600 –> 00:46:02,120
Engineering wants showback because it looks like collaboration.
1255
00:46:02,120 –> 00:46:05,480
Someone says we can’t do chargeback until tagging is perfect.
1256
00:46:05,480 –> 00:46:09,000
And someone else says we won’t fix tagging unless we do chargeback.
1257
00:46:09,000 –> 00:46:12,360
Then the meeting ends, nothing changes and the platform keeps spending.
1258
00:46:12,360 –> 00:46:13,720
That argument misses the point.
1259
00:46:13,720 –> 00:46:15,640
Showback and chargeback are not ideologies.
1260
00:46:15,640 –> 00:46:16,920
They are feedback mechanisms.
1261
00:46:16,920 –> 00:46:19,720
The only thing that matters is whether the cost signal
1262
00:46:19,720 –> 00:46:23,000
reaches the person who made the decision that created the spend,
1263
00:46:23,000 –> 00:46:25,400
fast enough for them to change the next decision.
1264
00:46:25,400 –> 00:46:27,880
If the signal doesn’t reach the decision maker,
1265
00:46:27,880 –> 00:46:29,080
you don’t have accountability.
1266
00:46:29,080 –> 00:46:30,280
You have reporting.
1267
00:46:30,280 –> 00:46:33,640
Showback is the early stage tool for building trust in the data.
1268
00:46:33,640 –> 00:46:35,400
It says, here’s what you consumed.
1269
00:46:35,400 –> 00:46:37,560
Here’s the unit of allocation we agreed on.
1270
00:46:37,560 –> 00:46:39,000
And here’s how it maps to the org.
1271
00:46:39,000 –> 00:46:40,760
No money moves, no budgets get hit.
1272
00:46:40,760 –> 00:46:43,400
The goal is to remove the “your numbers are wrong” debate
1273
00:46:43,400 –> 00:46:45,080
and replace it with boring acceptance.
1274
00:46:45,080 –> 00:46:46,920
Because until the numbers are boring,
1275
00:46:46,920 –> 00:46:48,280
nobody will accept chargeback.
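A minimal sketch of what "here's what you consumed, mapped to the org" can look like, assuming cost records have already been exported with their tags. The record shape, the `cost_center` allocation tag, and the `UNALLOCATED` bucket name are illustrative assumptions, not an Azure export schema. The point is that untagged spend stays visible as its own line instead of being spread silently.

```python
from collections import defaultdict

# Hypothetical cost records: (resource_id, cost_in_cents, tags).
# A real showback would read these from a cost export; this shape is assumed.
def showback(records, allocation_tag="cost_center"):
    """Sum cost per value of the agreed allocation tag.

    Anything missing the tag lands in an explicit UNALLOCATED bucket,
    so the allocation gap is reported rather than hidden.
    """
    totals = defaultdict(int)
    for _resource_id, cost_cents, tags in records:
        key = tags.get(allocation_tag) or "UNALLOCATED"
        totals[key] += cost_cents
    return dict(totals)

records = [
    ("vm-01", 12_000, {"cost_center": "payments"}),
    ("sql-01", 30_000, {"cost_center": "payments"}),
    ("aks-01", 25_000, {"cost_center": "search"}),
    ("vm-legacy", 8_000, {}),  # no tag: cannot be allocated yet
]
report = showback(records)
print(report)  # {'payments': 42000, 'search': 25000, 'UNALLOCATED': 8000}
```

Using integer cents keeps the totals exact, which matters when the goal is "boring acceptance" of the numbers.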
1276
00:46:48,280 –> 00:46:51,080
Chargeback is the enforcement tool for making cost a real constraint
1277
00:46:51,080 –> 00:46:52,920
that moves money, or at least moves budget.
1278
00:46:52,920 –> 00:46:54,360
It creates an economic consequence
1279
00:46:54,360 –> 00:46:56,360
that forces teams to treat cloud consumption
1280
00:46:56,360 –> 00:46:58,760
like any other resource they can’t waste without trade-offs.
1281
00:46:58,760 –> 00:47:00,120
But here’s the uncomfortable truth.
1282
00:47:00,120 –> 00:47:03,320
Neither showback nor chargeback fixes anything by itself.
1283
00:47:03,320 –> 00:47:04,840
They are both downstream of governance.
1284
00:47:04,840 –> 00:47:07,000
If you didn’t enforce ownership boundaries, tagging,
1285
00:47:07,000 –> 00:47:09,160
SKU constraints, and budget escalation,
1286
00:47:09,160 –> 00:47:12,200
then chargeback just turns chaos into internal invoices.
1287
00:47:12,200 –> 00:47:14,920
So you’ll spend your year mediating disputes between teams about
1288
00:47:14,920 –> 00:47:17,160
who pays for a shared Log Analytics workspace
1289
00:47:17,160 –> 00:47:18,680
that nobody scoped correctly.
1290
00:47:18,680 –> 00:47:19,960
That isn’t accountability.
1291
00:47:19,960 –> 00:47:21,640
That’s internal billing theater.
1292
00:47:21,640 –> 00:47:24,440
And showback without enforcement becomes wallpaper.
1293
00:47:24,440 –> 00:47:27,240
People glance at it, nod and keep deploying the same way
1294
00:47:27,240 –> 00:47:29,560
because nothing in their system changes when the number goes up.
1295
00:47:29,560 –> 00:47:31,160
So the sequence is deterministic.
1296
00:47:31,160 –> 00:47:33,400
First, showback to establish legitimacy.
1297
00:47:33,400 –> 00:47:35,640
Second, chargeback to establish consequence.
1298
00:47:35,640 –> 00:47:38,440
And the real point of both is to create an economic feedback loop
1299
00:47:38,440 –> 00:47:39,800
that closes fast.
1300
00:47:39,800 –> 00:47:42,600
A mature organization doesn’t pick showback or chargeback.
1301
00:47:42,600 –> 00:47:44,840
It uses both deliberately at different stages
1302
00:47:44,840 –> 00:47:46,440
for different kinds of spend.
1303
00:47:46,440 –> 00:47:48,600
Now accountability isn’t just who pays.
1304
00:47:48,600 –> 00:47:50,280
It’s who is accountable for the guardrails.
1305
00:47:50,280 –> 00:47:52,360
This is where enterprises get sloppy
1306
00:47:52,360 –> 00:47:54,920
because they pretend accountability is a cultural concept.
1307
00:47:54,920 –> 00:47:56,680
It isn’t. It’s a design requirement.
1308
00:47:56,680 –> 00:47:57,960
In a real operating model,
1309
00:47:57,960 –> 00:48:00,440
the FinOps team or cloud economics function,
1310
00:48:00,440 –> 00:48:02,360
call it whatever your org chart tolerates,
1311
00:48:02,360 –> 00:48:04,200
should be accountable for defining the guardrails
1312
00:48:04,200 –> 00:48:05,560
and the measurement model.
1313
00:48:05,560 –> 00:48:08,440
App teams should be responsible for staying inside those guardrails,
1314
00:48:08,440 –> 00:48:11,400
including tagging compliance and cost-aware design decisions.
1315
00:48:11,400 –> 00:48:13,240
Finance should be consulted for budgeting,
1316
00:48:13,240 –> 00:48:16,040
allocation rules, and the mechanics of internal chargeback.
1317
00:48:16,040 –> 00:48:17,320
Leadership should be informed,
1318
00:48:17,320 –> 00:48:19,880
not asked to resolve the same argument every month.
1319
00:48:19,880 –> 00:48:20,920
That’s not bureaucracy.
1320
00:48:20,920 –> 00:48:23,960
That’s how you stop ownership from dissolving into everyone’s problem,
1321
00:48:23,960 –> 00:48:26,520
which is just a polite way to say nobody’s problem.
1322
00:48:26,520 –> 00:48:29,400
And there’s another shift happening that enterprises keep ignoring.
1323
00:48:29,400 –> 00:48:30,920
FinOps is no longer just cloud.
1324
00:48:30,920 –> 00:48:34,120
The FinOps Foundation’s newer Cloud+ framing and its concept of scopes exist
1325
00:48:34,120 –> 00:48:35,720
because spending doesn’t stay contained.
1326
00:48:35,720 –> 00:48:36,760
Cloud leads to SaaS.
1327
00:48:36,760 –> 00:48:38,600
SaaS leads to licensing sprawl.
1328
00:48:38,600 –> 00:48:40,600
AI leads to token burn and GPU bills
1329
00:48:40,600 –> 00:48:42,840
that make your old VM arguments look adorable.
1330
00:48:42,840 –> 00:48:44,600
If you build an accountability model
1331
00:48:44,600 –> 00:48:46,360
that only works for VM spend,
1332
00:48:46,360 –> 00:48:48,200
you’re building yesterday’s governance.
1333
00:48:48,200 –> 00:48:49,960
So the practical posture is scopes.
1334
00:48:49,960 –> 00:48:52,840
Apply accountability where spend concentrates.
1335
00:48:52,840 –> 00:48:56,440
Cloud scope: subscriptions, tagging, budgets, policy enforcement.
1336
00:48:56,440 –> 00:48:59,480
AI scope: model usage, token forecasts,
1337
00:48:59,480 –> 00:49:02,760
stricter anomaly response because costs can spike fast.
1338
00:49:02,760 –> 00:49:05,080
Shared services scope: allocation rules
1339
00:49:05,080 –> 00:49:06,920
that make platform spend legible,
1340
00:49:06,920 –> 00:49:08,760
even when it can’t be perfectly metered.
1341
00:49:08,760 –> 00:49:10,280
You don’t need to boil the ocean.
1342
00:49:10,280 –> 00:49:13,000
You need to stop pretending a single model fits everything.
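For the shared services scope, one common allocation rule is a proportional split by a usage proxy. This is a sketch under stated assumptions: the usage metric (for example, GB ingested per team into a shared workspace) and the `allocate_shared` helper are hypothetical, and the largest-remainder rounding is one reasonable choice, not a FinOps Foundation prescription. It guarantees the parts always sum exactly to the invoice.

```python
def allocate_shared(total_cents: int, usage: dict) -> dict:
    """Split a shared bill in proportion to each team's usage.

    Works in integer cents and uses the largest-remainder method,
    so the allocated parts always add up to the original total.
    """
    total_usage = sum(usage.values())
    if total_usage == 0:
        raise ValueError("no usage to allocate against")
    shares = {}
    remainders = []
    for team, units in usage.items():
        shares[team], rem = divmod(total_cents * units, total_usage)
        remainders.append((rem, team))
    leftover = total_cents - sum(shares.values())
    # Hand out the leftover cents to the largest remainders first.
    for _rem, team in sorted(remainders, reverse=True)[:leftover]:
        shares[team] += 1
    return shares

# Hypothetical month: one shared workspace, usage measured in GB ingested.
split = allocate_shared(123_457, {"payments": 50, "search": 30, "identity": 20})
print(split)  # {'payments': 61729, 'search': 37037, 'identity': 24691}
```

The rule does not need to be perfect metering; it needs to be agreed, repeatable, and exact enough that nobody can argue the total away.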
1343
00:49:13,000 –> 00:49:14,760
The simplest version is this: accountability
1344
00:49:14,760 –> 00:49:16,520
must follow decision rights.
1345
00:49:16,520 –> 00:49:18,440
If engineering can choose the SKU,
1346
00:49:18,440 –> 00:49:20,360
engineering must see the cost signal.
1347
00:49:20,360 –> 00:49:23,080
If the platform team controls diagnostics defaults,
1348
00:49:23,080 –> 00:49:26,200
the platform team must own the retention economics.
1349
00:49:26,200 –> 00:49:28,600
If leadership demands multi-region resilience,
1350
00:49:28,600 –> 00:49:30,280
leadership must accept the price tag
1351
00:49:30,280 –> 00:49:32,600
as a design decision, not a surprise invoice.
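"Accountability must follow decision rights" can be checked mechanically once decisions and cost-signal routes are written down. A small sketch, assuming you keep such an inventory; the `Decision` record and role names are hypothetical. It flags any decision whose cost signal lands with someone other than the decider, which is exactly the "reporting, not accountability" case described earlier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    name: str
    decided_by: str       # who holds the decision right
    cost_signal_to: str   # who actually receives the cost signal

def misrouted(decisions):
    """Return decisions whose cost signal does not reach the decider."""
    return [d.name for d in decisions if d.decided_by != d.cost_signal_to]

decisions = [
    Decision("SKU selection", "engineering", "engineering"),
    Decision("diagnostics defaults and retention", "platform", "finance"),
    Decision("multi-region resilience", "leadership", "leadership"),
]
print(misrouted(decisions))  # ['diagnostics defaults and retention']
```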
1352
00:49:32,600 –> 00:49:34,200
Once that operating model is real,
1353
00:49:34,200 –> 00:49:36,440
showback and chargeback stop being arguments.
1354
00:49:36,440 –> 00:49:37,960
They become implementation details.
1355
00:49:37,960 –> 00:49:39,560
And now the transition that matters.
1356
00:49:39,560 –> 00:49:41,240
Once accountability exists,
1357
00:49:41,240 –> 00:49:43,400
enforcement has to live where the decisions happen.
1358
00:49:43,400 –> 00:49:45,080
Not in meetings. In the control plane.
1359
00:49:45,080 –> 00:49:47,240
The enforcement stack: policy, RBAC,
1360
00:49:47,240 –> 00:49:48,760
budgets, deployment stamps.
1361
00:49:48,760 –> 00:49:50,760
So if cost is an authorization outcome,
1362
00:49:50,760 –> 00:49:52,760
and accountability is the feedback loop,
1363
00:49:52,760 –> 00:49:54,360
then enforcement is the only part
1364
00:49:54,360 –> 00:49:56,360
that actually survives contact with reality.
1365
00:49:56,360 –> 00:49:59,480
Meetings don’t enforce. Slide decks don’t enforce.
1366
00:49:59,480 –> 00:50:00,920
Cost awareness doesn’t enforce.
1367
00:50:00,920 –> 00:50:03,000
The control plane enforces.
1368
00:50:03,000 –> 00:50:06,280
And the enforcement stack in Azure is not complicated.
1369
00:50:06,280 –> 00:50:08,440
It’s just unpopular because it removes freedom people
1370
00:50:08,440 –> 00:50:09,640
already got used to.
1371
00:50:09,640 –> 00:50:10,840
Start with Azure Policy
1372
00:50:10,840 –> 00:50:13,720
because it’s the closest thing Azure has to an authorization
1373
00:50:13,720 –> 00:50:15,160
compiler for cost intent.
1374
00:50:15,160 –> 00:50:17,960
Policy is where you encode the non-negotiables,
1375
00:50:17,960 –> 00:50:21,240
required tags, allowed regions, allowed SKUs
1376
00:50:21,240 –> 00:50:22,520
and configuration baselines
1377
00:50:22,520 –> 00:50:24,280
that have real financial impact.
1378
00:50:24,280 –> 00:50:25,640
Deny is the blunt instrument.
1379
00:50:25,640 –> 00:50:27,080
Modify is the quieter one.
1380
00:50:27,080 –> 00:50:28,760
DeployIfNotExists is the
1381
00:50:28,760 –> 00:50:30,040
“you’re going to pay for this anyway,
1382
00:50:30,040 –> 00:50:31,880
so we’re going to standardize it” move.
1383
00:50:31,880 –> 00:50:34,040
The key is that policy runs at deploy time.
1384
00:50:34,040 –> 00:50:35,240
It doesn’t ask for cooperation.
1385
00:50:35,240 –> 00:50:36,760
It evaluates intent
1386
00:50:36,760 –> 00:50:39,400
and either materializes capacity or refuses it.
1387
00:50:39,400 –> 00:50:41,960
That means you stop trying to convince teams to behave
1388
00:50:41,960 –> 00:50:43,480
and you start designing the platform
1389
00:50:43,480 –> 00:50:44,840
so behavior has boundaries.
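To make the deploy-time idea concrete, here is a toy model of two policy effects applied to a deployment request. It is not Azure Policy's rule language (real definitions are JSON with effects such as `deny` and `modify`); the `Request` type, tag names, and inheritance rule are hypothetical. It mirrors the documented ordering in which Modify is evaluated before Deny, so an inheritable tag is filled in first and only genuinely missing financial identity is refused.

```python
from dataclasses import dataclass, field

class DeploymentDenied(Exception):
    pass

@dataclass
class Request:
    name: str
    environment: str
    tags: dict = field(default_factory=dict)

# Hypothetical baseline: which tags may be inherited, which prod requires.
INHERITABLE = ("cost_center",)
REQUIRED_IN_PROD = ("owner", "cost_center")

def evaluate(request: Request, rg_tags: dict) -> Request:
    """Toy deploy-time evaluation: a modify step runs first, then deny."""
    tags = dict(request.tags)
    # modify: copy inheritable tags from the resource group when absent
    for key in INHERITABLE:
        if key not in tags and key in rg_tags:
            tags[key] = rg_tags[key]
    # deny: production must carry financial identity after modification
    if request.environment == "prod":
        missing = [k for k in REQUIRED_IN_PROD if k not in tags]
        if missing:
            raise DeploymentDenied(f"{request.name}: missing tags {missing}")
    return Request(request.name, request.environment, tags)

ok = evaluate(Request("api", "prod", {"owner": "team-a"}), rg_tags={"cost_center": "payments"})
print(ok.tags)  # {'owner': 'team-a', 'cost_center': 'payments'}
try:
    evaluate(Request("batch", "prod"), rg_tags={"cost_center": "payments"})
except DeploymentDenied as err:
    print(err)  # batch: missing tags ['owner']
```

The request never reaches "materialized capacity" without an owner; no one had to be convinced.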
1390
00:50:44,840 –> 00:50:47,000
Now this is where most enterprises sabotage themselves.
1391
00:50:47,000 –> 00:50:49,240
They treat policy exemptions like kindness.
1392
00:50:49,240 –> 00:50:50,520
Exemptions are not kindness.
1393
00:50:50,520 –> 00:50:52,200
They are entropy generators.
1394
00:50:52,200 –> 00:50:53,720
Every exemption should be visible,
1395
00:50:53,720 –> 00:50:55,560
justified, time-boxed, and reviewed,
1396
00:50:55,560 –> 00:50:58,760
because if you can’t explain why something bypassed the rules,
1397
00:50:58,760 –> 00:51:00,120
you didn’t make an exception,
1398
00:51:00,120 –> 00:51:01,800
you created a second rule set
1399
00:51:01,800 –> 00:51:03,640
that only some people know exists.
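A review job is one way to keep exemptions visible and time-boxed. Azure Policy exemptions do support an optional expiration date, but the record type, field names, and review rules below are a hypothetical sketch, not the Azure resource schema. An open-ended exemption is treated as a finding in its own right.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Exemption:
    scope: str
    rule: str
    justification: str
    expires_on: Optional[date]  # None = open-ended, itself a finding

def review(exemptions: List[Exemption], today: date) -> List[Tuple[str, str]]:
    """Flag exemptions that are unjustified, open-ended, or past expiry."""
    findings = []
    for ex in exemptions:
        if not ex.justification.strip():
            findings.append((ex.scope, "no justification"))
        if ex.expires_on is None:
            findings.append((ex.scope, "no expiry"))
        elif ex.expires_on < today:
            findings.append((ex.scope, "expired"))
    return findings

exemptions = [
    Exemption("rg-migration", "require-cost-center-tag", "cutover window", date(2025, 3, 31)),
    Exemption("rg-legacy", "allowed-skus", "", None),
    Exemption("rg-ml", "allowed-regions", "GPU quota in one region", date(2025, 9, 30)),
]
for scope, reason in review(exemptions, today=date(2025, 6, 1)):
    print(scope, reason)
```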
1400
00:51:03,640 –> 00:51:04,760
The next layer is RBAC,
1401
00:51:04,760 –> 00:51:06,360
because policy without permission design
1402
00:51:06,360 –> 00:51:08,440
is just a guardrail around a highway exit
1403
00:51:08,440 –> 00:51:09,480
nobody controls.
1404
00:51:09,480 –> 00:51:11,320
Most organizations have a contributor problem.
1405
00:51:11,320 –> 00:51:13,640
They hand out broad Contributor at subscription scope
1406
00:51:13,640 –> 00:51:15,240
because it makes delivery easy
1407
00:51:15,240 –> 00:51:17,800
and then they act surprised when spend is unbounded.
1408
00:51:17,800 –> 00:51:19,560
Contributor is not empowerment.
1409
00:51:19,560 –> 00:51:21,400
It is a spend authorization primitive.
1410
00:51:21,400 –> 00:51:23,880
If a team can create resources, they can create cost.
1411
00:51:23,880 –> 00:51:27,640
If they can assign roles, they can create new spend pathways,
1412
00:51:27,640 –> 00:51:29,560
and if they can deploy without a cost signal,
1413
00:51:29,560 –> 00:51:31,800
they can externalize the consequences.
1414
00:51:31,800 –> 00:51:33,160
So RBAC needs two outcomes.
1415
00:51:33,160 –> 00:51:35,000
First, the people making deployment decisions
1416
00:51:35,000 –> 00:51:37,560
must be able to see cost: Cost Management Reader
1417
00:51:37,560 –> 00:51:39,720
or equivalent visibility for engineering
1418
00:51:39,720 –> 00:51:41,160
leads and platform owners.
1419
00:51:41,160 –> 00:51:43,960
If they can’t see cost trends, budgets, and anomalies,
1420
00:51:43,960 –> 00:51:46,760
they are operating blind and blind systems drift.
1421
00:51:46,760 –> 00:51:49,720
Second, deploy authority and spend accountability
1422
00:51:49,720 –> 00:51:52,120
can’t be the same unmanaged blob.
1423
00:51:52,120 –> 00:51:54,280
That doesn’t mean you create a bureaucracy of approvals.
1424
00:51:54,280 –> 00:51:55,880
That means you design roles
1425
00:51:55,880 –> 00:51:58,200
so the enterprise can tell who is allowed to do what
1426
00:51:58,200 –> 00:52:00,280
and who is responsible when it goes wrong.
1427
00:52:00,280 –> 00:52:01,880
And yes, that often means pipelines
1428
00:52:01,880 –> 00:52:03,800
and managed identities get narrowed permissions,
1429
00:52:03,800 –> 00:52:05,640
not Contributor “because it works.”
1430
00:52:05,640 –> 00:52:08,200
“Because it works” is how cost entropy gets funded.
1431
00:52:08,200 –> 00:52:10,040
The third layer is budgets, not as tracking
1432
00:52:10,040 –> 00:52:11,320
but as escalation engines.
1433
00:52:11,320 –> 00:52:13,720
Budgets are the interrupt system
1434
00:52:13,720 –> 00:52:16,200
that tells you intent and reality diverged.
1435
00:52:16,200 –> 00:52:17,080
They don’t stop spend.
1436
00:52:17,080 –> 00:52:18,040
They route attention,
1437
00:52:18,040 –> 00:52:20,280
and if they aren’t wired into action,
1438
00:52:20,280 –> 00:52:22,200
tickets, paging, escalation channels,
1439
00:52:22,200 –> 00:52:24,680
they are a compliance checkbox that produces email.
1440
00:52:24,680 –> 00:52:26,520
Budgets should exist at subscription scope
1441
00:52:26,520 –> 00:52:28,120
because that’s where ownership is legible
1442
00:52:28,120 –> 00:52:29,560
and blast radius is bounded.
1443
00:52:29,560 –> 00:52:32,040
They can also exist at higher scopes for rollups
1444
00:52:32,040 –> 00:52:33,560
but the place where action happens
1445
00:52:33,560 –> 00:52:35,640
is where someone can actually change something
1446
00:52:35,640 –> 00:52:37,000
without a committee.
1447
00:52:37,000 –> 00:52:38,280
And budgets should fire early
1448
00:52:38,280 –> 00:52:40,280
because the platform needs time to respond.
1449
00:52:40,280 –> 00:52:42,680
50% and 70% thresholds are not conservative.
1450
00:52:42,680 –> 00:52:43,560
They are practical.
1451
00:52:43,560 –> 00:52:44,840
They’re the only way to catch drift
1452
00:52:44,840 –> 00:52:46,040
while you still have options.
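The arithmetic behind early thresholds is simple, and it shows why a forecast matters. Azure budgets can alert on actual or forecasted cost; the straight-line projection below is a naive stand-in for illustration only, not Azure's forecasting model, and the threshold list and amounts are hypothetical. Actual spend has only crossed 50%, while the projection already says the month will blow through 100%.

```python
def crossed(budget_cents: int, amount_cents: int, thresholds=(50, 70, 90, 100)):
    """Percentage thresholds that an amount has reached or passed."""
    if budget_cents <= 0:
        raise ValueError("budget must be positive")
    # Integer comparison avoids float rounding at the threshold boundary.
    return [t for t in thresholds if amount_cents * 100 >= t * budget_cents]

def naive_forecast(spend_cents: int, day: int, days_in_period: int) -> int:
    """Straight-line projection to period end (an assumption, not Azure's model)."""
    return spend_cents * days_in_period // day

budget = 1_000_000   # 10,000.00 in cents
spend = 550_000      # spent by day 12 of a 30-day month
print(crossed(budget, spend))                            # [50]
print(crossed(budget, naive_forecast(spend, 12, 30)))    # [50, 70, 90, 100]
```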
1453
00:52:46,040 –> 00:52:48,280
Now the fourth layer is the part people ignore
1454
00:52:48,280 –> 00:52:50,120
because it feels like platform engineering
1455
00:52:50,120 –> 00:52:51,000
not FinOps.
1456
00:52:51,000 –> 00:52:52,200
Deployment stamps.
1457
00:52:52,200 –> 00:52:53,480
Guarded environments.
1458
00:52:53,480 –> 00:52:55,240
Standard patterns.
1459
00:52:55,240 –> 00:52:56,520
Call them what you want.
1460
00:52:56,520 –> 00:52:57,640
The idea is the same.
1461
00:52:57,640 –> 00:53:00,600
You stop letting every team invent their own cost model
1462
00:53:00,600 –> 00:53:01,720
by accident.
1463
00:53:01,720 –> 00:53:04,760
A stamp is a pre-approved, pre-constrained deployment pattern:
1464
00:53:04,760 –> 00:53:06,520
networking, logging, diagnostics,
1465
00:53:06,520 –> 00:53:08,920
SKU baselines, scaling rules, retention settings
1466
00:53:08,920 –> 00:53:11,160
and whatever else always turns into surprise spend.
1467
00:53:11,160 –> 00:53:13,800
When teams deploy through the stamp,
1468
00:53:13,800 –> 00:53:15,800
they inherit the constraints and the defaults
1469
00:53:15,800 –> 00:53:17,080
and the platform doesn’t
1470
00:53:17,080 –> 00:53:19,080
relitigate the same cost mistakes 400 times.
1471
00:53:19,080 –> 00:53:21,560
This is how you scale autonomy without scaling chaos
1472
00:53:21,560 –> 00:53:24,120
because you’re not restricting teams to one architecture.
1473
00:53:24,120 –> 00:53:25,640
You’re restricting them to architectures
1474
00:53:25,640 –> 00:53:27,160
with known cost behavior.
1475
00:53:27,160 –> 00:53:28,440
And then you do one more thing
1476
00:53:28,440 –> 00:53:29,640
enterprises avoid.
1477
00:53:29,640 –> 00:53:31,800
You make exceptions expensive in process,
1478
00:53:31,800 –> 00:53:32,920
not in politics.
1479
00:53:32,920 –> 00:53:35,000
If a workload needs to break the stamp, fine,
1480
00:53:35,000 –> 00:53:36,920
but it does so through a visible exception path
1481
00:53:36,920 –> 00:53:37,800
with an expiry.
1482
00:53:37,800 –> 00:53:39,080
That keeps the baseline clean
1483
00:53:39,080 –> 00:53:40,520
and it forces special cases
1484
00:53:40,520 –> 00:53:41,880
to prove they’re still special
1485
00:53:41,880 –> 00:53:43,400
every time the clock runs out.
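A stamp can be modeled as defaults plus bounds: teams may override within the bounds, and anything outside them is refused and pointed at the exception path. The settings, SKU names, and limits here are hypothetical, and this sketch stands in for whatever template or module system actually renders your stamps.

```python
class StampViolation(Exception):
    pass

# Hypothetical stamp: setting -> (default, allowed values).
# `range` works as a numeric allow-list because it supports `in`.
STAMP = {
    "sku": ("Standard", {"Basic", "Standard"}),
    "log_retention_days": (30, {30, 90}),
    "max_instances": (3, range(1, 11)),
}

def render(stamp, overrides):
    """Merge team overrides onto stamp defaults, refusing anything out of bounds."""
    violations = [f"unknown setting {k!r}" for k in sorted(set(overrides) - set(stamp))]
    config = {}
    for key, (default, allowed) in stamp.items():
        value = overrides.get(key, default)
        if value not in allowed:
            violations.append(f"{key}={value!r}")
        config[key] = value
    if violations:
        raise StampViolation("; ".join(violations) + " (needs an exception with an expiry)")
    return config

print(render(STAMP, {"max_instances": 6}))
# {'sku': 'Standard', 'log_retention_days': 30, 'max_instances': 6}
```

Teams keep autonomy over the settings that matter to them, but the cost-relevant defaults are inherited rather than reinvented.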
1486
00:53:43,400 –> 00:53:45,400
So the enforcement stack is simple.
1487
00:53:45,400 –> 00:53:47,000
Policy defines what is allowed,
1488
00:53:47,000 –> 00:53:48,520
RBAC defines who can attempt it,
1489
00:53:48,520 –> 00:53:50,440
budgets define when reality diverges,
1490
00:53:50,440 –> 00:53:52,360
stamps define the default pathways,
1491
00:53:52,360 –> 00:53:53,880
so divergence is rarer,
1492
00:53:53,880 –> 00:53:55,880
and the outcome is the only thing that matters.
1493
00:53:55,880 –> 00:53:57,880
Cost becomes an enforced design decision,
1494
00:53:57,880 –> 00:53:59,320
not a post-mortem artifact.
1495
00:53:59,320 –> 00:54:01,000
Now the question isn’t which tools to use.
1496
00:54:01,000 –> 00:54:02,680
The real question is:
1497
00:54:02,680 –> 00:54:05,000
can you roll this out in a way that survives
1498
00:54:05,000 –> 00:54:06,600
organizational pressure?
1499
00:54:06,600 –> 00:54:08,600
That’s next: the 90-day rollout,
1500
00:54:08,600 –> 00:54:10,520
from surprise bills to enforced intent.
1501
00:54:10,520 –> 00:54:12,760
This only works if you treat it like a platform rollout,
1502
00:54:12,760 –> 00:54:14,200
not a finance initiative.
1503
00:54:14,200 –> 00:54:17,240
90 days is enough time to change the system’s behavior
1504
00:54:17,240 –> 00:54:19,400
if you stop negotiating with entropy
1505
00:54:19,400 –> 00:54:21,480
and start removing its pathways.
1506
00:54:21,480 –> 00:54:22,680
Days 1 to 30:
1507
00:54:22,680 –> 00:54:24,680
define ownership boundaries and make them real.
1508
00:54:24,680 –> 00:54:27,480
Lock in your subscription strategy:
1509
00:54:27,480 –> 00:54:28,840
what is prod, what is non-prod,
1510
00:54:28,840 –> 00:54:29,480
what is platform,
1511
00:54:29,480 –> 00:54:30,280
what is sandbox,
1512
00:54:30,280 –> 00:54:32,120
and who owns each of those scopes.
1513
00:54:32,120 –> 00:54:33,880
Then define the minimum tagging taxonomy
1514
00:54:33,880 –> 00:54:35,720
that represents financial identity:
1515
00:54:35,720 –> 00:54:37,960
owner, environment, and cost center or product.
1516
00:54:37,960 –> 00:54:39,400
Keep it small and enforceable.
1517
00:54:39,400 –> 00:54:41,320
At the same time, stand up initial showback
1518
00:54:41,320 –> 00:54:43,400
with whatever accuracy you currently have
1519
00:54:43,400 –> 00:54:45,160
because the point in month one is to surface
1520
00:54:45,160 –> 00:54:46,840
where you can’t allocate and why.
1521
00:54:46,840 –> 00:54:49,080
That gap is your backlog, not your shame.
1522
00:54:49,080 –> 00:54:52,360
Days 31 to 60: move from visibility to enforcement.
1523
00:54:52,360 –> 00:54:54,120
This is where you start using Azure Policy
1524
00:54:54,120 –> 00:54:55,320
like it’s meant to be used
1525
00:54:55,320 –> 00:54:57,800
to stop the platform from accepting ambiguity.
1526
00:54:57,800 –> 00:54:59,720
Deny untagged production deployments.
1527
00:54:59,720 –> 00:55:02,520
Use modify where you can safely add baseline tags
1528
00:55:02,520 –> 00:55:03,560
or inherit them,
1529
00:55:03,560 –> 00:55:06,200
but don’t confuse auto tagging with governance.
1530
00:55:06,200 –> 00:55:09,480
Implement budgets at subscription scope with early thresholds.
1531
00:55:09,480 –> 00:55:11,880
Route alerts into an action path, not a mailbox.
1532
00:55:11,880 –> 00:55:13,800
Then restrict SKUs by environment.
1533
00:55:13,800 –> 00:55:15,720
Non-prod doesn’t get premium by default
1534
00:55:15,720 –> 00:55:17,160
and regions don’t sprawl
1535
00:55:17,160 –> 00:55:19,160
because someone felt adventurous in the portal.
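Restricting SKUs and regions by environment comes down to an allow-list lookup. In Azure this would typically be expressed as policy assignments per scope; the Python below only sketches the decision logic. The environment names, SKU tiers, and approved regions are example assumptions, not recommendations.

```python
# Example allow-lists; SKU tiers and regions are illustrative assumptions.
ALLOWED_SKUS = {
    "prod": {"Standard", "Premium"},
    "nonprod": {"Basic", "Standard"},
    "sandbox": {"Basic"},
}
ALLOWED_REGIONS = {"westeurope", "northeurope"}

def check(environment: str, sku: str, region: str) -> list:
    """Return reasons a deployment would be refused (empty list = allowed)."""
    problems = []
    allowed = ALLOWED_SKUS.get(environment)
    if allowed is None:
        problems.append(f"unknown environment {environment!r}")
    elif sku not in allowed:
        problems.append(f"SKU {sku!r} not allowed in {environment}")
    if region not in ALLOWED_REGIONS:
        problems.append(f"region {region!r} not allowed")
    return problems

print(check("prod", "Premium", "westeurope"))  # []
print(check("nonprod", "Premium", "eastus"))
# ["SKU 'Premium' not allowed in nonprod", "region 'eastus' not allowed"]
```

Treating an unknown environment as a refusal, rather than a pass, keeps ambiguity from becoming the default pathway.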
1536
00:55:19,160 –> 00:55:20,440
Days 61 to 90:
1537
00:55:20,440 –> 00:55:23,000
wire escalation and institutionalize exceptions.
1538
00:55:23,000 –> 00:55:24,040
Build the workflow:
1539
00:55:24,040 –> 00:55:25,400
a budget alert creates a ticket,
1540
00:55:25,400 –> 00:55:26,680
it lands with a named owner,
1541
00:55:26,680 –> 00:55:28,520
it has an SLA, and it has an outcome:
1542
00:55:28,520 –> 00:55:31,880
justify, remediate, or request an exception with an expiry.
1543
00:55:31,880 –> 00:55:33,880
Then formalize shared platform accountability.
1544
00:55:33,880 –> 00:55:36,600
If platform subscriptions aren’t budgeted and allocated,
1545
00:55:36,600 –> 00:55:38,280
you’re funding a black hole.
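The escalation workflow (alert, ticket, named owner, SLA, one of three outcomes) can be sketched as a small state model. Everything here is hypothetical: the ownership map, the two-day SLA, and the `Ticket` type stand in for whatever ticketing system the alert actually feeds. Two rules carry the weight: a subscription without a named owner is an error, not a mailbox, and an exception outcome cannot be recorded without an expiry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

class Outcome(Enum):
    JUSTIFY = "justify"
    REMEDIATE = "remediate"
    EXCEPTION = "exception"

@dataclass
class Ticket:
    subscription: str
    owner: str
    threshold_pct: int
    due: datetime
    outcome: Optional[Outcome] = None
    exception_expires: Optional[datetime] = None

    def close(self, outcome: Outcome, exception_expires: Optional[datetime] = None) -> None:
        if outcome is Outcome.EXCEPTION and exception_expires is None:
            raise ValueError("an exception must carry an expiry")
        self.outcome = outcome
        self.exception_expires = exception_expires

# Hypothetical ownership map and response SLA.
OWNERS = {"sub-payments-prod": "payments-lead@example.com"}
SLA = timedelta(days=2)

def on_budget_alert(subscription: str, threshold_pct: int, now: datetime) -> Ticket:
    """Turn a budget alert into a ticket owned by a named person."""
    owner = OWNERS.get(subscription)
    if owner is None:
        # A missing owner is the defect; don't fall back to a shared inbox.
        raise LookupError(f"{subscription} has no named owner")
    return Ticket(subscription, owner, threshold_pct, due=now + SLA)

ticket = on_budget_alert("sub-payments-prod", 70, now=datetime(2025, 3, 10, 9, 0))
print(ticket.owner, ticket.due)  # payments-lead@example.com 2025-03-12 09:00:00
ticket.close(Outcome.EXCEPTION, exception_expires=datetime(2025, 4, 30))
```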
1546
00:55:38,280 –> 00:55:40,360
Finally, introduce deployment stamps.
1547
00:55:40,360 –> 00:55:42,520
Guarded patterns that encode cost-bounded defaults
1548
00:55:42,520 –> 00:55:45,960
so teams stop reinventing expensive architectures accidentally.
1549
00:55:45,960 –> 00:55:48,440
The deliverables at day 90 are boring on purpose:
1550
00:55:48,440 –> 00:55:51,400
a reference architecture for subscriptions and environments,
1551
00:55:51,400 –> 00:55:52,840
a policy starter pack,
1552
00:55:52,840 –> 00:55:55,560
an accountability model that survives org charts
1553
00:55:55,560 –> 00:55:58,680
and an operating cadence that doesn’t depend on heroics.
1554
00:55:58,680 –> 00:56:00,760
And avoid the predictable anti-patterns:
1555
00:56:00,760 –> 00:56:03,000
“optimize first, tag later,”
1556
00:56:03,000 –> 00:56:06,040
dashboards as governance, and exceptions without expiry.
1557
00:56:06,040 –> 00:56:08,440
Those are just different ways of asking entropy to be polite.
1558
00:56:09,160 –> 00:56:11,400
Cost discipline is enforced autonomy.
1559
00:56:11,400 –> 00:56:15,000
Cloud becomes expensive when unbounded choice meets zero accountability
1560
00:56:15,000 –> 00:56:18,760
and Azure will happily bill you for every unknown decision you allowed.
1561
00:56:18,760 –> 00:56:20,280
If you want predictable spend,
1562
00:56:20,280 –> 00:56:21,960
stop treating FinOps like reporting
1563
00:56:21,960 –> 00:56:25,000
and start enforcing financial intent in the control plane:
1564
00:56:25,000 –> 00:56:27,240
subscription boundaries, policy constraints,
1565
00:56:27,240 –> 00:56:30,040
budget escalation and time boxed exceptions.
1566
00:56:30,040 –> 00:56:31,240
If you want the next layer,
1567
00:56:31,240 –> 00:56:32,760
how to design the authorization graph
1568
00:56:32,760 –> 00:56:35,160
so cost controls don’t erode over time,
1569
00:56:35,160 –> 00:56:36,360
watch the next episode,
1570
00:56:36,360 –> 00:56:38,360
and subscribe if you’re done paying for ambiguity.