
1
00:00:00,000 –> 00:00:07,680
Most organizations think "agents" means Copilot with extra steps: a nicer chat box, a few connectors, maybe some workflow buttons.
2
00:00:07,680 –> 00:00:15,840
They are wrong. Copilot speeds up a human. Autonomy replaces the human step entirely: planning, acting, verifying, and documenting without waiting for your approval.
3
00:00:15,840 –> 00:00:23,920
And that’s where the fear is rational. The moment a system can act, every missing policy, every sloppy permission, every undocumented exception turns into conditional chaos.
4
00:00:23,920 –> 00:00:27,920
The blast radius stops being theoretical, because the system actually has hands.
5
00:00:27,920 –> 00:00:35,600
So this episode isn't UI talk, it's system behavior. We're going to draw the line between suggestion and execution, define the contract that controls what an agent can touch,
6
00:00:35,600 –> 00:00:42,720
and then we'll come back to the uncomfortable parts: identity debt, authorization sprawl, and why governance always arrives late.
7
00:00:42,720 –> 00:00:45,680
Because that’s where autonomy breaks in real tenants.
8
00:00:45,680 –> 00:00:48,480
Let's define the through line: the autonomy boundary.
9
00:00:48,480 –> 00:00:54,240
If there’s one idea to hold on to for the full episode, it’s this. Autonomy fails at boundaries, not capabilities.
10
00:00:54,240 –> 00:00:59,280
Most people obsess over model quality. They ask whether the agent understands the task.
11
00:00:59,280 –> 00:01:02,880
That’s comforting, because it sounds like progress is a matter of smarter tokens.
12
00:01:02,880 –> 00:01:06,160
But in the Microsoft enterprise, the model is rarely the limiting factor.
13
00:01:06,160 –> 00:01:11,520
The limiting factor is the moment the system transitions from "I suggest" to "I execute."
14
00:01:11,520 –> 00:01:17,200
That transition is the autonomy boundary. The autonomy boundary is the explicit decision line between two modes of operation,
15
00:01:17,200 –> 00:01:18,720
recommendation and action.
16
00:01:18,720 –> 00:01:24,080
On one side, the agent produces text, options, summaries, and plans. On the other side, the agent changes the world.
17
00:01:24,080 –> 00:01:31,760
It makes Graph calls, edits configurations, closes tickets, revokes sessions, moves money, or sends communications that people will treat as official.
18
00:01:31,760 –> 00:01:37,040
That distinction matters because the boundary is where ownership moves. It's where audit expectations change,
19
00:01:37,040 –> 00:01:39,840
it’s where helpful assistant becomes operator.
20
00:01:39,840 –> 00:01:42,800
And enterprises don't struggle because the operator is incompetent.
21
00:01:42,800 –> 00:01:47,840
They struggle because nobody bothered to define, enforce, and continuously test the line where operation is allowed.
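In code, that line can be made explicit. This is a minimal sketch with invented names (`Mode`, `may_cross_boundary`), not any Microsoft API; the point is that the boundary is a testable decision, not an emergent one.

```python
from enum import Enum

class Mode(Enum):
    RECOMMEND = "recommend"  # produces text, options, summaries, plans
    EXECUTE = "execute"      # changes the world: Graph calls, config edits, tickets

def may_cross_boundary(mode: Mode, has_contract: bool, evidence: list) -> bool:
    """The autonomy boundary as an explicit decision: recommendations always
    pass; execution requires a defined contract and cited evidence."""
    if mode is Mode.RECOMMEND:
        return True
    return has_contract and len(evidence) > 0
```

Once the boundary is a function, you can unit-test it, which is what "continuously test the line" means in practice.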
22
00:01:47,840 –> 00:01:51,600
To make that line enforceable, you need a second artifact, the execution contract.
23
00:01:51,600 –> 00:01:54,160
The execution contract is not a vibe, it is not a prompt.
24
00:01:54,160 –> 00:01:58,560
It is a concrete definition of what the agent is allowed to do and under what constraints.
25
00:01:58,560 –> 00:02:02,160
Think of it as a compiled interface between business intent and tool execution.
26
00:02:02,160 –> 00:02:06,400
It specifies at minimum five things. First, allowed tools.
27
00:02:06,400 –> 00:02:08,720
Not "it can use Microsoft Graph."
28
00:02:08,720 –> 00:02:10,800
Which graph endpoints? Which actions?
29
00:02:10,800 –> 00:02:13,040
Read versus write is not a detail.
30
00:02:13,040 –> 00:02:15,520
It’s the difference between reporting and damage.
31
00:02:15,520 –> 00:02:20,400
Second, scopes and boundaries: tenant, subscription, resource group, site collection, mailbox,
32
00:02:20,400 –> 00:02:23,520
environment, whatever the containment unit is for the workload.
33
00:02:23,520 –> 00:02:26,640
The contract names the containment unit and makes it non-negotiable.
34
00:02:26,640 –> 00:02:28,560
Third, evidence requirements.
35
00:02:28,560 –> 00:02:30,960
What does the agent need to cite before it acts?
36
00:02:30,960 –> 00:02:35,680
A ticket ID, an alert correlation, a policy clause, an approval reference, a change record.
37
00:02:35,680 –> 00:02:39,200
Autonomy without evidence is just automated guessing with better grammar.
38
00:02:39,200 –> 00:02:43,680
Fourth, thresholds: confidence thresholds, anomaly thresholds, volume thresholds.
39
00:02:43,680 –> 00:02:47,840
The contract states what "safe enough" means and when the system must escalate.
40
00:02:47,840 –> 00:02:49,680
Fifth, escalation and kill behavior.
41
00:02:49,680 –> 00:02:50,560
Who does it wake up?
42
00:02:50,560 –> 00:02:51,280
Where does it post?
43
00:02:51,280 –> 00:02:52,400
What’s the rollback path?
44
00:02:52,400 –> 00:02:54,480
And this is the part everyone forgets.
45
00:02:54,480 –> 00:02:59,520
How do you stop it cleanly mid-flight without leaving half-applied changes across ten systems?
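The five elements above can be sketched as a data structure. The field names and the `allows` check are illustrative assumptions, not a product schema, but they show why a contract is concrete rather than a vibe: every clause is a checkable field.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecutionContract:
    # 1. Allowed tools: endpoint plus action; read versus write is explicit
    allowed_tools: frozenset        # e.g. {("graph/users", "read")}
    # 2. Scopes and boundaries: the named, non-negotiable containment unit
    containment_unit: str           # e.g. "resource-group:rg-prod-eu"
    # 3. Evidence requirements: what must be cited before acting
    required_evidence: frozenset    # e.g. {"ticket_id", "approval_ref"}
    # 4. Thresholds: what "safe enough" means
    min_confidence: float
    # 5. Escalation and kill behavior
    escalation_channel: str         # who it wakes up, where it posts
    rollback_path: str              # how to stop cleanly mid-flight

    def allows(self, tool, action, evidence, confidence):
        """An action runs only if every clause of the contract is satisfied."""
        return ((tool, action) in self.allowed_tools
                and self.required_evidence <= evidence.keys()
                and confidence >= self.min_confidence)
```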
46
00:02:59,520 –> 00:03:03,280
Now, here's where Altera becomes useful as a concept without becoming marketing.
47
00:03:03,280 –> 00:03:07,680
In Microsoft terms, Altera represents the mechanism that operationalizes the autonomy boundary
48
00:03:07,680 –> 00:03:09,040
through an execution contract.
49
00:03:09,040 –> 00:03:12,960
It's the layer that turns "we want autonomy" into enforceable constraints:
50
00:03:12,960 –> 00:03:16,640
tool routing, scoped identities, evidence capture, and predictable escalation.
51
00:03:16,640 –> 00:03:19,440
Not more chat, more closed-loop outcomes.
52
00:03:19,440 –> 00:03:22,880
And when the episode gets abstract, and it will, this is the anchor.
53
00:03:22,880 –> 00:03:24,000
Come back to two questions.
54
00:03:24,000 –> 00:03:27,760
Where is the autonomy boundary and what does the execution contract require
55
00:03:27,760 –> 00:03:29,040
before the agent crosses it?
56
00:03:29,040 –> 00:03:33,760
Because every enterprise failure story in this space reduces to those two questions being answered
57
00:03:33,760 –> 00:03:36,720
informally once by the wrong person and then never revisited.
58
00:03:36,720 –> 00:03:37,680
The contract drifts.
59
00:03:37,680 –> 00:03:40,400
Exceptions get added, someone needs an urgent workaround.
60
00:03:40,400 –> 00:03:42,640
Someone else copies that workaround into another environment.
61
00:03:42,640 –> 00:03:46,080
And slowly, your deterministic intent becomes probabilistic behavior.
62
00:03:46,080 –> 00:03:50,240
We’ll come back to this later when we talk about identity debt because identity debt is what happens when
63
00:03:50,240 –> 00:03:54,400
execution contracts get multiplied across dozens of non-human operators
64
00:03:54,400 –> 00:03:56,400
and nobody remembers why they exist.
65
00:03:56,400 –> 00:04:00,960
But before we get to the debt, you need to understand why Copilot can't cross this boundary by design
66
00:04:00,960 –> 00:04:04,640
and why that limitation is the feature that keeps most tenants intact.
67
00:04:04,640 –> 00:04:08,400
Copilot versus autonomous execution: the non-negotiable difference.
68
00:04:08,400 –> 00:04:12,560
If a human must approve the final action, you are still buying labor, just faster labor.
69
00:04:12,560 –> 00:04:14,880
That’s not a moral judgment, it’s a systems description.
70
00:04:14,880 –> 00:04:18,880
Copilot is an interface layer that compresses the cost of thinking, drafting,
71
00:04:18,880 –> 00:04:20,160
searching and summarizing.
72
00:04:20,160 –> 00:04:24,480
It moves work from slow human keystrokes to fast human supervision.
73
00:04:24,480 –> 00:04:27,680
The human still owns the last mile.
74
00:04:27,680 –> 00:04:31,360
The click that changes state in Azure, the approval that closes the ticket,
75
00:04:31,360 –> 00:04:35,360
the decision that revokes the session, the email that becomes an official instruction.
76
00:04:35,360 –> 00:04:39,360
And because the human owns the last mile, the blast radius stays human-shaped.
77
00:04:39,360 –> 00:04:41,920
It’s bounded by attention, fatigue and time.
78
00:04:41,920 –> 00:04:43,600
That’s not great, but it’s legible.
79
00:04:43,600 –> 00:04:46,000
You can point to a person and say this was your decision.
80
00:04:46,000 –> 00:04:49,200
Autonomous execution is different. It is not a better chat experience,
81
00:04:49,200 –> 00:04:51,360
it is not "Copilot, but with confidence."
82
00:04:51,360 –> 00:04:54,000
Autonomy is goal pursuit under constraints.
83
00:04:54,000 –> 00:04:57,520
The system receives a signal, forms a plan, uses tools,
84
00:04:57,520 –> 00:05:01,360
tracks state over time and keeps going until it meets an outcome condition
85
00:05:01,360 –> 00:05:02,800
or hits an escalation boundary.
86
00:05:02,800 –> 00:05:06,800
That means autonomy has three properties Copilot doesn't need. First,
87
00:05:06,800 –> 00:05:07,520
statefulness.
88
00:05:07,520 –> 00:05:10,080
It remembers what it tried, what failed, what changed,
89
00:05:10,080 –> 00:05:11,920
what evidence it gathered and what remains.
90
00:05:11,920 –> 00:05:15,120
Without state, you don’t have autonomy, you have looping suggestions.
91
00:05:15,120 –> 00:05:17,360
Second, tool ownership.
92
00:05:17,360 –> 00:05:21,280
Copilot can call tools, sure, but the human still authorizes them.
93
00:05:21,280 –> 00:05:24,160
Autonomy calls tools because tool calls are the work.
94
00:05:24,160 –> 00:05:27,120
Graph, Azure Resource Manager, ITSM APIs,
95
00:05:27,120 –> 00:05:28,720
Defender Action, Sentinel Playbooks,
96
00:05:28,720 –> 00:05:30,640
these aren’t integrations, they’re actuators.
97
00:05:31,360 –> 00:05:34,000
Third, multi-step execution with feedback.
98
00:05:34,000 –> 00:05:37,360
Autonomy doesn’t just perform an action, it verifies.
99
00:05:37,360 –> 00:05:39,360
It checks whether the service came back healthy,
100
00:05:39,360 –> 00:05:42,800
whether the config drift stopped, whether the incident scope shrank,
101
00:05:42,800 –> 00:05:46,080
whether the reconciliation balanced, whether the containment actually contained.
102
00:05:46,080 –> 00:05:47,440
If it didn’t, it iterates.
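Those three properties reduce to a loop. A toy sketch, with `act` and `verify` as hypothetical callables standing in for tool calls and outcome checks, not any vendor's runtime:

```python
def run_agent(goal, act, verify, max_attempts=3):
    """Multi-step execution with feedback: act, check the world, iterate,
    and escalate if the outcome condition is never met."""
    state = {"attempts": 0, "history": []}         # statefulness: what was tried
    while state["attempts"] < max_attempts:
        result = act(goal, state)                  # tool ownership: the call IS the work
        state["attempts"] += 1
        state["history"].append(result)
        if verify(result):                         # feedback: did it actually work?
            return {"status": "done", "state": state}
    return {"status": "escalate", "state": state}  # hit the escalation boundary
```

Strip out the state dictionary and you are back to looping suggestions; strip out `verify` and you are back to fire-and-forget automation.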
103
00:05:47,440 –> 00:05:50,080
Now here’s where most organizations lie to themselves.
104
00:05:50,080 –> 00:05:53,840
They say they want autonomy, but they implement assistance with a longer leash.
105
00:05:53,840 –> 00:05:56,720
The agent drafts the change request and the engineer clicks approve.
106
00:05:56,720 –> 00:05:59,280
That’s still labor, faster labor.
107
00:05:59,280 –> 00:06:02,720
It can be worth doing, but don’t pretend you crossed the autonomy boundary.
108
00:06:02,720 –> 00:06:05,520
You just built a better router for human attention.
109
00:06:05,520 –> 00:06:09,040
And the reason this distinction matters isn’t philosophical, it’s operational.
110
00:06:09,040 –> 00:06:14,640
With Copilot, you manage model risk: hallucinations, missing context, bad summaries.
111
00:06:14,640 –> 00:06:17,440
With autonomy, you manage execution risk.
112
00:06:17,440 –> 00:06:19,520
Actual changes in production systems.
113
00:06:19,520 –> 00:06:23,040
The failure mode moves from wrong words to wrong actions.
114
00:06:23,040 –> 00:06:27,360
And at that point, the only question that matters is who owns the blast radius.
115
00:06:27,360 –> 00:06:31,600
In a deterministic security model, you can explain outcomes by configuration.
116
00:06:31,600 –> 00:06:34,960
The policy allowed it, the role permitted it, the audit log shows it.
117
00:06:34,960 –> 00:06:38,880
In a probabilistic model, outcomes emerge from a sequence of conditional decisions.
118
00:06:38,880 –> 00:06:42,000
Confidence thresholds, tool routing, exception paths,
119
00:06:42,000 –> 00:06:47,280
retries, partial failures, and whatever helpful fallback someone enabled in a hurry.
120
00:06:47,280 –> 00:06:50,560
That probabilistic drift is not caused by the model being random.
121
00:06:50,560 –> 00:06:52,720
It’s caused by the enterprise being inconsistent.
122
00:06:52,720 –> 00:06:54,080
The model just exposes it.
123
00:06:54,080 –> 00:06:55,680
This is the part people miss.
124
00:06:55,680 –> 00:06:58,400
Autonomy doesn’t create new governance problems.
125
00:06:58,400 –> 00:07:01,680
It simply turns your existing governance gaps into runtime behavior.
126
00:07:01,680 –> 00:07:05,200
And that’s why identity and authorization become the real cost center.
127
00:07:05,200 –> 00:07:08,720
Not tokens, not model routing, not whether the agents sound smart.
128
00:07:08,720 –> 00:07:12,320
When you shift ownership of actions from humans to non-human operators,
129
00:07:12,320 –> 00:07:16,960
you are manufacturing new principals, new entitlements, new conditional access edges,
130
00:07:16,960 –> 00:07:19,600
new audit requirements, new incident pathways.
131
00:07:19,600 –> 00:07:23,360
We’ll come back to identity debt later because that’s where this breaks in real tenants.
132
00:07:23,360 –> 00:07:25,120
But for now, keep the frame simple.
133
00:07:25,120 –> 00:07:27,120
Copilot optimizes an individual.
134
00:07:27,120 –> 00:07:28,640
Autonomy optimizes a queue.
135
00:07:28,640 –> 00:07:31,120
Copilot makes one person faster at doing work.
136
00:07:31,120 –> 00:07:34,160
Autonomy makes work happen without that person being involved.
137
00:07:34,160 –> 00:07:38,080
Once you see that, Microsoft 365 stops looking like a suite of apps
138
00:07:38,080 –> 00:07:39,280
with a chat sidebar.
139
00:07:39,280 –> 00:07:43,200
It starts looking like an agent runtime with a massive tool surface area.
140
00:07:43,200 –> 00:07:46,800
Graph as the actuator bus, Teams as the coordination layer,
141
00:07:46,800 –> 00:07:49,280
Entra as the distributed decision engine,
142
00:07:49,280 –> 00:07:53,920
and Purview and Defender as the rails that decide whether the system stays deterministic
143
00:07:53,920 –> 00:07:56,240
or degrades into conditional chaos.
144
00:07:56,240 –> 00:07:58,960
And that's why "Copilot can't cross the boundary by design"
145
00:07:58,960 –> 00:08:00,080
isn’t a limitation.
146
00:08:00,080 –> 00:08:01,840
It’s a containment strategy.
147
00:08:01,840 –> 00:08:05,280
Microsoft's direction: the agentic web is already here.
148
00:08:05,280 –> 00:08:10,000
Most enterprises still talk about agents like it's a feature you can choose to enable later,

149
00:08:10,000 –> 00:08:13,600
once the pilot's finished and the governance deck gets its annual refresh.

150
00:08:13,600 –> 00:08:14,480
They are wrong.
151
00:08:14,480 –> 00:08:16,240
The direction is already set.
152
00:08:16,240 –> 00:08:20,160
Microsoft is normalizing delegation to non-human operators across the stack,
153
00:08:20,160 –> 00:08:23,840
not as a sidebar, but as the default unit of work. This is the uncomfortable truth.
154
00:08:23,840 –> 00:08:25,360
The agentic web is not coming.
155
00:08:25,360 –> 00:08:29,920
It is here and it’s being built out as a set of runtimes, protocols and identity surfaces
156
00:08:29,920 –> 00:08:32,080
that make autonomous execution feel ordinary.
157
00:08:32,080 –> 00:08:36,240
Look at the signals Microsoft chose to amplify at Build 2025.
158
00:08:36,240 –> 00:08:38,000
They didn't lead with "better chat," either.
159
00:08:38,000 –> 00:08:39,840
They led with task delegation.
160
00:08:39,840 –> 00:08:41,520
Assign an issue to an agent.
161
00:08:41,520 –> 00:08:42,880
Let it spin compute.
162
00:08:42,880 –> 00:08:44,320
Make changes in a branch.
163
00:08:44,320 –> 00:08:46,000
Produce session logs, open a PR,
164
00:08:46,000 –> 00:08:48,880
and then let other agents review before merge.
165
00:08:48,880 –> 00:08:51,200
That is an operational pattern, not a UX pattern.
166
00:08:51,200 –> 00:08:53,760
It’s also a rehearsal for enterprise autonomy.
167
00:08:53,760 –> 00:08:56,640
Because if you can delegate software work end to end,
168
00:08:56,640 –> 00:08:59,920
you can delegate everything else that behaves like software: incident response,
169
00:08:59,920 –> 00:09:03,920
onboarding, access reviews, finance close workflows, security triage.
170
00:09:03,920 –> 00:09:06,640
These are all systems of queues, evidence, and actions.
171
00:09:06,640 –> 00:09:10,240
The substrate is the same and Microsoft is making that substrate explicit.
172
00:09:10,240 –> 00:09:14,160
Azure AI Foundry is being positioned like an app server for stateful agents,
173
00:09:14,160 –> 00:09:19,600
multi-model, multi-agent orchestration, production observability, and managed execution.
174
00:09:19,600 –> 00:09:22,240
That matters because autonomy doesn’t scale on prompts.
175
00:09:22,240 –> 00:09:23,440
It scales on runtimes.
176
00:09:23,440 –> 00:09:26,880
Runtimes give you consistent tool invocation, consistent memory patterns,
177
00:09:26,880 –> 00:09:29,200
consistent telemetry and predictable failure modes.
178
00:09:29,200 –> 00:09:33,440
Without a runtime, an agent is just a demo that stops working the moment the network blips
179
00:09:33,440 –> 00:09:34,640
or the API throttles.
180
00:09:34,640 –> 00:09:38,640
Then there's Copilot Studio pushing multi-agent orchestration into low code,
181
00:09:38,640 –> 00:09:40,080
which is a polite way of saying,
182
00:09:40,080 –> 00:09:45,440
the people who least understand your control plane will soon be able to assemble autonomous workflows anyway.
183
00:09:45,440 –> 00:09:47,920
The platform doesn’t wait for architectural maturity.
184
00:09:47,920 –> 00:09:51,360
It routes around it. And Microsoft is also standardizing the wiring.
185
00:09:51,360 –> 00:09:53,920
MCP, the Model Context Protocol, is the clearest example.
186
00:09:53,920 –> 00:09:57,600
Microsoft is treating MCP like a universal adapter between agents and tools,
187
00:09:57,600 –> 00:09:59,520
and that sounds developer-friendly, and it is.
188
00:09:59,520 –> 00:10:03,600
But in enterprise terms, MCP is a force multiplier for both capability and risk,
189
00:10:03,600 –> 00:10:07,760
because it collapses the friction of adding "just one more tool" into an agent's reach.
190
00:10:08,400 –> 00:10:10,480
Here’s the failure mode you need to anchor on.
191
00:10:10,480 –> 00:10:14,480
An agent accidentally gains the ability to delete what it should only read,
192
00:10:14,480 –> 00:10:16,000
not because the model went rogue,
193
00:10:16,000 –> 00:10:18,480
because someone exposed a tool with a broad scope,
194
00:10:18,480 –> 00:10:21,280
or a server drifted, or a permission got inherited,
195
00:10:21,280 –> 00:10:23,840
or a temporary exception became permanent.
196
00:10:23,840 –> 00:10:25,600
MCP makes tool discovery easy.
197
00:10:25,600 –> 00:10:27,200
It does not make authorization safe.
198
00:10:27,200 –> 00:10:28,640
Discovery is not authorization.
199
00:10:28,640 –> 00:10:33,120
Microsoft is even pushing MCP down into windows itself with a registry concept,
200
00:10:33,120 –> 00:10:37,040
user consent prompts, and a model where local capabilities become callable tools.
201
00:10:37,040 –> 00:10:38,640
That’s not a niche developer story.
202
00:10:38,640 –> 00:10:42,160
It’s Microsoft telling you that tool access is the new perimeter,
203
00:10:42,160 –> 00:10:44,480
and the perimeter now spans cloud and endpoint.
204
00:10:44,480 –> 00:10:47,280
At the same time, they’re doing something more consequential.
205
00:10:47,280 –> 00:10:50,080
Normalizing non-human identities at scale.
206
00:10:50,080 –> 00:10:54,160
In the keynote language, agents get their own identity and show up in Entra.
207
00:10:54,160 –> 00:10:55,120
That’s not cosmetic.
208
00:10:55,120 –> 00:10:57,360
That’s the beginning of an enterprise identity graph,
209
00:10:57,360 –> 00:10:59,520
where humans are no longer the only operators.
210
00:10:59,520 –> 00:11:03,600
Your tenant becomes a mixed ecology of people and principals acting with intent
211
00:11:03,600 –> 00:11:04,800
that someone once defined.
212
00:11:04,800 –> 00:11:07,920
And when that becomes normal, governance stops being a policy document
213
00:11:07,920 –> 00:11:09,440
and becomes a compiler problem.
214
00:11:09,440 –> 00:11:14,000
You are compiling intent into enforceable constraints across thousands of decisions per day,
215
00:11:14,000 –> 00:11:18,240
made by systems that don’t get tired and don’t use judgment the way humans do.
216
00:11:18,240 –> 00:11:21,600
So if you’re waiting for a clean, agent rollout moment,
217
00:11:21,600 –> 00:11:23,040
you’re already behind.
218
00:11:23,040 –> 00:11:25,040
The ecosystem is converging.
219
00:11:25,040 –> 00:11:27,760
GitHub task delegation is cultural proof.
220
00:11:27,760 –> 00:11:29,280
Foundry is runtime.
221
00:11:29,280 –> 00:11:31,840
Copilot Studio as distribution channel.
222
00:11:31,840 –> 00:11:33,680
Teams as coordination layer.
223
00:11:33,680 –> 00:11:35,440
Graph as actuator bus.
224
00:11:35,440 –> 00:11:39,360
And Entra as the decision engine that either enforces your intent
225
00:11:39,360 –> 00:11:43,440
or quietly accumulates exceptions until you’re running conditional chaos.
226
00:11:43,440 –> 00:11:44,880
And that sets up the next question.
227
00:11:44,880 –> 00:11:48,320
If this is Microsoft’s direction, what exactly is Altera in Microsoft terms
228
00:11:48,320 –> 00:11:50,160
without marketing, without mysticism,
229
00:11:50,160 –> 00:11:53,840
and without pretending the platform will save you from your own design debt?
230
00:11:53,840 –> 00:11:56,480
What Altera represents in Microsoft terms.
231
00:11:56,480 –> 00:11:59,280
Most people hear "Altera" and immediately hunt for the UI.
232
00:11:59,280 –> 00:12:00,880
They want to know where the chat box lives,
233
00:12:00,880 –> 00:12:04,400
what the agent looks like in Teams, how it shows up in Copilot.
234
00:12:04,400 –> 00:12:05,440
That’s the wrong axis.
235
00:12:05,440 –> 00:12:07,680
The interface is the least interesting part of autonomy
236
00:12:07,680 –> 00:12:09,520
because the interface doesn’t carry the risk.
237
00:12:09,520 –> 00:12:10,560
The system does.
238
00:12:10,560 –> 00:12:13,440
In Microsoft terms, Altera represents an execution layer
239
00:12:13,440 –> 00:12:17,120
that operationalizes the autonomy boundary through an execution contract.
240
00:12:17,120 –> 00:12:19,440
It sits above tools and below business intent.
241
00:12:19,440 –> 00:12:20,960
It is the part that takes a goal,
242
00:12:20,960 –> 00:12:23,520
a set of allowed actions, a set of required evidence,
243
00:12:23,520 –> 00:12:26,240
and turns that into a controlled sequence of tool calls
244
00:12:26,240 –> 00:12:28,960
that either completes the work or escalates cleanly.
245
00:12:28,960 –> 00:12:31,520
That distinction matters because Microsoft already gives you
246
00:12:31,520 –> 00:12:33,200
most of the raw ingredients.
247
00:12:33,200 –> 00:12:36,640
Graph, Azure Resource Manager, Defender Actions,
248
00:12:36,640 –> 00:12:41,200
Sentinel Playbooks, Copilot Studio orchestration, Foundry runtimes,
249
00:12:41,200 –> 00:12:42,960
Teams as a coordination surface.
250
00:12:42,960 –> 00:12:44,480
The enterprise does not lack tools.
251
00:12:44,480 –> 00:12:48,000
It lacks a mechanism that forces those tools to behave like a system.
252
00:12:48,000 –> 00:12:50,640
So the clean way to describe Altera is not "another agent."
253
00:12:50,640 –> 00:12:54,640
It is the thing that makes an agent behave like an operator
254
00:12:54,640 –> 00:12:56,080
you’d be willing to put on call,
255
00:12:56,080 –> 00:13:00,000
constrained identity, explicit tool access, predictable escalation,
256
00:13:00,000 –> 00:13:01,440
and replayable evidence.
257
00:13:01,440 –> 00:13:03,440
And you can translate that into a mental model
258
00:13:03,440 –> 00:13:06,160
that enterprise people actually understand.
259
00:13:06,160 –> 00:13:08,640
Altera behaves like an authorization compiler.
260
00:13:08,640 –> 00:13:11,520
You provide intent: resolve these incident classes,
261
00:13:11,520 –> 00:13:14,720
reconcile these accounts, contain these alert types.
262
00:13:14,720 –> 00:13:17,680
You provide constraints: scopes, thresholds,
263
00:13:17,680 –> 00:13:20,000
evidence rules, and who owns escalation.
264
00:13:20,000 –> 00:13:22,880
And then that intent gets compiled into a runtime plan:
265
00:13:22,880 –> 00:13:24,800
which tools can be invoked in which order,
266
00:13:24,800 –> 00:13:28,160
with which checks, under which identity, producing which artifacts.
267
00:13:28,160 –> 00:13:29,200
It is not magic.
268
00:13:29,200 –> 00:13:32,000
It is constraint enforcement under load.
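A compiler, even a toy one, makes that concrete. All names here are hypothetical: intent plus constraints go in, and a plan of scoped, checked steps comes out, or compilation fails before anything runs.

```python
def compile_plan(intent_steps, constraints):
    """Compile intent into a runtime plan: every step must name a tool that
    the constraints explicitly scope, or compilation fails up front."""
    plan = []
    for step in intent_steps:
        scope = constraints.get(step["tool"])
        if scope is None:
            raise ValueError(f"no constraint covers tool {step['tool']!r}")
        plan.append({**step, "scope": scope, "checked": True})
    return plan
```

The design choice is the point: an uncovered tool is an error at compile time, not a surprise at 2 a.m.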
269
00:13:32,000 –> 00:13:34,080
Now, where does it sit in the Microsoft stack?
270
00:13:34,080 –> 00:13:37,440
It sits in the seam between the control plane and the execution plane.
271
00:13:37,440 –> 00:13:41,280
Entra, Purview, Defender, and your policy layer define what should be allowed.
272
00:13:41,280 –> 00:13:43,760
Graph, Azure, ITSM, ERP connectors,
273
00:13:43,760 –> 00:13:46,080
and endpoint actions are how work gets done.
274
00:13:46,080 –> 00:13:50,640
Altera lives between those worlds, translating "allowed" into "performed" without letting
275
00:13:50,640 –> 00:13:52,240
convenience rewrite intent.
276
00:13:52,240 –> 00:13:54,480
That’s why it can’t be just another prompt wrapper.
277
00:13:54,480 –> 00:13:56,240
Prompt wrappers make the demo feel good.
278
00:13:56,240 –> 00:13:57,600
They do not make the tenant safer.
279
00:13:57,600 –> 00:13:59,040
They don't solve identity sprawl.
280
00:13:59,040 –> 00:14:00,560
They don’t solve tool scope drift.
281
00:14:00,560 –> 00:14:02,400
They don’t produce evidence you can replay.
282
00:14:02,400 –> 00:14:06,560
They don’t give you a kill switch that actually stops a multi-system run halfway through.
283
00:14:06,560 –> 00:14:09,440
They just produce better sentences about what might happen.
284
00:14:09,440 –> 00:14:11,600
Altera, as we’re using it in this episode,
285
00:14:11,600 –> 00:14:14,240
represents the closed loop outcome approach.
286
00:14:14,240 –> 00:14:20,480
Detect, decide, act, verify, and document as a single executable run.
287
00:14:20,480 –> 00:14:22,480
The output is not "here's my reasoning."
288
00:14:22,480 –> 00:14:25,600
The output is: the incident is resolved, the reconciliation is balanced,
289
00:14:25,600 –> 00:14:28,800
the containment is applied, and here is the evidence trail that proves it.
290
00:14:28,800 –> 00:14:30,800
And this is the uncomfortable part for buyers.
291
00:14:30,800 –> 00:14:33,520
Altera’s value has almost nothing to do with model quality.
292
00:14:33,520 –> 00:14:36,480
Yes, you want decent reasoning, but model quality is not what determines
293
00:14:36,480 –> 00:14:38,320
whether autonomy works in production.
294
00:14:38,320 –> 00:14:39,920
Control plane maturity does.
295
00:14:39,920 –> 00:14:43,520
If your identity model is sloppy, autonomy accelerates the sloppiness.
296
00:14:43,520 –> 00:14:46,880
If your tool permissions are broad, autonomy turns them into a power tool.
297
00:14:46,880 –> 00:14:50,400
If your approvals are ambiguous, autonomy becomes a blame generator.
298
00:14:50,400 –> 00:14:52,160
If your audit surfaces are weak,
299
00:14:52,160 –> 00:14:54,560
autonomy becomes a storytelling engine.
300
00:14:54,560 –> 00:14:57,280
That's why the promise isn't "we'll make the model smarter."
301
00:14:57,280 –> 00:15:00,560
The promise is "we'll make the system more deterministic."
302
00:15:00,560 –> 00:15:03,520
And deterministic in this context doesn’t mean perfect.
303
00:15:03,520 –> 00:15:04,800
It means explainable.
304
00:15:04,800 –> 00:15:06,960
You can map an outcome back to a policy clause,
305
00:15:06,960 –> 00:15:09,920
an entitlement, an evidence artifact, and a bounded action set.
306
00:15:09,920 –> 00:15:11,520
So here’s what Altera is not.
307
00:15:11,520 –> 00:15:13,040
It is not a replacement for Entra.
308
00:15:13,040 –> 00:15:15,280
Entra is still the distributed decision engine.
309
00:15:15,280 –> 00:15:18,320
Altera is an execution layer that consumes those decisions.
310
00:15:18,320 –> 00:15:20,800
It is not a replacement for Purview or Defender.
311
00:15:20,800 –> 00:15:22,400
Those are your governance and threat rails.
312
00:15:22,400 –> 00:15:25,120
Altera produces the evidence and the action footprints
313
00:15:25,120 –> 00:15:26,720
those systems need to evaluate.
314
00:15:26,720 –> 00:15:28,960
It is not "Copilot, but autonomous."
315
00:15:28,960 –> 00:15:31,600
Copilot is a human productivity interface.
316
00:15:31,600 –> 00:15:33,680
Altera is an operator runtime pattern.
317
00:15:33,680 –> 00:15:36,880
And if that feels like semantics, good. Semantics are where audits live.
318
00:15:36,880 –> 00:15:40,080
Because once you accept that Altera is essentially a mechanism

319
00:15:40,080 –> 00:15:42,560
for enforcing execution contracts at scale,

320
00:15:42,560 –> 00:15:44,320
the next question becomes obvious.
321
00:15:44,320 –> 00:15:46,880
Why do enterprises still get stuck at pilot forever?
322
00:15:46,880 –> 00:15:48,560
Not because autonomy is impossible,
323
00:15:48,560 –> 00:15:50,560
but because the first time you try to productionize it,
324
00:15:50,560 –> 00:15:53,840
you discover the tenant has no enforceable autonomy boundary at all.
325
00:15:53,840 –> 00:15:56,320
Why enterprises stall at pilot forever.
326
00:15:56,320 –> 00:15:58,560
The pattern is boring because it repeats.
327
00:15:58,560 –> 00:16:00,080
A team runs a proof of concept.
328
00:16:00,080 –> 00:16:01,040
It looks great.
329
00:16:01,040 –> 00:16:03,520
The agent summarizes tickets, drafts responses,
330
00:16:03,520 –> 00:16:04,960
maybe even proposes a fix.
331
00:16:04,960 –> 00:16:07,360
Everyone nods. Then someone says the fatal sentence:
332
00:16:07,360 –> 00:16:09,200
“Okay, let’s roll this into production.”
333
00:16:09,200 –> 00:16:12,160
And production is where the tenant’s actual shape appears.
334
00:16:12,160 –> 00:16:14,560
Pilots succeed because they borrow certainty.
335
00:16:14,560 –> 00:16:16,240
They live in a narrow sandbox.
336
00:16:16,240 –> 00:16:19,920
A clean data set, a cooperative API, a friendly stakeholder,
337
00:16:19,920 –> 00:16:23,280
and permissions that quietly ignore how the enterprise actually works.
338
00:16:23,280 –> 00:16:25,680
Then the moment you connect the pilot to the real queues,
339
00:16:25,680 –> 00:16:28,720
real incidents, real approvals, real change control,
340
00:16:28,720 –> 00:16:31,600
the system hits friction you can’t prompt your way out of.
341
00:16:31,600 –> 00:16:33,280
The first friction point is permissions.
342
00:16:33,280 –> 00:16:36,000
In a pilot, people hand the agent broad access
343
00:16:36,000 –> 00:16:37,840
because they’re optimizing for speed.
344
00:16:37,840 –> 00:16:40,640
In production, broad access becomes a liability surface
345
00:16:40,640 –> 00:16:43,040
and suddenly everyone remembers segregation of duties.
346
00:16:43,040 –> 00:16:45,200
The same person who loved the demo now asks,
347
00:16:45,200 –> 00:16:47,040
“Wait, what identity is that running as?”
348
00:16:47,040 –> 00:16:48,880
And if you can’t answer in one sentence,
349
00:16:48,880 –> 00:16:50,480
what principal, what roles, what scopes,
350
00:16:50,480 –> 00:16:52,000
what conditional access constraints,
351
00:16:52,000 –> 00:16:53,040
you don’t have autonomy,
352
00:16:53,040 –> 00:16:55,200
you have a science project with admin rights.
353
00:16:55,200 –> 00:16:57,040
The second friction point is auditability.
354
00:16:57,040 –> 00:16:58,960
The demo says, “Here’s what I did.”
355
00:16:58,960 –> 00:17:02,160
The auditor says, “Prove it, replay it, show me the evidence chain.”
356
00:17:02,160 –> 00:17:05,360
Autonomy only counts as enterprise automation
357
00:17:05,360 –> 00:17:08,800
when it produces artifacts that survive hostile review.
358
00:17:08,800 –> 00:17:11,280
Timestamps, inputs, tool calls, approvals,
359
00:17:11,280 –> 00:17:13,280
and outcomes tied to policy.
360
00:17:13,280 –> 00:17:16,080
If your agent can’t produce evidence, it can’t be trusted.
361
00:17:16,080 –> 00:17:18,240
It can only be tolerated temporarily
362
00:17:18,240 –> 00:17:19,840
by people who haven’t been burned yet.
363
00:17:19,840 –> 00:17:22,480
The third friction point is incident ownership.
364
00:17:22,480 –> 00:17:24,080
Pilots have a hero, a champion,
365
00:17:24,080 –> 00:17:26,320
someone who owns the agent because they built it.
366
00:17:26,320 –> 00:17:28,080
In production, ownership means a pager:
367
00:17:28,080 –> 00:17:30,720
who gets woken up when the agent loops at 2am,
368
00:17:30,720 –> 00:17:33,280
who approves the rollback when it partially applied changes
369
00:17:33,280 –> 00:17:35,840
across Azure, Graph, and the ITSM system,
370
00:17:35,840 –> 00:17:38,240
who signs off when the agent’s action caused user impact
371
00:17:38,240 –> 00:17:40,560
but the model’s explanation sounds plausible.
372
00:17:40,560 –> 00:17:43,200
Enterprises don’t stall because they hate autonomy.
373
00:17:43,200 –> 00:17:44,960
They stall because nobody wants to inherit
374
00:17:44,960 –> 00:17:47,600
a new failure mode without a clear escalation contract.
375
00:17:47,600 –> 00:17:51,040
Then comes change control, the quiet killer of agent projects.
376
00:17:51,040 –> 00:17:53,920
Autonomy requires updating tools, policies, thresholds,
377
00:17:53,920 –> 00:17:56,080
and runbooks as the environment changes.
378
00:17:56,080 –> 00:17:58,960
But enterprises treat policy like a museum artifact,
379
00:17:58,960 –> 00:18:00,640
written once, rarely revisited,
380
00:18:00,640 –> 00:18:02,640
and only updated after an incident.
381
00:18:02,640 –> 00:18:05,360
So the agent drifts out of alignment with reality.
382
00:18:05,360 –> 00:18:08,800
APIs change, roles evolve, a new SaaS tool appears.
383
00:18:08,800 –> 00:18:11,360
An exception gets added just for this quarter.
384
00:18:11,360 –> 00:18:14,160
The pilot keeps running with assumptions that no longer hold.
385
00:18:14,160 –> 00:18:15,920
And when the first production incident happens,
386
00:18:15,920 –> 00:18:18,160
the organization responds predictably.
387
00:18:18,160 –> 00:18:19,280
Pause for governance.
388
00:18:19,280 –> 00:18:20,880
That phrase sounds responsible.
389
00:18:20,880 –> 00:18:22,480
It is usually a confession.
390
00:18:22,480 –> 00:18:25,680
It means the organization didn’t have an enforceable autonomy boundary.
391
00:18:25,680 –> 00:18:27,840
They had enthusiasm in a slide deck.
392
00:18:27,840 –> 00:18:30,480
Governance arrives late because it’s uncomfortable work.
393
00:18:30,480 –> 00:18:33,520
It forces you to make decisions about what the agent is allowed to do,
394
00:18:33,520 –> 00:18:37,280
who owns the consequences and what evidence is required before action.
395
00:18:37,280 –> 00:18:39,440
Most organizations avoid those decisions
396
00:18:39,440 –> 00:18:41,600
by keeping the agent in suggestion mode.
397
00:18:41,600 –> 00:18:44,240
Because suggestion mode keeps responsibility human-shaped.
398
00:18:44,240 –> 00:18:45,920
This is also where shadow AI shows up.
399
00:18:45,920 –> 00:18:48,000
Business units don’t wait for central IT.
400
00:18:48,000 –> 00:18:49,440
They build agents anyway.
401
00:18:49,440 –> 00:18:52,480
Co-pilot Studio here, a connector there, an MCP server
402
00:18:52,480 –> 00:18:53,520
someone found on GitHub,
403
00:18:53,520 –> 00:18:56,560
and suddenly actions happen outside the control plane’s visibility.
404
00:18:56,560 –> 00:18:58,000
Not because people are malicious,
405
00:18:58,000 –> 00:18:59,440
because queues never shrink,
406
00:18:59,440 –> 00:19:01,280
and someone always wants relief.
407
00:19:01,280 –> 00:19:03,280
The platform routes around your governance
408
00:19:03,280 –> 00:19:05,680
because the business routes around your delays.
409
00:19:05,680 –> 00:19:08,160
So the root cause isn’t that the enterprise is cautious.
410
00:19:08,160 –> 00:19:11,600
The root cause is that autonomy forces the tenant to become honest.
411
00:19:11,600 –> 00:19:13,280
It forces you to formalize intent.
412
00:19:13,280 –> 00:19:15,520
It forces you to define the execution contract.
413
00:19:15,520 –> 00:19:19,120
It forces you to treat exceptions as entropy generators, not as favors.
414
00:19:19,120 –> 00:19:21,040
And it forces you to align the control plane.
415
00:19:21,040 –> 00:19:23,680
Identity, policy, evidence, with the execution plane,
416
00:19:23,680 –> 00:19:25,200
tools, actions, outcomes.
417
00:19:25,200 –> 00:19:27,600
Pilots avoid that alignment by staying small.
418
00:19:27,600 –> 00:19:29,600
Production demands it immediately.
419
00:19:29,600 –> 00:19:32,640
And that’s why pilot forever is not a maturity stage.
420
00:19:32,640 –> 00:19:33,920
It’s a stable equilibrium.
421
00:19:33,920 –> 00:19:35,600
Assistance feels useful and safe.
422
00:19:35,600 –> 00:19:39,280
Autonomy feels risky and political, therefore autonomy gets deferred
423
00:19:39,280 –> 00:19:40,240
until the next quarter.
424
00:19:40,240 –> 00:19:41,440
The quarter never ends.
425
00:19:41,440 –> 00:19:43,440
So the question isn’t how to do a better pilot.
426
00:19:43,440 –> 00:19:45,680
The question is how to design autonomy as a system,
427
00:19:45,680 –> 00:19:47,360
not a feature, because the moment you do,
428
00:19:47,360 –> 00:19:50,240
the stall pattern becomes predictable and solvable.
429
00:19:50,240 –> 00:19:52,800
The autonomy stack: event, reasoning,
430
00:19:52,800 –> 00:19:55,280
orchestration, action, evidence.
431
00:19:55,280 –> 00:19:57,600
Once you stop treating autonomy like a feature,
432
00:19:57,600 –> 00:19:58,720
you need a stack.
433
00:19:58,720 –> 00:20:01,120
Not a vendor diagram, a behavioral stack.
434
00:20:01,120 –> 00:20:02,400
How work enters the system?
435
00:20:02,400 –> 00:20:04,320
How decisions get made, how actions happen,
436
00:20:04,320 –> 00:20:06,560
and how you prove the system didn’t just improvise.
437
00:20:06,560 –> 00:20:09,760
This is the autonomy stack that actually survives production,
438
00:20:09,760 –> 00:20:12,560
event, reasoning, orchestration, action, evidence.
439
00:20:12,560 –> 00:20:15,200
Start with event. Autonomy doesn’t begin with a prompt.
440
00:20:15,200 –> 00:20:18,000
It begins with a signal that arrives, whether you’re ready or not.
441
00:20:18,000 –> 00:20:20,640
An alert fires, a ticket opens, a mailbox rule triggers,
442
00:20:20,640 –> 00:20:22,000
a scheduled job hits.
443
00:20:22,000 –> 00:20:23,760
A user reports something in Teams.
444
00:20:23,760 –> 00:20:25,600
A threshold crosses its limit.
445
00:20:25,600 –> 00:20:29,600
The key point is that events are external reality pushing into your system.
446
00:20:29,600 –> 00:20:31,760
And this is where people quietly cheat.
447
00:20:31,760 –> 00:20:35,120
They build an autonomous agent that only runs when a human asks it to.
448
00:20:35,120 –> 00:20:36,240
That’s still assistance.
449
00:20:36,240 –> 00:20:38,640
Autonomy starts when the system can wake itself up.
450
00:20:38,640 –> 00:20:41,840
But event ingestion has an architectural requirement: normalization.
451
00:20:41,840 –> 00:20:45,280
If your events arrive in 10 formats with 10 levels of fidelity,
452
00:20:45,280 –> 00:20:47,280
you don’t have an autonomy pipeline.
453
00:20:47,280 –> 00:20:48,560
You have a noisy inbox.
454
00:20:48,560 –> 00:20:52,000
So the first job is to translate raw signals into a consistent envelope.
455
00:20:52,000 –> 00:20:52,720
What happened?
456
00:20:52,720 –> 00:20:53,440
Where? To what?
457
00:20:53,440 –> 00:20:55,440
And what evidence exists that it actually happened?
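That envelope can be sketched as a thin normalization layer. A minimal illustration in Python; the source names, field mappings, and envelope fields are assumptions for the sketch, not any monitoring product’s schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class EventEnvelope:
    """Consistent envelope: what happened, where, to what, and the evidence."""
    what: str         # normalized event class, e.g. "service.unhealthy"
    where: str        # system of origin
    target: str       # affected resource
    evidence: dict    # raw payload proving the event occurred
    received_at: str  # ingestion timestamp (UTC, ISO 8601)

def normalize(raw: dict, source: str) -> EventEnvelope:
    # Hypothetical field mappings; each real signal source needs its own adapter.
    mapping = {
        "monitor": ("alertType", "resourceId"),
        "itsm":    ("category",  "ciName"),
    }
    what_key, target_key = mapping[source]
    return EventEnvelope(
        what=raw[what_key],
        where=source,
        target=raw[target_key],
        evidence=raw,
        received_at=datetime.now(timezone.utc).isoformat(),
    )

env = normalize({"alertType": "service.unhealthy", "resourceId": "vm-42"}, "monitor")
```

Whatever arrives downstream, reasoning only ever sees the envelope, never ten raw formats.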
458
00:20:55,440 –> 00:20:56,640
Now reasoning.
459
00:20:56,640 –> 00:20:58,960
Reasoning is not the agent thinking.
460
00:20:58,960 –> 00:21:01,200
Reasoning is the system converting a signal
461
00:21:01,200 –> 00:21:03,680
into an intentful plan under constraints.
462
00:21:03,680 –> 00:21:07,200
That typically means classify the event, extract the goal,
463
00:21:07,200 –> 00:21:10,160
decompose into steps and decide whether action is allowed.
464
00:21:10,160 –> 00:21:11,760
And here’s the uncomfortable truth.
465
00:21:11,760 –> 00:21:14,240
Reasoning needs explicit stop conditions.
466
00:21:14,240 –> 00:21:15,760
Humans stop because they get tired.
467
00:21:15,760 –> 00:21:18,400
Agents stop only when you define “done” or “not safe.”
468
00:21:18,400 –> 00:21:20,320
And without that, they don’t become autonomous.
469
00:21:20,320 –> 00:21:21,360
They become persistent.
470
00:21:21,360 –> 00:21:22,960
So you need confidence thresholds,
471
00:21:22,960 –> 00:21:25,920
anomaly detection, and policy checks as part of reasoning.
472
00:21:25,920 –> 00:21:26,960
Not as an afterthought.
473
00:21:26,960 –> 00:21:30,400
The system has to decide upfront whether it should act, ask, or escalate.
474
00:21:30,400 –> 00:21:32,800
That decision is the autonomy boundary in motion.
475
00:21:32,800 –> 00:21:34,720
Suggestion versus execution.
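That act-ask-escalate decision can be made explicit in a few lines. A minimal sketch, assuming a hypothetical allowed-action set and a confidence floor; the names and numbers are illustrative, not a product policy.

```python
# Illustrative policy surface; these names are assumptions, not product APIs.
ALLOWED_ACTIONS = {"restart_service", "close_stale_ticket"}
CONFIDENCE_FLOOR = 0.85

def decide(action: str, confidence: float, anomaly: bool) -> str:
    """Return 'act', 'ask', or 'escalate' -- the autonomy boundary in motion."""
    if action not in ALLOWED_ACTIONS or anomaly:
        return "escalate"   # outside policy or an abnormal signal: a human owns it
    if confidence < CONFIDENCE_FLOOR:
        return "ask"        # known action, weak evidence: suggest, don't execute
    return "act"            # inside the contract: execute autonomously
```

The point is that the decision is computed before any tool is touched, not rationalized afterward.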
476
00:21:34,720 –> 00:21:38,640
Then orchestration. Orchestration is where most people get seduced by complexity.
477
00:21:38,640 –> 00:21:42,080
Multi-agent this, planner that, tool router, memory store, fine.
478
00:21:42,080 –> 00:21:44,560
But the practical purpose of orchestration is simple.
479
00:21:44,560 –> 00:21:47,680
Route the work to the right capability in the right order
480
00:21:47,680 –> 00:21:49,920
with fallbacks that don’t become loopholes.
481
00:21:49,920 –> 00:21:51,920
Orchestration chooses tools and specialists
482
00:21:51,920 –> 00:21:53,680
the way a human operator does.
483
00:21:53,680 –> 00:21:54,880
I need more context.
484
00:21:54,880 –> 00:21:56,560
Go query the ticket system.
485
00:21:56,560 –> 00:21:57,840
I need to validate scope.
486
00:21:57,840 –> 00:21:59,360
Go check identity risk.
487
00:21:59,360 –> 00:22:01,840
I need to apply a change, use this runbook.
488
00:22:01,840 –> 00:22:04,400
The difference is that orchestration has to be deterministic
489
00:22:04,400 –> 00:22:06,560
about permissions and evidence collection.
490
00:22:06,560 –> 00:22:10,240
Otherwise, your fallback path becomes the real path because it’s easier.
491
00:22:10,240 –> 00:22:13,280
And orchestration must handle failure as a first class input.
492
00:22:13,280 –> 00:22:14,160
APIs throttle.
493
00:22:14,160 –> 00:22:15,440
Graph returns partial data.
494
00:22:15,440 –> 00:22:16,640
A device goes offline.
495
00:22:16,640 –> 00:22:17,920
A resource group locks.
497
00:22:17,920 –> 00:22:19,040
A connector breaks.
498
00:22:19,040 –> 00:22:20,480
The agent doesn’t get to pretend.
499
00:22:20,480 –> 00:22:22,400
Orchestration has to implement retries,
500
00:22:22,400 –> 00:22:24,560
backoff, alternate paths, and escalation rules
501
00:22:24,560 –> 00:22:26,880
that don’t spam your on-call rotation into quitting.
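Retries with exponential backoff and a single escalation can be sketched like this; `run_with_backoff` and `flaky` are hypothetical names standing in for a generic throttled tool call.

```python
import time

def run_with_backoff(step, max_attempts=3, base_delay=0.01, escalate=print):
    """Retry a flaky tool call with exponential backoff; escalate once, not per failure."""
    for attempt in range(max_attempts):
        try:
            return step()
        except Exception as exc:
            if attempt == max_attempts - 1:
                escalate(f"escalating after {max_attempts} attempts: {exc}")
                raise
            time.sleep(base_delay * 2 ** attempt)  # back off before retrying

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("throttled")  # e.g. an API rate limit
    return "ok"

result = run_with_backoff(flaky)
```

One escalation at the cap, instead of one page per failed attempt, is what keeps the on-call rotation sane.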
502
00:22:26,880 –> 00:22:27,920
Next is action.
503
00:22:28,800 –> 00:22:31,520
Action is the part everyone demos because it looks impressive.
504
00:22:31,520 –> 00:22:34,720
But action is where you either enforce the execution contract
505
00:22:34,720 –> 00:22:36,080
or you lie about having one.
506
00:22:36,080 –> 00:22:38,560
Actions are concrete tool calls,
507
00:22:38,560 –> 00:22:41,280
patching a service, updating a configuration,
508
00:22:41,280 –> 00:22:42,240
revoking a session,
509
00:22:42,240 –> 00:22:43,920
disabling a risky app consent,
510
00:22:43,920 –> 00:22:45,600
posting to a Teams channel,
511
00:22:45,600 –> 00:22:48,000
creating a change record, closing a ticket.
512
00:22:48,000 –> 00:22:50,480
And each action must run under a scoped identity
513
00:22:50,480 –> 00:22:51,920
with bounded permissions.
514
00:22:51,920 –> 00:22:55,040
This is where read versus write stops being theory.
515
00:22:55,040 –> 00:22:56,960
If the agent can write to the wrong plane,
516
00:22:56,960 –> 00:22:59,360
you’ve built a worm with good documentation.
517
00:22:59,360 –> 00:23:00,720
So action needs guardrails,
518
00:23:00,720 –> 00:23:02,960
quotas, rate limits, scope boundaries,
519
00:23:02,960 –> 00:23:05,600
and a kill switch that actually halts an in-flight run.
520
00:23:05,600 –> 00:23:07,520
An action must include verification.
521
00:23:07,520 –> 00:23:08,560
Not “I executed.”
522
00:23:08,560 –> 00:23:10,480
Verified outcomes:
523
00:23:10,480 –> 00:23:12,480
Service healthy, incident stopped paging,
524
00:23:12,480 –> 00:23:14,720
reconciliation balanced, containment took effect.
525
00:23:14,720 –> 00:23:16,800
If you don’t verify, you didn’t automate a result.
526
00:23:16,800 –> 00:23:17,840
You automated a guess.
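Act-then-verify can be sketched in a few lines. The actuator `restart_service` and the telemetry probe `is_healthy` are hypothetical stand-ins for real tool calls.

```python
# Toy state standing in for a real environment.
STATE = {"web": "degraded"}

def restart_service(name):   # hypothetical actuator
    STATE[name] = "healthy"

def is_healthy(name):        # hypothetical telemetry probe
    return STATE.get(name) == "healthy"

def remediate(name):
    """Act, then verify the outcome -- otherwise you automated a guess."""
    restart_service(name)
    if not is_healthy(name):
        raise RuntimeError(f"remediation of {name} did not verify")
    return {"action": "restart", "target": name, "verified": True}

record = remediate("web")
```

The verification check is part of the action, not a separate nice-to-have step.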
527
00:23:17,840 –> 00:23:19,520
Finally, evidence. Evidence is the part
528
00:23:19,520 –> 00:23:21,840
that makes autonomy enterprise grade.
529
00:23:21,840 –> 00:23:23,520
Without it, you get “agent said so,”
530
00:23:23,520 –> 00:23:26,400
which is just a new flavor of unaccountable change.
531
00:23:26,400 –> 00:23:28,400
Evidence means a replayable run.
532
00:23:28,400 –> 00:23:30,720
Inputs captured, the event payload stored,
533
00:23:30,720 –> 00:23:32,320
the reasoning decision recorded,
534
00:23:32,320 –> 00:23:34,480
the tool calls logged with parameters,
535
00:23:34,480 –> 00:23:36,800
the identities used, the approvals referenced,
536
00:23:36,800 –> 00:23:38,160
the outputs produced,
537
00:23:38,160 –> 00:23:40,800
and the verification checks that confirm success.
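One way to picture that replayable record is a serialized run object; the field names here are illustrative assumptions, not an audit standard.

```python
import json

def make_run_record(event, decision, tool_calls, identity, approvals, outputs, checks):
    """A replayable evidence record: everything an auditor needs to reconstruct the run."""
    record = {
        "event": event, "decision": decision, "tool_calls": tool_calls,
        "identity": identity, "approvals": approvals,
        "outputs": outputs, "verification": checks,
    }
    return json.dumps(record, sort_keys=True)  # serialized so it can be stored immutably

blob = make_run_record(
    event={"what": "service.unhealthy", "target": "vm-42"},
    decision="act",
    tool_calls=[{"tool": "restart_service", "params": {"name": "vm-42"}}],
    identity="svc-remediation-agent",   # hypothetical non-human principal
    approvals=["policy-7.2"],
    outputs={"status": "restarted"},
    checks={"healthy": True},
)
replayed = json.loads(blob)   # the whole run is reconstructable from the stored blob
```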
538
00:23:40,800 –> 00:23:42,080
This is not for curiosity.
539
00:23:42,080 –> 00:23:44,880
It’s for incident reviews, audits, and blame assignment.
540
00:23:44,880 –> 00:23:46,800
Because enterprises will do all three.
541
00:23:46,800 –> 00:23:49,040
Evidence is also how you detect drift.
542
00:23:49,040 –> 00:23:51,040
When the same event class suddenly produces
543
00:23:51,040 –> 00:23:52,400
different action paths, you know
544
00:23:52,400 –> 00:23:54,240
your contracts or entitlements eroded.
545
00:23:54,240 –> 00:23:57,120
So when someone asks what is autonomy architecturally,
546
00:23:57,120 –> 00:23:59,040
the answer isn’t an LLM with tools.
547
00:23:59,040 –> 00:24:01,520
It’s a closed loop system that ingests events,
548
00:24:01,520 –> 00:24:03,920
reasons under policy, orchestrates safely,
549
00:24:03,920 –> 00:24:06,240
acts with bounded identity and outputs evidence
550
00:24:06,240 –> 00:24:07,680
you can replay under hostility.
551
00:24:07,680 –> 00:24:11,040
Control plane versus execution plane,
552
00:24:11,040 –> 00:24:12,720
where governance actually lives.
553
00:24:12,720 –> 00:24:15,760
Now the stack is useful, but it hides the real fight.
554
00:24:15,760 –> 00:24:18,000
Governance doesn’t live in the agent.
555
00:24:18,000 –> 00:24:20,320
It lives in how you separate the control plane
556
00:24:20,320 –> 00:24:21,760
from the execution plane,
557
00:24:21,760 –> 00:24:23,840
and whether you keep that separation intact
558
00:24:23,840 –> 00:24:25,520
when someone asks for speed.
559
00:24:25,520 –> 00:24:28,320
The control plane is where you encode intent as constraints.
560
00:24:28,320 –> 00:24:30,400
It is identities, entitlements, policies,
561
00:24:30,400 –> 00:24:32,400
approvals, tool allow lists, evidence rules,
562
00:24:32,400 –> 00:24:34,080
and the ability to revoke any of those
563
00:24:34,080 –> 00:24:36,320
without negotiating with a dozen app teams.
564
00:24:36,320 –> 00:24:38,640
If you can’t change the rules without redeploying the agent,
565
00:24:38,640 –> 00:24:40,160
you don’t have a control plane.
566
00:24:40,160 –> 00:24:41,360
You have a fragile app.
567
00:24:41,360 –> 00:24:44,560
In Microsoft terms, the control plane is anchored in Entra,
568
00:24:44,560 –> 00:24:46,640
your policy layer, and your governance systems.
569
00:24:46,640 –> 00:24:48,880
The place where you decide what principals exist,
570
00:24:48,880 –> 00:24:50,560
what they can do under what conditions
571
00:24:50,560 –> 00:24:52,640
and what must be recorded when they do it.
572
00:24:52,640 –> 00:24:55,200
It’s also where you decide what “allowed” even means
573
00:24:55,200 –> 00:24:56,640
when the actor isn’t a person.
574
00:24:56,640 –> 00:24:58,640
The execution plane is where work happens.
575
00:24:58,640 –> 00:25:00,720
It is the runtime making Graph calls,
576
00:25:00,720 –> 00:25:03,520
running runbooks, invoking sentinel playbooks,
577
00:25:03,520 –> 00:25:06,480
updating tickets, pushing messages into Teams,
578
00:25:06,480 –> 00:25:09,520
touching SharePoint, writing back into the ERP
579
00:25:09,520 –> 00:25:12,960
or performing any other actuator move that changes state.
580
00:25:12,960 –> 00:25:15,760
Execution is the part that makes demos look impressive
581
00:25:15,760 –> 00:25:17,680
because it creates visible outcomes.
582
00:25:17,680 –> 00:25:20,480
It is also the part that turns small mistakes into incidents.
583
00:25:20,480 –> 00:25:21,520
That distinction matters
584
00:25:21,520 –> 00:25:23,440
because enterprises routinely invert them.
585
00:25:23,440 –> 00:25:24,720
They start with execution.
586
00:25:24,720 –> 00:25:26,080
We connected it to graph.
587
00:25:26,080 –> 00:25:27,360
We wired up the connector.
588
00:25:27,360 –> 00:25:28,960
It can restart the service.
589
00:25:28,960 –> 00:25:30,720
And then later they bolt on governance
590
00:25:30,720 –> 00:25:32,160
a log file, a few approvals,
591
00:25:32,160 –> 00:25:34,080
a policy doc that nobody reads.
592
00:25:34,080 –> 00:25:36,160
Over time, convenience overrides intent.
593
00:25:36,160 –> 00:25:38,480
The execution plane becomes the real control plane
594
00:25:38,480 –> 00:25:40,000
because whoever owns the connector
595
00:25:40,000 –> 00:25:41,920
effectively owns the blast radius.
596
00:25:41,920 –> 00:25:43,440
This is the uncomfortable truth.
597
00:25:43,440 –> 00:25:46,160
Autonomy systems drift toward the fastest path
598
00:25:46,160 –> 00:25:48,400
unless you enforce separation by design.
599
00:25:48,400 –> 00:25:51,200
So what does separation look like in practice?
600
00:25:51,200 –> 00:25:53,200
First, control plane owns identity.
601
00:25:53,200 –> 00:25:55,760
Not the agent developer, not the workflow designer,
602
00:25:55,760 –> 00:25:58,320
not whoever has contributor in the subscription.
603
00:25:58,320 –> 00:26:00,240
The agent runs as a non-human principal
604
00:26:00,240 –> 00:26:02,240
with explicitly bounded roles.
605
00:26:02,240 –> 00:26:04,960
And those roles live in the same life cycle as human access,
606
00:26:04,960 –> 00:26:07,200
review, rotation and revocation.
607
00:26:07,200 –> 00:26:09,280
If a developer can quietly widen permissions
608
00:26:09,280 –> 00:26:11,920
to make the demo work, the system will inevitably
609
00:26:11,920 –> 00:26:13,360
ship with those permissions.
610
00:26:13,360 –> 00:26:16,080
Second, control plane owns tool availability.
611
00:26:16,080 –> 00:26:18,160
Not “the agent can use tools.”
612
00:26:18,160 –> 00:26:19,520
Which tools exist at all?
613
00:26:19,520 –> 00:26:21,120
Which versions, and which ones are allowed
614
00:20:21,120 –> 00:20:21,840
in production?
615
00:26:21,840 –> 00:26:23,680
This is where MCP becomes dangerous
616
00:26:23,680 –> 00:26:25,600
if you don’t treat it like a perimeter.
617
00:26:25,600 –> 00:26:27,200
A tool registry is discovery.
618
00:26:27,200 –> 00:26:28,480
An allow-list is governance.
619
00:26:28,480 –> 00:26:30,880
If you don’t have both, you will wake up to tool sprawl
620
00:26:30,880 –> 00:26:31,920
and entitlement sprawl,
621
00:26:31,920 –> 00:26:34,160
and you won’t remember which one caused the incident.
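The registry-versus-allow-list distinction can be sketched as a fail-closed lookup; the tool names and versions below are invented for illustration.

```python
# Discovery vs governance: the registry says what exists,
# the allow-list says what production may actually use.
REGISTRY = {
    "restart_service": "1.2.0",
    "close_ticket":    "2.0.1",
    "github_mcp_tool": "0.0.3",   # discovered, but never approved
}
ALLOW_LIST = {("restart_service", "1.2.0"), ("close_ticket", "2.0.1")}

def resolve_tool(name: str):
    """Fail closed: a tool must be both registered and allow-listed to run."""
    version = REGISTRY.get(name)
    if version is None or (name, version) not in ALLOW_LIST:
        raise PermissionError(f"tool {name!r} is not approved for production")
    return name, version
```

The key design choice is that denial is the default path; nothing runs just because someone registered it.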
622
00:26:34,160 –> 00:26:36,560
Third, control plane owns evidence requirements.
623
00:26:36,560 –> 00:26:39,120
You don’t let execution decide what counts as proof.
624
00:26:39,120 –> 00:26:41,840
The policy says, before you cross the autonomy boundary,
625
00:26:41,840 –> 00:26:44,560
you must have a ticket reference, correlated telemetry,
626
00:26:44,560 –> 00:26:46,640
and a policy clause that permits the action.
627
00:26:46,640 –> 00:26:50,160
And after the action, you must emit a replayable record.
628
00:26:50,160 –> 00:26:53,680
If you let the execution plane best-effort its way through evidence,
629
00:26:53,680 –> 00:26:55,280
you’ll end up with polite narratives
630
00:26:55,280 –> 00:26:56,880
instead of audit artifacts.
631
00:26:56,880 –> 00:26:58,560
Now here’s the part everyone gets wrong.
632
00:26:58,560 –> 00:26:59,520
Exceptions.
633
00:26:59,520 –> 00:27:02,560
Most organizations think exceptions are operational flexibility.
634
00:27:02,560 –> 00:27:03,280
They are wrong.
635
00:27:03,280 –> 00:27:04,960
Exceptions are entropy generators.
636
00:27:04,960 –> 00:27:06,000
Every time someone says,
637
00:27:06,000 –> 00:27:07,600
“Just let the agent do it this one time.”
638
00:27:07,600 –> 00:27:09,520
They’re not making the system more useful.
639
00:27:09,520 –> 00:27:12,800
They’re making your deterministic security model probabilistic.
640
00:27:12,800 –> 00:27:14,560
Because the exception doesn’t live in a vacuum.
641
00:27:14,560 –> 00:27:16,240
It gets copied, reused, inherited
642
00:27:16,240 –> 00:27:17,840
and eventually treated as baseline.
643
00:27:17,840 –> 00:27:19,680
The system did exactly what you allowed.
644
00:27:19,680 –> 00:27:21,120
You just forgot you allowed it.
645
00:27:21,120 –> 00:27:23,200
And the hardest problem in this entire model
646
00:27:23,200 –> 00:27:25,680
isn’t starting an agent, it’s stopping one.
647
00:27:25,680 –> 00:27:27,600
Not disable the app registration.
648
00:27:27,600 –> 00:27:29,440
Stopping an in-flight run cleanly,
649
00:27:29,440 –> 00:27:31,600
mid execution across multiple systems
650
00:27:31,600 –> 00:27:33,840
with partial state changes and retries queued.
651
00:27:33,840 –> 00:27:36,240
If you don’t design kill behavior into the control plane,
652
00:27:36,240 –> 00:27:37,920
you’ll learn about it during an incident
653
00:27:37,920 –> 00:27:40,400
when the agent keeps helpfully reapplying
654
00:27:40,400 –> 00:27:42,400
the action you’re trying to roll back.
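A kill switch that actually holds an in-flight run has to be checked between steps, not just at startup. A minimal sketch using Python’s `threading.Event`; the step functions are placeholders for real tool calls.

```python
import threading

KILL = threading.Event()   # flipped by the control plane, not by the agent itself

def run_steps(steps):
    """Check the kill switch before every step so an in-flight run halts cleanly."""
    completed = []
    for step in steps:
        if KILL.is_set():
            # Stop before the next state change and report partial progress,
            # so rollback knows exactly what was and wasn't applied.
            return {"status": "halted", "completed": completed}
        completed.append(step())
    return {"status": "done", "completed": completed}

KILL.set()   # simulate an operator pulling the switch mid-incident
outcome = run_steps([lambda: "step1", lambda: "step2"])
```

Without that per-step check, the run keeps reapplying changes while you try to roll them back.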
655
00:27:42,400 –> 00:27:43,600
So if you remember nothing else,
656
00:27:43,600 –> 00:27:45,200
governance lives in the control plane
657
00:27:45,200 –> 00:27:46,720
not in the agent’s prompt.
658
00:27:46,720 –> 00:27:49,200
The execution plane will always seek convenience.
659
00:27:49,200 –> 00:27:51,200
Your job is to make convenience impossible
660
00:27:51,200 –> 00:27:52,880
when it violates intent.
661
00:27:52,880 –> 00:27:54,080
The worth it test.
662
00:27:54,080 –> 00:27:56,160
When autonomy beats assistance.
663
00:27:56,160 –> 00:27:58,000
Autonomy is not better AI.
664
00:27:58,000 –> 00:27:59,280
It’s a different cost model.
665
00:27:59,280 –> 00:28:01,200
Assistance helps a person finish work.
666
00:28:01,200 –> 00:28:03,600
Autonomy finishes work and leaves you with an artifact.
667
00:28:03,600 –> 00:28:05,680
That means the only honest question
668
00:28:05,680 –> 00:28:07,280
is whether the overhead of building
669
00:28:07,280 –> 00:28:10,080
and governing autonomous execution pays for itself.
670
00:28:10,080 –> 00:28:13,120
And it only pays in a very specific shape of problem.
671
00:28:13,120 –> 00:28:15,520
Autonomy wins when the work has volume,
672
00:28:15,520 –> 00:28:17,760
repeatability and bounded consequences.
673
00:28:18,480 –> 00:28:20,320
Think of it like any other automation.
674
00:28:20,320 –> 00:28:23,440
If the decision is rare, ambiguous or politically sensitive,
675
00:28:23,440 –> 00:28:24,880
autonomy won’t save you.
676
00:28:24,880 –> 00:28:26,960
It will just give you a faster way to be wrong.
677
00:28:26,960 –> 00:28:28,320
So here’s the worth it test,
678
00:28:28,320 –> 00:28:30,640
stated the way an enterprise should state it.
679
00:28:30,640 –> 00:28:33,680
Autonomy beats assistance when it increases outcome throughput
680
00:28:33,680 –> 00:28:35,680
without increasing policy violations.
681
00:28:35,680 –> 00:28:37,680
Not when users like it,
682
00:28:37,680 –> 00:28:39,520
nor when the demo is cool.
683
00:28:39,520 –> 00:28:42,240
When the system closes more outcomes per unit time,
684
00:28:42,240 –> 00:28:43,680
under enforced intent,
685
00:28:43,680 –> 00:28:45,920
and humans intervene less without losing control.
686
00:28:45,920 –> 00:28:47,600
That test has four components.
687
00:28:47,600 –> 00:28:49,600
First, throughput.
688
00:28:49,600 –> 00:28:50,880
Autonomy is a queue optimizer.
689
00:28:50,880 –> 00:28:52,480
If your queue depth never goes down,
690
00:28:52,480 –> 00:28:55,600
tickets churn, incidents churn, analysts become routers,
691
00:28:55,600 –> 00:28:58,160
then you have a throughput problem, not a skill problem.
692
00:28:58,160 –> 00:29:01,920
Autonomy earns its keep when it takes the low to medium complexity items
693
00:29:01,920 –> 00:29:04,560
off the queue entirely and keeps doing it at 2am
694
00:29:04,560 –> 00:29:07,120
on a weekend without waiting for someone to look at it.
695
00:29:07,120 –> 00:29:08,880
Second, consistency.
696
00:29:08,880 –> 00:29:10,560
Humans are inconsistent by design.
697
00:29:10,560 –> 00:29:12,080
They interpret runbooks differently.
698
00:29:12,080 –> 00:29:14,480
They skip documentation when the page is screaming.
699
00:29:14,480 –> 00:29:17,040
They make temporary changes and forget to reverse them.
700
00:29:17,040 –> 00:29:19,200
Autonomy, under an execution contract,
701
00:29:19,200 –> 00:29:21,280
does the same thing the same way every time.
702
00:29:21,280 –> 00:29:22,240
That is boring.
703
00:29:22,240 –> 00:29:23,600
Boring is the goal.
704
00:29:23,600 –> 00:29:26,240
Third, 24/7 execution.
705
00:29:26,240 –> 00:29:28,400
Assistance still bottlenecks on attention.
706
00:29:28,400 –> 00:29:30,800
Copilot can draft the incident report at midnight,
707
00:29:30,800 –> 00:29:32,720
but the incident still waits for the engineer
708
00:29:32,720 –> 00:29:35,680
who has to approve the change, run the fix, and document it.
709
00:29:35,680 –> 00:29:37,120
Autonomy doesn’t wait.
710
00:29:37,120 –> 00:29:39,600
It executes within its allowed action set,
711
00:29:39,600 –> 00:29:42,880
verifies and escalates only when the contract says it must.
712
00:29:42,880 –> 00:29:45,200
Fourth, reduced intervention rate.
713
00:29:45,200 –> 00:29:47,520
This is the metric most enterprises refuse to name
714
00:29:47,520 –> 00:29:49,120
because it forces accountability.
715
00:29:49,120 –> 00:29:51,840
What percentage of cases require a human to step in?
716
00:29:51,840 –> 00:29:53,840
With assistance, it’s basically all of them,
717
00:29:53,840 –> 00:29:56,240
because the human owns the last mile.
718
00:29:56,240 –> 00:29:58,880
With autonomy, you expect the intervention rate to drop,
719
00:29:58,880 –> 00:30:01,280
meaning the system handles the known knowns
720
00:30:01,280 –> 00:30:03,360
and punts the unknown unknowns to humans.
721
00:30:03,360 –> 00:30:04,480
Now those are the benefits.
722
00:30:04,480 –> 00:30:05,280
Here’s the gate.
723
00:30:05,280 –> 00:30:08,960
Autonomy only works when the decision environment is stable.
724
00:30:08,960 –> 00:30:11,360
That means the work items have recognizable patterns.
725
00:30:11,360 –> 00:30:13,440
The systems involved have reliable telemetry
726
00:30:13,440 –> 00:30:15,600
and the organization can define done
727
00:30:15,600 –> 00:30:18,080
in a way that can be validated automatically.
728
00:30:18,080 –> 00:30:21,040
If you can’t define done, you can’t automate outcomes.
729
00:30:21,040 –> 00:30:22,400
You can only automate motion.
730
00:30:22,400 –> 00:30:23,520
So what passes the test?
731
00:30:23,520 –> 00:30:26,160
High-volume, repeatable tasks with clear ownership
732
00:30:26,160 –> 00:30:27,760
and bounded scope.
733
00:30:27,760 –> 00:30:29,760
Common IT remediations.
734
00:30:29,760 –> 00:30:31,600
Known reconciliation patterns.
735
00:30:31,600 –> 00:30:33,760
Low-to-medium risk security responses
736
00:30:33,760 –> 00:30:36,800
where policy already defines what containment means.
737
00:30:36,800 –> 00:30:38,960
Autonomy thrives on operational repetition.
738
00:30:38,960 –> 00:30:40,000
What fails the test?
739
00:30:40,000 –> 00:30:41,440
Anything with ambiguous policy,
740
00:30:41,440 –> 00:30:44,560
sensitive consequences, weak telemetry or unclear ownership.
741
00:30:44,560 –> 00:30:46,800
If the action is “might impact executives,”
742
00:30:46,800 –> 00:30:48,560
you will end up with humans anyway.
743
00:30:48,560 –> 00:30:50,320
If the action involves money movement
744
00:30:50,320 –> 00:30:52,320
without a deterministic evidence chain,
745
00:30:52,320 –> 00:30:54,320
you will end up with auditors anyway.
746
00:30:54,320 –> 00:30:57,600
If the signal quality is low and the system spends its time guessing,
747
00:30:57,600 –> 00:31:00,160
you will end up with an expensive guessing machine.
748
00:31:00,160 –> 00:31:02,720
And the most common anti-case is the one nobody admits,
749
00:31:02,720 –> 00:31:04,160
unclear blast radius.
750
00:31:04,160 –> 00:31:07,200
If you can’t bound the scope of what the agent is allowed to touch,
751
00:31:07,200 –> 00:31:08,480
you shouldn’t let it touch anything.
752
00:31:08,480 –> 00:31:10,000
That’s not caution, that’s just math.
753
00:31:10,960 –> 00:31:14,000
Now the KPI framing, because this is where autonomy projects die
754
00:31:14,000 –> 00:31:14,960
in finance meetings.
755
00:31:14,960 –> 00:31:16,720
You don’t measure autonomy by token cost.
756
00:31:16,720 –> 00:31:18,880
You measure it by cost per closed outcome,
757
00:31:18,880 –> 00:31:21,600
cost per incident resolved, cost per ticket closed,
758
00:31:21,600 –> 00:31:23,360
cost per reconciliation balanced.
759
00:31:23,360 –> 00:31:25,360
Cost per alert triaged to a real incident
760
00:31:25,360 –> 00:31:26,800
or dismissed with evidence.
761
00:31:26,800 –> 00:31:28,640
If autonomy lowers that number
762
00:31:28,640 –> 00:31:30,880
while holding policy compliance steady,
763
00:31:30,880 –> 00:31:31,680
it’s worth it.
764
00:31:31,680 –> 00:31:33,200
If it only makes people feel faster,
765
00:31:33,200 –> 00:31:34,560
it’s assistance with extra risk.
766
00:31:34,560 –> 00:31:35,760
So track four metrics
767
00:31:35,760 –> 00:31:37,600
and don’t negotiate with yourself about them.
768
00:31:37,600 –> 00:31:40,080
Time to close, from event to verified outcome.
769
00:31:40,080 –> 00:31:41,360
Human in the loop rate,
770
00:31:41,360 –> 00:31:43,120
what percentage required intervention,
771
00:31:43,120 –> 00:31:44,080
not review.
772
00:31:44,080 –> 00:31:47,120
Rollback frequency, how often did autonomy make a change
773
00:31:47,120 –> 00:31:48,320
that had to be undone?
774
00:31:48,320 –> 00:31:50,720
Policy compliance, how often did it cross a boundary
775
00:31:50,720 –> 00:31:51,600
it shouldn’t have crossed?
776
00:31:51,600 –> 00:31:54,160
And if you want one more that cuts through the noise,
777
00:31:54,160 –> 00:31:55,440
intervention histogram,
778
00:31:55,440 –> 00:31:56,960
not averages, the distribution,
779
00:31:56,960 –> 00:31:59,120
because the long tail is where incidents live.
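The intervention histogram is just a distribution over per-run intervention counts. A tiny sketch with invented numbers, showing why the tail, not the mean, is what you watch.

```python
from collections import Counter

# Hypothetical per-run counts of human interventions over ten agent runs.
interventions_per_run = [0, 0, 0, 0, 1, 0, 0, 2, 0, 7]

histogram = Counter(interventions_per_run)             # the distribution, not an average
mean = sum(interventions_per_run) / len(interventions_per_run)
tail = [n for n in interventions_per_run if n >= 3]    # the runs where incidents live
```

An average of one intervention per run looks healthy; the single run needing seven is the one that becomes a post-incident review.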
780
00:31:59,120 –> 00:32:00,400
If the worth it test passes,
781
00:32:00,400 –> 00:32:02,160
autonomy becomes an engineering project.
782
00:32:02,160 –> 00:32:04,400
If it fails, keep it as assistance
783
00:32:04,400 –> 00:32:06,640
and be honest that you’re buying faster labor.
784
00:32:06,640 –> 00:32:08,160
Now we can get concrete
785
00:32:08,160 –> 00:32:11,280
because the first scenario, autonomous IT remediation,
786
00:32:11,280 –> 00:32:13,200
exposes control plane immaturity
787
00:32:13,200 –> 00:32:16,080
faster than any governance workshop ever will.
788
00:32:16,080 –> 00:32:20,400
Scenario one setup: autonomous IT remediation at scale.
789
00:32:20,400 –> 00:32:22,560
IT remediation is where autonomy stops
790
00:32:22,560 –> 00:32:25,600
being a philosophy and becomes a liability calculation.
791
00:32:25,600 –> 00:32:28,160
Because the pain is real, the volume is relentless
792
00:32:28,160 –> 00:32:31,440
and the work is mostly the same handful of moves repeated
793
00:32:31,440 –> 00:32:33,680
forever by tired humans who swear
794
00:32:33,680 –> 00:32:35,280
they’ll document it next time.
795
00:32:35,280 –> 00:32:37,680
The typical enterprise starts here: alert fatigue.
796
00:32:37,680 –> 00:32:40,400
Monitoring fires, someone triages, they assign it,
797
00:32:40,400 –> 00:32:42,080
the assignee asks for context,
798
00:32:42,080 –> 00:32:43,680
then you get the escalation loop.
799
00:32:43,680 –> 00:32:45,120
The ticket bounces between teams
800
00:32:45,120 –> 00:32:46,880
because nobody owns the whole path.
801
00:32:46,880 –> 00:32:49,040
Meanwhile, users keep reporting the symptom,
802
00:32:49,040 –> 00:32:50,720
not the cause, so the queue gets heavier
803
00:32:50,720 –> 00:32:52,160
while the service gets worse.
804
00:32:52,160 –> 00:32:53,920
And buried inside that mess is a simple truth.
805
00:32:53,920 –> 00:32:55,760
Most of the incidents aren’t mysterious.
806
00:32:55,760 –> 00:32:56,800
They’re just known.
807
00:32:56,800 –> 00:32:59,200
They’re restart-worthy, rollback-worthy.
808
00:32:59,200 –> 00:33:02,080
Apply-the-known-fix-and-verify worthy.
809
00:33:02,080 –> 00:33:03,840
But because humans are the bottleneck,
810
00:33:03,840 –> 00:33:04,880
everything queues.
811
00:33:04,880 –> 00:33:06,080
Work doesn’t close.
812
00:33:06,080 –> 00:33:06,880
It churns.
813
00:33:07,840 –> 00:33:10,800
So the baseline flow most organizations run looks like this.
814
00:33:10,800 –> 00:33:13,600
Detect, triage, assign, remediate, document.
815
00:33:13,600 –> 00:33:14,480
It sounds orderly.
816
00:33:14,480 –> 00:33:16,240
In practice, it’s a game of telephone.
817
00:33:16,240 –> 00:33:18,240
Detect is an alert with weak context.
818
00:33:18,240 –> 00:33:22,080
Triage is a person reconstructing context across three portals.
819
00:33:22,080 –> 00:33:23,600
Assign is guessing who’s least busy.
820
00:33:23,600 –> 00:33:26,000
Remediate is someone doing the same command sequence
821
00:33:26,000 –> 00:33:26,880
they did last week.
822
00:33:26,880 –> 00:33:28,640
Document is either an afterthought
823
00:33:28,640 –> 00:33:30,160
or a copy-pasted narrative
824
00:33:30,160 –> 00:33:32,480
written to satisfy process, not truth.
825
00:33:32,480 –> 00:33:33,680
Now the autonomy version,
826
00:33:33,680 –> 00:33:36,240
the one worth doing, changes the shape of the work.
827
00:33:36,240 –> 00:33:37,600
The agentic flow is,
828
00:33:37,600 –> 00:33:41,360
detect, diagnose, remediate, verify, close, report.
829
00:33:41,360 –> 00:33:42,320
Notice what’s missing.
830
00:33:42,320 –> 00:33:44,640
There’s no “assign” step
831
00:33:44,640 –> 00:33:46,640
because the system doesn’t need to find a human.
832
00:33:46,640 –> 00:33:48,560
And there’s no “document later”
833
00:33:48,560 –> 00:33:50,240
because evidence is part of the run,
834
00:33:50,240 –> 00:33:52,320
not a chore you hope somebody remembers.
835
00:33:52,320 –> 00:33:54,800
But this only works if you take ownership seriously.
836
00:33:54,800 –> 00:33:56,320
Autonomy doesn’t eliminate ownership.
837
00:33:56,320 –> 00:33:57,200
It just moves it.
838
00:33:57,200 –> 00:33:58,640
Someone still carries the pager.
839
00:33:58,640 –> 00:34:00,160
Someone still owns rollback.
840
00:34:00,160 –> 00:34:01,680
Someone still owns the change record
841
00:34:01,680 –> 00:34:03,520
when the agent makes a configuration update
842
00:34:03,520 –> 00:34:05,200
that technically counts as a change.
843
00:34:05,200 –> 00:34:06,960
Even if nobody typed the command.
844
00:34:06,960 –> 00:34:09,120
So before you let an agent touch remediation,
845
00:34:09,120 –> 00:34:10,560
you need to answer the questions
846
00:34:10,560 –> 00:34:13,200
most pilot teams avoid because they’re inconvenient.
847
00:34:13,200 –> 00:34:14,720
What incident classes are in scope?
848
00:34:14,720 –> 00:34:16,320
What systems are allowed to be changed?
849
00:34:16,320 –> 00:34:17,760
What is the containment unit?
850
00:34:17,760 –> 00:34:19,040
Subscription, resource group,
851
00:34:19,040 –> 00:34:20,960
specific service, specific environment?
852
00:34:20,960 –> 00:34:24,000
What evidence is required before the agent is allowed to act?
853
00:34:24,000 –> 00:34:25,840
And what does verified mean for each fix?
854
00:34:25,840 –> 00:34:27,040
Because in IT remediation,
855
00:34:27,040 –> 00:34:28,480
the action is usually trivial.
856
00:34:28,480 –> 00:34:29,920
The blast radius is not.
857
00:34:29,920 –> 00:34:31,840
Restarting a service sounds harmless
858
00:34:31,840 –> 00:34:33,440
until it restarts the wrong tier,
859
00:34:33,440 –> 00:34:35,360
drops connections and triggers a cascade
860
00:34:35,360 –> 00:34:36,800
that looks like an outage.
861
00:34:36,800 –> 00:34:38,320
Rolling back a config sounds safe
862
00:34:38,320 –> 00:34:40,560
until the known good state is from three months ago
863
00:34:40,560 –> 00:34:42,560
and today’s dependencies are different.
864
00:34:42,560 –> 00:34:43,920
Patching sounds responsible
865
00:34:43,920 –> 00:34:46,400
until the patch triggers a reboot during business hours
866
00:34:46,400 –> 00:34:48,800
because someone forgot to encode a maintenance window.
867
00:34:48,800 –> 00:34:50,480
This is why autonomy in remediation
868
00:34:50,480 –> 00:34:52,080
is the fastest way to expose
869
00:34:52,080 –> 00:34:53,840
whether your control plane is real.
870
00:34:53,840 –> 00:34:55,840
If you can’t express a remediation action
871
00:34:55,840 –> 00:34:58,640
as a bounded, auditable, reversible operation,
872
00:34:58,640 –> 00:34:59,840
you shouldn’t automate it.
873
00:34:59,840 –> 00:35:01,440
Not because automation is scary.
874
00:35:01,440 –> 00:35:02,880
Because automation is honest,
875
00:35:02,880 –> 00:35:05,680
it executes what you allow repeatedly at machine speed.
876
00:35:05,680 –> 00:35:06,800
So in this scenario,
877
00:35:06,800 –> 00:35:09,200
the objective isn’t “let the agent fix everything.”
878
00:35:09,200 –> 00:35:11,600
The objective is narrower and more defensible.
879
00:35:11,600 –> 00:35:13,920
Let the agent close the predictable incidents
880
00:35:13,920 –> 00:35:16,080
that already have deterministic runbooks
881
00:35:16,080 –> 00:35:18,720
with explicit thresholds and clean escalation.
882
00:35:18,720 –> 00:35:21,120
Think memory leaks with known mitigations.
883
00:35:21,120 –> 00:35:22,560
Stuck queue processors.
884
00:35:22,560 –> 00:35:23,920
Certificates approaching expiry
885
00:35:23,920 –> 00:35:25,520
where rotation is already scripted.
886
00:35:25,520 –> 00:35:28,320
Disk-space remediation where a cleanup is defined and bounded,
887
00:35:28,320 –> 00:35:30,800
service restarts where the verification checks are clear
888
00:35:30,800 –> 00:35:32,000
and the rollback is
889
00:35:32,000 –> 00:35:33,760
bring it back up and page a human
890
00:35:33,760 –> 00:35:35,280
if the health probe doesn’t recover.
891
00:35:35,280 –> 00:35:36,080
And if you’re thinking,
892
00:35:36,080 –> 00:35:38,000
“Okay, that’s just automation?” Good.
893
00:35:38,000 –> 00:35:40,320
Autonomy is automation with three added requirements.
894
00:35:40,320 –> 00:35:42,880
It chooses the runbook, it proves why it chose it,
895
00:35:42,880 –> 00:35:45,920
and it verifies the outcome under an execution contract.
896
00:35:45,920 –> 00:35:48,160
Now here’s the payoff signal you should hold onto.
897
00:35:48,160 –> 00:35:49,440
Closing the ticket is easy.
898
00:35:49,440 –> 00:35:52,240
Producing evidence and bounding the blast radius is the work.
899
00:35:52,240 –> 00:35:54,960
That’s why this scenario is perfect as the first deep dive.
900
00:35:54,960 –> 00:35:57,120
It forces you to confront the autonomy boundary
901
00:35:57,120 –> 00:35:59,840
in a domain where outcomes are measurable and failure is loud.
902
00:35:59,840 –> 00:36:02,640
If the agent can’t show its evidence trail, you won’t trust it.
903
00:36:02,640 –> 00:36:04,960
If it can’t be stopped mid-flight, you’ll fear it.
904
00:36:04,960 –> 00:36:07,040
If you can’t name who wakes up when it fails,
905
00:36:07,040 –> 00:36:08,320
you’re not doing autonomy.
906
00:36:08,320 –> 00:36:09,360
You’re doing a demo.
907
00:36:09,360 –> 00:36:13,360
So the next thing is to map the flow across the real enterprise surfaces
908
00:36:13,360 –> 00:36:14,720
Azure for the resources,
909
00:36:14,720 –> 00:36:17,840
Graph for identity-adjacent actions and communications,
910
00:36:17,840 –> 00:36:20,800
the ITSM system for tickets and change records
911
00:36:20,800 –> 00:36:23,760
and policy gates that decide when execution is allowed.
912
00:36:23,760 –> 00:36:25,600
That’s where most implementations collapse,
913
00:36:25,600 –> 00:36:27,760
not in the model but in permissions and scope.
914
00:36:27,760 –> 00:36:33,680
Scenario one, system flow: Azure plus Graph plus ITSM plus policy gates.
915
00:36:33,680 –> 00:36:37,360
Start with the reality: the agent can’t remediate an incident on its own.
916
00:36:37,360 –> 00:36:39,680
It can only move through systems that already exist.
917
00:36:39,680 –> 00:36:40,960
Azure for the workload,
918
00:36:40,960 –> 00:36:44,240
Microsoft Graph for identity-adjacent actions and communication,
919
00:36:44,240 –> 00:36:46,640
the ITSM platform for the record of truth,
920
00:36:46,640 –> 00:36:47,920
and then policy gates,
921
00:36:47,920 –> 00:36:53,200
Entra approvals and evidence rules that decide whether the agent is allowed to touch anything at all.
922
00:36:53,200 –> 00:36:54,960
So the flow begins at ingestion.
923
00:36:54,960 –> 00:36:59,760
A signal arrives from Azure Monitor, Log Analytics, Defender for Cloud, Service Health,
924
00:36:59,760 –> 00:37:01,840
or a ticket event from your ITSM tool.
925
00:37:01,840 –> 00:37:06,400
The first job is to normalize that signal into something the autonomy stack can reason over.
926
00:37:06,400 –> 00:37:09,520
Incident class, impacted resource,
927
00:37:09,520 –> 00:37:12,160
environment tag, customer impact signals,
928
00:37:12,160 –> 00:37:14,320
and any known runbook mapping keys.
929
00:37:14,320 –> 00:37:16,240
If the event payload can’t be mapped to scope,
930
00:37:16,240 –> 00:37:18,320
the system should not try harder.
931
00:37:18,320 –> 00:37:19,280
It should escalate,
932
00:37:19,280 –> 00:37:21,200
autonomy doesn’t earn trust by guessing,
933
00:37:21,200 –> 00:37:24,160
it earns trust by refusing to act without containment.
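That refuse-to-guess rule can be made mechanical. A sketch, assuming hypothetical payload fields (`alert_rule`, `resource_id`, `tags.env`): normalization either yields a scoped incident or an escalation, never a best effort.

```python
REQUIRED_SCOPE_KEYS = {"incident_class", "resource_id", "environment"}

def normalize_signal(event):
    """Map a raw alert payload onto the fields the autonomy stack reasons over.
    Field names are illustrative, not any monitoring product's schema."""
    normalized = {
        "incident_class": event.get("alert_rule"),
        "resource_id": event.get("resource_id"),
        "environment": event.get("tags", {}).get("env"),
        "runbook_key": event.get("alert_rule"),
    }
    missing = [k for k in REQUIRED_SCOPE_KEYS if not normalized.get(k)]
    if missing:
        # Can't be mapped to scope: don't try harder, escalate.
        return {"action": "escalate", "reason": f"unmapped scope fields: {missing}"}
    return {"action": "proceed", "incident": normalized}
```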
934
00:37:24,160 –> 00:37:25,840
Next is correlation and diagnosis.
935
00:37:25,840 –> 00:37:28,400
The agent pulls additional context from Azure.
936
00:37:28,400 –> 00:37:30,800
Recent deployments, configuration changes,
937
00:37:30,800 –> 00:37:33,200
scaling events, health probes, dependency failures,
938
00:37:33,200 –> 00:37:36,720
and whatever telemetry exists that can confirm this isn’t a phantom alert.
939
00:37:36,720 –> 00:37:40,480
This is where the execution contract’s evidence requirements become mechanical.
940
00:37:40,480 –> 00:37:42,960
If the contract says two independent signals,
941
00:37:42,960 –> 00:37:44,400
the system must collect them.
942
00:37:44,400 –> 00:37:47,520
A failing synthetic test plus a spike in error rate, for example.
943
00:37:47,520 –> 00:37:48,560
If it can’t, it stops.
944
00:37:48,560 –> 00:37:50,480
That’s the autonomy boundary doing its job.
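The “two independent signals” rule is also checkable in code. A minimal illustration, where the signal `source` and `confirms` fields are assumed names:

```python
def evidence_satisfied(contract_min_signals, signals):
    """Check the execution contract's evidence rule: N independent signal
    sources (e.g. a failing synthetic test plus an error-rate spike) must
    confirm the incident before any action. Illustrative structure only."""
    independent_sources = {s["source"] for s in signals if s.get("confirms")}
    return len(independent_sources) >= contract_min_signals
```

Note that two confirmations from the same source count once: independence is over sources, not events.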
945
00:37:50,480 –> 00:37:53,920
Now the system decides whether the incident is in an autonomous class.
946
00:37:53,920 –> 00:37:55,600
That classification shouldn’t live in a prompt.
947
00:37:55,600 –> 00:37:57,040
It should live in policy,
948
00:37:57,040 –> 00:37:58,400
a list of incident types,
949
00:37:58,400 –> 00:38:00,400
environments, and severity levels
950
00:38:00,400 –> 00:38:02,160
that are eligible for automatic action.
951
00:38:02,160 –> 00:38:04,560
Production Sev-1 with unknown blast radius?
952
00:38:04,560 –> 00:38:05,200
No.
953
00:38:05,200 –> 00:38:08,080
Non-prod queue processor wedged for 30 minutes with a known fix?
954
00:38:08,080 –> 00:38:08,560
Yes.
955
00:38:08,560 –> 00:38:09,920
The goal is not heroics.
956
00:38:09,920 –> 00:38:11,520
The goal is predictable closure.
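“Classification lives in policy, not a prompt” just means a reviewable data structure. A hypothetical eligibility table and lookup (entries and severity convention are assumptions, with 1 as most severe):

```python
# Eligibility lives in policy data that can be reviewed and diffed,
# not in prompt text. All entries here are hypothetical examples.
AUTONOMY_POLICY = [
    {"incident_class": "queue_processor_wedged",
     "environments": {"dev", "test"}, "allowed_severities": {3, 4}},
    {"incident_class": "cert_near_expiry",
     "environments": {"dev", "test", "prod"}, "allowed_severities": {2, 3, 4}},
]

def eligible_for_autonomy(incident_class, environment, severity):
    """True only if some policy rule explicitly allows this class,
    environment, and severity. Anything unlisted escalates by default."""
    return any(
        rule["incident_class"] == incident_class
        and environment in rule["environments"]
        and severity in rule["allowed_severities"]
        for rule in AUTONOMY_POLICY
    )
```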
957
00:38:11,520 –> 00:38:12,880
Once the incident is eligible,
958
00:38:12,880 –> 00:38:15,280
orchestration selects the remediation pathway.
959
00:38:15,280 –> 00:38:16,400
In enterprise terms,
960
00:38:16,400 –> 00:38:18,960
this is runbook selection with preconditions.
961
00:38:18,960 –> 00:38:21,120
The agent chooses restart service,
962
00:38:21,120 –> 00:38:22,000
scale out,
963
00:38:22,000 –> 00:38:23,520
rollback last deployment,
964
00:38:23,520 –> 00:38:24,800
clear poison queue,
965
00:38:24,800 –> 00:38:26,400
rotate certificate,
966
00:38:26,400 –> 00:38:27,520
whatever you’ve defined.
967
00:38:27,520 –> 00:38:30,960
But each pathway has to include two extra things humans often skip,
968
00:38:30,960 –> 00:38:32,080
a rollback plan,
969
00:38:32,080 –> 00:38:33,520
and a verification plan.
970
00:38:33,520 –> 00:38:36,080
Rollback is what happens if the action makes it worse.
971
00:38:36,080 –> 00:38:38,320
Verification is what proves the action worked
972
00:38:38,320 –> 00:38:40,720
without a human saying “looks fine.”
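One way to make rollback and verification non-optional is to bake them into the runbook type itself, so a pathway without them can’t be constructed. An illustrative sketch, not a real orchestration framework:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Runbook:
    """A remediation pathway must carry the two things humans often skip:
    a rollback plan and a verification plan. Names are illustrative."""
    name: str
    preconditions: list            # checks that must all pass before executing
    execute: Callable[[], None]
    verify: Callable[[], bool]     # proves the action worked; no "looks fine"
    rollback: Callable[[], None]   # what happens if the action made it worse

def run(runbook):
    if not all(check() for check in runbook.preconditions):
        return "escalate: preconditions not met"
    runbook.execute()
    if runbook.verify():
        return "closed: verified"
    runbook.rollback()
    return "rolled back: verification failed"
```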
973
00:38:40,720 –> 00:38:42,080
Now we hit the policy gates.
974
00:38:42,080 –> 00:38:43,520
Before any write action,
975
00:38:43,520 –> 00:38:46,240
the agent should cross-check identity and authorization.
976
00:38:46,240 –> 00:38:48,080
What principal is executing?
977
00:38:48,080 –> 00:38:49,360
What roles are active?
978
00:38:49,360 –> 00:38:51,040
And whether the current context
979
00:38:51,040 –> 00:38:52,800
satisfies conditional access
980
00:38:52,800 –> 00:38:54,800
and whatever risk conditions you enforce.
981
00:38:54,800 –> 00:38:56,640
And yes, if you’re doing this properly,
982
00:38:56,640 –> 00:38:58,800
you’ll end up with something PIM-like in spirit,
983
00:38:58,800 –> 00:39:00,640
even if the implementation differs,
984
00:39:00,640 –> 00:39:03,360
a constrained elevation model for specific actions,
985
00:39:03,360 –> 00:39:04,800
time-bounded, scope-bounded,
986
00:39:04,800 –> 00:39:07,200
and logged as an event that can be audited.
987
00:39:07,200 –> 00:39:09,920
At the same time, the ITSM system becomes a gate,
988
00:39:09,920 –> 00:39:11,120
not a bystander.
989
00:39:11,120 –> 00:39:14,000
The agent should either create or update a ticket with
990
00:39:14,000 –> 00:39:15,200
the detected signal,
991
00:39:15,200 –> 00:39:16,560
the evidence collected,
992
00:39:16,560 –> 00:39:17,840
the planned action sequence,
993
00:39:17,840 –> 00:39:20,240
and the policy clause that authorizes execution.
994
00:39:20,240 –> 00:39:22,080
If change control matters in your org,
995
00:39:22,080 –> 00:39:23,840
the agent should also create a change record,
996
00:39:23,840 –> 00:39:27,360
because “the agent did it” is not an exemption from your own process.
997
00:39:27,360 –> 00:39:29,920
It just means the process must be machine-readable.
998
00:39:29,920 –> 00:39:31,840
Then the action execution happens in Azure.
999
00:39:31,840 –> 00:39:33,440
This is where people get sloppy.
1000
00:39:33,440 –> 00:39:35,040
Restart the service must be implemented
1001
00:39:35,040 –> 00:39:36,640
as a scoped operation.
1002
00:39:36,640 –> 00:39:39,280
Target resource IDs explicitly, restrict subscription
1003
00:39:39,280 –> 00:39:40,560
and resource group boundaries
1004
00:39:40,560 –> 00:39:41,680
and enforce rate limits,
1005
00:39:41,680 –> 00:39:43,680
so the agent can’t restart the entire fleet
1006
00:39:43,680 –> 00:39:45,680
because it saw the same symptom twice.
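A sketch of “restart the service” as a scoped operation. The scope prefix, limit, and function shape are assumptions for illustration, not Azure SDK calls:

```python
import time

# Hypothetical containment unit: one subscription, one resource group.
ALLOWED_SCOPE_PREFIX = "/subscriptions/sub-1/resourceGroups/rg-app/"
MAX_RESTARTS_PER_HOUR = 3
_restart_log = []  # timestamps of recent restarts

def restart_service(resource_id, now=None):
    """Scoped restart: explicit resource ID, enforced subscription and
    resource-group boundary, and a rate limit so the agent can't restart
    the entire fleet because it saw the same symptom twice."""
    now = time.time() if now is None else now
    if not resource_id.startswith(ALLOWED_SCOPE_PREFIX):
        return "denied: outside containment unit"
    recent = [t for t in _restart_log if now - t < 3600]
    if len(recent) >= MAX_RESTARTS_PER_HOUR:
        return "denied: rate limit reached, escalate to a human"
    _restart_log.append(now)
    # ...the actual management-plane restart call would go here...
    return "restarted"
```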
1007
00:39:45,680 –> 00:39:48,240
If the remediation involves deployment rollback,
1008
00:39:48,240 –> 00:39:50,160
it must pin to a specific version
1009
00:39:50,160 –> 00:39:51,840
and validate dependency drift.
1010
00:39:51,840 –> 00:39:52,960
If it involves patching,
1011
00:39:52,960 –> 00:39:54,800
it must honor maintenance windows.
1012
00:39:54,800 –> 00:39:56,960
Autonomy doesn’t erase operational discipline.
1013
00:39:56,960 –> 00:39:58,240
It weaponizes it,
1014
00:39:58,240 –> 00:40:00,080
either in your favor or against you.
1015
00:40:00,080 –> 00:40:02,880
Now Graph shows up for two things.
1016
00:40:02,880 –> 00:40:04,640
Coordination and containment.
1017
00:40:04,640 –> 00:40:06,560
Coordination means notifications,
1018
00:40:06,560 –> 00:40:08,240
posting to the right teams channel,
1019
00:40:08,240 –> 00:40:09,200
updating the ticket,
1020
00:40:09,200 –> 00:40:11,440
emailing impacted stakeholders if that’s your norm.
1021
00:40:11,440 –> 00:40:13,680
Containment means identity-adjacent actions
1022
00:40:13,680 –> 00:40:15,200
when the incident demands it,
1023
00:40:15,200 –> 00:40:17,440
disabling a compromised app registration,
1024
00:40:17,440 –> 00:40:18,640
revoking sessions,
1025
00:40:18,640 –> 00:40:19,760
rotating secrets,
1026
00:40:19,760 –> 00:40:20,720
or pulling access.
1027
00:40:20,720 –> 00:40:22,160
But those actions are higher risk,
1028
00:40:22,160 –> 00:40:24,080
so they should sit behind stricter gates,
1029
00:40:24,080 –> 00:40:25,440
stronger evidence requirements,
1030
00:40:25,440 –> 00:40:27,680
tighter scopes, and lower confidence tolerance.
1031
00:40:27,680 –> 00:40:29,760
Finally, verification and closure.
1032
00:40:29,760 –> 00:40:32,720
The agent re-queries telemetry:
1033
00:40:32,720 –> 00:40:33,920
health probes green,
1034
00:40:33,920 –> 00:40:35,200
error rates normal,
1035
00:40:35,200 –> 00:40:36,720
queue depth trending down.
1036
00:40:36,720 –> 00:40:38,400
User impact signals resolved.
1037
00:40:38,400 –> 00:40:39,520
If verification fails,
1038
00:40:39,520 –> 00:40:41,040
it either rolls back or escalates
1039
00:40:41,040 –> 00:40:42,240
depending on the contract.
1040
00:40:42,240 –> 00:40:42,960
And when it closes,
1041
00:40:42,960 –> 00:40:44,400
it doesn’t just close the ticket.
1042
00:40:44,400 –> 00:40:45,680
It writes the evidence bundle:
1043
00:40:45,680 –> 00:40:46,640
inputs, decisions,
1044
00:40:46,640 –> 00:40:47,600
toolcalls, approvals,
1045
00:40:47,600 –> 00:40:50,160
and verification results linked to the ITSM record.
1046
00:40:50,160 –> 00:40:51,360
That bundle is the product.
1047
00:40:51,360 –> 00:40:53,120
Without it, you don’t have autonomy.
1048
00:40:53,120 –> 00:40:53,920
You have fast,
1049
00:40:53,920 –> 00:40:54,960
unreviewable change.
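The evidence bundle itself can be a small, hashable artifact linked from the ITSM record. An illustrative shape, with a digest so later tampering is detectable:

```python
import hashlib
import json

def evidence_bundle(inputs, decisions, tool_calls, approvals, verification):
    """The closure artifact discussed above: inputs, decisions, tool calls,
    approvals, and verification results, hashed so the ITSM record can link
    to a tamper-evident bundle. Field names are illustrative."""
    bundle = {
        "inputs": inputs,
        "decisions": decisions,
        "tool_calls": tool_calls,
        "approvals": approvals,
        "verification": verification,
    }
    payload = json.dumps(bundle, sort_keys=True).encode()
    bundle["digest"] = hashlib.sha256(payload).hexdigest()
    return bundle
```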
1050
00:40:54,960 –> 00:40:56,720
Scenario one,
1051
00:40:56,720 –> 00:40:57,920
governance: least privilege,
1052
00:40:57,920 –> 00:40:59,040
or it becomes a worm.
1053
00:40:59,040 –> 00:41:00,160
Now we talk about governance,
1054
00:41:00,160 –> 00:41:01,840
because this is where the remediation story
1055
00:41:01,840 –> 00:41:03,360
stops being an engineering win
1056
00:41:03,360 –> 00:41:06,400
and starts being an enterprise incident waiting to happen.
1057
00:41:06,400 –> 00:41:09,120
Autonomous remediation has a simple security truth.
1058
00:41:09,120 –> 00:41:10,560
If the agent can do anything,
1059
00:41:10,560 –> 00:41:12,240
it will eventually do everything.
1060
00:41:12,240 –> 00:41:13,200
Not out of malice,
1061
00:41:13,200 –> 00:41:15,360
out of pathfinding. Tools try to succeed,
1062
00:41:15,360 –> 00:41:16,560
retries try to recover,
1063
00:41:16,560 –> 00:41:17,680
fallbacks try to help.
1064
00:41:17,680 –> 00:41:19,360
And if you gave the system broad rights,
1065
00:41:19,360 –> 00:41:21,040
you built a self-propelled operator
1066
00:41:21,040 –> 00:41:22,640
with no meaningful containment.
1067
00:41:22,640 –> 00:41:24,240
That’s a worm, just with nicer logs.
1068
00:41:24,240 –> 00:41:26,640
So governance for this scenario is not a checklist.
1069
00:41:26,640 –> 00:41:29,120
It is least privilege expressed as an execution contract
1070
00:41:29,120 –> 00:41:30,960
that the runtime cannot negotiate with.
1071
00:41:30,960 –> 00:41:32,560
Start with the agent identity.
1072
00:41:32,560 –> 00:41:34,240
This cannot be a service account.
1073
00:41:34,240 –> 00:41:36,960
It cannot be my automation app registration.
1074
00:41:36,960 –> 00:41:38,160
It’s a non-human principal
1075
00:41:38,160 –> 00:41:39,200
with a single purpose,
1076
00:41:39,200 –> 00:41:40,080
a narrow scope,
1077
00:41:40,080 –> 00:41:41,600
and a life cycle you actually manage.
1078
00:41:41,600 –> 00:41:43,280
It needs explicit role boundaries,
1079
00:41:43,280 –> 00:41:44,080
what it can read,
1080
00:41:44,080 –> 00:41:44,880
what it can write,
1081
00:41:44,880 –> 00:41:46,480
and more importantly, where.
1082
00:41:46,480 –> 00:41:48,000
Subscription, resource group,
1083
00:41:48,000 –> 00:41:49,280
specific resource types,
1084
00:41:49,280 –> 00:41:50,320
specific environments.
1085
00:41:50,320 –> 00:41:52,160
The containment unit needs to be explicit
1086
00:41:52,160 –> 00:41:54,960
because remediation is always tempted to expand scope.
1087
00:41:54,960 –> 00:41:55,920
I saw the issue here,
1088
00:41:55,920 –> 00:41:57,520
so I’ll go look over there.
1089
00:41:57,520 –> 00:41:59,360
No, it stays where you told it to stay.
1090
00:41:59,360 –> 00:42:00,480
Then you enforce it in
1091
00:42:00,480 –> 00:42:02,000
Entra and Azure authorization,
1092
00:42:02,000 –> 00:42:02,720
not in a prompt.
1093
00:42:02,720 –> 00:42:04,240
The easiest way to lie to yourself
1094
00:42:04,240 –> 00:42:05,840
is to implement least privilege
1095
00:42:05,840 –> 00:42:07,200
in the orchestration logic
1096
00:42:07,200 –> 00:42:09,360
while the principal still has Contributor.
1097
00:42:09,360 –> 00:42:10,880
The system will behave until it doesn’t,
1098
00:42:10,880 –> 00:42:11,680
and when it doesn’t,
1099
00:42:11,680 –> 00:42:14,320
the logs will faithfully record the outcome you allowed.
1100
00:42:14,320 –> 00:42:15,280
So you need a pattern
1101
00:42:15,280 –> 00:42:17,520
where the baseline identity can observe
1102
00:42:17,520 –> 00:42:18,880
broadly enough to diagnose,
1103
00:42:18,880 –> 00:42:21,360
but act narrowly enough to not create a blast radius.
1104
00:42:21,360 –> 00:42:23,920
And if you require elevation for certain actions,
1105
00:42:23,920 –> 00:42:25,680
you make that elevation time-bounded,
1106
00:42:25,680 –> 00:42:27,360
scope-bounded, and auditable.
1107
00:42:27,360 –> 00:42:28,240
Call it PIM-like,
1108
00:42:28,240 –> 00:42:29,120
call it just in time,
1109
00:42:29,120 –> 00:42:30,160
call it whatever you want.
1110
00:42:30,160 –> 00:42:31,440
The mechanism isn’t the point.
1111
00:42:31,440 –> 00:42:33,280
The point is that write access
1112
00:42:33,280 –> 00:42:34,880
is a temporary capability,
1113
00:42:34,880 –> 00:42:36,400
not a permanent property.
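A minimal sketch of that temporary-capability idea, PIM-like in spirit only; the class and field names are hypothetical, not a product API:

```python
import time

class JitElevation:
    """Write access as a temporary capability: time-bounded, scope-bounded,
    and logged as an auditable event. Minimal sketch, not an identity product."""
    def __init__(self):
        self.audit_log = []

    def elevate(self, principal, action_class, scope, ttl_seconds, now=None):
        now = time.time() if now is None else now
        grant = {"principal": principal, "action_class": action_class,
                 "scope": scope, "expires": now + ttl_seconds}
        self.audit_log.append(("elevate", grant))  # every grant is an event
        return grant

    def is_allowed(self, grant, action_class, scope, now=None):
        now = time.time() if now is None else now
        return (grant["action_class"] == action_class
                and scope.startswith(grant["scope"])   # scope-bounded
                and now < grant["expires"])            # time-bounded
```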
1114
00:42:36,400 –> 00:42:38,320
Next, permission granularity.
1115
00:42:38,320 –> 00:42:42,160
Most orgs treat remediation as a single permission set.
1116
00:42:42,160 –> 00:42:43,680
The agent can remediate.
1117
00:42:43,680 –> 00:42:45,040
That’s how you end up with an agent
1118
00:42:45,040 –> 00:42:46,320
that can restart a service
1119
00:42:46,320 –> 00:42:47,760
and also reconfigure networking
1120
00:42:47,760 –> 00:42:49,760
because both are operations.
1121
00:42:49,760 –> 00:42:51,120
They are not symmetrical.
1122
00:42:51,120 –> 00:42:54,400
Restart one App Service instance is an operational nudge.
1123
00:42:54,400 –> 00:42:57,120
Modify NSG rules is infrastructure surgery.
1124
00:42:57,120 –> 00:42:59,280
Rolling back a deployment is reversible.
1125
00:42:59,280 –> 00:43:02,080
Rotate secrets across dependencies is cross-system coupling.
1126
00:43:02,080 –> 00:43:03,600
So you define action classes
1127
00:43:03,600 –> 00:43:05,520
and you bind privileges to those classes.
1128
00:43:05,520 –> 00:43:06,720
You do not grant write
1129
00:43:06,720 –> 00:43:08,000
and hope policy saves you,
1130
00:43:08,000 –> 00:43:09,360
policy doesn’t save you.
1131
00:43:09,360 –> 00:43:10,480
It records your mistakes.
1132
00:43:10,480 –> 00:43:12,080
Now, guardrails,
1133
00:43:12,080 –> 00:43:14,560
because permissions alone don’t prevent failure loops.
1134
00:43:14,560 –> 00:43:15,920
You need kill switches.
1135
00:43:15,920 –> 00:43:16,560
Real ones.
1136
00:43:16,560 –> 00:43:19,920
A kill switch is not “disable the app.”
1137
00:43:19,920 –> 00:43:21,760
A kill switch is a control plane decision
1138
00:43:21,760 –> 00:43:23,200
that stops new runs from starting
1139
00:43:23,200 –> 00:43:25,840
and also terminates in-flight runs cleanly.
1140
00:43:25,840 –> 00:43:27,440
Cancel queued tool calls,
1141
00:43:27,440 –> 00:43:28,480
prevent retries,
1142
00:43:28,480 –> 00:43:30,560
and leave a clear halted state
1143
00:43:30,560 –> 00:43:33,600
that humans can resume from or roll back from.
1144
00:43:33,600 –> 00:43:34,480
Without that,
1145
00:43:34,480 –> 00:43:36,080
your incident response will include
1146
00:43:36,080 –> 00:43:37,600
fighting your own automation
1147
00:43:37,600 –> 00:43:39,360
while it keeps trying to help.
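A toy model of a kill switch that meets those requirements, cancel queued tool calls, prevent retries, leave a resumable halted state; the structure is illustrative:

```python
class KillSwitch:
    """A control-plane stop: blocks new runs AND terminates in-flight runs
    cleanly, leaving a 'halted' state a human can resume or roll back from.
    Minimal sketch with assumed run-record fields."""
    def __init__(self):
        self.engaged = False
        self.halted_runs = []

    def engage(self, in_flight_runs):
        self.engaged = True
        for run in in_flight_runs:
            run["queued_tool_calls"].clear()   # cancel queued tool calls
            run["retries_allowed"] = False     # prevent retries
            run["state"] = "halted"            # clear, resumable halted state
            self.halted_runs.append(run["id"])

    def can_start_new_run(self):
        return not self.engaged
```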
1148
00:43:39,360 –> 00:43:41,840
Then you need quotas. Action quotas per run.
1149
00:43:41,840 –> 00:43:43,440
Action quotas per hour.
1150
00:43:43,440 –> 00:43:44,640
Resource quotas per scope.
1151
00:43:44,640 –> 00:43:46,560
If the agent sees 500 alerts
1152
00:43:46,560 –> 00:43:48,400
and decides to remediate all of them,
1153
00:43:48,400 –> 00:43:49,760
that’s not productivity.
1154
00:43:49,760 –> 00:43:51,600
That’s a denial of service you paid for.
1155
00:43:51,600 –> 00:43:53,360
Quotas force the system to batch,
1156
00:43:53,360 –> 00:43:54,160
to prioritize,
1157
00:43:54,160 –> 00:43:56,480
and to escalate when it hits its allowed limit.
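Quota enforcement at its simplest is a plan-splitting step: act on the top of the prioritized queue, escalate the remainder. Illustrative, with an assumed per-run quota and severity 1 as most urgent:

```python
ACTIONS_PER_RUN_QUOTA = 10  # hypothetical limit

def plan_batch(alerts, quota=ACTIONS_PER_RUN_QUOTA):
    """Quotas force the system to batch, prioritize, and escalate the rest.
    Remediating 500 alerts at once isn't productivity; it's a self-inflicted
    denial of service."""
    ranked = sorted(alerts, key=lambda a: a["severity"])  # most urgent first
    return {"act_now": ranked[:quota], "escalate": ranked[quota:]}
```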
1158
00:43:56,480 –> 00:43:58,000
And you need confidence thresholds
1159
00:43:58,000 –> 00:43:59,360
that actually mean something.
1160
00:43:59,360 –> 00:44:01,120
Not a single confidence score number
1161
00:44:01,120 –> 00:44:03,120
that gets tuned until the system acts.
1162
00:44:03,120 –> 00:44:05,200
You define what constitutes sufficient evidence
1163
00:44:05,200 –> 00:44:06,400
for the class of incident.
1164
00:44:06,400 –> 00:44:07,520
Two independent signals,
1165
00:44:07,520 –> 00:44:08,320
a known signature,
1166
00:44:08,320 –> 00:44:09,680
a validated precondition.
1167
00:44:09,680 –> 00:44:10,720
If those aren’t met,
1168
00:44:10,720 –> 00:44:13,120
the agent escalates with the evidence it has and stops.
1169
00:44:13,120 –> 00:44:14,400
That’s how you keep autonomy
1170
00:44:14,400 –> 00:44:16,640
from becoming probabilistic improvisation.
1171
00:44:16,640 –> 00:44:18,480
Finally, the escalation contract.
1172
00:44:18,480 –> 00:44:20,080
When it can’t act, where does it go?
1173
00:44:20,080 –> 00:44:21,520
ITSM ticket assignment,
1174
00:44:21,520 –> 00:44:22,400
Teams channel,
1175
00:44:22,400 –> 00:44:23,520
on call paging.
1176
00:44:23,520 –> 00:44:24,480
And what does it include?
1177
00:44:24,480 –> 00:44:25,760
It includes the evidence bundle
1178
00:44:25,760 –> 00:44:27,120
and the proposed next action.
1179
00:44:27,120 –> 00:44:28,240
Not a vague summary.
1180
00:44:28,240 –> 00:44:30,240
The goal is to turn human in the loop
1181
00:44:30,240 –> 00:44:31,920
into human as exception handler,
1182
00:44:31,920 –> 00:44:33,760
not human as the default executor.
1183
00:44:33,760 –> 00:44:34,960
And you measure all of this
1184
00:44:34,960 –> 00:44:36,880
because governance without measurement is theatre.
1185
00:44:36,880 –> 00:44:38,640
Track the MTTR delta, sure.
1186
00:44:38,640 –> 00:44:40,480
But also track human in loop rate,
1187
00:44:40,480 –> 00:44:41,440
rollback frequency,
1188
00:44:41,440 –> 00:44:43,600
and the number of times the kill switch gets used.
1189
00:44:43,600 –> 00:44:44,800
If rollbacks are frequent,
1190
00:44:44,800 –> 00:44:46,800
your execution contract is too permissive
1191
00:44:46,800 –> 00:44:48,160
or your verification is weak.
1192
00:44:48,160 –> 00:44:49,440
If the kill switch gets used often,
1193
00:44:49,440 –> 00:44:50,640
you have a drift problem.
1194
00:44:50,640 –> 00:44:52,320
If the human in loop rate never drops,
1195
00:44:52,320 –> 00:44:54,560
you built assistance and called it autonomy.
1196
00:44:54,560 –> 00:44:57,200
So the governance rule for scenario one is brutal and simple.
1197
00:44:57,200 –> 00:44:59,120
Either remediation is least privileged
1198
00:44:59,120 –> 00:45:00,400
with enforceable boundaries
1199
00:45:00,400 –> 00:45:02,720
or it becomes a worm with change control paperwork.
1200
00:45:02,720 –> 00:45:04,320
There is no third state.
1201
00:45:04,320 –> 00:45:06,160
Scenario two, setup.
1202
00:45:06,160 –> 00:45:08,720
Finance reconciliation and close support.
1203
00:45:08,720 –> 00:45:10,720
Finance is where autonomy stops being
1204
00:45:10,720 –> 00:45:13,600
ops automation and turns into institutional trust.
1205
00:45:13,600 –> 00:45:15,920
Because reconciliation isn’t a convenience task,
1206
00:45:15,920 –> 00:45:17,840
it’s the thing standing between your organization
1207
00:45:17,840 –> 00:45:20,880
and an audit finding that ruins someone’s quarter.
1208
00:45:20,880 –> 00:45:22,320
The pain pattern is predictable.
1209
00:45:22,320 –> 00:45:25,120
Close arrives, everyone becomes a human join engine
1210
00:45:25,120 –> 00:45:27,280
and the spreadsheet layer metastasizes.
1211
00:45:27,280 –> 00:45:29,680
People pull exports from the ERP, bank feeds,
1212
00:45:29,680 –> 00:45:31,680
expense platforms, procurement systems,
1213
00:45:31,680 –> 00:45:33,760
and whatever temporary tracker someone made
1214
00:45:33,760 –> 00:45:35,840
because the official system was slow.
1215
00:45:35,840 –> 00:45:37,680
Then they spend days matching line items,
1216
00:45:37,680 –> 00:45:39,200
chasing missing references,
1217
00:45:39,200 –> 00:45:41,360
and writing explanations that sound plausible enough
1218
00:45:41,360 –> 00:45:42,240
to survive review.
1219
00:45:42,240 –> 00:45:44,960
And the thing most people miss is that reconciliation work
1220
00:45:44,960 –> 00:45:47,200
has two outputs, not one.
1221
00:45:47,200 –> 00:45:48,960
Yes, you want the numbers to balance.
1222
00:45:48,960 –> 00:45:50,960
But the real product is the rationale.
1223
00:45:50,960 –> 00:45:52,960
Why this transaction matches that one.
1224
00:46:52,960 –> 00:46:54,560
Why this variance exists.
1225
00:46:54,560 –> 00:46:56,560
What policy clause allows the adjustment.
1226
00:46:56,560 –> 00:46:58,160
And who approved the exception.
1227
00:45:58,160 –> 00:45:59,840
Finance doesn’t just need an answer.
1228
00:45:59,840 –> 00:46:02,960
It needs an answer that can be re-performed under scrutiny.
1229
00:46:02,960 –> 00:46:04,800
That’s why assistance hits a ceiling here.
1230
00:46:04,800 –> 00:46:07,200
Copilot can draft a variance narrative faster.
1231
00:46:07,200 –> 00:46:08,480
It can summarize a spreadsheet.
1232
00:46:08,480 –> 00:46:10,560
It can help a controller write an email.
1233
00:46:10,560 –> 00:46:13,520
But it can’t, by itself, create an evidence chain
1234
00:46:13,520 –> 00:46:15,840
that an auditor can replay end to end.
1235
00:46:15,840 –> 00:46:17,200
And without that evidence chain,
1236
00:46:17,200 –> 00:46:18,960
autonomy is not automation.
1237
00:46:18,960 –> 00:46:20,000
It’s liability.
1238
00:46:20,000 –> 00:46:21,840
So the autonomy boundary in finance
1239
00:46:21,840 –> 00:46:23,840
has to be drawn differently than in IT.
1240
00:46:23,840 –> 00:46:26,320
In IT remediation, the boundary is usually:
1241
00:46:26,320 –> 00:46:28,800
can the agent execute the runbook safely
1242
00:46:28,800 –> 00:46:30,640
and verify service health?
1243
00:46:30,640 –> 00:46:32,400
In finance, the boundary is:
1244
00:46:32,400 –> 00:46:34,000
can the agent justify the action
1245
00:46:34,000 –> 00:46:36,480
with grounded source references and policy alignment
1246
00:46:36,480 –> 00:46:38,960
before it touches anything that affects a ledger?
1247
00:46:38,960 –> 00:46:40,720
Because finance failures are quiet.
1248
00:46:40,720 –> 00:46:42,320
They don’t page you at 2 a.m.
1249
00:46:42,320 –> 00:46:44,320
They show up months later in a room with lawyers.
1250
00:46:44,320 –> 00:46:46,640
The baseline close workflow looks like this.
1251
00:46:46,640 –> 00:46:49,600
Extract data, reconcile, resolve exceptions,
1252
00:46:49,600 –> 00:46:52,240
document rationale, get approvals,
1253
00:46:52,240 –> 00:46:54,720
post adjustments, report.
1254
00:46:54,720 –> 00:46:56,800
Humans act as translators between systems
1255
00:46:56,800 –> 00:46:59,120
that don’t agree on identifiers, timestamps,
1256
00:46:59,120 –> 00:47:01,200
currencies, or the meaning of settled.
1257
00:47:01,200 –> 00:47:03,120
They also act as policy interpreters
1258
00:47:03,120 –> 00:47:05,840
because exception handling is where the judgment lives.
1259
00:47:05,840 –> 00:47:08,880
The agentic target outcome is not "replace accountants."
1260
00:47:08,880 –> 00:47:11,440
The agentic target is shrink the exception queue
1261
00:47:11,440 –> 00:47:14,320
and turn routine matching into a deterministic pipeline.
1262
00:47:14,320 –> 00:47:16,320
That means the agent does three things well.
1263
00:47:16,320 –> 00:47:19,440
First, automated matching across known patterns.
1264
00:47:19,440 –> 00:47:22,000
Same vendor, same invoice ID, same amount,
1265
00:47:22,000 –> 00:47:23,440
predictable timing offsets.
1266
00:47:23,440 –> 00:47:25,280
This is boring work, but it’s high volume
1267
00:47:25,280 –> 00:47:26,880
and it’s where humans burn time
1268
00:47:26,880 –> 00:47:28,720
that should be spent on the weird cases.
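The automated matching described here, same vendor, same invoice ID, same amount, predictable timing offsets, can be sketched in Python. This is a minimal illustration; the record shape, field names, and tolerances are assumptions, not any product's schema.

```python
# Deterministic matching for known patterns (hypothetical record shape).
from datetime import date

def match_known_patterns(bank_items, erp_items, amount_tol=0.01, max_offset_days=3):
    """Pair bank lines with ERP documents on vendor + invoice ID + amount,
    allowing a predictable settlement timing offset. Returns (matched, unmatched)."""
    matched, unmatched = [], []
    erp_pool = list(erp_items)
    for bank in bank_items:
        hit = next(
            (e for e in erp_pool
             if e["vendor"] == bank["vendor"]
             and e["invoice_id"] == bank["invoice_id"]
             and abs(e["amount"] - bank["amount"]) <= amount_tol
             and abs((e["date"] - bank["date"]).days) <= max_offset_days),
            None,
        )
        if hit:
            erp_pool.remove(hit)  # each ERP document matches at most once
            matched.append({"bank": bank["id"], "erp": hit["id"],
                            "rule": "vendor+invoice+amount within tolerance"})
        else:
            unmatched.append(bank["id"])
    return matched, unmatched
```

Deterministic rules like these are preferable to model reasoning for the high-volume cases, because every match carries an explicit rule string that can go straight into the evidence trail.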
1269
00:47:28,720 –> 00:47:31,360
Second, anomaly surfacing with real triage.
1270
00:47:31,360 –> 00:47:33,520
Not "here are 500 variances,"
1271
00:47:33,520 –> 00:47:35,120
but "here are the 12 that matter,"
1272
00:47:35,120 –> 00:47:36,160
with clustering:
1273
00:47:36,160 –> 00:47:38,720
Duplicates, currency conversion discrepancies,
1274
00:47:38,720 –> 00:47:40,880
partial shipments, late postings,
1275
00:47:40,880 –> 00:47:42,720
missing purchase order references.
1276
00:47:42,720 –> 00:47:44,480
The value is not finding anomalies.
1277
00:47:44,480 –> 00:47:46,480
The value is reducing the search space.
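Reducing the search space from 500 variances to the 12 that matter is essentially clustering plus a materiality filter. A hedged sketch, where the reason codes and threshold are illustrative assumptions:

```python
# Cluster variances into known classes and surface only material clusters.
from collections import defaultdict

def triage_exceptions(variances, materiality=1000.0):
    clusters = defaultdict(list)
    for v in variances:
        clusters[v["reason_code"]].append(v)
    # Only clusters whose total absolute variance is material reach a human.
    return {
        code: items for code, items in clusters.items()
        if sum(abs(i["amount"]) for i in items) >= materiality
    }
```

The point is not the clustering algorithm; it is that the controller opens a queue of a dozen grouped items instead of a flat list of hundreds.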
1278
00:47:46,480 –> 00:47:49,120
Third, auto-resolution for known mismatch classes,
1279
00:47:49,120 –> 00:47:51,840
but only when the execution contract permits it.
1280
00:47:51,840 –> 00:47:53,920
For example, reclassifying transactions
1281
00:47:53,920 –> 00:47:55,760
that meet explicit criteria,
1282
00:47:55,760 –> 00:47:57,120
generating correcting entries
1283
00:47:57,120 –> 00:47:58,960
that are pre-approved under policy
1284
00:47:58,960 –> 00:48:00,720
or preparing a journal entry package
1285
00:48:00,720 –> 00:48:02,960
that is complete and ready for human approval
1286
00:48:02,960 –> 00:48:06,080
when the action crosses a sensitivity threshold.
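The split between auto-resolution and human approval can be expressed as a small contract check. The class list and sensitivity threshold below are illustrative assumptions, not recommended values:

```python
# Execution-contract gate: propose anything, execute only inside the contract.
PRE_APPROVED_CLASSES = {"reclassification", "timing_correction"}  # assumed policy
AUTO_POST_LIMIT = 500.0  # assumed sensitivity threshold in ledger currency

def decide_action(adjustment):
    """Return 'execute' only for pre-approved classes under the threshold;
    everything else becomes a complete package for human approval."""
    if (adjustment["class"] in PRE_APPROVED_CLASSES
            and abs(adjustment["amount"]) <= AUTO_POST_LIMIT):
        return "execute"
    return "prepare_package_for_human_approval"
```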
1287
00:48:06,080 –> 00:48:08,880
And the blunt line for this section needs to land cleanly,
1288
00:48:08,880 –> 00:48:11,840
autonomy that can’t explain itself is not automation,
1289
00:48:11,840 –> 00:48:12,960
it’s liability.
1290
00:48:12,960 –> 00:48:15,200
Because a finance agent that says trust me
1291
00:48:15,200 –> 00:48:18,240
is just a faster way to create untraceable adjustments.
1292
00:48:18,240 –> 00:48:20,560
The agent must behave like a disciplined analyst.
1293
00:48:20,560 –> 00:48:22,160
Every number tied to a source.
1294
00:48:22,160 –> 00:48:23,760
Every transformation documented,
1295
00:48:23,760 –> 00:48:25,200
every decision bound to policy
1296
00:48:25,200 –> 00:48:27,280
and every action gated by approval
1297
00:48:27,280 –> 00:48:29,840
when the consequences exceed the autonomy boundary.
1298
00:48:29,840 –> 00:48:31,920
Now, let’s be precise about systems
1299
00:48:31,920 –> 00:48:34,240
touched without pretending we’re doing a product tour.
1300
00:48:34,240 –> 00:48:36,480
Finance reconciliation in a Microsoft enterprise
1301
00:48:36,480 –> 00:48:38,320
will touch at least three surfaces.
1302
00:48:38,320 –> 00:48:40,320
The system of record, the collaboration layer,
1303
00:48:40,320 –> 00:48:41,680
and identity context.
1304
00:48:41,680 –> 00:48:44,880
The system of record is your ERP and its satellites,
1305
00:48:44,880 –> 00:48:48,000
where transactions live and where adjustments ultimately land.
1306
00:48:48,000 –> 00:48:50,960
The collaboration layer is Microsoft 365.
1307
00:48:50,960 –> 00:48:53,680
Excel files, SharePoint or OneDrive stores,
1308
00:48:53,680 –> 00:48:56,080
Teams conversations, email threads that become
1309
00:48:56,080 –> 00:48:58,400
approvals in practice even when they shouldn’t.
1310
00:48:58,400 –> 00:49:00,320
And identity context is Entra:
1311
00:49:00,320 –> 00:49:01,840
who is authorized to view,
1312
00:49:01,840 –> 00:49:03,360
who is authorized to propose,
1313
00:49:03,360 –> 00:49:04,480
who is authorized to post,
1314
00:49:04,480 –> 00:49:07,440
and what segregation of duties rules must remain true,
1315
00:49:07,440 –> 00:49:09,360
even when an agent is doing the legwork.
1316
00:49:09,360 –> 00:49:11,920
And this is where the autonomy stack becomes unavoidable.
1317
00:49:11,920 –> 00:49:13,280
Events are the close calendar,
1318
00:49:13,280 –> 00:49:14,320
the arrival of feeds,
1319
00:49:14,320 –> 00:49:16,560
the detection of variances beyond tolerance,
1320
00:49:16,560 –> 00:49:18,800
reasoning is classification and policy mapping.
1321
00:49:18,800 –> 00:49:21,360
Orchestration is dispatching specialized matches
1322
00:49:21,360 –> 00:49:22,560
and anomaly agents.
1323
00:49:22,560 –> 00:49:24,560
Action is creating the adjustment package
1324
00:49:24,560 –> 00:49:26,880
or posting within a constrained scope if allowed.
1325
00:49:26,880 –> 00:49:29,280
Evidence is the entire point.
1326
00:49:29,280 –> 00:49:33,280
A replayable reconciliation run that survives hostile review.
1327
00:49:33,280 –> 00:49:35,680
So scenario two sets up the real enterprise question.
1328
00:49:35,680 –> 00:49:37,200
IT autonomy fails loudly.
1329
00:49:37,200 –> 00:49:39,200
Finance autonomy fails quietly.
1330
00:49:39,200 –> 00:49:40,400
And that’s why in the next section,
1331
00:49:40,400 –> 00:49:42,160
the product we design isn’t the agent.
1332
00:49:42,160 –> 00:49:43,520
It’s the audit trail.
1333
00:49:43,520 –> 00:49:45,920
Scenario two, evidence first design.
1334
00:49:45,920 –> 00:49:47,760
Audit trails as the product.
1335
00:49:47,760 –> 00:49:50,880
Finance autonomy only works when the evidence trail is treated
1336
00:49:50,880 –> 00:49:53,600
as a first class deliverable, not a side effect.
1337
00:49:53,600 –> 00:49:55,280
Most implementations do the opposite.
1338
00:49:55,280 –> 00:49:56,880
They build the reconciliation logic,
1339
00:49:56,880 –> 00:49:58,160
they wire up the connectors,
1340
00:49:58,160 –> 00:50:00,080
they generate a looks right summary,
1341
00:50:00,080 –> 00:50:01,280
and then someone says,
1342
00:50:01,280 –> 00:50:02,800
“We’ll add audit later.”
1343
00:50:02,800 –> 00:50:04,720
"Audit later" is how you end up with an agent
1344
00:50:04,720 –> 00:50:06,800
that can move numbers without leaving fingerprints.
1345
00:50:06,800 –> 00:50:07,920
That is not innovation,
1346
00:50:07,920 –> 00:50:09,920
that is a governance incident with better branding.
1347
00:50:09,920 –> 00:50:12,560
So in this scenario, the product is the audit trail.
1348
00:50:12,560 –> 00:50:14,800
The reconciliation result is just the byproduct
1349
00:50:14,800 –> 00:50:16,080
that makes finance care.
1350
00:50:16,080 –> 00:50:17,920
Start with the required artifacts
1351
00:50:17,920 –> 00:50:20,080
because finance doesn’t accept vibes as proof.
1352
00:50:20,080 –> 00:50:24,160
Every matched or adjusted item needs source references,
1353
00:50:24,160 –> 00:50:25,440
the transformations applied,
1354
00:50:25,440 –> 00:50:27,200
the rationale and the approval context,
1355
00:50:27,200 –> 00:50:30,080
not as prose, but as structured, linkable objects:
1356
00:50:30,080 –> 00:50:31,360
a bank line item ID,
1357
00:50:31,360 –> 00:50:32,800
an ERP document ID,
1358
00:50:32,800 –> 00:50:34,800
a file hash or SharePoint version ID
1359
00:50:34,800 –> 00:50:36,160
for the supporting schedule,
1360
00:50:36,160 –> 00:50:37,760
and a pointer to the policy clause
1361
00:50:37,760 –> 00:50:39,280
that authorizes the treatment.
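Structured, linkable evidence, as opposed to prose, might look like the following sketch. The field names are assumptions chosen to mirror the artifacts just listed:

```python
# A structured evidence record: every claim links back to a source.
from dataclasses import dataclass

@dataclass(frozen=True)
class EvidenceRecord:
    bank_line_id: str          # source reference in the bank feed
    erp_document_id: str       # source reference in the system of record
    schedule_version_id: str   # file hash or SharePoint version of the schedule
    policy_clause: str         # pointer to the clause authorizing the treatment
    transformations: tuple     # ordered, replayable steps applied
    rationale: str             # narrative, grounded in the fields above

    def is_grounded(self) -> bool:
        """A record is only usable if every link is present."""
        return all([self.bank_line_id, self.erp_document_id,
                    self.schedule_version_id, self.policy_clause])
```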
1362
00:50:39,280 –> 00:50:41,840
If the agent can’t point back to exactly what it used,
1363
00:50:41,840 –> 00:50:43,680
it can’t claim it reconciled anything.
1364
00:50:43,680 –> 00:50:46,160
It just predicted what reconciliation might look like.
1365
00:50:46,160 –> 00:50:47,200
That distinction matters
1366
00:50:47,200 –> 00:50:50,320
because large language models are inherently probabilistic.
1367
00:50:50,320 –> 00:50:52,000
They generate plausible explanations
1368
00:50:52,000 –> 00:50:55,040
unless you force them to operate under grounding constraints.
1369
00:50:55,040 –> 00:50:57,200
In finance, plausible is the enemy.
1370
00:50:57,200 –> 00:50:59,360
So grounding discipline becomes non-negotiable.
1371
00:50:59,360 –> 00:51:00,800
This is not a web search problem.
1372
00:51:00,800 –> 00:51:04,080
This is not "let's ask the internet what GAAP says."
1373
00:51:04,080 –> 00:51:07,440
The agent must operate on controlled enterprise data sources
1374
00:51:07,440 –> 00:51:09,200
with deterministic access boundaries.
1375
00:51:09,200 –> 00:51:10,960
If the system of record is the ERP,
1376
00:51:10,960 –> 00:51:12,640
then the agent reads the ERP.
1377
00:51:12,640 –> 00:51:15,280
If supporting documentation lives in SharePoint,
1378
00:51:15,280 –> 00:51:16,960
then it reads specific libraries
1379
00:51:16,960 –> 00:51:19,360
with specific labels under specific scopes.
1380
00:51:19,360 –> 00:51:20,720
And when it produces a narrative,
1381
00:51:20,720 –> 00:51:21,760
it cites those sources
1382
00:51:21,760 –> 00:51:23,360
as if a hostile reviewer will check them,
1383
00:51:23,360 –> 00:51:24,160
because they will.
1384
00:51:24,160 –> 00:51:27,200
Now, orchestration. Finance reconciliation looks like one workflow,
1385
00:51:27,200 –> 00:51:29,440
but it’s really a set of specialist behaviors
1386
00:51:29,440 –> 00:51:31,200
coordinated under a strict contract.
1387
00:51:31,200 –> 00:51:33,840
You typically want at least three conceptual agents,
1388
00:51:33,840 –> 00:51:37,120
even if they’re implemented as one service. A matching specialist.
1389
00:51:37,120 –> 00:51:40,160
It performs deterministic joins and pattern matches
1390
00:51:40,160 –> 00:51:42,640
with tolerances and rules that are explicit.
1391
00:51:42,640 –> 00:51:44,480
It should prefer boring, explainable logic
1392
00:51:44,480 –> 00:51:46,000
over reasoning whenever possible
1393
00:51:46,000 –> 00:51:49,040
because deterministic matching produces auditability by default.
1394
00:51:49,040 –> 00:51:51,360
An anomaly specialist.
1395
00:51:51,360 –> 00:51:54,000
It clusters exceptions into known classes,
1396
00:51:54,000 –> 00:51:56,560
prioritizes by materiality and risk,
1397
00:51:56,560 –> 00:51:59,440
and flags what cannot be resolved automatically.
1398
00:51:59,440 –> 00:52:02,160
The goal is not to generate a longer exception list.
1399
00:52:02,160 –> 00:52:04,880
The goal is to reduce the controller’s search space.
1400
00:52:04,880 –> 00:52:06,080
A policy specialist.
1401
00:52:06,080 –> 00:52:08,560
It maps proposed adjustments to policy.
1402
00:52:08,560 –> 00:52:10,720
Segregation of duties, approval thresholds,
1403
00:52:10,720 –> 00:52:13,600
materiality rules, and whatever your organization enforces.
1404
00:52:13,600 –> 00:52:15,520
This is where the autonomy boundary lives.
1405
00:52:15,520 –> 00:52:18,000
In finance, the system can propose broadly,
1406
00:52:18,000 –> 00:52:20,000
but it can only execute narrowly
1407
00:52:20,000 –> 00:52:22,480
and only with the approvals the policy requires.
1408
00:52:22,480 –> 00:52:24,080
Then a coordinator ties them together
1409
00:52:24,080 –> 00:52:25,440
and produces a run artifact,
1410
00:52:25,440 –> 00:52:27,360
and that run artifact has to be replayable.
1411
00:52:27,360 –> 00:52:29,360
Replayability is the thing most teams skip
1412
00:52:29,360 –> 00:52:30,800
because it feels like extra work.
1413
00:52:30,800 –> 00:52:31,760
It is not extra work.
1414
00:52:31,760 –> 00:52:34,640
It is the only mechanism that converts agent output
1415
00:52:34,640 –> 00:52:37,040
into operationally defensible automation.
1416
00:52:37,040 –> 00:52:39,440
Replay means you can take the same inputs,
1417
00:52:39,440 –> 00:52:40,800
the same source extracts,
1418
00:52:40,800 –> 00:52:43,600
the same versions of files, the same policy rule set,
1419
00:52:43,600 –> 00:52:46,160
and rerun the logic to get the same outcome.
1420
00:52:46,160 –> 00:52:48,960
Or if the outcome changes, you can prove why.
1421
00:52:48,960 –> 00:52:51,600
A data change, a policy change, or a tool version change.
1422
00:52:51,600 –> 00:52:53,840
Without replay, post-mortems become storytelling.
1423
00:52:53,840 –> 00:52:55,440
Finance doesn’t tolerate storytelling.
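Replay starts with fingerprinting the inputs, the policy rule set, and the tool version, so a changed outcome can be attributed to exactly one of those dimensions. A minimal sketch, assuming JSON-serializable extracts:

```python
# Fingerprint a reconciliation run so drift can be attributed, not narrated.
import hashlib, json

def run_fingerprint(source_extracts: dict, policy_version: str, tool_version: str) -> str:
    payload = json.dumps(
        {"inputs": source_extracts, "policy": policy_version, "tool": tool_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def explain_drift(fp_then, fp_now, then_meta, now_meta):
    """If fingerprints differ, the metadata names which dimension moved."""
    if fp_then == fp_now:
        return "identical run: same inputs, policy, and tool"
    return [k for k in ("inputs", "policy", "tool") if then_meta[k] != now_meta[k]]
```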
1424
00:52:55,440 –> 00:52:56,720
So what does the agent produce?
1425
00:52:56,720 –> 00:52:59,200
It produces variance packs and exception queues
1426
00:52:59,200 –> 00:53:01,840
that look like finance work product, not AI output.
1427
00:53:01,840 –> 00:53:04,560
A variance pack that includes the matched sets,
1428
00:53:04,560 –> 00:53:08,080
the unmatched sets, the transformation steps, and the rationale.
1429
00:53:08,080 –> 00:53:10,640
An exception queue that includes reason codes,
1430
00:53:10,640 –> 00:53:12,160
suggested remediation steps,
1431
00:53:12,160 –> 00:53:14,720
and the minimum approval required to resolve it.
1432
00:53:14,720 –> 00:53:16,800
And it produces controller-ready narratives
1433
00:53:16,800 –> 00:53:18,080
that are grounded.
1434
00:53:18,080 –> 00:53:20,240
Every claim backed by a linked source reference.
1435
00:53:20,240 –> 00:53:23,200
Now metrics because you’ll be asked to justify this.
1436
00:53:23,200 –> 00:53:25,600
Time to close for the reconciliation cycle is obvious.
1437
00:53:25,600 –> 00:53:26,400
But it’s not enough.
1438
00:53:26,400 –> 00:53:28,960
You track error rate versus human baselines,
1439
00:53:28,960 –> 00:53:31,760
because autonomy that is faster but wrong is not autonomy.
1440
00:53:31,760 –> 00:53:33,440
You track exception backlog aging
1441
00:53:33,440 –> 00:53:35,440
because the goal is to shrink the long tail
1442
00:53:35,440 –> 00:53:37,600
that drags close past the calendar.
1443
00:53:37,600 –> 00:53:39,200
And you track intervention rate.
1444
00:53:39,200 –> 00:53:41,520
How often did humans have to rewrite the rationale?
1445
00:53:41,520 –> 00:53:42,720
Not just approve the package.
1446
00:53:42,720 –> 00:53:44,240
Because if humans keep rewriting it,
1447
00:53:44,240 –> 00:53:46,000
you didn’t automate reconciliation.
1448
00:53:46,000 –> 00:53:47,680
You automated draft generation.
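The intervention-rate metric is worth making concrete, because it separates automated reconciliation from automated draft generation. A sketch, where the package status field is an assumption:

```python
# Intervention rate: approvals where a human rewrote the rationale count
# against automation, not for it.
def intervention_rate(packages):
    """Fraction of packages whose rationale was rewritten, not just approved."""
    if not packages:
        return 0.0
    rewritten = sum(1 for p in packages if p["rationale_rewritten"])
    return rewritten / len(packages)
```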
1449
00:53:47,680 –> 00:53:50,080
And once you build evidence first,
1450
00:53:50,080 –> 00:53:51,600
you also get a hidden benefit.
1451
00:53:51,600 –> 00:53:52,960
Blast radius containment.
1452
00:53:52,960 –> 00:53:54,800
If every action is tied to a policy clause
1453
00:53:54,800 –> 00:53:56,080
and an approval state,
1454
00:53:56,080 –> 00:53:58,880
the system can’t quietly just post the entry.
1455
00:53:58,880 –> 00:54:00,480
It either has the authority and evidence
1456
00:54:00,480 –> 00:54:02,320
or it escalates with a complete package.
1457
00:54:02,320 –> 00:54:04,960
That’s the autonomy boundary, but finance flavoured.
1458
00:54:04,960 –> 00:54:07,600
And it’s the only version that survives audit season.
1459
00:54:07,600 –> 00:54:09,120
Scenario three, setup.
1460
00:54:09,120 –> 00:54:11,680
Security incident triage without SOC collapse.
1461
00:54:11,680 –> 00:54:14,240
Security is where autonomy stops being a throughput discussion
1462
00:54:14,240 –> 00:54:16,000
and becomes an adversarial one.
1463
00:54:16,000 –> 00:54:18,400
IT remediation fights entropy.
1464
00:54:18,400 –> 00:54:19,920
Finance fights scrutiny.
1465
00:54:19,920 –> 00:54:22,080
Security fights an opponent that adapts.
1466
00:54:22,080 –> 00:54:26,160
And that’s why SOC collapse is the most honest autonomy use case you can pick.
1467
00:54:26,160 –> 00:54:29,520
Because the baseline operating model is already broken in most enterprises,
1468
00:54:29,520 –> 00:54:32,080
alert volume grows faster than analyst headcount.
1469
00:54:32,080 –> 00:54:33,520
Fidelity stays mediocre.
1470
00:54:33,520 –> 00:54:37,200
And every new tool adds another stream of signals that mostly become noise.
1471
00:54:37,200 –> 00:54:40,160
So analysts spend their day routing, enriching,
1472
00:54:40,160 –> 00:54:42,960
and writing summaries that don’t prevent the next incident.
1473
00:54:42,960 –> 00:54:43,920
The queue doesn’t shrink.
1474
00:54:43,920 –> 00:54:45,280
It churns.
1475
00:54:45,280 –> 00:54:46,640
Defender produces alerts.
1476
00:54:46,640 –> 00:54:48,080
Sentinel produces incidents.
1477
00:54:48,080 –> 00:54:50,000
Identity produces risk events.
1478
00:54:50,000 –> 00:54:51,840
Endpoint telemetry produces anomalies.
1479
00:54:51,840 –> 00:54:53,280
Cloud produces activity logs.
1480
00:54:53,280 –> 00:54:54,800
None of those are inherently wrong.
1481
00:54:54,800 –> 00:54:57,120
The failure is the human bottleneck in the middle.
1482
00:54:57,120 –> 00:55:00,640
A small team forced to do correlation and enrichment manually
1483
00:55:00,640 –> 00:55:04,240
at the exact moment the environment requires speed and consistency.
1484
00:55:04,240 –> 00:55:06,640
So the baseline workflow looks like this.
1485
00:55:06,640 –> 00:55:10,320
Triage, enrich, correlate, decide, contain, document.
1486
00:55:10,320 –> 00:55:12,080
And everyone pretends it’s a linear process.
1487
00:55:12,080 –> 00:55:12,560
It isn’t.
1488
00:55:12,560 –> 00:55:13,360
It’s a loop.
1489
00:55:13,360 –> 00:55:16,400
Analysts bounce between portals, copy identifiers,
1490
00:55:16,400 –> 00:55:20,320
search for context, and rebuild the same mental model of what happened every time.
1491
00:55:20,320 –> 00:55:22,320
The attacker gets parallelism.
1492
00:55:22,320 –> 00:55:24,000
The defenders get a ticketing queue.
1493
00:55:24,000 –> 00:55:25,360
That asymmetry is the point.
1494
00:55:25,360 –> 00:55:27,600
So when people ask what autonomy is good for,
1495
00:55:27,600 –> 00:55:29,440
security has the cleanest answer.
1496
00:55:29,440 –> 00:55:32,400
Autonomy buys you parallelism under policy.
1497
00:55:32,400 –> 00:55:34,720
It lets you do the mechanical work at machine speed,
1498
00:55:34,720 –> 00:55:37,520
correlation, enrichment, scoping, and low risk containment.
1499
00:55:37,520 –> 00:55:42,080
So humans spend their limited attention on the weird cases that actually require judgment.
1500
00:55:42,080 –> 00:55:44,800
But the autonomy boundary here is brutally non-negotiable.
1501
00:55:44,800 –> 00:55:47,200
A security agent doesn’t get to improvise containment.
1502
00:55:47,200 –> 00:55:48,640
It doesn’t get to try something.
1503
00:55:48,640 –> 00:55:52,240
It doesn’t get to block identities or isolate devices because it feels right.
1504
00:55:52,240 –> 00:55:55,200
It acts only under policy with pre-approved actions,
1505
00:55:55,200 –> 00:55:58,720
bounded scopes, and evidence thresholds that are defined ahead of time.
1506
00:55:58,720 –> 00:56:01,280
Otherwise, you build the most dangerous thing possible.
1507
00:56:01,280 –> 00:56:04,320
An actor in your tenant with the power to disrupt business operations
1508
00:56:04,320 –> 00:56:06,960
guided by probabilistic reasoning during high stress.
1509
00:56:06,960 –> 00:56:10,160
So the agentic objective in this scenario is narrow by design.
1510
00:56:10,160 –> 00:56:12,640
Correlate alerts into coherent narratives.
1511
00:56:12,640 –> 00:56:15,760
Assess blast radius with real signals, not vibes.
1512
00:56:15,760 –> 00:56:19,440
Contain low to medium risk incidents where policy already defines the response
1513
00:56:19,440 –> 00:56:22,640
and generate investigation summaries that humans can trust and replay.
1514
00:56:22,640 –> 00:56:25,040
That means the agent becomes a triage engine
1515
00:56:25,040 –> 00:56:28,480
and a response executor for the boring repeatable cases.
1516
00:56:28,480 –> 00:56:31,600
Suspicious sign-ins with clear identity risk signals,
1517
00:56:31,600 –> 00:56:35,200
commodity malware on endpoints where isolation is already standard,
1518
00:56:35,200 –> 00:56:38,160
impossible travel combined with high-confidence phishing,
1519
00:56:38,160 –> 00:56:41,280
known bad tokens, known bad device posture.
1520
00:56:41,280 –> 00:56:44,640
It handles the class of incidents where humans currently waste time
1521
00:56:44,640 –> 00:56:47,120
doing the same steps and it escalates everything else.
1522
00:56:47,120 –> 00:56:50,320
Now the thing most people miss is that security autonomy fails first
1523
00:56:50,320 –> 00:56:52,080
when identity is an afterthought,
1524
00:56:52,080 –> 00:56:55,360
because containment is mostly identity and access control actions.
1525
00:56:55,360 –> 00:56:58,560
Revoke sessions, reset passwords, disable accounts,
1526
00:56:58,560 –> 00:57:01,040
block tokens, tighten conditional access,
1527
00:57:01,040 –> 00:57:03,040
remove risky app consent.
1528
00:57:03,040 –> 00:57:05,280
If you can’t express those actions as bounded,
1529
00:57:05,280 –> 00:57:08,480
auditable operations under explicit identity constraints,
1530
00:57:08,480 –> 00:57:10,080
you don’t have autonomous response.
1531
00:57:10,080 –> 00:57:11,440
You have automated self-harm.
1532
00:57:11,440 –> 00:57:13,600
So you need a hard boundary.
1533
00:57:13,600 –> 00:57:15,280
The agent can recommend broadly,
1534
00:57:15,280 –> 00:57:18,480
but it can only execute in the lanes you’ve made deterministic.
1535
00:57:18,480 –> 00:57:22,240
And the evidence requirement must be higher than "the model is confident."
1536
00:57:22,240 –> 00:57:24,800
It has to be: these signals match this response class
1537
00:57:24,800 –> 00:57:27,280
under this policy clause within this scope.
1538
00:57:27,280 –> 00:57:29,200
And the payoff signal for the audience is simple.
1539
00:57:29,200 –> 00:57:31,200
The problem isn’t building the containment action.
1540
00:57:31,200 –> 00:57:32,480
Microsoft gives you actions.
1541
00:57:32,480 –> 00:57:35,600
The problem is deciding when the system is allowed to execute them,
1542
00:57:35,600 –> 00:57:38,400
under which identity and how you prove it didn’t overreach.
1543
00:57:38,400 –> 00:57:42,000
Because the SOC doesn’t get judged by how fast it can generate a summary,
1544
00:57:42,000 –> 00:57:44,000
it gets judged by whether it contained the right thing
1545
00:57:44,000 –> 00:57:45,120
without breaking the business.
1546
00:57:45,120 –> 00:57:49,040
So this scenario is where the autonomy stack becomes visibly real.
1547
00:57:49,040 –> 00:57:51,280
Event ingestion is alerts and incidents,
1548
00:57:51,280 –> 00:57:54,240
reasoning is correlation and classification under policy.
1549
00:57:54,240 –> 00:57:58,160
Orchestration is tool routing across Defender, Sentinel, and Entra,
1550
00:57:58,160 –> 00:58:00,560
action is containment with bounded permissions,
1551
00:58:00,560 –> 00:58:02,720
and evidence is the investigation record
1552
00:58:02,720 –> 00:58:05,280
that ties every step back to signals and policy.
1553
00:58:05,280 –> 00:58:08,000
And in the next section, we map it as an enforcement graph.
1554
00:58:08,000 –> 00:58:10,560
Defender detects, Sentinel correlates,
1555
00:58:10,560 –> 00:58:12,240
Entra enforces.
1556
00:58:12,240 –> 00:58:14,800
If those three aren’t wired into a coherent control plane,
1557
00:58:14,800 –> 00:58:17,760
autonomy won’t save the SOC, it will just accelerate the chaos.
1558
00:58:17,760 –> 00:58:21,520
Scenario three, system flow: Defender plus Sentinel
1559
00:58:21,520 –> 00:58:23,120
plus Entra as enforcement graph.
1560
00:58:23,120 –> 00:58:24,880
If scenario three is going to work,
1561
00:58:24,880 –> 00:58:26,560
it needs a real system flow.
1562
00:58:26,560 –> 00:58:28,160
Not "the agent checks Defender,"
1563
00:58:28,160 –> 00:58:29,680
not "it uses Sentinel."
1564
00:58:29,680 –> 00:58:32,880
A flow where each product plays its actual role in the enterprise,
1565
00:58:32,880 –> 00:58:36,320
Defender as signal source, Sentinel as correlation and case management,
1566
00:58:36,320 –> 00:58:40,400
Entra as the enforcement graph that turns decisions into bounded actions.
1567
00:58:40,400 –> 00:58:41,520
Start with ingestion.
1568
00:58:41,520 –> 00:58:46,880
Defender for endpoint and defender for office generate alerts with raw artifacts,
1569
00:58:46,880 –> 00:58:51,120
device IDs, user principals, process hashes, URLs, mailbox activity,
1570
00:58:51,120 –> 00:58:53,360
and whatever else the detection contains.
1571
00:58:53,360 –> 00:58:57,280
Sentinel ingests those alerts and also brings in everything defender doesn’t own.
1572
00:58:57,280 –> 00:59:00,480
Cloud activity logs, firewall events, identity risk events,
1573
00:59:00,480 –> 00:59:02,400
and third party sources if you have them.
1574
00:59:02,400 –> 00:59:05,040
The agent doesn’t treat this as “more data.”
1575
00:59:05,040 –> 00:59:06,800
It treats it as a graph problem,
1576
00:59:06,800 –> 00:59:09,280
which entities are involved, what relationships exist,
1577
00:59:09,280 –> 00:59:10,640
and what changed recently.
1578
00:59:10,640 –> 00:59:13,360
So the first move in the flow is normalization into entities.
1579
00:59:13,360 –> 00:59:16,640
User, device, app, mailbox, IP, token, session, tenant, resource.
1580
00:59:16,640 –> 00:59:19,200
If the system can’t map the alert to entities,
1581
00:59:19,200 –> 00:59:20,720
it should not execute anything.
1582
00:59:20,720 –> 00:59:23,840
It should escalate for human triage because it can’t bound scope.
1583
00:59:23,840 –> 00:59:25,840
Containment without scope is just disruption.
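The rule that an unmappable alert must escalate rather than execute can be made explicit. A sketch, where the required entity kinds are an assumption about what it takes to bound scope:

```python
# Normalization gate: no entities, no scope; no scope, no execution.
REQUIRED_ENTITY_KINDS = {"user", "device"}  # assumed minimum to bound scope

def normalize_or_escalate(alert: dict):
    entities = {e["kind"]: e["id"] for e in alert.get("entities", [])}
    if not REQUIRED_ENTITY_KINDS.issubset(entities):
        return {"decision": "escalate", "reason": "cannot bound scope"}
    return {"decision": "proceed", "entities": entities}
```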
1584
00:59:25,840 –> 00:59:30,080
Then comes reasoning, correlation and blast radius estimation.
1585
00:59:30,080 –> 00:59:31,600
This is where sentinel earns its role.
1586
00:59:31,600 –> 00:59:34,400
Sentinel already builds incidents and correlates signals.
1587
00:59:34,400 –> 00:59:36,800
The agent’s job is to query that correlation layer,
1588
00:59:36,800 –> 00:59:38,480
not to reinvent it with reasoning.
1589
00:59:38,480 –> 00:59:41,040
It should pull the incident graph,
1590
00:59:41,040 –> 00:59:45,040
related alerts, linked entities, timeline, known tactics,
1591
00:59:45,040 –> 00:59:46,640
and severity context.
1592
00:59:46,640 –> 00:59:48,960
Then it applies an execution contract decision.
1593
00:59:48,960 –> 00:59:52,000
Does this incident class have an approved autonomous response path?
1594
00:59:52,000 –> 00:59:54,480
That decision is not a vibe check, it’s policy.
1595
00:59:54,480 –> 00:59:59,840
Low to medium risk classes with clear response playbooks can be eligible.
1596
00:59:59,840 –> 01:00:02,800
Revoke sessions for a confirmed, risky sign-in,
1597
01:00:02,800 –> 01:00:05,680
isolated device for a high confidence malware alert,
1598
01:00:05,680 –> 01:00:09,120
block a known malicious URL through your existing controls,
1599
01:00:09,120 –> 01:00:12,960
disable a specific OAuth consent that matches a known bad pattern.
1600
01:00:12,960 –> 01:00:17,440
High risk or ambiguous cases get escalated with a complete evidence bundle.
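The execution contract decision, does this incident class have an approved autonomous response path, reduces to a policy lookup rather than model judgment. The class-to-action table below is illustrative, not a recommended playbook:

```python
# Policy lookup: only pre-approved incident classes run autonomously.
APPROVED_RESPONSES = {
    "risky_signin_confirmed": "revoke_sessions",
    "commodity_malware_high_confidence": "isolate_device",
    "known_bad_url": "block_url",
}

def autonomous_path(incident_class: str, risk: str):
    """Low/medium risk with an approved playbook may execute;
    everything else escalates with a complete evidence bundle."""
    if risk in ("low", "medium") and incident_class in APPROVED_RESPONSES:
        return APPROVED_RESPONSES[incident_class]
    return "escalate_with_evidence_bundle"
```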
1601
01:00:17,440 –> 01:00:19,760
Now orchestration tool routing.
1602
01:00:19,760 –> 01:00:23,680
This is the part that separates agent as chat from agent as system.
1603
01:00:23,680 –> 01:00:27,680
The agent routes work across a set of tools that already exist.
1604
01:00:27,680 –> 01:00:30,480
Defender APIs for endpoint and email actions,
1605
01:00:30,480 –> 01:00:33,440
Sentinel automation rules or playbooks for workflow,
1606
01:00:33,440 –> 01:00:35,280
Entra for identity enforcement,
1607
01:00:35,280 –> 01:00:37,600
and graph for communications and ticketing.
1608
01:00:37,600 –> 01:00:39,920
The key is that orchestration must be deterministic
1609
01:00:39,920 –> 01:00:42,560
about which tool is authoritative for which action.
1610
01:00:42,560 –> 01:00:45,280
You don’t revoke sessions through a random connector
1611
01:00:45,280 –> 01:00:46,880
if Entra is the enforcement point.
1612
01:00:46,880 –> 01:00:49,280
You don’t isolate devices through a custom script
1613
01:00:49,280 –> 01:00:52,720
if Defender already provides the actuator and the audit trail.
1614
01:00:52,720 –> 01:00:54,960
Orchestration chooses the canonical actuator
1615
01:00:54,960 –> 01:00:57,120
because that’s how you get predictable logs
1616
01:00:57,120 –> 01:00:58,480
and predictable rollback.
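Choosing the canonical actuator can be a hard-coded, auditable mapping rather than a reasoning step. A sketch, where the action names and the mapping itself are assumptions:

```python
# One authoritative tool per action class; refuse anything unmapped.
CANONICAL_ACTUATOR = {
    "revoke_sessions": "entra",
    "isolate_device": "defender",
    "update_incident": "sentinel",
    "notify_soc": "graph",
}

def route(action: str) -> str:
    actuator = CANONICAL_ACTUATOR.get(action)
    if actuator is None:
        # No canonical actuator means no improvised connector, full stop.
        raise ValueError(f"no canonical actuator for {action!r}; refuse to improvise")
    return actuator
```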
1617
01:00:58,480 –> 01:01:01,040
Then we hit action, and action should come in two tiers:
1618
01:01:01,040 –> 01:01:02,880
Containment and coordination.
1619
01:01:02,880 –> 01:01:05,760
Containment actions are the hard ones, session revoke,
1620
01:01:05,760 –> 01:01:09,920
password reset initiation, user disablement in narrow conditions,
1621
01:01:09,920 –> 01:01:13,680
device isolation, token blocking, OAuth app consent removal
1622
01:01:13,680 –> 01:01:16,000
and conditional access response patterns.
1623
01:01:16,000 –> 01:01:19,040
Coordination actions are everything that keeps humans aligned.
1624
01:01:19,040 –> 01:01:21,120
Create or update the Sentinel incident,
1625
01:01:21,120 –> 01:01:23,600
open the ITSM ticket if that’s your process,
1626
01:01:23,600 –> 01:01:25,680
notify the SOC channel in Teams,
1627
01:01:25,680 –> 01:01:28,240
and ping an on-call human only when thresholds say
1628
01:01:28,240 –> 01:01:29,840
the agent can’t close the loop.
1629
01:01:29,840 –> 01:01:32,240
Now the enforcement graph, Entra as the choke point,
1630
01:01:32,240 –> 01:01:34,320
this is where people get comfortable and then get hurt.
1631
01:01:34,320 –> 01:01:37,200
They treat Entra as identity, meaning login and users.
1632
01:01:37,200 –> 01:01:40,400
In reality, it is the decision engine for access across the tenant.
1633
01:01:40,400 –> 01:01:42,080
When the agent takes action, it should do it
1634
01:01:42,080 –> 01:01:43,760
through Entra controlled mechanisms,
1635
01:01:43,760 –> 01:01:47,200
revoking sessions, blocking sign-ins through conditional access,
1636
01:01:47,200 –> 01:01:50,400
where appropriate, adjusting entitlements through scoped roles
1637
01:01:50,400 –> 01:01:53,600
and ensuring the agent identity itself remains constrained.
1638
01:01:53,600 –> 01:01:56,160
And every action must run as a non-human principal
1639
01:01:56,160 –> 01:01:58,320
with explicit permissions, not global admin,
1640
01:01:58,320 –> 01:02:00,880
not security administrator because it was easier.
1641
01:02:00,880 –> 01:02:03,520
The system should have separate execution identities
1642
01:02:03,520 –> 01:02:05,280
for separate action classes,
1643
01:02:05,280 –> 01:02:07,440
because the moment one identity can do everything,
1644
01:02:07,440 –> 01:02:09,760
the blast radius becomes the entire tenant.
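Separate execution identities per action class can be checked mechanically before any tool call. The identity names and scopes below are hypothetical:

```python
# Per-class execution identities: each non-human principal holds only the
# permissions its action class needs (names and scopes are hypothetical).
IDENTITY_SCOPES = {
    "agent-identity-containment": {"revoke_sessions", "block_signin"},
    "agent-identity-endpoint": {"isolate_device"},
    "agent-identity-comms": {"notify_soc", "update_ticket"},
}

def authorized(identity: str, action: str) -> bool:
    return action in IDENTITY_SCOPES.get(identity, set())
```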
1645
01:02:09,760 –> 01:02:12,160
Again, worm mechanics, just in a blazer.
1646
01:02:12,160 –> 01:02:13,280
Finally, evidence.
1647
01:02:13,280 –> 01:02:16,720
Every run produces a replayable record.
1648
01:02:16,720 –> 01:02:19,360
The alert IDs, incident IDs, entity graph,
1649
01:02:19,360 –> 01:02:21,520
the policy clause that authorised the action,
1650
01:02:21,520 –> 01:02:24,240
the exact tool calls, the parameters, the identity used,
1651
01:02:24,240 –> 01:02:27,360
the verification checks, and the final state change.
1652
01:02:27,360 –> 01:02:28,960
And verification matters here.
1653
01:02:28,960 –> 01:02:32,720
Session revoked, confirmed. Device isolation state, confirmed.
1654
01:02:32,720 –> 01:02:36,400
Sign-in risk reduced, confirmed. Incident status updated, confirmed.
1655
01:02:36,400 –> 01:02:40,160
If a verification fails, the system doesn’t try harder indefinitely.
1656
01:02:40,160 –> 01:02:42,640
It escalates with the evidence bundle and it stops.
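That verify-or-escalate loop fits in a few lines. `execute`, `verify`, and `escalate` are stand-ins for whatever the runtime wires in, not real Defender or Sentinel calls; a sketch, not an implementation.

```python
# Hypothetical sketch of verify-then-escalate: perform the state change,
# confirm actual state, and on failed confirmation hand the evidence
# bundle to a human and stop -- no indefinite retries.
def run_action(action, execute, verify, escalate, max_attempts=2):
    evidence = []
    for attempt in range(1, max_attempts + 1):
        result = execute()        # perform the state change
        confirmed = verify()      # confirm actual state, not intent
        evidence.append({"action": action, "attempt": attempt,
                         "result": result, "confirmed": confirmed})
        if confirmed:
            return {"status": "closed", "evidence": evidence}
    escalate(evidence)            # escalate with the full evidence bundle
    return {"status": "escalated", "evidence": evidence}
```

Note the run never ends in silence: it either closes with evidence or escalates with evidence.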
1657
01:02:42,640 –> 01:02:45,840
So the system flow is simple to say, but hard to implement cleanly.
1658
01:02:45,840 –> 01:02:48,160
Defender detects, Sentinel correlates,
1659
01:02:48,160 –> 01:02:52,480
Entra enforces, and the agent sits in the middle as an orchestrator under contract.
1660
01:02:52,480 –> 01:02:54,880
If you can’t draw that graph and name the boundaries,
1661
01:02:54,880 –> 01:02:56,320
you don’t have autonomous triage,
1662
01:02:56,320 –> 01:02:58,560
you have conditional chaos with security branding.
1663
01:02:58,560 –> 01:03:03,440
The limiting factor, identity debt and authorisation sprawl.
1664
01:03:03,440 –> 01:03:06,720
All three scenarios hit the same wall and it’s not model quality,
1665
01:03:06,720 –> 01:03:09,600
it’s not agent memory, it’s not orchestration patterns,
1666
01:03:09,600 –> 01:03:12,800
it’s identity debt. Identity debt is the inevitable accumulation
1667
01:03:12,800 –> 01:03:14,720
of non-human operators and entitlements
1668
01:03:14,720 –> 01:03:16,960
that your organization cannot explain anymore,
1669
01:03:16,960 –> 01:03:18,880
but still depends on to function.
1670
01:03:18,880 –> 01:03:22,080
Service principals, managed identities, app registrations,
1671
01:03:22,080 –> 01:03:24,240
connector identities, delegated permissions,
1672
01:03:24,240 –> 01:03:26,960
certificates, secrets, conditional access exceptions,
1673
01:03:26,960 –> 01:03:30,960
break glass accounts, temporary admin roles that never got removed.
1674
01:03:30,960 –> 01:03:34,000
This clicked for a lot of architects when agents showed up
1675
01:03:34,000 –> 01:03:35,840
because agents don’t just consume permissions,
1676
01:03:35,840 –> 01:03:37,040
they operationalize them.
1677
01:03:37,040 –> 01:03:39,920
A human with broad access is a risk,
1678
01:03:39,920 –> 01:03:43,440
but it’s a bounded risk: attention, fatigue, and work hours
1679
01:03:43,440 –> 01:03:44,720
limit blast radius.
1680
01:03:44,720 –> 01:03:47,520
An autonomous executor with broad access is different,
1681
01:03:47,520 –> 01:03:50,560
it can apply that access continuously in parallel
1682
01:03:50,560 –> 01:03:53,680
and without the psychological friction that makes humans hesitate,
1683
01:03:53,680 –> 01:03:56,480
so identity debt is not accidental, it is guaranteed.
1684
01:03:56,480 –> 01:04:00,400
Autonomy makes it visible because it forces you to name the actor.
1685
01:04:00,400 –> 01:04:03,760
Every time you build an agent that does things,
1686
01:04:03,760 –> 01:04:05,280
you must pick an identity,
1687
01:04:05,280 –> 01:04:08,080
and every identity you add expands the authorisation graph,
1688
01:04:08,080 –> 01:04:11,280
new assignments, new scopes, new conditional logic, new exceptions,
1689
01:04:11,280 –> 01:04:12,720
these pathways accumulate.
1690
01:04:13,360 –> 01:04:15,440
This is the foundational misunderstanding.
1691
01:04:15,440 –> 01:04:18,560
Most organizations still treat Entra as an identity provider.
1692
01:04:18,560 –> 01:04:19,520
They are wrong.
1693
01:04:19,520 –> 01:04:22,960
In architectural terms, Entra is a distributed decision engine.
1694
01:04:22,960 –> 01:04:25,600
It continuously compiles policy, role assignments,
1695
01:04:25,600 –> 01:04:27,840
device posture, risk signals, token claims,
1696
01:04:27,840 –> 01:04:30,960
and application constraints into real-time authorisation outcomes.
1697
01:04:30,960 –> 01:04:32,480
And once you introduce agents,
1698
01:04:32,480 –> 01:04:35,440
you’re feeding that engine a new species of principal,
1699
01:04:35,440 –> 01:04:38,400
non-human actors that behave like staff but scale like software.
1700
01:04:38,400 –> 01:04:41,280
That distinction matters because the enterprise typically governs
1701
01:04:41,280 –> 01:04:43,680
human identities with social process.
1702
01:04:43,680 –> 01:04:47,200
Onboarding, role changes, manager approvals, quarterly reviews.
1703
01:04:47,200 –> 01:04:50,640
It governs app identities with whatever happened during the project.
1704
01:04:50,640 –> 01:04:52,160
That’s where identity debt comes from,
1705
01:04:52,160 –> 01:04:54,800
not misconfiguration, but design omission.
1706
01:04:54,800 –> 01:04:57,600
Now add authorisation sprawl.
1707
01:04:57,600 –> 01:04:59,760
Autonomous work is rarely one permission.
1708
01:04:59,760 –> 01:05:00,960
It’s a multi-step chain,
1709
01:05:00,960 –> 01:05:04,560
read telemetry, update a ticket, pull a file, call an API,
1710
01:05:04,560 –> 01:05:08,000
write a config change, post a notification, verify health.
1711
01:05:08,000 –> 01:05:09,520
Each step has a permission surface,
1712
01:05:09,520 –> 01:05:11,680
and you have to grant enough capability for the agent
1713
01:05:11,680 –> 01:05:12,960
to complete the chain.
1714
01:05:12,960 –> 01:05:16,400
Over time, the safest path becomes “just give it a bigger role.”
1715
01:05:16,400 –> 01:05:18,400
And that’s where RBAC starts lying to you.
1716
01:05:18,400 –> 01:05:22,240
RBAC roles tend to be static bundles designed around human job functions.
1717
01:05:22,240 –> 01:05:23,680
Agents don’t have job functions.
1718
01:05:23,680 –> 01:05:26,480
They have task graphs. A task graph crosses roles.
1719
01:05:26,480 –> 01:05:28,960
It crosses systems, it crosses environments.
1720
01:05:28,960 –> 01:05:32,560
It also changes over time because the easiest way to evolve an agent
1721
01:05:32,560 –> 01:05:35,360
is to add one more tool and one more action.
1722
01:05:35,360 –> 01:05:38,800
So you end up with a mismatch, static roles versus dynamic execution.
1723
01:05:38,800 –> 01:05:41,760
The organisation tries to solve that mismatch with exceptions.
1724
01:05:41,760 –> 01:05:45,120
Conditional access excludes the agent because the run broke.
1725
01:05:45,120 –> 01:05:47,120
A resource group gets a broader role assignment
1726
01:05:47,120 –> 01:05:49,120
because a remediation failed at 2am.
1727
01:05:49,120 –> 01:05:52,560
A connector gets tenant-wide read because a dataset wasn’t available.
1728
01:05:52,560 –> 01:05:54,000
Each exception feels small.
1729
01:05:54,000 –> 01:05:56,000
Each exception is an entropy generator.
1730
01:05:56,000 –> 01:05:58,160
And the real danger isn’t the obvious gap.
1731
01:05:58,160 –> 01:06:01,360
It’s the ambiguity you create when the same agent behaves differently
1732
01:06:01,360 –> 01:06:04,160
across context because policy drift has accumulated.
1733
01:06:04,160 –> 01:06:07,120
Deterministic intent becomes probabilistic behaviour.
1734
01:06:07,120 –> 01:06:10,240
You can’t predict what the agent can do anymore because the authorisation graph
1735
01:06:10,240 –> 01:06:12,560
has become a patchwork of historical compromises.
1736
01:06:12,560 –> 01:06:16,000
This is why identity debt unwinds slower than it accrues.
1737
01:06:16,000 –> 01:06:20,720
It accrues at project speed, one sprint, one fix, one temporary permission.
1738
01:06:20,720 –> 01:06:25,120
It unwinds at audit speed, inventory, review, re-approval, remediation
1739
01:06:25,120 –> 01:06:29,360
and political negotiation with every team that depends on the thing you’re trying to remove.
1740
01:06:29,360 –> 01:06:32,240
And in an agentic enterprise, the identities don’t just sit there.
1741
01:06:32,240 –> 01:06:35,040
They execute, they touch data, they change state,
1742
01:06:35,040 –> 01:06:39,360
they create evidence trails that ironically prove the access is being used
1743
01:06:39,360 –> 01:06:42,320
which makes it harder to decommission because now it’s critical.
1744
01:06:42,320 –> 01:06:46,000
So the limiting factor in autonomy isn’t whether the agent can plan.
1745
01:06:46,000 –> 01:06:50,000
It’s whether you can constrain execution without collapsing the workflow.
1746
01:06:50,000 –> 01:06:55,200
If you can’t express least privilege as an execution contract that maps to actual entitlements,
1747
01:06:55,200 –> 01:06:58,160
the agent either fails constantly so people widen permissions
1748
01:06:58,160 –> 01:06:59,760
or it succeeds unsafely.
1749
01:06:59,760 –> 01:07:02,560
So you accumulate risk until something breaks loudly.
1750
01:07:02,560 –> 01:07:03,920
That’s the identity debt trap.
1751
01:07:03,920 –> 01:07:06,880
Either you accept failure and keep humans in the loop forever
1752
01:07:06,880 –> 01:07:09,440
or you accept sprawl and pretend you can govern it later.
1753
01:07:09,440 –> 01:07:10,400
You can’t.
1754
01:07:10,400 –> 01:07:12,560
So when someone asks, “what does the agent era really change?”
1755
01:07:12,560 –> 01:07:17,120
The honest answer is this: it forces the enterprise to operationalize the autonomy boundary
1756
01:07:17,120 –> 01:07:21,200
as an identity and authorization problem, not a UX problem, not a chat problem.
1757
01:07:21,200 –> 01:07:24,960
And once you see that, the next limiting factor becomes obvious.
1758
01:07:24,960 –> 01:07:30,800
Tool access is the new perimeter and MCP makes that perimeter easier to adopt and easier to lose control of.
1759
01:07:30,800 –> 01:07:33,840
MCP and tool access, one protocol, many new ways to fail.
1760
01:07:33,840 –> 01:07:36,240
MCP is going to feel like progress because it is.
1761
01:07:36,240 –> 01:07:37,680
It standardizes tool access.
1762
01:07:37,680 –> 01:07:42,080
It makes “an agent can call a tool” stop being a bespoke integration project.
1763
01:07:42,080 –> 01:07:44,400
It turns every SaaS system, every internal service,
1764
01:07:44,400 –> 01:07:48,400
every local capability into something an agent runtime can discover and invoke
1765
01:07:48,400 –> 01:07:51,920
without your developers reinventing glue code for the thousandth time.
1766
01:07:51,920 –> 01:07:53,040
And that’s the trap.
1767
01:07:53,040 –> 01:07:54,960
Standardization doesn’t reduce risk.
1768
01:07:54,960 –> 01:07:56,240
It reduces friction.
1769
01:07:56,240 –> 01:07:59,760
Risk scales with adoption and MCP is designed to accelerate adoption.
1770
01:07:59,760 –> 01:08:05,360
So if you treat MCP as just a protocol, you will wake up with a tool surface area that outgrew your governance model.
1771
01:08:05,360 –> 01:08:09,200
Here is the failure mode to anchor on because it’s the one that will actually happen.
1772
01:08:09,200 –> 01:08:13,040
An agent accidentally gains the ability to delete what it should only read,
1773
01:08:13,040 –> 01:08:16,240
not because someone flipped an evil setting, because tool scopes drift,
1774
01:08:16,240 –> 01:08:19,600
because a connector gets reused, because a server gets upgraded,
1775
01:08:19,600 –> 01:08:22,880
because someone adds one method to solve a legitimate business need.
1776
01:08:22,880 –> 01:08:26,560
And the permission model doesn’t force reauthorization with the same seriousness
1777
01:08:26,560 –> 01:08:28,080
as adding a new human admin.
1778
01:08:28,080 –> 01:08:31,920
MCP makes tool capabilities composable. Composability is how you get outcomes.
1779
01:08:31,920 –> 01:08:36,000
Composability is also how you get privilege escalation with a paper trail.
1780
01:08:36,000 –> 01:08:41,360
The thing most people miss is that MCP collapses the psychological boundary between data access
1781
01:08:41,360 –> 01:08:43,120
and action execution.
1782
01:08:43,120 –> 01:08:47,440
In a pre-agent world, a connector that reads SharePoint feels like a data integration.
1783
01:08:47,440 –> 01:08:51,040
A connector that changes Entra roles feels like administration.
1784
01:08:51,040 –> 01:08:53,520
Different teams, different approvals, different audits.
1785
01:08:53,520 –> 01:08:56,080
MCP puts them in the same shape, a tool call.
1786
01:08:56,080 –> 01:09:00,400
That distinction matters because your organization’s current control model relies on friction.
1787
01:09:00,400 –> 01:09:03,440
Separate portals, separate owners, separate change boards.
1788
01:09:03,440 –> 01:09:08,480
MCP removes that friction, therefore your design has to replace it with enforceable intent.
1789
01:09:08,480 –> 01:09:09,600
So what breaks first?
1790
01:09:09,600 –> 01:09:10,480
Tool sprawl.
1791
01:09:10,480 –> 01:09:12,720
Every product team will ship an MCP server.
1792
01:09:12,720 –> 01:09:14,480
Every vendor will ship an MCP server.
1793
01:09:14,480 –> 01:09:18,240
Every internal platform team will expose helpful MCP endpoints,
1794
01:09:18,240 –> 01:09:19,920
because it’s easier than building a UI.
1795
01:09:19,920 –> 01:09:22,880
And suddenly your agent runtime isn’t talking to five systems,
1796
01:09:22,880 –> 01:09:24,400
it’s talking to 50.
1797
01:09:24,400 –> 01:09:29,600
And each one comes with its own auth model, its own scope semantics, its own notion of read,
1798
01:09:29,600 –> 01:09:31,280
and its own logging quality.
1799
01:09:31,280 –> 01:09:32,640
That is not interoperability.
1800
01:09:32,640 –> 01:09:34,640
That is an authorization expansion pack.
1801
01:09:34,640 –> 01:09:38,480
Then you get entitlement multiplication, a single business workflow that used to require
1802
01:09:38,480 –> 01:09:43,120
one person with three roles now requires an agent identity with tool access across multiple servers.
1803
01:09:43,120 –> 01:09:47,360
Each server wants credentials, tokens, delegated permissions, app roles,
1804
01:09:47,360 –> 01:09:50,880
secrets, certificates, managed identities, pick your poison.
1805
01:09:50,880 –> 01:09:53,280
And because agents are expected to work end to end,
1806
01:09:53,280 –> 01:09:57,600
the easiest path is to grant broad access so the workflow doesn’t get stuck.
1807
01:09:57,600 –> 01:10:01,760
That’s how delete permissions show up in a read scenario, not maliciously, inevitably.
1808
01:10:01,760 –> 01:10:06,640
So you need two separate control concepts and enterprises keep blending them until nothing is
1809
01:10:06,640 –> 01:10:09,360
controlled. Discovery is not authorization.
1810
01:10:09,360 –> 01:10:13,440
A registry that lets an agent find MCP servers is not a permission system.
1811
01:10:13,440 –> 01:10:14,480
It’s an index.
1812
01:10:14,480 –> 01:10:17,600
It answers what exists, not what is allowed.
1813
01:10:17,600 –> 01:10:21,520
If you confuse those, you’ve built an ecosystem where “it showed up in the registry”
1814
01:10:21,520 –> 01:10:24,000
becomes the justification for “the agent used it.”
1815
01:10:24,000 –> 01:10:25,200
That’s backwards.
1816
01:10:25,200 –> 01:10:27,040
Authorization must be explicit.
1817
01:10:27,040 –> 01:10:30,240
Per agent identity, per tool, per method, per scope,
1818
01:10:30,240 –> 01:10:32,560
with evidence requirements that can be audited.
1819
01:10:32,560 –> 01:10:35,440
And the allow list has to be enforced in the control plane,
1820
01:10:35,440 –> 01:10:37,680
not politely suggested in runtime code.
1821
01:10:37,680 –> 01:10:40,880
Because runtime code drifts, control planes are supposed to be the thing that doesn’t.
1822
01:10:40,880 –> 01:10:43,920
Now add versioning, because MCP servers won’t sit still.
1823
01:10:43,920 –> 01:10:47,840
Servers get new capabilities, methods get renamed, default scopes get widened
1824
01:10:47,840 –> 01:10:50,000
because a vendor wants fewer support tickets.
1825
01:10:50,000 –> 01:10:52,800
Breaking changes don’t always break the integration.
1826
01:10:52,800 –> 01:10:54,880
Sometimes they break your safety assumptions.
1827
01:10:54,880 –> 01:10:57,120
That’s why tool allow listing can’t be
1828
01:10:57,120 –> 01:10:58,800
“this server is approved.”
1829
01:10:58,800 –> 01:11:01,760
It has to be this server, this version,
1830
01:11:01,760 –> 01:11:04,320
these methods, these scopes, in this environment,
1831
01:11:04,320 –> 01:11:05,920
anything else is trust by branding.
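Pinning approval to server, version, methods, scopes, and environment can be expressed as a simple allow-list check. The entry below is hypothetical; the point is that any drifted field, including a version bump, is a denial that forces re-approval.

```python
# Hypothetical allow-list entry: approval is pinned to server, version,
# methods, scopes, and environment. Anything not matching every field
# is denied -- a server upgrade breaks the match on purpose.
ALLOW = [{
    "server": "itsm-mcp",
    "version": "1.4.2",
    "methods": {"ticket.read", "ticket.update"},
    "scopes": {"tickets:read", "tickets:write"},
    "environment": "prod",
}]

def is_allowed(server, version, method, scope, environment):
    return any(
        e["server"] == server and e["version"] == version
        and method in e["methods"] and scope in e["scopes"]
        and e["environment"] == environment
        for e in ALLOW
    )
```

A widened default scope or a renamed method shows up here as a hard deny, not a silent capability gain.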
1832
01:11:05,920 –> 01:11:10,480
And the ugly part is that MCP encourages exactly the behavior that creates drift.
1833
01:11:10,480 –> 01:11:11,680
Rapid composition.
1834
01:11:11,680 –> 01:11:13,920
You build an agent, you add a tool, you get a win, you ship.
1835
01:11:13,920 –> 01:11:16,320
Over time, the tool graph becomes the real perimeter,
1836
01:11:16,320 –> 01:11:18,960
because it defines what the agent can touch.
1837
01:11:18,960 –> 01:11:22,000
So MCP doesn’t replace identity debt, it accelerates it.
1838
01:11:22,000 –> 01:11:25,600
Every MCP server you add is another place where entitlements can sprawl
1839
01:11:25,600 –> 01:11:27,520
another place where evidence can be lost,
1840
01:11:27,520 –> 01:11:32,640
another place where temporary access becomes permanent because it unblocked the workflow.
1841
01:11:32,640 –> 01:11:35,040
And yes, Microsoft is leaning into MCP hard,
1842
01:11:35,040 –> 01:11:37,840
the Teams AI library, agent platforms, Windows registries,
1843
01:11:37,840 –> 01:11:39,840
that’s not a warning that MCP is bad.
1844
01:11:39,840 –> 01:11:42,080
It’s a warning that MCP will be everywhere,
1845
01:11:42,080 –> 01:11:46,160
therefore your enterprise needs to treat tool access like production infrastructure.
1846
01:11:46,160 –> 01:11:49,040
Because in an autonomous enterprise, tools are actuators,
1847
01:11:49,040 –> 01:11:51,200
and actuators are weapons if you don’t constrain them.
1848
01:11:51,200 –> 01:11:53,840
So if you remember one rule from this section, make it this.
1849
01:11:53,840 –> 01:11:56,080
MCP makes action cheap.
1850
01:11:56,080 –> 01:11:58,640
Governance has to make unsafe action impossible,
1851
01:11:58,640 –> 01:12:00,880
otherwise you didn’t build an agent platform.
1852
01:12:00,880 –> 01:12:04,320
You built a fast path to conditional chaos with standardized APIs.
1853
01:12:04,320 –> 01:12:08,320
Observability and replayability, the only cure for “the agent said so.”
1854
01:12:08,320 –> 01:12:12,960
MCP makes action cheap; that means the enterprise has to make accountability unavoidable.
1855
01:12:12,960 –> 01:12:16,640
Because once agents start acting, the failure mode isn’t the model was wrong.
1856
01:12:16,640 –> 01:12:20,560
The failure mode is that nobody can prove what happened in what order,
1857
01:12:20,560 –> 01:12:22,720
under which permissions, and based on which inputs.
1858
01:12:22,720 –> 01:12:25,280
That’s how you end up in the worst possible incident review,
1859
01:12:25,280 –> 01:12:29,840
a room full of senior people reconstructing reality from screenshots and vibes.
1860
01:12:29,840 –> 01:12:33,520
Without observability, autonomy degenerates into “the agent said so.”
1861
01:12:33,520 –> 01:12:36,560
“The agent said so” is not evidence, it’s a resignation letter.
1862
01:12:36,560 –> 01:12:40,560
So the core requirement for an autonomous enterprise is not better prompting.
1863
01:12:40,560 –> 01:12:43,840
It’s a telemetry model that treats every run like a production change,
1864
01:12:43,840 –> 01:12:45,840
recorded, attributable, and replayable.
1865
01:12:45,840 –> 01:12:48,560
Start with what has to be captured, not optionally.
1866
01:12:48,560 –> 01:12:52,000
By contract. Inputs: the event payloads, ticket fields, alert IDs,
1867
01:12:52,000 –> 01:12:56,320
file versions, data extracts, and the exact prompt instructions that shape decisions.
1868
01:12:56,320 –> 01:13:00,000
If the agent used a SharePoint file, you need the file identity and version.
1869
01:13:00,000 –> 01:13:03,040
If it used a Sentinel incident, you need the incident ID,
1870
01:13:03,040 –> 01:13:05,200
and the related entity graph snapshot.
1871
01:13:05,200 –> 01:13:10,080
If it used work data, you need the scope that defined what work data meant at that moment.
1872
01:13:10,080 –> 01:13:13,760
Then decisions, the branching points, what class did it assign the incident to?
1873
01:13:13,760 –> 01:13:15,200
Which policy clause did it map to?
1874
01:13:15,200 –> 01:13:17,760
Which confidence threshold did it claim it met?
1875
01:13:17,760 –> 01:13:19,680
Which evidence requirement did it satisfy?
1876
01:13:19,680 –> 01:13:20,160
And how?
1877
01:13:20,160 –> 01:13:23,440
The thing most people miss is that decisions are more important than outputs.
1878
01:13:23,440 –> 01:13:24,880
Outputs are easy to store.
1879
01:13:24,880 –> 01:13:26,880
Decisions are where accountability lives.
1880
01:13:26,880 –> 01:13:29,840
Then tool calls, every tool invocation with parameters.
1881
01:13:29,840 –> 01:13:30,880
Which API endpoint?
1882
01:13:30,880 –> 01:13:31,520
Which method?
1883
01:13:31,520 –> 01:13:32,080
Which scope?
1884
01:13:32,080 –> 01:13:32,880
Which identity?
1885
01:13:32,880 –> 01:13:34,240
Which resource IDs?
1886
01:13:34,240 –> 01:13:35,120
And the response?
1887
01:13:35,120 –> 01:13:37,040
If an agent restarts a service,
1888
01:13:37,040 –> 01:13:40,320
you log the resource ID, the operation ID, and the result state.
1889
01:13:40,320 –> 01:13:42,880
If it revokes a session, you log the principal,
1890
01:13:42,880 –> 01:13:46,480
the token session identifiers, if available, and the confirmation.
1891
01:13:46,480 –> 01:13:49,120
This has to be structured data, not a chat transcript.
1892
01:13:49,120 –> 01:13:51,600
Chat transcripts are theater; tool calls are facts.
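A structured tool-call record might look like this minimal sketch. The field names are illustrative, not a real schema; the point is that every fact the episode lists, endpoint, method, scope, identity, resource, result, is a queryable field.

```python
import datetime

def log_tool_call(ledger, endpoint, method, scope, identity,
                  resource_id, response_state):
    # Append one structured fact per invocation. Every field is
    # queryable later, unlike a free-form chat transcript.
    ledger.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "endpoint": endpoint,
        "method": method,
        "scope": scope,
        "identity": identity,
        "resource_id": resource_id,
        "response_state": response_state,
    })
```

An audit query like “every call made under this identity against this resource” becomes a filter, not an archaeology project.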
1893
01:13:51,600 –> 01:13:54,080
Then actions and state changes, what changed in Azure,
1894
01:13:54,080 –> 01:13:56,640
what changed in Entra, what changed in the ITSM record,
1895
01:13:56,640 –> 01:13:57,840
what messages were sent,
1896
01:13:57,840 –> 01:14:01,280
and critically, what verification checks were executed after the action.
1897
01:14:01,280 –> 01:14:05,520
If the contract says verify health probe green, show the probe result.
1898
01:14:05,520 –> 01:14:08,240
Not the sentence “verified successfully.”
1899
01:14:08,240 –> 01:14:10,560
Finally, outputs, the human facing artifact,
1900
01:14:10,560 –> 01:14:12,640
the incident report, the reconciliation pack,
1901
01:14:12,640 –> 01:14:14,640
the investigation summary, those are important,
1902
01:14:14,640 –> 01:14:15,840
but they are downstream.
1903
01:14:15,840 –> 01:14:17,680
They should be generated from the run record,
1904
01:14:17,680 –> 01:14:19,920
not written as free form narrative that drifts away
1905
01:14:19,920 –> 01:14:21,200
from what actually happened.
1906
01:14:21,200 –> 01:14:24,320
Now auditability. Audit does not care that an agent is clever;
1907
01:14:24,320 –> 01:14:27,120
audit cares that identity and action are linkable.
1908
01:14:27,120 –> 01:14:30,080
Who or what took the action and under what authorization?
1909
01:14:30,080 –> 01:14:32,560
That means your run record must tie to the non-human principal,
1910
01:14:32,560 –> 01:14:34,320
the role assignments active at the time
1911
01:14:34,320 –> 01:14:36,640
and any approval objects that were required.
1912
01:14:36,640 –> 01:14:39,920
If you can’t link action to authorization deterministically,
1913
01:14:39,920 –> 01:14:43,360
you didn’t automate work, you automated liability.
1914
01:14:43,360 –> 01:14:44,880
Cost controls also live here,
1915
01:14:44,880 –> 01:14:47,840
and this is where most teams accidentally build infinite loops
1916
01:14:47,840 –> 01:14:48,480
with a budget.
1917
01:14:48,480 –> 01:14:51,520
You need to track token usage, tool usage, action volume,
1918
01:14:51,520 –> 01:14:53,520
retries, and failure loops per run,
1919
01:14:53,520 –> 01:14:54,800
not to optimize the model,
1920
01:14:54,800 –> 01:14:57,520
to enforce blast radius on compute and on action.
1921
01:14:57,520 –> 01:15:00,560
If an agent gets stuck and calls the same tool 50 times,
1922
01:15:00,560 –> 01:15:02,080
that’s not persistence.
1923
01:15:02,080 –> 01:15:04,880
That’s a runaway process. Observability is how you detect it.
1924
01:15:04,880 –> 01:15:06,560
Control plane limits are how you stop it.
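A per-run budget that turns a runaway loop into a hard stop might look like this sketch. The ceilings are illustrative; the design choice is that exceeding them raises, it doesn’t log and continue.

```python
class RunBudget:
    """Hypothetical per-run ceilings on tool calls and retries.
    Exceeding a ceiling halts the run instead of letting the agent
    loop with a budget."""

    def __init__(self, max_tool_calls=25, max_retries_per_tool=3):
        self.max_tool_calls = max_tool_calls
        self.max_retries_per_tool = max_retries_per_tool
        self.calls = 0
        self.retries = {}

    def charge(self, tool, is_retry=False):
        self.calls += 1
        if is_retry:
            self.retries[tool] = self.retries.get(tool, 0) + 1
        if self.calls > self.max_tool_calls:
            raise RuntimeError("Runaway run: tool-call ceiling exceeded")
        if self.retries.get(tool, 0) > self.max_retries_per_tool:
            raise RuntimeError(f"Runaway loop on tool: {tool}")
```

The observability layer detects the 50-call loop; a `charge` before every invocation is what actually stops it.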
1925
01:15:06,560 –> 01:15:08,400
And now the real point, replayability.
1926
01:15:08,400 –> 01:15:10,560
Replayability means you can re-execute the run
1927
01:15:10,560 –> 01:15:11,840
in a controlled environment
1928
01:15:11,840 –> 01:15:14,320
and see the same decisions with the same inputs.
1929
01:15:14,320 –> 01:15:15,600
Or if something differs,
1930
01:15:15,600 –> 01:15:17,040
you can point to the exact delta,
1931
01:15:17,040 –> 01:15:18,000
different data version,
1932
01:15:18,000 –> 01:15:19,040
different policy version,
1933
01:15:19,040 –> 01:15:21,120
different tool version, different model version.
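Finding that exact delta is a field-by-field comparison of the versioned run record. The keys below are illustrative; any field the two runs disagree on is the thing to investigate.

```python
def replay_delta(recorded, replayed):
    """Compare the versioned inputs of a recorded run against its
    replay. Any differing key is the exact delta to investigate;
    an empty result means the replay was faithful."""
    keys = ("data_version", "policy_version", "tool_version", "model_version")
    return {k: (recorded.get(k), replayed.get(k))
            for k in keys if recorded.get(k) != replayed.get(k)}
```

If the replay diverges and this returns only `{"policy_version": ("v3", "v4")}`, the post mortem is a policy-diff review, not a debate.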
1934
01:15:21,120 –> 01:15:24,000
That is how you do post mortems without mythology
1935
01:15:24,000 –> 01:15:25,680
because without replay incident review
1936
01:15:25,680 –> 01:15:26,800
becomes storytelling.
1937
01:15:26,800 –> 01:15:28,960
Humans fill gaps, teams protect themselves.
1938
01:15:28,960 –> 01:15:31,360
People argue about what the agent meant.
1939
01:15:31,360 –> 01:15:33,360
None of that matters. The system did what it did.
1940
01:15:33,360 –> 01:15:36,240
Replay is how you stop debating and start fixing.
1941
01:15:36,240 –> 01:15:38,720
And replayability changes governance behavior.
1942
01:15:38,720 –> 01:15:41,200
It forces you to version your execution contracts.
1943
01:15:41,200 –> 01:15:43,440
It forces you to treat tool scopes like code.
1944
01:15:43,440 –> 01:15:45,120
It forces you to notice drift.
1945
01:15:45,120 –> 01:15:47,920
When a server update widens capabilities, replay breaks,
1946
01:15:47,920 –> 01:15:50,160
therefore someone has to re-approve the new behavior.
1947
01:15:50,160 –> 01:15:51,760
That is the point.
1948
01:15:51,760 –> 01:15:54,480
So the cure for “the agent said so” is a run ledger,
1949
01:15:54,480 –> 01:15:56,080
immutable enough to trust,
1950
01:15:56,080 –> 01:15:59,600
detailed enough to diagnose and structured enough to audit.
1951
01:15:59,600 –> 01:16:01,920
If you don’t build that, autonomy won’t scale
1952
01:16:01,920 –> 01:16:03,280
because trust won’t scale.
1953
01:16:03,280 –> 01:16:05,440
Now, once you can observe and replay,
1954
01:16:05,440 –> 01:16:07,040
you can do the next uncomfortable thing.
1955
01:16:07,040 –> 01:16:09,520
You can compute ROI without fantasy
1956
01:16:09,520 –> 01:16:11,920
because you can finally count outcomes,
1957
01:16:11,920 –> 01:16:14,000
interventions, rollbacks,
1958
01:16:14,000 –> 01:16:16,320
and policy violations as data,
1959
01:16:16,320 –> 01:16:17,280
not opinions.
1960
01:16:17,280 –> 01:16:20,640
ROI without fantasy:
1961
01:16:20,640 –> 01:16:23,040
cost, speed, and risk as one equation.
1962
01:16:23,040 –> 01:16:24,640
Once you can observe and replay,
1963
01:16:24,640 –> 01:16:27,200
you can finally talk about ROI without lying to yourself.
1964
01:16:27,920 –> 01:16:30,880
Most agent ROI decks are token math and vibes.
1965
01:16:30,880 –> 01:16:33,040
Tokens are cheap, therefore we saved money,
1966
01:16:33,040 –> 01:16:34,640
or time saved per employee,
1967
01:16:34,640 –> 01:16:36,320
therefore we gained capacity.
1968
01:16:36,320 –> 01:16:37,440
That’s assistance logic.
1969
01:16:37,440 –> 01:16:38,480
It’s fine for co-pilot.
1970
01:16:38,480 –> 01:16:40,800
It’s the wrong accounting model for autonomy.
1971
01:16:40,800 –> 01:16:43,040
Because autonomy doesn’t sell you better sentences.
1972
01:16:43,040 –> 01:16:44,320
It sells you closed loops.
1973
01:16:44,320 –> 01:16:46,480
So the unit of value isn’t cost per chat.
1974
01:16:46,480 –> 01:16:47,760
It’s cost per outcome.
1975
01:16:47,760 –> 01:16:49,200
Cost per resolved incident,
1976
01:16:49,200 –> 01:16:51,280
cost per reconciled variance pack,
1977
01:16:51,280 –> 01:16:54,080
cost per contained low-risk security incident,
1978
01:16:54,080 –> 01:16:56,400
with an evidence bundle that survives review.
1979
01:16:56,400 –> 01:16:58,400
If you can’t measure cost per outcome,
1980
01:16:58,400 –> 01:17:00,080
you are not doing ROI.
1981
01:17:00,080 –> 01:17:01,600
You’re doing procurement theater.
1982
01:17:01,600 –> 01:17:04,320
Start with cost, but define it like an operator.
1983
01:17:04,320 –> 01:17:06,240
Direct compute is the easy part.
1984
01:17:06,240 –> 01:17:09,440
Model calls, orchestration runtime, tool call overhead.
1985
01:17:09,440 –> 01:17:10,320
You should measure that,
1986
01:17:10,320 –> 01:17:12,480
but you should treat it as a marginal cost
1987
01:17:12,480 –> 01:17:15,280
on top of the real cost driver, human intervention.
1988
01:17:15,280 –> 01:17:16,800
Every time the agent escalates,
1989
01:17:16,800 –> 01:17:19,440
pauses, asks for approval or fails verification,
1990
01:17:19,440 –> 01:17:20,800
and needs a human to clean up.
1991
01:17:20,800 –> 01:17:23,120
That’s labor cost injected back into the loop.
1992
01:17:23,120 –> 01:17:24,640
And it’s not just the time spent.
1993
01:17:24,640 –> 01:17:25,760
It’s the context switch.
1994
01:17:25,760 –> 01:17:27,040
It’s the seniority tax,
1995
01:17:27,040 –> 01:17:28,800
because exceptions tend to land
1996
01:17:28,800 –> 01:17:30,960
on the most expensive humans you have.
1997
01:17:30,960 –> 01:17:33,440
If an agent creates more exceptions than it resolves,
1998
01:17:33,440 –> 01:17:36,400
congratulations, you automated the worst part of the job.
1999
01:17:36,400 –> 01:17:38,080
Now speed, and this is where enterprises
2000
01:17:38,080 –> 01:17:39,600
keep using the wrong metric.
2001
01:17:39,600 –> 01:17:41,760
Speed isn’t how fast did it respond.
2002
01:17:41,760 –> 01:17:43,040
Speed is queue behavior.
2003
01:17:43,040 –> 01:17:45,840
Queue depth never goes down when the system can’t close.
2004
01:17:45,840 –> 01:17:47,280
Tickets churn, not close.
2005
01:17:47,280 –> 01:17:48,560
Analysts become routers.
2006
01:17:48,560 –> 01:17:50,800
Controllers become spreadsheet traffic cops.
2007
01:17:50,800 –> 01:17:53,680
Autonomy wins when it reduces queue depth over time,
2008
01:17:53,680 –> 01:17:55,840
not when it generates a faster first reply.
2009
01:17:55,840 –> 01:17:58,240
So measure time to close, not time to first action.
2010
01:17:58,240 –> 01:17:59,680
Measure throughput under load.
2011
01:17:59,680 –> 01:18:02,160
How many incidents closed per day at peak volume
2012
01:18:02,160 –> 01:18:03,520
with the same head count?
2013
01:18:03,520 –> 01:18:04,800
Measure backlog aging.
2014
01:18:04,800 –> 01:18:08,080
How long do exceptions sit before a human touches them?
2015
01:18:08,080 –> 01:18:10,000
And measure the shape of the distribution,
2016
01:18:10,000 –> 01:18:12,400
not just the average, because the average hides your long tail.
2017
01:18:12,400 –> 01:18:13,840
The long tail is where trust dies.
2018
01:18:13,840 –> 01:18:16,160
Now risk, because this is the part that turns ROI
2019
01:18:16,160 –> 01:18:17,760
into a real enterprise conversation.
2020
01:18:17,760 –> 01:18:19,120
Risk isn’t a moral concept.
2021
01:18:19,120 –> 01:18:21,040
It’s an operational metric, intervention rate,
2022
01:18:21,040 –> 01:18:24,240
rollback rate, policy violations, and audit exceptions.
2023
01:18:24,240 –> 01:18:26,400
Those are the things that make your autonomy program
2024
01:18:26,400 –> 01:18:28,080
politically unsustainable.
2025
01:18:28,080 –> 01:18:30,880
If intervention rate is high, you didn’t build autonomy.
2026
01:18:30,880 –> 01:18:33,200
You built a noisy assistant that still needs a person
2027
01:18:33,200 –> 01:18:34,320
to finish the job.
2028
01:18:34,320 –> 01:18:36,720
If rollback rate is high, your verification is weak
2029
01:18:36,720 –> 01:18:39,120
or your execution contract is too permissive.
2030
01:18:39,120 –> 01:18:42,240
If policy violations occur, your control plane is ornamental.
2031
01:18:42,240 –> 01:18:44,240
And if audit exceptions appear,
2032
01:18:44,240 –> 01:18:46,080
finance and security will shut you down,
2033
01:18:46,080 –> 01:18:48,640
regardless of how productive it felt in a demo.
2034
01:18:48,640 –> 01:18:49,840
This is the uncomfortable truth.
2035
01:18:49,840 –> 01:18:51,680
In autonomy, risk has a cost curve.
2036
01:18:51,680 –> 01:18:53,920
The first policy breach costs your credibility.
2037
01:18:53,920 –> 01:18:56,480
The second costs your budget, the third costs you the program.
2038
01:18:56,480 –> 01:18:59,280
So the equation you should run is brutally simple.
2039
01:18:59,280 –> 01:19:02,080
Cost per outcome equals compute plus tool usage,
2040
01:19:02,080 –> 01:19:05,360
plus human intervention, plus remediation overhead from failures.
2041
01:19:05,360 –> 01:19:08,400
Speed equals outcomes per unit time under real load,
2042
01:19:08,400 –> 01:19:11,680
reflected as reduced queue depth and reduced backlog aging.
2043
01:19:11,680 –> 01:19:15,040
Risk equals the rate at which outcomes require rollback,
2044
01:19:15,040 –> 01:19:18,560
violated policy, or produced evidence that didn’t pass review.
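Spoken aloud, that equation is easy to nod along to and hard to operationalize. Here is a minimal sketch of the cost, speed, and risk calculations over run records; every field name (compute_cost, rolled_back, evidence_passed, and so on) is hypothetical and should be mapped to whatever your run records actually capture:

```python
# Minimal sketch of the cost / speed / risk equations from the transcript.
# All record fields are hypothetical placeholders, not a real schema.

def cost_per_outcome(runs):
    """Cost per outcome = compute + tool usage + human intervention + remediation."""
    total = sum(r["compute_cost"] + r["tool_cost"]
                + r["intervention_cost"] + r["remediation_cost"]
                for r in runs)
    return total / len(runs)

def speed(runs, window_hours):
    """Outcomes per unit time under real load: closed runs per hour."""
    closed = sum(1 for r in runs if r["closed"])
    return closed / window_hours

def risk(runs):
    """Rate of runs that rolled back, violated policy, or failed evidence review."""
    bad = sum(1 for r in runs
              if r["rolled_back"] or r["policy_violation"]
              or not r["evidence_passed"])
    return bad / len(runs)

runs = [
    # A clean autonomous close: cheap, no human touch.
    {"compute_cost": 1.0, "tool_cost": 0.5, "intervention_cost": 0.0,
     "remediation_cost": 0.0, "closed": True, "rolled_back": False,
     "policy_violation": False, "evidence_passed": True},
    # A run that needed a human and a rollback: the hidden cost.
    {"compute_cost": 1.0, "tool_cost": 0.5, "intervention_cost": 3.0,
     "remediation_cost": 2.0, "closed": True, "rolled_back": True,
     "policy_violation": False, "evidence_passed": True},
]
print(cost_per_outcome(runs))  # 4.0
print(risk(runs))              # 0.5
```

The point of the sketch is the coupling: the intervention and remediation terms are what turn a cheap-looking demo into an expensive program.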
2045
01:19:18,560 –> 01:19:20,480
And you don’t get to optimize one in isolation.
2046
01:19:20,480 –> 01:19:22,960
If you reduce cost by lowering evidence requirements,
2047
01:19:22,960 –> 01:19:24,000
you increase risk.
2048
01:19:24,000 –> 01:19:25,920
If you increase speed by widening permissions,
2049
01:19:25,920 –> 01:19:27,440
you increase blast radius.
2050
01:19:27,440 –> 01:19:30,800
If you reduce risk by forcing human approvals everywhere,
2051
01:19:30,800 –> 01:19:33,440
you collapse autonomy back into faster labor.
2052
01:19:33,440 –> 01:19:35,600
That distinction matters because executives will ask,
2053
01:19:35,600 –> 01:19:37,680
should we just buy more Copilot seats?
2054
01:19:37,680 –> 01:19:40,000
And the honest answer is Copilot boosts individuals,
2055
01:19:40,000 –> 01:19:41,680
autonomy boosts system throughput.
2056
01:19:41,680 –> 01:19:44,320
Copilot makes one analyst faster at triage.
2057
01:19:44,320 –> 01:19:47,600
Autonomy makes the queue smaller even when the analyst isn’t there.
2058
01:19:47,600 –> 01:19:50,640
And that’s the only kind of ROI that survives budget season,
2059
01:19:50,640 –> 01:19:52,480
because it shows up as fewer open tickets,
2060
01:19:52,480 –> 01:19:54,640
faster close cycles and fewer policy incidents,
2061
01:19:54,640 –> 01:19:55,920
not happier anecdotes.
2062
01:19:55,920 –> 01:19:58,720
So if you want one practical test before you show a single number,
2063
01:19:58,720 –> 01:19:59,600
it’s this.
2064
01:19:59,600 –> 01:20:01,520
Pick a workflow where the queue never shrinks.
2065
01:20:01,520 –> 01:20:02,800
The tickets keep coming.
2066
01:20:02,800 –> 01:20:04,800
The team keeps working hard.
2067
01:20:04,800 –> 01:20:06,400
And yet the backlog ages anyway.
2068
01:20:06,400 –> 01:20:08,240
If autonomy can’t change that queue shape,
2069
01:20:08,240 –> 01:20:09,520
it’s not an autonomy investment.
2070
01:20:09,520 –> 01:20:11,200
It’s a chat interface with ambition.
2071
01:20:11,200 –> 01:20:13,840
And now that you can define ROI like an adult,
2072
01:20:13,840 –> 01:20:16,720
you can do the next thing most organizations avoid.
2073
01:20:16,720 –> 01:20:20,080
Decide when autonomy is worth it and when the correct answer is no.
2074
01:20:20,080 –> 01:20:21,440
Decision framework.
2075
01:20:21,440 –> 01:20:23,840
When autonomy is worth it, when to say no.
2076
01:20:23,840 –> 01:20:26,880
Here’s the decision framework executives keep asking for
2077
01:20:26,880 –> 01:20:30,000
and architects keep avoiding because it forces a real answer.
2078
01:20:30,000 –> 01:20:32,400
Autonomy is worth it when the work is repeatable.
2079
01:20:32,400 –> 01:20:34,880
The ownership is explicit and the system already
2080
01:20:34,880 –> 01:20:37,200
emits enough telemetry to verify success.
2081
01:20:37,200 –> 01:20:38,400
Not “logs exist.”
2082
01:20:38,400 –> 01:20:41,280
Telemetry that can prove the outcome
2083
01:20:41,280 –> 01:20:43,280
without a human squinting at a dashboard.
2084
01:20:43,280 –> 01:20:44,480
That’s the first gate.
2085
01:20:44,480 –> 01:20:46,480
Second gate, the action surface is enforceable.
2086
01:20:46,480 –> 01:20:49,360
You can name the tools, scopes and identities involved.
2087
01:20:49,360 –> 01:20:50,960
You can write an execution contract
2088
01:20:50,960 –> 01:20:52,640
that the runtime can’t negotiate with.
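In spoken form “an execution contract the runtime can’t negotiate with” stays abstract. As a sketch, it can be as plain as a deny-by-default allowlist checked before every tool call; the contract fields, tool names, and scopes here are invented for illustration:

```python
# Hypothetical execution contract: the runtime checks every tool call
# against it and refuses anything outside the named tools and scopes.

CONTRACT = {
    "identity": "agent-it-remediation",  # a non-human principal, narrowly scoped
    "allowed_tools": {
        "restart_service": {"scopes": {"svc.restart"}},
        "close_ticket":    {"scopes": {"ticket.write"}},
    },
}

def authorize(tool, scope):
    """Deny by default: only tools and scopes named in the contract pass."""
    entry = CONTRACT["allowed_tools"].get(tool)
    return entry is not None and scope in entry["scopes"]

print(authorize("restart_service", "svc.restart"))  # True
print(authorize("delete_vm", "vm.delete"))          # False: never named, never allowed
```

The design choice that matters is the default: anything not enumerated is denied, so the model has nothing to negotiate with.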
2089
01:20:52,640 –> 01:20:54,800
If you can’t, you’re not evaluating autonomy.
2090
01:20:54,800 –> 01:20:56,320
You’re evaluating optimism.
2091
01:20:56,320 –> 01:21:00,640
Third gate, you can define escalation contracts in advance.
2092
01:21:00,640 –> 01:21:02,240
When the agent hits ambiguity,
2093
01:21:02,240 –> 01:21:04,640
it doesn’t stall silently and it doesn’t improvise.
2094
01:21:04,640 –> 01:21:06,640
It routes to a human with the evidence bundle
2095
01:21:06,640 –> 01:21:08,080
and a proposed next action.
2096
01:21:08,080 –> 01:21:09,600
Humans become exception handlers.
2097
01:21:09,600 –> 01:21:11,600
If humans are still the default executor,
2098
01:21:11,600 –> 01:21:12,960
you bought faster labor.
2099
01:21:12,960 –> 01:21:15,920
Now the “no” criteria, because mature teams say no early
2100
01:21:15,920 –> 01:21:17,760
and save themselves a year of politics.
2101
01:21:17,760 –> 01:21:19,360
Say no when approvals are ambiguous.
2102
01:21:19,360 –> 01:21:21,680
If you can’t express who is allowed to approve what
2103
01:21:21,680 –> 01:21:23,280
as machine-readable policy,
2104
01:21:23,280 –> 01:21:25,680
autonomy will either freeze or bypass the process.
2105
01:21:25,680 –> 01:21:28,400
Both outcomes are failures, just with different paperwork.
2106
01:21:28,400 –> 01:21:30,480
Say no when data boundaries are unclear.
2107
01:21:30,480 –> 01:21:33,040
If your org can’t name which systems are authoritative
2108
01:21:33,040 –> 01:21:35,120
and which are just spreadsheets people trust,
2109
01:21:35,120 –> 01:21:37,360
you’re going to ground the agent on the wrong truth
2110
01:21:37,360 –> 01:21:39,120
and then argue about it for months.
2111
01:21:39,120 –> 01:21:42,160
Finance and security will not tolerate that ambiguity.
2112
01:21:42,160 –> 01:21:44,400
Say no when the audit surface doesn’t exist.
2113
01:21:44,400 –> 01:21:46,400
If you cannot capture inputs, decisions,
2114
01:21:46,400 –> 01:21:49,520
tool calls and verification as a replayable run record,
2115
01:21:49,520 –> 01:21:52,720
you will eventually end up with “the agent said so” in front of leadership.
2116
01:21:52,720 –> 01:21:54,160
That’s the end of the program
2117
01:21:54,160 –> 01:21:57,360
And say no when nobody owns the pager. Autonomy shifts ownership.
2118
01:21:57,360 –> 01:21:58,480
It doesn’t remove it.
2119
01:21:58,480 –> 01:22:01,120
If the failure mode is, everyone is responsible,
2120
01:22:01,120 –> 01:22:03,840
then no one will fix the control plane when it drifts
2121
01:22:03,840 –> 01:22:06,480
and the agent will inherit a decaying policy model.
2122
01:22:06,480 –> 01:22:08,160
So the maturity gates are simple.
2123
01:22:08,160 –> 01:22:09,520
Identity readiness.
2124
01:22:09,520 –> 01:22:12,720
Can you issue non-human principals with narrow scopes
2125
01:22:12,720 –> 01:22:14,400
and life cycle controls?
2126
01:22:14,400 –> 01:22:15,760
Tool registry readiness.
2127
01:22:15,760 –> 01:22:19,600
Can you enumerate and allowlist what exists versus what’s allowed?
2128
01:22:19,600 –> 01:22:20,560
Evidence readiness.
2129
01:22:20,560 –> 01:22:24,560
Can you produce replayable runs that survive post mortems and audits?
2130
01:22:24,560 –> 01:22:27,200
Now human-in-the-loop design. This isn’t about feelings,
2131
01:22:27,200 –> 01:22:28,400
it’s about thresholds.
2132
01:22:28,400 –> 01:22:31,200
Define explicit confidence thresholds per action class.
2133
01:22:31,200 –> 01:22:33,600
Define evidence requirements per incident class.
2134
01:22:33,600 –> 01:22:35,520
Define what triggers elevation,
2135
01:22:35,520 –> 01:22:38,480
what triggers approval and what triggers escalation.
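Those three triggers can be sketched as a small routing table: per-action-class confidence thresholds that decide whether the agent executes, asks for approval, or escalates. The class names and numbers below are illustrative assumptions, not recommendations:

```python
# Sketch: per-action-class thresholds routing each proposed action.
# Classes and threshold values are hypothetical examples.

THRESHOLDS = {
    "low_risk":  {"execute": 0.80, "approve": 0.50},
    "high_risk": {"execute": 0.95, "approve": 0.75},
}

def route(action_class, confidence):
    """Decide execute / approve / escalate for one proposed action."""
    t = THRESHOLDS[action_class]
    if confidence >= t["execute"]:
        return "execute"   # agent acts autonomously
    if confidence >= t["approve"]:
        return "approve"   # a human approves the proposed action
    return "escalate"      # a human takes over with the evidence bundle

print(route("high_risk", 0.90))  # approve
print(route("low_risk", 0.90))   # execute
print(route("high_risk", 0.40))  # escalate
```

Note that the same confidence routes differently per class: 0.90 executes for a low-risk action but only earns an approval request for a high-risk one.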
2136
01:22:38,480 –> 01:22:41,120
Don’t let human-in-the-loop become a permanent crutch
2137
01:22:41,120 –> 01:22:43,520
and don’t let full autonomy become a marketing goal.
2138
01:22:43,520 –> 01:22:46,400
The autonomy boundary is a control surface. Treat it like one.
2139
01:22:46,400 –> 01:22:48,880
And the operating model is the part most orgs skip
2140
01:22:48,880 –> 01:22:50,400
because it’s boring and political.
2141
01:22:50,400 –> 01:22:52,400
Who owns agent failures? Not the dev team,
2142
01:22:52,400 –> 01:22:54,640
not AI. The business owner of the workflow.
2143
01:22:54,640 –> 01:22:56,160
Who owns policy changes?
2144
01:22:56,160 –> 01:22:57,920
The team that owns the control plane
2145
01:22:57,920 –> 01:23:00,320
with change control, like any other enforcement system.
2146
01:23:00,320 –> 01:23:01,440
Who owns the tool scopes?
2147
01:23:01,440 –> 01:23:02,240
The tool owners.
2148
01:23:02,240 –> 01:23:04,800
With versioning and re-approval when capabilities change.
2149
01:23:04,800 –> 01:23:06,880
That’s the framework. If it sounds strict, good.
2150
01:23:06,880 –> 01:23:08,880
Autonomy is strictness automated.
2151
01:23:08,880 –> 01:23:10,160
Implementation payoff.
2152
01:23:10,160 –> 01:23:12,880
The 30-day autonomy pilot that doesn’t embarrass you.
2153
01:23:12,880 –> 01:23:15,920
If you want a 30-day pilot that survives contact with reality,
2154
01:23:15,920 –> 01:23:16,960
pick one domain.
2155
01:23:16,960 –> 01:23:18,960
IT remediation or security triage.
2156
01:23:18,960 –> 01:23:20,640
Don’t run three pilots in parallel
2157
01:23:20,640 –> 01:23:22,400
and call the confusion learning.
2158
01:23:22,400 –> 01:23:24,160
Write three policies on day one,
2159
01:23:24,160 –> 01:23:26,160
allowed actions, evidence requirements
2160
01:23:26,160 –> 01:23:28,560
with confidence thresholds and escalation paths
2161
01:23:28,560 –> 01:23:30,000
with named owners.
2162
01:23:30,000 –> 01:23:32,000
Stand up evidence before you stand up autonomy.
2163
01:23:32,000 –> 01:23:33,760
Action logs, tool call capture
2164
01:23:33,760 –> 01:23:37,040
and replayable run records mapped to your audit expectations.
2165
01:23:37,040 –> 01:23:39,360
If you can’t replay a run, you can’t defend it.
2166
01:23:39,360 –> 01:23:41,120
Then measure the only metrics that matter.
2167
01:23:41,120 –> 01:23:41,920
Time to close.
2168
01:23:41,920 –> 01:23:44,160
MTTR delta, human-in-the-loop rate,
2169
01:23:44,160 –> 01:23:46,080
rollback rate and policy violations.
2170
01:23:46,080 –> 01:23:47,360
If those don’t move, stop.
2171
01:23:47,360 –> 01:23:48,400
Don’t rebrand.
2172
01:23:48,400 –> 01:23:51,600
Autonomy becomes safe only when it’s enforced by design
2173
01:23:51,600 –> 01:23:52,880
through the autonomy boundary
2174
01:23:52,880 –> 01:23:54,000
and execution contract,
2175
01:23:54,000 –> 01:23:55,440
not by intent or good luck.
2176
01:23:55,440 –> 01:23:57,360
If you want to test readiness this week,
2177
01:23:57,360 –> 01:23:58,320
do one thing.
2178
01:23:58,320 –> 01:24:00,400
Remove one human step from a workflow
2179
01:24:00,400 –> 01:24:01,760
where the queue never shrinks
2180
01:24:01,760 –> 01:24:03,280
but add one hard boundary
2181
01:24:03,280 –> 01:24:04,800
that the agent cannot cross
2182
01:24:04,800 –> 01:24:06,400
without evidence and policy.
2183
01:24:06,400 –> 01:24:09,120
And here’s the line that should end the discussion fast.
2184
01:24:09,120 –> 01:24:13,040
If you can’t name who wakes up at 2am when the agent fails,
2185
01:24:13,040 –> 01:24:14,400
you’re not ready for autonomy.
2186
01:24:14,400 –> 01:24:16,320
If you’ve got a workflow where tickets churn
2187
01:24:16,320 –> 01:24:18,720
and nobody can close the loop, put it in the comments.
2188
01:24:18,720 –> 01:24:19,920
And watch the next episode
2189
01:24:19,920 –> 01:24:22,320
because we’ll go deeper on agent identities,
2190
01:24:22,320 –> 01:24:23,520
MCP entitlements
2191
01:24:23,520 –> 01:24:26,720
and how to stop conditional chaos before it becomes policy drift.