
1
00:00:00,000 –> 00:00:02,200
At 0902, the agent signs in.
2
00:00:02,200 –> 00:00:06,880
Conditional access evaluates once, passes, and a token gets issued.
3
00:00:06,880 –> 00:00:11,000
At 0904, the meeting changes: an external guest joins, a channel gets renamed,
4
00:00:11,000 –> 00:00:13,200
a document link shifts, whatever.
5
00:00:13,200 –> 00:00:14,360
Context moves.
6
00:00:14,360 –> 00:00:20,920
At 0907, the agent executes a destructive tool call anyway, inside the workload with a still valid token.
7
00:00:20,920 –> 00:00:23,240
At 0908, Purview has the transcript.
8
00:00:23,240 –> 00:00:24,760
Copilot logs have the activity.
9
00:00:24,760 –> 00:00:28,080
The identity is correct, the timestamps are correct, the story is perfect.
10
00:00:28,080 –> 00:00:32,720
Every control worked, every log is correct, and the system still did the wrong thing.
11
00:00:32,720 –> 00:00:33,600
What this is not:
12
00:00:33,600 –> 00:00:36,280
Anti-voice, anti-UX, anti-Microsoft.
13
00:00:36,280 –> 00:00:37,880
This is not an anti-voice rant.
14
00:00:37,880 –> 00:00:43,320
Voice is useful, avatars are useful, accessibility matters, and real-time interaction matters.
15
00:00:43,320 –> 00:00:48,120
A speaking interface can lower friction, reduce cognitive load, and make a system usable
16
00:00:48,120 –> 00:00:50,560
for people who would never type into a chat box.
17
00:00:50,560 –> 00:00:52,760
This is also not an anti-Microsoft episode.
18
00:00:52,760 –> 00:00:56,880
Microsoft has shipped real governance improvements that most platforms still don’t have.
19
00:00:56,880 –> 00:01:01,320
Purview can capture transcripts, Copilot Studio can log activities, Entra has a clearer model
20
00:01:01,320 –> 00:01:03,840
for workload identities and non-human identities.
21
00:01:03,840 –> 00:01:06,120
Conditional access exists and it’s mature.
22
00:01:06,120 –> 00:01:07,200
Those are not small things.
23
00:01:07,200 –> 00:01:10,640
That is the scaffolding you need for operating agents at enterprise scale.
24
00:01:10,640 –> 00:01:14,040
But this is the boundary line the industry keeps refusing to say out loud.
25
00:01:14,040 –> 00:01:15,400
Forensics are not control.
26
00:01:15,400 –> 00:01:17,400
Audit tells you what happened after the fact.
27
00:01:17,400 –> 00:01:19,240
It gives you a narrative you can export.
28
00:01:19,240 –> 00:01:23,600
It helps legal, it helps incident response, it helps you argue with reality less.
29
00:01:23,600 –> 00:01:28,200
None of that prevents an allowed identity from doing the wrong thing at the moment of execution.
30
00:01:28,200 –> 00:01:30,200
And that’s why the format of this episode matters.
31
00:01:30,200 –> 00:01:33,920
This isn’t a tutorial on building agents. There won’t be configuration walkthroughs, no
32
00:01:33,920 –> 00:01:35,400
click-here demos.
33
00:01:35,400 –> 00:01:36,400
This is an autopsy.
34
00:01:36,400 –> 00:01:39,600
Claim, failure pattern, architectural cause, consequence.
35
00:01:39,600 –> 00:01:41,600
Because the failure isn’t that teams lack tools.
36
00:01:41,600 –> 00:01:45,360
The failure is that teams keep buying comfort instead of determinism.
37
00:01:45,360 –> 00:01:49,760
The embodied lie: trust signaling wrapped around probabilistic execution.
38
00:01:49,760 –> 00:01:50,840
Here’s the embodied lie.
39
00:01:50,840 –> 00:01:54,080
A voice and a face are not features; they are trust signals.
40
00:01:54,080 –> 00:01:58,000
They’re a human interface hack that makes a probabilistic system feel like a deterministic
41
00:01:58,000 –> 00:01:59,000
one.
42
00:01:59,000 –> 00:02:02,440
And the moment you add them, you change how people evaluate risk.
43
00:02:02,440 –> 00:02:05,920
The thing most people miss is that the speaking agent isn’t just an agent.
44
00:02:05,920 –> 00:02:08,200
It’s an execution engine wearing a personality.
45
00:02:08,200 –> 00:02:10,200
The avatar doesn’t make the agent more accurate.
46
00:02:10,200 –> 00:02:11,960
It makes the output more persuasive.
47
00:02:11,960 –> 00:02:14,880
That distinction matters because persuasion is not governance.
48
00:02:14,880 –> 00:02:17,600
In architectural terms, the agent is not a teammate.
49
00:02:17,600 –> 00:02:19,520
It is a distributed decision engine.
50
00:02:19,520 –> 00:02:23,880
It takes an input, retrieves some context, chooses a tool and executes an action.
51
00:02:23,880 –> 00:02:27,880
The choice is probabilistic, the retrieval is probabilistic, tool selection is probabilistic.
52
00:02:27,880 –> 00:02:31,480
Even when it’s grounded, it’s grounded in whatever it retrieved, not in what your intent
53
00:02:31,480 –> 00:02:32,480
actually was.
54
00:02:32,480 –> 00:02:36,760
Now add embodiment: low-latency speech, smooth turn-taking, a confident tone.
55
00:02:36,760 –> 00:02:38,040
Humans read those as competence.
56
00:02:38,040 –> 00:02:42,040
They stop asking “what approved this?” and start accepting “it sounded right.”
57
00:02:42,040 –> 00:02:45,920
That’s human interface trust bias doing what it always does, shifting scrutiny away from
58
00:02:45,920 –> 00:02:48,280
the control plane and onto the performance.
59
00:02:48,280 –> 00:02:50,400
That’s why governance gets worse when you add a face.
60
00:02:50,400 –> 00:02:53,080
The organization starts optimizing for the experience plane.
61
00:02:53,080 –> 00:02:57,000
Prompt tweaks, persona tuning, “make it sound more cautious.”
62
00:02:57,000 –> 00:02:58,520
Add a confirmation question.
63
00:02:58,520 –> 00:02:59,600
Those are theater patches.
64
00:02:59,600 –> 00:03:01,440
They don’t change the system’s blast radius.
65
00:03:01,440 –> 00:03:02,760
They don’t enforce intent.
66
00:03:02,760 –> 00:03:08,240
They don’t create a deterministic gate between the agent’s proposal and the platform’s execution.
67
00:03:08,240 –> 00:03:12,320
This clicked for me when I watched teams celebrate transcripts as if they were safety.
68
00:03:12,320 –> 00:03:13,560
A transcript is not safety.
69
00:03:13,560 –> 00:03:15,440
A transcript is a post-incident artifact.
70
00:03:15,440 –> 00:03:16,440
It’s a replay.
71
00:03:16,440 –> 00:03:17,440
It’s a confession.
72
00:03:17,440 –> 00:03:20,520
One you hand to counsel when the action has already happened.
73
00:03:20,520 –> 00:03:23,160
The system did not become safer because it can narrate itself.
74
00:03:23,160 –> 00:03:26,600
Now, Microsoft will say the right things here and they’re not wrong.
75
00:03:26,600 –> 00:03:28,760
Conditional access evaluates at token acquisition.
76
00:03:28,760 –> 00:03:30,800
Purview can capture interactions.
77
00:03:30,800 –> 00:03:31,800
Activity logs exist.
78
00:03:31,800 –> 00:03:33,280
Workload identity controls exist.
79
00:03:33,280 –> 00:03:34,280
That’s the ticket booth.
80
00:03:34,280 –> 00:03:35,280
That’s the camera system.
81
00:03:35,280 –> 00:03:36,280
That’s the audit trail.
82
00:03:36,280 –> 00:03:41,360
But the embodied lie lives in the gap between those controls and the moment a tool call
83
00:03:41,360 –> 00:03:42,680
executes.
84
00:03:42,680 –> 00:03:45,000
Token-time controls decide who can show up.
85
00:03:45,000 –> 00:03:47,560
Action-time controls decide what is allowed to happen next.
86
00:03:47,560 –> 00:03:49,960
Most organizations build only the first one.
87
00:03:49,960 –> 00:03:53,080
Then they act surprised when the second one behaves like a suggestion.
88
00:03:53,080 –> 00:03:57,400
And this is where the speaking agent becomes an entropy generator because the more human
89
00:03:57,400 –> 00:04:01,440
it seems, the more likely you are to let it run with broad scopes, the more likely you
90
00:04:01,440 –> 00:04:06,040
are to skip segmentation, the more likely you are to accept “it’s logged” as a substitute
91
00:04:06,040 –> 00:04:07,800
for “it’s prevented.”
92
00:04:07,800 –> 00:04:12,960
Over time, you accumulate permissions, exceptions, and implicit trust until you have conditional
93
00:04:12,960 –> 00:04:13,960
chaos.
94
00:04:13,960 –> 00:04:18,160
And that behaves correctly most of the time, right up until the moment it doesn’t.
95
00:04:18,160 –> 00:04:21,920
So when the agent speaks with certainty, treat that as a warning, not a reassurance.
96
00:04:21,920 –> 00:04:23,360
You are not hearing determinism.
97
00:04:23,360 –> 00:04:27,000
You are hearing probability wrapped in a voice that implies accountability.
98
00:04:27,000 –> 00:04:30,600
The control plane versus the experience plane: two timelines that don’t meet.
99
00:04:30,600 –> 00:04:34,320
There are two timelines running every time an agent helps someone.
100
00:04:34,320 –> 00:04:38,320
Most organizations only instrument one of them because it’s the one humans notice.
101
00:04:38,320 –> 00:04:39,880
That’s the experience plane.
102
00:04:39,880 –> 00:04:43,840
It includes the chat transcript, the speaking voice, the avatar, the response latency,
103
00:04:43,840 –> 00:04:48,760
the citations, the little thinking indicator, and the meeting dynamics where nobody wants
104
00:04:48,760 –> 00:04:53,320
to slow the room down by arguing with a confident-sounding assistant.
105
00:04:53,320 –> 00:04:56,440
It’s perception management, it’s social flow, it’s persuasion at scale.
106
00:04:56,440 –> 00:04:59,480
The other timeline is the only one that matters when something breaks.
107
00:04:59,480 –> 00:05:01,160
That’s the control plane.
108
00:05:01,160 –> 00:05:07,280
Identity issuance, token lifetime, scope, retrieval boundaries, tool invocation, side effects,
109
00:05:07,280 –> 00:05:13,720
state transitions, retry behavior, compensating actions, data class enforcement, venue enforcement.
110
00:05:13,720 –> 00:05:16,360
That’s the plane where blast radius is defined.
111
00:05:16,360 –> 00:05:19,840
And the uncomfortable truth is that these two timelines don’t line up.
112
00:05:19,840 –> 00:05:21,600
They rarely even touch.
113
00:05:21,600 –> 00:05:25,920
Because the platform’s strongest controls tend to fire at token time while the damage
114
00:05:25,920 –> 00:05:27,800
happens at tool time.
115
00:05:27,800 –> 00:05:29,360
Conditional access is a perfect example.
116
00:05:29,360 –> 00:05:30,560
It’s the ticket booth.
117
00:05:30,560 –> 00:05:32,080
It answers a narrow question.
118
00:05:32,080 –> 00:05:35,360
Should this identity get a token right now under current conditions?
119
00:05:35,360 –> 00:05:40,520
It can evaluate signals, risk, device posture, location, it can deny, it can require stronger
120
00:05:40,520 –> 00:05:41,520
auth.
121
00:05:41,520 –> 00:05:42,520
That is real control.
122
00:05:42,520 –> 00:05:45,240
But once the token exists, the train leaves the station.
123
00:05:45,240 –> 00:05:46,920
Now the system is in the workload.
124
00:05:46,920 –> 00:05:51,400
Tool selection happens, data gets read, writes happen, shares happen, deletes happen, and
125
00:05:51,400 –> 00:05:54,920
the control plane is often no longer in the loop in a deterministic way.
126
00:05:54,920 –> 00:05:59,120
You’ve moved from who may show up to what is happening, and most enterprises have no enforcement
127
00:05:59,120 –> 00:06:00,120
point in the middle.
128
00:06:00,120 –> 00:06:02,720
Purview is the other half of the same mismatch.
129
00:06:02,720 –> 00:06:04,640
Purview is the security camera system.
130
00:06:04,640 –> 00:06:09,000
It records, it correlates, it lets you do forensics after the fact, it’s useful, and
131
00:06:09,000 –> 00:06:10,000
it’s getting better.
132
00:06:10,000 –> 00:06:11,520
But cameras do not stop the train.
133
00:06:11,520 –> 00:06:14,440
They just help you reconstruct which door was forced and when.
134
00:06:14,440 –> 00:06:18,400
The reason this gap keeps surprising people is that the experience plane looks like control.
135
00:06:18,400 –> 00:06:19,520
The agent speaks calmly.
136
00:06:19,520 –> 00:06:20,520
It cites a document.
137
00:06:20,520 –> 00:06:22,200
It says, “based on policy.”
138
00:06:22,200 –> 00:06:25,660
It feels governed, and because it feels governed, people assume the control plane must have
139
00:06:25,660 –> 00:06:26,660
approved it.
140
00:06:26,660 –> 00:06:28,120
That assumption is false.
141
00:06:28,120 –> 00:06:30,760
A citation is not an authorization decision.
142
00:06:30,760 –> 00:06:32,960
A transcript is not a policy evaluation.
143
00:06:32,960 –> 00:06:35,840
A token issuance event is not a per-action gate.
144
00:06:35,840 –> 00:06:38,520
If you want a mental model you can hold in your head, use the rail system.
145
00:06:38,520 –> 00:06:40,280
The ticket booth is conditional access.
146
00:06:40,280 –> 00:06:42,240
It can stop someone from entering the station.
147
00:06:42,240 –> 00:06:45,800
It cannot stop them from pulling the emergency brake once they’re on the train.
148
00:06:45,800 –> 00:06:46,960
The cameras are Purview.
149
00:06:46,960 –> 00:06:48,840
They can tell you which car it happened in.
150
00:06:48,840 –> 00:06:50,320
They cannot prevent the derailment.
151
00:06:50,320 –> 00:06:52,240
The missing role is the guard on the train.
152
00:06:52,240 –> 00:06:56,880
The deterministic policy gate that evaluates each action at the moment it is about to execute.
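The 0902 to 0907 timeline from the opening can be sketched in a few lines of Python. This is an illustration only: the `Token`, `Context`, and check functions are hypothetical names, and the rule "no destructive calls while an external guest is present" is an assumption made for the example, not a platform feature. It shows a token-only check passing at 0907 while a per-action check, looking at current context, refuses the same call.

```python
from dataclasses import dataclass

# Hypothetical set of operations this sketch treats as destructive.
DESTRUCTIVE = frozenset({"delete_site", "delete_file"})


@dataclass(frozen=True)
class Token:
    issued_at: int   # minutes since midnight
    lifetime: int    # minutes

    def valid_at(self, now: int) -> bool:
        return self.issued_at <= now < self.issued_at + self.lifetime


@dataclass(frozen=True)
class Context:
    external_guest_present: bool


def token_only_check(token: Token, now: int) -> bool:
    """What a token-time-only design effectively asks at execution."""
    return token.valid_at(now)


def action_time_check(token: Token, now: int, operation: str, ctx: Context) -> bool:
    """Re-evaluate the specific action against current context before it runs."""
    if not token.valid_at(now):
        return False
    if operation in DESTRUCTIVE and ctx.external_guest_present:
        return False  # illustrative rule, an assumption of this sketch
    return True


token = Token(issued_at=9 * 60 + 2, lifetime=60)   # issued at 09:02
ctx_0907 = Context(external_guest_present=True)    # the guest joined at 09:04
now = 9 * 60 + 7                                   # 09:07

print(token_only_check(token, now))                            # True
print(action_time_check(token, now, "delete_site", ctx_0907))  # False
```

The token never stopped being valid; only the second function ever looked at the action and the room it was about to run in.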
153
00:06:56,880 –> 00:06:59,640
And that’s the heart of the architectural lie.
154
00:06:59,640 –> 00:07:04,160
Organizations keep building governance around artifacts that exist before and after execution,
155
00:07:04,160 –> 00:07:05,440
but not at execution.
156
00:07:05,440 –> 00:07:08,240
So you get beautiful audit trails and ugly outcomes.
157
00:07:08,240 –> 00:07:11,520
This also explains why embodiment makes the problem worse.
158
00:07:11,520 –> 00:07:15,000
The more polished the experience plane becomes, the more it masks the absence of control plane
159
00:07:15,000 –> 00:07:16,000
enforcement.
160
00:07:16,000 –> 00:07:18,480
The organization feels safer because it can see more.
161
00:07:18,480 –> 00:07:21,640
But visibility without gating is just higher resolution regret.
162
00:07:21,640 –> 00:07:26,640
Once you separate the two planes, you stop arguing about whether the platform has governance.
163
00:07:26,640 –> 00:07:27,640
It does.
164
00:07:27,640 –> 00:07:31,120
You start arguing about where in the timeline governance actually applies.
165
00:07:31,120 –> 00:07:35,080
And you stop treating that as semantics because timing is where incidents live.
166
00:07:35,080 –> 00:07:38,960
Token-time control without tool-time control is a polite front door with no locks inside
167
00:07:38,960 –> 00:07:40,280
the building.
168
00:07:40,280 –> 00:07:43,680
What Microsoft gets right and why it still doesn’t save you.
169
00:07:43,680 –> 00:07:45,160
Microsoft is not asleep at the wheel here.
170
00:07:45,160 –> 00:07:49,840
That’s what makes this harder, because the comfortable critique is “the platform is immature.”
171
00:07:49,840 –> 00:07:50,840
It isn’t.
172
00:07:50,840 –> 00:07:54,360
The uncomfortable critique is that the platform is improving in the places enterprises
173
00:07:54,360 –> 00:07:58,800
like to measure while the failure happens in the place they avoid designing.
174
00:07:58,800 –> 00:07:59,800
Start with Purview.
175
00:07:59,800 –> 00:08:03,800
Getting Copilot conversations into a compliance surface is real progress.
176
00:08:03,800 –> 00:08:05,480
Transcripts change the nature of investigations.
177
00:08:05,480 –> 00:08:06,480
They give you a timeline.
178
00:08:06,480 –> 00:08:09,320
They give you a record of what was asked and what was answered.
179
00:08:09,320 –> 00:08:14,800
They also give you a way to correlate that conversation with a user identity and increasingly
180
00:08:14,800 –> 00:08:18,080
with the sources the system touched. That closes a lot of the old
181
00:08:18,080 –> 00:08:19,880
“we have no idea what it did” problem.
182
00:08:19,880 –> 00:08:25,360
Copilot Studio logging is the same category of win: activity logging, tool invocation traces,
183
00:08:25,360 –> 00:08:29,960
the ability to see what actions were taken and when. Again, real. For operations teams, that’s
184
00:08:29,960 –> 00:08:33,800
better than folklore and screen recordings. It turns agent behavior into something you can
185
00:08:33,800 –> 00:08:34,800
query.
186
00:08:34,800 –> 00:08:35,800
Now identity.
187
00:08:35,800 –> 00:08:39,000
Entra’s framing of workload identities and non-human identities is exactly where this
188
00:08:39,000 –> 00:08:40,000
should go.
189
00:08:40,000 –> 00:08:41,000
An agent is not a user.
190
00:08:41,000 –> 00:08:42,160
It is not an intern.
191
00:08:42,160 –> 00:08:47,080
It is a workload identity with automation privileges and treating it as such is the first
192
00:08:47,080 –> 00:08:48,600
admission of reality.
193
00:08:48,600 –> 00:08:51,480
Conditional access applying to those identities matters.
194
00:08:51,480 –> 00:08:54,920
Token issuance becomes conditional, signals-driven, and enforceable.
195
00:08:54,920 –> 00:08:55,920
Risk goes up.
196
00:08:55,920 –> 00:08:57,160
Token issuance gets blocked.
197
00:08:57,160 –> 00:08:58,400
Device posture is wrong.
198
00:08:58,400 –> 00:08:59,880
Token issuance gets blocked.
199
00:08:59,880 –> 00:09:02,120
Token issuance gets blocked.
200
00:09:02,120 –> 00:09:03,720
You can make the front door real.
201
00:09:03,720 –> 00:09:07,640
And there’s also continuous access evaluation sitting in the background as Microsoft’s answer
202
00:09:07,640 –> 00:09:09,960
to context changes after sign in.
203
00:09:09,960 –> 00:09:13,960
It’s an attempt to reduce the time lag between a changing risk posture and what the token
204
00:09:13,960 –> 00:09:15,160
is allowed to do.
205
00:09:15,160 –> 00:09:16,160
That direction is correct.
206
00:09:16,160 –> 00:09:20,640
You can’t keep treating authentication as a one-time ceremony in a world where sessions
207
00:09:20,640 –> 00:09:22,320
persist and context drifts.
208
00:09:22,320 –> 00:09:23,520
All of that is necessary.
209
00:09:23,520 –> 00:09:25,040
All of it is still insufficient.
210
00:09:25,040 –> 00:09:26,760
Here’s the boundary you don’t get to hand-wave.
211
00:09:26,760 –> 00:09:30,200
These controls mostly operate at token time and after execution.
212
00:09:30,200 –> 00:09:34,440
They don’t operate at action time inside the tool call path with deterministic intent
213
00:09:34,440 –> 00:09:35,440
enforcement.
214
00:09:35,440 –> 00:09:36,600
Purview tells you what happened.
215
00:09:36,600 –> 00:09:38,960
It does not decide what is allowed to happen next.
216
00:09:38,960 –> 00:09:43,560
Conditional access decides whether an identity should be issued a token under current conditions.
217
00:09:43,560 –> 00:09:48,800
It does not evaluate whether a specific delete, share or send is appropriate given the intent
218
00:09:48,800 –> 00:09:53,680
of the request, the sensitivity of the target and the venue in which the result will be exposed.
219
00:09:53,680 –> 00:09:57,920
That distinction matters because enterprise harm rarely looks like the agent got global
220
00:09:57,920 –> 00:09:59,400
admin.
221
00:09:59,400 –> 00:10:02,320
Microsoft has already blocked a lot of those extremes for agent identities.
222
00:10:02,320 –> 00:10:07,520
The real harm looks like the agent had legitimate write access in the wrong place or the agent
223
00:10:07,520 –> 00:10:10,800
retrieved legitimate data and disclosed it in the wrong venue.
224
00:10:10,800 –> 00:10:12,360
And those are action time failures.
225
00:10:12,360 –> 00:10:14,400
If you want to hear the gap, walk the timeline.
226
00:10:14,400 –> 00:10:15,720
Agent signs in.
227
00:10:15,720 –> 00:10:17,960
Conditional access evaluates. Token issued.
228
00:10:17,960 –> 00:10:18,960
Fine.
229
00:10:18,960 –> 00:10:19,960
Agent retrieves documents
230
00:10:19,960 –> 00:10:20,960
it is entitled to retrieve.
231
00:10:20,960 –> 00:10:21,960
Fine.
232
00:10:21,960 –> 00:10:22,960
Transcript gets captured.
233
00:10:22,960 –> 00:10:24,840
The actions get recorded. Fine.
234
00:10:24,840 –> 00:10:29,000
Now the agent proposes a tool call: delete a site, share a file, post a message, send an
235
00:10:29,000 –> 00:10:31,200
email, trigger a workflow.
236
00:10:31,200 –> 00:10:35,440
Where is the deterministic policy gate that evaluates that proposed action against intent,
237
00:10:35,440 –> 00:10:38,840
scope, data classification and venue before the tool executes?
238
00:10:38,840 –> 00:10:40,440
In most deployments it isn’t there.
239
00:10:40,440 –> 00:10:42,520
The platform gave you the ticket booth and the cameras.
240
00:10:42,520 –> 00:10:45,120
It did not automatically give you a guard on the train.
241
00:10:45,120 –> 00:10:48,640
And because those Microsoft controls exist, organizations stop designing.
242
00:10:48,640 –> 00:10:50,320
They assume governance is covered.
243
00:10:50,320 –> 00:10:53,120
They feel safe because they can export transcripts.
244
00:10:53,120 –> 00:10:56,400
They feel safe because conditional access policies look mature.
245
00:10:56,400 –> 00:11:00,960
They feel safe because the agent has an identity object and identities feel like control.
246
00:11:00,960 –> 00:11:02,880
But control is not a directory object.
247
00:11:02,880 –> 00:11:04,160
Control is an enforcement point.
248
00:11:04,160 –> 00:11:05,880
So yes, praise the forensics.
249
00:11:05,880 –> 00:11:07,160
Praise the identity model.
250
00:11:07,160 –> 00:11:08,160
Praise the growing observability.
251
00:11:08,160 –> 00:11:10,120
And those are the raw materials you need.
252
00:11:10,120 –> 00:11:13,520
Then say the sentence that forces the architectural truth into the room.
253
00:11:13,520 –> 00:11:15,360
Microsoft has significantly improved visibility.
254
00:11:15,360 –> 00:11:18,200
They have not eliminated non-deterministic execution.
255
00:11:18,200 –> 00:11:23,000
Once you accept that, you stop asking the platform to save you with more logs and you start building
256
00:11:23,000 –> 00:11:24,480
the missing thing.
257
00:11:24,480 –> 00:11:28,280
Action-time, per-tool-call determinism.
258
00:11:28,280 –> 00:11:29,760
Audit, provenance, policy gate.
259
00:11:29,760 –> 00:11:32,360
Here is the trilogy that keeps getting blurred on purpose.
260
00:11:32,360 –> 00:11:33,880
Audit is a record of what happened.
261
00:11:33,880 –> 00:11:37,800
Who asked, what the agent said, which identity executed, which file got touched, which API
262
00:11:37,800 –> 00:11:39,440
got called, what time it happened.
263
00:11:39,440 –> 00:11:40,440
It’s a timeline.
264
00:11:40,440 –> 00:11:41,440
It’s useful.
265
00:11:41,440 –> 00:11:43,080
It’s also inherently retrospective.
266
00:11:43,080 –> 00:11:46,320
Audit is the black box flight recorder you consult after the impact.
267
00:11:46,320 –> 00:11:48,880
It doesn’t change the trajectory of the plane.
268
00:11:48,880 –> 00:11:51,920
Provenance is the missing middle that most teams pretend is “nice to have.”
269
00:11:51,920 –> 00:11:53,920
Provenance is not the transcript.
270
00:11:53,920 –> 00:11:57,720
Provenance is the decision chain: which chunks were retrieved, which candidates were considered
271
00:11:57,720 –> 00:12:02,560
and rejected, which tool options were available, which constraints were applied, and what caused
272
00:12:02,560 –> 00:12:04,080
the final selection.
273
00:12:04,080 –> 00:12:08,160
It is the explanation graph that ties inputs to outputs in a way that survives an incident
274
00:12:08,160 –> 00:12:09,160
review.
275
00:12:09,160 –> 00:12:12,360
Without provenance, you don’t know why the agent did what it did.
276
00:12:12,360 –> 00:12:13,840
You only know that it did it.
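As a rough sketch of what a provenance record could hold, here is one possible shape in Python. The class and field names are hypothetical, chosen to mirror the decision chain described above (retrieved chunks, candidates considered and rejected, available tools, constraints, final selection); no particular product emits this structure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    score: float
    rejected_because: Optional[str] = None  # None means this candidate was selected


@dataclass
class ProvenanceRecord:
    request_id: str
    retrieved_chunks: List[str]
    candidates: List[Candidate]
    tools_available: List[str]
    constraints_applied: List[str]
    selected_tool: str

    def selected(self) -> Candidate:
        chosen = [c for c in self.candidates if c.rejected_because is None]
        if len(chosen) != 1:
            raise ValueError("exactly one candidate must be selected")
        return chosen[0]

    def rejected(self) -> List[Candidate]:
        return [c for c in self.candidates if c.rejected_because is not None]


# Invented example values, for illustration only.
record = ProvenanceRecord(
    request_id="req-42",
    retrieved_chunks=["tracker:row-17", "site-metadata:alpha"],
    candidates=[
        Candidate("site-alpha", 0.91),
        Candidate("site-alpha-2", 0.88, rejected_because="modified more recently"),
    ],
    tools_available=["archive_site", "delete_site"],
    constraints_applied=["must appear in tracker"],
    selected_tool="delete_site",
)

print(record.selected().name)
print([c.rejected_because for c in record.rejected()])
```

The useful property is that the rejected alternatives and their reasons are stored, which a transcript or activity log does not give you.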
277
00:12:13,840 –> 00:12:15,840
And then there’s the part that prevents harm.
278
00:12:15,840 –> 00:12:16,840
The policy gate.
279
00:12:16,840 –> 00:12:20,560
A policy gate is a deterministic decision point that runs before execution.
280
00:12:20,560 –> 00:12:25,640
It evaluates a structured request against policy and authoritative state and returns allow,
281
00:12:25,640 –> 00:12:26,640
deny or transform.
282
00:12:26,640 –> 00:12:28,160
It is not a prompt instruction.
283
00:12:28,160 –> 00:12:29,160
It is not a persona.
284
00:12:29,160 –> 00:12:31,360
It is not a “please ask for confirmation.”
285
00:12:31,360 –> 00:12:32,640
It is an enforcement layer
286
00:12:32,640 –> 00:12:34,600
the agent cannot bypass.
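A minimal sketch of such a gate, assuming a hypothetical `ToolRequest` shape and a handful of invented rules (the rule IDs, operation names, and field names below are illustrative, not a Microsoft API). The only point it makes is that the decision is a plain function of the structured request and authoritative state, with no model in the loop.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    TRANSFORM = "transform"


@dataclass(frozen=True)
class ToolRequest:
    # Hypothetical request shape; field names are assumptions for illustration.
    operation: str      # e.g. "share_file", "delete_site"
    target: str
    data_class: str     # e.g. "public", "confidential"
    venue: str          # e.g. "internal_meeting", "mixed_audience"


@dataclass(frozen=True)
class Decision:
    verdict: Verdict
    rule_id: str
    request: ToolRequest  # the request that may execute (possibly rewritten)


def evaluate(req: ToolRequest, protected_targets: frozenset) -> Decision:
    """Deterministic: same request plus same state always yields the same decision."""
    if req.operation == "delete_site" and req.target in protected_targets:
        return Decision(Verdict.DENY, "R-protected-target", req)
    if req.data_class == "confidential" and req.venue == "mixed_audience":
        if req.operation == "share_file":
            return Decision(Verdict.DENY, "R-mixed-audience", req)
        if req.operation == "post_message":
            # Transform: rewrite to a private reply instead of posting to the room.
            rewritten = replace(req, operation="reply_privately", venue="private")
            return Decision(Verdict.TRANSFORM, "R-redirect-private", rewritten)
    return Decision(Verdict.ALLOW, "R-default-allow", req)


decision = evaluate(
    ToolRequest("post_message", "channel-1", "confidential", "mixed_audience"),
    protected_targets=frozenset(),
)
print(decision.verdict.value, decision.rule_id, decision.request.operation)
```

The sketch falls through to allow for brevity; a real gate would more plausibly deny anything it has no rule for.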
287
00:12:34,600 –> 00:12:35,920
Most enterprises have audit.
288
00:12:35,920 –> 00:12:37,520
Some have fragments of provenance.
289
00:12:37,520 –> 00:12:38,760
Almost none have a real gate.
290
00:12:38,760 –> 00:12:43,040
That distinction matters because your worst failures happen in the gap between entitled
291
00:12:43,040 –> 00:12:44,360
and appropriate.
292
00:12:44,360 –> 00:12:48,000
The agent can be entitled to read a document and still be wrong to disclose it in that
293
00:12:48,000 –> 00:12:49,000
venue.
294
00:12:49,000 –> 00:12:52,840
The agent can be entitled to write and still be wrong to write here, now, in that way.
295
00:12:52,840 –> 00:12:56,160
Audit will happily record the wrong thing with perfect fidelity.
296
00:12:56,160 –> 00:12:58,160
Provenance helps you argue with reality less.
297
00:12:58,160 –> 00:13:00,120
It tells you how you arrived at the bad action.
298
00:13:00,120 –> 00:13:03,680
It’s what you need when a regulator asks, why did the system decide this?
299
00:13:03,680 –> 00:13:06,320
And your only other answer is, it felt right.
300
00:13:06,320 –> 00:13:09,760
Provenance turns post mortems from fan fiction into analysis, but provenance still doesn’t
301
00:13:09,760 –> 00:13:11,000
prevent the incident.
302
00:13:11,000 –> 00:13:12,080
Only a gate does.
303
00:13:12,080 –> 00:13:13,960
And the thing most people miss is timing.
304
00:13:13,960 –> 00:13:17,680
The strongest built in controls are mostly outside the action path.
305
00:13:17,680 –> 00:13:19,680
Conditional access happens at token acquisition.
306
00:13:19,680 –> 00:13:23,480
Purview happens after the fact. Those are important controls, but they are not action-time
307
00:13:23,480 –> 00:13:24,720
authorization.
308
00:13:24,720 –> 00:13:28,680
So here’s what audit, provenance, policy gate looks like on a real timeline.
309
00:13:28,680 –> 00:13:32,600
The user asks the agent to do something, and the agent retrieves context.
310
00:13:32,600 –> 00:13:34,160
It compiles candidates.
311
00:13:34,160 –> 00:13:35,520
It selects a tool.
312
00:13:35,520 –> 00:13:38,600
In a safe architecture, there’s a hard boundary right there.
313
00:13:38,600 –> 00:13:41,680
The agent submits a request, not an imperative.
314
00:13:41,680 –> 00:13:45,600
Data, intent, scope, data class, venue and an operation ID.
315
00:13:45,600 –> 00:13:49,840
The policy engine evaluates those attributes against rules and authoritative state, produces
316
00:13:49,840 –> 00:13:53,000
a decision artifact and only then does execution happen.
317
00:13:53,000 –> 00:13:55,800
And the decision artifact gets stored next to the action.
318
00:13:55,800 –> 00:13:59,560
That last part is what makes governance real, because the artifact is proof, not narrative.
319
00:13:59,560 –> 00:14:00,560
You can sample it.
320
00:14:00,560 –> 00:14:01,560
You can query it.
321
00:14:01,560 –> 00:14:02,560
You can show it in an audit.
322
00:14:02,560 –> 00:14:07,720
Allowed under rule D104, constraint C17, based on state version 6.
323
00:14:07,720 –> 00:14:11,000
Or denied under rule V302 due to mixed audience.
324
00:14:11,000 –> 00:14:13,160
This is what prevention looks like when it’s measurable.
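One way to picture a decision artifact stored next to the action is the Python sketch below. The rule IDs D104 and V302, constraint C17, and state version 6 come from the examples above; everything else (`DecisionArtifact`, `run_gated`, the operation IDs, the in-memory `ledger`) is a hypothetical stand-in for whatever store and executor a real deployment would use.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Optional


@dataclass(frozen=True)
class DecisionArtifact:
    operation_id: str
    verdict: str               # "allow" or "deny"
    rule_id: str               # e.g. "D104"
    constraint: Optional[str]  # e.g. "C17"
    state_version: int
    reason: str = ""


def run_gated(artifact: DecisionArtifact, tool: Callable[[], str], ledger: list) -> Optional[str]:
    """Record the artifact first, then execute only if the verdict is allow."""
    entry = {"decision": asdict(artifact), "result": None}
    ledger.append(entry)
    if artifact.verdict != "allow":
        return None
    entry["result"] = tool()
    return entry["result"]


ledger: list = []
allowed = DecisionArtifact("op-001", "allow", "D104", "C17", 6)
denied = DecisionArtifact("op-002", "deny", "V302", None, 6, reason="mixed audience")

run_gated(allowed, lambda: "archived", ledger)
run_gated(denied, lambda: "shared", ledger)

# Artifacts are structured, so they can be sampled and queried later.
denials = [e["decision"]["rule_id"] for e in ledger if e["decision"]["verdict"] == "deny"]
print(json.dumps(ledger[0]["decision"], sort_keys=True))
print(denials)
```

Because the artifact is written before the tool runs, a denied call still leaves a record, and an executed call always has a decision sitting beside it.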
325
00:14:13,160 –> 00:14:15,800
Now, the obvious pushback is, “but we have transcripts.”
326
00:14:15,800 –> 00:14:17,000
We have citations.
327
00:14:17,000 –> 00:14:18,000
We have activity logs.
328
00:14:18,000 –> 00:14:19,000
Isn’t that provenance?
329
00:14:19,000 –> 00:14:20,000
No.
330
00:14:20,000 –> 00:14:22,440
Transcripts are experience-plane narration.
331
00:14:22,440 –> 00:14:24,200
Citations are retrieval references.
332
00:14:24,200 –> 00:14:25,360
Activity logs are event records.
333
00:14:25,360 –> 00:14:26,360
They are necessary.
334
00:14:26,360 –> 00:14:27,360
They are not sufficient.
335
00:14:27,360 –> 00:14:28,880
They do not tell you what was excluded.
336
00:14:28,880 –> 00:14:31,200
They do not tell you what alternatives were rejected.
337
00:14:31,200 –> 00:14:35,080
They do not tell you whether a policy evaluated the action before execution.
338
00:14:35,080 –> 00:14:37,680
They do not tell you whether the system could have stopped itself.
339
00:14:37,680 –> 00:14:40,960
If you remember nothing else from this section, keep this ordering straight.
340
00:14:40,960 –> 00:14:42,400
Audit explains what happened.
341
00:14:42,400 –> 00:14:44,680
Provenance explains why that path was taken.
342
00:14:44,680 –> 00:14:47,840
A policy gate decides whether it’s allowed to happen at all.
343
00:14:47,840 –> 00:14:52,000
And when you add a face and a voice, you increase the probability that your organization
344
00:14:52,000 –> 00:14:54,120
confuses the first two for the third.
345
00:14:54,120 –> 00:14:58,120
Case study 1: mis-scoped tool call deletes the wrong SharePoint site.
346
00:14:58,120 –> 00:15:02,320
Here’s the first failure pattern because it’s the one that keeps happening quietly in enterprises
347
00:15:02,320 –> 00:15:04,160
that believe they’re governed.
348
00:15:04,160 –> 00:15:08,160
A productivity team rolls out an agent to clean up obsolete project sites.
349
00:15:08,160 –> 00:15:09,480
The brief sounds harmless.
350
00:15:09,480 –> 00:15:11,280
The agent is grounded in SharePoint.
351
00:15:11,280 –> 00:15:15,680
It can read site metadata, parse a tracker spreadsheet, and it has Microsoft Graph write
352
00:15:15,680 –> 00:15:19,720
access because eventually it needs to delete or archive things.
353
00:15:19,720 –> 00:15:21,240
The organization is proud.
354
00:15:21,240 –> 00:15:23,920
It’s using a dedicated workload identity.
355
00:15:23,920 –> 00:15:26,600
Conditional access is enforced and Purview capture is enabled.
356
00:15:26,600 –> 00:15:29,080
At 0902, the agent authenticates.
357
00:15:29,080 –> 00:15:30,880
Conditional access evaluates and passes.
358
00:15:30,880 –> 00:15:31,880
A token is issued.
359
00:15:31,880 –> 00:15:32,880
No anomaly.
360
00:15:32,880 –> 00:15:33,880
No risk event.
361
00:15:33,880 –> 00:15:35,880
This is what good looks like.
362
00:15:35,880 –> 00:15:40,160
At 0905, a user asks, “Can you remove the old project spaces from last year?”
363
00:15:40,160 –> 00:15:42,640
The active list is in the project’s archive tracker.
364
00:15:42,640 –> 00:15:44,400
Now the agent does what agents do.
365
00:15:44,400 –> 00:15:45,400
It retrieves context.
366
00:15:45,400 –> 00:15:46,480
It reads the tracker.
367
00:15:46,480 –> 00:15:48,520
It searches for sites with similar names.
368
00:15:48,520 –> 00:15:49,720
It weighs signals.
369
00:15:49,720 –> 00:15:54,440
Last modified date, owner, whether a Teams channel exists, whether there are recent files,
370
00:15:54,440 –> 00:15:58,680
maybe a weak hint from an email thread. None of those are authoritative truth.
371
00:15:58,680 –> 00:15:59,680
They’re clues.
372
00:15:59,680 –> 00:16:00,680
Then it makes the choice.
373
00:16:00,680 –> 00:16:02,640
It selects a site that looks obsolete.
374
00:16:02,640 –> 00:16:03,800
And it calls the tool.
375
00:16:03,800 –> 00:16:06,520
It executes a Graph delete on the wrong SharePoint site.
376
00:16:06,520 –> 00:16:07,520
Nothing exotic happened here.
377
00:16:07,520 –> 00:16:08,520
No prompt injection.
378
00:16:08,520 –> 00:16:10,120
No compromised credential.
379
00:16:10,120 –> 00:16:11,680
No global admin role.
380
00:16:11,680 –> 00:16:16,520
This is normal probabilistic selection, acting at machine speed, with standing write scopes.
381
00:16:16,520 –> 00:16:18,800
Now look at what your governance artifacts say.
382
00:16:18,800 –> 00:16:20,200
Purview will show an interaction.
383
00:16:20,200 –> 00:16:21,720
It will show the user request.
384
00:16:21,720 –> 00:16:23,520
It will show the agent’s response.
385
00:16:23,520 –> 00:16:24,520
You will see timestamps.
386
00:16:24,520 –> 00:16:25,920
You will see the agent identity.
387
00:16:25,920 –> 00:16:29,920
You may see citations pointing to the tracker and maybe a policy doc.
388
00:16:29,920 –> 00:16:33,600
And you will see an activity: a site was deleted by that agent identity.
389
00:16:33,600 –> 00:16:34,600
Everything is correct.
390
00:16:34,600 –> 00:16:37,640
And none of it answers the question that matters in the incident review.
391
00:16:37,640 –> 00:16:38,800
Why that site?
392
00:16:38,800 –> 00:16:40,880
Not the narrative, “because it was obsolete.”
393
00:16:40,880 –> 00:16:42,360
The actual decision chain.
394
00:16:42,360 –> 00:16:46,000
Which retrieved chunk pushed it over the threshold? Which alternative candidates were
395
00:16:46,000 –> 00:16:47,840
considered and rejected?
396
00:16:47,840 –> 00:16:51,040
What eligibility rule was evaluated at the moment of execution?
397
00:16:51,040 –> 00:16:54,920
In most deployments, the answer is, no eligibility rule was evaluated.
398
00:16:54,920 –> 00:16:56,760
The agent inferred eligibility.
399
00:16:56,760 –> 00:17:00,760
That inference became an action because the tool was callable and the token was valid.
400
00:17:00,760 –> 00:17:02,200
Audit gave you a story.
401
00:17:02,200 –> 00:17:03,680
It did not give you prevention.
402
00:17:03,680 –> 00:17:07,360
And the worst part is how the post-incident conversation usually goes because it’s always
403
00:17:07,360 –> 00:17:09,120
experience plane thinking.
404
00:17:09,120 –> 00:17:10,120
We’ll improve the prompt.
405
00:17:10,120 –> 00:17:11,840
We’ll add a confirmation step.
406
00:17:11,840 –> 00:17:13,960
We’ll tell users to be more specific.
407
00:17:13,960 –> 00:17:15,280
Those are all entropy generators.
408
00:17:15,280 –> 00:17:18,800
They add more conditional branches, more human confusion and more opportunity for the
409
00:17:18,800 –> 00:17:21,280
agent to interpret a suggestion as a command.
410
00:17:21,280 –> 00:17:25,080
The architectural fix is boring and it works because it doesn’t require belief.
411
00:17:25,080 –> 00:17:26,080
First, idempotency.
412
00:17:26,080 –> 00:17:30,120
Every destructive request carries an operation ID persisted before execution.
413
00:17:30,120 –> 00:17:33,560
If the same request is replayed, retried, duplicated or reordered,
414
00:17:33,560 –> 00:17:37,240
the system returns the prior outcome and does not re-execute side effects.
415
00:17:37,240 –> 00:17:40,760
That turns event-driven unreliability into safe replay.
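Here is a minimal sketch of that idempotency contract, assuming an in-memory SQLite table as a stand-in for the durable store. The class name, table name and operation IDs are hypothetical. The point it shows is the order of steps: persist the operation ID first, run the side effect second, and answer every replay from the record.

```python
import sqlite3


class IdempotentExecutor:
    """Runs a side effect at most once per operation ID (illustrative sketch)."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS operations ("
            "op_id TEXT PRIMARY KEY, status TEXT NOT NULL, outcome TEXT)"
        )

    def execute(self, op_id, side_effect):
        try:
            # Persist the operation ID BEFORE the side effect runs.
            with self.conn:
                self.conn.execute(
                    "INSERT INTO operations (op_id, status) VALUES (?, 'pending')",
                    (op_id,),
                )
        except sqlite3.IntegrityError:
            # Replay, retry or duplicate: return the prior record, do not re-execute.
            status, outcome = self.conn.execute(
                "SELECT status, outcome FROM operations WHERE op_id = ?", (op_id,)
            ).fetchone()
            return {"status": status, "outcome": outcome, "replayed": True}

        outcome = side_effect()
        with self.conn:
            self.conn.execute(
                "UPDATE operations SET status = 'done', outcome = ? WHERE op_id = ?",
                (outcome, op_id),
            )
        return {"status": "done", "outcome": outcome, "replayed": False}


# Usage: the same operation ID arrives twice, the delete runs once.
deleted = []

def delete_site():
    deleted.append("site-123")  # hypothetical site ID
    return "deleted site-123"

executor = IdempotentExecutor(sqlite3.connect(":memory:"))
first = executor.execute("op-001", delete_site)
second = executor.execute("op-001", delete_site)
```

One caveat in this sketch: if the process dies between the insert and the update, a replay sees a pending row with no outcome. That row needs reconciliation, not a blind second attempt.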
416
00:17:40,760 –> 00:17:41,920
Second, authoritative state.
417
00:17:41,920 –> 00:17:43,720
“Eligible for deletion” is not a vibe.
418
00:17:43,720 –> 00:17:46,360
It’s a state property stored in a system of record.
419
00:17:46,360 –> 00:17:50,000
If the authoritative catalog says retired: true, dormant: 90 days, and owner approved:
420
00:17:50,000 –> 00:17:51,720
true, then the site can be deleted.
421
00:17:51,720 –> 00:17:53,440
If not, the site cannot be deleted.
422
00:17:53,440 –> 00:17:55,280
The agent does not get to negotiate that.
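As a rough sketch, eligibility can be a pure function over a catalog record. The field names and the sample catalog are hypothetical, and reading "dormant 90 days" as "at least 90 days" is an assumption. The function never sees the prompt, the transcript or the agent's reasoning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    """One row in the authoritative catalog (hypothetical schema)."""
    site_id: str
    retired: bool
    dormant_days: int
    owner_approved: bool


def eligible_for_deletion(record, min_dormant_days=90):
    """True only when every stored condition holds; a missing record is not eligible."""
    return (
        record is not None
        and record.retired
        and record.dormant_days >= min_dormant_days
        and record.owner_approved
    )


# Stand-in for the system of record.
CATALOG = {
    "site-old": CatalogRecord("site-old", True, 120, True),
    "site-live": CatalogRecord("site-live", False, 3, False),
}

old_ok = eligible_for_deletion(CATALOG.get("site-old"))
live_ok = eligible_for_deletion(CATALOG.get("site-live"))
unknown_ok = eligible_for_deletion(CATALOG.get("site-unknown"))
```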
423
00:17:55,280 –> 00:17:57,280
Third, the policy gate.
424
00:17:57,280 –> 00:18:00,080
Before the delete tool executes, the agent submits a structured
425
00:18:00,080 –> 00:18:01,080
request:
426
00:18:01,080 –> 00:18:02,080
actor,
427
00:18:02,080 –> 00:18:03,080
intent:
428
00:18:03,080 –> 00:18:04,080
delete,
429
00:18:04,080 –> 00:18:05,080
scope:
430
00:18:05,080 –> 00:18:06,080
site
431
00:18:06,080 –> 00:18:07,080
ID,
432
00:18:07,080 –> 00:18:08,080
data class,
433
00:18:08,080 –> 00:18:09,080
venue,
434
00:18:09,080 –> 00:18:10,080
operation
435
00:18:10,080 –> 00:18:11,080
ID.
436
00:18:11,080 –> 00:18:12,080
The policy engine evaluates that request against rules and authoritative state and returns allow,
437
00:18:12,080 –> 00:18:13,080
deny or transform.
438
00:18:13,080 –> 00:18:16,080
If it denies, the tool never sees the request.
439
00:18:16,080 –> 00:18:19,080
If it allows, the decision artifact is stored next to the action.
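A sketch of that gate, under stated assumptions: the request fields mirror the ones just listed, the set of eligible site IDs stands in for the authoritative catalog, and the list stands in for durable decision storage. All names are hypothetical, and the transform outcome is left out to keep it short.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ActionRequest:
    actor: str
    intent: str
    scope: str          # the site ID
    data_class: str
    venue: str
    operation_id: str


class PolicyGate:
    """Sits between the agent and the delete tool (illustrative only)."""

    def __init__(self, eligible_site_ids, delete_tool):
        self.eligible_site_ids = eligible_site_ids  # stand-in for the catalog
        self.delete_tool = delete_tool
        self.decision_log = []                      # stand-in for durable storage

    def decide(self, request):
        if request.intent != "delete":
            return "deny", "unsupported intent"
        if request.scope not in self.eligible_site_ids:
            return "deny", "site not eligible in catalog"
        return "allow", "eligibility confirmed by catalog"

    def submit(self, request):
        decision, reason = self.decide(request)
        artifact = {"request": asdict(request), "decision": decision, "reason": reason}
        self.decision_log.append(artifact)  # recorded before anything executes
        if decision == "allow":
            artifact["result"] = self.delete_tool(request.scope)
        return artifact


# Usage: the wrong site is denied, and the tool never sees the request.
deleted = []

def delete_site(site_id):
    deleted.append(site_id)
    return "deleted"

gate = PolicyGate({"site-old"}, delete_site)
denied = gate.submit(
    ActionRequest("agent-42", "delete", "site-live", "internal", "teams_chat", "op-001")
)
allowed = gate.submit(
    ActionRequest("agent-42", "delete", "site-old", "internal", "teams_chat", "op-002")
)
```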
440
00:18:19,080 –> 00:18:21,440
Now, replay the same scenario under that model.
441
00:18:21,440 –> 00:18:22,840
The agent compiles candidates.
442
00:18:22,840 –> 00:18:24,520
It proposes the wrong site.
443
00:18:24,520 –> 00:18:28,160
The policy engine evaluates the proposal against the authoritative catalog.
444
00:18:28,160 –> 00:18:30,160
The wrong site fails eligibility.
445
00:18:30,160 –> 00:18:31,160
Deny.
446
00:18:31,160 –> 00:18:32,160
The user still gets a transcript.
447
00:18:32,160 –> 00:18:33,480
The activity logs still exist.
448
00:18:33,480 –> 00:18:37,320
The difference is that your incident is now a denied decision, not a post-mortem.
449
00:18:37,320 –> 00:18:41,080
That’s what audit, provenance, policy gate means operationally.
450
00:18:41,080 –> 00:18:42,840
Audit will always be perfect after the damage.
451
00:18:42,840 –> 00:18:44,880
A gate makes the damage never happen.
452
00:18:44,880 –> 00:18:46,560
Case study 2.
453
00:18:46,560 –> 00:18:47,560
Compliant retrieval.
454
00:18:47,560 –> 00:18:49,560
Policy violation via voice in a meeting.
455
00:18:49,560 –> 00:18:53,600
Now move from wrong target to the failure that governance teams hate because it breaks
456
00:18:53,600 –> 00:18:55,760
all their comfortable categories.
457
00:18:55,760 –> 00:18:56,760
Everything is entitled.
458
00:18:56,760 –> 00:18:57,760
Everything is logged.
459
00:18:57,760 –> 00:18:58,760
Correct.
460
00:18:58,760 –> 00:19:00,080
And it’s still unacceptable.
461
00:19:00,080 –> 00:19:03,560
An HR assistant agent gets deployed into Teams meetings.
462
00:19:03,560 –> 00:19:08,400
It’s grounded on policy documents, FAQs, compensation guidance and a curated SharePoint
463
00:19:08,400 –> 00:19:11,080
library managed by the compensation team.
464
00:19:11,080 –> 00:19:12,520
The pitch sounds responsible.
465
00:19:12,520 –> 00:19:13,520
The agent is read only.
466
00:19:13,520 –> 00:19:15,080
It’s not writing anywhere.
467
00:19:15,080 –> 00:19:17,480
And it’s meant to reduce interruptions in live calls.
468
00:19:17,480 –> 00:19:18,760
A director asks a question.
469
00:19:18,760 –> 00:19:19,760
The agent answers.
470
00:19:19,760 –> 00:19:20,760
Everyone moves on.
471
00:19:20,760 –> 00:19:22,120
The identity model looks clean.
472
00:19:22,120 –> 00:19:24,400
It runs under a workload identity.
473
00:19:24,400 –> 00:19:26,120
Conditional access protects token issuance.
474
00:19:26,120 –> 00:19:29,240
Purview is configured to capture conversation transcripts.
475
00:19:29,240 –> 00:19:30,920
Copilot activity logs are enabled.
476
00:19:30,920 –> 00:19:33,080
From a governance standpoint it checks boxes.
477
00:19:33,080 –> 00:19:34,080
Then the meeting happens.
478
00:19:34,080 –> 00:19:37,160
A director asks, “What are the employee trends this quarter?”
479
00:19:37,160 –> 00:19:38,720
That question is vague on purpose.
480
00:19:38,720 –> 00:19:43,240
Humans ask vague questions in meetings because they don’t want to specify constraints out loud.
481
00:19:43,240 –> 00:19:46,400
They assume the audience understands the implied boundaries.
482
00:19:46,400 –> 00:19:49,120
Don’t mention anything sensitive in front of externals.
483
00:19:49,120 –> 00:19:50,120
Keep it high level.
484
00:19:50,120 –> 00:19:53,120
Don’t surface anything that can be misinterpreted or forwarded.
485
00:19:53,120 –> 00:19:54,800
The agent does not have those instincts.
486
00:19:54,800 –> 00:19:56,240
It does what it was built to do.
487
00:19:56,240 –> 00:19:57,240
It retrieves.
488
00:19:57,240 –> 00:19:58,240
It aggregates.
489
00:19:58,240 –> 00:19:59,240
It summarizes.
490
00:19:59,240 –> 00:20:01,680
It picks numbers because numbers sound authoritative.
491
00:20:01,680 –> 00:20:03,680
It synthesizes a clean verbal answer.
492
00:20:03,680 –> 00:20:04,840
And it says it out loud.
493
00:20:04,840 –> 00:20:07,920
Maybe it mentions compensation movement by level and region.
494
00:20:07,920 –> 00:20:10,280
Maybe it reports internal mobility rates.
495
00:20:10,280 –> 00:20:14,600
Maybe it references subgroup deltas because the underlying documents include those charts.
496
00:20:14,600 –> 00:20:15,600
No names.
497
00:20:15,600 –> 00:20:17,320
No row level PII.
498
00:20:17,320 –> 00:20:18,960
No single record disclosure.
499
00:20:18,960 –> 00:20:20,760
Still a policy violation.
500
00:20:20,760 –> 00:20:22,760
Because the harm here isn’t access.
501
00:20:22,760 –> 00:20:24,520
The harm is venue.
502
00:20:24,520 –> 00:20:26,960
The harm is aggregation.
503
00:20:26,960 –> 00:20:29,080
The meeting includes external participants.
504
00:20:29,080 –> 00:20:33,200
Vendors, a partner org, someone dialing in from an unfamiliar domain. That happens constantly
505
00:20:33,200 –> 00:20:34,920
in modern enterprises.
506
00:20:34,920 –> 00:20:36,880
Teams meetings are porous by default.
507
00:20:36,880 –> 00:20:39,440
The audience boundary shifts in real time.
508
00:20:39,440 –> 00:20:42,160
And the agent, because it is speaking, becomes an egress path.
509
00:20:42,160 –> 00:20:46,120
Now look at the telemetry and watch how it fails you while remaining technically correct.
510
00:20:46,120 –> 00:20:47,320
Purview shows the transcript.
511
00:20:47,320 –> 00:20:48,800
The question and the answer are there.
512
00:20:48,800 –> 00:20:51,240
The citations point to the right HR library documents.
513
00:20:51,240 –> 00:20:52,880
The agent identity is valid.
514
00:20:52,880 –> 00:20:56,240
The user who asked the question is entitled to the documents.
515
00:20:56,240 –> 00:20:57,800
The SharePoint permissions are correct.
516
00:20:57,800 –> 00:20:59,440
The retrieval was security trimmed.
517
00:20:59,440 –> 00:21:01,240
All the traditional controls passed.
518
00:21:01,240 –> 00:21:02,480
So what exactly was missing?
519
00:21:02,480 –> 00:21:05,360
The policy evaluation that should have happened at speech time.
520
00:21:05,360 –> 00:21:09,320
Nobody asked a deterministic question like, is it permissible to verbalize this class of
521
00:21:09,320 –> 00:21:13,080
information at this aggregation level in this venue to this audience?
522
00:21:13,080 –> 00:21:15,720
Because speech was treated as output, not action.
523
00:21:15,720 –> 00:21:16,720
This is the trap.
524
00:21:16,720 –> 00:21:18,840
Teams treat tool calls like actions.
525
00:21:18,840 –> 00:21:22,320
Graph writes, deletes, shares, but they treat speech like harmless UI.
526
00:21:22,320 –> 00:21:23,160
It is not.
527
00:21:23,160 –> 00:21:25,040
In a meeting, speech is publication.
528
00:21:25,040 –> 00:21:28,000
It leaves the system boundary the moment it hits the room.
529
00:21:28,000 –> 00:21:29,080
People repeat it.
530
00:21:29,080 –> 00:21:30,080
Screenshots happen.
531
00:21:30,080 –> 00:21:31,640
Someone says, can you send that to me?
532
00:21:31,640 –> 00:21:32,760
And now it’s in chat.
533
00:21:32,760 –> 00:21:37,240
The output becomes durable even if the data never left SharePoint at the file level.
534
00:21:37,240 –> 00:21:40,640
Your DLP policies can stay green while your policy posture goes red.
535
00:21:40,640 –> 00:21:44,360
And because the agent sounds calm and competent, nobody interrupts it.
536
00:21:44,360 –> 00:21:47,440
Human interface trust bias turns the meeting into an amplifier.
537
00:21:47,440 –> 00:21:51,000
The agent just shipped an aggregation to a mixed audience at machine speed, wrapped in
538
00:21:51,000 –> 00:21:52,800
a tone that implies permission.
539
00:21:52,800 –> 00:21:54,560
Now the fix, again, isn’t “ban voice.”
540
00:21:54,560 –> 00:21:56,760
The fix is to treat voice as a tool call.
541
00:21:56,760 –> 00:22:00,640
Before the agent speaks, you classify the output, not just the input documents.
542
00:22:00,640 –> 00:22:01,680
The output.
543
00:22:01,680 –> 00:22:06,320
You attach attributes: data class compensation, aggregation cohort, venue Teams meeting,
544
00:22:06,320 –> 00:22:09,000
audience mixed, external present true.
545
00:22:09,000 –> 00:22:12,080
Then you submit that as a request to a policy engine.
546
00:22:12,080 –> 00:22:16,120
And the policy engine does what humans do automatically and machines never do unless
547
00:22:16,120 –> 00:22:17,160
you force them to.
548
00:22:17,160 –> 00:22:19,200
It evaluates a rule like.
549
00:22:19,200 –> 00:22:23,720
Compensation cohorts may not be disclosed verbally when external participants are present.
550
00:22:23,720 –> 00:22:27,040
Allow only high level summaries with no subgroup references.
551
00:22:27,040 –> 00:22:30,600
Transform the response or deny it. If it denies, the speech tool never runs.
552
00:22:30,600 –> 00:22:33,480
If it transforms, the agent speaks a sanitized version.
553
00:22:33,480 –> 00:22:35,680
High level trends remained within target ranges.
554
00:22:35,680 –> 00:22:39,120
Detailed breakdown is available to HR only audiences.
555
00:22:39,120 –> 00:22:43,240
And the decision artifact gets stored next to the action denied or transformed under rule
556
00:22:43,240 –> 00:22:46,000
v302 with the attributes that triggered it.
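To make "voice as a tool call" concrete, here is a hedged sketch. The attribute names follow the ones described above. The rule IDs, the function names and the default-deny fallback are assumptions for illustration, not anything a Microsoft product exposes. The sanitized sentence is the one quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechRequest:
    text: str
    data_class: str         # e.g. "compensation"
    aggregation: str        # e.g. "cohort" or "summary"
    venue: str              # e.g. "teams_meeting"
    external_present: bool


SAFE_SUMMARY = (
    "High level trends remained within target ranges. "
    "Detailed breakdown is available to HR only audiences."
)


def evaluate_speech(req):
    """Return (decision, text_to_speak, rule_id). Rule IDs are made up."""
    if not req.external_present:
        return "allow", req.text, "INTERNAL-AUDIENCE"
    if req.data_class == "public":
        return "allow", req.text, "PUBLIC-DATA"
    if req.data_class == "compensation":
        if req.aggregation == "summary":
            return "allow", req.text, "HR-VOICE-2"
        return "transform", SAFE_SUMMARY, "HR-VOICE-1"
    return "deny", "", "DEFAULT-DENY"


def speak_through_gate(req, speak, decision_log):
    """Evaluate first, store the decision artifact, then speak only what was approved."""
    decision, text, rule_id = evaluate_speech(req)
    decision_log.append({
        "decision": decision,
        "rule": rule_id,
        "data_class": req.data_class,
        "aggregation": req.aggregation,
        "venue": req.venue,
        "external_present": req.external_present,
    })
    if decision != "deny":
        speak(text)
    return decision


# Usage: a cohort-level answer in a meeting with externals gets transformed.
spoken, log = [], []
detailed = SpeechRequest(
    "placeholder for a cohort-level breakdown", "compensation", "cohort",
    "teams_meeting", True,
)
outcome = speak_through_gate(detailed, spoken.append, log)
```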
557
00:22:46,000 –> 00:22:47,360
Now replay the incident.
558
00:22:47,360 –> 00:22:50,200
Same question, same retrieval, same entitlement, different outcome.
559
00:22:50,200 –> 00:22:54,040
The agent proposes a detailed answer, the control plane disposes, the meeting gets a safe
560
00:22:54,040 –> 00:22:57,280
summary, and your governance story becomes boring on purpose.
561
00:22:57,280 –> 00:23:01,040
Because compliant systems still fail when venue and intent aren’t enforced at the moment
562
00:23:01,040 –> 00:23:02,440
of publication.
563
00:23:02,440 –> 00:23:06,120
Case study 3, external shadow agent with internal blast radius.
564
00:23:06,120 –> 00:23:10,280
Now the failure pattern that doesn’t show up as a breach until the screenshots are already
565
00:23:10,280 –> 00:23:11,280
circulating.
566
00:23:11,280 –> 00:23:13,120
A developer is under pressure.
567
00:23:13,120 –> 00:23:14,360
Support tickets are piling up.
568
00:23:14,360 –> 00:23:18,920
The product team wants a deflection bot and someone has seen a demo where an agent answers questions
569
00:23:18,920 –> 00:23:19,920
instantly.
570
00:23:19,920 –> 00:23:21,720
So they do what modern platforms encourage.
571
00:23:21,720 –> 00:23:26,240
They stand up an externally accessible agent, put a chat widget on a public page, and wire
572
00:23:26,240 –> 00:23:29,320
it to enterprise knowledge so it doesn’t sound stupid.
573
00:23:29,320 –> 00:23:34,360
And because it’s just answering questions, they give it broad read scopes to internal content.
574
00:23:34,360 –> 00:23:38,960
A SharePoint site with runbooks, a wiki, maybe a knowledge base, maybe a support analytics
575
00:23:38,960 –> 00:23:39,960
store.
576
00:23:39,960 –> 00:23:42,280
They also add a couple of write scopes for later.
577
00:23:42,280 –> 00:23:46,320
Not because they’re malicious, but because future features always arrive and nobody wants to redo consent.
578
00:23:46,320 –> 00:23:48,600
The agent authenticates using an app registration.
579
00:23:48,600 –> 00:23:49,600
It gets a token.
580
00:23:49,600 –> 00:23:51,000
It calls internal systems.
581
00:23:51,000 –> 00:23:52,000
Everything is legitimate.
582
00:23:52,000 –> 00:23:53,480
That’s the core danger here.
583
00:23:53,480 –> 00:23:55,480
Nothing has to be compromised for this to go wrong.
584
00:23:55,480 –> 00:24:00,720
A customer asks a harmless question: “What’s the workaround for the X120 firmware outage?”
585
00:24:00,720 –> 00:24:04,320
The agent retrieves internal runbooks and post-mortem fragments that were never meant
586
00:24:04,320 –> 00:24:05,600
to leave the tenant.
587
00:24:05,600 –> 00:24:07,320
It assembles a confident answer.
588
00:24:07,320 –> 00:24:08,800
It publishes it to the public chat.
589
00:24:08,800 –> 00:24:12,800
No exploit chain, no prompt injection, no data exfiltration tooling, just a public interface
590
00:24:12,800 –> 00:24:16,600
connected to an internal corpus by an overpermissioned workload identity.
591
00:24:16,600 –> 00:24:19,680
Now walk through what the logs tell you and what they can’t.
592
00:24:19,680 –> 00:24:22,720
Entra shows token issuance under a workload identity.
593
00:24:22,720 –> 00:24:26,440
If conditional access is configured for that identity, it evaluates the sign-in context
594
00:24:26,440 –> 00:24:27,720
and issues the token.
595
00:24:27,720 –> 00:24:30,440
Purview shows the agent reading internal documents.
596
00:24:30,440 –> 00:24:34,880
The audit trail is pristine: identity, timestamps, resource access, downstream calls.
597
00:24:34,880 –> 00:24:39,160
The organization can prove down to the minute that the agent touched those files and responded
598
00:24:39,160 –> 00:24:40,640
to that external user.
599
00:24:40,640 –> 00:24:45,320
And that’s the trap, because the logs being correct becomes evidence incorrectly that the
600
00:24:45,320 –> 00:24:46,680
system was governed.
601
00:24:46,680 –> 00:24:49,640
What’s missing is the decision chain and the boundary enforcement.
602
00:24:49,640 –> 00:24:53,720
Why did the external request get access to internal-only material? Which rule asserted
603
00:24:53,720 –> 00:24:56,360
that this venue is allowed to consume that corpus?
604
00:24:56,360 –> 00:25:01,320
Where is the policy artifact that says public audience, internal classification, deny disclosure?
605
00:25:01,320 –> 00:25:04,840
In most shadow deployments, there is no artifact because there was no gate.
606
00:25:04,840 –> 00:25:08,280
The agent selected sources based on similarity and availability.
607
00:25:08,280 –> 00:25:10,920
The tool call executed because the token allowed it.
608
00:25:10,920 –> 00:25:13,040
The system did exactly what you configured.
609
00:25:13,040 –> 00:25:16,720
This is where audit, provenance, policy gate becomes operationally expensive.
610
00:25:16,720 –> 00:25:18,360
Audit tells you the leak happened.
611
00:25:18,360 –> 00:25:21,840
Provenance would tell you how the agent chose that runbook over a public doc.
612
00:25:21,840 –> 00:25:23,120
What other candidates existed?
613
00:25:23,120 –> 00:25:24,640
What was excluded and why?
614
00:25:24,640 –> 00:25:28,280
A policy gate would have prevented the response from ever being published externally.
615
00:25:28,280 –> 00:25:30,640
But the public facing agent usually has none of that.
616
00:25:30,640 –> 00:25:35,200
It has an experience plane that looks polished and a control plane that is effectively absent.
617
00:25:35,200 –> 00:25:37,240
Now the blast radius. This isn’t a single reply.
618
00:25:37,240 –> 00:25:38,880
It’s speed, reach and replication.
619
00:25:38,880 –> 00:25:41,920
The agent can answer a thousand external users in a day.
620
00:25:41,920 –> 00:25:45,120
Each answer can include a slightly different internal detail.
621
00:25:45,120 –> 00:25:48,600
Customers screenshot, aggregators scrape, and the information spreads because the interface is
622
00:25:48,600 –> 00:25:51,560
public and the system is consistent in the one way that matters.
623
00:25:51,560 –> 00:25:53,200
It’s consistently allowed.
624
00:25:53,200 –> 00:25:54,960
And the post incident review is predictable.
625
00:25:54,960 –> 00:25:59,920
People say we’ll tighten the prompt or we’ll add a disclaimer or we’ll retrain the model.
626
00:25:59,920 –> 00:26:01,480
Those are not containment strategies.
627
00:26:01,480 –> 00:26:03,000
Those are narrative strategies.
628
00:26:03,000 –> 00:26:07,000
The deterministic fix is boring and it works because it creates failure domains.
629
00:26:07,000 –> 00:26:09,280
First, split the identities.
630
00:26:09,280 –> 00:26:13,840
The public facing agent identity must have zero access to internal core data planes.
631
00:26:13,840 –> 00:26:14,840
None.
632
00:26:14,840 –> 00:26:17,640
It should only query a curated, published, approved external knowledge base.
633
00:26:17,640 –> 00:26:22,840
If the public corpus can’t answer, the correct behavior is refusal or escalation, not improvisation.
634
00:26:22,840 –> 00:26:26,920
Second, if you truly need internal knowledge to support external responses, you introduce
635
00:26:26,920 –> 00:26:27,920
a broker.
636
00:26:27,920 –> 00:26:31,380
The public agent can ask the broker for candidate content but the broker is the policy
637
00:26:31,380 –> 00:26:32,380
gate.
638
00:26:32,380 –> 00:26:34,640
It evaluates venue, audience and data classification.
639
00:26:34,640 –> 00:26:35,980
It transforms or denies.
640
00:26:35,980 –> 00:26:39,880
The public agent never sees internal chunks that are not eligible for egress.
641
00:26:39,880 –> 00:26:42,780
Third, persist the decision artifact with the action.
642
00:26:42,780 –> 00:26:48,520
When a response is allowed externally, you store: allowed under rule EX2-1, source set
643
00:26:48,520 –> 00:26:51,320
PUB docs 2024 Q2.
644
00:26:51,320 –> 00:26:56,640
When it’s denied, you store: denied under rule EX3-01, internal-only content.
645
00:26:56,640 –> 00:27:00,120
Now your audit stops being a story and becomes proof of enforcement.
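The broker pattern can be sketched like this, assuming a toy corpus with a classification label on every chunk and a keyword match standing in for retrieval. The class names are hypothetical. The rule IDs reuse the example labels mentioned above purely for illustration. The public agent only ever receives what the broker releases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str
    classification: str   # "public" or "internal"
    text: str


class EgressBroker:
    """Policy gate for external egress only (illustrative sketch)."""

    def __init__(self, corpus):
        self.corpus = corpus
        self.decisions = []   # stand-in for durable decision storage

    def request_content(self, query, venue, audience):
        # Naive keyword match stands in for real retrieval.
        candidates = [c for c in self.corpus if query.lower() in c.text.lower()]
        released = []
        for chunk in candidates:
            allowed = chunk.classification == "public"
            self.decisions.append({
                "decision": "allowed" if allowed else "denied",
                "rule": "EX2-1" if allowed else "EX3-01",
                "source": chunk.source,
                "venue": venue,          # recorded with the decision
                "audience": audience,
            })
            if allowed:
                released.append(chunk)
        return released


# Usage: the internal runbook matches the query but never reaches the public agent.
corpus = [
    Chunk("runbook-17", "internal", "X120 outage: internal remediation notes"),
    Chunk("pub-faq-3", "public", "X120 firmware: how to check your version"),
]
broker = EgressBroker(corpus)
released = broker.request_content("x120", venue="public_chat", audience="external")
```

If nothing comes back, the public agent refuses or escalates. It does not improvise from memory.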
646
00:27:00,120 –> 00:27:01,960
Replay the same incident under that model.
647
00:27:01,960 –> 00:27:03,880
The customer asks about firmware.
648
00:27:03,880 –> 00:27:06,880
The public agent searches the external corpus and finds nothing definitive.
649
00:27:06,880 –> 00:27:08,160
It asks the broker.
650
00:27:08,160 –> 00:27:11,280
The broker evaluates the internal candidate and denies egress.
651
00:27:11,280 –> 00:27:12,920
The agent replies calmly.
652
00:27:12,920 –> 00:27:17,000
I can’t share internal remediation notes here but I can connect you with support.
653
00:27:17,000 –> 00:27:19,160
The screenshot that circulates is a refusal.
654
00:27:19,160 –> 00:27:22,000
That is what containment looks like when you stop trusting the interface and start
655
00:27:22,000 –> 00:27:24,200
enforcing the control plane.
656
00:27:24,200 –> 00:27:27,320
The protocol standardizes the envelope, not the guarantees.
657
00:27:27,320 –> 00:27:31,560
Microsoft is right about one thing that most teams quietly misunderstand.
658
00:27:31,560 –> 00:27:35,800
Activities, turn context, Direct Line, the Bot Framework patterns, those are not hacks.
659
00:27:35,800 –> 00:27:37,560
They are intentional design surfaces.
660
00:27:37,560 –> 00:27:41,560
They’re how Microsoft expects you to build conversational systems that can run across channels
661
00:27:41,560 –> 00:27:43,800
and survive real-world connectivity.
662
00:27:43,800 –> 00:27:48,640
But teams keep hearing supported protocol and mentally upgrading it to guaranteed behavior.
663
00:27:48,640 –> 00:27:49,640
It is not.
664
00:27:49,640 –> 00:27:51,680
A protocol standardizes the envelope.
665
00:27:51,680 –> 00:27:55,560
It standardizes field names, schemas and how events are represented on the wire.
666
00:27:55,560 –> 00:27:59,920
It does not standardize the guarantees you wish you had: ordering, exactly-once delivery,
667
00:27:59,920 –> 00:28:02,360
causal consistency or safe side effects.
668
00:28:02,360 –> 00:28:06,280
That distinction matters because the moment you wire tool execution to an event stream,
669
00:28:06,280 –> 00:28:07,840
you’ve built a distributed system.
670
00:28:07,840 –> 00:28:11,160
And distributed systems don’t fail because you wrote bad code.
671
00:28:11,160 –> 00:28:13,800
They fail because reality is asynchronous.
672
00:28:13,800 –> 00:28:17,120
Here are the four failure modes you inherit the second you go event-driven.
673
00:28:17,120 –> 00:28:20,520
Duplication, delay, reordering and loss.
674
00:28:20,520 –> 00:28:23,080
Not edge cases, not rare, the environment.
675
00:28:23,080 –> 00:28:25,120
A retry duplicates an activity.
676
00:28:25,120 –> 00:28:27,720
A congested path delays it. Two workers reorder it.
677
00:28:27,720 –> 00:28:29,680
A transient broker drop loses it.
678
00:28:29,680 –> 00:28:31,480
The SDK abstracts the plumbing.
679
00:28:31,480 –> 00:28:32,720
It does not repeal physics.
680
00:28:32,720 –> 00:28:33,720
Now add tools.
681
00:28:33,720 –> 00:28:38,160
A send email, delete site or post-message tool call is not a chat reply.
682
00:28:38,160 –> 00:28:39,160
It’s a side effect.
683
00:28:39,160 –> 00:28:41,480
Side effects are where your system becomes expensive.
684
00:28:41,480 –> 00:28:45,560
If you treat an incoming activity as authoritative state you are saying, “If I see this envelope,
685
00:28:45,560 –> 00:28:46,880
I will mutate the world.”
686
00:28:46,880 –> 00:28:49,080
That’s fine for rendering a typing indicator.
687
00:28:49,080 –> 00:28:51,040
It’s insane for deleting a site.
688
00:28:51,040 –> 00:28:54,480
And this is exactly how you get the incident pattern from the opening.
689
00:28:54,480 –> 00:28:59,000
Conditional access issued a token once, the context drifted, the agent executed anyway,
690
00:28:59,000 –> 00:29:02,440
and Purview logged the whole tragedy with perfect fidelity.
691
00:29:02,440 –> 00:29:06,840
The comfortable response is to say, “We’ll dedupe.” Teams try to dedupe by best effort:
692
00:29:06,840 –> 00:29:11,520
in-memory caches, fuzzy comparisons of whether the text matches, or correlation IDs that exist only
693
00:29:11,520 –> 00:29:13,040
inside a process boundary.
694
00:29:13,040 –> 00:29:16,080
That’s not idempotency, that’s optimism.
695
00:29:16,080 –> 00:29:17,480
Idempotency is a contract.
696
00:29:17,480 –> 00:29:22,240
The same operation ID produces the same result once, no matter how many times it arrives.
697
00:29:22,240 –> 00:29:26,200
And the only way to make that true is to persist the operation ID in an authoritative store
698
00:29:26,200 –> 00:29:28,080
before the side effect happens.
699
00:29:28,080 –> 00:29:29,920
That store is the real boundary.
700
00:29:29,920 –> 00:29:33,080
Not the activity envelope, not the transcript, not the avatar.
701
00:29:33,080 –> 00:29:35,200
This is where most agent architectures quietly rot.
702
00:29:35,200 –> 00:29:37,000
They build a state machine out of envelopes.
703
00:29:37,000 –> 00:29:40,720
Turn context feels like state because it carries context, but it’s a context object, not
704
00:29:40,720 –> 00:29:41,720
a ledger.
705
00:29:41,720 –> 00:29:43,800
It’s a structured wrapper for a single turn.
706
00:29:43,800 –> 00:29:46,680
Not a durable source of truth for workflow eligibility.
707
00:29:46,680 –> 00:29:49,680
When the process restarts, the truth evaporates.
708
00:29:49,680 –> 00:29:53,520
And the event stream happily replays old messages into a new process that has no memory
709
00:29:53,520 –> 00:29:54,920
of what it already did.
710
00:29:54,920 –> 00:29:56,400
That’s conditional chaos.
711
00:29:56,400 –> 00:29:58,600
And notice what makes it worse, embodiment.
712
00:29:58,600 –> 00:30:00,520
Voice adds latency sensitivity.
713
00:30:00,520 –> 00:30:01,520
Streaming adds retries.
714
00:30:01,520 –> 00:30:04,080
WebRTC reconnect logic adds more events.
715
00:30:04,080 –> 00:30:08,040
The experience plane injects more asynchronous behavior into the system, which increases the
716
00:30:08,040 –> 00:30:11,640
probability of duplicates, reordering, and partial failures.
717
00:30:11,640 –> 00:30:13,160
Exactly where your tool calls live.
718
00:30:13,160 –> 00:30:14,880
So the fix starts with the demotion.
719
00:30:14,880 –> 00:30:17,160
Demote events to proposals and telemetry.
720
00:30:17,160 –> 00:30:20,960
Treat every event as a thing that happened or a request that arrived.
721
00:30:20,960 –> 00:30:23,440
Not a state transition that must execute.
722
00:30:23,440 –> 00:30:25,960
Your authoritative workflow position lives elsewhere.
723
00:30:25,960 –> 00:30:27,440
Eligibility lives elsewhere.
724
00:30:27,440 –> 00:30:28,960
The decision lives elsewhere.
725
00:30:28,960 –> 00:30:30,520
Then you do the boring thing that works.
726
00:30:30,520 –> 00:30:32,280
You interpose a deterministic gate.
727
00:30:32,280 –> 00:30:33,880
The agent does not send commands.
728
00:30:33,880 –> 00:30:36,640
It sends structured requests with an operation ID.
729
00:30:36,640 –> 00:30:42,240
The policy engine evaluates against authoritative state and returns, allow, deny, or transform.
730
00:30:42,240 –> 00:30:45,040
Tools accept decisions, not imperatives.
731
00:30:45,040 –> 00:30:49,800
If an event replays, the same operation ID returns the same decision and the same effect.
732
00:30:49,800 –> 00:30:53,960
If the event arrives out of order, the state store says resource doesn’t exist yet.
733
00:30:53,960 –> 00:30:56,040
And the request gets denied deterministically.
734
00:30:56,040 –> 00:31:00,720
If the event arrives late, it gets the same cached decision, not a fresh guess.
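Those three behaviors, replay, out of order and late, fit in a few lines. This is a sketch under loud assumptions: a Python set stands in for the authoritative state store, a dict stands in for the durable decision record, and eligibility rules are left out so the ordering logic stays visible. The names are hypothetical.

```python
class DecisionGate:
    """Treats events as proposals; decisions are keyed by operation ID."""

    def __init__(self):
        self.resources = set()   # stand-in for the authoritative state store
        self.decisions = {}      # operation_id -> decision (durable in a real system)

    def handle(self, operation_id, intent, resource_id):
        # Replayed or late event: same operation ID, same decision, no new effect.
        if operation_id in self.decisions:
            return self.decisions[operation_id]

        if intent == "create":
            self.resources.add(resource_id)
            decision = "allow"
        elif intent == "delete" and resource_id in self.resources:
            self.resources.discard(resource_id)
            decision = "allow"
        else:
            # Out of order (resource doesn't exist yet) or unknown intent.
            decision = "deny"

        self.decisions[operation_id] = decision
        return decision


# Usage: the delete event arrives before the create event, then replays late.
gate = DecisionGate()
early_delete = gate.handle("op-2", "delete", "site-a")   # resource doesn't exist yet
create = gate.handle("op-1", "create", "site-a")
late_replay = gate.handle("op-2", "delete", "site-a")    # cached decision, not a fresh guess
```

Note the last call. The site exists by then, and a fresh evaluation would allow the delete. The gate returns the stored denial instead, which is the whole point.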
735
00:31:00,720 –> 00:31:03,840
Protocol standardization made the system interoperable.
736
00:31:03,840 –> 00:31:05,680
Deterministic design makes it survivable.
737
00:31:05,680 –> 00:31:06,920
Event-driven entropy.
738
00:31:06,920 –> 00:31:09,320
Why retries become incidents without determinism?
739
00:31:09,320 –> 00:31:12,800
This is where enterprise teams accidentally build roulette tables and then act shocked when
740
00:31:12,800 –> 00:31:14,120
the ball lands on red.
741
00:31:14,120 –> 00:31:15,840
They wire an agent to an event stream.
742
00:31:15,840 –> 00:31:19,920
They see clean activities arriving and they treat the arrival of an envelope as permission
743
00:31:19,920 –> 00:31:21,360
to mutate the world.
744
00:31:21,360 –> 00:31:25,160
Create the task, delete the site, send the email, post the message, and because the agent
745
00:31:25,160 –> 00:31:28,200
is just responding, they don’t treat it like a transactional system.
746
00:31:28,200 –> 00:31:29,440
They treat it like UI.
747
00:31:29,440 –> 00:31:30,840
That’s the foundational mistake.
748
00:31:30,840 –> 00:31:33,240
In an event-driven system, delivery is not a guarantee.
749
00:31:33,240 –> 00:31:34,240
It is an attempt.
750
00:31:34,240 –> 00:31:35,240
The platform will retry.
751
00:31:35,240 –> 00:31:36,720
The SDK will reconnect.
752
00:31:36,720 –> 00:31:38,160
WebRTC will renegotiate.
753
00:31:38,160 –> 00:31:39,640
The broker will re-deliver.
754
00:31:39,640 –> 00:31:43,120
When you add a speaking agent, you add more opportunities for those retries because the
755
00:31:43,120 –> 00:31:46,840
experience plane depends on low latency streaming and noisy networks.
756
00:31:46,840 –> 00:31:48,160
The system compensates.
757
00:31:48,160 –> 00:31:49,640
It tries again.
758
00:31:49,640 –> 00:31:50,800
Here’s the part
759
00:31:50,800 –> 00:31:52,480
people refuse to internalize.
760
00:31:52,480 –> 00:31:54,760
At-least-once delivery is not reliability.
761
00:31:54,760 –> 00:31:56,880
It is duplication with good intentions.
762
00:31:56,880 –> 00:32:00,520
If your tool call is not idempotent, at-least-once becomes twice.
763
00:32:00,520 –> 00:32:03,120
Twice is an incident if the action isn’t reversible.
764
00:32:03,120 –> 00:32:05,000
The cleanest example is email.
765
00:32:05,000 –> 00:32:07,880
Everyone thinks email is harmless because it’s not deleting data.
766
00:32:07,880 –> 00:32:11,440
When the agent sends the same message twice because a transient error occurred after the
767
00:32:11,440 –> 00:32:15,600
first send before the system persisted success, the business impact isn’t technical.
768
00:32:15,600 –> 00:32:16,600
It’s human.
769
00:32:16,600 –> 00:32:18,080
People respond to the wrong thread.
770
00:32:18,080 –> 00:32:19,240
Someone escalates.
771
00:32:19,240 –> 00:32:20,480
Someone forwards.
772
00:32:20,480 –> 00:32:24,480
Now you’ve created confusion and possibly disclosure and your logs will insist everything
773
00:32:24,480 –> 00:32:26,840
was fine because both sends were legitimate.
774
00:32:26,840 –> 00:32:31,040
Now, upgrade the action to SharePoint deletion or permissions changes and the same pattern
775
00:32:31,040 –> 00:32:32,440
becomes catastrophic.
776
00:32:32,440 –> 00:32:34,360
This is what actually happens in real workloads.
777
00:32:34,360 –> 00:32:35,840
The agent proposes an action.
778
00:32:35,840 –> 00:32:37,320
The orchestrator calls the tool.
779
00:32:37,320 –> 00:32:38,320
The tool executes.
780
00:32:38,320 –> 00:32:41,560
The response is slow or the network flakes or the process restarts.
781
00:32:41,560 –> 00:32:44,160
The orchestrator never receives the success, so it retries.
782
00:32:44,160 –> 00:32:45,640
The tool executes again.
783
00:32:45,640 –> 00:32:47,360
You now have two side effects.
784
00:32:47,360 –> 00:32:50,560
And the only reason you’re surprised is because you treated the first execution as if it
785
00:32:50,560 –> 00:32:52,400
was tied to the event receipt.
786
00:32:52,400 –> 00:32:53,400
It wasn’t.
787
00:32:53,400 –> 00:32:54,400
The action was tied to optimism.
788
00:32:54,400 –> 00:32:56,280
That’s why dedupe doesn’t save you.
789
00:32:56,280 –> 00:32:58,880
Dedupe by best effort is not a safety mechanism.
790
00:32:58,880 –> 00:33:00,400
It is a logging convenience.
791
00:33:00,400 –> 00:33:01,800
People dedupe using message text.
792
00:33:01,800 –> 00:33:04,920
That fails the first time the model rephrases the same intent.
793
00:33:04,920 –> 00:33:09,600
People dedupe using timestamps. That fails when delayed delivery shifts the arrival window.
794
00:33:09,600 –> 00:33:10,680
People dedupe in memory.
795
00:33:10,680 –> 00:33:12,120
That fails on process restart.
796
00:33:12,120 –> 00:33:14,080
People dedupe by correlating turn IDs.
797
00:33:14,080 –> 00:33:17,240
That fails across channels and adapters where IDs are transformed.
798
00:33:17,240 –> 00:33:19,400
The important thing is not "probably the same."
799
00:33:19,400 –> 00:33:20,760
It is provably the same.
800
00:33:20,760 –> 00:33:24,480
And provably the same requires two things, a stable operation identity and an authoritative
801
00:33:24,480 –> 00:33:26,400
state store that outlives the process.
802
00:33:26,400 –> 00:33:27,560
This is the system law.
803
00:33:27,560 –> 00:33:30,960
If an event can’t be safely replayed, it shouldn’t control state.
804
00:33:30,960 –> 00:33:32,520
Now apply that law to agents.
805
00:33:32,520 –> 00:33:33,960
The agent’s job is to propose.
806
00:33:33,960 –> 00:33:37,520
The system’s job is to decide, and the decision must be persistent.
807
00:33:37,520 –> 00:33:39,560
So the deterministic design is simple, even if the implementation isn’t.
808
00:33:39,560 –> 00:33:43,600
Every side-effecting operation gets an immutable operation
809
00:33:43,600 –> 00:33:48,360
ID that is generated once, not per retry, once.
810
00:33:48,360 –> 00:33:51,960
That operation ID is persisted before the tool executes.
811
00:33:51,960 –> 00:33:55,720
The persisted record includes the proposed action and its current status.
812
00:33:55,720 –> 00:33:58,280
Proposed, allowed, denied, executed, failed.
813
00:33:58,280 –> 00:34:00,000
Then every retry becomes boring.
814
00:34:00,000 –> 00:34:03,840
If the same operation ID arrives again, the system returns the already recorded decision
815
00:34:03,840 –> 00:34:04,840
and outcome.
816
00:34:04,840 –> 00:34:05,840
No new side effect.
817
00:34:05,840 –> 00:34:08,960
No second deletion, no second email, no double share.
818
00:34:08,960 –> 00:34:10,480
The replay is safe by design.
819
00:34:10,480 –> 00:34:12,800
This is also how you neutralize reordering.
820
00:34:12,800 –> 00:34:17,160
If complete task arrives before create task, the state store says the task doesn’t exist
821
00:34:17,160 –> 00:34:19,440
and the request is denied deterministically.
822
00:34:19,440 –> 00:34:20,920
Not handled later, denied.
823
00:34:20,920 –> 00:34:23,400
The event becomes telemetry, not authority.
824
00:34:23,400 –> 00:34:25,320
And this is how you neutralize delay.
825
00:34:25,320 –> 00:34:27,120
Late arrivals don’t trigger fresh decisions.
826
00:34:27,120 –> 00:34:30,640
They map to existing operation IDs and return existing outcomes.
827
00:34:30,640 –> 00:34:33,720
The system does not relitigate intent because the packet arrived late.
828
00:34:33,720 –> 00:34:35,960
Now here’s the uncomfortable part.
829
00:34:35,960 –> 00:34:37,560
None of this is a model problem.
830
00:34:37,560 –> 00:34:39,120
None of this is hallucination.
831
00:34:39,120 –> 00:34:40,840
None of this is Microsoft being sloppy.
832
00:34:40,840 –> 00:34:44,240
This is distributed systems behavior colliding with side effects.
833
00:34:44,240 –> 00:34:47,800
And the more you anthropomorphize the agent, the less likely you are to build these boring
834
00:34:47,800 –> 00:34:51,160
guarantees because you start believing the interaction is the system.
835
00:34:51,160 –> 00:34:52,160
It isn’t.
836
00:34:52,160 –> 00:34:54,800
The system is the state spine behind the conversation.
837
00:34:54,800 –> 00:34:57,120
Without that spine, retries are not resilience.
838
00:34:57,120 –> 00:35:02,160
Retries are how your architecture manufactures incidents out of transient failures.
839
00:35:02,160 –> 00:35:05,800
Deterministic pattern 1: idempotency keys plus authoritative state spine.
840
00:35:05,800 –> 00:35:09,200
Now the first deterministic pattern is the one everybody claims they already have.
841
00:35:09,200 –> 00:35:11,800
Right up until the first replay deletes the wrong thing.
842
00:35:11,800 –> 00:35:14,280
Idempotency is not "we try not to do it twice."
843
00:35:14,280 –> 00:35:15,480
Idempotency is a guarantee.
844
00:35:15,480 –> 00:35:19,640
The same operation, identified the same way, produces the same effect exactly once,
845
00:35:19,640 –> 00:35:22,400
no matter how many times the system replays the request.
846
00:35:22,400 –> 00:35:26,920
And the only way to get that guarantee is to stop pretending the event stream is your state.
847
00:35:26,920 –> 00:35:29,000
Here’s the model that holds under pressure.
848
00:35:29,000 –> 00:35:33,080
Every side-effecting action gets an operation ID that is generated once, at the moment the
849
00:35:33,080 –> 00:35:34,680
intent becomes a request.
850
00:35:34,680 –> 00:35:37,200
Not after the tool call, not after the model replies. Before.
851
00:35:37,200 –> 00:35:39,480
That operation ID is immutable.
852
00:35:39,480 –> 00:35:43,240
Collision-resistant. Boring. It doesn’t encode meaning; it encodes identity.
853
00:35:43,240 –> 00:35:47,360
Then you persist it to an authoritative state spine before you execute anything.
854
00:35:47,360 –> 00:35:51,520
That spine is not your turn state. It is not an in-memory cache. It is not "we’ll reconstruct it
855
00:35:51,520 –> 00:35:52,720
from logs."
856
00:35:52,720 –> 00:35:57,760
It is a durable store that outlives the process and survives retries, restarts and parallel
857
00:35:57,760 –> 00:35:58,760
workers.
858
00:35:58,760 –> 00:36:03,480
Then you store the minimum fields that make replay safe: operation ID, proposed action
859
00:36:03,480 –> 00:36:09,160
(structured, not prose), current status (proposed, decided, executed),
860
00:36:09,160 –> 00:36:14,160
decision artifact pointer (allowed, denied, transformed), target resource identifiers,
861
00:36:14,160 –> 00:36:16,760
and a timestamp plus version for concurrency control.
862
00:36:16,760 –> 00:36:20,760
Once you have that, retries stop being dangerous because retries stop being meaningful.
863
00:36:20,760 –> 00:36:25,040
A duplicate event arrives. You look up the operation ID. You already have a status of
864
00:36:25,040 –> 00:36:28,640
executed. You return the previous outcome. No second side effect, no best-effort
865
00:36:28,640 –> 00:36:29,640
dedupe.
866
00:36:29,640 –> 00:36:33,800
It is deterministic because the state spine is authoritative. Reordering becomes boring
867
00:36:33,800 –> 00:36:38,720
for the same reason. An event arrives that says complete task before create task. In
868
00:36:38,720 –> 00:36:43,000
an envelope-driven system, that creates a time machine. In a spine-driven system, you check
869
00:36:43,000 –> 00:36:44,000
state.
870
00:36:44,000 –> 00:36:45,800
Task doesn’t exist.
871
00:36:45,800 –> 00:36:50,720
You deny deterministically, not because the agent is smart, but because the state is authoritative.
872
00:36:50,720 –> 00:36:52,240
Delays become boring as well.
873
00:36:52,240 –> 00:36:56,520
If the event arrives late, it still references the same operation ID, and you return the same decision.
874
00:36:56,520 –> 00:37:00,240
The system doesn’t reinterpret intent because a packet took a scenic route through someone’s
875
00:37:00,240 –> 00:37:01,240
VPN hairpin.
876
00:37:01,240 –> 00:37:04,440
Now the thing most people miss is the separation of concerns.
877
00:37:04,440 –> 00:37:07,600
Idempotency prevents double harm. It does not decide what is allowed.
878
00:37:07,600 –> 00:37:13,520
That’s why the operation ID must exist before policy evaluation and before tool execution.
879
00:37:13,520 –> 00:37:17,880
It becomes the anchor for everything else: decision, execution, audit, provenance.
880
00:37:17,880 –> 00:37:22,480
Without that anchor, you can’t tie "what we decided" to "what we did" in a way that survives
881
00:37:22,480 –> 00:37:23,480
failure.
882
00:37:23,480 –> 00:37:25,880
And you need one more piece to make the spine real.
883
00:37:25,880 –> 00:37:30,440
A workflow state machine that is defined outside the agent. The agent can propose; the system
884
00:37:30,440 –> 00:37:34,920
must track progression. Proposed, decided, executed is not optional ceremony.
885
00:37:34,920 –> 00:37:38,360
It is how you prevent event noise from becoming irreversible action.
886
00:37:38,360 –> 00:37:42,920
When a worker crashes after executing, but before replying, the state already says executed.
887
00:37:42,920 –> 00:37:44,720
The next worker doesn’t try again.
888
00:37:44,720 –> 00:37:46,160
It returns the recorded effect.
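The replay behavior described here can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the table layout and the names `open_spine`, `new_operation_id`, and `execute_once` are hypothetical, and SQLite in `:memory:` mode stands in for a durable store only to keep the example self-contained.

```python
import sqlite3
import uuid


def open_spine(path=":memory:"):
    # Assumption: a real state spine is a shared, durable database.
    # ":memory:" is used here only so the sketch runs anywhere.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS operations ("
        "operation_id TEXT PRIMARY KEY, action TEXT NOT NULL, "
        "status TEXT NOT NULL, outcome TEXT)"
    )
    return db


def new_operation_id():
    # Generated once, when intent becomes a request, and reused on every retry.
    return str(uuid.uuid4())


def execute_once(db, operation_id, action, tool):
    """Run `tool` for this operation_id unless a final outcome is already recorded."""
    row = db.execute(
        "SELECT status, outcome FROM operations WHERE operation_id = ?",
        (operation_id,),
    ).fetchone()
    if row is not None and row[0] in ("executed", "denied", "failed"):
        return row[1]  # replay: return the recorded outcome, no new side effect
    if row is None:
        # Persist the proposal before anything executes.
        db.execute(
            "INSERT INTO operations VALUES (?, ?, 'proposed', NULL)",
            (operation_id, action),
        )
        db.commit()
    outcome = tool(action)
    db.execute(
        "UPDATE operations SET status = 'executed', outcome = ? "
        "WHERE operation_id = ?",
        (outcome, operation_id),
    )
    db.commit()
    return outcome
```

Calling `execute_once` twice with the same operation ID runs the tool once and returns the stored outcome the second time. The sketch does not cover a crash between the tool call and the status write; closing that window would need the tool itself to honor the operation ID.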
889
00:37:46,160 –> 00:37:48,480
This is also why you don’t store only success.
890
00:37:48,480 –> 00:37:50,440
You store denials and transforms too.
891
00:37:50,440 –> 00:37:54,480
Because the absence of an action is still a decision you need to replay safely.
892
00:37:54,480 –> 00:38:00,240
If the policy denied the delete at 0905 and the same request replays at 0906, you must deny
893
00:38:00,240 –> 00:38:04,320
again for the same operation ID. Otherwise you’ve built a system that can be bypassed by
894
00:38:04,320 –> 00:38:05,840
retry storms.
895
00:38:05,840 –> 00:38:08,440
The practical consequence is that you stop debugging ghosts.
896
00:38:08,440 –> 00:38:12,600
Your incident review stops being "how did this run twice" and becomes "why did we ever allow
897
00:38:12,600 –> 00:38:15,240
this operation ID to execute once."
898
00:38:15,240 –> 00:38:16,240
That’s progress.
899
00:38:16,240 –> 00:38:17,560
That is where accountability lives.
900
00:38:17,560 –> 00:38:19,280
And yes, this costs engineering effort.
901
00:38:19,280 –> 00:38:23,400
So does every week you spend reconstructing an incident from transcripts and half-correlated
902
00:38:23,400 –> 00:38:28,440
log lines. Idempotency keys plus an authoritative state spine are not an optimization.
903
00:38:28,440 –> 00:38:33,080
They are the price of admission for letting probabilistic agents touch deterministic systems.
904
00:38:33,080 –> 00:38:35,320
Deterministic pattern 2: per-tool-call policy gate.
905
00:38:35,320 –> 00:38:37,160
Idempotency gives you safe replay.
906
00:38:37,160 –> 00:38:38,160
Good.
907
00:38:38,160 –> 00:38:40,000
It stops duplicates from turning into double damage.
908
00:38:40,000 –> 00:38:43,960
But idempotency doesn’t answer the question that actually decides whether you have an incident.
909
00:38:43,960 –> 00:38:45,480
Should this action be allowed at all?
910
00:38:45,480 –> 00:38:47,920
That’s where most enterprises fall back into religion.
911
00:38:47,920 –> 00:38:49,200
The agent knows.
912
00:38:49,200 –> 00:38:50,520
The prompt told it.
913
00:38:50,520 –> 00:38:51,360
We trained it.
914
00:38:51,360 –> 00:38:52,520
It has citations.
915
00:38:52,520 –> 00:38:54,120
None of that is an enforcement model.
916
00:38:54,120 –> 00:38:55,120
It’s a hope model.
917
00:38:55,120 –> 00:39:00,080
A per-tool-call policy gate is the mechanism that converts hope into a deterministic decision.
918
00:39:00,080 –> 00:39:02,200
And the key change is conceptual, not technical.
919
00:39:02,200 –> 00:39:03,680
The agent stops issuing commands.
920
00:39:03,680 –> 00:39:05,600
It starts submitting requests.
921
00:39:05,600 –> 00:39:11,920
The moment you let the model speak in imperatives (delete site X, share file Y, email this to Z),
922
00:39:11,920 –> 00:39:13,920
you’ve made the LLM the control plane.
923
00:39:13,920 –> 00:39:16,640
You’ve delegated authority to a probabilistic system.
924
00:39:16,640 –> 00:39:18,040
That is not agentic.
925
00:39:18,040 –> 00:39:19,040
That is abdication.
926
00:39:19,040 –> 00:39:20,960
A policy gate flips the relationship.
927
00:39:20,960 –> 00:39:23,840
The agent proposes; the control plane disposes.
928
00:39:23,840 –> 00:39:25,480
So what does the gate actually evaluate?
929
00:39:25,480 –> 00:39:30,760
Not prose, not vibe, not "it sounded reasonable." A structured request.
930
00:39:30,760 –> 00:39:36,360
At minimum, every tool-reaching request carries a tuple: actor, intent, scope, data class,
931
00:39:36,360 –> 00:39:38,400
venue, and operation ID.
932
00:39:38,400 –> 00:39:43,320
Actor is the identity that would execute the call, human, workload identity or segmented
933
00:39:43,320 –> 00:39:44,800
agent principal.
934
00:39:44,800 –> 00:39:51,280
Intent is the verb: delete, share, send, post, create, approve. Small, enumerable, boring.
935
00:39:51,280 –> 00:39:56,760
Scope is the concrete target: site ID, file ID, mailbox, distribution list, external domain,
936
00:39:56,760 –> 00:39:58,080
API endpoint.
937
00:39:58,080 –> 00:40:02,200
Data class is the sensitivity of the thing being touched or disclosed, derived from authoritative
938
00:40:02,200 –> 00:40:04,840
classification, not guessed by the model.
939
00:40:04,840 –> 00:40:08,920
Venue is where the effect will manifest: internal tenant, external email, Teams meeting
940
00:40:08,920 –> 00:40:13,640
with externals, public web chat. Operation ID anchors replay and traceability, as we already
941
00:40:13,640 –> 00:40:14,640
covered.
942
00:40:14,640 –> 00:40:18,360
Now the policy engine evaluates that tuple against rules and authoritative state.
943
00:40:18,360 –> 00:40:20,240
It returns one of three outcomes.
944
00:40:20,240 –> 00:40:24,240
Allow, deny, transform.
945
00:40:24,240 –> 00:40:27,680
Allow means it can proceed, but not as a blank check.
946
00:40:27,680 –> 00:40:32,920
It can attach constraints: time window, max recipients, required approval, rate limits,
947
00:40:32,920 –> 00:40:34,440
a narrow target set.
948
00:40:34,440 –> 00:40:36,280
Deny means the tool never executes.
949
00:40:36,280 –> 00:40:40,280
The refusal is not a moral stance. It’s a deterministic result: intent X in venue
950
00:40:40,280 –> 00:40:46,320
Y with data class Z violates rule R. Transform is the underused one that keeps systems usable.
951
00:40:46,320 –> 00:40:49,600
It means the action is allowed only in a safer form.
952
00:40:49,600 –> 00:40:53,800
Replace share externally with share internally and create an approval task.
953
00:40:53,800 –> 00:40:58,600
Replace "speak compensation cohort data" with "speak a high-level summary template."
954
00:40:58,600 –> 00:41:00,720
Replace delete with move to quarantine.
955
00:41:00,720 –> 00:41:02,960
This is how you avoid the false choice
956
00:41:02,960 –> 00:41:05,840
between "agents are useless" and "agents are dangerous."
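A gate of this shape can be sketched as a pure function from the request tuple to one of the three outcomes. Everything below is illustrative: the class names, field values, and the rules themselves are assumptions chosen to mirror the examples in the discussion, not a real policy set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolRequest:
    actor: str         # identity that would execute the call
    intent: str        # small, enumerable verb
    scope: str         # concrete target
    data_class: str    # from authoritative classification, not guessed
    venue: str         # where the effect will manifest
    operation_id: str  # anchors replay and traceability


@dataclass(frozen=True)
class Decision:
    outcome: str  # "allow", "deny", or "transform"
    reason: str
    intent: str   # the intent actually permitted; differs after a transform
    constraints: dict = field(default_factory=dict)


def evaluate(req: ToolRequest) -> Decision:
    # Hypothetical rules, for illustration only.
    if req.intent == "delete":
        return Decision("transform", "deletes go to quarantine first",
                        "move_to_quarantine")
    if req.intent == "share" and req.venue == "external":
        if req.data_class == "confidential":
            return Decision("deny", "confidential data may not leave the tenant",
                            req.intent)
        return Decision("transform", "external share requires approval",
                        "share_internal_with_approval")
    if req.intent == "send":
        return Decision("allow", "send permitted with limits", req.intent,
                        {"max_recipients": 10})
    # Default deny: an intent without a rule never reaches a tool.
    return Decision("deny", "no rule allows this intent", req.intent)
```

The default-deny branch is the design choice that matters: an unrecognized verb is refused rather than passed through, so the model cannot widen the action set by phrasing.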
957
00:41:05,840 –> 00:41:09,760
The gate is also where you encode negative space: not just what you did, but what you refused
958
00:41:09,760 –> 00:41:13,720
to do, what you refused to retrieve, what you refused to disclose.
959
00:41:13,720 –> 00:41:17,440
Because governance without refusal telemetry becomes performative. It only shows motion.
960
00:41:17,440 –> 00:41:21,080
Now here’s the part that separates a real gate from a prompt-based imitation.
961
00:41:21,080 –> 00:41:23,480
Tools must accept decisions, not requests.
962
00:41:23,480 –> 00:41:28,080
If your tool endpoint will execute any authenticated call that contains "delete: true", you don’t have
963
00:41:28,080 –> 00:41:31,360
a gate, you have a suggestion layer in front of a loaded weapon.
964
00:41:31,360 –> 00:41:35,480
The tool should accept only a signed decision artifact from the policy engine bound to
965
00:41:35,480 –> 00:41:40,200
the operation ID, with a short TTL. If the decision doesn’t match, the tool denies.
966
00:41:40,200 –> 00:41:43,840
If the operation ID was already executed, the tool returns the prior outcome.
967
00:41:43,840 –> 00:41:48,440
That binds execution to policy and makes bypassing the gate materially harder.
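One way to picture a signed decision artifact is an HMAC over the decision payload, checked by the tool before it does anything. This is a sketch under assumptions: the shared key, the field names, and the functions `sign_decision` and `tool_accepts` are hypothetical, and a symmetric key is just the simplest signing scheme to show; the discussion only requires that the artifact be signed, bound to the operation ID, and short-lived.

```python
import hashlib
import hmac
import json
import time

SECRET = b"demo-only-key"  # hypothetical; a real deployment would use a managed key


def _sign(payload):
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def sign_decision(operation_id, intent, scope, ttl_seconds=60, now=None):
    """Policy engine side: emit a decision bound to one operation, with an expiry."""
    now = time.time() if now is None else now
    payload = {
        "operation_id": operation_id,
        "intent": intent,
        "scope": scope,
        "expires_at": now + ttl_seconds,
    }
    return {"payload": payload, "signature": _sign(payload)}


def tool_accepts(artifact, operation_id, intent, scope, now=None):
    """Tool side: execute only if the artifact is authentic, fresh, and matches the call."""
    now = time.time() if now is None else now
    if not hmac.compare_digest(_sign(artifact["payload"]), artifact["signature"]):
        return False  # tampered or not issued by the policy engine
    p = artifact["payload"]
    if now > p["expires_at"]:
        return False  # decision expired
    return (p["operation_id"], p["intent"], p["scope"]) == (operation_id, intent, scope)
```

A call whose operation ID, intent, or scope differs from what was decided is refused even when the caller is authenticated, which is the property that separates a gate from a suggestion layer.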
968
00:41:48,440 –> 00:41:49,920
And yes, this sounds like overhead.
969
00:41:49,920 –> 00:41:50,920
It is.
970
00:41:50,920 –> 00:41:53,640
It’s also the only place where intent can be enforced at action time.
971
00:41:53,640 –> 00:41:54,880
Conditional access can’t do this.
972
00:41:54,880 –> 00:41:56,200
It doesn’t see the tool call.
973
00:41:56,200 –> 00:41:57,760
It sees token issuance context.
974
00:41:57,760 –> 00:41:58,760
Purview can’t do this.
975
00:41:58,760 –> 00:42:00,360
It sees the aftermath.
976
00:42:00,360 –> 00:42:01,360
Citations can’t do this.
977
00:42:01,360 –> 00:42:03,160
They explain retrieval, not permission.
978
00:42:03,160 –> 00:42:05,160
Only a gate can stop the train while it’s moving.
979
00:42:05,160 –> 00:42:09,680
If you want a concrete mental picture, treat the policy engine like an authorization compiler.
980
00:42:09,680 –> 00:42:11,360
The agent submits a high level request.
981
00:42:11,360 –> 00:42:13,280
The compiler checks it against rules and state.
982
00:42:13,280 –> 00:42:15,000
It emits a decision artifact.
983
00:42:15,000 –> 00:42:16,760
The runtime can execute.
984
00:42:16,760 –> 00:42:18,760
Without that artifact, execution is invalid.
985
00:42:18,760 –> 00:42:21,600
That’s determinism grafted onto probabilistic reasoning.
986
00:42:21,600 –> 00:42:24,800
And once you have it, your incident reviews change shape.
987
00:42:24,800 –> 00:42:28,360
You stop asking "why did it do that?" as if the agent had agency.
988
00:42:28,360 –> 00:42:30,480
You ask which rule allowed this?
989
00:42:30,480 –> 00:42:31,960
And who changed it?
990
00:42:31,960 –> 00:42:33,440
That’s accountability.
991
00:42:33,440 –> 00:42:38,040
Deterministic pattern 3: segmented agent identities as failure domains.
992
00:42:38,040 –> 00:42:41,920
Once you put a real policy gate in front of tools, you’ve solved the "should this be
993
00:42:41,920 –> 00:42:46,800
allowed" problem at action time, but you still haven’t solved the bigger failure domain problem.
994
00:42:46,800 –> 00:42:50,560
Because if one identity can do everything, your gate becomes your only brake, and brakes
995
00:42:50,560 –> 00:42:51,560
fail.
996
00:42:51,560 –> 00:42:52,560
Rules drift.
997
00:42:52,560 –> 00:42:53,880
Someone adds an exception.
998
00:42:53,880 –> 00:42:56,120
An urgent request becomes permanent.
999
00:42:56,120 –> 00:42:58,760
Entropy always wins unless you give it walls to hit.
1000
00:42:58,760 –> 00:43:00,360
Segmented agent identities are those walls.
1001
00:43:00,360 –> 00:43:01,960
One agent is not one identity.
1002
00:43:01,960 –> 00:43:03,520
One agent is an orchestrator.
1003
00:43:03,520 –> 00:43:07,600
It should coordinate multiple principals, each with a narrow capability and a narrow blast
1004
00:43:07,600 –> 00:43:08,600
radius.
1005
00:43:08,600 –> 00:43:10,120
Read, write, and egress.
1006
00:43:10,120 –> 00:43:13,840
That distinction matters because the dominant failure mode in enterprise agents is not "the
1007
00:43:13,840 –> 00:43:15,680
agent got global admin."
1008
00:43:15,680 –> 00:43:17,680
Microsoft has already constrained a lot of that.
1009
00:43:17,680 –> 00:43:19,440
The dominant failure is the boring one.
1010
00:43:19,440 –> 00:43:24,440
A convenience-driven, over-scoped identity executing at machine speed in the wrong place.
1011
00:43:24,440 –> 00:43:25,880
Least privilege isn’t a value statement.
1012
00:43:25,880 –> 00:43:26,880
It’s math.
1013
00:43:26,880 –> 00:43:30,880
Scopes multiplied by ambiguity multiplied by speed equals blast radius.
1014
00:43:30,880 –> 00:43:35,360
If you let the same identity retrieve broadly, write broadly and communicate externally, you’ve
1015
00:43:35,360 –> 00:43:37,240
created a super user with a polite interface.
1016
00:43:37,240 –> 00:43:39,400
It doesn’t matter how good your prompt is.
1017
00:43:39,400 –> 00:43:40,680
The capability exists.
1018
00:43:40,680 –> 00:43:42,680
The model will eventually route intent into it.
1019
00:43:42,680 –> 00:43:44,760
So you split capability at the identity boundary.
1020
00:43:44,760 –> 00:43:46,240
The read identity can only read.
1021
00:43:46,240 –> 00:43:50,000
It can query SharePoint metadata, retrieve files and summarize content.
1022
00:43:50,000 –> 00:43:51,000
It cannot delete.
1023
00:43:51,000 –> 00:43:52,000
It cannot share.
1024
00:43:52,000 –> 00:43:53,000
It cannot send.
1025
00:43:53,000 –> 00:43:54,960
It cannot write anywhere that matters.
1026
00:43:54,960 –> 00:43:56,600
Its job is to propose, not to act.
1027
00:43:56,600 –> 00:43:58,040
Then you create a write identity.
1028
00:43:58,040 –> 00:43:59,440
This one is intentionally painful.
1029
00:43:59,440 –> 00:44:03,320
It holds only the minimum permissions needed for irreversible actions.
1030
00:44:03,320 –> 00:44:07,880
And those permissions are resource-scoped, short-lived, and ideally minted just in time.
1031
00:44:07,880 –> 00:44:11,240
If you can’t make them short-lived, then you rotate aggressively and monitor like you mean
1032
00:44:11,240 –> 00:44:12,240
it.
1033
00:44:12,240 –> 00:44:13,720
This identity never retrieves broadly.
1034
00:44:13,720 –> 00:44:14,720
It doesn’t need to.
1035
00:44:14,720 –> 00:44:18,640
It executes against explicit targets that have already passed policy evaluation.
1036
00:44:18,640 –> 00:44:20,320
And then you create an egress identity.
1037
00:44:20,320 –> 00:44:24,560
This one can talk outside the tenant or publish to public surfaces or send email to external
1038
00:44:24,560 –> 00:44:25,560
domains.
1039
00:44:25,560 –> 00:44:27,720
It has zero access to internal corporate data planes.
1040
00:44:27,720 –> 00:44:28,720
None.
1041
00:44:28,720 –> 00:44:32,600
If it can see internal runbooks and also post externally, you’ve already lost.
1042
00:44:32,600 –> 00:44:33,680
Egress is not a feature.
1043
00:44:33,680 –> 00:44:34,680
It’s a failure domain.
1044
00:44:34,680 –> 00:44:37,560
Now the obvious objection is "that’s three times the complexity."
1045
00:44:37,560 –> 00:44:41,840
No, it’s three times the clarity because now every action has a lane and lanes don’t cross
1046
00:44:41,840 –> 00:44:43,000
without a broker.
1047
00:44:43,000 –> 00:44:44,440
The orchestrator can request.
1048
00:44:44,440 –> 00:44:47,960
The policy gate can decide. The correct identity can execute.
1049
00:44:47,960 –> 00:44:52,280
If the read identity gets compromised, the attacker gets visibility, not destruction.
1050
00:44:52,280 –> 00:44:56,040
If the write identity gets compromised, the attacker gets destruction, but only within
1051
00:44:56,040 –> 00:45:00,640
a narrowly scoped domain and ideally only for a short time window.
1052
00:45:00,640 –> 00:45:04,720
If the egress identity gets compromised, the attacker can speak, but they can’t see your
1053
00:45:04,720 –> 00:45:06,040
internal knowledge base.
1054
00:45:06,040 –> 00:45:09,920
This is how you build containment into the system rather than writing incident reviews
1055
00:45:09,920 –> 00:45:11,680
about "we’ll be more careful."
1056
00:45:11,680 –> 00:45:14,400
And it pairs cleanly with the previous two patterns.
1057
00:45:14,400 –> 00:45:16,040
Idempotency gives you safe replay.
1058
00:45:16,040 –> 00:45:19,200
The policy gate gives you action-time authorization.
1059
00:45:19,200 –> 00:45:22,360
Segmented identities give you blast radius containment when the gate is wrong.
1060
00:45:22,360 –> 00:45:25,200
Now here’s the part most teams miss.
1061
00:45:25,200 –> 00:45:28,160
Separation must be enforced by design, not etiquette.
1062
00:45:28,160 –> 00:45:31,720
Don’t let the agent choose which identity to use based on a prompt instruction.
1063
00:45:31,720 –> 00:45:32,720
That’s still hope.
1064
00:45:32,720 –> 00:45:37,160
Identity selection should be a deterministic mapping from intent and venue to a principal
1065
00:45:37,160 –> 00:45:38,920
enforced by the control plane.
1066
00:45:38,920 –> 00:45:41,320
Delete intent routes to the write principal.
1067
00:45:41,320 –> 00:45:44,320
External publication routes to the egress principal.
1068
00:45:44,320 –> 00:45:46,080
Retrieval routes to the read principal.
1069
00:45:46,080 –> 00:45:50,400
The agent can’t override that because it never directly holds the credentials for the
1070
00:45:50,400 –> 00:45:51,640
other lanes.
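The deterministic mapping can be as plain as a lookup table owned by the control plane. The sketch below is an assumption-laden illustration: the intent names, venue names, and principal labels are hypothetical placeholders, and a real table would be derived from governed configuration rather than hard-coded.

```python
# Hypothetical routing table: (intent, venue) -> principal.
# The agent never sees this table's credentials; it only submits requests.
ROUTES = {
    ("retrieve", "internal"): "read-principal",
    ("delete", "internal"): "write-principal",
    ("publish", "external"): "egress-principal",
}


def select_principal(intent: str, venue: str) -> str:
    """Return the only principal allowed to execute this intent in this venue."""
    try:
        return ROUTES[(intent, venue)]
    except KeyError:
        # No lane means no execution; there is no fallback identity.
        raise PermissionError(
            f"no lane for intent={intent!r} venue={venue!r}"
        ) from None
```

Because the mapping has no default, a combination such as retrieval in an external venue fails closed instead of borrowing another lane's identity.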
1071
00:45:51,640 –> 00:45:54,160
This is also how you survive shadow agent sprawl.
1072
00:45:54,160 –> 00:45:58,120
When someone spins up a quick external bot, the external lane simply cannot authenticate
1073
00:45:58,120 –> 00:46:00,200
to internal core data planes.
1074
00:46:00,200 –> 00:46:02,280
Even if they try, even if they copy code.
1075
00:46:02,280 –> 00:46:06,240
Even if they add a connector, the design makes the bad path impossible without an explicit
1076
00:46:06,240 –> 00:46:07,400
governance decision.
1077
00:46:07,400 –> 00:46:11,280
So if you remember one sentence from this pattern, make it this.
1078
00:46:11,280 –> 00:46:13,880
Agents should fail small, not fail loud.
1079
00:46:13,880 –> 00:46:17,240
A single identity design makes failure loud by default.
1080
00:46:17,240 –> 00:46:20,640
Segmented identities make failure bounded by default.
1081
00:46:20,640 –> 00:46:24,720
That’s the difference between a contained incident and a tenant-wide outage delivered by
1082
00:46:24,720 –> 00:46:26,120
a calm voice.
1083
00:46:26,120 –> 00:46:31,040
RAG as a security boundary: retrieval filters plus negative space plus output classification.
1084
00:46:31,040 –> 00:46:34,960
Now tie the whole thing back to the part everybody treats as just search.
1085
00:46:34,960 –> 00:46:37,120
Retrieval.
1086
00:46:37,120 –> 00:46:39,360
Most teams implement RAG like a convenience feature.
1087
00:46:39,360 –> 00:46:43,440
Embed documents, vector search, pull the top five chunks, stuff them into the prompt and
1088
00:46:43,440 –> 00:46:45,080
call it grounded.
1089
00:46:45,080 –> 00:46:46,080
That is not a boundary.
1090
00:46:46,080 –> 00:46:49,120
That is a suggestion engine feeding a probabilistic model.
1091
00:46:49,120 –> 00:46:51,520
In a real enterprise, retrieval is an authorization event.
1092
00:46:51,520 –> 00:46:55,240
It is the moment your system decides what information is allowed to exist for this actor
1093
00:46:55,240 –> 00:46:56,960
in this venue right now.
1094
00:46:56,960 –> 00:47:00,680
If you don’t treat it that way, the nearest neighbor algorithm will outrun your governance
1095
00:47:00,680 –> 00:47:01,680
model every time.
1096
00:47:01,680 –> 00:47:03,560
So the boundary starts before similarity.
1097
00:47:03,560 –> 00:47:05,800
Eligibility comes first.
1098
00:47:05,800 –> 00:47:10,200
Before you run a vector search, you filter the candidate set using hard predicates:
1099
00:47:10,200 –> 00:47:13,800
principal, access scope, confidentiality, and venue.
1100
00:47:13,800 –> 00:47:17,720
Principal means the workload identity or user context that is actually operating.
1101
00:47:17,720 –> 00:47:21,480
Access scope means what corpus this identity is allowed to see based on an authoritative
1102
00:47:21,480 –> 00:47:25,200
catalog, not on whatever connector happens to be configured.
1103
00:47:25,200 –> 00:47:28,520
Confidentiality means the classification level of the content.
1104
00:47:28,520 –> 00:47:31,320
Venue means where the answer will be consumed.
1105
00:47:31,320 –> 00:47:35,480
Internal chat, mixed-audience meeting, external web, email, public site.
1106
00:47:35,480 –> 00:47:38,720
If a chunk is not eligible under those predicates, it does not exist.
1107
00:47:38,720 –> 00:47:40,040
Not "it won’t be used."
1108
00:47:40,040 –> 00:47:41,480
It doesn’t exist.
1109
00:47:41,480 –> 00:47:43,040
This is the uncomfortable truth.
1110
00:47:43,040 –> 00:47:44,760
Similarity search is not a permissions model.
1111
00:47:44,760 –> 00:47:48,600
It is math and math will happily return the best match from an ineligible corpus unless
1112
00:47:48,600 –> 00:47:49,800
you fence it.
1113
00:47:49,800 –> 00:47:53,200
Then you do the thing that makes the whole system safer without anyone noticing.
1114
00:47:53,200 –> 00:47:55,160
You build negative space.
1115
00:47:55,160 –> 00:47:58,440
Negative space means the system records what it refused to retrieve and what it refused
1116
00:47:58,440 –> 00:47:59,440
to say.
1117
00:47:59,440 –> 00:48:03,600
When the pre-filters exclude chunks, you log that exclusion with a reason.
1118
00:48:03,600 –> 00:48:08,160
Excluded because venue is external. Excluded because confidentiality is internal-only. Excluded
1119
00:48:08,160 –> 00:48:10,320
because principal lacks access scope.
1120
00:48:10,320 –> 00:48:14,120
When the filtered retrieval returns nothing, that emptiness is not an error.
1121
00:48:14,120 –> 00:48:15,280
It is a safe outcome.
1122
00:48:15,280 –> 00:48:18,440
It is the system refusing to invent or overshare.
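Eligibility-before-similarity and the refusal log can be sketched together. This is a hedged illustration: the `Chunk` fields, the confidentiality levels, and the per-venue ceilings are invented for the example, and the similarity search itself is left out because it would only ever run over the `eligible` list.

```python
from dataclasses import dataclass

# Hypothetical ordering of classification levels.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}
# Hypothetical ceiling: the highest level each venue may receive.
VENUE_CEILING = {"external_web": 0, "mixed_meeting": 0, "internal_chat": 1}


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    corpus: str
    confidentiality: str
    text: str


def eligible_chunks(chunks, allowed_corpora, venue):
    """Split candidates into eligible chunks and logged refusals, before any ranking."""
    ceiling = VENUE_CEILING[venue]
    eligible, refusals = [], []
    for c in chunks:
        if c.corpus not in allowed_corpora:
            refusals.append((c.chunk_id, "principal lacks access scope"))
        elif LEVELS[c.confidentiality] > ceiling:
            refusals.append(
                (c.chunk_id, f"confidentiality {c.confidentiality} exceeds venue {venue}")
            )
        else:
            eligible.append(c)
    return eligible, refusals
```

An empty `eligible` list is a valid result here, and `refusals` is the negative-space record: each excluded chunk carries the reason it did not exist for this request.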
1123
00:48:18,440 –> 00:48:21,120
Most organizations treat no results as a UX bug.
1124
00:48:21,120 –> 00:48:23,120
They force the model to answer anyway.
1125
00:48:23,120 –> 00:48:25,680
That turns your RAG system into a leak mechanism.
1126
00:48:25,680 –> 00:48:27,680
The safe behavior is cite or silence.
1127
00:48:27,680 –> 00:48:30,480
If there is no eligible evidence, the agent says less.
1128
00:48:30,480 –> 00:48:33,080
"No eligible content found for this request" is a feature.
1129
00:48:33,080 –> 00:48:37,120
It’s the guardrail that stops the model from converting uncertainty into confident
1130
00:48:37,120 –> 00:48:38,120
nonsense.
1131
00:48:38,120 –> 00:48:40,840
Now you enforce the same discipline on generation.
1132
00:48:40,840 –> 00:48:45,240
The model can only assert claims that map to eligible chunk IDs and it must cite them.
1133
00:48:45,240 –> 00:48:46,720
If it can’t cite, it downgrades.
1134
00:48:46,720 –> 00:48:49,200
If it can’t downgrade safely, it refuses.
1135
00:48:49,200 –> 00:48:52,360
This is how you make grounding measurable instead of aspirational.
1136
00:48:52,360 –> 00:48:55,200
But the part that most people miss is output classification.
1137
00:48:55,200 –> 00:48:59,120
Enterprises label inputs and then pretend outputs inherit safety by osmosis.
1138
00:48:59,120 –> 00:49:00,120
They don’t.
1139
00:49:00,120 –> 00:49:01,120
The output is a new artifact.
1140
00:49:01,120 –> 00:49:02,120
It can aggregate.
1141
00:49:02,120 –> 00:49:03,120
It can summarize.
1142
00:49:03,120 –> 00:49:07,040
It can combine two non-sensitive facts into a sensitive conclusion.
1143
00:49:07,040 –> 00:49:09,360
And in voice scenarios, output is publication.
1144
00:49:09,360 –> 00:49:13,680
So you derive an output sensitivity label from the sources used and the aggregation level
1145
00:49:13,680 –> 00:49:14,680
of the answer.
1146
00:49:14,680 –> 00:49:19,720
If the answer pulls from compensation guidance and produces cohort level metrics, the output
1147
00:49:19,720 –> 00:49:24,000
is compensation sensitive even if no single chunk was labeled secret.
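Deriving an output label from sources plus aggregation can be shown with a deliberately simple rule. The rule below is an assumption made for illustration, not something stated in the discussion: take the highest source label, then raise it one step when the answer aggregates cohort-level data. The level names and the function `output_label` are hypothetical.

```python
# Hypothetical sensitivity ladder, lowest to highest.
ORDER = ["public", "internal", "confidential"]


def output_label(source_labels, aggregates_cohort_data):
    """Label a generated answer from its cited sources and its aggregation level."""
    if not source_labels:
        # No cited sources means nothing to classify: cite or stay silent.
        raise ValueError("output has no eligible sources")
    level = max(ORDER.index(label) for label in source_labels)
    if aggregates_cohort_data:
        # Assumed rule: aggregation raises sensitivity by one step, capped at the top.
        level = min(level + 1, len(ORDER) - 1)
    return ORDER[level]
```

With this rule, an answer built only from internal sources is labeled confidential once it produces cohort-level metrics, which matches the point that no single chunk has to be secret for the output to be sensitive.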
1148
00:49:24,000 –> 00:49:27,400
Then you route that output through the same policy gate that controls tool calls, because
1149
00:49:27,400 –> 00:49:32,360
speech is a tool call. Venue plus output classification becomes your egress boundary.
1150
00:49:32,360 –> 00:49:37,960
Mixed audience, external participants? Then the speech path requires a transform or a deny.
1151
00:49:37,960 –> 00:49:39,120
Internal HR channel?
1152
00:49:39,120 –> 00:49:42,440
You might allow text, deny speech, or require a different identity.
1153
00:49:42,440 –> 00:49:46,920
The point is that the system decides at action time, not after the transcript is stored.
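The egress decision can be sketched as a small function over venue, channel, and output label. Everything here is illustrative: the `Venue` type, the label names, and the specific allow/transform/deny choices are assumptions, and a real policy would have more cases than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Venue:
    name: str
    has_external: bool  # True when guests or external participants are present


SENSITIVE = {"confidential", "secret"}  # hypothetical label names


def egress_decision(venue, channel, label):
    """Return 'allow', 'transform', or 'deny' for one output, evaluated at action time."""
    if label not in SENSITIVE:
        return "allow"
    if venue.has_external:
        # Mixed audience: the most sensitive output is denied, the rest must be
        # transformed (for example redacted or generalized) before it leaves.
        return "deny" if label == "secret" else "transform"
    # Internal venue: text is allowed, speech is denied because it reaches the whole room.
    return "allow" if channel == "text" else "deny"
```

The function takes the venue as it is when the output is about to leave, so an audience change between retrieval and speech changes the decision.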
1154
00:49:46,920 –> 00:49:51,120
This is how RAG stops being a knowledge feature and becomes a security boundary.
1155
00:49:51,120 –> 00:49:53,360
Eligibility before similarity.
1156
00:49:53,360 –> 00:49:57,440
Negative space as a first-class record. Outputs classified and gated like actions.
1157
00:49:57,440 –> 00:49:59,200
The agent still speaks when it has proof.
1158
00:49:59,200 –> 00:50:00,880
It goes quiet when it doesn’t.
1159
00:50:00,880 –> 00:50:05,360
And that silence is what prevents the next incident from being perfectly logged.
1160
00:50:05,360 –> 00:50:06,360
Conditional access.
1161
00:50:06,360 –> 00:50:07,360
Necessary.
1162
00:50:07,360 –> 00:50:08,360
Not sufficient.
1163
00:50:08,360 –> 00:50:12,080
Conditional access is the most overpraised control in the agent conversation, and it’s
1164
00:50:12,080 –> 00:50:13,080
still mandatory.
1165
00:50:13,080 –> 00:50:14,080
It is the front gate.
1166
00:50:14,080 –> 00:50:18,600
It decides whether an identity should receive a token right now under current risk signals:
1167
00:50:18,600 –> 00:50:22,040
device posture, location, sign-in risk, workload context.
1168
00:50:22,040 –> 00:50:25,240
For agents and other non-human identities, that matters.
1169
00:50:25,240 –> 00:50:27,920
It shrinks who can even show up holding credentials.
1170
00:50:27,920 –> 00:50:28,920
You don’t skip that.
1171
00:50:28,920 –> 00:50:33,160
But conditional access is also where enterprises stop thinking because it feels like enforcement.
1172
00:50:33,160 –> 00:50:34,880
This is the uncomfortable truth.
1173
00:50:34,880 –> 00:50:36,560
Conditional access is a token-time decision.
1174
00:50:36,560 –> 00:50:38,080
It is not an action-time decision.
1175
00:50:38,080 –> 00:50:40,760
It answers: may this identity obtain a token?
1176
00:50:40,760 –> 00:50:44,480
Not: may this identity delete this site, share this file, or speak this aggregation in this
1177
00:50:44,480 –> 00:50:45,480
venue?
1178
00:50:45,480 –> 00:50:48,720
Once the token exists, you are no longer in an authentication problem.
1179
00:50:48,720 –> 00:50:50,520
You are in an authorization problem.
1180
00:50:50,520 –> 00:50:55,480
And token issuance cannot adjudicate tool execution, because tool execution happens later, in a different
1181
00:50:55,480 –> 00:51:00,480
context: after retrieval, after orchestration, after the meeting audience changes, after
1182
00:51:00,480 –> 00:51:03,160
the agent chooses a path you didn’t anticipate.
1183
00:51:03,160 –> 00:51:08,040
That’s why so many incidents look compliant in Entra and are still unacceptable to the business.
1184
00:51:08,040 –> 00:51:10,080
Walk the timeline and the gap becomes obvious.
1185
00:51:10,080 –> 00:51:12,040
The agent requests a token.
1186
00:51:12,040 –> 00:51:13,920
Conditional access evaluates and passes.
1187
00:51:13,920 –> 00:51:14,920
Good.
1188
00:51:14,920 –> 00:51:18,760
Then the agent retrieves allowed data under its scopes. Logged. Fine.
1189
00:51:18,760 –> 00:51:20,400
Then the agent proposes an action.
1190
00:51:20,400 –> 00:51:22,760
Delete, share, email, post, speak.
1191
00:51:22,760 –> 00:51:26,240
This is the moment that matters because this is the moment side effects happen.
1192
00:51:26,240 –> 00:51:30,120
And conditional access is not in that path unless you force it back in with a separate decision
1193
00:51:30,120 –> 00:51:31,120
point.
1194
00:51:31,120 –> 00:51:32,440
That is what the policy gate is for.
1195
00:51:32,440 –> 00:51:34,600
It is not a replacement for conditional access.
1196
00:51:34,600 –> 00:51:36,360
It is the missing second gate.
1197
00:51:36,360 –> 00:51:37,960
Conditional access decides who may try.
1198
00:51:37,960 –> 00:51:39,760
The policy engine decides what may happen.
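A minimal sketch of the two gates side by side. The token check stands in for what was already decided at sign-in, and `policy_gate` is the second decision point evaluated on every tool call. The rule set, the dictionary shapes, and all names are hypothetical.

```python
import time


class PolicyDenied(Exception):
    """Raised when a tool call is blocked before any side effect happens."""


def policy_gate(identity, action, context):
    """Hypothetical action-time rules, evaluated against the context as it is now."""
    if action in {"delete", "share"} and context.get("has_external"):
        return False
    return action in identity["allowed_actions"]


def execute(tool, identity, token, action, context, now=None):
    now = time.time() if now is None else now
    # First gate (already passed at token time): is the token still valid?
    if token["exp"] <= now:
        raise PolicyDenied("token expired")
    # Second gate: a valid token is not enough to run this action in this context.
    if not policy_gate(identity, action, context):
        raise PolicyDenied(f"{action} denied at action time")
    return tool(action)
```

In the opening timeline, the token issued at 09:02 is still valid at 09:07, but a gate like this would see the external guest in the current context and block the delete.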
1199
00:51:39,760 –> 00:51:43,960
Now the mistake teams make is trying to stretch conditional access to cover what it can’t.
1200
00:51:43,960 –> 00:51:48,040
They pile on network locations, token protection, session controls, device filters and assume
1201
00:51:48,040 –> 00:51:50,080
the blast radius shrinks automatically.
1202
00:51:50,080 –> 00:51:51,080
It doesn’t.
1203
00:51:51,080 –> 00:51:54,960
If the agent holds broad write scopes, the radius is already baked in.
1204
00:51:54,960 –> 00:51:57,320
Conditional access just decides who gets to hold the match.
1205
00:51:57,320 –> 00:51:59,760
So the architecture you enforce is a braid.
1206
00:51:59,760 –> 00:52:02,800
Conditional access at token time, strict and non-negotiable.
1207
00:52:02,800 –> 00:52:06,400
Least privilege on scopes, because permissions are blast-radius math.
1208
00:52:06,400 –> 00:52:10,480
Segmented identities, because one agent should not be one super-identity. And per-tool-call
1209
00:52:10,480 –> 00:52:14,640
policy evaluation, because action-time authorization is where incidents either happen or don’t.
1210
00:52:14,640 –> 00:52:16,440
Now make monitoring earn its keep.
1211
00:52:16,440 –> 00:52:19,080
Watch token issuance patterns on agent identities.
1212
00:52:19,080 –> 00:52:22,240
Unusual cadence, unusual geos, new client types.
1213
00:52:22,240 –> 00:52:23,480
That’s the identity plane.
1214
00:52:23,480 –> 00:52:28,760
But also watch tool-call shapes: spikes in deletes, sudden external egress, novel venues.
1215
00:52:28,760 –> 00:52:29,760
That’s the action plane.
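Action-plane monitoring can start as simply as comparing the current window of tool calls with a baseline window of the same length. This sketch assumes events are `(action, venue)` pairs. The function name, the `spike_factor` threshold, and the alert strings are hypothetical.

```python
from collections import Counter


def action_plane_alerts(baseline, window, known_venues, spike_factor=3.0):
    """baseline, window: lists of (action, venue) events from equal-length periods."""
    alerts = []
    base = Counter(action for action, _ in baseline)
    cur = Counter(action for action, _ in window)
    for action, n in sorted(cur.items()):
        # Treat an action never seen in the baseline as having a baseline of 1.
        expected = max(base.get(action, 0), 1)
        if n >= spike_factor * expected:
            alerts.append(f"spike:{action}")
    # Any venue not seen before is flagged, regardless of volume.
    for venue in sorted({v for _, v in window} - set(known_venues)):
        alerts.append(f"novel_venue:{venue}")
    return alerts
```

A fixed multiplier is a crude detector. The point is only that the signal comes from what the agent did, not from how it signed in.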
1216
00:52:29,760 –> 00:52:32,640
And when you detect drift, you don’t retrain the agent.
1217
00:52:32,640 –> 00:52:37,040
You tighten scopes, tighten policies, and shrink failure domains. Conditional access is necessary
1218
00:52:37,040 –> 00:52:39,560
because it keeps the wrong identities from showing up.
1219
00:52:39,560 –> 00:52:43,080
It is not sufficient because the right identity can still do the wrong thing, perfectly
1220
00:52:43,080 –> 00:52:45,240
logged with a valid token.
1221
00:52:45,240 –> 00:52:49,240
The experience plane tax: WebRTC, speech regions, and metered certainty.
1222
00:52:49,240 –> 00:52:52,960
Now the punchline nobody budgets for until the demo becomes production: the experience plane
1223
00:52:52,960 –> 00:52:53,960
tax.
1224
00:52:53,960 –> 00:52:56,400
The face and the voice don’t just add engagement.
1225
00:52:56,400 –> 00:53:01,120
They add failure domains: networks, regions, and metering. And none of that complexity
1226
00:53:01,120 –> 00:53:04,800
buys you a single extra millisecond of deterministic control.
1227
00:53:04,800 –> 00:53:05,880
Start with WebRTC.
1228
00:53:05,880 –> 00:53:07,800
It works beautifully in a clean lab.
1229
00:53:07,800 –> 00:53:12,360
Then it meets enterprise reality: NAT traversal, VPN hairpins, deep packet inspection, split
1230
00:53:12,360 –> 00:53:16,760
tunnel policies, and firewalls that quietly hate UDP. So you fall back to relays.
1231
00:53:16,760 –> 00:53:18,040
TURN becomes mandatory.
1232
00:53:18,040 –> 00:53:20,640
That adds hops, jitter and operational overhead.
1233
00:53:20,640 –> 00:53:25,200
The avatar stutters, the audio talks over itself and the system compensates with retries
1234
00:53:25,200 –> 00:53:30,000
and reconnects: more events, more envelope churn, more entropy injected into the same pathway
1235
00:53:30,000 –> 00:53:31,880
that also drives tool calls.
1236
00:53:31,880 –> 00:53:32,880
Then speech regions.
1237
00:53:32,880 –> 00:53:35,040
Azure Speech is region-bound by design.
1238
00:53:35,040 –> 00:53:37,080
Keys are region-locked and endpoints are regional.
1239
00:53:37,080 –> 00:53:40,400
If you serve multiple geographies, you don’t have a voice.
1240
00:53:40,400 –> 00:53:45,840
You have a fleet of voices: separate resources, quotas, keys, routing logic, and failover plans.
1241
00:53:45,840 –> 00:53:50,120
When a region blips, the agent doesn’t fail in a way your control plane can reason about.
1242
00:53:50,120 –> 00:53:53,840
It fails in the human layer: the voice disappears and the business interprets that as the
1243
00:53:53,840 –> 00:53:58,640
agent is down, even if the decision engine is still happily proposing actions.
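A sketch of what that routing logic and failover plan might look like, assuming one speech resource per region and a health signal you maintain yourself. The routing table, the region names used as examples, and the `pick_speech_region` function are illustrative. No real SDK call is shown.

```python
def pick_speech_region(user_geo, routes, healthy):
    """routes: geo -> ordered list of candidate regions; healthy: set of regions currently up."""
    for region in routes.get(user_geo, []):
        if region in healthy:
            return {"voice": "up", "region": region}
    # Voice is unavailable for this geo. Report it as its own status so operators
    # do not confuse "voice down" with "decision engine down".
    return {"voice": "down", "region": None}
```

Keeping voice status separate from decision-engine status gives the control plane something it can reason about when a region blips.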
1244
00:53:58,640 –> 00:54:02,720
And it’s all metered. Per second. Not per outcome, not per prevented incident: per second
1245
00:54:02,720 –> 00:54:04,040
of streamed certainty.
1246
00:54:04,040 –> 00:54:07,960
So you end up financing persuasion: tens of millions of seconds of compute to animate confidence
1247
00:54:07,960 –> 00:54:11,560
while the control plane that could prevent harm remains underbuilt.
1248
00:54:11,560 –> 00:54:15,160
Conclusion: assume the face is lying and enforce intent at action time.
1249
00:54:15,160 –> 00:54:16,600
The voice adds trust.
1250
00:54:16,600 –> 00:54:19,520
The system did not earn it and logs won’t save you after the fact.
1251
00:54:19,520 –> 00:54:22,800
Make the agent propose, then force the control plane to dispose.
1252
00:54:22,800 –> 00:54:29,160
Idempotency, authoritative state, per-tool-call policy gates, and segmented identities.
1253
00:54:29,160 –> 00:54:33,240
If you do one thing next, audit where actions execute without a gate and mark it red. Then
1254
00:54:33,240 –> 00:54:34,720
fund determinism, not avatars.