How the Speaking Agent Obscures Architectural Entropy

Mirko Peters | Podcasts | 3 weeks ago | 130 Views


1
00:00:00,000 –> 00:00:02,200
At 0902, the agent signs in.

2
00:00:02,200 –> 00:00:06,880
Conditional access evaluates once, passes, and a token gets issued.

3
00:00:06,880 –> 00:00:11,000
At 0904, the meeting changes: an external guest joins, a channel gets renamed,

4
00:00:11,000 –> 00:00:13,200
a document link shifts, whatever.

5
00:00:13,200 –> 00:00:14,360
Context moves.

6
00:00:14,360 –> 00:00:20,920
At 0907, the agent executes a destructive tool call anyway, inside the workload with a still valid token.

7
00:00:20,920 –> 00:00:23,240
At 0908, Purview has the transcript.

8
00:00:23,240 –> 00:00:24,760
Copilot logs have the activity.

9
00:00:24,760 –> 00:00:28,080
The identity is correct, the timestamps are correct, the story is perfect.

10
00:00:28,080 –> 00:00:32,720
Every control worked, every log is correct, and the system still did the wrong thing.

11
00:00:32,720 –> 00:00:33,600
What this is not:

12
00:00:33,600 –> 00:00:36,280
anti-voice, anti-UX, anti-Microsoft.

13
00:00:36,280 –> 00:00:37,880
This is not an anti-voice rant.

14
00:00:37,880 –> 00:00:43,320
Voice is useful, avatars are useful, accessibility matters, and real-time interaction matters.

15
00:00:43,320 –> 00:00:48,120
A speaking interface can lower friction, reduce cognitive load, and make a system usable

16
00:00:48,120 –> 00:00:50,560
for people who would never type into a chat box.

17
00:00:50,560 –> 00:00:52,760
This is also not an anti-Microsoft episode.

18
00:00:52,760 –> 00:00:56,880
Microsoft has shipped real governance improvements that most platforms still don’t have.

19
00:00:56,880 –> 00:01:01,320
Purview can capture transcripts, Copilot Studio can log activities, Entra has a clearer model

20
00:01:01,320 –> 00:01:03,840
for workload identities and non-human identities.

21
00:01:03,840 –> 00:01:06,120
Conditional access exists and it’s mature.

22
00:01:06,120 –> 00:01:07,200
Those are not small things.

23
00:01:07,200 –> 00:01:10,640
That is the scaffolding you need for operating agents at enterprise scale.

24
00:01:10,640 –> 00:01:14,040
But this is the boundary line the industry keeps refusing to say out loud.

25
00:01:14,040 –> 00:01:15,400
Forensics are not control.

26
00:01:15,400 –> 00:01:17,400
Audit tells you what happened after the fact.

27
00:01:17,400 –> 00:01:19,240
It gives you a narrative you can export.

28
00:01:19,240 –> 00:01:23,600
It helps legal, it helps incident response, it helps you argue with reality less.

29
00:01:23,600 –> 00:01:28,200
None of that prevents an allowed identity from doing a wrong thing at the moment of execution.

30
00:01:28,200 –> 00:01:30,200
And that’s why the format of this episode matters.

31
00:01:30,200 –> 00:01:33,920
This isn’t a tutorial on building agents, there won’t be configuration walkthroughs, no

32
00:01:33,920 –> 00:01:35,400
click here demos.

33
00:01:35,400 –> 00:01:36,400
This is an autopsy.

34
00:01:36,400 –> 00:01:39,600
Claim, failure pattern, architectural cause, consequence.

35
00:01:39,600 –> 00:01:41,600
Because the failure isn’t that teams lack tools.

36
00:01:41,600 –> 00:01:45,360
The failure is that teams keep buying comfort instead of determinism.

37
00:01:45,360 –> 00:01:49,760
The embodied lie: trust signaling wrapped around probabilistic execution.

38
00:01:49,760 –> 00:01:50,840
Here’s the embodied lie.

39
00:01:50,840 –> 00:01:54,080
A voice and a face are not features, they are trust signals.

40
00:01:54,080 –> 00:01:58,000
They’re a human interface hack that makes a probabilistic system feel like a deterministic

41
00:01:58,000 –> 00:01:59,000
one.

42
00:01:59,000 –> 00:02:02,440
And the moment you add them, you change how people evaluate risk.

43
00:02:02,440 –> 00:02:05,920
The thing most people miss is that the speaking agent isn’t just an agent.

44
00:02:05,920 –> 00:02:08,200
It’s an execution engine wearing a personality.

45
00:02:08,200 –> 00:02:10,200
The avatar doesn’t make the agent more accurate.

46
00:02:10,200 –> 00:02:11,960
It makes the output more persuasive.

47
00:02:11,960 –> 00:02:14,880
That distinction matters because persuasion is not governance.

48
00:02:14,880 –> 00:02:17,600
In architectural terms, the agent is not a teammate.

49
00:02:17,600 –> 00:02:19,520
It is a distributed decision engine.

50
00:02:19,520 –> 00:02:23,880
It takes an input, retrieves some context, chooses a tool and executes an action.

51
00:02:23,880 –> 00:02:27,880
The choice is probabilistic, the retrieval is probabilistic, tool selection is probabilistic.

52
00:02:27,880 –> 00:02:31,480
Even when it’s grounded, it’s grounded in whatever it retrieved, not in what your intent

53
00:02:31,480 –> 00:02:32,480
actually was.

54
00:02:32,480 –> 00:02:36,760
Now add embodiment: low-latency speech, smooth turn-taking, a confident tone.

55
00:02:36,760 –> 00:02:38,040
Humans read those as competence.

56
00:02:38,040 –> 00:02:42,040
They stop asking “what approved this?” and start accepting “it sounded right.”

57
00:02:42,040 –> 00:02:45,920
That’s human interface trust bias doing what it always does, shifting scrutiny away from

58
00:02:45,920 –> 00:02:48,280
the control plane and onto the performance.

59
00:02:48,280 –> 00:02:50,400
That’s why governance gets worse when you add a face.

60
00:02:50,400 –> 00:02:53,080
The organization starts optimizing for the experience plane.

61
00:02:53,080 –> 00:02:57,000
Prompt tweaks, persona tuning, “make it sound more cautious.”

62
00:02:57,000 –> 00:02:58,520
Add a confirmation question.

63
00:02:58,520 –> 00:02:59,600
Those are theater patches.

64
00:02:59,600 –> 00:03:01,440
They don’t change the system’s blast radius.

65
00:03:01,440 –> 00:03:02,760
They don’t enforce intent.

66
00:03:02,760 –> 00:03:08,240
They don’t create a deterministic gate between the agent’s proposal and the platform’s execution.

67
00:03:08,240 –> 00:03:12,320
This clicked for me when I watched teams celebrate transcripts as if they were safety.

68
00:03:12,320 –> 00:03:13,560
A transcript is not safety.

69
00:03:13,560 –> 00:03:15,440
A transcript is a post-incident artifact.

70
00:03:15,440 –> 00:03:16,440
It’s a replay.

71
00:03:16,440 –> 00:03:17,440
It’s a confession

72
00:03:17,440 –> 00:03:20,520
you hand to counsel when the action has already happened.

73
00:03:20,520 –> 00:03:23,160
The system did not become safer because it can narrate itself.

74
00:03:23,160 –> 00:03:26,600
Now, Microsoft will say the right things here and they’re not wrong.

75
00:03:26,600 –> 00:03:28,760
Conditional access evaluates at token acquisition.

76
00:03:28,760 –> 00:03:30,800
Purview can capture interactions.

77
00:03:30,800 –> 00:03:31,800
Activity logs exist.

78
00:03:31,800 –> 00:03:33,280
Workload identity controls exist.

79
00:03:33,280 –> 00:03:34,280
That’s the ticket booth.

80
00:03:34,280 –> 00:03:35,280
That’s the camera system.

81
00:03:35,280 –> 00:03:36,280
That’s the audit trail.

82
00:03:36,280 –> 00:03:41,360
But the embodied lie lives in the gap between those controls and the moment a tool call

83
00:03:41,360 –> 00:03:42,680
executes.

84
00:03:42,680 –> 00:03:45,000
Token time controls decide who can show up.

85
00:03:45,000 –> 00:03:47,560
Action-time controls decide what is allowed to happen next.

86
00:03:47,560 –> 00:03:49,960
Most organizations build only the first one.

87
00:03:49,960 –> 00:03:53,080
Then they act surprised when the second one behaves like a suggestion.

88
00:03:53,080 –> 00:03:57,400
And this is where the speaking agent becomes an entropy generator because the more human

89
00:03:57,400 –> 00:04:01,440
it seems, the more likely you are to let it run with broad scopes, the more likely you

90
00:04:01,440 –> 00:04:06,040
are to skip segmentation, the more likely you are to accept “it’s logged” as a substitute

91
00:04:06,040 –> 00:04:07,800
for “it’s prevented.”

92
00:04:07,800 –> 00:04:12,960
Over time, you accumulate permissions, exceptions, and implicit trust until you have conditional

93
00:04:12,960 –> 00:04:13,960
chaos.

94
00:04:13,960 –> 00:04:18,160
And that behaves correctly most of the time, right up until the moment it doesn’t.

95
00:04:18,160 –> 00:04:21,920
So when the agent speaks with certainty, treat that as a warning, not a reassurance.

96
00:04:21,920 –> 00:04:23,360
You are not hearing determinism.

97
00:04:23,360 –> 00:04:27,000
You are hearing probability wrapped in a voice that implies accountability.

98
00:04:27,000 –> 00:04:30,600
The control plane versus the experience plane: two timelines that don’t meet.

99
00:04:30,600 –> 00:04:34,320
There are two timelines running every time an agent helps someone.

100
00:04:34,320 –> 00:04:38,320
Most organizations only instrument one of them because it’s the one humans notice.

101
00:04:38,320 –> 00:04:39,880
That’s the experience plane.

102
00:04:39,880 –> 00:04:43,840
It includes the chat transcript, the speaking voice, the avatar, the response latency,

103
00:04:43,840 –> 00:04:48,760
the citations, the little thinking indicator, and the meeting dynamics where nobody wants

104
00:04:48,760 –> 00:04:53,320
to slow the room down by arguing with a confident sounding assistant.

105
00:04:53,320 –> 00:04:56,440
It’s perception management, it’s social flow, it’s persuasion at scale.

106
00:04:56,440 –> 00:04:59,480
The other timeline is the only one that matters when something breaks.

107
00:04:59,480 –> 00:05:01,160
That’s the control plane.

108
00:05:01,160 –> 00:05:07,280
Identity issuance, token lifetime, scope, retrieval boundaries, tool invocation, side effects,

109
00:05:07,280 –> 00:05:13,720
state transitions, retry behavior, compensating actions, data class enforcement, venue enforcement.

110
00:05:13,720 –> 00:05:16,360
That’s the plane where blast radius is defined.

111
00:05:16,360 –> 00:05:19,840
And the uncomfortable truth is that these two timelines don’t line up.

112
00:05:19,840 –> 00:05:21,600
They rarely even touch.

113
00:05:21,600 –> 00:05:25,920
Because the platform’s strongest controls tend to fire at token time while the damage

114
00:05:25,920 –> 00:05:27,800
happens at tool time.

115
00:05:27,800 –> 00:05:29,360
Conditional access is a perfect example.

116
00:05:29,360 –> 00:05:30,560
It’s the ticket booth.

117
00:05:30,560 –> 00:05:32,080
It answers a narrow question.

118
00:05:32,080 –> 00:05:35,360
Should this identity get a token right now under current conditions?

119
00:05:35,360 –> 00:05:40,520
It can evaluate signals: risk, device posture, location. It can deny, it can require stronger

120
00:05:40,520 –> 00:05:41,520
auth.

121
00:05:41,520 –> 00:05:42,520
That is real control.

122
00:05:42,520 –> 00:05:45,240
If the token exists, the train leaves the station.

123
00:05:45,240 –> 00:05:46,920
Now the system is in the workload.

124
00:05:46,920 –> 00:05:51,400
Tool selection happens, data gets read, writes happen, shares happen, deletes happen, and

125
00:05:51,400 –> 00:05:54,920
the control plane is often no longer in the loop in a deterministic way.

126
00:05:54,920 –> 00:05:59,120
You’ve moved from who may show up to what is happening, and most enterprises have no enforcement

127
00:05:59,120 –> 00:06:00,120
point in the middle.

128
00:06:00,120 –> 00:06:02,720
Purview is the other half of the same mismatch.

129
00:06:02,720 –> 00:06:04,640
Purview is the security camera system.

130
00:06:04,640 –> 00:06:09,000
It records, it correlates, it lets you do forensics after the fact, it’s useful, and

131
00:06:09,000 –> 00:06:10,000
it’s getting better.

132
00:06:10,000 –> 00:06:11,520
But cameras do not stop the train.

133
00:06:11,520 –> 00:06:14,440
They just help you reconstruct which door was forced and when.

134
00:06:14,440 –> 00:06:18,400
The reason this gap keeps surprising people is that the experience plane looks like control.

135
00:06:18,400 –> 00:06:19,520
The agent speaks calmly.

136
00:06:19,520 –> 00:06:20,520
It cites a document.

137
00:06:20,520 –> 00:06:22,200
It says, based on policy.

138
00:06:22,200 –> 00:06:25,660
It feels governed, and because it feels governed, people assume the control plane must have

139
00:06:25,660 –> 00:06:26,660
approved it.

140
00:06:26,660 –> 00:06:28,120
That assumption is false.

141
00:06:28,120 –> 00:06:30,760
A citation is not an authorization decision.

142
00:06:30,760 –> 00:06:32,960
A transcript is not a policy evaluation.

143
00:06:32,960 –> 00:06:35,840
A token issuance event is not a per-action gate.

144
00:06:35,840 –> 00:06:38,520
If you want a mental model you can hold in your head, use the rail system.

145
00:06:38,520 –> 00:06:40,280
The ticket booth is conditional access.

146
00:06:40,280 –> 00:06:42,240
It can stop someone from entering the station.

147
00:06:42,240 –> 00:06:45,800
It cannot stop them from pulling the emergency brake once they’re on the train.

148
00:06:45,800 –> 00:06:46,960
The cameras are Purview.

149
00:06:46,960 –> 00:06:48,840
They can tell you which car it happened in.

150
00:06:48,840 –> 00:06:50,320
They cannot prevent the derailment.

151
00:06:50,320 –> 00:06:52,240
The missing role is the guard on the train.

152
00:06:52,240 –> 00:06:56,880
The deterministic policy gate that evaluates each action at the moment it is about to execute.
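A minimal sketch of the rail-system model in Python. Every name here (`Token`, `ticket_booth`, `guard_on_train`, the `DESTRUCTIVE` set, the `external_guest_present` flag) is hypothetical and stands in for no Microsoft API. It only illustrates the timing difference: the booth evaluates once at sign-in, while the guard re-evaluates current context before each action.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of actions that need a per-action check.
DESTRUCTIVE = {"delete_site", "share_file", "send_email"}


@dataclass(frozen=True)
class Token:
    identity: str
    expires_at: int  # minutes since midnight


def ticket_booth(identity: str, risk: str, now: int) -> Optional[Token]:
    """Token-time check: evaluated once, at sign-in."""
    if risk != "low":
        return None
    return Token(identity, expires_at=now + 60)


def guard_on_train(token: Token, action: str, context: dict, now: int) -> bool:
    """Tool-time check: evaluated before every action, against current context."""
    if now >= token.expires_at:
        return False
    if action in DESTRUCTIVE and context["external_guest_present"]:
        return False
    return True


# 0902: sign-in passes. 0904: context moves. 0907: the token is still valid.
token = ticket_booth("cleanup-agent", risk="low", now=9 * 60 + 2)
context = {"external_guest_present": True}
delete_allowed = guard_on_train(token, "delete_site", context, now=9 * 60 + 7)
read_allowed = guard_on_train(token, "read_site", context, now=9 * 60 + 7)
```

With only the booth in place, the 0907 delete would proceed because the token is valid. With the guard in the call path, the same token still permits the read but not the delete.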

153
00:06:56,880 –> 00:06:59,640
And that’s the heart of the architectural lie.

154
00:06:59,640 –> 00:07:04,160
Organizations keep building governance around artifacts that exist before and after execution,

155
00:07:04,160 –> 00:07:05,440
but not at execution.

156
00:07:05,440 –> 00:07:08,240
So you get beautiful audit trails and ugly outcomes.

157
00:07:08,240 –> 00:07:11,520
This also explains why embodiment makes the problem worse.

158
00:07:11,520 –> 00:07:15,000
The more polished the experience plane becomes, the more it masks the absence of control plane

159
00:07:15,000 –> 00:07:16,000
enforcement.

160
00:07:16,000 –> 00:07:18,480
The organization feels safer because it can see more.

161
00:07:18,480 –> 00:07:21,640
But visibility without gating is just higher resolution regret.

162
00:07:21,640 –> 00:07:26,640
Once you separate the two planes, you stop arguing about whether the platform has governance.

163
00:07:26,640 –> 00:07:27,640
It does.

164
00:07:27,640 –> 00:07:31,120
You start arguing about where in the timeline governance actually applies.

165
00:07:31,120 –> 00:07:35,080
And you stop treating that as semantics because timing is where incidents live.

166
00:07:35,080 –> 00:07:38,960
Token time control without tool time control is a polite front door with no locks inside

167
00:07:38,960 –> 00:07:40,280
the building.

168
00:07:40,280 –> 00:07:43,680
What Microsoft gets right and why it still doesn’t save you.

169
00:07:43,680 –> 00:07:45,160
Microsoft is not asleep at the wheel here.

170
00:07:45,160 –> 00:07:49,840
That’s what makes this harder because the comfortable critique is the platform is immature.

171
00:07:49,840 –> 00:07:50,840
It isn’t.

172
00:07:50,840 –> 00:07:54,360
The uncomfortable critique is that the platform is improving in the places enterprises

173
00:07:54,360 –> 00:07:58,800
like to measure while the failure happens in the place they avoid designing.

174
00:07:58,800 –> 00:07:59,800
Start with Purview.

175
00:07:59,800 –> 00:08:03,800
Getting Copilot conversations into a compliant surface is real progress.

176
00:08:03,800 –> 00:08:05,480
Captured conversations change the nature of investigations.

177
00:08:05,480 –> 00:08:06,480
They give you a timeline.

178
00:08:06,480 –> 00:08:09,320
They give you a record of what was asked and what was answered.

179
00:08:09,320 –> 00:08:14,800
They also give you a way to correlate that conversation with a user identity and increasingly

180
00:08:14,800 –> 00:08:18,080
with the sources the system touched. That closes a lot of the old

181
00:08:18,080 –> 00:08:19,880
“we have no idea what it did” problem.

182
00:08:19,880 –> 00:08:25,360
Copilot Studio logging is the same category of win: activity logging, tool invocation traces,

183
00:08:25,360 –> 00:08:29,960
the ability to see what actions were taken and when. Again, real. For operations teams, that’s

184
00:08:29,960 –> 00:08:33,800
better than folklore and screen recordings. It turns agent behavior into something you can

185
00:08:33,800 –> 00:08:34,800
query.

186
00:08:34,800 –> 00:08:35,800
Now identity.

187
00:08:35,800 –> 00:08:39,000
Entra’s framing of workload identities and non-human identities is exactly where this

188
00:08:39,000 –> 00:08:40,000
should go.

189
00:08:40,000 –> 00:08:41,000
An agent is not a user.

190
00:08:41,000 –> 00:08:42,160
It is not an intern.

191
00:08:42,160 –> 00:08:47,080
It is a workload identity with automation privileges and treating it as such is the first

192
00:08:47,080 –> 00:08:48,600
admission of reality.

193
00:08:48,600 –> 00:08:51,480
Conditional access applying to those identities matters.

194
00:08:51,480 –> 00:08:54,920
Token issuance becomes conditional, signals-driven, and enforceable.

195
00:08:54,920 –> 00:08:55,920
Risk goes up.

196
00:08:55,920 –> 00:08:57,160
Token issuance gets blocked.

197
00:08:57,160 –> 00:08:58,400
Device posture is wrong.

198
00:08:58,400 –> 00:08:59,880
Token issuance gets blocked.

199
00:08:59,880 –> 00:09:02,120
Token issuance gets blocked.

200
00:09:02,120 –> 00:09:03,720
You can make the front door real.

201
00:09:03,720 –> 00:09:07,640
And there’s also continuous access evaluation sitting in the background as Microsoft’s answer

202
00:09:07,640 –> 00:09:09,960
to context changes after sign in.

203
00:09:09,960 –> 00:09:13,960
It’s an attempt to reduce the time lag between a changing risk posture and what the token

204
00:09:13,960 –> 00:09:15,160
is allowed to do.

205
00:09:15,160 –> 00:09:16,160
That direction is correct.

206
00:09:16,160 –> 00:09:20,640
You can’t keep treating authentication as a one time ceremony in a world where sessions

207
00:09:20,640 –> 00:09:22,320
persist and context drifts.

208
00:09:22,320 –> 00:09:23,520
All of that is necessary.

209
00:09:23,520 –> 00:09:25,040
All of it is still insufficient.

210
00:09:25,040 –> 00:09:26,760
Here’s the boundary you don’t get to hand wave.

211
00:09:26,760 –> 00:09:30,200
These controls mostly operate at token time and after execution.

212
00:09:30,200 –> 00:09:34,440
They don’t operate at action time inside the tool call path with deterministic intent

213
00:09:34,440 –> 00:09:35,440
enforcement.

214
00:09:35,440 –> 00:09:36,600
Per view tells you what happened.

215
00:09:36,600 –> 00:09:38,960
It does not decide what is allowed to happen next.

216
00:09:38,960 –> 00:09:43,560
Conditional access decides whether an identity should be issued a token under current conditions.

217
00:09:43,560 –> 00:09:48,800
It does not evaluate whether a specific delete, share or send is appropriate given the intent

218
00:09:48,800 –> 00:09:53,680
of the request, the sensitivity of the target and the venue in which the result will be exposed.

219
00:09:53,680 –> 00:09:57,920
That distinction matters because enterprise harm rarely looks like the agent got global

220
00:09:57,920 –> 00:09:59,400
admin.

221
00:09:59,400 –> 00:10:02,320
Microsoft has already blocked a lot of those extremes for agent identities.

222
00:10:02,320 –> 00:10:07,520
The real harm looks like the agent had legitimate write access in the wrong place, or the agent

223
00:10:07,520 –> 00:10:10,800
retrieved legitimate data and disclosed it in the wrong venue.

224
00:10:10,800 –> 00:10:12,360
And those are action time failures.

225
00:10:12,360 –> 00:10:14,400
If you want to hear the gap, walk the timeline.

226
00:10:14,400 –> 00:10:15,720
Agent signs in.

227
00:10:15,720 –> 00:10:17,960
Conditional access evaluates. Token issued.

228
00:10:17,960 –> 00:10:18,960
Fine.

229
00:10:18,960 –> 00:10:19,960
Agent retrieves documents

230
00:10:19,960 –> 00:10:20,960
it is entitled to retrieve.

231
00:10:20,960 –> 00:10:21,960
Fine.

232
00:10:21,960 –> 00:10:22,960
Transcript gets captured.

233
00:10:22,960 –> 00:10:24,840
The actions get recorded. Fine.

234
00:10:24,840 –> 00:10:29,000
Now the agent proposes a tool call, delete a site, share a file, post a message, send an

235
00:10:29,000 –> 00:10:31,200
email, trigger a workflow.

236
00:10:31,200 –> 00:10:35,440
Where is the deterministic policy gate that evaluates that proposed action against intent,

237
00:10:35,440 –> 00:10:38,840
scope, data classification and venue before the tool executes?

238
00:10:38,840 –> 00:10:40,440
In most deployments it isn’t there.

239
00:10:40,440 –> 00:10:42,520
The platform gave you the ticket booth and the cameras.

240
00:10:42,520 –> 00:10:45,120
It did not automatically give you a guard on the train.

241
00:10:45,120 –> 00:10:48,640
And because those Microsoft controls exist, organizations stop designing.

242
00:10:48,640 –> 00:10:50,320
They assume governance is covered.

243
00:10:50,320 –> 00:10:53,120
They feel safe because they can export transcripts.

244
00:10:53,120 –> 00:10:56,400
They feel safe because conditional access policies look mature.

245
00:10:56,400 –> 00:11:00,960
They feel safe because the agent has an identity object and identities feel like control.

246
00:11:00,960 –> 00:11:02,880
But control is not a directory object.

247
00:11:02,880 –> 00:11:04,160
Control is an enforcement point.

248
00:11:04,160 –> 00:11:05,880
So yes, praise the forensics.

249
00:11:05,880 –> 00:11:07,160
Praise the identity model.

250
00:11:07,160 –> 00:11:08,160
Praise the growing observability.

251
00:11:08,160 –> 00:11:10,120
And those are the raw materials you need.

252
00:11:10,120 –> 00:11:13,520
Then say the sentence that forces the architectural truth into the room.

253
00:11:13,520 –> 00:11:15,360
Microsoft has significantly improved visibility.

254
00:11:15,360 –> 00:11:18,200
They have not eliminated non-deterministic execution.

255
00:11:18,200 –> 00:11:23,000
Once you accept that, you stop asking the platform to save you with more logs and you start building

256
00:11:23,000 –> 00:11:24,480
the missing thing.

257
00:11:24,480 –> 00:11:28,280
Action-time, per-tool-call determinism.

258
00:11:28,280 –> 00:11:29,760
Audit, provenance, policy gate.

259
00:11:29,760 –> 00:11:32,360
Here is the trilogy that keeps getting blurred on purpose.

260
00:11:32,360 –> 00:11:33,880
Audit is a record of what happened.

261
00:11:33,880 –> 00:11:37,800
Who asked, what the agent said, which identity executed, which file got touched, which API

262
00:11:37,800 –> 00:11:39,440
got called, what time it happened.

263
00:11:39,440 –> 00:11:40,440
It’s a timeline.

264
00:11:40,440 –> 00:11:41,440
It’s useful.

265
00:11:41,440 –> 00:11:43,080
It’s also inherently retrospective.

266
00:11:43,080 –> 00:11:46,320
Audit is the black box flight recorder you consult after the impact.

267
00:11:46,320 –> 00:11:48,880
It doesn’t change the trajectory of the plane.

268
00:11:48,880 –> 00:11:51,920
Provenance is the missing middle that most teams pretend is nice to have.

269
00:11:51,920 –> 00:11:53,920
Provenance is not the transcript.

270
00:11:53,920 –> 00:11:57,720
Provenance is the decision chain, which chunks were retrieved, which candidates were considered

271
00:11:57,720 –> 00:12:02,560
and rejected, which tool options were available, which constraints were applied, and what caused

272
00:12:02,560 –> 00:12:04,080
the final selection.

273
00:12:04,080 –> 00:12:08,160
It is the explanation graph that ties inputs to outputs in a way that survives an incident

274
00:12:08,160 –> 00:12:09,160
review.
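One way to picture that decision chain is as a structured record rather than a transcript. The sketch below is an assumption about shape only: the class names, field names, and the example site names are all hypothetical, chosen to mirror the elements listed above (retrieved chunks, candidates considered and rejected, tool options, constraints, and the cause of the final selection).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    target: str
    rejected_because: Optional[str] = None  # None means this candidate was selected


@dataclass(frozen=True)
class ProvenanceRecord:
    operation_id: str
    retrieved_chunks: List[str]
    candidates: List[Candidate]
    tool_options: List[str]
    constraints_applied: List[str]
    selected_tool: str
    selection_cause: str

    def rejected(self) -> List[Candidate]:
        """Candidates that were considered and then excluded."""
        return [c for c in self.candidates if c.rejected_because is not None]

    def selected(self) -> List[Candidate]:
        """Candidates that survived to the final selection."""
        return [c for c in self.candidates if c.rejected_because is None]


# Hypothetical example record for a single site-cleanup decision.
record = ProvenanceRecord(
    operation_id="op-0001",
    retrieved_chunks=["tracker.xlsx#row12", "email-thread-88"],
    candidates=[
        Candidate("sites/project-alpha-2023"),
        Candidate("sites/project-alpha-2024", rejected_because="modified last week"),
    ],
    tool_options=["archive_site", "delete_site"],
    constraints_applied=["owner must be inactive"],
    selected_tool="delete_site",
    selection_cause="tracker row 12 marked the site obsolete",
)
```

A transcript or citation list would show only the selected target. The `rejected()` view is the part those artifacts do not carry.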

275
00:12:09,160 –> 00:12:12,360
Without provenance, you don’t know why the agent did what it did.

276
00:12:12,360 –> 00:12:13,840
You only know that it did it.

277
00:12:13,840 –> 00:12:15,840
And then there’s the part that prevents harm.

278
00:12:15,840 –> 00:12:16,840
The policy gate.

279
00:12:16,840 –> 00:12:20,560
A policy gate is a deterministic decision point that runs before execution.

280
00:12:20,560 –> 00:12:25,640
It evaluates a structured request against policy and authoritative state and returns allow,

281
00:12:25,640 –> 00:12:26,640
deny or transform.

282
00:12:26,640 –> 00:12:28,160
It is not a prompt instruction.

283
00:12:28,160 –> 00:12:29,160
It is not a persona.

284
00:12:29,160 –> 00:12:31,360
It is not a “please ask for confirmation.”

285
00:12:31,360 –> 00:12:32,640
It is an enforcement layer

286
00:12:32,640 –> 00:12:34,600
the agent cannot bypass.
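A minimal sketch of such a gate, assuming a three-way verdict as described. The rules themselves are invented for illustration, and "transform" is taken here to mean the action proceeds only in a narrower form, which is an assumption about the term. The point is that the rules are ordinary code evaluated in a fixed order, so the same request always gets the same verdict.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    TRANSFORM = "transform"


@dataclass(frozen=True)
class ActionRequest:
    operation: str       # e.g. "share", "send", "post"
    data_class: str      # e.g. "public", "internal", "confidential"
    venue_audience: str  # e.g. "internal", "mixed"


def evaluate(request: ActionRequest) -> Verdict:
    """Deterministic gate: no model call, no sampling, fixed rule order."""
    if request.venue_audience == "mixed":
        if request.data_class == "confidential":
            return Verdict.DENY
        if request.data_class == "internal":
            # Hypothetical meaning: proceed only with a redacted version.
            return Verdict.TRANSFORM
    return Verdict.ALLOW


verdict = evaluate(ActionRequest("share", "confidential", "mixed"))
```

The agent may be entitled to read the confidential document, and the gate still denies disclosing it in a mixed-audience venue.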

287
00:12:34,600 –> 00:12:35,920
Most enterprises have audit.

288
00:12:35,920 –> 00:12:37,520
Some have fragments of provenance.

289
00:12:37,520 –> 00:12:38,760
Almost none have a real gate.

290
00:12:38,760 –> 00:12:43,040
That distinction matters because your worst failures happen in the gap between entitled

291
00:12:43,040 –> 00:12:44,360
and appropriate.

292
00:12:44,360 –> 00:12:48,000
The agent can be entitled to read a document and still be wrong to disclose it in that

293
00:12:48,000 –> 00:12:49,000
venue.

294
00:12:49,000 –> 00:12:52,840
The agent can be entitled to write and still be wrong to write here, now, in that way.

295
00:12:52,840 –> 00:12:56,160
Audit will happily record the wrong thing with perfect fidelity.

296
00:12:56,160 –> 00:12:58,160
Provenance helps you argue with reality less.

297
00:12:58,160 –> 00:13:00,120
It tells you how you arrived at the bad action.

298
00:13:00,120 –> 00:13:03,680
It’s what you need when a regulator asks, why did the system decide this?

299
00:13:03,680 –> 00:13:06,320
And your only other answer is, it felt right.

300
00:13:06,320 –> 00:13:09,760
Provenance turns post mortems from fan fiction into analysis, but provenance still doesn’t

301
00:13:09,760 –> 00:13:11,000
prevent the incident.

302
00:13:11,000 –> 00:13:12,080
Only a gate does.

303
00:13:12,080 –> 00:13:13,960
And the thing most people miss is timing.

304
00:13:13,960 –> 00:13:17,680
The strongest built in controls are mostly outside the action path.

305
00:13:17,680 –> 00:13:19,680
Conditional access happens at token acquisition.

306
00:13:19,680 –> 00:13:23,480
Purview happens after the fact. Those are important controls, but they are not action-time

307
00:13:23,480 –> 00:13:24,720
authorization.

308
00:13:24,720 –> 00:13:28,680
So here’s what audit, provenance, and policy gate look like on a real timeline.

309
00:13:28,680 –> 00:13:32,600
The user asks the agent to do something, and the agent retrieves context.

310
00:13:32,600 –> 00:13:34,160
It compiles candidates.

311
00:13:34,160 –> 00:13:35,520
It selects a tool.

312
00:13:35,520 –> 00:13:38,600
In a safe architecture, there’s a hard boundary right there.

313
00:13:38,600 –> 00:13:41,680
The agent submits a request, not an imperative.

314
00:13:41,680 –> 00:13:45,600
Data, intent, scope, data class, venue and an operation ID.

315
00:13:45,600 –> 00:13:49,840
The policy engine evaluates those attributes against rules and authoritative state, produces

316
00:13:49,840 –> 00:13:53,000
a decision artifact and only then does execution happen.

317
00:13:53,000 –> 00:13:55,800
And the decision artifact gets stored next to the action.

318
00:13:55,800 –> 00:13:59,560
That last part is what makes governance real, because the artifact is proof, not narrative.

319
00:13:59,560 –> 00:14:00,560
You can sample it.

320
00:14:00,560 –> 00:14:01,560
You can query it.

321
00:14:01,560 –> 00:14:02,560
You can show it in an audit.

322
00:14:02,560 –> 00:14:07,720
Allowed under rule D104, constraint C17, based on state version 6.

323
00:14:07,720 –> 00:14:11,000
Or denied under rule V302 due to mixed audience.
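The timeline above can be sketched as a wrapper that refuses to run a tool without first producing and storing a decision artifact. The rule and constraint identifiers (D104, C17, V302) and state version 6 are the ones quoted in the episode; what each rule actually checks is not stated, so the conditions below are placeholders. `ToolRequest`, `DecisionArtifact`, `decide`, `execute`, and `DECISION_LOG` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ToolRequest:
    operation_id: str
    intent: str
    scope: str
    data_class: str
    venue: str


@dataclass(frozen=True)
class DecisionArtifact:
    operation_id: str
    allowed: bool
    rule_id: str
    constraint_id: Optional[str]
    state_version: int
    reason: str


DECISION_LOG: List[DecisionArtifact] = []


def decide(request: ToolRequest, state_version: int) -> DecisionArtifact:
    # Placeholder conditions; only the identifiers come from the transcript.
    if request.venue == "mixed-audience":
        return DecisionArtifact(request.operation_id, False, "V302", None,
                                state_version, "mixed audience")
    return DecisionArtifact(request.operation_id, True, "D104", "C17",
                            state_version, "within declared scope")


def execute(request: ToolRequest, tool: Callable[[ToolRequest], str],
            state_version: int) -> Optional[str]:
    artifact = decide(request, state_version)
    DECISION_LOG.append(artifact)  # stored whether or not the action runs
    if not artifact.allowed:
        return None
    return tool(request)


calls: List[str] = []


def fake_tool(request: ToolRequest) -> str:
    """Stand-in for a real side-effecting tool."""
    calls.append(request.operation_id)
    return "done"


denied = execute(
    ToolRequest("op-2", "share", "file:q3-report", "confidential", "mixed-audience"),
    fake_tool,
    state_version=6,
)
```

The denied request leaves an artifact in the log and no trace in `calls`, which is the difference between a record of prevention and a record of the incident.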

324
00:14:11,000 –> 00:14:13,160
This is what prevention looks like when it’s measurable.

325
00:14:13,160 –> 00:14:15,800
Now, the obvious pushback is, but we have transcripts.

326
00:14:15,800 –> 00:14:17,000
We have citations.

327
00:14:17,000 –> 00:14:18,000
We have activity logs.

328
00:14:18,000 –> 00:14:19,000
Isn’t that provenance?

329
00:14:19,000 –> 00:14:20,000
No.

330
00:14:20,000 –> 00:14:22,440
Transcripts are experience-plane narration.

331
00:14:22,440 –> 00:14:24,200
Citations are retrieval references.

332
00:14:24,200 –> 00:14:25,360
Activity logs are event records.

333
00:14:25,360 –> 00:14:26,360
They are necessary.

334
00:14:26,360 –> 00:14:27,360
They are not sufficient.

335
00:14:27,360 –> 00:14:28,880
They do not tell you what was excluded.

336
00:14:28,880 –> 00:14:31,200
They do not tell you what alternatives were rejected.

337
00:14:31,200 –> 00:14:35,080
They do not tell you whether a policy evaluated the action before execution.

338
00:14:35,080 –> 00:14:37,680
They do not tell you whether the system could have stopped itself.

339
00:14:37,680 –> 00:14:40,960
If you remember nothing else from this section, keep this ordering straight.

340
00:14:40,960 –> 00:14:42,400
Audit explains what happened.

341
00:14:42,400 –> 00:14:44,680
Provenance explains why that path was taken.

342
00:14:44,680 –> 00:14:47,840
A policy gate decides whether it’s allowed to happen at all.

343
00:14:47,840 –> 00:14:52,000
And when you add a face and a voice, you increase the probability that your organization

344
00:14:52,000 –> 00:14:54,120
confuses the first two for the third.

345
00:14:54,120 –> 00:14:58,120
Case study 1: mis-scoped tool call deletes the wrong SharePoint site.

346
00:14:58,120 –> 00:15:02,320
Here’s the first failure pattern because it’s the one that keeps happening quietly in enterprises

347
00:15:02,320 –> 00:15:04,160
that believe they’re governed.

348
00:15:04,160 –> 00:15:08,160
A productivity team rolls out an agent to clean up obsolete project sites.

349
00:15:08,160 –> 00:15:09,480
The brief sounds harmless.

350
00:15:09,480 –> 00:15:11,280
The agent is grounded in SharePoint.

351
00:15:11,280 –> 00:15:15,680
It can read site metadata, parse a tracker spreadsheet, and it has Microsoft Graph write

352
00:15:15,680 –> 00:15:19,720
access because eventually it needs to delete or archive things.

353
00:15:19,720 –> 00:15:21,240
The organization is proud.

354
00:15:21,240 –> 00:15:23,920
It’s using a dedicated workload identity.

355
00:15:23,920 –> 00:15:26,600
Conditional access is enforced and purview capture is enabled.

356
00:15:26,600 –> 00:15:29,080
At 0902, the agent authenticates.

357
00:15:29,080 –> 00:15:30,880
Conditional access evaluates and passes.

358
00:15:30,880 –> 00:15:31,880
A token is issued.

359
00:15:31,880 –> 00:15:32,880
No anomaly.

360
00:15:32,880 –> 00:15:33,880
No risk event.

361
00:15:33,880 –> 00:15:35,880
This is what good looks like.

362
00:15:35,880 –> 00:15:40,160
At 0905, a user asks, “Can you remove the old project spaces from last year?

363
00:15:40,160 –> 00:15:42,640
The active list is in the project’s archive tracker.”

364
00:15:42,640 –> 00:15:44,400
Now the agent does what agents do.

365
00:15:44,400 –> 00:15:45,400
It retrieves context.

366
00:15:45,400 –> 00:15:46,480
It reads the tracker.

367
00:15:46,480 –> 00:15:48,520
It searches for sites with similar names.

368
00:15:48,520 –> 00:15:49,720
It weighs signals.

369
00:15:49,720 –> 00:15:54,440
Last modified date, owner, whether a Teams channel exists, whether there are recent files,

370
00:15:54,440 –> 00:15:58,680
maybe a weak hint from an email thread. None of those are authoritative truth.

371
00:15:58,680 –> 00:15:59,680
They’re clues.

372
00:15:59,680 –> 00:16:00,680
Then it makes the choice.

373
00:16:00,680 –> 00:16:02,640
It selects a site that looks obsolete.

374
00:16:02,640 –> 00:16:03,800
And it calls the tool.

375
00:16:03,800 –> 00:16:06,520
It executes a Graph delete on the wrong SharePoint site.

376
00:16:06,520 –> 00:16:07,520
Nothing exotic happened here.

377
00:16:07,520 –> 00:16:08,520
No prompt injection.

378
00:16:08,520 –> 00:16:10,120
No compromised credential.

379
00:16:10,120 –> 00:16:11,680
No global admin role.

380
00:16:11,680 –> 00:16:16,520
This is normal probabilistic selection, acting at machine speed, with standing write scopes.

381
00:16:16,520 –> 00:16:18,800
Now look at what your governance artifacts say.

382
00:16:18,800 –> 00:16:20,200
Purview will show an interaction.

383
00:16:20,200 –> 00:16:21,720
It will show the user request.

384
00:16:21,720 –> 00:16:23,520
It will show the agent’s response.

385
00:16:23,520 –> 00:16:24,520
You will see timestamps.

386
00:16:24,520 –> 00:16:25,920
You will see the agent identity.

387
00:16:25,920 –> 00:16:29,920
You may see citations pointing to the tracker and maybe a policy doc.

388
00:16:29,920 –> 00:16:33,600
And you will see an activity: a site was deleted by that agent identity.

389
00:16:33,600 –> 00:16:34,600
Everything is correct.

390
00:16:34,600 –> 00:16:37,640
And none of it answers the question that matters in the incident review.

391
00:16:37,640 –> 00:16:38,800
Why that site?

392
00:16:38,800 –> 00:16:40,880
Not the narrative, “because it was obsolete.”

393
00:16:40,880 –> 00:16:42,360
The actual decision chain.

394
00:16:42,360 –> 00:16:46,000
Which retrieved chunk pushed it over the threshold? Which alternative candidates were

395
00:16:46,000 –> 00:16:47,840
considered and rejected?

396
00:16:47,840 –> 00:16:51,040
What eligibility rule was evaluated at the moment of execution?

397
00:16:51,040 –> 00:16:54,920
In most deployments, the answer is, no eligibility rule was evaluated.

398
00:16:54,920 –> 00:16:56,760
The agent inferred eligibility.

399
00:16:56,760 –> 00:17:00,760
That inference became an action because the tool was callable and the token was valid.

400
00:17:00,760 –> 00:17:02,200
Audit gave you a story.

401
00:17:02,200 –> 00:17:03,680
It did not give you prevention.

402
00:17:03,680 –> 00:17:07,360
And the worst part is how the post-incident conversation usually goes because it’s always

403
00:17:07,360 –> 00:17:09,120
experience plane thinking.

404
00:17:09,120 –> 00:17:10,120
We’ll improve the prompt.

405
00:17:10,120 –> 00:17:11,840
We’ll add a confirmation step.

406
00:17:11,840 –> 00:17:13,960
We’ll tell users to be more specific.

407
00:17:13,960 –> 00:17:15,280
Those are all entropy generators.

408
00:17:15,280 –> 00:17:18,800
They add more conditional branches, more human confusion and more opportunity for the

409
00:17:18,800 –> 00:17:21,280
agent to interpret a suggestion as a command.

410
00:17:21,280 –> 00:17:25,080
The architectural fix is boring and it works because it doesn’t require belief.

411
00:17:25,080 –> 00:17:26,080
First, idempotency.

412
00:17:26,080 –> 00:17:30,120
Every destructive request carries an operation ID persisted before execution.

413
00:17:30,120 –> 00:17:33,560
If the same request is replayed, retried, duplicated or reordered,

414
00:17:33,560 –> 00:17:37,240
the system returns the prior outcome and does not re-execute side effects.

415
00:17:37,240 –> 00:17:40,760
That turns event-driven unreliability into safe replay.
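
The idempotency step described above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the table layout, the `execute_once` helper and the in-memory SQLite database (standing in for a durable store) are all assumptions made for the example.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    # In-memory SQLite stands in for a durable, authoritative store (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE operations ("
        " operation_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " outcome TEXT)"
    )
    return conn


def execute_once(conn, operation_id, side_effect):
    """Run side_effect at most once per operation_id; replays get the prior outcome."""
    try:
        # Persist the operation ID *before* touching the outside world.
        with conn:
            conn.execute(
                "INSERT INTO operations (operation_id, status) VALUES (?, 'pending')",
                (operation_id,),
            )
    except sqlite3.IntegrityError:
        # Already recorded: report what is known and never re-execute.
        row = conn.execute(
            "SELECT status, outcome FROM operations WHERE operation_id = ?",
            (operation_id,),
        ).fetchone()
        return {"status": row[0], "outcome": row[1], "replayed": True}

    outcome = side_effect()
    with conn:
        conn.execute(
            "UPDATE operations SET status = 'done', outcome = ? WHERE operation_id = ?",
            (outcome, operation_id),
        )
    return {"status": "done", "outcome": outcome, "replayed": False}


# Hypothetical destructive tool used only for the demonstration.
calls = []


def delete_site():
    calls.append("deleted")
    return "site-123 deleted"


conn = make_store()
first = execute_once(conn, "op-001", delete_site)
second = execute_once(conn, "op-001", delete_site)  # a retry or duplicate delivery
```

One design consequence worth noting: if the process dies between the insert and the update, a replay sees a `pending` row with no outcome. The sketch returns that as-is rather than re-running the side effect; a real system would need a reconciliation step for that case.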

416
00:17:40,760 –> 00:17:41,920
Second, authoritative state.

417
00:17:41,920 –> 00:17:43,720
“Eligible for deletion” is not a vibe.

418
00:17:43,720 –> 00:17:46,360
It’s a state property stored in a system of record.

419
00:17:46,360 –> 00:17:50,000
If the authoritative catalog says retired true, dormant 90 days, and owner approved

420
00:17:50,000 –> 00:17:51,720
true, then the site can be deleted.

421
00:17:51,720 –> 00:17:53,440
If not, the site cannot be deleted.

422
00:17:53,440 –> 00:17:55,280
The agent does not get to negotiate that.
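
As a rough sketch of what "eligibility as a state property" could look like, the check below reads only from a catalog record. The `CatalogRecord` type, the sample catalog and the site IDs are hypothetical; the three conditions (retired, dormant 90 days, owner approved) are the ones named in the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    site_id: str
    retired: bool
    dormant_days: int
    owner_approved: bool


# Hypothetical system-of-record contents, for illustration only.
CATALOG = {
    "site-old": CatalogRecord("site-old", True, 120, True),
    "site-live": CatalogRecord("site-live", False, 3, False),
}


def eligible_for_deletion(site_id: str, catalog: dict = CATALOG) -> bool:
    """Eligibility comes from the catalog, never from the agent's inference."""
    record = catalog.get(site_id)
    if record is None:
        return False  # unknown to the system of record: not eligible
    return record.retired and record.dormant_days >= 90 and record.owner_approved
```

Nothing the agent retrieved (names, last-modified dates, email hints) appears as an input here, which is the point: the function cannot be persuaded.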

423
00:17:55,280 –> 00:17:57,280
Third, the policy gate.

424
00:17:57,280 –> 00:18:00,080
Before the delete tool executes, the agent submits a structured

425
00:18:00,080 –> 00:18:01,080
request:

426
00:18:01,080 –> 00:18:02,080
actor,

427
00:18:02,080 –> 00:18:03,080
intent:

428
00:18:03,080 –> 00:18:04,080
delete,

429
00:18:04,080 –> 00:18:05,080
scope:

430
00:18:05,080 –> 00:18:06,080
site

431
00:18:06,080 –> 00:18:07,080
ID,

432
00:18:07,080 –> 00:18:08,080
data class,

433
00:18:08,080 –> 00:18:09,080
venue,

434
00:18:09,080 –> 00:18:10,080
operation

435
00:18:10,080 –> 00:18:11,080
ID.

436
00:18:11,080 –> 00:18:12,080
The policy engine evaluates that request against rules and authoritative state and returns allow,

437
00:18:12,080 –> 00:18:13,080
deny or transform.

438
00:18:13,080 –> 00:18:16,080
If it denies, the tool never sees the request.

439
00:18:16,080 –> 00:18:19,080
If it allows, the decision artifact is stored next to the action.
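
A minimal sketch of such a gate, under stated assumptions: the `Request` fields mirror the ones listed in the episode, while the rule names, the `gate` function, the in-memory `DECISION_LOG` and the sample tool are invented for illustration. The "transform" outcome is left out to keep the example short.

```python
from dataclasses import dataclass, asdict
from typing import Callable


@dataclass(frozen=True)
class Request:
    actor: str
    intent: str
    site_id: str
    data_class: str
    venue: str
    operation_id: str


# Stand-in for wherever decision artifacts are persisted (assumption).
DECISION_LOG: list = []


def gate(request: Request,
         is_eligible: Callable[[str], bool],
         tool: Callable[[str], str]) -> dict:
    """Evaluate the request; the tool is only invoked on an allow."""
    if request.intent != "delete":
        verdict, rule = "deny", "unsupported-intent"
    elif is_eligible(request.site_id):
        verdict, rule = "allow", "delete-requires-catalog-eligibility"
    else:
        verdict, rule = "deny", "delete-requires-catalog-eligibility"

    artifact = {"verdict": verdict, "rule": rule, "request": asdict(request)}
    DECISION_LOG.append(artifact)  # the decision is recorded alongside the action
    if verdict == "allow":
        artifact["result"] = tool(request.site_id)
    return artifact


# Hypothetical authoritative state and tool for the demonstration.
eligible_sites = {"site-old"}
tool_calls = []


def delete_tool(site_id: str) -> str:
    tool_calls.append(site_id)
    return f"deleted {site_id}"


denied = gate(
    Request("agent-01", "delete", "site-live", "project", "teams-chat", "op-1"),
    eligible_sites.__contains__, delete_tool,
)
allowed = gate(
    Request("agent-01", "delete", "site-old", "project", "teams-chat", "op-2"),
    eligible_sites.__contains__, delete_tool,
)
```

In the denied case the tool function is never called, so there is nothing to roll back; the only trace is the stored decision.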

440
00:18:19,080 –> 00:18:21,440
Now, replay the same scenario under that model.

441
00:18:21,440 –> 00:18:22,840
The agent compiles candidates.

442
00:18:22,840 –> 00:18:24,520
It proposes the wrong site.

443
00:18:24,520 –> 00:18:28,160
The policy engine evaluates the proposal against the authoritative catalog.

444
00:18:28,160 –> 00:18:30,160
The wrong site fails eligibility.

445
00:18:30,160 –> 00:18:31,160
Deny.

446
00:18:31,160 –> 00:18:32,160
The user still gets a transcript.

447
00:18:32,160 –> 00:18:33,480
The activity logs still exist.

448
00:18:33,480 –> 00:18:37,320
The difference is that your incident is now a denied decision, not a post-mortem.

449
00:18:37,320 –> 00:18:41,080
That’s what audit, provenance, policy gate means operationally.

450
00:18:41,080 –> 00:18:42,840
Audit will always be perfect after the damage.

451
00:18:42,840 –> 00:18:44,880
A gate makes the damage never happen.

452
00:18:44,880 –> 00:18:46,560
Case study 2.

453
00:18:46,560 –> 00:18:47,560
Compliant retrieval.

454
00:18:47,560 –> 00:18:49,560
Policy violation via voice in a meeting.

455
00:18:49,560 –> 00:18:53,600
Now move from wrong target to the failure that governance teams hate because it breaks

456
00:18:53,600 –> 00:18:55,760
all their comfortable categories.

457
00:18:55,760 –> 00:18:56,760
Everything is entitled.

458
00:18:56,760 –> 00:18:57,760
Everything is logged.

459
00:18:57,760 –> 00:18:58,760
Correct.

460
00:18:58,760 –> 00:19:00,080
And it’s still unacceptable.

461
00:19:00,080 –> 00:19:03,560
An HR assistant agent gets deployed into Teams meetings.

462
00:19:03,560 –> 00:19:08,400
It’s grounded on policy documents, FAQs, compensation guidance and a curated SharePoint

463
00:19:08,400 –> 00:19:11,080
library managed by the compensation team.

464
00:19:11,080 –> 00:19:12,520
The pitch sounds responsible.

465
00:19:12,520 –> 00:19:13,520
The agent is read-only.

466
00:19:13,520 –> 00:19:15,080
It’s not writing anywhere.

467
00:19:15,080 –> 00:19:17,480
And it’s meant to reduce interruptions in live calls.

468
00:19:17,480 –> 00:19:18,760
A director asks a question.

469
00:19:18,760 –> 00:19:19,760
The agent answers.

470
00:19:19,760 –> 00:19:20,760
Everyone moves on.

471
00:19:20,760 –> 00:19:22,120
The identity model looks clean.

472
00:19:22,120 –> 00:19:24,400
It runs under a workload identity.

473
00:19:24,400 –> 00:19:26,120
Conditional access protects token issuance.

474
00:19:26,120 –> 00:19:29,240
Purview is configured to capture conversation transcripts.

475
00:19:29,240 –> 00:19:30,920
Copilot activity logs are enabled.

476
00:19:30,920 –> 00:19:33,080
From a governance standpoint it checks boxes.

477
00:19:33,080 –> 00:19:34,080
Then the meeting happens.

478
00:19:34,080 –> 00:19:37,160
A director asks, “What are the employee trends this quarter?”

479
00:19:37,160 –> 00:19:38,720
That question is vague on purpose.

480
00:19:38,720 –> 00:19:43,240
Humans ask vague questions in meetings because they don’t want to specify constraints out loud.

481
00:19:43,240 –> 00:19:46,400
They assume the audience understands the implied boundaries.

482
00:19:46,400 –> 00:19:49,120
Don’t mention anything sensitive in front of externals.

483
00:19:49,120 –> 00:19:50,120
Keep it high level.

484
00:19:50,120 –> 00:19:53,120
Don’t surface anything that can be misinterpreted or forwarded.

485
00:19:53,120 –> 00:19:54,800
The agent does not have those instincts.

486
00:19:54,800 –> 00:19:56,240
It does what it was built to do.

487
00:19:56,240 –> 00:19:57,240
It retrieves.

488
00:19:57,240 –> 00:19:58,240
It aggregates.

489
00:19:58,240 –> 00:19:59,240
It summarizes.

490
00:19:59,240 –> 00:20:01,680
It picks numbers because numbers sound authoritative.

491
00:20:01,680 –> 00:20:03,680
It synthesizes a clean verbal answer.

492
00:20:03,680 –> 00:20:04,840
And it says it out loud.

493
00:20:04,840 –> 00:20:07,920
Maybe it mentions compensation movement by level and region.

494
00:20:07,920 –> 00:20:10,280
Maybe it reports internal mobility rates.

495
00:20:10,280 –> 00:20:14,600
Maybe it references subgroup deltas because the underlying documents include those charts.

496
00:20:14,600 –> 00:20:15,600
No names.

497
00:20:15,600 –> 00:20:17,320
No row-level PII.

498
00:20:17,320 –> 00:20:18,960
No single record disclosure.

499
00:20:18,960 –> 00:20:20,760
Still a policy violation.

500
00:20:20,760 –> 00:20:22,760
Because the harm here isn’t access.

501
00:20:22,760 –> 00:20:24,520
The harm is venue.

502
00:20:24,520 –> 00:20:26,960
The harm is aggregation.

503
00:20:26,960 –> 00:20:29,080
The meeting includes external participants.

504
00:20:29,080 –> 00:20:33,200
Vendors, a partner org, someone dialing in from an unfamiliar domain. That happens constantly

505
00:20:33,200 –> 00:20:34,920
in modern enterprises.

506
00:20:34,920 –> 00:20:36,880
Teams meetings are porous by default.

507
00:20:36,880 –> 00:20:39,440
The audience boundary shifts in real time.

508
00:20:39,440 –> 00:20:42,160
And the agent, because it is speaking, becomes an egress path.

509
00:20:42,160 –> 00:20:46,120
Now look at the telemetry and watch how it fails you while remaining technically correct.

510
00:20:46,120 –> 00:20:47,320
Purview shows the transcript.

511
00:20:47,320 –> 00:20:48,800
The question and the answer are there.

512
00:20:48,800 –> 00:20:51,240
The citations point to the right HR library documents.

513
00:20:51,240 –> 00:20:52,880
The agent identity is valid.

514
00:20:52,880 –> 00:20:56,240
The user who asked the question is entitled to the documents.

515
00:20:56,240 –> 00:20:57,800
The SharePoint permissions are correct.

516
00:20:57,800 –> 00:20:59,440
The retrieval was security-trimmed.

517
00:20:59,440 –> 00:21:01,240
All the traditional controls passed.

518
00:21:01,240 –> 00:21:02,480
So what exactly was missing?

519
00:21:02,480 –> 00:21:05,360
The policy evaluation that should have happened at speech time.

520
00:21:05,360 –> 00:21:09,320
Nobody asked a deterministic question like, is it permissible to verbalize this class of

521
00:21:09,320 –> 00:21:13,080
information at this aggregation level in this venue to this audience?

522
00:21:13,080 –> 00:21:15,720
Because speech was treated as output, not action.

523
00:21:15,720 –> 00:21:16,720
This is the trap.

524
00:21:16,720 –> 00:21:18,840
Teams treat tool calls like actions.

525
00:21:18,840 –> 00:21:22,320
Graph writes, deletes, shares, but they treat speech like harmless UI.

526
00:21:22,320 –> 00:21:23,160
It is not.

527
00:21:23,160 –> 00:21:25,040
In a meeting, speech is publication.

528
00:21:25,040 –> 00:21:28,000
It leaves the system boundary the moment it hits the room.

529
00:21:28,000 –> 00:21:29,080
People repeat it.

530
00:21:29,080 –> 00:21:30,080
Screenshots happen.

531
00:21:30,080 –> 00:21:31,640
Someone says, “Can you send that to me?”

532
00:21:31,640 –> 00:21:32,760
And now it’s in chat.

533
00:21:32,760 –> 00:21:37,240
The output becomes durable even if the data never left SharePoint at the file level.

534
00:21:37,240 –> 00:21:40,640
Your DLP policies can stay green while your policy posture goes red.

535
00:21:40,640 –> 00:21:44,360
And because the agent sounds calm and competent, nobody interrupts it.

536
00:21:44,360 –> 00:21:47,440
Human interface trust bias turns the meeting into an amplifier.

537
00:21:47,440 –> 00:21:51,000
The agent just shipped an aggregation to a mixed audience at machine speed, wrapped in

538
00:21:51,000 –> 00:21:52,800
a tone that implies permission.

539
00:21:52,800 –> 00:21:54,560
Now, the fix, again, isn’t “ban voice.”

540
00:21:54,560 –> 00:21:56,760
The fix is to treat voice as a tool call.

541
00:21:56,760 –> 00:22:00,640
Before the agent speaks, you classify the output, not just the input documents.

542
00:22:00,640 –> 00:22:01,680
The output.

543
00:22:01,680 –> 00:22:06,320
You attach attributes: data class: compensation; aggregation: cohort; venue: Teams meeting;

544
00:22:06,320 –> 00:22:09,000
audience: mixed; external present: true.

545
00:22:09,000 –> 00:22:12,080
Then you submit that as a request to a policy engine.

546
00:22:12,080 –> 00:22:16,120
And the policy engine does what humans do automatically and machines never do unless

547
00:22:16,120 –> 00:22:17,160
you force them to.

548
00:22:17,160 –> 00:22:19,200
It evaluates a rule like:

549
00:22:19,200 –> 00:22:23,720
Compensation cohorts may not be disclosed verbally when external participants are present.

550
00:22:23,720 –> 00:22:27,040
Allow only high level summaries with no subgroup references.

551
00:22:27,040 –> 00:22:30,600
Transform the response or deny it. If it denies, the speech tool never runs.

552
00:22:30,600 –> 00:22:33,480
If it transforms, the agent speaks a sanitized version.

553
00:22:33,480 –> 00:22:35,680
“High-level trends remained within target ranges.

554
00:22:35,680 –> 00:22:39,120
Detailed breakdown is available to HR-only audiences.”

555
00:22:39,120 –> 00:22:43,240
And the decision artifact gets stored next to the action denied or transformed under rule

556
00:22:43,240 –> 00:22:46,000
v302 with the attributes that triggered it.
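
Treating speech as a gated action might look something like the sketch below. The attribute names follow the ones mentioned in the episode (data class, aggregation, venue, external present), and "v302" is the rule label used there; the `SpeechRequest` type, the `evaluate_speech` function and the extra deny branch for individual-level data are assumptions added for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SpeechRequest:
    text: str
    data_class: str        # e.g. "compensation"
    aggregation: str       # e.g. "cohort", "individual", "summary"
    venue: str             # e.g. "teams-meeting"
    external_present: bool


SAFE_SUMMARY = (
    "High-level trends remained within target ranges. "
    "Detailed breakdown is available to HR-only audiences."
)


def evaluate_speech(req: SpeechRequest) -> dict:
    """Return allow / transform / deny plus the attributes that drove the decision."""
    attrs = {k: v for k, v in asdict(req).items() if k != "text"}
    if req.data_class == "compensation" and req.external_present:
        if req.aggregation == "individual":
            # Assumed stricter branch: nothing is spoken at all.
            return {"verdict": "deny", "rule": "v302", "speak": None, "attributes": attrs}
        if req.aggregation == "cohort":
            # Cohort detail is replaced by a sanitized, high-level summary.
            return {"verdict": "transform", "rule": "v302",
                    "speak": SAFE_SUMMARY, "attributes": attrs}
    return {"verdict": "allow", "rule": None, "speak": req.text, "attributes": attrs}


detailed = "Compensation moved 4% for level 5 in region A."  # made-up sample answer
internal = evaluate_speech(
    SpeechRequest(detailed, "compensation", "cohort", "teams-meeting", False))
mixed = evaluate_speech(
    SpeechRequest(detailed, "compensation", "cohort", "teams-meeting", True))
blocked = evaluate_speech(
    SpeechRequest(detailed, "compensation", "individual", "teams-meeting", True))
```

The text-to-speech call would consume only the `speak` field, so a denied request leaves it with nothing to say, and the returned dictionary doubles as the decision artifact.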

557
00:22:46,000 –> 00:22:47,360
Now replay the incident.

558
00:22:47,360 –> 00:22:50,200
Same question, same retrieval, same entitlement, different outcome.

559
00:22:50,200 –> 00:22:54,040
The agent proposes a detailed answer, the control plane disposes, the meeting gets a safe

560
00:22:54,040 –> 00:22:57,280
summary, and your governance story becomes boring on purpose.

561
00:22:57,280 –> 00:23:01,040
Because compliance systems still fail when venue and intent aren’t enforced at the moment

562
00:23:01,040 –> 00:23:02,440
of publication.

563
00:23:02,440 –> 00:23:06,120
Case study 3, external shadow agent with internal blast radius.

564
00:23:06,120 –> 00:23:10,280
Now the failure pattern that doesn’t show up as a breach until the screenshots are already

565
00:23:10,280 –> 00:23:11,280
circulating.

566
00:23:11,280 –> 00:23:13,120
A developer is under pressure.

567
00:23:13,120 –> 00:23:14,360
Support tickets are piling up.

568
00:23:14,360 –> 00:23:18,920
The product team wants a deflection bot and someone has seen a demo where an agent answers questions

569
00:23:18,920 –> 00:23:19,920
instantly.

570
00:23:19,920 –> 00:23:21,720
So they do what modern platforms encourage.

571
00:23:21,720 –> 00:23:26,240
They stand up an externally accessible agent, put a chat widget on a public page, and wire

572
00:23:26,240 –> 00:23:29,320
it to enterprise knowledge so it doesn’t sound stupid.

573
00:23:29,320 –> 00:23:34,360
And because it’s just answering questions, they give it broad read scopes to internal content.

574
00:23:34,360 –> 00:23:38,960
A SharePoint site with runbooks, a wiki, maybe a knowledge base, maybe a support analytics

575
00:23:38,960 –> 00:23:39,960
store.

576
00:23:39,960 –> 00:23:42,280
They also add a couple of write scopes for later.

577
00:23:42,280 –> 00:23:46,320
Not because they’re malicious, but because future features always arrive and nobody wants to redo consent.

578
00:23:46,320 –> 00:23:48,600
The agent authenticates using an app registration.

579
00:23:48,600 –> 00:23:49,600
It gets a token.

580
00:23:49,600 –> 00:23:51,000
It calls internal systems.

581
00:23:51,000 –> 00:23:52,000
Everything is legitimate.

582
00:23:52,000 –> 00:23:53,480
That’s the core danger here.

583
00:23:53,480 –> 00:23:55,480
Nothing has to be compromised for this to go wrong.

584
00:23:55,480 –> 00:24:00,720
A customer asks a harmless question: “What’s the workaround for the X120 firmware outage?”

585
00:24:00,720 –> 00:24:04,320
The agent retrieves internal runbooks and post-mortem fragments that were never meant

586
00:24:04,320 –> 00:24:05,600
to leave the tenant.

587
00:24:05,600 –> 00:24:07,320
It assembles a confident answer.

588
00:24:07,320 –> 00:24:08,800
It publishes it to the public chat.

589
00:24:08,800 –> 00:24:12,800
No exploit chain, no prompt injection, no data exfiltration tooling, just a public interface

590
00:24:12,800 –> 00:24:16,600
connected to an internal corpus by an overpermissioned workload identity.

591
00:24:16,600 –> 00:24:19,680
Now walk through what the logs tell you and what they can’t.

592
00:24:19,680 –> 00:24:22,720
Entra shows token issuance under a workload identity.

593
00:24:22,720 –> 00:24:26,440
If conditional access is configured for that identity, it evaluates the sign-in context

594
00:24:26,440 –> 00:24:27,720
and issues the token.

595
00:24:27,720 –> 00:24:30,440
Purview shows the agent reading internal documents.

596
00:24:30,440 –> 00:24:34,880
The audit trail is pristine: identity, timestamps, resource access, downstream calls.

597
00:24:34,880 –> 00:24:39,160
The organization can prove down to the minute that the agent touched those files and responded

598
00:24:39,160 –> 00:24:40,640
to that external user.

599
00:24:40,640 –> 00:24:45,320
And that’s the trap, because the logs being correct becomes evidence, incorrectly, that the

600
00:24:45,320 –> 00:24:46,680
system was governed.

601
00:24:46,680 –> 00:24:49,640
What’s missing is the decision chain and the boundary enforcement.

602
00:24:49,640 –> 00:24:53,720
Why did the external request get access to internal-only material? Which rule asserted

603
00:24:53,720 –> 00:24:56,360
that this venue is allowed to consume that corpus?

604
00:24:56,360 –> 00:25:01,320
Where is the policy artifact that says “public audience, internal classification: deny disclosure”?

605
00:25:01,320 –> 00:25:04,840
In most shadow deployments, there is no artifact because there was no gate.

606
00:25:04,840 –> 00:25:08,280
The agent selected sources based on similarity and availability.

607
00:25:08,280 –> 00:25:10,920
The tool call executed because the token allowed it.

608
00:25:10,920 –> 00:25:13,040
The system did exactly what you configured.

609
00:25:13,040 –> 00:25:16,720
This is where audit, provenance, policy gate becomes operationally expensive.

610
00:25:16,720 –> 00:25:18,360
Audit tells you the leak happened.

611
00:25:18,360 –> 00:25:21,840
Provenance would tell you how the agent chose that runbook over a public doc.

612
00:25:21,840 –> 00:25:23,120
What other candidates existed?

613
00:25:23,120 –> 00:25:24,640
What was excluded and why?

614
00:25:24,640 –> 00:25:28,280
A policy gate would have prevented the response from ever being published externally.

615
00:25:28,280 –> 00:25:30,640
But the public facing agent usually has none of that.

616
00:25:30,640 –> 00:25:35,200
It has an experience plane that looks polished and a control plane that is effectively absent.

617
00:25:35,200 –> 00:25:37,240
Now the blast radius. This isn’t a single reply.

618
00:25:37,240 –> 00:25:38,880
It’s speed, reach and replication.

619
00:25:38,880 –> 00:25:41,920
The agent can answer a thousand external users in a day.

620
00:25:41,920 –> 00:25:45,120
Each answer can include a slightly different internal detail.

621
00:25:45,120 –> 00:25:48,600
Customers screenshot, aggregators scrape, the information spreads because the interface is

622
00:25:48,600 –> 00:25:51,560
public and the system is consistent in the one way that matters.

623
00:25:51,560 –> 00:25:53,200
It’s consistently allowed.

624
00:25:53,200 –> 00:25:54,960
And the post incident review is predictable.

625
00:25:54,960 –> 00:25:59,920
People say “we’ll tighten the prompt” or “we’ll add a disclaimer” or “we’ll retrain the model.”

626
00:25:59,920 –> 00:26:01,480
Those are not containment strategies.

627
00:26:01,480 –> 00:26:03,000
Those are narrative strategies.

628
00:26:03,000 –> 00:26:07,000
The deterministic fix is boring and it works because it creates failure domains.

629
00:26:07,000 –> 00:26:09,280
First, split the identities.

630
00:26:09,280 –> 00:26:13,840
The public facing agent identity must have zero access to internal core data planes.

631
00:26:13,840 –> 00:26:14,840
None.

632
00:26:14,840 –> 00:26:17,640
It should only query a curated, published, approved external knowledge base.

633
00:26:17,640 –> 00:26:22,840
If the public corpus can’t answer, the correct behavior is refusal or escalation, not improvisation.

634
00:26:22,840 –> 00:26:26,920
Second, if you truly need internal knowledge to support external responses, you introduce

635
00:26:26,920 –> 00:26:27,920
a broker.

636
00:26:27,920 –> 00:26:31,380
The public agent can ask the broker for candidate content but the broker is the policy

637
00:26:31,380 –> 00:26:32,380
gate.

638
00:26:32,380 –> 00:26:34,640
It evaluates venue, audience and data classification.

639
00:26:34,640 –> 00:26:35,980
It transforms or denies.

640
00:26:35,980 –> 00:26:39,880
The public agent never sees internal chunks that are not eligible for egress.

641
00:26:39,880 –> 00:26:42,780
Third, persist the decision artifact with the action.

642
00:26:42,780 –> 00:26:48,520
When a response is allowed externally, you store “allowed under rule EX2-1, source set

643
00:26:48,520 –> 00:26:51,320
PUB docs 2024 Q2.”

644
00:26:51,320 –> 00:26:56,640
When it’s denied, you store “denied under rule EX3-01, internal-only content.”

645
00:26:56,640 –> 00:27:00,120
Now your audit stops being a story and becomes proof of enforcement.
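
The broker pattern from the three steps above can be sketched roughly as follows. The rule labels EX2-1 and EX3-01 and the refusal wording are taken from the episode; the `Chunk` type, the function names and the sample sources are hypothetical, and a real broker would sit behind its own identity rather than in the same process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str
    classification: str  # "public" or "internal"
    text: str


REFUSAL = ("I can't share internal remediation notes here "
           "but I can connect you with support.")


def broker_for_public(candidates):
    """Release only chunks eligible for egress; record a decision for each one."""
    released, artifacts = [], []
    for chunk in candidates:
        if chunk.classification == "public":
            released.append(chunk)
            artifacts.append(
                {"verdict": "allowed", "rule": "EX2-1", "source": chunk.source})
        else:
            artifacts.append(
                {"verdict": "denied", "rule": "EX3-01", "source": chunk.source})
    return released, artifacts


def public_reply(candidates):
    """The public agent only ever sees what the broker released."""
    released, artifacts = broker_for_public(candidates)
    if not released:
        return REFUSAL, artifacts
    return " ".join(chunk.text for chunk in released), artifacts


# Made-up candidates for the firmware question.
internal_only = [Chunk("runbooks/x120-postmortem", "internal", "Root cause notes.")]
mixed = internal_only + [Chunk("pub-docs/x120-faq", "public", "Update to the latest release.")]

reply_refused, log_refused = public_reply(internal_only)
reply_mixed, log_mixed = public_reply(mixed)
```

Because filtering happens before the public agent composes anything, there is no internal text in its context for it to improvise from.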

646
00:27:00,120 –> 00:27:01,960
Replay the same incident under that model.

647
00:27:01,960 –> 00:27:03,880
The customer asks about firmware.

648
00:27:03,880 –> 00:27:06,880
The public agent searches the external corpus and finds nothing definitive.

649
00:27:06,880 –> 00:27:08,160
It asks the broker.

650
00:27:08,160 –> 00:27:11,280
The broker evaluates the internal candidate and denies egress.

651
00:27:11,280 –> 00:27:12,920
The agent replies calmly.

652
00:27:12,920 –> 00:27:17,000
I can’t share internal remediation notes here but I can connect you with support.

653
00:27:17,000 –> 00:27:19,160
The screenshot that circulates is a refusal.

654
00:27:19,160 –> 00:27:22,000
That is what containment looks like when you stop trusting the interface and start

655
00:27:22,000 –> 00:27:24,200
enforcing the control plane.

656
00:27:24,200 –> 00:27:27,320
The protocol standardizes the envelope, not the guarantees.

657
00:27:27,320 –> 00:27:31,560
Microsoft is right about one thing that most teams quietly misunderstand.

658
00:27:31,560 –> 00:27:35,800
Activities, TurnContext, Direct Line, the Bot Framework patterns, those are not hacks.

659
00:27:35,800 –> 00:27:37,560
They are intentional design surfaces.

660
00:27:37,560 –> 00:27:41,560
They’re how Microsoft expects you to build conversational systems that can run across channels

661
00:27:41,560 –> 00:27:43,800
and survive real-world connectivity.

662
00:27:43,800 –> 00:27:48,640
But teams keep hearing supported protocol and mentally upgrading it to guaranteed behavior.

663
00:27:48,640 –> 00:27:49,640
It is not.

664
00:27:49,640 –> 00:27:51,680
A protocol standardizes the envelope.

665
00:27:51,680 –> 00:27:55,560
It standardizes field names, schemas and how events are represented on the wire.

666
00:27:55,560 –> 00:27:59,920
It does not standardize the guarantees you wish you had: ordering, exactly-once delivery,

667
00:27:59,920 –> 00:28:02,360
causal consistency or safe side effects.

668
00:28:02,360 –> 00:28:06,280
That distinction matters because the moment you wire tool execution to an event stream,

669
00:28:06,280 –> 00:28:07,840
you’ve built a distributed system.

670
00:28:07,840 –> 00:28:11,160
And distributed systems don’t fail because you wrote bad code.

671
00:28:11,160 –> 00:28:13,800
They fail because reality is asynchronous.

672
00:28:13,800 –> 00:28:17,120
Here are the four failure modes you inherit the second you go event-driven.

673
00:28:17,120 –> 00:28:20,520
Duplication, delay, reordering and loss.

674
00:28:20,520 –> 00:28:23,080
Not edge cases, not rare, the environment.

675
00:28:23,080 –> 00:28:25,120
A retry duplicates an activity.

676
00:28:25,120 –> 00:28:27,720
A congested path delays it. Two workers reorder it.

677
00:28:27,720 –> 00:28:29,680
A transient broker drop loses it.

678
00:28:29,680 –> 00:28:31,480
The SDK abstracts the plumbing.

679
00:28:31,480 –> 00:28:32,720
It does not repeal physics.

680
00:28:32,720 –> 00:28:33,720
Now add tools.

681
00:28:33,720 –> 00:28:38,160
A send email, delete site or post-message tool call is not a chat reply.

682
00:28:38,160 –> 00:28:39,160
It’s a side effect.

683
00:28:39,160 –> 00:28:41,480
Side effects are where your system becomes expensive.

684
00:28:41,480 –> 00:28:45,560
If you treat an incoming activity as authoritative state you are saying, “If I see this envelope,

685
00:28:45,560 –> 00:28:46,880
I will mutate the world.”

686
00:28:46,880 –> 00:28:49,080
That’s fine for rendering a typing indicator.

687
00:28:49,080 –> 00:28:51,040
It’s insane for deleting a site.

688
00:28:51,040 –> 00:28:54,480
And this is exactly how you get the incident pattern from the opening.

689
00:28:54,480 –> 00:28:59,000
Conditional access issued a token once, the context drifted, the agent executed anyway,

690
00:28:59,000 –> 00:29:02,440
and Purview logged the whole tragedy with perfect fidelity.

691
00:29:02,440 –> 00:29:06,840
The comfortable response is to say, “We’ll dedupe.” Teams try to dedupe with best-effort

692
00:29:06,840 –> 00:29:11,520
in-memory caches, fuzzy comparisons (“if the text matches”), or correlation IDs that exist only

693
00:29:11,520 –> 00:29:13,040
inside a process boundary.

694
00:29:13,040 –> 00:29:16,080
That’s not idempotency, that’s optimism.

695
00:29:16,080 –> 00:29:17,480
Idempotency is a contract.

696
00:29:17,480 –> 00:29:22,240
The same operation ID produces the same result once, no matter how many times it arrives.

697
00:29:22,240 –> 00:29:26,200
And the only way to make that true is to persist the operation ID in an authoritative store

698
00:29:26,200 –> 00:29:28,080
before the side effect happens.

699
00:29:28,080 –> 00:29:29,920
That store is the real boundary.

700
00:29:29,920 –> 00:29:33,080
Not the activity envelope, not the transcript, not the avatar.

701
00:29:33,080 –> 00:29:35,200
This is where most agent architectures quietly rot.

702
00:29:35,200 –> 00:29:37,000
They build a state machine out of envelopes.

703
00:29:37,000 –> 00:29:40,720
TurnContext feels like state because it carries context, but it’s a context object, not

704
00:29:40,720 –> 00:29:41,720
a ledger.

705
00:29:41,720 –> 00:29:43,800
It’s a structured wrapper for a single turn.

706
00:29:43,800 –> 00:29:46,680
Not a durable source of truth for workflow eligibility.

707
00:29:46,680 –> 00:29:49,680
When the process restarts, the truth evaporates.

708
00:29:49,680 –> 00:29:53,520
And the event stream happily replays old messages into a new process that has no memory

709
00:29:53,520 –> 00:29:54,920
of what it already did.

710
00:29:54,920 –> 00:29:56,400
That’s conditional chaos.

711
00:29:56,400 –> 00:29:58,600
And notice what makes it worse, embodiment.

712
00:29:58,600 –> 00:30:00,520
Voice adds latency sensitivity.

713
00:30:00,520 –> 00:30:01,520
Streaming adds retries.

714
00:30:01,520 –> 00:30:04,080
WebRTC reconnect logic adds more events.

715
00:30:04,080 –> 00:30:08,040
The experience plane injects more asynchronous behavior into the system, which increases the

716
00:30:08,040 –> 00:30:11,640
probability of duplicates, reordering, and partial failures.

717
00:30:11,640 –> 00:30:13,160
Exactly where your tool calls live.

718
00:30:13,160 –> 00:30:14,880
So the fix starts with the demotion.

719
00:30:14,880 –> 00:30:17,160
Demote events to proposals and telemetry.

720
00:30:17,160 –> 00:30:20,960
Treat every event as a thing that happened or a request that arrived.

721
00:30:20,960 –> 00:30:23,440
Not a state transition that must execute.

722
00:30:23,440 –> 00:30:25,960
Your authoritative workflow position lives elsewhere.

723
00:30:25,960 –> 00:30:27,440
Eligibility lives elsewhere.

724
00:30:27,440 –> 00:30:28,960
The decision lives elsewhere.

725
00:30:28,960 –> 00:30:30,520
Then you do the boring thing that works.

726
00:30:30,520 –> 00:30:32,280
You interpose a deterministic gate.

727
00:30:32,280 –> 00:30:33,880
The agent does not send commands.

728
00:30:33,880 –> 00:30:36,640
It sends structured requests with an operation ID.

729
00:30:36,640 –> 00:30:42,240
The policy engine evaluates against authoritative state and returns, allow, deny, or transform.

730
00:30:42,240 –> 00:30:45,040
Tools accept decisions, not imperatives.

731
00:30:45,040 –> 00:30:49,800
If an event replays, the same operation ID returns the same decision and the same effect.

732
00:30:49,800 –> 00:30:53,960
If the event arrives out of order, the state store says resource doesn’t exist yet.

733
00:30:53,960 –> 00:30:56,040
And the request gets denied deterministically.

734
00:30:56,040 –> 00:31:00,720
If the event arrives late, it gets the same cached decision, not a fresh guess.

735
00:31:00,720 –> 00:31:03,840
Protocol standardization made the system interoperable.

736
00:31:03,840 –> 00:31:05,680
Deterministic design makes it survivable.
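
To make "events are proposals, not state transitions" concrete, here is a small simulation under simplifying assumptions: a Python set stands in for the authoritative state store, a dict for the persisted decision cache, and the event shape (`operation_id`, `intent`, `resource`) is invented for the example.

```python
def handle(event: dict, state: set, decisions: dict) -> str:
    """Decide on a proposed event using authoritative state, never the envelope alone."""
    op_id = event["operation_id"]
    if op_id in decisions:
        # Replay or late arrival: same cached decision, no fresh guess, no new effect.
        return decisions[op_id]

    resource = event["resource"]
    if event["intent"] == "create" and resource not in state:
        state.add(resource)
        decision = "allow"
    elif event["intent"] == "delete" and resource in state:
        state.discard(resource)
        decision = "allow"
    elif event["intent"] == "delete":
        decision = "deny: resource does not exist yet"
    else:
        decision = "deny: not applicable in current state"

    decisions[op_id] = decision
    return decision


state, decisions = set(), {}
stream = [
    # The delete was reordered ahead of the create it logically follows.
    {"operation_id": "op-2", "intent": "delete", "resource": "site-a"},
    {"operation_id": "op-1", "intent": "create", "resource": "site-a"},
    # The same delete is delivered again later (duplicate or delayed copy).
    {"operation_id": "op-2", "intent": "delete", "resource": "site-a"},
]
results = [handle(event, state, decisions) for event in stream]
```

Note what the last event does not do: even though `site-a` exists by then, the replayed delete gets the decision already recorded for `op-2`. A new deletion would need a new operation ID and a new evaluation.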

737
00:31:05,680 –> 00:31:06,920
Event-driven entropy.

738
00:31:06,920 –> 00:31:09,320
Why retries become incidents without determinism.

739
00:31:09,320 –> 00:31:12,800
This is where enterprise teams accidentally build roulette tables and then act shocked when

740
00:31:12,800 –> 00:31:14,120
the ball lands on red.

741
00:31:14,120 –> 00:31:15,840
They wire an agent to an event stream.

742
00:31:15,840 –> 00:31:19,920
They see clean activities arriving and they treat the arrival of an envelope as permission

743
00:31:19,920 –> 00:31:21,360
to mutate the world.

744
00:31:21,360 –> 00:31:25,160
Create the task, delete the site, send the email, post the message, and because the agent

745
00:31:25,160 –> 00:31:28,200
is just responding, they don’t treat it like a transactional system.

746
00:31:28,200 –> 00:31:29,440
They treat it like UI.

747
00:31:29,440 –> 00:31:30,840
That’s the foundational mistake.

748
00:31:30,840 –> 00:31:33,240
In an event-driven system, delivery is not a guarantee.

749
00:31:33,240 –> 00:31:34,240
It is an attempt.

750
00:31:34,240 –> 00:31:35,240
The platform will retry.

751
00:31:35,240 –> 00:31:36,720
The SDK will reconnect.

752
00:31:36,720 –> 00:31:38,160
WebRTC will renegotiate.

753
00:31:38,160 –> 00:31:39,640
The broker will re-deliver.

754
00:31:39,640 –> 00:31:43,120
When you add a speaking agent, you add more opportunities for those retries because the

755
00:31:43,120 –> 00:31:46,840
experience plane depends on low latency streaming and noisy networks.

756
00:31:46,840 –> 00:31:48,160
The system compensates.

757
00:31:48,160 –> 00:31:49,640
It tries again.

758
00:31:49,640 –> 00:31:50,800
Here’s the part

759
00:31:50,800 –> 00:31:52,480
people refuse to internalize.

760
00:31:52,480 –> 00:31:54,760
At-least-once delivery is not reliability.

761
00:31:54,760 –> 00:31:56,880
It is duplication with good intentions.

762
00:31:56,880 –> 00:32:00,520
If your tool call is not idempotent, at-least-once becomes twice.

763
00:32:00,520 –> 00:32:03,120
Twice is an incident if the action isn’t reversible.

764
00:32:03,120 –> 00:32:05,000
The cleanest example is email.

765
00:32:05,000 –> 00:32:07,880
Everyone thinks email is harmless because it’s not deleting data.

766
00:32:07,880 –> 00:32:11,440
When the agent sends the same message twice because a transient error occurred after the

767
00:32:11,440 –> 00:32:15,600
first send before the system persisted success, the business impact isn’t technical.

768
00:32:15,600 –> 00:32:16,600
It’s human.

769
00:32:16,600 –> 00:32:18,080
People respond to the wrong thread.

770
00:32:18,080 –> 00:32:19,240
Someone escalates.

771
00:32:19,240 –> 00:32:20,480
Someone forwards.

772
00:32:20,480 –> 00:32:24,480
Now you’ve created confusion and possibly disclosure and your logs will insist everything

773
00:32:24,480 –> 00:32:26,840
was fine because both sends were legitimate.

774
00:32:26,840 –> 00:32:31,040
Now, upgrade the action to SharePoint Deletion or Permissions Changes and the same pattern

775
00:32:31,040 –> 00:32:32,440
becomes catastrophic.

776
00:32:32,440 –> 00:32:34,360
This is what actually happens in real workloads.

777
00:32:34,360 –> 00:32:35,840
The agent proposes an action.

778
00:32:35,840 –> 00:32:37,320
The orchestrator calls the tool.

779
00:32:37,320 –> 00:32:38,320
The tool executes.

780
00:32:38,320 –> 00:32:41,560
The response is slow or the network flakes or the process restarts.

781
00:32:41,560 –> 00:32:44,160
The orchestrator never receives the success, so it retries.

782
00:32:44,160 –> 00:32:45,640
The tool executes again.

783
00:32:45,640 –> 00:32:47,360
You now have two side effects.

784
00:32:47,360 –> 00:32:50,560
And the only reason you’re surprised is because you treated the first execution as if it

785
00:32:50,560 –> 00:32:52,400
was tied to the event receipt.

786
00:32:52,400 –> 00:32:53,400
It wasn’t.

787
00:32:53,400 –> 00:32:54,400
The action was tied to optimism.

788
00:32:54,400 –> 00:32:56,280
That’s why dedupe doesn’t save you.

789
00:32:56,280 –> 00:32:58,880
Dedupe by best effort is not a safety mechanism.

790
00:32:58,880 –> 00:33:00,400
It is a logging convenience.

791
00:33:00,400 –> 00:33:01,800
People dedupe using message text.

792
00:33:01,800 –> 00:33:04,920
That fails the first time the model rephrases the same intent.

793
00:33:04,920 –> 00:33:09,600
People dedupe using timestamps. That fails when delayed delivery shifts the arrival window.

794
00:33:09,600 –> 00:33:10,680
People dedupe in memory.

795
00:33:10,680 –> 00:33:12,120
That fails on process restart.

796
00:33:12,120 –> 00:33:14,080
People dedupe by correlating turn IDs.

797
00:33:14,080 –> 00:33:17,240
That fails across channels and adapters where IDs are transformed.

798
00:33:17,240 –> 00:33:19,400
The important thing is not probably the same.

799
00:33:19,400 –> 00:33:20,760
It is provably the same.

800
00:33:20,760 –> 00:33:24,480
And provably the same requires two things, a stable operation identity and an authoritative

801
00:33:24,480 –> 00:33:26,400
state store that outlives the process.
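
A minimal Python sketch of why wording-based dedupe is fragile. The function names and the two example phrases are hypothetical, chosen only to illustrate the point: a key derived from message text changes when the model rephrases, while an ID minted once with the request does not.

```python
import hashlib
import uuid


def text_dedupe_key(message: str) -> str:
    """Best-effort key derived from wording (the fragile approach)."""
    return hashlib.sha256(message.strip().lower().encode("utf-8")).hexdigest()


def new_operation_id() -> str:
    """Stable identity: minted once, when intent becomes a request."""
    return str(uuid.uuid4())


# The model phrases the same intent two different ways across a retry.
first_attempt = "Delete the Q3 archive site"
retry_attempt = "Please remove the Q3 archive site"

# Text-derived keys differ, so the retry looks like a brand-new operation.
text_keys_match = text_dedupe_key(first_attempt) == text_dedupe_key(retry_attempt)

# An operation ID travels with the request, whatever the wording.
request = {"operation_id": new_operation_id(), "text": first_attempt}
retried = {**request, "text": retry_attempt}
ids_match = request["operation_id"] == retried["operation_id"]
```

The ID only helps if it is looked up in a store that survives restarts; an in-memory set of IDs has the same weakness as in-memory dedupe.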

802
00:33:26,400 –> 00:33:27,560
This is the system law.

803
00:33:27,560 –> 00:33:30,960
If an event can’t be safely replayed, it shouldn’t control state.

804
00:33:30,960 –> 00:33:32,520
Now apply that law to agents.

805
00:33:32,520 –> 00:33:33,960
The agent’s job is to propose.

806
00:33:33,960 –> 00:33:37,520
The system’s job is to decide, and the decision must be persistent.

807
00:33:37,520 –> 00:33:39,560
So the deterministic design is simple.

808
00:33:39,560 –> 00:33:43,600
Even if the implementation isn’t. Every side-effecting operation gets an immutable operation

809
00:33:43,600 –> 00:33:48,360
ID that is generated once, not per retry, once.

810
00:33:48,360 –> 00:33:51,960
That operation ID is persisted before the tool executes.

811
00:33:51,960 –> 00:33:55,720
The persisted record includes the proposed action and its current status.

812
00:33:55,720 –> 00:33:58,280
Proposed, allowed, denied, executed, failed.

813
00:33:58,280 –> 00:34:00,000
Then every retry becomes boring.

814
00:34:00,000 –> 00:34:03,840
If the same operation ID arrives again, the system returns the already recorded decision

815
00:34:03,840 –> 00:34:04,840
and outcome.

816
00:34:04,840 –> 00:34:05,840
No new side effect.

817
00:34:05,840 –> 00:34:08,960
No second deletion, no second email, no double share.

818
00:34:08,960 –> 00:34:10,480
The replay is safe by design.
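
One way to picture that flow is the sketch below. It is an illustration under stated assumptions, not a reference implementation: `run_once`, the table layout, and the `send_email` stand-in are hypothetical, and the in-memory SQLite database stands in for a durable store that would outlive the process.

```python
import json
import sqlite3
from typing import Callable

# Assumption: ":memory:" is for demonstration only; a real store must be durable.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE operations ("
    " operation_id TEXT PRIMARY KEY,"
    " action TEXT NOT NULL,"
    " status TEXT NOT NULL,"
    " outcome TEXT)"
)


def run_once(operation_id: str, action: dict, tool: Callable[[dict], dict]) -> dict:
    """Execute the tool at most once per operation_id; replays get the record."""
    row = db.execute(
        "SELECT status, outcome FROM operations WHERE operation_id = ?",
        (operation_id,),
    ).fetchone()
    if row is not None:
        status, outcome = row
        recorded = json.loads(outcome) if outcome else None
        return {"status": status, "outcome": recorded, "replayed": True}

    # Persist the proposal BEFORE the side effect happens.
    db.execute(
        "INSERT INTO operations (operation_id, action, status) VALUES (?, ?, 'proposed')",
        (operation_id, json.dumps(action)),
    )
    db.commit()

    outcome = tool(action)
    db.execute(
        "UPDATE operations SET status = 'executed', outcome = ? WHERE operation_id = ?",
        (json.dumps(outcome), operation_id),
    )
    db.commit()
    return {"status": "executed", "outcome": outcome, "replayed": False}


sent = []


def send_email(action: dict) -> dict:
    """Hypothetical tool with a visible side effect."""
    sent.append(action["to"])
    return {"message_id": len(sent)}


first = run_once("op-1", {"intent": "send", "to": "team@example.com"}, send_email)
second = run_once("op-1", {"intent": "send", "to": "team@example.com"}, send_email)
```

In this sketch a record left in `proposed` after a crash is never re-executed blindly; it would need reconciliation against the target system, which is a design choice assumed here rather than something the episode specifies.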

819
00:34:10,480 –> 00:34:12,800
This is also how you neutralize reordering.

820
00:34:12,800 –> 00:34:17,160
If complete task arrives before create task, the state store says the task doesn’t exist

821
00:34:17,160 –> 00:34:19,440
and the request is denied deterministically.

822
00:34:19,440 –> 00:34:20,920
Not handled later, denied.

823
00:34:20,920 –> 00:34:23,400
The event becomes telemetry, not authority.

824
00:34:23,400 –> 00:34:25,320
And this is how you neutralize delay.

825
00:34:25,320 –> 00:34:27,120
Late arrivals don’t trigger fresh decisions.

826
00:34:27,120 –> 00:34:30,640
They map to existing operation IDs and return existing outcomes.

827
00:34:30,640 –> 00:34:33,720
The system does not relitigate intent because the packet arrived late.

828
00:34:33,720 –> 00:34:35,960
Now here’s the uncomfortable part.

829
00:34:35,960 –> 00:34:37,560
None of this is a model problem.

830
00:34:37,560 –> 00:34:39,120
None of this is hallucination.

831
00:34:39,120 –> 00:34:40,840
None of this is Microsoft being sloppy.

832
00:34:40,840 –> 00:34:44,240
This is distributed systems behavior colliding with side effects.

833
00:34:44,240 –> 00:34:47,800
And the more you anthropomorphize the agent, the less likely you are to build these boring

834
00:34:47,800 –> 00:34:51,160
guarantees because you start believing the interaction is the system.

835
00:34:51,160 –> 00:34:52,160
It isn’t.

836
00:34:52,160 –> 00:34:54,800
The system is the state spine behind the conversation.

837
00:34:54,800 –> 00:34:57,120
Without that spine, retries are not resilience.

838
00:34:57,120 –> 00:35:02,160
Retries are how your architecture manufactures incidents out of transient failures.

839
00:35:02,160 –> 00:35:05,800
Deterministic pattern 1: idempotency keys plus an authoritative state spine.

840
00:35:05,800 –> 00:35:09,200
Now the first deterministic pattern is the one everybody claims they already have.

841
00:35:09,200 –> 00:35:11,800
Right up until the first replay deletes the wrong thing.

842
00:35:11,800 –> 00:35:14,280
Idempotency is not “we try not to do it twice.”

843
00:35:14,280 –> 00:35:15,480
Idempotency is a guarantee.

844
00:35:15,480 –> 00:35:19,640
The same operation, identified the same way, produces the same effect exactly once,

845
00:35:19,640 –> 00:35:22,400
no matter how many times the system replays the request.

846
00:35:22,400 –> 00:35:26,920
And the only way to get that guarantee is to stop pretending the event stream is your state.

847
00:35:26,920 –> 00:35:29,000
Here’s the model that holds under pressure.

848
00:35:29,000 –> 00:35:33,080
Every side-effecting action gets an operation ID that is generated once, at the moment the

849
00:35:33,080 –> 00:35:34,680
intent becomes a request.

850
00:35:34,680 –> 00:35:37,200
Not after the tool call, not after the model replies.

851
00:35:37,200 –> 00:35:39,480
Before. That operation ID is immutable.

852
00:35:39,480 –> 00:35:43,240
Collision-resistant, boring. It doesn’t encode meaning, it encodes identity.

853
00:35:43,240 –> 00:35:47,360
Then you persist it to an authoritative state spine before you execute anything.

854
00:35:47,360 –> 00:35:51,520
That spine is not your turn state, it is not an in-memory cache, it is not something we’ll reconstruct

855
00:35:51,520 –> 00:35:52,720
from logs.

856
00:35:52,720 –> 00:35:57,760
It is a durable store that outlives the process and survives retries, restarts and parallel

857
00:35:57,760 –> 00:35:58,760
workers.

858
00:35:58,760 –> 00:36:03,480
Then you store the minimum fields that make replay safe: operation ID, proposed action

859
00:36:03,480 –> 00:36:09,160
(structured, not prose), current status (proposed, decided, executed),

860
00:36:09,160 –> 00:36:14,160
decision artifact pointer (allowed, denied, transformed), target resource identifiers,

861
00:36:14,160 –> 00:36:16,760
and a timestamp plus version for concurrency control.
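
Those fields can be sketched as a record type with a version check on write. This is a hedged illustration: `OperationRecord`, `save`, and `VersionConflict` are hypothetical names, and the dictionary stands in for a durable store. The version field is what lets two parallel workers discover that one of them is stale.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class OperationRecord:
    operation_id: str
    proposed_action: dict          # structured, not prose
    status: str                    # "proposed" | "decided" | "executed"
    decision_ref: Optional[str]    # pointer to the decision artifact
    target_resources: tuple
    updated_at: datetime
    version: int                   # bumped on every successful write


class VersionConflict(Exception):
    """Raised when another writer got there first."""


store: dict = {}  # assumption: in-memory stand-in for a durable store


def save(record: OperationRecord, expected_version: int) -> OperationRecord:
    """Write only if nobody else has written since we read (compare-and-set)."""
    current = store.get(record.operation_id)
    current_version = current.version if current else 0
    if current_version != expected_version:
        raise VersionConflict(record.operation_id)
    stored = replace(
        record,
        version=expected_version + 1,
        updated_at=datetime.now(timezone.utc),
    )
    store[record.operation_id] = stored
    return stored


rec = OperationRecord(
    operation_id="op-42",
    proposed_action={"intent": "delete", "site_id": "site-123"},
    status="proposed",
    decision_ref=None,
    target_resources=("site-123",),
    updated_at=datetime.now(timezone.utc),
    version=0,
)
v1 = save(rec, expected_version=0)

# A second worker that read version 0 is now stale and must not overwrite.
try:
    save(replace(rec, status="decided"), expected_version=0)
    conflict = False
except VersionConflict:
    conflict = True
```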

862
00:36:16,760 –> 00:36:20,760
Once you have that, retries stop being dangerous because retries stop being meaningful.

863
00:36:20,760 –> 00:36:25,040
A duplicate event arrives, you look up the operation id, you already have a status of

864
00:36:25,040 –> 00:36:28,640
executed, you return the previous outcome. No second side effect, no best-effort

865
00:36:28,640 –> 00:36:29,640
dedupe.

866
00:36:29,640 –> 00:36:33,800
It is deterministic because the state spine is authoritative. Reordering becomes boring

867
00:36:33,800 –> 00:36:38,720
for the same reason. An event arrives that says complete task before create task. In

868
00:36:38,720 –> 00:36:43,000
an envelope-driven system, that creates a time machine. In a spine-driven system, you check

869
00:36:43,000 –> 00:36:44,000
state.

870
00:36:44,000 –> 00:36:45,800
Task doesn’t exist.

871
00:36:45,800 –> 00:36:50,720
You deny deterministically, not because the agent is smart, but because the state is authoritative.

872
00:36:50,720 –> 00:36:52,240
Delays become boring as well.

873
00:36:52,240 –> 00:36:56,520
If the event arrives late, it still references the same operation id, you return the same decision.

874
00:36:56,520 –> 00:37:00,240
The system doesn’t reinterpret intent because a packet took a scenic route through someone’s

875
00:37:00,240 –> 00:37:01,240
VPN hairpin.

876
00:37:01,240 –> 00:37:04,440
Now the thing most people miss is the separation of concerns.

877
00:37:04,440 –> 00:37:07,600
Idempotency prevents double harm. It does not decide what is allowed.

878
00:37:07,600 –> 00:37:13,520
That’s why the operation id must exist before policy evaluation and before tool execution.

879
00:37:13,520 –> 00:37:17,880
It becomes the anchor for everything else: decision, execution, audit, provenance.

880
00:37:17,880 –> 00:37:22,480
Without that anchor, you can’t tie what we decided to what we did in a way that survives

881
00:37:22,480 –> 00:37:23,480
failure.

882
00:37:23,480 –> 00:37:25,880
And you need one more piece to make the spine real.

883
00:37:25,880 –> 00:37:30,440
A workflow state machine that is defined outside the agent. The agent can propose. The system

884
00:37:30,440 –> 00:37:34,920
must track progression. Proposed, decided, executed is not optional ceremony.

885
00:37:34,920 –> 00:37:38,360
It is how you prevent event noise from becoming irreversible action.

886
00:37:38,360 –> 00:37:42,920
When a worker crashes after executing, but before replying, the state already says executed.

887
00:37:42,920 –> 00:37:44,720
The next worker doesn’t try again.

888
00:37:44,720 –> 00:37:46,160
It returns the recorded effect.
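
A small sketch of that state machine, assuming the three statuses named above and a hypothetical `advance` helper. The transition table lives outside the agent, so an out-of-order or repeated event simply fails to move the state.

```python
from typing import Optional

# Allowed progressions; anything not listed is refused.
ALLOWED_TRANSITIONS = {
    None: {"proposed"},
    "proposed": {"decided"},
    "decided": {"executed"},
    "executed": set(),
}

state: dict = {}  # assumption: stand-in for the durable state spine


def advance(operation_id: str, new_status: str) -> bool:
    """Move an operation forward only along an allowed edge."""
    current: Optional[str] = state.get(operation_id)
    if new_status not in ALLOWED_TRANSITIONS[current]:
        return False
    state[operation_id] = new_status
    return True


# An "executed" event that arrives before any proposal is refused.
early_execute = advance("op-7", "executed")

# The normal path moves one step at a time.
steps = [advance("op-7", s) for s in ("proposed", "decided", "executed")]

# A replayed "executed" event after completion changes nothing.
replayed_execute = advance("op-7", "executed")
```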

889
00:37:46,160 –> 00:37:48,480
This is also why you don’t store only success.

890
00:37:48,480 –> 00:37:50,440
You store denials and transforms too.

891
00:37:50,440 –> 00:37:54,480
Because the absence of an action is still a decision you need to replay safely.

892
00:37:54,480 –> 00:38:00,240
If the policy denied the delete at 0905 and the same request replays at 0906, you must deny

893
00:38:00,240 –> 00:38:04,320
again for the same operation id, otherwise you’ve built a system that can be bypassed by

894
00:38:04,320 –> 00:38:05,840
retry storms.

895
00:38:05,840 –> 00:38:08,440
The practical consequence is that you stop debugging ghosts.

896
00:38:08,440 –> 00:38:12,600
Your incident review stops being how did this run twice and becomes why did we ever allow

897
00:38:12,600 –> 00:38:15,240
this operation id to execute once.

898
00:38:15,240 –> 00:38:16,240
That’s progress.

899
00:38:16,240 –> 00:38:17,560
That is where accountability lives.

900
00:38:17,560 –> 00:38:19,280
And yes, this costs engineering effort.

901
00:38:19,280 –> 00:38:23,400
So does every week you spend reconstructing an incident from transcripts and half correlated

902
00:38:23,400 –> 00:38:28,440
log lines. Idempotency keys plus an authoritative state spine are not an optimization.

903
00:38:28,440 –> 00:38:33,080
They are the price of admission for letting probabilistic agents touch deterministic systems.

904
00:38:33,080 –> 00:38:35,320
Deterministic pattern 2: per-tool-call policy gate.

905
00:38:35,320 –> 00:38:37,160
Idempotency gives you safe replay.

906
00:38:37,160 –> 00:38:38,160
Good.

907
00:38:38,160 –> 00:38:40,000
It stops duplicates from turning into double damage.

908
00:38:40,000 –> 00:38:43,960
But idempotency doesn’t answer the question that actually decides whether you have an incident.

909
00:38:43,960 –> 00:38:45,480
Should this action be allowed at all?

910
00:38:45,480 –> 00:38:47,920
That’s where most enterprises fall back into religion.

911
00:38:47,920 –> 00:38:49,200
The agent knows.

912
00:38:49,200 –> 00:38:50,520
The prompt told it.

913
00:38:50,520 –> 00:38:51,360
We trained it.

914
00:38:51,360 –> 00:38:52,520
It has citations.

915
00:38:52,520 –> 00:38:54,120
None of that is an enforcement model.

916
00:38:54,120 –> 00:38:55,120
It’s a hope model.

917
00:38:55,120 –> 00:39:00,080
A per-tool-call policy gate is the mechanism that converts hope into a deterministic decision.

918
00:39:00,080 –> 00:39:02,200
And the key change is conceptual, not technical.

919
00:39:02,200 –> 00:39:03,680
The agent stops issuing commands.

920
00:39:03,680 –> 00:39:05,600
It starts submitting requests.

921
00:39:05,600 –> 00:39:11,920
The moment you let the model speak in imperatives (delete site X, share file Y, email this to Z),

922
00:39:11,920 –> 00:39:13,920
you’ve made the LLM the control plane.

923
00:39:13,920 –> 00:39:16,640
You’ve delegated authority to a probabilistic system.

924
00:39:16,640 –> 00:39:18,040
That is not agentic.

925
00:39:18,040 –> 00:39:19,040
That is abdication.

926
00:39:19,040 –> 00:39:20,960
A policy gate flips the relationship.

927
00:39:20,960 –> 00:39:23,840
The agent proposes. The control plane disposes.

928
00:39:23,840 –> 00:39:25,480
So what does the gate actually evaluate?

929
00:39:25,480 –> 00:39:30,760
Not prose, not vibe, not “it sounded reasonable.” A structured request.

930
00:39:30,760 –> 00:39:36,360
At minimum, every tool-reaching request carries a tuple: actor, intent, scope, data class,

931
00:39:36,360 –> 00:39:38,400
venue and operation id.

932
00:39:38,400 –> 00:39:43,320
Actor is the identity that would execute the call: human, workload identity or segmented

933
00:39:43,320 –> 00:39:44,800
agent principal.

934
00:39:44,800 –> 00:39:51,280
Intent is the verb: delete, share, send, post, create, approve. Small, enumerable, boring.

935
00:39:51,280 –> 00:39:56,760
Scope is the concrete target: site ID, file ID, mailbox, distribution list, external domain,

936
00:39:56,760 –> 00:39:58,080
API endpoint.

937
00:39:58,080 –> 00:40:02,200
Data class is the sensitivity of the thing being touched or disclosed, derived from authoritative

938
00:40:02,200 –> 00:40:04,840
classification, not guessed by the model.

939
00:40:04,840 –> 00:40:08,920
Venue is where the effect will manifest: internal tenant, external email, Teams meeting

940
00:40:08,920 –> 00:40:13,640
with externals, public web chat. Operation ID anchors replay and traceability, as we already

941
00:40:13,640 –> 00:40:14,640
covered.
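
The tuple described above could be modeled roughly as follows. This is a sketch: `ToolRequest`, `Intent`, and `parse_request` are hypothetical names, and the verb list mirrors the examples given in the episode. The useful property is that a request with a missing field or an unknown verb never reaches the policy engine.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    """Small, enumerable set of verbs; anything else is rejected."""
    DELETE = "delete"
    SHARE = "share"
    SEND = "send"
    POST = "post"
    CREATE = "create"
    APPROVE = "approve"


@dataclass(frozen=True)
class ToolRequest:
    actor: str          # identity that would execute the call
    intent: Intent      # the verb
    scope: str          # concrete target, e.g. a site or file ID
    data_class: str     # from authoritative classification, not the model
    venue: str          # where the effect will manifest
    operation_id: str   # anchors replay and traceability


def parse_request(raw: dict) -> ToolRequest:
    """Build a request; raises KeyError or ValueError on malformed input."""
    return ToolRequest(
        actor=raw["actor"],
        intent=Intent(raw["intent"]),  # ValueError for verbs outside the enum
        scope=raw["scope"],
        data_class=raw["data_class"],
        venue=raw["venue"],
        operation_id=raw["operation_id"],
    )


valid = parse_request({
    "actor": "write-principal",
    "intent": "delete",
    "scope": "site-123",
    "data_class": "internal",
    "venue": "internal_tenant",
    "operation_id": "op-42",
})
```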

942
00:40:14,640 –> 00:40:18,360
Now the policy engine evaluates that tuple against rules and authoritative state.

943
00:40:18,360 –> 00:40:20,240
It returns one of three outcomes.

944
00:40:20,240 –> 00:40:24,240
Allow, deny, transform.

945
00:40:24,240 –> 00:40:27,680
Allow means it can proceed, but not as a blank check.

946
00:40:27,680 –> 00:40:32,920
It can attach constraints: time window, max recipients, required approval, rate limits,

947
00:40:32,920 –> 00:40:34,440
a narrow target set.

948
00:40:34,440 –> 00:40:36,280
Deny means the tool never executes.

949
00:40:36,280 –> 00:40:40,280
The refusal is not a moral stance, it’s a deterministic result: intent X and venue

950
00:40:40,280 –> 00:40:46,320
Y with data class Z violates rule R. Transform is the underused one that keeps systems usable.

951
00:40:46,320 –> 00:40:49,600
It means the action is allowed only in a safer form.

952
00:40:49,600 –> 00:40:53,800
Replace share externally with share internally and create an approval task.

953
00:40:53,800 –> 00:40:58,600
Replace speak compensation cohort data with speak a high-level summary template.

954
00:40:58,600 –> 00:41:00,720
Replace delete with move to quarantine.
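
The three outcomes can be sketched as a small rule function. The rule names (R0 to R4), the field values, and the `max_recipients` limit are invented for illustration; a real policy engine would load rules from governed configuration rather than hard-code them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Decision:
    outcome: str                 # "allow" | "deny" | "transform"
    rule: str                    # which rule produced the outcome
    action: dict                 # the action the tool may execute
    constraints: dict = field(default_factory=dict)


def evaluate(request: dict) -> Decision:
    """Hypothetical rules mirroring the examples in the episode."""
    intent = request["intent"]
    venue = request["venue"]
    data_class = request["data_class"]

    if intent == "delete":
        safer = {**request, "intent": "move_to_quarantine"}
        return Decision("transform", "R1: delete becomes quarantine", safer)
    if intent == "share" and venue == "external":
        if data_class == "confidential":
            return Decision("deny", "R2: no confidential external share", request)
        safer = {**request, "venue": "internal", "follow_up": "create_approval_task"}
        return Decision("transform", "R3: share internally plus approval", safer)
    if intent == "send" and venue == "internal":
        return Decision("allow", "R4: internal send", request, {"max_recipients": 25})
    return Decision("deny", "R0: default deny", request)


delete_decision = evaluate(
    {"intent": "delete", "venue": "internal", "data_class": "internal", "scope": "site-123"}
)
share_decision = evaluate(
    {"intent": "share", "venue": "external", "data_class": "confidential", "scope": "file-9"}
)
send_decision = evaluate(
    {"intent": "send", "venue": "internal", "data_class": "internal", "scope": "mailbox-1"}
)
```

Default deny at the end is the assumption doing the most work here: a verb or venue nobody wrote a rule for is refused, and the refusal carries a rule name that can be logged.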

955
00:41:00,720 –> 00:41:02,960
This is how you avoid the false choice

956
00:41:02,960 –> 00:41:05,840
between “agents are useless” and “agents are dangerous.”

957
00:41:05,840 –> 00:41:09,760
The gate is also where you encode negative space, not just what you did, what you refused

958
00:41:09,760 –> 00:41:13,720
to do, what you refused to retrieve, what you refused to disclose.

959
00:41:13,720 –> 00:41:17,440
Because governance without refusal telemetry becomes performative, it only shows motion.

960
00:41:17,440 –> 00:41:21,080
Now here’s the part that separates a real gate from a prompt based imitation.

961
00:41:21,080 –> 00:41:23,480
Tools must accept decisions, not requests.

962
00:41:23,480 –> 00:41:28,080
If your tool endpoint will execute any authenticated call that contains delete, true, you don’t have

963
00:41:28,080 –> 00:41:31,360
a gate, you have a suggestion layer in front of a loaded weapon.

964
00:41:31,360 –> 00:41:35,480
The tool should accept only a signed decision artifact from the policy engine bound to

965
00:41:35,480 –> 00:41:40,200
the operation ID with a short TTL. If the decision doesn’t match, the tool denies.

966
00:41:40,200 –> 00:41:43,840
If the operation ID was already executed, the tool returns the prior outcome.

967
00:41:43,840 –> 00:41:48,440
That binds execution to policy and makes bypassing the gate materially harder.
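
A minimal sketch of a signed decision artifact using an HMAC from the Python standard library. Everything specific here is an assumption: the artifact layout, the `sign_decision` and `verify_decision` names, and the literal demo key, which in practice would be a managed secret or an asymmetric signature rather than a constant in code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-only-key"  # assumption: never a literal in a real system


def sign_decision(operation_id: str, outcome: str, action: dict,
                  ttl_seconds: int = 60, now: Optional[float] = None) -> str:
    """Policy engine side: bind outcome and action to one operation ID."""
    issued = time.time() if now is None else now
    claims = {
        "operation_id": operation_id,
        "outcome": outcome,
        "action": action,
        "expires_at": issued + ttl_seconds,
    }
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.b64encode(payload).decode("ascii") + "." + signature


def verify_decision(artifact: str, operation_id: str,
                    now: Optional[float] = None) -> Optional[dict]:
    """Tool side: return the claims only if the artifact is valid, else None."""
    current = time.time() if now is None else now
    try:
        encoded, signature = artifact.split(".")
        payload = base64.b64decode(encoded, validate=True)
    except ValueError:
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8")):
        return None
    claims = json.loads(payload)
    if claims["operation_id"] != operation_id:
        return None  # decision was issued for a different operation
    if claims["outcome"] not in ("allow", "transform"):
        return None  # a denial is never executable
    if current > claims["expires_at"]:
        return None  # short TTL: stale decisions are refused
    return claims


artifact = sign_decision(
    "op-9", "allow", {"intent": "move_to_quarantine", "site_id": "site-123"},
    ttl_seconds=60, now=1000.0,
)
accepted = verify_decision(artifact, "op-9", now=1030.0)
wrong_operation = verify_decision(artifact, "op-10", now=1030.0)
expired = verify_decision(artifact, "op-9", now=1061.0)
forged = verify_decision(artifact.split(".")[0] + "." + "0" * 64, "op-9", now=1030.0)
```

The tool executes only `accepted["action"]`, never a caller-supplied payload, which is what keeps a bare authenticated call from being enough.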

968
00:41:48,440 –> 00:41:49,920
And yes, this sounds like overhead.

969
00:41:49,920 –> 00:41:50,920
It is.

970
00:41:50,920 –> 00:41:53,640
It’s also the only place where intent can be enforced at action time.

971
00:41:53,640 –> 00:41:54,880
Conditional access can’t do this.

972
00:41:54,880 –> 00:41:56,200
It doesn’t see the tool call.

973
00:41:56,200 –> 00:41:57,760
It sees token issuance context.

974
00:41:57,760 –> 00:41:58,760
Purview can’t do this.

975
00:41:58,760 –> 00:42:00,360
It sees the aftermath.

976
00:42:00,360 –> 00:42:01,360
Citations can’t do this.

977
00:42:01,360 –> 00:42:03,160
They explain retrieval, not permission.

978
00:42:03,160 –> 00:42:05,160
Only a gate can stop the train while it’s moving.

979
00:42:05,160 –> 00:42:09,680
If you want a concrete mental picture, treat the policy engine like an authorization compiler.

980
00:42:09,680 –> 00:42:11,360
The agent submits a high level request.

981
00:42:11,360 –> 00:42:13,280
The compiler checks it against rules and state.

982
00:42:13,280 –> 00:42:15,000
It emits a decision artifact.

983
00:42:15,000 –> 00:42:16,760
The runtime can execute.

984
00:42:16,760 –> 00:42:18,760
Without that artifact execution is invalid.

985
00:42:18,760 –> 00:42:21,600
That’s determinism grafted onto probabilistic reasoning.

986
00:42:21,600 –> 00:42:24,800
And once you have it, your incident reviews change shape.

987
00:42:24,800 –> 00:42:28,360
You stop asking why did it do that as if the agent had agency?

988
00:42:28,360 –> 00:42:30,480
You ask which rule allowed this?

989
00:42:30,480 –> 00:42:31,960
And who changed it?

990
00:42:31,960 –> 00:42:33,440
That’s accountability.

991
00:42:33,440 –> 00:42:38,040
Deterministic pattern 3: segmented agent identities as failure domains.

992
00:42:38,040 –> 00:42:41,920
Once you put a real policy gate in front of tools, you’ve solved the “should this be

993
00:42:41,920 –> 00:42:46,800
allowed” problem at action time, but you still haven’t solved the bigger failure domain problem.

994
00:42:46,800 –> 00:42:50,560
Because if one identity can do everything, your gate becomes your only brake and brakes

995
00:42:50,560 –> 00:42:51,560
fail.

996
00:42:51,560 –> 00:42:52,560
Rules drift.

997
00:42:52,560 –> 00:42:53,880
Someone adds an exception.

998
00:42:53,880 –> 00:42:56,120
An urgent request becomes permanent.

999
00:42:56,120 –> 00:42:58,760
Entropy always wins unless you give it walls to hit.

1000
00:42:58,760 –> 00:43:00,360
Segmented agent identities are those walls.

1001
00:43:00,360 –> 00:43:01,960
One agent is not one identity.

1002
00:43:01,960 –> 00:43:03,520
One agent is an orchestrator.

1003
00:43:03,520 –> 00:43:07,600
It should coordinate multiple principals, each with a narrow capability and a narrow blast

1004
00:43:07,600 –> 00:43:08,600
radius.

1005
00:43:08,600 –> 00:43:10,120
Read, write and egress.

1006
00:43:10,120 –> 00:43:13,840
That distinction matters because the dominant failure mode in enterprise agents is not the

1007
00:43:13,840 –> 00:43:15,680
agent got global admin.

1008
00:43:15,680 –> 00:43:17,680
Microsoft has already constrained a lot of that.

1009
00:43:17,680 –> 00:43:19,440
The dominant failure is the boring one.

1010
00:43:19,440 –> 00:43:24,440
A convenience driven, overscoped identity executing at machine speed in the wrong place.

1011
00:43:24,440 –> 00:43:25,880
Least privilege isn’t a value statement.

1012
00:43:25,880 –> 00:43:26,880
It’s math.

1013
00:43:26,880 –> 00:43:30,880
Scopes multiplied by ambiguity multiplied by speed equals blast radius.

1014
00:43:30,880 –> 00:43:35,360
If you let the same identity retrieve broadly, write broadly and communicate externally, you’ve

1015
00:43:35,360 –> 00:43:37,240
created a super user with a polite interface.

1016
00:43:37,240 –> 00:43:39,400
It doesn’t matter how good your prompt is.

1017
00:43:39,400 –> 00:43:40,680
The capability exists.

1018
00:43:40,680 –> 00:43:42,680
The model will eventually route intent into it.

1019
00:43:42,680 –> 00:43:44,760
So you split capability at the identity boundary.

1020
00:43:44,760 –> 00:43:46,240
The read identity can only read.

1021
00:43:46,240 –> 00:43:50,000
It can query SharePoint metadata, retrieve files and summarize content.

1022
00:43:50,000 –> 00:43:51,000
It cannot delete.

1023
00:43:51,000 –> 00:43:52,000
It cannot share.

1024
00:43:52,000 –> 00:43:53,000
It cannot send.

1025
00:43:53,000 –> 00:43:54,960
It cannot write anywhere that matters.

1026
00:43:54,960 –> 00:43:56,600
Its job is to propose not to act.

1027
00:43:56,600 –> 00:43:58,040
Then you create a write identity.

1028
00:43:58,040 –> 00:43:59,440
This one is intentionally painful.

1029
00:43:59,440 –> 00:44:03,320
It holds only the minimum permissions needed for irreversible actions.

1030
00:44:03,320 –> 00:44:07,880
And those permissions are resource-scoped, short-lived and ideally minted just in time.

1031
00:44:07,880 –> 00:44:11,240
If you can’t make them short-lived, then you rotate aggressively and monitor like you mean

1032
00:44:11,240 –> 00:44:12,240
it.

1033
00:44:12,240 –> 00:44:13,720
This identity never retrieves broadly.

1034
00:44:13,720 –> 00:44:14,720
It doesn’t need to.

1035
00:44:14,720 –> 00:44:18,640
It executes against explicit targets that have already passed policy evaluation.

1036
00:44:18,640 –> 00:44:20,320
And then you create an egress identity.

1037
00:44:20,320 –> 00:44:24,560
This one can talk outside the tenant or publish to public surfaces or send email to external

1038
00:44:24,560 –> 00:44:25,560
domains.

1039
00:44:25,560 –> 00:44:27,720
It has zero access to internal corporate data planes.

1040
00:44:27,720 –> 00:44:28,720
None.

1041
00:44:28,720 –> 00:44:32,600
If it can see internal runbooks and also post externally, you’ve already lost.

1042
00:44:32,600 –> 00:44:33,680
Egress is not a feature.

1043
00:44:33,680 –> 00:44:34,680
It’s a failure domain.

1044
00:44:34,680 –> 00:44:37,560
Now the obvious objection is that’s three times the complexity.

1045
00:44:37,560 –> 00:44:41,840
No, it’s three times the clarity because now every action has a lane and lanes don’t cross

1046
00:44:41,840 –> 00:44:43,000
without a broker.

1047
00:44:43,000 –> 00:44:44,440
The orchestrator can request.

1048
00:44:44,440 –> 00:44:47,960
The policy gate can decide. The correct identity can execute.

1049
00:44:47,960 –> 00:44:52,280
If the read identity gets compromised, the attacker gets visibility, not destruction.

1050
00:44:52,280 –> 00:44:56,040
If the write identity gets compromised, the attacker gets destruction, but only within

1051
00:44:56,040 –> 00:45:00,640
a narrowly scoped domain and ideally only for a short time window.

1052
00:45:00,640 –> 00:45:04,720
If the egress identity gets compromised, the attacker can speak, but they can’t see your

1053
00:45:04,720 –> 00:45:06,040
internal knowledge base.

1054
00:45:06,040 –> 00:45:09,920
This is how you build containment into the system rather than writing incident reviews

1055
00:45:09,920 –> 00:45:11,680
about “we’ll be more careful.”

1056
00:45:11,680 –> 00:45:14,400
And it pairs cleanly with the previous two patterns.

1057
00:45:14,400 –> 00:45:16,040
Idempotency gives you safe replay.

1058
00:45:16,040 –> 00:45:19,200
The policy gate gives you action time authorization.

1059
00:45:19,200 –> 00:45:22,360
Segmented identities give you blast radius containment when the gate is wrong.

1060
00:45:22,360 –> 00:45:25,200
Now here’s the part most teams miss.

1061
00:45:25,200 –> 00:45:28,160
Separation must be enforced by design, not etiquette.

1062
00:45:28,160 –> 00:45:31,720
Don’t let the agent choose which identity to use based on a prompt instruction.

1063
00:45:31,720 –> 00:45:32,720
That’s still hope.

1064
00:45:32,720 –> 00:45:37,160
Identity selection should be a deterministic mapping from intent and venue to a principal

1065
00:45:37,160 –> 00:45:38,920
enforced by the control plane.

1066
00:45:38,920 –> 00:45:41,320
Delete intent routes to the write principal.

1067
00:45:41,320 –> 00:45:44,320
External publication routes to the egress principal.

1068
00:45:44,320 –> 00:45:46,080
Retrieval routes to the read principal.

1069
00:45:46,080 –> 00:45:50,400
The agent can’t override that because it never directly holds the credentials for the

1070
00:45:50,400 –> 00:45:51,640
other lanes.
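
That mapping can be as plain as a lookup table owned by the control plane. The sketch below is illustrative: the route table, the principal names, and `select_principal` are hypothetical, and the credential handling that would sit behind each principal is left out.

```python
# Deterministic routing table: (intent, venue) -> principal.
# Assumption: the control plane owns this table; the agent never sees it.
ROUTES = {
    ("retrieve", "internal"): "read-principal",
    ("delete", "internal"): "write-principal",
    ("create", "internal"): "write-principal",
    ("publish", "external"): "egress-principal",
    ("send", "external"): "egress-principal",
}


def select_principal(intent: str, venue: str) -> str:
    """Return the only principal allowed for this lane, or refuse."""
    try:
        return ROUTES[(intent, venue)]
    except KeyError:
        raise PermissionError(f"no lane for intent={intent!r} venue={venue!r}")


delete_lane = select_principal("delete", "internal")
publish_lane = select_principal("publish", "external")
read_lane = select_principal("retrieve", "internal")
```

A combination with no entry, such as a delete aimed at an external venue, raises instead of falling back to a default identity, so there is no lane a prompt can talk its way into.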

1071
00:45:51,640 –> 00:45:54,160
This is also how you survive shadow agent sprawl.

1072
00:45:54,160 –> 00:45:58,120
When someone spins up a quick external bot, the external lane simply cannot authenticate

1073
00:45:58,120 –> 00:46:00,200
to internal core data planes.

1074
00:46:00,200 –> 00:46:02,280
Even if they try, even if they copy code.

1075
00:46:02,280 –> 00:46:06,240
Even if they add a connector, the design makes the bad path impossible without an explicit

1076
00:46:06,240 –> 00:46:07,400
governance decision.

1077
00:46:07,400 –> 00:46:11,280
So if you remember one sentence from this pattern, make it this.

1078
00:46:11,280 –> 00:46:13,880
Agents should fail small, not fail loud.

1079
00:46:13,880 –> 00:46:17,240
A single identity design makes failure loud by default.

1080
00:46:17,240 –> 00:46:20,640
Segmented identities make failure bounded by default.

1081
00:46:20,640 –> 00:46:24,720
That’s the difference between a contained incident and a tenant-wide outage delivered by

1082
00:46:24,720 –> 00:46:26,120
a calm voice.

1083
00:46:26,120 –> 00:46:31,040
RAG as a security boundary: retrieval filters plus negative space plus output classification.

1084
00:46:31,040 –> 00:46:34,960
Now tie the whole thing back to the part everybody treats as just search.

1085
00:46:34,960 –> 00:46:37,120
Retrieval.

1086
00:46:37,120 –> 00:46:39,360
Most teams implement RAG like a convenience feature.

1087
00:46:39,360 –> 00:46:43,440
Embed documents, vector search, pull the top five chunks, stuff them into the prompt and

1088
00:46:43,440 –> 00:46:45,080
call it grounded.

1089
00:46:45,080 –> 00:46:46,080
That is not a boundary.

1090
00:46:46,080 –> 00:46:49,120
That is a suggestion engine feeding a probabilistic model.

1091
00:46:49,120 –> 00:46:51,520
In a real enterprise, retrieval is an authorization event.

1092
00:46:51,520 –> 00:46:55,240
It is the moment your system decides what information is allowed to exist for this actor

1093
00:46:55,240 –> 00:46:56,960
in this venue right now.

1094
00:46:56,960 –> 00:47:00,680
If you don’t treat it that way, the nearest neighbor algorithm will outrun your governance

1095
00:47:00,680 –> 00:47:01,680
model every time.

1096
00:47:01,680 –> 00:47:03,560
So the boundary starts before similarity.

1097
00:47:03,560 –> 00:47:05,800
Alligibility comes first.

1098
00:47:05,800 –> 00:47:10,200
Before you run a vector search, you filter the candidate set using hard predicates,

1099
00:47:10,200 –> 00:47:13,800
principle, access scope, confidentiality and venue.

1100
00:47:13,800 –> 00:47:17,720
Principle means the workload identity or user context that is actually operating.

1101
00:47:17,720 –> 00:47:21,480
This scope means what corpus this identity is allowed to see based on an authoritative

1102
00:47:21,480 –> 00:47:25,200
catalog, not on whatever connector happens to be configured.

1103
00:47:25,200 –> 00:47:28,520
Confidentiality means the classification level of the content.

1104
00:47:28,520 –> 00:47:31,320
Venue means where the answer will be consumed.

1105
00:47:31,320 –> 00:47:35,480
Internal chat, mixed-audience meeting, external web, email, public site.

1106
00:47:35,480 –> 00:47:38,720
If a chunk is not eligible under those predicates, it does not exist.

1107
00:47:38,720 –> 00:47:40,040
Not “it won’t be used.”

1108
00:47:40,040 –> 00:47:41,480
It doesn’t exist.

1109
00:47:41,480 –> 00:47:43,040
This is the uncomfortable truth.

1110
00:47:43,040 –> 00:47:44,760
Similarity search is not a permissions model.

1111
00:47:44,760 –> 00:47:48,600
It is math and math will happily return the best match from an ineligible corpus unless

1112
00:47:48,600 –> 00:47:49,800
you fence it.
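
A compact sketch of eligibility before similarity, in plain Python so it runs without a vector database. The `Chunk` fields, the confidentiality levels, and the per-venue ceilings are assumptions made for the example; the allowed corpora are passed in as if they came from an authoritative catalog.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    corpus: str
    confidentiality: str   # "public" | "internal" | "restricted"
    embedding: tuple


LEVELS = {"public": 0, "internal": 1, "restricted": 2}

# Hypothetical ceiling: the most sensitive level each venue may receive.
VENUE_CEILING = {
    "internal_chat": "restricted",
    "mixed_audience_meeting": "public",
    "external_web": "public",
}


def eligible(chunk: Chunk, allowed_corpora: set, venue: str) -> bool:
    """Hard predicates: access scope first, then confidentiality vs. venue."""
    within_scope = chunk.corpus in allowed_corpora
    within_ceiling = LEVELS[chunk.confidentiality] <= LEVELS[VENUE_CEILING[venue]]
    return within_scope and within_ceiling


def cosine(a: tuple, b: tuple) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: tuple, chunks: list, allowed_corpora: set, venue: str, k: int = 5) -> list:
    """Filter first; similarity only ever sees the eligible candidates."""
    candidates = [c for c in chunks if eligible(c, allowed_corpora, venue)]
    ranked = sorted(candidates, key=lambda c: cosine(query, c.embedding), reverse=True)
    return ranked[:k]


chunks = [
    Chunk("c1", "hr", "restricted", (1.0, 0.0)),       # best raw match
    Chunk("c2", "handbook", "public", (0.6, 0.8)),
    Chunk("c3", "handbook", "public", (0.0, 1.0)),
]
query = (1.0, 0.0)

external_hits = retrieve(query, chunks, {"handbook"}, "external_web", k=1)
internal_hits = retrieve(query, chunks, {"hr", "handbook"}, "internal_chat", k=1)
```

The nearest neighbor to the query is the restricted HR chunk, yet for the external venue it is never a candidate, which is the sense in which an ineligible chunk does not exist.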

1113
00:47:49,800 –> 00:47:53,200
Then you do the thing that makes the whole system safer without anyone noticing.

1114
00:47:53,200 –> 00:47:55,160
You build negative space.

1115
00:47:55,160 –> 00:47:58,440
Negative space means the system records what it refused to retrieve and what it refused

1116
00:47:58,440 –> 00:47:59,440
to say.

1117
00:47:59,440 –> 00:48:03,600
When the pre-filters exclude chunks, you log that exclusion with a reason.

1118
00:48:03,600 –> 00:48:08,160
Excluded because venue external, excluded because confidentiality internal only, excluded

1119
00:48:08,160 –> 00:48:10,320
because principal lacks access scope.

1120
00:48:10,320 –> 00:48:14,120
When the filtered retrieval returns nothing, that emptiness is not an error.

1121
00:48:14,120 –> 00:48:15,280
It is a safe outcome.

1122
00:48:15,280 –> 00:48:18,440
It is the system refusing to invent or overshare.
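
Negative space can be sketched as a filter that returns its refusals alongside its results. The field names, the reason strings, and the `respond` helper below are hypothetical; the behavior being illustrated is that every exclusion is recorded with a reason and an empty result is returned as a normal outcome.

```python
NO_RESULT = "No eligible content found for this request."


def filter_with_reasons(chunks: list, principal_scopes: set, venue: str):
    """Split chunks into eligible ones and logged exclusions."""
    eligible, exclusions = [], []
    for chunk in chunks:
        if chunk["corpus"] not in principal_scopes:
            reason = "principal lacks access scope"
        elif venue == "external" and chunk["confidentiality"] == "internal_only":
            reason = "confidentiality internal only"
        else:
            eligible.append(chunk)
            continue
        exclusions.append({"chunk_id": chunk["chunk_id"], "reason": reason})
    return eligible, exclusions


def respond(chunks: list, principal_scopes: set, venue: str) -> dict:
    """Empty retrieval is a safe outcome, not an error to paper over."""
    eligible, exclusions = filter_with_reasons(chunks, principal_scopes, venue)
    if not eligible:
        return {"answer": NO_RESULT, "cited": [], "exclusions": exclusions}
    # Generation would happen downstream, restricted to the cited chunk IDs.
    return {"answer": None, "cited": [c["chunk_id"] for c in eligible],
            "exclusions": exclusions}


corpus = [
    {"chunk_id": "c1", "corpus": "hr", "confidentiality": "internal_only"},
    {"chunk_id": "c2", "corpus": "runbooks", "confidentiality": "internal_only"},
]
result = respond(corpus, principal_scopes={"runbooks"}, venue="external")
```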

1123
00:48:18,440 –> 00:48:21,120
Most organizations treat no results as a UX bug.

1124
00:48:21,120 –> 00:48:23,120
They force the model to answer anyway.

1125
00:48:23,120 –> 00:48:25,680
That turns your RAG system into a leak mechanism.

1126
00:48:25,680 –> 00:48:27,680
The safe behavior is cite or stay silent.

1127
00:48:27,680 –> 00:48:30,480
If there is no eligible evidence, the agent says less.

1128
00:48:30,480 –> 00:48:33,080
No eligible content found for this request is a feature.

1129
00:48:33,080 –> 00:48:37,120
It’s the guardrail that stops the model from converting uncertainty into confident

1130
00:48:37,120 –> 00:48:38,120
nonsense.

1131
00:48:38,120 –> 00:48:40,840
Now you enforce the same discipline on generation.

1132
00:48:40,840 –> 00:48:45,240
The model can only assert claims that map to eligible chunk IDs and it must cite them.

1133
00:48:45,240 –> 00:48:46,720
If it can’t cite, it downgrades.

1134
00:48:46,720 –> 00:48:49,200
If it can’t downgrade safely, it refuses.

1135
00:48:49,200 –> 00:48:52,360
This is how you make grounding measurable instead of aspirational.
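
One way to make that measurable, sketched under the assumption that the generation step emits discrete claims with the chunk IDs they cite. `Claim`, `gate_answer`, and the decision strings are hypothetical names. Claims with no citation, or with a citation outside the eligible set, are dropped. If nothing survives, the answer is refused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    cited_chunk_ids: frozenset


NO_EVIDENCE = "No eligible content found for this request."


def gate_answer(claims, eligible_ids):
    """Keep only claims whose citations all resolve to eligible chunk IDs."""
    eligible_ids = set(eligible_ids)
    supported = [
        c for c in claims
        if c.cited_chunk_ids and c.cited_chunk_ids <= eligible_ids
    ]
    dropped = len(claims) - len(supported)
    ratio = len(supported) / len(claims) if claims else 0.0
    if not supported:
        # Nothing can be cited: refuse rather than improvise.
        return {"decision": "refuse", "text": NO_EVIDENCE,
                "dropped": dropped, "grounded_ratio": ratio}
    # Some claims were cut: the answer is downgraded to what can be cited.
    decision = "answer" if dropped == 0 else "downgrade"
    text = " ".join(
        f"{c.text} [{', '.join(sorted(c.cited_chunk_ids))}]" for c in supported
    )
    return {"decision": decision, "text": text,
            "dropped": dropped, "grounded_ratio": ratio}
```

The `grounded_ratio` field is the measurable part: it can be tracked per response instead of asserting that the system "is grounded".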

1136
00:48:52,360 –> 00:48:55,200
But the part that most people miss is output classification.

1137
00:48:55,200 –> 00:48:59,120
Enterprises label inputs and then pretend outputs inherit safety by osmosis.

1138
00:48:59,120 –> 00:49:00,120
They don’t.

1139
00:49:00,120 –> 00:49:01,120
The output is a new artifact.

1140
00:49:01,120 –> 00:49:02,120
It can aggregate.

1141
00:49:02,120 –> 00:49:03,120
It can summarize.

1142
00:49:03,120 –> 00:49:07,040
It can combine two non-sensitive facts into a sensitive conclusion.

1143
00:49:07,040 –> 00:49:09,360
And in voice scenarios, output is publication.

1144
00:49:09,360 –> 00:49:13,680
So you derive an output sensitivity label from the sources used and the aggregation level

1145
00:49:13,680 –> 00:49:14,680
of the answer.

1146
00:49:14,680 –> 00:49:19,720
If the answer pulls from compensation guidance and produces cohort level metrics, the output

1147
00:49:19,720 –> 00:49:24,000
is compensation sensitive even if no single chunk was labeled secret.
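
A sketch of deriving the output label, assuming each source chunk carries a topic and a sensitivity label and that the answer is tagged with an aggregation level. The `Sensitivity` levels, the `AGGREGATION_RULES` table, and the `"cohort"` tag are all illustrative assumptions. The output label is the highest source label, raised further when a topic and aggregation level together cross a rule.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Hypothetical rule table: (source topic, aggregation level) -> minimum output label.
AGGREGATION_RULES = {
    ("compensation", "cohort"): Sensitivity.CONFIDENTIAL,
}


def classify_output(sources, aggregation):
    """sources: list of (topic, Sensitivity) pairs for the chunks the answer used."""
    if not sources:
        return Sensitivity.PUBLIC
    # Start from the most sensitive source that was used.
    label = max(lbl for _, lbl in sources)
    # Aggregation can raise the label even when no single chunk was that sensitive.
    for topic, _ in sources:
        floor = AGGREGATION_RULES.get((topic, aggregation), Sensitivity.PUBLIC)
        label = max(label, floor)
    return label
```

The same sources yield a different label depending on what the answer does with them, which is the sense in which the output is a new artifact.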

1148
00:49:24,000 –> 00:49:27,400
Then you route that output through the same policy gate that controls tool calls, because

1149
00:49:27,400 –> 00:49:32,360
speech is a tool call. Venue plus output classification becomes your egress boundary.

1150
00:49:32,360 –> 00:49:37,960
Mixed audience, external participants? Then the speech path requires a transform or a deny.

1151
00:49:37,960 –> 00:49:39,120
Internal HR channel?

1152
00:49:39,120 –> 00:49:42,440
You might allow text, deny speech, or require a different identity.

1153
00:49:42,440 –> 00:49:46,920
The point is that the system decides at action time, not after the transcript is stored.
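
A small sketch of venue plus output classification as an egress decision, evaluated at the moment the agent is about to speak or post. `Venue`, `decide_egress`, and the label strings are hypothetical, and the specific rules are one possible policy rather than a recommended one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Venue:
    name: str
    has_external_participants: bool


def decide_egress(channel, venue, output_label):
    """Return "allow", "transform", or "deny" for one proposed output.

    channel: "speech" or "text"
    output_label: "public", "internal", or "sensitive"
    """
    if output_label == "public":
        return "allow"
    if venue.has_external_participants:
        # Mixed audience: sensitive output is denied, internal output must be
        # transformed (for example redacted or generalized) before it leaves.
        return "deny" if output_label == "sensitive" else "transform"
    if channel == "speech" and output_label == "sensitive":
        # Internal venue: text may pass, but speaking it aloud is denied.
        return "deny"
    return "allow"
```

The decision takes the venue as it is at call time, so an audience change between token issuance and speech changes the result.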

1154
00:49:46,920 –> 00:49:51,120
This is how RAG stops being a knowledge feature and becomes a security boundary.

1155
00:49:51,120 –> 00:49:53,360
Eligibility before similarity.

1156
00:49:53,360 –> 00:49:57,440
Negative space as a first class record, outputs classified and gated like actions.

1157
00:49:57,440 –> 00:49:59,200
The agent still speaks when it has proof.

1158
00:49:59,200 –> 00:50:00,880
It goes quiet when it doesn’t.

1159
00:50:00,880 –> 00:50:05,360
And that silence is what prevents the next incident from being perfectly logged.

1160
00:50:05,360 –> 00:50:06,360
Conditional access?

1161
00:50:06,360 –> 00:50:07,360
Necessary.

1162
00:50:07,360 –> 00:50:08,360
Not sufficient.

1163
00:50:08,360 –> 00:50:12,080
Conditional access is the most overpraised control in the agent conversation, and it’s

1164
00:50:12,080 –> 00:50:13,080
still mandatory.

1165
00:50:13,080 –> 00:50:14,080
It is the front gate.

1166
00:50:14,080 –> 00:50:18,600
It decides whether an identity should receive a token right now under current risk signals:

1167
00:50:18,600 –> 00:50:22,040
device posture, location, sign-in risk, workload context.

1168
00:50:22,040 –> 00:50:25,240
For agents and other non-human identities, that matters.

1169
00:50:25,240 –> 00:50:27,920
It shrinks who can even show up holding credentials.

1170
00:50:27,920 –> 00:50:28,920
You don’t skip that.

1171
00:50:28,920 –> 00:50:33,160
But conditional access is also where enterprises stop thinking because it feels like enforcement.

1172
00:50:33,160 –> 00:50:34,880
This is the uncomfortable truth.

1173
00:50:34,880 –> 00:50:36,560
Conditional access is a token-time decision.

1174
00:50:36,560 –> 00:50:38,080
It is not an action-time decision.

1175
00:50:38,080 –> 00:50:40,760
It answers: may this identity obtain a token?

1176
00:50:40,760 –> 00:50:44,480
Not: may this identity delete this site, share this file, or speak this aggregation in this

1177
00:50:44,480 –> 00:50:45,480
venue?

1178
00:50:45,480 –> 00:50:48,720
Once the token exists, you are no longer in an authentication problem.

1179
00:50:48,720 –> 00:50:50,520
You are in an authorization problem.

1180
00:50:50,520 –> 00:50:55,480
And token issuance cannot adjudicate tool execution because tool execution happens later in a different

1181
00:50:55,480 –> 00:51:00,480
context after retrieval, after orchestration, after the meeting audience changes, after

1182
00:51:00,480 –> 00:51:03,160
the agent chooses a path you didn’t anticipate.

1183
00:51:03,160 –> 00:51:08,040
That’s why so many incidents look compliant in Entra and still unacceptable in the business.

1184
00:51:08,040 –> 00:51:10,080
Walk the timeline and the gap becomes obvious.

1185
00:51:10,080 –> 00:51:12,040
The agent requests a token.

1186
00:51:12,040 –> 00:51:13,920
Conditional access evaluates and passes.

1187
00:51:13,920 –> 00:51:14,920
Good.

1188
00:51:14,920 –> 00:51:18,760
Then the agent retrieves allowed data under its scopes. Logged. Fine.

1189
00:51:18,760 –> 00:51:20,400
Then the agent proposes an action.

1190
00:51:20,400 –> 00:51:22,760
Delete, share, email, post, speak.

1191
00:51:22,760 –> 00:51:26,240
This is the moment that matters because this is the moment side effects happen.

1192
00:51:26,240 –> 00:51:30,120
And conditional access is not in that path unless you force it back in with a separate decision

1193
00:51:30,120 –> 00:51:31,120
point.

1194
00:51:31,120 –> 00:51:32,440
That is what the policy gate is for.

1195
00:51:32,440 –> 00:51:34,600
It is not a replacement for conditional access.

1196
00:51:34,600 –> 00:51:36,360
It is the missing second gate.

1197
00:51:36,360 –> 00:51:37,960
Conditional access decides who may try.

1198
00:51:37,960 –> 00:51:39,760
The policy engine decides what may happen.
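
The two gates can be sketched side by side. This is an illustration under stated assumptions, not how Conditional Access or any specific policy engine is implemented: `Token`, `ToolCall`, `DESTRUCTIVE`, and `authorize` are hypothetical names, and scopes are modeled as plain tool names for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    subject: str
    scopes: frozenset
    expires_at: float       # set once, at token time


@dataclass(frozen=True)
class ToolCall:
    tool: str               # e.g. "delete_site"
    target: str
    venue_external: bool    # context observed at action time, not token time


# Hypothetical set of tools with side effects that warrant extra checks.
DESTRUCTIVE = {"delete_site", "share_file"}


def token_valid(token, now):
    """First gate's residue: all the token can say later is 'still valid'."""
    return now < token.expires_at


def authorize(token, call, now):
    """Second gate: evaluated per tool call, against the current context."""
    if not token_valid(token, now):
        return (False, "token_expired")
    if call.tool not in token.scopes:
        return (False, "scope_missing")
    if call.tool in DESTRUCTIVE and call.venue_external:
        # Valid token, sufficient scope, wrong moment: deny anyway.
        return (False, "destructive_call_in_external_context")
    return (True, "ok")
```

The interesting case is the third branch: the token check passes and the call is still refused, because the context changed after the token was issued.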

1199
00:51:39,760 –> 00:51:43,960
Now the mistake teams make is trying to stretch conditional access to cover what it can’t.

1200
00:51:43,960 –> 00:51:48,040
They pile on network locations, token protection, session controls, device filters and assume

1201
00:51:48,040 –> 00:51:50,080
the blast radius shrinks automatically.

1202
00:51:50,080 –> 00:51:51,080
It doesn’t.

1203
00:51:51,080 –> 00:51:54,960
If the agent holds broad write scopes, the radius is already baked in.

1204
00:51:54,960 –> 00:51:57,320
Conditional access just decides who gets to hold the match.

1205
00:51:57,320 –> 00:51:59,760
So the architecture you enforce is a braid.

1206
00:51:59,760 –> 00:52:02,800
Conditional access at token time, strict and non-negotiable.

1207
00:52:02,800 –> 00:52:06,400
Least privilege on scopes because permissions are blast radius math.

1208
00:52:06,400 –> 00:52:10,480
Segmented identities, because one agent should not be one super identity. And per-tool-call

1209
00:52:10,480 –> 00:52:14,640
policy evaluation because action time authorization is where incidents either happen or don’t.

1210
00:52:14,640 –> 00:52:16,440
Now make monitoring earn its keep.

1211
00:52:16,440 –> 00:52:19,080
Watch token issuance patterns on agent identities.

1212
00:52:19,080 –> 00:52:22,240
Unusual cadence, unusual geos, new client types.

1213
00:52:22,240 –> 00:52:23,480
That’s the identity plane.

1214
00:52:23,480 –> 00:52:28,760
But also watch tool call shapes: spikes in deletes, sudden external egress, novel venues.

1215
00:52:28,760 –> 00:52:29,760
That’s the action plane.
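
A rough sketch of watching tool call shapes, assuming tool calls are available as a flat list of tool names per time window and venues as a list of identifiers. The function names, the `factor` threshold, and the `min_count` floor are illustrative choices, not tuned values.

```python
from collections import Counter


def tool_call_spikes(baseline_calls, window_calls, *, factor=3.0, min_count=5):
    """Flag tools whose count in the current window far exceeds the baseline window."""
    base = Counter(baseline_calls)
    current = Counter(window_calls)
    flagged = {}
    for tool, observed in current.items():
        expected = base.get(tool, 0)
        # min_count avoids alerting on one or two calls to a rarely used tool.
        if observed >= min_count and observed > factor * expected:
            flagged[tool] = {"observed": observed, "baseline": expected}
    return flagged


def novel_venues(known_venues, seen_venues):
    """Venues the agent acted in during this window that it has not acted in before."""
    return set(seen_venues) - set(known_venues)
```

A flagged tool or a novel venue is a signal to tighten scopes or policies for that agent, which keeps the response in the control plane rather than in the model.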

1216
00:52:29,760 –> 00:52:32,640
And when you detect drift, you don’t retrain the agent.

1217
00:52:32,640 –> 00:52:37,040
You tighten scopes, tighten policies, and shrink failure domains. Conditional access is necessary

1218
00:52:37,040 –> 00:52:39,560
because it keeps the wrong identities from showing up.

1219
00:52:39,560 –> 00:52:43,080
It is not sufficient because the right identity can still do the wrong thing, perfectly

1220
00:52:43,080 –> 00:52:45,240
logged with a valid token.

1221
00:52:45,240 –> 00:52:49,240
The experience plane tax: WebRTC, speech regions, and metered certainty.

1222
00:52:49,240 –> 00:52:52,960
Now the punchline nobody budgets for until the demo becomes production: the experience plane

1223
00:52:52,960 –> 00:52:53,960
tax.

1224
00:52:53,960 –> 00:52:56,400
The face and the voice don’t just add engagement.

1225
00:52:56,400 –> 00:53:01,120
They add failure domains: networks, regions, and metering. And none of that complexity

1226
00:53:01,120 –> 00:53:04,800
buys you a single extra millisecond of deterministic control.

1227
00:53:04,800 –> 00:53:05,880
Start with WebRTC.

1228
00:53:05,880 –> 00:53:07,800
It works beautifully in a clean lab.

1229
00:53:07,800 –> 00:53:12,360
Then it meets enterprise reality: NAT traversal, VPN hairpins, deep packet inspection, split

1230
00:53:12,360 –> 00:53:16,760
tunnel policies, and firewalls that quietly hate UDP. So you fall back to relays.

1231
00:53:16,760 –> 00:53:18,040
TURN becomes mandatory.

1232
00:53:18,040 –> 00:53:20,640
That adds hops, jitter and operational overhead.

1233
00:53:20,640 –> 00:53:25,200
The avatar stutters, the audio talks over itself and the system compensates with retries

1234
00:53:25,200 –> 00:53:30,000
and reconnects: more events, more envelope churn, more entropy injected into the same pathway

1235
00:53:30,000 –> 00:53:31,880
that also drives tool calls.

1236
00:53:31,880 –> 00:53:32,880
Then speech regions.

1237
00:53:32,880 –> 00:53:35,040
Azure Speech is region bound by design.

1238
00:53:35,040 –> 00:53:37,080
Keys are region-locked and endpoints are regional.

1239
00:53:37,080 –> 00:53:40,400
If you serve multiple geographies, you don’t have a voice.

1240
00:53:40,400 –> 00:53:45,840
You have a fleet of voices, separate resources, quotas, keys, routing logic and failover plans.

1241
00:53:45,840 –> 00:53:50,120
When a region blips, the agent doesn’t fail in a way your control plane can reason about.

1242
00:53:50,120 –> 00:53:53,840
It fails in the human layer: the voice disappears and the business interprets that as the

1243
00:53:53,840 –> 00:53:58,640
agent is down, even if the decision engine is still happily proposing actions.
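
A sketch of what a fleet of voices means operationally, assuming one speech resource per region and a per-geography preference order. `SpeechResource`, `PREFERENCES`, and `plane_status` are hypothetical, the region names are only illustrative, and real health checks and key handling are omitted.

```python
from dataclasses import dataclass


@dataclass
class SpeechResource:
    region: str
    healthy: bool = True


# Hypothetical routing table: geography -> ordered list of regions to try.
PREFERENCES = {
    "eu": ["westeurope", "northeurope"],
    "us": ["eastus", "westus2"],
}


def pick_region(geo, fleet):
    """Return the first healthy region for this geography, or None if the voice is down."""
    for region in PREFERENCES.get(geo, []):
        resource = fleet.get(region)
        if resource is not None and resource.healthy:
            return region
    return None


def plane_status(geo, fleet, decision_engine_up):
    """Report the voice plane and the decision plane separately."""
    return {
        "voice_plane": pick_region(geo, fleet) is not None,
        "decision_plane": decision_engine_up,
    }
```

Reporting the two planes separately is the point of the sketch: a silent voice with a live decision engine is a different incident from an agent that is actually down.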

1244
00:53:58,640 –> 00:54:02,720
And it’s all metered per second. Not per outcome, not per prevented incident: per second

1245
00:54:02,720 –> 00:54:04,040
of streamed certainty.

1246
00:54:04,040 –> 00:54:07,960
So you end up financing persuasion: tens of millions of seconds of compute to animate confidence

1247
00:54:07,960 –> 00:54:11,560
while the control plane that could prevent harm remains underbuilt.

1248
00:54:11,560 –> 00:54:15,160
Conclusion: assume the face is lying and enforce intent at action time.

1249
00:54:15,160 –> 00:54:16,600
The voice adds trust.

1250
00:54:16,600 –> 00:54:19,520
The system did not earn it and logs won’t save you after the fact.

1251
00:54:19,520 –> 00:54:22,800
Make the agent propose then force the control plane to dispose.

1252
00:54:22,800 –> 00:54:29,160
Idempotency, authoritative state, per-tool-call policy gates, and segmented identities.

1253
00:54:29,160 –> 00:54:33,240
If you do one thing next, audit where actions execute without a gate and mark it red, then

1254
00:54:33,240 –> 00:54:34,720
fund determinism, not avatars.




