Why Copilot Agents Fail: An Architectural Insight

Mirko Peters · Podcasts · 1 hour ago


1
00:00:00,000 –> 00:00:04,280
Most enterprises blame Copilot agent failures on early platform chaos.

2
00:00:04,280 –> 00:00:06,280
That story is comforting; it’s also wrong.

3
00:00:06,280 –> 00:00:09,600
Agents fail because teams deploy conversation where they need control,

4
00:00:09,600 –> 00:00:12,080
then act surprised when outcomes aren’t repeatable,

5
00:00:12,080 –> 00:00:13,880
auditable or safe to scale.

6
00:00:13,880 –> 00:00:17,680
In the next hour, this goes from provocation to a Monday morning mandate.

7
00:00:17,680 –> 00:00:20,920
How to design agents like governed systems, not chat experiences.

8
00:00:20,920 –> 00:00:25,440
If you’re building Copilot Studio agents inside a real tenant, with identity, data,

9
00:00:25,440 –> 00:00:27,640
and Power Platform services, subscribe now.

10
00:00:27,640 –> 00:00:30,360
This channel is where the architecture gets enforced.

11
00:00:30,360 –> 00:00:32,760
Before the sprawl becomes policy.

12
00:00:32,760 –> 00:00:35,560
The foundational misunderstanding: a chat is not a system.

13
00:00:35,560 –> 00:00:37,360
The foundational misunderstanding is simple.

14
00:00:37,360 –> 00:00:39,560
People treat chat like a system interface.

15
00:00:39,560 –> 00:00:42,160
Chat is not a system; it’s a user experience layer.

16
00:00:42,160 –> 00:00:45,800
That distinction matters because enterprises don’t run on friendly conversations.

17
00:00:45,800 –> 00:00:48,560
They run on inputs, state, outputs and accountability.

18
00:00:48,560 –> 00:00:49,880
They run on repeatability.

19
00:00:49,880 –> 00:00:53,920
They run on the ability to prove what happened, why it happened and who authorized it.

20
00:00:53,920 –> 00:00:57,440
A chat box gives you none of that by default; it gives you a vibe.

21
00:00:57,440 –> 00:00:59,200
Here’s what chat does architecturally.

22
00:00:59,200 –> 00:01:00,560
It hides the boundaries.

23
00:01:00,560 –> 00:01:05,480
It collapses intent capture, decisioning and execution into one continuous stream of text.

24
00:01:05,480 –> 00:01:08,480
And the moment you do that, you lose the ability to say,

25
00:01:08,480 –> 00:01:10,480
this is the point where the system decided,

26
00:01:10,480 –> 00:01:12,920
and this is the point where the system acted.

27
00:01:12,920 –> 00:01:14,800
In other words, you can’t draw the audit line.

28
00:01:14,800 –> 00:01:16,600
Enterprise systems run on contracts.

29
00:01:16,600 –> 00:01:18,080
A request has a defined shape.

30
00:01:18,080 –> 00:01:19,400
Inputs get validated.

31
00:01:19,400 –> 00:01:21,720
State changes happen inside boundaries.

32
00:01:21,720 –> 00:01:26,200
Outputs have lineage. Failures stop, retry, or escalate with evidence.
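In code, a request contract of this kind might look like the following sketch. The ticket shape, field names and category list are illustrative, not any platform’s API:

```python
from dataclasses import dataclass

# Hypothetical request contract: the shape is fixed, inputs are
# validated before any state change, and failures carry evidence.
@dataclass(frozen=True)
class CreateTicketRequest:
    requester_id: str
    category: str
    summary: str

# Illustrative allow list of categories this operation accepts.
ALLOWED_CATEGORIES = {"hardware", "software", "access"}

def validate(req: CreateTicketRequest) -> list[str]:
    """Return validation errors; an empty list means the request may proceed."""
    errors = []
    if not req.requester_id:
        errors.append("requester_id is required")
    if req.category not in ALLOWED_CATEGORIES:
        errors.append(f"unknown category: {req.category!r}")
    if not req.summary.strip():
        errors.append("summary must not be empty")
    return errors
```

The point is not the validation itself; it’s that a failed request produces a concrete error list you can log, instead of an improvised workaround.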

33
00:01:26,200 –> 00:01:28,080
A chat doesn’t do contracts by default.

34
00:01:28,080 –> 00:01:29,240
It does interpretation.

35
00:01:29,240 –> 00:01:33,080
Sometimes useful, sometimes wrong, always harder to reproduce than a workflow.

36
00:01:33,080 –> 00:01:34,840
That’s not a moral critique of AI.

37
00:01:34,840 –> 00:01:36,360
That’s a system behavior description.

38
00:01:36,360 –> 00:01:40,960
The enterprise problem starts when teams mistake fluent language for bounded decision logic.

39
00:01:40,960 –> 00:01:45,000
They assume that if the agent can talk through a process, it can run the process.

40
00:01:45,000 –> 00:01:46,600
But language is not a control plane.

41
00:01:46,600 –> 00:01:48,160
Language is not policy enforcement.

42
00:01:48,160 –> 00:01:49,720
Language is not a transaction boundary.

43
00:01:49,720 –> 00:01:51,320
Language is just output.

44
00:01:51,320 –> 00:01:54,160
The cost of this shows up as friendly ambiguity.

45
00:01:54,160 –> 00:01:57,280
Everyone loves friendly ambiguity in demos because it feels flexible.

46
00:01:57,280 –> 00:02:00,120
In production, it creates three predictable outcomes.

47
00:02:00,120 –> 00:02:04,040
Inconsistent actions, untraceable rationale and audit discomfort.

48
00:02:04,040 –> 00:02:08,320
Inconsistent actions happen because the agent doesn’t have a deterministic decision boundary.

49
00:02:08,320 –> 00:02:11,200
The same request from two users becomes two different interpretations.

50
00:02:11,200 –> 00:02:14,880
The same request from the same user on Tuesday becomes a slightly different path on Wednesday

51
00:02:14,880 –> 00:02:18,680
because context changed, the model changed, the connector throttled, or a knowledge source

52
00:02:18,680 –> 00:02:20,160
returned different chunks.

53
00:02:20,160 –> 00:02:22,800
And now your process is a probability distribution.

54
00:02:22,800 –> 00:02:26,080
Untraceable rationale happens because chat encourages narrative.

55
00:02:26,080 –> 00:02:27,080
The agent explains.

56
00:02:27,080 –> 00:02:28,080
It justifies.

57
00:02:28,080 –> 00:02:29,080
It sounds reasonable.

58
00:02:29,080 –> 00:02:33,360
But the organization can’t verify which data it used, which policy it applied, which tool

59
00:02:33,360 –> 00:02:34,840
it invoked,

60
00:02:34,840 –> 00:02:37,640
And what preconditions were true at the moment of action.

61
00:02:37,640 –> 00:02:38,640
You get a story.

62
00:02:38,640 –> 00:02:39,640
You don’t get evidence.

63
00:02:39,640 –> 00:02:41,560
Audit discomfort is the inevitable end state.

64
00:02:41,560 –> 00:02:43,240
Not because auditors hate AI.

65
00:02:43,240 –> 00:02:44,720
Auditors hate ambiguity.

66
00:02:44,720 –> 00:02:47,520
Outcomes you can’t reconstruct and decisions you can’t attribute.

67
00:02:47,520 –> 00:02:48,760
That’s not a control story.

68
00:02:48,760 –> 00:02:50,280
That’s an incident review waiting to happen.

69
00:02:50,280 –> 00:02:52,440
So what is the real job of an enterprise agent?

70
00:02:52,440 –> 00:02:54,000
It is not “answer questions.”

71
00:02:54,000 –> 00:02:55,360
That’s a small safe subset.

72
00:02:55,360 –> 00:03:00,040
The real job is delegated decisioning plus delegated execution under constraints.

73
00:03:00,040 –> 00:03:04,400
Delegated decisioning means the agent can choose a path, which workflow to trigger, which record

74
00:03:04,400 –> 00:03:08,480
to retrieve, which policy applies, which exception requires escalation.

75
00:03:08,480 –> 00:03:13,360
Delegated execution means the agent can cause state change, create a ticket, update a record,

76
00:03:13,360 –> 00:03:17,200
submit an approval, notify a user, write to a system of record.

77
00:03:17,200 –> 00:03:20,240
And the phrase under constraints is the entire point.

78
00:03:20,240 –> 00:03:24,320
Constraints are what turn AI from a probabilistic assistant into an operational component.

79
00:03:24,320 –> 00:03:27,800
Most organizations skip constraints because constraints don’t demo well.

80
00:03:27,800 –> 00:03:29,360
Constraints look like friction.

81
00:03:29,360 –> 00:03:30,760
Constraints look like governance.

82
00:03:30,760 –> 00:03:32,480
Constraints look like someone saying no.

83
00:03:32,480 –> 00:03:33,760
But constraints are the design.

84
00:03:33,760 –> 00:03:34,880
They are the architecture.

85
00:03:34,880 –> 00:03:38,960
When you deploy chat first agents, you’re delegating without defining boundaries.

86
00:03:38,960 –> 00:03:43,680
You are handing a conversational interface a set of tools and saying, “Be smart.”

87
00:03:43,680 –> 00:03:47,240
Then you’re surprised that it behaves like a conversational interface with tools.

88
00:03:47,240 –> 00:03:50,920
And once the agent can act, the consequences stop being theoretical.

89
00:03:50,920 –> 00:03:52,800
The failure mode isn’t it answered wrong.

90
00:03:52,800 –> 00:03:55,760
The failure mode is it updated the wrong thing.

91
00:03:55,760 –> 00:03:57,360
It approved the wrong thing.

92
00:03:57,360 –> 00:03:59,280
It exposed the wrong thing.

93
00:03:59,280 –> 00:04:01,520
Or the quietest failure of all.

94
00:04:01,520 –> 00:04:04,040
People stop trusting it and route around it.

95
00:04:04,040 –> 00:04:06,360
That’s why more prompts doesn’t fix this.

96
00:04:06,360 –> 00:04:07,480
Prompts can shape tone.

97
00:04:07,480 –> 00:04:09,400
Prompts can reduce some ambiguity.

98
00:04:09,400 –> 00:04:13,080
Prompts can nudge behavior, but prompts cannot manufacture a control plane that doesn’t

99
00:04:13,080 –> 00:04:14,080
exist.

100
00:04:14,080 –> 00:04:15,080
They can’t create missing contracts.

101
00:04:15,080 –> 00:04:17,160
They can’t make identity decisions explicit.

102
00:04:17,160 –> 00:04:20,640
They can’t enforce tool allow lists with preconditions and refusal rules.
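A tool allow list with preconditions and refusal rules can be sketched in a few lines. The tool names and the entitlement check here are hypothetical, not Copilot Studio calls:

```python
# Illustrative allow list: each tool maps to a precondition over the
# caller's context. Anything not listed is refused outright.
ALLOW_LIST = {
    "create_ticket": lambda user: "helpdesk.write" in user["entitlements"],
    "read_policy":   lambda user: True,  # advisory, always allowed
}

def invoke(tool: str, user: dict) -> str:
    """Refuse, escalate, or execute; never improvise an adjacent tool."""
    if tool not in ALLOW_LIST:
        return "refused: tool not on allow list"
    if not ALLOW_LIST[tool](user):
        return "refused: precondition failed, escalating to a human"
    return f"executing {tool}"
```

The key behavior: a missing tool or a failed precondition ends in an explicit refusal, not a best-effort alternative.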

103
00:04:20,640 –> 00:04:25,080
They can’t produce an audit trail if the system of record was never part of the design.

104
00:04:25,080 –> 00:04:27,080
So the first mandate is structural.

105
00:04:27,080 –> 00:04:29,080
Stop asking chat to behave like a system.

106
00:04:29,080 –> 00:04:32,080
If you want enterprise outcomes, you need enterprise mechanics.

107
00:04:32,080 –> 00:04:34,000
And that’s where this gets uncomfortable.

108
00:04:34,000 –> 00:04:35,760
Because it means the agent isn’t the product.

109
00:04:35,760 –> 00:04:38,920
The product is the architecture around it.

110
00:04:38,920 –> 00:04:40,240
Truth one.

111
00:04:40,240 –> 00:04:43,120
Most agents fail because they’re too conversational.

112
00:04:43,120 –> 00:04:45,480
Here’s the first truth that gets people defensive.

113
00:04:45,480 –> 00:04:48,520
Most Copilot agents fail because they are too conversational.

114
00:04:48,520 –> 00:04:52,480
Not because conversation is bad, but because conversation becomes the center of gravity and everything

115
00:04:52,480 –> 00:04:54,000
else becomes optional.

116
00:04:54,000 –> 00:04:58,840
Identity discipline, input validation, transaction boundaries, evidence. The chat-first pattern

117
00:04:58,840 –> 00:05:00,360
looks harmless at the start.

118
00:05:00,360 –> 00:05:04,680
A team opens Copilot Studio, writes friendly instructions, connects knowledge, adds tools

119
00:05:04,680 –> 00:05:06,760
and ships. The demo works.

120
00:05:06,760 –> 00:05:08,080
Then it hits the tenant.

121
00:05:08,080 –> 00:05:13,360
Partial permissions, messy data, throttling, edge cases, and a decade of “just this once” exceptions.

122
00:05:13,360 –> 00:05:17,000
In a chat-first agent, intent capture is vague by design.

123
00:05:17,000 –> 00:05:20,240
The user asks, “Can you help me with onboarding?”

124
00:05:20,240 –> 00:05:22,760
And the agent has to decide what help means.

125
00:05:22,760 –> 00:05:27,880
Answer questions, create accounts, request equipment, assign training, open tickets, notify

126
00:05:27,880 –> 00:05:28,880
managers.

127
00:05:28,880 –> 00:05:32,760
The user didn’t specify and the agent doesn’t have a contract that forces specificity.

128
00:05:32,760 –> 00:05:33,760
So it improvises.

129
00:05:33,760 –> 00:05:35,400
And improvisation is fine for discovery.

130
00:05:35,400 –> 00:05:36,800
It’s lethal for execution.

131
00:05:36,800 –> 00:05:38,280
The next failure is tool choice.

132
00:05:38,280 –> 00:05:42,880
In a conversational pattern, tool invocation becomes a kind of live improvisational routing.

133
00:05:42,880 –> 00:05:47,000
The model selects what it thinks is the right connector, the right action, the right flow,

134
00:05:47,000 –> 00:05:49,720
based on whatever context it has in that moment.

135
00:05:49,720 –> 00:05:51,520
If two tools overlap, it guesses.

136
00:05:51,520 –> 00:05:53,840
If a tool fails, it tries something adjacent.

137
00:05:53,840 –> 00:05:57,560
If the knowledge source returns an ambiguous chunk, it fills the gap with language.

138
00:05:57,560 –> 00:06:00,640
This is where more prompts becomes the wrong fix.

139
00:06:00,640 –> 00:06:03,720
Teams respond to bad behavior by adding more instructions.

140
00:06:03,720 –> 00:06:04,720
Always do X.

141
00:06:04,720 –> 00:06:05,720
Never do Y.

142
00:06:05,720 –> 00:06:07,040
Ask clarifying questions.

143
00:06:07,040 –> 00:06:08,600
Confirm before you update.

144
00:06:08,600 –> 00:06:13,400
They stack paragraphs of policy into an 8,000-character text box and call it governance.

145
00:06:13,400 –> 00:06:15,240
But prompts don’t create determinism.

146
00:06:15,240 –> 00:06:16,240
Prompts create persuasion.

147
00:06:16,240 –> 00:06:18,640
They’re an influence layer on a distributed decision engine.

148
00:06:18,640 –> 00:06:23,040
When the system has conflicting signals, user language, retrieved context, tool schemas,

149
00:06:23,040 –> 00:06:26,320
permission failures, connector timeouts, the prompt isn’t a compiler.

150
00:06:26,320 –> 00:06:27,400
It’s a suggestion.

151
00:06:27,400 –> 00:06:29,680
And enterprises don’t run on suggestions.

152
00:06:29,680 –> 00:06:33,760
This is also why more language doesn’t equal more intelligence.

153
00:06:33,760 –> 00:06:34,760
Verbosity is a masking layer.

154
00:06:34,760 –> 00:06:40,000
A chat first agent can produce a long, confident explanation while the underlying decision boundary

155
00:06:40,000 –> 00:06:41,240
remains undefined.

156
00:06:41,240 –> 00:06:42,560
The output sounds like control.

157
00:06:42,560 –> 00:06:45,000
The system behavior is still probabilistic.

158
00:06:45,000 –> 00:06:47,080
Stakeholders notice this in one specific way.

159
00:06:47,080 –> 00:06:49,760
They can’t get the same result twice.

160
00:06:49,760 –> 00:06:51,840
Different wording produces different routing.

161
00:06:51,840 –> 00:06:53,360
Different day produces different answers.

162
00:06:53,360 –> 00:06:56,400
The same action request assumes different preconditions.

163
00:06:56,400 –> 00:06:59,640
Then IT gets asked, “What happens when we roll this out to 30,000 people?”

164
00:06:59,640 –> 00:07:02,720
A conversational agent can’t answer that reliably.

165
00:07:02,720 –> 00:07:06,320
And when an agent can’t predict its own action path, you’re not deploying automation.

166
00:07:06,320 –> 00:07:07,920
You’re deploying conditional chaos.

167
00:07:07,920 –> 00:07:10,240
Now there is a place where chat works extremely well.

168
00:07:10,240 –> 00:07:11,400
Chat is great at discovery.

169
00:07:11,400 –> 00:07:12,400
Triage.

170
00:07:12,400 –> 00:07:13,400
Clarification.

171
00:07:13,400 –> 00:07:14,400
Summarization.

172
00:07:14,400 –> 00:07:16,360
Help me understand what’s going on.

173
00:07:16,360 –> 00:07:17,840
Show me the options.

174
00:07:17,840 –> 00:07:19,200
Explain the policy.

175
00:07:19,200 –> 00:07:20,880
Which team owns this?

176
00:07:20,880 –> 00:07:25,120
That’s where probabilistic behavior is acceptable because the output is advisory, not state

177
00:07:25,120 –> 00:07:26,120
changing.

178
00:07:26,120 –> 00:07:29,360
Chat is also great at absorbing messy inputs from humans and turning them into structured

179
00:07:29,360 –> 00:07:30,360
parameters.

180
00:07:30,360 –> 00:07:31,360
That’s valuable.

181
00:07:31,360 –> 00:07:32,440
But it’s a front end, not the engine.

182
00:07:32,440 –> 00:07:34,120
And the mistake is asking chat to be the engine.

183
00:07:34,120 –> 00:07:38,600
The moment you let conversation carry the workflow, you’ve made the workflow dependent on

184
00:07:38,600 –> 00:07:39,800
language variance.

185
00:07:39,800 –> 00:07:41,400
And language variance is infinite.

186
00:07:41,400 –> 00:07:43,360
Users will ask the same thing 10 different ways.

187
00:07:43,360 –> 00:07:44,360
They will omit details.

188
00:07:44,360 –> 00:07:45,600
They will paste screenshots.

189
00:07:45,600 –> 00:07:50,040
They will drop half a ticket history into the chat and expect the agent to just know.

190
00:07:50,040 –> 00:07:53,400
So the right design move is not make the agent more conversational.

191
00:07:53,400 –> 00:07:54,400
It’s the opposite.

192
00:07:54,400 –> 00:07:57,320
Make the agent less conversational where it matters.

193
00:07:57,320 –> 00:08:00,800
Shrink the conversation surface to parameter collection and confirmation.

194
00:08:00,800 –> 00:08:03,280
Push decision boundaries into explicit routing.

195
00:08:03,280 –> 00:08:05,680
Push execution into deterministic systems.

196
00:08:05,680 –> 00:08:09,440
Flows, APIs, orchestration that you can test, version and audit.
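As a rough sketch of that split, assuming a hypothetical onboarding flow: the chat layer only collects fields, and routing happens on explicit parameters, never on free text:

```python
# The conversation surface's only job: gather these explicit fields.
REQUIRED_FIELDS = {"employee_id", "start_date", "department"}

def route_onboarding(params: dict) -> str:
    """Deterministic router: same explicit inputs, same path, every time."""
    missing = REQUIRED_FIELDS - params.keys()
    if missing:
        # Hand back to the chat layer: ask for exactly these fields.
        return f"collect: {sorted(missing)}"
    # Execution is dispatched by field value, not by language variance.
    if params["department"] == "engineering":
        return "flow:onboard_engineering"
    return "flow:onboard_default"
```

However the user phrases the request, the router only ever sees the normalized parameters, which is what makes the path testable and versionable.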

197
00:08:09,440 –> 00:08:11,040
Which raises the obvious question.

198
00:08:11,040 –> 00:08:13,480
If chat can’t be the center of gravity, what can?

199
00:08:13,480 –> 00:08:15,120
A control surface.

200
00:08:15,120 –> 00:08:16,200
Delegation with contracts.

201
00:08:16,200 –> 00:08:19,760
An agent designed like an enterprise system, not a personality.

202
00:08:19,760 –> 00:08:20,760
Truth two.

203
00:08:20,760 –> 00:08:23,440
An agent is a delegated control surface.

204
00:08:23,440 –> 00:08:26,360
Truth number two is the one that breaks most org charts.

205
00:08:26,360 –> 00:08:28,480
An agent isn’t an AI employee.

206
00:08:28,480 –> 00:08:30,240
It’s a delegated control surface.

207
00:08:30,240 –> 00:08:34,000
The difference matters because employees come with training, supervision, professional

208
00:08:34,000 –> 00:08:36,920
judgment and, critically, personal accountability.

209
00:08:36,920 –> 00:08:37,960
Agents come with none of that.

210
00:08:37,960 –> 00:08:42,440
They come with permissions, tools, data pathways and a model that will happily produce a plausible

211
00:08:42,440 –> 00:08:44,920
plan even when the environment is lying to it.

212
00:08:44,920 –> 00:08:48,800
When someone says we’re giving the HR agent access to onboarding systems, what they’re

213
00:08:48,800 –> 00:08:53,800
really saying is we’re exposing a set of execution pathways to a probabilistic decision engine.

214
00:08:53,800 –> 00:08:54,800
That’s not inspirational.

215
00:08:54,800 –> 00:08:56,120
That’s a risk statement.

216
00:08:56,120 –> 00:09:01,120
The capability of an agent equals the sum of what you exposed, not what you wrote in instructions,

217
00:09:01,120 –> 00:09:02,680
not what the demo showed.

218
00:09:02,680 –> 00:09:04,000
Exposed pathways:

219
00:09:04,000 –> 00:09:08,000
Identity context, connector permissions, action schemas, knowledge sources and whatever

220
00:09:08,000 –> 00:09:10,000
the platform can reach at runtime.

221
00:09:10,000 –> 00:09:11,600
That’s the real surface area.

222
00:09:11,600 –> 00:09:15,960
And in enterprise systems, surface area is destiny because agents don’t amplify your best

223
00:09:15,960 –> 00:09:16,960
process.

224
00:09:16,960 –> 00:09:19,320
They amplify your weakest governance path.

225
00:09:19,320 –> 00:09:23,560
If your tenant already has temporary exceptions, over-permissioned service accounts, stale

226
00:09:23,560 –> 00:09:28,480
app registrations, connectors nobody owns, and SharePoint sites that grew like mold, an agent

227
00:09:28,480 –> 00:09:29,480
doesn’t fix that.

228
00:09:29,480 –> 00:09:32,640
It automates it at speed with a friendly explanation attached.

229
00:09:32,640 –> 00:09:35,680
This is why the AI employee metaphor is so dangerous.

230
00:09:35,680 –> 00:09:39,240
It tricks leadership into thinking the hard part is adoption and change management, but

231
00:09:39,240 –> 00:09:43,960
architecturally the hard part is delegation mechanics who authorizes who executes and

232
00:09:43,960 –> 00:09:47,560
who owns the outcome when things go wrong start with authorization.

233
00:09:47,560 –> 00:09:50,720
In a real system, an action doesn’t happen because someone asked nicely.

234
00:09:50,720 –> 00:09:54,560
It happens because an identity with the right entitlements invoked a defined operation under

235
00:09:54,560 –> 00:09:55,640
policy.

236
00:09:55,640 –> 00:09:57,480
With agents, people blur that line.

237
00:09:57,480 –> 00:09:59,880
They assume the user’s request is the authorization.

238
00:09:59,880 –> 00:10:00,880
It isn’t.

239
00:10:00,880 –> 00:10:02,720
It’s input.

240
00:10:02,720 –> 00:10:04,280
Authorization is the control plane decision.

241
00:10:04,280 –> 00:10:08,280
Should this user in this context be allowed to cause the state change using this tool right

242
00:10:08,280 –> 00:10:09,280
now?

243
00:10:09,280 –> 00:10:10,520
Then execution.

244
00:10:10,520 –> 00:10:14,800
Execution is where the damage happens because execution changes state outside the chat.

245
00:10:14,800 –> 00:10:17,440
Create, update, delete, approve, notify, escalate.

246
00:10:17,440 –> 00:10:22,120
If you let a model choose those operations freely, you’ve moved from assistant to unbounded

247
00:10:22,120 –> 00:10:23,120
operator.

248
00:10:23,120 –> 00:10:25,280
And that’s fine in consumer software.

249
00:10:25,280 –> 00:10:28,920
Enterprises don’t get to do “fine.” And then, ownership.

250
00:10:28,920 –> 00:10:32,200
When an agent updates a record incorrectly, who owns that incident?

251
00:10:32,200 –> 00:10:36,400
The maker? The system owner? The identity team? The business sponsor? Everyone will point

252
00:10:36,400 –> 00:10:38,920
at everyone else unless you force it into the design.

253
00:10:38,920 –> 00:10:44,280
That’s why agents need an explicit responsibility model, not a vague product owner label in a spreadsheet.

254
00:10:44,280 –> 00:10:46,400
Here’s the uncomfortable truth.

255
00:10:46,400 –> 00:10:49,520
Agent execution without explicit ownership becomes political debt fast.

256
00:10:49,520 –> 00:10:51,920
Now connect this back to what people actually build.

257
00:10:51,920 –> 00:10:56,080
A copilot agent is a UI that fronts a bundle of integrations.

258
00:10:56,080 –> 00:10:58,000
Each integration has a failure mode.

259
00:10:58,000 –> 00:11:02,400
Permission denied, throttling, schema change, timeout, bad data, partial success.

260
00:11:02,400 –> 00:11:04,720
When those happen, the model will try to be helpful.

261
00:11:04,720 –> 00:11:05,800
It will route around.

262
00:11:05,800 –> 00:11:07,480
It will attempt an alternative.

263
00:11:07,480 –> 00:11:09,000
It will summarize a guess.

264
00:11:09,000 –> 00:11:13,440
And if you didn’t design a refusal rule or an escalation path, it will keep going.

265
00:11:13,440 –> 00:11:15,880
That means your agent is not helping employees.

266
00:11:15,880 –> 00:11:19,880
It is making runtime decisions in a distributed environment you barely control.

267
00:11:19,880 –> 00:11:22,400
So the mandate is not make the agent smarter.

268
00:11:22,400 –> 00:11:25,640
The mandate is make delegation explicit.

269
00:11:25,640 –> 00:11:27,240
Define the identity model.

270
00:11:27,240 –> 00:11:31,280
Run as user, run as service, or hybrid, with consequences documented.

271
00:11:31,280 –> 00:11:33,040
Define the tool allow list.

272
00:11:33,040 –> 00:11:34,360
Which actions exist,

273
00:11:34,360 –> 00:11:35,800
what inputs they accept,

274
00:11:35,800 –> 00:11:37,320
what outputs they return,

275
00:11:37,320 –> 00:11:38,800
and what errors look like.

276
00:11:38,800 –> 00:11:40,120
Define the decision boundaries.

277
00:11:40,120 –> 00:11:41,520
What the agent may infer,

278
00:11:41,520 –> 00:11:42,720
what it must verify,

279
00:11:42,720 –> 00:11:43,840
and when it must stop.

280
00:11:43,840 –> 00:11:45,680
And define the audit surface.

281
00:11:45,680 –> 00:11:50,000
What gets written to the system of record, so the business can reconstruct why something happened.

282
00:11:50,000 –> 00:11:54,400
That’s the difference between an agent as a novelty and an agent as an enterprise-controlled

283
00:11:54,400 –> 00:11:55,400
surface.

284
00:11:55,400 –> 00:11:58,760
And once you accept that framing, enthusiasm becomes irrelevant.

285
00:11:58,760 –> 00:12:00,760
Architecture becomes the product.

286
00:12:00,760 –> 00:12:02,920
Which is exactly where this goes next.

287
00:12:02,920 –> 00:12:07,960
Deterministic ROI only shows up when the design itself is deterministic.

288
00:12:07,960 –> 00:12:10,080
The architectural mandate.

289
00:12:10,080 –> 00:12:12,720
Deterministic ROI requires deterministic design.

290
00:12:12,720 –> 00:12:14,840
So here’s the mandate stated plainly.

291
00:12:14,840 –> 00:12:18,280
The moment an agent can take action, you stop building an agent.

292
00:12:18,280 –> 00:12:19,440
You start building a system.

293
00:12:19,440 –> 00:12:24,120
And systems only produce ROI when they behave predictably under real conditions.

294
00:12:24,120 –> 00:12:28,680
Partial data, partial permissions, outages, policy changes, and users who never read your

295
00:12:28,680 –> 00:12:29,920
documentation.

296
00:12:29,920 –> 00:12:35,360
That distinction matters because most ROI conversations about agents are really demo conversations.

297
00:12:35,360 –> 00:12:39,280
They measure how impressed someone felt, how quickly a pilot team shipped, how many questions

298
00:12:39,280 –> 00:12:41,680
the agent answered without embarrassing itself in a meeting.

299
00:12:41,680 –> 00:12:42,680
That’s not ROI.

300
00:12:42,680 –> 00:12:44,280
That’s novelty with a budget.

301
00:12:44,280 –> 00:12:46,440
Deterministic ROI is different.

302
00:12:46,440 –> 00:12:50,680
Deterministic ROI means you can name the outcome, measure the delta, and trust it will repeat

303
00:12:50,680 –> 00:12:54,160
tomorrow at 10x scale without turning into an incident backlog.

304
00:12:54,160 –> 00:12:58,280
And that only happens when the design is deterministic where it needs to be.

305
00:12:58,280 –> 00:13:01,040
Enterprises tolerate probabilistic behavior in one place.

306
00:13:01,040 –> 00:13:02,040
Interpretation.

307
00:13:02,040 –> 00:13:06,040
They tolerate variance in how a user asks a question, how an agent summarizes information,

308
00:13:06,040 –> 00:13:07,760
how it suggests next steps.

309
00:13:07,760 –> 00:13:08,760
That’s the reasoning layer.

310
00:13:08,760 –> 00:13:09,760
And it’s useful.

311
00:13:09,760 –> 00:13:12,120
They do not tolerate probabilistic behavior in execution.

312
00:13:12,120 –> 00:13:16,320
Not in approvals, not in record updates, not in entitlement changes, not in ticket routing,

313
00:13:16,320 –> 00:13:19,560
not in anything that moves state in a system of record.

314
00:13:19,560 –> 00:13:22,360
So the architectural mandate is a separation of concerns.

315
00:13:22,360 –> 00:13:24,600
Let the model do what models are good at.

316
00:13:24,600 –> 00:13:29,520
Language normalization, intent extraction, ambiguity detection, summarization, and classification.

317
00:13:29,520 –> 00:13:33,320
Then hand execution to what enterprises already trust.

318
00:13:33,320 –> 00:13:38,880
Deterministic workflows, explicit APIs, validated inputs, and governed identities.

319
00:13:38,880 –> 00:13:41,400
The core line is still the simplest one.

320
00:13:41,400 –> 00:13:42,640
Chat is for discovery.

321
00:13:42,640 –> 00:13:44,800
Agents are for execution.

322
00:13:44,800 –> 00:13:46,840
Confuse the two and you automate ambiguity.

323
00:13:46,840 –> 00:13:49,520
And yes, some people will argue that the model can do tool calling.

324
00:13:49,520 –> 00:13:50,840
It can choose actions.

325
00:13:50,840 –> 00:13:51,840
Of course it can.

326
00:13:51,840 –> 00:13:52,840
That’s not the question.

327
00:13:52,840 –> 00:13:57,240
The question is whether you can prove ahead of time what it will do with a given intent.

328
00:13:57,240 –> 00:14:00,000
If you can’t predict the action path, you can’t govern it.

329
00:14:00,000 –> 00:14:01,680
If you can’t govern it, you can’t scale it.

330
00:14:01,680 –> 00:14:05,680
If you can’t scale it, your ROI is a one-time demo artifact.

331
00:14:05,680 –> 00:14:07,400
Determinism isn’t about removing AI.

332
00:14:07,400 –> 00:14:11,240
It’s about placing AI inside a design that can survive entropy.

333
00:14:11,240 –> 00:14:13,720
Because entropy is the default state of every tenant.

334
00:14:13,720 –> 00:14:18,120
Connectors drift, permissions drift, knowledge sources drift, people leave.

335
00:14:18,120 –> 00:14:21,720
Temporary exceptions become permanent, platforms update, models update.

336
00:14:21,720 –> 00:14:24,440
Suddenly the agent behaves differently and no one can explain why.

337
00:14:24,440 –> 00:14:26,640
A deterministic design assumes drift.

338
00:14:26,640 –> 00:14:27,880
It constrains it.

339
00:14:27,880 –> 00:14:31,800
And it creates a stable envelope where change can happen without turning behavior into

340
00:14:31,800 –> 00:14:32,800
roulette.

341
00:14:32,800 –> 00:14:34,240
So what does that actually look like in practice?

342
00:14:34,240 –> 00:14:35,640
It looks like contracts.

343
00:14:35,640 –> 00:14:37,760
Not instructions. Contracts.

344
00:14:37,760 –> 00:14:40,960
A contract says: for this intent, these are the required inputs.

345
00:14:40,960 –> 00:14:42,360
These are the allowable tools.

346
00:14:42,360 –> 00:14:45,760
These are the preconditions and these are the outcomes we will write back to the system

347
00:14:45,760 –> 00:14:46,760
of record.

348
00:14:46,760 –> 00:14:51,240
If any of those aren’t true, the agent refuses, escalates or routes to a human.

349
00:14:51,240 –> 00:14:52,240
No improvisation.

350
00:14:52,240 –> 00:14:54,680
No best effort updates to production systems.
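A minimal sketch of such an intent contract, with illustrative names throughout (the intent, tools and precondition are stand-ins, not any product’s API):

```python
from dataclasses import dataclass

# Hypothetical intent contract: required inputs, allowable tools,
# preconditions, and an audit write-back on every execution.
@dataclass
class IntentContract:
    intent: str
    required_inputs: set
    allowed_tools: set
    preconditions: list  # callables over the validated inputs

def execute(contract: IntentContract, inputs: dict, tool: str, audit_log: list) -> str:
    """Refuse or escalate when the contract isn't satisfied; never improvise."""
    if not contract.required_inputs <= inputs.keys():
        return "refuse: missing inputs"
    if tool not in contract.allowed_tools:
        return "refuse: tool not allowed for this intent"
    if not all(check(inputs) for check in contract.preconditions):
        return "escalate: precondition failed"
    # Evidence first-class: the outcome is written to the system of record.
    audit_log.append({"intent": contract.intent, "tool": tool, "inputs": inputs})
    return "executed"
```

Every path out of `execute` is explicit, which is exactly what makes the behavior reconstructable after the fact.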

351
00:14:54,680 –> 00:14:58,040
This is also where people misunderstand deterministic as rigid.

352
00:14:58,040 –> 00:14:59,360
It isn’t.

353
00:14:59,360 –> 00:15:05,040
You can be flexible in how users express intent and strict in how the system executes.

354
00:15:05,040 –> 00:15:07,960
Exactly how mature enterprise software already works.

355
00:15:07,960 –> 00:15:10,320
Humans type messy things into forms all day.

356
00:15:10,320 –> 00:15:12,480
The system doesn’t take the mess literally.

357
00:15:12,480 –> 00:15:15,120
It validates, normalizes, and enforces policy.

358
00:15:15,120 –> 00:15:16,160
Agents need the same shape.

359
00:15:16,160 –> 00:15:20,920
So when someone asks for proven ROI, the real answer isn’t “build a better prompt”.

360
00:15:20,920 –> 00:15:25,200
The real answer is “design an agent that produces repeatable outcomes”.

361
00:15:25,200 –> 00:15:30,200
Repeatable means the same intent and the same state lead to the same execution path.

362
00:15:30,200 –> 00:15:34,560
Legible means you can track throughput, cycle time, escalation rate and error rates without

363
00:15:34,560 –> 00:15:36,520
arguing about definitions.

364
00:15:36,520 –> 00:15:41,600
Stable means you can operate it, version it, test it, roll it back and audit it.

365
00:15:41,600 –> 00:15:43,520
That’s deterministic ROI.

366
00:15:43,520 –> 00:15:47,200
And it’s the only ROI that survives beyond the pilot phase.

367
00:15:47,200 –> 00:15:49,160
Everything else is the familiar pattern.

368
00:15:49,160 –> 00:15:53,560
Initial excitement, then drift, then exceptions, then manual overrides, then quiet abandonment.

369
00:15:53,560 –> 00:15:55,680
So the mandate is architectural honesty.

370
00:15:55,680 –> 00:15:59,040
Decide where you accept probability and where you demand certainty.

371
00:15:59,040 –> 00:16:03,240
To make this concrete, the next piece is the decision model that replaces chat first agents

372
00:16:03,240 –> 00:16:04,240
entirely.

373
00:16:04,240 –> 00:16:08,280
Because once you see it, you stop designing conversations, you start designing systems that

374
00:16:08,280 –> 00:16:09,680
happen to speak.

375
00:16:09,680 –> 00:16:15,480
The decision model: event, reasoning, orchestration, execution, record.

376
00:16:15,480 –> 00:16:21,120
Here’s the replacement model, not a better chat, a decision system that happens to speak.

377
00:16:21,120 –> 00:16:26,160
Event, reasoning, orchestration, execution, record: five stages, clear boundaries, clear ownership,

378
00:16:26,160 –> 00:16:27,360
clear evidence.
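The five stages can be sketched as one pipeline with explicit boundaries. The intent classifier and the flow are stubs here, not real platform calls:

```python
# Each stage is a visible boundary, so you can always say where the
# system decided and where it acted.
def handle(event: dict, record: list) -> str:
    # 1. Event: an explicit trigger with guaranteed context.
    if "user" not in event or "text" not in event:
        return "rejected: malformed trigger"
    # 2. Reasoning: interpret messy language into a classified intent (stubbed).
    intent = "password_reset" if "password" in event["text"].lower() else "unknown"
    # 3. Orchestration: deterministic routing on the classified intent.
    if intent == "unknown":
        return "escalated to human triage"
    # 4. Execution: a bounded state change (stubbed).
    outcome = f"ran flow for {intent}"
    # 5. Record: write evidence to the system of record.
    record.append({"event": event, "intent": intent, "outcome": outcome})
    return outcome
```

Only stage 2 is probabilistic; everything on either side of it is testable, versionable, and auditable.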

379
00:16:27,360 –> 00:16:31,960
Start with the event, because enterprises love to pretend an agent just helps people.

380
00:16:31,960 –> 00:16:32,960
It doesn’t.

381
00:16:32,960 –> 00:16:35,240
An enterprise agent should start because something explicit happened.

382
00:16:35,240 –> 00:16:39,680
A user request in Teams, a form submission, a ServiceNow ticket moved to a new state,

383
00:16:39,680 –> 00:16:42,600
a high risk sign-in event, a scheduled daily run.

384
00:16:42,600 –> 00:16:44,760
You don’t let the agent wake up based on vibes.

385
00:16:44,760 –> 00:16:46,280
An event is the trigger contract.

386
00:16:46,280 –> 00:16:48,080
It answers, “What starts this?

387
00:16:48,080 –> 00:16:49,080
Who started it?

388
00:16:49,080 –> 00:16:51,640
And what context is guaranteed at the start?”

389
00:16:51,640 –> 00:16:54,720
That matters because most agent drift starts at the beginning.

390
00:16:54,720 –> 00:16:58,600
If the start condition is loose, everything downstream becomes interpretive.

391
00:16:58,600 –> 00:17:00,640
Loose triggers create loose outcomes.

392
00:17:00,640 –> 00:17:02,680
Deterministic systems don’t start that way.
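The trigger contract can be made literal. A minimal Python sketch, assuming illustrative event types and required context keys (none of these names come from Copilot Studio):

```python
from dataclasses import dataclass

# Hypothetical trigger contract: an agent run may only start from an
# explicit, validated event, never from "the user said something".
ALLOWED_EVENT_TYPES = {"teams_user_request", "form_submission",
                       "ticket_state_change", "high_risk_signin", "scheduled_run"}

@dataclass(frozen=True)
class TriggerEvent:
    event_type: str   # what starts this
    initiator: str    # who started it
    context: dict     # context guaranteed at the start

def accept_event(event: TriggerEvent) -> bool:
    """Refuse to start unless the event satisfies the trigger contract."""
    if event.event_type not in ALLOWED_EVENT_TYPES:
        return False
    if not event.initiator:
        return False
    # Each event type declares the context keys it must guarantee.
    required = {"teams_user_request": {"tenant_id", "user_id"},
                "ticket_state_change": {"ticket_id", "new_state"}}
    return required.get(event.event_type, set()) <= event.context.keys()
```

Anything that fails the contract simply does not wake the agent; there is no "interpretive" start.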

393
00:17:02,680 –> 00:17:07,800
Next, reasoning. This is where the model earns its keep, but only inside a bounded context.

394
00:17:07,800 –> 00:17:12,760
Reasoning means interpret messy human language, detect ambiguity, extract parameters, classify

395
00:17:12,760 –> 00:17:15,880
intent and decide what must be verified before anything happens.

396
00:17:15,880 –> 00:17:20,240
It’s also where the agent can say, “I can do that, but I need these three inputs.”

397
00:17:20,240 –> 00:17:22,640
Or, “I can’t do that without approval.”

398
00:17:22,640 –> 00:17:27,200
That bounded context means the agent is not allowed to treat the entire tenant as its memory.

399
00:17:27,200 –> 00:17:32,000
It gets a defined set of knowledge sources, a defined set of operational data, and a defined

400
00:17:32,000 –> 00:17:33,880
set of assumptions it may make.

401
00:17:33,880 –> 00:17:36,480
Everything else is either verification or refusal.

402
00:17:36,480 –> 00:17:40,200
This is where a lot of teams get lazy and call retrieval “reasoning”.

403
00:17:40,200 –> 00:17:41,520
Retrieval is just scavenging.

404
00:17:41,520 –> 00:17:45,320
Reasoning is deciding what to do with what you found and, more importantly, what you’re

405
00:17:45,320 –> 00:17:47,400
not allowed to do without stronger evidence.

406
00:17:47,400 –> 00:17:48,720
Then orchestration.

407
00:17:48,720 –> 00:17:53,120
It is policy, not creativity.

408
00:17:53,120 –> 00:17:55,440
Orchestration decides which tool is allowed to run for this intent, under these conditions,

409
00:17:55,440 –> 00:17:56,440
with these inputs.

410
00:17:56,440 –> 00:18:01,520
It is the layer that converts the user’s “do the thing” into invoking exactly this

411
00:18:01,520 –> 00:18:02,800
contract.

412
00:18:02,800 –> 00:18:05,600
If two tools overlap, orchestration doesn’t guess.

413
00:18:05,600 –> 00:18:06,600
It routes.

414
00:18:06,600 –> 00:18:07,800
This is where the allow list lives.

415
00:18:07,800 –> 00:18:09,320
This is where preconditions live.

416
00:18:09,320 –> 00:18:11,040
This is where refusal rules live.

417
00:18:11,040 –> 00:18:14,840
If the user wants an action that isn’t in the contract, the agent doesn’t improvise.

418
00:18:14,840 –> 00:18:15,840
It stops.

419
00:18:15,840 –> 00:18:19,160
It escalates.

420
00:18:19,160 –> 00:18:23,240
And yes, Copilot Studio can do orchestration in different ways.

421
00:18:23,240 –> 00:18:24,240
Topics.

422
00:18:24,240 –> 00:18:25,240
Generative orchestration.

423
00:18:25,240 –> 00:18:26,240
Tool definitions.

424
00:18:26,240 –> 00:18:27,240
Agent flows.

425
00:18:27,240 –> 00:18:28,240
The mechanism isn’t the point.

426
00:18:28,240 –> 00:18:31,400
The point is that orchestration behaves like an authorization compiler.

427
00:18:31,400 –> 00:18:34,600
It turns intent into permitted operations or it rejects it.
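That “authorization compiler” behavior can be sketched in a few lines of Python; the intents, conditions, and tool contract names here are hypothetical:

```python
# Hypothetical allow list: (intent, condition) pairs compile to exactly one
# permitted tool contract. Anything else is rejected, not improvised.
ALLOW_LIST = {
    ("reset_password", "user_verified"): "tool.reset_password_flow",
    ("create_ticket", "any"): "tool.create_ticket_flow",
}

def orchestrate(intent: str, condition: str) -> str:
    """Route an intent to a permitted contract, or escalate."""
    for (allowed_intent, allowed_condition), contract in ALLOW_LIST.items():
        if intent == allowed_intent and allowed_condition in (condition, "any"):
            return contract      # exactly this contract, nothing else
    return "ESCALATE"            # no guessing: stop and hand off
```

If two entries could overlap, the routing rule, not the model, decides; preconditions and refusal rules live in the same table.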

428
00:18:34,600 –> 00:18:36,320
After orchestration comes execution.

429
00:18:36,320 –> 00:18:38,400
And execution should be boring.

430
00:18:38,400 –> 00:18:41,280
Execution is where deterministic systems do deterministic work.

431
00:18:41,280 –> 00:18:42,280
Power Automate.

432
00:18:42,280 –> 00:18:43,280
Logic Apps.

433
00:18:43,280 –> 00:18:44,920
APIs with typed schemas.

434
00:18:44,920 –> 00:18:49,040
Flows with idempotency, so “run it again” doesn’t duplicate side effects.

435
00:18:49,040 –> 00:18:50,360
Retries with timeouts.

436
00:18:50,360 –> 00:18:51,600
Known failure states.

437
00:18:51,600 –> 00:18:53,920
The stuff that incident reviews actually understand.
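The idempotency property is worth seeing concretely. A minimal in-memory sketch; a real implementation would persist the keys in a durable store:

```python
# Idempotent execution: running the same request twice must not duplicate
# side effects. The idempotency key identifies the logical request.
_completed: dict = {}

def execute_once(idempotency_key: str, action) -> str:
    """Run the action at most once per key; repeats return the cached result."""
    if idempotency_key in _completed:
        return _completed[idempotency_key]   # "run it again" is now safe
    result = action()                        # the single real side effect
    _completed[idempotency_key] = result
    return result
```

Pair this with typed inputs and explicit timeouts and you get the boring, reviewable execution layer described above.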

438
00:18:53,920 –> 00:18:58,560
The agent should not execute by narrating and then doing ad hoc tool calls until it feels

439
00:18:58,560 –> 00:18:59,560
done.

440
00:18:59,560 –> 00:19:01,640
That’s conversational chaos disguised as helpfulness.

441
00:19:01,640 –> 00:19:05,760
Instead execution runs through components you can version, test and observe.

442
00:19:05,760 –> 00:19:07,000
You can run them in isolation.

443
00:19:07,000 –> 00:19:08,080
You can gate releases.

444
00:19:08,080 –> 00:19:09,080
You can roll back.

445
00:19:09,080 –> 00:19:12,040
You can prove exactly what happened when an action mutated state.

446
00:19:12,040 –> 00:19:13,720
And finally, record.

447
00:19:13,720 –> 00:19:17,560
This is the stage most teams skip, and it’s why trust evaporates.

448
00:19:17,560 –> 00:19:21,320
A system that can’t write its outcomes and rationale to a system of record isn’t an

449
00:19:21,320 –> 00:19:22,320
enterprise system.

450
00:19:22,320 –> 00:19:24,680
It’s a suggestion engine with side effects.

451
00:19:24,680 –> 00:19:28,680
Record means capture the event, the intent, the key inputs, the policy decision, the tools

452
00:19:28,680 –> 00:19:32,120
invoked, the outcome and the handoff if it escalated.
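A record entry can be as simple as one structured write per action. A hedged sketch; the field names are assumptions, not a ServiceNow schema:

```python
import json
import datetime

# Sketch of the record stage: every agent-initiated action lands as one
# structured entry in the system of record. Field names are illustrative.
def build_record(event, intent, inputs, policy_decision, tools, outcome, handoff=None):
    """Serialize what happened, why, and who handled any escalation."""
    return json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "event": event,
        "intent": intent,
        "inputs": inputs,
        "policy_decision": policy_decision,
        "tools_invoked": tools,
        "outcome": outcome,
        "handoff": handoff,
    })
```

The point is not the serialization; it is that the entry exists by default, so an audit reconstructs a chain instead of a chat transcript.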

453
00:19:32,120 –> 00:19:34,400
ServiceNow is a common center of gravity here.

454
00:19:34,400 –> 00:19:38,560
Not because it’s magic, but because it already represents state ownership and auditability

455
00:19:38,560 –> 00:19:39,560
for work.

456
00:19:39,560 –> 00:19:40,880
It’s where work becomes accountable.

457
00:19:40,880 –> 00:19:44,120
The record is also how you measure ROI without lying to yourself.

458
00:19:44,120 –> 00:19:46,440
Not “users liked it”.

459
00:19:46,440 –> 00:19:52,440
Actual deltas: cycle time, escalation reduction, rework rate, exception volume, manual intervention.
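Those deltas are computable directly from the record. A small sketch, assuming each record carries a cycle_time and an escalated flag (hypothetical field names):

```python
# ROI as measured deltas from the system of record, not sentiment.
# Each record dict is assumed to carry "cycle_time" and "escalated".
def roi_deltas(before: list, after: list) -> dict:
    """Compare two periods of records and return the operational deltas."""
    def avg_cycle(records):
        return sum(r["cycle_time"] for r in records) / len(records)
    def escalation_rate(records):
        return sum(r["escalated"] for r in records) / len(records)
    return {
        "cycle_time_delta": avg_cycle(before) - avg_cycle(after),
        "escalation_delta": escalation_rate(before) - escalation_rate(after),
    }
```

If the agent never writes records, none of this is computable, which is exactly why the record stage is where ROI claims live or die.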

460
00:19:52,440 –> 00:19:54,200
So this model does one crucial thing.

461
00:19:54,200 –> 00:19:58,320
It moves the agent from being a conversational blob to being a controlled pipeline.

462
00:19:58,320 –> 00:20:00,120
Humans can still type messy requests.

463
00:20:00,120 –> 00:20:03,840
The model can still reason, but action happens only through contracts and evidence lands

464
00:20:03,840 –> 00:20:06,000
in systems that governance already understands.

465
00:20:06,000 –> 00:20:08,040
Now the uncomfortable part.

466
00:20:08,040 –> 00:20:12,320
Once you build this way, you’ll notice how many popular agent designs are the exact opposite

467
00:20:12,320 –> 00:20:14,520
and they fail in the same three ways every time.

468
00:20:14,520 –> 00:20:18,840
The anti-patterns: three ways enterprises build conversational chaos.

469
00:20:18,840 –> 00:20:22,760
Now you can spot the anti-patterns instantly because they all violate the same law.

470
00:20:22,760 –> 00:20:26,440
They collapse decisioning and execution back into chat, then pretend governance will catch

471
00:20:26,440 –> 00:20:27,440
up later.

472
00:20:27,440 –> 00:20:29,640
There are three versions of this.

473
00:20:29,640 –> 00:20:32,040
Enterprises rotate through them like it’s a maturity model.

474
00:20:32,040 –> 00:20:33,040
It isn’t.

475
00:20:33,040 –> 00:20:35,440
It’s just three different ways to automate ambiguity.

476
00:20:35,440 –> 00:20:36,440
Anti-pattern 1.

477
00:20:36,440 –> 00:20:37,640
Decide while you talk.

478
00:20:37,640 –> 00:20:40,920
This is the agent that narrates a plan and commits actions in the same breath.

479
00:20:40,920 –> 00:20:44,560
The user asks, “Can you offboard this contractor?”

480
00:20:44,560 –> 00:20:49,600
The agent starts explaining the steps, but while it’s explaining, it’s also calling tools.

481
00:20:49,600 –> 00:20:54,080
Disabling the account, revoking sessions, removing licenses, closing access groups, updating

482
00:20:54,080 –> 00:20:55,240
a ticket.

483
00:20:55,240 –> 00:21:00,720
It feels efficient because it keeps the conversation moving, but architecturally it’s catastrophic.

484
00:21:00,720 –> 00:21:04,080
Because the system has no hard boundary between thinking and doing.

485
00:21:04,080 –> 00:21:05,720
The plan becomes execution.

486
00:21:05,720 –> 00:21:09,480
And once the agent can do partial work, you get the worst possible failure mode: a half-

487
00:21:09,480 –> 00:21:12,800
completed state change with a polite summary at the end.

488
00:21:12,800 –> 00:21:17,160
In incident terms, this is how you end up with account disabled but mailbox still accessible,

489
00:21:17,160 –> 00:21:22,040
ticket updated but approvals missing, access removed in one system but not the other.

490
00:21:22,040 –> 00:21:24,840
Then the user asks, wait, did it actually do it?

491
00:21:24,840 –> 00:21:28,800
And no one has a clean answer because the conversation transcript is not an audit log.

492
00:21:28,800 –> 00:21:33,760
The enterprise requires a commit point, a confirmation, a transaction boundary.

493
00:21:33,760 –> 00:21:37,640
If the agent can’t separate “I understand what you want” from “I am now changing state,”

494
00:21:37,640 –> 00:21:38,880
you don’t have automation.

495
00:21:38,880 –> 00:21:41,200
You have a probabilistic operator.
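The commit point can be enforced structurally: planning is pure, and state change happens only behind an explicit confirmation. A minimal sketch with hypothetical step names:

```python
# Hard boundary between thinking and doing: the plan has no side effects,
# and nothing mutates state until the commit point is crossed.
def plan_offboarding(user: str) -> list:
    """'I understand what you want': pure planning, no side effects."""
    return [f"disable_account:{user}",
            f"revoke_sessions:{user}",
            f"remove_licenses:{user}"]

def commit(plan: list, confirmed: bool, run_step) -> list:
    """'I am now changing state': executes only after explicit confirmation."""
    if not confirmed:
        return []               # nothing happened; the plan is just text
    return [run_step(step) for step in plan]
```

The narration can show the plan to the user; only the confirmed commit call is allowed to call tools.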

496
00:21:41,200 –> 00:21:43,440
Anti-pattern 2: retrieval equals reasoning.

497
00:21:43,440 –> 00:21:46,040
This is the agent that gets praised because it knows a lot.

498
00:21:46,040 –> 00:21:50,200
It has SharePoint, it has PDFs, it has a knowledge base, it can search, it can quote internal

499
00:21:50,200 –> 00:21:53,880
docs, and the team assumes that because it can retrieve context, it can make operational

500
00:21:53,880 –> 00:21:54,880
decisions.

501
00:21:54,880 –> 00:21:56,640
But retrieval is not reasoning.

502
00:21:56,640 –> 00:21:58,120
Retrieval is just context scavenging.

503
00:21:58,120 –> 00:22:02,360
In other words, the agent can find text that looks relevant, but it still has to decide

504
00:22:02,360 –> 00:22:06,240
what that text means, whether it applies, whether it’s current, whether the user is allowed

505
00:22:06,240 –> 00:22:11,400
to act on it, and whether the text implies an executable workflow or just a guideline.

506
00:22:11,400 –> 00:22:13,080
And here’s the weird part.

507
00:22:13,080 –> 00:22:17,600
Retrieval makes agents look more confident while being less safe because now the agent can

508
00:22:17,600 –> 00:22:20,240
anchor a wrong decision to a real paragraph.

509
00:22:20,240 –> 00:22:24,640
It can cite a policy snippet that is outdated, incomplete, or scoped to a different business

510
00:22:24,640 –> 00:22:25,640
unit.

511
00:22:25,640 –> 00:22:28,320
The output feels grounded, the decision is still unbounded.

512
00:22:28,320 –> 00:22:32,160
This is where audit risk quietly grows. If the organization can’t attribute which source,

513
00:22:32,160 –> 00:22:35,880
which version, which section, and which rule actually governed the action, then “we retrieved

514
00:22:35,880 –> 00:22:39,560
something” becomes a liability, not a control.

515
00:22:39,560 –> 00:22:40,720
Retrieval supports reasoning.

516
00:22:40,720 –> 00:22:43,080
It does not replace it.

517
00:22:43,080 –> 00:22:44,560
Anti-pattern 3.

518
00:22:44,560 –> 00:22:48,360
Prompt branching logic that nobody can explain after the third exception.

519
00:22:48,360 –> 00:22:52,880
This is the agent that starts as a clean pilot, a few intents, a few flows, a few prompts.

520
00:22:52,880 –> 00:22:55,240
Then reality shows up: someone needs an exception.

521
00:22:55,240 –> 00:22:59,240
Then another, then a special case for one region, then a temporary bypass because a connector

522
00:22:59,240 –> 00:23:02,120
is flaky, then a workaround because the knowledge source isn’t updated yet.

523
00:23:02,120 –> 00:23:05,080
No big deal, so the team keeps adding conditional text.

524
00:23:05,080 –> 00:23:08,560
More instructions, more “if the user says X, do Y, unless Z”.

525
00:23:08,560 –> 00:23:12,240
The logic lives in prompts, topic nodes, and scattered tool descriptions.

526
00:23:12,240 –> 00:23:16,240
It’s not versioned like a workflow, it’s not tested like software, it’s not even readable

527
00:23:16,240 –> 00:23:17,640
as a policy document.

528
00:23:17,640 –> 00:23:20,000
Over time, the agent becomes an entropy museum.

529
00:23:20,000 –> 00:23:21,960
Every workaround preserved forever.

530
00:23:21,960 –> 00:23:25,440
And the symptom pattern is consistent. First, manual overrides.

531
00:23:25,440 –> 00:23:27,880
People rerun the workflow just to be safe.

532
00:23:27,880 –> 00:23:29,800
Second, inconsistent approvals.

533
00:23:29,800 –> 00:23:34,000
The same request goes down different paths because the branching rules depend on phrasing

534
00:23:34,000 –> 00:23:35,320
and context chunks.

535
00:23:35,320 –> 00:23:37,480
Third, silent failure modes.

536
00:23:37,480 –> 00:23:41,800
The agent times out, falls back, or partially completes actions while telling the user it

537
00:23:41,800 –> 00:23:42,960
handled it.

538
00:23:42,960 –> 00:23:45,840
This is how trust dies, not loudly, quietly.

539
00:23:45,840 –> 00:23:49,440
Users stop escalating issues because they stop expecting the agent to behave.

540
00:23:49,440 –> 00:23:53,760
They route around it. They go back to email, Teams messages, and tribal knowledge.

541
00:23:53,760 –> 00:23:55,800
Leadership still thinks there’s an agent program.

542
00:23:55,800 –> 00:23:59,760
In reality, there’s an abandoned chat surface connected to production systems.

543
00:23:59,760 –> 00:24:04,240
If you want a quick diagnostic when an agent fails, ask where the decision boundary lives.

544
00:24:04,240 –> 00:24:07,160
If it lives in conversation, you build conversational chaos.

545
00:24:07,160 –> 00:24:11,960
If it lives in contracts, orchestration, and a system of record, you build something that

546
00:24:11,960 –> 00:24:13,480
can survive scale.

547
00:24:13,480 –> 00:24:17,200
Now it’s time to show what contracts-first looks like when a regulated enterprise does

548
00:24:17,200 –> 00:24:18,800
it on purpose.

549
00:24:18,800 –> 00:24:22,400
The success case, regulated enterprise that started with contracts.

550
00:24:22,400 –> 00:24:25,880
A regulated enterprise doesn’t get agents right because they’re smarter.

551
00:24:25,880 –> 00:24:29,000
They get it right because the environment punishes improvisation.

552
00:24:29,000 –> 00:24:32,720
If you’re in financial services, pharma, aerospace, or regulated manufacturing, you don’t get

553
00:24:32,720 –> 00:24:35,640
to hide behind “the model did something weird”.

554
00:24:35,640 –> 00:24:38,040
You either produce evidence or you produce risk.

555
00:24:38,040 –> 00:24:41,560
So the teams that succeed don’t start by asking, “What can Copilot do?”

556
00:24:41,560 –> 00:24:45,400
They start by asking, “What are we allowed to delegate, under what constraints, with what

557
00:24:45,400 –> 00:24:46,400
proof?”

558
00:24:46,400 –> 00:24:48,920
That distinction matters because it flips the build order.

559
00:24:48,920 –> 00:24:50,880
The success pattern looks boring on purpose.

560
00:24:50,880 –> 00:24:52,920
It starts with contracts and identity.

561
00:24:52,920 –> 00:24:54,480
Only later does it add conversation.

562
00:24:54,480 –> 00:24:58,320
In one program like this, the initial scope wasn’t automate everything.

563
00:24:58,320 –> 00:25:02,480
It was one value stream that already had a system of record and a known escalation path.

564
00:25:02,480 –> 00:25:05,840
Think service requests, approvals, or control changes.

565
00:25:05,840 –> 00:25:08,400
The agent didn’t own the process.

566
00:25:08,400 –> 00:25:09,920
It owned a narrow slice.

567
00:25:09,920 –> 00:25:15,360
Intake normalization, parameter validation, routing, and execution of a fixed set of actions,

568
00:25:15,360 –> 00:25:18,360
and the first architectural decision was the one most pilots ignore.

569
00:25:18,360 –> 00:25:19,840
Who does this run as?

570
00:25:19,840 –> 00:25:22,040
They made the identity model explicit up front.

571
00:25:22,040 –> 00:25:24,400
User-delegated actions stayed user-delegated.

572
00:25:24,400 –> 00:25:27,880
Service-executed actions ran under dedicated identities with least privilege.

573
00:25:27,880 –> 00:25:32,320
No shared maker credentials, no temporary admin access “because it was faster”.

574
00:25:32,320 –> 00:25:36,680
Every permission was treated as a contract, not a convenience, because in regulated environments

575
00:25:36,680 –> 00:25:40,280
identity drift becomes audit drift fast.

576
00:25:40,280 –> 00:25:42,800
The second decision was context architecture.

577
00:25:42,800 –> 00:25:45,520
They didn’t attach SharePoint and pray.

578
00:25:45,520 –> 00:25:49,200
They treated knowledge like a product with owners, life cycle and versioning.

579
00:25:49,200 –> 00:25:50,880
Policies had effective dates.

580
00:25:50,880 –> 00:25:52,880
Operating procedures had controlled revisions.

581
00:25:52,880 –> 00:25:56,640
If the agent referenced content, it had to be attributable to a source that governance

582
00:25:56,640 –> 00:25:58,520
already recognized as authoritative.

583
00:25:58,520 –> 00:26:01,440
That one move eliminated a whole class of failures.

584
00:26:01,440 –> 00:26:04,600
The agent quoting plausible but obsolete guidance.

585
00:26:04,600 –> 00:26:05,760
Then came tools.

586
00:26:05,760 –> 00:26:06,760
And here’s the subtle win.

587
00:26:06,760 –> 00:26:09,720
They separated reasoning from execution by design.

588
00:26:09,720 –> 00:26:11,440
The model did the reasoning in Copilot.

589
00:26:11,440 –> 00:26:15,800
It extracted intent, detected ambiguity and assembled a structured request.

590
00:26:15,800 –> 00:26:20,080
But the actual work, creating records, updating status, routing approvals, ran through

591
00:26:20,080 –> 00:26:21,920
deterministic workflows.

592
00:26:21,920 –> 00:26:26,360
Power platform handled the execution layer because it already supports the things enterprises

593
00:26:26,360 –> 00:26:28,120
need to stay sane.

594
00:26:28,120 –> 00:26:34,600
Typed inputs, idempotency patterns, retries, timeouts, and predictable failure states.

595
00:26:34,600 –> 00:26:38,600
Copilot didn’t wing it with a string of tool calls until it felt finished.

596
00:26:38,600 –> 00:26:40,680
It invoked defined contracts.

597
00:26:40,680 –> 00:26:43,560
And for the system of record, they didn’t try to make Copilot the ledger.

598
00:26:43,560 –> 00:26:46,560
They used ServiceNow, or an equivalent, as the state authority.

599
00:26:46,560 –> 00:26:49,160
Every agent initiated action wrote an entry.

600
00:26:49,160 –> 00:26:53,240
What was requested, what was executed, which workflow ran, what the outcome was, and

601
00:26:53,240 –> 00:26:57,120
what escalation occurred if something didn’t meet preconditions.

602
00:26:57,120 –> 00:27:00,040
So when someone asked, “Why did this ticket get approved?”

603
00:27:00,040 –> 00:27:02,080
The answer wasn’t a chat transcript and a shrug.

604
00:27:02,080 –> 00:27:03,760
It was a reconstructable chain.

605
00:27:03,760 –> 00:27:08,200
Now, the measured outcomes in these environments tend to look unexciting in a keynote and

606
00:27:08,200 –> 00:27:10,840
extremely valuable in operations.

607
00:27:10,840 –> 00:27:14,240
Escalation loops dropped because the agent didn’t bounce between teams based on vibe.

608
00:27:14,240 –> 00:27:16,040
It routed based on contract.

609
00:27:16,040 –> 00:27:19,480
Approvals became predictable because the agent didn’t invent preconditions.

610
00:27:19,480 –> 00:27:21,040
It validated them.

611
00:27:21,040 –> 00:27:25,680
And audits stopped being theatrical because evidence existed by default, not as a retroactive

612
00:27:25,680 –> 00:27:26,680
scramble.

613
00:27:26,680 –> 00:27:28,720
The most important outcome wasn’t time saved.

614
00:27:28,720 –> 00:27:32,920
It was trust preserved because once users see an agent behave consistently, same intent,

615
00:27:32,920 –> 00:27:35,360
same path, they stop treating it like a novelty.

616
00:27:35,360 –> 00:27:37,240
They start treating it like a service.

617
00:27:37,240 –> 00:27:41,360
And services can be operated, monitored, improved, versioned, and scaled.

618
00:27:41,360 –> 00:27:42,560
Here’s what most people miss.

619
00:27:42,560 –> 00:27:45,120
This wasn’t a Copilot Studio success story.

620
00:27:45,120 –> 00:27:48,240
It was an architecture success story implemented with Copilot Studio.

621
00:27:48,240 –> 00:27:52,600
Same platform, same connectors, same models, different sequence, different constraints, different

622
00:27:52,600 –> 00:27:53,960
definition of done.

623
00:27:53,960 –> 00:27:57,680
And that’s why regulated enterprises often look like they’re moving slower in month one

624
00:27:57,680 –> 00:28:01,720
and then suddenly they’re the only ones with agents that survive month six.

625
00:28:01,720 –> 00:28:06,680
Because they built the boring parts first, contracts, identity boundaries, and systems of

626
00:28:06,680 –> 00:28:08,280
record.

627
00:28:08,280 –> 00:28:09,560
Conversation came last.

628
00:28:09,560 –> 00:28:12,880
Which is exactly why the opposite build order failed so reliably.

629
00:28:12,880 –> 00:28:17,080
Because when you start with chat, you end with a trust problem you can’t patch later.

630
00:28:17,080 –> 00:28:18,560
The failure case.

631
00:28:18,560 –> 00:28:21,120
The chat-first agent that lost trust quietly.

632
00:28:21,120 –> 00:28:25,680
Now for the contrast case, same platform, same shiny features, completely different outcome.

633
00:28:25,680 –> 00:28:30,160
This one usually starts with a sentence like, “We just need an agent that helps people.”

634
00:28:30,160 –> 00:28:34,920
No outcome definition, no bounded scope, no contracts, just a chat surface and optimism.

635
00:28:34,920 –> 00:28:37,800
So a small team spins up an agent in Copilot Studio.

636
00:28:37,800 –> 00:28:40,920
They give it broad instructions: be helpful, be concise, follow policy.

637
00:28:40,920 –> 00:28:45,440
They attach a pile of SharePoint sites because more knowledge is better.

638
00:28:45,440 –> 00:28:49,520
They connect a few tools because the demo needs motion: create a ticket, update a record,

639
00:28:49,520 –> 00:28:50,840
maybe send an email.

640
00:28:50,840 –> 00:28:54,480
And because they want adoption, they deploy it in Teams, where everyone already lives.

641
00:28:54,480 –> 00:28:57,120
The first week looks great, people ask simple questions.

642
00:28:57,120 –> 00:28:58,160
The agent answers.

643
00:28:58,160 –> 00:28:59,880
It finds the right doc more often than not.

644
00:28:59,880 –> 00:29:01,440
It creates a few tickets.

645
00:29:01,440 –> 00:29:02,480
Leadership gets a screenshot.

646
00:29:02,480 –> 00:29:04,160
The project gets declared a win.

647
00:29:04,160 –> 00:29:05,400
Then the environment shows up.

648
00:29:05,400 –> 00:29:07,920
A user asks the same question in a slightly different way.

649
00:29:07,920 –> 00:29:09,760
The agent routes to a different topic.

650
00:29:09,760 –> 00:29:13,480
Or it doesn’t route at all and falls back to a generative answer that sounds plausible.

651
00:29:13,480 –> 00:29:17,160
Someone pastes a ticket thread into the chat and the agent quietly times out.

652
00:29:17,160 –> 00:29:19,960
Someone else asks for an action that should be allowed.

653
00:29:19,960 –> 00:29:23,440
But the connector runs under a different identity model than anyone documented.

654
00:29:23,440 –> 00:29:24,520
So it fails.

655
00:29:24,520 –> 00:29:26,640
And the agent masks it with language.

656
00:29:26,640 –> 00:29:28,960
This is where trust starts eroding.

657
00:29:28,960 –> 00:29:33,280
Not from one catastrophic incident, but from tiny inconsistencies that accumulate.

658
00:29:33,280 –> 00:29:36,080
The most damaging moments are the ones that look like success.

659
00:29:36,080 –> 00:29:38,960
The agent says “done”, but the record didn’t change.

660
00:29:38,960 –> 00:29:42,880
Or it created the record, but missed a required field, so downstream automation didn’t

661
00:29:42,880 –> 00:29:43,880
run.

662
00:29:43,880 –> 00:29:47,560
Or it updated the wrong object because two systems have similar names and the tool schema

663
00:29:47,560 –> 00:29:49,000
wasn’t explicit enough.

664
00:29:49,000 –> 00:29:51,000
In a chat interface, that’s just one more message.

665
00:29:51,000 –> 00:29:52,800
In operations, it’s rework.

666
00:29:52,800 –> 00:29:54,080
So humans compensate.

667
00:29:54,080 –> 00:29:56,240
They start double checking everything the agent does.

668
00:29:56,240 –> 00:29:57,560
They rerun steps manually.

669
00:29:57,560 –> 00:30:01,640
They keep the agent open as a suggestion box, but they stop letting it touch systems.

670
00:30:01,640 –> 00:30:04,240
The intervention rate climbs, but nobody measures it.

671
00:30:04,240 –> 00:30:08,080
The agent’s usage metric looks fine for a while because people still chat with it.

672
00:30:08,080 –> 00:30:09,560
They just don’t trust it.

673
00:30:09,560 –> 00:30:11,080
Then comes the second phase.

674
00:30:11,080 –> 00:30:12,080
Drift.

675
00:30:12,080 –> 00:30:13,080
Drift.

676
00:30:13,080 –> 00:30:14,080
Drift.

677
00:30:14,080 –> 00:30:15,080
Drift.

678
00:30:15,080 –> 00:30:16,080
Drift.

679
00:30:16,080 –> 00:30:17,080
Drift.

680
00:30:17,080 –> 00:30:18,080
Drift.

681
00:30:18,080 –> 00:30:19,080
Drift.

682
00:30:19,080 –> 00:30:20,080
Drift.

683
00:30:20,080 –> 00:30:21,080
Drift.

684
00:30:21,080 –> 00:30:22,080
Drift.

685
00:30:22,080 –> 00:30:23,080
Drift.

686
00:30:23,080 –> 00:30:24,080
Drift.

687
00:30:24,080 –> 00:30:25,080
Drift.

688
00:30:25,080 –> 00:30:26,080
Drift.

689
00:30:26,080 –> 00:30:27,080
Drift.

690
00:30:27,080 –> 00:30:28,080
Drift.

691
00:30:28,080 –> 00:30:29,080
Drift.

692
00:30:29,080 –> 00:30:30,080
Drift.

693
00:30:30,080 –> 00:30:33,920
Over a couple months, the agent becomes harder to predict even for its builders.

694
00:30:33,920 –> 00:30:36,280
Now add normal platform reality.

695
00:30:36,280 –> 00:30:41,600
Connector throttling, model updates, content indexing changes, permission adjustments, conditional

696
00:30:41,600 –> 00:30:43,080
access shifts.

697
00:30:43,080 –> 00:30:45,720
Nothing dramatic, just the slow churn of a real tenant.

698
00:30:45,720 –> 00:30:49,720
The agent starts behaving differently on Mondays than it did on Fridays, and the team can’t

699
00:30:49,720 –> 00:30:54,320
tell if that’s AI being AI or a dependency drifting underneath it.

700
00:30:54,320 –> 00:30:56,120
Users don’t file bug reports for that.

701
00:30:56,120 –> 00:30:57,360
They route around it.

702
00:30:57,360 –> 00:31:00,080
And this is the quiet death that most agent programs misread.

703
00:31:00,080 –> 00:31:01,560
The agent doesn’t fail loudly.

704
00:31:01,560 –> 00:31:02,720
It becomes optional.

705
00:31:02,720 –> 00:31:04,480
People stop recommending it to new hires.

706
00:31:04,480 –> 00:31:05,880
Managers stop pointing teams to it.

707
00:31:05,880 –> 00:31:07,360
The team’s channel gets less traffic.

708
00:31:07,360 –> 00:31:08,600
The agent still exists.

709
00:31:08,600 –> 00:31:11,000
The organization tells itself it’s early days.

710
00:31:11,000 –> 00:31:12,560
What actually happened is simpler.

711
00:31:12,560 –> 00:31:17,120
The system never earned the right to be trusted because its design never made outcomes repeatable.

712
00:31:17,120 –> 00:31:21,440
And if you ask the team afterward what went wrong, you’ll hear the comforting story again.

713
00:31:21,440 –> 00:31:23,160
The platform is immature.

714
00:31:23,160 –> 00:31:24,800
Or users weren’t trained.

715
00:31:24,800 –> 00:31:26,080
Or we need better prompts.

716
00:31:26,080 –> 00:31:27,560
But the pattern is architectural.

717
00:31:27,560 –> 00:31:30,560
They deployed conversation where they needed a control plane.

718
00:31:30,560 –> 00:31:31,920
They let intents stay vague.

719
00:31:31,920 –> 00:31:34,520
They let tool selection be improvisational.

720
00:31:34,520 –> 00:31:36,720
They treated knowledge sprawl as context.

721
00:31:36,720 –> 00:31:39,720
They skipped a system of record for decisions and outcomes.

722
00:31:39,720 –> 00:31:41,440
They built a chat-shaped workflow.

723
00:31:41,440 –> 00:31:43,560
Then acted surprised when it behaved like one.

724
00:31:43,560 –> 00:31:46,800
So the contrast case isn’t a warning about Copilot Studio.

725
00:31:46,800 –> 00:31:48,440
It’s a warning about build order.

726
00:31:48,440 –> 00:31:51,640
Chat-first feels fast because it ships a surface.

727
00:31:51,640 –> 00:31:54,200
Architecture-first feels slower because it ships constraints.

728
00:31:54,200 –> 00:31:58,840
But only one of those survives contact with 30,000 users, three business units and a security

729
00:31:58,840 –> 00:32:02,720
team that eventually notices you automated the weakest path in the tenant.

730
00:32:02,720 –> 00:32:03,720
And that’s the lesson.

731
00:32:03,720 –> 00:32:04,720
Same tools.

732
00:32:04,720 –> 00:32:05,720
Different architecture.

733
00:32:05,720 –> 00:32:06,520
Different reality.

734
00:32:06,520 –> 00:32:08,200
The question becomes actionable.

735
00:32:08,200 –> 00:32:11,440
If Monday morning is real, where does an enterprise start?

736
00:32:11,440 –> 00:32:16,080
Before the next helpful agent becomes another quietly abandoned Teams tab.

737
00:32:16,080 –> 00:32:17,720
Monday mandate part one.

738
00:32:17,720 –> 00:32:18,720
Start with outcomes.

739
00:32:18,720 –> 00:32:19,720
Not use cases.

740
00:32:19,720 –> 00:32:23,320
So Monday morning, if you want this to stop being theater, you start with outcomes.

741
00:32:23,320 –> 00:32:24,320
Not use cases.

742
00:32:24,320 –> 00:32:25,320
Not ideas.

743
00:32:25,320 –> 00:32:27,320
Not what Copilot Studio can do.

744
00:32:27,320 –> 00:32:28,320
Outcomes.

745
00:32:28,320 –> 00:32:31,320
Because use cases are how organizations hide from accountability.

746
00:32:31,320 –> 00:32:32,320
A use case can be anything.

747
00:32:32,320 –> 00:32:33,320
It can be a demo.

748
00:32:33,320 –> 00:32:34,320
It can be a chatbot.

749
00:32:34,320 –> 00:32:37,320
It can be a SharePoint search box with better manners.

750
00:32:37,320 –> 00:32:41,880
And it can still produce exactly zero operational improvement while everyone nods like progress

751
00:32:41,880 –> 00:32:42,880
happened.

752
00:32:42,880 –> 00:32:43,880
An outcome is different.

753
00:32:43,880 –> 00:32:46,040
An outcome forces a before and after.

754
00:32:46,040 –> 00:32:47,400
Cycle time moves.

755
00:32:47,400 –> 00:32:48,960
Escalation rate drops.

756
00:32:48,960 –> 00:32:49,960
Throughput increases.

757
00:32:49,960 –> 00:32:51,760
Rework decreases.

758
00:32:51,760 –> 00:32:53,960
Compliance evidence becomes easier to produce.

759
00:32:53,960 –> 00:32:54,960
Those are outcomes.

760
00:32:54,960 –> 00:32:58,640
They’re measurable deltas that survive outside the agent team’s slide deck.

761
00:32:58,640 –> 00:33:00,640
And the first thing leadership has to accept is this.

762
00:33:00,640 –> 00:33:03,200
If the outcome can’t be measured, the agent isn’t a product.

763
00:33:03,200 –> 00:33:04,600
It’s a distraction.

764
00:33:04,600 –> 00:33:08,120
So define the outcome in the language the business already understands.

765
00:33:08,120 –> 00:33:12,920
For service and operations, mean time to resolution, first contact resolution, deflection

766
00:33:12,920 –> 00:33:17,920
rate with strict definitions, ticket reopen rate, approval time, exception volume.

767
00:33:17,920 –> 00:33:24,800
For sales, qualified opportunities, time to quote, proposal cycle time, RFP throughput.

768
00:33:24,800 –> 00:33:30,040
For HR, onboarding completion time, case backlog, handoff count, error rate in payroll or benefits

769
00:33:30,040 –> 00:33:35,160
changes, not vanity metrics like number of conversations, and not sentiment metrics like

770
00:33:35,160 –> 00:33:39,640
users said it was helpful, unless helpful is tied to a measured operational delta.
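One way to keep “helpful” tied to a measured operational delta is to record each outcome as a baseline/target pair and compute the movement. A minimal sketch, assuming hypothetical metric names and thresholds (none of these values come from the episode):

```python
from dataclasses import dataclass

@dataclass
class OutcomeMetric:
    """An outcome is a measurable before/after delta, not a use case."""
    name: str
    baseline: float   # value before the agent shipped
    current: float    # value measured after
    target: float     # the delta leadership signed up for

    def delta(self) -> float:
        return self.current - self.baseline

    def met(self) -> bool:
        # A negative target means "drive the number down" (e.g. cycle time).
        if self.target < 0:
            return self.delta() <= self.target
        return self.delta() >= self.target

# Example: mean time to resolution should drop by at least 4 hours.
mttr = OutcomeMetric("mean_time_to_resolution_h", baseline=24.0, current=18.5, target=-4.0)
print(mttr.delta(), mttr.met())  # -5.5 True
```

If the metric can’t be expressed in this shape, it’s a sentiment signal, not an outcome.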

771
00:33:39,640 –> 00:33:42,800
Next, map the value stream. Not the tool chain, the value stream.

772
00:33:42,800 –> 00:33:44,440
Where does the decision actually happen?

773
00:33:44,440 –> 00:33:46,160
Where does state actually change?

774
00:33:46,160 –> 00:33:47,560
Where does risk concentrate?

775
00:33:47,560 –> 00:33:49,000
Where do humans get pulled into loops?

776
00:33:49,000 –> 00:33:50,480
Because the system doesn’t know what to do.

777
00:33:50,480 –> 00:33:52,680
This is where most agent programs get exposed.

778
00:33:52,680 –> 00:33:56,520
They pick the most visible pain point, not the most architecturally leverageable one.

779
00:33:56,520 –> 00:33:58,800
They automate questions because questions are easy.

780
00:33:58,800 –> 00:34:02,920
They avoid decisions because decisions create accountability, but ROI lives in decisions,

781
00:34:02,920 –> 00:34:05,960
specifically repeatable decisions with bounded variation.

782
00:34:05,960 –> 00:34:08,520
So you’re looking for work that has three properties.

783
00:34:08,520 –> 00:34:11,640
First, repeatable intent: the request shows up in recognizable forms.

784
00:34:11,640 –> 00:34:15,800
Users ask it a hundred times a week. It’s not a one-off “solve my unique situation.”

785
00:34:15,800 –> 00:34:17,400
It’s a recurring demand signal.

786
00:34:17,400 –> 00:34:19,000
Second, bounded variation.

787
00:34:19,000 –> 00:34:23,040
There are edge cases, but the majority of requests fall into a small number of shapes.

788
00:34:23,040 –> 00:34:25,920
You can enumerate them, you can define what normal looks like.

789
00:34:25,920 –> 00:34:29,920
If every request is a snowflake, you don’t have an agent opportunity, you have a consulting

790
00:34:29,920 –> 00:34:30,920
practice.

791
00:34:30,920 –> 00:34:32,680
Third, clear success criteria.

792
00:34:32,680 –> 00:34:34,800
You can say what done means without a meeting.

793
00:34:34,800 –> 00:34:36,720
The ticket is created with these fields.

794
00:34:36,720 –> 00:34:38,880
The approval is captured with these conditions.

795
00:34:38,880 –> 00:34:41,800
The record is updated and the downstream automation ran.

796
00:34:41,800 –> 00:34:44,840
If success is subjective, you’ll get subjective behavior.

797
00:34:44,840 –> 00:34:48,720
And subjective behavior is just another name for non-determinism.
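The three properties — repeatable intent, bounded variation, clear success criteria — can be expressed as a screening check before any build starts. A hypothetical sketch; the thresholds and candidate names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class WorkflowCandidate:
    name: str
    weekly_volume: int          # recurring demand signal
    known_shapes: int           # enumerable request variants
    snowflake_ratio: float      # fraction of requests fitting no known shape
    success_is_objective: bool  # "done" is definable without a meeting

def agent_worthy(w: WorkflowCandidate) -> bool:
    repeatable = w.weekly_volume >= 50                     # threshold is illustrative
    bounded = w.known_shapes <= 10 and w.snowflake_ratio < 0.2
    return repeatable and bounded and w.success_is_objective

candidates = [
    WorkflowCandidate("password_reset", 400, 3, 0.05, True),
    WorkflowCandidate("bespoke_contract_review", 5, 40, 0.9, False),
]
print([c.name for c in candidates if agent_worthy(c)])  # ['password_reset']
```

Anything that fails the check is a consulting practice, not an agent opportunity.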

798
00:34:48,720 –> 00:34:50,360
Now once you’ve done that, you make a decision

799
00:34:50,360 –> 00:34:54,560
most teams avoid: are you building a discovery experience or an execution experience?

800
00:34:54,560 –> 00:34:56,480
Discovery experiences optimize for understanding.

801
00:34:56,480 –> 00:34:57,720
They tolerate ambiguity.

802
00:34:57,720 –> 00:35:01,400
They help users find the right policy, the right owner, the right option, the right next

803
00:35:01,400 –> 00:35:02,400
step.

804
00:35:02,400 –> 00:35:06,800
They can live in chat with relatively low risk because the output is guidance.

805
00:35:06,800 –> 00:35:09,000
Execution experiences optimize for changing state.

806
00:35:09,000 –> 00:35:11,920
They require contracts, validation, and systems of record.

807
00:35:11,920 –> 00:35:15,520
They can still use chat as an interface, but the design is dominated by control.

808
00:35:15,520 –> 00:35:16,840
You don’t mix these casually.

809
00:35:16,840 –> 00:35:21,200
If you do, you’ll end up with an agent that helps users and occasionally performs actions,

810
00:35:21,200 –> 00:35:22,960
which is the worst possible positioning.

811
00:35:22,960 –> 00:35:26,360
The actions will be blamed when they’re wrong and the help will be ignored when it’s

812
00:35:26,360 –> 00:35:27,360
right.

813
00:35:27,360 –> 00:35:28,360
So pick one per workflow.

814
00:35:28,360 –> 00:35:32,240
If the goal is execution, design the conversation to collect missing parameters and confirm

815
00:35:32,240 –> 00:35:33,240
intent.

816
00:35:33,240 –> 00:35:37,760
If the goal is discovery, explicitly refuse action and route to the appropriate system.

817
00:35:37,760 –> 00:35:42,120
Then you tie the outcome to an operating rhythm: who owns the metric, who reviews it weekly,

818
00:35:42,120 –> 00:35:45,400
who decides what changes when the metric drifts. Because the agent will drift.

819
00:35:45,400 –> 00:35:49,680
The only question is whether drift becomes silent decay or managed iteration.

820
00:35:49,680 –> 00:35:52,160
This is why starting with outcomes is not bureaucratic.

821
00:35:52,160 –> 00:35:54,600
It’s a control mechanism: it forces bounded scope.

822
00:35:54,600 –> 00:35:55,600
It forces measurement.

823
00:35:55,600 –> 00:35:57,160
It forces a system of record.

824
00:35:57,160 –> 00:36:00,960
It forces you to admit what you’re delegating and once outcomes are clear, the next part

825
00:36:00,960 –> 00:36:02,720
becomes non-negotiable.

826
00:36:02,720 –> 00:36:03,720
Boundaries.

827
00:36:03,720 –> 00:36:05,040
Not best practices.

828
00:36:05,040 –> 00:36:06,880
Not guidelines.

829
00:36:06,880 –> 00:36:10,920
Boundaries that define what the agent is allowed to do, what must be true before it does

830
00:36:10,920 –> 00:36:13,760
it and exactly where it must stop.

831
00:36:13,760 –> 00:36:15,480
Monday mandate part two.

832
00:36:15,480 –> 00:36:17,960
Intent contracts and decision boundaries.

833
00:36:17,960 –> 00:36:20,840
Once you’ve picked outcomes, you don’t brainstorm prompts.

834
00:36:20,840 –> 00:36:24,160
You write contracts because the agent doesn’t need permission to talk.

835
00:36:24,160 –> 00:36:25,640
It needs permission to act.

836
00:36:25,640 –> 00:36:29,600
An intent contract is the simplest artifact that forces that discipline.

837
00:36:29,600 –> 00:36:34,440
It’s a plain-language definition of one intent with a machine-enforceable boundary behind

838
00:36:34,440 –> 00:36:35,440
it.

839
00:36:35,440 –> 00:36:36,880
It answers five questions.

840
00:36:36,880 –> 00:36:38,080
What the intent is.

841
00:36:38,080 –> 00:36:39,720
What the agent is allowed to do.

842
00:36:39,720 –> 00:36:41,280
What inputs are required.

843
00:36:41,280 –> 00:36:44,120
What systems it may touch and what evidence it must produce.

844
00:36:44,120 –> 00:36:48,240
But if that sounds like paperwork, good paperwork is what keeps state change from becoming

845
00:36:48,240 –> 00:36:49,240
folklore.

846
00:36:49,240 –> 00:36:50,640
Start with the intent itself.

847
00:36:50,640 –> 00:36:55,240
The intent is a system operation, not a marketing feature: create service request,

848
00:36:55,240 –> 00:37:01,720
reset MFA method, generate RFP draft, submit onboarding task, update incident status.

849
00:37:01,720 –> 00:37:03,920
If you can’t name it crisply, it’s not an intent.

850
00:37:03,920 –> 00:37:05,480
It’s a category of vibes.

851
00:37:05,480 –> 00:37:08,600
Then define the allowed actions and write them like an allow list.

852
00:37:08,600 –> 00:37:10,360
Not “help with onboarding.”

853
00:37:10,360 –> 00:37:13,760
Instead: create a ServiceNow request using catalog item X.

854
00:37:13,760 –> 00:37:16,920
Assign the request to group Y based on parameter Z.

855
00:37:16,920 –> 00:37:20,240
Send notification to manager via teams using template T.

856
00:37:20,240 –> 00:37:22,440
The agent can only do what’s enumerated.

857
00:37:22,440 –> 00:37:24,880
Everything else becomes refusal or escalation by design.
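The five questions of an intent contract map directly onto a small data structure, with the allow list enumerated rather than implied. A sketch under assumed names — the catalog item, group, and template identifiers are hypothetical placeholders, not real tenant objects:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntentContract:
    intent: str                       # what the intent is
    allowed_actions: tuple[str, ...]  # what the agent may do (allow list)
    required_inputs: tuple[str, ...]  # minimum parameter set
    systems: tuple[str, ...]          # what systems it may touch
    evidence: tuple[str, ...]         # what evidence it must produce

    def authorizes(self, action: str) -> bool:
        # Translation is not authorization: only enumerated actions pass.
        return action in self.allowed_actions

onboarding = IntentContract(
    intent="create_service_request",
    allowed_actions=(
        "create_request_catalog_item_X",
        "assign_to_group_Y",
        "notify_manager_template_T",
    ),
    required_inputs=("user", "effective_date", "cost_center"),
    systems=("ServiceNow", "Teams"),
    evidence=("record_id", "timestamp", "requesting_identity"),
)
print(onboarding.authorizes("delete_user"))  # False: refuse or escalate
```

The model can translate messy language into one of these intents, but only the contract decides what executes.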

858
00:37:24,880 –> 00:37:28,800
This is where teams usually push back with, but users won’t know what to ask for.

859
00:37:28,800 –> 00:37:29,800
That’s fine.

860
00:37:29,800 –> 00:37:33,800
The model can translate messy language into a known intent, but translation is not authorization.

861
00:37:33,800 –> 00:37:36,120
The intent contract is authorization.

862
00:37:36,120 –> 00:37:37,520
Next required inputs.

863
00:37:37,520 –> 00:37:40,720
Every action-capable intent has a minimum parameter set.

864
00:37:40,720 –> 00:37:43,360
If the parameters aren’t present, the agent doesn’t guess.

865
00:37:43,360 –> 00:37:48,680
It asks, and it asks narrowly. For example: which user, which system, which effective date,

866
00:37:48,680 –> 00:37:49,960
which cost center,

867
00:37:49,960 –> 00:37:52,640
which approver. These aren’t conversational flourishes.

868
00:37:52,640 –> 00:37:55,080
They are preconditions for safe execution.

869
00:37:55,080 –> 00:37:56,920
And the design move here is subtle.

870
00:37:56,920 –> 00:38:00,840
The agent should never request information it can deterministically retrieve.

871
00:38:00,840 –> 00:38:04,680
If Entra can provide the user’s ID, the agent shouldn’t ask the user to type it.

872
00:38:04,680 –> 00:38:09,560
If ServiceNow can return the ticket number from context, don’t make the user restate it.

873
00:38:09,560 –> 00:38:13,760
The only questions the agent asks are the missing fields required to execute the contract.

874
00:38:13,760 –> 00:38:18,200
That keeps the conversation surface minimal and keeps the system boundary explicit.

875
00:38:18,200 –> 00:38:19,520
Now define preconditions.

876
00:38:19,520 –> 00:38:23,120
Preconditions are the “must be true before tool invocation” rules.

877
00:38:23,120 –> 00:38:27,560
This is where most agent programs silently fail because they treat policies as documentation

878
00:38:27,560 –> 00:38:29,920
instead of runtime gates.

879
00:38:29,920 –> 00:38:32,040
Preconditions are executable checks.

880
00:38:32,040 –> 00:38:33,400
User has role X.

881
00:38:33,400 –> 00:38:34,960
Ticket is in state Y.

882
00:38:34,960 –> 00:38:37,640
Record exists and is owned by this business unit.

883
00:38:37,640 –> 00:38:42,200
MFA reset allowed only if identity risk score is below threshold.

884
00:38:42,200 –> 00:38:45,240
Approval must be present before status change.

885
00:38:45,240 –> 00:38:48,760
In a deterministic design, preconditions run before any state change.

886
00:38:48,760 –> 00:38:50,600
If a precondition fails, the agent stops.

887
00:38:50,600 –> 00:38:52,000
It does not attempt to work around it.

888
00:38:52,000 –> 00:38:53,480
It does not pick an adjacent tool.

889
00:38:53,480 –> 00:38:57,160
It does not rewrite the policy in friendly language and proceed anyway.

890
00:38:57,160 –> 00:38:59,560
It escalates, which means you need refusal rules.

891
00:38:59,560 –> 00:39:00,560
Refusal rules aren’t

892
00:39:00,560 –> 00:39:02,520
“the agent says no sometimes.”

893
00:39:02,520 –> 00:39:04,000
They are explicit boundaries.

894
00:39:04,000 –> 00:39:07,960
If the request touches privileged access, refuse and route to human approval.

895
00:39:07,960 –> 00:39:12,160
If the user lacks entitlement, refuse and provide the self-service path.

896
00:39:12,160 –> 00:39:15,600
If the context is ambiguous, refuse and ask for clarification.

897
00:39:15,600 –> 00:39:20,160
If the system of record is unavailable, refuse and create a fallback ticket.
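Preconditions as runtime gates and refusals as explicit routes can be sketched together: every check runs before any state change, and the first failure maps to a named fallback path. The check names and routes below mirror the examples above but are illustrative:

```python
# Preconditions are executable checks; refusal is a routed outcome, not a shrug.
def gate(request: dict) -> str:
    checks = [
        (request.get("privileged"), "route_to_human_approval"),
        (not request.get("entitled"), "provide_self_service_path"),
        (request.get("ambiguous"), "ask_for_clarification"),
        (not request.get("system_available", True), "create_fallback_ticket"),
    ]
    for failed, refusal_route in checks:
        if failed:
            return refusal_route  # stop here; never work around a failed gate
    return "execute"

print(gate({"privileged": False, "entitled": True}))  # execute
print(gate({"privileged": True, "entitled": True}))   # route_to_human_approval
print(gate({"privileged": False, "entitled": True, "system_available": False}))
```

The point of the design is that a failed precondition never reaches the tool layer, no matter how the conversation went.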

898
00:39:20,160 –> 00:39:21,160
Refusal is a feature.

899
00:39:21,160 –> 00:39:23,160
It’s the control plane proving it exists.

900
00:39:23,160 –> 00:39:27,720
Then comes separation of concerns because contracts only work when the architecture respects

901
00:39:27,720 –> 00:39:28,720
them.

902
00:39:28,720 –> 00:39:30,840
Intent capture lives in the conversation layer.

903
00:39:30,840 –> 00:39:32,880
Decisioning lives in the orchestration layer.

904
00:39:32,880 –> 00:39:34,680
Execution lives in deterministic tools.

905
00:39:34,680 –> 00:39:36,120
Recording lives in the system of record.

906
00:39:36,120 –> 00:39:40,360
If you collapse those layers back into a chat transcript, you’ll get the same failure again,

907
00:39:40,360 –> 00:39:41,640
just with nicer phrasing.

908
00:39:41,640 –> 00:39:44,480
So on Monday, you don’t start by improving the agent.

909
00:39:44,480 –> 00:39:48,680
You start by writing three to five intent contracts for one outcome and you enforce them

910
00:39:48,680 –> 00:39:52,600
with decision boundaries the model cannot talk its way around.

911
00:39:52,600 –> 00:39:54,400
And you’ll notice something immediately.

912
00:39:54,400 –> 00:39:58,480
The agent becomes less magical and more useful because it becomes predictable.

913
00:39:58,480 –> 00:40:02,600
Now contracts alone still don’t save you because contracts without identity discipline

914
00:40:02,600 –> 00:40:04,280
are just aspirational text.

915
00:40:04,280 –> 00:40:06,680
So the next question isn’t what can the agent do.

916
00:40:06,680 –> 00:40:09,360
It’s the one nobody wants to answer in a steering committee.

917
00:40:09,360 –> 00:40:11,120
Who does the agent run as?

918
00:40:11,120 –> 00:40:12,440
Identity as control plane.

919
00:40:12,440 –> 00:40:14,080
Who does the agent run as?

920
00:40:14,080 –> 00:40:18,880
If you want one question that exposes whether an agent program is real or just a pilot hobby,

921
00:40:18,880 –> 00:40:19,880
it’s this.

922
00:40:19,880 –> 00:40:20,880
Who does the agent run as?

923
00:40:20,880 –> 00:40:21,880
Not who built it.

924
00:40:21,880 –> 00:40:23,840
Not who owns the Teams channel.

925
00:40:23,840 –> 00:40:25,560
Not who pays for the license.

926
00:40:25,560 –> 00:40:30,520
At runtime, what identity actually calls the tool, touches the data, and changes state?

927
00:40:30,520 –> 00:40:32,520
Because the agent doesn’t act as a concept.

928
00:40:32,520 –> 00:40:33,520
It acts as a token.

929
00:40:33,520 –> 00:40:38,760
And tokens obey the rules of Entra, Conditional Access, connectors, or whatever accidental

930
00:40:38,760 –> 00:40:41,800
privilege you left lying around in the tenant.

931
00:40:41,800 –> 00:40:45,680
There are two dominant execution models and both come with consequences you don’t get

932
00:40:45,680 –> 00:40:46,920
to negotiate.

933
00:40:46,920 –> 00:40:49,480
Model one, user delegated execution.

934
00:40:49,480 –> 00:40:52,760
That means the agent calls tools using the signed in user’s permissions.

935
00:40:52,760 –> 00:40:54,560
It can only retrieve what they can retrieve.

936
00:40:54,560 –> 00:40:56,600
It can only update what they can update.

937
00:40:56,600 –> 00:41:00,520
In enterprise terms, this is the least surprising model because it preserves the existing access

938
00:41:00,520 –> 00:41:01,520
graph.

939
00:41:01,520 –> 00:41:03,480
It also gives you the cleanest accountability story.

940
00:41:03,480 –> 00:41:08,160
The user asked, the user had access, the action happened. But the cost is operational fragility.

941
00:41:08,160 –> 00:41:11,800
User delegated execution inherits every problem in human identity.

942
00:41:11,800 –> 00:41:16,960
MFA prompts, session expiry, Conditional Access changes, device compliance, role changes

943
00:41:16,960 –> 00:41:17,960
and licensing mismatches.

944
00:41:17,960 –> 00:41:22,080
And it also means two users can get two different outcomes from the same request because

945
00:41:22,080 –> 00:41:23,560
their entitlements differ.

946
00:41:23,560 –> 00:41:25,520
That’s not AI inconsistency.

947
00:41:25,520 –> 00:41:26,920
That’s identity truth.

948
00:41:26,920 –> 00:41:30,840
And when people complain that an agent is unreliable, half the time they’re describing

949
00:41:30,840 –> 00:41:32,960
authorization variability.

950
00:41:32,960 –> 00:41:35,840
Model two, service-identity execution.

951
00:41:35,840 –> 00:41:38,920
That means the agent calls tools under a non-human identity.

952
00:41:38,920 –> 00:41:43,520
A service principal, managed identity, or dedicated run-as account with fixed permissions.

953
00:41:43,520 –> 00:41:48,000
This is the model teams choose when they want consistent behavior across users or when

954
00:41:48,000 –> 00:41:51,240
the workflow must run without a specific person present.

955
00:41:51,240 –> 00:41:55,560
It is also the model that quietly creates the biggest blast radius in your environment.

956
00:41:55,560 –> 00:41:59,920
Because now the agent has a stable, reusable identity with standing privilege.

957
00:41:59,920 –> 00:42:02,520
If you overscope it, you didn’t just create an agent.

958
00:42:02,520 –> 00:42:05,640
You created an automation back door with a chat interface.

959
00:42:05,640 –> 00:42:09,800
And once that exists, the question shifts from “can the user do this” to “can the agent do

960
00:42:09,800 –> 00:42:12,440
this,” which is exactly the wrong direction for governance.

961
00:42:12,440 –> 00:42:13,840
So the rule is simple.

962
00:42:13,840 –> 00:42:15,880
Least privilege isn’t a policy document.

963
00:42:15,880 –> 00:42:19,000
It’s an architectural boundary enforced by identity design.

964
00:42:19,000 –> 00:42:22,240
Every tool you expose has to bind to an identity model intentionally.

965
00:42:22,240 –> 00:42:25,440
If it runs as the user, you accept variance and you design for it.

966
00:42:25,440 –> 00:42:29,080
If it runs as a service, you constrain permissions to the minimum contract set and you treat

967
00:42:29,080 –> 00:42:30,440
changes like code.
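Binding each tool to an identity model intentionally can be made explicit in code, so “who does the agent run as” is a reviewed property instead of tribal knowledge. A sketch with hypothetical scope names:

```python
from enum import Enum
from dataclasses import dataclass

class RunAs(Enum):
    USER_DELEGATED = "user"       # inherits the signed-in user's access graph
    SERVICE_IDENTITY = "service"  # fixed permissions; biggest blast radius

@dataclass(frozen=True)
class ToolBinding:
    tool: str
    run_as: RunAs
    scopes: tuple[str, ...]  # least privilege: the minimum contract set

def review_required(binding: ToolBinding) -> bool:
    # Service identities with write scopes get treated like code changes.
    return binding.run_as is RunAs.SERVICE_IDENTITY and any(
        s.endswith(".write") for s in binding.scopes
    )

ticket_reader = ToolBinding("read_ticket", RunAs.USER_DELEGATED, ("tickets.read",))
status_setter = ToolBinding("set_ticket_state", RunAs.SERVICE_IDENTITY, ("tickets.write",))
print(review_required(ticket_reader), review_required(status_setter))  # False True
```

If it runs as the user, you accept entitlement variance; if it runs as a service, every scope change goes through review.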

968
00:42:30,440 –> 00:42:33,840
Now layer Entra on top of this, because Entra is not a branding exercise.

969
00:42:33,840 –> 00:42:39,280
It is the policy engine that decides whether your architecture exists at runtime.

970
00:42:39,280 –> 00:42:43,960
Conditional access policies will shape agent behavior in ways builders often don’t predict.

971
00:42:43,960 –> 00:42:45,840
Step-up auth on sign-in risk.

972
00:42:45,840 –> 00:42:47,160
Device posture.

973
00:42:47,160 –> 00:42:48,360
Location constraints.

974
00:42:48,360 –> 00:42:49,360
Session lifetime.

975
00:42:49,360 –> 00:42:50,880
Token issuance controls.

976
00:42:50,880 –> 00:42:53,680
The agent doesn’t get exceptions because the maker is excited.

977
00:42:53,680 –> 00:42:58,480
If your action path depends on a token that CA will sometimes deny, your agent is probabilistic

978
00:42:58,480 –> 00:43:00,200
before the model even speaks.

979
00:43:00,200 –> 00:43:02,000
And then there’s the slow killer.

980
00:43:02,000 –> 00:43:03,000
Identity drift.

981
00:43:03,000 –> 00:43:05,520
Over time, access reviews don’t happen.

982
00:43:05,520 –> 00:43:07,600
People get added to groups temporarily.

983
00:43:07,600 –> 00:43:10,640
App registrations accumulate permissions because it unblocked a project.

984
00:43:10,640 –> 00:43:11,640
Owners leave.

985
00:43:11,640 –> 00:43:13,040
Secrets expire.

986
00:43:13,040 –> 00:43:14,360
Service accounts get reused.

987
00:43:14,360 –> 00:43:16,680
You don’t notice because the system still mostly works.

988
00:43:16,680 –> 00:43:20,080
Then you add an agent and it starts traversing those pathways at scale.

989
00:43:20,080 –> 00:43:22,040
The agent doesn’t introduce new entropy.

990
00:43:22,040 –> 00:43:23,640
It operationalizes the entropy

991
00:43:23,640 –> 00:43:24,640
you already tolerated.

992
00:43:24,640 –> 00:43:26,960
So Monday morning identity work is brutally specific.

993
00:43:26,960 –> 00:43:29,200
You decide the run-as model per intent contract.

994
00:43:29,200 –> 00:43:32,000
You document it as part of the contract, not as tribal knowledge.

995
00:43:32,000 –> 00:43:36,280
You prove it with testing under different user profiles, because “works for me” is not an

996
00:43:36,280 –> 00:43:37,760
identity strategy.

997
00:43:37,760 –> 00:43:41,880
And you establish an access review cadence for the identities that matter, because agents

998
00:43:41,880 –> 00:43:44,160
don’t stay inside the boundaries you hoped for.

999
00:43:44,160 –> 00:43:46,520
They stay inside the boundaries you enforced.

1000
00:43:46,520 –> 00:43:50,280
Once you answer who does it run as, a second truth becomes obvious.

1001
00:43:50,280 –> 00:43:53,800
Access determines what knowledge is even possible for the agent to retrieve.

1002
00:43:53,800 –> 00:43:57,720
And that’s where context architecture stops being a content project and becomes an engineering

1003
00:43:57,720 –> 00:43:59,000
discipline.

1004
00:43:59,000 –> 00:44:04,920
Context architecture: managed knowledge beats “attach SharePoint and pray.” Once identity is explicit,

1005
00:44:04,920 –> 00:44:08,840
context becomes the next point of failure. Not because models can’t retrieve information.

1006
00:44:08,840 –> 00:44:09,840
They can.

1007
00:44:09,840 –> 00:44:14,120
The failure is that enterprises treat knowledge as an attachment, not as a managed product

1008
00:44:14,120 –> 00:44:17,840
and that mistake scales beautifully because the fastest way to ship an agent is to point

1009
00:44:17,840 –> 00:44:21,280
at SharePoint, dump in a few PDFs and a site or two, and call it done.

1010
00:44:21,280 –> 00:44:22,280
It feels like progress.

1011
00:44:22,280 –> 00:44:26,800
The agent starts answering questions, people see citations, everyone relaxes.

1012
00:44:26,800 –> 00:44:30,880
And the boring questions arrive. Which version of that policy did it use? Who owns that page?

1013
00:44:30,880 –> 00:44:32,240
What changed last week?

1014
00:44:32,240 –> 00:44:35,280
Why did the agent quote something that was replaced three months ago?

1015
00:44:35,280 –> 00:44:38,200
Why does it contradict what the service desk tells users?

1016
00:44:38,200 –> 00:44:39,520
This is the uncomfortable truth.

1017
00:44:39,520 –> 00:44:42,520
Retrieval is only as safe as the knowledge lifecycle behind it.

1018
00:44:42,520 –> 00:44:45,200
So context architecture starts with a redefinition.

1019
00:44:45,200 –> 00:44:46,520
Knowledge sources aren’t sources.

1020
00:44:46,520 –> 00:44:50,720
They’re dependencies, and dependencies require ownership, life cycle, change control and version

1021
00:44:50,720 –> 00:44:51,720
discipline.

1022
00:44:51,720 –> 00:44:55,960
Without those, an agent becomes a high speed amplifier for outdated guidance.

1023
00:44:56,960 –> 00:44:59,040
The first design rule is separation.

1024
00:44:59,040 –> 00:45:01,280
Static knowledge is not operational data.

1025
00:45:01,280 –> 00:45:04,800
Static knowledge is policy, procedures, FAQs and reference material.

1026
00:45:04,800 –> 00:45:06,760
It changes but it changes relatively slowly.

1027
00:45:06,760 –> 00:45:09,280
It should be curated, reviewed and attributable.

1028
00:45:09,280 –> 00:45:13,080
It should have a defined effective date concept even if it’s informal.

1029
00:45:13,080 –> 00:45:17,360
Most importantly, it should be treated as a product with an owner who cares when it’s wrong.
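Treating static knowledge as a product means every citable artifact carries an owner, a version, and an effective date, and stale entries are filtered out before retrieval ever sees them. A sketch with illustrative fields and documents:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class KnowledgeEntry:
    title: str
    owner: str       # someone who cares when it's wrong
    version: str
    effective: date  # effective-date concept, even if informal
    superseded: bool = False

def retrievable(entries: list[KnowledgeEntry], today: date) -> list[str]:
    # Only current, owned, in-effect artifacts are eligible context.
    return [e.title for e in entries
            if not e.superseded and e.effective <= today and e.owner]

corpus = [
    KnowledgeEntry("Travel policy v3", "hr-policy-team", "3.0", date(2024, 1, 1)),
    KnowledgeEntry("Travel policy v2", "hr-policy-team", "2.0", date(2022, 1, 1), superseded=True),
]
print(retrievable(corpus, date(2024, 6, 1)))  # ['Travel policy v3']
```

That filter is what stops the agent from confidently quoting the policy that was replaced three months ago.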

1030
00:45:17,360 –> 00:45:22,400
Operational data is tickets, approvals, user records, asset state, entitlements and anything

1031
00:45:22,400 –> 00:45:24,000
that represents current truth.

1032
00:45:24,000 –> 00:45:25,920
You don’t want that as documents.

1033
00:45:25,920 –> 00:45:31,320
You want it through deterministic tools with schema, with validation and with authorization

1034
00:45:31,320 –> 00:45:32,320
boundaries.

1035
00:45:32,320 –> 00:45:36,120
Operational data belongs in the execution and record layers, not in a pile of files the

1036
00:45:36,120 –> 00:45:37,320
model rummages through.

1037
00:45:37,320 –> 00:45:41,720
When teams mix these, they get an agent that confidently answers with yesterday’s reality

1038
00:45:41,720 –> 00:45:44,040
and then executes against today’s systems.

1039
00:45:44,040 –> 00:45:45,280
That gap creates incidents.

1040
00:45:45,280 –> 00:45:46,880
The second rule is bounding.

1041
00:45:46,880 –> 00:45:48,240
Context has to be intentionally narrow.

1042
00:45:48,240 –> 00:45:51,360
The human impulse is to say, “Give it everything so it has the best chance.”

1043
00:45:51,360 –> 00:45:56,680
That’s exactly backwards. Overbroad context increases ambiguity and latency and it drives

1044
00:45:56,680 –> 00:46:00,520
the model into probabilistic selection: which chunk matters, which policy applies, which

1045
00:46:00,520 –> 00:46:02,320
document is authoritative.

1046
00:46:02,320 –> 00:46:04,160
More context doesn’t mean more accuracy.

1047
00:46:04,160 –> 00:46:06,360
It means more opportunities to be wrong with confidence.

1048
00:46:06,360 –> 00:46:12,280
So instead of “HR SharePoint,” context needs to look like: this policy set, these procedures,

1049
00:46:12,280 –> 00:46:17,000
these approved templates, this set of known definitions, this glossary of terms. Bounded,

1050
00:46:17,000 –> 00:46:18,320
versioned and attributable.

1051
00:46:18,320 –> 00:46:22,080
That’s also how you prevent cross-business unit contamination, where one group’s process

1052
00:46:22,080 –> 00:46:26,040
becomes the answer for everyone because the agent found it first.

1053
00:46:26,040 –> 00:46:27,440
The third rule is life cycle.

1054
00:46:27,440 –> 00:46:29,680
If the agent can cite it, someone has to maintain it.

1055
00:46:29,680 –> 00:46:34,040
That means when a policy changes, the knowledge artifact is updated, the old one is archived

1056
00:46:34,040 –> 00:46:38,520
or marked clearly and the agent is re-evaluated against the intents that depend on it, not as

1057
00:46:38,520 –> 00:46:40,280
a nice-to-have, but as a release step.

1058
00:46:40,280 –> 00:46:42,080
If that sounds heavy, good.

1059
00:46:42,080 –> 00:46:43,920
Enterprises already do this for code.

1060
00:46:43,920 –> 00:46:45,560
Context is now part of the runtime.

1061
00:46:45,560 –> 00:46:46,760
Therefore it gets the same discipline.

1062
00:46:46,760 –> 00:46:48,440
Now, a practical warning.

1063
00:46:48,440 –> 00:46:51,520
PDF-heavy approaches are a performance and reliability tax.

1064
00:46:51,520 –> 00:46:56,160
PDFs tend to be long, inconsistent in structure and full of duplicated sections.

1065
00:46:56,160 –> 00:46:58,680
Retrieval becomes slow, chunking becomes sloppy.

1066
00:46:58,680 –> 00:47:02,440
The model retrieves partial paragraphs that lose the conditions and exceptions that made

1067
00:47:02,440 –> 00:47:03,840
the policy safe.

1068
00:47:03,840 –> 00:47:07,960
And when response latency increases, you hit the exact failure modes people are seeing

1069
00:47:07,960 –> 00:47:12,840
in real deployments: timeouts, non-responses, fallback behavior, and users resending the same

1070
00:47:12,840 –> 00:47:15,880
request until the agent does something twice.

1071
00:47:15,880 –> 00:47:20,840
So if the context is important enough to govern behavior, it’s important enough to structure,

1072
00:47:20,840 –> 00:47:25,900
convert key guidance into stable pages, structured content, or curated knowledge entries with

1073
00:47:25,900 –> 00:47:27,760
clear titles and scope.

1074
00:47:27,760 –> 00:47:30,920
Treat the source of truth as a design choice, not an accident.

1075
00:47:30,920 –> 00:47:32,480
The last rule is attribution.

1076
00:47:32,480 –> 00:47:36,200
The agent must be able to say where it got the answer and the organization must be able

1077
00:47:36,200 –> 00:47:38,800
to trace that source back to an owner and a version.

1078
00:47:38,800 –> 00:47:40,320
Otherwise, you don’t have knowledge.

1079
00:47:40,320 –> 00:47:41,520
You have plausible text.

1080
00:47:41,520 –> 00:47:46,760
This is how context becomes an architectural asset instead of an embarrassment in audit meetings.

1081
00:47:46,760 –> 00:47:48,600
So the Monday move is simple.

1082
00:47:48,600 –> 00:47:53,320
Stop thinking about knowledge as things we attach and start thinking about it as managed

1083
00:47:53,320 –> 00:47:55,120
context with boundaries.

1084
00:47:55,120 –> 00:47:59,920
Curate it, version it, own it, test it. Because once context is bounded and trustworthy, tool

1085
00:47:59,920 –> 00:48:04,600
design becomes the real enforcement layer and tools are where determinism either exists

1086
00:48:04,600 –> 00:48:07,080
or collapses back into improvisation.

1087
00:48:07,080 –> 00:48:08,720
Tool-first routing.

1088
00:48:08,720 –> 00:48:10,880
Tools are contracts, not features.

1089
00:48:10,880 –> 00:48:14,160
Once context is bounded, the next failure point is predictable.

1090
00:48:14,160 –> 00:48:15,160
Tools.

1091
00:48:15,160 –> 00:48:16,760
Most teams treat tools like features.

1092
00:48:16,760 –> 00:48:18,520
Let’s connect Outlook.

1093
00:48:18,520 –> 00:48:20,520
Let’s add ServiceNow.

1094
00:48:20,520 –> 00:48:21,920
Let’s give it Dataverse.

1095
00:48:21,920 –> 00:48:25,120
As if the agent is collecting plugins like a browser.

1096
00:48:25,120 –> 00:48:27,480
That thinking is why agents drift into chaos.

1097
00:48:27,480 –> 00:48:29,800
In an enterprise, a tool is not a capability.

1098
00:48:29,800 –> 00:48:30,800
It’s a contract.

1099
00:48:30,800 –> 00:48:35,000
It is an explicitly exposed pathway to change state in a system you already struggle to

1100
00:48:35,000 –> 00:48:36,000
govern.

1101
00:48:36,000 –> 00:48:37,000
So tool-first routing means this.

1102
00:48:37,000 –> 00:48:40,360
You design the tool surface area before you design the conversation.

1103
00:48:40,360 –> 00:48:43,400
Because the conversation is just how users request contracts.

1104
00:48:43,400 –> 00:48:44,960
The contracts are what actually happen.

1105
00:48:44,960 –> 00:48:47,200
A tool contract has four parts.

1106
00:48:47,200 –> 00:48:50,440
Purpose, inputs, outputs and failure modes.

1107
00:48:50,440 –> 00:48:52,600
Purpose is not “integrates with ServiceNow.”

1108
00:48:52,600 –> 00:48:56,880
Purpose is “create incident with these required fields, in these assignment groups, under these

1109
00:48:56,880 –> 00:48:57,880
conditions.”

1110
00:48:57,880 –> 00:48:59,880
Inputs are typed parameters.

1111
00:48:59,880 –> 00:49:05,120
Ticket category, affected service, urgency, request identity, business unit.

1112
00:49:05,120 –> 00:49:06,440
Outputs are not “success.”

1113
00:49:06,440 –> 00:49:10,360
Outputs are record ID, state, timestamps, and any downstream workflow status.

1114
00:49:10,360 –> 00:49:15,000
And failure modes are where adult architecture lives: permission denied, validation error,

1115
00:49:15,000 –> 00:49:20,200
throttling, timeout, duplicate detection, partial success and system unavailable.

1116
00:49:20,200 –> 00:49:23,760
If you can’t name the failure modes, you can’t operate the agent.

1117
00:49:23,760 –> 00:49:24,760
You’re just hoping.
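Naming the failure modes up front is what makes a tool operable: each one is an enumerated, structured outcome rather than a generic exception. A sketch whose mode list mirrors the one above; the handler routes are illustrative:

```python
from enum import Enum

class FailureMode(Enum):
    PERMISSION_DENIED = "permission_denied"
    VALIDATION_ERROR = "validation_error"
    THROTTLED = "throttled"
    TIMEOUT = "timeout"
    DUPLICATE = "duplicate_detected"
    PARTIAL_SUCCESS = "partial_success"
    UNAVAILABLE = "system_unavailable"

# Every mode must map to an operational response before the tool ships.
PLAYBOOK = {
    FailureMode.PERMISSION_DENIED: "refuse_and_escalate",
    FailureMode.VALIDATION_ERROR: "ask_for_missing_fields",
    FailureMode.THROTTLED: "retry_with_backoff",
    FailureMode.TIMEOUT: "retry_idempotent",
    FailureMode.DUPLICATE: "return_existing_record",
    FailureMode.PARTIAL_SUCCESS: "open_remediation_ticket",
    FailureMode.UNAVAILABLE: "create_fallback_ticket",
}

unhandled = [m for m in FailureMode if m not in PLAYBOOK]
print(unhandled)  # []: if this isn't empty, you're just hoping
```

A release gate as simple as “no unhandled failure modes” turns hope into an operable contract.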

1118
00:49:24,760 –> 00:49:27,880
This is also where deterministic execution actually gets enforced.

1119
00:49:27,880 –> 00:49:29,200
The model can reason all day.

1120
00:49:29,200 –> 00:49:33,000
The only thing that matters is whether the action layer accepts or rejects the request

1121
00:49:33,000 –> 00:49:34,400
based on explicit rules.

1122
00:49:34,400 –> 00:49:38,080
That’s why Power Automate, Logic Apps, and API-backed actions belong here.

1123
00:49:38,080 –> 00:49:43,240
They validate inputs, they apply fixed mappings, they return structured errors and they can be

1124
00:49:43,240 –> 00:49:45,200
wrapped in idempotent patterns.

1125
00:49:45,200 –> 00:49:48,160
So retries don’t create duplicate side effects.
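The idempotent-wrapper pattern can be sketched as a key derived from the request’s identifying fields: a retry with the same key returns the original record instead of creating a duplicate. This is a generic illustration, not any specific connector’s API:

```python
import hashlib

_created: dict[str, str] = {}  # idempotency key -> record id (stand-in store)

def create_incident(requester: str, service: str, summary: str) -> str:
    # Same logical request -> same key -> same record, even across retries.
    key = hashlib.sha256(f"{requester}|{service}|{summary}".encode()).hexdigest()
    if key in _created:
        return _created[key]  # duplicate detected: no second side effect
    record_id = f"INC{len(_created) + 1:04d}"
    _created[key] = record_id
    return record_id

first = create_incident("alice", "vpn", "cannot connect")
retry = create_incident("alice", "vpn", "cannot connect")
print(first, retry, first == retry)  # INC0001 INC0001 True
```

This is exactly the guard against users resending the same request until the agent does something twice.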

1126
00:49:48,160 –> 00:49:50,640
An agent calling a tool is not automation.

1127
00:49:50,640 –> 00:49:54,120
It’s invoking a contract so the allow list matters more than the prompt.

1128
00:49:54,120 –> 00:49:57,080
Tool-first routing starts by shrinking the surface area.

1129
00:49:57,080 –> 00:49:59,520
Enterprises love broad tools because they feel flexible.

1130
00:49:59,520 –> 00:50:02,320
Let the agent read and write anything in Dataverse.

1131
00:50:02,320 –> 00:50:04,760
Give it access to all SharePoint sites.

1132
00:50:04,760 –> 00:50:07,040
Allow create update delete so it can help.

1133
00:50:07,040 –> 00:50:09,480
That flexibility is just unpriced risk.

1134
00:50:09,480 –> 00:50:11,280
The principle is simple.

1135
00:50:11,280 –> 00:50:15,680
Prohibit create, update and delete unless the business sponsor can explain the exact

1136
00:50:15,680 –> 00:50:19,560
outcome metric it supports and the exact audit evidence it will produce.

1137
00:50:19,560 –> 00:50:21,280
Read only tools are your default.

1138
00:50:21,280 –> 00:50:22,600
Write tools are exceptions.

1139
00:50:22,600 –> 00:50:24,560
Delete tools are almost never defensible.

1140
00:50:24,560 –> 00:50:29,600
And when you do allow write actions, you split them into narrow single purpose operations.

1141
00:50:29,600 –> 00:50:33,440
Not "update ticket." Instead: set ticket state from new to in progress.

1142
00:50:33,440 –> 00:50:35,560
Add work note with a required template.

1143
00:50:35,560 –> 00:50:37,680
Assign to group from an approved list.

1144
00:50:37,680 –> 00:50:38,840
Each one becomes governable.
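A sketch of that split into narrow, single-purpose operations. The transition table, group list and note template are hypothetical; the pattern is what matters, each operation permits exactly one thing.

```python
# Allow list of narrow write operations. The broad "update ticket" never appears.
ALLOWED_TRANSITIONS = {("new", "in_progress")}      # the only state change exposed
APPROVED_GROUPS = {"service-desk", "network-ops"}   # hypothetical approved list
WORK_NOTE_TEMPLATE = "[{author}] {summary} | next: {next_step}"  # required template

def set_ticket_state(current: str, target: str) -> bool:
    """Permit exactly one transition; everything else is rejected."""
    return (current, target) in ALLOWED_TRANSITIONS

def assign_to_group(group: str) -> bool:
    """Assignment only to groups on the approved list."""
    return group in APPROVED_GROUPS

def add_work_note(author: str, summary: str, next_step: str) -> str:
    """Notes always go through the required template, never free text."""
    return WORK_NOTE_TEMPLATE.format(author=author, summary=summary, next_step=next_step)
```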

1145
00:50:38,840 –> 00:50:40,400
Now tool selection policy.

1146
00:50:40,400 –> 00:50:44,360
This is the part everyone hand waves with the model will choose the right action.

1147
00:50:44,360 –> 00:50:45,360
That’s not governance.

1148
00:50:45,360 –> 00:50:47,360
That’s gambling with a nicer UI.

1149
00:50:47,360 –> 00:50:50,480
Tool selection has to be deterministic from the organization’s point of view.

1150
00:50:50,480 –> 00:50:54,280
Which means you either route based on intent contracts and preconditions or you don’t

1151
00:50:54,280 –> 00:50:55,280
route at all.

1152
00:50:55,280 –> 00:50:58,000
If two tools can satisfy the same intent you pick one.

1153
00:50:58,000 –> 00:51:01,520
We don’t let the agent improvise between them based on wording.

1154
00:51:01,520 –> 00:51:05,400
This is why the orchestration layer should behave like a policy router.

1155
00:51:05,400 –> 00:51:10,540
Given intent X and verified context Y, call tool Z with payload schema S.

1156
00:51:10,540 –> 00:51:14,720
If the context doesn’t satisfy the preconditions the router doesn’t try something else.

1157
00:51:14,720 –> 00:51:16,040
It refuses or escalates.
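The policy-router rule, given intent X and verified context Y call tool Z, can be sketched as a lookup table of preconditions. The intents and context keys here are invented for illustration.

```python
ROUTES = {
    # intent -> (precondition on verified context, tool to invoke)
    "reset_password": (lambda ctx: ctx.get("identity_verified") is True,
                       "reset_password_tool"),
    "create_incident": (lambda ctx: ctx.get("business_unit") is not None,
                        "create_incident_tool"),
}

def route_intent(intent: str, context: dict) -> dict:
    entry = ROUTES.get(intent)
    if entry is None:
        return {"action": "refuse", "reason": "no contract for this intent"}
    precondition, tool = entry
    if not precondition(context):
        # The router doesn't try something else; it refuses or escalates.
        return {"action": "escalate", "reason": "precondition not satisfied"}
    return {"action": "invoke", "tool": tool}
```

If two tools could satisfy the same intent, only one appears in the table; the agent never improvises between them.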

1158
00:51:16,040 –> 00:51:19,840
And yes, Copilot Studio gives you multiple ways to implement this.

1159
00:51:19,840 –> 00:51:25,840
Topics, tool definitions, agent flows and emerging protocols like MCP for controlled tool invocation.

1160
00:51:25,840 –> 00:51:27,440
The mechanism isn’t the point.

1161
00:51:27,440 –> 00:51:31,320
The point is that tool invocation is a governed decision, not a creative act.

1162
00:51:31,320 –> 00:51:33,560
Now add guardrails that people pretend are optional.

1163
00:51:33,560 –> 00:51:36,600
First require confirmation for high impact actions.

1164
00:51:36,600 –> 00:51:37,600
Not "are you sure?"

1165
00:51:37,600 –> 00:51:38,840
buried in a friendly sentence.

1166
00:51:38,840 –> 00:51:42,600
A structured confirmation that restates the action in precise terms.

1167
00:51:42,600 –> 00:51:43,600
What will change?

1168
00:51:43,600 –> 00:51:44,600
Where?

1169
00:51:44,600 –> 00:51:46,960
Under which identity and what record will be written?

1170
00:51:46,960 –> 00:51:49,080
Second, separate preparation from commit.

1171
00:51:49,080 –> 00:51:53,400
Let the model assemble the request, validate the parameters and present the plan.

1172
00:51:53,400 –> 00:51:56,880
Then commit through deterministic execution only after preconditions and confirmations

1173
00:51:56,880 –> 00:51:57,880
are satisfied.
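The "separate preparation from commit" rule as a two-phase sketch; the plan fields mirror the structured confirmation above (what will change, where, under which identity). Names and the target system are illustrative.

```python
def prepare_plan(intent: str, params: dict, identity: str) -> dict:
    """Phase one: assemble and restate the action in precise terms. Nothing is written."""
    return {
        "what_will_change": intent,
        "where": "ticketing-prod",   # assumed target system for the sketch
        "identity": identity,
        "record_to_write": params,
        "confirmed": False,
    }

def commit_plan(plan: dict, confirmation: bool) -> dict:
    """Phase two: deterministic execution only after an explicit confirmation."""
    if not confirmation:
        return {"committed": False, "reason": "confirmation_required"}
    plan["confirmed"] = True
    # ... the actual write through the governed tool would happen here ...
    return {"committed": True, "plan": plan}
```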

1174
00:51:57,880 –> 00:52:02,560
Third, log every tool call with enough detail to reconstruct reality later.

1175
00:52:02,560 –> 00:52:06,880
If you can’t answer what happened without rereading a chat transcript you didn’t log.

1176
00:52:06,880 –> 00:52:08,640
You performed theater.

1177
00:52:08,640 –> 00:52:10,960
Tool-first routing is how you keep the agent honest.

1178
00:52:10,960 –> 00:52:15,560
You stop asking what can it do and start asking what contracts are we willing to expose under

1179
00:52:15,560 –> 00:52:17,640
which identities with which evidence.

1180
00:52:17,640 –> 00:52:21,320
That’s how you build an agent that can act without becoming a liability.

1181
00:52:21,320 –> 00:52:26,360
And once the tool surface is explicit, orchestration stops being a vague agent flow.

1182
00:52:26,360 –> 00:52:30,960
It becomes a structure you can test, version and keep small on purpose.

1183
00:52:30,960 –> 00:52:31,960
Orchestration design.

1184
00:52:31,960 –> 00:52:35,120
Topics, flows and the minimal conversation surface.

1185
00:52:35,120 –> 00:52:38,760
Once tools are contracts, orchestration is the thing that prevents those contracts from

1186
00:52:38,760 –> 00:52:43,560
turning into a bucket of sharp objects sitting on a desk labeled AI.

1187
00:52:43,560 –> 00:52:47,760
Orchestration is how you decide repeatedly which contract is allowed to run in which order,

1188
00:52:47,760 –> 00:52:49,840
with which inputs and with what commit points.

1189
00:52:49,840 –> 00:52:53,800
It’s the difference between a governed service and a chat session that happens to touch production.

1190
00:52:53,800 –> 00:52:57,120
In Copilot Studio terms, people will argue about mechanisms.

1191
00:52:57,120 –> 00:53:01,800
Topics versus generative orchestration, agent flows versus Power Automate, actions versus

1192
00:53:01,800 –> 00:53:05,040
prompts, MCP versus connectors.

1193
00:53:05,040 –> 00:53:07,040
Mechanisms are implementation details.

1194
00:53:07,040 –> 00:53:11,040
Orchestration is the structure you impose so the mechanism can’t drift into improvisation.

1195
00:53:11,040 –> 00:53:15,520
Start with topics because topics are still the cleanest way to make a path repeatable when

1196
00:53:15,520 –> 00:53:16,840
the outcome matters.

1197
00:53:16,840 –> 00:53:18,960
A topic is not old school bot design.

1198
00:53:18,960 –> 00:53:21,880
It’s a deterministic route through a known decision boundary.

1199
00:53:21,880 –> 00:53:26,520
If the intent triggers a state change, you want the predictable container: trigger, parameter

1200
00:53:26,520 –> 00:53:32,920
collection, validation, tool invocation, confirmation, record update, and a final response that reflects

1201
00:53:32,920 –> 00:53:34,400
the actual outcome.

1202
00:53:34,400 –> 00:53:38,080
And the design rule is simple: structured topics for execution paths.

1203
00:53:38,080 –> 00:53:42,600
Use freeform conversation where you can tolerate variance: clarifying questions, knowledge lookup

1204
00:53:42,600 –> 00:53:43,600
and triage.

1205
00:53:43,600 –> 00:53:46,960
But don’t let freeform language drive execution branching because that’s how you get the

1206
00:53:46,960 –> 00:53:50,680
same request going down different paths depending on phrasing, context chunks or which

1207
00:53:50,680 –> 00:53:52,440
prompt got edited last week.

1208
00:53:52,440 –> 00:53:55,880
So the conversation surface becomes minimal on purpose.

1209
00:53:55,880 –> 00:54:01,360
The agent should ask only for missing parameters, not for storytelling, not for help me understand,

1210
00:54:01,360 –> 00:54:04,760
not for open-ended dialogue that makes the user feel heard while the system collects

1211
00:54:04,760 –> 00:54:06,360
unreliable inputs.

1212
00:54:06,360 –> 00:54:09,360
Minimal conversation means: if a parameter is required, ask for it.

1213
00:54:09,360 –> 00:54:10,560
If it isn’t required don’t.

1214
00:54:10,560 –> 00:54:13,760
If a parameter can be derived deterministically, derive it.

1215
00:54:13,760 –> 00:54:14,760
Don’t ask.

1216
00:54:14,760 –> 00:54:19,040
If ambiguity exists, the agent should say so explicitly and force disambiguation before

1217
00:54:19,040 –> 00:54:20,040
it touches tools.

1218
00:54:20,040 –> 00:54:23,480
That sounds strict because it is. Execution needs strictness.

1219
00:54:23,480 –> 00:54:24,480
Now flows.

1220
00:54:24,480 –> 00:54:28,440
Flows are where long-running work stops being a chat problem and becomes a systems problem,

1221
00:54:28,440 –> 00:54:29,960
which is where it belongs.

1222
00:54:29,960 –> 00:54:34,080
If the action takes more than a few seconds, document generation, multi-system updates, approvals,

1223
00:54:34,080 –> 00:54:36,520
provisioning, don’t trap it inside a conversational loop.

1224
00:54:36,520 –> 00:54:40,600
Kick off a flow, return a receipt, provide a status check, and end the turn.

1225
00:54:40,600 –> 00:54:42,480
And the hidden benefit is operational.

1226
00:54:42,480 –> 00:54:46,680
Flows can be built with idempotency patterns, correlation IDs, retries and timeouts that

1227
00:54:46,680 –> 00:54:47,760
chat cannot give you.

1228
00:54:47,760 –> 00:54:51,880
If the user asks twice, the system should detect "same request already in progress" and return

1229
00:54:51,880 –> 00:54:53,840
status, not run the action twice.

1230
00:54:53,840 –> 00:54:57,480
That’s not an optimization, that’s how you avoid double execution incidents.
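A minimal sketch of the "same request already in progress" check, keyed by a correlation ID. Real flows would persist this state (e.g. in Dataverse), not hold it in memory; the in-memory dict is an assumption for the sketch.

```python
_in_flight: dict = {}  # correlation_id -> status (stand-in for persisted state)

def submit_request(correlation_id: str, request: dict) -> dict:
    """Idempotent submit: a repeated request returns status, it never runs twice."""
    if correlation_id in _in_flight:
        return {"duplicate": True, "status": _in_flight[correlation_id]}
    _in_flight[correlation_id] = "executing"
    # ... kick off the long-running flow here and return a receipt ...
    return {"duplicate": False, "status": "accepted", "receipt": correlation_id}
```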

1231
00:54:57,480 –> 00:55:01,840
So, orchestration should treat long running execution as asynchronous by default, the agent

1232
00:55:01,840 –> 00:55:05,680
becomes the interface, the workflow becomes the engine, the system of record becomes the

1233
00:55:05,680 –> 00:55:10,120
truth, which leads to the next design rule, status updates aren’t politeness, they’re

1234
00:55:10,120 –> 00:55:11,640
control.

1235
00:55:11,640 –> 00:55:17,000
If execution takes time, the agent must report state transitions, accepted, validating,

1236
00:55:17,000 –> 00:55:22,240
awaiting approval, executing, completed, failed, escalated, those aren’t progress messages,

1237
00:55:22,240 –> 00:55:27,920
they are how you prevent users from resubmitting, how you preserve trust, and how you make operations

1238
00:55:27,920 –> 00:55:28,920
debuggable.
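The state transitions listed above, accepted, validating, awaiting approval, executing, completed, failed, escalated, amount to a small state machine. A sketch that rejects illegal transitions outright:

```python
TRANSITIONS = {
    "accepted": {"validating"},
    "validating": {"awaiting_approval", "failed"},
    "awaiting_approval": {"executing", "escalated"},
    "executing": {"completed", "failed"},
    "failed": {"escalated"},  # failed work hands off; it doesn't retry forever
}

def advance(state: str, new_state: str) -> str:
    """Move to new_state only if the transition is explicitly allowed."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state
```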

1239
00:55:28,920 –> 00:55:33,120
And yes, you want retries, but you want idempotent retries, not try again and hope.

1240
00:55:33,120 –> 00:55:36,840
If the workflow can safely retry a step, it should, if it can’t, it should stop and

1241
00:55:36,840 –> 00:55:38,560
hand off with the evidence captured.

1242
00:55:38,560 –> 00:55:42,680
Now, escalation. Escalation isn’t a fallback topic with a generic apology.

1243
00:55:42,680 –> 00:55:45,120
Escalation is a designed hand off where state is preserved.

1244
00:55:45,120 –> 00:55:49,400
If the agent can’t proceed, missing entitlement, failed precondition, ambiguous input, system

1245
00:55:49,400 –> 00:55:54,400
down, it should produce a structured package, what the user asked, what it verified, what

1246
00:55:54,400 –> 00:55:58,600
failed, what it attempted, and what a human needs to complete the work.

1247
00:55:58,600 –> 00:56:01,880
Then route that package into the right queue with ownership and SLA.
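The structured hand-off package described above, sketched as a function. Field names and the SLA value are assumptions; the point is that escalation preserves state instead of shrugging.

```python
def escalation_package(user_asked: str, verified: list, failed: str,
                       attempted: list, human_needs_to: str, queue: str) -> dict:
    """Everything a human needs to finish the work, routed with ownership and SLA."""
    return {
        "user_asked": user_asked,
        "verified": verified,
        "failed": failed,
        "attempted": attempted,
        "human_needs_to": human_needs_to,
        "queue": queue,    # a named queue with an owner, not "try again later"
        "sla_hours": 4,    # illustrative SLA
    }
```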

1248
00:56:01,880 –> 00:56:05,720
That’s how you avoid the most common operational dead end.

1249
00:56:05,720 –> 00:56:06,880
Try again later.

1250
00:56:06,880 –> 00:56:10,760
"Try again later" is how you create duplicate tickets, inconsistent records and user

1251
00:56:10,760 –> 00:56:16,160
workarounds. So the orchestration structure you want is almost boring: a small set of execution

1252
00:56:16,160 –> 00:56:20,400
topics with strict parameter collection, a deterministic tool invocation policy, asynchronous

1253
00:56:20,400 –> 00:56:25,920
flows for real work, explicit status transitions, idempotent retries, designed escalation with

1254
00:56:25,920 –> 00:56:29,720
state, not a shrug, and here’s the part that will bother people who want agents to feel

1255
00:56:29,720 –> 00:56:30,800
magical.

1256
00:56:30,800 –> 00:56:34,760
When orchestration is done correctly, the conversation gets smaller, not bigger.

1257
00:56:34,760 –> 00:56:38,560
Because the goal isn’t to simulate a helpful colleague, the goal is to run a controlled system

1258
00:56:38,560 –> 00:56:40,680
through a human-friendly interface.

1259
00:56:40,680 –> 00:56:44,600
Once you accept that, the architecture becomes obvious, and you can finally do the thing

1260
00:56:44,600 –> 00:56:47,160
most agent programs avoid until it’s too late.

1261
00:56:47,160 –> 00:56:51,840
You can manage agents like a portfolio, not like a craft project.

1262
00:56:51,840 –> 00:56:55,320
Governance: agent portfolio management, not one-off botcraft.

1263
00:56:55,320 –> 00:56:57,600
Here’s what happens after the first agent ships.

1264
00:56:57,600 –> 00:56:58,960
Nothing stays "the first agent."

1265
00:56:58,960 –> 00:57:02,200
It becomes a template, a precedent, and an excuse.

1266
00:57:02,200 –> 00:57:06,400
And if governance doesn’t exist as an operating system, you don’t get an agent program.

1267
00:57:06,400 –> 00:57:08,040
You get agent sprawl.

1268
00:57:08,040 –> 00:57:11,040
Dozens of semi-owned chat surfaces wired into production systems.

1269
00:57:11,040 –> 00:57:12,120
Each one, temporary.

1270
00:57:12,120 –> 00:57:14,120
Each one, carrying a little more security debt.

1271
00:57:14,120 –> 00:57:18,040
So governance, in agent terms, isn’t a committee, it’s portfolio management.

1272
00:57:18,040 –> 00:57:23,360
A portfolio means an inventory, an owner, a life cycle, and a repeatable way to decide what

1273
00:57:23,360 –> 00:57:27,360
gets built, what gets promoted, what gets retired, and what gets blocked.

1274
00:57:27,360 –> 00:57:28,360
And it needs a loop.

1275
00:57:28,360 –> 00:57:32,320
A real one: plan, implement, manage, improve, extend, not as a slide, as an operating rhythm

1276
00:57:32,320 –> 00:57:35,760
that survives maker turnover and leadership attention drift.

1277
00:57:35,760 –> 00:57:38,880
Plan is where the organization forces discipline.

1278
00:57:38,880 –> 00:57:43,160
Outcomes, scoped intents, risk tier, and the non-negotiables.

1279
00:57:43,160 –> 00:57:46,560
Identity model, system of record, audit requirement, and refusal rules.

1280
00:57:46,560 –> 00:57:48,680
This is also where you classify the agent.

1281
00:57:48,680 –> 00:57:50,720
Because not every agent deserves the same controls.

1282
00:57:50,720 –> 00:57:55,560
A discovery agent that only retrieves curated policy guidance isn’t governed like an execution

1283
00:57:55,560 –> 00:57:57,320
agent that can mutate records.

1284
00:57:57,320 –> 00:58:00,640
If you govern them the same, you’ll either suffocate harmless agents or under-govern

1285
00:58:00,640 –> 00:58:01,800
dangerous ones.

1286
00:58:01,800 –> 00:58:03,040
Both outcomes are common.

1287
00:58:03,040 –> 00:58:04,520
Both are failures.

1288
00:58:04,520 –> 00:58:07,320
Implement is where teams prove they can build within constraints.

1289
00:58:07,320 –> 00:58:10,400
Tool allow lists, environment strategy, and release gates.

1290
00:58:10,400 –> 00:58:14,120
This is where you prevent the most predictable architecture erosion.

1291
00:58:14,120 –> 00:58:17,040
Building in production because it’s just a small change.

1292
00:58:17,040 –> 00:58:20,360
Agents degrade that way faster than apps because prompt edits feel harmless.

1293
00:58:20,360 –> 00:58:21,360
They aren’t.

1294
00:58:21,360 –> 00:58:22,720
They’re behavior changes.

1295
00:58:22,720 –> 00:58:26,200
Manage is where organizations stop pretending ownership is implicit.

1296
00:58:26,200 –> 00:58:29,360
Every agent needs an accountable owner and a technical owner.

1297
00:58:29,360 –> 00:58:33,560
Accountable means they take the heat when the agent causes rework, risk, or reputational

1298
00:58:33,560 –> 00:58:34,560
damage.

1299
00:58:34,560 –> 00:58:37,560
Technical means they can actually change the thing when it drifts.

1300
00:58:37,560 –> 00:58:39,920
If those aren’t named, the agent is already orphaned.

1301
00:58:39,920 –> 00:58:41,680
It just hasn’t been discovered yet.

1302
00:58:41,680 –> 00:58:43,400
Inventory is the other half of manage.

1303
00:58:43,400 –> 00:58:45,040
You need a tenant-wide list.

1304
00:58:45,040 –> 00:58:46,360
What agents exist.

1305
00:58:46,360 –> 00:58:47,520
Where they’re deployed.

1306
00:58:47,520 –> 00:58:48,600
What channels they run in.

1307
00:58:48,600 –> 00:58:49,600
What data they touch.

1308
00:58:49,600 –> 00:58:53,280
What tools they can invoke and which identities they run as.

1309
00:58:53,280 –> 00:58:54,720
Without that you don’t have governance.

1310
00:58:54,720 –> 00:58:57,240
You have hope.

1311
00:58:57,240 –> 00:59:00,240
Improve is where the feedback loop gets operational.

1312
00:59:00,240 –> 00:59:02,160
Not "users said it was cool."

1313
00:59:02,160 –> 00:59:03,440
Real signals.

1314
00:59:03,440 –> 00:59:04,440
Escalation patterns.

1315
00:59:04,440 –> 00:59:05,440
Failure modes.

1316
00:59:05,440 –> 00:59:06,440
Intervention rates.

1317
00:59:06,440 –> 00:59:08,320
And which intents produce exceptions.

1318
00:59:08,320 –> 00:59:10,640
And yes, this is where you do the boring work.

1319
00:59:10,640 –> 00:59:11,640
Tighten contracts.

1320
00:59:11,640 –> 00:59:12,800
Prune knowledge.

1321
00:59:12,800 –> 00:59:13,800
Remove tools.

1322
00:59:13,800 –> 00:59:14,800
Split topics.

1323
00:59:14,800 –> 00:59:15,800
Adjust refusal paths.

1324
00:59:15,800 –> 00:59:18,520
And fix the things that keep creating human cleanup.

1325
00:59:18,520 –> 00:59:22,160
Extend is where scale happens without repeating the same mistakes.

1326
00:59:22,160 –> 00:59:24,760
Standardized patterns become reusable assets.

1327
00:59:24,760 –> 00:59:26,520
Intent contract templates.

1328
00:59:26,520 –> 00:59:28,400
Tool contract patterns.

1329
00:59:28,400 –> 00:59:29,400
Environment setups.

1330
00:59:29,400 –> 00:59:30,640
And testing baselines.

1331
00:59:30,640 –> 00:59:35,520
This is also where you decide when to add advanced capability: new connectors, new channels,

1332
00:59:35,520 –> 00:59:40,000
autonomous triggers, without turning the platform into an uncontrolled feature buffet.

1333
00:59:40,000 –> 00:59:43,920
Now the governance mechanics enterprises always underestimate environments, DLP and

1334
00:59:43,920 –> 00:59:45,400
release gates.

1335
00:59:45,400 –> 00:59:46,680
Environments aren’t bureaucracy.

1336
00:59:46,680 –> 00:59:48,280
They are blast radius control.

1337
00:59:48,280 –> 00:59:51,560
If makers can build anywhere, they will. If they can publish anywhere, they will.

1338
00:59:51,560 –> 00:59:52,560
That’s not malice.

1339
00:59:52,560 –> 00:59:54,360
That’s entropy.

1340
00:59:54,360 –> 00:59:56,960
So you need a clear environment strategy.

1341
00:59:56,960 –> 00:59:58,160
Development for building.

1342
00:59:58,160 –> 00:59:59,360
Test for validation.

1343
00:59:59,360 –> 01:00:00,360
Production for runtime.

1344
01:00:00,360 –> 01:00:04,760
Promotions happen through solutions and pipelines, not through copy-paste and "I changed one line

1345
01:00:04,760 –> 01:00:05,960
of instructions."

1346
01:00:05,960 –> 01:00:07,520
And the gating rule is simple.

1347
01:00:07,520 –> 01:00:11,960
If the agent can act, it needs a release gate that includes testing and an owner sign-off.

1348
01:00:11,960 –> 01:00:13,440
Then DLP.

1349
01:00:13,440 –> 01:00:17,120
Not as a checkbox but as a constraint on connectors, triggers and knowledge sources.

1350
01:00:17,120 –> 01:00:21,920
You don’t allow any connector that works; you allow the ones that match the contracts.

1351
01:00:21,920 –> 01:00:24,560
DLP is where you prevent the classic drift.

1352
01:00:24,560 –> 01:00:28,640
Someone adds a new connector to solve a one-off problem and suddenly the agent can exfiltrate

1353
01:00:28,640 –> 01:00:31,840
data to a place governance doesn’t monitor.

1354
01:00:31,840 –> 01:00:37,000
And finally, the part most programs skip until the incident: change control that survives

1355
01:00:37,000 –> 01:00:38,560
people leaving.

1356
01:00:38,560 –> 01:00:39,560
Agents are easy to build.

1357
01:00:39,560 –> 01:00:40,560
That’s the point.

1358
01:00:40,560 –> 01:00:41,560
It’s also the problem.

1359
01:00:41,560 –> 01:00:45,400
If the only person who understands an agent is the maker who built it at 2am, it isn’t

1360
01:00:45,400 –> 01:00:46,560
an enterprise asset.

1361
01:00:46,560 –> 01:00:47,560
It’s a pending outage.

1362
01:00:47,560 –> 01:00:49,880
So governance has to enforce three realities.

1363
01:00:49,880 –> 01:00:52,200
Ownership, inventory and release discipline.

1364
01:00:52,200 –> 01:00:55,760
Because once you operate agents as a portfolio, a second truth shows up.

1365
01:00:55,760 –> 01:01:00,280
You can stop arguing about AI quality in abstract terms and start measuring whether the architecture

1366
01:01:00,280 –> 01:01:01,280
is holding.

1367
01:01:01,280 –> 01:01:05,120
And measurement is where governance becomes real or becomes theatre.

1368
01:01:05,120 –> 01:01:07,760
Metrics that matter, measuring architecture, not sentiment.

1369
01:01:07,760 –> 01:01:11,200
If governance is portfolio management, metrics are how you prove the portfolio isn’t

1370
01:01:11,200 –> 01:01:12,360
quietly rotting.

1371
01:01:12,360 –> 01:01:15,520
And the first thing to accept is that most agent metrics are comfort metrics.

1372
01:01:15,520 –> 01:01:17,400
They tell you the agent is being talked to.

1373
01:01:17,400 –> 01:01:20,440
They don’t tell you the agent is doing the job you delegated.

1374
01:01:20,440 –> 01:01:23,560
Session counts, conversation volume, thumbs up ratios.

1375
01:01:23,560 –> 01:01:24,560
Nice to have.

1376
01:01:24,560 –> 01:01:27,360
They don’t answer the only question that matters in an enterprise.

1377
01:01:27,360 –> 01:01:31,640
Did the system behave predictably, under control and with evidence? So the metric set has to

1378
01:01:31,640 –> 01:01:33,760
measure architecture, not sentiment.

1379
01:01:33,760 –> 01:01:39,040
It has to expose where your design is probabilistic, where it’s deterministic, and where humans

1380
01:01:39,040 –> 01:01:41,560
are doing cleanup work you didn’t admit existed.

1381
01:01:41,560 –> 01:01:44,760
Start with the most honest metric in the entire program.

1382
01:01:44,760 –> 01:01:45,760
Intervention rate.

1383
01:01:45,760 –> 01:01:50,440
Intervention rate is how often a human overrides, redoes or corrects what the agent produced.

1384
01:01:50,440 –> 01:01:52,400
Not escalations as a concept.

1385
01:01:52,400 –> 01:01:56,880
Actual interventions: a user says never mind, reopens the ticket, changes fields after the

1386
01:01:56,880 –> 01:02:01,880
agent created the record, reruns the flow manually, or asks a human to validate before they

1387
01:02:01,880 –> 01:02:02,880
trust it.

1388
01:02:02,880 –> 01:02:07,440
A high intervention rate means your agent is producing work-shaped output that still requires

1389
01:02:07,440 –> 01:02:11,440
human verification, that is not automation, that is outsourced uncertainty.

1390
01:02:11,440 –> 01:02:15,800
And the enterprise tends to fund that for months because the agent looks busy.

1391
01:02:15,800 –> 01:02:18,240
Intervention rate is how you stop funding theatre.
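A sketch of intervention rate computed from outcome events. The event shape and follow-up labels are invented for the sketch; what counts as an intervention should match the list above (never mind, reopened, fields changed, manual rerun, pre-trust validation).

```python
INTERVENTIONS = {"never_mind", "reopened", "fields_changed",
                 "manual_rerun", "human_validated_first"}

def intervention_rate(events: list) -> float:
    """Share of agent-produced outcomes a human overrode, redid or corrected."""
    outcomes = [e for e in events if e.get("type") == "outcome"]
    intervened = [e for e in outcomes if e.get("followup") in INTERVENTIONS]
    return len(intervened) / len(outcomes) if outcomes else 0.0
```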

1392
01:02:18,240 –> 01:02:20,160
Second, determinism score.

1393
01:02:20,160 –> 01:02:24,720
This is the metric most people don’t measure because it forces uncomfortable design changes.

1394
01:02:24,720 –> 01:02:26,320
The definition is simple.

1395
01:02:26,320 –> 01:02:30,240
Given the same intent and the same context, does the agent take the same action path?

1396
01:02:30,240 –> 01:02:34,440
Not "does it respond similarly." Does it route to the same topic, call the same tool, validate

1397
01:02:34,440 –> 01:02:37,280
the same preconditions and produce the same state transition?

1398
01:02:37,280 –> 01:02:42,520
If the action path changes based on phrasing, random retrieval chunks or prompt drift,

1399
01:02:42,520 –> 01:02:43,720
determinism is low.

1400
01:02:43,720 –> 01:02:48,280
And low determinism is the root cause of operational distrust because humans can’t build stable

1401
01:02:48,280 –> 01:02:51,520
expectations around something that behaves differently every time.

1402
01:02:51,520 –> 01:02:55,680
So the practical way to measure determinism is to maintain a fixed test set of canonical

1403
01:02:55,680 –> 01:03:00,040
requests, 10 to 50 high value intents, and rerun them after every change.

1404
01:03:00,040 –> 01:03:03,440
Same user profile, same data conditions where possible, same environment.

1405
01:03:03,440 –> 01:03:05,520
Measure whether the route and tool calls remain stable.

1406
01:03:05,520 –> 01:03:09,360
If you can’t keep the path stable, the solution isn’t training users.

1407
01:03:09,360 –> 01:03:12,240
The solution is tighter contracts and smaller tool surfaces.
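Determinism score from a fixed canonical test set, sketched. Each canonical request maps to the action paths (topic, tool, state transition) observed across reruns; a request counts as stable only if the path never varied. Names are illustrative.

```python
def determinism_score(observed_paths: dict) -> float:
    """observed_paths: canonical request id -> list of (topic, tool, transition)
    tuples, one per rerun. Stable means the path never varied across reruns."""
    if not observed_paths:
        return 0.0
    stable = sum(1 for paths in observed_paths.values() if len(set(paths)) == 1)
    return stable / len(observed_paths)
```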

1408
01:03:12,240 –> 01:03:14,520
Third, audit completeness.

1409
01:03:14,520 –> 01:03:17,200
Audit completeness is not, we have a transcript.

1410
01:03:17,200 –> 01:03:18,200
Transcripts are narrative.

1411
01:03:18,200 –> 01:03:20,600
Audits require reconstructable evidence.

1412
01:03:20,600 –> 01:03:22,120
Audit completeness asks.

1413
01:03:22,120 –> 01:03:26,800
For every agent initiated action, can the organization show what was requested, what identity

1414
01:03:26,800 –> 01:03:32,480
executed it, what tool or workflow ran, what record changed, what preconditions were checked,

1415
01:03:32,480 –> 01:03:35,040
what approval occurred and what the outcome was?

1416
01:03:35,040 –> 01:03:37,760
If the answer is kind of, you don’t have an agent system.

1417
01:03:37,760 –> 01:03:40,240
You have a conversational interface with side effects.

1418
01:03:40,240 –> 01:03:44,240
This is where a system of record stops being an integration detail and becomes the core

1419
01:03:44,240 –> 01:03:45,240
control.

1420
01:03:45,240 –> 01:03:48,800
If you’re not writing structured entries into the system of record for agent actions,

1421
01:03:48,800 –> 01:03:51,840
you are building an unauditable automation layer.

1422
01:03:51,840 –> 01:03:54,480
That will end exactly how you think it will end.
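Audit completeness as a checkable predicate over the evidence fields named above. A sketch; the field names are paraphrased from the list, not a product schema.

```python
REQUIRED_EVIDENCE = ("requested", "executing_identity", "tool_or_workflow",
                     "record_changed", "preconditions_checked", "approval", "outcome")

def audit_complete(entry: dict) -> bool:
    """An agent action is auditable only if every evidence field is present and non-null."""
    return all(entry.get(field) is not None for field in REQUIRED_EVIDENCE)
```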

1423
01:03:54,480 –> 01:03:58,560
Now add operational KPIs, but tie them to strict definitions.

1424
01:03:58,560 –> 01:04:02,320
Resolution rate only matters if resolved means the workflow reached a defined terminal

1425
01:04:02,320 –> 01:04:03,320
state.

1426
01:04:03,320 –> 01:04:07,680
Deflection rate only matters if deflection means no human had to intervene later.

1427
01:04:07,680 –> 01:04:11,840
Mean time to resolution only matters if you measure it from request to verified state change,

1428
01:04:11,840 –> 01:04:13,560
not from chat start to chat end.

1429
01:04:13,560 –> 01:04:17,640
Otherwise you can improve the metric by ending sessions faster while pushing work downstream.

1430
01:04:17,640 –> 01:04:22,480
The last metric that separates mature programs from pilot programs is refusal quality.

1431
01:04:22,480 –> 01:04:24,840
If refusal is a feature, it should be measurable.

1432
01:04:24,840 –> 01:04:29,200
How often the agent refuses, why it refuses and whether refusal roots users into a clean

1433
01:04:29,200 –> 01:04:31,600
escalation path with preserved state.

1434
01:04:31,600 –> 01:04:34,840
A refusal that creates a usable ticket with context is a control.

1435
01:04:34,840 –> 01:04:38,960
A refusal that tells the user to try again later is just a delayed failure.

1436
01:04:38,960 –> 01:04:42,360
So the metric set becomes a dashboard of architectural truth.

1437
01:04:42,360 –> 01:04:47,040
Intervention rate, determinism score, audit completeness, operational KPIs with real definitions

1438
01:04:47,040 –> 01:04:48,640
and refusal quality.

1439
01:04:48,640 –> 01:04:50,960
And once you track those, two things happen.

1440
01:04:50,960 –> 01:04:54,160
First, you stop arguing about whether the agent is good.

1441
01:04:54,160 –> 01:04:56,360
You start seeing where the system is uncontrolled.

1442
01:04:56,360 –> 01:05:01,240
Second, you realize metrics without testing discipline are just forensic reports after trust

1443
01:05:01,240 –> 01:05:02,240
is already damaged.

1444
01:05:02,240 –> 01:05:06,360
So the next mandate is evidence: test like software, because you are shipping a system,

1445
01:05:06,360 –> 01:05:08,000
not a personality.

1446
01:05:08,000 –> 01:05:09,000
Testing discipline.

1447
01:05:09,000 –> 01:05:11,320
From "try it out" to evidence.

1448
01:05:11,320 –> 01:05:15,480
Once you start measuring determinism and intervention, you run into a problem that can’t be solved

1449
01:05:15,480 –> 01:05:16,640
with optimism.

1450
01:05:16,640 –> 01:05:18,640
You can’t manage what you can’t reproduce.

1451
01:05:18,640 –> 01:05:23,280
And most co-pilot agent programs are built on the least reproducible testing method in enterprise

1452
01:05:23,280 –> 01:05:24,280
IT.

1453
01:05:24,280 –> 01:05:28,840
Someone trying it out in a chat pane, getting a decent answer once and shipping.

1454
01:05:28,840 –> 01:05:29,840
That’s not testing.

1455
01:05:29,840 –> 01:05:30,840
That’s a vibe check.

1456
01:05:30,840 –> 01:05:31,920
Agents don’t need more encouragement.

1457
01:05:31,920 –> 01:05:33,120
They need evidence.

1458
01:05:33,120 –> 01:05:37,160
So the testing discipline has to look like software discipline adapted for a probabilistic

1459
01:05:37,160 –> 01:05:39,680
component, not because the platform is broken.

1460
01:05:39,680 –> 01:05:43,360
Because the platform is doing exactly what probabilistic systems do, they vary.

1461
01:05:43,360 –> 01:05:47,280
Your job is to bound that variance until it becomes operationally acceptable.

1462
01:05:47,280 –> 01:05:48,800
The first shift is simple.

1463
01:05:48,800 –> 01:05:50,120
Move testing left.

1464
01:05:50,120 –> 01:05:53,840
Early, repeatable and tied to the intents you actually care about.

1465
01:05:53,840 –> 01:05:56,960
Before anything goes near production, you define a core scenario set.

1466
01:05:56,960 –> 01:06:00,200
10 to 50 scenarios that represent the real outcomes you’re delegating.

1467
01:06:00,200 –> 01:06:03,320
Not edge cases, not fun prompts, not clever demos.

1468
01:06:03,320 –> 01:06:07,400
The boring high frequency paths that will generate tickets, approvals, changes and audit

1469
01:06:07,400 –> 01:06:08,400
questions.

1470
01:06:08,400 –> 01:06:12,040
And you test them every time you change anything that can alter behavior.

1471
01:06:12,040 –> 01:06:17,120
Instructions, topic triggers, tool definitions, knowledge sources, connector auth, models,

1472
01:06:17,120 –> 01:06:18,640
even environment policies.

1473
01:06:18,640 –> 01:06:20,320
Because those are all behavior changes.

1474
01:06:20,320 –> 01:06:22,120
Some are just disguised as configuration.

1475
01:06:22,120 –> 01:06:24,600
Copilot Studio gives you multiple ways to do this.

1476
01:06:24,600 –> 01:06:26,800
And the point isn’t which feature is best.

1477
01:06:26,800 –> 01:06:30,640
The point is you stop relying on memory and start relying on repeatable runs.

1478
01:06:30,640 –> 01:06:33,520
Use the built-in evaluation capability when it fits.

1479
01:06:33,520 –> 01:06:36,440
Single turn checks for quality, groundedness and completeness.

1480
01:06:36,440 –> 01:06:39,800
It’s useful for quickly catching regressions in generative answers.

1481
01:06:39,800 –> 01:06:44,120
And it can generate test sets based on your agent metadata, knowledge sources or past

1482
01:06:44,120 –> 01:06:45,120
chats.

1483
01:06:45,120 –> 01:06:46,120
That’s fine for breadth.

1484
01:06:46,120 –> 01:06:48,400
But don’t confuse that with end-to-end proof.

1485
01:06:48,400 –> 01:06:52,120
Single turn evaluation won’t tell you whether a multi-turn execution path collects parameters

1486
01:06:52,120 –> 01:06:57,200
correctly, validates preconditions, calls the right tool and writes to the system of record.

1487
01:06:57,200 –> 01:06:59,040
It won’t show you idempotency failures.

1488
01:06:59,040 –> 01:07:04,320
It won’t show you what happens when a connector throws an HTTP error halfway through a workflow.

1489
01:07:04,320 –> 01:07:07,840
So for execution scenarios you need multi-turn automated testing.

1490
01:07:07,840 –> 01:07:12,080
That’s where bulk testing approaches like the Copilot Studio Kit become the adult option.

1491
01:07:12,080 –> 01:07:16,640
You run test conversations that include the messy user request, the clarification turns,

1492
01:07:16,640 –> 01:07:20,520
the confirmation steps, the tool invocation, the error path and the escalation path.

1493
01:07:20,520 –> 01:07:23,800
And you run them at scale because non-determinism hides in volume.

1494
01:07:23,800 –> 01:07:25,680
The goal isn’t to prove it works.

1495
01:07:25,680 –> 01:07:29,080
The goal is to discover where it fails before users do.

1496
01:07:29,080 –> 01:07:31,480
Now add the most ignored part of agent testing.

1497
01:07:31,480 –> 01:07:32,720
Identity profiles.

1498
01:07:32,720 –> 01:07:35,440
An agent that works under a maker account proves nothing.

1499
01:07:35,440 –> 01:07:40,640
You test with designated user profiles that reflect real entitlements: standard employee,

1500
01:07:40,640 –> 01:07:45,520
privileged operator, regional user, contractor, service desk analyst. Same intent, different

1501
01:07:45,520 –> 01:07:47,280
identity, different access graph.

1502
01:07:47,280 –> 01:07:48,440
That’s not a corner case.

1503
01:07:48,440 –> 01:07:50,600
That’s the system.
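
A hedged sketch of that idea, assuming an illustrative `execute_intent` stand-in for the agent's execution layer: the same intent runs under each identity profile, and the outcomes are recorded per profile rather than assumed from a maker account.

```python
# Hypothetical sketch: same intent, different identity, different access graph.
# Profiles and `execute_intent` are illustrative assumptions, not platform APIs.
PROFILES = {
    "standard_employee": {"can_approve": False},
    "privileged_operator": {"can_approve": True},
    "contractor": {"can_approve": False},
}

def execute_intent(intent: str, profile: dict) -> str:
    # Authorization reality made visible deliberately, not by surprise.
    if intent == "approve_change":
        return "executed" if profile["can_approve"] else "escalated"
    return "executed"

# Run the same scenario across every profile and record each outcome.
outcomes = {name: execute_intent("approve_change", p) for name, p in PROFILES.items()}
```

A test run that only covers one identity proves one access graph; the matrix above is what surfaces authorization failures before users do.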

1504
01:07:50,600 –> 01:07:55,040
Because many agent failures are just authorization reality finally becoming visible.

1505
01:07:55,040 –> 01:07:59,240
The test discipline has to surface that deliberately, not by surprise in a Teams channel.

1506
01:07:59,240 –> 01:08:01,560
Then you treat tests as architecture artifacts.

1507
01:08:01,560 –> 01:08:02,560
They are versioned.

1508
01:08:02,560 –> 01:08:03,560
They are owned.

1509
01:08:03,560 –> 01:08:04,560
They live with the solution.

1510
01:08:04,560 –> 01:08:05,560
They run in pipelines.

1511
01:08:05,560 –> 01:08:06,560
And they gate releases.

1512
01:08:06,560 –> 01:08:10,320
If you can’t describe your release gate in one sentence, “we won’t promote this agent unless

1513
01:08:10,320 –> 01:08:15,560
core scenarios pass under these identities and tool calls produce these records,”

1514
01:08:15,560 –> 01:08:17,240
you don’t have a release process.

1515
01:08:17,240 –> 01:08:19,360
You have hope with deployment permissions.
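
That one-sentence gate can be encoded literally in a pipeline. This is a hedged sketch under assumed result shapes, not a Copilot Studio feature: promotion is blocked unless every identity's core scenarios pass and every tool call produced a record.

```python
# Hypothetical release gate: block promotion unless core scenarios pass
# under each identity AND tool calls produced system-of-record entries.
# Result shapes are assumptions for illustration.
def gate(results: dict, pass_threshold: float = 1.0) -> bool:
    """results maps identity -> scenario and tool-call evidence counts."""
    for r in results.values():
        if r["passed"] / r["total"] < pass_threshold:
            return False  # failing core scenarios block promotion
        if r["records"] < r["tool_calls"]:
            return False  # a tool call with no record blocks promotion
    return True

promoted = gate({
    "standard_employee": {"passed": 20, "total": 20, "records": 5, "tool_calls": 5},
    "contractor": {"passed": 20, "total": 20, "records": 3, "tool_calls": 3},
})
blocked = gate({
    "standard_employee": {"passed": 19, "total": 20, "records": 5, "tool_calls": 5},
})
```

When the gate is a function in the pipeline rather than a sentence in a wiki, "hope with deployment permissions" becomes a release process.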

1516
01:08:19,360 –> 01:08:21,880
And yes, you need regression after every change.

1517
01:08:21,880 –> 01:08:27,280
Agents drift from small edits, knowledge updates, connector changes, model updates, capacity

1518
01:08:27,280 –> 01:08:28,280
limits.

1519
01:08:28,280 –> 01:08:29,280
Those aren’t hypothetical.

1520
01:08:29,280 –> 01:08:31,320
They’re routine platform reality.

1521
01:08:31,320 –> 01:08:35,200
So the discipline is same test set, same thresholds, rerun continuously.

1522
01:08:35,200 –> 01:08:39,480
Finally, testing has to include failure path design, not just success path validation.

1523
01:08:39,480 –> 01:08:40,960
Force the tool to fail.

1524
01:08:40,960 –> 01:08:45,960
Remove permissions, break the dependency, exceed the token limit with oversized inputs.

1525
01:08:45,960 –> 01:08:46,960
Trigger throttling.

1526
01:08:46,960 –> 01:08:48,320
Watch what the agent does.

1527
01:08:48,320 –> 01:08:51,280
If it masks the failure with language, that’s a defect.

1528
01:08:51,280 –> 01:08:55,080
If it escalates with state preserved and evidence attached, that’s architecture.

1529
01:08:55,080 –> 01:08:57,320
This is what enterprise grade actually means.

1530
01:08:57,320 –> 01:09:00,920
Not fewer failures, but failures that are predictable, explainable and containable.

1531
01:09:00,920 –> 01:09:04,640
Once you have that evidence loop, you stop debating whether the agent is ready.

1532
01:09:04,640 –> 01:09:05,640
You know.

1533
01:09:05,640 –> 01:09:08,360
And that’s when operations becomes the next inevitability.

1534
01:09:08,360 –> 01:09:10,240
Drift doesn’t stop because you tested once.

1535
01:09:10,240 –> 01:09:12,080
It just becomes visible sooner.

1536
01:09:12,080 –> 01:09:15,160
Agent ops, drift, incidents and platform reality.

1537
01:09:15,160 –> 01:09:17,520
Now take that testing discipline and assume it worked.

1538
01:09:17,520 –> 01:09:18,800
You shipped.

1539
01:09:18,800 –> 01:09:20,320
Core scenarios pass.

1540
01:09:20,320 –> 01:09:22,000
The agent behaves.

1541
01:09:22,000 –> 01:09:23,920
Everyone celebrates.

1542
01:09:23,920 –> 01:09:25,960
Then the real system starts operating.

1543
01:09:25,960 –> 01:09:30,400
The one made of people, policies, connectors, models and monthly platform updates.

1544
01:09:30,400 –> 01:09:34,280
This is where agent ops stops being optional and becomes the only thing between successful

1545
01:09:34,280 –> 01:09:36,960
pilot and quiet abandonment.

1546
01:09:36,960 –> 01:09:39,880
Agent ops is just the operational truth of agents.

1547
01:09:39,880 –> 01:09:42,840
These systems drift and drift creates incidents.

1548
01:09:42,840 –> 01:09:44,960
And drift does not require malice or incompetence.

1549
01:09:44,960 –> 01:09:46,160
It only requires time.

1550
01:09:46,160 –> 01:09:47,640
Drift shows up in three places.

1551
01:09:47,640 –> 01:09:49,520
First you change the agent.

1552
01:09:49,520 –> 01:09:51,640
Instructions, triggers, prompts, knowledge, tools.

1553
01:09:51,640 –> 01:09:54,040
That’s a behavior change wearing a friendly UI.

1554
01:09:54,040 –> 01:09:56,320
Second, dependencies changed.

1555
01:09:56,320 –> 01:10:02,040
Access, schemas, catalogs, choice values, conditional access, DLP, tokens, secrets, licensing,

1556
01:10:02,040 –> 01:10:03,040
capacity.

1557
01:10:03,040 –> 01:10:04,840
Third, the platform moved.

1558
01:10:04,840 –> 01:10:09,880
Models update, retrieval indexing shifts, throttling changes, preview features sunset.

1559
01:10:09,880 –> 01:10:11,560
The agent didn’t get weird.

1560
01:10:11,560 –> 01:10:13,040
Your runtime changed.

1561
01:10:13,040 –> 01:10:14,280
So the mandate is simple.

1562
01:10:14,280 –> 01:10:15,920
Treat agents like services.

1563
01:10:15,920 –> 01:10:16,920
Services have runbooks.

1564
01:10:16,920 –> 01:10:17,920
They have telemetry.

1565
01:10:17,920 –> 01:10:19,080
They have incident reviews.

1566
01:10:19,080 –> 01:10:20,080
They have backlogs.

1567
01:10:20,080 –> 01:10:23,520
They have owners who get paged when reality disagrees with the demo.

1568
01:10:23,520 –> 01:10:28,320
Back that with reliability patterns, because most agent incidents are boring in the same way.

1569
01:10:28,320 –> 01:10:34,080
Timeouts, partial completions, duplicate execution, silent tool failures, inconsistent routing,

1570
01:10:34,080 –> 01:10:37,280
users resubmitting the same request because the agent went quiet.

1571
01:10:37,280 –> 01:10:38,280
So you design for that.

1572
01:10:38,280 –> 01:10:41,280
Retries exist, but only where the operation is idempotent.

1573
01:10:41,280 –> 01:10:45,720
If a create ticket action can run twice and create two incidents, you don’t retry.

1574
01:10:45,720 –> 01:10:50,400
You add a correlation ID and a deduplication check in the execution layer, then retry becomes

1575
01:10:50,400 –> 01:10:52,200
resume, not repeat.
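
A minimal sketch of that pattern, assuming an in-memory stand-in for the system of record and an illustrative `create_ticket` action: the correlation ID plus a dedup check is what turns a retry into a resume.

```python
# Hypothetical sketch: correlation ID + dedup check in the execution layer.
# TICKETS is an in-memory stand-in for the system of record.
import uuid

TICKETS = {}  # correlation_id -> ticket record

def create_ticket(correlation_id: str, summary: str) -> dict:
    # Dedup check: a retry resumes the original operation, it does not repeat it.
    if correlation_id in TICKETS:
        return TICKETS[correlation_id]
    ticket = {"id": f"INC-{len(TICKETS) + 1}", "summary": summary}
    TICKETS[correlation_id] = ticket
    return ticket

cid = str(uuid.uuid4())
first = create_ticket(cid, "VPN down")
retry = create_ticket(cid, "VPN down")  # the agent retried after a timeout
```

Without the dedup check, the retry would open a second incident; with it, both calls return the same record.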

1576
01:10:52,200 –> 01:10:53,200
Timeouts exist.

1577
01:10:53,200 –> 01:10:58,080
So every long-running action needs explicit time budgets and user-visible state transitions,

1578
01:10:58,080 –> 01:11:01,520
accepted, queued, awaiting approval, executing, completed.

1579
01:11:01,520 –> 01:11:07,080
If it fails, the agent returns a receipt with a record ID or escalation artifact, not a paragraph.
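
The time-budget-plus-receipt idea above can be sketched as follows. All names here are illustrative assumptions: a long-running action walks through explicit states, and a blown budget returns a receipt with the record ID, not prose.

```python
# Hypothetical sketch: explicit state transitions, a time budget, and a
# receipt on failure. States mirror the ones named in the transcript.
STATES = ["accepted", "queued", "awaiting_approval", "executing", "completed"]

def run_action(record_id: str, step_durations: list, budget_s: float) -> dict:
    """Simulate stepping through states; fail with a receipt if the budget blows."""
    elapsed, state = 0.0, "accepted"
    for state, duration in zip(STATES[1:], step_durations):
        elapsed += duration
        if elapsed > budget_s:
            # Return a receipt preserving state and record ID for escalation.
            return {"record_id": record_id, "state": state, "status": "timed_out"}
    return {"record_id": record_id, "state": "completed", "status": "ok"}

ok = run_action("REQ-101", [1, 2, 3, 4], budget_s=15)
timed_out = run_action("REQ-102", [1, 2, 30, 4], budget_s=15)
```

The receipt is the point: an escalation artifact with a record ID and the state it died in is actionable; a paragraph is not.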

1580
01:11:07,080 –> 01:11:09,160
Fallbacks exist, but fallbacks must be engineered.

1581
01:11:09,160 –> 01:11:13,480
If the model can’t confidently route, it should ask a disambiguation question or escalate

1582
01:11:13,480 –> 01:11:14,480
with context.

1583
01:11:14,480 –> 01:11:17,800
A fallback that produces plausible text is not resilience.

1584
01:11:17,800 –> 01:11:19,160
It’s fraud with better grammar.

1585
01:11:19,160 –> 01:11:21,360
And human in the loop checkpoints aren’t a concession.

1586
01:11:21,360 –> 01:11:22,360
They are control points.

1587
01:11:22,360 –> 01:11:27,040
High-impact changes require a human approval gate, even if the agent did everything else.

1588
01:11:27,040 –> 01:11:29,680
That’s how you keep autonomy from turning into liability.

1589
01:11:29,680 –> 01:11:34,160
Now, observability. This is where most agent programs collapse, because they rely on chat

1590
01:11:34,160 –> 01:11:35,960
transcripts like they’re logs.

1591
01:11:35,960 –> 01:11:36,960
They’re not.

1592
01:11:36,960 –> 01:11:38,720
Transcripts are narrative, not telemetry.

1593
01:11:38,720 –> 01:11:43,560
You need instrumentation for: which intent fired, which topic ran, which tools were invoked,

1594
01:11:43,560 –> 01:11:48,000
which identity executed them, how long each step took, what errors returned, how often

1595
01:11:48,000 –> 01:11:51,760
users abandoned, and how often escalation happened.
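
That instrumentation list can be captured as a structured record per tool call. A hedged sketch, with field names as assumptions about what you would instrument, not any platform schema:

```python
# Hypothetical sketch: one structured telemetry record per tool invocation,
# shipped to a log pipeline -- not reconstructed from chat transcripts.
import json

def tool_call_record(intent, topic, tool, identity, started, finished, error=None):
    return {
        "intent": intent,
        "topic": topic,
        "tool": tool,
        "identity": identity,
        "duration_ms": round((finished - started) * 1000),
        "error": error,
    }

# Illustrative timestamps (seconds); in practice use a monotonic clock.
rec = tool_call_record("password_reset", "account_access", "reset_password_api",
                       "user:alice", started=0.0, finished=0.247)
line = json.dumps(rec)  # emit to your telemetry sink
```

Every field in that record answers an audit question a transcript cannot: who executed what, how long it took, and what came back.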

1596
01:11:51,760 –> 01:11:54,640
Tool-call telemetry matters more than “the agent said something.”

1597
01:11:54,640 –> 01:11:57,800
And you need to capture latency because latency creates user behavior.

1598
01:11:57,800 –> 01:12:02,080
When agents take too long, users resubmit, resubmission creates duplicates.

1599
01:12:02,080 –> 01:12:03,600
Duplicates create rework.

1600
01:12:03,600 –> 01:12:05,000
Rework destroys trust.

1601
01:12:05,000 –> 01:12:06,680
This is not user training.

1602
01:12:06,680 –> 01:12:08,000
This is system behavior.

1603
01:12:08,000 –> 01:12:09,920
Then you need an operational posture.

1604
01:12:09,920 –> 01:12:12,680
Weekly review, not quarterly post-mortems.

1605
01:12:12,680 –> 01:12:17,600
Look for drift signals, rising intervention rate, rising fallback rate, increased unrecognized

1606
01:12:17,600 –> 01:12:21,720
utterances, increased tool call failures, rising latency, more escalations

1607
01:12:21,720 –> 01:12:24,760
in a specific intent. Those are leading indicators.

1608
01:12:24,760 –> 01:12:27,160
If you wait for the incident, you’ve already lost adoption.
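
The weekly review above can be reduced to a simple week-over-week comparison. This is a minimal sketch with assumed metric names and an arbitrary threshold; the point is flagging rising leading indicators before they become incidents:

```python
# Hypothetical sketch: flag drift signals whose relative increase week over
# week exceeds a threshold. Metric names and rates are illustrative.
LAST_WEEK = {"fallback_rate": 0.04, "tool_failures": 0.02, "unrecognized": 0.05}
THIS_WEEK = {"fallback_rate": 0.09, "tool_failures": 0.02, "unrecognized": 0.11}

def drift_signals(last: dict, now: dict, rel_increase: float = 0.5) -> list:
    """Return indicators that rose more than rel_increase (e.g. 50%)."""
    return sorted(
        k for k in now
        if last[k] > 0 and (now[k] - last[k]) / last[k] > rel_increase
    )

flags = drift_signals(LAST_WEEK, THIS_WEEK)
```

Run it weekly against the same metric set and the review agenda writes itself: each flagged indicator maps to an intent or tool worth inspecting.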

1609
01:12:27,160 –> 01:12:32,080
When an incident does happen, the review has to be brutally specific. “The model hallucinated”?

1610
01:12:32,080 –> 01:12:33,080
That’s not a root cause.

1611
01:12:33,080 –> 01:12:35,320
The root cause is always architectural.

1612
01:12:35,320 –> 01:12:40,480
Missing precondition checks, ambiguous routing, overbroad tool surface, unowned knowledge, identity

1613
01:12:40,480 –> 01:12:42,640
mismatch, or inadequate logging.

1614
01:12:42,640 –> 01:12:46,200
And every incident produces a backlog item that hardens the system.

1615
01:12:46,200 –> 01:12:50,640
Tighter contracts, better validation, narrower tools, improved escalation artifacts, additional

1616
01:12:50,640 –> 01:12:52,520
tests, and clearer runbooks.

1617
01:12:52,520 –> 01:12:54,160
That’s the real promise of agent ops.

1618
01:12:54,160 –> 01:12:55,520
Not that incidents disappear.

1619
01:12:55,520 –> 01:12:59,480
That incidents become containable, explainable, and less frequent over time.

1620
01:12:59,480 –> 01:13:02,480
Because the uncomfortable truth is that platforms will keep moving.

1621
01:13:02,480 –> 01:13:03,720
The tenant will keep changing.

1622
01:13:03,720 –> 01:13:06,200
Your organization will keep accumulating entropy.

1623
01:13:06,200 –> 01:13:11,120
The question is whether your agent program treats that as AI being AI or whether you run

1624
01:13:11,120 –> 01:13:12,120
it like production.

1625
01:13:12,120 –> 01:13:15,200
If you choose production, drift becomes a managed cost.

1626
01:13:15,200 –> 01:13:19,800
If you choose vibes, drift becomes your entire story. Which brings us to the executive playbook:

1627
01:13:19,800 –> 01:13:22,280
The five decisions leadership must force.

1628
01:13:22,280 –> 01:13:24,320
At this point, the pattern should be obvious.

1629
01:13:24,320 –> 01:13:26,160
Agent failure isn’t a tooling problem.

1630
01:13:26,160 –> 01:13:28,880
It’s a leadership problem disguised as a maker problem.

1631
01:13:28,880 –> 01:13:31,920
Because agents sit at the intersection of accountability and automation.

1632
01:13:31,920 –> 01:13:35,640
And if leadership doesn’t force a few hard decisions up front, the system will make those

1633
01:13:35,640 –> 01:13:37,040
decisions for you.

1634
01:13:37,040 –> 01:13:41,360
Accidentally, inconsistently, and usually during an incident review.

1635
01:13:41,360 –> 01:13:43,840
So here are the five decisions leadership has to force.

1636
01:13:43,840 –> 01:13:46,640
Not suggest, not encourage, force.

1637
01:13:46,640 –> 01:13:47,640
Decision one.

1638
01:13:47,640 –> 01:13:50,040
What the agent owns end to end, and what stays human.

1639
01:13:50,040 –> 01:13:52,720
This is the boundary between assistance and delegation.

1640
01:13:52,720 –> 01:13:58,000
If the agent owns an outcome, it owns the workflow path, the tool calls, and the evidence.

1641
01:13:58,000 –> 01:14:02,720
If a human owns it, the agent can guide, summarize, and prepare, but it does not commit.

1642
01:14:02,720 –> 01:14:05,920
Leadership has to pick because ambiguity here creates blame later.

1643
01:14:05,920 –> 01:14:09,720
And the fastest way to kill adoption is to ship an agent that sometimes acts and sometimes

1644
01:14:09,720 –> 01:14:12,120
doesn’t with no reliable rule behind it.

1645
01:14:12,120 –> 01:14:13,120
Decision two.

1646
01:14:13,120 –> 01:14:16,360
The identity model and the least privileged boundary.

1647
01:14:16,360 –> 01:14:22,120
Leadership has to decide whether execution runs as the user or as a service identity per intent.

1648
01:14:22,120 –> 01:14:25,880
Then enforce least privilege as a design constraint, not a security aspiration.

1649
01:14:25,880 –> 01:14:29,200
This is also where leaders stop tolerating temporary access.

1650
01:14:29,200 –> 01:14:32,480
Agents will operationalize whatever access you’ve accumulated.

1651
01:14:32,480 –> 01:14:37,120
If the business wants action-capable agents, the business funds identity hygiene, access

1652
01:14:37,120 –> 01:14:42,960
reviews, app-permission discipline, and a clear run-as pattern that survives staff turnover.

1653
01:14:42,960 –> 01:14:43,960
Decision three.

1654
01:14:43,960 –> 01:14:47,960
The system of record and the audit requirement. An agent that acts without writing to a system of record

1655
01:14:47,960 –> 01:14:50,040
is not an enterprise system.

1656
01:14:50,040 –> 01:14:52,960
It’s an opinionated chat session with side effects.

1657
01:14:52,960 –> 01:14:56,720
Leadership has to name the record authority per workflow (ServiceNow, Dynamics, a line-of-business

1658
01:14:56,720 –> 01:15:02,000
system), and make it non-negotiable that agent actions produce reconstructable evidence.

1659
01:15:02,000 –> 01:15:06,520
Request, identity, preconditions, approvals, tool invocation outcome.

1660
01:15:06,520 –> 01:15:10,000
If that evidence doesn’t exist, the agent isn’t allowed to execute.

1661
01:15:10,000 –> 01:15:11,000
Simple rule.

1662
01:15:11,000 –> 01:15:12,720
Massive impact.

1663
01:15:12,720 –> 01:15:13,720
Decision four.

1664
01:15:13,720 –> 01:15:14,720
Release gates, testing thresholds,

1665
01:15:14,720 –> 01:15:16,600
and rollback expectations.

1666
01:15:16,600 –> 01:15:21,480
If leadership wants deterministic ROI, leadership forces deterministic release discipline.

1667
01:15:21,480 –> 01:15:27,000
That means pre-production environments, versioned changes, test sets that cover core intents

1668
01:15:27,000 –> 01:15:30,760
under real identity profiles, and a pass threshold that blocks promotion.

1669
01:15:30,760 –> 01:15:34,000
It also means rollback is designed, not improvised.

1670
01:15:34,000 –> 01:15:38,600
When the platform shifts under you, and it will, you need a way to revert behavior quickly

1671
01:15:38,600 –> 01:15:41,360
without a late night prompt edit in production.

1672
01:15:41,360 –> 01:15:42,360
Decision five.

1673
01:15:42,360 –> 01:15:45,320
Success metrics tied to outcomes, not demo quality.

1674
01:15:45,320 –> 01:15:47,640
Leadership has to kill vanity metrics early.

1675
01:15:47,640 –> 01:15:49,040
Session volume is not success.

1676
01:15:49,040 –> 01:15:50,720
Positive reactions are not success.

1677
01:15:50,720 –> 01:15:54,960
Success is an operational delta that survives outside the agent team.

1678
01:15:54,960 –> 01:15:59,400
Reduced escalations with strict definitions, reduced rework, faster approvals with preserved

1679
01:15:59,400 –> 01:16:04,060
compliance, measurable cycle time reduction, improved throughput and intervention rates

1680
01:16:04,060 –> 01:16:05,800
trending down over time.

1681
01:16:05,800 –> 01:16:07,920
And leaders have to assign metric ownership.

1682
01:16:07,920 –> 01:16:10,600
If nobody owns the metric, the agent doesn’t have a goal.

1683
01:16:10,600 –> 01:16:11,800
It has a launch date.

1684
01:16:11,800 –> 01:16:16,800
When these five decisions are explicit, the rest becomes execution: contracts, tools,

1685
01:16:16,800 –> 01:16:20,280
context, orchestration, governance, measurement and operations.

1686
01:16:20,280 –> 01:16:23,360
And when they’re not explicit, what you get is predictable too.

1687
01:16:23,360 –> 01:16:28,680
Chat-first agents with broad permissions, unclear boundaries, and an “AI is unpredictable”

1688
01:16:28,680 –> 01:16:33,200
narrative that conveniently avoids the actual root cause.

1689
01:16:33,200 –> 01:16:36,120
This is what thought leadership should really do in this space.

1690
01:16:36,120 –> 01:16:41,080
Stop arguing about whether agents are ready and start forcing architectural honesty.

1691
01:16:41,080 –> 01:16:43,240
Because the platform will keep improving.

1692
01:16:43,240 –> 01:16:44,440
Models will keep changing.

1693
01:16:44,440 –> 01:16:45,520
Features will keep shipping.

1694
01:16:45,520 –> 01:16:46,520
That’s not your constraint.

1695
01:16:46,520 –> 01:16:51,640
Your constraint is whether you can turn probabilistic reasoning into deterministic operations.

1696
01:16:51,640 –> 01:16:55,280
And that only happens when leadership decides what gets enforced.

1697
01:16:55,280 –> 01:16:56,280
Conclusion.

1698
01:16:56,280 –> 01:16:57,760
Agents don’t fail because they’re dumb.

1699
01:16:57,760 –> 01:17:01,200
They fail because you ask them to act without a control plane.

1700
01:17:01,200 –> 01:17:05,240
And architecture is what turns probabilistic reasoning into governed execution.

1701
01:17:05,240 –> 01:17:09,800
If this helped, leave a review so fewer teams keep funding chat-shaped risk.

1702
01:17:09,800 –> 01:17:13,080
Then connect on LinkedIn, tell me what you’re building, and send the next

1703
01:17:13,080 –> 01:17:15,560
failure pattern you’re seeing so it becomes the next episode.




