
1
00:00:00,000 –> 00:00:02,120
Most organizations think deploy Copilot
2
00:00:02,120 –> 00:00:03,920
and suddenly they have an agentec workforce.
3
00:00:03,920 –> 00:00:06,160
They are wrong agents don’t create discipline.
4
00:00:06,160 –> 00:00:09,360
They amplify whatever entropy already exists.
5
00:00:09,360 –> 00:00:12,560
Bad data, unclear ownership, and controls
6
00:00:12,560 –> 00:00:14,080
that only live in PowerPoint.
7
00:00:14,080 –> 00:00:16,800
So week two arrives, the first confident wrong answer
8
00:00:16,800 –> 00:00:19,880
hits the wrong audience and adoption quietly dies.
9
00:00:19,880 –> 00:00:22,640
This is a 30-day roadmap that produces measurable outcomes,
10
00:00:22,640 –> 00:00:23,440
not a demo.
11
00:00:23,440 –> 00:00:24,680
Three pillars in order.
12
00:00:24,680 –> 00:00:27,560
Copilot Studio orchestration first, Azure AI Search
13
00:00:27,560 –> 00:00:31,200
plus MCP grounding second and Entra Agent ID governance third.
14
00:00:31,200 –> 00:00:33,640
And there’s one design choice that prevents ghost agents
15
00:00:33,640 –> 00:00:34,240
later.
16
00:00:34,240 –> 00:00:35,000
It’s coming.
17
00:00:35,000 –> 00:00:37,480
Define high performance in executive terms.
18
00:00:37,480 –> 00:00:39,520
Before anyone builds an agent, leadership
19
00:00:39,520 –> 00:00:41,400
has to define high performance in terms
20
00:00:41,400 –> 00:00:42,520
that business can audit.
21
00:00:42,520 –> 00:00:43,720
Not users loved it.
22
00:00:43,720 –> 00:00:45,720
Not we shipped four bots.
23
00:00:45,720 –> 00:00:47,200
Outcomes.
24
00:00:47,200 –> 00:00:49,360
Because the platform will happily generate activity
25
00:00:49,360 –> 00:00:50,520
without impact.
26
00:00:50,520 –> 00:00:52,080
You can have thousands of chats and still
27
00:00:52,080 –> 00:00:54,080
have the same backlog, the same seller breaches
28
00:00:54,080 –> 00:00:55,320
and the same escalations.
29
00:00:55,320 –> 00:00:56,560
That distinction matters.
30
00:00:57,560 –> 00:01:00,680
In executive terms, high performance means the system
31
00:01:00,680 –> 00:01:03,800
measurably changes three things, demand time and risk.
32
00:01:03,800 –> 00:01:05,280
Demand is volume reduction.
33
00:01:05,280 –> 00:01:08,400
If the agent works, fewer tickets get created at all.
34
00:01:08,400 –> 00:01:10,320
Not because users stopped having problems,
35
00:01:10,320 –> 00:01:12,720
but because the first interaction resolves them.
36
00:01:12,720 –> 00:01:13,520
That is deflection.
37
00:01:13,520 –> 00:01:15,680
And it’s the only metric that actually hits cost.
38
00:01:15,680 –> 00:01:17,200
Time is cycle reduction.
39
00:01:17,200 –> 00:01:20,120
If a ticket still gets created, it should be created
40
00:01:20,120 –> 00:01:23,320
with better classification, better context, and fewer handoffs.
41
00:01:23,320 –> 00:01:26,120
That shows up as SLA reduction faster first response
42
00:01:26,120 –> 00:01:28,000
and higher first contact resolution.
43
00:01:28,000 –> 00:01:29,760
Risk is controlled behavior.
44
00:01:29,760 –> 00:01:31,880
The agent doesn’t helpfully guess.
45
00:01:31,880 –> 00:01:34,880
It either answers with grounded evidence or it escalates.
46
00:01:34,880 –> 00:01:37,240
And every action is attributable to an identity
47
00:01:37,240 –> 00:01:38,200
with an audit trail.
48
00:01:38,200 –> 00:01:40,000
So for a 30-day window, the KPIs
49
00:01:40,000 –> 00:01:42,880
have to be realistic, measurable and tied to one domain.
50
00:01:42,880 –> 00:01:45,000
Here are target leaders can sign their name to.
51
00:01:45,000 –> 00:01:49,400
For service IT, 20% to 40% ticket deflection at LL1,
52
00:01:49,400 –> 00:01:52,480
15% to 30% reduction in SLA time for the subset
53
00:01:52,480 –> 00:01:55,840
of tickets the agent touches and 10% to 25% fewer escalations.
54
00:01:55,840 –> 00:01:57,040
Those aren’t vanity numbers.
55
00:01:57,040 –> 00:01:59,200
They come directly from three operational levers,
56
00:01:59,200 –> 00:02:02,560
rooting accuracy, containment boundaries, and handoff latency.
57
00:02:02,560 –> 00:02:06,040
For user productivity, 30 to 60 minutes saved per user
58
00:02:06,040 –> 00:02:07,720
per week in the target group.
59
00:02:07,720 –> 00:02:09,560
Not time saved in theory.
60
00:02:09,560 –> 00:02:12,760
Time saved as measured by reduced back and forth,
61
00:02:12,760 –> 00:02:17,000
fewer status check messages and fewer who owns this detours.
62
00:02:17,000 –> 00:02:21,080
Also, over 60% task completion without a human handoff
63
00:02:21,080 –> 00:02:22,560
for the narrow workflow you choose,
64
00:02:22,560 –> 00:02:25,280
an adoption in the target group of 30 to 50%.
65
00:02:25,280 –> 00:02:27,040
If nobody uses it, it doesn’t exist.
66
00:02:27,040 –> 00:02:31,040
For quality and risk, greater than 85% grounded answer accuracy
67
00:02:31,040 –> 00:02:34,280
on an evaluation set, zero access violations,
68
00:02:34,280 –> 00:02:36,520
an audit logging enabled from day one.
69
00:02:36,520 –> 00:02:37,640
Not after the pilot.
70
00:02:37,640 –> 00:02:40,200
Day one, now the antimetrics, these are the numbers teams
71
00:02:40,200 –> 00:02:41,480
love because they’re easy.
72
00:02:41,480 –> 00:02:42,520
They are also useless.
73
00:02:42,520 –> 00:02:44,920
Prompt counts, check counts, token consumption,
74
00:02:44,920 –> 00:02:47,320
number of agents, these measure noise, not outcomes,
75
00:02:47,320 –> 00:02:49,520
they also incentivize exactly the wrong behavior.
76
00:02:49,520 –> 00:02:51,440
Build more, publish more, celebrate more.
77
00:02:51,440 –> 00:02:54,040
Meanwhile, the system decays, a better mental model is this.
78
00:02:54,040 –> 00:02:57,080
Every KPI maps to an operational lever you can actually tune.
79
00:02:57,080 –> 00:02:59,040
Deflection and first contact resolution map
80
00:02:59,040 –> 00:03:01,640
to containment design, what the agent must solve
81
00:03:01,640 –> 00:03:03,240
versus what it must escalate.
82
00:03:03,240 –> 00:03:04,560
If you don’t define that boundary,
83
00:03:04,560 –> 00:03:06,960
you will either over escalate and waste time
84
00:03:06,960 –> 00:03:09,520
or over confidently answer and destroy trust.
85
00:03:09,520 –> 00:03:12,680
SLA reduction maps to handoff latency and enrichment.
86
00:03:12,680 –> 00:03:15,560
If escalation requires the user to repeat everything,
87
00:03:15,560 –> 00:03:17,640
you didn’t build an agent, you built a delay.
88
00:03:17,640 –> 00:03:20,800
The handoff has to carry the context, intent, urgency,
89
00:03:20,800 –> 00:03:24,080
impacted service, device, and what the agent already tried.
90
00:03:24,080 –> 00:03:25,960
Grounded accuracy maps to knowledge coverage
91
00:03:25,960 –> 00:03:27,040
and retrieval quality.
92
00:03:27,040 –> 00:03:29,840
If your content is messy, stale or too large to retrieve
93
00:03:29,840 –> 00:03:31,680
cleanly, the model will improvise.
94
00:03:31,680 –> 00:03:33,920
It’s not malice, it’s math.
95
00:03:33,920 –> 00:03:36,480
An adoption maps to user experience, short answers,
96
00:03:36,480 –> 00:03:39,240
clear next actions and fewer decisions per interaction.
97
00:03:39,240 –> 00:03:41,440
Paragraphs don’t ship work, decisions do.
98
00:03:41,440 –> 00:03:43,280
The next thing leaders miss is ownership.
99
00:03:43,280 –> 00:03:46,360
High performance doesn’t come from who built the bot.
100
00:03:46,360 –> 00:03:47,880
It comes from who owns the outcome,
101
00:03:47,880 –> 00:03:49,720
so a sign an outcome owner per use case,
102
00:03:49,720 –> 00:03:52,360
not a maker, not a dev lead, an accountable operator.
103
00:03:52,360 –> 00:03:54,920
For IT triage, that’s usually the service owner
104
00:03:54,920 –> 00:03:56,200
or the head of service desk.
105
00:03:56,200 –> 00:03:58,960
They sign the KPI targets, they decide what done means.
106
00:03:58,960 –> 00:04:01,400
They also own deprecating topics that don’t perform
107
00:04:01,400 –> 00:04:03,800
because if nobody has authority to kill weak behavior,
108
00:04:03,800 –> 00:04:06,200
the system accumulates entropy generators forever.
109
00:04:06,200 –> 00:04:08,560
Finally, set the system boundary, pick one domain,
110
00:04:08,560 –> 00:04:10,520
one channel, one audience, one backlog.
111
00:04:10,520 –> 00:04:12,120
Performance requires a closed system
112
00:04:12,120 –> 00:04:13,640
where change is observable.
113
00:04:13,640 –> 00:04:15,280
If every department ships an agent
114
00:04:15,280 –> 00:04:17,720
to solve a personal annoyance, you don’t get a workforce.
115
00:04:17,720 –> 00:04:19,560
You get a zoo and that’s the transition point.
116
00:04:19,560 –> 00:04:21,440
The roadmap starts by forcing a boundary
117
00:04:21,440 –> 00:04:23,720
because without one, everything becomes theater.
118
00:04:23,720 –> 00:04:26,520
The core misconception.
119
00:04:26,520 –> 00:04:28,880
Automation isn’t an agentic workforce.
120
00:04:28,880 –> 00:04:31,240
Most leaders have already funded automation.
121
00:04:31,240 –> 00:04:32,720
Some of it even worked.
122
00:04:32,720 –> 00:04:34,920
A power-automate flow here, a ticket template there,
123
00:04:34,920 –> 00:04:37,120
maybe a chatbot that answers the top five questions
124
00:04:37,120 –> 00:04:38,640
when the moon is in the right phase.
125
00:04:38,640 –> 00:04:40,080
That’s not an agentic workforce.
126
00:04:40,080 –> 00:04:42,200
That’s sparkling automation, isolated wins
127
00:04:42,200 –> 00:04:43,360
that look great in a demo
128
00:04:43,360 –> 00:04:45,480
because they run in a clean, staged world.
129
00:04:45,480 –> 00:04:46,960
But they don’t compose into a system.
130
00:04:46,960 –> 00:04:49,120
They don’t share a vocabulary of intent.
131
00:04:49,120 –> 00:04:50,600
They don’t have consistent boundaries.
132
00:04:50,600 –> 00:04:52,000
They don’t learn from failure.
133
00:04:52,000 –> 00:04:54,280
And when they break, nobody can explain why
134
00:04:54,280 –> 00:04:56,440
because they were never instrumented like a system.
135
00:04:56,440 –> 00:04:58,000
They were shipped like a feature.
136
00:04:58,000 –> 00:04:59,600
The uncomfortable truth is this.
137
00:04:59,600 –> 00:05:01,240
Agentic isn’t a UI choice.
138
00:05:01,240 –> 00:05:02,520
It’s an operating model.
139
00:05:02,520 –> 00:05:04,760
A real agent behaves less like a chat widget
140
00:05:04,760 –> 00:05:06,800
and more like a distributed decision engine.
141
00:05:06,800 –> 00:05:10,000
It takes an event, interprets intent, pulls context,
142
00:05:10,000 –> 00:05:13,480
selects tools, takes action, verifies the outcome,
143
00:05:13,480 –> 00:05:15,960
and then hands off when the risk exceeds its mandate.
144
00:05:15,960 –> 00:05:18,280
That loop is the definition, not the chat transcript.
145
00:05:18,280 –> 00:05:19,960
And yes, that sounds like a lot.
146
00:05:19,960 –> 00:05:21,160
Good, it should.
147
00:05:21,160 –> 00:05:24,360
Because what most organizations build first is the opposite.
148
00:05:24,360 –> 00:05:27,720
A conversational front end bolted onto existing chaos
149
00:05:27,720 –> 00:05:31,040
with permission sprawl and a vague goal like help users.
150
00:05:31,040 –> 00:05:32,160
Helpful isn’t the spec.
151
00:05:32,160 –> 00:05:34,480
It’s how you get confident wrong behavior at scale.
152
00:05:34,480 –> 00:05:37,160
So the shift leaders need to make is not task completion.
153
00:05:37,160 –> 00:05:38,720
It’s outcome completion.
154
00:05:38,720 –> 00:05:41,560
Task completion is, answer the question.
155
00:05:41,560 –> 00:05:43,320
Create the ticket.
156
00:05:43,320 –> 00:05:44,680
Summary is the policy.
157
00:05:44,680 –> 00:05:46,480
It’s transactional.
158
00:05:46,480 –> 00:05:50,360
Outcome completion is, resolve the incident without escalation.
159
00:05:50,360 –> 00:05:51,760
Reduce time to restore.
160
00:05:51,760 –> 00:05:53,360
Prevent policy violations.
161
00:05:53,360 –> 00:05:54,640
Outcomes have constraints.
162
00:05:54,640 –> 00:05:55,480
They have ownership.
163
00:05:55,480 –> 00:05:56,320
They have rollback.
164
00:05:56,320 –> 00:05:58,280
They have accountability.
165
00:05:58,280 –> 00:06:01,600
That distinction matters because once you aim at outcomes,
166
00:06:01,600 –> 00:06:05,160
you’re forced to design the system that makes outcomes repeatable.
167
00:06:05,160 –> 00:06:07,240
And then there’s the part nobody wants to hear.
168
00:06:07,240 –> 00:06:10,200
System learning doesn’t happen because the model is smart.
169
00:06:10,200 –> 00:06:12,600
It happens because the platform is instrumented.
170
00:06:12,600 –> 00:06:14,240
If you don’t capture failure reasons,
171
00:06:14,240 –> 00:06:18,120
escalation causes, missing knowledge coverage, tool errors and routing ambiguity,
172
00:06:18,120 –> 00:06:19,000
nothing improves.
173
00:06:19,000 –> 00:06:20,720
You don’t get an agentic workforce.
174
00:06:20,720 –> 00:06:22,920
You get a static bot that slowly becomes wrong
175
00:06:22,920 –> 00:06:25,240
as policies drift and services change.
176
00:06:25,240 –> 00:06:28,320
Entropy always wins when feedback loops don’t exist.
177
00:06:28,320 –> 00:06:31,760
This is where the frontier firm framing actually becomes useful,
178
00:06:31,760 –> 00:06:33,760
if you strip out the hype.
179
00:06:33,760 –> 00:06:34,440
Humans lead.
180
00:06:34,440 –> 00:06:37,200
They define outcomes, boundaries and acceptable risk.
181
00:06:37,200 –> 00:06:38,080
Agents operate.
182
00:06:38,080 –> 00:06:40,800
They execute within those constraints consistently.
183
00:06:40,800 –> 00:06:41,560
Systems learn.
184
00:06:41,560 –> 00:06:44,120
They improve because the organization measures the right things
185
00:06:44,120 –> 00:06:45,480
and updates the design.
186
00:06:45,480 –> 00:06:47,720
But only if you do the boring part, the constraints.
187
00:06:47,720 –> 00:06:51,400
Most rollouts fail for three reasons that are painfully predictable.
188
00:06:51,400 –> 00:06:52,760
First, vague goals.
189
00:06:52,760 –> 00:06:54,480
Improved productivity means nothing.
190
00:06:54,480 –> 00:06:55,480
It produces nothing.
191
00:06:55,480 –> 00:06:57,120
It creates competing interpretations
192
00:06:57,120 –> 00:06:59,560
and a dozen half-built agents that nobody owns.
193
00:06:59,560 –> 00:07:01,360
Second, no constraints.
194
00:07:01,360 –> 00:07:04,840
Unlimited tool access turns an agent into a probabilistic admin.
195
00:07:04,840 –> 00:07:05,960
People call it innovation.
196
00:07:05,960 –> 00:07:07,240
Auditors call it a finding.
197
00:07:07,240 –> 00:07:08,800
Third, uncontrolled publishing.
198
00:07:08,800 –> 00:07:11,120
When every team can publish an agent to everyone,
199
00:07:11,120 –> 00:07:12,240
you don’t get empowerment.
200
00:07:12,240 –> 00:07:13,080
You get collision.
201
00:07:13,080 –> 00:07:14,680
Users don’t ask for 50 agents.
202
00:07:14,680 –> 00:07:15,960
They ask for one that works.
203
00:07:15,960 –> 00:07:18,000
So they try three, get two wrong answers
204
00:07:18,000 –> 00:07:19,840
and decide the whole thing is a toy.
205
00:07:19,840 –> 00:07:22,000
Everything clicked for most experienced architects
206
00:07:22,000 –> 00:07:23,480
when they realize this.
207
00:07:23,480 –> 00:07:25,320
Automation reduces steps.
208
00:07:25,320 –> 00:07:27,440
An agentic workforce reduces uncertainty.
209
00:07:27,440 –> 00:07:29,360
Automation says, if X, then Y.
210
00:07:29,360 –> 00:07:33,040
Agents say given messy input, what is X and which Y is allowed.
211
00:07:33,040 –> 00:07:34,920
That’s why the governance and grounding work
212
00:07:34,920 –> 00:07:36,080
isn’t Phase 2.
213
00:07:36,080 –> 00:07:37,040
It’s foundational.
214
00:07:37,040 –> 00:07:39,440
If you skip it, the system doesn’t become agentic.
215
00:07:39,440 –> 00:07:40,800
It becomes conditional chaos.
216
00:07:40,800 –> 00:07:42,760
So the roadmap can’t be a feature rollout.
217
00:07:42,760 –> 00:07:46,120
It has to be a 30-day operating model that forces clarity.
218
00:07:46,120 –> 00:07:49,040
One domain, explicit outcomes, tool boundaries,
219
00:07:49,040 –> 00:07:51,480
evidence requirements, and a publishing path
220
00:07:51,480 –> 00:07:54,160
that doesn’t turn every experiment into production.
221
00:07:54,160 –> 00:07:56,400
Because if you don’t force that clarity up front,
222
00:07:56,400 –> 00:07:57,920
week two shows up on schedule.
223
00:07:57,920 –> 00:08:00,600
And the platform will do exactly what you configured,
224
00:08:00,600 –> 00:08:02,680
not what you intended.
225
00:08:02,680 –> 00:08:05,560
The 30-day operating model, a 30-day roadmap
226
00:08:05,560 –> 00:08:07,920
fails when it’s treated like a project plan.
227
00:08:07,920 –> 00:08:08,520
It isn’t.
228
00:08:08,520 –> 00:08:10,680
It’s an operating model that constraints behave
229
00:08:10,680 –> 00:08:12,640
you long enough for reality to show up.
230
00:08:12,640 –> 00:08:14,160
So the structure is simple.
231
00:08:14,160 –> 00:08:15,880
Four weeks, each with a different purpose,
232
00:08:15,880 –> 00:08:18,160
and each with a gate you either pass or you stop.
233
00:08:18,160 –> 00:08:18,960
No heroics.
234
00:08:18,960 –> 00:08:20,160
No will fix it later.
235
00:08:20,160 –> 00:08:21,720
Later is where Agents sprawl is born.
236
00:08:21,720 –> 00:08:23,520
Week one is baseline and constraints.
237
00:08:23,520 –> 00:08:25,920
Not building, measuring and boxing the problem in.
238
00:08:25,920 –> 00:08:27,640
You pick one domain and one channel.
239
00:08:27,640 –> 00:08:30,440
For this roadmap, IT service is the least controversial place
240
00:08:30,440 –> 00:08:32,240
to start because the metrics exist.
241
00:08:32,240 –> 00:08:33,520
The workflow is repetitive,
242
00:08:33,520 –> 00:08:35,800
and the political blast radius is manageable.
243
00:08:35,800 –> 00:08:37,400
Then you establish the baseline.
244
00:08:37,400 –> 00:08:39,520
Ticket volume categories, current deflection,
245
00:08:39,520 –> 00:08:42,280
SLA, escalation rate, and the top intent patterns
246
00:08:42,280 –> 00:08:44,040
that show up in real user language.
247
00:08:44,040 –> 00:08:46,520
You also define the containment boundary on day one.
248
00:08:46,520 –> 00:08:47,800
What the agent must solve,
249
00:08:47,800 –> 00:08:50,200
what it must never attempt, and what triggers escalation.
250
00:08:50,200 –> 00:08:51,760
That boundary becomes the contract.
251
00:08:51,760 –> 00:08:53,080
Week two is built in ground.
252
00:08:53,080 –> 00:08:54,520
This is where most teams want to start.
253
00:08:54,520 –> 00:08:56,280
They’re impatient and they ship a chat box.
254
00:08:56,280 –> 00:08:59,160
Don’t week two means you build the first agent
255
00:08:59,160 –> 00:09:00,840
that can do one thing end-to-end,
256
00:09:00,840 –> 00:09:03,960
classify, retrieve, propose, and either resolve or root.
257
00:09:03,960 –> 00:09:06,040
And you begin grounding discipline immediately.
258
00:09:06,040 –> 00:09:07,480
No source, no answer.
259
00:09:07,480 –> 00:09:09,760
If the agent can’t cite a policy, a runbook,
260
00:09:09,760 –> 00:09:12,040
or a known-outage notice, it escalates.
261
00:09:12,040 –> 00:09:14,800
This is also where you create your initial evaluation set
262
00:09:14,800 –> 00:09:16,800
and start scoring grounded accuracy.
263
00:09:16,800 –> 00:09:18,360
Not perfect, measurable.
264
00:09:18,360 –> 00:09:20,440
Week three is orchestrate and integrate.
265
00:09:20,440 –> 00:09:22,080
This is where the system becomes real.
266
00:09:22,080 –> 00:09:24,640
Orchestration turns a response into a workflow.
267
00:09:24,640 –> 00:09:27,840
You integrate the deterministic steps with power automate.
268
00:09:27,840 –> 00:09:31,200
Ticket creation, assignment, user notifications,
269
00:09:31,200 –> 00:09:33,080
logging, and the hand-off payload.
270
00:09:33,080 –> 00:09:34,800
You introduce tool boundaries.
271
00:09:34,800 –> 00:09:37,520
Read operations are default, write operations are gated.
272
00:09:37,520 –> 00:09:39,200
You add the first approval pattern
273
00:09:39,200 –> 00:09:41,600
if you’re doing anything that changes state.
274
00:09:41,600 –> 00:09:43,760
And you begin instrumenting failure reasons
275
00:09:43,760 –> 00:09:45,800
so the system can improve without guessing.
276
00:09:45,800 –> 00:09:47,480
Week four is hardened and scale.
277
00:09:47,480 –> 00:09:49,320
Hardening doesn’t mean polishing the prompt.
278
00:09:49,320 –> 00:09:51,240
It means making the behavior survivable.
279
00:09:51,240 –> 00:09:54,520
You lock down publishing paths, verify logging,
280
00:09:54,520 –> 00:09:58,440
validate access boundaries, and run adversarial tests.
281
00:09:58,440 –> 00:10:02,120
Prompt injection, tool misuse, and helpful requests
282
00:10:02,120 –> 00:10:04,480
that should trigger escalation.
283
00:10:04,480 –> 00:10:07,400
You identify topics with high confusion and kill them.
284
00:10:07,400 –> 00:10:11,440
You finalize the lifecycle model, pilot, active, deprecated.
285
00:10:11,440 –> 00:10:13,240
And then you prepare the next domain
286
00:10:13,240 –> 00:10:16,160
based on what the metrics proved, not what leadership feels.
287
00:10:16,160 –> 00:10:18,520
Now the work selection rule, because you can’t do everything
288
00:10:18,520 –> 00:10:21,360
in 30 days, you choose processes that are high volume,
289
00:10:21,360 –> 00:10:23,120
low variance, and high friction.
290
00:10:23,120 –> 00:10:25,160
High volume means the savings show up quickly.
291
00:10:25,160 –> 00:10:27,480
Low variance means the intent space is stable enough
292
00:10:27,480 –> 00:10:28,720
to root reliably.
293
00:10:28,720 –> 00:10:30,760
High friction means people hate doing it
294
00:10:30,760 –> 00:10:33,040
and will actually use an agent that removes the pain.
295
00:10:33,040 –> 00:10:35,320
Password reset flows, access requests,
296
00:10:35,320 –> 00:10:38,080
how do I policy questions common incident triage,
297
00:10:38,080 –> 00:10:40,240
service catalog routing, that class of work,
298
00:10:40,240 –> 00:10:43,520
and you define done in a way that prevents theater.
299
00:10:43,520 –> 00:10:45,360
Done means three things at once.
300
00:10:45,360 –> 00:10:48,320
Measureable improvement, auditability, and safe rollback.
301
00:10:48,320 –> 00:10:50,520
If you can’t roll it back, you didn’t build a system.
302
00:10:50,520 –> 00:10:51,840
You built a liability.
303
00:10:51,840 –> 00:10:54,240
Measureable improvement means the KPI is moved
304
00:10:54,240 –> 00:10:56,800
for the slice of work you targeted, not anecdotes,
305
00:10:56,800 –> 00:10:58,160
not screenshots.
306
00:10:58,160 –> 00:11:01,360
Auditability means you can answer what did the agent decide,
307
00:11:01,360 –> 00:11:04,040
what sources did it use, what tool did it call,
308
00:11:04,040 –> 00:11:05,280
and what outcome occurred.
309
00:11:05,280 –> 00:11:07,640
If you can’t reconstruct the decision, you can’t defend it.
310
00:11:07,640 –> 00:11:09,720
Safe rollback means you can disable the agent
311
00:11:09,720 –> 00:11:12,440
or remove tool access without breaking the underlying process.
312
00:11:12,440 –> 00:11:14,840
That distinction matters because humans still need to work
313
00:11:14,840 –> 00:11:16,000
when the model misbehales.
314
00:11:16,000 –> 00:11:18,800
Now the governance move that prevents parallel chaos.
315
00:11:18,800 –> 00:11:21,120
Single intake, single backlog, single cadence.
316
00:11:21,120 –> 00:11:23,640
Every agent request goes through one intake path.
317
00:11:23,640 –> 00:11:26,160
One queue, one set of prioritization rules.
318
00:11:26,160 –> 00:11:29,680
Not because bureaucracy is fun, but because parallel agent building
319
00:11:29,680 –> 00:11:33,200
creates incompatible vocabularies and duplicated tool chains.
320
00:11:33,200 –> 00:11:36,560
That turns into conditional chaos faster than any threat actor.
321
00:11:36,560 –> 00:11:38,200
Cadence is also non-negotiable.
322
00:11:38,200 –> 00:11:40,800
A daily build loop for shipping small changes,
323
00:11:40,800 –> 00:11:43,240
a weekly governance review for permissions and publishing,
324
00:11:43,240 –> 00:11:45,160
and an end-of-week KPI check.
325
00:11:45,160 –> 00:11:47,200
If the metrics don’t move, you don’t scale.
326
00:11:47,200 –> 00:11:48,640
You fix.
327
00:11:48,640 –> 00:11:50,800
And that’s the punchline of the operating model.
328
00:11:50,800 –> 00:11:54,400
The platform moves fast, but your organization must move deliberately,
329
00:11:54,400 –> 00:11:55,680
otherwise the system will drift,
330
00:11:55,680 –> 00:11:58,120
and it will drift away from your intent.
331
00:11:58,120 –> 00:12:01,960
Choose the first use case, IT, ticket triage as the entry pillar.
332
00:12:01,960 –> 00:12:05,640
If leadership wants a 30-day win that survives contact with reality,
333
00:12:05,640 –> 00:12:08,240
IT, ticket triage is the entry pillar.
334
00:12:08,240 –> 00:12:12,120
Not because IT is special, but because IT has three things most departments don’t.
335
00:12:12,120 –> 00:12:14,600
Volume, instrumentation, and consequences.
336
00:12:14,600 –> 00:12:18,400
Tickets already have timestamps, categories, owners, and escalation paths.
337
00:12:18,400 –> 00:12:20,160
That means performance is observable,
338
00:12:20,160 –> 00:12:23,880
and when the system gets something wrong, the impact is obvious enough to fix quickly.
339
00:12:23,880 –> 00:12:25,480
It also wins politically.
340
00:12:25,480 –> 00:12:29,360
HR, finance, and legal are high-risk domains with high sensitivity
341
00:12:29,360 –> 00:12:31,480
and low tolerance for probabilistic behavior.
342
00:12:31,480 –> 00:12:34,920
IT service is still risky, but it’s socially acceptable to iterate.
343
00:12:34,920 –> 00:12:38,320
People already expect a service desk to ask clarifying questions.
344
00:12:38,320 –> 00:12:41,520
They don’t expect the payroll agent to take a guess.
345
00:12:41,520 –> 00:12:43,880
So the use case is not built in IT chatbot.
346
00:12:43,880 –> 00:12:47,440
The use case is ticket triage as a controlled decision pipeline.
347
00:12:47,440 –> 00:12:51,480
Classify the issue, enrich the context, attempt a resolution when it’s safe,
348
00:12:51,480 –> 00:12:53,440
and otherwise root to the correct queue
349
00:12:53,440 –> 00:12:56,320
with enough context that the human doesn’t start from zero.
350
00:12:56,320 –> 00:12:57,680
Here’s the flow in plain terms.
351
00:12:57,680 –> 00:12:59,560
A user shows up with free text pane,
352
00:12:59,560 –> 00:13:03,280
Teams message, portal form, email, pick one channel first.
353
00:13:03,280 –> 00:13:05,680
The agent’s first job is intent classification.
354
00:13:05,680 –> 00:13:07,480
Not sentiment, not personality.
355
00:13:07,480 –> 00:13:09,040
What is this in operational terms?
356
00:13:09,040 –> 00:13:14,240
Password reset, VPN, Outlook, device compliance, access request, known outage.
357
00:13:14,240 –> 00:13:16,040
Something is broken with no signal.
358
00:13:16,040 –> 00:13:19,400
That classification determines everything downstream, then comes enrichment.
359
00:13:19,400 –> 00:13:22,000
This is the part most teams skip because it’s not shiny.
360
00:13:22,000 –> 00:13:25,080
The agent needs just enough context to stop wasting human time.
361
00:13:25,080 –> 00:13:28,400
Who the user is, what device they’re on, what service they’re touching,
362
00:13:28,400 –> 00:13:31,240
whether there’s a current incident and whether this is a repeat.
363
00:13:31,240 –> 00:13:35,880
If the organization has an ITSM platform, that’s where the prior history lives.
364
00:13:35,880 –> 00:13:39,560
If the organization has a service catalog, that’s where rooting should land.
365
00:13:39,560 –> 00:13:42,640
If none of that exists, the agent doesn’t magically create it.
366
00:13:42,640 –> 00:13:44,280
It just makes the absence visible.
367
00:13:44,280 –> 00:13:47,760
After enrichment, the agent makes the only decision that matters.
368
00:13:47,760 –> 00:13:50,120
Resolve, root, or create.
369
00:13:50,120 –> 00:13:53,840
Resolve means the agent has a deterministic fix path and the risk is low.
370
00:13:53,840 –> 00:13:56,240
Reset a password through an approved workflow.
371
00:13:56,240 –> 00:13:58,400
Provide a step-by-step runbook with citations.
372
00:13:58,400 –> 00:14:00,560
Confirm the user did it, verify success.
373
00:14:00,560 –> 00:14:01,480
Close the loop.
374
00:14:01,480 –> 00:14:05,200
Root means the agent can’t safely execute, but it can identify the right team
375
00:14:05,200 –> 00:14:06,720
and hand them a clean payload.
376
00:14:06,720 –> 00:14:11,720
The intent, the likely service, the urgency, the impact, the evidence, and what was attempted.
377
00:14:11,720 –> 00:14:13,480
Routing without payload is theater.
378
00:14:13,480 –> 00:14:15,600
Payload is where SLA actually improves.
379
00:14:15,600 –> 00:14:20,000
Create means the user insists on escalation, or the process requires a record,
380
00:14:20,000 –> 00:14:21,480
or the system detects risk.
381
00:14:21,480 –> 00:14:26,280
The agent creates the ticket with structured fields, not a copy paste of the conversation.
382
00:14:26,280 –> 00:14:28,200
This is where power automate earns its keep.
383
00:14:28,200 –> 00:14:30,160
Create a sign, notify, and log.
384
00:14:30,160 –> 00:14:32,120
Deterministic steps stay deterministic.
385
00:14:32,120 –> 00:14:34,280
Now define the containment boundary upfront.
386
00:14:34,280 –> 00:14:38,400
The agent must have a contract that says, these are the things it is allowed to solve end to end.
387
00:14:38,400 –> 00:14:40,720
And these are the things it must never attempt.
388
00:14:40,720 –> 00:14:44,080
Never includes anything privileged, anything financially material,
389
00:14:44,080 –> 00:14:48,440
anything that changes access without approval, and anything that lacks an authoritative source.
390
00:14:48,440 –> 00:14:50,440
That boundary is not about limiting the agent.
391
00:14:50,440 –> 00:14:51,760
It’s about protecting trust.
392
00:14:51,760 –> 00:14:54,320
And this is where the no source, no answer policy starts.
393
00:14:54,320 –> 00:14:57,000
Not in week three, not after the first incident on day one.
394
00:14:57,000 –> 00:15:01,920
If the agent can’t cite a runbook, a policy, a known issue, or a service status update,
395
00:15:01,920 –> 00:15:02,920
it doesn’t answer.
396
00:15:02,920 –> 00:15:03,920
It escalates.
397
00:15:03,920 –> 00:15:07,320
The first time an agent gives a confident, wrong answer to a user who’s already frustrated,
398
00:15:07,320 –> 00:15:10,040
adoption dies quietly, permanently.
399
00:15:10,040 –> 00:15:13,160
So for ticket triage, citations, and evidence aren’t academic.
400
00:15:13,160 –> 00:15:17,720
There how the system earns the right to exist, tie it directly to the KPIs you said earlier.
401
00:15:17,720 –> 00:15:22,600
Deflection comes from resolving the low risk, high volume intents inside the boundary.
402
00:15:22,600 –> 00:15:26,320
First contact resolution comes from clean enrichment plus grounded runbooks.
403
00:15:26,320 –> 00:15:29,960
Fewer escalations come from correct rooting and fewer dead end handoffs.
404
00:15:29,960 –> 00:15:33,360
SLA improvement comes from structured tickets and reduced back and forth.
405
00:15:33,360 –> 00:15:37,240
And the best part is you can measure all of it without inventing new telemetry.
406
00:15:37,240 –> 00:15:40,320
The ITSM system already tracks timestamps and assignments.
407
00:15:40,320 –> 00:15:43,600
You just need to tag agent touched and capture escalation reasons.
408
00:15:43,600 –> 00:15:48,080
The transition to the next section is the uncomfortable constraint that makes triage work.
409
00:15:48,080 –> 00:15:49,320
Topics aren’t free.
410
00:15:49,320 –> 00:15:52,400
Every helpful new intent you add creates ambiguity.
411
00:15:52,400 –> 00:15:54,640
Every ambiguous root creates mysteryage.
412
00:15:54,640 –> 00:15:57,320
And mysteryage is just escalation with extra steps.
413
00:15:57,320 –> 00:16:01,080
So if you want ticket triage to become a pillar instead of a pilot, you have to treat intent
414
00:16:01,080 –> 00:16:02,600
like a design asset.
415
00:16:02,600 –> 00:16:04,040
Not a brainstorm list.
416
00:16:04,040 –> 00:16:08,280
Copilot Studio Design Law intent first, topic second.
417
00:16:08,280 –> 00:16:13,360
Copilot Studio encourages people to think in topics because topics are visible.
418
00:16:13,360 –> 00:16:14,560
They feel like progress.
419
00:16:14,560 –> 00:16:17,360
Click name it, write a few trigger phrases, ship it.
420
00:16:17,360 –> 00:16:18,840
That’s how topics brawl happens.
421
00:16:18,840 –> 00:16:20,360
And topics brawl isn’t just messy.
422
00:16:20,360 –> 00:16:22,080
It’s an entropy generator.
423
00:16:22,080 –> 00:16:26,600
Every new topic adds another overlapping root the system can take, which increases ambiguity,
424
00:16:26,600 –> 00:16:30,680
which increases misclassification, which increases escalations, which makes everyone conclude
425
00:16:30,680 –> 00:16:32,400
the agent doesn’t work.
426
00:16:32,400 –> 00:16:33,400
It does work.
427
00:16:33,400 –> 00:16:35,600
You just turned routing into a probabilistic game.
428
00:16:35,600 –> 00:16:36,760
So the design law is blunt.
429
00:16:36,760 –> 00:16:38,720
intent first, topic second.
430
00:16:38,720 –> 00:16:41,080
Intent is the operational meaning behind the user’s words.
431
00:16:41,080 –> 00:16:42,080
It’s stable.
432
00:16:42,080 –> 00:16:44,360
When you set my password will still exist next quarter.
433
00:16:44,360 –> 00:16:48,040
Can’t get into my account, still maps to the same underlying outcome.
434
00:16:48,040 –> 00:16:50,120
Intent is what you can instrument and improve.
435
00:16:50,120 –> 00:16:51,360
Topics are just routing tables.
436
00:16:51,360 –> 00:16:53,080
They are implementation detail.
437
00:16:53,080 –> 00:16:54,600
That distinction matters.
438
00:16:54,600 –> 00:16:58,440
Because most organizations start by collecting departmental wish lists.
439
00:16:58,440 –> 00:17:00,120
We need a topic for VPN.
440
00:17:00,120 –> 00:17:01,560
We need a topic for printers.
441
00:17:01,560 –> 00:17:02,960
We need a topic for teams.
442
00:17:02,960 –> 00:17:07,760
And then they create a hundred topics that all trigger on the word can’t help or not working.
443
00:17:07,760 –> 00:17:09,120
You didn’t build coverage.
444
00:17:09,120 –> 00:17:10,440
You built collisions.
445
00:17:10,440 –> 00:17:13,280
So the first rule is to cap the initial intent space.
446
00:17:13,280 –> 00:17:14,280
Ten to fifteen intents.
447
00:17:14,280 –> 00:17:18,800
Not because the rest don’t exist, but because you need stability before you need coverage.
448
00:17:18,800 –> 00:17:23,320
In the first 30 days you’re proving that the system can classify and contain reliably.
449
00:17:23,320 –> 00:17:25,800
Not that it can answer every question in the organization.
450
00:17:25,800 –> 00:17:28,880
Here’s what those first intents look like in IT triage.
451
00:17:28,880 –> 00:17:30,280
Password and account access.
452
00:17:30,280 –> 00:17:32,280
VPN or remote access.
453
00:17:32,280 –> 00:17:33,280
Email and calendar.
454
00:17:33,280 –> 00:17:34,600
Teams calling and meetings.
455
00:17:34,600 –> 00:17:35,600
Device compliance.
456
00:17:35,600 –> 00:17:36,600
Wi-Fi.
457
00:17:36,600 –> 00:17:37,600
Software install.
458
00:17:37,600 –> 00:17:38,600
Access request.
459
00:17:38,600 –> 00:17:39,760
Service outage check.
460
00:17:39,760 –> 00:17:42,600
And unknown issue as a controlled catch all.
461
00:17:42,600 –> 00:17:45,640
That’s enough volume to matter and enough clarity to tune.
462
00:17:45,640 –> 00:17:47,560
Now how do you design intents without guessing?
463
00:17:47,560 –> 00:17:51,360
You use intents signals and co-pilot studio gives you more signals than people use.
464
00:17:51,360 –> 00:17:53,880
The obvious signal is user language patterns.
465
00:17:53,880 –> 00:17:56,440
The phrases and synonyms people actually type.
466
00:17:56,440 –> 00:17:58,360
Not what IT calls it, what users call it.
467
00:17:58,360 –> 00:18:02,720
No my laptop won’t connect is not 802.1x supplicant failure.
468
00:18:02,720 –> 00:18:05,280
You model the human input, then translate.
469
00:18:05,280 –> 00:18:06,840
The next signal is metadata.
470
00:18:06,840 –> 00:18:10,800
If you have a portal form, it might include service or category fields.
471
00:18:10,800 –> 00:18:14,760
If you have ITSM integration, you might have existing categories you can map to.
472
00:18:14,760 –> 00:18:17,960
These can reduce ambiguity, but only if your categories aren’t garbage.
473
00:18:17,960 –> 00:18:22,600
If your ITSM taxonomy is 15 variations of other, the agent can’t salvage it.
474
00:18:22,600 –> 00:18:23,600
Then there’s channel context.
475
00:18:23,600 –> 00:18:26,840
A team’s message at 9 a.m. Monday is often I’m stuck right now.
476
00:18:26,840 –> 00:18:28,720
A portal submission might be more structured.
477
00:18:28,720 –> 00:18:32,080
An email to a shared mailbox is often a dump of symptoms.
478
00:18:32,080 –> 00:18:33,200
Channel changes language.
479
00:18:33,200 –> 00:18:36,400
That means channel is part of intent detection, not just where you published.
480
00:18:36,400 –> 00:18:39,240
Now the part most people avoid, the fallback strategy.
481
00:18:39,240 –> 00:18:41,640
In Copilot Studio, fallback is not a safety net.
482
00:18:41,640 –> 00:18:42,960
It is a control surface.
483
00:18:42,960 –> 00:18:47,260
If you let fallback behave like I’ll try to be helpful anyway, you just build hallucinations
484
00:18:47,260 –> 00:18:48,800
into your routing layer.
485
00:18:48,800 –> 00:18:51,960
The agent will invent an intent, pick a tool and act with confidence.
486
00:18:51,960 –> 00:18:55,440
So you need one controlled fallback, one.
487
00:18:55,440 –> 00:18:56,800
Fallback should do three things.
488
00:18:56,800 –> 00:19:03,120
In order, ask one clarifying question to force disambiguation, check for known outage or incident
489
00:19:03,120 –> 00:19:07,440
context and then escalate with a structured payload if it still can’t classify.
490
00:19:07,440 –> 00:19:09,960
No long conversations, no 20 questions.
491
00:19:09,960 –> 00:19:13,300
If the agent can’t classify after one clarifier, it roots.
492
00:19:13,300 –> 00:19:16,280
That keeps the system fast and keeps the failure mode predictable.
493
00:19:16,280 –> 00:19:17,680
And yes, it feels strict.
494
00:19:17,680 –> 00:19:18,680
Good.
495
00:19:18,680 –> 00:19:20,640
Strict is how you prevent conditional chaos.
496
00:19:20,640 –> 00:19:22,160
Now topic lifecycle.
497
00:19:22,160 –> 00:19:25,000
This is the part that prevents your backlog from becoming a museum.
498
00:19:25,000 –> 00:19:27,280
You kill weak topics early, not later.
499
00:19:27,280 –> 00:19:28,880
Early.
500
00:19:28,880 –> 00:19:31,480
Set deprecation criteria upfront.
501
00:19:31,480 –> 00:19:37,800
Low usage, high confusion, low containment, high escalation, high unknown fallback rate,
502
00:19:37,800 –> 00:19:40,080
or repeated misroots into the wrong cues.
503
00:19:40,080 –> 00:19:43,920
If a topic can’t hit containment and routing accuracy targets inside two weeks, it doesn’t
504
00:19:43,920 –> 00:19:44,920
get a third month.
505
00:19:44,920 –> 00:19:49,920
It gets removed or merged because every weak topic you keep becomes permanent ambiguity.
506
00:19:49,920 –> 00:19:51,880
And ambiguity compounds.
507
00:19:51,880 –> 00:19:56,000
Finally, close the loop with a design choice that prevents ghost agents and sprawl later,
508
00:19:56,000 –> 00:19:59,960
treat intents as a shared enterprise asset, not as per agent inventions.
509
00:19:59,960 –> 00:20:04,480
One intent registry, one naming scheme, one owner, one backlog.
510
00:20:04,480 –> 00:20:06,760
Agents can vary by channel and audience.
511
00:20:06,760 –> 00:20:07,760
Intents shouldn’t.
512
00:20:07,760 –> 00:20:10,200
When intents are stable, topics become small and boring.
513
00:20:10,200 –> 00:20:12,040
That’s the point.
514
00:20:12,040 –> 00:20:14,040
Orchestration becomes the real product.
515
00:20:14,040 –> 00:20:17,120
Event to decision to action, to verification, to hand off.
516
00:20:17,120 –> 00:20:19,080
Topics just tell the system where to start.
517
00:20:19,080 –> 00:20:22,920
And that transition matters because once routing is disciplined, you can finally build
518
00:20:22,920 –> 00:20:24,520
the thing people think they are buying.
519
00:20:24,520 –> 00:20:25,920
A control plane.
520
00:20:25,920 –> 00:20:27,120
Orchestration is a control plane.
521
00:20:27,120 –> 00:20:30,280
Once intents are disciplined, the next mistake is thinking the work is done.
522
00:20:30,280 –> 00:20:31,280
It isn’t.
523
00:20:31,280 –> 00:20:32,280
Routing is just a switchboard.
524
00:20:32,280 –> 00:20:34,080
The real product is orchestration.
525
00:20:34,080 –> 00:20:37,920
Orchestration is the control plane that turns an interaction into a verified outcome.
526
00:20:37,920 –> 00:20:41,040
Event to reasoning, to action, to verification, to hand off.
527
00:20:41,040 –> 00:20:43,240
Without that loop, you have a conversational index.
528
00:20:43,240 –> 00:20:45,480
With it, you have something that can actually replace work.
529
00:20:45,480 –> 00:20:48,800
That distinction matters because most early agents behave like this.
530
00:20:48,800 –> 00:20:52,240
The user types a problem, the agent answers in paragraphs, and then the user still has
531
00:20:52,240 –> 00:20:53,840
to do the next five steps.
532
00:20:53,840 –> 00:20:55,360
They copy text into a ticket.
533
00:20:55,360 –> 00:20:56,640
They hunt for the right form.
534
00:20:56,640 –> 00:20:57,640
They message the wrong team.
535
00:20:57,640 –> 00:20:58,640
They repeat themselves.
536
00:20:58,640 –> 00:21:00,320
The agent helped, but nothing moved.
537
00:21:00,320 –> 00:21:02,520
A control plane agent doesn’t aim to be eloquent.
538
00:21:02,520 –> 00:21:03,960
It aims to be operational.
539
00:21:03,960 –> 00:21:06,760
So the orchestration pattern is simple and repeatable.
540
00:21:06,760 –> 00:21:08,280
First, classify.
541
00:21:08,280 –> 00:21:10,280
Confirm the intent and the containment boundary.
542
00:21:10,280 –> 00:21:13,560
If the user asks for something outside the boundary, the agent doesn’t negotiate.
543
00:21:13,560 –> 00:21:14,560
It roots.
544
00:21:14,560 –> 00:21:18,040
That prevents the slow drift from helpful assistant into probabilistic operator.
545
00:21:18,040 –> 00:21:19,800
Second, retrieve.
546
00:21:19,800 –> 00:21:23,160
Pull the minimum authoritative knowledge needed to propose a solution.
547
00:21:23,160 –> 00:21:27,120
That can be a runbook, a policy, a service status item, or a known issue record.
548
00:21:27,120 –> 00:21:28,440
No source means no answer.
549
00:21:28,440 –> 00:21:30,640
This is where the agent proves it’s not improvising.
550
00:21:30,640 –> 00:21:31,800
Third, propose.
551
00:21:31,800 –> 00:21:34,200
The agent gives a short, actionable recommendation.
552
00:21:34,200 –> 00:21:35,200
Not an essay.
553
00:21:35,200 –> 00:21:37,680
Two or three steps max, written like instructions.
554
00:21:37,680 –> 00:21:38,800
And here’s the weird part.
555
00:21:38,800 –> 00:21:40,640
The proposal is not the action.
556
00:21:40,640 –> 00:21:42,440
It’s a plan the system can verify.
557
00:21:42,440 –> 00:21:43,880
Fourth, confirm.
558
00:21:43,880 –> 00:21:47,440
This is where human in the loop becomes precise instead of performative.
559
00:21:47,440 –> 00:21:49,480
You don’t approve the whole agent.
560
00:21:49,480 –> 00:21:50,960
You approve decision points.
561
00:21:50,960 –> 00:21:51,960
Should I reset your password?
562
00:21:51,960 –> 00:21:54,640
Should I create a ticket in this category?
563
00:21:54,640 –> 00:21:56,680
Should I request access on your behalf?
564
00:21:56,680 –> 00:21:59,520
The approval happens at the boundary between read and write.
565
00:21:59,520 –> 00:22:00,520
Fifth, execute.
566
00:22:00,520 –> 00:22:03,240
Only after confirmation and only through allowed tools.
567
00:22:03,240 –> 00:22:06,920
This is where power automate and MCP style tool boundaries matter.
568
00:22:06,920 –> 00:22:11,680
The agent should not be composing ad hoc API calls like a drunk junior developer.
569
00:22:11,680 –> 00:22:14,120
It should call known tools with known parameters.
570
00:22:14,120 –> 00:22:15,720
Sixth, verify.
571
00:22:15,720 –> 00:22:17,600
The agent checks whether the action worked.
572
00:22:17,600 –> 00:22:19,120
Did the password reset succeed?
573
00:22:19,120 –> 00:22:20,520
Did the ticket get created?
574
00:22:20,520 –> 00:22:22,680
Did the user confirm access is restored?
575
00:22:22,680 –> 00:22:26,840
Verification is where agents stop being theater and start being reliable.
576
00:22:26,840 –> 00:22:29,200
Seventh, hand off.
577
00:22:29,200 –> 00:22:32,960
If the agent can’t resolve, it escalates with a structured payload.
578
00:22:32,960 –> 00:22:36,560
Intent, enriched context, sources used, steps attempted,
579
00:22:36,560 –> 00:22:38,520
and what it needs the human to decide.
580
00:22:38,520 –> 00:22:40,040
The human shouldn’t read a transcript.
581
00:22:40,040 –> 00:22:41,200
They should read a case file.
582
00:22:41,200 –> 00:22:43,480
That’s orchestration.
583
00:22:43,480 –> 00:22:47,000
Now the uncomfortable constraint, tool invocation boundaries.
584
00:22:47,000 –> 00:22:50,200
Every tool you connect is an entropy generator if you don’t gate it.
585
00:22:50,200 –> 00:22:51,240
Start read only.
586
00:22:51,240 –> 00:22:52,480
List get search.
587
00:22:52,480 –> 00:22:54,720
Delay create update delete.
588
00:22:54,720 –> 00:22:57,880
If you allow write operations on day one, you’re not accelerating delivery.
589
00:22:57,880 –> 00:22:59,720
You’re creating an incident with better marketing.
590
00:22:59,720 –> 00:23:02,080
So implement a two tier tool policy.
591
00:23:02,080 –> 00:23:05,320
Tier one tools are read only and safe.
592
00:23:05,320 –> 00:23:06,880
Check service status.
593
00:23:06,880 –> 00:23:08,520
Search the knowledge base.
594
00:23:08,520 –> 00:23:09,640
Look up ticket history.
595
00:23:09,640 –> 00:23:11,200
Fetch device compliance state.
596
00:23:11,200 –> 00:23:12,880
These tools reduce uncertainty.
597
00:23:12,880 –> 00:23:14,040
They don’t change state.
598
00:23:14,040 –> 00:23:16,480
Tier two tools are write actions and risky.
599
00:23:16,480 –> 00:23:17,400
Create a ticket.
600
00:23:17,400 –> 00:23:22,320
Update a user attribute, grant access, reset credentials, trigger changes in downstream systems.
601
00:23:22,320 –> 00:23:25,520
These tools require explicit approval and stronger logging.
602
00:23:25,520 –> 00:23:27,200
Some require stronger authentication.
603
00:23:27,200 –> 00:23:28,560
The point is not to slow down.
604
00:23:28,560 –> 00:23:31,720
The point is to keep autonomy proportional to risk.
605
00:23:31,720 –> 00:23:34,400
Human in the loop also needs to be placed correctly.
606
00:23:34,400 –> 00:23:36,360
Approve everything is just a slow agent.
607
00:23:36,360 –> 00:23:41,400
Approve nothing is how you end up explaining to leadership why an LLM updated a production
608
00:23:41,400 –> 00:23:42,400
record.
609
00:23:42,400 –> 00:23:45,760
The control plane placement is approve at irreversible boundaries.
610
00:23:45,760 –> 00:23:50,480
Actions, privileged operations, external communications, anything financially material, anything
611
00:23:50,480 –> 00:23:53,520
that could become an audit question, everything else should run.
612
00:23:53,520 –> 00:23:55,240
And here’s why this improves adoption.
613
00:23:55,240 –> 00:23:56,680
Users don’t want explanations.
614
00:23:56,680 –> 00:23:57,800
They want next actions.
615
00:23:57,800 –> 00:24:00,000
A good orchestration response looks like.
616
00:24:00,000 –> 00:24:02,040
I can do A or B. Here’s what I found.
617
00:24:02,040 –> 00:24:03,040
Pick one.
618
00:24:03,040 –> 00:24:04,600
That turns chat into decisions.
619
00:24:04,600 –> 00:24:05,920
Decisions turn into outcomes.
620
00:24:05,920 –> 00:24:07,440
Outcomes are what executives fund.
621
00:24:07,440 –> 00:24:10,720
So by the end of this section, the system is no longer an agent that talks.
622
00:24:10,720 –> 00:24:13,640
It’s a control plane that roots, acts and verifies.
623
00:24:13,640 –> 00:24:19,000
An orchestration has a hidden dependency and this is where most builds stall, context enrichment.
624
00:24:19,000 –> 00:24:22,200
Because reasoning without context is just confident guessing.
625
00:24:22,200 –> 00:24:24,320
Context enrichment without overreach.
626
00:24:24,320 –> 00:24:28,600
Context enrichment is where most smart agents quietly become privacy incidents.
627
00:24:28,600 –> 00:24:30,440
Because enrichment feels harmless.
628
00:24:30,440 –> 00:24:34,480
Pull a little identity, a little device info, maybe some recent tickets, maybe a list
629
00:24:34,480 –> 00:24:35,480
of installed apps.
630
00:24:35,480 –> 00:24:36,480
What could go wrong?
631
00:24:36,480 –> 00:24:37,480
Overreach goes wrong.
632
00:24:37,480 –> 00:24:39,280
And it goes wrong in two ways at the same time.
633
00:24:39,280 –> 00:24:40,480
You increase risk.
634
00:24:40,480 –> 00:24:41,760
And you decrease accuracy.
635
00:24:41,760 –> 00:24:46,360
The model gets more tokens, more noise, more chance to anchor on irrelevant detail.
636
00:24:46,360 –> 00:24:48,280
You trade it clarity for context hoarding.
637
00:24:48,280 –> 00:24:50,440
So the rule is minimum viable context.
638
00:24:50,440 –> 00:24:53,560
Only the facts the process needs to make the next decision safely.
639
00:24:53,560 –> 00:24:56,040
For IT triage, that minimum set is boring.
640
00:24:56,040 –> 00:24:57,440
And that’s why it works.
641
00:24:57,440 –> 00:24:59,120
First, user identity.
642
00:24:59,120 –> 00:25:00,400
Not a biography.
643
00:25:00,400 –> 00:25:03,760
Just the stable identifiers that matter for rooting and policy.
644
00:25:03,760 –> 00:25:08,320
User principle name, department if it maps to support groups, location if it maps to service
645
00:25:08,320 –> 00:25:11,720
availability, and whether the user is privileged.
646
00:25:11,720 –> 00:25:13,960
User’s change the containment boundary.
647
00:25:13,960 –> 00:25:17,800
An agent can’t treat a help desk admin like an intern with a locked out mailbox.
648
00:25:17,800 –> 00:25:19,160
Second, device state.
649
00:25:19,160 –> 00:25:23,480
If the use case touches endpoint compliance, you need device ID, OS management state and
650
00:25:23,480 –> 00:25:24,560
compliance status.
651
00:25:24,560 –> 00:25:26,400
You don’t need a full inventory dump.
652
00:25:26,400 –> 00:25:27,400
The question is simple.
653
00:25:27,400 –> 00:25:30,480
Can this user do the thing they’re asking for on the device they’re using?
654
00:25:30,480 –> 00:25:31,800
Third, service context.
655
00:25:31,800 –> 00:25:33,160
Is there an active incident?
656
00:25:33,160 –> 00:25:34,520
Is the service degraded?
657
00:25:34,520 –> 00:25:36,000
Is there a change window in progress?
658
00:25:36,000 –> 00:25:37,640
This one is the hidden time saver.
659
00:25:37,640 –> 00:25:39,080
Half of my teams is broken.
660
00:25:39,080 –> 00:25:40,280
Isn’t a user problem.
661
00:25:40,280 –> 00:25:41,200
It’s an outage.
662
00:25:41,200 –> 00:25:44,560
If the agent can detect that early, it stops the endless troubleshooting theatre and
663
00:25:44,560 –> 00:25:46,280
routes to the right message.
664
00:25:46,280 –> 00:25:47,280
Status.
665
00:25:47,280 –> 00:25:48,280
Expectation.
666
00:25:48,280 –> 00:25:49,280
And next check.
667
00:25:49,280 –> 00:25:50,800
Fourth, recent history.
668
00:25:50,800 –> 00:25:51,800
Recent tickets.
669
00:25:51,800 –> 00:25:52,800
Recent similar incidents.
670
00:25:52,800 –> 00:25:54,040
Recent failed attempts.
671
00:25:54,040 –> 00:25:57,400
This is how you avoid re-triaging the same person every week.
672
00:25:57,400 –> 00:25:58,400
But keep it tight.
673
00:25:58,400 –> 00:25:59,400
Last five tickets.
674
00:25:59,400 –> 00:26:00,400
Last seven days.
675
00:26:00,400 –> 00:26:01,400
Same category.
676
00:26:01,400 –> 00:26:04,000
Anything beyond that becomes narrative, not signal.
677
00:26:04,000 –> 00:26:06,120
Fifth, service catalog mapping.
678
00:26:06,120 –> 00:26:10,440
If your ITSM has a catalog, pull the service and the default assignment group that turns
679
00:26:10,440 –> 00:26:14,800
I need help into this belongs to Group X with fields Y populated.
680
00:26:14,800 –> 00:26:16,520
Which is what actually reduces SLA.
681
00:26:16,520 –> 00:26:19,800
Now the sources, people say M365 signals like that’s one thing.
682
00:26:19,800 –> 00:26:20,800
It isn’t.
683
00:26:20,800 –> 00:26:24,000
It’s a pile of systems with different governance, different data boundaries and
684
00:26:24,000 –> 00:26:25,640
different failure modes.
685
00:26:25,640 –> 00:26:29,520
Enrichment sources you can justify in an IT triage workflow are usually
686
00:26:29,520 –> 00:26:33,480
Entra directory attributes, device compliance signals from endpoint management,
687
00:26:33,480 –> 00:26:37,400
ITSM fields and ticket history and a curated service status source.
688
00:26:37,400 –> 00:26:39,880
In some orgs, that service status is in service now.
689
00:26:39,880 –> 00:26:42,040
In others, it’s in a team’s channel post nobody owns.
690
00:26:42,040 –> 00:26:45,680
Either way, make it a source the agent can cite, not a rumor it repeats.
691
00:26:45,680 –> 00:26:48,000
And this is where the system law shows up again.
692
00:26:48,000 –> 00:26:49,000
Normalize inputs.
693
00:26:49,000 –> 00:26:53,320
If your taxonomy for service has 15 spellings of the same thing, enrichment will amplify
694
00:26:53,320 –> 00:26:54,320
the mess.
695
00:26:54,320 –> 00:26:57,320
The agent can only be as deterministic as the labels you feed it.
696
00:26:57,320 –> 00:27:02,800
So define a small taxonomy for the pilot, service, urgency, impact, environment, not 50 fields,
697
00:27:02,800 –> 00:27:03,800
four.
698
00:27:03,800 –> 00:27:04,800
And make the mapping explicit.
699
00:27:04,800 –> 00:27:09,840
If the user says Outlook, the system maps to Exchange Online, not email-ish problem.
700
00:27:09,840 –> 00:27:12,320
Then add a verification step before any right action.
701
00:27:12,320 –> 00:27:15,160
This is the difference between enrichment and hallucination.
702
00:27:15,160 –> 00:27:18,720
The agent should play back the enriched facts and ask for confirmation when those facts
703
00:27:18,720 –> 00:27:19,960
will change behavior.
704
00:27:19,960 –> 00:27:24,000
Your on-device X, it’s non-compliant and this request requires compliance.
705
00:27:24,000 –> 00:27:25,200
Is that correct?
706
00:27:25,200 –> 00:27:29,280
Your requesting access to system Y, which is a privileged app, confirm.
707
00:27:29,280 –> 00:27:30,800
Verification doesn’t mean you approve everything.
708
00:27:30,800 –> 00:27:33,760
It means the agent doesn’t treat guest context as truth.
709
00:27:33,760 –> 00:27:37,720
Now privacy, if you’re tempted to pull HR attributes, manager chains, performance data
710
00:27:37,720 –> 00:27:40,440
or everything in graph because it’s available, stop.
711
00:27:40,440 –> 00:27:41,440
That’s not enrichment.
712
00:27:41,440 –> 00:27:42,920
That’s surveillance, cosplay.
713
00:27:42,920 –> 00:27:47,440
The minimum viable context principle protects you because it forces a justification.
714
00:27:47,440 –> 00:27:49,520
What decision does this field influence?
715
00:27:49,520 –> 00:27:52,720
If the answer is none, the field doesn’t belong in the agent.
716
00:27:52,720 –> 00:27:57,600
And yes, more context sometimes improves accuracy, but accuracy without boundary becomes a liability.
717
00:27:57,600 –> 00:28:01,560
The agent will learn to use data it shouldn’t and users will learn they can prompt it into
718
00:28:01,560 –> 00:28:02,560
revealing it.
719
00:28:02,560 –> 00:28:05,680
So keep enrichment tightly scoped, audited and reversible.
720
00:28:05,680 –> 00:28:10,160
If a field becomes risky, remove it and the system still functions because context is the
721
00:28:10,160 –> 00:28:12,320
hidden engine of orchestration.
722
00:28:12,320 –> 00:28:16,680
But the next failure mode is what happens when the agent has context has a plan and still
723
00:28:16,680 –> 00:28:18,320
answers wrong with confidence.
724
00:28:18,320 –> 00:28:22,720
That’s grounding and it kills adoption faster than any outage ever will.
725
00:28:22,720 –> 00:28:25,680
The failure mode that kills adoption, confident wrong answer.
726
00:28:25,680 –> 00:28:28,640
If you remember one thing about adoption, make it this.
727
00:28:28,640 –> 00:28:30,880
Users will forgive, I don’t know.
728
00:28:30,880 –> 00:28:34,080
They will not forgive, I’m sure, followed by being wrong.
729
00:28:34,080 –> 00:28:38,080
It’s the failure mode that kills an agent program in week two because the first time an agent
730
00:28:38,080 –> 00:28:41,800
answers confidently and incorrectly, the user doesn’t file a bug report.
731
00:28:41,800 –> 00:28:45,040
They don’t politely provide feedback, they screenshot it, paste it into a team’s chat
732
00:28:45,040 –> 00:28:47,840
and the story becomes co-pilot make stuff up.
733
00:28:47,840 –> 00:28:51,800
And once that narrative exists, every future success gets dismissed as luck.
734
00:28:51,800 –> 00:28:53,680
This is why grounding isn’t an enhancement.
735
00:28:53,680 –> 00:28:55,240
It’s a survival requirement.
736
00:28:55,240 –> 00:28:57,360
Grounding failures usually happen for three reasons.
737
00:28:57,360 –> 00:29:01,320
First, the agent answers from its general model knowledge instead of your enterprise truth.
738
00:29:01,320 –> 00:29:05,400
It’s fine when the question is generic and catastrophic when the question is policy, HR,
739
00:29:05,400 –> 00:29:06,880
security or internal process.
740
00:29:06,880 –> 00:29:11,320
The model can sound correct while being wrong in exactly the ways that matter to auditors.
741
00:29:11,320 –> 00:29:13,600
Second, the agent retrieves the wrong document.
742
00:29:13,600 –> 00:29:16,760
Not because retrieval is broken but because your content is ambiguous.
743
00:29:16,760 –> 00:29:20,440
Two similar policies, a stale runbook, a SharePoint page with three unrelated procedures
744
00:29:20,440 –> 00:29:21,920
jammed into one.
745
00:29:21,920 –> 00:29:24,600
Retrieval doesn’t fix entropy, it indexes it.
746
00:29:24,600 –> 00:29:27,280
Third, the agent blends sources.
747
00:29:27,280 –> 00:29:31,360
It pulls one chunk from one dock, another chunk from another dock and then stitches a reasonable
748
00:29:31,360 –> 00:29:32,960
answer that never existed.
749
00:29:32,960 –> 00:29:35,520
It feels helpful, it also becomes impossible to defend.
750
00:29:35,520 –> 00:29:38,560
So the control rule has to be explicit and enforced.
751
00:29:38,560 –> 00:29:40,960
Citations required for any non-trivial claim.
752
00:29:40,960 –> 00:29:43,320
Not citations are nice, required.
753
00:29:43,320 –> 00:29:45,840
If the user asks, “What’s the VPN setup?”
754
00:29:45,840 –> 00:29:50,200
The agent can cite, “If the user asks, can I access this customer data set from my personal
755
00:29:50,200 –> 00:29:51,200
device?”
756
00:29:51,200 –> 00:29:54,840
The agent’s sites or escalates, “If the agent can’t produce a source, it doesn’t answer,
757
00:29:54,840 –> 00:29:55,840
it routes.”
758
00:29:55,840 –> 00:30:00,160
Also where you separate knowledge from action, answering and doing our different risk classes
759
00:30:00,160 –> 00:30:03,120
and the platform won’t keep them separate unless you design it that way.
760
00:30:03,120 –> 00:30:07,720
A grounded answer is red behavior, it’s retrieval plus summarization with evidence, an action
761
00:30:07,720 –> 00:30:10,800
is right behavior, it’s changing state in a system of record.
762
00:30:10,800 –> 00:30:15,160
You don’t grant permissions because the agent wrote a persuasive paragraph, you grant permissions
763
00:30:15,160 –> 00:30:19,840
because a tool call executed under a constrained identity with explicit approval and a log
764
00:30:19,840 –> 00:30:20,840
you can defend.
765
00:30:20,840 –> 00:30:24,680
That distinction matters because many teams accidentally couple them.
766
00:30:24,680 –> 00:30:26,920
The agent answers, therefore it acts.
767
00:30:26,920 –> 00:30:30,600
That’s how you end up with tool misuse driven by conversational confidence.
768
00:30:30,600 –> 00:30:35,440
Now define grounded accuracy because vague quality discussions turn into feelings.
769
00:30:35,440 –> 00:30:40,880
Grounded accuracy means in a test set of real questions, the agent’s answer is supported
770
00:30:40,880 –> 00:30:44,240
by the cited source and the source is the correct source for that question.
771
00:30:44,240 –> 00:30:48,360
Not close, not sounds right, supported and correct.
772
00:30:48,360 –> 00:30:52,080
You measure it with sampling, you don’t need a PhD evaluation framework to start.
773
00:30:52,080 –> 00:30:53,480
You need a fixed question set.
774
00:30:53,480 –> 00:30:57,960
The top intents, the top policies, the top runbooks and the top known issue questions.
775
00:30:57,960 –> 00:31:02,120
You run them weekly, you score correct with correct citation, correct with wrong citation,
776
00:31:02,120 –> 00:31:06,240
incorrect with citation, incorrect with no citation and escalated appropriately.
777
00:31:06,240 –> 00:31:11,640
Your target in 30 days is greater than 85% grounded accuracy on that evaluation set.
778
00:31:11,640 –> 00:31:15,600
That’s realistic if you constrain scope and enforce no source, no answer.
779
00:31:15,600 –> 00:31:20,680
And you categorize failure reasons because fixing the wrong thing wastes weeks.
780
00:31:20,680 –> 00:31:23,640
Using doc, the knowledge doesn’t exist in an indexable form.
781
00:31:23,640 –> 00:31:28,040
Wrong doc, retrieval pulled in adjacent policy often because metadata is weak.
782
00:31:28,040 –> 00:31:30,760
Wrong inference, the agent made a leap the doc didn’t support.
783
00:31:30,760 –> 00:31:33,200
Stale doc, the truth changed, the index didn’t.
784
00:31:33,200 –> 00:31:36,480
Now the part everyone avoids until it hurts, red teaming prompts.
785
00:31:36,480 –> 00:31:37,480
Do it early.
786
00:31:37,480 –> 00:31:41,840
Not because you expect a nation state attacker in week one, but because normal users accidentally
787
00:31:41,840 –> 00:31:43,240
behave like attackers.
788
00:31:43,240 –> 00:31:47,760
They paste emails, they paste error messages, they paste internal links and sometimes they
789
00:31:47,760 –> 00:31:50,080
paste instructions that conflict with policy.
790
00:31:50,080 –> 00:31:51,400
The model will try to comply.
791
00:31:51,400 –> 00:31:56,240
So your red team set includes prompt injection attempts, requests to ignore policy, requests
792
00:31:56,240 –> 00:32:01,160
to reveal sensitive data and instructions to perform actions outside the containment boundary.
793
00:32:01,160 –> 00:32:05,680
You run them against the agent before you expand the pilot group and the key is you don’t
794
00:32:05,680 –> 00:32:08,440
treat failures as the model is dumb.
795
00:32:08,440 –> 00:32:13,800
You treat them as design signals, tighten boundaries, improve retrieval, add an escalation clause,
796
00:32:13,800 –> 00:32:16,720
remove an unsafe tool because trust doesn’t come from charm.
797
00:32:16,720 –> 00:32:20,880
It comes from predictable behavior and grounding is what makes behavior predictable, but grounding
798
00:32:20,880 –> 00:32:23,440
can’t be implemented as vibes and prompt warnings.
799
00:32:23,440 –> 00:32:27,400
It needs a computable knowledge layer and a retrieval strategy you can tune.
800
00:32:27,400 –> 00:32:28,640
SharePoints brawl won’t save you.
801
00:32:28,640 –> 00:32:31,160
That’s why the next section is Azure AI search.
802
00:32:31,160 –> 00:32:35,000
Turning knowledge into something the system can actually retrieve on purpose.
803
00:32:35,000 –> 00:32:37,960
As your AI search make knowledge computable.
804
00:32:37,960 –> 00:32:39,560
SharePoint is not a knowledge strategy.
805
00:32:39,560 –> 00:32:41,680
It’s a document landfill with a search box.
806
00:32:41,680 –> 00:32:43,360
And yes, Microsoft search has improved.
807
00:32:43,360 –> 00:32:46,840
The pilot can sometimes find the right page, but sometimes is exactly the problem.
808
00:32:46,840 –> 00:32:51,880
An agentic workforce can’t run on probabilistic discovery when the output needs to be audible,
809
00:32:51,880 –> 00:32:53,320
repeatable and fast.
810
00:32:53,320 –> 00:32:55,880
The system needs knowledge it can retrieve on purpose.
811
00:32:55,880 –> 00:32:57,960
That’s what Azure AI search actually does.
812
00:32:57,960 –> 00:32:59,480
It doesn’t make content smarter.
813
00:32:59,480 –> 00:33:01,080
It makes content computable.
814
00:33:01,080 –> 00:33:05,360
Indexed, chunked, tagged, refreshed and security trimmed so retrieval becomes a designed
815
00:33:05,360 –> 00:33:07,240
behavior instead of a hope.
816
00:33:07,240 –> 00:33:10,840
That distinction matters because grounding collapses when retrieval is accidental.
817
00:33:10,840 –> 00:33:15,440
So the simple version is Azure AI search turns your messy pile of documents into an index
818
00:33:15,440 –> 00:33:17,280
the agent can query with structure.
819
00:33:17,280 –> 00:33:18,680
And the structure is the whole game.
820
00:33:18,680 –> 00:33:21,720
The first design choice is what you index.
821
00:33:21,720 –> 00:33:25,280
Most organizations point at the SharePoint site and call it done.
822
00:33:25,280 –> 00:33:27,960
That’s how you get blended answers and stale policy conflicts.
823
00:33:27,960 –> 00:33:31,280
Instead, index the knowledge that is allowed to be operational truth.
824
00:33:31,280 –> 00:33:36,480
Runbooks approved SOPs, policy documents, known issue articles, service status notices and
825
00:33:36,480 –> 00:33:38,200
service catalog entries.
826
00:33:38,200 –> 00:33:41,520
Not drafts, not personal notes, not the random wiki nobody owns.
827
00:33:41,520 –> 00:33:45,760
If it doesn’t have an owner and a life cycle, it doesn’t belong in an index feeding production
828
00:33:45,760 –> 00:33:46,760
answers.
829
00:33:46,760 –> 00:33:47,760
Then comes chunking.
830
00:33:47,760 –> 00:33:50,360
This is where retrieval either stays clean or becomes a smear.
831
00:33:50,360 –> 00:33:54,280
chunking is splitting documents into smaller pieces so the system can retrieve the exact
832
00:33:54,280 –> 00:33:56,000
part that answers the question.
833
00:33:56,000 –> 00:34:00,520
If your chunk is too large, the model gets an entire page with three procedures and it
834
00:34:00,520 –> 00:34:01,640
will blend them.
835
00:34:01,640 –> 00:34:06,400
If the chunk is too small, the model loses context and starts inventing transitions.
836
00:34:06,400 –> 00:34:08,120
The right answer is boring.
837
00:34:08,120 –> 00:34:09,680
Product by atomic procedure.
838
00:34:09,680 –> 00:34:14,080
One policy section per chunk, one runbook step sequence per chunk, one exception clause per
839
00:34:14,080 –> 00:34:15,080
chunk.
840
00:34:15,080 –> 00:34:16,880
The goal is not to index the dog.
841
00:34:16,880 –> 00:34:18,880
The goal is to index the decision unit.
842
00:34:18,880 –> 00:34:20,720
That’s the atomic knowledge rule.
843
00:34:20,720 –> 00:34:24,680
If a piece of content can’t stand alone as an answer source, it’s not a good chunk.
844
00:34:24,680 –> 00:34:25,680
Next is metadata.
845
00:34:25,680 –> 00:34:28,720
Without metadata, retrieval becomes a popularity contest.
846
00:34:28,720 –> 00:34:32,760
Metadata is how you turn VPN policy into VPN policy for contractors.
847
00:34:32,760 –> 00:34:39,200
One EU applies to Windows, updated 2025-01-12-Owner security ops.
848
00:34:39,200 –> 00:34:41,320
The agent doesn’t need all of that in the response.
849
00:34:41,320 –> 00:34:44,560
It needs it for filtering and ranking so it doesn’t retrieve the wrong thing.
850
00:34:44,560 –> 00:34:49,480
So tag content by service, audience, region, risk tier, dog type and last review date.
851
00:34:49,480 –> 00:34:51,960
Keep the taxonomy small, consistent and enforced.
852
00:34:51,960 –> 00:34:55,840
If your metadata is optional, it will be missing on the documents that matter most.
853
00:34:55,840 –> 00:34:57,160
That’s how entropy works.
854
00:34:57,160 –> 00:34:59,680
Now security trimming, this is not a nice to have.
855
00:34:59,680 –> 00:35:04,120
If the index can retrieve content, the user shouldn’t see, you will eventually leak something.
856
00:35:04,120 –> 00:35:07,920
Not because the model is malicious, because a user will ask a question that causes retrieval
857
00:35:07,920 –> 00:35:11,480
to surface restricted content and the system will try to be helpful.
858
00:35:11,480 –> 00:35:13,920
So the index must respect access controls.
859
00:35:13,920 –> 00:35:17,720
Retrieval should only return chunks the requesting user is entitled to read.
860
00:35:17,720 –> 00:35:22,680
In other words, your knowledge plane must obey the same boundary rules as your data plane.
861
00:35:22,680 –> 00:35:24,480
Refresh cadence is the next trap.
862
00:35:24,480 –> 00:35:25,920
Static index becomes wrong.
863
00:35:25,920 –> 00:35:30,160
Next policies change, outages resolve, runbooks get updated after incidents.
864
00:35:30,160 –> 00:35:35,200
If your index refreshes weekly and your operations change daily, the agent will confidently answer
865
00:35:35,200 –> 00:35:36,680
with yesterday’s truth.
866
00:35:36,680 –> 00:35:37,840
Users will notice.
867
00:35:37,840 –> 00:35:38,840
Trust will die.
868
00:35:38,840 –> 00:35:43,880
So set refresh cadence by dog type, service status and known issues refresh frequently.
869
00:35:43,880 –> 00:35:48,120
Policies refresh on publish, runbooks refresh on change control and make last indexed visible
870
00:35:48,120 –> 00:35:52,840
in telemetry because stale answers look identical to hallucinations from the user’s perspective.
871
00:35:52,840 –> 00:35:56,720
Now the output requirement shows sources, not because citations feel academic because
872
00:35:56,720 –> 00:35:59,280
they’re the only way to make the system defensible.
873
00:35:59,280 –> 00:36:03,060
The agent response should include the answer, the linked source and the specific section
874
00:36:03,060 –> 00:36:04,920
title or except reference.
875
00:36:04,920 –> 00:36:07,160
If the system can’t provide that, it escalates.
876
00:36:07,160 –> 00:36:08,240
No source, no answer.
877
00:36:08,240 –> 00:36:11,440
That rule becomes enforceable when retrieval is a design component.
878
00:36:11,440 –> 00:36:12,440
And here’s the weird part.
879
00:36:12,440 –> 00:36:16,280
Once you implement Azure AI search, you stop arguing about prompt quality as if it’s
880
00:36:16,280 –> 00:36:17,280
the product.
881
00:36:17,280 –> 00:36:18,640
Prompts become thin glue.
882
00:36:18,640 –> 00:36:19,640
Retrieval becomes the product.
883
00:36:19,640 –> 00:36:24,360
Co-pilot studio can sit on top as the orchestration layer, but as your AI search becomes the grounded
884
00:36:24,360 –> 00:36:28,440
knowledge backbone, it’s how you make policy computable, not just searchable.
885
00:36:28,440 –> 00:36:32,160
But retrieval still isn’t action, knowing the right runbook step doesn’t execute the
886
00:36:32,160 –> 00:36:33,160
step.
887
00:36:33,160 –> 00:36:36,880
Knowing which ticket category applies doesn’t create the ticket for that you need tools
888
00:36:36,880 –> 00:36:39,520
that are predictable, governed and reusable.
889
00:36:39,520 –> 00:36:43,040
That’s where MCP shows up and why it matters more than most people want to admit.
890
00:36:43,040 –> 00:36:45,880
MCP, turning co-pilot from chat into a system.
891
00:36:45,880 –> 00:36:48,800
Azure AI search makes knowledge retrievable on purpose.
892
00:36:48,800 –> 00:36:51,520
But the next failure mode shows up immediately.
893
00:36:51,520 –> 00:36:54,440
The agent still can’t do anything predictable with that knowledge.
894
00:36:54,440 –> 00:36:58,800
It can explain a runbook, it can cite a policy, and then it stops waiting for a human to
895
00:36:58,800 –> 00:37:00,720
carry the work across the finish line.
896
00:37:00,720 –> 00:37:02,240
That’s where MCP comes in.
897
00:37:02,240 –> 00:37:06,040
Most people hear model context protocol and think it’s a developer convenience.
898
00:37:06,040 –> 00:37:07,320
It is not.
899
00:37:07,320 –> 00:37:12,280
In enterprise terms, MCP is a standard contract for tools, a predictable way for an agent
900
00:37:12,280 –> 00:37:15,960
to discover capabilities, call them and receive structured results.
901
00:37:15,960 –> 00:37:18,680
Especpo glue, more reusable capability.
902
00:37:18,680 –> 00:37:22,840
And that distinction matters because without a standard tool interface, every agent becomes
903
00:37:22,840 –> 00:37:24,840
a one off integration project.
904
00:37:24,840 –> 00:37:27,720
One bot talks to service now through a custom connector.
905
00:37:27,720 –> 00:37:31,320
Another talks to Gira through a different pattern, a third one hits graph with a different
906
00:37:31,320 –> 00:37:32,560
auth model.
907
00:37:32,560 –> 00:37:34,720
Over time, you’re not building an agent ecosystem.
908
00:37:34,720 –> 00:37:36,560
You’re building an integration junkyard.
909
00:37:36,560 –> 00:37:40,520
Okay, so basically, MCP makes tools legible to agents.
910
00:37:40,520 –> 00:37:43,120
A tool isn’t just an API endpoint.
911
00:37:43,120 –> 00:37:46,400
A tool becomes name, description, parameters and expected output.
912
00:37:46,400 –> 00:37:50,040
That’s what gives the agent the ability to plan and execute in a loop without you hard
913
00:37:50,040 –> 00:37:51,040
coding every branch.
914
00:37:51,040 –> 00:37:55,040
It’s the difference between a human seeing a labeled button that says create ticket versus
915
00:37:55,040 –> 00:37:59,200
a human being handed a raw rest API spec and being told, figure it out.
916
00:37:59,200 –> 00:38:01,640
Now why does that translate into fast ROI?
917
00:38:01,640 –> 00:38:06,320
Because MCP shifts effort from building new agents to reusing the same small set of enterprise
918
00:38:06,320 –> 00:38:07,320
tools everywhere.
919
00:38:07,320 –> 00:38:10,000
Build one good ticket create tool.
920
00:38:10,000 –> 00:38:14,280
Use it across IT triage, HR requests, access requests and facilities.
921
00:38:14,280 –> 00:38:16,640
Build one service status lookup tool.
922
00:38:16,640 –> 00:38:18,520
Reuse it across every support experience.
923
00:38:18,520 –> 00:38:21,640
Build one KB retrieval with citations tool.
924
00:38:21,640 –> 00:38:23,400
Reuse it everywhere grounded answers matter.
925
00:38:23,400 –> 00:38:25,480
That’s how you scale without multiplying chaos.
926
00:38:25,480 –> 00:38:30,520
But the uncomfortable truth is MCP also accelerates the risk you are already going to have.
927
00:38:30,520 –> 00:38:32,200
Because tools are authority.
928
00:38:32,200 –> 00:38:36,200
And when you give an agent a tool that can write, you’ve handed it a lever that moves production
929
00:38:36,200 –> 00:38:37,200
systems.
930
00:38:37,200 –> 00:38:40,920
Even law for MCP in the first 30 days is strict.
931
00:38:40,920 –> 00:38:42,760
Read operations first.
932
00:38:42,760 –> 00:38:44,600
Write operations gated.
933
00:38:44,600 –> 00:38:46,960
Read list get search tools are where you start.
934
00:38:46,960 –> 00:38:49,840
They increase accuracy without changing state.
935
00:38:49,840 –> 00:38:54,080
Write tools create update delete only enter the system when you’ve already proven
936
00:38:54,080 –> 00:38:56,640
routing stability grounding discipline and logging.
937
00:38:56,640 –> 00:38:59,240
And even then you gate them behind explicit approvals.
938
00:38:59,240 –> 00:39:00,800
This is not a philosophical stance.
939
00:39:00,800 –> 00:39:02,640
It’s entropy management.
940
00:39:02,640 –> 00:39:07,460
The fastest way to create an agentic incident is to connect a right capable tool with no
941
00:39:07,460 –> 00:39:10,120
life cycle, no allow list and no rollback.
942
00:39:10,120 –> 00:39:11,960
And yes, tokens sprawl becomes real here.
943
00:39:11,960 –> 00:39:16,360
API tokens, client secrets and unmanaged credentials become shadow admin keys.
944
00:39:16,360 –> 00:39:19,680
The moment they get copied into three environments and ten agents.
945
00:39:19,680 –> 00:39:22,360
The organization forgets they exist until one expires.
946
00:39:22,360 –> 00:39:24,720
Or worse, gets reused somewhere it shouldn’t.
947
00:39:24,720 –> 00:39:28,320
So MCP governance starts with a tool allow list and a denial list.
948
00:39:28,320 –> 00:39:32,800
That means only approved MCP servers and approved tools inside those servers.
949
00:39:32,800 –> 00:39:37,520
Denialist means explicitly block classes of tools you know you don’t want.
950
00:39:37,520 –> 00:39:42,040
Delete operations bulk updates privilege grants anything that changes identity access or
951
00:39:42,040 –> 00:39:44,400
finance without a human boundary.
952
00:39:44,400 –> 00:39:47,360
That’s governance by design not governance after cleanup.
953
00:39:47,360 –> 00:39:51,240
Now how does MCP actually turn chat into a system?
954
00:39:51,240 –> 00:39:52,840
Because it creates a closed loop.
955
00:39:52,840 –> 00:39:57,840
The agent can reason call a tool read the response validated and decide the next step.
956
00:39:57,840 –> 00:40:02,400
But loop is what makes an agent definition hold runs tools in a loop to achieve a goal
957
00:40:02,400 –> 00:40:03,400
without tools.
958
00:40:03,400 –> 00:40:06,280
The agent is a narrator with tools it becomes an operator.
959
00:40:06,280 –> 00:40:07,920
But only if the tools are predictable.
960
00:40:07,920 –> 00:40:10,280
So you keep tool descriptions short and specific.
961
00:40:10,280 –> 00:40:14,440
You don’t expose 50 vaguely named operations and hope the model chooses the right one.
962
00:40:14,440 –> 00:40:16,800
That is just conditional chaos with better packaging.
963
00:40:16,800 –> 00:40:21,560
You expose a small well label tool set that matches your orchestration steps.
964
00:40:21,560 –> 00:40:27,400
Classify retrieve check status, create ticket, update ticket, notify user, request approval.
965
00:40:27,400 –> 00:40:31,400
When you instrument tool outcomes tool errors matter more than model errors because tool errors
966
00:40:31,400 –> 00:40:36,240
break workflows track which tool failed, why it failed and what the agent did next.
967
00:40:36,240 –> 00:40:40,960
If the agent retries endlessly you’ve built an infinite loop that burns budget and trust.
968
00:40:40,960 –> 00:40:45,560
Finally the key design choice that prevents ghost agents later shows up again.
969
00:40:45,560 –> 00:40:51,560
Prefer reusable tools over reusable agents agents are experiences tools are capabilities.
970
00:40:51,560 –> 00:40:55,880
When you standardize tools agents can stay small, domain scoped and disposable.
971
00:40:55,880 –> 00:41:00,160
And you don’t every agent becomes a fragile snowflake that nobody can maintain.
972
00:41:00,160 –> 00:41:02,240
So MCP isn’t the cool protocol.
973
00:41:02,240 –> 00:41:06,640
It’s the enforcement layer that makes tools composable, governable and repeatable.
974
00:41:06,640 –> 00:41:10,200
Now that you have retrieval that’s computable in tools that are standardized you can finally
975
00:41:10,200 –> 00:41:12,120
connect the two into a real system.
976
00:41:12,120 –> 00:41:14,480
So next it stops being theoretical.
977
00:41:14,480 –> 00:41:19,520
IT ticket triage end to end with co-pilot studio routing, Azure AI search grounding,
978
00:41:19,520 –> 00:41:23,640
MCP tools for actions and power automate doing the deterministic work.
979
00:41:23,640 –> 00:41:27,040
Demo architecture one, IT ticket triage end to end.
980
00:41:27,040 –> 00:41:31,800
Here’s the end to end demo architecture that makes executives stop asking so it’s chat and
981
00:41:31,800 –> 00:41:34,160
start asking when can this hit production.
982
00:41:34,160 –> 00:41:38,160
The flow is intentionally boring because boring is how enterprise systems survive.
983
00:41:38,160 –> 00:41:43,360
Start with the users issue coming in through one channel, pick teams or a portal first, don’t do all of them.
984
00:41:43,360 –> 00:41:46,160
Channel sprawl is just topics sprawl with better UI.
985
00:41:46,160 –> 00:41:52,840
The user types free text, VPN died, outlook won’t send, can’t access SharePoint, whatever.
986
00:41:52,840 –> 00:41:55,800
The first step is intent classification in co-pilot studio.
987
00:41:55,800 –> 00:41:59,000
This is where your 10, 15 intents actually earn their keep.
988
00:41:59,000 –> 00:42:03,920
The agent selects an intent and immediately applies the containment boundary tied to that intent.
989
00:42:03,920 –> 00:42:06,520
Then the agent enriches context, not everything.
990
00:42:06,520 –> 00:42:11,520
Minimum viable context, user identity, device compliance state if relevant and service context
991
00:42:11,520 –> 00:42:12,840
like known outages.
992
00:42:12,840 –> 00:42:15,480
This enrichment should come from predictable sources.
993
00:42:15,480 –> 00:42:19,880
Entra attributes endpoint management signals and the IT SM ticket history.
994
00:42:19,880 –> 00:42:23,920
If the organization doesn’t have those sources well defined, the demo still works.
995
00:42:23,920 –> 00:42:28,200
It just surfaces the real constraint your operations are not computable yet.
996
00:42:28,200 –> 00:42:32,760
Now the fork in the road, resolve root or create, resolve means the agent can safely close
997
00:42:32,760 –> 00:42:33,760
the loop.
998
00:42:33,760 –> 00:42:36,400
This is where the grounded knowledge path triggers.
999
00:42:36,400 –> 00:42:39,880
It retrieves a runbook or known issue article and answers with citations.
1000
00:42:39,880 –> 00:42:44,320
If the fix requires a deterministic step, like triggering a password reset workflow,
1001
00:42:44,320 –> 00:42:46,400
that step should not be done by reasoning.
1002
00:42:46,400 –> 00:42:50,480
It should be done by a tool called in this demo, Power Automate handles that deterministic
1003
00:42:50,480 –> 00:42:51,480
execution.
1004
00:42:51,480 –> 00:42:55,600
The agent proposes the action, confirms with the user at the right boundary, then Power
1005
00:42:55,600 –> 00:42:57,640
Automate executes and logs.
1006
00:42:57,640 –> 00:43:00,800
Root means the agent can’t solve, but it can root cleanly.
1007
00:43:00,800 –> 00:43:02,280
The output isn’t a transcript.
1008
00:43:02,280 –> 00:43:08,000
It’s a payload, intent, impacted service, urgency, device state, and what was already attempted.
1009
00:43:08,000 –> 00:43:11,280
That payload becomes a ticket description plus structured fields.
1010
00:43:11,280 –> 00:43:13,400
The human gets a case file, not a chat log.
1011
00:43:13,400 –> 00:43:14,680
That’s what reduces SLA.
1012
00:43:14,680 –> 00:43:15,960
The human doesn’t retry out.
1013
00:43:15,960 –> 00:43:17,880
They start where the agent left off.
1014
00:43:17,880 –> 00:43:20,480
Create means you must open a ticket no matter what.
1015
00:43:20,480 –> 00:43:23,960
Policy requires it, access requires it, or the user insists.
1016
00:43:23,960 –> 00:43:28,320
The agent still adds value by making the ticket structured and pre-classified.
1017
00:43:28,320 –> 00:43:32,000
That’s the difference between we deployed co-pilot and we reduced backlog.
1018
00:43:32,000 –> 00:43:36,240
Now where each product fits, co-pilot studio drives the orchestration and routing.
1019
00:43:36,240 –> 00:43:37,520
Topics are just the front door.
1020
00:43:37,520 –> 00:43:39,480
The core logic is the decision loop.
1021
00:43:39,480 –> 00:43:44,160
Classify, enrich, retrieve, propose, confirm, execute, verify, hand off.
1022
00:43:44,160 –> 00:43:45,760
Power Automate is the system’s muscle.
1023
00:43:45,760 –> 00:43:49,480
It handles the deterministic steps you never want the model inventing.
1024
00:43:49,480 –> 00:43:55,800
Create ticket, update ticket, assign queue, notify user, write to audit log, post to teams,
1025
00:43:55,800 –> 00:43:56,880
and trigger approvals.
1026
00:43:56,880 –> 00:44:00,000
If it’s an if this then that action, power automate does it.
1027
00:44:00,000 –> 00:44:02,920
The agent decides when to call it, not how to rewrite it.
1028
00:44:02,920 –> 00:44:08,400
The ITSM backend, service now, Gira service management, whatever, remains the system of record.
1029
00:44:08,400 –> 00:44:09,400
That matters.
1030
00:44:09,400 –> 00:44:11,160
You are not replacing ITSM.
1031
00:44:11,160 –> 00:44:14,920
You’re improving the front end decision quality and reducing the human time spent on intake,
1032
00:44:14,920 –> 00:44:16,400
classification, and backend fourth.
1033
00:44:16,400 –> 00:44:20,480
If someone tries to rebuild ITSM with agents, the demo should fail on purpose.
1034
00:44:20,480 –> 00:44:21,720
Because that’s how programs die.
1035
00:44:21,720 –> 00:44:23,400
Now layer in MCP where it’s useful.
1036
00:44:23,400 –> 00:44:28,840
In the demo, MCP provides standardized tools to interact with ITSM and status sources.
1037
00:44:28,840 –> 00:44:29,840
Retools first.
1038
00:44:29,840 –> 00:44:31,640
List open incidents for this user.
1039
00:44:31,640 –> 00:44:33,080
Check service status.
1040
00:44:33,080 –> 00:44:35,480
Retrieve ticket templates, fetch routing groups.
1041
00:44:35,480 –> 00:44:38,880
Write tools only where you’ve already defined approval and rollback.
1042
00:44:38,880 –> 00:44:41,600
Create ticket, update ticket, request access.
1043
00:44:41,600 –> 00:44:44,200
The point is tool predictability, not novelty.
1044
00:44:44,200 –> 00:44:46,120
These criteria are not subjective.
1045
00:44:46,120 –> 00:44:47,120
Containment rate.
1046
00:44:47,120 –> 00:44:50,080
How many interactions resolved without ticket creation?
1047
00:44:50,080 –> 00:44:55,120
That’s deflection.resolution time for the interactions the agent touches does cycle time drop.
1048
00:44:55,120 –> 00:44:57,800
That’s SLA impact escalation reduction.
1049
00:44:57,800 –> 00:45:00,600
Fewer wrong handoffs, fewer ping pong assignments.
1050
00:45:00,600 –> 00:45:02,120
That’s operational stability.
1051
00:45:02,120 –> 00:45:03,440
And you instrument it.
1052
00:45:03,440 –> 00:45:07,920
Every run logs detected intent, confidence, retrieved sources, tools invoked, execution
1053
00:45:07,920 –> 00:45:10,760
status, escalation reason, and final outcome.
1054
00:45:10,760 –> 00:45:12,720
If you can’t answer why did it do that?
1055
00:45:12,720 –> 00:45:13,720
You don’t have an agent.
1056
00:45:13,720 –> 00:45:14,840
You have a magic trick.
1057
00:45:14,840 –> 00:45:18,760
The best part of this demo is you can run it with a small pilot group in week two and
1058
00:45:18,760 –> 00:45:20,160
you can improve it daily.
1059
00:45:20,160 –> 00:45:24,680
You’ll discover missing knowledge coverage, ambiguous intents and tool errors immediately.
1060
00:45:24,680 –> 00:45:25,680
That’s not failure.
1061
00:45:25,680 –> 00:45:27,560
That’s the system finally telling the truth.
1062
00:45:27,560 –> 00:45:28,560
And here’s the transition.
1063
00:45:28,560 –> 00:45:32,320
Once you can triage end to end, the next demo isn’t about tickets.
1064
00:45:32,320 –> 00:45:34,040
It’s about trust.
1065
00:45:34,040 –> 00:45:37,440
Grounded policy answers with evidence or nothing.
1066
00:45:37,440 –> 00:45:39,120
Demo architecture two.
1067
00:45:39,120 –> 00:45:40,920
Grounded policy answers with evidence.
1068
00:45:40,920 –> 00:45:43,000
The second demo exists for one reason.
1069
00:45:43,000 –> 00:45:46,400
Policy answers are where hallucinations become career limiting.
1070
00:45:46,400 –> 00:45:50,400
IT and HR questions feel harmless until an agent confidently tells someone they’re allowed
1071
00:45:50,400 –> 00:45:52,840
to do something they’re explicitly not allowed to do.
1072
00:45:52,840 –> 00:45:54,280
Or it cites the wrong clause.
1073
00:45:54,280 –> 00:45:59,000
Or it mixes two versions of the same policy and produces a third policy that never existed.
1074
00:45:59,000 –> 00:46:00,880
Nobody audits the chat transcript for tone.
1075
00:46:00,880 –> 00:46:01,880
They audit it for harm.
1076
00:46:01,880 –> 00:46:05,520
So this demo is built around the highest risk question types.
1077
00:46:05,520 –> 00:46:08,960
SOPs, runbooks, compliance rules, and internal policy.
1078
00:46:08,960 –> 00:46:12,240
The staff people ask in a hurry, copy into emails, and then treat as truth.
1079
00:46:12,240 –> 00:46:14,280
The architecture is intentionally strict.
1080
00:46:14,280 –> 00:46:17,800
User asks, can contractors store customer data in one drive?
1081
00:46:17,800 –> 00:46:21,360
Or what’s the process for requesting elevated access?
1082
00:46:21,360 –> 00:46:26,360
Or are we allowed to use personal devices for M365?
1083
00:46:26,360 –> 00:46:28,040
The agent’s first behavior is not to answer.
1084
00:46:28,040 –> 00:46:30,880
It’s to classify the question type and set the response mode.
1085
00:46:30,880 –> 00:46:34,240
If it’s policy or compliance, the agent goes into evidence mode.
1086
00:46:34,240 –> 00:46:37,560
Retrieve first, site always, and refuse to speculate.
1087
00:46:37,560 –> 00:46:41,440
Now the backbone, as your AI search as the source of truth, not SharePoint search,
1088
00:46:41,440 –> 00:46:43,240
not I think I saw a doc.
1089
00:46:43,240 –> 00:46:48,280
As your AI search index, designed for policy retrieval, chunked by atomic sections,
1090
00:46:48,280 –> 00:46:53,680
tagged with metadata like policy domain, audience, region, and last review date,
1091
00:46:53,680 –> 00:46:57,960
and security trimmed so the user only retrieves what they’re allowed to read.
1092
00:46:57,960 –> 00:47:01,040
The retrieval step pulls the top chunks that match the question,
1093
00:47:01,040 –> 00:47:02,680
but the key design constraint is this.
1094
00:47:02,680 –> 00:47:05,200
The agent can only answer from retrieved content.
1095
00:47:05,200 –> 00:47:07,400
It can paraphrase, it can compress, it can explain.
1096
00:47:07,400 –> 00:47:10,280
It cannot invent, that’s how you get audit-ready responses.
1097
00:47:10,280 –> 00:47:13,440
The output format is also strict because format is a control mechanism.
1098
00:47:13,440 –> 00:47:17,400
The response is short answer, source, next action, escalation path.
1099
00:47:17,400 –> 00:47:19,200
Short answer means one paragraph max.
1100
00:47:19,200 –> 00:47:23,080
If the agent needs five paragraphs, it doesn’t understand the policy boundary well enough,
1101
00:47:23,080 –> 00:47:24,880
or the policy itself is ambiguous.
1102
00:47:24,880 –> 00:47:28,960
Either way, long answers are where the model starts, helping.
1103
00:47:28,960 –> 00:47:33,640
Source means the exact document in section with a link if your environment allows it.
1104
00:47:33,640 –> 00:47:37,520
If there are multiple sources, the agent lists them explicitly and calls out the conflict.
1105
00:47:37,520 –> 00:47:39,040
It does not blend them.
1106
00:47:39,040 –> 00:47:42,480
Next action means the operational step the user should take.
1107
00:47:42,480 –> 00:47:44,320
Submit request via this form.
1108
00:47:44,320 –> 00:47:49,240
Open a ticket in this category, use this approved storage location, or escalate to security
1109
00:47:49,240 –> 00:47:50,240
ops.
1110
00:47:50,240 –> 00:47:52,200
Policies without next actions don’t reduce work.
1111
00:47:52,200 –> 00:47:54,040
They create more meetings.
1112
00:47:54,040 –> 00:47:56,440
Escalation path is the final safety valve.
1113
00:47:56,440 –> 00:48:01,360
If the question involves exceptions, regulatory jurisdiction, or privileged access the agent
1114
00:48:01,360 –> 00:48:04,760
routes, it doesn’t negotiate policy exceptions in chat.
1115
00:48:04,760 –> 00:48:09,280
Now the inevitable reality, policy conflicts, you will have them every enterprise does.
1116
00:48:09,280 –> 00:48:14,360
Two docs disagree, one is newer, one is unofficial, one has the right title but the wrong audience.
1117
00:48:14,360 –> 00:48:18,360
The system needs a conflict strategy that doesn’t rely on trust the model.
1118
00:48:18,360 –> 00:48:20,720
So the demo includes conflict handling rules.
1119
00:48:20,720 –> 00:48:25,280
If two policies conflict, the agent answers conflict detected, sides both, and escalates
1120
00:48:25,280 –> 00:48:26,520
to the policy owner.
1121
00:48:26,520 –> 00:48:31,480
That escalation payload includes the retrieved chunks and the reason the system flagged ambiguity,
1122
00:48:31,480 –> 00:48:33,400
you don’t hide the mess, you surface it.
1123
00:48:33,400 –> 00:48:38,000
If the doc is outdated, the agent says it’s outdated, sides the last reviewed date, and escalates.
1124
00:48:38,000 –> 00:48:41,760
Stale truth is not truth and if the index doesn’t contain the answer, the agent refuses,
1125
00:48:41,760 –> 00:48:42,760
no source, no answer.
1126
00:48:42,760 –> 00:48:44,880
The refusal is not, I’m sorry.
1127
00:48:44,880 –> 00:48:46,160
It’s operational.
1128
00:48:46,160 –> 00:48:48,280
I can’t find an approved source.
1129
00:48:48,280 –> 00:48:51,760
I can open a ticket to the policy owner and include your question.
1130
00:48:51,760 –> 00:48:55,080
Now measuring accuracy because this is where most teams lie to themselves.
1131
00:48:55,080 –> 00:48:59,640
You build an evaluator set, a fixed list of real policy questions that matter.
1132
00:48:59,640 –> 00:49:04,640
Then you score failures with a taxonomy missing doc wrong doc wrong inference stale doc or conflict.
1133
00:49:04,640 –> 00:49:08,240
That taxonomy matters because each failure has a different fix.
1134
00:49:08,240 –> 00:49:12,520
Missing doc is content work, wrong doc is metadata or chunking, wrong inference is response
1135
00:49:12,520 –> 00:49:13,520
constraints.
1136
00:49:13,520 –> 00:49:17,440
Stale doc is lifecycle governance and the demo closes with the proof point your stakeholders
1137
00:49:17,440 –> 00:49:19,000
actually care about.
1138
00:49:19,000 –> 00:49:20,720
You can show the evidence trail.
1139
00:49:20,720 –> 00:49:23,560
Every answer has citations, every escalation has a reason.
1140
00:49:23,560 –> 00:49:27,800
Every I don’t know becomes a backlog item to improve knowledge coverage.
1141
00:49:27,800 –> 00:49:31,640
That’s the difference between co pilot answer the question and the organization can trust
1142
00:49:31,640 –> 00:49:33,040
what it answered.
1143
00:49:33,040 –> 00:49:37,080
Demo architecture three approvals via teams adaptive cards.
1144
00:49:37,080 –> 00:49:41,720
This third demo is where the agent work force stops sounding like marketing and starts looking
1145
00:49:41,720 –> 00:49:46,320
like a control system because approvals are the moment an agent either earns trust or
1146
00:49:46,320 –> 00:49:47,880
becomes a liability.
1147
00:49:47,880 –> 00:49:52,120
Most organizations try to handle risk with a generic human in the loop rule.
1148
00:49:52,120 –> 00:49:53,520
Someone should approve it.
1149
00:49:53,520 –> 00:49:55,640
It’s not control that’s delay.
1150
00:49:55,640 –> 00:49:57,560
A real enterprise pattern is tighter.
1151
00:49:57,560 –> 00:50:02,320
The agent runs everything it can deterministically then it poses only a decision boundaries that
1152
00:50:02,320 –> 00:50:05,280
are irreversible, privileged or ordered relevant.
1153
00:50:05,280 –> 00:50:08,240
That boundary becomes visible through adaptive cards and teams.
1154
00:50:08,240 –> 00:50:12,160
And yes teams is the right place for this not because it’s trendy but because it’s where
1155
00:50:12,160 –> 00:50:14,600
approvals already happen in the real world.
1156
00:50:14,600 –> 00:50:16,800
Managers, service owners, security reviewers.
1157
00:50:16,800 –> 00:50:18,200
They don’t want to read paragraphs.
1158
00:50:18,200 –> 00:50:20,200
They want to click a decision and move on.
1159
00:50:20,200 –> 00:50:21,960
So the demo flow starts with detection.
1160
00:50:21,960 –> 00:50:25,840
The user asks for something that crosses a risk threshold, grant access to this share
1161
00:50:25,840 –> 00:50:27,240
point side.
1162
00:50:27,240 –> 00:50:28,240
Approve an exception.
1163
00:50:28,240 –> 00:50:30,200
Reset MFA for a user.
1164
00:50:30,200 –> 00:50:32,040
Create a mailbox delegation.
1165
00:50:32,040 –> 00:50:33,480
Approve a spend request.
1166
00:50:33,480 –> 00:50:34,840
Approve a change.
1167
00:50:34,840 –> 00:50:36,960
It doesn’t matter which domain you pick.
1168
00:50:36,960 –> 00:50:39,200
What matters is the shape of the workflow.
1169
00:50:39,200 –> 00:50:41,880
Request, validate, approve, execute, log.
1170
00:50:41,880 –> 00:50:45,280
Step one, the agent classifies the intent and evaluates risk.
1171
00:50:45,280 –> 00:50:46,280
This is not a vibe check.
1172
00:50:46,280 –> 00:50:47,280
It’s a rules check.
1173
00:50:47,280 –> 00:50:51,200
If the intent maps to a right action or touches sensitive data or changes permissions,
1174
00:50:51,200 –> 00:50:52,960
the agent flips into approval mode.
1175
00:50:52,960 –> 00:50:56,720
It gathers the minimum context required to make the decision reviewable.
1176
00:50:56,720 –> 00:50:58,320
Who is requesting?
1177
00:50:58,320 –> 00:50:59,840
What resource is being changed?
1178
00:50:59,840 –> 00:51:00,840
Why?
1179
00:51:00,840 –> 00:51:01,840
For how long?
1180
00:51:01,840 –> 00:51:02,840
And what policy applies?
1181
00:51:02,840 –> 00:51:07,040
And it retrieves the relevant policy source because approvals without policy context
1182
00:51:07,040 –> 00:51:09,800
just become whoever clicks first wins.
1183
00:51:09,800 –> 00:51:12,360
Step two, the agent generates the approval payload.
1184
00:51:12,360 –> 00:51:14,560
This is a structured object, not a narrative.
1185
00:51:14,560 –> 00:51:19,000
Request our target resource, requested action, scope, duration, justification, evidence
1186
00:51:19,000 –> 00:51:22,840
link and the downstream action that will execute if approved.
1187
00:51:22,840 –> 00:51:25,840
The agent also includes a deny reason required field.
1188
00:51:25,840 –> 00:51:28,080
Denials without reasons create shadow workflows.
1189
00:51:28,080 –> 00:51:30,360
People just resubmit until someone approves.
1190
00:51:30,360 –> 00:51:31,680
Entropy wins.
1191
00:51:31,680 –> 00:51:34,880
Step three, teams adaptive card is posted to the approver.
1192
00:51:34,880 –> 00:51:35,880
The card format matters.
1193
00:51:35,880 –> 00:51:39,600
It should be short enough that the approver can decide in 10 seconds, but complete enough
1194
00:51:39,600 –> 00:51:41,360
that they don’t need a follow-up meeting.
1195
00:51:41,360 –> 00:51:46,160
So the card contains the action summary in one line, the justification in one sentence,
1196
00:51:46,160 –> 00:51:48,920
the policy citation as a link, and three buttons.
1197
00:51:48,920 –> 00:51:51,720
Approved, deny and request more info.
1198
00:51:51,720 –> 00:51:53,480
That third button is not politeness.
1199
00:51:53,480 –> 00:51:58,600
It is how you prevent deny from becoming the default because the approver lacked context.
1200
00:51:58,600 –> 00:52:00,960
Now the important part, the card is not just the UI.
1201
00:52:00,960 –> 00:52:05,160
It is the enforcement mechanism because clicking approved triggers a deterministic workflow
1202
00:52:05,160 –> 00:52:07,280
that is version controlled and logged.
1203
00:52:07,280 –> 00:52:09,400
This is where Power Automate does the heavy lifting.
1204
00:52:09,400 –> 00:52:13,000
It captures the approval decision, stamps who approved when they approved the exact
1205
00:52:13,000 –> 00:52:17,160
payload they approved and then executes the right action through the approved toolpath.
1206
00:52:17,160 –> 00:52:18,280
No ad hoc calls.
1207
00:52:18,280 –> 00:52:20,760
No agent decided to improvise.
1208
00:52:20,760 –> 00:52:25,200
If the approval is denied, the workflow logs the reason and routes it back to the requester
1209
00:52:25,200 –> 00:52:26,360
with the next step.
1210
00:52:26,360 –> 00:52:29,400
What to change, who to contact or what policy blocked it.
1211
00:52:29,400 –> 00:52:32,520
That reduces rework and stops the agent from becoming a dead end.
1212
00:52:32,520 –> 00:52:37,160
If the approver requests more info, the agent re-engages the requester with one targeted question,
1213
00:52:37,160 –> 00:52:39,360
updates the payload and re-issues the card.
1214
00:52:39,360 –> 00:52:40,360
No long chat.
1215
00:52:40,360 –> 00:52:42,040
Just fill the missing field and continue.
1216
00:52:42,040 –> 00:52:45,480
Now the guardrails, you need a risk threshold model, not complex.
1217
00:52:45,480 –> 00:52:46,880
Three tiers.
1218
00:52:46,880 –> 00:52:47,880
No risk.
1219
00:52:47,880 –> 00:52:52,160
Read only actions, status checks, knowledge retrieval, no approvals.
1220
00:52:52,160 –> 00:52:53,160
Medium risk.
1221
00:52:53,160 –> 00:52:56,800
Write actions that are reversible and low impact, like creating a ticket or updating a
1222
00:52:56,800 –> 00:52:58,600
non-sensitive field.
1223
00:52:58,600 –> 00:52:59,600
Optional approvals.
1224
00:52:59,600 –> 00:53:00,600
High risk.
1225
00:53:00,600 –> 00:53:01,600
Access changes.
1226
00:53:01,600 –> 00:53:02,600
Privileged changes.
1227
00:53:02,600 –> 00:53:03,600
Financial actions.
1228
00:53:03,600 –> 00:53:05,000
External communication.
1229
00:53:05,000 –> 00:53:07,160
Or anything that changes identity.
1230
00:53:07,160 –> 00:53:08,160
Mandatory approvals.
1231
00:53:08,160 –> 00:53:10,840
And you enforce this in orchestration, not in documentation.
1232
00:53:10,840 –> 00:53:12,600
The operational effect is immediate.
1233
00:53:12,600 –> 00:53:16,200
Cycle time drops because the agent does the prep work and routes the decision to the correct
1234
00:53:16,200 –> 00:53:17,680
human with the right context.
1235
00:53:17,680 –> 00:53:22,640
Risk drops because right actions only execute after an explicit, log decision.
1236
00:53:22,640 –> 00:53:25,760
And adoption rises because users see outcomes, not chat.
1237
00:53:25,760 –> 00:53:28,160
This demo also closes an early open loop.
1238
00:53:28,160 –> 00:53:32,560
The design choice that prevents ghost agents later is the same choice that makes approvals
1239
00:53:32,560 –> 00:53:33,560
work.
1240
00:53:33,560 –> 00:53:35,080
Put decision boundaries in the system.
1241
00:53:35,080 –> 00:53:36,080
Not in the slide deck.
1242
00:53:36,080 –> 00:53:37,880
Agents Brawl is predictable.
1243
00:53:37,880 –> 00:53:38,880
Design for it upfront.
1244
00:53:38,880 –> 00:53:40,880
Agents Brawl isn’t a later problem.
1245
00:53:40,880 –> 00:53:45,760
It’s the default outcome of giving people a new capability with no enforced boundary.
1246
00:53:45,760 –> 00:53:48,120
Most organizations think the risk is too few agents.
1247
00:53:48,120 –> 00:53:51,440
The real risk is too many because users don’t want 50 agents.
1248
00:53:51,440 –> 00:53:55,200
They want five that work every time in the same places they already work.
1249
00:53:55,200 –> 00:53:58,080
When the ecosystem turns into a catalog, nobody understands.
1250
00:53:58,080 –> 00:53:59,440
Adoption doesn’t fail loudly.
1251
00:53:59,440 –> 00:54:00,520
It just decays.
1252
00:54:00,520 –> 00:54:04,240
People go back to emailing the service desk and forwarding PDFs because it’s faster than
1253
00:54:04,240 –> 00:54:06,600
deciding which agent might be correct.
1254
00:54:06,600 –> 00:54:09,200
And sprawl creates a second quieter problem.
1255
00:54:09,200 –> 00:54:10,200
Ghost agents.
1256
00:54:10,200 –> 00:54:14,280
Ghost agents are agents that nobody uses, nobody owns, and nobody remembers.
1257
00:54:14,280 –> 00:54:19,160
But they still exist, still connect to knowledge, still have two permissions, and still generate
1258
00:54:19,160 –> 00:54:20,160
audit surface.
1259
00:54:20,160 –> 00:54:22,360
They’re the perfect entropy generators.
1260
00:54:22,360 –> 00:54:24,960
Invisible most days, catastrophic on the wrong day.
1261
00:54:24,960 –> 00:54:29,200
So if leadership wants scale without liability, the program has to treat sprawl as inevitable
1262
00:54:29,200 –> 00:54:31,200
and designed for it upfront.
1263
00:54:31,200 –> 00:54:32,840
Start with the uncomfortable truth.
1264
00:54:32,840 –> 00:54:35,720
Self-service agent creation is not democratization.
1265
00:54:35,720 –> 00:54:37,840
It is distributed risk creation.
1266
00:54:37,840 –> 00:54:42,600
And the platform will happily allow it because platforms optimize for capability, not
1267
00:54:42,600 –> 00:54:44,080
for your audit findings.
1268
00:54:44,080 –> 00:54:45,600
Distinction matters.
1269
00:54:45,600 –> 00:54:49,720
So the first control is a catalog discipline that feels boring and saves the program.
1270
00:54:49,720 –> 00:54:53,760
Every agent needs a name that describes an outcome, an owner that signs the outcome, a
1271
00:54:53,760 –> 00:54:58,400
business sponsor that signs the risk and a life cycle state that reflects reality.
1272
00:54:58,400 –> 00:55:01,080
Pilot, active, deprecated, retired.
1273
00:55:01,080 –> 00:55:04,040
Not V1 and V2 and test final final.
1274
00:55:04,040 –> 00:55:05,120
Life cycle state isn’t a label.
1275
00:55:05,120 –> 00:55:06,280
It’s a policy trigger.
1276
00:55:06,280 –> 00:55:08,320
Pilot agents can’t publish broadly.
1277
00:55:08,320 –> 00:55:10,160
Deprecated agents can’t receive new users.
1278
00:55:10,160 –> 00:55:12,280
Retired agents lose tool access and knowledge bindings.
1279
00:55:12,280 –> 00:55:15,920
If those transitions don’t remove capability, they aren’t life cycle states.
1280
00:55:15,920 –> 00:55:17,840
They are stickers.
1281
00:55:17,840 –> 00:55:21,320
Next adopt a reuse strategy that prevents the agent for everything pattern.
1282
00:55:21,320 –> 00:55:23,120
The right reuse unit isn’t the agent.
1283
00:55:23,120 –> 00:55:25,480
It’s the tool.
1284
00:55:25,480 –> 00:55:26,640
Agents are experiences.
1285
00:55:26,640 –> 00:55:28,080
Tools are capabilities.
1286
00:55:28,080 –> 00:55:33,920
If the organization standardizes tools, ticket creation, status checks, identity lookup,
1287
00:55:33,920 –> 00:55:39,280
KB retrieval, teams can build small, scoped agents without reinventing integrations.
1288
00:55:39,280 –> 00:55:43,000
If the organization tries to standardize agents, it builds brittle monoliths that nobody
1289
00:55:43,000 –> 00:55:44,800
trusts and everybody forks.
1290
00:55:44,800 –> 00:55:46,880
Forking is how sprawl becomes permanent.
1291
00:55:46,880 –> 00:55:51,840
So the rule is, prefer reusable MCPN points and flows over reusable agents.
1292
00:55:51,840 –> 00:55:52,840
Build once.
1293
00:55:52,840 –> 00:55:53,920
Reuse everywhere.
1294
00:55:53,920 –> 00:55:55,360
Then control publishing.
1295
00:55:55,360 –> 00:55:58,280
This is the simplest governance posture that still allows innovation.
1296
00:55:58,280 –> 00:56:00,120
You can build but you can’t publish.
1297
00:56:00,120 –> 00:56:02,240
People can prototype in a constrained environment.
1298
00:56:02,240 –> 00:56:03,520
They can test with themselves.
1299
00:56:03,520 –> 00:56:05,400
They can even share inside a small group.
1300
00:56:05,400 –> 00:56:09,240
But the moment an agent becomes enterprise facing, it enters a publishing world.
1301
00:56:09,240 –> 00:56:16,120
Workflow with review gates, data sources, tools, write permissions, grounding strategy, telemetry
1302
00:56:16,120 –> 00:56:17,920
and owner assignment.
1303
00:56:17,920 –> 00:56:20,720
If you don’t separate build from publish, you’ll never catch sprawl.
1304
00:56:20,720 –> 00:56:25,480
You’ll only discover it when the CEO asks why there are three VPN helpers and none of them
1305
00:56:25,480 –> 00:56:26,480
agree.
1306
00:56:26,480 –> 00:56:28,640
Now define the retirement policy before you need it.
1307
00:56:28,640 –> 00:56:30,680
This is where programs usually lie to themselves.
1308
00:56:30,680 –> 00:56:32,080
We’ll clean it up later.
1309
00:56:32,080 –> 00:56:34,120
They won’t.
1310
00:56:34,120 –> 00:56:35,720
Retirement needs automatic triggers.
1311
00:56:35,720 –> 00:56:40,120
No usage over a time window, high escalation rates repeated incorrect routing, missing
1312
00:56:40,120 –> 00:56:44,000
owner, missing sponsor or tool access that no longer matches the approved pattern.
1313
00:56:44,000 –> 00:56:49,120
When a trigger hits, the owner gets a review task, improve, merge or retire.
1314
00:56:49,120 –> 00:56:50,840
And retirement must be real.
1315
00:56:50,840 –> 00:56:55,360
Remove it from discovery, revoke tool permissions, archive logs and keep the audit record.
1316
00:56:55,360 –> 00:56:58,680
And if you only hide the agent but leave the permissions, you didn’t retire it.
1317
00:56:58,680 –> 00:57:00,120
You just made it harder to notice.
1318
00:57:00,120 –> 00:57:04,360
Finally, the design choice that prevents topic sprawl from turning into agents sprawl,
1319
00:57:04,360 –> 00:57:06,720
create intents as an enterprise registry.
1320
00:57:06,720 –> 00:57:08,760
One intent taxonomy shared across agents.
1321
00:57:08,760 –> 00:57:12,880
If each team invents its own intent set, the ecosystem fragments immediately.
1322
00:57:12,880 –> 00:57:17,240
Password reset becomes account unlock, becomes login problem, becomes access issue.
1323
00:57:17,240 –> 00:57:19,760
And now routing quality collapses across every agent.
1324
00:57:19,760 –> 00:57:21,680
The user doesn’t care which team built it.
1325
00:57:21,680 –> 00:57:23,840
They care that the system behaves consistently.
1326
00:57:23,840 –> 00:57:27,320
So make intent to shared asset make tools reusable and make publishing gated.
1327
00:57:27,320 –> 00:57:29,320
That’s how you get scale without panic.
1328
00:57:29,320 –> 00:57:31,840
Because the platform will not stop you from creating entropy.
1329
00:57:31,840 –> 00:57:35,360
You have to enter agent ID identity for nonhumans.
1330
00:57:35,360 –> 00:57:40,720
Once agents sprawl becomes visible, most organizations reach for governance as paperwork, a spreadsheet,
1331
00:57:40,720 –> 00:57:44,200
a review meeting, a naming convention and a promise to be careful.
1332
00:57:44,200 –> 00:57:45,360
That is not governance.
1333
00:57:45,360 –> 00:57:47,080
That’s documentation of drift.
1334
00:57:47,080 –> 00:57:51,280
The only governance that holds in an enterprise is enforced identity because identity is the
1335
00:57:51,280 –> 00:57:56,120
anchor point for permissions, conditional access, audit, trails and incident response.
1336
00:57:56,120 –> 00:58:00,880
Without it, your agent ecosystem is just anonymous tool chains acting on real systems.
1337
00:58:00,880 –> 00:58:03,640
So enter agent ID matters for one blunt reason.
1338
00:58:03,640 –> 00:58:06,280
Agents become actors, actors need identities.
1339
00:58:06,280 –> 00:58:10,200
In intra terms, an agent identity is the non-human principle that represents the agent
1340
00:58:10,200 –> 00:58:12,480
when it touches data or invokes tools.
1341
00:58:12,480 –> 00:58:15,560
It’s the thing that answers the question auditors always ask.
1342
00:58:15,560 –> 00:58:17,560
And your team’s always struggle to answer.
1343
00:58:17,560 –> 00:58:18,560
Who did what?
1344
00:58:18,560 –> 00:58:21,960
When using what permissions and under which policy controls?
1345
00:58:21,960 –> 00:58:24,200
If the answer is the agent did it, you don’t have an answer.
1346
00:58:24,200 –> 00:58:25,200
You have a story.
1347
00:58:25,200 –> 00:58:27,560
Entra agent ID turns that story into a record.
1348
00:58:27,560 –> 00:58:30,320
Now here’s the part most organizations miss.
1349
00:58:30,320 –> 00:58:33,160
Agent ID isn’t about whether a user can chat with the agent.
1350
00:58:33,160 –> 00:58:35,440
That surface access.
1351
00:58:35,440 –> 00:58:38,600
Agent ID is about what the agent can do once it’s engaged.
1352
00:58:38,600 –> 00:58:43,120
What data it can read, what systems it can call and what actions it can execute.
1353
00:58:43,120 –> 00:58:47,280
That distinction matters because most early rollouts over focus on user access lists and
1354
00:58:47,280 –> 00:58:49,560
under focus on agent capability boundaries.
1355
00:58:49,560 –> 00:58:52,120
Over time, user access stays roughly stable.
1356
00:58:52,120 –> 00:58:54,000
Agent capabilities always expand.
1357
00:58:54,000 –> 00:58:57,680
They expand because someone adds just one more connector, just one more tool, just one
1358
00:58:57,680 –> 00:59:00,560
more right action and those exceptions accumulate.
1359
00:59:00,560 –> 00:59:02,200
Entropy always wins through exceptions.
1360
00:59:02,200 –> 00:59:06,000
So the first architectural law for agent ID is least privileged by default.
1361
00:59:06,000 –> 00:59:07,920
Not least privileged for users.
1362
00:59:07,920 –> 00:59:09,680
Least privileged for the agent as an actor.
1363
00:59:09,680 –> 00:59:13,720
That means the agent identity gets only the permissions needed for its defined outcome
1364
00:59:13,720 –> 00:59:16,600
in its defined scope, in its defined environment.
1365
00:59:16,600 –> 00:59:20,720
If the agent triages it issues, it does not need permissions to create accounts.
1366
00:59:20,720 –> 00:59:24,200
If it answers policy questions, it does not need permissions to grant access.
1367
00:59:24,200 –> 00:59:29,040
If it can read device compliance state, it does not need the ability to update device configuration.
1368
00:59:29,040 –> 00:59:33,200
Read and write must stay separate identities or at least separate permission sets because
1369
00:59:33,200 –> 00:59:36,200
write permissions are how experiments become incidents.
1370
00:59:36,200 –> 00:59:39,120
Next, conditional access becomes your control surface.
1371
00:59:39,120 –> 00:59:42,000
Most people treat conditional access as user security.
1372
00:59:42,000 –> 00:59:46,440
In architectural terms, conditional access is an execution policy layer for identities.
1373
00:59:46,440 –> 00:59:51,480
If the agent identity exists, you can constrain when and where that identity can be used.
1374
00:59:51,480 –> 00:59:56,280
To comply and devices, restrict to trusted locations, restrict session behavior,
1375
00:59:56,280 –> 00:59:59,040
and block high-risk sign-in conditions.
1376
00:59:59,040 –> 01:00:02,200
This is how you stop toolchains from becoming shadow admins.
1377
01:00:02,200 –> 01:00:04,920
And you don’t wait until after the first scary incident.
1378
01:00:04,920 –> 01:00:08,080
You design the baseline policy the same way you design the agent.
1379
01:00:08,080 –> 01:00:10,080
Start restrictive then loosen with evidence.
1380
01:00:10,080 –> 01:00:14,040
Now the thing leadership actually cares about isn’t the identity object, it’s the audit
1381
01:00:14,040 –> 01:00:15,040
anchor.
1382
01:00:15,040 –> 01:00:19,120
Agent ID gives you a stable principle that shows up in logs across the stack.
1383
01:00:19,120 –> 01:00:24,720
A sign-in activity where applicable, power platform activity, tool invocation, approvals,
1384
01:00:24,720 –> 01:00:27,360
ticket creation, and downstream system changes.
1385
01:00:27,360 –> 01:00:31,160
When something goes wrong, you can trace the chain without that you’re debugging a ghost.
1386
01:00:31,160 –> 01:00:33,920
And this is where the escalation model becomes real.
1387
01:00:33,920 –> 01:00:37,240
Privileged actions require stronger boundaries than normal actions.
1388
01:00:37,240 –> 01:00:39,840
So your orchestration needs a privileged tier model.
1389
01:00:39,840 –> 01:00:42,640
Standard actions run under the base agent identity.
1390
01:00:42,640 –> 01:00:47,600
Privileged actions either require step-up authentication, explicit approvals, or a separate
1391
01:00:47,600 –> 01:00:52,320
privileged agent identity that can only be used through an approval workflow.
1392
01:00:52,320 –> 01:00:56,840
You don’t let the same agent casually move from reading a runbook to changing access
1393
01:00:56,840 –> 01:00:57,840
controls.
1394
01:00:57,840 –> 01:00:59,160
That’s not autonomy.
1395
01:00:59,160 –> 01:01:01,920
That’s uncontrolled privilege drift.
1396
01:01:01,920 –> 01:01:06,120
Finally, agent ID is how you make ownership enforceable.
1397
01:01:06,120 –> 01:01:09,680
Earlier the system required every agent to have an owner and a sponsor.
1398
01:01:09,680 –> 01:01:12,440
Agent ID ties that to an identity life cycle.
1399
01:01:12,440 –> 01:01:16,840
When the owner changes roles, the sponsor changes, or the agent gets deprecated, you can
1400
01:01:16,840 –> 01:01:22,320
change access, rotate secrets, revoke permissions, and retire the identity.
1401
01:01:22,320 –> 01:01:26,360
That’s what stops ghost agents from retaining power after everyone forgot they exist.
1402
01:01:26,360 –> 01:01:29,800
So the simple version is “entra agent ID” makes agents accountable.
1403
01:01:29,800 –> 01:01:33,280
And an enterprise system’s accountability is the only thing that scales.
1404
01:01:33,280 –> 01:01:37,200
Because as soon as non-humans can act, you’re no longer managing chatbots, you’re managing
1405
01:01:37,200 –> 01:01:39,120
identities with tools.
1406
01:01:39,120 –> 01:01:40,920
Governance that doesn’t kill innovation.
1407
01:01:40,920 –> 01:01:45,000
Most leaders hear governance and picture a committee that meets monthly to approve nothing.
1408
01:01:45,000 –> 01:01:46,000
That’s not governance.
1409
01:01:46,000 –> 01:01:48,280
It’s delay disguised as control.
1410
01:01:48,280 –> 01:01:52,080
Real governance is a design constraint that lets building happen fast while keeping the
1411
01:01:52,080 –> 01:01:53,360
blast radius small.
1412
01:01:53,360 –> 01:01:58,000
So the program starts with a posture that feels restrictive but actually accelerates delivery.
1413
01:01:58,000 –> 01:02:01,000
Retrieval only first, then control dried operations.
1414
01:02:01,000 –> 01:02:03,480
Retrieval only agents are where innovation should be cheap.
1415
01:02:03,480 –> 01:02:07,040
They read approved knowledge, site sources, and escalate when they can’t.
1416
01:02:07,040 –> 01:02:11,360
That gives you immediate deflection and reduces human workload without risking state changes
1417
01:02:11,360 –> 01:02:12,880
in downstream systems.
1418
01:02:12,880 –> 01:02:15,960
And it trains your organization on how to build with discipline.
1419
01:02:15,960 –> 01:02:20,480
And ownership, metadata, evaluation sets, and no source, no answer.
1420
01:02:20,480 –> 01:02:23,320
If you can’t govern retrieval, you have no business governing actions.
1421
01:02:23,320 –> 01:02:27,720
Then you introduce right operations as a gated capability, not a default feature.
1422
01:02:27,720 –> 01:02:31,520
Create ticket, update ticket, trigger and approval, send a templated email.
1423
01:02:31,520 –> 01:02:34,640
These are controlled actions with audit logs and rollbacks.
1424
01:02:34,640 –> 01:02:39,560
Anything beyond that, privilege changes, access grants, deletions, lives behind higher gates
1425
01:02:39,560 –> 01:02:41,200
or separate identities.
1426
01:02:41,200 –> 01:02:44,440
Autonomy expands only when observability proves it’s safe.
1427
01:02:44,440 –> 01:02:47,400
Now the scaling mechanism is environment strategy.
1428
01:02:47,400 –> 01:02:51,560
Not because power platform needs more environments for fun, but because environments are your safety
1429
01:02:51,560 –> 01:02:52,560
lanes.
1430
01:02:52,560 –> 01:02:54,440
You need three lanes and they are not optional.
1431
01:02:54,440 –> 01:02:56,000
Lane one is personal productivity.
1432
01:02:56,000 –> 01:02:58,040
This is where experimentation lives.
1433
01:02:58,040 –> 01:02:59,400
People can build, test and learn.
1434
01:02:59,400 –> 01:03:00,760
They can’t publish broadly.
1435
01:03:00,760 –> 01:03:02,080
Tool access is minimal.
1436
01:03:02,080 –> 01:03:03,360
Knowledge access is constrained.
1437
01:03:03,360 –> 01:03:06,200
The purpose is skill building without enterprise liability.
1438
01:03:06,200 –> 01:03:07,400
Lane two is departmental.
1439
01:03:07,400 –> 01:03:11,440
This is where teams solve real workflow problems with a bounded audience.
1440
01:03:11,440 –> 01:03:13,520
Publishing is limited to the department or group.
1441
01:03:13,520 –> 01:03:17,120
Tool access expands, but only to approved connectors and MCP tools.
1442
01:03:17,120 –> 01:03:19,840
This is where you prove adoption without creating global sprawl.
1443
01:03:19,840 –> 01:03:21,320
Lane three is business critical.
1444
01:03:21,320 –> 01:03:22,320
This is production.
1445
01:03:22,320 –> 01:03:23,320
ALM exists.
1446
01:03:23,320 –> 01:03:24,320
Dev test.
1447
01:03:24,320 –> 01:03:25,320
Prod separation exists.
1448
01:03:25,320 –> 01:03:26,320
Owners exist.
1449
01:03:26,320 –> 01:03:27,320
Sponsors exist.
1450
01:03:27,320 –> 01:03:28,320
Audit logging is on.
1451
01:03:28,320 –> 01:03:29,320
Tool allow lists are enforced.
1452
01:03:29,320 –> 01:03:33,560
And if an agent touches sensitive data or performs right actions, it passes review gates
1453
01:03:33,560 –> 01:03:36,000
before anyone outside the build team can use it.
1454
01:03:36,000 –> 01:03:40,960
That lane structure is how you scale without turning every cool idea into enterprise risk.
1455
01:03:40,960 –> 01:03:45,160
Now for the part that makes or breaks governance, data boundary enforcement, DLP and connector
1456
01:03:45,160 –> 01:03:46,760
policies aren’t paperwork.
1457
01:03:46,760 –> 01:03:52,720
They’re the only thing standing between useful automation and accidental data exfiltration.
1458
01:03:52,720 –> 01:03:57,040
If a builder can casually connect an agent to a personal mailbox, a consumer storage
1459
01:03:57,040 –> 01:04:01,800
service and a high trust internal system in the same flow, you didn’t build an agent platform,
1460
01:04:01,800 –> 01:04:05,160
you built a leakage machine so you define connector boundaries by lane.
1461
01:04:05,160 –> 01:04:08,760
In personal productivity, block out bound and high risk connectors.
1462
01:04:08,760 –> 01:04:14,000
And departmental allow a curated set in business critical allow what’s needed, but only through
1463
01:04:14,000 –> 01:04:18,520
approved tool surfaces with identities you can audit cross boundary data movement becomes
1464
01:04:18,520 –> 01:04:23,120
a design decision not a maker decision human in the loop also needs to be used correctly
1465
01:04:23,120 –> 01:04:24,120
here.
1466
01:04:24,120 –> 01:04:26,240
The governance mistake is to force approvals everywhere.
1467
01:04:26,240 –> 01:04:28,000
That’s how you kill adoption.
1468
01:04:28,000 –> 01:04:32,280
The correct placement is still the same decision boundaries with risk approvals for right
1469
01:04:32,280 –> 01:04:37,160
actions above a threshold approvals for privilege approvals for external communications
1470
01:04:37,160 –> 01:04:38,840
everything else runs.
1471
01:04:38,840 –> 01:04:42,360
And this is the principle that stops chaos without stopping builders.
1472
01:04:42,360 –> 01:04:45,520
You can build, but you can’t publish that posture sounds cynical.
1473
01:04:45,520 –> 01:04:47,200
It is, it’s also true.
1474
01:04:47,200 –> 01:04:48,840
Builders don’t need permission to explore.
1475
01:04:48,840 –> 01:04:53,360
They need a path to graduate their work into production without bypassing controls.
1476
01:04:53,360 –> 01:04:56,320
So the governance model gives them a clear graduation path.
1477
01:04:56,320 –> 01:04:57,720
Prove rooting stability.
1478
01:04:57,720 –> 01:04:59,040
Prove grounded accuracy.
1479
01:04:59,040 –> 01:05:00,040
Prove tool discipline.
1480
01:05:00,040 –> 01:05:01,040
Prove logging.
1481
01:05:01,040 –> 01:05:02,040
Assign an owner.
1482
01:05:02,040 –> 01:05:03,040
Then publish.
1483
01:05:03,040 –> 01:05:04,800
The gates are explicit and the gates are fast.
1484
01:05:04,800 –> 01:05:07,320
If you hide the gates behind meetings, people root around them.
1485
01:05:07,320 –> 01:05:10,880
If you enforce the gates in design, people ship safely.
1486
01:05:10,880 –> 01:05:13,480
Governance that doesn’t kill innovation is simple.
1487
01:05:13,480 –> 01:05:16,880
Constraint where agents can act, constrain what they can touch and make publishing the
1488
01:05:16,880 –> 01:05:18,360
only real checkpoint.
1489
01:05:18,360 –> 01:05:22,560
Then the system can scale because control that depends on memory always erodes.
1490
01:05:22,560 –> 01:05:24,800
Control that depends on design holds.
1491
01:05:24,800 –> 01:05:25,800
Observability.
1492
01:05:25,800 –> 01:05:27,600
The flight recorder problem.
1493
01:05:27,600 –> 01:05:31,200
Governance without observability is just optimism with a meeting invite.
1494
01:05:31,200 –> 01:05:34,400
Most organizations can describe what they intended an agent to do.
1495
01:05:34,400 –> 01:05:36,720
They cannot describe what it actually did.
1496
01:05:36,720 –> 01:05:37,720
Not reliably.
1497
01:05:37,720 –> 01:05:38,800
Not at scale.
1498
01:05:38,800 –> 01:05:43,480
Not after the first incident review when legal asks for the exact sequence of actions.
1499
01:05:43,480 –> 01:05:44,960
This is the flight recorder problem.
1500
01:05:44,960 –> 01:05:47,800
In aviation, nobody debates whether they need logs after a crash.
1501
01:05:47,800 –> 01:05:51,520
In enterprise AI, teams still treat logs like a premium feature.
1502
01:05:51,520 –> 01:05:53,560
Nice later, not required now.
1503
01:05:53,560 –> 01:05:56,760
That’s how you end up with agents that can act but can’t be explained.
1504
01:05:56,760 –> 01:05:59,800
And an agent that can’t be explained is an agent you can’t defend.
1505
01:05:59,800 –> 01:06:04,480
Your observability becomes the third pillar of control alongside grounding and identity.
1506
01:06:04,480 –> 01:06:07,160
It is what makes intent enforceable over time.
1507
01:06:07,160 –> 01:06:08,360
Start with the minimum.
1508
01:06:08,360 –> 01:06:09,360
Decision logs.
1509
01:06:09,360 –> 01:06:10,920
Not chat transcripts.
1510
01:06:10,920 –> 01:06:11,920
Decision logs.
1511
01:06:11,920 –> 01:06:13,720
A transcript tells you what the user said.
1512
01:06:13,720 –> 01:06:17,800
It does not tell you why the agent chose an intent, why it called a tool, why it escalated
1513
01:06:17,800 –> 01:06:18,960
or why it refused.
1514
01:06:18,960 –> 01:06:21,400
You need a record of the agent’s decision points.
1515
01:06:21,400 –> 01:06:22,400
Classification result.
1516
01:06:22,400 –> 01:06:23,400
Confidence.
1517
01:06:23,400 –> 01:06:24,400
Retrieved sources.
1518
01:06:24,400 –> 01:06:25,400
Tool calls attempted.
1519
01:06:25,400 –> 01:06:26,400
Tool results.
1520
01:06:26,400 –> 01:06:27,880
Verification outcomes.
1521
01:06:27,880 –> 01:06:31,400
Proofers requested approvals received and the final disposition.
1522
01:06:31,400 –> 01:06:34,960
That means every meaningful step produces a log event you can query.
1523
01:06:34,960 –> 01:06:36,600
Not an export you have to dig up later.
1524
01:06:36,600 –> 01:06:39,040
A queryable record.
1525
01:06:39,040 –> 01:06:43,280
And yes, the platform gives you pieces, power platform admin views, agent registries, purview
1526
01:06:43,280 –> 01:06:46,120
DSPM for AI and defender signals.
1527
01:06:46,120 –> 01:06:50,800
But those surfaces only help if you treat observability as a design requirement, not a retrospective
1528
01:06:50,800 –> 01:06:51,800
cleanup task.
1529
01:06:51,800 –> 01:06:53,520
Audit logging is the first non-negotiable.
1530
01:06:53,520 –> 01:06:57,980
If the agent can access enterprise data or execute actions, its activity must land in
1531
01:06:57,980 –> 01:07:00,120
audit logs with a stable identity anchor.
1532
01:07:00,120 –> 01:07:02,360
This is where agent ID stops being theoretical.
1533
01:07:02,360 –> 01:07:06,200
It becomes the principle you can search for when you need to know what happened.
1534
01:07:06,200 –> 01:07:07,840
The next surface is monitoring.
1535
01:07:07,840 –> 01:07:10,000
Not just is it up, but is it behaving?
1536
01:07:10,000 –> 01:07:12,600
So the metrics that matter are not messages sent.
1537
01:07:12,600 –> 01:07:16,200
Their operational outcomes, containment rate, how often the interaction ends without a
1538
01:07:16,200 –> 01:07:20,280
human, escalation rate, how often it puns to a human, and why.
1539
01:07:20,280 –> 01:07:24,280
And then, the first thing you need to know is the data.
1540
01:07:24,280 –> 01:07:29,120
And then, the first thing you need to know is the data.
1541
01:07:29,120 –> 01:07:32,480
And then, the second thing you need to know is the data.
1542
01:07:32,480 –> 01:07:36,160
And then, the second thing you need to know is the data.
1543
01:07:36,160 –> 01:07:39,920
And then, the second thing you need to know is the data.
1544
01:07:39,920 –> 01:07:43,920
And then, the second thing you need to know is the data.
1545
01:07:43,920 –> 01:07:46,920
And then, the second thing you need to know is the data.
1546
01:07:46,920 –> 01:07:49,920
And then, the second thing you need to know is the data.
1547
01:07:49,920 –> 01:07:52,840
And then, the second thing you need to know is the data.
1548
01:07:52,840 –> 01:07:56,600
And then, the second thing you need to know is the data.
1549
01:07:56,600 –> 01:08:00,000
And then, the second thing you need to know is the data.
1550
01:08:00,000 –> 01:08:03,280
And then, the second thing you need to know is the data.
1551
01:08:03,280 –> 01:08:06,960
And then, the second thing you need to know is the data.
1552
01:08:06,960 –> 01:08:09,560
And then, the second thing you need to know is the data.
1553
01:08:09,560 –> 01:08:12,600
And then, the second thing you need to know is the data.
1554
01:08:12,600 –> 01:08:15,560
And then, the second thing you need to know is the data.
1555
01:08:15,560 –> 01:08:18,560
And then, the second thing you need to know is the data.
1556
01:08:18,560 –> 01:08:21,480
And then, the second thing you need to know is the data.
1557
01:08:21,480 –> 01:08:24,440
And then, the second thing you need to know is the data.
1558
01:08:24,440 –> 01:08:27,440
And then, the second thing you need to know is the data.
1559
01:08:27,440 –> 01:08:30,040
And then, the second thing you need to know is the data.
1560
01:08:30,040 –> 01:08:32,440
And then, the second thing you need to know is the data.
1561
01:08:32,440 –> 01:08:35,000
And then, the second thing you need to know is the data.
1562
01:08:35,000 –> 01:08:37,400
And then, the second thing you need to know is the data.
1563
01:08:37,400 –> 01:08:40,200
And then, the second thing you need to know is the data.
1564
01:08:40,200 –> 01:08:42,840
And then, the second thing you need to know is the data.
1565
01:08:42,840 –> 01:08:46,040
And then, the second thing you need to know is the data.
1566
01:08:46,040 –> 01:08:48,280
And then, the second thing you need to know is the data.
1567
01:08:48,280 –> 01:08:51,200
And then, the second thing you need to know is the data.
1568
01:08:51,200 –> 01:08:54,160
And then, the second thing you need to know is the data.
1569
01:08:54,160 –> 01:08:57,160
And then, the second thing you need to know is the data.
1570
01:08:57,160 –> 01:08:59,760
And then, the second thing you need to know is the data.
1571
01:08:59,760 –> 01:09:02,920
And then, the second thing you need to know is the data.
1572
01:09:02,920 –> 01:09:05,960
And then, the second thing you need to know is the data.
1573
01:09:05,960 –> 01:09:08,640
And then, the second thing you need to know is the data.
1574
01:09:08,640 –> 01:09:11,680
And then, the second thing you need to know is the data.
1575
01:09:11,680 –> 01:09:14,920
And then, the second thing you need to know is the data.
1576
01:09:14,920 –> 01:09:17,920
And then, the third thing you need to know is the data.
1577
01:09:17,920 –> 01:09:20,840
And then, the third thing you need to know is the data.
1578
01:09:20,840 –> 01:09:23,800
And then, the third thing you need to know is the data.
1579
01:09:23,800 –> 01:09:26,800
And then, the third thing you need to know is the data.
1580
01:09:26,800 –> 01:09:29,400
And then, the third thing you need to know is the data.
1581
01:09:29,400 –> 01:09:32,560
And then, the third thing you need to know is the data.
1582
01:09:32,560 –> 01:09:35,560
And then, the third thing you need to know is the data.
1583
01:09:35,560 –> 01:09:38,280
And then, the third thing you need to know is the data.
1584
01:09:38,280 –> 01:09:41,320
And then, the third thing you need to know is the data.
1585
01:09:41,320 –> 01:09:44,560
And then, the third thing you need to know is the data.
1586
01:09:44,560 –> 01:09:47,560
And then, the third thing you need to know is the data.
1587
01:09:47,560 –> 01:09:50,200
Turns all of this into a working enterprise system.
1588
01:09:50,200 –> 01:09:52,520
Days one, 10, build the first working system.
1589
01:09:52,520 –> 01:09:54,840
Days one through 10 aren’t for vision decks.
1590
01:09:54,840 –> 01:09:57,120
Therefore, forcing a working system into existence
1591
01:09:57,120 –> 01:09:59,800
with constraints tight enough that it can’t lie to you.
1592
01:09:59,800 –> 01:10:03,000
Days one and two, pick the use case, lock the KPI targets,
1593
01:10:03,000 –> 01:10:04,720
and draw the escalation boundary in ink.
1594
01:10:04,720 –> 01:10:07,120
You’re not selecting AI opportunities.
1595
01:10:07,120 –> 01:10:09,000
You’re selecting one operational flow,
1596
01:10:09,000 –> 01:10:12,520
where deflection and cycle time can move inside 30 days.
1597
01:10:12,520 –> 01:10:15,720
IT Ticket triage wins because it’s measurable, high volume,
1598
01:10:15,720 –> 01:10:17,800
and politically survivable.
1599
01:10:17,800 –> 01:10:20,000
Then you set targets that can’t be gamed.
1600
01:10:20,000 –> 01:10:23,040
20% to 40% deflection for the chosen intake channel,
1601
01:10:23,040 –> 01:10:26,120
15% to 30% SLA reduction for the subset of tickets
1602
01:10:26,120 –> 01:10:29,680
the agent touches and 10% to 25% fewer escalations
1603
01:10:29,680 –> 01:10:31,320
caused by mis-routing.
1604
01:10:31,320 –> 01:10:34,440
You also define the hard line, what the agent must never do.
1605
01:10:34,440 –> 01:10:37,160
Anything involving access grants, privilege, changes,
1606
01:10:37,160 –> 01:10:39,880
or policy exceptions escalates by design.
1607
01:10:39,880 –> 01:10:41,160
No debates later.
1608
01:10:41,160 –> 01:10:43,160
The agent doesn’t try.
1609
01:10:43,160 –> 01:10:45,080
It routes.
1610
01:10:45,080 –> 01:10:48,040
Days three through five, design the intents and topics,
1611
01:10:48,040 –> 01:10:50,120
then implement the fail-safe behaviors.
1612
01:10:50,120 –> 01:10:52,440
This is the week where topics brawl either starts
1613
01:10:52,440 –> 01:10:53,680
or gets prevented.
1614
01:10:53,680 –> 01:10:56,280
You create 10 to 15 intents, not departments,
1615
01:10:56,280 –> 01:10:59,120
not everything users might ask, stable intents
1616
01:10:59,120 –> 01:11:01,040
that map to a containment boundary.
1617
01:11:01,040 –> 01:11:03,160
For each intent, you write three things.
1618
01:11:03,160 –> 01:11:06,320
Success criteria, allowed actions, and escalation triggers.
1619
01:11:06,320 –> 01:11:08,640
If you can’t express those in one screen of text,
1620
01:11:08,640 –> 01:11:11,400
the intent is too broad, then you implement fallback.
1621
01:11:11,400 –> 01:11:12,480
One fallback.
1622
01:11:12,480 –> 01:11:14,520
Fallback is not be helpful.
1623
01:11:14,520 –> 01:11:16,840
Fallback is controlled uncertainty.
1624
01:11:16,840 –> 01:11:19,720
Ask one clarifying question, then escalate.
1625
01:11:19,720 –> 01:11:21,760
If your fallback tries to answer anyway,
1626
01:11:21,760 –> 01:11:23,960
you’re building confident wrongness into the core.
1627
01:11:23,960 –> 01:11:26,440
And you set kill criteria now, not after the pilot
1628
01:11:26,440 –> 01:11:29,720
embarrasses you, low usage, high confusion, high escalation
1629
01:11:29,720 –> 01:11:30,720
rate, low containment.
1630
01:11:30,720 –> 01:11:33,000
If a topic fails, it gets merged or retired.
1631
01:11:33,000 –> 01:11:34,320
Dead topics aren’t harmless.
1632
01:11:34,320 –> 01:11:36,640
They create ambiguous rooting forever.
1633
01:11:36,640 –> 01:11:38,920
Days six through eight, implement orchestration
1634
01:11:38,920 –> 01:11:42,120
and enrichment, connect ITSM with deterministic automation.
1635
01:11:42,120 –> 01:11:44,400
This is where you build the actual loop, classify,
1636
01:11:44,400 –> 01:11:48,760
enrich, retrieve, propose, confirm, execute, verify,
1637
01:11:48,760 –> 01:11:49,640
handoff.
1638
01:11:49,640 –> 01:11:51,640
The orchestration lives in co-pilot studio
1639
01:11:51,640 –> 01:11:54,800
because you need a control plane, not just a chat interface.
1640
01:11:54,800 –> 01:11:57,800
The enrichment is minimal, identity, device state,
1641
01:11:57,800 –> 01:12:01,000
if relevant, recent incidents, and service status.
1642
01:12:01,000 –> 01:12:02,000
Don’t horde context.
1643
01:12:02,000 –> 01:12:05,040
Context hoarding becomes privacy risk and retrieval confusion.
1644
01:12:05,040 –> 01:12:07,680
Then you connect your ITSM system with power automate,
1645
01:12:07,680 –> 01:12:10,240
not because it’s pretty, but because it’s deterministic.
1646
01:12:10,240 –> 01:12:12,760
Ticket creation, assignment, notifications,
1647
01:12:12,760 –> 01:12:14,720
and logging belong in a workflow engine,
1648
01:12:14,720 –> 01:12:16,520
not in an LLM’s improvisation.
1649
01:12:16,520 –> 01:12:19,520
At this stage, tool discipline matters more than cleverness.
1650
01:12:19,520 –> 01:12:21,080
Start read only where possible.
1651
01:12:21,080 –> 01:12:23,760
Check service health, list known incidents,
1652
01:12:23,760 –> 01:12:25,320
pull user ticket history.
1653
01:12:25,320 –> 01:12:28,480
If you must write, keep it reversible, create a ticket,
1654
01:12:28,480 –> 01:12:32,800
add a note, post an update, anything privileged waits.
1655
01:12:32,800 –> 01:12:36,280
Days nine and ten, pilot users create an evaluation set
1656
01:12:36,280 –> 01:12:38,400
and baseline containment and accuracy.
1657
01:12:38,400 –> 01:12:41,280
Pick a small pilot group that represents real usage,
1658
01:12:41,280 –> 01:12:42,280
not enthusiasts.
1659
01:12:42,280 –> 01:12:44,280
You want normal people with normal impatience,
1660
01:12:44,280 –> 01:12:47,320
give them one clear instruction, use this for these issues
1661
01:12:47,320 –> 01:12:49,200
and if it escalates, that’s expected.
1662
01:12:49,200 –> 01:12:52,160
Now build the evaluation set, this is not test cases.
1663
01:12:52,160 –> 01:12:53,960
It’s a fixed list of the top questions
1664
01:12:53,960 –> 01:12:55,920
and intends the agent must handle.
1665
01:12:55,920 –> 01:12:59,360
VPN access, password resets, device compliance questions,
1666
01:12:59,360 –> 01:13:02,920
known outages, ticket status, and basic how-to procedures.
1667
01:13:02,920 –> 01:13:05,320
You run the same set every week to see drift
1668
01:13:05,320 –> 01:13:07,400
and you measure three numbers immediately,
1669
01:13:07,400 –> 01:13:09,840
containment rate, escalation reasons,
1670
01:13:09,840 –> 01:13:12,640
and grounded accuracy for anything that produced an answer
1671
01:13:12,640 –> 01:13:13,480
with a source.
1672
01:13:13,480 –> 01:13:15,320
If you don’t have sources yet, you still measure
1673
01:13:15,320 –> 01:13:18,120
refusal correctness, did it escalate when it should?
1674
01:13:18,120 –> 01:13:19,400
Now the gate to proceed.
1675
01:13:19,400 –> 01:13:21,600
You do not advance to days 11 through 20
1676
01:13:21,600 –> 01:13:23,200
because the team feels good.
1677
01:13:23,200 –> 01:13:26,240
You advance because the system meets three conditions.
1678
01:13:26,240 –> 01:13:29,760
Measureable lift against the baseline, no access violations,
1679
01:13:29,760 –> 01:13:32,080
and logging turned on with traceable outcomes.
1680
01:13:32,080 –> 01:13:35,480
Lift means the pilot produced a real reduction in human touches
1681
01:13:35,480 –> 01:13:37,680
for the selected intents, even if it’s small.
1682
01:13:37,680 –> 01:13:40,200
No access violations means the agent didn’t retrieve
1683
01:13:40,200 –> 01:13:42,880
restricted content or execute actions outside boundary.
1684
01:13:42,880 –> 01:13:45,040
Logging means you can explain every escalation
1685
01:13:45,040 –> 01:13:45,960
and every tool call.
1686
01:13:45,960 –> 01:13:48,200
If you can’t pass that gate in 10 days,
1687
01:13:48,200 –> 01:13:50,120
the program doesn’t need more features.
1688
01:13:50,120 –> 01:13:52,680
It needs tighter scope because the first 10 days
1689
01:13:52,680 –> 01:13:55,000
are about proving the system boundary holds.
1690
01:13:55,000 –> 01:13:57,240
Once it does, you’re allowed to solve the next problem,
1691
01:13:57,240 –> 01:13:58,360
trust at scale.
1692
01:13:58,360 –> 01:14:01,040
And trust only comes from grounding plus tool discipline,
1693
01:14:01,040 –> 01:14:02,360
enforced relentlessly.
1694
01:14:02,360 –> 01:14:06,000
Days 11 to 20, ground stabilize and reduce entropy.
1695
01:14:06,000 –> 01:14:09,520
Days 11 through 20 are where most programs either become a system
1696
01:14:09,520 –> 01:14:12,600
or become a clever demo that everyone quietly stops using.
1697
01:14:12,600 –> 01:14:15,200
Week 2 proved you can root and contain within a boundary.
1698
01:14:15,200 –> 01:14:17,320
Now you have to make the answers defensible,
1699
01:14:17,320 –> 01:14:20,480
the tools predictable, and the failure modes measurable.
1700
01:14:20,480 –> 01:14:23,800
This is the phase where entropy shows up as small exceptions.
1701
01:14:23,800 –> 01:14:26,440
And small exceptions are how agent programs die.
1702
01:14:26,440 –> 01:14:29,800
Days 11 through 13, build the Azure AI search index
1703
01:14:29,800 –> 01:14:32,200
and make the chunking strategy non-negotiable.
1704
01:14:32,200 –> 01:14:33,760
You’re not connecting SharePoint,
1705
01:14:33,760 –> 01:14:35,920
you’re building an operational knowledge index,
1706
01:14:35,920 –> 01:14:38,360
so you start by choosing what qualifies as truth.
1707
01:14:38,360 –> 01:14:41,440
Approved runbooks, SOPs, known issue articles and policies
1708
01:14:41,440 –> 01:14:43,760
with owners, if it’s a draft it doesn’t go in.
1709
01:14:43,760 –> 01:14:45,320
If it’s ownerless it doesn’t go in.
1710
01:14:45,320 –> 01:14:47,920
If it changes without change control it doesn’t go in.
1711
01:14:47,920 –> 01:14:49,880
Then chunking, don’t overthink it.
1712
01:14:49,880 –> 01:14:52,440
The unit of retrieval is the decision unit.
1713
01:14:52,440 –> 01:14:55,760
One procedure, one exception clause, one policy section.
1714
01:14:55,760 –> 01:14:57,760
If your chunk contains multiple outcomes,
1715
01:14:57,760 –> 01:14:59,960
you’ve created ambiguity on purpose.
1716
01:14:59,960 –> 01:15:02,760
Metadata follows, service, audience, region,
1717
01:15:02,760 –> 01:15:04,720
risk tier and last reviewed date.
1718
01:15:04,720 –> 01:15:06,920
Make metadata mandatory in the content pipeline
1719
01:15:06,920 –> 01:15:08,600
because optional metadata is metadata
1720
01:15:08,600 –> 01:15:10,200
that won’t exist where you need it.
1721
01:15:10,200 –> 01:15:12,280
Finally, design refresh cadence like you mean it.
1722
01:15:12,280 –> 01:15:15,520
Policies refresh on publish, known issues refresh frequently,
1723
01:15:15,520 –> 01:15:19,280
runbooks refresh on change, and you keep last indexed.
1724
01:15:19,280 –> 01:15:21,680
Visible in telemetry, because stale answers
1725
01:15:21,680 –> 01:15:24,200
and hallucinations look identical to the user.
1726
01:15:24,200 –> 01:15:28,080
Days 14 through 16 integrate MCP tools in read-only mode
1727
01:15:28,080 –> 01:15:30,560
and enforce no source, no answer.
1728
01:15:30,560 –> 01:15:32,680
This is where programs get tempted to move fast
1729
01:15:32,680 –> 01:15:34,760
by making the agent do things.
1730
01:15:34,760 –> 01:15:36,560
Don’t, not yet.
1731
01:15:36,560 –> 01:15:40,080
Start with MCP tools that only read check service health,
1732
01:15:40,080 –> 01:15:42,880
list known incidents, look up a user’s open tickets,
1733
01:15:42,880 –> 01:15:45,920
retrieve a service catalog entry, pull a ticket template.
1734
01:15:45,920 –> 01:15:47,320
This makes the agent more accurate
1735
01:15:47,320 –> 01:15:49,200
without letting it change state.
1736
01:15:49,200 –> 01:15:52,040
And it gives you tool telemetry without operational risk.
1737
01:15:52,040 –> 01:15:54,520
Then you enforce the grounding rule with teeth.
1738
01:15:54,520 –> 01:15:57,520
If the response is non-trivial and there’s no retrieved source
1739
01:15:57,520 –> 01:15:59,280
it refuses and escalates.
1740
01:15:59,280 –> 01:16:01,880
Not a friendly guess, not based on my knowledge.
1741
01:16:01,880 –> 01:16:04,720
A controlled refusal with an escalation path.
1742
01:16:04,720 –> 01:16:06,760
This is also where you explicitly separate knowledge
1743
01:16:06,760 –> 01:16:07,440
from action.
1744
01:16:07,440 –> 01:16:09,480
The agent can answer from grounded content.
1745
01:16:09,480 –> 01:16:12,080
The agent can propose an action, but execution only happens
1746
01:16:12,080 –> 01:16:15,160
through a tool call and only after you hit a decision boundary.
1747
01:16:15,160 –> 01:16:17,480
That separation is what stops healthfulness
1748
01:16:17,480 –> 01:16:20,440
from becoming unauthorized change.
1749
01:16:20,440 –> 01:16:23,160
Days 17 and 18 add approvals for write actions
1750
01:16:23,160 –> 01:16:24,960
and introduce adaptive cards.
1751
01:16:24,960 –> 01:16:26,560
Now you’re allowed to add write tools
1752
01:16:26,560 –> 01:16:28,400
but only inside a governed pattern.
1753
01:16:28,400 –> 01:16:31,320
Propose, confirm, approve, execute, log, pick one
1754
01:16:31,320 –> 01:16:34,080
or two write actions that are low risk and reversible.
1755
01:16:34,080 –> 01:16:35,600
Ticket creation is the obvious one.
1756
01:16:35,600 –> 01:16:37,560
Updating a ticket with context is another.
1757
01:16:37,560 –> 01:16:41,280
Anything involving identity, access, finance or deletion stays out.
1758
01:16:41,280 –> 01:16:43,560
Adaptive cards and teams become the approval surface
1759
01:16:43,560 –> 01:16:45,200
because they force clarity.
1760
01:16:45,200 –> 01:16:47,840
The approver sees who requested, what will change, why,
1761
01:16:47,840 –> 01:16:49,040
and which policy applies.
1762
01:16:49,040 –> 01:16:51,520
They click approve, deny or request more info,
1763
01:16:51,520 –> 01:16:53,640
no paragraph debates, no hidden side channels.
1764
01:16:53,640 –> 01:16:55,960
And the card isn’t just UX, it’s control.
1765
01:16:55,960 –> 01:16:58,600
The approval decision triggers a deterministic workflow,
1766
01:16:58,600 –> 01:17:01,240
records the payload, stamps who approved,
1767
01:17:01,240 –> 01:17:03,520
and then executes through the approved two path.
1768
01:17:03,520 –> 01:17:05,160
If you can’t show the approval record,
1769
01:17:05,160 –> 01:17:07,320
you didn’t approve anything, you just delayed it.
1770
01:17:07,320 –> 01:17:08,840
Days 19 and 20.
1771
01:17:08,840 –> 01:17:11,920
Red team prompts, injection tests and tighten fallbacks.
1772
01:17:11,920 –> 01:17:14,240
This is where you stop pretending users behave nicely.
1773
01:17:14,240 –> 01:17:16,360
You test prompt injection, you test instructions
1774
01:17:16,360 –> 01:17:17,760
that try to override policy,
1775
01:17:17,760 –> 01:17:20,040
you test ignore previous instructions patterns,
1776
01:17:20,040 –> 01:17:22,560
you test misleading inputs designed to force retrieval
1777
01:17:22,560 –> 01:17:23,680
of restricted content,
1778
01:17:23,680 –> 01:17:25,360
you test social engineering prompts
1779
01:17:25,360 –> 01:17:27,120
that try to get the agent to generate an action
1780
01:17:27,120 –> 01:17:28,200
it should never take.
1781
01:17:28,200 –> 01:17:30,320
And you don’t treat failures as model problems,
1782
01:17:30,320 –> 01:17:32,080
you treat them as boundary problems.
1783
01:17:32,080 –> 01:17:33,880
If the agent retrieved the wrong content,
1784
01:17:33,880 –> 01:17:35,440
fixed chunking and metadata,
1785
01:17:35,440 –> 01:17:37,280
if it answered without sources,
1786
01:17:37,280 –> 01:17:38,680
tighten the response policy.
1787
01:17:38,680 –> 01:17:40,000
If it tried to call a write tool
1788
01:17:40,000 –> 01:17:41,880
when it shouldn’t, tighten orchestration.
1789
01:17:41,880 –> 01:17:43,920
If it got stuck in two retry loops,
1790
01:17:43,920 –> 01:17:46,600
add hard stop conditions and escalation triggers.
1791
01:17:46,600 –> 01:17:49,120
Now the gate to proceed in today’s 21 through 30
1792
01:17:49,120 –> 01:17:52,080
is strict because this is where scale becomes liability.
1793
01:17:52,080 –> 01:17:54,640
You proceed only when three conditions are true.
1794
01:17:54,640 –> 01:17:58,160
Grounded accuracy in your evaluator set is above 85%,
1795
01:17:58,160 –> 01:18:00,520
rooting stability holds under real usage,
1796
01:18:00,520 –> 01:18:03,320
and escalation rates are trending down for the right reasons.
1797
01:18:03,320 –> 01:18:05,040
Not because the agent refuses everything,
1798
01:18:05,040 –> 01:18:06,400
because it retrieves correctly,
1799
01:18:06,400 –> 01:18:08,960
sites correctly and escalates only at real boundaries.
1800
01:18:08,960 –> 01:18:10,400
Once you pass that gate,
1801
01:18:10,400 –> 01:18:13,600
you’re ready for the final phase, scale without panic.
1802
01:18:13,600 –> 01:18:16,040
That means identity alignment, publishing discipline
1803
01:18:16,040 –> 01:18:18,280
and life cycle controls that stop the ecosystem
1804
01:18:18,280 –> 01:18:20,400
from rotting the moment it grows.
1805
01:18:20,400 –> 01:18:22,840
Days 21 30 scale to a workforce
1806
01:18:22,840 –> 01:18:24,480
without creating a liability.
1807
01:18:24,480 –> 01:18:28,120
Days 21 through 30 are where leadership usually ruins the win.
1808
01:18:28,880 –> 01:18:31,400
Week one and two produced a working system boundary,
1809
01:18:31,400 –> 01:18:33,600
week three proved grounding and approvals.
1810
01:18:33,600 –> 01:18:36,960
Now leadership sees momentum and asks for more agents
1811
01:18:36,960 –> 01:18:39,520
across more domains, more connectors, more autonomy,
1812
01:18:39,520 –> 01:18:40,320
more channels.
1813
01:18:40,320 –> 01:18:43,160
That request is how liability gets invited into production.
1814
01:18:43,160 –> 01:18:45,520
So this final phase is not build more.
1815
01:18:45,520 –> 01:18:47,400
It’s make the first one survivable,
1816
01:18:47,400 –> 01:18:49,200
then expand with discipline.
1817
01:18:49,200 –> 01:18:50,960
Days 21 through 23.
1818
01:18:50,960 –> 01:18:53,960
Align the agent with enterprise identity and least privilege.
1819
01:18:53,960 –> 01:18:56,200
This is where entry agent ID stops being conceptual
1820
01:18:56,200 –> 01:18:57,360
and becomes a dependency.
1821
01:18:57,360 –> 01:18:59,520
The agent needs a stable identity anchor
1822
01:18:59,520 –> 01:19:02,320
for audit, conditional access and incident response.
1823
01:19:02,320 –> 01:19:04,760
That identity also forces the uncomfortable question.
1824
01:19:04,760 –> 01:19:07,040
What permissions does this agent actually need?
1825
01:19:07,040 –> 01:19:09,240
And what permissions did it accidentally inherit
1826
01:19:09,240 –> 01:19:11,240
because someone clicked allow all?
1827
01:19:11,240 –> 01:19:13,200
So you do a least privilege review as a gate,
1828
01:19:13,200 –> 01:19:14,280
not as a cleanup task.
1829
01:19:14,280 –> 01:19:17,320
You separate read capability from write capability.
1830
01:19:17,320 –> 01:19:19,800
You keep write actions behind approvals.
1831
01:19:19,800 –> 01:19:21,920
And if the agent touches sensitive systems,
1832
01:19:21,920 –> 01:19:23,720
you apply conditional access constraints
1833
01:19:23,720 –> 01:19:27,280
that match reality, restrict where the identity can be used,
1834
01:19:27,280 –> 01:19:31,320
restrict session behavior and block high-risk sign-in conditions.
1835
01:19:31,320 –> 01:19:33,280
This is where you prevent the classic failure.
1836
01:19:33,280 –> 01:19:35,520
A helpful agent becomes a privileged actor
1837
01:19:35,520 –> 01:19:37,960
without anyone explicitly deciding it should.
1838
01:19:37,960 –> 01:19:39,880
Days 24 through 26.
1839
01:19:39,880 –> 01:19:42,040
Production ALM and controlled publishing.
1840
01:19:42,040 –> 01:19:43,400
If the agent is business critical,
1841
01:19:43,400 –> 01:19:45,000
it doesn’t ship like a maker project.
1842
01:19:45,000 –> 01:19:47,560
You enforce dev, test and proc separation
1843
01:19:47,560 –> 01:19:50,440
so you can change behavior without gambling in production.
1844
01:19:50,440 –> 01:19:52,120
You publish through a controlled path,
1845
01:19:52,120 –> 01:19:54,840
so builders can’t accidentally make something global.
1846
01:19:54,840 –> 01:19:57,000
You lock down who can update the production agent
1847
01:19:57,000 –> 01:19:58,640
and you keep rollback real.
1848
01:19:58,640 –> 01:20:00,960
If containment drops or escalation spike,
1849
01:20:00,960 –> 01:20:02,440
you revert and investigate.
1850
01:20:02,440 –> 01:20:05,520
No hero debugging, no late night prompt edits in prod.
1851
01:20:05,520 –> 01:20:07,640
And you formalize the, you can build,
1852
01:20:07,640 –> 01:20:10,880
but you can’t publish posture into a workflow.
1853
01:20:10,880 –> 01:20:12,480
Publishing requires a named owner,
1854
01:20:12,480 –> 01:20:14,840
named sponsor, tool-allow-list confirmation,
1855
01:20:14,840 –> 01:20:16,400
grounding policy confirmation,
1856
01:20:16,400 –> 01:20:18,360
and audit logging verification.
1857
01:20:18,360 –> 01:20:19,760
Not to slow teams down,
1858
01:20:19,760 –> 01:20:23,280
to keep the system deterministic as its scales.
1859
01:20:23,280 –> 01:20:26,200
Days 27 and 28, roll out to a target group
1860
01:20:26,200 –> 01:20:28,720
with adoption tied to workflows, not training theatre.
1861
01:20:28,720 –> 01:20:30,800
This is where most orgs waste time.
1862
01:20:30,800 –> 01:20:33,560
They run a co-pilot training, teach people how to prompt,
1863
01:20:33,560 –> 01:20:35,600
and then wonder why adoption stalls.
1864
01:20:35,600 –> 01:20:38,360
People don’t adopt prompts, they adopt outcomes.
1865
01:20:38,360 –> 01:20:39,960
So you roll out by workflow,
1866
01:20:39,960 –> 01:20:41,400
you put the agent in the channel
1867
01:20:41,400 –> 01:20:43,080
where the work already starts.
1868
01:20:43,080 –> 01:20:45,920
You give users three things, what issues it handles,
1869
01:20:45,920 –> 01:20:47,120
what it will escalate,
1870
01:20:47,120 –> 01:20:50,080
and what evidence it will show when it answers, that’s it.
1871
01:20:50,080 –> 01:20:51,720
Then you measure real usage,
1872
01:20:51,720 –> 01:20:53,160
task completion without handoff,
1873
01:20:53,160 –> 01:20:55,000
containment rate and escalation reasons.
1874
01:20:55,000 –> 01:20:57,640
If adoption is low, you don’t fix it with more training.
1875
01:20:57,640 –> 01:20:59,000
You fix it by tightening, rooting,
1876
01:20:59,000 –> 01:21:00,280
shortening the answer format
1877
01:21:00,280 –> 01:21:03,200
and adding the next action buttons that remove friction.
1878
01:21:03,200 –> 01:21:04,920
Days 29 and 30,
1879
01:21:04,920 –> 01:21:08,040
executive readout in the next 90-day plan.
1880
01:21:08,040 –> 01:21:10,080
The executive readout isn’t we built an agent,
1881
01:21:10,080 –> 01:21:10,920
that get up to,
1882
01:21:10,920 –> 01:21:12,280
it’s KPI deltas,
1883
01:21:12,280 –> 01:21:14,480
risk posture and operational truth.
1884
01:21:14,480 –> 01:21:17,120
Show the baseline and the change, deflection,
1885
01:21:17,120 –> 01:21:20,160
SLA impact, escalation reduction, and time saved,
1886
01:21:20,160 –> 01:21:23,520
show grounded accuracy and the evaluator set results,
1887
01:21:23,520 –> 01:21:24,920
show audit readiness,
1888
01:21:24,920 –> 01:21:27,080
logging enabled, tool usage tracked,
1889
01:21:27,080 –> 01:21:29,400
approvals captured and identity anchored.
1890
01:21:29,400 –> 01:21:31,600
Then show what you refuse to do on purpose,
1891
01:21:31,600 –> 01:21:33,960
no custom LLMs, no broad-right permissions,
1892
01:21:33,960 –> 01:21:36,160
no uncontrolled connectors, no agent sprawl.
1893
01:21:36,160 –> 01:21:38,960
Executives need to hear that restraint created the result
1894
01:21:38,960 –> 01:21:41,800
because the instinct will be to remove restraint next quarter.
1895
01:21:41,800 –> 01:21:44,720
Finally, define exit criteria before you declare victory.
1896
01:21:44,720 –> 01:21:46,440
Success is measurable ROI
1897
01:21:46,440 –> 01:21:48,960
plus audit-ready telemetry plus life cycle controls
1898
01:21:48,960 –> 01:21:50,000
that prevent sprawl.
1899
01:21:50,000 –> 01:21:51,520
If any one of those is missing,
1900
01:21:51,520 –> 01:21:53,160
you didn’t build an agente workforce.
1901
01:21:53,160 –> 01:21:56,640
You built a short-lived demo with a future incident attached
1902
01:21:56,640 –> 01:21:58,520
and the last transition is the simplest truth
1903
01:21:58,520 –> 01:21:59,680
in the entire roadmap.
1904
01:21:59,680 –> 01:22:02,080
Agents don’t scale because they’re intelligent.
1905
01:22:02,080 –> 01:22:05,120
They scale because you made their decisions enforceable.
1906
01:22:05,120 –> 01:22:08,560
Conclusion, the law replace work, don’t imitate chat.
1907
01:22:08,560 –> 01:22:09,640
The law is simple,
1908
01:22:09,640 –> 01:22:12,400
co-pilot succeeds when orchestration replaces work,
1909
01:22:12,400 –> 01:22:15,320
grounding enforces truth and identity enforces boundaries.
1910
01:22:15,320 –> 01:22:17,600
Everything else is just sparkling automation.
1911
01:22:17,600 –> 01:22:18,720
If you want the next step,
1912
01:22:18,720 –> 01:22:21,480
listen to the next M365FM episode
1913
01:22:21,480 –> 01:22:23,360
on building an enterprise agent catalog
1914
01:22:23,360 –> 01:22:25,840
that prevents sprawl without stalling delivery.
1915
01:22:25,840 –> 01:22:28,240
Subscribe for more uncomfortable architecture truths
1916
01:22:28,240 –> 01:22:29,880
about Entra, co-pilot studio,
1917
01:22:29,880 –> 01:22:31,440
and the systems that quietly break
1918
01:22:31,440 –> 01:22:33,160
when governance stays optional.






