
1
00:00:00,000 –> 00:00:03,560
The one validation that prevents smart agents from doing dumb things.
2
00:00:03,560 –> 00:00:07,760
There’s one gate that turns clever into competent: the pre-execution contract check.
3
00:00:07,760 –> 00:00:10,960
Before any tool runs, the validator proves three things in order.
4
00:00:10,960 –> 00:00:14,960
Intent matches a real capability, the caller has permission right now,
5
00:00:14,960 –> 00:00:18,480
and the requested outcome is feasible within the declared data boundaries.
6
00:00:18,480 –> 00:00:21,000
Fail any part, and nothing executes.
7
00:00:21,000 –> 00:00:22,880
Not "be careful," not "try anyway."
8
00:00:22,880 –> 00:00:24,520
Deny with reasons and alternatives.
9
00:00:24,520 –> 00:00:26,000
Start with capability match.
10
00:00:26,000 –> 00:00:28,200
The plan says update SharePoint list item.
11
00:00:28,200 –> 00:00:32,040
The validator asks which tool, which method, which schema version.
12
00:00:32,040 –> 00:00:37,960
Arguments are checked against the registry, required fields present, types correct, value ranges sane,
13
00:00:37,960 –> 00:00:41,800
and yes, no extra fields smuggled in, hoping someone ignores them.
14
00:00:41,800 –> 00:00:45,240
Tool aliasing is forbidden. Use the canonical name or get rejected.
15
00:00:45,240 –> 00:00:49,160
This kills hallucinated tools and guessy parameters before they can misbehave.
16
00:00:49,160 –> 00:00:51,560
Next, policy compliance.
17
00:00:51,560 –> 00:00:55,240
The policy engine evaluates the proposed call against active policy,
18
00:00:55,240 –> 00:00:59,480
allow lists for tools and domains, RBAC or ABAC checks tied to
19
00:00:59,480 –> 00:01:04,680
Entra ID claims and scopes, environment tier rules, data classification boundaries.
20
00:01:04,680 –> 00:01:06,600
If the agent is scoped to Files.Read
21
00:01:06,600 –> 00:01:10,360
for a specific site, an update call or a different site is a hard no.
22
00:01:10,360 –> 00:01:14,920
If the payload dips into restricted classification without a human checkpoint, also no.
23
00:01:14,920 –> 00:01:20,040
Tokens are verified fresh, scopes are verified exact, and privilege escalation by
24
00:01:20,040 –> 00:01:23,560
"just this once" is treated like what it is: an attempted breach.
25
00:01:23,560 –> 00:01:27,080
Then post-condition feasibility. It’s not enough to want an outcome.
26
00:01:27,080 –> 00:01:29,320
It has to be achievable and verifiable.
27
00:01:29,320 –> 00:01:33,080
The validator asks, does the destination support idempotency keys?
28
00:01:33,080 –> 00:01:36,520
Will the system emit a durable identifier or ETag we can check after?
29
00:01:36,520 –> 00:01:39,000
Is there a compensating action if downstream fails?
30
00:01:39,000 –> 00:01:43,400
If the plan can’t produce verifiable post-conditions, it’s rejected or rewritten.
31
00:01:43,400 –> 00:01:47,240
We don’t accept "trust me" or "log it later." Later is how incidents happen.
32
00:01:47,240 –> 00:01:51,560
Put those together and you get the trio gate: capability match, policy compliance, post-condition
33
00:01:51,560 –> 00:01:56,200
feasibility. Pass all three and the executor proceeds; fail any and the deny-with-reason
34
00:01:56,200 –> 00:02:00,520
path activates. That path is polite, thorough and unambiguous.
35
00:02:00,520 –> 00:02:05,160
Here’s what you tried, here’s the exact policy or schema you broke, here are safe alternatives.
36
00:02:05,160 –> 00:02:09,560
If the user intent can be repaired (narrow the scope, switch to a read-only summary,
37
00:02:09,560 –> 00:02:11,160
route to a permitted site),
38
00:02:11,160 –> 00:02:14,600
the validator proposes a compliant plan and asks for approval.
39
00:02:14,600 –> 00:02:19,000
If it can’t be repaired, escalate to a human checkpoint with full context or stop outright.
40
00:02:19,000 –> 00:02:21,400
No mystery stalls, no silent success.
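The three gates described above can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: the registry layout, the `check_contract` name, the scope strings, and the `verifiable` flag are all assumptions made up for the example.

```python
from dataclasses import dataclass, field

# Hypothetical registry: canonical tool name -> argument contract.
REGISTRY = {
    "sharepoint.update_list_item": {
        "required": {"site", "list_id", "item_id", "fields"},
        "allowed": {"site", "list_id", "item_id", "fields", "idempotency_key"},
        "verifiable": True,  # assumed: destination returns an ETag we can check
    },
}

@dataclass
class Decision:
    allowed: bool
    gate: str = ""
    reason: str = ""
    alternatives: list = field(default_factory=list)

def check_contract(call: dict, granted: set) -> Decision:
    """Run the three gates in order; the first failure denies the call."""
    spec = REGISTRY.get(call["tool"])
    # Gate 1: capability match (known tool, required fields, no extras).
    if spec is None:
        return Decision(False, "capability", f"unknown tool {call['tool']!r}")
    args = set(call["args"])
    missing = spec["required"] - args
    extra = args - spec["allowed"]
    if missing or extra:
        return Decision(False, "capability",
                        f"missing={sorted(missing)} extra={sorted(extra)}")
    # Gate 2: policy compliance (exact scope on the exact site).
    needed = (call["scope"], call["args"]["site"])
    if needed not in granted:
        return Decision(False, "policy", f"scope {needed} not granted",
                        ["switch to a read-only summary",
                         "route to a permitted site"])
    # Gate 3: post-condition feasibility.
    if not spec["verifiable"]:
        return Decision(False, "postcondition", "no verifiable outcome")
    return Decision(True)
```

A denied call returns the failing gate, the reason, and any alternatives, so the caller never has to guess why nothing executed.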
41
00:02:21,400 –> 00:02:25,320
Quick micro story you’ve lived, even if you didn’t notice.
42
00:02:25,320 –> 00:02:29,720
The agent is asked to summarize all HR docs from last quarter and email legal.
43
00:02:29,720 –> 00:02:33,000
Retrieval proposes graph queries across HR and legal sites.
44
00:02:33,000 –> 00:02:34,920
Validator sees a boundary crossing.
45
00:02:34,920 –> 00:02:37,480
HR is restricted, legal is internal.
46
00:02:37,480 –> 00:02:40,600
The trio gate blocks the cross-read and the outbound email.
47
00:02:40,600 –> 00:02:42,360
The deny path returns:
48
00:02:42,360 –> 00:02:44,040
Restricted content detected.
49
00:02:44,040 –> 00:02:47,080
Proposed alternative, summarize internal policy index,
50
00:02:47,080 –> 00:02:50,520
provide a request link for HR summary to authorised reviewers.
51
00:02:50,520 –> 00:02:53,720
User approves the compliant plan, executor runs it,
52
00:02:53,720 –> 00:02:57,560
cites internal content and attaches a permission request for the HR material.
53
00:02:57,560 –> 00:02:59,160
No leak, no drama, still helpful.
54
00:02:59,160 –> 00:03:03,000
Implementation is boring by design. A policy DSL defines who can do what,
55
00:03:03,000 –> 00:03:04,600
where, with which side effects.
56
00:03:04,600 –> 00:03:09,800
A schema registry stores tool contracts: names, versions, argument shapes and post-conditions.
57
00:03:09,800 –> 00:03:14,040
An allow list resolver maps domains, sites and scopes to environment tiers.
58
00:03:14,040 –> 00:03:18,040
The validator composes these, stamps every decision with an audit record (inputs,
59
00:03:18,040 –> 00:03:22,920
policy evaluations, outcomes) and hands either a green light or a repair plan to the executor.
60
00:03:22,920 –> 00:03:27,160
The executor never freelances around a red light. It enforces the decision or stops.
61
00:03:27,160 –> 00:03:31,400
The mental model is clean enough to tattoo on the forehead of your architecture diagram.
62
00:03:31,400 –> 00:03:34,840
Executors enforce, graphs constrain, validators decide.
63
00:03:34,840 –> 00:03:37,880
In that order. The model proposes only within the fenced yard,
64
00:03:37,880 –> 00:03:42,360
and the minute it tries to climb over, the validator pulls it back, explains why,
65
00:03:42,360 –> 00:03:43,880
and points to the gate.
66
00:03:43,880 –> 00:03:47,880
Order preserved, safety enforced, progress unblocked, within policy.
67
00:03:47,880 –> 00:03:52,360
You wanted one thing that prevents smart agents from doing dumb things.
68
00:03:52,360 –> 00:03:55,800
This is it: a pre-execution contract check that proves capability,
69
00:03:55,800 –> 00:03:59,240
permission and verifiable outcome before any real-world mutation.
70
00:03:59,240 –> 00:04:03,400
It turns "I think I can" into "I am allowed, I know how, and I can prove I did."
71
00:04:03,400 –> 00:04:08,200
You now have the architecture.
72
00:04:08,200 –> 00:04:09,000
Use it.
73
00:04:09,000 –> 00:04:10,840
Key takeaway plus CTA.
74
00:04:10,840 –> 00:04:12,840
Key takeaway: prompts are opinions,
75
00:04:12,840 –> 00:04:15,400
executors and validated graphs are operations,
76
00:04:15,400 –> 00:04:19,000
and the pre-execution contract check is the guardrail that keeps both honest.
77
00:04:19,000 –> 00:04:22,120
If this saved you time, repay the debt.
78
00:04:22,120 –> 00:04:22,680
Subscribe.
79
00:04:22,680 –> 00:04:28,920
Next, watch the LangGraph versus Microsoft Agent Framework breakdown on performance and observability,
80
00:04:28,920 –> 00:04:31,560
actual traces, costs and P95s.
81
00:04:31,560 –> 00:04:35,880
Lock in your upgrade path, follow, enable alerts and get the next episode delivered on schedule.
82
00:04:35,880 –> 00:04:36,520
Proceed.
83
00:04:36,520 –> 00:04:38,920
Most people think better prompts fix flaky agents.
84
00:04:38,920 –> 00:04:42,680
Cute theory. Prompts guide thoughts; they don’t execute operations.
85
00:04:42,680 –> 00:04:46,520
The truth: reliability comes from executors and graph validation.
86
00:04:46,520 –> 00:04:49,560
The spine that keeps agents from face planting when reality shows up.
87
00:04:49,560 –> 00:04:51,560
We’re going to wire this to Microsoft scenarios.
88
00:04:51,560 –> 00:04:55,160
Microsoft 365 Graph retrieval, Azure OpenAI reasoning,
89
00:04:55,160 –> 00:04:57,800
and Copilot Studio agents that don’t go rogue.
90
00:04:57,800 –> 00:05:02,920
Stakes are simple: accuracy, latency, cost and auditability.
91
00:05:02,920 –> 00:05:04,680
Measurable, not vibes.
92
00:05:04,680 –> 00:05:07,000
I’ll give you a mental model you can run in your head,
93
00:05:07,000 –> 00:05:12,360
diagrams in words you won’t forget, and one validation step that stops smart agents from doing dumb things.
94
00:05:12,360 –> 00:05:14,840
Enter the architecture you should have used day one.
95
00:05:14,840 –> 00:05:18,360
Why prompts fail at operations and executors don’t.
96
00:05:18,360 –> 00:05:19,560
Okay, so here’s the thing.
97
00:05:19,560 –> 00:05:21,080
LLMs handle cognition.
98
00:05:21,080 –> 00:05:23,720
Executors handle operations.
99
00:05:23,720 –> 00:05:25,400
Mixing those is how you get chaos.
100
00:05:25,400 –> 00:05:27,320
The model can propose a plan.
101
00:05:27,320 –> 00:05:32,600
It cannot guarantee that the email was sent, the file was saved or the permission existed.
102
00:05:32,600 –> 00:05:35,880
It speaks in probabilities; operations demand guarantees.
103
00:05:35,880 –> 00:05:37,320
Enter the executor.
104
00:05:37,320 –> 00:05:42,520
Think of it as the adult in the room: a policy-bound function runner with state, constraints and idempotency.
105
00:05:42,520 –> 00:05:45,720
It doesn’t believe an action succeeded.
106
00:05:45,720 –> 00:05:46,520
It checks.
107
00:05:46,520 –> 00:05:48,440
It doesn’t assume a tool exists.
108
00:05:48,440 –> 00:05:52,920
It validates capability, parameters and permissions before it even tries.
109
00:05:52,920 –> 00:05:58,200
And when it fails, it fails loudly, classifies the error and takes the prescribed recovery path.
110
00:05:58,200 –> 00:06:01,000
Most prompt-only agents fall into three failure modes.
111
00:06:01,000 –> 00:06:03,880
First, hallucinated tools.
112
00:06:03,880 –> 00:06:08,760
The model requests a function that isn’t registered or calls it with fields that don’t exist.
113
00:06:08,760 –> 00:06:10,600
Second, missing preconditions.
114
00:06:10,600 –> 00:06:16,600
It tries to edit a SharePoint file without checking if it can access the site, the list or the item version.
115
00:06:16,600 –> 00:06:21,480
Third, silent partials. Step two fails, but the agent keeps going and declares victory because the text looks confident.
116
00:06:21,480 –> 00:06:22,200
You’ve seen this.
117
00:06:22,200 –> 00:06:23,640
You just called it flaky.
118
00:06:23,640 –> 00:06:26,600
Executors run a loop that looks boring, and that’s why it works.
119
00:06:26,600 –> 00:06:27,560
Preconditions.
120
00:06:27,560 –> 00:06:30,040
Verify inputs, permissions and invariants.
121
00:06:30,040 –> 00:06:30,680
Action.
122
00:06:30,680 –> 00:06:34,840
Call the tool with an idempotency key so retries won’t double bill or double post.
123
00:06:34,840 –> 00:06:36,600
Post-conditions.
124
00:06:36,600 –> 00:06:38,600
Confirm effects against the source of truth.
125
00:06:38,600 –> 00:06:40,680
Error taxonomy.
126
00:06:40,680 –> 00:06:44,360
Is it validation, transient, rate limit, auth or policy?
127
00:06:44,360 –> 00:06:45,960
Recovery.
128
00:06:45,960 –> 00:06:49,400
Back-off and retry for transient, re-auth or reconsent for auth,
129
00:06:49,400 –> 00:06:53,080
fallbacks for known alternates and hard stops with reasons for policy violations.
130
00:06:53,080 –> 00:06:55,080
Idempotency is non-negotiable.
131
00:06:55,080 –> 00:07:00,360
If an action can be retried, it needs a key that makes the second attempt a no-op or a consistent override.
132
00:07:00,360 –> 00:07:02,120
Timeouts prevent zombie calls.
133
00:07:02,120 –> 00:07:03,720
Back-off respects rate limits.
134
00:07:03,720 –> 00:07:08,520
This is deterministic behavior governing inherently stochastic text generation.
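A rough sketch of the idempotency and back-off behavior just described. The names `execute`, `TransientError`, and `PolicyError` are invented for this example, and the `results` dict stands in for a durable store; in practice the key would also be passed to the downstream service where it supports one.

```python
import time

class TransientError(Exception): ...
class PolicyError(Exception): ...

def execute(action, key, results, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run `action` at most once per idempotency key, retrying transient failures."""
    if key in results:                  # a retry of a completed call is a no-op
        return results[key]
    for attempt in range(max_attempts):
        try:
            results[key] = action()
            return results[key]
        except TransientError:
            if attempt == max_attempts - 1:
                raise                   # fail loudly after the last attempt
            sleep(base_delay * 2 ** attempt)  # exponential back-off
        # PolicyError (and anything else) propagates at once: a hard stop
```

Calling `execute` twice with the same key runs the action once; the second call returns the recorded result instead of double posting.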
135
00:07:08,520 –> 00:07:11,720
The executor is the gearbox, the LLM is the engine.
136
00:07:11,720 –> 00:07:14,680
You don’t floor the engine and hope the wheels understand.
137
00:07:14,680 –> 00:07:17,800
Contract-first outputs are the antidote to "best effort" paragraphs.
138
00:07:17,800 –> 00:07:20,280
The reasoning model doesn’t get to ramble.
139
00:07:20,280 –> 00:07:24,840
It emits JSON matching a schema: tool name, arguments and expected post-conditions.
140
00:07:24,840 –> 00:07:27,640
Validators check schema compliance before anything runs.
141
00:07:27,640 –> 00:07:30,200
If the shape is wrong, the executor denies with a reason,
142
00:07:30,200 –> 00:07:32,200
requests a corrected plan or escalates.
143
00:07:32,200 –> 00:07:35,240
That’s how you stop noun-salads from becoming production incidents.
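As an illustration of the contract-first idea, here is a small validator using only the standard library. The three-field plan shape mirrors the "tool name, arguments and expected post-conditions" mentioned above; the exact field names and the `validate_plan` function are assumptions for the sketch.

```python
import json

# Hypothetical plan contract: field name -> required type after json.loads.
PLAN_SCHEMA = {"tool": str, "arguments": dict, "postconditions": list}

def validate_plan(raw: str) -> tuple[bool, str]:
    """Return (ok, reason). Anything that is not schema-conforming JSON is denied."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not JSON: {exc.msg}"
    if not isinstance(plan, dict):
        return False, "plan must be a JSON object"
    missing = sorted(PLAN_SCHEMA.keys() - plan.keys())
    extra = sorted(plan.keys() - PLAN_SCHEMA.keys())
    if missing or extra:
        return False, f"missing={missing} extra={extra}"
    for name, expected in PLAN_SCHEMA.items():
        if not isinstance(plan[name], expected):
            return False, f"{name} must be {expected.__name__}"
    return True, "ok"
```

A prose answer fails at the first step, so nothing downstream ever sees it.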
144
00:07:35,240 –> 00:07:38,760
Now you might be thinking, can’t I just prompt the model to check its work?
145
00:07:38,760 –> 00:07:41,000
You can ask. It will say yes. It always does.
146
00:07:41,000 –> 00:07:42,680
The average user accepts that.
147
00:07:42,680 –> 00:07:47,000
Professionals require proofs. Proofs live in post-conditions verified against systems like
148
00:07:47,000 –> 00:07:51,640
Microsoft Graph, SharePoint or Exchange, real sources of truth, not the model’s memory.
149
00:07:51,640 –> 00:07:53,240
But here’s where it gets interesting.
150
00:07:53,240 –> 00:07:54,520
Single steps are fine.
151
00:07:54,520 –> 00:07:58,920
Real workflows have branches, parallelism, compensations and human checkpoints.
152
00:07:58,920 –> 00:08:02,040
You need more than an executor; you need a map the executor can read.
153
00:08:02,040 –> 00:08:03,160
That’s a workflow graph.
154
00:08:03,160 –> 00:08:06,280
In a graph, nodes are tasks or sub-agents with explicit contracts.
155
00:08:06,280 –> 00:08:08,360
Edges define control flow and data flow.
156
00:08:08,360 –> 00:08:09,320
State is first class.
157
00:08:09,320 –> 00:08:11,880
What persists, what’s checkpointed, what’s ephemeral.
158
00:08:11,880 –> 00:08:16,360
The executor walks the graph deterministically, honoring allow lists and schemas at each edge.
159
00:08:16,360 –> 00:08:19,880
If a node fails, the graph specifies compensations:
160
00:08:19,880 –> 00:08:22,280
Undo, repair or escalate.
161
00:08:22,280 –> 00:08:25,320
No mystery pipes, no "and then the magic happens."
162
00:08:25,320 –> 00:08:27,080
This is the moment reliability flips.
163
00:08:27,080 –> 00:08:28,200
The LLM proposes.
164
00:08:28,200 –> 00:08:32,120
The executor enforces. The graph constrains. Validation decides.
165
00:08:32,120 –> 00:08:33,400
Order, restored.
166
00:08:33,400 –> 00:08:35,800
Once you separate cognition from operations,
167
00:08:35,800 –> 00:08:38,280
your agents stop improvising and start behaving.
168
00:08:38,280 –> 00:08:41,800
And yes, they still feel smart because they are, but now they are supervised.
169
00:08:41,800 –> 00:08:42,920
Blueprint ready.
170
00:08:42,920 –> 00:08:46,920
Let’s wire it to Microsoft 365 without turning your tenant into a buffet.
171
00:08:46,920 –> 00:08:49,080
Graph workflows 101.
172
00:08:49,080 –> 00:08:51,400
Nodes, edges and state that doesn’t leak.
173
00:08:51,400 –> 00:08:52,200
Picture this.
174
00:08:52,200 –> 00:08:54,440
You’ve got a reliable executor but no map.
175
00:08:54,440 –> 00:08:56,680
It will follow orders, but to where?
176
00:08:56,680 –> 00:08:59,240
Enter the workflow graph, your operating manual.
177
00:08:59,240 –> 00:09:02,520
Nodes are tasks or sub-agents with explicit contracts.
178
00:09:02,520 –> 00:09:04,360
Edges define control flow and data flow.
179
00:09:04,360 –> 00:09:06,920
The graph encodes what runs, when it runs,
180
00:09:06,920 –> 00:09:09,080
and what data is allowed to cross boundaries.
181
00:09:09,080 –> 00:09:10,600
No improvisational jazz.
182
00:09:10,600 –> 00:09:11,880
This is sheet music.
183
00:09:11,880 –> 00:09:13,000
Start with the shape.
184
00:09:13,000 –> 00:09:16,360
Most production graphs are DAGs, directed acyclic graphs.
185
00:09:16,360 –> 00:09:19,880
Because cycles invite infinite loops and state corrosion.
186
00:09:19,880 –> 00:09:23,640
You can still have loops, but you mark them intentionally with counters and guards.
187
00:09:23,640 –> 00:09:26,600
Maximum iterations, exit predicates and timeouts.
188
00:09:26,600 –> 00:09:29,320
That’s how you keep "think harder" from becoming "think forever."
189
00:09:29,320 –> 00:09:32,280
Conditional routing is explicit.
190
00:09:32,280 –> 00:09:36,520
If the retrieval confidence is above threshold, branch to synthesis.
191
00:09:36,520 –> 00:09:38,280
If not, branch to requery.
192
00:09:38,280 –> 00:09:40,280
Parallelism is first class.
193
00:09:40,280 –> 00:09:43,400
Run summarization and citation extraction side by side,
194
00:09:43,400 –> 00:09:47,720
then join at a barrier node that verifies both met their post-conditions.
195
00:09:47,720 –> 00:09:49,320
State is not a vibe, it’s a ledger.
196
00:09:49,320 –> 00:09:51,320
You maintain three kinds. Persistent state:
197
00:09:51,320 –> 00:09:53,880
Durable checkpoints you can recover from after a crash.
198
00:09:53,880 –> 00:09:55,800
Inputs, decisions, signed actions.
199
00:09:55,800 –> 00:09:58,920
Ephemeral state: short-lived buffers like intermediate model outputs
200
00:09:58,920 –> 00:10:00,680
you don’t want to pollute long-term memory.
201
00:10:00,680 –> 00:10:04,520
Derived state: re-computable artifacts like embeddings or filtered results
202
00:10:04,520 –> 00:10:06,040
you can rebuild deterministically.
203
00:10:06,040 –> 00:10:06,920
The rule is simple.
204
00:10:06,920 –> 00:10:09,080
Only persist what you can defend in an audit.
205
00:10:09,080 –> 00:10:10,840
Everything else is disposable on purpose.
206
00:10:10,840 –> 00:10:12,440
Rollback strategy matters.
207
00:10:12,440 –> 00:10:14,440
When a node mutates the outside world,
208
00:10:14,440 –> 00:10:17,320
creates a calendar event, updates a list item,
209
00:10:17,320 –> 00:10:21,560
you record an inverse action if it exists or a compensating plan if it doesn’t.
210
00:10:21,560 –> 00:10:25,960
If a downstream node fails fatally, the graph can walk those compensations in reverse order.
211
00:10:25,960 –> 00:10:27,160
No, this is not overkill.
212
00:10:27,160 –> 00:10:31,480
It’s how you avoid "oops, double-booked the CEO" becoming "oops, we can’t fix it."
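The rollback rule above (record an inverse per mutation, then walk them in reverse on failure) can be sketched as follows. `run_with_compensation` and the `(do, undo)` pairing are illustrative assumptions, not an API from any named framework.

```python
def run_with_compensation(steps):
    """steps: list of (do, undo) callables.

    On failure, undo the completed steps in reverse order, then re-raise.
    """
    undo_stack = []
    try:
        for do, undo in steps:
            do()
            undo_stack.append(undo)  # record the inverse only after success
    except Exception:
        while undo_stack:
            undo_stack.pop()()       # walk compensations in reverse order
        raise
```

The failing step itself is not compensated, because its inverse is only recorded once it has succeeded.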
213
00:10:31,480 –> 00:10:32,920
Edges carry contracts.
214
00:10:32,920 –> 00:10:34,680
An edge isn’t a mystery pipe.
215
00:10:34,680 –> 00:10:36,520
It’s an API between nodes.
216
00:10:36,520 –> 00:10:38,120
Define I/O schemas:
217
00:10:38,120 –> 00:10:40,280
types, required fields, allowed values.
218
00:10:40,280 –> 00:10:43,800
Define allow lists: what tools or domains the next node may call.
219
00:10:43,800 –> 00:10:46,760
Define capability tags: what the receiving node promises to do
220
00:10:46,760 –> 00:10:48,120
and what it will refuse.
221
00:10:48,120 –> 00:10:50,440
The executor enforces those at runtime.
222
00:10:50,440 –> 00:10:53,720
A node can’t smuggle a SharePoint token through a summary edge.
223
00:10:53,720 –> 00:10:55,080
That’s not security theater.
224
00:10:55,080 –> 00:10:58,120
That’s how you prevent lateral movement by your own agent.
225
00:10:58,120 –> 00:11:01,560
Error handling lives in the graph, not in vibes.
226
00:11:01,560 –> 00:11:05,400
Every node declares its error taxonomy: validation error,
227
00:11:05,400 –> 00:11:07,880
transient infrastructure, rate limiting,
228
00:11:07,880 –> 00:11:12,200
authentication, authorization, policy violation, and unknown.
229
00:11:12,200 –> 00:11:14,760
For each class, the graph provides a path.
230
00:11:14,760 –> 00:11:17,400
Retry with exponential back-off for transient,
231
00:11:17,400 –> 00:11:20,520
refresh the token for authentication, alternate tool for rate limiting,
232
00:11:20,520 –> 00:11:22,200
deny with reason for policy.
233
00:11:22,200 –> 00:11:24,360
Dead-letter queues exist for the unknowns.
234
00:11:24,360 –> 00:11:27,000
Failed payloads go to quarantine with full context,
235
00:11:27,000 –> 00:11:30,440
so humans can inspect without replaying chaos into production.
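One way to picture the per-class error paths is a lookup table with a quarantine fallback. The handler names are placeholders, and routing validation errors to a corrected-plan request is an assumption borrowed from the executor description earlier, since this passage does not give validation its own path.

```python
from enum import Enum

class ErrorClass(Enum):
    VALIDATION = "validation"
    TRANSIENT = "transient"
    RATE_LIMIT = "rate_limit"
    AUTH = "auth"
    POLICY = "policy"
    UNKNOWN = "unknown"

# Hypothetical handler names, one per declared error class.
ROUTES = {
    ErrorClass.VALIDATION: "request_corrected_plan",  # assumption, see lead-in
    ErrorClass.TRANSIENT: "retry_with_backoff",
    ErrorClass.AUTH: "refresh_token",
    ErrorClass.RATE_LIMIT: "alternate_tool",
    ErrorClass.POLICY: "deny_with_reason",
}

def route_failure(error_class, payload, dead_letters):
    """Pick the recovery path; unrouted classes go to quarantine with context."""
    handler = ROUTES.get(error_class)
    if handler is None:
        dead_letters.append({"error": error_class.value, "payload": payload})
        return "quarantine"
    return handler
```

Only the unknown class lands in the dead-letter list, which keeps the quarantine small enough for a human to actually read.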
236
00:11:30,440 –> 00:11:34,200
Human-in-the-loop checkpoints are nodes, not ad hoc Slack messages.
237
00:11:34,200 –> 00:11:37,320
They freeze the execution, present the proposed action and evidence,
238
00:11:37,320 –> 00:11:40,040
and require an approval or edit that’s logged and signed.
239
00:11:40,040 –> 00:11:41,640
Once approved, execution resumes;
240
00:11:41,640 –> 00:11:44,200
if denied, the graph routes to a safe fallback.
241
00:11:44,200 –> 00:11:46,680
Congratulations, you’ve just implemented change control
242
00:11:46,680 –> 00:11:48,600
that developers will actually follow
243
00:11:48,600 –> 00:11:49,960
because it’s faster than email.
244
00:11:49,960 –> 00:11:52,120
Memory isolation is non-negotiable.
245
00:11:52,120 –> 00:11:53,880
Each session gets scoped context,
246
00:11:53,880 –> 00:11:57,160
only the documents, tokens, and intermediate results it needs.
247
00:11:57,160 –> 00:12:00,280
Cross-session poisoning, where one conversation’s prompt injection bleeds
248
00:12:00,280 –> 00:12:03,640
into another, is how you accidentally exfiltrate data.
249
00:12:03,640 –> 00:12:05,080
The graph enforces boundaries.
250
00:12:05,080 –> 00:12:08,520
No shared mutable memory, only sanctioned reads from a vetted store
251
00:12:08,520 –> 00:12:10,520
with content filters and schema validators.
252
00:12:10,520 –> 00:12:12,280
Yes, you can cache embeddings and summaries,
253
00:12:12,280 –> 00:12:14,200
but you tag them with provenance and permissions
254
00:12:14,200 –> 00:12:16,040
and you evict them on policy changes.
255
00:12:16,040 –> 00:12:18,520
Observability is built in, not bolted on.
256
00:12:18,520 –> 00:12:21,080
Node-level traces show inputs, outputs, durations,
257
00:12:21,080 –> 00:12:22,520
retries, and downstream effects.
258
00:12:22,520 –> 00:12:24,680
You stitch traces into a graph run ID
259
00:12:24,680 –> 00:12:27,880
so you can replay, diagnose, and prove compliance.
260
00:12:27,880 –> 00:12:28,920
Who did what?
261
00:12:28,920 –> 00:12:30,040
When and why?
262
00:12:30,040 –> 00:12:32,440
Anomaly detection flags weird patterns.
263
00:12:32,440 –> 00:12:35,480
Sudden spikes in tool calls, unusual domains, token blowups.
264
00:12:35,480 –> 00:12:36,680
That’s your early warning system
265
00:12:36,680 –> 00:12:38,920
before interesting becomes incident.
266
00:12:38,920 –> 00:12:40,920
Essentially, the graph is the constitution.
267
00:12:40,920 –> 00:12:42,440
The executor is law enforcement.
268
00:12:42,440 –> 00:12:44,360
The LLM is counsel, not judge.
269
00:12:44,360 –> 00:12:47,400
When you encode nodes, edges, and state like this,
270
00:12:47,400 –> 00:12:49,640
you don’t just get workflows that succeed.
271
00:12:49,640 –> 00:12:52,280
You get workflows that fail safely, explain themselves
272
00:12:52,280 –> 00:12:53,800
and recover predictably.
273
00:12:53,800 –> 00:12:57,000
Now, blueprint in hand, we can connect to Microsoft 365
274
00:12:57,000 –> 00:12:59,400
without turning your tenant into a buffet.
275
00:12:59,400 –> 00:13:02,920
Secure by design: graph validation beats chaos engineering.
276
00:13:02,920 –> 00:13:05,800
You don’t prove reliability by throwing chaos at production
277
00:13:05,800 –> 00:13:08,040
and hoping the survivors are resilient.
278
00:13:08,040 –> 00:13:11,400
You prove it by rejecting unsafe workflows before they ever run.
279
00:13:11,400 –> 00:13:12,440
That’s graph validation.
280
00:13:12,440 –> 00:13:14,920
Static checks to keep nonsense out, runtime guardrails
281
00:13:14,920 –> 00:13:16,360
to keep danger in a box.
282
00:13:16,360 –> 00:13:18,040
Static validation is the pre-flight.
283
00:13:18,040 –> 00:13:19,800
You check the structure before wheels up.
284
00:13:19,800 –> 00:13:22,120
Cycles that create unbounded loops?
285
00:13:22,120 –> 00:13:24,280
Rejected, or forced to declare iteration guards.
286
00:13:24,280 –> 00:13:27,640
Unreachable nodes? Dead code is risk: delete or justify.
287
00:13:27,640 –> 00:13:28,920
Missing contracts?
288
00:13:28,920 –> 00:13:31,400
Every node must declare input and output schemas,
289
00:13:31,400 –> 00:13:33,480
required capabilities and side effects.
290
00:13:33,480 –> 00:13:36,760
Privilege boundaries? Nodes that mutate external systems
291
00:13:36,760 –> 00:13:39,800
must run in segments with least-privilege credentials
292
00:13:39,800 –> 00:13:41,080
and explicit allow lists.
293
00:13:41,080 –> 00:13:43,640
And yes, if your summarize node suddenly requests
294
00:13:43,640 –> 00:13:46,920
write access to SharePoint, the validator says no with prejudice.
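Two of the static checks listed above, cycles and unreachable nodes, fit in a short depth-first search. This is a sketch under simple assumptions: the graph is a plain dict of successor lists, `validate_graph` is a made-up name, and declared iteration guards are not modeled, so every cycle is reported.

```python
def validate_graph(edges: dict, start: str) -> list:
    """edges maps node -> list of successors. Returns problems; empty means pass."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(edges) | {n for succ in edges.values() for n in succ} | {start}
    color = {n: WHITE for n in nodes}
    problems = []

    def visit(node):
        color[node] = GREY
        for nxt in edges.get(node, []):
            if color[nxt] == GREY:        # back edge: nxt is on the current path
                problems.append(f"cycle via {node} -> {nxt}")
            elif color[nxt] == WHITE:
                visit(nxt)
        color[node] = BLACK

    visit(start)
    for node in sorted(n for n in nodes if color[n] == WHITE):
        problems.append(f"unreachable node {node}")
    return problems
```

A fan-out that joins at a barrier node passes cleanly, while a two-node loop with a stray node reports both the cycle and the dead code.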
295
00:13:46,920 –> 00:13:48,120
Now, runtime.
296
00:13:48,120 –> 00:13:49,640
This is where people get sloppy.
297
00:13:49,640 –> 00:13:53,000
A policy engine sits beside the executor, not behind it.
298
00:13:53,000 –> 00:13:55,320
Tool and domain allow lists aren’t documentation;
299
00:13:55,320 –> 00:13:56,760
they’re enforced decisions.
300
00:13:56,760 –> 00:13:59,560
At call time, the engine checks RBAC or ABAC
301
00:13:59,560 –> 00:14:00,920
against the active principal:
302
00:14:00,920 –> 00:14:03,320
Entra ID token, scopes, claims,
303
00:14:03,320 –> 00:14:05,160
and propagates auth correctly down the chain.
304
00:14:05,160 –> 00:14:06,600
No ambient superpowers.
305
00:14:06,600 –> 00:14:08,840
Tokens are scoped, refreshed when permitted,
306
00:14:08,840 –> 00:14:10,760
and never smuggled through friendly edges.
307
00:14:10,760 –> 00:14:13,720
The agent earns access on every call or it doesn’t call.
308
00:14:13,720 –> 00:14:16,680
Input and output sanitization is hygiene, not optional.
309
00:14:16,680 –> 00:14:18,440
Prompt injection isn’t clever.
310
00:14:18,440 –> 00:14:19,720
It’s predictable.
311
00:14:19,720 –> 00:14:22,520
Every inbound content source passes through content filters,
312
00:14:22,520 –> 00:14:24,200
HTML and Markdown scrubbers,
313
00:14:24,200 –> 00:14:27,560
and instruction firewalls that strip "ignore previous" nonsense.
314
00:14:27,560 –> 00:14:30,040
Output passes schema validators.
315
00:14:30,040 –> 00:14:31,800
If a node promised JSON,
316
00:14:31,800 –> 00:14:34,840
the executor rejects prose and requests a repair.
317
00:14:34,840 –> 00:14:36,120
The model can argue;
318
00:14:36,120 –> 00:14:37,720
the validator doesn’t negotiate.
319
00:14:37,720 –> 00:14:40,440
It enforces types, ranges, and invariants.
320
00:14:40,440 –> 00:14:43,320
Sandboxing and segmentation contain the blast radius.
321
00:14:43,320 –> 00:14:46,440
Nodes that call external code or untrusted connectors
322
00:14:46,440 –> 00:14:47,880
run in constrained environments:
323
00:14:47,880 –> 00:14:50,920
API gateways with rate limits and auth, network policies
324
00:14:50,920 –> 00:14:53,800
that prevent lateral movement and egress controls
325
00:14:53,800 –> 00:14:56,600
that only permit traffic to vetted domains.
326
00:14:56,600 –> 00:15:01,000
You don’t let a retrieval node discover a new data source in production.
327
00:15:01,000 –> 00:15:02,440
Discovery happens in dev,
328
00:15:02,440 –> 00:15:05,240
behind tests with signed updates to the allow list.
329
00:15:05,240 –> 00:15:07,880
Observability isn’t "logs somewhere"; it’s surgical.
330
00:15:07,880 –> 00:15:10,040
Node-level tracing records inputs, outputs,
331
00:15:10,040 –> 00:15:12,680
durations, retries, and decisions from the policy engine.
332
00:15:12,680 –> 00:15:15,000
You correlate everything under a run ID.
333
00:15:15,000 –> 00:15:17,160
Audit logs are immutable and attributed,
334
00:15:17,160 –> 00:15:18,440
which principal authorized
335
00:15:18,440 –> 00:15:20,760
which action, with what scopes, at what time,
336
00:15:20,760 –> 00:15:22,280
and why the policy allowed it.
337
00:15:22,280 –> 00:15:24,120
When a regulator asks who did what,
338
00:15:24,120 –> 00:15:25,560
you don’t shrug, you search.
339
00:15:25,560 –> 00:15:28,600
Compliance ready means your evidence is boring and complete.
340
00:15:28,600 –> 00:15:31,000
Patch and third-party risk are supply chain problems.
341
00:15:31,000 –> 00:15:32,040
Treat them like it.
342
00:15:32,040 –> 00:15:33,960
Dependency scanning runs on every build.
343
00:15:33,960 –> 00:15:35,560
Connectors and plugins are audited,
344
00:15:35,560 –> 00:15:37,800
version pinned, and reviewed for permission creep.
345
00:15:37,800 –> 00:15:41,000
If you import an MCP or OpenAPI spec,
346
00:15:41,000 –> 00:15:43,000
you validate that the declared methods
347
00:15:43,000 –> 00:15:45,000
match the least-privilege policy you expect.
348
00:15:45,000 –> 00:15:47,240
No wildcard endpoints,
349
00:15:47,240 –> 00:15:49,320
no hidden write paths pretending to be read.
350
00:15:49,320 –> 00:15:51,640
Hygiene isn’t glamorous,
351
00:15:51,640 –> 00:15:53,240
it is, however, the reason you sleep.
352
00:15:53,240 –> 00:15:56,520
The truth: graph validation outperforms chaos engineering
353
00:15:56,520 –> 00:15:58,440
because it prevents classes of incidents
354
00:15:58,440 –> 00:16:00,120
rather than documenting their fallout.
355
00:16:00,120 –> 00:16:01,320
You still test failure modes,
356
00:16:01,320 –> 00:16:02,920
but you do it to confirm guardrails,
357
00:16:02,920 –> 00:16:04,840
not to discover that you forgot to install them.
358
00:16:04,840 –> 00:16:06,520
Let’s stitch this back to the mental model.
359
00:16:06,520 –> 00:16:09,080
Static checks keep the blueprint sane.
360
00:16:09,080 –> 00:16:12,200
No impossible paths, no orphaned work,
361
00:16:12,200 –> 00:16:13,240
no privilege leaks.
362
00:16:13,240 –> 00:16:16,120
Runtime guardrails keep behavior sane.
363
00:16:16,120 –> 00:16:17,560
Every call authenticated,
364
00:16:17,560 –> 00:16:19,880
authorized, sanitized, and observed.
365
00:16:19,880 –> 00:16:22,040
The executor enforces, the graph constrains,
366
00:16:22,040 –> 00:16:23,880
the validator decides. The model?
367
00:16:23,880 –> 00:16:25,560
It proposes within boundaries
368
00:16:25,560 –> 00:16:27,240
and gets clipped when it wanders.
369
00:16:27,240 –> 00:16:28,520
Isn’t this heavy?
370
00:16:28,520 –> 00:16:30,280
Only if you enjoy breaches.
371
00:16:30,280 –> 00:16:32,520
The overhead is mechanical and automated.
372
00:16:32,520 –> 00:16:35,640
Static validation runs at build time and deploy time.
373
00:16:35,640 –> 00:16:37,160
Runtime policy is a sidecar,
374
00:16:37,160 –> 00:16:38,200
fast and local.
375
00:16:38,200 –> 00:16:39,880
Schema checks are milliseconds.
376
00:16:39,880 –> 00:16:42,680
The cost you remove (incidents, manual reviews,
377
00:16:42,680 –> 00:16:45,720
retrofits) dwarfs the micro latency you add.
378
00:16:45,720 –> 00:16:47,240
And the payoff is measurable:
379
00:16:47,240 –> 00:16:49,080
fewer unauthorized calls,
380
00:16:49,080 –> 00:16:50,600
fewer token blowups,
381
00:16:50,600 –> 00:16:52,600
and far fewer "why did it do that?"
382
00:16:52,600 –> 00:16:53,400
postmortems.
383
00:16:53,400 –> 00:16:56,440
One more point the average user misses:
384
00:16:56,440 –> 00:16:58,600
validation is composable.
385
00:16:58,600 –> 00:17:01,000
You can wrap third-party tools with proxy nodes
386
00:17:01,000 –> 00:17:03,080
that enforce contracts and policies
387
00:17:03,080 –> 00:17:04,520
without trusting the tool itself.
388
00:17:04,520 –> 00:17:06,760
You can segment graphs by sensitivity,
389
00:17:06,760 –> 00:17:08,840
public, internal, restricted,
390
00:17:08,840 –> 00:17:10,680
and promote workflows between tiers
391
00:17:10,680 –> 00:17:13,240
only after validation passes for the new boundary.
392
00:17:13,240 –> 00:17:14,760
That’s how you scale safely.
393
00:17:14,760 –> 00:17:16,360
Secure by design isn’t a slogan,
394
00:17:16,360 –> 00:17:17,800
it’s a workflow property.
395
00:17:17,800 –> 00:17:19,640
You don’t hope agents behave.
396
00:17:19,640 –> 00:17:21,800
You make misbehavior structurally hard
397
00:17:21,800 –> 00:17:23,720
and operationally visible.
398
00:17:23,720 –> 00:17:24,760
With the rails in place,
399
00:17:24,760 –> 00:17:27,480
plugging in Microsoft 365 Graph and Azure OpenAI
400
00:17:27,480 –> 00:17:28,520
isn’t roulette.
401
00:17:28,520 –> 00:17:30,360
It’s controlled power on your terms.
402
00:17:30,360 –> 00:17:32,920
Now we can talk about wiring, not firefighting.
403
00:17:32,920 –> 00:17:34,520
The Microsoft scenario:
404
00:17:34,520 –> 00:17:38,920
M365 Graph plus Azure OpenAI plus Copilot Studio.
405
00:17:38,920 –> 00:17:40,360
Let’s assemble the cast.
406
00:17:40,360 –> 00:17:42,600
Retrieval agent: a disciplined librarian
407
00:17:42,600 –> 00:17:45,320
that only fetches from Microsoft Graph with least privilege.
408
00:17:45,320 –> 00:17:47,160
Reasoning agent:
409
00:17:47,160 –> 00:17:49,080
an Azure OpenAI model that plans,
410
00:17:49,080 –> 00:17:51,160
cites, and never freelances past its brief.
411
00:17:51,160 –> 00:17:53,960
Executor: a policy-bound operator
412
00:17:53,960 –> 00:17:55,960
that runs tools with idempotency
413
00:17:55,960 –> 00:17:57,320
and post-condition checks.
414
00:17:57,960 –> 00:18:00,920
Validator: the bouncer for schema, policy,
415
00:18:00,920 –> 00:18:02,360
and boundary enforcement.
416
00:18:02,360 –> 00:18:04,120
Policy guard: a runtime sidecar
417
00:18:04,120 –> 00:18:06,920
that ties everything to Entra ID scopes and RBAC.
418
00:18:06,920 –> 00:18:08,920
Together, they behave like a competent team
419
00:18:08,920 –> 00:18:10,440
instead of a committee thread.
420
00:18:10,440 –> 00:18:12,360
Data access starts with Graph, not guesswork.
421
00:18:12,360 –> 00:18:14,200
The retrieval agent holds an app registration
422
00:18:14,200 –> 00:18:16,280
with granular scopes: Files.Read
423
00:18:16,280 –> 00:18:17,880
for a specific site, Mail.ReadBasic
424
00:18:17,880 –> 00:18:20,280
for a confined mailbox, Calendars.Read
425
00:18:20,280 –> 00:18:21,880
for a resource calendar.
426
00:18:21,880 –> 00:18:24,360
No Graph ReadWrite.All heroics.
427
00:18:24,360 –> 00:18:26,200
Queries use delta and selective fields
428
00:18:26,200 –> 00:18:27,400
to keep payloads thin.
429
00:18:27,960 –> 00:18:28,920
Paging is first class.
430
00:18:28,920 –> 00:18:31,160
The executor follows next links deterministically
431
00:18:31,160 –> 00:18:33,480
with timeouts, honoring service throttling.
432
00:18:33,480 –> 00:18:35,000
And when 429s happen,
433
00:18:35,000 –> 00:18:36,360
backoff is mathematical.
434
00:18:36,360 –> 00:18:39,000
No tantrums, just exponential patience.
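The "mathematical" backoff can be sketched as exponential delay with full jitter, capped, and never shorter than a server-supplied Retry-After hint. The function name, defaults, and the injectable `rng` parameter are assumptions for illustration. Only the delay calculation is shown; the HTTP call itself is left out.

```python
import random


def backoff_delays(max_retries, base=1.0, cap=60.0, retry_after=None, rng=random.random):
    """Yield the number of seconds to wait before each retry.

    Uses "full jitter": a random wait between 0 and an exponentially
    growing ceiling, so throttled clients do not retry in lockstep.
    `retry_after` is the value of a Retry-After header, if the service sent one.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))  # 1, 2, 4, 8, ... up to cap
        delay = rng() * ceiling
        if retry_after is not None:
            # Never retry sooner than the service asked us to.
            delay = max(delay, retry_after)
        yield delay
```

A caller would sleep for each yielded delay between attempts and give up, or fall back to a cheaper query, once the generator is exhausted.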
435
00:18:39,000 –> 00:18:40,760
Grounding isn’t a vibe, it’s a pipeline.
436
00:18:40,760 –> 00:18:44,200
Retrieve candidate documents via Graph search or list queries.
437
00:18:44,200 –> 00:18:46,520
Dedupe by item ID and version ETag
438
00:18:46,520 –> 00:18:48,360
so you don’t blend stale and current.
439
00:18:48,360 –> 00:18:50,280
Chunk by semantic boundaries,
440
00:18:50,280 –> 00:18:52,040
section headers, slide breaks,
441
00:18:52,040 –> 00:18:53,400
then attach provenance:
442
00:18:53,400 –> 00:18:55,720
drive, site, path, item ID,
443
00:18:55,720 –> 00:18:57,400
last modified, and a signed hash.
444
00:18:57,400 –> 00:19:00,200
The reasoning agent only sees chunks plus metadata
445
00:19:00,200 –> 00:19:03,480
and is required to output citations mapped back to those IDs.
446
00:19:03,480 –> 00:19:04,840
No citation, no claim.
447
00:19:04,840 –> 00:19:06,840
The executor enforces that as a post-condition
448
00:19:06,840 –> 00:19:08,520
before any outward action.
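Two steps of that pipeline, deduping by item ID and ETag and the "no citation, no claim" post-condition, can be sketched as follows. The `Chunk` fields and the claim format (a dict with `text` and `citations`) are assumptions made for this example, not a Graph or Copilot Studio data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    item_id: str
    etag: str
    last_modified: str  # ISO 8601 UTC, so string comparison orders by time
    text: str


def dedupe_latest(chunks):
    """Keep only chunks that belong to the newest version of each item."""
    newest = {}  # item_id -> (last_modified, etag) of the newest version seen
    for c in chunks:
        current = newest.get(c.item_id)
        if current is None or c.last_modified > current[0]:
            newest[c.item_id] = (c.last_modified, c.etag)
    return [c for c in chunks if newest[c.item_id][1] == c.etag]


def unsupported_claims(claims, chunks):
    """Return claims with no citation, or citing IDs outside the grounded set."""
    known = {c.item_id for c in chunks}
    return [
        claim for claim in claims
        if not claim["citations"] or not set(claim["citations"]) <= known
    ]
```

An executor would refuse any outward action while `unsupported_claims` returns a non-empty list.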
449
00:19:08,520 –> 00:19:10,600
Enter Copilot Studio for orchestration.
450
00:19:10,600 –> 00:19:12,120
You define declarative tools,
451
00:19:12,120 –> 00:19:14,520
Graph query packs, SharePoint write actions,
452
00:19:14,520 –> 00:19:16,200
Teams posts, Outlook sends,
453
00:19:16,200 –> 00:19:19,160
each behind a proxy with explicit schemas and allow lists.
454
00:19:19,160 –> 00:19:21,240
Agent-to-agent coordination is structured.
455
00:19:21,240 –> 00:19:23,880
The retrieval agent exposes a ground tool.
456
00:19:23,880 –> 00:19:26,440
The reasoning agent requests it with parameters.
457
00:19:26,440 –> 00:19:28,680
The executor mediates, validates,
458
00:19:28,680 –> 00:19:30,280
and returns grounded context.
459
00:19:30,280 –> 00:19:33,080
Human checkpoints are native.
460
00:19:33,080 –> 00:19:35,640
A proposed action node pauses the run,
461
00:19:35,640 –> 00:19:39,080
presents the plan plus citations and requires approval.
462
00:19:39,080 –> 00:19:40,440
Approval is signed and logged,
463
00:19:40,440 –> 00:19:43,160
denial routes to a safe alternative or escalation.
464
00:19:43,160 –> 00:19:46,120
Tokens and latency are managed, not wished away.
465
00:19:46,120 –> 00:19:49,080
Selective context means you feed only the relevant chunks,
466
00:19:49,080 –> 00:19:50,600
not your entire tenant.
467
00:19:50,600 –> 00:19:52,280
Summaries are pre-computed and cached
468
00:19:52,280 –> 00:19:55,160
with embeddings keyed by content hash and permissions.
469
00:19:55,160 –> 00:19:56,760
Change the doc, change the key,
470
00:19:56,760 –> 00:19:58,440
miss the cache, recompute.
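One way to build such a key is to hash the content and the caller's permission scopes together, so that a changed document or a different scope set produces a different key. This is a sketch; `cache_key` and its scope normalization are illustrative assumptions.

```python
import hashlib


def cache_key(content: bytes, scopes) -> str:
    """Derive a cache key from document content plus permission scopes.

    Scopes are deduplicated and sorted so the same permissions always map
    to the same key regardless of the order they were listed in.
    """
    h = hashlib.sha256()
    h.update(hashlib.sha256(content).digest())  # fixed-length content hash
    h.update("\n".join(sorted(set(scopes))).encode("utf-8"))
    return h.hexdigest()
```

Including the scopes in the key means a summary computed for one permission set is never served to a caller with a different one.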
471
00:19:58,440 –> 00:20:00,440
Streaming responses keep the UI alive
472
00:20:00,440 –> 00:20:02,280
while the executor handles side effects
473
00:20:02,280 –> 00:20:04,840
only after the full schema-valid plan arrives.
474
00:20:04,840 –> 00:20:07,240
Early exit conditions stop the reasoning loop
475
00:20:07,240 –> 00:20:09,400
when confidence plus coverage hits threshold.
476
00:20:09,400 –> 00:20:12,120
No extra thinking because the model felt poetic.
477
00:20:12,120 –> 00:20:13,960
Auditability is baked in.
478
00:20:13,960 –> 00:20:16,280
Every action is signed by the service principal
479
00:20:16,280 –> 00:20:19,480
or delegated user and stamped with run ID,
480
00:20:19,480 –> 00:20:24,120
tool, parameters (redacted where necessary), scopes, and result.
481
00:20:24,120 –> 00:20:26,520
Immutable logs live in your observability stack
482
00:20:26,520 –> 00:20:28,600
(pick your favorite), so you can replay a run
483
00:20:28,600 –> 00:20:30,680
without re-executing side effects.
484
00:20:30,680 –> 00:20:31,960
"Who did what, when, and why?"
485
00:20:31,960 –> 00:20:33,480
becomes a query, not a witch hunt.
486
00:20:33,480 –> 00:20:35,480
And yes, the citations survive intact
487
00:20:35,480 –> 00:20:37,480
so you can verify that the answer traced
488
00:20:37,480 –> 00:20:39,800
to actual tenant content, not model lore.
489
00:20:39,800 –> 00:20:41,400
Failure is normalized and boring.
490
00:20:41,400 –> 00:20:44,200
429s: the executor retries with jitter,
491
00:20:44,200 –> 00:20:47,320
then falls back to a lower cost query or reduced page size.
492
00:20:47,320 –> 00:20:50,200
Stale cache: the validator detects mismatched ETags
493
00:20:50,200 –> 00:20:51,640
and forces a refresh.
494
00:20:51,640 –> 00:20:54,040
Permission denial: the policy guard denies with reason,
495
00:20:54,040 –> 00:20:56,120
proposes a consent request path, or routes
496
00:20:56,120 –> 00:20:58,440
to a redacted summary that doesn’t leak.
497
00:20:58,440 –> 00:21:00,680
Tool outage: the graph declares alternates
498
00:21:00,680 –> 00:21:02,520
or parks the run in a dead letter queue
499
00:21:02,520 –> 00:21:04,920
with full context for human remediation.
500
00:21:04,920 –> 00:21:08,200
Deterministic fallbacks turn incident into ticket.
501
00:21:08,200 –> 00:21:10,760
Now a very short walkthrough. User asks:
502
00:21:10,760 –> 00:21:14,040
"Draft a summary of last quarter’s roadmap decisions with links."
503
00:21:14,040 –> 00:21:15,400
Reasoning agent proposes:
504
00:21:15,400 –> 00:21:17,720
use Graph search across a specific SharePoint site
505
00:21:17,720 –> 00:21:20,040
and a Teams channel, filter by last quarter,
506
00:21:20,040 –> 00:21:22,040
then synthesize. Validator checks
507
00:21:22,040 –> 00:21:24,760
that the requested scopes match the agent’s role.
508
00:21:24,760 –> 00:21:25,640
They do.
509
00:21:25,640 –> 00:21:28,440
Executor issues Graph calls with paging and field selection,
510
00:21:28,440 –> 00:21:31,880
dedupes by item ID, chunks, and returns context with provenance.
511
00:21:31,880 –> 00:21:35,480
Reasoning produces a summary with inline citations mapped to item IDs.
512
00:21:35,480 –> 00:21:39,080
Validator checks schema and citations, passes.
513
00:21:39,080 –> 00:21:41,400
Human checkpoint appears with summary and evidence.
514
00:21:41,400 –> 00:21:43,320
Approver clicks OK.
515
00:21:43,320 –> 00:21:46,200
Executor posts the result in Teams and emails stakeholders,
516
00:21:46,200 –> 00:21:47,880
each action using idempotency keys,
517
00:21:47,880 –> 00:21:49,320
so retries don’t double post.
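The idempotency-key idea can be sketched as: derive a stable key from the run ID, tool, and parameters, and execute the side effect only if that key has not already succeeded. The class name and the in-memory store are assumptions for illustration. A real executor would need a durable store shared across retries.

```python
import hashlib
import json


class IdempotentExecutor:
    """Runs each (run_id, tool, params) side effect at most once."""

    def __init__(self):
        self._results = {}  # idempotency key -> result of the first success

    def run(self, run_id, tool, params, action):
        # Same run, tool, and parameters always produce the same key.
        payload = json.dumps([run_id, tool, params], sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key not in self._results:
            # If `action` raises, nothing is stored, so a retry re-executes.
            self._results[key] = action(**params)
        return self._results[key]
```

A retry after a timeout then returns the stored result instead of posting to the channel a second time.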
518
00:21:49,320 –> 00:21:51,800
Note the discipline: no agent invents a tool,
519
00:21:51,800 –> 00:21:54,760
no node crosses a domain outside its allow list.
520
00:21:54,760 –> 00:21:58,200
Tokens are scoped and propagated correctly via Entra ID,
521
00:21:58,200 –> 00:22:00,360
not copy-pasted between nodes.
522
00:22:00,360 –> 00:22:02,360
The model never concludes success.
523
00:22:02,360 –> 00:22:05,000
The executor proves it with Graph post-conditions
524
00:22:05,000 –> 00:22:07,960
(created message ID, updated item ETag, calendar event ID),
525
00:22:07,960 –> 00:22:09,320
then stamps the run complete.
526
00:22:09,320 –> 00:22:11,000
And yes, you can extend this safely,
527
00:22:11,000 –> 00:22:15,000
bring in Planner, Viva, or third-party services via MCP or OpenAPI,
528
00:22:15,000 –> 00:22:18,680
but only behind proxy tools with strict schemas and network egress controls.
529
00:22:18,680 –> 00:22:21,640
Wrap every connector with the same validator logic and policy guard.
530
00:22:21,640 –> 00:22:24,360
Promotion between environments requires validation passes
531
00:22:24,360 –> 00:22:26,920
that match the new boundary, dev to test to prod,
532
00:22:26,920 –> 00:22:28,920
with scope increases reviewed, not assumed.
533
00:22:28,920 –> 00:22:31,320
That’s Microsoft’s architecture done properly.
534
00:22:31,320 –> 00:22:32,440
Graph for truth.
535
00:22:32,440 –> 00:22:34,200
Azure OpenAI for thinking,
536
00:22:34,200 –> 00:22:36,280
Copilot Studio for orchestration,
537
00:22:36,280 –> 00:22:37,960
executors for operations,
538
00:22:37,960 –> 00:22:39,960
validators and policy for safety,
539
00:22:39,960 –> 00:22:41,800
and observability for proof.
540
00:22:41,800 –> 00:22:42,920
Numbers next.
541
00:22:42,920 –> 00:22:43,880
Not vibes.
542
00:22:43,880 –> 00:22:45,160
Before/after metrics:
543
00:22:45,160 –> 00:22:48,440
accuracy, latency, cost, admin overhead.
544
00:22:48,440 –> 00:22:49,560
Nice architecture.
545
00:22:49,560 –> 00:22:50,280
Prove it.
546
00:22:50,280 –> 00:22:51,720
Numbers, not vibes.
547
00:22:51,720 –> 00:22:54,440
Baseline first: prompt-only agents are drama queens.
548
00:22:54,440 –> 00:22:58,600
Accuracy is inconsistent because they invent sources and forget citations.
549
00:22:58,600 –> 00:23:01,640
Without grounding, you get confident prose that points nowhere.
550
00:23:01,640 –> 00:23:03,800
Tail latency is brutal.
551
00:23:03,800 –> 00:23:06,440
One long chain of serial "think harder" calls,
552
00:23:06,440 –> 00:23:08,760
each bloated with redundant context.
553
00:23:08,760 –> 00:23:10,840
Cost spirals because every turn
554
00:23:10,840 –> 00:23:13,720
ships full transcripts and raw documents back to the model
555
00:23:13,720 –> 00:23:15,400
like an overpaid courier service.
556
00:23:15,400 –> 00:23:16,760
Admin overhead?
557
00:23:16,760 –> 00:23:17,880
High.
558
00:23:17,880 –> 00:23:20,040
Incidents, hotfixes, mystery failures,
559
00:23:20,040 –> 00:23:22,680
and audits that feel like archaeology with a blindfold.
560
00:23:22,680 –> 00:23:25,560
Now the after-state with executors and validated graphs.
561
00:23:25,560 –> 00:23:27,880
Accuracy jumps because claims require receipts.
562
00:23:27,880 –> 00:23:30,840
Grounded citations tied to Graph item IDs,
563
00:23:30,840 –> 00:23:32,920
ETags, and signed hashes mean an answer
564
00:23:32,920 –> 00:23:35,480
that lacks provenance simply doesn’t pass the validator.
565
00:23:35,480 –> 00:23:36,600
The effect is immediate.
566
00:23:36,600 –> 00:23:37,960
Fewer wrong answers shipped.
567
00:23:37,960 –> 00:23:42,280
Fewer rework loops and a measurable lift in task success rates on eval sets.
568
00:23:42,280 –> 00:23:43,800
When a claim can’t be supported,
569
00:23:43,800 –> 00:23:47,000
the agent denies with reason or requests human approval,
570
00:23:47,000 –> 00:23:49,320
predictable, reviewable, safe.
571
00:23:49,320 –> 00:23:51,320
Latency compresses for three reasons.
572
00:23:51,320 –> 00:23:53,560
First, parallelism: retrieval, re-ranking,
573
00:23:53,560 –> 00:23:55,640
and citation extraction run side by side,
574
00:23:55,640 –> 00:23:57,320
then synchronize at a barrier node.
575
00:23:57,320 –> 00:23:59,160
Second, caching: embeddings and summaries
576
00:23:59,160 –> 00:24:00,920
keyed by content hash and permission scope
577
00:24:00,920 –> 00:24:03,400
avoid recomputing what hasn’t changed.
578
00:24:03,400 –> 00:24:04,760
Third, early exit.
579
00:24:04,760 –> 00:24:07,960
Once coverage and confidence hit threshold, the graph stops the loop.
580
00:24:07,960 –> 00:24:09,640
Compare that to serial prompting,
581
00:24:09,640 –> 00:24:11,640
where the model reflects for a paragraph
582
00:24:11,640 –> 00:24:13,080
and your users reflect on quitting.
583
00:24:13,080 –> 00:24:16,280
Cost drops because token discipline is enforced, not begged.
584
00:24:16,280 –> 00:24:18,680
Schema-constrained outputs prevent rambling.
585
00:24:18,680 –> 00:24:21,000
Selective context feeds only the relevant chunks
586
00:24:21,000 –> 00:24:23,720
with metadata, not entire sites.
587
00:24:23,720 –> 00:24:26,440
Shorter prompts, smaller responses, fewer retries.
588
00:24:26,440 –> 00:24:28,600
The executor’s idempotency and backoff logic
589
00:24:28,600 –> 00:24:31,320
avoid duplicate calls and wasted cycles.
590
00:24:31,320 –> 00:24:34,040
The net effect is fewer tokens per successful outcome
591
00:24:34,040 –> 00:24:37,080
and far less variance. Finance likes variance reduction.
592
00:24:37,080 –> 00:24:39,560
Admin overhead shrinks because observability is engineered,
593
00:24:39,560 –> 00:24:40,680
not improvised.
594
00:24:40,680 –> 00:24:42,360
Node-level traces and immutable logs
595
00:24:42,360 –> 00:24:44,520
collapse incident time to diagnose.
596
00:24:44,520 –> 00:24:47,240
You see which node failed, why the policy engine denied,
597
00:24:47,240 –> 00:24:49,480
and what the executor tried next.
598
00:24:49,480 –> 00:24:52,680
Repeatable deployments cut "works on my machine" theater.
599
00:24:52,680 –> 00:24:54,600
Compliance stops being a seasonal crisis
600
00:24:54,600 –> 00:24:58,680
because every run already contains who, what, when, and why.
601
00:24:58,680 –> 00:25:00,440
Let’s make this concrete with a measurement rig
602
00:25:00,440 –> 00:25:01,880
you can actually run.
603
00:25:01,880 –> 00:25:03,800
Build an eval set of representative tasks:
604
00:25:03,800 –> 00:25:06,760
Q&A with citations, summary with links, and action proposals
605
00:25:06,760 –> 00:25:07,720
with approvals.
606
00:25:07,720 –> 00:25:11,160
For each, define golden answers or acceptance criteria:
607
00:25:11,160 –> 00:25:13,240
correct facts with mapped item IDs,
608
00:25:13,240 –> 00:25:15,720
citation coverage, and allowed variance.
609
00:25:15,720 –> 00:25:18,840
Instrument SLOs: P50 and P95 end-to-end latency,
610
00:25:18,840 –> 00:25:21,560
token spend per successful task, and policy deny rates.
611
00:25:21,560 –> 00:25:23,880
Link every metric to traces, so any regression
612
00:25:23,880 –> 00:25:25,240
has a breadcrumb trail.
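A minimal version of that rig's summary step might look like this. The run record fields (`latency_ms`, `tokens`, `success`, `denied`) are assumed names. Percentiles use the nearest-rank method, and "tokens per successful task" is taken here to mean all tokens spent divided by the number of successful runs.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(runs):
    """Aggregate SLO metrics from a non-empty list of run records."""
    latencies = [r["latency_ms"] for r in runs]
    successes = sum(1 for r in runs if r["success"])
    total_tokens = sum(r["tokens"] for r in runs)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        # Failed runs still cost tokens, so they count in the numerator.
        "tokens_per_success": total_tokens / successes if successes else None,
        "deny_rate": sum(1 for r in runs if r["denied"]) / len(runs),
    }
```

Storing a trace ID alongside each run record is what lets a regression in any of these numbers be followed back to the node that caused it.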
613
00:25:25,240 –> 00:25:27,880
Results you should expect if you follow the architecture,
614
00:25:27,880 –> 00:25:29,080
not improvise:
615
00:25:29,080 –> 00:25:31,560
Higher answer validity because unsupported claims
616
00:25:31,560 –> 00:25:33,320
never leave staging.
617
00:25:33,320 –> 00:25:36,840
Lower P95 latency because long tails get sliced
618
00:25:36,840 –> 00:25:39,240
by parallel nodes and early exits.
619
00:25:39,240 –> 00:25:41,720
Lower token spend because you stop shipping novels
620
00:25:41,720 –> 00:25:44,040
and start shipping relevant snippets.
621
00:25:44,040 –> 00:25:46,280
Fewer pages to admins because most failures
622
00:25:46,280 –> 00:25:48,520
get handled by deterministic fallbacks.
623
00:25:48,520 –> 00:25:51,240
And yes, the boring metric everyone forgets.
624
00:25:51,240 –> 00:25:53,240
Successful first pass completion ratio,
625
00:25:53,240 –> 00:25:55,160
more runs finish without human rescue.
626
00:25:55,160 –> 00:25:56,840
Business impact: faster resolutions mean
627
00:25:56,840 –> 00:25:58,680
users stop opening duplicate tickets.
628
00:25:58,680 –> 00:26:00,360
Predictable spend means budgeting
629
00:26:00,360 –> 00:26:01,880
without surprise token hangovers.
630
00:26:01,880 –> 00:26:04,120
Compliance confidence means fewer audit cycles
631
00:26:04,120 –> 00:26:05,720
hijacking your roadmap.
632
00:26:05,720 –> 00:26:08,440
The non-obvious win is reputational.
633
00:26:08,440 –> 00:26:10,840
When the agent’s answers cite tenant content
634
00:26:10,840 –> 00:26:14,120
and the links work, people trust it, use it,
635
00:26:14,120 –> 00:26:16,600
and stop forwarding screenshots that begin with
636
00:26:16,600 –> 00:26:18,120
"why did it say this?"
637
00:26:18,120 –> 00:26:21,320
Direct, imperative advice: measure trace-to-metric linkage
638
00:26:21,320 –> 00:26:22,680
or you’re flying on opinion.
639
00:26:22,680 –> 00:26:25,000
If you can’t open a run and see exactly
640
00:26:25,000 –> 00:26:27,480
which node inflated tokens or stalled latency,
641
00:26:27,480 –> 00:26:30,360
you don’t have observability, you have vibes with timestamps.
642
00:26:30,360 –> 00:26:32,520
Everything changes when the validator sits between
643
00:26:32,520 –> 00:26:34,840
"nice plan" and "real action."
644
00:26:34,840 –> 00:26:37,480
Accuracy stabilizes, latency narrows,
645
00:26:37,480 –> 00:26:40,040
cost flattens, admins sleep. That’s not magic,
646
00:26:40,040 –> 00:26:42,920
that’s executors, graphs and validation
647
00:26:42,920 –> 00:26:45,640
doing the work you incorrectly assigned to prompts.