
This is a new area for me…
I've spent years working with Dynamics 365 CE, Power Platform, and everything around them, and I will keep doing that 😉 But the world keeps changing and I keep moving toward where the interesting problems are. 😁
Right now, that seems to be agents. Not just building them, but figuring out whether they actually work before a real customer has to find out they don't.
Because here's what I've noticed: the internet is full of content on how to build agents. How to extend them. How to connect them to your data. How to give them tools and memory and a persona. That's all useful, but almost nobody is talking about what happens before you hand one to a customer. What does it mean to test something that doesn't behave the same way twice? How do you verify that a system making decisions on your behalf is actually making the right ones?
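To make that question concrete: one common answer is to stop asserting on exact outputs and start asserting on *properties* of the output. Here's a minimal sketch of that idea. Note that `fake_agent` is a hypothetical stand-in for illustration, not a real Copilot Studio or Foundry API; a real agent would word its answer differently on every call, which is exactly what this pattern tolerates.

```python
# Property-based checks for a non-deterministic agent (illustrative sketch).
# fake_agent simulates an LLM-backed agent: same question, varying wording.
import random

def fake_agent(question: str) -> str:
    templates = [
        "Your order #1234 ships on Friday.",
        "Order #1234 is scheduled to ship Friday.",
        "We expect order #1234 to ship on Friday.",
    ]
    return random.choice(templates)

def check_properties(answer: str) -> list[str]:
    """Check properties of the answer, not its exact wording."""
    failures = []
    if "#1234" not in answer:
        failures.append("missing order number")
    if "Friday" not in answer:
        failures.append("missing ship date")
    if len(answer) > 200:
        failures.append("answer too long")
    return failures

# Run the same question several times: each run may phrase the answer
# differently, but every run must satisfy the same properties.
for _ in range(10):
    answer = fake_agent("When does my order ship?")
    failures = check_properties(answer)
    assert not failures, failures
print("all runs passed the property checks")
```

The design choice here is the whole point: because the system doesn't behave the same way twice, the test defines what must *always* be true rather than what the output must *exactly* be.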
So I decided to look into that part. And I am learning. And this is the beginning of a series, so I'm bringing you all on the journey with me 😉
I am presenting a session called "Trust, But Verify" at Agent Academy Live (extremely excited, by the way 🤩) focused on testing Copilot Studio agents. To support this, I decided to dive deeper into the features available.
But Copilot Studio is just one piece of the puzzle. There are three places to build agents, each with its own approach, its own agent types, and its own testing story.
So I decided to go all in 😁
Copilot Studio is the low-code platform most people start with. It supports conversational agents built on topics, autonomous agents that react to business events, and structured workflow agents. It has the most mature built-in testing tooling, and it recently got a huge upgrade when Agent Evaluation became generally available.
Agent Builder lives inside Microsoft 365 Copilot. It's where you create declarative agents, which are agents that use M365 Copilot's own orchestrator and models, but scoped to a specific job, persona, or knowledge base. Building one takes minutes. Testing is a different matter.
Microsoft Foundry is the developer platform. If you're writing code, using the Azure AI SDK, building multi-agent systems or connecting to Azure OpenAI directly, this is your space. The evaluation tooling here is the most powerful and the most complex 😅
The series is organized in three parts, one per platform: Copilot Studio, Agent Builder, and Foundry.
I try to keep each post short and practical. I'd rather write ten posts you actually read than one book you bookmark and never open. 😁 But let's see how this works out 😁
No matter which platform you're on, a few things hold true across all of them:
Testing should not be a phase at the end of development, or an afterthought when things don't work out as you hoped. It's the thing that tells you whether what you built actually works. Wouldn't you want to figure that out before go-live? 😁
I will start the series with Copilot Studio, as it builds nicely on the Agent Academy Live session and because most people reading this are likely already building or managing agents there. Then we'll move to Agent Builder, then Foundry (we'll see how that one goes, I might ask for some help 😉).
Let's get into it.
Next up: The different agent types in Copilot Studio and what breaks them.
Check out Vivian Voss's original post at https://vivian.tiiman.com/everyones-building-agents-whos-testing-them/ on vivian.tiiman.com, published 2026-05-12.