{Do you know} Evaluate test sets with multiple graders in Copilot Studio

By Malla Reddy Gurram, Dyn365CE

Hello Everyone,

Today I am going to share my thoughts on evaluating test sets with multiple graders in Copilot Studio.

Let’s get started.

Yes, Copilot Studio now supports evaluating a single test set with multiple graders in one run. The feature is listed as a public preview in the 2025 release wave 2 plan, with availability starting February 8, 2026.

What it does:
  • You can attach several graders to the same test set, such as general quality, text similarity, and exact match.
  • Each grader can have its own pass criteria.
  • When you run the evaluation, Copilot Studio applies all selected graders to every test case in that run.
  • Results show up as separate columns per grader, plus an evaluation summary with aggregated results.
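
To make the idea concrete, here is a minimal Python sketch of what "several graders over one test set in a single run" means conceptually. This is not Copilot Studio's API or its grading logic: the `Grader` class, the `run_evaluation` function, the two scoring functions, and the thresholds are all hypothetical names and values I made up for illustration. It only shows the shape of the result, one pass/fail column per grader plus an aggregated summary.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class Grader:
    """Hypothetical grader: a name, a scoring function, and its own pass criteria."""
    name: str
    score: Callable[[str, str], float]  # (actual, expected) -> score in [0, 1]
    pass_threshold: float


def exact_match(actual: str, expected: str) -> float:
    # 1.0 only when the two strings are identical after trimming whitespace.
    return 1.0 if actual.strip() == expected.strip() else 0.0


def text_similarity(actual: str, expected: str) -> float:
    # Stand-in similarity measure using the standard library; the product's
    # own similarity grader may work very differently.
    return SequenceMatcher(None, actual.lower(), expected.lower()).ratio()


def run_evaluation(cases, graders):
    """Apply every grader to every test case in one pass."""
    rows = []
    for case in cases:
        row = {"question": case["question"]}
        for grader in graders:
            score = grader.score(case["actual"], case["expected"])
            row[grader.name] = score >= grader.pass_threshold  # one column per grader
        rows.append(row)
    # Summary: pass rate per grader across the whole test set.
    summary = {g.name: sum(r[g.name] for r in rows) / len(rows) for g in graders}
    return rows, summary


# Made-up sample data, for illustration only.
cases = [
    {"question": "What are your opening hours?",
     "actual": "We are open 9 to 5.",
     "expected": "We are open 9 to 5."},
    {"question": "Where are you located?",
     "actual": "We are in London, UK.",
     "expected": "We are located in London."},
]
graders = [
    Grader("exact_match", exact_match, pass_threshold=1.0),
    Grader("text_similarity", text_similarity, pass_threshold=0.6),
]

rows, summary = run_evaluation(cases, graders)
for row in rows:
    print(row)
print(summary)
```

In this sample, the second case fails the exact match grader but passes the similarity grader, which is exactly the kind of difference you would spot by comparing the grader columns side by side.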

Why this helps:
  • You can assess different aspects of agent quality in one execution instead of rerunning the same test set multiple times.
  • Microsoft’s guidance also recommends combining multiple evaluation approaches rather than relying on a single grading method.

Related limits and setup:
  • Test sets can contain up to 100 test cases.
  • You can create test sets by generating them in Copilot Studio, importing a .csv or .txt file, writing cases manually, or using production data themes.
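
If you prepare a .csv file for import, it can be handy to check it against the 100 test case limit before uploading. The sketch below is only an assumption about how you might do that locally in Python. The column names `question` and `expected_response` are hypothetical and are not taken from the product's import template, so adjust them to whatever format Copilot Studio expects.

```python
import csv
import io

MAX_TEST_CASES = 100  # limit mentioned above


def load_test_set(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into test cases and enforce the test case limit."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cases = list(reader)  # DictReader skips fully blank lines
    if len(cases) > MAX_TEST_CASES:
        raise ValueError(
            f"Test set has {len(cases)} cases; the limit is {MAX_TEST_CASES}."
        )
    return cases


# Hypothetical file content with made-up column names.
sample = (
    "question,expected_response\n"
    "What are your opening hours?,We are open 9 to 5.\n"
    "Where are you located?,We are located in London.\n"
)
cases = load_test_set(sample)
print(len(cases), "test cases loaded")
```

For a real file, you could pass the result of `open(path, newline="", encoding="utf-8").read()` to `load_test_set`.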

To use it in the product:
  1. Go to your agent’s Evaluation page.
  2. Create or open a test set.
  3. Add multiple graders to the test set.
  4. Define pass thresholds for each grader.
  5. Run the evaluation and compare the grader-specific result columns and summary.

That’s it for today.
I hope this helps.
Malla Reddy Gurram aka @UK365GUY

Original Post https://mscrmgmr.blogspot.com/2026/03/do-you-knowevaluate-test-sets-with.html
