
AI Trust Centre

The AI Trust Centre is your durable evaluation surface for a v3 bot — separate from the day-to-day Playground on each agent. The Playground answers "does this conversation work?"; the Trust Centre answers "is the bot, overall, getting better or worse?" and "what should I fix next?".

It lives in the top-level studio nav (AI Trust Centre), alongside AI Agents and Automation. This section documents the four surfaces that are live today.

Documented pages

| Page | What it does |
| --- | --- |
| Testing Lab | Where you build test cases, group them into datasets, and run them. Scenario AI-generation + Import Content, assertion picker, run history. |
| Evaluators & Rules | The scoring criteria every run uses. 10 evaluators (7 Quality + 3 Safety) with tunable thresholds, plus hard invariant Rules. |
| Test Case | Per-case detail: conversation paired with the execution trace that produced it, picked assertions, edit + re-run. |
| Reports | Per-run breakdown: filter, evaluator rules applied, individual simulation results, saved views. |
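To make the split between tunable thresholds and hard Rules concrete, here is a minimal sketch of how a single turn might be judged. It is an illustration only, not the product's scoring logic: the 0.0 to 1.0 score scale, the "higher is better" convention, the threshold values, the `TurnResult` type, and the example rule text are all assumptions. Only the evaluator names (Hallucination, Accuracy, Empathy) come from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical thresholds on an assumed 0.0-1.0 scale where higher is better.
# The evaluator names appear on this page; the numbers are illustrative only.
THRESHOLDS = {"Hallucination": 0.9, "Accuracy": 0.8, "Empathy": 0.5}


@dataclass
class TurnResult:
    """Scores and Rule violations recorded for one bot turn (hypothetical shape)."""

    scores: dict[str, float]
    rule_violations: list[str] = field(default_factory=list)


def turn_passes(turn: TurnResult, thresholds: dict[str, float]) -> bool:
    """Pass only if no Rule is broken and every score meets its threshold."""
    if turn.rule_violations:
        # Rules are hard invariants: one violation fails the turn outright,
        # no matter how high the evaluator scores are.
        return False
    # Assumption of this sketch: a missing score counts as 0.0 and so fails.
    return all(
        turn.scores.get(name, 0.0) >= minimum
        for name, minimum in thresholds.items()
    )


good = TurnResult(scores={"Hallucination": 0.95, "Accuracy": 0.85, "Empathy": 0.6})
weak = TurnResult(scores={"Hallucination": 0.95, "Accuracy": 0.70, "Empathy": 0.9})
blocked = TurnResult(
    scores={"Hallucination": 1.0, "Accuracy": 1.0, "Empathy": 1.0},
    rule_violations=["hypothetical rule: never quote a price"],
)

for label, turn in [("good", good), ("weak", weak), ("blocked", blocked)]:
    print(label, turn_passes(turn, THRESHOLDS))
```

In this sketch `weak` fails because Accuracy falls below its threshold, while `blocked` fails despite perfect scores because a Rule was violated. Tuning a threshold changes the outcome of the first kind of failure but never the second.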

What's live vs in-flight

| Surface | Status |
| --- | --- |
| Testing Lab | ✅ Live |
| Evaluators & Rules | ✅ Live |
| Test Case detail | ✅ Live |
| Reports | ✅ Live |
| Overview | 🟡 V1 in flight: wired in the studio, runs on mock data. Documentation will land alongside the v1 backend (persisted Trust Score with formula_version, snapshotId binding on BulkSimulationReport, append-only IssueActivity log, manual-fix-guide content pipeline). |
| Action Center | 🟡 V1 in flight: same gating as Overview. |

How the Trust Centre fits the build loop

┌────────────────────┐
│     Playground     │  ← rapid iteration on each agent
└────────┬───────────┘
         │ promote a representative conversation
         ▼
┌────────────────────┐
│    Testing Lab     │  ← captured as a test case in a dataset
└────────┬───────────┘
         │ run the dataset (with Evaluators + Rules scoring each turn)
         ▼
┌────────────────────┐
│      Reports       │  ← per-run results, evaluator rules applied
└────────────────────┘


(Trust Score, Issues, Action Center triage — documented when v1 lands)

Use the Playground to try, the Trust Centre to measure and triage.
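The loop above can be pictured as a small data flow. The sketch below is a mental model only, not the Trust Centre's API or storage schema: `Case`, `Dataset`, `promote`, and `run_dataset` are hypothetical names, and the rule that a case passes only when every turn passes is an assumption. The `turn_passes` callback stands in for calling the bot and scoring the turn with Evaluators + Rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Case:
    """A conversation captured as a test case (hypothetical shape)."""

    name: str
    user_turns: list[str]


@dataclass
class Dataset:
    """A named group of test cases that is run together."""

    name: str
    cases: list[Case] = field(default_factory=list)

    def promote(self, name: str, conversation: list[str]) -> Case:
        """Capture a representative Playground conversation as a test case."""
        case = Case(name=name, user_turns=list(conversation))
        self.cases.append(case)
        return case


def run_dataset(
    dataset: Dataset, turn_passes: Callable[[str], bool]
) -> dict[str, bool]:
    """Run every case and return a per-case pass/fail report.

    Assumption of this sketch: a case passes only if each of its turns passes.
    """
    return {
        case.name: all(turn_passes(turn) for turn in case.user_turns)
        for case in dataset.cases
    }


regression = Dataset(name="regression")
regression.promote("golden path", ["Hi", "I want to check my order status"])
regression.promote("edge case", ["Hi", ""])

# Toy scorer for the demo: an empty user turn counts as a failure.
report = run_dataset(regression, turn_passes=lambda turn: turn.strip() != "")
failed = [name for name, passed in report.items() if not passed]
print(report)
print("failed:", failed)
```

Keeping the scorer as a pluggable callback mirrors the separation on this page: the dataset says what to replay, while Evaluators & Rules decide what counts as a pass.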

Starter prompts for Copilot Nexus

The right-hand Copilot Nexus panel in the studio can answer questions about your bot's Trust Centre state. Try:

💡 Try Copilot Nexus: "Give me a starter regression-test plan for my v3 bot — 10 cases covering golden path, routing rules, and edge cases."

💡 Try Copilot Nexus: "Summarise my latest run — which test cases failed and what do they have in common?"

💡 Try Copilot Nexus: "Recommend evaluator thresholds for a high-stakes bot — stricter on Hallucination and Accuracy, more permissive on Empathy."

  • Testing Lab — start here if you're building the regression dataset.
  • Evaluators & Rules — start here if you need to tune what counts as a pass.
  • Test Case — start here to inspect a single case's conversation and trace.
  • Reports — start here when you want to see what happened in a specific run.