AI Trust Centre
The AI Trust Centre is your durable evaluation surface for a v3 bot — separate from the day-to-day Playground on each agent. The Playground answers "does this conversation work?"; the Trust Centre answers "is the bot, overall, getting better or worse?" and "what should I fix next?".
It lives in the top-level studio nav (AI Trust Centre), alongside AI Agents and Automation. This section documents the four surfaces that are live today.
Documented pages
| Page | What it does |
|---|---|
| Testing Lab | Where you build test cases, group them into datasets, and run them. Includes AI-generated scenarios, Import Content, an assertion picker, and run history. |
| Evaluators & Rules | The scoring criteria every run uses. 10 evaluators (7 Quality + 3 Safety) with tunable thresholds, plus hard invariant Rules. |
| Test Case | Per-case detail: the conversation paired with the execution trace that produced it, the assertions picked for the case, and edit and re-run controls. |
| Reports | Per-run breakdown: filters, the evaluator rules applied, individual simulation results, and saved views. |
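As a rough illustration of how tunable thresholds and hard Rules might combine into a pass/fail verdict, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Evaluator` and `Rule` types, the 0-to-1 score scale where higher is better, the threshold values, the `no_card_numbers` rule, and the `judge` function are hypothetical and do not describe the product's actual scoring API. Only the evaluator names (Hallucination, Accuracy, Empathy) come from this page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Evaluator:
    """Hypothetical evaluator: passes when its score meets the threshold."""
    name: str
    threshold: float  # assumed scale: 0.0 (worst) to 1.0 (best)


@dataclass(frozen=True)
class Rule:
    """Hypothetical hard invariant: any violation fails the case outright."""
    name: str
    check: Callable[[str], bool]  # returns True when the reply is acceptable


def judge(reply: str, scores: Dict[str, float],
          evaluators: List[Evaluator], rules: List[Rule]) -> dict:
    """Combine threshold checks and hard rules into one verdict."""
    broken_rules = [r.name for r in rules if not r.check(reply)]
    below_threshold = [
        e.name for e in evaluators
        if scores.get(e.name, 0.0) < e.threshold  # a missing score counts as 0.0
    ]
    return {
        "passed": not broken_rules and not below_threshold,
        "broken_rules": broken_rules,
        "below_threshold": below_threshold,
    }


# Illustrative values only: stricter on Hallucination and Accuracy,
# more permissive on Empathy.
evaluators = [
    Evaluator("Hallucination", 0.9),
    Evaluator("Accuracy", 0.9),
    Evaluator("Empathy", 0.5),
]
rules = [Rule("no_card_numbers", lambda reply: "card number" not in reply.lower())]

verdict = judge(
    reply="Your refund was issued yesterday.",
    scores={"Hallucination": 0.95, "Accuracy": 0.85, "Empathy": 0.6},
    evaluators=evaluators,
    rules=rules,
)
print(verdict)
```

In this sketch a rule violation fails the case regardless of the scores, which is one way to read "hard invariant"; the product may define it differently.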
What's live vs in-flight
| Surface | Status |
|---|---|
| Testing Lab | ✅ Live |
| Evaluators & Rules | ✅ Live |
| Test Case detail | ✅ Live |
| Reports | ✅ Live |
| Overview | 🟡 V1 in flight — wired in the studio, runs on mock data. Documentation will land alongside the v1 backend (persisted Trust Score with formula_version, snapshotId binding on BulkSimulationReport, append-only IssueActivity log, manual-fix-guide content pipeline). |
| Action Center | 🟡 V1 in flight — same gating as Overview. |
How the Trust Centre fits the build loop
┌────────────────────┐
│ Playground         │  ← rapid iteration on each agent
└────────┬───────────┘
         │ promote a representative conversation
         ▼
┌────────────────────┐
│ Testing Lab        │  ← captured as a test case in a dataset
└────────┬───────────┘
         │ run the dataset (with Evaluators + Rules scoring each turn)
         ▼
┌────────────────────┐
│ Reports            │  ← per-run results, evaluator rules applied
└────────┬───────────┘
         │
         ▼
(Trust Score, Issues, Action Center triage — documented when v1 lands)
Use the Playground to try, the Trust Centre to measure and triage.
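The loop above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the studio's API: `RegressionCase`, `Dataset`, `run_dataset`, and the `fake_bot` stand-in are hypothetical names, and the pass check here is a simple substring match rather than the Evaluators and Rules the Trust Centre applies.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RegressionCase:
    """Hypothetical test case promoted from a Playground conversation."""
    name: str
    user_message: str
    expected_phrase: str  # stand-in for a picked assertion


@dataclass
class Dataset:
    """Hypothetical dataset grouping related test cases."""
    name: str
    cases: List[RegressionCase] = field(default_factory=list)


def run_dataset(dataset: Dataset, bot: Callable[[str], str]) -> dict:
    """Run every case against the bot and build a per-run report."""
    results = []
    for case in dataset.cases:
        reply = bot(case.user_message)
        results.append({
            "case": case.name,
            "passed": case.expected_phrase.lower() in reply.lower(),
        })
    failed = [r["case"] for r in results if not r["passed"]]
    total = len(results)
    return {
        "dataset": dataset.name,
        "total": total,
        "failed": failed,
        "pass_rate": (total - len(failed)) / total if total else 0.0,
    }


def fake_bot(message: str) -> str:
    """Stand-in for a real agent; always answers about refunds."""
    return "Refunds are processed within 5 business days."


regression = Dataset("regression", [
    RegressionCase("refund_policy", "How long do refunds take?", "5 business days"),
    RegressionCase("opening_hours", "When are you open?", "9am"),
])
report = run_dataset(regression, fake_bot)
print(report)
```

Comparing `pass_rate` across runs of the same dataset is one simple way to approach the "better or worse?" question in this toy model.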
Starter prompts for Copilot Nexus
The right-hand Copilot Nexus panel in the studio can answer questions about your bot's Trust Centre state. Try:
💡 Try Copilot Nexus: "Give me a starter regression-test plan for my v3 bot — 10 cases covering golden path, routing rules, and edge cases."
💡 Try Copilot Nexus: "Summarise my latest run — which test cases failed and what do they have in common?"
💡 Try Copilot Nexus: "Recommend evaluator thresholds for a high-stakes bot — stricter on Hallucination and Accuracy, more permissive on Empathy."
Read next
- Testing Lab — start here if you're building the regression dataset.
- Evaluators & Rules — start here if you need to tune what counts as a pass.
- Test Case — start here to inspect a single case's conversation and trace.
- Reports — start here when you want to see what happened in a specific run.