Test your v3 Agent
Nexus gives you two complementary places to test:
- Playground — interactive, single-conversation testing on every agent's profile. Best while you build. Documented on this page.
- AI Trust Centre — durable, dataset-driven evaluation. Best before each release. Documented surfaces: Testing Lab, Evaluators & Rules, Test Case detail, and Reports. (Overview and Action Center are wired in the studio but run on mock data; their docs land with the v1 backend.)
Use both — they answer different questions.
Playground — try the bot as a user would
Inside any agent's profile, click the play (▶) icon in the title bar to open the Playground — a side panel on the left where you chat with the bot exactly as a customer would. This is your fastest feedback loop while building.
The Playground gives you:
- The bot's actual welcome message and quick-reply chips.
- A live chat input — type a message, hit Enter, watch the agent respond in real time.
- Per-message action icons (copy, voice playback, 👍 / 👎) for quick feedback.
- A voice/mic toggle for testing voice flows without leaving the page.
Don't confuse the Playground with Copilot. The right-hand "How can I help you today?" panel on an agent's profile is Copilot — an AI assistant for you, the builder. Use Copilot to ask questions about the bot's configuration, generate suggestions, or scaffold logic. The Playground is for acting as the end-user and seeing how your bot actually replies.

Step-by-step: try the bot
- Open any agent from AI Agent → Agents.
- Click the play (▶) icon in the title bar — the Playground opens on the left.
- Use one of the welcome quick-reply chips, or type a message and press Enter.
- Watch the agent respond. If voice is enabled for the bot, tap the speaker icon on a bot message to hear the TTS playback.
- Use 👍 / 👎 on individual messages to flag responses that looked good or bad — useful when you come back later to figure out what to fix.
Step-by-step: verify a routing rule fires
- Open the agent whose routing you want to test, then click the play (▶) icon to open the Playground.
- Send a message that should match the rule (e.g. "I need a refund").
- Confirm that the expected agent or tool takes over: the persona and response pattern of the reply should match it.
- If the wrong thing fires, go back to Routing Logic and tighten the rule, or sharpen the agent's Trigger.
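Once you are checking more than a handful of rules, it can help to write the expectations down so the same checks are repeatable. The Python sketch below is only an illustration of that idea: the agent names, the `check_routing` helper, and the `keyword_router` stand-in are all hypothetical and are not part of Nexus. In practice the `route` callable would be replaced by whatever you use to observe which agent answered.

```python
from typing import Callable, List, Tuple

# (message to send, agent expected to take over). Both columns are made-up examples.
EXPECTATIONS: List[Tuple[str, str]] = [
    ("I need a refund", "refunds"),
    ("Where is my order?", "order_tracking"),
    ("Tell me a joke", "fallback"),
]


def check_routing(expectations: List[Tuple[str, str]],
                  route: Callable[[str], str]) -> List[str]:
    """Return one readable line per mismatch; an empty list means every rule fired as expected."""
    mismatches = []
    for message, expected in expectations:
        actual = route(message)
        if actual != expected:
            mismatches.append(f"{message!r}: expected {expected}, got {actual}")
    return mismatches


def keyword_router(message: str) -> str:
    """Toy stand-in for the real bot so the sketch runs on its own (hypothetical)."""
    text = message.lower()
    if "refund" in text:
        return "refunds"
    if "order" in text:
        return "order_tracking"
    return "fallback"


print(check_routing(EXPECTATIONS, keyword_router))  # prints []
```

Keeping the expectations in a list like this also gives you a ready-made set of cases to carry over when you later build a dataset in the Testing Lab.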
Step-by-step: verify a widget renders
- Make sure your bot has v3 agents enabled — widgets only render in v3 conversations.
- In the Playground, send a message that triggers a workflow node hosting your widget.
- The widget renders inline in the Playground chat. Interact with it (fill the form, click the button); the output flows back into the conversation as the next user input.
AI Trust Centre — durable evaluation
The AI Trust Centre is the durable side of testing. Currently documented: Testing Lab, Evaluators & Rules, Test Case detail, and Reports; see the AI Trust Centre section index for the full walkthrough. (Docs for Overview and Action Center land alongside their v1 backend.)
The 30-second version:
| Sub-page | When you open it |
|---|---|
| Testing Lab | "I need to capture / curate / run test cases." — Scenario AI-generation, Import Content, assertion picker. |
| Evaluators & Rules | "Tune what counts as a pass." — 10 evaluators (7 Quality + 3 Safety) + hard invariant Rules. |
| Test Case | "Why did this specific case fail?" — conversation paired with the trace that produced it. |
| Reports | "What happened in this run?" — per-run breakdown with evaluator scores per simulation. |
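To make the difference between scored evaluators and hard invariant Rules concrete, here is a minimal Python sketch of one way a pass/fail verdict could be derived. It is not the Nexus implementation: the evaluator names, the 0-to-1 score scale, the thresholds, and the `verdict` function are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CaseResult:
    """Hypothetical result of one simulated conversation."""
    scores: Dict[str, float]  # evaluator name -> score in [0, 1] (assumed scale)
    rule_violations: List[str] = field(default_factory=list)  # hard rules that were broken


def verdict(result: CaseResult, thresholds: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Return (passed, reasons).

    A broken hard rule fails the case outright, whatever the scores are.
    An evaluator fails the case only when its score is below its threshold.
    """
    reasons = [f"rule violated: {name}" for name in result.rule_violations]
    for name, minimum in thresholds.items():
        score = result.scores.get(name)
        if score is None:
            reasons.append(f"missing score: {name}")
        elif score < minimum:
            reasons.append(f"{name} scored {score:.2f}, below {minimum:.2f}")
    return (not reasons, reasons)


# Illustrative thresholds only; these are placeholder names, not Nexus evaluators.
THRESHOLDS = {"relevance": 0.7, "toxicity_free": 0.9}

ok = CaseResult(scores={"relevance": 0.82, "toxicity_free": 0.99})
bad = CaseResult(
    scores={"relevance": 0.95, "toxicity_free": 0.99},
    rule_violations=["never_reveal_system_prompt"],
)

print(verdict(ok, THRESHOLDS))   # (True, [])
print(verdict(bad, THRESHOLDS))  # (False, ['rule violated: never_reveal_system_prompt'])
```

The second case shows why the two are worth separating: its scores are high, yet the broken rule alone fails it.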
Common testing pitfalls
- Testing only the golden path. The hard cases are where bugs hide. Schedule time for adversarial testing — jailbreaks, off-topic, hostile users.
- Forgetting to test after rules change. Even a small wording tweak in identity, conversation rules, or routing logic can shift behaviour.
- Trusting "it worked once." LLMs are stochastic. Run the same test several times, not just once or twice: a fragile behaviour fails only intermittently, so a single pass proves little.
- Not saving test cases. A test you ran manually once is one you'll have to re-run manually next time. Save it to a Testing Lab dataset.
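The "it worked once" pitfall is easy to put a number on. The Python sketch below repeats one test case and reports the fraction of runs that passed. It is a sketch under stated assumptions: `pass_rate` and `make_flaky_case` are hypothetical helpers, and the flaky case is a seeded random stand-in for a real conversation, not a call into Nexus.

```python
import random
from typing import Callable


def pass_rate(run_case: Callable[[], bool], runs: int = 10) -> float:
    """Run the same test case `runs` times and return the fraction that passed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passes = sum(1 for _ in range(runs) if run_case())
    return passes / runs


def make_flaky_case(success_probability: float, seed: int) -> Callable[[], bool]:
    """Stand-in for a real test run: passes with the given probability.

    Replace this with whatever actually sends the message and checks the reply.
    """
    rng = random.Random(seed)
    return lambda: rng.random() < success_probability


stable = make_flaky_case(1.0, seed=0)   # always passes
fragile = make_flaky_case(0.6, seed=0)  # passes about 60% of the time

print(f"stable:  {pass_rate(stable, runs=20):.0%}")
print(f"fragile: {pass_rate(fragile, runs=20):.0%}")
```

A behaviour that passes 60% of the time still passes two runs in a row about 36% of the time (0.6 × 0.6), which is why a larger number of repeats gives a more honest picture.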
Best practices
- Test in the Playground first, then promote durable cases to a Testing Lab dataset.
- Tune Evaluators once, save them, and let them score every future run automatically. Don't eyeball runs every time; that's what evaluators are for.
- Treat the regression dataset as production code. Review it, evolve it, don't let it rot.
- Test voice and chat separately — they don't behave identically, even with the same agent config.
Go to Widget Builder if you need custom UI in your conversations.