Test your v3 Agent

Nexus gives you two complementary places to test:

  • Playground — interactive, single-conversation testing on every agent's profile. Best while you build. Documented on this page.
  • AI Trust Centre — durable, dataset-driven evaluation. Best before each release. Documented surfaces: Testing Lab, Evaluators & Rules, Test Case detail, and Reports. (Overview and Action Center are wired in the studio but run on mock data; their docs land with the v1 backend.)

Use both — they answer different questions.

Playground — try the bot as a user would

Inside any agent's profile, click the play (▶) icon in the title bar to open the Playground — a side panel on the left where you chat with the bot exactly as a customer would. This is your fastest feedback loop while building.

The Playground gives you:

  • The bot's actual welcome message and quick-reply chips.
  • A live chat input — type a message, hit Enter, watch the agent respond in real time.
  • Per-message action icons (copy, voice playback, 👍 / 👎) for quick feedback.
  • A voice/mic toggle for testing voice flows without leaving the page.

Don't confuse the Playground with Copilot. The right-hand "How can I help you today?" panel on an agent's profile is Copilot — an AI assistant for you, the builder. Use Copilot to ask questions about the bot's configuration, generate suggestions, or scaffold logic. The Playground is for acting as the end-user and seeing how your bot actually replies.

Figure: Playground panel docked on the left of an agent's profile. The Yellow Bikes assistant greets in Spanish, the user replies "Lets talk in english" and the bot switches languages, then answers a "which bike for my daily commute" question with Commuter / Sports / Both quick-reply chips. A View trace link sits under the bot replies, and the message input has home / voice / send controls.

Step-by-step: try the bot

  1. Open any agent from AI Agent → Agents.
  2. Click the play (▶) icon in the title bar — the Playground opens on the left.
  3. Use one of the welcome quick-reply chips, or type a message and press Enter.
  4. Watch the agent respond. If voice is enabled for the bot, tap the speaker icon on a bot message to hear the TTS playback.
  5. Use 👍 / 👎 on individual messages to flag responses that looked good or bad — useful when you come back later to figure out what to fix.

Step-by-step: verify a routing rule fires

  1. Open the agent whose routing you want to test.
  2. Send a message that should match the rule (e.g. "I need a refund").
  3. Confirm the expected agent or tool takes over — the persona / response pattern should match.
  4. If the wrong thing fires, go back to Routing Logic and tighten the rule, or sharpen the agent's Trigger.
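
Once a routing check works by hand, you can write the same expectation down as data so it is easy to repeat. The sketch below is a minimal illustration in Python, not a Nexus API: `RoutingCase`, `check_routing`, and the `send` callable are hypothetical names, and `fake_send` is only a stand-in for whatever mechanism you use to send a message and see which agent replied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RoutingCase:
    """One routing expectation: this message should reach this agent."""
    message: str
    expected_agent: str


def check_routing(cases: List[RoutingCase], send: Callable[[str], str]) -> List[str]:
    """Return a readable failure line for every case that misroutes.

    `send` is a hypothetical hook: it takes a user message and returns
    the name of the agent that handled it.
    """
    failures = []
    for case in cases:
        actual = send(case.message)
        if actual != case.expected_agent:
            failures.append(
                f"{case.message!r}: expected {case.expected_agent}, got {actual}"
            )
    return failures


def fake_send(message: str) -> str:
    """Stand-in for a real bot; routes on a keyword so the sketch runs."""
    return "Refunds" if "refund" in message.lower() else "General"


cases = [
    RoutingCase("I need a refund", "Refunds"),
    RoutingCase("Where is my money back?", "Refunds"),  # paraphrase, no keyword
]

if __name__ == "__main__":
    for line in check_routing(cases, fake_send):
        print(line)
```

The second case is deliberately a paraphrase: a rule that only matches the literal word "refund" misses it, which is the kind of gap that tightening the rule or sharpening the Trigger is meant to close.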

Step-by-step: verify a widget renders

  1. Make sure your bot has v3 agents enabled — widgets only render in v3 conversations.
  2. In the Playground, send a message that triggers a workflow node hosting your widget.
  3. The widget renders inline in the Playground chat. Interact with it (fill the form, click the button); the output flows back into the conversation as the next user input.

AI Trust Centre — durable evaluation

The AI Trust Centre is the durable side of testing. The currently documented surfaces are Testing Lab, Evaluators & Rules, Test Case detail, and Reports; see the AI Trust Centre section index for the full walkthrough. (Overview and Action Center docs land alongside their v1 backend.)

The 30-second version:

| Sub-page | When you open it |
|---|---|
| Testing Lab | "I need to capture / curate / run test cases." Scenario AI-generation, Import Content, assertion picker. |
| Evaluators & Rules | "Tune what counts as a pass." 10 evaluators (7 Quality + 3 Safety) + hard invariant Rules. |
| Test Case | "Why did this specific case fail?" Conversation paired with the trace that produced it. |
| Reports | "What happened in this run?" Per-run breakdown with evaluator scores per simulation. |
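
To make the difference between evaluators and Rules concrete, here is a minimal Python sketch of one way a pass/fail decision could combine them. It is an illustration under stated assumptions, not how the AI Trust Centre is implemented: the function name, the evaluator and rule names, the 0-to-1 score scale, and the thresholds are all hypothetical.

```python
from typing import Dict


def case_passes(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    rule_results: Dict[str, bool],
) -> bool:
    """Decide pass/fail for one test case.

    Assumptions (not taken from the product): scores run from 0 to 1 with
    higher being better, and a missing score counts as 0.
    A broken rule fails the case outright, whatever the scores are.
    Otherwise every evaluator that has a threshold must meet it.
    """
    if not all(rule_results.values()):
        return False
    return all(
        scores.get(name, 0.0) >= minimum for name, minimum in thresholds.items()
    )


# Hypothetical evaluator names and thresholds, for illustration only.
thresholds = {"relevance": 0.7, "safety": 0.9}

if __name__ == "__main__":
    good = case_passes(
        scores={"relevance": 0.85, "safety": 0.95},
        thresholds=thresholds,
        rule_results={"never_reveals_internal_ids": True},
    )
    print("passes" if good else "fails")
```

The design point is the asymmetry: evaluator thresholds are tunable, while a rule is a hard invariant that no score can outvote.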

Common testing pitfalls

  • Testing only the golden path. The hard cases are where bugs hide. Schedule time for adversarial testing — jailbreaks, off-topic, hostile users.
  • Forgetting to test after rules change. Even a small wording tweak in identity, conversation rules, or routing logic can shift behaviour.
  • Trusting "it worked once." LLMs are stochastic. Run the same test at least twice; a fragile behaviour fails intermittently, so a single pass proves little.
  • Not saving test cases. A test you ran manually once is one you'll have to re-run manually next time. Save it to the Testing Lab dataset.
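
The "it worked once" pitfall is easier to see with numbers. The Python sketch below repeats a single check and reports the fraction of runs that passed. It is a generic illustration, not a Nexus feature: `pass_rate` and `make_flaky_check` are hypothetical names, and the flaky check is a seeded random stand-in for a real test run against your bot.

```python
import random
from typing import Callable


def pass_rate(run_once: Callable[[], bool], runs: int = 10) -> float:
    """Run the same check `runs` times and return the fraction that passed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if run_once())
    return passed / runs


def make_flaky_check(success_probability: float, seed: int = 0) -> Callable[[], bool]:
    """Stand-in for a real test run: passes with the given probability."""
    rng = random.Random(seed)
    return lambda: rng.random() < success_probability


if __name__ == "__main__":
    rate = pass_rate(make_flaky_check(0.8), runs=20)
    print(f"pass rate: {rate:.0%}")
```

A behaviour that passes four times out of five looks fine on most single runs, which is why one green result is weak evidence.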

Best practices

  • Test in the Playground first, then promote durable cases to a Testing Lab dataset.
  • Tune Evaluators once, save them, and let them score every future run automatically. Don't eyeball runs every time; that's what evaluators are for.
  • Treat the regression dataset as production code. Review it, evolve it, don't let it rot.
  • Test voice and chat separately — they don't behave identically, even with the same agent config.

Go to Widget Builder if you need custom UI in your conversations.