Test Case detail

The Test Case detail panel is where you drill into a single case — read its conversation, inspect the trace that produced it, edit the inputs, manage its assertions, and re-run it on demand. Deep-link to it from any test-case row in Testing Lab, or from a failure in Reports.

Don't confuse this with the Test Case list page (the index view under Trust Centre → Test Cases), which groups cases by type: Knowledge Base / Copilot Saved Session / Scenario. This page documents the per-case detail panel that opens when you click into a single case.

Detail-panel layout

The panel uses a two-column layout: the conversation on the left, the case metadata and execution trace on the right. The layout varies slightly by tab; the Conversation tab keeps the trace inline.

The detail panel has three tabs:

| Tab | What's on it |
| --- | --- |
| Conversation | The captured user ↔ agent turns, paired with the execution trace per turn (when a baseline exists). |
| Assertions | The checks ticked when the case was saved (turn-anchored + global). |
| Settings | Case name, source agent, initial state, datasets the case belongs to, and the Re-run button. A stale-baseline banner appears here when the underlying config has drifted (see below). |

Conversation tab — inline traces

For cases captured via Trial run (i.e. cases with a baseline), each agent message is paired with the execution trace that produced it:

  • Tool calls — which tool fired, with what arguments, with what return value.
  • Memory updates — keys read / written during the turn, with before / after values.
  • Agent decisions — Context Expert routing pick, sub-agent push / pop.
  • Per-turn metrics — latency, token cost, evaluator scores.
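
As a rough mental model, the four categories above can be pictured as one record per turn. The sketch below is illustrative only: every class and field name (ToolCall, MemoryUpdate, TurnTrace, routed_agent, and so on) is hypothetical and is not the product's actual trace schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolCall:
    tool: str                   # which tool fired
    arguments: dict[str, Any]   # the arguments it was called with
    return_value: Any           # what it returned


@dataclass
class MemoryUpdate:
    key: str
    before: Any                 # value before the turn
    after: Any                  # value after the turn


@dataclass
class TurnTrace:
    # Hypothetical shape: one record per agent message.
    turn_index: int
    tool_calls: list[ToolCall] = field(default_factory=list)
    memory_updates: list[MemoryUpdate] = field(default_factory=list)
    routed_agent: Optional[str] = None                          # routing pick
    sub_agent_events: list[str] = field(default_factory=list)   # push / pop
    latency_ms: Optional[float] = None
    token_cost: Optional[int] = None
    evaluator_scores: dict[str, float] = field(default_factory=dict)


def summarize(trace: TurnTrace) -> str:
    """One-line summary: routed agent, tools fired, memory keys that changed."""
    tools = ", ".join(call.tool for call in trace.tool_calls) or "none"
    changed = [m.key for m in trace.memory_updates if m.before != m.after]
    return (f"turn {trace.turn_index}: agent={trace.routed_agent}, "
            f"tools={tools}, memory_changed={changed}")
```

A summary like this mirrors what you scan for when a case fails: which agent was picked, which tools fired, and which memory keys actually changed.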

The trace is fetched lazily from GET /v1/agentic-qa/testcases/:slug/trace?bot=<botId>, only when you switch to the Conversation tab. The response is cached with a 5-minute staleTime; baselines are immutable, so there is no point re-fetching sooner.
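
The lazy-fetch behaviour can be approximated outside the UI with a small cache. This is a minimal sketch, not the panel's implementation: only the endpoint path and the 5-minute window come from the text above. The base URL, the LazyTraceCache class, and the injected fetch callable are assumptions made for illustration, and authentication is left out entirely.

```python
import time
from typing import Any, Callable
from urllib.parse import quote, urlencode


def trace_url(base_url: str, slug: str, bot_id: str) -> str:
    """Build the trace URL for one test case (path taken from the docs)."""
    path = f"/v1/agentic-qa/testcases/{quote(slug, safe='')}/trace"
    return f"{base_url.rstrip('/')}{path}?{urlencode({'bot': bot_id})}"


class LazyTraceCache:
    """Hypothetical helper: fetch on first access, reuse for five minutes."""

    STALE_SECONDS = 5 * 60

    def __init__(self, fetch: Callable[[str], Any],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # caller supplies the HTTP call
        self._clock = clock      # injectable so the timing can be tested
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, url: str) -> Any:
        now = self._clock()
        entry = self._entries.get(url)
        # Serve the cached trace while it is younger than the stale window.
        if entry is not None and now - entry[0] < self.STALE_SECONDS:
            return entry[1]
        data = self._fetch(url)  # nothing is fetched until first requested
        self._entries[url] = (now, data)
        return data
```

Nothing is requested until get() is first called, which corresponds to opening the Conversation tab; repeated calls inside the window reuse the cached trace.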

If the case predates the baseline-capture flow (v2 agents, legacy cases imported from elsewhere), the trace is unavailable and the tab shows the plain bubble list. There's no way to retrofit a baseline onto a legacy case — re-capture via Trial run if you need the inline traces.

💡 Try Copilot Nexus: "This test case is failing on turn 3. Explain what the bot did, why it picked the wrong agent, and what to change."

Assertions tab

This tab shows the assertions you ticked in the Assertion picker when the case was saved. Each assertion has:

| Field | Notes |
| --- | --- |
| Type | tool_called, response_contains, goal_status, no_tool_called, etc. |
| Anchor | turnIndex for turn-anchored assertions; conversation for global assertions. |
| Expected value | What the assertion checks for. |
| Enabled | Toggle: disable an assertion without removing it (useful if you're debugging a flake). |

You can add an assertion manually on this tab (without going through the picker again) or remove one that's become too strict.
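
To make the four fields concrete, here is a minimal sketch of how an assertion with a type, an anchor, an expected value, and an enabled flag might be evaluated. The Turn and Assertion classes and the check function are hypothetical, and only two of the listed types are modelled; the product's real evaluation logic may differ.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Turn:
    response: str             # the agent's reply on this turn
    tools_called: list[str]   # names of tools that fired on this turn


@dataclass
class Assertion:
    type: str                          # "tool_called" or "response_contains"
    expected: Any                      # what the assertion checks for
    turn_index: Optional[int] = None   # None stands for the global anchor
    enabled: bool = True


def check(assertion: Assertion, turns: list[Turn]) -> Optional[bool]:
    """Return True/False for an enabled assertion, None when it is disabled."""
    if not assertion.enabled:
        return None
    # Turn-anchored assertions inspect one turn; global ones inspect them all.
    if assertion.turn_index is None:
        scope = turns
    else:
        scope = [turns[assertion.turn_index]]
    if assertion.type == "tool_called":
        return any(assertion.expected in t.tools_called for t in scope)
    if assertion.type == "response_contains":
        return any(assertion.expected in t.response for t in scope)
    raise ValueError(f"unsupported assertion type: {assertion.type}")
```

Returning None for a disabled assertion keeps it in the list without letting it affect the result, which matches the purpose of the Enabled toggle.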

Settings tab — edit + re-run

| Action | What it does |
| --- | --- |
| Edit inputs | Change the user messages or initial state. Marks the case stale until you re-capture the baseline. |
| Reassign to a dataset | Move the case between datasets, or add it to multiple datasets. |
| Re-run | Queues this single case immediately. Useful when iterating on a fix: no need to run the whole dataset. |
| Delete | Soft-deletes the case. Recoverable from the dataset's archive for 30 days. |

The Re-run button is the fastest feedback loop in the Trust Centre: tweak the bot config, click Re-run, watch the conversation + trace re-render with the new behaviour.

When a case goes stale

If the bot config changed in ways that affect this case (an evaluator threshold shifted, a routing rule was added, a referenced variable was renamed), the detail panel surfaces a stale-baseline banner prompting a re-run. The same signal appears as a chip on the case's row in the Testing Lab table.

Two options to clear staleness:

  1. Re-capture the baseline — open Trial run on this case, replay it, approve the new trace.
  2. Accept the staleness — if you know the change was intentional and the case still tests the right thing, dismiss the chip. The case keeps running; the dashboard just doesn't promise the trace baseline is current.
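
One way to picture the staleness signal is as a fingerprint comparison: hash the parts of the bot config that the case depends on when the baseline is captured, then compare against the current config. The sketch below rests on that assumption. The function names, the choice of relevant keys, and the use of SHA-256 are all hypothetical, and the documentation does not state how the product detects drift.

```python
import hashlib
import json
from typing import Any


def config_fingerprint(config: dict[str, Any], relevant_keys: list[str]) -> str:
    """Hash only the config entries this case depends on (assumed set)."""
    subset = {key: config.get(key) for key in sorted(relevant_keys)}
    payload = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def show_stale_chip(baseline_fingerprint: str,
                    current_config: dict[str, Any],
                    relevant_keys: list[str],
                    dismissed: bool = False) -> bool:
    """True when the relevant config drifted and the chip was not dismissed."""
    if dismissed:
        # Option 2 above: staleness was accepted, so the chip stays hidden.
        return False
    current = config_fingerprint(current_config, relevant_keys)
    return current != baseline_fingerprint
```

Under this model, re-capturing the baseline (option 1) amounts to storing a fresh fingerprint, while dismissing the chip (option 2) leaves the old fingerprint in place and only hides the warning.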

Best practices

  • Open the Conversation tab first when a case fails. The trace pairing usually tells you why — wrong tool, missing memory write, the model misread an argument. Skip the tab if you're sure it's a flake.
  • Add assertions over time, not at creation. Start with "right agent fired"; once that's stable, add response-content checks; later, add latency / cost checks.
  • Re-run a single case before re-running the dataset. Faster iteration; cheaper queue cost.
  • Don't edit and re-baseline without thinking. A stale chip is a warning, not a bug. If the case was right and the bot is wrong, the case should keep failing — fix the bot, don't lower the bar.

💡 Try Copilot Nexus: "This case used to pass and is now failing after I added a Routing Logic rule. Show me the diff between the old baseline trace and the new run, and tell me whether to update the case or revert the rule."

Read next: Reports — see how this case scored in the latest run alongside the rest of its dataset.