Tools — Best Practices

Practical patterns from teams running v3 agents in production. Read once before you start, revisit before each release.

Naming

Verb-first, scoped to the domain. getOrderStatus, updateAddress, escalateToBilling. Avoid tool1, helper, doStuff.
Keep names stable. Tool names show up in routing rules and traces, so a rename means updating every reference. Pick well the first time.
One name, one purpose. Don't reuse lookup for three different tools across domains. Be specific: lookupOrder, lookupCustomer, lookupTicket.

Descriptions

The description field is the most important text on the page. The LLM reads it to decide whether to call your tool. Treat it like a function docstring written for a junior teammate.

A good description answers all four:

What does the tool do? ("Look up a customer's current shipment status…")
What inputs does it need? ("…by order ID.")
When should it be used? ("Use when the user asks about an existing order.")
When should it NOT be used? ("Do not use for new orders or pricing questions.")

Omitting any of these makes the LLM guess.
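One way to make sure none of the four answers gets dropped is to assemble the description from four named parts. The sketch below is illustrative only: ToolDescription and its field names are hypothetical helpers, not part of the platform, and the wording reuses the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDescription:
    """Hypothetical helper: one field per question a description should answer."""

    what: str        # What does the tool do?
    inputs: str      # What inputs does it need?
    use_when: str    # When should it be used?
    avoid_when: str  # When should it NOT be used?

    def render(self) -> str:
        parts = [self.what, self.inputs, self.use_when, self.avoid_when]
        # Refuse to build a description that leaves any question unanswered.
        if any(not part.strip() for part in parts):
            raise ValueError("a description must answer all four questions")
        return " ".join(part.strip() for part in parts)


order_status = ToolDescription(
    what="Look up a customer's current shipment status",
    inputs="by order ID.",
    use_when="Use when the user asks about an existing order.",
    avoid_when="Do not use for new orders or pricing questions.",
)

# The rendered string is what you would paste into the description field.
print(order_status.render())
```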

Input schema

Always include description for each property. Naming a field orderId is half the message; describing it as "The customer's order ID, typically a 10-character alphanumeric string" closes the loop.
Mark required fields explicitly. Without required, the LLM may call your tool with missing arguments.
Use enum for fixed sets. Enum values prevent the LLM from inventing inputs.
Keep schemas small. Every input is one more thing the LLM has to extract correctly. Cut what's not strictly needed.
Don't ask for what you can derive. If the user is authenticated and you have their email in session state, don't ask the LLM to extract their email.
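As a rough illustration, here is what an input schema for cancelOrder could look like when it follows the points above. It assumes the schema is written in JSON Schema style; the enum values and the check_arguments helper are made up for this example and are not part of the platform.

```python
# Assumed JSON Schema-style input schema for cancelOrder (values are illustrative).
CANCEL_ORDER_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "orderId": {
            "type": "string",
            "description": "The customer's order ID, typically a 10-character alphanumeric string",
        },
        "reason": {
            "type": "string",
            "description": "Why the customer wants to cancel the order",
            # A fixed set keeps the LLM from inventing its own reasons.
            "enum": ["changed_mind", "ordered_by_mistake", "other"],
        },
    },
    # No email field: an authenticated user's email comes from session state.
    "required": ["orderId", "reason"],
}


def check_arguments(schema: dict, arguments: dict) -> list[str]:
    """Hypothetical helper: list the ways a tool call violates the schema."""
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


print(check_arguments(CANCEL_ORDER_INPUT_SCHEMA, {"orderId": "A1B2C3D4E5"}))
# prints ['missing required argument: reason']
```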

Output schema

Always define one. Without an output schema, the LLM doesn't know how to read the response.
Match the actual shape. If your workflow returns {status, trackingNumber}, your output schema should declare exactly those keys.
Use clear types. Don't return numbers as strings or vice versa. A count returned as "3" instead of 3 invites the LLM to misread it.
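A small check like the one below can catch drift between the declared output schema and what the workflow really returns. This is a sketch under assumptions: the schema is JSON Schema style, only flat objects with string, number, and boolean fields are handled, and shape_mismatches is a hypothetical helper.

```python
# Maps the schema's type names to Python types (flat objects only).
JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}

ORDER_STATUS_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {"type": "string"},
        "trackingNumber": {"type": "string"},
    },
}


def shape_mismatches(output_schema: dict, payload: dict) -> list[str]:
    """Hypothetical helper: compare a real response against the declared keys and types."""
    declared = output_schema["properties"]
    problems = []
    for key in sorted(declared.keys() - payload.keys()):
        problems.append(f"declared but not returned: {key}")
    for key in sorted(payload.keys() - declared.keys()):
        problems.append(f"returned but not declared: {key}")
    for key in sorted(declared.keys() & payload.keys()):
        expected = declared[key]["type"]
        value = payload[key]
        matches = isinstance(value, JSON_TYPES[expected])
        # bool is a subclass of int in Python, so reject it for "number".
        if expected == "number" and isinstance(value, bool):
            matches = False
        if not matches:
            problems.append(f"{key}: expected {expected}, got {type(value).__name__}")
    return problems


print(shape_mismatches(ORDER_STATUS_OUTPUT_SCHEMA, {"status": "shipped", "trackingNumber": 12345}))
# prints ['trackingNumber: expected string, got int']
```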

Granularity

One tool, one job. A tool that does five things via a "mode" parameter is harder for the LLM to use correctly than five focused tools.

✅ Good:

getOrderStatus(orderId)
cancelOrder(orderId, reason)
updateShippingAddress(orderId, newAddress)

❌ Bad:

manageOrder(orderId, action: "status" | "cancel" | "updateAddress", ...) — the LLM will mis-pick the action.

Choose the right tool type

You want to…	Use
Run multi-step backend logic, hit an API, branch on data	Workflow tool
Answer from your documents and FAQs	Knowledge Base tool
Hand off a chat to a human	Escalate to Agent
Forward a voice call to a human	Transfer Call
Hit a single REST endpoint (no branching)	Wrap in a one-step workflow today; HTTP Webhook when it ships
Connect a custom data source	Connect MCP Server when it ships

When two types could work, pick the more specific one. Prefer the Knowledge Base tool over a workflow for content questions, and Escalate to Agent over a workflow for handoffs.

Test before you save

The Test tab in the configuration drawer exists for a reason. Use it.

Run the tool with a realistic input. Confirm output is what you expect.
Run it with a missing/invalid input. Confirm it fails gracefully.
For workflow tools, run with mocks during development so you don't burn through production data or paid API credits.
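If the logic behind a workflow tool can also be exercised outside the platform, the same three checks can be kept as a small script. The sketch below assumes exactly that; get_order_status, fake_fetch_order, and the response shape are hypothetical stand-ins, not the platform's API.

```python
def get_order_status(order_id, fetch_order):
    """Hypothetical workflow step: look up an order and shape the tool output."""
    if not isinstance(order_id, str) or not order_id.strip():
        # Fail gracefully on missing or invalid input instead of raising.
        return {"ok": False, "error": "orderId is required"}
    order = fetch_order(order_id)
    if order is None:
        return {"ok": False, "error": f"no order found for {order_id}"}
    return {
        "ok": True,
        "status": order["status"],
        "trackingNumber": order["trackingNumber"],
    }


def fake_fetch_order(order_id):
    """Mock backend: canned data, so no production records or paid API calls."""
    orders = {"A1B2C3D4E5": {"status": "shipped", "trackingNumber": "TRK-0001"}}
    return orders.get(order_id)


# Realistic input: the output has the expected shape.
print(get_order_status("A1B2C3D4E5", fake_fetch_order))
# Missing input: the tool reports an error rather than crashing.
print(get_order_status("", fake_fetch_order))
```

Passing the backend call in as a parameter is what makes the mock swap trivial; the production lookup and the fake share one code path.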

Make critical tools deterministic

The LLM will usually pick the right tool from descriptions alone. For anything critical (escalations, payments, account changes), "usually" is not good enough. Add a Routing Logic rule:

"If the user asks about billing, you must call the getInvoice tool before composing a reply. Do not answer billing questions from memory."

See Routing Logic.

Tool descriptions and routing rules work together

Descriptions tell the LLM how the tool works.
Routing rules tell the LLM when to use it.

Both matter. A great description with no routing rule means the LLM is guessing in ambiguous cases. A great rule with a vague description means the LLM may call the tool with bad arguments.

Iterate on real conversations

Your first guesses about descriptions and schemas will be wrong in places. The fix loop:

Run a batch of test conversations in AI Trust Centre → Testing Lab.
Open the Playground (▶ on any agent) and re-run the failing input — watch how the agent uses the tool's output.
Look at where the agent picked the wrong tool, missed inputs, or misread output.
Adjust the offending tool's description / schema.
Re-run.

Three rounds of this beats six months of "it should work."

Keep the catalog tidy

Delete tools you no longer use. The LLM sees every available tool. Stale tools dilute its decisions.
Don't keep "test" tools in production. If you need a sandbox tool, build it in a non-prod bot.
Document the catalog. Keep a brief external doc — a Notion page, a wiki — listing every tool, what it does, and who owns it. Future builders will thank you.

Pre-release checklist

Every production tool has a clear description with when-to-use and when-not-to-use.
Every input schema has property descriptions and required set correctly.
Output schemas match what the underlying workflow / KB / API actually returns.
Critical tools (escalation, payments, account changes) have a Routing Logic rule.
Each tool has been tested in isolation via the Test tab.
Stale and test tools have been removed from the catalog.
Regression dataset includes test cases for every tool path.
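Several of these checks are mechanical enough to script. The sketch below assumes you keep your tool definitions somewhere as plain dictionaries with name, description, inputSchema, and outputSchema keys; that layout and the lint_tool helper are assumptions for illustration, and the phrase matching is a crude heuristic rather than a real review.

```python
def lint_tool(tool: dict) -> list[str]:
    """Hypothetical pre-release lint for one tool definition."""
    problems = []
    description = tool.get("description", "").lower()
    # Heuristic only: looks for the phrasing used in this guide's examples.
    if "use when" not in description:
        problems.append("description lacks a when-to-use sentence")
    if "do not use" not in description:
        problems.append("description lacks a when-not-to-use sentence")

    schema = tool.get("inputSchema", {})
    properties = schema.get("properties", {})
    for name, spec in properties.items():
        if not spec.get("description"):
            problems.append(f"input {name} has no description")
    if properties and "required" not in schema:
        problems.append("input schema does not set required")

    if not tool.get("outputSchema"):
        problems.append("no output schema")
    if tool.get("name", "").lower().startswith("test"):
        problems.append("looks like a test tool")
    return problems


GOOD_TOOL = {
    "name": "getOrderStatus",
    "description": (
        "Look up a customer's current shipment status by order ID. "
        "Use when the user asks about an existing order. "
        "Do not use for new orders or pricing questions."
    ),
    "inputSchema": {
        "properties": {"orderId": {"type": "string", "description": "The customer's order ID"}},
        "required": ["orderId"],
    },
    "outputSchema": {"properties": {"status": {"type": "string"}}},
}

print(lint_tool(GOOD_TOOL))  # prints []
print(lint_tool({"name": "testLookup", "description": "Order tool"}))
```

A script like this does not cover routing rules, the Test tab, or the regression dataset; those items still need a manual pass.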

Common mistakes

Vague descriptions. "Order tool" tells the LLM nothing useful.
Ten-input schemas. The more inputs a tool has, the more likely the LLM is to miss or mangle one. Trim.
No output schema. The LLM may ignore the response or hallucinate its contents.
Overlapping tools. Even two tools with similar descriptions confuse the LLM. Sharpen each so triggers are distinct.
Forgetting Routing Logic for critical paths. Hoping the LLM picks the right tool every time is wishful thinking.
Skipping the Test tab. "It looked right in the config" is not the same as "it works."
Leaving stale tools in the catalog. Every unused tool is noise the LLM has to filter out.
