
Using voice input nodes (prompts) to build a voice bot

A voice input node gathers information from the user, processes it, and handles the various scenarios of a human-like conversation. The node is automatically synced with the conversation design, so any designs you add there appear here automatically.


The voice input node works only for voice bots, that is, you must have enabled Voice bot when creating your bot.

Example use case to build a voice bot

Consider a banking voice bot that asks a set of questions to identify the user and then resolves the user's query, as shown in the conversation below. This is a happy-path flow: the account holder provides their name and phone number and asks about home loan eligibility, and the bot replies with the details.

Let's break down the conversation:

  • Bot (asks): What is your name?
  • User (replies): Karan
  • Bot (stores): Karan as Name
  • Bot (asks): How may I help?
  • User (replies): Loan eligibility
  • Bot (understands): User request = Loan eligibility
  • Bot (logic): Phone number required to calculate loan eligibility
  • Bot (asks): What is your phone number?
  • User (replies): 9890**
  • Bot (validates that the number is correct): Uses logic to calculate loan eligibility
  • Bot (responds): You can avail X amount at X% interest.

--- End of the call ---
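The conversation above can be read as a small slot-filling state machine: each bot question waits for one piece of information, stores it, and decides what to ask next. The Python sketch below is only an illustration of that idea, not how the platform implements its nodes; `Session`, `handle_reply`, the keyword-based intent check, and the 10-digit phone rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-call state: the pending question and what the bot has stored."""
    step: str = "name"
    stored: dict = field(default_factory=dict)


def handle_reply(session: Session, text: str) -> str:
    """Store the reply to the pending question and return the bot's next line."""
    text = text.strip()
    if session.step == "name":
        session.stored["name"] = text  # Bot (stores): Karan as Name
        session.step = "intent"
        return "How may I help?"
    if session.step == "intent":
        # Naive keyword match standing in for real intent detection (assumption).
        if "loan" in text.lower():
            session.stored["intent"] = "loan_eligibility"
            session.step = "phone"  # loan eligibility needs a phone number
            return "What is your phone number?"
        return "Sorry, I can only help with loan eligibility. How may I help?"
    if session.step == "phone":
        digits = "".join(ch for ch in text if ch.isdigit())
        if len(digits) != 10:  # assumed rule: 10-digit Indian number
            return "That doesn't look like a valid phone number. What is your phone number?"
        session.stored["phone"] = digits
        session.step = "done"
        # Placeholder reply; the eligibility calculation itself is not specified.
        return "You can avail X amount at X% interest."
    return "Thank you for calling."
```

In this sketch the call opens with the bot saying "What is your name?", and each subsequent user reply is passed to `handle_reply`, which advances `session.step` only when the reply is usable.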


For guidelines to build a good conversation, click here.

1. Access voice input node

Go to Studio > Prompts > Voice input

When you click the voice input node, you see the following sections:

  1. Input type

Input type defines the type of information the node collects from the end user.

  2. Validator

Validator validates the user input against the chosen criteria.

  3. Capturing the user input

This section configures how the bot gathers user input.

  4. Additional Settings

Additional configurations that make the conversation more authentic and emulate human-like interactions.


Values updated in this node will override the global values.
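One way to picture this rule is a dictionary merge in which any value set on the node replaces the corresponding global value. The Python sketch below is illustrative only: the setting names in `GLOBAL_SETTINGS` and the convention that `None` means "not set on this node" are assumptions, not the platform's actual configuration keys.

```python
from typing import Any, Dict

# Hypothetical global defaults; real setting names on the platform may differ.
GLOBAL_SETTINGS: Dict[str, Any] = {
    "stt_engine": "Google",
    "recording_max_duration": 10,
    "tts_speed": 1.0,
}


def effective_settings(global_settings: Dict[str, Any],
                       node_settings: Dict[str, Any]) -> Dict[str, Any]:
    """Layer node-level values over global ones; the node value wins when set."""
    merged = dict(global_settings)  # copy so the globals are not modified
    # Assumption: None means "not set on this node", so the global value is kept.
    merged.update({k: v for k, v in node_settings.items() if v is not None})
    return merged
```

For example, a node that sets only `recording_max_duration` to 5 keeps the global STT engine and TTS speed but records for at most 5 seconds.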

Here is a detailed account of each of these sections.

1.1 Input type

Input type: Choose the type of user response. Options include Phone and Question.
Bot Speaks: The bot's message that is vocalized to the end user during a call.
Repeat message: The bot's message that is vocalized when the user asks the bot to repeat the question or prompt.

You can add multiple messages to each of the above fields by adding random messages. You can also play a message (SSML) to hear the bot response.
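The sketch below shows one plausible reading of how multiple Bot Speaks and Repeat message variants could be used: the bot picks one variant at random, and switches to the repeat pool when the user asks for the question again. The message lists, the `REPEAT_PHRASES` set, and both function names are hypothetical; the platform's actual selection and repeat-detection logic is not described here.

```python
import random

# Hypothetical message variants for a phone-number question.
BOT_SPEAKS_MESSAGES = [
    "What is your phone number?",
    "Could you tell me your phone number?",
]
REPEAT_MESSAGES = [
    "Sure. Please tell me your phone number.",
    "No problem. What is your phone number?",
]

# Assumed phrases that count as a request to repeat the prompt.
REPEAT_PHRASES = {"repeat", "say that again", "pardon", "come again"}


def is_repeat_request(user_text: str) -> bool:
    """Return True if the user's reply is one of the assumed repeat phrases."""
    return user_text.strip().lower().rstrip("?.!") in REPEAT_PHRASES


def choose_message(asked_to_repeat: bool, rng: random.Random) -> str:
    """Pick a random variant from the Repeat message pool or the Bot Speaks pool."""
    pool = REPEAT_MESSAGES if asked_to_repeat else BOT_SPEAKS_MESSAGES
    return rng.choice(pool)
```

Passing a seeded `random.Random` keeps the choice reproducible, which is convenient when testing a flow.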

1.2 Validator

How do you want to validate user input: Choose the criteria against which the user input is validated.
Validator Failure Message:
  • Failure message: Vocalized when the validation fails.
  • No response message: Vocalized when the user does not respond within the configured maximum response time.
  • +Add Message: Click this button to add up to three successive fallback responses for both the invalid and no-response scenarios.
Boost phrases: Some user responses are hard for the bot to understand. Region-specific words, Gen Z slang, internet terminology, trending phrases, and abbreviations are specially trained so that the bot understands the user's exact intention. For example, COVID is a relatively new term that is used frequently; unless the phrase COVID is boosted, it may be transcribed as "kovind", "go we", "co-wid", and so on. Add the phrases you expect in the user's response, for example, "I want to take covid vaccine".

You can choose several intents and entities in the validator. If the user's input matches any of them, it is validated and the flow moves to the next node.
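For a Phone validator, the checks above can be sketched as a digit-count test plus an escalating list of fallback messages. This Python sketch assumes the STT transcript renders spoken digits as numerals and uses a 10-digit rule; the message texts and the helpers `validate_phone` and `fallback_message` are hypothetical. What the node does after the third fallback is not described, so returning `None` at that point is also an assumption.

```python
import re
from typing import Optional

# Hypothetical fallback messages; the node allows up to three of each kind.
FAILURE_MESSAGES = [
    "Sorry, that doesn't seem to be a valid phone number. Please say it again.",
    "Please say your 10-digit phone number.",
    "You can also type your phone number on the keypad.",
]
NO_RESPONSE_MESSAGES = [
    "I didn't hear anything. What is your phone number?",
    "Are you still there? Please tell me your phone number.",
    "Please say or type your phone number.",
]


def validate_phone(transcript: str) -> Optional[str]:
    """Return the digits if the transcript holds exactly 10 of them, else None."""
    digits = re.sub(r"\D", "", transcript)  # drop spaces, dashes and words
    return digits if len(digits) == 10 else None


def fallback_message(attempt: int, heard_nothing: bool) -> Optional[str]:
    """Return the fallback for a 0-based failed attempt, or None once all are used."""
    messages = NO_RESPONSE_MESSAGES if heard_nothing else FAILURE_MESSAGES
    return messages[attempt] if 0 <= attempt < len(messages) else None
```

Keeping the invalid and no-response lists separate mirrors the two message types the validator exposes.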

1.3 Capturing user input

Capture input as: Choose how the user response is captured. Options include Voice and Keypad.

The following fields are displayed depending on how the user response is captured.

Voice

STT engine: Select an engine from the dropdown: Google or Microsoft.
STT mode: Select a mode from the dropdown. Microsoft provides Static, Streaming, Streaming Advanced, and Streaming 2.0; Google provides Static.
STT language: Select the Speech-to-Text (transcription) language, that is, the user's language, as an ISO code from the dropdown. Refer to the Microsoft or Google documentation for the supported languages.
STT engine endpoint: Optional endpoint of a custom STT model.
Recording max duration: The maximum duration for which the bot records the user's response after asking a question (at any step), even if the user is still speaking. For example, if the bot asks "Which city are you from?" and the recording max duration is 5, the bot records only the first 5 seconds of the response. This keeps the bot from processing unwanted information and keeps the conversation on track: if the user replies with a long paragraph, or the response is drowned out by constant background noise, the bot does not process the long input and can respond quickly.
Recording silence duration: The maximum duration for which the bot waits for the user to respond after asking a question (at any step). For example, if the recording silence duration is 5 seconds and the user stays silent for 5 seconds, the No response message is played.
Initial silence duration: For finer control over silence handling, the Streaming and Streaming Advanced STT modes of the Microsoft STT engine let you set the maximum acceptable silence before the user starts speaking. For example, the initial silence duration for a question asking for an application number could be longer (about 3 to 4 seconds), while for a quick yes/no question it could be set to 1 second.
Final silence duration: Similar to the initial silence duration, the final silence duration is the maximum pause the bot waits for once the user has started speaking. For example, for one-word or yes/no questions you could set it to about 0.5 to 1.0 seconds, and for fields such as an address, where pauses are a natural part of speech, to about 1.5 to 2.5 seconds.
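To make the interplay of these timeouts concrete, the Python sketch below decides when a single recording would stop. It is a simplified model, not the STT engines' actual endpointing: speech is given as hypothetical `(start, end)` segments in seconds measured from the end of the question, `initial_silence` stands for the initial (or recording) silence duration, `final_silence` for the final silence duration, and `max_duration` for the recording max duration, applied as a hard cap.

```python
from typing import List, Optional, Tuple

# (start, end) of a stretch of user speech, in seconds after the question ends.
Segment = Tuple[float, float]


def capture_end(segments: List[Segment], max_duration: float,
                initial_silence: float, final_silence: float) -> Tuple[float, str]:
    """Return (time at which recording stops, reason) under a simplified timeout model."""
    # No speech, or speech that starts too late: stop after the initial silence window.
    if not segments or segments[0][0] > initial_silence:
        return min(initial_silence, max_duration), "no_response"

    stop = max_duration
    for i, (_, end) in enumerate(segments):
        is_last = i + 1 == len(segments)
        gap: Optional[float] = None if is_last else segments[i + 1][0] - end
        # A pause longer than the final silence duration ends the user's turn.
        if gap is None or gap > final_silence:
            stop = end + final_silence
            break

    # Recording max duration is a hard cap, even if the user is still speaking.
    if stop > max_duration:
        return max_duration, "max_duration"
    return stop, "final_silence"
```

In this model, a 1.2-second pause in the middle of an address does not end the turn when the final silence duration is 2 seconds, but it does when the final silence duration is 1 second, which is why longer values suit address-like fields.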


Keypad

DTMF digital length: Enter the number of digits to capture. For example, for an Indian phone number, enter 10.
DTMF finish character: The character that tells the bot to stop capturing. Supported finish characters: "*" and "#".
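A keypad capture using these two settings might behave like the Python sketch below: digits are collected until the configured length is reached or the finish character is pressed, whichever comes first. `capture_dtmf` is a hypothetical helper, and ignoring keys that are neither digits nor the chosen finish character is an assumption.

```python
from typing import Iterable

SUPPORTED_FINISH_CHARACTERS = {"*", "#"}


def capture_dtmf(keys: Iterable[str], digit_length: int, finish_char: str = "#") -> str:
    """Collect keypad digits until digit_length is reached or finish_char is pressed."""
    if finish_char not in SUPPORTED_FINISH_CHARACTERS:
        raise ValueError("finish character must be '*' or '#'")
    digits = []
    for key in keys:
        if key == finish_char:
            break  # the caller ended input early
        if key.isdigit():
            digits.append(key)
            if len(digits) == digit_length:
                break  # expected length reached; stop listening
        # Any other key is ignored (assumption).
    return "".join(digits)
```

With a digit length of 10, an eleventh key press is never read, while pressing "#" after four digits returns just those four.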

Voice and keypad

When Voice and Keypad is selected, all the voice and keypad fields described above are listed together.

1.4 Additional settings

Enable wait message: Enable this toggle to vocalize an acknowledgement message while the user is waiting for a response from the agent.
Wait message: The acknowledgement message played to the user.
Recording action: Pause, resume, or stop the call recording depending on the use case and conversation. Recording is ON by default. Once you STOP the recording (for example, before a sensitive dialogue), it cannot be resumed.
TTS engine: Select an engine from the dropdown: Microsoft Azure, Google WaveNet, or Amazon Polly.
Text type: Select Text or SSML from the dropdown.
TTS language: Select the bot language (ISO code) from the dropdown.
Pitch: Any decimal value, depending on the base voice required; 0 is ideal. Supported for Microsoft when text_type = "text", and for Google when text_type = "text" or "SSML".
Voice ID: Enter the voice ID. Supported for Microsoft when text_type = "text", and for Google when text_type = "text" or "SSML".
TTS Speed: How fast the bot speaks. Values from 0.9 to 1.5 make the bot sound human-like. Supported for Microsoft when text_type = "text", and for Google when text_type = "text" or "SSML".
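The recording action rules (ON by default, pause and resume freely, STOP is final) form a small state machine. The Python sketch below models them; `RecordingState`, `CallRecorder`, and the action strings are hypothetical names chosen for the example, not the platform's API.

```python
from enum import Enum


class RecordingState(Enum):
    ON = "on"
    PAUSED = "paused"
    STOPPED = "stopped"


class CallRecorder:
    """Hypothetical recorder: starts ON, can pause/resume, and STOP cannot be undone."""

    def __init__(self) -> None:
        self.state = RecordingState.ON  # recording is ON by default

    def apply(self, action: str) -> RecordingState:
        if self.state is RecordingState.STOPPED:
            raise RuntimeError("recording was stopped and cannot be resumed")
        if action == "pause":
            self.state = RecordingState.PAUSED
        elif action == "resume":
            self.state = RecordingState.ON
        elif action == "stop":
            self.state = RecordingState.STOPPED
        else:
            raise ValueError(f"unknown recording action: {action}")
        return self.state
```

Pausing is the reversible choice for a temporary sensitive exchange; stopping should be used only when nothing after that point needs to be recorded.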

2. Voice Input node demo

Here's a demo on collecting a user's phone number using the voice input node.

  1. In the Input type section, choose Phone as the Input type and enter the messages for the Bot speaks and Repeat message fields.

  2. In the Validator section, choose Phone in the How do you want to validate user input dropdown and enter the Failure message and No response message. Add Boost phrases as well.

  3. In the Capturing user input section, choose Voice and Keypad in the Capture input as dropdown and fill in the remaining fields.

  4. Store the response in a variable.
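The demo steps can be summarized as one node configuration plus a single turn that validates the reply and stores it in a variable. The Python sketch below is illustrative only: every key in `PHONE_NODE`, the helper `run_phone_node`, the rule that keypad input wins when both voice and keypad input arrive, and the reuse of the 10-digit length for voice input are assumptions, not the platform's schema or behavior.

```python
import re
from typing import Dict, Optional

# Hypothetical configuration mirroring the demo steps; the key names are
# illustrative and do not come from the platform.
PHONE_NODE = {
    "input_type": "Phone",
    "bot_speaks": ["What is your phone number?"],
    "repeat_message": ["Please tell me your phone number again."],
    "validator": "Phone",
    "failure_message": "That doesn't look like a valid phone number.",
    "no_response_message": "I didn't hear anything. What is your phone number?",
    "boost_phrases": ["my phone number is"],
    "capture_input_as": "Voice and Keypad",
    "dtmf_digit_length": 10,
    "dtmf_finish_character": "#",
    "store_in_variable": "phone_number",
}


def run_phone_node(node: dict, variables: Dict[str, str],
                   transcript: Optional[str] = None,
                   dtmf: Optional[str] = None) -> Optional[str]:
    """Validate one voice or keypad reply; store it and return None, or return a message to speak."""
    raw = dtmf or transcript  # assumption: keypad input wins if both arrive
    if not raw:
        return node["no_response_message"]
    digits = re.sub(r"\D", "", raw)  # strips the finish character, spaces and words
    if len(digits) != node["dtmf_digit_length"]:  # same length rule for voice (assumption)
        return node["failure_message"]
    variables[node["store_in_variable"]] = digits
    return None  # valid input stored; the flow moves to the next node
```

A `None` return signals that the value is now available to later nodes through the `phone_number` variable.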