Skip to main content

Using voice input nodes(prompt) to build a voice bot

The voice input node offers added features that enhance the performance of voice bots. Building voice bots using the "Voice input" node improves conversational experiences with a more natural and human-like interaction style.

  • Streamlined SSML handling: Removes the necessity for manual SSML (Speech Synthesis Markup Language) coding.
  • Simplified validation process: Integrates validation directly into the flow, eliminating the need for separate configuration.
  • Intent and Entity recognition: Facilitates the identification of user intents and entities, enabling the voice bot to understand and respond appropriately to diverse user queries and requests.
  • Tracking of unidentified utterances: Provides the ability to monitor and manage unidentified user utterances efficiently.

Build a flow with voice input node

Use-case

Consider building a Banking voice bot that asks users questions to identify them and resolve their queries. In this example, the bot guides the user through providing their name and phone number to inquire about an eligible home loan, receiving detailed responses.

Let's break down the conversation:

Bot (asks): What is your name?
User (replies): Karan
Bot (stores): Karan as Name
Bot (asks): How may I help?
User (replies): Loan eligibility
Bot (understands): User request = Loan eligibility
Bot (logic): Required phone number to calculate Loan eligibility
Bot (asks): What is your phone number?
User (replies): 9890**
Bot (validates if the number is correct): Uses logic to calculate Loan eligibility
Bot (response): You can avail X amount on X% interest.

--- End of the call ---

note

For guidelines to build a good conversation, click here.


Steps to configure voice input node

Prerequisite

The Voice Input node will only work for voice bots, meaning you must have enabled Voice bot when creating your bot.

center>

Steps to configure

  1. Go to Automation > Prompts > Voice input.

  2. In Input type, specify the type of information the node will collect from the end user.

    1. In Input type, choose your preferred input. For example, if Phone* is selected, the bot will prompt the user to input their phone number during the conversation.
    2. In Bot speaks, enter the message that the bot will say to request the user's phone number.
    3. In Repeat message, input the message that the bot will repeat if the user's response is unclear or requires verification.
  3. In Validator, validate the user input against the chosen criteria.

  4. In Capturing the user input, configure how the bot should gather user inputs.

  5. In Additional settings, adjust configurations to enhance conversation authenticity and emulate human-like interactions.

image

note

Values updated in this node will override the global values.

Example: Configure node for phone number

Following are the steps to collect a user's phone number using the voice input node:

  1. In the Input type section, choose the Input type as Phone and enter the messages for Bot speaks and Repeat message fields.

image

  1. In the Validator section, choose Phone in the How do you want to validate user input drop-down and enter the Failure message and No Response message. Mention the Boost Phrases too.

image

  1. In the Capturing user input section, choose Voice and Keypad in the Capture input as drop-down and fill in the rest of the fields.

image

  1. Store the response in a variable.


Input type

FieldDescription
Input typeChoose the type of user response.
1.Name
2.Email
3. Phone
4. Question
Bot SpeaksThe bot's reply that will be vocalized to the end-user during a call.
Repeat messageThe bot's reply that will be vocalized upon the user's request for the bot to reiterate the question or prompt.

You can add multiple messages to all of the above mentioned options by adding random messages . You can also play the message (SSML) to hear the bot response.

PCI data masking in voice calls

PCI data masking is a critical process used to conceal sensitive user information during voice interactions, ensuring compliance with PCI DSS (Payment Card Industry Data Security Standard) requirements. This process protects cardholder data from unauthorized access while allowing essential operations to proceed within the yellow.ai bot.

For instance, when the bot accepts credit card information as user input, it passes this data through an API call to process the payment securely. In the voice input node, you can select specific input types from a dropdown, including:

  • PCI - Card Number
  • PCI - Card Security Code (CVV)
  • PCI - Card PIN
  • PCI - Card Expiry Date

The validator ensures that all credit card data is masked, meaning it will not appear in the conversation logs, backend services, or any storage system. This data is completely protected and not retained in any logs or databases.

drawing

Validator

How do you want to validate user input

Choose the criteria against which the user input should be validated.
1. Name
2. Phone
3. Email
4. Intent
5. Entity
6. Regex
7. Intent and entity

note

You can choose several intents and entity or their combination in the validator. If the user's intention matches any of them, it will be validated, and the flow will move to the next node.

image

Failure Message

Failure message is vocalized when the validation fails.

  • +Add message: Click this button to add up to three successive fallback responses when the bot encounters invalid response from the user.

No response message

No response message is vocalized when there is no user response in the configured max response time.

  • +Add message: Click this button to add up to three successive fallback responses when the bot gets no-response from the user.
note

You can add upto 3 Failure messages and No response messages each.

Boost phrases

Some user responses can be confusing for the bot to understand. Region specific words, new genz lingos, internet terminologies, trending phrases, abbreviations are trained specially so that the bot understands the exact intention. For example, COVID is a new term that has been used frequently, the phrase COVID must be boosted, otherwise it gets translated to kovind/ go we/ co-wid etc. Ex - you should add the phrases that you expect from the user response like, < I want to take covid vaccine >


Capturing user input

Capture input as:

You can capture the user response as:

  • Voice: Capture only voice input.
  • Keypad: Capture only input from the keyboard.
  • Voice and Keypad: Capture input from both voice and keyboard.

Voice

FieldsDescription
STT engineSelect an engine from the dropdown- Google/Microsoft.
STT modeSelect mode from the dropdown. Microsoft provides "Static", "Streaming" or "Streaming Advanced", "Streaming 2.0". Google provides "Static".
STT languageSpeech-To-Text i.e. transcription language (or user language)(ISO code) can be selected from the dropdown. Click Microsoft or Google for more information on the languages)
STT engine endpointOptional endpoint of the STT custom model.
Recording max durationThis value is the Max duration for which the bot will wait after asking a question (in any step) even while the user is speaking. For example, after asking “Which city are you from?” and the recording duration value is “5" - the bot records only 5 seconds of user response. This option is necessary to avoid consuming unwanted information and to stay with the conversational flow. If the user mistakenly replies with long paragraphs when a question is asked or if the user's response is getting shadowed with constant background noises, the bot must not process those long inputs. Hence, with this configuration, the bot only takes the necessary response and can quickly process the user response.
Recording silence durationThis value is the Max duration for which the bot will wait after asking a question (in any step) for the user to respond. For example, if recording silence duration is 5 seconds, bot waits for 5 seconds for the response if the user is silent. If the user does not respond anything within 6 seconds, bot Message will be played.
Initial silence durationTo provide more customization on the silence duration parameter, “streaming” and “streaming-advanced” STT modes (of Microsoft STT engine) allow to specifically configure the maximum acceptable silence duration before the user starts speaking. For example, the acceptable initial silence duration for the application number question could be higher (~3/4 seconds) but in the case of a quick conversational binary question, it could be configured to 1 second.
Final silence durationSimilar to the initial silence duration, the final silence duration is indicative of the maximum duration of pause that the bot will wait for once the user has started speaking. For example, for binary/one-word questions like yes/no we could set the final silence duration to ~0.5/1.0 seconds and for address-like fields where taking a pause is intrinsic in conversation, we can set the final silence duration to ~1.5/2.5 seconds.

Keypad

FieldDescription
DTMF digital lengthEnter the length of characters to be captured. Ex: For an indian phone number, it is 10.
DTMF finish characterCharacter which defines when the bot must stop capturing. Supported finish characters - "*" and "#"

Voice and keypad

All the above mentioned options for Voice and Keypad will be listed together.


Additional settings

FieldsDescription
Enable wait messageEnable this toggle to vocalize an acknowledgement message to the user awaiting a message from the agent.
Wait messageAcknowledgement message displayed to the user.
Recording actionWith the recording management options, you can select to pause/resume/stop recording depending upon different use-cases and conversations. By default, the recording is ON. Once you STOP the recording (for recording sensitive dialogues), it can’t be resumed back.
TTS engineSelect the engines from the dropdown- Microsoft Azure, Google Wavenet, Amazon Polly.
Text typeSelect Text/SSML from the dropdown.
TTS languageBot Language(ISO code) can be selected from the dropdown.
PitchPitch value can be any decimal value depending on the base of voice required, 0 is ideal. You can add this for Microsoft if text_type = "text" and for Google for text_type = "text" and "SSML".
Voice IDType the characters of voice ID. You can add this for Microsoft if text_type = "text" and for Google if text_type = "text" and "SSML".
TTS SpeedThis value defines how fast the bot must converse. This value can be 0.9 - 1.5 for the bot to soundly humanly. You can add this for Microsoft if text_type = "text" and for Google if text_type = "text" and "SSML".

Custom response

Custom responses for each identified input

When the system is expecting an input from a user, it handles scenarios where the input doesn't align with expectations. In such cases, customized responses can be used to guide or assist the user. Each input is treated as a condition within an if-else structure. Validators are utilized to define these conditions. For every validation or validator, the voice input node interprets it as a distinct if-else condition. If the user's input matches one of the expected values defined by the validators, the corresponding action or response is triggered. If none of the expected values match the user's input, it's categorized as an unidentified utterance.

image

Fallback

Fallback nodes can be linked directly to the voice input node. To specify the action in case of encountered issues or call failures, connect the Fallback for failure point to the subsequent node. To proceed with the next action if the user hasn't responded to the input, connect the Failure for no response point to the next node.


Unidentified input storage

When the validation fails, the utterance is saved in the default table along with the following data:

  • SID
  • Phone Number
  • Drop Step [same as the voice input node name]
  • Utterance
  • Recording URL
  • Chat URL
  • Language
  • Translated Message
  • Date & Time (of the call initiation)
  • Issue [UIU / Intent Missing - {{Intent name}}]