Text-to-Speech and SSML Support

Text-to-Speech (TTS)

The <Say> verb converts text into human-like speech in real time. Provide the text in the Visual Designer's Say element, and Restcomm synthesizes the speech and plays back the audio. The default TTS provider is Amazon Polly. By default, a male voice with a US English dialect is used.

When using <Say>, you can choose between male and female voices from Google or Amazon Polly.

Speech Synthesis Markup Language (SSML)

You can include Speech Synthesis Markup Language (SSML) in your Text-to-Speech request to customize the audio response. SSML lets you specify pauses and control how acronyms, dates, times, abbreviations, and text that should be censored are spoken.

Supported Voices and Languages

For detailed information about all languages and voices supported by Amazon Polly and Google, see the following resources:

  • Amazon Polly supported voices and languages

  • Google supported voices and languages

Examples

SSML Markup and Its Text-to-Speech Synthesis

<speak>
  This is a <say-as interpret-as="characters">SSML</say-as> example.
  I can pause <break time="3s"/>.
  I can play a sound
  <audio src="https://www.example.com/MY_MP3_FILE.mp3">didn't get your MP3 audio file</audio>.
  I can speak in cardinals. Your number is <say-as interpret-as="cardinal">10</say-as>.
  Or I can speak in ordinals. You are <say-as interpret-as="ordinal">10</say-as> in line.
  Or I can even speak in digits. The digits for ten are <say-as interpret-as="characters">10</say-as>.
  I can also substitute phrases, like the <sub alias="World Wide Web Consortium">W3C</sub>.
  Finally, I can speak a paragraph with two sentences.
  <p><s>This is sentence one.</s><s>This is sentence two.</s></p>
</speak>

Below is the synthesized text for the example SSML document:

This is a S S M L example. I can pause [3 second pause]. I can play a sound [audio file plays].
I can speak in cardinals. Your number is ten.
Or I can speak in ordinals. You are tenth in line.
Or I can even speak in digits. The digits for ten are one oh.
I can also substitute phrases, like the World Wide Web Consortium.
Finally, I can speak a paragraph with two sentences. This is sentence one. This is sentence two.

Google Cloud Text-to-Speech supports a subset of the available SSML tags.

For more information about how to create audio data from SSML input with Google Cloud Text-to-Speech, see Creating Voice Audio Files.

Google Cloud Support for SSML Elements

You can use various SSML elements and options for your actions. For more information, see Google Cloud Support for SSML elements.

Amazon Polly Support for SSML Elements

For more information about the SSML tags that Amazon Polly supports, see Amazon Polly Supported SSML Tags.

Using Speech Synthesis Markup Language (SSML) in Visual Designer

You can use SSML within a <Say> verb in Visual Designer as shown below.

  • Click on the gear icon to expand the <Say> verb settings. You will notice a Language drop-down field. Select the desired language.

  • Select the male or female icon next to the Language field to set a voice variation.

  • Save your application.

Using SSML in Visual Designer

Using Speech Synthesis Markup Language (SSML) in RCML

You can use SSML in your RCML applications to create pauses and to control how acronyms, dates, times, abbreviations, or text that should be censored are spoken.

The <emphasis> element can be used to add or remove emphasis from text contained by the element as follows.

<Response>
<Say voice="woman" language="en" loop="3">
<speak>
   <emphasis level="moderate">This is an important announcement</emphasis>
</speak>
</Say>
</Response>

The <break> element lets you control pausing or other prosodic boundaries between words. Using <break> between any pair of tokens is optional. If this element is not present between words, the break is automatically determined based on the linguistic context.

This element accepts two optional attributes:

  • time: Sets the length of the break in seconds or milliseconds (e.g. "3s" or "250ms").

  • strength: Sets the strength of the output’s prosodic break in relative terms. Valid values are: "none", "x-weak", "weak", "medium", "strong", and "x-strong". The value "none" indicates that no prosodic break boundary should be output, which can be used to prevent a prosodic break that the processor would otherwise produce. The other values indicate monotonically non-decreasing (conceptually increasing) break strength between tokens. The stronger boundaries are typically accompanied by pauses.

The following example shows how to use the <break> element to pause between steps:

<Response>
<Say voice="woman" language="en" loop="3">
<speak>
  Step 1, take a deep breath. <break time="200ms"/>
  Step 2, exhale.
  Step 3, take a deep breath again. <break strength="weak"/>
  Step 4, exhale.
</speak>
</Say>
</Response>
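When <break> tags are generated programmatically, the two attributes described above can be checked before the markup is sent. The following is a minimal Python sketch, not part of Restcomm: the helper name break_tag is hypothetical, and the time pattern assumes a plain number followed by "s" or "ms", as in the examples above.

```python
import re

# Strength values listed for the <break> element, including "none".
BREAK_STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}

# Assumption: a time value is a number followed by "s" or "ms" (e.g. "3s", "250ms").
TIME_PATTERN = re.compile(r"\d+(\.\d+)?(ms|s)")


def break_tag(time=None, strength=None):
    """Return a <break/> tag string, rejecting attribute values outside the documented set."""
    attrs = []
    if time is not None:
        if not TIME_PATTERN.fullmatch(time):
            raise ValueError(f"invalid break time: {time!r}")
        attrs.append(f'time="{time}"')
    if strength is not None:
        if strength not in BREAK_STRENGTHS:
            raise ValueError(f"invalid break strength: {strength!r}")
        attrs.append(f'strength="{strength}"')
    if not attrs:
        # Both attributes are optional, so a bare tag is allowed.
        return "<break/>"
    return "<break " + " ".join(attrs) + "/>"


print(break_tag(time="200ms"))      # <break time="200ms"/>
print(break_tag(strength="weak"))   # <break strength="weak"/>
```

The returned string can be placed between words inside a <speak> element. A value such as "3 seconds" raises ValueError instead of producing markup the TTS provider may reject.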

The <say-as> element lets you indicate information about the type of text construct that is contained within the element. It also helps specify the level of detail for rendering the contained text.

The <say-as> element has one required attribute, interpret-as, which determines how the value is spoken. The optional attributes format and detail may be used depending on the particular interpret-as value. The interpret-as attribute supports the following values:

cardinal

The following example is spoken as "Twelve thousand three hundred forty five" (for US English) or "Twelve thousand three hundred and forty five" (for UK English):

<Response>
<Say voice="woman" language="en" loop="3">
<speak>
  <say-as interpret-as="cardinal">12345</say-as>
</speak>
</Say>
</Response>

ordinal

The following example is spoken as "First":

<Response>
<Say voice="woman" language="en" loop="3">
<speak>
  <say-as interpret-as="ordinal">1</say-as>
</speak>
</Say>
</Response>

characters

The following example is spoken as "C A N":

<Response>
<Say voice="woman" language="en" loop="3">
  <speak>
    <say-as interpret-as="characters">can</say-as>
  </speak>
</Say>
</Response>

expletive or bleep

The following example comes out as a beep, as though it has been censored:

<Response>
<Say voice="woman" language="en" loop="3">
  <speak>
    <say-as interpret-as="expletive">censor this</say-as>
  </speak>
</Say>
</Response>

verbatim or spell-out

The following example is spelled out letter by letter:

<Response>
<Say voice="woman" language="en" loop="3">
  <speak>
    <say-as interpret-as="verbatim">abcdefg</say-as>
  </speak>
</Say>
</Response>

date

The format attribute is a sequence of date field character codes. Supported field character codes in format are {y, m, d} for year, month, and day (of the month) respectively. If the field code appears once for year, month, or day, then the number of digits expected is 4, 2, and 2 respectively. If the field code is repeated, then the number of expected digits is the number of times the code is repeated. Fields in the date text may be separated by punctuation and/or spaces.

The detail attribute controls the spoken form of the date. For detail='1', only the day field and one of the month or year fields are required, although both may be supplied. This is the default when fewer than all three fields are given. The spoken form is "The {ordinal day} of {month}, {year}".

The following example is spoken as "The thirtieth of September, two thousand and nineteen":

<Response>
<Say voice="woman" language="en" loop="3">
<speak>
  <say-as interpret-as="date" format="yyyymmdd" detail="1">
    2019-09-30
  </say-as>
</speak>
</Say>
</Response>

The following example is spoken as "The thirtieth of September":

<speak>
  <say-as interpret-as="date" format="dm">30-9</say-as>
</speak>
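The format rules for the date type can be illustrated with a short Python sketch. It is an illustration of the rules as described above, not the parser used by Google, Amazon Polly, or Restcomm. The function names are hypothetical. As a labeled assumption, when the date text contains separators the sketch splits on them and accepts fields shorter than the expected width (as in "30-9"); when there are no separators it slices the digits by the expected widths.

```python
import re

# Digits expected when a field code appears only once.
DEFAULT_WIDTH = {"y": 4, "m": 2, "d": 2}


def parse_format(fmt):
    """Turn a format such as 'yyyymmdd' or 'dm' into (code, expected_digits) pairs."""
    if not re.fullmatch(r"[ymd]+", fmt):
        raise ValueError(f"unsupported format: {fmt!r}")
    fields = []
    for match in re.finditer(r"y+|m+|d+", fmt):
        run = match.group()
        code = run[0]
        # A single code uses the default width; a repeated code uses its repeat count.
        width = DEFAULT_WIDTH[code] if len(run) == 1 else len(run)
        fields.append((code, width))
    return fields


def parse_date_text(text, fmt):
    """Extract the numeric date fields from text according to fmt."""
    fields = parse_format(fmt)
    groups = re.findall(r"\d+", text)  # punctuation and spaces act as separators
    if len(groups) == 1 and len(fields) > 1:
        # No separators: slice the digit run by the expected widths.
        digits = groups[0]
        if len(digits) != sum(width for _, width in fields):
            raise ValueError(f"{text!r} does not match format {fmt!r}")
        groups, pos = [], 0
        for _, width in fields:
            groups.append(digits[pos:pos + width])
            pos += width
    if len(groups) != len(fields):
        raise ValueError(f"{text!r} does not match format {fmt!r}")
    return {code: int(value) for (code, _), value in zip(fields, groups)}


print(parse_date_text("2019-09-30", "yyyymmdd"))  # {'y': 2019, 'm': 9, 'd': 30}
print(parse_date_text("30-9", "dm"))              # {'d': 30, 'm': 9}
```

The two calls correspond to the two date examples above. The sketch only extracts the fields; how they are spoken depends on the detail attribute and on the TTS provider.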

If you want to build more complex SSML scenarios, check out the Google Cloud and Amazon Polly documentation pages.

Testing your SSML settings

You can test your SSML settings by initiating a call to your application. Before calling, make sure the application is bound to a phone number or SIP client.
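Before placing a test call, it can help to check locally that the markup is well-formed XML, since a missing closing tag is easy to overlook. The following Python sketch wraps an SSML fragment in the <Response> and <Say> structure used in the examples above. The helper name build_say_rcml is hypothetical and not part of Restcomm, and the check covers XML well-formedness only, not whether a given TTS provider supports each tag.

```python
import xml.etree.ElementTree as ET


def build_say_rcml(ssml_body, voice="woman", language="en", loop=1):
    """Wrap an SSML fragment in <Response><Say><speak>, failing early on malformed XML."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError if the fragment is not well-formed.
    speak = ET.fromstring(f"<speak>{ssml_body}</speak>")
    response = ET.Element("Response")
    say = ET.SubElement(
        response,
        "Say",
        {"voice": voice, "language": language, "loop": str(loop)},
    )
    say.append(speak)
    return ET.tostring(response, encoding="unicode")


rcml = build_say_rcml(
    '<emphasis level="moderate">This is an important announcement</emphasis>',
    loop=3,
)
print(rcml)
```

A fragment with an unclosed tag, such as '<emphasis>oops', raises ParseError before anything is deployed. The resulting string can then be used as the RCML for the application that you bind to a phone number or SIP client.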
