Say

The say command sends synthesized speech to the remote party. The text can be plain text or can include SSML tags. Zerpia supports many speech vendors out of the box (see the list below), and you can add others via the custom speech API.

{
  "verb": "say",
  "text": "hi there!",
  "synthesizer": {
    "vendor": "google",
    "language": "en-US"
  }
}
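
Because the text may contain SSML tags, a say command can also control pauses and pronunciation. The following is a minimal sketch, assuming the selected vendor accepts the standard SSML elements speak, break, and say-as; tag support varies by vendor, and the sample wording and code value are illustrative only. Note that double quotes inside the SSML must be escaped in JSON.

```json
{
  "verb": "say",
  "text": "<speak>Your confirmation code is <say-as interpret-as=\"characters\">A1B2</say-as>.<break time=\"500ms\"/> Goodbye.</speak>",
  "synthesizer": {
    "vendor": "google",
    "language": "en-US"
  }
}
```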

You can use the following attributes with the say command:

Option | Description | Required
text | Text to speak; may contain SSML tags. | Yes
synthesizer.vendor | Speech vendor to use (see list below, along with any others you add via the custom speech API). | No
synthesizer.language | Language code to use. | No
synthesizer.fallbackVendor | Fallback speech vendor to use (see list below, along with any others you add via the custom speech API). | No
synthesizer.fallbackLanguage | Fallback language code to use. | No
synthesizer.gender | (Google only) MALE, FEMALE, or NEUTRAL. | No
synthesizer.voice | Voice to use. Note that the voice list differs depending on whether you are using AWS or Google. Defaults to the application setting, if provided. | No
loop | Number of times to repeat the text; 0 means repeat forever. Defaults to 1. | No
earlyMedia | If true and the call has not yet been answered, play the audio without answering the call. Defaults to false. | No
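
The optional attributes can be combined in a single command. The sketch below is illustrative rather than authoritative: it plays a prompt twice as early media, with a fallback vendor configured. The vendor identifier "aws" is an assumption modeled on the lowercase "google" used in the example above, and the prompt text is made up; check your account's configured vendors for the exact values.

```json
{
  "verb": "say",
  "text": "Please hold while we connect your call.",
  "synthesizer": {
    "vendor": "google",
    "language": "en-US",
    "gender": "FEMALE",
    "fallbackVendor": "aws",
    "fallbackLanguage": "en-US"
  },
  "loop": 2,
  "earlyMedia": true
}
```

Here loop and earlyMedia sit at the top level of the command, while the vendor, language, gender, and fallback settings are nested under synthesizer, matching the dotted option names listed above.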

Text-to-Speech Vendors

Zerpia natively supports the following text-to-speech services:

  • AWS
  • Azure
  • Deepgram
  • ElevenLabs
  • Google
  • IBM
  • Nuance
  • NVIDIA
  • WellSaid
  • Whisper

Note: For Azure, Microsoft supports on-prem and private link options for deploying the speech service, in addition to the hosted Microsoft service.
