Overview Webhook

Overview

Zerpia controls calls through JSON payloads exchanged over an HTTP(S) connection. When an incoming call for your account is received, Zerpia retrieves the URL you’ve configured for the application you want to run.

If the URL begins with ‘http(s)://’, Zerpia makes an HTTP request to that URL. The request describes the incoming call, and your web app is responsible for returning a response whose JSON body indicates how you want the call handled.

Generating an outbound call works similarly: you’ll make an HTTP request using the Webhook, specifying a URL or application identifier to be invoked once the call is answered. Again, your response to that HTTP request will contain a JSON body indicating how you want the call handled.

Simple enough, right?

—

Basic JSON Message Structure

The JSON payload you provide in response to an HTTP request must be an array, with each item describing a task the platform should perform. These tasks are executed sequentially in the order they appear in the array. Each task is identified by a **verb** (for example, “dial”, “gather”, or “hangup”) with associated details that configure how the action should be carried out.

If the caller hangs up during the execution of an application for that call, the current task is allowed to complete, and any remaining tasks in the application are ignored.

[
  {
    "verb": "say",
    "text": "Hi there! Please leave a message at the tone.",
    "synthesizer": {
      "vendor": "Google",
      "language": "en-US",
      "gender": "FEMALE"
    }
  },
  {
    /* ..next verb */
  }
]
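
As a rough illustration of the structure described above, the following Python sketch checks that a payload is an array of objects that each carry a verb before it is serialized. The function name validate_application is hypothetical and not part of the platform; it only checks the basic shape, not verb-specific properties.

```python
import json


def validate_application(payload):
    """Return a list of problems found in a webhook response payload (empty if none)."""
    if not isinstance(payload, list):
        return ["payload must be a JSON array of verb objects"]
    problems = []
    for index, task in enumerate(payload):
        if not isinstance(task, dict):
            problems.append(f"item {index} is not a JSON object")
        elif not isinstance(task.get("verb"), str) or not task["verb"]:
            problems.append(f"item {index} has no 'verb' property")
    return problems


# Tasks run in array order: the prompt is spoken first, then the call is hung up.
application = [
    {"verb": "say", "text": "Hi there! Please leave a message at the tone."},
    {"verb": "hangup"},
]
problems = validate_application(application)
response_body = json.dumps(application)  # what the web app would return
```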

Some verbs allow other verbs to be nested; for example, “gather” can have a nested “say” command to play a prompt and collect a response in one command:

{
  "verb": "gather",
  "actionHook": "/gatherCardNumber",
  "input": ["speech", "dtmf"],
  "timeout": 16,
  "numDigits": 6,
  "recognizer": {
    "vendor": "Google",
    "language": "en-US"
  },
  "say": {
    "text": "Please say or enter your six-digit card number now",
    "synthesizer": {
      "vendor": "Google",
      "language": "en-US",
      "gender": "FEMALE"
    }
  }
}

Overall, a simple voicemail application could look like this:

[
  {
    "verb": "say",
    "text": "Hi there! Please leave a message at the tone, and we will get back to you shortly."
  },
  {
    "verb": "listen",
    "actionHook": "http://example.com/voicemail",
    "url": "wss://example.com/my-recorder",
    "finishOnKey": "#",
    "metadata": {
      "topic": "voicemail"
    },
    "playBeep": true,
    "timeout": 20
  },
  {
    "verb": "say",
    "text": "Thanks for your message. We'll get back to you."
  }
]
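
To show where such a payload fits in a web app, here is a minimal sketch of a webhook endpoint built on Python's standard http.server module. The handler class, the serve helper, and port 3000 are assumptions made for illustration; any web framework that returns a JSON array in the response body should work the same way.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def voicemail_application():
    """Build the voicemail application shown above as a list of verbs."""
    return [
        {"verb": "say",
         "text": "Hi there! Please leave a message at the tone, and we will get back to you shortly."},
        {"verb": "listen",
         "actionHook": "http://example.com/voicemail",
         "url": "wss://example.com/my-recorder",
         "finishOnKey": "#",
         "metadata": {"topic": "voicemail"},
         "playBeep": True,
         "timeout": 20},
        {"verb": "say", "text": "Thanks for your message. We'll get back to you."},
    ]


class CallHandler(BaseHTTPRequestHandler):
    """Answers every POST with the voicemail application as a JSON array."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # call details sent by the platform (unused here)
        body = json.dumps(voicemail_application()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=3000):
    """Run the endpoint until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), CallHandler).serve_forever()
```

Calling serve() starts the endpoint; the URL configured for the application would then point at this server.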

—

HTTP Connection Details

Each HTTP request Zerpia makes to one of your callbacks will include (at least) the following information, either as query arguments (in a GET request) or in the body of the request as a JSON payload (in a POST request):

call_sid: A unique identifier for the call.
application_sid: A unique identifier for the Zerpia application controlling this call.
account_sid: A unique identifier for the Zerpia account associated with the application.
direction: The direction of the call: inbound or outbound.
from: The calling party number.
to: The called party number.
caller_id: The caller name, if known.
call_status: Current status of the call; see table below.
sip_status: The most recent SIP status code received or generated for the call.

Additionally, the request **MAY** include:

parent_call_sid: The call_sid of a parent call to this call, if this call is a child call.

And the initial webhook for a new incoming call will have:

originating_sip_trunk_name: Name of the SIP trunk that originated the call to the platform.
originating_sip_ip: The IP address and port of the SIP gateway that originated the call.

Finally, if you configure the initial webhook for an incoming call to use the POST method, the JSON payload in that POST will also contain the entire incoming SIP INVITE request in a ‘sip’ property (this is not provided if a GET request is used). This can be useful if you need a detailed look at all of the SIP headers or the Session Description Protocol being offered.
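
The sketch below shows one way a callback might read these fields from either request style. The helper name parse_call_info and the REQUIRED_FIELDS tuple are illustrative assumptions; note that values arriving as GET query arguments are always strings.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Fields the text above says every callback request carries.
REQUIRED_FIELDS = (
    "call_sid", "application_sid", "account_sid", "direction",
    "from", "to", "caller_id", "call_status", "sip_status",
)


def parse_call_info(method, path, body=b""):
    """Return (info, missing) for a webhook request.

    GET requests carry the fields as query arguments; POST requests carry
    them as a JSON object in the request body.
    """
    if method.upper() == "GET":
        query = parse_qs(urlsplit(path).query, keep_blank_values=True)
        info = {key: values[0] for key, values in query.items()}
    else:
        info = json.loads(body or b"{}")
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    return info, missing
```

With a POST webhook, info.get("sip") would hold the SIP INVITE details when they are present.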

Note: You can add arbitrary information of your own into the payloads that Zerpia sends you by using the tag verb early in your application flow. Data elements you provide in that verb will then come back to you in further webhook callbacks for that call. This can be useful for managing stateful information during a call that you may want to drive decision logic later in the call.

call_status value	description
`trying`	A new incoming call has arrived or an outbound call has just been sent.
`ringing`	A 180 Ringing response has been sent or received.
`early-media`	An early media connection has been established prior to answering the call (183 Session Progress).
`in-progress`	Call has been answered.
`completed`	An answered call has ended.
`failed`	A call attempt failed.
`busy`	A call attempt failed because the called party returned a busy status.
`no-answer`	A call attempt failed because it was not answered in time.
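
Reading the table above, the last four statuses describe calls that have ended or never connected. A small helper built on that reading could look like the sketch below; the names CALL_STATUSES, FINAL_STATUSES, and is_final are illustrative, and treating exactly these four values as final is an assumption drawn from the descriptions rather than a documented guarantee.

```python
# All call_status values listed in the table above.
CALL_STATUSES = (
    "trying", "ringing", "early-media", "in-progress",
    "completed", "failed", "busy", "no-answer",
)

# Assumption: these statuses mean no further call progress is expected.
FINAL_STATUSES = frozenset({"completed", "failed", "busy", "no-answer"})


def is_final(call_status):
    """Return True when the status indicates the call is over."""
    if call_status not in CALL_STATUSES:
        raise ValueError(f"unknown call_status: {call_status!r}")
    return call_status in FINAL_STATUSES
```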

—

Securing Your HTTP Endpoints

Before we go any further, let’s discuss how to properly secure your endpoints.

This is important because your response to HTTP webhook requests will contain information that must be kept private between you and the Zerpia platform. We recommend using HTTPS connections secured with TLS certificates for your endpoints, and additionally taking steps to verify that the incoming request was actually sent by Zerpia, and not an imposter.

For the latter, you have two options:

You can use HTTP basic authentication to secure your endpoint with a username and password.
On the hosted platform, you can verify the signature of the HTTP request to confirm it was sent by Zerpia.
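
For the first option, a minimal sketch of checking the standard HTTP Basic Authorization header in Python follows. The function name check_basic_auth is hypothetical; the header format (the word Basic followed by base64 of username:password) is the standard one.

```python
import base64
import hmac


def check_basic_auth(authorization_header, expected_username, expected_password):
    """Return True if the Authorization header carries the expected credentials."""
    if not authorization_header:
        return False
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    username, separator, password = decoded.partition(":")
    if not separator:
        return False
    # Constant-time comparisons avoid leaking how much of a value matched.
    user_ok = hmac.compare_digest(username.encode("utf-8"), expected_username.encode("utf-8"))
    pass_ok = hmac.compare_digest(password.encode("utf-8"), expected_password.encode("utf-8"))
    return user_ok and pass_ok
```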

Verifying a Signed Request

HTTP requests sent to you from the hosted platform will include a Zerpia-Signature header. This is a hash of the request payload signed with your webhook secret, which you can view (and change, when desired) in the self-service portal. Using that secret, you can verify that the request was actually sent by Zerpia.
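
The exact signing scheme is not spelled out here, so the following is only a sketch under a loud assumption: that the Zerpia-Signature header holds a hex-encoded HMAC-SHA256 of the raw request body, keyed with your webhook secret. Check the portal or platform reference for the real format before relying on it; the function names are illustrative.

```python
import hashlib
import hmac


def sign_payload(raw_body, webhook_secret):
    """ASSUMED scheme: hex HMAC-SHA256 of the raw request body."""
    return hmac.new(webhook_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def verify_signature(raw_body, signature_header, webhook_secret):
    """Return True if the header matches the signature computed locally."""
    if not signature_header:
        return False
    expected = sign_payload(raw_body, webhook_secret)
    # Compare as bytes, in constant time.
    return hmac.compare_digest(expected.encode("ascii"),
                               signature_header.strip().encode("utf-8"))
```

Whatever the actual scheme is, the comparison should be made against the raw bytes of the request body, before any JSON parsing or re-serialization.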

—

Initial State of Incoming Calls

When the Zerpia platform receives a new incoming call, it responds 100 Trying to the INVITE but doesn’t automatically answer the call. It’s up to your application to decide how to finally respond to the INVITE. You have a few choices here.

Your application can:

**Answer the call:** This connects the call to a media endpoint that can perform IVR functions.
**Outdial a new call:** Bridge the two calls together (i.e., use the dial verb).
**Reject the call:** With a specified SIP status code and reason.
**Establish an early media connection:** Play audio to the caller without answering the call.

The last option is particularly interesting and worth further comment. The intent is to let you play audio to callers without necessarily answering the call. You signal this by including an earlyMedia property with a value of true in the application. When receiving this, Zerpia will create an early media connection (using 183 Session Progress), as shown in the example below.

Note: An early media connection will not be possible if the call has already been answered by an earlier verb in the application. In such a scenario, the earlyMedia property is ignored.

[
  {
    "verb": "say",
    "earlyMedia": true,
    "text": "Please call back later, we are currently at lunch",
    "synthesizer": {
      "vendor": "aws",
      "language": "en-US",
      "voice": "Amy"
    }
  },
  {
    "verb": "sip:decline",
    "status": 480,
    "headers": {
      "Retry-After": 1800
    }
  }
]

Please note:

The say, play, gather, listen, and transcribe verbs all support the earlyMedia property.
The dial verb supports a similar feature through the answerOnBridge property: the inbound call is not answered unless and until the dialed call is answered.
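
A small Python sketch of generating the early-media application shown earlier in this section follows. The helper name decline_with_announcement and its default values are illustrative, not part of the platform.

```python
import json


def decline_with_announcement(text, status=480, retry_after_seconds=None):
    """Play a prompt over early media, then reject the call with a SIP status."""
    say = {"verb": "say", "earlyMedia": True, "text": text}
    decline = {"verb": "sip:decline", "status": status}
    if retry_after_seconds is not None:
        decline["headers"] = {"Retry-After": retry_after_seconds}
    # Order matters: the prompt must come before the call is declined.
    return [say, decline]


application = decline_with_announcement(
    "Please call back later, we are currently at lunch",
    status=480,
    retry_after_seconds=1800,
)
response_body = json.dumps(application)
```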

—

Authenticating SIP Clients

Zerpia allows SIP clients such as softphones, SIP phones, and WebRTC clients to register with the platform and make and receive calls.

Managing SIP registrations is a shared activity between the platform and your application, using webhooks. The platform handles the SIP messaging details, but the determination of whether to authenticate a specific SIP user is the responsibility of the application, which is notified of incoming REGISTER or INVITE requests by means of a registration webhook.

This approach ensures that SIP credentials, which are highly confidential and private, are stored within customer networks and are never directly exposed to the Zerpia platform.

When the platform receives an incoming SIP request from an endpoint that is not a carrier SIP trunk, the request is challenged with a 401 Unauthorized response that includes a WWW-Authenticate header.

When the originating SIP device then resends the request with credentials (e.g., an Authorization header), the SIP domain is retrieved from the request and used to look up the account that owns that domain. Then, the associated registration webhook is invoked with the details provided in the Authorization header, for example:

{
  "method": "REGISTER",
  "realm": "example.com",
  "username": "foo",
  "expires": 3600,
  "nonce": "InFriVGWVoKeCckYrTx7wg==",
  "uri": "sip:example.com",
  "algorithm": "MD5",
  "qop": "auth",
  "cnonce": "03d8d2aafd5a975f2b07dc90fe5f4100",
  "nc": "00000001",
  "response": "db7b7dbec7edc0c427c1708031f67cc6"
}

The application’s responsibility is to retrieve the password associated with the username and perform digest authentication to authenticate the request using the information provided, including the calculated response value.

Regardless of whether the request is authenticated or not, the application should respond with a 200 OK to the HTTP POST and with a JSON body.

If the request is authenticated, the JSON body in the response should simply contain a status attribute with a value of ok, for example:

{
  "status": "ok"
}

If the application wishes to enforce a shorter expires value, it may include that value in the response, for example:

{
  "status": "ok",
  "expires": 1800
}

If the request is *not* authenticated, the JSON body in the response should contain a status of fail and optionally a msg attribute, for example:

{
  "status": "fail",
  "msg": "invalid password"
}
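
Putting the steps above together, a registration webhook might be sketched as follows. The digest calculation follows standard HTTP digest authentication with MD5; the function names, the lookup_password callback, and the max_expires parameter are assumptions made for illustration.

```python
import hashlib
import hmac


def _md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def expected_digest_response(auth, password):
    """Compute the digest response the client should have sent (MD5 algorithm)."""
    ha1 = _md5_hex(f"{auth['username']}:{auth['realm']}:{password}")
    ha2 = _md5_hex(f"{auth['method']}:{auth['uri']}")
    if auth.get("qop") == "auth":
        return _md5_hex(
            f"{ha1}:{auth['nonce']}:{auth['nc']}:{auth['cnonce']}:auth:{ha2}"
        )
    return _md5_hex(f"{ha1}:{auth['nonce']}:{ha2}")  # no qop supplied


def handle_registration(auth, lookup_password, max_expires=None):
    """Return the JSON body for the registration webhook (sent with 200 OK)."""
    password = lookup_password(auth["username"])  # hypothetical credential store
    if password is None:
        return {"status": "fail", "msg": "unknown user"}
    expected = expected_digest_response(auth, password)
    received = str(auth.get("response", "")).lower()
    if not hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8")):
        return {"status": "fail", "msg": "invalid password"}
    reply = {"status": "ok"}
    # Optionally enforce a shorter registration interval.
    if max_expires is not None and auth.get("expires", 0) > max_expires:
        reply["expires"] = max_expires
    return reply
```

Because the password is only read inside lookup_password, the credentials never leave your own network; only the status is returned to the platform.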

—

Speech Integration

The platform uses text-to-speech (TTS) as well as real-time speech recognition (STT). Google and AWS are each supported for both TTS and STT.

Synthesized audio is cached for up to 24 hours. This means if the same {text, language, voice} combination is requested more than once in that period, it will be served from cache, thereby reducing your TTS bill from Google or AWS.

When you configure your applications in the portal, you can set defaults for the language and voice to use for speech synthesis, as well as the language to use for speech recognition. These can then be overridden by verbs in the application, by using the synthesizer and recognizer properties.
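
As an illustration of the override mechanism, the sketch below builds a say verb that includes a synthesizer property only when something differs from the application defaults. The helper name say and its parameters are hypothetical; the property names mirror the examples elsewhere in this document.

```python
def say(text, vendor=None, language=None, voice=None):
    """Build a say verb, adding a synthesizer block only for explicit overrides."""
    task = {"verb": "say", "text": text}
    overrides = {
        key: value
        for key, value in (("vendor", vendor), ("language", language), ("voice", voice))
        if value is not None
    }
    if overrides:
        task["synthesizer"] = overrides
    return task


# Uses the defaults configured for the application in the portal.
default_prompt = say("Thanks for calling.")

# Overrides the defaults for this one prompt.
custom_prompt = say("Thanks for calling.", vendor="aws", language="en-US", voice="Amy")
```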

—

Webhook URL Specifiers

Many verbs specify a webhook that will be called when the verb completes or has some information to deliver to your application. These verbs contain a property that allows you to configure that webhook. By convention, the property name will always end in “Hook”; e.g., actionHook, dtmfHook, and so on.

You can either specify the webhook as a simple string, providing an absolute or relative URL:

"actionHook": "https://my.appserver.com/results"

"actionHook": "/results"

In the latter case, the base URL of the application will be applied.

Alternatively, you can provide an object containing a URL and optional method and basic authentication parameters, for example:

"actionHook": {
  "url": "https://my.appserver.com/results",
  "method": "GET",
  "username": "foo",
  "password": "bar"
}
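
When generating verbs in code, it can help to accept either form and normalize it. The sketch below does that in Python, resolving relative URLs with the standard library's urljoin; the function name normalize_hook is hypothetical, and using urljoin semantics to mimic how the platform applies the application's base URL is an assumption.

```python
from urllib.parse import urljoin


def normalize_hook(hook, application_base_url):
    """Turn a string or object webhook specifier into an object with an absolute URL."""
    if isinstance(hook, str):
        spec = {"url": hook}
    else:
        spec = dict(hook)  # copy so the caller's object is not modified
    # Absolute URLs pass through unchanged; relative ones are resolved against the base.
    spec["url"] = urljoin(application_base_url, spec["url"])
    return spec


base = "https://my.appserver.com/app"
relative = normalize_hook("/results", base)
with_auth = normalize_hook(
    {"url": "https://my.appserver.com/results", "method": "GET",
     "username": "foo", "password": "bar"},
    base,
)
```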

In the sections that follow, we will describe each of the verbs in detail.