Overview of Speech Link Architecture

The speech link protocol is an application-layer control protocol for transferring callers between cooperating speech applications, pre-existing interactive voice response (IVR) applications based on dual-tone multi-frequency (DTMF, or TouchTone) input, and human agents, whether in call centers or elsewhere. The voice path of the call can be carried over the Public Switched Telephone Network (PSTN), a Voice over IP (VoIP) network, or a combination of the two. Call control is handled over an IP network, usually the Internet. A connection managed by the speech link protocol is referred to as a "speech link".

Speech links use the Session Initiation Protocol (SIP, IETF RFC 2543) for call control. Cooperating applications exchange information about the call as multipart MIME-encoded user data in the SIP message body, and the information within that body is expressed in XML. The speech link protocol defines its own XML tags, and cooperating applications can exchange their own information through the same mechanism. An additional MIME part is provided for passing application-specific data.
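The following is a minimal sketch, using Python's standard `email` and `xml.etree` packages, of how such a multipart body could be assembled and taken apart. The XML vocabulary (`<speechlink>` containing `<attribute name="...">` elements), the use of `application/octet-stream` for the application-specific part, and the helper names `build_body` and `parse_body` are all assumptions for illustration; the real speech link tag set is defined in the protocol description.

```python
import xml.etree.ElementTree as ET
from email import message_from_string
from email.message import Message
from email.mime.multipart import MIMEMultipart

SPEECHLINK_TYPE = "application/speechlink"
APP_DATA_TYPE = "application/octet-stream"  # assumption: type of the application-specific part


def _part(content_type: str, payload: str) -> Message:
    """Create one MIME body part with the given content type."""
    part = Message()
    part["Content-Type"] = content_type
    part.set_payload(payload)
    return part


def build_body(call_info: dict[str, str], app_data: str) -> str:
    """Encode call information as XML and bundle it with application-specific data."""
    root = ET.Element("speechlink")  # hypothetical tag names
    for name, value in call_info.items():
        ET.SubElement(root, "attribute", name=name).text = value
    body = MIMEMultipart("mixed")
    body.attach(_part(SPEECHLINK_TYPE, ET.tostring(root, encoding="unicode")))
    body.attach(_part(APP_DATA_TYPE, app_data))
    return body.as_string()


def parse_body(text: str) -> tuple[dict[str, str], str]:
    """Split a multipart body back into call information and application data."""
    msg = message_from_string(text)
    call_info: dict[str, str] = {}
    app_data = ""
    for part in msg.walk():  # the first item is the multipart container itself
        ctype = part.get_content_type()
        if ctype == SPEECHLINK_TYPE:
            root = ET.fromstring(part.get_payload())
            for elem in root.iter("attribute"):
                call_info[elem.get("name")] = elem.text or ""
        elif ctype == APP_DATA_TYPE:
            app_data = part.get_payload()
    return call_info, app_data
```

In an actual SIP message, the top-level `Content-Type: multipart/mixed; boundary=...` line would appear among the SIP headers, and only the encapsulated parts would form the message body.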

A description of the protocol itself is provided in the speech link protocol document.

Component Architecture

Within a SpeechWorks recognizer system, speech links provide a new hardware layer. This layer is essentially an extension of the telephony hardware driver and is responsible for all call-control functions. The figure above shows the stack of components.

The hardware telephony API (SLhwTel) controls calls through the low-level drivers for the telephony hardware. It is responsible for taking lines off hook and placing them on hook, dialing, transferring calls, detecting when other parties hang up, receiving and sending DTMF, providing an event-handling mechanism, and setting and querying properties of the underlying hardware.

The SIP message stack (SLsip) is responsible for low-level SIP message marshaling, tying messages to call sessions, and handling IP-based events.
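One way to tie messages to call sessions is sketched below. SIP (RFC 2543) identifies a call by its `Call-ID` header (compact form `i`), so this hypothetical `SessionTable` routes every incoming message to the session with a matching Call-ID. The class and function names are illustrative, not SLsip's actual interface; a full stack would also distinguish call legs using the `To`/`From` tags, and would handle folded and repeated headers, which this sketch does not.

```python
from dataclasses import dataclass, field


@dataclass
class CallSession:
    """Start lines of the messages seen so far for one call."""
    call_id: str
    messages: list[str] = field(default_factory=list)


def parse_headers(raw: str) -> tuple[str, dict[str, str]]:
    """Split a SIP message into its start line and a header dict (lower-cased names).

    Simplification: folded header lines are not joined, and a repeated header
    (e.g. several Via lines) keeps only its last value.
    """
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers: dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


class SessionTable:
    """Route each incoming SIP message to the call session it belongs to."""

    def __init__(self) -> None:
        self.sessions: dict[str, CallSession] = {}

    def dispatch(self, raw: str) -> CallSession:
        start_line, headers = parse_headers(raw)
        # "i" is the compact form of the Call-ID header.
        call_id = headers.get("call-id") or headers.get("i")
        if call_id is None:
            raise ValueError("SIP message has no Call-ID header")
        session = self.sessions.setdefault(call_id, CallSession(call_id))
        session.messages.append(start_line)
        return session
```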

The speech link telephony layer (SLtel) coordinates the SIP stack and the hardware telephony stack. It can launch threads to handle call coordination, and it bridges calls.
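The thread-per-call coordination could look roughly like the following sketch. It assumes, purely for illustration, that a bridge is made once the SIP side reports a `200 OK` and the hardware side reports a hypothetical `CONNECTED` event, and that a `HANGUP` from either side abandons the call. `CallCoordinator` and its event names are not part of SLtel; appending to `bridged` stands in for actually connecting the two voice paths.

```python
import queue
import threading


class CallCoordinator:
    """Hypothetical SLtel-style coordinator: one worker thread per call."""

    def __init__(self) -> None:
        self.bridged: list[str] = []
        self._lock = threading.Lock()
        self._queues: dict[str, queue.Queue] = {}
        self._threads: dict[str, threading.Thread] = {}

    def start_call(self, call_id: str) -> None:
        q: queue.Queue = queue.Queue()
        t = threading.Thread(target=self._run, args=(call_id, q), daemon=True)
        self._queues[call_id] = q
        self._threads[call_id] = t
        t.start()

    def post(self, call_id: str, source: str, event: str) -> None:
        """Deliver an event from the SIP stack ("sip") or the telephony hardware ("hw")."""
        self._queues[call_id].put((source, event))

    def wait(self, call_id: str, timeout: float = 1.0) -> None:
        self._threads[call_id].join(timeout)

    def _run(self, call_id: str, q: queue.Queue) -> None:
        sip_ready = hw_ready = False
        while not (sip_ready and hw_ready):
            source, event = q.get()  # blocks until the next event for this call
            if event == "HANGUP":  # either party hung up: abandon the bridge
                return
            if source == "sip" and event == "200 OK":  # remote side accepted
                sip_ready = True
            elif source == "hw" and event == "CONNECTED":  # local voice channel is up
                hw_ready = True
        with self._lock:
            self.bridged.append(call_id)  # stand-in for bridging the voice paths
```

Giving each call its own queue keeps one slow or stalled call from blocking event delivery to the others.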

The SLtel API communicates with the outside world by passing data buffers and returning events. It also supports call control for setting lines to different states. The SLhwTel and SLtel APIs must be semantically equivalent and syntactically almost identical because, from the point of view of an external user, speech link SIP-based calls are just an extension of the telephony API. They must nonetheless be implemented differently, because call transfers have different semantics with and without speech link.
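The "same interface, different transfer semantics" requirement can be illustrated with an abstract base class. In this sketch the class and method names are hypothetical, and it is an assumption that a plain hardware transfer cannot carry the caller's data buffer while a speech link transfer sends it along in the SIP message body. Application code written against `TelephonyLine` runs unchanged on either implementation.

```python
from abc import ABC, abstractmethod


class TelephonyLine(ABC):
    """Hypothetical API shape shared by SLhwTel-style and SLtel-style lines."""

    def __init__(self) -> None:
        self.events: list[str] = []  # events returned to the application
        self.off_hook = False

    def set_hook(self, off_hook: bool) -> None:
        """Line-state control behaves the same for both implementations."""
        self.off_hook = off_hook
        self.events.append("OFF_HOOK" if off_hook else "ON_HOOK")

    @abstractmethod
    def transfer(self, destination: str, data: bytes = b"") -> None:
        """Same signature everywhere; the semantics differ per implementation."""


class HardwareLine(TelephonyLine):
    def transfer(self, destination: str, data: bytes = b"") -> None:
        # Assumption: a plain hardware transfer moves only the voice call,
        # so the data buffer has nowhere to go and is ignored.
        self.events.append(f"HW_TRANSFER {destination}")


class SpeechLinkLine(TelephonyLine):
    def __init__(self) -> None:
        super().__init__()
        self.sent_bodies: list[bytes] = []  # stand-in for outgoing SIP message bodies

    def transfer(self, destination: str, data: bytes = b"") -> None:
        # The data buffer travels with the call in the SIP message body.
        self.sent_bodies.append(data)
        self.events.append(f"SL_TRANSFER {destination}")


def transfer_caller(line: TelephonyLine, destination: str, data: bytes) -> list[str]:
    """Application code that works unchanged with either kind of line."""
    line.set_hook(True)
    line.transfer(destination, data)
    return line.events
```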

The Sldata API provides a mechanism to set and retrieve values in the body of a speech link SIP message. A SIP message may contain a body. In the context of speech links, the body, if present, is a MIME document whose MIME type is application/speechlink. Sldata is responsible for generating the MIME document from the value associated with each attribute, and for parsing a MIME document to retrieve the value of each attribute of a message.
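The attribute round trip described above might be sketched as follows. `SpeechLinkData`, its `set`/`get`/`generate`/`parse` methods, and the `<attribute name="...">` XML layout are hypothetical stand-ins rather than the actual Sldata interface; only the `application/speechlink` MIME type comes from the text.

```python
import xml.etree.ElementTree as ET
from email import message_from_string
from email.message import Message
from typing import Optional

MIME_TYPE = "application/speechlink"


class SpeechLinkData:
    """Hypothetical Sldata-style container mapping attribute values to a MIME document."""

    def __init__(self) -> None:
        self._attrs: dict[str, str] = {}

    def set(self, name: str, value: str) -> None:
        self._attrs[name] = value

    def get(self, name: str, default: Optional[str] = None) -> Optional[str]:
        return self._attrs.get(name, default)

    def generate(self) -> str:
        """Build a MIME document of type application/speechlink from the attributes."""
        root = ET.Element("speechlink")  # hypothetical XML vocabulary
        for name, value in self._attrs.items():
            ET.SubElement(root, "attribute", name=name).text = value
        doc = Message()
        doc["Content-Type"] = MIME_TYPE
        doc.set_payload(ET.tostring(root, encoding="unicode"))
        return doc.as_string()

    @classmethod
    def parse(cls, text: str) -> "SpeechLinkData":
        """Recover attribute values from a MIME document, rejecting other types."""
        msg = message_from_string(text)
        if msg.get_content_type() != MIME_TYPE:
            raise ValueError(f"expected {MIME_TYPE}, got {msg.get_content_type()}")
        data = cls()
        for elem in ET.fromstring(msg.get_payload()).iter("attribute"):
            data.set(elem.get("name"), elem.text or "")
        return data
```

Storing values under a `name` attribute, rather than using attribute names as XML tags, keeps names that are not valid XML identifiers from breaking the document.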

The complete API index is available in the speech link interfaces document.

Copyright © 2000-2001 SpeechWorks International, Inc.
This work may only be distributed under the terms of the SpeechWorks Open Document License v1.0