Open VXI Architecture
The Open VXI VoiceXML interpreter is a portable open source library
that interprets the VoiceXML dialog markup language. It is designed to
serve as a reference for parties interested in understanding how VoiceXML
might be interpreted, or as a component of a VoiceXML-based debugger, browser,
or other VoiceXML-based system. Although it is perfectly suitable
for PC desktop applications, its design reflects VoiceXML's target of telephony
platforms.
Introduction
In our original development efforts we used the Open VXI as a component
to implement a telephony-based speech browser system incorporating the SpeechWorks
6.5 speech recognition engine and the Speechify text-to-speech (TTS) engine.
A browser is a client application program that takes one or more input
streams on a platform and executes an application that lives on one or
more document servers by interpreting markup. In the case of VoiceXML,
the application consists of the call flow logic, the prompts for the application,
and any associated grammars (See figure below). The document server executes
portions of the application dialog by delivering VoiceXML markup to the
browser in response to a document request. The markup interpreter renders
the VoiceXML markup within an interpreter context, perhaps changing the
context, and then makes calls into the implementation platform. The implementation
platform contains all of the resources needed by the markup interpreter
to render the dialog.
Process Architecture
The figure above shows this process architecture. When a call is received
it is detected by the implementation platform. The platform sends an event
to the markup interpreter, which looks in its context for the URI of the
initial document to fetch. The interpreter then sends a request to the
Document Server for the initial document. The Document Server returns
the document to the Markup Interpreter, which then instructs the Implementation
Platform on the first steps to perform on behalf of the caller. The Markup
Interpreter then interprets the result of an execution in the Implementation
Platform. The interpretation may result in the Markup Interpreter making
additional document requests to the Document Server.
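The request and execute cycle described above can be sketched in C as follows. This is only an illustration under stated assumptions: the type and function names (InterpreterContext, Services, run_call, and the demo_ stubs) and the example.com URIs are hypothetical and are not taken from the Open VXI sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not part of the Open VXI. */
typedef struct {
    const char *initial_uri;  /* URI of the first document, kept in the context */
} InterpreterContext;

typedef struct {
    /* Document server: returns the markup for a URI, or NULL if unknown. */
    const char *(*fetch)(const char *uri);
    /* Implementation platform: executes one document and returns the URI
       of the next document to request, or NULL when the call is over. */
    const char *(*execute)(const char *markup);
} Services;

/* Handles one call. Returns the number of documents interpreted,
   or -1 if a document request failed. */
int run_call(const InterpreterContext *ctx, const Services *svc)
{
    assert(ctx != NULL && svc != NULL);
    int documents = 0;
    const char *uri = ctx->initial_uri;
    while (uri != NULL) {
        const char *markup = svc->fetch(uri);   /* request to document server */
        if (markup == NULL)
            return -1;
        documents++;
        uri = svc->execute(markup);             /* platform result drives next request */
    }
    return documents;
}

/* Stub document server with two documents (placeholder content). */
const char *demo_fetch(const char *uri)
{
    if (strcmp(uri, "http://example.com/start.vxml") == 0) return "start";
    if (strcmp(uri, "http://example.com/menu.vxml") == 0) return "menu";
    return NULL;
}

/* Stub platform: the first document leads to the menu, the menu ends the call. */
const char *demo_execute(const char *markup)
{
    if (strcmp(markup, "start") == 0) return "http://example.com/menu.vxml";
    return NULL;
}
```

With the stubs above, a context whose initial URI is the start document causes run_call to interpret two documents before the call ends. A real interpreter would parse the markup and react to many platform events; the loop only shows the direction of the requests.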
System Architecture
The next figure shows the system architecture, where the Open VXI is integrated
into a speech browser by adding SpeechWorks technology, as described above,
and receives VoiceXML pages from a document server. The document server
consists of a web server, potentially an application framework, and a VoiceXML
application. The VoiceXML application can be one or more VoiceXML files,
or these files can be dynamically generated using CGI scripts or other
computations.
The speech browser executes the VoiceXML pages to provide the speech
service to the caller connected over the telephone network. The Client
logically consists of four parts:
-
An operations administration and maintenance system and main process.
This collection of tools is responsible for system management and error
reporting. This critical component also invokes the speech browser within
a thread it creates to begin execution.
-
The Open VXI. This is the component that interprets the VoiceXML
markup and calls into the implementation platform to render the markup.
-
The platform providing the services necessary for the system to
run. The Open VXI software specifies platform APIs and services which must
be implemented in order for the system to function. The APIs do not define
the mechanism for communication between the implementation of the API and
the recognition engine. This could be done using client/server or direct
communication.
-
The hardware and base OS. The hardware and OS layer contains the
base operating system services and the hardware needed to receive phone
calls. The Open VXI toolkit provides an API for telephony events. Any input
conforming to this standard is supportable. A threaded OS is assumed. Windows
NT 4, Windows 2000, and standard Unix releases such as Linux and Solaris are
supportable by the toolkit.
Open VXI Toolkit Functionality
The figure above shows the Open VXI toolkit architecture and the component
parts in the case of integration with SpeechWorks products. All components
are designed to be portable across NT and Unix operating systems although
not all reference implementations will be portable on the first release.
The toolkit consists of:
-
The Open VXI
The VXI interprets all VoiceXML markup and acts as the main control
loop. The VXI fully implements the VoiceXML 1.0 language and contains the
addition of session scope support.
-
An XML parser API.
The API provides access to an XML DOM parser. An API implementation
that integrates with the Apache Xerces DOM parser is also included.
-
A JavaScript engine API
The API provides access to a JavaScript interpreter.
An implementation of the API that integrates with the Mozilla SpiderMonkey
JavaScript interpreter is also provided.
-
An Internet/OS Library API
The Internet/OS library provides platform-independent access to the
Internet and the operating system. An implementation of the API for Windows
NT using the WININET DLL is provided as part of the toolkit.
-
A Logging API
A reference logging API is provided for a start on operations, administration,
and maintenance (OA&M) capabilities. The API defines logging methods
for errors and events, but not how the log is constructed or managed. An
implementation for file-based error and event logging is supplied as part
of the toolkit.
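The separation between the logging methods and how the log is managed can be sketched as an interface of function pointers with one file-based implementation behind it. The names below (LogInterface, file_log_init, and so on) and the line format are assumptions made for this sketch, not the toolkit's actual logging API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical logging interface: callers see only these two methods. */
typedef struct LogInterface {
    int (*Error)(struct LogInterface *self, int code, const char *message);
    int (*Event)(struct LogInterface *self, const char *name);
    void *impl;  /* owned by the implementation; opaque to callers */
} LogInterface;

/* File-based implementation: impl is a FILE* supplied at init time. */
int file_log_error(LogInterface *self, int code, const char *message)
{
    FILE *out = (FILE *)self->impl;
    return fprintf(out, "ERROR %d: %s\n", code, message) < 0 ? -1 : 0;
}

int file_log_event(LogInterface *self, const char *name)
{
    FILE *out = (FILE *)self->impl;
    return fprintf(out, "EVENT %s\n", name) < 0 ? -1 : 0;
}

/* Binds the interface to an already opened stream. */
void file_log_init(LogInterface *log, FILE *out)
{
    assert(log != NULL && out != NULL);
    log->Error = file_log_error;
    log->Event = file_log_event;
    log->impl = out;
}
```

Code that logs through LogInterface does not need to change if the file-based implementation is replaced by one that, for example, forwards records to an OA&M system.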
The core rests on a set of platform APIs. These include:
-
A Recognizer API
The Recognizer API must support the full VoiceXML specification. This
requires that the API support dynamic grammar construction and grammar
enabling. A reference implementation that works with a text input engine
is provided as open source.
-
A Prompt API
The Prompt API is used to interpret the <prompt> markup. This markup
can contain embedded scripting, audio files, and text-to-speech. The prompt
API is a master API that dispatches to module APIs beneath to play particular
prompt types. An implementation that supports text output will be supplied.
The API supports asynchronous play, receiving a callback on prompt completion,
flushing prompts, and handling URI based prompts. The Prompt API also supports
TTS prompts.
-
A Telephony API
The Telephony API supports all the telephony events that can be delivered
and all the call control methods in VoiceXML.
-
An Object API
The Object API will provide support for integration of vendor-specific
plug-ins. This API will be available in a later release.
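The idea of a master Prompt API that dispatches to a module per prompt type and reports completion through a callback can be sketched as below. All names (PromptDispatcher, prompt_play, the demo_ functions) are hypothetical, and the sketch plays synchronously for simplicity; an asynchronous implementation would return before playback finishes and invoke the callback later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prompt types handled by separate modules. */
typedef enum { PROMPT_TEXT, PROMPT_AUDIO_URI, PROMPT_TYPE_COUNT } PromptType;

/* A module plays one prompt and returns 0 on success. */
typedef int (*PromptPlayer)(const char *content);

/* Completion callback: receives the prompt type and the play status. */
typedef void (*PromptDone)(PromptType type, int status, void *user_data);

/* Master API state: one player per prompt type; NULL means unsupported. */
typedef struct {
    PromptPlayer players[PROMPT_TYPE_COUNT];
} PromptDispatcher;

/* Dispatches to the module registered for the type, then reports completion.
   Returns the module's status, or -1 if no module handles the type. */
int prompt_play(const PromptDispatcher *d, PromptType type,
                const char *content, PromptDone done, void *user_data)
{
    assert(d != NULL);
    int status = -1;
    if ((int)type >= 0 && (int)type < PROMPT_TYPE_COUNT && d->players[type] != NULL)
        status = d->players[type](content);
    if (done != NULL)
        done(type, status, user_data);
    return status;
}

/* Demo module: accepts any non-NULL text prompt. */
int demo_play_text(const char *content)
{
    return content != NULL ? 0 : -1;
}

/* Demo callback: counts completed prompts in the int behind user_data. */
void demo_count_done(PromptType type, int status, void *user_data)
{
    (void)type;
    (void)status;
    ++*(int *)user_data;
}
```

Registering only the text module, as in a text-output reference implementation, makes audio prompts fail with -1 while still delivering the completion callback.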
General API requirements
-
All APIs are written in C
-
All APIs use a base type system. This type system abstracts all the basic
C types and enables platform independence.
-
All APIs follow the standard call convention: VXIresult VXI<module name><component><function>
(handle, in variables, in/out variables, out variables),
where module is the name of the API and is in lower case. Component is the name
of a major piece of the API and is optional, but must be in upper case. Function is
the name of the function on the API and must start with an upper case letter.
Handle is a typed handle for the API or an address to a handle. VXIresult
is a standard result type that can be mapped into error codes.
-
All implementations are DLLs on NT or shared libraries on UNIX.
-
All memory management is isolated to a module or API via a handle. Constructors
and destructors on that memory are supplied by the module or API.
-
Error codes returned by functions must be negative if they are errors and
are provided in ranges for each API.
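The requirements above can be illustrated with a small made-up module. Only VXIresult and the naming pattern come from the text; the module name "demo", the QUEUE component, the VXIint base type, the constant names, and the error range chosen here are all assumptions for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Base types wrapping the basic C types (VXIint is an assumed name). */
typedef int VXIint;
typedef int VXIresult;

#define VXI_RESULT_SUCCESS 0
/* Hypothetical error range reserved for the "demo" API; errors are negative. */
#define VXIDEMO_ERR_BASE         (-1000)
#define VXIDEMO_ERR_INVALID_ARG  (VXIDEMO_ERR_BASE - 1)
#define VXIDEMO_ERR_NO_MEMORY    (VXIDEMO_ERR_BASE - 2)
#define VXIDEMO_ERR_FULL         (VXIDEMO_ERR_BASE - 3)

#define VXIDEMO_QUEUE_CAPACITY 8

/* Typed handle: all memory for the module hangs off this structure. */
typedef struct VXIdemoHandle {
    VXIint items[VXIDEMO_QUEUE_CAPACITY];
    VXIint count;
} VXIdemoHandle;

/* Constructor supplied by the module; takes the address of a handle. */
VXIresult VXIdemoCreate(VXIdemoHandle **handle)
{
    if (handle == NULL) return VXIDEMO_ERR_INVALID_ARG;
    *handle = calloc(1, sizeof **handle);
    return *handle != NULL ? VXI_RESULT_SUCCESS : VXIDEMO_ERR_NO_MEMORY;
}

/* Destructor supplied by the module; clears the caller's handle. */
VXIresult VXIdemoDestroy(VXIdemoHandle **handle)
{
    if (handle == NULL || *handle == NULL) return VXIDEMO_ERR_INVALID_ARG;
    free(*handle);
    *handle = NULL;
    return VXI_RESULT_SUCCESS;
}

/* module "demo", component "QUEUE", function "Add":
   (handle, in variable, out variable). */
VXIresult VXIdemoQUEUEAdd(VXIdemoHandle *handle, VXIint item, VXIint *count)
{
    if (handle == NULL || count == NULL) return VXIDEMO_ERR_INVALID_ARG;
    if (handle->count >= VXIDEMO_QUEUE_CAPACITY) return VXIDEMO_ERR_FULL;
    handle->items[handle->count++] = item;
    *count = handle->count;
    return VXI_RESULT_SUCCESS;
}
```

A caller creates the handle, passes it as the first argument to every call, tests each VXIresult for a negative value, and releases the memory through the module's own destructor rather than calling free directly.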
VoiceXML is a trademark of the VoiceXML
Forum.
Copyright 2000, 2001. SpeechWorks International, Inc. All rights reserved. Distributed
under SpeechWorks Open Document License,
v1.0