Open VXI Architecture


The Open VXI VoiceXML interpreter is a portable open source library that interprets the VoiceXML dialog markup language. It is designed to serve as a reference for parties interested in understanding how VoiceXML might be interpreted, or as a component of a VoiceXML-based debugger, browser, or other VoiceXML-based system.  Although it is perfectly suitable for PC desktop applications, its design reflects VoiceXML's target of telephony platforms.

Introduction

In our original development efforts we used the Open VXI as a component to implement a telephony-based speech browser system incorporating the SpeechWorks 6.5 speech recognition engine and the Speechify text-to-speech (TTS) engine.  A browser is a client application program that takes one or more input streams on a platform and executes an application that lives on one or more document servers by interpreting markup. In the case of VoiceXML, the application consists of the call flow logic, the prompts for the application, and any associated grammars (see figure below). The document server executes portions of the application dialog by delivering VoiceXML markup to the browser in response to a document request. The markup interpreter renders the VoiceXML markup within an interpreter context, perhaps changing the context, and then makes calls into the implementation platform. The implementation platform contains all of the resources needed by the markup interpreter to render the dialog.
 

Process Architecture

The figure above shows this process architecture. When a call is received, it is detected by the implementation platform. The platform sends an event to the markup interpreter, which looks in its context for the URI of the initial document to fetch. The interpreter then sends a request to the document server for the initial document. The document server sends the document back to the markup interpreter, which then instructs the implementation platform on the first steps to perform on behalf of the caller. The markup interpreter then interprets the result of each execution in the implementation platform. The interpretation may result in the markup interpreter making additional document requests to the document server.
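
The request and execute cycle described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Open VXI code: the class names, the method names, and the dictionary-based "documents" (which stand in for real VoiceXML pages) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentServer:
    """Hypothetical stand-in for the document server: maps URIs to documents."""
    documents: dict

    def request(self, uri):
        return self.documents[uri]


@dataclass
class Platform:
    """Hypothetical stand-in for the implementation platform."""
    caller_inputs: list                      # scripted results the platform reports
    log: list = field(default_factory=list)  # instructions received so far

    def execute(self, instruction):
        # Record what the interpreter asked for, then report a result (or None).
        self.log.append(instruction)
        return self.caller_inputs.pop(0) if self.caller_inputs else None


class MarkupInterpreter:
    def __init__(self, context, server, platform):
        self.context = context    # holds the URI of the initial document
        self.server = server
        self.platform = platform

    def on_call_received(self):
        # The platform has signalled a new call: start from the initial URI.
        uri = self.context["initial_uri"]
        while uri is not None:
            document = self.server.request(uri)
            result = self.platform.execute(document["instruction"])
            # Interpreting the result may lead to another document request.
            uri = document["next"].get(result)


# Toy documents: each has one instruction and a map from results to next URIs.
server = DocumentServer({
    "main.vxml": {"instruction": "ask: sales or support?",
                  "next": {"sales": "sales.vxml"}},
    "sales.vxml": {"instruction": "say: connecting you to sales", "next": {}},
})
platform = Platform(caller_inputs=["sales"])
MarkupInterpreter({"initial_uri": "main.vxml"}, server, platform).on_call_received()
```

After the call completes, `platform.log` holds the two instructions in the order the interpreter issued them, one per fetched document.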

System Architecture

The next figure shows the system architecture, in which the Open VXI is integrated into a speech browser by adding SpeechWorks technology as described above and receives VoiceXML pages from a document server. The document server consists of a web server, potentially an application framework, and a VoiceXML application. The VoiceXML application can be one or more static VoiceXML files, or the pages can be generated dynamically using CGI scripts or other computations.
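
As a rough illustration of dynamic generation, the following Python sketch builds a small VoiceXML page with the standard library. The function name and the greeting text are made up for this example; a real application would emit whatever dialog its call flow requires, and a CGI script would also write the appropriate HTTP headers.

```python
import xml.etree.ElementTree as ET


def build_greeting_page(caller_name):
    """Return a minimal VoiceXML 1.0 page that speaks a greeting (illustrative only)."""
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    # ElementTree escapes markup characters in the text when serializing.
    prompt.text = f"Hello, {caller_name}."
    return ET.tostring(vxml, encoding="unicode")


page = build_greeting_page("Ada")
```

Building the page through an XML library rather than string concatenation keeps the output well-formed even when the inserted text contains characters such as `<` or `&`.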

The speech browser executes the VoiceXML pages to provide the speech service to the caller connected over the telephone network. The Client logically consists of four parts:

  1. An operations, administration, and maintenance system and main process. This collection of tools is responsible for system management and error reporting. This critical component also invokes the speech browser within a thread it creates to begin execution.
  2. The Open VXI. This is the component that interprets the VoiceXML markup and calls into the implementation platform to render the markup.
  3. The platform providing the services necessary for the system to run. The Open VXI software specifies platform APIs and services which must be implemented in order for the system to function. The APIs do not define the mechanism for communication between the implementation of the API and the recognition engine. This could be done using client/server or direct communication.
  4. The hardware and base OS. The hardware and OS layer contains the base operating system services and the hardware needed to receive phone calls. The Open VXI toolkit provides an API for telephony events. Any input conforming to this API is supportable. A threaded OS is assumed. The toolkit can support NT4, Windows 2000, and standard Unix releases such as Linux and Solaris.

Open VXI Toolkit Functionality

The figure above shows the Open VXI toolkit architecture and its component parts in the case of integration with SpeechWorks products. All components are designed to be portable across NT and Unix operating systems, although not all reference implementations will be portable in the first release. The toolkit consists of:

  1. The Open VXI. The VXI interprets all VoiceXML markup and acts as the main control loop. The VXI fully implements the VoiceXML 1.0 language and adds session scope support.
  2. An XML parser API. The API provides access to an XML DOM parser. An API implementation that integrates with the Apache Xerces DOM parser is also included.
  3. A JavaScript engine API. The API provides access to a JavaScript interpreter. An implementation of the API that integrates with the Mozilla SpiderMonkey JavaScript interpreter is also provided.
  4. An Internet/OS Library API. The Internet/OS library provides platform-independent access to the Internet and the operating system. An implementation of the API for Windows NT using the WININET DLL is provided as part of the toolkit.
  5. A Logging API. A reference logging API is provided as a starting point for operations, administration, and maintenance (OA&M) capabilities. The API defines logging methods for errors and events, but not how the log is constructed or managed. An implementation for file-based error and event logging is supplied as part of the toolkit.

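
The split between a logging interface and its implementation can be illustrated with a short Python sketch. The interface, its method names, and the line format below are hypothetical and are not the toolkit's actual Logging API; they only show how callers can log errors and events without knowing where the log ends up.

```python
import io
from abc import ABC, abstractmethod


class Log(ABC):
    """Hypothetical interface: says what can be logged, not where it goes."""

    @abstractmethod
    def error(self, module, code, message): ...

    @abstractmethod
    def event(self, name, details): ...


class StreamLog(Log):
    """One possible implementation: append lines to a file-like object."""

    def __init__(self, stream):
        self.stream = stream

    def error(self, module, code, message):
        self.stream.write(f"ERROR|{module}|{code}|{message}\n")

    def event(self, name, details):
        self.stream.write(f"EVENT|{name}|{details}\n")


buffer = io.StringIO()   # stands in for an open log file
log = StreamLog(buffer)
log.event("call_start", "channel=0")
log.error("inet", 404, "document not found")
```

Because the interpreter would depend only on the abstract interface, an integrator could swap the file-based implementation for one that forwards to an existing OA&M system.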
The core rests on a set of platform APIs. These include:
  1. A Recognizer API. The Recognizer API must support the full VoiceXML specification. This requires that the API support dynamic grammar construction and grammar enabling. A reference implementation that works with a text input engine is provided as open source.
  2. A Prompt API. The Prompt API is used to interpret the <prompt> markup. This markup can contain embedded scripting, audio files, and text-to-speech. The Prompt API is a master API that dispatches to module APIs beneath it to play particular prompt types. An implementation that supports text output will be supplied. The API supports asynchronous play, receiving a callback on prompt completion, flushing prompts, and handling URI-based prompts. The Prompt API also supports TTS prompts.
  3. A Telephony API. The Telephony API supports all the telephony events that can be delivered and the call control methods in VoiceXML.
  4. An Object API. The Object API will provide support for integration of vendor-specific plug-ins. This API will be available in a later release.
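
The "master API" arrangement described for the Prompt API, together with asynchronous play and a completion callback, might look roughly like the following Python sketch. All names here are hypothetical and the players only return strings instead of producing audio; flushing and URI fetching are not modeled.

```python
import threading


class PromptDispatcher:
    """Hypothetical master prompt object that routes segments to per-type players."""

    def __init__(self):
        self.players = {}

    def register(self, prompt_type, player):
        # Each prompt type (audio file, TTS, ...) gets its own player module.
        self.players[prompt_type] = player

    def play_async(self, segments, on_complete):
        # Play on a worker thread so the caller is not blocked, then
        # report completion through the callback.
        def worker():
            played = [self.players[kind](content) for kind, content in segments]
            on_complete(played)

        thread = threading.Thread(target=worker)
        thread.start()
        return thread


dispatcher = PromptDispatcher()
dispatcher.register("audio", lambda uri: f"[audio file {uri}]")
dispatcher.register("tts", lambda text: f"[synthesized: {text}]")

results = []
done = threading.Event()


def completed(played):
    results.extend(played)
    done.set()


worker_thread = dispatcher.play_async(
    [("audio", "welcome.wav"), ("tts", "Please say a city.")], completed
)
done.wait(timeout=5)   # the interpreter could do other work before waiting
worker_thread.join()
```

Registering players by type keeps the dispatcher unaware of how any particular prompt type is rendered, which is the property that lets a platform add or replace prompt modules independently.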

General API requirements

VoiceXML is a Trademark of the VoiceXML forum.


Copyright 2000, 2001. SpeechWorks International, Inc. All rights reserved. Distributed under SpeechWorks Open Document License, v1.0