Open VXI Library Reference
VoiceXML Coverage and Specification Interpretation.
It is the goal of the Open VXI toolkit to be fully compliant with the VoiceXML
standard as it evolves in the
W3C consortium.
This version of the VoiceXML interpreter is based on the VoiceXML 1.0 specification
(Mar. 7 2000).
The current release has several minor differences from the specification,
which come from two sources. The first is known bugs and unimplemented
semantics. The second is that the VoiceXML 1.0 specification is unclear
or ambiguous in many cases. The significant differences and interpretations
are listed below. Refer to the
Release Notes
for more detail on bugs and limitations of the current release.
-
<record> Element: The code for processing the <record>
element is fully implemented in the Open VXI. However, it is currently
short circuited to throw an error.unsupported.record event (see
VXI_form.cpp).
This is because we do not currently have any way to represent the resulting
audio in ECMAScript, as required by VoiceXML. We are currently considering
alternatives and should have a solution in subsequent releases.
We also do not support simultaneous recognition and recording in the
<record>
element. The issue here is that the VoiceXML specification is not clear
on how to handle the results. The <record> element can in
this case return two values, the recorded audio and recognition result,
but there is only one field variable to contain them. We await clarification
from the W3C before enabling simultaneous recognition and recording.
-
<transfer> Element:
-
We do not support recognition over bridged transfers.
-
There are two simultaneous mechanisms for handling error conditions: return
values in the item var, and event throwing (with potential races between them).
There is apparently no mechanism to describe success conditions. We will do
the following:
-
If the attempt to transfer was unsuccessful, set the field var to the error code
-
If the transfer was successful and non-bridging, set the field var to 'transferred'
and THEN throw a disconnect.transfer event
-
If transfer was successful and bridging:
-
If the far end disconnected (far_end_disconnect), set the field var and continue in the FIA
-
If our user hung up or was otherwise disconnected, throw hangup event
-
We also need a return policy for maxtime. Here we will throw a telephone.bridge.timeout
event, whose handler will probably log the event and exit, cleaning up the session.
(Ideally the user could be transferred without tying up local resources and
then transferred back into the old session with return values. However, this
will probably require a server-side implementation.)
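The transfer policy listed above can be read as a small decision table. The following is a minimal sketch of that table, not the code in the Open VXI sources: the type and function names are hypothetical, an empty string stands for "nothing to set" or "nothing to throw", and storing the literal value far_end_disconnect in the field var is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical outcome of a <transfer> attempt; these names are illustrative
// and are not taken from the Open VXI sources.
enum TransferOutcome {
    TRANSFER_FAILED,       // the transfer attempt was unsuccessful
    BLIND_SUCCEEDED,       // successful, non-bridging transfer
    FAR_END_DISCONNECT,    // bridged; the far end hung up
    CALLER_HANGUP,         // bridged; our user hung up or was disconnected
    MAXTIME_EXCEEDED       // bridged; maxtime elapsed
};

// What the interpreter should do next. An empty string means "none".
struct TransferAction {
    std::string fieldValue;  // value to store in the field var
    std::string event;       // event to throw after the field var is set
};

// Map an outcome to the policy described in the list above.
TransferAction resolveTransfer(TransferOutcome outcome,
                               const std::string& errorCode) {
    TransferAction action;
    switch (outcome) {
    case TRANSFER_FAILED:
        action.fieldValue = errorCode;             // no event is thrown
        break;
    case BLIND_SUCCEEDED:
        action.fieldValue = "transferred";         // set first...
        action.event = "disconnect.transfer";      // ...THEN throw
        break;
    case FAR_END_DISCONNECT:
        action.fieldValue = "far_end_disconnect";  // assumed value; FIA continues
        break;
    case CALLER_HANGUP:
        action.event = "hangup";
        break;
    case MAXTIME_EXCEEDED:
        action.event = "telephone.bridge.timeout";
        break;
    }
    return action;
}

// Usage example: a blind transfer sets the field var and then throws.
void exampleTransfer() {
    TransferAction a = resolveTransfer(BLIND_SUCCEEDED, "");
    assert(a.fieldValue == "transferred");
    assert(a.event == "disconnect.transfer");
}
```

Returning both the field value and the event from one function keeps the ordering explicit: the caller stores the value before throwing, which avoids the race noted above.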
-
<subdialog> Element:
-
The spec is somewhat unclear about which grammars are active during a subdialog.
At present, we follow lexical scoping. Essentially the implementation ignores
the modal attribute and does not load any grammar from the dynamic call
chain.
-
Our implementation of parameter passing is similar to call-by-value keyword
argument passing in a language such as LISP. We assemble the <param> name/value
pairs into an arg object and pass it into the called execution context. When
form-level vars are initialized and there is no expr attribute, the value
from the param object is used. Otherwise the parameters are ignored. There
is no penalty for passing in unused <param> values.
-
Another question is whether a subdialog in the same document as the caller
shares the document and application scopes of the caller. For now, we
are going to execute subdialogs in a completely different execution context.
The practical implication of this is that NO variables are shared between
caller and callee (even at the application document level) and all communication
must be done through parameter passing and return values.
-
Another issue unresolved by the spec is the handling of results. The FIA
algorithm in the spec simply says to put the result in the field item var
(as an object). However, this does not properly trigger filled elements
as described in the "process" phase of the FIA. We implement <subdialog>
and <return> such that filled elements are properly triggered and subdialog
returns are treated as a special case of field results.
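The parameter-passing rule described above (a passed value is used only when the form-level var has no expr attribute, and unused values are silently dropped) can be sketched as follows. This is an illustration under stated assumptions, not the Open VXI implementation: all names are hypothetical, values are plain strings rather than ECMAScript values, and the string "undefined" stands in for an unset variable.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names come from
// the Open VXI sources.
typedef std::map<std::string, std::string> Scope;

const std::string kUndefined = "undefined";

struct VarDecl {
    std::string name;
    bool hasExpr;           // true if the <var> has an expr attribute
    std::string exprValue;  // result of evaluating expr (used only if hasExpr)
};

// Build the callee's form-level variables from its <var> declarations and
// the name/value pairs passed in by the caller.
Scope initFormVars(const std::vector<VarDecl>& decls, const Scope& params) {
    Scope vars;
    for (std::size_t i = 0; i < decls.size(); ++i) {
        const VarDecl& d = decls[i];
        if (d.hasExpr) {
            // An expr attribute wins; any passed value is ignored.
            vars[d.name] = d.exprValue;
        } else {
            // No expr: take the caller's value if one was passed.
            Scope::const_iterator p = params.find(d.name);
            vars[d.name] = (p != params.end()) ? p->second : kUndefined;
        }
    }
    // Params with no matching declaration are simply never read.
    return vars;
}

// Usage example: "city" is filled from the caller, "lang" keeps its expr.
void exampleParams() {
    std::vector<VarDecl> decls;
    VarDecl city = {"city", false, ""};
    VarDecl lang = {"lang", true, "en"};
    decls.push_back(city);
    decls.push_back(lang);

    Scope params;
    params["city"] = "Boston";
    params["lang"] = "fr";
    params["unused"] = "x";

    Scope vars = initFormVars(decls, params);
    assert(vars["city"] == "Boston");
    assert(vars["lang"] == "en");
    assert(vars.count("unused") == 0);
}
```

Because the callee receives copies in a separate scope, nothing it does to these variables is visible to the caller, which matches the separate execution context described above.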
-
<catch> Element:
-
Our recognition result processing does not throw any vocabulary-specific
event. This seems like an undue intrusion into the realm of the application
programmer. There are several alternate methods for handling global commands,
such as application document dialogs.
We do, however, support all of the <catch> shortcut elements,
<error>, <help>, <nomatch>,
etc.
-
Prefix matches are defined to mean the event attribute of <catch>
is a prefix of the event thrown. This is different from the language in
the specification document, though probably what is meant.
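The prefix rule above amounts to a plain string comparison. The sketch below uses a hypothetical helper name, catchMatches, and implements exactly the rule as worded: the <catch> event attribute must be a leading substring of the thrown event name. Whether the real interpreter also requires the match to end on a "." boundary is not stated here, so the sketch does not assume it.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not an Open VXI function: returns true when the
// event attribute of a <catch> is a prefix of the thrown event name.
bool catchMatches(const std::string& catchEvent, const std::string& thrown) {
    // compare() looks only at the first catchEvent.size() characters of
    // 'thrown'; if 'thrown' is shorter, the comparison is non-zero.
    return thrown.compare(0, catchEvent.size(), catchEvent) == 0;
}

// Usage example: a general handler catches a more specific event,
// but a specific handler does not catch a more general one.
void examplePrefix() {
    assert(catchMatches("error.unsupported", "error.unsupported.record"));
    assert(!catchMatches("error.unsupported.record", "error.unsupported"));
}
```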
Portal Support and Extensions
The Open VXI VoiceXML interpreter incorporates a number of features to
support browser administration and support Voice Portals. In addition,
we support a small number of attribute extensions which seem useful to
developers.
-
Default Document
Rather than build in default catch handlers into the VXI, we support
an optional 'default document' which contains these handlers. The URI of
the default document can be passed to the VXI when it is created. Only
the <catch> elements of the default document are currently
processed; dialogs and links are ignored.
-
Portal Document
The VXI allows the installation of a single 'portal document' which
is intended to support Voice Portal-based navigation. In particular, the
portal document should implement a 'hot-word', which, when spoken, returns
the caller to the portal. Full dialogs are supported in the portal document,
rather than simply a single link, to allow for confirmation of the 'hot-word'
and to prevent the user from being returned to the portal as a result of
a mis-recognition.
-
srcexpr Attribute
We support a srcexpr attribute in the
<audio> and <grammar> elements to allow prompts and grammars to be selected
based on dialog results. The prompting case is very useful in confirmation
dialogs, for example.
-
<value> Element in <grammar>
We intend to support the use of a <value> element as a
child of <grammar> to allow the dynamic creation of in-line
grammar fragments without requiring a server query.
Programming Reference
All interfaces were generated using doc++, which is available for download from SourceForge.
VXI API
The VXI programming interface consists of four calls.
VXIresult VXIapiCreate(void* platform, void* os, VXIint32 line, void* logger,
JSIHandle runtime, void** vxip);
VXIapiCreate creates and initializes a new VXI and associated
data structures. The VXI is a light-weight structure and VXIapiCreate
and
VXIapiDestroy are typically called once for each new call.
The channel argument contains handles on any channel-specific
resources, for example, resources required by the rec, prompts
or tel interfaces. These resources must be created and initialized
by the program using the VXI library (referred to henceforth as the 'framework').
This is typically done by calling the
CreateResource functions
of the various platform APIs.
The channel argument also contains values to be used to initialize
the session.telephony variables in VoiceXML. These values should be put
in the following VXIObject properties of channel (refer to the VoiceXML
1.0 specification for the semantics of these values).
The default document can be passed in at this point as a URI under
the property
The os, logger, and line arguments have been deprecated and this information
should be transmitted via the channel.
The runtime argument is a handle on the global JavaScript runtime
as returned by VXIjsiInit().
If a new VXI has been successfully created, VXIapiCreate returns VXIresultSUCCESS
and stores a handle through the vxip argument. This handle should be passed
into subsequent VXIapi calls.
VXIresult VXIapiDestroy(void* vxi);
This call is the reciprocal of VXIapiCreate and should be called when
a given VXI object is no longer needed. Typically this is called at the end
of each call.
VXIresult VXIapiRun(void* vxi, VXIchar* url);
This is the main call to invoke the VoiceXML interpreter. The framework
is responsible for taking a call, creating and initializing a new VXI,
and determining the URL to be invoked. The url argument points to the initial
page to be interpreted by the VXI. The VXIapiRun function returns when
VoiceXML processing is complete. That is, either an <exit> element is
encountered, there is no active dialog to process, or an unhandled error
has occurred. In the first two cases VXIresultSUCCESS is returned; in
the last, an error code.
VXIresult VXIapiGetResult(void* vxi, VXIValue* vp);
This call returns any return value specified by the last <exit>
element as a VXIValue. The functions relating
to VXIValue can be used to query the type of this value and convert
back to basic C/C++ data types.
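The four calls are meant to be used once per telephone call, in the order create, run, get result, destroy. The sketch below shows that order using the prototypes given above. To keep it compilable on its own, the typedefs and the function bodies are stand-in stubs, not the library's definitions: in a real build they come from the Open VXI headers and library. The representations chosen for VXIchar, JSIHandle and VXIValue are assumptions, as is passing the channel handle as the first argument (named platform in the prototype). The handleCall function is a hypothetical framework routine.

```cpp
#include <cassert>
#include <cstdint>

// --- Stand-ins so the sketch compiles alone (assumptions, see above) ---
typedef std::int32_t VXIint32;
typedef char VXIchar;                      // assumed representation
typedef void* JSIHandle;                   // assumed representation
struct VXIValue { int stub; };             // opaque in the real library
enum VXIresult { VXIresultSUCCESS = 0 };   // only the documented value

static int g_liveInterpreters = 0;         // stub bookkeeping only

VXIresult VXIapiCreate(void*, void*, VXIint32, void*, JSIHandle, void** vxip) {
    static int dummy;
    *vxip = &dummy;                        // hand back a non-null handle
    ++g_liveInterpreters;
    return VXIresultSUCCESS;
}
VXIresult VXIapiRun(void*, VXIchar*) { return VXIresultSUCCESS; }
VXIresult VXIapiGetResult(void*, VXIValue* vp) { vp->stub = 0; return VXIresultSUCCESS; }
VXIresult VXIapiDestroy(void*) { --g_liveInterpreters; return VXIresultSUCCESS; }

// --- Hypothetical framework code: one interpreter per incoming call ---
VXIresult handleCall(void* channel, JSIHandle runtime, VXIchar* url,
                     VXIValue* exitValue) {
    void* vxi = nullptr;

    // os, line and logger are deprecated, so nothing is passed for them.
    VXIresult rc = VXIapiCreate(channel, nullptr, 0, nullptr, runtime, &vxi);
    if (rc != VXIresultSUCCESS) return rc;

    // Blocks until VoiceXML processing for this call is complete.
    rc = VXIapiRun(vxi, url);

    // Fetch the <exit> value only if the run itself succeeded.
    if (rc == VXIresultSUCCESS) rc = VXIapiGetResult(vxi, exitValue);

    // Always release the interpreter, even after an error.
    VXIapiDestroy(vxi);
    return rc;
}

// Usage example with null placeholder handles and an example URL.
void exampleCall() {
    char url[] = "http://example.com/start.vxml";
    VXIValue result = {0};
    assert(handleCall(nullptr, nullptr, url, &result) == VXIresultSUCCESS);
    assert(g_liveInterpreters == 0);
}
```

Destroying the interpreter on every path keeps the create and destroy calls paired, which matters because both are issued once per call.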
Platform API Calls
This section is under development. See also Integration.html.
Reference Client Usage
There are two reference clients included in this release: a console-based
client, testVXI, and a Win32 GUI client, nt_client. At the moment these are
mainly useful for testing pieces of interpreter functionality. The command
line arguments for both are the same (indeed they share all code except
the UI):
testVXI url [options]
Where url
-maxCalls n
Where n is a positive integer. The default is infinite.
-numChannel n
Where n is a positive integer. The default is 1.
-defaultDoc url Where url refers to a document containing default catch
handlers.
This document refers to files in the Open VXI release. This is available at
http://www.speech.cs.cmu.edu/openvxi .
VoiceXML is a trademark of the VoiceXML
Forum.
Copyright 2000, 2001. SpeechWorks International, Inc. All rights reserved.
Distributed under SpeechWorks Open
Document License, v1.0