CMU Communicator

References and Manuals

CMU Communicator Overview

The Communicator Travel Planning system is made up of 10 separate modules. The source code for these modules is available as part of the CMU Spoken Dialog System Toolkit. This document gives an overview of what each module does. Its goal is not to provide detailed documentation for the individual modules but rather to explain the purpose of each module in the overall system and to describe the principles on which it operates. In some cases greater detail may be provided in technical articles produced by the Carnegie Mellon Communicator group, as cited below.

Gentner/Gentner Emulator

Location:
cmu/servers/src/gentner/

Summary:
The Communicator system is session-oriented. That is, interactions with users take place over a specific channel, over a defined interval of time. This module is used to initiate and terminate a session. Initiation and termination are typically under the control of the user, who starts a session by calling the system on the telephone or by clicking the Start button in the Emulator application's window, and ends it by hanging up or by clicking the Stop button. Occasionally the system may decide to hang up on a particular user (if it judges that the session has gone irreparably astray). In practical terms, starting a session causes the decoder to start listening and a new session log to be created. The session log includes an event-level trace of the call and recordings of the speech produced by the user.

If you are using the system in desktop mode (which is the default in the distribution) calls are controlled through a panel on the screen.

If you are using the system in telephony mode, you need to install the necessary hardware. In either case, the audio interface is the same and is the standard sound board in your computer. The additional hardware consists of an echo-cancellation device and a serial interface box that allows the computer to communicate with the echo canceller. We have been happy with the Gentner DH20 device.

Sphinx/Listener

Location:
cmu/servers/src/sphinx/

Summary:
The Listener segments the input stream into utterances and determines whether a candidate utterance should be attended to (potentially triggering a barge-in).

Sphinx decodes the input utterance to produce a top-1 hypothesis and adds a confidence marker to each word in the hypothesis. Sphinx interacts with other modules. Specifically, it will, on barge-in, send a signal to the synthesis module to stop speech output; it will accept language-model switching messages from the dialog manager (the system uses state-specific language models to improve recognition accuracy) and it will send the final decoding to the Phoenix parser module.

Sphinx needs to be configured with a domain-appropriate set of acoustic models, lexical models and language models (these are included in the distribution). The Communicator language model is class-based, so in fact it comes in two components, a trigram language model and a set of class membership definitions. Lexical items in classes have intra-class probabilities specified. In addition there is a set of lexical models (the dictionary) together with a configuration file that specifies (e.g.) search parameters for the decoder.
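
The two-component structure described above can be illustrated with a minimal sketch. It assumes the usual class-based formulation, in which the probability of a word is the trigram probability of its class tag multiplied by the word's intra-class probability. The table contents, the "[city]" tag and the function name are hypothetical and are not taken from the Communicator models or from the Sphinx file formats.

```python
# Toy illustration of a class-based trigram model.
# All numbers and names below are hypothetical, chosen only for the example.

# Trigram probabilities over a token stream in which class members have been
# replaced by their class tag: P(token | two preceding tokens).
TRIGRAM = {
    ("fly", "to", "[city]"): 0.6,
    ("fly", "to", "tomorrow"): 0.1,
}

# Class membership definitions with intra-class probabilities: P(word | class).
CLASS_MEMBERS = {
    "[city]": {"boston": 0.5, "pittsburgh": 0.3, "denver": 0.2},
}


def word_probability(history, word):
    """Return P(word | history) for a two-token history."""
    # Case 1: the word appears in the trigram model as itself.
    total = TRIGRAM.get((history[0], history[1], word), 0.0)
    # Case 2: the word is a class member, so combine the probability of the
    # class tag in this context with the word's intra-class probability.
    for tag, members in CLASS_MEMBERS.items():
        if word in members:
            total += TRIGRAM.get((history[0], history[1], tag), 0.0) * members[word]
    return total
```

One practical consequence of this arrangement is that a new city can be added by editing the class definition alone, without retraining the trigram component.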

Phoenix

Location:
cmu/servers/src/phoenix/

Summary:
Phoenix parses the decoding it receives from Sphinx, using a semantic grammar. Phoenix can potentially produce multiple parses. Parses consist of hierarchical slot arrays, with the slots corresponding to semantic entities in the domain ontology. The ontology is simple and is best thought of as a type hierarchy. The structure of the grammar directly mirrors the ontology and additionally includes (domain-independent) discourse concepts, for example "yes" and "no". Together these define the expected user language for the domain.
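
As a rough illustration of what a hierarchical slot structure might look like to downstream code, the sketch below models a parse as a small tree and flattens it into slot paths. The slot names (Flight, Depart, Arrive, City) and the Slot class are hypothetical; they do not reproduce the Phoenix output format or the Communicator grammar.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One node of a parse: a semantic slot plus the words or sub-slots it covers."""
    name: str
    words: str = ""
    children: list = field(default_factory=list)


def slot_paths(slot, prefix=""):
    """Flatten a slot tree into (dotted path, words) pairs, one per leaf."""
    path = f"{prefix}.{slot.name}" if prefix else slot.name
    if not slot.children:
        return [(path, slot.words)]
    paths = []
    for child in slot.children:
        paths.extend(slot_paths(child, path))
    return paths


# Hypothetical parse of "from pittsburgh to boston".
parse = Slot("Flight", children=[
    Slot("Depart", children=[Slot("City", "pittsburgh")]),
    Slot("Arrive", children=[Slot("City", "boston")]),
])
```

Because the slot names mirror the type hierarchy, the same word ("boston") receives a different interpretation depending on the path under which it appears.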

Note that not all concepts used in the system need to have a corresponding expression in (spoken) language: some can be used to semantically code other events of interest to the system. These may include events in other modalities (such as pointing gestures) or asynchronous events generated in the domain (for example alarms). The Communicator system does not make use of such events but other CMU systems (e.g., LARRI) do.

A note on language development: It's been our observation that while at first seemingly highly variable, human language in well-defined domains is for practical purposes limited: there are only so many ways to express in-domain information. Observed language will be a function of the domain, the user population and the skill with which the system provides guidance on use and manages exploratory behavior. In practical terms, a high-coverage grammar can be constructed from a combination of transcribed in-domain speech and reuse of common sub-languages such as those for dates, times and numbers. The Communicator Travel Planning grammar is based on an ATIS grammar augmented through the analysis of an approximately 20,000 utterance Communicator corpus.

Helios

Location:
cmu/servers/src/helios

Summary:
Helios is the "post-parser". At present its role is to assess the level of confidence for an incoming parse using information from the decoder, parse and dialog levels of the system. The accordingly annotated parse is then sent to the dialog manager. While Helios makes use of learned parameters to assign confidence, there are no domain-specific components in it.
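
One way to picture confidence annotation from several levels of the system is a weighted combination of features passed through a logistic function. The sketch below is only an assumption about the general shape of such a scorer: the feature names, weights and bias are invented for illustration, and the text above does not say which model Helios actually uses.

```python
import math

# Hypothetical weights standing in for learned parameters.
WEIGHTS = {
    "decoder_word_confidence": 2.0,  # decoder level
    "parse_coverage": 1.5,           # parse level
    "matches_expected_slot": 1.0,    # dialog level
}
BIAS = -2.0


def parse_confidence(features):
    """Map a feature dictionary to a confidence value between 0 and 1."""
    # Missing features are treated as 0.0.
    score = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))
```

None of the features refers to domain content, which is consistent with the observation that the module has no domain-specific components.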

Helios is also the locus for multi-modal integration. The Communicator system does not use multi-modal input. However, the LARRI system at Carnegie Mellon, which is based on the same architecture, incorporates a multi-modal Helios module that integrates speech and manual inputs.

Dialog Manager

Location:
cmu/servers/src/dm_server

Summary:
The Communicator Dialog Manager implements the Carnegie Mellon AGENDA dialog manager. The module contains an execution engine and a handler library. The library is domain-specific and contains individual handlers and handler (sub-)trees, both of which are assembled into a dynamic product tree over the course of a session. The engine additionally manages the dialog agenda, which controls the interpretation of user inputs.

Handlers are implemented as C++ objects and incorporate logic for interpreting particular inputs, interacting with domain agents or managing child nodes in the product tree. The product is built up dynamically over the course of a session. As a consequence, the system does not follow a dialog "script" in the conventional sense; rather, the sequence of interactions is determined by (legal) extensions to the product and by user topic-focusing behavior. The Dialog Manager focuses on the task and discourse aspects of the dialog and performs minimal domain-specific reasoning, which is primarily located in the ABE module. You can read a fuller description of the AGENDA dialog manager.
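
The role of the agenda in interpreting input can be sketched as an ordered list of handlers, each of which is offered the incoming slots in turn. This is a simplified illustration written in Python for brevity, not the C++ implementation: the Handler class, its try_consume method and the slot names are hypothetical, and moving a handler to the top of the agenda when it fires is an assumption made here to mimic topic focusing.

```python
class Handler:
    """Hypothetical handler that accepts a single slot and stores its value."""

    def __init__(self, name, slot):
        self.name = name
        self.slot = slot
        self.value = None

    def try_consume(self, slot, value):
        """Claim the input if it matches this handler's slot."""
        if slot == self.slot:
            self.value = value
            return True
        return False


def interpret(agenda, parsed_slots):
    """Offer each slot to the handlers in agenda order; return unclaimed slots."""
    unclaimed = {}
    for slot, value in parsed_slots.items():
        for handler in agenda:
            if handler.try_consume(slot, value):
                # Assumed focusing rule: the handler that fired moves to the top.
                agenda.remove(handler)
                agenda.insert(0, handler)
                break
        else:
            # No handler on the agenda claimed this slot.
            unclaimed[slot] = value
    return unclaimed
```

In this picture the order of the agenda, rather than a fixed script, decides which handler gets to interpret a given piece of input.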

DateTime

Location:
cmu/servers/src/datetime3

Summary:
DateTime interprets temporal expressions in user input and resolves them to absolute dates and times. DateTime has knowledge of holidays commonly observed in the United States. It does not, however, provide full coverage for religious holidays (particularly those that require computations based on external events). The module operates on (date and time) fragments of the input parse and makes use of context information (such as the current time and date) to resolve relative expressions (e.g., "tomorrow"). The module also maps numeric expressions.
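
Resolving a relative expression against context can be sketched as follows. The function name and the small set of expressions are hypothetical, and reading a bare weekday name as the next occurrence strictly after today is an assumption made for the example, not a documented DateTime rule.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_date(expression, today):
    """Resolve a relative date expression to an absolute date, or None."""
    expression = expression.lower().strip()
    if expression == "today":
        return today
    if expression == "tomorrow":
        return today + timedelta(days=1)
    if expression in WEEKDAYS:
        # Days until the next such weekday, always between 1 and 7.
        delta = (WEEKDAYS.index(expression) - today.weekday() - 1) % 7 + 1
        return today + timedelta(days=delta)
    # Expression not covered by this sketch.
    return None
```

Passing the current date in as an argument, rather than reading the clock inside the function, keeps the resolution step easy to test and reuse.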

While the DateTime module was developed specifically to cover expressions encountered in the travel domain, the date and time (and number) sub-languages appear to be largely domain-independent, so this module can be easily used as is in new domains.

ABE (Airline Back End) and database

Location:
cmu/systems/abe

Summary:
ABE performs a variety of domain-specific functions and is in some sense the "application" that the dialog system interfaces to. The functions include access to information in the system database, retrieval of information on the web and domain-specific reasoning.

ABE interfaces to web-based resources to obtain information about flights and hotels. For flights, this includes schedules and prices; for hotels, it includes locations, prices and availability. The information is live, although some of it is also cached for varying durations. ABE also incorporates domain-specific reasoning to deal with, for example, the resolution of ambiguous references ("Is that Portland in Maine or Portland in Oregon?") and managing solution sets (for example, ranking flights on "desirability").
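
The ambiguous-reference case can be illustrated with a small lookup that either resolves a city name or produces a clarification question. The table layout, field names and return format are hypothetical (ABE itself is written in Perl and its database schema is not described here); only the Portland example comes from the text above.

```python
# Hypothetical gazetteer entries used only for this illustration.
CITIES = [
    {"name": "portland", "region": "Maine", "airport": "PWM"},
    {"name": "portland", "region": "Oregon", "airport": "PDX"},
    {"name": "boston", "region": "Massachusetts", "airport": "BOS"},
]


def resolve_city(name):
    """Resolve a spoken city name, or build a question if it is ambiguous."""
    matches = [city for city in CITIES if city["name"] == name.lower()]
    if not matches:
        return {"status": "unknown"}
    if len(matches) == 1:
        return {"status": "resolved", "airport": matches[0]["airport"]}
    # More than one candidate: ask the user to choose between them.
    options = " or ".join(
        f"{city['name'].title()} in {city['region']}" for city in matches
    )
    return {"status": "ambiguous", "question": f"Is that {options}?"}
```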

ABE interacts with the database, which contains geographical information (about 500 world-wide destinations) and information about airlines. The database also contains information about how users might refer to various entities in the domain (for example airport names) and information about how the system should in turn refer to entities when speaking to the user.

Profile

Location:
cmu/systems/profile

Summary:
The Profile module manages information about individuals known to the Travel Planning system. The user profile notes various preferences (for airlines or hotels, where to email confirmation of an itinerary, calling frequency, etc.). The information is kept in a database. The profile feature is used in the Carnegie Mellon system to manage personalization. It is disabled in the current distribution of the system but can be activated if desired.

Rosetta

Location:
cmu/systems/nlg

Summary:
Rosetta is the language generation module. It receives semantically coded requests from the dialog manager and computes a corresponding word string that can be spoken. Rosetta incorporates two generation strategies: template-based and stochastic. Rosetta also makes use of information in the database to obtain expressions for particular domain entities (for example airport names). The stochastic generation component makes use of language models built from a corpus of transcribed travel-agent speech to generate natural language for output expressions common to human travel agents and the dialog system. Other outputs (such as greetings or error notifications) are handled by template.
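
The template strategy can be sketched as a lookup from a request type to a string with slots to fill. The request names, template wording and function name below are hypothetical and are not taken from Rosetta; the stochastic strategy is not covered by this sketch.

```python
# Hypothetical templates keyed by the type of request from the dialog manager.
TEMPLATES = {
    "greeting": "Welcome to the travel planning system.",
    "confirm_flight": "I have you leaving {depart_city} for {arrive_city} on {date}.",
}


def generate(request, **slots):
    """Fill the template for a request with the supplied slot values."""
    template = TEMPLATES.get(request)
    if template is None:
        raise KeyError(f"no template for request: {request}")
    return template.format(**slots)
```

For example, generate("confirm_flight", depart_city="Pittsburgh", arrive_city="Boston", date="Friday") fills the three slots of the second template.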

Festival

Location:
festival

Summary:
Speech synthesis is done using the Festival system in a limited domain mode. Festival is a concatenative synthesizer. That is, a database of recorded human speech is used to create requested outputs by selecting appropriate units from the database and combining these by splicing. Limited-domain means that the database was recorded expressly for this application and contains complete forms of frequently encountered items (for example city names). This minimizes the need for intra-word splicing and consequently results in higher-quality output speech. 

Process Monitor

Location:
cmu/servers/src/pmonitor-10-18-01

Summary:
The system can be (should be) started up and brought down using the Process Monitor application. The Process Monitor references a list to bring up (in sequence) the different modules in the system. It additionally monitors system processes and can restart any that have died. It will optionally email someone if it doesn't succeed in restoring the system to operational status. 
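
The start, monitor and restart cycle described above can be sketched with the Python standard library. This is an illustration of the idea only, not the Process Monitor's own code: the class and method names are hypothetical, the module list is supplied by the caller, and email notification is omitted.

```python
import subprocess


class ProcessMonitor:
    """Start a list of commands in order and restart any that have exited."""

    def __init__(self, commands):
        # commands: ordered list of (name, argv) pairs.
        self.commands = commands
        self.processes = {}

    def start_all(self):
        """Bring the modules up in the listed sequence."""
        for name, argv in self.commands:
            self.processes[name] = subprocess.Popen(argv)

    def check(self):
        """Restart every process that has exited; return the restarted names."""
        restarted = []
        for name, argv in self.commands:
            # poll() returns None while the process is still running.
            if self.processes[name].poll() is not None:
                self.processes[name] = subprocess.Popen(argv)
                restarted.append(name)
        return restarted

    def stop_all(self):
        """Bring the modules down in reverse order."""
        for name, _ in reversed(self.commands):
            process = self.processes[name]
            if process.poll() is None:
                process.terminate()
            process.wait()
```

A caller would typically invoke check() periodically, for instance from a loop with a short sleep, and decide after some number of failed restarts whether to notify an operator.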

Implementation Notes

The Travel Planning system uses the Galaxy architecture for inter-module communication and for logging. The core modules of the system are implemented in C/C++ and use native Galaxy messages to communicate information. The nature of the messages and their routing is noted in the "hub" program. Some of the modules, in particular ABE, are implemented in Perl. We found it convenient to isolate these modules, so the actual implementation consists of the module itself plus a Galaxy-based proxy that manages communication between the module and the rest of the (Galaxy-based) system.


Please email all comments and feedback to Yitao Sun.