ESPER: Extracting Speaker Information From Children's Stories for Speech Synthesis


ESPER is a component of the StoryTeller project, which focuses on speech synthesis for children's stories. Funded by the NSF, the project is a collaboration between the Language Technologies Institute (LTI) at CMU and the Center for Spoken Language Understanding (CSLU) at OGI.


Within the framework of rendering children's stories as synthetic speech, we are investigating deeper text analysis in order to choose more appropriate voices for synthesis. This involves finding (and defining) markup for children's story text that is sufficient for modelling intonation, tagging and parsing the text, and discovering which aspects of language have detectable effects on intonation and prosody.

In order to narrate a children's story using a variety of synthesized voices, ESPER steps through a number of stages to assign an appropriate character voice to each piece of spoken text.

At each processing step, ESPER encapsulates all the acquired speech information in a markup format such as HTML, Sable (an XML-based speech synthesis markup language), or CSML (Children's Story Markup Language), a markup language created specifically for speech information in children's stories.
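
As a rough illustration of the Sable case, the sketch below wraps speaker-attributed pieces of story text in Sable SPEAKER elements. It is not ESPER's code: the to_sable function, the character-to-voice table, and the voice names "male1" and "female1" are assumptions made for this example, and the XML declaration and DOCTYPE that a complete Sable document needs are omitted.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mapping from story characters to synthesizer voice names.
# The voice names are placeholders, not voices ESPER is known to use.
VOICES = {"narrator": "male1", "Goldilocks": "female1"}


def to_sable(segments, voices=VOICES, default="male1"):
    """Wrap (speaker, text) pairs in Sable SPEAKER elements.

    Speakers missing from the table fall back to the default voice.
    """
    parts = ["<SABLE>"]
    for speaker, text in segments:
        name = voices.get(speaker, default)
        # escape() protects &, < and > in the story text;
        # quoteattr() adds the surrounding quotes for the attribute value.
        parts.append(
            f"<SPEAKER NAME={quoteattr(name)}>{escape(text)}</SPEAKER>"
        )
    parts.append("</SABLE>")
    return "\n".join(parts)


if __name__ == "__main__":
    story = [
        ("narrator", "Goldilocks tasted the porridge and said,"),
        ("Goldilocks", "This porridge is too hot!"),
    ]
    print(to_sable(story))
```

Keeping the voice table separate from the markup step means the same attributed segments could be rendered with different voice assignments, or in a different markup format, without touching the text analysis.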

ESPER is implemented within the Festival Speech Synthesis framework. Although ESPER itself does not speak, it will be a component of the larger StoryTeller system. Festival also provides much of the infrastructure that detailed text analysis requires, such as controllable punctuation and tokenization, part-of-speech tagging, utterance representation, and well-defined extraction of data for machine learning techniques. In addition, we made use of Festival's XML support.

Updated: 17-May-2003
Web Comments, Email Jason Zhang