ESPER: Extracting Speaker Information From Children's Stories for Speech Synthesis
ESPER is a component of the StoryTeller project, which focuses on speech synthesis for children's stories. Funded by the NSF, the project is a collaboration between the Language Technologies Institute (LTI) at CMU and the Center for Spoken Language Understanding (CSLU) at OGI.
Within the framework of rendering children's stories as synthetic speech, we are pursuing deeper text analysis in order to better choose appropriate voices in synthesis. This involves finding (and defining) appropriate markup for children's story text that is sufficient for modelling intonation, tagging and parsing the text, and discovering which aspects of the language have detectable effects on prosody.
In order to narrate a children's story using a variety of synthesized voices, ESPER steps through a number of stages to assign an appropriate character voice to each piece of spoken text.
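A minimal sketch of such a staged process, assuming two illustrative steps (quote detection, then naive speaker attribution via a "said X" cue); the stage breakdown, character names, and voice identifiers below are our own illustration, not ESPER's actual algorithm:

```python
import re

# Illustrative character-to-voice table; the names and voice ids are assumptions.
VOICES = {"narrator": "kal_diphone", "Wolf": "awb"}

def assign_voices(text):
    """Split text into narration and quoted speech, attribute a speaker to
    each quote with a naive 'said X' heuristic, and map speakers to voices."""
    pieces = []
    last = 0
    for m in re.finditer(r'"([^"]*)"', text):
        if m.start() > last:
            pieces.append(("narrator", text[last:m.start()].strip()))
        # Look just past the quote for a 'said <Name>' attribution cue.
        tail = text[m.end():m.end() + 40]
        cue = re.search(r'said (?:the )?(\w+)', tail)
        speaker = cue.group(1) if cue else "narrator"
        pieces.append((speaker, m.group(1)))
        last = m.end()
    if last < len(text):
        pieces.append(("narrator", text[last:].strip()))
    # Fall back to the narrator's voice for any unrecognized speaker.
    return [(who, VOICES.get(who, VOICES["narrator"]), what)
            for who, what in pieces if what]

story = '"I will huff and puff," said the Wolf. Then he blew the house down.'
for speaker, voice, line in assign_voices(story):
    print(speaker, voice, repr(line))
```

A real system would of course need far more robust attribution (pronouns, dialogue without explicit cues, nested quotes); this sketch only shows the shape of the stage-by-stage assignment.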
ESPER is implemented within the Festival Speech Synthesis framework. Although ESPER itself does not speak, it will be a component of the larger StoryTeller system. Festival also provides much of the infrastructure that detailed text analysis requires, such as controllable punctuation handling and tokenization, part-of-speech tagging, utterance representation, and well-defined extraction of data for machine learning techniques. In addition, we made use of Festival's XML support.
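To illustrate the kind of XML markup such a system might consume and produce, consider the fragment below; the element and attribute names (`story`, `narration`, `speech`, `speaker`) are hypothetical and chosen for illustration, not ESPER's actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up story text; the tag and attribute names are
# illustrative only, not the markup ESPER actually uses.
STORY_XML = """
<story>
  <paragraph>
    <narration>Then the wolf growled,</narration>
    <speech speaker="Wolf">I will huff and I will puff!</speech>
  </paragraph>
</story>
"""

root = ET.fromstring(STORY_XML)
# Walk the tree, pairing each text span with its intended speaker
# so a downstream synthesizer could select a voice per span.
for elem in root.iter():
    if elem.tag == "speech":
        print(elem.get("speaker"), "->", elem.text)
    elif elem.tag == "narration":
        print("narrator", "->", elem.text)
```

Explicit markup of this sort lets the speaker-attribution results survive as an intermediate representation between text analysis and waveform synthesis.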