CMU Speech Software



Welcome to Speech Software at CMU
These pages distribute a number of speech-related software systems developed at, hosted at, or substantially used within the CMU Speech Group. They are part of our continuing goal to provide state-of-the-art, stable, free software components that allow anyone to build and use speech technology systems.

Each project provides documentation, downloads, examples, and discussion lists that let developers and users share their comments. Some of the projects here have been in development for many years but were only recently released as free software; others are new.

Speech Recognition

  • CMU Sphinx, a collection of real-time speech recognition engines.
  • SphinxTrain, an acoustic model trainer, with documentation for building acoustic models for the Sphinx suite of recognisers.

Speech Synthesis

  • Festvox: building synthetic voices. Documentation, tools and techniques for building synthetic voices in English and other languages. Includes support for various waveform synthesis techniques (diphones, unit selection and limited domain), as well as prosodic modeling, text processing, lexicons, etc.
  • University of Edinburgh's Festival Speech Synthesis System, a general multi-lingual TTS engine.
  • Flite, a small, fast, C-based run-time synthesizer engine designed for servers and embedded systems. Festival/Festvox compatible.

Language Processing

  • Language Factory, a set of tools for building language models for many projects.

Dialog Systems

  • CMUnicator Communicator Open Source Dialog Toolkit (CSDTk)
  • Ariadne Spoken Dialog System, a domain-independent dialog toolkit for building systems that control applications by speech. Runs under Windows (uses SAPI).
  • speechlink, an application-layer control protocol for transferring callers between cooperating speech applications, from ScanSoft.
  • openvxi, a free VoiceXML browser from ScanSoft.

Speech Databases

  • CMU ARCTIC, four single-speaker, phonetically balanced databases of around 1200 utterances (around 40K phonemes) each, with waveform plus EGG, designed for use in speech synthesis.
  • CMU Chaplain, conversational speech: 4.15 hours, close-talking microphone, 16-bit, 16 kHz, hand transcribed. Recorded as role-playing between US Army Chaplains as part of the Tongues Audio Voice Translation project. Constructed by CMU and Lockheed Martin Systems Integration.
  • CMU Microphone Array Database
This page is maintained by Alan W Black (awb@cs.cmu.edu)
and Kevin A. Lenzo (lenzo@cs.cmu.edu)
Hephaestus is a project within the Sphinx Group at Carnegie Mellon University