The CMU Sphinx Group Open Source Speech Recognition Engines






Language Model

The language model is the component of a speech recognizer that decides whether a particular sequence of words is a likely utterance in some human language. Consider the so-called fundamental equation of speech recognition:

\begin{displaymath}
\hat S = \arg\max_S P(X|S,\lambda) P(S|\lambda)
\end{displaymath}

The language model is the source of the prior probability $P(S|\lambda)$; that is, it tells us the probability of a sentence $S$ given the model $\lambda$.
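
One way to see why $P(S|\lambda)$ plays the role of a prior is to derive the equation from Bayes' rule. The sketch below assumes that $X$ denotes the observed acoustic input, which this page does not state explicitly:

\begin{displaymath}
\hat S = \arg\max_S P(S|X,\lambda) = \arg\max_S \frac{P(X|S,\lambda) P(S|\lambda)}{P(X|\lambda)} = \arg\max_S P(X|S,\lambda) P(S|\lambda)
\end{displaymath}

The denominator $P(X|\lambda)$ does not depend on $S$, so it can be dropped from the maximization.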

In most medium to large vocabulary speech recognition systems, a so-called N-Gram language model is used. This type of model starts from the chain rule of probability, which lets the probability of a sentence be decomposed into the product of the probabilities of each word given all of the previous words, or the history. The N-Gram model then assumes that only the last N-1 words are actually relevant in predicting the current word. As linguists we know this to be completely false, but in practice it works very well.
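
Written out for a sentence $S = w_1, \ldots, w_T$, the decomposition and the N-Gram approximation might look as follows. The symbols $w_i$ and $T$ are notation introduced here for illustration and are not used elsewhere on this page:

\begin{displaymath}
P(S|\lambda) = \prod_{i=1}^{T} P(w_i | w_1, \ldots, w_{i-1}, \lambda) \approx \prod_{i=1}^{T} P(w_i | w_{i-N+1}, \ldots, w_{i-1}, \lambda)
\end{displaymath}

For a trigram model (N = 3), for example, each word is predicted from only the two words before it.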

Why is this? Primarily, the N-Gram model is very easy to train from large amounts of unannotated or minimally annotated text.
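
To illustrate how little is needed to train such a model, here is a minimal sketch of a bigram (N = 2) model estimated from raw text by simple counting. It is not how the CMU Sphinx tools do it: the function names, the `<s>` and `</s>` boundary markers, and the toy corpus are all assumptions made for this example, and a real system would add smoothing so that unseen word pairs do not get zero probability.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Estimate P(word | previous word) by relative frequency (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> are assumed sentence-boundary markers for this sketch.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return {
        (prev, word): count / history_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }


def sentence_probability(model, sentence):
    """Multiply the bigram probabilities; unseen bigrams contribute 0."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= model.get((prev, word), 0.0)
    return prob


# Hypothetical toy corpus: plain, unannotated sentences.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigram_model(corpus)
```

Calling `sentence_probability(model, "the cat sat")` then scores a sentence using only the counts gathered from the text; no labels or annotations were required.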

LanguageModel (last edited 2008-02-21 03:34:54 by localhost)

This page is maintained by David Huggins-Daines
CMUSphinx is a project within the Sphinx Group at Carnegie Mellon