Version 2 of the toolkit is the most up-to-date version publicly available.
Version 1 of the toolkit is also available.
Note: The CMU SLM toolkit is meant for large amounts of training data. If you intend to train a language model from only a few dozen or a few hundred sentences, please use the lmtool instead.
The Carnegie Mellon Statistical Language Modeling (CMU SLM) Toolkit is a set of Unix software tools designed to facilitate language modeling work in the research community.

Some of the tools are used to process general textual data into:

Others use the resulting language models to compute:

Future versions may include support for other modeling schemes, such as Deleted Interpolation, and for adaptive language modeling (e.g., caches). In addition to their primary usage, the tools are also meant to serve as building blocks for new experimental language models.
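As a rough illustration of the interpolated and cache-based schemes named above, here is a minimal Python sketch. It is not the toolkit's own code, and it does not use the toolkit's file formats. It turns a few sentences into unigram and bigram counts, then linearly mixes a bigram estimate, a unigram estimate, and a small cache of recently seen words. The class name, the `<s>`/`</s>` boundary markers, the fixed weights, and the cache size are all assumptions made for this example. Deleted Interpolation proper would estimate the weights on held-out data rather than fixing them by hand, and there is no handling of out-of-vocabulary words here.

```python
from collections import Counter, deque


def train_counts(sentences):
    """Collect unigram and bigram counts from whitespace-tokenized sentences.

    Each sentence is wrapped in (assumed) <s> and </s> boundary markers.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


class InterpolatedCacheModel:
    """Hypothetical model: P(w | prev) = l_bi*P_bi + l_uni*P_uni + l_cache*P_cache.

    The weights are fixed by hand here; Deleted Interpolation would instead
    fit them on held-out data.
    """

    def __init__(self, unigrams, bigrams, lambdas=(0.6, 0.3, 0.1), cache_size=50):
        assert abs(sum(lambdas) - 1.0) < 1e-9, "interpolation weights must sum to 1"
        self.unigrams = unigrams
        self.bigrams = bigrams
        self.total = sum(unigrams.values())
        self.l_bi, self.l_uni, self.l_cache = lambdas
        # Cache of the most recent words; deque drops the oldest when full.
        self.cache = deque(maxlen=cache_size)
        self.cache_counts = Counter()

    def prob(self, prev, word):
        p_uni = self.unigrams[word] / self.total
        prev_count = self.unigrams[prev]
        p_bi = self.bigrams[(prev, word)] / prev_count if prev_count else 0.0
        if not self.cache:
            # Empty cache: give its weight to the unigram term so the mixture
            # still sums to one over the training vocabulary.
            return self.l_bi * p_bi + (self.l_uni + self.l_cache) * p_uni
        p_cache = self.cache_counts[word] / len(self.cache)
        return self.l_bi * p_bi + self.l_uni * p_uni + self.l_cache * p_cache

    def observe(self, word):
        """Add a word to the cache, evicting the oldest one if the cache is full."""
        if len(self.cache) == self.cache.maxlen:
            self.cache_counts[self.cache[0]] -= 1
        self.cache.append(word)
        self.cache_counts[word] += 1


if __name__ == "__main__":
    unigrams, bigrams = train_counts(["the cat sat", "the dog sat"])
    model = InterpolatedCacheModel(unigrams, bigrams)
    print(round(model.prob("the", "cat"), 2))  # 0.34 (cache still empty)
    model.observe("cat")
    print(round(model.prob("the", "cat"), 2))  # 0.43 (cache now favors "cat")
```

The cache term is what makes the model adaptive: words seen recently in the test text get extra probability mass, independent of the counts collected from training data.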
