All models use the new "40 phone" phoneset and are compatible with CMUDict 0.6d and 0.7, as well as the Sphinx Knowledge Base Tool. These models are released under the same permissive license as Sphinx-3. The databases they were trained on are not freely redistributable but, as far as we know, this does not restrict distribution of the models trained on them.
These models are packaged for use with Sphinx-3 and PocketSphinx. Unpack them using tar and take note of the directory that is created by doing so. To use them, you will specify this directory in your configuration file, or on the command line, using the -hmm argument. This will ensure that the proper feature extraction parameters are picked up by the decoder.
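As an illustration of the unpacking step, here is a minimal Python sketch that extracts a model archive and reports the directory it creates, so that it can be passed to -hmm. The function names unpack_model and hmm_args are hypothetical helpers, not part of Sphinx, and the sketch assumes the archive unpacks into a single top-level directory.

```python
import tarfile
from pathlib import Path


def unpack_model(archive_path, dest_dir="."):
    """Extract a model tarball and return the top-level directory it creates.

    Assumption: the archive contains exactly one top-level directory.
    """
    dest = Path(dest_dir)
    with tarfile.open(archive_path) as tar:
        # Collect the first path component of every member.
        top_levels = set()
        for member in tar.getmembers():
            parts = Path(member.name).parts
            if parts:
                top_levels.add(parts[0])
        # Use the safer "data" extraction filter where this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    if len(top_levels) != 1:
        raise ValueError(
            "expected one top-level directory, found %s" % sorted(top_levels)
        )
    return dest / top_levels.pop()


def hmm_args(model_dir):
    """Build the -hmm argument pair pointing at the unpacked model directory."""
    return ["-hmm", str(model_dir)]
```

The returned list can be appended to whatever decoder command line you already use; the sketch deliberately does not name or launch a decoder binary.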
The mixture_weights file is not necessary in order to use the semi-continuous models with PocketSphinx. Since it is quite large, you may choose to omit it in cases where storage is limited. We have included it because it is (currently) necessary in order to do acoustic model adaptation.
If you use these continuous-density models with PocketSphinx, you may encounter segfaults. This is a known bug which will be fixed in the next release. To work around it, edit the noisedict file and remove every line where the pronunciation of a word contains phones other than SIL or noise phones (ones starting and ending with +). Alternatively, here is a corrected noise dictionary. Copy it over the noisedict file.
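The noisedict edit described above can also be done mechanically. The following is a rough Python sketch, assuming each noisedict line consists of a word followed by whitespace-separated phones; the helper names is_noise_phone and filter_noisedict are made up for this example.

```python
def is_noise_phone(phone):
    """True for SIL and for noise phones such as +NOISE+ (start and end with +)."""
    return phone == "SIL" or (
        len(phone) >= 2 and phone.startswith("+") and phone.endswith("+")
    )


def filter_noisedict(lines):
    """Keep only the lines whose pronunciation uses SIL or noise phones.

    Assumption: the first field on a line is the word and the remaining
    fields are its phones. Blank lines and lines with no phones are kept
    unchanged, since they contain no offending phone.
    """
    kept = []
    for line in lines:
        fields = line.split()
        phones = fields[1:]
        if all(is_noise_phone(p) for p in phones):
            kept.append(line)
    return kept


def clean_noisedict_file(path):
    """Rewrite the noisedict at `path` in place, dropping offending lines."""
    with open(path) as fh:
        lines = fh.readlines()
    with open(path, "w") as fh:
        fh.writelines(filter_noisedict(lines))
```

Since clean_noisedict_file overwrites the file, it is worth keeping a copy of the original noisedict before running it.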
While Sphinx-3 supports MLLT feature transformations, version 0.7 does not find them automatically in the acoustic model directory. Therefore, for this version, in order to use the wideband WSJ models below, you will need to add this argument to the configuration or command line:
-lda wsj_all_cd30.mllt_cd_cont_4000/feature_transform
If you aren't running the decoder from the directory that contains the model directory, you will need to give the full path to the feature_transform file.
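One way to avoid depending on the current working directory is to compute the absolute path before building the command line. This small Python sketch shows the idea; lda_args is a hypothetical helper, and the only assumption about the model layout is the feature_transform file name given above.

```python
import os


def lda_args(model_dir):
    """Return the -lda argument pair with an absolute path to feature_transform."""
    transform = os.path.join(os.path.abspath(model_dir), "feature_transform")
    return ["-lda", transform]


# Example: resolve the wideband WSJ model directory relative to wherever
# this script is started from.
args = lda_args("wsj_all_cd30.mllt_cd_cont_4000")
```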
Note also that if you intend to use the semi-continuous models with Sphinx-3, you must add these arguments to your configuration:
-feat s2_4x -senmgau .s2semi.
Important note: These continuous-density acoustic models are very large and will not run in real time with the standard set of parameters. Sphinx-3 has a large number of tunable options for speeding things up, but tuning them is something of a black art. We suggest starting with the following settings:
-subvq Communicator_40.cd_cont_4000/subvq -beam 1e-60 -wbeam 1e-40 -ci_pbeam 1e-8 -subvqbeam 1e-2 -maxhmmpf 2000 -maxcdsenpf 1000 -maxwpf 8 -ds 2
Important note: See the note above about the -lda argument to Sphinx-3.
Important note #2: These continuous-density acoustic models are very large and will not run in real time with the standard set of parameters. Sphinx-3 has a large number of tunable options for speeding things up, but tuning them is something of a black art. We suggest starting with the following settings:
-subvq wsj_all_cd30.mllt_cd_cont_4000/subvq -beam 1e-80 -wbeam 1e-60 -subvqbeam 1e-2 -maxhmmpf 2500 -maxcdsenpf 1500 -maxwpf 20
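Because tuning usually means changing one value at a time, it can help to keep the suggested starting point in one place and override individual options from there. The Python sketch below is one possible way to do that. The flags and values are the ones listed above, while WSJ_START and tuning_args are names invented for this example.

```python
# Suggested starting settings for the wideband WSJ models, copied from above.
WSJ_START = {
    "-subvq": "wsj_all_cd30.mllt_cd_cont_4000/subvq",
    "-beam": "1e-80",
    "-wbeam": "1e-60",
    "-subvqbeam": "1e-2",
    "-maxhmmpf": "2500",
    "-maxcdsenpf": "1500",
    "-maxwpf": "20",
}


def tuning_args(settings, overrides=None):
    """Flatten a settings mapping into a flat list of command-line arguments.

    Values in `overrides` replace those in `settings`; the original
    mapping is left untouched.
    """
    merged = dict(settings)
    merged.update(overrides or {})
    args = []
    for flag, value in merged.items():
        args.extend([flag, str(value)])
    return args


# Example: try a different main beam while keeping everything else as suggested.
# The value 1e-70 is only an illustration, not a recommendation.
trial = tuning_args(WSJ_START, {"-beam": "1e-70"})
```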
Important note: These models use an older (44-phone) phoneset, so you must use the Advanced Sphinx Knowledge Base Tool instead (select "Reduced (Sphinx_44)" in the dictionary and language model parameters).
Maintained by David Huggins-Daines. Last modified: Wed Mar 19 15:36:12 EDT 2008