Sphinx Knowledge Base Tool




Create a consistent set of lexical and language modeling files for the Sphinx-II decoder.

Sentence corpus file:

Enter the path to an Exception Dictionary. The script extracts the vocabulary from the corpus file, but uses the entries in the .handdict file to override the standard pronunciation algorithm (otherwise available through Pronunciation). A handdict is useful when the automatically generated pronunciations need tuning. (The automatic pronouncer, for example, will not always do a good job on an unusual last name.)
You can also supply Additional Words that do not appear in your sentence list but that you still want the decoder to recognize.

Exception Dictionary (typically .handdict file): (optional)

Additional words file:
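
The way these three inputs combine can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the tool's actual code: the function names, the whitespace tokenization, the toy_pronounce stand-in for the automatic pronouncer, and the phone string in the example handdict entry are all assumptions made for the example.

```python
def build_vocabulary(sentences, additional_words=()):
    """Collect the unique words in the corpus plus any additional words."""
    vocab = set()
    for sentence in sentences:
        vocab.update(sentence.split())
    vocab.update(additional_words)
    return sorted(vocab)


def build_dictionary(vocab, handdict, pronounce):
    """Map each word to a pronunciation, preferring handdict entries."""
    return {
        word: handdict[word] if word in handdict else pronounce(word)
        for word in vocab
    }


def toy_pronounce(word):
    # Hypothetical stand-in for the automatic pronouncer: spells the word out.
    return " ".join(word)


sentences = ["call alex rudnicky", "call home"]
additional_words = ["office"]
# Illustrative exception entry; the phone string is made up for this example.
handdict = {"rudnicky": "R AH D N IH K IY"}

vocab = build_vocabulary(sentences, additional_words)
dictionary = build_dictionary(vocab, handdict, toy_pronounce)
```

Here "rudnicky" takes its pronunciation from the exception dictionary, every other word falls through to the automatic pronouncer, and "office" is in the dictionary even though no sentence contains it.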


Dictionary and Language Model Parameters
(Should be left as-is for most uses.)
Pronunciations can be generated using the "base" phone set (used by Sphinx3 and PocketSphinx), the "reduced" phone set (used by Sphinx2), or the "full" phone set, which includes deletable stops and TS. If you do not understand what this is about, do not change the default.
Base (Sphinx_40)
Reduced (Sphinx_44)
Full (Sphinx_51)
Would you like each sentence to be delimited by <s> </s>? Other software, such as the Sphinx decoder, expects these markers to be part of the language model.
Yes.
No.
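
As a rough illustration of what choosing "Yes" means for the corpus, the sketch below wraps each sentence in the two markers before counting. The function name and the whitespace handling are assumptions for the example; the tool itself may normalize text differently.

```python
def delimit(sentence):
    """Wrap one sentence in begin-of-sentence and end-of-sentence markers."""
    return f"<s> {sentence.strip()} </s>"


corpus = ["call home", "  call the office "]
delimited = [delimit(sentence) for sentence in corpus]
```

With the markers in place, <s> and </s> are counted like ordinary words, so the resulting model can assign probabilities to how sentences start and end.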
Choose the model type:
Bigram.
Trigram.

The bigram option is not available. For most purposes you want a trigram model, since it does a better job of capturing the constraints in your corpus. If you really want bigrams, let us know and we'll consider it.
Discount:
This is a uniform ratio discount applied to all contexts.
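
One plausible reading of a uniform ratio discount is sketched below: every observed n-gram keeps the same fraction of its relative frequency, and the fraction removed is the same in every context, leaving that mass for unseen events. This is an assumption about what the parameter means, not a description of the tool's implementation; the function name, the choice to treat the discount as the fraction removed, and the toy counts are all hypothetical.

```python
from collections import Counter


def ratio_discount(ngram_counts, discount):
    """Apply one discount ratio to every context.

    ngram_counts maps (context, word) -> count. The discount is treated
    here as the fraction of probability mass removed (an assumption).
    """
    context_totals = Counter()
    for (context, _word), count in ngram_counts.items():
        context_totals[context] += count

    # Scale every relative frequency by the same factor (1 - discount).
    probs = {
        (context, word): (1.0 - discount) * count / context_totals[context]
        for (context, word), count in ngram_counts.items()
    }
    # The same amount of mass is held back in every context.
    leftover = {context: discount for context in context_totals}
    return probs, leftover


counts = {("call", "home"): 3, ("call", "alex"): 1, ("the", "office"): 2}
probs, leftover = ratio_discount(counts, discount=0.5)
```

In this toy example "home" follows "call" in 3 of 4 cases, so its relative frequency of 0.75 is scaled to 0.375, and the context "call" keeps 0.5 of its mass in reserve. The same 0.5 is reserved for the context "the", regardless of how many distinct words were seen after it.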



Alex Rudnicky
Last modified: Wed Mar 9 18:54:58 EST 2005