LOGIOS Lexicon Tool
This tool generates a pronunciation dictionary from a list of (English) words.
If you just want to see how a word is pronounced,
If you notice any errors in the output (such as a seemingly incorrect
pronunciation), please report it and we will look into it.
You can send reports to air:cs'cmu,|edu|.
If your input file contains the words that begin each line below (one word per line),
your output file will look something like this:
HELLO HH EH L OW
HELLO(1) HH AH L OW
WORLD W ER L D
COMPOUND_WORD K AA M P AW N D W ER D
HYPHEN-ATED HH AY F AH N EY T IH D
ONE23 OW EH N IY T UW TH R IY
2008 T UW Z IY R OW Z IY R OW EY T
BOOM! B UW M
KWEEZLEBOTTER K W IY Z L AH B AA T AH R
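To work with output like the sample above in your own scripts, the lines can be read into a simple lookup table. The following is only a sketch: the function name parse_dictionary is hypothetical, and it assumes that each line holds a word followed by whitespace-separated phones, with alternate pronunciations marked by a suffix such as "(1)".

```python
import re
from collections import defaultdict

# Matches a head word with an optional instance id such as "(1)".
_VARIANT = re.compile(r"^(?P<word>.+?)(?:\((?P<n>\d+)\))?$")

def parse_dictionary(lines):
    """Group pronunciations by base word; each pronunciation is a list of phones."""
    entries = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        head, phones = parts[0], parts[1:]
        word = _VARIANT.match(head).group("word")
        entries[word].append(phones)
    return dict(entries)

sample = [
    "HELLO HH EH L OW",
    "HELLO(1) HH AH L OW",
    "WORLD W ER L D",
]
lexicon = parse_dictionary(sample)
```

Here lexicon["HELLO"] holds both pronunciations of HELLO, in the order in which they appear in the file.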
Please note the following:
- Some words may have multiple pronunciations; these will appear on
separate lines and will be differentiated by an instance id such as
"(1)". The current implementation of the Sphinx decoder expects each
dictionary entry to be unique. Note, however, that this tool does not
check for uniqueness, so if you include multiple instances of an input
word, it will appear multiple times in the output. As a rule, you should
sort your input files before you submit them so that duplicates are easy
to spot and remove.
- Words with internal separators such as "_" and "-" will be
rendered as a single word; the internal characters will be kept as part
of the orthographic element.
- Alphanumeric items, as well as numbers, will be rendered
character-by-character. This is because such items are ambiguous and
can be rendered several ways (e.g., "one two three", "one
twenty-three", etc.). It is your responsibility to determine how such
items will be spoken. Typically this will vary by domain.
- Punctuation marks will be ignored.
- Words that do not exist in the tool's dictionary will be
generated according to letter-to-sound rules. There is no guarantee
that such a pronunciation will be correct. You are advised to check these before use.
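The notes above suggest a little preparation of the word list before submitting it. A minimal sketch follows, assuming the input is one word per line; the helper name prepare_word_list is hypothetical and is not part of the tool. It removes exact duplicates, sorts the list, and flags items containing digits, since those will be spelled out character-by-character and may need to be rewritten by hand.

```python
def prepare_word_list(words):
    """Return (unique sorted words, items that contain digits)."""
    # Drop empty lines and exact duplicates, then sort.
    unique = sorted({w.strip() for w in words if w.strip()})
    # Items with digits are rendered character-by-character by the tool,
    # so list them for manual review.
    spelled_out = [w for w in unique if any(ch.isdigit() for ch in w)]
    return unique, spelled_out

words = ["WORLD", "HELLO", "HELLO", "ONE23", "2008", ""]
unique, spelled_out = prepare_word_list(words)
```

The comparison is case-sensitive, so "hello" and "HELLO" would be kept as separate items; whether the tool treats them as the same word is not stated here.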
If you choose to alter pronunciations manually, be sure to follow the output format (a word followed by its phones on a single line) and to use only phones that are part of the legal set.
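A hand-edited dictionary can be checked mechanically before use. The sketch below is an illustration only: check_entries is a hypothetical helper, and LEGAL_PHONES is a placeholder holding the 39 ARPAbet symbols used by CMUdict, which may not match the tool's own inventory exactly, so substitute the actual legal set. It reports malformed lines, repeated entries, and phones outside the set.

```python
# Placeholder phone set: the 39 ARPAbet symbols used by CMUdict.
# This is an assumption; replace it with the tool's actual legal set.
LEGAL_PHONES = set("""AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K
L M N NG OW OY P R S SH T TH UH UW V W Y Z ZH""".split())

def check_entries(lines):
    """Return a list of (line_number, message) pairs for problems found."""
    problems = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) < 2:
            problems.append((number, "expected a word followed by phones"))
            continue
        head, phones = parts[0], parts[1:]
        if head in seen:
            problems.append((number, f"duplicate entry {head}"))
        seen.add(head)
        bad = [p for p in phones if p not in LEGAL_PHONES]
        if bad:
            problems.append((number, "unknown phones: " + " ".join(bad)))
    return problems

edited = [
    "HELLO HH EH L OW",
    "HELLO(1) HH AH L OW",
    "WORLD W ER L D",
    "WORLD W UR L D",
    "BOOM",
]
problems = check_entries(edited)
```

Entries such as HELLO and HELLO(1) count as distinct because the instance id is part of the entry name; only an exact repeat of the head word is reported as a duplicate.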
This tool is a component of the LOGIOS
package, which allows you to input a Phoenix grammar and receive a
compiled grammar, an n-gram language model and a pronouncing dictionary.
The tool currently accesses cmudict.0.7a
and produces pronunciations using the (currently standard) 40-item phone inventory.
Please note that the dictionary may be updated from time to time, we hope in the direction of greater accuracy :-).