LexTool

LOGIOS Lexicon Tool


This tool takes a list of (English) words and generates a pronunciation dictionary in a form suitable for use with a speech recognizer, such as CMUSphinx. The Lexicon Tool uses the CMUdict dictionary along with some simple normalization and inflection rules (as detailed below) to identify a word, and falls back on letter-to-sound rules when all else fails.
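
The lookup order described above can be sketched roughly as follows. This is only an illustration, not the tool's actual code: `TOY_DICT`, `split_parts`, `pronounce`, and `letter_to_sound` are hypothetical names, the three-entry dictionary stands in for CMUdict, inflection handling is left out, and the fallback is a placeholder rather than real letter-to-sound rules.

```python
import re

# Toy stand-in for CMUdict (assumption); a real run would load the full dictionary.
TOY_DICT = {
    "WORLD": "W ER L D",
    "COMPOUND": "K AA M P AW N D",
    "WORD": "W ER D",
}


def letter_to_sound(word):
    """Placeholder fallback. These are NOT real letter-to-sound rules."""
    return "<LtoS:" + word + ">"


def split_parts(token):
    """Uppercase the token, drop stray punctuation, split on '_' and '-'."""
    cleaned = re.sub(r"[^A-Z0-9_\-]", "", token.upper())
    return [part for part in re.split(r"[_\-]", cleaned) if part]


def pronounce(token):
    """Return (phones, source): dictionary first, fallback last."""
    parts = split_parts(token)
    if parts and all(part in TOY_DICT for part in parts):
        phones = " ".join(TOY_DICT[part] for part in parts)
        source = "dictionary" if len(parts) == 1 else "dictionary (parts)"
        return phones, source
    return letter_to_sound("".join(parts)), "letter-to-sound"


for token in ["world", "compound_word", "kwEezLebOTter"]:
    print(token, pronounce(token))
```

The point of the sketch is the ordering: a direct dictionary hit is preferred, a word built from known parts comes next, and rule-based guessing is the last resort.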

If you simply want to see if a word can be found in CMUdict, try this tool. CMUdict is a freely-available open-source pronunciation dictionary that was developed for use in speech recognition. The current version can be found at http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict

If you notice any errors in the output, such as a seemingly incorrect pronunciation, please report them and we will figure out what to do about them.
You can send reports to air:cs'cmu,|?edu|. Keep track of developments by following @CMUSpeechGroup on Twitter.


  • For best results, list your words one to a line.
  • You can specify a "hand file" that lists your own pronunciations to include, if different from what the tool gives.
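
As a rough illustration of what a hand file is for, the sketch below merges hand-written pronunciations over generated ones. It rests on assumptions that the page does not confirm: that a hand file uses the same "WORD, whitespace, phones" layout as the output dictionary, and that a hand entry replaces the generated entry for the same word. The function names and the hand pronunciation shown are made up for the example.

```python
def parse_entries(text):
    """Parse 'WORD<whitespace>PHONES' lines into {word: phone string}."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        entries[fields[0].upper()] = " ".join(fields[1:])
    return entries


def merge_hand_entries(generated, hand):
    """Return a new dict in which hand entries take precedence (assumed policy)."""
    merged = dict(generated)
    merged.update(hand)
    return merged


generated = parse_entries(
    "WORLD\tW ER L D\n"
    "KWEEZLEBOTTER\tK W IY Z L AH B AA T AH R\n"
)
# Hypothetical hand pronunciation, invented purely for illustration.
hand = parse_entries("kweezlebotter K W IY Z AH L B AA T ER\n")

for word, phones in sorted(merge_hand_entries(generated, hand).items()):
    print(word + "\t" + phones)
```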






    An example

    If your input file looks something like this:

        Hello
        world
        compound_word
        hyphen-ated
        ONE23
        2008
        boom!
        kwEezLebOTter

    Your output file should look something like this:

        HELLO	HH EH L OW
        HELLO(1)	HH AH L OW
        WORLD	W ER L D
        COMPOUND_WORD	K AA M P AW N D W ER D
        HYPHEN-ATED	HH AY F AH N EY T IH D
        ONE23	OW EH N IY T UW TH R IY
        2008	T UW Z IY R OW Z IY R OW EY T
        BOOM!	B UW M
        KWEEZLEBOTTER	K W IY Z L AH B AA T AH R

    The log will tell you the following:

        HELLO - Main
        WORLD - Main
        COMPOUND - Main [base]
        WORD - Main [base]
        HYPHEN - Main [base]
        pronounce: verbosity is 1
        ATED - Morpheme: A TED
        LETTER-O - Morpheme: LETTER-O
        LETTER-N - Morpheme: LETTER-N
        LETTER-E - Morpheme: LETTER-E
        TWO - Morpheme: TWO
        THREE - Morpheme: THREE
        I think this is a non-word: 2008
        TWO - Morpheme: TWO
        ZERO - Morpheme: ZERO
        ZERO - Morpheme: ZERO
        EIGHT - Morpheme: EIGHT
        BOOM - Morpheme: BOOM
        KWEEZLEBOTTER - By LtoS rules
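
If you want to post-process an output file like the one in this example, the sketch below reads it into a Python dict and folds alternate pronunciations such as `HELLO(1)` under their base word. It assumes each line holds a word, whitespace, and then space-separated phones, as in the example; `load_pronunciations` and `VARIANT` are names chosen here, not part of the tool.

```python
import re
from collections import defaultdict

# Matches a word with an optional "(n)" variant suffix, e.g. HELLO(1).
VARIANT = re.compile(r"^(?P<word>.+?)(?:\((?P<index>\d+)\))?$")


def load_pronunciations(text):
    """Map each base word to a list of pronunciations (each a list of phones)."""
    prons = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word = VARIANT.match(fields[0]).group("word")
        prons[word].append(fields[1:])
    return dict(prons)


sample = (
    "HELLO\tHH EH L OW\n"
    "HELLO(1)\tHH AH L OW\n"
    "WORLD\tW ER L D\n"
)
for word, variants in load_pronunciations(sample).items():
    print(word, variants)
```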
    

    Please note the following:

    This tool is derived from the Logios package, which allows you to input a Phoenix semantic grammar and receive a compiled grammar, an n-gram language model, and a pronouncing dictionary.

    This tool currently uses cmudict-0.7b and produces pronunciations using the (currently standard) 40-item phone inventory (see above). Please note that the dictionary is updated from time to time, so you may get slightly different results over time. We hope for greater accuracy :-).