| Name | Last modified | Size | Description | |
|---|---|---|---|---|
| README.html | 06-Nov-2009 23:08 | 3.9K | ||
| domain.corpus | 06-Nov-2009 23:08 | 67 | ||
| domain.ctl | 06-Nov-2009 23:08 | 67 | ||
| domain.def | 06-Nov-2009 23:08 | 345 | ||
| domain.dic | 06-Nov-2009 23:08 | 924 | ||
| domain.lm | 06-Nov-2009 23:08 | 868 | ||
| domain.sent | 06-Nov-2009 23:08 | 84 | ||
| domain.sent.arpabo | 06-Nov-2009 23:08 | 868 | ||
| domain.token | 06-Nov-2009 23:08 | 48 | ||
| domain.vocab | 06-Nov-2009 23:08 | 38 | ||
| domain.words | 06-Nov-2009 23:08 | 261 | ||
| sphinx2-simple-class | 06-Nov-2009 23:08 | 726 | ||
| turtle-class-lm-2001-03-29.tgz | 06-Nov-2009 23:08 | 3.0K | ||
This is an example 'turtle' domain with a class-based language model.
For this example, the contents of domain.corpus were given to the lmtool web-based language model building tool:
GO [direction] [number] [unit]
ROTATE [direction] [number] [unit]
This covers a small language of commands such as
GO FORWARD TEN METERS and
ROTATE RIGHT FORTY FIVE DEGREES.
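To make the coverage concrete, here is a minimal sketch that expands the GO template by substituting words for each bracketed class. The short word lists are an illustrative subset chosen for this example, not the full classes, and the `expand` function name is hypothetical.

```sh
#!/bin/sh
# Illustrative subset of class members (assumption: the real classes are larger).
directions="FORWARD BACKWARD"
numbers="TEN FORTY"
units="METERS DEGREES"

# expand VERB: print every sentence of the form "VERB direction number unit".
expand() {
    for d in $directions; do
        for n in $numbers; do
            for u in $units; do
                echo "$1 $d $n $u"
            done
        done
    done
}

sentences=$(expand GO)
count=$(printf '%s\n' "$sentences" | wc -l | tr -d ' ')
echo "$sentences"
echo "GO sentences: $count"
```

With two words per class, the single GO template already yields 2 x 2 x 2 = 8 sentences, which is why a class-based corpus can stay so short.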
After the corpus is run through the
lmtool,
the tarball is unpacked. Here, the relevant files have all been renamed to
domain.something, where the something is a typical
three-letter extension.
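The renaming step might be scripted roughly as follows. This is a sketch under the assumption that the unpacked files share a common base name; the base name `1234` and the stand-in files are hypothetical and exist only so the example runs on its own.

```sh
#!/bin/sh
# Hypothetical base name of the unpacked lmtool files; substitute the real one.
base=1234
workdir=$(mktemp -d)

# Stand-in files so the sketch is self-contained; in practice these come from the tarball.
for ext in dic lm sent vocab; do
    : > "$workdir/$base.$ext"
done

# Rename base.ext to domain.ext, keeping the extension.
for f in "$workdir/$base".*; do
    ext=${f##*.}
    mv "$f" "$workdir/domain.$ext"
done

ls "$workdir"
```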
sphinx2-simple-class
is a shell script to invoke the sphinx2-continuous executable
with an appropriate argument list:
#!/bin/sh
S2CONTINUOUS=/usr/local/bin/sphinx2-continuous
HMM=/usr/local/share/sphinx2/model/hmm/6k
TASK=.
$S2CONTINUOUS -live TRUE -ctloffset 0 -ctlcount 100000000 \
    -cepdir ${TASK}/ctl -datadir ${TASK}/ctl -agcmax FALSE \
    -langwt 6.5 -fwdflatlw 8.5 -rescorelw 9.5 -ugwt 0.5 \
    -fillpen 1e-10 -silpen 0.005 -inspen 0.65 \
    -top 1 -topsenfrm 3 -topsenthresh -70000 \
    -beam 2e-06 -npbeam 2e-06 -lpbeam 2e-05 -lponlybeam 0.0005 -nwbeam 0.0005 \
    -fwdflat FALSE -fwdflatbeam 1e-08 -fwdflatnwbeam 0.0003 -bestpath TRUE \
    -kbdumpdir ${TASK} -lmctlfn ${TASK}/domain.ctl -dictfn ${TASK}/domain.dic \
    -ndictfn ${HMM}/noisedict -phnfn ${HMM}/phone -mapfn ${HMM}/map \
    -hmmdir ${HMM} -hmmdirlist ${HMM} -8bsen TRUE \
    -sendumpfn ${HMM}/sendump -cbdir ${HMM}
In particular, note the following arguments:
-lmctlfn domain.ctl
Language model control file name. See domain.ctl as an example.
-dictfn domain.dic
Dictionary file name; here, domain.dic.
domain.ctl looks like this:
{ domain.def }
domain.lm general {
[direction]
[number]
[unit]
}
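The layout of the control file can be illustrated by pulling out its fields with awk. This is only a sketch: it assumes the line-oriented layout of the example above (class definition file on line 1, language model file and name on line 2, one class per line after that), and the variable names are made up for illustration. The actual sphinx2 reader may accept other layouts.

```sh
#!/bin/sh
# Recreate the example control file in a temporary location.
ctl=$(mktemp)
cat > "$ctl" <<'EOF'
{ domain.def }
domain.lm general {
[direction]
[number]
[unit]
}
EOF

# Line 1: "{ file }" names the class definition file.
def_file=$(awk 'NR == 1 { print $2 }' "$ctl")
# Line 2: "<lm file> <lm name> {" opens the list of classes.
lm_file=$(awk 'NR == 2 { print $1 }' "$ctl")
lm_name=$(awk 'NR == 2 { print $2 }' "$ctl")
# Later lines starting with "[" are the class names, up to the closing "}".
classes=$(awk 'NR > 2 && /^\[/ { printf "%s%s", sep, $1; sep = " " }' "$ctl")

echo "def=$def_file lm=$lm_file name=$lm_name classes=$classes"
rm -f "$ctl"
```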
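- The first line, in braces, names the class definition file (here, domain.def). This can be a fully qualified path, such as
{ /usr0/lenzo/task/domain.def }
- domain.lm is an ARPA back-off language model (arpabo model) in the current working directory.
- The name of the language model is general.
- The language model uses three classes from the .def file: [direction], [number], and [unit].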
- The
domain.def file might
look like the following (note that there is no white space within
the tokens):
LMCLASS [direction]
FORWARD
BACKWARD
RIGHT
LEFT
END [direction]
LMCLASS [number]
ONE
TWO
THREE
FOUR
FIVE
SIX
SEVEN
EIGHT
NINE
TEN
ELEVEN
TWELVE
THIRTEEN
FOURTEEN
FIFTEEN
SIXTEEN
SEVENTEEN
EIGHTEEN
NINETEEN
TWENTY
THIRTY
FORTY
FIFTY
SIXTY
SEVENTY
EIGHTY
NINETY
HUNDRED
THOUSAND
END [number]
LMCLASS [unit]
METER
METERS
DEGREE
DEGREES
END [unit]
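As a quick way to inspect a class definition file in this format, the following sketch lists the members of one class. It assumes each class is bracketed by `LMCLASS name` and `END name` lines with one word per line in between, as in the listing above; the `class_members` helper is hypothetical and not part of sphinx2.

```sh
#!/bin/sh
# A one-class excerpt of the definition file, written to a temporary file.
def=$(mktemp)
cat > "$def" <<'EOF'
LMCLASS [unit]
METER
METERS
DEGREE
DEGREES
END [unit]
EOF

# class_members FILE CLASS: print the words between "LMCLASS CLASS" and "END CLASS".
class_members() {
    awk -v cls="$2" '
        $1 == "LMCLASS" && $2 == cls { inside = 1; next }
        $1 == "END"     && $2 == cls { inside = 0 }
        inside { print $1 }
    ' "$1"
}

# Join the members onto one line and drop the trailing space.
units=$(class_members "$def" "[unit]" | tr '\n' ' ')
units=${units% }
echo "[unit] members: $units"
rm -f "$def"
```

The same helper could be pointed at [direction] or [number] to check that every word in a class also appears in the dictionary file.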