PocketSphinx - Sphinx for handhelds

PocketSphinx is a version of the open-source Sphinx-II speech recognition system which runs on handheld and embedded devices. It currently supports embedded Linux and Windows CE (using the GNU arm-wince-pe cross-toolchain). It is released under the same permissive license as Sphinx itself.

Where can I get it?

PocketSphinx downloads have been moved to the CMU Sphinx SourceForge Site. The latest version is 0.3. To build it, you also need SphinxBase 0.2.

This distribution contains acoustic models for connected-digit recognition and wide-band (16kHz sampling rate) microphone dictation. You can also download telephone-bandwidth models separately. To use these with raw audio data you need the following extra command-line options:

  -nfft 256
  -nfilt 31
  -lowerf 200
  -upperf 3500
  -samprate 8000
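
If you pass these options often, it can help to keep them in one shell variable. The following is only a sketch: TELEPHONE_OPTS and DECODER are names invented here for illustration, and the decoder program is left as a placeholder because this page does not name it.

```shell
# Front-end options for 8 kHz telephone-bandwidth models, copied from the list above.
# TELEPHONE_OPTS is an illustrative name, not something PocketSphinx defines.
TELEPHONE_OPTS="-nfft 256 -nfilt 31 -lowerf 200 -upperf 3500 -samprate 8000"

# DECODER is a placeholder: set it to whichever PocketSphinx program you run.
DECODER="${DECODER:-your-decoder-program}"

# Print the resulting command line instead of running it.
echo "$DECODER $TELEPHONE_OPTS"
```

Once the printed command looks right, replace the echo with the real invocation plus your usual model and input arguments.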

Previous snapshots can be found in this directory.

You can also obtain it from the SourceForge Subversion repository:

svn co https://svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinxbase
svn co https://svn.sourceforge.net/svnroot/cmusphinx/trunk/pocketsphinx

If you get it from Subversion, you will need to run the autogen.sh script in the top level of the source tree before compiling.
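
A small helper can make the autogen.sh step harder to forget. This is a sketch, not part of the distribution: run_autogen_if_needed is a made-up name, and it assumes that a fresh Subversion checkout has no generated configure script yet.

```shell
# Run autogen.sh only in trees that lack a generated configure script.
# run_autogen_if_needed is an illustrative helper, not part of the distribution.
run_autogen_if_needed() {
    dir="$1"
    if [ -f "$dir/configure" ]; then
        echo "$dir: configure already present, skipping autogen.sh"
    elif [ -f "$dir/autogen.sh" ]; then
        # Run in a subshell so the caller's working directory is unchanged.
        (cd "$dir" && ./autogen.sh)
    else
        echo "$dir: no autogen.sh found" >&2
        return 1
    fi
}

# Usage, from the directory containing both checkouts:
#   run_autogen_if_needed sphinxbase && run_autogen_if_needed pocketsphinx
```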

How do I compile it?

For Linux on StrongARM, the commands below assume you are building on an i686 Linux host and have built or downloaded the appropriate cross-toolchain for your target distribution (softfloat for OpenZaurus, standard for Familiar and Debian). You can build a cross-toolchain fairly easily with crosstool.

For Windows NT/XP/2000, Visual C++ 6.0 project and workspace files are provided for both SphinxBase and PocketSphinx. For Windows CE, Embedded Visual C++ 4.0 project and workspace files are available, though no demonstration program is provided, and the acoustic models may not be suitable for 11025Hz sampling rates.

You must build SphinxBase before compiling PocketSphinx. When installing it to a device, make sure you include the SphinxBase libraries (libsphinxfe, libsphinxad, libsphinxutil).

cd sphinxbase
./configure --enable-fixed --without-lapack --host=arm-softfloat-linux-gnu --build=i686-linux
make
cd ../pocketsphinx
./configure --with-sphinxbase=`pwd`/../sphinxbase --host=arm-softfloat-linux-gnu --build=i686-linux
make

You're not required to cross-compile from i686-linux! Any platform that supports GCC will do just fine. Just substitute the appropriate architecture string (e.g. powerpc-darwin) in the --build argument.

For uClinux on Blackfin:

cd sphinxbase
./configure --enable-fixed --without-lapack --host=bfin-uclinux --build=i686-linux
make
cd ../pocketsphinx
./configure --with-sphinxbase=`pwd`/../sphinxbase --host=bfin-uclinux --build=i686-linux
make

For GCC on Windows CE (FIXME: not sure where to get binaries for this anymore although it's available in the Debian and Ubuntu archives):

cd sphinxbase
./configure --enable-fixed --without-lapack --host=arm-wince-pe --build=i686-linux
make
cd ../pocketsphinx
./configure --with-sphinxbase=`pwd`/../sphinxbase --host=arm-wince-pe --build=i686-linux
make

Does it require any particular device?

It should support most embedded Unix-like operating systems. At the moment we only have StrongARM and Blackfin devices for testing. You may need to tweak support for the long long data type in GCC on your target platform: if you have it but it isn't fast (it's fast on PowerPC and x86, probably fast on 68k, not fast on ARM, I dunno about MIPS), you might want to disable it. For now you have to edit configure.in to do this: just remove or comment out the line that says "AC_CHECK_TYPES(long long)" and re-run autogen.sh.
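
The configure.in edit can also be scripted. The sketch below assumes the line reads exactly as quoted above; disable_long_long_check is a name made up for this example. It comments the line out with the m4 "dnl" prefix instead of deleting it, and keeps a backup copy of the original file.

```shell
# Comment out the long long check in a configure.in file.
# disable_long_long_check is an illustrative helper; it assumes the line
# reads exactly "AC_CHECK_TYPES(long long)", as quoted above.
disable_long_long_check() {
    file="$1"
    # "dnl" starts an m4 comment; the original is kept as "$file.bak".
    sed -i.bak 's/^\([[:space:]]*AC_CHECK_TYPES(long long)\)/dnl \1/' "$file"
}

# Usage, from the top of the source tree:
#   disable_long_long_check configure.in && ./autogen.sh
```

Commenting the line out makes the change easy to undo if the long long type turns out to be fast enough on your target.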

If your target device is big-endian, you may want to generate big-endian model definition, senone and LM dump files. We don't have a tool to byte-swap them, though you can use PocketSphinx to re-generate them in native byte order from model files simply by running it when they don't exist.

How do I make it fast?

The default settings are not enough to achieve sub-realtime performance on most tasks. Here are some command-line flags you should experiment with:

-dsratio
In most cases -dsratio 2 gives the best performance, though accuracy suffers a bit.
-topn
The default value is 4, the fastest value is 2, but accuracy can suffer a bit depending on your acoustic model.
-wbeam
The default value is 1e-48, but 1e-35 or even 1e-30 will make things faster.
-pbeam
The default value is 1e-48, but 1e-40 should be okay too.
-lpbeam
This beam is quite important for performance. Try 1e-30 or even 1e-20.
-lponlybeam
You can also make this one fairly narrow. Try 1e-20 or 1e-30.
-maxhmmpf
This puts a soft bound on the maximum number of HMM states which will be evaluated for any given frame. This really depends on your acoustic model and the beam settings above. For the "wsj0" model, 500 will give real-time performance at the cost of some accuracy, while 800 is an optimal but slower value.
-maxwpf
This puts a hard bound on the maximum number of word transitions which will be generated for any given frame. This depends somewhat on your language model. 10 is a reasonable number.
-mmap
Setting -mmap TRUE may help in low-memory situations.
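
As a starting point for experiments, the suggestions above can be combined into a single option string. This is a sketch only: the values are the ones mentioned in the list (with -maxhmmpf 500 being the "wsj0" figure), SPEED_OPTS and DECODER are names invented here, and the decoder program is left as a placeholder.

```shell
# Starting values taken from the suggestions above; tune them for your task.
# SPEED_OPTS is an illustrative name, not something PocketSphinx defines.
SPEED_OPTS="-dsratio 2 -topn 2 -wbeam 1e-35 -pbeam 1e-40"
SPEED_OPTS="$SPEED_OPTS -lpbeam 1e-30 -lponlybeam 1e-20"
SPEED_OPTS="$SPEED_OPTS -maxhmmpf 500 -maxwpf 10 -mmap TRUE"

# DECODER is a placeholder for whichever PocketSphinx program you run.
DECODER="${DECODER:-your-decoder-program}"

# Print the command line so it can be inspected before running it.
echo "$DECODER $SPEED_OPTS"
```

Changing one option at a time and measuring both speed and accuracy makes it easier to see which beam is responsible for any loss.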

If you are using your own acoustic models, you will want to use the "mdef_convert" tool (in src/programs) to convert the model definition file to binary format.

Is it only for handhelds?

No! Actually, it's a lot faster than Sphinx-II on desktops as well. For maximum performance on machines with a floating-point unit (i.e. most of them), don't pass the --enable-fixed option to autogen.sh or configure.

Why did you fork the code, then?

Consider it more of a "branch" than a fork. Most of the optimizations and other modifications will be merged back into Sphinx-II if they prove to be stable and useful. However, we also want the ability to remove things from the system that are not used or not useful on handhelds, and to apply optimizations that don't make sense for a general purpose system. Also, for ease of development, it's essential that PocketSphinx still be able to run on the desktop.

It doesn't work?!

It might not. No warranty! Ask me a question at the address below.