PocketSphinx is a version of the open-source Sphinx-II speech recognition system which runs on handheld and embedded devices. It currently supports embedded Linux and Windows CE (using the GNU arm-wince-pe cross-toolchain). It is released under the same permissive license as Sphinx itself.
PocketSphinx downloads have been moved to the CMU Sphinx SourceForge Site. The latest version is 0.3. To build it, you also need SphinxBase 0.2.
This distribution contains acoustic models for connected-digit recognition and wide-band (16kHz sampling rate) microphone dictation. You can also download telephone-bandwidth models separately. To use these with raw audio data you need the following extra command-line options:
-nfft 256 -nfilt 31 -lowerf 200 -upperf 3500 -samprate 8000
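As a rough sketch, the options above can be kept in one shell variable so they are passed consistently. The variable names and the `DECODER` placeholder are illustrative assumptions, not part of PocketSphinx; only the option values come from the text above. The check reflects the general rule that the upper filter edge cannot exceed half the sampling rate.

```shell
# Front-end options for 8 kHz telephone-bandwidth models (values from the text above).
SAMPRATE=8000
UPPERF=3500
TELEPHONE_OPTS="-nfft 256 -nfilt 31 -lowerf 200 -upperf $UPPERF -samprate $SAMPRATE"

# Sanity check: the upper filter edge must not exceed the Nyquist frequency.
if [ "$UPPERF" -gt $((SAMPRATE / 2)) ]; then
    echo "warning: upperf $UPPERF exceeds half of $SAMPRATE Hz" >&2
fi

# DECODER is a placeholder: set it to whichever PocketSphinx program you run.
# Only the command line is printed here; nothing is executed.
echo "${DECODER:-your-decoder} $TELEPHONE_OPTS"
```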
Previous snapshots can be found in this directory.
You can also obtain it from the SourceForge Subversion repository:
svn co https://svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinxbase
svn co https://svn.sourceforge.net/svnroot/cmusphinx/trunk/pocketsphinx
If you get it from Subversion, you will need to run the autogen.sh script in the top level of the source tree before compiling.
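A minimal sketch of that step, assuming a checkout has no configure script until autogen.sh has been run. The helper name `bootstrap_tree` is hypothetical and exists only for illustration.

```shell
# Run autogen.sh in a source tree only if no configure script exists yet.
bootstrap_tree() {
    dir=$1
    if [ -x "$dir/configure" ]; then
        echo "$dir: configure already present, skipping autogen.sh"
    else
        # Subshell so the caller's working directory is left unchanged.
        (cd "$dir" && ./autogen.sh)
    fi
}
```

With both trees checked out side by side, this could be used as `bootstrap_tree sphinxbase && bootstrap_tree pocketsphinx` before running the build commands.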
The instructions below for Linux on StrongARM assume that you are building on an i686 Linux host and that you have built or downloaded the appropriate cross-toolchain for your target distribution (softfloat for OpenZaurus, standard for Familiar and Debian). You can build a cross-toolchain fairly easily with crosstool.
For Windows NT/XP/2000, Visual C++ 6.0 project and workspace files are provided for both SphinxBase and PocketSphinx. For Windows CE, Embedded Visual C++ 4.0 project and workspace files are available, though no demonstration program is provided, and the acoustic models may not be suitable for 11025Hz sampling rates.
You must build SphinxBase before compiling PocketSphinx. When installing PocketSphinx on a device, make sure you include the SphinxBase libraries (libsphinxfe, libsphinxad, libsphinxutil).
cd sphinxbase
./configure --enable-fixed --without-lapack --host=arm-softfloat-linux-gnu --build=i686-linux
make
cd ../pocketsphinx
./configure --with-sphinxbase=`pwd`/../sphinxbase --host=arm-softfloat-linux-gnu --build=i686-linux
make
You're not required to cross-compile from i686-linux! Any platform that supports GCC will do just fine. Just substitute the appropriate architecture string (e.g. powerpc-darwin) in the --build argument.
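To illustrate the substitution, here is a small sketch that assembles the SphinxBase configure arguments from a host/build pair. The function name `cross_configure_args` is hypothetical; the flags are the ones already used in these instructions.

```shell
# Print the SphinxBase configure arguments for a given host/build pair.
cross_configure_args() {
    host=$1    # architecture string of the target device
    build=$2   # architecture string of the machine doing the compiling
    echo "--enable-fixed --without-lapack --host=$host --build=$build"
}

# Example: cross-compiling for StrongARM from a PowerPC Mac.
cross_configure_args arm-softfloat-linux-gnu powerpc-darwin
```

The printed arguments can then be passed to configure, for instance `./configure $(cross_configure_args arm-softfloat-linux-gnu powerpc-darwin)`.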
For uClinux on Blackfin:
cd sphinxbase
./configure --enable-fixed --without-lapack --host=bfin-uclinux --build=i686-linux
make
cd ../pocketsphinx
./configure --with-sphinxbase=`pwd`/../sphinxbase --host=bfin-uclinux --build=i686-linux
make
For GCC on Windows CE (FIXME: not sure where to get binaries for this anymore although it's available in the Debian and Ubuntu archives):
cd sphinxbase
./configure --enable-fixed --without-lapack --host=arm-wince-pe --build=i686-linux
make
cd ../pocketsphinx
./configure --with-sphinxbase=`pwd`/../sphinxbase --host=arm-wince-pe --build=i686-linux
make
PocketSphinx should support most embedded Unix-like operating systems. At the moment we only have StrongARM and Blackfin devices for testing. There is some potential tweaking to do with support for the long long data type in GCC on your target platform: if you have it but it isn't fast (it's fast on PowerPC and x86, probably fast on 68k, not fast on ARM, I dunno about MIPS), you might want to disable it. For now you have to edit configure.in to do this. Just remove or comment out the line that says "AC_CHECK_TYPES(long long)" and re-run autogen.sh.
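One way to make that edit without opening an editor is sketched below. It assumes the check sits on a line of its own, exactly as quoted above; the helper name `disable_long_long_check` is hypothetical.

```shell
# Comment out the long long check in an autoconf input file.
# "dnl" is the m4 comment marker; a .bak copy of the original file is kept.
disable_long_long_check() {
    sed -i.bak 's/^[[:space:]]*AC_CHECK_TYPES(long long)/dnl &/' "$1"
}
```

Typical use would be `disable_long_long_check configure.in` in the top level of the source tree, followed by re-running autogen.sh.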
If your target device is big-endian, you may want to generate big-endian model definition, senone and LM dump files. We don't have a tool to byte-swap them, but PocketSphinx will regenerate them in native byte order from the model files if you simply run it when they don't exist.
The default settings are not enough to achieve sub-realtime performance on most tasks. Here are some command-line flags you should experiment with:
If you are using your own acoustic models, you will want to use the "mdef_convert" tool (in src/programs) to convert the model definition file to binary format.
No! Actually, it's a lot faster than Sphinx-II on desktops as well. For maximum performance on machines with a floating-point unit (i.e. most of them), don't pass the --enable-fixed option to autogen.sh or configure.
Consider it more of a "branch" than a fork. Most of the optimizations and other modifications will be merged back into Sphinx-II if they prove to be stable and useful. However, we also want the ability to remove things from the system that are not used or not useful on handhelds, and to apply optimizations that don't make sense for a general-purpose system. Also, for ease of development, it's essential that PocketSphinx still be able to run on the desktop.
It might not. No warranty! Ask me a question at the address below.