SPHINX SPEECH GROUP
School of Computer Science
Carnegie Mellon University, Pittsburgh PA 15213-3891

THE SPHINX-II OCX DISTRIBUTION
System Documentation

by Paul C. Constantinides
9 July, 1999


1.0 What's provided in the download and how do I get it?

The files contained in the sphinxocx.zip distribution are:

NT Download

sphinxocx.zip -- contains the files listed above. The files are archived using zip (you can use a program like WinZip to extract them). Download this file and unzip it on your computer.

NOTE: The OCX released in this distribution works only under NT (not Windows 95 or 98). This is due to a differing threading model between the operating systems. We might offer a version for these other operating systems in the future.

In the future we plan to release other distributions that include:

ATTENTION LINUX USERS: We will also be releasing an RPM distribution for Sphinx-II. Check back for updates.

2.0 How do I add the Sphinx OCX to my application?

In Windows, you can use OCX controls in a variety of programming environments. We provide overview descriptions of how to use the Sphinx OCX control within three different frameworks: Visual Basic, Visual C++, and Visual J++. The interface of the control is fixed and consistent across these environments, and the general procedure for using the control carries over from one environment to the next. Integrating the Sphinx control into an existing application, or building a new application with the Sphinx control, is easy using the step-by-step descriptions provided below.

Registering the Control

To use the Sphinx control on your system, it must first be registered. This can be done by invoking the registration function; we've provided the regsvr32.exe program, which does this (this application is also distributed with Windows).

To register the Sphinx control on your system, drag and drop the sphinx.ocx file onto the regsvr32.exe program. This will add the appropriate entries in the system registry for the Sphinx control. It's important to keep in mind that the path information of the OCX file is added to the registry when the object is registered.

NOTE: If you move the sphinx.ocx file without re-registering, it will not work properly.

All Component Object Model (COM) objects, such as OCX or ActiveX controls, support the IUnknown interface, which permits the objects to export their specific signature in a well-defined manner. As part of this structure, COM objects are also self-registering. In other words, these objects implement the DllRegisterServer and DllUnregisterServer functions, which are called by the regsvr32.exe program. These functions create or update the necessary system registry entries. Further details of the IUnknown interface are well documented in the Microsoft support documentation.
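Because regsvr32.exe is an ordinary command-line program, registration can also be run from a command prompt or a setup script instead of by drag and drop. The following is a minimal C++ sketch that only builds the command line. The function name buildRegsvr32Command and the path used in the usage note are illustrative assumptions, not part of this distribution. The /u switch is the standard regsvr32 option for unregistering.

```cpp
#include <string>

// Builds the command line that registers (or unregisters) an OCX file
// with regsvr32. The path is quoted so that directories containing
// spaces are passed as a single argument.
// NOTE: buildRegsvr32Command is a hypothetical helper, not part of Sphinx.
std::string buildRegsvr32Command(const std::string& ocxPath,
                                 bool unregister = false) {
    std::string command = "regsvr32 ";
    if (unregister) {
        command += "/u ";  // /u makes regsvr32 call DllUnregisterServer
    }
    command += "\"" + ocxPath + "\"";
    return command;
}
```

For example, buildRegsvr32Command("c:\\sphinx\\demo\\sphinx.ocx") yields a string that can be typed at a command prompt or passed to std::system. Using the full path is deliberate, since the registry records where the file lives.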

Below, we describe the process for building a new application that integrates the Sphinx control, for several different development environments. The goal is to show the necessary steps for integrating the control, and to point out how the speech recognition results are provided from the Sphinx control.

2.1 Visual Basic

Visual Basic is the easiest environment for creating an application that uses the Sphinx control. The major steps are:
  1. adding the component to your project
  2. adding the component to the form of your application
  3. writing the code to use the component in your application
The completed project that's described in this section is also available for download: vbdemo.zip

Create a VB Application that Includes Sphinx

  1. To begin, start up Microsoft Visual Basic and choose Standard EXE from the New tab in the dialog that pops up. If no dialog pops up at the start, choose New Project from the File menu to create a new project.
  2. Under the Project menu choose Components… (You can also get here by right-clicking the toolbar on the side of the screen and choosing Components… from the menu, or by pressing Ctrl-T.)
  3. Scroll through the list of components that pops up in the Controls tab of the dialog box, and check the box next to Sphinx ActiveX Control module (or something along those lines). If it's not there, be sure you successfully registered the sphinx.ocx file (see Registering the Control under Section 2.0).
  4. Once you've checked the box next to Sphinx, click OK. You should now have a Sphinx icon on the toolbar. Click this icon, then draw a square on your form. It will add an instance of the Sphinx control to your application (with the same icon shown below). You're now ready to use Sphinx in your application.

     

Using Sphinx in your Application

Now that you have the Sphinx control included in your application, you can use the control to do various things. Here, we'll explain how to get the result of a speech recognition into your application.
  1. By double-clicking on the form you've created (on an open area, not over a button or control), the Form_Load subroutine will be automatically generated. In this subroutine, add a call to the Sphinx initialization function by typing:

    sphinx1.init

    Now, this function will be called whenever the application is loaded.

  2. Return to the form view, and double click on the Sphinx icon that you drew on the form. A handler function for the UtteranceResult event will be generated. The parameter Result contains the result of the decoding.
  3. For this example project, you can pop up a message box with this text by adding the following line to the generated handler subroutine (named after the control, e.g. Sphinx1_UtteranceResult):

    MsgBox Result

  4. The last thing to do before running your application is to be sure that the control's properties are set correctly. To do this, go to the form view, and select the Sphinx icon on your application's form. There should be a properties window on the screen that displays the values of the properties of the control (if not, you can get to it by selecting Properties Window… from the View menu or by pressing F4).
  5. Check that the ArgFile property correctly points to the .arg file provided in this distribution. We'll discuss how to modify and customize this file in Section 5.0.
    Once this file is customized, your application will be ready to run.
To find out more about the other events, methods, and properties that the Sphinx OCX exports, please refer to Section 3.0.

2.2 Visual C++

You can also make a speech application using C++ in the Microsoft Visual C++ environment. The steps of this process are almost identical to those in Visual Basic. The completed project that's described in this section is also available for download: vcdemo.zip

Create a Visual C++ Project That Includes Sphinx

  1. To begin, load Visual C++ and create a new project by choosing New from the File menu and selecting the Projects tab.
  2. Select MFC AppWizard (exe), name your project, and specify the path you want.
  3. In this sample, we'll make a dialog-based application, so select the Dialog based radio button.
  4. In the next set of options, be sure that the Automation checkbox is selected and that your dialog is properly titled. The other defaults should be acceptable.
  5. The rest of the defaults should be okay for this sample, so click Finish. Developer Studio will generate some application files for you automatically. The code that's been generated is the skeleton code for the dialog application.
  6. The next step is to add the Sphinx control to your program. To do this, select Add to Project from the Project menu, then choose Components and Controls.
  7. Double click the Registered ActiveX Controls folder to find the Sphinx control.
  8. Select the Sphinx control and click Insert. Select OK when prompted whether you want to insert the control.

  9. In the next window, be sure that the CSphinx class is selected, and click OK to continue. Close the Components and Controls Gallery window to continue. A new pair of files has been automatically generated and added to your project. Under the class view, you will see the CSphinx class that implements the calls to the Sphinx OCX control. Sphinx is now added to your project.
  10. To add Sphinx to your application, you need to select the newly added Sphinx icon from the toolbar and drag a box on your form. This may just show as a white area, but that's okay. This is where the Sphinx icon will show up on your application when it runs.
  11. Now that the Sphinx control has been added to the form of your dialog box, you can add a member variable for that control. To do this, load the Class Wizard from the View menu (or use Ctrl-W). Select the Member Variables tab. Under the class for the dialog box (suffixed by Dlg), there should be an entry with a control ID that contains SPHINX (most likely, it will be something like IDC_SPHINXCTRL1). Select this entry and click Add Variable. Name your variable something appropriate (like m_sphinx); be sure that the Category for the item is Control, and that the type is CSphinx. Click OK to add the variable, then OK to dismiss the MFC Class Wizard. In your Dlg class, you now have a member variable for the Sphinx control.

Use the Sphinx Control In Your Application

Now we will discuss how to get your application to use Sphinx to trap the UtteranceResult event, and use the output from the speech recognition.
  1. First, you need to add the necessary initialization code for the Sphinx object. In your Dlg class, there will be an initialization function called OnInitDialog. Near the end of the function, there will be a comment indicating where to put your initialization code, e.g.:

    // TODO: Add extra initialization here

    In this section, you want to set the important properties that Sphinx needs to have set before it can start listening. You can also initialize the decoder to load the models and start listening. This involves writing a few lines of code (substitute the correct paths for your files in place of the ones given):

    /* This line points Sphinx to the absolute path of
     * the argument file for the decoder. Creating and
     * modifying this file is discussed in
     * Section 5.0.
     */

    m_sphinx.SetArgFile("c:\\sphinx\\demo\\sphinx.arg");

    /* This line points Sphinx to a file it will use
     * to store the log from the decoder.
     */

    m_sphinx.SetLogFile("c:\\sphinx\\demo\\sphinxocx.log");

    /* This function tells Sphinx to load the decoder
     * models and start the decoding thread for
     * speech recognition. The details of this
     * function, and other related functions are
     * discussed in Section 3.0.
     */

    m_sphinx.StartListening();

  2. Assuming that these properties are correctly set, the next step is to add a handler to trap the UtteranceResult event. To do this, load the Class Wizard again (Ctrl-W), and select the Message Maps tab. Under the Dlg class, select the object ID corresponding to Sphinx (e.g. IDC_SPHINXCTRL1). Select the UtteranceResult message, and click Add Function. Name the function appropriately (the default is good), and click OK. In the Class Wizard, click Edit Code to go to the function. This function will be called whenever the Sphinx control fires an UtteranceResult message.

  3. In this handler function, you can do whatever you want with the Result string that's passed as a parameter. This string is the top-scoring recognition result for the decoded speech. For this application, you can launch a message box displaying the result by adding the line:
    AfxMessageBox(Result);
  4. Now, you're ready to save your project and compile. Save your project (under the File menu). Under the Build menu, select the Build… option.
  5. Before running your application, refer to Section 5.0 to make sure that your argument file is correctly configured. Your project won't work correctly until this file is configured.
To find out more about the other events, methods, and properties that the Sphinx OCX exports, please refer to Section 3.0.

NOTE: An important difference between Dialog apps and MDI or SDI apps is that in the latter, the CSphinx::Create function needs to be explicitly called in the initialization code of the object containing the Sphinx object. In Dialog apps, this function does not need to be called, as it is replaced by having the object on the form.

2.3 Visual J++

Coming soon...

3.0 How do I use the Sphinx OCX control?

In this section we deal with the details of the Sphinx control interface. It should be clear, from the examples above, how to get Sphinx into your application. Now, we will explain more precisely how the interface works, and point out uses for the exported handles.

NOTE: If you are interested in using Sphinx-II in batch mode, you can do this by specifying a control file in the argfile (see the Sphinx-II FBS8 User Guide for more details on this). When the argfile is loaded (either with a call to Init or a call to StartListening), the batch process will begin.

Events

The Sphinx OCX fires an event when certain things happen inside it. You can create handler code that will execute whenever these events are fired.

NewUtterance -- fired when the user has started speaking. This can be useful if you want to have an indicator letting the user know the system has detected that they've started speaking.
EndUtterance -- fired when the user has finished speaking.
StartListening -- fired when Sphinx is ready to start listening. This can be useful if you want an indicator displaying the state of the Sphinx control.
StopListening -- fired when Sphinx has stopped listening.
UtteranceError -- fired when something's wrong. This shouldn't ever fire, so it shouldn't ever be useful.
UtteranceResult -- fired when Sphinx has finished processing something the user said. It returns the text result in the Result parameter.
UtterancePartialResult -- fired at intervals set by the PartialResultInverval property, only while Sphinx is decoding. It returns the text of the best-scoring result so far in the Result parameter. You can use this event to keep a running display of what Sphinx thinks the user is saying.

Methods

The Sphinx OCX has various methods (functions) you can call to get it to do things.

Init -- Tell Sphinx to load all of the model files for the decoder. This is a blocking call and takes about a minute to load all of the files.
StartListening -- Tell it to start listening to the user's voice. The first time you call this, it will take about a minute to load all the models (if Init has not already been called).
StopListening -- Tell it to stop listening to the user's voice.
ToggleListening -- If it's listening, this is the same as StopListening; if it's not, it's the same as StartListening.

There are other methods, but they aren't useful for simple applications, so you can safely ignore them. (Other method descriptions coming soon.)
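The relationship between these methods and the StartListening and StopListening events can be illustrated with a small stand-in class. This is a sketch only: MockSphinx is a hypothetical class written for this document, not the real control, and it assumes that calling StartListening while already listening (or StopListening while stopped) does nothing, which the descriptions above do not confirm.

```cpp
#include <string>
#include <vector>

// A stand-in for the control, NOT the real OCX. It only mimics the
// documented relationship between the methods and the events.
class MockSphinx {
public:
    // Pretends to load the decoder models (the slow, blocking step).
    void Init() { modelsLoaded_ = true; }

    // Loads the models on first use if Init was never called, then
    // starts listening and records a StartListening event.
    void StartListening() {
        if (!modelsLoaded_) Init();
        if (listening_) return;  // assumed no-op when already listening
        listening_ = true;
        events_.push_back("StartListening");
    }

    // Stops listening and records a StopListening event.
    void StopListening() {
        if (!listening_) return;  // assumed no-op when already stopped
        listening_ = false;
        events_.push_back("StopListening");
    }

    // Same as StopListening when listening, StartListening otherwise.
    void ToggleListening() {
        if (listening_) StopListening(); else StartListening();
    }

    bool IsListening() const { return listening_; }
    bool ModelsLoaded() const { return modelsLoaded_; }
    const std::vector<std::string>& Events() const { return events_; }

private:
    bool modelsLoaded_ = false;
    bool listening_ = false;
    std::vector<std::string> events_;
};
```

Calling ToggleListening twice on a fresh MockSphinx loads the models once, then records one StartListening event followed by one StopListening event. In a real application the slow model load is the reason to call Init early, for example while the form is loading.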

Properties

ArgFile -- this should be set to the path and file name of the argument file containing the configuration for Sphinx. For more details see Section 5.0.
LogFile -- this specifies the path of the file where the Sphinx OCX will output various log information. It contains a lot of information that's not generally useful, unless you really know about Sphinx. It is a good source of information if your application died because of Sphinx. See Section 6.0 for more details.
LogDirectory -- if this is specified, Sphinx OCX will output RAW audio files of everything you say while you use the program to this directory. This is useful for collecting data, but eats up your disk space quickly, so don't use it unless you have a good reason to.
IgnoreEmptyUtterance -- if this is True (which it is by default), you won't get an UtteranceResult message if Sphinx didn't get anything useful out of your utterance. This is usually what you want, because otherwise things like breaths, door slams, or other short background noise can show up as empty utterances.
PartialResultInverval -- specifies the firing interval (in ms) for the UtterancePartialResult event during the process of decoding.
SamplingRate -- the sampling rate of the audio input to the decoder.
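To get a feel for how quickly LogDirectory can fill a disk, the size of uncompressed audio can be estimated from the sampling rate. The sketch below is an estimate under stated assumptions: the function name rawAudioBytes is hypothetical, and the default of 2 bytes per sample assumes 16-bit mono audio, which this document does not state.

```cpp
#include <cstdint>

// Estimates the bytes of uncompressed audio produced by `seconds` of
// recording. bytesPerSample = 2 ASSUMES 16-bit mono samples; adjust it
// if the RAW files written by the control use a different format.
std::uint64_t rawAudioBytes(std::uint32_t samplingRateHz,
                            std::uint32_t seconds,
                            std::uint32_t bytesPerSample = 2) {
    return static_cast<std::uint64_t>(samplingRateHz) * seconds * bytesPerSample;
}
```

With an assumed SamplingRate of 16000, rawAudioBytes(16000, 60) gives 1,920,000 bytes for one minute of audio, so an hour of logged speech would need roughly 115 MB.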

4.0 How do I build decoder models for my specific task?

In order to get Sphinx to work for your particular task, you need to build a custom dictionary and language model. Sphinx uses several different knowledge sources to perform speech recognition. The version of Sphinx in this distribution uses the following information sources:

For specific applications, only the last two knowledge sources (the dictionary and the language model) need to be customized. The others can be customized to an application only when a large amount of data has been collected. To build dictionary and language model files for a specific task (containing the words and phrases you want the system to recognize), follow these steps:
  1. Using any plain text editor, such as Emacs or Notepad, create a list of sentences you would like your program to recognize. For example, if you were creating a system to process taqueria orders, your sentence list might look like:

    I'd like a burrito supreme.
    Give me a seven layer burrito.
    I want a double decker taco
    etc…

  2. Save your file, and go to the Sphinx Knowledge Base Tool (if you have trouble accessing or using this page, please contact Alex Rudnicky).
  3. Using the Browse… button for choosing the Sentence corpus file, select the file you just created.
  4. Click the COMPILE KNOWLEDGE BASE button. This could take a couple of minutes, depending on the size of your corpus.
  5. Once it's done, you will get a screen titled Sphinx knowledge base, with five bulleted links. Download the dictionary and language model files. You can use the default names or give them your own names; just be sure to keep the .dic and .lm extensions so that you know which is which later. It's useful to save them in the same location as the other models. Also, remember what the files are named, as we'll need to refer to them in the next section.

5.0 How do I configure the argument file?

The directory where you unzipped this distribution contains a file called sphinx.arg. The Sphinx-II decoder uses this file to access many of its configurable settings, including input flags telling the decoder where to look for the model files it's expecting. This file is loaded when the Init or StartListening methods are called. To use the Sphinx-II decoder, the parameters in this file need to be properly set; this can be done either using the Custom Property Pages we've implemented, or by hand.

Using the Custom Property Page

In the Sphinx OCX, we have implemented a set of property pages in the (Custom) dialog box to facilitate configuration of the decoder. You can use these pages to modify some of the decoder properties and argfile parameters.

NOTE: This dialog box loads (and saves) some properties from (to) the argument file, so before loading the dialog box, be sure that your ArgFile property is correctly set.

To access this dialog box from the Visual Basic development environment, right-click on the Sphinx icon on the form of your application and select Properties… There are two tabs in the dialog box that comes up. The first tab, labeled General, contains the properties of the Sphinx control. The second tab, labeled ArgFile, contains properties loaded from the argument file.

The General tab allows you to set the ArgFile, LogFile, SamplingRate and IgnoreEmptyUtterance Sphinx OCX properties (see Section 3.0 for the details on these).
The ArgFile tab allows you to set some of the variables in the argfile, namely the variables that point to files used by the decoder. These files should be uniquely identified by their extension; brief descriptions are also given below.

Manually Editing the Argfile

  1. Open the argfile (sphinx.arg) with a text editor (such as Emacs or Notepad) from the directory where you unzipped this distribution. This file should be the same one that is specified in the ArgFile property of the instance of the Sphinx OCX in your application.
  2. The file contains settings for the Sphinx recognizer. Most of these you don't need to modify or worry about, but some you do. These are at the top of the file, in the INPUT MODELS section delimited with #s. This section also contains brief descriptions for each argfile entry.

    ###########################################################
    ## INPUT MODELS
    ###########################################################

    # DARPA format bigram/trigram backoff LM file
    -lmfn c:\sphinx\demo\demo.lm

    # Main pronunciation dictionary file
    -dictfn c:\sphinx\demo\demo.dic

    # Phone and map files with senone mapping information
    # for the given dictionary and acoustic model
    -phnfn c:\sphinx\demo\h1c1-94.phone
    -mapfn c:\sphinx\demo\h1c1-94.map

    # Directory containing precompiled binary versions
    # of LM files
    # -kbdumpdir c:\sphinx\demo\

    # Directory with Sphinx-II semi-continuous HMM acoustic
    # models and codebook
    -hmmdir c:\sphinx\demo\10000-g
    -hmmdirlist c:\sphinx\demo\10000-g
    -cbdir c:\sphinx\demo\10000-g

    # 8-bit senone model file created from 32-bit HMM model
    # 8bsen should be set to TRUE if 8-bit senones are used
    -sendumpfn c:\sphinx\demo\g.sen
    -8bsen TRUE

  3. To customize this file, replace the c:\sphinx\demo\*.* path names with the actual paths of the files you have (either from the download, or generated from the web page in Section 4.0). Relative paths often work, but can sometimes lead to problems if your program changes working directories.
  4. Save the file. To find out more about what these parameters are, and how to use the other parameters in the argfile, please refer to the Sphinx-II FBS8 User Guide.
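When an application needs to inspect the argfile itself, the layout shown above is simple enough to read with a few lines of C++. The sketch below assumes only what the sample shows: one "-flag value" pair per line, with lines beginning with # treated as comments. The function name parseArgFile is hypothetical, and the decoder's own parser may accept syntax this sketch does not (see the Sphinx-II FBS8 User Guide).

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Reads "-flag value" lines into a map keyed by flag name.
// Blank lines, '#' comment lines, and lines that do not start with '-'
// are skipped. ASSUMES the layout shown in the sample argfile.
std::map<std::string, std::string> parseArgFile(std::istream& in) {
    std::map<std::string, std::string> args;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string flag;
        if (!(fields >> flag)) continue;   // blank line
        if (flag[0] != '-') continue;      // comment or unrecognized line
        std::string value;
        std::getline(fields >> std::ws, value);  // rest of the line
        // Trim trailing spaces, tabs, and a carriage return if present.
        const std::string::size_type end = value.find_last_not_of(" \t\r");
        value = (end == std::string::npos) ? "" : value.substr(0, end + 1);
        args[flag] = value;
    }
    return args;
}
```

Opening sphinx.arg with std::ifstream and passing the stream to parseArgFile would give, for the sample above, an entry such as "-lmfn" mapped to the language model path, while the commented-out -kbdumpdir line is ignored.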

6.0 Troubleshooting

Listed below are some common problems that you may encounter while trying to get Sphinx to work.

6.1 My program dies.

Under certain conditions, if the Sphinx control encounters an error, it can kill its container's process. If you notice that your program dies when it runs (or when the Sphinx.Init function is called), the best source of information is the logfile. This file saves a lot of information, but at the very end it probably has an error message indicating that it couldn't open one of the model files, or found an error in one of these files. Examine this error statement, and compare it with what's listed in your argfile. Make sure that all the file paths in the argfile are set correctly. Here's an example of what you might see in the last two lines of the log file if the argfile parameter -phnfn points to a file that doesn't exist:

C:\users\rkm\fbs8-6-97\src\libfbs\kb.c(309): Reading phone file [.\lm\faa.phone]
C:\users\rkm\fbs8-6-97\src\libfbs\phone.c(71): fopen(.\lm\faa.phone,r) failed
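Since a bad path can take the whole application down, it may be worth checking the argfile entries before Init or StartListening is called. The following is a minimal sketch, not part of the distribution: findUnreadableFiles is a hypothetical helper that takes flag and value pairs (however you obtained them) and reports the flags whose file cannot be opened. It assumes, based only on the sample argfile in Section 5.0, that flags ending in "fn" are the ones that name files.

```cpp
#include <fstream>
#include <map>
#include <string>
#include <vector>

// Returns the flags whose value names a file that cannot be opened.
// Only flags ending in "fn" are checked, on the ASSUMPTION (taken from
// the sample argfile) that those are the ones that name files.
std::vector<std::string> findUnreadableFiles(
        const std::map<std::string, std::string>& args) {
    std::vector<std::string> unreadable;
    for (const auto& entry : args) {
        const std::string& flag = entry.first;
        const bool namesFile = flag.size() >= 2 &&
                               flag.compare(flag.size() - 2, 2, "fn") == 0;
        if (!namesFile) continue;
        // Try to open the file for reading; failure means a bad path.
        std::ifstream probe(entry.second.c_str(), std::ios::binary);
        if (!probe.is_open()) unreadable.push_back(flag);
    }
    return unreadable;
}
```

If the returned list is not empty, the application can show the offending flags to the user and skip the call to Init, rather than letting the control fail inside the decoder.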

6.2 I'm talking to my program, but nothing happens.

There are several possible causes of this problem. Here are some things to investigate:

6.3 Redraw doesn't work for my form since I added the Sphinx OCX to it.

If you're using Visual C++, you may have observed that some weird behavior occurs after you've added the Sphinx control to your form. This might show up as incorrect redraw of your form, or as what seems like a lockup of the environment. We're trying to figure out why this happens, but in the meantime, pressing Ctrl-Break and closing the dialog editing window when it happens seems to remedy it.

7.0 Tips and Tricks

7.1 Getting Better Recognition

Depending on the type of system you're trying to build, there are a number of different things you can do to increase your recognition accuracy.
  1. Use a good microphone. There's not too much the recognizer can do with unclean acoustic input. You'll notice the highest accuracy using a close-talking, noise-cancelling (probably head-mounted) microphone with little or no background noise.

  2. Improve your language model by extending your corpus. You might initially build a language model with sentences you think someone might say using your system. In reality, for any given task, there are a confounding number of ways to say the same thing. To start tackling this problem, build your corpus with input from other people. This could involve asking people to give you things they would say in a given scenario. This can be augmented with transcriptions of the audio from real sessions. (Transcriptions are also necessary to do acoustic retraining, but that's outside of the scope of this document. We have some guidelines on transcription that will be forthcoming.)

  3. Decrease confusability by making compound words. If your system doesn't rely on allowing people to say things any which way they choose, but rather has a fixed set of commands, you can decrease the confusability between the commands by combining commands, or parts of commands, into single compound words as follows:

    FOLLOW STANDARD INSTRUMENT DEPARTURE could be converted to FOLLOW_STANDARD_INSTRUMENT_DEPARTURE
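For a long list of fixed commands, the conversion can be automated. The sketch below performs only the text transformation shown above; the function name makeCompoundWord is a hypothetical helper and not part of the distribution.

```cpp
#include <sstream>
#include <string>

// Joins the whitespace-separated words of a command with underscores,
// so that a multi-word command becomes a single compound word.
std::string makeCompoundWord(const std::string& command) {
    std::istringstream words(command);
    std::string word;
    std::string compound;
    while (words >> word) {
        if (!compound.empty()) compound += '_';
        compound += word;
    }
    return compound;
}
```

The converted commands would then replace the original phrases in the sentence corpus before the knowledge base is compiled (Section 4.0), so that the dictionary and language model both contain the compound form.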