SPHINX SPEECH GROUP
THE SPHINX-II OCX DISTRIBUTION
The files contained in the sphinxocx.zip distribution are:
- sphinx.ocx -- the OCX control that contains the Sphinx-II speech recognition engine
- regsvr32.exe -- the program used to register the OCX control on your computer
- sphinx.arg -- the argument file for the Sphinx-II decoder
- h1c1-94.map -- a file with senone mapping information
- h1c1-94.phone -- another file with senone mapping information
- 10000-g.sen -- a file containing clustered acoustic models for the decoder
- 10000-g/ -- a directory containing a set of acoustic model files
- license.txt -- the conditions you are accepting by downloading and using this code
NT Download
sphinxocx.zip -- contains the files listed above. The files are archived using zip (you can use a program like WinZip to extract them). Download this file and unzip it on your computer.
NOTE: The OCX released in this distribution works only under NT (not Windows 95 or 98). This is due to a differing threading model between the operating systems. We might offer a version for these other operating systems in the future.
In the future we plan to release other distributions that include:
- source code
- model building code (as used from the web in Section 4.0)
- general lm + dict files (trained on large corpora across several different tasks)
ATTENTION LINUX USERS: We will also be releasing an RPM distribution for Sphinx-II. Check back for updates.
In Windows, you can use OCX controls under a variety of different programming environments. We provide overview descriptions of how to use the Sphinx OCX control within three different frameworks:
- Visual Basic
- Visual C++
- Visual J++
The interface of the control is fixed and consistent across these environments, and the general procedure for using the control carries over from one environment to the next. Integrating the Sphinx control into your already existing application, or building a new application with the Sphinx control, is easy using the step-by-step descriptions provided below.
Registering the Control
To use the Sphinx control on your system, it must first be registered. This can be done by invoking the registration function; we've provided the regsvr32.exe program, which does this (this application is also distributed with Windows).
To register the Sphinx control on your system, drag and drop the sphinx.ocx file onto the regsvr32.exe program. This will add the appropriate entries in the system registry for the Sphinx control. It's important to keep in mind that the path information of the OCX file is added to the registry when the object is registered.
NOTE: If you move the sphinx.ocx file without re-registering, it will not work properly.
All Component Object Model (COM) objects, such as OCX or ActiveX controls, support the IUnknown interface, which permits the objects to export their specific signature in a well-defined manner. As part of this structure, COM objects are also self-registering. In other words, these objects implement DllRegisterServer and DllUnregisterServer, which are called by the regsvr32.exe program. These functions create or update the necessary system registry entries. Further details of the IUnknown interface are well documented in the Microsoft support documentation.
Below, we describe the process for building a new application that integrates the Sphinx control, for several different development environments. The goal is to show the necessary steps for integrating the control, and to point out how the speech recognition results are provided from the Sphinx control.
2.1 Visual Basic
Visual Basic is the easiest environment for creating an application that uses the Sphinx control. The major steps are:
- adding the component to your project
- adding the component to the form of your application
- writing the code to use the component in your application
The completed project that's described in this section is also available for download: vbdemo.zip
Create a VB Application that Includes Sphinx
- To begin, start up Microsoft Visual Basic and choose Standard EXE from the New tab in the dialog that pops up. If no dialog pops up at the start, choose New Project from the File menu to create a new project.
- Under the Project menu choose Components… (You can also get here by right-clicking the toolbar on the side of the screen and choosing Components… from the menu, or by typing CTRL-T.)
- Scroll through the list of components that pops up in the Controls tab of the dialog box, and check the box next to Sphinx ActiveX Control module (or something along those lines). If it's not there, be sure you successfully registered the sphinx.ocx file (see Registering the Control under Section 2.0).
- Once you've checked the box next to Sphinx, click OK. You should now have a Sphinx icon on the toolbar. Click this icon, then draw a square on your form. It will add an instance of the Sphinx control to your application (with the same icon shown below). You're now ready to use Sphinx in your application.
 
Using Sphinx in your Application
Now that you have the Sphinx control included in your application, you can use the control to do various things. Here, we'll explain how to get the result of a speech recognition into your application.
To find out more about the other events, methods, and properties that the Sphinx OCX exports, please refer to Section 3.0.
- By double clicking on the form you've created (on an open area, not over a button or control), the Form_Load subroutine will be automatically generated. In this subroutine, add a call to the Sphinx initialization function by typing:
sphinx1.init
Now, this function will be called whenever the application is loaded.
- Return to the form view, and double click on the Sphinx icon that you drew on the form. A handler function for the UtteranceResult event will be generated. The parameter Result contains the result of the decoding.
- For this example project, you can pop up a message box with this text by adding the following line to the Sphinx_UtteranceResult subroutine:
MsgBox Result
- The last thing to do before running your application is to be sure that the control's properties are set correctly. To do this, go to the form view, and select the Sphinx icon on your application's form. There should be a properties window on the screen that displays the values of the properties of the control (if not, you can get to it by selecting Properties Window… from the View Menu or by typing F4).
- Check that the argfile property correctly points to the .arg file provided in this distribution. We'll discuss how to modify and customize this file in Section 5.0.
Once this file is customized, your application will be ready to run.
2.2 Visual C++
You can also make a speech application using C++ in the Microsoft Visual C++ environment. The steps of this process are almost identical to those in Visual Basic. The completed project that's described in this section is also available for download: vcdemo.zip
Create a Visual C++ Project That Includes Sphinx
- To begin, load Visual C++ and create a new project by choosing New from the File menu and selecting the Projects tab.
- Select MFC AppWizard (exe), then name your project and specify the path you want.
- In this sample, we'll make a dialog based application, so select the Dialog based radio button.
- In the next set of options, be sure that the Automation checkbox is selected and that your dialog is properly titled. The other defaults should be acceptable.
- The rest of the defaults should be okay for this sample, so click Finish. Developer Studio will generate some application files for you automatically. The code that's been generated is the skeleton code for the dialog application.
- The next step is to add the Sphinx control to your program. To do this select Add to Project from the Project menu. Choose Components and Controls.
- Double click the Registered ActiveX Controls folder to find the Sphinx control.
- Select the Sphinx control and click Insert. Select OK when prompted whether you want to insert the control.
- In the next window, be sure the CSphinx class is selected, and click OK to continue. Close the Components and Controls Gallery window to continue. A new pair of files has been automatically generated and added to your project. Under the class view, you will see the CSphinx class that implements the calls to the Sphinx OCX control. Sphinx is now added to your project.
- To add Sphinx to your application, you need to select the newly added Sphinx icon from the toolbar and drag a box on your form. This may just show as a white area, but that's okay. This is where the Sphinx icon will show up on your application when it runs.
- Now that the Sphinx control has been added to the form of your dialog box, you can add a member variable for that control. To do this, load the Class Wizard from the View menu (or use Ctrl-W). Select the Member Variables tab. Under the class for the dialog box (suffixed by Dlg), there should be an entry with a control ID that contains SPHINX (most likely, it will be something like IDC_SPHINXCTRL1). Select this entry and click Add Variable. Name your variable something appropriate (like m_sphinx); be sure that the Category for the item is Control, and that the type is CSphinx. Click OK to add the variable, then OK to dismiss the MFC Class Wizard. In your Dlg class, you now have a member variable for the Sphinx control.
Use the Sphinx Control In Your Application
Now we will discuss how to get your application to use Sphinx to trap the UtteranceResult event, and use the output from the speech recognition.
To find out more about the other events, methods, and properties that the Sphinx OCX exports, please refer to Section 3.0.
- First, you need to add the necessary initialization code for the Sphinx object. In your Dlg class, there will be an initialization function called OnInitDialog. Near the end of the function, there will be a comment indicating where to put your initialization code, e.g.:
// TODO: Add extra initialization here
In this section, you want to set the important properties that Sphinx needs to have set before it can start listening. You can also initialize the decoder to load the models and start listening. This involves writing a few lines of code (substitute the path given with the correct one for your files):
/* This line points Sphinx to the absolute path of
 * the argument file for the decoder. Creating and
 * modifying this file is discussed in
 * Section 5.0.
 */
m_sphinx.SetArgFile("c:\\sphinx\\demo\\sphinx.arg");
/* This line points Sphinx to a file it will use
 * to store the log from the decoder.
 */
m_sphinx.SetLogFile("c:\\sphinx\\demo\\sphinxocx.log");
/* This function tells Sphinx to load the decoder
 * models and start the decoding thread for
 * speech recognition. The details of this
 * function, and other related functions, are
 * discussed in Section 3.0.
 */
m_sphinx.StartListening();
- Assuming that these properties are correctly set, the next step is to add a handler to trap the UtteranceResult event. To do this, load the Class Wizard again (Ctrl-W), and select the Message Maps tab. Under the Dlg class, select the object ID corresponding to Sphinx (e.g. IDC_SPHINXCTRL1). Select the UtteranceResult message, and click Add Function. Name the function appropriately (the default is good), and click OK. In the Class Wizard click Edit Code to go to the function. This function will be called whenever the Sphinx control fires an UtteranceResult message.
- In this handler function, you can do whatever you want with the Result string that's passed as a parameter. This string is the top scoring recognition result for the decoded speech. For this application, you can launch a message box displaying the result by adding the line:
AfxMessageBox(Result);
- Now, you're ready to save your project and compile. Save your project (under the File menu). Under the Build menu, select the Build… option.
- Before running your application, refer to Section 5.0 to make sure that your argument file is correctly configured. Your project won't work correctly until this file is configured.
NOTE: An important difference between Dialog apps and MDI or SDI apps is that in the latter, the CSphinx::Create function needs to be explicitly called in the initialization code of the object containing the Sphinx object. In Dialog apps, this function does not need to be called, as it is replaced by having the object on the form.
2.3 Visual J++
Coming soon...
In this section we deal with the details of the Sphinx control interface. It should be clear, from the examples above, how to get Sphinx into your application. Now, we will explain more precisely how the interface works, and point out uses for the exported handles.
NOTE: If you are interested in using Sphinx-II in batch mode, you can do this by specifying a control file in the argfile (see the Sphinx-II FBS8 User Guide for more details on this). When the argfile is loaded (either with a call to Init or a call to StartListening), the batch process will begin.
Events
The Sphinx OCX fires an event when certain things happen inside it. You can create handler code that will execute whenever these events are fired.
NewUtterance -- fired when the user has started speaking. This can be useful if you want to have an indicator letting the user know the system has detected that they've started speaking.
EndUtterance -- fired when the user has finished speaking.
StartListening -- fired when Sphinx is ready to start listening. This can be useful if you want an indicator displaying the state of the Sphinx control.
StopListening -- fired when Sphinx has stopped listening.
UtteranceError -- fired when something's wrong. This shouldn't ever fire, so it shouldn't ever be useful.
UtteranceResult -- fired when Sphinx has finished processing something the user said. It returns the text result in the Result parameter.
UtterancePartialResult -- fired at intervals set by the PartialResultInverval property, only while Sphinx is decoding. It returns the text of the best scoring result so far in the Result parameter. You can use this event to keep a running display of what Sphinx thinks the user is saying.
Methods
The Sphinx OCX has various methods (functions) you can call to get it to do things.
Init -- Tell Sphinx to load all of the model files for the decoder. This is a blocking call and takes about a minute to load all of the files.
StartListening -- Tell it to start listening to the user's voice. The first time you call this, it will take about a minute to load all the models (if Init has not already been called).
StopListening -- Tell it to stop listening to the user's voice.
ToggleListening -- If it's listening, this is the same as StopListening; if it's not, it's the same as StartListening.
There are other methods, but they aren't useful for simple applications, so you can safely ignore them. (Other method descriptions coming soon.)
Properties
ArgFile -- this should be set to the path and file name of the argument file containing the configuration for Sphinx. For more details see Section 5.0.
LogFile -- this specifies the path of the file where the Sphinx OCX will output various log information. It contains a lot of information that's not generally useful, unless you really know about Sphinx. It is a good source of information if your application died because of Sphinx. See Section 6.0 for more details.
LogDirectory -- if this is specified, Sphinx OCX will output RAW audio files of everything you say while you use the program to this directory. This is useful for collecting data, but eats up your disk space quickly, so don't use it unless you have a good reason to.
IgnoreEmptyUtterance -- if this is True (which it is by default), you won't get an UtteranceResult message if Sphinx didn't get anything useful out of your utterance. This is usually what you want, because otherwise things like breaths, door slams, or other short background noise can show up as empty utterances.
PartialResultInverval -- specifies the firing interval (in ms) for the UtterancePartialResult event during the process of decoding.
SamplingRate -- the sampling rate of the audio input to the decoder.
In order to get Sphinx to work for your particular task, you need to build a custom dictionary and language model. Sphinx uses several different knowledge sources to perform speech recognition. The version of Sphinx in this distribution uses the following information sources:
- phonetic models (in the 10000-g/ hmm directory)
- clustered phonetic models (in the .sen file)
- map and phone files (.map, .phone) that contain senone mappings for the given dictionary and acoustic model
- a pronunciation dictionary for the words in the lexicon
- a language model used for scoring word sequences
For specific applications, only the last two knowledge sources need to be customized. The others can only be customized to an application when a large amount of data has been collected. To build dictionary and language model files for a specific task (containing the words and phrases you want the system to recognize), follow these steps:
- Using any plain text editor, such as Emacs or Notepad, create a list of sentences you would like your program to recognize. For example, if you were creating a system to process taqueria orders, your sentence list might look like:
I'd like a burrito supreme.
Give me a seven layer burrito.
I want a double decker taco
etc…
- Save your file, and go to the Sphinx Knowledge Base Tool (if you have trouble accessing or using this page, please contact Alex Rudnicky).
- Using the Browse… button for choosing the Sentence corpus file, select the file you just created.
- Click the COMPILE KNOWLEDGE BASE button. This could take a couple minutes, depending on the size of your corpus.
- Once it's done, you will get a screen titled Sphinx knowledge base, with five bulleted links. Download the dictionary and language model files. You can use the default names, or give them your own names, just be sure to keep the .dic and .lm extensions so that you know which is which later. It's useful to save them in the same location as the other models. Also, remember what the files are named, as we'll need to refer to them in the next section.
The directory where you unzipped this distribution contains a file called sphinx.arg. The Sphinx-II decoder uses this file to access many of its configurable settings, including input flags telling the decoder where to look for the model files it's expecting. This file is loaded when the Init or StartListening methods are called. To use the Sphinx-II decoder, the parameters in this file need to be properly set; this can be done either using the Custom Property Pages we've implemented, or by hand.
Using the Custom Property Page
In the Sphinx OCX, we have implemented a set of property pages in the (Custom) dialog box to facilitate configuration of the decoder. You can use these pages to modify some of the decoder properties and argfile parameters.
NOTE: This dialog box loads (and saves) some properties from (to) the argument file, so before loading the dialog box, be sure that your ArgFile property is correctly set.
To access this dialog box from the Visual Basic development environment, right click on the Sphinx icon on the form of your application and select Properties…
There are two tabs in the dialog box that comes up. The first tab, labeled General, contains the properties of the Sphinx control. The second tab, labeled ArgFile, contains properties loaded from the argument file.
The General tab allows you to set the ArgFile, LogFile, SamplingRate and IgnoreEmptyUtterance Sphinx OCX properties (see Section 3.0 for the details on these).
The Argfile tab allows you to set some of the variables in the argfile, namely the variables that point to files used by the decoder. These files should be uniquely identified by their extension; brief descriptions are also given below.
Manually Editing the Argfile
- Open the Argfile (sphinx.arg) with a text editor (such as Emacs or Notepad) from the directory where you unzipped this distribution. This file should be the same one that is specified in the ArgFile property of the instance of the Sphinx OCX in your application.
- The file contains settings for the Sphinx recognizer. Most of these you don't need to modify or worry about, but some you do. These are at the top of the file, in the INPUT MODELS section delimited with #s. This section also contains brief descriptions for each argfile entry.
###########################################################
## INPUT MODELS
###########################################################
# DARPA format bigram/trigram backoff LM file
-lmfn c:\sphinx\demo\demo.lm
# Main pronunciation dictionary file
-dictfn c:\sphinx\demo\demo.dic
# Phone and map files with senone mapping information
# for the given dictionary and acoustic model
-phnfn c:\sphinx\demo\h1c1-94.phone
-mapfn c:\sphinx\demo\h1c1-94.map
# Directory containing precompiled binary versions
# of LM files
# -kbdumpdir c:\sphinx\demo\
# Directory with Sphinx-II semi-continuous HMM acoustic
# models and codebook
-hmmdir c:\sphinx\demo\10000-g
-hmmdirlist c:\sphinx\demo\10000-g
-cbdir c:\sphinx\demo\10000-g
# 8-bit senone model file created from 32-bit HMM model
# 8bsen should be set to TRUE if 8-bit senones are used
-sendumpfn c:\sphinx\demo\g.sen
-8bsen TRUE
- To customize this file, replace the c:\sphinx\demo\*.* path names with the actual paths of the files you have (either from the download, or generated from the web page in Section 4.0). Relative paths often work, but can sometimes lead to problems if your program changes working directories.
- Save the file. To find out more about what these parameters are, and how to use the other parameters in the argfile, please refer to the Sphinx-II FBS8 User Guide.
Listed below are some common problems that you may encounter while trying to get Sphinx to work.
6.1 My program dies.
Under certain conditions, if the Sphinx control reaches an error, it can kill its container's process. If you notice that your program dies when it runs (or when the Sphinx.Init function is called), the best source of information is the logfile. This file saves a lot of information, but at the very end probably has an error message indicating that it couldn't open one of the model files, or found an error in one of these files. Examine this error statement, and compare it with what's listed in your argfile. Make sure that all the file pointers in the argfile are set correctly. Here's an example of what you might see in the last two lines of the log file if the argfile parameter -phnfn points to a file that doesn't exist:
C:\users\rkm\fbs8-6-97\src\libfbs\kb.c(309): Reading phone file [.\lm\faa.phone]
C:\users\rkm\fbs8-6-97\src\libfbs\phone.c(71): fopen(.\lm\faa.phone,r) failed
6.2 I'm talking to my program, but nothing happens.
There are several possible causes of this problem. Here are some things to investigate:
- Using Sound Recorder, try to record yourself speaking. Play the file back, and make sure the levels are set correctly (no clipping, and not too soft).
- Check that the Sphinx.StartListening function is called. If necessary, put a breakpoint at that line to make sure that it is triggered.
- Check that you added a handler for the UtteranceResult event. You can also try setting a breakpoint in this function to make sure that it fires.
6.3 Redraw doesn't work for my form since I added the Sphinx OCX to it.
If you're using Visual C++, you may have observed that some weird behavior occurs after you've added the Sphinx control to your form. This might show up as incorrect redraw of your form, or as what seems like a lockup of the environment. We're trying to figure out why this happens, but in the meantime, typing Ctrl-Break and closing the dialog editing window when it happens seems to remedy it.
7.1 Getting Better Recognition
Depending on the type of system you're trying to build there are a number of different things you can do to increase your recognition accuracy.
- Use a good microphone. There's not too much the recognizer can do with unclean acoustic input. You'll notice the highest accuracy using a close-talking, noise-cancelling (probably head-mounted) microphone with little or no background noise.
- Improve your language model by extending your corpus. You might initially build a language model with sentences you think someone might say using your system. In reality, for any given task, there are a confounding number of ways to say the same thing. To start tackling this problem, build your corpus with input from other people. This could involve asking people to give you things they would say in a given scenario. This can be augmented with transcriptions of the audio from real sessions. (Transcriptions are also necessary to do acoustic retraining, but that's outside of the scope of this document. We have some guidelines on transcription that will be forthcoming.)
- Decrease confusability by making compound words. If your system doesn't rely on allowing people to say things any which way they choose, but rather has a fixed set of commands, you can decrease the confusability between the commands by combining commands, or parts of commands, into single compound words as follows:
FOLLOW STANDARD INSTRUMENT DEPARTURE could be converted to FOLLOW_STANDARD_INSTRUMENT_DEPARTURE
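If you have many fixed commands, the conversion can be scripted before you build the sentence corpus. This is a minimal C++ sketch of the underscore joining shown in the example; toCompoundWord is a hypothetical name, and the sketch assumes that words are separated by whitespace and that every word of the phrase belongs in the compound.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: join the whitespace-separated words of a
// command phrase with underscores to form one compound word.
std::string toCompoundWord(const std::string& phrase) {
    std::istringstream words(phrase);
    std::string word;
    std::string compound;
    while (words >> word) {
        if (!compound.empty()) compound += '_';
        compound += word;
    }
    return compound;
}
```

Calling toCompoundWord("FOLLOW STANDARD INSTRUMENT DEPARTURE") yields FOLLOW_STANDARD_INSTRUMENT_DEPARTURE. Extra spaces between or around words are collapsed, so the result never has leading, trailing, or doubled underscores.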