Training an acoustic model with LDA and MLLT feature transforms
Attention! This feature is for single-stream (i.e. SphinxThree) models only. It will not work for Sphinx2 or PocketSphinx. Sorry about that folks...
A recently added feature of SphinxTrain is the ability to train feature-space transformations for acoustic models. There are a couple of benefits to using these. First of all, they can dramatically reduce the word error rate (up to 25% relative in some of our tests). Second, they make the decoder slightly faster, since they reduce the dimensionality of the features, and they also reduce the size of the acoustic model.
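To make the "25% relative" figure concrete, here is a minimal sketch of how a relative word error rate reduction is computed. The function name and the numbers are made up for illustration; they are not results from our tests.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the relative reduction in word error rate as a fraction."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, for illustration only: 20% WER before, 15% after.
reduction = relative_wer_reduction(20.0, 15.0)
print("relative reduction: %.0f%%" % (reduction * 100))
```

In other words, a 25% relative reduction means a quarter of the baseline errors go away, not that the WER drops by 25 absolute points.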
Unfortunately, the training process becomes a bit more involved when using this feature. The reason is that it's necessary to do some parts of training several times over. Specifically, you have to train a basic model in order to train each feature transformation, then retrain the model with the transformation applied to the input features. This has to be done for each feature transformation (there are currently two, LDA and MLLT, whose effects have been found to be additive).
These feature transformations are "discriminative" in the sense that they try to improve the separability of acoustic classes in the feature space. This means that it's necessary to define the set of acoustic classes on which they are trained. There are two obvious choices, both of which seem to work well. The simplest and quickest to train is the set of context-independent phonemes. The more involved one is the set of context-dependent tied triphone states (senones). In both cases, the SphinxTrain scripts try to automate the whole process for you.
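To give a feel for what "improving the separability of acoustic classes" means, here is a minimal sketch of textbook LDA using numpy. It is not the SphinxTrain implementation; the function name `lda_transform` and the toy data are assumptions made purely for illustration, with integer labels standing in for phonemes or senones.

```python
import numpy as np


def lda_transform(features, labels, out_dim):
    """Estimate an LDA projection of shape (out_dim, in_dim).

    features: (n_frames, in_dim) array; labels: (n_frames,) class ids.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    in_dim = features.shape[1]
    global_mean = features.mean(axis=0)
    within = np.zeros((in_dim, in_dim))   # within-class scatter
    between = np.zeros((in_dim, in_dim))  # between-class scatter
    for c in np.unique(labels):
        x = features[labels == c]
        mean_c = x.mean(axis=0)
        centered = x - mean_c
        within += centered.T @ centered
        diff = (mean_c - global_mean)[:, None]
        between += len(x) * (diff @ diff.T)
    # The directions that maximize between-class scatter relative to
    # within-class scatter are the leading eigenvectors of
    # inv(within) @ between.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(within, between))
    order = np.argsort(eigvals.real)[::-1][:out_dim]
    return eigvecs.real[:, order].T


# Toy data (hypothetical): 3 classes in 4 dimensions, reduced to 2.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
labels = np.repeat(np.arange(3), 200)
features = means[labels] + rng.normal(size=(600, 4))

lda = lda_transform(features, labels, out_dim=2)
projected = features @ lda.T  # apply the transform to every frame
print(projected.shape)
```

The projection keeps the directions in which the class means differ and throws away the rest, which is where the dimensionality reduction comes from.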
Required software components
First, you need to have the necessary Python modules installed in order to do LDA and MLLT. You should (obviously) also have Python 2.3 or newer. To check that the modules are installed, make sure that you can run Python and load the numpy and scipy.optimize modules:
dhuggins@lima:~$ python
Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import scipy
>>> import scipy.optimize
>>>
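If you would rather not check by hand, the same test can be scripted. This is a small sketch, not part of SphinxTrain; the helper name `missing_modules` is made up, and it assumes a Python recent enough to provide `importlib`.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The modules named in the interactive check above.
    missing = missing_modules(["numpy", "scipy", "scipy.optimize"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```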
Configuration changes
Once you're sure that Python is working, you need to make sure that you are using the most recent version of SphinxTrain (either a Subversion check-out or the nightly tarball) and that the Python modules in your copy of SphinxTrain have been built. To do this, run python setup.py build in the python directory of SphinxTrain.
You should also make sure that the SphinxTrain scripts are up to date in your training directory. The easiest way to do this is simply to blow away the scripts_pl and python directories and run setup_SphinxTrain.pl from the SphinxTrain scripts_pl directory.
Finally, you need to turn on LDA and MLLT in your sphinx_train.cfg file. To do this, make sure it contains two lines reading:
$CFG_LDA_MLLT = 'yes';
$CFG_LDA_DIMENSION = 29;
You can adjust $CFG_LDA_DIMENSION if you like, though 29 or 30 seems to be a nearly-optimal value for many data sets.
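As a sanity check before launching a long training run, you could verify that the two settings are really present. The sketch below parses simple `$CFG_NAME = value;` assignments out of the config text with a regular expression. The helper `read_cfg_settings` is hypothetical, and it only understands plain one-per-line scalar assignments, not arbitrary Perl.

```python
import re

# Matches lines like:  $CFG_LDA_MLLT = 'yes';   or   $CFG_LDA_DIMENSION = 29;
# Lines that start with '#' (comments) are not matched.
_ASSIGNMENT = re.compile(
    r"""^\s*\$(CFG_\w+)\s*=\s*['"]?([^'";]*)['"]?\s*;""", re.M)


def read_cfg_settings(cfg_text):
    """Collect simple scalar assignments from sphinx_train.cfg text."""
    return {name: value.strip() for name, value in _ASSIGNMENT.findall(cfg_text)}


# In practice you would read the text from your own sphinx_train.cfg.
sample = """
# $CFG_LDA_MLLT = 'no';
$CFG_LDA_MLLT = 'yes';
$CFG_LDA_DIMENSION = 29;
"""
settings = read_cfg_settings(sample)
lda_enabled = settings.get("CFG_LDA_MLLT") == "yes"
lda_dimension = int(settings.get("CFG_LDA_DIMENSION", "0"))
print(lda_enabled, lda_dimension)
```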
Running training
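Now, to do the "simple" version mentioned above (transformations trained on the context-independent phonemes), you can simply re-run perl scripts_pl/RunAll.pl. You can also get slightly better accuracy, at the cost of more training time, by running scripts_pl/RunAll_CDMLLT.pl, which uses the context-dependent senones instead.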
Decoding with the MLLT model
First, you need to update your decoding scripts, because the MLLT acoustic model will be created in a different directory than the standard one - if, for example, your acoustic model is rm with 1000 senones, the MLLT model will be in model_parameters/rm.mllt_cd_cont_1000 rather than model_parameters/rm.cd_cont_1000.
If you are using the released version of SphinxThree (sphinx3-0.7), then you will also have to add the MLLT transformation file to the command line. It is called feature_transform and it lives inside the acoustic model directory. So, using the example above, you would add something like this to your command line:
-lda model_parameters/rm.mllt_cd_cont_1000/feature_transform
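If your decoding scripts are generated programmatically, the directory name and the extra argument can be built in one place. This is a sketch only: the helper `mllt_decoder_args` is made up, and the naming pattern is inferred from the single rm example above, so check it against what your training run actually produced.

```python
def mllt_decoder_args(model_name, n_senones, base_dir="model_parameters"):
    """Return (model_dir, extra_args) for an MLLT continuous model.

    The directory pattern is an assumption based on the rm example:
    <base_dir>/<model_name>.mllt_cd_cont_<n_senones>
    """
    model_dir = "%s/%s.mllt_cd_cont_%d" % (base_dir, model_name, n_senones)
    # feature_transform lives inside the acoustic model directory.
    extra_args = ["-lda", model_dir + "/feature_transform"]
    return model_dir, extra_args


model_dir, extra_args = mllt_decoder_args("rm", 1000)
print(model_dir)
print(" ".join(extra_args))
```

The returned list can be appended to whatever argument list your script already passes to the decoder.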