The CMU Sphinx Group Open Source Speech Recognition Engines

CMU Sphinx documentation Wiki

Adapting the default acoustic model

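This page describes how to do some simple acoustic model adaptation to improve speech recognition on your voice. The method differs between PocketSphinx and SphinxThree because they use different types of acoustic models: the PocketSphinx section below updates the model files themselves with MAP, while the SphinxThree section generates an MLLR transformation that the decoder applies at run time. For more technical information on the model types, see AcousticModelTypes.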

Creating an adaptation corpus

The first thing you need to do is create a corpus of adaptation data. This will consist of a list of sentences, a dictionary describing the pronunciation of all the words in that list of sentences, and a recording of you speaking each of those sentences.

Required files

The actual set of sentences you use is somewhat arbitrary, but ideally it should have good coverage of the most frequently used words or phonemes in the type of text you want to recognize. We have had good results simply using sentences from the CMU ARCTIC text-to-speech databases. For that purpose, this page provides the first 20 sentences from ARCTIC (arctic20.txt), a control file (arctic20.listoffiles), a transcription file (arctic20.transcription), and a dictionary (arctic20.dic) for them.

The sections below will refer to these files, so it would be a good idea to download them now. You should also make sure that you have downloaded and compiled SphinxBase and SphinxTrain.
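
If you later swap in your own sentences, it can be worth checking that the dictionary covers every word in the transcription before you record anything. The following is a minimal sketch, not part of the Sphinx tools. It assumes the usual SphinxTrain layouts, which this page does not show: transcription lines of the form "<s> WORDS </s> (utterance_id)" and dictionary lines of the form "WORD PHONE PHONE ...". The function name missing_words is made up for this example.

```shell
# Print every transcript word that has no entry in the dictionary.
# Assumed layouts (not shown on this page):
#   transcription: <s> SOME WORDS </s> (utterance_id)
#   dictionary:    WORD PH1 PH2 ...
# Usage: missing_words TRANSCRIPTION DICTIONARY
missing_words() {
    awk '
        FILENAME == ARGV[1] { dict[$1] = 1; next }   # first file: dictionary
        {
            for (i = 1; i <= NF; i++) {
                w = $i
                if (w == "<s>" || w == "</s>") continue
                if (w ~ /^\(.*\)$/) continue          # skip the (utterance_id) field
                if (!(w in dict) && !(w in seen)) { seen[w] = 1; print w }
            }
        }
    ' "$2" "$1"
}
```

Running missing_words arctic20.transcription arctic20.dic should print nothing if the dictionary is complete. The comparison is case-sensitive, so the two files must use the same case.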

Recording your adaptation data

This is unfortunately a bit more complicated than it ought to be. You need to record one audio file for each sentence in the adaptation corpus, naming the files according to the names listed in arctic20.transcription and arctic20.listoffiles. You must also record at a sampling rate of 16 kHz in mono (1 channel).

If you are at a Linux command line, you can accomplish this in very nerdy style with the following bash one-liner from the directory in which you downloaded arctic20.txt:

for i in `seq 1 20`; do fn=`printf arctic_%04d $i`; read -r sent; echo "$sent"; rec -r 16000 -sw "$fn.raw" 2>/dev/null; done < arctic20.txt

This will echo each sentence to the screen and start recording immediately. Hit Control-C to move on to the next sentence. You should see the following files in the current directory afterwards:

arctic_0001.raw  arctic_0007.raw  arctic_0013.raw  arctic_0019.raw
arctic_0002.raw  arctic_0008.raw  arctic_0014.raw  arctic_0020.raw
arctic_0003.raw  arctic_0009.raw  arctic_0015.raw  arctic20.dic
arctic_0004.raw  arctic_0010.raw  arctic_0016.raw  arctic20.listoffiles
arctic_0005.raw  arctic_0011.raw  arctic_0017.raw  arctic20.transcription
arctic_0006.raw  arctic_0012.raw  arctic_0018.raw  arctic20.txt
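
If the one-liner is hard to follow, here is the same idea unrolled into two small functions. This is only a sketch: the function names utt_name and record_all are made up for this example, and unlike the one-liner it records one file per line of the text file rather than exactly 20.

```shell
# Name of the n-th utterance, e.g. 7 -> arctic_0007.
utt_name() {
    printf 'arctic_%04d' "$1"
}

# Show each sentence in turn and record it with the same rec options
# as the one-liner. Press Control-C to finish each recording.
# Usage: record_all arctic20.txt
record_all() {
    n=1
    while read -r sent; do
        echo "$sent"
        rec -r 16000 -sw "$(utt_name "$n").raw" 2>/dev/null
        n=$((n + 1))
    done < "$1"
}
```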

You should verify that these recordings sound okay. To do this you can play them back with:

for i in *.raw; do play -t raw -r 16000 -sw $i; done
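
Listening is the real test, but a quick size check can catch recordings that are missing or were cut off at once. At 16 kHz with 16-bit mono samples, a raw file holds 16000 x 2 = 32000 bytes per second. The sketch below assumes the control file lists one utterance name per line without the .raw extension. The function names and the one-second threshold are arbitrary choices for this example.

```shell
# Duration in whole seconds of a headerless 16 kHz, 16-bit, mono recording:
# 16000 samples/s * 2 bytes/sample = 32000 bytes/s.
raw_seconds() {
    bytes=$(wc -c < "$1")
    echo $(( bytes / 32000 ))
}

# Report utterances named in the control file whose .raw file is
# missing or shorter than one second.
# Usage: check_recordings arctic20.listoffiles
check_recordings() {
    while read -r id; do
        if [ ! -f "$id.raw" ]; then
            echo "missing: $id.raw"
        elif [ "$(raw_seconds "$id.raw")" -lt 1 ]; then
            echo "too short: $id.raw"
        fi
    done < "$1"
}
```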

Adapting the acoustic model (PocketSphinx version)

First we will copy the default acoustic model from PocketSphinx into the current directory in order to work on it. Assuming that you installed PocketSphinx under /usr/local, the acoustic model directory is /usr/local/share/pocketsphinx/model/hmm/wsj1. Copy this directory to your working directory:

cp -a /usr/local/share/pocketsphinx/model/hmm/wsj1 .

Generating acoustic feature files

In order to run the adaptation tools, you must generate a set of acoustic feature files from these raw audio recordings. This can be done with the sphinx_fe tool from SphinxBase. You must extract these features with the same acoustic parameters that were used to train the standard acoustic model. Since PocketSphinx 0.4, these are stored in a file called feat.params in the acoustic model directory. You can simply add its contents to the command line for sphinx_fe, like this:

sphinx_fe `cat wsj1/feat.params` -samprate 16000 -c arctic20.listoffiles -di . -do . -ei raw -eo mfc -raw yes

You should now have the following files in your working directory:

arctic_0001.mfc  arctic_0006.raw  arctic_0012.mfc  arctic_0017.raw
arctic_0001.raw  arctic_0007.mfc  arctic_0012.raw  arctic_0018.mfc
arctic_0002.mfc  arctic_0007.raw  arctic_0013.mfc  arctic_0018.raw
arctic_0002.raw  arctic_0008.mfc  arctic_0013.raw  arctic_0019.mfc
arctic_0003.mfc  arctic_0008.raw  arctic_0014.mfc  arctic_0019.raw
arctic_0003.raw  arctic_0009.mfc  arctic_0014.raw  arctic_0020.mfc
arctic_0004.mfc  arctic_0009.raw  arctic_0015.mfc  arctic_0020.raw
arctic_0004.raw  arctic_0010.mfc  arctic_0015.raw  arctic20.dic
arctic_0005.mfc  arctic_0010.raw  arctic_0016.mfc  arctic20.listoffiles
arctic_0005.raw  arctic_0011.mfc  arctic_0016.raw  arctic20.transcription
arctic_0006.mfc  arctic_0011.raw  arctic_0017.mfc  arctic20.txt
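
Because the parameters come from a file, it is easy to lose track of what sphinx_fe is actually being told. The sketch below prints the fully expanded command instead of running it, so that you can check which parameters feat.params contributes. The function name fe_command is made up for this example.

```shell
# Print the sphinx_fe command line for a given model directory,
# with the contents of feat.params expanded in place.
# Usage: fe_command wsj1
fe_command() {
    model_dir=$1
    # Unquoted on purpose: the file contents must split into separate words.
    echo sphinx_fe $(cat "$model_dir/feat.params") \
        -samprate 16000 -c arctic20.listoffiles \
        -di . -do . -ei raw -eo mfc -raw yes
}
```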

Converting the sendump and mdef files

Unfortunately, you need one extra file that was left out of the PocketSphinx distribution in order to save space. You can download the bzip2-compressed version from http://www.cs.cmu.edu/~dhuggins/Projects/pocketsphinx/wsj1/mixture_weights.bz2 and decompress it in the wsj1 folder:

cd wsj1
wget \
    http://www.cs.cmu.edu/~dhuggins/Projects/pocketsphinx/wsj1/mixture_weights.bz2
bunzip2 mixture_weights.bz2
cd ..

Alternatively, if you have installed the SphinxTrain Python modules, you can use sendump_undump.py to convert the sendump file from the acoustic model to a mixture_weights file.

You will also need to convert the mdef file from the acoustic model to the plain text format used by the SphinxTrain tools. To do this, use the pocketsphinx_mdef_convert program:

pocketsphinx_mdef_convert -text wsj1/mdef wsj1/mdef.txt
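
At this point the wsj1 directory should contain every file that the following steps refer to. A quick way to confirm this is sketched below. The list of file names is taken from the commands on this page, and the function name check_model_dir is made up for this example.

```shell
# Print the path of each file needed by the later steps that is absent
# from the model directory. Prints nothing if all are present.
# Usage: check_model_dir wsj1
check_model_dir() {
    for f in feat.params mdef.txt means variances mixture_weights transition_matrices; do
        [ -f "$1/$f" ] || echo "missing: $1/$f"
    done
}
```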

Accumulating observation counts

The next step in adaptation is to collect statistics from the adaptation data. This is done using the bw program from SphinxTrain. You should be able to find it in a directory called bin.i686-pc-linux-gnu or bin.x86_64-unknown-linux-gnu (on Linux), or in bin\Debug or bin\Release (on Windows), inside the SphinxTrain directory. Copy it to the working directory along with the map_adapt and mk_s2sendump programs from the same directory.

Now, to collect statistics, run:

./bw \
    -hmmdir wsj1 \
    -moddeffn wsj1/mdef.txt \
    -ts2cbfn .semi. \
    -feat s2_4x -cmn current -agc none \
    -dictfn arctic20.dic \
    -ctlfn arctic20.listoffiles \
    -lsnfn arctic20.transcription \
    -accumdir .

The -agc none parameter is very important: it is not the default for bw, so you must pass it explicitly.
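
If you want to keep the output of bw for later inspection, and to be told plainly when it fails, a small wrapper like the following may help. It is a generic sketch that works with any command. The name run_logged and the .log naming scheme are made up for this example.

```shell
# Run a command, saving everything it prints to LABEL.log.
# Returns success or failure and reports a failure on stderr.
# Usage: run_logged LABEL COMMAND [ARGS...]
run_logged() {
    label=$1
    shift
    if "$@" > "$label.log" 2>&1; then
        return 0
    fi
    echo "$label failed; see $label.log" >&2
    return 1
}
```

For example, run_logged bw ./bw -hmmdir wsj1 followed by the remaining arguments shown above would leave the output in bw.log.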

Updating the acoustic model files with MAP

We will now make a copy of the acoustic model directory, then overwrite the files in the copy with the adapted model files:

cp -a wsj1 wsj1adapt

To do adaptation, use the map_adapt program:

./map_adapt \
    -meanfn wsj1/means \
    -varfn wsj1/variances \
    -mixwfn wsj1/mixture_weights \
    -tmatfn wsj1/transition_matrices \
    -accumdir . \
    -mapmeanfn wsj1adapt/means \
    -mapvarfn wsj1adapt/variances \
    -mapmixwfn wsj1adapt/mixture_weights \
    -maptmatfn wsj1adapt/transition_matrices

Recreating the adapted sendump file

Now we have to recreate the sendump file from the updated mixture_weights file:

./mk_s2sendump \
    -pocketsphinx yes \
    -moddeffn wsj1adapt/mdef.txt \
    -mixwfn wsj1adapt/mixture_weights \
    -sendumpfn wsj1adapt/sendump

Congratulations! You now have an adapted acoustic model! You can delete the files wsj1adapt/mixture_weights and wsj1adapt/mdef.txt to save space if you like, because they are not used by the decoder.
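
Before deleting anything, you may want to confirm that the adaptation really did write new files. The sketch below lists the files in the adapted directory that differ from the originals. If it prints nothing, the adapted model is identical to the default one and something went wrong. The function name changed_files is made up for this example.

```shell
# List the files in the adapted model directory whose contents differ
# from the file of the same name in the original directory.
# Usage: changed_files wsj1 wsj1adapt
changed_files() {
    orig=$1
    adapted=$2
    for f in "$adapted"/*; do
        name=${f##*/}
        if [ -f "$orig/$name" ] && ! cmp -s "$orig/$name" "$f"; then
            echo "$name"
        fi
    done
}
```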

Adapting the acoustic model (Sphinx3 version)

For SphinxThree, we can use a different type of acoustic model adaptation which does not require you to modify the acoustic model files. In addition, SphinxThree model files are not compressed and don't need to be extracted. Therefore, you simply need to know where the acoustic model is located. If you installed SphinxThree in /usr/local (the default), then you can find it in /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd.

Generating acoustic feature files

The standard model with Sphinx3 uses a different set of acoustic feature parameters than PocketSphinx. Luckily these happen to be the default ones in sphinx_fe. So, to extract features for Sphinx3, you can use this command:

sphinx_fe -samprate 16000 -c arctic20.listoffiles -di . -do . -ei raw -eo mfc -raw yes

Accumulating observation counts

The next step in adaptation is to collect statistics from the adaptation data. This is done using the bw program from SphinxTrain. You should be able to find it in a directory called bin.i686-pc-linux-gnu or bin.x86_64-unknown-linux-gnu (on Linux), or in bin\Debug or bin\Release (on Windows), inside the SphinxTrain directory. Copy it to the working directory along with the mllr_solve program from the same directory.

SphinxThree's default acoustic model is a bit different from the PocketSphinx one, in that it does not include a noise dictionary. In order to collect statistics you will need to create one. Since there are no noise words in the adaptation data, simply create a text file called arctic20.filler with the following contents:

<s> SIL
</s> SIL
<sil> SIL

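If you would rather create the file from the command line than in a text editor, a here-document does the job. This sketch writes exactly the three entries listed above. The function name write_filler is made up for this example.

```shell
# Write the three-entry filler dictionary to the given path.
# Usage: write_filler arctic20.filler
write_filler() {
    cat > "$1" <<'EOF'
<s> SIL
</s> SIL
<sil> SIL
EOF
}
```
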
Now you can run bw as usual to collect statistics:

./bw \
    -hmmdir /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd \
    -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none \
    -dictfn arctic20.dic \
    -fdictfn arctic20.filler \
    -ctlfn arctic20.listoffiles \
    -lsnfn arctic20.transcription -accumdir .

As before, the -agc none parameter is very important, because for some dumb reason it is not the default in the bw command line.

Generating the MLLR transformation

Next we will generate an MLLR transformation which we will pass to the decoder to adapt the acoustic model at run-time. This is done with the mllr_solve program:

./mllr_solve \
    -meanfn /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means \
    -varfn /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances \
    -outmllrfn mllr_matrix -accumdir .
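
The long model path appears several times in these commands, so if SphinxThree is installed somewhere other than /usr/local it is easy to miss one occurrence. One way around this is to keep the path in a variable. The sketch below prints the resulting mllr_solve command rather than running it. The variable name S3_MODEL and the function name mllr_solve_command are made up for this example.

```shell
# Model location from the text above. Set S3_MODEL beforehand to override it.
S3_MODEL=${S3_MODEL:-/usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd}

# Print (rather than run) the mllr_solve command for the current S3_MODEL.
mllr_solve_command() {
    echo ./mllr_solve \
        -meanfn "$S3_MODEL/means" \
        -varfn "$S3_MODEL/variances" \
        -outmllrfn mllr_matrix -accumdir .
}
```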

Now, if you wish to decode with the adapted model, simply add -mllr mllr_matrix (or whatever the path to the mllr_matrix file you created is) to your SphinxThree command line or configuration file.

AcousticModelAdaptation (last edited 2008-02-21 03:34:54 by localhost)

This page is maintained by David Huggins-Daines
CMUSphinx is a project within the Sphinx Group at Carnegie Mellon