Adapting the default acoustic model
This page describes how to do some simple acoustic model adaptation to improve speech recognition on your voice. The methods of adaptation are a bit different between PocketSphinx and SphinxThree due to the different types of acoustic models used. For more technical information on that see AcousticModelTypes.
Creating an adaptation corpus
The first thing you need to do is create a corpus of adaptation data. This will consist of a list of sentences, a dictionary describing the pronunciation of all the words in that list of sentences, and a recording of you speaking each of those sentences.
Required files
The actual set of sentences you use is somewhat arbitrary, but ideally it should have good coverage of the most frequently used words or phonemes in the set of sentences or the type of text you want to recognize. We have had good results simply using sentences from the CMU ARCTIC text-to-speech databases. To that effect, here are the first 20 sentences from ARCTIC, a control file, a transcription file, and a dictionary for them:
The sections below will refer to these files, so it would be a good idea to download them now. You should also make sure that you have downloaded and compiled SphinxBase and SphinxTrain.
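If you would rather adapt on your own sentences, you need a matching control file and transcription file for them. The function below is a minimal sketch of how these could be generated from a plain text file with one sentence per line. The function name make_corpus_files and the prefix-based naming are hypothetical, and the transcription layout ("<s> text </s> (id)") is an assumption that you should check against arctic20.transcription. It does no text normalization, and every word must still appear in your dictionary.

```shell
# make_corpus_files PREFIX SENTENCES
# Writes PREFIX.listoffiles (one utterance ID per line) and
# PREFIX.transcription (one "<s> text </s> (ID)" line per sentence),
# then prints the number of sentences processed.
# ASSUMPTION: the transcription layout mirrors arctic20.transcription;
# compare against that file before relying on this.
make_corpus_files() {
    prefix=$1
    sentences=$2
    : > "$prefix.listoffiles"
    : > "$prefix.transcription"
    n=0
    while IFS= read -r sent || [ -n "$sent" ]; do
        [ -n "$sent" ] || continue              # skip blank lines
        n=$((n + 1))
        id=$(printf '%s_%04d' "$prefix" "$n")   # e.g. mycorpus_0001
        printf '%s\n' "$id" >> "$prefix.listoffiles"
        printf '<s> %s </s> (%s)\n' "$sent" "$id" >> "$prefix.transcription"
    done < "$sentences"
    echo "$n"
}
```

Run from the working directory, make_corpus_files mycorpus mysentences.txt would produce mycorpus.listoffiles and mycorpus.transcription, with recordings expected as mycorpus_0001.raw, mycorpus_0002.raw, and so on.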
Recording your adaptation data
This is unfortunately a bit more complicated than it ought to be. Basically, you need to record a single audio file for each sentence in the adaptation corpus, naming the files according to the names listed in arctic20.transcription and arctic20.listoffiles. In addition, you must record these at a sampling rate of 16 kHz in mono (1 channel).
If you are at a Linux command line, you can accomplish this in very nerdy style with the following bash one-liner from the directory in which you downloaded arctic20.txt:
for i in `seq 1 20`; do fn=`printf arctic_%04d $i`; read sent; echo $sent; rec -r 16000 -sw $fn.raw 2>/dev/null; done < arctic20.txt
This will echo each sentence to the screen and start recording immediately. Hit Control-C to move on to the next sentence. You should see the following files in the current directory afterwards:
arctic_0001.raw  arctic_0007.raw  arctic_0013.raw  arctic_0019.raw
arctic_0002.raw  arctic_0008.raw  arctic_0014.raw  arctic_0020.raw
arctic_0003.raw  arctic_0009.raw  arctic_0015.raw  arctic20.dic
arctic_0004.raw  arctic_0010.raw  arctic_0016.raw  arctic20.listoffiles
arctic_0005.raw  arctic_0011.raw  arctic_0017.raw  arctic20.transcription
arctic_0006.raw  arctic_0012.raw  arctic_0018.raw  arctic20.txt
You should verify that these recordings sound okay. To do this you can play them back with:
for i in *.raw; do play -t raw -r 16000 -sw $i; done
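Besides listening to the recordings, it can help to check mechanically that every utterance named in the control file was actually recorded. The following is a minimal sketch, not part of the Sphinx tools: check_recordings is a hypothetical helper, and the duration estimate assumes headerless 16-bit mono audio at 16 kHz (32000 bytes per second), which is what the recording command above is meant to produce.

```shell
# check_recordings CTL [DIR]
# For each utterance ID in the control file CTL, report the approximate
# duration of DIR/ID.raw in whole seconds.
# ASSUMPTION: headerless 16-bit mono audio at 16 kHz (32000 bytes/second).
# Returns non-zero if any recording is missing or empty.
check_recordings() {
    ctl=$1
    dir=${2:-.}
    status=0
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue            # skip blank lines
        f="$dir/$id.raw"
        if [ ! -s "$f" ]; then
            echo "MISSING $id"
            status=1
            continue
        fi
        bytes=$(wc -c < "$f")
        echo "OK $id $((bytes / 32000))s"
    done < "$ctl"
    return $status
}
```

For the files on this page you would call check_recordings arctic20.listoffiles. A recording reported as 0s was probably cut off and is worth redoing.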
Adapting the acoustic model (PocketSphinx version)
First we will copy the default acoustic model from PocketSphinx into the current directory in order to work on it. Assuming that you installed PocketSphinx under /usr/local, the acoustic model directory is /usr/local/share/pocketsphinx/model/hmm/wsj1. Copy this directory to your working directory:
cp -a /usr/local/share/pocketsphinx/model/hmm/wsj1 .
Generating acoustic feature files
In order to run the adaptation tools, you must generate a set of acoustic model feature files from these raw audio recordings. This can be done with the sphinx_fe tool from SphinxBase. It is imperative that you make sure you are using the same acoustic parameters to extract these features as were used to train the standard acoustic model. Since PocketSphinx 0.4, these are stored in a file called feat.params in the acoustic model directory. You can simply add it to the command line for sphinx_fe, like this:
sphinx_fe `cat wsj1/feat.params` -samprate 16000 -c arctic20.listoffiles -di . -do . -ei raw -eo mfc -raw yes
You should now have the following files in your working directory:
arctic_0001.mfc  arctic_0006.raw  arctic_0012.mfc  arctic_0017.raw
arctic_0001.raw  arctic_0007.mfc  arctic_0012.raw  arctic_0018.mfc
arctic_0002.mfc  arctic_0007.raw  arctic_0013.mfc  arctic_0018.raw
arctic_0002.raw  arctic_0008.mfc  arctic_0013.raw  arctic_0019.mfc
arctic_0003.mfc  arctic_0008.raw  arctic_0014.mfc  arctic_0019.raw
arctic_0003.raw  arctic_0009.mfc  arctic_0014.raw  arctic_0020.mfc
arctic_0004.mfc  arctic_0009.raw  arctic_0015.mfc  arctic_0020.raw
arctic_0004.raw  arctic_0010.mfc  arctic_0015.raw  arctic20.dic
arctic_0005.mfc  arctic_0010.raw  arctic_0016.mfc  arctic20.listoffiles
arctic_0005.raw  arctic_0011.mfc  arctic_0016.raw  arctic20.transcription
arctic_0006.mfc  arctic_0011.raw  arctic_0017.mfc  arctic20.txt
Converting the sendump and mdef files
Unfortunately, the adaptation tools need one extra file, mixture_weights, which was left out of the PocketSphinx distribution in order to save space. You can download the bzip2-compressed version from http://www.cs.cmu.edu/~dhuggins/Projects/pocketsphinx/wsj1/mixture_weights.bz2 and decompress it in the wsj1 folder:
cd wsj1
wget \
http://www.cs.cmu.edu/~dhuggins/Projects/pocketsphinx/wsj1/mixture_weights.bz2
bunzip2 mixture_weights.bz2
cd ..
Alternately, if you have installed the SphinxTrain Python modules, you can use sendump_undump.py to convert the sendump file from the acoustic model to a mixture_weights file.
You will also need to convert the mdef file from the acoustic model to the plain text format used by the SphinxTrain tools. To do this, use the pocketsphinx_mdef_convert program:
pocketsphinx_mdef_convert -text wsj1/mdef wsj1/mdef.txt
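At this point the wsj1 directory should contain everything the adaptation steps on this page read from it. The following is a small sketch of a sanity check; check_model_dir is a hypothetical helper, and the file list is simply the set of model files named on this page, not an official manifest.

```shell
# check_model_dir DIR
# Prints a line for each file that the adaptation steps expect in DIR
# but that is not there. Returns non-zero if anything is missing.
# The file list is taken from the commands on this page.
check_model_dir() {
    dir=$1
    missing=0
    for f in mdef.txt means variances mixture_weights \
             transition_matrices feat.params; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return $missing
}
```

Running check_model_dir wsj1 before collecting statistics should print nothing if the download and conversion steps above succeeded.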
Accumulating observation counts
The next step in adaptation is to collect statistics from the adaptation data. This is done using the bw program from SphinxTrain. You should be able to find this in a directory called bin.i686-pc-linux-gnu or bin.x86_64-unknown-linux-gnu (on Linux) or in bin\Debug or bin\Release (on Windows) inside the SphinxTrain directory. Copy it to the working directory along with the map_adapt and mk_s2sendump programs from the same directory.
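Since the name of the binary directory depends on your platform, one way to avoid hard-coding it is to search for the tools. This is only a sketch for the Linux layout: copy_tools is a hypothetical helper, and it assumes the programs sit somewhere under a directory whose name starts with bin inside the SphinxTrain tree.

```shell
# copy_tools SRC DEST TOOL...
# Looks for each TOOL under any SRC/bin* directory and copies the first
# match into DEST. Returns non-zero if a tool cannot be found.
# ASSUMPTION: Linux-style layout (no .exe suffix on the program names).
copy_tools() {
    src=$1
    dest=$2
    shift 2
    rc=0
    for tool in "$@"; do
        found=$(find "$src" -path "$src/bin*" -type f -name "$tool" | head -n 1)
        if [ -n "$found" ]; then
            cp "$found" "$dest/"
        else
            echo "not found: $tool" >&2
            rc=1
        fi
    done
    return $rc
}
```

For example, if SphinxTrain was unpacked in ../SphinxTrain (a placeholder path), copy_tools ../SphinxTrain . bw map_adapt mk_s2sendump would place the three programs in the working directory.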
Now, to collect statistics, run:
./bw \
-hmmdir wsj1 \
-moddeffn wsj1/mdef.txt \
-ts2cbfn .semi. \
-feat s2_4x -cmn current -agc none \
-dictfn arctic20.dic \
-ctlfn arctic20.listoffiles \
-lsnfn arctic20.transcription \
-accumdir .
The -agc none parameter is very important.
Updating the acoustic model files with MAP
We will now copy the acoustic model directory, then overwrite the files in the copy with the adapted model files:
cp -a wsj1 wsj1adapt
To do adaptation, use the map_adapt program:
./map_adapt \
-meanfn wsj1/means \
-varfn wsj1/variances \
-mixwfn wsj1/mixture_weights \
-tmatfn wsj1/transition_matrices \
-accumdir . \
-mapmeanfn wsj1adapt/means \
-mapvarfn wsj1adapt/variances \
-mapmixwfn wsj1adapt/mixture_weights \
-maptmatfn wsj1adapt/transition_matrices
Recreating the adapted sendump file
Now we have to recreate the sendump file from the updated mixture_weights file:
./mk_s2sendump \
-pocketsphinx yes \
-moddeffn wsj1adapt/mdef.txt \
-mixwfn wsj1adapt/mixture_weights \
-sendumpfn wsj1adapt/sendump
Congratulations! You now have an adapted acoustic model! You can delete the files wsj1adapt/mixture_weights and wsj1adapt/mdef.txt to save space if you like, because they are not used by the decoder.
Adapting the Acoustic Model (Sphinx3 version)
For SphinxThree, we can use a different type of acoustic model adaptation which does not require you to modify the acoustic model files. In addition, SphinxThree model files are not compressed and don't need to be extracted. Therefore, you simply need to know where the acoustic model is located. If you installed SphinxThree in /usr/local (the default), then you can find it in /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd.
Generating acoustic feature files
The standard model with Sphinx3 uses a different set of acoustic feature parameters than PocketSphinx. Luckily these happen to be the default ones in sphinx_fe. So, to extract features for Sphinx3, you can use this command:
sphinx_fe -samprate 16000 -c arctic20.listoffiles -di . -do . -ei raw -eo mfc -raw yes
Accumulating observation counts
The next step in adaptation is to collect statistics from the adaptation data. This is done using the bw program from SphinxTrain. You should be able to find this in a directory called bin.i686-pc-linux-gnu or bin.x86_64-unknown-linux-gnu (on Linux) or in bin\Debug or bin\Release (on Windows) inside the SphinxTrain directory. Copy it to the working directory along with the mllr_solve program from the same directory.
SphinxThree's default acoustic model is a bit different than the PocketSphinx one, in that it does not include a noise dictionary. In order to collect statistics you will need to create one to use. Since there are no noise words in the adaptation data, simply create a text file called arctic20.filler with the following contents:
<s> SIL
</s> SIL
<sil> SIL
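If you prefer to create the file from the command line, a here-document avoids any editor mangling the angle brackets. This is a minimal sketch: write_filler is a hypothetical helper name, and it assumes the noise dictionary takes one entry per line.

```shell
# write_filler FILE
# Writes a silence-only noise dictionary to FILE.
# ASSUMPTION: one "word phone" entry per line.
write_filler() {
    cat > "$1" <<'EOF'
<s> SIL
</s> SIL
<sil> SIL
EOF
}
```

Calling write_filler arctic20.filler in the working directory creates the file used in the next command.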
Now you can run bw as usual to collect statistics:
./bw \
-hmmdir /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd \
-ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none \
-dictfn arctic20.dic \
-fdictfn arctic20.filler \
-ctlfn arctic20.listoffiles \
-lsnfn arctic20.transcription -accumdir .
As before, the -agc none parameter is very important, because for some dumb reason it is not the default in the bw command line.
Generating the MLLR transformation
Next we will generate an MLLR transformation which we will pass to the decoder to adapt the acoustic model at run-time. This is done with the mllr_solve program:
./mllr_solve \
-meanfn /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means \
-varfn /usr/local/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances \
-outmllrfn mllr_matrix -accumdir .
Now, if you wish to decode with the adapted model, simply add -mllr mllr_matrix (or whatever the path to the mllr_matrix file you created is) to your SphinxThree command line or configuration file.