This directory contains the alphanumeric database (aka "census" aka "an4") recorded at Carnegie Mellon University circa 1991. This database is described in detail in "Acoustical and environmental robustness in automatic speech recognition", by Alex Acero, published by Kluwer Academic Publishers, 1993.
Subjects were asked to spell out personal information, such as name, address, telephone number, and birthdate. They were instructed not to use their actual numbers. In addition, subjects spoke randomly generated sequences of words containing control words. The database used internally at CMU has 1018 training and 140 test utterances, whereas the database provided here has 948 training and 130 test utterances. The sentences containing social security numbers were removed, in case any of the subjects did not follow the advice to use a fake number.
All data are sampled at 16 kHz, with 16-bit linear samples. All recordings were made with a close-talking microphone.
Each of the compressed tar files contains data in one of the following formats:
*.raw : audio files in raw format (linear PCM, no header).
*.sph : audio files in NIST's Sphere format.
*.mfc : audio files encoded in mel cepstral coefficients.
The contents of each directory are described individually below. At the top level, you will find:
wav or feat : The wav directory contains audio in one of the formats (NIST's Sphere, raw big-endian, raw little-endian). The feat directory contains the same data, but in cepstral format. Each package contains only one of wav or feat.
etc : This directory contains the transcriptions, control files, dictionary, phone list, and flat unigram language model for both training and test data, where appropriate.
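Because the raw files carry no header, the reader has to supply the sample format. The following is a minimal Python sketch of loading one raw file, assuming the samples are signed 16-bit integers (the signedness is an assumption, not stated here) and that the byte order matches the package you downloaded. The function names and the file name in the usage note are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 16000  # sampling rate stated in this README
BYTES_PER_SAMPLE = 2    # 16-bit linear samples


def read_raw_pcm16(path, byteorder="little"):
    """Read a headerless raw file and return its samples as a list of ints.

    Assumes signed 16-bit samples; byteorder is "little" or "big".
    """
    prefix = {"little": "<", "big": ">"}[byteorder]
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // BYTES_PER_SAMPLE  # ignore a trailing odd byte, if any
    return list(struct.unpack(f"{prefix}{count}h", data[: count * BYTES_PER_SAMPLE]))


def duration_seconds(samples):
    """Duration implied by the sample count at 16 kHz."""
    return len(samples) / SAMPLE_RATE_HZ
```

For example, `read_raw_pcm16("utterance.raw", byteorder="big")` would decode a file from the big-endian package. Reading with the wrong byte order does not raise an error; it simply yields noise-like values, so a quick listen or a look at the amplitude range is a useful check.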
Under wav or feat, you will find:
an4_clstk : The directory with training data has 74 sub-directories, one for each speaker. Of these speakers, 21 are female and 53 are male. The total number of utterances is 948, and the average duration is about 3 seconds, totalling a little less than 50 minutes of speech.
an4test_clstk : The directory with test data has 10 sub-directories, one for each speaker. Of these speakers, 3 are female and 7 are male. The total number of utterances is 130, totalling around 6 minutes of speech.
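The figures above can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers quoted in this README; the durations are the approximate values given in the text, so the results are rough estimates rather than measurements. The variable names are illustrative.

```python
# Speaker counts as quoted above.
TRAIN_SPEAKERS = {"female": 21, "male": 53}
TEST_SPEAKERS = {"female": 3, "male": 7}

# Utterance counts as quoted above.
TRAIN_UTTERANCES = 948
TEST_UTTERANCES = 130

TRAIN_AVG_SECONDS = 3.0   # "about 3 seconds", so approximate
TEST_TOTAL_MINUTES = 6.0  # "around 6 minutes", so approximate

# Speaker totals should match the number of sub-directories (74 and 10).
train_speaker_total = sum(TRAIN_SPEAKERS.values())
test_speaker_total = sum(TEST_SPEAKERS.values())

# Estimated training total: utterances times average duration, in minutes.
train_minutes = TRAIN_UTTERANCES * TRAIN_AVG_SECONDS / 60

# Implied average test utterance length, in seconds.
test_avg_seconds = TEST_TOTAL_MINUTES * 60 / TEST_UTTERANCES

print(f"training speakers: {train_speaker_total}")
print(f"test speakers: {test_speaker_total}")
print(f"estimated training speech: {train_minutes:.1f} minutes")
print(f"implied average test utterance: {test_avg_seconds:.2f} seconds")
```

The training estimate comes out a little under 50 minutes, consistent with the description, and the implied test average is close to the 3-second training average.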