This directory contains the alphanumeric database (also known as "census" or "an4") recorded at Carnegie Mellon University circa 1991. The database is described in detail in "Acoustical and environmental robustness in automatic speech recognition", by Alex Acero, published by Kluwer Academic Publishers, 1993.

Subjects were asked to spell out personal information such as name, address, telephone number, and birth date. They were instructed not to use their actual numbers. In addition, subjects spoke randomly generated sequences of words containing control words. The database used internally at CMU has 1018 training and 140 test utterances, whereas the database provided here has 948 training and 130 test utterances. The sentences containing social security numbers were removed, in case any subject did not follow the instruction to use a fake number.

All data are sampled at 16 kHz and stored as 16-bit linear samples. All recordings were made with a close-talking microphone.
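
As an illustration of what the sampling parameters above imply, the following Python sketch reads a file of 16-bit linear samples and computes its duration at 16 kHz. It assumes the file is headerless (raw) PCM, which this README does not state. The byte order is also an assumption and is left as a parameter. The function names read_raw_pcm and duration_seconds are hypothetical and are not part of this distribution.

```python
import array
import sys

SAMPLE_RATE_HZ = 16000   # 16 kHz, as stated above
BYTES_PER_SAMPLE = 2     # 16-bit linear samples


def read_raw_pcm(path, byte_order="little"):
    """Read a headerless 16-bit PCM file into an array of signed integers.

    byte_order is "little" or "big". Which one applies to a given file
    is an assumption the caller must supply.
    """
    if byte_order not in ("little", "big"):
        raise ValueError("byte_order must be 'little' or 'big'")
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % BYTES_PER_SAMPLE != 0:
        raise ValueError("file size is not a multiple of 2 bytes")
    samples = array.array("h")  # signed short
    if samples.itemsize != BYTES_PER_SAMPLE:
        raise RuntimeError("platform 'short' is not 16 bits wide")
    samples.frombytes(data)
    # array uses the machine's native byte order, so swap if the file differs.
    if byte_order != sys.byteorder:
        samples.byteswap()
    return samples


def duration_seconds(samples):
    """Duration of a mono recording sampled at 16 kHz."""
    return len(samples) / SAMPLE_RATE_HZ
```

If the samples look like noise when plotted or played back, the byte order guess is probably wrong, and the other value is worth trying.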

Each of the compressed tar files contains data in one of the following formats:

The contents of each directory are described individually below. At the top level, you will find:

Under wav or feat, you will find: