Q2.1: What sampling do I need for speech?

For recorded speech to be understood by humans you need a sampling rate of 8kHz or more and at least 8-bit sampling. This produces poor quality speech - but it can be understood.
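As a rough illustration of what these settings cost in storage, the sketch below computes the raw data rate of uncompressed, single-channel linear samples. The function name `raw_data_rate_bytes` is hypothetical, and the calculation assumes mono audio with no compression or file header.

```python
def raw_data_rate_bytes(sample_rate_hz, bits_per_sample, channels=1):
    """Bytes per second of uncompressed audio (assumes no header, no compression)."""
    return sample_rate_hz * bits_per_sample * channels / 8

# Minimum intelligible setup versus a higher quality setup.
for rate, bits in ((8000, 8), (16000, 16)):
    per_second = raw_data_rate_bytes(rate, bits)
    print(f"{rate} Hz, {bits} bit: {per_second:.0f} bytes/s, "
          f"{per_second * 60 / 1024:.0f} KiB per minute")
```

Under these assumptions, moving from 8kHz/8-bit to 16kHz/16-bit multiplies the raw data rate by four.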

Improvements can be achieved by increasing the number of bits per sample to 12 bits or 16 bits, or by using a non-linear encoding technique such as mu-law or A-law (see Q2.7). Either approach improves the "signal-to-noise" ratio.
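To give a feel for the numbers, here is a minimal sketch. It uses the standard textbook approximation for the signal-to-noise ratio of a full-scale sine wave under uniform (linear) quantization, roughly 6.02 dB per bit plus 1.76 dB. It also shows the continuous mu-law companding curve with mu = 255. The function names are hypothetical, and the continuous curve is only an approximation of what a real codec does; it is not the segmented 8-bit encoding used by telephone codecs.

```python
import math

def linear_pcm_snr_db(bits):
    """Theoretical SNR in dB for a full-scale sine wave with uniform
    quantization: 20*log10(2**bits) + 10*log10(1.5), about 6.02*bits + 1.76."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def mu_law_compress(x, mu=255.0):
    """Continuous mu-law companding of a sample x in [-1, 1] (mu=255 assumed)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

for bits in (8, 12, 16):
    print(f"{bits:2d} bits: about {linear_pcm_snr_db(bits):.1f} dB")

# Quiet samples are boosted before quantization, so they use more of the
# available code values than they would with linear sampling.
quiet = 0.01
print(f"mu-law maps {quiet} to {mu_law_compress(quiet):.3f}")
```

Under this approximation, 8, 12 and 16 bits give roughly 50, 74 and 98 dB. The companding curve stretches low-level samples, which is why non-linear encoding helps quiet speech at a fixed number of bits.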

Increasing the sampling rate above 8kHz, say to 10kHz, 16kHz or 20kHz, improves the frequency response: the higher the sampling frequency, the more of the high frequency content is preserved. A 16kHz sampling rate is a reasonable target for high quality speech recording and playback.
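The link between sampling rate and frequency response follows from the sampling theorem: a signal sampled at a given rate can only represent frequencies below half that rate (the Nyquist frequency). A minimal sketch, with the hypothetical helper name `nyquist_hz`:

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sampling rate (the Nyquist limit)."""
    return sample_rate_hz / 2.0

for rate in (8000, 10000, 16000, 20000):
    print(f"{rate} Hz sampling: content up to {nyquist_hz(rate):.0f} Hz")
```

In practice the usable bandwidth is somewhat less than this limit, because the anti-aliasing filter applied before sampling needs some room to roll off.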

When doing speech recognition you need to remember that your computer is not as good as your ear, so it will have trouble with poor quality sounds. The choice of an appropriate sampling setup depends very much on the speech recognition task and the amount of computer power available.

Last Revision: 01:53 12-Apr-1996