include/cont_ad.h File Reference

Continuous A/D listening and silence filtering module.

#include <sphinxbase_export.h>
#include <prim_type.h>
#include <stdio.h>

Data Structures

struct  spseg_s
struct  cont_ad_t
 Continuous listening module or object.

Defines

#define CONT_AD_STATE_SIL   0
#define CONT_AD_STATE_SPEECH   1

Typedefs

typedef struct spseg_s spseg_t

Functions

SPHINXBASE_EXPORT cont_ad_t * cont_ad_init (ad_rec_t *ad, int32(*adfunc)(ad_rec_t *ad, int16 *buf, int32 max))
 Initialize a continuous listening/silence filtering object.
SPHINXBASE_EXPORT cont_ad_t * cont_ad_init_rawmode (ad_rec_t *ad, int32(*adfunc)(ad_rec_t *ad, int16 *buf, int32 max))
 Initializes a continuous listening object which simply passes data through (!).
SPHINXBASE_EXPORT int32 cont_ad_read (cont_ad_t *r, int16 *buf, int32 max)
 Read raw audio data into the silence filter.
SPHINXBASE_EXPORT int32 cont_ad_calib (cont_ad_t *cont)
 Calibrate the silence filter.
SPHINXBASE_EXPORT int32 cont_ad_calib_loop (cont_ad_t *r, int16 *buf, int32 max)
 Calibrate the silence filter without an audio device.
SPHINXBASE_EXPORT int32 cont_ad_set_thresh (cont_ad_t *cont, int32 sil, int32 sp)
 Set silence and speech threshold parameters.
SPHINXBASE_EXPORT int32 cont_ad_set_params (cont_ad_t *r, int32 delta_sil, int32 delta_speech, int32 min_noise, int32 max_noise, int32 winsize, int32 speech_onset, int32 sil_onset, int32 leader, int32 trailer, float32 adapt_rate)
 Set the changeable parameters.
SPHINXBASE_EXPORT int32 cont_ad_get_params (cont_ad_t *r, int32 *delta_sil, int32 *delta_speech, int32 *min_noise, int32 *max_noise, int32 *winsize, int32 *speech_onset, int32 *sil_onset, int32 *leader, int32 *trailer, float32 *adapt_rate)
 Get the changeable parameters (PWP 1/14/98).
SPHINXBASE_EXPORT int32 cont_ad_reset (cont_ad_t *cont)
 Reset, discarding any accumulated speech segments.
SPHINXBASE_EXPORT int32 cont_ad_close (cont_ad_t *cont)
 Close the continuous listening object.
SPHINXBASE_EXPORT void cont_ad_powhist_dump (FILE *fp, cont_ad_t *cont)
 Dump the power histogram.
SPHINXBASE_EXPORT int32 cont_ad_detach (cont_ad_t *c)
 Detach the given continuous listening module from the associated audio device.
SPHINXBASE_EXPORT int32 cont_ad_attach (cont_ad_t *c, ad_rec_t *a, int32(*func)(ad_rec_t *, int16 *, int32))
 Attach the continuous listening module to the given audio device/function.
SPHINXBASE_EXPORT int32 cont_ad_set_rawfp (cont_ad_t *c, FILE *fp)
 Set a file for dumping raw audio input.
SPHINXBASE_EXPORT int32 cont_ad_set_logfp (cont_ad_t *c, FILE *fp)
 Set the file to which cont_ad logs its progress.
SPHINXBASE_EXPORT int32 cont_set_thresh (cont_ad_t *r, int32 silence, int32 speech)
 Set the silence and speech thresholds.


Detailed Description

Continuous A/D listening and silence filtering module.

This module is intended to be interposed as a filter between any raw A/D source and the application; its main purpose is to remove regions of silence from the raw input speech. It is initialized with a raw A/D source function (in the cont_ad_init call). The application is responsible for setting up the A/D source and for turning recording on and off as it desires. The application reads the filtered A/D data using the cont_ad_read function.

In other words, the application calls cont_ad_read instead of the raw A/D source function (e.g., ad_read in libad) to obtain filtered A/D data with silence regions removed. This module itself does not enforce any other structural changes to the application.

The cont_ad_read function also updates an "absolute" timestamp (see cont_ad_t.read_ts) at the end of each invocation. The timestamp indicates the total number of samples of A/D data read up to this point, including data discarded as silence frames. The application is responsible for using this timestamp to make any policy decisions, such as where utterance boundaries fall.
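
As one illustration of such a policy decision, the following minimal sketch treats an utterance as finished once a fixed amount of audio time has passed since speech data was last returned. It does not call the library. The helper name utt_timed_out, its parameters, and the use of int32_t in place of SphinxBase's int32 are assumptions made for this example; read_ts stands for the value of cont_ad_t.read_ts after a cont_ad_read call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of cont_ad.h).
 *   read_ts:        current absolute timestamp, in samples
 *   last_speech_ts: timestamp saved when speech data was last returned
 *   sps:            sampling rate, in samples per second
 *   timeout_ms:     amount of trailing silence that ends the utterance
 * Returns 1 if more than timeout_ms of audio has elapsed, 0 otherwise.
 */
int
utt_timed_out(int32_t read_ts, int32_t last_speech_ts,
              int32_t sps, int32_t timeout_ms)
{
    assert(sps > 0 && timeout_ms >= 0);
    assert(read_ts >= last_speech_ts);

    /* Convert the timeout to samples; 64-bit math avoids overflow. */
    int64_t timeout_samples = (int64_t) sps * timeout_ms / 1000;
    return (int64_t) read_ts - last_speech_ts > timeout_samples;
}
```

An application might save read_ts whenever cont_ad_read returns a positive sample count, and call a helper like this whenever it returns 0.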

Definition in file cont_ad.h.


Function Documentation

SPHINXBASE_EXPORT int32 cont_ad_attach ( cont_ad_t *  c,
ad_rec_t *  a,
int32(*)(ad_rec_t *, int16 *, int32)  func 
)

Attach the continuous listening module to the given audio device/function.

(Like cont_ad_init, but without the calibration.)

Returns:
0 if successful, -1 otherwise.

Definition at line 1263 of file cont_ad_base.c.

References cont_ad_t::ad, cont_ad_t::adfunc, cont_ad_attach(), and cont_ad_t::eof.

Referenced by cont_ad_attach().

SPHINXBASE_EXPORT int32 cont_ad_calib ( cont_ad_t *  cont  )

Calibrate the silence filter.

Calibration determines an initial silence threshold. This function can be called any number of times, and should be called at least once immediately after cont_ad_init. The silence threshold is also updated internally from time to time, so the function needs to be called again later only if there is a definite change in the recording environment. The application is responsible for making sure that the raw audio source is turned on before the calibration.

Return value: 0 if successful, <0 otherwise.

Parameters:
cont  In: object pointer returned by cont_ad_init

Definition at line 995 of file cont_ad_base.c.

References cont_ad_t::ad, cont_ad_t::adbuf, cont_ad_t::adfunc, cont_ad_calib(), cont_ad_t::headfrm, cont_ad_t::n_frm, cont_ad_t::pow_hist, cont_ad_t::spf, and cont_ad_t::thresh_update.

Referenced by cont_ad_calib().

SPHINXBASE_EXPORT int32 cont_ad_calib_loop ( cont_ad_t *  r,
int16 *  buf,
int32  max 
)

Calibrate the silence filter without an audio device.

If the application has not passed an audio device into the silence filter at initialisation, this routine can be used to calibrate the filter. The buf (of length max samples) should contain audio data for calibration. This data is assumed to be completely consumed. More than one call may be necessary to fully calibrate. Return value: 0 if successful, <0 on failure, >0 if calibration not complete.
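
The return convention (0 when done, <0 on failure, >0 when more data is needed) suggests a feeding loop like the minimal sketch below. To stay self-contained it does not call cont_ad_calib_loop itself: calib_fn, calibrate_from_buffer and demo_calib are hypothetical names, the callback takes a void * rather than a cont_ad_t *, and int16_t/int32_t stand in for SphinxBase's int16/int32. In a real application a thin wrapper around cont_ad_calib_loop would play the role of the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback with the same return convention as
 * cont_ad_calib_loop: 0 = calibrated, <0 = failure, >0 = not complete. */
typedef int32_t (*calib_fn)(void *filter, int16_t *buf, int32_t max);

/*
 * Feed consecutive chunks of `chunk` samples from `audio` to `calib` until
 * calibration completes, fails, or no full chunk remains.
 * Returns 0 on success, <0 on failure, >0 if the data ran out first.
 */
int32_t
calibrate_from_buffer(calib_fn calib, void *filter,
                      int16_t *audio, int32_t n_samples, int32_t chunk)
{
    assert(calib != NULL && audio != NULL && chunk > 0);

    int32_t status = 1;     /* "not complete" until told otherwise */
    for (int32_t off = 0; off + chunk <= n_samples && status > 0; off += chunk)
        status = calib(filter, audio + off, chunk);
    return status;
}

/* Stand-in for a real filter, for demonstration only: `filter` points to a
 * counter of calls still needed, and 0 is returned once it reaches zero. */
int32_t
demo_calib(void *filter, int16_t *buf, int32_t max)
{
    int32_t *calls_left = (int32_t *) filter;
    (void) buf;
    (void) max;
    return --(*calls_left) > 0 ? 1 : 0;
}
```

A positive result from the loop means the caller should supply more calibration audio before relying on the filter.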

Definition at line 1031 of file cont_ad_base.c.

References cont_ad_t::adbuf, cont_ad_calib_loop(), cont_ad_t::headfrm, cont_ad_t::n_frm, cont_ad_t::pow_hist, and cont_ad_t::spf.

Referenced by cont_ad_calib_loop().

SPHINXBASE_EXPORT int32 cont_ad_detach ( cont_ad_t *  c  )

Detach the given continuous listening module from the associated audio device.

Returns:
0 if successful, -1 otherwise.

Definition at line 1251 of file cont_ad_base.c.

References cont_ad_t::ad, cont_ad_t::adfunc, and cont_ad_detach().

Referenced by cont_ad_detach().

SPHINXBASE_EXPORT int32 cont_ad_get_params ( cont_ad_t *  r,
int32 *  delta_sil,
int32 *  delta_speech,
int32 *  min_noise,
int32 *  max_noise,
int32 *  winsize,
int32 *  speech_onset,
int32 *  sil_onset,
int32 *  leader,
int32 *  trailer,
float32 *  adapt_rate 
)

Get the changeable parameters (PWP 1/14/98).

delta_sil, delta_speech, min_noise, and max_noise are in dB, winsize, speech_onset, sil_onset, leader and trailer are in frames of 16 ms length (256 samples @ 16kHz sampling).

Definition at line 1169 of file cont_ad_base.c.

References cont_ad_t::adapt_rate, cont_ad_get_params(), cont_ad_t::delta_sil, cont_ad_t::delta_speech, cont_ad_t::leader, cont_ad_t::max_noise, cont_ad_t::min_noise, cont_ad_t::sil_onset, cont_ad_t::speech_onset, cont_ad_t::trailer, and cont_ad_t::winsize.

Referenced by cont_ad_get_params().

SPHINXBASE_EXPORT cont_ad_t* cont_ad_init ( ad_rec_t *  ad,
int32(*)(ad_rec_t *ad, int16 *buf, int32 max)  adfunc 
)

Initialize a continuous listening/silence filtering object.

One time initialization of a continuous listening/silence filtering object/module.

Returns:
A pointer to a READ-ONLY structure used in other calls to the object. If any error occurs, the return value is NULL.
Parameters:
ad  In: The A/D source object to be filtered
adfunc  In: Source function to be invoked to obtain raw A/D data. See ad.h for the required prototype definition.

SPHINXBASE_EXPORT cont_ad_t* cont_ad_init_rawmode ( ad_rec_t *  ad,
int32(*)(ad_rec_t *ad, int16 *buf, int32 max)  adfunc 
)

Initializes a continuous listening object which simply passes data through (!).

Like cont_ad_init, but put the module in raw mode; i.e., all data is passed through, unfiltered. (By special request.)

SPHINXBASE_EXPORT void cont_ad_powhist_dump ( FILE *  fp,
cont_ad_t *  cont
)

Dump the power histogram.

For debugging...

Definition at line 229 of file cont_ad_base.c.

References cont_ad_powhist_dump(), cont_ad_t::pow_hist, cont_ad_t::spf, cont_ad_t::sps, and cont_ad_t::tot_frm.

Referenced by cont_ad_powhist_dump().

SPHINXBASE_EXPORT int32 cont_ad_read ( cont_ad_t *  r,
int16 *  buf,
int32  max 
)

Read raw audio data into the silence filter.

The main read routine for reading speech/silence segmented audio data. Audio data is copied into the caller-provided buffer, much like a file read routine. In normal mode, only speech segments are copied; silence segments are dropped. In raw mode (cont_ad module initialized using cont_ad_init_rawmode()), all data are passed through to the caller. In either case, a single call to cont_ad_read never returns data that crosses a speech/silence segment boundary.

The following variables are updated for use by the caller (see cont_ad_t above): cont_ad_t.state, cont_ad_t.read_ts, cont_ad_t.seglen, cont_ad_t.siglvl.

Return value: Number of samples actually read, possibly 0; <0 if EOF on A/D source.

Parameters:
r  In: Object pointer returned by cont_ad_init
buf  Out: On return, buf contains A/D data returned by this function, if any.
max  In: Maximum number of samples to be filled into buf. NOTE: max must be at least 256; otherwise the function returns -1.

Definition at line 707 of file cont_ad_base.c.

References cont_ad_t::ad, cont_ad_t::adbuf, cont_ad_t::adbufsize, cont_ad_t::adfunc, cont_ad_read(), E_ERROR, cont_ad_t::eof, cont_ad_t::frm_pow, cont_ad_t::headfrm, cont_ad_t::leader, cont_ad_t::logfp, cont_ad_t::n_frm, cont_ad_t::n_other, cont_ad_t::n_sample, cont_ad_t::rawfp, cont_ad_t::rawmode, cont_ad_t::read_ts, cont_ad_t::seglen, cont_ad_t::siglvl, cont_ad_t::spf, cont_ad_t::spseg_head, cont_ad_t::spseg_tail, cont_ad_t::state, cont_ad_t::tail_state, cont_ad_t::thresh_sil, cont_ad_t::thresh_speech, cont_ad_t::thresh_update, cont_ad_t::tot_frm, cont_ad_t::win_startfrm, cont_ad_t::win_validfrm, and cont_ad_t::winsize.

Referenced by cont_ad_read().

SPHINXBASE_EXPORT int32 cont_ad_reset ( cont_ad_t *  cont  )

Reset, discarding any accumulated speech segments.

Returns:
0 if successful, <0 otherwise.

Definition at line 1206 of file cont_ad_base.c.

References cont_ad_reset(), cont_ad_t::headfrm, cont_ad_t::n_frm, cont_ad_t::n_other, cont_ad_t::n_sample, cont_ad_t::spseg_head, cont_ad_t::spseg_tail, cont_ad_t::tail_state, cont_ad_t::win_startfrm, and cont_ad_t::win_validfrm.

Referenced by cont_ad_close(), and cont_ad_reset().

SPHINXBASE_EXPORT int32 cont_ad_set_logfp ( cont_ad_t *  c,
FILE *  fp 
)

Set the file to which cont_ad logs its progress.

Mainly for debugging. If fp is NULL, logging is turned off.

Returns:
0 if successful, -1 otherwise.

Definition at line 1330 of file cont_ad_base.c.

References cont_ad_set_logfp(), and cont_ad_t::logfp.

Referenced by cont_ad_set_logfp().

SPHINXBASE_EXPORT int32 cont_ad_set_params ( cont_ad_t *  r,
int32  delta_sil,
int32  delta_speech,
int32  min_noise,
int32  max_noise,
int32  winsize,
int32  speech_onset,
int32  sil_onset,
int32  leader,
int32  trailer,
float32  adapt_rate 
)

Set the changeable parameters.

delta_sil, delta_speech, min_noise, and max_noise are in dB, winsize, speech_onset, sil_onset, leader and trailer are in frames of 16 ms length (256 samples @ 16kHz sampling).
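
The frame-based units follow from simple arithmetic: at 16 kHz, 256 samples last 256/16000 s = 16 ms. The minimal sketch below converts between frames and milliseconds under that assumption. The helper names and the EXAMPLE_SAMPLES_PER_FRAME constant are hypothetical and not part of cont_ad.h, and int32_t stands in for SphinxBase's int32.

```c
#include <assert.h>
#include <stdint.h>

/* Frame size taken from the text above; the macro name is hypothetical. */
#define EXAMPLE_SAMPLES_PER_FRAME 256

/* Duration of n_frames frames, in milliseconds, at sampling rate sps (Hz). */
int32_t
frames_to_ms(int32_t n_frames, int32_t sps)
{
    assert(n_frames >= 0 && sps > 0);
    return (int32_t) ((int64_t) n_frames * EXAMPLE_SAMPLES_PER_FRAME * 1000 / sps);
}

/* Number of whole frames that fit into ms milliseconds (rounds down). */
int32_t
ms_to_frames(int32_t ms, int32_t sps)
{
    assert(ms >= 0 && sps > 0);
    return (int32_t) ((int64_t) ms * sps / (1000 * EXAMPLE_SAMPLES_PER_FRAME));
}
```

For instance, a value of 10 frames for leader or trailer corresponds to 160 ms of audio at 16 kHz.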

Definition at line 1096 of file cont_ad_base.c.

References cont_ad_t::adapt_rate, cont_ad_set_params(), cont_ad_t::delta_sil, cont_ad_t::delta_speech, E_ERROR, cont_ad_t::leader, cont_ad_t::max_noise, cont_ad_t::min_noise, cont_ad_t::sil_onset, cont_ad_t::speech_onset, cont_ad_t::trailer, cont_ad_t::win_validfrm, and cont_ad_t::winsize.

Referenced by cont_ad_set_params().

SPHINXBASE_EXPORT int32 cont_ad_set_rawfp ( cont_ad_t *  c,
FILE *  fp 
)

Set a file for dumping raw audio input.

The application can ask cont_ad to dump the raw audio input that cont_ad processes to a file. Use this function to give the FILE* to the cont_ad object. If invoked with fp == NULL, dumping is turned off. The application is responsible for opening and closing the file. If fp is non-NULL, cont_ad assumes the file pointer is valid and opened for writing.

Returns:
0 if successful, -1 otherwise.

Definition at line 1316 of file cont_ad_base.c.

References cont_ad_set_rawfp(), and cont_ad_t::rawfp.

Referenced by cont_ad_set_rawfp().

SPHINXBASE_EXPORT int32 cont_ad_set_thresh ( cont_ad_t *  cont,
int32  sil,
int32  sp 
)

Set silence and speech threshold parameters.

The silence threshold is the max power level, RELATIVE to the peak background noise level, in any silence frame. Similarly, the speech threshold is the min power level, RELATIVE to the peak background noise level, in any speech frame. In general, silence threshold <= speech threshold. Increasing the thresholds (say, from the default value of 2 to 3 or 4) reduces the sensitivity to background noise, but may also increase the chances of clipping actual speech.

Returns:
0 if successful, <0 otherwise.
Parameters:
cont  In: Object ptr from cont_ad_init
sil  In: silence threshold (default 2)
sp  In: speech threshold (default 2)
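
To make the role of the two relative thresholds concrete, here is a minimal per-frame sketch. It is a simplified illustration and NOT the library's algorithm, which also involves windows and onset counts (see cont_ad_set_params). The function classify_frame, the EX_SIL/EX_SPEECH constants, and the rule of keeping the previous label between the two thresholds are assumptions made for this example; int32_t stands in for SphinxBase's int32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels with the same values as CONT_AD_STATE_SIL/_SPEECH. */
enum { EX_SIL = 0, EX_SPEECH = 1 };

/*
 * Hypothetical per-frame classifier (illustration only).
 *   frame_pow:    power level of the frame
 *   noise_peak:   peak background noise level, in the same units
 *   delta_sil:    silence threshold, relative to noise_peak
 *   delta_speech: speech threshold, relative to noise_peak
 *   prev_state:   label of the previous frame
 * When both thresholds are equal, a frame exactly at the threshold is
 * labelled speech, because the speech test runs first.
 */
int
classify_frame(int32_t frame_pow, int32_t noise_peak,
               int32_t delta_sil, int32_t delta_speech, int prev_state)
{
    assert(delta_sil <= delta_speech);

    if (frame_pow >= noise_peak + delta_speech)
        return EX_SPEECH;   /* at or above the speech threshold */
    if (frame_pow <= noise_peak + delta_sil)
        return EX_SIL;      /* at or below the silence threshold */
    return prev_state;      /* in between: keep the previous label */
}
```

With a noise peak of 20, a frame at power 23 counts as speech when both thresholds are 2, but as silence when both are raised to 4, which mirrors the reduced sensitivity described above.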

Definition at line 1070 of file cont_ad_base.c.

References cont_ad_set_thresh(), cont_ad_t::delta_sil, and cont_ad_t::delta_speech.

Referenced by cont_ad_set_thresh().

SPHINXBASE_EXPORT int32 cont_set_thresh ( cont_ad_t *  r,
int32  silence,
int32  speech 
)

Set the silence and speech thresholds.

For this to remain permanently in effect, the auto_thresh field of the continuous listening module should be set to FALSE or 0. Otherwise the thresholds may be modified by the noise-level adaptation.

Definition at line 1278 of file cont_ad_base.c.

References cont_set_thresh(), cont_ad_t::frm_pow, cont_ad_t::n_other, cont_ad_t::tail_state, cont_ad_t::thresh_sil, cont_ad_t::thresh_speech, cont_ad_t::win_startfrm, and cont_ad_t::win_validfrm.

Referenced by cont_set_thresh().


Generated on Mon Jul 7 22:32:38 2008 for SphinxBase by  doxygen 1.5.5