

7 APIs

Flite is a library that we expect will be embedded into other applications. Included with the distribution is a small example executable that allows synthesis of strings of text and text files from the command line.

7.1 flite binary

The example flite binary may be suitable for very simple applications. Unlike Festival, its start-up time is very short (less than 25ms on a PIII 500MHz), making it practical (on larger machines) to call it each time you need to synthesize something.

flite TEXT OUTPUTTYPE

If TEXT contains a space it is treated as a string of text and converted to speech. If it does not contain a space, TEXT is treated as a file name and the contents of that file are converted to speech. The option -t specifies that TEXT is to be treated as text (not a filename) and -f forces treatment as a file. Thus

flite -t hello 

will say the word "hello" while

flite hello 

will say the content of the file `hello'. Likewise

flite "hello world."

will say the words "hello world" while

flite -f "hello world"

will say the contents of a file `hello world'. If no argument is specified text is read from standard input.
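
The rules above for interpreting TEXT can be summarised in a few lines of C. This is only a sketch of the documented behaviour, not the code in the flite binary; the names input_kind and classify_text_arg are hypothetical, and giving -t precedence when both flags are present is an assumption, since the text does not say what happens in that case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; this mirrors the rules described above,
   not the actual source of the flite binary. */
typedef enum { INPUT_STDIN, INPUT_TEXT, INPUT_FILE } input_kind;

/* text:       the TEXT argument, or NULL if none was given
   force_text: non-zero if -t was given
   force_file: non-zero if -f was given */
input_kind classify_text_arg(const char *text, int force_text, int force_file)
{
    if (text == NULL)
        return INPUT_STDIN;   /* no argument: read standard input */
    if (force_text)
        return INPUT_TEXT;    /* -t: always a string of text (assumed to win over -f) */
    if (force_file)
        return INPUT_FILE;    /* -f: always a file name */

    /* default: a space means text, otherwise a file name */
    return strchr(text, ' ') != NULL ? INPUT_TEXT : INPUT_FILE;
}
```

With this sketch, "hello" on its own is classified as a file name, while "hello world." is classified as text.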

The second argument OUTPUTTYPE is the name of a file the output is written to, or if it is play then the output is played to the audio device directly. If it is none then the audio is created but discarded; this is used for benchmarking. If it is stream then the audio is streamed through a call back function (though this is not particularly useful in the command line version). If OUTPUTTYPE is omitted, play is assumed. You can also explicitly set the output type with the -o flag.

flite -f doc/alice -o alice.wav
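
An application that wraps its own output handling may want to make the same distinction between the special OUTPUTTYPE values and a file name. The following is a minimal sketch of that dispatch, under the assumption that the three special names are matched exactly; output_kind and classify_outtype are hypothetical names and are not part of Flite.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; mirrors the OUTPUTTYPE rules described above. */
typedef enum { OUT_PLAY, OUT_NONE, OUT_STREAM, OUT_FILE } output_kind;

/* outtype: the OUTPUTTYPE argument, or NULL if it was omitted */
output_kind classify_outtype(const char *outtype)
{
    if (outtype == NULL || strcmp(outtype, "play") == 0)
        return OUT_PLAY;      /* omitted or "play": send to the audio device */
    if (strcmp(outtype, "none") == 0)
        return OUT_NONE;      /* synthesize but discard (benchmarking) */
    if (strcmp(outtype, "stream") == 0)
        return OUT_STREAM;    /* deliver through a call back function */
    return OUT_FILE;          /* anything else is taken as a file name */
}
```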

7.2 Voice selection

All the voices in the distribution are collected into a single simple list in the global variable flite_voice_list. You can select a voice from this list from the command line

flite -voice awb -f doc/alice -o alice.wav

And list which voices are currently supported in the binary with

flite -lv

The voices that get linked together are those listed in VOICES in `main/Makefile'. You can change that list as you require.

7.3 C example

Each voice in Flite is held in a structure, a pointer to which is returned by the voice registration function. In the standard distribution, the example diphone voice is cmu_us_kal.

Here is a simple C program that uses the flite library

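#include <stdio.h>
#include <stdlib.h>
#include "flite.h"

/* Voice registration function provided by the cmu_us_kal voice library. */
cst_voice *register_cmu_us_kal(const char *voxdir);

int main(int argc, char **argv)
{
    cst_voice *v;

    if (argc != 2)
    {
        fprintf(stderr,"usage: flite_test FILE\n");
        exit(-1);
    }

    /* Must be called before any other flite function. */
    flite_init();

    v = register_cmu_us_kal(NULL);

    /* Synthesize the file and play it on the audio device. */
    flite_file_to_speech(argv[1],v,"play");

    return 0;
}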

Assuming the shell variable FLITEDIR is set to the flite directory, the following will compile the system (with appropriate changes for your platform if necessary).

gcc -Wall -g -o flite_test flite_test.c -I$FLITEDIR/include -L$FLITEDIR/lib \
    -lflite_cmu_us_kal -lflite_usenglish -lflite_cmulex -lflite -lm

7.4 Public Functions

Although you are of course welcome to call lower level functions, there are a few key functions that will satisfy most users of flite.

void flite_init(void);
This must be called before any other flite function can be called. As of Flite 1.1, it actually does nothing at all, but there is no guarantee that this will remain true.
cst_wave *flite_text_to_wave(const char *text,cst_voice *voice);
Returns a waveform (as defined in `include/cst_wave.h') synthesized from the given text string by the given voice.
float flite_file_to_speech(const char *filename, cst_voice *voice, const char *outtype);
synthesizes all the sentences in the file `filename' with the given voice. Output (at present) can only reasonably be play or none. If the feature file_start_position is set to an integer, that point is used as the start position in the file to be synthesized.
float flite_text_to_speech(const char *text, cst_voice *voice, const char *outtype);
synthesizes the text in the string pointed to by text, with the given voice. outtype may be a filename where the generated waveform is written to, or "play" and it will be sent to the audio device, or "none" and it will be discarded. The return value is the number of seconds of speech generated.
cst_utterance *flite_synth_text(const char *text,cst_voice *voice);
synthesizes the given text with the given voice and returns an utterance from it for further processing and access.
cst_utterance *flite_synth_phones(const char *phones,cst_voice *voice);
synthesizes the given phones with the given voice and returns an utterance from it for further processing and access.
cst_voice *flite_voice_select(const char *name);
returns a pointer to the voice named name. Will return NULL if there is no match. If name == NULL then the first voice in the voice list is returned.
int flite_voice_add_lex_addenda(cst_voice *v, const cst_string *lexfile);
loads the pronunciations from lexfile into the lexicon identified in the given voice (which will cause all other voices using that lexicon to also get this new addenda list). An example lexicon file is given in `flite/tools/examples.lex'. Words may be in double quotes, and an optional part of speech tag may be given. A colon separates the headword/postag from the list of phonemes. Stress values (if used in the lexicon) must be specified. Bad phonemes will be complained about on standard out.
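
The lookup rules given for flite_voice_select (first voice when name is NULL, NULL when nothing matches) can be illustrated with a small self-contained sketch. It uses stand-in types so that it compiles without Flite; demo_voice, demo_voice_node and demo_voice_select are hypothetical names, and the code follows the description above rather than Flite's own source.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in types for illustration only; real code would use
   cst_voice and the list held in flite_voice_list. */
typedef struct demo_voice {
    const char *name;
} demo_voice;

typedef struct demo_voice_node {
    const demo_voice *voice;
    const struct demo_voice_node *next;
} demo_voice_node;

/* Returns the first voice if name is NULL, the voice whose name
   matches otherwise, or NULL if there is no match. */
const demo_voice *demo_voice_select(const demo_voice_node *list,
                                    const char *name)
{
    const demo_voice_node *n;

    if (list == NULL)
        return NULL;            /* empty list: nothing to select */
    if (name == NULL)
        return list->voice;     /* NULL name: first voice in the list */

    for (n = list; n != NULL; n = n->next)
        if (strcmp(n->voice->name, name) == 0)
            return n->voice;

    return NULL;                /* no voice with that name */
}
```

Because a failed lookup yields NULL, a caller would normally check the result and, if it wants a default, ask again with a NULL name before passing the voice on to a synthesis function.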

7.5 Streaming Synthesis

In 1.4 support was added for streaming synthesis. Basically you may provide a call back function that will be called with waveform data as soon as it is available. This can potentially reduce the delay between sending text to the synthesizer and having audio available.

The support is through a call back function of type

int audio_stream_chunk(const cst_wave *w, int start, int size, 
                       int last, void *user)

If the utterance feature streaming_info is set (it can be set in a voice or in an utterance), the LPC or MLSA resynthesis functions will call the provided function as buffers become available. The LPC and MLSA waveform synthesis functions are used for diphone, limited domain, unit selection and clustergen voices. Note that explicit support is required for streaming, so new waveform synthesis functions may not have the functionality.

An example streaming function is provided in `src/audio/au_streaming.c' and is used by the example flite main program when stream is given as the playing option. (Though in the command line program the function isn't really useful.)

In order to use streaming you must provide a call back function in your particular thread. This is done by adding features to the voice in your thread. Suppose your function was declared as

int example_audio_stream_chunk(const cst_wave *w, int start, int size, 
                       int last, void *user)

You can add this function as the streaming function through the statement

     cst_audio_streaming_info *asi;
...
     asi = new_audio_streaming_info();
     asi->asc = example_audio_stream_chunk;
     feat_set(voice->features,
             "streaming_info",
             audio_streaming_info_val(asi));

You may also include your own pointer to any additional information you want passed to your function. For example

typedef struct my_callback_struct {
   cst_audiodev *fd;
   int count;
} my_callback_struct;
cst_audio_streaming_info *asi;
my_callback_struct *mcs;

...

mcs = cst_alloc(my_callback_struct,1);
mcs->fd=NULL;
mcs->count=1;

asi = new_audio_streaming_info();
asi->asc = example_audio_stream_chunk;
asi->userdata = mcs;
feat_set(voice->features,
         "streaming_info",
         audio_streaming_info_val(asi));
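
To show how a call back and its user data fit together, here is a minimal self-contained sketch of a function with the same argument list as audio_stream_chunk. So that it compiles without Flite, it uses a stand-in demo_wave type in place of cst_wave and a small demo_deliver driver in place of the synthesizer; both are hypothetical, as are chunk_stats and counting_stream_chunk. The sketch also assumes that a return value of 0 means "continue"; check `include/cst_audio.h' for the values your version of Flite expects.

```c
#include <stddef.h>

/* Stand-in for cst_wave so the sketch compiles on its own;
   real code would include "flite.h" and use cst_wave. */
typedef struct demo_wave {
    int sample_rate;
    int num_samples;
    const short *samples;
} demo_wave;

/* Hypothetical user data: running totals kept across calls. */
typedef struct chunk_stats {
    int chunks;      /* number of call backs received */
    long samples;    /* total samples delivered so far */
    int finished;    /* set once the last chunk has arrived */
} chunk_stats;

/* Same argument list as audio_stream_chunk. */
int counting_stream_chunk(const demo_wave *w, int start, int size,
                          int last, void *user)
{
    chunk_stats *st = (chunk_stats *)user;

    /* A real call back would consume w->samples[start] up to
       w->samples[start + size - 1] here. */
    (void)w;
    (void)start;

    st->chunks += 1;
    st->samples += size;
    if (last)
        st->finished = 1;
    return 0;   /* assumed to mean "continue" */
}

/* Hypothetical driver standing in for the synthesizer: hands the
   waveform to the call back in pieces of at most chunk_size samples. */
void demo_deliver(const demo_wave *w, int chunk_size,
                  int (*cb)(const demo_wave *, int, int, int, void *),
                  void *user)
{
    int start;

    for (start = 0; start < w->num_samples; start += chunk_size) {
        int size = w->num_samples - start;
        int last;

        if (size > chunk_size)
            size = chunk_size;
        last = (start + size >= w->num_samples);
        if (cb(w, start, size, last, user) != 0)
            break;   /* call back asked to stop */
    }
}
```

Delivering a 10 sample waveform in pieces of 4 calls the function three times, with last set only on the third call. In a real program the chunk_stats pointer would be stored in asi->userdata, as mcs is above.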

