Public release of Haitian Creole language data by Carnegie Mellon
The Language Technologies Institute (LTI) of Carnegie Mellon
University's School of Computer Science (CMU SCS) is making publicly
available the Haitian Creole spoken and text data that we have
collected or produced.  We are providing this data with minimal
restrictions (see license) in order to allow others to develop language
technology for Haiti, in parallel with our own efforts to help with this
crisis.
Because organizing the data in a useful form takes time, and
collaborators are still producing more text data, we will publish the
data on the web incrementally as it becomes available.
Orthography
Note that several spelling systems exist for Haitian Creole.
Here we use the official Haitian orthography for Haitian Creole,
established by the IPN (Institut Pédagogique National) in 1979.
Haitian Creole data
Data License
Haitian Creole Speech data
Update: Directory reorganized, ASR models added, noon EST on 28 January 2010.
Update: Additional speech data added (data2), 12:45pm EST on 2 February 2010.
Update: Added detailed description of speech data collection methodology on
24 March 2010.
Speech data originally collected by the U.S. DARPA-funded DIPLOMAT project
Haitian Creole Text data
Update: There is an important update to this directory as of 1 p.m. EST on
27 January 2010.  Please re-visit if you have used this data.
Various text data, including:
- Medical domain phrases and sentences collected at Carnegie Mellon
  under the U.S. NSF-funded (jointly with the E.U.) NESPOLE! project,
  and translated into Haitian Creole by Eriksen Translations Inc.
- Parallel text data created by the U.S. DARPA-funded DIPLOMAT project
Acknowledgements
In addition to the members of the projects cited above, we thank:
  Jeff Allen, SAP (formerly of Carnegie Mellon)
  Vigdis Eriksen, Eriksen Translations Inc.
  Manuel Stoeckl, Eriksen Translations Inc.
  Karen Wallace
and these current Carnegie Mellon members:
  Vamshi Ambati
  Gopala Krishna Anumanchipalli
  Alan W Black
  Ralf Brown
  Jaime Carbonell
  Robert Frederking
  Greg Hanneman
  Sanjika Hewavitharana
  David Huggins-Daines
  Alon Lavie
  Stephan Vogel 
Contact for this page: Robert Frederking