Public release of Haitian Creole language data by Carnegie Mellon
The Language Technologies Institute (LTI) of Carnegie Mellon
University's School of Computer Science (CMU SCS) is making publicly
available the Haitian Creole spoken and text data that we have
collected or produced. We are providing this data with minimal
license restrictions in order to
allow others to develop language technology
for Haiti, in parallel with our own efforts to help with this crisis.
Since organizing the data in a useful fashion is not instantaneous,
and more text data is currently being produced by collaborators, we
will be publishing the data incrementally on the web as it becomes
available.
Note that several spelling systems exist for Haitian Creole.
Here we use the official orthography for Haitian Creole established
by the IPN (Institut Pédagogique National) in 1979.
Haitian Creole data
Haitian Creole Speech data
Directory reorganized, ASR models added, noon EST on 28 January 2010.
Additional speech data added (data2), 12:45pm EST on 2 February 2010.
Added detailed description of speech data collection methodology on 24
Speech data originally collected by the U.S. DARPA-funded DIPLOMAT project
Haitian Creole Text data
There is an important update to this directory as of 1 p.m. EST on
27 January 2010. Please re-visit if you have used this data.
Various text data, including:
- Medical domain phrases and sentences collected at Carnegie
Mellon under the U.S. NSF-funded (jointly with the E.U.)
NESPOLE! project, and
translated into Haitian Creole by Eriksen Translations Inc.
- Parallel text data created by the U.S. DARPA-funded DIPLOMAT project
In addition to the members of the projects cited above, the following people contributed:
Jeff Allen, SAP (formerly of Carnegie Mellon)
Vigdis Eriksen, Eriksen Translations Inc.
Manuel Stoeckl, Eriksen Translations Inc.
and these current Carnegie Mellon members:
Gopala Krishna Anumanchipalli
Alan W Black
Contact for this page: Robert Frederking