The tone and break-index tiers represent the core prosodic analysis. The tone tier is the part of the transcription that corresponds most closely to a phonological analysis of the utterance's intonation pattern. It consists of labels for distinctive pitch events, transcribed as a sequence of high (H) and low (L) tones marked with diacritics indicating their intonational function as parts of pitch accents or as phrase tones marking the edges of two types of intonationally marked prosodic units. The inventory of pitch events and their definitions are based on autosegmental analyses, in particular the analysis of Pierrehumbert and her colleagues (see Pierrehumbert & Hirschberg, 1990, and the references cited in it) with some modifications toward such alternative analyses as that of Ladd (1983). In example utterance <<jam1>>, there is a production of the question "Will you have marmalade, or jam?" with two pitch accents (the L* tones), two phrase accents (the H- tones), and a H% boundary tone.
EXAMPLE <<jam1>>: Will you have marmalade, or jam? L* H- L* H-H%
EXAMPLE <<jam1>>: Will you have marmalade, or jam? 1 1 1 3 1 4
The miscellaneous tier, like the orthographic tier, can include many events that are arguably not part of prosody per se. However, many events that are typically marked on this tier are important for interpreting the analyses on the tone tier and break-index tier, because they disrupt the smooth rhythm of the utterance or interrupt the intonation contour. This tier is essentially a `comment' tier that can be used to mark events such as the cough in example utterance <<cough>>. With very few exceptions (most notably, the label `disfl', which often stands alone to flag the occurrence of a perceived disfluency of some type), labels on this tier come in pairs, to mark the beginning and end of each event interval. If it were not for the disruption of the cough labelled on the miscellaneous tier here, the tone transcription would have to be parsed as either unfinished or ill-formed.
EXAMPLE <<cough>>: Will you have marmalade ... L* L* 1 1 1 1p cough< cough>
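The pairing convention for miscellaneous-tier labels lends itself to a simple mechanical check. The following is a minimal sketch in Python, not part of any ToBI tool: it assumes that an interval opens with a label ending in `<` and closes with the same name ending in `>`, as in the `cough<` and `cough>` labels above, and that `disfl` is the only label allowed to stand alone. The function name `unpaired_misc_labels` is hypothetical.

```python
def unpaired_misc_labels(labels, standalone=("disfl",)):
    """Return miscellaneous-tier labels that lack a matching partner.

    Assumption: an event interval opens with 'name<' and closes with
    'name>' (as in 'cough<' ... 'cough>'); labels listed in
    `standalone` may occur on their own.
    """
    open_events = []  # names of intervals opened but not yet closed
    problems = []
    for label in labels:
        if label in standalone:
            continue
        if label.endswith("<"):
            open_events.append(label[:-1])
        elif label.endswith(">"):
            name = label[:-1]
            if name in open_events:
                open_events.remove(name)
            else:
                problems.append(label)  # close without a preceding open
        else:
            problems.append(label)  # neither paired nor a known standalone
    # Anything still open at the end never received its closing label.
    problems.extend(name + "<" for name in open_events)
    return problems


print(unpaired_misc_labels(["cough<", "cough>"]))  # well-formed pair
print(unpaired_misc_labels(["cough<", "disfl"]))   # missing 'cough>'
```

A real checker would also want to consult label times, which this sketch ignores.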
The categorical aspects of prosody which we try to capture completely (by the first principle) are of two types. The first is the prosodic structure -- the rhythm of more and less stressed words alternating with each other, and the grouping of words into prosodic constituents of various sizes -- and the second is the intonation pattern -- the sequence of contrastive pitch events that we call pitch accents, phrase accents, and boundary tones.
An example of the noncategorical aspects of prosody which we leave out (in accordance with the second principle) is the local tempo of each word in the utterance, which we feel could be more accurately and directly captured by some quantitative measure such as normalized segment duration (e.g., Campbell, 1992) than by any symbolic transcription such as an arbitrary division into, say, categories `1', `2', and `3' (for `slow', `medium', and `fast' tempi). An exception to this principle is the marking for each phrase of the point of highest fundamental frequency associated with an accent (HiF0), which we use as a measure of pitch range in order to facilitate research on the relationship between pitch range and discourse structure (see, e.g., Grosz & Hirschberg, 1992, and references therein). We anticipate being able to do away with this marking when we have developed automatic tools for detecting accent-related peaks directly from the fundamental frequency contour in conjunction with the tone tier transcription.
A categorical aspect of prosody which we leave out (in accordance with the third principle) because it should be fairly predictable is the marking of the stressed and unstressed syllables within each word. By this level of stress we mean the word-internal alternation between more and less stressed syllables where the relative prominence of any pair of syllables is fairly fixed and can be thought of as inherent to the word's dictionary entry. For example, if the first and third syllables in the word "marmalade" are not pronounced with more prominence than the second, native speakers will judge the vowels in these two syllables to be mispronounced. (That is, the first and third syllables should not have reduced vowels, whereas the second one should.) Since such word-internal rhythms are thus a fixed part of the word's pronunciation, we leave this specification out. That is, for example, in the transcription of utterances <<jam1>> and <<cough>>, we have not marked the first and third syllables as relatively more stressed than the second syllable, since this aspect of the prosodic structure would be marked in any dictionary entry for the word, so that users of ToBI-transcribed databases could interface the orthographic tier with an online dictionary to fill in this information.
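Interfacing the orthographic tier with a dictionary might look something like the following Python sketch. Everything here is illustrative: the one-entry `LEXICAL_STRESS` table and the function `lexical_stress` are hypothetical, and the `full`/`reduced` coding simply restates the description of "marmalade" given above rather than the format of any actual pronouncing dictionary.

```python
# Hypothetical one-entry lexicon; a real application would load a full
# pronouncing dictionary instead. The coding follows the text above:
# first and third syllables unreduced, second syllable reduced.
LEXICAL_STRESS = {"marmalade": ("full", "reduced", "full")}


def lexical_stress(orthographic_tier, dictionary=LEXICAL_STRESS):
    """Pair each word label with its dictionary stress pattern.

    Words missing from the dictionary are paired with None.
    """
    result = []
    for word in orthographic_tier:
        # Strip punctuation carried over from the orthographic label.
        key = word.strip(".,?!'\"`").lower()
        result.append((word, dictionary.get(key)))
    return result


for word, pattern in lexical_stress(["Will", "you", "have", "marmalade,"]):
    print(word, pattern)
```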
The marking of stress -- Pitch accents and prominence
Example utterance <<made1>> illustrates the unpredictability of prominences above the word, with three different productions of the same sentence -- "Marianna made the marmalade" -- each of which has a different stress pattern. In the first production, there are two syllables that are relatively more prominent than any other, the accented syllables in the words "Marianna" and "marmalade". In the second production of the sentence, on the other hand, there is only the one relatively more prominent syllable in "Marianna", and "marmalade" has been `deaccented'. This level of stress is marked in the ToBI system by directly transcribing the pitch accent on the tone tier. Thus, in the transcription of the first production in the example, there are H* accents marked for both "Marianna" and "marmalade", whereas in the second production there is only the L+H* accent marked on "Marianna". (The third production, like the first, also has accents on "Marianna" and "marmalade", but it has a different stress pattern because both of these accents are nuclear stresses, whereas in the first production only "marmalade" has a nuclear accent. We will describe this higher level of stress in more detail in the next subsection.)
EXAMPLE <<made1>>: Marianna made the marmalade. in three productions 1) H* H* L-L% 2) L+H* L-L% 3) L+H*L-H% L* H* L-L%
EXAMPLE <<made2>>: Marianna made the marmalade. in four productions 1) H* H* L-L% 2) L+H* L-L% 3) L+H* !H* L-L% 4) H* L-L%
EXAMPLE <<made3>>: Marianna made the marmalade. in four productions 1) L+H* !H* L-L% 2) H* L-L% 3) L* L* H-H% 4) L* H-H%
The marking of stress -- Intonational phrasing and prominence
EXAMPLE <<made1>>: Marianna made the marmalade. in three productions 1) H* H* L-L% 1 1 1 4 2) L+H* L-L% 1 1 1 4 3) L+H*L-H% L* H* L-L% 4 1 1 4
An intonation phrase contains one or more intermediate phrases, and the end of an intonation phrase is by definition also the end of an intermediate phrase (break index 3). This fact is reflected on the tone tier in the requirement that there be a sequence of a phrase accent (for the last intermediate phrase) followed by a boundary tone at the end of every intonation phrase. The last production of the sentence in <<made1>> illustrates this nicely with clear reflexes of the tone string in the fundamental frequency contour. Note first the fall from the peak for the L+H* nuclear pitch accent to the L- phrase accent for the first intermediate phrase, followed by the small rise in fundamental frequency to the H% boundary tone at the intonation phrase boundary.
Utterance <<insert>> illustrates the next lower level of disjuncture, that between two intermediate phrases that are grouped into one intonation phrase. In the second production of the sentence "`I' means insert", there is a fall from a H* nuclear accent into a L- phrase accent, but there is no subsequent boundary tone, since this is not an intonation phrase boundary.
EXAMPLE <<insert>> -- `I' means insert. in two productions 1) H* H* L-L% 1 1 4 2) H* L- H* L-L% 3 1 4
EXAMPLE <<made4>>: Marianna made the marmalade. in two productions 1) L* *? L* H-H% 1 1 1 4 2) L* H- L* H-H% 3 1 1 4
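The correspondence between break indices and phrase tones can be stated as a small consistency check. The Python sketch below is one reading of the requirements described above, not the logic of any existing checker: a break index of 4 calls for a phrase accent plus a boundary tone, a break index of 3 calls for a phrase accent with no boundary tone, and lower break indices should carry neither. The function name `check_phrase_tones`, the per-word data layout, and the alignment of tones to words in the example are assumptions; break-index diacritics such as `1p` and uncertain labels such as `*?` are not handled.

```python
import re

PHRASE_ACCENT = re.compile(r"[HL]-")  # L-, H-, also inside L-L% or H-H%
BOUNDARY_TONE = re.compile(r"[HL]%")  # L%, H%, also inside L-L% or H-H%


def check_phrase_tones(words):
    """Check (word, break_index, tone_labels) triples for consistency.

    Returns a list of (word, message) pairs, empty if nothing is wrong.
    """
    problems = []
    for word, break_index, tones in words:
        has_accent = any(PHRASE_ACCENT.search(t) for t in tones)
        has_boundary = any(BOUNDARY_TONE.search(t) for t in tones)
        if break_index == 4 and not (has_accent and has_boundary):
            problems.append((word, "break 4 needs a phrase accent and a boundary tone"))
        elif break_index == 3 and (not has_accent or has_boundary):
            problems.append((word, "break 3 needs a phrase accent and no boundary tone"))
        elif break_index < 3 and (has_accent or has_boundary):
            problems.append((word, "phrase tones need a break index of 3 or 4"))
    return problems


# Second production of <<insert>>, with tones assigned to words by hand.
insert_2 = [("I", 3, ["H*", "L-"]), ("means", 1, []), ("insert", 4, ["H*", "L-L%"])]
print(check_phrase_tones(insert_2))
```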
The conventions for placing labels when using the waves(tm) labelling system are prescribed in the ToBI Annotation Conventions so that labellers can use tools such as John Pitrelli's checker program to check for inadvertent omissions and grammatical errors. To quickly summarize, the break index label is placed at or just after the word label. Phrase accent and boundary tone labels are placed on or just before the corresponding 3 or 4 break index label. Pitch accents are placed somewhere within the accented syllable, preferably within the interval that can be identified with the syllable's vowel.
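As an illustration of the kind of placement check such a program might perform, the following Python sketch tests whether every phrase accent or boundary tone label sits on or just before a 3 or 4 break index label. It is not a description of Pitrelli's checker. The function name `unanchored_phrase_tones`, the 0.05-second tolerance used to interpret "just before", and the label times in the example are all hypothetical.

```python
def unanchored_phrase_tones(tone_labels, break_labels, tolerance=0.05):
    """Find phrase-tone labels with no 3 or 4 break label close by.

    tone_labels:  list of (time_in_seconds, label), e.g. (1.20, "L-L%")
    break_labels: list of (time_in_seconds, break_index)
    tolerance:    assumed maximum gap (seconds) for "just before"
    """
    boundary_times = [t for t, index in break_labels if index in (3, 4)]
    problems = []
    for time, label in tone_labels:
        if "-" not in label and "%" not in label:
            continue  # pitch accents are placed within syllables, not at breaks
        # The tone label must fall at, or shortly before, some boundary.
        if not any(0.0 <= bt - time <= tolerance for bt in boundary_times):
            problems.append((time, label))
    return problems


# Hypothetical label times, for illustration only.
tones = [(0.50, "H*"), (1.20, "L-L%")]
breaks = [(0.55, 1), (0.80, 1), (1.22, 4)]
print(unanchored_phrase_tones(tones, breaks))
```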
In the non-waves(tm) transcription conventions, the orthographic, tone, and break index labels are ordered within each line so that such a transcription could be generated fairly quickly by merging and sorting a set of waves(tm)-format label files.
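The merge-and-sort step can be sketched in Python as follows. The sketch assumes that each waves(tm)-format label file has already been parsed into a list of (time, label) pairs; the parsing itself and the exact within-line ordering of the non-waves(tm) conventions are not reproduced here. The function name `merge_tiers`, the tier names, and the label times are hypothetical.

```python
import heapq


def merge_tiers(**tiers):
    """Merge per-tier (time, label) lists into one time-ordered list.

    Each result item is (time, tier_name, label). Labels with equal
    times are ordered by tier name, an arbitrary but stable choice.
    """
    tagged = (
        [(time, name, label) for time, label in sorted(entries)]
        for name, entries in tiers.items()
    )
    # heapq.merge combines already-sorted inputs without re-sorting.
    return list(heapq.merge(*tagged))


# Hypothetical times for the second production of <<insert>>.
words = [(0.30, "I"), (0.60, "means"), (1.10, "insert")]
tones = [(0.20, "H*"), (0.29, "L-"), (0.95, "H*"), (1.09, "L-L%")]
breaks = [(0.30, "3"), (0.60, "1"), (1.10, "4")]

merged = merge_tiers(word=words, tone=tones, break_index=breaks)
for time, tier, label in merged:
    print(f"{time:.2f}\t{tier}\t{label}")
```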