The other core part of the prosodic transcription proper is the break index tier. If we think of the tone tier as a marking of the speech signal mediated primarily by our interpretation of the analysis of the f0 contour, the analogous way to think of the break index tier is as a marking of the speech signal as mediated primarily by the rhythmic and segmental analysis implicit in the orthographic tier. The summary statement of ToBI conventions describes this relationship as follows:
Thus, the events on the break index tier are labels of the utterance's prosodic grouping -- that is, each label denotes a boundary of some kind of constituent which ends at the word that the transcriber has marked on the orthographic tier. The convention for placing break index marks in a waves(tm) label file is that the number should be associated with a point in time at the end of the marked word as indicated by the label in the orthographic tier. It should be located exactly at, or slightly to the right of, this word marker, so that break indices can be unambiguously associated with other tiers.
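As a rough illustration of this placement convention, the sketch below pairs each break index with the word whose end marker it sits at or just to the right of. The (time, label) tuple layout, the function name `attach_breaks`, the 0.02 s tolerance, and the time stamps in the usage lines are all assumptions made for this illustration; none of them comes from the ToBI conventions or from the waves(tm) label file format.

```python
from typing import List, Optional, Tuple

TimedLabel = Tuple[float, str]  # (time in seconds, label text); hypothetical layout


def attach_breaks(
    words: List[TimedLabel],
    breaks: List[TimedLabel],
    tolerance: float = 0.02,  # assumed meaning of "slightly to the right"
) -> List[Tuple[str, Optional[str]]]:
    """Pair each word with the break label at, or just after, its end marker."""
    paired = []
    for end_time, word in words:
        match = None
        for break_time, label in breaks:
            # Accept a label exactly at the word marker or up to
            # `tolerance` seconds to its right, never to its left.
            if 0.0 <= break_time - end_time <= tolerance:
                match = label
                break
        paired.append((word, match))
    return paired


# Invented time stamps for the words of "What kinds of planes..."
words = [(0.30, "What"), (0.55, "kinds"), (0.62, "of"), (1.10, "planes")]
breaks = [(0.30, "1"), (0.56, "0"), (0.63, "1"), (1.10, "4")]
for word, label in attach_breaks(words, breaks):
    print(word, label)
```

A word that comes back paired with `None` has no break index close enough to its marker, which is usually a sign that a label was misplaced or left out.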
There are 5 break indices, numbered 0 through 4, roughly in order of lesser to greater degree of perceived separation between the marked word and following material. The break indices are meant to be a label of the SUBJECTIVE strength of the boundary. However, this does not mean that there are no objective criteria for marking the boundaries, or that the five labels form a uniform five-point scale. For example, the lowest-level break index (0) is defined in terms of connected speech processes, such as the flapping of word-final /t/ and /d/ before a following vowel-initial word in many American and Australian dialects, processes that prosodically group words together into `clitic groups' -- larger compound-word-like constituents above the level of the word (see Section 3.2). At the other end of the scale, the two highest break indices (3 and 4) are defined in relationship to the prosodic constituents (intermediate phrases and intonation phrases) that are assumed by the marking of phrase accents and boundary tones on the tone tier (see Section 3.3).
Mainstream phonological theory might lead us to expect that these intonational constituents and the lower-level clitic group constituents will form a strict hierarchy (see Selkirk, 1980; Nespor & Vogel, 1986). The numerical scale of break index values reflects a mild bias in favor of such strictly hierarchical models (see the discussion in Price et al. 1991). Rather than building the expectation rigidly into its transcriptions, however, ToBI provides two regular mechanisms for denoting mismatches between different cues to subjective boundary strength. First, break index 2 denotes a mismatch between the constituency prescribed by the tonal transcription and the sense of disjuncture due to pauses and pause-like phenomena (see Section 3.4). Second, there is a diacritic `p' that can be appended to break indices 1, 2, and 3 to convey some sort of prosodic disfluency -- for example, an abrupt cutoff after a false start or a perceptible prolongation or pause which sounds as if the speaker were hesitating while searching for the next word (see Section 3.5). These two provisions should allow transcribers to avoid the circularity of basing a theory about the nature of the prosodic hierarchy upon the transcription of databases that might be used to explore such issues as the relationship between intonational constituents and pause (see, e.g., Woodbury, 1993, who proposed that pauses can be placed independently of intonational boundaries when the discourse structure requires the indication of competing segmentation strategies for topic structure versus rhetorical structure).
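The label inventory described so far (five indices, plus a `p' that may attach only to 1, 2, and 3) can be sketched as a small validating parser. The names `BreakLabel` and `parse_break_label` are hypothetical, and the sketch deliberately leaves out other marks used elsewhere in this guide, such as the `-' uncertainty diacritic.

```python
import re
from typing import NamedTuple


class BreakLabel(NamedTuple):
    index: int       # 0 through 4
    disfluent: bool  # True when the 'p' diacritic is present


# One digit from 0 to 4, optionally followed by 'p'.
_LABEL = re.compile(r"([0-4])(p?)")


def parse_break_label(text: str) -> BreakLabel:
    """Parse a label such as '1', '4' or '3p'; reject anything else."""
    match = _LABEL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a break index label: {text!r}")
    index = int(match.group(1))
    disfluent = match.group(2) == "p"
    # The 'p' diacritic is only defined for break indices 1, 2 and 3.
    if disfluent and index not in (1, 2, 3):
        raise ValueError(f"'p' cannot be appended to break index {index}")
    return BreakLabel(index, disfluent)


print(parse_break_label("3p"))
```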
Except in more deliberate speech styles, such as the information-packed style of radio news announcers, the break index value that will be encountered most frequently is probably 1. The ToBI conventions define break index 1 negatively, as the label to be used for "most phrase-medial word boundaries", as contrasted with the marked phrase-medial cases transcribed by break index 0. Break index
EXAMPLE <<understand>>:
  I'm simply trying to get you to understand.
  1 3 0 3 0 0 3 4

EXAMPLE <<kinds-v>>:
  What kinds of planes...
  1 0 1 4
Break indices 0 and 1 form a natural progression with indices 3 and 4. The latter two break index strengths are equated with the intonational categories of intermediate (intonation) phrase and (full) intonation phrase, respectively. Thus, whenever the tonal analysis indicates an L- or H- phrase accent, the transcriber should decide where the intermediate phrase marked by this tone label ends, and place a 3 on the break index tier to align with the orthographic label for the last word in the intermediate phrase. Similarly, whenever the tonal analysis indicates an L% or H% boundary tone, the transcriber should place a 4 on the break index tier at the end of the last word in the intonation phrase. In actuality, the ordering of these two analyses is sometimes reversed. This is particularly the case with the L% boundary tone; the transcriber might be convinced of the percept of a
EXAMPLE <<names>>:
  Anna may know my name, and yours too. Anna may know our names?
  H* L-H% H* H* L-L% L* H-H%
  1 1 1 1 4 1 1 4 1 1 1 1 4

EXAMPLE <<park2>>:
  Definitely the shortest and probably the pleasantest
  H* L- H* L-L% H* L- H*
  1 3 1 4 1 3 1

  way to go is through the park.
  L- L+H* L-L%
  0 1 3 1 1 1 4

EXAMPLE <<oregano>>:
  1) Let's see I need oregano 'n marjoram 'n some
     H* H* L-L% L* H- L* H-
     1 4 1 1 3 0 3 0 1

     fresh basil okay?
     L+H* !H* L- H* H-H%
     1 3 4

  2) Oh I don't know it's got oregano 'n marjoram
     H* !H* !H* L-L% H* H- H* H-
     1 1 1 4 1 1 3 0 3

     'n some fresh basil.
     H* H-L%
     0 1 1 4

EXAMPLE <<nose>>:
  Oh don't nuzzle me you marmalade-nose.
  X*? L- H* !H* L- L* L-H%
  3 1 1 3 1 1 4
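The correspondence between the two tiers described in this section lends itself to a mechanical check. The following is only a sketch under stated assumptions: the per-word row layout and the name `check_breaks_against_tones` are hypothetical, the tone labels are tested by simple substring matching, and the alignment of tones to words in the usage data is assumed for illustration. It checks one direction only (a 3 needs a phrase accent, a 4 needs a boundary tone).

```python
from typing import List, Tuple

# One row per word: (word, tone labels aligned with that word, break index).
Row = Tuple[str, List[str], int]


def check_breaks_against_tones(rows: List[Row]) -> List[str]:
    """Report every 3 without a phrase accent and every 4 without a boundary tone."""
    problems = []
    for word, tones, break_index in rows:
        # "L-", "H-" and "!H-" are phrase accents; "L-L%" and the like contain one too.
        has_phrase_accent = any("L-" in t or "H-" in t for t in tones)
        # Final boundary tones end in "L%" or "H%"; a mark such as "%r" is not one.
        has_boundary_tone = any("L%" in t or "H%" in t for t in tones)
        if break_index == 3 and not has_phrase_accent:
            problems.append(f"{word}: break index 3 without a phrase accent")
        if break_index == 4 and not has_boundary_tone:
            problems.append(f"{word}: break index 4 without a boundary tone")
    return problems


# Assumed alignment for "Anna may know our names?" (illustration only).
rows = [
    ("Anna", ["L*"], 1),
    ("may", [], 1),
    ("know", [], 1),
    ("our", [], 1),
    ("names", ["H-H%"], 4),
]
print(check_breaks_against_tones(rows))
```

A check of this kind can only flag candidates for a second look; it does not replace the transcriber's judgment about where a phrase ends.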
***********************************************
PRACTICE FIVE: break indices 0, 1, 3, and 4
***********************************************

You have already transcribed the tones on the following. Now transcribe the break indices.

_______________________________________________________________________
EASY:

EXERCISE <<manitowoc>>: Does Manitowoc have a bowling alley?
  [See PRACTICE TWO for tones.]

EXERCISE <<butcher>>: How'd your operation go? Don't talk to me about it; I'd like to strangle the butchers.
  [See PRACTICE FOUR for tones.]

EXERCISE <<stalin>>: I was wrong, and Stalin was right. I was wrong.
  [See PRACTICE TWO for tones.]

EXERCISE <<flour2>>: Oh nothing special, you know flour and butter and sugar.
  [See PRACTICE TWO for tones. Transcribe just the second part, after the "you know".]

EXERCISE <<thought>>: That's what I thought.
  [See PRACTICE ONE for tones.]

_______________________________________________________________________
INTERMEDIATE:

EXERCISE <<I-mean>>: You know what I mean?
  [See PRACTICE ONE for tones.]

EXERCISE <<noodle1>>: We have a lean mini-noodle with beans. Well, we have a lean mini-noodle dish.
  [See PRACTICE THREE for tones.]

EXERCISE <<knock-stuff>>: Mostly they just sat around and knocked stuff. You know, the school, other people.
  [See PRACTICE TWO for tones.]

_______________________________________________________________________
DIFFICULT:

EXERCISE <<argument>>: If he can then there's no argument about it. (two productions)
  [See PRACTICE THREE for tones.]

EXERCISE <<artwork>>: State law now requires public construction projects to set aside 1% of their budgets for artwork.
  [See PRACTICE FOUR for tones.]

EXERCISE <<anyway>>: But anyway, if you can't see that then I don't know if I can explain it to you.
  [See PRACTICE ONE for tones.]
As noted in the previous section, each 3 on the break index tier must correspond to the marking of a phrase accent for the intermediate phrase on the tone tier, and each 4 must correspond to the marking of a boundary tone. The implication is that any other interword juncture will be something that can be transcribed on the break index tier with either a 0 or a 1. However, the subjective impression of boundary strength does not always allow such a neat correspondence. In the course of developing the ToBI transcription system, we encountered several utterances in which we felt a strong sense of disjuncture at a boundary between two words where the pitch pattern showed no evidence of the necessary tonal events for either of these two levels of intonational constituency. We also encountered the converse case: utterances in which the pitch pattern at a boundary between two words clearly indicated an intermediate or intonation phrase boundary with none of the preboundary lengthening or other cues that support the subjective sense of a strong disjuncture. Break index 2 was devised to mark cases of these two types of `mismatch' between the subjective boundary strength and the intonational constituency. These two types are described in the ToBI Annotation Conventions as follows:
  a strong disjuncture marked by a pause or virtual pause, but with no tonal marks; i.e. a well-formed tune continues across the juncture.
  OR
  a disjuncture that is weaker than expected at what is tonally a clear intermediate or full intonation phrase boundary.
Example utterance <<iraqi>> illustrates the first type of mismatch, and example utterance <<quincy>> illustrates the second. In <<iraqi>>, the smooth sequence of apparent downstepped peak accents with no clear intervening phrase accent suggests that the words "six", "southern", "Iraqi", and "cities" all belong to the same intermediate phrase, yet there is a pause of the size expected at an intonation phrase boundary between each adjacent pair of these words. In <<quincy>>, the clear tonal markings for at least an intermediate phrase boundary are unaccompanied by any clear preboundary lengthening, making some transcribers uncomfortable in labelling this juncture with a 3.
EXAMPLE <<iraqi>>:
  The Pentagon reports fighting in six southern
  L+H* L- L* H-H% H* !H*
  1 1 3 4 2 2 2

  iraqi cities.
  !H* X*? L-L%
  2 4

EXAMPLE <<quincy>>:
  uh Quincy. Could I have the number to uh
  H* L- H* !H* L-L%
  4 2 1 1 1 1 1 1 4

  Shore Cab.
  *? H* H-L%
  1 4
Break index 2 was devised for cases where the mismatch between the tonal marking and the disjuncture is not accompanied by any sense of hesitancy or disfluency. When 2 is used in the first way (to indicate a stronger sense of disjuncture than 1 even while producing a coherent contour for an uninterrupted intermediate phrase), it can have the rhetorical effect of careful deliberation, as in the <<iraqi>> example. In the opposite case (when 2 is used to mark intermediate phrase boundaries which do not have a very strong sense of disjuncture) the speaker may be speaking quickly to hold the floor or to convey a sense of urgency, while using the tonal marks necessary to convey attentional focus on several closely placed words. We suspect that both types of 2 will be explained ultimately by a better understanding of the complexities of discourse structure, an understanding that can best be achieved by the transcription and analysis of many occurrences in natural dialogue.
There are other cases of mismatch between tone tier and segmental rhythm, however, where break index 2 does not seem to be appropriate. For example, in utterance <<display>>, the pauses after "Baltimore", "which", and "leave" do not have the feel of a speaker striving for an effect of judicious deliberation, as in the "six southern Iraqi cities" phrase of the <<iraqi>> example, but rather sound disfluent, as if the speaker were hesitating as he searches for the next word. Such cases can be distinguished from fluent cases of 2 by the use of the p diacritic.
EXAMPLE <<display>>:
  Display all the flights from Baltimore to Dallas
  1 1 1 1 3- 2p 0 4

  which leave after 4:00 p.m.
  2p 2p 3p 2p 4
  1p -- an abrupt cutoff before an actual repair, or as if stopping to permit a repair or restart of some kind
  2p -- a hesitation pause or prolongation of segmental material where there is no phrase accent perceived in the intonation contour
  3p -- a hesitation pause or a pause-like prolongation where there is a phrase accent in the tone tier.
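Read as a decision rule, the three definitions above might be sketched as follows. The function name `disfluency_label` and its two boolean parameters are hypothetical, and giving an abrupt cutoff precedence over the presence of a phrase accent is an assumption of this sketch, not something the conventions state.

```python
def disfluency_label(abrupt_cutoff: bool, phrase_accent: bool) -> str:
    """Choose among 1p, 2p and 3p for a juncture already judged disfluent."""
    if abrupt_cutoff:
        # Assumed precedence: a cutoff before a repair or restart is 1p.
        return "1p"
    # A hesitation pause or prolongation: 3p if a phrase accent is
    # marked on the tone tier at this point, otherwise 2p.
    return "3p" if phrase_accent else "2p"


print(disfluency_label(abrupt_cutoff=False, phrase_accent=True))
```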
EXAMPLE <<amazing>>:
  um But I had I mean the stuff he knows is kind of
  0 0 1p 3 1 4 1 1 1 4 1 1 1

  amazing 'coz he does a lot of um environmental
  3 1p 1 1 X 0 1 4 1

  impact stuff
  2p 4

EXAMPLE <<cheapest>>:
  I want to see the cheapest flight from Atlanta
  1 1 1 1 3p 1 1 1 3

  to Baltimore
  1 4
Note also that the prolongation of segmental material for a 2p label can physically occur at the beginning of a word rather than at the end, as in example <<least>>, where the hesitation lengthens the [l] of "least" rather than the vowel of "the".
EXAMPLE <<least>>:
  Between Boston and Denver I'd like to a flight that
  3 1 1 4 1 1 3p 1 1 1

  takes the least amount of stops to get to Boston
  3p 2p 1 0 1 4 1 1 0 4

EXAMPLE <<amazing>>:
  um But I had I mean the stuff he knows
  0 0 1p 3 1 4 1 1 1 4
  H* H* L- H* !H* L-L% H* !H* L-L%

  is kind of amazing 'coz he does a lot of um
  1 1 1 3 1p 1 1 X 0 1 4
  L+H* L- %r H* !H* L-L%

  environmental impact stuff
  1 2p 4
  H* H* H-L%

EXAMPLE <<connections>>:
  What are the plane sizes for these flights and
  1 1 1 1 4 1 1 4 1
  H* L* H-H% H* H* H-L%

  do they ha(ve)- do are there any other flights
  2p 1 1p 1p 1 1 1 1 1
  %r H* !H*

  that have s- connections
  1 1 1p 4
  %r H* L-L%

EXAMPLE <<abbreviation>>:
  What is the b- abbreviation n under
  0 1 1 1p 3- 3 3p
  H* H- L+H* L- H* !H-

  the category d c mean
  1 1 1 1 4
  H* H* H* L-L%
In addition to these two well-defined types of `uncertainty' due either to conflicting evidence about boundary strength (break index 2) or to the interruption of fluent prosodic production at repairs and hesitations (the `p' diacritic), there will be cases of ordinary garden-variety uncertainty for other reasons. For example, as already discussed in Section 2.3, the f0 contour for an utterance-medial intonation phrase that ends with a L% boundary tone is often difficult to distinguish from that of a mere intermediate phrase. In such cases, where the transcriber cannot decide from other cues whether the tonal analysis should be L- versus L-L% (or H- versus H-L%), the break index marking is also necessarily ambiguous. The ToBI conventions prescribe that in such cases of transcriber uncertainty, the higher-level boundary should be chosen, and the uncertainty marked by appending the `-' diacritic. Thus, in <<park2>> given above in Section 2.3, if no decision can be made between L- and L-L%, the correct break index marking is `4-'.
The same convention applies at lower levels of the hierarchy. For example, if the transcriber thinks that a word-final /d/ has been pronounced as a flap, joining the word it ends into a close prosodic unit with the following word, but is not certain that it is a flap and not just a rather short [d], then the correct break index marking is `1-'. A similar case involving /t/ is given in example utterance <<democrat>>. Here it is not clear whether the /t/ at the end of the word "democrat" has been flapped, or not released.
EXAMPLE <<democrat>>:
  The chairman, Wendell Ford, democrat of Kentucky...
  L+H* L- L+H* !H* L- H* L+H* L-L%
  1 3 1 3 1- 1 4
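The uncertainty convention (choose the higher-level boundary and append `-') is simple enough to state as a short function. This is a minimal sketch; the name `uncertain_break` and the range check are our own additions for illustration.

```python
def uncertain_break(candidate_a: int, candidate_b: int) -> str:
    """Label a juncture when the transcriber cannot decide between two indices."""
    for candidate in (candidate_a, candidate_b):
        if candidate not in range(5):
            raise ValueError(f"break index out of range: {candidate}")
    if candidate_a == candidate_b:
        # No real uncertainty: the plain index is enough.
        return str(candidate_a)
    # Choose the higher-level boundary and mark the doubt with '-'.
    return f"{max(candidate_a, candidate_b)}-"


print(uncertain_break(3, 4))  # prints 4-
print(uncertain_break(0, 1))  # prints 1-
```

The two calls correspond to the cases discussed above: an undecided L- versus L-L% gives `4-', and an undecided flap gives `1-'.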
EXAMPLE <<rewarding>>:
  A really rewarding day.
  L+H* L- H* L-L%

EXAMPLE <<noodle3>>:
  We have a lean mini-noodle dish.
  L+H* L- L+H* L-L%
  (compare <<noodle2>> given above in PRACTICE THREE)