Guidelines for ToBI Labelling


    3. More on the break index tier

    3.1. The break index tier relative to other tiers

      The other core part of the prosodic transcription proper is the break index tier. If we think of the tone tier as a marking of the speech signal mediated primarily by our interpretation of the analysis of the f0 contour, the analogous way to think of the break index tier is as a marking of the speech signal as mediated primarily by the rhythmic and segmental analysis implicit in the orthographic tier. The summary statement of ToBI conventions describes this relationship as follows:

        Break indices represent a rating for the degree of juncture perceived between each pair of words and between the final word and the silence at the end of the utterance. They are to be marked after all words that have been transcribed in the orthographic tier. All junctures -- including those after fragments and filled pauses -- must be assigned an explicit break index value; there is no default juncture type.

      Thus, the events on the break index tier are labels of the utterance's prosodic grouping -- that is, each label denotes a boundary of some kind of constituent which ends at the word that the transcriber has marked on the orthographic tier. The convention for placing break index marks in a waves(tm) label file is that the number should be associated with a point in time at the end of the marked word, as indicated by the label in the orthographic tier. It should be located exactly at, or slightly to the right of, this word marker, so that break indices can be unambiguously associated with other tiers.
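      The placement convention lends itself to a mechanical check. The Python sketch below assumes that each tier has already been read into a list of (time, label) pairs sorted by time; it does not parse the waves(tm) file format itself, and the function name `align_breaks` and the 50 ms tolerance are hypothetical choices made for illustration. Each break index label is attached to the nearest word-end marker at or to its left, and any word left without a break index is reported, since every juncture must carry an explicit value.

```python
from bisect import bisect_right

# Hypothetical tolerance in seconds: how far to the right of a word-end
# marker a break index label may sit and still be attached to that word.
TOLERANCE = 0.05


def align_breaks(words, breaks, tolerance=TOLERANCE):
    """Attach each break index label to the word whose end marker is at or just left of it.

    words  -- (end_time, word) pairs from the orthographic tier, sorted by time
    breaks -- (time, label) pairs from the break index tier, sorted by time
    Returns (aligned, missing, stray):
      aligned -- dict mapping word position to its break index label
      missing -- positions of words that received no break index
      stray   -- (time, label) pairs that could not be attached to any word
    """
    end_times = [t for t, _ in words]
    aligned, stray = {}, []
    for time, label in breaks:
        # Position of the last word-end marker at or before this label.
        i = bisect_right(end_times, time) - 1
        if i >= 0 and time - end_times[i] <= tolerance and i not in aligned:
            aligned[i] = label
        else:
            stray.append((time, label))
    missing = [i for i in range(len(words)) if i not in aligned]
    return aligned, missing, stray


if __name__ == "__main__":
    # Invented times for <<kinds-v>>, with the break after "planes" left out.
    words = [(0.31, "What"), (0.62, "kinds"), (0.70, "of"), (1.25, "planes...")]
    breaks = [(0.31, "1"), (0.63, "0"), (0.70, "1")]
    print(align_breaks(words, breaks))  # ({0: '1', 1: '0', 2: '1'}, [3], [])
```

      A label placed slightly after the word marker (0.63 s against 0.62 s here) is still attached to that word, while a word with no label at all shows up in `missing`.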

      There are 5 break indices, numbered 0 through 4, roughly in order of lesser to greater degree of perceived separation between the marked word and following material. The break indices are meant to be a label of the SUBJECTIVE strength of the boundary. However, this does not mean that there are no objective criteria for marking the boundaries, or that the five labels form a uniform five-point scale. For example, the lowest-level break index (0) is defined in terms of connected speech processes, such as the flapping of word-final /t/ and /d/ before a following vowel-initial word in many American and Australian dialects, processes that prosodically group words together into `clitic groups' -- larger compound-word-like constituents above the level of the word (see Section 3.2). At the other end of the scale, the two highest break indices (3 and 4) are defined in relationship to the prosodic constituents (intermediate phrases and intonation phrases) that are assumed by the marking of phrase accents and boundary tones on the tone tier (see Section 3.3).

      Mainstream phonological theory might lead us to expect that these intonational constituents and the lower-level clitic group constituents will form a strict hierarchy (see Selkirk, 1980; Nespor & Vogel, 1986). The numerical scale of break index values reflects a mild bias in favor of such strictly hierarchical models (see the discussion in Price et al. 1991). Rather than building the expectation rigidly into its transcriptions, however, ToBI provides two regular mechanisms for denoting mismatches between different cues to subjective boundary strength. First, break index 2 denotes a mismatch between the constituency prescribed by the tonal transcription and the sense of disjuncture due to pauses and pause-like phenomena (see Section 3.4). Second, there is a diacritic `p' that can be appended to break indices 1, 2, and 3 to convey some sort of prosodic disfluency -- for example, an abrupt cutoff after a false start or a perceptible prolongation or pause which sounds as if the speaker were hesitating while searching for the next word (see Section 3.5). These two provisions should allow transcribers to avoid the circularity of basing a theory about the nature of the prosodic hierarchy upon the transcription of databases that might be used to explore such issues as the relationship between intonational constituents and pause (see, e.g., Woodbury, 1993, who proposed that pauses can be placed independently of intonational boundaries when the discourse structure requires the indication of competing segmentation strategies for topic structure versus rhetorical structure).

    3.2. Break indices 0 and 1

      Except in more deliberate speech styles, such as the information-packed style of radio news announcers, the break index value that will be encountered most frequently is probably 1. The ToBI conventions define break index 1 negatively, as the label to be used for "most phrase-medial word boundaries", as contrasted with the marked phrase-medial cases transcribed by break index 0. Break index 0, conversely, is defined with positive criteria as the value "for cases of clear phonetic marks of clitic groups; e.g. the medial affricate in contractions of `did you' or a flap as in `got it'." Since the other break indices are also defined by positive criteria (markings on the tone tier -- see Sections 3.3 and 3.4), we can think of break index 1 as the `default' (although, of course, there is no real default index in the sense of a value that need not be marked because it is understood). We have already seen many examples of break index 0 in previous example utterances. For example, in example <<understand>> in Section 2.10 above, there are three cases of break index 0: the flapped /t/ in the two instances of the word "to" after "trying" and "you", and the palatalization of the /t/ at the juncture between "get" and "you", are all examples of connected-speech processes that we take as criteria for break index 0.

    EXAMPLE <<understand>>: I'm simply trying to get you to understand.
        Breaks: 1 3 0 3 0 0 3 4

    Example utterance <<kinds-v>> illustrates yet another such connected-speech process: the apparent deletion of the vowel in "of" after "kinds", which leaves a word-final /zv/ cluster that is phonotactically impermissible in English.

    EXAMPLE <<kinds-v>>: What kinds of planes...
        Breaks: 1 0 1 4

    Note that in some cases the phenomena denoted by break index 0 are so frequently encountered in particular types of sequences that orthographic conventions have developed for marking them. For example, the flapping of the /t/ and consequent cliticization of the word "to" onto a preceding auxiliary verb "got" is sometimes indicated in writing by "gotta". Similarly, the deletion of the initial /h/ and vowel of "have" in sequences such as "would have" can be indicated by spelling it "would've". In such cases, the transcriber has the alternative of marking the prosodic grouping by the choice of label on the orthographic tier instead. For example, by labelling the word as "gotta" rather than "got to", the transcriber has eliminated the word boundary where a 0 label might otherwise be placed on the break index tier.
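      Because break index 0 is defined by connected-speech processes, a transcriber working from a pronouncing dictionary might want a list of junctures worth listening to closely. The sketch below is a hypothetical helper, not part of the ToBI conventions: it assumes citation-form pronunciations in an ARPAbet-style phone set (an assumption) and checks only the word-final /t/ and /d/ contexts named above, a flap before a vowel-initial word as in `got it', and palatalization before a /j/-initial word as in `did you' or "get you". It would miss, for example, the flapped onset of "to" in <<understand>>, and it can only nominate candidates; whether a 0 is warranted is still decided by ear.

```python
# ARPAbet-style vowel symbols with stress digits removed (an assumed phone set).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}


def strip_stress(phone):
    """Drop a trailing ARPAbet stress digit, e.g. 'AA1' -> 'AA'."""
    return phone.rstrip("012")


def candidate_zero_junctures(words):
    """List junctures where a word-final /t/ or /d/ might be flapped or palatalized.

    words -- (orthography, phones) pairs; phones is a non-empty list of
             citation-form ARPAbet-style symbols (a hypothetical input format).
    Returns (position, reason) pairs, where position i is the juncture after word i.
    """
    found = []
    for i in range(len(words) - 1):
        last = strip_stress(words[i][1][-1])
        first = strip_stress(words[i + 1][1][0])
        if last not in {"T", "D"}:
            continue
        if first in VOWELS:
            found.append((i, "possible flap"))            # e.g. "got it"
        elif first == "Y":
            found.append((i, "possible palatalization"))  # e.g. "did you", "get you"
    return found


if __name__ == "__main__":
    phrase = [("get", ["G", "EH1", "T"]), ("you", ["Y", "UW1"]),
              ("to", ["T", "UW1"]),
              ("understand", ["AH2", "N", "D", "ER0", "S", "T", "AE1", "N", "D"])]
    print(candidate_zero_junctures(phrase))  # [(0, 'possible palatalization')]
```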

    3.3. Break indices 3 and 4

      Break indices 0 and 1 form a natural progression with indices 3 and 4. These two break index strengths are equated with the intonational categories of intermediate (intonation) phrase and (full) intonation phrase. Thus, whenever the tonal analysis indicates a L- or H- phrase accent, the transcriber should decide where the intermediate phrase marked by this tone label ends and place a 3 on the break index tier aligned with the orthographic label for the last word in the intermediate phrase. Similarly, whenever the tonal analysis indicates a L% or H% boundary tone, the transcriber should place a 4 on the break index tier at the end of the last word in the intonation phrase. In practice, the order of these two analyses is sometimes reversed. This is particularly the case with the L% boundary tone; the transcriber might be convinced of the percept of a 4 versus a 3 level boundary before deciding that there must be a L-L% or H-L% sequence, as opposed to merely a L- or H-, to be marked on the tone tier. Recall from the discussion in Section 2.3 that there may be little or no difference in f0 values between the end of a mere L- and a L-L% sequence, or between a mere H- and a H-L% sequence: a L% following a L- is at the bottom of the speaker's pitch range, just as the L- is, whereas a L% following a H- is upstepped to the same level as the preceding phrase accent. In such cases, the analysis is necessarily more subjective; the transcriber must rely on the percept of degree of disjuncture with less help from the f0 contour. Some pertinent examples from earlier sections are repeated here.

    EXAMPLE <<names>>: Anna may know my name, and yours too. Anna may know our names?
        Tones:  H* L-H% H* H* L-L% L* H-H%
        Breaks: 1 1 1 1 4 1 1 4 1 1 1 1 4

    EXAMPLE <<park2>>:
        Definitely the shortest and probably the pleasantest
            Tones:  H* L- H* L-L% H* L- H*
            Breaks: 1 3 1 4 1 3 1
        way to go is through the park.
            Tones:  L- L+H* L-L%
            Breaks: 0 1 3 1 1 1 4

    EXAMPLE <<oregano>>:
        1) Let's see I need oregano 'n marjoram 'n some
            Tones:  H* H* L-L% L* H- L* H-
            Breaks: 1 4 1 1 3 0 3 0 1
           fresh basil okay?
            Tones:  L+H* !H* L- H* H-H%
            Breaks: 1 3 4
        2) Oh I don't know it's got oregano 'n marjoram
            Tones:  H* !H* !H* L-L% H* H- H* H-
            Breaks: 1 1 1 4 1 1 3 0 3
           'n some fresh basil.
            Tones:  H* H-L%
            Breaks: 0 1 1 4

    EXAMPLE <<nose>>: Oh don't nuzzle me you marmalade-nose.
        Tones:  X*? L- H* !H* L- L* L-H%
        Breaks: 3 1 1 3 1 1 4

    When using waves(tm) label files, a 3 or 4 break index label and the corresponding phrase accent or boundary tone are placed together at the orthographic label, with the break index label coming last if the labels on the three tiers cannot be absolutely synchronized.
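      The correspondence between the two tiers can be checked automatically once each word's tone labels are known. The sketch below assumes a hypothetical per-word layout (word, break index label, list of tone labels aligned with that word) rather than the waves(tm) files themselves, and the name `check_tiers` is illustrative. It treats any label containing L- or H- as carrying a phrase accent and any label containing L% or H% as carrying a boundary tone, then reports a 4 without a boundary tone, a 3 without a phrase accent (or with a boundary tone), and a 0 or 1 at a tonally marked phrase edge. Break index 2 is skipped, since it exists precisely to mark a mismatch, as are labels carrying the uncertainty diacritic `-' and non-numeric labels.

```python
import re

PHRASE_ACCENT = re.compile(r"[LH]-")  # L-, H-, !H-, and the phrase accent in L-L%, H-H%, ...
BOUNDARY_TONE = re.compile(r"[LH]%")  # L%, H%, and the boundary tone in L-L%, H-H%, ...


def check_tiers(records):
    """Report junctures where break index 0, 1, 3 or 4 disagrees with the tone tier.

    records -- (word, break_label, tones) triples, where tones lists the tone
               labels aligned with that word (a hypothetical layout).
    Returns (position, message) pairs.
    """
    problems = []
    for i, (word, brk, tones) in enumerate(records):
        # Skip 2 (a licensed mismatch), uncertain labels such as '4-',
        # and labels that do not start with a digit.
        if not brk[:1].isdigit() or brk.endswith("-") or brk[0] == "2":
            continue
        level = int(brk[0])
        has_pa = any(PHRASE_ACCENT.search(t) for t in tones)
        has_bt = any(BOUNDARY_TONE.search(t) for t in tones)
        if level == 4 and not has_bt:
            problems.append((i, f"{word!r}: 4 without a boundary tone"))
        elif level == 3 and (not has_pa or has_bt):
            problems.append((i, f"{word!r}: 3 needs a phrase accent and no boundary tone"))
        elif level in (0, 1) and (has_pa or has_bt):
            problems.append((i, f"{word!r}: {brk} at a tonally marked phrase edge"))
    return problems


if __name__ == "__main__":
    # Invented word-to-tone alignment, for illustration only.
    records = [("way", "3", ["L-"]), ("to", "1", []), ("go", "4", ["L-"])]
    print(check_tiers(records))  # [(2, "'go': 4 without a boundary tone")]
```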

    ***********************************************
    PRACTICE FIVE: break indices 0, 1, 3, and 4
    ***********************************************
    You have already transcribed the tones on the following. Now transcribe the break indices.
    _______________________________________________________________________
    EASY:
    EXERCISE <<manitowoc>>: Does Manitowoc have a bowling alley? [See PRACTICE TWO for tones.]
    EXERCISE <<butcher>>: How'd your operation go? Don't talk to me about it; I'd like to strangle the butchers. [See PRACTICE FOUR for tones.]
    EXERCISE <<stalin>>: I was wrong, and Stalin was right. I was wrong. [See PRACTICE TWO for tones.]
    EXERCISE <<flour2>>: Oh nothing special, you know flour and butter and sugar. [See PRACTICE TWO for tones. Transcribe just the second part, after the "you know".]
    EXERCISE <<thought>>: That's what I thought. [See PRACTICE ONE for tones.]
    _______________________________________________________________________
    INTERMEDIATE:
    EXERCISE <<I-mean>>: You know what I mean? [See PRACTICE ONE for tones.]
    EXERCISE <<noodle1>>: We have a lean mini-noodle with beans. Well, we have a lean mini-noodle dish. [See PRACTICE THREE for tones.]
    EXERCISE <<knock-stuff>>: Mostly they just sat around and knocked stuff. You know, the school, other people. [See PRACTICE TWO for tones.]
    _______________________________________________________________________
    DIFFICULT:
    EXERCISE <<argument>>: If he can then there's no argument about it. (two productions) [See PRACTICE THREE for tones.]
    EXERCISE <<artwork>>: State law now requires public construction projects to set aside 1% of their budgets for artwork. [See PRACTICE FOUR for tones.]
    EXERCISE <<anyway>>: But anyway, if you can't see that then I don't know if I can explain it to you. [See PRACTICE ONE for tones.]

    3.4. Break index 2

      As noted in the previous section, each 3 on the break index tier must correspond to the marking of a phrase accent for the intermediate phrase on the tone tier, and each 4 must correspond to the marking of a boundary tone. The implication is that any other interword juncture will be something that can be transcribed on the break index tier with either a 0 or a 1. However, the subjective impression of boundary strength does not always allow such a neat correspondence. In the course of developing the ToBI transcription system, we encountered several utterances in which we felt a strong sense of disjuncture at a boundary between two words where the pitch pattern showed no evidence of the necessary tonal events for either of these two levels of intonational constituency. We also encountered the converse case: utterances in which the pitch pattern at a boundary between two words clearly indicated an intermediate or intonation phrase boundary with none of the preboundary lengthening or other cues that support the subjective sense of a strong disjuncture. Break index 2 was devised to mark cases of these two types of `mismatch' between the subjective boundary strength and the intonational constituency. These two types are described in the ToBI Annotation Conventions as follows:

          a strong disjuncture marked by a pause or virtual pause, but with
          no tonal marks; i.e. a well-formed tune continues across the
          juncture.
              OR
          a disjuncture that is weaker than expected at what is tonally a
          clear intermediate or full intonation phrase boundary.
      
      Example utterance <<iraqi>> illustrates the first type of mismatch, and example utterance <<quincy>> illustrates the second. In <<iraqi>>, the smooth sequence of apparent downstepped peak accents with no clear intervening phrase accent suggests that the words "six", "southern", "Iraqi", and "cities" all belong to the same intermediate phrase, yet there is a pause the size of an intonation-phrase break between each adjacent pair of these words. In <<quincy>>, the clear tonal markings for at least an intermediate phrase boundary are unaccompanied by any clear preboundary lengthening, making some transcribers uncomfortable with labelling this juncture as 3.

      EXAMPLE <<iraqi>>:
          The Pentagon reports fighting in six southern
              Tones:  L+H* L- L* H-H% H* !H*
              Breaks: 1 1 3 4 2 2 2
          Iraqi cities.
              Tones:  !H* X*? L-L%
              Breaks: 2 4

      EXAMPLE <<quincy>>:
          uh Quincy. Could I have the number to uh
              Tones:  H* L- H* !H* L-L%
              Breaks: 4 2 1 1 1 1 1 1 4
          Shore Cab.
              Tones:  *? H* H-L%
              Breaks: 1 4

      Break index 2 was devised for cases where the mismatch between the tonal marking and the disjuncture is not accompanied by any sense of hesitancy or disfluency. When 2 is used in the first way (to indicate a stronger sense of disjuncture than 1 even though the speaker produces a coherent contour for an uninterrupted intermediate phrase), it can have the rhetorical effect of careful deliberation, as in the <<iraqi>> example. In the opposite case (when 2 is used to mark intermediate phrase boundaries that lack a strong sense of disjuncture), the speaker may be speaking quickly to hold the floor or to convey a sense of urgency, while still using the tonal marks needed to focus attention on several closely placed words. We suspect that both types of 2 will ultimately be explained by a better understanding of the complexities of discourse structure, an understanding that can best be achieved by the transcription and analysis of many occurrences in natural dialogue.

    3.5. The p diacritic (and the %r tone label) [Christine Nakatani and Elizabeth Shriberg contributed greatly to the preparation of this and the following sections.]

      There are other cases of mismatch between tone tier and segmental rhythm, however, where break index 2 does not seem to be appropriate. For example, in utterance <<display>>, the pauses after "Baltimore", "which", and "leave" do not have the feel of a speaker striving for an effect of judicious deliberation, as in the "six southern Iraqi cities" phrase of the <<iraqi>> example, but rather sound disfluent, as if the speaker were hesitating as he searches for the next word. Such cases can be distinguished from fluent cases of 2 by the use of the p diacritic.


      EXAMPLE <<display>>:
          Display all the flights from Baltimore to Dallas
              Breaks: 1 1 1 1 3- 2p 0 4
          which leave after 4:00 p.m.
              Breaks: 2p 2p 3p 2p 4

      The p diacritic is used in conjunction with a break index 1, 2, or 3 to indicate a disfluency in the timing or separation of words across a break. The notation `p' was chosen initially to denote a hesitation pause or prolongation with break indices 2 and 3, but we have since extended the diacritic's usage to cover abrupt cutoffs before restarts and repairs as well; the restart is often, but not necessarily, separated from the disfluent stop by a pause. For such cutoffs, the appropriate break index is 1. Thus the inventory of combinations of break index and p diacritic is:

      1p -- an abrupt cutoff before an actual repair, or as if stopping to permit a repair or restart of some kind
      2p -- a hesitation pause or prolongation of segmental material where there is no phrase accent perceived in the intonation contour
      3p -- a hesitation pause or a pause-like prolongation where there is a phrase accent in the tone tier

      The p diacritic is not used with break index 4, because it is difficult to reliably identify hesitations between two full intonational phrases. Example utterances <<amazing>> and <<cheapest>> illustrate the use of the diacritic with break indices 1 and 3. Example <<display>> also had an occurrence of 3p. Note the presence of the phrase accent distinguishing this interword juncture from the surrounding cases of 2p.

      EXAMPLE <<amazing>>:
          um But I had I mean the stuff he knows is kind of
              Breaks: 0 0 1p 3 1 4 1 1 1 4 1 1 1
          amazing 'coz he does a lot of um environmental
              Breaks: 3 1p 1 1 X 0 1 4 1
          impact stuff
              Breaks: 2p 4

      EXAMPLE <<cheapest>>:
          I want to see the cheapest flight from Atlanta
              Breaks: 1 1 1 1 3p 1 1 1 3
          to Baltimore
              Breaks: 1 4

      In general the p diacritic should be used conservatively, and should not become a substitute for 2. A good test for appropriateness is to imagine whether the break would have been the same if the speaker were asked to repeat the utterance with the same intonation, but more `fluently'. If the break were the same upon repetition, it should probably not get the p.
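      The choices described in Sections 3.3 through 3.5 can be summarized as a small decision function. This is only a sketch under stated assumptions: the name `suggest_break` and its input categories are hypothetical, and it presupposes that the transcriber has already made the perceptual judgements (whether the tone tier marks a phrase edge, how strong the disjuncture sounds, and whether the break still sounds disfluent under the fluent-repetition test just described). It encodes the rules that 2 marks a mismatch between tones and disjuncture, that p attaches only to 1, 2, and 3, and that an abrupt cutoff gets 1p.

```python
def suggest_break(tonal_edge, disjuncture, hesitation=False, cutoff=False):
    """Suggest a break index label from judgements already made by the transcriber.

    tonal_edge  -- "none", "ip" (phrase accent only) or "IP" (boundary tone)
    disjuncture -- perceived strength: "clitic", "word" or "strong"
    hesitation  -- True if a pause or prolongation sounds disfluent
    cutoff      -- True for an abrupt cutoff before a repair or restart
    """
    if tonal_edge not in ("none", "ip", "IP"):
        raise ValueError(f"unknown tonal edge: {tonal_edge!r}")
    if disjuncture not in ("clitic", "word", "strong"):
        raise ValueError(f"unknown disjuncture: {disjuncture!r}")
    if cutoff:
        return "1p"
    if tonal_edge == "none":
        if hesitation:
            return "2p"  # hesitation with no phrase accent
        return {"clitic": "0", "word": "1", "strong": "2"}[disjuncture]
    if tonal_edge == "ip" and hesitation:
        return "3p"      # hesitation where there is a phrase accent
    if disjuncture != "strong":
        return "2"       # tonally a phrase edge, but weaker disjuncture than expected
    return "4" if tonal_edge == "IP" else "3"  # p is never added to 4


if __name__ == "__main__":
    print(suggest_break("none", "strong"))                 # 2, as in <<iraqi>>
    print(suggest_break("ip", "word"))                     # 2, as for <<quincy>>
    print(suggest_break("ip", "strong", hesitation=True))  # 3p
```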

      Note also that the prolongation of segmental material for a 2p label can physically occur at the beginning of a word rather than at the end, as in example <<least>>, where the hesitation lengthens the [l] of "least" rather than the vowel of "the".


      EXAMPLE <<least>>:
          Between Boston and Denver I'd like to a flight that
              Breaks: 3 1 1 4 1 1 3p 1 1 1
          takes the least amount of stops to get to Boston
              Breaks: 3p 2p 1 0 1 4 1 1 0 4

      Closely associated with these definitions of 1p, 2p, and 3p in the break index tier is the tone tier label %r, for restarting with a brand new intonation contour when the last contour was interrupted by some disfluency before it was finished. This is most common at a `repair', where the speaker abruptly stops and begins again with the intended or `repaired' material, as in example utterance <<amazing>>, already cited above, and in example <<connections>>, below.

      EXAMPLE <<amazing>>:
          um But I had I mean the stuff he knows
              Tones:  H* H* L- H* !H* L-L% H* !H* L-L%
              Breaks: 0 0 1p 3 1 4 1 1 1 4
          is kind of amazing 'coz he does a lot of um
              Tones:  L+H* L- %r H* !H* L-L%
              Breaks: 1 1 1 3 1p 1 1 X 0 1 4
          environmental impact stuff
              Tones:  H* H* H-L%
              Breaks: 1 2p 4

      EXAMPLE <<connections>>:
          What are the plane sizes for these flights and
              Tones:  H* L* H-H% H* H* H-L%
              Breaks: 1 1 1 1 4 1 1 4 1
          do they ha(ve)- do are there any other flights
              Tones:  %r H* !H*
              Breaks: 2p 1 1p 1p 1 1 1 1 1
          that have s- connections
              Tones:  %r H* L-L%
              Breaks: 1 1 1p 4

      As with the use of the p diacritic, one should be conservative in using %r. It is needed only if there is good evidence that a new intonational phrase has begun after the disfluent pause, such as a notable change in f0 range or amplitude. It should not be used in cases such as the "had" after the first 1p in <<amazing>>, which continues with a fluent H* accent in the same pitch range (unlike the H* !H* on "he does" after the second 1p in this utterance, which is in a new pitch range). Nor should %r be used in example utterance <<abbreviation>>, where, after the speaker stumbles and pauses momentarily around the end of "what is the", the intonation on "abbreviation" continues as if there had been no interruption.

      EXAMPLE <<abbreviation>>:
          What is the b- abbreviation n under
              Tones:  H* H- L+H* L- H* !H-
              Breaks: 0 1 1 1p 3- 3 3p
          the category d c mean
              Tones:  H* H* H* L-L%
              Breaks: 1 1 1 1 4

      In particular, %r should not be used after a 3p, where the (re)start of a new intonation contour is already implicit in the break index for the intermediate phrase.
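      Only some of these restrictions on %r can be checked from the labels alone; the evidence of a change in f0 range or amplitude cannot. The sketch below assumes a hypothetical per-word layout (word, break index label, list of tone labels), with %r attached to the first word of the restarted contour so that the juncture before it is the previous word's break index; the name `check_restarts` is illustrative. It flags %r after 3p, and, as an additional assumption drawn from the close association of %r with 1p and 2p, it also flags %r after a juncture that carries no p at all.

```python
def check_restarts(records):
    """Flag %r labels that follow a 3p or a juncture without the p diacritic.

    records -- (word, break_label, tones) triples (a hypothetical layout); %r is
               assumed to sit on the first word of the restarted contour, so the
               juncture before it is the break label of the previous word.
    Returns (position, message) pairs.
    """
    problems = []
    for i, (_, _, tones) in enumerate(records):
        if "%r" not in tones:
            continue
        if i == 0:
            problems.append((i, "%r on the first word: no preceding juncture"))
            continue
        prev = records[i - 1][1]
        if prev.startswith("3p"):
            problems.append((i, "%r after 3p: the restart is already implied"))
        elif not prev.rstrip("?").endswith("p"):
            problems.append((i, f"%r after {prev}: no disfluent juncture marked"))
    return problems


if __name__ == "__main__":
    # The restart after the second 1p in <<amazing>>.
    records = [("'coz", "1p", []), ("he", "1", ["%r", "H*"]), ("does", "1", ["!H*"])]
    print(check_restarts(records))  # []
```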

    3.6. Ordinary uncertainty

      In addition to these two well-defined types of `uncertainty', due either to conflicting evidence about boundary strength (break index 2) or to the interruption of fluent prosodic production at repairs and hesitations (the `p' diacritic), there will be cases of ordinary garden-variety uncertainty for other reasons. For example, as already discussed in Section 2.3, the f0 contour for an utterance-medial intonation phrase that ends with a L% boundary tone is often difficult to distinguish from that of a mere intermediate phrase. In such cases, where the transcriber cannot decide from other cues whether the tonal analysis should be L- versus L-L% (or H- versus H-L%), the break index marking is also necessarily ambiguous. The ToBI conventions prescribe that in such cases of transcriber uncertainty, the higher-level boundary should be chosen and the uncertainty marked by appending the `-' diacritic. Thus, in <<park2>>, given above in Section 2.3, if no decision can be made between L- and L-L%, the correct break index marking is `4-'.

      The same convention applies at lower levels of the hierarchy. For example, if the transcriber thinks that a word-final /d/ has been pronounced as a flap, joining the word it ends into a close prosodic unit with the following word, but is not certain that it is a flap and not just a rather short [d], then the correct break index marking is `1-'. A similar case involving /t/ is given in example utterance <<democrat>>. Here it is not clear whether the /t/ at the end of the word "democrat" has been flapped or is merely unreleased.


      EXAMPLE <<democrat>>: The chairman, Wendell Ford, democrat of Kentucky...
          Tones:  L+H* L- L+H* !H* L- H* L+H* L-L%
          Breaks: 1 3 1 3 1- 1 4
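      The convention of choosing the higher index and appending `-' amounts to a one-line rule. In this sketch the name `uncertain_break` is hypothetical; it accepts only plain indices 0 to 4 and rejects identical candidates, and it refuses labels with p because the conventions do not combine `-' with the p diacritic.

```python
def uncertain_break(a, b):
    """Label a juncture that cannot be resolved between plain break indices a and b.

    The higher index is chosen and the '-' diacritic appended.  Labels with
    the p diacritic are not accepted, since '-' is not combined with p.
    """
    for x in (a, b):
        if not isinstance(x, int) or not 0 <= x <= 4:
            raise ValueError(f"expected a plain break index 0-4, got {x!r}")
    if a == b:
        raise ValueError("the two candidate indices must differ")
    return f"{max(a, b)}-"


if __name__ == "__main__":
    print(uncertain_break(3, 4))  # 4-  (L- versus L-L%, as for <<park2>>)
    print(uncertain_break(0, 1))  # 1-  (flap versus a short [d] or [t])
```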

      Examples <<rewarding>>, <<noodle2>>, and <<noodle3>> illustrate cases where tonal sequences evident in the pitch contour might seem compatible with several alternative analyses, some with and some without a medial intermediate phrase break. When such utterances are transcribed outside of their larger discourses, these contours might be highly ambiguous.

      EXAMPLE <<rewarding>>: A really rewarding day.
          Tones:  L+H* L- H* L-L%

      EXAMPLE <<noodle3>>: We have a lean mini-noodle dish. (compare <<noodle2>> given above in PRACTICE THREE)
          Tones:  L+H* L- L+H* L-L%

      The minus symbol associated with uncertainty in break index value cannot be used in conjunction with the p diacritic. Uncertainty about whether or not to use the p should be conveyed by using `p?'.
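      Taken together, the break index values and diacritics in this section define a small label grammar: an index from 0 to 4, optionally followed either by `-' or by p (on 1, 2, or 3 only), with `p?' for uncertainty about the p. The Python sketch below validates and decomposes such labels; the names `parse_break` and `BreakLabel` are hypothetical. Rejecting `0-' is an inference from the higher-index rule (0 can never be the higher of two candidates), and labels not defined in this section, such as the X in <<amazing>>, are rejected here and would need a rule of their own.

```python
import re
from typing import NamedTuple, Optional

# An index 0-4, optionally followed by '-' (uncertain index) or 'p' / 'p?'.
BREAK_LABEL = re.compile(r"(?P<index>[0-4])(?P<diacritic>-|p\??)?")


class BreakLabel(NamedTuple):
    index: int
    disfluent: Optional[bool]  # True for p, None for p? (uncertain), False otherwise
    uncertain_index: bool      # True when the '-' diacritic is present


def parse_break(label: str) -> BreakLabel:
    """Validate a break index label and split it into its parts."""
    m = BREAK_LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"not a break index label: {label!r}")
    index, diacritic = int(m["index"]), m["diacritic"]
    if diacritic in ("p", "p?") and index not in (1, 2, 3):
        raise ValueError(f"p is only used with break indices 1, 2 and 3: {label!r}")
    if diacritic == "-" and index == 0:
        raise ValueError(f"'-' marks the higher of two indices, so it cannot follow 0: {label!r}")
    disfluent = {"p": True, "p?": None}.get(diacritic, False)
    return BreakLabel(index, disfluent, diacritic == "-")


if __name__ == "__main__":
    for lab in ("1p", "3p?", "4-", "2"):
        print(lab, parse_break(lab))
```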

      labelling_guide_v2.ASCII (augmented by some HTML)