Guidelines for ToBI Labelling


    4. The miscellaneous tier (and other aspects of the marking of disfluencies)

    1. The miscellaneous tier defined

      The miscellaneous tier is in essence a `comment' tier for the optional marking of events of any kind other than the standard words, tones, and disjunctures marked on the orthographic tier, the tone tier, and the break index tier. Many of the events labelled on the miscellaneous tier span longish intervals. In this respect, miscellaneous events are like the word events labelled on the orthographic tier. However, the two types of events are very different, in that a strict succession of miscellaneous events is not essential to speech, whereas speech must be a succession of produced words (or pieces of words). Therefore, whereas the ToBI convention is to mark each event on the orthographic tier only at the end of the interval that the event spans, it prescribes that an event on the miscellaneous tier should in general be marked for both its beginning and its end, using the diacritics `<' and `>', respectively. Thus labels on the miscellaneous tier usually come in pairs, such as:

      breath< breath> laugh< laugh> cough< cough>

      Example <<cough>> in Section 1.1 illustrated the use of the miscellaneous tier to mark the cough that interrupts the utterance.


      EXAMPLE <<cough>>: Will you have marmalade ...
                         L* L*
                            1   1    1         1p
                         cough< cough>

      Another similar example is the laughter that interrupts the pitch contour in utterance <<laugh>>.


      EXAMPLE <<laugh>>: To me  it this seems very obvious;   to make it on
                           1  3   1    1     1    1       4     1    1  1  3
                            H* L-                L+H*    L-L%     H*     !H* L-
                                           laugh<     laugh> laugh<
                         to make it by hand is much more fun than to make it on
                           1    1  1  1    3  1    1    1   1    1  1    1  0  1
                             H*       L+H* L-   H*
                                   laugh>
                         a computer.
                          1        4
                                   L-L%
      
      Since such markings are useful for parsing the disruption of otherwise tonally well-formed intonation contours, we can think of the events they mark as a source of `disfluency'. Indeed, the ToBI Annotation Conventions encourage the marking of disfluencies, and suggest the use of `disfl<...disfl>' (or `disfl') as a general flag for them:

        In general, it is the assumption of the participants in the common transcription group that silences should be automatically detectable, at least to a first approximation, and that transcriber time should not be spent marking these by hand. Disfluencies, by contrast, are not automatically detectable, and the absence of markings for them makes it difficult to parse the tone and break index tiers. For these reasons, transcribers are urged to mark disfluencies on the miscellaneous tier using `disfl<' and `disfl>' (or `disfl' if the disfluency is extremely localized), and to provide these marks in the miscellaneous tier menu when using waves(tm).

      However, it is often easier to determine that something is disfluent in some region than it is to determine exactly where the disfluency begins and ends. For this reason, the ToBI Annotation Conventions specify that the marks can be used as a disfluency flag rather than as the demarcation of a precise region:

        ...the marks `disfl<' and `disfl>' (or simple `disfl') should be interpreted as rough pointers to the disfluent region and transcribers should not agonize over placing them precisely.

      Note that here the ToBI Annotation Conventions explicitly mention the use of a single mark, rather than a pair of marks for the beginning and end of a region. However, they specifically recommend this usage only for disfluencies, to encourage the marking of something that is typically very difficult to locate precisely in time. Transcribers should be careful about using a single (unpaired) label on the misc tier for anything other than marking the general location of a perceived disfluency, since in any other circumstance, the usual interpretation must be that the event is so localized that its beginning is virtually the same point as its end.
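
      The pairing convention described above can be checked mechanically. The following is a minimal sketch in Python, not part of the ToBI tools: the function name `check_misc_tier` and the representation of the tier as a plain list of label strings are assumptions made for illustration. It treats a label ending in `<' as the beginning of an event and a label ending in `>' as its end, and it accepts a bare (unpaired) label only when the label starts with `disfl'.

```python
def check_misc_tier(labels):
    """Return a list of warnings for a sequence of miscellaneous-tier labels.

    `labels` is a hypothetical representation: the tier's labels in time
    order, e.g. ["cough<", "cough>", "disfl"].
    """
    warnings = []
    open_events = []  # names whose `<` has been seen but whose `>` has not
    for label in labels:
        if label.endswith("<"):
            open_events.append(label[:-1])
        elif label.endswith(">"):
            name = label[:-1]
            if name in open_events:
                open_events.remove(name)
            else:
                warnings.append(f"{label}: end mark without a matching {name}<")
        elif not label.startswith("disfl"):
            # A bare label is recommended only as a rough disfluency pointer.
            warnings.append(f"{label}: unpaired label on a non-disfluency event")
    for name in open_events:
        warnings.append(f"{name}<: begin mark without a matching {name}>")
    return warnings


print(check_misc_tier(["breath<", "breath>", "disfl", "cough<"]))
```

      A warning for a bare non-disfluency label is only a prompt to look again, since such a label remains legitimate for an event that really is extremely localized.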

      Utterance <<fare>> is an example of a disfluency marked in this way.


      EXAMPLE <<fare>>: show me the cheapest fare    from Da- from
                            1  1   1        1    4       1  1p    1
                          H*        L+H*     !H* L-L%        %r
                                                       disfl< disfl>
                        Philadelphia to Dallas     excluding    restriction
                                    3  1      4             4           4
                       L+H* !H*     L-  H*    L-L%  L+H*   L-L%      H* L-L%
                        v   u    slash one
                          1  3        1   4
                        H* !H* L- H*   H* L-L%
      
      
      Although the miscellaneous tier is a general-purpose `comment' tier, we recommend that when transcribers at a particular site find themselves often adding comments that fit some particular pattern other than these, they consider defining an additional tier for that purpose.

      Christine Nakatani and Elizabeth Shriberg, both of whom have worked extensively on disfluencies in naturally spoken utterances, differentiate disfluencies more finely, and suggest guidelines for other transcribers who wish to differentiate types of disfluencies in the same way. The following section is adapted from their suggested guidelines, and uses many of their examples.

    2. Suggested guidelines for marking disfluencies

      Nakatani and Shriberg have identified several different types of events that they feel should count as disfluencies in ToBI. Not all of these need be marked on the misc tier in order to be recovered. In particular, mere hesitation pauses can be recovered from the use of the 2p or 3p marks on the break index tier and (in the case of many filled pauses) from the transcription of the filler material on the orthographic tier. Phenomena that might be flagged as disfluencies on the misc tier include stumbling over a word, or abruptly cutting off a word or phrase in midstream to make a fragment, as in <<fare>> cited above, or <<transport>> below. These are examples of the first of the major classes of disfluency which Nakatani and Shriberg identify, a class which includes what they call `phonetic error'.


      EXAMPLE <<transport>>: show ground transpor- ground transportation
                                 1      1        1p      1              4
                                     disfl:repair<
                                       disfl:repair>
                             at  atlanta
                               2p       4
      
      The second of the three major classes is the hesitation pause. This includes both silent pauses, as in the examples transcribed with 2p above, and filled pauses -- that is, hesitation intervals during which the speaker holds the floor by producing hesitation noises or other material, as in <<weight>>.


      EXAMPLE <<weight>>: The weight on a six on a seven sixty seven is
                             1      0  1 1   2p 1 1     2p    1     3- 2p
                          three thousand uh three hundred and twelve
                               1        2p 4     1       1   1      1
                          thousand pounds uh is that including passengers
                                  2p     2p 4  1    1         3          4
      
      Nakatani and Shriberg recommend that the spelling of hesitation noises be standardized so that later users of a ToBI transcribed database need search only for a limited set of `words' in recovering the disfluency. In particular, for standard American English, they recommend the use of only "um", "uh", or "mm". That is, transcribers should not invent other spellings such as "ah" or "uhhhh" to reflect differences in the quality of the reduced vowel or the duration of the syllable. With this stipulation, filled pauses of this sort would not need to be flagged on the misc tier, since they would be recoverable from the orthographic tier.
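
      To illustrate why standardized spellings make hesitations recoverable without misc tier flags, here is a minimal sketch in Python. It assumes, purely for illustration, that the orthographic and break index tiers are available as two parallel lists (one break index per word); the function names are hypothetical and not part of any ToBI tool. The data are the words and break indices of <<weight>> as transcribed above.

```python
STANDARD_FILLERS = {"um", "uh", "mm"}  # the spellings recommended above

# Orthographic tier and break index tier of <<weight>>, one index per word.
words = (
    "The weight on a six on a seven sixty seven is "
    "three thousand uh three hundred and twelve "
    "thousand pounds uh is that including passengers"
).split()
breaks = (
    "1 0 1 1 2p 1 1 2p 1 3- 2p "
    "1 2p 4 1 1 1 1 "
    "2p 2p 4 1 1 3 4"
).split()


def filled_pauses(words):
    """Positions of words spelled as one of the standard hesitation noises."""
    return [i for i, w in enumerate(words) if w.lower() in STANDARD_FILLERS]


def hesitation_breaks(breaks):
    """Positions of words followed by a 2p or 3p break index."""
    return [i for i, b in enumerate(breaks) if b in ("2p", "3p")]


print([words[i] for i in filled_pauses(words)])
print([(words[i], breaks[i]) for i in hesitation_breaks(breaks)])
```

      A search of this kind would miss a filler transcribed as "ah" or "uhhhh", which is the practical reason for keeping to the limited set of spellings.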

      A filled pause may be perceived as unaccented, and yet as constituting its own intermediate or intonational phrase. Normally each intonation phrase is required to have at least one pitch accent. In the case of filled pauses this criterion is relaxed; an unaccented filled pause in its own phrase can be labelled with the phrase accent (chosen from the full inventory) without the requirement that a pitch accent be marked on the filled pause.
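
      The relaxed criterion can be stated as a small check. The sketch below is in Python and rests on several assumptions made only for illustration: a phrase is given as a list of its words plus a list of its tone labels, a pitch accent is recognized simply by the `*' in its label, and the function name is hypothetical. It is a simplification, not a full well-formedness test for the tone tier.

```python
FILLERS = {"um", "uh", "mm"}  # standardized hesitation-noise spellings


def has_required_pitch_accent(words, tones):
    """Check the pitch accent requirement for one phrase.

    A phrase normally needs at least one pitch accent (a tone label
    containing '*'). The requirement is waived when the phrase consists
    only of filled pauses.
    """
    if any("*" in tone for tone in tones):
        return True
    return len(words) > 0 and all(w.lower() in FILLERS for w in words)


print(has_required_pitch_accent(["uh"], ["L-"]))        # filled pause alone
print(has_required_pitch_accent(["to", "me"], ["L-"]))  # ordinary phrase, no accent
```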

      The last major class of disfluency is the class of repairs and fresh starts, which Nakatani and Shriberg define as "lexical self-corrections of parts of sentences and whole sentences, respectively". They give us utterance <<fare>> as an example of a repair, and <<connections>> as an example of a fresh start. (Here we have used the misc tier to mark these interpretations of the disfluencies.) These two examples also illustrate abrupt cutoffs resulting in word fragments.


      EXAMPLE <<fare>>: show me the cheapest fare from Da- from
                            1  1   1        1    4    1   1p   1
                                                      repair<
                                                         repair>
                        Philadelphia to Dallas excluding restriction
                                    3  1      4         4           4 
                        v u slash one
                         1 3     1   4
      


      EXAMPLE <<connections>>: What are the plane sizes for these flights and
                                  1   1   1     1     4   1     1       4   1
                               do they ha(ve)- do are there any other flights
                                2p   1       1p 1p  1     1   1     1       1
                                       restart< restart>
                               that have s- connections
                                  1    1  1p          4

      More detailed suggestions about how to flag repairs can be obtained by writing directly to Christine Nakatani (chn@das.harvard.edu) or Elizabeth Shriberg (ees@speech.sri.com).


      ***********************************************************
      PRACTICE SIX: break index 2, the p diacritic, disfluencies
      ***********************************************************

      Transcribe these exercises using the exercises script.

      _______________________________________________________________________
      EASY:

      EXERCISE <<park5>>: Uh and then I go under a footbridge and into the park.

      EXERCISE <<business>>: A lot of people have done this; they sell their business, and they have... If something goes wrong, and they have the first rights to buy it back. [Repeated exercise from PRACTICE THREE. Transcribe the phrase "and they have,..."]

      EXERCISE <<howto>>: I know we've gotta do it but I don't know how to do it. [Repeated exercise from PRACTICE FOUR.]

      _______________________________________________________________________
      INTERMEDIATE:

      EXERCISE <<mean>>: Because I I mean, to make a map on computer is not n- nearly as much fun.

      EXERCISE <<semester>>: The advisor to f- fill out my schedule for the first semester said "Why don't you take Introduction... Intro... Introductory Linguistics."

      EXERCISE <<spoon2>>: There's a spoon in here. [Compare <<spoon1>> in PRACTICE TWO.]

      EXERCISE <<author>>: The author of more than eight hundred state supreme court opinions (Hennessy is widely respected for his legal scholarship and his administrative abilities.) [This is the first part of <<hennessy>> in Section 2.10.]

      EXERCISE <<usually-not>>: Usually not, no. Nah. Usually they won't give you chances.

      _______________________________________________________________________
      DIFFICULT:

      EXAMPLE <<tuition>>: My learning experiences are on the job, so when I screw something up instead of s- spending all this money to go to college... When I screw up a job, that's my tuition for college. That's exactly, exactly how it works, there's no difference at all.

      EXERCISE <<fail>>: And what happens is: when you... when you buy my business, and you try to run my business, it's really hard for you to run my business. So a lotta times they fail. [Repeated exercise from PRACTICE THREE. You've transcribed most of the tones already. Now you're ready to worry about the break indices, particularly those around the first "when you..." and "when you try to run my business".]

      EXERCISE <<figureout>>: Half the job is accomplished by just starting it. [Interviewer: Mm-hmm] So just start doing it, and you'll figure it out. [Interviewer: Yeah] You know what I mean?

      labelling_guide_v2.ASCII (augmented by some HTML)