SFB 673: Events
Lecture
Andy Lücking: The Bielefeld Speech and Gesture Alignment Corpus (SaGA)
Friday, 25.06.2010, 12:15-13:00

People communicate multimodally. Most prominently, they co-produce speech and gesture. How do they do that? Studying the interplay of both modalities has to be informed by empirically observed communication behavior. We present a corpus of speech and gesture data collected in a controlled study. We describe 1) the setting underlying the data, 2) the annotation of the data, and 3) the methods and results of the reliability evaluation.

Ad 1): The primary data of the SaGA corpus consist of 25 dialogs in which interlocutors engage in a spatial communication task combining direction-giving and sight description. The stimulus is a model of a town presented in a Virtual Reality (VR) environment. The VR scenario allows for better determination and experimental control of the content of multimodal messages. Additionally, it ensures that all participants receive the same stimulus. After finishing a “bus ride” through the VR town past five landmarks, a so-called router explained the route as well as the wayside landmarks to an unknown and naive follower. The 25 dialogs comprise 280 minutes of video material containing 4,961 iconic/deictic gestures, approximately 1,000 discourse gestures, and 39,435 words.

Ad 2): The data have been completely annotated on the basis of two annotation manuals. All gestures have been segmented in order to identify the stroke phase. The strokes are then classified by type, namely deictic, iconic, or discourse (hybrids are allowed). The classification of iconic gestures furthermore distinguishes eight representation techniques. Each gesture has been coded for its morphology, consisting of handshape, wrist position, palm orientation, and back-of-hand orientation. Movements within any of these dimensions are coded in terms of movement features, which are built syntactically by concatenating static feature-value atoms. We also transcribed the interlocutors' speech at the word level. The transcribed speech is enriched with information about the overall discourse context, namely theme and rheme, given and new, and private and shared.
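To make the idea of movement features built from static atoms more concrete, here is a minimal Python sketch. The atom inventory (`up`, `down`, `left`, ...) and the `>` separator are hypothetical stand-ins chosen for illustration; they are not the actual labels or syntax of the SaGA annotation manuals. The helpers `make_movement` and `parse_movement` are likewise hypothetical.

```python
from typing import FrozenSet, List

# Hypothetical inventory of static palm-orientation atoms; the SaGA manuals
# define their own labels, which are not reproduced here.
PALM_ATOMS: FrozenSet[str] = frozenset({"up", "down", "left", "right", "away", "towards"})

# Hypothetical separator marking a transition from one static value to the next.
SEPARATOR = ">"


def _check(atoms: List[str], inventory: FrozenSet[str]) -> None:
    """Validate a sequence of static atoms as a movement."""
    if len(atoms) < 2:
        raise ValueError("a movement needs at least two static values")
    for atom in atoms:
        if atom not in inventory:
            raise ValueError(f"unknown atom: {atom!r}")
    # Assumption: a movement must actually change value at every step.
    for prev, nxt in zip(atoms, atoms[1:]):
        if prev == nxt:
            raise ValueError(f"no change between consecutive atoms: {prev!r}")


def make_movement(atoms: List[str], inventory: FrozenSet[str] = PALM_ATOMS) -> str:
    """Concatenate static atoms into a single movement feature string."""
    _check(atoms, inventory)
    return SEPARATOR.join(atoms)


def parse_movement(feature: str, inventory: FrozenSet[str] = PALM_ATOMS) -> List[str]:
    """Split a movement feature string back into its static atoms."""
    atoms = feature.split(SEPARATOR)
    _check(atoms, inventory)
    return atoms


if __name__ == "__main__":
    print(make_movement(["down", "up"]))  # down>up
    print(parse_movement("left>up>right"))  # ['left', 'up', 'right']
```

Because movements reuse the static inventory, the same vocabulary validates both static values and movements; rejecting repeated consecutive atoms is a design choice assumed here, not something the abstract specifies.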

Ad 3): The annotation data have been evaluated in terms of interrater reliability. Here, a qualitative distinction has to be made between Type I and Type II ratings, i.e., non-nominal vs. nominal ratings. As a chance-corrected coefficient of the level of agreement in Type II data, we calculate the first-order agreement coefficient AC1. To assess the extent of association between annotations of the Type I gesture morphology, we employ an approach based on angle measures. Chance-corrected agreement on Type II data exceeds the self-set threshold of 70%. Observed interrater agreement on Type I data yields angular values that, by and large, indicate rather minor disagreement between annotators (27° on average).
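The two kinds of reliability measure can be sketched in Python as follows. `gwet_ac1` implements Gwet's first-order agreement coefficient AC1 for two raters and nominal categories: observed agreement corrected by a chance term computed from the average category proportions. `angle_deg` computes the angle between two orientation vectors. The mapping of orientation labels to 3D vectors (`DIRECTIONS`, with compound labels joined by `/` and summed) is a hypothetical encoding for illustration, not necessarily the one used in the SaGA evaluation, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional, Sequence, Tuple

Vector = Tuple[float, float, float]


def gwet_ac1(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable],
             categories: Optional[Iterable[Hashable]] = None) -> float:
    """Gwet's AC1 for two raters; `categories` defaults to those actually used."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty rating sequences of equal length")
    cats = set(categories) if categories is not None else set(rater_a) | set(rater_b)
    q = len(cats)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    # Observed agreement: share of items labelled identically by both raters.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts = Counter(rater_a) + Counter(rater_b)
    # pi_k: proportion of all 2n ratings falling into category k.
    pis = [counts[c] / (2 * n) for c in cats]
    # Chance agreement: sum of pi_k * (1 - pi_k), scaled by 1 / (q - 1).
    p_e = sum(pi * (1 - pi) for pi in pis) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical encoding of basic orientation labels as unit vectors
# (x: speaker's right, y: away from the body, z: up).
DIRECTIONS: Dict[str, Vector] = {
    "right": (1.0, 0.0, 0.0), "left": (-1.0, 0.0, 0.0),
    "away": (0.0, 1.0, 0.0), "towards": (0.0, -1.0, 0.0),
    "up": (0.0, 0.0, 1.0), "down": (0.0, 0.0, -1.0),
}


def to_vector(label: str) -> Vector:
    """Map a (possibly compound, '/'-joined) label to a summed direction vector."""
    parts = [DIRECTIONS[p] for p in label.split("/")]
    return (sum(v[0] for v in parts), sum(v[1] for v in parts), sum(v[2] for v in parts))


def angle_deg(u: Vector, v: Vector) -> float:
    """Angle between two direction vectors in degrees (0 = identical, 180 = opposite)."""
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        raise ValueError("zero-length direction vector")
    cos = sum(a * b for a, b in zip(u, v)) / norm
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def mean_angle(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Average angular disagreement over paired annotations."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label sequences of equal length")
    angles = [angle_deg(to_vector(a), to_vector(b)) for a, b in zip(labels_a, labels_b)]
    return sum(angles) / len(angles)


if __name__ == "__main__":
    a = ["iconic", "iconic", "deictic", "discourse"]
    b = ["iconic", "deictic", "deictic", "discourse"]
    print(round(gwet_ac1(a, b), 3))  # 0.628
    print(round(mean_angle(["up", "up/left", "away"], ["up", "up", "right"]), 1))  # 45.0
```

The toy inputs only show how such figures are computed; the 70% threshold and the 27° average reported above come from the actual SaGA annotations. Treating compound orientations as vector sums is one plausible way to let "close" labels produce small angles; it is an assumption of this sketch.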
Contact
contact-person: Andy Lücking