Rhythm and timing in dialogue
Petra Wagner, Ipke Wachsmuth
Dialogue partners show a strong ability to synchronise their utterances. Models of rhythmic
entrainment can help explain temporal synchronisation across utterances, in particular the
anticipation of temporal windows for backchannel utterances or turn initiations.
In this project, we aim to identify the rhythmic-prosodic cues used by the listener to generate
hypotheses concerning the timing of (potential) upcoming dialogue contributions. The goal of the
project is to build a model that is descriptively adequate, in line with cognitively
plausible models of rhythmic entrainment, and suitable for integration into an artificial agent.
The exact timing of initiating utterances in dialogue is crucial for natural interaction.
Previous analyses have shown that rhythmical structure enables dialogue partners to
generate precise timing hypotheses. Dialogue partners show a strong ability to synchronise their
speech utterances, a phenomenon that has often been called entrainment. Models of
entrainment are potentially useful in explaining temporal synchronisation across
utterances, i.e. they can help account for the anticipation of turn ends or temporal windows for
backchannels. It is also well known that listeners are strongly guided by their rhythmical
expectations when processing speech or multimodal utterances.
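To make the idea of rhythm-based timing hypotheses concrete, the following is a minimal sketch of an adaptive oscillator that tracks a sequence of onset times (for example, stressed syllables) and predicts when the next one is due. It is an illustration only, not the model developed in the project: the class name, the linear phase and period correction, and the gain values are all assumptions chosen for simplicity.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveOscillator:
    """Toy oscillator with linear phase and period correction (illustrative only)."""

    period: float               # current estimate of the inter-onset interval (s)
    next_beat: float            # predicted time of the next onset (s)
    phase_gain: float = 0.5     # hypothetical coupling strength for phase
    period_gain: float = 0.3    # hypothetical coupling strength for period
    min_period: float = 0.05    # assumed lower bound that keeps the period positive

    def observe(self, onset: float) -> float:
        """Adapt to one observed onset and return the next predicted onset time."""
        # If beats were skipped (e.g. a pause), move the expectation forward
        # until the onset lies within half a period of the expected beat.
        while onset - self.next_beat > self.period / 2:
            self.next_beat += self.period
        # Positive error: the onset came later than expected.
        error = onset - self.next_beat
        # Period correction: slow down after late onsets, speed up after early ones.
        self.period = max(self.period + self.period_gain * error, self.min_period)
        # Phase correction plus one full (updated) period gives the next expectation.
        self.next_beat += self.phase_gain * error + self.period
        return self.next_beat


if __name__ == "__main__":
    # A speaker producing onsets every 0.4 s, tracked by an oscillator
    # that starts out expecting 0.5 s intervals.
    onsets = [0.4 * i for i in range(12)]
    osc = AdaptiveOscillator(period=0.5, next_beat=onsets[0] + 0.5)
    for t in onsets[1:]:
        prediction = osc.observe(t)
        print(f"onset {t:.2f}s  period {osc.period:.3f}s  next expected {prediction:.3f}s")
```

In such a sketch, the predicted onset time plays the role of a timing hypothesis: a listener or agent could treat a window around it as a candidate slot for a backchannel or a turn initiation. Because the period is re-estimated at every onset, the oscillator follows a speaker who accelerates or decelerates rather than assuming a fixed tempo.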
In this new project, we aim to identify the rhythmic-prosodic cues used by the listener to
generate hypotheses concerning the timing of (potential) upcoming dialogue contributions. The
main goal of the project is to build a model of timing in dialogue that is descriptively
adequate, in line with cognitively plausible models of rhythmical entrainment, and
implementable in an artificial agent. Temporal models of dialogue timing often fail to take into
account the dynamic nature of the rhythmical properties of speech, i.e. the fact that
within an utterance, speakers may accelerate, decelerate or change the broad rhythmical pattern
of their utterance. Dynamic models of temporal entrainment provide an explanatory basis for a
dynamic perspective on dialogue timing. These ideas seem a promising starting point for improving a
previously developed computational model for turn taking in a dynamic interaction loop. We will
collect and analyse semi-spontaneous speech data and implement the findings in a dynamic model based on
adaptive oscillators and integrated in an artificial agent for the purpose of model