Agent architecture for the emergent coordination of speaking turns with a user

Mathieu Jégou, Pierre Chevaillier

IRT b-com, 25 rue Claude Chappe, F-29280 Plouzané, France

ENIB, Lab-STICC, 25 rue Claude Chappe, F-29280 Plouzané, France

31 October 2017



This article presents a continuous, emergent model for coordinating speaking turns in user-agent dyads. The distinctive feature of this model is its ability to handle incrementally a wide range of situations observed in human spoken interaction, such as competitive overlaps and smooth transitions. We first introduce the conceptual model, then describe its implementation. Finally, we evaluate the model in a Wizard-of-Oz experiment in which users interact with the agent in real time. The results show that the model improves the agent’s ability to coordinate its speaking turns with the user. However, we also observe that users have difficulty linking the agent’s behavior to its turn-taking intentions.


conversational agent, behavioral architecture, coordination, perception-action, turn-taking, prosody

1. Introduction
2. Positioning
3. Theoretical model
4. Architecture design
5. Model validation: user-agent interactions
6. Conclusion
