Temporal association rules of social signals for the synthesis of behaviors of embodied conversational agents. Application to interpersonal stance

Thomas Janssoone Chloé Clavel Kévin Bailly Gaël Richard 

Sorbonne Universités, UPMC Univ Paris 06, CNRS, ISIR, 4 place Jussieu, 75252 Paris, France

LTCI, Télécom ParisTech, Université Paris Saclay, Paris, France

Corresponding Author Email: 
prenom.nom@isir.upmc.fr; prenom.nom@telecom-paristech.fr
Page: 511-536

DOI: https://doi.org/10.3166/RIA.31.511-536

Published: 31 October 2017
Abstract: 

In the field of Embodied Conversational Agents (ECAs), one of the main challenges is to generate socially believable agents. The long-term objective of the present study is to infer rules for the multimodal generation of agents’ socio-emotional behaviour. In this paper, we introduce the Social Multimodal Association Rules with Timing (SMART) algorithm, which learns such rules from the analysis of a multimodal corpus composed of audio-video recordings of human-human interactions. The proposed methodology applies a sequence-mining algorithm to automatically extracted social signals, such as prosody, head movements and facial muscle activations. This allows us to infer temporal association rules for behaviour generation. We show that this method can automatically compute temporal association rules consistent with prior results from the literature, especially in psychology and sociology. The results of a perceptual evaluation confirm the ability of an agent driven by temporal association rules to express a specific stance.
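As a toy illustration of the kind of output such mining produces (this is not the paper's actual algorithm, and the event labels, timestamps, and thresholds below are hypothetical), the sketch counts how often one symbolic social-signal event is followed by another within a fixed time window:

```python
from collections import defaultdict

def mine_temporal_rules(events, max_gap, min_support=2):
    """Toy temporal-association-rule mining: for every ordered pair of
    event labels (a, b), count how often b occurs within `max_gap` time
    units after a, and keep pairs meeting `min_support`.
    `events` is a list of (timestamp, label) tuples sorted by timestamp."""
    counts = defaultdict(int)
    for i, (t_a, a) in enumerate(events):
        seen = set()  # count each consequent label at most once per antecedent
        for t_b, b in events[i + 1:]:
            if t_b - t_a > max_gap:
                break  # events are sorted, so no later match is possible
            if b != a and b not in seen:
                counts[(a, b)] += 1
                seen.add(b)
    return {rule: n for rule, n in counts.items() if n >= min_support}

# Hypothetical stream of social-signal events (time in seconds, label):
stream = [
    (0.0, "pitch_rise"), (0.4, "AU12"), (1.0, "head_nod"),
    (2.0, "pitch_rise"), (2.3, "AU12"),
    (4.0, "pitch_rise"), (4.5, "AU12"), (5.5, "head_nod"),
]
rules = mine_temporal_rules(stream, max_gap=1.0)
```

A rule such as (pitch_rise, AU12) with high support and a tight time window could then be used, at synthesis time, to schedule a smile activation shortly after a pitch rise in the agent's behaviour.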

Keywords: 

temporal association rules, TITARL, virtual agent, interpersonal stance, social signal processing

1. Introduction
2. State of the art
3. SMART: finding the temporal information linking social signals
4. Validation: studies across different social signals and different time scales
5. Conclusion
Acknowledgements

This work was carried out within the Labex SMART, with financial support from the French State, managed by the ANR, as part of the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02. The authors also wish to thank the Teralab platform for its help in carrying out this project.
