Implementation and Evaluation of an Attention Model for Adaptive Vision

Matthieu Perreira Da Silva, Vincent Courboulay

IRCCyN, Université de Nantes, Rue Christian Pauc - BP 50609, F-44306 Nantes cedex 03

L3I, Université de La Rochelle, Avenue M. Crépeau, F-17042 La Rochelle cedex 01




In the field of scene analysis for computer vision, a trade-off must be found between the expected quality of the results and the amount of computing resources allocated to each task. Using an adaptive vision system provides a more flexible solution, as its analysis strategy can be changed according to the information available about the execution context. We describe how to create and evaluate a visual attention system tailored for interacting with a computer vision system so that it adapts its processing according to the interest (or salience) of each element of the scene. We propose a new set of constraints, named PAIRED, to evaluate the adequacy of a model with respect to its different applications. We justify why dynamical systems provide good properties for simulating the dynamic competition between different kinds of information. Finally, we present results demonstrating that our model is fast, highly configurable, and plausible.

Extended Abstract

While machine vision systems are becoming increasingly powerful, in most regards they are still far inferior to their biological counterparts. In humans, evolution has produced a visual attention system that selects the most important information in order to reduce both cognitive load and scene-understanding ambiguity. Studying biological systems and applying the findings to the construction of computational vision models and artificial vision systems is therefore a promising way of advancing the field of machine vision.

In the field of scene analysis for computer vision, a trade-off must be found between the expected quality of the results and the amount of computing resources allocated to each task. This is usually a design-time decision, implemented through the choice of predefined algorithms and parameters. However, this approach limits the generality of the system. Using an adaptive vision system provides a more flexible solution, as its analysis strategy can be changed according to the information available about the execution context. Such a system consequently requires some kind of guiding mechanism to explore the scene faster and more efficiently.

In this article, we propose a first step toward building a bridge between computer vision algorithms and visual attention. In particular, we describe how to create and evaluate a visual attention system tailored for interacting with a computer vision system so that it adapts its processing according to the interest (or salience) of each element of the scene. Positioned between hierarchical saliency-based models and competitive distributed models, our model is hierarchical yet competitive. This original approach allows us to generate the evolution of attentional focus points without the need for either a saliency map or an explicit inhibition-of-return mechanism. The new real-time computational model is based on a dynamical system; the use of such a complex system is justified by an adjustable trade-off between nondeterministic attentional behavior and properties of stability, reproducibility, and reactiveness.

In the first two sections, we give a brief overview of the main theories and concepts of human visual attention, and we outline the strengths and weaknesses of state-of-the-art attention models. This analysis is based on their potential for integration into an adaptable computer vision system. We propose a new set of constraints, called ‘PAIRED’, to evaluate the adequacy of a model with respect to its different applications.

In the third section, we provide an in-depth description of our model and its implementation. We justify why dynamical systems are a good choice for visual attention simulation, and we show that prey/predator models provide good properties for simulating the dynamic competition between different kinds of information. This dynamical system is also used to generate a focus point at each time step of the simulation. To show that our model can be integrated into an adaptable computer vision system, we demonstrate that this architecture is fast and allows flexible, real-time visual attention simulation. In particular, we present a feedback mechanism used to change the scene exploration behavior of the model. This mechanism can be used either to maximize scene coverage (explore each and every part) or to maximize focalization on a particular salient area (tracking).
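The prey/predator dynamics mentioned above can be illustrated with a minimal Lotka-Volterra sketch. This is only a toy, not the paper's actual model: the real system couples per-pixel prey maps (feature conspicuity) with a predator map (attention), plus noise and feedback; all parameter names here are hypothetical.

```python
def lotka_volterra_step(prey, pred, growth, predation, mortality, dt=0.01):
    """Advance a generic prey/predator pair by one Euler step."""
    d_prey = growth * prey - predation * prey * pred
    d_pred = predation * prey * pred - mortality * pred
    return prey + dt * d_prey, pred + dt * d_pred

# Conspicuity ("prey") feeds attention ("predator"); in a spatial version,
# the predator's maximum over the image would play the role of the focus
# point at each time step of the simulation.
prey, pred = 1.0, 0.5
pred_history = []
for _ in range(5000):
    prey, pred = lotka_volterra_step(prey, pred,
                                     growth=1.0, predation=1.5, mortality=0.8)
    pred_history.append(pred)
```

The populations oscillate around an equilibrium rather than converging to a fixed point, which is what sustains the ongoing competition between information sources.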

In the last section, we present the evaluation results of our model. Since the model is highly configurable, its evaluation does not cover its plausibility compared to human eye fixations (already studied in (Perreira Da Silva et al., 2011)), but rather the influence of each parameter on a set of properties:

– stability: do the values of the dynamical system stay within their nominal range when the different parameters of the model are changed?

– reproducibility: as discrete dynamical systems can exhibit chaotic behavior, what is the influence of the various parameters of the model (in particular, noise) on the variability of the focus paths generated during different simulations on the same data?

– scene exploration: which parameters influence the scene exploration strategy of our model?

– system dynamics: how can we influence the reactivity of the system? In particular how do we deal with mean fixation time?

For all of these properties we have also studied the influence of top-down feedback.
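The reproducibility property can be made concrete with a toy experiment. The noisy-argmax selection rule below is a hypothetical stand-in for the actual dynamical system, and all names are illustrative; the point is only how one can quantify focus-path variability across runs as a function of noise amplitude.

```python
import numpy as np

def simulate_focus_path(saliency, n_steps, noise_std, rng):
    """Hypothetical focus generator: at each step, the focus lands on the
    argmax of the saliency map perturbed by Gaussian noise (a crude
    stand-in for the stochastic term of the dynamical system)."""
    h, w = saliency.shape
    path = []
    for _ in range(n_steps):
        noisy = saliency + rng.normal(0.0, noise_std, size=(h, w))
        path.append(np.unravel_index(np.argmax(noisy), (h, w)))
    return np.asarray(path, dtype=float)

def mean_path_distance(p1, p2):
    """Step-by-step mean Euclidean distance between two focus paths."""
    return float(np.mean(np.linalg.norm(p1 - p2, axis=1)))

# Two salient blobs compete for attention on a toy 32x32 "saliency" map.
saliency = np.zeros((32, 32))
saliency[8, 8] = 1.0
saliency[24, 20] = 0.9

# Same pair of seeds, two noise levels: path variability grows with noise.
low = [simulate_focus_path(saliency, 50, 0.01, np.random.default_rng(s))
       for s in (0, 1)]
high = [simulate_focus_path(saliency, 50, 0.5, np.random.default_rng(s))
        for s in (0, 1)]
```

With near-zero noise both runs lock onto the strongest blob and the paths coincide; with large noise the selection becomes essentially random and the paths diverge.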


In the field of scene analysis for computer vision, a trade-off must be found between the expected quality of the results and the resources allocated to processing. A flexible solution is to use an adaptive vision system capable of modulating its analysis strategy according to the available information and the context. In this article, we describe how to design and evaluate a visual attention system built to interact with a vision system so that the latter adapts its processing according to the interest (the salience) of each element of the scene. We also propose a new set of constraints, named PAIRED, for evaluating the suitability of the model for different applications. We justify the choice of dynamical systems by their properties, which are well suited to simulating the competition between different sources of information. Finally, we present a validation through various metrics showing that our results are fast, highly configurable, and relevant.


attention model, dynamical model, adaptive vision, implementation, evaluation.


dynamical attention model, adaptive vision, implementation, evaluation.

1. Introduction
2. Computational Models of Visual Attention
3. A Competitive Hierarchical Visual Attention Model
4. Model Evaluation
5. Conclusion

Ahmad S. (1992). VISIT: An efficient computational model of human visual attention. PhD thesis, University of Illinois, Champaign, IL.

Allport D. A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. In H. Heuer, A. F. Sanders (Eds.), Perspectives on perception and action, p. 395–419. Hillsdale, NJ, Lawrence Erlbaum Associates.

Avraham T., Lindenbaum M. (2010). Esaliency (extended saliency): meaningful attention using stochastic image modeling. IEEE transactions on pattern analysis and machine intelligence, vol. 32, no 4, p. 693–708.

Aziz M., Mertsching B. (2009). Towards Standardization of Evaluation Metrics and Methods for Visual Attention Models. In Attention in cognitive systems, p. 227–241. Springer.

Baldi P., Itti L. (2005). Attention: Bits versus Wows. In 2005 international conference on neural networks and brain, p. 56–61. IEEE.

Belardinelli A., Pirri F., Carbone A. (2009). Motion Saliency Maps from Spatiotemporal Filtering. In Lecture notes in artificial intelligence, p. 112–123. Springer.

Berthoz A. (2009). La simplexité. Paris, Odile Jacob.

Bruce B., Jernigan E. (2003). Evolutionary design of context-free attentional operators. In Proc. ICIP’03, p. 0–3. Citeseer.

Bruce N. D. B., Tsotsos J. K. (2009). Saliency, attention, and visual search: An information theoretic approach. Journal of Vision, vol. 9, no 3, p. 5.

Deco G. (2004). A Neurodynamical cortical model of visual attention and invariant object recognition. Vision Research, vol. 44, no 6, p. 621–642.

Desimone R., Duncan J. (1995). Neural mechanisms of selective visual attention. Annual review of neuroscience, vol. 18, p. 193–222.

Dorr M., Gegenfurtner K. R., Barth E. (2010). Variability of eye movements when viewing dynamic natural scenes. Journal of Vision, vol. 10, p. 1–17.

Eliasmith C. (1995). Mind as a dynamical system. Master’s thesis, University of Waterloo.

Fox M. D., Snyder A. Z., Vincent J. L., Raichle M. E. (2007). Intrinsic Fluctuations within Cortical Systems Account for Intertrial Variability in Human Behavior. Neuron, vol. 56, no 1, p. 171–184.

Frintrop S. (2005). VOCUS: A Visual Attention System for Object Detection and Goal-Directed Search. PhD thesis, University of Bonn.

Frintrop S., Backer G., Rome E. (2005). Selecting what is important: Training visual attention. In 28th annual german conference on AI (KI), p. 351–366. Koblenz, Germany, Springer Verlag.

Frintrop S., Klodt M., Rome E. (2007). A real-time visual attention system using integral images. In 5th international conference on computer vision systems (ICVS). Bielefeld, Germany, Applied Computer Science Group.

Gilles S. (1996). Description and experimentation of image matching using mutual information. Rapport technique. Oxford University, Robotics Research Group, Department of Engineering Science.

Hamker F. (2005). The emergence of attention by population-based inference and its role in distributed processing and cognitive control of vision. Computer Vision and Image Understanding, vol. 100, no 1-2, p. 64–106.

Heijden A. H. C. van der, Bem S. (1997). Successive approximations to an adequate model of attention. Consciousness and cognition, vol. 6, no 2-3, p. 413–28.

Idema T. (2005). The behaviour and attractiveness of the Lotka-Volterra equations. PhD thesis, Universiteit Leiden.

Itti L., Koch C. (2001). Feature combination strategies for saliency-based visual attention systems. Journal of Electronic Imaging, vol. 10, p. 161–169.

Itti L., Koch C., Niebur E., Others. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on pattern analysis and machine intelligence, vol. 20, no 11, p. 1254–1259.

Kadir T., Brady M. (2001). Saliency, scale and image description. International Journal of Computer Vision, vol. 45, no 2, p. 83–105.

Koch C., Ullman S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, vol. 4, no 4, p. 219–27.

Le Meur O. (2005). Attention sélective en visualisation d’images fixes et animées affichées sur écran : modèles et évaluation de performances - applications. PhD thesis, École polytechnique de l’Université de Nantes.

Le Meur O., Le Callet P. (2009). What we see is most likely to be what matters: Visual attention and applications. In International conference on image processing. Cairo, Egypt.

Le Meur O., Le Callet P., Barba D., Thoreau D. (2006). A coherent computational approach to model bottom-up visual attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no 5, p. 802–817.

Lesser M., Dinah M. (1998). Mind as a dynamical system: Implications for autism. In Psychobiology of autism: current research & practice.

Lienhart R., Maydt J. (2002). An extended set of haar-like features for rapid object detection. In IEEE ICIP, vol. 1, p. 900–903. Citeseer.

Lopez M., Fernandezcaballero A., Fernandez M., Mira J., Delgado A. (2006). Motion features to enhance scene segmentation in active visual attention. Pattern Recognition Letters, vol. 27, no 5, p. 469–478.

Mancas M. (2007). Computational Attention: Towards attentive computers. PhD thesis, Faculté Polytechnique de Mons.

Mozer M. C., Sitton M. (1998). Computational modeling of spatial attention. Attention, p. 341–393.

Murray J. (2003). Mathematical biology: An introduction. Berlin, Heidelberg, Springer Verlag.

Navalpakkam V., Arbib M., Itti L. (2005). Attention and scene understanding. In L. Itti, G. Rees, J. Tsotsos (Eds.), Neurobiology of attention, p. 197–203. Academic Press.

Navalpakkam V., Itti L. (2006). Top-down attention selection is fine grained. Journal of Vision, vol. 6, no 11, p. 4. 

Orabona F., Metta G., Sandini G. (2008). A Proto-object based visual attention model. In L. Paletta (Ed.), Attention in cognitive systems. theories and systems from an interdisciplinary viewpoint (wapcv), p. 198–215. Berlin, Heidelberg, Springer.


Park S., An K., Lee M. (2002). Saliency map model with adaptive masking based on independent component analysis. Neurocomputing, vol. 49, no 1, p. 417–422.

Perreira Da Silva M., Courboulay V., Estraillier P. (2011). Objective validation of a dynamical and plausible computational model of visual attention. In 3rd european workshop on visual information processing (euvip).

Peters R., Iyer A., Itti L., Koch C. (2005). Components of bottom-up gaze allocation in natural images. Vision Research, vol. 45, p. 2397–2416.

Rensink R. A. (2000). The dynamic representation of scenes. Visual Cognition, vol. 7, p. 17–42.

Rissanen J. (1978). Modeling by shortest data description. Automatica, vol. 14, p. 465–471.

Spratling M. W., Johnson M. H. (2004). A feedback model of visual attention. Journal of cognitive neuroscience, vol. 16, no 2, p. 219–37.

Sun Y., Fisher R., Wang F., Gomes H. (2008). A computer vision model for visual-object-based attention and eye movements. Computer Vision and Image Understanding, vol. 112, no 2, p. 126–142.

Tatler B. W. (2007). The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision, vol. 7, p. 1–17.

Tatler B. W., Baddeley R. J., Gilchrist I. D. (2005). Visual correlates of fixation selection: effects of scale and time. Vision research, vol. 45, no 5, p. 643–59.

Torralba A., Oliva A., Castelhano M. S., Henderson J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychological review, vol. 113, no 4, p. 766–86.

Treisman A. (1969). Strategies and models of selective attention. Psychological Review, vol. 76, p. 282–299.

Treisman A., Gelade G. (1980). A Feature-Integration Theory of Attention. Cognitive Psychology, vol. 12, no 1, p. 97–136.

Tsotsos J., Liu Y., Martineztrujillo J., Pomplun M., Simine E., Zhou K. (2005). Attending to visual motion. Computer Vision and Image Understanding, vol. 100, no 1-2, p. 3–40.

Tsotsos J. K. (1990). Analysing vision at the complexity level. Behavioral and Brain Sciences, vol. 13, p. 423–469.

Tsotsos J. K. (2007). A Selective History of Visual Attention. ECCV 2008 Tutorial.

Van Rullen R., Koch C. (2005). Visual Attention and Visual Awareness. In G. Celesia (Ed.), Disorders of visual processing, vol. 5, p. 65–83. Elsevier.

Viola P., Jones M. (2002). Robust real-time object detection. International Journal of Computer Vision, vol. 57, no 2, p. 137–154.


Vitay J., Rougier N., Alexandre F. (2005). A distributed model of spatial visual attention. In Biomimetic neural learning for intelligent robots, p. 54–72. Springer.

Walther D., Koch C. (2006). Modeling attention to salient proto-objects. Neural networks : the official journal of the International Neural Network Society, vol. 19, no 9, p. 1395–407.