Implémentation et Évaluation d’un Modèle D’Attention pour la Vision Adaptative

Matthieu Perreira Da Silva, Vincent Courboulay

IRCCyN, Université de Nantes, Rue Christian Pauc - BP 50609, F-44306 Nantes cedex 03

L3I, Université de La Rochelle, Avenue M. Crépeau, F-17042 La Rochelle cedex 01

Pages: 611-641
DOI: https://doi.org/10.3166/TS.28.611-641
Open Access

Abstract: 

In the field of scene analysis for computer vision, a trade-off must be found between the expected quality of the results and the amount of computer resources allocated to each task. Using an adaptive vision system provides a more flexible solution, as its analysis strategy can be changed according to the information available about the execution context. We describe how to create and evaluate a visual attention system tailored to interact with a computer vision system so that the latter adapts its processing according to the interest (or salience) of each element of the scene. We propose a new set of constraints, named PAIRED, to evaluate the adequacy of a model with respect to its different applications. We justify why dynamical systems provide good properties for simulating the dynamic competition between different kinds of information. Finally, we present results demonstrating that our model is fast, highly configurable, and plausible.

Extended Abstract

While machine vision systems are becoming increasingly powerful, in most regards they are still far inferior to their biological counterparts. In humans, evolution has produced a visual attention system that selects the most important information in order to reduce both cognitive load and scene understanding ambiguity. Studying biological systems and applying the findings to the construction of computational vision models and artificial vision systems is therefore a promising way of advancing the field of machine vision.

In the field of scene analysis for computer vision, a trade-off must be found between the expected quality of the results and the amount of computer resources allocated to each task. This is usually a design-time decision, implemented through the choice of predefined algorithms and parameters. However, this approach limits the generality of the system. Using an adaptive vision system provides a more flexible solution, as its analysis strategy can be changed according to the information available about the execution context. Consequently, such a system requires some kind of guiding mechanism to explore the scene faster and more efficiently.

In this article, we propose a first step towards building a bridge between computer vision algorithms and visual attention. In particular, we describe how to create and evaluate a visual attention system tailored to interact with a computer vision system so that the latter adapts its processing according to the interest (or salience) of each element of the scene. Positioned between hierarchical salience-based and competitive distributed models, we propose a hierarchical yet competitive model. Our original approach generates the evolution of attentional focus points without the need for either a saliency map or an explicit inhibition-of-return mechanism. This new real-time computational model is based on a dynamical system. The use of such a complex system is justified by an adjustable trade-off between non-deterministic attentional behavior and properties of stability, reproducibility and reactiveness.

In the first two sections, we give a brief overview of the main theories and concepts of human visual attention, and we outline the strengths and weaknesses of state-of-the-art attention models. This analysis is based on their potential for integration into an adaptable computer vision system. We propose a new set of constraints, called ‘PAIRED’, to evaluate the adequacy of a model with respect to its different applications.

In the third section, we provide an in-depth description of our model and its implementation. We justify why dynamical systems are a good choice for visual attention simulation, and we show that prey/predator models provide good properties for simulating the dynamic competition between different kinds of information. This dynamical system is also used to generate a focus point at each time step of the simulation. To show that our model can be integrated into an adaptable computer vision system, we show that this architecture is fast and allows flexible, real-time visual attention simulation. In particular, we present a feedback mechanism used to change the scene exploration behavior of the model. This mechanism can be used either to maximize scene coverage (explore each and every part) or to maximize focalization on a particular salient area (tracking).
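The prey/predator dynamics described above can be illustrated with a small, discretized Lotka-Volterra-style simulation on a grid: a "curiosity" (prey) map grows where the input is conspicuous, an "interest" (predator) map grows by consuming curiosity, a small noise term makes successive runs non-deterministic, and the focus of attention at each time step is simply the maximum of the predator map. The equations, variable names and constants below are an illustrative assumption, not the authors' actual formulation or parameter values.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(prey, pred, conspicuity, a=0.05, b=0.01, s=0.02, noise=1e-4):
    """One Euler step of a grid-based prey/predator (Lotka-Volterra-like)
    system.  All parameter values here are illustrative assumptions."""
    growth = a * conspicuity * prey      # prey ("curiosity") grows where input is conspicuous
    predation = s * prey * pred          # predators ("interest") consume prey
    prey = prey + growth - predation
    pred = pred + s * prey * pred - b * pred      # predators grow by feeding, otherwise decay
    pred = pred + noise * rng.random(pred.shape)  # noise -> non-deterministic scanpaths
    return np.clip(prey, 0.0, 1.0), np.clip(pred, 0.0, 1.0)

# Toy input: a single conspicuous location at (10, 20).
conspicuity = np.zeros((32, 32))
conspicuity[10, 20] = 1.0
prey = np.full((32, 32), 0.1)
pred = np.full((32, 32), 0.1)

for _ in range(200):
    prey, pred = step(prey, pred, conspicuity)

# The focus of attention emerges as the predator-map maximum,
# with no saliency map and no explicit inhibition-of-return mechanism.
focus = np.unravel_index(np.argmax(pred), pred.shape)
```

Because the focus is re-derived from the system state at every step, top-down feedback of the kind described above could be wired in by modulating the growth or decay terms (damping the predator map around recent foci to favor exploration, or reinforcing it to favor tracking).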

In the last section, we present the evaluation results of our model. Since the model is highly configurable, its evaluation does not cover its plausibility compared to human eye fixations (already studied in (Perreira Da Silva et al., 2011)), but rather the influence of each parameter on a set of properties:

– stability: do the values of the dynamical system stay within their nominal range when the different parameters of the model are changed?

– reproducibility: as discrete dynamical systems can exhibit chaotic behavior, what is the influence of the various parameters of the model (in particular, noise) on the variability of the focus paths generated during different simulations on the same data?

– scene exploration: which parameters influence the scene exploration strategy of our model?

– system dynamics: how can we influence the reactivity of the system? In particular, how do we control the mean fixation time?

For all of these properties we have also studied the influence of top-down feedback.

RÉSUMÉ

Dans le domaine de l’analyse de scène en vision par ordinateur, un compromis doit être trouvé entre la qualité des résultats attendus et les ressources allouées pour effectuer les traitements. Une solution flexible consiste à utiliser un système de vision adaptatif capable de moduler sa stratégie d’analyse en fonction de l’information disponible et du contexte. Dans cet article, nous décrivons comment concevoir et évaluer un système d’attention visuelle conçu pour interagir avec un système de vision de façon à ce que ce dernier adapte ses traitements en fonction de l’intérêt (de la saillance) de chaque élément de la scène. Nous proposons également un nouvel ensemble de contraintes nommé PAIRED, permettant d’évaluer l’adéquation du modèle à différentes applications. Nous justifions le choix des systèmes dynamiques par leurs propriétés intéressantes pour simuler la compétition entre différentes sources d’informations. Nous présentons enfin une validation à travers différentes métriques montrant que nos résultats sont rapides, hautement configurables et pertinents.

Keywords: 

attention model, dynamical model, adaptive vision, implementation, evaluation.

MOTS-CLÉS

modèle dynamique d’attention, vision adaptative, implémentation, évaluation.

1. Introduction
2. Modèles Computationnels d’Attention Visuelle
3. Un Modèle d’Attention Visuelle Hiérarchique Compétitif
4. Évaluation du Modèle
5. Conclusion
  References

Ahmad S. (1992). Visit: An efficient computational model of human visual attention. PhD thesis, University of Illinois, Champaign, IL. http://ftp.icsi.berkeley.edu/ftp/pub/techreports/1991/tr-91-049.pdf

Allport D. A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. In H. Heuer, S. A.F. (Eds.), Perspectives on perception and action, p. 395–419. Hillsdale, NJ, Lawrence Erlbaum Associates.

Avraham T., Lindenbaum M. (2010). Esaliency (extended saliency): meaningful attention using stochastic image modeling. IEEE transactions on pattern analysis and machine intelligence, vol. 32, no 4, p. 693–708. http://www.ncbi.nlm.nih.gov/pubmed/20224124

Aziz M., Mertsching B. (2009). Towards Standardization of Evaluation Metrics and Methods for Visual Attention Models. In Attention in cognitive systems, p. 227–241. Springer. http://www.springerlink.com/index/v713433834617727.pdf

Baldi P., Itti L. (2005). Attention: Bits versus Wows. In 2005 international conference on neural networks and brain, p. 56–61. IEEE. http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1614548

Belardinelli A., Pirri F., Carbone A. (2009). Motion Saliency Maps from Spatiotemporal Filtering. In Lecture notes in artificial intelligence, p. 112–123. Springer. http://www.springerlink.com/index/425j618q84762l43.pdf

Berthoz A. (2009). La simplexité. Paris, Odile Jacob.

Bruce B., Jernigan E. (2003). Evolutionary design of context-free attentional operators. In Proc. ICIP'03, p. 0–3. Citeseer. http://www.cse.yorku.ca/~neil/ICIPnbruce.pdf

Bruce N. D. B., Tsotsos J. K. (2009). Saliency, attention, and visual search: An information theoretic approach. Journal of Vision, vol. 9, no 3, p. 5. http://www.journalofvision.org/content/9/3/5.full.pdf

Deco G. (2004). A Neurodynamical cortical model of visual attention and invariant object recognition. Vision Research, vol. 44, no 6, p. 621–642. http://linkinghub.elsevier.com/retrieve/pii/S0042698903006928

Desimone R., Duncan J. (1995). Neural mechanisms of selective visual attention. Annual review of neuroscience, vol. 18, p. 193–222. http://www.ncbi.nlm.nih.gov/pubmed/7605061

Dorr M., Gegenfurtner K. R., Barth E. (2010). Variability of eye movements when viewing dynamic natural scenes. Journal of Vision, vol. 10, p. 1–17. http://www.journalofvision.org/content/10/10/28.full.pdf

Eliasmith C. (1995). Mind as a dynamical system. Master's thesis, University of Waterloo. http://www.arts.uwaterloo.ca/~celiasmi/Papers/eliasmith.1995.dynamic%20mind.masters.pdf

Fox M. D., Snyder A. Z., Vincent J. L., Raichle M. E. (2007). Intrinsic Fluctuations within Cortical Systems Account for Intertrial Variability in Human Behavior. Neuron, vol. 56, no 1, p. 171–184. http://linkinghub.elsevier.com/retrieve/pii/S0896627307006666

Frintrop S. (2005). VOCUS: A Visual Attention System for Object Detection and Goal-Directed Search. PhD thesis, University of Bonn. http://www.iai.uni-bonn.de/~frintrop/paper/frintrop_phd06.pdf

Frintrop S., Backer G., Rome E. (2005). Selecting what is important: Training visual attention. In 28th annual german conference on ai (ki), p. 351–366. Koblenz, Germany, Springer Verlag. http://www.iai.uni-bonn.de/~frintrop/paper/frintrop_etal_ki05.pdf

Frintrop S., Klodt M., Rome E. (2007). A real-time visual attention system using integral images. In 5th international conference on computer vision systems (icvs). Bielefeld, Germany, Applied Computer Science Group. http://biecoll.ub.uni-bielefeld.de/volltexte/2007/36/pdf/ICVS2007-66.pdf

Gilles S. (1996). Description and experimentation of image matching using mutual information. Rapport technique. Oxford University, Robotics Research Group, Department of Engineering Science. http://www.robots.ox.ac.uk/~cvrg/trinity2002/seb/mutual_info.ps.gz

Hamker F. (2005). The emergence of attention by population-based inference and its role in distributed processing and cognitive control of vision. Computer Vision and Image Understanding, vol. 100, no 1-2, p. 64–106. http://linkinghub.elsevier.com/retrieve/pii/S1077314205000767

Heijden A. H. C. van der, Bem S. (1997). Successive approximations to an adequate model of attention. Consciousness and cognition, vol. 6, no 2-3, p. 413–28. http://www.ncbi.nlm.nih.gov/pubmed/9262419

Idema T. (2005). The behaviour and attractiveness of the Lotka-Volterra equations. PhD thesis, Universiteit Leiden. http://www.ilorentz.org/~idema/publications/maththesis.pdf

Itti L., Koch C. (2001). Feature combination strategies for saliency-based visual attention systems. Journal of Electronic Imaging, vol. 10, p. 161–169. http://papers.klab.caltech.edu/84/

Itti L., Koch C., Niebur E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no 11, p. 1254–1259. http://ilab.usc.edu/publications/doc/Itti_etal98pami.pdf

Kadir T., Brady M. (2001). Saliency, scale and image description. International Journal of Computer Vision, vol. 45, no 2, p. 83–105. http://www.springerlink.com/index/T45N2G8543574026.pdf

Koch C., Ullman S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, vol. 4, no 4, p. 219–27. http://papers.klab.caltech.edu/104/1/200.pdf

Le Meur O. (2005). Attention sélective en visualisation d’images fixes et animées affichées sur écran : modèles et évaluation de performances - applications. PhD thesis, Ecole polytechnique de l’Université de Nantes. http://www.irisa.fr/temics/staff/lemeur/publi/LeMeur_These.pdf

Le Meur O., Le Callet P. (2009). What we see is most likely to be what matters: Visual attention and applications. In International conference on image processing. Cairo, Egypt. http://www.irisa.fr/temics/staff/lemeur/publi/LeMeur_ICIP09.pdf

Le Meur O., Le Callet P., Barba D., Thoreau D. (2006). A coherent computational approach to model bottom-up visual attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no 5, p. 802–817. http://www.irccyn.ec-nantes.fr/~lecallet/paper/LeMeur-IEEEPAMI06.pdf

Lesser M., Dinah M. (1998). Mind as a dynamical system: Implications for autism. In Psychobiology of autism: current research & practice.

Lienhart R., Maydt J. (2002). An extended set of haar-like features for rapid object detection. In IEEE ICIP, vol. 1, p. 900–903. Citeseer. http://mmc36.informatik.uni-augsburg.de/mediawiki/images/c/c3/Icip2002.pdf

Lopez M., Fernandez-Caballero A., Fernandez M., Mira J., Delgado A. (2006). Motion features to enhance scene segmentation in active visual attention. Pattern Recognition Letters, vol. 27, no 5, p. 469–478. http://linkinghub.elsevier.com/retrieve/pii/S0167865505002631

Mancas M. (2007). Computational Attention: Towards attentive computers. PhD thesis, Faculté Polytechnique de Mons. http://theses.eurasip.org/media/theses/documents/mancas-matei-computational-attention-towards-attentive-computers.pdf

Mozer M. C., Sitton M. (1998). Computational modeling of spatial attention. Attention, p. 341–393. http://www.nbu.bg/cogs/events/2002/materials/Mozer/mozer1998.pdf

Murray J. (2003). Mathematical biology: An introduction. Berlin, Heidelberg, Springer Verlag.

Navalpakkam V., Arbib M., Itti L. (2005). Attention and scene understanding. In L. Itti, G. Rees, J. Tsotsos (Eds.), Neurobiology of attention, p. 197–203. ACADEMIC PRESS. http://ilab.usc.edu/publications/doc/Navalpakkam_etal05noa.pdf

Navalpakkam V., Itti L. (2006). Top-down attention selection is fine grained. Journal of Vision, vol. 6, no 11, p. 4. http://www.journalofvision.org/content/6/11/4.full.pdf 

Orabona F., Metta G., Sandini G. (2008). A Proto-object based visual attention model. In L. Paletta (Ed.), Attention in cognitive systems. Theories and systems from an interdisciplinary viewpoint (WAPCV), p. 198–215. Berlin, Heidelberg, Springer. http://www.springerlink.com/index/71U3T3262424M763.pdf

Park S., An K., Lee M. (2002). Saliency map model with adaptive masking based on independent component analysis. Neurocomputing, vol. 49, no 1, p. 417–422. http://www.ingentaconnect.com/content/els/09252312/2002/00000049/00000001/art00637

Perreira Da Silva M., Courboulay V., Estraillier P. (2011). Objective validation of a dynamical and plausible computational model of visual attention. In 3rd european workshop on visual information processing (euvip). http://hal.archives-ouvertes.fr/docs/00/61/77/30/PDF/euvip_perreira.pdf

Peters R., Iyer A., Itti L., Koch C. (2005). Components of bottom-up gaze allocation in natural images. Vision Research, vol. 45, p. 2397–2416. http://linkinghub.elsevier.com/retrieve/pii/S0042698905001975

Rensink R. A. (2000). The dynamic representation of scenes. Visual Cognition, vol. 7, p. 17–42. http://homepages.rpi.edu/~grayw/courses/cogs6962/papers/REN00_VisCog.pdf

Rissanen J. (1978). Modeling by shortest data description. Automatica, vol. 14, p. 465–471.

Spratling M. W., Johnson M. H. (2004). A feedback model of visual attention. Journal of cognitive neuroscience, vol. 16, no 2, p. 219–37. http://www.ncbi.nlm.nih.gov/pubmed/15068593

Sun Y., Fisher R., Wang F., Gomes H. (2008). A computer vision model for visual-object-based attention and eye movements. Computer Vision and Image Understanding, vol. 112, no 2, p. 126–142. http://linkinghub.elsevier.com/retrieve/pii/S1077314208000167

Tatler B. W. (2007). The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision, vol. 7, p. 1–17. http://www.journalofvision.org/content/7/14/4.full.pdf

Tatler B. W., Baddeley R. J., Gilchrist I. D. (2005). Visual correlates of fixation selection: effects of scale and time. Vision research, vol. 45, no 5, p. 643–59. http://www.ncbi.nlm.nih.gov/pubmed/15621181

Torralba A., Oliva A., Castelhano M. S., Henderson J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychological review, vol. 113, no 4, p. 766–86. http://www.ncbi.nlm.nih.gov/pubmed/17014302

Treisman A. (1969). Strategies and models of selective attention. Psychological Review, vol. 76, p. 282–299.

Treisman A., Gelade G. (1980). A Feature-Integration Theory of Attention. Cognitive Psychology, vol. 12, no 1, p. 97–136. http://www.yorku.ca/mfallah/bandb/treisman_gelade.pdf

Tsotsos J., Liu Y., Martineztrujillo J., Pomplun M., Simine E., Zhou K. (2005). Attending to visual motion. Computer Vision and Image Understanding, vol. 100, no 1-2, p. 3–40. http://linkinghub.elsevier.com/retrieve/pii/S1077314205000779

Tsotsos J. K. (1990). Analysing vision at the complexity level. Behavioral and Brain Sciences, vol. 13, p. 423–469. http://www.cse.yorku.ca/~tsotsos/Homepage%20of%20John%20K_files/bbs-90.pdf

Tsotsos J. K. (2007). A Selective History of Visual Attention. ECCV 2008 Tutorial. http://www.cse.yorku.ca/~albertlr/attention_tutorial_eccv2008.htm

Van Rullen R., Koch C. (2005). Visual Attention and Visual Awareness. In G. Celesia (Ed.), Disorders of visual processing, vol 5, vol. 91125, p. 65–83. Elsevier. http://papers.klab.caltech.edu/277/1/442.pdf

Viola P., Jones M. (2002). Robust real-time object detection. International Journal of Computer Vision, vol. 57, no 2, p. 137–154. http://research.microsoft.com/en-us/um/people/viola/Pubs/Detect/violaJones_IJCV.pdf

Vitay J., Rougier N., Alexandre F. (2005). A distributed model of spatial visual attention. In Biomimetic neural learning for intelligent robots, p. 54–72. Springer. http://www.springerlink.com/index/2qwwddx022jy6naq.pdf

Walther D., Koch C. (2006). Modeling attention to salient proto-objects. Neural networks : the official journal of the International Neural Network Society, vol. 19, no 9, p. 1395–407. http://www.ncbi.nlm.nih.gov/pubmed/17098563