Fusion d’Informations pour la Compréhension de Scènes


Philippe Xu, Franck Davoine, Jean-Baptiste Bordes, Thierry Denœux

UMR CNRS 7253 Heudiasyc, Université de Technologie de Compiègne, BP 20529, 60205 Compiègne Cedex, France

LIAMA, CNRS; Key Lab of Machine Perception (MOE), Peking University, Beijing, P.R. China

Pages: 57-80
DOI: https://doi.org/10.3166/TS.31.57-80
Received: 11 September 2013 | Accepted: 2 June 2014
Open Access

Abstract: 

This paper addresses the problem of scene understanding for driver assistance systems. In order to recognize the large number of objects that may be found on the road, several sensors and classification algorithms have to be used. The proposed approach is based on representing all available information over the regions of an over-segmented image. The main novelty of the framework is its capability to incorporate new classes of objects and to include new sensors or detection methods. Several classes, such as ground, vegetation and sky, are considered, along with three different sensors. The approach was evaluated on real, publicly available urban driving scene data.

Extended Abstract

This paper addresses the problem of information fusion for traffic scene understanding. In order to tackle the numerous tasks that may be expected from advanced driver assistance systems, we propose a multimodal information fusion system that is flexible enough to include new sensors, new processing modules and new classes of objects. Several issues have to be dealt with in multisensor systems. A first issue is combining information from sensors that perceive the environment differently. To address this point, we formulate the problem as an image labelling one. The image acquired by a camera is first over-segmented; the common task of all modules, whatever data representation they use, then becomes labelling each individual image segment. Another important issue is combining processing modules that deal with different classes of objects. The theory of belief functions is used to overcome this problem, as it can easily represent knowledge over sets of classes, as sketched below.
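
To make this concrete, here is a minimal Python sketch of how belief functions let modules with different, partially overlapping repertoires of classes be fused on a common frame of discernment. The class names, mass values and the use of Dempster's rule are illustrative assumptions; the paper's actual frames, mass constructions and combination operator are given in the full text.

```python
from itertools import product

# Frame of discernment: all object classes considered (illustrative).
OMEGA = frozenset({"ground", "vegetation", "sky", "obstacle"})

def combine_dempster(m1, m2):
    """Dempster's rule: conjunctive combination of two mass functions,
    then normalization of the mass assigned to the empty intersection."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict between sources")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# A stereo-based module that only separates ground from non-ground:
# it assigns mass to coarse sets, keeping some mass on OMEGA (ignorance).
m_stereo = {frozenset({"ground"}): 0.6, OMEGA - {"ground"}: 0.1, OMEGA: 0.3}

# A texture-based module that only knows about vegetation.
m_texture = {frozenset({"vegetation"}): 0.5, OMEGA - {"vegetation"}: 0.2, OMEGA: 0.3}

m_fused = combine_dempster(m_stereo, m_texture)  # defined on the common frame
```

Because each module commits mass only to the sets of classes it can actually discriminate, representing knowledge over sets is what allows new modules or new classes to be added without modifying the existing ones.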

This paper shows how to construct mass functions using a distance-to-model formulation. The parameters of the mass functions are optimized by minimizing a loss function defined from the contour functions. A first module is built to detect the ground from 3D information computed by a stereo camera system. The 3D point cloud generated from a disparity map is used to estimate the ground plane. For each image segment, a mass function is then computed from the distance between the segment and the estimated ground plane; a sketch of this construction is given below. The same formulation is used to detect the ground from 3D information acquired by a LiDAR sensor. A texture-based monocular module is then considered to detect the sky and vegetation: the texture of an image segment is encoded by its Walsh-Hadamard coefficients and a model is built using a bag-of-words approach. Finally, a temporal propagation module is proposed to link the segments of two consecutive images.
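
As an illustration of the distance-to-model construction, the following Python sketch computes a mass function for one segment from the mean distance of its 3D points to the estimated ground plane, in the spirit of Denœux (1995). The exponential form, the parameters alpha and gamma, and the plane representation are assumptions for illustration; in the paper such parameters are learned by minimizing a loss defined from the contour functions, which is not reproduced here.

```python
import numpy as np

def plane_distance(points, plane):
    """Mean absolute point-to-plane distance for one segment's 3D points.
    `plane` is (a, b, c, d) with ax + by + cz + d = 0 and unit normal."""
    a, b, c, d = plane
    return float(np.mean(np.abs(points @ np.array([a, b, c]) + d)))

def ground_mass(distance, alpha=0.9, gamma=2.0):
    """Distance-to-model mass function (illustrative parameters): the
    closer a segment lies to the ground plane, the more mass is committed
    to {ground}; the remainder stays on the whole frame (ignorance)."""
    s = alpha * np.exp(-gamma * distance ** 2)
    return {"ground": s, "Omega": 1.0 - s}

# Hypothetical segment points tested against a ground plane z = 0,
# i.e. plane coefficients (0, 0, 1, 0).
pts = np.array([[1.0, 2.0, 0.05], [1.1, 2.1, -0.02], [0.9, 1.9, 0.03]])
m = ground_mass(plane_distance(pts, (0.0, 0.0, 1.0, 0.0)))
```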

The KITTI Vision Benchmark Suite was used to validate our approach, considering two color cameras and a Velodyne 64-beam LiDAR. However, only one layer of the Velodyne LiDAR was used, in order to simulate the single-layer LiDARs commonly employed in mobile robotics. A total of 110 images were manually annotated, 70 for training and 40 for testing. Several modules were first combined for a simplified task: ground/non-ground classification. The ability of the proposed approach to process any number of classes was then illustrated by adding vegetation and sky detection modules. Overall, five classes were defined: grass, road, tree, obstacle and sky. The grass and road classes were defined by intersecting the ground class with the vegetation and non-vegetation classes, respectively. Similarly, the tree class was defined as the intersection of the non-ground and vegetation classes, while the obstacle class referred to anything that was neither sky, ground nor vegetation; these definitions are spelled out in the sketch below.
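
The class definitions used in this evaluation can be written directly as set operations on the refined frame of five classes. The Python sketch below is purely notational and follows the definitions stated in the paragraph above.

```python
# Refined frame: the five final classes used for evaluation.
omega = {"grass", "road", "tree", "obstacle", "sky"}

# Coarse classes, expressed as subsets of the refined frame.
ground = {"grass", "road"}        # ground-detection modules
vegetation = {"grass", "tree"}    # vegetation-detection module
sky = {"sky"}                     # sky-detection module

# Final classes follow the intersection rules stated above.
grass = ground & vegetation                    # ground AND vegetation
road = ground - vegetation                     # ground AND NOT vegetation
tree = vegetation - ground                     # vegetation AND NOT ground
obstacle = omega - ground - vegetation - sky   # everything else

assert obstacle == {"obstacle"}
```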


Keywords: 

information fusion, traffic scene understanding, theory of belief functions, intelligent vehicles.


1. Introduction
2. Annotation of Over-Segmented Images
3. Theory of Belief Functions
4. Application to Traffic Scene Understanding
5. Experimental Results
6. Conclusions and Perspectives
Acknowledgements
References

Achanta R., Shaji A., Smith K., Lucchi A., Fua P., Susstrunk S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, no 11, p. 2274–2282. 

Badino H., Franke U., Mester R. (2007). Free space computation using stochastic occupancy grids and dynamic programming. In Proc. International Conference on Computer Vision Workshop on Dynamical Vision. Rio de Janeiro, Brazil. 

Barnett J. A. (1991). Calculating Dempster-Shafer plausibility. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 13, no 6, p. 599–602.

Denœux T. (1995). A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Trans. on Systems, Man and Cybernetics, vol. 25, no 5, p. 804–813. 

Dollár P., Wojek C., Schiele B., Perona P. (2011). Pedestrian detection: an evaluation of the state of the art. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, no 4, p. 743–761.

Ernst I., Hirschmüller H. (2008). Mutual information based semi-global stereo matching on the GPU. In Proc. International Symposium on Advances in Visual Computing, p. 228–239. Las Vegas, USA.

Ess A., Müller T., Grabner H., Van Gool L. (2009). Segmentation based urban traffic scene understanding. In Proc. British Machine Vision Conference, p. 1–11. London, UK.

Farabet C., Couprie C., Najman L., LeCun Y. (2012). Scene parsing with multiscale feature learning, purity trees, and optimal covers. In Proc. International Conference on Machine Learning. Edinburgh, Scotland.

Geiger A., Lenz P., Urtasun R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, p. 3354–3361. Providence, USA.

Geiger A., Moosmann F., Car O., Schuster B. (2012). Automatic camera and range sensor calibration using a single shot. In Proc. IEEE International Conference on Robotics and Automation, p. 3936–3943. Saint Paul, USA. 

Geiger A., Wojek C., Urtasun R. (2011). Joint 3D estimation of objects and scene layout. In Proc. Conf. on Neural Information Processing Systems, p. 1467–1475. Granada, Spain.

Hoiem D., Efros A. A., Hebert M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, vol. 75, no 1, p. 151–172.

Hoiem D., Efros A. A., Hebert M. (2008). Closing the loop on scene interpretation. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, p. 1–8. Anchorage, USA.

Ladický L., Sturgess P., Russell C., Sengupta S., Bastanlar Y., et al. (2012). Joint optimization for object class segmentation and dense stereo reconstruction. International Journal of Computer Vision, vol. 100, no 2, p. 122–133.

Leibe B., Cornelis N., Cornelis K., Van Gool L. (2007). Dynamic 3D scene analysis from a moving vehicle. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, p. 1–8. Minneapolis, USA.

Levinshtein A., Stere A., Kutulakos K. N., Fleet D. J., Dickinson S. J., Siddiqi K. (2009). TurboPixels: fast superpixels using geometric flows. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 31, no 12, p. 2290–2297.

Mathevet S., Trassoudaine L., Checchin P., Alizon J. (1999). Combinaison de segmentations en régions. Traitement du Signal, vol. 16, no 2, p. 93–104. 

Moras J., Cherfaoui V., Bonnifait P. (2011). Moving objects detection by conflict analysis in evidential grids. In Proc. IEEE Intelligent Vehicles Symposium, p. 1120–1125. Baden-Baden, Germany.

Moore A. P., Prince S. J. D., Warrell J., Mohammed U., Jones G. (2009). Scene shape priors for superpixel segmentation. In Proc. IEEE International Conference on Computer Vision, p. 771–778. Kyoto, Japan.

Ren C. Y., Reid I. (2011). gSLIC: a real-time implementation of SLIC superpixel segmentation. Technical report. University of Oxford, Department of Engineering Science.

Rodríguez S. A., Frémont V., Bonnifait P., Cherfaoui V. (2011). Multi-modal object detection and localization for high integrity driving assistance. Machine Vision and Applications, vol. 14, p. 1–16.

Scharstein D., Szeliski R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, vol. 47, no 1–3, p. 7–42.

Shafer G. (1976). A mathematical theory of evidence. Princeton, New Jersey, Princeton University Press.

Smets P., Kennes R. (1994). The transferable belief model. Artificial Intelligence, vol. 66, p. 191–243.

Thrun S., Burgard W., Fox D. (2005). Probabilistic robotics (intelligent robotics and autonomous agents). Cambridge, Massachusetts, The MIT Press.

Wang C. C., Thorpe C., Thrun S., Hebert M., Durrant-Whyte H. (2007). Simultaneous localization, mapping and moving object tracking. International Journal of Robotics Research, vol. 26, no 9, p. 889–916.

Wedel A., Badino H., Rabe C., Loose H., Franke U., Cremers D. (2009). B-spline modeling of road surfaces with an application to free-space estimation. IEEE Trans. on Intelligent Transportation Systems, vol. 10, no 4, p. 572–583. 

Werlberger M. (2012). Convex Approaches for High Performance Video Processing. PhD thesis, Graz University of Technology. 

Zhang J., Marszalek M., Lazebnik S., Schmid C. (2007). Local features and kernels for classification of texture and object categories: a comprehensive study. International Journal of Computer Vision, vol. 73, no 2, p. 213–238.