Virtual Cameras for the Calibration of an Augmented Reality System on a Semi-Transparent Display


Jim Braux-Zin, Adrien Bartoli, Romain Dupont, Mohamed Tamaazousti

CEA, LIST — 91191 Gif-sur-Yvette, France

ISIT, Université d’Auvergne — 63000 Clermont-Ferrand, France

25 September 2013
12 May 2014
30 June 2014



We present a novel extrinsic calibration method for optical see-through systems. It is primarily aimed at tablet-like systems with a semi-transparent screen, a user-tracking device and a device dedicated to the localization of the system in the environment, but it is easily generalizable to any optical see-through setup. The proposed algorithm estimates the relative poses of those three components based on the user indicating the projections onto the screen of several reference points chosen on a known object. A convex estimate is computed through the resectioning of virtual cameras and used to initialize a global bundle adjustment. Both synthetic and real experiments show the viability of our approach.

Extended Abstract

Optical see-through augmented reality systems, which use semi-transparent displays, offer compelling advantages over standard video see-through systems. The most important is that the user is never isolated from reality. This is crucial for the critical applications we envision, such as driving or surgery assistance, where lives are at stake. However, optical see-through systems are challenging to calibrate so as to ensure proper alignment between the virtual augmentation and reality. Indeed, a correct display requires computing the relative poses of the user, the observed scene and the semi-transparent screen.

With no loss of generality, we consider a system made of a semi-transparent screen rigidly tied to two localization devices (one tracking the user and the other tracking the observed scene). The calibration process involves estimating the poses of those two devices with regard to the screen by minimizing the alignment error between the augmentations and reality. To that end, the user is asked to indicate the on-screen projections of reference points of a known 3D object. The sought poses can be computed through non-linear optimization, minimizing the 2D distance between the user-provided projections and the ones computed by the system from the poses. However, without a priori knowledge about the geometry of the problem, the cost function is highly non-convex and difficult to optimize.
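As a concrete sketch of this cost function (a minimal illustration under simplifying assumptions, not the paper's implementation): expressing every quantity in the screen frame, where the screen is the plane z = 0, the predicted projection of a reference point is the intersection of the user-to-point line of sight with that plane, and the residuals are the 2D gaps to the projections indicated by the user. All function and variable names below are illustrative.

```python
import numpy as np

def screen_projection(user_pos, point_3d):
    """Predicted on-screen projection of a 3D reference point:
    intersection of the user-to-point line of sight with the
    screen plane z = 0 (all coordinates in the screen frame)."""
    d = point_3d - user_pos            # viewing direction
    t = -user_pos[2] / d[2]            # line parameter where z = 0
    return (user_pos + t * d)[:2]      # keep the in-plane (x, y)

def alignment_residuals(user_positions, points_3d, clicks):
    """2D distances minimized by the calibration: predicted
    projections vs. the projections indicated by the user."""
    return np.array([screen_projection(u, x) - c
                     for u, x, c in zip(user_positions, points_3d, clicks)])
```

In the actual problem, the user positions and reference points are themselves functions of the sought device-to-screen poses, which is what makes these residuals non-convex in the pose parameters.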

In this article we propose a new framework that yields a direct initial estimate of the poses by resectioning virtual cameras. Those virtual cameras are defined by their optical centers, which coincide with the user positions, and their focal planes, which all coincide with the physical screen. The projections indicated by the user can then be seen as 2D observations of the 3D reference points in those cameras, allowing calibration by standard techniques. A similar reasoning allows us to define virtual cameras centered on the 3D reference points, "looking at" the user positions.
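Resectioning such a virtual camera from its 2D/3D correspondences can be done with the classical Direct Linear Transform of Abdel-Aziz and Karara (1971). The sketch below shows the bare technique only (a practical implementation would also normalize the data); the function name is an assumption.

```python
import numpy as np

def dlt_resection(points_3d, points_2d):
    """Estimate a 3x4 projection matrix P, up to scale, from n >= 6
    correspondences via the Direct Linear Transform: each pair
    (X, x) contributes two linear equations in the entries of P,
    and P is the right null vector of the stacked system.
    points_3d: (n, 3) array, points_2d: (n, 2) array."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        Xh = [X, Y, Z, 1.0]
        rows.append([0.0] * 4 + [-x for x in Xh] + [v * x for x in Xh])
        rows.append(Xh + [0.0] * 4 + [-u * x for x in Xh])
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)    # smallest-singular-value vector
```

The estimated matrix can then be decomposed into intrinsics and pose by standard means; in our setting the recovered optical center gives the user position relative to the screen.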

Synthetic and real experiments demonstrate the validity of this approach. Moreover, the method is generic: it makes no assumption on the system geometry, the type of localization devices (cameras, electromagnetic tracking, etc.) or the display technology used (LCD screen, beam-splitter, etc.).


In this article we propose a new extrinsic calibration method for augmented reality systems on semi-transparent displays. The method is applied here to an augmented-tablet system composed of a semi-transparent screen, a user-tracking device and a device for localization with respect to the observed scene. It nevertheless remains generic and applicable to most augmented reality scenarios. It estimates the relative poses of these three components based on the 2D on-screen observations of reference points of a known object. These observations are provided by the user, for example as mouse clicks. A convex initialization is first computed through the calibration of virtual cameras. A global bundle adjustment then refines the result. Experiments on synthetic and real data show the soundness of this approach.


calibration, augmented reality, optical see-through, non-overlapping cameras, semi-transparent screen.


calibration, camera calibration, augmented reality, semi-transparent screen, non-overlapping cameras.

1. Introduction
2. Problem Formulation
3. Calibration and Virtual Cameras
4. Possible Scenarios and Applications
5. Evaluation
6. Conclusion
