Extension of the Plug-in Algorithm for Optimizing the Smoothing Parameter of the Kernel-Diffeomorphism Estimator

Molka Troudi, Faouzi Ghorbel

Institut des Hautes Études Commerciales, Carthage Présidence, 2016, Tunis, Tunisia

Laboratoire CRISTAL, Pôle GRIFT, École Nationale des Sciences de l’Informatique, Université La Manouba, 2010, La Manouba, Tunisia

Pages: 321-338
DOI: https://doi.org/10.3166/TS.31.321-338

OPEN ACCESS

Abstract: 

The kernel-diffeomorphism estimate is a generalization of the kernel estimate that takes into account the natural support of the estimated densities. A suitable change of variables can significantly limit the Gibbs phenomenon. The quality of the estimate depends on the value of the bandwidth, which must be adjusted. In this article, we focus on the plug-in algorithm for optimizing the bandwidth and propose to extend it to the kernel-diffeomorphism estimate. The Mean Integrated Square Error of simulated and re-estimated bounded and semi-bounded densities highlights the value of this approach.
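As an illustration of the change of variables mentioned above, here is a minimal Python sketch for a density supported on (0, +∞). It assumes the usual form of the kernel-diffeomorphism estimate, f̂(x) = φ'(x)/(N h) Σ K((φ(x) − φ(X_i))/h), with the Gaussian kernel and φ = log. The choice of φ, the kernel, and all function names are assumptions made for this example and are not taken from the article.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_estimate(x, samples, h):
    """Classical kernel estimate evaluated at the points x."""
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    u = (x[:, None] - samples[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (samples.size * h)

def kernel_diffeo_estimate(x, samples, h):
    """Kernel-diffeomorphism estimate on (0, +inf) with phi = log.

    phi maps (0, +inf) onto the real line and phi'(x) = 1/x, so the
    estimate is the kernel estimate of log(X) multiplied by 1/x.
    Assumed form for illustration; x and samples must be positive.
    """
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    u = (np.log(x)[:, None] - np.log(samples)[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (samples.size * h * x)

def trapezoid(y, x):
    """Trapezoidal rule on a possibly non-uniform grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))
```

With exponential samples, the classical estimate places part of its mass on the negative half-line, whereas the transformed estimate keeps all of its mass on the support. Note that h acts on the log scale in the second estimator, so the two bandwidths are not directly comparable.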

Extended Abstract  

The kernel-diffeomorphism estimator of probability density functions generalizes the kernel estimator. It is designed to estimate densities while taking their natural support into account: a simple change of variable yields a better estimate and significantly limits the Gibbs phenomenon. However, the quality of the estimate depends on the value of the smoothing parameter, which must be adjusted. In this article, we focus on the plug-in algorithm for optimizing the smoothing parameter and propose an extension of this algorithm to the kernel-diffeomorphism estimator. As a first step, we recall the principles and convergence theorems of the kernel-diffeomorphism estimate. An asymptotic study gives the optimal value of the smoothing parameter by minimizing the mean integrated square error (MISE). This optimal value is expressed in terms of two quantities associated with the probability density f to be estimated, Mφ(K) and Jφ(f). Implementing the generalized plug-in algorithm is more difficult than implementing the conventional one. In the conventional algorithm, M(K) depends only on the choice of kernel, whereas in the generalized plug-in algorithm Mφ(K) depends on the unknown density, which must be approximated over the iterations. Furthermore, Jφ(f) is a function of f and f' in addition to f'', which increases the complexity of the generalized plug-in algorithm. The proposed iterative algorithm is then tested on several simulated densities with bounded or semi-bounded support. Measuring the MISE highlights the benefits of this approach for densities whose support is known.
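For orientation, the conventional plug-in iteration can be sketched as follows. This is not the generalized algorithm of the article, whose quantities Mφ(K) and Jφ(f) are not given in this summary. The sketch assumes the standard asymptotic optimum h = (M(K) / (N J(f)))^(1/5), with M(K) = ∫K² / (∫u²K)² and J(f) = ∫(f'')², a Gaussian kernel, and the simplification that the same h is reused to estimate f''. The starting value, tolerance, and function names are illustrative assumptions.

```python
import numpy as np

# M(K) for the Gaussian kernel: int K^2 = 1/(2 sqrt(pi)), int u^2 K = 1.
M_GAUSS = 1.0 / (2.0 * np.sqrt(np.pi))

def second_derivative_estimate(x, samples, h):
    """Kernel estimate of f''(x), using K''(u) = (u^2 - 1) K(u)."""
    u = (x[:, None] - samples[None, :]) / h
    k2 = (u**2 - 1.0) * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return k2.sum(axis=1) / (samples.size * h**3)

def estimate_J(samples, h, n_grid=4000):
    """Approximate J(f) = int f''(x)^2 dx by the trapezoidal rule."""
    x = np.linspace(samples.min() - 6.0 * h, samples.max() + 6.0 * h, n_grid)
    y = second_derivative_estimate(x, samples, h) ** 2
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def plug_in_bandwidth(samples, h0=None, tol=1e-4, max_iter=100):
    """Iterate h <- (M(K) / (N * J_hat(h)))^(1/5) until h stabilizes."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Normal-reference starting value (an assumption of this sketch).
    h = h0 if h0 is not None else 1.06 * samples.std() * n ** (-0.2)
    for _ in range(max_iter):
        h_new = (M_GAUSS / (n * estimate_J(samples, h))) ** 0.2
        if abs(h_new - h) <= tol * h:
            return h_new
        h = h_new
    return h
```

In this conventional setting only J(f) has to be re-estimated at each pass, since M(K) is a constant of the kernel. The summary above indicates that the generalized version must also re-estimate Mφ(K), and that Jφ(f) involves f and f' as well as f''.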

In the case of semi-bounded densities, the results show a significantly improved estimate. The improvement is less clear for densities with bounded support. Despite a significant reduction of the Gibbs effect, the MISE is larger because the estimated density is perturbed, which suggests that the selected smoothing parameter is lower than the optimal value. This discrepancy could be explained by the accumulation of errors in the estimation of the various quantities involved in the estimation of Jφ(f). To remedy this problem, the optimal bandwidth is adjusted empirically by varying the power of the Jφ(f) term in the analytical expression of the optimal bandwidth. With this adjustment, the generalized plug-in algorithm significantly improves the estimation of densities with bounded support.
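The comparison described here rests on the MISE of simulated and re-estimated densities. A minimal Monte Carlo sketch of such a measurement is given below. It assumes a Gaussian kernel, the change of variable φ = log for a density on (0, +∞), and a fixed bandwidth h supplied by the caller. The function names, grid, sample size, and number of runs are illustrative assumptions and do not reproduce the experimental protocol of the article.

```python
import numpy as np

def gauss(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, s, h):
    """Classical kernel estimate at the points x."""
    return gauss((x[:, None] - s[None, :]) / h).sum(axis=1) / (s.size * h)

def kde_log(x, s, h):
    """Kernel estimate after the change of variable phi = log (x > 0 only)."""
    u = (np.log(x)[:, None] - np.log(s)[None, :]) / h
    return gauss(u).sum(axis=1) / (s.size * h * x)

def mise(estimator, h, grid, true_pdf, sampler, n=200, runs=50, seed=0):
    """Average, over simulated samples, of the integrated squared error.

    The integral is taken over `grid` only, so any mass the estimate
    places outside the grid is not counted.
    """
    rng = np.random.default_rng(seed)
    target = true_pdf(grid)
    ise = []
    for _ in range(runs):
        s = sampler(rng, n)
        err = (estimator(grid, s, h) - target) ** 2
        ise.append(np.sum(0.5 * (err[1:] + err[:-1]) * np.diff(grid)))
    return float(np.mean(ise))
```

For instance, `mise(kde_log, 0.3, np.linspace(1e-3, 10.0, 1000), lambda x: np.exp(-x), lambda rng, n: rng.exponential(1.0, n))` measures the error of the transformed estimator on a unit exponential density. Calling it with `kde` instead gives the corresponding figure for the classical estimator at the same nominal bandwidth.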

SUMMARY

The kernel-diffeomorphism estimator is a generalization of the kernel estimator that estimates densities while taking their natural support into account. Using an appropriate change of variable significantly limits the Gibbs phenomenon. However, the quality of the estimate depends on the value of the bandwidth, which must be adjusted. In this article, we focus on the plug-in algorithm for optimizing the bandwidth and propose an extension of this iterative algorithm to the kernel-diffeomorphism estimator. After an overview of the convergence theorems of the kernel-diffeomorphism method and a presentation of the proposed algorithm, measuring the mean integrated square error of several simulated and then re-estimated semi-bounded and bounded densities highlights the value of this approach.

Keywords: 

nonparametric estimate, kernel-diffeomorphism estimate, smoothing parameter, bounded distributions, plug-in algorithm.

KEYWORDS

nonparametric estimator, kernel-diffeomorphism estimator, smoothing parameter, bounded support, plug-in algorithm.

1. Introduction
2. Kernel Estimators
3. Kernel-Diffeomorphism Estimator
4. Generalized Plug-in Algorithm
5. Comparative Study of Kernel and Kernel-Diffeomorphism Estimators with Smoothing-Parameter Adjustment
6. Conclusion