Data Anonymization by Generalization

Feten Ben Fredj, Nadira Lammari, Isabelle Comyn-Wattiau

CEDRIC-CNAM, 2 rue Conté, 75003 Paris, France

ESSEC Business School, 1 av. Bernard Hirsch, 95021 Cergy, France

Corresponding Author Email: ilham-nadira.lammari@cnam.fr; wattiau@essec.edu
Page: 63-87
DOI: https://doi.org/10.3166/ISI.23.1.63-87
Abstract: 

Many algorithms allow data owners to anonymize personal data, with the aim of avoiding disclosure risk without sacrificing data utility. In this paper, we describe a model-driven approach that guides the data owner through the anonymization process. The guidance, which may be informative or suggestive, helps the data owner not only to choose the most relevant algorithm but also to define the best input values for that algorithm, given the characteristics of the data and the context. We focus on generalization algorithms for microdata. We conducted a reverse-engineering process to extract knowledge from existing anonymization tools. This knowledge about anonymization, both theoretical and experimental, is managed through an ontology.

Keywords: 

guidance, security, ontology, methodology, privacy, anonymization, model-driven approach

1. Introduction
2. State of the art
3. Definition of the anonymization process
4. General presentation of the approach
5. Description of the steps of the MAGGO method
6. Illustrative example
7. Conclusion