A deep neural network-based algorithm for safe release of big data under random noise disturbance


Jian Yu, Hui Wang

Liuzhou Vocational and Technical College, School of Electronic Information Engineering, Liuzhou 545005, China

Liuzhou Vocational and Technical College, School of Art, Liuzhou 545005, China

31 December 2018



Despite its huge benefits, the release of big data carries a severe risk of privacy leakage. To address this problem, this paper proposes a deep neural network (DNN)-based algorithm for the safe release of big data under random noise disturbance. Specifically, random noise drawn from a given probability distribution is added to the released data, such that the public output does not change significantly whether or not an individual data record is in the dataset, and the published data remain essentially the same as the original dataset. The algorithm is then optimized in light of the attributes of the correlated datasets in big data. Finally, the proposed algorithm is shown to outperform the traditional algorithm in large-scale searches of correlated datasets and to ensure privacy at a lower privacy budget.
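The noise-addition idea summarized above can be illustrated with the standard Laplace mechanism from the differential privacy literature. This is a minimal sketch under stated assumptions, not the paper's own algorithm: the DNN component and the correlated-dataset optimization are not reproduced, and the function names `laplace_noise` and `noisy_count`, the sample `ages` data, and the choice of a count query (sensitivity 1) are hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(records, predicate, epsilon, rng=None):
    """Release a count with Laplace noise calibrated to epsilon.

    Assumption: adding or removing one record changes a count by at
    most 1, so the sensitivity of the query is 1 and the noise scale
    is sensitivity / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical data: ages of individuals in a dataset.
    ages = [23, 35, 41, 52, 29, 67, 38, 45]
    released = noisy_count(ages, lambda age: age >= 40, epsilon=0.5)
    print(f"noisy count of records with age >= 40: {released:.2f}")
```

A smaller privacy budget `epsilon` gives a larger noise scale, so the released value reveals less about whether any single record is present, at the cost of accuracy.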


deep neural network (DNN), big data, privacy preserving, differential privacy

1. Introduction
2. Definition of privacy in the release of big data
3. Random noise addition mechanism in the release of big data
4. Privacy analysis of correlated datasets
5. Noise addition mechanism of correlated datasets
6. Experiments and results analysis
7. Conclusions

This work is supported by {2018,2019} Foundation of Improving Academic Ability in University for Young Scholars of Guangxi.

