Let T be a set of points each of which is described by the same set of attributes. Let p ∈ T. Then p is a skyline point of T iff there is no other point better than p. A skyline query returns the set of all skyline points. The number of tuples in the result is not always proportional to the size of the source data. Hence, estimate skyline cardinality is a challenging problem which attracted a great deal of work in recent years. This problem is particularly important in case of integration of the skyline operator in database management systems. We propose techniques for estimating skyline cardinality. We first provide an unbiased estimator of the skyline cardinality which requires a pass on data. On another hand, we provide a convergent estimator which does not require any data scanning. It estimates skyline cardinality expectation for those data sets respecting data distribution given as input. The advantages of these solutions are the ease of implementation and, by contrast to other proposals, no costly subskyline queries are required. Our solutions are implemented and some experiments are reported showing both the accuracy of the estimations and the efficiency by which they are obtained.
Skyline cardinality, expectation, sampling, estimation, algorithms
Bartolini I., Ciaccia P., Patella M. (2008). Efficient sort-based skyline evaluation. ACM Trans. Database Syst., vol. 33, no 4.
Bentley J. L., Kung H. T., Schkolnick M., Thompson C. D. (1978, octobre). On the average number of maxima in a set of vectors and applications. J. ACM, vol. 25, no 4, p. 536–543.
Börzsönyi S., Kossmann D., Stocker K. (2001). The skyline operator. In Proceedings of the 17th international conference on data engineering, p. 421–430. Washington, DC, USA, IEEE Computer Society.
Buchta C. (1989, novembre). On the average number of maxima in a set of vectors. Inf. Process. Lett., vol. 33, no 2, p. 63–65.
Chaudhuri S., Dalvi N., Kaushik R. (2006). Robust cardinality and cost estimation for skyline operator. In Proceedings of the 22nd international conference on data engineering, p. 64–. Washington, DC, USA, IEEE Computer Society.
Chomicki J., Godfrey P., Gryz J., Liang D. (2003). Skyline with presorting. In Proc. of ICDE conference.
Efron B. (1979, 01). Bootstrap methods: Another look at the jackknife. Ann. Statist., vol. 7, no 1, p. 1–26.
Godfrey P., Seipel D., Turull-Torres J. (2004). Skyline cardinality for relational processing: how many vectors are maximal?. Rapport technique. York Univ., Toronto, Ont., Canada.
Horvitz D. G., Thompson D. J. (1952, December). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, vol. 47, no 260, p. 663–685.
Lee J., Hwang S. won. (2010). BSkyTree: scalable skyline computation using a balanced pivot selection. In Proc. of EDBT conf.
Luo C., Jiang Z., Hou W.-C., He S., Zhu Q. (2012). A sampling approach for skyline query cardinality estimation. Knowledge and Information Systems, vol. 32, no 2, p. 281-301.
Morse M. D., Patel J. M., Jagadish H. V. (2007). Efficient skyline computation over lowcardinality domains. In Proceedings of VLDB conf.
Xia T., Zhang D., Fang Z., Chen C. X., Wang J. (2012). Online subspace skyline query processing using the compressed skycube. ACM TODS, vol. 37, no 2.
Zhang Z., Yang Y., Cai R., Papadias D., Tung A. (2009). Kernel-based skyline cardinality estimation. In Proceedings of the 2009 acm sigmod international conference on management of data, p. 509–522. New York, NY, USA, ACM.