Sequential pattern mining under constraints is a challenging data mining task. Many efficient ad hoc methods have been developed for mining sequential patterns, but they are all suffering from a lack of genericity. Recent works have investigated Constraint Programming (CP) methods, but they are not still effective because of their encoding. In this paper, we propose a global constraint based on the projected databases principle which remedies to this drawback. Experiments show that our approach clearly outperforms CP approaches and competes well with ad hoc methods on large datasets.
sequential pattern mining, prefix-projection, constraint programming, global constraint.
Agrawal R., Srikant R. (1995). Mining sequential patterns. In P. S. Yu, A. L. P. Chen (Eds.), Proceedings of the eleventh international conference on data engineering (icde), p. 3-14. IEEE Computer Society.
Ayres J., Flannick J., Gehrke J., Yiu T. (2002). Sequential pattern mining using a bitmap representation. In KDD 2002, p. 429–435. ACM.
Béchet N., Cellier P., Charnois T., Crémilleux B. (2012). Sequential pattern mining to discover relations between genes and rare diseases. In Cbms.
Beldiceanu N., Contejean E. (1994). Introducing global constraints in CHIP. Journal of Mathematical and Computer Modelling, vol. 20, no 12, p. 97-123.
Coquery E., Jabbour S., Saïs L., Salhi Y. (2012). A SAT-based approach for discovering frequent, closed and maximal patterns in a sequence. In Ecai 2012 - 20th european conference on artificial intelligence, vol. 242, p. 258-263. IOS Press.
Dong G., Pei J. (2007). Sequence data mining (vol. 33). Kluwer.
Fournier-Viger P., Gomariz A., Gueniche T., Soltani A., Wu C., Tseng V. (2014). SPMF: A Java Open-Source Pattern Mining Library. Journal of Machine Learning Research, vol. 15, p. 3389-3393. Consulté sur http://jmlr.org/papers/v15/fournierviger14a.html
Garofalakis M. N., Rastogi R., Shim K. (2002). Mining sequential patterns with regular expression constraints. IEEE Trans. Knowl. Data Eng., vol. 14, no 3, p. 530-552.
Guns T., Nijssen S., Raedt L. D. (2011). Itemset mining: A constraint programming perspective. Artif. Intell., vol. 175, no 12-13, p. 1951-1983.
Kemmar A., Loudni S., Lebbah Y., Boizumault P., Charnois T. (2015). PREFIX-PROJECTION global constraint for sequential pattern mining. In Principles and practice of constraint programming - 21st international conference, CP 2015, cork, ireland, august 31 - september 4, 2015, proceedings, lncs 9255, p. 226–243. Consulté sur http://dx.doi.org/10.1007/978-3-319-23219-5_17
Kemmar A., Loudni S., Lebbah Y., Boizumault P., Charnois T. (2016). A global constraint for mining sequential patterns with GAP constraint. In Integration of AI and OR techniques in constraint programming - 13th international conference, CPAIOR 2016, banff, ab, canada, may 29 - june 1, 2016, proceedings, vol. 9676, p. 198–215. Springer.
Kemmar A., UgarteW., Loudni S., Charnois T., Lebbah Y., Boizumault P. et al. (2014). Mining relevant sequence patterns with cp-based framework. In 26th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2014), p. 552–559. IEEE Computer Society.
Li C., Yang Q.,Wang J., Li M. (2012, mars). Efficient mining of gap-constrained subsequences and its various applications. ACM Trans. Knowl. Discov. Data, vol. 6, no 1, p. 2:1–2:39.
Métivier J.-P., Loudni S., Charnois T. (2013). A constraint programming approach for mining sequential patterns in a sequence database. In Ecml/pkdd workshop on languages for data mining and machine learning.
Négrevergne B., Guns T. (2015). Constraint-based sequence mining using constraint programming. In Integration of AI and OR techniques in constraint programming - 12th international conference, CPAIOR 2015, barcelona, spain, may 18-22, 2015, proceedings, vol. 9075, p. 288–305. Springer.
Pei J., Han J., Mortazavi-Asl B., Pinto H., Chen Q., Dayal U. et al. (2001). PrefixSpan: Mining sequential patterns by prefix-projected growth. In Icde, p. 215-224. IEEE Computer Society.
Pei J., Han J., Wang W. (2002). Mining sequential patterns with constraints in large databases. In Cikm’02, p. 18–25. ACM.
Pesant G. (2004). A regular language membership constraint for finite sequences of variables. In M. Wallace (Ed.), Principles and practice of constraint programming - cp 2004, vol. 3258, p. 482-495. Springer.
Srikant R., Agrawal R. (1996). Mining sequential patterns: Generalizations and performance improvements. In Edbt, p. 3-17.
Trasarti R., Bonchi F., Goethals B. (2008). Sequence mining automata: A new technique for mining frequent sequences under regular expressions. In Icdm’08, p. 1061-1066.
Wang J., Han J. (2004). BIDE: Efficient mining of frequent closed sequences. In Icde, p. 79-90.
Yan X., Han J., Afshar R. (2003). CloSpan: Mining closed sequential patterns in large databases. In D. Barbará, C. Kamath (Eds.), Proceedings of the third siam international conference on data mining, san francisco (sdm). SIAM.
Yang G. (2006). Computational aspects of mining maximal frequent patterns. Theor. Comput. Sci., vol. 362, no 1-3, p. 63–85.
Zaki M. J. (2000). Sequence mining in categorical domains: Incorporating constraints. In Proceedings of the 2000 ACM CIKM international conference on information and knowledge management, mclean, va, usa, november 6-11, 2000, p. 422–429.
Zaki M. J. (2001). SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, vol. 42, no 1/2, p. 31-60.