Leveraging efficient indexing schema to support multigraph query answering

Leveraging efficient indexing schema to support multigraph query answering

Vijay Ingalalli Dino Ienco Pascal Poncelet

LIRMM - Montpellier, France

IRSTEA - Montpellier, France

Corresponding Author Email: 
{vijay,pascal.poncelet}@lirmm.fr, dino.ienco@irstea.fr
Page: 
53-74
|
DOI: 
https://doi.org/10.3166/ISI.21.3.53-74
Received: 
N/A
|
Accepted: 
N/A
|
Published: 
30 June 2016
| Citation
Abstract: 

Many real world datasets can be represented by graphs with a set of nodes interconnected with each other by multiple relations (e.g., social network, RDF graph, biological data). Such a rich graph, called multigraph, is well suited to represent real world scenarios with complex interactions. However, performing subgraph query on multigraphs is still an open issue since, unfortunately, all the existing algorithms for subgraph query matching are not able to adequately leverage the multiple relationships that exist between the nodes. Motivated by the lack of approaches for sub-multigraph query and stimulated by the increasing number of datasets that can be modelled as multigraphs, in this paper we propose IMQA (Index based Multigraph Query Answering), a novel algorithm to extract all the embeddings of a sub-multigraph query from a single large multigraph. IMQA is composed of two main phases: Firstly, it implements a novel indexing schema for multiple edges, which will help to efficiently retrieve the vertices of the multigraph that match the query vertices. Secondly, it performs an efficient subgraph search to output the entire set of embeddings for the given query. Extensive experiments conducted on real datasets prove the time efficiency as well as the scalability of IMQA.

Keywords: 

multigraph query, indexing, subgraph query matching

1. Introduction
2. Background
3. An Overview of IMQA
4. Indexing
5. Subgraph Query Processing
6. Experimental Evaluation
7. Conclusion
Acknowledgments
  References

Boden B., Günnemann S., Hoffmann H., Seidl T. (2012). Mining coherent subgraphs in multilayer graphs with edge labels. In Kdd, pp. 1258–1266.

Bonchi F., Gionis A., Gullo F., Ukkonen A. (2014). Distance oracles in edge-labeled graphs. In Edbt, p. 547-558.

Bonnici V., Giugno R., Pulvirenti A., Shasha D., Ferro A. (2013). A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinformatics, Vol. 14, No. S-7, pp. S13.

Cheng J., Ke Y., Ng W., Lu A. (2007). Fg-index: towards verification-free query processing on graph databases. In Sigmod, pp. 857–872.

Cordella L. P., Foggia P., Sansone C., Vento M. (2004). A (sub) graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 10, pp. 1367–1372.

Han W.-S., Lee J., Lee J.-H. (2013). Turbo iso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In Sigmod, pp. 337–348.

He H., Singh A. K. (2008). Graphs-at-a-time: query language and access methods for graph databases. In Sigmod, pp. 405–418.

Hopcroft J. E., Karp R. M. (1973). An n5=2 algorithm for maximum matchings in bipartite graphs. SIAM Journal on computing, Vol. 2, No. 4, pp. 225–231.

Lee J., Han W.-S., Kasperovics R., Lee J.-H. (2012). An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB, Vol. 6, No. 2, pp. 133–144.

Libkin L., Reutter J., Vrgoˇc D. (2013). Trial for rdf: adapting graph query languages for rdf data. In Pods, pp. 201–212.

Lin Z., Bei Y. (2014). Graph indexing for large networks: A neighborhood tree-based approach. Knowledge-Based Systems, Vol. 72, pp. 48–59.

Ren X., Wang J. (2015). Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. PVLDB, Vol. 8, No. 5, pp. 617–628.

Shang H., Zhang Y., Lin X., Yu J. X. (2008). Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB, Vol. 1, No. 1, pp. 364–375.

Tang L.,Wang X., Liu H. (2012). Community detection via heterogeneous interaction analysis. Data Mining and Knowledge Discovery, Vol. 25, pp. 1-33.

Terrovitis M., Passas S., Vassiliadis P., Sellis T. (2006). A combination of trie-trees and inverted files for the indexing of set-valued attributes. In Cikm, pp. 728–737.

Yan X., Yu P. S., Han J. (2004). Graph indexing: a frequent structure-based approach. In Sigmod, pp. 335–346.

Zhang A. (2009). Protein interaction networks: Computational analysis. Cambridge University Press.

Zhao P., Han J. (2010). On graph query optimization in large networks. PVLDB, Vol. 3, No. 1-2, pp. 340–351.

Zhao X., Xiao C., Lin X.,WangW., Ishikawa Y. (2013). Efficient processing of graph similarity queries with edit distance constraints. VLDB Journal, Vol. 22, No. 6, pp. 727–752.