An approach to evaluate RDF data completeness

Fayçal Hamdi | Samira Si-said Cherfi

Laboratoire Cédric, Conservatoire national des arts et métiers, Paris, France

Corresponding Author Email: {faycal.hamdi,samira.cherfi}@cnam.fr

Received: N/A

Accepted: N/A

Citation:

isi21_3_04_hamdi-cherfi.pdf

OPEN ACCESS

Abstract:

With the development of data-based applications, data quality has become a pressing issue in the context of the Web of Data. Organizations and researchers alike need suitable methods and techniques to help ensure web data quality along the whole process, from data transformation and publication to data querying and exploitation. Among quality dimensions, completeness is recognized as difficult to evaluate, because it often relies on gold standards and/or a reference schema that are not always available and not always realistic from a practical point of view. In this paper, we propose an approach for assessing the completeness of RDF data. The proposed solution consists, first, of inferring a schema using a frequent itemset mining approach and, second, of measuring completeness with respect to the inferred schema. The paper presents both the theoretical background and experimental results obtained on real-world RDF datasets.

Keywords:

linked data, RDF data quality, completeness, quality evaluation
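
The two steps named in the abstract (infer a schema from the data, then measure completeness against it) can be illustrated with a minimal sketch. It is a deliberate simplification and not the authors' implementation: it keeps every property whose support reaches a threshold instead of running a full frequent itemset miner, and the function names, the `min_support` parameter, and the sample triples are all hypothetical.

```python
from collections import Counter, defaultdict


def properties_by_subject(triples):
    """Group the set of properties used by each subject (one 'transaction' per subject)."""
    entities = defaultdict(set)
    for subject, prop, _obj in triples:
        entities[subject].add(prop)
    return dict(entities)


def infer_schema(entities, min_support):
    """Simplified schema inference: keep properties used by at least
    min_support (a fraction in [0, 1]) of the subjects.
    Assumption: single frequent properties stand in for frequent itemsets."""
    counts = Counter(p for props in entities.values() for p in props)
    n = len(entities)
    return {p for p, c in counts.items() if c / n >= min_support}


def completeness(entities, schema):
    """Average, over subjects, of the fraction of schema properties present."""
    if not entities or not schema:
        return 0.0
    total = sum(len(props & schema) / len(schema) for props in entities.values())
    return total / len(entities)


# Invented sample data, for illustration only.
triples = [
    ("film1", "name", "A"), ("film1", "director", "X"), ("film1", "starring", "Y"),
    ("film2", "name", "B"), ("film2", "director", "Z"),
    ("film3", "name", "C"), ("film3", "director", "W"), ("film3", "budget", "1"),
    ("film4", "name", "D"),
]

entities = properties_by_subject(triples)
schema = infer_schema(entities, min_support=0.5)
score = completeness(entities, schema)
```

With these sample triples and a threshold of 0.5, only `name` and `director` are frequent enough to enter the inferred schema, so `film4`, which lacks a director, is the only subject that lowers the score. Raising or lowering `min_support` changes which properties are expected, and therefore what "complete" means for the dataset.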

1. Introduction

2. Illustration by example

3. Problem statement

4. Extracting the schema of an RDF data source

5. Empirical evaluation

6. State of the art

7. Conclusion

