SMILK, linking natural language and data from the web

Cédric Lopez, Molka Tounsi Dhouib, Elena Cabrio, Catherine Faron Zucker, Fabien Gandon, Frédérique Segond

Viseo Technologies R&D, 4 avenue Doyen Louis Weil, 38000 Grenoble, France

Université Côte d’Azur, Inria, CNRS, I3S, Sophia Antipolis, France

Emvista – Montpellier, France

INALCO, ERTIM, Paris, France

30 June 2018



As part of the SMILK Joint Lab, we studied the use of Natural Language Processing to (1) enrich knowledge bases and link data on the web and, conversely, (2) use this linked data to improve text analysis and the annotation of textual content, and to support knowledge extraction. The evaluation focused on brand-related information retrieval in the field of cosmetics. This article describes each step of our approach: the creation of ProVoc, an ontology to describe products and brands; the automatic population of a knowledge base mainly based on ProVoc from heterogeneous textual resources; and the evaluation of an application that takes the form of a browser plugin providing additional knowledge to users browsing the web.


web of data, ontologies, natural language processing, linked data

1. Introduction
2. ProVoc, an ontology for describing products on the web
3. Populating a knowledge base with ProVoc
4. Application
5. Conclusion
