Automatic extraction of entities and relations by ontology and inductive logic programming


Bernard Espinasse, Rinaldo Lima, Fred Freitas

Aix-Marseille Université, LSIS UMR CNRS 6168 Domaine Universitaire de St Jerôme, F-13997, Marseille cedex 20, France

Universidade Federal Rural de Pernambuco – UFRPE-DEINFO Rua Dom Manoel de Medeiros, s/n, Campus Dois Irmãos, Recife/PE, Brasil

Universidade Federal de Pernambuco, CIn - UFPE Centro de Informática, Cx Postal 7851, 50372-970, Recife/PE, Brasil

Corresponding Author Email: bernard.espinasse@lsis.org, rinaldo.jose@ufrpe.br, fred@cin.ufpe.br
Pages: 637-674
DOI: https://doi.org/10.3166/RIA.30.637-674
Received: N/A | Accepted: N/A | Published: 31 December 2016
Abstract: 

Faced with the growing amount of information available both on the Web and in digital libraries, the development of automatic Information Extraction (IE) systems that are effective, robust, and adaptive is a major challenge. In the IE domain, Named Entity Recognition (NER) and Relation Extraction (RE) are two important tasks. The former aims at finding named instances, such as people's names and locations, whereas the latter consists of detecting and characterizing relations among such named entities in text. Most state-of-the-art supervised learning methods for NER and RE rely on statistical machine learning techniques, with more accurate results for NER than for RE. These statistical techniques typically use a propositional hypothesis space for representing examples, i.e., an attribute-value representation. Such a representation has limitations, particularly for the extraction of complex relations, which demand more semantic resources and, above all, contextual information about the instances involved. In this paper, we present an IE system, named OntoILPER, that extracts both entity and relation instances from textual documents in English. This system not only benefits from a domain ontology as a semantic resource, but also takes advantage of a more expressive relational hypothesis space for representing examples whose structure is relevant to the task at hand. OntoILPER induces extraction rules that subsume examples of entity and relation instances from a specific graph-based model of sentence representation. Moreover, the system enables the application of domain ontologies and further background knowledge in the form of relational features. In addition, this paper presents several experiments with OntoILPER on NER and RE using the TREC reference corpus, and compares these results with those of other state-of-the-art IE systems.

Keywords: 

entity and relation extraction, symbolic machine learning, ontology-based information extraction, inductive logic programming, ontology population.
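
The contrast the abstract draws between an attribute-value representation and a relational one can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the example sentence, the predicate names (`token`, `entity`, `dep`), the dependency labels, the `work_for` relation, and the hand-written rule are hypothetical and are not taken from OntoILPER, which induces its rules automatically.

```python
# Hypothetical sentence: "Smith works for Acme".
# Relational view: the sentence is a set of ground facts describing a graph
# (tokens as nodes, dependency edges between them). All names are illustrative.
FACTS = {
    ("token", "t1", "Smith"),
    ("token", "t2", "works"),
    ("token", "t3", "Acme"),
    ("entity", "t1", "Person"),
    ("entity", "t3", "Organization"),
    ("dep", "nsubj", "t2", "t1"),      # t1 is the subject of t2
    ("dep", "prep_for", "t2", "t3"),   # t3 is attached to t2 via "for"
}


def propositional_features(facts, x, y):
    """Attribute-value view: one flat record per candidate pair.

    Only fixed attributes of x and y are kept; the path that links
    them in the sentence graph is not represented.
    """
    types = {f[1]: f[2] for f in facts if f[0] == "entity"}
    return {"type_x": types.get(x), "type_y": types.get(y)}


def covers_work_for(facts, x, y):
    """Hand-written stand-in for an induced rule (hypothetical):

    work_for(X, Y) :- entity(X, Person), entity(Y, Organization),
                      dep(nsubj, V, X), dep(prep_for, V, Y).
    """
    if ("entity", x, "Person") not in facts:
        return False
    if ("entity", y, "Organization") not in facts:
        return False
    # Verbs that have x as subject, and verbs that govern y through "for".
    subj_heads = {f[2] for f in facts
                  if f[0] == "dep" and f[1] == "nsubj" and f[3] == x}
    for_heads = {f[2] for f in facts
                 if f[0] == "dep" and f[1] == "prep_for" and f[3] == y}
    # The rule covers the pair only if both edges share the same verb V.
    return bool(subj_heads & for_heads)


if __name__ == "__main__":
    print(propositional_features(FACTS, "t1", "t3"))
    print(covers_work_for(FACTS, "t1", "t3"))
```

The flat record keeps only the two entity types, whereas the rule can also require that both entities attach to the same verb, which is the kind of structural context a relational hypothesis space can express.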

1. Introduction
2. Information extraction, ontologies and inductive logic programming
3. A symbolic information extraction method
4. The OntoILPER system
5. Experimental evaluation of OntoILPER
6. Comparative evaluation
7. Conclusion
Acknowledgements

The authors thank the National Council for Scientific and Technological Development of Brazil (CNPq) for its financial support (Grant N°140791/2010-8).
