PIDGIN: Ontology Alignment using Web Text as Interlingua

Derry Wijaya, Partha Pratim Talukdar, Tom Mitchell

International Conference on Information and Knowledge Management (CIKM 2013), San Francisco, USA.

Abstract

The problem of aligning ontologies and database schemas across different knowledge bases and databases is fundamental to knowledge management problems, including the problem of integrating the disparate knowledge sources that form the semantic web’s Linked Data [5]. We present a novel approach to this ontology alignment problem that employs a very large natural language text corpus as an interlingua to relate different knowledge bases (KBs). The result is a scalable and robust method (PIDGIN) that aligns relations and categories across different KBs by analyzing both (1) shared relation instances across these KBs, and (2) the verb phrases in the text instantiations of these relation instances. Experiments with PIDGIN demonstrate its superior performance when aligning ontologies across large existing KBs including NELL, Yago and Freebase. Furthermore, we show that in addition to aligning ontologies, PIDGIN can automatically learn from text the verb phrases that identify relations, and can also type the arguments of relations of different KBs.
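
As a rough illustration of the two signals named in the abstract, the toy sketch below scores a pair of relations from two KBs by combining (1) the overlap of their shared instances and (2) the overlap of the verbs that connect those instances in a set of SVO triples. This is not the method from the paper: the Jaccard scoring, the `weight` parameter, the function names, and the toy data are all assumptions made only for this example.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def verb_phrases(instances, svo_triples):
    """Verbs that link any (subject, object) pair found in `instances`."""
    pairs = set(instances)
    return {verb for subj, verb, obj in svo_triples if (subj, obj) in pairs}


def alignment_score(inst_a, inst_b, svo_triples, weight=0.5):
    """Hypothetical score: weighted mix of instance and verb overlap."""
    instance_sim = jaccard(inst_a, inst_b)
    verb_sim = jaccard(verb_phrases(inst_a, svo_triples),
                       verb_phrases(inst_b, svo_triples))
    return weight * instance_sim + (1 - weight) * verb_sim


# Toy data (hypothetical): instances of one relation in each of two KBs.
kb1_instances = {("a", "x"), ("b", "y")}
kb2_instances = {("b", "y"), ("c", "z")}

# Toy SVO triples standing in for the text corpus.
svo = [
    ("a", "lives in", "x"),
    ("b", "lives in", "y"),
    ("c", "lives in", "z"),
    ("c", "visited", "z"),
]

score = alignment_score(kb1_instances, kb2_instances, svo)
print(round(score, 3))
```

In this toy case the two relations share only one instance, yet the verbs found in text still connect them, which is the intuition behind using text as an interlingua when KBs have little direct overlap.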

Paper

A preprint of the paper is available here.

Data

The subject-verb-object (SVO) dataset used in this paper is available here.

Code

For a copy of the code, or for any other query, please contact dwijaya@andrew.cmu.edu or ppt@cs.cmu.edu.