Subject-Verb-Object (SVO) Triples in NELL

This page describes the Subject-Verb-Object (SVO) triples generated as part of the NELL project. The current set of 604 million triples, extracted from the entire dependency-parsed ClueWeb09 dataset (about 230 billion tokens), is available here (604m triples, 9.5GB, compressed). Each line in the file consists of the following tab-delimited fields: Subject, Verb(+Preposition), Object, Frequency.
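
As a minimal sketch of how the four tab-delimited fields described above could be read, the Python snippet below parses lines into typed records. The names SvoTriple and parse_svo_lines are hypothetical and are not part of the dataset or any NELL tooling. The sketch assumes the Frequency field is an integer count and that lines not matching the four-field layout can be skipped.

```python
from typing import Iterable, Iterator, NamedTuple


class SvoTriple(NamedTuple):
    """One record from the SVO file (hypothetical helper type)."""
    subject: str
    verb: str        # verb, optionally followed by a preposition
    obj: str
    frequency: int   # assumed to be an integer count


def parse_svo_lines(lines: Iterable[str]) -> Iterator[SvoTriple]:
    """Yield one SvoTriple per well-formed line.

    Lines that do not have exactly four tab-separated fields, or whose
    last field is not an integer, are skipped (an assumption of this
    sketch, not documented behavior of the dataset).
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 4:
            continue
        subject, verb, obj, freq = fields
        try:
            frequency = int(freq)
        except ValueError:
            continue
        yield SvoTriple(subject, verb, obj, frequency)


if __name__ == "__main__":
    # Made-up sample lines for illustration only; not taken from the dataset.
    sample = ["dog\tchase\tcat\t12\n", "man\twalk to\tstore\t3\n"]
    for triple in parse_svo_lines(sample):
        print(triple)
```

Because the parser accepts any iterable of strings, it can be fed an open text-mode file handle and will stream the data line by line rather than loading all 604 million triples into memory. How the compressed download is opened depends on its compression format, which is not stated on this page.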

The following papers have used this dataset:

Improving Learning and Inference in a Large Knowledge-base using Latent Syntactic Cues.
Matt Gardner, Partha Pratim Talukdar, Bryan Kisiel, and Tom Mitchell.
In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013. [PDF]

PIDGIN: Ontology Alignment using Web Text as Interlingua.
Derry T. Wijaya, Partha P. Talukdar and Tom M. Mitchell.
In Proceedings of the Conference on Information and Knowledge Management (CIKM), 2013. [PDF]

Acquiring Temporal Constraints between Relations.
Partha P. Talukdar, Derry T. Wijaya and Tom M. Mitchell.
In Proceedings of the Conference on Information and Knowledge Management (CIKM), 2012. [PDF]

Acknowledgment

We thank the ClueWeb project (CMU) and the Hazy Research Group for their generous help with data sets. If you use this dataset in any publication, please acknowledge the CIKM 2012 paper above, and send us a quick note so that we can update the list above.

Contact

If you have any questions about this dataset, please contact Partha Talukdar (ppt@cs.cmu.edu).