Improving Learning and Inference in a Large Knowledge-base using Latent Syntactic Cues

Matt Gardner, Partha Pratim Talukdar, Bryan Kisiel, and Tom Mitchell

Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), Seattle, USA. [Short Paper]


A preprint of the paper can be found here.


The code used in this paper lives in a repository on GitHub. Note, though, that the code has changed a lot since then, and I can't provide support for getting the old code to work correctly. I can, however, help you get a similar experiment running with newer data. First, follow the quick start guide for the 2015 paper, found here. Next, download and untar the data needed to create a graph for the 2013 method, found here. Finally, modify the example experiment spec at examples/experiment_specs/nell/emnlp2013_method/pra_latent_d.json, replacing every instance of /path/to/data with the path to the directory where you untarred the data. Then run sbt "run ./examples/ pra_latent_d", as in the 2015 quick start. This should run the method from the 2013 paper on the 2015 data set. That's the best I can do for now.
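If you'd rather not edit the spec by hand, the path replacement can be scripted. The following is only a minimal sketch, not part of the PRA code: the helper names fill_data_path and rewrite_spec are made up for illustration, and it assumes the placeholder appears in the spec file literally as /path/to/data. The JSON keys in the usage note are likewise invented; the real spec will look different.

```python
from pathlib import Path
import sys

# The literal placeholder the instructions say to replace in the spec file.
PLACEHOLDER = "/path/to/data"


def fill_data_path(spec_text: str, data_dir: str) -> str:
    """Return spec_text with every PLACEHOLDER replaced by data_dir.

    A trailing slash on data_dir is dropped so that a value such as
    "/path/to/data/graph" does not end up with a double slash.
    """
    return spec_text.replace(PLACEHOLDER, data_dir.rstrip("/"))


def rewrite_spec(spec_file: str, data_dir: str) -> int:
    """Rewrite spec_file in place (hypothetical helper).

    Returns the number of placeholder occurrences that were replaced.
    """
    path = Path(spec_file)
    text = path.read_text(encoding="utf-8")
    count = text.count(PLACEHOLDER)
    path.write_text(fill_data_path(text, data_dir), encoding="utf-8")
    return count


if __name__ == "__main__":
    # Usage: python fill_spec.py <spec_file> <data_dir>
    if len(sys.argv) == 3:
        replaced = rewrite_spec(sys.argv[1], sys.argv[2])
        print(f"Replaced {replaced} occurrence(s) of {PLACEHOLDER}")
    else:
        print("usage: python fill_spec.py <spec_file> <data_dir>")
```

For example, fill_data_path('{"graph": "/path/to/data/graph"}', "/home/me/nell/") returns '{"graph": "/home/me/nell/graph"}'. After rewriting the spec, run the sbt command given above as usual.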


The data used in the paper (graph files, training/testing split, and other things) can be found here (1.8GB download). That download contains only the latent embeddings of the subject-verb-object (SVO) table used in the paper. (It also contains the SVO edges that were added to the graph for one of the methods, but that's not an easily usable version of the SVO data.) The SVO data itself can be found here. A set of scripts that can be used to run the code can be found here. These scripts don't come with much help, though, and you'll have to change a lot of paths. If you want to actually run the scripts to reproduce the results of the paper and are having a hard time, contact information is below. If you just want to use the PRA code in something else, you shouldn't need these particular scripts and can just look at the documentation of the code on GitHub. I think it's pretty easy to use.


If you have any questions about the paper, about using the code, or about obtaining the data, the main point of contact for this paper is Matt Gardner.