Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases

Matt Gardner, Partha Talukdar, Jayant Krishnamurthy, and Tom Mitchell

Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), Doha, Qatar.

Paper

A preprint of the paper can be found here.

Code

The code used in this paper lives in a repository on GitHub.

Data

The data (graph files, training/testing split, scripts to run the code, etc.) used in the paper can be found here. (The graphs I was working with are large, and the download contains several copies of them in different configurations, so it is a very large 23GB download.) For a description of what's in that download, see the documentation on the GitHub wiki.

The files containing the actual results from the paper can be found here. (The matrix files, which contain the training / testing feature matrices that are the output of the random walks, form the bulk of this massive 17GB download.)

The SVO data itself can be found here.

Using the data and/or code

If all you want is to run your new algorithm and compare it against this work, you'll need the data files (to run your algorithm on) and my results files from running on the same data (and possibly the scripts I used to process the results, which are in the results download). If you have trouble understanding what data is where, first try looking at the documentation linked above, which describes the directory layout used. If you still have trouble, feel free to ask me for help (contact info is below).

If you want to use this method on your data, see the GitHub repository linked above (and feel free to file bugs or feature requests, send pull requests, etc.).

If you want to extend the algorithm in some way using my code as a starting point, see the GitHub repository linked above (and feel free to file bugs or feature requests, send pull requests, etc.).

If you want something else, contact information is below.

Contact

If you have any questions about the paper, about using the code, or about obtaining the data, the main point of contact for this paper is Matt Gardner.