"Every Belief in The KB" File

Each line of this file contains one category or relation instance that NELL believes to be true. Nominally, each belief is an (Entity, Relation, Value) triple; instances of relations have the form (George Harrison, playsInstrument, Guitar), and instances of categories have the form (Guitar, generalizations, musicalInstrument). It is easy to separate category instances from relation instances because the Relation field will always be "generalizations" for a category, and never for a relation.
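As a minimal sketch of that separation, the Python below partitions a few triples by checking the Relation field. The function name is_category_instance and the sample list are illustrative only and are not part of NELL.

```python
def is_category_instance(relation: str) -> bool:
    """Return True when the Relation field marks a category instance."""
    return relation == "generalizations"


# Illustrative triples taken from the examples in the text above.
beliefs = [
    ("George Harrison", "playsInstrument", "Guitar"),
    ("Guitar", "generalizations", "musicalInstrument"),
]

# The second element of each triple is the Relation field.
category_instances = [b for b in beliefs if is_category_instance(b[1])]
relation_instances = [b for b in beliefs if not is_category_instance(b[1])]
```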

Each line of the file consists of several tab-delimited columns. The first line of the file has column names, which are described below.
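One way to read such a file is sketched below with Python's standard csv module. The function name read_beliefs, the three-column in-memory sample, and the concept names in it are illustrative assumptions. Quoting is disabled on the assumption that quote characters inside a column are ordinary data rather than CSV quoting.

```python
import csv
import io


def read_beliefs(handle):
    """Yield one dict per belief, keyed by the column names from the header line."""
    # Assumption: quote characters in a column are data, so CSV quoting is off.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Hypothetical two-line sample with only three of the columns.
sample = io.StringIO(
    "Entity\tRelation\tValue\n"
    "concept:musician:george_harrison\tplaysInstrument\tconcept:instrument:guitar\n"
)
rows = list(read_beliefs(sample))
```

For a file on disk, the same function could be given a handle opened with open(path, newline=""), where path is whatever local file name is in use.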

To interpret these files correctly, it is necessary to distinguish between "concepts" and text that can be used to refer to those concepts -- what we call "literal strings". There is a many-to-many mapping between concepts and literals. For example, NELL's KB might contain the concept of the Apple computer company, and NELL might know that both "Apple Inc." and "Apple" are literal strings that can refer to the Apple computer company. NELL's KB might also contain the concept of the fruit known as an apple, and NELL might know that "Apple", "apple", and "apples" can all refer to the fruit. All beliefs in NELL's KB are among concepts, not among literal strings.
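The many-to-many mapping can be pictured with two dictionaries, one per direction. This is only a sketch: the concept names below are hypothetical stand-ins for the Apple example above, not names taken from NELL's KB.

```python
from collections import defaultdict

# Hypothetical concept names; the literal strings come from the example above.
concept_to_literals = {
    "concept:company:apple": {"Apple Inc.", "Apple"},
    "concept:fruit:apple": {"Apple", "apple", "apples"},
}

# Invert the mapping so each literal string lists every concept it can refer to.
literal_to_concepts = defaultdict(set)
for concept, literals in concept_to_literals.items():
    for literal in literals:
        literal_to_concepts[literal].add(concept)
```

Here the literal "Apple" maps to both concepts, while "apples" maps only to the fruit, which is why a literal string alone cannot identify a concept.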

Concepts have names like "concept:coach:peyton_manning". These names often capture the right meaning, but they can be misleading as well. In this example, there is no guarantee that NELL believes that Peyton Manning is actually a coach, and NELL may believe that the concept belongs to other categories not mentioned. It is also possible that NELL has yet to be certain that it belongs in any category at all. Additionally, it could be the case that NELL is confused about which literal strings refer to which concepts, and it may be that NELL believes that both "Peyton Manning" and "Jim Caldwell" can refer to this one concept. It might not be clear whether NELL has mistaken Jim Caldwell for a football player or Peyton Manning for a coach. Therefore, it is essential to always look at the set of literal strings that refer to a concept, and to look at the set of categories to which a concept belongs in order to determine its true category membership. Simply stripping off the "concept:" prefix and category name will lead to incomplete and erroneous information.
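The difference between the two approaches can be sketched as follows. The sample row, its category values, and both function names are hypothetical. The sketch also assumes that the "Categories for Entity" column (described below) holds whitespace-separated category names, which this document does not specify.

```python
def naive_category(concept_name: str) -> str:
    """Category segment embedded in the concept name; shown as the approach to avoid."""
    return concept_name.split(":")[1]


def believed_categories(row: dict) -> set:
    """Categories read from the 'Categories for Entity' column of one row."""
    # Assumption: category names in this column are separated by whitespace.
    return set(row["Categories for Entity"].split())


# Hypothetical row; the category values are invented for illustration.
row = {
    "Entity": "concept:coach:peyton_manning",
    "Categories for Entity": "athlete person",
}

from_name = naive_category(row["Entity"])
from_column = believed_categories(row)
name_is_misleading = from_name not in from_column
```

In this invented row the name suggests "coach", while the column lists no such category, so only the column reflects what NELL believes.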

The columns of the file are as follows:

Entity: The Entity part of the (Entity, Relation, Value) triple. Note that this will be the name of a concept and is not the literal string of characters seen by NELL from some text source, nor does it indicate the category membership of that concept.

Relation: The Relation part of the (Entity, Relation, Value) triple. In the case of a category instance, this will be "generalizations". In the case of a relation instance, this will be the name of the relation.

Value: The Value part of the (Entity, Relation, Value) triple. In the case of a category instance, this will be the name of the category. In the case of a relation instance, this will be another concept (like Entity).

Iteration of Promotion: The point in NELL's life at which this category or relation instance was promoted to one that NELL believes to be true. This is a non-negative integer indicating the number of iterations of bootstrapping NELL had gone through at that point.

Probability: A confidence score for the belief. Note that NELL's scores are not actually probabilistic at this time.

Source: A summary of the provenance for the belief, indicating the set of learning subcomponents (CPL, SEAL, etc.) that submitted this belief as being potentially true.

Entity literalStrings: The set of actual textual strings that NELL has read that it believes can refer to the concept indicated in the Entity column.

Value literalStrings: For relations, the set of actual textual strings that NELL has read that it believes can refer to the concept indicated in the Value column. For categories, this should be empty but may contain something spurious.

Best Entity literalString: Of the set of strings in the Entity literalStrings column, the one string that can best be used to describe the concept.

Best Value literalString: The same as Best Entity literalString, but for the Value literalStrings column.

Categories for Entity: The full set of categories (which may be empty) to which NELL believes the concept indicated in the Entity column to belong.

Categories for Value: For relations, the full set of categories (which may be empty) to which NELL believes the concept indicated in the Value column to belong. For categories, this should be empty but may contain something spurious.

Candidate Source: A free-form amalgamation of more specific provenance information describing the justification(s) NELL has for possibly believing this category or relation instance.
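As a rough sketch of how a parsed row might be given types, the Python below converts the two numeric columns and keeps the triple as strings. The Belief class, the to_belief function, and the sample row are illustrative assumptions. It also assumes the header uses the column names exactly as listed above and that each numeric column holds a single value.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    entity: str
    relation: str
    value: str
    iteration: int   # "Iteration of Promotion"
    score: float     # "Probability"; a confidence score, not a true probability


def to_belief(row: dict) -> Belief:
    """Build a Belief from one row keyed by column name."""
    iteration = int(row["Iteration of Promotion"])
    if iteration < 0:
        raise ValueError("Iteration of Promotion must be non-negative")
    return Belief(
        entity=row["Entity"],
        relation=row["Relation"],
        value=row["Value"],
        iteration=iteration,
        score=float(row["Probability"]),
    )


# Hypothetical row with invented values, showing only the columns used here.
sample_row = {
    "Entity": "concept:instrument:guitar",
    "Relation": "generalizations",
    "Value": "musicalInstrument",
    "Iteration of Promotion": "12",
    "Probability": "0.95",
}
belief = to_belief(sample_row)
```

The remaining columns are left out of the sketch because their internal layout (for example, how members of a set are separated) is not specified here.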