NELL's "JSON0" API

We offer programmatic access to NELL's KB, along with a few online prediction methods, via a web-based API that accepts GET requests and returns JSON objects. This API is used to power the Ask NELL page, which is a good way to get a sense of what the API can offer. The documentation below covers the nitty-gritty of using the API directly.

NOTE: This API is still largely experimental and can be expected to change as both the project and usage patterns evolve.

Background: Concepts, tokens, and literals

One source of complexity for this API comes from the fact that the arguments given as queries or returned as results may be of one of three types: concepts, tokens, or literals. The documentation for the "Every Belief in the KB" file starts out by describing the difference between concepts and literals. In short, all of NELL's learned knowledge is ultimately stored in terms of abstract concepts, and there is a many-to-many mapping from these to actual literal noun phrase strings read from text. This allows NELL to capture polysemy. In actuality, there is a third intermediate layer that we call "tokens", where, roughly, a token is a case-insensitive and punctuation-insensitive version of a literal. The many-to-many mapping actually exists between concepts and tokens, and then there is a one-to-many mapping from each token to the set of literals representing the variations in capitalization and punctuation that NELL has encountered.

For example, say the end user is interested in the literal "Apple Inc.". If NELL knows anything about "Apple Inc." then there will be a mapping in the KB from that literal to the token "apple_inc_". The token "apple_inc_" may map to other literals (e.g. "APPLE INC."), but the literal "Apple Inc." will map to at most one token. The "apple_inc_" token might then refer to two concepts, maybe "concept:company:apple" and "concept:recordlabel:apple_records". Those two concepts might each have multiple tokens referring to them.
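The three layers in this example can be pictured with a small sketch. The normalization rule below (lowercase, then replace each run of non-alphanumeric characters with an underscore) is only an assumption chosen because it reproduces "apple_inc_" from "Apple Inc."; NELL's actual tokenization may differ. The function and dictionary names are hypothetical and are not part of the API.

```python
import re
from collections import defaultdict


def literal_to_token(literal: str) -> str:
    """Assumed normalization, NOT NELL's documented rule: lowercase the
    literal and collapse each run of non-alphanumerics into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", literal.lower())


# One-to-many: each token maps to the set of literals seen for it.
token_to_literals = defaultdict(set)
for lit in ["Apple Inc.", "APPLE INC."]:
    token_to_literals[literal_to_token(lit)].add(lit)

# Many-to-many: a token may refer to several concepts (toy data taken
# from the example in the text above).
token_to_concepts = {
    "apple_inc_": {
        "concept:company:apple",
        "concept:recordlabel:apple_records",
    },
}


def concepts_for_literal(literal: str) -> set:
    """Go from a literal to its (single) token, then to candidate concepts."""
    return token_to_concepts.get(literal_to_token(literal), set())
```

Note that the lookup always passes through exactly one token per literal, which is why the ambiguity (polysemy) lives in the token-to-concept step rather than in the literal-to-token step.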

All of NELL's promoted beliefs are among concepts. Some of NELL's learning methods operate in terms of concepts (e.g. PRA looks for structural regularities in relationships among concepts). Some of NELL's learning methods operate in terms of tokens (e.g. CPL looks for cooccurring patterns of words surrounding noun phrases seen in snippets of text). Some of NELL's learning methods operate in terms of literal strings (e.g. CMC looks for prefixes, suffixes, patterns of capitalization, and other orthographical features). Finally, depending on the needs of the end user, queries and responses might be formulated in terms of concepts, tokens, or literals. We must therefore disambiguate among these three things when posing or answering queries.

NOTE: Concepts have names like "concept:coach:peyton_manning". These names often capture the right meaning, but they can be misleading as well. In this example, there is no guarantee that NELL believes that Peyton Manning is actually a coach, and NELL may believe that the concept belongs to other categories not mentioned. It is also possible that NELL has yet to be certain that it belongs in any category at all. Additionally, it could be the case that NELL is confused about which literal strings refer to which concepts, and it may be that NELL believes that both "Peyton Manning" and "Jim Caldwell" can refer to this one concept. It might not be clear whether NELL has mistaken Jim Caldwell for a football player or Peyton Manning for a coach. Therefore, it is essential to always look at the set of literal strings that refer to a concept, and to look at the set of categories to which a concept belongs in order to determine its true category membership. Simply stripping off the "concept:" prefix and category name will lead to incomplete and erroneous information.

Kinds of queries

Fundamentally, the query takes the same form as an assertion: an (argument, category) pair for a category instance query and an (arg1, relation, arg2) triple for a relation instance query. In this most basic form, the query is asking for one or more scores indicating how likely the query instance is to be true. We allow a wildcard to be supplied in place of zero or more values in the query. Correspondingly, such queries will be answered with a set of beliefs that match the query, each with one or more scores. Examples of wildcard queries include:

The arguments to the query may be concepts, tokens, or literals, and the format of the query allows this to be indicated unambiguously. The query server offers access to multiple prediction agents (described below) that operate variously in terms of concepts, tokens, and literals. The first step in the query process, then, is to map the given query instance into all possible concept, token, and literal forms that it may take so that each prediction agent has the kind of input it needs. By default, answers are returned as-is from the predictors, meaning that a query may return some answers in terms of concepts, others in terms of tokens, and others in terms of literals. To facilitate interpretation, the response also contains the mappings among the set of concepts, tokens, and literals in the answers. To ease the burden on the end user, alternate query modes are available that project answers into only one kind of argument, and other sorts of things along these lines can be added as needed.
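The wildcard semantics described in this section can be illustrated with a minimal sketch. It only shows the matching idea (a wildcard slot matches anything, every other slot must match exactly) and says nothing about how the query server is implemented. The beliefs listed are made-up placeholders, and the function names are hypothetical; the "*" wildcard character is the one used in the request URLs for this API.

```python
WILDCARD = "*"


def matches(query: tuple, belief: tuple) -> bool:
    """True if the belief has the same arity as the query and every
    non-wildcard slot of the query equals the corresponding belief slot."""
    return len(query) == len(belief) and all(
        q == WILDCARD or q == b for q, b in zip(query, belief)
    )


# Placeholder beliefs for illustration only (not real KB contents).
# Pairs are category instances; triples are relation instances.
toy_beliefs = [
    ("person_a", "ceoof", "company_x"),
    ("person_b", "ceoof", "company_y"),
    ("person_a", "worksfor", "company_x"),
    ("company_x", "company"),
]


def answer(query: tuple) -> list:
    """Return every toy belief that matches the query."""
    return [b for b in toy_beliefs if matches(query, b)]


# Who is the CEO of company_x?
ceo_answers = answer((WILDCARD, "ceoof", "company_x"))
# Which categories does company_x belong to?
category_answers = answer(("company_x", WILDCARD))
```

A query with no wildcards degenerates to a membership check, which corresponds to the basic form of asking how likely a single instance is to be true.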

Query structure

To query the NELL "JSON0" API, send an HTTP GET request to http://rtw.ml.cmu.edu/rtw/api/json0. The GET variables should be set as follows:

For example, to use only NELL's KB to find the noun phrases that can refer to the CEOs of all known companies that can be referred to with the noun phrase "Apple", one would issue the following request: http://rtw.ml.cmu.edu/rtw/api/json0?lit1=*&predicate=ceoof&lit2=Apple&agent=KB. It may be useful to use the advanced query UI to issue the query and then observe the URL that it generates.
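The example request above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the GET variable names (lit1, predicate, lit2, agent) are the ones that appear in the example URL, while the helper names build_query_url and run_query are hypothetical. The 30-second timeout is an arbitrary choice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://rtw.ml.cmu.edu/rtw/api/json0"


def build_query_url(**params: str) -> str:
    """Encode the GET variables onto the API endpoint.
    safe="*" keeps the wildcard character unescaped, as in the example URL."""
    return BASE_URL + "?" + urlencode(params, safe="*")


def run_query(**params: str) -> dict:
    """Send the GET request and decode the JSON response body."""
    with urlopen(build_query_url(**params), timeout=30) as resp:
        return json.load(resp)


# Reproduces the example request from the text (no network access needed).
url = build_query_url(lit1="*", predicate="ceoof", lit2="Apple", agent="KB")
print(url)
```

Calling run_query(lit1="*", predicate="ceoof", lit2="Apple", agent="KB") would then issue the request, assuming the service is reachable.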

Response synopsis

The response will be a JSON object. Consult the "kind" field first: for a successful query its value will be "NELLQueryDemoJSON0". If the value is "error", then an error has occurred in processing the query, and an error message will be available in the "message" field. A successful response has two main parts. One part, "items", is a list of assertions that answer the query, and the other part, "entMap", provides the concept / token / literal mappings for all arguments occurring in the "items" section.
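A client might check the "kind" field before touching anything else, along the lines of the sketch below. The field names "kind", "message", "items", and "entMap" come from the description above; the exception class and function name are hypothetical, and treating any other "kind" value as a failure is a design assumption of this sketch rather than documented behavior.

```python
class NellApiError(Exception):
    """Raised when the API reports an error or an unrecognized response kind."""


def unpack_response(response: dict) -> tuple:
    """Validate a decoded JSON response and return (items, entMap)."""
    kind = response.get("kind")
    if kind == "error":
        # The server-supplied explanation lives in the "message" field.
        raise NellApiError(response.get("message", "unknown error"))
    if kind != "NELLQueryDemoJSON0":
        # Assumption: anything else is treated as unexpected.
        raise NellApiError(f"unexpected response kind: {kind!r}")
    return response.get("items", []), response.get("entMap", [])
```

Failing early on the "kind" field keeps error handling in one place, so the rest of the client can assume that "items" and "entMap" are present.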

Each assertion in the "items" section takes the same form as the query, i.e. an ent1 or lit1 value, a predicate value, and, in the case of relations, an ent2 or lit2 value. Each assertion will also have a "justifications" field that is an array of justification structures, one per agent, containing the agent name, the score, and some free-form text indicating the provenance of that prediction.

The "entMap" section is a list with one entry for each ent1, lit1, ent2, or lit2 value occurring in the "items" section. Each entry names the argument in question, and provides the set of token entities, concept entities, and literal strings to which it maps.

The exact structure of the JSON is most easily understood by selecting the "Pretty-printed JSON" format from the advanced query UI. For example, see the predictions of who works for "Apple". Note that the score of the "KI" agent can be used as the single overall score for each prediction, which is useful when more than one agent comes up with the same prediction.
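Picking out the "KI" score for each assertion might look like the sketch below. The "justifications" field and the "KI" agent name come from the text; the key names "agent" and "score" inside each justification structure are assumptions, since the exact keys are not spelled out here, so they are exposed as parameters. Check them against the pretty-printed JSON before relying on this. The function name is hypothetical.

```python
def overall_score(item: dict, agent_key: str = "agent", score_key: str = "score"):
    """Return the score from the "KI" justification of one assertion,
    or None if that agent did not contribute.
    agent_key and score_key are ASSUMED key names, not confirmed ones."""
    for justification in item.get("justifications", []):
        if justification.get(agent_key) == "KI":
            return justification.get(score_key)
    return None


# Hand-written placeholder assertion, shaped after the prose description only.
example_item = {
    "justifications": [
        {"agent": "KB", "score": 0.9},
        {"agent": "KI", "score": 0.95},
    ]
}
ki_score = overall_score(example_item)
```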

Metadata queries

This facility can also furnish the set of available categories, the set of available relations, and the set of available agents. Metadata is supplied in all three cases (e.g. hierarchical relationships among predicates, domain and range of relations, brief descriptions of agents). The structure of the JSON objects returned here should be self-explanatory.

Detailed agent documentation

As of this writing, the following are valid values to include in the list of agents: