Read the Web :: Query API Documentation

NELL's "JSON0" API

We offer programmatic access to NELL's KB, along with a few online prediction methods, via a web-based API that accepts GET requests and returns JSON objects. This API powers the Ask NELL page, which is a good way to get a sense of what the API can offer. The documentation below covers the details of using the API directly.

NOTE: This API is still largely experimental and can be expected to change as both the project and usage patterns evolve.

Background: Concepts, tokens, and literals

One source of complexity for this API comes from the fact that the arguments given as queries or returned as results may be of one of three types: concepts, tokens, or literals. The documentation for the "Every Belief in the KB" file starts out by describing the difference between concepts and literals. In short, all of NELL's learned knowledge is ultimately stored in terms of abstract concepts, and there is a many-to-many mapping from these to actual literal noun phrase strings read from text. This allows NELL to capture polysemy. In actuality, there is a third intermediate layer that we call "tokens", where, roughly, a token is a case-insensitive and punctuation-insensitive version of a literal. The many-to-many mapping actually exists between concepts and tokens, and then there is a one-to-many mapping from each token to the set of literals representing the variations in capitalization and punctuation that NELL has encountered.

For example, say the end user is interested in the literal "Apple Inc.". If NELL knows anything about "Apple Inc." then there will be a mapping in the KB from that literal to the token "apple_inc_". The token "apple_inc_" may map to other literals (e.g. "APPLE INC."), but the literal "Apple Inc." will map to at most one token. The "apple_inc_" token might then refer to two concepts, maybe "concept:company:apple" and "concept:recordlabel:apple_records". Each of those two concepts might, in turn, have multiple tokens referring to it.
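The three layers in this example can be pictured with a small sketch. The dictionaries below are a hypothetical illustration populated only with the values mentioned above. They are not NELL's actual storage format, and the helper names concepts_for_literal and literals_for_token are assumptions made for this example.

```python
# Hypothetical in-memory picture of the literal -> token -> concept layers.
# The values come from the "Apple Inc." example; the structure is illustrative only.

# Each literal maps to at most one token.
LITERAL_TO_TOKEN = {
    "Apple Inc.": "apple_inc_",
    "APPLE INC.": "apple_inc_",
}

# Tokens and concepts are related many-to-many.
TOKEN_TO_CONCEPTS = {
    "apple_inc_": {"concept:company:apple", "concept:recordlabel:apple_records"},
}


def concepts_for_literal(literal):
    """Return the set of concepts a literal may refer to (empty if unknown)."""
    token = LITERAL_TO_TOKEN.get(literal)
    if token is None:
        return set()
    return set(TOKEN_TO_CONCEPTS.get(token, set()))


def literals_for_token(token):
    """Return every known literal spelling of a token, sorted for stable output."""
    return sorted(lit for lit, tok in LITERAL_TO_TOKEN.items() if tok == token)
```

Here concepts_for_literal("Apple Inc.") yields both the company and the record label concept, which is the polysemy that the concept layer is meant to capture.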

All of NELL's promoted beliefs are among concepts. Some of NELL's learning methods operate in terms of concepts (e.g. PRA looks for structural regularities in relationships among concepts). Some of NELL's learning methods operate in terms of tokens (e.g. CPL looks for co-occurring patterns of words surrounding noun phrases seen in snippets of text). Some of NELL's learning methods operate in terms of literal strings (e.g. CMC looks for prefixes, suffixes, patterns of capitalization, and other orthographical features). Finally, depending on the needs of the end user, queries and responses might be formulated in terms of concepts, tokens, or literals. So we must disambiguate among these three things when posing or answering queries.

NOTE: Concepts have names like "concept:coach:peyton_manning". These names often capture the right meaning, but they can be misleading as well. In this example, there is no guarantee that NELL believes that Peyton Manning is actually a coach, and NELL may believe that the concept belongs to other categories not mentioned. It is also possible that NELL has yet to be certain that it belongs in any category at all. Additionally, it could be the case that NELL is confused about which literal strings refer to which concepts, and it may be that NELL believes that both "Peyton Manning" and "Jim Caldwell" can refer to this one concept. It might not be clear whether NELL has mistaken Jim Caldwell for a football player or Peyton Manning for a coach. Therefore, it is essential to always look at the set of literal strings that refer to a concept, and to look at the set of categories to which a concept belongs, in order to determine its true category membership. Simply stripping off the "concept:" prefix and category name will lead to incomplete and erroneous information.

Kinds of queries

Fundamentally, a query takes the same form as an assertion: an (argument, category) pair for a category instance query, or an (arg1, relation, arg2) triple for a relation instance query. In this most basic form, the query asks for one or more scores indicating how likely the query instance is to be true. A wildcard may also be supplied in place of one or more of the values in the query. Such queries are answered with a set of beliefs that match the query, each with one or more scores. Examples of wildcard queries include:

(arg1, *): Return the most likely categories to which arg1 belongs
(arg1, relation, *): Return the most likely arg2 values
(arg1, *, arg2): Return the most likely relationships between arg1 and arg2
(arg1, *, *): Predict the most likely relationships involving arg1

The arguments to the query may be concepts, tokens, or literals, and the format of the query allows this to be indicated unambiguously. The query server offers access to multiple prediction agents (described below) that operate variously in terms of concepts, tokens, and literals. The first step in the query process, then, is to map the given query instance into all possible concept, token, and literal forms that it may take so that each prediction agent has the kind of input it needs. By default, answers are returned as-is from the predictors, meaning that a query may return some answers in terms of concepts, others in terms of tokens, and others in terms of literals. To facilitate interpretation, the response also contains the mappings among the set of concepts, tokens, and literals in the answers. To ease the burden on the end user, alternate query modes are available that project answers into only one kind of argument, and further modes of this sort can be added as needed.

Query structure

To query the NELL "JSON0" API, send an HTTP GET request to http://rtw.ml.cmu.edu/rtw/api/json0. The GET variables should be set as follows:

ent1 or lit1: A string specifying the arg1 of the query. If given via ent1, the string will be interpreted as the name of a concept or token entity in NELL's KB. If given via lit1, the string will be interpreted as a plain string literal (i.e. plain noun phrase). A "*" may be supplied in either lit1 or ent1 in order to make arg1 a wildcard; the choice of lit1 vs. ent1 in this case may influence how the query answers are converted among concept, token, and literal.
predicate: A string specifying the name of the category or relation of the query. A "*" may be supplied in order to make the predicate a wildcard.
ent2 or lit2: An optional string specifying the arg2 of the query. The rules here are the same as for ent1/lit1. If ent2 or lit2 is supplied, then the query is taken to be a relation query. If neither is supplied, then the query is taken to be a category query.
agent: An optional comma-delimited list of prediction agents (without spaces) that will be used to answer the query. Possible values are described below. If omitted, some reasonable but probably minimal default setting will be in effect.

For example, to use only NELL's KB to find the noun phrases that can refer to the CEOs of all known companies that can be referred to with the noun phrase "Apple", one would issue the following request: http://rtw.ml.cmu.edu/rtw/api/json0?lit1=*&predicate=ceoof&lit2=Apple&agent=KB. It may be useful to issue the query from the advanced query UI and then observe the URL that it generates.
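As a rough sketch, the GET variables described above could be assembled with Python's standard library as follows. The helper name build_query_url is hypothetical, and the check that exactly one of ent1 or lit1 is given is an assumption about how the API expects to be called rather than documented behavior.

```python
from urllib.parse import urlencode

BASE_URL = "http://rtw.ml.cmu.edu/rtw/api/json0"


def build_query_url(predicate, ent1=None, lit1=None, ent2=None, lit2=None, agents=None):
    """Build a JSON0 query URL (hypothetical helper, not part of the API)."""
    # Assumption: arg1 is given through exactly one of ent1 / lit1.
    if (ent1 is None) == (lit1 is None):
        raise ValueError("supply exactly one of ent1 or lit1")
    # arg2 is optional; omitting it makes this a category query.
    if ent2 is not None and lit2 is not None:
        raise ValueError("supply at most one of ent2 or lit2")

    params = {}
    if ent1 is not None:
        params["ent1"] = ent1
    else:
        params["lit1"] = lit1
    params["predicate"] = predicate
    if ent2 is not None:
        params["ent2"] = ent2
    elif lit2 is not None:
        params["lit2"] = lit2
    if agents:
        # Comma-delimited, without spaces.
        params["agent"] = ",".join(agents)

    # Leave "*" and "," unescaped so the URL matches the form shown in this document.
    return BASE_URL + "?" + urlencode(params, safe="*,")


# Reproduces the example request above.
url = build_query_url("ceoof", lit1="*", lit2="Apple", agents=["KB"])
```

Leaving out both ent2 and lit2, as in build_query_url("*", lit1="Apple Inc."), produces a category query with a wildcard predicate.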

Response synopsis

The response will be a JSON object. The "kind" field should be consulted first of all, and the value should be "NELLQueryDemoJSON0". If the value is "error", then an error has occurred in processing the query, and an error message will be available in the "message" field. The response has two main parts. One part, "items", is a list of assertions that answer the query, and the other part, "entMap", provides the concept / token / literal mappings for all arguments occurring in the "items" section.
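A minimal sketch of this first check might look like the following. It relies only on the "kind", "message", "items", and "entMap" fields named above. The names parse_response and NellQueryError are hypothetical, and treating a missing "items" or "entMap" field as an empty list is an assumption.

```python
import json


class NellQueryError(Exception):
    """Raised when the response reports kind == "error"."""


def parse_response(text):
    """Parse a JSON0 response body and return (items, ent_map)."""
    response = json.loads(text)
    kind = response.get("kind")
    if kind == "error":
        # The server supplies a human-readable explanation in "message".
        raise NellQueryError(response.get("message", "unknown error"))
    if kind != "NELLQueryDemoJSON0":
        raise ValueError(f"unexpected kind: {kind!r}")
    # Assumption: absent sections are treated as empty.
    return response.get("items", []), response.get("entMap", [])
```

The text argument would typically be the body of the HTTP response, for example as read via urllib.request.urlopen.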

Each assertion in the "items" section takes the same form as the query, i.e. an ent1 or lit1 value, a predicate value, and, in the case of relations, an ent2 or lit2 value. Each assertion will also have a "justifications" field that is an array of justification structures, one per agent, containing the agent name, the score, and some free-form text indicating the provenance of that prediction.

The "entMap" section is a list with one entry for each ent1, lit1, ent2, or lit2 value occurring in the "items" section. Each entry names the argument in question, and provides the set of token entities, concept entities, and literal strings to which it maps.

The exact structure of the JSON is most easily understood by selecting the "Pretty-printed JSON" format from the advanced query UI, for example for a query that predicts who works for "Apple". Note that the score of the "KI" agent can be used as the single overall score for each prediction, which is useful when more than one agent comes up with the same prediction.
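Picking out the "KI" score for one assertion could be sketched as below. The "justifications" field is described above, but the key names "agent" and "score" inside each justification are assumptions; check them against the pretty-printed JSON and pass different key names if they differ. The function name overall_score is hypothetical.

```python
def overall_score(item, agent_key="agent", score_key="score"):
    """Return the KI agent's score for one assertion, or None if KI is absent.

    agent_key and score_key are assumed field names, not confirmed ones.
    """
    for justification in item.get("justifications", []):
        if justification.get(agent_key) == "KI":
            return justification.get(score_key)
    return None
```

Note that a "KI" justification can only be present if KI was among the agents used to answer the query.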

Metadata queries

This facility can also furnish the set of available categories, the set of available relations, and the set of available agents. Metadata is supplied in all three cases (e.g. hierarchical relationships among predicates, domain and range of relations, brief descriptions of agents). The structure of the JSON objects returned here should be self-explanatory.

Get the set of all categories: http://rtw.ml.cmu.edu/rtw/api/json0?action=categories
Get the set of all relations: http://rtw.ml.cmu.edu/rtw/api/json0?action=relations
Get the set of all agents: http://rtw.ml.cmu.edu/rtw/api/json0?action=agents
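The three metadata URLs differ only in the value of the action variable, so a small helper can build them. This is a sketch; the name metadata_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://rtw.ml.cmu.edu/rtw/api/json0"

# The three action values listed above.
METADATA_ACTIONS = ("categories", "relations", "agents")


def metadata_url(action):
    """Return the metadata URL for one of the documented actions."""
    if action not in METADATA_ACTIONS:
        raise ValueError(f"unknown metadata action: {action!r}")
    return BASE_URL + "?" + urlencode({"action": action})
```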

Detailed agent documentation

As of this writing, the following are valid values to include in the list of agents:

KI: This is a meta-predictor that combines the scores of the other predictors to produce a "final score" for each prediction. This should be used when a single overall score is desired, as in the Ask NELL interface. KI stands for "Knowledge Integrator."
KB: Checks NELL's KB for promoted facts satisfying the query. Note that the KB used by the query server is not necessarily the same one shown in the KB browser elsewhere on our website. This agent natively makes predictions in terms of concepts.
CKB: Checks NELL's KB for all predictions made and recorded in the past by any learning agent. This will include CPL, SEAL, CMC, PRA, and anything else normally seen in the KB browser. Predictions will be variously made in terms of concepts or tokens. These will be the set of scores that NELL uses to come up with the single score returned by the "KB" agent. Note that technical limitations prevent this agent from being able to predict relations in both directions; e.g. CKB can predict the "ceo" relation but not its inverse, "ceoof".
OCMC: An online version of the CMC agent. This can only predict category instances. Notably, this agent can generate predictions for things not currently in NELL's KB. Additionally, this online version makes predictions directly on literals whereas the batch-mode CMC contributing to NELL's KB makes predictions on tokens and therefore can have more difficulty in situations where differences in capitalization are meaningful.
OPRA: An online version of the PRA agent. This can only predict relation instances. This agent generates predictions in terms of concepts. See this paper for more information on the algorithm.
ORWR: An agent similar to OPRA, but that uses a random walk with reset process to predict relations by way of nearness and connectedness. This agent generates predictions in terms of concepts. See this paper for more information on the algorithm.
UNMAP: This is not a prediction agent itself but rather a flag that invokes a post-processing step on the query results before they are returned. When UNMAP is supplied, the query is examined to determine whether it has been posed in terms of concepts, tokens, or literals, and then all results are mapped into concepts, tokens, or literals as needed so that they match the query. This is intended to be a useful simplification for casual users not interested in the inner complexities of NELL. Its operation is still experimental and subject to change.
CMAP: This is like UNMAP, but forces the results to always be returned in terms of concepts. This effect can be useful especially for relation queries to simplify the results into something uniform while retaining the word sense disambiguation that concepts offer. This, too, is experimental and subject to change.
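The value of the agent variable can be assembled from the names above. The following is a sketch only: the hard-coded set reflects the list as of this writing and may go out of date, and the helper name agent_param is hypothetical. The agents metadata query is the better source for the current list.

```python
# Names documented above; UNMAP and CMAP are post-processing flags, not predictors.
KNOWN_AGENT_VALUES = {"KI", "KB", "CKB", "OCMC", "OPRA", "ORWR", "UNMAP", "CMAP"}


def agent_param(names):
    """Join agent names into the comma-delimited form, without spaces."""
    unknown = [name for name in names if name not in KNOWN_AGENT_VALUES]
    if unknown:
        raise ValueError(f"unknown agent value(s): {unknown}")
    return ",".join(names)
```

For instance, agent_param(["KI", "KB", "UNMAP"]) gives a value that requests the KI and KB agents together with the UNMAP post-processing step.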