-
Joint Extraction of Events and Entities within a Document Context
B. Yang and T. Mitchell.
In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2016
[PDF]
[abstract]
[bib]
Events and entities are closely related; entities are often actors or participants in events and events
without entities are uncommon. The interpretation of events and entities is highly contextually dependent.
Existing work in information extraction typically models events separately from entities, and performs
inference at the sentence level, ignoring the rest of the document. In this paper, we propose a novel
approach that models the dependencies among variables of events, entities, and their relations, and
performs joint inference of these variables across a document. The goal is to enable access to document-level
contextual information and facilitate context-aware predictions. We demonstrate that our approach substantially
outperforms the state-of-the-art methods for event extraction as well as a strong baseline for entity extraction.
@inproceedings{bishan2016event,
author = {Yang, Bishan and Mitchell, Tom},
title = {Joint Extraction of Events and Entities within a Document Context},
booktitle = {Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)},
year = {2016}}
-
Mapping Verbs in Different Languages to Knowledge Base Relations using Web Text as Interlingua
D. T. Wijaya and T. Mitchell.
In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2016
[PDF]
[abstract]
[bib]
In recent years many knowledge bases (KBs) have been constructed, yet there is no verb resource
that maps to these growing KB resources. A resource that maps verbs in different languages to KB relations
would be useful for extracting facts from text into the KBs, and to aid alignment and integration of
knowledge across different KBs and languages. Such a multi-lingual verb resource would also be useful
for tasks such as machine translation and machine reading. In this paper, we present a scalable approach
to automatically construct such a verb resource using a very large web text corpus as a kind of interlingua
to relate verb phrases to KB relations. Given a text corpus in any language and any KB, it can produce
a mapping of that language’s verb phrases to the KB relations. Experiments with the English NELL KB
and ClueWeb corpus show that the learned English verb-to-relation mapping is effective for extracting
relation instances from English text. When applied to a Portuguese NELL KB and a Portuguese text corpus,
the same method automatically constructs a verb resource in Portuguese that is effective for extracting
relation instances from Portuguese text.
@inproceedings{wijaya2016mapping,
author = {Wijaya, Derry Tanti and Mitchell, Tom },
title = {Mapping Verbs In Different Languages to Knowledge Base Relations using Web Text as Interlingua},
booktitle = {Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)},
year = {2016}}
-
Translation Invariant Word Embeddings
M. Gardner, K. Huang, E. Papalexakis, X. Fu, P. Talukdar, C. Faloutsos, N. Sidiropoulos, T. Mitchell.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015
[PDF]
[abstract]
[bib]
This work focuses on the task of finding latent vector representations of the words in a corpus.
In particular, we address the issue of what to do when there are multiple languages in the corpus.
Prior work has, among other techniques, used canonical correlation analysis to project pre-trained vectors in two languages into a common space.
We propose a simple and scalable method that is inspired by the notion that the learned vector representations should be invariant to translation between languages.
We show empirically that our method outperforms prior work on multilingual tasks, matches the performance of prior work on monolingual tasks, and scales linearly with the size of the input data (and thus the number of languages being embedded).
@inproceedings{gardner2015translation,
author = {Gardner, Matt and Huang, Kejun and Papalexakis, Evangelos and Fu, Xiao and Talukdar, Partha and Faloutsos, Christos and Sidiropoulos, Nicholas and Mitchell, Tom},
title = {Translation Invariant Word Embeddings},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2015}}
-
AskWorld: Budget-Sensitive Query Evaluation for Knowledge-on-Demand
M. Samadi, P. Talukdar, M. Veloso, T. Mitchell.
In International Joint Conference on Artificial Intelligence (IJCAI), 2015.
[PDF]
[abstract]
[bib]
Recently, several Web-scale knowledge harvesting systems have been built, each of which is competent at extracting information from certain types of data (e.g., unstructured text, structured tables on the web, etc.).
In order to determine the response to a new query posed to such systems (e.g., is sugar a healthy food?), it is useful to integrate opinions from multiple systems.
If a response is desired within a specific time budget (e.g., in less than 2 seconds), then maybe only a subset of these resources can be queried.
In this paper, we address the problem of knowledge integration for on-demand time-budgeted query answering.
We propose a new method, AskWorld, which learns a policy that chooses which queries to send to which resources, by accommodating varying budget constraints that are available only at query (test) time.
Through extensive experiments on real world datasets, we demonstrate AskWorld's capability in selecting the most informative resources to query within test-time constraints, resulting in improved performance compared to competitive baselines.
@inproceedings{samadi2015askworld,
title={AskWorld: Budget-Sensitive Query Evaluation for Knowledge-on-Demand},
author={Samadi, Mehdi and Talukdar, Partha and Veloso, Manuela and Mitchell, Tom},
year={2015},
booktitle={International Joint Conference on Artificial Intelligence (IJCAI)}}
-
Automatic Gloss Finding for a Knowledge Base using Ontological Constraints
B. Dalvi, E. Minkov, P. P. Talukdar, W. W. Cohen.
In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), 2015.
[PDF]
[abstract]
[bib]
While there has been much research on automatically constructing structured Knowledge Bases (KBs), most of it has focused on generating facts to populate a KB.
However, a useful KB must go beyond facts.
For example, glosses (short natural language definitions) have been found to be very useful in tasks such as Word Sense Disambiguation.
However, the important problem of Automatic Gloss Finding, i.e., assigning glosses to entities in an initially gloss-free KB, is relatively unexplored.
We address that gap in this paper.
In particular, we propose GLOFIN, a hierarchical semi-supervised learning algorithm for this problem which makes effective use of limited amounts of supervision and available ontological constraints.
To the best of our knowledge, GLOFIN is the first system for this task.
Through extensive experiments on real-world datasets, we demonstrate GLOFIN's effectiveness.
It is encouraging to see that GLOFIN outperforms other state-of-the-art SSL algorithms, especially in low supervision settings.
We also demonstrate GLOFIN's robustness to noise, on KBs ranging from user-contributed (e.g., Freebase) to automatically constructed (e.g., NELL).
To facilitate further research in this area, we have already made datasets and code used in this paper publicly available.
@inproceedings{dalvi2015automatic,
title={Automatic gloss finding for a knowledge base using ontological constraints},
author={Dalvi, Bhavana and Minkov, Einat and Talukdar, Partha P and Cohen, William W},
booktitle={Proceedings of the Eighth ACM International Conference on Web Search and Data Mining},
year={2015}}
-
A Compositional and Interpretable Semantic Space
A. Fyshe, L. Wehbe, P. Talukdar, B. Murphy, T. Mitchell
In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2015).
[PDF]
[abstract]
[bib]
Vector Space Models (VSMs) of Semantics are useful tools for exploring the semantics of single words, and the composition of words to make phrasal meaning.
While many methods can estimate the meaning (i.e. vector) of a phrase, few do so in an interpretable way.
We introduce a new method (CNNSE) that allows word and phrase vectors to adapt to the notion of composition.
Our method learns a VSM that is both tailored to support a chosen semantic composition operation, and whose resulting features have an intuitive interpretation.
Interpretability allows for the exploration of phrasal semantics, which we leverage to analyze performance on a behavioral task.
@inproceedings{fyshe2015compositional,
title={A Compositional and Interpretable Semantic Space},
author={Fyshe, Alona and Wehbe, Leila and Talukdar, Partha P and Murphy, Brian and Mitchell, Tom M},
booktitle = {Proceedings of NAACL-HLT},
year={2015}}
-
"A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce": Learning State Changing Verbs from Wikipedia Revision History
D. T. Wijaya, N. Nakashole, T. Mitchell.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.
[abstract]
[bib]
Learning to determine when the time-varying facts of a Knowledge Base (KB) have to
be updated is a challenging task. We propose to learn state changing verbs from
Wikipedia edit history. When a state-changing event, such as a marriage or death,
happens to an entity, the infobox on the entity’s Wikipedia page usually gets updated.
At the same time, the article text may be updated with verbs either being added or
deleted to reflect the changes made to the infobox. We use Wikipedia edit history
to distantly supervise a method for automatically learning verbs and state changes.
Additionally, our method uses constraints to effectively map verbs to infobox changes.
We observe in our experiments that when state-changing verbs are added or deleted from
an entity’s Wikipedia page text, we can predict the entity’s infobox updates with
88% precision and 76% recall. One compelling application of our verbs is to
incorporate them as triggers in methods for updating existing KBs, which are
currently mostly static.
@inproceedings{wijaya2015statechangingverbs,
author = {Wijaya, Derry Tanti and Nakashole, Ndapa and Mitchell, Tom M},
title = {{"A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce": Learning State Changing Verbs from Wikipedia Revision History}},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2015}}
-
Efficient and Expressive Knowledge Base Completion Using Subgraph Feature Extraction.
M. Gardner, T. Mitchell.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.
[abstract]
[bib]
We explore some of the practicalities
of using random walk inference methods,
such as the Path Ranking Algorithm
(PRA), for the task of knowledge base
completion. We show that the random
walk probabilities computed (at great expense)
by PRA provide no discernible
benefit to performance on this task, and so
they can safely be dropped. This result allows
us to define a simpler algorithm for
generating feature matrices from graphs,
which we call subgraph feature extraction
(SFE). In addition to being conceptually
simpler than PRA, SFE is much more efficient,
reducing computation by an order
of magnitude, and more expressive, allowing
for much richer features than just paths
between two nodes in a graph. We show
experimentally that this technique gives
substantially better performance than PRA
and its variants, improving mean average
precision from .432 to .528 on a knowledge
base completion task using the NELL
knowledge base.
@inproceedings{gardner2015sfe,
author = {Gardner, Matt and Mitchell, Tom M},
title = {{Efficient and Expressive Knowledge Base Completion Using Subgraph Feature Extraction}},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2015}}
-
A Knowledge-Intensive Model for Prepositional Phrase Attachment.
N. Nakashole, T. Mitchell.
In Proceedings of
the 53rd Annual Meeting of the Association for Computational Linguistics (ACL), 2015.
[PDF]
[abstract]
[bib]
Prepositional phrases (PPs) express crucial
information that knowledge base construction
methods need to extract. However,
PPs are a major source of syntactic
ambiguity and still pose problems in parsing.
We present a method for resolving
ambiguities arising from PPs, making extensive
use of semantic knowledge from
various resources. As training data, we use
both labeled and unlabeled data, utilizing
an expectation maximization algorithm for
parameter estimation. Experiments show
that our method yields improvements over
existing methods including a state of the
art dependency parser.
@inproceedings{nakashole2015ppa,
author = {Nakashole, Ndapandula and Mitchell, Tom M},
title = {{A Knowledge-Intensive Model for Prepositional Phrase Attachment}},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational
Linguistics (ACL)},
year = {2015},
pages = {365--375}}
-
Learning a Compositional Semantics for Freebase with an Open Predicate Vocabulary.
Jayant Krishnamurthy and Tom M. Mitchell. In Transactions of the Association for Computational Linguistics, Volume 3, 2015.
[PDF]
-
Weakly Supervised Extraction of Computer Security Events from Twitter.
Alan Ritter, Evan Wright, William Casey and Tom M. Mitchell.
In Proceedings of the 24th International Conference on World Wide Web,
(WWW), 2015
[PDF]
[abstract]
[bib]
Twitter contains a wealth of timely information.
@inproceedings{ritter2015secevent,
author = {Alan Ritter and
Evan Wright and
William Casey and
Tom M. Mitchell},
title = {{Weakly Supervised Extraction of Computer Security Events from Twitter}},
booktitle = {Proceedings of the 24th International Conference on World Wide Web,
{WWW}},
year = {2015},
pages = {896--905}}
-
Never-Ending Learning.
T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner,
B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios,
A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves,
J. Welling.
In Proceedings of the Conference on Artificial Intelligence (AAAI), 2015.
[PDF]
[abstract]
[bib]
Whereas people learn many different types of knowledge from diverse experiences over many
years, most current machine learning systems acquire just a single function or data model
from just a single data set. We propose a never-ending learning paradigm for machine
learning, to better reflect the more ambitious and encompassing type of learning performed
by humans. As a case study, we describe the Never-Ending Language Learner (NELL), which
achieves some of the desired properties of a never-ending learner, and we discuss lessons
learned. NELL has been learning to read the web 24 hours/day since January 2010, and so far
has acquired a knowledge base with over 80 million confidence-weighted beliefs (e.g.,
servedWith(tea, biscuits)). NELL has also learned millions of features and parameters that
enable it to read these beliefs from the web. Additionally, it has learned to reason over
these beliefs to infer new beliefs, and is able to extend its ontology by synthesizing new
relational predicates. NELL can be tracked online at http://rtw.ml.cmu.edu, and followed on
Twitter at @CMUNELL.
@inproceedings{NELL-aaai15,
Title = {Never-Ending Learning},
Author = {T. Mitchell and W. Cohen and E. Hruschka and P. Talukdar and J. Betteridge and
A. Carlson and B. Dalvi and M. Gardner and B. Kisiel and J. Krishnamurthy and N. Lao and
K. Mazaitis and T. Mohamed and N. Nakashole and E. Platanios and A. Ritter and M. Samadi and
B. Settles and R. Wang and D. Wijaya and A. Gupta and X. Chen and A. Saparov and M. Greaves and J. Welling},
Booktitle = {Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15)},
Year = {2015}}
-
Joint Syntactic and Semantic Parsing with Combinatory Categorial Grammar.
J. Krishnamurthy, T. Mitchell.
In Proceedings of
the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), 2014.
[PDF]
[abstract]
[bib]
We present an approach to training a joint syntactic and semantic parser
that combines syntactic training information from CCGbank with semantic
training information from a knowledge base via distant supervision.
The trained parser produces a full syntactic parse of any sentence,
while simultaneously producing logical forms for portions of the sentence
that have a semantic representation within the parser's predicate vocabulary.
We demonstrate our approach by training a parser whose semantic
representation contains 130 predicates from the NELL ontology.
A semantic evaluation demonstrates that this parser produces logical forms
better than both comparable prior work and a pipelined syntax-then-semantics approach.
A syntactic evaluation on CCGbank demonstrates that the parser's dependency Fscore is
within 2.5% of state-of-the-art.
@inproceedings{krishnamurthy2014jointccg,
author = {Krishnamurthy, Jayant and Mitchell, Tom M},
title = {{Joint Syntactic and Semantic Parsing with Combinatory Categorial Grammar}},
booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational
Linguistics (ACL)},
year = {2014},
pages = {1188--1198}}
-
Assuming Facts Are Expressed More Than Once.
J. Betteridge, A. Ritter and T. Mitchell
In Proceedings of the 27th International Florida Artificial Intelligence Research Society Conference (FLAIRS-27), 2014.
[PDF]
[abstract]
[bib]
Distant supervision (DS) is a method for training
sentence-level information extraction models using
only an unlabeled corpus and a knowledge base (KB).
Fundamental to many DS approaches is the assumption
that KB facts are expressed at least once (EALO)
in the text corpus. Often, however, KB facts are actually
expressed in the corpus many times, in which cases
EALO-based systems underuse the available training
data. To address this problem, we introduce "expressed
at least α percent" (EALA) assumption, which
asserts that expressions of KB facts account for up to
α% of the corresponding mentions. We show that for
the same level of precision as the EALO approach, the
EALA approach achieves up to 66% higher recall on
category recognition and 53% higher recall on relation
recognition.
@inproceedings{betteridge2014assuming,
Author = {Betteridge, Justin and Ritter, Alan and Mitchell, Tom},
Booktitle = {The Twenty-Seventh International FLAIRS Conference},
Title = {Assuming Facts Are Expressed More Than Once},
Year = {2014}}
-
Estimating Accuracy from Unlabeled Data.
E. A. Platanios, A. Blum, T. Mitchell.
In Uncertainty in Artificial Intelligence (UAI), 2014.
[PDF]
[abstract]
[bib]
We consider the question of how unlabeled data can be used to estimate
the true accuracy of learned classifiers. This is an important question
for any autonomous learning system that must estimate its accuracy
without supervision, and also when classifiers trained from one data
distribution must be applied to a new distribution (e.g., document
classifiers trained on one text corpus are to be applied to a second
corpus). We first show how to estimate error rates exactly from
unlabeled data when given a collection of competing classifiers that
make independent errors, based on the agreement rates between subsets
of these classifiers. We further show that even when the competing
classifiers do not make independent errors, both their accuracies and
error dependencies can be estimated by making certain relaxed
assumptions. Experiments on two real-world data sets produce
estimates within a few percent of the true accuracy, using solely
unlabeled data. These results are of practical significance in
situations where labeled data is scarce and shed light on the more
general question of how the consistency among multiple functions is
related to their true accuracies.
@inproceedings{Platanios:2014ti,
author = {Platanios, Emmanouil Antonios and Blum, Avrim and Mitchell, Tom M},
title = {{Estimating Accuracy from Unlabeled Data}},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
year = {2014},
pages = {1--10}}
-
CTPs: Contextual Temporal Profiles for Time Scoping Facts via Entity State Change Detection.
D.T. Wijaya, N. Nakashole and T.M. Mitchell.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
[PDF]
[abstract]
[bib]
Temporal scope adds a time dimension to facts in Knowledge Bases (KBs).
Existing methods for temporal scope inference and extraction still
suffer from low accuracy. In this paper, we present a novel method that
leverages temporal profiles augmented with context -- Contextual Temporal
Profiles (CTPs) of entities. Through change patterns in an entity's CTP,
we model the entity's state change brought about by real world events
that happen to the entity (e.g., hired, fired, divorced, etc.). This leads
to a new formulation of the temporal scoping problem as a state change detection
problem. Our experiments show that this formulation and the resulting
solution are highly effective for inferring the temporal scope of facts.
@InProceedings{wijaya-nakashole-mitchell:2014:EMNLP,
author = {Wijaya, Derry and Nakashole, Ndapa and Mitchell, Tom},
title = {CTPs: Contextual Temporal Profiles for Time Scoping Facts via Entity State Change Detection},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing},
month = {October},
year = {2014},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics}}
-
Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases.
M. Gardner, P. Talukdar, J. Krishnamurthy and T.M. Mitchell.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
[PDF]
[abstract]
[bib]
Much work in recent years has gone into the construction of large knowledge
bases (KBs), such as Freebase, DBPedia, NELL, and YAGO. While these KBs are
very large, they are still very incomplete, necessitating the use of inference
to fill in gaps. Prior work has shown how to make use of a large text corpus
to augment random walk inference over KBs. We present two improvements to the
use of such large corpora to augment KB inference. First, we present a new
technique for combining KB relations and surface text into a single graph
representation that is much more compact than graphs used in prior work.
Second, we describe how to incorporate vector space similarity into random walk
inference over KBs, reducing the feature sparsity inherent in using surface
text. This allows us to combine distributional similarity with symbolic
logical inference in novel and effective ways. With experiments on many
relations from two separate KBs, we show that our methods significantly
outperform prior work on KB inference, both in the size of problem our methods
can handle and in the quality of predictions made.
@InProceedings{gardner2014incorporating,
author = {Gardner, Matt and Talukdar, Partha and Krishnamurthy, Jayant and Mitchell, Tom},
title = {Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {October},
year = {2014},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics}}
-
Language-Aware Truth Assessment of Fact Candidates.
N. Nakashole, T. Mitchell.
In Proceedings of
the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), 2014.
[PDF]
[abstract]
[bib]
This paper introduces FactChecker, a language-aware approach to truth-finding.
FactChecker differs from prior approaches
in that it does not rely on iterative peer voting;
instead, it leverages language to infer the believability of fact candidates.
In particular, FactChecker makes use of linguistic features to detect
if a given source objectively states facts or is speculative and opinionated.
To ensure that fact candidates mentioned in similar sources have similar believability,
FactChecker augments objectivity with a co-mention score to compute the overall believability score of a
fact candidate. Our experiments on various datasets show that FactChecker yields higher accuracy than existing approaches.
@inproceedings{nakashole2014truth,
author = {Nakashole, Ndapandula and Mitchell, Tom M},
title = {{Language-Aware Truth Assessment of Fact Candidates}},
booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational
Linguistics (ACL)},
year = {2014},
pages = {1009--1019}}
-
Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch
P. P. Talukdar, and W. Cohen
In 17th International Conference on Artificial Intelligence and Statistics (AISTATS), 2014.
[PDF]
[abstract]
[bib]
Graph-based Semi-supervised learning (SSL) algorithms have been successfully used in a large number of applications. These methods classify initially unlabeled nodes by propagating label information over the structure of the graph starting from seed nodes. Graph-based SSL algorithms usually scale linearly with the number of distinct labels (m), and require O(m) space on each node. Unfortunately, there exist many applications of practical significance with very large m over large graphs, demanding better space and time complexity. In this paper, we propose MAD-Sketch, a novel graph-based SSL algorithm which compactly stores label distribution on each node using Count-min Sketch, a randomized data structure. We present theoretical analysis showing that under mild conditions, MAD-Sketch can reduce space complexity at each node from O(m) to O(log m), and achieve similar savings in time complexity as well. We support our analysis through experiments on multiple real world datasets. We observe that MAD-Sketch achieves similar performance as existing state-of-the-art graph-based SSL algorithms, while requiring a smaller memory footprint and at the same time achieving up to 10x speedup. We find that MAD-Sketch is able to scale to datasets with one million labels, which is beyond the scope of existing graph-based SSL algorithms.
@inproceedings{talukdar2014scaling,
title={Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch},
author={Talukdar, Partha P and Cohen, William W},
booktitle={17th International Conference on Artificial Intelligence and Statistics (AISTATS 2014)},
year={2014},
address = {Reykjavik, Iceland}}
-
Programming with Personalized PageRank: A Locally Groundable First-Order Probabilistic Logic.
W.Y. Wang, K. Mazaitis and W.W. Cohen.
In Proceedings of the Conference on Information and Knowledge Management (CIKM), 2013.
[PDF]
[abstract]
[bib]
Many information-management tasks (including classification, retrieval,
information extraction, and information integration) can be formalized as
inference in an appropriate probabilistic first-order logic. However, most
probabilistic first-order logics are not efficient enough for
realistically-sized instances of these tasks. One key problem is that
queries are typically answered by "grounding" the query---i.e., mapping it
to a propositional representation, and then performing propositional
inference---and with a large database of facts, groundings can be very
large, making inference and learning computationally expensive. Here we
present a first-order probabilistic language which is well-suited to
approximate "local" grounding: in particular, every query Q can be
approximately grounded with a small graph. The language is an extension of
stochastic logic programs where inference is performed by a variant of
personalized PageRank. Experimentally, we show that the approach performs
well on an entity resolution task, a classification task, and a joint
inference task; that the cost of inference is independent of database size;
and that speedup in learning is possible by multi-threading.
@inproceedings{wangprogramming2013,
title={Programming with Personalized PageRank: A Locally Groundable First-Order Probabilistic Logic},
author={Wang, William Yang and Mazaitis, Kathryn and Cohen, William W},
booktitle={Proceedings of the 22nd ACM International Conference on Information and Knowledge Management (CIKM 2013)},
year={2013}}
-
Improving Learning and Inference in a Large Knowledge-base using Latent Syntactic Cues.
Matt Gardner, Partha Pratim Talukdar, Bryan Kisiel, and Tom Mitchell.
In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 2013.
[PDF]
[abstract]
[bib]
Automatically constructed Knowledge Bases (KBs) are often incomplete and there
is a genuine need to improve their coverage. Path Ranking Algorithm (PRA) is a
recently proposed method which aims to improve KB coverage by performing
inference directly over the KB graph. For the first time, we demonstrate that
addition of edges labeled with latent features mined from a large dependency
parsed corpus of 500 million Web documents can significantly outperform
previous PRA-based approaches on the KB inference task. We present extensive
experimental results validating this finding. The resources presented in this
paper are publicly available.
@inproceedings{gardnerpra2013,
title={Improving Learning and Inference in a Large Knowledge-base using Latent Syntactic Cues},
author={Gardner, Matt and Talukdar, Partha Pratim and Kisiel, Bryan and Mitchell, Tom},
booktitle={Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013)},
year={2013}}
-
PIDGIN: Ontology Alignment using Web Text as Interlingua.
D.T. Wijaya, P.P. Talukdar and T.M. Mitchell.
In Proceedings of the Conference on Information and Knowledge Management (CIKM), 2013.
[PDF]
[abstract]
[bib]
The problem of aligning ontologies and database schemas across different knowledge
bases and databases is fundamental to knowledge management problems, including
the problem of integrating the disparate knowledge sources that form the semantic
web's Linked Data. We present a novel approach to this ontology alignment problem
that employs a very large natural language text corpus as an interlingua to relate
different knowledge bases (KBs). The result is a scalable and robust method (PIDGIN)
that aligns relations and categories across different KBs by analyzing both
(1) shared relation instances across these KBs, and (2) the verb phrases in
the text instantiations of these relation instances. Experiments with PIDGIN demonstrate
its superior performance when aligning ontologies across large existing KBs including NELL,
Yago and Freebase. Furthermore, we show that in addition to aligning ontologies,
PIDGIN can automatically learn from text the verb phrases that identify relations,
and can also type the arguments of relations across different KBs.
@InProceedings{wijaya:2013:PIDGIN,
author = {Wijaya, Derry and Talukdar, Partha Pratim and Mitchell, Tom},
title = {PIDGIN: Ontology Alignment using Web Text as Interlingua},
booktitle = {Proceedings of the Conference on Information and Knowledge Management (CIKM 2013)},
month = {October},
year = {2013},
address = {San Francisco, USA},
publisher = {Association for Computing Machinery}}
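One of PIDGIN's two signals, shared relation instances across KBs, can be illustrated with a minimal sketch; the relation names and instances below are toy examples, and the full system additionally exploits verb phrases from a large text corpus and graph-based inference:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def align_relations(kb1, kb2):
    """Rank kb2 relations against each kb1 relation by the overlap of
    their (subject, object) instance pairs."""
    return {r1: sorted(((jaccard(i1, i2), r2) for r2, i2 in kb2.items()),
                       reverse=True)
            for r1, i1 in kb1.items()}

nell = {"cityInCountry": {("paris", "france"), ("tokyo", "japan")}}
yago = {"isLocatedIn": {("paris", "france"), ("tokyo", "japan"),
                        ("lyon", "france")},
        "hasCapital": {("france", "paris")}}

best_score, best_rel = align_relations(nell, yago)["cityInCountry"][0]
print(best_rel)  # isLocatedIn
```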
-
Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World.
Jayant Krishnamurthy and Thomas Kollar. in Transactions of the Association for Computational Linguistics, Volume 1, 2013.
[PDF]
[Data and Online Appendix]
-
Vector Space Semantic Parsing: A Framework for Compositional Vector Space Models.
Jayant Krishnamurthy and Tom M. Mitchell. in
Proceedings of the ACL 2013 Workshop on Continuous Vector Space Models and their Compositionality, 2013.
[PDF]
-
Classifying Entities into an Incomplete Ontology.
Bhavana Dalvi,
William W. Cohen, and Jamie Callan, in AKBC, 2013,
the 3rd Knowledge Extraction Workshop at CIKM 2013.
[Draft]
-
Exploratory Learning.
Bhavana Dalvi,
William W. Cohen, and Jamie Callan, in Proceedings of the European Conference on Machine Learning (ECML/PKDD), 2013.
[PDF]
[bib]
@inproceedings{dalvi_ecml13,
author = {Dalvi, Bhavana and Cohen, William W. and Callan, Jamie},
title = {Exploratory Learning},
booktitle = {Proceedings of the 2013 European conference on Machine Learning and Knowledge Discovery in Databases},
series = {ECML PKDD'13},
year = {2013},
location = {Prague, Czech Republic},
publisher = {Springer-Verlag},
}
-
From Topic Models to Semi-Supervised Learning: Biasing Mixed-membership Models to Exploit Topic-Indicative Features in Entity Clustering.
Ramnath Balasubramanyan, Bhavana Dalvi and William W. Cohen, in Proceedings of the European Conference on Machine Learning (ECML/PKDD), 2013.
[PDF]
[bib]
@inproceedings{rbalasub_dalvi_ecml13,
author = {Balasubramanyan, Ramnath and Dalvi, Bhavana and Cohen, William W.},
title = {From Topic Models to Semi-Supervised Learning: Biasing Mixed-membership Models to Exploit Topic-Indicative Features in Entity Clustering},
booktitle = {Proceedings of the 2013 European conference on Machine Learning and Knowledge Discovery in Databases},
series = {ECML PKDD'13},
year = {2013},
location = {Prague, Czech Republic},
publisher = {Springer-Verlag},
}
-
Very Fast Similarity Queries on Semi-Structured Data from the Web.
Bhavana Dalvi,
William W. Cohen, and Jamie Callan, in SDM, 2013.
[PDF]
-
Conversing Learning: active learning and active social interaction for human supervision in never-ending learning systems.
S. D. S. Pedro and E. R. Hruschka Jr.
In Proceedings of the 13th Ibero-American Conference on AI (IBERAMIA), 2012.
[PDF]
[abstract]
[bib]
The Machine Learning community has been introduced to NELL (Never-Ending Language Learning), a system able to learn from the web and to use its own knowledge to keep learning better each day. The idea of continuously learning from the web raises concerns about reliability and accuracy, especially when the learning process uses its own knowledge to improve its learning capabilities. Because its knowledge base keeps growing forever, such a system requires self-supervision as well as self-reflection. The increased use of the Internet that allowed NELL's creation also brought a new source of online information: social media becomes more popular every day, and the AI community can now develop research to take advantage of this information, aiming to turn it into knowledge. This work follows this lead and proposes a new machine learning approach, called Conversing Learning, which uses collective knowledge from web community users to provide self-supervision and self-reflection to intelligent machines, so that they can improve their own learning capabilities. The Conversing Learning approach draws on concepts from Active Learning as well as Question Answering, with the goal of showing what can be done toward autonomous human-computer interaction that automatically improves machine learning performance.
@incollection{pedro2012conversing,
title={Conversing learning: Active learning and active social interaction for human supervision in never-ending learning systems},
author={Pedro, Saulo DS and {Hruschka Jr}, Estevam R},
booktitle={Advances in Artificial Intelligence--IBERAMIA 2012},
pages={231--240},
year={2012},
publisher={Springer}
}
-
Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction.
S. Verma and E. R. Hruschka Jr.
In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2012.
[PDF]
[abstract]
[bib]
[slides]
Our inspiration comes from NELL (Never-Ending Language Learning), a computer program running at Carnegie Mellon University to extract structured information from unstructured web pages. We consider a semi-supervised learning approach to extracting category instances (e.g. country(USA), city(New York)) from web pages, starting with a handful of labeled training examples of each category or relation, plus hundreds of millions of unlabeled web documents. Semi-supervised approaches using a small number of labeled examples together with many unlabeled examples are often unreliable, as they frequently produce an internally consistent, but nevertheless incorrect, set of extractions. We believe that this problem can be overcome by simultaneously learning independent classifiers in a new approach, named the Coupled Bayesian Sets algorithm and based on Bayesian Sets, for many different categories and relations, in the presence of an ontology defining constraints that couple the training of these classifiers. Experimental results show that simultaneously learning a coupled collection of classifiers for 11 randomly chosen categories resulted in much more accurate extractions than training classifiers through the original Bayesian Sets algorithm, Naive Bayes, BaS-all, and the Coupled Pattern Learner (the category extractor used in NELL).
@InProceedings{verma:2012:cbs,
author = {Verma, Saurabh and {Hruschka Jr.}, Estevam Rafael},
title = {Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction},
booktitle = {Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2012)},
month = {September},
year = {2012},
address = {Bristol, UK},
publisher = {Association for Computing Machinery}}
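A minimal sketch of the underlying Bayesian Sets score (Heller & Ghahramani, 2005), plus a toy mutual-exclusion coupling check; the binary feature vectors, category names, and the specific coupling rule below are illustrative assumptions, not the paper's implementation:

```python
import math

def bayesian_sets_score(x, seeds, alpha=0.5, beta=0.5):
    """Bayesian Sets score log p(x | seeds) / p(x) for a binary feature
    vector x under independent Beta-Bernoulli feature models."""
    n = len(seeds)
    score = 0.0
    for j in range(len(x)):
        nj = sum(s[j] for s in seeds)            # seeds with feature j active
        a_post, b_post = alpha + nj, beta + n - nj
        score += math.log((alpha + beta) / (alpha + beta + n))
        score += x[j] * math.log(a_post / alpha)
        score += (1 - x[j]) * math.log(b_post / beta)
    return score

def coupled_extract(candidate, seeds_by_category, exclusive_pairs):
    """Toy coupling: accept the best-scoring category only if no mutually
    exclusive category scores at least as well."""
    scores = {c: bayesian_sets_score(candidate, s)
              for c, s in seeds_by_category.items()}
    best = max(scores, key=scores.get)
    for a, b in exclusive_pairs:
        other = b if best == a else a if best == b else None
        if other is not None and scores[best] <= scores[other]:
            return None
    return best

seeds = {"city": [(1, 1, 0), (1, 0, 0)],       # toy binary context features
         "country": [(0, 1, 1), (0, 0, 1)]}
print(coupled_extract((1, 1, 0), seeds, [("city", "country")]))  # city
```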
-
Acquiring Temporal Constraints between Relations.
P.P. Talukdar, D.T. Wijaya and T.M. Mitchell.
In Proceedings of the Conference on Information and Knowledge Management (CIKM), 2012.
[PDF]
[abstract]
[bib]
We consider the problem of automatically acquiring knowledge
about the typical temporal orderings among relations
(e.g., actedIn(person, film) typically occurs before wonPrize
(film, award)), given only a database of known facts (relation
instances) without time information, and a large document
collection. Our approach is based on the conjecture
that the narrative order of verb mentions within documents
correlates with the temporal order of the relations they represent.
We propose a family of algorithms based on this conjecture,
utilizing a corpus of 890 million dependency-parsed sentences
to obtain verbs that represent relations of interest,
and utilizing Wikipedia documents to gather statistics on
narrative order of verb mentions. Our proposed algorithm,
GraphOrder, is a novel and scalable graph-based label propagation
algorithm that takes transitivity of temporal order
into account, as well as these statistics on narrative order of
verb mentions. This algorithm achieves as high as 38.4% absolute
improvement in F1 over a random baseline. Finally,
we demonstrate the utility of this learned general knowledge
about typical temporal orderings among relations, by showing
that these temporal constraints can be successfully used
by a joint inference framework to assign specific temporal
scopes to individual facts.
@InProceedings{talukdar:2012:temporal,
author = {Talukdar, Partha Pratim and Wijaya, Derry and Mitchell, Tom},
title = {Acquiring Temporal Constraints between Relations},
booktitle = {Proceedings of the Conference on Information and Knowledge Management (CIKM 2012)},
month = {October},
year = {2012},
address = {Hawaii, USA},
publisher = {Association for Computing Machinery}}
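The narrative-order conjecture can be illustrated by counting, over toy documents, how often one verb of interest is first mentioned before another; the real system gathers such statistics over Wikipedia and then runs the GraphOrder label propagation, which this sketch omits:

```python
from collections import Counter
from itertools import combinations

def narrative_order_counts(documents, verbs):
    """Count, over documents, how often one verb of interest is first
    mentioned before another; the conjecture is that this narrative order
    correlates with the temporal order of the underlying relations."""
    before = Counter()
    for doc in documents:
        positions = {}
        for i, token in enumerate(doc):
            if token in verbs and token not in positions:
                positions[token] = i          # keep first mention only
        for v1, v2 in combinations(positions, 2):
            if positions[v1] < positions[v2]:
                before[(v1, v2)] += 1
            else:
                before[(v2, v1)] += 1
    return before

docs = [
    "she acted in the film and later won an award".split(),
    "he acted brilliantly and won praise".split(),
    "the film won awards after she acted in it".split(),
]
counts = narrative_order_counts(docs, {"acted", "won"})
print(counts[("acted", "won")], counts[("won", "acted")])  # 2 1
```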
-
Weakly Supervised Training of Semantic Parsers.
J. Krishnamurthy and T.M. Mitchell. In
Proceedings of the 2012 Conference on Empirical Methods in
Natural Language Processing and Computational Natural Language
Learning (EMNLP-CoNLL), 2012.
[PDF]
[abstract]
[bib]
We present a method for training a semantic
parser using only a knowledge base and an unlabeled text corpus,
without any individually annotated sentences. Our key observation
is that multiple forms of weak supervision can be combined to
train an accurate semantic parser: semantic supervision from a
knowledge base, and syntactic supervision from dependency-parsed
sentences. We apply our approach to train a semantic parser that
uses 77 relations from Freebase in its knowledge
representation. This semantic parser extracts instances of binary
relations with state-of-the-art accuracy, while simultaneously
recovering much richer semantic structures, such as conjunctions
of multiple relations with partially shared arguments. We
demonstrate recovery of this richer structure by extracting
logical forms from natural language queries against Freebase. On
this task, the trained semantic parser achieves 80% precision and
56% recall, despite never having seen an annotated logical form.
@InProceedings{krishnamurthy:2012,
author = {Krishnamurthy, Jayant and Mitchell, Tom M.},
title = {Weakly Supervised Training of Semantic Parsers},
booktitle = {Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)},
month = {July},
year = {2012},
publisher = {Association for Computational Linguistics}}
-
Collectively Representing Semi-Structured Data from the Web.
Bhavana Dalvi,
William W. Cohen, and Jamie Callan, in AKBC-2012, 2012.
[PDF]
-
Bootstrapping Biomedical Ontologies for Scientific Text using NELL.
Dana Movshovitz-Attias and William W. Cohen, in BioNLP-2012, 2012.
[PDF]
-
WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction.
Bhavana Dalvi,
William W. Cohen, and Jamie Callan, in WSDM-2012, 2012.
[PDF]
[bib]
@InProceedings{dalvi:wsdm:2012,
author = {Dalvi, Bhavana and Cohen, William W. and Callan, Jamie},
title = {WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction},
booktitle = {Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (WSDM)},
month = {February},
year = {2012},
address = {Seattle, Washington, USA},
publisher = {Association for Computing Machinery}}
-
Coupled Temporal Scoping of Relational Facts.
P.P. Talukdar, D.T. Wijaya and T.M. Mitchell.
In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), 2012.
[PDF]
[abstract]
[bib]
Recent research has made significant advances in automatically
constructing knowledge bases by extracting relational
facts (e.g., Bill Clinton-presidentOf-US) from large text corpora.
Temporally scoping such relational facts in the knowledge
base (i.e., determining that Bill Clinton-presidentOf-US
is true only during the period 1993 - 2001) is an important,
but relatively unexplored problem. In this paper,
we propose a joint inference framework for this task,
which leverages fact-specific temporal constraints, and weak
supervision in the form of a few labeled examples. Our
proposed framework, CoTS (Coupled Temporal Scoping),
exploits temporal containment, alignment, succession, and
mutual exclusion constraints among facts from within and
across relations. Our contribution is multi-fold. Firstly,
while most previous research has focused on micro-reading
approaches for temporal scoping, we pose it in a macro-reading
fashion, as a change detection in a time series of
facts' features computed from a large number of documents.
Secondly, to the best of our knowledge, there is no other
work that has used joint inference for temporal scoping. We
show that joint inference is effective compared to doing temporal
scoping of individual facts independently. We conduct
our experiments on large scale open-domain publicly
available time-stamped datasets, such as English Gigaword
Corpus and Google Books Ngrams, demonstrating CoTS's
effectiveness.
@InProceedings{talukdar:2012:coupled,
author = {Talukdar, Partha Pratim and Wijaya, Derry and Mitchell, Tom},
title = {Coupled Temporal Scoping of Relational Facts},
booktitle = {Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (WSDM)},
month = {February},
year = {2012},
address = {Seattle, Washington, USA},
publisher = {Association for Computing Machinery}}
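The macro-reading view, temporal scoping as change detection in a time series of fact features, can be sketched with a toy single change-point detector; the actual features, datasets, constraints, and joint inference in CoTS are far richer, and the mention counts below are invented:

```python
def change_point(series, min_gap=1):
    """Find the split index that maximizes the difference between the mean
    of the series before it and after it, a toy stand-in for the change
    detection CoTS performs on time series of fact features computed from
    time-stamped corpora."""
    best_t, best_gap = None, 0.0
    for t in range(min_gap, len(series) - min_gap + 1):
        left = sum(series[:t]) / t
        right = sum(series[t:]) / (len(series) - t)
        if abs(right - left) > best_gap:
            best_t, best_gap = t, abs(right - left)
    return best_t

# Toy yearly mention counts of a fact's features: low before the fact
# holds, high while it holds, low again afterwards.
mentions = [1, 2, 40, 45, 50, 48, 52, 47, 49, 51, 46, 3, 2]
print(change_point(mentions))  # 2: the index where the level shifts up
```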
-
Closing the Loop: Fast, Interactive Semi-supervised Annotation With Queries on Features and Instances.
B. Settles.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011.
[PDF]
[abstract]
[bib]
[software homepage]
This paper describes DUALIST, an active learning annotation paradigm which
solicits and learns from labels on both features (e.g., words) and instances
(e.g., documents). We present a novel semi-supervised training algorithm
developed for this setting, which is (1) fast enough to support real-time
interactive speeds, and (2) at least as accurate as pre-existing methods
for learning with mixed feature and instance labels. Human annotators in
user studies were able to produce near-state-of-the-art classifiers on
several corpora in a variety of application domains with only a few minutes
of effort.
@InProceedings{settles:2011:EMNLP,
author = {Settles, Burr},
title = {Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances},
booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
month = {July},
year = {2011},
address = {Edinburgh, Scotland, UK.},
publisher = {Association for Computational Linguistics},
pages = {1467--1478},
url = {http://www.aclweb.org/anthology/D11-1136}}
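A minimal sketch of the core idea of learning from labeled features as well as labeled instances: a labeled word contributes a large pseudo-count to its class in a multinomial Naive Bayes model. The boost value, vocabulary, and documents are toy assumptions, and DUALIST's EM step over unlabeled documents is omitted:

```python
import math
from collections import defaultdict

def train_mnb(labeled_docs, labeled_features, vocab, smooth=1.0, boost=50.0):
    """Multinomial Naive Bayes where a labeled *feature* (a word tagged
    with a class) adds a large pseudo-count to that class, alongside
    ordinary labeled documents."""
    counts = defaultdict(lambda: defaultdict(float))
    for doc, label in labeled_docs:
        for w in doc:
            counts[label][w] += 1
    for w, label in labeled_features:
        counts[label][w] += boost
    models = {}
    for label, wc in counts.items():
        total = sum(wc.values()) + smooth * len(vocab)
        models[label] = {w: math.log((wc.get(w, 0) + smooth) / total)
                         for w in vocab}
    return models

def classify(doc, models):
    return max(models, key=lambda c: sum(models[c][w] for w in doc
                                         if w in models[c]))

vocab = {"goal", "match", "vote", "election"}
model = train_mnb([(["goal", "match"], "sports")],
                  [("election", "politics"), ("vote", "politics")], vocab)
print(classify(["vote", "election"], model))  # politics
```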
-
Discovering Relations between Noun Categories.
T. Mohamed, E.R. Hruschka Jr. and T.M. Mitchell.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011.
[PDF]
[abstract]
[bib]
Traditional approaches to Relation Extraction from text
require manually defining the relations to be extracted. We propose
here an approach to automatically discovering relevant relations,
given a large text corpus plus an initial ontology defining hundreds
of noun categories (e.g., Athlete, Musician, Instrument). Our approach
discovers frequently stated relations between pairs of these
categories, using a two step process. For each pair of categories
(e.g., Musician and Instrument) it first co-clusters the text contexts
that connect known instances of the two categories, generating a
candidate relation for each resulting cluster. It then applies a
trained classifier to determine which of these candidate relations is
semantically valid. Our experiments apply this to a text corpus
containing approximately 200 million web pages and an ontology
containing 122 categories from the NELL system, producing a set of 781
proposed candidate relations, approximately half of which are
semantically valid. We conclude this is a useful approach to
semi-automatic extension of the ontology for large-scale information
extraction systems such as NELL.
@InProceedings{mohamed-hruschka-mitchell:2011:EMNLP,
author = {Mohamed, Thahir and Hruschka, Estevam and Mitchell, Tom},
title = {Discovering Relations between Noun Categories},
booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
month = {July},
year = {2011},
address = {Edinburgh, Scotland, UK.},
publisher = {Association for Computational Linguistics},
pages = {1447--1455},
url = {http://www.aclweb.org/anthology/D11-1134}}
-
Random Walk Inference and Learning in A Large Scale Knowledge Base.
N. Lao, T.M. Mitchell, W.W. Cohen
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011.
[PDF]
[abstract]
[bib]
We consider the problem of performing learning and inference in a large scale knowledge base containing imperfect knowledge with incomplete
coverage. We show that a soft inference procedure based on a combination of constrained, weighted, random walks through the knowledge base
graph can be used to reliably infer new beliefs for the knowledge base. More specifically, we show that the system can learn to infer
different target relations by tuning the weights associated with random walks that follow different paths through the graph, using a version
of the Path Ranking Algorithm (Lao & Cohen, 2010). We apply this approach to a knowledge base of approximately 500,000 beliefs extracted
imperfectly from the web by NELL, a never-ending language learner (Carlson et al., 2010). This new system improves significantly over NELL's
earlier Horn-clause learning and inference method: it obtains nearly double the precision at rank 100, and the new learning method is also
applicable to many more inference tasks.
@InProceedings{lao-mitchell-cohen:2011:EMNLP,
author = {Lao, Ni and Mitchell, Tom and Cohen, William W.},
title = {Random Walk Inference and Learning in A Large Scale Knowledge Base},
booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing},
month = {July},
year = {2011},
address = {Edinburgh, Scotland, UK.},
publisher = {Association for Computational Linguistics},
pages = {529--539},
url = {http://www.aclweb.org/anthology/D11-1049}}
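The random-walk features at the heart of this approach can be sketched as path-constrained walk probabilities over a toy KB graph; PRA then learns per-target-relation weights over many such path features, which this sketch omits, and the graph below is invented:

```python
from collections import defaultdict

def path_feature(graph, start, path):
    """Distribution over end nodes of a random walk from `start`
    constrained to follow the given relation path, the feature PRA
    computes per (node pair, path). `graph[node][relation]` lists
    neighbors reachable via that relation."""
    dist = {start: 1.0}
    for relation in path:
        nxt = defaultdict(float)
        for node, p in dist.items():
            neighbors = graph.get(node, {}).get(relation, [])
            for nb in neighbors:
                nxt[nb] += p / len(neighbors)   # uniform over out-edges
        dist = dict(nxt)
    return dist

# Toy KB: where does alice's team play? Follow (playsFor, homeCity).
kb = {"alice": {"playsFor": ["steelers"]},
      "steelers": {"homeCity": ["pittsburgh"]}}
print(path_feature(kb, "alice", ["playsFor", "homeCity"]))
```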
-
Which Noun Phrases Denote Which Concepts?
J. Krishnamurthy, T.M. Mitchell. In Proceedings of
the 49th Annual Meeting of the Association for Computational Linguistics (ACL), 2011.
[PDF]
[abstract]
[bib]
Resolving polysemy and synonymy is required for high-quality
information extraction. We present ConceptResolver, a component for
the Never-Ending Language Learner (NELL) that
handles both phenomena by identifying the latent concepts that noun
phrases refer to. ConceptResolver performs both word sense induction
and synonym resolution on relations extracted from text using an
ontology and a small amount of labeled data. Domain knowledge (the ontology)
guides concept creation by defining a set of possible
semantic types for concepts. Word sense induction is performed by
inferring a set of semantic types for each noun phrase. Synonym
detection exploits redundant information to train several
domain-specific synonym classifiers in a semi-supervised fashion.
When ConceptResolver is run on NELL's knowledge base, 87% of the word
senses it creates correspond to real-world concepts, and 85% of noun
phrases that it suggests refer to the same concept are indeed synonyms.
@inproceedings{krishnamurthy-acl,
Title = {Which Noun Phrases Denote Which Concepts?},
Author = {Jayant Krishnamurthy and Tom M. Mitchell},
Booktitle = {Proceedings of the Forty Ninth Annual Meeting of the Association for Computational Linguistics},
Year = {2011}}
-
Adaptation of Graph-Based Semi-Supervised Methods to Large-Scale Text Data.
Frank Lin and William W. Cohen, in MLG-2011 , 2011.
[PDF]
-
Understanding Semantic Change of Words Over Centuries.
D.T. Wijaya and R. Yeniterzi.
In Workshop on Detecting and Exploiting Cultural Diversity on the Social Web (DETECT), 2011 at CIKM 2011.
[PDF]
[abstract]
In this paper, we propose to model and analyze changes
that occur to an entity in terms of changes in the words
that co-occur with the entity over time. We propose to do
an in-depth analysis of how this co-occurrence changes over
time, how the change influences the state (semantic, role)
of the entity, and how the change may correspond to events
occurring in the same period of time. We propose to identify
clusters of topics surrounding the entity over time using
Topics-Over-Time (TOT) and k-means clustering. We
conduct this analysis on the Google Books Ngram dataset. We
show how clustering words that co-occur with an entity of
interest in 5-grams can shed some light on the nature of
change that occurs to the entity and identify the period in
which the change occurs. We find that the period identified
by our model precisely coincides with events in the same
period that correspond to the change that occurs.
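A much simpler stand-in for the paper's Topics-Over-Time and k-means analysis conveys the idea: compare an entity's co-occurring words across adjacent time slices and report the pair with the lowest overlap as the likely change period (all decades and context words below are invented):

```python
def change_period(contexts_by_decade):
    """Report the pair of adjacent time slices whose co-occurring word
    sets overlap least, a crude proxy for the period in which an
    entity's meaning changed."""
    decades = sorted(contexts_by_decade)
    def overlap(a, b):
        return len(a & b) / len(a | b)
    gaps = [(overlap(contexts_by_decade[d1], contexts_by_decade[d2]), d1, d2)
            for d1, d2 in zip(decades, decades[1:])]
    return min(gaps)[1:]

contexts = {1900: {"carriage", "horse", "road"},
            1910: {"carriage", "horse", "engine"},
            1920: {"engine", "motor", "road"}}
print(change_period(contexts))  # (1910, 1920)
```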
-
"Nut Case: What does It Mean?": Understanding Semantic Relationship between Nouns in Noun Compounds through Paraphrasing and Ranking the Paraphrases.
D.T. Wijaya and P. Gianfortoni.
In Workshop on Search and Mining Entity-Relationship Data (SMER), 2011 at CIKM 2011.
[PDF]
[abstract]
A noun compound (NC) is a sequence of two or more nouns
(entities) acting as a single noun entity that encodes an implicit
semantic relation between its noun constituents. Given an NC
such as 'headache pills' and possible paraphrases such as 'pills
that induce headache' or 'pills that relieve headache', can we learn
to choose which verb, 'induce' or 'relieve', best describes the
semantic relation encoded in 'headache pills'? In this paper, we
describe our approaches to ranking human-proposed paraphrasing
verbs of NCs. Our contribution is a novel approach that uses a two-step
process of clustering similar NCs and then labeling the best
paraphrasing verb as the most prototypical verb in the cluster. The
approach performs the best with an average Spearman's rank
correlation of 0.55. This approach, while being computationally
simpler, gives a better ranking than the current state of the art. The
result shows the potential of our approach for finding implicit
relations between entities, especially when the relations are not
explicit in the context in which the entities appear but are instead
implicit in the relationship between their constituents.
-
Toward an Architecture for Never-Ending Language Learning.
A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E.R. Hruschka Jr. and T.M. Mitchell.
In Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.
[PDF]
[abstract]
[bib]
[supplementary materials]
We consider here the problem of building a never-ending language learner; that
is, an intelligent computer agent that runs forever and that each day must (1)
extract, or read, information from the web to populate a growing structured
knowledge base, and (2) learn to perform this task better than on the
previous day. In particular, we propose an approach and a set of design
principles for such an agent, describe a partial implementation of such a
system that has already learned to extract a knowledge base containing over
242,000 beliefs with an estimated precision of 74%, and discuss lessons
learned from this preliminary attempt to build a never-ending learning agent.
@inproceedings{carlson-aaai,
Title = {Toward an Architecture for Never-Ending Language Learning},
Author = {Andrew Carlson and Justin Betteridge and Bryan Kisiel and Burr Settles and Estevam R. Hruschka Jr. and Tom M. Mitchell},
Booktitle = {Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI 2010)},
Year = {2010}}
-
Coupled Semi-Supervised Learning for Information Extraction.
A. Carlson, J. Betteridge, R.C. Wang, E.R. Hruschka Jr. and T.M. Mitchell.
In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), 2010.
[PDF]
[abstract]
[bib]
[supplementary materials]
We consider the problem of semi-supervised learning to extract categories (e.g., academicFields, athletes) and relations (e.g., PlaysSport(athlete, sport)) from web pages, starting with a handful of labeled training examples of each category or relation, plus hundreds of millions of unlabeled web documents. Semi-supervised training using only a few labeled examples is typically unreliable because the learning task is underconstrained. This paper pursues the thesis that much greater accuracy can be achieved by further constraining the learning task, by coupling the semi-supervised training of many extractors for different categories and relations. We characterize several ways in which the training of category and relation extractors can be coupled, and present experimental results demonstrating significantly improved accuracy as a result.
@inproceedings{carlson-wsdm,
Title = {Coupled Semi-Supervised Learning for Information Extraction},
Author = {Andrew Carlson and Justin Betteridge and Richard C. Wang and Estevam R. Hruschka Jr. and Tom M. Mitchell},
Booktitle = {Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010)},
Year = {2010}}
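One of the coupling types the paper characterizes, mutual exclusion between categories, can be sketched as a filter over candidate extractions; the category names and instances below are toy examples, and the real system couples the training of the extractors themselves rather than just filtering their outputs:

```python
def couple_filter(candidates, mutex_pairs):
    """Drop any instance proposed for two mutually exclusive categories,
    a toy version of one coupling constraint among the several the paper
    characterizes."""
    kept = {c: set(insts) for c, insts in candidates.items()}
    for a, b in mutex_pairs:
        conflict = kept.get(a, set()) & kept.get(b, set())
        if a in kept:
            kept[a] -= conflict
        if b in kept:
            kept[b] -= conflict
    return kept

cands = {"city": {"pittsburgh", "jordan"},
         "athlete": {"jordan", "serena"}}
filtered = couple_filter(cands, [("city", "athlete")])
print(filtered["city"], filtered["athlete"])  # {'pittsburgh'} {'serena'}
```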
-
Populating the Semantic Web by Macro-Reading Internet Text.
T.M. Mitchell, J.Betteridge, A. Carlson, E.R. Hruschka Jr. and R.C. Wang.
Invited Paper, In Proceedings of the International Semantic Web Conference (ISWC), 2009.
[PDF]
[abstract]
[bib]
A key question regarding the future of the semantic web is "how will we acquire structured information to populate the semantic web on a vast scale?" One approach is to enter this information manually. A second approach is to take advantage of pre-existing databases, and to develop common ontologies, publishing standards, and reward systems to make this data widely accessible. We consider here a third approach: developing software that automatically extracts structured information from unstructured text present on the web. We also describe preliminary results demonstrating that machine learning algorithms can learn to extract tens of thousands of facts to populate a diverse ontology, with imperfect but reasonably good accuracy.
@inproceedings{mitchell-iswc09,
Title = {Populating the Semantic Web by Macro-Reading Internet Text},
Author = {Tom M. Mitchell and Justin Betteridge and Andrew Carlson and Estevam R. Hruschka Jr. and Richard C. Wang},
Booktitle = {Proceedings of the 8th International Semantic Web Conference (ISWC 2009)},
Year = {2009}}
-
Coupling Semi-Supervised Learning of Categories and Relations.
A. Carlson, J. Betteridge, E.R. Hruschka Jr. and T.M. Mitchell.
In Proceedings of the NAACL HLT Workshop on Semi-supervised Learning for Natural Language Processing, 2009.
[PDF]
[abstract]
[bib]
We consider semi-supervised learning of information extraction methods, especially for extracting instances of noun categories (e.g., athlete, team) and relations (e.g., playsForTeam(athlete,team)). Semi-supervised approaches using a small number of labeled examples together with many unlabeled examples are often unreliable as they frequently produce an internally consistent, but nevertheless incorrect set of extractions. We propose that this problem can be overcome by simultaneously learning classifiers for many different categories and relations in the presence of an ontology defining constraints that couple the training of these classifiers. Experimental results show that simultaneously learning a coupled collection of classifiers for 30 categories and relations results in much more accurate extractions than training classifiers individually.
@inproceedings{carlson-sslnlp09,
Title = {Coupling Semi-Supervised Learning of Categories and Relations},
Author = {Andrew Carlson and Justin Betteridge and Estevam R. Hruschka Jr. and Tom M. Mitchell},
Booktitle = {Proceedings of the NAACL HLT 2009 Workshop on Semi-supervised Learning for Natural Language Processing},
Year = {2009}}