Tuesday, July 02, 2013
Some of the biggest challenges for the scientific community today involve understanding the principles and mechanisms that underlie natural language use on the Web. An example of long-standing problem is language ambiguity; when somebody types the word “Rio” in a query do they mean the city, a movie, a casino, or something else? Understanding the difference can be crucial to help users get the answer they are looking for. In the past few years, a significant effort in industry and academia has focused on disambiguating language with respect to Web-scale knowledge repositories such as Wikipedia and Freebase. These resources are used primarily as canonical, although incomplete, collections of “entities”. As entities are often connected in multiple ways, e.g., explicitly via hyperlinks and implicitly via factual information, such resources can be naturally thought of as (knowledge) graphs. This work has provided the first breakthroughs towards anchoring language in the Web to interpretable, albeit initially shallow, semantic representations. Google has brought the vision of semantic search directly to millions of users via the adoption of the Knowledge Graph. This massive change to search technology has also been called a shift “from strings to things”.
Understanding natural language is at the core of Google's work to help people get the information they need as quickly and easily as possible. At Google we work hard to advance the state of the art in natural language processing, to improve the understanding of fundamental principles, and to solve the algorithmic and engineering challenges to make these technologies part of everyday life. Language is inherently productive; an infinite number of meaningful new expressions can be formed by combining the meaning of their components systematically. The logical next step is the semantic modeling of structured meaningful expressions -- in other words, “what is said” about entities. We envision that knowledge graphs will support the next leap forward in language understanding towards scalable compositional analyses, by providing a universe of entities, facts and relations upon which semantic composition operations can be designed and implemented.
So we’ve just awarded over $1.2 million to support several natural language understanding research awards given to university research groups doing work in this area. Research topics range from semantic parsing to statistical models of life stories and novel compositional inference and representation approaches to modeling relations and events in the Knowledge Graph.
These awards went to researchers in nine universities and institutions worldwide, selected after a rigorous internal review:
- Mark Johnson and Lan Du (Macquarie University) and Wray Buntine (NICTA) for “Generative models of Life Stories”
- Percy Liang and Christopher Manning (Stanford University) for “Tensor Factorizing Knowledge Graphs”
- Sebastian Riedel (University College London) and Andrew McCallum (University of Massachusetts, Amherst) for “Populating a Knowledge Base of Compositional Universal Schema”
- Ivan Titov (University of Amsterdam) for “Learning to Reason by Exploiting Grounded Text Collections”
- Hans Uszkoreit (Saarland University and DFKI), Feiyu Xu (DFKI and Saarland University) and Roberto Navigli (Sapienza University of Rome) for “Language Understanding cum Knowledge Yield”
- Luke Zettlemoyer (University of Washington) for “Weakly Supervised Learning for Semantic Parsing with Knowledge Graphs”
We believe the results will be broadly useful to product development and will further scientific research. We look forward to working with these researchers, and we hope we will jointly push the frontier of natural language understanding research to the next level.