September 21, 2020
KILT (Knowledge Intensive Language Tasks) is a new unified benchmark to help AI researchers build models that are better able to leverage real-world knowledge to accomplish a broad range of tasks.
It unifies 11 widely used public datasets representing five different types of tasks: fact-checking, open-domain question answering, slot filling, entity linking, and dialog generation. KILT is the first benchmark that aggregates datasets representing such a wide variety of knowledge-intensive tasks.
All the datasets in KILT are aligned with a single knowledge source: a recent snapshot of Wikipedia. This shared grounding can help catalyze research into unified, task-agnostic architectures for knowledge-intensive tasks. It also makes it much easier to experiment with different task-specific solutions.
When evaluating how models perform on knowledge-based tasks, it’s important to consider not just the output a model produces but also the specific information it used to produce that output. The KILT benchmark therefore includes provenance information: a mapping from each example to the knowledge in the Wikipedia snapshot that is needed to solve it. For several tasks, we ran an annotation campaign to make the provenance annotations more comprehensive. Together, the output and provenance allow researchers to assess both a model’s accuracy and its ability to justify its predictions.
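To make the idea of scoring output and provenance together more concrete, here is a minimal sketch in Python. It assumes a hypothetical rule in which a prediction earns credit only when its output matches a gold answer and every gold provenance page was also returned. The function names, the argument layout, and the page identifiers (`page_A`, `page_B`) are illustrative assumptions, not the benchmark's actual metrics or API.

```python
from typing import Iterable, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def output_correct(predicted: str, gold_answers: Iterable[str]) -> bool:
    """True when the predicted output matches any gold answer after normalization."""
    return normalize(predicted) in {normalize(a) for a in gold_answers}


def provenance_correct(predicted_pages: Iterable[str], gold_pages: Set[str]) -> bool:
    """True when every gold page appears among the predicted pages."""
    return bool(gold_pages) and gold_pages.issubset(set(predicted_pages))


def joint_score(
    predicted: str,
    gold_answers: Iterable[str],
    predicted_pages: Iterable[str],
    gold_pages: Set[str],
) -> int:
    """Hypothetical joint rule: 1 only if both output and provenance are correct."""
    return int(
        output_correct(predicted, gold_answers)
        and provenance_correct(predicted_pages, gold_pages)
    )


# Made-up example: the answer is right in both cases, but only the first
# prediction also points at the page that supports it.
gold_answers = ["William Shakespeare"]
gold_pages = {"page_A"}

print(joint_score("william  shakespeare", gold_answers, ["page_A", "page_B"], gold_pages))
print(joint_score("William Shakespeare", gold_answers, ["page_B"], gold_pages))
```

Under this assumed rule, a model that produces the right answer from the wrong evidence gets no credit, which is the distinction that provenance annotations make measurable.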
KILT unifies its 11 datasets in a single format and grounds them in a single preprocessed copy of the Wikipedia snapshot. Preprocessing large corpora is time-consuming, and the choices made along the way can have a large effect on models’ downstream performance. Mapping all datasets to a single corpus not only makes research in this area more convenient but also enables more accurate and balanced evaluation across different models.
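As an illustration of what a single format makes possible, the sketch below maps two differently shaped examples into one record type. The field names (`input`, `answers`, `provenance_pages`), the two source layouts, the label string, and the page identifiers are all assumptions made for this example, not the benchmark's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UnifiedRecord:
    """Hypothetical shared record: one input, its gold outputs, supporting pages."""
    task: str
    input: str
    answers: List[str]
    provenance_pages: List[str] = field(default_factory=list)


def from_qa(example: Dict) -> UnifiedRecord:
    """Convert an assumed question-answering layout into the shared record."""
    return UnifiedRecord(
        task="qa",
        input=example["question"],
        answers=list(example["answers"]),
        provenance_pages=list(example.get("pages", [])),
    )


def from_fact_check(example: Dict) -> UnifiedRecord:
    """Convert an assumed fact-checking layout; the label becomes the only answer."""
    return UnifiedRecord(
        task="fact_checking",
        input=example["claim"],
        answers=[example["label"]],
        provenance_pages=list(example.get("evidence_pages", [])),
    )


# Made-up examples in two different source layouts.
records = [
    from_qa({
        "question": "Who wrote Hamlet?",
        "answers": ["William Shakespeare"],
        "pages": ["page_A"],
    }),
    from_fact_check({
        "claim": "Hamlet was written by William Shakespeare.",
        "label": "SUPPORTS",
        "evidence_pages": ["page_A"],
    }),
]

# Downstream code can now treat every task the same way.
for record in records:
    print(record.task, "|", record.input, "->", record.answers, record.provenance_pages)
```

Once every dataset is expressed this way and points into the same corpus, a single data loader and a single retrieval index can serve all tasks.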
Because all the datasets are mapped to the same corpus and use a unified format, KILT makes it much easier to explore multitask learning approaches and transfer learning. We hope this will enable the development of models and representations that can generalize across the whole suite of KILT tasks.
The AI research community has made great strides in building models that can generate text that mimics natural language. State-of-the-art systems today perform so well that it can be hard to distinguish their output from text written by a person.
An important next step is to make these models generate text that is not only fluent but also grounded in real-world knowledge. Natural language processing models of this kind are already used in real-world AI applications, from recommender systems to chatbots to intelligent assistants. KILT facilitates the research needed to improve these systems and ultimately to build machines with deep, broadly useful knowledge of our world.