July 1, 2021
Facebook is sharing new research and two new data sets intended to help the research community build more sophisticated and effective conversational AI systems for hundreds of millions of people around the globe.
Conversational AI and digital assistants are advancing rapidly, enabling much more complex and helpful uses, but these improvements are often limited to people who speak widely used languages such as English. Moreover, it is often very difficult to scale an existing model to support new use cases.
One important reason for this is the scarcity of labeled training data. These systems use state-of-the-art deep neural models to parse and understand complex requests and commands. These natural language understanding (NLU) models rely on large amounts of annotated training data for each task and language. But extensive labeled data sets are not available for many less widely spoken languages, and they are often difficult or impossible to obtain for novel use cases.
Our method overcomes this limitation. It uses 10x less training data than standard supervised methods to create state-of-the-art conversational AI systems that can perform unfamiliar, complex tasks. Through improved training techniques and better representation learning, we can build models that much more efficiently understand an instruction such as “Show me driving directions to the Eagles game,” which requires that the AI understand multiple intents.
It is also difficult to scale conversational AI models to new languages. So we are also offering details on a multilingual NLU model that outperforms single-language models. Our method is effective at scaling models to languages that lack large labeled data sets.
By scaling NLU models to support more diverse use cases in more languages, especially ones that lack extensive collections of labeled training data, we can democratize conversational AI and bring this technology to many more people.
Traditional NLU models take a straightforward approach to parsing a question like “How is the weather in San Francisco?” They first match the intent (“GET_WEATHER” in this case) to a set of predefined intent labels and then identify all the necessary slots for that intent (in this case, tagging San Francisco as a LOCATION slot). More complex tasks, however, require a more sophisticated technique, and models typically must have large numbers of domain-specific labeled training examples for each task.
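To make the difference concrete, here is a minimal sketch of a flat intent-and-slot frame next to a nested one. GET_WEATHER and LOCATION come from the example above. The class names, the bracketed notation, and the labels in the nested frame (GET_DIRECTIONS, DESTINATION, GET_EVENT, NAME_EVENT, CAT_EVENT) are assumptions chosen for illustration and may not match the ontology of the released data sets.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Slot:
    label: str
    # A slot holds either a plain text span or a nested frame.
    value: Union[str, "Frame"]


@dataclass
class Frame:
    intent: str
    slots: List[Slot] = field(default_factory=list)


def render(frame: Frame) -> str:
    """Serialize a frame into a bracketed string (illustrative notation)."""
    parts = [f"[IN:{frame.intent}"]
    for slot in frame.slots:
        inner = render(slot.value) if isinstance(slot.value, Frame) else slot.value
        parts.append(f"[SL:{slot.label} {inner}]")
    return " ".join(parts) + "]"


def depth(frame: Frame) -> int:
    """Number of intent levels: 1 for a flat frame, more when intents nest."""
    nested = [depth(s.value) for s in frame.slots if isinstance(s.value, Frame)]
    return 1 + max(nested, default=0)


# Flat parse: one intent, one slot filled with a text span.
flat = Frame("GET_WEATHER", [Slot("LOCATION", "San Francisco")])

# Nested parse (hypothetical labels): the destination is itself an intent.
nested = Frame(
    "GET_DIRECTIONS",
    [Slot("DESTINATION",
          Frame("GET_EVENT",
                [Slot("NAME_EVENT", "Eagles"), Slot("CAT_EVENT", "game")]))],
)

print(render(flat))
print(render(nested))
```

A flat tagger can fill the first frame directly. The second one needs a parser that can produce a tree, which is why the request about the Eagles game is harder than the weather question.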
We enhance NLU models to support a more diverse set of domains without having to heavily rely on manually annotated training data. Our method can create task-oriented semantic parsers for new domains with as few as 25 training samples per intent or slot label.
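One way to picture a budget of 25 samples per intent or slot label is a sampler that keeps adding utterances until every label is covered that many times. The sketch below is illustrative only: the function names, the greedy strategy, and the toy labels are assumptions, not necessarily the procedure used in these experiments.

```python
import random
from collections import Counter


def sample_low_resource(examples, k, seed=0):
    """Greedily pick examples until every label appears in at least k of them.

    examples: list of (utterance, labels) pairs, where labels is a set of
    intent and slot label strings. Labels with fewer than k examples in
    total end up with all of their examples selected.
    """
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)
    counts = Counter()
    chosen = []
    for utterance, labels in pool:
        # Keep the example only if it helps a label that is still short.
        if any(counts[label] < k for label in labels):
            chosen.append((utterance, labels))
            counts.update(set(labels))
    return chosen


def label_counts(examples):
    """Count how many selected examples mention each label."""
    counts = Counter()
    for _, labels in examples:
        counts.update(set(labels))
    return counts


# Toy data with hypothetical labels.
data = (
    [(f"u{i}", {"IN:A"}) for i in range(50)]
    + [(f"v{i}", {"IN:B", "SL:X"}) for i in range(40)]
    + [(f"w{i}", {"IN:C"}) for i in range(3)]
)
subset = sample_low_resource(data, k=25)
print(len(subset), dict(label_counts(subset)))
```

Because one utterance can carry several labels, the resulting subset is usually smaller than 25 times the number of labels.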
We first show that pretrained Transformer models, such as BART, are critical for learning richer and more robust representations that generalize to new low-resource domains. On the other hand, these large pretrained models can be difficult to fine-tune when only a few training samples are available in the new domains. Therefore, we further employ meta-learning to improve the generalization of the BART model trained on the high-resource domains, making it easier to fine-tune on the target domains with very little training data.
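The idea of learning an initialization that adapts quickly can be sketched on a toy problem. The example below uses a Reptile-style first-order update, which is a choice made here for illustration; this post does not say which meta-learning algorithm was used. A single weight fitted to 1-D regression tasks stands in for BART, and all function names and hyperparameters are assumptions.

```python
import random


def inner_sgd(w, slope, steps, lr, rng):
    """A few SGD steps on one task: fit y = slope * x with a single weight w."""
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        grad = 2.0 * (w * x - slope * x) * x  # derivative of squared error
        w -= lr * grad
    return w


def reptile(task_slopes, meta_steps=500, inner_steps=5,
            inner_lr=0.1, meta_lr=0.1, seed=0):
    """Learn an initial weight that adapts quickly to tasks like task_slopes."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_steps):
        slope = rng.choice(task_slopes)           # sample a training task
        adapted = inner_sgd(w, slope, inner_steps, inner_lr, rng)
        w += meta_lr * (adapted - w)              # move the init toward the adapted weight
    return w


# "High-resource" tasks: slopes 2, 3, and 4. "Target" task: slope 3.5.
w_init = reptile([2.0, 3.0, 4.0])
target = 3.5
from_meta = inner_sgd(w_init, target, 5, 0.1, random.Random(1))
from_scratch = inner_sgd(0.0, target, 5, 0.1, random.Random(1))
print(abs(from_meta - target) < abs(from_scratch - target))
```

With the same five adaptation steps on the same inputs, the meta-learned starting point ends closer to the target task than a start from zero, which is the behavior the fine-tuning setup relies on.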
We also propose a technique called low-rank adaptive label smoothing (LORAS) to exploit the latent structure in the label space of the NLU task, which can improve model accuracy in the low-resource setting where few samples are available to learn representations for novel user intents.
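For context, standard label smoothing mixes the one-hot training target with a uniform distribution over labels. The sketch below shows that baseline and how a non-uniform smoothing distribution plugs into the same formula. It is not an implementation of LORAS: the smoothing distribution here is set by hand, and the function names and toy label set are assumptions.

```python
import math


def smooth_targets(gold, num_labels, eps, prior=None):
    """Mix a one-hot target with a smoothing distribution.

    prior=None gives standard uniform label smoothing. Passing a
    distribution over labels puts the smoothing mass on chosen labels.
    """
    if prior is None:
        prior = [1.0 / num_labels] * num_labels
    return [
        (1.0 - eps) * (1.0 if i == gold else 0.0) + eps * prior[i]
        for i in range(num_labels)
    ]


def cross_entropy(targets, logits):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(targets, logits))


# Four hypothetical labels; label 0 is the gold label.
uniform = smooth_targets(gold=0, num_labels=4, eps=0.1)

# Hand-set prior: labels 1 and 2 are treated as related to the gold label.
structured = smooth_targets(gold=0, num_labels=4, eps=0.1,
                            prior=[0.0, 0.8, 0.2, 0.0])

logits = [2.0, 0.5, 0.1, -1.0]
print(uniform, cross_entropy(uniform, logits))
print(structured, cross_entropy(structured, logits))
```

The uniform version treats every wrong label as equally plausible. A structured smoothing distribution can instead give more mass to labels that behave similarly, which is the kind of label-space structure the paragraph above refers to.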
We are releasing TOPv2, a multidomain NLU data set with eight domains and over 180,000 annotated samples.
Using our meta-learning and LORAS techniques on TOPv2, we’ve achieved performance similar to that of standard supervised methods while using 10x less training data.
Scaling NLU models to new languages is also challenging, as it typically involves building large annotated data sets, which is both difficult and time consuming. We attempt to simplify this by building multilingual NLU models that can transfer what they learn from languages with large amounts of training data to languages with less data. We used pretrained multilingual Transformer models, such as XLM-R, mBART, CRISS, and MARGE, as the building blocks of our NLU model. Our experiments show that, for every language, a single NLU model shared across languages performs significantly better than a per-language model, thereby enabling faster language scale-up.
Through the use of machine translation, we further explore translate-align data augmentation and propose a distant supervision technique in order to build models that generalize well without using any training data in the target language. Our zero-shot models, on average, achieve an error rate that approaches that of our best in-language NLU models, which means, for example, that we can develop a model for Thai without having any Thai training data.
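The "align" step of translate-align augmentation can be sketched as projecting slot spans from a source utterance onto its translation through word alignments. In the example below the translation and the alignment are written by hand, where a real pipeline would take them from a translation system and an aligner. The function name, the span format, and the labels are assumptions for illustration.

```python
def project_slots(src_slots, alignment):
    """Project slot spans onto a translated utterance.

    src_slots: list of (label, start, end) token spans, end exclusive.
    alignment: list of (src_index, tgt_index) word-alignment pairs.
    Returns (label, start, end) spans over the target tokens; a slot
    with no aligned target token is dropped.
    """
    projected = []
    for label, start, end in src_slots:
        tgt_indices = sorted(t for s, t in alignment if start <= s < end)
        if not tgt_indices:
            continue
        # Cover everything between the first and last aligned target token.
        projected.append((label, tgt_indices[0], tgt_indices[-1] + 1))
    return projected


# English source with hypothetical slot labels.
src = ["remind", "me", "to", "call", "mom", "tomorrow"]
src_slots = [("TODO", 3, 5), ("DATE_TIME", 5, 6)]

# Hand-written Spanish translation and alignment (illustrative only).
tgt = ["recuérdame", "mañana", "llamar", "a", "mamá"]
alignment = [(0, 0), (1, 0), (3, 2), (4, 4), (5, 1)]

for label, start, end in project_slots(src_slots, alignment):
    print(label, " ".join(tgt[start:end]))
```

The projected spans give labeled utterances in the target language without any manual annotation in that language, even when word order changes in translation.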
We are releasing the MTOP data set, a multilingual task-oriented parsing data set with roughly 100K total utterances across six languages, 11 domains, and 117 intent types. More details regarding the data set, its creation process, and our experiments are available in this paper.
As Facebook AI’s Chief Scientist, Yann LeCun, has recently noted, the future of AI research is in building more intelligent generalist models that can acquire new skills across different tasks, domains, and languages without massive amounts of labeled data. This is particularly true in the field of conversational AI, where systems need to be able to understand all types of users and serve all kinds of needs. Facebook has a long-term commitment to building these sorts of systems because the world is simply too varied and diverse to be understood by machines that were trained only on manually curated and labeled examples.
We know that technical innovations, such as new model architectures, are only one part of ensuring that AI systems are fair. Facebook AI has made a long-term commitment to developing AI responsibly, and this work will require developing new ways to measure fairness and new technical toolkits, as well as ongoing, open dialogue with outside experts, policymakers, and others.
The new work we are sharing in this blog post will help us move closer to this vision of generalized models that can do many things well for many different groups of people.
We'd like to acknowledge Brian Moran, Keith Diedrick, and T.J. Trimble for their help with the preparation of the TOPv2 data set.