June 30, 2021
AI is an important tool to support public health experts around the world in their efforts to keep people safe and informed amid the coronavirus pandemic. Facebook AI is partnering with academic researchers and other experts on a range of initiatives related to COVID-19. We are sharing overviews of several of these now, and we will add updates and more information in the days and weeks to come.
More information is available here about Facebook’s broader efforts related to this pandemic.
The number of coronavirus cases continues to change quickly in different communities around the world. Since April of last year, we have created and shared high-quality, localized COVID-19 forecasting models to help healthcare providers and emergency responders determine how best to plan and allocate their resources in their particular area. We are now open-sourcing our entire stack of COVID-19 forecasting models so that response teams, governments, and researchers can use them to further help their communities.
These AI models, developed by Facebook AI in collaboration with academic researchers at New York University’s Courant Institute of Mathematical Sciences, the Universitat Politècnica de Catalunya (UPC), and the Faculty of Mathematics and the Data Science research platform at the University of Vienna, use publicly available, de-identified time series data about the spread of the disease. They have consistently been among the most accurate models since the beginning of the pandemic (as shown in the figure below).
Through this open research effort, we also hope to help advance epidemiological forecasting as it allows researchers to reuse, extend, and improve our methods. In addition, we continue to provide daily forecasts for the United States, which are hosted and maintained by Data for Good and Facebook AI.
As part of open-sourcing our code, we are also supporting UPC in extending our model to the European Union. Since early in the pandemic, the BIOCOM-SC team at the Universitat Politècnica de Catalunya has provided comprehensive reports and forecasts to the European Commission about the spread of COVID-19. Based on the success of our forecasts in the United States, BIOCOM-SC is now leading the effort to apply the model to similar scenarios in the European Union. First experiments by Enric Alvarez Lacalle, a professor of physics at the Universitat Politècnica de Catalunya, and collaborators have shown promising results, especially for longer forecast horizons and higher spatial resolution. We invite the global research community to explore our open source code for similar applications and forecasting tasks.
As vaccination rates can influence the spread of COVID-19, we have also extended our model to account for these new conditions. The model showed a brief decrease in performance as vaccines became widely available to different communities. But by incorporating public data about vaccination rates, the model adjusted quickly to the new disease dynamics and was soon among the top-performing forecasts in February and March.
As forecasts can inform policy and resource allocation decisions during the pandemic, an important question is whether forecasts are similarly accurate across counties with different demographic characteristics. Answering this question can help to minimize the potential for unfair distribution of those resources. Such evaluation of machine learning models is especially important when they are used to inform these sorts of real-world decisions.
For this reason, we also evaluated the correlation of forecasting errors with demographic properties of U.S. counties, such as those related to income (e.g., percent of residents below poverty level), education (e.g., percent of non-college educated residents), and majority race/ethnicity rates (e.g., percent of non-white population). Our analysis showed that the error of our forecasts has low correlation with these demographic aspects of counties.
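For readers who want to run a similar check on their own forecasts, here is a minimal sketch of one way to do it. It assumes the analysis boils down to a Pearson correlation between a per-county error measure and each demographic variable; the function names, the choice of mean absolute error, and the numbers are illustrative assumptions, not our evaluation code or data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def error_correlations(abs_errors, demographics):
    """Correlate per-county forecast errors with each demographic variable.

    abs_errors:   {county: mean absolute forecast error}
    demographics: {variable name: {county: value}}
    """
    counties = sorted(abs_errors)
    errors = [abs_errors[c] for c in counties]
    return {
        name: pearson(errors, [values[c] for c in counties])
        for name, values in demographics.items()
    }


# Made-up illustrative numbers, not real county data.
abs_errors = {"A": 4.0, "B": 6.0, "C": 5.0, "D": 5.0}
demographics = {
    "pct_below_poverty": {"A": 10.0, "B": 12.0, "C": 18.0, "D": 8.0},
}
result = error_correlations(abs_errors, demographics)
print(result)
```

A coefficient close to zero for every variable would correspond to the "low correlation" finding described above; a large positive or negative value would be a signal to investigate further.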
As we are now open-sourcing our model, we encourage new users to perform similar analyses for their specific use-cases and forecasting regions.
To help researchers further improve their COVID-19 forecasting efforts, we are also sharing our analysis of which factors contribute to our model’s performance.
Central to our model is a mechanism that predicts future cases in a particular county using data on recent cases in statistically similar counties. We found that the social connectedness of U.S. counties is an important aspect that is captured in this mechanism. For our analysis, we employ the Social Connectedness Index (SCI), a project released through Facebook’s Data for Good initiative, which measures the strength of connectedness between two geographic areas as represented by Facebook friendship ties. Our results show that counties our model treats as statistically similar are two to eight times more socially connected than unrelated counties. We hope that this analysis can also provide new insights for other COVID-19 models and illustrate the usefulness of the SCI for COVID-19 forecasting efforts in general.
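To make the comparison concrete, the sketch below computes the ratio of average SCI between "similar" county pairs and all remaining pairs. How the set of similar pairs is read out of a trained model is not covered here, so `similar_pairs` is simply given as input; the function names and SCI values are hypothetical.

```python
from itertools import combinations


def mean_sci(pairs, sci):
    """Average SCI over a collection of unordered county pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs to average")
    return sum(sci[frozenset(p)] for p in pairs) / len(pairs)


def connectedness_ratio(counties, similar_pairs, sci):
    """Mean SCI of similar pairs divided by mean SCI of all other pairs."""
    similar = {frozenset(p) for p in similar_pairs}
    all_pairs = {frozenset(p) for p in combinations(counties, 2)}
    unrelated = all_pairs - similar
    return mean_sci(similar, sci) / mean_sci(unrelated, sci)


# Made-up SCI values keyed by unordered county pair.
sci = {
    frozenset(("A", "B")): 80, frozenset(("C", "D")): 40,
    frozenset(("A", "C")): 10, frozenset(("A", "D")): 20,
    frozenset(("B", "C")): 30, frozenset(("B", "D")): 20,
}
ratio = connectedness_ratio(["A", "B", "C", "D"], [("A", "B"), ("C", "D")], sci)
print(ratio)
```

A ratio well above 1 means the pairs the model links are more socially connected than the rest, which is the pattern reported above.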
October 20, 2020 -- COVID-19 has advanced rapidly and unpredictably, stifling reopening plans in some states and introducing new hotspots in others. This potential for resurgence underscores the need for better understanding of the disease’s progression geographically. Building on our commitment to help keep people safe and informed about the virus, we are publishing new AI-powered forecasts that predict the spread of COVID-19 across the entire United States at the county level. These forecasts are leveraging Facebook’s Data for Good tools, including Symptom Survey and Movement Range Maps. They are available on the Humanitarian Data Exchange, and more information about the effort is on our Data for Good site.
Facebook AI has just published a paper with details of the technique used to generate these county-level forecasts. It explains how we trained our AI models using time series data about the spread of the disease consisting of county names, dates, and number of confirmed cases. To inform the model about covariates like mobility, social distancing, and the prevalence of COVID-like symptoms that can influence the spread of the disease, we are also drawing from non-Facebook public data as well as public aggregate data from the Symptom Survey. Additionally, we’re incorporating maps on population movement that researchers and nonprofits are already using to understand the spread of COVID-19. The maps are aggregated to protect people’s privacy and available on the Humanitarian Data Exchange. These large, comprehensive datasets give us a more accurate starting point from which to begin training our models.
To further improve our COVID-19 forecasting, we developed a new neural autoregressive model that aims to disentangle regional from disease-inherent aspects within these datasets. Central to our model is its ability to account for relationships among different counties, so, for example, an uptick in one area can have an impact on predictions for adjacent or similar districts. This allows us to train models where knowledge about the spread of the disease in one area can improve predictions in a different area and thus borrow statistical strength across counties.
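The idea of borrowing statistical strength across counties can be illustrated with a deliberately simplified, linear stand-in for the neural model. In this sketch, which is an assumption for illustration and not the published architecture, lag weights shared by all counties play the role of the disease-inherent part, and a per-county coupling table plays the role of the regional part.

```python
def forecast_next(history, lag_weights, coupling):
    """One-step-ahead forecast for every county.

    history:     {county: [cases on older days, ..., cases on most recent day]}
    lag_weights: [w_1, ..., w_L]; w_1 applies to the most recent day
    coupling:    {target: {source: strength}}; how much `source` informs `target`
    """
    def lagged_signal(series):
        # Pair the most recent observation with w_1, the one before with w_2, ...
        recent_first = reversed(series[-len(lag_weights):])
        return sum(w * x for w, x in zip(lag_weights, recent_first))

    signals = {county: lagged_signal(series) for county, series in history.items()}
    return {
        target: sum(
            strength * signals[source]
            for source, strength in coupling[target].items()
        )
        for target in history
    }


# Made-up numbers: county B has no cases yet but is coupled to county A.
history = {"A": [10, 20], "B": [0, 0]}
lag_weights = [0.5, 0.25]
coupling = {"A": {"A": 1.0}, "B": {"B": 1.0, "A": 0.2}}
forecast = forecast_next(history, lag_weights, coupling)
print(forecast)
```

Even though county B has recorded no cases, its forecast is nonzero because of the uptick in county A, which is the behavior described in the paragraph above.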
We are partnering with the Universitat Politècnica de Catalunya to see how similar forecasting techniques can be applied in Europe. In the meantime, the team is using our U.S. forecasting data to better understand how the pandemic is evolving in different parts of the world. They provide periodic reports to the European Commission (EC) with analyses and predictions of the spread of COVID-19 in European and other countries, as well as the effectiveness of ongoing prevention efforts. Our U.S. county-level forecasts will now be integrated into specific EC-bound reports, which the research team will use to provide a more comprehensive understanding of global hotspots.
Earlier in the year, we began collaborating with local academic experts in New York, New Jersey, and Austria to provide localized forecasting models to academic partners, who in turn shared forecasts with public health authorities and emergency services providers. The information produced by our AI models improved planning for in-demand resources, such as hospitals, ICU beds, ventilators, and masks.
We invite others to use our public projections to shape their reopening plans and future efforts relating to COVID-19. We’ll be updating the 14-day forecasts on the Humanitarian Data Exchange every week. While our methods can be used for future forecasting needs in public health and elsewhere, we are focused first on improving their efficacy with regard to COVID-19.
October 2, 2020 — We launched our COVID-19 Community Help hub earlier this year as a resource for people to request help from their neighbors or offer it to them. Now we’ve added a new AI-powered matching feature so that people can get connected more quickly and easily to those offering the particular type of assistance they need. For example, if someone posts an offer to deliver groceries, they’ll see suggestions to connect with people who recently posted about needing this type of assistance. Similarly, if someone requests masks, AI will surface suggested neighbors who recently posted an offer to make face coverings. The matching feature is available in English and 17 other languages.
We built and deployed this matching algorithm using XLM-R, our open source, cross-lingual understanding model that extends our work on XLM and RoBERTa, to produce a relevance score that ranks how closely a request for help matches the current offers for help in that community. The system then integrates the posts’ ranking score into a set of models trained on PyText, our open source framework for natural language processing.
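As a rough illustration of the ranking step only, the sketch below scores offers against a request by cosine similarity between sentence vectors. The vectors are made up and stand in for embeddings that a model such as XLM-R would produce; cosine similarity is an assumed scoring function chosen for the example, not a description of the production relevance score.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_offers(request_vec, offers, top_k=3):
    """Return the top_k (score, text) pairs, most relevant first.

    offers: list of (offer text, embedding vector) tuples
    """
    scored = [(cosine(request_vec, vec), text) for text, vec in offers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional embeddings; real sentence embeddings are much larger.
request_vec = [0.9, 0.1, 0.0]  # "Does anyone have masks my kids can use?"
offers = [
    ("I can deliver groceries", [0.1, 0.9, 0.0]),
    ("We'd like to donate face coverings", [0.8, 0.2, 0.1]),
    ("Free tutoring for kids", [0.0, 0.2, 0.9]),
]
ranked = rank_offers(request_vec, offers)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

In this toy setup the face-coverings offer ranks first for the masks request because their vectors point in nearly the same direction, even though the two sentences share no words.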
Since the pandemic is an unprecedented and rapidly evolving health crisis affecting people in big and small ways, it was important for us to deploy this feature quickly. XLM-R contributed to a speedy deployment primarily because it learns through self-supervision, eliminating the need for us to create a new manually labeled dataset for training. The model enables our system to identify similar meanings and make matches even when the wording used in posts is very different. For example, “Does anyone have masks my kids can use?” and “We’d like to donate face coverings” would result in a match even though they share almost no vocabulary. And unlike existing candidate-matching logic or simple matching heuristics, XLM-R is also able to understand generalized requests or offers for help (such as “I’m happy to lend a hand with whatever you need!”), which are highly prevalent and can be challenging for NLP systems. XLM-R’s one-model-for-many-languages approach improved the model’s performance overall while reducing maintenance and facilitating an easier route to production.
This latest improvement to our COVID-19 Community Help hub builds upon our earlier work in natural language processing to connect more people to aid. In the sections below, we describe how we’re using a different AI technique, our open source XLM pretraining method, to detect requests for assistance and intent to offer help in public News Feed posts, so that we can surface a suggestion to publish the post on Community Help and reach more people. This is available in more than a dozen languages, and 50 percent of posts in the hub are coming from the AI model.
Just as people are drawing strength from neighbors to cope with COVID-19, they are also leaning on each other to navigate remote learning brought on by the pandemic. That’s why we’re also using this tech to power our new education category in Community Help, giving parents, teachers, and others more avenues for seeking and lending support and resources. We hope these efforts will make it easier for people to help others in their community. Because of the magnitude of the COVID-19 crisis, we are focused first on using this matching system to accelerate work that’s already going on in communities across the world. We continue to explore other ways of using AI to help people connect and help each other during this crisis and others.
June 18, 2020 — In collaboration with the Faculty of Mathematics and the Data Science research platform at the University of Vienna, we are using AI to generate district-level projections of where and how quickly COVID-19 is spreading in Austria. These sets of local predictions could help authorities and health-care providers better understand how the pandemic is evolving as some areas begin to ease restrictions and local conditions and regulations change.
We use public data shared by the Austrian government about confirmed COVID-19 cases and then generate weekly seven-day forecasts. To build adaptive models that can respond to rapid changes in each given area, we leverage a variety of techniques, including multivariate Hawkes processes, deep relational autoregression, and neural jump stochastic differential equations. All our models account for relationships between different districts, so for example an uptick in one area could impact predictions for adjacent districts. We provide these projections to our partners at the University of Vienna, who use this information to analyze trends and then share results with health officials.
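To show how a multivariate Hawkes process lets an uptick in one district raise the expected rate in another, here is a minimal sketch of its conditional intensity. The exponential kernel, the parameter names, and all values are assumptions for illustration; this post does not specify the kernel or fitted parameters used in our models.

```python
from math import exp


def intensity(t, district, events, baseline, excitation, decay):
    """Conditional intensity of `district` at time t.

    lambda_i(t) = mu_i + sum_j sum_{s in events_j, s < t}
                  alpha_ij * decay * exp(-decay * (t - s))

    events:     {district: [past event times]}
    baseline:   {district: mu}, the background rate
    excitation: {target: {source: alpha}}, influence of source on target
    decay:      rate of the exponential kernel (must be positive)
    """
    rate = baseline[district]
    for source, alpha in excitation[district].items():
        for s in events.get(source, []):
            if s < t:  # only events strictly in the past contribute
                rate += alpha * decay * exp(-decay * (t - s))
    return rate


# Made-up example: district B has no cases of its own but borders district A.
events = {"A": [1.0, 2.0], "B": []}
baseline = {"A": 0.5, "B": 0.2}
excitation = {"A": {"A": 0.6}, "B": {"B": 0.3, "A": 0.4}}
decay = 1.0

before = intensity(0.5, "B", events, baseline, excitation, decay)
after = intensity(3.0, "B", events, baseline, excitation, decay)
print(before, after)
```

Before any cases are recorded, district B sits at its baseline rate. After the two events in district A, B's intensity rises and then decays back toward the baseline as those events recede into the past.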
This initiative builds upon our localized COVID-19 forecasting work in the United States for New York and New Jersey. These forecasts can inform planning decisions for allocating resources such as ventilators and masks, as well as forecasting ICU demand. In the future, we may evaluate other sources of data, like mobility maps from Facebook’s Data for Good team, to see whether they help improve the model’s performance.
In late March, we launched the Community Help hub for COVID-19 as a place for people to request or offer help to neighbors. Posts can be created directly on the hub, and we are using natural language processing (NLP) to help make the feature more visible so that more people can receive support or provide support to others. When our model detects a request to get help or an intent to provide help in a public News Feed post, we surface a suggestion to publish it on Community Help so it can reach more people. We’ve internationalized this model using our XLM pretraining method to support more than a dozen languages to start: English, Korean, Japanese, Turkish, Dutch, Swedish, French, Spanish, German, Thai, Portuguese, Arabic, Urdu, Russian, Chinese, Vietnamese, Hindi, Filipino, and Indonesian. We will continue adding more languages. Fifty percent of posts on the hub are coming from this NLP model today. We use similar NLP and intent detection technology to power our Blood Donations feature, where it helps connect donors to people who need blood, and in our Charitable Giving feature, where it suggests adding a “donate” button to posts seeking to raise funds for a particular nonprofit.
Separately, we have joined the Translation Initiative for COVID-19 (TICO-19), a consortium that aims to enable the translation of information about the virus into a wide range of languages, including very low-resource languages. Engineers, researchers, and translation managers from a wide range of institutions, including Translators without Borders, Carnegie Mellon University, Johns Hopkins University, Appen, Google, and Translated, are contributing to TICO-19.
Facebook AI’s role is to provide translations of specialized terms and phrases related to the pandemic so that professional translators like those with Translators without Borders have dictionaries and reference tools to expedite their work and ensure consistency and accuracy. Together with our industry partners, we are also contributing professional translations of a curated dataset (about 68K words) related to COVID-19 and other medical terminology, which will serve as a benchmark for researchers and help them build specialized, state-of-the-art translation tools that can be quickly deployed when future crises occur. The set will include 37 languages, and we hope this will help advance the state of the art in low-resource languages such as Dari, Dinka, Hausa, Luganda, Pashto, and Zulu.
April 20, 2020 — Facebook AI has partnered with New York University’s Courant Institute of Mathematical Sciences to create localized forecasting models of the spread of COVID-19. These local predictions can help health-care providers and emergency responders in a specific county determine how best to allocate their resources (for example, deciding when to adjust a clinic’s staffing schedule to prepare for an expected increase in patients). It is challenging to create forecasts at the county level because the patterns in the data are complex and rapidly evolving. But AI is well suited for this challenge. Facebook AI researchers are using publicly available data published by the State of New Jersey and applying Multivariate Hawkes Processes to create daily COVID-19 predictions for the state. Our colleagues at NYU leverage this information in their models to estimate how progression of the disease will affect hospitals, bed and ICU capacity, and local demand for ventilators, masks, and other PPE needs at a hospital and county level. This information is collectively being shared on a daily basis with the State of New Jersey. Similarly, we have started a collaboration with Cornell University using public data published by the State of New York to model the predicted spread of coronavirus in New York, and we are working with other academic experts to scale these techniques.
We are also collaborating with NYU Langone Health’s Predictive Analytics Unit and Department of Radiology to build hospital-specific forecasts for COVID-19, using reinforcement learning, causal modeling, and supervised/self-supervised learning techniques. These models, which learn from de-identified X-rays and CT scans, as well as other de-identified and aggregated clinical data shared with Facebook in accordance with HIPAA, will help experts better allocate resources for clinical needs and optimize workflow across local hospital systems. For example, using these models, they can predict the number of patients whose condition is likely to improve or worsen in a given time period; how many people are likely to be admitted, transferred to ICUs, or discharged; and the number of ventilators, types of tests, and treatments that might be needed. Facebook AI is neither making nor recommending diagnoses for individual patients.
Similarly, we are partnering with the Mila research institute in Montreal to share predictive, causal, and decision algorithms for analyzing clinical data. No data is being shared in this collaboration, but the project will enable Mila to help hospitals in Montreal use their own patient data to better forecast what resources they will need to treat people with COVID-19.
With these joint efforts with NYU Langone and Mila, our immediate focus is on developing models that can learn from de-identified clinical data and help hospitals determine how to use their resources most effectively. As we refine and build on these techniques, we would like to explore ways to quickly scale the benefits to other organizations. This could include open-sourcing code so that other institutions can train models on their own data.
It’s crucial that public health experts understand the spread of the coronavirus and how best to deploy their resources to help people with COVID-19. We are building on the work described here and looking for more ways to use AI to help address this global crisis.
This blog post was updated on October 20. We will continue sharing information here on Facebook AI’s work related to COVID-19.