Scott Wen-tau Yih

RESEARCH SCIENTIST | SEATTLE, UNITED STATES

Scott is a Research Scientist at Facebook AI Research (FAIR). His general research interests include natural language processing, machine learning and information retrieval. Over the years he has worked on a variety of problems, including information extraction, semantic role labeling, email spam filtering, keyword extraction and search & ad relevance. His recent work focuses on continuous representations and neural network models, with applications in knowledge base embedding, semantic parsing and question answering.

Scott's Work

Scott's Publications

April 25, 2025

RESEARCH

NLP

ReasonIR: Training Retrievers for Reasoning Tasks

Qiao Rui, Varsha Kishore, Niklas Muennighoff, Daniela Rus, Bryan Kian Hsiang Low, Sewon Min, Pang Wei Koh, Luke Zettlemoyer, Rulin Shao, Scott Yih, Victoria Lin

December 17, 2024

NLP

FLAME: Factuality-Aware Alignment for Large Language Models

Luyu Gao, Jimmy Lin, Xilun Chen, Barlas Oguz, Jack Lin, Scott Yih, Wenhan Xiong

December 11, 2024

NLP

COMPUTER VISION

Meta CLIP 1.2

Saining Xie, Hu Xu, Bernie Huang, Ching-Feng Yeh, Christine Jou, Christoph Feichtenhofer, Daniel Li (FAIR), Ellen Tan, Gargi Ghosh, Jacob Kahn, Kim Hazelwood, Luke Zettlemoyer, Omer Levy, Philippe Brunet, Ramya Raghavendra, Scott Yih

June 14, 2024

NLP

How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

Akari Asai, Minghan Li, Jimmy Lin, Xilun Chen, Barlas Oguz, Sheng-Chieh Lin, Scott Yih

November 17, 2023

NLP

FactScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Scott Yih, Luke Zettlemoyer, Mike Lewis, Hannaneh Hajishirzi, Kalpesh Krishna, Mohit Iyyer, Pang Wei Koh, Sewon Min, Xinxi Lyu

September 03, 2023

NLP

COMPUTER VISION

Retrieval-Augmented Multimodal Language Modeling

Jure Leskovec, Percy Liang, Scott Yih, Armen Aghajanyan, Luke Zettlemoyer, Michihiro Yasunaga, Mike Lewis, Rich James, Weijia Shi

June 20, 2023

NLP

CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

Jimmy Lin, Xilun Chen, Asish Ghoshal, Barlas Oguz, Jack Lin, Minghan Li, Scott Yih, Yashar Mehdad

June 06, 2023

NLP

Task-aware Retrieval with Instructions

Sebastian Riedel, Hannaneh Hajishirzi, Scott Yih, Akari Asai, Gautier Izacard, Patrick Lewis, Timo Schick, Xilun Chen

June 05, 2023

NLP

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering

Yung-Sung Chuang, Wei Fang, James Glass, Scott Yih, Daniel Li (FAIR)

October 31, 2022

RESEARCH

NLP

Autoregressive Search Engines: Generating Substrings as Document Identifiers

Fabio Petroni, Giuseppe Ottaviano, Michele Bevilacqua, Patrick Lewis, Scott Yih, Sebastian Riedel

July 07, 2022

NLP

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training

Xilun Chen, Armen Aghajanyan, Barlas Oguz, Scott Yih, Sonal Gupta, Patrick Huber

May 22, 2022

UNIPELT: A Unified Framework for Parameter-Efficient Language Model Tuning

Madian Khabsa, Amjad Almahairi, Hao Ma, Lambert Mathias, Rui Hou, Scott Yih, Yuning Mao, Jiawei Han

May 06, 2022

Domain-matched Pre-training Tasks for Dense Retrieval

Barlas Oguz, Aleksandra Piktus, Anchit Gupta, Kushal Lakhotia, Patrick Lewis, Scott Yih, Sebastian Riedel, Sonal Gupta, Vladimir Karpukhin, Xilun Chen, Yashar Mehdad

October 15, 2020

CONVERSATIONAL AI

NLP

An Imitation Game for Learning Semantic Parsers from User Interaction

Scott Yih, Huan Sun, Yiqi Tang, Yu Su, Ziyu Yao

June 25, 2020

NLP

Language Models as Fact Checkers?

Madian Khabsa, Belinda Li, Hao Ma, Scott Yih, Sinong Wang, Nayeon Lee

May 06, 2020

RESEARCH

NLP

TaBert: Pretraining for Joint Understanding of Textual and Tabular Data

Scott Yih, Sebastian Riedel, Graham Neubig, Pengcheng Yin
