Scott Wen-tau Yih

RESEARCH SCIENTIST | SEATTLE, UNITED STATES

Scott is a Research Scientist at Facebook AI Research (FAIR). His general research interests include natural language processing, machine learning, and information retrieval. Over the years he has worked on a variety of problems, including information extraction, semantic role labeling, email spam filtering, keyword extraction, and search and ad relevance. His recent work focuses on continuous representations and neural network models, with applications in knowledge base embedding, semantic parsing, and question answering.

Scott's Work

Scott's Publications

December 17, 2024

NLP

FLAME: Factuality-Aware Alignment for Large Language Models

Jack Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Scott Yih, Xilun Chen

December 11, 2024

NLP

COMPUTER VISION

Meta CLIP 1.2

Hu Xu, Bernie Huang, Ellen Tan, Ching-Feng Yeh, Jacob Kahn, Christine Jou, Gargi Ghosh, Omer Levy, Luke Zettlemoyer, Scott Yih, Philippe Brunet, Kim Hazelwood, Ramya Raghavendra, Daniel Li (FAIR), Saining Xie, Christoph Feichtenhofer

June 14, 2024

NLP

How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Scott Yih, Xilun Chen

November 17, 2023

NLP

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Scott Yih, Luke Zettlemoyer, Mike Lewis, Hannaneh Hajishirzi, Kalpesh Krishna, Mohit Iyyer, Pang Wei Koh, Sewon Min, Xinxi Lyu

September 03, 2023

NLP

COMPUTER VISION

Retrieval-Augmented Multimodal Language Modeling

Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Scott Yih

June 20, 2023

NLP

CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

Minghan Li, Jack Lin, Barlas Oguz, Asish Ghoshal, Jimmy Lin, Yashar Mehdad, Scott Yih, Xilun Chen

June 06, 2023

NLP

Task-aware Retrieval with Instructions

Akari Asai, Timo Schick, Patrick Lewis, Xilun Chen, Gautier Izacard, Sebastian Riedel, Hannaneh Hajishirzi, Scott Yih

June 05, 2023

NLP

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering

Yung-Sung Chuang, Wei Fang, Daniel Li (FAIR), Scott Yih, James Glass

October 31, 2022

RESEARCH

NLP

Autoregressive Search Engines: Generating Substrings as Document Identifiers

Fabio Petroni, Giuseppe Ottaviano, Michele Bevilacqua, Patrick Lewis, Scott Yih, Sebastian Riedel

July 07, 2022

NLP

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training

Xilun Chen, Armen Aghajanyan, Barlas Oguz, Scott Yih, Sonal Gupta, Patrick Huber

May 22, 2022

UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning

Madian Khabsa, Amjad Almahairi, Hao Ma, Lambert Mathias, Rui Hou, Scott Yih, Yuning Mao, Jiawei Han

May 06, 2022

Domain-matched Pre-training Tasks for Dense Retrieval

Barlas Oguz, Aleksandra Piktus, Anchit Gupta, Kushal Lakhotia, Patrick Lewis, Scott Yih, Sebastian Riedel, Sonal Gupta, Vladimir Karpukhin, Xilun Chen, Yashar Mehdad

October 15, 2020

CONVERSATIONAL AI

NLP

An Imitation Game for Learning Semantic Parsers from User Interaction

Scott Yih, Huan Sun, Yiqi Tang, Yu Su, Ziyu Yao

June 25, 2020

NLP

Language Models as Fact Checkers?

Madian Khabsa, Belinda Li, Hao Ma, Scott Yih, Sinong Wang, Nayeon Lee

May 06, 2020

RESEARCH

NLP

TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data

Scott Yih, Sebastian Riedel, Graham Neubig, Pengcheng Yin
