Scott Wen-tau Yih

RESEARCH SCIENTIST | SEATTLE, UNITED STATES

Scott is a Research Scientist at Facebook AI Research (FAIR). His general research interests include natural language processing, machine learning, and information retrieval. Over the years he has worked on a variety of problems, including information extraction, semantic role labeling, email spam filtering, keyword extraction, and search and ad relevance. His recent work focuses on continuous representations and neural network models, with applications in knowledge base embedding, semantic parsing, and question answering.

Research Areas

Computer Vision

Conversational AI

Human & Machine Intelligence

Natural Language Processing (NLP)

Ranking & Recommendations

Scott's Work

Model-based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study

Scott's Publications

July 13, 2026

AR/VR

RESEARCH

S-EMBER: A Large-Scale Benchmark for Streaming Egocentric Memory Retrieval

Xiaodong Wang, Xuanyi Zhao, Pedro Rodriguez, Devendra Singh Sachan, Barlas Oguz, Seungwhan Moon, Shang-Wen Li, Gargi Ghosh, Xin Dong, Wen-Tau Yih

RESEARCH

NLP

ReasonIR: Training Retrievers for Reasoning Tasks

NLP

FLAME: Factuality-Aware Alignment for Large Language Models

NLP

COMPUTER VISION

Meta CLIP 1.2

NLP

How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

NLP

FactScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

NLP

COMPUTER VISION

Retrieval-Augmented Multimodal Language Modeling

NLP

CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

NLP

Task-aware Retrieval with Instructions

NLP

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering

RESEARCH

NLP

Autoregressive Search Engines: Generating Substrings as Document Identifiers

NLP

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training

UNIPELT: A Unified Framework for Parameter-Efficient Language Model Tuning

Domain-matched Pre-training Tasks for Dense Retrieval

CONVERSATIONAL AI

NLP

An Imitation Game for Learning Semantic Parsers from User Interaction

NLP

Language Models as Fact Checkers?

RESEARCH

NLP

TaBert: Pretraining for Joint Understanding of Textual and Tabular Data