RESEARCH

NLP

Learning an Unreferenced Metric for Online Dialogue Evaluation

June 25, 2020

Abstract

Evaluating the quality of a dialogue interaction between two agents is a difficult task, especially in open-domain chit-chat style dialogue. There have been recent efforts to develop automatic dialogue evaluation metrics, but most of them either fail to generalize to unseen datasets or require a human-generated reference response during inference, making them infeasible for online evaluation. Here, we propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances, and leverages the temporal transitions that exist between them. We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
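To make the idea concrete, below is a minimal sketch of an unreferenced scorer in the spirit of the abstract: a frozen pre-trained language model embeds each utterance, a recurrent layer summarizes the temporal transitions across turns, and a learned head scores a candidate response against that summary. All names (UnreferencedScorer, ctx_proj, resp_proj), dimensions, and the dot-product scoring head are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class UnreferencedScorer(nn.Module):
    """Scores a candidate response given only the dialogue context
    (no human reference response needed)."""

    def __init__(self, lm_name="bert-base-uncased", hidden=256):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(lm_name)
        # Use the pre-trained LM as a frozen feature extractor.
        self.encoder = AutoModel.from_pretrained(lm_name)
        for p in self.encoder.parameters():
            p.requires_grad = False
        dim = self.encoder.config.hidden_size
        # A recurrent layer over per-utterance embeddings models the
        # temporal transitions between dialogue turns.
        self.gru = nn.GRU(dim, hidden, batch_first=True)
        self.ctx_proj = nn.Linear(hidden, hidden)
        self.resp_proj = nn.Linear(dim, hidden)

    def embed(self, utterances):
        # One [CLS] vector per utterance.
        batch = self.tokenizer(utterances, padding=True, truncation=True,
                               return_tensors="pt")
        with torch.no_grad():
            out = self.encoder(**batch).last_hidden_state
        return out[:, 0]  # (n_utterances, dim)

    def forward(self, context, response):
        # context: list of utterance strings; response: candidate string.
        ctx = self.embed(context).unsqueeze(0)      # (1, turns, dim)
        _, h = self.gru(ctx)                        # h: (1, 1, hidden)
        c = self.ctx_proj(h[-1])                    # (1, hidden)
        r = self.resp_proj(self.embed([response]))  # (1, hidden)
        # Dot-product score squashed to (0, 1); higher means the response
        # is a more plausible continuation of the context.
        return torch.sigmoid((c * r).sum(dim=-1))


scorer = UnreferencedScorer()
score = scorer(["Hi, how are you?", "Great, just got back from a hike."],
               "Nice! Where did you go?")
```

At inference time the scorer needs only the context and the candidate response, so no reference is required; a model like this would typically be trained by contrasting true next utterances against sampled negative responses (e.g., utterances drawn from other dialogues).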


AUTHORS

Koustuv Sinha

Joelle Pineau

Jasmine Wang

Prasanna Parthasarathi

Ryan Lowe

William L. Hamilton

Publisher

ACL

Related Publications

January 04, 2025

NLP

Transformers are Multi-State RNNs

Matanel Oren, Michael Hassid, Yossef (Yossi) Adi, Roy Schwartz

December 17, 2024

NLP

FLAME: Factuality-Aware Alignment for Large Language Models

Jack Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Scott Yih, Xilun Chen

December 12, 2024

NLP

CORE MACHINE LEARNING

Memory Layers at Scale

Vincent-Pierre Berges, Barlas Oguz

December 12, 2024

NLP

Byte Latent Transformer: Patches Scale Better Than Tokens

Artidoro Pagnoni, Ram Pasunuru, Pedro Rodriguez, John Nguyen, Benjamin Muller, Margaret Li, Chunting Zhou, Lili Yu, Jason Weston, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Ari Holtzman, Srini Iyer
