July 27, 2019
The majority of conversations a dialogue agent sees over its lifetime occur after it has already been trained and deployed, leaving a vast store of potential training signal untapped. In this work, we propose the self-feeding chatbot, a dialogue agent with the ability to extract new training examples from the conversations it participates in. As our agent engages in conversation, it also estimates user satisfaction with its responses. When the conversation appears to be going well, the user's responses become new training examples to imitate. When the agent believes it has made a mistake, it asks for feedback; learning to predict the feedback that will be given improves the chatbot's dialogue abilities further. On the PersonaChat chit-chat dataset with over 131k training examples, we find that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.
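The loop the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `StubAgent` class, the `self_feeding_turn` helper, and the satisfaction threshold are all hypothetical names chosen here for clarity.

```python
SATISFACTION_THRESHOLD = 0.5  # hypothetical cutoff for "conversation going well"

class StubAgent:
    """Toy stand-in for the dialogue agent, for illustration only."""

    def respond(self, context):
        return "ok"

    def estimate_satisfaction(self, context, user_message):
        # Toy heuristic: pretend very short user replies signal dissatisfaction.
        return 1.0 if len(user_message.split()) > 2 else 0.0

    def request_feedback(self, context):
        return "you should have asked me a question"

def self_feeding_turn(agent, history, user_message,
                      dialogue_examples, feedback_examples):
    """One turn of the self-feeding loop: possibly harvest a training example."""
    satisfaction = agent.estimate_satisfaction(history, user_message)
    if satisfaction >= SATISFACTION_THRESHOLD:
        # Going well: the user's own response becomes a new example to imitate.
        dialogue_examples.append((list(history), user_message))
    else:
        # Apparent mistake: ask for feedback and store it as a target
        # for the auxiliary feedback-prediction task.
        feedback = agent.request_feedback(history + [user_message])
        feedback_examples.append((history + [user_message], feedback))
    return agent.respond(history + [user_message])
```

In this sketch, high-satisfaction turns grow the imitation dataset while low-satisfaction turns grow the feedback dataset, mirroring the two sources of new supervision the abstract names.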
Publisher
ACL