META FAIR RESEARCH

Seamless Interaction

We are introducing a family of audiovisual behavioral motion models, trained on the Seamless Interaction Dataset and compatible with both 2D and 3D renderings.

RESEARCH OVERVIEW

Modeling two-party conversation dynamics

Advancing AI research on modeling face-to-face dynamics, including expressive gestures, active listening, turn-taking, and visual synchrony.

RESEARCH CAPABILITY

Dyadic motion models

Our generative motion models, trained on dual-sided conversations, can generate synchronous reactions, body gestures, and facial expressions, as seen in the examples below.

Research model output: Our motion model generates facial expressions and body gestures that match the flow of conversation. Watch as the AI-generated individual on the right uses hand gestures in sync with their words, for example when saying "chill".

Research model output: Observe how the AI-generated individual on the right raises their hands as they look up, emphasizing an expressive point.

Research model output: The AI-generated individual on the left side actively listens, nodding and maintaining eye contact while backchanneling.

Research model output: Our motion model captures the dynamic interplay of gestures and facial expressions that unfold throughout a conversation.

Additional capabilities

Our dyadic models can react to visual inputs and offer control over facial expressiveness.

Controllability

Research model output: Here are two versions of the same avatar, with the individual on the right exhibiting greater expressiveness than the one on the left.

Research model output: Notice how the more expressive avatar smiles more broadly and nods its head more actively.

Visual input

Research model output: The responder reacts to the initiator's playful wink.

Research model output: The responder reacts to the initiator's expression of surprise.

RENDERING COMPATIBILITY

2D videos and 3D Codec Avatars

The outputs of our dyadic motion models are compatible with 2D and 3D renderings.

Visual rendering in 3D Codec Avatar style

Visual rendering in 2D photorealistic style

DATASET

Seamless Interaction Dataset

The Seamless Interaction Dataset comprises over 4,000 hours of full-body, in-person, human face-to-face interaction videos. All our dyadic motion models were trained using this dataset.

Explore the dataset

4,000+ Human participants

4,000+ participants, featuring naturalistic conversations between familiar pairs and professional actors.

65,000+ Interactions

65,000+ individual interactions, ranging from casual to intense moments.

4,000+ Hours

4,000+ hours of dyadic conversations, highlighting the breadth of conversational dynamics.

5,000+ Annotated samples

5,000+ detailed annotations capturing self-described internal emotional states and visual behaviors.

1,300+ Unique prompts

1,300+ unique interaction scenarios based on established psychological theory.

4K Video recordings

Videos recorded in 4K resolution.
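
For readers who want to pull the data directly, below is a minimal sketch of one way to fetch a slice of the dataset from HuggingFace using the huggingface_hub library. The repository ID "facebook/seamless-interaction" and the file patterns are assumptions for illustration; check the dataset card for the actual repository name and directory layout.

# Minimal sketch: fetch a subset of the Seamless Interaction Dataset from HuggingFace.
# The repo_id and file patterns below are assumptions; consult the dataset card for the
# real repository name and layout before running.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="facebook/seamless-interaction",   # assumed repository ID
    repo_type="dataset",
    allow_patterns=["*.json", "*.md"],         # start with metadata/annotations; the videos are large
    local_dir="./seamless_interaction",
)
print(f"Dataset files downloaded to: {local_path}")

The same call with different allow_patterns (or none at all) would fetch the video recordings themselves, at the cost of considerably more disk space.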

Introducing a new era in AI communication

We explore dyadic motion modeling and its potential to transform the way we interact with AI systems, enabling more nuanced, expressive, and human-like interactions.

RESOURCES

Explore more of Seamless Interaction

Read the paper
Read the blog post
Download the dataset on GitHub
Download the dataset on HuggingFace