Humans can intuitively parallelise complex activities, but can a model learn this from observing a single person?
Given one egocentric video, we introduce the N-Body Problem: how N individuals can hypothetically perform the
same set of tasks observed in this video.
The goal is to maximise speed-up, but naive assignment of video segments to
individuals often violates real-world constraints, leading to physically impossible scenarios like two people using the
same object or occupying the same space.
We formalise the N-Body Problem and propose a suite of metrics to evaluate both performance (speed-up, task coverage) and feasibility (spatial collisions, object conflicts and causal constraints).
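To illustrate the feasibility side of these metrics, here is a minimal sketch (field names, the toy schedule, and the conflict rule are hypothetical placeholders, not the paper's actual metric definitions) that scores a candidate N-person schedule by its speed-up over the single-person runtime and flags object conflicts where two people would need the same object at overlapping times; spatial and causal checks would follow the same pattern.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One video segment assigned to a worker (all fields illustrative)."""
    worker: int      # which of the N individuals performs this segment
    start: float     # scheduled start time (seconds)
    end: float       # scheduled end time (seconds)
    objects: set     # objects the segment requires

def speed_up(single_person_duration: float, schedule: list[Segment]) -> float:
    """Speed-up = single-person runtime / makespan of the parallel schedule."""
    makespan = max(seg.end for seg in schedule)
    return single_person_duration / makespan

def object_conflicts(schedule: list[Segment]) -> list[tuple[Segment, Segment]]:
    """Pairs of segments where two different people need the same object
    at overlapping times -- one simple notion of infeasibility."""
    conflicts = []
    for i, a in enumerate(schedule):
        for b in schedule[i + 1:]:
            overlap = a.start < b.end and b.start < a.end
            if overlap and a.worker != b.worker and a.objects & b.objects:
                conflicts.append((a, b))
    return conflicts

# Toy 2-body schedule: both workers want the 'pan' at the same time.
schedule = [
    Segment(worker=0, start=0.0, end=30.0, objects={"pan", "stove"}),
    Segment(worker=1, start=10.0, end=40.0, objects={"pan", "knife"}),
]
print(speed_up(single_person_duration=70.0, schedule=schedule))  # 1.75
print(len(object_conflicts(schedule)))                           # 1 conflict over the pan
```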
We use structured prompting with Gemini Pro, reasoning about the 3D environment, object usage, and temporal dependencies to produce a viable parallel execution.
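For concreteness, a minimal sketch of this kind of structured prompting is shown below, using the google-generativeai Python client; the segment annotations, model id, and prompt wording are illustrative assumptions rather than the paper's actual prompt.

```python
import json
import google.generativeai as genai  # assumes the google-generativeai package

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model id

# Hypothetical per-segment annotations extracted from the single-person video.
segments = [
    {"id": 0, "action": "chop vegetables", "objects": ["knife", "board"], "location": "counter"},
    {"id": 1, "action": "boil water",      "objects": ["pot", "stove"],   "location": "stove"},
    {"id": 2, "action": "wash dishes",     "objects": ["sponge"],         "location": "sink"},
]

prompt = (
    "You are given action segments from a single-person egocentric video, "
    "each with the objects it uses and where it happens:\n"
    f"{json.dumps(segments, indent=2)}\n\n"
    "Assign each segment to one of N=2 people so that total time is minimised, "
    "while respecting causal order, never scheduling the same object for two "
    "people at once, and keeping people out of each other's space. "
    "Return a JSON schedule with worker, start time and end time per segment."
)

response = model.generate_content(prompt)
print(response.text)  # candidate parallel schedule, to be checked against the feasibility metrics
```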
Interactive Demo (move between the 1-Body [left] and 2-Body [right] execution)
Left: Single-Person Video · Right: Parallel Execution (N=2)
Video Demo
Citation
@article{zhu2025nbody,
title = {The N-Body Problem: Parallel Execution from Single-Person Egocentric Video},
author = {Zhu, Zhifan and Huang, Yifei and Sato, Yoichi and Damen, Dima},
year = {2025},
eprint = {2512.11393},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}