EgoCS-400K: An Egocentric Gameplay Dataset for World Models
EgoCS-400K is a new large‑scale egocentric gameplay dataset that provides temporally aligned video, action, and language trajectories to support the development of interactive world models.
Advances in video generation have highlighted the need for interactive world models. Training such models takes more than video paired with static captions: the data must couple visual observations with the underlying actions, camera motions, state changes, and events that drive future scene evolution.
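To make "temporally aligned" concrete, the sketch below shows one hypothetical way to pair per-frame actions with language annotations that cover time spans. The class names (`Step`, `Caption`), their fields, the helper `caption_at`, and the sample values are illustrative assumptions, not the dataset's actual schema.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One time step: a video frame paired with the action taken at that frame.

    All fields are hypothetical and chosen only for illustration.
    """
    t: float          # timestamp in seconds
    frame_index: int  # index of the frame in the video
    action: str       # action label, kept as a plain string for simplicity


@dataclass
class Caption:
    """A language annotation covering the half-open time span [start, end)."""
    start: float
    end: float
    text: str


def caption_at(captions: List[Caption], t: float) -> Optional[str]:
    """Return the text of the caption whose span contains t, or None.

    Assumes captions are sorted by start time and do not overlap.
    """
    starts = [c.start for c in captions]
    i = bisect_right(starts, t) - 1  # last caption starting at or before t
    if i >= 0 and captions[i].start <= t < captions[i].end:
        return captions[i].text
    return None


# Made-up sample trajectory: three steps and two caption spans.
steps = [
    Step(0.0, 0, "move_forward"),
    Step(0.5, 16, "turn_left"),
    Step(1.0, 32, "fire"),
]
captions = [
    Caption(0.0, 0.8, "The player advances down a corridor."),
    Caption(0.8, 1.5, "The player engages a target."),
]

# Join the three modalities on time: (frame, action, caption) per step.
aligned = [(s.frame_index, s.action, caption_at(captions, s.t)) for s in steps]
```

Joining on timestamps rather than on list position is a design choice here: it lets captions span many frames while actions stay per-frame, which is one plausible reading of a video, action, and language trajectory.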
Challenges in Data Acquisition
The scarcity of datasets that simultaneously capture egocentric video, precise action labels, and corresponding language descriptions limits the training of robust world models. Web video collections offer visual diversity but lack executable actions and reliable state annotations, while robotic datasets provide fine‑grained state and action supervision at high