Playable Environments: Video Manipulation in Space and Time

CVPR 2022

Willi Menapace^☨, Stéphane Lathuilière, Aliaksandr Siarohin,
Christian Theobalt, Sergey Tulyakov, Vladislav Golyanik, Elisa Ricci^*

^☨ Work partially done while interning at MPI for Informatics
^* Equal senior contribution


Action Conditioning Evaluation

In the following video, we show the effect of each learned action on the generated video sequence. Each row corresponds to a starting frame and each column to a learned action. For each learned action, we generate a video starting from the row's initial frame.
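The row and column layout described above can be sketched as a nested loop. This is only an illustrative sketch, not the authors' evaluation code: `build_action_grid`, `toy_generate`, and the `num_steps` parameter are hypothetical names, and repeating the same action at every step is an assumption made here for simplicity. The toy generator stands in for the learned model so that the example runs on its own.

```python
from typing import Callable, List, Sequence


def build_action_grid(
    start_frames: Sequence[int],
    num_actions: int,
    generate: Callable[[int, List[int]], List[int]],
    num_steps: int = 8,
) -> List[List[List[int]]]:
    """Return grid[row][col]: the video generated from start_frames[row]
    while applying action index `col` (assumed repeated at every step)."""
    grid = []
    for frame in start_frames:          # one row per starting frame
        row = []
        for action in range(num_actions):  # one column per learned action
            row.append(generate(frame, [action] * num_steps))
        grid.append(row)
    return grid


def toy_generate(frame: int, actions: List[int]) -> List[int]:
    """Hypothetical stand-in for the model: a 'frame' is an integer state
    and each action simply adds its index to the state."""
    state = frame
    video = [state]
    for action in actions:
        state += action
        video.append(state)
    return video


grid = build_action_grid([0, 10], num_actions=3, generate=toy_generate, num_steps=2)
print(grid[1][2])  # → [10, 12, 14]
```

With a real model, `generate` would wrap the trained network and each grid cell would hold a rendered clip rather than a list of integers.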

Minecraft

Our method learns a set of actions whose meaning is consistent and independent of the starting frame. The learned actions correspond to the main movement directions. Note that each action is expressed relative to the current orientation of the camera.
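To make "relative to the current orientation of the camera" concrete, the sketch below converts a camera-relative movement into a world-space direction on the ground plane. It only illustrates the geometry and is not part of the method, which learns its actions without such an explicit mapping. The action indices, the `LOCAL_DIRECTIONS` table, and the yaw convention (counter-clockwise from the world +x axis) are all assumptions made for this example.

```python
import math
from typing import Tuple

# Hypothetical action table: index -> (forward, right) in the camera frame.
LOCAL_DIRECTIONS = {
    0: (1.0, 0.0),   # forward
    1: (-1.0, 0.0),  # backward
    2: (0.0, -1.0),  # left
    3: (0.0, 1.0),   # right
}


def action_to_world(action: int, camera_yaw: float) -> Tuple[float, float]:
    """Map a camera-relative action to a world-space (x, y) direction.

    camera_yaw is in radians, measured counter-clockwise from the world +x axis.
    """
    forward, right = LOCAL_DIRECTIONS[action]
    # Camera forward axis in world coordinates: (cos yaw, sin yaw).
    # Camera right axis is forward rotated 90 degrees clockwise: (sin yaw, -cos yaw).
    x = forward * math.cos(camera_yaw) + right * math.sin(camera_yaw)
    y = forward * math.sin(camera_yaw) - right * math.cos(camera_yaw)
    return x, y


# The same "forward" action points along different world directions
# depending on where the camera is looking.
print(action_to_world(0, 0.0))
print(action_to_world(0, math.pi / 2))
```

Under these assumptions, the same action index yields a different world-space displacement for each camera yaw, which is why an action keeps its meaning across starting frames.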

Tennis

As with the Minecraft dataset, our method learns a consistent action representation on Tennis. The learned actions capture each of the possible player movements.