PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding

Haoze Zhang1, Tianyu Huang1, Zichen Wan1, Xiaowei Jin1, Hongzhi Zhang1, Hui Li1, Wangmeng Zuo1*
1Harbin Institute of Technology
*Corresponding Author

Demo Showcase

PhysChoreo enables a diverse range of physics-controllable video generation capabilities

Abstract

While recent video generation models have achieved impressive visual fidelity, they often lack explicit physical controllability and plausibility. To address this, several recent studies have attempted to guide video generation with physics-based rendering. However, these methods face inherent challenges in accurately modeling complex physical properties and in controlling the resulting physical behavior over extended temporal sequences.

In this work, we introduce PhysChoreo, a novel framework that can generate videos with diverse controllability and physical realism from a single image. Our method consists of two stages: first, it estimates the static initial physical properties of all objects in the image through part-aware physical property reconstruction. Then, through temporally instructed and physically editable simulation, it synthesizes high-quality videos with rich dynamic behaviors and physical realism.

Experimental results show that PhysChoreo can generate videos with rich behaviors and physical realism, outperforming state-of-the-art methods on multiple evaluation metrics.

Method Pipeline

PhysChoreo Pipeline

Overview of our pipeline. Given an input image and a text prompt, we first reconstruct the initial material field of each object in the image. We then generate a trajectory video of the scene with a physics-editable simulator driven by temporal instructions, and finally use the trajectory video as a conditioning signal to guide a generative video model.
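The core idea of the second stage, a simulation whose physical properties stay editable at chosen timesteps, can be illustrated with a toy sketch. Everything below is hypothetical and greatly simplified (a single bouncing point mass, not the paper's material-field simulator); it only shows how a temporal instruction might edit a material property mid-rollout, e.g. a "melt"-style edit that removes elasticity partway through.

```python
from dataclasses import dataclass
from typing import Callable

# Toy illustration, not the paper's implementation: a point mass whose
# material is edited mid-simulation by a "temporal instruction".

@dataclass
class Material:
    elasticity: float  # restitution coefficient in [0, 1]

@dataclass
class Ball:
    height: float
    velocity: float
    material: Material

def step(ball: Ball, dt: float = 0.01, g: float = 9.8) -> None:
    """Advance the toy simulation one step, with a ground plane at height 0."""
    ball.velocity -= g * dt
    ball.height += ball.velocity * dt
    if ball.height < 0.0:  # collision: reflect with material-dependent energy loss
        ball.height = 0.0
        ball.velocity = -ball.velocity * ball.material.elasticity

def rollout(ball: Ball, steps: int,
            instructions: dict[int, Callable[[Ball], None]]) -> list[float]:
    """Roll out the simulation, applying temporal instructions at given steps."""
    trajectory = []
    for t in range(steps):
        if t in instructions:
            instructions[t](ball)  # edit physical properties on the fly
        step(ball)
        trajectory.append(ball.height)
    return trajectory

# "Melt"-style instruction: at step 200, make the ball fully inelastic,
# so it stops bouncing and settles on the ground.
ball = Ball(height=1.0, velocity=0.0, material=Material(elasticity=0.9))
traj = rollout(ball, steps=400,
               instructions={200: lambda b: setattr(b.material, "elasticity", 0.0)})
```

In the actual pipeline the rollout's rendered frames form the trajectory video that conditions the generative model; here the list of heights stands in for that trajectory.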

Comparison with State-of-the-Art Methods

We compare PhysChoreo with CogVideo, PhysGen3D, Veo, and Wan across diverse physics scenarios.

Bounce

Ball Teddy (1/3)

Ours

CogVideo

PhysGen3D

Veo

Wan

Crash

Splash Balls (1/1)

Ours

CogVideo

PhysGen3D

Veo

Wan

Fall

Knife (1/2)

Ours

CogVideo

PhysGen3D

Veo

Wan

Fly

Hat Trashcan (1/2)

Ours

CogVideo

PhysGen3D

Veo

Wan

Melt

Teddy (1/2)

Ours

CogVideo

PhysGen3D

Veo

Wan

BibTeX

@article{zhang2025physchoreo,
  author    = {Haoze Zhang and Tianyu Huang and Zichen Wan and Xiaowei Jin and Hongzhi Zhang and Hui Li and Wangmeng Zuo},
  title     = {PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding},
  journal   = {arXiv preprint},
  year      = {2025},
}