TPDiff: Temporal Pyramid Video Diffusion Model

Show Lab, National University of Singapore
TL;DR: TPDiff reduces the training cost of video diffusion models by 50% and accelerates inference by 1.5 times.

Abstract

The development of video diffusion models unveils a significant challenge: the substantial computational demands. To mitigate this challenge, we note that the reverse process of diffusion exhibits an inherent entropy-reducing nature. Given the inter-frame redundancy in video modality, maintaining full frame rates in high-entropy stages is unnecessary. Based on this insight, we propose TPDiff, a unified framework to enhance training and inference efficiency. By dividing diffusion into several stages, our framework progressively increases frame rate along the diffusion process with only the last stage operating on full frame rate, thereby optimizing computational efficiency. To train the multi-stage diffusion model, we introduce a dedicated training framework: stage-wise diffusion. By solving the partitioned probability flow ordinary differential equations (ODE) of diffusion under aligned data and noise, our training strategy is applicable to various diffusion forms and further enhances training efficiency. Comprehensive experimental evaluations validate the generality of our method, demonstrating 2x faster training and 1.5x faster inference.

Qualitative Results

Method

(a) Pipeline of temporal pyramid video diffusion model. We divide diffusion process into multiple stages with increasing frame rate. In each stage, new frames are initially temporally interpolated from existing frames. b) Our training strategy: stage-wise diffusion. In vanilla diffusion models, the noise direction along the ODE path points toward the real data distribution. In stage-wise diffusion, the noise direction is oriented to the end point of the current stage.

BibTeX

@misc{tpdiff,
      title={TPDiff: Temporal Pyramid Video Diffusion Model}, 
      author={Lingmin Ran and Mike Zheng Shou},
      year={2025},
      eprint={2503.09566},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.09566}, 
}