Given either a text prompt or a flattened poster image, OmniPSD produces layered PSD files with transparent alpha channels, separating text, foreground elements, and background into clean RGBA layers that can be edited directly in design tools. An online interactive demo is available at Lovart AI.
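To make the role of the alpha channel concrete, the following minimal sketch flattens a stack of RGBA layers back into a single image with the standard Porter-Duff "over" operator. It is an illustration only, not OmniPSD code: the names `over` and `flatten`, the straight (non-premultiplied) alpha convention, and the 1×2 toy poster are all assumptions made for this example.

```python
from typing import List, Tuple

# Straight (non-premultiplied) RGBA with every channel in [0, 1].
Pixel = Tuple[float, float, float, float]
Layer = List[List[Pixel]]  # rows of pixels; all layers share the same size


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over': place `top` on `bottom` for straight-alpha pixels."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)  # both pixels fully transparent

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(layers: List[Layer]) -> Layer:
    """Composite layers, listed bottom-to-top, into one flattened image."""
    result = [row[:] for row in layers[0]]
    for layer in layers[1:]:
        for y, row in enumerate(layer):
            for x, pixel in enumerate(row):
                result[y][x] = over(pixel, result[y][x])
    return result


# Toy 1x2 poster: opaque blue background, and a foreground that is
# half-transparent red on the left pixel and fully transparent on the right.
background = [[(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]]
foreground = [[(1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 0.0, 0.0)]]
poster = flatten([background, foreground])
print(poster)
```

Where the foreground is fully transparent the background shows through unchanged, which is what allows each layer to be edited independently and recomposited afterwards.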
OmniPSD: a unified diffusion transformer with a shared RGBA-VAE that supports both text-to-PSD layered generation and image-to-PSD decomposition, producing fully editable PSD layers with transparent alpha channels.
Image-to-PSD reconstruction uses a diffusion-based flow-matching model to iteratively decompose a flattened poster into editable layers.
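As a rough illustration of what sampling from a flow-matching model involves, the sketch below integrates a velocity field from t = 0 to t = 1 with fixed Euler steps. It is a toy under stated assumptions, not the OmniPSD pipeline: `euler_sample` and `make_oracle_velocity` are hypothetical names, the vectors stand in for image latents, and the learned, poster-conditioned network is replaced by an oracle velocity for the straight path x_t = (1 - t) x_0 + t x_1.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def euler_sample(x0: Vector, velocity: VelocityFn, steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # the last evaluated t is 1 - dt, so 1 - t never hits zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(target: Vector) -> VelocityFn:
    """Toy stand-in for a trained network (an assumption for this sketch).

    On the straight path x_t = (1 - t) * x0 + t * target, the velocity
    target - x0 can be rewritten as (target - x_t) / (1 - t).
    """

    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]

    return velocity


noise = [0.9, -0.3, 0.1]   # starting point, playing the role of noise
target = [0.0, 1.0, 0.5]   # playing the role of a clean layer latent
sample = euler_sample(noise, make_oracle_velocity(target), steps=8)
print(sample)
```

In a real model the velocity would come from a trained network conditioned on the flattened poster, and the result would be decoded into an RGBA layer; only the integration loop carries over from this sketch.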
Text-to-PSD synthesis leverages a 2×2 in-context RGBA grid and hierarchical captions within a unified diffusion transformer to jointly generate editable layers.
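The 2×2 grid idea can be sketched as tiling four equally sized images into one canvas so that a single model pass sees all of them together, then cutting the canvas back into separate layers. The helper names `pack_2x2` and `unpack_2x2`, and the choice of which image goes in which quadrant, are assumptions for illustration; the caption above does not specify the layout.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")
Image = List[List[T]]  # rows of pixels; the pixel type is left generic


def pack_2x2(tiles: Sequence[Image]) -> Image:
    """Tile four same-sized images as [top-left, top-right, bottom-left, bottom-right]."""
    if len(tiles) != 4:
        raise ValueError("expected exactly four tiles")
    top_left, top_right, bottom_left, bottom_right = tiles
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom


def unpack_2x2(grid: Image) -> List[Image]:
    """Split a 2x2 grid back into its four tiles, in the same order."""
    h, w = len(grid) // 2, len(grid[0]) // 2
    return [
        [row[:w] for row in grid[:h]],
        [row[w:] for row in grid[:h]],
        [row[:w] for row in grid[h:]],
        [row[w:] for row in grid[h:]],
    ]


# Single-letter strings stand in for RGBA pixels; the quadrant assignment
# (full poster, foreground, text, background) is hypothetical.
tiles = [
    [["P", "P"], ["P", "P"]],
    [["F", "F"], ["F", "F"]],
    [["T", "T"], ["T", "T"]],
    [["B", "B"], ["B", "B"]],
]
grid = pack_2x2(tiles)
for row in grid:
    print("".join(row))
restored = unpack_2x2(grid)
```

Because packing and unpacking are exact inverses, layers generated jointly in the grid can be recovered as individual images without resampling.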
OmniPSD delivers sharper reconstructions and more cleanly separated text, foreground, and background layers, while better preserving the original layout and colors.
We produce more coherent foreground/background layering and better-aligned editable text layers for PSD export.
@article{Liu2025OmniPSD,
title = {OmniPSD: Layered PSD Generation with Diffusion Transformer},
author = {Liu, Cheng and Song, Yiren and Wang, Haofan and Shou, Mike Zheng},
journal = {arXiv preprint arXiv:2512.09247},
year = {2025},
archivePrefix = {arXiv},
eprint = {2512.09247},
primaryClass = {cs.CV},
doi = {10.48550/arXiv.2512.09247},
url = {https://arxiv.org/abs/2512.09247}
}