Figure: Overview of the ColonNeRF architecture. The region division module splits the colon into regions, each containing a core area (red) and an adjacent transition zone (orange) shared with neighboring regions. Each region is processed at varying sparsity levels to produce coarse, medium, and fine data, which feed into a multi-level fusion module equipped with a DensiNet for data augmentation. Within the DensiNet, densified camera poses are fed into MipNeRF to optimize intestinal geometry learning, and a DINO-ViT module provides supervision during training. Finally, the region integration module filters, fuses, and renders information across all blocks.
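To make the piecewise pipeline concrete, here is a minimal sketch of region division with overlapping transition zones, followed by per-region reconstruction and integration. All names here (Region, divide_regions, fit_region, fuse_regions) are hypothetical illustrations, not interfaces from the released ColonNeRF code.

```python
# Illustrative skeleton of block-wise reconstruction: divide a long frame
# sequence into regions with overlapping transition zones, reconstruct each
# region independently, then filter/fuse the blocks. Hypothetical names.
from dataclasses import dataclass


@dataclass
class Region:
    core: range        # frame indices of the core area (red in the figure)
    transition: range  # overlapping zone shared with the next region (orange)


def divide_regions(num_frames: int, core_len: int = 40, overlap: int = 10):
    """Split a frame sequence into regions with overlapping transition zones."""
    regions, start = [], 0
    while start < num_frames:
        core_end = min(start + core_len, num_frames)
        trans_end = min(core_end + overlap, num_frames)
        regions.append(Region(range(start, core_end), range(core_end, trans_end)))
        start = core_end
    return regions


def reconstruct(frames, regions, fit_region, fuse_regions):
    """Reconstruct each region on its own frames, then integrate all blocks."""
    blocks = [fit_region([frames[i] for i in [*r.core, *r.transition]])
              for r in regions]
    return fuse_regions(blocks)  # filter, fuse, and render across blocks
```

In this sketch each transition zone is shared with the following region's core, so neighboring blocks observe common frames that the integration step can use when filtering and fusing overlapping geometry.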
Colonoscopy reconstruction is pivotal for diagnosing colorectal cancer. However, accurate long-sequence colonoscopy reconstruction faces three major challenges: (1) dissimilarity among colon segments due to the colon's meandering and convoluted shape; (2) the co-existence of simple and intricately folded geometric structures; (3) sparse viewpoints caused by constrained camera trajectories.
To tackle these challenges, we introduce a new reconstruction framework based on neural radiance fields (NeRF), named ColonNeRF, which leverages neural rendering for novel view synthesis of long-sequence colonoscopy. Specifically, to reconstruct the entire colon in a piecewise manner, ColonNeRF introduces region division and integration modules that reduce shape dissimilarity and ensure geometric consistency within each segment. To learn both simple and complex geometry in a unified framework, ColonNeRF incorporates a multi-level fusion module that progressively models the colon regions from easy to hard. Additionally, to overcome the challenge of sparse views, we devise a DensiNet module that densifies camera poses under the guidance of semantic consistency.
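As a rough illustration of the sparse-view idea, the sketch below interpolates extra camera poses between consecutive sparse views and scores rendered novel views against nearby real frames using a frozen DINO-ViT feature extractor, in the spirit of the semantic-consistency guidance described above. The function names, the linear translation interpolation, and the cosine-similarity loss are assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of pose densification with semantic-consistency
# supervision; not the released DensiNet code.
import torch
import torch.nn.functional as F


def densify_poses(poses: torch.Tensor, n_new: int) -> torch.Tensor:
    """Interpolate translations between consecutive sparse camera poses.

    poses: (N, 4, 4) camera-to-world matrices. Rotation interpolation is
    omitted for brevity; a full version would slerp quaternions as well.
    """
    new = []
    for a, b in zip(poses[:-1], poses[1:]):
        for t in torch.linspace(0, 1, n_new + 2)[1:-1]:  # interior points only
            p = a.clone()
            p[:3, 3] = (1 - t) * a[:3, 3] + t * b[:3, 3]
            new.append(p)
    return torch.stack(new)


def semantic_consistency_loss(dino, rendered, reference):
    """Match DINO-ViT features of a rendered novel view to a nearby real view."""
    f_render = dino(rendered)   # (B, C) features from the frozen ViT (assumed)
    f_real = dino(reference)
    return 1.0 - F.cosine_similarity(f_render, f_real, dim=-1).mean()
```

A full version would also combine this semantic loss with the usual photometric NeRF losses on the original sparse views.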
We conduct extensive experiments on both synthetic and real-world datasets to evaluate ColonNeRF. Quantitatively, ColonNeRF outperforms existing methods on two benchmarks across four evaluation metrics; notably, our LPIPS-ALEX scores improve by about 67%-85% on the SimCol-to-3D dataset. Qualitatively, our reconstructions show much clearer textures and more accurate geometric details. These results demonstrate consistent superiority over state-of-the-art methods.
@article{shi2023colonnerf,
title = {ColonNeRF: Neural Radiance Fields for High-Fidelity Long-Sequence Colonoscopy Reconstruction},
author = {Shi, Yufei and Lu, Beijia and Liu, Jia-Wei and Li, Ming and Shou, Mike Zheng},
journal = {arXiv preprint arXiv:2312.02015},
year = {2023},
}