Publications
Research Goals: Faithfulness, Efficiency, Creativity.
2025
- [T-PAMI] LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation. Jiacheng Li, Chang Chen, Fenglong Song, Youliang Yan, and 1 more author. IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2025. Tags: Efficiency, Look-Up Table.
Image resampling is a basic technique that is widely employed in daily applications, such as camera photo editing. Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors. Still, these methods are not the perfect substitute for interpolation, due to the drawbacks in efficiency and versatility. In this work, we propose a novel method of Learning Resampling Function (termed LeRF), which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption of interpolation. Specifically, LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the hyper-parameters that determine the shapes of these resampling functions with a neural network. Based on the formulation of LeRF, we develop a family of models, including both efficiency-orientated and performance-orientated ones. To achieve interpolation-level efficiency, we adopt look-up tables (LUTs) to accelerate the inference of the learned neural network. Furthermore, we design a directional ensemble strategy and edge-sensitive indexing patterns to better capture local structures. On the other hand, to obtain DNN-level performance, we propose an extension of LeRF to enable it in cooperation with pre-trained upsampling models for cascaded resampling. Extensive experiments show that the efficiency-orientated version of LeRF runs as fast as interpolation, generalizes well to arbitrary transformations, and outperforms interpolation significantly, e.g., up to 3dB PSNR gain over Bicubic for x2 upsampling on Manga109. Besides, the performance-orientated version of LeRF reaches comparable performance with existing DNNs at much higher efficiency, e.g., less than 25% running time on a desktop GPU.
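For intuition, here is a minimal numpy sketch of the core mechanism: each source pixel carries an anisotropic Gaussian resampling function whose shape hyper-parameters (here `sx`, `sy`, `rho`, hypothetical names) would come from the learned predictor or its LUT form. The exact parameterization, window size, and ensemble strategy in the paper may differ.

```python
import numpy as np

def gaussian_weight(dx, dy, rho, sx, sy):
    """Anisotropic Gaussian resampling function; its shape is set by
    per-pixel hyper-parameters (sx, sy > 0, rho in (-1, 1)).
    Illustrative only; the paper's parameterization may differ."""
    z = (dx / sx) ** 2 - 2 * rho * (dx / sx) * (dy / sy) + (dy / sy) ** 2
    return np.exp(-z / (2 * (1 - rho ** 2) + 1e-8))

def resample(img, params, scale):
    """Resample a grayscale image (H, W) by an arbitrary `scale` with
    spatially varying kernels. `params` (H, W, 3) holds (sx, sy, rho)
    per source pixel, standing in for the network/LUT prediction."""
    h, w = img.shape
    oh, ow = int(h * scale), int(w * scale)
    out = np.zeros((oh, ow))
    for oy in range(oh):
        for ox in range(ow):
            # Target position mapped back into source coordinates.
            cy, cx = (oy + 0.5) / scale - 0.5, (ox + 0.5) / scale - 0.5
            acc, norm = 0.0, 0.0
            for y in range(max(0, int(cy) - 1), min(h, int(cy) + 3)):
                for x in range(max(0, int(cx) - 1), min(w, int(cx) + 3)):
                    sx, sy, rho = params[y, x]
                    wgt = gaussian_weight(cx - x, cy - y, rho, sx, sy)
                    acc += wgt * img[y, x]
                    norm += wgt
            out[oy, ox] = acc / max(norm, 1e-8)
    return out
```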
- [CVPR] Plug-and-Play Versatile Compressed Video Enhancement. Huimin Zeng, Jiacheng Li, and Zhiwei Xiong. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. Tags: Faithfulness, Video Codec, Quality Enhancement.
As a widely adopted technique in data transmission, video compression effectively reduces file sizes, making real-time cloud computing possible. However, it comes at the cost of visual quality, posing challenges to the robustness of downstream vision models. In this work, we present a versatile codec-aware enhancement framework that reuses codec information to adaptively enhance videos under different compression settings, assisting various downstream vision tasks without introducing a computation bottleneck. Specifically, the proposed codec-aware framework consists of a compression-aware adaptation (CAA) network that employs a hierarchical adaptation mechanism to estimate the parameters of the frame-wise enhancement network, namely the bitstream-aware enhancement (BAE) network. The BAE network further leverages temporal and spatial priors embedded in the bitstream to effectively improve the quality of compressed input frames. Extensive experimental results demonstrate the superior quality enhancement performance of our framework over existing enhancement methods, as well as its versatility in assisting multiple downstream tasks on compressed videos as a plug-and-play module. Code and models are available at https://github.com/ZeldaM1/PnP-VCVE.
- [ICLR] Learning Gain Map for Inverse Tone Mapping. Yinuo Liao, Yuanshen Guan, Ruikang Xu, Jiacheng Li, and 2 more authors. In International Conference on Learning Representations (ICLR), 2025. Tags: Faithfulness, High Dynamic Range (HDR), Inverse Tone Mapping (ITM).
For a more compatible and consistent high dynamic range (HDR) viewing experience, a new image format with a double-layer structure has been developed recently, which incorporates an auxiliary Gain Map (GM) within a standard dynamic range (SDR) image for adaptive HDR display. This new format motivates us to introduce a new task termed Gain Map-based Inverse Tone Mapping (GM-ITM), which focuses on learning the corresponding GM of an SDR image instead of directly estimating its HDR counterpart, thereby enabling a more effective up-conversion by leveraging the advantages of GM. The main challenge in this task, however, is to accurately estimate regional intensity variation with the fluctuating peak value. To this end, we propose a dual-branch network named GMNet, consisting of a Local Contrast Restoration (LCR) branch and a Global Luminance Estimation (GLE) branch to capture pixel-wise and image-wise information for GM estimation. Moreover, to facilitate future research on the GM-ITM task, we build both synthetic and real-world datasets for comprehensive evaluations: synthetic SDR-GM pairs are generated from existing HDR resources, and real-world SDR-GM pairs are captured by mobile devices. Extensive experiments on these datasets demonstrate the superiority of our proposed GMNet over existing HDR-related methods both quantitatively and qualitatively. The code and datasets are available at https://github.com/qtlark/GMNet.
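As a rough illustration of the double-layer format this task builds on, the sketch below applies a decoded Gain Map to a linear-light SDR image. The normalization constants and transfer functions here are format-dependent assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

def apply_gain_map(sdr_linear, gain_map, headroom, gm_max_log2):
    """Reconstruct an HDR rendition from an SDR image plus gain map.
    Sketch under assumed encoding conventions; real formats specify
    their own constants and transfer functions.
    sdr_linear : (H, W, 3) linear-light SDR image in [0, 1]
    gain_map   : (H, W) gain values normalized to [0, 1]
    headroom   : display peak / SDR white, e.g. 4.0 for a 4x display
    gm_max_log2: log2 of the maximum encoded gain
    """
    # Decode the normalized gain map back to log2 gain.
    log_gain = gain_map * gm_max_log2
    # Scale the applied gain to the available display headroom.
    weight = np.clip(np.log2(headroom) / gm_max_log2, 0.0, 1.0)
    return sdr_linear * (2.0 ** (log_gain * weight))[..., None]
```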
- [WACV] Multi-Spectral Image Color Reproduction. Jiacheng Li*, Chang Chen*, Xue Hu, Fenglong Song, and 2 more authors. In Winter Conference on Applications of Computer Vision (WACV), 2025. Tags: Faithfulness, Image Signal Pipeline (ISP), Color Science, Multi-Spectral.
From camera to screen, researchers have developed a well-established system for capturing and reproducing the color experience of human eyes. In this study, we aim to upgrade this process by transitioning from conventional RGB to multi-spectral image (MSI) color reproduction. While MSI offers evident advantages in color matching, we find that it is not trivial to make good use of more spectral information for color constancy. Therefore, we present a regularized color reproduction system that incorporates a spectral prior-guided optimization strategy to establish a sensor-optimized RGB projection for color matching, along with a learning-based chromatic adaptation model for color constancy. Specifically, we define the RGB projection through an end-to-end optimization under the guidance of sensor spectral sensitivities. Subsequently, we devise a chromatic adaptation neural network that estimates the scene illuminance and an illuminance-adaptive matrix for auto white balancing and dynamic color correction, respectively. Comprehensive experiments show the superiority of our system compared to alternative solutions.
- [WACV] All-in-One Image Compression and Restoration. Huimin Zeng, Jiacheng Li, Ziqiang Zheng, and Zhiwei Xiong. In Winter Conference on Applications of Computer Vision (WACV), 2025. Tags: Faithfulness, Image Codec.
Visual images corrupted by various types and levels of degradations are commonly encountered in practical image compression. However, most existing image compression methods are tailored for clean images and therefore struggle to achieve satisfactory results on these images. Joint compression and restoration methods typically focus on a single type of degradation and fail to address the variety of degradations encountered in practice. To this end, we propose a unified framework for all-in-one image compression and restoration, which incorporates the image restoration capability against various degradations into the process of image compression. The key challenges involve distinguishing authentic image content from degradations and flexibly eliminating various degradations without prior knowledge. Specifically, the proposed framework approaches these challenges from two perspectives, i.e., content information aggregation and degradation representation aggregation. Extensive experiments demonstrate the following merits of our model: 1) superior rate-distortion (RD) performance on various degraded inputs while preserving the performance on clean data; 2) strong generalization ability to real-world and unseen scenarios; 3) higher efficiency than compared methods. Our code is available at https://github.com/ZeldaM1/All-in-one.
2024
- [T-PAMI] Toward DNN of LUTs: Learning Efficient Image Restoration With Multiple Look-Up Tables. Jiacheng Li, Chang Chen, Zhen Cheng, and Zhiwei Xiong. IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2024. Tags: Efficiency, Look-Up Table.
The widespread usage of high-definition screens on edge devices stimulates a strong demand for efficient image restoration algorithms. The way of caching deep learning models in a look-up table (LUT) is recently introduced to respond to this demand. However, the size of a single LUT grows exponentially with the increase of its indexing capacity, which restricts its receptive field and thus the performance. To overcome this intrinsic limitation of the single-LUT solution, we propose a universal method to construct multiple LUTs like a neural network, termed MuLUT. Firstly, we devise novel complementary indexing patterns, as well as a general implementation for arbitrary patterns, to construct multiple LUTs in parallel. Secondly, we propose a re-indexing mechanism to enable hierarchical indexing between cascaded LUTs. Finally, we introduce channel indexing to allow cross-channel interaction, enabling LUTs to process color channels jointly. In these principled ways, the total size of MuLUT is linear to its indexing capacity, yielding a practical solution to obtain superior performance with the enlarged receptive field. We examine the advantage of MuLUT on various image restoration tasks, including super-resolution, demosaicing, denoising, and deblocking. MuLUT achieves a significant improvement over the single-LUT solution, e.g., up to 1.1dB PSNR for super-resolution and up to 2.8dB PSNR for grayscale denoising, while preserving its efficiency, which is 100× less in energy cost compared with lightweight deep neural networks. Our code and trained models are publicly available at https://github.com/ddlee-cn/MuLUT.
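The sketch below illustrates the parallel-LUT idea in numpy: several small LUTs indexed by complementary four-pixel patterns are queried and their predictions averaged, enlarging the effective receptive field while each table stays small. The pattern layouts are assumptions, the lookup is nearest-entry for brevity (real implementations interpolate between cached entries), and cascading via re-indexing is omitted.

```python
import numpy as np

def lut_query(lut, pixels, interval=16):
    """Query a cached 4D LUT with four uint8 pixels, using the nearest
    sampled entry for brevity; practical LUT models interpolate
    between neighboring entries instead."""
    idx = tuple(int(v) // interval for v in pixels)
    return lut[idx]

def mulut_forward(img, luts, interval=16):
    """Average the predictions of parallel LUTs, each indexed by a
    different four-pixel layout around (y, x). `luts` maps a pattern
    name to its cached table of shape (16, 16, 16, 16)."""
    patterns = {
        "square":  [(0, 0), (0, 1), (1, 0), (1, 1)],   # illustrative
        "dilated": [(0, 0), (0, 2), (2, 0), (2, 2)],
        "row":     [(0, 0), (0, 1), (0, 2), (0, 3)],
    }
    h, w = img.shape
    out = np.zeros((h, w))
    pad = np.pad(img, ((0, 3), (0, 3)), mode="edge")
    for y in range(h):
        for x in range(w):
            preds = [lut_query(luts[name],
                               [pad[y + dy, x + dx]
                                for dy, dx in patterns[name]], interval)
                     for name in luts]
            out[y, x] = np.mean(preds)
    return out
```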
- [T-MM] Region-Aware Portrait Retouching With Sparse Interactive Guidance. Huimin Zeng, Jie Huang, Jiacheng Li, and Zhiwei Xiong. IEEE Transactions on Multimedia (T-MM), 2024. Tags: Creativity, Image Enhancement, Human Interaction.
Portrait retouching aims to improve the aesthetic quality of input portrait photos and especially requires human-region priority. Deep learning-based methods generally produce promising retouched results. However, existing portrait retouching methods focus on automatic retouching, which treats all human regions equally and ignores users’ preferences for specific individuals, thus suffering from limited flexibility in interactive scenarios. In this work, we emphasize the importance of users’ intents and explore the interactive portrait retouching task. Specifically, we propose a region-aware retouching framework with two branches: an automatic branch and an interactive branch. The automatic branch involves an encoding-decoding process, which searches region candidates and performs automatic region-aware retouching without user guidance. The interactive branch encodes sparse user guidance into a priority condition vector and modulates latent features with a region selection module to further emphasize the user-specified regions. Experimental results show that our interactive branch effectively captures users’ intents and generalizes well to unseen scenes with sparse user guidance, while our automatic branch also outperforms the state-of-the-art retouching methods due to improved region-awareness.
- [MICCAI] Joint EM Image Denoising and Segmentation with Instance-Aware Interaction. Zhicheng Wang, Jiacheng Li, Yinda Chen, Jiateng Shou, and 3 more authors. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024. Tags: Creativity, Instance Segmentation.
In large-scale electron microscopy (EM), the demand for rapid imaging often results in significant imaging noise, which considerably compromises segmentation accuracy. While conventional approaches typically incorporate denoising as a preliminary stage, there is limited exploration into the potential synergies between denoising and segmentation processes. To bridge this gap, we propose an instance-aware interaction framework to tackle EM image denoising and segmentation simultaneously, aiming at mutual enhancement between the two tasks. Specifically, our framework comprises three components: a denoising network, a segmentation network, and a fusion network facilitating feature-level interaction. Firstly, the denoising network mitigates noise degradation. Subsequently, the segmentation network learns an instance-level affinity prior, encoding vital spatial structural information. Finally, in the fusion network, we propose a novel Instance-aware Embedding Module (IEM) to utilize vital spatial structure information from segmentation features for denoising. IEM enables interaction between the two tasks within a unified framework, which also facilitates implicit feedback from denoising for segmentation with a joint training mechanism. Through extensive experiments across multiple datasets, our framework demonstrates substantial performance improvements over existing solutions. Moreover, our framework exhibits strong generalization capabilities across different network architectures. Code is available at https://github.com/zhichengwang-tri/EM-DenoiSeg.
- [ACM MM] MLP Embedded Inverse Tone Mapping. Panjun Liu, Jiacheng Li, Lizhi Wang, Zheng-Jun Zha, and 1 more author. In ACM International Conference on Multimedia (ACM MM), 2024. Tags: Faithfulness, High Dynamic Range (HDR), Inverse Tone Mapping (ITM).
The advent of High Dynamic Range/Wide Color Gamut (HDR/WCG) display technology has made significant progress in providing exceptional richness and vibrancy for the human visual experience. However, the widespread adoption of HDR/WCG images is hindered by their substantial storage requirements, imposing significant bandwidth challenges during distribution. Besides, HDR/WCG images are often tone-mapped into Standard Dynamic Range (SDR) versions for compatibility, necessitating the usage of inverse Tone Mapping (iTM) techniques to reconstruct their original representation. In this work, we propose a meta-transfer learning framework for practical HDR/WCG media transmission by embedding image-wise metadata into their SDR counterparts for later iTM reconstruction. Specifically, we devise a meta-learning strategy to pre-train a lightweight multilayer perceptron (MLP) model that maps SDR pixels to HDR/WCG ones on an external dataset, resulting in a domain-wise iTM model. Subsequently, for the transfer learning process of each HDR/WCG image, we present a spatial-aware online mining mechanism to select challenging training pairs to adapt the meta-trained model to an image-wise iTM model. Finally, the adapted MLP, embedded as metadata, is transmitted alongside the SDR image, facilitating the reconstruction of the original image on HDR/WCG displays. We conduct extensive experiments and evaluate the proposed framework with diverse metrics. Compared with existing solutions, our framework shows superior performance in fidelity, minimal latency, and negligible overhead. The codes are available at https://github.com/pjliu3/MLP_iTM.
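To give a sense of scale, the sketch below builds a per-image pixel-to-pixel MLP of the kind that could be shipped as metadata alongside an SDR image; the layer sizes are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

class TinyMLP:
    """A per-image pixel-to-pixel MLP small enough to ship as metadata.
    Sizes here are illustrative assumptions."""
    def __init__(self, dims=(3, 16, 16, 3), seed=0):
        rng = np.random.default_rng(seed)
        self.weights = [rng.normal(0, 0.1, (i, o))
                        for i, o in zip(dims, dims[1:])]
        self.biases = [np.zeros(o) for o in dims[1:]]

    def __call__(self, sdr_pixels):
        """Map (N, 3) SDR pixels in [0, 1] to HDR/WCG pixel values."""
        h = sdr_pixels
        for w, b in zip(self.weights[:-1], self.biases[:-1]):
            h = np.maximum(h @ w + b, 0.0)              # ReLU hidden layers
        return h @ self.weights[-1] + self.biases[-1]   # linear output

mlp = TinyMLP()
n_params = (sum(w.size for w in mlp.weights)
            + sum(b.size for b in mlp.biases))
print(f"metadata size ~ {n_params * 4 / 1024:.1f} KB as float32")  # ~1.5 KB
```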
- [VCIP] In-Loop Filtering via Trained Look-Up Tables. Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, and 2 more authors. In IEEE International Conference on Visual Communications and Image Processing (VCIP), 2024. Tags: Efficiency, Video Codec, Look-Up Table.
In-loop filtering (ILF) is a key technology in image/video coding for reducing artifacts. Recently, neural network-based in-loop filtering methods have achieved remarkable coding gains beyond the capability of advanced video coding standards, establishing themselves as a promising candidate tool for future standards. However, the utilization of deep neural networks (DNNs) brings high computational complexity and a high demand for dedicated hardware, which makes such methods challenging to apply in general use. To address this limitation, we study an efficient in-loop filtering scheme by adopting look-up tables (LUTs). After training a DNN with a predefined reference range for in-loop filtering, we cache the output values of the DNN into a LUT by traversing all possible inputs. In the coding process, the filtered pixel is generated by locating the input pixels (the to-be-filtered pixel and reference pixels) and interpolating between the cached values. To further enable a larger reference range within the limited LUT storage, we introduce an enhanced indexing mechanism in the filtering process and a clipping/finetuning mechanism in the training. The proposed method is implemented into the Versatile Video Coding (VVC) reference software, VTM-11.0. Experimental results show that the proposed method, with three different configurations, achieves on average 0.13%∼0.51% and 0.10%∼0.39% BD-rate reduction under the all-intra (AI) and random-access (RA) configurations, respectively. The proposed method incurs only a 1%∼8% time increase, an additional computation of 0.13∼0.93 kMAC/pixel, and a 164∼1148 KB storage cost for a single model. Our method explores a new and more practical approach for neural network-based ILF.
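A minimal sketch of the caching step, assuming a toy network over four input pixels and a sub-sampled input grid; the paper's enhanced indexing and clipping/finetuning mechanisms are omitted.

```python
import numpy as np
from itertools import product

def build_filter_lut(net, n_inputs=4, interval=16):
    """Cache a trained filtering network into a LUT by traversing a
    sub-sampled grid over all possible inputs (the to-be-filtered
    pixel plus its reference pixels). Values between grid points are
    interpolated at coding time. `net` maps an array of n_inputs
    uint8 pixels to one filtered value."""
    levels = np.arange(0, 256, interval, dtype=np.uint8)  # 16 levels
    lut = np.zeros((len(levels),) * n_inputs, dtype=np.float32)
    for idx in product(range(len(levels)), repeat=n_inputs):
        lut[idx] = net(levels[list(idx)])
    print(f"LUT entries: {lut.size}, "
          f"storage ~ {lut.size * 4 / 1024:.0f} KB as float32")
    return lut

lut = build_filter_lut(lambda pix: float(pix.mean()))  # dummy stand-in net
```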
- [CVPR] Look-Up Table Compression for Efficient Image Restoration. Yinglong Li, Jiacheng Li, and Zhiwei Xiong. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. Tags: Efficiency, Model Compression, Look-Up Table.
Look-Up Table (LUT) has recently gained increasing attention for restoring High-Quality (HQ) images from Low-Quality (LQ) observations, thanks to its high computational efficiency achieved through a "space for time" strategy of caching learned LQ-HQ pairs. However, incorporating multiple LUTs for improved performance comes at the cost of a rapidly growing storage size, which is ultimately restricted by the allocatable on-device cache size. In this work, we propose a novel LUT compression framework to achieve a better trade-off between storage size and performance for LUT-based image restoration models. Based on the observation that most cached LQ image patches are distributed along the diagonal of a LUT, we devise a Diagonal-First Compression (DFC) framework, where diagonal LQ-HQ pairs are preserved and carefully re-indexed to maintain the representation capacity, while non-diagonal pairs are aggressively subsampled to save storage. Extensive experiments on representative image restoration tasks demonstrate that our DFC framework significantly reduces the storage size of LUT-based models (including our new design) while maintaining their performance. For instance, DFC saves up to 90% of storage at a negligible performance drop for x4 super-resolution. The source code is available on GitHub: https://github.com/leenas233/DFC.
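The toy sketch below conveys the diagonal-first idea: keep exactly-diagonal entries dense and subsample the rest. The actual DFC framework preserves a re-indexed diagonal band rather than the strict diagonal shown here, and interpolates rather than taking nearest entries.

```python
import numpy as np

def compress_lut_dfc(lut, subsample=4):
    """Split a 4D LUT into a dense diagonal table (where most natural
    LQ patches fall, i.e. four near-identical pixels) and an
    aggressively subsampled off-diagonal table."""
    n = lut.shape[0]
    diag = np.array([lut[i, i, i, i] for i in range(n)])  # kept dense
    coarse = lut[::subsample, ::subsample, ::subsample, ::subsample]
    return diag, coarse

def query_dfc(diag, coarse, idx, subsample=4, tol=0):
    """Route near-diagonal queries (index spread <= tol) to the dense
    table and everything else to the subsampled one."""
    if max(idx) - min(idx) <= tol:
        return diag[idx[0]]
    return coarse[tuple(i // subsample for i in idx)]
```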
2023
- [OE] Fast 3D Reconstruction via Event-based Structured Light with Spatio-temporal Coding. Jiacheng Fu, Yueyi Zhang, Yue Li, Jiacheng Li, and 1 more author. Opt. Express, 2023. Tags: Creativity, Event Camera, Structured Light, Depth Sensing.
Event-based structured light (SL) systems leverage bio-inspired event cameras, which are renowned for their low latency and high dynamics, to drive progress in high-speed structured light systems. However, existing event-based structured light methods concentrate on the independent construction of either time-domain or space-domain features for stereo matching, ignoring the spatio-temporal consistency towards depth. In this work, we build an event-based SL system that consists of a laser point projector and an event camera, and we devise a spatial-temporal coding strategy that realizes depth encoding in dual domains through a single shot. To exploit the spatio-temporal synergy, we further present STEM, a novel Spatio-Temporal Enhanced Matching approach for 3D reconstruction. STEM is comprised of two parts, the spatio-temporal enhancing (STE) algorithm and the spatio-temporal matching (STM) algorithm. Specifically, STE integrates the dual-domain information to increase the saliency of the temporal coding, providing a more robust basis for matching. STM is a stereo matching algorithm explicitly tailored to the unique characteristics of event data modality, which computes the disparity via a meticulously designed hybrid cost function. Experimental results demonstrate the superior performance of our proposed method, achieving a reconstruction rate of 16 fps and a low root mean square error of 0.56 mm at a distance of 0.72 m.
- [CVPR] Style Projected Clustering for Domain Generalized Semantic Segmentation. Wei Huang, Chang Chen, Yong Li, Jiacheng Li, and 4 more authors. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. Tags: Creativity, Semantic Segmentation, Domain Generalization.
Existing semantic segmentation methods improve generalization capability by regularizing various images to a canonical feature space. While this process contributes to generalization, it inevitably weakens the representation. In contrast to existing methods, we instead utilize the difference between images to build a better representation space, where the distinct style features are extracted and stored as the bases of representation. Then, the generalization to unseen image styles is achieved by projecting features to this known space. Specifically, we realize the style projection as a weighted combination of stored bases, where the similarity distances are adopted as the weighting factors. Based on the same concept, we extend this process to the decision part of the model and promote the generalization of semantic prediction. By measuring the similarity distances to semantic bases (i.e., prototypes), we replace the common deterministic prediction with semantic clustering. Comprehensive experiments demonstrate the advantage of the proposed method over the state of the art, with up to 3.6% mIoU improvement on average in unseen scenarios. Code and models are available at https://gitee.com/mindspore/models/tree/master/research/cv/SPC-Net.
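A compact sketch of the projection idea, under assumed shapes and a softmax-over-distances weighting: features are expressed as weighted combinations of stored style bases, and predictions come from clustering against semantic prototypes. The paper's exact distance measure and weighting may differ.

```python
import numpy as np

def style_project(feat, bases, tau=1.0):
    """Project a feature onto the span of stored style bases. Weights
    come from similarity distances passed through a softmax, so an
    unseen style is expressed as a combination of known ones.
    feat: (C,) feature vector; bases: (K, C) stored style bases."""
    d = np.linalg.norm(bases - feat, axis=1)         # similarity distances
    w = np.exp(-d / tau) / np.exp(-d / tau).sum()    # softmax weighting
    return w @ bases                                  # weighted combination

def semantic_cluster(feat, prototypes):
    """Replace deterministic prediction with clustering: assign the
    class whose semantic prototype is nearest in feature space."""
    return int(np.argmin(np.linalg.norm(prototypes - feat, axis=1)))
```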
- [CVPR] Learning Steerable Function for Efficient Image Resampling. Jiacheng Li*, Chang Chen*, Wei Huang, Zhiqiang Lang, and 3 more authors. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. Tags: Efficiency, Look-Up Table.
Image resampling is a basic technique that is widely employed in daily applications. Existing deep neural networks (DNNs) have made impressive progress in resampling performance. Yet these methods are still not the perfect substitute for interpolation, due to the issues of efficiency and continuous resampling. In this work, we propose a novel method of Learning Resampling Function (termed LeRF), which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption of interpolation methods. Specifically, LeRF assigns spatially-varying steerable resampling functions to input image pixels and learns to predict the hyper-parameters that determine the orientations of these resampling functions with a neural network. To achieve highly efficient inference, we adopt look-up tables (LUTs) to accelerate the inference of the learned neural network. Furthermore, we design a directional ensemble strategy and edge-sensitive indexing patterns to better capture local structures. Extensive experiments show that our method runs as fast as interpolation, generalizes well to arbitrary transformations, and outperforms interpolation significantly, e.g., up to 3dB PSNR gain over bicubic for x2 upsampling on Manga109.
- [ICCV Workshops] PCTrans: Position-Guided Transformer with Query Contrast for Biological Instance Segmentation. Qi Chen, Wei Huang, Xiaoyu Liu, Jiacheng Li, and 1 more author. In IEEE/CVF International Conference on Computer Vision Workshops (ICCV Workshops), 2023. Tags: Creativity, Instance Segmentation.
Recently, query-based transformers have gradually drawn attention in segmentation tasks due to their powerful modeling ability. Compared to instance segmentation in natural images, biological instance segmentation is more challenging due to high texture similarity, crowded objects, and limited annotations. Therefore, it remains an open issue to extract meaningful queries to model biological instances. In this paper, we analyze the problem when queries meet biological images and propose a novel Position-guided Transformer with query Contrast (PCTrans) for biological instance segmentation. PCTrans tackles the mentioned issue in two ways. First, for high texture similarity and crowded objects, we incorporate position information to guide query learning and mask prediction. This involves considering position similarity when learning queries and designing a dynamic mask head that takes instance position into account. Second, to learn more discriminative representations of the queries under limited annotated data, we further design two contrastive losses, namely Query Embedding Contrastive (QEC) loss and Mask Candidate Contrastive (MCC) loss. Experiments on two representative biological instance segmentation datasets demonstrate the superiority of PCTrans over existing methods.
2022
- [T-CSVT] Reference-Guided Landmark Image Inpainting With Deep Feature Matching. Jiacheng Li, Zhiwei Xiong, and Dong Liu. IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), 2022. Tags: Creativity, Image Editing.
Despite impressive progress made by recent image inpainting methods, they often fail to predict the original content when the corrupted region contains unique structures, especially for landmark images. Applying similar images as a reference is helpful but introduces a style gap of textures, resulting in color misalignment. To this end, we propose a style-robust approach for reference-guided landmark image inpainting, taking advantage of both the representation power of learned deep features and the structural prior from the reference image. By matching deep features, our approach builds style-robust nearest-neighbor mapping vector fields between the corrupted and reference images, in which the loss of information due to corruption leads to mismatched mapping vectors. To correct these mismatched mapping vectors based on the relationship between the uncorrupted and corrupted regions, we introduce mutual nearest neighbors as reliable anchors and interpolate around these anchors progressively. Finally, based on the corrected mapping vector fields, we propose a two-step warping strategy to complete the corrupted image, utilizing the reference image as a structural “blueprint”, avoiding the style misalignment problem. Extensive experiments show that our approach effectively and robustly assists image inpainting methods in restoring unique structures in the corrupted image.
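The anchor-selection step can be sketched in a few lines, assuming the dense mapping fields are flattened into index arrays `nn_ab` and `nn_ba` (hypothetical names): a mapping vector is a reliable anchor iff the match is mutual. The progressive interpolation around anchors is omitted here.

```python
import numpy as np

def mutual_nn_anchors(nn_ab, nn_ba):
    """Flag reliable anchors in a nearest-neighbor mapping field:
    patch i of the corrupted image is an anchor iff its best match
    j = nn_ab[i] in the reference maps back to i (mutual nearest
    neighbors). Mismatched vectors between anchors would then be
    corrected by progressive interpolation.
    nn_ab: (N,) best reference match per corrupted-image patch
    nn_ba: (M,) best corrupted-image match per reference patch"""
    idx = np.arange(len(nn_ab))
    return nn_ba[nn_ab] == idx   # boolean anchor mask
```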
- [ECCV] MuLUT: Cooperating Multiple Look-Up Tables for Efficient Image Super-Resolution. Jiacheng Li*, Chang Chen*, Zhen Cheng, and Zhiwei Xiong. In European Conference on Computer Vision (ECCV), 2022. Tags: Efficiency, Look-Up Table.
The high-resolution screen of edge devices stimulates a strong demand for efficient image super-resolution (SR). An emerging line of research, SR-LUT, responds to this demand by marrying the look-up table (LUT) with learning-based SR methods. However, the size of a single LUT grows exponentially with the increase of its indexing capacity. Consequently, the receptive field of a single LUT is restricted, resulting in inferior performance. To address this issue, we extend SR-LUT by enabling the cooperation of Multiple LUTs, termed MuLUT. Firstly, we devise two novel complementary indexing patterns and construct multiple LUTs in parallel. Secondly, we propose a re-indexing mechanism to enable the hierarchical indexing between multiple LUTs. In these two ways, the total size of MuLUT is linear to its indexing capacity, yielding a practical method to obtain superior performance. We examine the advantage of MuLUT on five SR benchmarks. MuLUT achieves a significant improvement over SR-LUT, up to 1.1 dB PSNR, while preserving its efficiency. Moreover, we extend MuLUT to address demosaicing of Bayer-patterned images, surpassing SR-LUT on two benchmarks by a large margin.
- [CVPR] Contextual Outpainting with Object-Level Contrastive Learning. Jiacheng Li, Chang Chen, and Zhiwei Xiong. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Tags: Creativity, Image Editing, Generative AI.
We study the problem of contextual outpainting, which aims to hallucinate the missing background contents based on the remaining foreground contents. Existing image outpainting methods focus on completing object shapes or extending existing scenery textures, neglecting the semantically meaningful relationship between the missing and remaining contents. To explore the semantic cues provided by the remaining foreground contents, we propose a novel ConTextual Outpainting GAN (CTO-GAN), leveraging the semantic layout as a bridge to synthesize coherent and diverse background contents. To model the contextual correlation between foreground and background contents, we incorporate an object-level contrastive loss to regularize the learning of cross-modal representations of foreground contents and the corresponding background semantic layout, facilitating accurate semantic reasoning. Furthermore, we improve the realism of the generated background contents via detecting generated context in adversarial training. Extensive experiments demonstrate that the proposed method achieves superior performance compared with existing solutions on the challenging COCO-stuff dataset.
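For reference, here is an InfoNCE-style sketch of what an object-level contrastive loss can look like, pulling a foreground representation toward its paired background-layout representation and pushing it away from layouts of other images. Vectors are assumed L2-normalized; the paper's exact loss formulation may differ.

```python
import numpy as np

def object_level_contrastive_loss(fg, layout, negatives, tau=0.07):
    """InfoNCE-style contrastive loss sketch.
    fg        : (C,) foreground representation (L2-normalized)
    layout    : (C,) paired background-layout representation (positive)
    negatives : (K, C) layout representations from other images
    tau       : temperature"""
    pos = np.exp(fg @ layout / tau)
    neg = np.exp(fg @ negatives.T / tau).sum()
    return -np.log(pos / (pos + neg))
```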
- [MICCAI] Mask Rearranging Data Augmentation for 3D Mitochondria Segmentation. Qi Chen, Mingxing Li, Jiacheng Li, Bo Hu, and 1 more author. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022. Tags: Creativity, Instance Segmentation.
3D mitochondria segmentation in electron microscopy (EM) images has achieved significant progress. However, existing learning-based methods with high performance typically rely on extensive training data with high-quality manual annotations, which is time-consuming and labor-intensive. To address this challenge, we propose a novel data augmentation method tailored for 3D mitochondria segmentation. First, we train a Mask2EM network for learning the mapping from the ground-truth instance masks to real 3D EM images in an adversarial manner. Based on the Mask2EM network, we can obtain synthetic 3D EM images from arbitrary instance masks to form a sufficient amount of paired training data for segmentation. Second, we design a 3D mask layout generator to generate diverse instance layouts by rearranging volumetric instance masks according to mitochondrial distance distribution. Experiments demonstrate that, as a plug-and-play module, the proposed method boosts existing 3D mitochondria segmentation networks to achieve state-of-the-art performance. Especially, the proposed method brings significant improvements when training data is extremely limited. Code will be available at: https://github.com/qic999/MRDA_MitoSeg.
2020
- [ACM MM] Semantic Image Analogy with a Conditional Single-Image GAN. Jiacheng Li, Zhiwei Xiong, Dong Liu, Xuejin Chen, and 1 more author. In ACM International Conference on Multimedia (ACM MM), 2020. Tags: Creativity, Image Editing, Generative AI.
Recent image-specific Generative Adversarial Networks (GANs) provide a way to learn generative models from a single image instead of a large dataset. However, the semantic meaning of patches inside a single image is less explored. In this work, we first define the task of Semantic Image Analogy: given a source image and its segmentation map, along with another target segmentation map, synthesizing a new image that matches the appearance of the source image as well as the semantic layout of the target segmentation. To accomplish this task, we propose a novel method to model the patch-level correspondence between semantic layout and appearance of a single image by training a single-image GAN that takes semantic labels as conditional input. Once trained, a controllable redistribution of patches from the training image can be obtained by providing the expected semantic layout as spatial guidance. The proposed method contains three essential parts: 1) a self-supervised training framework, with a progressive data augmentation strategy and an alternating optimization procedure; 2) a semantic feature translation module that predicts transformation parameters in the image domain from the segmentation domain; and 3) a semantics-aware patch-wise loss that explicitly measures the similarity of two images in terms of patch distribution. Compared with existing solutions, our method generates much more realistic results given arbitrary semantic labels as conditional input.