Rendering & Generative AI
Toward Efficiency: Neural Rendering
A significant shift is underway in computer graphics, with traditional rendering pipelines being reimagined through new computational architectures. Technologies like DLSS, FSR, and PSSR exemplify this trend, leveraging machine learning to break the long-standing trade-off between rendering cost and visual quality. My current research operates at this frontier, where I design and integrate learning-based components directly into the rendering pipeline. The objective is to significantly boost both efficiency and visual fidelity for demanding applications, including real-time interactive graphics and scalable cloud-based rendering services.
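To make the idea concrete, below is a minimal PyTorch sketch of the kind of learned upscaling component such techniques slot into the pipeline: a small residual network that turns a low-resolution rendered frame into a higher-resolution one. The architecture and names are purely illustrative, not the actual DLSS/FSR/PSSR implementations; production systems also consume motion vectors, depth, and history frames.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedUpscaler(nn.Module):
    """Toy super-resolution block: upscale a low-res rendered frame by 2x.

    Illustrative only; real pipelines also use motion vectors, depth, and
    temporal history, and are far more carefully engineered.
    """
    def __init__(self, channels: int = 32, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            # Predict scale^2 * 3 channels, then pixel-shuffle to the target size.
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
        )
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, lr_frame: torch.Tensor) -> torch.Tensor:
        # Residual learning: the network refines a cheap bilinear upsample.
        base = F.interpolate(lr_frame, scale_factor=self.scale,
                             mode="bilinear", align_corners=False)
        return base + self.shuffle(self.body(lr_frame))

# Example: upscale a 540p frame to 1080p.
frame = torch.rand(1, 3, 540, 960)
print(LearnedUpscaler()(frame).shape)  # torch.Size([1, 3, 1080, 1920])
```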
Toward Creativity: Image Editing
My exploration of generative AI and image editing began with the Dunhuang Image Inpainting challenge, part of the e-heritage workshop1 at ICCV 2019. The task was to restore ancient paintings by filling in missing regions, which our solution tackled with an edge-guided contextual attention mechanism. Our team was honored with the WINNER🏆 prize in this challenge.
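The heart of contextual attention is borrowing content for the hole from known regions with similar features. Below is a minimal, illustrative sketch of that matching step (not our competition code; the edge-guidance branch and the hard patch-copying stage are omitted):

```python
import torch
import torch.nn.functional as F

def contextual_attention_fill(features: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Fill masked feature locations by attending to unmasked ones.

    features: (C, H, W) feature map, e.g. from an encoder.
    mask:     (H, W) with 1 inside the hole, 0 in the known region.
    Hole locations become cosine-similarity-weighted combinations of
    known-region features (a simplified stand-in for contextual attention).
    """
    c, h, w = features.shape
    flat = F.normalize(features.reshape(c, h * w), dim=0)   # unit-length per location
    hole = mask.reshape(-1).bool()

    sim = flat[:, hole].T @ flat[:, ~hole]                   # (N_hole, N_known) cosine sims
    attn = F.softmax(sim * 10.0, dim=1)                      # sharpened attention weights
    filled = features.reshape(c, -1).clone()
    filled[:, hole] = (attn @ features.reshape(c, -1)[:, ~hole].T).T
    return filled.reshape(c, h, w)

# Example: a random 64-channel feature map with a square hole.
feats = torch.rand(64, 32, 32)
mask = torch.zeros(32, 32); mask[10:20, 10:20] = 1
out = contextual_attention_fill(feats, mask)
```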
In the same year, the ICCV 2019 best paper award for SinGAN2 captured my attention, particularly its novel approach of training Generative Adversarial Networks (GANs) on a single image without requiring paired data. This insight directly motivated my first research project, which leveraged single-image GANs to enable novel image editing capabilities. We conceptualized this work as “Semantic Image Analogy”, a tribute to the foundational “Image Analogies”3 paper.
Subsequently, my research delved deeper into understanding and leveraging spatial correlations within and between images. This led me to revisit inpainting with the RefMatch project, where we utilized reference images to extract fine-grained structural details for high-fidelity completion of missing regions. A distinctive aspect of this work was its reliance on pre-trained Deep Neural Networks (DNNs) solely as feature extractors, eschewing any explicit learning phase. Instead, pattern recognition was achieved through a multi-scale nearest-neighbor search, a somewhat rebellious choice at a time when end-to-end deep learning dominated the field.
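A simplified, single-scale sketch of that matching idea is shown below: a frozen pre-trained backbone provides features for both images, and each location in the damaged image is matched to its most similar location in the reference via cosine similarity. The backbone choice and function names here are placeholders; the actual project operates coarse-to-fine over multiple scales and maps matches back to image patches.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen pre-trained network used purely as a feature extractor (no training).
backbone = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in backbone.parameters():
    p.requires_grad_(False)

def nearest_neighbor_match(target: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    """For each target feature location, return the index of its most similar
    reference location (cosine similarity). Single scale only; the full method
    repeats this coarse-to-fine."""
    with torch.no_grad():
        ft = F.normalize(backbone(target).flatten(2).squeeze(0), dim=0)     # (C, Nt)
        fr = F.normalize(backbone(reference).flatten(2).squeeze(0), dim=0)  # (C, Nr)
    return (ft.T @ fr).argmax(dim=1)                                         # (Nt,)

# Example with random images standing in for the damaged target and its reference.
target, reference = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
matches = nearest_neighbor_match(target, reference)
```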
Continuing this exploration of correlations, I initiated the Contextual Outpainting project. Here, I investigated the semantic relationships between different parts of an image, employing techniques such as VAEs (Variational Autoencoders) and Contrastive Learning. This line of inquiry has since been extended to incorporate later advances such as LoRA adapters and Stable Diffusion models. Additionally, I contributed to research on image retouching guided by sparse, interactive user instructions, a system built on cross-attention mechanisms and MoEs (Mixture of Experts).
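As an illustration of the contrastive ingredient, the snippet below shows a generic InfoNCE-style loss that pulls embeddings of semantically related regions together while pushing unrelated ones apart. It is a textbook sketch under my own naming, not the project's actual objective.

```python
import torch
import torch.nn.functional as F

def info_nce(query: torch.Tensor, positive: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss over a batch of region embeddings.

    query[i] and positive[i] are embeddings of semantically related regions
    (e.g., a visible context crop and the content to be outpainted);
    all other pairs in the batch act as negatives.
    """
    q = F.normalize(query, dim=1)
    p = F.normalize(positive, dim=1)
    logits = q @ p.T / temperature          # (B, B) similarity matrix
    labels = torch.arange(q.size(0))        # matching pairs lie on the diagonal
    return F.cross_entropy(logits, labels)

# Example: 8 pairs of 128-d region embeddings.
loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```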
My experience also extends to single-step and few-step diffusion models for image enhancement, with a particular focus on facial images. Moving forward, my interest in this domain is expanding from the manipulation of 2D images to the exciting challenges of generating and editing 3D assets🎨.
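For intuition, here is a toy DDIM-style loop showing what "few-step" enhancement can look like: start from a lightly noised version of the degraded input and denoise it in a handful of steps. The noise schedule and the `denoiser` callable are placeholders for a trained restoration model, not any particular system I worked on.

```python
import torch

@torch.no_grad()
def few_step_enhance(degraded: torch.Tensor, denoiser, steps: int = 4) -> torch.Tensor:
    """Few-step diffusion-style enhancement with a deterministic DDIM-like update.

    `denoiser(x, t)` is a placeholder for a pretrained noise-prediction network
    (epsilon-parameterization); the timesteps and schedule are illustrative.
    """
    timesteps = torch.linspace(0.6, 0.0, steps + 1)          # short, coarse schedule
    alphas = torch.cos(timesteps * torch.pi / 2) ** 2        # toy cosine schedule

    # Start from a lightly noised version of the degraded input.
    x = alphas[0].sqrt() * degraded + (1 - alphas[0]).sqrt() * torch.randn_like(degraded)
    for i in range(steps):
        a_t, a_next = alphas[i], alphas[i + 1]
        eps = denoiser(x, timesteps[i])                       # predicted noise
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()        # estimate of the clean image
        x = a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps    # deterministic DDIM step
    return x

# Example with a trivial "denoiser" standing in for a trained face-restoration model.
fake_denoiser = lambda x, t: torch.zeros_like(x)
restored = few_step_enhance(torch.rand(1, 3, 128, 128), fake_denoiser)
```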
References
1. ICCV Workshop on eHeritage, 2019
2. SinGAN: Learning a Generative Model from a Single Natural Image, in ICCV 2019
3. Image Analogies, in SIGGRAPH 2001