Streaming & Display
Toward Efficiency: Learned Look-Up Tables (LUTs)
My deep dive into Learned Look-Up Tables (LUTs)—a fundamental data structure in the image signal pipeline—began with an appreciation for the pioneering SR-LUT1 paper. It demonstrated the remarkable potential of achieving interpolation-level efficiency for super-resolution by converting a compact neural network into a LUT. However, its limitation to fixed upsampling factors presented a clear challenge for the field: how to imbue learned LUTs with the critical property of continuity to truly rival classical interpolation.
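To make the core mechanism concrete, below is a minimal NumPy sketch of the idea: cache a tiny trained restoration network into a table, then super-resolve purely by lookups. The names, the 2×2 receptive field, and the sampling interval are illustrative, and the interpolation between neighbouring LUT entries used in the actual SR-LUT is replaced by nearest-entry rounding for brevity.

```python
import numpy as np

SAMPLING = 16                    # index the LUT every 16 intensity levels
LEVELS = 256 // SAMPLING + 1     # 17 sampled values per input pixel

def cache_net_as_lut(net, scale=2):
    """Exhaustively evaluate a tiny trained network on the sampled index
    space. `net` is assumed to map a (2, 2) uint8 patch to a
    (scale, scale) block of HR pixels."""
    lut = np.zeros((LEVELS,) * 4 + (scale, scale), dtype=np.uint8)
    for idx in np.ndindex(*lut.shape[:4]):
        patch = (np.array(idx) * SAMPLING).clip(0, 255).astype(np.uint8)
        lut[idx] = net(patch.reshape(2, 2))
    return lut  # ~17^4 * scale^2 entries, i.e. a few hundred KB at x2

def upscale(lr, lut, scale=2):
    """Super-resolve by table lookup only: no network runs at test time."""
    h, w = lr.shape
    out = np.zeros((h * scale, w * scale), dtype=np.uint8)
    pad = np.pad(lr, ((0, 1), (0, 1)), mode="edge")
    for i in range(h):
        for j in range(w):
            q = np.round(pad[i:i + 2, j:j + 2] / SAMPLING).astype(int).ravel()
            out[i * scale:(i + 1) * scale,
                j * scale:(j + 1) * scale] = lut[q[0], q[1], q[2], q[3]]
    return out
```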
Our initial attempts, inspired by MetaSR2, revealed that directly predicting interpolation weights was infeasible with the limited capacity of a small receptive field. This led to our first key contribution: a method for expanding model capacity by orchestrating multiple LUTs in concert, much like layers in a deep network. This approach effectively overcame the exponential size growth of a single LUT, culminating in our publications on MuLUT and its more advanced successor, DNN-of-LUTs.
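The size argument is easy to see with a back-of-the-envelope calculation (the numbers below are illustrative, using the common 17-level sampling of the index space): a single LUT covering a 9-pixel receptive field is hopeless, while several small LUTs working in concert stay tiny.

```python
V = 17                      # sampled values per input pixel (256/16 + 1)
single_lut_9px = V ** 9     # one LUT indexed by 9 pixels at once
mulut_3x_4px = 3 * V ** 4   # three cooperating LUTs, 4 pixels each

print(f"single 9-pixel LUT : {single_lut_9px:.2e} entries")  # ~1.2e11
print(f"three 4-pixel LUTs : {mulut_3x_4px:.2e} entries")    # ~2.5e5
```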
A subsequent breakthrough was born from a moment of insight while re-examining classic operators. Diving deeper into the origin of the coefficients of Bicubic interpolation, I realized they are derived from underlying assumptions of smoothness and continuity under a cubic formulation (i.e., the resampling function is a cubic polynomial). This sparked a new idea: instead of predicting interpolation weights from scratch, why not predict the hyperparameters that define these constraints? This concept formed the basis for our research on Learning Resampling Function (LeRF). Drawing inspiration from the seminal work on Steering Kernel Regression3, which notably powers Google Camera’s Super Res Zoom feature4, we proposed a learning-based method, built upon MuLUT, to adaptively learn the hyperparameters that shape the resampling functions at different spatial locations. LeRF delivered on the original promise: a learned, continuous alternative to classical interpolation that runs as fast as interpolation, generalizes well to arbitrary transformations, and significantly outperforms it, achieving up to a 3dB PSNR gain over Bicubic. We later refined this framework in LeRF++.
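A rough sketch of the resampling step is shown below, using an anisotropic Gaussian as a stand-in for the learned, steered resampling functions; in this illustration, `params[i, j]` holds three per-pixel hyperparameters (inverse variances and a correlation term) that would come from the MuLUT-based predictor, and the continuous query coordinate is what makes arbitrary scale factors and transformations straightforward.

```python
import numpy as np

def resample_at(img, params, y, x, support=2):
    """Resample `img` at a continuous coordinate (y, x).

    `params[i, j] = (ay, ax, rho)` are per-pixel hyperparameters of an
    anisotropic Gaussian, used here as a stand-in for the steered
    resampling functions."""
    i0, j0 = int(np.floor(y)), int(np.floor(x))
    weights, values = [], []
    for di in range(-support + 1, support + 1):
        for dj in range(-support + 1, support + 1):
            i = int(np.clip(i0 + di, 0, img.shape[0] - 1))
            j = int(np.clip(j0 + dj, 0, img.shape[1] - 1))
            ay, ax, rho = params[i, j]
            dy, dx = y - i, x - j
            # kernel shape (elongation / orientation) is controlled by the
            # predicted hyperparameters rather than by fixed coefficients
            w = np.exp(-0.5 * (ay * dy ** 2 + ax * dx ** 2 + 2 * rho * dy * dx))
            weights.append(w)
            values.append(img[i, j])
    weights = np.asarray(weights)
    return float(weights @ np.asarray(values) / weights.sum())
```

Because the table output is a handful of kernel hyperparameters rather than weights tied to one scale factor, the same tables serve ×2, ×3.7, or any other resampling ratio.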
Another line of our research, Diagonal First Compression (DFC), was inspired by an unexpected discovery: while applying MuLUT to image demosaicing, I uncovered an implementation bug in previous works. This bug, which never manifests in super-resolution, exposed a diagonal dominance property in the activation patterns during LUT lookups. This observation aligns with the known low-dimensional manifold structure of natural image data5, indicating inherent redundancy in the learned LUTs. Consequently, we developed a diagonal-first compression technique that significantly reduces LUT size (achieving up to 10x compression) while preserving performance.
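A toy illustration of the storage scheme (shown on a 2-pixel index for readability, with the band width and subsampling stride picked arbitrarily): entries near the diagonal of the index space, where the inputs are similar and lookups concentrate, are kept exactly, while the sparsely visited remainder is stored on a coarse grid.

```python
import numpy as np

def dfc_compress(lut2d, band=2, stride=4):
    """Keep the near-diagonal band of a 2-pixel-indexed LUT exactly and
    subsample the rest (a 2D toy; DFC operates on the full index space)."""
    n = lut2d.shape[0]
    dense_band = {(i, j): lut2d[i, j]
                  for i in range(n)
                  for j in range(max(0, i - band), min(n, i + band + 1))}
    coarse = lut2d[::stride, ::stride].copy()
    return dense_band, coarse

def dfc_lookup(i, j, dense_band, coarse, band=2, stride=4):
    """Exact values near the diagonal, nearest coarse sample elsewhere."""
    if abs(i - j) <= band:
        return dense_band[(i, j)]
    return coarse[i // stride, j // stride]
```

In this toy version, the off-diagonal region shrinks by roughly the square of the stride while the frequently hit band stays lossless, which is the intuition behind the compression ratios reported above.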
The interplay within the LUT trilogy (MuLUT, LeRF, and DFC), particularly with respect to LUT size, can be summarized with a back-of-the-envelope formulation.
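In deliberately simplified notation (the symbols below are of my choosing, and the interpolation between sampled entries is ignored), the total storage behaves roughly as

$$
S \;\approx\; \frac{K \cdot V^{\,n} \cdot C}{\rho},
\qquad V = 2^{\,8-\lambda} + 1,
$$

where $V$ is the number of sampled values per input pixel (with $\lambda$ bits quantized away) and $n$ the receptive field of a single LUT, so a monolithic LUT grows exponentially in $n$; $K$ is the number of cooperating LUTs, the linear factor MuLUT trades against that exponential growth; $C$ is the per-entry payload, which LeRF keeps fixed at a few resampling hyperparameters instead of tying it to a particular upsampling factor; and $\rho$ is the compression ratio contributed by DFC.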
Beyond image processing, our work on learned LUTs has also extended into the domain of video coding with the development of ILF-LUT, which integrates learned LUTs into the video codec pipeline as an in-loop filter. ILF-LUT has demonstrated substantial improvements over the existing in-loop filtering tools in the latest Versatile Video Coding (VVC) standard, offering a practical way to bring learned components into standardized codecs. In the field of image and video codecs, beyond learned LUTs, I also mentored a work on Versatile Compressed Video Enhancement that takes advantage of codec priors such as motion vectors, and contributed to a robust framework for All-in-One Image Compression and Restoration.
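The “in-loop” part matters: the filtered frame is written back to the reference picture buffer, so subsequent frames are predicted from the enhanced reconstruction rather than merely displayed from it. The sketch below shows only this placement; the function names and filter interface are illustrative, not the actual ILF-LUT integration.

```python
from typing import Callable, List, Optional
import numpy as np

Filter = Callable[[np.ndarray], np.ndarray]

def in_loop_filtering(recon: np.ndarray,
                      normative_filters: List[Filter],
                      lut_filter: Optional[Filter] = None) -> np.ndarray:
    """Apply the normative VVC in-loop filters (deblocking, SAO, ALF) and
    then an optional learned LUT-based stage, before the frame is stored
    as a reference for motion-compensated prediction."""
    for f in normative_filters:
        recon = f(recon)
    if lut_filter is not None:
        recon = lut_filter(recon)   # learned enhancement inside the coding loop
    return recon
```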
The concept of learned LUTs is gaining increasing traction within the research community. We are witnessing more advanced developments, exploration of new application scenarios (such as video quality enhancement6), and deployment on diverse hardware platforms (including FPGAs7). Looking ahead, I am dedicated to further expanding the applications of learned LUTs and developing more generalizable and powerful LUT-based components🚀.
References
1. Practical Single-Image Super-Resolution Using Look-Up Table, in CVPR 2021
2. Meta-SR: A Magnification-Arbitrary Network for Super-Resolution, in CVPR 2019
3. Kernel Regression for Image Processing and Reconstruction, in T-IP 2007
4. See Better and Further with Super Res Zoom on the Pixel 3, Google Research Blog
5. On the Local Behavior of Spaces of Natural Images, in IJCV 2008
6. Online Video Quality Enhancement with Spatial-Temporal Look-up Tables, in ECCV 2024
7. An Energy-Efficient Look-up Table Framework for Super Resolution on FPGA, in AICAS 2024
Toward Faithfulness: HDR Display
High Dynamic Range (HDR) imaging and precise color reproduction are fundamental to creating visually faithful and realistic media. My own deep dive into this field was sparked by the impressive “Live HDR+”1 feature on Google Pixel Phones, a powerful application of the seminal HDRNet framework2. This experience ignited a deeper interest in the entire HDR tech ecosystem, from capture techniques like exposure bracketing and staggered-pixel sensors to advanced display standards such as PQ (Perceptual Quantizer), HLG (Hybrid Log-Gamma), and Dolby Vision. This interest also led to an upgrade💰 of my home cinema setup.
The comprehensive adoption of HDR across both content capture and display is an inevitable technological progression. Since late 2023, the industry has converged on a powerful solution for distributing HDR images with backward compatibility: the use of HDR gain maps. This supplementary metadata allows a single file to be rendered correctly on both Standard Dynamic Range (SDR) and HDR displays, an approach now championed by industry leaders like Apple3, Google4, and Adobe5. [2025 Update: This is now part of the emerging ISO 21496 standard6.]
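The rendering math behind this is simple enough to sketch. The snippet below is a deliberately simplified illustration, assuming a log2-encoded gain map and linear-light inputs; the actual specifications add per-channel offsets, metadata-defined gain limits, and color-space handling.

```python
import numpy as np

def render_with_gain_map(sdr_linear, gain_log2, display_headroom, map_headroom=4.0):
    """Deliberately simplified gain-map rendering.

    sdr_linear       : linear-light SDR base image, shape (H, W, 3)
    gain_log2        : per-pixel log2(HDR / SDR), shape (H, W, 1) or (H, W, 3)
    display_headroom : display peak brightness / SDR white (1.0 = SDR display)
    map_headroom     : headroom the gain map was authored for
    """
    # Blend between the SDR rendition (w = 0) and the full HDR rendition
    # (w = 1) according to how much headroom the display actually offers.
    w = np.clip(np.log2(display_headroom) / np.log2(map_headroom), 0.0, 1.0)
    return sdr_linear * np.exp2(gain_log2 * w)
```

A legacy SDR viewer simply ignores the map (w = 0), a capable HDR display applies it fully, and anything in between gets a proportionally scaled rendition, which is what makes the single-file backward compatibility work.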
My research directly tackles this transition by developing learning-based tools to enable seamless HDR adoption. I have mentored two key projects in this domain:
- MLP Embedded Inverse Tone Mapping (ITM): We developed a framework that embeds a lightweight, per-image MLP network as “neural metadata” within a standard SDR file. This allows for high-fidelity Inverse Tone Mapping on HDR screens, effectively restoring the content’s original dynamic range; a minimal sketch of this idea follows the list below.
- Learning Gain Maps for ITM: To bring the benefits of HDR to legacy content, this project developed a neural network capable of predicting gain maps for existing SDR images. This approach expands their dynamic range for compelling HDR presentation. As a key contribution, we also curated a new real-world dataset to drive further research and development.
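To give a flavour of the first project, the sketch below shows how small such “neural metadata” can be; the layer sizes, inputs, and serialization are illustrative only, and the per-image fitting against an HDR reference happens offline before the SDR file is distributed.

```python
import numpy as np

class TinyITMNet:
    """Toy per-image MLP whose weights ride along with an SDR file as
    metadata (architecture, inputs, and serialization are illustrative)."""

    def __init__(self, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (3, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 3))
        self.b2 = np.zeros(3)

    def __call__(self, sdr):
        """Map SDR pixels in [0, 1], shape (H, W, 3), to expanded linear light."""
        h = np.maximum(sdr @ self.w1 + self.b1, 0.0)          # ReLU hidden layer
        return np.maximum(sdr + h @ self.w2 + self.b2, 0.0)   # residual expansion

    def to_metadata(self) -> bytes:
        """Serialize the weights: ~115 parameters, a few hundred bytes."""
        parts = (self.w1, self.b1, self.w2, self.b2)
        return np.concatenate([p.ravel() for p in parts]).astype(np.float16).tobytes()
```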
Looking forward, my goal is to push the boundaries of visual faithfulness further. I plan to extend HDR principles into new dimensions, exploring temporal consistency for video and view-dependent effects for truly immersive and realistic visual experiences.
References
1. Live HDR+ and Dual Exposure Controls on Pixel 4 and 4a, Google Research Blog
2. Deep Bilateral Learning for Real-Time Image Enhancement, in SIGGRAPH 2017